ik_llama.cpp

Kawrakow 0c15494c30 Offload only activated experts to the GPU (#698)	2025-09-04 12:22:30 +02:00
* Offload only activated experts
* This seems to do the trick for -fmoe
* Do not recalculate activated experts for fused up/gate
* Log out of bounds access details
* Add a command line argument
Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
ggml-alloc.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-backend.h	Offload only activated experts to the GPU (#698)	2025-09-04 12:22:30 +02:00
ggml-blas.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-cann.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-cuda.h	Merge mainline - Aug 12 2024 (#17)	2024-08-12 15:14:32 +02:00
ggml-kompute.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-metal.h	Merge mainline - Aug 12 2024 (#17)	2024-08-12 15:14:32 +02:00
ggml-rpc.h	Fix non rpc build error (#506)	2025-06-08 17:27:00 +03:00
ggml-sycl.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-vulkan.h	Vulkan: a fresh start (#608)	2025-07-15 08:03:13 +02:00
ggml.h	Fused FFN_UP+FFN_GATE op (#741)	2025-08-31 18:16:36 +03:00