ik_llama.cpp

Iwan Kawrakow	9a790a8905	Introducing rope cache	2025-11-03 08:30:32 +02:00

When computing RoPE, the rotation angles in each layer are exactly the same, and depend only on the token positions (and other constant, model-dependent parameters). So, I wonder, why don't we compute the angles just once and then reuse them for the Q and K RoPE in each layer? This commit does it as a POC on the CPU, and uses it in the Qwen3-MoE compute graph.
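
The idea in the commit message can be illustrated with a minimal standalone sketch. This is not the code from the commit: `RopeCache`, `build_rope_cache`, and `apply_rope` are hypothetical names, and the sketch assumes the adjacent-pair RoPE convention with angle `pos * freq_base^(-2i/head_dim)`, which may differ from the variant a given model uses. The cos/sin table is built once per batch of positions and can then be applied to Q and K in every layer.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cache: cos/sin of the rotation angle for every (token, pair).
struct RopeCache {
    int n_pairs;                 // head_dim / 2
    std::vector<float> cos_sin;  // layout: [token][pair][cos, sin]
};

// Compute the angles once. They depend only on the token positions and on
// model constants (head_dim, freq_base), not on the layer.
RopeCache build_rope_cache(const std::vector<int>& positions, int head_dim, float freq_base) {
    RopeCache cache;
    cache.n_pairs = head_dim / 2;
    cache.cos_sin.resize(positions.size() * cache.n_pairs * 2);
    for (std::size_t t = 0; t < positions.size(); ++t) {
        for (int i = 0; i < cache.n_pairs; ++i) {
            const float theta = positions[t] * std::pow(freq_base, -2.0f * i / head_dim);
            float* cs = &cache.cos_sin[(t * cache.n_pairs + i) * 2];
            cs[0] = std::cos(theta);
            cs[1] = std::sin(theta);
        }
    }
    return cache;
}

// Rotate one head vector of token t in place, using the cached values.
// The same cache serves Q and K in every layer.
void apply_rope(const RopeCache& cache, std::size_t t, float* x) {
    const float* cs = &cache.cos_sin[t * cache.n_pairs * 2];
    for (int i = 0; i < cache.n_pairs; ++i) {
        const float c  = cs[2 * i];
        const float s  = cs[2 * i + 1];
        const float x0 = x[2 * i];
        const float x1 = x[2 * i + 1];
        x[2 * i]     = x0 * c - x1 * s;
        x[2 * i + 1] = x0 * s + x1 * c;
    }
}
```

In a layer loop, `build_rope_cache` would be called once before the first layer and `apply_rope` called for each Q and K head, so the `pow`, `cos`, and `sin` evaluations are not repeated per layer.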
ggml-alloc.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-backend.h	Offload only activated experts to the GPU (#698)	2025-09-04 12:22:30 +02:00
ggml-blas.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-cann.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-cpp.h	Port mdmd from mainline + Qwen2/2.5-VL support (#798)	2025-09-27 08:45:29 +02:00
ggml-cuda.h	Merge mainline - Aug 12 2024 (#17)	2024-08-12 15:14:32 +02:00
ggml-kompute.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-metal.h	Merge mainline - Aug 12 2024 (#17)	2024-08-12 15:14:32 +02:00
ggml-rpc.h	Fix non rpc build error (#506)	2025-06-08 17:27:00 +03:00
ggml-sycl.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
ggml-vulkan.h	Vulkan: a fresh start (#608)	2025-07-15 08:03:13 +02:00
ggml.h	Introducing rope cache	2025-11-03 08:30:32 +02:00