ik_llama.cpp/ggml/include
Iwan Kawrakow 9a790a8905 Introducing rope cache
When computing RoPE, the rotation angles in each layer
are exactly the same, and only depend on the token positions
(and other constant, model-dependent parameters).
So, I wonder, why don't we compute the angles just once
and then reuse them for the Q and K RoPE in each layer?

This commit does it as a POC on the CPU, and uses it in
the Qwen3-MoE compute graph.
2025-11-03 08:30:32 +02:00
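
The idea in the commit message above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code from the commit: `RopeCache`, `build_rope_cache`, and `apply_rope_cached` are hypothetical names, not part of the ggml API. It assumes the standard RoPE angle theta(p, i) = p * freq_base^(-2i/head_dim) and rotates adjacent element pairs; real models may pair elements differently (for example NeoX-style) and may apply additional frequency scaling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cache (not a ggml type): cos/sin of the rotation angle
// for every (token, rotation pair), computed once per batch.
struct RopeCache {
    std::size_t n_tokens = 0;
    std::size_t half_dim = 0;      // head_dim / 2 rotation pairs per head
    std::vector<float> cos_sin;    // interleaved [cos, sin] per (token, pair)
};

// Build the cache from token positions. The angles depend only on the
// position and on constant model parameters, never on the layer.
RopeCache build_rope_cache(const std::vector<int>& positions,
                           std::size_t head_dim, float freq_base) {
    assert(head_dim % 2 == 0);
    RopeCache c;
    c.n_tokens = positions.size();
    c.half_dim = head_dim / 2;
    c.cos_sin.resize(2 * c.n_tokens * c.half_dim);
    for (std::size_t t = 0; t < c.n_tokens; ++t) {
        for (std::size_t i = 0; i < c.half_dim; ++i) {
            // Assumed standard RoPE frequency: freq_base^(-2i / head_dim).
            const float exponent = -2.0f * static_cast<float>(i) / static_cast<float>(head_dim);
            const float theta = static_cast<float>(positions[t]) * std::pow(freq_base, exponent);
            c.cos_sin[2 * (t * c.half_dim + i) + 0] = std::cos(theta);
            c.cos_sin[2 * (t * c.half_dim + i) + 1] = std::sin(theta);
        }
    }
    return c;
}

// Rotate a Q or K buffer in place using the cached values.
// Assumed layout: x[token][head][head_dim], adjacent pairs (x[2i], x[2i+1]).
void apply_rope_cached(std::vector<float>& x, std::size_t n_heads, const RopeCache& c) {
    const std::size_t head_dim = 2 * c.half_dim;
    assert(x.size() == c.n_tokens * n_heads * head_dim);
    for (std::size_t t = 0; t < c.n_tokens; ++t) {
        for (std::size_t h = 0; h < n_heads; ++h) {
            float* v = x.data() + (t * n_heads + h) * head_dim;
            for (std::size_t i = 0; i < c.half_dim; ++i) {
                const float cs = c.cos_sin[2 * (t * c.half_dim + i) + 0];
                const float sn = c.cos_sin[2 * (t * c.half_dim + i) + 1];
                const float x0 = v[2 * i];
                const float x1 = v[2 * i + 1];
                v[2 * i]     = x0 * cs - x1 * sn;
                v[2 * i + 1] = x0 * sn + x1 * cs;
            }
        }
    }
}
```

In a forward pass, a caller would invoke `build_rope_cache` once per batch and then call `apply_rope_cached` on the Q and K buffers of every layer, so the `pow`, `cos`, and `sin` evaluations are paid once rather than twice per layer.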
ggml-alloc.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
ggml-backend.h Offload only activated experts to the GPU (#698) 2025-09-04 12:22:30 +02:00
ggml-blas.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
ggml-cann.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
ggml-cpp.h Port mdmd from mainline + Qwen2/2.5-VL support (#798) 2025-09-27 08:45:29 +02:00
ggml-cuda.h Merge mainline - Aug 12 2024 (#17) 2024-08-12 15:14:32 +02:00
ggml-kompute.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
ggml-metal.h Merge mainline - Aug 12 2024 (#17) 2024-08-12 15:14:32 +02:00
ggml-rpc.h Fix non rpc build error (#506) 2025-06-08 17:27:00 +03:00
ggml-sycl.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
ggml-vulkan.h Vulkan: a fresh start (#608) 2025-07-15 08:03:13 +02:00
ggml.h Introducing rope cache 2025-11-03 08:30:32 +02:00