ik_llama.cpp/ggml
Kawrakow a48e163247
DeepSeek imatrix stuff (#250)
* This gives us ~20% TG speedup for DeepSeek on CUDA

* Slightly better

* Also do it for plain (not fused) mul_mat_id

* Guard against numerical precision issues for MLA on CUDA

* imatrix: wv_b <-> wkv_b

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-03-10 16:19:09 +02:00
cmake           | Merge mainline llama.cpp (#3)           | 2024-07-27 07:55:01 +02:00
include         | SER - Smart Expert Reduction (#239)     | 2025-03-02 13:47:38 +02:00
src             | DeepSeek imatrix stuff (#250)           | 2025-03-10 16:19:09 +02:00
.gitignore      | Merge mainline llama.cpp (#3)           | 2024-07-27 07:55:01 +02:00
CMakeLists.txt  | FA: Add option to build all FA kernels (#197) | 2025-02-09 18:59:33 +02:00