ik_llama.cpp

Kawrakow a48e163247 DeepSeek imatrix stuff (#250)	2025-03-10 16:19:09 +02:00

* This gives us ~20% TG speedup for DeepSeek on CUDA
* Slightly better
* Also do it for plain (not fused) mul_mat_id
* Guard against numerical precision issues for MLA on CUDA
* imatrix: wv_b <-> wkv_b

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
cmake	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
include	SER - Smart Expert Reduction (#239)	2025-03-02 13:47:38 +02:00
src	DeepSeek imatrix stuff (#250)	2025-03-10 16:19:09 +02:00
.gitignore	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
CMakeLists.txt	FA: Add option to build all FA kernels (#197)	2025-02-09 18:59:33 +02:00