ik_llama.cpp/ggml

Latest commit: 20d50172d0 by Iwan Kawrakow (2025-04-28 11:26:28 +03:00)
Much better FA TG with q8_0 KV cache
Just repack it even for TG, but do the repacking for k_step rows, not the whole K tensor.
Name            Last commit message                                                     Date
cmake           Merge mainline llama.cpp (#3)                                           2024-07-27 07:55:01 +02:00
include         Add copyright notices (#317)                                            2025-04-07 10:43:26 +02:00
src             Much better FA TG with q8_0 KV cache                                    2025-04-28 11:26:28 +03:00
.gitignore      Merge mainline llama.cpp (#3)                                           2024-07-27 07:55:01 +02:00
CMakeLists.txt  Compile time option to use bf16 for quants without MMQ kernels (#261)   2025-03-18 07:37:10 +01:00