ik_llama.cpp/ggml
Kawrakow 4819257ce6
Quantization improvements (#295)
* Better make_qx_quants

Tested with q4_0 and q3_K (pure, imatrix), and the improvement is
quite significant.

* Same for iq4_nl, iq4_xs

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-03-29 08:09:52 +01:00
cmake Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
include Use bf16 instead of fp16 block scales for q8_1 (#292) 2025-03-27 05:49:16 +01:00
src Quantization improvements (#295) 2025-03-29 08:09:52 +01:00
.gitignore Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
CMakeLists.txt Compile time option to use bf16 for quants without MMQ kernels (#261) 2025-03-18 07:37:10 +01:00