ik_llama.cpp/ggml
Kawrakow 46968d4ab1
Sanitize imatrix (#735)
* sanitize importance matrix: WIP

* sanitize importance matrix: iq4_k

* sanitize importance matrix: iq5_k, iq6_k

* sanitize imatrix: iq4_ks

* sanitize imatrix: iq4_kss

* sanitize imatrix: iq2_ks and iq2_kl

* sanitize imatrix: iq5_ks

* sanitize imatrix: iq4_nl_r4

* sanitize imatrix: q4_0_r8

* sanitize imatrix: q6_0_r4

* sanitize imatrix: iq4_xs_r8

* sanitize imatrix: iq4_xs_r8 and q3_k_r4 with a template

* sanitize imatrix: q2_k_r4, q4_k_r4, q5_k_r4, q6_k_r4

* sanitize imatrix: repacked i-quants

* Minor

* Add more checks for iq3_k, iq3_ks

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-08-29 09:08:15 +03:00
cmake           Merge mainline llama.cpp (#3)                                                     2024-07-27 07:55:01 +02:00
include         AVX512+AVXVNNI GEMM implementation for quants using Q8_K for activations (#710)   2025-08-22 06:27:07 +03:00
src             Sanitize imatrix (#735)                                                           2025-08-29 09:08:15 +03:00
.gitignore      Merge mainline llama.cpp (#3)                                                     2024-07-27 07:55:01 +02:00
CMakeLists.txt  Enable CUDA graphs for MoE models + GPT-OSS support (#689)                        2025-08-15 09:18:07 +03:00