ik_llama.cpp/src
Kawrakow 041d79925c iq2_k: slightly better bpw - accuracy compromise (#20)
For LLaMA-3.1 models:
* It is better to quantize all of attn_v with iq3_k instead of
  half of attn_v with iq4_k
* Quantizing attn_output with iq3_k results in a larger PPL decrease
  than one would expect from the added bpw (see the sketch below).

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2024-08-19 13:36:51 +03:00
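
The commit message above describes a change to the per-tensor quantization mix used for the IQ2_K file type. As a rough illustration only, the following self-contained C++ sketch shows the kind of name-based tensor-type selection it refers to: the enum, helper name, and exact matching rules are assumptions, not the code from this commit, but they follow the stated rule that attn_v and attn_output are bumped to IQ3_K for every layer while the remaining tensors stay at IQ2_K.

    // Minimal sketch (hypothetical names, not the actual ik_llama.cpp code) of
    // per-tensor type selection for an IQ2_K-style quantization mix.
    #include <cstdio>
    #include <string>

    enum class QuantType { IQ2_K, IQ3_K, IQ4_K };

    static const char * to_string(QuantType t) {
        switch (t) {
            case QuantType::IQ2_K: return "IQ2_K";
            case QuantType::IQ3_K: return "IQ3_K";
            case QuantType::IQ4_K: return "IQ4_K";
        }
        return "?";
    }

    // Assumed selector: attn_v and attn_output go to IQ3_K on all layers,
    // replacing the earlier scheme that used IQ4_K for only half of attn_v;
    // everything else keeps the base IQ2_K type.
    static QuantType select_type_iq2_k_mix(const std::string & tensor_name) {
        if (tensor_name.find("attn_v.weight") != std::string::npos) {
            return QuantType::IQ3_K;   // all layers, per the commit message
        }
        if (tensor_name.find("attn_output.weight") != std::string::npos) {
            return QuantType::IQ3_K;   // PPL gain larger than the added bpw suggests
        }
        return QuantType::IQ2_K;       // default for the remaining tensors
    }

    int main() {
        const char * names[] = {
            "blk.0.attn_v.weight",
            "blk.0.attn_output.weight",
            "blk.0.ffn_down.weight",
        };
        for (const char * name : names) {
            std::printf("%-28s -> %s\n", name, to_string(select_type_iq2_k_mix(name)));
        }
        return 0;
    }

In the real tree this kind of decision is made in the quantization path of llama.cpp (listed below), where tensor names such as "attn_v.weight" are matched against the target file type.
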
CMakeLists.txt Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
llama-grammar.cpp Merge mainline - Aug 12 2024 (#17) 2024-08-12 15:14:32 +02:00
llama-grammar.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
llama-impl.h Merge mainline - Aug 12 2024 (#17) 2024-08-12 15:14:32 +02:00
llama-sampling.cpp Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
llama-sampling.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
llama-vocab.cpp Merge mainline - Aug 12 2024 (#17) 2024-08-12 15:14:32 +02:00
llama-vocab.h Merge mainline - Aug 12 2024 (#17) 2024-08-12 15:14:32 +02:00
llama.cpp iq2_k: slightly better bpw - accuracy compromise (#20) 2024-08-19 13:36:51 +03:00
unicode-data.cpp Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
unicode-data.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
unicode.cpp Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
unicode.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00