ik_llama.cpp

History

Iwan Kawrakow 2b9526c8b6 q8_KV_r8 - repacked q8_KV

On Zen4 it is slower than q8_k_r8 (292 vs 370 t/s).
This makes no sense whatsoever, as the q8_KV_r8 GEMM is
basically the q8_k_r8 GEMM with the unnecessary block stuff
removed (so one would think that it would be faster).

2025-02-19 10:03:15 +02:00
llama.h	q8_KV_r8 - repacked q8_KV	2025-02-19 10:03:15 +02:00
