Default Branch

b2cb4512c5 · Create parameters overview (#1269) · Updated 2026-02-20 07:20:56 +01:00

Branches

Each entry: commit · message · last updated · commits behind / ahead of the default branch.

df226f38c4 · Fixed compilation after revert · Updated 2025-02-07 10:44:52 +01:00 · 4211 / 3549

38f2270a15 · Add additional checks for iq1_s_r4 quantization · Updated 2025-02-07 07:19:58 +01:00 · 4211 / 3546

9ac82537dc · cuda: non-contiguous rms norm · Updated 2025-02-06 18:41:17 +01:00 · 4211 / 3546

5c37edf98e · Rename iq4_xs_r4 to iq4_xs_r8 to reflect actual row interleaving · Updated 2025-02-06 15:46:44 +01:00 · 4211 / 3547

54585d6946 · iq1_m_r4: rename mul_mat_iq1_m_r4_q8_1 to mul_mat_iq1_m_r4_q8_0 · Updated 2025-02-06 08:56:18 +01:00 · 4211 / 3548

f3c6937fe5 · iq1_s_r4: slightly faster NEON gemm/gemv · Updated 2025-02-05 13:22:22 +01:00 · 4211 / 3543

3c9b116600 · Compiler warnings · Updated 2025-02-05 10:12:00 +01:00 · 4211 / 3551

b8966277c0 · Make q5,6_0_r4, iq4_nl_e4 work with row sizes that are not a multiple of 128 · Updated 2025-01-30 17:29:04 +01:00 · 4211 / 3548

195d7efc8e · Cleanup · Updated 2025-01-30 08:24:52 +01:00 · 4211 / 3546

23e90dc325 · Make q4_0_r4 work with tensor row sizes that are not a multiple of 128 · Updated 2025-01-29 08:55:10 +01:00 · 4211 / 3545

b22ed8bc66 · Be able to load Deepseek-v2-Lite · Updated 2025-01-27 16:47:24 +01:00 · 4211 / 3556

56ca4c3ba9 · FA: repack Q8_0 to Q8_0_R8 (NEON) · Updated 2025-01-26 11:24:38 +01:00 · 4211 / 3546

bb23d014ab · Removing missed conflict marker · Updated 2025-01-23 18:31:49 +01:00 · 4211 / 3537

d868ca149a · Disable mul_mat_Qx_Qy_Mx1 on AVX2 · Updated 2025-01-23 10:58:42 +01:00 · 4211 / 3535

cc7642c757 · Slightly faster fp16/bf16 gemv on AVX2 · Updated 2025-01-22 08:03:57 +01:00 · 4211 / 3534

ef2b0066b9 · On Zen4 repack fp16 models to bf16_r16 when run-time-repacking is requested · Updated 2025-01-21 18:14:57 +01:00 · 4211 / 3532

31d7424afb · FA: turn off performance timer · Updated 2025-01-19 17:37:46 +01:00 · 4211 / 3543

3e7d5c180c · On Zen4 it is also better not to use large Q steps for fp16 K-cache · Updated 2025-01-15 17:09:07 +01:00 · 4211 / 3539

983e86805e · Fix the strange FA behavior with odd/even batch sizes · Updated 2025-01-12 15:49:25 +01:00 · 4211 / 3529

e2f8747555 · Make sure rows per thread is a multiple of 4 also for MoE when using _r4 quants · Updated 2025-01-12 10:39:52 +01:00 · 4211 / 3529