Default Branch

b2cb4512c5 · Create parameters overview (#1269) · Updated 2026-02-20 07:20:56 +01:00

Branches

110e9262c1 · Quick hack: add the MLA flag to llama_hparams · Updated 2025-06-06 09:47:29 +02:00

4211
3729

8ed7825fea · iq1_m_r4: CUDA dequantize · Updated 2025-06-05 08:38:15 +02:00

4211
3730

40059d0c5d · Another forgotten file · Updated 2025-06-05 07:24:30 +02:00

4211
3730

fb6a0d0184 · iq1_s_r4: MMQ on CUDA · Updated 2025-06-04 14:11:17 +02:00

4211
3728

106e326993 · More README · Updated 2025-06-03 14:16:28 +02:00

4211
3729

62d5e5365b · Also do the dequantize approach for iqk_moe_fused_up_gate · Updated 2025-06-03 10:11:46 +02:00

4211
3724

626f49ab84 · Check if MMVQ is supported before using it. · Updated 2025-06-03 08:16:53 +02:00

4211
3724

d4b1a7f9c5 · Adding the XTC sampler · Updated 2025-06-03 07:55:02 +02:00

4211
3723

061d064b21 · If available, use bf16 for iq4_kt gemm/gemv · Updated 2025-06-02 10:59:20 +02:00

4211
3726

a7fa24a6c5 · Disable iq4_kt on Metal for now · Updated 2025-06-01 14:21:19 +02:00

4211
3725

0ae9a5450d · F16 repacking attempt - slower on AVX2 · Updated 2025-06-01 10:18:02 +02:00

4211
3725

079753abd7 · Minor · Updated 2025-06-01 06:24:21 +02:00

4211
3724

df257a07e6 · Replace MLA-specific KV cache with the standard KV cache V2 (#473) · Updated 2025-05-30 09:28:27 +02:00

4211
3718

ae3816e13d · Fix double print · Updated 2025-05-30 09:24:22 +02:00

4211
3719

17dcd4dc89 · iq4_kt: slightly faster TG on NEON · Updated 2025-05-29 16:07:42 +02:00

4211
3718

1a203fdbc5 · Send [DONE] for OAI compatibility · Updated 2025-05-29 06:33:05 +02:00

4211
3716

9b97acd500 · Minor (~2%) iq2_ks TG performance improvement on CUDA · Updated 2025-05-28 12:17:18 +02:00

4211
3716

b033ca894b · Set cache_prompt default to true · Updated 2025-05-28 02:39:02 +02:00

4211
3716

64c754ba8b · CUDA: iq5_ks_r4 GEMV and GEMM · Updated 2025-05-26 18:36:53 +02:00

4211
3716

1a8145e48e · CUDA: faster iq2_k_r4 GEMV · Updated 2025-05-26 15:29:36 +02:00

4211
3724