Default Branch

b2cb4512c5 · Create parameters overview (#1269) · Updated 2026-02-20 07:20:56 +01:00

Branches

ffabdce3ce · Fighting with cmake · Updated 2025-03-22 16:49:43 +01:00 · 4211 behind · 3603 ahead

0964a49990 · Cleanup · Updated 2025-03-22 11:00:26 +01:00 · 4211 behind · 3605 ahead

e1684a8d47 · Revert changes to convert_hf_to_gguf.py · Updated 2025-03-21 18:01:33 +01:00 · 4211 behind · 3604 ahead

5e1944bdec · Fix bug: missing parentheses in logical expression · Updated 2025-03-21 13:17:48 +01:00 · 4211 behind · 3601 ahead

da7d0ffba6 · Specify tensor name regex for tensors to be repacked · Updated 2025-03-21 08:03:13 +01:00 · 4211 behind · 3600 ahead

4632cb94d8 · FlashMLA-3: the best of both worlds - CPU only · Updated 2025-03-20 15:58:19 +01:00 · 4211 behind · 3605 ahead

9fe6fc3782 · Add missing include · Updated 2025-03-20 15:57:38 +01:00 · 4211 behind · 3604 ahead

1b62d0fae3 · Honor mmap setting when using tensor overrides · Updated 2025-03-19 16:05:04 +01:00 · 4211 behind · 3597 ahead

60c9495c2f · Fix ggml_compute_forward_dup_q · Updated 2025-03-19 15:44:34 +01:00 · 4211 behind · 3596 ahead

529f75c220 · Prevent FlashMLA-1 from running on CUDA · Updated 2025-03-19 11:07:51 +01:00 · 4211 behind · 3595 ahead

96d1235fb0 · Allow q8_0 cache on the CPU for FlashMLA-2 · Updated 2025-03-18 13:08:52 +01:00 · 4211 behind · 3593 ahead

a9440bd3e9 · Make Q8_0 KV cache work with mla=2,fa on CUDA · Updated 2025-03-18 10:57:32 +01:00 · 4211 behind · 3593 ahead

55b2cf98d2 · Fix #261 · Updated 2025-03-18 07:43:45 +01:00 · 4211 behind · 3592 ahead

f326a5eaf7 · Compile time option to use bf16 for quants without MMQ kernels · Updated 2025-03-17 19:38:20 +01:00 · 4211 behind · 3590 ahead

b147e31f5a · Reduce memory usage for FlashMLA-2 · Updated 2025-03-17 14:00:26 +01:00 · 4211 behind · 3601 ahead

f2fb15de77 · Fix CUDA · Updated 2025-03-16 06:40:18 +01:00 · 4211 behind · 3596 ahead

765c03d09b · FlashMLA-2: slightly smaller compute buffer size · Updated 2025-03-12 14:06:31 +01:00 · 4211 behind · 3590 ahead

50bbc3f335 · FlashMLA(CUDA) - allow q8_0 for KV cache · Updated 2025-03-11 17:41:39 +01:00 · 4211 behind · 3590 ahead

e0eebfd8ad · Try using fp32 for FlashMLA · Updated 2025-03-10 18:07:53 +01:00 · 4211 behind · 3587 ahead

56921ccd49 · imatrix: wv_b <-> wkv_b · Updated 2025-03-10 14:31:22 +01:00 · 4211 behind · 3589 ahead