Default Branch

b2cb4512c5 · Create parameters overview (#1269) · Updated 2026-02-20 07:20:56 +01:00

Branches

ec45020e37 · Leave FFN partial results as f16 · Updated 2025-11-28 08:25:20 +01:00

182
15

4c4c84ba7f · Attempt to fix #1014 · Updated 2025-11-27 10:11:23 +01:00

4211
4028

0a6e650e29 · Fix llama-bench mla parameter · Updated 2025-11-27 09:30:58 +01:00

4211
4028

2339d41d2e · Change default RPC order and fix wrong RPC order in --device arg · Updated 2025-11-26 03:32:00 +01:00

187
1

43f9f342dd · Add MXFP4 to gguf-py constants · Updated 2025-11-24 15:42:33 +01:00

4211
4025

422585d726 · Enable iq4_nl KV cache on CUDA · Updated 2025-11-24 09:39:14 +01:00

4211
4024

8297d10111 · Fix q6_0 dequantize · Updated 2025-11-24 09:04:46 +01:00

4211
4023

99e0e334a5 · Disable RoPE cache · Updated 2025-11-24 07:08:07 +01:00

4211
4021

0369d2ba44 · Gigachat: CPU FA (needs 192 x 192 for MLA = 3) · Updated 2025-11-21 10:44:34 +01:00

4211
4018

2e4bfed583 · WIP: try syncing - not working yet · Updated 2025-11-20 14:30:43 +01:00

196
1

b9d25dc35b · Fix requatizing from row-interleaved quants · Updated 2025-11-20 11:45:56 +01:00

197
1

8f7dd2f06b · Make gguf-py stuff work with numpy 2.0 · Updated 2025-11-20 10:11:01 +01:00

199
1

4b731fe333 · Fix junja -> junja · Updated 2025-11-20 09:01:21 +01:00

199
3

00259c14a7 · Also llama-bench · Updated 2025-11-19 16:14:52 +01:00

200
2

810c47fc38 · Attempt to fix #974 · Updated 2025-11-19 13:50:35 +01:00

202
1

c1d0738a1b · Make sure we can fuse Q and K RoPE for DeepSeek models · Updated 2025-11-19 13:39:34 +01:00

204
1

f514891418 · Fuse sum_rows and div with topk-moe · Updated 2025-11-19 11:14:33 +01:00

204
1

5195e38d47 · Fuse Q and K RoPE · Updated 2025-11-18 13:05:15 +01:00

206
1

a1c32c1d39 · Add usage for -vq, --validate-quants · Updated 2025-11-17 16:00:51 +01:00

4211
4004

415015f386 · Handle context shift better to reduce pp · Updated 2025-11-17 02:35:25 +01:00

211
1