Default Branch

fc2b0053ff · ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196) · Updated 2026-04-29 00:47:42 +02:00

Branches

e6dbc81569 · metal : cap threadgroups size of set_rows · Updated 2025-11-10 15:17:09 +01:00 · 1957 behind, 1 ahead

3ad533689c · ggml : remove KQ mask padding · Updated 2025-11-10 13:35:25 +01:00 · 1959 behind, 1 ahead

2ef41855cf · convert : for FP8, use scale type to decide auto type · Updated 2025-11-07 04:55:53 +01:00 · 1997 behind, 16 ahead

e996f3aef8 · convert : fix no-lazy dtypes from direct safetensors · Updated 2025-11-07 04:33:09 +01:00 · 1997 behind, 3 ahead

128118fdbe · convert : use F32 for dequant of pack-quantized tensors · Updated 2025-11-07 03:59:32 +01:00 · 1997 behind, 6 ahead

23b70f4f70 · Initial plan · Updated 2025-11-04 12:00:12 +01:00 · 2025 behind, 1 ahead

79b98dbf96 · Merge branch 'master' into xsn/mtmd_custom_min_max_tokens · Updated 2025-11-02 22:14:03 +01:00 · 2040 behind, 2 ahead

d441c31b19 · metal : remove stray return · Updated 2025-11-02 17:24:00 +01:00 · 2049 behind, 9 ahead

d7f794eadb · convert : avoid dequantizing mxfp4 for GPT-OSS · Updated 2025-10-24 13:56:26 +02:00 · 2136 behind, 1 ahead

93fbd407f3 · Merge branch 'master' into compilade/convert-prequant · Updated 2025-10-23 20:23:12 +02:00 · 2139 behind, 6 ahead

f0076dc5a0 · metal : adjust .get_alloc_size to be alloc friendly · Updated 2025-10-19 16:20:54 +02:00 · 2169 behind, 1 ahead

96f9f391c7 · ggml : fix unaligned access in AMX code · Updated 2025-09-29 09:37:15 +02:00 · 2349 behind, 1 ahead

a8b0089a5b · ggml : remove SVE paths · Updated 2025-09-28 19:26:03 +02:00 · 2349 behind, 1 ahead

837b1b4563 · ggml : remove KQ mask padding · Updated 2025-09-28 17:10:17 +02:00 · 2352 behind, 6 ahead

17ca6ed540 · Implement llama-pull tool · Updated 2025-09-20 18:25:21 +02:00 · 2440 behind, 1 ahead

e83ef74733 · one less magic number · Updated 2025-09-20 07:58:36 +02:00 · 2459 behind, 6 ahead

652d303b32 · metal : fuse add + rms · Updated 2025-09-18 15:29:25 +02:00 · 2457 behind, 1 ahead

64c6dcbe6d · metal : make the NSG a function constant in mul_mv kernels · Updated 2025-09-18 10:31:59 +02:00 · 2462 behind, 2 ahead

6045c5a263 · cont : put all buffers in the same virtual address space · Updated 2025-09-14 14:46:57 +02:00 · 2498 behind, 2 ahead

3f62ee8bee · metal : back to a single queue per device · Updated 2025-09-09 16:06:46 +02:00 · 2540 behind, 9 ahead