Default Branch

24d2ee0527 · [WebGPU] Fix wait logic for inflight jobs (#20096) · Updated 2026-03-04 20:54:55 +01:00

Branches

a5b1943912 · ggml-quants : fix some edge cases in make_qkxh_nl_quants · Updated 2025-03-23 22:59:37 +01:00

3260 behind · 13 ahead

35c2f8b9ff · llama-vocab : add SuperBPE pre-tokenizer · Updated 2025-03-23 21:19:11 +01:00

3257 behind · 1 ahead

b8b173274d · server : remove old commented code [no ci] · Updated 2025-03-20 17:20:54 +01:00

3287 behind · 45 ahead

7a3c178d78 · speculative : adapt to new llama API · Updated 2025-03-18 21:05:44 +01:00

3287 behind · 36 ahead

29acf2cf05 · context : move the change to llama_context::encode() · Updated 2025-03-18 10:55:19 +01:00

3291 behind · 2 ahead

90f17bba01 · Vulkan: Default to 1GB allocations instead of 4GB to avoid fragmentation and driver issues · Updated 2025-03-17 20:41:11 +01:00

3296 behind · 1 ahead

f6711cef44 · CUDA: determine FA parallel blocks at runtime · Updated 2025-03-16 14:36:57 +01:00

3359 behind · 1 ahead

c4aca65582 · hparams : add SWA rope parameters · Updated 2025-03-13 18:26:09 +01:00

3318 behind · 1 ahead

21fe0ce4eb · hparams : add comment [no ci] · Updated 2025-03-13 16:56:38 +01:00

3319 behind · 2 ahead

ed58975f51 · server : improve infill stop criteria · Updated 2025-03-12 14:28:48 +01:00

3330 behind · 1 ahead

87dae2fd15 · Vulkan: Print coopmat shapes, then exit · Updated 2025-03-09 11:53:55 +01:00

3344 behind · 1 ahead

25840747e6 · Vulkan: Add device architecture enum and logic to recognize AMD generations · Updated 2025-03-08 09:04:45 +01:00

3524 behind · 2 ahead

c75753a01b · server : infill gen ends on new line · Updated 2025-03-07 16:19:55 +01:00

3347 behind · 1 ahead

aefa65e442 · ci : fix save-load test invokations · Updated 2025-03-07 11:17:33 +01:00

3352 behind · 1 ahead

aae2903e0b · clang-tidy : disable bugprone-branch-clone · Updated 2025-03-07 10:36:55 +01:00

3353 behind · 1 ahead

624f7bd03b · graph : add comments · Updated 2025-02-28 20:13:08 +01:00

3417 behind · 95 ahead

0f2bf55502 · speculative : do not discard the last drafted token · Updated 2025-02-19 08:21:39 +01:00

3460 behind · 2 ahead

8654805027 · docker : publish to both ggerganov and ggml-org · Updated 2025-02-15 15:18:04 +01:00

3501 behind · 1 ahead

f30aca84b2 · Revert "HIP: Switch to std::vector in rocblas version check (#11820)" · Updated 2025-02-12 19:22:04 +01:00

3505 behind · 1 ahead

d86e23101e · server : minor log updates · Updated 2025-02-08 15:23:37 +01:00

3531 behind · 1 ahead