Default Branch

b2cb4512c5 · Create parameters overview (#1269) · Updated 2026-02-20 07:20:56 +01:00

Branches

cddaa465b1 · Fix merge · Updated 2026-02-05 13:17:34 +01:00    git · 33 behind · 16 ahead

38f19c55f9 · Also here · Updated 2026-02-05 12:40:30 +01:00    git · 34 behind · 2 ahead

421b70b522 · Fuse the clamp (CUDA mmvq) · Updated 2026-02-04 12:34:32 +01:00    git · 40 behind · 10 ahead

4f4d328f66 · Cleanup · Updated 2026-02-03 09:41:38 +01:00    git · 40 behind · 2 ahead

24039c10a1 · Cleanup · Updated 2026-02-03 08:11:31 +01:00    git · 41 behind · 2 ahead

6361471d5e · Fix constants.py · Updated 2026-02-02 20:10:44 +01:00    git · 44 behind · 2 ahead

6944e7e68d · This is slightly better for CPU-only inference · Updated 2026-02-02 17:00:09 +01:00    git · 44 behind · 1 ahead

c8d7522b3f · Fix CPU FA work buffer size · Updated 2026-02-02 11:37:24 +01:00    git · 45 behind · 1 ahead

d5498c4467 · Do not repack q8_0 for batch sizes less than 8 · Updated 2026-02-02 10:07:45 +01:00    git · 46 behind · 0 ahead · Included

685df0e69d · Work buffer size · Updated 2026-01-31 17:10:23 +01:00    git · 48 behind · 0 ahead · Included

b85a2a50d5 · Reduce compute buffer size for mla=3 · Updated 2026-01-31 11:43:05 +01:00    git · 55 behind · 0 ahead · Included

4d13ae03b5 · Also these other two places · Updated 2026-01-30 16:36:29 +01:00    git · 53 behind · 0 ahead · Included

efd331f3eb · Fix bug in the CPU flash attention implementation · Updated 2026-01-30 08:50:48 +01:00    git · 56 behind · 1 ahead

ffc9e48a6f · CUDA FA · Updated 2026-01-29 18:06:47 +01:00    git · 56 behind · 3 ahead

647d15adca · Use standard output calculation for MiniMax-M2 graph parallel · Updated 2026-01-29 08:03:39 +01:00    git · 57 behind · 1 ahead

629f546db1 · Be able to set FA offset via command line argument · Updated 2026-01-29 07:28:00 +01:00    git · 58 behind · 1 ahead

6b37066b8f · Forgot to add to fattn-common.h · Updated 2026-01-28 14:08:59 +01:00    git · 60 behind · 2 ahead

910ab2c6e4 · Forgotten ffn_exp_probs_b · Updated 2026-01-28 11:29:36 +01:00    git · 60 behind · 3 ahead

ce4e447e0d · Much faster long context TG for Minimax-M2 · Updated 2026-01-28 09:35:43 +01:00    git · 61 behind · 1 ahead

345545d1be · Remove unused arguments · Updated 2026-01-27 18:26:26 +01:00    git · 62 behind · 2 ahead