Default Branch

b2cb4512c5 · Create parameters overview (#1269) · Updated 2026-02-20 07:20:56 +01:00

Branches

37caf11f2c · Cleanup · Updated 2026-01-08 09:18:34 +01:00    git

113
2

8308320bca · Do not abort on NCCL initialization failure · Updated 2026-01-08 08:16:23 +01:00    git

4211
4098

646fe94085 · Force split_mode_f16 to false · Updated 2026-01-07 17:34:41 +01:00    git

4211
4098

edd56b1bf7 · Split mode "graph" for Hunyuan-MoE · Updated 2026-01-07 10:12:46 +01:00    git

4211
4095

a29f62fc50 · Enable up to 4 GPUs for Mimo2-Flash · Updated 2026-01-07 08:36:00 +01:00    git

4211
4094

10c531c8de · Actually enable it · Updated 2026-01-07 06:55:10 +01:00    git

4211
4094

289aadb9d4 · Disable ring reduction for now · Updated 2026-01-06 14:14:30 +01:00    git

4211
4092

b41f2c3ffe · Split mode 'graph' for Qwen3-VL · Updated 2026-01-05 14:21:10 +01:00    git

121
1

a725f15d9d · Split mode graph for Qwen3 · Updated 2026-01-05 09:00:30 +01:00    git

122
1

b586f89e50 · Set max_gpu to 2 for Mimo2 · Updated 2026-01-05 07:49:17 +01:00    git

123
3

ae3498dabd · Fix race in CUDA FA for head sizes 192/128 · Updated 2026-01-05 07:17:10 +01:00    git

4211
4088

ba0e88a5e3 · Minor · Updated 2025-12-28 09:57:24 +01:00    git

4211
4087

bf3ff8ec41 · Turn on graph reuse by default · Updated 2025-12-27 08:22:46 +01:00    git

4211
4083

29d323117c · Command line option to turn on async. Set to false by default for now · Updated 2025-12-27 07:24:01 +01:00    git

4211
4109

0e059879b7 · Be more careful with having set the device before using a stream · Updated 2025-12-26 19:18:28 +01:00    git

4211
4081

f109274859 · Graph parallel: better PP performance for 3 and more GPUs · Updated 2025-12-26 16:57:19 +01:00    git

4211
4080

d2f52ec104 · Fix split mode graph when p2p is not enabled · Updated 2025-12-25 08:52:38 +01:00    git

4211
4079

723e18bb98 · Reduce add improvements without NCCL · Updated 2025-12-24 15:23:10 +01:00    git

4211
4078

c6a3903571 · Be able to set reduce op data type for split mode "graph" · Updated 2025-12-24 11:57:41 +01:00    git

4211
4075

2de3a96510 · Avoid computing the attention reduce op for cohere2 · Updated 2025-12-24 11:14:58 +01:00    git

4211
4076