Default Branch

b2cb4512c5 · Create parameters overview (#1269) · Updated 2026-02-20 07:20:56 +01:00

Branches

ccf72a0e46 · Also this · Updated 2025-12-09 07:36:31 +01:00 · 161 behind, 2 ahead

c83d2fd335 · WIP · Updated 2025-12-08 16:44:53 +01:00 · 163 behind, 3 ahead

be8e7057b3 · Handle split cache (read) · Updated 2025-12-08 09:55:35 +01:00 · 162 behind, 2 ahead

0e683f24ad · Fix annoying compiler warnings · Updated 2025-12-06 09:57:50 +01:00 · 164 behind, 1 ahead

a4da6e298a · Automatically disable CUDA graphs for split mode "graph" · Updated 2025-12-05 18:00:58 +01:00 · 165 behind, 1 ahead

b18f658a7d · CUDA: set current device in compute_forward · Updated 2025-12-05 16:40:48 +01:00 · 167 behind, 1 ahead

ed8a3d8e3d · Don't split the output tensor · Updated 2025-12-05 14:16:11 +01:00 · 168 behind, 1 ahead

9264abfbaf · Fix debug build (#1037) · Updated 2025-12-05 14:06:22 +01:00 · 168 behind, 0 ahead · Included

c374b221b6 · Mistral3-large · Updated 2025-12-04 17:05:40 +01:00 · 4211 behind, 4042 ahead

6387a5800a · Minor · Updated 2025-12-04 06:52:05 +01:00 · 170 behind, 2 ahead

9c17d5f176 · WIP: Hadamard transforms for K-cache · Updated 2025-12-03 15:26:46 +01:00 · 171 behind, 1 ahead

ab19054a79 · Use standard attention for Ministral3 · Updated 2025-12-03 11:51:32 +01:00 · 173 behind, 1 ahead

c5f9a5c29a · Fix bug in ggml_cuda_op_scale_tensor · Updated 2025-12-03 11:28:26 +01:00 · 174 behind, 1 ahead

84129f7eb6 · Adding ministral3: this seems to work · Updated 2025-12-03 10:41:44 +01:00 · 4211 behind, 4037 ahead

dde8028336 · WIP: allocate graph · Updated 2025-12-03 08:54:53 +01:00 · 175 behind, 4 ahead

b415e734e5 · Fix also output · Updated 2025-12-03 05:53:44 +01:00 · 175 behind, 3 ahead

49ec5726d7 · Is this better for multi-GPU and split mode "graph"? · Updated 2025-12-02 09:44:46 +01:00 · 176 behind, 1 ahead

c4c266847f · Slightly better graph split strategy · Updated 2025-12-02 09:18:55 +01:00 · 176 behind, 1 ahead

864b496831 · Try to better distribute the splits · Updated 2025-12-01 14:18:56 +01:00 · 177 behind, 32 ahead

598e8e7d5f · Fix build with RPC not enabled · Updated 2025-11-30 19:03:48 +01:00 · 4211 behind, 4034 ahead