llama.cpp/ggml/src
Jeff Bolz e06c3ab2bc
vulkan: change gated_delta_net to shard a column across a subgroup (#20662)
* vulkan: change gated_delta_net to shard a column across a subgroup

This is based on https://github.com/ggml-org/llama.cpp/pull/20391. I used an
LLM to port the CUDA code to Vulkan, and guided it to make various fixes to
work with Vulkan (e.g. handling different subgroup sizes, unknown mapping of
subgroup to invocation id, using subgroupAdd optionally, etc.).

This fixes a perf regression from the transposing of the values in memory
(!20443).

* vulkan: Spread columns across fewer lanes to reduce the number of workgroups
2026-03-20 12:17:15 +01:00
ggml-blas ggml-blas: set mkl threads from thread context (#20602) 2026-03-18 01:16:49 +08:00
ggml-cann CANN: add BF16 support for core operators (#20152) 2026-03-20 17:08:39 +08:00
ggml-cpu ggml: guard KleidiAI DOWNLOAD_EXTRACT_TIMESTAMP for cmake < 3.24 (#20767) 2026-03-19 21:36:23 +02:00
ggml-cuda HIP : ignore return of hipMemAdvise [no ci] (#20696) 2026-03-18 09:53:13 +01:00
ggml-hexagon hexagon: add Matrix Extensions (HMX) for Hexagon NPU backend (#20693) 2026-03-19 09:11:06 -07:00
ggml-hip hip: Avoid compiler bug in RDNA code generation during debug builds on Windows (#20655) 2026-03-19 19:14:08 +01:00
ggml-metal metal : add FA specialization for HSK = 320, HSV = 256 (#20549) 2026-03-14 23:15:47 +02:00
ggml-musa
ggml-opencl opencl: fix l2_norm (#20480) 2026-03-13 22:18:52 -07:00
ggml-openvino ggml : add OpenVINO backend (#15307) 2026-03-14 07:56:55 +02:00
ggml-rpc rpc : use unordered_map::reserve and emplace (#18513) 2026-01-02 12:09:36 +02:00
ggml-sycl [SYCL] ehance UPSCALE to support all UT cases (#20637) 2026-03-17 10:01:52 +08:00
ggml-virtgpu ggml-virtgpu: improve the reliability of the code (#19846) 2026-02-26 20:00:57 +08:00
ggml-vulkan vulkan: change gated_delta_net to shard a column across a subgroup (#20662) 2026-03-20 12:17:15 +01:00
ggml-webgpu ggml webgpu: ops support for qwen3.5 (SET, TRI_SOLVE, SSM_CONV, GATED_DELTA_NET) + GET_ROWS optimization (#20687) 2026-03-19 08:45:28 -07:00
ggml-zdnn ggml-zdnn : mark zDNN buffers as non-host (#18967) 2026-01-22 01:16:21 +01:00
ggml-zendnn ggml-zendnn: update code for latest ZenDNN API (#19923) 2026-02-27 08:43:41 +08:00
CMakeLists.txt ggml : add OpenVINO backend (#15307) 2026-03-14 07:56:55 +02:00
ggml-alloc.c ggml : make ggml_is_view as API (#19539) 2026-02-16 17:43:34 +02:00
ggml-backend-dl.cpp hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
ggml-backend-dl.h hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150) 2026-01-29 12:33:21 -08:00
ggml-backend-impl.h llama: use host memory if device reports 0 memory (#18587) 2026-01-09 05:34:56 +08:00
ggml-backend-reg.cpp ggml : add OpenVINO backend (#15307) 2026-03-14 07:56:55 +02:00
ggml-backend.cpp llama : disable graph reuse with pipeline parallelism (#20463) 2026-03-12 21:04:13 +02:00
ggml-common.h ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml-impl.h ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml-opt.cpp
ggml-quants.c ggml : guard against sumq2 being 0 in IQ4_NL (#20460) 2026-03-15 10:47:28 +02:00
ggml-quants.h ggml : add NVFP4 quantization type support (#19769) 2026-03-11 21:02:54 +01:00
ggml-threading.cpp
ggml-threading.h
ggml.c ggml : restore ggml_type_sizef() to aboid major version bump (ggml/1441) 2026-03-18 15:17:28 +02:00
ggml.cpp
gguf.cpp gguf : avoid too many file size calls (#19919) 2026-02-26 12:46:32 +02:00