llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-04-23 20:12:00 +02:00


Jeff Bolz e06c3ab2bc vulkan: change gated_delta_net to shard a column across a subgroup (#20662)		2026-03-20 12:17:15 +01:00

* vulkan: change gated_delta_net to shard a column across a subgroup

  This is based on https://github.com/ggml-org/llama.cpp/pull/20391. I used an LLM to port the CUDA code to Vulkan, and guided it to make various fixes to work with Vulkan (e.g. handling different subgroup sizes, unknown mapping of subgroup to invocation id, using subgroupAdd optionally, etc.). This fixes a perf regression from the transposing of the values in memory (!20443).

* vulkan: Spread columns across fewer lanes to reduce the number of workgroups
ggml-blas	ggml-blas: set mkl threads from thread context (#20602)	2026-03-18 01:16:49 +08:00
ggml-cann	CANN: add BF16 support for core operators (#20152)	2026-03-20 17:08:39 +08:00
ggml-cpu	ggml: guard KleidiAI DOWNLOAD_EXTRACT_TIMESTAMP for cmake < 3.24 (#20767)	2026-03-19 21:36:23 +02:00
ggml-cuda	HIP : ignore return of hipMemAdvise [no ci] (#20696)	2026-03-18 09:53:13 +01:00
ggml-hexagon	hexagon: add Matrix Extensions (HMX) for Hexagon NPU backend (#20693)	2026-03-19 09:11:06 -07:00
ggml-hip	hip: Avoid compiler bug in RDNA code generation during debug builds on Windows (#20655)	2026-03-19 19:14:08 +01:00
ggml-metal	metal : add FA specialization for HSK = 320, HSV = 256 (#20549)	2026-03-14 23:15:47 +02:00
ggml-musa
ggml-opencl	opencl: fix l2_norm (#20480)	2026-03-13 22:18:52 -07:00
ggml-openvino	ggml : add OpenVINO backend (#15307)	2026-03-14 07:56:55 +02:00
ggml-rpc	rpc : use unordered_map::reserve and emplace (#18513)	2026-01-02 12:09:36 +02:00
ggml-sycl	[SYCL] enhance UPSCALE to support all UT cases (#20637)	2026-03-17 10:01:52 +08:00
ggml-virtgpu	ggml-virtgpu: improve the reliability of the code (#19846)	2026-02-26 20:00:57 +08:00
ggml-vulkan	vulkan: change gated_delta_net to shard a column across a subgroup (#20662)	2026-03-20 12:17:15 +01:00
ggml-webgpu	ggml webgpu: ops support for qwen3.5 (SET, TRI_SOLVE, SSM_CONV, GATED_DELTA_NET) + GET_ROWS optimization (#20687)	2026-03-19 08:45:28 -07:00
ggml-zdnn	ggml-zdnn : mark zDNN buffers as non-host (#18967)	2026-01-22 01:16:21 +01:00
ggml-zendnn	ggml-zendnn: update code for latest ZenDNN API (#19923)	2026-02-27 08:43:41 +08:00
CMakeLists.txt	ggml : add OpenVINO backend (#15307)	2026-03-14 07:56:55 +02:00
ggml-alloc.c	ggml : make `ggml_is_view` as API (#19539)	2026-02-16 17:43:34 +02:00
ggml-backend-dl.cpp	hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)	2026-01-29 12:33:21 -08:00
ggml-backend-dl.h	hexagon: enable offloading to Hexagon on Windows on Snapdragon (#19150)	2026-01-29 12:33:21 -08:00
ggml-backend-impl.h	llama: use host memory if device reports 0 memory (#18587)	2026-01-09 05:34:56 +08:00
ggml-backend-reg.cpp	ggml : add OpenVINO backend (#15307)	2026-03-14 07:56:55 +02:00
ggml-backend.cpp	llama : disable graph reuse with pipeline parallelism (#20463)	2026-03-12 21:04:13 +02:00
ggml-common.h	ggml : add NVFP4 quantization type support (#19769)	2026-03-11 21:02:54 +01:00
ggml-impl.h	ggml : add NVFP4 quantization type support (#19769)	2026-03-11 21:02:54 +01:00
ggml-opt.cpp
ggml-quants.c	ggml : guard against sumq2 being 0 in IQ4_NL (#20460)	2026-03-15 10:47:28 +02:00
ggml-quants.h	ggml : add NVFP4 quantization type support (#19769)	2026-03-11 21:02:54 +01:00
ggml-threading.cpp
ggml-threading.h
ggml.c	ggml : restore ggml_type_sizef() to avoid major version bump (ggml/1441)	2026-03-18 15:17:28 +02:00
ggml.cpp
gguf.cpp	gguf : avoid too many file size calls (#19919)	2026-02-26 12:46:32 +02:00