Mirror of https://github.com/ggerganov/llama.cpp, synced 2026-04-24 20:41:55 +02:00
* vulkan: add GATED_DELTA_NET op support

  Implements the fused gated delta net recurrence as a Vulkan compute shader with full support for scalar gate, KDA vector gate, GQA broadcast, multi-token sequences, and permuted (non-contiguous) q/k inputs. Specialization constants select head size (32/64/128) and KDA mode at pipeline creation time.

  Passes all 13 test-backend-ops cases on AMD Radeon 890M (RADV GFX1150).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* vulkan: optimize GATED_DELTA_NET shader (Phase 1)

  - vec4 dot products on all inner loops (dp4 hardware intrinsic)
  - Cache exp(g) in shared memory for KDA path, eliminating ~32K redundant global reads and ~16K redundant exp() calls per token
  - vec4 fused decay + rank-1 update (3 vec4 ops vs 12 scalar ops)
  - Add perf benchmark cases for GATED_DELTA_NET to test-backend-ops

  KDA TG: +5.4% throughput. Non-KDA: no regressions.

  13/13 test-backend-ops passing on AMD Radeon 890M (RADV GFX1150).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* vulkan: address review feedback for GATED_DELTA_NET

  Pipeline array refactor [3][2], A_TYPE/D_TYPE/FLOAT_TYPE shader macros, scale in push constants, supports_op fix, dispatch restructuring.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* vulkan: use FLOAT_TYPE for buffer/shared declarations, align formatting

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* vulkan: add explicit FLOAT_TYPE casts for buffer loads

  Wrap data_q, data_k, and data_g buffer reads with FLOAT_TYPE() casts to ensure correct behavior across all Vulkan configurations.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* vulkan: fix Q/K broadcast for interleaved head layout

  Adapt to the interleaved broadcast convention from #20340:

  head_id / rq1 → head_id % neq1

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Progeny Alpha <ProgenyAlpha@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

| | | |
|---|---|---|
| peg-parser | ||
| .gitignore | ||
| CMakeLists.txt | ||
| get-model.cpp | ||
| get-model.h | ||
| gguf-model-data.cpp | ||
| gguf-model-data.h | ||
| run-json-schema-to-grammar.mjs | ||
| test-alloc.cpp | ||
| test-arg-parser.cpp | ||
| test-autorelease.cpp | ||
| test-backend-ops.cpp | ||
| test-backend-sampler.cpp | ||
| test-barrier.cpp | ||
| test-c.c | ||
| test-chat-auto-parser.cpp | ||
| test-chat-peg-parser.cpp | ||
| test-chat-template.cpp | ||
| test-chat.cpp | ||
| test-double-float.cpp | ||
| test-gbnf-validator.cpp | ||
| test-gguf-model-data.cpp | ||
| test-gguf.cpp | ||
| test-grammar-integration.cpp | ||
| test-grammar-llguidance.cpp | ||
| test-grammar-parser.cpp | ||
| test-jinja.cpp | ||
| test-json-partial.cpp | ||
| test-json-schema-to-grammar.cpp | ||
| test-llama-archs.cpp | ||
| test-llama-grammar.cpp | ||
| test-log.cpp | ||
| test-lora-conversion-inference.sh | ||
| test-model-load-cancel.cpp | ||
| test-mtmd-c-api.c | ||
| test-opt.cpp | ||
| test-peg-parser.cpp | ||
| test-quantize-fns.cpp | ||
| test-quantize-perf.cpp | ||
| test-quantize-stats.cpp | ||
| test-reasoning-budget.cpp | ||
| test-regex-partial.cpp | ||
| test-rope.cpp | ||
| test-sampling.cpp | ||
| test-state-restore-fragmented.cpp | ||
| test-thread-safety.cpp | ||
| test-tokenizer-0.cpp | ||
| test-tokenizer-0.py | ||
| test-tokenizer-0.sh | ||
| test-tokenizer-1-bpe.cpp | ||
| test-tokenizer-1-spm.cpp | ||
| test-tokenizer-random.py | ||
| test-tokenizers-repo.sh | ||
| testing.h | ||
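
The last entry in the commit message above replaces `head_id / rq1` with `head_id % neq1` when picking the Q/K head for a given output head. The sketch below only illustrates how the two index expressions differ; it is not the shader code. The function names are hypothetical, and the meaning of the parameters is an assumption: `neq1` is taken to be the number of Q/K heads and `rq1` the number of output heads sharing one Q/K head.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; only the expressions
// `head_id / rq1` and `head_id % neq1` come from the commit message.
// Assumption: neq1 = number of Q/K heads, rq1 = n_heads / neq1.

// Blocked convention (old expression): consecutive heads share a Q/K head.
constexpr uint32_t qk_head_blocked(uint32_t head_id, uint32_t rq1) {
    return head_id / rq1;
}

// Interleaved convention (new expression): heads cycle through the Q/K heads.
constexpr uint32_t qk_head_interleaved(uint32_t head_id, uint32_t neq1) {
    return head_id % neq1;
}
```

Under these assumptions, with 4 output heads and 2 Q/K heads (`neq1 = 2`, `rq1 = 2`), the blocked form maps heads 0..3 to Q/K heads 0, 0, 1, 1, while the interleaved form maps them to 0, 1, 0, 1.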