ggml/include
Max Krasnyansky 0a6f36a11d Add experimental ggml-hexagon backend for the Hexagon NPU (llama/16547)
* model: add support for extra bufs for all devices

* hexagon: add experimental ggml-hexagon backend for the Hexagon NPU

This commit introduces a new experimental backend `ggml-hexagon` with support for the Hexagon NPU.

Highlights:
- Supports Hexagon versions: v73, v75, v79, and v81
- Targets Android devices based on Snapdragon SoCs: Gen3, 8-Elite, and 8-Elite Gen5
- Supports Q4_0, Q8_0, MXFP4, and FP32 data types
- Implements core LLM ops: MUL_MAT/MUL_MAT_ID, ADD/SUB/MUL/ADD_ID, RMS_NORM, ROPE, GLU/SWIGLU, SOFTMAX

**Note:** This backend is experimental and may exhibit instability or limited performance across supported devices.
It is intended for early testing and feedback from the llama.cpp/ggml developer and user communities.

Co-Authored-By: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-Authored-By: Todor Boinovski <todorb@qti.qualcomm.com>

* hexagon: fix format checker errors

* hexagon: update readme and cmake presets

* ci: add android-ndk-build jobs that build plain ARM64 and Snapdragon versions

* hexagon: add simple graph optimizer for stacking MUL_MAT ops with the same input
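The stacking idea can be shown with a toy scan: consecutive MUL_MAT nodes that share the same second input (e.g. the Q/K/V projections reading one activation tensor) are candidates to be fused into a single stacked matmul. The node type and helper below are hypothetical simplifications; the real optimizer walks ggml graph nodes.

```c
#include <stddef.h>

/* Toy stand-ins for graph nodes (illustration only). */
enum toy_op { TOY_MUL_MAT, TOY_ADD };

struct toy_node {
    enum toy_op op;
    const void *src0;  /* per-op weights */
    const void *src1;  /* activation input, possibly shared */
};

/* Count how many consecutive MUL_MAT nodes starting at `start`
   share the same second input and can therefore be stacked. */
static size_t count_stackable(const struct toy_node *nodes, size_t n,
                              size_t start) {
    if (start >= n || nodes[start].op != TOY_MUL_MAT) return 0;
    const void *shared = nodes[start].src1;
    size_t cnt = 1;
    while (start + cnt < n &&
           nodes[start + cnt].op == TOY_MUL_MAT &&
           nodes[start + cnt].src1 == shared) {
        cnt++;
    }
    return cnt;
}
```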

* hexagon: move ADB helper scripts into scripts/snapdragon/adb

* hexagon: replace all f/printfs with GGML_LOG_...

* readme: add hexagon to the list of supported backends

* hexagon: stack matmuls with quantized inputs only

* hexagon: add TODO for fixing issues in hexagon_graph_optimize

* hexagon: update to hex-sdk 6.4.0 and add scripts for running on QDC

* scripts: fix lint errors

* scripts: update qdc pytest script to make linter happy

* hexagon: add reduce sum in fp32
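The fp32 reduce-sum (used below as `reduce_sum_fp32` for row sums) has a simple scalar equivalent; this reference sketch only pins down the result the kernel should produce, while the actual implementation uses HVX vector reductions.

```c
#include <stddef.h>

/* Scalar reference for an fp32 reduce-sum over n elements.
   The HVX version accumulates across vector lanes and then
   reduces the lanes, but must match this result. */
static float reduce_sum_fp32(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) acc += x[i];
    return acc;
}
```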

* hexagon: reduce number of vector stores in matmul output

* hexagon: remove the need for vdelta in reduce-multiply-x8

* hexagon: consistent use of reduce_sum_fp32 for row_sums

* hexagon: some more matmul optimizations and comments

Optimize cases where tensor dims are not a multiple of 1024 (e.g. in Qwen models).
We already handled those cases, but at a higher overhead.
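The shape of this optimization can be illustrated with a simple split: a row of length n is processed as full 1024-element vector blocks plus a scalar tail for the remainder. The constant and helper here are assumptions for illustration, not the backend's actual code.

```c
#include <stddef.h>

/* Hypothetical split of a row into a vectorized bulk and a
   remainder tail; rows whose length is not a multiple of the
   block size (e.g. 2560) leave a tail for the slower path. */
enum { VEC_BLOCK = 1024 };

static void split_row(size_t n, size_t *n_main, size_t *n_tail) {
    *n_tail = n % VEC_BLOCK;   /* remainder, handled at higher overhead */
    *n_main = n - *n_tail;     /* bulk, handled by the fast vector loop */
}
```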

* hexagon: update cmake presets

* hexagon: add OPMASK support for run-bench.sh wrapper

* hexagon: update to use GGML_BACKEND_API

* hexagon: remove unused logic for setting tensor flags for the views

* hexagon: add asserts to set/get_tensor to make sure we handle complete tensors

Same asserts as the CPU backend.

* hexagon: use cpy_tensor slow path for non-host buffers

* hexagon: error checks in the buffer allocator

* cmake: move include(extProj) under ggml-hexagon

* hexagon: don't forget to delete the backend on free

* hexagon: set/get_tensor size assert apply only to quantized tensors

* hexagon: reintroduce HEX_VERBOSE wrapper for GGML_LOG_DEBUG for now

GGML_LOG_DEBUG is always enabled for test-backend-ops and the output gets in the way.
Ideally we need finer-grained log levels.

* docs: typos in hexagon developer docs (libggm-...)

* hexagon: overhaul error handling in the session/device allocation

This should handle all failure paths in the session allocation.

* hexagon: update cmake presets to enable fp16 vectors

* hexagon: remove unused time_usec function

* hexagon: don't forget to release buffer contexts

* hexagon: fixed indents in hvx-utils (missed clang-format auto-format failure)

* hexagon: remove custom can_repeat function and use ggml_can_repeat

---------

Co-authored-by: Rajdeep Ganguly <rganguly@qti.qualcomm.com>
Co-authored-by: Todor Boinovski <todorb@qti.qualcomm.com>
2025-11-01 09:41:35 +02:00
ggml-alloc.h ggml : upgrade init_tensor API to return a ggml_status (llama/11854) 2025-03-04 21:24:42 +02:00
ggml-backend.h rpc : add support for multiple devices (llama/16276) 2025-10-12 07:57:25 +03:00
ggml-blas.h ggml : build backends as libraries (llama/10256) 2024-11-15 22:51:53 +02:00
ggml-cann.h ggml : build backends as libraries (llama/10256) 2024-11-15 22:51:53 +02:00
ggml-cpp.h ggml : fix ggml_gallocr_ptr type (#1205) 2025-04-30 15:20:40 +02:00
ggml-cpu.h ggml: allow casting between f32 and i32 (llama/15783) 2025-09-20 13:33:50 +03:00
ggml-cuda.h ggml : build backends as libraries (llama/10256) 2024-11-15 22:51:53 +02:00
ggml-hexagon.h Add experimental ggml-hexagon backend for the Hexagon NPU (llama/16547) 2025-11-01 09:41:35 +02:00
ggml-metal.h metal : refactor + optimize v2 (llama/15995) 2025-09-20 13:33:50 +03:00
ggml-opencl.h Introducing experimental OpenCL backend with support for Qualcomm Adreno GPUs (llama/10693) 2024-12-17 19:23:40 +02:00
ggml-opt.h finetune: SGD optimizer, more CLI args (llama/13873) 2025-08-14 14:17:28 +03:00
ggml-rpc.h rpc : report actual free memory (llama/16616) 2025-10-21 18:14:33 +03:00
ggml-sycl.h ggml : build backends as libraries (llama/10256) 2024-11-15 22:51:53 +02:00
ggml-vulkan.h vulkan: Make Vulkan optional at runtime (#11493). (llama/11494) 2025-02-12 22:00:20 +02:00
ggml-webgpu.h ggml: Add initial WebGPU backend (llama/14521) 2025-07-19 17:47:23 +03:00
ggml-zdnn.h zdnn: refactor codebase + add docs (llama/16178) 2025-09-25 11:56:34 +03:00
ggml.h cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators (llama/16083) 2025-10-21 18:14:33 +03:00
gguf.h GGUF: C++ refactor, backend support, misc fixes (skip) (llama/11030) 2025-01-14 09:36:36 +02:00