llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-04-30 19:32:05 +02:00

Yiwei Shao 5d14e5d19b hexagon: optimization for HMX mat_mul (#21554)	2026-04-14 14:09:03 -07:00

* hexagon: add async HMX worker

  Introduce hmx-worker (dedicated thread for HMX compute) to overlap HMX matmul with HVX dequant/DMA stages in the pipeline path, replacing the previous synchronous HMX calls that blocked the main thread.

* hexagon: cost-based VTCM chunk search for out-stationary matmul

* hexagon: fix futex race in hmx_worker_drain

  Store the boolean in a local variable to avoid loading the atomic twice.

* hex-mm: hmx optimize scatter/transpose and use HMX intrinsics

* hex-vmem: drop vmem limit a touch under 3GB on v73

* hexagon: add fwd declaration of htp_context

* hex-hmx: replace hmx-worker with hmx-queue that mimics dma-queue interface

  Simplifies the overall implementation and reduces thread wakeup roundtrips.

* hex-mm: add debug log to hmx work func called from hmx-queue

* Update hmx-queue.h

  Co-authored-by: Max Krasnyansky <max.krasnyansky@gmail.com>

---------

Co-authored-by: Kim-Chyan Gan <kgan@qti.qualcomm.com>
Co-authored-by: Max Krasnyansky <maxk@qti.qualcomm.com>
Co-authored-by: Max Krasnyansky <max.krasnyansky@gmail.com>
cmake	ggml: backend-agnostic tensor parallelism (experimental) (#19378)	2026-04-09 16:42:19 +02:00
include	ggml : remove ggml-ext.h (#21869)	2026-04-14 17:32:58 +03:00
src	hexagon: optimization for HMX mat_mul (#21554)	2026-04-14 14:09:03 -07:00
.gitignore	vulkan : cmake integration (#8119)	2024-07-13 18:12:39 +02:00
CMakeLists.txt	cmake: fix CMP0194 warning on Windows with MSVC (#21630)	2026-04-14 13:47:56 +03:00