Mirror of https://github.com/ggerganov/llama.cpp, synced 2026-03-02 13:19:27 +01:00
* hexagon: refactor set/get/sum-rows ops to use local context
* hexagon: refactor ROPE and Softmax ops to use local context. Improves performance a bit by precomputing values and saving them in the context.
* hexagon: refactor activation ops to use local context struct
* hexagon: refactor unary ops to use local context struct and DMA/VTCM
* hexagon: use aligned hvx_scale function
* hexagon: remove unused fields from op_context
* hexagon: rewrite ROPE to use DMA and VTCM scratchpad
* hex-rope: keep N rows in scratchpad (instead of just two)
* hex-rope: introduce rowidx cache
* hex-rope: remove unused fields
* hex-rope: rewrite DMA prefetch logic to allow multi-row fetch/compute; also removes the need for fastdiv
* hex-rope: minor formatting
* hex-rope: use indices and unroll the loops
* hex-rope: more updates to clean up rope-block handling
* hexagon: clean up supported type/dims checks
* hexagon: all reduce funcs are replicated across lanes, so there is no need to explicitly replicate the first value
* snapdragon: update adb and windows scripts to use ubatch-size 256. Updated op support handles larger ubatches.