Mirror of https://github.com/ggerganov/llama.cpp, synced 2026-03-02 13:19:27 +01:00
* hexagon: refactor set/get/sum-rows ops to use local context
* hexagon: refactor ROPE and Softmax ops to use local context. Improves performance a bit by precomputing values and saving them in the context.
* hexagon: refactor activation ops to use local context struct
* hexagon: refactor unary ops to use local context struct and DMA/VTCM
* hexagon: use aligned hvx_scale function
* hexagon: remove unused fields from op_context
* hexagon: rewrite ROPE to use DMA and VTCM scratchpad
* hex-rope: keep N rows in scratchpad (instead of just two)
* hex-rope: introduce rowidx cache
* hex-rope: remove unused fields
* hex-rope: rewrite DMA prefetch logic to allow multi-row fetch/compute; also removes the need for fastdiv
* hex-rope: minor formatting
* hex-rope: use indices and unroll the loops
* hex-rope: more updates to clean up rope-block handling
* hexagon: clean up supported type/dims checks
* hexagon: all reduce funcs are replicated across lanes, so there is no need to explicitly replicate the first value
* snapdragon: update adb and windows scripts to use ubatch-size 256. Updated op support handles larger ubatches.