mirror of
https://github.com/ggerganov/llama.cpp
synced 2026-04-19 21:56:15 +02:00
* optimize hmx_mat_mul functions by calculating row and column tiles upfront
* refactor core_dot_chunk_fp16 to use size_t for tile counts and improve readability
* wip
* set scale outside of loop
* wip
* refactor core_mma_chunk_fp16 and mat_mul_qk_0_d16a32 to use size_t for tile counts
* wip
* wip
* refactor transfer_output_chunk_fp16_to_fp32 to use size_t for dimensions
* refactor core_dot_chunk_fp16 to use size_t for tile row stride calculation
* wip
* refactor hmx_mat_mul functions to use hvx_vec_splat_f16 for column scales initialization
* refactor hmx_mat_mul_permuted_w16a32_batched to streamline scale setting and locking
* refactor core_dot_chunk_fp16 to improve tile stride calculations for output
* refactor hmx_mat_mul functions to use Q6_V_vsplat_R for column scales initialization
* fix compiling error
* wip
* optimize row and column tile indexing in core_mma_chunk_fp16 function
* wip
* Revert "wip"
This reverts commit
|
||
|---|---|---|
| .. | ||
| cmake | ||
| include | ||
| src | ||
| .gitignore | ||
| CMakeLists.txt | ||