llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-03-06 07:09:21 +01:00

Jeff Bolz d0061be838 vulkan: split mul_mat into multiple dispatches to avoid overflow (#19509)		2026-02-18 10:47:10 +01:00

The batch dimensions can be greater than the max workgroup count limit, in which case we need to split into multiple dispatches and pass the base index through a push constant. Fall back for the less common p021 and nc variants.

* address feedback
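
The splitting described in this commit message can be illustrated with a minimal sketch. It is not the code from the commit: `split_dispatches` is a hypothetical helper, and the limit of 65535 used in the usage note is only an example value (it is the minimum that the Vulkan specification guarantees for each component of `maxComputeWorkGroupCount`). The sketch shows only the arithmetic: a workgroup count that exceeds the limit is cut into chunks, and each chunk records the base index that a real backend would hand to the shader, for example through a push constant.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper, not taken from ggml: split `total` workgroups along one
// dimension into chunks of at most `max_count` workgroups each.
// Each pair is (base index, workgroup count) for one dispatch.
std::vector<std::pair<uint32_t, uint32_t>> split_dispatches(uint32_t total, uint32_t max_count) {
    std::vector<std::pair<uint32_t, uint32_t>> chunks;
    if (max_count == 0) {
        return chunks; // no valid dispatch size, nothing to emit
    }
    for (uint32_t base = 0; base < total;) {
        // The last chunk may be smaller than max_count.
        const uint32_t count = std::min(max_count, total - base);
        chunks.emplace_back(base, count);
        base += count;
    }
    return chunks;
}
```

With a batch dimension of 100000 and a limit of 65535, this yields two dispatches: one of 65535 workgroups starting at base 0 and one of 34465 workgroups starting at base 65535. In the shader, the real batch index would then be the base index plus the workgroup ID within the dispatch.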
cmake
include	ggml : make `ggml_is_view` as API (#19539)	2026-02-16 17:43:34 +02:00
src	vulkan: split mul_mat into multiple dispatches to avoid overflow (#19509)	2026-02-18 10:47:10 +01:00
.gitignore
CMakeLists.txt	ggml : bump version to 0.9.7 (ggml/1425)	2026-02-15 22:24:29 +02:00