llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-03-03 13:50:01 +01:00

Zheyuan Chen bd90fc74c3 ggml-webgpu: improve flastAttention performance by software pipelining (#19151)	2026-01-29 14:05:30 -08:00

* webgpu : pipeline flash_attn Q/K loads in WGSL
* ggml-webgpu: unroll QK accumlation inner loop ggml-webgpu: vectorization
* ggml-webgpu: unrolling
* ggml-webgpu: remove redundant unrolling
* ggml-webgpu: restore the config
* ggml-webgpu: remove redundant comments
* ggml-webgpu: formatting
* ggml-webgpu: formatting and remove vectorization
* ggml-webgpu: remove unnecessary constants
* ggml-webgpu: change QKV buffer to read_write to pass validation
* ggml-webgpu: add explanation for the additional bracket around Q K accumulate
* Indentation and for -> if for tail
* Kick off CI on wgsl only commits

Co-authored-by: Reese Levine <reeselevine1@gmail.com>

bench.yml.disabled	llama : move end-user examples to tools directory (#13249)	2025-05-02 20:27:13 +02:00
build-cache.yml	ci : update GitHub Actions versions [no ci] (#18935)	2026-01-22 00:57:18 +01:00
build-cmake-pkg.yml	ci : update GitHub Actions versions [no ci] (#18935)	2026-01-22 00:57:18 +01:00
build-linux-cross.yml	ci : update GitHub Actions versions [no ci] (#18935)	2026-01-22 00:57:18 +01:00
build.yml	ggml-webgpu: improve flastAttention performance by software pipelining (#19151)	2026-01-29 14:05:30 -08:00
check-vendor.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
close-issue.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
copilot-setup-steps.yml	ci : update GitHub Actions versions [no ci] (#18935)	2026-01-22 00:57:18 +01:00
docker.yml	ci : update GitHub Actions versions [no ci] (#18935)	2026-01-22 00:57:18 +01:00
editorconfig.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
gguf-publish.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
labeler.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
pre-tokenizer-hashes.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
python-check-requirements.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
python-lint.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
python-type-check.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
release.yml	release: update github api (#19022)	2026-01-22 21:38:02 +08:00
server-webui.yml	ci : update GitHub Actions versions [no ci] (#18935)	2026-01-22 00:57:18 +01:00
server.yml	graph : utilize `ggml_build_forward_select()` to avoid reallocations (#18898)	2026-01-23 18:22:34 +02:00
update-ops-docs.yml	ci : use new 1vCPU runner for lightweight jobs (#19107)	2026-01-26 15:22:49 +01:00
winget.yml	ci : find latest release with asset for winget (#19161)	2026-01-28 22:05:39 +01:00