Mirror of https://github.com/ggerganov/llama.cpp (synced 2026-04-23 12:02:17 +02:00)
* Thread safety per request only
* Fix ROPE yarn case
* Fix sticky stateful config
* Use i4/i8 directly for symmetric quant
* Use weightless caching
* Add WeightlessCacheAttribute to reduce NPU memory usage
* Gelu tanh support (#125)
* Imrope support (#126)
* fix(openvino): explicit ov::Tensor frees in ggml_backend_openvino_free
* add GPU, NPU support in OV Dockerfile
* add build-openvino.yml CI
* add concurrency to ov-gpu CI runs; move OV CI to build-openvino.yml
* fix thread safety of shared runtime context
* rope type abstraction for frontend translations
* fix editorconfig

Co-authored-by: Mustafa Cavus <mustafa.cavus@intel.com>
Co-authored-by: Dan Hoffman <dhoff749@gmail.com>
Co-authored-by: Ravi Panchumarthy <ravi.panchumarthy@intel.com>
Directories:

- android/
- backend/
- development/
- multimodal/
- ops/

Files:

- android.md
- autoparser.md
- build-riscv64-spacemit.md
- build-s390x.md
- build.md
- docker.md
- function-calling.md
- install.md
- llguidance.md
- multimodal.md
- ops.md
- preset.md
- speculative.md