Mirror of https://github.com/ggerganov/ggml, synced 2026-05-02 19:51:44 +02:00
When `nth > HEADS` (e.g. 33 threads with 32 heads), threads with `ith >= HEADS` were returning early before reaching `ggml_barrier()`. The remaining threads would then block forever at the barrier, waiting for all `nth` threads to arrive. Fix: move the early-return guard to after the `ggml_barrier()` call in both `ggml_compute_forward_rwkv_wkv6_f32` and `ggml_compute_forward_gla_f32`.
Agent-Logs-Url: https://github.com/ggml-org/ggml/sessions/9e1aafc5-b7df-488d-9d26-958ef68f78ef
Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>
ggml
Tensor library for machine learning
Note that this project is under active development.
Some of the development is currently happening in the llama.cpp and whisper.cpp repos.
Features
- Low-level cross-platform implementation
- Integer quantization support
- Broad hardware support
- Automatic differentiation
- ADAM and L-BFGS optimizers
- No third-party dependencies
- Zero memory allocations during runtime
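As a taste of the low-level API, defining f(x) = a*x² + b looks roughly like the sketch below. It is based on the public `ggml.h` header and requires linking against the ggml library to build; exact signatures (e.g. `ggml_set_param`) may differ between versions:

```c
#include "ggml.h"

int main(void) {
    // preallocate a fixed buffer up front: ggml performs
    // no memory allocations during runtime
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
    };
    struct ggml_context * ctx = ggml_init(params);

    // define the computation graph for f(x) = a*x^2 + b
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    ggml_set_param(ctx, x); // mark x as an input variable

    struct ggml_tensor * a  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * x2 = ggml_mul(ctx, x, x);
    struct ggml_tensor * f  = ggml_add(ctx, ggml_mul(ctx, a, x2), b);

    // ... set tensor values, build and compute the graph, read back f ...

    ggml_free(ctx);
    return 0;
}
```

Because x is marked as a parameter, the automatic differentiation machinery can also build a backward graph for f with respect to x.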
Build
git clone https://github.com/ggml-org/ggml
cd ggml
# install python dependencies in a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
GPT inference (example)
# run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
For more information, check out the corresponding programs in the examples folder.