Mirror of https://github.com/ggerganov/ggml, synced 2026-05-02 19:51:44 +02:00
When `nth > HEADS` (e.g. 33 threads with 32 heads), threads with `ith >= HEADS` were returning early before reaching `ggml_barrier()`. The remaining threads would then block forever at the barrier, waiting for all `nth` threads to arrive. Fix: move the early-return guard to after the `ggml_barrier()` call in both `ggml_compute_forward_rwkv_wkv6_f32` and `ggml_compute_forward_gla_f32`.
Agent-Logs-Url: https://github.com/ggml-org/ggml/sessions/9e1aafc5-b7df-488d-9d26-958ef68f78ef
Co-authored-by: ggerganov <1991296+ggerganov@users.noreply.github.com>
ggml
Tensor library for machine learning
Note that this project is under active development.
Some of the development is currently happening in the llama.cpp and whisper.cpp repos.
Features
- Low-level cross-platform implementation
- Integer quantization support
- Broad hardware support
- Automatic differentiation
- ADAM and L-BFGS optimizers
- No third-party dependencies
- Zero memory allocations during runtime
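As a taste of the low-level API, defining f(x) = a*x² + b looks roughly like the sketch below. It is based on the public `ggml.h` header and requires linking against the ggml library to build; exact signatures (e.g. `ggml_set_param`) may differ between versions:

```c
#include "ggml.h"

int main(void) {
    // preallocate a fixed buffer up front: ggml performs
    // no memory allocations during runtime
    struct ggml_init_params params = {
        .mem_size   = 16*1024*1024,
        .mem_buffer = NULL,
    };
    struct ggml_context * ctx = ggml_init(params);

    // define the computation graph for f(x) = a*x^2 + b
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    ggml_set_param(ctx, x); // mark x as an input variable

    struct ggml_tensor * a  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b  = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * x2 = ggml_mul(ctx, x, x);
    struct ggml_tensor * f  = ggml_add(ctx, ggml_mul(ctx, a, x2), b);

    // ... set tensor values, build and compute the graph, read back f ...

    ggml_free(ctx);
    return 0;
}
```

Because x is marked as a parameter, the automatic differentiation machinery can also build a backward graph for f with respect to x.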
Build
git clone https://github.com/ggml-org/ggml
cd ggml
# install python dependencies in a virtual environment
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# build the examples
mkdir build && cd build
cmake ..
cmake --build . --config Release -j 8
GPT inference (example)
# run the GPT-2 small 117M model
../examples/gpt-2/download-ggml-model.sh 117M
./bin/gpt-2-backend -m models/gpt-2-117M/ggml-model.bin -p "This is an example"
For more information, check out the corresponding programs in the examples folder.