Mirror of https://github.com/ggerganov/llama.cpp (synced 2026-03-13 10:41:01 +01:00)
This commit adds two targets to the Makefile for quantizing Quantization Aware Trained (QAT) models to Q4_0 format. The motivation is that these targets set the token embedding and output tensor data types to Q8_0 instead of the default Q6_K. This is something we want to enforce for QAT Q4_0 models that are to be uploaded to ggml-org on Hugging Face, to guarantee the best quality.
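The equivalent direct invocation can be sketched as follows. This is an illustrative command, not the exact recipe of the new Makefile targets; the model file names are placeholders, but the `--token-embedding-type` and `--output-tensor-type` flags are the existing `llama-quantize` options for overriding the per-tensor types described above:

```sh
# Quantize a QAT model to Q4_0 while forcing the token embedding and
# output tensors to Q8_0 instead of the default Q6_K.
# (model file names are placeholders)
./build/bin/llama-quantize \
    --token-embedding-type q8_0 \
    --output-tensor-type q8_0 \
    model-f16.gguf model-qat-q4_0.gguf Q4_0
```

Without the two override flags, `llama-quantize` would pick Q6_K for these tensors when producing a Q4_0 model.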
Scripts in this directory:

- check-nmse.py
- create-collection-add-model.sh
- hf-add-model-to-collection.py
- hf-create-collection.py
- hf-create-model.py
- hf-upload-gguf-model.py
- inspect-converted-model.sh
- inspect-org-model.py
- perplexity-gen.sh
- perplexity-run-simple.sh
- perplexity-run.sh
- quantize.sh
- run-embedding-server.sh
- semantic_check.py