# llama.cpp/examples/training

This directory contains examples related to language model training using llama.cpp/GGML. So far, finetuning is technically functional (for FP32 models and limited hardware setups), but the code is very much a work in progress. Finetuning of Stories 260K and LLaMA 3.2 1B seems to work with 24 GB of memory. For CPU training, compile llama.cpp without any additional backends such as CUDA. For CUDA training, use the maximum number of GPU layers.
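
A minimal sketch of the two build configurations mentioned above (pick one of the two), assuming a standard CMake build into `./build`; the CUDA variant additionally assumes an installed CUDA toolkit:

```sh
# CPU-only build: no extra backends, so training runs on the CPU
cmake -B build
cmake --build build --config Release -j

# CUDA build: pair this with -ngl 999 below to offload all layers during training
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
```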

Proof of concept:

```sh
# finetune the FP32 model on the wikitext-2 test set, offloading all layers to the GPU
export model_name=llama_3.2-1b && export quantization=f32
./build/bin/llama-finetune --file wikitext-2-raw/wiki.test.raw -ngl 999 --model models/${model_name}-${quantization}.gguf -c 512 -b 512 -ub 512

# measure the perplexity of the resulting finetuned model on the same data
./build/bin/llama-perplexity --file wikitext-2-raw/wiki.test.raw -ngl 999 --model finetuned-model.gguf
```
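
The commands above assume that an FP32 GGUF of the model and the raw wikitext-2 data already exist at the given paths. A rough sketch of how they might be obtained, assuming the `convert_hf_to_gguf.py` and `scripts/get-wikitext-2.sh` helpers from the llama.cpp repository and a local Hugging Face checkout of the model (the checkout path below is a placeholder):

```sh
# download and unpack the raw wikitext-2 dataset (expected under ./wikitext-2-raw)
./scripts/get-wikitext-2.sh

# convert a local Hugging Face checkout of the model to an FP32 GGUF
python convert_hf_to_gguf.py path/to/llama-3.2-1b --outtype f32 --outfile models/llama_3.2-1b-f32.gguf
```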

The perplexity value of the finetuned model should be lower after training on the test set for 2 epochs.
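
One way to check this, reusing only the commands already shown above, is to measure the perplexity of the base model as well and compare the two values:

```sh
# perplexity of the base (untrained) model
./build/bin/llama-perplexity --file wikitext-2-raw/wiki.test.raw -ngl 999 --model models/${model_name}-${quantization}.gguf

# perplexity of the finetuned model; this value should be the lower one
./build/bin/llama-perplexity --file wikitext-2-raw/wiki.test.raw -ngl 999 --model finetuned-model.gguf
```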