mirror of https://github.com/ggerganov/llama.cpp, synced 2026-03-15 03:30:45 +01:00
Currently we always log 0 drafted tokens, because `slot.drafted` is cleared before the log statement runs.

To reproduce, run llama-server with Devstral-2 as the main model, Devstral-2-Small as the draft model (`-md`), and verbose logging:

```
% ./build/bin/llama-server -v \
    -m ~/llms/Devstral-2-123B-Instruct-2512-UD-Q6_K_XL-00001-of-00003.gguf \
    -md ~/llms/Devstral-Small-2-24B-Instruct-2512-UD-Q2_K_XL.gguf \
    -c 8192 2> /tmp/llama.cpp.debug
```

Check the log: the draft-token denominator is always 0.

```
slot update_slots: id 3 | task 0 | accepted 11/0 draft tokens, new n_tokens = 741
slot update_slots: id 3 | task 0 | accepted 4/0 draft tokens, new n_tokens = 746
slot update_slots: id 3 | task 0 | accepted 16/0 draft tokens, new n_tokens = 763
slot update_slots: id 3 | task 0 | accepted 11/0 draft tokens, new n_tokens = 775
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 778
slot update_slots: id 3 | task 0 | accepted 4/0 draft tokens, new n_tokens = 783
slot update_slots: id 3 | task 0 | accepted 8/0 draft tokens, new n_tokens = 792
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 795
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 797
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 799
slot update_slots: id 3 | task 0 | accepted 0/0 draft tokens, new n_tokens = 800
slot update_slots: id 3 | task 0 | accepted 2/0 draft tokens, new n_tokens = 803
slot update_slots: id 3 | task 0 | accepted 1/0 draft tokens, new n_tokens = 805
slot update_slots: id 3 | task 0 | accepted 6/0 draft tokens, new n_tokens = 812
slot update_slots: id 3 | task 0 | accepted 3/0 draft tokens, new n_tokens = 816
```

After the fix, the per-round logging is correct:

```
slot update_slots: id 3 | task 0 | accepted 7/8 draft tokens, new n_tokens = 654
slot update_slots: id 3 | task 0 | accepted 1/2 draft tokens, new n_tokens = 656
slot update_slots: id 3 | task 0 | accepted 2/16 draft tokens, new n_tokens = 659
slot update_slots: id 3 | task 0 | accepted 1/16 draft tokens, new n_tokens = 661
slot update_slots: id 3 | task 0 | accepted 2/16 draft tokens, new n_tokens = 664
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 681
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 698
slot update_slots: id 3 | task 0 | accepted 3/4 draft tokens, new n_tokens = 702
slot update_slots: id 3 | task 0 | accepted 5/12 draft tokens, new n_tokens = 708
slot update_slots: id 3 | task 0 | accepted 16/16 draft tokens, new n_tokens = 725
slot update_slots: id 3 | task 0 | accepted 1/1 draft tokens, new n_tokens = 727
slot update_slots: id 3 | task 0 | accepted 8/16 draft tokens, new n_tokens = 736
```