mirror of
https://github.com/ggerganov/llama.cpp
synced 2026-04-01 21:05:43 +02:00
* tools/main: llama-cli: prevent spurious assistant token (#13402)

  During prompt ingestion, prompt tokens are accepted into the sampler history (for repetition penalties). The conversation-mode path then appended `common_sampler_last(smpl)` to `assistant_ss` before any new token was sampled. At that point, "last" was a prompt-side token (e.g., an input prefix), so the assistant chat message began with an extra piece.

  Fix: append to `assistant_ss` only for a newly sampled (non-EOG) token. This affects only chat message assembly (`assistant_ss` / `chat_msgs` / `common_chat_format_single`); terminal stdout is unchanged. Sampling order/logits are unchanged.

  Fixes #13402.

  Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>

* Update tools/main/main.cpp

  Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>

* tools/main: remove outdated comment

  Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>

---------

Signed-off-by: Vinkal Chudgar <vinkal.chudgar@gmail.com>
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
| Name |
|---|
| batched-bench |
| cvector-generator |
| export-lora |
| gguf-split |
| imatrix |
| llama-bench |
| main |
| mtmd |
| perplexity |
| quantize |
| rpc |
| run |
| server |
| tokenize |
| tts |
| CMakeLists.txt |
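The fix described in the commit message above can be sketched as follows. This is a minimal illustrative sketch, not the actual llama.cpp code: the `Token` struct and `build_assistant_message` helper are hypothetical names invented here to show the gating logic (append to the assistant buffer only for a newly sampled, non-EOG token, never for prompt-side tokens that were merely accepted into the sampler history).

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical token record for illustration only (not a llama.cpp type).
struct Token {
    std::string piece;  // detokenized text of the token
    bool sampled;       // true if produced by the sampler, false if from the prompt
    bool eog;           // true if this is an end-of-generation token
};

// Assemble the assistant chat message. The buggy path effectively appended
// the sampler's "last" token even during prompt ingestion, so a prompt-side
// piece (e.g., an input prefix) leaked into the message. The fix gates the
// append on the token being newly sampled and not EOG.
std::string build_assistant_message(const std::vector<Token>& history) {
    std::ostringstream assistant_ss;
    for (const Token& t : history) {
        if (t.sampled && !t.eog) {
            assistant_ss << t.piece;
        }
    }
    return assistant_ss.str();
}
```

As in the commit, only message assembly changes under this scheme; what is printed to the terminal and the sampling order/logits are untouched.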