llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-04-25 21:14:49 +02:00

Sascha Rogmann 455d8e4be8 server : speculative checkpointing (#19493)	2026-04-19 10:24:06 +03:00

* server : speculative decoding using checkpoints
* server : fix draft check with checkpoints
* server : rename spec vars
* server : log levels
* server : refactored spec logic to speculative.cpp
* server : renamed spec checkpoints option
* server : fix spec checkpoints, logging
* speculative : checkpoints with draft model, logging
* server : n_tokens_cur and create_checkpoint in draft
* server : fix server_speculative_callback (slot.id)
* spec : fix ngram-map/begin idx_last_check
* spec : init ckpt (begin() wasn't called)
* chore: update webui build output
* server : restore sampler in spec checkpoint and clear mem
* cont : avoid --spec-use-checkpoints argument
* cont : remove server_prompt_checkpoint_with_size
* spec : rename (leave_draft_state)
* cont : clean-up
* cont : do not ignore partial drafts even if the are short
* cont : spec callback owned by session
* cont : simplify
* cont : avoid empty speculative session
* cont : simplify
* cont : simplify
* cont : enable mtmd speculative decoding
* cont : keep the spec sampler alive
* cont : simplify
* cont : fix nullptr deref + draft checkpoints
* cont : remove common_speculative_accept_response
* cont : remove callback
* cont : simplify
* cont : minor
* cont : simplify
* cont : fix accepted number

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

batched-bench	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
cli	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
completion	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
cvector-generator	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
export-lora	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
fit-params	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
gguf-split	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
imatrix	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
llama-bench	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
mtmd	ci : add android arm64 build and release (#21647)	2026-04-17 11:32:24 +02:00
parser	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
perplexity	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
quantize	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
results	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
rpc	rpc : add native RDMA transport for RPC backend (RoCEv2) (#20590)	2026-04-15 16:44:02 +03:00
server	server : speculative checkpointing (#19493)	2026-04-19 10:24:06 +03:00
tokenize	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
tts	libs : rename libcommon -> libllama-common (#21936)	2026-04-17 11:11:46 +03:00
CMakeLists.txt	llama: end-to-end tests (#19802)	2026-03-08 12:30:21 +01:00