llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-03-05 14:49:45 +01:00

History

R 3d26a09dc7 server : add thinking content blocks to Anthropic Messages API (#18551 ) * server : add thinking content blocks to Anthropic Messages API Add support for returning reasoning/thinking content in Anthropic API responses when using models with --reasoning-format deepseek and the thinking parameter enabled. - Non-streaming: adds thinking block before text in content array - Streaming: emits thinking_delta events with correct block indices - Partial streaming: tracks reasoning state across chunks via anthropic_has_reasoning member variable Tested with bartowski/DeepSeek-R1-Distill-Qwen-7B-GGUF model. * server : fix Anthropic API streaming for thinking content blocks Add signature field and fix duplicate content_block_start events in Anthropic Messages API streaming responses for reasoning models. * server: refactor Anthropic streaming state to avoid raw pointer Replace raw pointer to task_result_state with direct field copies: - Copy state fields in update() before processing chunk - Use local copies in to_json_anthropic() instead of dereferencing - Pre-compute state updates for next chunk in update() This makes the data flow clearer and avoids unsafe pointer patterns.		2026-01-06 16:17:13 +01:00
..
batched-bench	tool/ex/tests: consistently free ctx, then model (#18168 )	2025-12-22 11:00:37 +01:00
cli	gen-docs: automatically update markdown file (#18294 )	2025-12-22 19:30:19 +01:00
completion	common: fix return value check for setpriority (#18412 )	2025-12-29 11:07:49 +02:00
cvector-generator	common : refactor common_sampler + grammar logic changes (#17937 )	2025-12-14 10:11:13 +02:00
export-lora	cmake : Do not install tools on iOS targets (#15903 )	2025-09-16 09:54:44 +07:00
fit-params	llama_fit_params: return enum for fail vs. error (#18374 )	2025-12-27 09:59:19 +01:00
gguf-split	cli: new CLI experience (#17824 )	2025-12-10 15:28:59 +01:00
imatrix	common : refactor common_sampler + grammar logic changes (#17937 )	2025-12-14 10:11:13 +02:00
llama-bench	common: fix return value check for setpriority (#18412 )	2025-12-29 11:07:49 +02:00
mtmd	model : mtmd : make input norm optional in LFM2-VL (#18594 )	2026-01-04 18:50:02 +01:00
perplexity	common : refactor common_sampler + grammar logic changes (#17937 )	2025-12-14 10:11:13 +02:00
quantize	quantize: prevent input/output file collision (#18451 )	2025-12-31 23:29:03 +08:00
rpc	Install rpc-server when GGML_RPC is ON. (#17149 )	2025-11-11 10:53:59 +00:00
run	Manually link -lbsd to resolve flock symbol on AIX (#16610 )	2025-10-23 19:37:31 +08:00
server	server : add thinking content blocks to Anthropic Messages API (#18551 )	2026-01-06 16:17:13 +01:00
tokenize	cmake : Do not install tools on iOS targets (#15903 )	2025-09-16 09:54:44 +07:00
tts	common : refactor common_sampler + grammar logic changes (#17937 )	2025-12-14 10:11:13 +02:00
CMakeLists.txt	llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization (#16653 )	2025-12-15 09:24:59 +01:00