mirror of
https://github.com/ggerganov/llama.cpp
synced 2026-05-01 03:42:01 +02:00
When testing claude code against llama.cpp, I noticed that only n_past 18577 was used even when context was 60k or more. The log in llama-server says: ``` slot update_slots: id 3 | task 10342 | old: ... ; cch= | defa0;You are slot update_slots: id 3 | task 10342 | new: ... ; cch= | 1c8b4; ``` I observed that the cch value changed every time. Reading about that, the x-anthropic-billing-header system message seems to be specially handled inside of the anthropic api. I could remove it, but there is a meaningful string sometimes included at the end. So instead, I just replace the changing cch checksum with fffff. I'm treating this as an anthropic message body API detail - I think this is the right way to do this, but by all means please correct me! It's always 5 hexadecimal characters, but I've written the replacement defensively in case they change the protocol. |
||
|---|---|---|
| .. | ||
| batched-bench | ||
| cli | ||
| completion | ||
| cvector-generator | ||
| export-lora | ||
| fit-params | ||
| gguf-split | ||
| imatrix | ||
| llama-bench | ||
| mtmd | ||
| parser | ||
| perplexity | ||
| quantize | ||
| results | ||
| rpc | ||
| server | ||
| tokenize | ||
| tts | ||
| CMakeLists.txt | ||