Mirror of https://github.com/ggerganov/llama.cpp, synced 2026-04-20 22:28:39 +02:00
* server: clear idle slots KV from VRAM (LLAMA_KV_KEEP_ONLY_ACTIVE)
* server: move idle slot KV clearing to slot release
  The save "cost" is now paid by the finishing request.
* server: add --kv-clear-idle flag, enable by default
* server: skip clearing last idle slot, clear on launch
* server: test --no-kv-clear-idle flag
* server: simplify on-release clearing loop
* server: remove on-release KV clearing, keep launch-only
* cont : clean-up
* tests: update log strings after --clear-idle rename
* tests: use debug tags instead of log message matching
* test: fix Windows CI by dropping temp log file unlink

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

| Name |
|---|
| test_basic.py |
| test_chat_completion.py |
| test_compat_anthropic.py |
| test_compat_oai_responses.py |
| test_completion.py |
| test_ctx_shift.py |
| test_embedding.py |
| test_infill.py |
| test_kv_keep_only_active.py |
| test_lora.py |
| test_proxy.py |
| test_rerank.py |
| test_router.py |
| test_security.py |
| test_sleep.py |
| test_slot_save.py |
| test_speculative.py |
| test_template.py |
| test_tokenize.py |
| test_tool_call.py |
| test_vision_api.py |