ik_llama.cpp/common

Latest commit: bb358223cd by firecoperana
server: cache prompt to host memory (#954)
* server : host-memory prompt caching

  - Change the similarity calculation and the conditions for saving prompts
  - Remove an unneeded token limit
  - Rename a variable for clarity
  - Separate the prompt save and load logic
  - Change the default values
  - Update log messages
  - Remove the prompt-truncation logic

* add description

* bug fixes

* remove token limit in init

---------

Co-authored-by: firecoperana <firecoperana>
Committed: 2025-11-14 18:40:13 +02:00
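The latest commit mentions a "similarity calculation" and "prompt save conditions" for host-memory prompt caching, without showing the mechanism. A common approach is to keep tokenized prompts in host memory and reuse the cached entry whose shared token prefix with the incoming prompt exceeds a threshold. The sketch below illustrates that idea only; `prompt_cache_entry`, `prefix_similarity`, and `best_cache_hit` are hypothetical names, not ik_llama.cpp's actual API.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical token and cache-entry types (illustrative, not from ik_llama.cpp).
using llama_token = int32_t;

struct prompt_cache_entry {
    std::vector<llama_token> tokens;  // tokenized prompt kept in host memory
};

// Fraction of the new prompt covered by the shared leading tokens of a cached prompt.
static float prefix_similarity(const std::vector<llama_token> & cached,
                               const std::vector<llama_token> & prompt) {
    size_t n = 0;
    const size_t lim = std::min(cached.size(), prompt.size());
    while (n < lim && cached[n] == prompt[n]) {
        ++n;
    }
    return prompt.empty() ? 0.0f : float(n) / float(prompt.size());
}

// Return the index of the cached entry whose prefix overlap with `prompt`
// exceeds `threshold`, or -1 if no entry qualifies (the "save/load condition").
static int best_cache_hit(const std::vector<prompt_cache_entry> & cache,
                          const std::vector<llama_token> & prompt,
                          float threshold) {
    int   best     = -1;
    float best_sim = threshold;
    for (size_t i = 0; i < cache.size(); ++i) {
        const float sim = prefix_similarity(cache[i].tokens, prompt);
        if (sim > best_sim) {
            best_sim = sim;
            best     = int(i);
        }
    }
    return best;
}
```

A server would load the matched entry's saved KV state and only evaluate the non-matching suffix; a miss (return value -1) means the prompt is evaluated from scratch and may be saved as a new entry.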
File                          Last commit                                                                Date
cmake                         Merge mainline llama.cpp (#3)                                              2024-07-27 07:55:01 +02:00
base64.hpp                    llava : expose as a shared library for downstream projects (#3613)         2023-11-07 00:36:23 +03:00
build-info.cpp.in             build : link against build info instead of compiling against it (#3879)   2023-11-02 08:50:16 +02:00
chat-parser.cpp               Add --webui arg to launch llama.cpp new webui (#786)                       2025-10-27 14:22:02 +02:00
chat-parser.h                 Move minja and nlohmann/json to vendor (#802)                              2025-09-27 09:12:35 +02:00
chat.cpp                      Add --webui arg to launch llama.cpp new webui (#786)                       2025-10-27 14:22:02 +02:00
chat.h                        Port mdmd from mainline + Qwen2/2.5-VL support (#798)                      2025-09-27 08:45:29 +02:00
CMakeLists.txt                Add vision support in llama-server (#901)                                  2025-11-05 10:43:46 +02:00
common.cpp                    server: cache prompt to host memory (#954)                                 2025-11-14 18:40:13 +02:00
common.h                      server: cache prompt to host memory (#954)                                 2025-11-14 18:40:13 +02:00
console.cpp                   check C++ code with -Wmissing-declarations (#3184)                         2023-09-15 15:38:27 -04:00
console.h                     gguf : new file format with flexible meta data (beta) (#2398)              2023-08-21 23:07:43 +03:00
grammar-parser.cpp            Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
grammar-parser.h              Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
json-partial.cpp              Move minja and nlohmann/json to vendor (#802)                              2025-09-27 09:12:35 +02:00
json-partial.h                Move minja and nlohmann/json to vendor (#802)                              2025-09-27 09:12:35 +02:00
json-schema-to-grammar.cpp    Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
json-schema-to-grammar.h      Move minja and nlohmann/json to vendor (#802)                              2025-09-27 09:12:35 +02:00
llguidance.cpp                Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
log.h                         Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
ngram-cache.cpp               Fixed lookup compilation issues on Windows (#6273)                         2024-03-24 14:21:17 +01:00
ngram-cache.h                 Merge mainline llama.cpp (#3)                                              2024-07-27 07:55:01 +02:00
regex-partial.cpp             Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
regex-partial.h               Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
sampling.cpp                  Move minja and nlohmann/json to vendor (#802)                              2025-09-27 09:12:35 +02:00
sampling.h                    Tool calls support from mainline (#723)                                    2025-09-01 08:38:49 +03:00
speculative.cpp               Support --device and --device-draft parameter (#866)                      2025-10-27 18:13:28 +02:00
speculative.h                 Port universal assisted decoding to llama-server (#699)                    2025-08-18 09:22:23 +03:00
train.cpp                     train : change default FA argument (#7528)                                 2024-05-25 15:22:35 +03:00
train.h                       sync : ggml (backend v2) (#3912)                                           2023-11-13 14:16:23 +02:00