ik_llama.cpp/common
firecoperana c03ee1a4d2 server: improve speed of speculative decoding (#1119)
* server: improve speed of speculative decoding

change logs

rpc: add recompute

spec dec fix

* Fix n_batch_size not set to context size for draft model

---------

Co-authored-by: firecoperana <firecoperana>
2026-01-10 08:01:22 +02:00
cmake Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
base64.hpp llava : expose as a shared library for downstream projects (#3613) 2023-11-07 00:36:23 +03:00
build-info.cpp.in build : link against build info instead of compiling against it (#3879) 2023-11-02 08:50:16 +02:00
chat-parser-xml-toolcall.cpp fix kimi-k2 tool call (#996) 2025-11-24 06:51:16 +01:00
chat-parser-xml-toolcall.h common: Generalized XML-style tool-call parsing with streaming support (#958) 2025-11-18 15:29:58 +01:00
chat-parser.cpp Add back the fix for Kimi-K2 tool-call parsing issues (#1070) 2025-12-16 14:44:47 +01:00
chat-parser.h Refactor chat and server file (#1062) 2025-12-15 08:27:20 +01:00
chat.cpp fix grammar for Kimi-K2 (#1103) 2026-01-05 07:57:25 +02:00
chat.h Refactor chat and server file (#1062) 2025-12-15 08:27:20 +01:00
CMakeLists.txt Refactor chat and server file (#1062) 2025-12-15 08:27:20 +01:00
common.cpp server: improve speed of speculative decoding (#1119) 2026-01-10 08:01:22 +02:00
common.h Turn on graph reuse by default (#1094) 2025-12-27 08:27:16 +01:00
console.cpp check C++ code with -Wmissing-declarations (#3184) 2023-09-15 15:38:27 -04:00
console.h gguf : new file format with flexible meta data (beta) (#2398) 2023-08-21 23:07:43 +03:00
grammar-parser.cpp Update grammar (#1023) 2025-11-30 18:45:38 +01:00
grammar-parser.h Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
json-partial.cpp common: Generalized XML-style tool-call parsing with streaming support (#958) 2025-11-18 15:29:58 +01:00
json-partial.h Move minja and nlohmann/json to vendor (#802) 2025-09-27 09:12:35 +02:00
json-schema-to-grammar.cpp Update grammar (#1023) 2025-11-30 18:45:38 +01:00
json-schema-to-grammar.h common: Generalized XML-style tool-call parsing with streaming support (#958) 2025-11-18 15:29:58 +01:00
llguidance.cpp Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
log.cpp Refactor chat and server file (#1062) 2025-12-15 08:27:20 +01:00
log.h Fix log issue for llama-cli (#1071) 2025-12-16 18:12:16 +01:00
ngram-cache.cpp Fixed lookup compilation issues on Windows (#6273) 2024-03-24 14:21:17 +01:00
ngram-cache.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
regex-partial.cpp Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
regex-partial.h Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
sampling.cpp Implement Adaptive-P Sampler (#1100) 2026-01-10 07:58:53 +02:00
sampling.h Implement Adaptive-P Sampler (#1100) 2026-01-10 07:58:53 +02:00
speculative.cpp Support --device and --device-draft parameter (#866) 2025-10-27 18:13:28 +02:00
speculative.h Port universal assisted decoding to llama-server (#699) 2025-08-18 09:22:23 +03:00
train.cpp train : change default FA argument (#7528) 2024-05-25 15:22:35 +03:00
train.h sync : ggml (backend v2) (#3912) 2023-11-13 14:16:23 +02:00