ik_llama.cpp/ggml
Kawrakow 98b30e5e81
Faster adaptive_p sampling (#1165)
* A hopefully more efficient adaptive_p sampling

* Once at it, let's fix the formatting too

* More formatting

* Hopefully better

* This should be better

* Correctly accumulate adaptive_p sampling time

* AVX2
2026-01-19 16:03:09 +02:00
cmake           Merge mainline llama.cpp (#3)                          2024-07-27 07:55:01 +02:00
include         server: improve speed of speculative decoding (#1119)  2026-01-10 08:01:22 +02:00
src             Faster adaptive_p sampling (#1165)                     2026-01-19 16:03:09 +02:00
.gitignore      Merge mainline llama.cpp (#3)                          2024-07-27 07:55:01 +02:00
CMakeLists.txt  CUDA: compress-mode size (#1110)                       2026-01-07 18:33:17 +02:00