Mirror of https://github.com/ggerganov/llama.cpp (synced 2026-04-06 23:35:15 +02:00)
server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
| File |
|---|
| steps.py |