Mirror of https://github.com/ggerganov/llama.cpp (synced 2026-04-06 23:35:15 +02:00)
server : refactor slot input data, move tokenizer to HTTP thread

* move prompt_tokens.empty() check
* fix incorrect if branch
* fix infinite generation loop
* bring back infill validation
* add infill test
* try fixing format_infill
* fix test
* remove redundant code
* rename completion to inference
* update docs
* use llama_tokens everywhere
| File |
|---|
| steps.py |