llama.cpp

mirror of https://github.com/ggerganov/llama.cpp synced 2026-03-10 00:59:32 +01:00


Olivier Chafik ab9a3240a9 JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00

* json: rename python schema converter to make import easier
* server: skip null json_schema / grammar fields
* json: deps management for primitive rules (+ allow null values)
* json: optimize repetitions for minItems/maxItems and regexps: `a{,3}` goes from `"a"? "a"? "a"?` (explosive combos) to `(a (a (a)?)?)?`
* grammars: add troubleshooting section to readme
* json: cap length of numbers to 15 digits before/after decimal point (avoids infinite gen, e.g. "one third" -> `0.333333333333...`)
* json: unify all repetition code (w/ or w/o sep)
* json: support string minLength/maxLength
* server+json: update server/README w/ result_format
* nits
* json: fix type error w/ python 3.8
* json: fix server/README (json_schema in /completion vs. result_format in /v1/chat/completions)
* json: simplify DOT `{"type": "string", "pattern": "^.$"}`
* json: remove recursion in opt_repetitions (avoids Python stack overflow)
* json: rm dead code
* json: rm useless assert & ggml.h import
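Why the nested form of `a{,3}` is cheaper: with `"a"? "a"? "a"?` the input `a` can be matched by any one of the three optionals, and `aa` by any two of them, so a grammar parser has to track several equivalent parses at once. The nested `(a (a (a)?)?)?` admits exactly one parse per length. The sketch below builds such a fragment for arbitrary min/max bounds and an optional separator, using a loop instead of recursion. It is a minimal illustration, not the converter's actual code: the function name `build_repetition` and its parameters are hypothetical, and it assumes GBNF-style syntax where items are space-separated and `?` / `*` apply to parenthesized groups.

```python
from typing import Optional


def build_repetition(item: str, min_items: int, max_items: Optional[int] = None,
                     sep: Optional[str] = None) -> str:
    """Hypothetical helper: a GBNF fragment matching `item` repeated
    min_items..max_items times (max_items=None means unbounded), with the
    optional `sep` rule between consecutive items."""
    if min_items < 0 or (max_items is not None and max_items < min_items):
        raise ValueError("require 0 <= min_items <= max_items")

    # Mandatory part: `item` min_items times, separator between each pair.
    joiner = f" {sep} " if sep else " "
    required = joiner.join([item] * min_items)

    # Any item that follows another item must be preceded by the separator.
    follow = f"{sep} {item}" if sep else item

    if max_items is None:
        # Unbounded tail: Kleene star over the (separator-prefixed) item.
        if min_items == 0 and sep:
            optional = f"({item} ({follow})*)?"
        else:
            optional = f"({follow})*"
    else:
        optional = ""
        n_opt = max_items - min_items
        if n_opt > 0:
            # The first optional item needs a separator only if a required one precedes it.
            first = item if min_items == 0 else follow
            # Build (x (x (x)?)?)? from the inside out with a loop, not recursion.
            inner = ""
            for _ in range(n_opt - 1):
                inner = f"({follow} {inner})?" if inner else f"({follow})?"
            optional = f"({first} {inner})?" if inner else f"({first})?"

    # Join the non-empty parts with a space.
    return " ".join(part for part in (required, optional) if part)


print(build_repetition('"a"', 0, 3))            # ("a" ("a" ("a")?)?)?
print(build_repetition("x", 2, 4, sep='","'))  # x "," x ("," x ("," x)?)?
```

Under the same assumptions, string minLength/maxLength map onto `min_items`/`max_items` with a character rule as `item`, and a bounded digit run such as `build_repetition("[0-9]", 1, 15)` is one way to express a 15-digit cap like the one described in the commit message.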
.gitignore	tests : gitignore ggml-common.h	2024-03-09 14:17:11 +02:00
CMakeLists.txt	Tests: Added integration tests for GBNF parser (#6472)	2024-04-06 10:31:33 -04:00
get-model.cpp	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
get-model.h	ci : add model tests + script wrapper (#4586)	2024-01-26 14:18:00 +02:00
run-json-schema-to-grammar.mjs	json-schema-to-grammar improvements (+ added to server) (#5978)	2024-03-21 11:50:43 +00:00
test-autorelease.cpp	ggml : add numa options (#5377)	2024-02-16 11:31:07 +02:00
test-backend-ops.cpp	metal : unify mul_mv_id kernels (#6556)	2024-04-12 18:13:20 +02:00
test-c.c	Nomic Vulkan backend (#4456)	2024-01-29 15:50:50 -05:00
test-chat-template.cpp	Add OpenChat, Alpaca, Vicuna chat templates (#6397)	2024-04-03 17:24:31 +02:00
test-double-float.cpp	ggml : move FP16 <-> FP32 code to ggml-impl.h (#3861)	2023-10-30 19:19:15 +02:00
test-grad0.cpp	cuda : improve cuda pool efficiency using virtual memory (#4606)	2023-12-24 14:34:22 +01:00
test-grammar-integration.cpp	grammars: 1.5x faster inference w/ complex grammars (vector reserves / reuses) (#6609)	2024-04-11 19:47:34 +01:00
test-grammar-parser.cpp	ggml, common, examples, tests : fixed type arguments in printf (#5528)	2024-02-18 18:20:12 +02:00
test-json-schema-to-grammar.cpp	JSON schema conversion: ⚡️ faster repetitions, min/maxLength for strings, cap number length (#6555)	2024-04-12 19:43:38 +01:00
test-llama-grammar.cpp	ggml, common, examples, tests : fixed type arguments in printf (#5528)	2024-02-18 18:20:12 +02:00
test-model-load-cancel.cpp	ggml : add numa options (#5377)	2024-02-16 11:31:07 +02:00
test-opt.cpp	code : normalize enum names (#5697)	2024-02-25 12:09:09 +02:00
test-quantize-fns.cpp	tests : include IQ2_XXS and IQ2_XS in test-quantize-fns (#6303)	2024-03-25 19:33:15 +02:00
test-quantize-perf.cpp	ggml : add mmla kernels for quantized GEMM (#4966)	2024-02-11 15:22:33 +02:00
test-rope.cpp	llama : custom attention mask + parallel decoding + no context swaps (#3228)	2023-09-28 19:04:36 +03:00
test-sampling.cpp	sampling: fix top_k <= 0 (#5388)	2024-02-08 09:46:30 +01:00
test-tokenizer-0-falcon.cpp	ggml : add numa options (#5377)	2024-02-16 11:31:07 +02:00
test-tokenizer-0-falcon.py	ci : add flake8 to github actions (python linting) (#4129)	2023-11-20 11:35:47 +01:00
test-tokenizer-0-llama.cpp	ggml : add numa options (#5377)	2024-02-16 11:31:07 +02:00
test-tokenizer-0-llama.py	ci : add flake8 to github actions (python linting) (#4129)	2023-11-20 11:35:47 +01:00
test-tokenizer-1-bpe.cpp	llama : refactor unicode stuff (#5992)	2024-03-11 17:47:47 +02:00
test-tokenizer-1-llama.cpp	llama : refactor unicode stuff (#5992)	2024-03-11 17:47:47 +02:00