| File | Last commit | Date |
|---|---|---|
| CMakeLists.txt | Enable and clean up compiler warnings in src (#824) | 2025-10-11 16:01:13 +03:00 |
| llama-arch.cpp | Add support for SmolLM3 (#934) | 2025-11-10 15:40:12 +02:00 |
| llama-arch.h | Add support for SmolLM3 (#934) | 2025-11-10 15:40:12 +02:00 |
| llama-build-context.cpp | Minor: remove unnecesssary calls to build_inp_out_ids (#935) | 2025-11-10 17:38:46 +02:00 |
| llama-build-context.h | Add support for SmolLM3 (#934) | 2025-11-10 15:40:12 +02:00 |
| llama-context.h | Support --device and --device-draft parameter (#866) | 2025-10-27 18:13:28 +02:00 |
| llama-cparams.h | CUDA: set compute parameters via command line arguments (#910) | 2025-11-07 07:11:23 +02:00 |
| llama-grammar.cpp | Tool calls support from mainline (#723) | 2025-09-01 08:38:49 +03:00 |
| llama-grammar.h | Tool calls support from mainline (#723) | 2025-09-01 08:38:49 +03:00 |
| llama-hparams.cpp | Add support for SmolLM3 (#934) | 2025-11-10 15:40:12 +02:00 |
| llama-hparams.h | Port of Qwen3-VL support from mainline (#883) | 2025-11-04 19:20:54 +02:00 |
| llama-impl.h | Fix warnings about LLAMA_DEBUG being redefined | 2025-10-27 18:41:03 +02:00 |
| llama-load-tensors.cpp | Add support for SmolLM3 (#934) | 2025-11-10 15:40:12 +02:00 |
| llama-mmap.cpp | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-mmap.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-model-loader.cpp | CUDA: set compute parameters via command line arguments (#910) | 2025-11-07 07:11:23 +02:00 |
| llama-model-loader.h | Merge Q, K, V (#878) | 2025-10-30 10:49:48 +02:00 |
| llama-model.cpp | Add support for SmolLM3 (#934) | 2025-11-10 15:40:12 +02:00 |
| llama-model.h | model : Port Minimax M2 from mainline (#907) | 2025-11-06 18:09:24 +02:00 |
| llama-quantize.cpp | Allow quantization of ffn_gate_inp (#896) | 2025-11-05 10:44:32 +02:00 |
| llama-sampling.cpp | Enable and clean up compiler warnings in src (#824) | 2025-10-11 16:01:13 +03:00 |
| llama-sampling.h | add dry sampler (#513) | 2025-06-19 10:24:53 +03:00 |
| llama-vocab.cpp | model : Port Minimax M2 from mainline (#907) | 2025-11-06 18:09:24 +02:00 |
| llama-vocab.h | model : Port Minimax M2 from mainline (#907) | 2025-11-06 18:09:24 +02:00 |
| llama.cpp | Set mla=3 by default (#943) | 2025-11-12 11:00:58 +02:00 |
| unicode-data.cpp | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode-data.h | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode.cpp | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| unicode.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |