ik_llama.cpp/src
Kawrakow 8a8de91a42
Set mla=3 by default (#943)
so that more recent users who haven't followed the history of FlashMLA
evolution, and hence don't know about the MLA options, get the best setting
without having to add -mla 3 on the command line.

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-11-12 11:00:58 +02:00
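The commit above changes a default, so the flag becomes optional rather than required. A minimal sketch of the before/after invocations; the binary and model names here are illustrative, not taken from the repository:

```shell
# Before this commit: the best FlashMLA setting had to be requested explicitly.
./llama-server -m model.gguf -mla 3

# After this commit: mla=3 is the default, so the flag can be omitted.
./llama-server -m model.gguf
```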
CMakeLists.txt Enable and clean up compiler warnings in src (#824) 2025-10-11 16:01:13 +03:00
llama-arch.cpp Add support for SmolLM3 (#934) 2025-11-10 15:40:12 +02:00
llama-arch.h Add support for SmolLM3 (#934) 2025-11-10 15:40:12 +02:00
llama-build-context.cpp Minor: remove unnecesssary calls to build_inp_out_ids (#935) 2025-11-10 17:38:46 +02:00
llama-build-context.h Add support for SmolLM3 (#934) 2025-11-10 15:40:12 +02:00
llama-context.h Support --device and --device-draft parameter (#866) 2025-10-27 18:13:28 +02:00
llama-cparams.h CUDA: set compute parameters via command line arguments (#910) 2025-11-07 07:11:23 +02:00
llama-grammar.cpp Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
llama-grammar.h Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
llama-hparams.cpp Add support for SmolLM3 (#934) 2025-11-10 15:40:12 +02:00
llama-hparams.h Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-impl.h Fix warnings about LLAMA_DEBUG being redefined 2025-10-27 18:41:03 +02:00
llama-load-tensors.cpp Add support for SmolLM3 (#934) 2025-11-10 15:40:12 +02:00
llama-mmap.cpp Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00
llama-mmap.h Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00
llama-model-loader.cpp CUDA: set compute parameters via command line arguments (#910) 2025-11-07 07:11:23 +02:00
llama-model-loader.h Merge Q, K, V (#878) 2025-10-30 10:49:48 +02:00
llama-model.cpp Add support for SmolLM3 (#934) 2025-11-10 15:40:12 +02:00
llama-model.h model : Port Minimax M2 from mainline (#907) 2025-11-06 18:09:24 +02:00
llama-quantize.cpp Allow quantization of ffn_gate_inp (#896) 2025-11-05 10:44:32 +02:00
llama-sampling.cpp Enable and clean up compiler warnings in src (#824) 2025-10-11 16:01:13 +03:00
llama-sampling.h add dry sampler (#513) 2025-06-19 10:24:53 +03:00
llama-vocab.cpp model : Port Minimax M2 from mainline (#907) 2025-11-06 18:09:24 +02:00
llama-vocab.h model : Port Minimax M2 from mainline (#907) 2025-11-06 18:09:24 +02:00
llama.cpp Set mla=3 by default (#943) 2025-11-12 11:00:58 +02:00
unicode-data.cpp Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
unicode-data.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
unicode.cpp Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00
unicode.h Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00