| File | Last commit | Date |
| --- | --- | --- |
| CMakeLists.txt | Enable and clean up compiler warnings in src (#824) | 2025-10-11 16:01:13 +03:00 |
| llama-arch.cpp | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-arch.h | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-build-context.cpp | Slightly faster TG for split mode "graph" (#1057) | 2025-12-12 07:54:37 +01:00 |
| llama-build-context.h | Hadamard transforms for K-cache - CPU only (#1033) | 2025-12-04 06:51:11 +01:00 |
| llama-context.h | POC: CUDA tensor parallel (MoE models) (#1022) | 2025-12-01 19:25:40 +01:00 |
| llama-cparams.h | Hadamard transforms for K-cache - CPU only (#1033) | 2025-12-04 06:51:11 +01:00 |
| llama-grammar.cpp | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-grammar.h | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-hparams.cpp | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-hparams.h | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-impl.h | POC: CUDA tensor parallel (MoE models) (#1022) | 2025-12-01 19:25:40 +01:00 |
| llama-load-tensors.cpp | Be able to set a max. number of GPUs to be used in split mode graph (#1051) | 2025-12-11 07:22:53 +01:00 |
| llama-mmap.cpp | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-mmap.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-model-loader.cpp | CUDA: set compute parameters via command line arguments (#910) | 2025-11-07 07:11:23 +02:00 |
| llama-model-loader.h | Merge Q, K, V (#878) | 2025-10-30 10:49:48 +02:00 |
| llama-model.cpp | Adding ministral3: this seems to work (#1030) | 2025-12-03 11:01:21 +01:00 |
| llama-model.h | Be able to set a max. number of GPUs to be used in split mode graph (#1051) | 2025-12-11 07:22:53 +01:00 |
| llama-quantize.cpp | Fix requatizing from row-interleaved quants (#992) | 2025-11-20 11:50:09 +01:00 |
| llama-sampling.cpp | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-sampling.h | add dry sampler (#513) | 2025-06-19 10:24:53 +03:00 |
| llama-vocab.cpp | Update mtmd to improve accuracy of M-RoPE (#993) | 2025-11-29 07:27:15 +01:00 |
| llama-vocab.h | Update mtmd to improve accuracy of M-RoPE (#993) | 2025-11-29 07:27:15 +01:00 |
| llama.cpp | Do not use split mode graph scheduling if there are tensor overrides (#1060) | 2025-12-12 14:48:38 +01:00 |
| unicode-data.cpp | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode-data.h | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode.cpp | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| unicode.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |