ik_llama.cpp/src

Latest commit: 22540cee60 by Kawrakow, 2025-10-20 10:09:39 +03:00

Do not allocate KV cache for unused layers (#843)

* Do not allocate KV cache for unused layers
* Do not apply experts weight scale if it is 1

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
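The two changes named in this commit can be sketched as follows. This is an illustrative, self-contained sketch of the underlying idea only, not the actual ik_llama.cpp implementation; all identifiers (`kv_layer`, `alloc_kv_cache`, `apply_experts_weight_scale`, `layer_used`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1) Allocate K/V cache buffers only for layers the computation graph
//    actually uses; unused layers keep empty buffers instead of consuming
//    cache memory. (Hypothetical sketch, not the real ik_llama.cpp API.)
struct kv_layer {
    std::vector<float> k, v; // stay empty when the layer is unused
};

std::vector<kv_layer> alloc_kv_cache(const std::vector<bool> & layer_used,
                                     size_t n_ctx, size_t n_embd_kv) {
    std::vector<kv_layer> cache(layer_used.size());
    for (size_t il = 0; il < layer_used.size(); ++il) {
        if (!layer_used[il]) {
            continue; // unused layer: no K/V allocation at all
        }
        cache[il].k.resize(n_ctx * n_embd_kv);
        cache[il].v.resize(n_ctx * n_embd_kv);
    }
    return cache;
}

// 2) Skip the experts weight scaling pass entirely when the scale is 1,
//    since multiplying every weight by 1 is a no-op.
void apply_experts_weight_scale(std::vector<float> & weights, float scale) {
    if (scale == 1.0f) {
        return; // nothing to do
    }
    for (float & w : weights) {
        w *= scale;
    }
}
```

Both are straightforward early-outs: the first avoids allocating cache that would never be read or written, the second avoids a redundant per-weight multiply.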
File                    Last commit                                                 Date
CMakeLists.txt          Enable and clean up compiler warnings in src (#824)         2025-10-11 16:01:13 +03:00
llama-arch.cpp          Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-arch.h            Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-build-context.cpp Do not allocate KV cache for unused layers (#843)           2025-10-20 10:09:39 +03:00
llama-build-context.h   Grouped expert routing (CPU only) (#836)                    2025-10-16 14:57:02 +03:00
llama-context.h         Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-cparams.h         Grouped expert routing (CPU only) (#836)                    2025-10-16 14:57:02 +03:00
llama-grammar.cpp       Tool calls support from mainline (#723)                     2025-09-01 08:38:49 +03:00
llama-grammar.h         Tool calls support from mainline (#723)                     2025-09-01 08:38:49 +03:00
llama-hparams.cpp       Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-hparams.h         Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-impl.h            Remove double definition of LLAMA_LOG_DEBUG                 2025-09-01 08:42:04 +03:00
llama-load-tensors.cpp  Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-mmap.cpp          Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00
llama-mmap.h            Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00
llama-model-loader.cpp  Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-model-loader.h    Refactor file llama.cpp (#823)                              2025-10-11 11:35:20 +03:00
llama-model.cpp         Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-model.h           Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-quantize.cpp      Fix PATH_MAX not defined on Windows (#828)                  2025-10-13 09:25:57 +03:00
llama-sampling.cpp      Enable and clean up compiler warnings in src (#824)         2025-10-11 16:01:13 +03:00
llama-sampling.h        add dry sampler (#513)                                      2025-06-19 10:24:53 +03:00
llama-vocab.cpp         Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833)      2025-10-15 14:20:40 +03:00
llama-vocab.h           model : add grok-2 support (#782)                           2025-09-23 16:31:01 +02:00
llama.cpp               Do not allocate KV cache for unused layers (#843)           2025-10-20 10:09:39 +03:00
unicode-data.cpp        Merge mainline llama.cpp (#3)                               2024-07-27 07:55:01 +02:00
unicode-data.h          Merge mainline llama.cpp (#3)                               2024-07-27 07:55:01 +02:00
unicode.cpp             Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00
unicode.h               Enable CUDA graphs for MoE models + GPT-OSS support (#689)  2025-08-15 09:18:07 +03:00