| File | Last commit | Date |
| --- | --- | --- |
| CMakeLists.txt | Enable and clean up compiler warnings in src (#824) | 2025-10-11 16:01:13 +03:00 |
| llama-arch.cpp | Mimo-V2-Flash support (#1096) | 2026-01-05 08:00:01 +02:00 |
| llama-arch.h | Mimo-V2-Flash support (#1096) | 2026-01-05 08:00:01 +02:00 |
| llama-build-context.cpp | Disable when the KV cache is not f16 | 2026-01-24 05:03:52 +00:00 |
| llama-build-context.h | Avoid ggml_get_rows if not necessary (#1160) | 2026-01-20 15:38:21 +02:00 |
| llama-context.h | POC: CUDA tensor parallel (MoE models) (#1022) | 2025-12-01 19:25:40 +01:00 |
| llama-cparams.h | Additional graph reduce types for split mode graph (#1154) | 2026-01-18 08:02:49 +02:00 |
| llama-grammar.cpp | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-grammar.h | Update grammar (#1023) | 2025-11-30 18:45:38 +01:00 |
| llama-hparams.cpp | Make comments more precise when experts gating function is missing (#1175) | 2026-01-21 09:12:40 +02:00 |
| llama-hparams.h | Mimo-V2-Flash support (#1096) | 2026-01-05 08:00:01 +02:00 |
| llama-impl.h | server: stop processing the prompt when client disconnects (#1134) | 2026-01-13 07:56:59 +02:00 |
| llama-load-tensors.cpp | Avoid ggml_get_rows if not necessary (#1160) | 2026-01-20 15:38:21 +02:00 |
| llama-mmap.cpp | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-mmap.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |
| llama-model-loader.cpp | Merge ffn_up and ffn_gate experts tensors (#1137) | 2026-01-12 18:30:53 +02:00 |
| llama-model-loader.h | Merge ffn_up and ffn_gate experts tensors (#1137) | 2026-01-12 18:30:53 +02:00 |
| llama-model.cpp | Mimo-V2-Flash support (#1096) | 2026-01-05 08:00:01 +02:00 |
| llama-model.h | Merge ffn_up and ffn_gate experts tensors (#1137) | 2026-01-12 18:30:53 +02:00 |
| llama-quantize.cpp | Merge ffn_up and ffn_gate experts tensors (#1137) | 2026-01-12 18:30:53 +02:00 |
| llama-sampling.cpp | sampling: refactor sorting (#1166) | 2026-01-19 16:48:54 +02:00 |
| llama-sampling.h | Faster adaptive_p sampling (#1165) | 2026-01-19 16:03:09 +02:00 |
| llama-vocab.cpp | Server: refactor and rename functions (#1151) | 2026-01-18 08:16:57 +02:00 |
| llama-vocab.h | Update mtmd to improve accuracy of M-RoPE (#993) | 2025-11-29 07:27:15 +01:00 |
| llama.cpp | Remove llamafile remnants (#1179) | 2026-01-22 13:20:23 +02:00 |
| unicode-data.cpp | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode-data.h | Merge mainline llama.cpp (#3) | 2024-07-27 07:55:01 +02:00 |
| unicode.cpp | Server: refactor and rename functions (#1151) | 2026-01-18 08:16:57 +02:00 |
| unicode.h | Enable CUDA graphs for MoE models + GPT-OSS support (#689) | 2025-08-15 09:18:07 +03:00 |