ik_llama.cpp

Kawrakow 9a0b5e8055 Step-3.5: llama.cpp compatibility changes	2026-02-06 07:02:55 +00:00
CMakeLists.txt	Enable and clean up compiler warnings in src (#824)	2025-10-11 16:01:13 +03:00
llama-arch.cpp	Step-3.5: llama.cpp compatibility changes	2026-02-06 07:02:55 +00:00
llama-arch.h	Step-3.5: llama.cpp compatibility changes	2026-02-06 07:02:55 +00:00
llama-build-context.cpp	Graph parallel for Step-3.5-Flash (#1236)	2026-02-06 06:56:51 +02:00
llama-build-context.h	Step-3.5-Flash support (#1231)	2026-02-05 08:13:22 +02:00
llama-context.h	POC: CUDA tensor parallel (MoE models) (#1022)	2025-12-01 19:25:40 +01:00
llama-cparams.h	Additional graph reduce types for split mode graph (#1154)	2026-01-18 08:02:49 +02:00
llama-grammar.cpp	llama : add token matching support to llama-grammar (#1220)	2026-02-03 07:57:17 +02:00
llama-grammar.h	llama : add token matching support to llama-grammar (#1220)	2026-02-03 07:57:17 +02:00
llama-hparams.cpp	Step-3.5: llama.cpp compatibility changes	2026-02-06 07:02:55 +00:00
llama-hparams.h	Step-3.5: llama.cpp compatibility changes	2026-02-06 07:02:55 +00:00
llama-impl.h	server: stop processing the prompt when client disconnects (#1134)	2026-01-13 07:56:59 +02:00
llama-load-tensors.cpp	Graph parallel for Step-3.5-Flash (#1236)	2026-02-06 06:56:51 +02:00
llama-mmap.cpp	Enable CUDA graphs for MoE models + GPT-OSS support (#689)	2025-08-15 09:18:07 +03:00
llama-mmap.h	Enable CUDA graphs for MoE models + GPT-OSS support (#689)	2025-08-15 09:18:07 +03:00
llama-model-loader.cpp	Step-3.5-Flash support (#1231)	2026-02-05 08:13:22 +02:00
llama-model-loader.h	Merge ffn_up and ffn_gate experts tensors (#1137)	2026-01-12 18:30:53 +02:00
llama-model.cpp	Step-3.5-Flash support (#1231)	2026-02-05 08:13:22 +02:00
llama-model.h	Graph parallel for Step-3.5-Flash (#1236)	2026-02-06 06:56:51 +02:00
llama-quantize.cpp	Merge ffn_up and ffn_gate experts tensors (#1137)	2026-01-12 18:30:53 +02:00
llama-sampling.cpp	llama : add token matching support to llama-grammar (#1220)	2026-02-03 07:57:17 +02:00
llama-sampling.h	Adaptive p: history update fix + temp as flag (#1213)	2026-02-03 07:36:12 +02:00
llama-vocab.cpp	Server: refactor and rename functions (#1151)	2026-01-18 08:16:57 +02:00
llama-vocab.h	Update mtmd to improve accuracy of M-RoPE (#993)	2025-11-29 07:27:15 +01:00
llama.cpp	Graph parallel for Step-3.5-Flash (#1236)	2026-02-06 06:56:51 +02:00
unicode-data.cpp	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
unicode-data.h	Merge mainline llama.cpp (#3)	2024-07-27 07:55:01 +02:00
unicode.cpp	Server: refactor and rename functions (#1151)	2026-01-18 08:16:57 +02:00
unicode.h	Enable CUDA graphs for MoE models + GPT-OSS support (#689)	2025-08-15 09:18:07 +03:00