ik_llama.cpp/src
Thireus ☠ 5536e99d42
Port of Qwen3-VL support from mainline (#883)
* Port of Qwen3-VL for latest ik_llama.cpp

- convert_hf_to_gguf.py not touched; use mainline llama.cpp to convert the model instead
- SYCL and Metal support for imrope not added
- Vulkan support for imrope not tested
- Code not tested

* Bugfix: n_embd was declared multiple times

https://github.com/ikawrakow/ik_llama.cpp/pull/883#issuecomment-3471179655

* Fix n_embd issue with qwen3vl

* model.output tensor is not required

https://github.com/ikawrakow/ik_llama.cpp/pull/883#discussion_r2480388389

* Improved logic for qkv combined tensors

59ceaf8fcb (r2480395800)
59ceaf8fcb (r2480398187)

* Fix n_embd for merge_qkv() + cleaner code

https://github.com/ikawrakow/ik_llama.cpp/pull/883#discussion_r2481227395

* Revert TENSOR_NOT_REQUIRED
2025-11-04 19:20:54 +02:00
CMakeLists.txt Enable and clean up compiler warnings in src (#824) 2025-10-11 16:01:13 +03:00
llama-arch.cpp Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-arch.h Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-build-context.cpp Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-build-context.h Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-context.h Support --device and --device-draft parameter (#866) 2025-10-27 18:13:28 +02:00
llama-cparams.h RoPE cache (#887) 2025-11-03 18:42:20 +02:00
llama-grammar.cpp Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
llama-grammar.h Tool calls support from mainline (#723) 2025-09-01 08:38:49 +03:00
llama-hparams.cpp Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-hparams.h Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-impl.h Fix warnings about LLAMA_DEBUG being redefined 2025-10-27 18:41:03 +02:00
llama-load-tensors.cpp Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-mmap.cpp Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00
llama-mmap.h Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00
llama-model-loader.cpp Merge Q, K, V (#878) 2025-10-30 10:49:48 +02:00
llama-model-loader.h Merge Q, K, V (#878) 2025-10-30 10:49:48 +02:00
llama-model.cpp Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-model.h Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
llama-quantize.cpp Merge Q, K, V (#878) 2025-10-30 10:49:48 +02:00
llama-sampling.cpp Enable and clean up compiler warnings in src (#824) 2025-10-11 16:01:13 +03:00
llama-sampling.h add dry sampler (#513) 2025-06-19 10:24:53 +03:00
llama-vocab.cpp Adding Ling/Ring (a.k.a., Bailing-MoE2) support (#833) 2025-10-15 14:20:40 +03:00
llama-vocab.h model : add grok-2 support (#782) 2025-09-23 16:31:01 +02:00
llama.cpp Port of Qwen3-VL support from mainline (#883) 2025-11-04 19:20:54 +02:00
unicode-data.cpp Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
unicode-data.h Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
unicode.cpp Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00
unicode.h Enable CUDA graphs for MoE models + GPT-OSS support (#689) 2025-08-15 09:18:07 +03:00