ik_llama.cpp

Thomas ab7d193fe0 Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
8-New quantization types IQ2_K, IQ3_K, IQ4_K, IQ5_K.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
15-Will LQER improve k- and i-quants_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
18-CPU beating GPU in token generation speed.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
25-CPU prompt processing speed for large contexts.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
63-LLaMA-3.2 quantization evaluation.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
82-4bpw GGML TYPE_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
95-Bitnet.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
100-New argument _ env variable for GGML_SCHED_MAX_COPIES_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
104-Convenience improvements for llama-quantize.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
140-Questions about weight[j].md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
164-Latest CPU performance comparison with llama.cpp.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
165-Norm RMS Epsilon.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
166-Learning more LLM quantization.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
201-What is the NUMA situation _.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
211-help me create an importance matrix primer.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
223-Recent performance testing with DeepSeek R1.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
242-Switching from llama.cpp_ktransformers, seeking advice_guidance.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
256-Diverging from llama.cpp.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
258-Quick-start Guide coming over from llama.cpp and ktransformers!.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
266-Benchmarking DeepSeek R1 - 16x3090.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
286-Testing `deepseek-ai_DeepSeek-V3-0324` model support..md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
288-On @compilade's PR 12557 and @jukofyork's quantization ideas.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
316-Mainline is now copying stuff from ik_llama.cpp.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
319-KTransformers copying ik_llama.cpp.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
323-Is there an easy way to repack an existing GGUF so it could be used without --run-time-repack (thus enabling mmap).md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
334-`iq4_ks` performs great on gemma-3-27b-it-qat-q4_0-unquantized.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
350-Maverick slow prompt with gpu.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
354-Not all MLAs are born equal.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
357-Qwen3 - early performance comparisons.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
359-Qwen3 quantization experiments.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
372-multy gpu.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
384-ik_llama.cpp issues on an old workstation.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
385-Qwen3 235B performance on Intel Xeon Scalable processor.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
393-Creating quantized models.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
395-Why does imatrix not tokenize special tokens_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
396-Best settings for Maverick - Dual CPU Xeon 8480+ - RTX 3090.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
397-KV split while using `-sm row`.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
399-Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
401-install bitnet (or other cpu models) on a fresh termux aarch64.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
403-Tool Calling and Structured Response (Json Mode) support.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
434-Quant Cookers Basic Guide.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
451-Context reuse _ context shift for long prompts.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
459-qwen3 metrics on ancient hardware (2x xeon Vs 2x P100).md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
466-A curiosity..md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
477-DeepSeek-R1-0528 ik quants!.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
491--rtr actually hurts prompt t_s for large ubatch_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
519-Android Build.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
526-Partial requant feature to save compute and time during tests..md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
532-Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU Rig (2x5090 + 2x4090).md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
543-dots.llm1 support and thanks.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
545-Vulkan support_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
548-Poor performance with bf16 model on Qwen3 30B-A3B.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
556-ik_llama.cpp for Armv8.0.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
562-AMD GPU Vulkan & ROCm_HIP Discussion.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
564-Maybe an interesting CUDA PR here..md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
586-Slow KV cache rm operation.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
590-How important is Vulkan back-end development_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
591-I dont see any speed improvement in generation, so want to understand if i am missing something.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
594-Is AVX2 a hard requirement on x64_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
599-mla matrix absorbtion.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
613-Pathological Quant_CUDA combinations -- How to know what works_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
619-gpu p2p utilization.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
621-Deepseek v3_r1 poisoned prompt_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
623-Quantizing panels_bundles instead of blocks_.md	Add GitHub data (#637)	2025-07-22 18:18:40 +02:00
