| .. |
| 8-New quantization types IQ2_K, IQ3_K, IQ4_K, IQ5_K.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 15-Will LQER improve k- and i-quants_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 18-CPU beating GPU in token generation speed.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 25-CPU prompt processing speed for large contexts.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 63-LLaMA-3.2 quantization evaluation.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 82-4bpw GGML TYPE_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 95-Bitnet.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 100-New argument _ env variable for GGML_SCHED_MAX_COPIES_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 104-Convenience improvements for llama-quantize.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 140-Questions about weight[j].md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 164-Latest CPU performance comparison with llama.cpp.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 165-Norm RMS Epsilon.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 166-Learning more LLM quantization.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 201-What is the NUMA situation _.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 211-help me create an importance matrix primer.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 223-Recent performance testing with DeepSeek R1.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 242-Switching from llama.cpp_ktransformers, seeking advice_guidance.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 256-Diverging from llama.cpp.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 258-Quick-start Guide coming over from llama.cpp and ktransformers!.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 266-Benchmarking DeepSeek R1 - 16x3090.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 286-Testing `deepseek-ai_DeepSeek-V3-0324` model support..md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 288-On @compilade's PR 12557 and @jukofyork's quantization ideas.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 316-Mainline is now copying stuff from ik_llama.cpp.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 319-KTransformers copying ik_llama.cpp.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 323-Is there an easy way to repack an existing GGUF so it could be used without --run-time-repack (thus enabling mmap).md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 334-`iq4_ks` performs great on gemma-3-27b-it-qat-q4_0-unquantized.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 350-Maverick slow prompt with gpu.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 354-Not all MLAs are born equal.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 357-Qwen3 - early performance comparisons.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 359-Qwen3 quantization experiments.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 372-multy gpu.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 384-ik_llama.cpp issues on an old workstation.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 385-Qwen3 235B performance on Intel Xeon Scalable processor.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 393-Creating quantized models.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 395-Why does imatrix not tokenize special tokens_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 396-Best settings for Maverick - Dual CPU Xeon 8480+ - RTX 3090.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 397-KV split while using `-sm row`.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 399-Qwen 30b.A3b IK_LCPP comparisons on lowspec machine.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 401-install bitnet (or other cpu models) on a fresh termux aarch64.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 403-Tool Calling and Structured Response (Json Mode) support.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 434-Quant Cookers Basic Guide.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 451-Context reuse _ context shift for long prompts.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 459-qwen3 metrics on ancient hardware (2x xeon Vs 2x P100).md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 466-A curiosity..md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 477-DeepSeek-R1-0528 ik quants!.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 491--rtr actually hurts prompt t_s for large ubatch_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 519-Android Build.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 526-Partial requant feature to save compute and time during tests..md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 532-Guidance on GPU Layer Offloading Strategy in ik_llama.cpp for Multi GPU Rig (2x5090 + 2x4090).md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 543-dots.llm1 support and thanks.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 545-Vulkan support_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 548-Poor performance with bf16 model on Qwen3 30B-A3B.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 556-ik_llama.cpp for Armv8.0.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 562-AMD GPU Vulkan & ROCm_HIP Discussion.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 564-Maybe an interesting CUDA PR here..md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 586-Slow KV cache rm operation.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 590-How important is Vulkan back-end development_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 591-I dont see any speed improvement in generation, so want to understand if i am missing something.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 594-Is AVX2 a hard requirement on x64_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 599-mla matrix absorbtion.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 613-Pathological Quant_CUDA combinations -- How to know what works_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 619-gpu p2p utilization.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 621-Deepseek v3_r1 poisoned prompt_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |
| 623-Quantizing panels_bundles instead of blocks_.md | Add GitHub data (#637) | 2025-07-22 18:18:40 +02:00 |