ik_llama.cpp/github-data/issues
Last commit for all files below: Add GitHub data (#637), 2025-07-22 18:18:40 +02:00
26-Feature Request_ Improve CPU processing speed for large contexts.md
29-Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md
30-Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md
34-Bug_ FA fails when processing prompt lengths that are not a multiple of 8.md
59-Bug_ GGML Compilation Error_ undefined references to `iqk_mul_mat'.md
60-Bug_ Illegal instruction on NEON and Q4_0_4_4.md
67-Feature Request_ Elliminate_reduce unnecessary copies .md
88-Bug_ Won't compile on MSVC.md
92-Bug_ Quantized KV cache produces garbage in situation where llama.cpp does not.md
103-Bug_ K cache without FA.md
133-Refactor_ update ggml library_.md
159-Feature Request_ steps how to compile as cmake i struction on the origi al repo not work here..md
160-Bug_ Can't compile on MSVC 2022.md
167-Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md
183-Refactor_ iqk_mul_mat.md
196-Refactor_ remove usage of Q8_1 for activation quantization.md
199-Bug_ Changing system_prompt on llama-server at runtime breaks parallel processing.md
203-Bug_ Compliation Error for Intel(R) Xeon(R) Gold 6326 CPU.md
209-Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md
214-AVX512 build error.md
217-Bug_ CPU FA with fp16 K-cache is broken.md
224-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
227-Prevent FA usage on CUDA when K and V head sizes are different.md
228-Feature Request_ create tool to offline repack models.md
230-Weird assert when using online repacking.md
245-Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md
249-CUDA_ results for MoE models are not reproducible.md
254-Split-mode row.md
255-Feature Request_ dynamic layer by layer offloading during prompt processing for VRAM constrained scenarios.md
257-Bug_ mla=2 in llama-server will crash when request done.md
263-Benchmarking DeepSeek R1 - 16x3090.md
267-Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md
271-Possible regression computing `wk_b` tensors on the fly after PR #265.md
281-Bug_ Strange dips in TG performance.md
285-llama-perplexity giving all NaNs on unsloth Q8_0 quant.md
293-Feature Request_ IQ6_K row interleaved quant.md
296-Possible numerical stability issue with experimental quant of DeepSeek-V3-0324_.md
297-Update gguf-py scripts to support new quant types..md
300-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
305-Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU + 4 GPUs with -mla (1 or 2).md
306-Confused by the -mla flag. What's supported_.md
308-Bug_ Compiling for arm64, error_ cannot convert ‘const uint32x4_t’ to ‘uint8x16_t’ and similar errors.md
314-Llama 4 Support_.md
322-Speculative decoding support.md
335-Bug_ Llama 4 generates garbage with longer context (64K+; the issue is not present in the llama.cpp).md
339-Bug_ bitnet2b_2501 template issues.md
340-Bug_ _unknown model architecture_ 'cohere2'_ when trying to load Command A model.md
345-build question newbie.md
353-Binaries releases for Windows _.md
358-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
361-Bug_ Build not detecting some supported ARM CPUs.md
362-README language is vague wrt. _quantization improvements_.md
363-Bug_ Gibberish output when using flash attention using Mistral-Small-Instruct-2409-Q6_K and Gemma-3-12b-it-q4_0 on CPU.md
365-Bug_ Updated BitNet arch bitnet-b1.58.md
367-Bug_ IQ1_S_R4, IQ1_M_R4 failed on Qwen3-235B-A22B.md
373-DeepSeekV3 0324 can't load newest UD quants (with MLA). Older quant works but with slower pre processing than gen speed (CPU + CUDA).md
376-Bug_ unknown model architecture_ 'deci' (when loading Llama-3_1-Nemotron-Ultra-253B).md
378-Feature Request_ Use ik_llama.cpp with llama-cpp-python.md
379-Bug_ Cannot build on WoA.md
380-Drop at the start of generation.md
381-ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md
383-Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loading model_ check_tensor_dims_ tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1,
387-Bug_ bitnet 1.58 on termux segmentation fault.md
388-Bug_ Clash with mainline llama.cpp .so files.md
389-Bug_ llama-batched-bench crashed with batch size _2.md
398-Bug_ -fmoe causing illegal memory access.md
407-Feature Request_ Support for function calling in llama-server.md
412-Bug_ Static asserts trip during compile..md
419-qwen3 metrics in expert parallel(2x P100).md
420-Bug_ standard attention is broken.md
423-Bug_ Compile failure undefined reference to `void mul_mat_q_case.md
425-Bug_ CUDA error_ an illegal memory access was encountered.md
432-Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md
433-Feature Request_ CORS support.md
436-Bug_ Saving the prompt cache causes Segfault.md
437-Feature Request_ support intel amx for further accelerate.md
440-Feature Request_ Top n-sigma sampler.md
447-Compilation Error_ Error C2676.md
450-Bug_ Performance regression.md
452-Falcon H1 Support.md
455-Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md
456-Bug_ no compilation without IQK_MULMAT.md
463-Research_ V100 Flash Attention Implementation.md
464-Bug_ The streaming every couple of rows blocks for 5-8s.md
467-Bug_ Server does not send data_ [DONE] for OpenAI-compatible streaming endpoint `_v1_chat_completions`.md
472-Bug_ Don't build ggml-aarch64 regardless of CPU arch type.md
474-Bug_ Perf Regression in PP throughput after Pull #461 (...R4 CUDA impl).md
476-Research_ performance divergence.md
479-Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU architecture_ flood.md
485-Bug_ Illegal Memory Access loading model to CUDA1.md
490-Bug_ Performance drop with 14292913 #461.md
498-question_ about quantize method.md
499-Bug_ cache quantization crash with IQK_FORCE_BF16.md
500-Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md
503-Bug_ server_cli fails with segmentation fault.md
507-Compatible gguf models _.md
514-CUDA Kernel Error on RTX 5090 (Compute Capability 12.0)_ _no kernel image is available for execution on the device_.md
521-When offloading semi layers to some GPUs with -ot, TG t_s performance tanks (CUDA + CPU, DeepSeek V3-R1), while not on main llamacpp..md
522-Bug_ disabling CUDA graphs due to mul_mat_id.md
523-Bug_ tg speed drop after https___github.com_ikawrakow_ik_llama.cpp_pull_518.md
527-Bug_ Webui improvement #481 core dump with a certain question..md
530-Getting crash on second prompt..md
538-Bug_ GGML_ASSERT failed at first prompt.md
539-Bug_ garbage output.md
551-Feature Request_ Support for Falcon Edge series.md
561-Feature Request_ Tencent Hunyuan-A13B model support.md
568-Feature Request_ ERNIE MoE Model Support.md
572-Bug_ Oops(ggml_compute_forward_sum_rows_f32, ffn_moe_weights_sum-60)_ found nan, on DeepSeek V3_R1 on CUDA + CPU.md
575-Bug_ llama-server crash with sampling order.md
576-Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md
596-Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal error.md
597-Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md
600-Feature Request_ Port --reasoning-budget from main llamacpp (llamaserver).md
601-Bug_ llama-imatrix crashing.md
605-Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script cannot process IQ3_KS tensors.md
614-Feature Request_ port no-mmproj-offload.md
615-Bug_ Gemma3 Vision not working.md
625-Bug_ undefined symbol errors after successful compilation.md
626-Feature Request_ Add IQK GEMM for IQ1_M.md
627-Feature Request_ Tensor Parallelism.md
629-Multi-GPU performance (Windows) is significantly worse than single-GPU.md