All files listed below were last updated by commit "Add GitHub data (#637)" on 2025-07-22 18:18:40 +02:00.

- 26-Feature Request_ Improve CPU processing speed for large contexts.md
- 29-Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md
- 30-Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md
- 34-Bug_ FA fails when processing prompt lengths that are not a multiple of 8.md
- 59-Bug_ GGML Compilation Error_ undefined references to `iqk_mul_mat'.md
- 60-Bug_ Illegal instruction on NEON and Q4_0_4_4.md
- 67-Feature Request_ Elliminate_reduce unnecessary copies .md
- 88-Bug_ Won't compile on MSVC.md
- 92-Bug_ Quantized KV cache produces garbage in situation where llama.cpp does not.md
- 103-Bug_ K cache without FA.md
- 133-Refactor_ update ggml library_.md
- 159-Feature Request_ steps how to compile as cmake i struction on the origi al repo not work here..md
- 160-Bug_ Can't compile on MSVC 2022.md
- 167-Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md
- 183-Refactor_ iqk_mul_mat.md
- 196-Refactor_ remove usage of Q8_1 for activation quantization.md
- 199-Bug_ Changing system_prompt on llama-server at runtime breaks parallel processing.md
- 203-Bug_ Compliation Error for Intel(R) Xeon(R) Gold 6326 CPU.md
- 209-Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md
- 214-AVX512 build error.md
- 217-Bug_ CPU FA with fp16 K-cache is broken.md
- 224-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
- 227-Prevent FA usage on CUDA when K and V head sizes are different.md
- 228-Feature Request_ create tool to offline repack models.md
- 230-Weird assert when using online repacking.md
- 245-Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md
- 249-CUDA_ results for MoE models are not reproducible.md
- 254-Split-mode row.md
- 255-Feature Request_ dynamic layer by layer offloading during prompt processing for VRAM constrained scenarios.md
- 257-Bug_ mla=2 in llama-server will crash when request done.md
- 263-Benchmarking DeepSeek R1 - 16x3090.md
- 267-Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md
- 271-Possible regression computing `wk_b` tensors on the fly after PR #265.md
- 281-Bug_ Strange dips in TG performance.md
- 285-llama-perplexity giving all NaNs on unsloth Q8_0 quant.md
- 293-Feature Request_ IQ6_K row interleaved quant.md
- 296-Possible numerical stability issue with experimental quant of DeepSeek-V3-0324_.md
- 297-Update gguf-py scripts to support new quant types..md
- 300-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
- 305-Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU + 4 GPUs with -mla (1 or 2).md
- 306-Confused by the -mla flag. What's supported_.md
- 308-Bug_ Compiling for arm64, error_ cannot convert ‘const uint32x4_t’ to ‘uint8x16_t’ and similar errors.md
- 314-Llama 4 Support_.md
- 322-Speculative decoding support.md
- 335-Bug_ Llama 4 generates garbage with longer context (64K+; the issue is not present in the llama.cpp).md
- 339-Bug_ bitnet2b_2501 template issues.md
- 340-Bug_ _unknown model architecture_ 'cohere2'_ when trying to load Command A model.md
- 345-build question newbie.md
- 353-Binaries releases for Windows _.md
- 358-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
- 361-Bug_ Build not detecting some supported ARM CPUs.md
- 362-README language is vague wrt. _quantization improvements_.md
- 363-Bug_ Gibberish output when using flash attention using Mistral-Small-Instruct-2409-Q6_K and Gemma-3-12b-it-q4_0 on CPU.md
- 365-Bug_ Updated BitNet arch bitnet-b1.58.md
- 367-Bug_ IQ1_S_R4, IQ1_M_R4 failed on Qwen3-235B-A22B.md
- 373-DeepSeekV3 0324 can't load newest UD quants (with MLA). Older quant works but with slower pre processing than gen speed (CPU + CUDA).md
- 376-Bug_ unknown model architecture_ 'deci' (when loading Llama-3_1-Nemotron-Ultra-253B).md
- 378-Feature Request_ Use ik_llama.cpp with llama-cpp-python.md
- 379-Bug_ Cannot build on WoA.md
- 380-Drop at the start of generation.md
- 381-ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md
- 383-Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loading model_ check_tensor_dims_ tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1,
- 387-Bug_ bitnet 1.58 on termux segmentation fault.md
- 388-Bug_ Clash with mainline llama.cpp .so files.md
- 389-Bug_ llama-batched-bench crashed with batch size _2.md
- 398-Bug_ -fmoe causing illegal memory access.md
- 407-Feature Request_ Support for function calling in llama-server.md
- 412-Bug_ Static asserts trip during compile..md
- 419-qwen3 metrics in expert parallel(2x P100).md
- 420-Bug_ standard attention is broken.md
- 423-Bug_ Compile failure undefined reference to `void mul_mat_q_case.md
- 425-Bug_ CUDA error_ an illegal memory access was encountered.md
- 432-Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md
- 433-Feature Request_ CORS support.md
- 436-Bug_ Saving the prompt cache causes Segfault.md
- 437-Feature Request_ support intel amx for further accelerate.md
- 440-Feature Request_ Top n-sigma sampler.md
- 447-Compilation Error_ Error C2676.md
- 450-Bug_ Performance regression.md
- 452-Falcon H1 Support.md
- 455-Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md
- 456-Bug_ no compilation without IQK_MULMAT.md
- 463-Research_ V100 Flash Attention Implementation.md
- 464-Bug_ The streaming every couple of rows blocks for 5-8s.md
- 467-Bug_ Server does not send data_ [DONE] for OpenAI-compatible streaming endpoint `_v1_chat_completions`.md
- 472-Bug_ Don't build ggml-aarch64 regardless of CPU arch type.md
- 474-Bug_ Perf Regression in PP throughput after Pull #461 (...R4 CUDA impl).md
- 476-Research_ performance divergence.md
- 479-Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU architecture_ flood.md
- 485-Bug_ Illegal Memory Access loading model to CUDA1.md
- 490-Bug_ Performance drop with 14292913 #461.md
- 498-question_ about quantize method.md
- 499-Bug_ cache quantization crash with IQK_FORCE_BF16.md
- 500-Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md
- 503-Bug_ server_cli fails with segmentation fault.md
- 507-Compatible gguf models _.md
- 514-CUDA Kernel Error on RTX 5090 (Compute Capability 12.0)_ _no kernel image is available for execution on the device_.md
- 521-When offloading semi layers to some GPUs with -ot, TG t_s performance tanks (CUDA + CPU, DeepSeek V3-R1), while not on main llamacpp..md
- 522-Bug_ disabling CUDA graphs due to mul_mat_id.md
- 523-Bug_ tg speed drop after https___github.com_ikawrakow_ik_llama.cpp_pull_518.md
- 527-Bug_ Webui improvement #481 core dump with a certain question..md
- 530-Getting crash on second prompt..md
- 538-Bug_ GGML_ASSERT failed at first prompt.md
- 539-Bug_ garbage output.md
- 551-Feature Request_ Support for Falcon Edge series.md
- 561-Feature Request_ Tencent Hunyuan-A13B model support.md
- 568-Feature Request_ ERNIE MoE Model Support.md
- 572-Bug_ Oops(ggml_compute_forward_sum_rows_f32, ffn_moe_weights_sum-60)_ found nan, on DeepSeek V3_R1 on CUDA + CPU.md
- 575-Bug_ llama-server crash with sampling order.md
- 576-Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md
- 596-Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal error.md
- 597-Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md
- 600-Feature Request_ Port --reasoning-budget from main llamacpp (llamaserver).md
- 601-Bug_ llama-imatrix crashing.md
- 605-Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script cannot process IQ3_KS tensors.md
- 614-Feature Request_ port no-mmproj-offload.md
- 615-Bug_ Gemma3 Vision not working.md
- 625-Bug_ undefined symbol errors after successful compilation.md
- 626-Feature Request_ Add IQK GEMM for IQ1_M.md
- 627-Feature Request_ Tensor Parallelism.md
- 629-Multi-GPU performance (Windows) is significantly worse than single-GPU.md