ik_llama.cpp

Thomas eaa2510a28 Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
26 - Feature Request_ Improve CPU processing speed for large contexts.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
29 - Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
30 - Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
34 - Bug_ FA fails when processing prompt lengths that are not a multiple of .md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
59 - Bug_ GGML Compilation Error_ undefined references to _iqk_mul_mat_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
60 - Bug_ Illegal instruction on NEON and Q4_0_4_4.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
67 - Feature Request_ Elliminate_reduce unnecessary copies.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
88 - Bug_ Won_t compile on MSVC.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
92 - Bug_ Quantized KV cache produces garbage in situation where llama.cpp do.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
103 - Bug_ K cache without FA.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
133 - Refactor_ update ggml library_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
159 - Feature Request_ steps how to compile as cmake i struction on the origi.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
160 - Bug_ Can_t compile on MSVC 2022.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
167 - Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
183 - Refactor_ iqk_mul_mat.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
196 - Refactor_ remove usage of Q8_1 for activation quantization.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
199 - Bug_ Changing system_prompt on llama-server at runtime breaks parallel .md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
203 - Bug_ Compliation Error for Intel_R_ Xeon_R_ Gold 6326 CPU.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
209 - Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
214 - AVX512 build error.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
217 - Bug_ CPU FA with fp16 K-cache is broken.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
224 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
227 - Prevent FA usage on CUDA when K and V head sizes are different.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
228 - Feature Request_ create tool to offline repack models.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
230 - Weird assert when using online repacking.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
245 - Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
249 - CUDA_ results for MoE models are not reproducible.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
254 - Split-mode row.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
255 - Feature Request_ dynamic layer by layer offloading during prompt proces.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
257 - Bug_ mla_2 in llama-server will crash when request done.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
263 - Benchmarking DeepSeek R1 - 16x3090.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
267 - Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
271 - Possible regression computing _wk_b_ tensors on the fly after PR _265.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
281 - Bug_ Strange dips in TG performance.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
285 - llama-perplexity giving all NaNs on unsloth Q8_0 quant.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
293 - Feature Request_ IQ6_K row interleaved quant.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
296 - Possible numerical stability issue with experimental quant of DeepSeek-.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
297 - Update gguf-py scripts to support new quant types..md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
300 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
305 - Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU _ 4 .md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
306 - Confused by the -mla flag. What_s supported_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
308 - Bug_ Compiling for arm64_ error_ cannot convert _const uint32x4_t_ to _.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
314 - Llama 4 Support_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
322 - Speculative decoding support.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
335 - Bug_ Llama 4 generates garbage with longer context _64K_ the issue is n.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
339 - Bug_ bitnet2b_2501 template issues.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
340 - Bug_ _unknown model architecture_ _cohere2_ when trying to load Command.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
345 - build question newbie.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
353 - Binaries releases for Windows _.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
358 - Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
361 - Bug_ Build not detecting some supported ARM CPUs.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
362 - README language is vague wrt. _quantization improvements_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
363 - Bug_ Gibberish output when using flash attention using Mistral-Small-I.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
365 - Bug_ Updated BitNet arch bitnet-b1.58.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
367 - Bug_ IQ1_S_R4_ IQ1_M_R4 failed on Qwen3-235B-A22B.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
373 - DeepSeekV3 0324 can_t load newest UD quants _with MLA_. Older quant wor.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
376 - Bug_ unknown model architecture_ _deci_ _when loading Llama-3_1-Nemotro.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
378 - Feature Request_ Use ik_llama.cpp with llama-cpp-python.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
379 - Bug_ Cannot build on WoA.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
380 - Drop at the start of generation.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
381 - ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
383 - Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loadi.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
387 - Bug_ bitnet 1.58 on termux segmentation fault.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
388 - Bug_ Clash with mainline llama.cpp .so files.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
389 - Bug_ llama-batched-bench crashed with batch size _2.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
398 - Bug_ -fmoe causing illegal memory access.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
407 - Feature Request_ Support for function calling in llama-server.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
412 - Bug_ Static asserts trip during compile..md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
419 - qwen3 metrics in expert parallel_2x P100_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
420 - Bug_ standard attention is broken.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
423 - Bug_ Compile failure undefined reference to _void mul_mat_q_case.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
425 - Bug_ CUDA error_ an illegal memory access was encountered.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
432 - Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
433 - Feature Request_ CORS support.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
436 - Bug_ Saving the prompt cache causes Segfault.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
437 - Feature Request_ support intel amx for further accelerate.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
440 - Feature Request_ Top n-sigma sampler.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
447 - Compilation Error_ Error C2676.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
450 - Bug_ Performance regression.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
452 - Falcon H1 Support.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
455 - Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
456 - Bug_ no compilation without IQK_MULMAT.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
463 - Research_ V100 Flash Attention Implementation.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
464 - Bug_ The streaming every couple of rows blocks for 5-8s.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
467 - Bug_ Server does not send data_ _DONE_ for OpenAI-compatible streaming .md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
472 - Bug_ Don_t build ggml-aarch64 regardless of CPU arch type.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
474 - Bug_ Perf Regression in PP throughput after Pull _461 _...R4 CUDA impl_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
476 - Research_ performance divergence.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
479 - Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
485 - Bug_ Illegal Memory Access loading model to CUDA1.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
490 - Bug_ Performance drop with 14292913 _461.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
498 - question_ about quantize method.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
499 - Bug_ cache quantization crash with IQK_FORCE_BF16.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
500 - Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
503 - Bug_ server_cli fails with segmentation fault.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
507 - Compatible gguf models _.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
514 - CUDA Kernel Error on RTX 5090 _Compute Capability 12.0_ _no kernel imag.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
521 - When offloading semi layers to some GPUs with -ot_ TG t_s performance t.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
522 - Bug_ disabling CUDA graphs due to mul_mat_id.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
523 - Bug_ tg speed drop after https_github.com_ikawrakow_ik_llama.cpp_pull_5.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
527 - Bug_ Webui improvement _481 core dump with a certain question..md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
530 - Getting crash on second prompt..md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
538 - Bug_ GGML_ASSERT failed at first prompt.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
539 - Bug_ garbage output.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
551 - Feature Request_ Support for Falcon Edge series.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
561 - Feature Request_ Tencent Hunyuan-A13B model support.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
568 - Feature Request_ ERNIE MoE Model Support.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
572 - Bug_ Oops_ggml_compute_forward_sum_rows_f32_ ffn_moe_weights_sum-60_ fo.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
575 - Bug_ llama-server crash with sampling order.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
576 - Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
596 - Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal err.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
597 - Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
600 - Feature Request_ Port --reasoning-budget from main llamacpp _llamaserve.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
601 - Bug_ llama-imatrix crashing.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
605 - Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script c.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
614 - Feature Request_ port no-mmproj-offload.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
615 - Bug_ Gemma3 Vision not working.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
625 - Bug_ undefined symbol errors after successful compilation.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
626 - Feature Request_ Add IQK GEMM for IQ1_M.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
627 - Feature Request_ Tensor Parallelism.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
629 - Multi-GPU performance _Windows_ is significantly worse than single-GPU.md	Add GitHub data: filename sanitization (#640 )	2025-07-23 13:31:53 +02:00
