All files listed below were last updated by commit "Add GitHub data (#637)" on 2025-07-22 18:18:40 +02:00.

- 26-Feature Request_ Improve CPU processing speed for large contexts.md
- 29-Bug_ some ifdefs missing in ggml_src_iqk_iqk_quantize.cpp.md
- 30-Bug_ Appcrash on Windows 7 with GGML_USE_IQK_MULMAT.md
- 34-Bug_ FA fails when processing prompt lengths that are not a multiple of 8.md
- 59-Bug_ GGML Compilation Error_ undefined references to `iqk_mul_mat'.md
- 60-Bug_ Illegal instruction on NEON and Q4_0_4_4.md
- 67-Feature Request_ Elliminate_reduce unnecessary copies .md
- 88-Bug_ Won't compile on MSVC.md
- 92-Bug_ Quantized KV cache produces garbage in situation where llama.cpp does not.md
- 103-Bug_ K cache without FA.md
- 133-Refactor_ update ggml library_.md
- 159-Feature Request_ steps how to compile as cmake i struction on the origi al repo not work here..md
- 160-Bug_ Can't compile on MSVC 2022.md
- 167-Bug_ Unable to quantize Falcon 10B 1.58 bitnet model.md
- 183-Refactor_ iqk_mul_mat.md
- 196-Refactor_ remove usage of Q8_1 for activation quantization.md
- 199-Bug_ Changing system_prompt on llama-server at runtime breaks parallel processing.md
- 203-Bug_ Compliation Error for Intel(R) Xeon(R) Gold 6326 CPU.md
- 209-Does the iqk_mul_mat.cpp support 1.58-bit quantization model_.md
- 214-AVX512 build error.md
- 217-Bug_ CPU FA with fp16 K-cache is broken.md
- 224-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
- 227-Prevent FA usage on CUDA when K and V head sizes are different.md
- 228-Feature Request_ create tool to offline repack models.md
- 230-Weird assert when using online repacking.md
- 245-Bug_ Perplexity returns NaN with IQ4_KSS quantisation.md
- 249-CUDA_ results for MoE models are not reproducible.md
- 254-Split-mode row.md
- 255-Feature Request_ dynamic layer by layer offloading during prompt processing for VRAM constrained scenarios.md
- 257-Bug_ mla=2 in llama-server will crash when request done.md
- 263-Benchmarking DeepSeek R1 - 16x3090.md
- 267-Feature Request_ HugePage mmap alloc for DeepSeek V3_R1.md
- 271-Possible regression computing `wk_b` tensors on the fly after PR #265.md
- 281-Bug_ Strange dips in TG performance.md
- 285-llama-perplexity giving all NaNs on unsloth Q8_0 quant.md
- 293-Feature Request_ IQ6_K row interleaved quant.md
- 296-Possible numerical stability issue with experimental quant of DeepSeek-V3-0324_.md
- 297-Update gguf-py scripts to support new quant types..md
- 300-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
- 305-Gibberish output when using DeepSeek-V3-0324-IQ2_K_R4 on mixed CPU + 4 GPUs with -mla (1 or 2).md
- 306-Confused by the -mla flag. What's supported_.md
- 308-Bug_ Compiling for arm64, error_ cannot convert ‘const uint32x4_t’ to ‘uint8x16_t’ and similar errors.md
- 314-Llama 4 Support_.md
- 322-Speculative decoding support.md
- 335-Bug_ Llama 4 generates garbage with longer context (64K+; the issue is not present in the llama.cpp).md
- 339-Bug_ bitnet2b_2501 template issues.md
- 340-Bug_ _unknown model architecture_ 'cohere2'_ when trying to load Command A model.md
- 345-build question newbie.md
- 353-Binaries releases for Windows _.md
- 358-Bug_ IQK_FA_ALL_QUANTS causes failure to compile.md
- 361-Bug_ Build not detecting some supported ARM CPUs.md
- 362-README language is vague wrt. _quantization improvements_.md
- 363-Bug_ Gibberish output when using flash attention using Mistral-Small-Instruct-2409-Q6_K and Gemma-3-12b-it-q4_0 on CPU.md
- 365-Bug_ Updated BitNet arch bitnet-b1.58.md
- 367-Bug_ IQ1_S_R4, IQ1_M_R4 failed on Qwen3-235B-A22B.md
- 373-DeepSeekV3 0324 can't load newest UD quants (with MLA). Older quant works but with slower pre processing than gen speed (CPU + CUDA).md
- 376-Bug_ unknown model architecture_ 'deci' (when loading Llama-3_1-Nemotron-Ultra-253B).md
- 378-Feature Request_ Use ik_llama.cpp with llama-cpp-python.md
- 379-Bug_ Cannot build on WoA.md
- 380-Drop at the start of generation.md
- 381-ik_llama.cpp_ggml_src_ggml-cuda_fattn.cu_66_ fatal error after latest.md
- 383-Bug_ Loading DeepSeek R1T Chimera causes _llama_model_load_ error loading model_ check_tensor_dims_ tensor 'blk.0.attn_q_b.weight' has wrong shape; expected 1536, 73728, got 1536, 24576, 1,
- 387-Bug_ bitnet 1.58 on termux segmentation fault.md
- 388-Bug_ Clash with mainline llama.cpp .so files.md
- 389-Bug_ llama-batched-bench crashed with batch size _2.md
- 398-Bug_ -fmoe causing illegal memory access.md
- 407-Feature Request_ Support for function calling in llama-server.md
- 412-Bug_ Static asserts trip during compile..md
- 419-qwen3 metrics in expert parallel(2x P100).md
- 420-Bug_ standard attention is broken.md
- 423-Bug_ Compile failure undefined reference to `void mul_mat_q_case.md
- 425-Bug_ CUDA error_ an illegal memory access was encountered.md
- 432-Refactor_ GGUF v14 broke compatibility with IQx_KS quants.md
- 433-Feature Request_ CORS support.md
- 436-Bug_ Saving the prompt cache causes Segfault.md
- 437-Feature Request_ support intel amx for further accelerate.md
- 440-Feature Request_ Top n-sigma sampler.md
- 447-Compilation Error_ Error C2676.md
- 450-Bug_ Performance regression.md
- 452-Falcon H1 Support.md
- 455-Bug_ KV cache is never reused in OpenAI compatible Chat Completion api.md
- 456-Bug_ no compilation without IQK_MULMAT.md
- 463-Research_ V100 Flash Attention Implementation.md
- 464-Bug_ The streaming every couple of rows blocks for 5-8s.md
- 467-Bug_ Server does not send data_ [DONE] for OpenAI-compatible streaming endpoint `_v1_chat_completions`.md
- 472-Bug_ Don't build ggml-aarch64 regardless of CPU arch type.md
- 474-Bug_ Perf Regression in PP throughput after Pull #461 (...R4 CUDA impl).md
- 476-Research_ performance divergence.md
- 479-Bug_ _ggml_backend_cuda_graph_compute_ disabling CUDA graphs due to GPU architecture_ flood.md
- 485-Bug_ Illegal Memory Access loading model to CUDA1.md
- 490-Bug_ Performance drop with 14292913 #461.md
- 498-question_ about quantize method.md
- 499-Bug_ cache quantization crash with IQK_FORCE_BF16.md
- 500-Bug_ Insane cudaMalloc OOM Error on Dual 3090 GPUs.md
- 503-Bug_ server_cli fails with segmentation fault.md
- 507-Compatible gguf models _.md
- 514-CUDA Kernel Error on RTX 5090 (Compute Capability 12.0)_ _no kernel image is available for execution on the device_.md
- 521-When offloading semi layers to some GPUs with -ot, TG t_s performance tanks (CUDA + CPU, DeepSeek V3-R1), while not on main llamacpp..md
- 522-Bug_ disabling CUDA graphs due to mul_mat_id.md
- 523-Bug_ tg speed drop after https___github.com_ikawrakow_ik_llama.cpp_pull_518.md
- 527-Bug_ Webui improvement #481 core dump with a certain question..md
- 530-Getting crash on second prompt..md
- 538-Bug_ GGML_ASSERT failed at first prompt.md
- 539-Bug_ garbage output.md
- 551-Feature Request_ Support for Falcon Edge series.md
- 561-Feature Request_ Tencent Hunyuan-A13B model support.md
- 568-Feature Request_ ERNIE MoE Model Support.md
- 572-Bug_ Oops(ggml_compute_forward_sum_rows_f32, ffn_moe_weights_sum-60)_ found nan, on DeepSeek V3_R1 on CUDA + CPU.md
- 575-Bug_ llama-server crash with sampling order.md
- 576-Bug_ llama-server crash with _Deepseek2 does not support K-shift_.md
- 596-Bug_ Lastest commit broke llama-cli on Windows - mmq.cuh_107_ fatal error.md
- 597-Feature Request_ Add THUDM_GLM-4-MoE-100B-A10B support.md
- 600-Feature Request_ Port --reasoning-budget from main llamacpp (llamaserver).md
- 601-Bug_ llama-imatrix crashing.md
- 605-Bug_ IQ3_KS missing from GGMLQuantizationType - gguf_reader.py script cannot process IQ3_KS tensors.md
- 614-Feature Request_ port no-mmproj-offload.md
- 615-Bug_ Gemma3 Vision not working.md
- 625-Bug_ undefined symbol errors after successful compilation.md
- 626-Feature Request_ Add IQK GEMM for IQ1_M.md
- 627-Feature Request_ Tensor Parallelism.md
- 629-Multi-GPU performance (Windows) is significantly worse than single-GPU.md