mirror of
https://github.com/ggerganov/llama.cpp
synced 2026-04-07 07:45:58 +02:00
* llama : fix pooling assertion crash in chunked GDN detection path
The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).
Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.
Regression introduced by #20340 (
|
||
|---|---|---|
| .. | ||
| test_basic.py | ||
| test_chat_completion.py | ||
| test_compat_anthropic.py | ||
| test_compat_oai_responses.py | ||
| test_completion.py | ||
| test_ctx_shift.py | ||
| test_embedding.py | ||
| test_infill.py | ||
| test_lora.py | ||
| test_proxy.py | ||
| test_rerank.py | ||
| test_router.py | ||
| test_security.py | ||
| test_sleep.py | ||
| test_slot_save.py | ||
| test_speculative.py | ||
| test_template.py | ||
| test_tokenize.py | ||
| test_tool_call.py | ||
| test_vision_api.py | ||