tests : add -INF blocks to the KQ mask in the FA tests (llama/16380)

* tests : add -INF blocks to the KQ mask in the FA tests

* cont : bump -INF block size to 64

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>

* ggml : prevent division by zero in FA CPU op

---------

Co-authored-by: Jeff Bolz <jbolz@nvidia.com>
2026-04-07 23:45:24 +02:00 · 2025-10-07 08:22:35 +03:00 · 6cf0c21b09
commit 6cf0c21b09
parent 1a4116f942
1 changed file with 1 addition and 1 deletion
--- a/ggml/src/ggml-cpu/ops.cpp
+++ b/ggml/src/ggml-cpu/ops.cpp
@@ -8135,7 +8135,7 @@ static void ggml_compute_forward_flash_attn_ext_f16(
        }

        // V /= S
-        const float S_inv = 1.0f/S;
+        const float S_inv = S == 0.0f ? 0.0f : 1.0f/S;
        ggml_vec_scale_f32(DV, VKQ32, S_inv);

        // dst indices
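
The guard matters when every KQ mask entry for a query row is -INF, which the new test blocks can produce: the softmax sum S then stays 0, 1.0f/S becomes +inf under IEEE 754 arithmetic, and scaling a zero accumulator by +inf yields NaN. The following is a minimal sketch of that failure mode, not the ggml kernel. `attend_row` and its `guard` flag are hypothetical names introduced for illustration, and it uses a plain softmax without max subtraction instead of the online softmax in the real op.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one query row of an attention op (not ggml code).
// scores[i] is the KQ score, mask[i] is added to it (0 or -INF),
// values[i] is the V row for position i. Returns softmax(scores + mask) * V.
std::vector<float> attend_row(const std::vector<float> & scores,
                              const std::vector<float> & mask,
                              const std::vector<std::vector<float>> & values,
                              bool guard) {
    const std::size_t dv = values.empty() ? 0 : values[0].size();
    std::vector<float> acc(dv, 0.0f); // unnormalized weighted sum of V rows
    float S = 0.0f;                   // softmax denominator

    for (std::size_t i = 0; i < scores.size(); ++i) {
        if (mask[i] == -INFINITY) {
            continue; // masked position contributes nothing to S or acc
        }
        const float w = std::exp(scores[i] + mask[i]);
        S += w;
        for (std::size_t d = 0; d < dv; ++d) {
            acc[d] += w * values[i][d];
        }
    }

    // V /= S, with the same guard as the patched line:
    // a fully masked row has S == 0, so scale by 0 instead of by 1/0.
    const float S_inv = guard ? (S == 0.0f ? 0.0f : 1.0f / S) : 1.0f / S;
    for (float & a : acc) {
        a *= S_inv;
    }
    return acc;
}
```

With `guard` set, a fully masked row comes back as all zeros; without it, the same row comes back as NaN on IEEE 754 targets. Rows with at least one unmasked position are unaffected by the guard.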