ik_llama.cpp/ggml
Kawrakow 0e1d33ca4a
Fuse add+add+fused_rms (#853)
* Fuse add+add+fused_rms

* Try this

* Macro to easily enable/disable fusion

* Various:

* Check that all tensors involved are on the same device before applying fusion
* Fuse sigmoid+scale+sum_rows+div
* Fix the fused bailingmoe2 expert selection

The issue there was that the bias was not per row but per
expert group, so only the first n_per_group biases were used
for all experts.
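The bias-indexing bug described above can be sketched as follows. This is a toy Python illustration of the indexing error, not the actual ggml kernel; the names (`n_per_group`, `scores`, `bias`) are illustrative.

```python
# Toy sketch of the grouped-bias indexing bug: the bias tensor holds one
# value per expert, stored group by group, but the fused kernel indexed
# it as if it were per row.
n_groups, n_per_group = 2, 4
n_experts = n_groups * n_per_group

scores = [0.0] * n_experts
# One bias value per expert, laid out group by group.
bias = [float(i) for i in range(n_experts)]

# Buggy: every expert read from the first group's segment, so only
# bias[0:n_per_group] was ever used.
buggy = [scores[e] + bias[e % n_per_group] for e in range(n_experts)]

# Fixed: expert e in group e // n_per_group reads its own group's segment.
fixed = [scores[e] + bias[(e // n_per_group) * n_per_group + (e % n_per_group)]
         for e in range(n_experts)]

print(buggy)  # [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
print(fixed)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
```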
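For reference, the sigmoid+scale+sum_rows+div chain fused by this commit computes normalized gate weights. A minimal unfused Python sketch (the function name and signature are illustrative, not the ggml API):

```python
import math

def gate_weights(logits, scale):
    """Unfused reference for the sigmoid -> scale -> sum_rows -> div chain."""
    # sigmoid over the row of logits
    g = [1.0 / (1.0 + math.exp(-x)) for x in logits]
    # scale each gate
    g = [scale * x for x in g]
    # sum_rows followed by div: normalize so the row sums to 1
    s = sum(g)
    return [x / s for x in g]

w = gate_weights([0.0, 0.0, 0.0, 0.0], 2.0)
print(w)  # [0.25, 0.25, 0.25, 0.25]
```

Fusing these four ops into one kernel avoids materializing three intermediate tensors on the device.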

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
2025-10-22 16:18:11 +03:00
cmake Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
include Grouped expert routing (CPU only) (#836) 2025-10-16 14:57:02 +03:00
src Fuse add+add+fused_rms (#853) 2025-10-22 16:18:11 +03:00
.gitignore Merge mainline llama.cpp (#3) 2024-07-27 07:55:01 +02:00
CMakeLists.txt Set default value of GGML_SCHED_MAX_COPIES to 1 (#751) 2025-09-02 07:04:39 +02:00