### ✨ [#626](https://github.com/ikawrakow/ik_llama.cpp/issues/626) - Feature Request: Add IQK GEMM for IQ1_M
| **Author** | `ikawrakow` |
| :--- | :--- |
| **State** | ❌ **Closed** |
| **Created** | 2025-07-18 |
| **Updated** | 2025-07-18 |
---
#### Description
### Prerequisites
- [x] I am running the latest code. Mention the version if possible as well.
- [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the [Discussions](https://github.com/ggerganov/llama.cpp/discussions), and have a new and useful enhancement to share.
### Feature Description
Quite a few people are trying to run Unsloth models that contain tensors quantized with `IQ1_M`. In addition, there are now the quantization recipes prepared by the @Thireus GGUF suite, which also tend to contain `IQ1_M` when a low bpw (bits per weight) has been requested.
When a model contains `IQ1_M` FFN tensors and `-fmoe` is specified, `ik_llama.cpp` will crash with an assert when the number of tokens processed by one of the routed experts is less than 32. This is due to the fused `ffn_up+ffn_gate` op assuming the presence of an IQK GEMM kernel, which is not implemented.
So, either add IQK GEMM for `IQ1_M`, or at least guard against the absence of a GEMM kernel in the fused `ffn_up+ffn_gate` op CPU implementation.
### Motivation
Quite a few people are trying to run Unsloth models that contain tensors quantized with `IQ1_M`. In addition, there are now the quantization recipes prepared by the @Thireus GGUF suite, which also tend to contain `IQ1_M` when a low bpw has been requested.
When a model contains `IQ1_M` FFN tensors and `-fmoe` is specified, `ik_llama.cpp` will crash with an assert when the number of tokens processed by one of the routed experts is less than 32. This is due to the fused `ffn_up+ffn_gate` op assuming the presence of an IQK GEMM kernel, which is not implemented.
### Possible Implementation
Either add IQK GEMM for `IQ1_M`, or at least guard against the absence of a GEMM kernel in the fused `ffn_up+ffn_gate` op CPU implementation.
---
#### 💬 Conversation
👤 **ubergarm** commented the **2025-07-18** at **14:43:32**:
I won't open a new issue regarding Unsloth's Kimi-K2-Instruct-IQ1_S failing with `-fmoe`, as discussed on other threads here and [reported on Hugging Face here](https://github.com/ikawrakow/ik_llama.cpp/issues/626). I also recreated the issue and observed that removing `-fmoe` allows that model to run.
I confirmed using the gguf-dump.py script that the model in question indeed has a handful of IQ1_M ffn tensors:
```bash
$ cat logs/gguf-dump-Kimi-K2-Instruct-UD-IQ1_S-0000* | grep IQ1_M
163: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.18.ffn_gate_exps.weight
167: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.18.ffn_up_exps.weight
111: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.50.ffn_gate_exps.weight
115: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.50.ffn_up_exps.weight
129: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.51.ffn_gate_exps.weight
133: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.51.ffn_up_exps.weight
147: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.52.ffn_gate_exps.weight
151: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.52.ffn_up_exps.weight
165: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.53.ffn_gate_exps.weight
169: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.53.ffn_up_exps.weight
183: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.54.ffn_gate_exps.weight
187: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.54.ffn_up_exps.weight
21: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.56.ffn_gate_exps.weight
25: 5637144576 | 7168, 2048, 384, 1 | IQ1_M | blk.56.ffn_up_exps.weight
```
Given that the "unsloth dynamic" approach varies the quantization type (and hence tensor size) up and down across layers for the same tensor name, it wasn't obvious from the first GGUF splits that the model contained IQ1_M.
---
👤 **ikawrakow** commented the **2025-07-18** at **14:46:02**:
I created issue #626 for this, so no need to add another one.
---
👤 **ubergarm** commented the **2025-07-18** at **17:34:41**:
Confirmed I can now run Unsloth's `Kimi-K2-Instruct-UD-IQ1_S-00001-of-00006.gguf` with `-fmoe`! Thanks!
```
$ ./build/bin/llama-server --version
version: 3808 (38012f72)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
```