### 📝 [#353](https://github.com/ikawrakow/ik_llama.cpp/issues/353) - Binary releases for Windows?
| **Author** | `lbarasc` |
| :--- | :--- |
| **State** | ✅ **Open** |
| **Created** | 2025-04-28 |
| **Updated** | 2025-06-06 |
---
#### Description
Hi,
Could you release Windows binaries built for different CPU feature levels (AVX, AVX2, etc.)?
Thank you.
---
#### 💬 Conversation
👤 **ikawrakow** commented the **2025-04-29** at **13:55:36**:
If this repository gains more momentum and there are users testing on Windows and providing feedback, sure, we can consider releasing Windows binaries.
But in the meantime
* I don't have access to a Windows machine
* This is just a hobby project that does not have the funds to go out and rent something in the cloud
* I don't feel OK releasing builds that were never tested
Another thing is that this project does not aim at providing the broad hardware support that mainline `llama.cpp` offers. The optimizations here are targeted towards newer CPUs and GPUs. For instance, a CPU old enough to not support `AVX2` will not benefit at all from this project compared to mainline `llama.cpp`.
---
👤 **PmNz8** commented the **2025-04-30** at **22:54:13**:
I managed to compile the CPU build from source on Windows, but not the CUDA build - that is above my skill level. Having compiled binaries available on GitHub (ideally built automatically) would be great! I can always test binaries if that would be helpful: one of my machines runs an Intel CPU with AVX-512 (Rocket Lake), the other is AMD Zen 3 + an Nvidia Ada GPU.
---
👤 **saood06** commented the **2025-05-01** at **07:32:23**:
> * I don't have access to a Windows machine
> * I don't feel OK releasing builds that were never tested
If you want to do occasional releases (since we don't have CI like mainline does, which generates over a dozen Windows builds), I can provide Windows builds made with MSVC 2019 and CUDA 12.1 with AVX2 that have been tested, and also Android builds. I could try cross-compiling with AVX512, but those wouldn't be tested. (I know [this](https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html) exists, but I've never used it and don't know how much of a slowdown it would incur.)
---
👤 **SpookyT00th** commented the **2025-05-01** at **22:11:05**:
I noticed you mentioned that this is intended to support newer GPUs. Do you know if the Nvidia V100 (Volta architecture) is supported? Also, does this support tensor parallelism? I want to fit this model across 128 GB of VRAM: https://huggingface.co/ubergarm/Qwen3-235B-A22B-GGUF
---
👤 **saood06** commented the **2025-05-02** at **03:05:53**:
>also, does this support tensor parallelism? i want to fit this model across 128GB VRAM : https://huggingface.co/ubergarm/Qwen3-235B-A22B-GGUF
For MoE models such as the one you linked, `-split-mode row` does not function, see https://github.com/ikawrakow/ik_llama.cpp/issues/254
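Until that issue is resolved, multi-GPU use with MoE models has to fall back on the default layer split. A hypothetical invocation sketch (model path, split ratios, and layer count below are placeholders, not tested values):

```shell
# Layer-split across two GPUs instead of row split, which #254 reports
# as non-functional for MoE models. Paths and ratios are placeholders.
./llama-server \
  -m ./Qwen3-235B-A22B-IQ4_XS.gguf \
  --split-mode layer \
  --tensor-split 1,1 \
  -ngl 99
```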
---
👤 **sousekd** commented the **2025-05-29** at **20:39:13**:
I would be happy to test on AMD Epyc Turin + RTX 4090 / RTX Pro 6000, if builds are provided.
---
👤 **Thireus** commented the **2025-06-03** at **17:54:35**:
If anyone wants to give the builds I've created a go, please report back on whether they work decently: https://github.com/Thireus/ik_llama.cpp/releases
Using CUDA 12.8 (and Blackwell compatible) + `-DGGML_AVX512=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1`
See https://github.com/Thireus/ik_llama.cpp/blob/main/.github/workflows/release.yml#L448-L450
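For anyone who would rather reproduce a similar build locally than download the release, a minimal sketch of the CMake invocation using the flags quoted above (this assumes the CUDA 12.8 toolkit and a Visual Studio toolchain are already installed; the exact generator and paths will vary per machine):

```shell
REM Sketch of a Windows CUDA build with the flags from the comment above.
git clone https://github.com/Thireus/ik_llama.cpp
cd ik_llama.cpp
cmake -B build ^
  -DGGML_CUDA=ON ^
  -DGGML_AVX512=ON ^
  -DGGML_SCHED_MAX_COPIES=1 ^
  -DGGML_CUDA_IQK_FORCE_BF16=1
cmake --build build --config Release
```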
---
👤 **lbarasc** commented the **2025-06-03** at **19:25:40**:
Well, thank you!! I will test this on my server.
---
👤 **ikawrakow** commented the **2025-06-05** at **07:05:32**:
How is the testing going here?
@Thireus
On `x86_64` the CPU implementation has basically two implementation paths:
* Vanilla `AVX2`, so `/arch:AVX2` for MSVC.
* "Fancy AVX512", which requires `/arch:AVX512`, plus `__AVX512VNNI__`, `__AVX512VL__`, `__AVX512BW__` and `__AVX512DQ__` being defined (if they are not defined, the implementation will use vanilla `AVX2`). These are supported on Zen4/Zen5 CPUs, and I guess some recent Intel CPUs. On Linux they will get defined with `-march=native` if the CPU supports them, not sure how this works under Windows.
There is also GEMM/GEMV implementation for CPUs natively supporting `bf16` (e.g., Zen4/Zen5 and some recent Intel CPUs). To be turned on it requires `__AVX512BF16__` to be defined.
So, to cover pre-built binaries for Windows users, one would need 6 different builds: vanilla `AVX2`, fancy `AVX512` without `bf16`, and fancy `AVX512` with `bf16`, each with and without CUDA (without CUDA for users who don't have a supported GPU and don't want to get involved with installing CUDA toolkits and such just to run the app).
---
👤 **PmNz8** commented the **2025-06-06** at **19:01:35**:
@Thireus for me your binaries do not run. I tried something simple like `.\llama-cli.exe -m "D:\LLMs\bartowski\Qwen_Qwen3-4B-GGUF\Qwen_Qwen3-4B-Q8_0.gguf"` and all I get in the log is:
```
[1749236397] Log start
[1749236397] Cmd: C:\Users\dawidgaming\Downloads\ik_llama-main-b3770-5a8bb97-bin-win-cuda-12.8-x64\llama-cli.exe -m D:\LLMs\bartowski\Qwen_Qwen3-4B-GGUF\Qwen_Qwen3-4B-Q8_0.gguf
[1749236397] main: build = 1 (5a8bb97)
[1749236397] main: built with MSVC 19.29.30159.0 for
[1749236397] main: seed = 1749236397
[1749236397] main: llama backend init
[1749236397] main: load the model and apply lora adapter, if any
```
Then it just shuts down.
Windows 11 + RTX 4090 @ 576.52 drivers.
---
👤 **kiron111** commented the **2025-06-06** at **19:55:45**:
> If anyone wants to give a go to the build I've created, and report back if it works decently... https://github.com/Thireus/ik_llama.cpp/releases
>
> Using CUDA 12.8 (and Blackwell compatible) + `-DGGML_AVX512=ON -DGGML_SCHED_MAX_COPIES=1 -DGGML_CUDA_IQK_FORCE_BF16=1` See https://github.com/Thireus/ik_llama.cpp/blob/main/.github/workflows/release.yml#L448-L450
Thanks
It's great; I had been stuck for hours failing to compile the CUDA version myself.