InvokeAI/invokeai/backend/quantization
Alexander Eichhorn 3b2d2ef10a
fix(gguf): ensure dequantized tensors are on correct device for MPS (#8713)
When using GGUF-quantized models on MPS (Apple Silicon), the dequantized tensors could end up on a different device than the other operands in math operations, causing "Expected all tensors to be on the same device" errors.

This fix ensures that after dequantization, tensors are moved to the same device as the other tensors in the operation.

Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>
2026-01-02 00:45:50 +00:00
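The device-alignment idea described above can be sketched as follows. This is a minimal illustration, not InvokeAI's actual GGUF code path: `dequantize_to_device` and the simple scale-multiply dequantization are hypothetical stand-ins, and the point is only the final `.to(...)` that moves the result onto the device of the tensor it will be combined with.

```python
import torch


def dequantize_to_device(quantized: torch.Tensor, scale: torch.Tensor,
                         target: torch.Tensor) -> torch.Tensor:
    """Dequantize `quantized` and place the result on `target`'s device.

    Hypothetical sketch: real GGUF dequantization unpacks block-quantized
    weights; a plain scale multiply stands in for that here.
    """
    # Dequantization may run on CPU even when `target` lives on MPS/CUDA,
    # so align device (and dtype) before any math op mixes the two.
    weight = quantized.float() * scale
    return weight.to(device=target.device, dtype=target.dtype)


# Usage: align a CPU-dequantized weight with an activation tensor
# (on MPS this would otherwise raise the cross-device error above).
x = torch.randn(4, 8)  # stands in for an MPS activation
q = torch.randint(-8, 8, (8, 8), dtype=torch.int8)
w = dequantize_to_device(q, torch.tensor(0.1), x)
y = x @ w.T  # both operands now share a device
```

On a machine without MPS this runs entirely on CPU, but the `.to(device=target.device, ...)` call is the same operation that prevents the mixed-device failure when `target` is on `mps`.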
gguf/            fix(gguf): ensure dequantized tensors are on correct device for MPS (#8713) | 2026-01-02
scripts/         refactor: model manager v3 (#8607) | 2025-10-15
__init__.py      Move requantize.py to the quantization/ dir. | 2024-08-26
bnb_llm_int8.py  Simplify the state management in InvokeLinear8bitLt and add unit tests, in preparation for wrapping it to support streaming of weights from CPU to GPU. | 2024-12-24
bnb_nf4.py       Install subdirectories correctly; ensure consistent dtype of tensors in the Flux pipeline and VAE. | 2024-08-26