InvokeAI

mirror of https://github.com/invoke-ai/InvokeAI synced 2026-03-05 06:29:09 +01:00

Author	SHA1	Message	Date
Alexander Eichhorn	3b2d2ef10a	fix(gguf): ensure dequantized tensors are on correct device for MPS (#8713) When using GGUF-quantized models on MPS (Apple Silicon), the dequantized tensors could end up on a different device than the other operands in math operations, causing "Expected all tensors to be on the same device" errors. This fix ensures that after dequantization, tensors are moved to the same device as the other tensors in the operation. Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>	2026-01-02 00:45:50 +00:00
Lincoln Stein	b9493ddce7	Workaround for Windows being unable to remove tmp directories when installing GGUF files (#8699) * (bugfix)(mm) work around Windows being unable to rmtree tmp directories after GGUF install * (style) fix ruff error * (fix) add workaround for Windows Permission Denied on GGUF file move() call * (fix) perform torch copy() in GGUF reader to avoid deletion failures on Windows * (style) fix ruff formatting issues	2025-12-26 02:02:39 +00:00
Alexander Eichhorn	280202908a	feat: Add GGUF quantized Z-Image support and improve VAE/encoder flexibility Add comprehensive support for GGUF quantized Z-Image models and improve component flexibility: Backend: - New Main_GGUF_ZImage_Config for GGUF quantized Z-Image transformers - Z-Image key detection (_has_z_image_keys) to identify S3-DiT models - GGUF quantization detection and sidecar LoRA patching for quantized models - Qwen3Encoder_Qwen3Encoder_Config for standalone Qwen3 encoder models Model Loader: - Split Z-Image model	2025-12-02 20:31:11 +01:00
gogurtenjoyer	382d85ee23	Fix memory issues when installing models on Windows (#8652) * Wrap GGUF loader for context managed close() Wrap gguf.GGUFReader and then use a context manager to load memory-mapped GGUF files, so that they will automatically close properly when no longer needed. Should prevent the 'file in use in another process' errors on Windows. * Additional check for cached state_dict Additional check for cached state_dict as path is now optional - should solve model manager 'missing' this and the resultant memory errors. * Appease ruff * Further ruff appeasement * ruff * loaders.py fix for linux No longer attempting to delete internal object. * loaders.py - one more _mmap ref removed --------- Co-authored-by: Lincoln Stein <lincoln.stein@gmail.com>	2025-11-16 09:25:52 -05:00
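The context-managed loading described in commit 382d85ee23 can be illustrated with a minimal sketch. It uses the standard-library `mmap` module as a stand-in for the wrapped `gguf.GGUFReader`; the helper names `open_mapped` and `read_magic_then_delete` are hypothetical and are not taken from the InvokeAI code. The point is only that closing the mapping deterministically, and copying data out before it closes, lets the file be removed afterwards, which is where Windows otherwise reports the file as in use.

```python
import mmap
import os
from contextlib import contextmanager


@contextmanager
def open_mapped(path):
    """Yield a read-only memory map of `path`; close the map and file on exit."""
    with open(path, "rb") as f:
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            yield mapped
        finally:
            # Explicit close releases the mapping instead of waiting for GC.
            mapped.close()


def read_magic_then_delete(path):
    """Hypothetical helper: copy the first 4 bytes out, then delete the file."""
    with open_mapped(path) as m:
        magic = bytes(m[:4])  # copy, so nothing references the map afterwards
    os.remove(path)  # safe: the mapping and file handle are already closed
    return magic
```

Copying data out of the mapping before it closes mirrors the later fix in b9493ddce7, which performs a copy in the GGUF reader to avoid deletion failures on Windows.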
psychedelicious	454d05bbde	refactor: model manager v3 (#8607) * feat(mm): add UnknownModelConfig * refactor(ui): move model categorisation-ish logic to central location, simplify model manager models list * refactor(ui): more cleanup of model categories * refactor(ui): remove unused excludeSubmodels I can't remember what this was for and don't see any reference to it. Maybe it's just remnants from a previous implementation? * feat(nodes): add unknown as model base * chore(ui): typegen * feat(ui): add unknown model base support in ui * feat(ui): allow changing model type in MM, fix up base and variant selects * feat(mm): omit model description instead of making it "base type filename model" * feat(app): add setting to allow unknown models * feat(ui): allow changing model format in MM * feat(app): add the installed model config to install complete events * chore(ui): typegen * feat(ui): toast warning when installed model is unidentified * docs: update config docstrings * chore(ui): typegen * tests(mm): fix test for MM, leave the UnknownModelConfig class in the list of configs * tidy(ui): prefer types from zod schemas for model attrs * chore(ui): lint * fix(ui): wrong translation string * feat(mm): normalized model storage Store models in a flat directory structure. Each model is in a dir named its unique key (a UUID). Inside that dir is either the model file or the model dir. 
* feat(mm): add migration to flat model storage * fix(mm): normalized multi-file/diffusers model installation no worky now worky * refactor: port MM probes to new api - Add concept of match certainty to new probe - Port CLIP Embed models to new API - Fiddle with stuff * feat(mm): port TIs to new API * tidy(mm): remove unused probes * feat(mm): port spandrel to new API * fix(mm): parsing for spandrel * fix(mm): loader for clip embed * fix(mm): tis use existing weight_files method * feat(mm): port vae to new API * fix(mm): vae class inheritance and config_path * tidy(mm): patcher types and import paths * feat(mm): better errors when invalid model config found in db * feat(mm): port t5 to new API * feat(mm): make config_path optional * refactor(mm): simplify model classification process Previously, we had a multi-phase strategy to identify models from their files on disk: 1. Run each model config class's `matches()` method on the files. It checks if the model could possibly be identified as the candidate model type. This was intended to be a quick check. Break on the first match. 2. If we have a match, run the config class's `parse()` method. It derives some additional model config attrs from the model files. This was intended to encapsulate heavier operations that may require loading the model into memory. 3. Derive the common model config attrs, like name, description, calculate the hash, etc. Some of these are also heavier operations. This strategy has some issues: - It is not clear how the pieces fit together. There is some back-and-forth between different methods and the config base class. It is hard to trace the flow of logic until you fully wrap your head around the system and therefore difficult to add a model architecture to the probe. - The assumption that we could do quick, lightweight checks before heavier checks is incorrect. We often _must_ load the model state dict in the `matches()` method. 
So there is no practical perf benefit to splitting up the responsibility of `matches()` and `parse()`. - Sometimes we need to do the same checks in `matches()` and `parse()`. In these cases, splitting the logic has a negative perf impact because we are doing the same work twice. - As we introduce the concept of an "unknown" model config (i.e. a model that we cannot identify, but still record in the db; see #8582), we will _always_ run _all_ the checks for every model. Therefore we need not try to defer heavier checks or resource-intensive ops like hashing. We are going to do them anyways. - There are situations where a model may match multiple configs. One known case is SD pipeline models with merged LoRAs. In the old probe API, we relied on the implicit order of checks to know that if a model matched for pipeline _and_ LoRA, we prefer the pipeline match. But, in the new API, we do not have this implicit ordering of checks. To resolve this in a resilient way, we need to get all matches up front, then use tie-breaker logic to figure out which should win (or add "differential diagnosis" logic to the matchers). - Field overrides weren't handled well by this strategy. They were only applied at the very end, if a model matched successfully. This means we cannot tell the system "Hey, this model is type X with base Y. Trust me bro.". We cannot override the match logic. As we move towards letting users correct mis-identified models (see #8582), this is a requirement. We can simplify the process significantly and better support "unknown" models. Firstly, model config classes now have a single `from_model_on_disk()` method that attempts to construct an instance of the class from the model files. This replaces the `matches()` and `parse()` methods. If we fail to create the config instance, a special exception is raised that indicates why we think the files cannot be identified as the given model config class. 
Next, the flow for model identification is a bit simpler: - Derive all the common fields up-front (name, desc, hash, etc). - Merge in overrides. - Call `from_model_on_disk()` for every config class, passing in the fields. Overrides are handled in this method. - Record the results for each config class and choose the best one. The identification logic is a bit more verbose, with the special exceptions and handling of overrides, but it is very clear what is happening. The one downside I can think of for this strategy is we do need to check every model type, instead of stopping at the first match. It's a bit less efficient. In practice, however, this isn't a hot code path, and the improved clarity is worth far more than perf optimizations that the end user will likely never notice. * refactor(mm): remove unused methods in config.py * refactor(mm): add model config parsing utils * fix(mm): abstractmethod bork * tidy(mm): clarify that model id utils are private * fix(mm): fall back to UnknownModelConfig correctly * feat(mm): port CLIPVisionDiffusersConfig to new api * feat(mm): port SigLIPDiffusersConfig to new api * feat(mm): make match helpers more succinct * feat(mm): port flux redux to new api * feat(mm): port ip adapter to new api * tidy(mm): skip optimistic override handling for now * refactor(mm): continue iterating on config * feat(mm): port flux "control lora" and t2i adapter to new api * tidy(ui): use Extract to get model config types * fix(mm): t2i base determination * feat(mm): port cnet to new api * refactor(mm): add config validation utils, make it all consistent and clean * feat(mm): wip port of main models to new api * feat(mm): wip port of main models to new api * feat(mm): wip port of main models to new api * docs(mm): add todos * tidy(mm): removed unused model merge class * feat(mm): wip port main models to new api * tidy(mm): clean up model heuristic utils * tidy(mm): clean up ModelOnDisk caching * tidy(mm): flux lora format util * refactor(mm): make 
config classes narrow Simpler logic to identify, less complexity to add new model, fewer useless attrs that do not relate to the model arch, etc * refactor(mm): diffusers loras w * feat(mm): consistent naming for all model config classes * fix(mm): tag generation & scattered probe fixes * tidy(mm): consistent class names * refactor(mm): split configs into separate files * docs(mm): add comments for identification utils * chore(ui): typegen * refactor(mm): remove legacy probe, new configs dir structure, update imports * fix(mm): inverted condition * docs(mm): update docstrings in factory.py * docs(mm): document flux variant attr * feat(mm): add helper method for legacy configs * feat(mm): satisfy type checker in flux denoise * docs(mm): remove extraneous comment * fix(mm): ensure unknown model configs get unknown attrs * fix(mm): t5 identification * fix(mm): sdxl ip adapter identification * feat(mm): more flexible config matching utils * fix(mm): clip vision identification * feat(mm): add sanity checks before probing paths * docs(mm): add reminder for self for field migrations * feat(mm): clearer naming for main config class hierarchy * feat(mm): fix clip vision starter model bases, add ref to actual models * feat(mm): add model config schema migration logic * fix(mm): duplicate import * refactor(mm): split big migration into 3 Split the big migration that did all of these things into 3: - Migration 22: Remove unique constraint on base/name/type in models table - Migration 23: Migrate configs to v6.8.0 schemas - Migration 24: Normalize file storage * fix(mm): pop base/type/format when creating unknown model config * fix(db): migration 22 insert only real cols * fix(db): migration 23 fall back to unknown model when config change fails * feat(db): run migrations 23 and 24 * fix(mm): false negative on flux lora * fix(mm): vae checkpoint probe checking for dir instead of file * fix(mm): ModelOnDisk skips dirs when looking for weights Previously a path w/ any of the 
known weights suffixes would be seen as a weights file, even if it was a directory. We now check to ensure the candidate path is actually a file before adding it to the list of weights. * feat(mm): add method to get main model defaults from a base * feat(mm): do not log when multiple non-unknown model matches * refactor(mm): continued iteration on model identification * tests(mm): refactor model identification tests Overhaul of model identification (probing) tests. Previously we didn't test the correctness of probing except in a few narrow cases - now we do. See tests/model_identification/README.md for a detailed overview of the new test setup. It includes instructions for adding a new test case. In brief: - Download the model you want to add as a test case - Run a script against it to generate the test model files - Fill in the expected model type/format/base/etc in the generated test metadata JSON file Included test cases: - All starter models - A handful of other models that I had installed - Models present in the previous test cases as smoke tests, now also tested for correctness * fix(mm): omit type/format/base when creating unknown config instance * feat(mm): use ValueError for model id sanity checks * feat(mm): add flag for updating models to allow class changes * tests(mm): fix remaining MM tests * feat: allow users to edit models freely * feat(ui): add warning for model settings edit * tests(mm): flux state dict tests * tidy: remove unused file * fix(mm): lora state dict loading in model id * feat(ui): use translation string for model edit warning * docs(db): update version numbers in migration comments * chore: bump version to v6.9.0a1 * docs: update model id readme * tests(mm): attempt to fix windows model id tests * fix(mm): issue with deleting single file models * feat(mm): just delete the dir w/ rmtree when deleting model * tests(mm): windows CI issue * fix(ui): typegen schema sync * fix(mm): fixes for migration 23 - Handle CLIP Embed and Main SD 
models missing variant field - Handle errors when calling the discriminator function, previously only handled ValidationError but it could be a ValueError or something else - Better logging for config migration * chore: bump version to v6.9.0a2 * chore: bump version to v6.9.0a3	2025-10-15 10:18:53 +11:00
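The identification flow described in commit 454d05bbde (derive common fields, merge overrides, call `from_model_on_disk()` on every config class, then pick the best match) might look roughly like the sketch below. Only the method name `from_model_on_disk()` comes from the commit message. Every class name, the `NotAMatchError` exception, the key markers, the `PREFERENCE` tie-breaker and the choice to trust a `type` override outright are assumptions made for illustration, and a set of state-dict key names stands in for real model files.

```python
import hashlib
from dataclasses import dataclass
from typing import Any, ClassVar, Optional


class NotAMatchError(Exception):
    """Hypothetical stand-in for the 'special exception' explaining a failed match."""


@dataclass
class BaseSketchConfig:
    name: str
    file_hash: str
    type: ClassVar[str] = "unknown"
    marker: ClassVar[str] = ""  # substring expected in some state-dict key

    @classmethod
    def from_model_on_disk(cls, keys: set, fields: dict) -> "BaseSketchConfig":
        override = fields.get("type")
        if override is not None and override != cls.type:
            raise NotAMatchError(f"override asks for {override!r}, not {cls.type!r}")
        # Assumption: an explicit override is trusted; otherwise inspect the keys.
        if override is None and not any(cls.marker in k for k in keys):
            raise NotAMatchError(f"no key contains {cls.marker!r}")
        return cls(name=fields["name"], file_hash=fields["file_hash"])


class MainSketchConfig(BaseSketchConfig):
    type = "main"
    marker = "unet."


class LoRASketchConfig(BaseSketchConfig):
    type = "lora"
    marker = "lora_"


CONFIG_CLASSES = [LoRASketchConfig, MainSketchConfig]  # order does not matter
PREFERENCE = ["main", "lora"]  # tie-breaker: pipeline wins over a merged LoRA


def identify(name: str, keys: set, overrides: Optional[dict] = None) -> BaseSketchConfig:
    # 1. Derive the common fields up front.
    digest = hashlib.sha256(",".join(sorted(keys)).encode()).hexdigest()
    fields: dict = {"name": name, "file_hash": digest}
    # 2. Merge in overrides.
    fields.update(overrides or {})
    # 3. Try every config class and record the successful matches.
    matches = []
    for cls in CONFIG_CLASSES:
        try:
            matches.append(cls.from_model_on_disk(keys, fields))
        except NotAMatchError:
            continue
    # 4. Fall back to "unknown", or choose the best match.
    if not matches:
        return BaseSketchConfig(name=fields["name"], file_hash=fields["file_hash"])
    return min(matches, key=lambda m: PREFERENCE.index(m.type))
```

With these toy classes, a key set containing both `unet.` and `lora_` keys resolves to the `main` config through the tie-breaker, while passing `{"type": "lora"}` as an override forces the LoRA config for the same keys.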
jiangmencity	5259693ed1	chore: fix some comments Signed-off-by: jiangmencity <jiangmen@52it.net>	2025-08-14 09:32:54 +10:00
Kevin Turner	8bd52ed744	fix: improve gguf performance with torch.compile pytorch 2.7 does not implement `set.__contains__`, so make this a list instead. See https://github.com/pytorch/pytorch/issues/145761	2025-05-22 13:42:09 +10:00
David Burnett	6c0bd7d150	fix import ordering, remove code I reverted that the resync added back	2025-05-19 11:16:23 +10:00
David Burnett	99e154d773	fix picky ruff issue	2025-05-19 11:16:23 +10:00
David Burnett	e4e43ae126	fix missing bracket	2025-05-19 11:16:23 +10:00
David Burnett	a07fac6180	raise expected exception when attempting to change dtype	2025-05-19 11:16:23 +10:00
David Burnett	93d4b00082	Add `to` overload for GGMLTensor, so calling `to` on the model moves the quantized data as well	2025-05-19 11:16:23 +10:00
David Burnett	86719f2065	revert `to` overload due to failing tests, use Torch futures instead	2025-05-19 11:16:23 +10:00
David Burnett	5271fc1cac	fix picky ruff issue	2025-05-19 11:16:23 +10:00
David Burnett	96ff7d9093	fix missing bracket	2025-05-19 11:16:23 +10:00
David Burnett	6f73d9e9c6	raise expected exception when attempting to change dtype	2025-05-19 11:16:23 +10:00
David Burnett	29b406a84b	Add `to` overload for GGMLTensor, so calling `to` on the model moves the quantized data as well	2025-05-19 11:16:23 +10:00
Ryan Dick	5ea7953537	Update GGMLTensor with ops necessary to work with ConcatenatedLoRALayer.	2025-01-28 14:51:35 +00:00
Ryan Dick	a8b2c4c3d2	Add inference tests for all custom module types (i.e. to test autocasting from cpu to device).	2024-12-26 18:33:46 +00:00
Ryan Dick	3f990393a1	Simplify the state management in InvokeLinear8bitLt and add unit tests. This is in preparation for wrapping it to support streaming of weights from cpu to gpu.	2024-12-24 14:32:11 +00:00
Ryan Dick	65fcbf5f60	Bump bitsandbytes. The new version contains improvements to state_dict loading/saving for LLM.int8 and promises improved speed on some HW.	2024-12-24 14:32:11 +00:00
Ryan Dick	9369b39a12	Add GGMLTensor op.	2024-12-17 13:20:19 +00:00
David Burnett	9bd17ea02f	Get flux working with MPS on 2.4.1, with GGUF support	2024-10-23 10:20:42 +11:00
Brandon Rising	d328eaf743	Remove no longer used dequantize_tensor function	2024-10-02 18:33:05 -04:00
Ryan Dick	bc63e2acc5	Add workaround for FLUX GGUF models with incorrect img_in.weight shape.	2024-10-02 18:33:05 -04:00
Ryan Dick	ec7e771942	Add a compute_dtype field to GGMLTensor.	2024-10-02 18:33:05 -04:00
Ryan Dick	fe84013392	Add unit tests for GGMLTensor.	2024-10-02 18:33:05 -04:00
Ryan Dick	710f81266b	Fix type errors in GGMLTensor.	2024-10-02 18:33:05 -04:00
Brandon Rising	446e2884bc	Remove no longer used code paths, general cleanup of new dequantization code, update probe	2024-10-02 18:33:05 -04:00
Brandon Rising	7d9f125232	Run ruff and update imports	2024-10-02 18:33:05 -04:00
Brandon Rising	66bbd62758	Run ruff and fix typing in torch patcher	2024-10-02 18:33:05 -04:00
Brandon Rising	0875e861f5	Various updates to gguf performance	2024-10-02 18:33:05 -04:00
Ryan Dick	f06765dfba	Get alternative GGUF implementation working... barely.	2024-10-02 18:33:05 -04:00
Ryan Dick	f347b26999	Initial experimentation with Tensor-like extension for GGUF.	2024-10-02 18:33:05 -04:00
Brandon Rising	2bfb0ddff5	Initial GGUF support for flux models	2024-10-02 18:33:05 -04:00
Ryan Dick	29fe1533f2	Fix bug in InvokeLinear8bitLt that was causing old state information to persist after loading from a state dict. This manifested as state tensors being left on the GPU even when a model had been offloaded to the CPU cache.	2024-08-29 19:08:18 +00:00
Brandon Rising	65bb46bcca	Rename params for flux and flux vae, add comments explaining use of the config_path in model config	2024-08-26 20:17:50 -04:00
Ryan Dick	635d2f480d	ruff	2024-08-26 20:17:50 -04:00
Brandon Rising	56b9906e2e	Setup scaffolding for in progress images and add ability to cancel the flux node	2024-08-26 20:17:50 -04:00
Ryan Dick	dff4a88baa	Move quantization scripts to a scripts/ subdir.	2024-08-26 20:17:50 -04:00
Ryan Dick	a21f6c4964	Update docs for T5 quantization script.	2024-08-26 20:17:50 -04:00
Ryan Dick	97562504b7	Remove all references to optimum-quanto and downgrade diffusers.	2024-08-26 20:17:50 -04:00
Ryan Dick	b9dd354e2b	Fixes to the T5XXL quantization script.	2024-08-26 20:17:50 -04:00
Ryan Dick	33c2fbd201	Add script for quantizing a T5 model.	2024-08-26 20:17:50 -04:00
Ryan Dick	b66f19d4d1	Add docs to the quantization scripts.	2024-08-26 20:17:50 -04:00
Ryan Dick	4105a78b83	Update load_flux_model_bnb_llm_int8.py to work with a single-file FLUX transformer checkpoint.	2024-08-26 20:17:50 -04:00
Ryan Dick	19a68afb3a	Fix bug in InvokeInt8Params that was causing it to use double the necessary VRAM.	2024-08-26 20:17:50 -04:00
Ryan Dick	cfac7c8189	Move requantize.py to the quantization/ dir.	2024-08-26 20:17:50 -04:00
Ryan Dick	ac96f187bd	Remove duplicate log_time(...) function.	2024-08-26 20:17:50 -04:00
Brandon Rising	57168d719b	Fix styling/lint	2024-08-26 20:17:50 -04:00

59 Commits