Commit Graph

46 Commits

Author SHA1 Message Date
Alexander Eichhorn
dd5758b53d
fix: SDXL DoRA LoRA fails with enable_partial_loading=true (#9063)
* fix: SDXL DoRA LoRA fails with enable_partial_loading=true

cast_to_device returns plain torch.Tensor instead of torch.nn.Parameter,
causing _aggregate_patch_parameters to replace valid weights with meta
device dummies, falsely triggering DoRA's quantization guard.

Fixes invoke-ai/InvokeAI#8624

* test: regression coverage for DoRA + partial-loading + CPU→device autocast

Adds targeted coverage for the bug fixed in a0a87212 (#8624, PR #9063):

- test_aggregate_patch_parameters_preserves_plain_tensor_with_dora:
  CPU-only unit test that feeds a plain torch.Tensor (as handed in by
  _cast_weight_bias_for_input) into _aggregate_patch_parameters with a
  DoRA patch. Pre-fix, the tensor was replaced by a meta-device dummy,
  tripping DoRA's quantization guard.

- "single_dora" variant in the patch_under_test fixture: exercises the
  full CUDA/MPS autocast hot path via
  test_linear_sidecar_patches_with_autocast_from_cpu_to_device.

---------

Co-authored-by: Jonathan <34005131+JPPhoto@users.noreply.github.com>
2026-04-25 16:33:20 +00:00
Jonathan
0d7205ff79
Handle mixed-dtype mismatches in autocast linear and conv wrappers (#9006)
* Handle CustomConv2d bias dtype mismatches

* Fix mixed-dtype autocast regressions

* Format custom_conv2d with ruff
2026-04-20 20:31:35 +00:00
Jonathan
ee600973ed
Broaden text encoder partial-load recovery (#9034) 2026-04-09 20:09:40 -04:00
Jonathan
dc5007fe95
Fix/model cache Qwen/CogView4 cancel repair (#8959)
* Repair partially loaded Qwen models after cancel to avoid device mismatches

* ruff

* Repair CogView4 text encoder after canceled partial loads

* Avoid MPS CI crash in repair regression test

* Fix MPS device assertion in repair test
2026-03-15 10:04:15 -04:00
copilot-swe-agent[bot]
b7afd9b5b3 Fix test failures caused by MagicMock TypeError
Configure mock logger to return a valid log level for getEffectiveLevel()
to prevent TypeError when comparing with logging.DEBUG constant.

The issue was that ModelCache._log_cache_state() checks
self._logger.getEffectiveLevel() > logging.DEBUG, and when the logger
is a MagicMock without configuration, getEffectiveLevel() returns another
MagicMock, causing a TypeError when compared with an int.

Fixes all 4 test failures in test_model_cache_timeout.py

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 05:42:45 +00:00
Lincoln Stein
9d1de81fe2 (style) correct ruff formatting error 2025-12-24 00:19:25 -05:00
copilot-swe-agent[bot]
8d76b4e4d4 Fix ruff whitespace errors and improve timeout logging
- Remove all trailing whitespace (W293 errors)
- Add debug logging when timeout fires but activity detected
- Add debug logging when timeout fires but cache is empty
- Only log "Clearing model cache" message when actually clearing
- Prevents misleading timeout messages during active generation

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 04:05:57 +00:00
copilot-swe-agent[bot]
c3217d8a08 Address code review feedback
- Remove unused variable in test
- Add clarifying comment for daemon thread setting
- Add detailed comment explaining cache clearing with 1000 GB value
- Improve code documentation

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 00:27:39 +00:00
copilot-swe-agent[bot]
75a14e2a4b Add unit tests for model cache timeout functionality
- Created test_model_cache_timeout.py with comprehensive tests
- Tests timeout clearing behavior
- Tests activity resetting timeout
- Tests no-timeout default behavior
- Tests shutdown canceling timers

Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
2025-12-24 00:24:31 +00:00
psychedelicious
a8a07598c8 chore: ruff 2025-08-18 21:14:00 +10:00
psychedelicious
23206e22e8 tests: skip excessively flaky MPS-specific tests in CI 2025-08-18 21:14:00 +10:00
Ryan Dick
5357d6e08e Rename ConcatenatedLoRALayer to MergedLayerPatch. And other minor cleanup. 2025-01-28 14:51:35 +00:00
Ryan Dick
28514ba59a Update ConcatenatedLoRALayer to work with all sub-layer types. 2025-01-28 14:51:35 +00:00
Ryan Dick
e2f05d0800 Add unit tests for LoKR patch layers. The new tests trigger a bug when LoKR layers are applied to BnB-quantized layers (also impacts several other LoRA variant types). 2025-01-22 09:20:40 +11:00
Ryan Dick
c76d08d1fd Add keep_ram_copy option to CachedModelOnlyFullLoad. 2025-01-16 15:08:23 +00:00
Ryan Dick
04087c38ce Add keep_ram_copy option to CachedModelWithPartialLoad. 2025-01-16 14:51:44 +00:00
Ryan Dick
d7ab464176 Offload the current model when locking if it is already partially loaded and we have insufficient VRAM. 2025-01-07 02:53:44 +00:00
Ryan Dick
402dd840a1 Add seed to flaky unit test. 2025-01-07 00:31:00 +00:00
Ryan Dick
9a0a226ce1 Fix bitsandbytes imports in unit tests on MacOS. 2024-12-30 10:41:48 -05:00
Ryan Dick
52fc5a64d4 Add a unit test for a LoRA patch applied to a quantized linear layer with weights streamed from CPU to GPU. 2024-12-29 17:14:55 +00:00
Ryan Dick
a8bef59699 First pass at making custom layer patches work with weights streamed from the CPU to the GPU. 2024-12-29 17:01:37 +00:00
Ryan Dick
6d49ee839c Switch the LayerPatcher to use 'custom modules' to manage layer patching. 2024-12-29 01:18:30 +00:00
Ryan Dick
918f541af8 Add unit test for a SetParameterLayer patch applied to a CustomFluxRMSNorm layer. 2024-12-28 20:44:48 +00:00
Ryan Dick
93e76b61d6 Add CustomFluxRMSNorm layer. 2024-12-28 20:33:38 +00:00
Ryan Dick
f2981979f9 Get custom layer patches working with all quantized linear layer types. 2024-12-27 22:00:22 +00:00
Ryan Dick
ef970a1cdc Add support for FluxControlLoRALayer in CustomLinear layers and add a unit test for it. 2024-12-27 21:00:47 +00:00
Ryan Dick
5ee7405f97 Add more unit tests for custom module LoRA patching: multiple LoRAs and ConcatenatedLoRALayers. 2024-12-27 19:47:21 +00:00
Ryan Dick
e24e386a27 Add support for patches to CustomModuleMixin and add a single unit test (more to come). 2024-12-27 18:57:13 +00:00
Ryan Dick
b06d61e3c0 Improve custom layer wrap/unwrap logic. 2024-12-27 16:29:48 +00:00
Ryan Dick
7d6ab0ceb2 Add a CustomModuleMixin class with a flag for enabling/disabling autocasting (since it incurs some runtime speed overhead.) 2024-12-26 20:08:30 +00:00
Ryan Dick
9692a36dd6 Use a fixture to parameterize tests in test_all_custom_modules.py so that a fresh instance of the layer under test is initialized for each test. 2024-12-26 19:41:25 +00:00
Ryan Dick
b0b699a01f Add unit test to test that isinstance(...) behaves as expected with custom module types. 2024-12-26 18:45:56 +00:00
Ryan Dick
a8b2c4c3d2 Add inference tests for all custom module types (i.e. to test autocasting from cpu to device). 2024-12-26 18:33:46 +00:00
Ryan Dick
03944191db Split test_autocast_modules.py into separate test files to mirror the source file structure. 2024-12-24 22:29:11 +00:00
Ryan Dick
987c9ae076 Move custom autocast modules to separate files in a custom_modules/ directory. 2024-12-24 22:21:31 +00:00
Ryan Dick
0fc538734b Skip flaky test when running on Github Actions, and further reduce peak unit test memory. 2024-12-24 14:32:11 +00:00
Ryan Dick
7214d4969b Workaround a weird quirk of QuantState.to() and add a unit test to exercise it. 2024-12-24 14:32:11 +00:00
Ryan Dick
a83a999b79 Reduce peak memory used for unit tests. 2024-12-24 14:32:11 +00:00
Ryan Dick
f8a6accf8a Fix bitsandbytes imports to avoid ImportErrors on MacOS. 2024-12-24 14:32:11 +00:00
Ryan Dick
f8ab414f99 Add CachedModelOnlyFullLoad to mirror the CachedModelWithPartialLoad for models that cannot or should not be partially loaded. 2024-12-24 14:32:11 +00:00
Ryan Dick
c6795a1b47 Make CachedModelWithPartialLoad work with models that have non-persistent buffers. 2024-12-24 14:32:11 +00:00
Ryan Dick
0a8fc74ae9 Add CachedModelWithPartialLoad to manage partially-loaded models using the new autocast modules. 2024-12-24 14:32:11 +00:00
Ryan Dick
dc54e8763b Add CustomInvokeLinearNF4 to enable CPU -> GPU streaming for InvokeLinearNF4 layers. 2024-12-24 14:32:11 +00:00
Ryan Dick
1b56020876 Add CustomInvokeLinear8bitLt layer for device streaming with InvokeLinear8bitLt layers. 2024-12-24 14:32:11 +00:00
Ryan Dick
97d56f7dc9 Add torch module autocast unit test for GGUF-quantized models. 2024-12-24 14:32:11 +00:00
Ryan Dick
fe0ef2c27c Add torch module autocast utilities. 2024-12-24 14:32:11 +00:00