I am using the portable version on Windows with the newest driver for the AMD 9070 XT. While running a Wan 2.2 workflow similar in structure to the one you can download through ComfyUI, after a few really slow generations at a low resolution it gives an OOM error, and a message comes up saying Comfy will retry using the tiled VAE encoder instead. This makes my generations 10x faster, but I don't seem to be able to use the tiled encode node in the Wan workflow because it does not plug into any of the existing nodes.
The version of tiled VAE encoding that ComfyUI turns on seems to greatly boost AMD performance and stops the OOM errors. It also vastly increases the speed of Qwen generations: a still went from 31 minutes to 2 minutes. Since Comfy can turn this setting on by itself, I am sure there is a flag that could be integrated on startup.
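For what it's worth, the automatic switch being described is a classic try/except fallback. Here is a rough sketch of that pattern (all names here are hypothetical illustrations, not ComfyUI's actual code):

```python
# Rough sketch of the OOM-fallback pattern described above.
# Hypothetical names; not ComfyUI's actual implementation.
def encode_with_fallback(vae_encode, vae_encode_tiled, pixels):
    """Try a regular VAE encode; on out-of-memory, retry tiled."""
    try:
        return vae_encode(pixels)
    except MemoryError:
        # Tiled encoding processes the image in smaller chunks,
        # keeping peak VRAM usage low.
        print("Warning: Ran out of memory when regular VAE encoding, "
              "retrying with tiled VAE encoding.")
        return vae_encode_tiled(pixels)

# Tiny demo stubs: an encoder that always OOMs, and a tiled one that works.
def oom_encode(pixels):
    raise MemoryError("simulated OOM")

def tiled_encode(pixels):
    return ("tiled", pixels)

result = encode_with_fallback(oom_encode, tiled_encode, "latent-input")
```

The key point is that the fallback only engages after an actual OOM, which is why it takes a few slow runs before the speedup appears.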
My system is Windows 10 (64-bit), 40 GB RAM, a Ryzen 5800X and an AMD 7900 XT.
Also an AMD user here, also a WAN 2.2 workflow. The slowdown happens on both Windows and Linux.
I don’t think that tiled encoding will help much. It fixes the symptom, not the cause. Usually the first run goes through just fine here: VAE encoding is done in the blink of an eye, ten seconds maximum. The second run, however, slows down to ten minutes and more, right at the very first step, the VAE encode.
I have investigated a bit, and the problem seems to be that the VAE does not release the VRAM after the first run. Interestingly, only the WAN 2.2 workflow is affected, as in your case; LTX 2 I2V does not have this problem.
I will continue to investigate.
I might be onto something, but no guarantees; I have thought that more than once by now …
With help from Claude, I am currently here:
Now I see the full picture! free_memory calls partially_unload_ram, but only if ram_to_free > 0:

```python
ram_to_free = 1e32
if not DISABLE_SMART_MEMORY:
    ram_to_free = ram_required - get_free_ram()
```
You have --disable-smart-memory set! That means ram_to_free stays at 1e32, which should actually force RAM to be freed. But ram_required is passed as 0 when free_memory is called!
Without that flag, passing ram_required=0 gives ram_to_free = 0 - get_free_ram(), which is negative, so RAM is never freed.
That’s the actual bug: ComfyUI calls free_memory with ram_required=0 and therefore never frees the RAM.
Relevant files: model_management.py, model_patcher.py
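To make the sign issue concrete, here is a minimal standalone sketch of the arithmetic described above (get_free_ram is stubbed with a made-up value; this is an illustration, not the real model_management.py code):

```python
DISABLE_SMART_MEMORY = False  # i.e. smart memory is enabled

def get_free_ram():
    # Stub: pretend 8 GB of RAM is currently free.
    return 8 * 1024**3

def compute_ram_to_free(ram_required):
    # Mirrors the snippet quoted above.
    ram_to_free = 1e32
    if not DISABLE_SMART_MEMORY:
        ram_to_free = ram_required - get_free_ram()
    return ram_to_free

# With ram_required=0 the result is negative, so a "free RAM" branch
# that only runs when ram_to_free > 0 never fires.
print(compute_ram_to_free(0) > 0)  # False: nothing gets freed
```

This shows why the caller passing ram_required=0 makes the check a no-op whenever any RAM at all is free.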
Interesting, thanks. My issue is not exactly the same as yours: mine starts slow, and after an ‘OOM - Switching to Tiled VAE encode’ it is then turbo-charged. I downloaded a custom node called ‘WanImageToVideo (Tiled)’, which lets me set both the encode and decode to tiled, and this fixes the OOM, but it does not give me the speed benefits that the automatic switch to tiled seems to bring. Using a flag to disable SAM also stops the OOM, but I never gain the speed increases that I see with the ComfyUI forced tiled encode, which is what I am after. Maybe it is more a case of a bottleneck on my end and I am chasing the wrong horse.
Ah thanks for clarification.
People can see what I am talking about in action here. It usually takes about 5 or 6 Wan 2.2 renders to trigger the ‘OOM switching to tiled VAE encoding’ message, but you can see in this curtailed image the render times before and after, and how much better my AMD system performs afterwards: a 544 x 544 image-to-video workflow using 14B lightning fp8 models at 6 seconds went from 14 minutes to less than 3 minutes with Comfy’s forced tiled encoding. As I said before, I tried the custom node WanImage2video(tiled) and it has very little effect; it is only the OOM tiled VAE encoder that speeds everything up.
Just an update: I installed ComfyUI Desktop, and it seems like my issues with generation times are fixed. What would take 16 to 25 minutes to render in portable only takes 3 to 6 minutes in Desktop for the same settings, image and prompt. I don’t know why portable was so slow.
UPDATE: no, I was wrong. When I was using ComfyUI Desktop this morning it was super slow, just like portable, so I checked the log from last night, and it did indeed contain a ‘Warning: Ran out of memory when regular VAE encoding, retrying with tiled VAE encoding.’, which is what made it appear super fast. So back to square one.