I love that you guys work hard on reducing VRAM needs for new models. But I’m a bit surprised by statements like this one:
> try to lower the tile_size, overlap, temporal_size, or temporal_overlap if you have memory of less than 32GB
As a veteran of the software industry, I know that making software is hard. And I know that LLMs, diffusion models, and generative AI in general are full of probabilistic math and uncertainty. But I wouldn’t think that’s the case for VRAM usage. If it can be measured, predicted from formulas, known in advance, or, worst case, determined via experimentation… can’t you just query the GPU, see what’s available, and then, to a large extent, “solve” for the available VRAM and automate the necessary optimizations?
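To be concrete about the “query the GPU” part, here’s a minimal sketch in plain PyTorch. This is just to illustrate my point, not a claim about how ComfyUI does (or should) do it internally:

```python
# Minimal sketch: ask CUDA how much VRAM is actually free right now.
# (Illustration only -- not ComfyUI's internal mechanism.)
import torch

def free_vram_gb(device: int = 0) -> float:
    """Return the currently free VRAM on a CUDA device, in GB."""
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes / (1024 ** 3)

if torch.cuda.is_available():
    print(f"Free VRAM on GPU 0: {free_vram_gb(0):.1f} GB")
```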
Don’t get me wrong, I love what ComfyUI is doing with raw access to building workflows. But if my hypothesis is correct, it would be awfully cool to do things like:
- Observe and report in the UI the VRAM usage of various nodes and node settings. Maybe even add a “profile” button that precisely calculates consumption and shows a per-node report without actually running the whole workflow. Kinda like a --whatif flag on a command line.
- Instead of “try this or that”, warn me in the UI before I waste 20 minutes on an OOM, or an hour on something swapping to CPU. E.g. if a selection or slider would cause an overflow, give me a hint in the UI: “Sure you want FP16 here? Maybe you should stick to FP8, buddy!”, or “This workflow [> details expander] requires 22.8 GB of VRAM to execute. Do you want to continue?” (A rough sketch of the kind of pre-flight check I mean is below this list.)
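Here’s roughly what I imagine such a pre-flight check could look like. The `estimate_workflow_vram_gb()` helper and the numbers in it are made up for illustration; a real estimate would have to come from the actual nodes and models in the workflow:

```python
# Rough sketch of a pre-flight VRAM check. The estimate here is deliberately
# naive (weights + a flat activation allowance); the point is the comparison
# against free VRAM before running anything expensive.
import torch

def estimate_workflow_vram_gb(param_count: int, bytes_per_param: int,
                              activation_overhead_gb: float) -> float:
    """Naive estimate: model weights plus a flat allowance for activations."""
    weights_gb = param_count * bytes_per_param / (1024 ** 3)
    return weights_gb + activation_overhead_gb

def preflight_check(required_gb: float, device: int = 0) -> None:
    """Warn if the estimated requirement exceeds the currently free VRAM."""
    free_bytes, _ = torch.cuda.mem_get_info(device)
    free_gb = free_bytes / (1024 ** 3)
    if required_gb > free_gb:
        # In ComfyUI this would be a UI warning, not a print.
        print(f"Warning: workflow needs ~{required_gb:.1f} GB but only "
              f"{free_gb:.1f} GB is free. Consider FP8 weights or smaller tiles.")
    else:
        print(f"OK: ~{required_gb:.1f} GB needed, {free_gb:.1f} GB free.")

if torch.cuda.is_available():
    # Example: a hypothetical 12B-parameter model in FP16 (2 bytes/param)
    # plus ~4 GB set aside for activations and temporary buffers.
    needed = estimate_workflow_vram_gb(12_000_000_000, 2, 4.0)
    preflight_check(needed)
```

Obviously the real math is more involved (latents, attention, tiling, offloading), but even a conservative upper bound surfaced in the UI would save a lot of trial and error.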