As we all know, VRAM is very limited for most of us.
It's basically impossible to run an LLM and, say, Flux Dev in VRAM at the same time (even on an RTX 4090).
If you have your LLM and ComfyUI running on the same server, it's critical to be able to quickly and automatically free up VRAM.
Here's an example of how things should work to efficiently manage VRAM:
- Ask the LLM to generate an image prompt via Open-webui
- Once the prompt is generated, Open-webui automatically UNLOADS the LLM from VRAM via the Keep Alive setting (0 minutes)
- Send the freshly generated image prompt to ComfyUI via the Open-webui interface
- ComfyUI generates an amazing image
- ComfyUI should then AUTOMATICALLY UNLOAD its models from VRAM, freeing up the server for a new LLM request.
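For what it's worth, here is a minimal sketch of the two unload calls as I understand them: Ollama documents `keep_alive: 0` as "unload immediately after this request", and recent ComfyUI builds expose a `/free` endpoint for unloading models. The ports, model name, and the `/free` flags below are assumptions based on the defaults, so adjust for your setup:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"   # default Ollama port (assumption)
COMFYUI_URL = "http://localhost:8188"   # default ComfyUI port (assumption)

def ollama_unload_payload(model: str) -> dict:
    # keep_alive: 0 tells Ollama to unload the model right after this request
    return {"model": model, "keep_alive": 0}

def comfyui_free_payload() -> dict:
    # Flags accepted by ComfyUI's /free endpoint on newer builds (assumption:
    # your ComfyUI version is recent enough to have this endpoint)
    return {"unload_models": True, "free_memory": True}

def post_json(url: str, payload: dict) -> None:
    # Plain-stdlib JSON POST, no extra dependencies
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# After the image is generated, something like:
#   post_json(f"{OLLAMA_URL}/api/generate", ollama_unload_payload("llama3"))
#   post_json(f"{COMFYUI_URL}/free", comfyui_free_payload())
```

In theory a small script like this could run after each image generation, but it would obviously be nicer if Open-webui and ComfyUI did this automatically.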
Is this already possible?
If so, HOW?
If not, I really think it would make all our workflows far more efficient if this were possible.
Thanks