Unloading a model from VRAM: what is the speed bottleneck?

When I call the /free API endpoint, it invokes the free_memory() function, which moves the model from VRAM back into RAM.
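For reference, that call is just a plain HTTP POST. Here is a minimal sketch, assuming a default local ComfyUI server on 127.0.0.1:8188; the payload keys mirror what I understand ComfyUI's /free route to accept, so treat them as assumptions:

```python
import json
import urllib.request

def build_free_request(host: str = "127.0.0.1:8188") -> urllib.request.Request:
    """Build the POST request asking ComfyUI to free VRAM.

    Assumed payload keys: unload_models drops loaded models,
    free_memory clears cached intermediate data.
    """
    payload = json.dumps({"unload_models": True, "free_memory": True}).encode()
    return urllib.request.Request(
        f"http://{host}/free",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_free_request()
# Sending it requires a running server:
# urllib.request.urlopen(req)
print(req.full_url)          # → http://127.0.0.1:8188/free
print(json.loads(req.data))  # → {'unload_models': True, 'free_memory': True}
```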

With a 20 GB model, this takes about 8 seconds. What is the bottleneck here? The RAM (DDR4), the VRAM, and the PCIe 4.0 bus are all much faster than that. During this operation I see no CPU activity.

Freeing up VRAM should be possible in a split second, but how?

The speed is about the same as reading from SSD into RAM, but I was expecting transfers between RAM and VRAM in either direction to run at least as fast as PCIe 4.0 allows.
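To put numbers on that, a back-of-the-envelope check of the observed effective bandwidth against the PCIe 4.0 x16 figure (32 GB/s is the theoretical x16 maximum, roughly 2 GB/s per lane; real transfers top out somewhat lower):

```python
# Effective bandwidth of the observed unload vs. PCIe 4.0 x16 capability.
model_gb = 20        # model size moved from VRAM to RAM
seconds = 8          # observed unload time
pcie4_x16_gbs = 32   # theoretical PCIe 4.0 x16 bandwidth

effective_gbs = model_gb / seconds
print(f"effective: {effective_gbs:.1f} GB/s")  # → effective: 2.5 GB/s
print(f"fraction of PCIe 4.0 x16: {effective_gbs / pcie4_x16_gbs:.0%}")  # → 8%
```

So the transfer runs at well under a tenth of what the bus should sustain, which is why it feels SSD-speed rather than PCIe-speed.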

This forum doesn't seem very active; do you know a better place to ask such things?

ComfyUI Discord, maybe. This forum is really dead, unfortunately.

Thanks. Forums are dying all over the world; they used to be so much fun :grinning_face_with_smiling_eyes:

I feel you ^^

Thanks. If someone comes across this topic, however: the --highvram flag seems to fix it, because with it the model is not offloaded into RAM. Unloading the model(s) from VRAM then takes about 1 second instead of 8 or more. Works for me.
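For anyone searching later: the flag goes on the launch command, and in current ComfyUI builds it is spelled --highvram (no inner hyphen). A sketch assuming a standard ComfyUI checkout:

```shell
# Keep models resident in VRAM instead of offloading them to system RAM.
python main.py --highvram
```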

Still, I believe the whole SSD → RAM → VRAM transfer chain could be faster, but maybe Python is not the most suitable tool for this.