Unload model from VRAM, what is the speed bottleneck?

Illon · June 10, 2025, 1:58pm

When I call the API endpoint /free, it invokes the free_memory() function, which moves the model from VRAM back into RAM.

With a 20 GB model, this takes about 8 seconds. What is the bottleneck here? RAM (DDR4), VRAM, and PCIe 4.0 are all much faster than this. During the operation, I see no CPU activity.

Freeing up VRAM should be possible in a split second, but how?
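
To put a number on that gap, here is a rough back-of-the-envelope sketch in Python. The 20 GB and 8 seconds are the figures from this post; the PCIe 4.0 x16 link (16 GT/s per lane, 128b/130b encoding, 16 lanes) is an assumption about the slot, and real copies land somewhat below the theoretical figure.

```python
def effective_bandwidth_gb_s(size_gb: float, seconds: float) -> float:
    """Average throughput of a transfer, in GB/s."""
    return size_gb / seconds


# Figures quoted in the post.
MODEL_SIZE_GB = 20.0
UNLOAD_SECONDS = 8.0

# Assumption: a PCIe 4.0 x16 link.
# 16 GT/s per lane * 128/130 encoding efficiency * 16 lanes / 8 bits per byte.
PCIE4_X16_GB_S = 16 * (128 / 130) * 16 / 8

observed = effective_bandwidth_gb_s(MODEL_SIZE_GB, UNLOAD_SECONDS)
ideal_seconds = MODEL_SIZE_GB / PCIE4_X16_GB_S

print(f"observed throughput:      {observed:.1f} GB/s")
print(f"PCIe 4.0 x16 theoretical: {PCIE4_X16_GB_S:.1f} GB/s")
print(f"ideal unload time:        {ideal_seconds:.2f} s")
```

So the observed unload runs at roughly 2.5 GB/s, while the link alone would allow the same 20 GB to move in well under a second. That suggests the time is going somewhere other than the bus itself.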

Illon · June 12, 2025, 7:45am

The speed is about the same as reading from SSD to RAM. But I was expecting transfers from RAM to VRAM and back to be at least as fast as PCIe 4.0 allows.

Illon · June 16, 2025, 1:16pm

This forum doesn't seem very active. Do you know a better place to ask this kind of question?

Arunderan · June 17, 2025, 6:24pm

ComfyUI Discord maybe. This forum is really dead, unfortunately.

Illon · June 18, 2025, 8:23pm

Thanks. Forums are dying all over the world. They used to be so much fun.

Arunderan · June 23, 2025, 6:01am

I feel you ^^

Illon · June 23, 2025, 7:15am

Thanks. If someone comes across this topic, however: the --highvram flag seems to fix it, because it does not offload the model into RAM. Unloading the model(s) from VRAM takes 1 second instead of 8 or more. Works for me.

Still, I believe the whole SSD → RAM → VRAM transfer could be faster, but maybe Python is not the most suitable tool for this.
