Unload model from VRAM, what is the speed bottleneck?

When I call the API endpoint /free, it invokes the free_memory() function, which moves the model from VRAM back into RAM.

With a 20GB model, this takes about 8 seconds. What is the bottleneck here? RAM (DDR4), VRAM, and PCIe 4.0 are all much faster than this, and I see no CPU activity during the operation.
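
To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes the GPU sits in a x16 slot and that "20GB" means decimal gigabytes; neither is confirmed for my setup, and the function names are only for illustration.

```python
# Back-of-the-envelope comparison of the observed unload speed with the
# theoretical PCIe 4.0 bandwidth. The x16 link width is an assumption.
MODEL_BYTES = 20e9       # 20 GB model (decimal gigabytes assumed)
UNLOAD_SECONDS = 8.0     # observed unload time

# PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding.
PCIE4_GBPS_PER_LANE = 16 * (128 / 130) / 8   # GB/s per lane, one direction
LANES = 16                                   # assumed slot width


def effective_gbps(num_bytes: float, seconds: float) -> float:
    """Observed throughput in GB/s."""
    return num_bytes / seconds / 1e9


def pcie4_peak_gbps(lanes: int) -> float:
    """Theoretical one-direction PCIe 4.0 bandwidth in GB/s."""
    return PCIE4_GBPS_PER_LANE * lanes


observed = effective_gbps(MODEL_BYTES, UNLOAD_SECONDS)
peak = pcie4_peak_gbps(LANES)
print(f"observed: {observed:.1f} GB/s")
print(f"PCIe 4.0 x{LANES} peak: {peak:.1f} GB/s")
print(f"fraction of peak: {observed / peak:.0%}")
```

Under these assumptions the unload runs at about 2.5 GB/s, which is under a tenth of what the link could theoretically carry. Real transfers never hit the theoretical peak, but the gap is still large.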

Freeing up VRAM should be possible in a split second, but how?

The speed is about the same as reading from SSD to RAM, but I was expecting transfers from RAM to VRAM and back to be at least as fast as PCIe 4.0 allows.