When I call the /free API endpoint, it invokes the free_memory() function, which moves the model from VRAM back into RAM.
With a 20 GB model, this takes about 8 seconds (roughly 2.5 GB/s). What is the bottleneck here? RAM (DDR4), VRAM, and PCIe 4.0 are all much faster than this. During this operation, I see no CPU activity.
Freeing up VRAM should be possible in a split second, but how?
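
To put numbers on the gap, here is a rough back-of-the-envelope sketch. It assumes a PCIe 4.0 x16 link (the lane count is an assumption, it is not stated above) and treats 20 GB as decimal gigabytes; the variable names are only for illustration.

```python
# Back-of-the-envelope check: observed vs. theoretical transfer rate.
MODEL_SIZE_GB = 20.0     # model size from the post (decimal GB)
OBSERVED_SECONDS = 8.0   # observed duration of the /free call

# PCIe 4.0 runs at 16 GT/s per lane with 128b/130b encoding.
# The x16 link width is an ASSUMPTION; a narrower link scales this down.
LANES = 16
GT_PER_S_PER_LANE = 16.0
pcie_gb_per_s = GT_PER_S_PER_LANE * LANES * (128 / 130) / 8  # per direction

observed_gb_per_s = MODEL_SIZE_GB / OBSERVED_SECONDS
ideal_seconds = MODEL_SIZE_GB / pcie_gb_per_s

print(f"observed throughput: {observed_gb_per_s:.2f} GB/s")
print(f"PCIe 4.0 x16 peak:   {pcie_gb_per_s:.2f} GB/s")
print(f"ideal transfer time: {ideal_seconds:.2f} s")
print(f"link utilisation:    {observed_gb_per_s / pcie_gb_per_s:.1%}")
```

Real transfers never reach the theoretical peak, but even so the observed rate is under a tenth of what the link should sustain, and a full-speed copy would finish in well under a second.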