NVMe goes to great lengths to align I/O with the submitting CPU. That works well for submission, but for completion we are at the mercy of the network card, which picks the receive queue (and therefore the CPU) for incoming traffic on its own terms. The idea is to implement receive flow steering that classifies incoming network traffic to the correct queue/CPU.
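To make the idea concrete, here is a minimal userspace sketch of the classification step, not the actual driver or kernel implementation. It assumes flows are identified by a TCP-style 4-tuple, and every name in it (`flow_key`, `flow_table`, `flow_record`, `flow_steer`, `FLOW_TABLE_SIZE`) is hypothetical. The submit path records which CPU owns a flow; the receive path looks up the reversed tuple to find the CPU that should handle the completion.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FLOW_TABLE_SIZE 256   /* hypothetical size; must be a power of two */
#define CPU_NONE        (-1)  /* no steering decision: fall back to the NIC default */

/* Flow identity as seen from the sending side (assumed 4-tuple). */
struct flow_key {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
};

struct flow_entry {
    struct flow_key key;
    int cpu;    /* CPU that submitted I/O on this flow */
    int used;
};

struct flow_table {
    struct flow_entry slots[FLOW_TABLE_SIZE];
};

/* FNV-1a over the tuple; any reasonable hash would do for this sketch. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t words[3] = {
        k->saddr, k->daddr, ((uint32_t)k->sport << 16) | k->dport
    };
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < 3; i++) {
        for (int b = 0; b < 4; b++) {
            h ^= (words[i] >> (8 * b)) & 0xffu;
            h *= 16777619u;
        }
    }
    return h;
}

static int flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport;
}

static void flow_table_init(struct flow_table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Submit path: remember which CPU issued I/O on this flow.
 * A colliding flow simply overwrites the slot. */
static void flow_record(struct flow_table *t, const struct flow_key *tx, int cpu)
{
    struct flow_entry *e = &t->slots[flow_hash(tx) & (FLOW_TABLE_SIZE - 1)];

    e->key = *tx;
    e->cpu = cpu;
    e->used = 1;
}

/* Receive path: an incoming packet carries the tuple with source and
 * destination swapped, so reverse it before the lookup. */
static int flow_steer(const struct flow_table *t, const struct flow_key *rx)
{
    struct flow_key k = { rx->daddr, rx->saddr, rx->dport, rx->sport };
    const struct flow_entry *e = &t->slots[flow_hash(&k) & (FLOW_TABLE_SIZE - 1)];

    if (e->used && flow_key_equal(&e->key, &k))
        return e->cpu;
    return CPU_NONE;
}
```

With this sketch, calling `flow_record()` for a flow from CPU 3 makes a later `flow_steer()` on the matching reply tuple return 3, while an unknown flow yields `CPU_NONE` and would be left to the card's default placement. A real implementation would need the card (or its driver) to act on that decision, which this example does not model.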