The topology manager e2e tests verify resource alignment using devices, and the easiest devices to use at the moment are SRIOV devices. The resource alignment test cases run in a loop, once for each supported policy. The tests manage the SRIOV device plugin; up until now, the plugin was set up and torn down on each iteration of the loop.

There is no real need for that. Each iteration must reconfigure (and thus restart) the kubelet, but the device plugin can be set up and torn down just once for all the policies. The kubelet can reconnect just fine to a running device plugin. This way, we greatly reduce the interactions and the complexity of the test environment, making it easier to understand and more robust, and we trim a few minutes off the execution time.

However, this patch also hides (does not solve) a test flake we observed on some environments. The issue is hard to reproduce and not well understood, but it seems to be caused by doing the sriovdp setup/teardown in each policy testing loop. Investigation so far suggests that the kubelet sometimes has stale state after the sriovdp teardown/setup cycle, leading to flakes and false negatives. We tried to address this in https://github.com/kubernetes/kubernetes/pull/95611 with no conclusive results yet.

This patch was posted because overall we believe its gains exceed the drawbacks (hiding the aforementioned flake), and because understanding the potential interaction issues between the sriovdp and the kubelet deserves a separate test.

Signed-off-by: Francesco Romani <fromani@redhat.com>