Add and use more facilities to the *internal* podresources client.
Checking e2e test runs, we have quite some
```
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"
```
This is likely caused by kubelet restarts, which we do plenty in e2e tests,
combined with the fact gRPC does lazy connection AND we don't really
check the errors in client code - we just bubble them up.
While it's arguably bad we don't check properly error codes, it's also
true that in the main case, e2e tests, the functions should just never
fail besides few well known cases, we're connecting over a
super-reliable unix domain socket after all.
So, we centralize the fix adding a function (alongside with minor
cleanups) which wants to trigger and ensure the connection happens,
localizing the changes just here. The main advantage is this approach
is opt-in, composable, and doesn't leak gRPC details into the client
code.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This deflakes the "Containers Lifecycle should not launch second
container before PostStart of the first container completed" test by
assigning enough time to finish the postStart hook.
With the new busybox, ash has a built-in sleep command. Prior to this
change we were creating half the pids expected since `sleep` wasn't
actually launching a new binary. Use the full path to /bin/sleep which
avoids the built-in and actually launches a new process.
This reverts commit bd6f548746.
Running as serial didn't completely eliminate the flake so I think
there's something more going on here. Reverting the change to serial
since its not a solution.
In testing I could only reproduce the flake by running stress-ng to load
the CPU. Running it as serial should reduce and hopefully eliminate the
flakiness.
/exports *(rw,fsid=0,insecure,no_root_squash)
can be mounted as `/exports` using NFSv3 and `/` using NFSv4
Mount as '/', since clients that support both can try both.
The tests have been introduced in
ca7be7dc6d
and checked for `ecc` in `/proc/self/status` since its creation.
We got a new field `Seccomp_filters:` with the Linux commit
c818c03b66,
means that `ecc` would now match both and interfere with possible test
results depending on the host.
The field `Seccomp:` got introduced in
2f4b3bf6b2
and has never changed since then, means we can use it directly to make
the tests more strict.
Refers to https://github.com/kubernetes-sigs/cri-tools/pull/1236
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
PodResourcesAPI reports in the List call about resources of pods in terminal phase.
The internal managers reassign resources assigned to pods in terminal phase, so podresources should ignore them.
Whether this behavior intended or not (the docs are not unequivocal)
this e2e test demonstrates and verifies the mentioned above.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>