The test "should enforce egress policy allowing traffic to a server in a
different namespace based on PodSelector and NamespaceSelector
[Feature:NetworkPolicy]" is flaky because it doesn't wait for the server
Pod to be ready before testing traffic via its service. Even if the
NetworkPolicy allows the traffic, the SYN packets are rejected by iptables
because the service has no endpoints at that moment.
This PR fixes it by making it wait for Pods to be ready like other
tests.
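For illustration, a minimal sketch of such a readiness wait using plain client-go; the helper name and polling interval here are placeholders, not the actual e2e framework code:

```go
package main

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForServerPodReady polls until the pod reports the Ready condition, so
// the Service backing it has endpoints before the egress check runs.
func waitForServerPodReady(c kubernetes.Interface, ns, name string, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		pod, err := c.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		for _, cond := range pod.Status.Conditions {
			if cond.Type == v1.PodReady && cond.Status == v1.ConditionTrue {
				return true, nil
			}
		}
		return false, nil
	})
}
```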
Due to an oversight, the e2e topology manager tests
were leaking a configmap and a serviceaccount.
This patch ensures a proper cleanup
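For illustration, a minimal sketch of the kind of best-effort cleanup added, using plain client-go (the namespace and object names are placeholders):

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// cleanupTestObjects deletes the ConfigMap and ServiceAccount the test
// created. Errors are ignored on purpose: cleanup must be best-effort.
func cleanupTestObjects(c kubernetes.Interface, ns, cmName, saName string) {
	_ = c.CoreV1().ConfigMaps(ns).Delete(context.TODO(), cmName, metav1.DeleteOptions{})
	_ = c.CoreV1().ServiceAccounts(ns).Delete(context.TODO(), saName, metav1.DeleteOptions{})
}
```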
Signed-off-by: Francesco Romani <fromani@redhat.com>
Up until now, the test validated the alignment of resources
only for the first container in a pod. That was just an oversight.
With this patch, we validate all the containers in a given pod.
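A minimal sketch of the shape of the change, with a hypothetical per-container callback standing in for the real alignment check:

```go
package main

import v1 "k8s.io/api/core/v1"

// validateAllContainers runs the per-container alignment check for every
// container in the pod instead of only pod.Spec.Containers[0].
func validateAllContainers(pod *v1.Pod, checkAlignment func(pod *v1.Pod, cnt *v1.Container) error) error {
	for i := range pod.Spec.Containers {
		if err := checkAlignment(pod, &pod.Spec.Containers[i]); err != nil {
			return err
		}
	}
	return nil
}
```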
Signed-off-by: Francesco Romani <fromani@redhat.com>
Add autodetection code to figure out which NUMA node the
devices are attached to.
This autodetection works under the assumption that all the VFs in
the system are to be used for the tests.
Should this not be the case, or in general to handle non-trivial
configurations, we keep the annotations mechanism added to the
SRIOV device plugin config map.
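For reference, a rough sketch of how such autodetection can read the NUMA node of a PCI device (e.g. an SR-IOV VF) from sysfs; the helper name and error handling are illustrative only:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"strconv"
	"strings"
)

// numaNodeOfPCIDevice reads /sys/bus/pci/devices/<pci-address>/numa_node,
// which reports the NUMA node the device is attached to.
func numaNodeOfPCIDevice(pciAddr string) (int, error) {
	data, err := ioutil.ReadFile(fmt.Sprintf("/sys/bus/pci/devices/%s/numa_node", pciAddr))
	if err != nil {
		return -1, err
	}
	// The kernel reports -1 when the NUMA affinity is unknown
	// (e.g. on some single-NUMA-node systems).
	return strconv.Atoi(strings.TrimSpace(string(data)))
}
```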
Signed-off-by: Francesco Romani <fromani@redhat.com>
On single-NUMA node systems the numa_node of sriov devices was
sometimes reported as "-1" instead of, say, 0. This makes some
tests that should succeed[0] fail unexpectedly.
The reporting works as expected on real multi-NUMA node systems.
A small workaround was added to handle this corner case,
but it makes the code less readable overall and a bit too lenient,
hence we remove it.
+++
[0] on a single NUMA node system some resources are obviously
always aligned if the pod can be admitted. It boils down to the
node capacity at pod admission time.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The e2e_node topology_manager tests have an early, quick check
to rule out systems without SR-IOV devices, thus skipping the tests.
The first version of the check detected PFs (Physical Functions),
under the assumption that VFs (Virtual Functions) had already been
created. This works because, obviously, you can't have VFs without PFs.
However, it's a little safer and easier to understand if we check
directly for VFs, bailing out on systems which don't provide them.
Nothing changes for properly configured test systems.
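A minimal sketch of what checking directly for VFs can look like via sysfs; names are illustrative, not the actual test helper:

```go
package main

import (
	"io/ioutil"
	"path/filepath"
	"strconv"
	"strings"
)

// countConfiguredVFs sums sriov_numvfs over all PCI devices that expose it.
// A PF reports its configured VF count in
// /sys/bus/pci/devices/<pci-address>/sriov_numvfs, so a non-zero total means
// VFs are available and the tests can proceed.
func countConfiguredVFs() (int, error) {
	paths, err := filepath.Glob("/sys/bus/pci/devices/*/sriov_numvfs")
	if err != nil {
		return 0, err
	}
	total := 0
	for _, path := range paths {
		data, err := ioutil.ReadFile(path)
		if err != nil {
			continue
		}
		n, err := strconv.Atoi(strings.TrimSpace(string(data)))
		if err != nil {
			continue
		}
		total += n
	}
	return total, nil
}
```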
Signed-off-by: Francesco Romani <fromani@redhat.com>
This is useful for logs from daemonset pods: for those it is
often relevant which node they ran on, because they interact with
resources or other pods on the host.
To keep the log prefix short, it gets limited to a maximum length of
10 characters.
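For illustration, one way to cap the prefix, with a placeholder helper name:

```go
package main

// shortNodePrefix derives a log prefix from the node name, capped at 10
// characters so log lines stay readable.
func shortNodePrefix(nodeName string) string {
	const maxLen = 10
	if len(nodeName) > maxLen {
		return nodeName[:maxLen]
	}
	return nodeName
}
```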
Quite a few images are only used a few times in a few tests. Thus,
the images are being centralized into the agnhost image, reducing
the number of images that have to be pulled and used.
This PR replaces the usage of the following images with agnhost:
- resource-consumer-controller
- test-webserver
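As a hedged sketch, assuming the replaced behaviour is selected via an agnhost subcommand, a test container spec changes roughly like this (image registry and tag are placeholders):

```go
package main

import v1 "k8s.io/api/core/v1"

// agnhostWebserverContainer returns a container that uses the agnhost image
// instead of the dedicated test-webserver image, selecting the behaviour via
// an argument.
func agnhostWebserverContainer() v1.Container {
	return v1.Container{
		Name:  "test-webserver",
		Image: "gcr.io/kubernetes-e2e-test-images/agnhost:2.8", // placeholder tag
		Args:  []string{"test-webserver"},
	}
}
```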
Commit 69a473be3 broke the test case (that was checking
that a file from a volume can't be read) by adding a
(wrong) assumption that the error should be nil.
Fix the assumption (we do expect an error here).
Note that we were checking file contents before; now when
we check the error from read, checking the contents is
redundant.
The other issue with the test is that SELinux should be in enforcing
mode for this to work, so let's check that first to avoid
false positives.
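A minimal sketch of the two ideas, with placeholder helpers: skip unless SELinux is enforcing, and expect the read to fail with a permission error:

```go
package main

import (
	"io/ioutil"
	"os"
	"strings"
)

// selinuxEnforcing reports whether SELinux is present and in enforcing mode,
// based on /sys/fs/selinux/enforce.
func selinuxEnforcing() bool {
	data, err := ioutil.ReadFile("/sys/fs/selinux/enforce")
	if err != nil {
		return false
	}
	return strings.TrimSpace(string(data)) == "1"
}

// expectReadDenied returns true when reading the file fails with a permission
// error, which is the expected outcome under the restrictive label.
func expectReadDenied(path string) bool {
	_, err := ioutil.ReadFile(path)
	return err != nil && os.IsPermission(err)
}
```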
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Address review comments and move the helper function
into the `framework/kubelet` package to avoid circular deps
(see https://github.com/kubernetes/kubernetes/issues/81245)
Signed-off-by: Francesco Romani <fromani@redhat.com>
The test "should allow ingress access from updated pod" fails regardless
of which CNI plugin is enabled. It's because the test assumes the client
Pod can recheck connectivity after updating its label, but the client
won't restart after the first failure, so the second check will always
fail. The PR creates a client Pod with OnFailure RestartPolicy to fix it.
In addition to the above test that checks rule selector takes effect on
updated client pod, the PR adds a test "should deny ingress access to
updated pod" to ensure network policy selector can take effect on updated
server pod.
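A hedged sketch of the shape of the fix; pod name, image and command below are placeholders, not the actual test code:

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newNetworkPolicyClientPod builds a connectivity-checking client pod with
// RestartPolicy OnFailure, so after the expected first failure it restarts
// and can re-check connectivity once its labels are updated.
func newNetworkPolicyClientPod(ns, name string, labels map[string]string) *v1.Pod {
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: ns, Labels: labels},
		Spec: v1.PodSpec{
			RestartPolicy: v1.RestartPolicyOnFailure,
			Containers: []v1.Container{{
				Name:  "client",
				Image: "gcr.io/kubernetes-e2e-test-images/agnhost:2.8", // placeholder
				// The real test runs its own connectivity check command here.
				Command: []string{"/agnhost", "connect", "svc-server:80", "--timeout=5s"},
			}},
		},
	}
}
```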
In addition to getting overall performance measurements from the Go benchmark,
collect metrics that provide information about the internals of the scheduler itself.
This is a first step towards improving what we collect about the scheduler.
Metrics in question:
- scheduler_scheduling_algorithm_predicate_evaluation_seconds
- scheduler_scheduling_algorithm_priority_evaluation_seconds
- scheduler_binding_duration_seconds
- scheduler_e2e_scheduling_duration_seconds
Scheduling throughput is computed on the fly inside perfScheduling.
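For illustration, a minimal sketch of an on-the-fly throughput computation of this kind, with a placeholder sampling function:

```go
package main

import "time"

// measureSchedulingThroughput samples how many pods have been scheduled at a
// fixed interval and records pods scheduled per second until stop is closed.
func measureSchedulingThroughput(scheduledCount func() int, interval time.Duration, stop <-chan struct{}) []float64 {
	var samples []float64
	last := scheduledCount()
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return samples
		case <-ticker.C:
			cur := scheduledCount()
			samples = append(samples, float64(cur-last)/interval.Seconds())
			last = cur
		}
	}
}
```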
This patch moves the helper getCurrentKubeletConfig function,
used in both e2e and e2e_node tests and previously duplicated,
into the common framework.
There are no intended changes in behaviour.
Signed-off-by: Francesco Romani <fromani@redhat.com>
All tests remove the test client pod, usually in TestVolumeClient.
Rename TestCleanup to TestServerCleanup.
In addition, remove a few calls to Test(Server)Cleanup that do not do anything
useful (server pod is not used in these tests).
- Add handlers for service account issuer metadata.
- Add option to manually override JWKS URI.
- Add unit and integration tests.
- Add a separate ServiceAccountIssuerDiscovery feature gate.
Additional notes:
- If not explicitly overridden, the JWKS URI will be based on
the API server's external address and port.
- The metadata server is configured with the validating key set rather
than the signing key set. This allows for key rotation because tokens
can still be validated by the keys exposed in the JWKS URL, even if the
signing key has been rotated (note this may still be a short window if
tokens have short lifetimes).
- The trust model of OIDC discovery requires that the relying party
fetch the issuer metadata via HTTPS; the trust of the issuer metadata
comes from the server presenting a TLS certificate with a trust chain
back to the relying party's root(s) of trust. For tests, we use
a local issuer (https://kubernetes.default.svc) for the certificate
so that workloads within the cluster can authenticate it when fetching
OIDC metadata. An API server cannot validly claim https://kubernetes.io,
but within the cluster, it is the authority for kubernetes.default.svc,
according to the in-cluster config.
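For reference, a rough sketch of the shape of the discovery document such handlers serve at <issuer>/.well-known/openid-configuration; the concrete values below are placeholders:

```go
package main

import "encoding/json"

// openIDConfiguration mirrors the standard OIDC discovery fields that point
// relying parties at the issuer and its JWKS URI.
type openIDConfiguration struct {
	Issuer        string   `json:"issuer"`
	JWKSURI       string   `json:"jwks_uri"`
	ResponseTypes []string `json:"response_types_supported"`
	SubjectTypes  []string `json:"subject_types_supported"`
	SigningAlgs   []string `json:"id_token_signing_alg_values_supported"`
}

// exampleDiscoveryDoc renders an illustrative document; the issuer and JWKS
// URI are placeholders and the latter is what the new override option controls.
func exampleDiscoveryDoc() ([]byte, error) {
	return json.MarshalIndent(openIDConfiguration{
		Issuer:        "https://kubernetes.default.svc",
		JWKSURI:       "https://kubernetes.default.svc/openid/v1/jwks",
		ResponseTypes: []string{"id_token"},
		SubjectTypes:  []string{"public"},
		SigningAlgs:   []string{"RS256"},
	}, "", "  ")
}
```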
Co-authored-by: Michael Taufen <mtaufen@google.com>