The test validates the following endpoints:
- createAppsV1NamespacedControllerRevision
- deleteAppsV1CollectionNamespacedControllerRevision
- deleteAppsV1NamespacedControllerRevision
- listAppsV1ControllerRevisionForAllNamespaces
- patchAppsV1NamespacedControllerRevision
- readAppsV1NamespacedControllerRevision
- replaceAppsV1NamespacedControllerRevision
As outlined in the KEP, we now graduate the Kubelet feature to beta,
which means that it is enabled by default. The corresponding Kubelet
flag still defaults to `false`, but we now have the chance to e2e test
the feature by using a new serial test case.
KEP: https://github.com/kubernetes/enhancements/issues/2413
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The e2e test validates the following 3 extra endpoints:
- deleteCoreV1CollectionNamespacedResourceQuota
- listCoreV1ResourceQuotaForAllNamespaces
- patchCoreV1NamespacedResourceQuota
As described in 8c76845b03 ("test/e2e/network: fix a bug in the hostport e2e
test"), if we have two pods with the same hostPort and hostIP but different
protocols, a buggy CNI may decide to forward all traffic to only one of
these pods. Add a check that we are receiving requests from different pods.
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Making the LoggingConfiguration part of the versioned component-base/config API
had the theoretical advantage that components could have offered different
configuration APIs with experimental features limited to alpha versions (for
example, sanitization offered only in a v1alpha1.KubeletConfiguration). Some
components could have decided to only use stable logging options.
In practice, this wasn't done. Furthermore, we don't want different components
to make different choices regarding which logging features they offer to
users. It should always be the same everywhere, for the sake of consistency.
This can be achieved with a saner Go API by dropping the distinction between
internal and external LoggingConfiguration types. Different stability levels of
individual fields have to be covered by documentation (done) and potentially
feature gates (not currently done).
Advantages:
- everything related to logging is under component-base/logs;
  previously this was scattered across different packages and
  different files under "logs" (why some code was in logs/config.go
  vs. logs/options.go vs. logs/logs.go always confused me again
  and again when coming back to the code):
  - long-term config and command line API are clearly separated
    into the "api" package underneath that
  - logs/logs.go itself only deals with legacy global flags and
    logging configuration
- removal of separate Go APIs like logs.BindLoggingFlags and
  logs.Options
- LogRegistry becomes an implementation detail, with less code
  and less exported functionality (only registration needs to
  be exported, querying is internal)
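For illustration, here is a minimal sketch of how a component could consume the
consolidated API. It assumes the helper names of the current component-base
packages (NewLoggingConfiguration, AddFlags, ValidateAndApply, AddFeatureGates);
the exact names and the feature-gate wiring are assumptions, not part of this change:
```go
package main

import (
	"github.com/spf13/pflag"

	"k8s.io/component-base/featuregate"
	"k8s.io/component-base/logs"
	logsapi "k8s.io/component-base/logs/api/v1"
)

func main() {
	fs := pflag.NewFlagSet("component", pflag.ExitOnError)

	// One shared configuration type and one set of flags for every component.
	c := logsapi.NewLoggingConfiguration()
	logsapi.AddFlags(c, fs)

	// logs/logs.go itself only deals with the legacy global (klog) flags.
	logs.AddFlags(fs)

	_ = fs.Parse([]string{"--logging-format=json"})

	// Feature gates for individual logging options are an assumption here;
	// per the commit message they were not wired up yet at this point.
	featureGate := featuregate.NewFeatureGate()
	_ = logsapi.AddFeatureGates(featureGate)
	if err := logsapi.ValidateAndApply(c, featureGate); err != nil {
		panic(err)
	}
}
```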
The hostport e2e test (sonobuoy run --e2e-focus 'validates that there is no
conflict between pods with same hostPort but different hostIP and protocol')
checks, in particular, that two pods with the same hostPort, the same hostIP,
but different L4 protocols can coexist on one node.
In order to do this, the test creates two pods with the same hostIP:hostPort,
one TCP-based, another UDP-based. However, both pods listen on both protocols:
netexec --http-port=8080 --udp-port=8080
It can happen that a CNI which doesn't distinguish between TCP and UDP
hostPorts forwards all traffic, TCP or UDP, to the same pod. As this pod
listens on both protocols it will reply to both requests, and the test
will think that everything works properly while the second pod is in fact
disconnected. Fix this by executing different commands in different pods:
TCP: netexec --http-port=8080 --udp-port=-1
UDP: netexec --http-port=8008 --udp-port=8080
The TCP pod now doesn't listen on UDP, and the UDP pod doesn't listen on TCP on
the target hostPort. The UDP pod still needs to listen on TCP on another port
so that a pod readiness check can be made.
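For reference, a rough sketch of the two container definitions in Go; the image,
container names and port wiring are illustrative, not the exact test code:
```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// hostPortContainer builds one of the two test containers. Only the netexec
// arguments matter for the fix; everything else is a placeholder.
func hostPortContainer(name string, args []string, proto v1.Protocol) v1.Container {
	return v1.Container{
		Name:  name,
		Image: "registry.k8s.io/e2e-test-images/agnhost:latest",
		Args:  args,
		Ports: []v1.ContainerPort{{ContainerPort: 8080, HostPort: 8080, Protocol: proto}},
	}
}

func main() {
	// TCP pod: no UDP listener at all.
	tcp := hostPortContainer("hostport-tcp",
		[]string{"netexec", "--http-port=8080", "--udp-port=-1"}, v1.ProtocolTCP)
	// UDP pod: TCP only on 8008 (readiness probe), UDP on the shared hostPort.
	udp := hostPortContainer("hostport-udp",
		[]string{"netexec", "--http-port=8008", "--udp-port=8080"}, v1.ProtocolUDP)
	fmt.Println(tcp.Args, udp.Args)
}
```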
Use a watch to detect invalid pod status updates in the graceful node
shutdown node e2e test. By using a watch, all pod updates will be
captured, while the previous logic polled the API server and
could miss some intermediate updates.
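As an illustration, a minimal watch-based sketch built on client-go; the function
name and the validity predicate are placeholders rather than the actual test code:
```go
package podwatch

import (
	"context"
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	watchtools "k8s.io/client-go/tools/watch"
)

// watchPodStatus observes every status update of a single pod and fails as
// soon as one of them violates the given predicate. Unlike polling, no
// intermediate update can be missed.
func watchPodStatus(ctx context.Context, c kubernetes.Interface, ns, name string, valid func(*v1.Pod) bool) error {
	fieldSelector := "metadata.name=" + name
	lw := &cache.ListWatch{
		ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
			options.FieldSelector = fieldSelector
			return c.CoreV1().Pods(ns).List(ctx, options)
		},
		WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
			options.FieldSelector = fieldSelector
			return c.CoreV1().Pods(ns).Watch(ctx, options)
		},
	}
	ctx, cancel := context.WithTimeout(ctx, 2*time.Minute)
	defer cancel()
	_, err := watchtools.UntilWithSync(ctx, lw, &v1.Pod{}, nil, func(ev watch.Event) (bool, error) {
		pod, ok := ev.Object.(*v1.Pod)
		if !ok {
			return false, nil
		}
		if !valid(pod) {
			return false, fmt.Errorf("pod %s entered an invalid status: %v", pod.Name, pod.Status.Phase)
		}
		// Stop watching once the pod reaches a terminal phase.
		return pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed, nil
	})
	return err
}
```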
Signed-off-by: David Porter <david@porter.me>
This is useful in environments where the Deployment image is replaced
by another image from an internal registry (via fixture). In that case,
the populator running in populate mode should use the same image as the
populator running in controller mode.
The test is about SCTP: the accessed service only forwarded SCTP
traffic to the server Pod, but the client Pod used the TCP protocol, so the
test traffic never reached the server Pod and the test NetworkPolicy
was never enforced, which led to test success even if the default-deny
policy was implemented wrongly. In some cases the test could even fail,
if there was an external server with the same IP as the cluster IP
listening on TCP port 80.
Signed-off-by: Quan Tian <qtian@vmware.com>
The advantage is that the extra error information is guaranteed to be printed
directly before the failure and we avoid one extra log line that would have to
be correlated with the failure.
The failure message from Gomega was hard to read because explanation and error
text were separated by the error dump. In many cases, that error dump doesn't
add any relevant information.
Now the error is dumped first as an info message (just in case it is
relevant) and then a shorter failure message is created from the explanation
and error text.
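Roughly, the new flow looks like this sketch (assuming the usual
framework.Logf/framework.Failf helpers; the real helper differs in detail):
```go
package e2esketch

import (
	"k8s.io/kubernetes/test/e2e/framework"
)

// expectNoError dumps the full error as an info message first (in case the
// details matter) and then fails with a short message built from the
// explanation and the error text.
func expectNoError(err error, explain string) {
	if err == nil {
		return
	}
	framework.Logf("Unexpected error: %s:\n%+v", explain, err)
	framework.Failf("%s: %v", explain, err)
}
```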
The test validates the following endpoints:
- deleteApiregistrationV1CollectionAPIService
- patchApiregistrationV1APIServiceStatus
- replaceApiregistrationV1APIService
- replaceApiregistrationV1APIServiceStatus
- don't require ClusterConfiguration.kubernetesVersion
- use the helper functions GetConfigMap, ExpectRole, ExpectRoleBinding
(these were used before UnversionedKubeletConfig went Beta)
The updated klog provides a reusable test suite for output handling.
Using it increases our test coverage without having to copy the test cases from
there into some JSON specific test suite.
The tests in nodes_test.go check if the Node objects
in a kubeadm cluster are annotated with a CRI socket
path. The annotation is used by kubeadm to store a CRI socket per node.
Add a new test condition to verify if the CRI socket path
is prefixed with URL scheme "unix://".
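A minimal sketch of the new condition, assuming the
kubeadm.alpha.kubernetes.io/cri-socket annotation key; the surrounding test
plumbing is omitted:
```go
package kubeadm

import (
	"strings"
	"testing"

	v1 "k8s.io/api/core/v1"
)

// checkCRISocketAnnotation verifies that the node carries the kubeadm
// CRI socket annotation and that its value uses the unix:// URL scheme.
func checkCRISocketAnnotation(t *testing.T, node *v1.Node) {
	socket, ok := node.Annotations["kubeadm.alpha.kubernetes.io/cri-socket"]
	if !ok {
		t.Fatalf("node %q has no CRI socket annotation", node.Name)
	}
	if !strings.HasPrefix(socket, "unix://") {
		t.Fatalf("CRI socket %q on node %q is not prefixed with the unix:// URL scheme", socket, node.Name)
	}
}
```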
Remove namespace from manifest to fix the error: the namespace from
the provided object "my-ns" does not match the namespace
"kubectl-8939". You must pass '--namespace=my-ns' to perform this
operation.
Since ClientCAs are provided by the "client-ca::kube-system::extension-apiserver-authentication::client-ca-file" controller,
we need to wait until it picks up the ConfigMap (via a lister) before checking the CAs; otherwise the response might contain an empty result.
The job controller used by the tests must wait for the caches to sync.
Since the tests don't check /readyz, there is no way for
the tests to tell that it is safe to call the server and that requests won't be rejected.
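A minimal sketch of what "wait for the caches to sync" means here, using a
shared informer factory; the helper name is illustrative, not the actual test code:
```go
package integration

import (
	"context"
	"testing"

	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// startInformers starts the shared informers used by the job controller and
// blocks until their caches are synced; only then is it safe for the test to
// start issuing requests.
func startInformers(ctx context.Context, t *testing.T, clientSet kubernetes.Interface) informers.SharedInformerFactory {
	factory := informers.NewSharedInformerFactory(clientSet, 0)
	jobInformer := factory.Batch().V1().Jobs().Informer()
	podInformer := factory.Core().V1().Pods().Informer()

	factory.Start(ctx.Done())

	// The tests cannot rely on /readyz, so a synced cache is the only
	// signal that the controller has a consistent view of the cluster.
	if !cache.WaitForCacheSync(ctx.Done(), jobInformer.HasSynced, podInformer.HasSynced) {
		t.Fatal("timed out waiting for caches to sync")
	}
	return factory
}
```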
- initconfiguration.go: stop applying the "master" taint
for new clusters; update related unit tests in _test.go
- apply.go: Remove logic related to cleanup of the "master" label
during upgrade
- apply.go: Add cleanup of the "master" taint on CP nodes
during upgrade
- controlplane_nodes_test.go: remove test for old "master" taint
on nodes (this needs backport to 1.24, because we have a kubeadm
1.25 vs kubernetes test suite 1.24 e2e test)
Previously, the e2e test was overriding the plugins socket directory to
"/var/lib/kubelet/plugins_registry". This seems wrong, and with that
setting the e2e test was already failing, because the registration
process was timing out, in turn because the kubelet was trying to call
back the device plugin in the wrong place (see below for details).
I can't explain why it worked before - or if it worked at all - but
it really seems that `pluginapi.DevicePluginPath` is the right
setting here.
+++
In a nutshell, the device plugin registration process works like this:
1. The kubelet runs and creates the device plugin socket registration
endpoint:
KubeletSocket = DevicePluginPath + "kubelet.sock"
DevicePluginPath = "/var/lib/kubelet/device-plugins/"
2. Each device plugin will listen on an endpoint the kubelet will connect
back to. In other words, the kubelet will act like a client to each device plugin,
to perform allocation requests (and more).
Each device plugin will serve from an endpoint.
The endpoint name is plugin-specific, but they all must be inside a
well-known directory: pluginapi.DevicePluginPath
3. The kubelet creates the device plugin pod, like any other pod
4. During startup, each device plugin wants to register itself with the
kubelet, so it sends a request through
the registration endpoint. Key details:
   grpc.Dial(kubelet registration socket)
   registration request:
   reqt := &pluginapi.RegisterRequest{
       Version:      pluginapi.Version,
       Endpoint:     endpointSocket, // socket relative to pluginapi.DevicePluginPath
       ResourceName: resourceName,   // resource name to be exposed
   }
5. While handling the registration request, the kubelet dials back the
device plugin on socketDir + req.Endpoint.
But socketDir is hardcoded in the device manager code to
pluginapi.KubeletSocket
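For illustration, a sketch of steps 4 and 5 from the plugin's side, using the
v1beta1 device plugin API; the gRPC plumbing is simplified and is not the exact
sample plugin code:
```go
package deviceplugin

import (
	"context"
	"net"
	"time"

	"google.golang.org/grpc"
	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// registerWithKubelet performs step 4: dial the kubelet's registration socket
// (pluginapi.KubeletSocket under pluginapi.DevicePluginPath) and announce the
// plugin's own endpoint and resource name.
func registerWithKubelet(ctx context.Context, endpointSocket, resourceName string) error {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()

	conn, err := grpc.DialContext(ctx, pluginapi.KubeletSocket,
		grpc.WithInsecure(), grpc.WithBlock(),
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}))
	if err != nil {
		return err
	}
	defer conn.Close()

	_, err = pluginapi.NewRegistrationClient(conn).Register(ctx, &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     endpointSocket, // relative to pluginapi.DevicePluginPath
		ResourceName: resourceName,
	})
	// Step 5: the kubelet now dials back socketDir + Endpoint, which is why
	// the plugin socket must live under pluginapi.DevicePluginPath.
	return err
}
```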
Signed-off-by: Francesco Romani <fromani@redhat.com>
In the AfterEach check of the e2e node device plugin tests,
the tests try really hard to clean up after themselves:
- delete the sample device plugin
- restart the kubelet again
- ensure that after the restart, no stale sample devices
(provided by the sample device plugin) are reported anymore.
We observed that in the AfterEach block of these e2e tests
we quite reliably get a flip/flop of the kubelet readiness
state, possibly related to a race with a slow runtime/PLEG check.
What happens is that the kubelet readiness state is true,
then goes false for a short interval, and then goes true again
and stays pretty stable after that (observed by adding more logs
to the check loop).
The key factor here is that the function `getLocalNode` aborts the
test (as in `framework.ExpectNoError`) if the node state is
not ready. So any occurrence of this scenario, even if it
is transient, will cause a test failure. I believe this will
make the e2e test unnecessarily fragile without making it more
correct.
For the purpose of the test we can tolerate this kind of glitch,
with the kubelet flip/flopping the ready state, provided that we eventually
meet the final desired condition in which the node reports
ready AND reports no sample devices present - which is the condition
the code was trying to check.
So, we add a variant of `getLocalNode`, which just fetches the
node object the e2e_node framework created, alongside a flag
reporting the node readiness. The new helper does not
implicitly abort the test if the node is not ready; it just bubbles
up this information.
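A sketch of such a helper; the name getLocalTestNode and the exact framework
accessors are illustrative:
```go
package e2enode

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/test/e2e/framework"
)

// getLocalTestNode fetches the node object created by the e2e_node framework
// and reports its readiness, without aborting the test when the node is
// (temporarily) not ready.
func getLocalTestNode(ctx context.Context, f *framework.Framework) (*v1.Node, bool) {
	node, err := f.ClientSet.CoreV1().Nodes().Get(ctx, framework.TestContext.NodeName, metav1.GetOptions{})
	framework.ExpectNoError(err)
	ready := false
	for _, cond := range node.Status.Conditions {
		if cond.Type == v1.NodeReady && cond.Status == v1.ConditionTrue {
			ready = true
			break
		}
	}
	return node, ready
}
```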
Signed-off-by: Francesco Romani <fromani@redhat.com>
The means by which we extract and parse the version of an API object is
not specific to etcd3. In order to allow for a generic suite of tests
against any storage.Interface implementation, we need this logic to live
outside of the etcd3 package, or import cycles will exist.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Because these tests don't run in the CI, the test did not quite match the
actual code anymore. Apparently a retry mechanism was added after the test was
written.
The test had two problems:
- the expected line was off by one (probably due to modified import statements)
- when Gomega failed in TestFailureOutput, Ginkgo panicked because
its fail handler was called outside of a Ginkgo node
Now github.com/stretchr/testify/assert is used for comparing the output because
it works in a unit test without further customization and because the failure
messages are more useful.
For a developer who's not very familiar with the integration flow, it
is very surprising to see that the namespace creation logic does not
create anything and that the namespace deletion logic does not delete
anything, either.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
The validate flag does not have a default value defined for when no
parameter is passed, therefore it tries to consume the next, unrelated flag.
This PR defines NoOptDefVal for the validate flag, which is set to "strict".
Some of these tests could not be run previously, especially on Windows
Docker containers. But now, by using Windows Containerd, we can finally
run them:
- HostNetwork=true tests: This can now be enabled on Windows Privileged Containers.
- /etc/hosts related tests: These were not supported because they required single
file mappings, which is possible with Containerd.
- termination message as non-root user: Requires RunAsUsername, and single file
mappings.
Starting with golangci-lint >= 1.45, the tool complains
about the function being unused:
```bash
test/e2e_node/device_plugin_test.go:82:6: func `getSampleDevicePluginPod` is unused (unused)
func getSampleDevicePluginPod() *v1.Pod {
^
Please review the above warnings. You can test via "./hack/verify-golangci-lint.sh"
If the above warnings do not make sense, you can exempt this warning with a comment
(if your reviewer is okay with it).
In general please prefer to fix the error, we have already disabled specific lints
that the project chooses to ignore.
See: https://golangci-lint.run/usage/false-positives/}
```
The thing is, the code has not been changed lately, and manual inspection trivially
confirms it is used.
Older versions of golangci-lint (tested with
```
golangci-lint has version 1.41.1 built from a2074809 on 2021-06-19T16:01:50Z
```)
indeed do NOT complain about the function, so this seems to be a
golangci-lint bug.
To move forward, we can disable the warning, but this leaves a sour
taste.
Instead, since the function is pretty trivial, was used just once, and the caller
was undoing some of the work done by the function, we just inline it,
which solves the linter warning and makes the code a bit better.
Signed-off-by: Francesco Romani <fromani@redhat.com>
On IPv6 clusters, one of the most frequent problems I encounter is
the assumption that one can build a URL with a host and port simply by
using Sprintf, like this:
```go
fmt.Sprintf("http://%s:%d/foo", host, port)
```
When `host` is an IPv6 address, this produces an invalid URL as it must
be bracketed, like this:
```
http://[2001:4860:4860::8888]:9443
```
This change fixes these occurrences by joining the host and port with the
purpose-built `net.JoinHostPort` function.
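A minimal before/after example of the fix; net.JoinHostPort adds the brackets
for IPv6 literals automatically:
```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

func main() {
	host, port := "2001:4860:4860::8888", 9443

	// Broken for IPv6: http://2001:4860:4860::8888:9443/foo
	bad := fmt.Sprintf("http://%s:%d/foo", host, port)

	// Correct: http://[2001:4860:4860::8888]:9443/foo
	good := fmt.Sprintf("http://%s/foo", net.JoinHostPort(host, strconv.Itoa(port)))

	fmt.Println(bad, good)
}
```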
I encounter this problem often enough that I started to [write a linter
for it](https://github.com/stbenjam/go-sprintf-host-port). I don't
think the linter is quite ready for wide use yet, but I did run it
against the Kube codebase and found these. While the host portion in
some of these changes may always be an FQDN or IPv4 IP today, it's an
easy thing that can break later on.
The device plugin e2e tests were failing lately, and to unblock the
release a skip was added in the prow job configuration:
71cf119c84/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml (L401)
The problem here is not only the broken tests, which need to be
fixed, but also the fact that this is the only skip (for a specific
test) we do this way, which is surprising (xref:
https://github.com/kubernetes/kubernetes/issues/106635#issuecomment-1105627265)
As a next step towards improvement, we add an explicit skip in the tests
proper. This at least makes it more obvious that these tests need more work,
and allows us to remove the edge case in the prow configuration.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The serviceAccountController used by the tests must wait for the caches to sync.
Since the tests don't check /readyz, there is no way for
the tests to tell that it is safe to call the server and that requests won't be rejected.
The test was expecting an SA token in a secret but pods are getting
their SA tokens via projected volumes by default. Also, the SA token
controller function is getting reduced so the original check is likely
to fail.
The test/e2e suite has never supported feature gates:
- it cannot discover at runtime how the cluster is configured
- its --feature-gates parameter had no effect
Despite that, tests were written that used
e2eskipper.SkipUnlessFeatureGateEnabled even though that function then only
checked the default feature gate state. To catch such mistakes, e2e tests
suites now must explicitly enable feature gate checking via
e2eskipper.InitFeatureGates. They also must register their own command line
flag. When that is not done, then using SkipUnlessFeatureGateEnabled or
SkipIfFeatureGateEnabled leads to a test failure.
test/e2e_node does both and therefore continues to work as before.
These SkipUnlessFeatureGateEnabled calls are useless because:
- the tests run in test/e2e where feature gates always
have their default state
- CSIMigration, SizeMemoryBackedVolumes and ExecProbeTimeout are
all enabled by default (beta or GA, respectively)
The read-only port could be disabled.
Since we are only using the /healthz endpoint,
we can use the healthz port for this.
Change-Id: Ie0e05a5ab4ec6f51e4d3c63226aa23c1b3a69956