As suggested by the Ginkgo migration guide, the `Measure` node was
deprecated and replaced with an `It` node that creates a `gmeasure.Experiment`.
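A minimal sketch of the replacement pattern, assuming dot-imports of ginkgo/v2 and gomega/gmeasure (spec, experiment, and operation names are illustrative):

    var _ = Describe("sample performance", func() {
        It("measures the duration of the operation", func() {
            experiment := gmeasure.NewExperiment("sample-operation")
            // attach the experiment so its results end up in the report
            AddReportEntry(experiment.Name, experiment)

            experiment.Sample(func(idx int) {
                experiment.MeasureDuration("operation", func() {
                    doOperation() // placeholder for the body of the old Measure node
                })
            }, gmeasure.SamplingConfig{N: 10})
        })
    })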
Signed-off-by: Dave Chen <dave.chen@arm.com>
Some operations with Azure Disk volumes can sometimes be arbitrarily
slow. This causes CI flakes that are hard to debug.
Bumping the timeouts has been shown to improve or eliminate this issue.
Ginkgo now writes the JUnit file itself. The -report-dir parameter is used
as a fallback for enabling JUnit output in case users haven't migrated to
the new -junit-report parameter.
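A hedged sketch of what such a fallback can look like (variable and file names are assumptions, not the literal implementation):

    suiteConfig, reporterConfig := ginkgo.GinkgoConfiguration()
    if reporterConfig.JUnitReport == "" && reportDir != "" {
        // keep JUnit output working for users who still pass -report-dir
        reporterConfig.JUnitReport = filepath.Join(reportDir, "junit_01.xml")
    }
    ginkgo.RunSpecs(t, "Kubernetes e2e suite", suiteConfig, reporterConfig)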
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Dave Chen <dave.chen@arm.com>
Besides, using this method might lead to a `concurrent map writes`
issue, per the discussion here: https://github.com/onsi/ginkgo/issues/970
Signed-off-by: Dave Chen <dave.chen@arm.com>
The default timeout has been reduced from `24h` down to `1h` in
Ginkgo V2, but for some long-running tests this is too short.
In V1, how long to run before aborting the test was controlled by the Linux
`timeout` command, e.g. `timeout -k 30s 150m ...`, configured in files
like `sig-network-misc.yaml`.
Set the timeout manually for Ginkgo V2 to avoid aborting tests early.
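A minimal sketch of setting the suite timeout programmatically with the Ginkgo V2 API (the duration is illustrative; the same can be achieved with the --timeout CLI flag):

    suiteConfig, reporterConfig := ginkgo.GinkgoConfiguration()
    // match the budget the external `timeout` wrapper used to provide
    suiteConfig.Timeout = 3 * time.Hour
    ginkgo.RunSpecs(t, "Kubernetes e2e suite", suiteConfig, reporterConfig)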
Signed-off-by: Dave Chen <dave.chen@arm.com>
`FullStackTrace` is not available in v2 if no exception occurred
during test execution.
The change is needed for the conformance test's spec validation.
Please see https://github.com/onsi/ginkgo/issues/960 for details.
Signed-off-by: Dave Chen <dave.chen@arm.com>
The change is needed for the conformance test spec, which is
checked in `verify-conformance-yaml.sh`.
Signed-off-by: Dave Chen <dave.chen@arm.com>
Full stack traces are on by default. The approach for collecting results is
different: tests run in their own goroutine, so runTests is no longer
part of their call stack. To cover stack traces with more than one entry, a new
test case is added with a separate helper function.
Gomega object formatting now includes the type.
This removes the last remaining reference to Ginkgo v1.
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Dave Chen <dave.chen@arm.com>
- update all the import statements
- run hack/pin-dependency.sh to change pinned dependency versions
- run hack/update-vendor.sh to update go.mod files and the vendor directory
- update the method signatures for custom reporters
Signed-off-by: Dave Chen <dave.chen@arm.com>
Seemingly, on slow connections, if the response to the /configz request was
chunked, the kubectl proxy was terminated before the response body was
received and read, causing unexpected EOF errors. This patch changes the
configz polling code so that the whole response body is read before
closing the proxy connection.
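A hedged sketch of the intended order of operations (the helper names are assumptions, not the actual test code):

    func pollConfigz(client *http.Client, proxyURL, nodeName string, stopProxy func()) ([]byte, error) {
        resp, err := client.Get(proxyURL + "/api/v1/nodes/" + nodeName + "/proxy/configz")
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        // consume the complete (possibly chunked) body first; closing the proxy
        // earlier is what produced the unexpected EOF errors on slow connections
        body, err := io.ReadAll(resp.Body)
        if err != nil {
            return nil, err
        }
        stopProxy()
        return body, nil
    }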
The test validates the following endpoints
- createAppsV1NamespacedControllerRevision
- deleteAppsV1CollectionNamespacedControllerRevision
- deleteAppsV1NamespacedControllerRevision
- listAppsV1ControllerRevisionForAllNamespaces
- patchAppsV1NamespacedControllerRevision
- readAppsV1NamespacedControllerRevision
- replaceAppsV1NamespacedControllerRevision
As outlined in the KEP, we now graduate the Kubelet feature to beta
which means that it is enabled by default. The corresponding Kubelet
flag still defaults to `false`, but we now have the chance to e2e test
the feature by using a new serial test case.
KEP: https://github.com/kubernetes/enhancements/issues/2413
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The e2e test validates the following 3 extra endpoints
- deleteCoreV1CollectionNamespacedResourceQuota
- listCoreV1ResourceQuotaForAllNamespaces
- patchCoreV1NamespacedResourceQuota
As described in 8c76845b03 ("test/e2e/network: fix a bug in the hostport e2e
test"), if we have two pods with the same hostPort and hostIP but different
protocols, a CNI may be buggy and decide to forward all traffic to only one of
these pods. Add a check that we are receiving requests from different pods.
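A hedged sketch of such a check (helper and variable names assumed): collect the identities of the answering pods and require both to show up.

    responders := sets.NewString()
    for i := 0; i < attempts; i++ {
        hostname, err := hostnameViaHostPort(hostIP, hostPort) // hypothetical helper hitting /hostname
        if err == nil {
            responders.Insert(hostname)
        }
    }
    if responders.Len() < 2 {
        framework.Failf("expected requests to reach both pods, but only %v answered", responders.List())
    }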
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Making the LoggingConfiguration part of the versioned component-base/config API
had the theoretical advantage that components could have offered different
configuration APIs with experimental features limited to alpha versions (for
example, sanitization offered only in a v1alpha1.KubeletConfiguration). Some
components could have decided to only use stable logging options.
In practice, this wasn't done. Furthermore, we don't want different components
to make different choices regarding which logging features they offer to
users. It should always be the same everywhere, for the sake of consistency.
This can be achieved with a saner Go API by dropping the distinction between
internal and external LoggingConfiguration types. Different stability levels of
individual fields have to be covered by documentation (done) and potentially
feature gates (not currently done).
Advantages:
- everything related to logging is under component-base/logs;
previously this was scattered across different packages and
different files under "logs" (why some code was in logs/config.go
vs. logs/options.go vs. logs/logs.go always confused me again
and again when coming back to the code):
- long-term config and command line API are clearly separated
into the "api" package underneath that
- logs/logs.go itself only deals with legacy global flags and
logging configuration
- removal of separate Go APIs like logs.BindLoggingFlags and
logs.Options
- LogRegistry becomes an implementation detail, with less code
and less exported functionality (only registration needs to
be exported, querying is internal)
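A hedged sketch of what component wiring looks like with the consolidated package (import path and function names as introduced by this work; treat them as assumptions):

    import (
        "github.com/spf13/pflag"

        "k8s.io/component-base/featuregate"
        logsapi "k8s.io/component-base/logs/api/v1"
    )

    func setupLogging(fs *pflag.FlagSet, fg featuregate.MutableFeatureGate) error {
        c := logsapi.NewLoggingConfiguration()
        logsapi.AddFlags(c, fs)                // command line API
        return logsapi.ValidateAndApply(c, fg) // config API, applied after flag parsing
    }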
The hostport e2e test (sonobuoy run --e2e-focus 'validates that there is no
conflict between pods with same hostPort but different hostIP and protocol')
checks, in particular, that two pods with the same hostPort, the same hostIP,
but different L4 protocols can coexist on one node.
In order to do this, the test creates two pods with the same hostIP:hostPort,
one TCP-based, another UDP-based. However, both pods listen on both protocols:
netexec --http-port=8080 --udp-port=8080
It can happen that a CNI which doesn't distinguish between TCP and UDP
hostPorts forwards all traffic, TCP or UDP, to the same pod. As this pod
listens on both protocols it will reply to both requests, and the test
will think that everything works properly while the second pod is in fact
disconnected. Fix this by executing different commands in different pods:
TCP: netexec --http-port=8080 --udp-port=-1
UDP: netexec --http-port=8008 --udp-port=8080
The TCP pod now doesn't listen on UDP, and the UDP pod doesn't listen on TCP on
the target hostPort. The UDP pod still needs to listen on TCP on another port
so that a pod readiness check can be made.
Use a watch to detect invalid pod status updates in the graceful node
shutdown node e2e test. By using a watch, all pod updates will be
captured, while the previous logic polled the api-server, which
could miss some intermediate updates.
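A hedged sketch of the watch-based check (helper and variable names assumed):

    w, err := f.ClientSet.CoreV1().Pods(ns).Watch(ctx, metav1.ListOptions{
        FieldSelector: fields.OneTermEqualSelector("metadata.name", pod.Name).String(),
    })
    framework.ExpectNoError(err)
    defer w.Stop()
    for event := range w.ResultChan() {
        p, ok := event.Object.(*v1.Pod)
        if !ok {
            continue
        }
        // every intermediate status update is observed here, so an invalid
        // phase or condition during graceful shutdown cannot slip through
        validatePodShutdownStatus(p) // hypothetical helper
    }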
Signed-off-by: David Porter <david@porter.me>
This is useful in environments where the Deployment image is replaced
by another image from an internal registry (via fixture). In that case,
the populator running in populate mode should use the same image as the
populator running in controller mode.
The test is about SCTP and the accessed service only forwarded SCTP
traffic to the server Pod, but the client Pod used the TCP protocol, so the
test traffic never reached the server Pod and the test NetworkPolicy
was never enforced. This led to test success even if the default-deny
policy was implemented wrongly. In some cases the test might have failed
if there was an external server with the same IP as the cluster IP
listening on TCP port 80.
Signed-off-by: Quan Tian <qtian@vmware.com>
The advantage is that the extra error information is guaranteed to be printed
directly before the failure and we avoid one extra log line that would have to
be correlated with the failure.
The failure message from Gomega was hard to read because explanation and error
text were separated by the error dump. In many cases, that error dump doesn't
add any relevant information.
Now the error is dumped first as an info message (just in case it is
relevant) and then a shorter failure message is created from explanation and
error text.
The test validates the following endpoints
- deleteApiregistrationV1CollectionAPIService
- patchApiregistrationV1APIServiceStatus
- replaceApiregistrationV1APIService
- replaceApiregistrationV1APIServiceStatus
- don't require clusterconfiguration.kubernetesVersion
- use the helper functions GetConfigMap, ExpectRole, ExpectRoleBinding
(these were used before UnversionedKubeletConfig went Beta)
The updated klog provides a reusable test suite for output handling.
Using it increases our test coverage without having to copy the test cases from
there into some JSON specific test suite.
The tests in nodes_test.go check whether the Node objects
in a kubeadm cluster are annotated with a CRI socket
path. The annotation is used by kubeadm to store a CRI socket per node.
Add a new test condition to verify if the CRI socket path
is prefixed with URL scheme "unix://".
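A hedged sketch of the added condition (the annotation key is the one kubeadm uses; variable names assumed):

    socket := node.Annotations["kubeadm.alpha.kubernetes.io/cri-socket"]
    if !strings.HasPrefix(socket, "unix://") {
        t.Errorf("node %q: CRI socket %q is not prefixed with \"unix://\"", node.Name, socket)
    }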
Remove namespace from manifest to fix the error: the namespace from
the provided object "my-ns" does not match the namespace
"kubectl-8939". You must pass '--namespace=my-ns' to perform this
operation.
Since client CAs are provided by the "client-ca::kube-system::extension-apiserver-authentication::client-ca-file" controller,
we need to wait until it picks up the configmap (via a lister) before checking the CAs; otherwise the response might contain an empty result.
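A hedged sketch of the waiting logic (poll interval, timeout, and helper name are assumptions):

    err := wait.PollImmediate(100*time.Millisecond, wait.ForeverTestTimeout, func() (bool, error) {
        cas, err := servedClientCAs() // hypothetical helper reading the CAs the server currently advertises
        if err != nil {
            return false, nil // transient, retry
        }
        return len(cas) > 0, nil
    })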
The job controller used by the tests must wait for the caches to sync.
Since the tests don't check /readyz, there is no way
for the tests to tell that it is safe to call the server and that requests won't be rejected.
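A hedged sketch of the extra synchronization in the test setup (variable names and the Run signature are assumptions):

    informerSet.Start(ctx.Done())
    // block until the informers the job controller depends on are synced,
    // since /readyz is not consulted by these tests
    if !cache.WaitForCacheSync(ctx.Done(),
        informerSet.Batch().V1().Jobs().Informer().HasSynced,
        informerSet.Core().V1().Pods().Informer().HasSynced) {
        t.Fatal("timed out waiting for caches to sync")
    }
    go jobController.Run(ctx, 1)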
- initconfiguration.go: stop applying the "master" taint
for new clusters; update related unit tests in _test.go
- apply.go: Remove logic related to cleanup of the "master" label
during upgrade
- apply.go: Add cleanup of the "master" taint on CP nodes
during upgrade
- controlplane_nodes_test.go: remove test for old "master" taint
on nodes (this needs backport to 1.24, because we have a kubeadm
1.25 vs kubernetes test suite 1.24 e2e test)
Previously, the e2e test was overriding the plugins socket directory to
"/var/lib/kubelet/plugins_registry". This seems wrong, and with that
setting the e2e test was already failing, because the registration
process was timing out, in turn because the kubelet was trying to call
back the device plugin in the wrong place (see below for details).
I can't explain why it worked before - or if it worked at all - but
it really seems that `pluginapi.DevicePluginPath` is the right
setting here.
+++
In a nutshell, the device plugin registration process works like this:
1. The kubelet runs and creates the device plugin socket registration
endpoint:
KubeletSocket = DevicePluginPath + "kubelet.sock"
DevicePluginPath = "/var/lib/kubelet/device-plugins/"
2. Each device plugin will listen on an ENDPOINT that the kubelet will connect
back to. IOW the kubelet will act like a client to each device plugin,
to perform allocation requests (and more).
Each device plugin will serve from an endpoint.
The endpoint name is plugin-specific, but they all must be inside a
well-known directory: pluginapi.DevicePluginPath
3. The kubelet creates the device plugin pod, like any other pod
4. During startup, each device plugin wants to register itself with the
kubelet, so it sends a request through
the registration endpoint. Key details:
   grpc.Dial(kubelet registration socket)
   registration request:
      reqt := &pluginapi.RegisterRequest{
          Version:      pluginapi.Version,
          Endpoint:     endpointSocket, // socket relative to pluginapi.DevicePluginPath
          ResourceName: resourceName,   // resource name to be exposed
      }
5. While handling the registration request, the kubelet dials back the
device plugin on socketDir + req.Endpoint.
But socketDir is hardcoded in the device manager code to
pluginapi.KubeletSocket
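For reference, a hedged sketch of the registration call from the plugin side (resource name and endpoint are illustrative):

    conn, err := grpc.Dial("unix://"+pluginapi.KubeletSocket,
        grpc.WithTransportCredentials(insecure.NewCredentials()))
    if err != nil {
        return err
    }
    defer conn.Close()

    client := pluginapi.NewRegistrationClient(conn)
    _, err = client.Register(context.Background(), &pluginapi.RegisterRequest{
        Version:      pluginapi.Version,
        Endpoint:     "sample.sock",            // relative to pluginapi.DevicePluginPath
        ResourceName: "example.com/sample-dev", // illustrative resource name
    })
    return err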
Signed-off-by: Francesco Romani <fromani@redhat.com>
In the AfterEach check of the e2e node device plugin tests,
the tests try really hard to clean up after themselves:
- delete the sample device plugin
- restart the kubelet again
- ensure that after the restart, no stale sample devices
(provided by the sample device plugin) are reported anymore.
We observed that in the AfterEach block of these e2e tests
we quite reliably see a flip/flop of the kubelet readiness
state, possibly related to a race with a slow runtime/PLEG check.
What happens is that the kubelet readiness state is true,
but goes false for a short interval and then goes true again,
and it's pretty stable after that (observed by adding more logs
to the check loop).
The key factor here is that the function `getLocalNode` aborts the
test (as in `framework.ExpectNoError`) if the node state is
not ready. So any occurrence of this scenario, even a
transient one, will cause a test failure. I believe this
makes the e2e test unnecessarily fragile without making it more
correct.
For the purpose of the test we can tolerate this kind of glitch,
with the kubelet flip/flopping the ready state, provided that we
eventually meet the final desired condition in which the node reports
ready AND reports no sample devices present - which is the condition
the code was trying to check.
So, we add a variant of `getLocalNode` which just fetches the
node object the e2e_node framework created, along with a flag
reporting the node readiness. The new helper does not
implicitly abort the test if the node is not ready; it just bubbles
up this information.
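A hedged sketch of such a helper (name and readiness check are assumptions):

    // fetch the node and report readiness without aborting the test
    func getLocalTestNode(ctx context.Context, f *framework.Framework) (*v1.Node, bool) {
        node, err := f.ClientSet.CoreV1().Nodes().Get(ctx, framework.TestContext.NodeName, metav1.GetOptions{})
        framework.ExpectNoError(err)
        ready := e2enode.IsNodeReady(node) // assumed readiness helper
        return node, ready
    }

The AfterEach block can then poll on this helper until the node both reports ready and reports no sample devices present.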
Signed-off-by: Francesco Romani <fromani@redhat.com>
The means by which we extract and parse the version of an API object is
not specific to etcd3. In order to allow for a generic suite of tests
against any storage.Interface implementation, we need this logic to live
outside of the etcd3 package, or import cycles will exist.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
Because these tests don't run in the CI, the test did not quite match the
actual code anymore. Apparently a retry mechanism was added after the test was
written.
The test had two problems:
- the expected line was off by one (probably modified import statements)
- when Gomega failed in TestFailureOutput, Ginkgo panicked because
its fail handler was called outside of a Ginkgo node
Now github.com/stretchr/testify/assert is used for comparing the output because
it works in a unit test without further customization and because the failure
messages are more useful.