This is useful for running a driver on a subset of all ready nodes:
- use e2enode.GetBoundedReadySchedulableNodes with a suitable
maximum number of nodes to determine how much nodes are available
for a test
- define pod anti-affinity in the PodTemplate:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/instance: xxxxxxx
topologyKey: kubernetes.io/hostname
- set the ReplicaSetSpec.Replicas value to the number of nodes
If the control plane emits anything at the time when the test runs, for example
"unable to sync kubernetes service", the test breaks because that additional
output is unexpected.
Pulling the CreateKubeConfig function from the expensive to build
test/utils/apiserver package had a considerable impact on the overall build
time because that package depends on a lot of other packages.
Because only that one function is needed by the framework, that extra build
time can be avoided by moving it into its own package.
The framework.AddCleanupAction API was a workaround for Ginkgo v1 not invoking
AfterEach callbacks after a test failure. Ginkgo v2 not only fixed that, but
also added a DeferCleanup API which can be used to run some code if (and only
if!) the corresponding setup code ran. In several cases that makes the test
cleanup simpler.
This covers multiple facets of the current framework and of Ginkgo:
- Ginkgo output is verbose and includes detailed progress
messages (BeforeEach/AfterEach tracing).
- Namespace creation.
- Order of callback invocation.
This runs etcd and an apiserver using it inside the test process. The caller
can either use the ClientSet or the config file. More options might get added
in the future.
Co-author: Antonio Ojea <antonio.ojea.garcia@gmail.com>
When using By or some other Ginkgo output functions, Ginkgo v2 now adds a time
stamp at the end of the line that we need to ignore. Will become relevant when
testing more complete output.
For cleanup purposes the ginkgo.DeferCleanup is a better replacement for
f.AddAfterEach:
- the cleanup only gets executed when the corresponding setup code ran
and can use the same local variables
- the callback runs after the test and before the framework
deletes namespaces (as before)
- if one callback fails, the others still get executed
For the original purpose (https://github.com/kubernetes/kubernetes/pull/86177 "This is
very useful for custom gathering scripts.") it is now possible to use
ginkgo.AfterEach because it will always get executed. Just beware that its
callbacks run in first-in-first-out order.
In contrast to ginkgo.AfterEach, ginkgo.DeferCleanup runs the callback in
first-in-last-out order. Using it makes the following test code work as
expected:
f := framework.NewDefaultFramework("some test")
ginkgo.AfterEach(func() {
// do something with f.ClientSet
})
Previously, f.ClientSet was already set to nil by the framework's cleanup code.
After updating gRPC in node-driver-registrar from v1.40.0 to v1.47.0 the
behavior of gRPC change in a way such that it no longer detected the
single-sided closing of the stream as a loss of connection. This caused gRPC in
the e2e.test to get stuck, possibly in a Read or Write for the HTTP stream
because those have neither a context nor a timeout.
Changing the connection handling so that all active connections are tracking in
the listener and closing them when the listener gets closed fixed this problem.
Some scripts and tools still relied on the deprecated flags, the ones
which are about to be removed.
This is intentionally not a complete removal of all those flags in the entire
repo. This would lead to much more code churn also in places where commands
still accept the flags because they use klog directly.
The custom progress reporter gets invoked via ginkgo.ReportAfterEach after each
test. The problem was that the e2e framework unconditionally enables Ginkgo's
-progress output which shows execution of all nodes, including this
ReportAfterEach. The effect were over 1000 lines of useless output at the start
of a test run while skipping disabled tests.
The solution is to tell Ginkgo that the ReportAfterEach isn't meant to be
reported.
This change updates TestAggregatedAPIServer and the related test
server wiring to exercise the full network path between the Kube API
server and the aggregated API server. We now assert that the wardle
API service and Kube API server discovery endpoints are fully healthy.
CRUD operations are performed through the Kube API server to the
wardle API server.
Signed-off-by: Monis Khan <mok@microsoft.com>
Contextual logging cannot be enabled manually because there is no feature gate
flag. Enabling the feature unconditionally:
- should be low risk for E2E testing
- if it fails, we want to know
- is useful to get better log output from code which already supports it
We don't want klog to print to anything other than GinkgoWriter, but it still
used os.Stderr in addition to GinkgoWriter when printing log entries with
severity >= error. Changing "stderrthreshold" fixes that.
The unit test for framework output handling didn't test klog behavior. Now it
does:
- os.Stderr is redirected, should be empty
- a new test invokes klog
The WaitFor* refactoring in 07c34eb400 had an oversight what timeout parameter
is used for calling WaitForAllPodsCondition() in WaitForPodsWithLabelRunningReady()
so the calls to WaitForPodsWithLabelRunningReady() ended up ignoring the user
provided timeout. Fix that.
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Followup on https://github.com/kubernetes/kubernetes/pull/111846. This
particular test was left out from that PR because once it was enabled it
started failing. It was desired to merge
https://github.com/kubernetes/kubernetes/pull/111846 irrespective of
this particular test.
The failure in the test was caused due to the
`createFSGroupRequestPreHook` mock CSI driver hook function assuming
that the request object passed to it is an instance of the respective
struct, but it's actually a pointer instead. This resulted in the hook
function not fulfilling its purpose, and the so the test failed.
Fixes instances of #98213 (to ultimately complete #98213 linting is
required).
This commit fixes a few instances of a common mistake done when writing
parallel subtests or Ginkgo tests (basically any test in which the test
closure is dynamically created in a loop and the loop doesn't wait for
the test closure to complete).
I'm developing a very specific linter that detects this king of mistake
and these are the only violations of it it found in this repo (it's not
airtight so there may be more).
In the case of Ginkgo tests, without this fix, only the last entry in
the loop iteratee is actually tested. In the case of Parallel tests I
think it's the same problem but maybe a bit different, iiuc it depends
on the execution speed.
Waiting for the CI to confirm the tests are still passing, even after
this fix - since it's likely it's the first time those test cases are
executed - they may be buggy or testing code that is buggy.
Another instance of this is in `test/e2e/storage/csi_mock_volume.go` and
is still failing so it has been left out of this commit and will be
addressed in a separate one
The main purpose of this change is to update the e2e Netpol tests to use
the srandard CreateNamespace function from the Framework. Before this
change, a custom Namespace creation function was used, with the
following consequences:
* Pod security admission settings had to be enforced locally (not using
the centralized mechanism)
* the custom function was brittle, not waiting for default Namespace
ServiceAccount creation, causing tests to fail in some infrastructures
* tests were not benefiting from standard framework capabilities:
Namespace name generation, automatic Namespace deletion, etc.
As part of this change, we also do the following:
* clearly decouple responsibilities between the Model, which defines the
K8s objects to be created, and the KubeManager, which has access to
runtime information (actual Namespace names after their creation by
the framework, Service IPs, etc.)
* simplify / clean-up tests and remove as much unneeded logic / funtions
as possible for easier long-term maintenance
* remove the useFixedNamespaces compile-time constant switch, which
aimed at re-using existing K8s resources across test cases. The
reasons: a) it is currently broken as setting it to true causes most
tests to panic on the master branch, b) it is not a good idea to have
some switch like this which changes the behavior of the tests and is
never exercised in CI, c) it cannot possibly work as different test
cases have different Model requirements (e.g., the protocols list can
differ) and hence different K8s resource requirements.
For #108298
Signed-off-by: Antonin Bas <abas@vmware.com>
Introduce networking/v1alpha1 api group.
Add `ClusterCIDR` type to networking/v1alpha1 api group, this type
will enable the NodeIPAM controller to support multiple ClusterCIDRs.
Updates predicate to check for a length >=2 to avoid
the index out of bounds panic.
Signed-off-by: Edwin Xie <exie@vmware.com>
Co-authored-by: Tyler Schultz <tschultz@vmware.com>
- add feature gate
- add encrypted object and run generated_files
- generate protobuf for encrypted object and add unit tests
- move parse endpoint to util and refactor
- refactor interface and remove unused interceptor
- add protobuf generate to update-generated-kms.sh
- add integration tests
- add defaulting for apiVersion in kmsConfiguration
- handle v1/v2 and default in encryption config parsing
- move metrics to own pkg and reuse for v2
- use Marshal and Unmarshal instead of serializer
- add context for all service methods
- check version and keyid for healthz
Signed-off-by: Anish Ramasekar <anish.ramasekar@gmail.com>
Including the full information for successful tests makes the resulting XML
file too large for the 200GB limit in Spyglass when running large jobs (like
scale testing).
The original solution from https://github.com/kubernetes/kubernetes/pull/111627
broke JUnit reporting in other test suites, in particular
test/e2e_node. Keeping the code inside the framework ensures that all test
suites continue to have the JUnit reporting.
AfterReadingAllFlags is a good place to set this up because all test suites
using the test context are expected to call it before running tests and after
parsing flags.
Removing the ReportEntries added by ginkgo.By from all test reports usually
avoids the `system-err` part in the JUnit file, which in Spyglass avoids
the extra "open stdout" button.
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Dave Chen <dave.chen@arm.com>
It is used to request that a pod runs in a unique user namespace.
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
Co-authored-by: Rodrigo Campos <rodrigoca@microsoft.com>
Including the full information for successful tests makes the resulting XML
file too large for the 200GB limit in Spyglass when running large jobs (like
scale testing).
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Co-authored-by: Dave Chen <dave.chen@arm.com>
- PreemptionByKubeScheduler (Pod preempted by kube-scheduler)
- DeletionByTaintManager (Pod deleted by taint manager due to NoExecute taint)
- EvictionByEvictionAPI (Pod evicted by Eviction API)
- DeletionByPodGC (an orphaned Pod deleted by PodGC)PreemptedByScheduler (Pod preempted by kube-scheduler)
We now partly drop the support for seccomp annotations which is planned
for v1.25 as part of the KEP:
https://github.com/kubernetes/enhancements/issues/135
Pod security policies are not touched by this change and therefore we
have to keep the annotation key constants.
This means we only allow the usage of the annotations for backwards
compatibility reasons while the synchronization of the field to
annotation is no longer supported. Using the annotations for static pods
is also not supported any more.
Making the annotations fully non-functional will be deferred to a
future release.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
this is specifically so that we have more structured information when
the metric is parsed and stored as a stable metric. This change does not
change the name of the actual metrics.
Change-Id: I861482401ad9a0ae12306b93abf91d6f76d7a407
In order to promote kubectl alpha events to beta,
it should at least support flags which is already
supported by kubectl get events as well as new flags.
This PR adds;
--output: json|yaml support and does essential refactorings to
integrate other printing options easier in the future.
--no-headers: kubectl get events can hide headers when this flag is set for default printing.
Adds this ability to hide headers also for kubectl alpha events.
This flag has no effect when output is json or yaml for both commands.
--types: This will be used to filter certain events to be printed and
discard others(default behavior is same with --event=Normal,Warning).
Also add entrypoints for verifying and updating a test file for easier
debugging. This is considerably faster than running the stablity checks
against the entire Kubernetes codebase.
Change-Id: I5d5e5b3abf396ebf1317a44130f20771a09afb7f
- Run hack/update-codegen.sh
- Run hack/update-generated-device-plugin.sh
- Run hack/update-generated-protobuf.sh
- Run hack/update-generated-runtime.sh
- Run hack/update-generated-swagger-docs.sh
- Run hack/update-openapi-spec.sh
- Run hack/update-gofmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
The "pause" pods that are being run in the scheduling tests are
sometimes launched in system namespaces. Therefore even if a test
is considered to be running on a "baseline" Pod Security admission
level, its "baseline" pods would fail to run if the global PSa
enforcement policy is set to "restricted" - the system namespaces
have no PSa labels.
The "pause" pods run by this test can actually easily run with
"restricted" security context, and so this patch turns them
into just that.
The test breaks the controllers that depend on api services to be resolvable,
per example, the namespace controller, that is heavily used by the e2e framework to clean the environment
NPD can run in "standalone" or "daemonset" mode in cluster GCE tests.
For standalone mode, NPD runs as a systemd unit, daemonset mode it runs
as pod.
If NPD is running in standalone mode, as soon as we detect that the npd
service does not exist with `systemctl status`, we should skip the rest
of the checks such as looking at the journalctl logs or pid.
Signed-off-by: David Porter <david@porter.me>
Fix rollout history bug where the latest revision was
always shown when requesting a specific revision and
specifying an output.
Add unit and integration tests for rollout history.
As what suggested by Ginkgo migration guide, `Measure` node was
deprecated and replaced with `It` node which creates `gmeasure.Experiment`.
Signed-off-by: Dave Chen <dave.chen@arm.com>
Some operations with Azure Disk volumes can be arbitrarily slow
sometimes. This causes CI flakes that are hard to debug.
Bumping the timeouts has sown to improve or eliminate this issue.
Ginkgo is now writing the JUnit file itself. The -report-dir parameter is used
as fallback for enabling JUnit output in case that users haven't migrated to
the new -junit-report parameter.
Co-authored-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: Dave Chen <dave.chen@arm.com>
Besides, the using of method might lead to a `concurrent map writes`
issue per the discussion here: https://github.com/onsi/ginkgo/issues/970
Signed-off-by: Dave Chen <dave.chen@arm.com>
Default timeout setting has been reduced from `24h` down to `1h` in
Ginkgo V2, but for some long running test this is too short.
How long to abort the test was controlled by the the linux command `timeout`
in V1. e.g. `'timeout -k 30s 150m ...`, and is configured in the file
like `sig-network-misc.yaml`.
Set the timeout manually for Ginkgo V2 to avoid the early aborting.
Signed-off-by: Dave Chen <dave.chen@arm.com>
`FullStackTrace` is not available in v2 if no exception found
with test execution.
The change is needed for conformance test's spec validation.
pls see: https://github.com/onsi/ginkgo/issues/960 for details.
Signed-off-by: Dave Chen <dave.chen@arm.com>