This change updates the generic webhook logic to use a rest.Config
as its input instead of a kubeconfig file. This exposes all of the
rest.Config knobs to the caller instead of the more limited set
available through the kubeconfig format. This is useful when this
code is being used as a library outside of core Kubernetes. For
example, a downstream consumer may want to override the webhook's
internals such as its TLS configuration.
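As a consumer-side illustration (a minimal sketch; the webhook constructor itself is not shown), a library caller can now pass a rest.Config with knobs that a kubeconfig file cannot express:

package example

import (
	"time"

	"k8s.io/client-go/rest"
)

// buildWebhookConfig is a hypothetical consumer-side helper.
func buildWebhookConfig() *rest.Config {
	return &rest.Config{
		Host: "https://webhook.example.com",
		TLSClientConfig: rest.TLSClientConfig{
			CAFile:     "/etc/webhook/ca.crt",
			ServerName: "webhook.internal", // override the name used for TLS verification
		},
		Timeout: 5 * time.Second,
		QPS:     50,
		Burst:   100,
	}
}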
Signed-off-by: Monis Khan <mok@vmware.com>
Create an E2E test that creates a job that spawns a pod that should
succeed. The job reserves a fixed amount of CPU and has a large number
of completions and parallelism. Used to reproduce github.com/kubernetes/kubernetes/issues/106884
Signed-off-by: David Porter <david@porter.me>
Exploring termination revealed we have race conditions in certain
parts of pod initialization and termination. To better catch these
issues, refactor the existing test so it can be reused, and then test
a number of alternate scenarios.
This patch aims to simplify decoupling "pkg/scheduler/framework/plugins"
from internal "k8s.io/kubernetes" packages. More described in
issue #89930 and PR #102953.
Some helpers from "k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
package moved to "k8s.io/component-helpers/storage/volume" package:
- IsDelayBindingMode
- GetBindVolumeToClaim
- IsVolumeBoundToClaim
- FindMatchingVolume
- CheckVolumeModeMismatches
- CheckAccessModes
- GetVolumeNodeAffinity
Also "CheckNodeAffinity" from "k8s.io/kubernetes/pkg/volume/util"
package moved to "k8s.io/component-helpers/storage/volume" package
to prevent diamond dependency conflict.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
- Lock feature gate to true and schedule for deletion in 1.26
- Remove checks on feature gate
- Graduate E2E test to Conformance
Change-Id: I6814819d318edaed5c86dae4055f4b050a4d39fd
ResourceVersion values are opaque and math should not be done on them.
The intent of this test was to watch from a resourceVersion before the
moment where the objects were created, and we can find such a version by
listing from the API server before the tests begin.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
This test wishes to observe a watch event. In order to do this in the
past, the test chose a well-known `Service` object, fetched it, and did
arithmetic on the returned `resourceVersion` in order to start a watch
that was guaranteed to see an event. It is not valid to parse the
`resourceVersion` as an integer or to do arithmetic on it, so in order
to make the test conformant to an appropriate use of the API it now:
- creates a namespace
- fetches the current `resourceVersion`
- creates an object
- watches from the previous `resourceVersion` that was read
This ensures that an event is seen by the watch, but uses the publicly
supported API.
`ConfigMap`s are used instead of `Service`s as they do not require a
valid `spec` for creation and make the test terser.
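A minimal client-go sketch of the new flow (names here are illustrative, not the literal test code):

package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

func watchFromListedRV(ctx context.Context, c kubernetes.Interface, ns string) (watch.Interface, error) {
	// Record a resourceVersion from a list; it is treated as opaque and never parsed.
	list, err := c.CoreV1().ConfigMaps(ns).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	rv := list.ResourceVersion

	// Create an object after the recorded resourceVersion.
	if _, err := c.CoreV1().ConfigMaps(ns).Create(ctx,
		&v1.ConfigMap{ObjectMeta: metav1.ObjectMeta{Name: "watched"}},
		metav1.CreateOptions{}); err != nil {
		return nil, err
	}

	// Watching from rv guarantees the ADDED event for "watched" is delivered.
	return c.CoreV1().ConfigMaps(ns).Watch(ctx, metav1.ListOptions{ResourceVersion: rv})
}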
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
- Also update test-cmd.sh to pass a signing ca to the kube controller
manager, so CSRs work properly in integration tests.
Signed-off-by: Margo Crawford <margaretc@vmware.com>
All controllers should use a context to signal termination of communication with the API server. Once the kube-controller-manager (KCM) cancels its context, all the cert controllers started via KCM should cancel their in-flight API server requests instead of hanging around.
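A minimal sketch of the intended pattern (illustrative, not the literal controller code):

package example

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// syncOnce threads the controller's context into the client call; when KCM
// cancels ctx, the in-flight API server request returns with ctx.Err()
// instead of hanging.
func syncOnce(ctx context.Context, c kubernetes.Interface, name string) error {
	_, err := c.CertificatesV1().CertificateSigningRequests().Get(ctx, name, metav1.GetOptions{})
	return err
}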
It was possible to patch images in the YAML files via KUBE_TEST_REPO, but it
was not possible to know in advance which images might be needed. Now
-list-images also includes all images referenced by the
test-manifest/storage-csi YAML files.
Upon reconsidering the purpose of the test, i.e. to test the lock
contention flags (--lock-file-contention and --lock-file), it makes
sense to test only the actual functionality: the kubelet
should stop once there is a lock contention.
It is in no way the kubelet's responsibility to restart; that is
the responsibility of a higher-level system such as systemd.
Hence, the checks for releasing the lock and for the kubelet becoming
healthy again are removed, as they are out of scope of the kubelet's
responsibilities.
Signed-off-by: Imran Pochi <imran@kinvolk.io>
Some of these tests could not be run previously, especially on Windows
Docker containers. But now, by using Windows Containerd, we can finally
run them:
- HostNetwork=true tests: This can now be enabled on Windows Privileged Containers.
- /etc/hosts related tests: These were not supported because they require single
file mappings, which is now possible with Containerd.
- termination message as non-root user: Requires RunAsUsername, and single file
mappings.
Some storage tests deploy DaemonSets which hard-code /var/lib/kubelet as root
directory for kubelet registration and pod directory. There was already a
parameter which allowed specifying the root directory, just with a very
confusing name ("--volume-dir") and matching field name. A --kubelet-root-dir
parameters gets added because this may make it easier to find the parameter,
with the old name preserved as an alias for the same field for backwards
compatibility.
When an envelope transformer calls out to KMS (for instance), it will be
very helpful to pass a `context.Context` to allow for cancellation. This
patch does that, while passing the previously-expected additional data
via a context value.
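A minimal sketch of the pattern (the key and helper names are illustrative, not the actual envelope package identifiers):

package example

import "context"

type authenticatedDataKey struct{}

// WithAuthenticatedData stashes the previously-explicit additional data in the context.
func WithAuthenticatedData(ctx context.Context, data []byte) context.Context {
	return context.WithValue(ctx, authenticatedDataKey{}, data)
}

// AuthenticatedDataFrom retrieves it inside the transformer; the same ctx can
// also carry cancellation for the KMS call.
func AuthenticatedDataFrom(ctx context.Context) ([]byte, bool) {
	data, ok := ctx.Value(authenticatedDataKey{}).([]byte)
	return data, ok
}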
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
The kubeadm defaults in features.go differ between versions.
e2e_kubeadm tests cannot import the kubeadm features.go,
or easily detect the version of the kubeadm binary used
to create the cluster.
Check for the existence of both versioned and unversioned
objects independent of the value of the FG. Once the FG
goes GA only the unversioned objects should be checked.
Without this change kubeadm e2e skew tests will fail where kubeadm
is at 1.24 (has the FG defaulted to true), the FG is not
explicitly set by the user and the k8s version is at 1.23.
Setting a new consumption target in autoscaling.ResourceConsumer caused
the internal sleep duration between consumption requests to reset.
The next consumption would then get delayed, starting after a gap of 0-30s.
This provides a mechanism for overriding the forced increase of the klog
verbosity to 4 when starting the apiserver and uses that for the scheduler_perf
benchmark. Other tests run as before.
A global variable was used because adding an explicit parameter to several
helper functions would have caused a lot of code churn (test ->
integration/util.StartApiserver ->
integration/framework.RunAnAPIServerUsingServer ->
integration/framework.startAPIServerOrDie).
The remote runtime implementation now supports the `verbose` fields,
which are required for consumers like cri-tools to enable multi CRI
version support.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
- Graduate the feature gate to Beta and enable it by default.
- Pre-set the default value for UnversionedKubeletConfigMap
to "true" in test/e2e_kubeadm.
- Fix a couple of typos in "tolerate" introduced in the PR that
added the FG in 1.23.
"-bench=PerfScheduling/Preemption/500Nodes" ran both the
PerfScheduling/Preemption/500Nodes and the
PerfScheduling/PreemptionPVs/500Nodes benchmark.
This can be avoided by choosing names where none is the prefix of another.
When running an integration test that measures performance, like for example
test/integration/scheduler_perf, running etcd with debug level output is
undesirable because it creates additional load on the system and isn't
realistic.
The default is still "debug", but ETCD_LOGLEVEL=warn can be used to override
that.
Fall back to the ARTIFACTS environment variable if KUBE_JUNIT_REPORT_DIR is not set.
Add missing record_command to a couple of tests in legacy-script.sh.
Fix some typos in /test/cmd/README.md
A cpu/topology manager e2e test wants to require one exclusive CPU
and a share of CPU time; let's round up the allocatable CPU requirements
(from 1 to 2) to reduce the chances of false negatives.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Even though CI machines _usually_ have at least two cpus,
let's rather not assume this holds true, and let's actually
check the allocatable CPUs, skipping even the simplest
tests if the assumption is broken, to avoid false negatives.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The existing cpu/topology manager tests correctly check for the
node resources and skip if the detected resources are not enough
to run the tests, to avoid false negatives.
Unfortunately they do the check against the node capacity, while
the correct approach is to check the allocatable resources.
The existing check is correct only in a narrow set of cases;
otherwise it can still lead to false negatives.
This PR fixes that.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Make sure to log out the cpu capacity and allocatable for
the node running the tests, to make the troubleshooting
of test failures easier.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This is causing a bug when upgrading from older releases to 1.23 because
of Service's maybe-too-clever default-on-read logic.
Service depends on `Decorator()` to be called upon read, to
back-populate old saved objects which do not have `.clusterIPs[]` set.
This works on read, but the cache saves the pre-decorated type (as it is
documented).
In 1.23, this code was refactored and it seems some edge-case handling
was inadvertently removed (I have not confirmed exactly what happened).
Test by aojea
We have no guarantee that the agnhost `/hostname` endpoint returns
a short hostname rather than an FQDN. We also have no guarantee that a
short hostname gets passed to the execHostnameTest() function for comparison.
So make sure we're comparing short hostnames in execHostnameTest().
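A minimal sketch of the normalization (illustrative):

package example

import "strings"

// shortHostname normalizes a possibly fully qualified name to its host part,
// so "pod-abc.ns.svc.cluster.local" and "pod-abc" compare equal.
func shortHostname(name string) string {
	return strings.Split(name, ".")[0]
}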
When CRDs are deleted, the local discovery cache is not invalidated.
This causes a `resource not found` error when a new CRD with the same name is created
with different fields (i.e. changing scope from cluster-wide to namespaced),
because the already deleted CRD still stays in serverresources.json and kubectl tries to use it.
These locally cached files have a 10 minute TTL. If the user waits 10 minutes after deletion,
the files expire and are removed and there are no errors. However, 10 minutes is a long time
and the cache needs to be invalidated after deletion occurs.
This PR adds a note to the delete command documentation that there might be a need to invalidate the discovery
cache when a CRD is deleted. In addition, this PR adds a test to catch this behavior.
The number of workers was set to 1 because parallel probing on Windows is
flakier and network policy tests may get stuck. This symptom disappears on
the newest Kubernetes; network policy tests run very well with 3 workers.
Now that projected service account tokens do not require the secret
to be created, exclude the wait condition on the token and simply
wait for the service account.
The change from service account secrets to projected tokens and
the new dependency on kube-root-ca.crt to start pods with those
projected tokens means that e2e tests can start before
kube-root-ca.crt is created in a namespace. Wait for the default
service account AND the kube-root-ca.crt configmap in normal
e2e tests.
Having only the "master" taint in the list of non-blocking taints
blocks kubeadm / kind clusters from migrating to applying
both the "control-plane" and "master" taints in 1.24.
Add "control-plane" to the list of taints.
Leave TODO to cleanup the "master" taint in 1.25+.
It has to be removed either way as part of the inclusive
language cleanup efforts.
- Reword old "master" label to "Legacy"
- Add new "control-plane" label
- Use new label to count CP nodes
- Check for both taints on CP nodes
- Leave TODOs for 1.25
* It should check one Node in a zone instead of
each Node and its fromZone.
* Check whether the Nodes' CPUs are equivalent
Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>
The e2e test checks that the component implementing Kubernetes Services
interprets ClusterIPs with leading zeros as decimal, otherwise the
cluster will be exposed to CVE-2021-29923.
We don't need to worry about data loss once the data has been written to an
output stream. Calling fsync unnecessarily has been the reason for performance
issues in the past.
The recent regression https://github.com/kubernetes/kubernetes/issues/107033
shows that we need a way to automatically measure different logging
configurations (structured text, JSON with and without split streams) under
realistic conditions (time stamping, caller identification).
System calls may affect the performance and thus writing into actual files is
useful. A temp dir under /tmp (usually a tmpfs) is used, so the actual IO
bandwidth shouldn't affect the outcome. The "normal" json.Factory code is used
to construct the JSON logger when we have actual files that can be set as
os.Stderr and os.Stdout, thus making this as realistic as possible.
When discarding the output instead of writing it, the focus is more on the rest
of the pipeline and changes there can be investigated more reliably.
The benchmarks automatically gather "log entries per second" and "bytes per
second", which is useful to know when considering requirements like the ones
from https://github.com/kubernetes/kubernetes/issues/107029.
logcheck complains:
Additional arguments to ErrorS should always be Key Value pairs. Please check if there is any key or value missing.
That check is intentional, but not applicable here. The check can be worked
around by calling the functions through variables.
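A minimal sketch of the workaround (logcheck only analyzes direct calls, so an indirect call through a variable is not checked):

package example

import "k8s.io/klog/v2"

// Assigning the function to a variable hides the call site from logcheck's
// "must be key/value pairs" analysis; the call itself behaves identically.
var errorS = klog.ErrorS

func logRaw(err error, msg string, args ...interface{}) {
	errorS(err, msg, args...)
}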
The benchmark depends on k8s.io/api (for v1.Container). Such a dependency is
not desirable for k8s.io/component-base/logs, even if it's just for
testing. The solution is to create a separate directory where such a dependency
isn't a problem.
The alternative, a separate package with its own go.mod file under
k8s.io/component-base/logs would have been more complicated to maintain (yet
another go.mod file and different whitelisted dependencies).
This currently covers two cases:
- "kubectl list" (the regression from https://github.com/kubernetes/kubernetes/issues/107012)
- "kubectl get pods/no-such-pod" (no particular reason except that the output
should be deterministic)
In contrast to some other tests that check for strings inside the
output (run_deprecated_api_tests) or compare after
sorting (run_kubectl_version_tests), stdout, stderr and the return code must
match exactly.
This ensures that there is no extra, unexpected output and that the right
output stream is used.
cli.Run was an attempt to eliminate error handling in Kubernetes
commands. However, it had to rely on heuristics that are not necessarily right
for all commands.
kubectl is one example which has its own error printing code that should be
used in all cases after a command failure. It now gets used also for
`--warnings-as-errors`. Previously, that caused the following message to be
logged at the end:
E0110 16:56:01.987555 202060 run.go:120] "command failed" err="1 warning received"
Now it ends with:
error: 1 warning received
With the removal of the kubelet AppArmor profile validation in
https://github.com/kubernetes/kubernetes/pull/97966 we passed the
responsibility of the desired behavior to the container runtime.
Therefore we have to change the e2e test which silently broke after the
PR merge.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The current check enforces that a dynamic provisioner creates
persistent volumes of capacity equals to the persistent volume
claim request size. However there are provisioners that will
create persistent volumes with a capacity greater than the
request size (cinder comes to mind which increments in 1Gi
increments, so if you request 0.5Gi, you get 1Gi). Also
provisioners that have shared storage should be reporting the
total space not the request size (nfs/hostpath for instance).
All these will fail the provisioning check currently because
the capacity is > than the request size. This modifies the check
to be capacity >= request size.
Signed-off-by: Alexander Wels <awels@redhat.com>
This reverts commit 9d09c9d246
This E2E test was reverted because the test was failing continuously.
More on the issue here #104307
This commit re-reverts and brings back the LockContention test, with
the addition of [Serial] tag to the test.
For 1.23, we removed the kubectl `--dry-run` empty default value (`--dry-run`)
and boolean values (`--dry-run=true` and `--dry-run=false`). This change
required users to specify `--dry-run=client` or `--dry-run=server`
following a deprecation. This change was made in #105327.
After reconsideration, this change is not worth the churn for users.
It's likely that many users rely on these values for automated and manual use
cases.
This change reverts #105327 and re-introduces the values `--dry-run`,
`--dry-run=true`, and `--dry-run=false`.
The apiserver may be configured to generate the Service
kubernetes.default and its endpoints addresses.
This service is single-stack, hence, the endpoints and the ClusterIP
must have the same IP family.
The tests were asserting that after a NodePort Service was removed,
no new traffic was still reaching the endpoints.
However, the number of tries was so large that another test running
in parallel could create a working Service on that NodePort, making
the test fail.
Use only 10 tries to confirm that the Service stopped working.
The Conformance test "should orphan pods created by rc if delete options say so"
is spawning 80% of the Cluster's Pod Availability (on a 2 node setup, with 30 Pods
capacity each, it spawns 48 pods).
Because of this, tests that are running in parallel with this test have a higher
chance to flake, causing them to time out because they didn't get to spawn the
necessary Pods within the expected 1 minute time.
Lowering the percentage should reduce the amount of flakes we see.
Revert to previous behavior in 1.21/1.20 of setting pod phase to failed
during graceful node shutdown.
Setting pods to failed phase will ensure that external controllers that
manage pods like deployments will create new pods to replace those that
are shutdown. Many customers have taken a dependency on this behavior
and it was a breaking change in 1.22, so this change reverts back to the
previous behavior.
Signed-off-by: David Porter <david@porter.me>
The same change was already done for csi-driver-host-path master, but not
released yet because csi-snapshotter v5.0.0 itself was not ready yet.
We need this update in k/k because some canary jobs already use the new
snapshotter sidecar which causes permission issues.
This is an automatic update of the testing manifests that mirrors the v1.7.3
release. All of these changes were created with
test/e2e/testing-manifests/storage-csi$ ./update-hostpath.sh v1.7.3
This is a first step towards removing the mock CSI driver completely from
e2e testing in favor of the hostpath plugin. With the recent hostpath plugin
changes (PR #260, #269), it supports all the features supported by the mock
CSI driver.
Using the hostpath plugin for testing also covers CSI persistent feature
use cases.
Right now, `run_remote.go` only supports GCE instances. But actually
running the tests is completely independent of GCE and could work just
as well on any SSH-accessible machine.
This patch adds a new `--mode` switch, which defaults to `gce` for
backwards compatibility, but can be set to `ssh`. In that mode, the GCE
API is not used at all, and we simply connect to the hosts given via
`--hosts`.
This is still better than `run_local.go` because the latter mixes build
environment with test environment, which doesn't fit well with
container-optimized operating systems.
This is part of an effort to setup the e2e node tests on Fedora CoreOS
(see https://github.com/coreos/fedora-coreos-tracker/issues/990).
Patch best viewed with whitespace ignored.
Implements server side field validation behind the
`ServerSideFieldValidation` feature gate. With the
feature enabled, any create/update/patch request
with the `fieldValidation` query param set to
"Strict" will error if the object in the request
body has unknown fields. A value of "Warn"
(also the default when the feature is enabled)
will let the request succeed with a warning.
When the feature is disabled (or the query param
has a value of "Ignore"), the request will succeed
as it previously had with no indications of any
unknown or duplicate fields.
When a service is created with AllocateLoadBalancerNodePorts to false
it should not allocate node ports.
If the same service is updated to set AllocateLoadBalancerNodePorts
to true, it should allocate node ports.
When a service is updated from ClusterIP type to LoadBalancer type,
and AllocateLoadBalancerNodePorts is set to false, it should not
allocate node ports.
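For illustration, a minimal sketch of the relevant Service spec field:

package example

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func loadBalancerWithoutNodePorts() *v1.Service {
	alloc := false
	return &v1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "lb-no-nodeports"},
		Spec: v1.ServiceSpec{
			Type: v1.ServiceTypeLoadBalancer,
			// With this set to false, spec.ports[*].nodePort must stay unset;
			// flipping it to true on update should allocate node ports.
			AllocateLoadBalancerNodePorts: &alloc,
			Ports:                         []v1.ServicePort{{Port: 80}},
		},
	}
}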
This test has been part of the Conformance suite since at least
Kubernetes 1.2 (2015-10-xx). Some years later, around 2018-10-xx, we
drafted a rigorous set of rules for tests to follow in order to be
eligible for promotion to Conformance. We explicitly disallowed any
tests that check for specific Events, since they are not an API, and we
make no guarantees about their contents nor their delivery.
Unfortunately, we neglected to go through the existing corpus of
Conformance tests with a fine-toothed comb after drafting these rules.
The very nature of what this test is attempting to exercise and verify
is specific Events, and their delivery, thus making it ineligible for
Conformance. We should have caught and demoted this test back then.
Better late than never?
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. This fallback can either be used explicitly or be determined
automatically by the kubelet.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
* capture loop variable
* capture the loop variable and don't fail on not found errors
* capture loop variable
* Revert "Mark restart_test as flaky"
This reverts commit 990e9506de.
* skip e2e node restart test with dockershim
* Update test/e2e_node/restart_test.go
Co-authored-by: Mike Miranda <mikemp96@gmail.com>
* capture loop using index
Co-authored-by: Mike Miranda <mikemp96@gmail.com>
The device_plugin_tests have not run successfully in a very long time,
initially being marked flaky and then eventually becoming stale.
The gpu_device_plugin_tests have been used to test the same behaviour,
but are incredibly high maintenance due to external changes in behaviour
from GCP/Nvidia that we have no control over.
This commit takes the existing device plugin tests, makes them look more
like the GPU tests, and removes the cases that have been unsupported for
a long time (namely restarting containers while the plugin is
unavailable).
It also removes the GPU plugin tests, as we do not get more signal by
using real devices here.
The featureGates field in ClusterConfiguration ends up
as a map[interface{}]interface{} in the test suite
and cannot be casted to map[string]bool directly.
Adapt the test to use map[interface{}]interface{}.
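A minimal sketch of the adapted lookup (names are illustrative):

package example

// When ClusterConfiguration is decoded from YAML into a generic map,
// nested maps come back as map[interface{}]interface{}, so the feature
// gates must be read with that type instead of map[string]bool.
func featureGateEnabled(clusterConfig map[interface{}]interface{}, name string) bool {
	fg, ok := clusterConfig["featureGates"].(map[interface{}]interface{})
	if !ok {
		return false
	}
	enabled, _ := fg[name].(bool)
	return enabled
}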
On the multi NUMA node environment, kernel splits hugepages allocated under
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages file equally between NUMA nodes.
That makes it harder to predict where several pods will start because the number
of hugepages on each NUMA node will depend on the amount of NUMA nodes under the environment.
The memory manager test will allocate hugepages on the specific NUMA node to make
the test more predictable on the multi NUMA nodes environment.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Let's wait for the local node (aka the kubelet)
to be ready before querying podresources again,
to avoid false negatives.
Co-authored-by: Artyom Lukianov <alukiano@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
DKC is being removed and we don't want it to continue flaking the rest
of our tests. Let's disable them when DKC is disabled rather than hard
failing. This fits more in line with our other E2Es, and reduces the
maintenance load in test-infra.
We need to make sure the system state is completely cleaned up
again before we transition from one test to another, to avoid
messing up the shared node state.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Since commit 42dd01aa3f the cpuRequest is in millicores, hence
we need to properly check the translation to exclusive CPUs
when verifying the resource allocation.
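A minimal sketch of the adjusted check (illustrative, not the literal test code):

package example

// cpuRequest is expressed in millicores since commit 42dd01aa3f; only integral
// requests (multiples of 1000m) translate into exclusively allocated CPUs.
func expectedExclusiveCPUs(cpuRequestMillis int64) int64 {
	if cpuRequestMillis%1000 != 0 {
		return 0 // non-integral requests get no exclusive CPUs
	}
	return cpuRequestMillis / 1000
}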
Signed-off-by: Francesco Romani <fromani@redhat.com>
The intent is to make the code more readable; no changes in
behaviour are intended. Now it should be a bit more explicit
why the code is checking some values.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: wangyysde <net_use@bzhy.com>
Generate swagger.json.
Use v2 path for hpa_cpu_field.
run update-codegen.sh
Signed-off-by: wangyysde <net_use@bzhy.com>
Some of the networking tests are flaking, and logging the command stdout and stderr
might show us some additional information about the underlying issue when it
occurs.
Some tests have a short timeout for starting the pods (1 minute), but if
those tests happen to be the first ones to run, and the images have to be
pulled, then the test could time out, especially with larger images. This
commit will allow us to prepull commonly used E2E test images, so this issue
can be avoided.
The logic to detect stale endpoints was not taking endpoint
readiness into account.
We can have stale entries on UDP services for 2 reasons:
- an endpoint was receiving traffic and is removed or replaced
- a service was receiving traffic but not forwarding it, and starts
to forward it.
Add an e2e test to cover the regression
The --log-file parameter will be deprecated as of Kubernetes 1.23 and should be
avoided. The replacement for distroless images is the image with go-runner, a
tool that handles output redirection.
For kubemark to run in that image it must be built as static binary.
* Bump the pod status and node status update timeouts to avoid flakes
* Add a small delay after the dbus restart to ensure dbus has enough time to
start up prior to sending the shutdown signal
* Change check of pod being terminated by graceful shutdown. Previously,
the pod phase was checked to see if it was `Failed` and the pod reason
string matched. This logic needs to change after 1.22 graceful node
shutdown change introduced in PR #102344 which changed behavior to no
longer put the pods into a failed phase. Instead, the test now checks
that containers are not ready, and the pod status message and reason
are set appropriately.
Signed-off-by: David Porter <david@porter.me>
For some test failures, checking the pod logs could potentially
yield some interesting information, which could be used to further
investigate certain failures / flakes (for example, if there are some
networking issues, we could at least see if requests reach the containers,
(agnhost logs the connections / requests), or if there were any
other issues during the container's startup).
It looks like it tests two pods sharing the same volume, but the goal is
actually the opposite - two pods with the same inline volume definition
should get separate volumes.
This commit forces Kubelet Configuration files to always be generated
and when possible will use the kubeletconfig file that has been provided
by the test orchestrator.
This commit enables the remote runner to provide a KubeletConfiguration
file to the test suite when uploading it to a remote host, the test
runner will then use this configuration to run the Kubelet with the
provided config.
* De-share the Handler struct in core API
An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.
This never should have been shared. Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.
In the future I can also see adding lifecycle hooks that don't make
sense as probes. E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.
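An illustrative sketch of the resulting split (field sets abbreviated, local stand-in types used instead of the real core API types):

package example

type ExecAction struct{ Command []string }
type HTTPGetAction struct{ Path string }
type TCPSocketAction struct{ Port int }
type GRPCAction struct {
	Port    int32
	Service *string
}

// ProbeHandler: "read"-style health checks; a gRPC health check makes sense here.
type ProbeHandler struct {
	Exec      *ExecAction
	HTTPGet   *HTTPGetAction
	TCPSocket *TCPSocketAction
	GRPC      *GRPCAction
}

// LifecycleHandler: "write"-style hooks; future additions such as a sleep hook
// would only make sense here.
type LifecycleHandler struct {
	Exec      *ExecAction
	HTTPGet   *HTTPGetAction
	TCPSocket *TCPSocketAction
}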
* Run update scripts
* fix_dsc_rbac_pod_update
* add test for DaemonSet Controller updates label of the pod after "DedupCurHistories"
* rebase
* update parameter of dsc.Run
Copying from pvcBlock swapped name and namespace (breaking the PVC test case)
and some references to the pvcBlock variable were left unchanged (incorrect
annotations for test failures).
Add an e2e test to exercise the checkpoint recovery flow.
This means we need to actually create an old (V1, pre-1.20) checkpoint,
but if we do it only in the e2e test, it's still fine.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The previous implementation called Update() without changing anything
about the object, so no MODIFIED events were ever generated. This change
ensures that all calls to Update() cause mutations, thereby ensuring
that MODIFIED events happen in the watch stream.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
During PR review it was pointed out that the branches for ephemeral
vs. persistent make the test harder to read. Therefore all code that depends on
if checks gets moved into two different versions of the test, one that runs for
ephemeral volumes and one for persistent volumes, with skip statements at the
beginning.
* Cleanup FeatureGate skippers
* Perform changes requested by review
* some more review related changes
* Rename skipper functions to make code more readable
* add utilfeature back in
Conceptually, snapshots have to be taken while the pod and thus the volume
exist. Snapshotting has an issue where flushing of data is not guaranteed while
the volume is still staged on the node, so the test relied on deleting the pod
and checking for the volume to be unused. That part of the test cannot be done
for ephemeral volumes.
This is a fix for the new test case from
https://github.com/kubernetes/kubernetes/pull/105636 which had to be merged
without prior testing due to not having a cluster to test on and no pull job
which runs these
tests. https://testgrid.k8s.io/sig-storage-kubernetes#gce-serial then showed a
failure.
The fix is simple: in the ephemeral case, the PVC name isn't set in advance in
pvc.Name and instead must be computed. The fix now was tested on a kubetest
cluster in GCE.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
This adds a new test pattern and uses it for the inline volume tests. Because
the kind of volume now varies more, validation of the mount or block device is
always done by the caller of TestEphemeral.
The hostPath volume plugin creates a directory within /tmp on the host machine, to be mounted as a volume.
The inject-pod writes content to the volume, and a client-pod tries to read the contents and verify them.
When SELinux is enabled on the host, the client-pod can not read the content and gets permission denied.
Run the client-pod as privileged so that it can access the volume content even when SELinux is enabled on the host.
Enable feature by default.
Update integration tests for other features to assume that finalizers are present.
Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
Besides "subPath should unmount if pod is gracefully deleted while kubelet is
down" we also need a special case for "subPath should unmount if pod is force
deleted while kubelet is down".
This fixes a test failure in https://testgrid.k8s.io/sig-storage-kubernetes#gce-serial
It shouldn't make any difference, but it's better to actually test that
assumption.
All existing tests which create pods get converted by skipping the explicit PVC
creation for the ephemeral case and instead modifying the test pod so that it
has a volume claim template with the same spec as the PVC.
The feature gate gets locked to "true", with the goal to remove it in two
releases.
All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.
Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
Each e2e test knows it wants to restart a running kubelet or a
non-running kubelet. The vast majority of times, we want to
restart a running kubelet (e.g. to change config or to check
some properties hold across kubelet crashes/restarts), but sometimes
we stop the kubelet, do some actions and only then restart.
To accommodate both use cases, we just expose the `running` boolean
flag to the e2e tests.
Having the `restartKubelet` explicitly restarting a running kubelet
helps us to troubleshoot e2e failures in which the kubelet
was supposed to be running while it was not; attempting a restart
in such cases only muddied the waters further, making the
troubleshooting and the eventual fix harder.
In the happy path, no expected change in behaviour.
Signed-off-by: Francesco Romani <fromani@redhat.com>
In the `restartKubelet` helper, we use `exec.Command`, whose
return value is the output of the command, but as `[]byte`.
We logged that output as a raw value, which made output meant
to be human readable unnecessarily hard to read.
We fix this annoying behaviour by converting the output to a string
before logging it, making the outcome of the command
obvious to understand.
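A minimal sketch of the fix (the command shown is illustrative; the real helper shells out differently depending on the environment):

package example

import (
	"fmt"
	"os/exec"
)

func restartKubeletSketch() error {
	// CombinedOutput returns []byte; logging it with %v prints a byte slice,
	// so convert to string (or use %s) to keep the output human readable.
	out, err := exec.Command("systemctl", "restart", "kubelet").CombinedOutput()
	fmt.Printf("restarting kubelet: %s\n", string(out))
	return err
}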
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch changes cpuCount to cpuRequest in order to cater to cases
where guaranteed pods make non-integral CPU Requests.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
apparmor is no longer found in Alpine edge/testing but in
edge/community, presumably in preparation for full-fledged inclusion in
3.15. If so, once that is released, BASEIMAGE can be updated again and
the explicit --repository flag to 'apk add' dropped.
Fixes: https://github.com/kubernetes/kubernetes/issues/105528
Once the node gets deleted, the nodelifecycle controller
races to update the pod status and the pod deletion logic
fails, causing tests to flake. This commit moves
the testContext creation into the test loop and deletes the nodes and
namespace within the test loop. We don't explicitly call node
deletion within the loop, but the `testutils.CleanupTest(t, testCtx)`
call ensures that the namespace and nodes get deleted.
While running tests in parallel, especially those with higher loads
than others, it might take some time for Pods to be Running, even more
so if the image has to be pulled as well.
The test [sig-node] Pods should delete a collection of pods [Conformance]
only waits for the pods to be scheduled before deleting them, and
expects them to be gone in 1 minute, which can flake because of the above
reasons. Note that the operations are in order, and kubelet runs them in
order, which means that the pod first has to enter the Running state
before attempting to delete it.
This commit waits for the Pods to enter the Running state first before
deleting the entire collection.
Co-Authored-By: Antonio Ojea <aojea@redhat.com>
This commit fixes the LocalStorageCapacityIsolationEviction test by
acknowledging that in its default configuration kubelet will no longer
evict memory-backed volume pods as they cannot use more than their
assigned limit with SizeMemoryBackedVolumes enabled.
To account for the old behaviour, we also add a test that explicitly
disables the feature to test the behaviour of memory backed local
volumes in those scenarios. That test can be removed when/if the feature
gate is removed.
Currently the storage eviction tests fail for a few reasons:
- They re-enter storage exhaustion after pulling the images during
cleanup (increasing test storage reqs, and adding verification for
future diagnosis)
- They were timing out, as in practice it seems that eviction takes just
over 10 minutes on an n1-standard in many cases. I'm raising these to
15 to provide some padding.
This should ideally bring these tests to passing on CI, as they've now
passed locally for me several times with the remote GCE env.
Follow up work involves diagnosing why these take so long, and
restructuring them to be less finicky.
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
The recommendation from #sig-cli was to print usage, then the error. Extra care
is taken to only print the usage instruction when the error really was about
flag parsing.
Taking kube-scheduler as example:
$ _output/bin/kube-scheduler
I0929 09:42:42.289039 149029 serving.go:348] Generated self-signed cert in-memory
...
W0929 09:42:42.489255 149029 client_config.go:620] error creating inClusterConfig, falling back to default config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
E0929 09:42:42.489366 149029 run.go:98] "command failed" err="invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable"
$ _output/bin/kube-scheduler --xxx
Usage:
kube-scheduler [flags]
...
--vmodule moduleSpec
comma-separated list of pattern=N settings for file-filtered logging
Error: unknown flag: --xxx
The kubectl behavior doesn't change:
$ _output/bin/kubectl get nodes
Unable to connect to the server: dial tcp: lookup xxxx: No address associated with hostname
$ _output/bin/kubectl --xxx
Error: unknown flag: --xxx
See 'kubectl --help' for usage.
It wasn't documented that InitLogs already uses the log flush frequency, so
some commands have called it before parsing (for example, kubectl in the
original code for logs.go). The flag never had an effect in such commands.
Fixing this turned into a major refactoring of how commands set up flags and
run their Cobra command:
- component-base/logs: implicitly registering flags during package init is an
anti-pattern that makes it impossible to use the package in commands which
want full control over their command line. Logging flags must be added
explicitly now, something that the new cli.Run does automatically.
- component-base/logs: AddFlags would have crashed in kubectl-convert if it
had been called because it relied on the global pflag.CommandLine. This
has been fixed and kubectl-convert now has the same --log-flush-frequency
flag as other commands.
- component-base/logs/testinit: an exception are tests where flag.CommandLine has
to be used. This new package can be imported to add flags to flag.CommandLine
once per test program.
- Normalization of the klog command line flags was inconsistent. Some commands
unintentionally didn't normalize to the recommended format with hyphens. This
gets fixed for sample programs, but not for production programs because
it would be a breaking change.
This refactoring has the following user-visible effects:
- The validation error for `go run ./cmd/kube-apiserver --logging-format=json
--add-dir-header` now references `add-dir-header` instead of `add_dir_header`.
- `staging/src/k8s.io/cloud-provider/sample` uses flags with hyphen instead of
underscore.
- `--log-flush-frequency` is not listed anymore in the --logging-format flag's
`non-default formats don't honor these flags` usage text because it will also
work for non-default formats once it is needed.
- `cmd/kubelet`: the description of `--logging-format` uses hyphens instead of
underscores for the flags, which now matches what the command is using.
- `staging/src/k8s.io/component-base/logs/example/cmd`: added logging flags.
- `apiextensions-apiserver` no longer prints a useless stack trace for `main`
when command line parsing raises an error.
We graduate the `CPUManagerPolicyOptions` feature to beta
in the 1.23 cycle, and we add new experimental feature gates
to guard new options which are planned in the 1.23 and in the
following cycles.
We introduce additional feature gates called `CPUManagerPolicyAlphaOptions` and
`CPUManagerPolicyBetaOptions`. The basic idea is to avoid the
cumbersome process of adding a feature gate for each option, and to have
feature gates which track the maturity level of _groups_ of options.
Besides this change, the graduation process, and the process in general,
for adding new policy options is still unchanged.
The `full-pcpus-only` option added in the 1.22 cycle is intentionally
moved into the beta policy options.
For more details:
- KEP: https://github.com/kubernetes/enhancements/pull/2933
- sig-arch discussion:
https://groups.google.com/u/1/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw
Signed-off-by: Francesco Romani <fromani@redhat.com>
The boolean values for --dry-run have been deprecated for removal since
1.18, more than 2 releases.
The default value for --dry-run with the flag set and unspecified has
been deprecated for removal since 1.18, more than 2 releases.
Both values are now removed in this change. Any kubectl --dry-run
usage no longer accepts --dry-run=(true|false) boolean values and usage
now requires that a value of (client|server|none) is specified.
The boom-server container forges out-of-order TCP packets and injects them into the network. This requires the container to have the CAP_NET_RAW linux capability, otherwise the test will fail.
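A minimal sketch of the capability grant (illustrative, abbreviated container security context):

package example

import v1 "k8s.io/api/core/v1"

// boomServerSecurityContext grants CAP_NET_RAW so the container can forge
// and inject raw, out-of-order TCP packets.
func boomServerSecurityContext() *v1.SecurityContext {
	return &v1.SecurityContext{
		Capabilities: &v1.Capabilities{
			Add: []v1.Capability{"NET_RAW"},
		},
	}
}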
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
* Updates ImpersonationConfig in rest/config.go to include UID
attribute, and pass it through when copying the config
* Updates ImpersonationConfig in transport/config.go to include UID
attribute
* In transport/round_tripper.go, Set the "Impersonate-Uid" header in
requests based on the UID value in the config
* Update auth_test.go integration test to specify a UID through the new
rest.ImpersonationConfig field rather than manually setting the
Impersonate-Uid header
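A minimal consumer-side sketch (values are illustrative):

package example

import "k8s.io/client-go/rest"

// withImpersonatedUID returns a copy of the config that will send
// Impersonate-User and the new Impersonate-Uid headers on every request.
func withImpersonatedUID(base *rest.Config) *rest.Config {
	cfg := rest.CopyConfig(base)
	cfg.Impersonate = rest.ImpersonationConfig{
		UserName: "jane.doe@example.com",
		UID:      "18fa3a49-3f5a-4c8e-9c9a-2c1f3d2e4b5a",
	}
	return cfg
}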
Signed-off-by: Margo Crawford <margaretc@vmware.com>
By parsing flags in the test's main function before starting etcd we bail out
early without ever starting etcd when the test was invoked with -help.
Otherwise etcd must be available, gets started and then hangs because
flag.Parse itself exits when called by testing.go. This bypasses the code in
EtcdMain which normally stops etcd.
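A minimal sketch of the pattern (the etcd setup helper is a stand-in, not the real framework function):

package example

import (
	"flag"
	"os"
	"testing"
)

// startEtcd is a stand-in for the test framework's etcd setup helper.
func startEtcd() (stop func()) { return func() {} }

func TestMain(m *testing.M) {
	// Parse flags first: "go test -help" (or a bad flag) exits here,
	// before etcd is ever started, so nothing is left running.
	flag.Parse()

	stopEtcd := startEtcd()
	code := m.Run()
	stopEtcd()
	os.Exit(code)
}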
Otherwise, the nodeNameToPodList[nodeName] list will have all its references
identical (all pointing at the loop control variable),
thus making all the pods in the list identical.
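A minimal sketch of the fix (illustrative, not the literal test code):

package example

import v1 "k8s.io/api/core/v1"

func groupPodsByNode(pods []v1.Pod) map[string][]*v1.Pod {
	nodeNameToPodList := map[string][]*v1.Pod{}
	for _, pod := range pods {
		pod := pod // capture the loop variable; without this, every stored
		// pointer aliases the same control variable and ends up identical
		nodeNameToPodList[pod.Spec.NodeName] = append(nodeNameToPodList[pod.Spec.NodeName], &pod)
	}
	return nodeNameToPodList
}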
The agnhost pods using netexec bind by default to UDP
port 8081; use a different port for hostNetwork pods to avoid
scheduling conflicts and test failures.
Some tests don't need the UDP listener, so they
can disable it.
This is specially needed for tests that use hostNetwork pods, if 2
pods try to bind to the same port, the test will fail because one
of the pod can't be scheduled because of the port conflict.
To keep backwards compatibility, we can add an option to disable
the UDP listener by setting the port number to -1, that is consistent
with the SCTP implementation.
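A minimal sketch of a hostNetwork container using the new option (image and tag are illustrative):

package example

import v1 "k8s.io/api/core/v1"

// hostNetworkNetexec runs netexec with the UDP listener disabled (port -1),
// so two such pods never collide on UDP 8081 when sharing the host network.
func hostNetworkNetexec() v1.Container {
	return v1.Container{
		Name:  "netexec",
		Image: "k8s.gcr.io/e2e-test-images/agnhost:2.32",
		Args:  []string{"netexec", "--http-port=8080", "--udp-port=-1"},
	}
}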
The issue in both tests is that before the refactor we had a method that
was creating the `StorageClass` manifest only; this manifest was
later created by `TestBindingWaitForFirstConsumerMultiPVC`. After
the refactor we're ensuring that the `StorageClass` exists as a resource
before calling `TestBindingWaitForFirstConsumerMultiPVC`; however this
method is still attempting to create it, which is the reason behind the
error: `resourceVersion should not be set on objects to be created`.
This issue wasn't caught before because
`TestBindingWaitForFirstConsumerMultiPVC` is creating the StorageClass
without the common utility function. The solution is to remove the
snippet that attempts to create the StorageClass again.
This test case requires special test-handler setup which is only done
for gce clusters created by kube-up scripts. Let's skip the test when
run under other providers.
Use ExtraConfig to configure the repair interval,
and add an integration test for service finalizers and
possible races with the services repair loop.