For 1.23, we removed the kubectl `--dry-run` empty default value (`--dry-run`)
and boolean values (`--dry-run=true` and `--dry-run=false`). This change
required users to specify `--dry-run=client` or `--dry-run=server`,
following an earlier deprecation. This change was made in #105327.
After reconsideration, this change is not worth the churn for users.
It's likely that many users rely on these values for automated and manual use
cases.
This change reverts #105327 and re-introduces the values `--dry-run`,
`--dry-run=true`, and `--dry-run=false`.
The apiserver may be configured to generate the Service
kubernetes.default and its endpoints addresses.
This service is single-stack, hence, the endpoints and the ClusterIP
must have the same IP family.
The tests were asserting that after a NodePort Service was removed,
no new traffic was still reaching the endpoints.
However, the number of tries was so large that another test running
in parallel could create a working Service on that NodePort, making
the test fail.
Use only 10 tries to confirm that the Service stopped working.
The Conformance test "should orphan pods created by rc if delete options say so"
is spawning 80% of the cluster's pod availability (on a 2-node setup with a 30-Pod
capacity each, it spawns 48 pods).
Because of this, tests that are running in parallel with this test have a higher
chance to flake, causing them to time out because they didn't get to spawn the
necessary Pods within the expected 1-minute window.
Lowering the percentage should reduce the amount of flakes we see.
Revert to previous behavior in 1.21/1.20 of setting pod phase to failed
during graceful node shutdown.
Setting pods to failed phase will ensure that external controllers that
manage pods like deployments will create new pods to replace those that
are shutdown. Many customers have taken a dependency on this behavior
and it was a breaking change in 1.22, so this change reverts to the
previous behavior.
Signed-off-by: David Porter <david@porter.me>
The same change was already done for csi-driver-host-path master, but not
released yet because csi-snapshotter v5.0.0 itself was not ready yet.
We need this update in k/k because some canary jobs already use the new
snapshotter sidecar which causes permission issues.
This is an automatic update of the testing manifests that mirrors the v1.7.3
release. All of these changes were created with
test/e2e/testing-manifests/storage-csi$ ./update-hostpath.sh v1.7.3
This is a first step towards removing the mock CSI driver completely from
e2e testing in favor of hostpath plugin. With the recent hostpath plugin
changes (PR #260, #269), it supports all the features supported by the mock
csi driver.
Using hostpath-plugin for testing also covers CSI persistent feature
use cases.
Right now, `run_remote.go` only supports GCE instances. But actually
running the tests is completely independent of GCE and could work just
as well on any SSH-accessible machine.
This patch adds a new `--mode` switch, which defaults to `gce` for
backwards compatibility, but can be set to `ssh`. In that mode, the GCE
API is not used at all, and we simply connect to the hosts given via
`--hosts`.
This is still better than `run_local.go` because the latter mixes build
environment with test environment, which doesn't fit well with
container-optimized operating systems.
This is part of an effort to setup the e2e node tests on Fedora CoreOS
(see https://github.com/coreos/fedora-coreos-tracker/issues/990).
Patch best viewed with whitespace ignored.
Implements server side field validation behind the
`ServerSideFieldValidation` feature gate. With the
feature enabled, any create/update/patch request
with the `fieldValidation` query param set to
"Strict" will error if the object in the request
body has unknown fields. A value of "Warn"
(also the default when the feature is enabled)
will let the request succeed with a warning.
When the feature is disabled (or the query param
has a value of "Ignore"), the request will succeed
as it previously did, with no indication of any
unknown or duplicate fields.
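As an illustration of how a client can opt into the new behaviour, here is a minimal sketch using client-go, assuming a client-go/apimachinery release that exposes the `fieldValidation` option on write options; the object, namespace, and kubeconfig path are illustrative. Strict validation mostly matters for request bodies that may carry unknown fields (e.g. hand-written manifests sent as unstructured/raw JSON).
```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the usual kubeconfig; error handling kept minimal for brevity.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "demo"},
		Data:       map[string]string{"key": "value"},
	}
	// "Strict" rejects unknown/duplicate fields, "Warn" surfaces them as
	// warnings (the default with the feature enabled), "Ignore" keeps the
	// pre-feature behaviour.
	_, err = client.CoreV1().ConfigMaps("default").Create(
		context.TODO(), cm, metav1.CreateOptions{FieldValidation: "Strict"})
	fmt.Println("create error:", err)
}
```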
When a service is created with AllocateLoadBalancerNodePorts set to false,
it should not allocate node ports.
If the same service is updated to set AllocateLoadBalancerNodePorts
to true, it should allocate node ports.
When a service is updated from ClusterIP type to LoadBalancer type,
and AllocateLoadBalancerNodePorts is set to false, it should not
allocate node ports.
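A minimal sketch of the Service shape these cases exercise (the name, namespace, and selector are illustrative; this only constructs the object and does not talk to a cluster):
```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func boolPtr(b bool) *bool { return &b }

func main() {
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "lb-no-nodeports", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "demo"},
			Ports:    []corev1.ServicePort{{Port: 80}},
			// With this set to false, spec.ports[*].nodePort should stay unset;
			// flipping it to true on update should trigger node port allocation.
			AllocateLoadBalancerNodePorts: boolPtr(false),
		},
	}
	fmt.Printf("%+v\n", svc.Spec)
}
```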
This test has been part of the Conformance suite since at least
Kubernetes 1.2 (2015-10-xx). Some years later, around 2018-10-xx, we
drafted a rigorous set of rules for tests to follow in order to be
eligible for promotion to Conformance. We explicitly disallowed any
tests that check for specific Events, since they are not an API, and we
make no guarantees about their contents nor their delivery.
Unfortunately, we neglected to go through the existing corpus of
Conformance tests with a fine-toothed comb after drafting these rules.
The very nature of what this test is attempting to exercise and verify
is specific Events, and their delivery, thus making it ineligible for
Conformance. We should have caught and demoted this test back then.
Better late than never?
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. This fallback can be automatically determined by
the kubelet.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
* capture loop variable (see the sketch after this list)
* capture the loop variable and don't fail on not found errors
* capture loop variable
* Revert "Mark restart_test as flaky"
This reverts commit 990e9506de.
* skip e2e node restart test with dockershim
* Update test/e2e_node/restart_test.go
Co-authored-by: Mike Miranda <mikemp96@gmail.com>
* capture loop using index
Co-authored-by: Mike Miranda <mikemp96@gmail.com>
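A minimal illustration of the loop-variable capture pitfall behind the "capture loop variable" items above (plain Go, not the actual test code; under pre-1.22 loop semantics, closures launched from a loop share the loop variable unless it is re-declared in the loop body):
```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	pods := []string{"pod-a", "pod-b", "pod-c"}
	var wg sync.WaitGroup
	for _, pod := range pods {
		pod := pod // capture the loop variable; without this, every goroutine may see "pod-c"
		wg.Add(1)
		go func() {
			defer wg.Done()
			fmt.Println("checking", pod)
		}()
	}
	wg.Wait()
}
```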
The device_plugin_tests have not run successfully in a very long time,
initially being marked flaky and then eventually becoming stale.
The gpu_device_plugin_tests have been used to test the same behaviour,
but are incredibly high maintenance due to external changes in behaviour
from GCP/Nvidia that we have no control over.
This commit takes the existing device plugin tests, makes them look more
like the GPU tests, and removes the cases that have been unsupported for
a long time (namely restarting containers while the plugin is
unavailable).
It also removes the GPU plugin tests, as we do not get more signal by
using real devices here.
The featureGates field in ClusterConfiguration ends up
as a map[interface{}]interface{} in the test suite
and cannot be cast to map[string]bool directly.
Adapt the test to use map[interface{}]interface{}.
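A small sketch of the conversion the adapted test has to perform (the key and value here are illustrative): YAML decoded into `interface{}` yields `map[interface{}]interface{}`, which cannot be type-asserted to `map[string]bool` and must be converted entry by entry.
```go
package main

import "fmt"

func main() {
	// Shape produced by decoding featureGates from the ClusterConfiguration YAML.
	raw := map[interface{}]interface{}{"IPv6DualStack": true}

	featureGates := map[string]bool{}
	for k, v := range raw {
		name, okKey := k.(string)
		enabled, okVal := v.(bool)
		if okKey && okVal {
			featureGates[name] = enabled
		}
	}
	fmt.Println(featureGates) // map[IPv6DualStack:true]
}
```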
On a multi NUMA node environment, the kernel splits hugepages allocated via the
/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages file equally between NUMA nodes.
That makes it harder to predict where several pods will start, because the number
of hugepages on each NUMA node depends on the number of NUMA nodes in the environment.
The memory manager test will allocate hugepages on a specific NUMA node to make
the test more predictable on multi NUMA node environments.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Let's wait for the local node (aka the kubelet)
to be ready before querying podresources again,
to avoid false negatives.
Co-authored-by: Artyom Lukianov <alukiano@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
DKC is being removed and we don't want it to continue flaking the rest
of our tests. Let's disable them when DKC is disabled rather than hard
failing. This fits more in line with our other E2Es, and reduces the
maintenance load in test-infra.
We need to make sure the system state is completely cleaned up
again, to avoid messing up the shared node state, before
we transition from one test to another.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Since commit 42dd01aa3f the cpuRequest is in millicores, hence
we need to properly check the translation to exclusive cpus
when verifying the resource allocation.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The intent is to make the code more readable; no intended
changes in behaviour. Now it should be a bit more explicit
why the code is checking some values.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: wangyysde <net_use@bzhy.com>
Generate swagger.json.
Use v2 path for hpa_cpu_field.
run update-codegen.sh
Signed-off-by: wangyysde <net_use@bzhy.com>
Some of the networking tests are flaking, and logging the command stdout and stderr
might show us some additional information about the underlying issue when it
occurs.
Some tests have a short timeout for starting the pods (1 minute), but if
those tests happen to be the first ones to run, and the images have to be
pulled, then the test could time out, especially with larger images. This
commit will allow us to prepull commonly used E2E test images, so this issue
can be avoided.
The logic to detect stale endpoints was not taking endpoint
readiness into account.
We can have stale entries on UDP services for 2 reasons:
- an endpoint was receiving traffic and is removed or replaced
- a service was receiving traffic but not forwarding it, and starts
to forward it.
Add an e2e test to cover the regression
The --log-file parameter will be deprecated as of Kubernetes 1.23 and should be
avoided. The replacement for distroless images is the image with go-runner, a
tool that handles output redirection.
For kubemark to run in that image it must be built as a static binary.
* Bump the pod status and node status update timeouts to avoid flakes
* Add a small delay after the dbus restart to ensure dbus has enough time to
start up prior to sending the shutdown signal
* Change check of pod being terminated by graceful shutdown. Previously,
the pod phase was checked to see if it was `Failed` and the pod reason
string matched. This logic needs to change after 1.22 graceful node
shutdown change introduced in PR #102344 which changed behavior to no
longer put the pods into a failed phase. Instead, the test now checks
that containers are not ready, and the pod status message and reason
are set appropriately.
Signed-off-by: David Porter <david@porter.me>
For some test failures, checking the pod logs could potentially
yield some interesting information, which could be used to further
investigate certain failures / flakes (for example, if there are some
networking issues, we could at least see if requests reach the containers,
since agnhost logs the connections / requests, or if there were any
other issues during the container's startup).
It looks like it tests two pods sharing the same volume, but the goal is
actually the opposite - two pods with the same inline volume definition
should get separate volumes.
This commit forces Kubelet Configuration files to always be generated
and when possible will use the kubeletconfig file that has been provided
by the test orchestrator
This commit enables the remote runner to provide a KubeletConfiguration
file to the test suite when uploading it to a remote host; the test
runner will then use this configuration to run the Kubelet with the
provided config.
* De-share the Handler struct in core API
An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.
This never should have been shared. Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.
In the future I can also see adding lifecycle hooks that don't make
sense as probes. E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.
* Run update scripts
* fix_dsc_rbac_pod_update
* add test for DaemonSet Controller updates label of the pod after "DedupCurHistories"
* rebase
* update parameter of dsc.Run
Copying from pvcBlock swapped name and namespace (breaking the PVC test case)
and some references to the pvcBlock variable were left unchanged (incorrect
annotations for test failures).
Add an e2e test to exercise the checkpoint recovery flow.
This means we need to actually create an old (V1, pre-1.20) checkpoint,
but if we do it only in the e2e test, it's still fine.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The previous implementation called Update() without changing anything
about the object, so no MODIFIED events were ever generated. This change
ensures that all calls to Update() cause mutations, thereby ensuring
that MODIFIED events happen in the watch stream.
Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
During PR review it was pointed out that the branches for ephemeral
vs. persistent make the test harder to read. Therefore all code that depends on
those if checks gets moved into two different versions of the test, one that runs for
ephemeral volumes and one for persistent volumes, with skip statements at the
beginning.
* Cleanup FeatureGate skippers
* Perform changes requested by review
* some more review related changes
* Rename skipper functions to make code more readable
* add utilfeature back in
Conceptually, snapshots have to be taken while the pod and thus the volume
exist. Snapshotting has an issue where flushing of data is not guaranteed while
the volume is still staged on the node, so the test relied on deleting the pod
and checking for the volume to be unused. That part of the test cannot be done
for ephemeral volumes.
This is a fix for the new test case from
https://github.com/kubernetes/kubernetes/pull/105636 which had to be merged
without prior testing due to not having a cluster to test on and no pull job
which runs these
tests. https://testgrid.k8s.io/sig-storage-kubernetes#gce-serial then showed a
failure.
The fix is simple: in the ephemeral case, the PVC name isn't set in advance in
pvc.Name and instead must be computed. The fix has now been tested on a kubetest
cluster in GCE.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
This adds a new test pattern and uses it for the inline volume tests. Because
the kind of volume now varies more, validation of the mount or block device is
always done by the caller of TestEphemeral.
The hostPath volume plugin creates a directory within /tmp on the host machine, to be mounted as a volume.
The inject-pod writes content to the volume, and a client-pod tries to read the contents and verify them.
When SELinux is enabled on the host, the client-pod cannot read the content and gets permission denied.
Run the client-pod as privileged, so that it can access the volume content even when SELinux is enabled on the host.
Enable feature by default.
Update integration tests for other features to assume that finalizers are present.
Change-Id: Ie969344f572627dba882c0e862e5700dadaf3026
Besides "subPath should unmount if pod is gracefully deleted while kubelet is
down" we also need a special case for "subPath should unmount if pod is force
deleted while kubelet is down".
This fixes a test failure in https://testgrid.k8s.io/sig-storage-kubernetes#gce-serial
It shouldn't make any difference, but it's better to actually test that
assumption.
All existing tests which create pods get converted by skipping the explicit PVC
creation for the ephemeral case and instead modifying the test pod so that it
has a volume claim template with the same spec as the PVC.
The feature gate gets locked to "true", with the goal to remove it in two
releases.
All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.
Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
Each e2e test knows it wants to restart a running kubelet or a
non-running kubelet. The vast majority of times, we want to
restart a running kubelet (e.g. to change config or to check
some properties hold across kubelet crashes/restarts), but sometimes
we stop the kubelet, do some actions and only then restart.
To accommodate both use cases, we just expose the `running` boolean
flag to the e2e tests.
Having the `restartKubelet` explicitly restarting a running kubelet
helps us to troubleshoot e2e failures in which the kubelet
was supposed to be running, while it was not; attempting a restart
in such cases only muddied the waters further, making the
troubleshooting and the eventual fix harder.
In the happy path, no expected change in behaviour.
Signed-off-by: Francesco Romani <fromani@redhat.com>
In the `restartKubelet` helper, we use `exec.Command`, whose
return value is the output of the command, as `[]byte`.
The way we logged the output of the command was as a raw value, making
the output, meant to be human readable, unnecessarily hard to read.
We fix this annoying behaviour by converting the output to a string before
logging it, making it pretty obvious to understand the outcome of
the command.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This patch changes cpuCount to cpuRequest in order to cater to cases
where guaranteed pods make non-integral CPU Requests.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
apparmor is no longer found in Alpine edge/testing but in
edge/community, presumably in preparation for full-fledged inclusion in
3.15. If so, once that is released, BASEIMAGE can be updated again and
the explicit --repository flag to 'apk add' dropped.
Fixes: https://github.com/kubernetes/kubernetes/issues/105528
Once the node gets deleted, the nodelifecycle controller
is racing to update pod status and the pod deletion logic
is failing, causing tests to flake. This commit moves
the testContext creation to within the test loop and deletes nodes,
namespace within the test loop. We don't explicitly call the node
deletion within the loop but the `testutils.CleanupTest(t, testCtx)`
call ensures that the namespace, nodes gets deleted.
While running tests in parallel, especially those with higher loads
than others, it might take some time for Pods to be Running, even more
so if the image has to be pulled as well.
The test [sig-node] Pods should delete a collection of pods [Conformance]
only waits for the pods to be scheduled before deleting them, and
expects them to be gone in 1 minute, which can flake because of the above
reasons. Note that the operations are in order, and kubelet runs them in
order, which means that the pod first has to enter the Running state
before attempting to delete it.
This commit waits for the Pods to enter the Running state first before
deleting the entire collection.
Co-Authored-By: Antonio Ojea <aojea@redhat.com>
This commit fixes the LocalStorageCapacityIsolationEviction test by
acknowledging that in its default configuration kubelet will no longer
evict memory-backed volume pods as they cannot use more than their
assigned limit with SizeMemoryBackedVolumes enabled.
To account for the old behaviour, we also add a test that explicitly
disables the feature to test the behaviour of memory backed local
volumes in those scenarios. That test can be removed when/if the feature
gate is removed.
Currently the storage eviction tests fail for a few reasons:
- They re-enter storage exhaustion after pulling the images during
cleanup (increasing test storage reqs, and adding verification for
future diagnosis)
- They were timing out, as in practice it seems that eviction takes just
over 10 minutes on an n1-standard in many cases. I'm raising these to
15 to provide some padding.
This should ideally bring these tests to passing on CI, as they've now
passed locally for me several times with the remote GCE env.
Follow up work involves diagnosing why these take so long, and
restructuring them to be less finicky.
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
The recommendation from #sig-cli was to print usage, then the error. Extra care
is taken to only print the usage instruction when the error really was about
flag parsing.
Taking kube-scheduler as example:
$ _output/bin/kube-scheduler
I0929 09:42:42.289039 149029 serving.go:348] Generated self-signed cert in-memory
...
W0929 09:42:42.489255 149029 client_config.go:620] error creating inClusterConfig, falling back to default config: unable to load in-cluster configuration, KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT must be defined
E0929 09:42:42.489366 149029 run.go:98] "command failed" err="invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable"
$ _output/bin/kube-scheduler --xxx
Usage:
kube-scheduler [flags]
...
--vmodule moduleSpec
comma-separated list of pattern=N settings for file-filtered logging
Error: unknown flag: --xxx
The kubectl behavior doesn't change:
$ _output/bin/kubectl get nodes
Unable to connect to the server: dial tcp: lookup xxxx: No address associated with hostname
$ _output/bin/kubectl --xxx
Error: unknown flag: --xxx
See 'kubectl --help' for usage.
It wasn't documented that InitLogs already uses the log flush frequency, so
some commands have called it before parsing (for example, kubectl in the
original code for logs.go). The flag never had an effect in such commands.
Fixing this turned into a major refactoring of how commands set up flags and
run their Cobra command:
- component-base/logs: implicitly registering flags during package init is an
anti-pattern that makes it impossible to use the package in commands which
want full control over their command line. Logging flags must be added
explicitly now, something that the new cli.Run does automatically.
- component-base/logs: AddFlags would have crashed in kubectl-convert if it
had been called because it relied on the global pflag.CommandLine. This
has been fixed and kubectl-convert now has the same --log-flush-frequency
flag as other commands.
- component-base/logs/testinit: an exception is tests where flag.CommandLine has
  to be used. This new package can be imported to add the flags to flag.CommandLine
  once per test program.
- Normalization of the klog command line flags was inconsistent. Some commands
unintentionally didn't normalize to the recommended format with hyphens. This
gets fixed for sample programs, but not for production programs because
it would be a breaking change.
This refactoring has the following user-visible effects:
- The validation error for `go run ./cmd/kube-apiserver --logging-format=json
--add-dir-header` now references `add-dir-header` instead of `add_dir_header`.
- `staging/src/k8s.io/cloud-provider/sample` uses flags with hyphen instead of
underscore.
- `--log-flush-frequency` is not listed anymore in the --logging-format flag's
`non-default formats don't honor these flags` usage text because it will also
work for non-default formats once it is needed.
- `cmd/kubelet`: the description of `--logging-format` uses hyphens instead of
underscores for the flags, which now matches what the command is using.
- `staging/src/k8s.io/component-base/logs/example/cmd`: added logging flags.
- `apiextensions-apiserver` no longer prints a useless stack trace for `main`
when command line parsing raises an error.
We graduate the `CPUManagerPolicyOptions` feature to beta
in the 1.23 cycle, and we add new experimental feature gates
to guard new options which are planned in the 1.23 and in the
following cycles.
We introduce additional feature gates called `CPUManagerPolicyAlphaOptions` and
`CPUManagerPolicyBetaOptions`. The basic idea is to avoid the
cumbersome process of adding a feature gate for each option, and to have
feature gates which track the maturity level of _groups_ of options.
Besides this change, the graduation process, and the process in general,
for adding new policy options is still unchanged.
The `full-pcpus-only` option added in the 1.22 cycle is intentionally
moved into the beta policy options
For more details:
- KEP: https://github.com/kubernetes/enhancements/pull/2933
- sig-arch discussion:
https://groups.google.com/u/1/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw
Signed-off-by: Francesco Romani <fromani@redhat.com>
The boolean values for --dry-run have been deprecated for removal since
1.18, more than 2 releases.
The default value for --dry-run with the flag set and unspecified has
been deprecated for removal since 1.18, more than 2 releases.
Both values are now removed in this change. Any kubectl --dry-run
usage no longer accepts --dry-run=(true|false) boolean values and usage
now requires that a value of (client|server|none) is specified.
The boom-server container forges out-of-order TCP packets and injects them into the network. This requires the container to have the CAP_NET_RAW linux capability, otherwise the test will fail.
Signed-off-by: Riccardo Ravaioli <rravaiol@redhat.com>
* Updates ImpersonationConfig in rest/config.go to include UID
attribute, and pass it through when copying the config
* Updates ImpersonationConfig in transport/config.go to include UID
attribute
* In transport/round_tripper.go, set the "Impersonate-Uid" header in
requests based on the UID value in the config
* Update auth_test.go integration test to specify a UID through the new
rest.ImpersonationConfig field rather than manually setting the
Impersonate-Uid header
Signed-off-by: Margo Crawford <margaretc@vmware.com>
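A minimal sketch of the new field from the client side, assuming a client-go release that includes this change; the host, user name, and UID are illustrative.
```go
package main

import (
	"fmt"

	"k8s.io/client-go/rest"
)

func main() {
	config := &rest.Config{Host: "https://kubernetes.example.com"} // illustrative host
	// Requests built from this config carry Impersonate-User and Impersonate-Uid
	// headers; the header wiring happens inside the client-go transport.
	config.Impersonate = rest.ImpersonationConfig{
		UserName: "jane.doe@example.com",
		UID:      "06f6ce97-e2c5-4ab8-7ba5-1234567890ab",
	}
	fmt.Printf("%+v\n", config.Impersonate)
}
```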
By parsing flags in the test's main function before starting etcd we bail out
early without ever starting etcd when the test was invoked with -help.
Otherwise etcd must be available, gets started and then hangs because
flag.Parse itself exits when called by testing.go. This bypasses the code in
EtcdMain which normally stops etcd.
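A hedged sketch of the ordering described above (the etcd start/stop helper here is a placeholder, not the real framework function): parsing flags first means `-help` exits before etcd is ever launched.
```go
package integration

import (
	"flag"
	"os"
	"testing"
)

func TestMain(m *testing.M) {
	// Parse first: for -help, flag.Parse prints usage and exits here,
	// before etcd has been started.
	flag.Parse()

	stop := startEtcdForTests() // placeholder for the real test framework helper
	code := m.Run()
	stop()
	os.Exit(code)
}

// startEtcdForTests is a stub so this sketch compiles on its own.
func startEtcdForTests() func() { return func() {} }
```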
Otherwise, the nodeNameToPodList[nodeName] list will have all its references
identical (corresponding to the control variable reference),
thus making all the pods in the list identical.
The agnhost pods using netexec will bind by default to the UDP
port 8081; use a different port for hostNetwork pods to avoid
scheduling conflicts and failing the tests.
There are some tests that don't need the UDP listener, so they
can disable it.
This is especially needed for tests that use hostNetwork pods: if 2
pods try to bind to the same port, the test will fail because one
of the pods can't be scheduled because of the port conflict.
To keep backwards compatibility, we can add an option to disable
the UDP listener by setting the port number to -1, which is consistent
with the SCTP implementation.
The issue on both tests is that before the refactor we had a method that
was creating the `StorageClass` manifest only; this manifest was used
later to be created by `TestBindingWaitForFirstConsumerMultiPVC`. After
the refactor we're ensuring that the `StorageClass` exists as a resource
before calling `TestBindingWaitForFirstConsumerMultiPVC`, however this
method is still attempting to create it. That's the reason behind the
error: `resourceVersion should not be set on objects to be created`.
This issue wasn't caught before because
`TestBindingWaitForFirstConsumerMultiPVC` is creating the StorageClass
without the common utility function; the solution is to remove the
snippet that attempts to create the StorageClass again.
This test case requires special test-handler setup which is only done
for gce clusters created by kube-up scripts. Let's skip the test when
run under other providers.
Use ExtraConfig to configure the repair interval,
and add an integration test for services finalizers and
possible races with the services repair loop.
- Debian base used was older (v2.1.3) missing multiple fixed CVEs
- Minor update to distroless debian image name to explicitly point
to debian 10
- Debian base image now points to buster-1.9.0
The Topology Manager e2e tests want to run on real multi-NUMA systems
and want to consume real devices supported by device plugins; SRIOV
devices happen to be the most commonly available of such devices.
CI machines aren't multi NUMA and don't expose SRIOV devices, so the biggest portion
of the tests will just skip, and we need to keep it like this until we
figure out how to enable these features.
However, some organizations can and want to run the testsuite on bare metal;
in this case, the current tests will skip (not fail) on misconfigured
boxes, which reports a misleading result. It would be much better to
fail if the test preconditions aren't met.
To satisfy both needs, we add an option, controlled by an environment
variable, to fail (not skip) if the machine on which the tests run
doesn't meet the expectations (multi-NUMA, 4+ cores per NUMA cell,
expose SRIOV VFs).
We keep the old behaviour as default to keep being CI friendly.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Previously we would try to infer the `ipFamilyPolicy` from `clusterIPs`
and/or `ipFamilies`. That is too tricky. Now you MUST specify
`ipFamilyPolicy` as one of the dual-stack options in order to get a
dual-stack service.
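A sketch of what callers must now do to get a dual-stack Service: set `ipFamilyPolicy` explicitly instead of relying on inference. Names and selector are illustrative; this only builds the object.
```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	policy := corev1.IPFamilyPolicyPreferDualStack
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "dual-stack-demo", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			Selector: map[string]string{"app": "demo"},
			Ports:    []corev1.ServicePort{{Port: 80}},
			// Without an explicit policy, the Service stays single-stack.
			IPFamilyPolicy: &policy,
			IPFamilies:     []corev1.IPFamily{corev1.IPv4Protocol, corev1.IPv6Protocol},
		},
	}
	fmt.Printf("%+v\n", svc.Spec)
}
```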
In older versions of Kubernetes (at least pre-0.19, the earliest version
this test will run unmodified on), Pods that depended on devices could be
restarted after the device plugin had been removed. Currently however,
this isn't possible: during ContainerManager.GetResources(), we
attempt to call DeviceManager.GetDeviceRunContainerOptions(), which fails as
there's no cached endpoint information for the plugin type.
This commit therefore breaks apart the existing test into two:
- One active test that validates that assignments are maintained across
restarts
- One skipped test that validates the behaviour after GPUs have been
removed, in case we decide that this is a bug that should be fixed in
the future.
15m is enough for Cluster Autoscaler to remove empty nodes, so we need
to break them sooner than that. Instead, wait 15m after breaking them to
ensure Cluster Autoscaler will consider them as unready instead of still
starting.
The profile gatherer has been removed in
https://github.com/kubernetes/kubernetes/pull/85304, so those options
are unused since then and can therefore be removed.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The e2e test "should have Endpoints and EndpointSlices pointing to
the API Server Service" was veryfing the current endpoints
reconciler implementation on the apiservers, however, users may
disable the endpoint reconciler and create their own.
This e2e test is also a conformance test, so we should test the
behaviour and not the implementation details. The test verifies
that a kubernetes.default service exists, and that Endpoints and
EndpointSlice objects referencing that service exist and are equivalent.
The Container Images for Windows Server 2022 have been published, and we can
start adding jobs for them.
The ltsc2022-based images have been built and promoted with these image versions.
The PR https://github.com/kubernetes/kubernetes/pull/104575 introduces
some intermediate types which makes the 32GiB memory machine kill the
typecheck process. To resolve that issue and make the test more robust,
we now reduce the amount of parallel typechecks to run to `2`.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Prior to this change, the pod was not getting scheduled on the node as
we don't have a running scheduler in e2e_node. PodClient solves this
problem by manually assigning the pod to the node.
The current GPU installer was built in 2017, from source that no longer
exists in Kubernetes ([adding commit][1]). The image was built on 2017-06-13.
Unfortunately, this installer no longer appears to work. When debugging
on the same node type as used by test-infra, it failed to build the
driver as the kernel sha was no longer available.
This led to needing to find a new way to install GPUs. The smallest
logical change was switching to [cos-gpu-installer][2].
There is a newer version of this available on [googlesource][3] that
I have not yet tested as it's not clear what the state of the project
is, as I couldn't find docs outside of the source itself.
We install things to the same location as previously to avoid needing
extra downstream changes. There are a couple of weird issues here
however, like needing to run the container twice to correctly update the
LD Cache.
[1]: 1e77594958/cluster/gce/gci/nvidia-gpus/Dockerfile
[2]: https://github.com/GoogleCloudPlatform/cos-gpu-installer
[3]: https://cos.googlesource.com/cos/tools/+/refs/heads/master/src/cmd/cos_gpu_installer/
Different CSI drivers have different error messages, making it difficult
to check them accurately. We remove the check for the error message and
only check the failure type instead, since that is all we need.
If the device plugin returns a device without topology, keep it internally
as NUMA node -1; it helps at the podresources level to not export NUMA
topology, otherwise topology is exported with NUMA node id 0,
which is not accurate.
It's impossible to unveil this bug just by tracing json.Marshal(resp)
in the podresources client, because the NUMANodes field ID has the json property
omitempty; in this case, an ID of 0 is shown as an empty NUMANode.
To reproduce it, better to iterate on devices and just
trace dev.Topology.Nodes[0].ID.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
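A small stand-alone illustration of the `omitempty` pitfall described above (the type names are simplified stand-ins for the podresources API types): a NUMA node with ID 0 marshals as an empty object, so tracing the marshalled response hides the bug.
```go
package main

import (
	"encoding/json"
	"fmt"
)

type NUMANode struct {
	ID int64 `json:"ID,omitempty"`
}

type Topology struct {
	Nodes []*NUMANode `json:"nodes,omitempty"`
}

func main() {
	topo := Topology{Nodes: []*NUMANode{{ID: 0}}}
	out, _ := json.Marshal(topo)
	fmt.Println(string(out))      // {"nodes":[{}]} -- the zero ID is dropped
	fmt.Println(topo.Nodes[0].ID) // printing the field directly shows the real value (0)
}
```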
The Container Images for Windows Server 2022 have been published, and
we can start building test images using them, so we can start adding
jobs for them.
The image versions for the e2e test images have been bumped in a previous
commit, but haven't been promoted yet. We don't need to bump them here.
httpd-2.4.46-win64-VC15.zip no longer exists, so we have to use
httpd-2.4.48-win64-VC15.zip instead.
Even though DynamicKubeletConfig is deprecated, it is still used in the e2e_node tests.
The bug is hidden by the forcibly enabled option
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true';
if this option is not enabled, setKubeletConfiguration tries to set the
kubelet config via the apiserver interface and fails with a timeout.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Agnhost's serve-hostname at endpoint /hostname
will return the hostname. A pod's host node name may
be an FQDN, so comparison between the two fails.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
The Container Images for Windows Server 2022 have been published, and
we can start building test images using them, so we can start adding
jobs for them.
The image versions for the e2e test images have been bumped in a previous
commit, but haven't been promoted yet. We don't need to bump them here.
We're starting with windows-servercore-cache and busybox images, since
they are needed for the other images the most.
A previous commit added LD_FLAGS for the go binary compilation, but it's not
defined for all images.
The pods using hostNetwork use the host network namespace, hence
they have to share it with the rest of the processes and pods.
If several pods try to bind to the same port, the test will fail,
so we try to use a non-common port and run the different scenarios
in the same test, so we only have to bind once and we avoid consuming
ports, reducing the port collision risk.
In the test image build jobs, the image-util.sh script is not being run in a git
repository, which causes git log to fail.
In this case, we can use the PULL_BASE_SHA set in cloudbuild.yaml instead.
Windows Containerd has more features than Windows Docker. One of them is single file
mappings, allowing us to also map individual files into containers, not just folders.
This will set the tag [Excluded:WindowsDocker] for those tests instead of [LinuxOnly].
Co-authored-by: Mark Rossetti <marosset@microsoft.com>
A container without elevated privileges binding to a
host port less than 1024 causes a bind permission
denied error.
Increase the port number to greater than 1024 to allow
binding.
Signed-off-by: Martin Kennelly <mkennell@redhat.com>
All dependencies of VolumeBinding plugin from
"k8s.io/kubernetes/pkg/controller/volume/scheduling" package moved to
"k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding" package:
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache.go
- whole file pkg/controller/volume/scheduling/scheduler_assume_cache_test.go
- whole file pkg/controller/volume/scheduling/scheduler_binder.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_fake.go
- whole file pkg/controller/volume/scheduling/scheduler_binder_test.go
Package "k8s.io/kubernetes/pkg/controller/volume/scheduling/metrics" moved
to "k8s.io/kubernetes/pkg/scheduler/framework/plugins/volumebinding/metrics"
because it is only used in the VolumeBinding plugin and (e2e) tests.
More described in issue #89930 and PR #102953.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
Currently, whenever agnhost/VERSION is bumped, the version in
agnhost/agnhost.go has to be bumped as well. This is also verified
on presubmit (build/dependencies.yaml).
This means that whenever we need to bump the agnhost image version,
someone has to approve the build/dependencies.yaml, which is not as
easy.
This commit removes the need for this check by automatically setting
the Version inside agnhost.go at build time, simplifying the process.
This commit adds an e2e test for the kubelet flags `--lock-file` and
`--exit-on-lock-contention`. Eventually we would like to move them to the
kubelet configuration file rather than flags.
This test is based on the premise that whenever there is a lock
contention on the lock file (e.g. /var/run/kubelet.lock), the running
kubelet must terminate and wait for the lock on the lock file to
be released before starting again.
In this test we simulate that behaviour of a lock contention: the test
tries to acquire the lock on the lock file.
Success of the test is determined by the kubelet health check, which should
fail while the lock is held by the test and pass once the lock on the lock
file is released.
Signed-off-by: Imran Pochi <imran@kinvolk.io>
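A hedged sketch of how a test process can create contention on the lock file via flock(2); the path mirrors the example above, and the real e2e helpers and health-check polling are not shown.
```go
package main

import (
	"log"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.OpenFile("/var/run/kubelet.lock", os.O_RDWR|os.O_CREATE, 0o600)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Holding an exclusive lock contends with a kubelet started with
	// --lock-file=/var/run/kubelet.lock --exit-on-lock-contention; the kubelet
	// is expected to terminate (health check fails) until the lock is released.
	if err := unix.Flock(int(f.Fd()), unix.LOCK_EX); err != nil {
		log.Fatal(err)
	}
	log.Println("lock acquired; kubelet health check should fail")

	if err := unix.Flock(int(f.Fd()), unix.LOCK_UN); err != nil {
		log.Fatal(err)
	}
	log.Println("lock released; kubelet should restart and become healthy")
}
```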
DynamicFileCAContent and DynamicCertKeyPairContent used periodical job
to check whether the file content has changed, leading to 1 minute of
delay in worst case. This patch improves it by leveraging fsnotify
watcher. The content change will be reflected immediately.
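A minimal sketch of the watch-based approach using fsnotify (the file path is illustrative; the real implementation lives in the dynamic certificate controllers): react to writes immediately instead of polling once a minute.
```go
package main

import (
	"log"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	if err := watcher.Add("/etc/kubernetes/pki/ca.crt"); err != nil {
		log.Fatal(err)
	}
	for {
		select {
		case event := <-watcher.Events:
			if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
				log.Printf("%s changed, reloading content", event.Name)
				// re-read and re-verify the file content here
			}
		case err := <-watcher.Errors:
			log.Println("watch error:", err)
		}
	}
}
```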
Tests "Multi-AZ Cluster Volumes" should consider only nodes that are
schedulable and *untainted* when computing AZ where to run the tests.
GetReadySchedulableNodes() already filters schedulable + untainted nodes,
no need to do it again in GetSchedulableClusterZones().
For some reason when we send them to journald, many log lines are
consistently dropped as soon as the PLEG is started.
If we log directly to file, we don't have this problem. As a bonus, if
the tests crash, the kubelet logs will always be available since they
were already written; otherwise we normally wait until the end of the
test run to collect them from journald, meaning that we often end up
with empty logs.
Removes any reference from the registry gcr.io/kubernetes-e2e-test-images in
kubernetes/kubernetes, replacing it with k8s.gcr.io/kubernetes-e2e-test-images.
In some cases, the images had to be updated since a few things have changed since
their original implementation, most notably being the fact that some of the images
have been centralized into the agnhost image.
Co-Authored-By: Claudiu Belu <cbelu@cloudbasesolutions.com>
- recover to last-known-good ConfigMap.KubeletConfigKey
~12m to run in CI, 13m locally
- non-nil last-known-good to a new non-nil last-known-good
~24m to run in CI
- recover to last-known-good ConfigMap
~12m to run in CI
- state transitions
~8m to run in CI
Including a skip method as the first line of a test does not prevent the test from failing in the BeforeEach function.
If the test is skipped because of a tag in the name, then we can prevent such odd behavior.
We now use a host local exec instead of SSH commands to simplify the
test and make the result more robust.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This assumes that SSH via bastion works if the `KUBE_SSH_BASTION`
environment variable is set, which is the case for
`pull-kubernetes-e2e-gce-correctness`.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
1. fix command empty issue for some Windows storage tests
2. enable more windows storage tests by adding ntfs test pattern
Change-Id: Ic33be282d669a23107474a14d4368bbf95c9b459
This adds an e2e test for HPA ContainerResource metrics, covering two scenarios:
1. Scale up on a busy application with an idle sidecar container
2. Do not scale up on a busy sidecar with an idle application.
Signed-off-by: Vivek Singh <svivekkumar@vmware.com>
* Squashed commit of the following:
commit 7f774dcb54b511a3956aed0fac5c803f145e383a
Author: Jay Vyas (jayunit100) <jvyas@vmware.com>
Date: Fri Jun 18 10:58:16 2021 +0000
fix commit message
commit 0ac09650742f02004dbb227310057ea3760c4da9
Author: jay vyas <jvyas@vmware.com>
Date: Thu Jun 17 07:50:33 2021 -0400
Update test/e2e/network/netpol/kubemanager.go
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
commit 6a8bf0a6a2690dac56fec2bdcdce929311c513ca
Author: jay vyas <jvyas@vmware.com>
Date: Sun Jun 13 08:17:25 2021 -0400
Implement Service polling for network policy suite to remove reliance on CoreDNS when verifying network policys
Update test/e2e/network/netpol/probe.go
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Add deafultNS to use service probe
commit b9c17a48327aab35a855540c2294a51137aa4a48
Author: Matthew Fenwick <mfenwick100@gmail.com>
Date: Thu May 27 07:30:59 2021 -0400
address code review comments for networkpolicy decoupling from dns
commit e23ef6ff0d189cf2ed80dbafed9881d68402cb56
Author: jay vyas <jvyas@vmware.com>
Date: Wed May 26 13:30:21 2021 -0400
NetworkPolicy decoupling from DNS
gofmt
remove old function
* model refactor
* minor
* dropped getK8sModel func
* dropped modelMap, added global model in BeforeEach and subsequent changes
Co-authored-by: Rajas Kakodkar <rajaskakodkar16@gmail.com>
Prevent Kubelet from incorrectly interpreting "not yet started" pods as "ready to terminate pods" by unifying responsibility for pod lifecycle into pod worker
Add e2e tests to cover the basic flows for the `full-pcpus-only` option:
negative flow to ensure rejection with proper error message, and
positive flow to verify the actual cpu allocation.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Through Job.status.uncountedPodUIDs and a Pod finalizer
An annotation marks if a job should be tracked with new behavior
A separate work queue is used to remove finalizers from orphan pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
As of now, we allow PDBs to be applied to pods via
selectors, so there can be unmanaged pods (pods that
don't have backing controllers) that still have PDBs associated.
Such pods are to be logged instead of immediately throwing
a sync error. This ensures disruption controller is
not frequently updating the status subresource and thus
preventing excessive and expensive writes to etcd.
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.
Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
Removing containers now no longer blocks final pod deletion in the
API server and are handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.
See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
UserInfo contains a uid field alongside groups, username and extra.
This change makes it possible to pass a UID through as an impersonation header like you
can with Impersonate-Group, Impersonate-User and Impersonate-Extra.
This PR contains:
* Changes to impersonation.go to parse the Impersonate-Uid header and authorize uid impersonation
* Unit tests for allowed and disallowed impersonation cases
* An integration test that creates a CertificateSigningRequest using impersonation,
and ensures that the API server populates the correct impersonated spec.uid upon creation.
1. add AllocateLoadBalancerNodePorts fields in specs for validation test cases
2. update fuzzer
3. in resource quota e2e, allocate node port for loadbalancer type service and
exceed the node port quota
Signed-off-by: Hanlin Shi <shihanlin9@gmail.com>
This change updates the CSR API to add a new, optional field called
expirationSeconds. This field is a request to the signer for the
maximum duration the client wishes the cert to have. The signer is
free to ignore this request based on its own internal policy. The
signers built-in to KCM will honor this field if it is not set to a
value greater than --cluster-signing-duration. The minimum allowed
value for this field is 600 seconds (ten minutes).
This change will help enforce safer durations for certificates in
the Kube ecosystem and will help related projects such as
cert-manager with their migration to the Kube CSR API.
Future enhancements may update the Kubelet to take advantage of this
field when it is configured in a way that can tolerate shorter
certificate lifespans with regular rotation.
Signed-off-by: Monis Khan <mok@vmware.com>
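A minimal sketch of a CSR object that uses the new optional field (the name and the PEM request bytes are placeholders; the signer may clamp or ignore the requested duration):
```go
package main

import (
	"fmt"

	certificatesv1 "k8s.io/api/certificates/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	expiration := int32(3600) // must be >= 600 (ten minutes)
	csr := &certificatesv1.CertificateSigningRequest{
		ObjectMeta: metav1.ObjectMeta{Name: "short-lived-client-cert"},
		Spec: certificatesv1.CertificateSigningRequestSpec{
			// Placeholder PEM; a real request is a CSR generated by the client.
			Request:           []byte("-----BEGIN CERTIFICATE REQUEST-----\n...\n-----END CERTIFICATE REQUEST-----\n"),
			SignerName:        "kubernetes.io/kube-apiserver-client",
			Usages:            []certificatesv1.KeyUsage{certificatesv1.UsageClientAuth},
			ExpirationSeconds: &expiration,
		},
	}
	fmt.Printf("requested lifetime: %ds\n", *csr.Spec.ExpirationSeconds)
}
```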
Ensure resources are created in zone with schedulable
nodes. For example, if we have 4 zones with 3 zones
having worker nodes and 1 zone having master nodes (unschedulable
for workloads), we should not create resources like PV, PVC or
pods in that zone.
We're running ubernetes tests
`should only be allowed to provision PDs in zones
where nodes exist`
on gcp & gke. While the test is useful in exercising
the scenario of identifying an extra zone and
creating a node in it, not every Kube
distribution uses the same approach to create a node;
further, even if there is an extra zone, we cannot
guarantee the zone to have enough quota. There can also
be other GCP-specific edge cases, all of which cannot be
covered within this test. So, removing the test,
as agreed upon with the storage team.
The data structure would wrap an embedded filesystem and the root
directory relative to which the embedded filesystem is constructed.
Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
We're trying to fix https://github.com/kubernetes/kubernetes/issues/75355
for a long time now, and we believe the current timeout could
actually be too low (despite being "forever", which is 30s).
To validate this theory, we set the timeout to one full minute.
Also, make the logging more verbose to make the troubleshooting easier.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR https://github.com/kubernetes/kubernetes/pull/100041 updated
node-problem-detector to v0.8.7, but unfortunately we didn't also
update the image used in the e2e_node tests.
As a result, the tests were failing like
E2eNode Suite: [sig-node] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial] SystemLogMonitor should generate node condition and events for corresponding errors
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:301
Timed out after 60.000s.
Expected success, but got an error:
<*errors.errorString | 0xc0011f2600>: {
s: "expected total number of events was 4, actual events counted was 7\nEvents
This in turn was one of the contributing factors in making the
pull-kubernetes-node-kubelet-serial lane constantly failing.
This patch updates the image used in the tests, fixing the failure.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Some tests are checking the network connectivity using gomega.Consistently,
which will fail if any of the checks fails. This could lead to flakiness in
some scenarios in which kube-proxy was supposed to apply Policies for
Kubernetes services.
We can instead wait for the network connectivity to work first using gomega.Eventually,
after which we can check the consistency.
For manifest lists containing Windows images, it is important to also have the "os.version"
annotation set, as it is needed by the Windows nodes, so they can pull the appropriate image
from the list.
Previously, the docker manifest CLI did not have the capability to set it, so we had to set
it ourselves in the manifest list's image JSON file. This is no longer necessary since
docker 20.10.0, which includes docker manifest annotate --os-version.
The docker installed in the image gcr.io/k8s-testimages/gcb-docker-gcloud:v20210622-762366a
satisfies this version requirement.
The CPUManager graduated to beta a while ago (k8s 1.10?)
so let's get rid of the obsolete Alpha tag on its e2e tests.
Signed-off-by: Francesco Romani <fromani@redhat.com>
- verify memory manager data returned by `GetAllocatableResources`
- verify pod container memory manager data
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
In the case of multinode clusters, the http server pod and the test cluster can
spawn on different nodes, which can be problematic for poststart / prestop hooks,
as they are executed by the kubelet itself, and the cross-node lifecycle hook might
fail (according to the Kubernetes network model, it is not mandatory for kubelet to
be able to access pods on a different node).
This commit ensures that the test pod spawns on the same node as the http server pod.
This test verifies an implementation detail in the in-tree gcepd
plugin. The behavior is not implemented in the gcepd CSI driver
and therefore the test will be obsolete after CSI migration.
Co-Authored-By: Riaan Kleinhans <riaan@ii.coop>
e2e test validates the following 3 extra endpoints
- patchAppsV1NamespacedStatefulSet
- listAppsV1StatefulSetForAllNamespaces
- deleteAppsV1CollectionNamespacedStatefulSet
Some CSI drivers can't clone a volume into other topology segment (e.g. a
cloud availability zone). The scheduler does not know about these
restrictions and schedules pods with PVCs that clone a volume mostly
randomly.
Run all volume cloning tests in the same topology segment, if such segment
is available and has at least one schedulable node.
The MetricsGrabber itself knows now whether it supports each
component. The checks inside the tests therefore are redundant at best
or, worse, they are wrong: for example, on a KinD cluster the check for
"has master node registered" failed and metrics grabbing from
scheduler and controller manager were skipped unnecessarily.
The MetricsGrabber checked whether a component supported metrics
grabbing, but then tests didn't have an API to use the result of that
check. Because metrics grabbing is an optional debug feature, tests
must skip checks that depend on metrics data or, when the entire
test is about metrics data, skip the test.
This is now supported with a special error that gets wrapped and
returned by the individual Grab functions.
This can be checked by trying to retrieve log output. As in the case
of no pod found, a warning gets emitted when log retrieval fails and
metrics grabbing gets disabled.
Logging is checked instead of actual metrics retrieval because the
latter is more complex and thus more likely to fail for other reasons.
The previous approach with grabbing via a nginx proxy had some
drawbacks:
- it did not work when the pods only listened on localhost (as
configured by kubeadm) and the proxy got deployed on a different
node
- starting the proxy raced with starting the pods, causing
sporadic test failures because the proxy was not set up
properly unless it saw all pods when starting the e2e.test
- the proxy was always started, whether it is needed or not
- the proxy was left running after a test and then the next
test run triggered potentially confusing messages when
it failed to create objects for the proxy
The new approach is similar to "kubectl port-forward" + "kubectl get
--raw". It uses the port forwarding feature to establish a TCP
connection via a custom dialer, then lets client-go handle TLS and
credentials.
Somehow verifying the server certificate did not work. As this
shouldn't be a big concern for E2E testing, certificate checking gets
disabled on the client side instead of investigating this further.
The value here is that the exec plugin author can use the kubeconfig to assert
how standard input is treated with respect to the exec plugin, e.g.,
- an exec plugin author can ensure that kubectl fails if it cannot provide
standard input to an exec plugin that needs it (Always)
- an exec plugin author can ensure that a client-go process will still call an
exec plugin that prefers standard input even if standard input is not
available (IfAvailable)
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
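A hedged sketch of the corresponding kubeconfig exec entry expressed via the client-go API types (assumes a client-go release with the interactiveMode field; the command and API version are illustrative):
```go
package main

import (
	"fmt"

	clientcmdapi "k8s.io/client-go/tools/clientcmd/api"
)

func main() {
	exec := &clientcmdapi.ExecConfig{
		Command:    "my-credential-plugin", // hypothetical plugin binary
		APIVersion: "client.authentication.k8s.io/v1beta1",
		// Always:      fail if stdin cannot be provided to the plugin.
		// IfAvailable: still call the plugin even when stdin is unavailable.
		// Never:       the plugin never needs stdin.
		InteractiveMode: clientcmdapi.IfAvailableExecInteractiveMode,
	}
	fmt.Printf("%+v\n", exec)
}
```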
The apiserver and test suite in node e2e runs under the sshd daemon
that can limit the number of files it can open. Set a higher limit
to address the issues.
Signed-off-by: Odin Ugedal <odin@uged.al>
on cgroup v2 the reported metric is recursive for the entire cgroup and it
includes all the sub cgroups.
Adjust the test accordingly.
Closes: https://github.com/kubernetes/kubernetes/issues/99230
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Node e2e tests exceeding the global timeout are sent SIGINT, resulting
in no artifacts or console output. This will ignore the first SIGINT,
and since all children processes are being stopped due to SIGINT, we can
clean up before exiting.
This reverts commit
c15fd76ee9. Most (all?) of the hostpath
tests and several other tests started to fail again in
gce-scale-master-correctness after re-enabling the controller. This
shows that it was not just the obsolete agent which causes scalability
problems, but also the controller.
It has to be disabled until the scalability problems are addressed.
This makes sure the test cases that cannot run locally will be skipped
instead of throwing a misleading failure message.
Signed-off-by: Dave Chen <dave.chen@arm.com>
CSI drivers need to pass special mount opts to the XFS filesystem to be able to
mount a volume + its clone or its restored snapshot on the same node. Add a
test to exhibit this behavior.
The test is optional for now, giving CSI drivers time to fix it.
These tests have been flaky for a long time, with a relatively low
rate of flakes. Nonetheless it seems better to extend the timeouts to
reduce the flakiness.
It was disabled together with the agent to avoid test failures in
gce-master-scale-correctness (https://github.com/kubernetes/kubernetes/issues/102452). That
solved the problem, but we still need to check whether the controller
alone works.
They are not needed for any of the tests and may be causing too much
overhead (see
https://github.com/kubernetes/kubernetes/issues/102452#issuecomment-854452816).
We already disabled them earlier and then re-enabled them again
because it wasn't clear how much overhead they were causing. A recent
change in how the sidecars get
deployed (https://github.com/kubernetes/kubernetes/pull/102282) seems
to have made the situation worse again. There's no logical explanation
for that yet, though.
(cherry picked from commit 0c2cee5676e64976f9e767f40c4c4750a8eeb11f)
As seen in https://github.com/kubernetes/kubernetes/issues/102452, we
currently don't have pod events for the CSI driver pods because of the
different namespace and would need them to determine whether the
driver gets evicted.
Previously, only changes of the pods where logged. Perhaps even more
interesting are events in the namespace.
The "[Feature:SCTP]" tag was needed on "should not allow access by TCP
when a policy specifies only SCTP" back when SCTP was alpha, because
it wasn't possible to create a policy that even mentioned SCTP without
enabling the feature gate. This no longer applies, and the tag was
removed from the original copy of network_policy.go, but accidentally
got left behind in the netpol/ version.
Likewise, the newly-added "should not allow access by TCP when a
policy specifies only UDP" got tagged "[Feature:UDP]", but this was
never necessary, and is inconsistent with other UDP tests anyway.
Similarly, we need "[Feature:SCTPConnectivity]" on tests that make
SCTP connections, because that functionality is not available in all
clusters, but "[Feature:UDPConnectivity]" is unnecessary and
inconsistent.
This image is the same as "gcr.io/authenticated-image-pulling/windows-nanoserver:v1"
that is used for the "should be able to pull from private registry with secret"
test on Windows.
Adding this image will allow other people to build and push their own
images to their own private registries.
Make sure to use SIGKILL so that the service is killed in a dirty way.
In case the container runtime uses "Restart=on-abnormal" in systemd, killing
with SIGTERM will not restart the service, as the kill looks intentional
and clean. This is used by cri-o by default.
We can indirectly retrieve the kube-cross version from the
`build/build-image/cross/VERSION` for the sample-apiserver. This allows
us to simplify the handling in `build/dependencies.yaml` as well as
the required approval (via `OWNERS`) if the kube-cross version changes.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Current test assumes that test pod is deleted when the test
namespace is deleted. However, namespace deletion is an asynchronous
operation. The pod may still be running and allocating hugepages
resources when next test case creates another pod that requests
the same hugepages resources. This can cause kubelet to fail the test
pod with this kind of error:
OutOfhugepages-2Mi: Node didn't have enough resource: hugepages-2Mi
requested: 6291456, used: 6291456, capacity: 10485760
Explicitly deleting test pod should fix this issue.
Update dependencies and the test images to use pause 3.5. We also
provide a changelog entry for the new container image version.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
The e2e test that creates/deletes pods rapidly and verifies their status
was reporting a very long timing:
timings total=12.211347385s t=491ms run=2s execute=450402h8m25s
in a few scenarios. Add error checks that clarify when this happens
and why. Report p50/75/90/99 latencies on teardown as observed from
the test for baseline for future changes.
This change fixes the "Variable Expansion should verify that a failing subpath
expansion can be modified during the lifecycle of a container" conformance test
where the annotations are replaced with those used by the test, instead of
appending them.
This may cause undesirable side-effects with other controllers that may be
using pod annotations.
If a user specifies basic auth, then apply the same short circuit logic
that we do for bearer tokens (see comment).
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
e2e test validates the following 3 extra endpoints
- replaceAppsV1NamespacedDaemonSetStatus
- readAppsV1NamespacedDaemonSetStatus
- patchAppsV1NamespacedDaemonSetStatus
e2e test validates the following 3 extra endpoints
- replaceAppsV1NamespacedReplicaSetStatus
- readAppsV1NamespacedReplicaSetStatus
- patchAppsV1NamespacedReplicaSetStatus
Remove the e2e test for ClusterStatus in the kubeadm suite.
The object was deprecated in a previous release and is no longer
written by kubeadm v1.22 in the kubeadm-config config map.
e2e test validates the following 3 endpoints
- readAppsV1NamespacedDeploymentStatus
- replaceAppsV1NamespacedDeploymentStatus
- patchAppsV1NamespacedDeploymentStatus
e2e test validates the following 3 endpoints
- patchAppsV1NamespacedStatefulSetStatus
- readAppsV1NamespacedStatefulSetStatus
- replaceAppsV1NamespacedStatefulSetStatus
- ceph deploy on ARM64 depends on "libec_jerasure_neon.so" which is not included
in `ceph-base` package in fedora26 distro, updated the distro to fedora33 to
fix the issue
```
sh ./mon.sh "$(hostname -i)"
/usr/lib64/ceph/erasure-code/libec_jerasure_neon.so: cannot open shared object file
```
- default pool `rbd` is not created on arm64, need to create this pool manually.
```
rbd import --image-feature layering block foo
rbd: error opening default pool 'rbd'
```
Signed-off-by: Dave Chen <dave.chen@arm.com>