Automatic merge from submit-queue
Fix the properties file for node e2e cri validation.
I fixed this locally before, but accidentally missed in the PR. Sorry about that.
This time, I've tried myself, it should work.
@yujuhong
Automatic merge from submit-queue
Bump up GCI version.
```release-note
Upgrading Container-VM base image for k8s on GCE. Brief changelog as follows:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
```
Fixes#32596
This needs a cherrypick into v1.4 release branch because it is fixing v1.4 release blocking issues. This patch is easy and safe to rollback in case of emergencies.
@vishh can you please review?
Fixes#32596 and many other issues.
cc/ @kubernetes/goog-image FYI
Brief changelog compared to gci-dev-54-8743-3-0:
- Fixed performance regression in veth device driver
- Docker and related binaries are statically linked
- Fixed the issue of systemd being oom-killable
- Updated built-in kubelet version to 1.3.7
- add ethtool and ebtables binaries expected by kubelet
Fixes#32596
Automatic merge from submit-queue
Node E2E: Add image white list
This is part of #29081. Fixes#29155.
As is discussed with @yujuhong in #29155, it is difficult to maintain the prepull image list if it is not enforced.
This PR added an image white list in the test framework, only images in the white list could be used in the test. If the image is not in the white list, the test will fail with reason:
```
Image "XXX" is not in the white list, consider adding it to CommonImageWhiteList in test/e2e/common/util.go or NodeImageWhiteList in test/e2e_node/image_list.go
```
Notice that if image pull policy is `PullAlways`, the image is not necessary to be in the white list or prepulled, because the test expects the image to be pulled during the test.
Currently, the image white list is only enabled in node e2e, because the image puller in e2e test is not integrated with the image white list yet.
/cc @kubernetes/sig-node
Automatic merge from submit-queue
[kubelet] Fix oom-score-adj policy in kubelet
Fixes#32238
We have been having this regression since v1.3. It is critical for GKE/GCE deployments of k8s because docker daemon has a high likelihood of being OOM killed which will end up nuking all containers.
The reason for moving from mnt to pid is that docker daemon moves itself into a new mnt namespace with systemd based deployments.
Automatic merge from submit-queue
change the error log for empty resource usage
This PR changes the error log for empty resource usage buffer for a container to be more clear. It happens when the container name is wrong, or cAdvisor somehow does not response.
Automatic merge from submit-queue
Get image and machine info from apiserver in node e2e test
This PR changes node e2e test to get image and machine information from API server instead of pass them from Jenkins test framework. The original format to pass image and machine info is naming the test node as "machine-image-uuid", which is hard to parse because "-" occurs a lot in both machine and image names.
Now we add two labels "image" and "machine" into performance data. The machine type has the format "cpu:1core,memory:3.6GB".
This PR is based on #32250.
Automatic merge from submit-queue
Add node e2e density test using 60 QPS for benchmark
This PR adds a new benchmark node e2e density test which sets Kubelet API QPS limit from default 5 to 60, through ConfigMap.
The latency caused by API QPS limit is as large as ~30% when creating a large batch of pods (e.g. 105). It makes the pod startup latency, as well creation throughput underestimated. This test helps us to know the real performance of Kubelet core.
Automatic merge from submit-queue
Update container image version for downward api volume tests
Some tests were using 0.7, and some were using 0.6, so updating all to 0.7.
@kubernetes/rh-cluster-infra
Automatic merge from submit-queue
Automated Docker Validation: Change wrong name in perf config.
The config key `containervm-density*` is improper, remove it.
/cc @coufon
Automatic merge from submit-queue
Log pressure condition, memory usage, events in memory eviction test
I want to log this to help us debug some of the latest memory eviction test flakes, where we are seeing burstable "fail" before the besteffort. I saw (in the logs) attempts by the eviction manager to evict besteffort a while before burstable phase changed to "Failed", but the besteffort's phase appeared to remain "Running". I want to see the pressure condition interleaved with the pod phases to get a sense of the eviction manager's knowledge vs. pod phase.
Automatic merge from submit-queue
Plumb --feature-gates from TEST_ARGS to components in node e2e tests
This means you can set `TEST_ARGS` on the command line, in a `.properties` config for a Jenkins job, etc, to toggle gated features. For example:
`TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'`
/cc @vishh @jlowdermilk
Automatic merge from submit-queue
Rename ConnectToDockerOrDie to CreateDockerClientOrDie
This function does not actually attempt to connect to the docker daemon, it just creates a client object that can be used to do so later. The old name was confusing, as it implied that a failure to touch the docker daemon could cause program termination (rather than just a failure to create the client).
Automatic merge from submit-queue
Node E2E: Fix wrong permission bit for log file.
When creating log for logs from journald, we use `0755` which is weird to me.
This PR changes it to `0666`.
Automatic merge from submit-queue
Move wait for pressure to subside to AfterEach
so we still wait if the test part of the test for eviction order fails.
Prior to this change, a K8s branch (master as well as release) was
pinned to a GCI milestone. It would pick up the latest GCI release on
that milestone at the time of cluster creation. The rationale was the
K8s users would automatically get the bug fixes in newer versions of
GCI. However in practice, it makes the runtime environment
non-deterministic, and lack of continuous e2e tests mean we would run
into breakages sooner or later.
With this change, each K8s release will pick a specific version
of GCI by default (similar to how the Debian-based container-vm gets used).
Users can override the default version through KUBE_GCE_MASTER_IMAGE and
KUBE_GCE_NODE_IMAGE environment variables.
We expect the default GCI version will be updated relatively frequently stay
updated with newer GCI releases. We can also automate the process to
automatically bump the hard-coded GCI version in future.
This commit enables the dynamic kubelet configuration feature for the
node e2e Jenkins serial tests, which is where the test for dynamic kubelet
configuration currently runs.
This gives the node e2e test binary a --feature-gates flag that populates a
FeatureGates field on the test context. The value of this field is forwarded
to the kubelet's --feature-gates flag and is also used to populate the global
DefaultFeatureGate object so that statically-linked components see the same
feature gate settings as provided via the flag.
This means that you can set feature gates via the TEST_ARGS environment
variable when running node e2e tests. For example:
TEST_ARGS='--feature-gates=DynamicKubeletConfig=true'
This should stop the test from flaking while we figure out why there is
a mismatch between the reported pressure condition and the eviction
manager's decision to evict due to memory pressure.