Commit Graph

11454 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
e9d9a82839 Merge pull request #124101 from haircommander/process_stats-with-pid-fix
kubelet: fix PID based eviction
2024-07-25 11:59:57 -07:00
Kevin Hannon
3e642aee3f move container fs check so that we only check if system is split 2024-07-24 11:22:23 -04:00
carlory
c4851c64a0 remove volumeoptions from VolumePlugin and BlockVolumePlugin 2024-07-24 14:07:02 +08:00
Kubernetes Prow Robot
57d197fb89 Merge pull request #124430 from AllenXu93/fix-kubelet-restart-notReady
fix node notReady in first sync period after kubelet restart
2024-07-23 21:20:40 -07:00
Kubernetes Prow Robot
5af1710d90 Merge pull request #126243 from SergeyKanzhelev/devicePluginFailures
Implement resource health in pod status (KEP 4680)
2024-07-23 20:12:24 -07:00
Kubernetes Prow Robot
d97cf3a1eb Merge pull request #126303 from bart0sh/PR150-dra-refactor-checkpoint-upstream
DRA: refactor checkpointing
2024-07-23 18:01:53 -07:00
Sergey Kanzhelev
62f96d2748 set AllocatedResourcesStatus in the Pod Status 2024-07-24 00:29:35 +00:00
Kubernetes Prow Robot
fa4b8f32ac Merge pull request #125935 from gjkim42/fix-125880
Terminate restartable init containers ignoring not-started containers
2024-07-23 15:45:11 -07:00
Ed Bartosh
c0d922e786 DRA: Kubelet code cleanup 2024-07-24 00:27:52 +03:00
Ed Bartosh
59555c6a62 DRA: move dra/checkpont/* to dra/state/* 2024-07-24 00:12:10 +03:00
Ed Bartosh
35fbbc5cfd DRA: use crc32.ChecksumIEEE to calculate checkpoint checksum 2024-07-24 00:10:39 +03:00
Ed Bartosh
59daed75d6 DRA: refactor checkpointing
Co-authored-by: Kevin Klues <klueska@gmail.com>
2024-07-24 00:10:30 +03:00
Kubernetes Prow Robot
107f621462 Merge pull request #126108 from gnufied/changes-volume-recovery
Reduce state changes when expansion fails and mark certain failures as infeasible
2024-07-23 13:30:56 -07:00
Kubernetes Prow Robot
fbdfb9d8d9 Merge pull request #126031 from harche/kubelet_cgroupv1_arg
KEP-4569: Kubelet option to disable cgroup v1 support
2024-07-23 09:21:11 -07:00
Kubernetes Prow Robot
a4f9910c51 Merge pull request #126014 from PannagaRao/kep-ephemeral-storage-quota
pkg/volume/*: Enable quotas in user namespace
2024-07-23 09:21:02 -07:00
Kubernetes Prow Robot
d7194eb370 Merge pull request #124884 from carlory/report-event-when-kubelet-attach-failed
report an event to pod if kubelet does attach operation failed
2024-07-23 09:20:43 -07:00
Kubernetes Prow Robot
581a073dc4 Merge pull request #125663 from saschagrunert/oci-volumesource-kubelet
[KEP-4639] Add `ImageVolumeSource` implementation
2024-07-22 15:48:33 -07:00
Kubernetes Prow Robot
d21b17264e Merge pull request #125488 from pohly/dra-1.31
DRA for 1.31
2024-07-22 11:45:55 -07:00
Sascha Grunert
979863d15c Add ImageVolumeSource implementation
This patch adds the kubelet implementation of the image volume source
feature.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2024-07-22 18:46:46 +02:00
Patrick Ohly
d11b58efe6 DRA kubelet: refactor gRPC call timeouts
Some of the E2E node tests were flaky. Their timeout apparently was chosen
under the assumption that kubelet would retry immediately after a failed gRPC
call, with a factor of 2 as safety margin. But according to
0449cef8fd,
kubelet has a different, higher retry period of 90 seconds, which was exactly
the test timeout. The test timeout has to be higher than that.

As the tests don't use the gRPC call timeout anymore, it can be made
private. While at it, the name and documentation gets updated.
2024-07-22 18:09:34 +02:00
Patrick Ohly
877829aeaa DRA kubelet: adapt to v1alpha3 API
This adds the ability to select specific requests inside a claim for a
container.

NodePrepareResources is always called, even if the claim is not used by any
container. This could be useful for drivers where that call has some effect
other than injecting CDI device IDs into containers. It also ensures that
drivers can validate configs.

The pod resource API can no longer report a class for each claim because there
is no such 1:1 relationship anymore. Instead, that API reports claim,
API devices (with driver/pool/device as ID) and CDI device IDs. The kubelet
itself doesn't extract that information from the claim. Instead, it relies on
drivers to report this information when the claim gets prepared. This isolates
the kubelet from API changes.

Because of a faulty E2E test, kubelet was told to contact the wrong driver for
a claim. This was not visible in the kubelet log output. Now changes to the
claim info cache are getting logged. While at it, naming of variables and some
existing log output gets harmonized.

Co-authored-by: Oksana Baranova <oksana.baranova@intel.com>
Co-authored-by: Ed Bartosh <eduard.bartosh@intel.com>
2024-07-22 18:09:34 +02:00
Patrick Ohly
91d7882e86 DRA: new API for 1.31
This is a complete revamp of the original API. Some of the key
differences:
- refocused on structured parameters and allocating devices
- support for constraints across devices
- support for allocating "all" or a fixed amount
  of similar devices in a single request
- no class for ResourceClaims, instead individual
  device requests are associated with a mandatory
  DeviceClass

For the sake of simplicity, optional basic types (ints, strings) where the null
value is the default are represented as values in the API types. This makes Go
code simpler because it doesn't have to check for nil (consumers) and values
can be set directly (producers). The effect is that in protobuf, these fields
always get encoded because `opt` only has an effect for pointers.

The roundtrip test data for v1.29.0 and v1.30.0 changes because of the new
"request" field. This is considered acceptable because the entire `claims`
field in the pod spec is still alpha.

The implementation is complete enough to bring up the apiserver.
Adapting other components follows.
2024-07-22 18:09:34 +02:00
Itamar Holder
6c1f14c468 unit tests: exclude critical pods from swapping
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-07-22 17:56:52 +03:00
Itamar Holder
532cd5f84c Exclude critical pods from having swap access
Signed-off-by: Itamar Holder <iholder@redhat.com>
2024-07-22 17:56:52 +03:00
Peter Hunt
5fd7219cf4 kubelet/stats: fix pid stats for cadvisor stats provider
the process stats aren't correct coming from only the pod stats.
They need to be summed for all of the containers, as cadvisor
is only reading per pid (per container process)

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2024-07-22 10:54:42 -04:00
Kevin Hannon
7d8ba7849b priority pid tests should match on processes
pids 0
process should not be nonzero
2024-07-22 10:54:42 -04:00
David Porter
6e6b2b76a3 test: Update summary test to check for process count
The process count is expected to always be >= 1 for pods in the test.

Let's check it's >= 1, so we can catch issues if the proecss count is
not reported.

Signed-off-by: David Porter <david@porter.me>
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2024-07-22 10:54:42 -04:00
David Porter
f58b46cb97 fix process stats
Signed-off-by: David Porter <david@porter.me>
2024-07-22 10:54:42 -04:00
PannagaRamamanohara
d16fd6a915 pkg/volume: Use QuotaMonitoring in UserNamespace
Enable LocalStorageCapacityIsolationFSQuotaMonitoring
only when hostUsers in PodSpec is set to false.
Modify unit tests and e2e tests to verify

Signed-off-by: PannagaRamamanohara <pbhojara@redhat.com>
2024-07-22 09:43:57 -04:00
Patrick Ohly
b51d68bb87 DRA: bump API v1alpha2 -> v1alpha3
This is in preparation for revamping the resource.k8s.io completely. Because
there will be no support for transitioning from v1alpha2 to v1alpha3, the
roundtrip test data for that API in 1.29 and 1.30 gets removed.

Repeating the version in the import name of the API packages is not really
required. It was done for a while to support simpler grepping for usage of
alpha APIs, but there are better ways for that now. So during this transition,
"resourceapi" gets used instead of "resourcev1alpha3" and the version gets
dropped from informer and lister imports. The advantage is that the next bump
to v1beta1 will affect fewer source code lines.

Only source code where the version really matters (like API registration)
retains the versioned import.
2024-07-21 17:28:13 +02:00
Kubernetes Prow Robot
558c9536a1 Merge pull request #123678 from kinvolk/userns-use-kubelet-user-mappings
kubelet: Add logs for userns custom mappings parsing
2024-07-20 19:59:57 -07:00
Kubernetes Prow Robot
a8d354bf39 Merge pull request #126122 from HirazawaUi/remove-unused-options
kubelet: Remove unused run container options
2024-07-19 18:05:16 -07:00
Kubernetes Prow Robot
14b34fc255 Merge pull request #125834 from tallclair/log-cleanup
[kubelet] Cleanup incorrect log about static pod status change
2024-07-19 16:58:54 -07:00
Kubernetes Prow Robot
ec8015daac Merge pull request #124273 from panoswoo/fix/124255
Remove missing extended resources from init containers
2024-07-19 15:29:53 -07:00
Kubernetes Prow Robot
27fa59a8af Merge pull request #125656 from gyuho/recent-stats-check-error-for-error-level-logging
feat(kubelet/stats): match cadvisor error to lower not found stats log level
2024-07-18 19:24:01 -07:00
Kubernetes Prow Robot
f2428d66cc Merge pull request #125163 from pohly/dra-kubelet-api-version-independent-no-rest-proxy
DRA: make kubelet independent of the resource.k8s.io API version
2024-07-18 17:47:48 -07:00
Kubernetes Prow Robot
5fc7032a0e Merge pull request #126156 from pohly/kubelet-test-enhancements
kubelet test enhancements
2024-07-18 14:50:54 -07:00
Kubernetes Prow Robot
fa7fcde5a4 Merge pull request #125813 from aojea/node_csr_ips
Node Request Certificates require to have IPs
2024-07-18 14:50:48 -07:00
Patrick Ohly
7701a48bd6 dra kubelet: bump gRPC API to v1alpha4
The previous changes are an API break, therefore we need a new version.
2024-07-18 23:30:09 +02:00
Harshal Patil
fff2b7f566 Kubelet option to disable cgroup v1 support
Signed-off-by: Harshal Patil <harpatil@redhat.com>
2024-07-18 14:00:21 -04:00
Kubernetes Prow Robot
601eb7e9cf Merge pull request #122922 from marosset/windows-memory-eviction
Add support for Windows memory-pressure eviction
2024-07-18 10:39:06 -07:00
Antonio Ojea
bc63c412b9 kubelet request certificates if at least one IP exist
A Kubernetes Node requires to have at minimum one IP address
because those are used on the Pods field HostIPs and in some cases,
when pods uses hostNetwork: true, as PodIPs.
Nodes that use IP addresses as Hostname are interpreted as an IP
address, so it is possible that are nodes that don't hane any DNSname.

The feature gate AllowDNSOnlyNodeCSR will allow user to opt-in for
the old behavior.

Change-Id: I094531d87246f1e7a5ef4fe57bd5d9840cb1375d
2024-07-18 09:44:48 +00:00
Kubernetes Prow Robot
9196650533 Merge pull request #123819 from fakecore/fc/master
fix: handle socket file detection on Windows
2024-07-18 00:53:16 -07:00
Patrick Ohly
348f94ab55 DRA: read ResourceClaim in DRA drivers
This is the second and final step towards making kubelet independent of the
resource.k8s.io API versioning because it now doesn't need to copy structs
defined by that API from the driver to the API server.
2024-07-18 09:09:20 +02:00
Patrick Ohly
616a014347 DRA: move ResourceSlice publishing into DRA drivers
This is a first step towards making kubelet independent of the resource.k8s.io
API versioning because it now doesn't need to copy structs defined by that API
from the driver to the API server. The next step is removing the other
direction (reading ResourceClaim status and passing the resource handle to
drivers).

The drivers must get deployed so that they have their own connection to the API
server. Securing at least the writes via a validating admission policy should
be possible.

As before, the kubelet removes all ResourceSlices for its node at startup, then
DRA drivers recreate them if (and only if) they start up again. This ensures
that there are no orphaned ResourceSlices when a driver gets removed while the
kubelet was down.

While at it, logging gets cleaned up and updated to use structured, contextual
logging as much as possible. gRPC requests and streams now use a shared,
per-process request ID and streams also get logged.
2024-07-18 09:09:19 +02:00
Mark Rossetti
0411a3d565 Add support for memory pressure evictiong on Windows
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2024-07-17 15:11:30 -07:00
Kubernetes Prow Robot
5d40866fae Merge pull request #125994 from carlory/fix-job-api
clean up codes after PodDisruptionConditions was promoted to GA
2024-07-17 14:37:09 -07:00
Patrick Ohly
6604ff94d8 kubelet: enhance podresources tests
The manual deep comparison code is hard to maintain (would need to be updated
in https://github.com/kubernetes/kubernetes/pull/125488) and error prone.

In fact, one test case failed when doing a full automatic comparison with
cmp.Diff because it wasn't setting allMemory.
2024-07-17 17:50:10 +02:00
Patrick Ohly
b9d00841a6 kubelet: improve checkpoint errors
Recording the expected and actual checksum in the error makes it possible to
provide that information, for example in a failed test like the ones for DRA.
Otherwise developers have to manually step through the test with a debugger to
figure out what the new checksum is.
2024-07-17 16:07:31 +02:00
Kubernetes Prow Robot
52c0ed4673 Merge pull request #124342 from zhifei92/fix-error-check
fix error checking in kl.killPod within SyncPod
2024-07-16 16:05:07 -07:00