11283 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
4734021993 Merge pull request #121258 from saschagrunert/cri-fs-err
Populate CRI filesystem info error
2023-10-19 04:03:45 +02:00
Kubernetes Prow Robot
39697a9f3b Merge pull request #120782 from PI-Victor/fix/refactor_port_resolver_test
kubelet/lifecycle handlers: refactor port resolver
2023-10-19 04:03:26 +02:00
Kubernetes Prow Robot
3cb3e8b7dc Merge pull request #116892 from SataQiu/fix-kubelet-20230323
kubelet: perform the admission checks that preemption will not help first to avoid meaningless pod eviction
2023-10-19 02:47:50 +02:00
Rodrigo Campos
2508f468a8 kubelet/userns: Add more unit tests
This covers all public methods and overall test coverage is above 80%
again.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-10-18 11:59:54 +02:00
Todd Neal
7bcc98c46b sidecars: terminate sidecars after main containers
Sidecars should terminate:
- after all main containers have exited
- serialized and in reverse order
2023-10-17 19:07:21 -05:00
Kubernetes Prow Robot
a7b8357a55 Merge pull request #118165 from champly/master
kubelet: fix comment typo
2023-10-17 23:28:25 +02:00
Kubernetes Prow Robot
0095ae3b25 Merge pull request #120195 from Ithrael/fix/error-handling-condition-in-test
fix(test): fix error handling condition in test
2023-10-17 20:08:01 +02:00
Kubernetes Prow Robot
3d77b95bcf Merge pull request #118704 from dgl/crio-socket-fix
Match on cri-o socket suffix only
2023-10-17 20:07:52 +02:00
Kubernetes Prow Robot
7824ac0f3e Merge pull request #114336 from claudiubelu/fixes-test-get-file-type
unittests: Fixes hostutil.GetFileType for Windows
2023-10-17 20:07:39 +02:00
Kubernetes Prow Robot
639f63c4e5 Merge pull request #121261 from kannon92/revert-119882-podres-client-wait
Revert "podresources: e2e: force eager connection"
2023-10-17 16:14:29 +02:00
Gunju Kim
ca6fda05ce Restart containers in right order after the podSandboxChanged
This is a workaround for the issue that the kubelet cannot differentiate
the container statuses of the previous podSandbox from the current one.

If the node is rebooted, all containers will be in the exited state and
the kubelet will try to recreate a new podSandbox. In this case, the
kubelet should not mistakenly think that the newly created podSandbox
has been initialized.
2023-10-17 22:11:31 +09:00
Swati Sehgal
9a354fc9d0 node: sample-dp: Add retry to handle device plugin restart failure
Add retry mechanism to handle cases where after kubelet restarts, the device
plugin unix socket(s) were created but not ready to serve yet.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-10-17 12:19:10 +01:00
Swati Sehgal
d0d133298d node: sample-dp: Use fsnotify for kubelet restart detection
Add kubeletSocket file to fsnotify instead of polling and waiting for deletion
of device plugin unix socket as a way of detecting kubelet restart. We need to
ensure that the device plugin re-registers itself after kubelet restart depending
on the configured registration mode (auto-registration or controller registration).

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-10-17 12:19:10 +01:00
Swati Sehgal
211d8cc80a node: sample-dp: stubRegisterControlFunc for controlling registration
If the user specifies the intent to control registration process, we rely on
registration triggers (deletion of control file) to prompt registration.

This behaviour is expected to be consistent across kubelet restarts and therefore
across the watch calls where we watch for changes to the unix socket so we make
this part of Stub object instead of a parameter.

Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-10-17 12:19:10 +01:00
Swati Sehgal
c4c9d61d66 node: sample-dp: Handle re-registration for controlled registrations
In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the
registration is triggered by deletion of the control file. This is
applicable both when the registration happens for the first time and
subsequent ones because of kubelet restarts.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-10-17 12:19:07 +01:00
Swati Sehgal
6714e678d3 node: sample-dp: register by default and re-register on restarts
In issue: 115107 we added an environment variable to control the registration of sample
device plugin to kubelet. The intent of this patch is to ensure that the default
behaviour of the plugin is to register to kubelet (in case no environment
variable is specified).

In addition to that, we want to ensure that the plugin registers itself not just once.
It should re-register itself to kubelet in case of node reboot or kubelet restarts.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-10-17 12:14:09 +01:00
Gunju Kim
d2b803246a Don't reuse the device allocated to the restartable init container
2023-10-17 18:28:29 +09:00
Sai Ramesh Vanka
1715fc0ca0 Fix issue in enabling evented pleg feature gate
Fixes https://github.com/kubernetes/kubernetes/issues/120941
The GetNewerThan() call doesn't block until the pod status/cache is updated and returns an empty pod status.
Hence, whenever the `SyncLoop ADD/UPDATE/RECONCILE` functions are called multiple times within a very short time interval,
the kubelet issues multiple `CreateContainer` CRI API calls, which results in the creation of duplicate containers within a given pod.
The initially created container stays `Running` and the later container keeps `Exiting`, leaving the pod in the `CrashLoopBackOff` state forever.

Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
2023-10-17 13:07:01 +05:30
Kubernetes Prow Robot
873eb57a4e Merge pull request #119522 from YTGhost/validation-for-static-pods-name
add validation for static pods to have a name
2023-10-17 03:38:15 +02:00
Kubernetes Prow Robot
c5815fee72 Merge pull request #113825 from harche/ep_comments
Keep PLEG interruptions in a separate interface
2023-10-17 03:37:57 +02:00
Peter Hunt
28f335a339 kubelet/images: refactor image gc unit tests
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2023-10-16 16:34:29 -04:00
Peter Hunt
e22ebf13a9 kubelet/images: refactor freeImage and imagesInEvictionOrder
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2023-10-16 16:34:29 -04:00
Kubernetes Prow Robot
f6ba25fdbd Merge pull request #119026 from AxeZhan/sleepAction
Introducing Sleep Action for PreStop Hook
2023-10-16 21:19:44 +02:00
Kubernetes Prow Robot
c7d270302c Merge pull request #121059 from matte21/improve_err_message_in_cpu_assignments
Improve error message in Kubelet CPU assignment logic
2023-10-16 16:48:54 +02:00
Kubernetes Prow Robot
0de29e1d43 Merge pull request #120911 from gjkim42/devicemanager-remove-deprecated-sets-string
pkg/kubelet/cm: Remove deprecated sets.String and sets.Int
2023-10-16 16:48:40 +02:00
Kevin Hannon
dd9c3358f5 Revert "podresources: e2e: force eager connection"
2023-10-16 09:46:04 -04:00
Sascha Grunert
39dcad8a19 Populate CRI filesystem info error
Usually we just log the error but since it's used by the GC we now
populate it up the call stack.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-10-16 13:09:04 +02:00
HirazawaUi
1132fd0afd add tcp_fin_timeout, tcp_keepalive_intvl and tcp_keepalive_probes to safe sysctls
2023-10-15 23:05:40 +08:00
AxeZhan
3a96afdfef implementation
2023-10-15 13:57:48 +08:00
Kubernetes Prow Robot
378866edba Merge pull request #120518 from saschagrunert/metrics-container-start
kubelet: fix metric `container_start_time_seconds` timestamp
2023-10-15 07:05:37 +02:00
Kubernetes Prow Robot
95bd8b95a7 Merge pull request #100448 from saschagrunert/cri-stats-log
Do not error log CRI stats for not cached partitions
2023-10-14 23:49:12 +02:00
Kubernetes Prow Robot
4911aad463 Merge pull request #115702 from xyz-li/master
Fix: kubelet will not output logs after log file is rotated
2023-10-14 22:42:04 +02:00
cyclinder
10151a5e38 kubelet/sysctl: update log level
2023-10-13 11:23:59 +08:00
Kubernetes Prow Robot
a7f8c2f787 Merge pull request #118846 from cyclinder/net.ipv4.tcp_keepalive_time
Mark net.ipv4.tcp_keepalive_time as a safe sysctl
2023-10-13 05:02:51 +02:00
Kubernetes Prow Robot
8923c3c871 Merge pull request #119659 from kannon92/beta-pod-ready-to-start
[KEP-3085] Promote PodReadyToStartContainers to beta in 1.29
2023-10-12 22:49:16 +02:00
Kevin Hannon
c94240e2e2 move kubelet constant for podreadytostart to staging
2023-10-12 11:18:11 -04:00
Kubernetes Prow Robot
38a1ec75f0 Merge pull request #119882 from ffromani/podres-client-wait
podresources: e2e: force eager connection
2023-10-12 15:59:55 +02:00
cyclinder
0167a9f833 mark net.ipv4.tcp_keepalive_time as a safe sysctl
2023-10-11 10:24:19 +08:00
Justin Garrison
4acaf9ebed Add service name tests
2023-10-10 22:27:44 +00:00
matte21
d4a5a085a8 Improve error message in cpu assignment logic
Include number of requested and available CPUs in the error message
when the assignment of CPUs fails because there are fewer available
CPUs than requested.
2023-10-09 13:31:37 -04:00
Jan Safranek
0be5fdb5ce Add volume plugin label to SELinux metrics
Record volume plugin name when a volume in a Pod needs a different
"mount -o context" value than the actually mounted one.

We expect that NFS, CIFS and CephFS volumes would be able to mount such
volumes just fine with multiple "-o context" values.

We know that the block-volume based ones (ext4, xfs, btrfs, ...) cannot do
that.

Therefore we want to distinguish the volume plugin in metrics; anything
block-volume based could break an existing application.
2023-10-09 11:18:39 +02:00
Katarzyna Lach
122ff5a212 Move grpc rate limitter from podresource folder
The rate limitter.go file is a generic file implementing the
gRPC Limiter interface. This file can be reused by other gRPC
APIs, not only by podresource.

Change-Id: I905a46b5b605fbb175eb9ad6c15019ffdc7f2563
2023-10-09 07:22:23 +00:00
Antonio Ojea
3ee2f27e5b kubelet: cloud-provider external addresses
The kubelet, if using an external cloud provider, temporarily
initializes the node addresses using the non-cloud-provider logic,
until the cloud provider overrides them.

This behavior has undesired consequences if the cloud-provider addresses
are different from the original ones, especially for hostNetwork pods,
which inherit these addresses from the Node.

Since some cloud providers depend on this behavior, in order to keep
backward compatibility, assume that specifying addresses via
the node-ip flags means that the intent is to keep the existing
behavior of temporarily initializing the addresses.

If the node-ips are the unspecified addresses or are not set, then
wait for the external cloud provider to set the node addresses.

Change-Id: I3a3895f9b830769f9658e6a03f058c914c438a09
Signed-off-by: Antonio Ojea <aojea@google.com>
2023-10-06 14:01:28 +00:00
Gunju Kim
8b5f30ef09 Don't reuse CPU set of a restartable init container
2023-10-06 22:16:15 +09:00
matte21
a213edae2a Add package-level godoc to pkg/kubelet/cm
Add file doc.go with some rudimentary information to package
kubelet/cm. This will make it easier for people approaching the
kubelet codebase for the first time to quickly understand what's
in the package, since its name is abbreviated and hostile to
newcomers.
2023-10-05 14:20:51 -04:00
Kubernetes Prow Robot
a321897e77 Merge pull request #120262 from harche/list_timeout
Add timeout to listContainerStats context
2023-10-02 07:46:46 -07:00
Kubernetes Prow Robot
622509830c Merge pull request #120716 from xrstf/fix-typos
Fix typos
2023-09-30 00:25:56 -07:00
Gunju Kim
a0610a97b3 pkg/kubelet/cm: Remove deprecated sets.String and sets.Int
This removes deprecated sets.String and sets.Int
- replace sets.String with sets.Set[string]
- replace sets.Int with sets.Set[int]
- replace sets.NewString with sets.New[string]
- replace sets.NewInt with sets.New[int]
- replace sets.(OLD).List with sets.List(NEW)
2023-09-27 22:02:15 +09:00
Evan Lezar
394bcaf182 Only configure swap if available on node
This change bypasses all logic to set swap in the linux container
resources if a swap controller is not available on node. Failing
to do so may cause errors in runc when starting a container with
a swap configuration -- even if this is set to 0.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-26 21:32:58 +02:00
Evan Lezar
d3d1827c05 Use local isCgroup2UnifiedMode consistently
This change switches to using isCgroup2UnifiedMode locally to ensure
that any mocked function is also used when checking the swap controller
availability.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-09-21 16:09:04 +02:00