Commit Graph

11283 Commits

Author SHA1 Message Date
Shiming Zhang
e6bdd224c1 Add HostIPs for kubelet 2023-07-14 09:35:30 +08:00
Kubernetes Prow Robot
d37c62dcbf Merge pull request #117800 from cyclinder/loggin_format
Add '--logging-format' flag to kube-proxy
2023-07-13 08:40:37 -07:00
cyclinder
c550c17f7f accept int or string flush frequency 2023-07-13 14:33:33 +08:00
Kubernetes Prow Robot
70370d0210 Merge pull request #117731 from jongwooo/refactor/use-early-return-pattern
Use early return pattern to avoid nested conditions
2023-07-12 17:59:41 -07:00
Kubernetes Prow Robot
0086712926 Merge pull request #116922 from sourcelliu/checkpoint
Improve the performance of map usage
2023-07-12 17:59:30 -07:00
Kubernetes Prow Robot
047d040ce7 Merge pull request #119012 from pohly/dra-batch-node-prepare
kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
2023-07-12 10:57:37 -07:00
Kubernetes Prow Robot
ac07b4612e Merge pull request #117804 from jsafrane/fix-csi-attachable-reconstruction
Fix reconstruction of CSI volumes
2023-07-12 10:57:15 -07:00
Kubernetes Prow Robot
be222f38f0 Merge pull request #119058 from TommyStarK/dra-state-checkpoint-unit-test
dynamic resource allocation: Improve code coverage of state checkpoint
2023-07-12 07:49:14 -07:00
Patrick Ohly
d743c50bb9 kubelet: support batched prepare/unprepare in v1alpha3 DRA plugin API
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1alpha2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
2023-07-12 14:50:30 +02:00
Francesco Romani
01c3a51a78 node: podresources: getallocatable: move to GA
lock the feature gate to GA, and remove the now-redundant code.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 14:11:22 +02:00
TommyStarK
f924bf95df dynamic resource allocation: Improve code coverage of state checkpoint
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-07-12 13:27:18 +02:00
Francesco Romani
c635a7e7d8 node: devicemgr: topomgr: add logs
One of the factors that make issues #118559 and #109595 hard to
debug and fix is that the devicemanager has very few logs in important
flows, so it's unnecessarily hard to reconstruct the state from logs.

We add minimal logs to be able to improve troubleshooting.
We add minimal logs to be backport-friendly, deferring a more
comprehensive review of logging to later PRs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
3bcf4220ec kubelet: devices: skip allocation for running pods
When kubelet initializes, it runs admission for pods and possibly
allocates requested resources. We need to distinguish between
node reboot (no containers running) versus kubelet restart (containers
potentially running).

Running pods should always survive kubelet restart.
This means that device allocation on admission should not be attempted,
because if a container requires devices and is still running when kubelet
is restarting, that container already has devices allocated and working.

Thus, we need to properly detect this scenario in the allocation step
and handle it explicitly. We need to inform
the devicemanager about which pods are already running.

Note that if container runtime is down when kubelet restarts, the
approach implemented here won't work. In this scenario, on kubelet
restart containers will again fail admission, hitting
https://github.com/kubernetes/kubernetes/issues/118559 again.
This scenario should however be pretty rare.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
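
The behavior described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the kubelet device manager: the `allocator` type, its fields, and the `Allocate` method are hypothetical. The point is only that a pod already known to be running is admitted without a new allocation, while a new pod goes through the normal path.

```go
package main

import "fmt"

// allocator is a hypothetical, heavily simplified device manager.
type allocator struct {
	runningPods map[string]bool     // pod UIDs reported as running when kubelet started
	allocated   map[string][]string // pod UID -> device IDs handed out by this process
	free        []string            // device IDs still available
}

// Allocate admits a pod that wants `want` devices.
func (a *allocator) Allocate(podUID string, want int) error {
	if a.runningPods[podUID] {
		// The containers survived the kubelet restart and already hold
		// their devices, so no new allocation is attempted.
		return nil
	}
	if want > len(a.free) {
		return fmt.Errorf("pod %s: requested %d devices, %d available", podUID, want, len(a.free))
	}
	a.allocated[podUID] = append(a.allocated[podUID], a.free[:want]...)
	a.free = a.free[want:]
	return nil
}

func main() {
	a := &allocator{
		runningPods: map[string]bool{"pod-running": true},
		allocated:   map[string][]string{},
		free:        nil, // devices have not re-registered yet after the restart
	}
	fmt.Println("running pod:", a.Allocate("pod-running", 1))
	fmt.Println("new pod:    ", a.Allocate("pod-new", 1))
}
```

As the commit message notes, this only helps when the set of running pods can actually be determined; if the container runtime is down at kubelet restart, `runningPods` would be empty in this sketch and admission would fail as before.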
Kubernetes Prow Robot
e0dafe57a3 Merge pull request #117351 from pohly/dra-generated-resource-claim-names
DRA: generated resource claim names
2023-07-11 10:33:11 -07:00
PiotrProkop
f855a23b45 topologymanager: promote TopologyManagerPolicyOptions feature to beta
* Promote TopologyManagerPolicyOptions feature to beta
* Promote PreferClosestNUMANodes TopologyManagerPolicyOption to beta

Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-07-11 15:06:57 +02:00
PiotrProkop
23833b9c81 topologymanager: Increase TopologyManager test coverage by adding negative test cases around NUMA topology discovery
Signed-off-by: PiotrProkop <pprokop@nvidia.com>
2023-07-11 15:04:32 +02:00
Patrick Ohly
444d23bd2f dra: generated name for ResourceClaim from template
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.

What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.

The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.

The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.

There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
2023-07-11 14:23:48 +02:00
Kubernetes Prow Robot
bc01306c98 Merge pull request #116738 from AxeZhan/TopologyManagerPolicy
When TopologyManagerPolicy is None, skip checks in NewManager.
2023-07-11 04:53:13 -07:00
Evan Lezar
cd14e97ea8 Add a builder for ContainerAllocateResponse objects
This change introduces a helper to construct ContainerAllocateResponse instances.
Test cases are updated to use a new constructor accepting functional options
allowing the response contents to be set based on the test requirements.

This can then be extended to also test additional fields in the device plugin API
such as annotations which are not currently covered or new fields.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 11:48:26 +02:00
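
The functional-options pattern mentioned in the commit above looks roughly like the following. This is a generic sketch: the `containerAllocateResponse` struct here is a hypothetical local type with a few illustrative fields, not the device plugin API type, and the option names are assumptions.

```go
package main

import "fmt"

// containerAllocateResponse is a hypothetical stand-in with a few fields.
type containerAllocateResponse struct {
	Envs        map[string]string
	Annotations map[string]string
	Devices     []string
}

// option mutates a response under construction.
type option func(*containerAllocateResponse)

// withEnv sets one environment variable on the response.
func withEnv(key, value string) option {
	return func(r *containerAllocateResponse) { r.Envs[key] = value }
}

// withAnnotation sets one annotation on the response.
func withAnnotation(key, value string) option {
	return func(r *containerAllocateResponse) { r.Annotations[key] = value }
}

// withDevice appends one device path to the response.
func withDevice(path string) option {
	return func(r *containerAllocateResponse) { r.Devices = append(r.Devices, path) }
}

// newContainerAllocateResponse builds a response with initialized maps and
// applies the options in the order given.
func newContainerAllocateResponse(opts ...option) *containerAllocateResponse {
	r := &containerAllocateResponse{
		Envs:        map[string]string{},
		Annotations: map[string]string{},
	}
	for _, opt := range opts {
		opt(r)
	}
	return r
}

func main() {
	r := newContainerAllocateResponse(
		withEnv("EXAMPLE_VISIBLE_DEVICES", "0"),
		withAnnotation("example.com/device", "gpu0"),
		withDevice("/dev/example0"),
	)
	fmt.Printf("%+v\n", *r)
}
```

Each test case can then pass only the options it cares about, and covering a new field means adding one more `with...` function rather than changing every call site.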
Evan Lezar
db2a1edbdd Generate empty cdi annotations
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 11:48:24 +02:00
Evan Lezar
f0e3c32fe5 Move CDI annotation code to utils package
Signed-off-by: Evan Lezar <elezar@nvidia.com>
2023-07-11 11:47:53 +02:00
Jan Safranek
354b6c409f Rename updateReconstructedFromAPIServer
to be in sync with volumesNeedUpdateFromNodeStatus.
2023-07-11 11:25:43 +02:00
Jan Safranek
1903f5aa2a Rename volumesNeedDevicePath
To volumesNeedUpdateFromNodeStatus - because both devicePath and uncertain
attach-ability need to be fixed from node status.
2023-07-11 11:15:24 +02:00
Jan Safranek
7cd60df4aa Update volumesInUse after attachability is confirmed
node.status.volumesInUse should report only attachable volumes, therefore
it needs to wait for the reconciler to update uncertain attachability of
volumes from the API server.
2023-07-11 10:32:22 +02:00
Jan Safranek
0a2272dc68 Add uncertain state of volume attach-ability
During CSI volume reconstruction it's not possible to tell if the volume
is attachable or not - CSIDriver instance may not be available, because
kubelet may not have connection to the API server at that time.

Adding uncertain state during reconstruction + adding a correct state when
the API server is available.
2023-07-11 10:32:22 +02:00
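
A three-valued attachability state of the kind described in the commit above could be modeled as in the sketch below. All names (`attachability`, `volume`, `resolve`, `volumesInUse`) are hypothetical and the logic is deliberately minimal; the kubelet's actual state of the world is more involved. The sketch assumes that only volumes confirmed as attachable are reported as in use.

```go
package main

import "fmt"

// attachability is a hypothetical tri-state: the zero value is "uncertain".
type attachability int

const (
	attachUncertain attachability = iota // reconstructed, driver info not yet available
	attachTrue
	attachFalse
)

type volume struct {
	Name   string
	Driver string
	State  attachability
}

// resolve replaces uncertain states once driver information is available.
// driverAttachable maps a driver name to whether it requires attach.
// Volumes whose driver is still unknown stay uncertain.
func resolve(vols []volume, driverAttachable map[string]bool) {
	for i := range vols {
		if vols[i].State != attachUncertain {
			continue
		}
		attachable, known := driverAttachable[vols[i].Driver]
		if !known {
			continue
		}
		if attachable {
			vols[i].State = attachTrue
		} else {
			vols[i].State = attachFalse
		}
	}
}

// volumesInUse lists only volumes confirmed as attachable.
func volumesInUse(vols []volume) []string {
	var names []string
	for _, v := range vols {
		if v.State == attachTrue {
			names = append(names, v.Name)
		}
	}
	return names
}

func main() {
	vols := []volume{
		{Name: "vol-a", Driver: "a.example.com"},
		{Name: "vol-b", Driver: "b.example.com"},
	}
	fmt.Println("before API server is reachable:", volumesInUse(vols))
	resolve(vols, map[string]bool{"a.example.com": true, "b.example.com": false})
	fmt.Println("after:", volumesInUse(vols))
}
```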
Sascha Grunert
a6554b9d5d Make kubelet label types public
We use the label definitions in CRI-O, which means we now make them public to
stop vendoring/copying this part of Kubernetes.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-07-10 10:58:44 +02:00
Kubernetes Prow Robot
1e0b4c84cf Merge pull request #116879 from lzhecheng/fix-generateAPIPodStatus-dualstack
[Dual-stack] Fix generateAPIPodStatus() of kubelet handling Secondary IP
2023-07-07 20:37:04 -07:00
Todd Neal
ea1eb7f8f7 implement sidecar resource calculation 2023-07-08 07:26:13 +09:00
Gunju Kim
b94fa250c2 Sidecar: Implement lifecycle of the restartable init container
- Implement `computeInitContainerActions` to set the actions for the
  init containers, including those with `RestartPolicyAlways`.
- Allow StartupProbe on the restartable init containers.
- Update PodPhase considering the restartable init containers.
- Update PodInitialized status and status manager considering the
  restartable init containers.

Co-authored-by: Matthias Bertschy <matthias.bertschy@gmail.com>
2023-07-08 07:26:12 +09:00
Kubernetes Prow Robot
7581ae8123 Merge pull request #116739 from moshe010/clone-cdi-devices
kubelet dra: lock before getting claimInfo CDIDevices and annotations fields
2023-07-07 06:31:04 -07:00
Sascha Grunert
20a25cbfcf Add user specified image to CRI ContainerConfig
The container config image references either an image ID or a digest,
but not the original image from the container config. We require the
image for signature verification to ensure that we actually verify the
correct image.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2023-07-06 08:40:09 +02:00
Zhecheng Li
985cf718a4 [Dual-stack] Fix generateAPIPodStatus() of kubelet handling Secondary IP
hostIPs order may not be consistent. If secondary IP is before
primary one, current logic adds primary IP twice into PodIPs, which
leads to error: "may specify no more than one IP for each IP family".
In this case, the second IP shouldn't be added.

Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
2023-07-03 06:47:15 +00:00
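
The fix described in the commit above amounts to keeping at most one IP per IP family regardless of input order. A minimal sketch using the standard `net/netip` package follows; the `podIPs` function and its signature are hypothetical and are not the kubelet's `generateAPIPodStatus()`.

```go
package main

import (
	"fmt"
	"net/netip"
)

// podIPs returns the primary IP followed by at most one IP of the other
// family taken from hostIPs. Entries of the primary's own family are
// skipped, so the order of hostIPs cannot cause a duplicate family.
func podIPs(primary string, hostIPs []string) ([]string, error) {
	p, err := netip.ParseAddr(primary)
	if err != nil {
		return nil, fmt.Errorf("primary IP: %w", err)
	}
	p = p.Unmap() // treat IPv4-mapped IPv6 addresses as IPv4
	out := []string{p.String()}
	for _, s := range hostIPs {
		a, err := netip.ParseAddr(s)
		if err != nil {
			return nil, fmt.Errorf("host IP %q: %w", s, err)
		}
		a = a.Unmap()
		if a.Is4() == p.Is4() {
			continue // same family as the primary IP
		}
		out = append(out, a.String())
		break // one secondary IP is enough
	}
	return out, nil
}

func main() {
	// The secondary (IPv6) address is listed before the primary one.
	ips, err := podIPs("10.0.0.1", []string{"fd00::1", "10.0.0.1"})
	fmt.Println(ips, err)
}
```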
Kubernetes Prow Robot
01171e8250 Merge pull request #118976 from ctripcloud/fix-typo
fix kubelet podWorkers.SyncKnownPods() typo
2023-07-01 06:06:48 -07:00
Kubernetes Prow Robot
c2b7d25ff8 Merge pull request #118691 from giuseppe/drop-check-for-volumes
apis: drop check for volumes with user namespaces
2023-06-29 16:23:56 -07:00
zach593
f7cf9effa3 fix kubelet podWorkers.SyncKnownPods() typo
Signed-off-by: zach593 <montenukem@outlook.com>
2023-06-29 22:32:03 +08:00
guoguangwu
0da37d8c54 chore: omit comparison to bool constant 2023-06-29 10:41:50 +08:00
Kubernetes Prow Robot
c3c731890c Merge pull request #117927 from kaisoz/add-FailedToRetrieveImagePullSecret-event
Log a warning if a ImagePullSecrets does not exist
2023-06-28 11:14:31 -07:00
Kubernetes Prow Robot
52457842d1 Merge pull request #117055 from cyclinder/csi_migration
remove CSI-migration gate
2023-06-28 04:28:31 -07:00
Kubernetes Prow Robot
b3d94ae74f Merge pull request #118786 from pohly/dra-test-skip-prepare
dra: kubelet must skip NodePrepareResource if not used by any container
2023-06-27 09:58:32 -07:00
Patrick Ohly
bde66bfb55 kubelet dra: restore skipping of unused resource claims
1aeec10efb removed iterating over containers in favor of iterating over pod
claims. This had the unintended consequence that NodePrepareResource gets
called unnecessarily when no container needs the claim. The more natural
behavior is to skip unused resources. This enables (theoretical, at this time)
use cases where some DRA driver relies on the controller part to influence
scheduling, but then doesn't use CDI with containers.
2023-06-27 16:02:31 +02:00
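
The skipping behavior restored by the commit above can be sketched as a filter over the pod's claims. The `podSpec` and `container` types and the `claimsToPrepare` function are hypothetical simplifications, not kubelet code: the sketch iterates over pod claims but keeps only those that at least one container references.

```go
package main

import "fmt"

// container lists the pod-level claims it uses, by name (hypothetical type).
type container struct {
	Name   string
	Claims []string
}

// podSpec is a hypothetical, simplified pod with claims and containers.
type podSpec struct {
	Claims     []string
	Containers []container
}

// claimsToPrepare returns, in pod order, the claims referenced by at least
// one container. Claims that no container uses are skipped.
func claimsToPrepare(p podSpec) []string {
	used := make(map[string]bool)
	for _, c := range p.Containers {
		for _, name := range c.Claims {
			used[name] = true
		}
	}
	var out []string
	for _, name := range p.Claims {
		if used[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	p := podSpec{
		Claims: []string{"gpu", "scheduling-only"},
		Containers: []container{
			{Name: "app", Claims: []string{"gpu"}},
			{Name: "helper"},
		},
	}
	fmt.Println(claimsToPrepare(p))
}
```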
Patrick Ohly
874daa8b52 kubelet dra: fix checking of second pod which uses a claim
When a second pod wanted to use a claim, the obligatory sanity check whether
the pod is really allowed to use the claim ("reserved for") was skipped.
2023-06-27 16:01:11 +02:00
Davanum Srinivas
f7239e4095 Better back off delays and connection timeout to talk to containerd
Set up params similar to what we do in cadvisor:
e9068e3273/container/containerd/client.go (L59-L61)

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-06-25 16:25:28 -04:00
Jan Safranek
45aa59946a Refactor FindAttachablePluginBySpec out of CSI code path
reconstructVolume() is called when kubelet may not have connection to the
API server yet, therefore it cannot get CSIDriver instances to figure out
if a CSI volume is attachable or not.

Refactor reconstructVolume(), so it does not need
FindAttachablePluginBySpec for CSI volumes, because all of them are
deviceMountable (i.e. FindDeviceMountablePluginBySpec always returns the
CSI volume plugin).
2023-06-23 12:28:15 +02:00
Giuseppe Scrivano
531d38e323 features: rename UserNamespacesStatelessPodsSupport
now it is called UserNamespacesSupport since all kinds of volumes are
supported.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-06-22 15:19:50 +02:00
Michal Wozniak
17013d3960 Review remarks to improve HandlePodCleanups in kubelet 2023-06-22 10:55:39 +02:00
Michal Wozniak
e3ee9b9adc Fix the deletion of rejected pods 2023-06-22 09:18:34 +02:00
Davanum Srinivas
c98e72841b Add a connection backoff to talk to CRI impls
We can add backoff for connection like we do in cadvisor:
https://github.com/google/cadvisor/blob/master/container/containerd/client.go#L76-L80

for now, don't tune it, just use the default:
https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md

Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-06-20 21:32:50 -04:00
Kubernetes Prow Robot
6a79a8a57c Merge pull request #115835 from HirazawaUi/fix-terminationGracePeriod-bug
fix terminationGracePeriod blocked by preStop
2023-06-14 10:34:18 -07:00
carlory
5e048041e4 remove helper function for unused storage feature in pkg/proxy/util 2023-06-13 09:22:59 +08:00
Kubernetes Prow Robot
86d786090a Merge pull request #117793 from tzneal/memory-oom-group-support
use the cgroup aware OOM killer if available
2023-06-12 14:45:58 -07:00