Commit Graph

10630 Commits

Author SHA1 Message Date
Rodrigo Campos
ec0410a266 kubelet: Move userns manager to its own package
To that end, we need to add one kubelet getter listPodsFromDisk(). Other
than that, it is a pretty trivial move.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-03-13 22:28:04 +01:00
Rodrigo Campos
16d76f6813 kubelet: Don't reserve mapping for userns phase II
Latest changes to KEP-127 removed that phase, so let's stop reserving
those IDs for that.

While we are there, we replace 0 for 0*65536 as before we had a bug that
we were not multiplying the index, to avoid bugs in the future.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-03-13 22:28:04 +01:00
Rodrigo Campos
8af3cce7fe kubelet: remove GetHostIDsForPod()
Now KEP-127 relies on idmap mounts to do the ID translation and we won't
do any chowns in the kubelet.

This patch just removes the usage of GetHostIDsForPod() in
operationexecutor to do the chown, and also removes the
GetHostIDsForPod() method from the kubelet volume interface.

Signed-off-by: Rodrigo Campos <rodrigoca@microsoft.com>
2023-03-13 22:28:03 +01:00
Giuseppe Scrivano
9075404dc4 kubelet: use idmapped mounts for all volumes
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2023-03-13 22:28:03 +01:00
Mark Rossetti
b3a67abec7 Updating perfCounterUpdatePerioud for Windows to 10 seconds
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2023-03-13 12:13:24 -07:00
Kubernetes Prow Robot
3106a5c553 Merge pull request #116301 from andyzhangx/remove-azuredisk-code
Remove Azure disk in-tree storage plugin
2023-03-13 10:38:48 -07:00
Kevin Klues
685688c703 Update DRAManager to allow multiple plugins to process a single claim
Right now, the v1alpha1 API only passes enough information for one plugin to
process a claim, but the v1alpha2 API will allow for multiple plugins to
process a claim. This commit prepares the code for this upcoming change.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Kevin Klues
569ed33d78 Add additional tests to DRAManager checkpointing
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Kevin Klues
fd7370b84d Update DRAManager checkpoint to store a map for CDIDevices
The key of the map is the KubeletPluginName where the CDIDevices originate.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Kevin Klues
273a8ffad1 Rename CdiDevices to CDIDevices in dramanager checkpoint
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2023-03-13 12:52:41 +00:00
Claudiu Belu
e3edf13486 unittests: Adds winstats unittests
The module pkg/kubelet/winstats has almost no coverage for Windows. This
commit adds unit tests to cover the mentioned module.
2023-03-13 12:08:15 +00:00
Saza
d34b0275a3 dynamic resource allocation: add timeouts for communiction with plugin (#114844)
* add timeouts for communication with dra plugin

* move timeout constant to k8s.io/kubernetes/pkg/kubelet/cm/util

* move settings of timeout to pkg/kubelet/plugin/dra/plugin/client.go

* remove timeout constant
2023-03-13 04:34:56 -07:00
John Kwiatkoski
69465d2949 Adding test coverage for NewPodContainerManager() (#110220) 2023-03-13 02:08:44 -07:00
vinay kulkarni
1e01358ea2 Initialize pod resource allocation checkpoint manager to noop
This avoids accidentally introducing null pointer access if manager
functions were called outside of InPlacePodVerticalScaling feature gate.
2023-03-13 00:16:44 +00:00
vinay kulkarni
9a805db010 Set default resize policy only for specified resource types, rename RestartNotRequired -> NotRequired 2023-03-12 23:46:40 +00:00
vinay kulkarni
8b23497ae7 Restructure naming of resource resize restart policy 2023-03-12 23:11:32 +00:00
Kubernetes Prow Robot
3c6e419cc3 Merge pull request #116450 from vinaykul/restart-free-pod-vertical-scaling-api
Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources
2023-03-12 16:06:40 -07:00
Kubernetes Prow Robot
a4a0fd44d8 Merge pull request #115912 from moshe010/dra-checkpoint
kubelet DRA: Add checkpointing mechanism in the DRA Manager
2023-03-12 12:20:40 -07:00
Moshe Levi
2c79af0d63 kubelet dra: add unit tests for checkpoint
Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-12 09:13:19 +02:00
vinay kulkarni
1c7850c355 Fix null pointer access in doPodResizeAction for kubeletonly mode 2023-03-12 05:59:14 +00:00
Vadim Rutkovsky
556d774945 kubelet: create top-level traces for pod sync and GC
This starts new top level OpenTelemetry spans every time syncPod or image / container GC is invoked
2023-03-11 10:42:14 +01:00
andyzhangx
c2b2a7622f revert azuredisk test removal change
revert

revert vendor changes

revert

revert

fix
2023-03-11 07:10:05 +00:00
Francesco Romani
b837a0c1ff kubelet: podresources: DOS prevention with builtin ratelimit
Implement DOS prevention wiring a global rate limit for podresources
API. The goal here is not to introduce a general ratelimiting solution
for the kubelet (we need more research and discussion to get there),
but rather to prevent misuse of the API.

Known limitations:
- the rate limits value (QPS, BurstTokens) are hardcoded to
  "high enough" values.
  Enabling user-configuration would require more discussion
  and sweeping changes to the other kubelet endpoints, so it
  is postponed for now.
- the rate limiting is global. Malicious clients can starve other
  clients consuming the QPS quota.

Add e2e test to exercise the flow, because the wiring itself
is mostly boilerplate and API adaptation.
2023-03-11 08:00:54 +01:00
Kubernetes Prow Robot
c6f3007071 Merge pull request #115967 from harche/evented_pleg_metrics
Graduate Evented PLEG to Beta
2023-03-10 17:34:40 -08:00
Kubernetes Prow Robot
1f2d49972c Merge pull request #116424 from jsafrane/add-selinux-metric-test
Add e2e tests for SELinux metrics
2023-03-10 12:41:06 -08:00
vinay kulkarni
01b96e7704 Rename ContainerStatus.ResourcesAllocated to ContainerStatus.AllocatedResources 2023-03-10 14:49:26 +00:00
Jan Safranek
05cd2ba863 Don't bump nr. of admitted volumes on retry
AddPodToVolume is called periodically, it does not make sense to bump
volume_manager_selinux_volumes_admitted_total on each call.
2023-03-10 15:03:56 +01:00
Jan Safranek
48ea6a3f3a Fix SELinux mismatch metrics
DesiredStateOfWorld must remember both
- the effective SELinux label to apply as a mount option (non-empty for
  RWOP volumes, empty otherwise)
- and the label that _would_ be used if the mount option would be used by
  all access modes.

Mismatch warning metrics must be generated from the second label.
2023-03-10 15:03:56 +01:00
Kubernetes Prow Robot
f734741cb8 Merge pull request #114373 from TommyStarK/unit-tests/kubelet-kuberuntime
kubelet/kuberuntime: Improving test coverage
2023-03-10 04:34:58 -08:00
Moshe Levi
e7256e08d3 kubelet dra: add checkpointing mechanism in the DRA Manager
The checkpointing mechanism will repopulate DRA Manager in-memory cache on kubelet restart.
This will ensure that the information needed by the PodResources API is available across
a kubelet restart.

The ClaimInfoState struct represent the DRA Manager in-memory cache state in checkpoint.
It is embedd in the ClaimInfo which also include the annotation field. The separation between
the in-memory cache and the cache state in the checkpoint is so we won't be tied to the in-memory
cache struct which may change in the future. In the ClaimInfoState we save the minimal required fields
to restore the in-memory cache.

Signed-off-by: Moshe Levi <moshele@nvidia.com>
2023-03-10 12:22:15 +02:00
TommyStarK
7f21a9ce01 kubelet/kuberuntime: Improving test coverage
Signed-off-by: TommyStarK <thomasmilox@gmail.com>
2023-03-10 11:06:54 +01:00
Francesco Romani
09517c27c4 kubelet: podresources: pack parameters in a struct
To enable rate limiting, needed for GA graduation,
we need to pass more parameters to the already crowded
`ListenAndServePodresources` function.

To tidy up a bit, pack the parameters in a helper struct,
with no intended changes in behavior.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-03-10 10:28:52 +01:00
Kubernetes Prow Robot
3219564cf3 Merge pull request #116296 from SataQiu/clean-kubelet-20230306
Remove unused resize.go from pkg/kubelet/container
2023-03-09 22:43:48 -08:00
Kubernetes Prow Robot
e57d968323 Merge pull request #116015 from SataQiu/clean-kubelet-20230223
kubelet: remove the deprecated --master-service-namespace flag
2023-03-09 22:43:34 -08:00
Kubernetes Prow Robot
a408be817f Merge pull request #115972 from jsafrane/add-orphan-pod-metrics
Add metric for failed orphan pod cleanup
2023-03-09 22:43:26 -08:00
Kubernetes Prow Robot
33d8614c9c Merge pull request #115929 from HirazawaUi/delete-kubelet-unused-function
cleanup(kubelet): remove unused function
2023-03-09 22:43:12 -08:00
Kubernetes Prow Robot
0018c07050 Merge pull request #115898 from saschagrunert/seccomp-todo
Default to sandbox `Seccomp` field instead of `SeccompProfilePath`
2023-03-09 22:43:05 -08:00
Kubernetes Prow Robot
06f0cba9b1 Merge pull request #115367 from tzneal/dedupe-resource-calculation
dedupe pod resource request calculation
2023-03-09 22:42:50 -08:00
Kubernetes Prow Robot
1b647d5bf8 Merge pull request #114558 from TommyStarK/unit-tests/pkg-kubelet-nodestatus
kubelet/nodestatus: Improving test coverage
2023-03-09 21:34:00 -08:00
Kubernetes Prow Robot
10802e9be1 Merge pull request #114498 from runzhliu/patch-2
Update kuberuntime_manager_test.go
2023-03-09 21:33:52 -08:00
Kubernetes Prow Robot
f6564d33ba Merge pull request #114357 from dengyufeng2206/1208pull
Log spelling formatting
2023-03-09 21:33:22 -08:00
Kubernetes Prow Robot
a3ad4d7623 Merge pull request #114017 from calvin0327/cleanup-containerruntime-options
cleanup container runtime options
2023-03-09 21:33:06 -08:00
Kubernetes Prow Robot
15f5a5c6ef Merge pull request #110949 from claudiubelu/adds-unittests-4
tests: Ports kubelet unit tests to Windows
2023-03-09 21:32:30 -08:00
Kubernetes Prow Robot
d241fcb4bd Merge pull request #110760 from zhoumingcheng/master-unit-v2
add unit test coverage for pkg/kubelet/types/
2023-03-09 20:30:29 -08:00
Kubernetes Prow Robot
33d9543ceb Merge pull request #111634 from KunWuLuan/pluginmanager_cache_log_amend
docs(desired_state_of_world.go): log in desired_state_of_world.go seems to be wrong
2023-03-09 19:08:29 -08:00
Kubernetes Prow Robot
45b96eae98 Merge pull request #113145 from smarterclayton/zombie_terminating_pods
kubelet: Force deleted pods can fail to move out of terminating
2023-03-09 15:32:30 -08:00
Todd Neal
4096c9209c dedupe pod resource request calculation 2023-03-09 17:15:53 -06:00
Kubernetes Prow Robot
54ec651ab5 Merge pull request #110741 from zhoumingcheng/master-unit-v1
add unit test coverage for pkg/kubelet/util/queue
2023-03-09 11:15:51 -08:00
andyzhangx
5d0a54dcb5 remove Azure Disk in-tree driver code
fix
2023-03-09 13:24:08 +00:00
Clayton Coleman
6b9a381185 kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is
force deleted (API or static) then the housekeeping loop will not
deliver updates to the pod worker which prevents the pod's state
machine from progressing. The pod will remain in the terminating
phase but no further attempts to terminate or cleanup will occur
until the kubelet is restarted.

The pod worker now maintains a store of the pods state that it is
attempting to reconcile and uses that to resync unknown pods when
SyncKnownPods() is invoked, so that failures in sync methods for
unknown pods no longer hang forever.

The pod worker's store tracks desired updates and the last update
applied on podSyncStatuses. Each goroutine now synchronizes to
acquire the next work item, context, and whether the pod can start.
This synchronization moves the pending update to the stored last
update, which will ensure third parties accessing pod worker state
don't see updates before the pod worker begins synchronizing them.

As a consequence, the update channel becomes a simple notifier
(struct{}) so that SyncKnownPods can coordinate with the pod worker
to create a synthetic pending update for unknown pods (i.e. no one
besides the pod worker has data about those pods). Otherwise the
pending update info would be hidden inside the channel.

In order to properly track pending updates, we have to be very
careful not to mix RunningPods (which are calculated from the
container runtime and are missing all spec info) and config-
sourced pods. Update the pod worker to avoid using ToAPIPod()
and instead require the pod worker to directly use
update.Options.Pod or update.Options.RunningPod for the
correct methods. Add a new SyncTerminatingRuntimePod to prevent
accidental invocations of runtime only pod data.

Finally, fix SyncKnownPods to replay the last valid update for
undesired pods which drives the pod state machine towards
termination, and alter HandlePodCleanups to:

- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker

Any started pods receive a replay until they reach the finished
state, and then are removed from the pod worker. When a desired
pod is detected as not being in the worker, the usual cause is
that the pod was deleted and recreated with the same UID (almost
always a static pod since API UID reuse is statistically
unlikely). This simplifies the previous restartable pod support.
We are careful to filter for active pods (those not already
terminal or those which have been previously rejected by
admission). We also force a refresh of the runtime cache to
ensure we don't see an older version of the state.

Future changes will allow other components that need to view the
pod worker's actual state (not the desired state the podManager
represents) to retrieve that info from the pod worker.

Several bugs in pod lifecycle have been undetectable at runtime
because the kubelet does not clearly describe the number of pods
in use. To better report, add the following metrics:

  kubelet_desired_pods: Pods the pod manager sees
  kubelet_active_pods: "Admitted" pods that gate new pods
  kubelet_mirror_pods: Mirror pods the kubelet is tracking
  kubelet_working_pods: Breakdown of pods from the last sync in
    each phase, orphaned state, and static or not
  kubelet_restarted_pods_total: A counter for pods that saw a
    CREATE before the previous pod with the same UID was finished
  kubelet_orphaned_runtime_pods_total: A counter for pods detected
    at runtime that were not known to the kubelet. Will be
    populated at Kubelet startup and should never be incremented
    after.

Add a metric check to our e2e tests that verifies the values are
captured correctly during a serial test, and then verify them in
detail in unit tests.

Adds 23 series to the kubelet /metrics endpoint.
2023-03-08 22:03:51 -06:00