Commit Graph

46684 Commits

Author SHA1 Message Date
Humble Chirammal
0bdb2db18d update internal type of csiNodeExpand feature to beta
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
2023-03-14 22:12:17 +05:30
Humble Chirammal
92f59b6323 Update NodeExpandSecretRef comment for beta
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
2023-03-14 17:57:24 +05:30
Kubernetes Prow Robot
45b96eae98 Merge pull request #113145 from smarterclayton/zombie_terminating_pods
kubelet: Force deleted pods can fail to move out of terminating
2023-03-09 15:32:30 -08:00
Kubernetes Prow Robot
c67953a2d0 Merge pull request #116428 from mborsz/fix
Avoid metric lookup in Parallelizer.Until on every work piece
2023-03-09 13:08:29 -08:00
Kubernetes Prow Robot
54ec651ab5 Merge pull request #110741 from zhoumingcheng/master-unit-v1
add unit test coverage for pkg/kubelet/util/queue
2023-03-09 11:15:51 -08:00
Kubernetes Prow Robot
c9bbb6553d Merge pull request #116422 from aojea/nodeslect
unexport buggy function nodeSelectorAsSelector
2023-03-09 10:06:03 -08:00
Maciej Borsz
30bca1e1d5 Avoid metric lookup in Parallelizer.Util on every work piece 2023-03-09 17:12:30 +00:00
Antonio Ojea
fd62265d19 unexport buggy function nodeSelectorAsSelector
Change-Id: I1e48ac0dd0b33c367fa9be4f4adb11a4531849f9
2023-03-09 16:58:25 +00:00
Kubernetes Prow Robot
f90643435e Merge pull request #113840 from 249043822/br-context-logging-statefulset
statefulset: use contextual logging
2023-03-09 06:42:02 -08:00
Clayton Coleman
6b9a381185 kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is
force deleted (API or static) then the housekeeping loop will not
deliver updates to the pod worker which prevents the pod's state
machine from progressing. The pod will remain in the terminating
phase but no further attempts to terminate or cleanup will occur
until the kubelet is restarted.

The pod worker now maintains a store of the pods state that it is
attempting to reconcile and uses that to resync unknown pods when
SyncKnownPods() is invoked, so that failures in sync methods for
unknown pods no longer hang forever.

The pod worker's store tracks desired updates and the last update
applied on podSyncStatuses. Each goroutine now synchronizes to
acquire the next work item, context, and whether the pod can start.
This synchronization moves the pending update to the stored last
update, which will ensure third parties accessing pod worker state
don't see updates before the pod worker begins synchronizing them.

As a consequence, the update channel becomes a simple notifier
(struct{}) so that SyncKnownPods can coordinate with the pod worker
to create a synthetic pending update for unknown pods (i.e. no one
besides the pod worker has data about those pods). Otherwise the
pending update info would be hidden inside the channel.

In order to properly track pending updates, we have to be very
careful not to mix RunningPods (which are calculated from the
container runtime and are missing all spec info) and config-
sourced pods. Update the pod worker to avoid using ToAPIPod()
and instead require the pod worker to directly use
update.Options.Pod or update.Options.RunningPod for the
correct methods. Add a new SyncTerminatingRuntimePod to prevent
accidental invocations of runtime only pod data.

Finally, fix SyncKnownPods to replay the last valid update for
undesired pods which drives the pod state machine towards
termination, and alter HandlePodCleanups to:

- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker

Any started pods receive a replay until they reach the finished
state, and then are removed from the pod worker. When a desired
pod is detected as not being in the worker, the usual cause is
that the pod was deleted and recreated with the same UID (almost
always a static pod since API UID reuse is statistically
unlikely). This simplifies the previous restartable pod support.
We are careful to filter for active pods (those not already
terminal or those which have been previously rejected by
admission). We also force a refresh of the runtime cache to
ensure we don't see an older version of the state.

Future changes will allow other components that need to view the
pod worker's actual state (not the desired state the podManager
represents) to retrieve that info from the pod worker.

Several bugs in pod lifecycle have been undetectable at runtime
because the kubelet does not clearly describe the number of pods
in use. To better report, add the following metrics:

  kubelet_desired_pods: Pods the pod manager sees
  kubelet_active_pods: "Admitted" pods that gate new pods
  kubelet_mirror_pods: Mirror pods the kubelet is tracking
  kubelet_working_pods: Breakdown of pods from the last sync in
    each phase, orphaned state, and static or not
  kubelet_restarted_pods_total: A counter for pods that saw a
    CREATE before the previous pod with the same UID was finished
  kubelet_orphaned_runtime_pods_total: A counter for pods detected
    at runtime that were not known to the kubelet. Will be
    populated at Kubelet startup and should never be incremented
    after.

Add a metric check to our e2e tests that verifies the values are
captured correctly during a serial test, and then verify them in
detail in unit tests.

Adds 23 series to the kubelet /metrics endpoint.
2023-03-08 22:03:51 -06:00
Kubernetes Prow Robot
625b8be09e Merge pull request #115371 from pacoxu/cgroup-v2-memory-tuning
default memoryThrottlingFactor to 0.9 and optimize the memory.high formulas
2023-03-08 18:46:00 -08:00
Kubernetes Prow Robot
8d5c96fed2 Merge pull request #116093 from swatisehgal/topologymanager-ga-graduation
node: topologymgr: Graduate Kubelet Topology Manager to GA
2023-03-08 16:56:06 -08:00
Kubernetes Prow Robot
8fa82976fc Merge pull request #116356 from pacoxu/cleanup-bump_qps_kubelet
sync default qps of kubelet change everywhere
2023-03-08 15:42:41 -08:00
Maksim Nabokikh
c1431af4f8 KEP-3325: Promote SelfSubjectReview to Beta (#116274)
* Promote SelfSubjectReview to Beta

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fix whoami API

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fixes according to code review

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

---------

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
2023-03-08 15:42:33 -08:00
Kubernetes Prow Robot
4a896644de Merge pull request #116235 from Jefftree/oas-ga
Promote OpenAPI V3 to GA
2023-03-08 14:44:20 -08:00
Kubernetes Prow Robot
8b413d224a Merge pull request #116342 from msau42/unlock
Unlock CSIMigrationvSphere feature gate
2023-03-08 11:27:24 -08:00
Kubernetes Prow Robot
03ff890ef4 Merge pull request #116329 from dims/drop-aws-kubelet-credential-provider-and-cleanup-aws-storage-e2e-tests
Drop aws kubelet credential provider and cleanup aws storage e2e tests
2023-03-08 06:49:11 -08:00
ZhangKe10140699
a239b9986b Migrated the StatefulSet controller (within `kube-controller-manager) to use [contextual logging](https://k8s.io/docs/concepts/cluster-administration/system-logs/#contextual-logging) 2023-03-08 18:57:57 +08:00
Kubernetes Prow Robot
f99c351992 Merge pull request #116174 from pacoxu/fix-TestNewNodeIpamControllerWithCIDRMasks
node ipam controller ut: run test in parallel to avoid timeout
2023-03-08 02:25:12 -08:00
Paco Xu
f368413d65 sync default qps of kubelet change 2023-03-08 14:04:51 +08:00
Paco Xu
0e6636eb33 nodeipam: return error instead of panics 2023-03-08 12:47:14 +08:00
Kubernetes Prow Robot
e390791e5f Merge pull request #116341 from bobbypage/revert-114640-handle-device-mgr-recovery
Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist"
2023-03-07 19:31:33 -08:00
Kubernetes Prow Robot
83334ccaa1 Merge pull request #116320 from wangchen615/lastminute-scheduler-fix
Address last-minute requested changes for inplace update feature testing in scheduler
2023-03-07 19:31:26 -08:00
Paco Xu
4fdca4c5f6 node ipam ut: run test in parallel to avoid timeout; and optimize the panic check 2023-03-08 11:10:51 +08:00
Kubernetes Prow Robot
fe6a51ed4c Merge pull request #116121 from wojtek-t/bump_qps_kubelet
Bump default API QPS limits for Kubelet
2023-03-07 15:08:43 -08:00
Kubernetes Prow Robot
6bce018b36 Merge pull request #116271 from vinaykul/restart-free-pod-vertical-scaling-kubelet-panic-fix
Fix nil pointer access panic in kubelet from uninitialized pod allocation checkpoint manager in standalone kubelet scenario
2023-03-07 12:38:45 -08:00
Michelle Au
4c0ed3b52e Unlock CSIMigrationvSphere feature gate until there is a supported vSphere CSI driver available 2023-03-07 20:26:27 +00:00
David Porter
9c20cee504 Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist" 2023-03-07 11:50:52 -08:00
Kubernetes Prow Robot
2c8f63f693 Merge pull request #115268 from jsafrane/split-reconstruction
Split volume reconstruction refactoring from SELinuxMountReadWriteOncePod
2023-03-07 10:44:34 -08:00
Kubernetes Prow Robot
37326f7cea Merge pull request #112670 from yangjunmyfm192085/delklogV0
use contextual logging(nodeipam and nodelifecycle part)
2023-03-07 09:40:33 -08:00
Kubernetes Prow Robot
e28b191581 Merge pull request #116167 from borgerli/pr/kcm-podgc
delete Evicted pods first during pod gc
2023-03-07 07:21:04 -08:00
Kubernetes Prow Robot
2225ee5dd3 Merge pull request #115904 from soltysh/cronjob_tz_ga
Promote CronJob TZ to GA
2023-03-07 07:20:47 -08:00
Kubernetes Prow Robot
51ef4b10ba Merge pull request #115504 from pacoxu/cronjob-timezone
add some ut for cronjob strategy and timezone in schedule
2023-03-07 07:20:34 -08:00
Davanum Srinivas
90d185b7e1 Drop AWS kubelet credential provider and cleanup AWS storage e2e tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-03-07 09:00:12 -05:00
Kubernetes Prow Robot
86bf570711 Merge pull request #111661 from alexanderConstantinescu/etp-local-svc-hc-kube-proxy
[Proxy]: add `healthz` verification when determining HC response for eTP:Local
2023-03-07 05:34:36 -08:00
Kubernetes Prow Robot
637bd66165 Merge pull request #115332 from obaranov1/ttlafterfinished-logging-migration
Migrate /pkg/controller/ttlafterfinished to structured and contextual logging
2023-03-07 04:20:08 -08:00
Naman Lakhwani
b6f9a65558 Migrating pkg/controller/serviceaccount to contextual logging (#114918)
* migrating pkg/controller/serviceaccount to contextual logging

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nit

Signed-off-by: Naman <namanlakhwani@gmail.com>

* capitalising first letter of error

Signed-off-by: Naman <namanlakhwani@gmail.com>

* addressed review comments

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nit to add key

Signed-off-by: Naman <namanlakhwani@gmail.com>

---------

Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-03-07 04:19:59 -08:00
Naman Lakhwani
8f45b64c93 Migrated pkg/controller/replicaset to contextual logging (#114871)
* migrated controller/replicaset to contextual logging

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nits

Signed-off-by: Naman <namanlakhwani@gmail.com>

* addressed changes

Signed-off-by: Naman <namanlakhwani@gmail.com>

* small nit

Signed-off-by: Naman <namanlakhwani@gmail.com>

* taking t as input

Signed-off-by: Naman <namanlakhwani@gmail.com>

---------

Signed-off-by: Naman <namanlakhwani@gmail.com>
2023-03-07 04:19:51 -08:00
Kubernetes Prow Robot
4aaa4df840 Merge pull request #113986 from songxiao-wang87/runwxs-test2
Migrate StorageVersionGC to contextual logging
2023-03-07 04:19:43 -08:00
Kubernetes Prow Robot
471b392f43 Merge pull request #113916 from songxiao-wang87/runwxs-test1
Migrate ttl_controller to contextual logging
2023-03-07 04:18:30 -08:00
Maciej Szulik
e047c859be Update generated 2023-03-07 12:58:59 +01:00
Maciej Szulik
1b825c179b Promote CronJob TZ to GA 2023-03-07 12:58:57 +01:00
Swati Sehgal
bea99ae1ee node: topologymgr: update autogenerated code
Changes committed after running:
`./hack/update-codegen.sh`

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-07 11:11:31 +00:00
Swati Sehgal
ae964a493f node: topologymgr: remove comments with feature gate references
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-07 09:42:54 +00:00
Kubernetes Prow Robot
3489796d5c Merge pull request #113428 from mengjiao-liu/contextual-logging-controller-cronjob
Update `pkg/controller/cronjob/` for contextual logging
2023-03-07 01:28:18 -08:00
JunYang
780ef3afb0 use klog.InfoS instead of klog.V(0),Info 2023-03-07 15:50:01 +08:00
vinay kulkarni
98e8f42f33 panic on pod resources alloc checkpoint failure 2023-03-07 05:59:34 +00:00
Kubernetes Prow Robot
04675428bb Merge pull request #115973 from jpbetz/enforcement-actions
KEP-3488: Implement Enforcement Actions and Audit Annotations
2023-03-06 21:56:37 -08:00
Kubernetes Prow Robot
8e659d43ec Merge pull request #115925 from claudiubelu/skip-flaky-tests
unit tests: Skip flaky tests on Windows
2023-03-06 21:56:29 -08:00
Kubernetes Prow Robot
b4305fcf63 Merge pull request #115391 from haoruan/bugfix/allow-pv-nodeaffinity-to-be-mutable
allow to mutate pv nodeaffinity label key
2023-03-06 21:56:17 -08:00