Commit Graph

23001 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
e8ae6658ed
Merge pull request #115065 from apelisse/apimachinery-managed-fields
managedfields: Move most of fieldmanager package to managefields
2023-03-09 21:34:22 -08:00
Kubernetes Prow Robot
45b96eae98
Merge pull request #113145 from smarterclayton/zombie_terminating_pods
kubelet: Force deleted pods can fail to move out of terminating
2023-03-09 15:32:30 -08:00
Kubernetes Prow Robot
f90643435e
Merge pull request #113840 from 249043822/br-context-logging-statefulset
statefulset: use contextual logging
2023-03-09 06:42:02 -08:00
Kubernetes Prow Robot
f02da82e36
Merge pull request #116404 from cpanato/go1202
[go] Bump images, dependencies and versions to go 1.20.2
2023-03-09 05:16:20 -08:00
Kubernetes Prow Robot
da87af638f
Merge pull request #115856 from lanycrost/e2e-115780-grpc-probe-tests
Promote gRPC probe e2e test to Conformance
2023-03-09 01:06:03 -08:00
cpanato
99c80ac119
[go] Bump images, dependencies and versions to go 1.20.2 2023-03-09 09:57:45 +01:00
Clayton Coleman
6b9a381185
kubelet: Force deleted pods can fail to move out of terminating
If a CRI error occurs during the terminating phase after a pod is
force deleted (API or static) then the housekeeping loop will not
deliver updates to the pod worker which prevents the pod's state
machine from progressing. The pod will remain in the terminating
phase but no further attempts to terminate or cleanup will occur
until the kubelet is restarted.

The pod worker now maintains a store of the pods state that it is
attempting to reconcile and uses that to resync unknown pods when
SyncKnownPods() is invoked, so that failures in sync methods for
unknown pods no longer hang forever.

The pod worker's store tracks desired updates and the last update
applied on podSyncStatuses. Each goroutine now synchronizes to
acquire the next work item, context, and whether the pod can start.
This synchronization moves the pending update to the stored last
update, which will ensure third parties accessing pod worker state
don't see updates before the pod worker begins synchronizing them.

As a consequence, the update channel becomes a simple notifier
(struct{}) so that SyncKnownPods can coordinate with the pod worker
to create a synthetic pending update for unknown pods (i.e. no one
besides the pod worker has data about those pods). Otherwise the
pending update info would be hidden inside the channel.

In order to properly track pending updates, we have to be very
careful not to mix RunningPods (which are calculated from the
container runtime and are missing all spec info) and config-
sourced pods. Update the pod worker to avoid using ToAPIPod()
and instead require the pod worker to directly use
update.Options.Pod or update.Options.RunningPod for the
correct methods. Add a new SyncTerminatingRuntimePod to prevent
accidental invocations of runtime only pod data.

Finally, fix SyncKnownPods to replay the last valid update for
undesired pods which drives the pod state machine towards
termination, and alter HandlePodCleanups to:

- terminate runtime pods that aren't known to the pod worker
- launch admitted pods that aren't known to the pod worker

Any started pods receive a replay until they reach the finished
state, and then are removed from the pod worker. When a desired
pod is detected as not being in the worker, the usual cause is
that the pod was deleted and recreated with the same UID (almost
always a static pod since API UID reuse is statistically
unlikely). This simplifies the previous restartable pod support.
We are careful to filter for active pods (those not already
terminal or those which have been previously rejected by
admission). We also force a refresh of the runtime cache to
ensure we don't see an older version of the state.

Future changes will allow other components that need to view the
pod worker's actual state (not the desired state the podManager
represents) to retrieve that info from the pod worker.

Several bugs in pod lifecycle have been undetectable at runtime
because the kubelet does not clearly describe the number of pods
in use. To better report, add the following metrics:

  kubelet_desired_pods: Pods the pod manager sees
  kubelet_active_pods: "Admitted" pods that gate new pods
  kubelet_mirror_pods: Mirror pods the kubelet is tracking
  kubelet_working_pods: Breakdown of pods from the last sync in
    each phase, orphaned state, and static or not
  kubelet_restarted_pods_total: A counter for pods that saw a
    CREATE before the previous pod with the same UID was finished
  kubelet_orphaned_runtime_pods_total: A counter for pods detected
    at runtime that were not known to the kubelet. Will be
    populated at Kubelet startup and should never be incremented
    after.

Add a metric check to our e2e tests that verifies the values are
captured correctly during a serial test, and then verify them in
detail in unit tests.

Adds 23 series to the kubelet /metrics endpoint.
2023-03-08 22:03:51 -06:00
Kubernetes Prow Robot
8d5c96fed2
Merge pull request #116093 from swatisehgal/topologymanager-ga-graduation
node: topologymgr: Graduate Kubelet Topology Manager to GA
2023-03-08 16:56:06 -08:00
Kubernetes Prow Robot
30ee6914c5
Merge pull request #115149 from nilekhc/encrypt-all
Allow encryption for all resources
2023-03-08 16:55:59 -08:00
Kubernetes Prow Robot
8fa82976fc
Merge pull request #116356 from pacoxu/cleanup-bump_qps_kubelet
sync default qps of kubelet change everywhere
2023-03-08 15:42:41 -08:00
Maksim Nabokikh
c1431af4f8
KEP-3325: Promote SelfSubjectReview to Beta (#116274)
* Promote SelfSubjectReview to Beta

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fix whoami API

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

* Fixes according to code review

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>

---------

Signed-off-by: m.nabokikh <maksim.nabokikh@flant.com>
2023-03-08 15:42:33 -08:00
Kubernetes Prow Robot
0a5310fe9a
Merge pull request #116232 from aojea/e2e_terminating_connectivity
test connectivity for terminating pods
2023-03-08 15:42:21 -08:00
Kubernetes Prow Robot
8ee9b82b10
Merge pull request #115984 from tzneal/init-container-tests
add more init container testing
2023-03-08 15:42:08 -08:00
Nilekh Chaudhari
9382fab9b6
feat: implements encrypt all
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
2023-03-08 22:18:49 +00:00
Antoine Pelisse
4f3859ce91 managedfields: Move most of fieldmanager package to managefields 2023-03-08 13:44:00 -08:00
Kubernetes Prow Robot
8319ac5274
Merge pull request #116383 from Huang-Wei/fix/sched-perf-test
fix: remove SchedulingMigratedInTreePVs feature gate in sched perf test
2023-03-08 13:12:20 -08:00
Kubernetes Prow Robot
2a22864d9c
Merge pull request #116381 from pohly/cronjob-integration-test-shutdown
cronjob: shut down integration test quickly again
2023-03-08 13:12:08 -08:00
Todd Neal
123ab80333 add more init container lifetime testing
Add some additional init container tests that work via monitoring
container lifetime based on logs written to a common file. This allows
more easily writing assertions about the container lifetimes with
respect to one another.
2023-03-08 14:39:10 -06:00
Kubernetes Prow Robot
7598ff36cf
Merge pull request #116333 from aojea/multiport_service
e2e network test for multiple protocol services on same port
2023-03-08 11:27:12 -08:00
Wei Huang
c9bc2f98d0
fix: remove SchedulingMigratedInTreePVs feature gate in sched perf test 2023-03-08 08:34:44 -08:00
Patrick Ohly
be82872eff cronjob: shut down integration test quickly again
6f2cd1b5bd swapped the order of cancel() and
closeFn() so that closeFn got called first when the test was done. This caused
it to block while waiting for goroutines which themselves were waiting for
the context cancellation. The test still shut down, it just took ~86s instead
of ~30s.

The fix is to register the cancel twice: once as soon as the context is
created (to clean up in case of an unexpected panic) and once after
closeFn (because then it'll get called first, as before).
2023-03-08 17:26:47 +01:00
Kubernetes Prow Robot
499a03d88b
Merge pull request #115451 from zhucan/nodeexpandvolume-secret-e2e
e2e: add e2e test to node expand volume with secret
2023-03-08 07:53:12 -08:00
Kubernetes Prow Robot
03ff890ef4
Merge pull request #116329 from dims/drop-aws-kubelet-credential-provider-and-cleanup-aws-storage-e2e-tests
Drop aws kubelet credential provider and cleanup aws storage e2e tests
2023-03-08 06:49:11 -08:00
ZhangKe10140699
a239b9986b Migrated the StatefulSet controller (within `kube-controller-manager) to use [contextual logging](https://k8s.io/docs/concepts/cluster-administration/system-logs/#contextual-logging) 2023-03-08 18:57:57 +08:00
Paco Xu
f368413d65 sync default qps of kubelet change 2023-03-08 14:04:51 +08:00
Kubernetes Prow Robot
e390791e5f
Merge pull request #116341 from bobbypage/revert-114640-handle-device-mgr-recovery
Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist"
2023-03-07 19:31:33 -08:00
Kubernetes Prow Robot
83334ccaa1
Merge pull request #116320 from wangchen615/lastminute-scheduler-fix
Address last-minute requested changes for inplace update feature testing in scheduler
2023-03-07 19:31:26 -08:00
Kubernetes Prow Robot
548e856b58
Merge pull request #116144 from dashpole/apiserver_tracing_beta_round_2
Graduate API Server tracing to beta
2023-03-07 19:31:12 -08:00
zhucan
7a87b4363b e2e: add e2e test to node expand volume with secret
Signed-off-by: zhucan <zhucan.k8s@gmail.com>
2023-03-08 10:27:50 +08:00
Kubernetes Prow Robot
88cce39155
Merge pull request #116200 from Jefftree/e2e-openapi
Add OpenAPI V3 E2E Tests
2023-03-07 15:08:55 -08:00
David Ashpole
4014d0fbbf
graduate API Server tracing to beta 2023-03-07 21:39:39 +00:00
Antonio Ojea
7cb135a888 e2e network test for multiple protocol services on same port
The test creates a Service exposing two protocols on the same port
and a backend that replies on both protocols.

1. Test that Service with works for both protocol
2. Update Service to expose only the TCP port
3. Verify that TCP works and UDP does not work
4. Update Service to expose only the UDP port
5. Verify that TCP does not work and UDP does work

Change-Id: Ic4f3a6509e332aa5694d20dfc3b223d7063a7871
2023-03-07 21:30:39 +00:00
Jefftree
658ea4b591 Refactor aggregated apiserver e2e, add openapi e2e 2023-03-07 20:04:00 +00:00
David Porter
9c20cee504
Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist" 2023-03-07 11:50:52 -08:00
Kubernetes Prow Robot
2bac225e42
Merge pull request #116331 from aojea/revert_aojea_mistake
Revert "do not assume backend on e2e service jig"
2023-03-07 10:44:53 -08:00
Kubernetes Prow Robot
7ec3c2727b
Merge pull request #115358 from pohly/logs-performance-benchmarks
Logs performance benchmarks
2023-03-07 10:44:46 -08:00
Antonio Ojea
bcc61bbb8b test connectivity for terminating pods
Test 2 scenarios:

- pod can connect to a terminating pods
- terminating pod can connect to other pods

Change-Id: Ia5dc4e7370cc055df452bf7cbaddd9901b4d229d
2023-03-07 17:48:07 +00:00
Kubernetes Prow Robot
0e9ad242bd
Merge pull request #116298 from soltysh/simplify_sset_test
Get rid of context.TODO and simplify waitForStatusCurrentReplicas
2023-03-07 09:40:45 -08:00
Kubernetes Prow Robot
37326f7cea
Merge pull request #112670 from yangjunmyfm192085/delklogV0
use contextual logging(nodeipam and nodelifecycle part)
2023-03-07 09:40:33 -08:00
Davanum Srinivas
b50d9d1e28
Cleanup vendor/
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-03-07 10:27:15 -05:00
Antonio Ojea
0826438d72 Revert "do not assume backend on e2e service jig"
This reverts commit 909a08d011.

Change-Id: Ie9ba4284ac739b3ea7eb57c44bb4551316121920
2023-03-07 15:12:22 +00:00
Patrick Ohly
5ee679b340 test/integration/logs: use stable struct for unit test
v1.Container is still changing a log which caused the test to fail each time a
new field was added. To test loading, let's better use something that is
unlikely to change. The runtimev1.VersionResponse gets logged by kubelet and
seems to be stable.
2023-03-07 16:04:32 +01:00
Patrick Ohly
eaa95b9178 test/integration/logs: benchmark using logsapi
The benchmarks and unit tests were written so that they used custom APIs for
each log format. This made them less realistic because there were subtle
differences between the benchmark and a real Kubernetes component. Now all
logging configuration is done with the official
k8s.io/component-base/logs/api/v1.

To make the different test cases more comparable, "messages/s" is now reported
instead of the generic "ns/op".
2023-03-07 16:04:32 +01:00
Patrick Ohly
10c15d7a67 test/integration/logs: replace assert.Contains
For long strings the output of assert.Contains is not very readable.
2023-03-07 16:04:32 +01:00
Patrick Ohly
a862a269b0 test/integration/logs: remove useless stats case
The same effect can be achieved with `-bench=BenchmarkEncoding/none`.
2023-03-07 16:04:32 +01:00
Patrick Ohly
97a8d72a67 test/integration/logs: update benchmark support
When trying again with recent log files from the CI job, it was found that some
JSON messages get split across multiple lines, both in container logs and in
the systemd journal:

   2022-12-21T07:09:47.914739996Z stderr F {"ts":1671606587914.691,"caller":"rest/request.go:1169","msg":"Response ...
   2022-12-21T07:09:47.914984628Z stderr F 70 72  6f 78 79 10 01 1a 13 53 ... \".|\n","v":8}

Note the different time stamp on the second line. That first line is
long (17384 bytes). This seems to happen because the data must pass through a
stream-oriented pipe and thus may get split up by the Linux kernel.

The implication is that lines must get merged whenever the JSON decoder
encounters an incomplete line. The benchmark loader now supports that. To
simplifies this, stripping the non-JSON line prefixes must be done before using
a log as test data.

The updated README explains how to do that when downloading a CI job
result. The amount of manual work gets reduced by committing symlinks under
data to the expected location under ci-kubernetes-kind-e2e-json-logging and
ignoring them when the data is not there.

Support for symlinks gets removed and path/filepath is used instead of path
because it has better Windows support.
2023-03-07 16:03:48 +01:00
Davanum Srinivas
90d185b7e1
Drop AWS kubelet credential provider and cleanup AWS storage e2e tests
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2023-03-07 09:00:12 -05:00
Maciej Szulik
27d5cda811
Simplify waitForStatusCurrentReplicas helper 2023-03-07 14:18:31 +01:00
Maciej Szulik
17117dc47d
Replace context.TODO with proper context 2023-03-07 14:18:29 +01:00
Antonio Ojea
4482d1c2f4 e2e support Services with multiple EndpointSlices
A Service can use multiple EndpointSlices for its backend, when
using custom Endpoint Slices, the data plane should forward traffic
to any of the endpoints in the Endpointslices that belong to the
Service.

Change-Id: I80b42522bf6ab443050697a29b94d8245943526f
2023-03-07 13:00:46 +00:00