Commit Graph

9799 Commits

Author SHA1 Message Date
Patrick Ohly
9eaa2dc554 avoid klog Info calls without verbosity
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:

   if klog.V(5).Enabled() {
       klog.Info("hello world")
   }

Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.

Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
2022-01-12 07:48:36 +01:00
Kubernetes Prow Robot
b5103f6117 Merge pull request #107426 from yanghesong/remove_validate_runtime
Remove runtime in validate
2022-01-11 20:50:36 -08:00
Kubernetes Prow Robot
cadbe8dfb5 Merge pull request #107250 from cndoit18/use-errors
cleanup(kubelet): use errors.Is(err, os.ErrProcessDone)
2022-01-11 10:49:01 -08:00
Kubernetes Prow Robot
19069665f9 Merge pull request #107094 from adisky/d-container-runtime
Mark container-runtime kubelet flag as deprecated
2022-01-11 10:48:46 -08:00
Kubernetes Prow Robot
7eb5046064 Merge pull request #106470 from qmloong/qmloong/fix
fix: some typos and syncPod outdated workflow annotation
2022-01-11 10:48:38 -08:00
Kubernetes Prow Robot
5f4914604d Merge pull request #106353 from gjkim42/remove-false-pleg-errors
kubelet: Remove false PLEG errors
2022-01-11 10:48:26 -08:00
Kubernetes Prow Robot
a0dfd958d5 Merge pull request #107163 from cyclinder/fix_leak_goroutine
fix goroutine leaks in TestConfigurationChannels
2022-01-10 17:23:16 -08:00
cyclinder
928e686877 fix goroutine leaks in TestConfigurationChannels
Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
2022-01-10 19:51:16 +08:00
yanghesong
6905fef761 Remove runtime in validate
Validate is useless as dockershim is removed

Signed-off-by: yanghesong <hesong.yang@foxmail.com>
2022-01-09 09:11:49 +08:00
wq
4f38d4aaa1 fix a typo in the comment of ImageCredentialProviderConfigFile 2022-01-09 00:07:43 +09:00
Kubernetes Prow Robot
d1a5513cb0 Merge pull request #107006 from gnufied/add-total-mount-time-metrics
Add metric for reporting total end-to-end mount time
2022-01-07 06:19:31 -08:00
Kubernetes Prow Robot
09fccc3533 Merge pull request #106796 from jonyhy96/fix-timer
kubelet: use newtimer instead in nodeshutdown manager
2022-01-06 11:47:12 -08:00
Kubernetes Prow Robot
03ee86c09c Merge pull request #104837 from eggiter/fix-release-reused-cpus
fix(cpumanager): Do not release CPUs of init containers while they are being reused in app containers
2022-01-06 11:46:38 -08:00
Kubernetes Prow Robot
0b9ad84973 Merge pull request #107116 from yxxhero/add_more_msg_for_no_podsandbox_container
add more message for no PodSandbox container
2022-01-06 08:58:09 -08:00
Kubernetes Prow Robot
b457ae72f5 Merge pull request #106644 from ahrtr/add_info_counter_perfcounter
Add more info when failing to call PdhAddEnglishCounter
2022-01-06 06:45:01 -08:00
Aditi Sharma
e03d7d3fdd Mark container-runtime flag as deprecated
Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2022-01-06 10:23:03 +05:30
Kubernetes Prow Robot
73b68f5233 Merge pull request #106979 from a2ush/fix_typo
Fix comment out typo (from resolve.conf to resolv.conf) and change the content name (from maxResolveConfLength to maxResolvConfLength)
2022-01-05 16:08:26 -08:00
Kubernetes Prow Robot
afd254a18f Merge pull request #106756 from victory460/feature_helpers
code cleanup for container/helpers.go
2022-01-05 08:20:42 -08:00
Kubernetes Prow Robot
19591a1324 Merge pull request #105829 from yuanchen8911/master
Fix and improve comments on kubelet metrics
2022-01-04 23:02:32 -08:00
Kubernetes Prow Robot
abfbbe4dda Merge pull request #107119 from hakman/remove_dockerless
Remove dockerless build tag and DockerLegacyService interface
2022-01-04 11:27:21 -08:00
cndoit18
601d02b90f refactor(kubelet): use errors.Is(err, os.ErrProcessDone)
use errors.Is(err, os.ErrProcessDone) here and remove "process already finished" string comparison.

Signed-off-by: cndoit18 <cndoit18@outlook.com>
2021-12-29 18:10:06 +08:00
Kubernetes Prow Robot
f0dbc32ed9 Merge pull request #106853 from gnufied/disable-exp-backoff-volume-not-inuse
When volume is not marked in-use, do not backoff
2021-12-22 19:46:37 -08:00
Hemant Kumar
7989f27044 use node informer to check volumes attachment status before backoff
fix unit tests
2021-12-20 11:57:05 -05:00
Ciprian Hacman
5bae9b9288 Clean up DockerLegacyService interface
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2021-12-18 12:24:54 +02:00
Ciprian Hacman
6cdb1c225d Clean up dockerless build tag
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2021-12-18 12:18:25 +02:00
yxxhero
a90b149be0 add more message for no PodSandbox container
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-12-18 09:52:03 +08:00
Davanum Srinivas
497e9c1971 Cleanup OWNERS files (No Activity in the last year)
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-15 10:34:02 -05:00
a2ush
393dec26f6 Change the name of the constant 2021-12-14 22:42:57 +09:00
Hemant Kumar
55b5e6dc33 Add metric for reporting total end-to-end mount time
This metric includes time spent in waiting for devices to be attached,
any RPC calls and performing recursive chown etc.
2021-12-13 16:23:01 -05:00
a2ush
d775483381 Fix comment out typo 2021-12-11 22:27:38 +09:00
Kubernetes Prow Robot
1d66302c42 Merge pull request #106458 from dims/lint-yaml-in-owners-files
Lint/Beautify yaml in OWNERS files
2021-12-10 06:39:12 -08:00
Kubernetes Prow Robot
1b0d83f1d6 Merge pull request #106599 from klueska/fix-numa-bug
Fix Bugs in CPUManager distribute NUMA policy option
2021-12-10 04:41:12 -08:00
haoyun
92fa957dd1 feat: use clock instead
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-10 13:59:12 +08:00
Kubernetes Prow Robot
15e5f2a19a Merge pull request #106291 from sbs2001/fix_invalid_comment
Remove invalid comment in legacyregistry
2021-12-09 19:03:10 -08:00
Davanum Srinivas
9405e9b55e Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
David Porter
95264a418d kubelet: set failed phase during graceful shutdown
Revert to previous behavior in 1.21/1.20 of setting pod phase to failed
during graceful node shutdown.

Setting pods to failed phase will ensure that external controllers that
manage pods like deployments will create new pods to replace those that
are shutdown. Many customers have taken a dependency on this behavior
and it was breaking change in 1.22, so this change reverts back to the
previous behavior.

Signed-off-by: David Porter <david@porter.me>
2021-12-09 13:17:40 -08:00
Kubernetes Prow Robot
cdf3ad823a Merge pull request #97252 from dims/drop-dockershim
Completely remove in-tree dockershim from kubelet
2021-12-08 12:51:46 -08:00
Kubernetes Prow Robot
f356ae4ad9 Merge pull request #101719 from SergeyKanzhelev/removeReallyCrashForTesting
Remove ReallyCrashForTesting and cleaned up some references to Handle…
2021-12-07 23:39:45 -08:00
Kubernetes Prow Robot
b685b3982d Merge pull request #105360 from shuheiktgw/refactor_kubelet_config_validation_tests
Refactor kubelet config validation tests
2021-12-07 17:25:43 -08:00
Davanum Srinivas
bc78dff42e update files to drop dockershim
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-07 15:15:13 -05:00
Davanum Srinivas
83265c9171 drop files deleted from pkg/kubelet/dockershim
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-07 15:15:13 -05:00
Hemant Kumar
5b7b2e2f6c When volume is not marked in-use, do not backoff 2021-12-07 11:50:15 -05:00
Sascha Grunert
a063a2ba3e Revert dockershim CRI v1 changes
We should not touch the dockershim ahead of removal and therefore
default to `v1alpha2` CRI instead of `v1`.

Partially reverts changes from https://github.com/kubernetes/kubernetes/pull/106501

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-12-03 18:37:11 +01:00
xuweiwei
21238c2593 code cleanup for container/helpers.go 2021-12-01 11:17:33 +08:00
Sergey Kanzhelev
a11453efbc remove ReallyCrashForTesting and cleaned up some references to HandleCrash behavior 2021-11-29 20:00:10 +00:00
menglong.qi
12eff56460 fix: syncPod outdated workflow comment 2021-11-28 17:21:29 +08:00
Kevin Klues
f8511877e2 Add regression test for CPUManager distribute NUMA algorithm
We witnessed this exact allocation attempt in a live cluster and witnessed the
algorithm fail with an accounting error. This test was added to verify that
this case is now handled by the updates to the algorithm and that we don't
regress from it in the future.

"test" description="ensure previous failure encountered on live machine has been fixed (1/1)"
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4 6] distribution=9 remainder=1 available=[14 2 4 4 0 3 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4] distribution=9 remainder=1 available=[0 3 4 1 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 6] distribution=9 remainder=1 available=[1 14 2 4 4 0 3 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4 6] distribution=9 remainder=1 available=[1 3 4 0 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2] distribution=9 remainder=1 available=[4 0 3 4 1 14 2 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4] distribution=9 remainder=1 available=[3 4 0 14 2 4 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[6] distribution=9 remainder=1 available=[1 13 2 4 4 1 3 4] balance=3.606
"bestCombo found" distribution=9 bestCombo=[2 4 6] bestRemainder=[6]

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:49:58 +00:00
Kevin Klues
e284c74d93 Add unit test for CPUManager distribute NUMA algorithm verifying fixes
Before Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 1] distribution=8 remainder=2 available=[-1 -1 0 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 2] distribution=8 remainder=2 available=[-1 0 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 3] distribution=8 remainder=2 available=[5 -1 0 0] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 2] distribution=8 remainder=2 available=[0 -1 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 3] distribution=8 remainder=2 available=[0 -1 0 5] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[2 3] distribution=8 remainder=2 available=[0 0 -1 5] balance=2.345
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[0 3]

--- FAIL: TestTakeByTopologyNUMADistributed (0.01s)
    --- FAIL: TestTakeByTopologyNUMADistributed/ensure_bestRemainder_chosen_with_NUMA_nodes_that_have_enough_CPUs_to_satisfy_the_request (0.00s)
        cpu_assignment_test.go:867: unexpected error [accounting error, not enough CPUs allocated, remaining: 1]

After Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[3] distribution=8 remainder=2 available=[0 0 0 4] balance=1.732
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[3]

SUCCESS

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:45:37 +00:00
Kevin Klues
031f11513d Fix accounting bug in CPUManager distribute NUMA policy
Without this fix, the algorithm may decide to allocate "remainder" CPUs from a
NUMA node that has no more CPUs to allocate. Moreover, it was only considering
allocation of remainder CPUs from NUMA nodes such that each NUMA node in the
remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these
two issues in play, one could end up with an accounting error where not enough
CPUs were allocated by the time the algorithm runs to completion.

The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from
the set of NUMA nodes considered for allocating remainder CPUs. Additionally,
we now consider *all* combinations of nodes from the remainder set of size
1..len(remainderSet). This allows us to find a better solution if allocating
CPUs from a smaller set leads to a more balanced allocation. Finally, we loop
through all NUMA nodes 1-by-1 in the remainderSet until all rmeainer CPUs have
been accounted for and allocated. This ensure that we will not hit an
accounting error later on because we explicitly remove CPUs from the remainder
set until there are none left.

A follow-on commit adds a set of unit tests that will fail before these
changes, but succeeds after them.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 19:18:11 +00:00
Kevin Klues
5317a2e2ac Fix error handling in CPUManager distribute NUMA tests
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:31 +00:00