Commit Graph

10171 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
8580bbf7d7
Merge pull request #107594 from hakman/remove_container-runtime_logic
Clean up logic for deprecated flag --container-runtime in kubelet
2022-02-11 12:57:47 -08:00
Kubernetes Prow Robot
e24b5333e5
Merge pull request #108052 from klueska/fix-topology-manager
Fix bug in TopologyManager with merging hints when NUM_NUMA > 2
2022-02-11 07:37:34 -08:00
Jan Safranek
77aa06d0c8 Remove util/selinux package
The package says:

> the libcontainer SELinux package is only built for Linux, so it is
> necessary to have a NOP wrapper which is built for non-Linux platforms

This is not true, Kubernetes now imports
github.com/opencontainers/selinux/go-selinux and it has proper
multiplatform support (i.e. NOOP on non-Linux platforms).

Removing the whole package and calling go-selinux directly.
2022-02-11 15:20:35 +01:00
Kubernetes Prow Robot
7cfe0ca828
Merge pull request #107774 from calvin0327/fix-data-race
fix: data race when hijack klog
2022-02-10 23:32:15 -08:00
Cheng Xing
b152fa9b6c Remove verult from OWNERS files 2022-02-10 18:25:38 -08:00
Kevin Klues
155562dd2e Fix bug in TopologyManager with merging hints when NUM_NUMA > 2
Before this fix, hint permutations such as:

	permutation: [{11 true} {0101 true}]

Could result in merged hints of:

	mergedHint: {01 true}

This was possible because both hints in the permutation contained a "preferred"
allocation (i.e. the full set of NUMA nodes set in the affinity bitmask are
*required* to satisfy the allocation). With this in place, the simplified logic
we had simply kept the merged hint as preferred as well.

However, what we really want is to ensure that the merged hint is only
preferred if *true* alignment of all resources is possible (i.e. if all hints
in the permutation are preferred AND their affinities are exactly equal).

The only exception to this is if *no* topology information is provided by a
given hint provider. In this case, we assume alignment doesn't matter and only
consider the resources that actually have hints provided for them.

This changes the semantics of permutations of the form:

	permutation: [{111 true} {011 true}]

To now result in the merged hint of:

	mergedHint: {011 false}

Instead of:

	mergedHint: {011 true}

This is arguably how it should always have been though (because a hint should
not be preferred if true alignment isn't possible), and two tests have had to
change to accommodate these new semantics.

This commit changes the merge function to implement the updated logic, adds a
test to verify it is functioning correctly, and updates the two tests mentioned
above to adjust to the new semantics.
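
The merge rule described above can be sketched as follows. This is a minimal illustration, not the kubelet's actual code: the `Hint` type, the `mask` helper and `mergeHints` are hypothetical names, affinities are modelled as plain `uint8` bitmasks, and a nil affinity stands for a provider that supplies no topology information.

```go
package main

import "fmt"

// Hint is a simplified, hypothetical stand-in for a topology hint: a NUMA
// affinity bitmask plus a "preferred" flag. A nil Affinity models a hint
// provider that supplies no topology information.
type Hint struct {
	Affinity  *uint8
	Preferred bool
}

// mask is a small helper that returns a pointer to a bitmask literal.
func mask(bits uint8) *uint8 { return &bits }

// mergeHints ANDs all provided affinities together, starting from
// defaultAffinity. The merged hint is preferred only if every hint that
// carries an affinity is preferred AND all of those affinities are equal.
func mergeHints(defaultAffinity uint8, permutation []Hint) (uint8, bool) {
	merged := defaultAffinity
	preferred := true
	var first *uint8
	for _, h := range permutation {
		if h.Affinity == nil {
			continue // no topology info: this resource does not affect alignment
		}
		merged &= *h.Affinity
		if !h.Preferred {
			preferred = false
		}
		if first == nil {
			first = h.Affinity
		} else if *first != *h.Affinity {
			preferred = false // affinities differ, so true alignment is impossible
		}
	}
	return merged, preferred
}

func main() {
	affinity, preferred := mergeHints(0b111, []Hint{
		{mask(0b111), true},
		{mask(0b011), true},
	})
	fmt.Printf("mergedHint: {%03b %v}\n", affinity, preferred)
}
```

Under these assumptions the permutation `[{111 true} {011 true}]` merges to `{011 false}`, matching the new semantics described in the message.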

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-02-10 22:07:51 +00:00
Sascha Grunert
effbcd3a0a
Add support for CRI verbose fields
The remote runtime implementation now supports the `verbose` fields,
which are required for consumers like cri-tools to enable multi CRI
version support.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-02-10 17:12:26 +01:00
Ciprian Hacman
0819451ea6 Clean up logic for deprecated flag --container-runtime in kubelet
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2022-02-10 13:26:59 +02:00
Kubernetes Prow Robot
3b4a9cdfff
Merge pull request #108007 from endocrimes/dani/cm-remove-docker
cm: Remove legacy docker references
2022-02-10 03:23:47 -08:00
Gunju Kim
eb4cd9ab4e
Check taint/toleration before accepting pods, except for static pods 2022-02-10 19:39:26 +09:00
Kubernetes Prow Robot
518a3c2f70
Merge pull request #107108 from linxiulei/fix_pid
Read number of running processes from /proc/loadavg.
2022-02-10 01:15:47 -08:00
Kubernetes Prow Robot
40c2d04946
Merge pull request #107112 from linxiulei/fix_pidmax
Consider threads-max when deciding MaxPID.
2022-02-09 20:49:45 -08:00
Kubernetes Prow Robot
0dcd6eaa0d
Merge pull request #103934 from boenn/tainttoleration
De-duplicate predicate (known as filter now) logic shared in kubelet and scheduler
2022-02-09 16:53:46 -08:00
Kubernetes Prow Robot
8d01b02c60
Merge pull request #107096 from hakman/remove_non-masquerade-cidr
Remove deprecated flag --non-masquerade-cidr in kubelet
2022-02-08 12:42:50 -08:00
Danielle Lancashire
3630328fd9 eviction: Deflake TestStart
TestStart was previously flaky. In approx 100_000 local runs, it failed
about 70% of the time, and has been mentioned as a flaky unit test in
the past.

This flake was due to a race condition with the logic as written and the
Go scheduler. UpdateThreshold calls `notifier.Start(events)` in a new
goroutine, which is not guaranteed to run immediately.

This meant that if `m.Start()` was called before `notifier.Start()`, the
test would fail, as the notifier would not have been started before the
4 events were processed and lock released.

Here, we update the test to more closely match the intended application
behaviour, and have events passed to the channel when `Start` is called
on the notifier.

This ensures that `Start` gets called and additionally validates
that the correct channel is provided to the notifier.

Stop was never called previously, as it only gets called on a subsequent
call to UpdateThreshold. `AnyTimes()` hid that this did not occur.
2022-02-08 17:03:44 +01:00
Danielle Lancashire
c198062da4 cm: Remove legacy docker references
Dockershim and built-in Docker support are gone. Cleans up dead code
references to them.
2022-02-08 16:25:04 +01:00
Jorik Jonker
27b8f13763 kubelet: expose OOM metrics
cAdvisor has code to expose OOM metrics since 0.40.0, but this was not
included in Kubelet so far. This commit enables it.

Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com>
2022-02-08 12:24:25 +01:00
Jordan Liggitt
3a132bd206 Fix kubelet cri round trip test 2022-02-05 17:59:29 -05:00
Kubernetes Prow Robot
469c4c4a30
Merge pull request #106715 from aojea/dual_hostnet_pods
set secondary address on host-network pods
2022-02-04 12:17:30 -08:00
Antonio Ojea
bc8e7ac1a0 ignore CRI PodSandboxNetworkStatus for host network pods 2022-02-04 18:41:57 +01:00
Gunju Kim
3ce5c944a8
kubelet: Clean up a static pod that has been terminated before starting
- Allow a podWorker to start if it is blocked by a pod that has been
  terminated before starting
- When a pod can't start AND has already been terminated, exit cleanly
- Add a unit test that exercises race conditions in pod workers
2022-02-02 16:05:32 -05:00
Clayton Coleman
b638bd8b03 kubelet: If the container status is created, we are waiting
If CRI returns a container that has been created but is not running,
it is not safe to assume it is terminal, as our connection to CRI
may have failed. Instead, created is treated as waiting, as in
"waiting for this container to start". Either syncPod or
syncTerminatingPod is responsible for handling this state.
2022-01-28 18:32:15 -05:00
Jordan Liggitt
1d27942efc Include pod UID in secret/configmap cache key 2022-01-27 22:21:52 -05:00
Kubernetes Prow Robot
4dba52cdf4
Merge pull request #107821 from liggitt/kubelet-secret-manager
Move kubelet secret and configmap manager calls to sync_Pod functions
2022-01-27 08:26:58 -08:00
Jordan Liggitt
085693eff2 Move kubelet secret and configmap manager calls to sync_Pod functions 2022-01-27 10:09:13 -05:00
Kubernetes Prow Robot
8712a903cb
Merge pull request #107608 from marseel/fake_prober_in_kubemark
Use FakeProber in kubemark clusters
2022-01-26 19:42:49 -08:00
Jyoti Mahapatra
0e0abd602f
parse ipv6 address before comparison (#107736)
* parse ipv6 address before comparison

Signed-off-by: Jyoti Mahapatra <jyotima@amazon.com>

* use parse sloppy

Signed-off-by: Jyoti Mahapatra <jyotima@amazon.com>

* use parse sloppy

Signed-off-by: Jyoti Mahapatra <jyotima@amazon.com>

* use node address from cloudprovider as is

Signed-off-by: Jyoti Mahapatra <jyotima@amazon.com>
2022-01-26 18:38:49 -08:00
Marcel Zięba
b4b4b8fd6d Use FakeProber in kubemark clusters 2022-01-26 09:29:04 +00:00
Kubernetes Prow Robot
38e9a29620
Merge pull request #106932 from SergeyKanzhelev/removeDynamicKubeletConfig
Remove dynamic kubelet config
2022-01-25 19:20:25 -08:00
Ryan Phillips
25f95f2bde kubelet: fix podstatus not containing pod full name 2022-01-25 13:21:04 -06:00
calvin
d9ab5e18d3 fix: data race when hijack klog
Signed-off-by: calvin <wen.chen@daocloud.io>
2022-01-24 15:01:49 +08:00
fengzixu
9808ae48a0 change the volume health status metrics name 2022-01-23 02:44:10 +00:00
Jack
7655702313 add container probe duration metrics 2022-01-20 16:50:02 -08:00
yanghesong
4cab028a92 Remove dockershim comments in kubelet
Signed-off-by: yanghesong <hesong.yang@foxmail.com>
2022-01-20 16:15:29 +08:00
Sergey Kanzhelev
7e7bc6d53b remove DynamicKubeletConfig logic from kubelet 2022-01-19 22:38:04 +00:00
Paco Xu
6611c36372 add volume type and separated histogram for volume stat collection 2022-01-19 22:33:37 +08:00
Ciprian Hacman
21809043b5 Remove deprecated flag --non-masquerade-cidr in kubelet
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2022-01-19 09:17:26 +02:00
Kubernetes Prow Robot
feb758027c
Merge pull request #106907 from cyclinder/remove_dockershim_flags
Clean up dockershim flags in the kubelet
2022-01-18 09:09:09 -08:00
Eric Lin
fea15977c8 Consider threads-max when deciding MaxPID.
Fixes kubernetes#107111
2022-01-17 21:51:59 +00:00
Antonio Ojea
a20b2088ac set secondary address on host-network pods
host-network pods IPs are obtained from the reported kubelet nodeIPs.

Historically, host-network podIPs are immutable once set, but when
we added dual-stack support, we didn't consider that the secondary
IP address may not be present at the same time as the primary nodeIP.

If a secondary IP address is added to a node after the host-network pods
IPs are set, we can add the secondary host-network pod IP address
maintaining the current behavior of not updating the current podIPs on
host-network pods.
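
The behaviour described above could look roughly like the sketch below. `addSecondaryHostIP` is a hypothetical helper written for illustration, not the kubelet's function: it assumes pod and node IPs are given as strings, leaves an already-set primary pod IP untouched, and appends the first node IP of the other address family only when the pod currently has exactly one IP.

```go
package main

import (
	"fmt"
	"net"
)

// isIPv6 reports whether ip is an IPv6 address (i.e. has no 4-byte form).
func isIPv6(ip net.IP) bool { return ip.To4() == nil }

// addSecondaryHostIP (hypothetical) never rewrites existing pod IPs. If the
// pod has a single IP and the node reports an address of the other family,
// that address is appended as the secondary pod IP.
func addSecondaryHostIP(podIPs, nodeIPs []string) []string {
	if len(podIPs) != 1 {
		return podIPs // nothing set yet, or already dual-stack
	}
	primary := net.ParseIP(podIPs[0])
	if primary == nil {
		return podIPs
	}
	for _, s := range nodeIPs {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip unparsable node addresses
		}
		if isIPv6(ip) != isIPv6(primary) {
			// Copy before appending so the caller's slice is not modified.
			return append(append([]string{}, podIPs...), ip.String())
		}
	}
	return podIPs
}

func main() {
	podIPs := []string{"10.0.0.1"}
	// The node gained an IPv6 address after the pod IPs were first set.
	fmt.Println(addSecondaryHostIP(podIPs, []string{"10.0.0.1", "fd00::1"}))
}
```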
2022-01-17 18:05:42 +01:00
Paco Xu
e3745a10aa add warning log if volume calculation took longer than 1 second 2022-01-17 10:40:49 +08:00
Kubernetes Prow Robot
22a03f893d
Merge pull request #107207 from ehashman/deprecate-log-sanitization
Deprecate dynamic log sanitization
2022-01-15 15:19:26 -08:00
songlh
50840f5039 change to use require.NoError 2022-01-14 21:46:12 -05:00
cyclinder
07999dac70 Clean up dockershim flags in the kubelet
Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
Co-authored-by: Ciprian Hacman <ciprian@hakman.dev>
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2022-01-14 16:02:50 +02:00
Kubernetes Prow Robot
8c6b910e68
Merge pull request #107550 from wojtek-t/remove_selflink_from_kubelet
Remove no-longer used selflink code from kubelet
2022-01-14 03:28:27 -08:00
Wojciech Tyczyński
6088fe4221 Remove no-longer used selflink code from kubelet 2022-01-14 10:38:23 +01:00
Kubernetes Prow Robot
3bd422dc76
Merge pull request #107293 from dims/jan-1-owners-cleanup
Cleanup OWNERS files - Jan 2021 Week 1
2022-01-13 10:30:30 -08:00
JUN YANG
2247b76ab1 Add test cases of kubelet_pods_test.go.
Signed-off-by: JUN YANG <yang.jun22@zte.com.cn>
2022-01-13 14:37:31 +08:00
Patrick Ohly
9eaa2dc554 avoid klog Info calls without verbosity
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:

   if klog.V(5).Enabled() {
       klog.Info("hello world")
   }

Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.

Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
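
To make the effect visible without depending on klog itself, the sketch below uses a toy model of a verbosity-gated logger. The `recorder` and `verbose` types are hypothetical and are not klog's implementation; they only mimic the behaviour the message describes, where a plain `Info` call is recorded with v=0 while an `Info` call on the value returned by `V(5)` keeps its verbosity. With the real library, the corresponding fix is the `klog.V(5).Info` form named above.

```go
package main

import "fmt"

// recorder is a toy logger that stores JSON-like records so the recorded
// verbosity can be inspected. It is an illustrative model, not klog.
type recorder struct {
	threshold int
	records   []string
}

// verbose models the value returned by V(level).
type verbose struct {
	rec   *recorder
	level int
}

func (r *recorder) emit(level int, msg string) {
	r.records = append(r.records, fmt.Sprintf(`{"v":%d,"msg":%q}`, level, msg))
}

// Info on the recorder itself always logs with verbosity 0.
func (r *recorder) Info(msg string) { r.emit(0, msg) }

// V returns a handle that remembers the requested verbosity level.
func (r *recorder) V(level int) verbose { return verbose{rec: r, level: level} }

func (v verbose) Enabled() bool { return v.level <= v.rec.threshold }

// Info on a verbose handle logs with the handle's own verbosity.
func (v verbose) Info(msg string) {
	if v.Enabled() {
		v.rec.emit(v.level, msg)
	}
}

// demo runs the anti-pattern first, then the recommended pattern.
func demo() []string {
	r := &recorder{threshold: 5}

	// Anti-pattern: the check uses level 5, but the record carries v=0.
	if r.V(5).Enabled() {
		r.Info("hello world")
	}

	// Recommended: capture the handle once and reuse it for every call.
	if v := r.V(5); v.Enabled() {
		v.Info("hello world")
		v.Info("hello again")
	}
	return r.records
}

func main() {
	for _, line := range demo() {
		fmt.Println(line)
	}
}
```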
2022-01-12 07:48:36 +01:00
Kubernetes Prow Robot
b5103f6117
Merge pull request #107426 from yanghesong/remove_validate_runtime
Remove runtime in validate
2022-01-11 20:50:36 -08:00
Eric Lin
5fdf24baca Read number of running processes from /proc/loadavg.
Fall back to the sysinfo syscall if that fails.

Fix kubernetes#107107
2022-01-11 21:33:53 +00:00
Kubernetes Prow Robot
cadbe8dfb5
Merge pull request #107250 from cndoit18/use-errors
cleanup(kubelet): use errors.Is(err, os.ErrProcessDone)
2022-01-11 10:49:01 -08:00
Kubernetes Prow Robot
19069665f9
Merge pull request #107094 from adisky/d-container-runtime
Mark container-runtime kubelet flag as deprecated
2022-01-11 10:48:46 -08:00
Kubernetes Prow Robot
7eb5046064
Merge pull request #106470 from qmloong/qmloong/fix
fix: some typos and syncPod outdated workflow annotation
2022-01-11 10:48:38 -08:00
Kubernetes Prow Robot
5f4914604d
Merge pull request #106353 from gjkim42/remove-false-pleg-errors
kubelet: Remove false PLEG errors
2022-01-11 10:48:26 -08:00
fengzixu
5d544d3f01 fix comment 2022-01-11 14:28:31 +00:00
fengzixu
f96449f2e2 fix unit test 2022-01-11 13:50:18 +00:00
fengzixu
e2b5b5465a improve metrics comment 2022-01-11 13:50:18 +00:00
fengzixu
c1a58d715c fix unit test 2022-01-11 13:50:18 +00:00
fengzixu
5593e27429 improve metrics comment 2022-01-11 13:50:18 +00:00
fengzixu
1cdc694ac2 fix unit test 2022-01-11 13:50:18 +00:00
fengzixu
4a72f08a28 add useful comment for volume stats metrics 2022-01-11 13:50:18 +00:00
fengzixu
b885deffe3 fix unit test 2022-01-11 13:50:17 +00:00
fengzixu
ed7fd0ced5 add volumeHealth label to metrics 2022-01-11 13:50:17 +00:00
fengzixu
bab1755274 fix: correct metrics expression 2022-01-11 13:50:17 +00:00
fengzixu
d71e21e01e add volume kubelet_volume_stats_health_abnormal to kubelet 2022-01-11 13:50:17 +00:00
Dingzhu Lurong
1de2f3cc7d add writer error handler 2022-01-11 11:47:25 +08:00
Kubernetes Prow Robot
a0dfd958d5
Merge pull request #107163 from cyclinder/fix_leak_goroutine
fix goroutine leaks in TestConfigurationChannels
2022-01-10 17:23:16 -08:00
Davanum Srinivas
9682b7248f
OWNERS cleanup - Jan 2021 Week 1
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-01-10 08:14:29 -05:00
cyclinder
928e686877 fix goroutine leaks in TestConfigurationChannels
Signed-off-by: cyclinder <qifeng.guo@daocloud.io>
2022-01-10 19:51:16 +08:00
yanghesong
6905fef761 Remove runtime in validate
Validate is useless as dockershim is removed

Signed-off-by: yanghesong <hesong.yang@foxmail.com>
2022-01-09 09:11:49 +08:00
wq
4f38d4aaa1 fix a typo in the comment of ImageCredentialProviderConfigFile 2022-01-09 00:07:43 +09:00
Kubernetes Prow Robot
d1a5513cb0
Merge pull request #107006 from gnufied/add-total-mount-time-metrics
Add metric for reporting total end-to-end mount time
2022-01-07 06:19:31 -08:00
Kubernetes Prow Robot
09fccc3533
Merge pull request #106796 from jonyhy96/fix-timer
kubelet: use newtimer instead in nodeshutdown manager
2022-01-06 11:47:12 -08:00
Kubernetes Prow Robot
03ee86c09c
Merge pull request #104837 from eggiter/fix-release-reused-cpus
fix(cpumanager): Do not release CPUs of init containers while they are being reused in app containers
2022-01-06 11:46:38 -08:00
Kubernetes Prow Robot
0b9ad84973
Merge pull request #107116 from yxxhero/add_more_msg_for_no_podsandbox_container
add more message for no PodSandbox container
2022-01-06 08:58:09 -08:00
Kubernetes Prow Robot
b457ae72f5
Merge pull request #106644 from ahrtr/add_info_counter_perfcounter
Add more info when failing to call PdhAddEnglishCounter
2022-01-06 06:45:01 -08:00
Aditi Sharma
e03d7d3fdd Mark container-runtime flag as deprecated
Signed-off-by: Aditi Sharma <adi.sky17@gmail.com>
2022-01-06 10:23:03 +05:30
Mengjiao Liu
beda4cafb6 kubelet: Remove the deprecated flag --experimental-check-node-capabilities-before-mount 2022-01-06 11:47:11 +08:00
Kubernetes Prow Robot
73b68f5233
Merge pull request #106979 from a2ush/fix_typo
Fix comment typo (from resolve.conf to resolv.conf) and change the constant name (from maxResolveConfLength to maxResolvConfLength)
2022-01-05 16:08:26 -08:00
Kubernetes Prow Robot
afd254a18f
Merge pull request #106756 from victory460/feature_helpers
code cleanup for container/helpers.go
2022-01-05 08:20:42 -08:00
Kubernetes Prow Robot
19591a1324
Merge pull request #105829 from yuanchen8911/master
Fix and improve comments on kubelet metrics
2022-01-04 23:02:32 -08:00
Kubernetes Prow Robot
abfbbe4dda
Merge pull request #107119 from hakman/remove_dockerless
Remove dockerless build tag and DockerLegacyService interface
2022-01-04 11:27:21 -08:00
Paco Xu
c5d8354e0e add "kubelet_volume_stat_cal_duration_seconds_bucket" VolumeStatCalDuration metrics for fsquota monitoring benchmark 2021-12-31 11:39:40 +08:00
cndoit18
601d02b90f
refactor(kubelet): use errors.Is(err, os.ErrProcessDone)
use errors.Is(err, os.ErrProcessDone) here and remove "process already finished" string comparison.
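
A small sketch of why the sentinel check is more robust than matching on the message text. The helper names (`isDoneByString`, `isDone`, `wrappedDone`, `lookalike`) are made up for this example, and the exact shape of the removed string comparison is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// isDoneByString models the older style: matching on the error text.
// The exact comparison used before the change is assumed here.
func isDoneByString(err error) bool {
	return err != nil && strings.Contains(err.Error(), "process already finished")
}

// isDone uses the sentinel error; errors.Is also sees through wrapping.
func isDone(err error) bool { return errors.Is(err, os.ErrProcessDone) }

// wrappedDone returns os.ErrProcessDone wrapped with extra context.
func wrappedDone() error {
	return fmt.Errorf("signalling pid 42: %w", os.ErrProcessDone)
}

// lookalike returns an unrelated error whose text happens to match.
func lookalike() error {
	return errors.New("backup process already finished")
}

func main() {
	for _, err := range []error{wrappedDone(), lookalike()} {
		fmt.Printf("%q: byString=%v errorsIs=%v\n",
			err.Error(), isDoneByString(err), isDone(err))
	}
}
```

The string check accepts both errors, while `errors.Is` accepts only the one that really wraps `os.ErrProcessDone`.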

Signed-off-by: cndoit18 <cndoit18@outlook.com>
2021-12-29 18:10:06 +08:00
Elana Hashman
dbd50d9f50
Remove dynamic log sanitization fields from Kubelet config validation 2021-12-23 13:03:13 -08:00
Kubernetes Prow Robot
f0dbc32ed9
Merge pull request #106853 from gnufied/disable-exp-backoff-volume-not-inuse
When volume is not marked in-use, do not backoff
2021-12-22 19:46:37 -08:00
Hemant Kumar
7989f27044 use node informer to check volumes attachment status before backoff
fix unit tests
2021-12-20 11:57:05 -05:00
songlh
e03a0bc105 fixing the panic in TestVersion 2021-12-18 19:20:15 -05:00
Ciprian Hacman
5bae9b9288 Clean up DockerLegacyService interface
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2021-12-18 12:24:54 +02:00
Ciprian Hacman
6cdb1c225d Clean up dockerless build tag
Signed-off-by: Ciprian Hacman <ciprian@hakman.dev>
2021-12-18 12:18:25 +02:00
yxxhero
a90b149be0 add more message for no PodSandbox container
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-12-18 09:52:03 +08:00
Davanum Srinivas
497e9c1971
Cleanup OWNERS files (No Activity in the last year)
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-15 10:34:02 -05:00
a2ush
393dec26f6 Change the name of the constant 2021-12-14 22:42:57 +09:00
Hemant Kumar
55b5e6dc33 Add metric for reporting total end-to-end mount time
This metric includes time spent in waiting for devices to be attached,
any RPC calls and performing recursive chown etc.
2021-12-13 16:23:01 -05:00
a2ush
d775483381 Fix comment typo 2021-12-11 22:27:38 +09:00
Kubernetes Prow Robot
1d66302c42
Merge pull request #106458 from dims/lint-yaml-in-owners-files
Lint/Beautify yaml in OWNERS files
2021-12-10 06:39:12 -08:00
Kubernetes Prow Robot
1b0d83f1d6
Merge pull request #106599 from klueska/fix-numa-bug
Fix Bugs in CPUManager distribute NUMA policy option
2021-12-10 04:41:12 -08:00
haoyun
92fa957dd1 feat: use clock instead
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-12-10 13:59:12 +08:00
Kubernetes Prow Robot
15e5f2a19a
Merge pull request #106291 from sbs2001/fix_invalid_comment
Remove invalid comment in legacyregistry
2021-12-09 19:03:10 -08:00
Davanum Srinivas
9405e9b55e
Check in OWNERS modified by update-yamlfmt.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-09 21:31:26 -05:00
David Porter
95264a418d kubelet: set failed phase during graceful shutdown
Revert to previous behavior in 1.21/1.20 of setting pod phase to failed
during graceful node shutdown.

Setting pods to failed phase will ensure that external controllers that
manage pods like deployments will create new pods to replace those that
are shut down. Many customers have taken a dependency on this behavior
and it was a breaking change in 1.22, so this change reverts back to the
previous behavior.

Signed-off-by: David Porter <david@porter.me>
2021-12-09 13:17:40 -08:00
Kubernetes Prow Robot
cdf3ad823a
Merge pull request #97252 from dims/drop-dockershim
Completely remove in-tree dockershim from kubelet
2021-12-08 12:51:46 -08:00
Kubernetes Prow Robot
f356ae4ad9
Merge pull request #101719 from SergeyKanzhelev/removeReallyCrashForTesting
Remove ReallyCrashForTesting and cleaned up some references to Handle…
2021-12-07 23:39:45 -08:00
caozhiyuan
1a59bcb142 add validation test for RegisterWithTaints 2021-12-08 10:36:43 +08:00
Kubernetes Prow Robot
b685b3982d
Merge pull request #105360 from shuheiktgw/refactor_kubelet_config_validation_tests
Refactor kubelet config validation tests
2021-12-07 17:25:43 -08:00
Davanum Srinivas
bc78dff42e
update files to drop dockershim
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-07 15:15:13 -05:00
Davanum Srinivas
83265c9171
drop files deleted from pkg/kubelet/dockershim
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2021-12-07 15:15:13 -05:00
Hemant Kumar
5b7b2e2f6c When volume is not marked in-use, do not backoff 2021-12-07 11:50:15 -05:00
Sascha Grunert
a063a2ba3e
Revert dockershim CRI v1 changes
We should not touch the dockershim ahead of removal and therefore
default to `v1alpha2` CRI instead of `v1`.

Partially reverts changes from https://github.com/kubernetes/kubernetes/pull/106501

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-12-03 18:37:11 +01:00
xuweiwei
21238c2593 code cleanup for container/helpers.go 2021-12-01 11:17:33 +08:00
Sergey Kanzhelev
a11453efbc remove ReallyCrashForTesting and cleaned up some references to HandleCrash behavior 2021-11-29 20:00:10 +00:00
menglong.qi
12eff56460 fix: syncPod outdated workflow comment 2021-11-28 17:21:29 +08:00
boenn
cec2aae1e5 rebase master 2021-11-25 11:21:12 +08:00
Kevin Klues
f8511877e2 Add regression test for CPUManager distribute NUMA algorithm
We witnessed this exact allocation attempt in a live cluster and witnessed the
algorithm fail with an accounting error. This test was added to verify that
this case is now handled by the updates to the algorithm and that we don't
regress from it in the future.

"test" description="ensure previous failure encountered on live machine has been fixed (1/1)"
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4 6] distribution=9 remainder=1 available=[14 2 4 4 0 3 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 4] distribution=9 remainder=1 available=[0 3 4 1 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2 6] distribution=9 remainder=1 available=[1 14 2 4 4 0 3 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4 6] distribution=9 remainder=1 available=[1 3 4 0 14 2 4 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[2] distribution=9 remainder=1 available=[4 0 3 4 1 14 2 4] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[4] distribution=9 remainder=1 available=[3 4 0 14 2 4 4 1] balance=4.031
"combo remainderSet balance" combo=[2 4 6] remainderSet=[6] distribution=9 remainder=1 available=[1 13 2 4 4 1 3 4] balance=3.606
"bestCombo found" distribution=9 bestCombo=[2 4 6] bestRemainder=[6]

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:49:58 +00:00
Kevin Klues
e284c74d93 Add unit test for CPUManager distribute NUMA algorithm verifying fixes
Before Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 1] distribution=8 remainder=2 available=[-1 -1 0 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 2] distribution=8 remainder=2 available=[-1 0 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[0 3] distribution=8 remainder=2 available=[5 -1 0 0] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 2] distribution=8 remainder=2 available=[0 -1 -1 6] balance=2.915
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[1 3] distribution=8 remainder=2 available=[0 -1 0 5] balance=2.345
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[2 3] distribution=8 remainder=2 available=[0 0 -1 5] balance=2.345
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[0 3]

--- FAIL: TestTakeByTopologyNUMADistributed (0.01s)
    --- FAIL: TestTakeByTopologyNUMADistributed/ensure_bestRemainder_chosen_with_NUMA_nodes_that_have_enough_CPUs_to_satisfy_the_request (0.00s)
        cpu_assignment_test.go:867: unexpected error [accounting error, not enough CPUs allocated, remaining: 1]

After Change:
"test" description="ensure bestRemainder chosen with NUMA nodes that have enough CPUs to satisfy the request"
"combo remainderSet balance" combo=[0 1 2 3] remainderSet=[3] distribution=8 remainder=2 available=[0 0 0 4] balance=1.732
"bestCombo found" distribution=8 bestCombo=[0 1 2 3] bestRemainder=[3]

SUCCESS

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 20:45:37 +00:00
Kevin Klues
031f11513d Fix accounting bug in CPUManager distribute NUMA policy
Without this fix, the algorithm may decide to allocate "remainder" CPUs from a
NUMA node that has no more CPUs to allocate. Moreover, it was only considering
allocation of remainder CPUs from NUMA nodes such that each NUMA node in the
remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these
two issues in play, one could end up with an accounting error where not enough
CPUs were allocated by the time the algorithm runs to completion.

The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from
the set of NUMA nodes considered for allocating remainder CPUs. Additionally,
we now consider *all* combinations of nodes from the remainder set of size
1..len(remainderSet). This allows us to find a better solution if allocating
CPUs from a smaller set leads to a more balanced allocation. Finally, we loop
through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs have
been accounted for and allocated. This ensures that we will not hit an
accounting error later on because we explicitly remove CPUs from the remainder
set until there are none left.

A follow-on commit adds a set of unit tests that will fail before these
changes, but succeed after them.
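
The selection of the remainder set can be sketched as below. This is a simplified illustration under stated assumptions, not the CPUManager code: `bestRemainderSet` is a hypothetical function, `cpuGroupSize` is taken to be 1, `available` holds the CPUs left on each NUMA node before the remainder is handed out, and "balance" is assumed to be the population standard deviation of what is left afterwards, rounded to three decimals.

```go
package main

import (
	"fmt"
	"math"
)

// stddev returns the population standard deviation, rounded to 3 decimals.
func stddev(xs []int) float64 {
	if len(xs) == 0 {
		return 0
	}
	sum := 0
	for _, x := range xs {
		sum += x
	}
	mean := float64(sum) / float64(len(xs))
	var sq float64
	for _, x := range xs {
		d := float64(x) - mean
		sq += d * d
	}
	return math.Round(math.Sqrt(sq/float64(len(xs)))*1000) / 1000
}

// bestRemainderSet (hypothetical) tries every non-empty combination of NUMA
// nodes that still have CPUs, hands out the remainder one CPU at a time
// across the combination, and keeps the combination with the lowest balance.
func bestRemainderSet(available []int, remainder int) ([]int, float64) {
	var candidates []int
	for node, n := range available {
		if n > 0 { // nodes with 0 CPUs left are omitted up front
			candidates = append(candidates, node)
		}
	}
	var best []int
	bestBalance := math.Inf(1)
	for mask := 1; mask < 1<<len(candidates); mask++ {
		var subset []int
		for i, node := range candidates {
			if mask&(1<<i) != 0 {
				subset = append(subset, node)
			}
		}
		left := append([]int(nil), available...)
		rem := remainder
		for rem > 0 {
			progressed := false
			for _, node := range subset {
				if rem > 0 && left[node] > 0 {
					left[node]--
					rem--
					progressed = true
				}
			}
			if !progressed {
				break // this combination cannot absorb the whole remainder
			}
		}
		if rem > 0 {
			continue
		}
		if balance := stddev(left); balance < bestBalance {
			best, bestBalance = subset, balance
		}
	}
	return best, bestBalance
}

func main() {
	set, balance := bestRemainderSet([]int{0, 0, 0, 6}, 2)
	fmt.Println("bestRemainder:", set, "balance:", balance)
}
```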

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 19:18:11 +00:00
Kevin Klues
5317a2e2ac Fix error handling in CPUManager distribute NUMA tests
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:31 +00:00
Kevin Klues
dc4430b663 Add a sum() helper to the CPUManager cpuassignment logic
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:29 +00:00
Kevin Klues
cfacc22459 Allow the map.Values() function in the CPUManager to take a set of keys
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:28 +00:00
Kevin Klues
a160d9a8cd Fix CPUManager algo to calculate min NUMA nodes needed for distribution
Previously the algorithm was too restrictive because it tried to calculate the
minimum based on the number of *available* NUMA nodes and the number of
*available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how
many CPUs an individual NUMA node happened to have, the average across them was
used. Using this value however, could result in thinking you need more NUMA
nodes to possibly satisfy a request than you actually do.

By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get
the true minimum number of nodes required to satisfy a request. For a given
"current" allocation this may not be the true minimum, but it's better to start
with fewer and move up than to start with too many and miss out on a better
option.
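
The difference between the two estimates can be illustrated with a few lines of arithmetic. Both functions below are hypothetical sketches: `minNodesFromTotals` divides the request by the full per-node capacity, while `minNodesFromAvailable` models the older, more restrictive estimate by dividing the request by the average number of available CPUs per node. The input numbers are made up for the example.

```go
package main

import "fmt"

// ceilDiv returns ceil(a / b) for positive integers.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// minNodesFromTotals is the smallest number of NUMA nodes that could hold
// the request if every node were completely free.
func minNodesFromTotals(request, cpusPerNode int) int {
	return ceilDiv(request, cpusPerNode)
}

// minNodesFromAvailable models the earlier estimate: request divided by the
// average number of CPUs currently available per node. It returns 0 when no
// CPUs are available at all (no estimate possible).
func minNodesFromAvailable(request int, available []int) int {
	total := 0
	for _, n := range available {
		total += n
	}
	if total == 0 {
		return 0
	}
	// ceil(request / (total / len)) == ceil(request * len / total)
	return ceilDiv(request*len(available), total)
}

func main() {
	available := []int{16, 20, 20, 4} // 4 NUMA nodes, 20 CPUs each when free
	fmt.Println("from averages:", minNodesFromAvailable(48, available))
	fmt.Println("from totals:  ", minNodesFromTotals(48, 20))
}
```

With these made-up numbers the average-based estimate asks for 4 nodes, although nodes 0, 1 and 2 together already offer 56 CPUs, so starting the search at 3 nodes finds a valid allocation.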

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:26 +00:00
Kevin Klues
209cd20548 Fix unit tests following bug fix in CPUManager for map functions (2/2)
Now that the algorithm for balancing CPU distributions across NUMA nodes is
correct, this test actually behaves differently for the "packed" vs.
"distributed" allocation algorithms (as it should).

In the "packed" case we need to ensure that CPUs are allocated such that they
are packed onto cores. Since one CPU is already allocated from a core on NUMA
node 0, we want the next CPU to be its hyperthreaded pair (even though the
first available CPU id is on Socket 1).

In the "distributed" case, however, we want to ensure CPUs are allocated such
that we have a balanced distribution of CPUs across all NUMA nodes. This
points to allocating from Socket 1 if the only other CPU allocated has been
done on Socket 0.

To allow CPU allocations to be packed onto full cores, one can allocate them
from the "distributed" algorithm with a 'cpuGroupSize' equal to the number of
hyperthreads per core (in this case 2). We added an explicit test case for this,
demonstrating that we get the same result as the "packed" algorithm does, even
though the "distributed" algorithm is in use.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:24 +00:00
Kevin Klues
67f719cb1d Fix unit tests following bug fix in CPUManager for map functions (1/2)
This fixes two related tests to better test our "balanced" distribution algorithm.

The first test originally provided an input with the following number of CPUs
available on each NUMA node:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 20

It then attempted to distribute 48 CPUs across them with an expectation that
each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node
0 with no more CPUs in the end).

This would have resulted in the following amount of CPUs on each node:

Node 0: 0
Node 1: 4
Node 2: 4
Node 3: 20

Which results in a standard deviation of 7.6811

However, a more balanced solution would actually be to pull 16 CPUs from NUMA
nodes 1, 2, and 3, and leave 0 untouched, i.e.:

Node 0: 16
Node 1: 4
Node 2: 4
Node 3: 4

Which results in a standard deviation of 5.1961524227066

To fix this test we changed the original number of available CPUs to start with
4 fewer CPUs on NUMA node 3, and 2 more CPUs on NUMA node 0, i.e.:

Node 0: 18
Node 1: 20
Node 2: 20
Node 3: 16

So that we end up with a result of:

Node 0: 2
Node 1: 4
Node 2: 4
Node 3: 16

Which pulls the CPUs from where we want and results in a standard deviation of 5.5452

For the second test, we simply reverse the number of CPUs available for Nodes 0
and 3 as:

Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 18

Which forces the allocation to happen just as it did for the first test, except
now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0, 1, and 2.
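
The standard deviations quoted above can be reproduced with a short calculation. The sketch assumes the population standard deviation (dividing by the number of nodes) and rounds to three decimals; the function name `stddev` is chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// stddev returns the population standard deviation of the per-node CPU
// counts, rounded to the nearest thousandth.
func stddev(cpus []int) float64 {
	if len(cpus) == 0 {
		return 0
	}
	sum := 0
	for _, c := range cpus {
		sum += c
	}
	mean := float64(sum) / float64(len(cpus))
	var squares float64
	for _, c := range cpus {
		d := float64(c) - mean
		squares += d * d
	}
	return math.Round(math.Sqrt(squares/float64(len(cpus)))*1000) / 1000
}

func main() {
	cases := [][]int{
		{0, 4, 4, 20}, // original expectation of the first test
		{16, 4, 4, 4}, // more balanced alternative
		{2, 4, 4, 16}, // result after adjusting the test input
	}
	for _, remaining := range cases {
		fmt.Println(remaining, stddev(remaining))
	}
}
```

The three cases come out at roughly 7.681, 5.196 and 5.545, in line with the figures in the message.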

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:23 +00:00
Kevin Klues
4008ea0b4c Fix bug in CPUManager map.Keys() and map.Values() implementations
Previously these would return lists that were too long because we appended to
pre-initialized lists with a specific size.

Since the primary place these functions are used is in the mean and standard
deviation calculations for the NUMA distribution algorithm, it meant that the
results of these calculations were often incorrect.

As a result, some of the unit tests we have are actually incorrect (because the
results we expect do not actually produce the best balanced
distribution of CPUs across all NUMA nodes for the input provided).

These tests will be patched up in subsequent commits.
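
The class of bug described here is easy to reproduce in Go. In the sketch below, `buggyKeys` and `fixedKeys` are illustrative names and the fix shown is just one possible form: creating a slice with a non-zero length and then appending to it leaves the initial zero values in place, so the result is twice as long as intended.

```go
package main

import (
	"fmt"
	"sort"
)

// buggyKeys pre-sizes the slice with a LENGTH of len(m) and then appends,
// so the result starts with len(m) zero values followed by the real keys.
func buggyKeys(m map[int]int) []int {
	keys := make([]int, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	return keys
}

// fixedKeys uses length 0 with a CAPACITY of len(m), so append fills the
// slice from the start. Keys are sorted only to make the output stable.
func fixedKeys(m map[int]int) []int {
	keys := make([]int, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

func main() {
	cpusPerNode := map[int]int{0: 16, 1: 20, 2: 20, 3: 20}
	fmt.Println("buggy length:", len(buggyKeys(cpusPerNode)))
	fmt.Println("fixed keys:  ", fixedKeys(cpusPerNode))
}
```

The extra zero entries are what skew any mean or standard deviation computed over the returned list.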

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:21 +00:00
Kevin Klues
446c58e0e7 Ensure we balance across *all* NUMA nodes in NUMA distribution algo
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:19 +00:00
Kevin Klues
c8559bc43e Short-circuit CPUManager distribute NUMA algo for unusable cpuGroupSize
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:16 +00:00
Kevin Klues
b28c1392d7 Round the CPUManager mean and stddev calculations to the nearest 1000th
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-11-24 16:51:13 +00:00
ahrtr
b7f22801fe add more info when failing to call PdhAddEnglishCounter 2021-11-24 13:49:34 +08:00
Kubernetes Prow Robot
ddfc53922c
Merge pull request #106414 from jonyhy96/kubelet-fix-flake
kubelet: fix npe in test
2021-11-19 07:06:51 -08:00
haoyun
65ac99eef5 fix: npe in kubelet test
Signed-off-by: haoyun <yun.hao@daocloud.io>
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
2021-11-19 17:44:05 +08:00
shuheiktgw
2acdaeb361 Refactor Kubelet config validation tests 2021-11-18 22:38:01 +09:00
shuheiktgw
35ad91ab37 Refactor Kubelet config validations 2021-11-18 22:31:31 +09:00
Shivam Sandbhor
6652c54d83 Remove invalid comment in legacyregistry
Signed-off-by: Shivam Sandbhor <shivam.sandbhor@gmail.com>
2021-11-18 15:05:00 +05:30
Kubernetes Prow Robot
d766ab88f7
Merge pull request #106501 from ehashman/cri-graduation-v1
Make CRI v1 the default and allow a fallback to v1alpha2
2021-11-17 19:57:01 -08:00
Kubernetes Prow Robot
91b7fb4dc9
Merge pull request #102915 from wzshiming/feat/graceful-shutdown-based-on-pod-priority
Graceful Node Shutdown Based On Pod Priority
2021-11-17 18:45:03 -08:00
Kubernetes Prow Robot
321e22d365
Merge pull request #106505 from ehashman/revert-103980-dkc-metrics
Revert "Bump DynamicKubeConfig metric deprecation to 1.23"
2021-11-17 16:55:03 -08:00
Kubernetes Prow Robot
e4952f32b7
Merge pull request #106463 from SergeyKanzhelev/grpcProbe
Implement grpc probe action
2021-11-17 12:43:54 -08:00
Elana Hashman
b35c500541
Revert "Bump DynamicKubeConfig metric deprecation to 1.23" 2021-11-17 11:48:49 -08:00
Elana Hashman
31c4273f66
Add test for memory equivalence
See https://github.com/kubernetes/kubernetes/pull/106006#issuecomment-971004230

Co-Authored-By: Jordan Liggitt <liggitt@google.com>
2021-11-17 11:07:09 -08:00
Sascha Grunert
de37b9d293
Make CRI v1 the default and allow a fallback to v1alpha2
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. The kubelet determines automatically whether to use this
fallback.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-11-17 11:05:05 -08:00
Sergey Kanzhelev
b7affcced1 implement :grpc probe action 2021-11-17 17:31:23 +00:00
Antonio Ojea
d126b14838 migrate nolint comments to golangci-lint 2021-11-17 13:58:53 +01:00
Hanna Lee
e78b3e8dfe Use nolint directive instead of stopping ticker, per liggit's suggestion 2021-11-17 08:56:57 +01:00
Hanna Lee
69d029bddb Add syncTicker.Stop() 2021-11-17 08:56:57 +01:00
Hanna Lee
07a883d8e6 Remove //lint:ignore pragmas that aren't being used anymore 2021-11-17 08:56:54 +01:00
Hanna Lee
1fbf06f5ad Use time.NewTicker instead of time.Tick to avoid leaking 2021-11-17 08:56:00 +01:00
Hanna Lee
0f3836dcc5 Ignore deprecation warnings with //nolint:staticcheck 2021-11-17 08:55:57 +01:00
Kubernetes Prow Robot
6c357f9996
Merge pull request #106041 from jonyhy96/volumemanager-reconciler-codefmt
kubelet: extract multiple ignore errors validate logic to isExpectedError
2021-11-16 22:55:53 -08:00
Shiming Zhang
7a6f792ff3 Add validation for GracefulNodeShutdownBasedOnPodPriority
Co-authored-by: Elana Hashman <ehashman@users.noreply.github.com>
2021-11-17 11:47:12 +08:00
Shiming Zhang
545313bdc7 Implement graceful shutdown based on Pod priority 2021-11-17 11:47:12 +08:00
Shiming Zhang
d82f606970 Add field for KubeletConfiguration and Regenerate 2021-11-17 11:47:12 +08:00
Kubernetes Prow Robot
1f6d5caa9a
Merge pull request #105437 from cmssczy/update-kubelet-configuration
migrate --register-with-taints to KubeletConfiguration
2021-11-16 17:44:00 -08:00
menglong.qi
b886b9b108 fix: typo 2021-11-17 09:22:57 +08:00
Kubernetes Prow Robot
42d8b2f3b9
Merge pull request #106289 from CatherineF-dev/fix-metrics-AlreadyRegisteredError-in-unit-test
Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test
2021-11-16 16:36:15 -08:00
Kubernetes Prow Robot
6805e6ee41
Merge pull request #104722 from leiyiz/migration
turning on the CSIMigrationGCE feature flag
2021-11-16 15:28:32 -08:00
Léiyì Zhang
275fdf0884 fixing unit test failures induced by turning on CSIMigrationGCE
disable CSIMigrationGCE in some unit tests
2021-11-16 19:26:30 +00:00
CatherineF-dev
5646120fbb Use Reset at first 2021-11-16 18:57:24 +00:00
haoyun
b5409adaeb refactor: extract multiple ignore errors validate to ignoreError
Signed-off-by: haoyun <yun.hao@daocloud.io>
2021-11-16 20:43:50 +08:00
caozhiyuan
bad4faf1b9 migrate --register-with-taints to KubeletConfiguration 2021-11-16 19:10:36 +08:00
Kubernetes Prow Robot
1d1d462d2f
Merge pull request #104287 from jsturtevant/windows-stats
Reduce the number of expensive calls in the Windows stats queries for dockershim
2021-11-15 18:51:37 -08:00
Kubernetes Prow Robot
0473cab823
Merge pull request #103299 from wgahnagl/addPinned
prevents garbage collection from removing pinned images
2021-11-15 18:51:25 -08:00
Kubernetes Prow Robot
39af75af30
Merge pull request #106201 from yxxhero/fea_106111
Add more msg when exec probe timeout
2021-11-15 17:51:37 -08:00
Kubernetes Prow Robot
463802765d
Merge pull request #104650 from yxxhero/initcontainer_oomkiil_as_a_failure
fix init container oomkilled as a failure
2021-11-15 17:51:25 -08:00
Kubernetes Prow Robot
b7c4962472
Merge pull request #105685 from liggitt/kubelet-file-test
Simplify kubelet file config field allowlists
2021-11-15 14:06:48 -08:00
Odin Ugedal
de0ece541c
Fix cpu share issues on systems with large amounts of cpu
On systems where the calculated cpu shares result in a value above the
max value in Linux, containers getting that value are unable to start.
This occurs on systems with 300+ cpu cores, where containers are
given such a value.

This issue was fixed for the pod and qos control groups in the similar
cm.MilliCPUToShares, which also has tests verifying the behavior. Since
this code already has a dependency on kubelet/cm, let's reuse that code
instead.
2021-11-14 19:49:19 +00:00
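The clamping that de0ece541c relies on can be sketched as follows. This is an illustrative re-implementation modeled on the idea behind cm.MilliCPUToShares, not a copy of it; the function name and the constant values (1024 shares per CPU, limits of 2 and 262144) are assumptions based on the cgroup v1 `cpu.shares` range.

```go
package main

import "fmt"

const (
	minShares     = 2      // assumed lower bound for cpu.shares
	maxShares     = 262144 // assumed upper bound for cpu.shares
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

// milliCPUToShares converts a milli-CPU value to cpu.shares and clamps the
// result into [minShares, maxShares] so the kernel never rejects it.
func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}

func main() {
	fmt.Println(milliCPUToShares(500))    // 512
	fmt.Println(milliCPUToShares(256000)) // 262144 (exactly at the limit)
	fmt.Println(milliCPUToShares(300000)) // 262144 (307200 before clamping)
}
```

Without the upper clamp, a request for 300 cores would yield 307200 shares, which is the out-of-range value the commit message describes.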
Kubernetes Prow Robot
e4c795168b
Merge pull request #106332 from bobbypage/disable-memcg-notifier
kubelet: cgroupv2 disable memcg notifications
2021-11-12 18:36:46 -08:00
CatherineF-dev
d9737eabf4 Use HandlerFor 2021-11-12 23:09:51 +00:00
CatherineF-dev
49d341aa2b Use defer in non-loop 2021-11-12 23:03:38 +00:00
Kubernetes Prow Robot
1f6aa87a93
Merge pull request #105744 from jsturtevant/windows-containerd-networkstats
Get Windows network stats directly for Containerd
2021-11-12 12:36:41 -08:00
Kubernetes Prow Robot
5f0a94b23c
Merge pull request #104743 from gjkim42/ensure-pod-uniqueness
Ensure there is one running static pod with the same full name
2021-11-12 12:36:28 -08:00
Kubernetes Prow Robot
6c04f87470
Merge pull request #106382 from rphillips/fix_close_log
kubelet: fix file descriptor leak in log rotations
2021-11-12 09:22:40 -08:00
Neha Lohia
fa1b6765d5
move pkg/util/node to component-helpers/node/util (#105347)
Signed-off-by: Neha Lohia <nehapithadiya444@gmail.com>
2021-11-12 07:52:27 -08:00
CatherineF-dev
a30af261f1 remove lint 2021-11-12 15:03:44 +00:00
Ryan Phillips
d6f9df424a defer close the rotated log open 2021-11-12 08:13:24 -06:00
CatherineF-dev
a8324a3bb7 clean 2021-11-12 03:52:19 +00:00
CatherineF-dev
744785ee40 remove prometheus.DefaultRegisterer 2021-11-12 02:17:28 +00:00
Kubernetes Prow Robot
3ca3daac76
Merge pull request #103415 from tiloso/staticcheck-kubelet
Fix staticcheck failure in pkg/kubelet/cm/cpuset
2021-11-11 15:15:13 -08:00
Gunju Kim
2dd4a00509
kubelet: Remove false PLEG errors 2021-11-12 00:03:01 +09:00
David Porter
f5140d3145 kubelet: cgroupv2 disable memcg notifications
The current memory notifier on cgroupv2 relies on reading
`cgroup.event_control` which is unsupported on cgroupv2. For now, let's
disable the feature on cgroupv2.
2021-11-10 15:40:59 -08:00
ravisantoshgudimetla
696abecada [test][kubelet]: Fix out of bounds in TestSyncLabels unit 2021-11-10 16:53:59 -05:00
James Sturtevant
ab2e58c416 Get networks stats directly 2021-11-10 12:43:56 -08:00
James Sturtevant
c39945c116 Add unit tests to existing code 2021-11-10 11:50:04 -08:00
James Sturtevant
3564cd5beb Reduce calls to docker from dockershim for stats 2021-11-10 11:25:03 -08:00
Kubernetes Prow Robot
b56dc43458
Merge pull request #106282 from bobbypage/cadvisor-v043
vendor: Bump cAdvisor to v0.43.0
2021-11-10 08:17:38 -08:00
CatherineF-dev
8290400e9c format 2021-11-10 03:29:13 +00:00
CatherineF-dev
ef0b2dfbf4 Fix metrics AlreadyRegisteredError on TestRecordOperation and TestGetHistogramVecFromGatherer unit test 2021-11-10 03:23:54 +00:00
Kubernetes Prow Robot
5d60c8d857
Merge pull request #102393 from mengjiao-liu/fix-sysctl-regex
Upgrade preparation to verify sysctl values containing forward slashes by regex
2021-11-09 18:23:26 -08:00
David Porter
b6269ce5de kubelet: update cAdvisor usage for v0.43
* Change cAdvisor manager constructor
* Change call to adding AcceleratorUsageMetrics

Signed-off-by: David Porter <david@porter.me>
2021-11-09 17:09:12 -08:00
Kubernetes Prow Robot
6ac2d8edc8
Merge pull request #105967 from shivanshu1333/feature2/master/105841
Migrated scheduler files `preemption.go`, `stateful.go`, `resource_allocation.go` to structured logging
2021-11-09 10:28:01 -08:00
ravisantoshgudimetla
889d45d3fb [kubelet] Reject pods with OS field mismatch
Once kubernetes#104613 and kubernetes#104693
merge, we'll have an OS field in the pod spec. Kubelet should start rejecting pods
where pod.Spec.OS and the node's OS (from runtime.GOOS) do not match.
2021-11-08 19:18:15 -05:00
Kubernetes Prow Robot
cda360c59f
Merge pull request #104613 from ravisantoshgudimetla/reconcile-labels
[kubelet]: Reconcile OS and arch labels periodically
2021-11-08 14:15:19 -08:00
Kubernetes Prow Robot
8b463cd141
Merge pull request #105406 from marosset/kubelet-metrics-for-host-process-containers
Adding kubelet metrics for started and failed to start HostProcess containers
2021-11-08 13:11:20 -08:00
Shivanshu Raj Shrivastava
f4aad52885
migrated preemption.go, stateful.go, resource_allocation.go to structured logging 2021-11-08 22:52:47 +05:30
Kubernetes Prow Robot
33de444861
Merge pull request #103095 from haircommander/podAndContainerStatsFromCRI-feature-gate
Kubelet: implement support for podAndContainerStatsFromCRI
2021-11-07 18:26:53 -08:00
yxxhero
4211826c3c add more msg when exec probe timeout
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-11-06 15:59:22 +08:00
ravisantoshgudimetla
21c5c2ec5c [kubelet][podadmission]: Validate and reject pods with mismatching labels 2021-11-05 18:47:43 -04:00
ravisantoshgudimetla
02c1bac0b6 [kubelet]: Sync label periodically 2021-11-05 18:47:43 -04:00
Mark Rossetti
ef324d6bbd Adding kubelet metrics for started and failed to start HostProcess containers
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-11-04 14:39:57 -07:00
Andy Pan
3033a64135 kubelet/eviction: eliminate redundant allocations when handling eventfd 2021-11-04 15:41:46 +08:00
Mengjiao Liu
275d832ce2 Upgrade preparation to verify sysctl values containing forward slashes by regex 2021-11-04 11:49:56 +08:00
Skyler Clark
e9766c2b81
adds pinned field to imageRecords 2021-11-03 14:47:37 -04:00
Patrick Ohly
3948cb8d1b component-base: move v/vmodule/log-flush-frequency into LoggingConfiguration
These three options are the ones from logs.AddFlags which are not deprecated.
Therefore it makes sense to make them available also via the configuration file
support in the one command which currently supports that (kubelet).

Long-term, all commands should use LoggingConfiguration, either with a
configuration file (as in kubelet) or via flags (kube-scheduler,
kube-apiserver, kube-controller-manager).

Short-term, both approaches have to be supported. As the majority of the
commands only use logs.AddFlags, that function by default continues to register
the flags and only leaves that to Options.AddFlags when explicitly requested.

A drive-by bug fix is done for log flushing: the periodic flushing called
klog.Flush and therefore missed explicit flushing of the newer logr
backend. This bug was never present in any Kubernetes release and therefore the
fix is not submitted in a separate PR.
2021-11-03 07:41:46 +01:00
Kubernetes Prow Robot
aa0ea62489
Merge pull request #104903 from ikeeip/storageobjectinuseprotection_feature_ga_cleanup
Remove StorageObjectInUseProtection feature gate logic
2021-11-02 20:22:57 -07:00
Kubernetes Prow Robot
359b722c19
Merge pull request #102882 from fromanirh/device-manager-checkpoints
devicemanager: checkpoint: support pre-1.20 data
2021-11-02 16:56:57 -07:00
Konstantin Misyutin
808c8f42d5 Remove StorageObjectInUseProtection feature gate logic
This feature graduated to GA in v1.11 and will always be
enabled, so there is no longer any need to check whether it is enabled.

Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
2021-11-03 00:13:50 +03:00
Skyler Clark
d3ae0a381a
prevents garbage collection from removing pinned images 2021-11-02 14:43:02 -04:00
Jordan Liggitt
94d0c0f78e Simplify kubelet file config field allowlists 2021-11-02 10:23:54 -04:00
Kubernetes Prow Robot
08bf54678e
Merge pull request #101909 from nolancon/cpu-mgr-testing
Additional cases for reconcileState testing
2021-10-30 00:01:17 -07:00
Tim Hockin
11a25bfeb6
De-share the Handler struct in core API (#105979)
* De-share the Handler struct in core API

An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.

This never should have been shared.  Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.

In the future I can also see adding lifecycle hooks that don't make
sense as probes.  E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.

* Run update scripts
2021-10-29 13:15:11 -07:00
Peter Hunt
6b3f8e5662 kubelet: fallback to partial CRI stats if full fails
This is partially to allow the kube alpha tests to pass until CRI implementations have support, but also to handle this error situation a bit more elegantly

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
feb5f5e0ed kubelet: use helper function to check for nil fields in sandbox stats
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
85e8a4bf73 kubelet stats: use UsageNanoCores if available
Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
ffdb4b9c4a kubelet: slightly move around some cri stats functions
to reduce duplication and add clarity

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
d2c436700e kubelet stats: add support for podAndContainerStatsFromCRI
This commit adds an initial implementation of translating from the new CRI fields
to the /stats/summary PodStats object

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Peter Hunt
7866287ba1 kubelet stats: wire up podAndContainerStatsFromCRI feature gate
though it is currently unused

Signed-off-by: Peter Hunt <pehunt@redhat.com>
2021-10-29 09:40:20 -04:00
Kubernetes Prow Robot
c592bd40f2
Merge pull request #105609 from pohly/generic-ephemeral-volume-ga
generic ephemeral volume GA
2021-10-28 17:36:50 -07:00
Francesco Romani
2f426fdba6 devicemanager: checkpoint: support pre-1.20 data
The commit a8b8995ef2
changed the content of the data kubelet writes in the checkpoint.
Unfortunately, the checkpoint restore code was not updated,
so if we upgrade kubelet from pre-1.20 to 1.20+, the
device manager can no longer restore its state correctly.

The only trace of this misbehaviour is this line in the
kubelet logs:
```
W0615 07:31:49.744770    4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA
```

If we hit this bug, the device allocation info is
indeed NOT up-to-date until the device plugins register
themselves again. This can take up to a few minutes, depending
on the specific device plugin.

While the device manager state is inconsistent:
1. the kubelet will NOT update the device availability to zero, so
   the scheduler will send pods towards the inconsistent kubelet.
2. at pod admission time, the device manager allocation will not
   trigger, so pods will be admitted without devices actually
   being allocated to them.

To fix these issues, we add support to the device manager to
read pre-1.20 checkpoint data. We retroactively call this
format "v1".

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-26 09:54:11 +02:00
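One way to read both checkpoint layouts described in 2f426fdba6 is to try the current format first and fall back to the older one. The sketch below only illustrates that idea and is not the kubelet implementation: the type names, the `loadEntry` helper, the JSON field names, and the choice to file v1 devices under NUMA node 0 are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// devicesPerNUMA is a stand-in for the 1.20+ layout: NUMA node ID -> device IDs.
type devicesPerNUMA map[int64][]string

// entryV2 models a current-format entry (hypothetical, simplified).
type entryV2 struct {
	PodUID    string
	DeviceIDs devicesPerNUMA
}

// entryV1 models a pre-1.20 entry, where DeviceIDs is a flat array.
type entryV1 struct {
	PodUID    string
	DeviceIDs []string
}

// loadEntry tries the current format first and falls back to v1.
func loadEntry(data []byte) (entryV2, error) {
	var v2 entryV2
	if err := json.Unmarshal(data, &v2); err == nil {
		return v2, nil
	}
	var v1 entryV1
	if err := json.Unmarshal(data, &v1); err != nil {
		return entryV2{}, fmt.Errorf("checkpoint is neither v2 nor v1: %w", err)
	}
	// v1 carries no NUMA information; node 0 is an arbitrary choice for this sketch.
	return entryV2{PodUID: v1.PodUID, DeviceIDs: devicesPerNUMA{0: v1.DeviceIDs}}, nil
}

func main() {
	oldData := []byte(`{"PodUID":"pod-a","DeviceIDs":["dev0","dev1"]}`)
	newData := []byte(`{"PodUID":"pod-b","DeviceIDs":{"1":["dev2"]}}`)

	for _, data := range [][]byte{oldData, newData} {
		e, err := loadEntry(data)
		fmt.Println(e.PodUID, e.DeviceIDs, err)
	}
}
```

Decoding the old array into the map-typed field is what produces the "cannot unmarshal array" error quoted in the commit message; the fallback turns that failure into a second decoding attempt.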
Kubernetes Prow Robot
17da6a2345
Merge pull request #105699 from yuzhiquan/remove-format-pods
Remove format.pods func, instead with klog.Kobjs
2021-10-25 15:53:30 -07:00
Yuan Chen
b99495d1d9 Fix and improve comments on kubelet metrics 2021-10-21 17:38:25 -07:00
Eric Ernst
2c0fad1f52 kuberuntime: populate sandbox resources, overhead
Populate the Resources and Overhead fields, which are now part of
LinuxPodSandboxConfig.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:30:23 -07:00
Eric Ernst
ddcf815d12 kuberuntime: refactor linux resources for better reuse
Separate the CPU/Memory req/limit -> linux resource conversion into its
own function for better reuse.

Elsewhere in kuberuntime pkg, we will want to leverage this
requests/limits to Linux Resource type conversion.

Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:30:23 -07:00
Eric Ernst
b1361aed93 kuberuntime: augment linux container config unit test
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:30:23 -07:00
Eric Ernst
a73502a0be kuberuntime: augment linux container config unit test
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
2021-10-20 11:29:22 -07:00
Kubernetes Prow Robot
b2c4269992
Merge pull request #105631 from klueska/upstream-distribute-cpus-across-numa
Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them
2021-10-19 11:40:24 -07:00
Gunju Kim
3bce245279
Ensure there is one running static pod with the same full name 2021-10-19 16:30:18 +09:00
Kubernetes Prow Robot
1af8a8c026
Merge pull request #105465 from marosset/remove-host-process-contianer-kubelet-annotations
Stop passing WindowsHostProcessContainer annotations for CRI calls in kubelet
2021-10-18 15:50:02 -07:00
Kubernetes Prow Robot
e595d79dfc
Merge pull request #104574 from 249043822/br-repeat-package
fix duplicate package import in pod_worker
2021-10-18 15:49:46 -07:00
Kubernetes Prow Robot
5889fb4fbc
Merge pull request #105652 from wzshiming/feat/structure-shutdown-config
Refactor to use structure to pass parameters for GracefulNodeShutdown
2021-10-18 14:45:20 -07:00
Kevin Klues
86f9c266bc Add optimizations to reduce iterations in distributed NUMA algorithm
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-18 08:53:25 +00:00
Kevin Klues
70e0f47191 Support full-pcpus-only with the new NUMA distribution policy option
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
d54445a84d Generalize the NUMA distribution algorithm to take cpuGroupSize
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
1436e33642 Add more extensive testing for NUMA distribution algorithm in CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
cf3afb8602 Add 2 distinguishing test cases between the 2 takeByTopology algorithms
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
eb78e2406b Add a new TestTakeByTopologyNUMADistributed() test to the CPUManager
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
876dd9b078 Added algorithm to CPUManager to distribute CPUs across NUMA nodes
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 19:31:02 +00:00
Kevin Klues
462544d079 Split CPUManager takeByTopology() into two different algorithms
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.

The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
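The difference between the two strategies named in 462544d079 can be shown with a deliberately simplified sketch. It works on free-CPU counts per NUMA node only and ignores sockets, cores, and node selection, so it is not the CPUManager algorithm; `takePacked`, `takeDistributed`, and the input are hypothetical.

```go
package main

import "fmt"

// takePacked fills NUMA nodes in order, taking as many CPUs as possible
// from each node before moving to the next. It returns nil if the request
// cannot be satisfied.
func takePacked(free []int, want int) []int {
	out := make([]int, len(free))
	for i, f := range free {
		take := f
		if take > want {
			take = want
		}
		out[i] = take
		want -= take
	}
	if want > 0 {
		return nil
	}
	return out
}

// takeDistributed spreads the request as evenly as possible over the
// smallest number of leading NUMA nodes that can hold it.
func takeDistributed(free []int, want int) []int {
	for k := 1; k <= len(free); k++ {
		per, rem := want/k, want%k
		out := make([]int, len(free))
		ok := true
		for i := 0; i < k; i++ {
			need := per
			if i < rem {
				need++ // hand out the remainder one CPU at a time
			}
			if free[i] < need {
				ok = false
				break
			}
			out[i] = need
		}
		if ok {
			return out
		}
	}
	return nil
}

func main() {
	free := []int{6, 6} // hypothetical: two NUMA nodes with 6 free CPUs each

	fmt.Println(takePacked(free, 8))      // [6 2]
	fmt.Println(takeDistributed(free, 8)) // [4 4]
}
```

When the request fits on a single node, both functions return the same result, which matches the commit message: the strategies only differ once more than one NUMA node is needed.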
Kevin Klues
0e7928edce Add new CPUManager policy option for "distribute-cpus-across-numa"
This commit only adds the option to the policy options framework. A
subsequent commit will add the logic to utilize it.

The KEP describing this new option can be found here:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-16 14:46:19 +00:00
yuzhiquanlong
27fe56e916 remove unused import 2021-10-15 18:40:31 +08:00
Francesco Romani
4bae656835 cpumanager: test NUMA node support for CPU assign (2)
This batch of tests adds a fake topology on which each NUMA node
has multiple sockets. We have not yet found a real HW topology like this
in the wild, but we need one to fully exercise the code.

So, until we find a HW topology, we add a fake one flipping
the NUMA/socket config of the existing xeon dual gold 6320.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
547996f3f6 cpumanager: test NUMA node support for CPU assign (1)
This batch of tests adds a real topology on which each physical socket
has multiple NUMA zones. Taken from a real dual Xeon Gold 6320 system.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
f6ccc4426a cpumanager: test: use proper subtests
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks)

Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Francesco Romani
15caa134b2 cpumanager: topology: use rich cmp package
Use `cmp.Diff` in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-15 10:29:21 +00:00
Kevin Klues
aff54a0914 Abstract out whether NUMA or Sockets come first in the memory hierarchy
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 10:29:15 +00:00
yuzhiquanlong
be9e1fda5e remove format pods func, instead with klog.Kobjs 2021-10-15 18:26:02 +08:00
Kevin Klues
17c7e86c6d Add NUMA support to the CPU assignment algorithm in the CPUManager
Signed-off-by: Kevin Klues <kklues@nvidia.com>
2021-10-15 08:35:59 +00:00
Shiming Zhang
e47c78a354 Add log for creating node shutdown manager 2021-10-15 11:16:21 +08:00
Shiming Zhang
b468c24e85 Refactor to use structure to pass parameters 2021-10-15 11:16:21 +08:00
Kubernetes Prow Robot
a923852ba0
Merge pull request #105215 from rphillips/add_probe_shutdown
kubelet: add probe termination to graceful shutdowns
2021-10-11 21:19:46 -07:00
Patrick Ohly
a8c930ef46 generic ephemeral volume: graduation to GA
The feature gate gets locked to "true", with the goal to remove it in two
releases.

All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.

Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
2021-10-11 20:54:20 +02:00
nolancon
6bbb36df10 Additional cases for reconcileState testing 2021-10-11 16:17:21 +00:00
Kubernetes Prow Robot
dc9c571166
Merge pull request #105569 from pohly/generic-ephemeral-kubelet-volume-stats
kubelet: also provide filesystem stats for generic ephemeral volumes
2021-10-11 07:52:39 -07:00
Kubernetes Prow Robot
1f2813368e
Merge pull request #105542 from pohly/generic-ephemeral-volume-util-kubelet
kubelet: use generic ephemeral volume helper functions
2021-10-11 02:16:40 -07:00
Kubernetes Prow Robot
fb82a0d7eb
Merge pull request #104873 from pohly/json-output-stream
JSON output streams
2021-10-10 17:04:37 -07:00
Patrick Ohly
b22263d835 component-base: configurable JSON output
This implements the replacement of klog output to different files per level
with optionally splitting JSON output into two streams: one for info messages
on stdout, one for error messages on stderr. The info messages can get buffered
to increase performance. Because stdout and stderr might be merged by the
consumer, the info stream gets flushed before writing an error, to ensure that
the order of messages is preserved.

This also ensures that the following code pattern doesn't leak info messages:
   klog.ErrorS(err, ...)
   os.Exit(1)

Commands explicitly have to flush before exiting via logs.FlushLogs. Most
already do. But buffered info messages can still get lost during an unexpected
program termination, therefore buffering is off by default.

The new options get added to the v1alpha1 LoggingConfiguration with new command
line flags. Because it is an alpha field, changing it inside the v1beta kubelet
config should be okay as long as the fields are clearly marked as alpha.
2021-10-09 10:10:35 +02:00
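The ordering guarantee described in b22263d835 (flush buffered info output before writing an error) can be sketched with the standard library alone. This is not the component-base implementation: `splitLogger`, its methods, and `demo` are hypothetical names, the messages are plain text rather than JSON, and write errors are ignored for brevity.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// splitLogger sends info messages to a buffered stream and error messages
// to an unbuffered one.
type splitLogger struct {
	info *bufio.Writer
	errs io.Writer
}

func newSplitLogger(infoOut, errOut io.Writer) *splitLogger {
	return &splitLogger{info: bufio.NewWriter(infoOut), errs: errOut}
}

// Info buffers the message; it may not reach the sink until a flush.
func (l *splitLogger) Info(msg string) {
	fmt.Fprintln(l.info, msg)
}

// Error flushes pending info messages first, so that a consumer which
// merges both streams still sees messages in the order they were logged.
func (l *splitLogger) Error(msg string) {
	l.info.Flush()
	fmt.Fprintln(l.errs, msg)
}

// Flush must be called before exiting, or buffered info messages are lost.
func (l *splitLogger) Flush() {
	l.info.Flush()
}

// demo logs to a single sink that stands in for merged stdout/stderr.
func demo() string {
	var merged bytes.Buffer
	l := newSplitLogger(&merged, &merged)
	l.Info("starting")
	l.Error("something failed")
	l.Info("shutting down")
	l.Flush()
	return merged.String()
}

func main() {
	fmt.Print(demo())
}
```

If `Error` did not flush first, "something failed" would appear ahead of "starting" in the merged output, because the info line would still be sitting in the buffer.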
Kubernetes Prow Robot
63f66e6c99
Merge pull request #105012 from fromanirh/cpumanager-policy-options-beta
node: graduate CPUManagerPolicyOptions to beta
2021-10-08 07:32:59 -07:00
Kubernetes Prow Robot
2face135c7
Merge pull request #97415 from AlexeyPerevalov/ExcludeSharedPoolFromPodResources
Return only isolated cpus in podresources interface
2021-10-08 05:58:58 -07:00
Patrick Ohly
b1ba381ef8 kubelet: also provide filesystem stats for generic ephemeral volumes
When checking for a reference to a PVC, the code also needs to consider that a
PVC might be referenced indirectly through an ephemeral volume source.
2021-10-08 12:11:52 +02:00
Kubernetes Prow Robot
dd650bd41f
Merge pull request #105527 from rphillips/fixes/filter_terminated_pods
kubelet: set terminated podWorker status for terminated pods
2021-10-07 22:19:51 -07:00
Ryan Phillips
0166d446b9 kubelet: set terminated podWorker status for terminated pods 2021-10-07 16:18:59 -05:00
Patrick Ohly
844662e7fa kubelet: use generic ephemeral volume helper functions
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.
2021-10-07 17:31:54 +02:00
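The two operations that 844662e7fa mentions (name concatenation and ownership check) can be illustrated with simplified stand-in types. The sketch assumes the claim for a generic ephemeral volume is named `<pod name>-<volume name>` and that ownership is tracked by the pod's UID; the types and function names here are hypothetical and are not the helper functions the commit refers to.

```go
package main

import "fmt"

// pod and claim are minimal stand-ins for the real API objects.
type pod struct {
	Name string
	UID  string
}

type claim struct {
	Name     string
	OwnerUID string // UID of the controlling owner, empty if none
}

// volumeClaimName derives the claim name for an ephemeral volume of a pod.
func volumeClaimName(p pod, volumeName string) string {
	return p.Name + "-" + volumeName
}

// claimIsForPod returns an error if the claim is not owned by the pod,
// for example when an unrelated claim happens to have the same name.
func claimIsForPod(p pod, c claim) error {
	if c.OwnerUID != p.UID {
		return fmt.Errorf("claim %q was not created for pod %q", c.Name, p.Name)
	}
	return nil
}

func main() {
	p := pod{Name: "web-0", UID: "uid-1"}
	name := volumeClaimName(p, "scratch")
	fmt.Println(name) // web-0-scratch

	fmt.Println(claimIsForPod(p, claim{Name: name, OwnerUID: "uid-1"}))
	fmt.Println(claimIsForPod(p, claim{Name: name, OwnerUID: "uid-2"}))
}
```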
Alexey Perevalov
5d9032007a Return only isolated cpus in podresources interface
Co-Authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2021-10-07 15:34:08 +01:00
Kubernetes Prow Robot
c4d802b0b5
Merge pull request #103289 from AlexeyPerevalov/DoNotExportEmptyTopology
podresources: do not export empty NUMA topology
2021-10-07 07:11:46 -07:00
Kubernetes Prow Robot
907d62eac8
Merge pull request #105462 from ehashman/merge-terminal-phase
Ensure terminal pods maintain terminal status
2021-10-05 13:12:58 -07:00
Mark Rossetti
99e43bfa8c Stop passing WindowsHostProcessContainer annotations for CRI calls in kubelet
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
2021-10-05 10:08:53 -07:00
Elana Hashman
3005ef34f2
Ensure terminal pods maintain terminal status 2021-10-05 09:26:27 -07:00
Kubernetes Prow Robot
c91f9bdc60
Merge pull request #104689 from cynepco3hahue/memory_manager_restricted_policy_fix
kubelet: memory manager: fix preferred topology hints calculation
2021-10-05 06:47:08 -07:00
Kubernetes Prow Robot
efa9029a0d
Merge pull request #104920 from tkashem/response-writer-cleanup
apiserver: decorate http.ResponseWriter correctly
2021-10-05 00:53:09 -07:00
Elana Hashman
5ff6c2396d
Do not sync Waiting statuses for Terminated pods 2021-10-04 11:05:54 -07:00
Abu Kashem
0d50c969c5
apiserver: wrap ResponseWriter using abstraction 2021-10-04 10:59:11 -04:00
Kubernetes Prow Robot
e414cf7641
Merge pull request #100482 from pohly/generic-ephemeral-volume-checks
generic ephemeral volume checks
2021-10-01 10:47:22 -07:00
Patrick Ohly
1e26115df5 consider ephemeral volumes for host path and node limits check
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
2021-10-01 17:03:44 +02:00
Kubernetes Prow Robot
883250145c
Merge pull request #104788 from 249043822/memorymanager-br
Fix initContainersReusableMemory delete bug in MemoryManager
2021-10-01 05:27:22 -07:00
Kubernetes Prow Robot
cab54856f1
Merge pull request #104933 from vikramcse/automate_mockery
conversion of tests from mockery to mockgen
2021-09-30 18:33:21 -07:00
Shuhei Kitagawa
ef0eff14ab
Add tests kubelet default config (#105116)
* Use utilpointer to get a pointer

* Add tests for kubelet default configs

* Change copyright year from 2015 to 2021

* Run gofmt

* Add all negative and all positive test cases
2021-09-30 17:29:33 -07:00
Francesco Romani
077c0aa1be node: graduate CPUManagerPolicyOptions to beta
We graduate the `CPUManagerPolicyOptions` feature to beta
in the 1.23 cycle, and we add new experimental feature gates
to guard new options which are planned in the 1.23 and in the
following cycles.

We introduce additional feature gates called `CPUManagerPolicyAlphaOptions` and
`CPUManagerPolicyBetaOptions`. The basic idea is to avoid the
cumbersome process of adding a feature gate for each option, and to have
feature gates which track the maturity level of _groups_ of options.
Besides this change, the graduation process, and the process in general,
for adding new policy options is still unchanged.

The `full-pcpus-only` option added in the 1.22 cycle is intentionally
moved into the beta policy options.

For more details:
- KEP: https://github.com/kubernetes/enhancements/pull/2933
- sig-arch discussion:
  https://groups.google.com/u/1/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-09-29 11:40:03 +02:00
Kubernetes Prow Robot
e138afc35d
Merge pull request #105213 from yxxhero/remove_StartedPodsErrorsTotal_metrice_message
Remove StartedPodsErrorsTotal metric message
2021-09-28 10:45:16 -07:00
Kubernetes Prow Robot
9005160245
Merge pull request #105272 from wojtek-t/add_jittering_for_kubelet
Add jittering for Kubelet status computing
2021-09-28 00:20:42 -07:00
wojtekt
65d8037ae3 Add jittering for Kubelet status computing 2021-09-27 19:39:50 +02:00
vikram Jadhav
0de4397490 mockery to mockgen conversion 2021-09-25 16:15:08 +00:00
Khaled Henidak (Kal)
a53e2eaeab
move IPv6DualStack feature to stable. (#104691)
* kube-proxy

* endpoints controller

* app: kube-controller-manager

* app: cloud-controller-manager

* kubelet

* app: api-server

* node utils + registry/strategy

* api: validation (comment removal)

* api:pod strategy (util pkg)

* api: docs

* core: integration testing

* kubeadm: change feature gate to GA

* service registry and rest stack

* move feature to GA

* generated
2021-09-24 16:30:22 -07:00
yxxhero
35df409a7e remove StartedPodsErrorsTotal metric message
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-23 22:18:56 +08:00
Kubernetes Prow Robot
2541fcf256
Merge pull request #104123 from fromanirh/podresources-not-report-unhealthy-devices
devicemanager: skip unhealthy devices in GetAllocatable
2021-09-23 05:39:21 -07:00
Ryan Phillips
e2e938066d kubelet: add probe termination to graceful shutdowns 2021-09-22 14:13:25 -05:00
Francesco Romani
1b6efa5e21 devicemanager: skip unhealthy devs in GetAllocatable
The GetAllocatableDevices, needed to support the podresources
API, doesn't take into account the device health when computing
its output.

In this PR we address this gap and add unit tests along the way
to prevent regressions. This gives us good initial coverage;
E2E tests to cover this case are much harder to write, because
we would need to inject faults to trigger the unhealthy status.
We will evaluate adding these tests in later PRs.
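
A minimal sketch of the kind of filtering described above, assuming a
simplified `device` type and an `allocatableDevices` helper that are both
hypothetical and do not mirror the device manager's real data structures:

```go
package main

import "fmt"

// device is a hypothetical, simplified view of a device reported by a plugin.
type device struct {
	ID      string
	Healthy bool
}

// allocatableDevices returns the IDs of healthy devices only, preserving order.
func allocatableDevices(devs []device) []string {
	ids := make([]string, 0, len(devs))
	for _, d := range devs {
		if !d.Healthy {
			continue // unhealthy devices are not reported as allocatable
		}
		ids = append(ids, d.ID)
	}
	return ids
}

func main() {
	devs := []device{
		{ID: "dev-0", Healthy: true},
		{ID: "dev-1", Healthy: false},
		{ID: "dev-2", Healthy: true},
	}
	fmt.Println(allocatableDevices(devs))
}
```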

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-09-22 19:20:04 +02:00
Kubernetes Prow Robot
7c71e06cd1
Merge pull request #104959 from calvin0327/issue-test-dataRace
fix the test issue of node shutdown manager
2021-09-21 11:56:30 -07:00
Kubernetes Prow Robot
44d4d007bf
Merge pull request #103424 from 249043822/br-cadvisor-perf
Optimize kubelet stats provider for performance bottleneck
2021-09-21 11:56:18 -07:00
Kubernetes Prow Robot
353f0a5eab
Merge pull request #105095 from wojtek-t/migrate_clock_3
Unify towards k8s.io/utils/clock - part 3
2021-09-20 12:46:45 -07:00
Kubernetes Prow Robot
0d20f47c7a
Merge pull request #105090 from saad-ali/removeSubpathFeaturegate
Remove VolumeSubpath feature gate
2021-09-17 15:52:07 -07:00
wojtekt
d9b08c611d Migrate to k8s.io/utils/clock 2021-09-17 15:19:08 +02:00
Kubernetes Prow Robot
cb2ea4bf7c
Merge pull request #101161 from rikatz/move-sysctl-util
Move node and networking related helpers from pkg/util to component helpers
2021-09-17 02:11:00 -07:00
saad-ali
beb17fe10b Remove VolumeSubpath feature gate
Remove the VolumeSubpath feature gate.

Feature gate convention has been updated since this was introduced to
indicate that they "are intended to be deprecated and removed after a
feature becomes GA or is dropped.".
2021-09-17 01:59:23 -07:00
Ricardo Pchevuzinske Katz
37d11bcdaf Move node and networking related helpers from pkg/util to component helpers
Signed-off-by: Ricardo Katz <rkatz@vmware.com>
2021-09-16 17:00:19 -03:00
Clayton Coleman
d5719800bf
kubelet: Handle UID reuse in pod worker
If a pod is killed (no longer wanted) and then a subsequent create/
add/update event is seen in the pod worker, assume that a pod UID
was reused (as it could be in static pods) and have the next
SyncKnownPods after the pod terminates remove the worker history so
that the config loop can restart the static pod, as well as return
to the caller the fact that this termination was not final.

The housekeeping loop then reconciles the desired state of the Kubelet
(pods in pod manager that are not in a terminal state, i.e. admitted
pods) with the pod worker by resubmitting those pods. This adds a
small amount of latency (2s) when a pod UID is reused and the pod
is terminated and restarted.
2021-09-15 14:02:00 -04:00
KeZhang
a629ceeb58 Fix initContainersReusableMemory delete bug 2021-09-15 10:04:49 +08:00
Kubernetes Prow Robot
fa2657b8b2
Merge pull request #104624 from Haleygo/support-null-resolvConf-in-configFile
When resolvConf is "" in kubelet configuration, pod will be created with wrong dns policy
2021-09-14 14:18:59 -07:00
yxxhero
c1b94d27d9 fix typo
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-14 23:24:14 +08:00
Haleygo
46454ea9dc support null resolvConf in Kubelet Configuration 2021-09-14 16:12:52 +08:00
Kubernetes Prow Robot
047a6b9f86
Merge pull request #104874 from wojtek-t/migrate_clock_1
Unify towards k8s.io/utils/clock - part 1
2021-09-13 19:09:20 -07:00
Kubernetes Prow Robot
c79f7c1add
Merge pull request #104711 from claudiubelu/update-pause-3.6
update pause image references to use 3.6
2021-09-13 19:09:08 -07:00
yxxhero
20b3cd5198 fix typo
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-14 09:04:59 +08:00
yxxhero
5ba76eb911 fix typo
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-14 09:03:29 +08:00
Kubernetes Prow Robot
0e2acbe9a8
Merge pull request #104794 from wzshiming/fix/kubelet-cm-kv-pair
pkg/kubelet/cm/memorymanager: Fix ErrorS key/value pair
2021-09-13 15:44:04 -07:00
calvin0327
db82e282fc fix the test issue of data race to node shutdown manager 2021-09-13 18:12:19 +08:00
wojtekt
53ce79a18a Migrate to k8s.io/utils/clock in pkg/kubelet 2021-09-10 12:20:09 +02:00
Kubernetes Prow Robot
1dcea5cb02
Merge pull request #104817 from smarterclayton/pod_status
kubelet: Rejected pods should be filtered from admission
2021-09-09 22:15:59 -07:00
Kubernetes Prow Robot
5724484bda
Merge pull request #104069 from pacoxu/fix-data-race-104057
fix data race in kubelet volume test: add lock for ut
2021-09-09 21:09:59 -07:00
eggiter
20d3bc32ac fix(cpumanager): Do not release cpus of init containers while they are reused in app containers 2021-09-10 10:01:35 +08:00
Clayton Coleman
17d32ed0b8
kubelet: Rejected pods should be filtered from admission
A pod that has been rejected by admission will have status manager
set the phase to Failed locally, which may take some time to
propagate to the apiserver. The rejected pod will be included in
admission until the apiserver propagates the change back, which
was an unintended regression when checking pod worker state as
authoritative.

A pod that is terminal in the API may still be consuming resources
on the system, so it should still be included in admission.
2021-09-08 10:23:45 -04:00
Shiming Zhang
7706d3d281 pkg/kubelet/cm/memorymanager: Fix ErrorS key/value pair 2021-09-06 17:37:04 +08:00
vikram Jadhav
c10c92bda9 changes made by introducing mockgen command 2021-09-03 17:40:11 +00:00
Vikram Jadhav
5f674101bb Added update and verify scripts for automated mock generation 2021-09-03 17:40:11 +00:00
yxxhero
2f448a0789 fix oomkilled description
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-03 22:07:46 +08:00
yxxhero
71a91d55cb update func description 2021-09-03 07:20:28 +08:00
yxxhero
afde4c8bc4 fix init container oomkilled as a failure
Signed-off-by: yxxhero <aiopsclub@163.com>
2021-09-03 07:04:57 +08:00
Kubernetes Prow Robot
0b4a793da2
Merge pull request #103941 from saschagrunert/seccomp-profile-root
Remove deprecated `--seccomp-profile-root`/`seccompProfileRoot` config
2021-09-02 08:52:57 -07:00
paco
ab055e9ba4 fix data race in kubelet volume test: add lock
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
Co-authored-by: Jian Zeng <zengjian.zj@bytedance.com>
2021-09-01 16:13:55 +08:00
Artyom Lukianov
9ea9798759 kubelet: memory manager: fix topology preferred topology hints calculation
Prevent starting pods with resources satisfied by a single NUMA node on multiple NUMA nodes.
The code returned before it updated the minimal number of NUMA nodes that can satisfy the container
requests.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-31 17:46:59 +03:00
Sascha Grunert
46077e6be7
Remove deprecated --seccomp-profile-root/seccompProfileRoot configuration
The configuration is deprecated and targeted for removal in v1.23. Test
cases have been changed as well.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2021-08-31 09:55:28 +02:00
Kubernetes Prow Robot
bbbeceb6aa
Merge pull request #104577 from smarterclayton/smaller_filter_master
kubelet: Admission must exclude completed pods and avoid races
2021-08-30 13:17:13 -07:00
Claudiu Belu
18936d4785 updates pause image references
The pause:3.6 image has been published.

Also updates older / incorrect references.
2021-08-29 21:50:05 -07:00
Kubernetes Prow Robot
c262d09bb7
Merge pull request #104604 from wojtek-t/fix_secret_manager_2
Don't prematurely close reflectors in case of slow initialization in watch based manager
2021-08-26 06:11:23 -07:00
wojtekt
515106b795 Don't prematurely close reflectors in case of slow initialization in watch based manager 2021-08-26 11:34:24 +02:00
tiloso
2b86541313 Fix staticcheck failure in pkg/kubelet/cm/cpuset 2021-08-26 08:50:08 +02:00
Kubernetes Prow Robot
cbd0611d49
Merge pull request #104528 from kolyshkin/runc-1.0.2
vendor: bump runc to 1.0.2
2021-08-25 18:17:23 -07:00
Kubernetes Prow Robot
2f6b9166d7
Merge pull request #104039 from YanzhaoLi/extract-containerdid-from-various-cgrouppath
Get containerID from systemd-style cgroupPath in cri_stats_provider
2021-08-25 17:05:22 -07:00
Clayton Coleman
a2ca66d280
kubelet: Admission must exclude completed pods and avoid races
Fixes two issues with how the pod worker refactor calculated the
pods that admission could see (GetActivePods() and
filterOutTerminatedPods()).

First, completed pods must be filtered from the "desired" state
for admission, which arguably should be happening earlier in
config. Exclude the two terminal pod states from GetActivePods().

Second, the previous check introduced with the pod worker lifecycle
ownership changes was subtly wrong for the admission use case.
Admission has to include pods that haven't yet hit the pod worker,
which CouldHaveRunningContainers was filtering out (because the
pod worker hasn't seen them). Introduce a weaker check -
IsPodKnownTerminated() - that returns true only if the pod is in
a known terminated state (no running containers AND known to pod
worker). This weaker check may only be called from components that
need admitted pods, not other kubelet subsystems.
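
The weaker check can be sketched as follows. This is an illustration built
only from the description above: `workerState`, `podWorkers` and
`podsForAdmission` are hypothetical simplifications, not the kubelet's types.

```go
package main

import "fmt"

// workerState is a hypothetical record of what the pod worker knows about a pod.
type workerState struct {
	terminated bool // true once the pod has no running containers
}

// podWorkers maps pod UID to state; a missing UID has not reached the worker yet.
type podWorkers map[string]workerState

// isPodKnownTerminated is true only when the worker has seen the pod AND the
// pod has terminated. Pods unknown to the worker return false.
func (p podWorkers) isPodKnownTerminated(uid string) bool {
	st, known := p[uid]
	return known && st.terminated
}

// podsForAdmission drops only pods that are known to be terminated, so pods
// the worker has not seen yet are still counted by admission.
func podsForAdmission(uids []string, p podWorkers) []string {
	kept := make([]string, 0, len(uids))
	for _, uid := range uids {
		if p.isPodKnownTerminated(uid) {
			continue
		}
		kept = append(kept, uid)
	}
	return kept
}

func main() {
	workers := podWorkers{
		"running":  {terminated: false},
		"finished": {terminated: true},
	}
	fmt.Println(podsForAdmission([]string{"running", "finished", "not-yet-seen"}, workers))
}
```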

This commit does not fix the long standing bug that force deleted
pods are omitted from admission checks, which must be fixed by
having GetActivePods() also include pods "still terminating".
2021-08-25 13:31:02 -04:00
KeZhang
dd4fd54427 fix duplicate package import in pod_worker 2021-08-25 21:16:38 +08:00
Stephen Augustus
481cf6fbe7
generated: Run hack/update-gofmt.sh
Signed-off-by: Stephen Augustus <foo@auggie.dev>
2021-08-24 15:47:49 -04:00
Alexey Perevalov
bb81101570 podresource: do not export NUMA topology if it's empty
If a device plugin returns a device without topology, keep it internally
as NUMA node -1. This helps at the podresources level to not export NUMA
topology; otherwise the topology is exported with NUMA node ID 0,
which is not accurate.

It's impossible to unveil this bug just by tracing json.Marshal(resp)
in the podresource client, because the NUMANodes field ID has the json
property omitempty, so ID=0 is shown as an empty NUMANode.
To reproduce it, it is better to iterate over the devices and just
trace dev.Topology.Nodes[0].ID.
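
The omitempty effect can be reproduced in isolation. In this sketch,
`numaNode` is a simplified stand-in for the podresources NUMANode message
(only the ID field and its json tag are modeled) and `marshalNode` is a
hypothetical helper:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// numaNode is a simplified stand-in for the podresources NUMANode message.
type numaNode struct {
	ID int64 `json:"ID,omitempty"`
}

// marshalNode returns the JSON encoding of a node with the given ID.
func marshalNode(id int64) string {
	b, err := json.Marshal(numaNode{ID: id})
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(marshalNode(0)) // ID is dropped: omitempty treats 0 as empty
	fmt.Println(marshalNode(1)) // non-zero IDs are encoded normally
	fmt.Println(numaNode{ID: 0}.ID) // reading the field directly still shows 0
}
```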

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2021-08-24 15:38:21 +00:00
Kir Kolyshkin
c06a851042 pkg/kubelet/cm: use SkipFreezeOnSet
This is a knob added by runc 1.0.2 specifically for kubernetes,
which tells runc/libcontainer/cgroups/systemd v1 manager to not
freeze the cgroup in Set().

We set this knob here because this code is only used for pods
(rather than containers) management, and in this place we create or
update the pod cgroup with no device limits set, so we can skip the
freeze.

If this knob is not set, libcontainer's cgroup v1 manager tries to
figure out whether the freeze is needed or not, but it's a somewhat
expensive check to perform, thus the knob is a shortcut.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2021-08-23 13:41:51 -07:00
Antonio Ojea
0cd75e8fec run hack/update-netparse-cve.sh 2021-08-20 10:42:09 +02:00
Kubernetes Prow Robot
8dbc33d649
Merge pull request #101081 from rphillips/add_graceful_shutdown_event
kubelet: add graceful shutdown events
2021-08-17 22:08:08 -07:00
Kubernetes Prow Robot
a779c58b16
Merge pull request #104330 from liggitt/defaulter-package
Change defaulter-gen input to package import path
2021-08-17 11:42:18 -07:00
Kubernetes Prow Robot
07b7afefbf
Merge pull request #103862 from tanjing2020/cleancode
Replace 'x.Sub(time.Now())' with 'time.Until(x)'
2021-08-17 11:42:01 -07:00
Kubernetes Prow Robot
d7c1663556
Merge pull request #103137 from wzshiming/fix/expected_inhibit_delay
Allow the actual inhibit delay to be greater than the expected inhibit delay
2021-08-17 11:41:49 -07:00
Kubernetes Prow Robot
a9aad7e034
Merge pull request #103107 from pacoxu/fix-93300
ResourceConfigForPod: check initContainers as other QoS func
2021-08-17 11:41:37 -07:00
Kubernetes Prow Robot
f4185318bc
Merge pull request #103048 from gy95/remove_static
remove not used IsStaticPod, prevent possible panic
2021-08-17 11:41:25 -07:00
Kubernetes Prow Robot
b559434c02
Merge pull request #103059 from rajaSahil/fix-error
Update github.com/pkg/errors to go native errors pkg
2021-08-17 10:29:25 -07:00
Kubernetes Prow Robot
db42b67f3c
Merge pull request #101962 from llhhbc/add-osinfo-logs
Add getOSInfo err info
2021-08-17 10:29:13 -07:00
Jordan Liggitt
87a4e082ac Change defaulter-gen input to package path 2021-08-14 11:00:18 -04:00
YanzhaoLi
545d898584 Extract containerID from systemd-style cgroupPath in cri_stats_provider
And fix test to generate UUID without dash
2021-08-11 19:03:56 -07:00
Ryan Phillips
30e9a420c4 kubelet: fix sandbox creation error suppression when pods are quickly deleted 2021-08-10 08:55:25 -05:00
Kubernetes Prow Robot
4b4d12f8a6
Merge pull request #102913 from pacoxu/upgrade-promotheus-common
upgrade prometheus/common to v0.28.0
2021-08-09 08:03:31 -07:00
longhui.li
4af506c989 Add getOSInfo err info 2021-08-09 11:04:53 +08:00
Artyom Lukianov
73a5cce3e6 device manager: do not clean admitted pods from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-08 16:46:06 +03:00
Artyom Lukianov
93a237abd8 memory manager: do not clean admitted pods from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-08 16:46:06 +03:00
Artyom Lukianov
66babd1a90 cpu manager: do not clean admitted pods from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-08 16:46:06 +03:00
Elana Hashman
d2ed3b28b7
Revert "revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update" 2021-08-06 08:38:56 -07:00
Kubernetes Prow Robot
28990f7664
Merge pull request #103958 from liggitt/server-timeouts
Set idle and readheader timeouts
2021-08-05 14:11:02 -07:00
Kubernetes Prow Robot
3b84cc9e6b
Merge pull request #104075 from kerthcet/cleanup/revert-dynamickubeconfig-metric
revert Bump DynamicKubeConfig metric deprecation to 1.23 by delta update
2021-08-05 08:18:40 -07:00