The package says:
> the libcontainer SELinux package is only built for Linux, so it is
> necessary to have a NOP wrapper which is built for non-Linux platforms
This is no longer true: Kubernetes now imports
github.com/opencontainers/selinux/go-selinux, which has proper
multi-platform support (i.e. it is a no-op on non-Linux platforms).
Remove the whole package and call go-selinux directly.
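For illustration, a minimal sketch of calling go-selinux directly
(`selinux.GetEnabled` is part of its public API; everything else here is
just scaffolding):

```go
package main

import (
	"fmt"

	"github.com/opencontainers/selinux/go-selinux"
)

func main() {
	// go-selinux compiles on all platforms and simply reports false
	// on non-Linux builds, so no NOP wrapper package is needed.
	fmt.Println("SELinux enabled:", selinux.GetEnabled())
}
```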
Before this fix, hint permutations such as:
permutation: [{11 true} {0101 true}]
Could result in merged hints of:
mergedHint: {01 true}
This was possible because both hints in the permutation contain a "preferred"
allocation (i.e. the full set of NUMA nodes set in the affinity bitmask are
*required* to satisfy the allocation). Given this, the simplified logic we had
simply kept the merged hint as preferred as well.
However, what we really want is to ensure that the merged hint is only
preferred if *true* alignment of all resources is possible (i.e. if all hints
in the permutation are preferred AND their affinities are exactly equal).
The only exception to this is if *no* topology information is provided by a
given hint provider. In this case, we assume alignment doesn't matter and only
consider the resources that actually have hints provided for them.
This changes the semantics of permutations of the form:
permutation: [{111 true} {011 true}]
To now result in the merged hint of:
mergedHint: {011 false}
Instead of:
mergedHint: {011 true}
This is arguably how it should always have been though (because a hint should
not be preferred if true alignment isn't possible), and two tests have had to
change to accommodate these new semantics.
This commit changes the merge function to implement the updated logic, adds a
test to verify it is functioning correctly, and updates the two tests mentioned
above to adjust to the new semantics.
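As an illustration, a minimal sketch of the updated preference rule,
assuming a simplified hint type with string affinities; names are
illustrative, not the actual kubelet topology manager code:

```go
package main

import "fmt"

// TopologyHint is a simplified stand-in for the real topology manager
// type; affinities are plain strings here instead of bitmasks.
type TopologyHint struct {
	Affinity  string
	Preferred bool
}

// mergedPreferred returns true only if every hint in the permutation is
// preferred AND all affinities are exactly equal, i.e. true alignment of
// all resources is possible. Hints with no topology information (empty
// affinity in this sketch) are skipped, per the exception above.
func mergedPreferred(permutation []TopologyHint) bool {
	var first *TopologyHint
	for i := range permutation {
		h := &permutation[i]
		if h.Affinity == "" {
			continue // provider supplied no topology information
		}
		if !h.Preferred {
			return false
		}
		if first == nil {
			first = h
		} else if h.Affinity != first.Affinity {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(mergedPreferred([]TopologyHint{{"111", true}, {"011", true}})) // false
	fmt.Println(mergedPreferred([]TopologyHint{{"011", true}, {"011", true}})) // true
}
```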
Signed-off-by: Kevin Klues <kklues@nvidia.com>
The remote runtime implementation now supports the `verbose` fields,
which are required for consumers like cri-tools to enable multi CRI
version support.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
TestStart was previously flaky. In approx 100_000 local runs, it failed
about 70% of the time, and has been mentioned as a flaky unit test in
the past.
This flake was due to a race condition between the logic as written and the
Go scheduler. UpdateThreshold calls `notifier.Start(events)` in a new
goroutine, which is not guaranteed to run immediately.
This meant that if `m.Start()` was called before `notifier.Start()`, the
test would fail, as the notifier would not have been started before the
4 events were processed and the lock released.
Here, we update the test to more closely match the intended application
behaviour, and have events passed to the channel when `Start` is called
on the notifier.
This ensures that `Start` gets called and additionally validates
that the correct channel is provided to the notifier.
Stop was never called previously, as it only gets called on a subsequent
call to UpdateThreshold. `AnyTimes()` hid that this did not occur.
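A minimal, self-contained sketch of the synchronization idea (names are
illustrative, not the kubelet's or gomock's API): the goroutine signals
through a channel once it has actually started, so the caller cannot race
ahead of it.

```go
package main

import "fmt"

// startNotifier stands in for UpdateThreshold launching the notifier in
// a new goroutine; all names here are illustrative. The started channel
// is the synchronization the original test was missing: nothing else
// guarantees the goroutine runs before the caller proceeds.
func startNotifier(events chan<- string, started chan<- struct{}) {
	go func() {
		close(started) // prove the notifier actually started
		events <- "memory-threshold-crossed"
	}()
}

func main() {
	events := make(chan string, 1)
	started := make(chan struct{})
	startNotifier(events, started)
	<-started // block until the goroutine is scheduled, avoiding the flake
	fmt.Println(<-events)
}
```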
cAdvisor has code to expose OOM metrics since 0.40.0, but this was not
included in Kubelet so far. This commit enables it.
Signed-off-by: Jorik Jonker <jorik.jonker@eu.equinix.com>
- Allow a podWorker to start if it is blocked by a pod that has been
terminated before starting
- When a pod can't start AND has already been terminated, exit cleanly
- Add a unit test that exercises race conditions in pod workers
If CRI returns a container that has been created but is not running,
it is not safe to assume it is terminal, as our connection to CRI
may have failed. Instead, created is treated as waiting, as in
"waiting for this container to start". Either syncPod or
syncTerminatingPod is responsible for handling this state.
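A minimal sketch of that state mapping, with illustrative types rather
than the real CRI/kubecontainer ones:

```go
package main

import "fmt"

// Illustrative types only: a created-but-not-running container maps to a
// waiting state ("waiting for this container to start"), never to a
// terminal one, since the CRI connection may simply have failed mid-start.
type containerState int

const (
	stateCreated containerState = iota
	stateRunning
	stateExited
)

func toPodWorkerState(s containerState) string {
	switch s {
	case stateCreated:
		return "waiting" // syncPod or syncTerminatingPod takes it from here
	case stateRunning:
		return "running"
	default:
		return "terminated"
	}
}

func main() {
	fmt.Println(toPodWorkerState(stateCreated)) // waiting
}
```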
Host-network pod IPs are obtained from the reported kubelet node IPs.
Historically, host-network pod IPs have been immutable once set, but when
dual-stack support was added, we didn't consider that the secondary IP
address may not be present at the same time as the primary node IP.
If a secondary IP address is added to a node after the host-network pod
IPs are set, we can add the secondary host-network pod IP address while
maintaining the current behavior of not updating the existing pod IPs on
host-network pods.
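A minimal sketch of the resulting behavior (illustrative, not kubelet
code): existing pod IPs stay untouched, and a node IP of a missing
address family is appended once it appears.

```go
package main

import (
	"fmt"
	"net"
)

// mergeHostNetworkPodIPs keeps the current host-network pod IPs as-is and
// appends a node IP whose family (IPv4/IPv6) the pod doesn't have yet.
func mergeHostNetworkPodIPs(podIPs, nodeIPs []string) []string {
	hasFamily := map[bool]bool{} // key: isIPv6
	for _, ip := range podIPs {
		hasFamily[net.ParseIP(ip).To4() == nil] = true
	}
	for _, ip := range nodeIPs {
		if v6 := net.ParseIP(ip).To4() == nil; !hasFamily[v6] {
			podIPs = append(podIPs, ip)
			hasFamily[v6] = true
		}
	}
	return podIPs
}

func main() {
	// The secondary (IPv6) node IP showed up after the pod IPs were set.
	fmt.Println(mergeHostNetworkPodIPs([]string{"10.0.0.5"}, []string{"10.0.0.5", "fd00::5"}))
	// [10.0.0.5 fd00::5]
}
```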
In the following code pattern, the log message will get logged with v=0 in JSON
output although conceptually it has a higher verbosity:
    if klog.V(5).Enabled() {
        klog.Info("hello world")
    }
Having the actual verbosity in the JSON output is relevant, for example for
filtering out only the important info messages. The solution is to use
klog.V(5).Info or something similar.
Whether the outer if is necessary at all depends on how complex the parameters
are. The return value of klog.V can be captured in a variable and be used
multiple times to avoid the overhead for that function call and to avoid
repeating the verbosity level.
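Putting both recommendations together, the pattern looks like this
(klog's `V`, `Enabled`, and `Info` are the real API; the message is just
an example):

```go
package main

import "k8s.io/klog/v2"

func main() {
	defer klog.Flush()
	// Capture the Verbose value once: this avoids repeating the level,
	// skips a second V() call, and -- crucially -- records v=5 in the
	// structured/JSON output. (Run with -v=5 or higher to see it emit.)
	if klogV := klog.V(5); klogV.Enabled() {
		klogV.Info("hello world")
	}
}
```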
Revert to previous behavior in 1.21/1.20 of setting pod phase to failed
during graceful node shutdown.
Setting pods to failed phase will ensure that external controllers that
manage pods, like deployments, will create new pods to replace those that
are shut down. Many customers have taken a dependency on this behavior,
and it was a breaking change in 1.22, so this change reverts back to the
previous behavior.
Signed-off-by: David Porter <david@porter.me>
We should not touch the dockershim ahead of removal and therefore
default to `v1alpha2` CRI instead of `v1`.
Partially reverts changes from https://github.com/kubernetes/kubernetes/pull/106501
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Without this fix, the algorithm may decide to allocate "remainder" CPUs from a
NUMA node that has no more CPUs to allocate. Moreover, it was only considering
allocation of remainder CPUs from NUMA nodes such that each NUMA node in the
remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these
two issues in play, one could end up with an accounting error where not enough
CPUs were allocated by the time the algorithm runs to completion.
The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from
the set of NUMA nodes considered for allocating remainder CPUs. Additionally,
we now consider *all* combinations of nodes from the remainder set of size
1..len(remainderSet). This allows us to find a better solution if allocating
CPUs from a smaller set leads to a more balanced allocation. Finally, we loop
through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs
have been accounted for and allocated. This ensures that we will not hit an
accounting error later on, because we explicitly remove CPUs from the
remainder set until there are none left.
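A simplified sketch of the two fixes (illustrative names, not the actual
cpumanager code): NUMA nodes with no free CPUs are omitted, and remainder
CPUs are handed out node-by-node until every one of them is accounted for.

```go
package main

import "fmt"

// allocateRemainder skips NUMA nodes with 0 CPUs left, then loops through
// the remaining candidates 1-by-1 until all remainder CPUs are allocated,
// so no accounting error is possible later on.
func allocateRemainder(free map[int]int, remainderSet []int, remainder int) map[int]int {
	alloc := map[int]int{}

	// Omit NUMA nodes that have no CPUs left to allocate.
	var candidates []int
	for _, n := range remainderSet {
		if free[n] > 0 {
			candidates = append(candidates, n)
		}
	}

	for remainder > 0 {
		progress := false
		for _, n := range candidates {
			if remainder == 0 {
				break
			}
			if alloc[n] < free[n] {
				alloc[n]++
				remainder--
				progress = true
			}
		}
		if !progress {
			break // not enough capacity; the caller must surface an error
		}
	}
	return alloc
}

func main() {
	// Node 2 has no CPUs left and is skipped entirely.
	fmt.Println(allocateRemainder(map[int]int{0: 2, 1: 1, 2: 0}, []int{0, 1, 2}, 3))
	// map[0:2 1:1]
}
```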
A follow-on commit adds a set of unit tests that will fail before these
changes, but succeed after them.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Previously the algorithm was too restrictive because it tried to calculate the
minimum based on the number of *available* NUMA nodes and the number of
*available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how
many CPUs an individual NUMA node happened to have, the average across them was
used. Using this value however, could result in thinking you need more NUMA
nodes to possibly satisfy a request than you actually do.
By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get
the true minimum number of nodes required to satisfy a request. For a given
"current" allocation this may not be the true minimum, but it's better to
start with fewer and move up than to start with too many and miss out on a
better option.
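A sketch of the calculation, assuming a uniform topology where every NUMA
node has the same CPU count (names are illustrative):

```go
package main

import "fmt"

// minNUMANodes sketches the updated calculation: using the *total* CPUs
// per NUMA node, the true minimum number of nodes needed for a request is
// a simple ceiling division -- no averaging over "available" CPUs.
func minNUMANodes(cpusPerNUMANode, requested int) int {
	return (requested + cpusPerNUMANode - 1) / cpusPerNUMANode
}

func main() {
	fmt.Println(minNUMANodes(20, 48)) // 3 nodes at minimum
}
```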
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Now that the algorithm for balancing CPU distributions across NUMA nodes is
correct, this test actually behaves differently for the "packed" vs.
"distributed" allocation algorithms (as it should).
In the "packed" case we need to ensure that CPUs are allocated such that they
are packed onto cores. Since one CPU is already allocated from a core on NUMA
node 0, we want the next CPU to be its hyperthreaded pair (even though the
first available CPU id is on Socket 1).
In the "distributed" case, however, we want to ensure CPUs are allocated such
that we have a balanced distribution of CPUs across all NUMA nodes. This
points to allocating from Socket 1 if the only CPU allocated so far came
from Socket 0.
To allow CPU allocations to be packed onto full cores, one can allocate them
from the "distributed" algorithm with a 'cpuGroupSize' equal to the number of
hyperthreads per core (in this case 2). We added an explicit test case for this,
demonstrating that we get the same result as the "packed" algorithm does, even
though the "distributed" algorithm is in use.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
This fixes two related tests to better test our "balanced" distribution algorithm.
The first test originally provided an input with the following number of CPUs
available on each NUMA node:
Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 20
It then attempted to distribute 48 CPUs across them with an expectation that
each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node
0 with no more CPUs in the end).
This would have resulted in the following number of CPUs remaining on each node:
Node 0: 0
Node 1: 4
Node 2: 4
Node 3: 20
Which results in a standard deviation of 7.6811
However, a more balanced solution would actually be to pull 16 CPUs from NUMA
nodes 1, 2, and 3, and leave Node 0 untouched, i.e.:
Node 0: 16
Node 1: 4
Node 2: 4
Node 3: 4
Which results in a standard deviation of 5.1961524227066
To fix this test we changed the original number of available CPUs to start
with 4 fewer CPUs on NUMA node 3, and 2 more CPUs on NUMA node 0, i.e.:
Node 0: 18
Node 1: 20
Node 2: 20
Node 3: 16
So that we end up with a result of:
Node 0: 2
Node 1: 4
Node 2: 4
Node 3: 16
Which pulls the CPUs from where we want and results in a standard deviation of 5.5452
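The standard deviations quoted here are population standard deviations
over the per-node leftover CPU counts; a quick sketch to check the
numbers (not the kubelet's actual helper):

```go
package main

import (
	"fmt"
	"math"
)

// stddev computes the population standard deviation used in the
// comparisons above.
func stddev(xs []float64) float64 {
	var mean, sumSq float64
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	for _, x := range xs {
		sumSq += (x - mean) * (x - mean)
	}
	return math.Sqrt(sumSq / float64(len(xs)))
}

func main() {
	fmt.Println(stddev([]float64{0, 4, 4, 20})) // ≈ 7.6811
	fmt.Println(stddev([]float64{16, 4, 4, 4})) // ≈ 5.1962
	fmt.Println(stddev([]float64{2, 4, 4, 16})) // ≈ 5.5453
}
```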
For the second test, we simply reverse the number of CPUs available for Nodes 0
and 3 as:
Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 18
Which forces the allocation to happen just as it did for the first test, except
now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0, 1, and 2.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Previously these would return lists that were too long because we appended to
pre-initialized lists with a specific size.
Since the primary place these functions are used is in the mean and standard
deviation calculations for the NUMA distribution algorithm, it meant that the
results of these calculations were often incorrect.
As a result, some of the unit tests we have are actually incorrect (because the
results we expect do not actually produce the best balanced
distribution of CPUs across all NUMA nodes for the input provided).
These tests will be patched up in subsequent commits.
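A minimal sketch of the underlying Go pitfall (a generic example, not the
actual helper functions):

```go
package main

import "fmt"

// Appending to a pre-sized slice pads the front with zeros, so any mean
// or standard deviation computed over the result is skewed.
func main() {
	bad := make([]int, 3)      // [0 0 0]
	bad = append(bad, 1, 2, 3) // [0 0 0 1 2 3] -- twice as long as intended
	good := make([]int, 0, 3)  // length 0, capacity 3
	good = append(good, 1, 2, 3)
	fmt.Println(bad, good) // [0 0 0 1 2 3] [1 2 3]
}
```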
Signed-off-by: Kevin Klues <kklues@nvidia.com>
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. This fallback can be determined automatically by the kubelet.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
On systems where the calculated CPU shares result in a value above the
maximum value in Linux, containers getting that value are unable to start.
This occurs on systems with 300+ CPU cores where containers are given such
a value.
This issue was already fixed for the pod and QoS control groups by the
similar cm.MilliCPUToShares, which also has tests verifying the behavior.
Since this code already has a dependency on kubelet/cm, let's reuse that
code instead.
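For context, a sketch of the clamping that the reused helper performs;
the constants reflect the Linux cgroup v1 bounds for cpu.shares, and the
function body approximates cm.MilliCPUToShares rather than copying it
verbatim:

```go
package main

import "fmt"

const (
	minShares     = 2      // kernel minimum for cpu.shares
	maxShares     = 262144 // kernel maximum for cpu.shares
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares // this cap is what fixes the 300+ core case
	}
	return uint64(shares)
}

func main() {
	fmt.Println(milliCPUToShares(400 * 1000)) // 262144, not 409600
}
```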
The current memory notifier relies on reading `cgroup.event_control`,
which is unsupported on cgroup v2. For now, let's disable the feature on
cgroup v2.
Once kubernetes#104613 and kubernetes#104693
merge, we'll have an OS field in the pod spec. The kubelet should start
rejecting pods where pod.Spec.OS doesn't match the node's OS (determined
via runtime.GOOS).
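A minimal sketch of the check (the `pod.Spec.OS` field comes from the
cited PRs; the helper below is hypothetical):

```go
package main

import (
	"fmt"
	"runtime"
)

// admitPod sketches the described rejection; podOS stands in for
// pod.Spec.OS.Name, and the surrounding admission plumbing is omitted.
func admitPod(podOS string) error {
	if podOS != "" && podOS != runtime.GOOS {
		return fmt.Errorf("pod OS %q does not match node OS %q", podOS, runtime.GOOS)
	}
	return nil
}

func main() {
	fmt.Println(admitPod("windows")) // rejected on a linux node
}
```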
These three options are the ones from logs.AddFlags which are not deprecated.
Therefore it makes sense to make them available also via the configuration file
support in the one command which currently supports that (kubelet).
Long-term, all commands should use LoggingConfiguration, either with a
configuration file (as in kubelet) or via flags (kube-scheduler,
kube-apiserver, kube-controller-manager).
Short-term, both approaches have to be supported. As the majority of the
commands only use logs.AddFlags, that function by default continues to register
the flags and only leaves that to Options.AddFlags when explicitly requested.
A drive-by bug fix is done for log flushing: the periodic flushing called
klog.Flush and therefore missed explicit flushing of the newer logr
backend. This bug was never present in any released version of Kubernetes,
and therefore the fix is not submitted in a separate PR.
This feature graduated to GA in v1.11 and will always be enabled, so
there is no longer a need to check whether it is enabled.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
* De-share the Handler struct in core API
An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.
This never should have been shared. Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.
In the future I can also see adding lifecycle hooks that don't make
sense as probes. E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.
* Run update scripts
This is partially to allow the kube alpha tests to pass until CRI
implementations have support, but also to handle this error situation a
bit more elegantly.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
This commit adds an initial implementation of translating from the new CRI fields
to the /stats/summary PodStats object.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
The commit a8b8995ef2
changed the content of the data kubelet writes in the checkpoint.
Unfortunately, the checkpoint restore code was not updated,
so if we upgrade the kubelet from pre-1.20 to 1.20+, the
device manager can no longer restore its state correctly.
The only trace of this misbehaviour is this line in the
kubelet logs:
```
W0615 07:31:49.744770 4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA
```
If we hit this bug, the device allocation info is
indeed NOT up-to-date until the device plugins register
themselves again. This can take up to a few minutes, depending
on the specific device plugin.
While the device manager state is inconsistent:
1. the kubelet will NOT update the device availability to zero, so
the scheduler will send pods towards the inconsistent kubelet.
2. at pod admission time, the device manager allocation will not
trigger, so pods will be admitted without devices actually
being allocated to them.
To fix these issues, we add support to the device manager to
read pre-1.20 checkpoint data. We retroactively call this
format "v1".
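A minimal sketch of the versioned fallback idea, with illustrative types
standing in for the real checkpoint structs:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative types only: pre-1.20 ("v1") checkpoints stored DeviceIDs
// as a flat list, while the current format keys them by NUMA node --
// which is exactly why old data fails to unmarshal into the new struct.
type entryV1 struct {
	DeviceIDs []string
}

type entryV2 struct {
	DeviceIDs map[int64][]string
}

// decodeEntry tries the current layout first and falls back to v1, so a
// kubelet upgraded from pre-1.20 can still restore its state instead of
// running with inconsistent device accounting.
func decodeEntry(data []byte) (entryV2, error) {
	var v2 entryV2
	if err := json.Unmarshal(data, &v2); err == nil {
		return v2, nil
	}
	var v1 entryV1
	if err := json.Unmarshal(data, &v1); err != nil {
		return entryV2{}, err
	}
	// NUMA affinity is unknown in v1 data; park everything on node 0.
	return entryV2{DeviceIDs: map[int64][]string{0: v1.DeviceIDs}}, nil
}

func main() {
	out, err := decodeEntry([]byte(`{"DeviceIDs":["dev-a","dev-b"]}`)) // v1 layout
	fmt.Println(out, err)
}
```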
Signed-off-by: Francesco Romani <fromani@redhat.com>
Separate the CPU/memory request/limit -> Linux resource conversion into
its own function for better reuse.
Elsewhere in the kuberuntime package, we will want to leverage this
requests/limits-to-Linux-resources type conversion.
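A sketch of what the extracted helper computes, with illustrative types;
the share/quota math follows the standard Kubernetes milli-CPU
conversions rather than quoting the real function:

```go
package main

import "fmt"

// linuxResources mirrors the shape of the runtime's resource object:
// milli-CPU requests map to cpu.shares, milli-CPU limits to a cfs quota
// over the default 100ms period, and the memory limit is in bytes.
type linuxResources struct {
	CPUShares int64
	CPUQuota  int64
	CPUPeriod int64
	MemLimit  int64
}

func calculateLinuxResources(cpuReqMilli, cpuLimMilli, memLimBytes int64) linuxResources {
	const period = 100000 // cfs period in microseconds
	r := linuxResources{CPUPeriod: period, MemLimit: memLimBytes}
	r.CPUShares = cpuReqMilli * 1024 / 1000
	if cpuLimMilli > 0 {
		r.CPUQuota = cpuLimMilli * period / 1000
	}
	return r
}

func main() {
	fmt.Printf("%+v\n", calculateLinuxResources(500, 1000, 256<<20))
	// {CPUShares:512 CPUQuota:100000 CPUPeriod:100000 MemLimit:268435456}
}
```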
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.
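A minimal sketch of what the parameter enforces, with a hypothetical
helper (not the actual allocator):

```go
package main

import "fmt"

// alignToGroup rounds a CPU count down to a multiple of cpuGroupSize, so
// with cpuGroupSize=2 hyperthread siblings from the same core are always
// handed out together.
func alignToGroup(cpus, cpuGroupSize int) int {
	return (cpus / cpuGroupSize) * cpuGroupSize
}

func main() {
	fmt.Println(alignToGroup(5, 2)) // 4 -- the odd CPU is left for another node
}
```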
Signed-off-by: Kevin Klues <kklues@nvidia.com>
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
The first implements the original algorithm, which packs CPUs onto NUMA
nodes if more than one NUMA node is required to satisfy the allocation.
The second distributes CPUs across NUMA nodes if they can't all fit into
one.
The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option
Signed-off-by: Kevin Klues <kklues@nvidia.com>
This batch of tests adds a fake topology on which each NUMA node
has multiple sockets. We haven't yet found a real hardware topology in
the wild like this, but we need one to fully exercise the code.
So, until we find such a hardware topology, we add a fake one by flipping
the NUMA/socket config of the existing dual Xeon Gold 6320.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This batch of tests adds a real topology on which each physical socket
has multiple NUMA zones, taken from a real dual Xeon Gold 6320.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks).
Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Use `cmp.Diff` from the go-cmp package in the unit tests, moving away
from `reflect.DeepEqual`. This gives us a clearer picture of the
differences when the tests fail.
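The typical pattern looks like this (`cmp.Diff` is go-cmp's real API; the
helper wrapper is just an example):

```go
package example

import (
	"testing"

	"github.com/google/go-cmp/cmp"
)

// assertEqual shows the pattern: unlike reflect.DeepEqual, a failing
// cmp.Diff prints exactly which fields differ (-want +got).
func assertEqual(t *testing.T, want, got interface{}) {
	t.Helper()
	if diff := cmp.Diff(want, got); diff != "" {
		t.Errorf("unexpected result (-want +got):\n%s", diff)
	}
}
```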
Signed-off-by: Francesco Romani <fromani@redhat.com>
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.
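A minimal sketch of the pattern, with stub types standing in for the real
accumulator; the interface and implementation names are illustrative:

```go
package main

import "fmt"

// Stubs standing in for the real accumulator; the point is the pattern.
type cpuAccumulator struct{}

func (a *cpuAccumulator) takeFullNUMANodes() { fmt.Println("take full NUMA nodes") }
func (a *cpuAccumulator) takeFullSockets()   { fmt.Println("take full sockets") }

// Decide once which level is higher, then call through the interface so
// the hierarchy ordering is preserved everywhere else in the code.
type numaOrSocketsFirst interface {
	takeFullFirstLevel()
	takeFullSecondLevel()
}

type numaFirst struct{ acc *cpuAccumulator }

func (n numaFirst) takeFullFirstLevel()  { n.acc.takeFullNUMANodes() }
func (n numaFirst) takeFullSecondLevel() { n.acc.takeFullSockets() }

type socketsFirst struct{ acc *cpuAccumulator }

func (s socketsFirst) takeFullFirstLevel()  { s.acc.takeFullSockets() }
func (s socketsFirst) takeFullSecondLevel() { s.acc.takeFullNUMANodes() }

func main() {
	acc := &cpuAccumulator{}
	var order numaOrSocketsFirst = numaFirst{acc} // or socketsFirst{acc}
	order.takeFullFirstLevel()
	order.takeFullSecondLevel()
}
```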
Signed-off-by: Kevin Klues <kklues@nvidia.com>
The feature gate gets locked to "true", with the goal to remove it in two
releases.
All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.
Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
This implements the replacement of klog output to different files per level
with optionally splitting JSON output into two streams: one for info messages
on stdout, one for error messages on stderr. The info messages can get buffered
to increase performance. Because stdout and stderr might be merged by the
consumer, the info stream gets flushed before writing an error, to ensure that
the order of messages is preserved.
This also ensures that the following code pattern doesn't leak info messages:
    klog.ErrorS(err, ...)
    os.Exit(1)
Commands explicitly have to flush before exiting via logs.FlushLogs. Most
already do. But buffered info messages can still get lost during an unexpected
program termination, therefore buffering is off by default.
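A sketch of the safe exit pattern using the flushing helper named above
(`logs.FlushLogs` and `klog.ErrorS` are real; the `fatal` wrapper is
hypothetical):

```go
package main

import (
	"errors"
	"os"

	"k8s.io/component-base/logs"
	"k8s.io/klog/v2"
)

// fatal flushes buffered logs (including the newer logr backend, via
// logs.FlushLogs) before terminating, so the klog.ErrorS + os.Exit
// pattern no longer loses buffered info messages.
func fatal(err error) {
	klog.ErrorS(err, "command failed")
	logs.FlushLogs()
	os.Exit(1)
}

func main() {
	fatal(errors.New("example"))
}
```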
The new options get added to the v1alpha1 LoggingConfiguration with new command
line flags. Because it is an alpha field, changing it inside the v1beta kubelet
config should be okay as long as the fields are clearly marked as alpha.
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
* Use utilpointer to get a pointer
* Add tests for kubelet default configs
* Change copyright year from 2015 to 2021
* Run gofmt
* Add all negative and all positive test cases
We graduate the `CPUManagerPolicyOptions` feature to beta
in the 1.23 cycle, and we add new experimental feature gates
to guard new options which are planned for the 1.23 and
following cycles.
We introduce additional feature gates called `CPUManagerPolicyAlphaOptions`
and `CPUManagerPolicyBetaOptions`. The basic idea is to avoid the
cumbersome process of adding a feature gate for each option, and to have
feature gates which track the maturity level of _groups_ of options.
Besides this change, the graduation process, and the process in general,
for adding new policy options is still unchanged.
The `full-pcpus-only` option added in the 1.22 cycle is intentionally
moved into the beta policy options.
For more details:
- KEP: https://github.com/kubernetes/enhancements/pull/2933
- sig-arch discussion:
https://groups.google.com/u/1/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw
Signed-off-by: Francesco Romani <fromani@redhat.com>
GetAllocatableDevices, which is needed to support the podresources
API, doesn't take the device health into account when computing
its output.
In this PR we address this gap and add unit tests along the way
to prevent regressions. This gives us a good initial coverage.
E2E tests to cover this case are much harder to write, because
we would need to inject faults to trigger the unhealthy status.
We will evaluate adding these tests in later PRs.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Remove the VolumeSubpath feature gate.
The feature gate convention has been updated since this was introduced to
indicate that feature gates "are intended to be deprecated and removed
after a feature becomes GA or is dropped."
If a pod is killed (no longer wanted) and then a subsequent create/
add/update event is seen in the pod worker, assume that a pod UID
was reused (as it could be in static pods) and have the next
SyncKnownPods after the pod terminates remove the worker history so
that the config loop can restart the static pod, as well as return
to the caller the fact that this termination was not final.
The housekeeping loop then reconciles the desired state of the Kubelet
(pods in pod manager that are not in a terminal state, i.e. admitted
pods) with the pod worker by resubmitting those pods. This adds a
small amount of latency (2s) when a pod UID is reused and the pod
is terminated and restarted.
A pod that has been rejected by admission will have status manager
set the phase to Failed locally, which may take some time to
propagate to the apiserver. The rejected pod will be included in
admission until the apiserver propagates the change back, which
was an unintended regression when checking pod worker state as
authoritative.
A pod that is terminal in the API may still be consuming resources
on the system, so it should still be included in admission.
Prevent pods whose resources can be satisfied by a single NUMA node from
being started across multiple NUMA nodes.
The code returned before it updated the minimal number of NUMA nodes that
can satisfy the container requests.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The configuration is deprecated and targets removal in v1.23. Test
cases have been changed as well.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Fixes two issues with how the pod worker refactor calculated the
pods that admission could see (GetActivePods() and
filterOutTerminatedPods()):
First, completed pods must be filtered from the "desired" state
for admission, which arguably should be happening earlier in
config. Exclude the two terminal pod states from GetActivePods().
Second, the previous check introduced with the pod worker lifecycle
ownership changes was subtly wrong for the admission use case.
Admission has to include pods that haven't yet hit the pod worker,
which CouldHaveRunningContainers was filtering out (because the
pod worker hasn't seen them). Introduce a weaker check -
IsPodKnownTerminated() - that returns true only if the pod is in
a known terminated state (no running containers AND known to pod
worker). This weaker check may only be called from components that
need admitted pods, not other kubelet subsystems.
This commit does not fix the long standing bug that force deleted
pods are omitted from admission checks, which must be fixed by
having GetActivePods() also include pods "still terminating".
If a device plugin returns a device without topology, keep it internally
as NUMA node -1. This helps at the podresources level to not export NUMA
topology; otherwise the topology is exported with NUMA node id 0,
which is not accurate.
It's impossible to uncover this bug just by tracing json.Marshal(resp)
in the podresources client, because the NUMANode field ID has the JSON
property omitempty, so when ID=0 it is shown as an empty NUMANode.
To reproduce it, it is better to iterate over the devices and just
trace dev.Topology.Nodes[0].ID.
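A small demonstration of why the omitempty tag hides the bug
(illustrative struct mirroring the podresources field):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NUMANode mirrors the podresources field with its omitempty tag: a zero
// ID disappears from the JSON entirely, so a device wrongly pinned to
// NUMA node 0 looks the same as one with no topology at all.
type NUMANode struct {
	ID int64 `json:"ID,omitempty"`
}

func main() {
	b, _ := json.Marshal(NUMANode{ID: 0})
	fmt.Println(string(b)) // {} -- the spurious "NUMA node 0" is invisible
	b, _ = json.Marshal(NUMANode{ID: 1})
	fmt.Println(string(b)) // {"ID":1}
}
```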
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
This is a knob added by runc 1.0.2 specifically for Kubernetes,
which tells the runc/libcontainer/cgroups/systemd v1 manager not to
freeze the cgroup in Set().
We set this knob here because this code is only used for pod
(rather than container) management, and in this place we create or
update the pod cgroup with no device limits set, so we can skip the
freeze.
If this knob is not set, libcontainer's cgroup v1 manager tries to
figure out whether the freeze is needed or not, but it's a somewhat
expensive check to perform, thus the knob is a shortcut.
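A sketch of where the knob gets set when building the pod cgroup config;
the exact field placement in libcontainer's configs package is an
assumption based on the description above:

```go
package main

import "github.com/opencontainers/runc/libcontainer/configs"

// newPodCgroupConfig sketches the idea: pod-level cgroups carry no device
// limits, so the v1 manager's freeze-on-Set can be skipped outright.
func newPodCgroupConfig() *configs.Cgroup {
	return &configs.Cgroup{
		Resources: &configs.Resources{
			SkipFreezeOnSet: true, // runc 1.0.2 knob; avoids the expensive check
		},
	}
}

func main() { _ = newPodCgroupConfig() }
```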
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>