Revert to previous behavior in 1.21/1.20 of setting pod phase to failed
during graceful node shutdown.
Setting pods to failed phase will ensure that external controllers that
manage pods like deployments will create new pods to replace those that
are shut down. Many customers have taken a dependency on this behavior,
and it was a breaking change in 1.22, so this change reverts to the
previous behavior.
Signed-off-by: David Porter <david@porter.me>
We should not touch the dockershim ahead of removal and therefore
default to `v1alpha2` CRI instead of `v1`.
Partially reverts changes from https://github.com/kubernetes/kubernetes/pull/106501
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Without this fix, the algorithm may decide to allocate "remainder" CPUs from a
NUMA node that has no more CPUs to allocate. Moreover, it was only considering
allocation of remainder CPUs from NUMA nodes such that each NUMA node in the
remainderSet could only allocate 1 (i.e. 'cpuGroupSize') more CPUs. With these
two issues in play, one could end up with an accounting error where not enough
CPUs were allocated by the time the algorithm runs to completion.
The updated algorithm will now omit any NUMA nodes that have 0 CPUs left from
the set of NUMA nodes considered for allocating remainder CPUs. Additionally,
we now consider *all* combinations of nodes from the remainder set of size
1..len(remainderSet). This allows us to find a better solution if allocating
CPUs from a smaller set leads to a more balanced allocation. Finally, we loop
through all NUMA nodes 1-by-1 in the remainderSet until all remainder CPUs have
been accounted for and allocated. This ensures that we will not hit an
accounting error later on because we explicitly remove CPUs from the remainder
set until there are none left.
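In rough Go terms, the updated remainder handling looks something like this (a
minimal sketch with a hypothetical availableCPUs map and a local combinations
helper; the real logic lives in the cpumanager package):
```
package main

import "fmt"

// combinations returns every k-sized subset of nodes.
func combinations(nodes []int, k int) [][]int {
	if k == 0 {
		return [][]int{{}}
	}
	if len(nodes) < k {
		return nil
	}
	var out [][]int
	// Subsets that include nodes[0]...
	for _, tail := range combinations(nodes[1:], k-1) {
		out = append(out, append([]int{nodes[0]}, tail...))
	}
	// ...plus subsets that do not.
	return append(out, combinations(nodes[1:], k)...)
}

func main() {
	// CPUs still available per NUMA node; node 0 is exhausted and must
	// be omitted from the remainder set.
	availableCPUs := map[int]int{0: 0, 1: 2, 2: 4}

	var remainderSet []int
	for node, cpus := range availableCPUs {
		if cpus > 0 {
			remainderSet = append(remainderSet, node)
		}
	}

	// Consider *all* combination sizes, not just len(remainderSet).
	for k := 1; k <= len(remainderSet); k++ {
		for _, combo := range combinations(remainderSet, k) {
			// ...score each candidate by how balanced the resulting
			// distribution would be, keeping the best one...
			fmt.Println("candidate remainder set:", combo)
		}
	}
}
```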
A follow-on commit adds a set of unit tests that will fail before these
changes, but succeed after them.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Previously the algorithm was too restrictive because it tried to calculate the
minimum based on the number of *available* NUMA nodes and the number of
*available* CPUs on those NUMA nodes. Since there was no (easy) way to tell how
many CPUs an individual NUMA node happened to have, the average across them was
used. Using this value, however, could result in thinking you need more NUMA
nodes to satisfy a request than you actually do.
By using the *total* number of NUMA nodes and CPUs per NUMA node, we can get
the true minimum number of nodes required to satisfy a request. For a given
"current" allocation this may not be the true minimum, but its better to start
with fewer and move up than to start with too many and miss out on a better
option.
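A minimal sketch of the relaxed lower bound (assuming uniform NUMA nodes, which
is what using the totals implies):
```
package sketch

// minNUMANodesNeeded computes the lower bound described above:
// ceil(request / cpusPerNUMANode), using totals rather than availability.
func minNUMANodesNeeded(request, cpusPerNUMANode int) int {
	return (request + cpusPerNUMANode - 1) / cpusPerNUMANode
}
```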
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Now that the algorithm for balancing CPU distributions across NUMA nodes is
correct, this test actually behaves differently for the "packed" vs.
"distributed" allocation algorithms (as it should).
In the "packed" case we need to ensure that CPUs are allocated such that they
are packed onto cores. Since one CPU is already allocated from a core on NUMA
node 0, we want the next CPU to be its hyperthreaded pair (even though the
first available CPU id is on Socket 1).
In the "distributed" case, however, we want to ensure CPUs are allocated such
that we have a balanced distribution of CPUs across all NUMA nodes. This
means allocating from Socket 1 if the only CPU allocated so far came from
Socket 0.
To allow CPU allocations to be packed onto full cores, one can use the
"distributed" algorithm with a 'cpuGroupSize' equal to the number of
hyperthreads per core (in this case 2). We added an explicit test case for this,
demonstrating that we get the same result as the "packed" algorithm does, even
though the "distributed" algorithm is in use.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
This fixes two related tests to better test our "balanced" distribution algorithm.
The first test originally provided an input with the following number of CPUs
available on each NUMA node:
Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 20
It then attempted to distribute 48 CPUs across them with an expectation that
each of the first 3 NUMA nodes would have 16 CPUs taken from them (leaving Node
0 with no more CPUs in the end).
This would have resulted in the following amount of CPUs on each node:
Node 0: 0
Node 1: 4
Node 2: 4
Node 3: 20
Which results in a standard deviation of 7.6811
However, a more balanced solution would actually be to pull 16 CPUs from NUMA
nodes 1, 2, and 3, and leave 0 untouched, i.e.:
Node 0: 16
Node 1: 4
Node 2: 4
Node 3: 4
Which results in a standard deviation of 5.1961524227066
To fix this test we changed the original number of available CPUs to start with
4 fewer CPUs on NUMA node 3, and 2 more CPUs on NUMA node 0, i.e.:
Node 0: 18
Node 1: 20
Node 2: 20
Node 3: 16
So that we end up with a result of:
Node 0: 2
Node 1: 4
Node 2: 4
Node 3: 16
Which pulls the CPUs from where we want and results in a standard deviation of 5.5452
For the second test, we simply reverse the number of CPUs available for Nodes 0
and 3 as:
Node 0: 16
Node 1: 20
Node 2: 20
Node 3: 18
Which forces the allocation to happen just as it did for the first test, except
now on NUMA nodes 1, 2, and 3 instead of NUMA nodes 0, 1, and 2.
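For reference, the quoted figures are population standard deviations over the
per-node CPU counts and can be reproduced with a few lines of Go:
```
package main

import (
	"fmt"
	"math"
)

// stddev computes the population standard deviation of xs.
func stddev(xs []float64) float64 {
	mean := 0.0
	for _, x := range xs {
		mean += x
	}
	mean /= float64(len(xs))
	variance := 0.0
	for _, x := range xs {
		variance += (x - mean) * (x - mean)
	}
	return math.Sqrt(variance / float64(len(xs)))
}

func main() {
	fmt.Println(stddev([]float64{0, 4, 4, 20})) // ≈ 7.6811
	fmt.Println(stddev([]float64{16, 4, 4, 4})) // ≈ 5.1962
	fmt.Println(stddev([]float64{2, 4, 4, 16})) // ≈ 5.5452
}
```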
Signed-off-by: Kevin Klues <kklues@nvidia.com>
Previously these would return lists that were too long because we appended to
pre-initialized lists with a specific size.
Since the primary place these functions are used is in the mean and standard
deviation calculations for the NUMA distribution algorithm, it meant that the
results of these calculations were often incorrect.
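The pitfall is the classic Go mistake of pre-sizing a slice with make() and
then appending to it; a minimal illustration (not the actual helper):
```
package sketch

// make() with a non-zero length pre-fills the slice with zeros, so
// append() grows it past the intended size instead of filling it.
func rowMeans(rows [][]float64) []float64 {
	result := make([]float64, len(rows)) // BUG: length, should be capacity
	for _, row := range rows {
		sum := 0.0
		for _, v := range row {
			sum += v
		}
		result = append(result, sum/float64(len(row))) // lands after the zeros
	}
	return result // twice as long as intended, with leading zeros
	// Fix: make([]float64, 0, len(rows)), or assign by index.
}
```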
As a result, some of the unit tests we have are actually incorrect (because the
results we expect do not actually produce the best balanced
distribution of CPUs across all NUMA nodes for the input provided).
These tests will be patched up in subsequent commits.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
This patch makes the CRI `v1` API the new project-wide default version.
To allow backwards compatibility, a fallback to `v1alpha2` has been added
as well. This fallback is automatically determined by the kubelet.
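The negotiation shape is roughly (a sketch; the real logic lives in the
kubelet's remote runtime client):
```
package sketch

// Probe the CRI v1 API first; if the runtime does not serve it, fall
// back to v1alpha2.
func negotiateCRIVersion(probeV1 func() error) string {
	if err := probeV1(); err == nil {
		return "v1"
	}
	return "v1alpha2"
}
```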
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
On systems where the calculated cpu shares value is above the
maximum value in Linux, containers given that value are unable to start.
This occurs on systems with 300+ CPU cores, where containers are
given such a value.
This issue was fixed for the pod and QoS control groups in the similar
cm.MilliCPUToShares, which also has tests verifying the behavior. Since
this code already has a dependency on kubelet/cm, let's reuse that code
instead.
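For reference, the reused helper clamps to the kernel limits for cpu.shares
roughly as follows (a sketch; the constants mirror the ones used by kubelet/cm):
```
package sketch

const (
	minShares     = 2
	maxShares     = 262144 // kernel maximum for cpu.shares
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

// milliCPUToShares converts a CPU request to cpu.shares, clamping to the
// kernel's accepted range. 300 CPUs would yield 307200 shares, which the
// kernel rejects; clamping to 262144 lets the container start.
func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := (milliCPU * sharesPerCPU) / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}
```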
The current memory notifier relies on reading `cgroup.event_control`,
which is unsupported on cgroupv2. For now, let's disable the feature on
cgroupv2.
Once kubernetes#104613 and kubernetes#104693
merge, we'll have an OS field in the pod spec. The kubelet should start
rejecting pods where pod.Spec.OS and the node's OS (per runtime.GOOS) don't match.
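The resulting admission check boils down to something like this (a sketch; the
exact field shape comes from the linked PRs):
```
package sketch

import "runtime"

// rejectPodOS reports whether a pod should be rejected because its
// declared OS does not match the node's OS. An empty OS means the pod
// does not care and is admitted.
func rejectPodOS(podOS string) bool {
	return podOS != "" && podOS != runtime.GOOS
}
```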
These three options are the ones from logs.AddFlags which are not deprecated.
Therefore it makes sense to make them available also via the configuration file
support in the one command which currently supports that (kubelet).
Long-term, all commands should use LoggingConfiguration, either with a
configuration file (as in kubelet) or via flags (kube-scheduler,
kube-apiserver, kube-controller-manager).
Short-term, both approaches have to be supported. As the majority of the
commands only use logs.AddFlags, that function by default continues to register
the flags and only leaves that to Options.AddFlags when explicitly requested.
A drive-by bug fix is done for log flushing: the periodic flushing called
klog.Flush and therefore missed explicit flushing of the newer logr
backend. This bug was never present in any released version of Kubernetes and
therefore the fix is not submitted in a separate PR.
This feature graduated to GA in v1.11 and will always be
enabled, so there is no longer a need to check whether it is enabled.
Signed-off-by: Konstantin Misyutin <konstantin.misyutin@huawei.com>
* De-share the Handler struct in core API
An upcoming PR adds a handler that only applies on one of these paths.
Having fields that don't work seems bad.
This never should have been shared. Lifecycle hooks are like a "write"
while probes are more like a "read". HTTPGet and TCPSocket don't really
make sense as lifecycle hooks (but I can't take that back). When we add
gRPC, it is EXPLICITLY a health check (defined by gRPC) not an arbitrary
RPC - so a probe makes sense but a hook does not.
In the future I can also see adding lifecycle hooks that don't make
sense as probes. E.g. 'sleep' is a common lifecycle request. The only
option is `exec`, which requires having a sleep binary in your image.
* Run update scripts
This is partially to allow the kube alpha tests to pass until CRI implementations
have support, but also to handle this error situation a bit more elegantly.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
This commit adds an initial implementation of translating from the new CRI fields
to the /stats/summary PodStats object
Signed-off-by: Peter Hunt <pehunt@redhat.com>
The commit a8b8995ef2
changed the content of the data kubelet writes in the checkpoint.
Unfortunately, the checkpoint restore code was not updated,
so if we upgrade the kubelet from pre-1.20 to 1.20+, the
device manager can no longer restore its state correctly.
The only trace of this misbehaviour is this line in the
kubelet logs:
```
W0615 07:31:49.744770 4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA
```
If we hit this bug, the device allocation info is
indeed NOT up-to-date until the device plugins register
themselves again. This can take up to a few minutes, depending
on the specific device plugin.
While the device manager state is inconsistent:
1. the kubelet will NOT update the device availability to zero, so
the scheduler will send pods towards the inconsistent kubelet.
2. at pod admission time, the device manager allocation will not
trigger, so pods will be admitted without devices actually
being allocated to them.
To fix these issues, we add support to the device manager to
read pre-1.20 checkpoint data. We retroactively call this
format "v1".
Signed-off-by: Francesco Romani <fromani@redhat.com>
Separate the CPU/memory request/limit to Linux resource conversion into its
own function for better reuse.
Elsewhere in the kuberuntime package, we will want to leverage this
requests/limits to Linux Resources type conversion.
Signed-off-by: Eric Ernst <eric_ernst@apple.com>
This parameter ensures that CPUs are always allocated in groups of size
'cpuGroupSize'. This is important, for example, to ensure that all CPUs (i.e.
hyperthreads) from the same core are handed out together.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
As part of this, pull out all of the existing "TakeByTopology" tests and have
them be called by the original TestTakeByTopologyNUMAPacked() as well as the
new TestTakeByTopologyNUMADistributed() test. In a subsequent commit, we will
add some tests that should differ between these two algorithms.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
The first implements the original algorithm which packs CPUs onto NUMA nodes if
more than one NUMA node is required to satisfy the allocation. The second
distributes CPUs across NUMA nodes if they can't all fit into one.
The "distributing" algorithm is currently a noop and just returns an error of
"unimplemented". A subsequent commit will add the logic to implement this
algorithm according to KEP 2902:
https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2902-cpumanager-distribute-cpus-policy-option
Signed-off-by: Kevin Klues <kklues@nvidia.com>
This batch of tests adds a fake topology in which each NUMA node
has multiple sockets. We have not yet found a real HW topology like this
in the wild, but we need one to fully exercise the code.
So, until we find such a HW topology, we add a fake one by flipping
the NUMA/socket config of the existing dual xeon 6320 gold.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This batch of tests adds a real topology in which each physical socket
has multiple NUMA zones, taken from a real dual xeon 6320 gold.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The existing unit tests were performing subtests without
actually using the full features of the testing package
(https://pkg.go.dev/testing#hdr-Subtests_and_Sub_benchmarks).
Update them with fairly minimal changes. The patch is deceptively
large because we need to move the code inside a new block.
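The updated tests follow the standard t.Run shape (illustrative example, not
the real test table):
```
package cpumanager

import "testing"

// Illustrative only: the real tables live in the cpumanager tests.
func TestExampleSubtests(t *testing.T) {
	testCases := []struct {
		description string
		in, want    int
	}{
		{"doubles small input", 2, 4},
		{"doubles zero", 0, 0},
	}
	for _, tc := range testCases {
		// t.Run gives each case its own name, isolation, and -run filtering.
		t.Run(tc.description, func(t *testing.T) {
			if got := tc.in * 2; got != tc.want {
				t.Errorf("got %d, want %d", got, tc.want)
			}
		})
	}
}
```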
Signed-off-by: Francesco Romani <fromani@redhat.com>
Use `cmp.Diff` in the unit tests, moving away from
`reflect.DeepEqual`. This gives us a clearer picture of the differences
when the tests fail.
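The typical pattern in a test looks like this:
```
package sketch

import (
	"testing"

	"github.com/google/go-cmp/cmp"
)

// assertEqual fails the test with a readable delta instead of a bare
// "not equal" message.
func assertEqual(t *testing.T, want, got interface{}) {
	t.Helper()
	if diff := cmp.Diff(want, got); diff != "" {
		t.Errorf("unexpected result (-want +got):\n%s", diff)
	}
}
```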
Signed-off-by: Francesco Romani <fromani@redhat.com>
This allows us to get rid of the check for determining which one is higher all
throughout the code. Now we just check once and instantiate an interface of the
appropriate type that makes sure the ordering in the hierarchy is preserved
through the appropriate calls.
Signed-off-by: Kevin Klues <kklues@nvidia.com>
The feature gate gets locked to "true", with the goal to remove it in two
releases.
All code now can assume that the feature is enabled. Tests for "feature
disabled" are no longer needed and get removed.
Some code wasn't using the new helper functions yet. That gets changed while
touching those lines.
This implements the replacement of klog output to different files per level
with optionally splitting JSON output into two streams: one for info messages
on stdout, one for error messages on stderr. The info messages can get buffered
to increase performance. Because stdout and stderr might be merged by the
consumer, the info stream gets flushed before writing an error, to ensure that
the order of messages is preserved.
This also ensures that the following code pattern doesn't leak info messages:
```
klog.ErrorS(err, ...)
os.Exit(1)
```
Commands explicitly have to flush before exiting via logs.FlushLogs. Most
already do. But buffered info messages can still get lost during an unexpected
program termination, therefore buffering is off by default.
The new options get added to the v1alpha1 LoggingConfiguration with new command
line flags. Because it is an alpha field, changing it inside the v1beta kubelet
config should be okay as long as the fields are clearly marked as alpha.
The name concatenation and ownership check were originally considered small
enough to not warrant dedicated functions, but the intent of the code is more
readable with them.
When adding the ephemeral volume feature, the special case for
PersistentVolumeClaim volume sources in kubelet's host path and node
limits checks was overlooked. An ephemeral volume source is another
way of referencing a claim and has to be treated the same way.
* Use utilpointer to get a pointer
* Add tests for kubelet default configs
* Change copyright year from 2015 to 2021
* Run gofmt
* Add all negative and all positive test cases
We graduate the `CPUManagerPolicyOptions` feature to beta
in the 1.23 cycle, and we add new experimental feature gates
to guard new options which are planned for the 1.23 and
following cycles.
We introduce additional feature gates called `CPUManagerPolicyAlphaOptions` and
`CPUManagerPolicyBetaOptions`. The basic idea is to avoid the
cumbersome process of adding a feature gate for each option, and to have
feature gates which track the maturity level of _groups_ of options.
Besides this change, the graduation process, and the process in general,
for adding new policy options is still unchanged.
The `full-pcpus-only` option added in the 1.22 cycle is intentionally
moved into the beta policy options.
For more details:
- KEP: https://github.com/kubernetes/enhancements/pull/2933
- sig-arch discussion:
https://groups.google.com/u/1/g/kubernetes-sig-architecture/c/Nxsc7pfe5rw
Signed-off-by: Francesco Romani <fromani@redhat.com>
The GetAllocatableDevices function, needed to support the podresources
API, doesn't take device health into account when computing
its output.
In this PR we address this gap and add unit tests along the way
to prevent regressions. This gives us good initial coverage;
E2E tests to cover this case are much harder to write, because
we would need to inject faults to trigger the unhealthy status.
We will evaluate adding these tests in later PRs.
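The fix boils down to filtering on device health, roughly (a sketch with an
assumed device shape; "Healthy" matches the device plugin API constant):
```
package sketch

type device struct {
	ID     string
	Health string // "Healthy"/"Unhealthy", per the device plugin API
}

// allocatableDeviceIDs returns only the devices that should count toward
// the allocatable set.
func allocatableDeviceIDs(devs []device) []string {
	var ids []string
	for _, d := range devs {
		if d.Health == "Healthy" {
			ids = append(ids, d.ID)
		}
	}
	return ids
}
```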
Signed-off-by: Francesco Romani <fromani@redhat.com>
Remove the VolumeSubpath feature gate.
Feature gate convention has been updated since this was introduced to
indicate that they "are intended to be deprecated and removed after a
feature becomes GA or is dropped.".
If a pod is killed (no longer wanted) and then a subsequent create/
add/update event is seen in the pod worker, assume that a pod UID
was reused (as it could be in static pods) and have the next
SyncKnownPods after the pod terminates remove the worker history so
that the config loop can restart the static pod, as well as return
to the caller the fact that this termination was not final.
The housekeeping loop then reconciles the desired state of the Kubelet
(pods in pod manager that are not in a terminal state, i.e. admitted
pods) with the pod worker by resubmitting those pods. This adds a
small amount of latency (2s) when a pod UID is reused and the pod
is terminated and restarted.
A pod that has been rejected by admission will have status manager
set the phase to Failed locally, which may take some time to
propagate to the apiserver. The rejected pod will be included in
admission until the apiserver propagates the change back, which
was an unintended regression when checking pod worker state as
authoritative.
A pod that is terminal in the API may still be consuming resources
on the system, so it should still be included in admission.
Prevent pods whose resources can be satisfied by a single NUMA node from being
started across multiple NUMA nodes.
The code returned before it updated the minimum number of NUMA nodes that can
satisfy the container requests.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The configuration is deprecated and targeted for removal in v1.23. Test
cases have been changed as well.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Fixes two issues with how the pod worker refactor calculated the
pods that admission could see (GetActivePods() and
filterOutTerminatedPods())
First, completed pods must be filtered from the "desired" state
for admission, which arguably should be happening earlier in
config. Exclude the two terminal pod states from GetActivePods().
Second, the previous check introduced with the pod worker lifecycle
ownership changes was subtly wrong for the admission use case.
Admission has to include pods that haven't yet hit the pod worker,
which CouldHaveRunningContainers was filtering out (because the
pod worker hasn't seen them). Introduce a weaker check -
IsPodKnownTerminated() - that returns true only if the pod is in
a known terminated state (no running containers AND known to pod
worker). This weaker check may only be called from components that
need admitted pods, not other kubelet subsystems.
This commit does not fix the long standing bug that force deleted
pods are omitted from admission checks, which must be fixed by
having GetActivePods() also include pods "still terminating".
If a device plugin returns a device without topology, keep it internally
as NUMA node -1; this allows the podresources level to not export NUMA
topology for it. Otherwise the topology is exported with NUMA node id 0,
which is not accurate.
It's impossible to reveal this bug just by tracing json.Marshal(resp)
in the podresources client, because the NUMANodes field ID has the json
property omitempty, so an ID of 0 shows up as an empty NUMANode.
To reproduce it, it is better to iterate over the devices and just
trace dev.Topology.Nodes[0].ID.
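A minimal reproduction of the masking effect (a sketch of the field shape,
not the exact generated code):
```
package main

import (
	"encoding/json"
	"fmt"
)

type NUMANode struct {
	ID int64 `json:"ID,omitempty"`
}

func main() {
	out, _ := json.Marshal(NUMANode{ID: 0})
	fmt.Println(string(out)) // prints {} - the bogus ID 0 is invisible
}
```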
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
This is a knob added by runc 1.0.2 specifically for kubernetes,
which tells runc/libcontainer/cgroups/systemd v1 manager to not
freeze the cgroup in Set().
We set this knob here because this code is only used for pods
(rather than containers) management, and in this place we create or
update the pod cgroup with no device limits set, so we can skip the
freeze.
If this knob is not set, libcontainer's cgroup v1 manager tries to
figure out whether the freeze is needed or not, but it's a somewhat
expensive check to perform, thus the knob is a shortcut.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
* Change uses of whitelist to allowlist in kubelet sysctl
* Rename whitelist files to allowlist in Kubelet sysctl
* Further renames of whitelist to allowlist in Kubelet
* Rename podsecuritypolicy uses of whitelist to allowlist
* Update pkg/kubelet/kubelet.go
Co-authored-by: Danielle <dani@builds.terrible.systems>
The Kubelet always clears reason and message in generateAPIPodStatus
even when the phase is unchanged. It is reasonable that we preserve
the previous values when the phase does not change, and clear them
when the phase does change.
When a pod is evicted, this ensures that the eviction message and
reason are propagated even in the face of subsequent updates. It also
preserves the message and reason if components beyond the Kubelet
choose to set that value.
To preserve the value we need to know the old phase, which requires
a change to convertStatusToAPIStatus so that both methods have
access to it.
If a pod is already in terminated and the housekeeping loop sees an
out of date cache entry for a running container, the pod worker
should ignore that running pod termination request. Once the worker
completes, a subsequent housekeeping invocation will then invoke
terminating because the worker is no longer processing any pod
with that UID.
This does leave the possibility of syncTerminatedPod being blocked
if a container in the pod is started after killPod successfully
completes but before syncTerminatedPod can exit successfully,
perhaps because the terminated flow (detach volumes) is blocked on
that running container. A future change will address that issue.
Prevent the Kubelet from incorrectly interpreting "not yet started" pods as
"ready to terminate" pods by unifying responsibility for the pod lifecycle into
the pod worker.
Mark the volume mount in actual state even if volume expansion fails, so that
the reconciler can tear down the volume when needed. To avoid pods starting
to use it, mark the volume as uncertain instead of mounted.
Will add unit tests after the logic is reviewed.
Change-Id: I5aebfa11ec93235a87af8f17bea7f7b1570b603d
Consume the cpu manager policy options from the cpumanager instance
in the static policy.
Validate in the none policy that no options are given, and fail if any are -
this is almost surely a configuration mistake.
Add a new cpumanager.Options type to hold the options and translate from
user arguments to flags.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Introduce a new `admission` subpackage to factor out the responsibility
of creating `PodAdmitResult` objects. This enables resource managers
to report specific errors in Allocate() and to bubble them up
in the relevant fields of the `PodAdmitResult`.
To demonstrate the approach we refactor TopologyAffinityError as a
proper error.
Co-authored-by: Kevin Klues <kklues@nvidia.com>
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
The CPUManagerPolicyOptions received from the kubelet config/command line args
is propagated to the Container Manager.
We defer the consumption of the options to a later patch(set).
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Files generated after running `make generated_files`.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
In this patch we enhance the kubelet configuration to support
cpuManagerPolicyOptions.
In order to introduce SMT-awareness in the CPU Manager, we introduce a
new Kubelet flag, `cpumanager-policy-options`, which allows the user to
modify the behaviour of the static policy to strictly guarantee the
allocation of whole cores.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
oomwatcher.NewWatcher returns "open /dev/kmsg: operation not permitted" error,
when running with sysctl value `kernel.dmesg_restrict=1`.
The error is negligible for KubeletInUserNamespace.
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Errors during setting the following sysctl values are ignored:
- vm.overcommit_memory
- vm.panic_on_oom
- kernel.panic
- kernel.panic_on_oops
- kernel.keys.root_maxkeys
- kernel.keys.root_maxbytes
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Runtimes may return an arbitrary number of Pod IPs; however, kubernetes
only takes into consideration the first one of each IP family.
The order of the IPs is the one defined by the Kubelet:
- prefer IPv4 by default
- if NodeIPs are defined, match the first nodeIP's family
PodIP is always the first IP of PodIPs.
The downward API must expose the same IPs, in the same order, as the
pod.Status API object.
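A sketch of the per-family selection described above (illustrative only; the
real code also honors the configured node IPs):
```
package main

import (
	"fmt"
	"net"
)

// firstByFamily keeps only the first IP of each family, preserving the
// kubelet-defined order.
func firstByFamily(ips []string) []string {
	var out []string
	seen := map[bool]bool{} // keyed by "is IPv4"
	for _, s := range ips {
		ip := net.ParseIP(s)
		if ip == nil {
			continue
		}
		v4 := ip.To4() != nil
		if !seen[v4] {
			seen[v4] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	fmt.Println(firstByFamily([]string{"10.0.0.2", "10.0.0.3", "fd00::2"}))
	// [10.0.0.2 fd00::2] - PodIP would be 10.0.0.2
}
```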
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.
Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
Removing containers no longer blocks final pod deletion in the
API server and is handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.
See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
- provide tests for static policy allocation, when init containers
requested memory bigger than the memory requested by app containers
- provide tests for static policy allocation, when init containers
requested memory smaller than the memory requested by app containers
- provide tests to verify that init containers are removed from the state
file once the app container has started
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
Remove init containers from the state file once the app container has started;
this will release the memory allocated for the init container and can increase
the density of containers on the NUMA node in cases where the memory allocated
for init containers is bigger than the memory allocated for app containers.
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
The idea is that during the allocation phase:
- during calls to `Allocate` and `GetTopologyHints` we will take into account
the init containers' reusable memory, which means that we will re-use the
memory and update container memory blocks accordingly.
For example, for a pod with two init containers that requested 1Gi and 2Gi,
and an app container that requested 4Gi, we can re-use 2Gi of memory.
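Since init containers run sequentially, the reusable amount is the maximum over
their requests; a sketch of the arithmetic:
```
package sketch

// reusableInitMemory returns how much init-container memory can be
// re-used by app containers: init containers run one at a time, so only
// the largest request stays allocated at any point.
func reusableInitMemory(initRequestsBytes []int64) int64 {
	var max int64
	for _, r := range initRequestsBytes {
		if r > max {
			max = r
		}
	}
	return max // e.g. max(1Gi, 2Gi) = 2Gi for the example above
}
```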
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
This change updates the CSR API to add a new, optional field called
expirationSeconds. This field is a request to the signer for the
maximum duration the client wishes the cert to have. The signer is
free to ignore this request based on its own internal policy. The
signers built-in to KCM will honor this field if it is not set to a
value greater than --cluster-signing-duration. The minimum allowed
value for this field is 600 seconds (ten minutes).
This change will help enforce safer durations for certificates in
the Kube ecosystem and will help related projects such as
cert-manager with their migration to the Kube CSR API.
Future enhancements may update the Kubelet to take advantage of this
field when it is configured in a way that can tolerate shorter
certificate lifespans with regular rotation.
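For example, a client could request a one-hour certificate like this (a sketch
using the certificates.k8s.io/v1 Go types; signers may still shorten or ignore
the request):
```
package sketch

import certv1 "k8s.io/api/certificates/v1"

// newClientCSR builds a CSR asking for a one-hour certificate. The value
// must be at least 600 seconds, and signers may ignore the request.
func newClientCSR(pemRequest []byte) *certv1.CertificateSigningRequest {
	expirationSeconds := int32(3600)
	return &certv1.CertificateSigningRequest{
		Spec: certv1.CertificateSigningRequestSpec{
			Request:           pemRequest,
			SignerName:        "kubernetes.io/kube-apiserver-client",
			ExpirationSeconds: &expirationSeconds,
			Usages:            []certv1.KeyUsage{certv1.UsageClientAuth},
		},
	}
}
```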
Signed-off-by: Monis Khan <mok@vmware.com>
Update mounter interface in volume manager's ActualStateOfWorld every time.
Otherwise kubelet uses the first mounter it gets, which may not have the
latest information.
This fixes set up of CSI volumes, which store information about SELinux
support in their `mounter` interface implementation. With each MountVolume()
retry, a new mounter is instantiated and only the final mounter that succeeds
has the right info about whether the volume supports SELinux, and can later
return the right attributes from the GetAttributes() call.
This adds the gate `SeccompDefault` as new alpha feature. Seccomp path
and field fallbacks are now passed to the helper functions, whereas unit
tests covering those code paths have been added as well.
Besides enabling the feature gate, the feature has to be enabled by the
`SeccompDefault` kubelet configuration or its corresponding
`--seccomp-default` CLI flag.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Apply suggestions from code review
Co-authored-by: Paulo Gomes <pjbgf@linux.com>
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>