kubernetes

Author	SHA1	Message	Date
Kubernetes Prow Robot	fd0358fd21	Merge pull request #86689 from klueska/upstream-fix-cpumanager-v1-state-checksum Lock checksum calculation for v1 CPUManager state to pre 1.18 logic	2020-01-08 02:57:40 -08:00
Kubernetes Prow Robot	d6412b856f	Merge pull request #84345 from danielqsj/withdialer replace grpc.WithDialer which is deprecated	2020-01-06 15:56:17 -08:00
Kubernetes Prow Robot	9acf7d11fe	Merge pull request #86344 from klueska/upstream-cm-approver Add klueska as an approver in pkg/kubelet/cm/OWNERS	2020-01-06 09:54:16 -08:00
Kevin Klues	b373121a14	Make CPUManagerCheckpointV2 type an alias of CPUManagerCheckpoint This change is to prevent problems when we remove the V1->V2 migration code in the future. Without this, the checksums of all checkpoints would be hashed with the name CPUManagerCheckpointV2 embedded inside of them, which is undesirable. We want the checkpoints to be hashed with the name CPUManagerCheckpoint instead.	2019-12-28 19:29:13 +01:00
Kevin Klues	5faf8f4c52	Lock checksum calculation for v1 CPUManager state to pre 1.18 logic The updated CPUManager from PR #84462 implements logic to migrate the CPUManager checkpoint file from an old format to a new one. To do so, it defines the following types: ``` type CPUManagerCheckpoint = CPUManagerCheckpointV2 type CPUManagerCheckpointV1 struct { ... } type CPUManagerCheckpointV2 struct { ... } ``` This replaces the old definition of just: ``` type CPUManagerCheckpoint struct { ... } ``` Code was put in place to ensure proper migration from checkpoints in V1 format to checkpoints in V2 format. However (and this is a big however), all of the unit tests were performed on V1 checkpoints that were generated using the type name `CPUManagerCheckpointV1` and not the original type name of `CPUManagerCheckpoint`. As such, the checksum in the checkpoint file uses the `CPUManagerCheckpointV1` type to calculate its checksum and not the original type name of `CPUManagerCheckpoint`. This causes problems in the real world since all pre-1.18 checkpoint files will have been generated with the original type name of `CPUManagerCheckpoint`. When verifying the checksum of the checkpoint file across an upgrade to 1.18, the checksum is calculated assuming a type name of `CPUManagerCheckpointV1` (which is incorrect) and the file is seen to be corrupt. This patch ensures that all V1 checksums are verified against a type name of `CPUManagerCheckpoint` instead of `CPUManagerCheckpointV1`. It also locks the algorithm used to calculate the checksum in place, since it will never change in the future (for pre-1.18 checkpoint files at least).	2019-12-28 14:17:55 +01:00
danielqsj	19fe9f8d94	replace grpc.WithDialer which is deprecated	2019-12-26 17:46:59 +08:00
whypro	f4bd4e2e96	Return error instead of panicking when the CPU manager fails to start.	2019-12-19 21:56:23 +08:00
Kevin Klues	9818b4522e	Add klueska as an approver in pkg/kubelet/cm/OWNERS	2019-12-17 10:40:23 +01:00
Kevin Klues	f553286156	Pass initial set of runtime containers to the CPUManager at startup The information associated with these containers is used to migrate the CPUManager state from its old format to its new one (i.e. keyed off of podUID and containerName instead of containerID).	2019-12-11 23:02:51 +01:00
Kevin Klues	6441e1ef43	Move CPUManager Checkpoint restoration to Start() instead of New()	2019-12-11 23:02:51 +01:00
Kevin Klues	69f8053850	Update top-level CPUManager to adhere to new state semantics For now, we just pass 'nil' as the set of 'initialContainers' for migrating from old state semantics to new ones. In a subsequent commit we will pull this information from higher layers so that we can pass it down at this stage properly.	2019-12-11 23:02:51 +01:00
Kevin Klues	185e790f71	Update CPUManager policies to adhere to new state semantics	2019-12-11 23:02:51 +01:00
Kevin Klues	7c760fea38	Change CPUManager state to key off of podUID and containerName Previously, the state was keyed off of containerID instead of podUID and containerName. Unfortunately, this is no longer possible as we move to a model where we allocate CPUs to containers at pod admit time rather than container start time. This patch is the first step towards full migration to the new semantics. Only the unit tests in cpumanager/state are passing. In subsequent commits we will update the CPUManager itself to use these new semantics. This patch also includes code to do migration from the old checkpoint format to the new one, assuming the existence of a ContainerMap with the proper mapping of (containerID)->(podUID, containerName). A subsequent commit will update code in higher layers to make sure that this ContainerMap is made available to this state logic.	2019-12-11 23:02:51 +01:00
Kevin Klues	9191a949ae	Extend makePod() helper in CPUManager to take PodUID and ContainerName	2019-12-11 23:02:51 +01:00
Kevin Klues	7a15d3a4d7	Fix bug in parsing int to string in CPUManager tests	2019-12-11 23:02:51 +01:00
Kevin Klues	765aae93f8	Move containerMap out of static policy and into top-level CPUManager	2019-12-11 23:02:51 +01:00
Kevin Klues	1d995c98ef	Update CPUmanager containerMap to allow removal by containerRef	2019-12-11 23:02:47 +01:00
Kevin Klues	0639bd0942	Change CPUManager containerMap to key off of (podUID, containerName) Previously it keyed off of a pointer to the actual pod / container, which was unnecessary, and hard to work with (especially on the retrieval side).	2019-12-11 23:02:11 +01:00
Kevin Klues	3881e50cce	Update CPUmanager containerMap to also return a containerRef	2019-12-11 23:01:01 +01:00
Kevin Klues	347d5f57ac	Move CPUManager ContainerMap to its own package	2019-12-11 22:59:00 +01:00
Kubernetes Prow Robot	ad5d4c4705	Merge pull request #85706 from yutedz/per-node-dev Remove nodes slice in loop of takeByTopology	2019-12-05 13:50:30 -08:00
Kubernetes Prow Robot	57b6b287d4	Merge pull request #85688 from yutedz/pods-to-rm Reduce unnecessary Set in updateAllocatedDevices	2019-12-02 17:07:26 -08:00
Kubernetes Prow Robot	833f585104	Merge pull request #85760 from yutedz/chkpt-write-err Log error when writing checkpoint fails	2019-12-02 10:27:06 -08:00
Ted Yu	84a9803741	Log error when writing checkpoint fails	2019-11-29 19:47:17 -08:00
Ted Yu	6415fa765e	Remove nodes slice in loop of takeByTopology	2019-11-29 12:12:22 -08:00
Kubernetes Prow Robot	80eed952f0	Merge pull request #84854 from BSWANG/fix-hugetlb-cgroup fix kubelet failed to start on setting hugetlb limits	2019-11-27 12:29:03 -08:00
Ted Yu	86f3bc25e1	Reduce unnecessary Set in updateAllocatedDevices	2019-11-27 08:48:06 -08:00
Travis Rhoden	0c5c3d8bb9	Remove pkg/util/mount (moved out of tree) This patch removes pkg/util/mount completely, and replaces it with the mount package now located at k8s.io/utils/mount. The code found at k8s.io/utils/mount was moved there from pkg/util/mount, so the code is identical, just no longer in-tree to k/k.	2019-11-15 08:29:12 -07:00
Kubernetes Prow Robot	30e6238795	Merge pull request #85147 from yutedz/devmgr-rm-contents Continue removing file in ManagerImpl#removeContents	2019-11-14 16:38:28 -08:00
Ted Yu	fb046f7787	Continue removing file in ManagerImpl#removeContents	2019-11-13 06:00:34 -08:00
Kubernetes Prow Robot	ed10b5b17f	Merge pull request #85047 from yutedz/dev-mgr-err-handling Handle error return from allocatePodResources	2019-11-12 11:51:27 -08:00
Kubernetes Prow Robot	897ce3073c	Merge pull request #84533 from davidz627/fix/deprecatedPath Remove plugin watching of deprecated directory and CSI v0 support in accordance with deprecation policy	2019-11-12 04:48:20 -08:00
David Zhu	802fe12803	Remove plugin watching of deprecated directory {kubelet_root_dir}/plugins and support for CSI V0 in accordance with deprecation announcement in https://v1-13.docs.kubernetes.io/docs/setup/release/notes/	2019-11-11 11:42:58 -08:00
Ted Yu	db0f616974	Handle error return from allocatePodResources	2019-11-09 16:25:15 -08:00
Travis Rhoden	1fd8921546	Move mount/fake.go to mount/fake_mount.go This patch moves fake.go to mount_fake.go, and follows the principle of always returning a discrete type rather than an Interface. All callers of "FakeMounter" are changed to instead use "NewFakeMounter()". The FakeMounter "Log" struct member is changed to not be exported, and is instead only accessed through a new "GetLog()" method.	2019-11-08 08:07:41 -07:00
mrobson	e401ee9158	Errors from cgroup destroy and pid kills are swallowed. Log a warning when that happens.	2019-11-07 07:47:57 -05:00
Kubernetes Prow Robot	73b2c82b28	Merge pull request #83592 from jianzzha/opt-reserved-cpus added --reserved-cpus kubelet command option	2019-11-06 22:14:42 -08:00
Kubernetes Prow Robot	695c3061dd	Merge pull request #82809 from liggitt/go-1.13-no-modules update to use go1.13.4	2019-11-06 17:02:43 -08:00
Kubernetes Prow Robot	08e5781b41	Merge pull request #84525 from klueska/upstream-fix-hint-generation-after-kubelet-restart Fix bug in TopologyManager hint generation after kubelet restart	2019-11-06 15:33:50 -08:00
Jordan Liggitt	297570e06a	hack/update-vendor.sh	2019-11-06 17:42:34 -05:00
Kubernetes Prow Robot	46472773cb	Merge pull request #84836 from yuxiaobo96/k8s-checks Correct spelling mistakes	2019-11-06 12:21:11 -08:00
Kevin Klues	4d4d4bdd61	Ensure devicemanager TopologyHints are regenerated after kubelet restart This patch also includes a test to make sure the newly added logic works as expected.	2019-11-06 15:01:34 +00:00
Jianzhu Zhang	89dfd24483	added --reserved-cpus kubelet command option	2019-11-06 07:33:52 -05:00
yuxiaobo	81e9f21f83	Correct spelling mistakes Signed-off-by: yuxiaobo <yuxiaobogo@163.com>	2019-11-06 20:25:19 +08:00
bingshen.wbs	47642a0bad	Fix kubelet failing to start when setting hugetlb limits in a nonexistent cgroup dir, caused by kubelet startup being interrupted while setting up the list of cgroups 'cgroupManagerImpl.Exists' does not check and recreate the hugetlb cgroup dir, so setting the limits in the nonexistent cgroup dir causes kubelet startup to fail. Signed-off-by: bingshen.wbs <bingshen.wbs@alibaba-inc.com>	2019-11-06 16:39:55 +08:00
Kubernetes Prow Robot	0c0408c790	Merge pull request #76407 from yanghaichao12/dev0411 change directory permissions from 0755 to 0750	2019-11-05 19:30:59 -08:00
Kevin Klues	9dc116eb08	Ensure CPUManager TopologyHints are regenerated after kubelet restart This patch also includes a test to make sure the newly added logic works as expected.	2019-11-05 15:48:51 +00:00
Kevin Klues	a338c8f7fd	Add some more comments to GetTopologyHints() in the devicemanager	2019-11-05 13:06:23 +00:00
Kevin Klues	58f3554ebe	Sync all CPU and device state before generating TopologyHints for them This ensures that we have the most up-to-date state when generating topology hints for a container. Without this, it's possible that some resources will be seen as allocated, when they are actually free.	2019-11-05 13:00:20 +00:00
Kevin Klues	d9adf20360	Abstract removeStaleState from reconcileState in CPUManager This will become especially important as we move to a model where exclusive CPUs are assigned at pod admission time rather than at pod creation time. Having this function will allow us to do garbage collection on these CPUs anytime we are about to allocate CPUs to a new set of containers, in addition to reclaiming state periodically in the reconcileState() loop.	2019-11-05 12:45:11 +00:00
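
Note on commits 5faf8f4c52 and b373121a14 above: the checksum problem they describe can be reproduced with a small sketch. This is not the kubelet's checksum code. It assumes, for illustration only, that the checksum is an FNV hash over a `%#v` rendering of the struct, and the field names and the `render` and `checksum` helpers are hypothetical. The point it shows is that a rendering which embeds the Go type name gives a distinct type (`CPUManagerCheckpointV1`) a different input than the original `CPUManagerCheckpoint`, while a type alias keeps the original name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Original (pre-1.18) type name. Fields are illustrative, not the real layout.
type CPUManagerCheckpoint struct {
	PolicyName    string
	DefaultCPUSet string
}

// A distinct type with identical fields: its name differs when rendered.
type CPUManagerCheckpointV1 struct {
	PolicyName    string
	DefaultCPUSet string
}

// An alias is the same type, so it renders under the original name.
type CPUManagerCheckpointV2 = CPUManagerCheckpoint

// render produces a textual form that embeds the type name.
func render(v interface{}) string {
	return fmt.Sprintf("%#v", v)
}

// checksum hashes the rendered form (assumed stand-in for the real checksum).
func checksum(v interface{}) uint32 {
	h := fnv.New32a()
	h.Write([]byte(render(v)))
	return h.Sum32()
}

func main() {
	orig := CPUManagerCheckpoint{"static", "0-3"}
	v1 := CPUManagerCheckpointV1{"static", "0-3"}
	v2 := CPUManagerCheckpointV2{"static", "0-3"}

	fmt.Println(render(orig))
	fmt.Println(render(v1))
	fmt.Println(render(v2))
	// The alias hashes identically to the original; the distinct type's
	// rendering differs, so its checksum is computed over different input.
	fmt.Println(checksum(orig) == checksum(v2))
	fmt.Println(render(orig) == render(v1))
}
```

Under these assumptions, a file written with the original type name would only verify if the verifier hashes under that same name, which is the behavior 5faf8f4c52 locks in for V1 checkpoints.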


853 Commits