At present, there is no way for a hint provider to return distinct hints
for different resource types via a call to GetTopologyHints(). This
means that a hint provider that governs multiple resource types (e.g. the
devicemanager) must do some sort of "pre-merge" on the hints it
generates for each resource type before passing them back to the
TopologyManager.
This patch changes the GetTopologyHints() interface to allow a hint
provider to pass back raw hints for each resource type, and allow the
TopologyManager to merge them using a single unified strategy.
This change also allows the TopologyManager to recognize which
resource type a set of hints originated from, should this information
become useful in the future.
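Concretely, the change amounts to hints being returned per resource rather than as one pre-merged list. A minimal sketch of that shape, using simplified stand-in types and signatures rather than the actual TopologyManager definitions:

```go
package topologysketch

// TopologyHint is a simplified stand-in for the TopologyManager's hint type.
type TopologyHint struct {
	NUMANodeAffinity uint64 // NUMA node bitmask, reduced to a plain integer here
	Preferred        bool
}

// HintProvider sketches the updated interface: raw hints are returned keyed by
// resource name (e.g. "cpu", "vendor.com/device"), and the TopologyManager is
// left to merge them with a single unified strategy.
type HintProvider interface {
	GetTopologyHints(podUID, containerName string) map[string][]TopologyHint
}
```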
This patch fixes a bug in the CPUManager, whereby it doesn't honor the
"effective requests/limits" of a Pod as defined by:
https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources
The rule states that a Pod’s "effective request/limit" for a resource
should be the larger of:
* The highest of any particular resource request or limit
defined on all init Containers
* The sum of all app Containers request/limit for a
resource
Moreover, the rule states that:
* The effective QoS tier is the same for init Containers
and app containers alike
This means that the resource requests of init Containers and app
Containers should be able to overlap, such that the larger of the two
becomes the "effective resource request/limit" for the Pod. Likewise,
if a QoS tier of "Guaranteed" is determined for the Pod, then both init
Containers and app Containers should run in this tier.
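For a single resource, the rule reduces to a max-of-(max, sum) computation. A minimal illustrative sketch, with hypothetical types rather than the kubelet's actual request-handling code:

```go
package effectivesketch

// containerRequest is a hypothetical stand-in for one container's request of a
// single resource (e.g. whole CPUs).
type containerRequest struct {
	name    string
	request int
}

// effectiveRequest computes the Pod-level effective request for one resource:
// the larger of the highest init Container request and the sum of all app
// Container requests.
func effectiveRequest(initContainers, appContainers []containerRequest) int {
	maxInit := 0
	for _, c := range initContainers {
		if c.request > maxInit {
			maxInit = c.request
		}
	}
	sumApp := 0
	for _, c := range appContainers {
		sumApp += c.request
	}
	if maxInit > sumApp {
		return maxInit
	}
	return sumApp
}
```

For example, a Pod whose init Containers request 4 and 2 CPUs and whose app Containers request 1 and 2 CPUs has an effective CPU request of max(4, 1+2) = 4, not 4+2+1+2 = 9.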
In its current implementation, the CPU manager honors the effective QoS
tier for both init and app containers, but doesn't honor the "effective
request/limit" correctly.
Instead, it treats the "effective request/limit" as:
* The sum of all init Containers plus the sum of all app
Containers request/limit for a resource
It does this by not proactively removing the CPUs given to previous init
containers when new containers are being created. In the worst case,
this causes the CPUManager to give non-overlapping CPUs to all
containers (whether init or app) in the "Guaranteed" QoS tier before any
of the containers in the Pod actually start.
This effectively blocks these Pods from running if the total number of
CPUs being requested across init and app Containers goes beyond the
limits of the system.
This patch fixes this problem by updating the CPUManager static policy
so that it proactively removes any guaranteed CPUs it has granted to
init Containers before allocating CPUs to app containers. Since all init
containers run sequentially, it also makes sure this proactive
removal happens for previous init containers when allocating CPUs to
later ones.
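A rough sketch of the resulting allocation flow, with hypothetical bookkeeping types rather than the static policy's actual state and cpuset handling:

```go
package staticpolicysketch

import "fmt"

// allocator is a hypothetical simplification of the static policy's
// bookkeeping: which CPUs each container holds exclusively, plus a shared pool.
type allocator struct {
	assignments map[string][]int // container name -> exclusively assigned CPU IDs
	sharedPool  []int            // CPU IDs currently in the shared pool
}

// addContainer releases CPUs still held by the Pod's init containers before
// allocating to the new container; since init containers run sequentially,
// their exclusive CPUs are free to be reused at this point.
func (a *allocator) addContainer(initContainers []string, container string, numCPUs int) error {
	for _, init := range initContainers {
		if init == container {
			continue // don't release CPUs from the container being started
		}
		if cpus, ok := a.assignments[init]; ok {
			a.sharedPool = append(a.sharedPool, cpus...) // return CPUs to the shared pool
			delete(a.assignments, init)
		}
	}
	if numCPUs > len(a.sharedPool) {
		return fmt.Errorf("not enough free CPUs: want %d, have %d", numCPUs, len(a.sharedPool))
	}
	a.assignments[container] = append([]int(nil), a.sharedPool[:numCPUs]...)
	a.sharedPool = a.sharedPool[numCPUs:]
	return nil
}
```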
Previously, the cpumanager would simply fall back to the None() policy
if an invalid policy was specified. This patch updates this to return an
error when an invalid policy is passed, forcing the kubelet to fail
fast when this occurs.
These semantics should be preferable because an invalid policy likely
indicates operator error in setting the policy flag on the kubelet
correctly (e.g. misspelling 'static' as 'statiic'). In this case it is
better to fail fast so the operator can detect this and correct the
mistake, than to mask the error and essentially disable the cpumanager
unexpectedly.
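A minimal sketch of the new behavior (hypothetical constructor, not the actual cpumanager code):

```go
package policysketch

import "fmt"

const (
	policyNone   = "none"
	policyStatic = "static"
)

// newPolicy fails fast on an unrecognized policy name instead of silently
// falling back to the none policy, so a typo in the kubelet flag surfaces as a
// startup error.
func newPolicy(name string) (string, error) {
	switch name {
	case policyNone, policyStatic:
		return name, nil
	default:
		return "", fmt.Errorf("unknown CPU manager policy: %q", name)
	}
}
```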
The cpumanager loops through all init Containers and app Containers when
reconciling its state. However, the current implementation of
findContainerIDByName(), which is called by the reconciler, does not
resolve container IDs for init Containers.
This patch updates findContainerIDByName() to account for init
Containers and adds a regression test that fails before the change and
succeeds after.
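A sketch of the updated lookup, using simplified stand-in status types rather than the real Pod status API:

```go
package reconcilesketch

import "fmt"

// containerStatus and podStatus are hypothetical simplifications of the Pod
// status fields the reconciler actually walks.
type containerStatus struct {
	Name        string
	ContainerID string
}

type podStatus struct {
	InitContainerStatuses []containerStatus
	ContainerStatuses     []containerStatus
}

// findContainerIDByName now considers init Container statuses in addition to
// app Container statuses when resolving a container name to its ID.
func findContainerIDByName(status *podStatus, name string) (string, error) {
	for _, list := range [][]containerStatus{status.InitContainerStatuses, status.ContainerStatuses} {
		for _, cs := range list {
			if cs.Name == name && cs.ContainerID != "" {
				return cs.ContainerID, nil
			}
		}
	}
	return "", fmt.Errorf("unable to find ID for container with name %q", name)
}
```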
Minor optimization in the code that attempts to assign whole
sockets/cores in case the number of CPUs requested is higher
than CPUs-per-socket/core: check if the number of requested
CPUs is higher than CPUs-per-socket/core before retrieving
and iterating the free sockets/cores, and break the loops
when that is no longer the case.
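A sketch of the reordered checks, with a hypothetical accumulator type standing in for the real CPU assignment code:

```go
package cpuassignsketch

// acquirer is a hypothetical simplification of the accumulator used when
// assigning exclusive CPUs.
type acquirer struct {
	numCPUsNeeded int
	cpusPerSocket int
	freeSockets   []int // IDs of sockets whose CPUs are all still free
}

// takeFullSockets checks up front whether the request still spans a whole
// socket before fetching and iterating the free sockets, and breaks out of the
// loop as soon as that stops being the case. The same pattern applies to the
// whole-core loop with cpusPerCore.
func (a *acquirer) takeFullSockets(takeSocket func(id int)) {
	if a.numCPUsNeeded < a.cpusPerSocket {
		return // request is smaller than a socket; skip the loop entirely
	}
	for _, socket := range a.freeSockets {
		if a.numCPUsNeeded < a.cpusPerSocket {
			break // the remainder no longer fills a whole socket
		}
		takeSocket(socket)
		a.numCPUsNeeded -= a.cpusPerSocket
	}
}
```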
Signed-off-by: Arik Hadas <ahadas@redhat.com>
1. Find the minimal thread number within a core using a single loop
rather than by sorting the thread numbers (see the sketch after this list).
2. Inline getUniqueCoreID#err and Discover#numCPUs variables.
3. Narrow the scope of Discover#coreID and Discover#err variables.
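A sketch of item 1, with illustrative names rather than the actual topology-discovery helpers:

```go
package toposketch

import "fmt"

// uniqueCoreID identifies a physical core by the lowest thread ID among its
// sibling threads, found with a single pass instead of sorting the slice.
func uniqueCoreID(threads []int) (int, error) {
	if len(threads) == 0 {
		return 0, fmt.Errorf("no threads in core")
	}
	min := threads[0]
	for _, t := range threads[1:] {
		if t < min {
			min = t
		}
	}
	return min, nil
}
```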
Signed-off-by: Arik Hadas <ahadas@redhat.com>
- Move from the old github.com/golang/glog to k8s.io/klog
- klog has an explicit InitFlags(), so we add the flags as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
* github.com/kubernetes/repo-infra
* k8s.io/gengo/
* k8s.io/kube-openapi/
* github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by calling InitFlags explicitly in their init() functions
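For the test fixes, the pattern is roughly the following (a hypothetical test file; the package name and flag choice are illustrative):

```go
package somepkg_test

import (
	"flag"
	"testing"

	"k8s.io/klog"
)

func init() {
	// klog no longer registers its flags on the global FlagSet automatically,
	// so tests that rely on logging flags must register them explicitly.
	klog.InitFlags(nil)
	_ = flag.Set("logtostderr", "true")
}

func TestLoggingStillWorks(t *testing.T) {
	klog.Info("klog is usable in tests after the explicit InitFlags call")
}
```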
Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
Test the cases where the number of CPUs available in the system is
smaller or larger than the number of CPUs known in the state, which
should lead to a panic. This covers both CPU onlining and offlining. The
case where the number of CPUs matches is already covered by the
"non-corrupted state" test.
This patch adds a check for the static policy state validation. The
check fails if the CPU topology obtained from cadvisor doesn't match
the topology recorded in the state file.
If the CPU topology has changed in a node, cpu manager static policy
might try to assign non-present cores to containers.
For example in my test case, static policy had the default CPU set of
0-1,4-7. Then kubelet was shut down and CPU 7 was offlined. After
restarting the kubelet, CPU manager tries to assign the non-existent CPU
7 to containers which don't have exclusive allocations assigned to them:
Error response from daemon: Requested CPUs are not available - requested 0-1,4-7, available: 0-6)
This breaks exclusivity: because the shared pool cannot be assigned to the
non-exclusive containers, those containers are never confined to it and can
end up executing on the exclusively allocated CPUs.
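A sketch of the validation itself, with plain data structures in place of the real cpuset and topology types:

```go
package validatesketch

import "fmt"

// validateTopology compares the CPUs known to the state file (the shared
// default set plus every exclusive assignment) against the CPUs discovered
// from cadvisor, and rejects the state on any mismatch so a stale checkpoint
// cannot hand out offlined CPUs.
func validateTopology(discovered map[int]bool, defaultSet []int, assignments map[string][]int) error {
	known := make(map[int]bool)
	for _, cpu := range defaultSet {
		known[cpu] = true
	}
	for _, cpus := range assignments {
		for _, cpu := range cpus {
			known[cpu] = true
		}
	}
	if len(known) != len(discovered) {
		return fmt.Errorf("state file holds %d CPUs but the discovered topology has %d",
			len(known), len(discovered))
	}
	for cpu := range known {
		if !discovered[cpu] {
			return fmt.Errorf("CPU %d is present in the state file but not in the discovered topology", cpu)
		}
	}
	return nil
}
```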
Automatic merge from submit-queue (batch tested with PRs 59214, 65330). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.
Migrate cpumanager to use checkpointing manager
**What this PR does / why we need it**:
This PR migrates `cpumanager` to use new kubelet level node checkpointing feature (#56040) to decrease code redundancy and improve consistency.
**Which issue(s) this PR fixes**:
Fixes #58339
**Notes**:
At the point of submitting this PR, the most straightforward approach was used: a `state_checkpoint` implementation of the `State` interface was added. However, with the checkpointing implementation in place there may be no need to keep the `State` interface at all; a single implementation with a checkpoint backend could be used instead, and if a backend other than the filestore is needed, `cpumanager` could simply be supplied with a custom `CheckpointManager` implementation.
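As a rough illustration of that direction, a checkpoint-backed `State` might look like the following (simplified interface and field names; the real `CheckpointManager` works with checkpoint objects rather than raw bytes):

```go
package checkpointsketch

import "encoding/json"

// CheckpointManager is a simplified stand-in for the kubelet-level
// checkpointing interface introduced in #56040.
type CheckpointManager interface {
	CreateCheckpoint(name string, data []byte) error
	GetCheckpoint(name string) ([]byte, error)
}

// stateCheckpoint sketches the new State backend: instead of maintaining its
// own state file format, the cpumanager serializes its assignments through the
// shared checkpoint manager under a fixed checkpoint name.
type stateCheckpoint struct {
	checkpointName    string
	checkpointManager CheckpointManager
	defaultCPUSet     string            // e.g. "0-1,4-7"
	assignments       map[string]string // container name -> exclusive CPU set
}

// storeState persists the in-memory state via the checkpoint manager; a custom
// CheckpointManager implementation is all that is needed for a non-filestore
// backend.
func (s *stateCheckpoint) storeState() error {
	data, err := json.Marshal(struct {
		DefaultCPUSet string            `json:"defaultCpuSet"`
		Entries       map[string]string `json:"entries"`
	}{s.defaultCPUSet, s.assignments})
	if err != nil {
		return err
	}
	return s.checkpointManager.CreateCheckpoint(s.checkpointName, data)
}
```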
/kind feature
/sig node
cc @flyingcougar @ConnorDoyle