Commit Graph

33 Commits

Author SHA1 Message Date
Kevin Klues
00df26a985 Fix a bug whereby reusable CPUs and devices were not being honored
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.

As such, it would sometimes:
1) Generate a hint for a differnent NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error

This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This frunctionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
2020-07-20 11:41:13 +00:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Kevin Klues
751b9f3e13 Update strategy used to reuse CPUs from init containers in CPUManager
With the old strategy, it was possible for an init container to end up
running without some of its CPUs being exclusive if it requested more
guaranteed CPUs than the sum of all guaranteed CPUs requested by app
containers. Unfortunately, this case was not caught by our unit tests
because they didn't validate the state of the defaultCPUSet to ensure
there was no overlap with CPUs assigned to containers. This patch
updates the strategy to reuse the CPUs assigned to init containers
across into app containers, while avoiding this edge case. It also
updates the unit tests to now catch this type of error in the future.
2020-04-23 20:27:43 +00:00
nolancon
467f66580b CPU Manager - Add check to policy.Allocate() for init conatiners
If container allocated CPUs is an init container, release those CPUs
back into the shared pool for re-allocation to next container.
2020-02-27 07:24:33 +00:00
nolancon
709989efa2 CPU Manager - Rename policy.AddContainer() to policy.Allocate() 2020-02-27 07:24:33 +00:00
Kevin Klues
bc686ea27b Update TopologyManager.GetTopologyHints() to take pointers
Previously, this function was taking full Pod and Container objects
unnecessarily. This commit updates this so that they will take pointers
instead.
2020-02-03 17:13:28 +00:00
whypro
f4bd4e2e96 Return error instead of panic when cpu manager starts failed. 2019-12-19 21:56:23 +08:00
Kevin Klues
185e790f71 Update CPUManager policies to adhere to new state semantics 2019-12-11 23:02:51 +01:00
Kevin Klues
765aae93f8 Move containerMap out of static policy and into top-level CPUManager 2019-12-11 23:02:51 +01:00
Kevin Klues
1d995c98ef Update CPUmanager containerMap to allow removal by containerRef 2019-12-11 23:02:47 +01:00
Kevin Klues
0639bd0942 Change CPUManager containerMap to key off of (podUID, containerName)
Previously it keyed off of a pointer to the actual pod / container,
which was unnecessary, and hard to work with (especially on the
retrieval side).
2019-12-11 23:02:11 +01:00
Kevin Klues
3881e50cce Update CPUmanager containerMap to also return a containerRef 2019-12-11 23:01:01 +01:00
Kevin Klues
347d5f57ac Move CPUManager ContainerMap to its own package 2019-12-11 22:59:00 +01:00
Kubernetes Prow Robot
73b2c82b28
Merge pull request #83592 from jianzzha/opt-reserved-cpus
added --reserved-cpus kubelet command option
2019-11-06 22:14:42 -08:00
Jianzhu Zhang
89dfd24483 added --reserved-cpus kubelet command option 2019-11-06 07:33:52 -05:00
Kevin Klues
9dc116eb08 Ensure CPUManager TopologyHints are regenerated after kubelet restart
This patch also includes test to make sure the newly added logic works
as expected.
2019-11-05 15:48:51 +00:00
Connor Doyle
389853894d Delegate topology hint gen to CPU manager policy
- The previous implementation depended on a fixed set of policies.
2019-09-27 22:29:02 -07:00
Connor Doyle
e35301c19f Rename package socketmask to bitmask.
- As discussed in reviews and other public channels,
  this abstraction is used to represent numa nodes, not sockets.
- There is nothing inherently related to sockets in this package anyway.
2019-09-23 17:08:45 -07:00
Kevin Klues
e0e8b3e4fd Update CPUManager topology helpers to accept multiple ids 2019-08-29 13:22:54 -05:00
Kevin Klues
f4dbd29cdb Rename TopologyHint.SocketAffinity to TopologyHint.NUMANodeAffinity
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
2019-08-27 16:51:05 -05:00
Kevin Klues
8278d1134c Consume TopologyHints in the CPUManager
Co-Authored-By: Conor Nolan <conor.nolan@intel.com>
2019-08-14 06:22:56 +02:00
Kevin Klues
156b3f6af8 Generate TopologyHints from the CPUManager 2019-08-14 06:22:56 +02:00
Conor Nolan
e33af11add Add stub support for TopologyManager to CPUManager
Co-Authored-By: Louise Daly <louise.m.daly@intel.com>
2019-08-07 15:56:05 +02:00
Kevin Klues
c6d9bbcb74 Proactively remove init Containers in CPUManager static policy
This patch fixes a bug in the CPUManager, whereby it doesn't honor the
"effective requests/limits" of a Pod as defined by:

    https://kubernetes.io/docs/concepts/workloads/pods/init-containers/#resources

The rule states that a Pod’s "effective request/limit" for a resource
should be the larger of:
    * The highest of any particular resource request or limit
      defined on all init Containers
    * The sum of all app Containers request/limit for a
      resource

Moreover, the rule states that:
    * The effective QoS tier is the same for init Containers
      and app containers alike

This means that the resource requests of init Containers and app
Containers should be able to overlap, such that the larger of the two
becomes the "effective resource request/limit" for the Pod. Likewise,
if a QoS tier of "Guaranteed" is determined for the Pod, then both init
Containers and app Containers should run in this tier.

In its current implementation, the CPU manager honors the effective QoS
tier for both init and app containers, but doesn't honor the "effective
request/limit" correctly.

Instead, it treats the "effective request/limit" as:
    * The sum of all init Containers plus the sum of all app
      Containers request/limit for a resource

It does this by not proactively removing the CPUs given to previous init
containers when new containers are being created. In the worst case,
this causes the CPUManager to give non-overlapping CPUs to all
containers (whether init or app) in the "Guaranteed" QoS tier before any
of the containers in the Pod actually start.

This effectively blocks these Pods from running if the total number of
CPUs being requested across init and app Containers goes beyond the
limits of the system.

This patch fixes this problem by updating the CPUManager static policy
so that it proactively removes any guaranteed CPUs it has granted to
init Containers before allocating CPUs to app containers. Since all init
container are run sequentially, it also makes sure this proactive
removal happens for previous init containers when allocating CPUs to
later ones.
2019-07-26 14:34:51 +02:00
Ted Yu
e967c37068 Union all CPUSets in one round 2019-05-03 14:40:33 -07:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
Ismo Puustinen
4f604eb73c cpumanager: validate topology in static policy.
This patch adds a check for the static policy state validation. The
check fails if the CPU topology obtained from cadvisor doesn't match
with the current topology in the state file.

If the CPU topology has changed in a node, cpu manager static policy
might try to assign non-present cores to containers.

For example in my test case, static policy had the default CPU set of
0-1,4-7. Then kubelet was shut down and CPU 7 was offlined. After
restarting the kubelet, CPU manager tries to assign the non-existent CPU
7 to containers which don't have exclusive allocations assigned to them:

 Error response from daemon: Requested CPUs are not available - requested 0-1,4-7, available: 0-6)

This breaks the exclusivity, since the CPUs from the shared pool don't
get assigned to non-exclusive containers, meaning that they can execute
on the exclusive CPUs.
2018-07-30 08:49:13 +03:00
choury
8e4b62a74b
Remove duplicate check line
There is a same [line](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_static.go#L81).
2018-07-05 11:07:56 +08:00
Szymon Scharmach
552e4d3a9d Cpu manager reconclie loop can restore state 2017-11-27 11:22:21 +01:00
Kubernetes Submit Queue
e99544d018
Merge pull request #54409 from intelsdi-x/cpu-enable-state-file
Automatic merge from submit-queue (batch tested with PRs 55764, 55683, 55468, 54409, 55546). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Enable file back state in static policy

**What this PR does / why we need it**:
Enables file back `State` in `static policy` and cpu manager + tests.
Upon policy start, state read from file is validated whether it meets the policy assumption. In case of any error, state is cleared.

Previous PR: #54408
Next PR: #54409
2017-11-15 22:16:05 -08:00
Szymon Scharmach
7e7301ffaf Enable file state in static policy 2017-11-14 18:25:58 +01:00
Dr. Stefan Schimanski
012b085ac8 pkg/apis/core: mechanical import fixes in dependencies 2017-11-09 12:14:08 +01:00
Connor Doyle
d0bcbbb437 Added static cpumanager policy. 2017-09-04 07:24:59 -07:00