Commit Graph

53 Commits

Author SHA1 Message Date
Kevin Klues
bc686ea27b Update TopologyManager.GetTopologyHints() to take pointers
Previously, this function was taking full Pod and Container objects
unnecessarily. This commit updates this so that they will take pointers
instead.
2020-02-03 17:13:28 +00:00
Kubernetes Prow Robot
9822016bf8
Merge pull request #87397 from klueska/upstream-cpu-manager-set-initial-containers
Initialize CPUManager containerMap to set of initial containers
2020-01-20 17:39:50 -08:00
Kubernetes Prow Robot
e6b5194ec1
Merge pull request #84300 from klueska/upstream-cpu-manager-reconcile-on-container-state
Update logic in `CPUManager` `reconcileState()`
2020-01-20 12:27:37 -08:00
Kevin Klues
bd9d8fa42f Initialize CPUManager containerMap to set of initial containers
A recent change made it so that the CPUManager receives a list of
initial containers that exist on the system at startup. This list can be
non-empty, for example, after a kubelet retart.

This commit ensures that the CPUManagers containerMap structure is
initialized with the containers from this list.
2020-01-20 20:42:29 +01:00
Kubernetes Prow Robot
37ee6425ef
Merge pull request #87255 from klueska/upstream-remove-redundant-active-pods-check
Remove check for empty activePods list in CPUManager removeStaleState
2020-01-20 09:05:50 -08:00
Kevin Klues
7be9b0fe55 Update comments and error messages in the CPUManager 2020-01-20 15:31:01 +01:00
Kevin Klues
f2acbf6607 Base CPUManager state reconciliation on container state, not pod state 2020-01-20 13:57:30 +00:00
Kevin Klues
f6cf9b8ce9 Move CPUManager Pod Status logic before container loop 2020-01-20 13:57:30 +00:00
Kevin Klues
34b942a41d Remove check for empty activePods list in CPUManager removeStaleState
This check is redundant since we protect this call with a call to
`m.sourcesReady.AllReady()` earlier on. Moreover, having this check in
place means that we will leave some stale state around in cases where
there are actually no active pods in the system and this loop hasn't
cleaned them up yet. This can happen, for example, if a pod exits while
the kubelet is down for some reason. We see this exact case being
triggered in our e2e tests, where a test has been failing since October
when this change was first introduced.
2020-01-15 20:09:24 +01:00
whypro
f4bd4e2e96 Return error instead of panic when cpu manager starts failed. 2019-12-19 21:56:23 +08:00
Kevin Klues
f553286156 Pass initial set of runtime containers to the CPUManager at startup
These information associatedd with these containers is used to migrate
the CPUManager state from it's old format to its new (i.e. keyed off of
podUID and containerName instead of containerID).
2019-12-11 23:02:51 +01:00
Kevin Klues
6441e1ef43 Move CPUManager Checkpoint restoration to Start() instead of New() 2019-12-11 23:02:51 +01:00
Kevin Klues
69f8053850 Update top-level CPUManager to adhere to new state semantics
For now, we just pass 'nil' as the set of 'initialContainers' for
migrating from old state semantics to new ones. In a subsequent commit
will we pull this information from higher layers so that we can pass it
down at this stage properly.
2019-12-11 23:02:51 +01:00
Kevin Klues
765aae93f8 Move containerMap out of static policy and into top-level CPUManager 2019-12-11 23:02:51 +01:00
Kubernetes Prow Robot
73b2c82b28
Merge pull request #83592 from jianzzha/opt-reserved-cpus
added --reserved-cpus kubelet command option
2019-11-06 22:14:42 -08:00
Jianzhu Zhang
89dfd24483 added --reserved-cpus kubelet command option 2019-11-06 07:33:52 -05:00
Kevin Klues
58f3554ebe Sync all CPU and device state before generating TopologyHints for them
This ensures that we have the most up-to-date state when generating
topology hints for a container. Without this, it's possible that some
resources will be seen as allocated, when they are actually free.
2019-11-05 13:00:20 +00:00
Kevin Klues
d9adf20360 Abstract removeStaleState from reconcileState in CPUManager
This will become especially important as we move to a model where
exclusive CPUs are assigned at pod admission time rather than at pod
creation time.

Having this function will allow us to do garbage collection on these
CPUs anytime we are about to allocate CPUs to a new set of containers,
in addition to reclaiming state periodically in the reconcileState()
loop.
2019-11-05 12:45:11 +00:00
Kubernetes Prow Robot
17a57f99d5
Merge pull request #81344 from zouyee/cpm
fix cpumanager reconcileState without sourceready
2019-10-30 23:33:36 -07:00
Connor Doyle
389853894d Delegate topology hint gen to CPU manager policy
- The previous implementation depended on a fixed set of policies.
2019-09-27 22:29:02 -07:00
zouyee
594fc0f4b9 fix cpumanager reconcileState without sourceready
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2019-09-25 10:39:06 +08:00
Kevin Klues
ddfd9ac0ca Fix bug in CPUManager with setting topology for policies
Also add a check in the unit tests to avoid regressions
2019-08-28 17:32:25 -05:00
Kevin Klues
ecc14fe661 Update CPUManager to include NUMANodeID in CPUTopology
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll the logic to discover it by hand. In the
future, we should remove this custiom code to use the information
provided by cadvisor once it is made available.
2019-08-27 16:51:05 -05:00
Kevin Klues
869962fa48 Cache the discovered topology in the CPUManager instead of MachineInfo 2019-08-27 16:23:07 -05:00
Kevin Klues
4fdd52b058 Update GetTopologyHints() API to return a map
At present, there is no way for a hint provider to return distinct hints
for different resource types via a call to GetTopologyHints(). This
means that hint providers that govern multiple resource types (e.g. the
devicemanager) must do some sort of "pre-merge" on the hints it
generates for each resource type before passing them back to the
TopologyManager.

This patch changes the GetTopologyHints() interface to allow a hint
provider to pass back raw hints for each resource type, and allow the
TopologyManager to merge them using a single unified strategy.

This change also allows the TopologyManager to recognize which
resource type a set of hints originated from, should this information
become useful in the future.
2019-08-16 08:06:12 +02:00
Conor Nolan
e33af11add Add stub support for TopologyManager to CPUManager
Co-Authored-By: Louise Daly <louise.m.daly@intel.com>
2019-08-07 15:56:05 +02:00
Kevin Klues
5dc5f1de06 Update the cpumanager to error out if an invalid policy is given
Previously, the cpumanager would simply fall back to the None() policy
if an invalid policy was specified. This patch updates this to return an
error when an invalid policy is passed, forcing the kubelet to fail
fast when this occurs.

These semantics should be preferable because an invalid policy likely
indicates operator error in setting the policy flag on the kubelet
correctly (e.g. misspelling 'static' as 'statiic'). In this case it is
better to fail fast so the operator can detect this and correct the
mistake, than to mask the error and essentially disable the cpumanager
unexpectedly.
2019-07-18 13:24:09 +02:00
Kubernetes Prow Robot
98c4c1e2d8
Merge pull request #77291 from tedyu/cpu-pod-stat
Query pod status outside loop over containers
2019-05-01 23:28:56 -07:00
Ted Yu
3fc16a7e82 Log pod name when pod status cannot be queried 2019-05-01 15:01:56 -07:00
Ted Yu
66ce52578a Query pod status outside loop over containers 2019-04-30 19:35:32 -07:00
Kevin Klues
ef27f5f1a5 Add ability to find init Container IDs in cpumanager reconcileState()
The cpumanager loops through all init Containers and app Containers when
reconciling its state. However, the current implementation of
findContainerIDByName(), which is call by the reconciler, does not
resolve for init Containers.

This patch updates findContainerIDByName() to account for init
Containers and adds a regression test that fails before the change and
succeeds after.
2019-04-27 06:18:55 -07:00
Davanum Srinivas
33081c1f07
New staging repository for cri-api
Change-Id: I2160b0b0ec4b9870a2d4452b428e395bbe12afbb
2019-03-26 18:21:04 -04:00
ailusazh
10995f661d clean containers in reconcileState of cpuManager 2019-01-15 16:09:28 +08:00
yanghaichao12
982d1778f8 Fix comment error of 'cpuManagerStateFileName' 2018-11-19 08:07:04 -05:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
choury
36b92b9b29 cpumanager: rollback state if updateContainerCPUSet failed 2018-08-17 18:08:58 +08:00
Klaudiusz Dembler
a9df2acc4b Typo fix 2018-06-07 12:08:48 +02:00
Klaudiusz Dembler
de1063bc7d Add compatibility tests 2018-05-21 14:50:31 +02:00
Klaudiusz Dembler
0fbd19bc06
Tweaks 2018-05-14 09:51:26 +02:00
Klaudiusz Dembler
6bfceed4ab
Migrate cpumanager to use checkpointing manager 2018-05-14 09:45:58 +02:00
Lee Verberne
e10042d22f Increment CRI version from v1alpha1 to v1alpha2
This also incorporates the version string into the package name so
that incompatibile versions will fail to connect.

Arbitrary choices:
- The proto3 package name is runtime.v1alpha2. The proto compiler
  normally translates this to a go package of "runtime_v1alpha2", but
  I renamed it to "v1alpha2" for consistency with existing packages.
- kubelet/apis/cri is used as "internalapi". I left it alone and put the
  public "runtimeapi" in kubelet/apis/cri/runtime.
2018-02-07 09:06:26 +01:00
Di Xu
d474b86e05 Propagate error up instead panic 2017-12-18 14:05:06 +08:00
Szymon Scharmach
552e4d3a9d Cpu manager reconclie loop can restore state 2017-11-27 11:22:21 +01:00
Connor Doyle
c95ee34234 Use file-backed state for all cpumanager policies
- Add unit test to verify policy name mismatch behavior.
2017-11-15 22:38:11 -08:00
Szymon Scharmach
7e7301ffaf Enable file state in static policy 2017-11-14 18:25:58 +01:00
Shawn Hsiao
f7a15cb751 set leveled logging (v=4) for 'updating container' message 2017-11-01 16:54:23 -04:00
Harry Zhang
282973d87d Elimenate extra CRI call 2017-09-30 16:51:32 +08:00
Connor Doyle
d0bcbbb437 Added static cpumanager policy. 2017-09-04 07:24:59 -07:00
Connor Doyle
ec706216e6 Un-revert "CPU manager wiring and none policy"
This reverts commit 8d2832021a.
2017-09-04 07:24:59 -07:00
Shyam JVS
8d2832021a Revert "CPU manager wiring and none policy" 2017-09-01 18:17:36 +02:00