791 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
002dbf6a4c Merge pull request #83777 from lmdaly/fix-single-numa-node-with-best-effort-pods
Fixed bug in TopologyManager with SingleNUMANode Policy
2019-11-01 04:53:23 -07:00
Kubernetes Prow Robot
17a57f99d5 Merge pull request #81344 from zouyee/cpm
fix cpumanager reconcileState without sourceready
2019-10-30 23:33:36 -07:00
nolancon
b0a85177d2 Clean-up and additional test cases for socket-mask unit test. 2019-10-18 04:16:06 +01:00
Kubernetes Prow Robot
017842d49d Merge pull request #83492 from ConnorDoyle/topo-align-all-qos
Topology manager aligns pods of all QoS classes.
2019-10-11 03:03:40 -07:00
Louise Daly
a353247d44 Fixed bug in TopologyManager with SingleNUMANode Policy
This patch fixes an issue where best-effort pods were not admitted
to the node if the single-numa-node policy was set.

This was because the Admit check in the single-numa-node policy does
not admit any pod whose hint is anything but a single NUMA node. The 'best hint' in this case is {<bits set for every NUMA node on the machine>, true}.
So on a machine with 2 NUMA nodes the best hint for a best-effort pod is {11, true}, as best-effort pods have no topology preferences.

The single-numa-node policy fails any pod with a not-preferred hint OR a hint where more than one bit is set, so the above example results in terminated pods with a Topology Affinity Error.

This is a short term fix for the single-numa-node policy, as there will be code refactoring for the 1.17 release.
2019-10-11 07:00:37 +01:00
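
The failure mode described in the commit above can be reproduced with a minimal Go sketch. It is an illustration under stated assumptions, not the kubelet's code: the `TopologyHint` struct is a simplified stand-in that stores the affinity as a plain `uint64`, and `defaultHint` and `canAdmitSingleNUMA` are hypothetical helper names. The admit rule is the one stated in the message (reject a hint that is not preferred or that has more than one bit set).

```go
package main

import (
	"fmt"
	"math/bits"
)

// TopologyHint is a simplified, hypothetical stand-in for the kubelet type.
type TopologyHint struct {
	NUMANodeAffinity uint64 // bit i set means NUMA node i is acceptable
	Preferred        bool
}

// defaultHint builds the "no preference" hint for a machine with numNodes
// NUMA nodes: every node bit set, Preferred true (e.g. {11, true} for 2 nodes).
func defaultHint(numNodes int) TopologyHint {
	return TopologyHint{
		NUMANodeAffinity: (uint64(1) << numNodes) - 1,
		Preferred:        true,
	}
}

// canAdmitSingleNUMA applies the rule from the commit message: a pod is
// rejected when its hint is not preferred or spans more than one NUMA node.
func canAdmitSingleNUMA(h TopologyHint) bool {
	return h.Preferred && bits.OnesCount64(h.NUMANodeAffinity) <= 1
}

func main() {
	// A best-effort pod on a 2-node machine gets {11, true} and is rejected,
	// which is the behaviour the patch sets out to fix.
	h := defaultHint(2)
	fmt.Printf("{%02b, %v} admit=%v\n", h.NUMANodeAffinity, h.Preferred, canAdmitSingleNUMA(h))
}
```

Under this rule a pod with no topology preference is rejected only because its default hint covers every node, not because any resource actually spans nodes.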
Kubernetes Prow Robot
4561b67971 Merge pull request #83697 from klueska/fix-single-numa-with-one-provider
Fixed bug in TopologyManager with SingleNUMANode Policy
2019-10-10 19:00:33 -07:00
Kubernetes Prow Robot
3db6d3abcf Merge pull request #83551 from dims/move-external-facing-kubelet-apis-to-staging
Move external facing kubelet apis to staging
2019-10-10 13:41:36 -07:00
Connor Doyle
a598369e3c Gofmt. 2019-10-10 12:16:21 -07:00
Connor Doyle
a9203ebdcf Topology manager aligns pods of all QoS classes. 2019-10-10 12:16:21 -07:00
Kevin Klues
5501f542cd Fixed bug in TopologyManager with SingleNUMANode Policy
This patch fixes an issue in the TopologyManager that wouldn't allow
pods to be admitted if pods were launched with the SingleNUMANode policy
and any of the hint providers had no NUMA preferences.

This is due to 2 factors:

1) Any hint provider that passes back `nil` as its hints has its hint
automatically transformed into a single {11 true} hint before merging.

2) We added special casing for the SingleNumaNodePolicy() in the
TopologyManager that essentially turns these hints into
{11 false} any time a {11 true} is seen.

The current patch reworks this logic so that the TopologyManager can
tell the difference between a "don't care" hint and a true "{11 true}"
hint returned by the hint provider. Only true "{11 true}" hints will be
converted by the special casing for the SingleNumaNodePolicy(), while
"don't care" hints will not.

This is a short term fix for this issue until we do a larger refactoring
of this code for the 1.17 release.
2019-10-09 17:41:08 -07:00
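
The distinction this commit draws between a "don't care" hint and a genuine {11 true} hint can be sketched as follows. This is a rough illustration, not the actual patch: the `Hint` struct, its `DontCare` field, and the helpers `fullMask`, `normalize`, and `singleNUMASpecialCase` are all hypothetical names, and how the real code marks a "don't care" hint is not stated in the message.

```go
package main

import "fmt"

// Hint is a simplified, hypothetical stand-in for the kubelet's TopologyHint.
type Hint struct {
	Affinity  uint64 // bit i set means NUMA node i is acceptable
	Preferred bool
	DontCare  bool // hypothetical marker: the provider returned nil
}

// fullMask sets one bit per NUMA node, e.g. 0b11 on a 2-node machine.
func fullMask(numNodes int) uint64 {
	return (uint64(1) << numNodes) - 1
}

// normalize turns a provider's nil result into a {full mask, true} hint,
// but records that it came from a provider with no NUMA preference.
func normalize(h *Hint, numNodes int) Hint {
	if h == nil {
		return Hint{Affinity: fullMask(numNodes), Preferred: true, DontCare: true}
	}
	return *h
}

// singleNUMASpecialCase demotes a genuine {full mask, true} hint to
// {full mask, false} and leaves "don't care" hints untouched.
func singleNUMASpecialCase(h Hint, numNodes int) Hint {
	if !h.DontCare && h.Preferred && h.Affinity == fullMask(numNodes) {
		h.Preferred = false
	}
	return h
}

func main() {
	const numNodes = 2
	dontCare := singleNUMASpecialCase(normalize(nil, numNodes), numNodes)
	genuine := singleNUMASpecialCase(normalize(&Hint{Affinity: 0b11, Preferred: true}, numNodes), numNodes)
	fmt.Printf("don't care: {%02b %v}\n", dontCare.Affinity, dontCare.Preferred)
	fmt.Printf("genuine:    {%02b %v}\n", genuine.Affinity, genuine.Preferred)
}
```

With the marker in place, a provider that has no NUMA preference no longer causes the merged hint to be demoted, while a provider that really reports {11 true} still does.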
mrobson
ad3dcb9fa0 Add podCgroup to process kill events to allow for correlation 2019-10-08 13:12:48 -04:00
Kubernetes Prow Robot
d70b2db1f2 Merge pull request #83296 from yutedz/kill-cgrp-proc
Only kill process where killing failed during previous iterations
2019-10-08 07:19:13 -07:00
Kubernetes Prow Robot
3f8f0a32fa Merge pull request #83527 from odinuge/runc-rc9
Bump dependency opencontainers/runc@v1.0.0-rc9
2019-10-08 03:45:44 -07:00
Davanum Srinivas
f29d2272c8 fix gofmt and golint failures
Change-Id: I6535b506f50558b31663a13cd270b15023afa2c6
2019-10-06 18:43:17 -04:00
Kubernetes Prow Robot
48b90db9c3 Merge pull request #83495 from tanjunchen/fix-typo
remove the repeat word in documents
2019-10-06 15:05:08 -07:00
Davanum Srinivas
6ecc0f83af update bazel BUILD files
Change-Id: Ia3917cec1453c0b22a958faf8c22bccd79242d14
2019-10-06 15:29:23 -04:00
Davanum Srinivas
d30c489c54 Move pkg/kubelet/pluginregistration and deviceplugin
Change-Id: I06adcb43bd278b430ffad2010869e1524c8cc4ff
2019-10-06 15:28:38 -04:00
tanjunchen
de3cf23414 remove the repeat word in documents 2019-10-06 23:32:01 +08:00
Odin Ugedal
b9cfb19321 Rename cgroupsystemd.Manager to LegacyManager 2019-10-05 14:22:35 +02:00
Kubernetes Prow Robot
d60bda1971 Merge pull request #83043 from ConnorDoyle/cleanup-cpumanger-topo-hints
Delegate topology hint gen to CPU manager policy
2019-10-05 00:59:39 -07:00
Kevin Klues
d2b53af7d7 Add klueska as reviewer for CPUManager and devicemanager 2019-10-03 13:01:41 -07:00
Ted Yu
6dbb533e3c Only kill process where killing failed during previous iterations 2019-09-29 19:53:43 -07:00
Connor Doyle
389853894d Delegate topology hint gen to CPU manager policy
- The previous implementation depended on a fixed set of policies.
2019-09-27 22:29:02 -07:00
zouyee
b1f6974f7b using online instead to fix kubelet service failed with wrong number of possible NUMA nodes
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2019-09-26 21:48:50 +08:00
zouyee
594fc0f4b9 fix cpumanager reconcileState without sourceready
Signed-off-by: Zou Nengren <zouyee1989@gmail.com>
2019-09-25 10:39:06 +08:00
Connor Doyle
e35301c19f Rename package socketmask to bitmask.
- As discussed in reviews and other public channels,
  this abstraction is used to represent numa nodes, not sockets.
- There is nothing inherently related to sockets in this package anyway.
2019-09-23 17:08:45 -07:00
Kubernetes Prow Robot
07cc813956 Merge pull request #81793 from lmdaly/topology-manager-owners
Added OWNERS file for Topology Manager
2019-09-11 18:26:52 -07:00
Louise Daly
fbccf25e29 Added OWNERS file for Topology Manager 2019-09-11 06:40:24 +01:00
Kubernetes Prow Robot
887edd2273 Merge pull request #82099 from lmdaly/single-numa-node-policy
Topology Manager Policy: single-numa-node
2019-08-30 11:21:26 -07:00
Kubernetes Prow Robot
9165f7bf56 Merge pull request #82104 from klueska/upstream-fix-cpu-manager-topology-bug
Fix bug in CPUManager with setting topology for policies
2019-08-30 08:00:44 -07:00
Louise Daly
8ad1b5ba3b Single-numa-node Topology Manager bug fix
Added one-off fix for single-numa-node policy to correctly
reject pod admission on a resource allocation that spans
NUMA nodes

Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-30 07:17:56 +01:00
Louise Daly
f6c085f60e Added Single NUMA Node Policy which ensures resources are
aligned on a single NUMA node

Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-30 07:17:17 +01:00
Kevin Klues
5ed80dadcf Update CanAdmitPodResult() in TopologyManager to take a TopologyHint
Previously it only took a bool, which limited the logic it could perform
to determine if a pod should be admitted or not based on the merged hint
from the policy.
2019-08-30 07:17:17 +01:00
Kevin Klues
eb0216e54e Update semantics to set Preferred field in TopologyHint generation
We now only set Preferred to true if resources can be allocated with a
size equal to the minimum _possible_ mask when all resources are
available.
2019-08-29 14:32:10 -05:00
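
One way to read the Preferred rule in the commit above is sketched below in Go. It assumes, purely for illustration, that every NUMA node holds the same number of CPUs; `minNodesNeeded` and `isPreferred` are hypothetical helpers, not functions from the kubelet.

```go
package main

import (
	"fmt"
	"math/bits"
)

// minNodesNeeded returns the fewest NUMA nodes that could satisfy the request
// if every node were completely free (ceiling division). It assumes a uniform
// perNode capacity, which is a simplification made for this sketch.
func minNodesNeeded(request, perNode int) int {
	return (request + perNode - 1) / perNode
}

// isPreferred reports whether a candidate mask is as narrow as the minimum
// possible mask for this request.
func isPreferred(mask uint64, request, perNode int) bool {
	return bits.OnesCount64(mask) == minNodesNeeded(request, perNode)
}

func main() {
	// Hypothetical machine: 2 NUMA nodes with 4 CPUs each.
	fmt.Println(isPreferred(0b01, 2, 4)) // 2 CPUs fit on one node: preferred
	fmt.Println(isPreferred(0b11, 2, 4)) // spans two nodes needlessly: not preferred
	fmt.Println(isPreferred(0b11, 6, 4)) // 6 CPUs need both nodes: preferred
}
```

The point of the rule, as the message describes it, is that a wide mask is only marked preferred when no narrower mask could ever have satisfied the request.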
Kevin Klues
e0e8b3e4fd Update CPUManager topology helpers to accept multiple ids 2019-08-29 13:22:54 -05:00
Kevin Klues
dcc9f66311 Add devicemanager tests for TopologyHint consumption 2019-08-29 08:22:50 -05:00
Kevin Klues
cc567afaf0 Consume TopologyHints in the devicemanager 2019-08-29 08:22:50 -05:00
Kevin Klues
a3320f80d9 Add devicemanager tests for TopologyHint generation 2019-08-29 07:45:43 -05:00
Kevin Klues
d3d7a8f5d4 Generate TopologyHints from the devicemanager 2019-08-29 07:45:43 -05:00
Louise Daly
9a118ceac4 Added stub support for Topology Manager to Device Manager
Co-authored-by: Conor Nolan <conor.nolan@intel.com>
Co-authored-by: Sreemanti Ghosh <sreemanti.ghosh@intel.com>
Co-authored-by: Kevin Klues <kklues@nvidia.com>
2019-08-29 07:45:43 -05:00
Kevin Klues
ddfd9ac0ca Fix bug in CPUManager with setting topology for policies
Also add a check in the unit tests to avoid regressions
2019-08-28 17:32:25 -05:00
Kevin Klues
df1b54fc09 Fail fast with TopologyManager on machines with more than 8 NUMA Nodes 2019-08-28 11:04:52 -05:00
Kevin Klues
5660cd3cfb Add NUMA Node awareness to the TopologyManager 2019-08-28 11:04:52 -05:00
Kubernetes Prow Robot
35867b160a Merge pull request #81951 from klueska/upstream-update-cpu-amanger-numa-mapping
Update the CPUManager to include NUMANodeID in its topology information
2019-08-28 08:55:40 -07:00
Kubernetes Prow Robot
de1cfa9bc1 Merge pull request #81787 from lmdaly/topology-manager-rename-strict-policy
Renaming strict policy to restricted policy
2019-08-28 01:38:04 -07:00
Kevin Klues
f4dbd29cdb Rename TopologyHint.SocketAffinity to TopologyHint.NUMANodeAffinity
As part of this, update the logic to use the NUMA information instead of
the Socket information when generating and consuming TopologyHints in
the CPUManager.
2019-08-27 16:51:05 -05:00
Kevin Klues
ecc14fe661 Update CPUManager to include NUMANodeID in CPUTopology
Unfortunately, the NUMA information is not readily available from
cadvisor, so we have to roll the logic to discover it by hand. In the
future, we should remove this custom code to use the information
provided by cadvisor once it is made available.
2019-08-27 16:51:05 -05:00
Kevin Klues
869962fa48 Cache the discovered topology in the CPUManager instead of MachineInfo 2019-08-27 16:23:07 -05:00
Kubernetes Prow Robot
a3488b4cee Merge pull request #81206 from tallclair/staticcheck-kubelet-push
Cleanup Kubelet static analysis issues
2019-08-22 15:09:43 -07:00
Kubernetes Prow Robot
6b47754740 Merge pull request #81627 from tallclair/copy
Delete duplicate resource.Quantity.Copy()
2019-08-22 11:13:13 -07:00