Commit Graph

751 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
d92fdebd85
Merge pull request #89897 from giuseppe/test-e2e-node
kubelet: fix e2e-node cgroups test on cgroup v2
2020-04-20 15:54:12 -07:00
Kubernetes Prow Robot
d0183703cb
Merge pull request #90059 from ahg-g/ahg-nodeinfo2
Cleanup obsolete NodeInfo methods
2020-04-14 17:32:04 -07:00
Abdullah Gharaibeh
d6522e0e74 rename framework pkg with schedulerframework for all instances under pkg/kubelet 2020-04-14 14:24:07 -04:00
Kubernetes Prow Robot
105c0c6951
Merge pull request #88970 from mysunshine92/correct-NodeAllocatableRoot
fix function NodeAllocatableRoot
2020-04-14 11:04:13 -07:00
Abdullah Gharaibeh
bed9b2f23b Cleanup obsolete NodeInfo methods 2020-04-12 18:13:46 -04:00
Giuseppe Scrivano
26d94ad628
kubelet: do not configure the device cgroup
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:06 +02:00
Giuseppe Scrivano
a9772b2290
kubelet: adapt cgroup_manager to cgroup v2
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:04 +02:00
Giuseppe Scrivano
6d16fee229
kubelet: cpu hard capping is supported on cgroup v2
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:03 +02:00
Francesco Romani
623587ec8b cpumanager: test: add missing helper
add back the missing AssertStateEqual helper;
it is needed by some tests we still want to run.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-04-07 16:59:07 +02:00
Francesco Romani
be0fe3df9b cpumanager: drop old custom file backend
The cpumanager file-based state backend was obsoleted since few
releases, aving the cpumanager moved to the checkpointmanager common
infrastructure.
The old test checking compatibility to/from the old format is
also no longer needed, because the checkpoint format is stable
(see
https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/checkpointmanager).

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-04-07 13:24:48 +02:00
Kubernetes Prow Robot
b030be376b
Merge pull request #89581 from Wenfeng-GAO/simplify
simplify code in topologymanager
2020-04-02 23:07:46 -07:00
Wenfeng-GAO
1aebbee7da simplify code in topologymanager 2020-03-28 00:04:51 +08:00
Giuseppe Scrivano
c4429d8bd4
kubelet: add tests for cgroup v2 conversions
follow-up for https://github.com/kubernetes/kubernetes/pull/85218

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-03-27 13:50:57 +01:00
Kubernetes Prow Robot
34c8b26c9f
Merge pull request #85218 from giuseppe/cgroupv2
kubelet: add initial support for cgroupv2
2020-03-26 14:10:23 -07:00
Kubernetes Prow Robot
4488fd4749
Merge pull request #89053 from bg-chun/move_package
migration of re-usable package from pkg/kubelet/cm/cpumanager to pkg/kubelet/cm
2020-03-26 11:14:09 -07:00
yameiwang
6783f991c3 fix function NodeAllocatableRoot 2020-03-26 18:48:05 +08:00
Byonggon Chun
a3047672d0 move pkg/kubelet/cm/cpumanager/containermap to pkg/kubelet/cm/containermap for reusing
containerMap is used in CPU Manager to store all containers information in the node.
containerMap provides a mapping from (pod, container) -> containerID for all containers a pod
It is reusable in another component in pkg/kubelet/cm which needs to track changes of all containers in the node.

Signed-off-by: Byonggon Chun <bg.chun@samsung.com>
2020-03-14 02:38:51 +09:00
Giuseppe Scrivano
bb5ed1b797
kubelet: add initial support for cgroupv2
do a conversion from the cgroups v1 limits to cgroups v2.

e.g. cpu.shares on cgroups v1 has a range of [2-262144] while the
equivalent on cgroups v2 is cpu.weight that uses a range [1-10000].

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-03-12 08:50:19 +01:00
Adelina Tuvenie
a9f834d17d Implement noopWindowsResourceAllocator
On Windows, the podAdmitHandler returned by the GetAllocateResourcesPodAdmitHandler() func
and registered by the Kubelet is nil.

We implement a noopWindowsResourceAllocator that would admit any pod for Windows in order
to be consistent with the original implementation.
2020-03-10 21:32:23 +01:00
Kubernetes Prow Robot
cd0057c16a
Merge pull request #88876 from nolancon/none-policy-fix
Topology Manager none policy bug fix
2020-03-05 21:40:33 -08:00
Kubernetes Prow Robot
ce01a9bad0
Merge pull request #88857 from nolancon/test-fix
Check for nil cpuManager in container manager
2020-03-05 20:05:14 -08:00
Kubernetes Prow Robot
48541a0b16
Merge pull request #87650 from nolancon/beta-feature-gate
Update TopologyManager Feature Gate
2020-03-05 20:03:04 -08:00
nolancon
0551d408ac Bug fix for TM none policy 2020-03-05 14:25:48 +00:00
nolancon
4baa1d967d Check for nil cpuManager 2020-03-05 07:54:33 +00:00
Kubernetes Prow Robot
ac32644d6e
Merge pull request #87759 from klueska/upstream-move-cpu-allocation-to-pod-admit
Guarantee aligned resources across containers
2020-03-04 20:12:37 -08:00
nolancon
e8538d9b76 Add mutex to Topology Manager Add/RemoveContainer
This was exposed as a potential bug during e2e test debugging of this
PR.
2020-03-02 04:07:21 +00:00
Kevin Klues
2327934a86 Rename GetTopologyPodAmitHandler() as
GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its
new function. Also remove the Topology Manager feature gate check at higher level
kubelet.go, as it is now done in GetAllocateResourcesPodAdmitHandler().
2020-02-27 07:52:43 +00:00
nolancon
a9c6129577 Device Manager - Update unit tests
- Pass container to Allocate().
- Loop through containers to call Allocate() on container by container
basis.
2020-02-27 07:24:34 +00:00
nolancon
cb9fdc49db Device Manager - Refactor allocatePodResources
- allocatePodResources logic altered to allow for container by container
device allocation.
- New type PodReusableDevices
- New field in devicemanager devicesToReuse
2020-02-27 07:24:34 +00:00
nolancon
0a9bd0334d CPU Manager - Updates to unit tests:
- Where previously we called manager.AddContainer(), we now call both
manager.Allocate() and manager.AddContainer().
- Some test cases now have two expected errors. One each
from Allocate() and AddContainer(). Existing outcomes are unchanged.
2020-02-27 07:24:34 +00:00
nolancon
467f66580b CPU Manager - Add check to policy.Allocate() for init conatiners
If container allocated CPUs is an init container, release those CPUs
back into the shared pool for re-allocation to next container.
2020-02-27 07:24:33 +00:00
nolancon
709989efa2 CPU Manager - Rename policy.AddContainer() to policy.Allocate() 2020-02-27 07:24:33 +00:00
Kevin Klues
0d68bffd03 Change GetTopologyPodAdmitHandler() to be more general
GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler
type instead of the TopologyManager directly. The handler it returns
is generally responsible for attempting to allocate any resources that
require a pod admission check. When the TopologyManager feature gate
is on, this comes directly from the TopologyManager. When it is off,
we simply attempt the allocations ourselves and fail the admission
on an unexpected error. The higher level kubelet.go feature gate
check will be removed in an upcoming PR.
2020-02-27 07:24:26 +00:00
Oleg Chunikhin
b651178849 fix incorrect configuration of kubepods.slice unit by kubelet (issue #88197) 2020-02-17 13:22:45 -05:00
Kubernetes Prow Robot
1a0f923a65
Merge pull request #87712 from alena1108/jan30kubelet
Ineffassign fixes for pkg/controller and kubelet
2020-02-14 14:29:27 -08:00
Kevin Klues
0b168f0243 Change devicemanager to implement HintProvider.Allocate()
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
2020-02-10 03:27:47 +00:00
Kevin Klues
91f91858a5 Change CPUManager to implement HintProvider.Allocate()
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
2020-02-10 03:27:47 +00:00
Kevin Klues
9e4ee5ecc3 Add Allocate() call to TopologyManager's HintProvider interface
Having this interface allows us to perform a tight loop of:

    for each container {
        containerHints = {}
        for each provider {
	    containerHints[provider] = provider.GatherHints(container)
        }

        containerHints.MergeAndPublish()

        for each provider {
	    provider.Allocate(container)
        }
    }

With this in place we can now be sure that the hints gathered in one
iteration of the loop always consider the allocations made in the
previous.
2020-02-10 03:27:47 +00:00
Kevin Klues
a3f099ea4d Split devicemanager Allocate into two functions
Instead of having a single call for Allocate(), we now split this into two
functions Allocate() and UpdatePluginResources().

The semantics split across them:

// Allocate configures and assigns devices to a pod. From the requested
// device resources, Allocate will communicate with the owning device
// plugin to allow setup procedures to take place, and for the device
// plugin to provide runtime settings to use the device (environment
// variables, mount points and device files).
Allocate(pod *v1.Pod) error

// UpdatePluginResources updates node resources based on devices already
// allocated to pods. The node object is provided for the device manager to
// update the node capacity to reflect the currently available devices.
UpdatePluginResources(
    node *schedulernodeinfo.NodeInfo,
    attrs *lifecycle.PodAdmitAttributes) error

As we move to a model in which the TopologyManager is able to ensure
aligned allocations from the CPUManager, devicemanger, and any
other TopologManager HintProviders in the same synchronous loop, we will
need to be able to call Allocate() independently from an
UpdatePluginResources(). This commit makes that possible.
2020-02-10 03:27:47 +00:00
Takeaki Matsumoto
785fac6826 Make updateAllocatedDevices() as a public method and call it in
podresources api
2020-02-07 13:26:56 +09:00
Kevin Klues
d5addb4090 Cleanup logging and creation logic of TopologyManager in prep for beta 2020-02-03 17:13:29 +00:00
Kevin Klues
bc686ea27b Update TopologyManager.GetTopologyHints() to take pointers
Previously, this function was taking full Pod and Container objects
unnecessarily. This commit updates this so that they will take pointers
instead.
2020-02-03 17:13:28 +00:00
Kevin Klues
adaa58b6cb Update TopologyManager.Policy.Merge() to return a simple bool
Previously, the verious Merge() policies of the TopologyManager all
eturned their own lifecycle.PodAdmitResult result. However, for
consistency in any failed admits, this is better handled in the
top-level Topology manager, with each policy only returning a boolean
about whether or not they would like to admit the pod or not. This
commit changes the semantics to match this logic.
2020-02-03 17:13:28 +00:00
Kevin Klues
95a3ac447f Fix bug in TopologManager RemoveContainer()
Previously, we unconditionally removed *all* topology hints from a pod
whenever just one container was being removed. This commit makes it so
we only remove the hints for the single container being removed, and
then conditionally remove the pod from the podTopologyHints[podUID] when
no containers left in it.
2020-02-03 17:13:14 +00:00
Alena Prokharchyk
6c3093f970 Ineffassign fixes for pkg/controller and kubelet 2020-01-30 14:35:10 -08:00
sewon.oh
463442aa29
Update container hugepage limit when creating the container
Unit test for updating container hugepage limit
Add warning message about ignoring case.
Update error handling about hugepage size requirements

Signed-off-by: sewon.oh <sewon.oh@samsung.com>
2020-01-28 09:35:02 +09:00
Kubernetes Prow Robot
98f63eee1b
Merge pull request #87460 from nolancon/policies_refactor
Refactor Topology Manager policies to reduce code duplication
2020-01-24 21:11:15 -08:00
nolancon
4d76b1c8de Add mergeFilteredHints:
- Move remaining logic from mergeProvidersHints to generic top level
mergeFilteredHints function.
- Add numaNodes as parameter in order to make generic.
- Move single NUMA node specific check to single-numa-node Merge
function.
2020-01-22 09:07:41 +00:00
nolancon
fc300e0e7d Move filterSingleNumaHints call to top level Merge 2020-01-22 08:39:22 +00:00
nolancon
45660fd3a2 Add filterProvidersHints function:
- Move initial 'filtering' functionality to generic function
filterProvidersHints level policy.go.
- Call new function from top level Merge function.
- Rename some variables/parameters to reflect changes.
2020-01-22 08:35:28 +00:00