Commit Graph

791 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
b6174e605f
Merge pull request #93189 from klueska/upstream-fix-bug-topology-manager
Fix a bug whereby reusable CPUs and devices were not being honored
2020-07-21 04:35:17 -07:00
Kubernetes Prow Robot
1fdd8fb213
Merge pull request #93263 from liggitt/windows
Fix windows kubelet startup
2020-07-20 19:51:57 -07:00
Jordan Liggitt
886727a4c0 Revert "Add deviceManager in windows container manager"
This reverts commit 056d73b1a1.
2020-07-20 16:13:53 -04:00
Giuseppe Scrivano
ef935bd991
kubelet: clamp cpu shares to max allowed
clamp the max cpu.shares to the maximum value allowed by the kernel.

It is not an issue when using cgroupfs, as the kernel will
anyway make sure the value is not out of range and automatically clamp
it, systemd has an additional check that prevents the cgroup creation.

Closes: https://github.com/kubernetes/kubernetes/issues/92855

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-20 17:18:03 +02:00
Kevin Klues
00df26a985 Fix a bug whereby reusable CPUs and devices were not being honored
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.

As such, it would sometimes:
1) Generate a hint for a differnent NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error

This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This frunctionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
2020-07-20 11:41:13 +00:00
Kevin Klues
74fe9364c3 Simplify logic in devicemanager TopologyHint generation 2020-07-20 11:41:13 +00:00
Kevin Klues
9f5f401d60 Add AnySet() to topologymanager bitmask API 2020-07-20 11:41:13 +00:00
Kubernetes Prow Robot
242f3d9dce
Merge pull request #80917 from aarnaud/windows-devicemanager
Port deviceManager to windows container manager to enable GPU access
2020-07-17 21:04:50 -07:00
Giuseppe Scrivano
79be8be10e
kubelet, cgroupv2: make hugetlb optional
make the hugetlb controller optional when cgroup v2 is used.

Closes: https://github.com/kubernetes/kubernetes/issues/92933

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-13 09:40:55 +02:00
Kubernetes Prow Robot
63926cf8e7
Merge pull request #92862 from giuseppe/cgroup-fix-leaks
vendor: update github.com/opencontainers/runc
2020-07-11 20:57:11 -07:00
Giuseppe Scrivano
0d2a493a8f
kubelet: skip setting the devices cgroup
use the new libcontainer feature of skipping setting the devices
cgroup.  This is necessary on cgroup v2 to avoid leaking a eBPF
program every time the cgroup is re-configured.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-09 09:37:46 +02:00
Anthony ARNAUD
056d73b1a1
Add deviceManager in windows container manager 2020-07-08 18:22:16 +02:00
Kevin Klues
26cb650655 Remove unnecessary union after call to GetPreferredAllocation()
There is no need to try and allocate already-allocated devices again.
2020-07-07 06:35:57 +00:00
Kevin Klues
67ecc11c44 Harden callGetPreferredAllocationIfAvailable() return value
Previously, we didn't check the contents of the result after calling out
to the plugin endpoint. This could have resulted in errors if the plugin
returned either 'nil' or an empty result. This patch fixes this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d87365494a Fix bug in call to callGetPreferredAllocationIfAvailable()
Previously, we were passing the variable 'devices' to this function,
when we should have been passing 'allocated'. This bug crept in due to a
variable name change that didn't propogate its way through the entire
function. The tests added in the previous commit would have caught this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d551ab1e78 Add tests to check paramaters passed to GetPreferredAllocation()
These tests uncovered some small bugs that will be fixed in a subsequent
set of commits.
2020-07-07 06:35:57 +00:00
Kevin Klues
5bd0db0b1f Add new test cases for GetPreferredAllocation() in allocation path 2020-07-03 13:01:32 +00:00
Kevin Klues
83f18d9975 Remove unnecessary field from TestTopologyAlignedAllocation() test cases 2020-07-03 13:01:32 +00:00
Kevin Klues
bb08fd1135 Add a simple endpoint test for GetPreferredAllocation()
More extensive tests that exercise the allocation logic are to follow.
2020-07-03 13:01:32 +00:00
Kevin Klues
cbd405d85c Update existing tests in support of GetPreferredallocation() 2020-07-03 13:01:32 +00:00
Kevin Klues
a780ccff5b Updates logic in devicesToAllocate() to call GetPreferredAllocation() 2020-07-02 22:07:27 +00:00
Kevin Klues
bb56a09133 Add callGetPreferredAllocationIfAvailable() function in devicemanager
This function mimics what is already done for the conditional call to
PreStartContainer() via the callPreStartContainerIfNeeded() function.
2020-07-02 22:07:27 +00:00
Kevin Klues
abf87c99c6 Add GetPreferredAllocation() as a supported device plugin endpoint 2020-07-02 15:15:50 +00:00
Kevin Klues
32c047a52e Update device plugin stub with new GetPreferredAllocation() call 2020-07-02 15:15:48 +00:00
Kevin Klues
c45f1317eb Fix some whitespacing and comments in devicemanager 2020-07-02 15:15:44 +00:00
Giuseppe Scrivano
e94aebf4cb
pkg/kubelet: adapt to new libcontainer API
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-06-24 18:39:51 +02:00
Kubernetes Prow Robot
86ad0df820
Merge pull request #92203 from sjenning/add-sjenning-node-approver
Add sjenning as kubelet approver
2020-06-19 21:52:02 -07:00
Seth Jennings
45d2b98aa8 add sjenning as kubelet approver 2020-06-19 13:00:55 -05:00
kadisi
a75323c76b fix unexpected append mutations about pkg/kubelet package
Signed-off-by: kadisi <iamkadisi@163.com>
Co-authored-by: Dr. Stefan Schimanski <stefan.schimanski@gmail.com>
2020-06-03 13:36:57 +08:00
Kubernetes Prow Robot
0e37bcce2c
Merge pull request #88385 from tallclair/node-reviews
Remove tallclair from some OWNERS files
2020-05-24 20:23:11 -07:00
Kubernetes Prow Robot
b170451caa
Merge pull request #90183 from dims/update-kubernetes-to-klog-v2
Update kubernetes to klog v2
2020-05-16 18:59:51 -07:00
Kubernetes Prow Robot
f011430e85
Merge pull request #84599 from mrobson/log-destroy
Errors from cgroup destroy are swallowed. Log error at warning level.
2020-05-16 18:59:36 -07:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
caiweidong
5ed8fb690c Use klog to replace log to keep them in consistence 2020-05-12 13:49:10 +08:00
Tim Allclair
029a144ae9 Remove tallclair from some OWNERS files 2020-05-11 11:44:38 -07:00
Kubernetes Prow Robot
7fdc1275d9
Merge pull request #90377 from cbf123/container_cpuset_fixup_2
Fix exclusive CPU allocations being deleted at container restart
2020-04-27 13:40:04 -07:00
Chris Friesen
ab5870d808 Fix exclusive CPU allocations being deleted at container restart
The expectation is that exclusive CPU allocations happen at pod
creation time. When a container restarts, it should not have its
exclusive CPU allocations removed, and it should not need to
re-allocate CPUs.

There are a few places in the current code that look for containers
that have exited and call CpuManager.RemoveContainer() to clean up
the container.  This will end up deleting any exclusive CPU
allocations for that container, and if the container restarts within
the same pod it will end up using the default cpuset rather than
what should be exclusive CPUs.

Removing those calls and adding resource cleanup at allocation
time should get rid of the problem.

Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
2020-04-27 11:36:54 -06:00
Kevin Klues
751b9f3e13 Update strategy used to reuse CPUs from init containers in CPUManager
With the old strategy, it was possible for an init container to end up
running without some of its CPUs being exclusive if it requested more
guaranteed CPUs than the sum of all guaranteed CPUs requested by app
containers. Unfortunately, this case was not caught by our unit tests
because they didn't validate the state of the defaultCPUSet to ensure
there was no overlap with CPUs assigned to containers. This patch
updates the strategy to reuse the CPUs assigned to init containers
across into app containers, while avoiding this edge case. It also
updates the unit tests to now catch this type of error in the future.
2020-04-23 20:27:43 +00:00
Kubernetes Prow Robot
d92fdebd85
Merge pull request #89897 from giuseppe/test-e2e-node
kubelet: fix e2e-node cgroups test on cgroup v2
2020-04-20 15:54:12 -07:00
Kubernetes Prow Robot
d0183703cb
Merge pull request #90059 from ahg-g/ahg-nodeinfo2
Cleanup obsolete NodeInfo methods
2020-04-14 17:32:04 -07:00
Abdullah Gharaibeh
d6522e0e74 rename framework pkg with schedulerframework for all instances under pkg/kubelet 2020-04-14 14:24:07 -04:00
Kubernetes Prow Robot
105c0c6951
Merge pull request #88970 from mysunshine92/correct-NodeAllocatableRoot
fix function NodeAllocatableRoot
2020-04-14 11:04:13 -07:00
Abdullah Gharaibeh
bed9b2f23b Cleanup obsolete NodeInfo methods 2020-04-12 18:13:46 -04:00
Giuseppe Scrivano
26d94ad628
kubelet: do not configure the device cgroup
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:06 +02:00
Giuseppe Scrivano
a9772b2290
kubelet: adapt cgroup_manager to cgroup v2
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:04 +02:00
Giuseppe Scrivano
6d16fee229
kubelet: cpu hard capping is supported on cgroup v2
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-04-09 16:18:03 +02:00
Francesco Romani
623587ec8b cpumanager: test: add missing helper
add back the missing AssertStateEqual helper;
it is needed by some tests we still want to run.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-04-07 16:59:07 +02:00
Francesco Romani
be0fe3df9b cpumanager: drop old custom file backend
The cpumanager file-based state backend was obsoleted since few
releases, aving the cpumanager moved to the checkpointmanager common
infrastructure.
The old test checking compatibility to/from the old format is
also no longer needed, because the checkpoint format is stable
(see
https://github.com/kubernetes/kubernetes/tree/master/pkg/kubelet/checkpointmanager).

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-04-07 13:24:48 +02:00
Kubernetes Prow Robot
b030be376b
Merge pull request #89581 from Wenfeng-GAO/simplify
simplify code in topologymanager
2020-04-02 23:07:46 -07:00