
821 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
bbc26ba7e6
Merge pull request #96048 from rphillips/fixes/device_plugin_stub_race
Deflake TestDevicePluginReRegistrationProbeMode: Devices from the previous registration should be removed
2020-11-02 08:20:54 -08:00
Kubernetes Prow Robot
332d17c7f5
Merge pull request #95731 from farah/split-scheduler
Delete framework/v1alpha1 folder and change remaining import paths
2020-10-30 11:14:22 -07:00
Ryan Phillips
4fdfbc718c devicemanager: fix race in stub
There is a race between the server coming up and the subsequent dial on
the socket. Fix the race with a PollImmediate retry.
2020-10-30 11:42:01 -05:00
Kubernetes Prow Robot
94cedd9f14
Merge pull request #95720 from draveness/feature/topology-manager-format
style: update comments in topology manager
2020-10-27 10:36:38 -07:00
draveness
60d3f99b1f style: update comments in topology manager 2020-10-23 18:20:50 +08:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
chenyw1990
009d46f834 write checkpoint only when allocated devices updated. 2020-10-22 22:45:04 +08:00
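A minimal sketch of the idea in 009d46f834, assuming allocations are kept in an in-memory map keyed by a string: the checkpoint is written only when the stored device list for a key actually changes. The `manager`, `allocations`, and `writeCheckpoint` names are hypothetical; `writeCheckpoint` only counts writes instead of touching disk.

```go
package main

import (
	"fmt"
	"reflect"
)

// allocations maps a hypothetical "pod/container/resource" key to device IDs.
type allocations map[string][]string

// manager is a hypothetical stand-in for the device manager's state.
type manager struct {
	allocated allocations
	writes    int // counts checkpoint writes instead of touching disk
}

// writeCheckpoint stands in for persisting the allocation state to disk.
func (m *manager) writeCheckpoint() { m.writes++ }

// updateAllocation records devs for key and writes a checkpoint only when
// the stored allocation actually changed.
func (m *manager) updateAllocation(key string, devs []string) {
	if prev, ok := m.allocated[key]; ok && reflect.DeepEqual(prev, devs) {
		return // unchanged: skip the write
	}
	m.allocated[key] = devs
	m.writeCheckpoint()
}

func main() {
	m := &manager{allocated: allocations{}}
	m.updateAllocation("pod1/ctr/example.com/gpu", []string{"dev0"})
	m.updateAllocation("pod1/ctr/example.com/gpu", []string{"dev0"}) // no-op
	m.updateAllocation("pod1/ctr/example.com/gpu", []string{"dev1"})
	fmt.Println("checkpoint writes:", m.writes) // checkpoint writes: 2
}
```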
Kubernetes Prow Robot
6ac2930ef0
Merge pull request #94574 from auxten/pkg-kubelet-staticchecks
Fix pkg/kubelet static checks
2020-09-21 21:22:47 -07:00
Srini Brahmaroutu
fbe5daed73 Change code to use staging/k8s.io/mount-utils 2020-09-16 21:51:24 -07:00
Kubernetes Prow Robot
09b3f6dbb3
Merge pull request #93214 from trashhalo/prefer-error
test: prefer NoError/Error over Nil/NotNil
2020-09-16 15:10:45 -07:00
Renaud Gaubert
4eadf40448 Run gofmt
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 06:22:44 -07:00
Renaud Gaubert
ba95a8c641 run hack/update-vendor.sh
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Renaud Gaubert
60304452ff Move podresources api to k8s.io/kubelet/pkg/apis
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Kubernetes Prow Robot
119c94214c
Merge pull request #93931 from SataQiu/fix-kubelet-swap-20200812
kubelet: assume that swap is disabled when /proc/swaps does not exist
2020-09-11 04:20:14 -07:00
Kubernetes Prow Robot
293a53f2c0
Merge pull request #94140 from derekwaynecarr/pid-ga
Promote PidLimits to GA
2020-09-09 06:35:52 -07:00
auxten
a9c1acc044 Fix staticchecks ST1005,S1002,S1008,S1039 in pkg/kubelet 2020-09-07 10:53:43 +08:00
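For reference, the four staticcheck checks named in a9c1acc044 flag patterns like the ones below: ST1005 (error strings should not be capitalized or end with punctuation), S1002 (omit comparison to a boolean constant), S1008 (return a boolean expression directly), and S1039 (unnecessary use of fmt.Sprint). The identifiers are made up for illustration and are not taken from the pkg/kubelet diff; each comment shows the flagged form and the code shows the fixed form.

```go
package main

import (
	"errors"
	"fmt"
)

// ST1005. Flagged: errors.New("Failed to read checkpoint.")
var errReadCheckpoint = errors.New("failed to read checkpoint")

// S1002. Flagged: if enabled == true { ... }
func describe(enabled bool) string {
	if enabled {
		return "enabled"
	}
	return "disabled"
}

// S1008. Flagged: if n > 0 { return true }; return false
func isPositive(n int) bool {
	return n > 0
}

// S1039. Flagged: name := fmt.Sprint("kubelet")
const name = "kubelet"

func main() {
	fmt.Println(errReadCheckpoint, describe(true), isPositive(3), name)
}
```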
Kubernetes Prow Robot
60c421b6f6
Merge pull request #94541 from liggitt/deflake-cpucheckpoint
Deflake cpumanager checkpoint unit tests
2020-09-04 18:47:40 -07:00
Stephen Solka
203679cc61 prefer NoError/Error over Nil/NotNil 2020-09-04 18:35:52 -04:00
Jordan Liggitt
7268b1d557 Deflake cpumanager checkpoint unit tests 2020-09-04 15:06:04 -04:00
Jordan Liggitt
803da10d8b Use unique socket name per cm test 2020-09-04 14:55:23 -04:00
Kubernetes Prow Robot
81bf1f8789
Merge pull request #90980 from AlexeyPerevalov/GetNUMANodeInfo
Avoid using socket for hints in generateCPUTopologyHints
2020-09-02 03:41:06 -07:00
Kubernetes Prow Robot
f19118eea8
Merge pull request #94111 from giuseppe/fix-cgroup-v2-cgroupfs-path
kubelet, cgroupv2: do not create /sys/fs/cgroup/sys with cgroupfs
2020-09-01 19:41:33 -07:00
Kubernetes Prow Robot
2e59a17dc1
Merge pull request #92288 from zhijianli88/cleanup-tempfiles
Cleanup tempfiles
2020-08-27 17:56:54 -07:00
Derek Carr
6f2153986a Promote PidLimits to GA 2020-08-24 13:57:48 -04:00
Giuseppe Scrivano
49cbf91fce
kubelet, cgroupv2: do not create /sys/fs/cgroup/sys with cgroupfs
Closes: https://github.com/kubernetes/kubernetes/issues/94104

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-08-19 22:29:38 +02:00
SataQiu
ad1739f8bc kubelet: assume that swap is disabled when /proc/swaps does not exist 2020-08-12 22:43:58 +08:00
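One way to express the rule from ad1739f8bc, sketched under the assumption that /proc/swaps holds a header line followed by one line per active swap device: a missing file is treated as "swap disabled" instead of as an error. The function names are hypothetical, not kubelet's.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strings"
)

// swapEnabledFromContents reports whether any swap device is listed. The first
// line is a column header, so more than one non-empty line means swap is on.
func swapEnabledFromContents(contents string) bool {
	lines := 0
	for _, l := range strings.Split(contents, "\n") {
		if strings.TrimSpace(l) != "" {
			lines++
		}
	}
	return lines > 1
}

// swapEnabled reads the swaps file (normally /proc/swaps). A missing file is
// treated as "swap disabled" rather than as an error.
func swapEnabled(path string) (bool, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return swapEnabledFromContents(string(data)), nil
}

func main() {
	on, err := swapEnabled("/proc/swaps")
	fmt.Println("swap enabled:", on, "error:", err)
}
```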
Jordan Liggitt
f33dc28094 generated: hack/update-hack-tools.sh && hack/update-vendor.sh 2020-07-25 16:45:02 -04:00
Alexey Perevalov
a047e8aa1b move to cadvisor.MachineInfo
This patch removes GetNUMANodeInfo; cadvisor.MachineInfo will be used
instead. GetNUMANodeInfo was introduced because MachineInfo.Topology had
different meanings on different architectures: on Arm it listed NUMA
nodes, but on x86 it represented sockets (since it was read from
/proc/cpuinfo). This is now unified, and MachineInfo.Topology represents
NUMA nodes.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-24 09:29:41 -04:00
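Once MachineInfo.Topology is read as a list of NUMA nodes, a CPU-to-NUMA-node lookup falls out of a simple walk over it. The `Node` and `Core` types below are local stand-ins that copy only the `Id`, `Cores`, and `Threads` fields of cadvisor's `info.Node` and `info.Core` (an assumption made to keep the example self-contained); `cpuToNUMANode` is a hypothetical helper.

```go
package main

import "fmt"

// Core is a local stand-in for cadvisor's info.Core (only the fields used here).
type Core struct {
	Id      int
	Threads []int // logical CPU IDs on this core
}

// Node is a local stand-in for cadvisor's info.Node, read as one NUMA node.
type Node struct {
	Id    int // NUMA node ID
	Cores []Core
}

// cpuToNUMANode flattens the topology into a CPU ID -> NUMA node ID map,
// treating each Node entry as a NUMA node rather than a socket.
func cpuToNUMANode(topology []Node) map[int]int {
	m := make(map[int]int)
	for _, node := range topology {
		for _, core := range node.Cores {
			for _, cpu := range core.Threads {
				m[cpu] = node.Id
			}
		}
	}
	return m
}

func main() {
	topo := []Node{
		{Id: 0, Cores: []Core{{Id: 0, Threads: []int{0, 2}}}},
		{Id: 1, Cores: []Core{{Id: 0, Threads: []int{1, 3}}}},
	}
	fmt.Println(cpuToNUMANode(topo)) // map[0:0 1:1 2:0 3:1]
}
```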
Alexey Perevalov
e33ba9e974 Avoid using socket for hints
Sockets don't affect performance the way NUMA nodes do: a NUMA node
has a dedicated memory controller, whereas a socket is only a physical
extension point. A socket is a CPU-specific concept, so it is strange to
merge the bitmasks of the device plugins and the CPU manager when the
CPU manager takes sockets into account.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-22 05:14:34 -04:00
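The socket-free hint generation argued for in e33ba9e974 can be illustrated as follows: every non-empty set of NUMA nodes is tried, a set becomes a hint if its available CPUs cover the request, and the smallest such sets are marked preferred. This is a simplified, hypothetical sketch (`hint`, `numaHints`, a `uint64` mask, fewer than 64 NUMA nodes), not the cpumanager's `generateCPUTopologyHints`; in particular, it derives the preferred size from available CPUs only.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hint is a hypothetical simplified topology hint: a set of NUMA nodes as a
// bitmask (bit i = NUMA node i) plus whether it is a preferred placement.
type hint struct {
	numaMask  uint64
	preferred bool
}

// numaHints tries every non-empty set of NUMA nodes and keeps those whose
// available CPUs can satisfy request. Sockets are never consulted.
// cpuToNUMA maps CPU ID -> NUMA node ID; numNodes must be below 64.
func numaHints(cpuToNUMA map[int]int, available []int, request, numNodes int) []hint {
	var hints []hint
	minSize := numNodes + 1
	for mask := uint64(1); mask < 1<<uint(numNodes); mask++ {
		count := 0
		for _, cpu := range available {
			if mask&(1<<uint(cpuToNUMA[cpu])) != 0 {
				count++
			}
		}
		if count < request {
			continue
		}
		hints = append(hints, hint{numaMask: mask})
		if n := bits.OnesCount64(mask); n < minSize {
			minSize = n
		}
	}
	// Prefer the hints that span the fewest NUMA nodes.
	for i := range hints {
		hints[i].preferred = bits.OnesCount64(hints[i].numaMask) == minSize
	}
	return hints
}

func main() {
	// CPUs 0-3 on NUMA node 0, CPUs 4-7 on NUMA node 1.
	cpuToNUMA := map[int]int{0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
	for _, h := range numaHints(cpuToNUMA, []int{0, 1, 4, 5, 6}, 3, 2) {
		fmt.Printf("mask=%02b preferred=%v\n", h.numaMask, h.preferred)
	}
}
```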
Kubernetes Prow Robot
b6174e605f
Merge pull request #93189 from klueska/upstream-fix-bug-topology-manager
Fix a bug whereby reusable CPUs and devices were not being honored
2020-07-21 04:35:17 -07:00
Kubernetes Prow Robot
1fdd8fb213
Merge pull request #93263 from liggitt/windows
Fix windows kubelet startup
2020-07-20 19:51:57 -07:00
Jordan Liggitt
886727a4c0 Revert "Add deviceManager in windows container manager"
This reverts commit 056d73b1a1.
2020-07-20 16:13:53 -04:00
Giuseppe Scrivano
ef935bd991
kubelet: clamp cpu shares to max allowed
Clamp cpu.shares to the maximum value allowed by the kernel.

This is not an issue when using cgroupfs, as the kernel makes sure the
value is in range and clamps it automatically; systemd, however, has an
additional check that prevents the cgroup from being created.

Closes: https://github.com/kubernetes/kubernetes/issues/92855

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-20 17:18:03 +02:00
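A sketch of the clamping described in ef935bd991, assuming the cgroup v1 kernel bounds of 2 and 262144 for cpu.shares and the usual 1024-shares-per-CPU conversion. `milliCPUToShares` here is a hypothetical standalone function written for illustration.

```go
package main

import "fmt"

// Kernel bounds for cgroup v1 cpu.shares, plus the millicore conversion factors.
const (
	minShares     = 2
	maxShares     = 262144
	sharesPerCPU  = 1024
	milliCPUToCPU = 1000
)

// milliCPUToShares converts a CPU request in millicores to cpu.shares and
// clamps the result into the range the kernel accepts, so that systemd's
// range check never rejects the value.
func milliCPUToShares(milliCPU int64) uint64 {
	if milliCPU == 0 {
		return minShares
	}
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}

func main() {
	fmt.Println(milliCPUToShares(500), milliCPUToShares(300000)) // 512 262144
}
```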
Kevin Klues
00df26a985 Fix a bug whereby reusable CPUs and devices were not being honored
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.

As such, it would sometimes:
1) Generate a hint for a different NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error

This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This functionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
2020-07-20 11:41:13 +00:00
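The fix in 00df26a985 can be pictured with a hypothetical device-only sketch: a NUMA mask is kept as a hint only if it contains the node of every reusable device, and reusable devices count toward the request alongside free ones. The function name, the one-NUMA-node-per-device map, and the `uint64` mask are assumptions for illustration, not the devicemanager code.

```go
package main

import "fmt"

// deviceHints tries every non-empty NUMA mask (bit i = NUMA node i) and keeps
// those that (1) include the node of every reusable device and (2) hold at
// least request devices, counting reusable plus free devices in the mask.
// devNUMA maps device ID -> NUMA node; numNodes must be below 64.
func deviceHints(devNUMA map[string]int, free, reusable []string, request, numNodes int) []uint64 {
	var hints []uint64
	for mask := uint64(1); mask < 1<<uint(numNodes); mask++ {
		inMask := func(d string) bool { return mask&(1<<uint(devNUMA[d])) != 0 }
		covered := true
		for _, d := range reusable {
			if !inMask(d) {
				covered = false
				break
			}
		}
		if !covered {
			continue // skip masks that would strand a reusable device
		}
		count := len(reusable)
		for _, d := range free {
			if inMask(d) {
				count++
			}
		}
		if count >= request {
			hints = append(hints, mask)
		}
	}
	return hints
}

func main() {
	devNUMA := map[string]int{"gpu0": 0, "gpu1": 0, "gpu2": 1, "gpu3": 1}
	// gpu2 was released by an init container and can be reused.
	fmt.Println(deviceHints(devNUMA, []string{"gpu0", "gpu1", "gpu3"}, []string{"gpu2"}, 2, 2)) // [2 3]
}
```

With `gpu2` on node 1 left over from an init container, node 0 alone is never offered even though it has two free devices; that is the misplacement the commit message lists as case 1.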
Kevin Klues
74fe9364c3 Simplify logic in devicemanager TopologyHint generation 2020-07-20 11:41:13 +00:00
Kevin Klues
9f5f401d60 Add AnySet() to topologymanager bitmask API 2020-07-20 11:41:13 +00:00
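A minimal sketch of what an `AnySet()` query on a NUMA bitmask does: it reports whether at least one of the given bits is set, which is handy when a device has affinity to several NUMA nodes. `bitMask`, `newBitMask`, and `IsSet` are hypothetical stand-ins written here; the real topologymanager bitmask package may differ in types and signatures.

```go
package main

import "fmt"

// bitMask is a hypothetical minimal NUMA-node bitmask (bit i = NUMA node i).
type bitMask uint64

// newBitMask returns a mask with the given bits set.
func newBitMask(bits ...int) bitMask {
	var m bitMask
	for _, b := range bits {
		m |= 1 << uint(b)
	}
	return m
}

// IsSet reports whether bit is set.
func (m bitMask) IsSet(bit int) bool {
	return bit >= 0 && bit < 64 && m&(1<<uint(bit)) != 0
}

// AnySet reports whether at least one of bits is set in m.
func (m bitMask) AnySet(bits []int) bool {
	for _, b := range bits {
		if m.IsSet(b) {
			return true
		}
	}
	return false
}

func main() {
	mask := newBitMask(1) // NUMA node 1 only
	fmt.Println(mask.AnySet([]int{0, 1}), mask.AnySet([]int{0, 2})) // true false
}
```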
Kubernetes Prow Robot
242f3d9dce
Merge pull request #80917 from aarnaud/windows-devicemanager
Port deviceManager to windows container manager to enable GPU access
2020-07-17 21:04:50 -07:00
Giuseppe Scrivano
79be8be10e
kubelet, cgroupv2: make hugetlb optional
make the hugetlb controller optional when cgroup v2 is used.

Closes: https://github.com/kubernetes/kubernetes/issues/92933

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-13 09:40:55 +02:00
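Making hugetlb optional implies checking whether the controller is available before configuring it. A minimal sketch, assuming the unified hierarchy is mounted at /sys/fs/cgroup so that /sys/fs/cgroup/cgroup.controllers lists the available controllers separated by spaces; `hasController` is a hypothetical helper, not kubelet code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// hasController reports whether name appears in the space-separated contents
// of a cgroup v2 cgroup.controllers file.
func hasController(contents, name string) bool {
	for _, c := range strings.Fields(contents) {
		if c == name {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/sys/fs/cgroup/cgroup.controllers")
	if err != nil {
		fmt.Println("cannot read controllers:", err)
		return
	}
	if hasController(string(data), "hugetlb") {
		fmt.Println("hugetlb available: configure hugetlb limits")
	} else {
		fmt.Println("hugetlb not available: skip hugetlb limits")
	}
}
```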
Kubernetes Prow Robot
63926cf8e7
Merge pull request #92862 from giuseppe/cgroup-fix-leaks
vendor: update github.com/opencontainers/runc
2020-07-11 20:57:11 -07:00
Giuseppe Scrivano
0d2a493a8f
kubelet: skip setting the devices cgroup
Use the new libcontainer feature to skip setting the devices cgroup.
This is necessary on cgroup v2 to avoid leaking an eBPF program every
time the cgroup is reconfigured.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
2020-07-09 09:37:46 +02:00
Anthony ARNAUD
056d73b1a1
Add deviceManager in windows container manager 2020-07-08 18:22:16 +02:00
Kevin Klues
26cb650655 Remove unnecessary union after call to GetPreferredAllocation()
There is no need to try and allocate already-allocated devices again.
2020-07-07 06:35:57 +00:00
Kevin Klues
67ecc11c44 Harden callGetPreferredAllocationIfAvailable() return value
Previously, we didn't check the contents of the result after calling out
to the plugin endpoint. This could have resulted in errors if the plugin
returned either 'nil' or an empty result. This patch fixes this.
2020-07-07 06:35:57 +00:00
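The hardening in 67ecc11c44 amounts to validating the plugin's reply before reading from it. In this sketch the response types are local, hypothetical mirrors of the device plugin API's preferred-allocation response (a list of container responses, each carrying `DeviceIDs`), and `preferredDevices` is an illustrative helper rather than `callGetPreferredAllocationIfAvailable()` itself.

```go
package main

import (
	"errors"
	"fmt"
)

// containerPreferredResponse is a hypothetical mirror of one container's reply.
type containerPreferredResponse struct {
	DeviceIDs []string
}

// preferredAllocationResponse is a hypothetical mirror of the plugin's reply.
type preferredAllocationResponse struct {
	ContainerResponses []*containerPreferredResponse
}

// preferredDevices validates the reply before using it, rejecting nil or
// empty results instead of indexing into them.
func preferredDevices(resp *preferredAllocationResponse) ([]string, error) {
	if resp == nil {
		return nil, errors.New("plugin returned a nil response")
	}
	if len(resp.ContainerResponses) == 0 || resp.ContainerResponses[0] == nil {
		return nil, errors.New("plugin returned an empty response")
	}
	return resp.ContainerResponses[0].DeviceIDs, nil
}

func main() {
	resp := &preferredAllocationResponse{
		ContainerResponses: []*containerPreferredResponse{{DeviceIDs: []string{"dev1", "dev2"}}},
	}
	ids, err := preferredDevices(resp)
	fmt.Println(ids, err) // [dev1 dev2] <nil>
	_, err = preferredDevices(nil)
	fmt.Println(err) // plugin returned a nil response
}
```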
Kevin Klues
d87365494a Fix bug in call to callGetPreferredAllocationIfAvailable()
Previously, we were passing the variable 'devices' to this function,
when we should have been passing 'allocated'. This bug crept in due to a
variable name change that didn't propagate its way through the entire
function. The tests added in the previous commit would have caught this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d551ab1e78 Add tests to check parameters passed to GetPreferredAllocation()
These tests uncovered some small bugs that will be fixed in a subsequent
set of commits.
2020-07-07 06:35:57 +00:00
Kevin Klues
5bd0db0b1f Add new test cases for GetPreferredAllocation() in allocation path 2020-07-03 13:01:32 +00:00
Kevin Klues
83f18d9975 Remove unnecessary field from TestTopologyAlignedAllocation() test cases 2020-07-03 13:01:32 +00:00
Kevin Klues
bb08fd1135 Add a simple endpoint test for GetPreferredAllocation()
More extensive tests that exercise the allocation logic are to follow.
2020-07-03 13:01:32 +00:00
Kevin Klues
cbd405d85c Update existing tests in support of GetPreferredAllocation() 2020-07-03 13:01:32 +00:00
Kevin Klues
a780ccff5b Updates logic in devicesToAllocate() to call GetPreferredAllocation() 2020-07-02 22:07:27 +00:00