Kubernetes Prow Robot
06a7e2bacf
Merge pull request #96781 from fighterhit/fix-kukelet-device-plugin-bug
...
Fix: kubelet returns an error when a device plugin sets PreStartRequired to true while creating pods with 0 resources
2021-01-25 17:59:00 -08:00
fighterhit
16c6b99fcd
del unused value
2021-01-13 12:43:54 +08:00
fighterhit
24dd9b1f04
add a test to demonstrate PR#96781
2021-01-13 11:27:30 +08:00
Kubernetes Prow Robot
b37e9a440e
Merge pull request #97193 from JornShen/flaky_devicemanager_test
...
[flaky test] fix devicemanager TestDevicePluginReRegistrationProbeMode failed
2021-01-05 11:46:21 -08:00
Anthony ARNAUD
6013aaa370
use Lstat instead of Stat for unix socket on windows
2020-12-29 15:14:29 -05:00
Anthony ARNAUD
8bdc3d8970
Port deviceManager in windows container manager
2020-12-16 00:25:26 -05:00
jornshen
93606f8ba3
[flaky test] fix devicemanager TestDevicePluginReRegistrationProbeMode fail
2020-12-10 21:07:49 +08:00
fighterhit
0eaceb7eb5
Fix: kubelet returns an error when a device plugin sets PreStartRequired to true while creating pods with 0 resources
2020-11-21 22:44:27 +08:00
Alexey Perevalov
5e6aed4137
Fixes segfault in case of empty TopologyInfo
...
A device plugin that implements the v1beta interface can return nil in
the Topology field, for example nvidia-gpu-deviceplugin:
3520254b75/nvidia.go (L147)
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-13 11:51:47 +03:00
Krzysztof Wiatrzyk
b7714918db
Run ./update-all.sh
...
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
7b0ccaa1e9
Add tests for getPodDeviceRequest() for devicemanager
...
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
ba1e8abce7
Add tests for GetPodTopologyHints() for devicemanager
...
* Add additional test cases returned by getPodScopeTestCases()
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
1c4a1ba6ae
Update topology hints tests to use pod object for devicemanager
...
Pod object is more flexible to use and construct
* Update TestGetTopologyHints() to work according to new test cases
* Update topologyHintTestCase{} to include proper field
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
7ad65bf22d
Add tests for GetPodTopologyHints() for cpumanager
...
* Add tests for getPodRequestedCPU()
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Byonggon Chun
9da0912a33
Implement devicemanager.GetPodLevelTopologyHints() function
...
* Add podDevices() func
* Add getPodDeviceRequest() func
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
6db58b2e92
Update logging to use a format util
...
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
f5997fe537
Add GetPodTopologyHints() interface to Topology/CPU/Device Manager
...
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:54 +01:00
Alexey Perevalov
a8b8995ef2
Implement TopologyInfo and cpu_ids in podresources
...
It covers deviceplugin & cpumanager.
It has a drawback: cpuset and all other structs, including cadvisor's, keep
CPU IDs as int, but a protobuf-based interface is better served by a
fixed-size int.
This patch also introduces an additional interface, CPUsProvider, while
DeviceProvider might have been extended too.
The checkpoint is not covered by unit tests.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:49 +03:00
Alexey Perevalov
62326a1846
Convert podDevices to struct
...
PodDevices will have its own guard
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:48 +03:00
Alexey Perevalov
9f54dccc92
Change GetDevices interface
...
This change is necessary for supporting Topology in the ContainerDevices.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 12:41:31 +03:00
Kubernetes Prow Robot
bbc26ba7e6
Merge pull request #96048 from rphillips/fixes/device_plugin_stub_race
...
Deflake TestDevicePluginReRegistrationProbeMode: Devices of previous registered should be removed
2020-11-02 08:20:54 -08:00
Kubernetes Prow Robot
332d17c7f5
Merge pull request #95731 from farah/split-scheduler
...
Delete framework/v1alpha1 folder and change remaining import paths
2020-10-30 11:14:22 -07:00
Ryan Phillips
4fdfbc718c
devicemanager: fix race in stub
...
There is a race when the server is coming up and the subsequent dial on
the socket. Fix the race with a PollImmediate retry.
2020-10-30 11:42:01 -05:00
Ali
bfdeda58b7
Delete framework/v1alpha1 folder and change remaining import paths
2020-10-23 13:16:13 +11:00
chenyw1990
009d46f834
Write checkpoint only when allocated devices are updated.
2020-10-22 22:45:04 +08:00
Renaud Gaubert
4eadf40448
Run gofmt
...
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 06:22:44 -07:00
Renaud Gaubert
ba95a8c641
run hack/update-vendor.sh
...
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Renaud Gaubert
60304452ff
Move podresources api to k8s.io/kubelet/pkg/apis
...
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Jordan Liggitt
803da10d8b
Use unique socket name per cm test
2020-09-04 14:55:23 -04:00
Kubernetes Prow Robot
81bf1f8789
Merge pull request #90980 from AlexeyPerevalov/GetNUMANodeInfo
...
Avoid using socket for hints in generateCPUTopologyHints
2020-09-02 03:41:06 -07:00
Kubernetes Prow Robot
2e59a17dc1
Merge pull request #92288 from zhijianli88/cleanup-tempfiles
...
Cleanup tempfiles
2020-08-27 17:56:54 -07:00
Alexey Perevalov
a047e8aa1b
move to cadvisor.MachineInfo
...
This patch removes GetNUMANodeInfo; cadvisor.MachineInfo will be used
instead. GetNUMANodeInfo was introduced because the meaning of
MachineInfo.Topology differed by architecture: on arm it represented NUMA
nodes, but on x86 it represented sockets (since it was read from
/proc/cpuinfo). Now it is unified and MachineInfo.Topology represents NUMA nodes.
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-24 09:29:41 -04:00
Kevin Klues
00df26a985
Fix a bug whereby reusable CPUs and devices were not being honored
...
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.
As such, it would sometimes:
1) Generate a hint for a different NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error
This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This functionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
2020-07-20 11:41:13 +00:00
Kevin Klues
74fe9364c3
Simplify logic in devicemanager TopologyHint generation
2020-07-20 11:41:13 +00:00
Kevin Klues
26cb650655
Remove unnecessary union after call to GetPreferredAllocation()
...
There is no need to try and allocate already-allocated devices again.
2020-07-07 06:35:57 +00:00
Kevin Klues
67ecc11c44
Harden callGetPreferredAllocationIfAvailable() return value
...
Previously, we didn't check the contents of the result after calling out
to the plugin endpoint. This could have resulted in errors if the plugin
returned either 'nil' or an empty result. This patch fixes this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d87365494a
Fix bug in call to callGetPreferredAllocationIfAvailable()
...
Previously, we were passing the variable 'devices' to this function,
when we should have been passing 'allocated'. This bug crept in due to a
variable name change that didn't propagate its way through the entire
function. The tests added in the previous commit would have caught this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d551ab1e78
Add tests to check parameters passed to GetPreferredAllocation()
...
These tests uncovered some small bugs that will be fixed in a subsequent
set of commits.
2020-07-07 06:35:57 +00:00
Kevin Klues
5bd0db0b1f
Add new test cases for GetPreferredAllocation() in allocation path
2020-07-03 13:01:32 +00:00
Kevin Klues
83f18d9975
Remove unnecessary field from TestTopologyAlignedAllocation() test cases
2020-07-03 13:01:32 +00:00
Kevin Klues
bb08fd1135
Add a simple endpoint test for GetPreferredAllocation()
...
More extensive tests that exercise the allocation logic are to follow.
2020-07-03 13:01:32 +00:00
Kevin Klues
cbd405d85c
Update existing tests in support of GetPreferredAllocation()
2020-07-03 13:01:32 +00:00
Kevin Klues
a780ccff5b
Updates logic in devicesToAllocate() to call GetPreferredAllocation()
2020-07-02 22:07:27 +00:00
Kevin Klues
bb56a09133
Add callGetPreferredAllocationIfAvailable() function in devicemanager
...
This function mimics what is already done for the conditional call to
PreStartContainer() via the callPreStartContainerIfNeeded() function.
2020-07-02 22:07:27 +00:00
Kevin Klues
abf87c99c6
Add GetPreferredAllocation() as a supported device plugin endpoint
2020-07-02 15:15:50 +00:00
Kevin Klues
32c047a52e
Update device plugin stub with new GetPreferredAllocation() call
2020-07-02 15:15:48 +00:00
Kevin Klues
c45f1317eb
Fix some whitespacing and comments in devicemanager
2020-07-02 15:15:44 +00:00
Li Zhijian
02eaa4f354
cleanup tempfiles in unit test
...
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
2020-06-23 11:47:18 +08:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
...
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
...
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00