Commit Graph

165 Commits

Author SHA1 Message Date
Elana Hashman
6af7eb6d49 Migrate missed log entries in kubelet
Co-Authored-By: pacoxu <paco.xu@daocloud.io>
2021-03-18 14:26:26 -07:00
Kubernetes Prow Robot
e082d84575 Merge pull request #100196 from ehashman/remains-of-logs
Migrate remaining logs to structured logging
2021-03-16 13:12:55 -07:00
Elana Hashman
ee0bcac1d2 Migrate devicemanager/topology_hints.go to structured logs 2021-03-15 12:39:45 -07:00
Amim Knabben
c1d24c87bb Migrate devicemanager to structured logging 2021-03-14 11:57:06 -04:00
Francesco Romani
ad68f9588c node: podresources: make GetDevices() consistent
We want to make the return type of the GetDevices() method of the
podresources DevicesProvider interface consistent with
that of the newly added GetAllocatableDevices method.
This makes the code easier to read and reduces the coupling between
the podresourcesapi server and the devicemanager code.

No intended changes in behaviour, but the different return types
now require some data massaging. Tests are updated accordingly.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
6d33354e4c node: podresources: implement GetAllocatableResources API
Extend the podresources API implementing the GetAllocatableResources endpoint,
as specified in the KEPs:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
https://github.com/kubernetes/enhancements/pull/2404

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Benjamin Elder
56e092e382 hack/update-bazel.sh 2021-02-28 15:17:29 -08:00
Kubernetes Prow Robot
06a7e2bacf Merge pull request #96781 from fighterhit/fix-kukelet-device-plugin-bug
Fix: kubelet returns an error when a device plugin sets PreStartRequired to true while creating pods with 0 resources
2021-01-25 17:59:00 -08:00
fighterhit
16c6b99fcd del unused value 2021-01-13 12:43:54 +08:00
fighterhit
24dd9b1f04 add a test to demonstrate PR#96781 2021-01-13 11:27:30 +08:00
Kubernetes Prow Robot
b37e9a440e Merge pull request #97193 from JornShen/flaky_devicemanager_test
[flaky test] fix devicemanager TestDevicePluginReRegistrationProbeMode failure
2021-01-05 11:46:21 -08:00
Anthony ARNAUD
6013aaa370 use Lstat instead of Stat for unix socket on windows 2020-12-29 15:14:29 -05:00
Anthony ARNAUD
8bdc3d8970 Port deviceManager in windows container manager 2020-12-16 00:25:26 -05:00
jornshen
93606f8ba3 [flaky test] fix devicemanager TestDevicePluginReRegistrationProbeMode failure 2020-12-10 21:07:49 +08:00
fighterhit
0eaceb7eb5 Fix: kubelet returns an error when a device plugin sets PreStartRequired to true while creating pods with 0 resources 2020-11-21 22:44:27 +08:00
Alexey Perevalov
5e6aed4137 Fixes segfault in case of empty TopologyInfo
A device plugin that implements the v1beta1 interface can return nil in the
Topology field

For example nvidia-gpu-deviceplugin
3520254b75/nvidia.go (L147)
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-13 11:51:47 +03:00
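The crash comes from dereferencing a Topology pointer that a plugin is allowed to leave nil. A minimal Go sketch of the guard follows, assuming simplified stand-ins for the v1beta1 `Device`, `TopologyInfo` and `NUMANode` messages (the real ones are generated protobuf structs); `numaNodesOf` is a hypothetical helper, not the kubelet function that was patched.

```go
package main

// Simplified stand-ins for the device plugin v1beta1 messages (assumed shapes,
// not imported from k8s.io/kubelet).
type NUMANode struct {
	ID int64
}

type TopologyInfo struct {
	Nodes []*NUMANode
}

type Device struct {
	ID       string
	Health   string
	Topology *TopologyInfo // a plugin may leave this nil
}

// numaNodesOf returns the NUMA node IDs a device is attached to. A nil device,
// a nil Topology, or nil entries in Nodes yield no IDs instead of a panic.
func numaNodesOf(d *Device) []int64 {
	if d == nil || d.Topology == nil {
		return nil
	}
	ids := make([]int64, 0, len(d.Topology.Nodes))
	for _, n := range d.Topology.Nodes {
		if n == nil {
			continue
		}
		ids = append(ids, n.ID)
	}
	return ids
}
```

Callers can then treat "no NUMA nodes" as "no topology preference" rather than special-casing the nil pointer everywhere.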
Krzysztof Wiatrzyk
b7714918db Run ./update-all.sh
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
7b0ccaa1e9 Add tests for getPodDeviceRequest() for devicemanager
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
ba1e8abce7 Add tests for GetPodTopologyHints() for devicemanager
* Add additional test cases returned by getPodScopeTestCases()

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
1c4a1ba6ae Update topology hints tests to use pod object for devicemanager
Pod object is more flexible to use and construct
* Update TestGetTopologyHints() to work according to new test cases
* Update topologyHintTestCase{} to include proper field

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
7ad65bf22d Add tests for GetPodTopologyHints() for cpumanager
* Add tests for getPodRequestedCPU()

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Byonggon Chun
9da0912a33 Implement devicemanager.GetPodLevelTopologyHints() function
* Add podDevices() func
* Add getPodDeviceRequest() func

Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Krzysztof Wiatrzyk
6db58b2e92 Update logging to use a format util
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
sw.han
f5997fe537 Add GetPodTopologyHints() interface to Topology/CPU/Device Manager
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:54 +01:00
Alexey Perevalov
a8b8995ef2 Implement TopologyInfo and cpu_ids in podresources
It covers deviceplugin & cpumanager.

It has a drawback: cpuset and all other structs, including cadvisor's, keep
CPU IDs as int, whereas a protobuf-based interface is better served by a
fixed-width integer type.
This patch also introduces an additional interface, CPUsProvider, although
extending DeviceProvider would also have been an option.

Checkpointing is not covered by unit tests.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:49 +03:00
Alexey Perevalov
62326a1846 Convert podDevices to struct
PodDevices will have its own guard

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:48 +03:00
Alexey Perevalov
9f54dccc92 Change GetDevices interface
This change is necessary for supporting Topology in the ContainerDevices.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 12:41:31 +03:00
Kubernetes Prow Robot
bbc26ba7e6 Merge pull request #96048 from rphillips/fixes/device_plugin_stub_race
Deflake TestDevicePluginReRegistrationProbeMode: devices of a previously registered plugin should be removed
2020-11-02 08:20:54 -08:00
Kubernetes Prow Robot
332d17c7f5 Merge pull request #95731 from farah/split-scheduler
Delete framework/v1alpha1 folder and change remaining import paths
2020-10-30 11:14:22 -07:00
Ryan Phillips
4fdfbc718c devicemanager: fix race in stub
There is a race between the server coming up and the subsequent dial on
the socket. Fix the race with a PollImmediate retry.
2020-10-30 11:42:01 -05:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
chenyw1990
009d46f834 Write checkpoint only when allocated devices are updated. 2020-10-22 22:45:04 +08:00
Renaud Gaubert
4eadf40448 Run gofmt
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 06:22:44 -07:00
Renaud Gaubert
ba95a8c641 run hack/update-vendor.sh
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Renaud Gaubert
60304452ff Move podresources api to k8s.io/kubelet/pkg/apis
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Jordan Liggitt
803da10d8b Use unique socket name per cm test 2020-09-04 14:55:23 -04:00
Kubernetes Prow Robot
81bf1f8789 Merge pull request #90980 from AlexeyPerevalov/GetNUMANodeInfo
Avoid using socket for hints in generateCPUTopologyHints
2020-09-02 03:41:06 -07:00
Kubernetes Prow Robot
2e59a17dc1 Merge pull request #92288 from zhijianli88/cleanup-tempfiles
Cleanup tempfiles
2020-08-27 17:56:54 -07:00
Alexey Perevalov
a047e8aa1b move to cadvisor.MachineInfo
This patch removes GetNUMANodeInfo; cadvisor.MachineInfo will be used
instead. GetNUMANodeInfo was introduced because MachineInfo.Topology had
different meanings on different architectures: on Arm it represented NUMA
nodes, but on x86 it represented sockets (since it was read from
/proc/cpuinfo). This is now unified, and MachineInfo.Topology represents
NUMA nodes.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-24 09:29:41 -04:00
Kevin Klues
00df26a985 Fix a bug whereby reusable CPUs and devices were not being honored
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.

As such, it would sometimes:
1) Generate a hint for a different NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error

This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This functionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
2020-07-20 11:41:13 +00:00
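The fix amounts to making hint generation aware of reusable devices: a NUMA mask is only worth hinting if it covers every reusable device, and those devices count toward the request. The Go sketch below is a simplified illustration under loud assumptions, not the kubelet's code: masks are plain `uint` bitmasks, each device sits on exactly one NUMA node, and `hint` and `generateHints` are hypothetical names.

```go
package main

import "math/bits"

// hint is a simplified stand-in for a TopologyManager hint.
type hint struct {
	mask      uint // bit i set means NUMA node i is part of the affinity
	preferred bool
}

// generateHints sketches reusable-aware hint generation. deviceNode maps every
// device ID to its single NUMA node (an assumption); reusable devices were
// already handed to an earlier init container of the same pod; available
// devices are free.
func generateHints(numNodes int, deviceNode map[string]int, available, reusable []string, request int) []hint {
	// Smallest mask that could satisfy the request when counting all devices.
	minSize := numNodes
	for mask := uint(1); mask < 1<<uint(numNodes); mask++ {
		n := 0
		for _, node := range deviceNode {
			if mask&(1<<uint(node)) != 0 {
				n++
			}
		}
		if n >= request && bits.OnesCount(mask) < minSize {
			minSize = bits.OnesCount(mask)
		}
	}

	var hints []hint
	for mask := uint(1); mask < 1<<uint(numNodes); mask++ {
		// Skip masks that would steer allocation away from a reusable device.
		covered := true
		for _, d := range reusable {
			if mask&(1<<uint(deviceNode[d])) == 0 {
				covered = false
				break
			}
		}
		if !covered {
			continue
		}
		// Reusable devices count toward the request, plus free devices in the mask.
		matching := len(reusable)
		for _, d := range available {
			if mask&(1<<uint(deviceNode[d])) != 0 {
				matching++
			}
		}
		if matching >= request {
			hints = append(hints, hint{mask: mask, preferred: bits.OnesCount(mask) == minSize})
		}
	}
	return hints
}
```

With two free devices on node 0 and a reusable one on node 1, a node-0-only hint is never produced, which is the mis-hint described in case 1 above.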
Kevin Klues
74fe9364c3 Simplify logic in devicemanager TopologyHint generation 2020-07-20 11:41:13 +00:00
Kevin Klues
26cb650655 Remove unnecessary union after call to GetPreferredAllocation()
There is no need to try and allocate already-allocated devices again.
2020-07-07 06:35:57 +00:00
Kevin Klues
67ecc11c44 Harden callGetPreferredAllocationIfAvailable() return value
Previously, we didn't check the contents of the result after calling out
to the plugin endpoint. This could have resulted in errors if the plugin
returned either 'nil' or an empty result. This patch fixes this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d87365494a Fix bug in call to callGetPreferredAllocationIfAvailable()
Previously, we were passing the variable 'devices' to this function,
when we should have been passing 'allocated'. This bug crept in due to a
variable name change that didn't propagate through the entire
function. The tests added in the previous commit would have caught this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d551ab1e78 Add tests to check parameters passed to GetPreferredAllocation()
These tests uncovered some small bugs that will be fixed in a subsequent
set of commits.
2020-07-07 06:35:57 +00:00
Kevin Klues
5bd0db0b1f Add new test cases for GetPreferredAllocation() in allocation path 2020-07-03 13:01:32 +00:00
Kevin Klues
83f18d9975 Remove unnecessary field from TestTopologyAlignedAllocation() test cases 2020-07-03 13:01:32 +00:00
Kevin Klues
bb08fd1135 Add a simple endpoint test for GetPreferredAllocation()
More extensive tests that exercise the allocation logic are to follow.
2020-07-03 13:01:32 +00:00
Kevin Klues
cbd405d85c Update existing tests in support of GetPreferredAllocation() 2020-07-03 13:01:32 +00:00
Kevin Klues
a780ccff5b Update logic in devicesToAllocate() to call GetPreferredAllocation() 2020-07-02 22:07:27 +00:00