Commit Graph

123 Commits

Author SHA1 Message Date
Daniel Hu
1baf7d4586 Corrected some spelling and grammatical errors
Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>
2024-01-27 10:10:25 +08:00
Daniel Hu
d652596e42 Remove redundant string conversions in print statements
Signed-off-by: Daniel Hu <farmer.hutao@outlook.com>
2024-01-15 09:57:35 +08:00
Kubernetes Prow Robot
a5ff0324a9
Merge pull request #120461 from gjkim42/do-not-reuse-device-of-restartable-init-container
Don't reuse the device of a restartable init container
2023-10-31 19:15:53 +01:00
Shiming Zhang
35f4d29d73 Fix unit test 2023-10-24 11:06:35 +08:00
Gunju Kim
d2b803246a
Don't reuse the device allocated to the restartable init container 2023-10-17 18:28:29 +09:00
Gunju Kim
a0610a97b3
pkg/kubelet/cm: Remove deprecated sets.String and sets.Int
This removes deprecated sets.String and sets.Int
- replace sets.String with sets.Set[string]
- replace sets.Int with sets.Set[int]
- replace sets.NewString with sets.New[string]
- replace sets.NewInt with sets.New[int]
- replace sets.(OLD).List with sets.List(NEW)
2023-09-27 22:02:15 +09:00
Francesco Romani
c635a7e7d8 node: devicemgr: topomgr: add logs
One of the contributing factors of issues #118559 and #109595 hard to
debug and fix is that the devicemanager has very few logs in important
flow, so it's unnecessarily hard to reconstruct the state from logs.

We add minimal logs to be able to improve troubleshooting.
We add minimal logs to be backport-friendly, deferring a more
comprehensive review of logging to later PRs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Francesco Romani
3bcf4220ec kubelet: devices: skip allocation for running pods
When kubelet initializes, runs admission for pods and possibly
allocated requested resources. We need to distinguish between
node reboot (no containers running) versus kubelet restart (containers
potentially running).

Running pods should always survive kubelet restart.
This means that device allocation on admission should not be attempted,
because if a container requires devices and is still running when kubelet
is restarting, that container already has devices allocated and working.

Thus, we need to properly detect this scenario in the allocation step
and handle it explicitely. We need to inform
the devicemanager about which pods are already running.

Note that if container runtime is down when kubelet restarts, the
approach implemented here won't work. In this scenario, so on kubelet
restart containers will again fail admission, hitting
https://github.com/kubernetes/kubernetes/issues/118559 again.
This scenario should however be pretty rare.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2023-07-12 13:25:36 +02:00
Swati Sehgal
dc1a592632 node: device-mgr: Handle recovery by checking if healthy devices exist
In case of node reboot/kubelet restart, the flow of events involves
obtaining the state from the checkpoint file followed by setting
the `healthDevices`/`unhealthyDevices` to its zero value. This is
done to allow the device plugin to re-register itself so that
capacity can be updated appropriately.

During the allocation phase, we need to check if the resources requested
by the pod have been registered AND healthy devices are present on
the node to be allocated.

Also we need to move this check above `needed==0` where needed is
required - devices allocated to the container (which is obtained from
the checkpoint file) because even in cases where no additional devices
have to be allocated (as they were pre-allocated), we still need to
make sure he devices that were previously allocated are healthy.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-04-28 14:41:30 +01:00
David Porter
9c20cee504
Revert "node: device-mgr: Handle recovery flow by checking if healthy devices exist" 2023-03-07 11:50:52 -08:00
Swati Sehgal
5b2a3dbbdc node: device-mgr: explicitly check if pre-allocated devices are healthy
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Swati Sehgal
7ac399c205 node: device-mgr: Handle recovery by checking if healthy devices exist
In case of node reboot/kubelet restart, the flow of events involves
obtaining the state from the checkpoint file followed by setting
the `healthDevices`/`unhealthyDevices` to its zero value. This is
done to allow the device plugin to re-register itself so that
capacity can be updated appropriately.

During the allocation phase, we need to check if the resources requested
by the pod have been registered AND healthy devices are present on
the node to be allocated.

Also we need to move this check above `needed==0` where needed is
required - devices allocated to the container (which is obtained from
the checkpoint file) because even in cases where no additional devices
have to be allocated (as they were pre-allocated), we still need to
make the devices that were previously allocated are healthy.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2023-03-06 11:52:23 +00:00
Kubernetes Prow Robot
25dc4c4f32
Merge pull request #112980 from swatisehgal/devicemanager-ga-graduation
node: devicemgr: Graduate Kubelet DeviceManager to GA
2022-11-02 13:17:01 -07:00
Swati Sehgal
40741681a2 node: devicemgr: Address warnings from golint
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-11-02 11:05:20 +00:00
Swati Sehgal
752fa093e0 node: devicemgr: GA graduation implies Feature Gate is ON by default
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
2022-11-02 11:05:20 +00:00
Ryan Phillips
2514486d80 kubelet: fix nil crash in allocateRemainingFrom 2022-10-12 12:51:17 -05:00
Kevin Klues
db88676c20 Refactor all device plugin logic into separate 'plugin' package
This is the first step towards being able to support a new plugin API version
in parallel with the existing one.

Signed-off-by: Kevin Klues <kklues@nvidia.com>
2022-04-29 10:52:37 +00:00
waynepeking348
6157d3cc4a skip deleted activePods and return nil 2022-03-27 20:35:09 +08:00
waynepeking348
35a456b0c6 skip reallocate logic if pod is already removed 2022-03-20 21:09:47 +08:00
Jan Safranek
77aa06d0c8 Remove util/selinux package
The package says:

> the libcontainer SELinux package is only built for Linux, so it is
> necessary to have a NOP wrapper which is built for non-Linux platforms

This is not true, Kubernetes now imports
github.com/opencontainers/selinux/go-selinux and it has proper
multiplatform support (i.e. NOOP on non-Linux platforms).

Removing the whole package and calling go-selinux directly.
2022-02-11 15:20:35 +01:00
Francesco Romani
2f426fdba6 devicemanager: checkpoint: support pre-1.20 data
The commit a8b8995ef2
changed the content of the data kubelet writes in the checkpoint.
Unfortunately, the checkpoint restore code was not updated,
so if we upgrade kubelet from pre-1.20 to 1.20+, the
device manager cannot anymore restore its state correctly.

The only trace of this misbehaviour is this line in the
kubelet logs:
```
W0615 07:31:49.744770    4852 manager.go:244] Continue after failing to read checkpoint file. Device allocation info may NOT be up-to-date. Err: json: cannot unmarshal array into Go struct field PodDevicesEntry.Data.PodDeviceEntries.DeviceIDs of type checkpoint.DevicesPerNUMA
```

If we hit this bug, the device allocation info is
indeed NOT up-to-date up until the device plugins register
themselves again. This can take up to few minutes, depending
on the specific device plugin.

While the device manager state is inconsistent:
1. the kubelet will NOT update the device availability to zero, so
   the scheduler will send pods towards the inconsistent kubelet.
2. at pod admission time, the device manager allocation will not
   trigger, so pods will be admitted without devices actually
   being allocated to them.

To fix these issues, we add support to the device manager to
read pre-1.20 checkpoint data. We retroactively call this
format "v1".

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-10-26 09:54:11 +02:00
Kubernetes Prow Robot
c4d802b0b5
Merge pull request #103289 from AlexeyPerevalov/DoNotExportEmptyTopology
podresources: do not export empty NUMA topology
2021-10-07 07:11:46 -07:00
Francesco Romani
1b6efa5e21 devicemanager: skip unhealthy devs in GetAllocatable
The GetAllocatableDevices, needed to support the podresources
API, doesn't take into account the device health when computing
its output.

In this PR we address this gap and add unit tests along the way
to prevent regressions. This gives us a good initial coverage,
E2E tests to cover this case are much harder to write, because
we would need to inject faults to trigger the unhealthy status.
We will evaluate if adding these tests into later PRs.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-09-22 19:20:04 +02:00
Alexey Perevalov
bb81101570 podresource: do not export NUMA topology if it's empty
If device plugin returns device without topology, keep it internaly
as NUMA node -1, it helps at podresources level to not export NUMA
topology, otherwise topology is exported with NUMA node id 0,
which is not accurate.

It's imposible to unveile this bug just by tracing json.Marshal(resp)
in podresource client, because NUMANodes field ID has json property
omitempty, in this case when ID=0 shown as emtpy NUMANode.
To reproduce it, better to iterate on devices and just
trace dev.Topology.Nodes[0].ID.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2021-08-24 15:38:21 +00:00
Artyom Lukianov
73a5cce3e6 device manager: do not clean admitted pods from the state
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-08-08 16:46:06 +03:00
sanwishe
9e257ec194 Optimization logging format for pkg/kubelet
Signed-off-by: sanwishe <jiang.mingzhi35@zte.com.cn>
2021-05-25 08:52:08 +08:00
kikimo
c0a7939cbb remove redundant test branch in sorting algorithm 2021-05-20 20:31:47 +08:00
kikimo
445b9c0762 minor tweak on numa node sorting algorithm 2021-05-20 08:21:20 +08:00
kikimo
ecfa609b71 simplify sorting comparator of numa nodes 2021-05-19 21:19:47 +08:00
kikimo
7d30bfecd5 simplify sorting comparator of numa nodes 2021-05-19 10:07:37 +08:00
kikimo
2ef1f81076 Avoid undesirable allocation when device is associated with multiple NUMA Nodes
suppose there are two devices dev1 and dev2, each has NUMA Nodes associated as below:
  dev1: numa1
  dev2: numa1, numa2

and we request a device from numa2, currently filterByAffinity() will return
[], [dev1, dev2], [] if loop of available devices produce a sequence of [dev1, dev2],
that is is not desirable as what we truely expect is an allocation of dev2 from numa2.
2021-05-19 10:07:37 +08:00
Elana Hashman
6af7eb6d49
Migrate missed log entries in kubelet
Co-Authored-By: pacoxu <paco.xu@daocloud.io>
2021-03-18 14:26:26 -07:00
Amim Knabben
c1d24c87bb Migrate devicemanager to structured logging 2021-03-14 11:57:06 -04:00
Francesco Romani
ad68f9588c node: podresources: make GetDevices() consistent
We want to make the return type of the GetDevices() method of the
podresources DevicesProvider interface consistent with
the newly added GetAllocatableDevices type.
This makes the code easier to read and reduces the coupling between
the podresourcesapi server and the devicemanager code.

No intended changes in behaviour, but the different return types
now requires some data massaging. Tests are updated accordingly.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
6d33354e4c node: podresources: implement GetAllocatableResources API
Extend the podresources API implementing the GetAllocatableResources endpoint,
as specified in the KEPs:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
https://github.com/kubernetes/enhancements/pull/2404

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Kubernetes Prow Robot
06a7e2bacf
Merge pull request #96781 from fighterhit/fix-kukelet-device-plugin-bug
Fix: kubelet return error when device plugin sets PreStartRequired true while creating pods with 0 resource
2021-01-25 17:59:00 -08:00
Anthony ARNAUD
6013aaa370
use Lstat instead of Stat for unix socket on windows 2020-12-29 15:14:29 -05:00
Anthony ARNAUD
8bdc3d8970 Port deviceManager in windows container manager 2020-12-16 00:25:26 -05:00
fighterhit
0eaceb7eb5 Fix: kubelet return error when device plugin sets PreStartRequired true while creating pods with 0 resource 2020-11-21 22:44:27 +08:00
Alexey Perevalov
5e6aed4137 Fixes sigfault in case of empty TopologyInfo
Device plugin which implements v1beta interface can return nil in
Topology field

For example nvidia-gpu-deviceplugin
3520254b75/nvidia.go (L147)
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-13 11:51:47 +03:00
Krzysztof Wiatrzyk
6db58b2e92 Update logging to use a format util
Signed-off-by: Krzysztof Wiatrzyk <k.wiatrzyk@samsung.com>
2020-11-12 12:25:55 +01:00
Alexey Perevalov
a8b8995ef2 Implement TopologyInfo and cpu_ids in podresources
It covers deviceplugin & cpumanager.

It has drawback, since cpuset and all other structs including cadvisor's keep
cpu as int, but for protobuf based interface is better to have fixed
int.
This patch also introduces additional interface CPUsProvider, while
DeviceProvider might have been extended too.

Checkpoint not covered by unit test.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:49 +03:00
Alexey Perevalov
62326a1846 Convert podDevices to struct
PodDevices will have its own guard

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:48 +03:00
Alexey Perevalov
9f54dccc92 Change GetDevices interface
This change is necessary for supporting Topology in the ContainerDevices.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 12:41:31 +03:00
Kubernetes Prow Robot
332d17c7f5
Merge pull request #95731 from farah/split-scheduler
Delete framework/v1alpha1 folder and change remaining import paths
2020-10-30 11:14:22 -07:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
chenyw1990
009d46f834 write checkpoint only when allocated devices updated. 2020-10-22 22:45:04 +08:00
Renaud Gaubert
4eadf40448 Run gofmt
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 06:22:44 -07:00
Renaud Gaubert
60304452ff Move podresources api to k8s.io/kubelet/pkg/apis
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Alexey Perevalov
a047e8aa1b move to cadvisor.MachineInfo
This patch removes GetNUMANodeInfo, cadvisor.MachineInfo will be used
instead of it. GetNUMANodeInfo was introduced due to difference of meaning of
MachineInfo.Topology. On the arm it was NUMA nodes, but on the x86 it
represents sockets (since reading from /proc/cpuinfo). Now it unified
and MachineInfo.Topology represents NUMA node.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-07-24 09:29:41 -04:00