Commit Graph

60 Commits

Author SHA1 Message Date
Chen Wang
7db339dba2 This commit contains the following:
1. Scheduler bug-fix + scheduler-focussed E2E tests
2. Add cgroup v2 support for in-place pod resize
3. Enable full E2E pod resize test for containerd>=1.6.9 and EventedPLEG related changes.

Co-Authored-By: Vinay Kulkarni <vskibum@gmail.com>
2023-02-24 18:21:21 +00:00
Vinay Kulkarni
f2bd94a0de In-place Pod Vertical Scaling - core implementation
1. Core Kubelet changes to implement In-place Pod Vertical Scaling.
2. E2E tests for In-place Pod Vertical Scaling.
3. Refactor kubelet code and add missing tests (Derek's kubelet review)
4. Add a new hash over container fields without Resources field to allow feature gate toggling without restarting containers not using the feature.
5. Fix corner-case where resize A->B->A gets ignored
6. Add cgroup v2 support to pod resize E2E test.
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources

Co-authored-by: Chen Wang <Chen.Wang1@ibm.com>
2023-02-24 18:21:21 +00:00
Ed Bartosh
4f88332ab4 kubelet: prepare DRA resources before CNI setup 2023-02-06 20:40:11 +02:00
Ed Bartosh
abcb56defb kubelet: do not enter termination status if pod might need to unprepare resources 2022-11-11 21:58:03 +01:00
Ed Bartosh
ae0f38437c kubelet: add support for dynamic resource allocation
Dependencies need to be updated to use
github.com/container-orchestrated-devices/container-device-interface.

It's not decided yet whether we will implement Topology support
for DRA or not. Not having any toppology-related code
will help to avoid wrong impression that DRA is used as a hint
provider for the Topology Manager.
2022-11-11 21:58:03 +01:00
jinxu
0064010cdd Promote Local storage capacity isolation feature to GA
This change is to promote local storage capacity isolation feature to GA

At the same time, to allow rootless system disable this feature due to
unable to get root fs, this change introduced a new kubelet config
"localStorageCapacityIsolation". By default it is set to true. For
rootless systems, they can set this configuration to false to disable
the feature. Once it is set, user cannot set ephemeral-storage
request/limit because capacity and allocatable will not be set.

Change-Id: I48a52e737c6a09e9131454db6ad31247b56c000a
2022-08-02 23:45:48 -07:00
Li Bo
c3d9b10ca8 feature: support Memory QoS for cgroups v2 2021-07-08 09:26:46 +08:00
Artyom Lukianov
03830db82d Implement all necessary methods to provide memory manager data under pod resources metrics
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-06-22 13:06:32 +03:00
Kubernetes Prow Robot
12f8466459
Merge pull request #100267 from Jeffwan/support_arbitratry_resources
Expose resources overrides and maxPods conf in kubemark
2021-04-08 20:29:12 -07:00
Jiaxin Shan
1b4dc87a1f Expose resources overrides and maxPods conf in kubemark 2021-03-17 16:31:58 -07:00
Kubernetes Prow Robot
38fbecf0c8
Merge pull request #100001 from shiyajuan123/logs
migrate kubelet/cm/container logs to structured logging
2021-03-16 14:50:06 -07:00
Francesco Romani
8afdf4f146 node: podresources: translate types in cm
during the review, we convened that the manager types
(CPUSet, ResourceDeviceInstances) should not cross the
containermanager API boundary; thus, the ContainerManager layer
is the correct place to do the type conversion

We push back the type conversions from the podresources server
layer, fixing tests accordingly.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
ad68f9588c node: podresources: make GetDevices() consistent
We want to make the return type of the GetDevices() method of the
podresources DevicesProvider interface consistent with
the newly added GetAllocatableDevices type.
This makes the code easier to read and reduces the coupling between
the podresourcesapi server and the devicemanager code.

No intended changes in behaviour, but the different return types
now requires some data massaging. Tests are updated accordingly.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
6d33354e4c node: podresources: implement GetAllocatableResources API
Extend the podresources API implementing the GetAllocatableResources endpoint,
as specified in the KEPs:

https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2043-pod-resource-concrete-assigments
https://github.com/kubernetes/enhancements/pull/2404

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
Francesco Romani
1375c5bdc7 node: podresources: make GetCPUs return cpuset
a upcoming patch wants to add GetAllocatableCPUs() returning a cpuset.
To make the code consistent and a bit more flexible, we change the
existing interface to also return a cpuset.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2021-03-09 13:13:36 +01:00
shiyajuan123
6b53f8c65d migrate kubelet/cm/container logs to structured logging 2021-03-09 18:27:54 +08:00
Artyom Lukianov
afb1ae3458 memory manager: add fake memory manager
The fake memory manager needed for the unittesting.

Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
2021-02-09 01:09:59 +02:00
Alexey Perevalov
a8b8995ef2 Implement TopologyInfo and cpu_ids in podresources
It covers deviceplugin & cpumanager.

It has drawback, since cpuset and all other structs including cadvisor's keep
cpu as int, but for protobuf based interface is better to have fixed
int.
This patch also introduces additional interface CPUsProvider, while
DeviceProvider might have been extended too.

Checkpoint not covered by unit test.

Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 13:50:49 +03:00
Alexey Perevalov
9f54dccc92 Change GetDevices interface
This change is necessary for supporting Topology in the ContainerDevices.

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
2020-11-11 12:41:31 +03:00
Ali
bfdeda58b7 Delete framework/v1alpha1 folder and change remaining import paths 2020-10-23 13:16:13 +11:00
Renaud Gaubert
60304452ff Move podresources api to k8s.io/kubelet/pkg/apis
Signed-off-by: Renaud Gaubert <rgaubert@nvidia.com>
2020-09-15 05:13:33 -07:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
Abdullah Gharaibeh
d6522e0e74 rename framework pkg with schedulerframework for all instances under pkg/kubelet 2020-04-14 14:24:07 -04:00
Abdullah Gharaibeh
bed9b2f23b Cleanup obsolete NodeInfo methods 2020-04-12 18:13:46 -04:00
Kevin Klues
2327934a86 Rename GetTopologyPodAmitHandler() as
GetAllocateResourcesPodAdmitHandler(). It is named as such to reflect its
new function. Also remove the Topology Manager feature gate check at higher level
kubelet.go, as it is now done in GetAllocateResourcesPodAdmitHandler().
2020-02-27 07:52:43 +00:00
Kevin Klues
0d68bffd03 Change GetTopologyPodAdmitHandler() to be more general
GetTopologyPodAdmitHandler() now returns a lifecycle.PodAdmitHandler
type instead of the TopologyManager directly. The handler it returns
is generally responsible for attempting to allocate any resources that
require a pod admission check. When the TopologyManager feature gate
is on, this comes directly from the TopologyManager. When it is off,
we simply attempt the allocations ourselves and fail the admission
on an unexpected error. The higher level kubelet.go feature gate
check will be removed in an upcoming PR.
2020-02-27 07:24:26 +00:00
Takeaki Matsumoto
785fac6826 Make updateAllocatedDevices() as a public method and call it in
podresources api
2020-02-07 13:26:56 +09:00
Louise Daly
9f0081cc36 Updates to container manager and internal container lifecycle to accommodate Topology Manager
Co-authored-by: Conor Nolan <conor.nolan@intel.com>
2019-07-24 08:09:38 +01:00
Tara Gu
5e18554442 Implement plugin manager - a controller that manages plugin registration/unregistration 2019-05-30 19:00:59 -04:00
Richard Chen
c9f1b57b5b Reset extended resources only when node is recreated. 2019-05-21 14:16:54 -07:00
Davanum Srinivas
33081c1f07
New staging repository for cri-api
Change-Id: I2160b0b0ec4b9870a2d4452b428e395bbe12afbb
2019-03-26 18:21:04 -04:00
yuexiao-wang
f3353c358d [scheduler cleanup phase 2]: Rename to
Signed-off-by: yuexiao-wang <wang.yuexiao@zte.com.cn>
2018-12-11 11:21:12 +08:00
David Ashpole
630cb53f82 add kubelet grpc server for pod-resources service 2018-11-15 09:43:20 -08:00
Davanum Srinivas
954996e231
Move from glog to klog
- Move from the old github.com/golang/glog to k8s.io/klog
- klog as explicit InitFlags() so we add them as necessary
- we update the other repositories that we vendor that made a similar
change from glog to klog
  * github.com/kubernetes/repo-infra
  * k8s.io/gengo/
  * k8s.io/kube-openapi/
  * github.com/google/cadvisor
- Entirely remove all references to glog
- Fix some tests by explicit InitFlags in their init() methods

Change-Id: I92db545ff36fcec83afe98f550c9e630098b3135
2018-11-10 07:50:31 -05:00
Renaud Gaubert
8dd1d27c03 Updated the device manager pluginwatcher handler 2018-09-06 15:34:46 +02:00
vikaschoudhary16
a5842503eb Use probe based plugin discovery mechanism in device manager 2018-07-17 04:02:31 -04:00
Guoliang Wang
761cf41427 Move pkg/scheduler/schedulercache -> pkg/scheduler/cache 2018-05-31 22:55:34 +08:00
Jing Xu
b2e744c620 Promote LocalStorageCapacityIsolation feature to beta
The LocalStorageCapacityIsolation feature added a new resource type
ResourceEphemeralStorage "ephemeral-storage" so that this resource can
be allocated, limited, and consumed as the same way as CPU/memory. All
the features related to resource management (resource request/limit, quota, limitrange) are avaiable for local ephemeral storage.

This local ephemeral storage represents the storage for root file system, which will be consumed by containers' writtable layer and logs. Some volumes such as emptyDir might also consume this storage.
2018-03-02 15:10:08 -08:00
David Ashpole
960856f4e8 collect metrics on the /kubepods cgroup on-demand 2018-02-17 12:32:40 -08:00
Kubernetes Submit Queue
f2e46a2147
Merge pull request #57266 from vikaschoudhary16/unhealthy_device
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Handle Unhealthy devices

Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflect only healthy devices.



**What this PR does / why we need it**:
Currently node capacity only reflects healthy devices. Unhealthy devices are ignored totally while updating node status. This PR handles unhealthy devices while updating node status. 

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #57241

**Special notes for your reviewer**:

**Release note**:
<!--  Write your release note:
Handle Unhealthy devices

```release-note
Handle Unhealthy devices
```
/cc @tengqm @ConnorDoyle @jiayingz @vishh @jeremyeder @sjenning @resouer @ScorpioCPH @lichuqiang @RenaudWasTaken @balajismaniam 

/sig node
2018-01-12 19:55:54 -08:00
vikaschoudhary16
e9cf3f1ac4 Handle Unhealthy devices
Update node capacity with sum of both healthy and unhealthy devices.
Node allocatable reflect only healthy devices.
2018-01-09 11:38:48 -05:00
Jonathan Basseri
30b89d830b Move scheduler code out of plugin directory.
This moves plugin/pkg/scheduler to pkg/scheduler and
plugin/cmd/kube-scheduler to cmd/kube-scheduler.

Bulk of the work was done with gomvpkg, except for kube-scheduler main
package.
2018-01-05 15:05:01 -08:00
Jiaying Zhang
1eb4e79453 Extends deviceplugin to gracefully handle full device plugin lifecycle.
- Instead of using cm.capacity field to communicate device plugin resource
capacity, this PR changes to use an explicit cm.GetDevicePluginResourceCapacity()
function that returns device plugin resource capacity as well as any inactive
device plugin resource. Kubelet syncNodeStatus call this function during its
periodic run to update node status capacity and allocatable. After this call,
device plugin can remove the inactive device plugin resource from its allDevices
field as the update is already pushed to API server.
- Extends device plugin checkpoint data to record registered resources
so that we can finish resource removing even upon kubelet restarts.
- Passes sourcesReady from kubelet to device plugin to avoid removing
inactive pods during grace period of kubelet restart.
2017-11-20 23:40:14 -08:00
lichuqiang
ebd445eb8c add admission handler for device resources allocation 2017-11-02 09:17:48 +08:00
Connor Doyle
81ccd396d7 Fixed nil InternalContainerLifecycle in cm stubs. 2017-09-04 07:24:59 -07:00
Connor Doyle
ec706216e6 Un-revert "CPU manager wiring and none policy"
This reverts commit 8d2832021a.
2017-09-04 07:24:59 -07:00
Jiaying Zhang
02001af752 Kubelet side extension to support device allocation 2017-09-01 11:56:35 -07:00
Shyam JVS
8d2832021a Revert "CPU manager wiring and none policy" 2017-09-01 18:17:36 +02:00
Connor Doyle
7c6e31617d CPU Manager initialization and lifecycle calls. 2017-08-30 08:50:41 -07:00
Vishnu kannan
82f7820066 Kubelet:
Centralize Capacity discovery of standard resources in Container manager.
Have storage derive node capacity from container manager.
Move certain cAdvisor interfaces to the cAdvisor package in the process.

This patch fixes a bug in container manager where it was writing to a map without synchronization.

Signed-off-by: Vishnu kannan <vishnuk@google.com>
2017-06-27 18:45:02 -07:00