Commit Graph

125 Commits

Author SHA1 Message Date
Kevin Klues
00df26a985 Fix a bug whereby reusable CPUs and devices were not being honored
Previously, it was possible for reusable CPUs and reusable devices (i.e.
those previously consumed by init containers) to not be reused by
subsequent init containers or app containers if the TopologyManager was
enabled. This would happen because hint generation for the
TopologyManager was not considering the reusable devices when it made
its hint calculation.

As such, it would sometimes:
1) Generate a hint for a differnent NUMA node, causing the CPUs and
devices to be allocated from that node instead of the one where the
reusable devices live; or
2) End up thinking there were not enough CPUs or devices to allocate and
throw a TopologyAffinity admission error

This patch fixes this by ensuring that reusable CPUs and devices are
considered as part of TopologyHint generation. This frunctionality is
difficult to unit test since it spans multiple components, but an e2e
test will be added in a subsequent patch to test this functionality.
2020-07-20 11:41:13 +00:00
Kevin Klues
74fe9364c3 Simplify logic in devicemanager TopologyHint generation 2020-07-20 11:41:13 +00:00
Kevin Klues
26cb650655 Remove unnecessary union after call to GetPreferredAllocation()
There is no need to try and allocate already-allocated devices again.
2020-07-07 06:35:57 +00:00
Kevin Klues
67ecc11c44 Harden callGetPreferredAllocationIfAvailable() return value
Previously, we didn't check the contents of the result after calling out
to the plugin endpoint. This could have resulted in errors if the plugin
returned either 'nil' or an empty result. This patch fixes this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d87365494a Fix bug in call to callGetPreferredAllocationIfAvailable()
Previously, we were passing the variable 'devices' to this function,
when we should have been passing 'allocated'. This bug crept in due to a
variable name change that didn't propogate its way through the entire
function. The tests added in the previous commit would have caught this.
2020-07-07 06:35:57 +00:00
Kevin Klues
d551ab1e78 Add tests to check paramaters passed to GetPreferredAllocation()
These tests uncovered some small bugs that will be fixed in a subsequent
set of commits.
2020-07-07 06:35:57 +00:00
Kevin Klues
5bd0db0b1f Add new test cases for GetPreferredAllocation() in allocation path 2020-07-03 13:01:32 +00:00
Kevin Klues
83f18d9975 Remove unnecessary field from TestTopologyAlignedAllocation() test cases 2020-07-03 13:01:32 +00:00
Kevin Klues
bb08fd1135 Add a simple endpoint test for GetPreferredAllocation()
More extensive tests that exercise the allocation logic are to follow.
2020-07-03 13:01:32 +00:00
Kevin Klues
cbd405d85c Update existing tests in support of GetPreferredallocation() 2020-07-03 13:01:32 +00:00
Kevin Klues
a780ccff5b Updates logic in devicesToAllocate() to call GetPreferredAllocation() 2020-07-02 22:07:27 +00:00
Kevin Klues
bb56a09133 Add callGetPreferredAllocationIfAvailable() function in devicemanager
This function mimics what is already done for the conditional call to
PreStartContainer() via the callPreStartContainerIfNeeded() function.
2020-07-02 22:07:27 +00:00
Kevin Klues
abf87c99c6 Add GetPreferredAllocation() as a supported device plugin endpoint 2020-07-02 15:15:50 +00:00
Kevin Klues
32c047a52e Update device plugin stub with new GetPreferredAllocation() call 2020-07-02 15:15:48 +00:00
Kevin Klues
c45f1317eb Fix some whitespacing and comments in devicemanager 2020-07-02 15:15:44 +00:00
Davanum Srinivas
07d88617e5
Run hack/update-vendor.sh
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:33 -04:00
Davanum Srinivas
442a69c3bd
switch over k/k to use klog v2
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2020-05-16 07:54:27 -04:00
caiweidong
5ed8fb690c Use klog to replace log to keep them in consistence 2020-05-12 13:49:10 +08:00
Abdullah Gharaibeh
d6522e0e74 rename framework pkg with schedulerframework for all instances under pkg/kubelet 2020-04-14 14:24:07 -04:00
Abdullah Gharaibeh
bed9b2f23b Cleanup obsolete NodeInfo methods 2020-04-12 18:13:46 -04:00
nolancon
a9c6129577 Device Manager - Update unit tests
- Pass container to Allocate().
- Loop through containers to call Allocate() on container by container
basis.
2020-02-27 07:24:34 +00:00
nolancon
cb9fdc49db Device Manager - Refactor allocatePodResources
- allocatePodResources logic altered to allow for container by container
device allocation.
- New type PodReusableDevices
- New field in devicemanager devicesToReuse
2020-02-27 07:24:34 +00:00
Kevin Klues
0b168f0243 Change devicemanager to implement HintProvider.Allocate()
This change will not work on its own. Higher level code needs to make
sure and call Allocate() before AddContainer is called. This is already
being done in cases when the TopologyManager feature gate is enabled (in
the PodAdmitHandler of the TopologyManager). However, we need to make
sure we add proper logic to call it in cases when the TopologyManager
feature gate is disabled.
2020-02-10 03:27:47 +00:00
Kevin Klues
a3f099ea4d Split devicemanager Allocate into two functions
Instead of having a single call for Allocate(), we now split this into two
functions Allocate() and UpdatePluginResources().

The semantics split across them:

// Allocate configures and assigns devices to a pod. From the requested
// device resources, Allocate will communicate with the owning device
// plugin to allow setup procedures to take place, and for the device
// plugin to provide runtime settings to use the device (environment
// variables, mount points and device files).
Allocate(pod *v1.Pod) error

// UpdatePluginResources updates node resources based on devices already
// allocated to pods. The node object is provided for the device manager to
// update the node capacity to reflect the currently available devices.
UpdatePluginResources(
    node *schedulernodeinfo.NodeInfo,
    attrs *lifecycle.PodAdmitAttributes) error

As we move to a model in which the TopologyManager is able to ensure
aligned allocations from the CPUManager, devicemanger, and any
other TopologManager HintProviders in the same synchronous loop, we will
need to be able to call Allocate() independently from an
UpdatePluginResources(). This commit makes that possible.
2020-02-10 03:27:47 +00:00
Takeaki Matsumoto
785fac6826 Make updateAllocatedDevices() as a public method and call it in
podresources api
2020-02-07 13:26:56 +09:00
Kevin Klues
bc686ea27b Update TopologyManager.GetTopologyHints() to take pointers
Previously, this function was taking full Pod and Container objects
unnecessarily. This commit updates this so that they will take pointers
instead.
2020-02-03 17:13:28 +00:00
danielqsj
1a9b121764 remove deprecated metrics of kubelet 2020-01-10 16:46:52 +08:00
danielqsj
19fe9f8d94 replace grpc.WithDialer which is deprecated 2019-12-26 17:46:59 +08:00
Kubernetes Prow Robot
ad5d4c4705
Merge pull request #85706 from yutedz/per-node-dev
Remove nodes slice in loop of takeByTopology
2019-12-05 13:50:30 -08:00
Kubernetes Prow Robot
57b6b287d4
Merge pull request #85688 from yutedz/pods-to-rm
Reduce unnecessary Set in updateAllocatedDevices
2019-12-02 17:07:26 -08:00
Ted Yu
84a9803741 Log error when writing checkpoint fails 2019-11-29 19:47:17 -08:00
Ted Yu
6415fa765e Remove nodes slice in loop of takeByTopology 2019-11-29 12:12:22 -08:00
Ted Yu
86f3bc25e1 Reduce unnecessary Set in updateAllocatedDevices 2019-11-27 08:48:06 -08:00
Kubernetes Prow Robot
30e6238795
Merge pull request #85147 from yutedz/devmgr-rm-contents
Continue removing file in ManagerImpl#removeContents
2019-11-14 16:38:28 -08:00
Ted Yu
fb046f7787 Continue removing file in ManagerImpl#removeContents 2019-11-13 06:00:34 -08:00
Kubernetes Prow Robot
ed10b5b17f
Merge pull request #85047 from yutedz/dev-mgr-err-handling
Handle error return from allocatePodResources
2019-11-12 11:51:27 -08:00
Kubernetes Prow Robot
897ce3073c
Merge pull request #84533 from davidz627/fix/deprecatedPath
Remove plugin watching of deprecated directory and CSI v0 support in accordance with deprecation policy
2019-11-12 04:48:20 -08:00
David Zhu
802fe12803 Remove plugin watching of deprecated directory {kubelet_root_dir}/plugins and support for CSI V0 in accordance with deprecation announcement in https://v1-13.docs.kubernetes.io/docs/setup/release/notes/ 2019-11-11 11:42:58 -08:00
Ted Yu
db0f616974 Handle error return from allocatePodResources 2019-11-09 16:25:15 -08:00
Kubernetes Prow Robot
08e5781b41
Merge pull request #84525 from klueska/upstream-fix-hint-generation-after-kubelet-restart
Fix bug in TopologyManager hint generation after kubelet restart
2019-11-06 15:33:50 -08:00
Kevin Klues
4d4d4bdd61 Ensure devicemanager TopologyHints are regenerated after kubelet restart
This patch also includes test to make sure the newly added logic works
as expected.
2019-11-06 15:01:34 +00:00
Kubernetes Prow Robot
0c0408c790
Merge pull request #76407 from yanghaichao12/dev0411
change directory permissions from 0755 to 0750
2019-11-05 19:30:59 -08:00
Kevin Klues
a338c8f7fd Add some more comments to GetTopologyHints() in the devicemanager 2019-11-05 13:06:23 +00:00
Kevin Klues
58f3554ebe Sync all CPU and device state before generating TopologyHints for them
This ensures that we have the most up-to-date state when generating
topology hints for a container. Without this, it's possible that some
resources will be seen as allocated, when they are actually free.
2019-11-05 13:00:20 +00:00
Adrian Chiris
b17706b149 Added LessThan() and IsEqual() methods for TopologyHints 2019-11-04 18:43:07 +01:00
yanghaichao12
5cbafba457 change directory permissions from 0755 to 0750 2019-11-04 17:04:37 +08:00
Kubernetes Prow Robot
3db6d3abcf
Merge pull request #83551 from dims/move-external-facing-kubelet-apis-to-staging
Move external facing kubelet apis to staging
2019-10-10 13:41:36 -07:00
Davanum Srinivas
f29d2272c8
fix gofmt and golint failures
Change-Id: I6535b506f50558b31663a13cd270b15023afa2c6
2019-10-06 18:43:17 -04:00
Kubernetes Prow Robot
48b90db9c3
Merge pull request #83495 from tanjunchen/fix-typo
remove the repeat word in documents
2019-10-06 15:05:08 -07:00
Davanum Srinivas
6ecc0f83af
update bazel BUILD files
Change-Id: Ia3917cec1453c0b22a958faf8c22bccd79242d14
2019-10-06 15:29:23 -04:00