This makes the API nicer:
resourceClaims:
- name: with-template
  resourceClaimTemplateName: test-inline-claim-template
- name: with-claim
  resourceClaimName: test-shared-claim
Previously, this was:
resourceClaims:
- name: with-template
  source:
    resourceClaimTemplateName: test-inline-claim-template
- name: with-claim
  source:
    resourceClaimName: test-shared-claim
A longer-term benefit is that other, future alternatives
might not make sense under the "source" umbrella.
This is a breaking change. It's justified because DRA is still
alpha and will have several other API breaks in 1.31.
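At the Go API level, this corresponds roughly to flattening the ClaimSource
wrapper into PodResourceClaim. A simplified sketch (JSON tags, docs, and
validation omitted; the Before/After type names exist only for side-by-side
comparison):

// Before (sketch): the mutually exclusive references were wrapped in Source.
type ClaimSource struct {
    ResourceClaimName         *string
    ResourceClaimTemplateName *string
}

type podResourceClaimBefore struct {
    Name   string
    Source ClaimSource
}

// After (sketch): the fields live directly on PodResourceClaim, so a future
// alternative does not have to fit under the "source" umbrella.
type podResourceClaimAfter struct {
    Name                      string
    ResourceClaimName         *string
    ResourceClaimTemplateName *string
}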
MultiCIDRServiceAllocator implements a new ClusterIP allocator based on
IPAddress objects to solve the problems and limitations of the existing
bitmap allocators.
However, during the rollout of new versions, deployments need to support
a skew of one version between kube-apiservers. To avoid the problem that
Service requests handled by skewed apiservers could allocate the same IP
to different Services, the new allocator implements a dual-write strategy
under the feature gate DisableAllocatorDualWrite.
After MultiCIDRServiceAllocator is GA, DisableAllocatorDualWrite can be
enabled safely, as all apiservers will run the new allocators. The
graduation of DisableAllocatorDualWrite can also be used to clean up the
opaque API object that contains the old bitmaps.
If MultiCIDRServiceAllocator is enabled, DisableAllocatorDualWrite is
disabled, and this is a new environment, no bitmap object exists yet;
hence, the apiserver initializes one so that it can write to it.
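A rough sketch of the dual-write idea (the interface and names below are
illustrative, not the actual ipallocator code):

package ipallocator

import "net"

// Allocator is a minimal interface for this sketch; the real interface has
// more methods.
type Allocator interface {
    Allocate(net.IP) error
    Release(net.IP) error
}

// dualWriteAllocator treats the IPAddress-based allocator as the source of
// truth and mirrors every allocation into the legacy bitmap so that a
// skewed apiserver sees it too.
type dualWriteAllocator struct {
    ipAddress Allocator // new allocator backed by IPAddress objects
    bitmap    Allocator // legacy bitmap allocator
}

func (a *dualWriteAllocator) Allocate(ip net.IP) error {
    if err := a.ipAddress.Allocate(ip); err != nil {
        return err
    }
    if err := a.bitmap.Allocate(ip); err != nil {
        // Roll back so both views stay consistent.
        _ = a.ipAddress.Release(ip)
        return err
    }
    return nil
}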
The current results with 100 workers and 15k Services on an n2-standard-48 (48 vCPU, 192 GB RAM) are:
Old allocator:
perf_test.go:139: [RESULT] Duration 1m9.646167533s: [quantile:0.5 value:0.462886801 quantile:0.9 value:0.496662838 quantile:0.99 value:0.725845905]
New allocator:
perf_test.go:139: [RESULT] Duration 2m12.900694343s: [quantile:0.5 value:0.481814448 quantile:0.9 value:1.3867615469999999 quantile:0.99 value:1.888190671]
The new allocator has higher latency but, in contrast, allows a larger
number of Services: when tested with 65k Services, etcd crashes with the
old allocator because the storage limit is exceeded.
That scenario is also not realistic, as a continuous, high load of
Service creation is not expected.
Dropping the error that is returned by allocateOne hides the reason *why*
allocation failed. Including the UID is "too much information" for an error
message (usually the user doesn't care about the exact identity, just the name),
and the claim name can and will be added by the caller.
Before:
controller.go:373: E0625 16:04:12.140953] test-driver.cdi.k8s.io/resource controller: processing failed err="claim test-dramq9jv-resource-h72pg: failed allocating claim 8551afba-3c9a-4a8a-8633-6fad6c4b9e42" key="schedulingCtx:test/test-dramq9jv"
event.go:377: I0625 16:04:12.141031] test-driver.cdi.k8s.io/resource controller: Event(v1.ObjectReference{Kind:"PodSchedulingContext", Namespace:"test", Name:"test-dra65gfw", UID:"6be9ba57-31da-4fef-b61d-b0468d71afcf", APIVersion:"resource.k8s.io/v1alpha3", ResourceVersion:"197", FieldPath:""}): type: 'Warning' reason: 'Failed' claim test-dra65gfw-resource-zpzrj: failed allocating claim f98a32e1-ab7d-4b34-a258-6d8224aa9006
After:
controller.go:373: E0625 16:02:54.248059] test-driver.cdi.k8s.io/resource controller: processing failed err="claim test-dram98ll-resource-nvsbj: device selectors are not supported" key="schedulingCtx:test/test-dram98ll"
event.go:377: I0625 16:02:54.248163] test-driver.cdi.k8s.io/resource controller: Event(v1.ObjectReference{Kind:"PodSchedulingContext", Namespace:"test", Name:"test-dratpt77", UID:"24010402-b026-4fe4-a535-e1dab69db8c0", APIVersion:"resource.k8s.io/v1alpha3", ResourceVersion:"298", FieldPath:""}): type: 'Warning' reason: 'Failed' claim test-dratpt77-resource-vlgrv: device selectors are not supported
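A minimal sketch of the change (hypothetical function names; the real code
lives in the DRA example controller): the cause returned by the allocation
step is propagated instead of being replaced, and the caller adds the claim
name.

package main

import (
    "errors"
    "fmt"
)

// allocateOne stands in for the real allocation logic and just fails with a
// descriptive cause so the wrapping behaviour is visible.
func allocateOne() error {
    return errors.New("device selectors are not supported")
}

// Before (illustrative): the cause was dropped and replaced by a message
// that only exposed the claim UID.
func allocateBefore(uid string) error {
    if err := allocateOne(); err != nil {
        return fmt.Errorf("failed allocating claim %s", uid)
    }
    return nil
}

// After (illustrative): the cause is returned as-is; the caller prefixes the
// claim name, which is what the user actually cares about.
func allocateAfter() error {
    return allocateOne()
}

func main() {
    name, uid := "example-claim", "00000000-0000-0000-0000-000000000000"
    fmt.Println(fmt.Errorf("claim %s: %w", name, allocateBefore(uid)))
    fmt.Println(fmt.Errorf("claim %s: %w", name, allocateAfter()))
}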
format.Object adds some whitespace in front of the value and a type identifier
in angle brackets. Both are distracting when printing simple values and can be
avoided by picking fmt.Sprintf for those types, plus trimming the result of
format.Object.
Before:
allocator.go:483: I0625 15:35:31.946980] Allocating one device currentClaim= <int>: 0 totalClaims= <int>: 1 currentRequest= <int>: 0 totalRequestsPerClaim= <int>: 1 currentDevice= <int>: 0 devicesPerRequest= <int>: 1 allDevices= <bool>: false adminAccess= <bool>: false
After:
allocator.go:483: I0625 15:35:04.371441] Allocating one device currentClaim=0 totalClaims=1 currentRequest=0 totalRequestsPerClaim=1 currentDevice=0 devicesPerRequest=1 allDevices=false adminAccess=false
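A sketch of the approach, assuming the gomega format package (the helper
name here is hypothetical): simple types go through fmt.Sprintf, everything
else keeps format.Object with the surrounding whitespace trimmed.

package logging

import (
    "fmt"
    "strings"

    "github.com/onsi/gomega/format"
)

// stringifyValue renders a value for a log message: plain formatting for
// simple types, format.Object (trimmed) for everything else.
func stringifyValue(v interface{}) string {
    switch v.(type) {
    case int, int32, int64, uint, uint32, uint64, float32, float64, bool, string:
        // fmt.Sprintf avoids the " <int>: 0" style output of format.Object.
        return fmt.Sprintf("%v", v)
    default:
        return strings.TrimSpace(format.Object(v, 1))
    }
}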