1. Define ContainerResizePolicy and add it to Container struct.
2. Add ResourcesAllocated and Resources fields to ContainerStatus struct.
3. Define ResourcesResizeStatus and add it to PodStatus struct.
4. Add InPlacePodVerticalScaling feature gate and drop disabled fields.
5. ResizePolicy validation & defaulting and Resources mutability for CPU/Memory.
6. Various fixes from code review feedback (originally committed on Apr 12, 2022)
KEP: /enhancements/keps/sig-node/1287-in-place-update-pod-resources
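For orientation, here is a rough sketch of the shape these additions take. The type and field names follow the list above and the KEP; the stand-in types and exact field/constant names are illustrative, and the API as finally merged may differ in detail.

```go
package sketch

// Stand-ins for existing core/v1 types, only to keep the sketch self-contained.
type ResourceName string
type ResourceList map[ResourceName]string
type ResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
}

// ContainerResizePolicy says how the kubelet should react when the named
// resource of a container is resized in place (for example, whether a
// container restart is required).
type ContainerResizePolicy struct {
	ResourceName ResourceName
	Policy       string
}

// ResourcesResizeStatus is the pod-level view of an in-flight resize
// (for example proposed, in progress, deferred, or infeasible).
type ResourcesResizeStatus string

// New fields, shown on trimmed-down copies of the existing structs.
type Container struct {
	// ... existing fields ...
	ResizePolicy []ContainerResizePolicy
}

type ContainerStatus struct {
	// ... existing fields ...
	ResourcesAllocated ResourceList          // resources admitted by the node
	Resources          *ResourceRequirements // resources configured on the running container
}

type PodStatus struct {
	// ... existing fields ...
	Resize ResourcesResizeStatus
}
```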
PVC and containers share the same ResourceRequirements struct. The Claims field
in it only makes sense when used in containers. When used in a PVC, the field
should have been rejected by validation. This was overlooked when introducing
it, so now persisted objects might have it set and/or people may have started
to rely on it being accepted even when it has no effect.
Therefore we cannot reject it in validation anymore, but we can still strip
it out on create or update.
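The stripping can be done by a small helper called before validation on both create and update (e.g. from the PVC strategy's PrepareForCreate/PrepareForUpdate hooks). A minimal sketch of the idea, with stand-in types and a hypothetical helper name:

```go
package sketch

// Stand-ins for the shared core types, only to keep the sketch self-contained.
type ResourceClaim struct{ Name string }
type ResourceRequirements struct {
	// ... Limits, Requests ...
	Claims []ResourceClaim
}
type PersistentVolumeClaimSpec struct {
	Resources ResourceRequirements
}

// dropClaims is a hypothetical helper showing the pattern described above:
// rather than rejecting the Claims field of a PVC (which could break persisted
// objects or clients that already set it), clear it before the object is
// validated and stored, on both create and update.
func dropClaims(spec *PersistentVolumeClaimSpec) {
	if spec != nil {
		// Claims only has meaning in a container; it has no effect in a PVC.
		spec.Resources.Claims = nil
	}
}
```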
Currently, some unit tests are failing on Windows for various reasons:
- On Windows, consecutive time.Now() calls may return the same timestamp, which would cause
the TestFreeSpaceRemoveByLeastRecentlyUsed test to flake.
- tests in kuberuntime_container_windows_test.go fail on Nodes that have fewer than 3 CPUs:
  they expect the CPU maximum to be set to more than 100% of the available CPUs, which is not possible.
- calls in summary_windows_test.go are missing context.
- filterTerminatedContainerInfoAndAssembleByPodCgroupKey will filter and group container
information by the Pod cgroup key, if it exists. However, we don't have cgroups on Windows,
thus we can't make the same assertions.
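For the first item, the underlying issue is Windows' coarse clock resolution; a common way to make such a test deterministic is to feed it a clock that is guaranteed to advance instead of calling time.Now() directly. A minimal illustration of that idea (the clock type here is made up for the sketch, not the one the tests use):

```go
package sketch

import "time"

// fakeClock is an illustrative, strictly monotonic clock for tests. On Windows
// the system timer is coarse enough that two consecutive time.Now() calls can
// return the same instant, which breaks least-recently-used ordering that
// assumes distinct timestamps.
type fakeClock struct {
	now time.Time
}

// Now returns the current fake time and advances the clock, so no two calls
// ever return the same timestamp.
func (c *fakeClock) Now() time.Time {
	c.now = c.now.Add(time.Millisecond)
	return c.now
}
```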
This fixes the following warning (error?) in the apiserver:
E0126 18:10:38.665239 16370 fieldmanager.go:210] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (test/claim-84; resource.k8s.io/v1alpha1, Kind=ResourceClaim) to smd typed: .status.reservedFor: element 0: associative list without keys has an element that's a map type" VersionKind="/, Kind=" namespace="test" name="claim-84"
The root cause is the same as in e50e8a0c91:
nothing in Kubernetes outright complains about a list of items where the item
type is comparable in Go, but not a simple type. This nonetheless isn't
supposed to be done in the API and can cause problems elsewhere.
For the ReservedFor field, everything seems to work okay except for the
warning. However, it's better to follow conventions and use a map. This is
possible in this case because UID is guaranteed to be a unique key.
Validation is now stricter than before, which is a good thing: previously,
two entries with the same UID were allowed as long as some other field was
different, which wasn't a situation that should have been allowed.
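In API terms, this means declaring ReservedFor as an associative list keyed by the uid field. An abbreviated sketch of what that looks like (the field set is trimmed; the markers shown are the standard listType/listMapKey tags):

```go
package sketch

import "k8s.io/apimachinery/pkg/types"

// ResourceClaimConsumerReference is an abbreviated sketch of the element type;
// UID is the key that makes each entry unique.
type ResourceClaimConsumerReference struct {
	Resource string    `json:"resource"`
	Name     string    `json:"name"`
	UID      types.UID `json:"uid"`
}

// ResourceClaimStatus (abbreviated). Declaring ReservedFor as an associative
// list keyed by "uid" tells server-side apply how to merge entries, which
// avoids the "associative list without keys" error quoted above, and lets
// validation reject two entries that share a UID.
type ResourceClaimStatus struct {
	// +listType=map
	// +listMapKey=uid
	ReservedFor []ResourceClaimConsumerReference `json:"reservedFor,omitempty"`
}
```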
PV.Spec.CSI.*SecretReference.Name should be allowed to be up to 253 characters
long (DNS1123Subdomain) and not limited to 63 characters (DNS1123Label), so
that all possible Secret names can be used as secrets in a PV.
This is a continuation of
https://github.com/kubernetes/kubernetes/pull/108331 / Kubernetes 1.25,
which allowed updating PVs with long secret names if the previous PV already
had a long secret name. That made sure a downgrade from 1.27 to 1.26 works
well and allows PVs created in 1.27 to be updated in 1.26.
Now long secret names are accepted during PV creation too.
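The end state is simple: the secret name fields are validated as DNS-1123 subdomains on create as well. An illustrative sketch, with a hypothetical helper name:

```go
package sketch

import "k8s.io/apimachinery/pkg/util/validation"

// validateSecretRefName is an illustrative helper for the relaxed rule: a
// secret name referenced from a PV is a DNS-1123 subdomain (up to 253
// characters), not a DNS-1123 label (up to 63 characters), so every valid
// Secret name is also accepted here, on create as well as on update.
func validateSecretRefName(name string) []string {
	return validation.IsDNS1123Subdomain(name)
}
```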
A recent commit changed name validation from DNS Subdomain to DNS Label.
The assumption was that a subdomain-named StatefulSet could never have worked,
and the only reasonable thing to do would be to delete it. But if there is a
finalizer, the delete is not possible: we would reject the update that removes
the finalizer because the old name (a subdomain) does not pass the new
validation.
This commit does not re-validate the ObjectMeta on update. Probably
every resource should follow this pattern, but mostly it's a non-issue
because changes like the one above (tightening name validation) are not
something we normally do - this case was exceptional.
Any StatefulSet which took advantage of this (by having dots in the
name) can't have worked because we set `pod.spec.hostname` from it,
which is validated as a DNS label.
So while this is strictly a breaking change, it doesn't break anything
that was not already broken.
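The pattern is to validate the full ObjectMeta only on create and, on update, to validate just the ObjectMeta update (immutability of name, namespace, UID, and so on) without re-checking the name against the current rule. A minimal sketch of that split, using the generic apimachinery helpers (the wrapper function names here are illustrative):

```go
package sketch

import (
	apivalidation "k8s.io/apimachinery/pkg/api/validation"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// On create, the name is checked against the current rule (a DNS label here).
func validateCreate(meta *metav1.ObjectMeta) field.ErrorList {
	return apivalidation.ValidateObjectMeta(meta, true,
		apivalidation.NameIsDNSLabel, field.NewPath("metadata"))
}

// On update, only the ObjectMeta update is validated (immutability of name,
// namespace, UID, ...). The name itself is not re-checked, so an object that
// was persisted under an older, looser rule can still be updated, for example
// to remove a finalizer so that it can finally be deleted.
func validateUpdate(newMeta, oldMeta *metav1.ObjectMeta) field.ErrorList {
	return apivalidation.ValidateObjectMetaUpdate(newMeta, oldMeta, field.NewPath("metadata"))
}
```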
Add generated docs for batch v1
Start types with uppercase letters
Fix batch API docs under pkg/apis
Create generated files for batch v1
Fix batch v1beta1 docs
Generate new files after merge conflict
This is in response to review feedback. Checking for valid node names and for
the set property (no duplicate entries) catches programming mistakes in the
components that have write permission.
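A sketch of what such a check can look like, with an illustrative helper name; the real validation lives in the resource API validation code:

```go
package sketch

import (
	apivalidation "k8s.io/apimachinery/pkg/api/validation"
	"k8s.io/apimachinery/pkg/util/sets"
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// validateNodeNameSet is an illustrative check of the two properties mentioned
// above: every entry must be a syntactically valid node name, and the list
// must be a set (no duplicates). Both conditions only fail when a writer has
// a programming mistake, so flagging them early is cheap and useful.
func validateNodeNameSet(nodeNames []string, fldPath *field.Path) field.ErrorList {
	var allErrs field.ErrorList
	seen := sets.NewString()
	for i, name := range nodeNames {
		idxPath := fldPath.Index(i)
		for _, msg := range apivalidation.NameIsDNSSubdomain(name, false) {
			allErrs = append(allErrs, field.Invalid(idxPath, name, msg))
		}
		if seen.Has(name) {
			allErrs = append(allErrs, field.Duplicate(idxPath, name))
		}
		seen.Insert(name)
	}
	return allErrs
}
```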
This adds a new resource.k8s.io API group with v1alpha1 as version. It contains
four new types: resource.ResourceClaim, resource.ResourceClass, resource.ResourceClaimTemplate, and
resource.PodScheduling.
Also make some design changes that were exposed during testing and review.
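For orientation, a sketch of how these kinds hang off the new group version when registered into a scheme; the helper name here is illustrative and the published package path is assumed to be k8s.io/api/resource/v1alpha1:

```go
package sketch

import (
	resourcev1alpha1 "k8s.io/api/resource/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/runtime/schema"
)

// The new group/version and the four kinds it carries.
var schemeGroupVersion = schema.GroupVersion{Group: "resource.k8s.io", Version: "v1alpha1"}

// addKnownTypes registers the four new kinds into a scheme.
func addKnownTypes(scheme *runtime.Scheme) error {
	scheme.AddKnownTypes(schemeGroupVersion,
		&resourcev1alpha1.ResourceClaim{},
		&resourcev1alpha1.ResourceClass{},
		&resourcev1alpha1.ResourceClaimTemplate{},
		&resourcev1alpha1.PodScheduling{},
	)
	metav1.AddToGroupVersion(scheme, schemeGroupVersion)
	return nil
}
```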
Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit` because reviewers
thought it was too early. This creates a problem: that metric cannot
keep both of its old meanings, so I chose to keep the configured concurrency
limit.
Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking. The current design in the KEP is
as follows.
> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.
But this does not work out well at server startup. As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state. As always,
the two mandatory priority levels are implicitly added whenever they
are not read. So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design. So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels. Its Target is higher than
all the others, once they start to show up. It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.
I have considered the following fix. The idea is that the designed
initialization is not appropriate before all the default objects are
read. So the fix is to have a mode bit in the controller. In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit. In the later mode,
the seat demand tracking variables are initialized as originally
designed.
However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.
So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero. Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.
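A minimal sketch of what the chosen fix amounts to, using an illustrative stand-in for the per-priority-level tracking state:

```go
package sketch

// seatDemandStats is an illustrative stand-in for the per-priority-level seat
// demand tracking described above (the real state lives in the APF controller).
type seatDemandStats struct {
	avgSeatDemand    float64
	stdDevSeatDemand float64
	highSeatDemand   float64
	smoothSeatDemand float64
}

// newSeatDemandStats reflects the chosen fix: every newly tracked priority
// level starts from zero, regardless of whether it appears at server startup
// or is added later. Even if heavy load arrives immediately, the next
// periodic adjustment (every 10s) fully incorporates the observed demand.
func newSeatDemandStats() seatDemandStats {
	return seatDemandStats{} // all fields zero
}
```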
Also: revise logging logic, to log at numerically lower V level when
there is a change.
Also: bug fix in float64close.
Also: separate imports in some files.
Co-authored-by: Han Kang <hankang@google.com>
so that it explicitly describes that group information defined in the
container image will be kept. This also adds an e2e test case for
SupplementalGroups with pre-defined groups in the container image to make
the behavior clearer.
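To illustrate the documented behavior, a sketch of a pod whose SupplementalGroups are expected to be added on top of the groups pre-defined in the image (the image name and numeric values are hypothetical):

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podWithSupplementalGroups builds a pod for the behavior described above:
// the groups listed in SupplementalGroups are merged with, not substituted
// for, the groups the container image already defines for the user (e.g. via
// /etc/group), so the effective groups of the container process include both.
func podWithSupplementalGroups() *v1.Pod {
	uid, gid, extra := int64(1000), int64(1000), int64(60000)
	return &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "supplemental-groups-demo"},
		Spec: v1.PodSpec{
			SecurityContext: &v1.PodSecurityContext{
				RunAsUser:          &uid,
				RunAsGroup:         &gid,
				SupplementalGroups: []int64{extra},
			},
			Containers: []v1.Container{{
				Name:    "main",
				Image:   "registry.example.com/image-with-predefined-groups", // hypothetical image
				Command: []string{"id"},                                      // prints uid/gid/groups
			}},
		},
	}
}
```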