Also make some design changes exposed in testing and review.
Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit`, because reviewers
thought it was too early for that. This creates a problem: that metric
cannot keep both of its old meanings. I chose the configured concurrency
limit.
Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking. The current design in the KEP is
as follows.
> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.
But this does not work out well at server startup. As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state. As always,
the two mandatory priority levels are implicitly added whenever they
are not read. So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design. So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels. Its Target is higher than
all the others, once they start to show up. It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.
I have considered the following fix. The idea is that the designed
initialization is not appropriate before all the default objects are
read. So the fix is to have a mode bit in the controller. In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit. In the later mode,
the seat demand tracking variables are initialized as originally
designed.
However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.
So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero. Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.
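For concreteness, a minimal sketch of what the zero initialization amounts to, using hypothetical names for the tracked values (the real controller code differs):

```go
// seatDemandStats is a hypothetical holder for the seat demand tracking
// variables named in the KEP excerpt above.
type seatDemandStats struct {
	avgSeatDemand    float64
	stDevSeatDemand  float64
	highSeatDemand   float64
	smoothSeatDemand float64
}

// newSeatDemandStats returns the initial state for a newly seen priority
// level. With the fix, every field starts at zero instead of seeding
// smoothSeatDemand with NominalCL-LendableSD/2; the next periodic
// adjustment (every 10 sec) picks up whatever demand has actually arrived.
func newSeatDemandStats() seatDemandStats {
	return seatDemandStats{}
}
```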
Also: revise the logging logic to log at a numerically lower V level when
there is a change.
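Roughly the pattern, sketched with klog and invented names:

```go
import "k8s.io/klog/v2"

// logConcurrencyLimit logs at a numerically lower V level (more likely to
// be emitted) when the limit actually changed.
func logConcurrencyLimit(plName string, oldLimit, newLimit int) {
	level := klog.Level(5)
	if newLimit != oldLimit {
		level = 2 // changes are more interesting, log them more eagerly
	}
	klog.V(level).InfoS("Concurrency limit", "priorityLevel", plName, "limit", newLimit)
}
```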
Also: bug fix in float64close.
Also: separate imports in some file.
Co-authored-by: Han Kang <hankang@google.com>
so that it explicitly describes that group information defined in the
container image will be kept. This also adds an e2e test case of
SupplementalGroups with pre-defined groups in the container
image to make the behavior clearer.
- New API field .spec.schedulingGates
- Validation and drop disabled fields
- Disallow binding a Pod carrying non-nil schedulingGates
- Disallow creating a Pod with non-nil nodeName and non-nil schedulingGates
- Adds a {type:PodScheduled, reason:WaitingForGates} condition if necessary
- New literal SchedulingGated in the STATUS column of `k get pod`
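For illustration, a Pod carrying the new field might be constructed like this (a sketch using the corev1 types, not code from the PR; the gate name is made up):

```go
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// gatedPod stays unschedulable (STATUS shows SchedulingGated) until every
// gate has been removed by its controller.
var gatedPod = &corev1.Pod{
	ObjectMeta: metav1.ObjectMeta{Name: "gated-pod"},
	Spec: corev1.PodSpec{
		SchedulingGates: []corev1.PodSchedulingGate{
			{Name: "example.com/provisioning"},
		},
		Containers: []corev1.Container{
			{Name: "app", Image: "registry.k8s.io/pause:3.9"},
		},
	},
}
```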
In the PR https://github.com/kubernetes/kubernetes/pull/86139, two more lifecycle hook tests (poststart / prestop)
were added using HTTPS. They are similar to the existing HTTP tests.
However, this causes failures on Windows due to how networking
works there. We previously fixed this in the HTTP tests via f9e4a015e2.
This commit applies the same fix to the lifecycle hook HTTPS tests.
Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
The cloud-provider and the e2e test were racing on deleting the
cloud resources.
Also, the cloud-provider should not leave orphaned resources; those will
be detected by the job and cause it to fail, so we should not have
additional cleanup logic masking these errors.
Removed the unit tests that covered the cases where the MixedProtocolLBService feature flag was false - the feature flag is locked to true with GA.
Added an integration test to verify that the API server accepts an LB Service with different protocols.
Added an e2e test to verify that a service exposed by a multi-protocol LB Service is accessible via both ports.
Removed the conditional validation that compared the new and the old Service definitions during an update - the feature flag is locked to true with GA.
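For reference, the kind of Service these tests exercise looks roughly like this (a sketch, not the actual test code):

```go
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// A LoadBalancer Service exposing the same backend on TCP and UDP; with
// MixedProtocolLBService GA the API server accepts this unconditionally.
var mixedProtocolLB = &corev1.Service{
	ObjectMeta: metav1.ObjectMeta{Name: "mixed-protocol-lb"},
	Spec: corev1.ServiceSpec{
		Type:     corev1.ServiceTypeLoadBalancer,
		Selector: map[string]string{"app": "echo"},
		Ports: []corev1.ServicePort{
			{Name: "tcp", Protocol: corev1.ProtocolTCP, Port: 80},
			{Name: "udp", Protocol: corev1.ProtocolUDP, Port: 80},
		},
	},
}
```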
The original intention (adding more information for later analysis)
is probably obsolete because there is no code which does anything
with the extended error.
The code in upgrade_suite.go collected it in an in-memory JUnit report, but
then didn't do anything with that field. The code also wouldn't work for
failures detected by Ginkgo itself, like the upcoming timeout handling. If the
upgrade suite needs the information, it probably should get it from Ginkgo with
a ReportAfterSuite call instead of depending on some fragile interception
mechanism.
Some of our API types contain fields that get rendered very poorly by
gomega.format.Object because they contain lots of internal information, for
example CreationTimestamp. As a result, dumping a full API object typically
gets truncated.
What we want is a representation that is a) multi-line (in contrast to the
stringer implemented by our types) and b) drops empty fields where it
was defined that this is okay.
The normal YAML representation fits that requirement. We just need to teach
gomega how and when to do that. This cannot be done for each type through a
generated GomegaString method (lots of code, additional dependency in public
API on YAML encoder), but it can be done inside tests by adding a formatting
handler (new gomega feature).
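A minimal sketch of such a handler, assuming gomega's format.RegisterCustomFormatter hook and sigs.k8s.io/yaml for the encoding; the real framework code may differ:

```go
import (
	"fmt"

	"github.com/onsi/gomega/format"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/yaml"
)

// registerAPIObjectFormatter teaches gomega to render API objects as YAML,
// which is multi-line and omits empty fields, instead of its default dump.
func registerAPIObjectFormatter() {
	format.RegisterCustomFormatter(func(value interface{}) (string, bool) {
		obj, ok := value.(runtime.Object)
		if !ok {
			return "", false // not an API object, keep gomega's default
		}
		data, err := yaml.Marshal(obj)
		if err != nil {
			return "", false
		}
		return fmt.Sprintf("%T:\n%s", obj, data), true
	})
}
```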
Our tooling cannot handle very long failure messages well:
- when unfolding a test in the spyglass UI, it fills the entire screen
- failure correlation for http://go.k8s.io/triage has resource constraints
We cannot enforce that all tests only produce short failure messages and even
if we could, depending on the test failure, including more information may be
useful to understand it.
To achieve both goals (summary for correlation and overview, all details
available when digging deeper), too long failure messages now get truncated,
with the full message guaranteed to be captured in the test output.
"Too long" is arbitrarily chosen to be similar to the gomega.MaxLength because
that has been a limit for failure message size in the past.
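A sketch of the idea, with invented names; the actual framework helper may differ:

```go
// maxFailureMessageLen is chosen to be in the same ballpark as gomega's
// format.MaxLength default of 4000.
const maxFailureMessageLen = 4000

// truncateFailure shortens the message used for failure correlation; the
// untruncated message is still written to the test output separately.
func truncateFailure(msg string) string {
	if len(msg) <= maxFailureMessageLen {
		return msg
	}
	return msg[:maxFailureMessageLen] + "\n[... truncated, full message in the test output ...]"
}
```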
When gomega.format exceeds the default size of 4000, it truncates and prints:
Gomega truncated this representation as it exceeds 'format.MaxLength'.
Consider having the object provide a custom 'GomegaStringer' representation
or adjust the parameters in Gomega's 'format' package.
Learn more here: https://onsi.github.io/gomega/#adjusting-output
These instructions don't help the user of the e2e.test binary unless we provide
a command line flag.
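One possible shape for that flag (the flag name is invented for illustration):

```go
import (
	"flag"

	"github.com/onsi/gomega/format"
)

// RegisterFormatFlags wires gomega's truncation limit to a command line
// flag so users of the e2e.test binary can raise or disable it.
func RegisterFormatFlags(fs *flag.FlagSet) {
	fs.IntVar(&format.MaxLength, "gomega-max-length", format.MaxLength,
		"maximum length of object dumps in failure messages, 0 disables truncation")
}
```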
Commit 99e9096034 was only supposed to remove the
FailfWithOffset function, but it also changed the behavior by skipping one
additional stack frame. That makes no sense and is inconsistent with Fail, which
also logs the direct caller.