Commit Graph

13646 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
8eddcac00a Merge pull request #113597 from ionutbalutoiu/tests/lifecycle-hooks
tests: Spawn poststart / prestop pods on the same node as the https pod
2022-11-09 11:12:40 -08:00
Kubernetes Prow Robot
7e0e0c8ec3 Merge pull request #113360 from mimowo/handling-pod-failures-beta-enable
Enable the "Retriable and non-retriable pod failures for jobs" feature into beta
2022-11-09 08:30:24 -08:00
Kubernetes Prow Robot
a14601a77f Merge pull request #113753 from gnufied/fix-broken-e2e-csi-serial-readwrite-once
Fix broken readwriteOncePod serial tests
2022-11-09 03:50:24 -08:00
Kubernetes Prow Robot
1193a9abcb Merge pull request #113485 from MikeSpreitzer/apf-borrowing
Add borrowing between priority levels in APF
2022-11-09 01:40:12 -08:00
Michal Wozniak
818e180300 Add e2e test for adding DisruptionTarget condition to the preemption victim pod 2022-11-09 09:02:40 +01:00
Michal Wozniak
c803892bd8 Enable the feature into beta 2022-11-09 09:02:40 +01:00
Mike Spreitzer
feb4227788 apiserver: finish implementation of borrowing in APF
Also make some design changes exposed in testing and review.

Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit` because reviewers
though it is too early.  This creates a problem, that metric can not
keep both of its old meanings.  I chose the configured concurrency
limit.

Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking.  The current design in the KEP is
as follows.

> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.

But this does not work out well at server startup.  As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state.  As always,
the two mandatory priority levels are implicitly added whenever they
are not read.  So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design.  So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels.  Its Target is higher than
all the others, once they start to show up.  It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.

I have considered the following fix.  The idea is that the designed
initialization is not appropriate before all the default objects are
read.  So the fix is to have a mode bit in the controller.  In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit.  In the later mode,
the seat demand tracking variables are initialized as originally
designed.

However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.

So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero.  Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.

Also: revise logging logic, to log at numerically lower V level when
there is a change.

Also: bug fix in float64close.

Also, separate imports in some file

Co-authored-by: Han Kang <hankang@google.com>
2022-11-08 21:51:44 -08:00
Kubernetes Prow Robot
d619f60e0f Merge pull request #113442 from Huang-Wei/kep-3521-C
[KEP-3521] Part 3: Bug fixes, integration & E2E Test
2022-11-08 15:08:15 -08:00
Hemant Kumar
8cc30f5e0b Fix broken readwriteOncePod serial tests
These tests can't yet run in non-alpha clusters
2022-11-08 15:58:53 -05:00
Kubernetes Prow Robot
3a99a5954d Merge pull request #113629 from andrewsykim/apiserver-identity-beta
Promote APIServerIdentity to Beta
2022-11-08 12:43:10 -08:00
Kubernetes Prow Robot
da735b5415 Merge pull request #113596 from jsafrane/selinux-reconstruction
Reconstruct SELinux  mount label
2022-11-08 12:43:03 -08:00
Kubernetes Prow Robot
6687496832 Merge pull request #113383 from pohly/e2e-failure-handling
e2e: improve failure handling
2022-11-08 12:42:31 -08:00
Wei Huang
abe0c5d5b4 E2E test for KEP Scheduling Readiness Gates 2022-11-08 12:38:21 -08:00
Jan Safranek
d6c36736d5 Add mock CSI driver test for SELinux mount 2022-11-08 13:37:09 +01:00
Jan Safranek
802979c295 Add SELinux disruptive test 2022-11-08 12:42:20 +01:00
Andrew Sy Kim
368f9f949a test/e2e/apimachinery: add e2e test for APIServerIdentity, validating behavior when restarting kube-apiserver
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-11-07 19:36:22 -05:00
Kubernetes Prow Robot
68875cf4a7 Merge pull request #113047 from everpeace/improve-supplemental-groups-description
Improve the API  description of `PodSecurityContext.SupplementalGroups` to clarify its unfamiliar behavior
2022-11-07 16:01:00 -08:00
Kubernetes Prow Robot
47952e0917 Merge pull request #112360 from mimowo/handling-pod-failures-beta-kubelet
Add pod disruption conditions for kubelet-initiated failures
2022-11-07 16:00:40 -08:00
Kubernetes Prow Robot
fbde6ab05c Merge pull request #111724 from dobsonj/csi-inline-conformance-tests
CSI Inline Volumes: promote API tests to conformance
2022-11-07 16:00:16 -08:00
Michal Wozniak
52cd6755eb Add pod disruption conditions for kubelet initiated failures 2022-11-07 11:23:22 +01:00
Shingo Omura
ac1d5fdf37 Improve the description of PodSecurityContext.SupplementalGroups (including cri-api)
so that it explicitly describe group information defined in the
container image will be kept. This also adds e2e test case of
SupplementalGroups with pre-defined groups in the container
image to make the behaivier clearer.
2022-11-06 10:03:13 +09:00
Davanum Srinivas
f19589d38a Switch to newer nvidia installer for m97
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-11-04 14:32:04 -04:00
Kubernetes Prow Robot
1bf4af4584 Merge pull request #111930 from azylinski/new-histogram-pod_start_sli_duration_seconds
New histogram: Pod start SLI duration
2022-11-04 07:28:14 -07:00
Kubernetes Prow Robot
8c77820759 Merge pull request #113274 from Huang-Wei/kep-3521-A
[KEP-3521] Part 1: New Pod API .spec.schedulingGates
2022-11-03 21:24:25 -07:00
Kubernetes Prow Robot
54d3de0850 Merge pull request #113562 from aojea/e2e_lb
e2e loadbalancer remove after each cleanup
2022-11-03 18:54:13 -07:00
Wei Huang
7b6293b6b6 APIs, Validation and condition enforcements
- New API field .spec.schedulingGates
- Validation and drop disabled fields
- Disallow binding a Pod carrying non-nil schedulingGates
- Disallow creating a Pod with non-nil nodeName and non-nil schedulingGates
- Adds a {type:PodScheduled, reason:WaitingForGates} condition if necessary
- New literal SchedulingGated in the STATUS column of `k get pod`
2022-11-03 14:32:34 -07:00
Ionut Balutoiu
f092343300 tests: Spawn poststart / prestop pods on the same node as the https pod
In the PR https://github.com/kubernetes/kubernetes/pull/86139, two more lifecycle hook tests (poststart / prestop)
were added using HTTPS. They are similar with the existing HTTP tests.

However, this causes failures on Windows due to how networking
works there. We previously fixed this in the HTTP tests via f9e4a015e2.

This commit applies the same fix on the lifecycle hook HTTPS tests.

Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
2022-11-03 19:25:15 +02:00
Kubernetes Prow Robot
c98aef484d Merge pull request #112895 from nokia/kep-1435-GA
KEP-1435 Mixed Protocol values in LoadBalancer Service GA
2022-11-03 05:43:35 -07:00
Laszlo Janosi
9d75c958ce Fix review comments. Implement endpoint port validation that verifies the protocol, too. 2022-11-03 10:54:14 +02:00
Kubernetes Prow Robot
a9f87ad6c8 Merge pull request #113384 from pohly/e2e-formatting
e2e: formatting enhancements
2022-11-02 21:40:08 -07:00
Kubernetes Prow Robot
818b13544f Merge pull request #106242 from thockin/revive-copy-lb-status-type-to-ingress
Copy LoadBalancerStatus from core to networking
2022-11-02 21:39:34 -07:00
Kubernetes Prow Robot
3cf75a2f76 Merge pull request #103177 from arkbriar/support_cancelable_exec_stream
Support cancelable SPDY executor stream
2022-11-02 19:47:36 -07:00
Tim Hockin
0153bfad16 Copy LoadBalancerStatus from core to networking
This type should never have been shared between Service and Ingress.
The `ports` field is unfortunate, but it is needed to stay compatible.
2022-11-02 16:13:31 -07:00
Antonio Ojea
924b467789 e2e loadbalancer remove after each cleanup
The cloud-provider and the e2e test were racing on deleting the
cloud resources.

Also, the cloud-provider should not leave orphan resources, that will
be detected by the job and fail, thus we should not have additional
logic to cleanup masking these errors.
2022-11-02 22:23:45 +00:00
Kubernetes Prow Robot
e8449012e2 Merge pull request #113512 from ehashman/rm-ehashman-node
Remove ehashman from sig-node roles
2022-11-02 08:36:00 -07:00
Laszlo Janosi
82ce61afc7 KEP-1435 Mixed Protocol values in LoadBalancer Service GA
Removed the unit tests that test the cases when the MixedProtocolLBService feature flag was false - the feature flag is locked to true with GA
Added an integration test to test whether the API server accepts an LB Service with different protocols.
Added an e2e test to test whether a service which is exposed by a multi-protocol LB Service is accessible via both ports.
Removed the conditional validation that compared the new and the old Service definitions during an update - the feature flag is locked to true with GA.
2022-11-02 13:44:52 +02:00
Roman Bednar
8209066c4c graduate RetroactiveDefaultStorageClass to beta
Change feature to beta and remove e2e test feature flags since they're
not needed anymore.
2022-11-02 09:25:10 +01:00
Patrick Ohly
80c554c5b0 e2e framework: simplify failure and skip handling
The original intention (adding more information for later analysis)
is probably obsolete because there is no code which does anything
with the extended error.

The code in upgrade_suite.go collected it in an in-memory JUnit report, but
then didn't do anything with that field. The code also wouldn't work for
failures detected by Ginkgo itself, like the upcoming timeout handling. If the
upgrade suite needs the information, it probably should get it from Gingko with
a ReportAfterSuite call instead of depending in some fragile interception
mechanism.
2022-11-02 09:23:01 +01:00
Elana Hashman
9d2d392802 Remove ehashman from sig-node roles 2022-11-01 12:16:43 -07:00
Stephen Heywood
168fa73b60 Promote ResourceQuota e2e test to Conformance 2022-11-01 09:20:38 +13:00
Kubernetes Prow Robot
62b21069d7 Merge pull request #113119 from marosset/hpc-local-account-e2e
Adding e2e test for running Windows hostprocess containers as members of a local usergroup
2022-10-31 11:50:45 -07:00
Stephen Heywood
cdfdf0f6ce Promote Namespace e2e test to Conformance 2022-10-31 09:48:28 +13:00
Patrick Ohly
5a01a52b0c test: extend gomega to use YAML for API types
Some of our API types contain fields that get rendered very poorly by
gomega.format.Object because they contain lots of internal information, for
example CreationTimestamp. As a result, dumping full API object typically gets
truncated.

What we want is a representation that is a) multi-line (in contrast to the
stringer implemented by our types) and b) drops empty fields where it
was defined that this is okay.

The normal YAML representation fits that requirement. We just need to teach
gomega how and when to do that. This cannot be done for each type through a
generated GomegaString method (lots of code, additional dependency in public
API on YAML encoder), but it can be done inside tests by adding a formatting
handler (new gomega feature).
2022-10-28 15:43:48 +02:00
Patrick Ohly
023baa5e45 e2e framework: truncate too long failure messages when writing JUnit
Our tooling cannot handle very long failure messages well:
- when unfolding a test in the spyglass UI, it fills the entire screen
- failure correlation for http://go.k8s.io/triage has resource constraints

We cannot enforce that all tests only produce short failure messages and even
if we could, depending on the test failure, including more information may be
useful to understand it.

To achieve both goals (summary for correlation and overview, all details
available when digging deeper), too longer failure messages now get truncated,
with the full message guaranteed to be captured in the test output.

"Too long" is arbitrarily chosen to be similar to the gomega.MaxLength because
that has been a limit for failure message size in the past.
2022-10-28 15:43:48 +02:00
Patrick Ohly
b3f4cd66cd e2e framework: add -gomega-max-length parameter
When gomega.format exceeds the default size of 4000, it truncates and prints:

  Gomega truncated this representation as it exceeds 'format.MaxLength'.
  Consider having the object provide a custom 'GomegaStringer' representation
  or adjust the parameters in Gomega's 'format' package.

  Learn more here: https://onsi.github.io/gomega/#adjusting-output

These instructions don't help the user of the e2e.test binary unless we provide
a command line flag.
2022-10-28 15:43:48 +02:00
Kubernetes Prow Robot
3df170d1c4 Merge pull request #113198 from pacoxu/kubectl-alpha-events
kubectl-alpha-events: e2e ignore some timeout errors(flake)
2022-10-28 00:42:41 -07:00
Kubernetes Prow Robot
f163fae7d5 Merge pull request #113409 from gnufied/disable-generic-ephemeral-expansion
Disable expansion in SC, if driver does not support it
2022-10-27 13:20:42 -07:00
Hemant Kumar
fa242ff102 Disable expansion in SC, if driver does not support it 2022-10-27 13:35:36 -04:00
Patrick Ohly
2a7f9723ec e2e framework: fix incorrect backtrace in Failf
Commit 99e9096034 was only supposed to remove the
FailfWithOffset function, but it also changed the behavior by skipping one
additional stack frame. That makes no sense and is inconsistent with Fail, which
also logs the direct caller.
2022-10-27 12:24:00 +02:00
Jonathan Dobson
5d1725f17b CSI Inline Volumes: promote API tests to conformance 2022-10-26 07:41:01 -06:00