Now that everything is connected to a per-test context, the gRPC server might
encounter an error before it gets shut down normally. We must not panic in that
case because it would kill the entire Ginkgo worker process. This is not even
an error, so just log it as an info message.
The wrapper can be used in combination with ginkgo.DeferCleanup to ignore
harmless "not found" errors during delete operations.
Original code suggested by Onsi Fakhouri.
It is set in all of the test/e2e* suites, but not in the ginkgo output
tests. This check is needed before adding a test case there which would trigger
this nil pointer access.
Adding the "context" import in the previous commit must get compensated by
removing one of the blank lines in the output unit tests, otherwise the stack
backtraces don't match expectations.
Adding "ctx" as parameter in the previous commit led to some linter errors
about code that overwrites "ctx" without using it.
This gets fixed by replacing context.Background or context.TODO in those code
lines with the new ctx parameter.
Two context.WithCancel calls can get removed completely because the context
automatically gets cancelled by Ginkgo when the test returns.
Every ginkgo callback should return immediately when a timeout occurs or the
test run manually gets aborted with CTRL-C. To do that, they must take a ctx
parameter and pass it through to all code which might block.
This is a first automated step towards that: the additional parameter got added
with
sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
$(git grep -l -e framework.ConformanceIt -e ginkgo.It )
$GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')
log_test.go was left unchanged.
Endpoints generated by the endpoints controller are in canonical
form. Custom endpoints, however, may not be in canonical form.
(There was a time when they were canonicalized in the apiserver, but this
caused performance issues: the endpoints controller kept
updating them because the created endpoints differed from the
stored ones due to the canonicalization.)
There are cases where the controller may generate multiple slices
for a custom endpoint, for example when the same address is present
in different subsets.
The endpointslice mirroring controller should canonicalize the
endpoints subsets before it starts processing them, so that the
generated slices are consistent. There is no risk of hotlooping because
the endpoint is only used as input.
Change-Id: I2a8cd53c658a640aea559a88ce33e857fa98cc5c
This ensures that the daemonset controller updates daemonset statuses in
a best-effort manner even if syncDaemonSet fails.
In order to add an integration test, this also replaces
`cmd/kube-apiserver/app/testing.StartTestServer` with
`test/integration/framework.StartTestServer` and adds
`setupWithServerSetup` to configure the admission control of the
apiserver.
Currently, if a user executes `kubectl scale --dry-run`, the output has no
indicator showing that the change is not actually applied.
This PR adds a dry run suffix to the output as well as more integration
tests to verify it.
`kubectl scale` calls the visitor twice. The second call fails when
piped input is passed, returning an
`error: no objects passed to scale` error.
This PR reuses the result of the first visitor call, which fixes the piped
input problem. In addition, this PR adds a new
scale test to verify the fix.
The `kubectl exec` command supports getting files as inputs. However,
if the file contains multiple resources, it returns an unclear error message:
`cannot attach to *v1.List: selector for *v1.List not implemented`.
Since the `exec` command does not support multiple resources, this PR
handles that case and returns a descriptive error message earlier.
One of the cpumanager tests doesn't remove the pod
that got created during the test.
This causes pollution of other tests and failures
from time to time (depends on the test execution order).
In order to deflake the tests, we should delete the pod
and wait for it to be completely removed.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
This introduces `singularNameProvider`. This provider will be used
by core types so that their singular names are defined in the discovery
endpoint. Thanks to that, the singular names of core resources always have
higher precedence than CRD shortcuts or singular names.
This adds new integration tests to verify that short names and
singular names expand to the correct resources.
In this case, core types always have higher precedence than
CRDs.
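
The precedence rule can be pictured with a small lookup sketch. The function
resolve and the two maps are hypothetical and only illustrate the ordering,
not how the name expansion is actually implemented:

```go
package main

// resolve maps a user-supplied short or singular name to a resource name.
// Core types are consulted first, so a CRD cannot shadow them.
func resolve(name string, core, crds map[string]string) (string, bool) {
	if r, ok := core[name]; ok {
		return r, true
	}
	r, ok := crds[name]
	return r, ok
}
```

With "pod" defined both as a core singular name and as a CRD singular name,
the lookup returns the core resource.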
This change will leverage the new PreFilterResult
to reduce the list of eligible nodes for a pod
using bound local PVs during the PreFilter stage, so
that only the node(s) that match the local PV node
affinity will be considered in subsequent scheduling
stages.
Today, the NodeAffinity check is done during Filter
which means all nodes will be considered even though
there may be a large number of nodes that are not
eligible due to not matching the pod's bound local
PV(s)' node affinity requirement. Here we can
reduce down the node list in PreFilter to ensure that
during Filter we are only considering the reduced
list and thus can provide a clearer message to
users when node(s) are not available for scheduling
since the list only contains relevant nodes.
If an error is encountered (e.g. PV cache read error) or
if node list reduction cannot be done (e.g. pod uses
no local PVs), then we will still proceed to consider
all nodes for the rest of scheduling stages.
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
In the Dynamic Resource allocation example specs, the claim
parameter name specified was inconsistent.
This commit fixes that with a better/more consistent name,
which is used to define the configmap and referenced in
the `ResourceClaimTemplate` spec.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
A recent PR [1] updated the image versions we use for E2E tests. However, the ``windows-nanoserver`` image is meant to be in a private authenticated registry, ``gcr.io/authenticated-image-pulling/windows-nanoserver``, which requires credentials to pull images. This image is required by the ``[sig-node] Container Runtime blackbox test when running a container with a new image should be able to pull from private registry with secret [NodeConformance]`` test for Windows. The ``v3`` image does not exist because there is no automatic promotion process for that registry. Previously, it was built and pushed manually.
Because of this, the https://testgrid.k8s.io/sig-windows-signal#capz-windows-containerd-master jobs have started to fail.
Reverts the image version to ``v1``.
[1] https://github.com/kubernetes/kubernetes/pull/113900
Many clusters block direct requests from internal resources to the nodes'
external IPs as a best practice. All accesses from internal resources that
want to reach resources running on nodes go through load balancers,
whether the nodes are on private or public subnets. Let's prefer internal IPs
first, so the tests can work even when there are security group rules
present that block requests to the external IPs.
We should not require ExternalIP for Conformance, but should keep
testing ExternalIPs in sig network.
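
The preference order can be sketched like this. The nodeAddress type is a
simplified, hypothetical stand-in for the node address entries in the API:

```go
package main

// nodeAddress is a simplified stand-in for a node address entry.
type nodeAddress struct {
	Type    string // "InternalIP" or "ExternalIP"
	Address string
}

// pickAddress returns the first InternalIP if one exists and otherwise
// falls back to the first ExternalIP.
func pickAddress(addrs []nodeAddress) (string, bool) {
	for _, want := range []string{"InternalIP", "ExternalIP"} {
		for _, a := range addrs {
			if a.Type == want && a.Address != "" {
				return a.Address, true
			}
		}
	}
	return "", false
}
```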
Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
These instructions bring up a kind cluster with containerd 34d078e99, the
latest commit from the main branch. This version of containerd has
support for CDI.
The driver can be used manually against a cluster started with
local-up-cluster.sh and is also used for E2E testing. Because the tests proxy
connections from the nodes into the e2e.test binary and create/delete files via
the equivalent of "kubectl exec dd/rm", they can be run against arbitrary
clusters. Each test gets its own driver instance and resource class, therefore
they can run in parallel.
Add volumePath parameter to all disruptive checks, so subpath tests can use
"/test-volume" and disruptive tests can use "/mnt/volume1" for their
respective Pods.
This adds a new resource.k8s.io API group with v1alpha1 as version. It contains
four new types: resource.ResourceClaim, resource.ResourceClass, resource.ResourceClaimTemplate, and
resource.PodScheduling.
This removes WaitTimeoutForPodNoLongerRunningOrNotFoundInNamespace
introduced in f2b9479f8e and changes
the test to use goroutines to speed up the cleanups.
Most CI jobs run an OS that does not support SELinux, therefore tests that
need it should be skipped by default.
* [Feature:SELinux] marks tests that need SELinux (for any feature)
* [Feature:SELinuxMountReadWriteOncePod] marks tests that need the
SELinuxMountReadWriteOncePod alpha feature gate enabled.
Currently, all SELinux tests have both, but it will change in the future.
Also make some design changes exposed in testing and review.
Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit` because reviewers
thought it is too early. This creates a problem: that metric cannot
keep both of its old meanings. I chose the configured concurrency
limit.
Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking. The current design in the KEP is
as follows.
> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.
But this does not work out well at server startup. As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state. As always,
the two mandatory priority levels are implicitly added whenever they
are not read. So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design. So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels. Its Target is higher than
all the others, once they start to show up. It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.
I have considered the following fix. The idea is that the designed
initialization is not appropriate before all the default objects are
read. So the fix is to have a mode bit in the controller. In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit. In the latter mode,
the seat demand tracking variables are initialized as originally
designed.
However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.
So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero. Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.
Also: revise logging logic, to log at numerically lower V level when
there is a change.
Also: bug fix in float64close.
Also, separate imports in some file
Co-authored-by: Han Kang <hankang@google.com>
This change enables hot reload of encryption config file when api server
flag --encryption-provider-config-automatic-reload is set to true. This
allows the user to change the encryption config file without restarting
kube-apiserver. Changes are detected by polling the file and by using an
fsnotify watcher. When the file is updated, it is processed to generate a
new set of transformers and the old ones are closed.
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
This change adds a flag --encryption-provider-config-automatic-reload
which will be used to drive automatic reloading of the encryption
config at runtime. While this flag is set to true, or when KMS v2
plugins are used without KMS v1 plugins, the /healthz endpoints
associated with said plugins are collapsed into a single endpoint at
/healthz/kms-providers. In this state, it is not possible to
configure exclusions for specific KMS providers while including the
remaining ones. For example, using /readyz?exclude=kms-provider-1 to exclude
a particular KMS is not possible. This single healthz check handles
checking all configured KMS providers. When reloading is enabled
but no KMS providers are configured, it is a no-op.
k8s.io/apiserver does not support dynamic addition and removal of
healthz checks at runtime. Reloading will instead have a single
static healthz check and swap the underlying implementation at
runtime when a config change occurs.
Signed-off-by: Monis Khan <mok@microsoft.com>
so that it explicitly describes that group information defined in the
container image will be kept. This also adds an e2e test case for
SupplementalGroups with pre-defined groups in the container
image to make the behavior clearer.
- New API field .spec.schedulingGates
- Validation and drop disabled fields
- Disallow binding a Pod carrying non-nil schedulingGates
- Disallow creating a Pod with non-nil nodeName and non-nil schedulingGates
- Adds a {type:PodScheduled, reason:WaitingForGates} condition if necessary
- New literal SchedulingGated in the STATUS column of `kubectl get pod`
In PR https://github.com/kubernetes/kubernetes/pull/86139, two more lifecycle hook tests (poststart / prestop)
were added using HTTPS. They are similar to the existing HTTP tests.
However, this causes failures on Windows due to how networking
works there. We previously fixed this in the HTTP tests via f9e4a015e2.
This commit applies the same fix on the lifecycle hook HTTPS tests.
Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
The cloud-provider and the e2e test were racing on deleting the
cloud resources.
Also, the cloud-provider should not leave orphan resources. Those will
be detected by the job, which then fails, so we should not have additional
cleanup logic that masks these errors.
Removed the unit tests that test the cases when the MixedProtocolLBService feature flag was false - the feature flag is locked to true with GA
Added an integration test to test whether the API server accepts an LB Service with different protocols.
Added an e2e test to test whether a service which is exposed by a multi-protocol LB Service is accessible via both ports.
Removed the conditional validation that compared the new and the old Service definitions during an update - the feature flag is locked to true with GA.
The original intention (adding more information for later analysis)
is probably obsolete because there is no code which does anything
with the extended error.
The code in upgrade_suite.go collected it in an in-memory JUnit report, but
then didn't do anything with that field. The code also wouldn't work for
failures detected by Ginkgo itself, like the upcoming timeout handling. If the
upgrade suite needs the information, it probably should get it from Ginkgo with
a ReportAfterSuite call instead of depending on some fragile interception
mechanism.
Tests scheduler enforcement of the ReadWriteOncePod PVC access mode.
- Creates a pod using a PVC with ReadWriteOncePod
- Creates a second pod using the same PVC
- Observes that the second pod fails to schedule because the PVC is in use
- Deletes the first pod
- Observes the second pod successfully schedules