* Add tracker types and tests
* Modify ResourceEventHandler interface's OnAdd member
* Add additional ResourceEventHandlerDetailedFuncs struct
* Fix SharedInformer to let users track HasSynced for their handlers
* Fix in-tree controllers which weren't computing HasSynced correctly
* Deprecate the cache.Pop function
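For the HasSynced tracking above, a minimal sketch of the intended usage (assuming client-go's tools/cache package; podInformer, stopCh and the exact signatures are illustrative and may differ by client-go version):

    // Register a detailed handler; the returned registration tracks whether
    // this specific handler has received the initial list of objects.
    reg, err := podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerDetailedFuncs{
        AddFunc: func(obj interface{}, isInInitialList bool) {
            // isInInitialList distinguishes the initial list replay from later adds.
        },
    })
    if err != nil {
        klog.Fatal(err)
    }
    // Wait for this handler, not just the informer's store, to be synced.
    if !cache.WaitForCacheSync(stopCh, reg.HasSynced) {
        klog.Fatal("event handler never synced")
    }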
The Feature:SCTPConnectivity tests cannot run at the same time as the
"X doesn't cause sctp.ko to be loaded" tests, since they may cause
sctp.ko to be loaded. We had dealt with this in the past by marking
them [Disruptive], but this isn't really fair; the problem is more
with the sctp.ko-checking tests than it is with the SCTPConnectivity
tests. So make them not [Disruptive] and instead make the
sctp.ko-checking tests be [Serial].
There were two SCTP tests grouped together in
test/e2e/network/service.go, but one of them wasn't a service test...
so move the SCTP service test to be grouped with the other service
tests, and the SCTP hostport tests to be grouped with other
non-service tests.
The SCTP HostPort test was checking that creating a pod with an SCTP
HostPort would create a certain iptables rule, but the handling of
HostPorts is now up to CRI, not kubelet, so kubernetes e2e cannot
assume it will implement the feature in any specific way.
(The test still ensures that (a) the apiserver accepts SCTP HostPorts,
and (b) neither kubelet nor the runtime causes the SCTP kernel module
to be loaded as part of creating a pod with an SCTP HostPort.)
We had a test that creating a Service with an SCTP port would create
an iptables rule with "-p sctp" in it, which let us test that
kube-proxy was doing vaguely the right thing with SCTP even if the e2e
environment didn't have SCTP support. But this would really make much
more sense as a unit test.
Windows ComputerNames cannot exceed 15 characters. This causes a few tests to fail
when the node names exceed that limit. Additionally, the checks should be case
insensitive.
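A hedged sketch of the kind of comparison this implies (hypothetical helper, not necessarily the code used in the tests):

    // matchesComputerName reports whether a node name refers to the given
    // Windows ComputerName, comparing case-insensitively and only up to the
    // 15-character ComputerName limit.
    func matchesComputerName(nodeName, computerName string) bool {
        const maxComputerNameLen = 15
        if len(nodeName) > maxComputerNameLen {
            nodeName = nodeName[:maxComputerNameLen]
        }
        return strings.EqualFold(nodeName, computerName)
    }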
The Ginkgo v2 time suffix is hh:mm:ss without the .xyz sub-second details if the
timestamp happens to land exactly on a second.
This change fixes test flakes like the following:
-STEP: Building a namespace api object, basename test-namespace
+STEP: Building a namespace api object, basename test-namespace 12/13/22 11:43:53
--- FAIL: TestCleanup (36.79s)
ginkgo.DeferCleanup has multiple advantages:
- The cleanup operation can get registered if and only if needed.
- No need to return a cleanup function that the caller must invoke.
- Automatically determines whether a context is needed, which will
simplify the introduction of context parameters.
- Ginkgo's timeline shows when it executes the cleanup operation.
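A minimal sketch of the pattern described above (client, ns and newTestPod are placeholders; Ginkgo passes a context to the cleanup callback only if it asks for one):

    ginkgo.It("creates a pod", func(ctx context.Context) {
        pod, err := client.CoreV1().Pods(ns).Create(ctx, newTestPod(), metav1.CreateOptions{})
        framework.ExpectNoError(err)
        // Registered only because the pod actually got created.
        ginkgo.DeferCleanup(func(ctx context.Context) {
            _ = client.CoreV1().Pods(ns).Delete(ctx, pod.Name, metav1.DeleteOptions{})
        })
        // ... rest of the test ...
    })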
- use `ginkgo.DeferCleanup` instead of cleaning up in the AfterEach block
- encourage use of Ginkgo by not extending expect.go
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Add a test case with a DaemonSet behind a simple load balancer whose
address is being constantly hit via HTTP requests.
The test passes if there are no errors when doing HTTP requests to the
load balancer address, during DaemonSet `RollingUpdate` operations.
Signed-off-by: Ionut Balutoiu <ibalutoiu@cloudbasesolutions.com>
Some node e2e tests check for expected number of pods running
on the node to verify the correct state of that node after running
test scenarios. An example of such a check is in the device plugin
end to end test here: [1].
If the node is not left in a clean state after an e2e test finishes
running, it can lead to flaky tests because the node might have
unexpected pods running on it.
In order to avoid that, we make sure that the test pods are
cleaned up after the test runs.
[1]: https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/device_plugin_test.go#L189-L190
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
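Sketch of the cleanup step, assuming the e2e framework's pod helpers (names and signatures are approximate):

    ginkgo.AfterEach(func(ctx context.Context) {
        // Delete the pods created by the test and wait until they are gone,
        // so the next test starts from a node without leftover pods.
        for _, podName := range testPods {
            err := f.ClientSet.CoreV1().Pods(f.Namespace.Name).Delete(ctx, podName, metav1.DeleteOptions{})
            if err != nil && !apierrors.IsNotFound(err) {
                framework.ExpectNoError(err)
            }
        }
        for _, podName := range testPods {
            framework.ExpectNoError(e2epod.WaitForPodNotFoundInNamespace(ctx, f.ClientSet, podName, f.Namespace.Name, 2*time.Minute))
        }
    })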
wait.Until catches panics and logs them, which leads to confusing
output. Besides, the test is written so that failures must get reported to the
main goroutine.
Looking up the expected nodes in the goroutine raced with the test making
changes to the configuration. When doing (unrelated?) changes, the test started
to fail:
Oct 23 15:47:03.092: INFO: Unexpected error:
<*errors.errorString | 0xc001154c70>: {
s: "no subset of available IP address found for the endpoint test-rolling-update-with-lb within timeout 2m0s",
}
Oct 23 15:47:03.092: FAIL: no subset of available IP address found for the endpoint test-rolling-update-with-lb within timeout 2m0s
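The resulting pattern, roughly: the background loop hands errors to the main test goroutine over a channel, and only the main goroutine fails the test (hitLoadBalancer is a hypothetical request helper):

    stopCh := make(chan struct{})
    errCh := make(chan error, 1)
    done := make(chan struct{})
    go func() {
        defer close(done)
        for {
            select {
            case <-stopCh:
                return
            default:
            }
            if err := hitLoadBalancer(ctx); err != nil {
                // Report to the main goroutine instead of asserting or
                // panicking inside this one.
                select {
                case errCh <- err:
                default:
                }
                return
            }
        }
    }()
    // ... drive the RollingUpdate from the main test goroutine ...
    close(stopCh)
    <-done
    select {
    case err := <-errCh:
        framework.Failf("requests to the load balancer failed: %v", err)
    default:
    }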
Now that everything is connected to a per-test context, the gRPC server might
encounter an error before it gets shut down normally. We must not panic in that
case because it would kill the entire Ginkgo worker process. This is not even
an error, so just log it as an info message.
The wrapper can be used in combination with ginkgo.DeferCleanup to ignore
harmless "not found" errors during delete operations.
Original code suggested by Onsi Fakhouri.
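A minimal sketch of such a wrapper (simplified; the helper actually added to the framework may differ):

    // ignoreNotFound wraps a delete call so that "not found" errors are
    // treated as success, which is what a cleanup step usually wants.
    func ignoreNotFound(del func(ctx context.Context) error) func(ctx context.Context) error {
        return func(ctx context.Context) error {
            if err := del(ctx); err != nil && !apierrors.IsNotFound(err) {
                return err
            }
            return nil
        }
    }

    // Usage together with DeferCleanup:
    ginkgo.DeferCleanup(ignoreNotFound(func(ctx context.Context) error {
        return client.CoreV1().Pods(ns).Delete(ctx, podName, metav1.DeleteOptions{})
    }))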
It is set in all of the test/e2e* suites, but not in the ginkgo output
tests. This check is needed before adding a test case there which would trigger
this nil pointer access.
Adding the "context" import in the previous commit must get compensated by
removing one of the blank lines in the output unit tests, otherwise the stack
backtraces don't match expectations.
Adding "ctx" as parameter in the previous commit led to some linter errors
about code that overwrites "ctx" without using it.
This gets fixed by replacing context.Background or context.TODO in those code
lines with the new ctx parameter.
Two context.WithCancel calls can get removed completely because the context
automatically gets cancelled by Ginkgo when the test returns.
Every ginkgo callback should return immediately when a timeout occurs or the
test run manually gets aborted with CTRL-C. To do that, they must take a ctx
parameter and pass it through to all code which might block.
This is a first automated step towards that: the additional parameter got added
with
sed -i 's/\(framework.ConformanceIt\|ginkgo.It\)\(.*\)func() {$/\1\2func(ctx context.Context) {/' \
$(git grep -l -e framework.ConformanceIt -e ginkgo.It )
$GOPATH/bin/goimports -w $(git status | grep modified: | sed -e 's/.* //')
log_test.go was left unchanged.
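Before/after sketch of the mechanical change (c, ns and podName are illustrative):

    // Before: the callback cannot react to a timeout or CTRL-C.
    ginkgo.It("should get the pod", func() {
        _, err := c.CoreV1().Pods(ns).Get(context.TODO(), podName, metav1.GetOptions{})
        framework.ExpectNoError(err)
    })

    // After: Ginkgo cancels ctx on timeout or interrupt, and blocking calls
    // that receive it return early.
    ginkgo.It("should get the pod", func(ctx context.Context) {
        _, err := c.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
        framework.ExpectNoError(err)
    })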
Endpoints generated by the endpoints controller are in the canonical
form; however, custom endpoints may not be in canonical format
(there was a time when they were canonicalized in the apiserver, but this
caused performance issues because the endpoint controller kept
updating them, since the created endpoints were different from the
stored ones due to the canonicalization).
There are cases where a custom endpoint may cause the controller to
generate multiple slices, for example, when the same address is present
in different subsets.
The endpointslice mirroring controller should canonicalize the
endpoints subsets before it starts processing them, so that the
generated slices are consistent; there is no risk of hotlooping because
the endpoint is only used as input.
Change-Id: I2a8cd53c658a640aea559a88ce33e857fa98cc5c
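A minimal sketch of what canonicalizing the subsets before processing could look like (illustrative only, not the exact helper used by the controller):

    // canonicalizeSubsets sorts addresses and ports within each subset so
    // that logically equivalent custom Endpoints always yield the same
    // EndpointSlices. Mutating the subsets is fine because the Endpoints
    // object is only used as input and never written back.
    func canonicalizeSubsets(subsets []v1.EndpointSubset) []v1.EndpointSubset {
        for i := range subsets {
            sort.Slice(subsets[i].Addresses, func(a, b int) bool {
                return subsets[i].Addresses[a].IP < subsets[i].Addresses[b].IP
            })
            sort.Slice(subsets[i].Ports, func(a, b int) bool {
                return subsets[i].Ports[a].Name < subsets[i].Ports[b].Name
            })
        }
        return subsets
    }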
This ensures that the daemonset controller updates daemonset statuses in
a best-effort manner even if syncDaemonSet fails.
In order to add an integration test, this also replaces
`cmd/kube-apiserver/app/testing.StartTestServer` with
`test/integration/framework.StartTestServer` and adds
`setupWithServerSetup` to configure the admission control of the
apiserver.
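Sketch of the resulting control flow (function names follow the daemonset controller only loosely and are not the literal code):

    // Keep the error from the main sync work, but still try to update the
    // status, then report both errors together.
    syncErr := dsc.manage(ctx, ds, nodeList, hash)

    // Best effort: update the status even if manage() failed, so that
    // .status reflects what the controller actually observed.
    statusErr := dsc.updateDaemonSetStatus(ctx, ds, nodeList, hash, syncErr == nil)

    return utilerrors.NewAggregate([]error{syncErr, statusErr})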
Currently, if a user executes `kubectl scale --dry-run`, the output has no
indicator showing that the change is not actually applied.
This PR adds a dry-run suffix to the output, as well as more integration
tests to verify it.
`kubectl scale` calls the visitor twice. The second call fails when
piped input is passed, returning an
`error: no objects passed to scale` error.
This PR uses the result of the first visitor and fixes that piped
input problem. In addition to that, this PR also adds a new
scale test to verify it.
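Roughly, the fix amounts to visiting the resources once and reusing the collected infos (sketch only; scaleOne is a placeholder, not the literal diff):

    infos, err := r.Infos() // visit the resources exactly once
    if err != nil {
        return err
    }
    for _, info := range infos {
        // Scale each object using the already-resolved info; a second
        // builder/visitor pass would fail for piped input because stdin
        // can only be read once.
        if err := scaleOne(info); err != nil {
            return err
        }
    }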
The `kubectl exec` command supports getting files as inputs. However,
if the file contains multiple resources, it returns an unclear error message:
`cannot attach to *v1.List: selector for *v1.List not implemented`.
Since the `exec` command does not support multiple resources, this PR
handles that case and returns a descriptive error message earlier.
One of the cpumanager tests doesn't remove the pod
that got created during the test.
This causes pollution of other tests and failures
from time to time (depending on the test execution order).
In order to deflake the tests, we should delete the pod
and wait for it to be completely removed.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
This introduces `singularNameProvider`. This provider will be used
by core types to have their singular names defined in the discovery
endpoint. Thanks to that, core resources' singular names always have
higher precedence than CRD short names or singular names.
This adds new integration tests to verify that short names and
singular names expand to the correct resources. In this case,
core types always have higher precedence than CRDs.
This change will leverage the new PreFilterResult
to reduce the list of eligible nodes for a pod
using bound local PVs during the PreFilter stage, so
that only the node(s) whose local PV node affinity
matches will be considered in subsequent scheduling
stages.
Today, the NodeAffinity check is done during Filter,
which means all nodes will be considered even though
there may be a large number of nodes that are not
eligible because they do not match the pod's bound
local PV(s)' node affinity requirement. Here we can
reduce the node list in PreFilter to ensure that
during Filter we are only considering the reduced
list, and thus can provide a clearer message to
users when node(s) are not available for scheduling,
since the list only contains relevant nodes.
If an error is encountered (e.g. a PV cache read error) or
the node list reduction cannot be done (e.g. the pod uses
no local PVs), then we will still proceed to consider
all nodes for the rest of the scheduling stages.
Signed-off-by: Yibo Zhuang <yibzhuang@gmail.com>
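Sketch of the PreFilter shape this describes, assuming the scheduler framework's PreFilterResult whose NodeNames field holds the set of eligible node names (eligibleNodesForBoundLocalPVs is a hypothetical helper):

    func (pl *VolumeBinding) PreFilter(ctx context.Context, state *framework.CycleState, pod *v1.Pod) (*framework.PreFilterResult, *framework.Status) {
        // ... existing PreFilter work ...
        eligible, ok := pl.eligibleNodesForBoundLocalPVs(pod)
        if !ok {
            // PV cache read error or no bound local PVs: do not restrict the
            // node list, let all nodes proceed to Filter.
            return nil, nil
        }
        // Only nodes matching the bound local PVs' node affinity move on to
        // Filter.
        return &framework.PreFilterResult{NodeNames: eligible}, nil
    }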
In the Dynamic Resource allocation example specs, the claim
parameter name specified was inconsistent.
This commit fixes that with a better/more consistent name,
which is used to define the configmap and referenced in
the `ResourceClaimTemplate` spec.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
A recent PR [1] updated the image versions we use for E2E tests. However,
the ``windows-nanoserver`` image is meant to be in a private authenticated
registry: ``gcr.io/authenticated-image-pulling/windows-nanoserver``, which
requires credentials to pull images from it. This image is required by the
``[sig-node] Container Runtime blackbox test when running a container with
a new image should be able to pull from private registry with secret
[NodeConformance]`` test for Windows. The ``v3`` image does not exist;
there is no automatic promotion process for that registry. Previously, it
was built and pushed manually.
Because of this, the https://testgrid.k8s.io/sig-windows-signal#capz-windows-containerd-master jobs have started to fail.
Reverts the image version to ``v1``.
[1] https://github.com/kubernetes/kubernetes/pull/113900
Many clusters block direct requests from internal resources to the nodes'
external IPs as a best practice. All accesses from internal resources to
resources running on nodes go through load balancers, whether the nodes
are on private or public subnets. Let's prefer internal IPs first, so the
tests can work even when there are security group rules present that block
requests to the external IPs.
We should not require ExternalIP for Conformance, but should keep
testing ExternalIPs in sig-network.
Signed-off-by: Rafael Fonseca <r4f4rfs@gmail.com>
These instructions bring up a kind cluster with containerd 34d078e99, the
latest commit from the main branch. This version of containerd has
support for CDI.
The driver can be used manually against a cluster started with
local-up-cluster.sh and is also used for E2E testing. Because the tests proxy
connections from the nodes into the e2e.test binary and create/delete files via
the equivalent of "kubectl exec dd/rm", they can be run against arbitrary
clusters. Each test gets its own driver instance and resource class, therefore
they can run in parallel.
Add volumePath parameter to all disruptive checks, so subpath tests can use
"/test-volume" and disruptive tests can use "/mnt/volume1" for their
respective Pods.
This adds a new resource.k8s.io API group with v1alpha1 as version. It contains
four new types: resource.ResourceClaim, resource.ResourceClass, resource.ResourceClaimTemplate, and
resource.PodScheduling.
This removes WaitTimeoutForPodNoLongerRunningOrNotFoundInNamespace
introduced in f2b9479f8e and changes
the test to use goroutines to speed up the cleanups.
Most CI jobs run an OS that does not support SELinux, therefore tests that
need it should be skipped by default.
* [Feature:SELinux] marks tests that need SELinux (for any feature)
* [Feature:SELinuxMountReadWriteOncePod] marks tests that need
SELinuxMountReadWriteOncePod alpha gate enabled.
Currently, all SELinux tests have both, but it will change in the future.
Also make some design changes exposed in testing and review.
Do not remove the ambiguous old metric
`apiserver_flowcontrol_request_concurrency_limit` because reviewers
thought it was too early. This creates a problem: that metric cannot
keep both of its old meanings. I chose the configured concurrency
limit.
Testing has revealed a design flaw, which concerns the initialization
of the seat demand state tracking. The current design in the KEP is
as follows.
> Adjustment is also done on configuration change … For a newly
> introduced priority level, we set HighSeatDemand, AvgSeatDemand, and
> SmoothSeatDemand to NominalCL-LendableSD/2 and StDevSeatDemand to
> zero.
But this does not work out well at server startup. As part of its
construction, the APF controller does a configuration change with zero
objects read, to initialize its request-handling state. As always,
the two mandatory priority levels are implicitly added whenever they
are not read. So this initial reconfig has one non-exempt priority
level, the mandatory one called catch-all --- and it gets its
SmoothSeatDemand initialized to the whole server concurrency limit.
From there it decays slowly, as per the regular design. So for a
fairly long time, it appears to have a high demand and competes
strongly with the other priority levels. Its Target is higher than
all the others, once they start to show up. It properly gets a low
NominalCL once other levels show up, which actually makes it compete
harder for borrowing: it has an exceptionally high Target and a rather
low NominalCL.
I have considered the following fix. The idea is that the designed
initialization is not appropriate before all the default objects are
read. So the fix is to have a mode bit in the controller. In the
initial state, those seat demand tracking variables are set to zero.
Once the config-producing controller detects that all the default
objects are pre-existing, it flips the mode bit. In the later mode,
the seat demand tracking variables are initialized as originally
designed.
However, that still gives preferential treatment to the default
PriorityLevelConfiguration objects, over any that may be added later.
So I have made a universal and simpler fix: always initialize those
seat demand tracking variables to zero. Even if a lot of load shows
up quickly, remember that adjustments are frequent (every 10 sec) and
the very next one will fully respond to that load.
Also: revise logging logic, to log at numerically lower V level when
there is a change.
Also: bug fix in float64close.
Also: separate imports in some files.
Co-authored-by: Han Kang <hankang@google.com>
This change enables hot reload of the encryption config file when the API server
flag --encryption-provider-config-automatic-reload is set to true. This
allows the user to change the encryption config file without restarting
kube-apiserver. The change is detected by polling the file and by using an
fsnotify watcher. When the file is updated, it is processed to generate a
new set of transformers and the old ones are closed.
Signed-off-by: Nilekh Chaudhari <1626598+nilekhc@users.noreply.github.com>
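A minimal sketch of the watch-and-reload loop, assuming github.com/fsnotify/fsnotify and a hypothetical reloadEncryptionConfig helper (the real implementation also handles debouncing, error retries and graceful shutdown):

    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        return err
    }
    defer watcher.Close()
    if err := watcher.Add(encryptionConfigFilePath); err != nil {
        return err
    }
    for {
        select {
        case event := <-watcher.Events:
            if event.Op&(fsnotify.Write|fsnotify.Create) != 0 {
                // Re-read the config, build a new set of transformers, swap
                // them in, and close the old ones.
                if err := reloadEncryptionConfig(ctx); err != nil {
                    klog.ErrorS(err, "failed to reload encryption config")
                }
            }
        case err := <-watcher.Errors:
            klog.ErrorS(err, "encryption config watcher error")
        case <-ctx.Done():
            return nil
        }
    }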