There are some tests which want to insert a tag before the main Describe text,
for example:
sigDescribe("[Feature:Windows] Cpu Resources [Serial]",
skipUnlessWindows(func() { ... })
In order to support this without change existing test names, it must be
possible to do this instead:
sigDescribe(feature.Windows, "Cpu Resources", framework.WithSerial(),
skipUnlessWindows(func() { ... })
There are similar examples for the other functions.
While at it, replace one left-over panic with ReportBug and add the missing
`NodeFeature:` prefix.
The sysctl tests have to be skipped when the node components are running in UserNS,
because the tests fail due to `open /proc/sys/kernel/shm_rmid_forced: permission denied`
(as expected).
Can be verified with Rootless kind (https://kind.sigs.k8s.io/docs/user/rootless/):
```
dockerd-rootless-setuptool.sh install
: The following steps are added because 'kubetest2 kind --build' does not seem to build e2e.test and ginkgo
make WHAT=test/e2e/e2e.test
make ginkgo
cp -f _output/bin/{e2e.test,ginkgo} _output/dockerized/bin/linux/amd64
kubetest2 kind --build --up --down --test=ginkgo -- \
--use-built-binaries \
--focus-regex='\[NodeConformance\]' \
--skip-regex='\[Environment:NotInUserNS\]'
```
Tested with the following host environment:
- kubernetes-sigs/kind@ac28d7fb19 (main)
- kubernetes-sigs/kubetest2@89f09b65e8 (master)
- Docker 24.0.6
- Ubuntu 22.04 amd64, kernel 5.15
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Using the feature gate has the advantage that the stability tag gets added
automatically and no changes are needed when graduating it. Once it reaches GA,
the tag needs to be removed together with the feature gate.
In fact, the current `[ALPHA]` is already wrong: the feature has already
graduated to beta...
After a CRD or an APIService was deleted, the corresponding group was
never unregistered. It caused a stale entry to remain in the root path
and could potentially lead to memory leak as the groupDiscoveryHandler
was never released and the handledGroups was never cleaned up.
This commit implements the cleanup. It tracks each group's usage and
unregisters a group when no versions remain for it.
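A minimal sketch of the reference-counting idea (type and names are
illustrative, not the actual aggregator code):
```
package apiserver

import "sync"

// groupRegistry counts how many API versions are still served per group.
type groupRegistry struct {
	mu       sync.Mutex
	versions map[string]int
}

func (r *groupRegistry) addVersion(group string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.versions[group]++
}

// removeVersion reports whether the last version of the group is gone,
// i.e. the group's discovery handler can now be unregistered.
func (r *groupRegistry) removeVersion(group string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.versions[group]--
	if r.versions[group] <= 0 {
		delete(r.versions, group)
		return true
	}
	return false
}
```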
Signed-off-by: Quan Tian <qtian@vmware.com>
* Job: Handle error returned from AddEventHandler function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Use an error message similar to CronJob's
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Clean up error messages
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.T on the second place in the args for the newControllerFromClient function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.T on the second place in the args for the newControllerFromClientWithClock function
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Call t.Helper()
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.TB on the second place in the args for the createJobControllerWithSharedInformers function and call tb.Helper() there
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Put the testing.TB on the second place in the args for the startJobControllerAndWaitForCaches function and call tb.Helper() there
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
* Adapt TestFinalizerCleanup to the event handler error
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
---------
Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
TestSampleAPIServer tried to validate APIService deletion, but it used
an unmatched selector to delete and list APIServices, which essentially
validated nothing.
Signed-off-by: Quan Tian <qtian@vmware.com>
Some test cases can make nodes not ready and use DeferCleanup to bring
nodes back online. Checking whether all nodes are ready would fail
in such cases because AfterEach runs before DeferCleanup.
Scheduling the node readiness check via DeferCleanup solves this
issue, as nodes are brought back to a `Ready` state before the
check.
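A rough sketch of the reordering (a fragment; allNodesReady is a
hypothetical helper that fails the test if any node is still not Ready,
and Ginkgo runs DeferCleanup callbacks after AfterEach):
```
ginkgo.BeforeEach(func() {
	// Registered early so it runs after any test-registered
	// DeferCleanup that restores broken nodes.
	ginkgo.DeferCleanup(allNodesReady)
})
```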
This is a workaround for the issue that the kubelet cannot differentiate
the container statuses of the previous podSandbox from the current one.
If the node is rebooted, all containers will be in the exited state and
the kubelet will try to recreate a new podSandbox. In this case, the
kubelet should not mistakenly think that the newly created podSandbox
has been initialized.
If the user specifies the intent to control registration process, we rely on
registration triggers (deletion of control file) to prompt registration.
This behaviour is expected to be consistent across kubelet restarts and therefore
across the watch calls where we watch for changes to the unix socket, so we make
this part of the Stub object instead of a parameter.
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the
registration is triggered by deletion of the control file. This is
applicable both when the registration happens for the first time and
subsequent ones because of kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In issue #115107 we added an environment variable to control the
registration of the sample device plugin with the kubelet. The intent of
this patch is to ensure that the default behaviour of the plugin is to
register with the kubelet (in case no environment variable is specified).
In addition, we want to ensure that the plugin does not register itself
just once: it should re-register itself with the kubelet after a node
reboot or kubelet restart.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
These were found with a modified klog that enables "go vet" to check klog call
parameters:
cmd/kubeadm/app/features/features.go:149:4: printf: k8s.io/klog/v2.Warningf format %t has arg v of wrong type string (govet)
klog.Warningf("Setting deprecated feature gate %s=%t. It will be removed in a future release.", k, v)
test/images/sample-device-plugin/sampledeviceplugin.go:147:5: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("error: %w", err)
test/images/sample-device-plugin/sampledeviceplugin.go:155:3: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
staging/src/k8s.io/code-generator/cmd/prerelease-lifecycle-gen/prerelease-lifecycle-generators/status.go:207:5: printf: k8s.io/klog/v2.Fatalf does not support error-wrapping directive %w (govet)
klog.Fatalf("Package %v: unsupported %s value: %q :%w", i, tagEnabledName, ptag.value, err)
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:286:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider cache, trying node informer")
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:302:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider caches, trying the API server")
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via ClusterCIDR objects; however, there
were reasonable doubts in the SIG about the feature, and after
several months of discussions we decided not to move forward
with the KEP in-tree. Hence, we are removing the existing
code, which is still in alpha.
https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ
Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
* Add warning handler callback function in shortcut expander
Currently, errors in client-go are propagated back to the callers via
function returns. However, there is no elegant way to merely warn users.
For example, when a user wants to get a resource via its short name
and multiple resources match that short name, we need to
warn the user about this ambiguity: which one is picked and which ones
are discarded.
Not only to overcome this particular case, but also to provide a
way for possible warnings in the future, this commit adds a warningHandler
callback function in shortcutExpander (see the sketch after this list).
* Add warningPrinter functionality in ConfigFlags
ConfigFlags has no functionality to warn the user in a standardized
format, nor to pass warning callbacks to other libraries such as
client-go.
This commit adds the ability to set warningPrinters
according to the caller's IOStreams; these warningPrinters are used
to raise possible warnings happening not only in cli-runtime but
also in client-go.
* Pass warning callback function in ConfigFlags to shortcutExpander
This commit passes the warning callback function so that possible
warnings raised in the shortcut expander are printed, warning the user
in a standardized format.
* Add integration test for CRDs having ambiguous short names
This commit adds an integration test to assure that the warning message
related to this ambiguity is printed when resources are retrieved via
their short-name representations and multiple resources share the same
short name.
The test also ensures that the logic deciding which resource is
selected hasn't changed, as a change there could cause discrepancies
in clusters.
* Remove defaultConfigFlag global variable
* Move default config flags initialization into function
* Skip warning for versions of same group/resource
* Run update-vendor
* Warn only once when there are multiple versions registered for ambiguous resource
* Apply gocritic review
* Add multi-resource multi-version ambiguity unit test
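A minimal sketch of the warning callback wiring mentioned above
(illustrative shape, not the exact cli-runtime types):
```
package shortcut

import "k8s.io/apimachinery/pkg/api/meta"

// shortcutExpander with an optional warning callback.
type shortcutExpander struct {
	delegate       meta.RESTMapper
	warningHandler func(msg string) // invoked on ambiguous short names
}

func (e shortcutExpander) warn(msg string) {
	if e.warningHandler != nil {
		e.warningHandler(msg)
	}
}
```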
The kubelet restarts a pod's containers with an exponential back-off delay,
with a maximum cap of 5 minutes. The 1-minute wait may happen to fall
within the back-off period.
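For illustration, assuming the kubelet's default restart back-off (10s
initial delay, doubling, capped at 5 minutes), the delays quickly grow
past one minute:
```
package main

import (
	"fmt"
	"time"
)

func main() {
	// Assumed kubelet defaults: 10s initial delay, factor 2, 5m cap.
	delay, maxDelay := 10*time.Second, 5*time.Minute
	for i := 0; i < 7; i++ {
		fmt.Println(delay) // 10s 20s 40s 1m20s 2m40s 5m0s 5m0s
		delay *= 2
		if delay > maxDelay {
			delay = maxDelay
		}
	}
}
```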
Signed-off-by: Ruquan Zhao <ruquan.zhao@arm.com>
framework.SIGDescribe is better because:
- Ginkgo uses the source code location of the test, not of the wrapper,
when reporting progress.
- Additional annotations can be passed.
To make this a drop-in replacement, framework.SIGDescribe generates a function
that can be used instead of the former SIGDescribe functions.
windows.SIGDescribe contained some additional code to ensure that tests are
skipped when not running with a suitable node OS. This gets moved into a
separate wrapper generator, to allow using framework.SIGDescribe as intended.
To ensure that all callers were modified, windows.sigDescribe isn't
exported anymore (it wasn't necessary in the first place!).
These wrapper functions set labels in addition to injecting the annotation into
the test text. It then becomes possible to select tests in different ways:
ginkgo -v --focus="should respect internalTrafficPolicy.*\[FeatureGate:ServiceInternalTrafficPolicy\]"
ginkgo -v --label-filter="FeatureGate:ServiceInternalTrafficPolicy"
ginkgo -v --label-filter="Beta"
When a test runs, ginkgo shows it as:
[It] should respect internalTrafficPolicy=Local Pod to Pod [FeatureGate:ServiceInternalTrafficPolicy] [Beta] [FeatureGate:ServiceInternalTrafficPolicy, Beta]
The test name and the labels at the end are in different colors. Embedding the
annotations inside the text is redundant and only done because users of the e2e
suite might expect it. Also, our tooling that consumes test results currently
doesn't know about ginkgo labels.
Environments, features and node features as described by
https://github.com/kubernetes/enhancements/tree/master/keps/sig-testing/3041-node-conformance-and-features
are also supported.
The framework and thus (at the moment) test/e2e do not have any pre-defined
environments and features. Adding those and modifying tests will follow in
a separate commit.
If something goes wrong during the test registration phase, the only solution
so far was to panic. This is not user-friendly and only allows reporting one
problem at a time.
If initialization can continue, then a better solution is to record a bug,
continue, and then report all bugs together.
This also works when just listing tests. The new verify-e2e-suites.sh uses that
to check all test suites (identified as "packages that call
framework.AfterReadingAllFlags", with some exceptions) as part of
pull-kubernetes-verify.
Example output for a fake
framework.RecordBug(framework.NewBug("fake bug during SIGDescribe", 0))
in test/e2e/storage/volume_metrics.go:
```
$ hack/verify-e2e-suites.sh
go version go1.21.1 linux/amd64
ERROR: E2E test suite invocation failed for test/e2e.
ERROR: E2E suite initialization was faulty, these errors must be fixed:
ERROR: test/e2e/storage/volume_metrics.go:49: fake bug during SIGDescribe
E2E suite test/e2e_kubeadm passed.
E2E suite test/e2e_node passed.
```
-list-tests is a more concise alternative for `ginkgo --dry-run` with one line
per test. In contrast to `--dry-run`, it really lists all tests. `--dry-run`
without additional parameters uses the default skip expression from the E2E
context, which filters out flaky and feature-gated tests. The output includes
the source code location where each test is defined. It is sorted by test
name (not source code location) because that order is independent of
reorganizing the source code and ordering by location can be achieved with
"sort".
-list-labels has no corresponding feature in Ginkgo.
One possible usage is to figure out what values might make sense for
-focus/skip/label-filter.
Unit tests will follow in a future commit.
Always printing "Enabling in-tree volume drivers" whenever the E2E suite is
initializing doesn't provide any useful information and makes output of the
upcoming -list-tests look weird.
The rate limiter file is a generic implementation of the gRPC Limiter
interface. It can be reused by other gRPC APIs, not only by
podresources.
Change-Id: I905a46b5b605fbb175eb9ad6c15019ffdc7f2563
The spaces are redundant because Ginkgo will add them itself when concatenating
the different test name components. Upcoming change in the framework will
enforce that there are no such redundant spaces.
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
`leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
command line flag overrides the config, but only when set explicitly.
This makes it possible to pre-define complete driver setups where the
name is associated with certain resource availability. This will be
used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
via node labels. This will be used for testing cluster autoscaling.
On local execution of the Topology Manager metrics tests, the pass rate was 100%.
Yet the Topology Manager metrics tests fail consistently in upstream
CI: https://testgrid.k8s.io/sig-node-presubmits#pr-kubelet-serial-gce-e2e-topology-manager.
From the logs, these failures were identified as timeouts, so we
increase the default timeout as well as the polling frequency for
obtaining KubeletMetrics to deflake this test.
We have noticed a similar flake in the CPU Manager metrics tests as well:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-cpu-manager/1701615009836044288.
Once it is confirmed that the issue is resolved for the Topology Manager test,
we will fix this for the CPU Manager as well in a follow-up PR.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Go 1.22 changed the name of init functions from "glob..func" to
"init.func". That difference is acceptable and has to be ignored when comparing
output.
During scheduler_perf testing, roughly 10% of the PodSchedulingContext update
operations failed with a conflict error. Using SSA would avoid that, but
performance measurements showed that this causes a considerable
slowdown (primarily because of the slower encoding with JSON instead of
protobuf, but also because server-side processing is more expensive).
Therefore a normal update is tried first and SSA only gets used when there has
been a conflict. Using SSA in that case instead of giving up outright is better
because it avoids another scheduling attempt.
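A sketch of the fallback pattern (a fragment; client, schedulingCtx and
apply are illustrative names):
```
// Try a plain (protobuf-encoded) update first; fall back to
// server-side apply only on a conflict.
updated, err := client.Update(ctx, schedulingCtx, metav1.UpdateOptions{})
if apierrors.IsConflict(err) {
	// SSA is slower (JSON encoding, more server-side work) but
	// resolves the conflict without another scheduling attempt.
	updated, err = apply(ctx, schedulingCtx)
}
```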
The post-merge job failed (#117103),
and this causes the e2e tests to fail. Considering we have increased
the timeout recently, we are retriggering the job.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
The Service API REST implementation is complex and has to use different
hooks on the REST storage. The status store was making a shallow copy of
the storage before adding the hooks, so it was not inheriting them.
The status store must have the same hooks as the REST store to be able
to correctly handle the allocation and deallocation of ClusterIPs and
nodePorts.
Change-Id: I44be21468d36017f0ec41a8f912b8490f8f13f55
Signed-off-by: Antonio Ojea <aojea@google.com>
- move fabriziopandini to emeritus_approvers for /test/e2e*
and /cmd/kubeadm. fabriziopandini remains in /OWNERS_ALIASES
under sig-cluster-lifecycle-leads.
- remove RA489 as reviewer for /test/e2e* and /cmd/kubeadm
The status error was embedded inside the new error constructed by
WaitForPodsResponding's get function, but not wrapped. Therefore
`apierrors.IsServiceUnavailable(err)` didn't find it and returned false -> no
retries.
Wrapping fixes this and Gomega formatting of the error remains useful:
err := &errors.StatusError{}
err.ErrStatus.Code = 503
err.ErrStatus.Message = "temporary failure"
err2 := fmt.Errorf("Controller %s: failed to Get from replica pod %s:\n%w\nPod status:\n%s",
"foo", "bar",
err, "some status")
fmt.Println(format.Object(err2, 1))
fmt.Println(errors.IsServiceUnavailable(err2))
=>
<*fmt.wrapError | 0xc000139340>:
Controller foo: failed to Get from replica pod bar:
temporary failure
Pod status:
some status
{
msg: "Controller foo: failed to Get from replica pod bar:\ntemporary failure\nPod status:\nsome status",
err: <*errors.StatusError | 0xc0001a01e0>{
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {
SelfLink: "",
ResourceVersion: "",
Continue: "",
RemainingItemCount: nil,
},
Status: "",
Message: "temporary failure",
Reason: "",
Details: nil,
Code: 503,
},
},
}
true
This uses the generic ptr.To in k8s.io/utils to replace functions and
code constructs which only serve to return pointers to intstr
values. Other uses of the deprecated pointer package are updated in
modified files.
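For example (a fragment, assuming "k8s.io/apimachinery/pkg/util/intstr"
and "k8s.io/utils/ptr" are imported; intstrPtr stands for the kind of
helper being removed):
```
// Before: a dedicated helper existed just to take an address.
func intstrPtr(i int32) *intstr.IntOrString {
	v := intstr.FromInt32(i)
	return &v
}

// After: the generic helper covers all such cases.
var maxUnavailable = ptr.To(intstr.FromInt32(1))
```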
Signed-off-by: Stephen Kitt <skitt@redhat.com>
Add and use more facilities to the *internal* podresources client.
Checking e2e test runs, we have quite some
```
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"
```
This is likely caused by kubelet restarts, which we do plenty of in e2e
tests, combined with the fact that gRPC connects lazily AND we don't
really check the errors in client code - we just bubble them up.
While it's arguably bad that we don't properly check error codes, it's
also true that in the main case, e2e tests, the functions should just
never fail besides a few well-known cases; we're connecting over a
super-reliable unix domain socket, after all.
So, we centralize the fix by adding a function (alongside minor
cleanups) which triggers the connection and ensures it happens,
localizing the changes just here. The main advantage is that this
approach is opt-in, composable, and doesn't leak gRPC details into the
client code.
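A sketch of such an eager-connection helper, assuming classic
grpc.DialContext with a blocking dial (name and timeout are
illustrative):
```
package podresources

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// connectToKubelet eagerly establishes the connection (or fails fast)
// instead of letting lazy dialing surface the error on the first call.
func connectToKubelet(ctx context.Context, socket string) (*grpc.ClientConn, error) {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	return grpc.DialContext(ctx, "unix://"+socket,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithBlock(), // block until connected or ctx expires
	)
}
```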
Signed-off-by: Francesco Romani <fromani@redhat.com>
This deflakes the "Containers Lifecycle should not launch second
container before PostStart of the first container completed" test by
assigning enough time to finish the postStart hook.
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
* Corrected the gotemplate range call
* Modified the wrapper class
* Delete test/instrumentation/documentation/documentation.md
* Removed the documentation.md change as we're changing it in the other PR
* Restored the original doc.md; the PR is solely for the generator code now
* Some label fixes
* Merge commits
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
This helps when using -feature-gate=ContextualLogging=true and running the
SchedulingWithMultipleResourceClaims test case because then output from the two
driver instances is easy to distinguish.
With the new busybox, ash has a built-in sleep command. Prior to this
change we were creating half the pids expected since `sleep` wasn't
actually launching a new binary. Use the full path to /bin/sleep which
avoids the built-in and actually launches a new process.
There were four issues in OpenAPI aggregation cleanup:
1. When removing an APIService, openAPIAggregationController was called
twice while openAPIV3AggregationController was never called, leading
to OpenAPI v3 for the APIService not being cleaned up.
2. When removing a local APIService, v2 specAggregator should not return
ErrAPIServiceNotFound when it doesn't find the APIService because
local APIServices were never added to its cache, otherwise confusing
error logs would be generated. Besides, the method's comment
indicates that the desired behavior is that no error is returned if
the APIService does not exist.
3. When removing an APIService, v3 specProxier should update
openapiv2converter's cache, like when updating an APIService,
otherwise the API would not be removed from "/openapi/v3".
4. When v3 AggregationController reconciles an APIService, it should
stop requeueing it if it fails with ErrAPIServiceNotFound as the
APIService has been removed, like what v2 AggregationController does,
otherwise it would keep reconciling the APIService forever.
Signed-off-by: Quan Tian <qtian@vmware.com>
When defining a ClusterIP Service, we can specify externalIP, and the
traffic policy of externalIP is subject to externalTrafficPolicy.
However, the policy can't be set when type is not NodePort or
LoadBalancer, and will default to Cluster when kube-proxy processes the
Service.
This commit updates the defaulting and validation of Service to allow
specifying ExternalTrafficPolicy for ClusterIP Services with
ExternalIPs.
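A spec like the following sketch becomes valid (a fragment, assuming
k8s.io/api/core/v1 is imported as corev1; field values are examples):
```
// This combination used to fail validation for ClusterIP Services.
svc := corev1.Service{
	Spec: corev1.ServiceSpec{
		Type:                  corev1.ServiceTypeClusterIP,
		ExternalIPs:           []string{"192.0.2.10"},
		ExternalTrafficPolicy: corev1.ServiceExternalTrafficPolicyLocal,
	},
}
```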
Signed-off-by: Quan Tian <qtian@vmware.com>
Services can expose network applications that are running on
one or more Pods. Users need to specify the Port and Protocol of the
network application, and network implementations must forward only
the traffic indicated in the Service, as forwarding traffic to a
backend the user didn't specify may present a security problem.
Change-Id: I77fbb23c6415ed09dd81c4f2deb6df7a17de46f0
There are some implementations of Services that use socket load
balancing instead of NAT. These implementations don't need to deal with
the conntrack cleanup; however, they need to clean up the sockets that
are no longer needed, so the application does not get stuck forever.
This can happen with both TCP and UDP, but since UDP is stateless, the
situation is more complicated because UDP lacks mechanisms like TCP's
to detect that a socket is no longer needed.
Change-Id: Ic2cfbdf6c8b1f1335e8b5964825dd1fa716fef53
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To avoid such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.
The `Claims` field gets removed because the risk of breaking someone is
low: theoretically, YAML files which have a claims field for volumes now
get rejected when validated against the OpenAPI schema. Such files
have never made sense and should be fixed.
Code that uses the struct definitions needs to be updated.
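The new struct mirrors ResourceRequirements minus Claims; a sketch of
its shape (json tags omitted, fragment in the core/v1 context):
```
// VolumeResourceRequirements describes the storage resource
// requirements of a volume; unlike the container-side
// ResourceRequirements it has no Claims field.
type VolumeResourceRequirements struct {
	Limits   ResourceList
	Requests ResourceList
}
```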
The fact that the .status.loadBalancer field can be set while .spec.type
is not "LoadBalancer" is a flub. Any spec update will already clear
.status.ingress, so it's hard to really rely on this. After this
change, updates which try to set this combination will fail validation.
Existing cases of this will not be broken. Any spec/metadata update
will clear it (no error) and this is the only stanza of status.
New gate "AllowServiceLBStatusOnNonLB" is off by default, but can be
enabled if this change actually breaks someone, which seems exceeedingly
unlikely.
This reverts commit bd6f548746.
Running as serial didn't completely eliminate the flake, so I think
there's something more going on here. Reverting the change to serial
since it's not a solution.
In testing I could only reproduce the flake by running stress-ng to load
the CPU. Running it as serial should reduce and hopefully eliminate the
flakiness.
* using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/integration/scheduler
* using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/e2e/scheduling
* using wait.ConditionWithContextFunc for PodScheduled/PodIsGettingEvicted/PodScheduledIn/PodUnschedulable/PodSchedulingError
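For reference, a sketch of the replacement pattern (a fragment; client,
ns and podName are illustrative names):
```
// Old: wait.Poll(interval, timeout, func() (bool, error) { ... })
// New: context-aware polling with an explicit "immediate" flag.
err := wait.PollUntilContextTimeout(ctx, 2*time.Second, 5*time.Minute, true,
	func(ctx context.Context) (bool, error) {
		pod, err := client.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return pod.Status.Phase == v1.PodRunning, nil
	})
```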
The new test case covers pods with multiple claims from multiple drivers. This
leads to different behavior (scheduler waits for information from all drivers
instead of optimistically selecting one node right away) and to more concurrent
updates of the PodSchedulingContext objects.
The test case is currently not enabled for unit testing or integration
testing. It can be used manually with:
-bench=BenchmarkPerfScheduling/SchedulingWithMultipleResourceClaims/2000pods_100nodes
... -perf-scheduling-label-filter=
Since EndpointSlices can carry dual-stack families, but Endpoints can
only have one single family, the function must take this into account
and only compare the addresses of the same family; otherwise it will
always fail for Services with dual-stack endpoints, because the endpoint
slices will always have twice as many addresses as the Endpoints.
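A sketch of a family-aware comparison (helper name is illustrative):
```
package e2e

import (
	"k8s.io/apimachinery/pkg/util/sets"
	utilnet "k8s.io/utils/net"
)

// sameFamilyAddresses compares only addresses of the requested family;
// EndpointSlices for a dual-stack Service carry both families while
// Endpoints carry one.
func sameFamilyAddresses(epAddrs, sliceAddrs []string, wantIPv6 bool) bool {
	filtered := make([]string, 0, len(sliceAddrs))
	for _, a := range sliceAddrs {
		if utilnet.IsIPv6String(a) == wantIPv6 {
			filtered = append(filtered, a)
		}
	}
	return sets.New(epAddrs...).Equal(sets.New(filtered...))
}
```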
Change-Id: Id08cb22f8a2adc103a4f5a4fe3eec25f448cd21b
The export
`/exports *(rw,fsid=0,insecure,no_root_squash)`
can be mounted as `/exports` using NFSv3 and as `/` using NFSv4.
Mount it as '/', since clients that support both can try both.
The tests were introduced in
ca7be7dc6d
and have checked for `ecc` in `/proc/self/status` since their creation.
Linux commit
c818c03b66
added a new field `Seccomp_filters:`, which means that `ecc` would now
match both fields and interfere with possible test results depending on
the host.
The field `Seccomp:` got introduced in
2f4b3bf6b2
and has never changed since then, which means we can use it directly to
make the tests more strict.
Refers to https://github.com/kubernetes-sigs/cri-tools/pull/1236
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Analyzing the CPU profile of
go test -timeout=0 -count=5 -cpuprofile profile.out -bench=BenchmarkPerfScheduling/.*Claim.* -benchtime=1ns -run=xxx ./test/integration/scheduler_perf
showed that a significant amount of time was spent iterating over allocated
claims to determine how many were allocated per node. That "naive" approach was
taken to avoid maintaining a redundant data structure, but now that performance
measurements show that this comes at a cost, it's not "premature optimization"
anymore to introduce such a second field.
The average scheduling throughput in
SchedulingWithResourceClaimTemplate/2000pods_100nodes increases from 16.4
pods/s to 19.2 pods/s.
EndpointSlices are the evolution of the Endpoints object, and most
components use them for implementing Services. This means that even
when the Endpoints object is up to date, the EndpointSlices may lag
behind, so tests must ensure that both objects are in sync to avoid
race conditions.
Change-Id: I5d9bc7774c68f321537379d1f20b2a1fe0b39e6e
In v1.27 StatefulSetStartOrdinal became beta, which makes it on by
default, but we forgot to turn these tests on along with it. This makes
these tests always run.
* Add e2e tests for admission webhooks MatchCondition fields
Signed-off-by: Amine Hilaly <hilalyamine@gmail.com>
* improve naming to distinguish tests
* adding e2e for mutating webhooks and match conditions
* Use `ginkgo.It` instead of `framework.ConformanceIt` and clean up
resources after creation
* Enable AdmissionWebhookMatchConditions feature
* Tag only matchcondition tests
* Improve expected error message for denied requests.
* Rename `onlyAllowLeaseObjectMatchConditions` to
`excludeLeasesMatchConditions`
* remove [Alpha] tag from AdmissionWebhookMatchConditions tests
* Using `gomega.Expect` instead of `framework.Fail`
* Remove [Feature:AdmissionWebhookMatchConditions] tag
Signed-off-by: Amine <hilalyamine@gmail.com>
* Improve e2e names to specify whether it's using Validating or Mutating admission webhooks
---------
Signed-off-by: Amine Hilaly <hilalyamine@gmail.com>
Signed-off-by: Amine <hilalyamine@gmail.com>
Currently, the test queries the local node, which is not correct for
most Kubernetes environments.
Instead, ssh to the target node and call journalctl there.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Make it possible to parse jsonpath filter expressions: Split
jsonpath expressions on single '=' only and leave '==' as part of the
string.
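A sketch of the splitting rule: split on the first '=' that is not part
of '==':
```
package main

import "fmt"

// splitOnSingleEquals splits on the first '=' that is not part of a
// '==' comparison, so jsonpath filters like ?(@.x=="y") stay intact.
func splitOnSingleEquals(s string) []string {
	for i := 0; i < len(s); i++ {
		if s[i] == '=' {
			if i+1 < len(s) && s[i+1] == '=' {
				i++ // skip the '==' operator
				continue
			}
			return []string{s[:i], s[i+1:]}
		}
	}
	return []string{s}
}

func main() {
	fmt.Println(splitOnSingleEquals(`name={.items[?(@.type=="Ready")].status}`))
	// -> [name {.items[?(@.type=="Ready")].status}]
}
```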
Reported-at: https://github.com/kubernetes/kubernetes/issues/119206
Signed-off-by: Andreas Karis <ak.karis@gmail.com>
The PodResources API reports, in the List call, resources of pods in a
terminal phase. The internal managers reassign resources that were
assigned to pods in a terminal phase, so podresources should ignore
such pods. Whether this behavior is intended or not (the docs are not
unequivocal), this e2e test demonstrates and verifies the behavior
described above.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Instead of enumerating all the etcd endpoints known by the apiserver, we will
group them by purpose. `etcd-0` will be the default etcd, `etcd-1` will
be the first resource override, `etcd-2` will be the second override and
so on.
The `diff` binary (required by the `kubectl diff` e2e test) gets
statically or dynamically linked based on the used glibc version. We
cannot really predict that behavior for the various platforms of
debian-base and therefore cannot copy the binary around. This means that
distroless is not a great choice for the conformance image unless we
stop relying on `diff`.
This means we now switch back to `debian-base` for the conformance image
to simplify the build process and reduce the amount of moving parts.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
* feature(scheduling_queue): track events per Pod
* fix typos
* record events in one slice and make each in-flight Pod refer to it
* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods
* eliminate MakeNextPodFuncs
* call Done inside the scheduling queue
* fix comment
* implement done() not to require lock in it
* fix UTs
* improve the receivedEvents implementation based on suggestions
* call DonePod when we don't call AddUnschedulableIfNotPresent
* fix UT
* use queuehint to filter out events for in-flight Pods
* fix based on suggestion from aldo
* fix based on suggestion from Wei
* rename lastEventBefore → previousEvent
* fix based on suggestion
* address comments from aldo
* fix based on the suggestion from Abdullah
* gate in-flight Pods logic by the SchedulingQueueHints feature gate
Passing "/bin/sh" arguments to agnhost container has caused failure by
itself.
This fixes the container image, allowing it to properly test the restart
triggered by probe failure.
* Support namespace access from cel expression in validatingadmissionpolicy.
* Whitelist the exposed fields in namespace object and add test
* better handling of cluster-scoped resources.
* [API REVIEW] namespaceObject in Expression doc.
* compatibility with composition.
* generated: ./hack/update-codegen.sh && ./hack/update-openapi-spec.sh
* work around the namespace of a namespace being unexpectedly set.
* basic test coverage for namespaceObject.
---------
Co-authored-by: Jiahui Feng <jhf@google.com>