Some test cases can make nodes not ready and use DeferCleanup to bring
nodes back online. Checking whether all nodes are online would fail
in such cases because AfterEach runs before DeferCleanup.
Moving the node readiness check into DeferCleanup solves this
issue, as nodes are brought back to a `Ready` state before the
check runs.
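A minimal sketch of the idea (to run inside a BeforeEach; the readiness helper
and timeout are illustrative, not the exact calls used here):
```
// DeferCleanup callbacks run last-in-first-out, so a check registered early
// (e.g. in BeforeEach) runs after the test's own cleanup that brings the
// nodes back online.
ginkgo.DeferCleanup(func(ctx context.Context) {
	framework.ExpectNoError(e2enode.AllNodesReady(ctx, f.ClientSet, 5*time.Minute))
})
```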
This is a workaround for the issue that the kubelet cannot differentiate
the container statuses of the previous podSandbox from the current one.
If the node is rebooted, all containers will be in the exited state and
the kubelet will try to recreate a new podSandbox. In this case, the
kubelet should not mistakenly think that the newly created podSandbox
has been initialized.
If the user specifies the intent to control the registration process, we rely on
registration triggers (deletion of the control file) to prompt registration.
This behaviour is expected to be consistent across kubelet restarts, and therefore
across the watch calls where we watch for changes to the unix socket, so we make
this part of the Stub object instead of a parameter.
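An illustrative sketch of the trigger handling (not the actual stub code; the
fsnotify usage and function name are assumptions):
```
import (
	"path/filepath"

	"github.com/fsnotify/fsnotify"
)

// waitForRegistrationTrigger blocks until the control file is deleted, then
// returns so the caller can (re-)register with the kubelet.
func waitForRegistrationTrigger(controlFile string) error {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		return err
	}
	defer watcher.Close()
	// Watch the parent directory so the watch survives file recreation.
	if err := watcher.Add(filepath.Dir(controlFile)); err != nil {
		return err
	}
	for {
		select {
		case event := <-watcher.Events:
			if event.Name == controlFile && event.Op&fsnotify.Remove != 0 {
				return nil // deletion observed: trigger registration
			}
		case err := <-watcher.Errors:
			return err
		}
	}
}
```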
Co-authored-by: Francesco Romani <fromani@redhat.com>
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In case `REGISTER_CONTROL_FILE` is specified, we want to ensure that the
registration is triggered by deletion of the control file. This is
applicable both when the registration happens for the first time and
for subsequent registrations caused by kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
In issue 115107 we added an environment variable to control the registration of the sample
device plugin with the kubelet. The intent of this patch is to ensure that the default
behaviour of the plugin is to register with the kubelet (in case no environment
variable is specified).
In addition to that, we want to ensure that the plugin does not register itself just once:
it should re-register itself with the kubelet in case of a node reboot or kubelet restarts.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
These were found with a modified klog that enables "go vet" to check klog call
parameters:
cmd/kubeadm/app/features/features.go:149:4: printf: k8s.io/klog/v2.Warningf format %t has arg v of wrong type string (govet)
klog.Warningf("Setting deprecated feature gate %s=%t. It will be removed in a future release.", k, v)
test/images/sample-device-plugin/sampledeviceplugin.go:147:5: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("error: %w", err)
test/images/sample-device-plugin/sampledeviceplugin.go:155:3: printf: k8s.io/klog/v2.Errorf does not support error-wrapping directive %w (govet)
klog.Errorf("Failed to add watch to %q: %w", triggerPath, err)
staging/src/k8s.io/code-generator/cmd/prerelease-lifecycle-gen/prerelease-lifecycle-generators/status.go:207:5: printf: k8s.io/klog/v2.Fatalf does not support error-wrapping directive %w (govet)
klog.Fatalf("Package %v: unsupported %s value: %q :%w", i, tagEnabledName, ptag.value, err)
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:286:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider cache, trying node informer")
staging/src/k8s.io/legacy-cloud-providers/vsphere/nodemanager.go:302:3: printf: (k8s.io/klog/v2.Verbose).Infof format %s reads arg #1, but call has 0 args (govet)
klog.V(4).Infof("Node %s missing in vSphere cloud provider caches, trying the API server")
KEP-2593 proposed to expand the existing node-ipam controller
to be configurable via ClusterCIDR objects. However, there
were reasonable doubts within the SIG about the feature, and after
several months of discussions we decided not to move forward
with the KEP in-tree; hence, we are going to remove the existing
code, which is still in alpha.
https://groups.google.com/g/kubernetes-sig-network/c/nts1xEZ--gQ/m/2aTOUNFFAAAJ
Change-Id: Ieaf2007b0b23c296cde333247bfb672441fe6dfc
* Add warning handler callback function in shortcut expander
Currently, errors in client-go are propagated back to the callers via
function returns. However, there is no elegant way to merely warn users.
For example, when a user wants to get a resource by its short name
and there are multiple resources belonging to this short name, we need to
warn the user about this ambiguity: which one is picked and which ones are discarded.
Not only to handle this particular case, but also to provide a
way for possible warnings in the future, this commit adds a warningHandler
callback function to shortcutExpander.
* Add warningPrinter functionality in ConfigFlags
ConfigFlags has no functionality either for warning users in a standardized
format or for passing warning callback functions to other libraries
such as client-go.
This commit adds the ability for users to set warningPrinters
according to their IOStreams, and these warningPrinters will be used
to raise possible warnings happening not only in cli-runtime but
also in client-go.
* Pass warning callback function in ConfigFlags to shortcutExpander
This commit passes the warning callback function so that possible
warnings raised in the shortcut expander are printed, warning the user in a
standardized format.
* Add integration test for CRDs having ambiguous short names
This commit adds an integration test to assure that a warning message
related to this ambiguity is printed when resources are retrieved via their short name
representations in cases where multiple resources have the same
short name.
This integration test also ensures that the logic deciding which resource
will be selected hasn't changed, which could cause discrepancies in
clusters.
* Remove defaultConfigFlag global variable
* Move default config flags initialization into function
* Skip warning for versions of same group/resource
* Run update-vendor
* Warn only once when there are multiple versions registered for ambiguous resource
* Apply gocritic review
* Add multi-resource multi-version ambiguity unit test
The kubelet restarts working pods with an exponential back-off delay,
with a maximum cap of 5 minutes. The 1 minute wait may happen to fall
within the back-off period.
Signed-off-by: Ruquan Zhao <ruquan.zhao@arm.com>
framework.SIGDescribe is better because:
- Ginkgo uses the source code location of the test, not of the wrapper,
when reporting progress.
- Additional annotations can be passed.
To make this a drop-in replacement, framework.SIGDescribe generates a function
that can be used instead of the former SIGDescribe functions.
windows.SIGDescribe contained some additional code to ensure that tests are
skipped when not running with a suitable node OS. This gets moved into a
separate wrapper generator, to allow using framework.SIGDescribe as intended.
To ensure that all callers were modified, the windows.sigDescribe isn't
exported anymore (wasn't necessary in the first place!).
These wrapper functions set labels in addition to injecting the annotation into
the test text. It then becomes possible to select tests in different ways:
ginkgo -v --focus="should respect internalTrafficPolicy.*\[FeatureGate:ServiceInternalTrafficPolicy\]"
ginkgo -v --label-filter="FeatureGate:ServiceInternalTrafficPolicy"
ginkgo -v --label-filter="Beta"
When a test runs, ginkgo shows it as:
[It] should respect internalTrafficPolicy=Local Pod to Pod [FeatureGate:ServiceInternalTrafficPolicy] [Beta] [FeatureGate:ServiceInternalTrafficPolicy, Beta]
The test name and the labels at the end are in different colors. Embedding the
annotations inside the text is redundant and only done because users of the e2e
suite might expect it. Also, our tooling that consumes test results currently
doesn't know about ginkgo labels.
Environments, features and node features as described by
https://github.com/kubernetes/enhancements/tree/master/keps/sig-testing/3041-node-conformance-and-features
are also supported.
The framework and thus (at the moment) test/e2e do not have any pre-defined
environments and features. Adding those and modifying tests will follow in
a separate commit.
If something goes wrong during the test registration phase, the only solution
so far was to panic. This is not user-friendly and only allows reporting one
problem at a time.
If initialization can continue, then a better solution is to record a bug,
continue, and then report all bugs together.
This also works when just listing tests. The new verify-e2e-suites.sh uses that
to check all test suites (identified as "packages that call
framework.AfterReadingAllFlags", with some exceptions) as part of
pull-kubernetes-verify.
Example output for a fake
framework.RecordBug(framework.NewBug("fake bug during SIGDescribe", 0))
in test/e2e/storage/volume_metrics.go:
```
$ hack/verify-e2e-suites.sh
go version go1.21.1 linux/amd64
ERROR: E2E test suite invocation failed for test/e2e.
ERROR: E2E suite initialization was faulty, these errors must be fixed:
ERROR: test/e2e/storage/volume_metrics.go:49: fake bug during SIGDescribe
E2E suite test/e2e_kubeadm passed.
E2E suite test/e2e_node passed.
```
-list-tests is a more concise alternative for `ginkgo --dry-run` with one line
per test. In contrast to `--dry-run`, it really lists all tests. `--dry-run`
without additional parameters uses the default skip expression from the E2E
context, which filters out flaky and feature-gated tests. The output includes
the source code location where each test is defined. It is sorted by test
name (not source code location) because that order is independent of
reorganizing the source code and ordering by location can be achieved with
"sort".
-list-labels has no corresponding feature in Ginkgo.
One possible usage is to figure out what values might make sense for
-focus/skip/label-filter.
Unit tests will follow in a future commit.
Always printing "Enabling in-tree volume drivers" whenever the E2E suite is
initializing doesn't provide any useful information and makes output of the
upcoming -list-tests look weird.
The rate limiter file is a generic implementation of the gRPC
Limiter interface. It can be reused by other gRPC
APIs, not only by podresources.
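A rough sketch of the pattern (not the exact kubelet code; names are
illustrative): a small Limiter interface plus a unary server interceptor that
rejects calls once the budget is exhausted, usable by any gRPC service.
```
import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// Limiter reports whether one more request may be served now.
type Limiter interface {
	Allow() bool
}

// LimiterUnaryServerInterceptor enforces the limiter for every unary call.
func LimiterUnaryServerInterceptor(limiter Limiter) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
		if !limiter.Allow() {
			return nil, status.Error(codes.ResourceExhausted, "rate limit exceeded")
		}
		return handler(ctx, req)
	}
}
```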
Change-Id: I905a46b5b605fbb175eb9ad6c15019ffdc7f2563
The spaces are redundant because Ginkgo will add them itself when concatenating
the different test name components. Upcoming change in the framework will
enforce that there are no such redundant spaces.
Several enhancements:
- `--resource-config` is now listed under `controller` options instead of
`leader election`: merely a cosmetic change
- The driver name can be configured as part of the resource config. The
command line flag overrides the config, but only when set explicitly.
This makes it possible to pre-define complete driver setups where the
name is associated with certain resource availability. This will be
used for testing cluster autoscaling.
- The set of nodes where resources are available can optionally be specified
via node labels. This will be used for testing cluster autoscaling.
On local execution of the Topology Manager metrics tests, the pass rate was 100%.
Yet, we can see that the Topology Manager metrics tests are failing in upstream
CI consistently: https://testgrid.k8s.io/sig-node-presubmits#pr-kubelet-serial-gce-e2e-topology-manager.
From the logs, it was identified that these failures are because of timeouts,
so we are increasing the default timeout as well as polling interval frequency
of obtaining KubeletMetrics to deflake this test.
We have noticed a similar flake in case of CPU manager metrics tests as well:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-node-kubelet-serial-cpu-manager/1701615009836044288.
Once it is confirmed that the issue is resolved for the Topology Manager test,
we will fix this for the CPU Manager as well in a follow-up PR.
Signed-off-by: Swati Sehgal <swsehgal@redhat.com>
Go 1.22 changed the name of init functions from "glob..func" to
"init.func". That difference is acceptable and has to be ignored when comparing
output.
During scheduler_perf testing, roughly 10% of the PodSchedulingContext update
operations failed with a conflict error. Using SSA would avoid that, but
performance measurements showed that this causes a considerable
slowdown (primarily because of the slower encoding with JSON instead of
protobuf, but also because server-side processing is more expensive).
Therefore a normal update is tried first and SSA only gets used when there has
been a conflict. Using SSA in that case instead of giving up outright is better
because it avoids another scheduling attempt.
The post-merge job for #117103 failed,
and this causes the e2e tests to fail. Considering we have recently increased
the timeout, this retriggers the job.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
The Service API REST implementation is complex and has to use different
hooks on the REST storage. The status store was making a shallow copy of
the storage before the hooks were added, so it was not inheriting them.
The status store must have the same hooks as the REST store to be able
to correctly handle the allocation and deallocation of ClusterIPs and
nodePorts.
Change-Id: I44be21468d36017f0ec41a8f912b8490f8f13f55
Signed-off-by: Antonio Ojea <aojea@google.com>
- move fabriziopandini to emeritus_approvers for /test/e2e*
and /cmd/kubeadm. fabriziopandini remains in /OWNERS_ALIASES
under sig-cluster-lifecycle-leads.
- remove RA489 as reviewer for /test/e2e* and /cmd/kubeadm
The status error was embedded inside the new error constructed by
WaitForPodsResponding's get function, but not wrapped. Therefore
`apierrors.IsServiceUnavailable(err)` didn't find it and returned false -> no
retries.
Wrapping fixes this and Gomega formatting of the error remains useful:
err := &errors.StatusError{}
err.ErrStatus.Code = 503
err.ErrStatus.Message = "temporary failure"
err2 := fmt.Errorf("Controller %s: failed to Get from replica pod %s:\n%w\nPod status:\n%s",
"foo", "bar",
err, "some status")
fmt.Println(format.Object(err2, 1))
fmt.Println(errors.IsServiceUnavailable(err2))
=>
<*fmt.wrapError | 0xc000139340>:
Controller foo: failed to Get from replica pod bar:
temporary failure
Pod status:
some status
{
msg: "Controller foo: failed to Get from replica pod bar:\ntemporary failure\nPod status:\nsome status",
err: <*errors.StatusError | 0xc0001a01e0>{
ErrStatus: {
TypeMeta: {Kind: "", APIVersion: ""},
ListMeta: {
SelfLink: "",
ResourceVersion: "",
Continue: "",
RemainingItemCount: nil,
},
Status: "",
Message: "temporary failure",
Reason: "",
Details: nil,
Code: 503,
},
},
}
true
This uses the generic ptr.To in k8s.io/utils to replace functions and
code constructs which only serve to return pointers to intstr
values. Other uses of the deprecated pointer package are updated in
modified files.
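For example (sketch; the port value is arbitrary):
```
import (
	"k8s.io/apimachinery/pkg/util/intstr"
	"k8s.io/utils/ptr"
)

// Instead of a dedicated helper whose only purpose is to take an address,
//   func intstrPtr(i intstr.IntOrString) *intstr.IntOrString { return &i }
// the generic helper can be used directly:
var targetPort *intstr.IntOrString = ptr.To(intstr.FromInt(80))
```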
Signed-off-by: Stephen Kitt <skitt@redhat.com>
Add and use more facilities in the *internal* podresources client.
Checking e2e test runs, we see quite a few errors like
```
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused": rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /var/lib/kubelet/pod-resources/kubelet.sock: connect: connection refused"
```
This is likely caused by kubelet restarts, which we do plenty of in e2e tests,
combined with the fact that gRPC connects lazily AND we don't really
check the errors in client code - we just bubble them up.
While it's arguably bad that we don't properly check error codes, it's also
true that in the main case, e2e tests, the functions should just never
fail besides a few well-known cases: we're connecting over a
super-reliable unix domain socket after all.
So, we centralize the fix by adding a function (alongside minor
cleanups) which triggers and ensures that the connection happens,
localizing the changes here. The main advantage is that this approach
is opt-in, composable, and doesn't leak gRPC details into the client
code.
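A sketch of such a helper (name and exact placement are assumptions, but the
gRPC calls are standard):
```
import (
	"context"
	"fmt"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

// waitForReady forces the lazily created connection to actually be
// established, so that "connection refused" surfaces here instead of inside
// unrelated client calls.
func waitForReady(ctx context.Context, conn *grpc.ClientConn) error {
	conn.Connect()
	for {
		state := conn.GetState()
		if state == connectivity.Ready {
			return nil
		}
		if !conn.WaitForStateChange(ctx, state) {
			return fmt.Errorf("connection to %q did not become ready: %w", conn.Target(), ctx.Err())
		}
	}
}
```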
Signed-off-by: Francesco Romani <fromani@redhat.com>
This deflakes the "Containers Lifecycle should not launch second
container before PostStart of the first container completed" test by
assigning enough time to finish the postStart hook.
When filtering fails because a ResourceClass is missing, we can treat the pod
as "unschedulable" as long as we then also register a cluster event that wakes
up the pod. This is more efficient than periodically retrying.
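A minimal sketch of the filter-side handling, assuming the scheduler framework
plugin API (the `className` variable is illustrative):
```
if apierrors.IsNotFound(err) {
	// Mark the pod unschedulable instead of failing; a ResourceClass Add
	// event registered by the plugin can move it back to the active queue.
	return framework.NewStatus(framework.UnschedulableAndUnresolvable,
		fmt.Sprintf("resource class %s does not exist", className))
}
```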
Corrected the gotemplate range call
Modified the wrapper class
Delete test/instrumentation/documentation/documentation.md
Removed the documentation.md change as we're changing it in the other PR
Restored the original doc.md; the PR is solely for the generator code now
Some label fixes
merge commits
If pods get stuck, then giving the name of one makes it possible
to search for it in the log output. Without the name it's hard
to figure out which pods got stuck.
This helps when using -feature-gate=ContextualLogging=true and running the
SchedulingWithMultipleResourceClaims test case because then output from the two
driver instances is easy to distinguish.
With the new busybox, ash has a built-in sleep command. Prior to this
change we were creating half the pids expected since `sleep` wasn't
actually launching a new binary. Use the full path to /bin/sleep which
avoids the built-in and actually launches a new process.
There were four issues in OpenAPI aggregation cleanup:
1. When removing an APIService, openAPIAggregationController was called
twice while openAPIV3AggregationController was never called, leading
to OpenAPI v3 for the APIService not cleaned up.
2. When removing a local APIService, v2 specAggregator should not return
ErrAPIServiceNotFound when it doesn't find the APIService because
local APIServices were never added to its cache, otherwise confusing
error logs would be generated. Besides, the method's comment
indicates that the desired behavior is that no error is returned if
the APIService does not exist.
3. When removing an APIService, v3 specProxier should update
openapiv2converter's cache, like when updating an APIService,
otherwise the API would not be removed from "/openapi/v3".
4. When v3 AggregationController reconciles an APIService, it should
stop requeueing it if it fails with ErrAPIServiceNotFound as the
APIService has been removed, like what v2 AggregationController does,
otherwise it would keep reconciling the APIService forever.
Signed-off-by: Quan Tian <qtian@vmware.com>
When defining a ClusterIP Service, we can specify externalIP, and the
traffic policy of externalIP is subject to externalTrafficPolicy.
However, the policy can't be set when type is not NodePort or
LoadBalancer, and will default to Cluster when kube-proxy processes the
Service.
This commit updates the defaulting and validation of Service to allow
specifying ExternalTrafficPolicy for ClusterIP Services with
ExternalIPs.
Signed-off-by: Quan Tian <qtian@vmware.com>
Services can expose network applications that are running on
one or more Pods. Users need to specify the Port and Protocol of the
network application, and network implementations must forward only
the traffic indicated in the Service, as forwarding traffic to a backend
the user didn't specify may present a security
problem.
Change-Id: I77fbb23c6415ed09dd81c4f2deb6df7a17de46f0
There are some implementations of Services that use socket load balancing
instead of NAT. These implementations don't need to deal with the
conntrack cleanup; however, they need to clean up the sockets that are
no longer needed, so the application does not get stuck forever.
This can happen with both TCP and UDP, but since UDP is stateless, the
situation is more complicated because UDP does not have mechanisms like TCP's
to detect that a socket is no longer needed.
Change-Id: Ic2cfbdf6c8b1f1335e8b5964825dd1fa716fef53
PVC and containers shared the same ResourceRequirements struct to define their
API. When resource claims were added, that struct got extended, which
accidentally also changed the PVC API. To prevent such a mistake from happening
again, PVC now uses its own VolumeResourceRequirements struct.
The `Claims` field gets removed because the risk of breaking someone is low:
theoretically, YAML files which have a claims field for volumes now
get rejected when validating against the OpenAPI. Such files
have never made sense and should be fixed.
Code that uses the struct definitions needs to be updated.
The fact that the .status.loadBalancer field can be set while .spec.type
is not "LoadBalancer" is a flub. Any spec update will already clear
.status.ingress, so it's hard to really rely on this. After this
change, updates which try to set this combination will fail validation.
Existing cases of this will not be broken. Any spec/metadata update
will clear it (no error) and this is the only stanza of status.
New gate "AllowServiceLBStatusOnNonLB" is off by default, but can be
enabled if this change actually breaks someone, which seems exceedingly
unlikely.
This reverts commit bd6f548746.
Running as serial didn't completely eliminate the flake so I think
there's something more going on here. Reverting the change to serial
since it's not a solution.
In testing I could only reproduce the flake by running stress-ng to load
the CPU. Running it as serial should reduce and hopefully eliminate the
flakiness.
using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/integration/scheduler
using wait.PollUntilContextTimeout instead of deprecated wait.Poll for test/e2e/scheduling
using wait.ConditionWithContextFunc for PodScheduled/PodIsGettingEvicted/PodScheduledIn/PodUnschedulable/PodSchedulingError
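An example of the mechanical conversion (the condition body and the `cs`, `ns`,
`podName` variables are illustrative):
```
// before:
//   err := wait.Poll(2*time.Second, time.Minute, func() (bool, error) { ... })
// after:
err := wait.PollUntilContextTimeout(ctx, 2*time.Second, time.Minute, false,
	func(ctx context.Context) (bool, error) {
		pod, err := cs.CoreV1().Pods(ns).Get(ctx, podName, metav1.GetOptions{})
		if err != nil {
			return false, err
		}
		return pod.Spec.NodeName != "", nil
	})
// err is then checked exactly as before.
```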
The new test case covers pods with multiple claims from multiple drivers. This
leads to different behavior (scheduler waits for information from all drivers
instead of optimistically selecting one node right away) and to more concurrent
updates of the PodSchedulingContext objects.
The test case is currently not enabled for unit testing or integration
testing. It can be used manually with:
-bench=BenchmarkPerfScheduling/SchedulingWithMultipleResourceClaims/2000pods_100nodes
... -perf-scheduling-label-filter=
Since EndpointSlices can carry dual-stack families, but Endpoints can
only have one single family, the function must take this into account
and only compare addresses of the same family; otherwise it will
always fail for Services with dual-stack endpoints, because the
EndpointSlices will always have twice as many addresses as the Endpoints.
Change-Id: Id08cb22f8a2adc103a4f5a4fe3eec25f448cd21b
/exports *(rw,fsid=0,insecure,no_root_squash)
can be mounted as `/exports` using NFSv3 and as `/` using NFSv4.
Mount as '/', since clients that support both can try both.
The tests were introduced in
ca7be7dc6d
and have checked for `ecc` in `/proc/self/status` since their creation.
We got a new field `Seccomp_filters:` with the Linux commit
c818c03b66,
which means that `ecc` now matches both and can interfere with possible test
results depending on the host.
The field `Seccomp:` was introduced in
2f4b3bf6b2
and has never changed since then, which means we can use it directly to make
the tests more strict.
Refers to https://github.com/kubernetes-sigs/cri-tools/pull/1236
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Analyzing the CPU profile of
go test -timeout=0 -count=5 -cpuprofile profile.out -bench=BenchmarkPerfScheduling/.*Claim.* -benchtime=1ns -run=xxx ./test/integration/scheduler_perf
showed that a significant amount of time was spent iterating over allocated
claims to determine how many were allocated per node. That "naive" approach was
taken to avoid maintaining a redundant data structure, but now that performance
measurements show that this comes at a cost, it's not "premature optimization"
anymore to introduce such a second field.
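A sketch of the idea (names are invented, not the actual plugin code): keep a
per-node counter next to the claim cache instead of iterating over all
allocated claims on every Filter call.
```
import "sync"

type allocationCounter struct {
	mutex sync.Mutex
	// number of allocated claims per node name
	perNode map[string]int
}

func (c *allocationCounter) add(nodeName string, delta int) {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	c.perNode[nodeName] += delta
}

func (c *allocationCounter) get(nodeName string) int {
	c.mutex.Lock()
	defer c.mutex.Unlock()
	return c.perNode[nodeName]
}
```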
The average scheduling throughput in
SchedulingWithResourceClaimTemplate/2000pods_100nodes increases from 16.4
pods/s to 19.2 pods/s.
EndpointSlices are the evolution of the Endpoints object, and most of the
components are using them for implementing Services. This means that
even if the Endpoints object is up to date, the EndpointSlices may
lag behind, so tests must ensure that both objects are in sync to
avoid race conditions.
Change-Id: I5d9bc7774c68f321537379d1f20b2a1fe0b39e6e
In v1.27 StatefulSetStartOrdinal became beta, which enables it by
default, but we forgot to turn these tests on along with it. This makes
these tests always run.
* Add e2e tests for admission webhooks MatchCondition fields
Signed-off-by: Amine Hilaly <hilalyamine@gmail.com>
* improve naming to distinguish tests
* adding e2e for mutating webhooks and match conditions
* Use `ginkgo.It` instead of `framework.ConformanceIt` and clean up
resources after creation
* Enable AdmissionWebhookMatchConditions feature
* Tag only matchcondition tests
* Improve expected error message for denied requests.
* Rename `onlyAllowLeaseObjectMatchConditions` to
`excludeLeasesMatchConditions`
* remove [Alpha] tag from AdmissionWebhookMatchConditions tests
* Using `gomega.Expect` instead of `framework.Failf`
* Remove [Feature:AdmissionWebhookMatchConditions] tag
Signed-off-by: Amine <hilalyamine@gmail.com>
* Improve e2e names to specify whether it's using Validating or Mutating admission webhooks
---------
Signed-off-by: Amine Hilaly <hilalyamine@gmail.com>
Signed-off-by: Amine <hilalyamine@gmail.com>
Currently, the test queries the local node, which is not correct for most Kubernetes environments.
Instead, ssh to the target node and call journalctl there.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Make it possible to parse jsonpath filter expressions: Split
jsonpath expressions on single '=' only and leave '==' as part of the
string.
Reported-at: https://github.com/kubernetes/kubernetes/issues/119206
Signed-off-by: Andreas Karis <ak.karis@gmail.com>
The PodResources API reports resources of pods in terminal phase in the List call.
The internal managers reassign resources assigned to pods in terminal phase, so podresources should ignore them.
Whether this behavior is intended or not (the docs are not unequivocal),
this e2e test demonstrates and verifies the behavior mentioned above.
Signed-off-by: Talor Itzhak <titzhak@redhat.com>
Instead of enumerating all the etcd endpoints known by the apiserver, we will
group them by purpose. `etcd-0` will be the default etcd, `etcd-1` will
be the first resource override, `etcd-2` will be the second override and
so on.
The `diff` binary (required by the `kubectl diff` e2e test) gets
statically or dynamically linked based on the used glibc version. We
cannot really predict that behavior for the various platforms of
debian-base and therefore cannot copy the binary around. This means that
distroless is not a great choice for the conformance image unless we
stop relying on `diff`.
This means we now switch back to `debian-base` for the conformance image
to simplify the build process and reduce the amount of moving parts.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
* feature(scheduling_queue): track events per Pod
* fix typos
* record events in one slice and make each in-flight Pod refer to it
* fix: use Pop() in test before AddUnschedulableIfNotPresent to register in-flight Pods
* eliminate MakeNextPodFuncs
* call Done inside the scheduling queue
* fix comment
* implement done() not to require lock in it
* fix UTs
* improve the receivedEvents implementation based on suggestions
* call DonePod when we don't call AddUnschedulableIfNotPresent
* fix UT
* use queuehint to filter out events for in-flight Pods
* fix based on suggestion from aldo
* fix based on suggestion from Wei
* rename lastEventBefore → previousEvent
* fix based on suggestion
* address comments from aldo
* fix based on the suggestion from Abdullah
* gate in-flight Pods logic by the SchedulingQueueHints feature gate
Passing "/bin/sh" arguments to agnhost container has caused failure by
itself.
This fixes the container image, allowing it to properly test the restart
triggered by probe failure.
* Support namespace access from cel expression in validatingadmissionpolicy.
* Whitelist the exposed fields in namespace object and add test
* better handling of cluster-scoped resources.
* [API REVIEW] namespaceObject in Expression doc.
* compatibility with composition.
* generated: ./hack/update-codegen.sh && ./hack/update-openapi-spec.sh
* workaround namespace of namespace is unexpectedly set.
* basic test coverage for namespaceObject.
---------
Co-authored-by: Jiahui Feng <jhf@google.com>
* [API REVIEW] ValidatingAdmissionPolicyStatusController config.
worker count.
* ValidatingAdmissionPolicyStatus controller.
* remove CEL typechecking from API server.
* fix initializer tests.
* remove type checking integration tests
from API server integration tests.
* validatingadmissionpolicy-status options.
* grant access to VAP controller.
* add defaulting unit test.
* generated: ./hack/update-codegen.sh
* add OWNERS for VAP status controller.
* type checking test case.
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. Then kubelet starts to check the pod and (if DRA is enabled) will
refuse to run it, either because the claims are still waiting for the first
consumer or the pod wasn't added to reservedFor. Both are things the scheduler
normally does.
Also, if a pod got scheduled while the DRA feature was off in the
kube-scheduler, the pod can reach the same state.
The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.
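A rough sketch, assuming the resource.k8s.io/v1alpha2 types of this release
(field and client names may differ): the controller stands in for the scheduler
by naming the already chosen node directly.
```
schedulingCtx := &resourcev1alpha2.PodSchedulingContext{
	ObjectMeta: metav1.ObjectMeta{
		Name:      pod.Name,
		Namespace: pod.Namespace,
		OwnerReferences: []metav1.OwnerReference{
			*metav1.NewControllerRef(pod, corev1.SchemeGroupVersion.WithKind("Pod")),
		},
	},
	Spec: resourcev1alpha2.PodSchedulingContextSpec{
		// The node was already chosen by whoever set spec.nodeName.
		SelectedNode: pod.Spec.NodeName,
	},
}
_, err := kubeClient.ResourceV1alpha2().PodSchedulingContexts(pod.Namespace).Create(ctx, schedulingCtx, metav1.CreateOptions{})
```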
What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
If something goes wrong during the Azure cloud detection, trying to cast
the returned value will result in the following panic and give no clue
as to what the error was.
```
panic: interface conversion: cloudprovider.Interface is nil, not *azure.Cloud
goroutine 1 [running]:
k8s.io/kubernetes/test/e2e/framework/providers/azure.newProvider()
test/e2e/framework/providers/azure/azure.go:50 +0x2b5
k8s.io/kubernetes/test/e2e/framework.SetupProviderConfig({0xc0007966b8, 0x5})
test/e2e/framework/provider.go:82 +0x1a6
```
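A rough sketch of the fix (the surrounding provider code, including the
`Provider` struct, is approximated rather than the actual implementation):
check the error and the type assertion instead of letting the nil interface
conversion panic.
```
cloud, err := cloudprovider.InitCloudProvider("azure", framework.TestContext.CloudConfig.ConfigFile)
if err != nil {
	return nil, fmt.Errorf("error initializing Azure cloud provider: %w", err)
}
azureCloud, ok := cloud.(*azure.Cloud)
if !ok {
	return nil, fmt.Errorf("expected *azure.Cloud, got %T", cloud)
}
return &Provider{azureCloud: azureCloud}, nil
```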
The recommendation and default in the controller helper code is to set
ReservedFor to the pod which triggered delayed allocation. However, this
is neither required nor enforced. Therefore we should also test the fallback
path where kube-scheduler itself adds the pod to ReservedFor.
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
Change the name to make it compliant with Prometheus guidelines.
Calculate it on demand instead of periodically to comply with Prometheus standards.
Replace the "endpoint" label with "server" to make it semantically consistent with the storage factory.
Make sure orphaned pods (pods deleted while the kubelet is down) are
handled correctly.
Outline:
1. create a pod (not static pod)
2. stop kubelet
3. while kubelet is down, force delete the pod on API server
4. restart kubelet
the pod becomes an orphaned pod and is expected to be killed by HandlePodCleanups.
There is a similar test already, but here we want to check device
assignment.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The recently added e2e device plugin test covering node reboot
works fine if it runs each time in a fresh environment (e.g. CI) but
doesn't correctly handle a partial setup when run repeatedly on
the same instance (developer setup).
To accommodate both flows, we extend the error management, checking
more error conditions in the flow.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Fix e2e device manager tests.
Most notably, the workload pods need to survive a kubelet
restart. Update tests to reflect that.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The main problem probably was that
https://github.com/kubernetes/kubernetes/pull/118862 moved creating the first
pod before setting up the callback which blocks allocating one claim for that
pod. This is racy because allocations happen in the background.
The test also was unnecessarily complex and hard to read:
- The intended effect can be achieved with three instead of four claims.
- It wasn't clear which claim has "external-claim-other" as name.
Using the claim variable avoids that.
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.
What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.
The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.
The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.
There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
We should evaluate the error, otherwise we risk hanging indefinitely while
waiting for the `reschan` in:
64939b66c6/test/e2e_node/util.go (L419)
We also increase the timeout, because it can take a bit longer for
runtimes to terminate depending on the work they have to do on
running containers.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
We use the label definitions in CRI-O, means we now make them public to
stop vendoring/copying this part of Kubernetes.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
We only added failed plugins, but this will not work unless
we construct the status with a fitError, because we only copy the failed plugins
to podInfo if it is a fitError.
Signed-off-by: kerthcet <kerthcet@gmail.com>
Normal binaries should never have to do this. It's not safe when there are
already some goroutines running which might do logging. Therefore the new
default is to return an error when a binary accidentally re-applies.
A few unit tests ensure that there are no goroutines and have to call the functions
more than once. The new ResetForTest API gets used by those to enable changing the
logging settings more than once in the same process.
Integration tests use the same code as the normal binaries. To make reuse of
that code safe, component-base/logs can be configured to silently ignore any
additional calls. This addresses data races that were found when enabling -race
for integration tests. To catch cases where the integration test does want
to modify the config, the old and new config get compared and an error is
raised when it's not the same.
To avoid having to modify all integration tests which start test servers,
reconfiguring component-base/logs is done by the test server packages.
When a pod is done but not getting removed for a while, a claim that
was generated for that pod can already be deleted. This then also triggers
deallocation.
Invalid flags are detected by flag parsing, but optional arguments are just
passed through to the E2E suites. None of them support any, so rejecting them
with an error message is useful because it helps catch typos (like a missing
hyphen before a flag).
perfdash expects all data items to have the same set of labels. It then
renders drop-down buttons for each label with all values found for each
label. Previously, data items that didn't have a label didn't match any label
filter in perfdash and couldn't get selected because perfdash doesn't have
"unset" in it's drop-down menus.
To avoid that, scheduler-perf now collects all labels and then adds missing
labels with "not applicable" as value:
{
"data": {
"Average": 939.7071223010004,
"Perc50": 927.7987421383649,
"Perc90": 2166.153846153846,
"Perc95": 2363.076923076923,
"Perc99": 2520.6153846153848
},
"unit": "ms",
"labels": {
"Metric": "scheduler_pod_scheduling_duration_seconds",
"Name": "SchedulingBasic/5000Nodes/namespace-2",
"extension_point": "not applicable",
"result": "not applicable"
}
},
...
{
"data": {
"Average": 1.1172570650000004,
"Perc50": 1.1418367346938776,
"Perc90": 1.5500000000000003,
"Perc95": 1.6410256410256412,
"Perc99": 3.7333333333333334
},
"unit": "ms",
"labels": {
"Metric": "scheduler_framework_extension_point_duration_seconds",
"Name": "SchedulingBasic/5000Nodes/namespace-2",
"extension_point": "Score",
"result": "not applicable"
}
},
Because the JSON file gets written at the end of the top-level benchmark, all
data items had `BenchmarkPerfScheduling/` as prefix in the `Name` label. This
is redundant and makes it harder to see the actual name. Now that common prefix
gets removed.
CreatePod and MakePod only accepted an `isPrivileged` boolean, which made it
impossible to write tests using those helpers which work in a default
framework.Framework, because the default there is LevelRestricted.
The simple boolean gets replaced with admissionapi.Level. Passing
LevelRestricted does the same as calling e2epod.MixinRestrictedPodSecurity.
Instead of explicitly passing a constant to these modified helpers, most tests
get updated to pass f.NamespacePodSecurityLevel. This has the advantage
that if that level gets lowered in the future, tests only need to be updated in
one place.
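A sketch of what a call site might look like after the change (assuming the
helper keeps its other parameters; the command is arbitrary):
```
pod := e2epod.MakePod(f.Namespace.Name, nil /* nodeSelector */, nil /* pvclaims */,
	f.NamespacePodSecurityLevel, "sleep 3600")
```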
In some cases, helpers taking client+namespace+timeouts parameters get replaced
with passing the Framework instance to get access to
f.NamespacePodSecurityEnforceLevel. These helpers don't need separate
parameters because in practice all they ever used were the values from the
Framework instance.
The post-merge job for https://github.com/kubernetes/kubernetes/pull/117103 failed,
and this causes the e2e tests to fail. This PR retriggers the same job.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
The namespace the critical pod was referring to was wrong, because it
was using the generated one instead of `kube-system`. This and the
resulting test condition are now fixed.
The test seems to run only in `ci-crio-cgroupv1-node-e2e-flaky` for now.
Closes https://github.com/kubernetes/kubernetes/issues/109296
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>