* Support namespace access from cel expression in validatingadmissionpolicy.
* Whitelist the exposed fields in namespace object and add test
* better handling of cluster-scoped resources.
* [API REVIEW] namespaceObject in Expression doc.
* compatibility with composition.
* generated: ./hack/update-codegen.sh && ./hack/update-openapi-spec.sh
* workaround namespace of namespace is unexpectedly set.
* basic test coverage for namespaceObject.
---------
Co-authored-by: Jiahui Feng <jhf@google.com>
* [API REVIEW] ValidatingAdmissionPolicyStatucController config.
worker count.
* ValidatingAdmissionPolicyStatus controller.
* remove CEL typechecking from API server.
* fix initializer tests.
* remove type checking integration tests
from API server integration tests.
* validatingadmissionpolicy-status options.
* grant access to VAP controller.
* add defaulting unit test.
* generated: ./hack/update-codegen.sh
* add OWNERS for VAP status controller.
* type checking test case.
When someone decides that a Pod should definitely run on a specific node, they
can create the Pod with spec.nodeName already set. Some custom scheduler might
do that. Then kubelet starts to check the pod and (if DRA is enabled) will
refuse to run it, either because the claims are still waiting for the first
consumer or the pod wasn't added to reservedFor. Both are things the scheduler
normally does.
Also, if a pod got scheduled while the DRA feature was off in the
kube-scheduler, a pod can reach the same state.
The resource claim controller can handle these two cases by taking over for the
kube-scheduler when nodeName is set. Triggering an allocation is simpler than
in the scheduler because all it takes is creating the right
PodSchedulingContext with spec.selectedNode set. There's no need to list nodes
because that choice was already made, permanently. Adding the pod to
reservedFor also isn't hard.
What's currently missing is triggering de-allocation of claims to re-allocate
them for the desired node. This is not important for claims that get created
for the pod from a template and then only get used once, but it might be
worthwhile to add de-allocation in the future.
If something goes wrong during the Azure cloud detection, trying to cast
the returned value will result in the following panic and give no clue
as to what the error was.
```
panic: interface conversion: cloudprovider.Interface is nil, not *azure.Cloud
goroutine 1 [running]:
k8s.io/kubernetes/test/e2e/framework/providers/azure.newProvider()
test/e2e/framework/providers/azure/azure.go:50 +0x2b5
k8s.io/kubernetes/test/e2e/framework.SetupProviderConfig({0xc0007966b8, 0x5})
test/e2e/framework/provider.go:82 +0x1a6
```
The recommendation and default in the controller helper code is to set
ReservedFor to the pod which triggered delayed allocation. However, this
is neither required nor enforced. Therefore we should also test the fallback
path were kube-scheduler itself adds the pod to ReservedFor.
Combining all prepare/unprepare operations for a pod enables plugins to
optimize the execution. Plugins can continue to use the v1beta2 API for now,
but should switch. The new API is designed so that plugins which want to work
on each claim one-by-one can do so and then report errors for each claim
separately, i.e. partial success is supported.
Change name to make it compliant with prometheus guidelines.
Calculate it on demand instead of periodic to comply with prometheus standards.
Replace "endpoint" with "server" label to make it semantically consistent with storage factory
Make sure orphanded pods (pods deleted while kubelet is down) are
handled correctly.
Outline:
1. create a pod (not static pod)
2. stop kubelet
3. while kubelet is down, force delete the pod on API server
4. restart kubelet
the pod becomes an orphaned pod and is expected to be killed by HandlePodCleanups.
There is a similar test already, but here we want to check device
assignment.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The recently added e2e device plugins test to cover node reboot
works fine if runs every time on CI environment (e.g CI) but
doesn't handle correctly partial setup when run repeatedly on
the same instance (developer setup).
To accomodate both flows, we extend the error management, checking
more error conditions in the flow.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Fix e2e device manager tests.
Most notably, the workload pods needs to survive a kubelet
restart. Update tests to reflect that.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The main problem probably was that
https://github.com/kubernetes/kubernetes/pull/118862 moved creating the first
pod before setting up the callback which blocks allocating one claim for that
pod. This is racy because allocations happen in the background.
The test also was unnecessarily complex and hard to read:
- The intended effect can be achieved with three instead of four claims.
- It wasn't clear which claim has "external-claim-other" as name.
Using the claim variable avoids that.
Generating the name avoids all potential name collisions. It's not clear how
much of a problem that was because users can avoid them and the deterministic
names for generic ephemeral volumes have not led to reports from users. But
using generated names is not too hard either.
What makes it relatively easy is that the new pod.status.resourceClaimStatus
map stores the generated name for kubelet and node authorizer, i.e. the
information in the pod is sufficient to determine the name of the
ResourceClaim.
The resource claim controller becomes a bit more complex and now needs
permission to modify the pod status. The new failure scenario of "ResourceClaim
created, updating pod status fails" is handled with the help of a new special
"resource.kubernetes.io/pod-claim-name" annotation that together with the owner
reference identifies exactly for what a ResourceClaim was generated, so
updating the pod status can be retried for existing ResourceClaims.
The transition from deterministic names is handled with a special case for that
recovery code path: a ResourceClaim with no annotation and a name that follows
the Kubernetes <= 1.27 naming pattern is assumed to be generated for that pod
claim and gets added to the pod status.
There's no immediate need for it, but just in case that it may become relevant,
the name of the generated ResourceClaim may also be left unset to record that
no claim was needed. Components processing such a pod can skip whatever they
normally would do for the claim. To ensure that they do and also cover other
cases properly ("no known field is set", "must check ownership"),
resourceclaim.Name gets extended.
We should evaluate the error, otherwise we risk to hang indefinately on
waiting for the `reschan` in:
64939b66c6/test/e2e_node/util.go (L419)
We also increase the timeout, because it can take a bit longer for
runtimes to determinate depending on the work they have to be done on
running containers.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
We use the label definitions in CRI-O, means we now make them public to
stop vendoring/copying this part of Kubernetes.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
We only added failed plulgins, but actually this will not work unless
we make the status with a fitError because we only copy the failured plugins
to podInfo if it is a fitError
Signed-off-by: kerthcet <kerthcet@gmail.com>
Normal binaries should never have to do this. It's not safe when there are
already some goroutines running which might do logging. Therefore the new
default is to return an error when a binary accidentally re-applies.
A few unit ensure that there are no goroutines and have to call the functions
more then once. The new ResetForTest API gets used by those to enable changing the
logging settings more than once in the same process.
Integration tests use the same code as the normal binaries. To make reuse of
that code safe, component-base/logs can be configured to silently ignore any
additional calls. This addresses data races that were found when enabling -race
for integration tests. To catch cases where the integration test does want
to modify the config, the old and new config get compared and an error is
raised when it's not the same.
To avoid having to modify all integration tests which start test servers,
reconfiguring component-base/logs is done by the test server packages.
When a pod is done, but not getting removed yet for while, then a claim that
got generated for that pod can be deleted already. This then also triggers
deallocation.
Invalid flags are detected by flag parsing, but optional arguments are just
passed through to the E2E suites. None of them support any, so rejecting them
with an error message is useful because it helps catch typos (like a missing
hyphen before a flag).
perfdash expects all data items to have the same set of labels. It then
renders drop-down buttons for each label with all values found for each
label. Previously, data items that didn't have a label didn't match any label
filter in perfdash and couldn't get selected because perfdash doesn't have
"unset" in it's drop-down menus.
To avoid that, scheduler-perf now collects all labels and then adds missing
labels with "not applicable" as value:
{
"data": {
"Average": 939.7071223010004,
"Perc50": 927.7987421383649,
"Perc90": 2166.153846153846,
"Perc95": 2363.076923076923,
"Perc99": 2520.6153846153848
},
"unit": "ms",
"labels": {
"Metric": "scheduler_pod_scheduling_duration_seconds",
"Name": "SchedulingBasic/5000Nodes/namespace-2",
"extension_point": "not applicable",
"result": "not applicable"
}
},
...
{
"data": {
"Average": 1.1172570650000004,
"Perc50": 1.1418367346938776,
"Perc90": 1.5500000000000003,
"Perc95": 1.6410256410256412,
"Perc99": 3.7333333333333334
},
"unit": "ms",
"labels": {
"Metric": "scheduler_framework_extension_point_duration_seconds",
"Name": "SchedulingBasic/5000Nodes/namespace-2",
"extension_point": "Score",
"result": "not applicable"
}
},
Because the JSON file gets written at the end of the top-level benchmark, all
data items had `BenchmarkPerfScheduling/` as prefix in the `Name` label. This
is redundant and makes it harder to see the actual name. Now that common prefix
gets removed.
CreatePod and MakePod only accepted an `isPrivileged` boolean, which made it
impossible to write tests using those helpers which work in a default
framework.Framework, because the default there is LevelRestricted.
The simple boolean gets replaced with admissionapi.Level. Passing
LevelRestricted does the same as calling e2epod.MixinRestrictedPodSecurity.
Instead of explicitly passing a constant to these modified helpers, most tests
get updated to pass f.NamespacePodSecurityLevel. This has the advantage
that if that level gets lowered in the future, tests only need to be updated in
one place.
In some cases, helpers taking client+namespace+timeouts parameters get replaced
with passing the Framework instance to get access to
f.NamespacePodSecurityEnforceLevel. These helpers don't need separate
parameters because in practice all they ever used where the values from the
Framework instance.
The post merge job was failed https://github.com/kubernetes/kubernetes/pull/117103
and this causes the e2e tests to fail. This PR retrigger the same.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
The namespace the crictical pod was referring to was wrong, because it
was using the generated one instead of `kube-system`. This and the
resulting test condition is now fixed.
The test seems to run only in `ci-crio-cgroupv1-node-e2e-flaky` for now.
Closes https://github.com/kubernetes/kubernetes/issues/109296
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
Doing the initialization once was not good enough because it was not guaranteed
that RunCustomEtcd gets called early enough, before there are other goroutines
which use gRPC. The data race for
test/integration/apiserver.TestWatchCacheUpdatedByEtcd was:
WARNING: DATA RACE
Read at 0x00000cfffb90 by goroutine 140052:
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.V()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/grpclog.go:41 +0x30
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.(*componentData).V()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/component.go:103 +0x4e
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.(*http2Client).Close()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/http2_client.go:955 +0xca
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.(*http2Client).reader()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/http2_client.go:1619 +0xbfb
k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport.newHTTP2Client.func11()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/internal/transport/http2_client.go:394 +0x47
Previous write at 0x00000cfffb90 by goroutine 145643:
k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog.SetLoggerV2()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/vendor/google.golang.org/grpc/grpclog/loggerv2.go:75 +0x104
k8s.io/kubernetes/test/integration/framework.RunCustomEtcd.func2()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/framework/etcd.go:157 +0x33
sync.(*Once).doSlow()
/usr/local/go/src/sync/once.go:74 +0x101
sync.(*Once).Do()
/usr/local/go/src/sync/once.go:65 +0x46
k8s.io/kubernetes/test/integration/framework.RunCustomEtcd()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/framework/etcd.go:156 +0xb97
k8s.io/kubernetes/test/integration/apiserver.multiEtcdSetup()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/apiserver/watchcache_test.go:41 +0xc4
k8s.io/kubernetes/test/integration/apiserver.TestWatchCacheUpdatedByEtcd()
/home/prow/go/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/integration/apiserver/watchcache_test.go:92 +0xa9
testing.tRunner()
/usr/local/go/src/testing/testing.go:1576 +0x216
testing.(*T).Run.func1()
/usr/local/go/src/testing/testing.go:1629 +0x47
This commit removes the legacy networkpolicy tests since they now have
complete appropriate coverage in the new netpol suite.
Signed-off-by: Andrew Stoycos <astoycos@redhat.com>
We have a e2e test which want to get a rate limit error. To do so, we
sent an abnormally high amount of calls in a tight loop.
The relevant test per se is reportedly fine, but wwe need to play nicer
with *other* tests which may run just after and which need to query the API.
If the testsuite runs "too fast", it's possible an innocent test falls in the
same rate limit watch period which was saturated by the ratelimit test,
so the innocent test can still be throttled because the throttling period
is not exhausted yet, yielding false negatives, leading to flakes.
We can't reset the period for the rate limit, we just wait "long enough" to
make sure we absorb the burst and other legit queries are not rejected.
Signed-off-by: Francesco Romani <fromani@redhat.com>
This releases the underlying resource sooner and ensures that another consumer
can get scheduled without being influenced by a decision that was made for the
previous consumer.
An alternative would have been to have the apiserver trigger the deallocation
whenever it sees the `status.reservedFor` getting reduced to zero. But that
then also triggers deallocation when kube-scheduler removes the last
reservation after a failed scheduling cycle. In that case we want to keep the
claim allocated and let the kube-scheduler decide on a case-by-case basis which
claim should get deallocated.
add integration test to wait for json without value
refactor JSON condition value parsing and validating
adjusting test to reflect the error message refactoring
ginkgo.By should be used for steps in the test flow. Creating and deleting CDI
files happens in parallel to that. If reported via ginkgo.By, progress reports
look weird because they contain e.g. step "waiting for...." (from the main
test, which is still on-going) and end with "creating CDI file" (which is
already completed).
This avoids the surprise of identical authorization checks within a
policy evaluating to different decisions during the same admission
pass, and reduces the overhead of repeatedly referencing the same
authorization check.
This runs workloads that are labeled as "integration-test". The apiserver and
scheduler are only started once per unique configuration, followed by each
workload using that configuration. This makes execution faster. In contrast to
benchmarking, we care less about starting with a clean slate for each test.
Merely deleting the namespace is not enough:
- Workloads might rely on the garbage collector to get rid of obsolete objects,
so we should run it to be on the safe side.
- Pods must be force-deleted because kubelet is not running.
- Finally, the namespace controller is needed to get rid of
deleted namespaces.
* Skip terminal Pods with a deletion timestamp from the Daemonset sync
Change-Id: I64a347a87c02ee2bd48be10e6fff380c8c81f742
* Review comments and fix integration test
Change-Id: I3eb5ec62bce8b4b150726a1e9b2b517c4e993713
* Include deleted terminal pods in history
Change-Id: I8b921157e6be1c809dd59f8035ec259ea4d96301
* test comment should match the code in podgc
* Update test/integration/podgc/podgc_test.go
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
* test comment should match the code in podgc
---------
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
The exception comments were added due to a false positive in
staticcheck. This has since been rectified.
Signed-off-by: Madhav Jivrajani <madhav.jiv@gmail.com>
The failure message becomes nicer. Found with the new ginkgolinter, for
example:
test/e2e/apps/cronjob.go:113:3: ginkgo-linter: wrong length assertion; consider using `gomega.Expect(jobs.Items).To(gomega.BeEmpty())` instead (ginkgolinter)
gomega.Expect(jobs.Items).To(gomega.HaveLen(0))
^
The failure message becomes a bit nicer. Found by the new ginkgolinter, for
example:
test/e2e/windows/memory_limits.go:160:2: ginkgo-linter: wrong boolean assertion; consider using `gomega.Eventually(ctx, func() bool {
eventList, err := f.ClientSet.CoreV1().Events(f.Namespace.Name).List(ctx, metav1.ListOptions{})
...
}, 3*time.Minute, 10*time.Second).Should(gomega.BeTrue())` instead (ginkgolinter)
"gomega.Expect" is not the same as "assert" in C: it always has to be combined
with a statement of what is expected.
Found with the new ginkgolinter, for example:
test/e2e/node/pod_resize.go:242:3: ginkgo-linter: "Expect": missing assertion method. Expected "Should()", "To()", "ShouldNot()", "ToNot()" or "NotTo()" (ginkgolinter)
gomega.Expect(found == true)
NOTE: we are not installing the ecr-credential-provider binary
itself here we are, we need to do it out-of-band from the test
suite itself before it runs.
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
Previously, we were trying to patch the deployment's ready replicas by
changing it to 0, at the same time we have 2 available replicas.
Therefore, this test fails since the ready replicas is less than the
available replicas. As a fix, we make the number of ready replicas equals to the number
of available ones.
Signed-off-by: AhmedGrati <ahmedgrati1999@gmail.com>
Each benchmark test case runs with a fresh etcd instance. Therefore it is not
necessary to delete objects after a run.
A future unit test might reuse etcd, therefore cleanup is optional.
Added NodeAlphaFeature:DynamicResourceAllocation to the Node DRA test
to fix failing containerd serial jobs. Those jobs skip tests labeled
with NodeAlphaFeature.
Removed NodeFeature:DynamicResourceAllocation label from the
tests to fix cos-cgroupv1/v2-containerd-node-e2e-serial CI jobs.
It turned out that labeling DRA Node tests as NodeFeature was
a mistake. Re-labeling with NodeAlphaFeature would not work either.
It would fail certain containerd jobs as DRA requires containerd >= 1.7
When running iscsi test, use dbus socket from the host. targetcli uses the
socket for synchronization.
Recent Fedoras can run dbus only via systemd, which is cumbersome here.
Once DeferCleanup for the worker goroutine is invoked, there's no need to
continue doing anything anymore in that goroutine and it can return
immediately, without reporting the "context canceled" error because there is no
other reason for that.
This test requires consistent CPU consumption for 3 minutes
to pass. Consumption on a single Pod is more consistent than
split across multiple Pods: no temporary usage drops in aggregate.
Ginkgo changed the noColor command line arg to be no-color and will
issue the following warning:
You're using deprecated Ginkgo functionality:
=============================================
--noColor is deprecated, use --no-color instead
Fix this by changing all occurrences accordingly.
Fixes issue 115945 by moving the cleanup code in AfterEach into DeferCleanup.
Cleanup stanzas are now paired with their setup stanzas within the body
of the BeforeEach and are now guarenteed to run in the correct order.
Prior to this there was no guarantee that the goroutine to recycle
unbound PVs had finished before the AfterEach began.
Since the feature is GA and locked to true, tests can no longer set it
to false. Cleaning up by removing all references to this feature gate
from tests.
Feature gate will be removed in v1.29.
T.Setenv ensures that the environment is returned to its prior state
when the test ends. It also panics when called from a parallel test to
prevent racy test interdependencies.
ReadWriteOncePod feature needs min requirement of 1.27 kubelet, add the
tag to skip test if kubelet version is smaller than 1.27
Change-Id: I27959156db90f2477cead6dfc16f42dbc54663bc
If kubelet plugin registration fails, it would be good to know more about the
communication with kubelet. Capturing the GRPC calls and then checking that
makes the failure messages more informative. Here's an example where a failure
was triggered by temporarily modifying the check so that it didn't find the
call:
[FAILED] Timed out after 30.000s.
Expected:
<[]app.GRPCCall | len:2, cap:2>: [
{
FullMethod: "/pluginregistration.Registration/GetInfo",
Request:
{},
Response:
endpoint: /var/lib/kubelet/plugins/test-driver/dra.sock
name: test-driver.cdi.k8s.io
supported_versions:
- 1.0.0
type: DRAPlugin,
Err: nil,
},
{
FullMethod: "/pluginregistration.Registration/NotifyRegistrationStatus",
Request:
plugin_registered: true,
Response:
{},
Err: nil,
},
]
to contain successful NotifyRegistrationStatus call
the e2e framwork use active loops to wait for certain async operations,
these loops need to retry on some operations and fail in others.
For the functions that depend on some operations to happen, the
apiserver may return 503 errors until that specific service is
available, so we should retry on those too.
Change-Id: Ib3d194184f6385b9d3d151c7055f27c97c21c3ff
Applying it to the entire spec included cleaning up, which makes predicting the
acceptable duration harder because it includes code not owned by the test
itself. It's better to specify a timeout only for the test code itself.
To test https://github.com/kubernetes/kubernetes/issues/117745,
restart kubelet with a CSI volume mounted *and* the API server running as a
static pod.
The test heavily uses `kind` containers and the fact that it uses the API
server as a static pod.
condidering NewSerializer* funcs are deprecated with
NewSerializerWithOptions(), the test functions are adjusted to the same.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
This is required because an empty name is no longer supported: the
perf-dashboard is run with --allow-parsers-matching-all-tests=false with causes
perfdash to skip current configuration for BenchmarkPerfResults as it does not
have name
set (4674704f45/perfdash/metrics-downloader.go (L165-L167)).
The perf-dash config needs to be updated accordingly.
* Enable dockerized build with --use-dockerized-build=true
* Build and create test artifacts for ARM64 with --target-build-arch=arm64
* Prepull multi-arch ready container image
* Download ARM64 binaries/packages if running on ARM64 machine
`Framework` variable has been removed from test/*
unwanted `[]byte` conversion has been removed
import alias has been avoided
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
test/e2e images have lost the parity compared the e2e suite image
versions and this commit make them in parity.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>
The plugins get called by scheduler goroutines. At least the polling seems to
be done concurrently and thus needs locking.
Locking the PreBindPlugin state is less obvious. It might be that the scheduler
is really done with the test pod, but that ordering doesn't seem to be enough
for the race detector. It's simpler to add mutex locking.
v1.17.0 has been built at present where this commit make use
of available latest version/tag for the build purpose.
Signed-off-by: Humble Chirammal <humble.devassy@gmail.com>