Through Job.status.uncountedPodUIDs and a Pod finalizer
An annotation marks if a job should be tracked with new behavior
A separate work queue is used to remove finalizers from orphan pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
As of now, we allow PDBs to be applied to pods via
selectors, so there can be unmanaged pods(pods that
don't have backing controllers) but still have PDBs associated.
Such pods are to be logged instead of immediately throwing
a sync error. This ensures disruption controller is
not frequently updating the status subresource and thus
preventing excessive and expensive writes to etcd.
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.
Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
Removing containers now no longer blocks final pod deletion in the
API server and are handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.
See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
UserInfo contains a uid field alongside groups, username and extra.
This change makes it possible to pass a UID through as an impersonation header like you
can with Impersonate-Group, Impersonate-User and Impersonate-Extra.
This PR contains:
* Changes to impersonation.go to parse the Impersonate-Uid header and authorize uid impersonation
* Unit tests for allowed and disallowed impersonation cases
* An integration test that creates a CertificateSigningRequest using impersonation,
and ensures that the API server populates the correct impersonated spec.uid upon creation.
1. add AllocateLoadBalancerNodePorts fields in specs for validation test cases
2. update fuzzer
3. in resource quota e2e, allocate node port for loadbalancer type service and
exceed the node port quota
Signed-off-by: Hanlin Shi <shihanlin9@gmail.com>
This change updates the CSR API to add a new, optional field called
expirationSeconds. This field is a request to the signer for the
maximum duration the client wishes the cert to have. The signer is
free to ignore this request based on its own internal policy. The
signers built-in to KCM will honor this field if it is not set to a
value greater than --cluster-signing-duration. The minimum allowed
value for this field is 600 seconds (ten minutes).
This change will help enforce safer durations for certificates in
the Kube ecosystem and will help related projects such as
cert-manager with their migration to the Kube CSR API.
Future enhancements may update the Kubelet to take advantage of this
field when it is configured in a way that can tolerate shorter
certificate lifespans with regular rotation.
Signed-off-by: Monis Khan <mok@vmware.com>
Ensure resources are created in zone with schedulable
nodes. For example, if we have 4 zones with 3 zones
having worker nodes and 1 zone having master nodes(unscheduable
for workloads), we should not create resources like PV, PVC or
pods in that zone.
We're running ubernetes tests
`should only be allowed to provision PDs in zones
where nodes exist`
on gcp&gke. While the test is useful in exercising
the scenario of identifying extra zone and
creating a node in it, not every Kube
distribution uses the same approach to create a node,
further if even there is an extra zone, we cannot
guarantee the zone to have enough quota. There can also
be other GCP specific edge cases all of which cannot be
covered within this test. So, removing the test
as agreed upon with the storage team
The data structure would wrap an embedded filesystem andthe root
directory relative to which the embedded filesystem is constructed.
Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
We're trying to fix https://github.com/kubernetes/kubernetes/issues/75355
sicne long time, and we believe the current timeout could
actually be too low (despite being "forever", which is 30s).
To validate this theory, we set the timeout to one full minute.
Also, make the logging more verbose to make the troubleshooting easier.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR https://github.com/kubernetes/kubernetes/pull/100041 updated
node-problem-detector to v0.8.7, but unfortunately we didn't update
also the image using in the e2e_node tests.
As result, the tests were failing like
E2eNode Suite: [sig-node] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial] SystemLogMonitor should generate node condition and events for corresponding errors
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:301
Timed out after 60.000s.
Expected success, but got an error:
<*errors.errorString | 0xc0011f2600>: {
s: "expected total number of events was 4, actual events counted was 7\nEvents
This in turn was one of the contributing factors in making the
pull-kubernetes-node-kubelet-serial lane constantly failing.
This patch updates the image used in the tests, fixing the failure.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Some tests are checking the network connectivity using gomega.Consistently,
which will fail if any of the checks fails. This could lead to flakyness in
some scenarios in which kube-proxy was supposed to apply Policies for
Kubernetes services.
We can instead wait for the network connectivity to work first using gomega.Eventually,
after which we can check the consistency.
For manifest lists containing Windows images, it is important to also have the "os.version"
annotation set, as it is needed by the Windows nodes, so they can pull the appropriate image
from the list.
Previously, the docker manifest CLI did not have the capability to set it, so, we had to set
it outselves in the manifest list's image JSON file. This is no longer necessary since
docker 20.10.0, which includes docker manifest annotate --os-version.
The docker installed in the image gcr.io/k8s-testimages/gcb-docker-gcloud:v20210622-762366a
satisfies this version requirement.
The CPUManager graduated to beta a while ago (k8s 1.10?)
so let's get rid of the obsolete Alpha tag on its e2e tests.
Signed-off-by: Francesco Romani <fromani@redhat.com>
- verify memory manager data returned by `GetAllocatableResources`
- verify pod container memory manager data
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
In the case of multinode clusters, the http server pod and the test cluster can
spawn on different nodes, which can be problematic for poststart / prestop hooks,
as they are executed by the kubelet itself, and the cross-node lifecycle hook might
fail (according to the Kubernetes network model, it is not mandatory for kubelet to
be able to access pods on a different node).
This commit ensures that the test pod spawns on the same node as the http server pod.
This test verifies an implementation detail in the in-tree gcepd
plugin. The behavior is not implementated in the gcepd CSI driver
and therefore the test will be obsolete after CSI migration.
Co-Authored-By: Riaan Kleinhans <riaan@ii.coop>
e2e test validates the following 3 extra endpoints
- patchAppsV1NamespacedStatefulSet
- listAppsV1StatefulSetForAllNamespaces
- deleteAppsV1CollectionNamespacedStatefulSet
Some CSI drivers can't clone a volume into other topology segment (e.g. a
cloud availability zone). The scheduler does not know about these
restrictions and schedules pods with PVCs that clone a volume mostly
randomly.
Run all volume cloning tests in the same topology segment, if such segment
is available and has at least one schedulable node.
The MetricsGrabber itself knows now whether it supports each
component. The checks inside the tests therefore are redundant at best
or worse, they are wrong: for example, on a KinD cluster the check for
"has master node registered" failed and metrics grabbing from
scheduler and controller manager were skipped unnecessarily.
The MetricsGrabber checked whether a component supported metrics
grabbing, but then tests didn't have an API to use the result of that
check. Because metrics grabbing is an optional debug feature, tests
must skip checks that depend on metrics data or, when the entire
test is about metrics data, skip the test.
This is now supported with a special error that gets wrapped and
returned by the individual Grab functions.
This can be checked by trying to retrieve log output. As in the case
of no pod found, a warning gets emitted when log retrieval fails and
metrics grabbing gets disabled.
Logging is checked instead of actual metrics retrieval because the
latter is more complex and thus more likely to fail for other reasons.
The previous approach with grabbing via a nginx proxy had some
drawbacks:
- it did not work when the pods only listened on localhost (as
configured by kubeadm) and the proxy got deployed on a different
node
- starting the proxy raced with starting the pods, causing
sporadic test failures because the proxy was not set up
properly unless it saw all pods when starting the e2e.test
- the proxy was always started, whether it is needed or not
- the proxy was left running after a test and then the next
test run triggered potentially confusing messages when
it failed to create objects for the proxy
The new approach is similar to "kubectl port-forward" + "kubectl get
--raw". It uses the port forwarding feature to establish a TCP
connection via a custom dialer, then lets client-go handle TLS and
credentials.
Somehow verifying the server certificate did not work. As this
shouldn't be a big concern for E2E testing, certificate checking gets
disabled on the client side instead of investigating this further.
The value here is that the exec plugin author can use the kubeconfig to assert
how standard input is treated with respect to the exec plugin, e.g.,
- an exec plugin author can ensure that kubectl fails if it cannot provide
standard input to an exec plugin that needs it (Always)
- an exec plugin author can ensure that an client-go process will still call an
exec plugin that prefers standard input even if standard input is not
available (IfAvailable)
Signed-off-by: Andrew Keesler <akeesler@vmware.com>
The apiserver and test suite in node e2e runs under the sshd daemon
that can limit the amount of files it can open. Set a higher limit
to address the issues.
Signed-off-by: Odin Ugedal <odin@uged.al>
on cgroup v2 the reported metric is recursive for the entire and it
includes all the sub cgroups.
Adjust the test accordingly.
Closes: https://github.com/kubernetes/kubernetes/issues/99230
Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
Node e2e tests exceeding the global timeout are sent SIGINT, resulting
in no artifacts or console output. This will ignore the first SIGINT,
and since all children processes are being stopped due to SIGINT, we can
clean up before exiting.
This reverts commit
c15fd76ee9. Most (all?) of the hostpath
tests and several other tests started to fail again in
gce-scale-master-correctness after re-enabling the controller. This
shows that it was not just the obsolete agent which causes scalability
problems, but also the controller.
It has to be disabled until the scalability problems are addressed.
This make sure the testcase that cannot run locally will be skipped
instead of throwing the misleading failure message.
Signed-off-by: Dave Chen <dave.chen@arm.com>
CSI driver need to pass special mount opts to XFS filesystem to be able to
mount a volume + its clone or its restored snapshot on the same node. Add a
test to exhibit this behavior.
The test is optional for now, giving CSI drivers time to fix it.
These tests have been flaky for a long time, with a relatively low
rate of flakes. Nonetheless it seems better to extend the timeouts to
reduce the flakiness.