Tests "Multi-AZ Cluster Volumes" should consider only nodes that are
schedulable and *untainted* when computing the AZs in which to run the tests.
GetReadySchedulableNodes() already filters schedulable + untainted nodes,
no need to do it again in GetSchedulableClusterZones().
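A minimal sketch of the idea, assuming a simplified `node` type in place of the real Kubernetes Node object. The names `node`, `readySchedulableNodes` and `schedulableZones` are hypothetical stand-ins, not the framework's actual helpers: filtering happens once, and the zone computation simply trusts its input.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a hypothetical, simplified stand-in for a Kubernetes Node.
type node struct {
	Name          string
	Zone          string // value of the zone topology label
	Unschedulable bool   // mirrors spec.unschedulable
	Tainted       bool   // true if the node has a taint the test pods do not tolerate
}

// readySchedulableNodes keeps only nodes that are schedulable and untainted.
func readySchedulableNodes(nodes []node) []node {
	var out []node
	for _, n := range nodes {
		if !n.Unschedulable && !n.Tainted {
			out = append(out, n)
		}
	}
	return out
}

// schedulableZones returns the sorted set of zones of the given nodes.
// It assumes the nodes were already filtered and does not filter again.
func schedulableZones(nodes []node) []string {
	seen := map[string]bool{}
	var zones []string
	for _, n := range nodes {
		if n.Zone != "" && !seen[n.Zone] {
			seen[n.Zone] = true
			zones = append(zones, n.Zone)
		}
	}
	sort.Strings(zones)
	return zones
}

func main() {
	nodes := []node{
		{Name: "worker-a", Zone: "zone-1"},
		{Name: "worker-b", Zone: "zone-2"},
		{Name: "master", Zone: "zone-3", Tainted: true},
	}
	fmt.Println(schedulableZones(readySchedulableNodes(nodes)))
}
```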
For some reason, when we send kubelet logs to journald, many log lines are
consistently dropped as soon as the PLEG is started.
If we log directly to file, we don't have this problem. As a bonus, if
the tests crash, the kubelet logs will always be available since they
were already written; otherwise we normally wait until the end of the
test run to collect them from journald, meaning that we often end up
with empty logs.
Removes all references to the registry gcr.io/kubernetes-e2e-test-images in
kubernetes/kubernetes, replacing it with k8s.gcr.io/kubernetes-e2e-test-images.
In some cases, the images had to be updated since a few things have changed since
their original implementation, most notably that some of the images
have been centralized into the agnhost image.
Co-Authored-By: Claudiu Belu <cbelu@cloudbasesolutions.com>
- recover to last-known-good ConfigMap.KubeletConfigKey
~12m to run in CI, 13m locally
- non-nil last-known-good to a new non-nil last-known-good
~24m to run in CI
- recover to last-known-good ConfigMap
~12m to run in CI
- state transitions
~8m to run in CI
Including a skip method as the first line of a test does not prevent the test from failing in the BeforeEach function.
If the test is skipped because of a tag in the name, then we can prevent such odd behavior.
We now use a host local exec instead of SSH commands to simplify the
test and make the result more robust.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
This assumes that SSH via bastion works if the `KUBE_SSH_BASTION`
environment variable is set, which is the case for
`pull-kubernetes-e2e-gce-correctness`.
Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
1. fix command empty issue for some Windows storage tests
2. enable more Windows storage tests by adding ntfs test pattern
Change-Id: Ic33be282d669a23107474a14d4368bbf95c9b459
This adds an e2e test for HPA ContainerResource metrics. The test covers two scenarios:
1. Scale up on a busy application with an idle sidecar container.
2. Do not scale up on a busy sidecar with an idle application.
Signed-off-by: Vivek Singh <svivekkumar@vmware.com>
* Squashed commit of the following:
commit 7f774dcb54b511a3956aed0fac5c803f145e383a
Author: Jay Vyas (jayunit100) <jvyas@vmware.com>
Date: Fri Jun 18 10:58:16 2021 +0000
fix commit message
commit 0ac09650742f02004dbb227310057ea3760c4da9
Author: jay vyas <jvyas@vmware.com>
Date: Thu Jun 17 07:50:33 2021 -0400
Update test/e2e/network/netpol/kubemanager.go
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
commit 6a8bf0a6a2690dac56fec2bdcdce929311c513ca
Author: jay vyas <jvyas@vmware.com>
Date: Sun Jun 13 08:17:25 2021 -0400
Implement Service polling for network policy suite to remove reliance on CoreDNS when verifying network policies
Update test/e2e/network/netpol/probe.go
Co-authored-by: Antonio Ojea <antonio.ojea.garcia@gmail.com>
Add defaultNS to use service probe
commit b9c17a48327aab35a855540c2294a51137aa4a48
Author: Matthew Fenwick <mfenwick100@gmail.com>
Date: Thu May 27 07:30:59 2021 -0400
address code review comments for networkpolicy decoupling from dns
commit e23ef6ff0d189cf2ed80dbafed9881d68402cb56
Author: jay vyas <jvyas@vmware.com>
Date: Wed May 26 13:30:21 2021 -0400
NetworkPolicy decoupling from DNS
gofmt
remove old function
* model refactor
* minor
* dropped getK8sModel func
* dropped modelMap, added global model in BeforeEach and subsequent changes
Co-authored-by: Rajas Kakodkar <rajaskakodkar16@gmail.com>
Prevent Kubelet from incorrectly interpreting "not yet started" pods as "ready to terminate pods" by unifying responsibility for pod lifecycle into pod worker
Add e2e tests to cover the basic flows for the `full-pcpus-only` option:
negative flow to ensure rejection with proper error message, and
positive flow to verify the actual cpu allocation.
Co-authored-by: Swati Sehgal <swsehgal@redhat.com>
Signed-off-by: Francesco Romani <fromani@redhat.com>
Through Job.status.uncountedPodUIDs and a Pod finalizer
An annotation marks if a job should be tracked with new behavior
A separate work queue is used to remove finalizers from orphan pods.
Change-Id: I1862e930257a9d1f7f1b2b0a526ed15bc8c248ad
As of now, we allow PDBs to be applied to pods via
selectors, so there can be unmanaged pods (pods that
don't have backing controllers) that still have PDBs associated.
Such pods are now logged instead of immediately throwing
a sync error. This ensures the disruption controller is
not frequently updating the status subresource, thus
preventing excessive and expensive writes to etcd.
A number of race conditions exist when pods are terminated early in
their lifecycle because components in the kubelet need to know "no
running containers" or "containers can't be started from now on" but
were relying on outdated state.
Only the pod worker knows whether containers are being started for
a given pod, which is required to know when a pod is "terminated"
(no running containers, none coming). Move that responsibility and
podKiller function into the pod workers, and have everything that
was killing the pod go into the UpdatePod loop. Split syncPod into
three phases - setup, terminate containers, and cleanup pod - and
have transitions between those methods be visible to other
components. After this change, to kill a pod you tell the pod worker
to UpdatePod({UpdateType: SyncPodKill, Pod: pod}).
Several places in the kubelet were incorrect about whether they
were handling terminating (should stop running, might have
containers) or terminated (no running containers) pods. The pod worker
exposes methods that allow other loops to know when to set up or tear
down resources based on the state of the pod - these methods remove
the possibility of race conditions by ensuring a single component is
responsible for knowing each pod's allowed state and other components
simply delegate to checking whether they are in the window by UID.
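The single-owner idea can be sketched as a small state tracker keyed by pod UID. This is an illustrative sketch only: the type and method names (`podWorkers`, `startPod`, `requestKill`, `containersStopped`, `couldHaveRunningContainers`, `shouldContainersBeTerminating`) are hypothetical and do not reproduce the kubelet's real API. Other components never track pod state themselves; they only ask the tracker.

```go
package main

import (
	"fmt"
	"sync"
)

// podState models the three phases described above (hypothetical names).
type podState int

const (
	podSyncing     podState = iota // setup: containers may be started
	podTerminating                 // should stop running, might still have containers
	podTerminated                  // no running containers, none coming
)

// podWorkers is the single component that knows each pod's allowed state.
type podWorkers struct {
	mu     sync.Mutex
	states map[string]podState // keyed by pod UID
}

func newPodWorkers() *podWorkers {
	return &podWorkers{states: make(map[string]podState)}
}

// startPod registers a pod; a pod that is already known is left untouched,
// so a terminated pod can never move back to syncing.
func (w *podWorkers) startPod(uid string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if _, known := w.states[uid]; !known {
		w.states[uid] = podSyncing
	}
}

// requestKill moves a pod toward termination; state never moves backward.
func (w *podWorkers) requestKill(uid string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.states[uid] < podTerminating {
		w.states[uid] = podTerminating
	}
}

// containersStopped records that a terminating pod has no containers left.
func (w *podWorkers) containersStopped(uid string) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if s, known := w.states[uid]; known && s == podTerminating {
		w.states[uid] = podTerminated
	}
}

// couldHaveRunningContainers tells other loops whether resources must stay set up.
func (w *podWorkers) couldHaveRunningContainers(uid string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	s, known := w.states[uid]
	return known && s != podTerminated
}

// shouldContainersBeTerminating tells other loops that no new containers may start.
func (w *podWorkers) shouldContainersBeTerminating(uid string) bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	s, known := w.states[uid]
	return known && s >= podTerminating
}

func main() {
	w := newPodWorkers()
	w.startPod("uid-1")
	w.requestKill("uid-1")
	fmt.Println("terminating:", w.shouldContainersBeTerminating("uid-1"))
	w.containersStopped("uid-1")
	fmt.Println("could have containers:", w.couldHaveRunningContainers("uid-1"))
}
```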
Removing containers no longer blocks final pod deletion in the
API server and is handled as background cleanup. Node shutdown
no longer marks pods as failed as they can be restarted in the
next step.
See https://docs.google.com/document/d/1Pic5TPntdJnYfIpBeZndDelM-AbS4FN9H2GTLFhoJ04/edit# for details
UserInfo contains a uid field alongside groups, username and extra.
This change makes it possible to pass a UID through as an impersonation header like you
can with Impersonate-Group, Impersonate-User and Impersonate-Extra.
This PR contains:
* Changes to impersonation.go to parse the Impersonate-Uid header and authorize uid impersonation
* Unit tests for allowed and disallowed impersonation cases
* An integration test that creates a CertificateSigningRequest using impersonation,
and ensures that the API server populates the correct impersonated spec.uid upon creation.
1. add AllocateLoadBalancerNodePorts fields in specs for validation test cases
2. update fuzzer
3. in resource quota e2e, allocate node port for loadbalancer type service and
exceed the node port quota
Signed-off-by: Hanlin Shi <shihanlin9@gmail.com>
This change updates the CSR API to add a new, optional field called
expirationSeconds. This field is a request to the signer for the
maximum duration the client wishes the cert to have. The signer is
free to ignore this request based on its own internal policy. The
signers built into KCM will honor this field if it is not set to a
value greater than --cluster-signing-duration. The minimum allowed
value for this field is 600 seconds (ten minutes).
This change will help enforce safer durations for certificates in
the Kube ecosystem and will help related projects such as
cert-manager with their migration to the Kube CSR API.
Future enhancements may update the Kubelet to take advantage of this
field when it is configured in a way that can tolerate shorter
certificate lifespans with regular rotation.
Signed-off-by: Monis Khan <mok@vmware.com>
Ensure resources are created in zones with schedulable
nodes. For example, if we have 4 zones with 3 zones
having worker nodes and 1 zone having master nodes (unschedulable
for workloads), we should not create resources like PVs, PVCs or
pods in that zone.
We're running the ubernetes test
`should only be allowed to provision PDs in zones
where nodes exist`
on GCP and GKE. While the test is useful in exercising
the scenario of identifying an extra zone and
creating a node in it, not every Kube
distribution uses the same approach to create a node.
Furthermore, even if there is an extra zone, we cannot
guarantee that the zone has enough quota. There can also
be other GCP-specific edge cases that cannot all be
covered within this test. So, removing the test
as agreed upon with the storage team.
The data structure would wrap an embedded filesystem and the root
directory relative to which the embedded filesystem is constructed.
Signed-off-by: Nabarun Pal <pal.nabarun95@gmail.com>
We have been trying to fix https://github.com/kubernetes/kubernetes/issues/75355
for a long time, and we believe the current timeout could
actually be too low (despite being "forever", which is 30s).
To validate this theory, we set the timeout to one full minute.
Also, make the logging more verbose to make the troubleshooting easier.
Signed-off-by: Francesco Romani <fromani@redhat.com>
The PR https://github.com/kubernetes/kubernetes/pull/100041 updated
node-problem-detector to v0.8.7, but unfortunately we didn't also
update the image used in the e2e_node tests.
As a result, the tests were failing like this:
E2eNode Suite: [sig-node] NodeProblemDetector [NodeFeature:NodeProblemDetector] [Serial] SystemLogMonitor should generate node condition and events for corresponding errors
_output/local/go/src/k8s.io/kubernetes/test/e2e_node/node_problem_detector_linux.go:301
Timed out after 60.000s.
Expected success, but got an error:
<*errors.errorString | 0xc0011f2600>: {
s: "expected total number of events was 4, actual events counted was 7\nEvents
This in turn was one of the contributing factors in making the
pull-kubernetes-node-kubelet-serial lane fail constantly.
This patch updates the image used in the tests, fixing the failure.
Signed-off-by: Francesco Romani <fromani@redhat.com>
Some tests are checking the network connectivity using gomega.Consistently,
which will fail if any of the checks fails. This could lead to flakiness in
some scenarios in which kube-proxy was supposed to apply Policies for
Kubernetes services.
We can instead wait for the network connectivity to work first using gomega.Eventually,
after which we can check the consistency.
For manifest lists containing Windows images, it is important to also have the "os.version"
annotation set, as it is needed by the Windows nodes, so they can pull the appropriate image
from the list.
Previously, the docker manifest CLI did not have the capability to set it, so we had to set
it ourselves in the manifest list's image JSON file. This is no longer necessary since
docker 20.10.0, which includes docker manifest annotate --os-version.
The docker installed in the image gcr.io/k8s-testimages/gcb-docker-gcloud:v20210622-762366a
satisfies this version requirement.
The CPUManager graduated to beta a while ago (k8s 1.10?)
so let's get rid of the obsolete Alpha tag on its e2e tests.
Signed-off-by: Francesco Romani <fromani@redhat.com>
- verify memory manager data returned by `GetAllocatableResources`
- verify pod container memory manager data
Signed-off-by: Artyom Lukianov <alukiano@redhat.com>
In the case of multinode clusters, the http server pod and the test pod can
spawn on different nodes, which can be problematic for poststart / prestop hooks,
as they are executed by the kubelet itself, and the cross-node lifecycle hook might
fail (according to the Kubernetes network model, it is not mandatory for kubelet to
be able to access pods on a different node).
This commit ensures that the test pod spawns on the same node as the http server pod.
This test verifies an implementation detail in the in-tree gcepd
plugin. The behavior is not implemented in the gcepd CSI driver
and therefore the test will be obsolete after CSI migration.
Co-Authored-By: Riaan Kleinhans <riaan@ii.coop>
e2e test validates the following 3 extra endpoints
- patchAppsV1NamespacedStatefulSet
- listAppsV1StatefulSetForAllNamespaces
- deleteAppsV1CollectionNamespacedStatefulSet
Some CSI drivers can't clone a volume into another topology segment (e.g. a
cloud availability zone). The scheduler does not know about these
restrictions and schedules pods with PVCs that clone a volume mostly
randomly.
Run all volume cloning tests in the same topology segment, if such a segment
is available and has at least one schedulable node.
The MetricsGrabber itself now knows whether it supports each
component. The checks inside the tests are therefore redundant at best;
at worst, they are wrong: for example, on a KinD cluster the check for
"has master node registered" failed and metrics grabbing from
scheduler and controller manager were skipped unnecessarily.
The MetricsGrabber checked whether a component supported metrics
grabbing, but then tests didn't have an API to use the result of that
check. Because metrics grabbing is an optional debug feature, tests
must skip checks that depend on metrics data or, when the entire
test is about metrics data, skip the test.
This is now supported with a special error that gets wrapped and
returned by the individual Grab functions.
This can be checked by trying to retrieve log output. As in the case
of no pod found, a warning gets emitted when log retrieval fails and
metrics grabbing gets disabled.
Logging is checked instead of actual metrics retrieval because the
latter is more complex and thus more likely to fail for other reasons.
The previous approach with grabbing via an nginx proxy had some
drawbacks:
- it did not work when the pods only listened on localhost (as
configured by kubeadm) and the proxy got deployed on a different
node
- starting the proxy raced with starting the pods, causing
sporadic test failures because the proxy was not set up
properly unless it saw all pods when starting the e2e.test
- the proxy was always started, whether it is needed or not
- the proxy was left running after a test and then the next
test run triggered potentially confusing messages when
it failed to create objects for the proxy
The new approach is similar to "kubectl port-forward" + "kubectl get
--raw". It uses the port forwarding feature to establish a TCP
connection via a custom dialer, then lets client-go handle TLS and
credentials.
Somehow verifying the server certificate did not work. As this
shouldn't be a big concern for E2E testing, certificate checking gets
disabled on the client side instead of investigating this further.