Commit Graph

2264 Commits

Author SHA1 Message Date
Kubernetes Prow Robot
91aca10d59 Merge pull request #108958 from 249043822/e2e-density
Fix:[Flaky test] create a sequence of pods latency/resource should be within limit when create 10 pods with 50 background pods
2022-06-29 20:18:06 -07:00
Paul S. Schweigert
b6675fce4a fix link to eviction policy in e2enode eviction test
Signed-off-by: Paul S. Schweigert <paulschw@us.ibm.com>
2022-06-29 19:23:49 -04:00
ZhangKe10140699
a945b6f066 Fix:[Flaky test] ci-kubernetes-node-kubelet-serial-cri-o job: [sig-node] Density [Serial] [Slow] create a sequence of pods latency/resource should be within limit when create 10 pods with 50 background pods 2022-06-22 08:14:43 +08:00
David Porter
b4b338d4eb test: update graceful node shutdown e2e with watch
Use a watch to detect invalid pod status updates in graceful node
shutdown node e2e test. By using a watch, all pod updates will be
captured while the previous logic required polling the api-server which
could miss some intermediate updates.

Signed-off-by: David Porter <david@porter.me>
2022-06-08 16:19:16 -07:00
Kubernetes Prow Robot
19ca12cb3e Merge pull request #109820 from fromanirh/e2e-node-enable-device-plugin-test
e2e: node: re-enable the device plugin tests
2022-06-01 12:03:40 -07:00
Davanum Srinivas
50bea1dad8 Move from k8s.gcr.io to registry.k8s.io
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-05-31 10:16:53 -04:00
Francesco Romani
f3e157d168 e2e: node: re-enable the device plugin tests
Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 16:05:13 +02:00
Francesco Romani
48b5af49e0 e2e: node: reorder imports
trivial cleanup

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 16:04:01 +02:00
Francesco Romani
98eb6db7c0 e2e: node: fix plugins directory
Previously, the e2e test was overriding the plugins socket directory to
"/var/lib/kubelet/plugins_registry". This seems wrong, and with that
setting the e2e test was already failing, because the registration
process was timing out, in turn because the kubelet was trying to call
back the device plugin in the wrong place (see below for details).

I can't explain why it worked before - or if it worked at all - but
it really seems that `pluginapi.DevicePluginPath` is the right
setting here.

+++

In a nutshell, the device plugin registration process works like this:

1. The kubelet runs and creates the device plugin socket registration
   endpoint:
	KubeletSocket = DevicePluginPath + "kubelet.sock"
	DevicePluginPath = "/var/lib/kubelet/device-plugins/"
2. Each device plugin will listen to an ENDPOINT the kubelet will connect
   back to.  IOW the kubelet will act like a client to each device plugin,
   to perform allocation requests (and more).
   Each device plugin will serve from an endpoint.
   The endpoint name is plugin-specific, but they all must be inside a
   well-known directory: pluginapi.DevicePluginPath
3. The kubelet creates the device plugin pod, like any other pod
4. During the startup, each device plugin wants to register itself in the
   kubelet. So it sends a request through
   the registration endpoint. Key details:
	grpc.Dial(kubelet registration socket)
	registration request
	reqt := &pluginapi.RegisterRequest{
		Version:      pluginapi.Version,
		Endpoint:     endpointSocket,	<- socket relative to pluginapi.DevicePluginPath
		ResourceName: resourceName, 	<- resource name to be exposed
	}
5. While handling the registration request, the kubelet dials back the
   device plugin on socketDir + req.Endpoint.
   But socketDir is hardcoded in the device manager code to
   pluginapi.KubeletSocket
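
The path mismatch described in the steps above can be sketched as follows. This is only an illustration, not the kubelet's code: the two constants are the values quoted in this message, while `listenPath`, `dialBackPath`, and the `sample.sock` endpoint name are hypothetical, and the sketch assumes the dial-back directory is the directory that contains the kubelet socket.

```go
package main

import (
	"fmt"
	"path"
)

// Values quoted in the commit message above.
const (
	devicePluginPath = "/var/lib/kubelet/device-plugins/"
	kubeletSocket    = devicePluginPath + "kubelet.sock"
)

// listenPath is where a plugin serves when it is told to use pluginDir.
// Hypothetical helper, for illustration only.
func listenPath(pluginDir, endpoint string) string {
	return path.Join(pluginDir, endpoint)
}

// dialBackPath is where the kubelet looks for the plugin.
// ASSUMPTION: the socket directory is the directory of the kubelet socket.
func dialBackPath(endpoint string) string {
	return path.Join(path.Dir(kubeletSocket), endpoint)
}

func main() {
	const endpoint = "sample.sock" // hypothetical endpoint name
	dirs := []string{"/var/lib/kubelet/plugins_registry", devicePluginPath}
	for _, dir := range dirs {
		reachable := listenPath(dir, endpoint) == dialBackPath(endpoint)
		fmt.Printf("plugin dir %s reachable=%v\n", dir, reachable)
	}
}
```

Under these assumptions, only a plugin serving from `DevicePluginPath` ends up where the kubelet dials back, which matches the timeout seen with the overridden directory.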

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 16:03:50 +02:00
Francesco Romani
23147ff4b3 e2e: node: devplugin: tolerate node readiness flip
In the AfterEach check of the e2e node device plugin tests,
the tests really want to clean up after themselves:
- delete the sample device plugin
- restart the kubelet again
- ensure that after the restart, no stale sample devices
  (provided by the sample device plugin) are reported anymore.

We observed that in the AfterEach block of these e2e tests
the kubelet readiness state flip/flops quite reliably,
possibly related to a race with a slow runtime/PLEG check.

What happens is that the kubelet readiness state is true,
but goes false for a quick interval and then goes true again
and it's pretty stable after that (observed by adding more logs
to the check loop).

The key factor here is that the function `getLocalNode` aborts the
test (as in `framework.ExpectNoError`) if the node state is
not ready. So any occurrence of this scenario, even if it
is transient, will cause a test failure. I believe this will
make the e2e test unnecessarily fragile without making it more
correct.

For the purpose of the test we can tolerate these kinds of glitches,
with the kubelet flip/flopping the ready state, provided that we eventually
meet the final desired condition in which the node reports
ready AND reports no sample devices present - which was the condition
the code was trying to check.

So, we add a variant of `getLocalNode`, which just fetches the
node object the e2e_node framework created, alongside a flag
reporting the node readiness. The new helper does not implicitly
abort the test if the node is not ready; it just bubbles
up this information.
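
A minimal sketch of the idea, not the helper that was actually added: every name here (`nodeSnapshot`, `nodeAndReadiness`, `waitForCleanNode`, `replay`) is hypothetical, and the node is modelled as a plain struct rather than the real API object.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeSnapshot is a hypothetical stand-in for the node object.
type nodeSnapshot struct {
	Ready         bool
	SampleDevices int
}

// nodeAndReadiness returns the observed node together with its readiness,
// leaving it to the caller to decide what a not-ready node means.
func nodeAndReadiness(observe func() nodeSnapshot) (nodeSnapshot, bool) {
	n := observe()
	return n, n.Ready
}

// waitForCleanNode tolerates transient not-ready observations and succeeds
// once the node is ready AND reports no sample devices.
func waitForCleanNode(observe func() nodeSnapshot, attempts int) error {
	for i := 0; i < attempts; i++ {
		n, ready := nodeAndReadiness(observe)
		if ready && n.SampleDevices == 0 {
			return nil
		}
	}
	return errors.New("node never became ready with no sample devices")
}

// replay yields the given snapshots in order, repeating the last one.
func replay(snaps ...nodeSnapshot) func() nodeSnapshot {
	i := 0
	return func() nodeSnapshot {
		s := snaps[i]
		if i < len(snaps)-1 {
			i++
		}
		return s
	}
}

func main() {
	// Ready with stale devices, then a readiness flip, then the desired state.
	observe := replay(nodeSnapshot{true, 2}, nodeSnapshot{false, 0}, nodeSnapshot{true, 0})
	fmt.Println("error:", waitForCleanNode(observe, 5))
}
```

The point of the design is that the not-ready observation in the middle is just another loop iteration rather than a test abort.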

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 14:22:25 +02:00
Francesco Romani
56c539bff0 e2e: node: deviceplug: deepcopy the pod dev template
Let's avoid unexpected side effects

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-16 14:22:24 +02:00
Kubernetes Prow Robot
1a6adee3d6 Merge pull request #109753 from matthyx/109577
do not install docker with curl
2022-05-13 07:33:49 -07:00
Matthias Bertschy
d42321dc05 recommend containerd instead of docker, cleanup 2022-05-13 15:25:15 +02:00
Jordan Liggitt
2a10ca650d drop vendor from test targets 2022-05-05 08:47:33 -04:00
Kubernetes Prow Robot
f0928952d7 Merge pull request #109770 from fromanirh/e2e-node-device-plugin-skip
e2e: node: explicit skip for device plugin tests
2022-05-05 01:43:37 -07:00
Francesco Romani
19ae360af9 e2e: node: inline getSampleDevicePluginPod
Starting with golangci-lint >= 1.45, the tool complains
about the function being unused:
```bash
test/e2e_node/device_plugin_test.go:82:6: func `getSampleDevicePluginPod` is unused (unused)
func getSampleDevicePluginPod() *v1.Pod {
     ^

Please review the above warnings. You can test via "./hack/verify-golangci-lint.sh"
If the above warnings do not make sense, you can exempt this warning with a comment
 (if your reviewer is okay with it).
In general please prefer to fix the error, we have already disabled specific lints
 that the project chooses to ignore.
See: https://golangci-lint.run/usage/false-positives/}
```

The thing is, the code has not changed lately, and manual inspection trivially
confirms it is used.
Older versions of golangci-lint (tested with
```
golangci-lint has version 1.41.1 built from a2074809 on 2021-06-19T16:01:50Z
```
) indeed do NOT complain about the function, so this seems to be a golangci-lint
bug.

To move forward, we can disable the warning, but this leaves a sour
taste.
Instead, since the function is pretty trivial, was used just once and the caller
was undoing some of the work done by the function, we just inline it,
which solves the linter warning and makes the code a bit better.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-04 17:05:00 +02:00
Stephen Benjamin
b351745c1c Replace use of Sprintf with net.JoinHostPort
On IPv6 clusters, one of the most frequent problems I encounter is
assumptions that one can build a URL with a host and port simply by
using Sprintf, like this:

```go
fmt.Sprintf("http://%s:%d/foo", host, port)
```

When `host` is an IPv6 address, this produces an invalid URL as it must
be bracketed, like this:

```
http://[2001:4860:4860::8888]:9443
```

This change fixes the occurrences of joining a host and port by using the
purpose-built `net.JoinHostPort` function.

I encounter this problem often enough that I started to [write a linter
for it](https://github.com/stbenjam/go-sprintf-host-port).  I don't
think the linter is quite ready for wide use yet, but I did run it
against the Kube codebase and found these.  While the host portion in
some of these changes may always be an FQDN or IPv4 IP today, it's an
easy thing that can break later on.
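
To illustrate the difference, here is a small sketch comparing the two approaches. The helper names `naiveURL` and `safeURL` are made up for this example and do not come from the change itself; `net.JoinHostPort` and `strconv.Itoa` are standard library functions.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// naiveURL builds the URL with Sprintf, the pattern this change removes.
func naiveURL(host string, port int) string {
	return fmt.Sprintf("http://%s:%d/foo", host, port)
}

// safeURL uses net.JoinHostPort, which brackets IPv6 literals.
func safeURL(host string, port int) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/foo"
}

func main() {
	hosts := []string{"example.com", "192.0.2.1", "2001:4860:4860::8888"}
	for _, host := range hosts {
		fmt.Println("naive:", naiveURL(host, 9443))
		fmt.Println("safe: ", safeURL(host, 9443))
	}
}
```

For host names and IPv4 addresses both helpers return the same string; only the IPv6 literal differs, where `safeURL` adds the required brackets.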
2022-05-04 06:37:50 -04:00
Kubernetes Prow Robot
c1ad54dfe3 Merge pull request #109649 from pohly/e2e-feature-gates
e2e: move feature gate support from test/e2e to test/e2e_node
2022-05-04 02:35:18 -07:00
Kubernetes Prow Robot
2a0d2331a8 Merge pull request #109574 from endocrimes/dani/e2e_node-approver
sig-node: endocrimes as e2e_node approver
2022-05-04 02:34:11 -07:00
Kubernetes Prow Robot
1347e560ec Merge pull request #109572 from endocrimes/dani/remote-docker
e2e_node: remote runner: Require containerd/crio
2022-05-04 02:34:03 -07:00
Kubernetes Prow Robot
a637604399 Merge pull request #109571 from endocrimes/dani/cleanup-e2e-node
e2e_node: Cleanup old unused jenkins scripts/config
2022-05-04 02:33:56 -07:00
Kubernetes Prow Robot
35de9f5027 Merge pull request #109410 from dims/set-default-flake-attempt-to-one
Set default flake attempt to 1 (not 2)
2022-05-04 01:27:30 -07:00
Kubernetes Prow Robot
e5115587b3 Merge pull request #109322 from hoskeri/conformance-test-healthz
conformance-test: use kubelet healthz port.
2022-05-03 19:31:06 -07:00
Kubernetes Prow Robot
6ffd13f460 Merge pull request #107819 from matthyx/107505
Replace dbus-send for fake PrepareForShutdown message
2022-05-03 17:18:39 -07:00
Kubernetes Prow Robot
ea7c57b2ee Merge pull request #99685 from yangjunmyfm192085/run-test24
Fix misspelling of success.
2022-05-03 17:16:47 -07:00
Francesco Romani
017998e889 e2e: node: explicit skip for device plugin tests
The device plugin e2e tests were failing lately, and to unblock the
release a skip was added in the prow job configuration:
71cf119c84/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml (L401)

The problem here is not only the broken test, which needs to be
fixed, but also the fact that this is the only skip (for a specific
test) we do this way, which is surprising (xref:
https://github.com/kubernetes/kubernetes/issues/106635#issuecomment-1105627265)

As a next step towards improvement, we add an explicit skip in the tests
proper. This at least makes it more obvious that these tests need more work,
and allows us to remove the edge case in the prow configuration.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2022-05-03 18:44:11 +02:00
Patrick Ohly
2664740043 e2e: move feature gate support from test/e2e to test/e2e_node
The test/e2e suite has never supported feature gates:
- it cannot discover at runtime how the cluster is configured
- its --feature-gates parameter had no effect

Despite that, tests were written that used
e2eskipper.SkipUnlessFeatureGateEnabled even though that function then only
checked the default feature gate state.  To catch such mistakes, e2e test
suites now must explicitly enable feature gate checking via
e2eskipper.InitFeatureGates. They also must register their own command line
flag. When that is not done, then using SkipUnlessFeatureGateEnabled or
SkipIfFeatureGateEnabled leads to a test failure.

test/e2e_node does both and therefore continues to work as before.
2022-04-25 15:41:41 +02:00
Abhijit Hoskeri
49dc59873b e2e_node/{service,util}: use kubelet healthz port.
The readonly port could be disabled.

Since we are only using the /healthz endpoint,
we can use the healthz port for this.
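
As a rough sketch of the URL such a check would target: `healthzURL` is a hypothetical helper, not the code in this change, and the constant assumes the kubelet's default healthz port of 10248 on localhost, which a given setup may override.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// ASSUMPTION: the kubelet default healthz port; a node may configure another.
const defaultHealthzPort = 10248

// healthzURL builds the /healthz URL for the given host and port.
// Hypothetical helper, for illustration only.
func healthzURL(host string, port int) string {
	return "http://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/healthz"
}

func main() {
	fmt.Println(healthzURL("127.0.0.1", defaultHealthzPort))
}
```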

Change-Id: Ie0e05a5ab4ec6f51e4d3c63226aa23c1b3a69956
2022-04-22 16:14:31 -07:00
Danielle Lancashire
d6c184084c sig-node: endocrimes as e2e_node approver 2022-04-20 17:12:09 +00:00
Danielle Lancashire
0e0e3113e2 e2e_node: remote runner: Require containerd/crio 2022-04-20 16:49:29 +00:00
Danielle Lancashire
7151ff8d5c e2e_node: remove jenkins docker_validation 2022-04-20 16:16:57 +00:00
Danielle Lancashire
3e0041b5b9 e2e_node: remove copy-e2e-image.sh
This script is unused, and the project that was formerly used for e2e
node images is in the process of being removed.
2022-04-20 16:15:25 +00:00
Danielle Lancashire
d90ba453ce e2e_node: remove unused jenkins runner script 2022-04-20 16:15:15 +00:00
Danielle Lancashire
8333bcc6ab e2e_node: remove unused jenkins/coreos-init.json 2022-04-20 16:11:36 +00:00
Abhijit Hoskeri
ea6e653db1 conformance-test: use kubelet healthz port.
The readonly port could be disabled.

Since we are only using the /healthz endpoint,
we can use the healthz port.

Change-Id: If004f2888ca5847b9e2d8c02d5615bed52d94b24
2022-04-11 16:57:29 -07:00
Davanum Srinivas
984037d4f7 Set default flake attempt to 1 (not 2)
Signed-off-by: Davanum Srinivas <davanum@gmail.com>
2022-04-10 20:24:17 -04:00
Sergiusz Urbaniak
1495c9f2cd test/e2e/*: default existing tests to privileged pod security policy
This is to ensure that all existing tests don't break when defaulting
the pod security policy to restricted in the e2e test framework.
2022-04-05 08:41:12 +02:00
Maciej Wyrzuc
1108bed763 Revert "Field status.hostIPs added for Pod (#101566)"
This reverts commit 61b3c028ba.
2022-03-31 12:39:45 +00:00
Kubernetes Prow Robot
9e65ee3908 Merge pull request #109097 from pacoxu/fix-sig-node-failures
remove baseline pod security in host pid sharing testing
2022-03-29 20:36:37 -07:00
Kubernetes Prow Robot
4b3ddcf793 Merge pull request #108909 from saschagrunert/wip-crio
Inject SSH public key into CRI-O serial prow jobs
2022-03-29 17:35:42 -07:00
Kubernetes Prow Robot
b0254c8a0b Merge pull request #108758 from fengzixu/improvement-volume-health
re-push "add volume kubelet_volume_stats_health_abnormal to kubelet #105585"
2022-03-29 17:35:34 -07:00
Shiming Zhang
61b3c028ba Field status.hostIPs added for Pod (#101566)
* Add FeatureGate PodHostIPs

* Add HostIPs field and update PodIPs field

* Types conversion

* Add dropDisabledStatusFields

* Add HostIPs for kubelet

* Add fuzzer for PodStatus

* Add status.hostIPs in ConvertDownwardAPIFieldLabel

* Add status.hostIPs in validEnvDownwardAPIFieldPathExpressions

* Downward API support for status.hostIPs

* Add DownwardAPI validation for status.hostIPs

* Add e2e to check that hostIPs works

* Add e2e to check that Downward API works

* Regenerate
2022-03-29 11:46:07 -07:00
Paco Xu
4e96009c15 use privileged enforce level in host pid sharing testing 2022-03-29 15:51:33 +08:00
Kir Kolyshkin
37761a329e pkg/kubelet: changes to update runc to 1.1.0
The changes (mostly in pkg/kubelet/cm) are there to adapt to the changed
runc 1.1 API, and simplify things a bit. In particular:

1. simplify cgroup manager instantiation, using a new, easier way of
   libcontainers/cgroups/manager.New;

2. replace libcontainerAdapter with a boolean variable (all it did
   was passing on whether systemd manager should be used);

3. trivial change due to removed cgroupfs.HugePageSizes and added
    cgroups.HugePageSizes();

4. do not calculate cgroup paths in update / destroy, since libcontainer
   cgroup managers now calculate the paths upon creation (previously,
   they were doing that only in Apply, so using e.g. Set or Destroy right
   after creation was impossible without specifying paths).

We currently still calculate cgroup paths in Exists -- this is to be
addressed separately.

Co-Authored-By: Elana Hashman <ehashman@redhat.com>
2022-03-28 16:23:20 -07:00
Sergiusz Urbaniak
373c08e0c7 test/e2e/framework: configure pod security admission level for e2e tests 2022-03-28 15:42:10 +02:00
Sascha Grunert
57a3ce1a3e Inject SSH public key into CRI-O serial prow jobs
This allows using the `GCE_SSH_PUBLIC_KEY_FILE_CONTENT` placeholder to
inject the public SSH key for running the tests.

Signed-off-by: Sascha Grunert <sgrunert@redhat.com>
2022-03-25 08:23:57 +01:00
Andrew Sy Kim
45e6498fc5 test/e2e_node/plugins/gcp-credential-provider: update Test_getCredentials to validate against v1beta1 kubelet APIs
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-03-24 23:29:13 -04:00
Andrew Sy Kim
3600a7a355 test/e2e_node: update test plugin to use v1beta1 kubelet APIs
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-03-24 23:29:06 -04:00
Andrew Sy Kim
ef3c4fb3cd test/e2e_node: update credential provider config to use v1beta1 kubelet
APIs

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
2022-03-24 23:28:37 -04:00
Aditi Sharma
51cd36cf80 prepend credential provider flags on ubuntu also 2022-03-24 14:08:41 +05:30