Commit Graph

16259 Commits

Author SHA1 Message Date
Jan Chaloupka
7b5534021c Collect some of scheduling metrics and scheduling throughput
In addition to getting overall performance measurements from the golang benchmark,
collect metrics that provide information about the internals of the scheduler itself.
This is a first step towards improving what we collect about the scheduler.

Metrics in question:
- scheduler_scheduling_algorithm_predicate_evaluation_seconds
- scheduler_scheduling_algorithm_priority_evaluation_seconds
- scheduler_binding_duration_seconds
- scheduler_e2e_scheduling_duration_seconds

Scheduling throughput is computed on the fly inside perfScheduling.
2020-02-13 13:32:09 +01:00
tanjunchen
15bc88785a test/e2e/framework/util.go:make function LookForString private 2020-02-13 20:29:27 +08:00
Francesco Romani
08ba240c6b e2e: e2e_node: refactor getCurrentKubeletConfig
This patch moves the helper function getCurrentKubeletConfig,
used in both e2e and e2e_node tests and previously duplicated,
into the common framework.

There are no intended changes in behaviour.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-13 12:53:15 +01:00
Kubernetes Prow Robot
8ca96f3e07 Merge pull request #80724 from cceckman/provider-info-e2e
Provide OIDC discovery for service account token issuer
2020-02-13 01:38:35 -08:00
Maciej Borsz
c78c0e949d Remove unnecessary calls to GCE API after PD is created. 2020-02-13 09:57:25 +01:00
Jan Safranek
528adbefe4 Remove client cleanup from TestCleanup
All tests remove the test client pod, usually in TestVolumeClient.
Rename TestCleanup to TestServerCleanup.
In addition, remove a few calls to Test(Server)Cleanup that do not do anything
useful (server pod is not used in these tests).
2020-02-13 09:55:53 +01:00
Shukun
7f9e228bd7 Fix impossible condition in test/e2e/framework/resource_usage_gatherer.go 2020-02-13 16:13:19 +09:00
Kubernetes Prow Robot
fb6f5d739b Merge pull request #88070 from sshukun/remove-tautological-condition
Remove tautological condition in test/e2e/framework/pod/resource.go
2020-02-12 21:37:35 -08:00
Kubernetes Prow Robot
f0c14f291f Merge pull request #87751 from skilxn-go/Rename
[Scheduler Framework] Rename `PostFilter` plugin to `PreScore`
2020-02-12 21:37:12 -08:00
Kubernetes Prow Robot
a11a8b8691 Merge pull request #87714 from julianvmodesto/use-kubectl-ss-dry-run-flag
Use --dry-run=server in kubectl commands
2020-02-12 21:36:57 -08:00
Kubernetes Prow Robot
1aa21639cf Merge pull request #88064 from wongma7/webhook-header
Initialize http Request Header before RoundTrip to avoid panic
2020-02-12 19:54:35 -08:00
Kubernetes Prow Robot
4ab8c5393f Merge pull request #88059 from msau42/refactor-e2e-node-selection
Refactor e2e node selection
2020-02-12 17:54:58 -08:00
Julian V. Modesto
13b80b48cd Use --dry-run=client,server in kubectl.
- Support --dry-run=server for subcommands apply, run, create, annotate,
expose, patch, label, autoscale, apply set-last-applied, drain, rollout undo
- Support --dry-run=server for set subcommands
  - image
  - resources
  - serviceaccount
  - selector
  - env
  - subject
- Support --dry-run=server for create subcommands.
  - clusterrole
  - clusterrolebinding
  - configmap
  - cronjob
  - job
  - deployment
  - namespace
  - poddisruptionbudget
  - priorityclass
  - quota
  - role
  - rolebinding
  - service
  - secret
  - serviceaccount
- Remove GetClientSideDryRun
2020-02-12 20:46:54 -05:00
Walter Fender
e8f67d122f Fix gce-cos-master-reboot test
Adding additional steps to network restart to ensure it restarts.
Also directing output to serial port to make the test debuggable.
2020-02-12 17:07:41 -08:00
drfish
5782d616f2 Move skip method from e2e fw ginkgowrapper to e2e skipper fw 2020-02-13 06:25:43 +08:00
skilxn-go
f5b7e3cca3 Rename PostFilter plugin to PreScore 2020-02-12 23:25:08 +08:00
Kubernetes Prow Robot
460fdc7f48 Merge pull request #87057 from oomichi/add-debugging-msg-issue86678
Add logs of port-forward-tester pod
2020-02-12 05:22:51 -08:00
Shukun
0421b40d79 Remove tautological condition in test/e2e/framework/pod/resource.go 2020-02-12 20:49:24 +09:00
Kubernetes Prow Robot
9f58fb790c Merge pull request #88033 from dims/avoid-running-docker-specific-test-in-containerd
Avoid running docker specific test in containerd
2020-02-11 23:16:33 -08:00
Kubernetes Prow Robot
6c074f819c Merge pull request #88003 from misterikkit/vsphere-tags
Add missing tag to vSphere storage E2E tests
2020-02-11 23:15:44 -08:00
Kubernetes Prow Robot
8f07f3a156 Merge pull request #87943 from tanjunchen/move-funcs001
test/e2e/framework:move functions to test/e2e/scheduling/
2020-02-11 23:15:34 -08:00
Kubernetes Prow Robot
c9d4257cbc Merge pull request #87819 from mortent/SerialFlakyPDBTests
Make DisruptionController eviction tests serial to avoid flakes
2020-02-11 23:14:55 -08:00
Kubernetes Prow Robot
52fb02fdbe Merge pull request #87718 from wojtek-t/kubelet_not_watching_immutable_secret_configmaps
WatchBasedManager stops watching immutable objects
2020-02-11 23:14:33 -08:00
Matthew Wong
c048fb19fc Initialize http Request Header before RoundTrip to avoid panic 2020-02-12 06:55:37 +00:00
Michelle Au
d9184b75c9 Convert volume.TestConfig to use NodeSelection
Change-Id: I6adbb53b65e4a4f7e220fc0d91a26dc6bc135c36
2020-02-11 21:13:42 -08:00
Michelle Au
76a4a34dae Pass NodeSelection directly into e2e testsuites so that tests can use them more consistently
Change-Id: I99c8c1d8535a2a2319fbe8216b953c14a56f2763
2020-02-11 20:25:24 -08:00
Jordan Liggitt
242e3ebf01 Add buffer for GC resync retry to GC e2e tests 2020-02-11 22:31:09 -05:00
Michelle Au
fb9f02b5e1 Don't set NodeName directly in Pods so that it still goes through the scheduler
Change-Id: I244b6aac0289a13339f3ac228c4ad9ecf8c07b42
2020-02-11 19:17:41 -08:00
Charles Eckman
5a176ac772 Provide OIDC discovery endpoints
- Add handlers for service account issuer metadata.
- Add option to manually override JWKS URI.
- Add unit and integration tests.
- Add a separate ServiceAccountIssuerDiscovery feature gate.

Additional notes:
- If not explicitly overridden, the JWKS URI will be based on
  the API server's external address and port.

- The metadata server is configured with the validating key set rather
than the signing key set. This allows for key rotation because tokens
can still be validated by the keys exposed in the JWKs URL, even if the
signing key has been rotated (note this may still be a short window if
tokens have short lifetimes).

- The trust model of OIDC discovery requires that the relying party
fetch the issuer metadata via HTTPS; the trust of the issuer metadata
comes from the server presenting a TLS certificate with a trust chain
back to the relying party's root(s) of trust. For tests, we use
a local issuer (https://kubernetes.default.svc) for the certificate
so that workloads within the cluster can authenticate it when fetching
OIDC metadata. An API server cannot validly claim https://kubernetes.io,
but within the cluster, it is the authority for kubernetes.default.svc,
according to the in-cluster config.

Co-authored-by: Michael Taufen <mtaufen@google.com>
2020-02-11 16:23:31 -08:00
Davanum Srinivas
f26dbc473d Avoid running docker specific test in containerd 2020-02-11 14:32:18 -05:00
Jan Safranek
2430c48c10 Delete pod in volume tests
All storage e2e tests should delete pods they use so we can identify issues
on volume cleanup easily.
2020-02-11 12:54:38 +01:00
Kubernetes Prow Robot
dc8208dddc Merge pull request #87871 from msau42/fix-hostexec
Use NodeSelector instead of NodeName in hostexec Pod
2020-02-10 20:44:01 -08:00
Kubernetes Prow Robot
f8f6229d77 Merge pull request #87950 from tanjunchen/fix-no-non-ascii-characters-/test
test/ : fix non-ascii characters
2020-02-10 17:22:15 -08:00
Kubernetes Prow Robot
921ef35e64 Merge pull request #87949 from 928234269/non_ascii_01
Fix non-ascii characters in test/e2e_node and test/network.
2020-02-10 17:22:01 -08:00
Michelle Au
1ee35e788e Use NodeSelector instead of NodeName in hostexec Pod so that the Pod runs through the scheduler
Change-Id: Ia2f7ad39af318bbe707b43dfea706293ecdf5203
2020-02-10 15:36:04 -08:00
Jonathan Basseri
09121d9686 Add missing tag to vSphere storage E2E tests
This adds the [Feature:vsphere] tag to those vSphere tests which were
missing it. This makes it easier to specifically target the vSphere
storage E2E test suite.
2020-02-10 14:48:55 -08:00
Francesco Romani
70cce5e3f1 e2e: topomgr: introduce sriov setup/teardown funcs
Reorganize the code with setup and teardown functions,
to make room for the future addition of more device plugin
support, and to make the code a bit tidier.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
2f0a6d2c76 e2e: topomgr: use constants for test limits
Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
fee1dba054 e2e: topomgr: improve the test logs
Clarify which test is doing what, to make
the test output easier to understand.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
83c344647f e2e: topomgr: better check for AffinityError
Add a helper function to check if a Pod failed
admission for Topology Affinity Error.
So far we only check the Status.Reason.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
512a4e8a3e e2e: topomgr: reduce node readiness timeout
Five minutes was initially used only to be overcautious.
In my experiments, the node is usually ready in less than a minute.
Double it to give some buffer space.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:54 +01:00
Francesco Romani
3b4122bd03 e2e: topomgr: get and use topology hints from conf
To properly implement some e2e tests, we need to know
some basic topology facts about the system running the tests.
The bare minimum we need to know is how many PCI SRIOV devices
are attached to which NUMA node.

This way we know which core we can reserve for kube services,
and which NUMA socket we can take to test full socket reservation.

To let the tests know the PCI device topology, we use annotations
in the SRIOV device plugin ConfigMap we need anyway.
The format is

```yaml
  metadata:
    annotations:
      pcidevice_node0: "2"
      pcidevice_node1: "0"
```

with one annotation per NUMA node in the system.
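
A minimal sketch of how a test could read these annotations, assuming the keys follow the `pcidevice_node<N>` pattern shown above and the values are decimal device counts. The helper name `pciDevicesPerNUMANode` is hypothetical and is not taken from the patch.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// pciDevicesPerNUMANode is a hypothetical helper. It scans ConfigMap
// annotations for keys of the form "pcidevice_node<N>" and returns a map
// from NUMA node index to the number of PCI devices attached to it.
func pciDevicesPerNUMANode(annotations map[string]string) (map[int]int, error) {
	const prefix = "pcidevice_node"
	counts := make(map[int]int)
	for key, value := range annotations {
		if !strings.HasPrefix(key, prefix) {
			continue // unrelated annotation
		}
		node, err := strconv.Atoi(strings.TrimPrefix(key, prefix))
		if err != nil {
			return nil, fmt.Errorf("bad NUMA node in annotation %q: %w", key, err)
		}
		count, err := strconv.Atoi(value)
		if err != nil {
			return nil, fmt.Errorf("bad device count in annotation %q: %w", key, err)
		}
		counts[node] = count
	}
	return counts, nil
}

func main() {
	counts, err := pciDevicesPerNUMANode(map[string]string{
		"pcidevice_node0": "2",
		"pcidevice_node1": "0",
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(counts[0], counts[1])
}
```

Returning an error on malformed keys or values, rather than skipping them, makes a misconfigured ConfigMap fail the test setup early instead of silently producing wrong topology facts.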

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
d9d652e867 e2e: topomgr: initial negative tests
A negative test requests a guaranteed (gu) Pod that we know the system
cannot fulfill, so we expect rejection from the topology manager.

Unfortunately, besides the trivial case of excessive cores (requesting
more cores than a NUMA node provides), we cannot easily test the
devices, because crafting a proper pod requires detailed knowledge
of the hardware topology.

Let's consider a hypothetical two-node NUMA system with two PCIe buses,
one per NUMA node, with an SRIOV device on each bus.
A proper negative test would require two SRIOV devices, which the system
can provide, but not on the same NUMA node.
Requiring for example three devices (one more than the system provides)
will lead to a different, legitimate admission error.

For these reasons we bootstrap the testing infra for the negative tests,
but we add just the simplest one.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
ee92b4aae0 e2e: topomgr: add more positive tests
This patch builds on the topology manager e2e infrastructure to
add more positive e2e test cases.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
1b5801a086 e2e: topomgr: add option to specify the SRIOV conf
We cannot anticipate all the possible configurations
needed by the SRIOV device plugin: there is too much variety.

Hence, we need to allow the test environment to supply
a host-specific ConfigMap to properly configure the device
plugin and avoid false negatives.

We still provide the default ConfigMap as a fallback and reference.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
6687fcc78c e2e: topomgr: autodetect SRIOV resource to use
The SRIOV device plugin can create different resources depending
on both the hardware present on the system and the configuration.
As long as we have at least one SRIOV device, the tests don't actually
care which specific device it is.

Previously, the test hardcoded the most common Intel SRIOV device
identifier. This patch lifts the restriction and lets the test
autodetect and use what's available.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
fa26fb6817 e2e: topomgr: check pod resource alignment
This patch extends and completes the previously-added
empty topology manager test for single-NUMA node policy
by adding reporting in the test pod and checking
the resource alignment.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
cd7e3d626c e2e: topomgr: add test infra
This patch adds all the testing infra and utilities needed
to run e2e topology manager tests. This includes setting up
a guaranteed pod which needs some devices.

The simplest real devices available for the purpose
are SRIOV devices, hence we use them.

This patch pulls the SRIOV device plugin from
the official, yet external, repository.
We follow the approach used for the NVIDIA GPU plugin as closely as possible.

This patch also performs minor refactoring for some
test framework utilities, needed to support the new
e2e tests.

Finally, we add an empty e2e topology manager test,
to be completed by the next patch.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
Francesco Romani
1fdf262137 e2e: topomgr: explicitly save the kubelet config
For the sake of readability, save the old Kubelet config
once.

Signed-off-by: Francesco Romani <fromani@redhat.com>
2020-02-10 22:47:53 +01:00
marosset
a4d7a67bbd Run Windows kubelet stats e2e tests serially because they need to start many pods on a single node 2020-02-10 17:56:33 +00:00