Commit Graph

56 Commits

Author SHA1 Message Date
Sergiusz Urbaniak
1495c9f2cd
test/e2e/*: default existing tests to privileged pod security policy
This is to ensure that all existing tests don't break when defaulting
the pod security policy to restricted in the e2e test framework.
2022-04-05 08:41:12 +02:00
hasheddan
e990698d5f
Use local daemonset manifest for installing Nvidia drivers
Updates sig-scheduling e2e Nvidia GPU tests to install drivers using
local manifest by default. Currently the DaemonSet is fetched from the
GoogleCloudPlatform/container-engine-accelerators repo by default.
Using a local manifest allows for manually specifying the
cos-gpu-installer image rather than always using latest. A remote
manifest can still be fetched by setting
NVIDIA_DRIVER_INSTALLER_DAEMONSET env var.

Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
2020-07-18 21:01:00 -05:00
tanjunchen
f76da50c7d test/e2e/framework/util.go: move DsFromManifest to test/e2e/framework/manifest, and rename it to DaemonSetFromURL 2020-04-14 09:54:41 +08:00
tanjunchen
e358e75681 test/e2e/scheduling: improve code 2020-04-08 18:03:15 +08:00
drfish
dfab6b637f Update .import-aliases for e2e test framework 2020-03-25 11:40:02 +08:00
drfish
f4da086cbe Move resource methods from e2e fw to e2e resource fw 2020-03-08 15:27:49 +08:00
Mike Danese
25651408ae generated: run refactor 2020-02-08 12:30:21 -05:00
Mike Danese
3aa59f7f30 generated: run refactor 2020-02-07 18:16:47 -08:00
toyoda
0c12ae5240 Modify alias of e2e/framework/job to e2ejob 2020-01-17 10:56:05 +09:00
s-ito-ts
4789e51d8e Use e2eskipper package in e2e/scheduling and e2e/servicecatalog 2020-01-14 01:54:25 +00:00
s-ito-ts
8745f02015 Use log functions of core framework on test/e2e/scheduling 2019-08-30 06:27:42 +00:00
Kubernetes Prow Robot
0f32f9ef0e
Merge pull request #77100 from chardch/add-driver-version
Emit the nvidia driver version in gpu e2e test
2019-07-27 00:49:57 -07:00
Kubernetes Prow Robot
d3be556e1c
Merge pull request #77150 from chardch/gpu-test-pod-number
Only create one pod per node with gpus in E2E test
2019-07-01 15:09:07 -07:00
Richard Chen
9368b2ce87 Only create one pod per gpu node in E2E test 2019-06-27 13:40:35 -07:00
SataQiu
332be4b1e3 refactor: replace framework.Failf with e2elog.Failf 2019-06-19 17:52:35 +08:00
Richard Chen
794ec63bbd Output the nvidia gpu information in the E2E test.
Including the gpu information simplifies driver version verification.
nvidia-smi is used in order to display gpu information, which contains the driver version.
2019-06-18 17:11:19 -07:00
Jiatong Wang
b1c346c295 Move node related methods to framework/node package
- Add a package "node" under e2e/framework and alias e2enode;
- Rename some functions whose names contain redundant strings.

Signed-off-by: Jiatong Wang <wangjiatong@vmware.com>
2019-06-17 16:59:07 -07:00
Jorge Alarcon Ochoa
4969a05327 Refactored pod-related functions from framework/util.go
This is a refactoring of framework/util.go into framework/pod.

Signed-off-by: Jorge Alarcon Ochoa <alarcj137@gmail.com>
2019-05-30 09:30:26 -04:00
SataQiu
d3a902ff5b e2e refactor: cleanup Logf from framework/util 2019-05-24 16:39:46 +08:00
Richard Chen
2a70a0b424 Add an e2e test for running a gpu job interrupted by node recreation. 2019-05-16 11:41:01 -07:00
danielqsj
ccecc67a5b fix golint error in test/e2e/scheduling 2019-05-14 14:18:52 +08:00
danielqsj
15a4342fe8 remove dot imports in e2e/scheduling 2019-05-14 14:17:20 +08:00
draveness
da7507500f refactor: use e2elog.Logf instead of framework.Logf 2019-05-07 08:15:31 +08:00
Kubernetes Prow Robot
6cd85298c5
Merge pull request #75566 from jiayingz/gpu-test-update
Update test/e2e/scheduling/nvidia-gpus to also run cuda10 vector add.
2019-04-24 14:20:47 -07:00
Jiatong Wang
7814865b40 Move gpu_util.go to e2e/framework/gpu 2019-04-10 14:30:24 -07:00
Jiaying Zhang
54c2c2690c Update test/e2e/scheduling/nvidia-gpus to also run cuda10 vector add. 2019-03-21 16:29:47 -07:00
Chris O'Haver
9060fc6e6d add opt to track dns pods 2018-10-01 10:00:16 -04:00
Da K. Ma
adbdbdec49 Got allocatable GPUs.
Signed-off-by: Da K. Ma <klaus1982.cn@gmail.com>
2018-09-25 12:33:42 +08:00
Francois Tur
5c20fff19d
Revert "Add DNS pod resource monitoring option" 2018-09-19 14:54:29 -04:00
Chris O'Haver
af0c1d2a4c Add dns pod monitoring option 2018-09-17 12:52:05 -04:00
linyouchong
d7b7fdd0dc Make log more readable 2018-08-16 17:31:02 +08:00
Rohit Agarwal
af3bc705b5 Remove COS requirement while running e2e nvidia gpu tests. 2018-06-26 12:12:06 -07:00
Maciej Szulik
a2a3a98e1d
DaemonSet internals are still in extensions 2018-05-28 17:59:54 +02:00
Jiaying Zhang
6e0badc0d1 Fix DsFromManifest() after we switch from extensions/v1beta1 to apps/v1
in cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml.
2018-05-25 16:05:35 -07:00
Kubernetes Submit Queue
043204b1e5
Merge pull request #61498 from mindprince/delete-in-tree-gpu
Automatic merge from submit-queue (batch tested with PRs 61498, 62030). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Delete in-tree support for NVIDIA GPUs.

This removes the alpha Accelerators feature gate which was deprecated in 1.10 (#57384).
The alternative feature DevicePlugins went beta in 1.10 (#60170).

Fixes #54012

```release-note
Support for "alpha.kubernetes.io/nvidia-gpu" resource which was deprecated in 1.10 is removed. Please use the resource exposed by DevicePlugins instead ("nvidia.com/gpu").
```
2018-04-03 02:02:04 -07:00
Rohit Agarwal
87dda3375b Delete in-tree support for NVIDIA GPUs.
This removes the alpha Accelerators feature gate which was deprecated in 1.10.
The alternative feature DevicePlugins went beta in 1.10.
2018-04-02 20:17:01 -07:00
Christoph Blecker
710c8563b4
Fix go vet errors 2018-04-02 17:57:44 -07:00
Jiaying Zhang
9a05af5502 Update gke nvidia-gpu-device-plugin to the latest version that supports
both v1alpha and v1beta1 device plugin versions.
Re-enables nvidia-gpus e2e test after verifying the test passes now.
2018-02-26 14:08:58 -08:00
vikaschoudhary16
e64517cd74 Migrate deviceplugin api from v1alpha to v1beta1 2018-02-21 01:26:20 -05:00
Rohit Agarwal
d191c57cad Add e2e tests for GPU monitoring. 2018-01-26 15:30:55 -08:00
Rohit Agarwal
a959ae636b Make it possible to override the driver installer daemonset url from test-infra. 2018-01-25 09:21:12 -08:00
Jiaying Zhang
4a1a205109 Changes nvidia-gpu device plugin addon config settings:
- Runs as system critical pod
- Makes its resource limits match its resource requests
- Modifies test/e2e/scheduling/nvidia-gpus.go to cope with the recent
change of running the device plugin as a system addon.
- The resource settings of the addon are based on the test results
from 8 nvidia-tesla-k80 gpus.
2017-11-20 17:32:53 -08:00
Kubernetes Submit Queue
87d45a54bd
Merge pull request #55940 from shyamjvs/reduce-spam-from-resource-gatherer
Automatic merge from submit-queue (batch tested with PRs 55233, 55927, 55903, 54867, 55940). If you want to cherry-pick this change to another branch, please follow the instructions at https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md.

Control logs verbosity in resource gatherer

PR https://github.com/kubernetes/kubernetes/pull/53541 added some logging in resource gatherer which is a bit too verbose for normal purposes.
As a result, we're seeing a lot of spam in our large cluster performance tests (e.g., https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability/8046/build-log.txt).

This PR makes the verbosity of those logs controllable through an option. It is off by default, but is turned on for the GPU test to preserve behavior there.

/cc @jiayingz @mindprince
2017-11-18 12:26:18 -08:00
Shyam Jeedigunta
fce28995e1 Control logs verbosity in resource gatherer 2017-11-17 13:03:32 +01:00
Rohit Agarwal
3ac94a57eb Update URLs for nvidia gpu device plugin and nvidia driver installer.
Device plugin is now an addon and its manifest is now in
kubernetes/kubernetes. The manifest on
GoogleCloudPlatform/container-engine-accelerators no longer contains
device plugin.
2017-11-14 15:31:22 -08:00
Jiaying Zhang
ae36f8ee95 Extend test/e2e/scheduling/nvidia-gpus.go to track resource usage of
installer and device plugin containers.
To support this, exports certain functions and fields in
framework/resource_usage_gatherer.go so that it can be used in any
e2e test to track any specified pod resource usage with the specified
probe interval and duration.
2017-11-13 16:24:41 -08:00
supereagle
b694d51842 use versioned group clients from client-go 2017-11-07 14:47:22 +08:00
Jiaying Zhang
6fecd04924 Fixes a regression introduced by PR 52290 in which extended resource
capacity may temporarily drop to zero after kubelet restarts, so that
pods restarted during that time window could fail to be scheduled.
2017-10-03 10:26:53 -07:00
Jiaying Zhang
65b76f361e Fixes flakiness in the GPUDevicePlugin e2e test.
Waits until the nvidia gpu disappears from all nodes after deleting the
device plugin DaemonSet to make sure its pods are deleted from all nodes.
2017-09-29 10:06:58 -07:00
Jiaying Zhang
ba40bee5c1 Modified test/e2e_node/gpu-device-plugin.go to make sure it passes. 2017-09-22 20:21:26 +02:00