Files

Kubernetes Submit Queue 3a3dc827e4 Merge pull request #43467 from tvansteenburgh/gpu-support

Automatic merge from submit-queue (batch tested with PRs 44047, 43514, 44037, 43467)

Juju: Enable GPU mode if GPU hardware detected

**What this PR does / why we need it**:

Automatically configures the kubernetes-worker node to use GPU hardware when such hardware is detected.

layer-nvidia-cuda does the hardware detection, installs CUDA and NVIDIA
drivers, and sets a state that the kubernetes-worker charm can react to.
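As a rough illustration of the detection step, here is a minimal sketch that decides whether NVIDIA hardware is present by scanning `lspci -nn` output for NVIDIA's PCI vendor ID (`10de`). This is an assumption for illustration only: the detection method actually used by layer-nvidia-cuda may differ, and the function name `has_nvidia_gpu` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of layer-nvidia-cuda: succeeds (exit 0) if the
# PCI device listing read on stdin contains an NVIDIA device.
# Assumes the listing comes from `lspci -nn`, which prints vendor:device IDs
# in brackets, e.g. "[10de:102d]"; 10de is NVIDIA's PCI vendor ID.
has_nvidia_gpu() {
  grep -Eqi '\[10de:[0-9a-f]{4}\]'
}

# Usage on a real host (requires pciutils):
#   if lspci -nn | has_nvidia_gpu; then echo "GPU detected"; fi
```

Matching on the vendor ID rather than the marketing name avoids depending on how a given pciutils release spells the vendor string.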

When a GPU is available, the worker updates its config and restarts kubelet to
enable GPU mode. The worker then notifies the master that it's in GPU mode via
the kube-control relation.

When the master sees that a worker is in GPU mode, it switches to privileged
mode and restarts kube-apiserver.
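The worker/master handshake described above can be sketched as two small decision functions. This is a minimal sketch under stated assumptions: the actual charms implement this in their reactive handlers and exchange state over the kube-control relation, whereas here each worker's report is modeled as a hypothetical `unit gpu=true|false` line, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the kube-control GPU handshake; not the charms' code.

# Worker side: given a unit name and "true"/"false" for "GPU available",
# print the GPU-mode report the worker would publish to the master.
worker_gpu_report() {
  local unit="$1" gpu_available="$2"
  if [ "$gpu_available" = "true" ]; then
    echo "$unit gpu=true"
  else
    echo "$unit gpu=false"
  fi
}

# Master side: reads one "unit gpu=..." line per worker on stdin and succeeds
# if at least one worker reports GPU mode, i.e. the master should switch
# kube-apiserver to privileged mode and restart it.
master_needs_privileged() {
  grep -q ' gpu=true$'
}

# Usage:
#   { worker_gpu_report kubernetes-worker/0 false
#     worker_gpu_report kubernetes-worker/1 true; } | master_needs_privileged &&
#     echo "restart kube-apiserver in privileged mode"
```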

The kube-control interface has subsumed the kube-dns interface
functionality.

An 'allow-privileged' config option has been added to both the worker and
master charms. GPU enablement respects the value of this option; i.e., GPU
mode cannot be enabled if the operator has set allow-privileged="false".
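The interaction between GPU detection and the allow-privileged option can be expressed as a single predicate. A minimal sketch, assuming the option arrives as a string and treating anything other than an explicit "false" as permitting GPU mode (the charms may accept additional values); the function name `should_enable_gpu_mode` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical decision helper: should the worker enable GPU mode?
#   $1 - value of the allow-privileged config option
#   $2 - "true" if GPU hardware was detected, otherwise "false"
# Because GPU mode depends on privileged mode, an explicit
# allow-privileged="false" always wins over detected hardware.
should_enable_gpu_mode() {
  local allow_privileged="$1" gpu_detected="$2"
  [ "$gpu_detected" = "true" ] && [ "$allow_privileged" != "false" ]
}

# Usage:
#   should_enable_gpu_mode "false" "true" || echo "GPU mode blocked by allow-privileged=false"
```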

**Special notes for your reviewer**:

Quickest test setup is as follows:
```bash
# Bootstrap. If your AWS account doesn't have a default VPC, you'll need to
# specify one at bootstrap time so that Juju can provision a p2.xlarge.
# Otherwise you can leave out the --config "vpc-id=vpc-xxxxxxxx" bit.
juju bootstrap --config "vpc-id=vpc-xxxxxxxx" --constraints "cores=4 mem=16G root-disk=64G" aws/us-east-1 k8s

# Deploy the bundle containing master and worker charms built from
# https://github.com/tvansteenburgh/kubernetes/tree/gpu-support/cluster/juju/layers
juju deploy cs:~tvansteenburgh/bundle/kubernetes-gpu-support-3

# Set up kubectl locally
mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config
juju scp kubernetes-master/0:kubectl ./kubectl
chmod +x ./kubectl

# Download a GPU-dependent job spec
wget -O /tmp/nvidia-smi.yaml https://raw.githubusercontent.com/madeden/blogposts/master/k8s-gpu-cloud/src/nvidia-smi.yaml

# Create the job (using the kubectl binary copied from the master)
./kubectl create -f /tmp/nvidia-smi.yaml

# You should see a new nvidia-smi-xxxxx pod created
./kubectl get pods

# Wait a bit for the job to run, then view its logs; you should see the
# nvidia-smi table output
./kubectl logs $(./kubectl get pods -l name=nvidia-smi -o=name -a)
```
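The "wait a bit" step can be made deterministic with a small polling helper that retries a command until it succeeds or a timeout expires. A minimal sketch: `wait_for` is a hypothetical helper, and the usage line reuses the `name=nvidia-smi` label selector from the commands above, which assumes the job spec labels its pod that way.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command every INTERVAL seconds until it
# succeeds, giving up after roughly TIMEOUT seconds.
#   wait_for TIMEOUT INTERVAL CMD [ARGS...]
wait_for() {
  local timeout="$1" interval="$2"
  shift 2
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Usage: wait up to 5 minutes for the nvidia-smi pod to finish, then read logs.
#   wait_for 300 10 sh -c '[ "$(./kubectl get pods -l name=nvidia-smi -a -o jsonpath="{.items[0].status.phase}")" = Succeeded ]'
```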

kube-control interface: https://github.com/juju-solutions/interface-kube-control
nvidia-cuda layer: https://github.com/juju-solutions/layer-nvidia-cuda
(Both are registered on http://interfaces.juju.solutions/)

**Release note**:
```release-note
Juju: Enable GPU mode if GPU hardware detected
```

2017-04-04 14:33:26 -07:00

addons

Merge pull request #42668 from ixdy/build-silence-docker-rmi

2017-03-30 23:36:24 -07:00

aws

AWS: Kill bash deployment

2017-02-27 14:39:25 -08:00

centos

Centos provider: generate SSL certificates for etcd cluster.

2017-03-24 09:15:57 +08:00

gce

Merge pull request #43726 from vishh/local-ssd-gce

2017-03-29 16:56:27 -07:00

gke

GCE will properly regenerate basic_auth.csv on kube-apiserver start.

2017-02-25 11:31:59 -08:00

images

Merge pull request #42668 from ixdy/build-silence-docker-rmi

2017-03-30 23:36:24 -07:00

juju

Merge pull request #43467 from tvansteenburgh/gpu-support

2017-04-04 14:33:26 -07:00

kubemark

Correct CIDR range for kubemark

2017-02-28 19:26:32 +01:00

lib

Add test shell stack traces

2017-01-25 13:34:16 -05:00

libvirt-coreos

Keep ResourceQuota admission at the end of the chain

2017-03-21 01:53:11 -04:00

local

Merge pull request #28469 from asalkeld/local-e2e

2016-09-11 05:44:47 -07:00

openstack-heat

Merge pull request #42638 from jamiehannaford/minion-fip

2017-03-25 18:15:21 -07:00

ovirt

…

photon-controller

Keep ResourceQuota admission at the end of the chain

2017-03-21 01:53:11 -04:00

rackspace

Keep ResourceQuota admission at the end of the chain

2017-03-21 01:53:11 -04:00

saltbase

Make a smaller redis image for testing, based on Alpine.

2017-03-28 16:18:00 -07:00

skeleton

2016-06-29 17:47:36 -07:00

ubuntu

Merge pull request #42467 from chentao1596/change-etcd-version

2017-03-28 14:09:22 -07:00

vagrant

Keep ResourceQuota admission at the end of the chain

2017-03-21 01:53:11 -04:00

vsphere

Update generated for 2017

2017-01-01 23:11:09 -08:00

windows

Fixed the issue with log rotation

2016-12-12 11:08:41 -05:00

BUILD

Build release tarballs in bazel and add make bazel-release rule

2017-01-13 16:17:44 -08:00

clientbin.sh

Refactor the common parts of cluster/kube{ctl,adm}.sh into a util script.

2017-01-26 21:29:49 -08:00

common.sh

Merge pull request #42668 from ixdy/build-silence-docker-rmi

2017-03-30 23:36:24 -07:00

get-kube-binaries.sh

Do not override KUBERNETES_RELEASE if already set

2017-03-17 15:29:21 -07:00

get-kube-local.sh

get-kube-local.sh checks pods with option "--namespace=kube-system"

2017-03-04 00:18:42 -05:00

get-kube.sh

Update a few regex patterns to support release candidates

2017-03-24 14:38:04 -07:00

kube-down.sh

Automatically download missing kube binaries in kube-up/kube-down.

2016-12-13 14:59:13 -08:00

kube-push.sh

Automatically download missing kube binaries in kube-up/kube-down.

2016-12-13 14:59:13 -08:00

kube-up.sh

Automatically download missing kube binaries in kube-up/kube-down.

2016-12-13 14:59:13 -08:00

kube-util.sh

Split federation-{up,down} from e2e-{up,down}.

2017-02-24 14:27:31 -08:00

kubeadm.sh

Refactor the common parts of cluster/kube{ctl,adm}.sh into a util script.

2017-01-26 21:29:49 -08:00

kubectl.sh

Fix failing kubectl skew tests

2017-03-08 16:08:47 -03:00

log-dump.sh

cluster/log-dump - chmod files before dumping

2017-04-03 21:41:24 -04:00

options.md

…

OWNERS

Updated top level owners file to match new format

2017-01-19 11:29:16 -08:00

README.md

Fix typos and linted_packages sorting

2016-10-31 18:31:08 +01:00

restore-from-backup.sh

Fix restore-from-backup.sh script

2017-03-21 11:58:13 +01:00

test-e2e.sh

2016-06-29 17:47:36 -07:00

test-network.sh

2016-06-29 17:47:36 -07:00

test-smoke.sh

2016-06-29 17:47:36 -07:00

update-storage-objects.sh

2016-06-29 17:47:36 -07:00

validate-cluster.sh

Fixed cluster validation: added -q and project flags to gcloud.

2016-12-21 14:13:14 +01:00

README.md

Cluster Configuration

Deprecation Notice: This directory has entered maintenance mode and will not be accepting new providers. Please submit new automation deployments to kube-deploy. Deployments in this directory will continue to be maintained and supported at their current level of support.

The scripts and data in this directory automate creation and configuration of a Kubernetes cluster, including networking, DNS, nodes, and master components.

See the getting-started guides for examples of how to use the scripts.
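The top-level scripts here are provider-agnostic entry points: they select a provider directory (such as gce, aws, or vagrant in the listing above) from the KUBERNETES_PROVIDER environment variable and source that provider's util.sh. The sketch below mimics that dispatch in isolation; the function name resolve_provider_util and the gce default are assumptions for illustration, so consult kube-util.sh for the actual logic.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of provider dispatch; not the actual kube-util.sh.
#   $1 - path to the cluster/ directory
# Prints the provider-specific util.sh path, or fails if it does not exist.
resolve_provider_util() {
  local cluster_dir="$1"
  local provider="${KUBERNETES_PROVIDER:-gce}"   # assumed default provider
  local util="${cluster_dir}/${provider}/util.sh"
  if [ ! -f "$util" ]; then
    echo "unknown provider: ${provider}" >&2
    return 1
  fi
  echo "$util"
}

# Usage:
#   source "$(resolve_provider_util ./cluster)"
```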

cloudprovider/config-default.sh, where cloudprovider is the name of a provider directory such as gce, contains a set of tweakable definitions and parameters for the cluster.
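A common way for such a file to keep its parameters tweakable is to give each one a shell default that an environment variable of the same name can override, via the ${VAR:-default} expansion. The fragment below illustrates that pattern only; the variable names resemble ones some providers use, but the values are hypothetical and the real set of parameters varies by provider.

```bash
#!/usr/bin/env bash
# Illustrative config-default.sh-style fragment (values are hypothetical).
# Each parameter keeps any value already set by the caller and otherwise
# falls back to the default after ":-".
NUM_NODES=${NUM_NODES:-3}
NODE_SIZE=${NODE_SIZE:-n1-standard-2}
MASTER_SIZE=${MASTER_SIZE:-n1-standard-1}

# Usage: override a parameter for a single run, e.g.
#   NUM_NODES=5 ./cluster/kube-up.sh
```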

The heavy lifting of configuring the VMs is done by SaltStack.