kubernetes/cluster
Kubernetes Submit Queue 3a3dc827e4 Merge pull request #43467 from tvansteenburgh/gpu-support
Automatic merge from submit-queue (batch tested with PRs 44047, 43514, 44037, 43467)

Juju: Enable GPU mode if GPU hardware detected

**What this PR does / why we need it**:

Automatically configures the kubernetes-worker node to use GPU hardware when such hardware is detected.

layer-nvidia-cuda does the hardware detection, installs CUDA and Nvidia
drivers, and sets a state that the k8s-worker can react to.

When a GPU is available, the worker updates its config and restarts kubelet to
enable GPU mode. The worker then notifies the master that it is in GPU mode via
the kube-control relation.

When the master sees that a worker is in GPU mode, it switches to privileged
mode and restarts kube-apiserver.

The kube-control interface has subsumed the kube-dns interface
functionality.

An 'allow-privileged' config option has been added to both the worker and
master charms. GPU enablement respects the value of this option:
GPU mode cannot be enabled if the operator has set
allow-privileged="false".
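
The interaction between hardware detection and the allow-privileged option can be sketched as a small decision function. This is an illustration only, not the charm's code: the function name `gpu_mode_decision` and its output strings are hypothetical, and it assumes that any option value other than "false" permits privileged mode.

```bash
# Hypothetical sketch of the GPU-enablement rule described above.
# Arguments:
#   $1  "yes" if GPU hardware was detected, anything else otherwise
#   $2  value of the allow-privileged config option
# Prints "enable-gpu" or "skip".
gpu_mode_decision() {
  local gpu_detected="$1"
  local allow_privileged="$2"

  # No GPU hardware: nothing to enable.
  if [ "$gpu_detected" != "yes" ]; then
    echo "skip"
    return 0
  fi

  # The operator has explicitly forbidden privileged mode, so GPU mode
  # cannot be enabled even though hardware is present.
  if [ "$allow_privileged" = "false" ]; then
    echo "skip"
    return 0
  fi

  # Assumption: any other value permits privileged mode.
  echo "enable-gpu"
}
```

Under these assumptions, `gpu_mode_decision yes false` prints `skip`, while `gpu_mode_decision yes true` prints `enable-gpu`.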

**Special notes for your reviewer**:

Quickest test setup is as follows:
```bash
# Bootstrap. If your aws account doesn't have a default vpc, you'll need to
# specify one at bootstrap time so that juju can provision a p2.xlarge.
# Otherwise you can leave out the --config "vpc-id=vpc-xxxxxxxx" bit.
juju bootstrap --config "vpc-id=vpc-xxxxxxxx" --constraints "cores=4 mem=16G root-disk=64G" aws/us-east-1 k8s

# Deploy the bundle containing master and worker charms built from
# https://github.com/tvansteenburgh/kubernetes/tree/gpu-support/cluster/juju/layers
juju deploy cs:~tvansteenburgh/bundle/kubernetes-gpu-support-3

# Set up kubectl locally
mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config
juju scp kubernetes-master/0:kubectl ./kubectl

# Download a gpu-dependent job spec
wget -O /tmp/nvidia-smi.yaml https://raw.githubusercontent.com/madeden/blogposts/master/k8s-gpu-cloud/src/nvidia-smi.yaml

# Create the job (using the kubectl binary copied from the master above)
./kubectl create -f /tmp/nvidia-smi.yaml

# You should see a new nvidia-smi-xxxxx pod created
./kubectl get pods

# Wait a bit for the job to run, then view logs; you should see the
# nvidia-smi table output
./kubectl logs $(./kubectl get pods -l name=nvidia-smi -o=name -a)
```
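
The "wait a bit" step in the instructions above can be made explicit with a small polling helper. This is a sketch only: `retry_until` is a hypothetical function, not part of kubectl or juju, and the attempt count and delay passed to it are arbitrary.

```bash
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  local attempts="$1"
  local delay="$2"
  shift 2

  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

One way to use it is to wrap the log-fetching command from the steps above in a function that succeeds only when the output is non-empty, and pass that function to `retry_until` with, say, 30 attempts and a 10-second delay.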

kube-control interface: https://github.com/juju-solutions/interface-kube-control
nvidia-cuda layer: https://github.com/juju-solutions/layer-nvidia-cuda
(Both are registered on http://interfaces.juju.solutions/)

**Release note**:
```release-note
Juju: Enable GPU mode if GPU hardware detected
```
2017-04-04 14:33:26 -07:00
| Name | Last commit | Date |
| --- | --- | --- |
| addons | Merge pull request #42668 from ixdy/build-silence-docker-rmi | 2017-03-30 23:36:24 -07:00 |
| aws | AWS: Kill bash deployment | 2017-02-27 14:39:25 -08:00 |
| centos | Centos provider: generate SSL certificates for etcd cluster. | 2017-03-24 09:15:57 +08:00 |
| gce | Merge pull request #43726 from vishh/local-ssd-gce | 2017-03-29 16:56:27 -07:00 |
| gke | GCE will properly regenerate basic_auth.csv on kube-apiserver start. | 2017-02-25 11:31:59 -08:00 |
| images | Merge pull request #42668 from ixdy/build-silence-docker-rmi | 2017-03-30 23:36:24 -07:00 |
| juju | Merge pull request #43467 from tvansteenburgh/gpu-support | 2017-04-04 14:33:26 -07:00 |
| kubemark | Correct CIDR range for kubemark | 2017-02-28 19:26:32 +01:00 |
| lib | Add test shell stack traces | 2017-01-25 13:34:16 -05:00 |
| libvirt-coreos | Keep ResourceQuota admission at the end of the chain | 2017-03-21 01:53:11 -04:00 |
| local | Merge pull request #28469 from asalkeld/local-e2e | 2016-09-11 05:44:47 -07:00 |
| openstack-heat | Merge pull request #42638 from jamiehannaford/minion-fip | 2017-03-25 18:15:21 -07:00 |
| ovirt | | |
| photon-controller | Keep ResourceQuota admission at the end of the chain | 2017-03-21 01:53:11 -04:00 |
| rackspace | Keep ResourceQuota admission at the end of the chain | 2017-03-21 01:53:11 -04:00 |
| saltbase | Make a smaller redis image for testing, based on Alpine. | 2017-03-28 16:18:00 -07:00 |
| skeleton | | |
| ubuntu | Merge pull request #42467 from chentao1596/change-etcd-version | 2017-03-28 14:09:22 -07:00 |
| vagrant | Keep ResourceQuota admission at the end of the chain | 2017-03-21 01:53:11 -04:00 |
| vsphere | Update generated for 2017 | 2017-01-01 23:11:09 -08:00 |
| windows | Fixed the issue with log rotation | 2016-12-12 11:08:41 -05:00 |
| BUILD | Build release tarballs in bazel and add make bazel-release rule | 2017-01-13 16:17:44 -08:00 |
| clientbin.sh | Refactor the common parts of cluster/kube{ctl,adm}.sh into a util script. | 2017-01-26 21:29:49 -08:00 |
| common.sh | Merge pull request #42668 from ixdy/build-silence-docker-rmi | 2017-03-30 23:36:24 -07:00 |
| get-kube-binaries.sh | Do not override KUBERNETES_RELEASE if already set | 2017-03-17 15:29:21 -07:00 |
| get-kube-local.sh | get-kube-local.sh checks pods with option "--namespace=kube-system" | 2017-03-04 00:18:42 -05:00 |
| get-kube.sh | Update a few regex patterns to support release candidates | 2017-03-24 14:38:04 -07:00 |
| kube-down.sh | Automatically download missing kube binaries in kube-up/kube-down. | 2016-12-13 14:59:13 -08:00 |
| kube-push.sh | Automatically download missing kube binaries in kube-up/kube-down. | 2016-12-13 14:59:13 -08:00 |
| kube-up.sh | Automatically download missing kube binaries in kube-up/kube-down. | 2016-12-13 14:59:13 -08:00 |
| kube-util.sh | Split federation-{up,down} from e2e-{up,down}. | 2017-02-24 14:27:31 -08:00 |
| kubeadm.sh | Refactor the common parts of cluster/kube{ctl,adm}.sh into a util script. | 2017-01-26 21:29:49 -08:00 |
| kubectl.sh | Fix failing kubectl skew tests | 2017-03-08 16:08:47 -03:00 |
| log-dump.sh | cluster/log-dump - chmod files before dumping | 2017-04-03 21:41:24 -04:00 |
| options.md | | |
| OWNERS | Updated top level owners file to match new format | 2017-01-19 11:29:16 -08:00 |
| README.md | Fix typos and linted_packages sorting | 2016-10-31 18:31:08 +01:00 |
| restore-from-backup.sh | Fix restore-from-backup.sh script | 2017-03-21 11:58:13 +01:00 |
| test-e2e.sh | | |
| test-network.sh | | |
| test-smoke.sh | | |
| update-storage-objects.sh | | |
| validate-cluster.sh | Fixed cluster validation: added -q and project flags to gcloud. | 2016-12-21 14:13:14 +01:00 |

Cluster Configuration

Deprecation Notice: This directory has entered maintenance mode and will not be accepting new providers. Please submit new automation deployments to kube-deploy. Deployments in this directory will continue to be maintained and supported at their current level of support.

The scripts and data in this directory automate creation and configuration of a Kubernetes cluster, including networking, DNS, nodes, and master components.

See the getting-started guides for examples of how to use the scripts.

cloudprovider/config-default.sh contains a set of tweakable definitions/parameters for the cluster.
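
Tweakable definitions of this kind are commonly written with shell default-value expansion, so that a value already set in the environment takes precedence over the built-in default. The sketch below illustrates that pattern; the function name, variable names, and default values are hypothetical examples, not a copy of any provider's config-default.sh.

```bash
# Illustrative only: hypothetical settings in the style of a
# config-default.sh file. ${VAR:-default} keeps a value already set in
# the environment and falls back to the default otherwise.
load_example_defaults() {
  EXAMPLE_NUM_NODES="${EXAMPLE_NUM_NODES:-3}"
  EXAMPLE_NODE_SIZE="${EXAMPLE_NODE_SIZE:-small}"
  EXAMPLE_ENABLE_LOGGING="${EXAMPLE_ENABLE_LOGGING:-true}"
}

load_example_defaults
echo "nodes=${EXAMPLE_NUM_NODES} size=${EXAMPLE_NODE_SIZE} logging=${EXAMPLE_ENABLE_LOGGING}"
```

With this pattern, exporting `EXAMPLE_NUM_NODES=5` before sourcing the file would change the node count without editing the file itself, while the other settings keep their defaults.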

The heavy lifting of configuring the VMs is done by SaltStack.
