Automatic merge from submit-queue (batch tested with PRs 44047, 43514, 44037, 43467)

Juju: Enable GPU mode if GPU hardware detected

**What this PR does / why we need it**:

Automatically configures the kubernetes-worker node to utilize GPU hardware when such hardware is detected.

layer-nvidia-cuda does the hardware detection, installs CUDA and the Nvidia drivers, and sets a state that the kubernetes-worker can react to. When a GPU is available, the worker updates its config and restarts kubelet to enable GPU mode. The worker then notifies the master that it is in GPU mode via the kube-control relation.

When the master sees that a worker is in GPU mode, it updates to privileged mode and restarts kube-apiserver.

The kube-control interface has subsumed the kube-dns interface functionality.

An `allow-privileged` config option has been added to both the worker and master charms. GPU enablement respects the value of this option; i.e., GPU mode cannot be enabled if the operator has set `allow-privileged="false"`.

**Special notes for your reviewer**:

Quickest test setup is as follows:

```bash
# Bootstrap. If your aws account doesn't have a default vpc, you'll need to
# specify one at bootstrap time so that juju can provision a p2.xlarge.
# Otherwise you can leave out the --config "vpc-id=vpc-xxxxxxxx" bit.
juju bootstrap --config "vpc-id=vpc-xxxxxxxx" \
  --constraints "cores=4 mem=16G root-disk=64G" aws/us-east-1 k8s

# Deploy the bundle containing master and worker charms built from
# https://github.com/tvansteenburgh/kubernetes/tree/gpu-support/cluster/juju/layers
juju deploy cs:~tvansteenburgh/bundle/kubernetes-gpu-support-3

# Set up kubectl locally
mkdir -p ~/.kube
juju scp kubernetes-master/0:config ~/.kube/config
juju scp kubernetes-master/0:kubectl ./kubectl

# Download a gpu-dependent job spec
wget -O /tmp/nvidia-smi.yaml https://raw.githubusercontent.com/madeden/blogposts/master/k8s-gpu-cloud/src/nvidia-smi.yaml

# Create the job
kubectl create -f /tmp/nvidia-smi.yaml

# You should see a new nvidia-smi-xxxxx pod created
kubectl get pods

# Wait a bit for the job to run, then view logs; you should see the
# nvidia-smi table output
kubectl logs $(kubectl get pods -l name=nvidia-smi -o=name -a)
```

kube-control interface: https://github.com/juju-solutions/interface-kube-control
nvidia-cuda layer: https://github.com/juju-solutions/layer-nvidia-cuda

(Both are registered on http://interfaces.juju.solutions/)

**Release note**:

```release-note
Juju: Enable GPU mode if GPU hardware detected
```
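As a sketch of how the `allow-privileged` option described above could be exercised by an operator (command form assumes Juju 2.1+; the application names match the charms in the bundle above):

```shell
# Sketch only: explicitly allow privileged containers on both charms,
# so GPU mode is never blocked by operator config.
juju config kubernetes-master allow-privileged=true
juju config kubernetes-worker allow-privileged=true

# Conversely, setting allow-privileged=false on the worker would prevent
# GPU mode from being enabled even when GPU hardware is detected.
```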
# Cluster Configuration
**Deprecation Notice**: This directory has entered maintenance mode and will not be accepting new providers. Please submit new automation deployments to kube-deploy. Deployments in this directory will continue to be maintained and supported at their current level of support.
The scripts and data in this directory automate creation and configuration of a Kubernetes cluster, including networking, DNS, nodes, and master components.
See the getting-started guides for examples of how to use the scripts.
`cloudprovider/config-default.sh` contains a set of tweakable definitions/parameters for the cluster.
The heavy lifting of configuring the VMs is done by SaltStack.
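A minimal sketch of how these tweakable parameters are typically consumed: values can be overridden from the environment before invoking the scripts. The variable names below (`KUBERNETES_PROVIDER`, `NUM_NODES`) are illustrative; consult your provider's `config-default.sh` for the authoritative list.

```shell
# Illustrative only: select a provider directory and override a default
# defined in its config-default.sh via the environment.
export KUBERNETES_PROVIDER=gce   # would select cluster/gce/
export NUM_NODES=3               # would override the provider default

# cluster/kube-up.sh would then bring up a cluster using these values
# (not invoked here).
echo "provider=${KUBERNETES_PROVIDER} nodes=${NUM_NODES}"
```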