Modified the Spark example to use kubectl exec to interact with the cluster and to create a Spark driver pod

Gurvinder Singh 2015-07-10 11:19:55 +02:00
parent ccc4cfc7e1
commit a046fa478d
5 changed files with 61 additions and 32 deletions

View File

@@ -110,44 +110,35 @@ $ kubectl logs spark-master
15/06/26 14:15:55 INFO Master: Registering worker 10.244.1.15:44839 with 1 cores, 2.6 GB RAM
15/06/26 14:15:55 INFO Master: Registering worker 10.244.0.19:60970 with 1 cores, 2.6 GB RAM
```
## Step Three: Start your Spark driver to launch jobs on your Spark cluster

The Spark driver is used to launch jobs into the Spark cluster. You can read more about it in the
[Spark architecture](http://spark.apache.org/docs/latest/cluster-overview.html) overview.

```shell
$ kubectl create -f examples/spark/spark-driver.json
```

The Spark driver needs the Master service to be running.
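You can verify this with `kubectl get service`; the IP shown below is just an example and will differ in your cluster:

```shell
$ kubectl get service spark-master
NAME           LABELS              SELECTOR            IP(S)          PORT(S)
spark-master   name=spark-master   name=spark-master   10.0.204.187   7077/TCP
```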
### Check to see if the driver is running

```shell
$ kubectl get pods
NAME                               READY     REASON    RESTARTS   AGE
[...]
spark-master                       1/1       Running   0          14m
spark-driver                       1/1       Running   0          10m
```
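If the driver pod does not reach the `Running` state, its logs are the first place to look:

```shell
$ kubectl logs spark-driver
```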
## Step Four: Do something with the cluster

Use `kubectl exec` to connect to the Spark driver:

```
$ kubectl exec spark-driver -it bash
root@spark-driver:/#
root@spark-driver:/# pyspark
Python 2.7.9 (default, Mar  1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
@@ -166,7 +157,7 @@ SparkContext available as sc, HiveContext available as sqlContext.
```
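To run a non-interactive job from outside the pod, you can also invoke `spark-submit` through `kubectl exec`. This is only a sketch: it assumes the image keeps its Spark distribution under `/opt/spark` (as `start.sh` suggests) and that the bundled Pi example ships at its usual path inside that distribution:

```shell
$ kubectl exec spark-driver -- /opt/spark/bin/spark-submit \
    --master spark://spark-master:7077 \
    /opt/spark/examples/src/main/python/pi.py 10
```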
## Result
You now have services, replication controllers, and pods for the Spark master, Spark driver, and Spark workers.
You can take this example to the next step and start using the Apache Spark cluster
you just created; see the [Spark documentation](https://spark.apache.org/documentation.html)
for more information.
@@ -181,4 +172,6 @@ Make sure the Master Pod is running (use: ```kubectl get pods```).
```kubectl create -f spark-worker-controller.json``` ```kubectl create -f spark-worker-controller.json```
```kubectl create -f spark-driver.json```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/examples/spark/README.md?pixel)]() [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/examples/spark/README.md?pixel)]()

View File

@@ -0,0 +1,4 @@
FROM gcr.io/google_containers/spark-base
ADD start.sh /start.sh
ADD log4j.properties /opt/spark/conf/log4j.properties
CMD ["/start.sh"]

View File

View File

@@ -0,0 +1,9 @@
#!/bin/bash

# Make the Spark Master reachable by the stable name "spark-master".
echo "$SPARK_MASTER_SERVICE_HOST spark-master" >> /etc/hosts

# Advertise this pod's IP as the driver's hostname and point Spark clients
# at the Master service by default.
echo "SPARK_LOCAL_HOSTNAME=$(hostname -i)" >> /opt/spark/conf/spark-env.sh
echo "MASTER=spark://spark-master:$SPARK_MASTER_SERVICE_PORT" >> /opt/spark/conf/spark-env.sh

# Keep the container alive so jobs can be launched later via kubectl exec.
while true; do
  sleep 100
done
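`SPARK_MASTER_SERVICE_HOST` and `SPARK_MASTER_SERVICE_PORT` are the environment variables Kubernetes injects for the `spark-master` service. An optional way to confirm what the script generated once the driver pod is up:

```shell
$ kubectl exec spark-driver -- cat /opt/spark/conf/spark-env.sh
$ kubectl exec spark-driver -- cat /etc/hosts
```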

View File

@@ -0,0 +1,23 @@
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "spark-driver",
    "labels": {
      "name": "spark-driver"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "spark-driver",
        "image": "gurvin/spark-driver",
        "resources": {
          "limits": {
            "cpu": "100m"
          }
        }
      }
    ]
  }
}
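After `kubectl create -f examples/spark/spark-driver.json`, you can confirm that the label and the 100m CPU limit took effect (output omitted here):

```shell
$ kubectl get pods -l name=spark-driver
$ kubectl describe pod spark-driver
```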