Modified the Spark example to use kubectl exec to interact with the cluster and to create a Spark driver pod

Gurvinder Singh 2015-07-10 11:19:55 +02:00
parent ccc4cfc7e1
commit a046fa478d
5 changed files with 61 additions and 32 deletions

View File

@@ -110,44 +110,35 @@ $ kubectl logs spark-master
15/06/26 14:15:55 INFO Master: Registering worker 10.244.1.15:44839 with 1 cores, 2.6 GB RAM
15/06/26 14:15:55 INFO Master: Registering worker 10.244.0.19:60970 with 1 cores, 2.6 GB RAM
```
## Step Three: Start your Spark driver to launch jobs on your Spark cluster

The Spark driver is used to launch jobs into the Spark cluster. You can read more about it in the
[Spark architecture](http://spark.apache.org/docs/latest/cluster-overview.html) overview.

```shell
$ kubectl create -f examples/spark/spark-driver.json
```

The Spark driver needs the Master service to be running.
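You can verify this with `kubectl get service`; the IP shown below is just an example and will differ in your cluster:

```shell
$ kubectl get service spark-master
NAME           LABELS              SELECTOR            IP(S)          PORT(S)
spark-master   name=spark-master   name=spark-master   10.0.204.187   7077/TCP
```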
### Check to see if the driver is running

```shell
$ kubectl get pods
NAME                               READY     REASON    RESTARTS   AGE
[...]
spark-master                       1/1       Running   0          14m
spark-driver                       1/1       Running   0          10m
```
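If the driver pod does not reach the `Running` state, its logs are the first place to look:

```shell
$ kubectl logs spark-driver
```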
## Step Four: Do something with the cluster

Use `kubectl exec` to connect to the Spark driver:

```
$ kubectl exec spark-driver -it bash
root@spark-driver:/#
root@spark-driver:/# pyspark
Python 2.7.9 (default, Mar  1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
@@ -166,7 +157,7 @@ SparkContext available as sc, HiveContext available as sqlContext.
```
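To run a non-interactive job from outside the pod, you can also invoke `spark-submit` through `kubectl exec`. This is only a sketch: it assumes the image keeps its Spark distribution under `/opt/spark` (as `start.sh` suggests) and that the bundled Pi example ships at its usual path inside that distribution:

```shell
$ kubectl exec spark-driver -- /opt/spark/bin/spark-submit \
    --master spark://spark-master:7077 \
    /opt/spark/examples/src/main/python/pi.py 10
```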
## Result
You now have services, replication controllers, and pods for the Spark master, Spark driver, and Spark workers.
You can take this example to the next step and start using the Apache Spark cluster
you just created; see the [Spark documentation](https://spark.apache.org/documentation.html)
for more information.
@@ -181,4 +172,6 @@ Make sure the Master Pod is running (use: ```kubectl get pods```).
```kubectl create -f spark-worker-controller.json``` ```kubectl create -f spark-worker-controller.json```
```kubectl create -f spark-driver.json```
[![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/examples/spark/README.md?pixel)]() [![Analytics](https://kubernetes-site.appspot.com/UA-36037335-10/GitHub/examples/spark/README.md?pixel)]()

View File

@@ -0,0 +1,4 @@
FROM gcr.io/google_containers/spark-base
ADD start.sh /start.sh
ADD log4j.properties /opt/spark/conf/log4j.properties
CMD ["/start.sh"]

View File

View File

@@ -0,0 +1,9 @@
#!/bin/bash

# Make the Spark Master reachable by the stable name "spark-master".
echo "$SPARK_MASTER_SERVICE_HOST spark-master" >> /etc/hosts

# Advertise this pod's IP as the driver's hostname and point Spark clients
# at the Master service by default.
echo "SPARK_LOCAL_HOSTNAME=$(hostname -i)" >> /opt/spark/conf/spark-env.sh
echo "MASTER=spark://spark-master:$SPARK_MASTER_SERVICE_PORT" >> /opt/spark/conf/spark-env.sh

# Keep the container alive so jobs can be launched later via kubectl exec.
while true; do
  sleep 100
done
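`SPARK_MASTER_SERVICE_HOST` and `SPARK_MASTER_SERVICE_PORT` are the environment variables Kubernetes injects for the `spark-master` service. An optional way to confirm what the script generated once the driver pod is up:

```shell
$ kubectl exec spark-driver -- cat /opt/spark/conf/spark-env.sh
$ kubectl exec spark-driver -- cat /etc/hosts
```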

View File

@@ -0,0 +1,23 @@
{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "spark-driver",
    "labels": {
      "name": "spark-driver"
    }
  },
  "spec": {
    "containers": [
      {
        "name": "spark-driver",
        "image": "gurvin/spark-driver",
        "resources": {
          "limits": {
            "cpu": "100m"
          }
        }
      }
    ]
  }
}
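After `kubectl create -f examples/spark/spark-driver.json`, you can confirm that the label and the 100m CPU limit took effect (output omitted here):

```shell
$ kubectl get pods -l name=spark-driver
$ kubectl describe pod spark-driver
```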