kubernetes/examples/spark/images/base/Dockerfile
Zach Loafman 51817850ba Zeppelin: Add Zeppelin image to Spark example
This adds a very basic Zeppelin image that works with the existing
Spark example. As can be seen from the documentation, it has a couple
of warts:

* It requires kubectl port-forward (which is unstable across long
periods of time, at least for me, on this app, bug incoming). See

* I needed to roll my own container (none of the existing containers
exactly matched needs, or even built anymore against modern Zeppelin
master, and the rest of the example is Spark 1.5).

The image itself is *huge*. One of the further refinements we need to
look at is how to possibly strip the Maven build for this container
down to just the interpreters we care about, because the deps here
are frankly ridiculous.

This might be a case where, if possible, we might want to open an
upstream request to build things dynamically, then use something like
probably the cut the image down considerably. (This might already be
possible, need to poke at whether you can late-bind interpreters
later.)
2015-11-13 12:02:11 -08:00


FROM java:openjdk-8-jdk
ENV hadoop_ver=2.6.1
ENV spark_ver=1.5.1

# Get Hadoop from US Apache mirror and extract just the native
# libs. (Until we care about running HDFS with these containers, this
# is all we need.)
RUN mkdir -p /opt && \
    cd /opt && \
    wget http://www.us.apache.org/dist/hadoop/common/hadoop-${hadoop_ver}/hadoop-${hadoop_ver}.tar.gz && \
    tar -zvxf hadoop-${hadoop_ver}.tar.gz hadoop-${hadoop_ver}/lib/native && \
    rm hadoop-${hadoop_ver}.tar.gz && \
    ln -s hadoop-${hadoop_ver} hadoop && \
    echo Hadoop ${hadoop_ver} native libraries installed in /opt/hadoop/lib/native

# Get Spark from US Apache mirror.
RUN mkdir -p /opt && \
    cd /opt && \
    wget http://www.us.apache.org/dist/spark/spark-${spark_ver}/spark-${spark_ver}-bin-hadoop2.6.tgz && \
    tar -zvxf spark-${spark_ver}-bin-hadoop2.6.tgz && \
    rm spark-${spark_ver}-bin-hadoop2.6.tgz && \
    ln -s spark-${spark_ver}-bin-hadoop2.6 spark && \
    echo Spark ${spark_ver} installed in /opt

# Add the GCS connector.
RUN wget -O /opt/spark/lib/gcs-connector-latest-hadoop2.jar \
    https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar

# COPY behaves the same as ADD for plain local files and is the idiomatic choice.
COPY log4j.properties /opt/spark/conf/log4j.properties
COPY start-common.sh /
COPY core-site.xml /opt/spark/conf/core-site.xml
COPY spark-defaults.conf /opt/spark/conf/spark-defaults.conf
ENV PATH=$PATH:/opt/spark/bin
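
The Hadoop step above pulls only lib/native out of the release tarball by passing a member path to tar. A minimal sketch of that technique, runnable outside Docker, assuming a small stand-in archive built locally (the placeholder files and the temporary `work` directory are hypothetical and not part of the original image):

```shell
#!/bin/sh
set -eu

# Same version string as the Dockerfile; everything else here is a stand-in.
hadoop_ver=2.6.1
work=$(mktemp -d)

# Build a tiny fake "release" with the same top-level layout as the real tarball.
mkdir -p "$work/src/hadoop-${hadoop_ver}/lib/native" \
         "$work/src/hadoop-${hadoop_ver}/share/doc"
echo "native lib placeholder" > "$work/src/hadoop-${hadoop_ver}/lib/native/libhadoop.so"
echo "docs placeholder" > "$work/src/hadoop-${hadoop_ver}/share/doc/README"
tar -czf "$work/hadoop-${hadoop_ver}.tar.gz" -C "$work/src" "hadoop-${hadoop_ver}"

# Mirror the RUN step: extract only the lib/native member, then add the
# version-independent symlink.
mkdir -p "$work/opt"
cd "$work/opt"
tar -zxf "../hadoop-${hadoop_ver}.tar.gz" "hadoop-${hadoop_ver}/lib/native"
ln -s "hadoop-${hadoop_ver}" hadoop

# Only the native directory should be present under the symlink.
ls hadoop/lib/native
```

Because the download, the selective extraction, and the rm all happen in one RUN instruction, the resulting image layer holds only the extracted native libraries rather than the full Hadoop distribution.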