Setup Spark 2 on Ubuntu 14 and run with Python

 

These instructions worked in 2017.

Ubuntu 14.04 / Spark 2.1.0 / Python

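Download a pre-built Spark package (spark-x.x.x-bin-hadoopx.x.tgz) from the Apache Spark downloads page, then unpack it and move it to your home directory: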
tar xzf spark-x.x.x-bin-hadoopx.x.tgz
mv spark-x.x.x-bin-hadoopx.x ~/spark-x.x.x-bin-hadoopx.x

Install Java 8, pip and py4j:
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install python-pip
sudo pip install py4j

Add the following to ~/.bashrc, replacing your-user-name and the x.x.x placeholders with your own values:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export SPARK_HOME=/home/your-user-name/spark-x.x.x-bin-hadoopx.x/
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

Reload the shell configuration so the new variables take effect:
source ~/.bashrc
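
To confirm the variables were picked up, a small check like the one below may help. It is only a sketch: the helper names missing_vars and pyspark_dir_exists are made up for this post, and it only checks that the variables are set and that the python directory exists under SPARK_HOME.

```python
import os

# The two variables exported in ~/.bashrc above.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME")


def missing_vars(env):
    """Return the required variable names that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def pyspark_dir_exists(env):
    """Return True if $SPARK_HOME/python exists (this is what PYTHONPATH points at)."""
    spark_home = env.get("SPARK_HOME", "")
    return bool(spark_home) and os.path.isdir(os.path.join(spark_home, "python"))
```

In a fresh shell, missing_vars(os.environ) should return an empty list and pyspark_dir_exists(os.environ) should return True. If not, re-check the paths in ~/.bashrc.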

Now you are ready to run something, for example word count.
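
A word count could look roughly like this. It is a minimal sketch, assuming Spark 2.x (for SparkSession) and the PYTHONPATH setup above. The function names split_words and word_count are my own, not part of Spark.

```python
from operator import add


def split_words(line):
    """Split a line of text into whitespace-separated words."""
    return line.split()


def word_count(path):
    """Return a list of (word, count) pairs for the text file at `path`."""
    # Imported here so split_words can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("WordCount").getOrCreate()
    try:
        lines = spark.sparkContext.textFile(path)
        counts = (lines.flatMap(split_words)          # one record per word
                       .map(lambda word: (word, 1))   # pair each word with 1
                       .reduceByKey(add))             # sum the 1s per word
        return counts.collect()
    finally:
        # Always shut the session down, even if the job fails.
        spark.stop()
```

Call word_count with the path to any text file, for example the README.md that ships in the Spark directory, and loop over the returned pairs to print them. The pairs come back in no particular order.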

 
