Setup Spark 2 on Ubuntu 14 and run with Python


These instructions worked in 2017.

Ubuntu 14.04 / Spark 2.1.0 / Python

Download Spark
tar xzf spark-x.x.x-bin-hadoopx.x.tgz
mv spark-x.x.x-bin-hadoopx.x ~/spark-x.x.x-bin-hadoopx.x

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install python-pip
sudo pip install py4j

Add the following to ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export SPARK_HOME=/home/your-user-name/spark-x.x.x-bin-hadoopx.x/

source ~/.bashrc

Now you are ready to run something, for example word count.



Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s