Python: Custom WITH statement / Context Manager

The Python with statement lets an object do something when it enters a scope and again when it exits the scope. You have probably seen code like this:

with open(filename) as f:
    ...  # do something with the file

The file will be closed when it goes out of scope, no matter how it goes out of scope. This behavior is implemented by a pair of methods, __enter__ and __exit__, and classes that implement these methods are called context managers. Here is an example:

class Demo:   
    def __init__(self, msg):
        print("creating")
        self.msg = msg
    
    def __enter__(self):
        print("entering")
        return self
    
    def __exit__(self, type, value, traceback):
        print("exiting")
        
    def hello(self):
        print(self.msg)
        
with Demo('hello world!') as d:
    d.hello()

Results:

creating
entering
hello world!
exiting
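
The "no matter how it goes out of scope" part includes exceptions: __exit__ still runs if the body raises, and it receives the exception type, value, and traceback. Reusing the Demo class above:

with Demo('hello world!') as d:
    raise RuntimeError("boom")

This still prints "creating", "entering", and "exiting" before the RuntimeError propagates. Returning True from __exit__ would suppress the exception instead.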

 

 


Set up Spark 2 on Ubuntu 14 and run it with Python

 

These instructions worked in 2017.

Ubuntu 14.04 / Spark 2.1.0 / Python

Download Spark, then extract it and move it into your home directory:

tar xzf spark-x.x.x-bin-hadoopx.x.tgz
mv spark-x.x.x-bin-hadoopx.x ~/spark-x.x.x-bin-hadoopx.x

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install python-pip
sudo pip install py4j

Add the following to ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export SPARK_HOME=/home/your-user-name/spark-x.x.x-bin-hadoopx.x/
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

source ~/.bashrc
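
A quick sanity check that the pyspark package is now importable (this just imports it and prints a message):

python -c "from pyspark import SparkContext; print('pyspark OK')"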

Now you are ready to run something, for example word count.
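
A minimal version of that might look like the following; input.txt is just a placeholder path, so point sc.textFile at any text file you have:

from pyspark import SparkContext

sc = SparkContext("local", "WordCount")
counts = (sc.textFile("input.txt")
            .flatMap(lambda line: line.split())   # split each line into words
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))     # sum the counts per word
for word, count in counts.collect():
    print(word, count)
sc.stop()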

 

An audio dataset and IPython notebook for training a convolutional neural network to distinguish the sound of foosball goals from other noises using TensorFlow

tl;dr:


https://github.com/dk1027/ConvolutionalNeuralNetOnFoosballSounds
IPython notebook: CNN on Foosball sounds.ipynb
Trained CNN model using TensorFlow: model.ckpt
Pickled Pandas dataframe: full_dataset_44100.pickle

Abstract

I set up mics at the foosball table and recorded a few hours of foosball games. The audio files were labelled by hand and then segmented into one-second clips of goals and other noises. Mel spectrograms were created from the clips, and about 200 samples were created and used for training, testing, and validation, resulting in 5% error on the test data.

Data collection and labelling

I used a Zoom H5 XY stereo mic, a Shure SM57, and a few other mics for recording. Each mic had its own character, and they were placed at different locations around the table, for instance pointing close to a goalie, high above the table pointing downward, or from one side of the table pointing at the goalie on the far side. There might be enough differences between the tracks to improve the model. The audio was recorded in 16-bit WAV format at 44.1kHz.

The audio files were labelled manually by playing each file in Audacity at 2x speed and adding a label on the timeline whenever a goal was heard. Audacity has a function to export the labels to a text file. There was typically half a second to two seconds of lag between my labels and the actual foosball goal, so I did a second pass through each label to fine-adjust it. Adjusting the labels allowed me to use a shorter audio clip as the input to the model later.

For the non-goal samples, I simply took clips from the midpoint between two goals.

When I recorded the audio, I adjusted the gains so that each mic was at more or less the same level. I did not do any post-processing, not even noise reduction.

Notes on dealing with audio data in Python

As mentioned earlier, the audio was recorded in 16-bit WAV format at a sample rate of 44.1kHz. 44.1kHz means the sound is sampled 44100 times per second, and each sample represents the amplitude of the sound wave at that instant. 16-bit is the bit depth of the samples.

Using librosa to load audio data in Python:

import librosa

# librosa resamples to 22050 Hz by default; sr=None keeps the native 44.1 kHz rate
y, sr = librosa.core.load("path_to_file", sr=None)

y is a numpy array of the audio data. sr is the sample rate.

Since sample rate is the number of samples per second, this returns a segment between 00:01 and 00:02:

segment = y[1*sr:2*sr]

Then we can create an audio control to play the clip in an IPython notebook:

import IPython.display
from IPython.display import Audio
# rate must be given when passing a raw numpy array of samples
IPython.display.display(Audio(segment, rate=sr))

I wrote some code to read the timestamps and segment the original audio files. I have also pushed the final resulting dataset to GitHub.
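
That code is not reproduced here, but a minimal sketch of the idea, assuming Audacity's tab-separated label export format and a fixed one-second clip length (the function and variable names below are made up for illustration), could look like this:

import librosa

def segment_audio(audio_path, labels_path, clip_seconds=1.0):
    # Load at the native sample rate instead of librosa's 22050 Hz default
    y, sr = librosa.core.load(audio_path, sr=None)
    clips = []
    with open(labels_path) as f:
        for line in f:
            # Audacity label exports are tab-separated: start time, end time, label text
            fields = line.rstrip("\n").split("\t")
            start = float(fields[0])
            label = fields[2] if len(fields) > 2 else ""
            begin = int(start * sr)
            clips.append((label, y[begin:begin + int(clip_seconds * sr)]))
    return clips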

I used librosa to create additional features such as mel spectrograms, which seem to work better than the raw waveform for training a neural network.

See https://bmcfee.github.io/librosa/generated/librosa.feature.melspectrogram.html
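
A minimal example of turning a one-second clip into a mel spectrogram; segment and sr are the variables from the loading code above, and the log scaling is my addition (a common step before feeding spectrograms to a CNN):

import librosa

# Power spectrogram mapped onto the mel scale
S = librosa.feature.melspectrogram(y=segment, sr=sr)
# Convert power to decibels for a more CNN-friendly dynamic range
log_S = librosa.power_to_db(S)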

Loading the ready-to-use dataset

Load full_dataset_44100.pickle:

import pandas
ds = pandas.read_pickle("full_dataset_44100.pickle")

Play a clip:

IPython.display.display(IPython.display.Audio(data=ds.iloc[0]["data"], rate=44100))
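
To get a quick look at what is in the dataframe (this assumes nothing beyond it being a regular pandas DataFrame):

print(ds.shape)
print(ds.columns)
print(ds.iloc[0])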

Building the convolutional neural network

I used the TensorFlow MNIST example as my template, but instead of doing mini-batches, I used the entire training set for each iteration because there are only 160 samples in the training set. The first neural net I built used the 11.025kHz waveform as the input but had 15% error on test data. Then I trained another one using mel spectrograms as the input, which yielded better results: after training for 200 iterations, the CNN had 5% error on the test data. It took 15 minutes to train the model on my laptop with an i5-6200U CPU @ 2.30GHz and 8GB of RAM, in a Docker container in a VirtualBox Ubuntu VM on a Windows 10 host!

The network has two convolution layers, each followed by ReLU and max pooling, then two fully connected layers, with softmax as the output.
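
A sketch of that structure in TensorFlow 1.x, in the style of the MNIST tutorial. The input size, filter sizes, layer widths, and the loss/optimizer choice here are illustrative guesses, not the exact values from the notebook; X_train and Y_train stand for the training spectrograms and one-hot labels, which are prepared elsewhere:

import tensorflow as tf

def conv_pool(x, w_shape):
    # One convolution -> ReLU -> 2x2 max-pooling block
    W = tf.Variable(tf.truncated_normal(w_shape, stddev=0.1))
    b = tf.Variable(tf.constant(0.1, shape=[w_shape[-1]]))
    h = tf.nn.relu(tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') + b)
    return tf.nn.max_pool(h, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# Illustrative input: a 128x44 mel spectrogram with one channel, two output classes
x = tf.placeholder(tf.float32, [None, 128, 44, 1])
y_ = tf.placeholder(tf.float32, [None, 2])

h1 = conv_pool(x, [5, 5, 1, 32])
h2 = conv_pool(h1, [5, 5, 32, 64])   # 128x44 -> 64x22 -> 32x11 after the two poolings

flat = tf.reshape(h2, [-1, 32 * 11 * 64])
W1 = tf.Variable(tf.truncated_normal([32 * 11 * 64, 1024], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[1024]))
fc1 = tf.nn.relu(tf.matmul(flat, W1) + b1)

W2 = tf.Variable(tf.truncated_normal([1024, 2], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[2]))
logits = tf.matmul(fc1, W2) + b2
y = tf.nn.softmax(logits)            # goal vs. not-a-goal

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(200):
        # Full-batch training: the whole 160-sample training set on every iteration
        sess.run(train_step, feed_dict={x: X_train, y_: Y_train})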

The IPython notebook is available at https://github.com/dk1027/ConvolutionalNeuralNetOnFoosballSounds along with the pickled dataset and the original recording encoded in mp3 format.
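
The trained weights in model.ckpt can be loaded back with a TensorFlow Saver, roughly like this (assuming a graph like the sketch above has been rebuilt first; X_test is a placeholder name for your test spectrograms):

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "model.ckpt")
    predictions = sess.run(y, feed_dict={x: X_test})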

Improvements

There are some more foosball recordings that I have not labelled yet; those could also be used. I would appreciate it if anyone wants to label those foosball games.

The full_dataset_44100.pickle only uses the TrLR track (i.e. the XY stereo mics), and it was prepared from the original WAV files rather than the mp3s. I am interested to know whether training with the Tr1 and Tr2 tracks as well would provide any improvement. They recorded the same foosball games, but the mics were placed at different positions and pointed in different directions.

 

Mesos framework example in Python

The example I am presenting here is built on this blog post by James J Porter. James’ example runs a shell command to echo Hello World. Now, let’s say we want to do some computation and the framework needs to tell the slaves to get the binaries/data from somewhere. How do we do that?

The answer lies in the mesos.proto file:

/**
 * Describes a command, executed via: '/bin/sh -c value'. Any URIs specified
 * are fetched before executing the command.  If the executable field for an
 * uri is set, executable file permission is set on the downloaded file.
 * Otherwise, if the downloaded file has a recognized archive extension
 * (currently [compressed] tar and zip) it is extracted into the executor's
 * working directory. This extraction can be disabled by setting `extract` to
 * false. In addition, any environment variables are set before executing
 * the command (so they can be used to "parameterize" your command).
 */
message CommandInfo {
  message URI {
    required string value = 1;
    optional bool executable = 2;
    optional bool extract = 3 [default = true];
  }

Mesos takes care of downloading and extracting the files; all we need to do is state where to get them from. Therefore, compose the task message like this (in Python):


task = new_task(offer)
uri = task.command.uris.add()
uri.value = "path-to-file"
task.command.value = "command to-run-something"

What exactly do we put as the URI? Looking at the Mesos source code, it turns out the URI can point to a local file, HDFS, HTTP, FTP, etc.
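
So the task could be composed like this, for example; the host, port, and script name below are made up for illustration, and new_task is the helper from James' example:

task = new_task(offer)
uri = task.command.uris.add()
# Hypothetical: fetch a script over HTTP; Mesos downloads it into the executor's working directory
uri.value = "http://some-host:8000/sum.py"
task.command.value = "python sum.py"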

Based on this newfound knowledge, I put some bells and whistles on the hello world framework (actually, I took the hello world part away). The example framework here tells the slaves to talk to a web service to get a Python script, use the script to sum up a few numbers, and then send the results back to the web service. The web service prints out the results. When it is done, the framework stops.
