Retraining Inception-v3 neural network for a new task with Tensorflow

This post is a work log for taking a pre-trained Inception-v3 network and repurpose it to colorize a grey scale image. The idea is based on this paper.


  • Prepare the dataset: Convert training images from JPEG to HSV values. The input is V and target is HS.
  • Train a scene classification network using the Places365 data. Use a pre-trained Inception-v3 image classification model.
  • Fuse the classification network feature with mid-level feature layers to become “Fusion Layer”, then build a colorization network.
  • Compute MSE between the predicted HS values and the actual HS values
  • Generative adversarial network to generate realistic looking colorized images

[log][20170131] I did not end up training the hue and saturation. I was, however, able to generate colorized images although the images are colored green-blue-brownish and desaturated. Apparently this is a common problem. I am now training a deep convolutional generative adversarial network (DCGAN) so that the colorization part uses a discriminator network instead of a mean squared error as the cost function.

[log][20170126] I left the network to train for almost 2 days. The training errors are going up and down, and it colored the image all green. I don’t want to make the network more complex yet, so I just added batch normalization layers, tweaked the activation functions, and retrained. I left it to train overnight and this morning its training error is already lower than before. This morning I have a new idea. Perhaps I should have one branch to train the hue and another branch to train the saturation. They can share the fusion layer but have different colorization layers.

[log][20170124] I have built the colorization model according to the paper. However our models are not identical because for the classification part I am using Inception-v3, therefore the shape of the low level feature and mid level feature are different from that used in the paper. Initially my training error was not going down, that’s because of a implement error and I have set the learning rate too high. After I addressed those issues the training error is steadily going down.

[log][20170110] inception/slim/ defines the inception model.

[log][20170108] inception-v3 takes an arbitrarily sized image, crop it 87.5%, resize back to original size then resize down to 299×299.

[log][20170108] Realized that the downloaded inception model is hardcoded to have batch size of 1. That won’t do for training. It is not feasible to change it. I found a newer version of inception saved in March 2016 that uses checkpoint instead of pb. That should be able to let me change the batch size arbitrarily.

[todo][20170107] figure out how to connect the pre-trained model to another network

[log][20170107] I played with inception. Since my input will be grayscale, the greatest concern is whether inception plays well with grayscale. Yes, it still classify a cat as a cat. Good enough.

[todo][20170103] TensorFlow inception is already an image classifier. Can I take this one and use it? I was working on my own data loader and model because Places365’s pre-built models were not built with TensorFlow. However, inception is trained on color images, not greyscale. That said, can I take the code and make my own modifications to adapt it to greyscale images, then train from scratch? Probably much faster than building everything from scratch on my own. Seems like the answer is yes.

[todo][20170103] Improve jpeg->training data performance. Utilize more CPU? Seems like this is not necessary. TensorFlow can process images in a background thread while training. See: Reading Data.

[log][20170103] Wondered about reading from jpeg then convert to inputs and targets each time vs converts all jpeg and write all the results to disk, then load a batch at a time during training. Did some quick calculations: input = 256*256*1*4 bytes, target = 128*128*2*4 bytes. Roughly 630GB for the 1.6 million images data set. This is just not feasible with my hardware without spending more money! Instead, I need to spend more time to figure out how to convert the jpeg to training data faster. Currently on my laptop, it takes 140 seconds to convert 5000 images. I noticed that only 30% of the CPU was used while it is doing the conversion.

[todo][20170103] Figure out what’s the best way to persist the preprocessed X and Y – not going to do this due to disk space limitation. See log above.

[log][20170103] Compared original vs upsampled vs only upsampled HS channels combined with full res V channel. As expected, visually, upsampling only the HS channels shows tolerable visual degradation. DataProcessor now downsamples HS channels to config.output_size.

[todo][20170103] Test run preprocessing a large batch of training jpg images.

[todo][20170102] Verify that my X and Y outputs are correct. Put them back together, convert back to RGB, and see that image looks right?

[todo][20170102] Downsample Y. In the paper they predict the color values at a lower resolution then upsample it.

[todo][20170102] Remove hardcoded limit in Currently hardcoded to only list 10 files.

[log][20170102] Decompressing the dataset took a long time… Found an index of categories on places365 github. Worked on DataPreprocessor: loading jpg to rgb then convert to HSV and split into inputs (V) and targets (HS).

[todo][20161231] Pickle the dataset – not going to do this. Not feasible due to disk limitation. See log on 20160103

[log][20161231] I have decided to use the “small” dataset from Places365. It is still going to be 24GB for the training data. In the paper, they converted images to LAB. However, I need an efficient way to create my dataset, and TensorFlow already provide methods to convert colorspace between RGB and HSV, and to decode JPEG. I want to focus on building the neural net so this seems a reasonable compromise.


Save and restore TensorFlow session

This blog post is written when TensorFlow was version r0.12.

Stackoverflow and the TensorFlow documentations are pretty clear that Saver is what you want, but it is less clear how to use it in the code. It boils down to this:

  1. Define model
  2. Create session
  3. Initialize variables
  4. If restoring, call saver.restore, passing in the session and path to the directory containing checkpoint files
  5. When saving, call, passing in the session and the path to the save directory
# ... code to define model omitted ...
# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()
with tf.Session() as sess:
    if restore:
        ckpt = tf.train.get_checkpoint_state(save_path)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        # ... training code omitted ..., save_path)

What if you want to train on one computer and restore it on a different computer? You will get an error: open the file named “checkpoint” with a text editor and edit the paths to the new path.

Here is a working example. Credits to the original author Aymeric Damien. I forked the code and added my edits to save/restore.



An audio dataset and IPython notebook for training a convolutional neural network to distinguish the sound of foosball goals from other noises using TensorFlow


IPython notebook: CNN on Foosball sounds.ipynb
Trained CNN model using TensorFlow: model.ckpt
Pickled Pandas dataframe: full_dataset_44100.pickle


I setup mics at the foosball table and recorded a few hours of foosball games. The audio files were labelled by hand and then segmented into one-second clips of goals / other noises. Mel spectrograms were created from the clips and about 200 samples were created and used for training, testing, and validation, resulting in 5% error on test data.

Data collection and labelling

I used a Zoom H5 XY stereo mic, a Shure SM57, and a few other mics for recording. Each mic had its own characteristic and they were placed at different locations around the table, for instance, pointing close to a goalie, high above the table pointing downward, or from one side of the table pointing at a goalie at the far side. There might be enough differences between each track to improve the model. The audio was recorded in 16-bit wav format at 44.1kHz.

The audio files were labelled manually by playing each file in Audacity at 2x speed and then a label was added on the timeline whenever a goal was heard. Audacity has a function to export the labels to a text file. There was typically half a second to two seconds lag between my labels and the actual foosball goal, so I did a second pass through each label to fine-adjusted them. Adjusting the labels allowed me to use a shorter audio clip as the input to the model later.

For the non-goal samples, I simply took the middle of two goals.

When I recorded the audio, I adjusted the gains such that each mic is more or less at the same level. I did not do any post processing, not even noise reduction.

Notes on dealing with audio data in Python

As mentioned earlier the audio was recorded in 16-bit wav format at sample rate 44.1kHz. 44.1kHz means sound is sampled 44100 times per second. Each sample represents the amplitude of the sound wave at that instance. 16-bit is the bit depth of the samples.

Using librosa to load audio data in Python:

import librosa
y, sr = librosa.core.load("path_to_file")

y is a numpy array of the audio data. sr is the sample rate.

Since sample rate is the number of samples per second, this returns a segment between 00:01 and 00:02:

segment = y[1*sr:2*sr]

Then we can create an audio control to play the clip in IPython notebook:

import IPython.display
from IPython.display import Audio

I wrote some code to read the timestamps and segment the original audio files. I have also pushed the final resulting dataset to GitHub.

I used librosa to create some additional features such as mel spectrograms. It seems to work better than the original waveform for training a neural network.


Loading the ready-to-use dataset

Load full_dataset_44100.pickle:

ds = pandas.read_pickle("full_dataset_44100.pickle")

Play a a clip:

IPython.display.display(IPython.display.Audio(data=ds.iloc[0]["data"], rate=44100))

Building the convolutional neural network

I used the TensorFlow MNIST example as my template but instead of doing mini batches, I used the entire training set for each iteration because there are only 160 samples in the training set. The first neural net I built used 11.025kHz waveform as the input but had 15% error on test data. Then I trained another one using mel spectrograms as the input which yielded better results. After training for 200 iterations, the CNN had 5% error on the test data. It took 15 minutes to train the model on my laptop with i5-6200U CPU @2.30GHz and 8GB ram, in a docker container in a VirtualBox Ubuntu VM in a Windows 10 host!

There are two convolution layers / ReLU / max pooling, two fully connected layers, and softmax as the output.

The IPython notebook is available at along with the pickled dataset and the original recording encoded in mp3 format.


There are some more foosball recordings which I have not labelled. Those data could be used. Would appreciate it if anyone would like to label those foosball games.

The full_dataset_44100.pickle only uses the TrLR track (i.e. the XY stereo mics) and they were prepared from the original WAV files instead of the mp3s. I am interested to know if it would provide any improvement by training with the Tr1 and Tr2 tracks too. They recorded the same foosball games but the mics were placed at different positions and pointed at different directions.


How to setup a TensorFlow Cuda/GPU-enabled dev environment in a Docker container

Getting TensorFlow to run in a Docker container with GPU support is no easy task. I think I have it figured out. Here are my steps to create a Docker image. I have Ubuntu 14 hosting a Ubuntu 14 Docker container.

Install video card (I have a Nvidia GTX 980) Note that Ubuntu runs an open source driver, we want the Nvidia driver.

Sign up for CUDA Developer (Nvidia takes one day to approve it!)

Install Nvidia video driver and  Cuda Tool Kit

ctrl+alt+f1 (to switch to tty1, in order to stop X server to install Cuda)
sudo service lightdm stop
sudo ./
sudo service lightdm start

Install cuDNN v4

tar -xzvf [cudnn tgz file]
cd [cudnn directory]
sudo cp lib* /usr/local/cuda/lib64/ 
sudo cp cudnn.h /usr/local/cuda/include/

Pull a Docker Ubuntu 14 image

docker pull ubuntu

Create directory for TensorFlow source

I created it in /home/dk1027/git on my host. I also used this to transfer files between my host and the docker container.

Run the docker container

export CUDA_SO=$(\ls /usr/lib/x86_64-linux-gnu/libcuda* | xargs -I{} echo '-v {}:{}')
export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
docker run -it --device /dev/mem:/dev/mem -v /lib/modules:/lib/modules --cap-add=ALL --privileged $CUDA_SO $DEVICES -v /home/dk1027/git:/git -v /usr/local/cuda:/usr/local/cuda ubuntu /bin/bash

Some explanations:

$CUDA_SO and $DEVICES exposes the GPU and CUDA .so files to the docker container.

/usr/local/cuda is where I installed cudnn and cuda tool kit.

–device /dev/mem:/dev/mem -v /lib/modules:/lib/modules –cap-add=ALL –privileged made it possible to build the example trainer and to run it later, otherwise will run into this error: modprobe: ERROR: ../libkmod/libkmod.c:556 kmod_search_moddep() could not open moddep file '/lib/modules/3.19.0-25-generic/modules.dep.bin'

What follows happens within the container

Install dependencies for building TensorFlow and the example trainer

apt-get install build-essential
apt-get install python-numpy swig python-dev 
apt-get install zlib1g-dev

Following “Install from source” section:

Setup Bazel

apt-get install software-properties-common 
apt-get update 
apt-get install oracle-java8-installer
apt-get install unzip (because the bazel installer needs it)

Downloaded (because that is the version TensorFlow instructions asked to install)

Download the latest bazel (1.5) … otherwise it doesn’t build

chmod +x
./ --user
export PATH="$PATH:$HOME/bin"

Build TensorFlow and example trainer

cd /git/TensorFlow

Follow instructions on the TensorFlow page for ./configure

Not sure why I had an compiler internal seg fault error. Retried and succeeded.

Build example trainer

bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu

It got stuck allocating memory, I ran nvidia-smi and saw that GPU memory was used. However it seg fault after a while.

Tried again, took a couple minute, and then it finishes successfully!

Create the pip package and install

sudo apt-get install python-pip python-dev
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.7.0-cp27-none-linux_x86_64.whl
pip uninstall protobuf
pip install protobuf==3.0.0b2 (

Train something!

cd /git/tensorflow/tensorflow/models/image/mnist

Go to the host, on another terminal, run this command. Observe that Python is using the GPU now!

watch nvidia-smi
Fri Feb 19 01:00:40 2016       
| NVIDIA-SMI 352.39     Driver Version: 352.39         |                       
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce GTX 980     Off  | 0000:01:00.0      On |                  N/A |
| 33%   39C    P2   101W / 390W |   3868MiB /  4088MiB |     82%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|    0      1265    G   /usr/bin/X                                     109MiB |
|    0      2104    G   compiz                                          12MiB |
|    0     10751    C   python                                        3728MiB |

It maybe a good idea to commit your container at this point.

Update May 2, 2016

To update from source:

git pull --recurse-submodules
git submodule update --recursive
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-VERSION-none-linux_x86_64.whl
pip uninstall protobuf
pip install protobuf==3.0.0b2

Resolved: Error building tensorflow from source in docker container

I followed all the instructions in this page but encountered two errors.

Running with verbose mode gives more information on the errors, i.e.:

bazel build -c opt //tensorflow/cc:tutorials_example_trainer --verbose_failures

The first was permission denied:

mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL): Permission denied

The solution is to run the docker container with the –privileged flag. I am not sure exactly which privileged is needed.

Ran again and encountered the second error. It requires zlib which I have not installed in my image.

apt-get install zlib1g-dev

Afterwards I am able to finish compiling.