Practical things I have learned from doing deep learning

In an earlier project I used parts of a pre-trained large scale classification network to train a CNN for automatic colorization. Here are a few lessons I have learned to make myself more efficient.

  1. Prefer many short development iterations over long cycles
    Quickly build something and train it. It is almost always better to build a prototype quickly, get some feedback on it, and iterate many times than to spend a lot of effort building something great and shiny that I wouldn’t get any feedback on until much later. This is especially true if the task is not well understood and requires experimentation.
  2. Try small models. Try lots of models.
    This follows from #1. For example, if the goal is to train a model that generates images, try models that generate low-res images first, because a small model can be trained much faster (a larger batch can be used). Try many ideas. Once you find something that works well, you can even transfer part of your low-res model to train your high-res model faster.
  3. Make yourself 10x more efficient by writing reusable code
    Chances are this is not the last model you train. Identify work that is common and write code that can be reused for your next model, so that next time you have less work to do. Examples: visualizing data, preprocessing data, saving and loading data, and helper functions that build parts of a model. Not only are you going to spend less time writing code next time, you will spend a lot less time debugging too.
  4. Tune learning rate early on
    Sometimes the training error goes up and down right off the bat and doesn’t go down much, if at all. When that happens, reduce the learning rate by an order of magnitude. Play with the learning rate a few times before changing the model.
  5. Use batch normalization
    I am not going to explain batch norm here. Look it up and use it. Often the model trains a lot faster with batch norm although it uses more memory.
  6. Visualize
    Plotting error curves helps. It is even better if you can visualize the results to diagnose problems you might encounter. For example, my automatic colorization model painted everything green in the beginning, and looking at the errors wouldn’t have helped me.
  7. Determine if you need more data or a more complex model
    If the training error is high, then the model is not fitting the training data, and more data is not going to help. Try to tweak the model.
    If the training error is low but the test error is high, then more data may help.
    If the training error is low and the test error is low, then you are done 🙂
  8. Examine the training data
    Is the training data useful to begin with? For example, in my automatic colorization project, I noticed that some of my images are black and white! That wouldn’t help with colorization.
  9. Pause and think
    It is tempting to dive right into the code. Resist that temptation. It is way easier to design and change the model on paper than to do it in code. Same goes for diagnosing problems. Once you have done the thinking, it is a piece of cake to transcribe the ideas into code.
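
The rule of thumb in #7 can be written down as a tiny helper. This is only a sketch: the function name `diagnose` and the `target_error` threshold are made up for illustration, and what counts as a "low" error depends entirely on the task.

```python
def diagnose(train_error, test_error, target_error):
    """Suggest a next step from training and test error (rule of thumb only)."""
    if train_error > target_error:
        # The model cannot even fit the data it has seen.
        return "tweak the model"
    if test_error > target_error:
        # Fits the training data but does not generalize.
        return "get more data"
    return "done"


print(diagnose(train_error=0.30, test_error=0.35, target_error=0.05))
print(diagnose(train_error=0.02, test_error=0.20, target_error=0.05))
print(diagnose(train_error=0.02, test_error=0.03, target_error=0.05))
```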

Setup Spark 2 on Ubuntu 14 and run with Python

 

These instructions worked in 2017.

Ubuntu 14.04 / Spark 2.1.0 / Python

Download Spark
tar xzf spark-x.x.x-bin-hadoopx.x.tgz
mv spark-x.x.x-bin-hadoopx.x ~/spark-x.x.x-bin-hadoopx.x

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install python-pip
sudo pip install py4j

Add the following to ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export SPARK_HOME=/home/your-user-name/spark-x.x.x-bin-hadoopx.x/
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH

source ~/.bashrc

Now you are ready to run something, for example word count.

 

Retraining Inception-v3 neural network for a new task with Tensorflow

This post is a work log for taking a pre-trained Inception-v3 network and repurposing it to colorize a grayscale image. The idea is based on this paper.

Plan:

  • Prepare the dataset: Convert training images from JPEG to HSV values. The input is V and target is HS.
  • Train a scene classification network using the Places365 data. Use a pre-trained Inception-v3 image classification model.
  • Fuse the classification network feature with mid-level feature layers to become “Fusion Layer”, then build a colorization network.
  • Compute MSE between the predicted HS values and the actual HS values
  • Generative adversarial network to generate realistic looking colorized images
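
To make the first and fourth steps of the plan concrete, here is a minimal per-pixel sketch using only Python's standard `colorsys` module. It is not the TensorFlow pipeline used in the project: the helper names `split_pixel`, `merge_pixel` and `mse` are hypothetical, and a real implementation would work on whole image tensors.

```python
import colorsys


def split_pixel(r, g, b):
    """Split one RGB pixel (floats in [0, 1]) into input V and target (H, S)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return v, (h, s)


def merge_pixel(v, hs):
    """Recombine V with (H, S) and convert back to RGB."""
    h, s = hs
    return colorsys.hsv_to_rgb(h, s, v)


def mse(predicted, actual):
    """Mean squared error over two equal-length lists of (H, S) pairs."""
    total = 0.0
    count = 0
    for (p_h, p_s), (a_h, a_s) in zip(predicted, actual):
        total += (p_h - a_h) ** 2 + (p_s - a_s) ** 2
        count += 2
    return total / count


v, hs = split_pixel(1.0, 0.5, 0.0)     # an orange pixel
print(v, hs)                           # V is the network input, HS the target
print(merge_pixel(v, hs))              # should be close to the original RGB
print(mse([(0.5, 0.5)], [(0.0, 1.0)]))
```

Putting V and HS back together and checking that the RGB values survive the round trip is a cheap sanity check on the preprocessing.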

[log][20170131] I did not end up training the hue and saturation. I was, however, able to generate colorized images although the images are colored green-blue-brownish and desaturated. Apparently this is a common problem. I am now training a deep convolutional generative adversarial network (DCGAN) so that the colorization part uses a discriminator network instead of a mean squared error as the cost function.

[log][20170126] I left the network to train for almost 2 days. The training errors are going up and down, and it colored the image all green. I don’t want to make the network more complex yet, so I just added batch normalization layers, tweaked the activation functions, and retrained. I left it to train overnight and this morning its training error is already lower than before. This morning I have a new idea. Perhaps I should have one branch to train the hue and another branch to train the saturation. They can share the fusion layer but have different colorization layers.

[log][20170124] I have built the colorization model according to the paper. However, our models are not identical: for the classification part I am using Inception-v3, so the shapes of the low-level and mid-level features are different from those used in the paper. Initially my training error was not going down because of an implementation error and because I had set the learning rate too high. After I addressed those issues, the training error has been going down steadily.

[log][20170110] inception/slim/inception_model.py defines the inception model.

[log][20170108] Inception-v3 takes an arbitrarily sized image, crops it to 87.5%, resizes it back to the original size, then resizes it down to 299×299.
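
As a rough illustration of the 87.5% crop, the sketch below computes the crop rectangle for a given image size. It assumes the crop is centred and rounds to the nearest pixel; both are my assumptions, the function name `central_crop_box` is hypothetical, and the actual preprocessing code may round differently. The two resize steps are not shown.

```python
def central_crop_box(height, width, fraction=0.875):
    """Return (top, left, crop_height, crop_width) for a centred crop."""
    crop_h = int(round(height * fraction))
    crop_w = int(round(width * fraction))
    top = (height - crop_h) // 2
    left = (width - crop_w) // 2
    return top, left, crop_h, crop_w


print(central_crop_box(256, 256))  # (16, 16, 224, 224)
print(central_crop_box(480, 640))  # (30, 40, 420, 560)
```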

[log][20170108] Realized that the downloaded inception model is hardcoded to have a batch size of 1. That won’t do for training, and it is not feasible to change it. I found a newer version of inception, saved in March 2016, that uses a checkpoint instead of a pb file. That should let me change the batch size arbitrarily.

[todo][20170107] figure out how to connect the pre-trained model to another network

[log][20170107] I played with inception. Since my input will be grayscale, the greatest concern is whether inception plays well with grayscale. Yes, it still classifies a cat as a cat. Good enough.

[todo][20170103] TensorFlow inception is already an image classifier. Can I take this one and use it? I was working on my own data loader and model because Places365’s pre-built models were not built with TensorFlow. However, inception is trained on color images, not greyscale. That said, can I take the code and make my own modifications to adapt it to greyscale images, then train from scratch? Probably much faster than building everything from scratch on my own. Seems like the answer is yes.

[todo][20170103] Improve jpeg->training data performance. Utilize more CPU? Seems like this is not necessary. TensorFlow can process images in a background thread while training. See: Reading Data.

[log][20170103] Weighed two options: decoding each JPEG and converting it to inputs and targets on every read, versus converting all the JPEGs once, writing the results to disk, and loading a batch at a time during training. Did some quick calculations: input = 256*256*1*4 bytes, target = 128*128*2*4 bytes. Roughly 630GB for the 1.6 million image data set. This is just not feasible with my hardware without spending more money! Instead, I need to spend more time figuring out how to convert the JPEGs to training data faster. Currently on my laptop, it takes 140 seconds to convert 5000 images. I noticed that only 30% of the CPU was used during the conversion.
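
The back-of-the-envelope numbers in this log entry can be reproduced in a few lines. The sketch assumes 4-byte floats, decimal gigabytes, and exactly 1.6 million images; the last part extrapolates the laptop conversion rate to the whole data set, which is my own addition and assumes the rate stays constant.

```python
BYTES_PER_FLOAT = 4
NUM_IMAGES = 1600000  # "1.6 million", taken as an exact figure here

input_bytes = 256 * 256 * 1 * BYTES_PER_FLOAT   # V channel
target_bytes = 128 * 128 * 2 * BYTES_PER_FLOAT  # H and S channels, downsampled
per_image = input_bytes + target_bytes
total_gb = per_image * NUM_IMAGES / 1e9

# 140 seconds for 5000 images, extrapolated to the full data set.
seconds_per_image = 140.0 / 5000
total_hours = seconds_per_image * NUM_IMAGES / 3600

print(per_image)               # 393216 bytes per image
print(round(total_gb))         # 629, i.e. roughly 630GB
print(round(total_hours, 1))   # 12.4 hours for one full conversion pass
```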

[todo][20170103] Figure out what’s the best way to persist the preprocessed X and Y – not going to do this due to disk space limitation. See log above.

[log][20170103] Compared original vs upsampled vs only upsampled HS channels combined with full res V channel. As expected, visually, upsampling only the HS channels shows tolerable visual degradation. DataProcessor now downsamples HS channels to config.output_size.

[todo][20170103] Test run preprocessing a large batch of training jpg images.

[todo][20170102] Verify that my X and Y outputs are correct. Put them back together, convert back to RGB, and see that image looks right?

[todo][20170102] Downsample Y. In the paper they predict the color values at a lower resolution then upsample it.

[todo][20170102] Remove hardcoded limit in Utils.py. Currently hardcoded to only list 10 files.

[log][20170102] Decompressing the dataset took a long time… Found an index of categories on the Places365 GitHub. Worked on DataPreprocessor: loading JPEGs as RGB, then converting to HSV and splitting into inputs (V) and targets (HS).

[todo][20161231] Pickle the dataset – not going to do this. Not feasible due to disk limitation. See log on 20170103.

[log][20161231] I have decided to use the “small” dataset from Places365. It is still going to be 24GB for the training data. In the paper, they converted images to LAB. However, I need an efficient way to create my dataset, and TensorFlow already provides methods to convert between the RGB and HSV colorspaces and to decode JPEG. I want to focus on building the neural net, so this seems a reasonable compromise.

Save and restore TensorFlow session

This blog post was written when TensorFlow was at version r0.12.

Stack Overflow and the TensorFlow documentation are pretty clear that Saver is what you want, but it is less clear how to use it in the code. It boils down to this:

  1. Define model
  2. Create session
  3. Initialize variables
  4. If restoring, call saver.restore, passing in the session and the checkpoint path (tf.train.get_checkpoint_state looks it up from the directory containing the checkpoint files)
  5. When saving, call saver.save, passing in the session and the path to the save directory
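import tensorflow as tf

# ... code to define model omitted ...
# restore (bool) and save_path (str) are assumed to be set by the caller

# Op that initializes all variables (r0.12 name; it replaces the
# deprecated tf.initialize_all_variables)
init = tf.global_variables_initializer()
# Create the Saver after the model's variables have been defined
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    if restore:
        # Look up the most recent checkpoint recorded under save_path
        ckpt = tf.train.get_checkpoint_state(save_path)
        if ckpt and ckpt.model_checkpoint_path:
            # Overwrite the freshly initialized variables with the saved values
            saver.restore(sess, ckpt.model_checkpoint_path)
    else:
        # ... training code omitted ...
        saver.save(sess, save_path)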

What if you want to train on one computer and restore on a different computer? You will get an error, because the paths recorded in the file named “checkpoint” still point to the old location. To fix it, open that file with a text editor and change the paths to the new location.
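
If you move checkpoints often, the manual edit can be scripted. The sketch below is plain text substitution and does not use any TensorFlow API; the helper name `repoint_checkpoint`, the example paths, and the two-line layout of the demo file are assumptions for illustration, so check them against your own “checkpoint” file before relying on it.

```python
import os
import tempfile


def repoint_checkpoint(checkpoint_file, old_dir, new_dir):
    """Replace old_dir with new_dir everywhere in the text 'checkpoint' file."""
    with open(checkpoint_file) as f:
        text = f.read()
    with open(checkpoint_file, "w") as f:
        f.write(text.replace(old_dir, new_dir))


# Demo on a throwaway file that mimics the assumed layout.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "checkpoint")
with open(demo_file, "w") as f:
    f.write('model_checkpoint_path: "/home/alice/run1/model.ckpt"\n')
    f.write('all_model_checkpoint_paths: "/home/alice/run1/model.ckpt"\n')

repoint_checkpoint(demo_file, "/home/alice/run1", "/home/bob/run1")
with open(demo_file) as f:
    print(f.read())
```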

Here is a working example. Credits to the original author Aymeric Damien. I forked the code and added my edits to save/restore.

 

 

Notes on Recurrent Neural Networks

Recurrent neural nets have state, unlike feed-forward networks. An analogy for an RNN is the C strtok function, where calling it with the same parameter typically yields a different value (but of course, unlike strtok, an RNN does not modify the input). An analogy for a feed-forward network is a function in the mathematical sense, where y=f(x) regardless of how many times it is called.

At first I thought what makes RNN special is that it uses its own output as part of its input. While that’s true, after more reading, it seems that the magic really is the cell state. The cell state in an RNN is updated each time it processes the input. Using the strtok analogy, it is like how strtok updates its internal position of the last token each time strtok is called, so the next time you call it, it returns the next token.

So RNN is like a program whereas a feed-forward network is like a function.
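
The difference is easy to see in a toy example. The sketch below is a deliberately simplified scalar recurrent cell, not an LSTM and not trained: the class name `TinyRNNCell` and the weights are arbitrary choices for illustration, and the output is simply the state itself.

```python
import math


def feed_forward(x, w=0.5):
    """Stateless: the same x always gives the same y."""
    return math.tanh(w * x)


class TinyRNNCell(object):
    """Stateful: the output depends on x and on everything seen before."""

    def __init__(self, w_x=0.5, w_h=0.8):
        self.w_x = w_x
        self.w_h = w_h
        self.state = 0.0

    def step(self, x):
        # The new state mixes the current input with the previous state.
        self.state = math.tanh(self.w_x * x + self.w_h * self.state)
        return self.state


cell = TinyRNNCell()
first = cell.step(1.0)
second = cell.step(1.0)

print(feed_forward(1.0) == feed_forward(1.0))  # True: same input, same output
print(first == second)                         # False: same input, new state
```

Like strtok, the second call to `step` sees the same argument but returns a different value, because the cell remembers the first call.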

https://www.tensorflow.org/versions/r0.12/tutorials/recurrent/index.html

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

http://colah.github.io/posts/2015-08-Understanding-LSTMs/