Practical things I have learned from doing deep learning

In an earlier project I used parts of a pre-trained large-scale classification network to train a CNN for automatic colorization. Here are a few lessons I learned that have made me more efficient.

  1. Prefer many short development iterations over long cycles
    Quickly build something and train it. It is almost always better to have a prototype quickly, get feedback on it, and iterate many times, than to spend a whole lot of effort building something great and shiny on which I wouldn’t get any feedback until much later. This is especially true when the task is not well understood and requires experimentation.
  2. Try small models. Try lots of models.
    This follows from #1. For example, if the goal is to train a model that generates images, try models that generate low-res images first, because a small model trains much faster (and a larger batch size can be used). Try many ideas. Once you find something that works well, you can even transfer part of your low-res model to train your high-res model faster.
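The low-res-to-high-res weight transfer can be sketched with plain parameter dictionaries; the layer names and shapes below are made up for illustration, and real frameworks have their own state-dict mechanisms for this:

```python
import numpy as np

# Hypothetical parameters of a trained low-res model.
low_res_params = {
    "conv1/kernel": np.random.randn(3, 3, 3, 32),
    "conv2/kernel": np.random.randn(3, 3, 32, 64),
    "head/kernel": np.random.randn(64, 3),
}

# The high-res model shares the early layers but adds new ones.
high_res_params = {
    "conv1/kernel": np.zeros((3, 3, 3, 32)),
    "conv2/kernel": np.zeros((3, 3, 32, 64)),
    "upsample/kernel": np.zeros((3, 3, 64, 64)),  # only in the high-res model
    "head/kernel": np.zeros((64, 3)),
}

# Copy every layer whose name and shape match; everything else
# keeps its fresh initialization.
transferred = []
for name, value in low_res_params.items():
    if name in high_res_params and high_res_params[name].shape == value.shape:
        high_res_params[name] = value.copy()
        transferred.append(name)
```

Only the shared layers start from the low-res weights; the new upsampling layers train from scratch.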
  3. Make yourself 10x more efficient by writing reusable code
    Chances are this is not the last model you will train. Identify work that is common across projects and write code that can be reused for your next model, so that next time you have less work to do. Examples: visualizing data, preprocessing data, saving and loading data, helper functions that build parts of a model. Not only will you spend less time writing code next time, you will spend a lot less time debugging too.
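As a tiny illustration of that last kind of reuse, a helper that describes a conv → batch norm → ReLU stage (the layer-spec tuples here are invented for this sketch) saves retyping the same three lines in every model:

```python
def conv_block(in_ch, out_ch, kernel=3, use_bn=True):
    """Reusable spec for a conv -> (batch norm) -> ReLU block."""
    layers = [("conv", in_ch, out_ch, kernel)]
    if use_bn:
        layers.append(("batch_norm", out_ch))
    layers.append(("relu",))
    return layers

# Building a small encoder is now one line per stage.
encoder = conv_block(3, 32) + conv_block(32, 64) + conv_block(64, 128)
```

The same idea applies to data loaders, checkpointing, and plotting helpers.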
  4. Tune learning rate early on
    Sometimes the training error bounces up and down right off the bat and won’t come down much, if at all. When that happens, reduce the learning rate by an order of magnitude. Play with the learning rate a few times before changing the model.
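That rule of thumb can even be automated. A minimal sketch — the patience window and the factor of 10 are arbitrary choices, not recommendations:

```python
def adjust_lr(lr, recent_losses, patience=3, factor=10.0):
    """Drop the learning rate by an order of magnitude if the loss
    has not improved over the last `patience` recorded steps."""
    if len(recent_losses) > patience:
        recent_best = min(recent_losses[-patience:])
        earlier_best = min(recent_losses[:-patience])
        if recent_best >= earlier_best:  # no improvement lately
            return lr / factor
    return lr
```

Most frameworks ship a ready-made version of this kind of schedule, which is usually preferable to rolling your own.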
  5. Use batch normalization
    I am not going to explain batch norm here. Look it up and use it. The model often trains a lot faster with batch norm, although it uses more memory.
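For intuition only, the core of batch norm at training time is per-feature standardization over the batch followed by a learned scale and shift. A minimal NumPy sketch (running statistics for inference are omitted):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch dimension,
    then apply the learned scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

In practice, use your framework's built-in layer rather than this sketch — it also tracks the running statistics needed at test time.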
  6. Visualize
    Plotting error curves helps. It is even better if you can visualize the model’s actual outputs to diagnose problems. For example, my automatic colorization model painted everything green in the beginning, and looking at the error numbers alone would not have told me that.
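Matplotlib is the usual tool for error curves, but even a dependency-free sketch like this one can show at a glance whether the loss is flat, noisy, or diverging:

```python
def ascii_curve(losses, width=40):
    """Render a loss history as horizontal ASCII bars, one per epoch."""
    top = max(losses)
    lines = []
    for i, loss in enumerate(losses):
        bar = "#" * max(1, round(width * loss / top))
        lines.append(f"epoch {i:2d} | {bar} {loss:.3f}")
    return "\n".join(lines)
```

Printing this every few epochs during a long run costs nothing and catches a diverging model early.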
  7. Determine if you need more data or a more complex model
    If the training error is high, the model is not even fitting the training data (underfitting), and more data is not going to help. Tweak the model instead.
    If the training error is low but the test error is high, the model is overfitting, and more data may help.
    If both the training error and the test error are low, then you are done 🙂
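That decision procedure is small enough to write down directly; the error thresholds below are illustrative, not universal:

```python
def diagnose(train_err, test_err, good=0.05, gap=0.05):
    """Classic bias/variance triage from train and test error."""
    if train_err > good:
        return "underfitting: tweak or enlarge the model; more data won't help"
    if test_err - train_err > gap:
        return "overfitting: more data (or regularization) may help"
    return "done"
```

What counts as a "good" error or a worrying train/test gap depends entirely on the task, so treat the defaults as placeholders.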
  8. Examine the training data
    Is the training data useful to begin with? For example, in my automatic colorization project, I noticed that some of my images were black and white! That is no help for learning colorization.
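A quick check like the following would have caught those images before training: it flags an RGB image whose three channels are (nearly) identical. The tolerance is a guess and would need tuning against real data:

```python
import numpy as np

def is_grayscale(img, tol=2.0):
    """True if an RGB image's channels are almost identical,
    i.e. the image carries no color information to learn from."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return bool(np.abs(r - g).max() <= tol and np.abs(g - b).max() <= tol)
```

Running such a filter over the whole dataset once is far cheaper than discovering the problem after a day of training.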
  9. Pause and think
    It is tempting to dive right into the code. Resist that temptation. It is way easier to design and change the model on paper than to do it in code. Same goes for diagnosing problems. Once you have done the thinking, it is a piece of cake to transcribe the ideas into code.