Practical things I have learned from doing deep learning

In an earlier project I used parts of a pre-trained large-scale classification network to train a CNN for automatic colorization. Here are a few lessons I learned along the way that make me more efficient.

  1. Prefer many short development iterations over long cycles
    Quickly build something and train it. It is almost always better to get a prototype working quickly, gain some feedback on it, and iterate many times than to spend a whole lot of effort building something great and shiny that I won't get any feedback on until much later. This is especially true if the task is not well understood and requires experimentation.
  2. Try small models. Try lots of models.
    This follows #1. For example, if the goal is to train a model that generates images, try models that generate low-res images first, because a small model trains much faster (and a larger batch size can be used). Try many ideas. Once you find something that works well, you can even transfer part of your low-res model to train your high-res model faster.
  3. Make yourself 10x more efficient by writing reusable code
    Chances are this is not the last model you will train. Identify work that is common and write code that can be reused for your next model, so that next time you have less work to do. Examples: visualizing data, preprocessing data, saving and loading data, and helper functions that build parts of a model. Not only will you spend less time writing code next time, you will spend a lot less time debugging too.
  4. Tune learning rate early on
    Sometimes the training error goes up and down right off the bat and doesn't go down much, if at all. When that happens, reduce the learning rate by an order of magnitude. Play with the learning rate a few times before changing the model.
  5. Use batch normalization
    I am not going to explain batch norm here. Look it up and use it. The model often trains a lot faster with batch norm, although it uses more memory.
  6. Visualize
    Plotting error curves helps. It is even better if you can visualize the results themselves to diagnose problems. For example, my automatic colorization model painted everything green in the beginning, and looking at the error numbers alone wouldn't have told me that.
  7. Determine if you need more data or a more complex model
    If the training error is high, then the model is not even fitting the training data, and more data is not going to help. Tweak the model instead.
    If the training error is low but the test error is high, then more data may help.
    If the training error is low and the test error is low, then you are done 🙂
  8. Examine the training data
    Is the training data useful to begin with? For example, in my automatic colorization project, I noticed that some of my training images were black and white! Those wouldn't help the model learn colorization.
  9. Pause and think
    It is tempting to dive right into the code. Resist that temptation. It is way easier to design and change the model on paper than to do it in code. Same goes for diagnosing problems. Once you have done the thinking, it is a piece of cake to transcribe the ideas into code.
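As a sketch of the reusable helpers point 3 recommends, here is a minimal save/load pair for training state. The names and the pickle-based format are illustrative, not from the original project:

```python
import pickle
from pathlib import Path

def save_checkpoint(state, path):
    """Serialize any picklable training state (weights, optimizer state, step count)."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    """Restore previously saved training state."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Written once, a pair like this gets reused by every subsequent model and only needs debugging once.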
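The learning-rate check in point 4 can even be automated with a simple heuristic. This is a hedged sketch assuming losses are recorded per epoch; the window size and the oscillation test are illustrative choices, not a standard recipe:

```python
def reduce_lr_on_oscillation(losses, lr, factor=0.1, window=5):
    """Cut the learning rate by an order of magnitude when the recent
    training losses oscillate instead of trending down."""
    if len(losses) < window:
        return lr  # not enough history to judge yet
    recent = losses[-window:]
    # no net improvement over the window -> likely oscillating or diverging
    if recent[-1] >= recent[0]:
        return lr * factor
    return lr
```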
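For reference, the core batch-norm computation mentioned in point 5 is small enough to sketch in NumPy. This is training-mode only, without the running statistics and gradient bookkeeping a real framework implementation also handles:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch dimension, then scale and shift.
    x: (batch, features); gamma, beta: learned per-feature parameters."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```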
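The raw error curves from point 6 are often too noisy to read; a common trick is to plot an exponentially smoothed version alongside them. A minimal sketch:

```python
def smooth(losses, beta=0.9):
    """Exponential moving average of a noisy loss curve,
    which makes the trend easier to read when plotted."""
    out, avg = [], losses[0]
    for v in losses:
        avg = beta * avg + (1 - beta) * v
        out.append(avg)
    return out
```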
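The decision table in point 7 can be written down directly. A sketch, with an illustrative error threshold:

```python
def diagnose(train_err, test_err, threshold=0.1):
    """Map the train/test error pattern to a suggested next step."""
    if train_err > threshold:
        return "underfitting: tweak the model; more data will not help"
    if test_err > threshold:
        return "overfitting: more data may help"
    return "done"
```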
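The black-and-white images from point 8 can be filtered automatically: in RGB, a grayscale photo has (nearly) identical channels. A sketch in NumPy, with an illustrative tolerance:

```python
import numpy as np

def is_grayscale(img, tol=2):
    """True if an RGB image of shape (H, W, 3) has near-identical channels."""
    img = img.astype(np.int16)  # avoid uint8 wraparound when subtracting
    return bool(np.abs(img[..., 0] - img[..., 1]).max() <= tol
                and np.abs(img[..., 1] - img[..., 2]).max() <= tol)
```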

Laziness Driven Development

This is not a new idea at all, but it is seldom applied. I like to call it “being lazy”. The essence is to identify what is painful and then eliminate the pain. Spend some time each sprint eliminating pain and you will see huge productivity improvements.

Here are some examples:

  • It takes a few minutes to compile the code, load the binaries onto the embedded device, and restart the program. We do this many, many times a day while working on the code. The idle time is not enough to work on a different task, but it is definitely enough to break my chain of thought. So I spent about a day creating a Docker image containing the entire toolchain needed to compile and run the program. We can now compile and run the code locally. No more waiting time.
  • I recently joined a project that has its own source control system, different from everyone else in the company. It wasn't integrated with the code review tools, so doing code reviews was a pain. People complained about it but tolerated it anyway because, while it was painful, it was not that painful. I, on the other hand, didn't want the pain at all (see this post where I described a temporary method to make it less painful). I talked to DevOps and asked for an integration with the code review tools. Now it is all integrated. No pain at all.
  • When implementing a particular feature, think about what the general problem is, then solve the general problem. Any instance of it can then be trivially solved. This is exactly what libraries do.
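The Docker approach from the first bullet could look roughly like this. The base image, toolchain packages, and paths here are hypothetical, not the actual project setup:

```dockerfile
# Hypothetical cross-compile toolchain image; names and paths are illustrative.
FROM ubuntu:22.04

# Install the cross-compiler and build tools once, so nobody waits on setup.
RUN apt-get update && apt-get install -y --no-install-recommends \
        gcc-arm-linux-gnueabihf make \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /src
# Mount the source tree at /src and build inside the container:
#   docker run --rm -v "$PWD":/src toolchain make
CMD ["make"]
```

The one-time cost of building the image is paid back every time someone compiles without waiting on environment setup.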