This post is a work log for taking a pre-trained Inception-v3 network and repurpose it to colorize a grey scale image. The idea is based on this paper.
- Prepare the dataset: Convert training images from JPEG to HSV values. The input is V and target is HS.
Train a scene classification network using the Places365 data. Use a pre-trained Inception-v3 image classification model.
- Fuse the classification network feature with mid-level feature layers to become “Fusion Layer”, then build a colorization network.
Compute MSE between the predicted HS values and the actual HS values
- Generative adversarial network to generate realistic looking colorized images
[log] I did not end up training the hue and saturation. I was, however, able to generate colorized images although the images are colored green-blue-brownish and desaturated. Apparently this is a common problem. I am now training a deep convolutional generative adversarial network (DCGAN) so that the colorization part uses a discriminator network instead of a mean squared error as the cost function.
[log] I left the network to train for almost 2 days. The training errors are going up and down, and it colored the image all green. I don’t want to make the network more complex yet, so I just added batch normalization layers, tweaked the activation functions, and retrained. I left it to train overnight and this morning its training error is already lower than before. This morning I have a new idea. Perhaps I should have one branch to train the hue and another branch to train the saturation. They can share the fusion layer but have different colorization layers.
[log] I have built the colorization model according to the paper. However our models are not identical because for the classification part I am using Inception-v3, therefore the shape of the low level feature and mid level feature are different from that used in the paper. Initially my training error was not going down, that’s because of a implement error and I have set the learning rate too high. After I addressed those issues the training error is steadily going down.
[log] inception/slim/inception_model.py defines the inception model.
[log] inception-v3 takes an arbitrarily sized image, crop it 87.5%, resize back to original size then resize down to 299×299.
[log] Realized that the downloaded inception model is hardcoded to have batch size of 1. That won’t do for training. It is not feasible to change it. I found a newer version of inception saved in March 2016 that uses checkpoint instead of pb. That should be able to let me change the batch size arbitrarily.
[todo] figure out how to connect the pre-trained model to another network
[log] I played with inception. Since my input will be grayscale, the greatest concern is whether inception plays well with grayscale. Yes, it still classify a cat as a cat. Good enough.
[todo] TensorFlow inception is already an image classifier. Can I take this one and use it? I was working on my own data loader and model because Places365’s pre-built models were not built with TensorFlow. However, inception is trained on color images, not greyscale. That said, can I take the code and make my own modifications to adapt it to greyscale images, then train from scratch? Probably much faster than building everything from scratch on my own. Seems like the answer is yes.
[todo] Improve jpeg->training data performance. Utilize more CPU? Seems like this is not necessary. TensorFlow can process images in a background thread while training. See: Reading Data.
[log] Wondered about reading from jpeg then convert to inputs and targets each time vs converts all jpeg and write all the results to disk, then load a batch at a time during training. Did some quick calculations: input = 256*256*1*4 bytes, target = 128*128*2*4 bytes. Roughly 630GB for the 1.6 million images data set. This is just not feasible with my hardware without spending more money! Instead, I need to spend more time to figure out how to convert the jpeg to training data faster. Currently on my laptop, it takes 140 seconds to convert 5000 images. I noticed that only 30% of the CPU was used while it is doing the conversion.
[todo] Figure out what’s the best way to persist the preprocessed X and Y – not going to do this due to disk space limitation. See log above.
[log] Compared original vs upsampled vs only upsampled HS channels combined with full res V channel. As expected, visually, upsampling only the HS channels shows tolerable visual degradation. DataProcessor now downsamples HS channels to config.output_size.
[todo] Test run preprocessing a large batch of training jpg images.
[todo] Verify that my X and Y outputs are correct. Put them back together, convert back to RGB, and see that image looks right?
[todo] Downsample Y. In the paper they predict the color values at a lower resolution then upsample it.
[todo] Remove hardcoded limit in Utils.py. Currently hardcoded to only list 10 files.
[log] Decompressing the dataset took a long time… Found an index of categories on places365 github. Worked on DataPreprocessor: loading jpg to rgb then convert to HSV and split into inputs (V) and targets (HS).
[todo] Pickle the dataset – not going to do this. Not feasible due to disk limitation. See log on 20160103
[log] I have decided to use the “small” dataset from Places365. It is still going to be 24GB for the training data. In the paper, they converted images to LAB. However, I need an efficient way to create my dataset, and TensorFlow already provide methods to convert colorspace between RGB and HSV, and to decode JPEG. I want to focus on building the neural net so this seems a reasonable compromise.