My ideal Bible app

I am unable to find a simple Bible app for reading during sermon. It seems like most apps are either cluttered with features or restricted to one or two specific translations. So here is my spec for my ideal app.

As a church goer with a small budget for mobile data and devices, I want to navigate to a specific Bible location on a specific translation while offline with as little latency as possible so that I can focus on the pastor’s sermon.

  • Targeted for budget Android mobile devices released after 2015 with minimum 2GB of ram
  • Maximum startup time of 2 seconds regardless of connectivity
  • Priority on reading
    • No unnecessary features like Biblical social media, reading plans, reading reminder, reading-of-the-day, dictionary, maps, notes taking tool, bookmarks, read-out-loud, commentary, cross references.
    • No ads, no requests-to-like, no requests-to-rate-this-app, no requests-to-share, no requests-to-sign-in
    • Minimal UI complexity
  • Fuzzy search – single text input box with ability to clear the input
    • Ability to search by partially entering book name and optionally chapter number / verse number and optionally specifying the Bible translation, with some tolerance of spelling mistakes
    • Ability to search by entering arbitrary text
    • Maximum load time of 2 seconds to display text after selecting a search result
  • Offline reading
    • Ability to download different Bible translations for offline reading where license conditions permits
  • Sharing
    • Only text copy is available
  • Industry standard navigation controls
    • Display text by chapter, scrolling up and down by swiping
    • Navigate to previous / next chapter by swiping left / right or using volume keys
    • A book selection function so that the user can select a book if he only remembers its relative position in the Bible
    • Near instantaneous load time when navigating up/down/left/right
    • Standard Android back button behavior (unimportant to me as long as it is not astonishing)


Practical things I have learned from doing deep learning

In an earlier project I used parts of a pre-trained large scale classification network to train a CNN for automatic colorization. Here are a few lessons I have learned to make myself more efficient.

  1. Prefer many short development iterations over long cycles
    Quickly build something and train it. It is almost always better to have a prototype quickly and gain some feedback on it and iterate this many times, than to spend a whole lot of effort to build something great and shiny that I wouldn’t gain any feedback until much later. This is especially true if the task is not well understood and requires experimentation.
  2. Try small models. Try lots of models.
    This follows #1. For example, if the goal is to train a model that generate images, try models that generates low res images because a small model can be trained much faster (a larger batch can be used). Try many ideas. Once you find something that works well, you can even transfer part of your low res model to train your high res model faster.
  3. Make yourself 10x more efficient by writing reusable code
    Chances are this is not the last model you train. Identify work that is common and write code that can be reused for your next model, so that next time you have less work to do. Examples: visualize data, preprocessing data, save and load data, helper functions that build parts of a model. Not only are you going to spend less time writing code next time, you will spend a lot less time debugging too.
  4. Tune learning rate early on
    Sometimes the training error goes up and down right off the bat and it wouldn’t go down much, if at all. When that happens reduce the learning rate by a magnitude. Play with the learning rate a few times before changing the model.
  5. Use batch normalization
    I am not going to explain batch norm here. Look it up and use it. Often the model trains a lot faster with batch norm although it uses more memory.
  6. Visualize
    Plotting error curves helps. It is even better if you can visualize the results to diagnose problems you might encounter. For example, my automatic colorization model painted everything green in the beginning, and looking at the errors wouldn’t have helped me.
  7. Determine if you need more data or a more complex model
    If the training error is high, then the model is not using the training data and more data is not going to help. Try to tweak the model.
    If training error is low but test error is high, then more data may help.
    if training error is low and test error is low, then you are done 🙂
  8. Examine the training data
    Is the training data useful to begin with? For example, in my automatic colorization project, I noticed that some of my images are black and white! That wouldn’t help with colorization.
  9. Pause and think
    It is tempting to dive right into the code. Resist that temptation. It is way easier to design and change the model on paper than to do it in code. Same goes for diagnosing problems. Once you have done the thinking, it is a piece of cake to transcribe the ideas into code.

Setup Spark 2 on Ubuntu 14 and run with Python


These instructions worked in 2017.

Ubuntu 14.04 / Spark 2.1.0 / Python

Download Spark
tar xzf spark-x.x.x-bin-hadoopx.x.tgz
mv spark-x.x.x-bin-hadoopx.x ~/spark-x.x.x-bin-hadoopx.x

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer
sudo apt-get install python-pip
sudo pip install py4j

Add the following to ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
export SPARK_HOME=/home/your-user-name/spark-x.x.x-bin-hadoopx.x/

source ~/.bashrc

Now you are ready to run something, for example word count.


Retraining Inception-v3 neural network for a new task with Tensorflow

This post is a work log for taking a pre-trained Inception-v3 network and repurpose it to colorize a grey scale image. The idea is based on this paper.


  • Prepare the dataset: Convert training images from JPEG to HSV values. The input is V and target is HS.
  • Train a scene classification network using the Places365 data. Use a pre-trained Inception-v3 image classification model.
  • Fuse the classification network feature with mid-level feature layers to become “Fusion Layer”, then build a colorization network.
  • Compute MSE between the predicted HS values and the actual HS values
  • Generative adversarial network to generate realistic looking colorized images

[log][20170131] I did not end up training the hue and saturation. I was, however, able to generate colorized images although the images are colored green-blue-brownish and desaturated. Apparently this is a common problem. I am now training a deep convolutional generative adversarial network (DCGAN) so that the colorization part uses a discriminator network instead of a mean squared error as the cost function.

[log][20170126] I left the network to train for almost 2 days. The training errors are going up and down, and it colored the image all green. I don’t want to make the network more complex yet, so I just added batch normalization layers, tweaked the activation functions, and retrained. I left it to train overnight and this morning its training error is already lower than before. This morning I have a new idea. Perhaps I should have one branch to train the hue and another branch to train the saturation. They can share the fusion layer but have different colorization layers.

[log][20170124] I have built the colorization model according to the paper. However our models are not identical because for the classification part I am using Inception-v3, therefore the shape of the low level feature and mid level feature are different from that used in the paper. Initially my training error was not going down, that’s because of a implement error and I have set the learning rate too high. After I addressed those issues the training error is steadily going down.

[log][20170110] inception/slim/ defines the inception model.

[log][20170108] inception-v3 takes an arbitrarily sized image, crop it 87.5%, resize back to original size then resize down to 299×299.

[log][20170108] Realized that the downloaded inception model is hardcoded to have batch size of 1. That won’t do for training. It is not feasible to change it. I found a newer version of inception saved in March 2016 that uses checkpoint instead of pb. That should be able to let me change the batch size arbitrarily.

[todo][20170107] figure out how to connect the pre-trained model to another network

[log][20170107] I played with inception. Since my input will be grayscale, the greatest concern is whether inception plays well with grayscale. Yes, it still classify a cat as a cat. Good enough.

[todo][20170103] TensorFlow inception is already an image classifier. Can I take this one and use it? I was working on my own data loader and model because Places365’s pre-built models were not built with TensorFlow. However, inception is trained on color images, not greyscale. That said, can I take the code and make my own modifications to adapt it to greyscale images, then train from scratch? Probably much faster than building everything from scratch on my own. Seems like the answer is yes.

[todo][20170103] Improve jpeg->training data performance. Utilize more CPU? Seems like this is not necessary. TensorFlow can process images in a background thread while training. See: Reading Data.

[log][20170103] Wondered about reading from jpeg then convert to inputs and targets each time vs converts all jpeg and write all the results to disk, then load a batch at a time during training. Did some quick calculations: input = 256*256*1*4 bytes, target = 128*128*2*4 bytes. Roughly 630GB for the 1.6 million images data set. This is just not feasible with my hardware without spending more money! Instead, I need to spend more time to figure out how to convert the jpeg to training data faster. Currently on my laptop, it takes 140 seconds to convert 5000 images. I noticed that only 30% of the CPU was used while it is doing the conversion.

[todo][20170103] Figure out what’s the best way to persist the preprocessed X and Y – not going to do this due to disk space limitation. See log above.

[log][20170103] Compared original vs upsampled vs only upsampled HS channels combined with full res V channel. As expected, visually, upsampling only the HS channels shows tolerable visual degradation. DataProcessor now downsamples HS channels to config.output_size.

[todo][20170103] Test run preprocessing a large batch of training jpg images.

[todo][20170102] Verify that my X and Y outputs are correct. Put them back together, convert back to RGB, and see that image looks right?

[todo][20170102] Downsample Y. In the paper they predict the color values at a lower resolution then upsample it.

[todo][20170102] Remove hardcoded limit in Currently hardcoded to only list 10 files.

[log][20170102] Decompressing the dataset took a long time… Found an index of categories on places365 github. Worked on DataPreprocessor: loading jpg to rgb then convert to HSV and split into inputs (V) and targets (HS).

[todo][20161231] Pickle the dataset – not going to do this. Not feasible due to disk limitation. See log on 20160103

[log][20161231] I have decided to use the “small” dataset from Places365. It is still going to be 24GB for the training data. In the paper, they converted images to LAB. However, I need an efficient way to create my dataset, and TensorFlow already provide methods to convert colorspace between RGB and HSV, and to decode JPEG. I want to focus on building the neural net so this seems a reasonable compromise.

Save and restore TensorFlow session

This blog post is written when TensorFlow was version r0.12.

Stackoverflow and the TensorFlow documentations are pretty clear that Saver is what you want, but it is less clear how to use it in the code. It boils down to this:

  1. Define model
  2. Create session
  3. Initialize variables
  4. If restoring, call saver.restore, passing in the session and path to the directory containing checkpoint files
  5. When saving, call, passing in the session and the path to the save directory
# ... code to define model omitted ...
# Initializing the variables
init = tf.initialize_all_variables()
saver = tf.train.Saver()
with tf.Session() as sess:
    if restore:
        ckpt = tf.train.get_checkpoint_state(save_path)
        if ckpt and ckpt.model_checkpoint_path:
            saver.restore(sess, ckpt.model_checkpoint_path)
        # ... training code omitted ..., save_path)

What if you want to train on one computer and restore it on a different computer? You will get an error: open the file named “checkpoint” with a text editor and edit the paths to the new path.

Here is a working example. Credits to the original author Aymeric Damien. I forked the code and added my edits to save/restore.