Organizing My Son’s Drawings with TensorFlow

2021/02/28

My 5-year-old son loves to draw. He fills several sheets of white paper every day, especially during this global pandemic. In the last couple of months, he emptied two boxes of printer paper, resulting in roughly one thousand drawings. Though I really like his artwork, organizing it is not easy.

We did a systematic clean-up of our home the other day and gathered a thick pile of his work.

/images/james-pile-of-paper.jpg

To make sure we don't lose these precious memories, I batch-scanned them into our computer (luckily we have a new printer that does that fairly quickly). Staring at the resulting 650-page PDF file, I realized that storing the pictures this way was not such a good idea. The ordering seemed too random: one good drawing is often buried among several mediocre ones. One reason is that I scanned both sides of each page, and the back side may or may not have a full drawing. In addition, my son simply puts more effort into some drawings than others.

This random ordering is just unacceptable for a machine learning engineer. I decided to sort these pages using a ranking algorithm. I was hoping to have a classifier that assigns higher scores to better drawings. That way I can put better pictures at the beginning of the booklet. One immediate challenge is: how do I gather the data? I was too lazy to annotate things on my own… right?

Then I suddenly realized I already had "annotated" data for this task. I had made lots of subconscious decisions when putting the papers into a pile: I always put the better-looking side facing up and the other side facing down. This is just the kind of annotation I need: pairwise comparisons between images, with labels for good vs. bad. It translates directly into labels we can read off the original scanned PDF: the odd-numbered pages are the better pages (labeled 1) and the even-numbered pages are the worse ones (labeled 0).

With that, I built a classifier using the TensorFlow Keras API to sort these 650 pages. The program loads the pages as 50x50 grayscale images and runs them through a few layers: two convolutions followed by a dense layer. With the model's scores, we got to sort all these pages and discard the random ones.

Code

First "un-staple" the original PDF file into individual pictures.

pdftoppm -jpeg merged.pdf pics/pics

Then, by looking at the page numbers, I can separate them into front vs. back images. I use those as the classification labels.

mkdir pics/front pics/back
mv pics/*1.jpg pics/*3.jpg pics/*5.jpg pics/*7.jpg pics/*9.jpg pics/front
mv pics/*.jpg pics/back
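
The glob patterns above rely on the last digit of each file name. A minimal Python sketch of the same split, assuming pdftoppm named the files `pics-<page number>.jpg`; the helper name `split_by_side` is hypothetical and not part of the original workflow:

```python
import re
import shutil
from pathlib import Path


def split_by_side(pics_dir):
    """Move odd-numbered pages to front/ and even-numbered pages to back/.

    Assumes file names end in '-<page number>.jpg' (an assumption about
    how pdftoppm named the files). Returns the (front, back) counts.
    """
    pics_dir = Path(pics_dir)
    front, back = pics_dir / 'front', pics_dir / 'back'
    front.mkdir(exist_ok=True)
    back.mkdir(exist_ok=True)
    counts = {front: 0, back: 0}
    for path in sorted(pics_dir.glob('*.jpg')):
        match = re.search(r'-(\d+)\.jpg$', path.name)
        if match is None:
            continue  # Not a scanned page; leave it alone.
        page = int(match.group(1))
        # Odd pages were facing up in the pile, so they are the "better" side.
        target = front if page % 2 == 1 else back
        shutil.move(str(path), str(target / path.name))
        counts[target] += 1
    return counts[front], counts[back]
```

Keras's `image_dataset_from_directory` infers labels from the sub-directory names in alphanumeric order, so `back` maps to 0 and `front` maps to 1, which matches the labeling described earlier.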

Now into Python/TensorFlow:

  import tensorflow as tf
  import glob
  import os
  import shutil


  pics = tf.keras.preprocessing.image_dataset_from_directory(
      'pics', image_size=(50, 50), color_mode='grayscale')

  # Train a model
  model = tf.keras.Sequential([
      tf.keras.Input((50, 50, 1)),
      tf.keras.layers.Conv2D(1, 5, 3),
      tf.keras.layers.Conv2D(1, 5),
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid),
  ])
  model.summary()
  model.compile(
      loss=tf.keras.losses.binary_crossentropy,
      metrics=[tf.keras.metrics.binary_accuracy])
  model.fit(pics, epochs=50)


  # Assign scores
  results = []
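  for path in glob.glob('pics/*/*.jpg'):
      # Load each page at the training resolution: 50x50 grayscale.
      image = tf.keras.preprocessing.image.load_img(
          path, target_size=(50, 50), color_mode='grayscale')
      # Add a leading batch dimension so the shape is (1, 50, 50, 1).
      batch = tf.expand_dims(
          tf.keras.preprocessing.image.img_to_array(image), 0)
      score = model.predict(batch)
      results.append((score[0, 0], path))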

  # Create a new directory with all the pages in sorted order
  os.mkdir('sorted_pics')
  for i, (score, path) in enumerate(sorted(results, reverse=True)):
      target_path = f'sorted_pics/pic-{i:03}.jpg'
      shutil.copy(path, target_path)

The model has only 197 parameters, and the training accuracy reached ~88% after 50 epochs. With a model this small, I'm not too worried about over-fitting.

_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_53 (Conv2D)           (None, 16, 16, 1)         26
_________________________________________________________________
conv2d_54 (Conv2D)           (None, 12, 12, 1)         26
_________________________________________________________________
flatten_45 (Flatten)         (None, 144)               0
_________________________________________________________________
dense_38 (Dense)             (None, 1)                 145
=================================================================
Total params: 197
Trainable params: 197
Non-trainable params: 0
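
The numbers in the summary can be checked by hand. Here is a small sketch of the arithmetic, assuming "valid" padding (the Keras default for `Conv2D`); the helper names are hypothetical:

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Spatial output size of a convolution with 'valid' (no) padding."""
    return (input_size - kernel_size) // stride + 1


def conv_params(kernel_size, in_channels, filters):
    """Weights of a square kernel plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return inputs * units + units


# Conv2D(1, 5, 3): one 5x5 filter with stride 3 on a 50x50x1 image.
size1 = conv_output_size(50, 5, stride=3)
# Conv2D(1, 5): one 5x5 filter with stride 1.
size2 = conv_output_size(size1, 5)
flat = size2 * size2  # Flatten: 12 * 12 values, one channel.

total = conv_params(5, 1, 1) + conv_params(5, 1, 1) + dense_params(flat, 1)
print(size1, size2, flat, total)  # 16 12 144 197
```

Most of the parameters (145 of 197) sit in the final dense layer; each convolution contributes only 26.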

Then I staple the pages together again to get one big PDF (I also manually deleted several chunks of not-so-good images after sorting the drawings).

find -name \*.jpg -exec bash -c 'convert "$1" "${1%.jpg}.pdf"' - \{\} \;
pdfunite *.pdf merged.pdf
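
One detail worth noting: the shell expands `*.pdf` in lexicographic order, so the zero-padded names produced by `{i:03}` in the sorting step are what keep the pages in score order when they are merged. A small illustration in Python, using `sorted` as a stand-in for the shell's glob ordering (an approximation, since exact glob collation depends on the locale); `page_names` is a hypothetical helper:

```python
def page_names(count, padded):
    """File names in the order the sorting step wrote them."""
    if padded:
        return [f'pic-{i:03}.pdf' for i in range(count)]
    return [f'pic-{i}.pdf' for i in range(count)]


padded = page_names(12, padded=True)
unpadded = page_names(12, padded=False)

# Zero-padded names sort lexicographically in the same order as their index.
print(sorted(padded) == padded)      # True
# Without padding, 'pic-10.pdf' sorts before 'pic-2.pdf'.
print(sorted(unpadded) == unpadded)  # False
```

Three digits are enough for the 650 pages here; a larger collection would need a wider pad.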

Demo

Before sorting, drawings of varying quality are interleaved.

/images/james-booklet-unsorted.png

After sorting, higher-quality drawings generally rise to the top.

The first couple of pages: /images/james-booklet-top-pages.png

The last couple of pages generally look less appealing than the ones at the top: /images/james-booklet-bottom-pages.png

My son and I are both quite happy with the final PDF. This file has lots of art and science in it :D