For the past few months (they’ve really flown by!) I’ve been working on how the brain processes faces. Specifically, how does inferotemporal cortex produce a rich representation of faces? In a small step towards addressing this scientific question, I’ve been building sparse coding models of faces. This is a natural direction for me and builds on previous work on unsupervised statistical learning algorithms.
So, what do you get when you run sparse coding on faces?
Before I describe the results, I want to go over a few considerations that shaped the model.
First, I didn’t want to precisely align the faces. I want the networks to have to deal with position variation, just as the biological visual system needs to.
Second, I didn’t want to build small patch-based models. Small basis functions would impose specific structure on the learned weights, and I’d rather not build that in (if weights should go to zero, learning will take care of this).
Here’s the setup:
- YouTube Faces Database. I start with the aligned dataset and then introduce position variation by taking apertures large enough to contain the face, but not necessarily centered on the face. The apertures are 48×48 pixels.
- Run PCA whitening to produce 576 dimensions.
- Train a 1-layer sparse coding model (L1 sparsity cost) with 1024 basis functions.
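As a rough sketch of the first two steps (not the actual code behind these results), the aperture sampling and PCA whitening might look something like this in NumPy. The function names, the `(top, left, height, width)` face-box format, and the assumption that frames are grayscale and already scaled so the face fits inside a 48×48 window are all hypothetical choices made for illustration.

```python
import numpy as np

def random_aperture(frame, box, size=48, rng=None):
    """Crop a size x size window that contains the face box but is not
    necessarily centered on it. `box` is (top, left, height, width);
    this box format is an assumption, not the dataset's own."""
    rng = np.random.default_rng() if rng is None else rng
    top, left, h, w = box
    H, W = frame.shape
    # Valid top-left corners: the window must cover the box and stay in the frame.
    lo_r, hi_r = max(0, top + h - size), min(top, H - size)
    lo_c, hi_c = max(0, left + w - size), min(left, W - size)
    if lo_r > hi_r or lo_c > hi_c:
        raise ValueError("face box does not fit inside an aperture of this size")
    r = int(rng.integers(lo_r, hi_r + 1))
    c = int(rng.integers(lo_c, hi_c + 1))
    return frame[r:r + size, c:c + size], (r, c)

def pca_whiten(X, n_components=576, eps=1e-8):
    """PCA-whiten the rows of X (one flattened aperture per row).
    Returns whitened data, the mean, the whitening matrix W (k x d),
    and the dewhitening matrix D (d x k)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; covariance eigenvalues are S**2 / (n - 1).
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S[:n_components] ** 2 / (X.shape[0] - 1)
    scale = np.sqrt(var + eps)
    W = Vt[:n_components] / scale[:, None]
    D = Vt[:n_components].T * scale[None, :]
    return Xc @ W.T, mean, W, D
```

With real data, each 48×48 aperture would be flattened into one row of `X` before calling `pca_whiten`, and the returned `D` is what maps anything learned in the whitened space back to pixels.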
Here’s the result in the unwhitened space:
And see the full result here (1.3MB): A_hires_final.png
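Because the model is trained on PCA-whitened data, the learned basis functions live in the whitened space and have to be mapped back to pixels before they can be viewed. A minimal sketch of that step, assuming the PCA directions and variances were kept from the whitening stage (the names `unwhiten_basis` and `tile_basis` and the argument layout are hypothetical):

```python
import numpy as np

def unwhiten_basis(A, components, variances, eps=1e-8):
    """Map basis functions from whitened space back to pixel space.
    A: (k, n_basis) basis in whitened coordinates.
    components: (k, d) PCA directions, one per row.
    variances: (k,) PCA eigenvalues used for whitening."""
    scale = np.sqrt(variances + eps)
    return components.T @ (scale[:, None] * A)  # (d, n_basis)

def tile_basis(B, side=48, cols=32):
    """Arrange the columns of B (each a flattened side x side image)
    into one mosaic, scaling each function to [-1, 1] for display."""
    n = B.shape[1]
    rows = -(-n // cols)  # ceiling division
    canvas = np.zeros((rows * side, cols * side))
    for i in range(n):
        img = B[:, i].reshape(side, side)
        img = img / (np.abs(img).max() + 1e-12)
        r, c = divmod(i, cols)
        canvas[r * side:(r + 1) * side, c * side:(c + 1) * side] = img
    return canvas
```

Scaling each function separately makes low-amplitude functions visible, at the cost of hiding their relative magnitudes.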
A few observations:
- The model learns a spatial tiling of faces. This is similar to the standard result for natural image patches, where the learned Gabor-like functions also tile space.
- There are functions that appear to represent the mean face at each position.
- There are functions that seem to represent each eye separately. I wasn’t sure if this would happen, or if all eye-related functions would encode content related to both the right and left eye.
- There are other functions that seem to be related to the borders of the face and the hairline.
The characteristics of the basis functions change slightly as I vary the number of functions. See this paper for a similar result on natural images.
The sparse coding model is only an approximation to the face distribution and leaves many dependencies unmodeled. More work to be done!
Some innovations and tools that made this possible:
- FISTA algorithm for sparse coding
- Theano for seamless compilation to GPUs
- YouTube Faces database
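For readers who haven’t seen FISTA, here is a minimal NumPy sketch of the inference step for a single whitened input `x` and a fixed dictionary `A`, minimizing 0.5·||x − A s||² + λ·||s||₁ over the coefficients `s`. This is a plain CPU illustration under those assumptions, not the Theano code used for the results above, and it leaves out the dictionary update entirely; the function names and default iteration count are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(x, A, s, lam):
    """Sparse coding cost: reconstruction error plus L1 penalty."""
    return 0.5 * np.sum((x - A @ s) ** 2) + lam * np.sum(np.abs(s))

def fista(x, A, lam, n_iter=200):
    """Infer sparse coefficients s for input x given dictionary A (d x n_basis)."""
    # Lipschitz constant of the gradient of the quadratic term:
    # the largest eigenvalue of A^T A, i.e. the squared spectral norm of A.
    L = np.linalg.norm(A, 2) ** 2
    s = np.zeros(A.shape[1])
    y = s.copy()
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - x)
        s_new = soft_threshold(y - grad / L, lam / L)
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = s_new + ((t - 1.0) / t_new) * (s_new - s)
        s, t = s_new, t_new
    return s
```

In a full model, this inference would alternate with a gradient step on `A` (followed by renormalizing its columns), and would typically be batched over many inputs on the GPU.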
You can find a skeleton of this code here: