Last time, we looked at machine learning. Today, let’s take a look at deep learning for image recognition with convolutional networks, and at transfer learning.

Convolutional Network

In a typical, fully connected neural network, a neuron on the first hidden layer is connected to every single input neuron. This means that for x input neurons, each neuron on the first hidden layer has x weights and 1 bias; for y neurons on the first hidden layer, you get y * (x+1) parameters. That’s a lot of parameters, and the problem becomes much worse as the number of hidden layers increases. To combat this, convolutional networks simplify the network with the following techniques:
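To see how fast this grows, the y * (x+1) count can be computed directly. A minimal sketch; the 28 x 28 input and 100 hidden neurons are just illustrative numbers:

```python
# Parameter count for one fully connected hidden layer:
# each of y hidden neurons has x weights plus 1 bias.
def fc_params(x, y):
    return y * (x + 1)

# e.g. a 28 x 28 pixel image (784 input neurons) feeding 100 hidden neurons
print(fc_params(28 * 28, 100))  # 78500
```

Nearly 80,000 parameters for a single small hidden layer, before any further layers are added.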

Input and hidden layer. Color shows which neurons on the input layer connect to which neurons on the hidden layer; note how the red and blue receptive fields overlap on the input layer.
  • Local receptive field: instead of a hidden neuron seeing the entire input layer, it only sees a small part of it. Say the input is a 15 x 15 pixel image, and the local receptive field is 5 x 5 pixels; in that case, each neuron on the hidden layer sees a 5 x 5 patch of pixels. The top left hidden neuron sees the top left 5 x 5 pixels of the input, and the neuron next to it sees another 5 x 5 patch, shifted over by a small amount. This amount is called the stride length. If the stride length is 1 (the 5 x 5 window shifts over by 1 pixel), then the hidden layer will be 11 x 11 neurons large.
    • The first advantage of this approach is reducing the number of parameters. Instead of each neuron on the hidden layer having 15 x 15 = 225 weights, each one now only has 5 x 5 = 25 weights. That’s a 9-fold decrease!
    • The second advantage is that information about the location of features is preserved. In a fully connected layer, a neuron on the hidden layer lighting up only means something happened somewhere in the input image. In a convolutional network, if a neuron on a hidden layer lights up, you know which 5 x 5 patch of pixels caused that neuron to activate.
    • Note that the hidden layer here is called a convolutional layer, due to its similarity to the mathematical operation of convolution.
  • Shared weights & biases: unlike in a fully connected neural network, the neurons on the hidden layer of a convolutional network all have the same weights and bias. For example, in the image above, the red, blue, yellow and green neurons on the hidden layer all have the same 5 x 5 = 25 weights and 1 bias.
    • The key advantage of this approach is further reducing the number of parameters. Without weight sharing, you would have 25 weights and 1 bias for each of the 11 x 11 neurons on the hidden layer. Here, since all the neurons share the same weights and bias, you only have 25 weights and 1 bias for the entire hidden layer. That’s a 121-fold decrease!
    • The sharing of weights and biases means that all the neurons on the hidden layer look for a specific feature in its own local receptive field. Say the weights and biases are tuned to look for a vertical line. Then, in the image above, for the hidden layer, the red neuron looks for a vertical line at the top left of the input image, and the green neuron looks for a vertical line at the bottom right of the input image.
    • The hidden layer will look for a single feature in the entire image, forming a feature map. For effective image recognition, the neural network must recognize multiple features, such as vertical, horizontal, diagonal and curved lines. To accomplish this, the network must have multiple feature maps. For example, if you have 5 feature maps, then the network will recognize 5 distinct features. The top left neuron of each feature map will see the same 5 x 5 image, but since each one has different weights and biases (since they’re on different feature maps), each neuron will look for a different feature:
Input layer to multiple feature maps. Each feature map scans the entire image for a single feature.
Pooling layer reduces the number of neurons per layer
  • Pooling layer: multiple neurons on the hidden layer are combined, simplifying the network. This can be done in multiple ways. One common approach is max-pooling: the value of each neuron on the pooling layer is equal to the maximum value of a 2 x 2 block of neurons on the hidden layer. Note that, unlike the receptive fields, these blocks do not overlap. Often, the hidden layer and pooling layer are considered a single layer.
    • By pooling, some of the location information is lost. On the convolutional layer, you know which 5 x 5 patch of the input a neuron is looking at. On the pooling layer, each neuron is the amalgamation of 4 neurons on the convolutional layer, so you no longer know exactly which 5 x 5 patch of the input image activated the neuron. This is okay; generally, knowing the approximate location of a feature is all you need. Knowing its exact location doesn’t really help.
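The building blocks above can be sketched directly in numpy. This is a minimal illustration with random data, just to verify the layer sizes from the text; a real network would learn its shared weights:

```python
import numpy as np

def conv2d(image, kernel, bias=0.0):
    # One feature map: the same (shared) kernel and bias slide over
    # the image with stride 1, one local receptive field at a time.
    k = kernel.shape[0]
    out = np.empty((image.shape[0] - k + 1, image.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel) + bias
    return out

def max_pool(fmap, size=2):
    # Non-overlapping size x size max-pooling.
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

# A 15 x 15 image with a 5 x 5 receptive field gives an 11 x 11 feature map,
# and 2 x 2 pooling halves each dimension of a 24 x 24 feature map.
fmap = conv2d(np.random.rand(15, 15), np.random.rand(5, 5))
print(fmap.shape)                            # (11, 11)
pooled = max_pool(np.random.rand(24, 24))
print(pooled.shape)                          # (12, 12)
```

The nested loops make the sliding window explicit; deep learning libraries do the same computation, just far more efficiently.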

Let’s recap. An input image is provided to a layer of input neurons, arranged as a grid. The input neurons are then fed to feature maps, which scan the image for different features. The feature map, upon finding a feature, will activate, telling the network that a feature exists at a certain location. The pooling layer then compresses the information, making computation easier.

We still need a fully connected network to turn feature recognition into image recognition. The pooling layer will say “I found these features in these locations”, and the fully connected network will use that to categorize the input image. An example of a full network is shown below. The sizes of the layers are chosen arbitrarily:

Input layer: 28 x 28
Convolutional layer: 5 x 5 receptive field, stride length 1, so 24 x 24 neurons per feature map; 10 feature maps.
Pooling layer: 2 x 2 max-pooling, so 12 x 12 per feature map
Fully connected layer: 28 neurons
Output layer: 14 neurons

Note that each neuron on the fully connected layer connects to every neuron on the pooling layer, and every neuron on the output layer.
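As a back-of-the-envelope sketch, here is the parameter count for this example network. Note how, thanks to weight sharing, the convolutional layer is tiny, and the fully connected layer dominates:

```python
# Parameter count for the example network (pure arithmetic).
conv_params = 10 * (5 * 5 + 1)       # 10 feature maps, each sharing 25 weights + 1 bias
pool_neurons = 10 * 12 * 12          # pooling output feeds the fully connected layer
fc_params = 28 * (pool_neurons + 1)  # each FC neuron: 1440 weights + 1 bias
out_params = 14 * (28 + 1)           # each output neuron: 28 weights + 1 bias
total = conv_params + fc_params + out_params
print(conv_params, fc_params, out_params, total)  # 260 40348 406 41014
```

Without weight sharing, the convolutional layer alone would need 10 x 24 x 24 x 26 = 149,760 parameters instead of 260.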

For very complex neural networks, there are often multiple convolutional-pooling layers; that is, the output of one convolutional-pooling layer is fed into another convolutional-pooling layer. These networks also often have more than one fully connected layer. All in all, deep neural networks can get very complex, but thanks to sharing weights and pooling, learning is still possible.

Transfer Learning

Humans are remarkably good at pattern recognition. If you show someone one or two pictures of a raccoon, they will be able to correctly identify raccoons in real life, even if they’ve never seen that animal before. This amazing ability isn’t limited to just raccoons; elephants, cars, characters in a TV show, you name it.

How is this possible? A computer needs hundreds, thousands or millions of examples to learn how to recognize things, but a human needs only a handful. The key difference is that machines are learning from scratch, while humans are not. Take for example the person in the previous paragraph. This person has been learning how to recognize things since they were born. They can recognize people, sounds, textures, smells, etc. This person has developed the skills needed to recognize patterns and classify things already. When they see a couple of pictures of raccoons, they use those preexisting skills to pick up on key features, which they use to identify raccoons out in the field.

My point is that humans have a powerful and robust ability to recognize things already. When they need to recognize something new, they don’t start from scratch, but rather build upon existing abilities to rapidly gain new skills. Fortunately, computers can emulate this ability through transfer learning.

Say you have a neural network trained to recognize thousands of things, such as hammers, bears, dogs, etc. This neural network is very robust, and can handle a wide range of images with different backgrounds, lighting conditions, and weird angles. It is possible to take this fully trained neural network, and adapt it for something specific. For instance, you could use the neural network to differentiate between a thumbs up and thumbs down, or a smile and a frown. Though the original network wasn’t trained for this application, we can use transfer learning to re-train the network. This allows us to create powerful neural networks in a very short amount of time! My next post will be on the Jetson Nano, a development kit for playing with deep learning and transfer learning. Stay tuned to find out more!
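The idea can be sketched in a few lines of numpy: freeze a "pretrained" feature extractor and train only a new classification head on top of it. Everything here is an illustrative assumption standing in for a real pretrained network and dataset (the frozen weights are random, and the thumbs up / thumbs down labels are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: in real transfer learning these
# weights come from training on a large dataset; here frozen random
# weights play that role.
W_frozen = 0.1 * rng.normal(size=(64, 8))

def features(x):
    # The frozen feature extractor; its weights are never updated.
    return np.maximum(x @ W_frozen, 0.0)

# A new, specific task (e.g. thumbs up vs thumbs down): synthetic data.
X = rng.normal(size=(200, 64))
y = (features(X) @ rng.normal(size=8) > 0).astype(float)

def loss(w, b):
    p = 1.0 / (1.0 + np.exp(-(features(X) @ w + b)))
    return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))

# Train only the new head (logistic regression); the backbone stays fixed.
w, b, lr = np.zeros(8), 0.0, 0.1
loss_before = loss(w, b)
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(features(X) @ w + b)))
    w -= lr * features(X).T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)
loss_after = loss(w, b)
print(loss_after < loss_before)  # the head learns without touching the backbone
```

Because only the small head is trained, far fewer examples and far less compute are needed than training the whole network from scratch, which is exactly the appeal of transfer learning.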
