Building a Simple Neural Network

So… you want to learn about neural networks? Well, you’ve come to the right place. This post won’t focus on the theory behind how neural nets work. There are already numerous blog posts and books for that.1 This focuses on building a neural net in code. So, if you want to skip straight to the code, the repo is on Github.

What is a Neural Network?

There are many ways to answer this question, but the answer that resonates most deeply with me, and is perhaps most fundamental, is that a neural network is basically a function. It transforms its input into its output. One of my old college professors actually wrote a paper, Approximation by superpositions of a sigmoidal function, proving that neural networks can approximate any function. 2 This capability is what makes neural networks so powerful and exciting. And all you need to do is select the right weights (it’s not quite that simple).

The Simplest Neural Network

We’ll start with a single perceptron, the simplest model of a neuron. Depending on the size of the inputs, the output is either 1 or -1. An output of 1 means the perceptron is on, -1 means the perceptron is off.3 For our simple example, there’s one input, $x$, which has weight $w$. To determine the output, called the activation, we first take the dot product of the input and weight vectors, $\sum_{i=1}^{N}w_i \cdot x_i$, then pass the result through the sign function to get the activation. You can see the code for this below.

You may be wondering why we have to use the sign function. This is just because we want the output of the perceptron to be 1 or -1. For other problems we might want to have a wider range of output values. In such cases we would replace the step function with something else, like the sigmoid or arctangent. In general though, the activation functions used in neural networks take real-valued input and return output that is limited to specific range or to a set of specific values.

A Simple Problem

We’ll look at a simple binary classification problem. That is, to classify an input as belonging to one of two categories. In such a case, we can map each category to one of the two possible output values of the perceptron. Let’s consider the case where $x$ is zero. In such a case it doesn’t matter what the weights are set to, the activation of the perceptron will always be off. That’s not good. To combat such problems we add a fixed input to the perceptron, called the bias; the bias is generally set to 1. The addition of the bias slightly changes how we compute the output. Now we add a term to the sum we saw above, $\sum_{i=1}^{N}w_i \cdot x_i + w_0 x_0$, where $x_0$ is the bias and $w_0$ is the bias weight:

Let’s put these pieces together:

But how do we pick the right weights? The answer is that we don’t. There’s an algorithm for that, backpropagation, though it seems no one really calls it backpropagation until there are multiple layers. Backpropagation is a fancy way of saying that we propagate the error in the output back to the inputs:

1. See how far away the prediction of our network is from the expected output
2. Take a step in weight parameter space in the direction that minimizes the error. If you remember your calculus lessons, this is a step in the negative gradient direction. How a big a step to take depends on the size of the error and on how fast we want to move in that direction. We don’t want to take too big a step or too small step. In the former case, we can easily shoot past the optimal weights and in the latter case we might take a long time to get there.
3. Return to step 1 and repeat until the error is “small enough”.

We could write steps 1 and 2 in code as

Note that we can also use the squared error instead of the The full training happens when we pass a sequence of input, output pairs to the backProp function. With each call to backProp the weights of the perceptron are altered to decrease future errors. To handle the training, I’ve made a PerceptronTrainer struct and a struct to hold the training data as well.

Training the Perceptron

We want our perceptron to tell us if a given point is above or below a line in the xy plane. You can pick any line you want to, but I’ll take a simple one like $y(x)=3x+1$. We can generate training data by

1. picking $N$ input values at random and computing the $y$ value for each
2. Determine whether the $y$ value is above or below the line.

Then, create a PerceptronTrainer  and pass the training data to it and call the train function.

How Well Does the Perceptron Work?

Let’s pass 100 random inputs to the perceptron and see how often the predictions are correct. We’ll also create a new, untrained perceptron and see how often it’s predictions are correct.

I get anywhere from 88-100% correct for the trained perceptron and about 4-40% correct for the untrained perceptron. Not bad for a simple neural network and a simple problem.

1. A very nice online book is Michael Nielsen’s Neural Networks and Deep Learning
2. There were several papers written around the same time that talk about these issues. They’re behind a paywall, but you can probably get them on Sci-Hub: Multilayer feedforward networks are universal approximatorsUniversal approximation of an unknown mapping and its derivatives using multilayer feedforward networksOn the approximate realization of continuous mappings by neural networksApproximation capabilities of multilayer feedforward networks
3. Usually, a perceptron’s output is 1 or 0. I have a specific use case in mind for which -1 is more convenient than 0.