What is a Neural Network?
Computer programs come in many shapes and sizes, but they have a few things in common. We provide them with input, and they provide us with the result as output. A simple example is the “AVERAGE” function in a spreadsheet. We provide it with a column of figures as input. The function computes the average of those figures and outputs it in the cell where we asked for it.
A neural network is a computer program in just this sense. We provide it with input (for example, a digitised image of a handwritten number) and it returns the output (which in this case might be the number that the image corresponds to).
In a classical computer program, a programmer will design and write the algorithm (the set of steps) that processes the input to give the output.
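As a point of comparison, here is a minimal sketch of the classical approach for the average example. The function name `average` is just an illustrative choice; the point is that the programmer spells out every step.

```python
def average(numbers):
    """Classical approach: the programmer writes the steps explicitly."""
    # Step 1: add up all the figures. Step 2: divide by how many there are.
    return sum(numbers) / len(numbers)


result = average([35, 99])
print(result)
```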
The reason neural network programs have their fancy name is that they are created in a different way. The program is made up of connected nodes, each of which performs a simple (usually standard) calculation on its incoming connections, and pushes the result to its outgoing ones. The simple calculation for each node could be to add up its inputs and produce an output only if that sum exceeds a threshold. In this sense the neural network program is broken down into a number of simple connected sub-programs: the nodes.
Here’s a diagram illustrating the structure of a very simple ‘feed forward’ neural network:
This example takes 2 numbers as input, and produces 1 number as output (e.g. an estimate of the average of the input numbers).
In between input and output, it has 5 ‘hidden’ nodes, each connected to the 2 inputs and to the output.
Each connection in the graph (for example the one coloured blue between input1 and node1) has a number (0.61 in this case) that represents its ‘strength’. The result of the previous node (input1) is going to be multiplied by this strength before it goes to the next (node1). These numbers are typically called ‘weights’ though they represent the strength of the connection.
I’ve only illustrated 3 of the ‘weights’ or ‘strengths’ above but every connection has one.
Each hidden node does a simple calculation on the values passed to it, and passes the result to its output connection.
An example of the simple calculation at each node could be: add up the incoming connections, and output the result if it’s greater than zero. Otherwise output zero.
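The node calculation just described can be sketched in a few lines. This is only an illustration: the name `node_output` is made up for this post, the weight 0.61 is the one from the diagram, and the second weight (-0.12) is an invented value.

```python
def node_output(inputs, weights):
    """Multiply each input by its connection weight, add the results up,
    and pass the total on only if it is greater than zero."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return total if total > 0 else 0.0


# 0.61 is the weight shown in the diagram; -0.12 is a made-up second weight.
print(node_output([35, 99], [0.61, -0.12]))

# When the weighted sum is negative, the node outputs zero.
print(node_output([35, 99], [-0.61, -0.12]))
```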
Start with guesses, but get better over time
To begin with, the weights are set to random numbers, and so the network as a whole produces random and incorrect output.
But this network is going to be a ‘supervised’ neural network, which means we’re going to train it to get better with examples. Instead of coding the algorithm by hand, as a programmer would for the AVERAGE function at the beginning of the article, we are going to try to get the network to learn an approximation by giving it examples and making adjustments to our weights each time it gets it wrong.
For our average example, we’ll give the network training examples consisting of 2 input numbers, and their average.
example:
input: [35, 99] output: [67.0]
We give the 2 numbers as input, the network feeds them through, multiplying by weights and calculating the output at each node, before presenting its output.
input: [35, 99] network guess: [23.2894]
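Here is a rough sketch of how a guess like this comes about, assuming the layout described earlier (2 inputs, 5 hidden nodes, 1 output) and the "add up, output if greater than zero" node rule. The function names, the random seed, and the weight range of -1 to 1 are my own assumptions, so the guess it prints will not match the 23.2894 shown above.

```python
import random


def relu(x):
    """Output the value if it is greater than zero, otherwise zero."""
    return x if x > 0 else 0.0


def forward(inputs, hidden_weights, output_weights):
    """Feed the inputs through one hidden layer to a single output."""
    # Each hidden node sums its weighted inputs and applies the node rule.
    hidden = [relu(sum(x * w for x, w in zip(inputs, ws)))
              for ws in hidden_weights]
    # The output is the weighted sum of the hidden node results.
    return sum(h * w for h, w in zip(hidden, output_weights))


# Random starting weights (the seed and range are arbitrary assumptions).
rng = random.Random(0)
hidden_weights = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
output_weights = [rng.uniform(-1, 1) for _ in range(5)]

guess = forward([35, 99], hidden_weights, output_weights)
print("network guess:", guess)
```

With random weights there is no reason for the guess to be anywhere near 67.0, which is exactly the starting point the text describes.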
To begin with, of course, this output is wrong, because the weights are random. But we compare it to the correct output from the example [67.0]. The crucial step comes next: after each example we can adjust each weight slightly according to the contribution it made to the error. If the error would have been lower with this weight slightly higher, we make it slightly higher.
We do this for every weight in the network at each training step, adjusting slightly up or down depending on which would have made the error a bit lower.
If we do this repeatedly over many examples, the network as a whole can slowly converge to a reasonable estimate of the algorithm we want.
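The adjustment step can be sketched quite literally: try each weight slightly higher and slightly lower, and keep whichever change makes the error smaller. Please treat this as a trial-and-error stand-in for illustration only. Real networks work out the direction for every weight at once using gradients (backpropagation) rather than testing each weight in turn. The tiny 2-hidden-node layout, the hand-picked starting weights (fixed rather than random so the run is repeatable), the step size, and all function names are assumptions, and it trains on a single example.

```python
def forward(inputs, weights, n_hidden=2):
    """Tiny network: weights holds the input weights for each hidden node,
    followed by one output weight per hidden node."""
    n_in = len(inputs)
    hidden = []
    for j in range(n_hidden):
        ws = weights[j * n_in:(j + 1) * n_in]
        total = sum(x * w for x, w in zip(inputs, ws))
        hidden.append(total if total > 0 else 0.0)
    out_ws = weights[n_hidden * n_in:]
    return sum(h * w for h, w in zip(hidden, out_ws))


def squared_error(weights, example):
    inputs, target = example
    return (forward(inputs, weights) - target) ** 2


def nudge_weights(weights, example, step=0.001):
    """Visit each weight in turn and keep a small change up or down
    only if it makes the error lower."""
    weights = list(weights)
    for i in range(len(weights)):
        best = squared_error(weights, example)
        for delta in (step, -step):
            trial = list(weights)
            trial[i] += delta
            if squared_error(trial, example) < best:
                weights = trial
                break
    return weights


example = ([35, 99], 67.0)
weights = [0.2, 0.1, -0.3, 0.4, 0.5, 0.3]  # hand-picked starting weights

error_before = squared_error(weights, example)
for _ in range(500):
    weights = nudge_weights(weights, example)
error_after = squared_error(weights, example)

print("error before:", error_before)
print("error after: ", error_after)
print("guess after: ", forward(example[0], weights))
```

Because a change is only kept when it lowers the error, the error can never go up from one round to the next in this sketch. Fitting one example is of course not the same as learning to average. For that, the same idea has to be repeated over many different examples.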
Why is this called a neural network?
When these kinds of programs were invented, they were loosely based on an analogy with how brain cells (neurons) work. These cells are also connected together in networks, and also pass information to each other (in the form of chemical signals) along paths which can vary in strength and can change as we learn.
That’s ridiculous #1: there’s no way this could work!
Well, in fact, it does work. Here is a simple example of a neural network that approximates the average of 2 numbers between 0 and 100.
It has 1 hidden layer with 2 nodes in it, and it’s trained on a data set of 1000 examples. After 500 training steps, it reaches a reasonable approximation.
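For readers who want to try something similar, here is a minimal NumPy sketch of a network with the same shape (2 inputs, 1 hidden layer of 2 nodes, 1 output), trained for 500 steps on 1000 examples. It is not the code behind the example above. The use of gradient descent, the learning rate, the scaling of inputs to the 0 to 1 range, the small positive starting weights, and the absence of bias terms are all assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 training examples: pairs of numbers in [0, 100] and their average.
X = rng.uniform(0, 100, size=(1000, 2))
y = X.mean(axis=1, keepdims=True)

# Scale to [0, 1] so a plain fixed learning rate behaves well (assumption).
Xs, ys = X / 100.0, y / 100.0

# Small positive random starting weights: 2 inputs -> 2 hidden -> 1 output.
W1 = rng.uniform(0.0, 0.5, size=(2, 2))
W2 = rng.uniform(0.0, 0.5, size=(2, 1))


def predict(inputs, W1, W2):
    hidden = np.maximum(inputs @ W1, 0.0)  # output only if greater than zero
    return hidden @ W2


def mean_abs_error(W1, W2):
    """Average size of the mistake, on the original 0 to 100 scale."""
    return float(np.mean(np.abs(predict(Xs, W1, W2) - ys)) * 100.0)


error_before = mean_abs_error(W1, W2)

learning_rate = 0.1
for step in range(500):
    # Forward pass over all examples at once.
    hidden_in = Xs @ W1
    hidden = np.maximum(hidden_in, 0.0)
    guess = hidden @ W2

    # Backward pass: how the mean squared error changes with each weight.
    grad_guess = 2.0 * (guess - ys) / len(Xs)
    grad_W2 = hidden.T @ grad_guess
    grad_hidden_in = (grad_guess @ W2.T) * (hidden_in > 0)
    grad_W1 = Xs.T @ grad_hidden_in

    # Adjust every weight slightly in the direction that lowers the error.
    W1 = W1 - learning_rate * grad_W1
    W2 = W2 - learning_rate * grad_W2

error_after = mean_abs_error(W1, W2)
print("mean absolute error before training:", error_before)
print("mean absolute error after training: ", error_after)
print("guess for [35, 99]:",
      float(predict(np.array([[0.35, 0.99]]), W1, W2)[0, 0]) * 100.0)
```

How close the final guess gets will depend on the starting weights and the learning rate, so the numbers are worth experimenting with.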
Neural networks are sometimes called ‘general program approximators’ (the more common technical term is ‘universal function approximators’). ‘Approximator’ means they come up with an approximate (roughly correct) output for each input. ‘General’ is because they have this general, simple form of standard nodes with weighted connections (though as we’ll see later, the number and arrangement of nodes can vary enormously).
That’s ridiculous #2: Why would you ever want to do that?
Well, computing the average of 2 numbers isn’t a great use of a neural network (though it helps to illustrate how they work). But there are many problems in computing that do lend themselves well to this approach. In another post I’ll explore one that should both motivate the approach and help develop intuition for it (the correct pronunciation of English words).
Artificial Intelligence has produced many amazing results in the last few years — for example a network that can look at pictures like this:
And produce captions such as:
But these are simply very large-scale versions of the networks we’ve explored here (often with somewhat more complex connections, which we’ll also discuss in future posts).