The Power of Activating The Artificial Neuron
In the previous post, we took a look at the structure of ANNs and how the inputs and weights for particular neurons get calculated. However, this is only half of the battle. Once we have the input calculation for our neuron, what do we do with it? This post will go into detail about how that process looks. By the end of it, we will see what the final output of a single neuron looks like, and what the implications are for setting up layers of neurons for information processing and learning.
The human brain has a dizzying number of neurons. In fact, during my research I've learned that there are roughly 86 billion of them. Each one has a series of connections with other neurons, passing signals back and forth. The interesting thing to note is that these neurons are not all firing at the same time. These activations, or firings, of neurons only come about at the right moment: when a neuron is excited by its input and a specified threshold has been reached or passed!
This model of the brain's functions is of great interest to us as engineers, but the problem is we don't all have access to a machine that can model all 86 billion neurons and their connections (only a few companies and research labs have computers that could scale to this level and beyond). Fortunately, for most problems we can work around this lack of computing power and shrink our efforts down to a smaller size while still accomplishing some pretty amazing things in the process.
Imagine looking at the inside of an artificial neuron. You can think of this neuron as a tiny process that takes several inputs with their associated weights, measures them, and produces an output that goes either to another connected neuron or to the output layer itself. The internal process for calculating data inputs inside an artificial neuron looks like this:
I like to think of the Artificial Neuron's inputs as going through 3 phases. The first is the summation calculation, which we covered in the previous post. The second phase is the activation, or transfer, function. Last but not least, we pass through the bias before generating our final output for the neuron.
THE ACTIVATION FUNCTIONS
A firm understanding of activation functions and their application is paramount to designing a successful neural network from scratch. Most posts on deep learning that you will find out on the internet will discuss the sigmoid function, but there are many more activation functions that can be used to build extremely sophisticated neural networks. We will examine some interesting ones in detail, and implement some Elixir code for them. We will then watch them go to work on several Artificial Neurons to generate some outputs.
First, let's look at the Linear transfer function. This transfer function is the most primitive of the family, in my opinion. The linear function takes its inputs as they are: it does no further calculation, and its input is returned just as it was received. The code for this can be modeled like so:
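A minimal sketch of the idea (the `Activation` module and function names here are my own, not necessarily what the original codebase used):

```elixir
defmodule Activation do
  # The linear transfer function simply returns its input unchanged.
  def linear(input), do: input
end

Activation.linear(0.5) # => 0.5
```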
You might be wondering what's the purpose of this function, but if you think about it, some things in life can be described exactly like this. What you see is what you get. No further explanation is needed. Artificial Neurons need a way to handle those situations. Reaching for this function is the best bet when designing a system that deals with information with no ambiguity.
The next step up is the hard limit function. This transfer function is really good at handling binary cases. For example, suppose I was designing a network that could separate grocery items into hot and cold groups. This is a binary separation, and this function is so binary that it only ever gives two possible outputs. The logic for this transfer function is simple: it measures its input, and if the input calculation is less than 0 it returns a 0; if the input calculation is greater than 0, it returns a 1. We can code that this way:
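A sketch of that logic (again, module and function names are mine; the description above only covers strictly below and strictly above zero, so treating an input of exactly 0 as firing is my own choice here):

```elixir
defmodule Activation do
  # Hard limit: anything below 0 maps to 0, everything else maps to 1.
  def hard_limit(input) when input < 0, do: 0
  def hard_limit(_input), do: 1
end

Activation.hard_limit(-0.3) # => 0
Activation.hard_limit(0.3)  # => 1
```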
As I mentioned earlier, the sigmoid function is the most widely used transfer function for a neural network. The reason for this is quite simple. It can take any real number, all the way out to positive or negative infinity, and squash the calculation into an output between 0 and 1! Why is this important? It's important because we need systems that are capable of reasoning within uncertain environments. Nature is not a straightforward apparatus. The problem with today's computer programs is that they are extremely linear, and linearity is not how the real world works. The ability to handle unknown and unaccounted-for scenarios will be a requirement for all computer software in the future. Today's linearity is the main reason why imperative programming won't make the technological shift currently underway. The real world has twists, turns, and curves. Our systems must exist and thrive in this world, so we need to deprogram ourselves away from our old linear way of thinking. This is not easy, because software engineers are trained from the very beginning to think linearly! The formula for the sigmoid function is quite simple: f(x) = 1 / (1 + e^(-x)). Credit goes to Dr. Saed Sayad's website, as that's where I first found it.
This is all great, but how do we model this in Elixir? Well, it turns out the Elixir standard library doesn't give us an exponential function of its own, but we can turn to the Erlang libraries for some assistance. I wrote the formula this way:
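A sketch of the sigmoid, leaning on Erlang's `:math.exp/1` for the exponential (the `Activation` module name is my own):

```elixir
defmodule Activation do
  # Sigmoid: squashes any real input into the open interval (0, 1).
  # :math.exp/1 comes from the Erlang standard library.
  def sigmoid(input), do: 1 / (1 + :math.exp(-input))
end

Activation.sigmoid(0) # => 0.5
```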
HYPERBOLIC TANGENT FUNCTION
The sigmoid is great, but what if we have a situation that needs to handle both positive and negative outputs on a graph? Surely there are situations where this will happen in the real world, and the sigmoid can only take us so far. The alternative is the Hyperbolic Tangent function. This function squashes its output into a range between -1 and 1, which is a much more flexible approach because it accounts for both positive and negative values. The Hyperbolic Tangent remains one of my favorite functions to use while designing Neural Networks. In life, you never know what you're going to get, and our autonomous systems need to learn this; the Hyperbolic Tangent is a good way to teach them that fact. Mathematically, it is defined as tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x)).
Again, we turn to our Erlang library to code this function. It has already been implemented for us as :math.tanh/1:
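A thin wrapper is all we need (module name is mine; the work is done entirely by Erlang's `:math.tanh/1`):

```elixir
defmodule Activation do
  # Hyperbolic tangent: squashes input into the range (-1, 1),
  # delegating straight to Erlang's :math.tanh/1.
  def tanh(input), do: :math.tanh(input)
end

Activation.tanh(0) # => 0.0
```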
This is in no way an exhaustive list of all the activation functions you can use to fire up a neuron, but these are among the most common.
Earlier in this post, we talked about the 3 phases a calculation goes through. The summation function was covered in our last post. For a refresher, here is the way it looks:
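A sketch of the summation phase, pairing each input with its weight (the `Neuron` module and `sum/2` names are my own here):

```elixir
defmodule Neuron do
  # Summation: multiply each input by its weight, then add everything up.
  def sum(inputs, weights) do
    inputs
    |> Enum.zip(weights)
    |> Enum.map(fn {input, weight} -> input * weight end)
    |> Enum.sum()
  end
end

Neuron.sum([1, 2], [0.5, 0.25]) # => 1.0
```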
The third phase is adding the bias. The bias is important because it helps balance out calculations. Our world needs biased views: if we didn't consider biases, we would have intellectual zealots walking the earth with no ability to question the information being passed back and forth. Biased views are healthy for society because they keep things modest and prevent a single view from taking over completely. They are also healthy for Neural Networks, so the ability to model bias needs to be built into our networks. We can code this easily this way:
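A sketch of the bias phase (names are mine; `neuron` is assumed to be a struct or map carrying a `:bias` field):

```elixir
defmodule Neuron do
  # Bias phase: fold the neuron's bias into the running calculation.
  def add_bias(calculation, neuron), do: calculation + neuron.bias
end

Neuron.add_bias(1.0, %{bias: 0.5}) # => 1.5
```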
You might notice the reference to our Neuron that is being passed into our function. If you're wondering where it came from, don't panic. We will implement the structure next.
Now that we've covered the 3 phases an input goes through within a Neuron, let's express this process for our 4 different activation functions, in the order they were introduced:
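One way to sketch the full pipeline, following the phase order described earlier (summation, then activation, then bias). All module and function names here are my own, and `neuron` is assumed to carry a `:bias` field:

```elixir
defmodule Neuron do
  def sum(inputs, weights) do
    Enum.zip(inputs, weights)
    |> Enum.map(fn {input, weight} -> input * weight end)
    |> Enum.sum()
  end

  def add_bias(calculation, neuron), do: calculation + neuron.bias

  # Linear: the summed input passes straight through to the bias phase.
  def linear_output(inputs, weights, neuron) do
    sum(inputs, weights) |> add_bias(neuron)
  end

  # Hard limit: collapse the sum to 0 or 1, then apply the bias.
  def hard_limit_output(inputs, weights, neuron) do
    activated = if sum(inputs, weights) < 0, do: 0, else: 1
    add_bias(activated, neuron)
  end

  # Sigmoid: squash the sum into (0, 1), then apply the bias.
  def sigmoid_output(inputs, weights, neuron) do
    activated = 1 / (1 + :math.exp(-sum(inputs, weights)))
    add_bias(activated, neuron)
  end

  # Hyperbolic tangent: squash the sum into (-1, 1), then apply the bias.
  def tanh_output(inputs, weights, neuron) do
    sum(inputs, weights) |> :math.tanh() |> add_bias(neuron)
  end
end

Neuron.linear_output([1, 2], [0.5, 0.25], %{bias: 0}) # => 1.0
```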
Great! Before we see these in action there is one more thing we need to understand about these next generation systems. Our next section will briefly cover this.
Life is never perfect, and we make mistakes often. As human beings, it doesn't make sense for us to strive for a point of making no mistakes; instead, it makes more sense for us to learn from them while accepting the fact that we will continue to make them. The same behavior applies to these types of systems. We can't expect them to be perfect; we have to allow them room to correct themselves. This feature of Neural Networks is actually what makes them so damn amazing!! They know when something is wrong, and they will work until they find out what that is and how they can make it better.
There are two types of errors in Neural Networks. The first is the Local Error, the error calculation specific to a single neuron. We will see how this works shortly. The other type is the Global Error, the error rate for the entire Neural Network, which involves the process of Back Propagation. Back Propagation requires a whole post of its own to explain properly, so for now we will only talk about the Local Error, since we are not looking at things from a global perspective just yet.
Remember the guess-my-number game? I say to someone that I'm thinking of a number between 1 and 10. The person doing the guessing guesses 5. The person with the number in mind says "higher." The guesser then starts again with this new range, hoping to get closer and closer to the number originally thought of by the other individual. This whole process is essentially how the local error calculation works. The Neuron is given an ideal output, and it compares its actual output with that ideal output. The difference is known as the local error, and it can be thought of as a simple subtraction problem. Let's write that:
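As a sketch (names are mine, and I'm assuming the ideal-minus-actual direction described above):

```elixir
defmodule Neuron do
  # Local error: how far the neuron's actual output landed from the ideal.
  def local_error(ideal, actual), do: ideal - actual
end

Neuron.local_error(0.5, 0.3) # => 0.2
```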
THE NEURON OUTPUT
Let's generate some output for some example Neurons.
Given our 4 different transfer functions we explored, let's find out how they affect the data that went into these two artificial neurons.
So if the calculation for our output for n1 is -0.75, and our output for n2 is 1.99, what output do you expect from our hard limit transfer function? Let's find out.
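Using the hard limit logic from earlier (as an anonymous function for a quick check):

```elixir
hard_limit = fn input -> if input < 0, do: 0, else: 1 end

hard_limit.(-0.75) # n1 => 0
hard_limit.(1.99)  # n2 => 1
```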
Let's see what happens when we use the sigmoid function:
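The same two calculations pushed through the sigmoid (approximate values in the comments):

```elixir
sigmoid = fn input -> 1 / (1 + :math.exp(-input)) end

sigmoid.(-0.75) # n1 => roughly 0.3208
sigmoid.(1.99)  # n2 => roughly 0.8797
```

Notice how both neurons now land somewhere inside the 0-to-1 range instead of being forced to one extreme or the other.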
Finally, let's see how the hyperbolic tangent function affects our two neurons.
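And the hyperbolic tangent, which lets n1's negative calculation stay negative (approximate values in the comments):

```elixir
:math.tanh(-0.75) # n1 => roughly -0.635
:math.tanh(1.99)  # n2 => roughly 0.963
```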
We briefly talked about the error calculation, but for the fun of it, let's see what that looks like for n1. Say we want to calculate the local error for n1 using the sigmoid function, with an ideal output of 0.5 for that neuron. Let's see what happens when we plug those numbers in:
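Putting the pieces together (using the ideal-minus-actual subtraction from earlier; values in the comments are approximate):

```elixir
sigmoid = fn input -> 1 / (1 + :math.exp(-input)) end

actual = sigmoid.(-0.75)    # n1's sigmoid output, roughly 0.3208
local_error = 0.5 - actual  # ideal minus actual, roughly 0.1792
```

So n1's output came in under the ideal, and the positive local error tells us by how much.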
These calculations are pretty neat! We've seen how all these activation functions work, and how they produced outputs for our neurons along with calculating errors.
Many different transfer functions are used while designing neural networks. I've personally used 10 of them. I was just asked on Twitter about the code for these posts, so if you are interested in seeing the rest of these transfer functions and how they work check out the code here!
Imagine the types of systems you can build when you have a cluster of these Neurons connected and trained for specific tasks. This structure is how sophisticated systems are currently being designed, and it is important that we understand the basics of what happens within an Artificial Neuron. I'm aware there are frameworks out there that hide a lot of this complexity, but it's important to know the fundamentals yourself. Being dependent on a framework stifles creativity and real ingenuity, and we don't want that.
Next post, we will explore Learning rules and different architectures we can use while designing our network from the ground up.