Deep Learning with Elixir: Building and Training a Multi-Layered Neural Network
"A computational process is indeed much like a sorcerer's idea of a spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform intellectual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer's spells. They are carefully composed from symbolic expressions in arcane and esoteric programming languages that prescribe the tasks we want our processes to perform."
- The Structure and Interpretation of Computer Programs, Second Edition, Abelson and Sussman, ch. 1, p. 1
IN THE BEGINNING
Getting into deep learning is a lot like engaging in the madness of sorcery. One must spend a considerable amount of time understanding the craft, contemplating the pros and cons of creating something truly autonomous, and lying awake at night in fear of how much disruption automation will cause in our society.
Creating your first neural network is a humbling experience: you witness something you built with a small amount of code evolve to achieve a given task all on its own.
In this post, we will create a standard 3 x 3 deep learning neural network using only the Elixir programming language. Hopefully, by the end, the reader will have a new appreciation for these advanced techniques and the depths to which they can be taken with Elixir.
Today's big data world requires tomorrow's autonomous Deep Learning systems. To grasp the concepts of how these systems work we will build a standard neural network that learns a minuscule problem set.
There are three things I've found helpful in designing and building these types of systems from scratch in Elixir. Think of these three items as our helpers:
- A basic understanding of Elixir & Erlang's OTP eco-system
- Numerix (a machine-learning library written in Elixir by Safwan Kamarruddin)
- Matrix (a library for matrix calculations written by Tom Krauss)
There are other Elixir packages, like Tensor, that allow an Elixir dev to do some sophisticated things, but we will limit ourselves to the Matrix and Numerix libraries listed above.
As discussed before on Automating the Future, neural nets solve problems in a very different way from the traditional programming model we've all been trained to use. These systems learn by example. A creator comes up with a desired target, or goal, for the system to achieve, and the system is fed examples until it learns how to reach that objective.
Computers are lousy at understanding the nuances of life, and of what we want and need, when communicated to via human language. However, if the world's data and problems are represented to machines as numbers and floating-point values, miraculously they can begin to problem-solve and understand human-like things!
Software development is now shifting from procedural/object-driven programming to a more statistical, mathematical, data-driven approach. This paradigm shift is paramount to recognize if the reader plans to remain relevant in the tech industry. The old way of programming and problem solving has been dead since 2011, with the advent of WATSON on Jeopardy. The companies embracing this data-driven, statistical paradigm are the ones that will eventually take over the tech industry.
There was no way to explicitly program WATSON with imperative instructions to do what it did that night on the show. In fact, WATSON was not even aware of the questions it would be getting. All it had at its disposal was data, learning algorithms, and a multi-layered neural network.
The network we'll build here will be nowhere near as sophisticated as that one, but it will illustrate, on a conceptual level, what multi-layered networks are and how they work.
Suppose we had a sequence of numbers that could represent anything in a particular problem space. Let's say this series of numbers is 1, 0, and 0. Together, they make up a list that looks like [1,0,0].
This list poses a problem, though. The sequence we want to achieve is a list of all ones, represented as [1,1,1]. This list of numbers can be considered our goal, or target.
Our problem space, in a nutshell, is listed in the table below...
We would love for our system to be able to distinguish its input data from its target data. To do that, we need another, random data set to compare against the target. This random data set is known as the training set, and it is what the network will use to learn how to achieve its outcome. Adding our training data modifies our chart to look like the one below...
The best way to represent a neural network to a computer is via Matrices. Matrices are great tools in Linear Algebra that allow us to perform operations on groups of numbers.
Looking at our chart above, we can see that we have three columns and three rows. This chart can represent a 3 x 3 matrix!
A basic neural network is modeled as a matrix from linear algebra. Each element in the matrix can be considered a node, or neuron. Each neuron is responsible for calculating and producing its own output, which affects the entire mind of the system.
A multi-layered network has three named sections. The first is called the input layer. The second is called the hidden layer. The final section is called the output layer. Very sophisticated networks have multiple hidden layers, but we will use only one hidden layer for this example.
As we visualize the flow of data from left to right, the picture of our network can look something like this...
If we take our chart and turn it into this architecture we can then visualize it to look this way...
This is what we want our network to do. We need it to take our inputs and turn them into our desired output!
Now that our design is complete, the first thing to do is create our Elixir project. I've decided to call it DEEPNET. We would like a Supervisor to make the project's startup more automated, so we use the command...
mix new deepnet --sup
This command creates a new Elixir project with a supervisor.
The next thing to do is add the dependencies that are needed. I've mentioned Numerix and Matrix, but I've added sfmt in order to ensure our random weights are indeed random.
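In mix.exs, the extra dependencies look something like the fragment below. The version constraints are my assumption here; check Hex for the current releases.

```elixir
# mix.exs — version constraints are assumptions; check Hex for current releases
defp deps do
  [
    {:numerix, "~> 0.4"},
    {:matrix, "~> 0.3"},
    {:sfmt, "~> 0.13"}
  ]
end
```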
We need a way for our Supervisor to start up our neural network automatically. Because we want a 3 x 3 architecture, we need 9 neurons, which means each layer needs 3 neurons. These neurons can be created at startup. Let's do that in our Supervisor's start function.
CREATING THE NETWORK
We are referencing a create function here, so it would be wise to implement it next, giving the Supervisor something to go off of. Our create function will need to handle these lists of numbers, because these numbers represent the neurons in each layer. It might be wise to store this initial state in an Elixir Agent.
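Here is a minimal, self-contained sketch of both pieces: the Supervisor's start function and a create function that parks the network state in an Agent. The module layout and the state's field names are my assumptions, not the original listing.

```elixir
defmodule Deepnet.Network do
  # Parks the network state in an Agent. The field names here
  # (weights, error_rate) are assumptions, not the original struct.
  def create do
    Agent.start_link(fn -> %{weights: [], error_rate: 1.0} end, name: __MODULE__)
  end
end

defmodule Deepnet.Application do
  use Application

  # Starting the network from the Supervisor means our 3 x 3
  # layers exist as soon as the application boots.
  def start(_type, _args) do
    children = [
      %{id: Deepnet.Network, start: {Deepnet.Network, :create, []}}
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: Deepnet.Supervisor)
  end
end
```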
INITIALIZING RANDOM WEIGHTS
If you've been following ATF, you might remember that all neurons need weights associated with them. We want our weights to be as random as possible; this randomization is what gives us confidence that we have converged on the right solution during training. The whole point of a neural network is to find the weights suitable for the particular problem at hand. We need a function that creates 9 different weights, one corresponding to each neuron. We would also like to factor in a bias. Here is what that function looks like...
First, we seed sfmt with a timestamp to help ensure we are getting random weights. Next, we generate random weights for the inputs. We don't want to stop there, though: subtracting our input weights from a matrix of 0.5, used as the bias, gives us a good assortment of randomized weights to start with. Finally, we update the network with those initialized weights. Our 9 weights should look something like this...
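The steps above can be sketched as follows. This stand-in swaps sfmt for the stdlib :rand and uses plain lists of lists instead of the Matrix library, but the sequence is the same: seed with a timestamp, generate random values, subtract them from 0.5.

```elixir
defmodule Deepnet.Weights do
  # Sketch of weight initialization. The post seeds sfmt with a timestamp;
  # this stand-in uses the stdlib :rand, and plain lists of lists instead
  # of the Matrix library.
  def initialize do
    :rand.seed(:exsss, :erlang.timestamp())

    # A 3 x 3 matrix of random values in [0.0, 1.0) ...
    random = for _ <- 1..3, do: for(_ <- 1..3, do: :rand.uniform())

    # ... subtracted from 0.5 (the bias), landing each weight in (-0.5, 0.5].
    for row <- random, do: for(w <- row, do: 0.5 - w)
  end
end
```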
Now that we are getting our weights, let's check the size of our matrix.
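With plain lists standing in for the Matrix library, the size check is just a pair of length calls (the weight values below are illustrative; the real ones are random at startup):

```elixir
# Illustrative weights — the actual values are random at startup.
weights = [
  [0.12, -0.31, 0.05],
  [-0.44, 0.27, 0.18],
  [0.09, -0.02, 0.33]
]

{length(weights), length(hd(weights))}
# => {3, 3}
```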
Great! Now we have our 3 x 3 architecture initialized with weights. Our entire network at startup now looks like this...
A neural network needs feedback about its performance over time. This feedback is gathered through what we call the error rate. There are several ways of calculating the error rate, and the method of calculation is entirely up to the creator. In this post, we will use the MSE, or Mean Squared Error.
A neural network's job during training is to constantly compare its output to the given target. We need a way to calculate this error for our network and store it, so that we can monitor how well the network is doing while training. That function is pretty straightforward.
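That calculation, sketched here without the Numerix helpers as a plain reduction over the output/target pairs:

```elixir
defmodule Deepnet.Error do
  # Mean Squared Error between the network's output and the target:
  # average of the squared differences, element by element.
  def mse(output, target) do
    output
    |> Enum.zip(target)
    |> Enum.map(fn {o, t} -> :math.pow(t - o, 2) end)
    |> Enum.sum()
    |> Kernel./(length(output))
  end
end
```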
ADJUSTING NETWORK WEIGHTS
In the last line of the function above, one might have noticed a reference to a Deepnet.Network.adjust_weights/2 function. This is an important step. During training, the neural network needs a way to improve itself if it finds it has missed the target it's been training to achieve. This is where the learning rate constant comes into play. Let's explore the Deepnet.Network.adjust_weights/2 function...
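The sketch below shows the general shape of a delta-rule update, where each weight moves by the learning rate times the neuron's error times the input that fed it. The 0.2 learning rate and the argument shape are assumptions, not the original listing.

```elixir
defmodule Deepnet.Adjust do
  # Assumed value — the post does not state the constant here.
  @learning_rate 0.2

  # Delta-rule sketch: row n holds neuron n's incoming weights, and each
  # weight moves by learning_rate * error * input.
  def adjust_weights(weights, {inputs, errors}) do
    for {row, error} <- Enum.zip(weights, errors) do
      for {w, input} <- Enum.zip(row, inputs) do
        w + @learning_rate * error * input
      end
    end
  end
end
```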
CALCULATING NEURON OUTPUT
One might wonder how to produce the output that our functions continue to reference. We need a way to calculate the output of each neuron. If you need a refresher on how neurons calculate their output, check out the blog post here.
For our particular problem set, we are going to use the sigmoid function as our activation function. Remember, a data signal inside a neuron goes through three phases. The first phase is the summation: the dot product of the inputs and the weights. The second phase is the activation function. The final phase is adding the bias. We already factored in our bias at weight initialization, so we won't need that part in our function; all we need are phases 1 and 2.
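Phases 1 and 2 can be sketched as two small functions, the dot product and the sigmoid:

```elixir
defmodule Deepnet.Neuron do
  # Phase 1: dot product of the inputs and the weights.
  def dot(inputs, weights) do
    inputs
    |> Enum.zip(weights)
    |> Enum.map(fn {i, w} -> i * w end)
    |> Enum.sum()
  end

  # Phase 2: the sigmoid activation squashes the sum into (0, 1).
  def sigmoid(x), do: 1.0 / (1.0 + :math.exp(-x))

  def output(inputs, weights), do: sigmoid(dot(inputs, weights))
end
```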
We can now calculate the outputs of our neurons. However, we are not quite finished. We need a way to move data from one layer to the next, a process known as feed-forward. Since we feed data from the input layer to the hidden layer, and then feed the hidden layer's output to the output layer, we are essentially feeding data forward twice. Luckily, we can do this pretty easily via pattern matching in Elixir.
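A sketch of the double feed-forward, again with plain lists instead of the Matrix library: each row of a layer's weight matrix is one neuron's incoming weights.

```elixir
defmodule Deepnet.FeedForward do
  # One layer: each weight row drives one neuron (dot product + sigmoid).
  def layer_output(inputs, weight_rows) do
    Enum.map(weight_rows, fn row ->
      sum =
        inputs
        |> Enum.zip(row)
        |> Enum.map(fn {i, w} -> i * w end)
        |> Enum.sum()

      1.0 / (1.0 + :math.exp(-sum))
    end)
  end

  # Feeding forward twice: input -> hidden, then hidden -> output.
  def feed_forward(inputs, hidden_weights, output_weights) do
    inputs
    |> layer_output(hidden_weights)
    |> layer_output(output_weights)
  end
end
```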
Learning is a repetitive process. If our network has not arrived at the correct solution, it must repeat this entire process until it does. Each time, the network makes small changes to itself until it reaches its ultimate goal. One could think of this process as a giant learning loop.
Every time a loop completes and the network starts again in order to minimize the error, we call this process back-propagation, because the error is propagated back through the network for readjustment. This is what separates modern systems from more traditional ones. Traditional systems had to wait for humans to come and fix the errors present; these systems strive to minimize their error rate on their own, relieving the engineer of the burden of maintenance. Hopefully, you are beginning to see the benefits of solving problems this way!
It's always a good idea to automate training for a neural network. There could be times when training on a particular problem set can take hours or even days. It would not be wise to perform this process manually, so we will instead write a function that handles this for us.
As mentioned before, the learning process is a loop. Elixir is a functional language, so we use functions to handle our loops. In our loop, we collect the input and the target and pass them into the network. The network trains on the data and checks its error rate. I'd like the error rate to be minuscule, so the network should train until the error rate falls below 0.02; if the error rate is higher than 0.02, it must continue training. This is the learning process: a repetitive cycle the network goes through until the task is learned with little to no error. We can accomplish this via pattern matching...
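The loop's skeleton, sketched with pattern-matched function clauses. The step/1 function here is a stand-in for one full feed-forward and weight-adjustment pass; it just shrinks the error so the recursion terminates.

```elixir
defmodule Deepnet.Train do
  @target_error 0.02

  def train(error, epoch \\ 1)

  # Pattern matching on the guard: once the error dips below 0.02, stop.
  def train(error, epoch) when error < @target_error, do: {:done, epoch}

  # Otherwise run another pass and loop.
  def train(error, epoch), do: train(step(error), epoch + 1)

  # Stand-in for a real feed-forward + adjust_weights pass.
  defp step(error), do: error * 0.5
end
```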
The final thing we need to do is create our data struct for our user input and target. This information will then need to be passed to a learn function that will kick off the entire process.
Now we kick off the whole process via our final function...
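A sketch of that final piece: a struct for the user input and target, plus a learn function handing it off to the training cycle. The module and field names are my assumptions, and train/1 here only stands in for the full loop.

```elixir
defmodule Deepnet.Data do
  # Hypothetical struct shape — the original field names are not shown here.
  defstruct input: [1, 0, 0], target: [1, 1, 1]
end

defmodule Deepnet.Runner do
  # Entry point: build the data struct and hand it to the learning loop.
  def learn do
    data = %Deepnet.Data{}
    train(data)
  end

  # The real loop would feed `input` forward and adjust weights until the
  # MSE against `target` drops below 0.02; this stub just unpacks the data.
  defp train(%Deepnet.Data{input: input, target: target}) do
    {:ok, input, target}
  end
end
```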
That's it! Our network is fully built. What happens when we fire this up?
BANG!!! We can see it took 13 epochs for the training to complete. Our network finally reached our target list of [1,1,1] and it was able to get its error rate down below 0.02!! That's pretty impressive!
One might be thinking: what's the significance of this? How can it be used in the real world? Machine learning is essential to the next era of the technological age in that it allows us as engineers to deal with large amounts of data and train our systems to gather insights, predict outcomes, and solve problems we might not have a clue how to solve ourselves. As we've just witnessed, these systems are good at minimizing error, which is invaluable in the real world.
The beauty of neural networks is that we can architect them in different ways to create human-like intelligence in our software systems. In this post, we have by no means covered all of the algorithms and the different ways these networks can be designed to do amazing things. The goal of Automating the Future is to continue to bring the Elixir community wonderful examples of how neural networks can be used to solve a wide variety of problems.
Now that we know how to design a basic multi-layered neural network we can move on to some excellent example projects of truly automated software systems that learn and solve different types of problems in our future posts. If you want to check out the Deepnet code I have placed it on GitHub. Feel free to fork, experiment and change as much as you like. This repo can serve as an example for the Elixir community on one way we can design a deep learning network from scratch!