
Deep Learning with Elixir: Building and Training a Multi-Layered Neural Network

"A Computational process is indeed much like a sorcerer's idea of spirit. It cannot be seen or touched. It is not composed of matter at all. However, it is very real. It can perform actual work. It can answer questions. It can affect the world by disbursing money at a bank or by controlling a robot arm in a factory. The programs we use to conjure processes are like a sorcerer's spells. They are carefully composed from symbolic expressions in arcane and esoteric programming languages that prescribe the tasks we want our processes to perform."
- The Structure and Interpretation of Computer Programs, Second Edition, Abelson and Sussman, Ch. 1, p. 1

IN THE BEGINNING

Getting into deep learning is a lot like engaging in the madness of sorcery. One must spend a considerable amount of time understanding the craft, while contemplating the pros and cons of creating something truly autonomous, and lying awake at night fearing how much disruption automation will cause in our society.

Creating your first Neural Network is a humbling experience in that you're witnessing something you built with a small amount of code evolve to achieve a given task all on its own.

In this post, we will create a standard 3 x 3 deep learning neural network using only the Elixir programming language. Hopefully, by the end of reading this, the reader will have a new appreciation for this style of programming and for the depth at which these new A.I. techniques can be used with Elixir.

THE HELPERS

Today's big data world requires tomorrow's autonomous deep learning systems. To grasp how these systems work, we will build a standard neural network that learns a minuscule problem set.

There are three things I've found helpful in designing and building these types of systems from scratch using Elixir. The reader can think of these three items as the helpers.

  1.  A basic understanding of Elixir and Erlang's OTP ecosystem
  2.  Numerix (a machine learning library written in Elixir by Safwan Kamaruddin)
  3.  Matrix (a library for creating and working with matrices, written by Tom Krauss)

There are other Elixir packages, like Tensor, that allow an Elixir dev to do some sophisticated things, but we will keep it limited to the Matrix and Numerix packages listed above.

THE PARADIGM

As discussed before on Automating the Future, neural nets solve problems in a very different way from the traditional programming model we've all been trained to use. These systems learn by example. A creator comes up with a desired target, or goal, for the system to achieve, and the system is fed examples until it learns how to reach the objective the creator wants.

Computers are lousy at understanding the nuances of life and what we want and need when it is communicated in human language. However, if the world's data and problems are represented to machines as numbers and floating-point values, they can miraculously begin to solve problems and understand human-like things!

New software systems are shifting from procedural/object-driven programming to a more statistical, data-driven approach. Recognizing this paradigm shift is paramount if the reader plans to remain relevant in the tech industry. The old way of programming and problem solving has been dead since 2011, with the advent of WATSON on Jeopardy. The companies embracing this data-driven statistical paradigm are the ones that will eventually take over the tech industry.

There is no way to explicitly program WATSON with imperative instructions to do what it did that night on the show. In fact, WATSON was not even aware of the questions it would be getting. All it had at its disposal was data, learning algorithms, and a multi-layered neural network.

The network we'll build here will be nowhere near as sophisticated as that one, but it will illustrate the concepts of multi-layered networks and how they work on a conceptual level.

THE PROBLEM

Suppose we had a sequence of numbers that could represent anything in a particular problem space. Let's say this series of numbers is 1, 0, and 0. Together, they make up a list that looks like [1,0,0].

This list is a problem, though. The sequence that we want to achieve is a list of all ones, represented as [1,1,1]. This list of numbers can be considered our goal, or target.

Our problem space, in a nutshell, is listed in the table below...

Our inputs listed along with the target we would like our inputs to turn into.

We would love for our system to be able to distinguish its input data from its target data. To do that, we need another, random data set for it to compare against the target. This random data set is known as the training set, and it is what the network will use to learn how to achieve its outcome. Adding this training data modifies our chart to look like the one below...

Chart with training data

THE DESIGN

 

The best way to represent a neural network to a computer is via matrices. Matrices are great tools from linear algebra that allow us to perform operations on groups of numbers.

Looking at our chart above, we can see that we have three columns and three rows. This chart can be represented as a 3 x 3 matrix!

A basic neural network is modeled as a linear algebra matrix. Each element in the matrix can be considered a node/neuron. Each neuron is responsible for calculating and producing its output, which affects the entire mind of the system.

A multi-layered network always has three sections. The first is called the input layer. The second is called the hidden layer. The final section is called the output layer. Very sophisticated networks have multiple hidden layers, but we will use only one hidden layer for this example.

As we visualize the flow of data from left to right, the picture of our network can look something like this...


- Data flows from left to right

- IL stands for the Input layer, HL stands for the hidden layer, and finally, the OL is the output layer

- Each neuron is represented within the network. Because we are building a 3 x 3 network, that gives us 9 neurons in total.

If we take our chart and turn it into this architecture, we can visualize it this way...

I represents input, h represents hidden, and o represents output. Each number corresponds to the numbers listed on our data chart above.

This is what we want our network to do. We need it to take our inputs and turn them into our desired output!

THE CODE

Now that our design is complete, the first thing to do is create our Elixir project. I've decided to call it DEEPNET. We would like a Supervisor to make the project's startup more automated, so we use the command...

mix new deepnet --sup

This command creates a new Elixir project with a supervisor.

DEPENDENCIES

The next thing to do is add the dependencies that are needed. I've already mentioned Numerix and Matrix, and I've added sfmt to ensure our random weights are indeed random.

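A sketch of what that dependency list in mix.exs might look like (the version numbers here are illustrative, not the exact ones pinned in the original project)...

```elixir
# mix.exs
defp deps do
  [
    {:numerix, "~> 0.4"},  # machine learning and statistics helpers
    {:matrix, "~> 0.3"},   # matrix creation and arithmetic
    {:sfmt, "~> 0.13"}     # SIMD-oriented fast Mersenne Twister for random number generation
  ]
end
```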

APPLICATION START

We need a way for our Supervisor to start up our neural network automatically. Because we want a 3 x 3 architecture, we need 9 neurons to be created, which means each layer needs 3 neurons. These neurons can be created at startup. Let's do that in our Supervisor's start function.

- Our application Supervisor references the Deepnet.Network module, which we will complete shortly. We also need to give our network the number of nodes we want created in each layer.
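A minimal sketch of that start function (the module layout follows what mix new deepnet --sup generates; calling Deepnet.Network.create/3 directly from start/2 is an assumption about how the original wired things up)...

```elixir
# lib/deepnet/application.ex
defmodule Deepnet.Application do
  use Application

  def start(_type, _args) do
    # 3 input, 3 hidden, and 3 output neurons give us the 3 x 3 architecture
    Deepnet.Network.create(3, 3, 3)

    opts = [strategy: :one_for_one, name: Deepnet.Supervisor]
    Supervisor.start_link([], opts)
  end
end
```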

CREATING THE NETWORK

We are referencing a create function here, so it would be wise to write it next so the Supervisor has an implementation to call. Our create function will need to handle these node counts, and because these numbers represent the neurons in each layer, it makes sense to store the initial state in an Elixir Agent.

- Each argument corresponds to the number of nodes in a layer. The fourth argument is the learning rate, which defaults to 1.0 and will be explained further later.
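A sketch of that create function; the struct fields mirror the network struct shown later in this post, and holding the state in an Agent named after the module is an assumption...

```elixir
defmodule Deepnet.Network do
  # Field names are assumptions based on the struct shown at startup below.
  defstruct layer_sizes: {3, 3, 3}, weights: nil, learning_rate: 1.0,
            error_rate: nil, target: nil

  def create(input_nodes, hidden_nodes, output_nodes, learning_rate \\ 1.0) do
    Agent.start_link(
      fn ->
        %Deepnet.Network{
          layer_sizes: {input_nodes, hidden_nodes, output_nodes},
          learning_rate: learning_rate
        }
      end,
      name: __MODULE__
    )
  end
end
```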

INITIALIZING RANDOM WEIGHTS

If you've been following ATF, one might remember that all neurons need weights associated with them. We want our weights to be as random as possible; this randomization is what gives us confidence that we have converged on the right solution during training. The whole point of a neural network is to find the weights suitable for the particular problem at hand. We need a function that creates 9 different weights, one for each neuron. We would also like to factor in a bias. Here is what that function looks like...

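A sketch of the idea, assuming the Matrix package's new/3, rand/2, and sub/2 helpers and sfmt's seed/3 behave as their names suggest (the function name initialize_weights/0 is assumed here, not necessarily the one in the Deepnet repo)...

```elixir
def initialize_weights do
  # seed the SFMT generator with the current timestamp so every run starts differently
  {a, b, c} = :os.timestamp()
  :sfmt.seed(a, b, c)

  weights =
    Matrix.new(3, 3, 0.5)              # a 3 x 3 matrix filled with the 0.5 bias value
    |> Matrix.sub(Matrix.rand(3, 3))   # subtract random [0, 1) input weights from the bias matrix

  # store the 9 freshly randomized weights in the network Agent
  Agent.update(__MODULE__, fn network -> %{network | weights: weights} end)
end
```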

First, we utilize sfmt, seeded with the current timestamp, to help ensure we are getting random weights. Next, we generate random weights for the inputs. However, we don't want to stop there: it helps to add a bias to balance out the weights, so subtracting our input weights from a matrix of 0.5 gives us a good assortment of randomized weights to start off with. Finally, we update the network with those initialized weights. Our 9 weights should look something like this...


- Randomized weights corresponding to all the neurons in the network.

- It's great to have a mixture of negative and positive weights.

Now that we are generating our weights, let's check the size of our matrix.

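In IEx, something like this (assuming the Matrix library's size/1 returns a {rows, columns} tuple)...

```elixir
iex> weights = Agent.get(Deepnet.Network, fn network -> network.weights end)
iex> Matrix.size(weights)
{3, 3}
```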

Great! Now we have our 3 x 3 architecture initialized with weights. Our entire network at startup now looks like this...

Our Network is listed as an Elixir Struct. All of our values are set except the error rate and the target. We will explore those next.

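Roughly like this (the weight values below are illustrative and will differ on every run; the field names follow the create/4 sketch above)...

```elixir
%Deepnet.Network{
  layer_sizes: {3, 3, 3},
  weights: [
    [-0.231, 0.402, 0.074],
    [0.351, -0.118, 0.483],
    [-0.442, 0.296, -0.035]
  ],
  learning_rate: 1.0,
  error_rate: nil,
  target: nil
}
```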

ERROR CALCULATION

A neural network needs feedback about its performance over time. This feedback is gathered through what we call the error rate. There are several ways of calculating the error rate, and the method of calculation is completely up to the creator. In this post, we will use the MSE, or Mean Squared Error.

A neural network's job during training is to constantly compare its output to the target given during training. We will need a way to calculate this error for our network and store it, so that we can monitor how well our network is doing while training. That function is pretty straightforward.


- We take in the final output of the network along with the initial inputs.

- We then fetch our target so that we can calculate the Mean Squared error with the final output of the entire network. 

Our reference to List.flatten/1 is designed to change our multi-dimensional list to a single list to make our calculations easier.

Finally, we update our Agent with the new network error rate.
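A sketch of that error function. The Mean Squared Error is just the average of the squared differences between the final output and the target, so it is computed by hand here; the function name calculate_error/2 is an assumption, while the call to adjust_weights/2 at the end is the step discussed next...

```elixir
def calculate_error(final_output, inputs) do
  target = Agent.get(__MODULE__, fn network -> network.target end)

  output = List.flatten(final_output)   # multi-dimensional list -> single list
  goal   = List.flatten(target)

  # MSE: mean of the squared differences between the network's output and the target
  error_rate =
    Enum.zip(output, goal)
    |> Enum.map(fn {o, t} -> :math.pow(t - o, 2) end)
    |> Enum.sum()
    |> Kernel./(length(goal))

  # store the new error rate, then let the network adjust its weights
  Agent.update(__MODULE__, fn network -> %{network | error_rate: error_rate} end)
  adjust_weights(final_output, inputs)
end
```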

ADJUSTING NETWORK WEIGHTS

In the last line of the function above, one might have noticed a reference to a Deepnet.Network.adjust_weights/2 function. This is an important step. During training, the neural network needs a way to improve itself when it finds it has missed the target it's been training to achieve. Earlier, I talked a little bit about the learning rate constant; this is where it comes into play. Let's explore the Deepnet.Network.adjust_weights/2 function...


- Our adjust_weights function takes the output of the layer and the initial inputs

- The next thing to do here is to calculate the delta. Here we take the difference between the output and the target and calculate the dot product of that result. This gives us a single value, which is the DELTA, or the small-change calculation.

- Next, we tackle our gradient, which is essentially the smallest change we can make that gets us closer to our ultimate goal. Because we could be dealing with many outputs we will parallel map the gradient calculation on all of our outputs. The gradient calculation is defined as

output x delta x learning_rate

This learning rate can be anything from 1.0 to 3.0. I've seen many people use different ranges; it really depends on the creator of the network and how fast progress is needed. In our case, we will use 1.0 because our problem is trivial.

- Finally, we subtract the gradient from our current weights. This result then becomes our new weights, which are then updated for the network.
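A simplified sketch of the steps above. The delta is taken here as the dot product of the (output - target) difference with the original inputs, and a plain Enum.map stands in for the parallel map; the exact shapes and sign conventions in the real Deepnet code may differ...

```elixir
def adjust_weights(output, inputs) do
  %{target: target, weights: weights, learning_rate: rate} =
    Agent.get(__MODULE__, fn network -> network end)

  # difference between each output value and its target value
  difference =
    Enum.zip(List.flatten(output), List.flatten(target))
    |> Enum.map(fn {o, t} -> o - t end)

  # DELTA: dot product of that difference with the original inputs, a single number
  delta =
    Enum.zip(difference, List.flatten(inputs))
    |> Enum.map(fn {d, i} -> d * i end)
    |> Enum.sum()

  # gradient for each neuron's output: output x delta x learning_rate
  gradients =
    output
    |> List.flatten()
    |> Enum.map(fn o -> o * delta * rate end)

  # subtract the gradient from the current weights to get the new weights
  new_weights =
    Enum.map(weights, fn row ->
      Enum.zip(row, gradients)
      |> Enum.map(fn {weight, gradient} -> weight - gradient end)
    end)

  Agent.update(__MODULE__, fn network -> %{network | weights: new_weights} end)
end
```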


CALCULATING NEURON OUTPUT

One might wonder how to produce the output that our functions keep referencing. We need a way to calculate the output of each neuron. If you need a refresher on how neurons calculate their output, check out the blog post here.

For our particular problem set, we are going to use the sigmoid function as our activation function. Remember, a data signal inside a neuron goes through 3 phases. The first phase is the summation, or the dot product of the inputs and the weights. The next phase is the activation function. The final phase is adding the bias. We already factored in our bias at weight initialization, so we won't need that part in our function. All we need to do are phases 1 and 2.


Here we take the inputs and weights and we parallel map through them. Each weight corresponds to an input, so to do that in Elixir we can just zip them up into tuples. The first element of the tuple is the input and the second would be the weight. We calculate the dot_product/summation of each of these inputs and weights. Next, we use the Numerix.Special.logistic/1  function, which is essentially the sigmoid function by another name.

Because we need each of these calculations to be a list, we wrap the result so the output has the appropriate shape.
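A sketch of that calculation; calculate_neuron_outputs/2 is a name assumed here, and Enum.map stands in for the parallel map described above...

```elixir
def calculate_neuron_outputs(inputs, weights) do
  Enum.map(weights, fn weight_row ->
    # zip each input with its weight and sum the products (the dot product / summation phase)
    sum =
      Enum.zip(List.flatten(inputs), weight_row)
      |> Enum.reduce(0.0, fn {input, weight}, acc -> acc + input * weight end)

    # activation phase: squash the sum with the sigmoid, and wrap the result in a list
    [Numerix.Special.logistic(sum)]
  end)
end
```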

We can now calculate the outputs for our neurons. However, we are not quite finished. We will need a way to move data from one layer to the next; this process is known as feed-forward. Since we feed data from the input layer to the hidden layer, and then feed that hidden-layer output to the output layer, we are essentially feeding data forward twice. Luckily, we can do this easily via pattern matching in Elixir.


The first feed forward just takes the input list and calculates the output for the connection of the input layer with the hidden layer. That result is then passed on to the second version of the feed forward function.

The second feed forward function takes in the output of the previous layer, along with the old weights of the previous layer, and the original inputs. The final output is then calculated. This brings us to the end of our entire network. Once here we can see how well we performed by calculating the errors in the network.
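A sketch of the two clauses, reusing the calculate_neuron_outputs/2 helper from above; the single shared weight matrix and these exact function heads are assumptions...

```elixir
# input layer -> hidden layer
def feed_forward(inputs) do
  weights = Agent.get(__MODULE__, fn network -> network.weights end)

  inputs
  |> calculate_neuron_outputs(weights)   # output of the hidden layer
  |> feed_forward(weights, inputs)       # feed it forward again, keeping the original inputs
end

# hidden layer -> output layer
def feed_forward(hidden_output, weights, original_inputs) do
  final_output = calculate_neuron_outputs(hidden_output, weights)

  # end of the network: see how well we did
  calculate_error(final_output, original_inputs)
end
```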

THE PROCESS

Learning is a repetitive process. If our network has not come to the correct solution, it must repeat this entire process until it does. Each time, the network makes small changes to itself until it reaches its ultimate goal. One could think of this process as a giant learning loop.

Learning Loop

Every time a loop is completed and the network starts again in order to minimize the error, we call this process back propagation, because the error is propagated back through the network for readjustment. This is what separates modern systems from more traditional ones. Traditional systems had to wait on humans to come and fix the errors present. These systems minimize their error rate and strive for perfection on their own, relieving the engineer of the burden of maintenance. Hopefully, you are beginning to see the benefits of solving problems this way!

AUTOMATING TRAINING

It's always a good idea to automate training for a neural network. There could be times when training on a particular problem set takes hours or even days. It would not be wise to perform this process manually, so we will instead write a function that handles it for us.

Here we take in a list of inputs along with a target list. We then turn both the input and the target into a 2-dimensional list. We then update our agent with the target so that it is no longer nil. Finally, we start our feed-forward process.

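A sketch of that training function, assuming it is called train/2 and that wrapping each value in a list is how the 2-dimensional shape is produced...

```elixir
def train(inputs, target) do
  # turn [1, 0, 0] into [[1], [0], [0]] so each value lines up with one neuron
  input_list  = Enum.map(inputs, fn value -> [value] end)
  target_list = Enum.map(target, fn value -> [value] end)

  # the target stored in the Agent is no longer nil once training starts
  Agent.update(__MODULE__, fn network -> %{network | target: target_list} end)

  feed_forward(input_list)
end
```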

AUTOMATING LEARNING

As mentioned before, the learning process is a loop. Elixir is a functional language, so we use recursive functions to handle our loops. In each loop, we collect the input and the target and pass them into the network. The network trains on the data and checks its error rate. Personally, I'd like my error rate to be minuscule, so I'd like the network to train until the error rate drops below 0.02. If the error rate is higher than 0.02, it must continue training. This is what the learning process looks like: a repetitive cycle the network goes through until the task is learned with little to no error. We can accomplish this via pattern matching...


- The first learn function takes in our network error rate, our user data, and something we call an epoch. An epoch is a lifetime count of a neural network. You can think of an epoch as a network's age. This function is only called when our error rate is above 0.02. This will indicate to the system that it needs more training. Each loop of training increases our epoch by 1. Our error rate is fetched and then passed to the final learn function if the error rate is less than 0.02. If not, we call the current learn function.

- The second learn function takes the same parameters, but it is considered our stopping function. This function is used when the training is complete and our error rate is acceptable. This indicates that our system is fully trained on the data set and is ready for testing.
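A sketch of the two learn clauses, assuming they live in the top-level Deepnet module and that the user data is the Deepnet.Data struct shown below...

```elixir
# keep training while the error rate is still above the 0.02 threshold
def learn(error_rate, data, epoch) when error_rate > 0.02 do
  IO.puts("Epoch #{epoch} - error rate: #{inspect(error_rate)}")
  Deepnet.Network.train(data.input, data.target)

  new_error_rate = Agent.get(Deepnet.Network, fn network -> network.error_rate end)
  learn(new_error_rate, data, epoch + 1)
end

# stopping clause: the error rate is acceptable, so training is complete
def learn(error_rate, _data, epoch) do
  IO.puts("Training complete after #{epoch} epochs - error rate: #{inspect(error_rate)}")
end
```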

The final thing we need to do is create our data struct for our user input and target. This information will then need to be passed to a learn function that will kick off the entire process.

Our original data table defined as a struct.

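Something like this (the module name Deepnet.Data and its field names are assumptions)...

```elixir
defmodule Deepnet.Data do
  defstruct input: [1, 0, 0],   # the sequence we start with
            target: [1, 1, 1]   # the sequence we want the network to produce
end
```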

Now we kick off the whole process via our final function...

- Here we initialize the random weights and pass our user data and desired target to the network. Next, we call our learn function by passing in the error rate and our user data, along with our network age, which is 0 because the network is starting for the first time.

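A sketch of that kickoff function, assuming it is called run/0 and sits in the Deepnet module alongside learn/3...

```elixir
def run do
  data = %Deepnet.Data{}                            # input [1, 0, 0], target [1, 1, 1]

  Deepnet.Network.initialize_weights()              # random starting weights for every neuron
  Deepnet.Network.train(data.input, data.target)    # hand the network our data and desired target

  error_rate = Agent.get(Deepnet.Network, fn network -> network.error_rate end)
  learn(error_rate, data, 0)                        # network age 0: it is starting for the first time
end
```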

That's it!  Our network is fully built. What happens when we fire this up?

Console output from the training run

BANG!!! We can see it took 13 epochs for the training to complete. Our network finally reached our target list of [1,1,1] and it was able to get its error rate down below 0.02!! That's pretty impressive! 

CONCLUSION

One might be thinking, what's the significance of this? How can this be used in the real world? Machine learning is essential to the next era of the technological age: it allows us as engineers to deal with large sums of data and train our systems to gather insights, predict outcomes, and solve problems that we might not have a clue how to solve ourselves. As we've just witnessed, these systems are good at minimizing error, which is invaluable in the real world.

The beauty of neural networks is that we can architect them in different ways to create human-like intelligence in our software systems. In this post, we have by no means covered all of the algorithms and the different ways these networks can be designed to do amazing things. The goal of Automating the Future is to continue bringing the Elixir community examples of how neural networks can be used to solve a wide variety of problems.

Now that we know how to design a basic multi-layered neural network, we can move on to example projects of truly automated software systems that learn and solve different types of problems in future posts. If you want to check out the Deepnet code, I have placed it on GitHub. Feel free to fork, experiment, and change as much as you like. This repo can serve as an example for the Elixir community of one way to design a deep learning network from scratch!

 
