Training Elixir Processes To Learn Like Neurons
As the old saying goes, "Learning is a process." The same is true for machines with the ability to learn. In Elixir, we think in terms of processes. These processes are the secret to our ability to perform very complex tasks concurrently. In this post we are going to explore learning rules for a Neural Network, and how those same learning rules can be modeled and taught to Elixir processes behaving as neurons.
Before we dive into creating a learning network, it is necessary to understand the 3 ways an Artificial Neural Network achieves learning.
THE 3 TYPES OF LEARNING TECHNIQUES
- Supervised Learning: This is the kind of learning we will implement in this post. It involves defining a problem as a training set and feeding the network examples from that set until it adjusts its weights to match the output the human desires. This is the easiest form of learning to implement, and it can be applied to most problems where a solution is known and understood. Most apps in existence today could have been built using this learning technique, without the countless hours of developers' blood, sweat, and tears spent writing and maintaining bloated code bases full of imperative instructions.
- Reinforcement Learning: This one learns by grading the network. Instead of being given the desired output the human wants, the network is given a grade and works until the grade it receives is satisfactory. Imagine playing games in the future against systems with this capability. Games would get harder to beat the more they learned about your tactics!!! Control systems in automated factories will use this technique often.
- Unsupervised Learning: This is the mother lode for anyone in this field. It is the ability of a network to learn on its own through experience, without human aid. I can't keep up with all the breakthroughs happening with this technique. Here is one example. There are many companies making great strides here because they know that whoever can do this at scale will win AI. It will be extremely difficult to compete with companies that can do this effectively. (Popular belief has it that one of the reasons Neural Nets went underground is the inability to train these types of systems for unknown situations. I think it had more to do with the lack of data volume available for training at the time. Fun fact: DARPA has been involved in Artificial Neural Network research for a long time. This paper will enlighten you on what they found in their studies. Keep in mind that this study was done in the 80's!!!! That means the research was likely going on long before that!!! Needless to say, the private sector is just now rediscovering the power of this form of computation and how it can be applied to today's various applications.)
After several trips to the grocery store with my wife, I noticed a pattern in the way she placed things on the conveyor belt at the register. She made an effort to separate all the hot items from the cold items so that they could be bagged together and preserve the groceries' shelf temperature. This was interesting to me considering I had never subscribed to such a system, but after hearing her explanation for doing it this way, it made complete sense for me to adopt it.
By seeing her do this over and over, the neuron processes in my head learned the pattern as well. It was only a matter of time before I picked it up, and now I place groceries on the belt the same way.
We are going to model this problem for a neural network and then train it to be able to classify items the same way.
In order to train our neurons, we need to take them through iterations of training data. The training data must be good enough to teach our neurons to recognize patterns within the problem space of the grocery store dilemma we mentioned above. Consider the graph below...
We have two classifications, hot and cold. While thinking about the problem we came up with 2 attributes we would like to train our neurons to distinguish an item's classification by. Temperature is needed because it is a good indicator of whether something is hot or cold. The location of an item in the store is also a good indicator, because most grocery stores lay out items in aisles separated by their respective temperatures. It would be remiss not to consider this. For example, ice cream, frozen dinners, and popsicles would likely be in the same location. We will leave one more input for the bias; after all, we are generalizing, so we want the system to make room for such a generalization. The target is what we want our system to work towards. We'll use 1 to represent a hot item, and 0 to represent a cold item.
Now that we know the structure of our problem, let's prepare our training data. This training data will serve as a guideline for our network in its effort to solve the problem on its own. Considering the structure of the problem listed above, we clearly see that each classification has 3 inputs. In order for the system to learn to classify generic items on its own, it needs a base to draw from. This base is called its "Learning Rules".
Let's resist the temptation to write our rules out in an imperative fashion. Instead, we will represent these rules solely as data. As the Clojure folks say, "Data can represent itself just fine!"
Let's start with our cold inputs. For temperature, let's say that any item at 60 degrees or less is considered a cold item. This can be anything from ice cream to kale to chicken. These items will be represented by numeric values. The inputs are usually points on a graph that represent the probability of something, and the numbers can be organized in whichever way fits your specific problem. Temperature will be the first input. For location, let's say aisles 1 - 5 are where the cold items are usually found in the store. That input will need to be represented to our network numerically as well. Biases are usually generated randomly along with the weights, so we will let the system determine that.
Now let's deal with the inputs for our hot items. For temperature, we want the opposite of our cold item rule, so any temperature above 60 degrees will need to produce a different output. For the location of hot items, we expect that to be different as well, because hot items are never located near the frozen foods aisle!
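Since the rules are just data, the training set can be sketched directly as Elixir terms. The values below are illustrative (the real list comes from a CSV file later in the post): each entry pairs the three inputs with a target.

```elixir
# Illustrative training set: {[temperature, aisle, bias_input], target}.
# Targets: 0 = cold (60 degrees or less, aisles 1-5), 1 = hot (everything else).
training_set = [
  {[10, 2, 1], 0},   # ice cream
  {[40, 4, 1], 0},   # chicken
  {[55, 5, 1], 0},   # kale
  {[90, 9, 1], 1},   # bakery bread
  {[150, 12, 1], 1}  # rotisserie chicken
]
```

Representing the rules this way means the network, not our code, decides how to separate the two classes.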
THE GRAND ARCHITECT
The next step is to model our Neural Network Architecture. This is a very simple problem with a binary output of 1 or 0, so we are going to go with what is sometimes called the Single Neuron Perceptron.
If you study the history of Neural Nets (which I highly recommend), you'll find this was one of the first architectures to show amazing progress in cybernetics, thanks to Frank Rosenblatt's contributions regarding learning rules. Later, multilayered architectures pushed AI further, but for this post we will only be working with a single layer perceptron because our problem is trivial.
Being able to develop this type of network from scratch is basically the initiation into the world of neural nets. All neural network algorithms stem from the concepts of the Single Neuron Perceptron, and mastering it will prepare you for more complicated networks. We can model our architecture this way....
Elixir is my go-to when developing any autonomous system, so the architecture as it appears to the Elixir developer looks like this....
As I've mentioned before, artificial neurons are just small processes of computation. In Elixir, we utilize processes to get our work done. In the image above you will see that our Layer has been turned into a simple Supervisor. Supervisors are perfect for modeling neural network layers because they give you the ability to supervise the child processes they spawn. These child processes are simply the neurons in that layer. This way, if anything happens to a specific neuron in the layer, it can be restarted by the layer supervisor.
The neuron can be represented by an Elixir Agent. The Agent will be responsible for tracking the state of the neuron as its weights are adjusted toward the desired output. The inputs and weights are represented as Elixir tuples, which are what we pass back and forth between processes as messages.
Now that we know what our training data looks like and the structure of our network, it's time to model the network and train it to solve our grocery dilemma. Let's call this new system "BAGGER". BAGGER will be a Single Perceptron Network responsible for classifying whether a grocery item goes into the cold bag or the hot bag. We will then give BAGGER a list of random grocery items and see if it has learned to put the items in the correct bags!
BAGGER will read a grocery list of our choice, analyze it, and bag the items in the appropriate bags. This is significant because it allows the system to operate generically: it will figure out, via the neuron, how to deal with the data it is seeing. Let's take a look at what our example file looks like. I like shopping at Whole Foods, so our shopping list will be called "whole_foods.csv".
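The original file isn't reproduced here, but based on the description (a temperature, a location, and a target per item), whole_foods.csv might look something like this; the items, column names, and numbers are illustrative:

```csv
item,temperature,location,target
ice cream,10,2,0
frozen pizza,5,3,0
kale,45,4,0
bakery bread,90,9,1
rotisserie chicken,150,12,1
```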
Notice each item has an input for temperature and location. These inputs can be essentially anything that exists on a graph; each input represents a point. The target, however, must be what you want the system to reach. Remembering our story above, we want 0 to represent cold items and 1 to represent hot items. The idea here is that regardless of the inputs the system receives, it will always converge to the correct solution. This breakthrough is what led to the birth of ANN development years ago.
The hard part for us is reading this file and formatting the data; luckily, `Elixir` has many options for reading and parsing CSV files. Here is how we capture our data for processing...
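The original snippet isn't shown here, so this is a minimal sketch of such a pipeline using only the standard library (no CSV dependency); the `Bagger.bag/1` name and the column layout are assumptions:

```elixir
defmodule Bagger do
  # Reads the grocery list and turns each CSV row into {inputs, target}.
  # In the full project these tuples would then be piped into
  # Bagger.Workers.Neuron.add_inputs/1; here we show only the capture step.
  def bag(file) do
    file
    |> File.stream!()
    |> Stream.drop(1) # skip the header row
    |> parse()
    |> Enum.to_list()
  end

  # parse/1 lazily converts the string fields into integers,
  # appending a constant 1 as the bias input.
  defp parse(stream) do
    Stream.map(stream, fn line ->
      [_item, temp, loc, target] = line |> String.trim() |> String.split(",")

      {[String.to_integer(temp), String.to_integer(loc), 1],
       String.to_integer(target)}
    end)
  end
end
```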
This is the first pipeline the data goes through. There are two points we want to focus on in this code. The first is the `parse/1` function, which formats our data for the algorithm. All the data in the file is read as strings, so the integers in the file cannot be used in calculations as-is. Let's examine our parse function first!
Notice the use of Stream.map/2. This helps us just in case we are reading a file with over 1,000 items in it!!! We want lazy evaluation so that we can navigate the file in one pass and perform the operation once the navigation is done. The operation converts the numbers in the file from strings to integers so that our neuron/agent can do the necessary calculations.
This brings us to the next important thing. The Neuron. If you looked closely at our first code example you might have noticed the reference to the Bagger.Workers.Neuron.add_inputs/1 function. This is where all the work is done.
I've found that Elixir Agents are the best fit for representing neurons. They can be passed around to many other processes, and their internal state is not shared with anything on the outside. This lets you easily stack hundreds of them in a multi-layered network architecture if you want to. Agents can also be attached to Supervisors, which is extremely useful, especially if you are building a multi-layered network. Here is how I've structured "Bagger's One Neuron"...
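The struct itself isn't shown above, so here is a sketch consistent with the description; the exact field names are assumptions:

```elixir
defmodule Bagger.Workers.Neuron do
  # pid:     the Agent's own process id, so other neurons can find it
  # inputs:  the current input list being learned
  # weights: one weight per input, generated randomly during training
  # bias:    the bias term, set to 1 when the neuron is born
  # output:  the last value produced by the activation function
  defstruct pid: nil, inputs: [], weights: [], bias: 1, output: 0
end
```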
A Neuron's struct is pretty straightforward. We want a place to store our "pid" because the neuron could be referenced by many other neurons, and it's helpful to know which one is which. Admittedly, it's not entirely necessary for this network because there is only one neuron in it, but it's good practice to always have a way to get to a neuron's process id if you can. The rest of the references should be familiar if you've been following the series.
I mentioned that neurons are attached to layers. I view layers as Supervisors that manage child processes. Since we are dealing with 1 layer, we will need one Supervisor that watches this neuron, much like in our diagram above. When the Layer/Supervisor starts, it needs to give birth to the neuron. Let's add a function that creates a new Neuron so that our layer supervisor can use it.
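A sketch of such a new/0, reconstructed from the description that follows (the struct is trimmed to the relevant fields, and the registered name is an assumption):

```elixir
defmodule Bagger.Workers.Neuron do
  defstruct pid: nil, weights: [], bias: 1

  # Called by the layer Supervisor; the Agent holds the neuron's state.
  def new do
    Agent.start_link(
      fn ->
        # This fun runs inside the freshly spawned Agent, so self() here
        # is the neuron's own process id.
        %__MODULE__{pid: self(), bias: 1}
      end,
      name: __MODULE__
    )
  end
end
```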
All neurons need a bias. A bias can be random or set to 1; I've chosen to set ours to 1. Next, we use the self() function to get the actual process id of the newly created neuron.
new/0 is called in our Supervisor. This automatically connects the Neuron to the Layer and begins the process of fault tolerance. This way, once the app starts, our Neuron is ready to go.
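A minimal layer sketch under those assumptions; the module names are illustrative, and the neuron is reduced to a stub Agent so the example is self-contained:

```elixir
defmodule Bagger.Workers.Neuron do
  # Stub neuron: just an Agent holding a bias, enough to be supervised.
  def new, do: Agent.start_link(fn -> %{bias: 1} end, name: __MODULE__)
end

defmodule Bagger.Layer do
  use Supervisor

  def start_link(opts \\ []), do: Supervisor.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok) do
    children = [
      # The layer gives birth to its neuron via new/0 and restarts it on crashes.
      %{id: Bagger.Workers.Neuron, start: {Bagger.Workers.Neuron, :new, []}}
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end
```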
Finally, we can take a look at what happens when we add inputs to our Neuron. In this case the inputs will be everything coming from our grocery list. Taking a look at this function shows the entire process of how the Neuron does its work...
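The function itself isn't reproduced above, so here is a condensed sketch of that flow (random weights, activation, adjust until the target is reached); the Agent bookkeeping is elided so the example stands alone, and all names are reconstructions:

```elixir
defmodule Bagger.Workers.Neuron do
  defstruct weights: [], bias: 1

  # Takes {inputs, target}, assigns one random weight per input, then
  # trains until the hard-limit output matches the target.
  def add_inputs({inputs, target}) do
    weights = Enum.map(inputs, fn _ -> :rand.uniform() end)
    train(%__MODULE__{weights: weights, bias: 1}, inputs, target)
  end

  defp train(%__MODULE__{weights: weights, bias: bias} = neuron, inputs, target) do
    sum =
      Enum.zip(weights, inputs)
      |> Enum.map(fn {w, x} -> w * x end)
      |> Enum.sum()

    output = hard_limit(sum + bias)

    if output == target do
      {:ok, neuron, output}
    else
      # Perceptron rule: nudge each weight by error * input, and the bias by error
      error = target - output
      new_weights = Enum.zip(weights, inputs) |> Enum.map(fn {w, x} -> w + error * x end)
      train(%__MODULE__{neuron | weights: new_weights, bias: bias + error}, inputs, target)
    end
  end

  defp hard_limit(n) when n >= 0, do: 1
  defp hard_limit(_), do: 0
end
```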
It's important to note that here we update our neuron with random weights. Nobody has any idea what these numbers will be; however, there does need to be one weight per input to the neuron. The weights and inputs are then run through our activation function, and the neuron measures whether it has reached the target. If it has not, it makes the necessary adjustments to the input weights and bias until it reaches the desired target. This simple flow makes this algorithm extremely powerful.
Because we are dealing with a binary output scenario, the `hard_limit` function is the activation we need, since it specifically produces outputs of 1 or 0. For a refresher on the different types of activation functions, my post here goes into detail! Here is the activation function and how it's used in the code...
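For reference, a hard limit (step) activation in Elixir is just two function clauses; a standalone sketch with a hypothetical module name:

```elixir
defmodule Bagger.Activation do
  # Hypothetical module name. The hard limit fires 1 for any
  # non-negative weighted sum, and 0 otherwise.
  def hard_limit(n) when n >= 0, do: 1
  def hard_limit(_n), do: 0
end
```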
Training a neuron or a group of neurons is an iterative process. The beautiful part is that it can do the training on its own until it has learned what you want it to learn. In our case we want our neuron to learn how to bag our groceries correctly. The last thing we want is a nice hot steamy lasagna thrown in the bag with ice cream and frozen pizza!!!! That would be unacceptable for any system. To avoid this we train the neuron to reach its target. If the target is not reached, it tries again until, eventually, the right solution is found. This is the reason behind our adjust function. It's specifically designed to automate this process for our neuron...
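The classic perceptron adjustment is one line per weight: add the error (target minus output) multiplied by the input. A standalone sketch of such an adjust function, with hypothetical names:

```elixir
defmodule Bagger.Adjust do
  # Hypothetical module name. Returns the corrected weights and bias.
  def adjust(weights, bias, inputs, target, output) do
    error = target - output

    new_weights =
      Enum.zip(weights, inputs)
      |> Enum.map(fn {w, x} -> w + error * x end)

    {new_weights, bias + error}
  end
end
```

When the output already matches the target, the error is 0 and the weights pass through unchanged, which is exactly what stops the training loop.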
This ability to retrain and adjust is what makes AI systems very powerful. The errors are collected and cleaned up by the system so that it can continue with its learning. The goal is to get to a point where there are no errors in its logic, or the probability of error is extremely low. In this case, we need our system to be exact because we are dealing with a right-or-wrong answer!!
The neuron then updates and tries the process over again.
Let's take a peek at what happens when we tell our system to start bagging our groceries after we've trained it!!
What we've just seen is a quick example of how the Single Neuron Perceptron works. More importantly, we've seen how accurate it is at finding its target regardless of the input. It also looks like our BAGGER put everything where it should go! It doesn't matter if I had 10 items in the list or 10,000; this architecture is pretty accurate for binary problems. It can be used for automated testing systems, especially when you want a system to learn the difference between something that is passing or failing!
If you want to dive into the code, check out the project on GitHub.