Neural Network From Scratch

Mitchell Thomas
5 min read · Mar 20, 2019

Currently I am enrolled in a machine learning course as part of my computer science curriculum. We recently had a project where we had to build a Multilayer Perceptron neural network to classify a specific dataset. Although the project gave me a strong introduction to how a multilayer perceptron works, we were able to import very powerful Python packages (part of what makes the language so awesome) that didn't allow much of a view under the hood of the neural network.

So I thought I would take on the responsibility of building a neural network from scratch as my own project, in order to learn more about how data flows back and forth through layers of neurons and trains a machine to make intelligent decisions, much like the human brain does.

Let’s begin.

To begin my project, I of course imported some basic packages I would need from Python in order to pull a couple of functions that I would use in my implementation. In this block, I also pulled in the dataset that I would be learning on and separated out a basic feature vector, which included the information the machine would use to make predictions, and a label vector, which included the names of the flower species that the machine would be predicting.
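
To make that concrete, here is a minimal sketch of what this setup step can look like, assuming the Iris dataset is loaded through scikit-learn; the variable names are illustrative and won't match the project file exactly.

```python
# A minimal sketch of the setup step, assuming the Iris dataset via scikit-learn.
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data                       # feature vector: sepal/petal measurements
y = iris.target_names[iris.target]  # label vector: flower species names
```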

After I had my initial vectors and program set up, I started pre-processing my data. In this case, this is a very popular dataset that is fairly clean, so the extent of my work here was limited. I did, however, use a pre-defined one_hot_encoder() function in my pre-processing phase. This popular function basically turns my classes, or the values of my label vector, into numerical vectors through a process called binarization. This is useful because if I hadn't used it and my labels had stayed as alphabetic values, my Python program might have treated some labels as "greater than" or "less than" others, the way numbers behave. That could cause my machine to make false calculations later on.
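
For illustration, here is a rough version of what a one-hot encoder does under the hood; this is my own sketch rather than the exact pre-defined function used in the project.

```python
# An illustrative one-hot encoder: each class label becomes a binary vector.
import numpy as np

def one_hot_encode(labels):
    classes = np.unique(labels)               # e.g. ['setosa', 'versicolor', 'virginica']
    encoded = np.zeros((len(labels), len(classes)))
    for i, label in enumerate(labels):
        encoded[i, np.where(classes == label)[0][0]] = 1.0
    return encoded

# 'setosa' -> [1, 0, 0], 'versicolor' -> [0, 1, 0], 'virginica' -> [0, 0, 1]
```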

Moving on…


I then split my two initial feature and label vectors into further sub-vectors. This is so that we have separate data to train our model and separate data to test it. Why would we test it on data it has already seen? Of course it would get it right!
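
If you're using scikit-learn, the split can be as simple as this; the 80/20 ratio below is just an example, continuing from the sketches above.

```python
# Continuing the sketch above: split into training and test sets.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, one_hot_encode(y), test_size=0.2, random_state=42)
```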

Now the data is all split up and we can breathe fresh air and move on to setting up the network to train. Our input layer is the vector that we will initially feed the network with. But before that, we have to define a hidden layer, the secret area where values are calculated in magic ways and eventually come out into the output layer where predictions are made. A rule of thumb is to give the hidden layer 3–5 times the number of nodes that the input vector consists of; in this project, I chose three. To finish out the network initialization, we of course had to give the network weights and bias values, basically the values that connect the layers and are later adjusted to make the model smarter.
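
Here's a sketch of that initialization, assuming the four Iris measurements as inputs, three output classes, and a hidden layer sized by that rule of thumb; the exact sizes and scaling in my project file may differ.

```python
# Illustrative initialization: random small weights, zero biases.
import numpy as np

n_input, n_hidden, n_output = 4, 12, 3           # 12 = 3x the input size (rule of thumb)

W1 = np.random.randn(n_input, n_hidden) * 0.01   # input -> hidden weights
b1 = np.zeros((1, n_hidden))                     # hidden layer biases
W2 = np.random.randn(n_hidden, n_output) * 0.01  # hidden -> output weights
b2 = np.zeros((1, n_output))                     # output layer biases
```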

Core Functionality

This brings us to the core functionality of the network. For this block, I decided on the activation functions I would use and wrote functions for them. I chose the Rectified Linear Unit (ReLU) function to calculate values at the hidden layer. This function has been super popular lately, and I believe that's because it is pretty simple and super effective. It just checks whether the value given to it is bigger than 0: if it is, it passes that value on, and if not, it passes on a 0. You might realize this wouldn't work as my output activation function for obvious reasons, so I chose to use a softmax function to predict the output values. The softmax function is nice because it outputs values in the interval [0, 1]. Since all the values put together have to equal one, they can be read as probabilities, and the class with the highest probability becomes the prediction.
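
Both functions are only a few lines in NumPy; here are illustrative versions rather than my exact code.

```python
# Sketches of the two activation functions described above.
import numpy as np

def relu(z):
    # pass the value through if it's above 0, otherwise pass on 0
    return np.maximum(0, z)

def softmax(z):
    # shift by the row max for numerical stability, then normalize so each row sums to 1
    exp_z = np.exp(z - np.max(z, axis=1, keepdims=True))
    return exp_z / np.sum(exp_z, axis=1, keepdims=True)
```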

Furthermore, there are two major processes left that I designed to give this neural network a mind of its own: forward propagation and backpropagation. We can imagine, on a high level, that this network works by having data pulsate forwards and backwards through it to train it. When data pulsates forward, that's forward propagation; when data and values pulsate backwards, that's backpropagation.

Simple right?

“IT’S ALIVE”

Forward Propagation — The input values move through the layers of the network using the current weights and biases, and the prediction is made based on those numbers. At the start, of course, your machine is going to predict with awful error, because we initialize the weights and biases randomly. And of course, because machines are dumb. Eventually it will forward propagate and be smart (but only because of the next process).
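
In code, a forward pass is just matrix multiplications with an activation applied at each layer. Here's an illustrative version that reuses the weights and activation functions sketched above.

```python
# A minimal forward pass, assuming the layer shapes and activations sketched earlier.
def forward(X, W1, b1, W2, b2):
    z1 = X @ W1 + b1        # input layer -> hidden layer
    a1 = relu(z1)           # hidden activations
    z2 = a1 @ W2 + b2       # hidden layer -> output layer
    a2 = softmax(z2)        # class probabilities
    return z1, a1, z2, a2
```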


Back Propagation — This is where the machine learns. Or as Victor Frankenstein would say, "IT'S ALIVE!" After the initial run through the network, the values outputted have a specific error associated with them. That error, or loss, is then sent backwards through the network, and through a process called gradient descent the weights and biases are adjusted to minimize the error as much as possible.

Imagine this fancy term gradient descent like this: you are standing on the side of a mountain, trying to get home, but your home is in the valley beneath you. You are the loss value in this case, and your home is the value that gradient descent is trying to get you to, in order to minimize the error on the machine's part. So you slide down the mountain and map your directions on the way, just as the network maps new values for its parameters so that the next time forward propagation happens, it's that much smarter.
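
Here's a sketch of one backpropagation step, assuming a cross-entropy loss on top of the softmax and a plain gradient-descent update; it captures the idea above rather than the exact code in my project.

```python
# One backpropagation step: compute gradients, then slide each parameter downhill.
import numpy as np

def backward(X, y_true, a1, a2, W1, b1, W2, b2, learning_rate):
    m = X.shape[0]

    # gradient of the loss at the output (softmax + cross-entropy simplifies to this)
    dz2 = (a2 - y_true) / m
    dW2 = a1.T @ dz2
    db2 = np.sum(dz2, axis=0, keepdims=True)

    # push the error back through the hidden layer (the ReLU gradient is 0 or 1)
    dz1 = (dz2 @ W2.T) * (a1 > 0)
    dW1 = X.T @ dz1
    db1 = np.sum(dz1, axis=0, keepdims=True)

    # "slide down the mountain": step each parameter against its gradient
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    return W1, b1, W2, b2
```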

Finally, after running your network a ton of times to get it guessing as well as it can, we give it new data along with the smart parameters it has gained and see how it does! In my specific project, I got the machine to usually predict with 92%+ accuracy with a learning rate of 0.0001. For my first network from scratch, I wasn't disappointed, but I left the iteration count and learning rate open to modification to see if you could tweak them and outperform my parameter choices.
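
Putting the sketches above together, the training loop and the final accuracy check could look roughly like this; the iteration count is just a placeholder for you to tweak.

```python
# Illustrative training loop, continuing from the earlier sketches.
n_iterations = 10000
learning_rate = 0.0001

for _ in range(n_iterations):
    z1, a1, z2, a2 = forward(X_train, W1, b1, W2, b2)
    W1, b1, W2, b2 = backward(X_train, y_train, a1, a2, W1, b1, W2, b2, learning_rate)

# evaluate on the held-out test data
_, _, _, test_probs = forward(X_test, W1, b1, W2, b2)
accuracy = np.mean(np.argmax(test_probs, axis=1) == np.argmax(y_test, axis=1))
print(f"Test accuracy: {accuracy:.2%}")
```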

Conclusion

Phew! Those are basically the processes I considered in creating this network from scratch. Of course I did my best to turn these words and concepts into code with actual calculations. Go check out my actual project on my GitHub here. It's documented a little differently in the project file, so maybe that will provide some clarity where I couldn't provide it here.

Thanks for reading!

-mt
