A different look at neural networks

I'm still figuring this website stuff out, but the "learning" page has this same material in a prettier format: https://www.mohammadathar.com/a-different-look-at-neural-networks

 

I was once on a project that was trying to predict the output value of engine emissions (in parts per million, or ppm). The engineer in charge gave me lab data that reported several channels (such as engine speed, oil temperature, etc.), including a channel for particulate count.

The goal was to predict particulate count as a function of the other channels.  The challenge was that I had to design a formula that could be put onto a piece of hardware with very limited mathematical abilities (even today, very little electronics hardware can do native matrix multiplication, much less neural network (ANN) regression!).

So the goal wasn't so much to train an ANN as to translate it into simple arithmetic.

 

Well, after digging around, rereading about perceptrons, and poking around source code, I came up with the following!

 

np.dot((np.dot(X_test[1], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]

 

Simple right?!

Let's take a look at what's going on.  First, we have to train a simple neural network.  For illustrative purposes, I'm using 2 input values and outputting one regression value.  I'm using a very simple ANN with an 'identity' activation function.  The other parameters matter more for the training step, but I want to focus on the prediction step.

 

from sklearn.neural_network import MLPRegressor
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

# import some data to play with
iris = datasets.load_iris()
X = iris.data[0:100, :2]  # we only take the first two features...
Y = iris.data[0:100, 3]   # ...and try to predict the fourth column (petal width)

X = StandardScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15)
reg = MLPRegressor(hidden_layer_sizes=(4,),
                   solver="lbfgs",
                   max_iter=200,
                   activation='identity')
reg.fit(X_train, y_train)

 

So we've gone ahead and fit the ANN.  What does that mean?  Anyone that's familiar with ANNs has surely seen images like this:

http://i.imgur.com/0XguzRa.png

Well, that's a common model of a perceptron network with one hidden layer.  Let's add some annotations to show the weights, inputs, and outputs.

 

http://i.imgur.com/HAy7DYd.png

 

You'll notice I added a bias unit.

The basic steps of a multilayer perceptron are as follows (there's a plain-Python sketch of these steps right after the list):

  1. Start with input vector X
  2. Multiply by weights, and add
  3. Add bias value
  4. Activate
  5. For each layer, go to 2
  6. End with output value Y
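To make those steps concrete, here's a minimal sketch of a forward pass written as plain loops rather than matrix math.  The W1/b1/W2/b2 names are just stand-ins for the weight matrices and bias vectors, and this assumes one hidden layer with the identity activation used above:

def forward(x, W1, b1, W2, b2):
    # steps 2-3: multiply the inputs by the first weight matrix and add the biases
    hidden = [sum(x[i] * W1[i][j] for i in range(len(x))) + b1[j]
              for j in range(len(b1))]
    # step 4: identity activation, so the hidden values pass through unchanged
    # step 5: repeat for the output layer, ending with a single value (step 6)
    return sum(hidden[j] * W2[j][0] for j in range(len(hidden))) + b2[0]

Called with reg.coefs_[0], reg.intercepts_[0], reg.coefs_[1], reg.intercepts_[1], this should reproduce reg.predict for a single sample.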

But what do those weights really mean?  A lot of data scientists either know weights and inputs/outputs, or some ugly-looking series of sums such as the one found on Wikipedia: https://wikimedia.org/api/rest_v1/media/math/render/svg/bf43c01ee8403ea39c2f6d2829576c1769a100d7

 

Another way to look at this diagram is as matrix multiplication.  We basically compress steps 2 and 3 into a pair of matrix operations (multiply by the weight matrix, then add the bias vector).  I'm ignoring the activation step, since I used the identity function as the activation function.

 

http://i.imgur.com/SELgFpl.png

 

So, just like before, we start with a 2-D vector, X ∈ R², and end up with a real number, Y.
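If it helps, here's a quick sanity check of the shapes involved (the weights are just placeholder zeros; only the dimensions matter here):

x = np.array([0.5, -1.2])                  # X in R^2
W1, b1 = np.zeros((2, 4)), np.zeros(4)     # first weight matrix and bias vector
W2, b2 = np.zeros((4, 1)), np.zeros(1)     # second weight matrix and bias vector
hidden = np.dot(x, W1) + b1                # shape (4,)
y = np.dot(hidden, W2) + b2                # shape (1,), i.e. a single real number
print(hidden.shape, y.shape)               # (4,) (1,)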

 

Going back to our Python code, let's look at the reg object.

reg.coefs_
Out[17]: 
[array([[-0.26405775,  0.25966321,  0.06294898, -0.5467497 ],
        [ 0.22969049,  0.07363783, -0.31106338,  0.55501851]]),
 array([[ 0.58731861],
        [ 0.13638876],
        [-0.59955134],
        [-1.02503658]])]

or, 

 

reg.coefs_[0]
Out[18]: 
array([[-0.26405775,  0.25966321,  0.06294898, -0.5467497 ],
       [ 0.22969049,  0.07363783, -0.31106338,  0.55501851]])

 

So, the first set of coefficients corresponds to the first weight matrix!

Similarly, the first set of intercepts corresponds to the first bias vector:

reg.intercepts_[0]
Out[21]: array([ 0.30440519,  0.58837277, -0.11741732, -0.74999497])
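A quick way to convince yourself of this mapping is to print the shapes of the fitted arrays (run against the reg object trained above):

for W, b in zip(reg.coefs_, reg.intercepts_):
    print(W.shape, b.shape)
# should print (2, 4) (4,)  -- input-to-hidden weights and hidden biases
#          and (4, 1) (1,)  -- hidden-to-output weights and the output bias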

 

So, now, we can substitute the coefficients/intercepts into the matrix multiplication to get

 

np.dot((np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]

 

Let's check that for X_test[0] (the first element in the test matrix)

 

reg.predict(X_test[0])
C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\utils\validation.py:395: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
DeprecationWarning)
Out[24]: array([ 1.7076929])

np.dot((np.dot(X_test[0],reg.coefs_[0]) +reg.intercepts_[0] ),reg.coefs_[1]) + reg.intercepts_[1]
Out[25]: array([ 1.7076929])

So, the results of our prediction match the results of our long math.  I'm lazy, so I won't fully unwrap the matrix multiplication by hand, but that's a pretty straightforward step.  Or, in academic speak, "the reader is encouraged to expand the matrix multiplication."
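That said, for anyone who wants to put this on hardware that only knows multiplication and addition, here's roughly what the expansion could look like for our 2-input, 4-hidden-unit network (the h and y names are just for readability; everything comes from reg.coefs_ and reg.intercepts_):

x0, x1 = X_test[0]          # the same sample we checked above
W1, W2 = reg.coefs_
b1, b2 = reg.intercepts_

# hidden layer: one line of plain arithmetic per hidden unit
h0 = x0*W1[0][0] + x1*W1[1][0] + b1[0]
h1 = x0*W1[0][1] + x1*W1[1][1] + b1[1]
h2 = x0*W1[0][2] + x1*W1[1][2] + b1[2]
h3 = x0*W1[0][3] + x1*W1[1][3] + b1[3]

# output layer: one more weighted sum
y = h0*W2[0][0] + h1*W2[1][0] + h2*W2[2][0] + h3*W2[3][0] + b2[0]
print(y)                    # should match reg.predict for the same sample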