A different look at neural networks

I was once on a project that was trying to predict engine emissions (in parts per million, or ppm). The engineer in charge gave me lab data that reported several channels (engine speed, oil temperature, etc.), including a channel for particulate count.

The goal was to predict particulate count as a function of the other channels.  The challenge was that I had to design a formula that could be put onto a piece of hardware with very limited mathematical capability (these days, very little electronics hardware can do native matrix multiplication, much less neural network (ANN) regression!).

So the goal wasn't so much to train an ANN, but to translate it into simple arithmetic.


Well, after digging around, rereading about perceptrons, and poking around source code, I came up with the following!

np.dot((np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]

Simple, right?!

Let's take a look at what's going on.  First, we have to train a simple neural network.  For illustrative purposes, I'm using two input values and outputting one regression value.  I'm using a very simple ANN with an 'identity' activation function.  The other parameters matter more for the training step, but I want to focus on the prediction step.

Let's quickly train an ANN so we have a common point of reference:

from sklearn.neural_network import MLPRegressor
import pandas as pd
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np

# import some data to play with
iris = datasets.load_iris()
X = iris.data[0:100, :2]  # we only take the first two features
Y = iris.data[0:100, 3]   # and try to predict the fourth feature (petal width)

X = StandardScaler().fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15)
reg = MLPRegressor(hidden_layer_sizes=(4,),
                       solver="lbfgs",
                       max_iter = 200,
                       activation = 'identity')
reg.fit(X_train,y_train)

So we've gone ahead and fit the ANN.  

What does that mean?  Anyone who's familiar with ANNs has surely seen images like this:

[Image: a perceptron network with one hidden layer]

Well, that's a common model of a perceptron network with one hidden layer.  Let's add some annotations to show the weights, inputs, and outputs.

[Image: the same network, annotated with weights, biases, inputs, and outputs]

You'll notice I added a bias unit.

The basic steps of evaluating a multilayer perceptron are as follows (see the sketch after the list):

  1. Start with input vector X
  2. Multiply by the weights and sum the products
  3. Add bias value
  4. Activate
  5. For each layer, go to 2
  6. End with output value Y
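
With the identity activation, those six steps boil down to a short loop.  Here's a minimal sketch (my own helper, not sklearn's internals) that walks reg.coefs_ and reg.intercepts_ exactly as described above:

def forward(x, coefs, intercepts):
    a = np.asarray(x, dtype=float)        # 1. start with the input vector X
    for W, b in zip(coefs, intercepts):   # 5. for each layer...
        a = np.dot(a, W) + b              # 2. multiply by weights and sum, 3. add bias
        # 4. the activation would go here; 'identity' means do nothing
    return a                              # 6. end with the output value Y

# forward(X_test[0], reg.coefs_, reg.intercepts_) should match reg.predict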

But what do those weights really mean?  


Let's calculate the values of the hidden layer manually.  The first node is h1 = X1·W111 + X2·W121 + b1.  Repeating this process for each of the four nodes, we get

h1 = X1·W111 + X2·W121 + b1
h2 = X1·W112 + X2·W122 + b2
h3 = X1·W113 + X2·W123 + b3
h4 = X1·W114 + X2·W124 + b4
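
To make that concrete with the fitted network from above, here's the same calculation as nothing but scalar multiplies and adds (W0 and b0 are just my local names for the first weight matrix and bias vector):

W0 = reg.coefs_[0]        # the 2x4 weight matrix feeding the hidden layer
b0 = reg.intercepts_[0]   # the 4 hidden-layer biases
x1, x2 = X_test[0]        # the two (standardized) input features

# one plain-arithmetic equation per hidden node
hidden = [x1 * W0[0, j] + x2 * W0[1, j] + b0[j] for j in range(4)]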

When I first studied it, I was (and frankly, still am!) astounded by the ability to convert linear systems of equations into matrices.  What's really cool about vector spaces is the complex behavior that emerges from eight simple rules (but that's a topic for another time)!  It is vital that the reader be able to follow the translation from a perceptron model, to a set of linear equations, to a matrix operation, so take your time to really comprehend and appreciate what's going on here.

Another way to look at this diagram is as matrix multiplication.  We basically compress steps 2 and 3 into a pair of matrix operations (multiply by the weight matrix, then add the bias vector).  I'm ignoring the activation step, since I used the identity function as the activation function, and I'll leave the output layer's bias for the very end.

So, just like before, we start with a 2-D vector, X ∈ ℝ², and end up with a real number, Y!
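
If you want to convince yourself that the dimensions line up, a quick shape check does the trick:

x = X_test[0]                                        # shape (2,)
h = np.dot(x, reg.coefs_[0]) + reg.intercepts_[0]    # (2,) times (2, 4), plus (4,)  -> (4,)
y = np.dot(h, reg.coefs_[1]) + reg.intercepts_[1]    # (4,) times (4, 1), plus (1,)  -> (1,)
print(x.shape, h.shape, y.shape)                     # (2,) (4,) (1,)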

Going back to our python code, let's look at the reg object.

reg.coefs_

Out[17]:
[array([[-0.26405775,  0.25966321,  0.06294898, -0.5467497 ],
        [ 0.22969049,  0.07363783, -0.31106338,  0.55501851]]),
array([[ 0.58731861],
        [ 0.13638876],
        [-0.59955134],
        [-1.02503658]])]
#or, 

reg.coefs_[0]
Out[18]: 
array([[-0.26405775,  0.25966321,  0.06294898, -0.5467497 ],
       [ 0.22969049,  0.07363783, -0.31106338,  0.55501851]])


So, the first set of coefficients corresponds to the first weight matrix!

Similarly, the first set of intercepts corresponds to the first bias vector:

reg.intercepts_[0]

Out[21]: array([ 0.30440519,  0.58837277, -0.11741732, -0.74999497])
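
A quick way to see how those lists line up with the layers is to print their shapes:

print([W.shape for W in reg.coefs_])       # [(2, 4), (4, 1)] -> input-to-hidden and hidden-to-output weights
print([b.shape for b in reg.intercepts_])  # [(4,), (1,)]     -> hidden-layer and output-layer biases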


So, now, we can substitute the coefficients/intercepts into the matrix multiplication to get

np.dot((np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]

And when we evaluate it in the short and long way, we get the same answer!

reg.predict(X_test[0].reshape(1, -1))

Out[24]: array([ 1.7076929])

np.dot((np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]

Out[25]: array([ 1.7076929])

So, the results of our prediction match the results of our long-hand math.  I'm lazy, so I won't write out every scalar term, but unwrapping the matrix multiplication into plain arithmetic is a pretty straightforward step.  Or, in academic speak, "the reader is encouraged to expand the matrix multiplication."
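
As a nudge in that direction, here's roughly what that expansion looks like for this particular 2-4-1 identity-activation network (W0, b0, W1, b1 are just my local names for the fitted parameters).  It's a handful of multiplies and adds, which is exactly what you can hand to hardware with no matrix support:

W0, b0 = reg.coefs_[0], reg.intercepts_[0]   # hidden layer: 2x4 weights, 4 biases
W1, b1 = reg.coefs_[1], reg.intercepts_[1]   # output layer: 4x1 weights, 1 bias
x1, x2 = X_test[0]

y = b1[0]
for j in range(4):                           # accumulate each hidden node's contribution
    h_j = x1 * W0[0, j] + x2 * W0[1, j] + b0[j]
    y += h_j * W1[j, 0]
# y should now match reg.predict(X_test[0].reshape(1, -1))[0]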