# A different look at neural networks

I was once on a project that aimed to predict engine emissions (in parts per million, or ppm). The engineer in charge gave me lab data that reported several channels (engine speed, oil temperature, and so on), including a channel for particulate count.

The goal was to predict particulate count as a function of the other channels. The challenge was that I had to design a formula that could run on a piece of hardware with very limited mathematical abilities (even today, very little electronics hardware can do native matrix multiplication, much less neural network (ANN) regression!).

*So the goal wasn't so much to train an ANN, but to translate it into simple arithmetic.*

Well, after digging around, rereading about perceptrons, and poking around source code, I came up with the following!

`np.dot((np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]`

Simple, right?!

Let's take a look at what's going on. First, we have to train a simple neural network. For illustrative purposes, I'm using two input values and outputting one regression value. I'm using a very simple ANN with an 'identity' activation function. The other parameters matter more for the training step, but I want to focus on the prediction step.

### Let's quickly train an ANN, so we have a common point of reference

```
from sklearn.neural_network import MLPRegressor
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import numpy as np
```

```
# import some data to play with
iris = datasets.load_iris()
X = iris.data[0:100, :2]  # we only take the first two features...
Y = iris.data[0:100, 3]   # ...and try to predict the fourth feature
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.15)
reg = MLPRegressor(hidden_layer_sizes=(4,),
                   solver="lbfgs",
                   max_iter=200,
                   activation='identity')
reg.fit(X_train, y_train)
```

### So we've gone ahead and fit the ANN.

What does that mean? Anyone who's familiar with ANNs has surely seen the classic picture of a **perceptron network with one hidden layer**: a column of input nodes, a column of hidden nodes, and an output node, connected left to right. Annotate that picture with the weights on each connection, the inputs, and the output, and add a bias unit feeding each layer, and you have our model: two inputs, four hidden nodes, one output.

The basic steps of evaluating a multilayer perceptron are as follows:

1. Start with the input vector **X**
2. Multiply by the weights
3. Add the bias values
4. Apply the activation function
5. For each subsequent layer, go back to step 2
6. End with the output value **Y**
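
Here's a minimal sketch of those steps in numpy (my own sketch, not sklearn's internals), assuming identity activations throughout:

```
import numpy as np

def mlp_forward(x, weights, biases):
    # weights and biases follow the layout of sklearn's
    # reg.coefs_ and reg.intercepts_ lists
    a = x
    for W, b in zip(weights, biases):  # one pass per layer
        a = np.dot(a, W) + b           # multiply by weights, add bias
        # with the 'identity' activation, step 4 is a no-op
    return a                           # the output value Y
```

Calling `mlp_forward(X_test[0], reg.coefs_, reg.intercepts_)` should reproduce `reg.predict` for our trained network.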

But what do those weights really mean?

Let's calculate the values of the hidden layer manually. The first hidden node is h1 = X1·W11 + X2·W21 + b1, where Wij is the weight from input i to hidden node j. Repeating this process for each of the four nodes, we get:

- h1 = X1·W11 + X2·W21 + b1
- h2 = X1·W12 + X2·W22 + b2
- h3 = X1·W13 + X2·W23 + b3
- h4 = X1·W14 + X2·W24 + b4
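
In numpy, that whole hidden-layer calculation is just the inner half of the one-liner from the top:

```
# hidden layer values for one sample, computed in one shot
h = np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]  # shape (4,)
```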

When I first studied it, I was (and frankly, still am!) astounded by the ability to convert systems of linear equations into matrices. What's really cool about vector spaces is the complex behavior that emerges from eight simple rules (but that's a topic for another time)! *It is vital that the reader be able to understand the translation from a perceptron model to a set of linear equations to a matrix operation!* Take your time to really comprehend and appreciate what's going on here.
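
Written out in the notation above (row-vector convention, to match numpy's `np.dot(X, W)`), the four hidden-node equations collapse into a single matrix equation:

```
H = X W + b

\begin{pmatrix} h_1 & h_2 & h_3 & h_4 \end{pmatrix} =
\begin{pmatrix} X_1 & X_2 \end{pmatrix}
\begin{pmatrix}
  W_{11} & W_{12} & W_{13} & W_{14} \\
  W_{21} & W_{22} & W_{23} & W_{24}
\end{pmatrix} +
\begin{pmatrix} b_1 & b_2 & b_3 & b_4 \end{pmatrix}
```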

Another way to look at this diagram is as *matrix multiplication*. We basically compress steps 2 and 3 into a pair of matrix operations (multiply by the weight matrix, then add the bias vector). I can ignore the activation step entirely, since I used the identity function as the activation function; the second bias term simply gets added on at the end.

So, just like before, we start with a 2-D vector, X ∈ ℝ², and end up with a real number, Y!
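
Tracing the shapes through the pipeline for a single sample makes this concrete:

```
# (2,)                              input X
# (2,) @ (2, 4) + (4,)  ->  (4,)    hidden layer H
# (4,) @ (4, 1) + (1,)  ->  (1,)    output Y
```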

Going back to our Python code, let's look at the `reg` object.

`reg.coefs_`

```
Out[17]:
[array([[-0.26405775,  0.25966321,  0.06294898, -0.5467497 ],
        [ 0.22969049,  0.07363783, -0.31106338,  0.55501851]]),
 array([[ 0.58731861],
        [ 0.13638876],
        [-0.59955134],
        [-1.02503658]])]

# or,
reg.coefs_[0]
Out[18]:
array([[-0.26405775,  0.25966321,  0.06294898, -0.5467497 ],
       [ 0.22969049,  0.07363783, -0.31106338,  0.55501851]])
```

So, the first set of coefficients corresponds to the first weight matrix!
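
A quick sanity check: the array shapes line up with our two-input, four-node, one-output architecture.

```
[W.shape for W in reg.coefs_]
# [(2, 4), (4, 1)]
```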

Similarly, the first set of intercepts corresponds to the first bias vector:

`reg.intercepts_[0]`

```
Out[21]: array([ 0.30440519,  0.58837277, -0.11741732, -0.74999497])
```

So now we can substitute the coefficients and intercepts into the matrix multiplication to get

`np.dot((np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]`

And when we evaluate it the short way and the long way, we get the same answer!

```
reg.predict(X_test[0].reshape(1, -1))  # predict expects a 2-D array
Out[24]: array([ 1.7076929])
```

```
np.dot((np.dot(X_test[0], reg.coefs_[0]) + reg.intercepts_[0]), reg.coefs_[1]) + reg.intercepts_[1]
Out[25]: array([ 1.7076929])
```

So, the results of our prediction match the results of our long-hand math. I'm lazy, so I won't unwrap the matrix multiplication all the way down to individual multiply-adds, but that's a pretty straightforward step. Or, in academic speak, "the reader is encouraged to expand the matrix multiplication." As a head start, here's a sketch of what that expansion looks like.
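
This is my own sketch (not from the original project), assuming the two-input, four-hidden-node, identity-activation network trained above; it reduces the prediction to the plain multiply-add arithmetic that limited hardware can handle:

```
W1, W2 = reg.coefs_        # shapes (2, 4) and (4, 1)
b1, b2 = reg.intercepts_   # shapes (4,) and (1,)
x1, x2 = X_test[0]

# hidden layer: one multiply-add chain per node
h = [x1 * W1[0, j] + x2 * W1[1, j] + b1[j] for j in range(4)]

# output: weighted sum of the hidden nodes, plus the final bias
y = (h[0] * W2[0, 0] + h[1] * W2[1, 0] +
     h[2] * W2[2, 0] + h[3] * W2[3, 0] + b2[0])

print(y)  # same value as reg.predict(X_test[0].reshape(1, -1))
```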