# PyTorch From First Principles: Part II

In the first part of this article, we built a multi-layer perceptron from scratch to learn an arbitrary function, while still leaning on some conveniences of PyTorch. In this article, we’ll ditch those conveniences; after that, developing any kind of neural network becomes easy!

As usual, we need some basics:


```python
import torch
import torch.nn as nn
import torch.nn.functional as F
```


Previously, the neural network looked something like this:


```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return x
```


Let’s say that to really understand neural nets, we need to understand what nn.Linear is. It is the basic building block of most networks; depending on our understanding of (and belief in) neurobiology, we might want to develop something else.

## Making a Neuron MS Paint style

Recall that the basic neuron multiplies inputs by weights, sums them, and applies some non-linear function to the result. Here’s an example, where the non-linear function is ReLU and there is no bias term:

(Inputs in red, weights in green, result in black).

Written in the familiar linear algebra notation, what you see here is a weighted sum passed through the non-linearity:

a = ReLU(w · x)

This calculation extends to any size of neural network, where each layer builds on the last: it’s a series of matrix multiplications.
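As a concrete sketch of a single neuron (the input and weight values here are made up for illustration, since the original figure used its own), it really is just a dot product followed by ReLU:

```python
import torch

# Hypothetical inputs and weights for one neuron, no bias term
x = torch.tensor([1., 2.])
w = torch.tensor([5., 6.])

z = w @ x          # weighted sum: 1*5 + 2*6 = 17
a = torch.relu(z)  # ReLU leaves positive values unchanged
print(a)           # tensor(17.)
```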

Here’s another example, this time with two inputs and two neurons.

(The top neuron outputs 19, the lower one outputs 22.)

Say the mini-batch size was 2: a training sample of [1,2] and another training sample of [3,4]. The outputs of this layer can be computed simultaneously, making better use of a GPU:

Conventionally, the green weights matrix is written the other way, as [[5,7],[6,8]], and transposed before multiplying. In general form:

Z = X Wᵀ, A = σ(Z)

The pre-activations Z are put through the non-linear function σ to yield the output activations A.

You can confirm this in PyTorch, as follows:
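A minimal check, using the mini-batch [[1,2],[3,4]] and the conventionally written weights [[5,7],[6,8]] from above:

```python
import torch

# Mini-batch of two training samples, one per row
X = torch.tensor([[1., 2.],
                  [3., 4.]])
# Weights in the conventional orientation, transposed before use
W = torch.tensor([[5., 7.],
                  [6., 8.]])

Z = X @ W.T          # both samples' pre-activations in one matmul
A = torch.relu(Z)    # sigma = ReLU; all values are positive, so unchanged
print(A)
# tensor([[19., 22.],
#         [43., 50.]])
```

The first row matches the 19 and 22 computed above for [1,2]; the second row is the same layer applied to [3,4].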

Series of operations like this can be chained to make a deep network; somewhere internally, the framework compiles a graph of computations to figure out what can be done simultaneously versus sequentially. Additionally, inputs do not have to be fed into the network as one long line: as tensors, this basic calculation extends into additional dimensions (e.g. the three RGB channels of an image), which preserves spatial data that would be lost if the inputs to a network were just a long vector.

Building a fully connected layer of neurons is a doddle. Inherit from the utility class nn.Module, set the numbers of inputs and outputs, create the parameters and initialize them, and define a forward method (PyTorch takes care of the corresponding backward method and the neurons’ gradients, and indeed recommends not trying to override it yourself).
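A minimal sketch of the steps just described (the class name MyLinear and the naive random/zero initialization are my own choices for illustration):

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    """A hand-rolled fully connected layer: y = x @ W.T + b."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them with the module,
        # so autograd tracks their gradients
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # PyTorch derives the backward pass from this forward definition
        return x @ self.weight.T + self.bias

layer = MyLinear(2, 3)
out = layer(torch.tensor([[1., 2.]]))
print(out.shape)  # torch.Size([1, 3])
```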

## Optimizing The Neural Network

I mentioned in the first article that this one wouldn’t just build a neuron, but would also make a new learning method to compete with SGD. Here it is! Zero error after a single epoch (the function we are learning is to triple a positive number: therefore set the weight to 3 and the bias to 0). This is just to demonstrate how to work with the network’s parameters.

As a class, it just needs to be initialized with a generator of the network’s weights. We can process them as a list, then convert the list back to a generator at the end. All you need to do is define a step method:


```python
class Solved():
    def __init__(self, params):
        self.params = params

    def step(self):
        # Materialize the generator so we can iterate over it,
        # then restore it at the end
        weights = list(self.params)
        for name, weight in weights:
            # We already know the answer: weight 3, bias 0
            if name == 'fc1.weight':
                weight.data.fill_(3.)
            if name == 'fc1.bias':
                weight.data.fill_(0.)
        self.params = iter(weights)
```


And use named_parameters instead of parameters:


```python
solver = Solved(net.named_parameters())
```
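Putting the pieces together, a run with this “optimizer” might look like the following (the loop, loss, and data here are my own sketch; only the Net and Solved classes come from above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 1)
    def forward(self, x):
        return F.relu(self.fc1(x))

class Solved():
    def __init__(self, params):
        self.params = params
    def step(self):
        weights = list(self.params)
        for name, weight in weights:
            if name == 'fc1.weight':
                weight.data.fill_(3.)
            if name == 'fc1.bias':
                weight.data.fill_(0.)
        self.params = iter(weights)

net = Net()
solver = Solved(net.named_parameters())

x = torch.tensor([[2.]])  # a positive input
y = torch.tensor([[6.]])  # target: triple it

solver.step()             # "optimize": set weight to 3, bias to 0
loss = F.mse_loss(net(x), y)
print(loss.item())        # 0.0
```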


Voilà, incredible performance:


```
Epoch 0 - loss: Variable containing:
 0
[torch.FloatTensor of size 1]
```


## Conclusion

I’ve taken a few shortcuts here:

- There are heuristics for initializing layer weights and biases, which I have ignored, simply setting them to zero
- The optimizer is obviously ‘fake’

Nevertheless: this is a neural network from scratch. To go from here to state-of-the-art only requires adding more of the same. Let’s go play! As usual, you can find the final notebook for this article on my GitHub.