PyTorch From First Principles: Part II
In the first part of this article, we built a multi-layer perceptron from scratch to learn an arbitrary function, while still leaning on some of PyTorch’s conveniences. In this article, we’ll ditch those conveniences; once we have, developing any kind of neural network becomes straightforward!
As usual, we need some basics:
```python
import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
```
Previously, the neural network looked something like this:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        return x
```
To really understand neural nets, we need to understand what nn.Linear is. It is the basic building block of a network in PyTorch; depending on our understanding of (and belief in) neurobiology, we might want to develop something else instead.
Making a Neuron MS Paint style
Recall that the basic neuron multiplies its inputs by weights, sums them, and applies some non-linear function to the result. Here’s an example where the non-linear function is ReLU and there is no bias term:
(Inputs in red, weights in green, result in black).
Written in familiar linear algebra notation, what you see here is:

a = ReLU(w · x)

where x is the vector of inputs (red), w the vector of weights (green), and a the neuron’s output (black).
This calculation extends to any size of neural network, where each layer builds on the last: it’s a series of matrix multiplications.
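That single-neuron calculation can be sketched in a few lines of plain Python. The input and weight values here are made up for illustration; they are not the ones from the diagram:

```python
# A single neuron, no framework: multiply inputs by weights,
# sum, then apply the ReLU non-linearity. Values are hypothetical.
inputs  = [2.0, 3.0]    # the "red" inputs
weights = [0.5, -1.0]   # the "green" weights

z = sum(i * w for i, w in zip(inputs, weights))  # weighted sum: 1.0 - 3.0 = -2.0
a = max(0.0, z)                                  # ReLU clips negatives to zero
print(a)  # 0.0
```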
Here’s another example, this time with two inputs and two neurons.
(The top neuron outputs 19, the lower one outputs 22.)
Say the mini-batch size was 2: a training sample of [1,2] and another training sample of [3,4]. The outputs of this layer can be computed simultaneously, making better use of a GPU:
Conventionally, the green weights matrix is written the other way round, as [[5,7],[6,8]], and transposed during the computation. In general form (with no bias term):

Z = X · Wᵀ,  A = σ(Z)

Input activations Z are put through the non-linear function σ to yield output activations A.
You can confirm this in PyTorch, as follows:
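A minimal sketch of that check, using the tensor values from the example above and PyTorch’s (out_features, in_features) weight layout:

```python
import torch

# Weight matrix in PyTorch's conventional (out_features, in_features) layout
W = torch.tensor([[5., 7.], [6., 8.]])
# Mini-batch of two training samples: [1, 2] and [3, 4]
X = torch.tensor([[1., 2.], [3., 4.]])

# Z = X @ W^T computes both samples through both neurons at once
Z = X @ W.t()
print(Z)
# tensor([[19., 22.],
#         [43., 50.]])
```

The first row recovers the 19 and 22 from the diagram; the second row is the same layer applied to the second sample, computed in the same matrix multiplication.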
Series of operations like this can be chained to make a deep network; somewhere internally, the framework compiles a graph of the computations to figure out what can be done simultaneously versus sequentially. Additionally, inputs do not have to be fed into the network as one long line: as tensors, this basic calculation extends into additional dimensions (e.g. the three RGB channels of an image), which preserves spatial structure that would be lost if the inputs were just a long vector.
Building a fully connected layer of neurons is a doddle: inherit from the utility class nn.Module, set the numbers of inputs and outputs, create the parameters and initialize them, and define a forward method. (PyTorch takes care of the corresponding backward method and the neurons’ gradients, and indeed recommends not overriding it yourself.)
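A sketch of those steps might look like the following. The class name and the initialization scale are my own choices, not part of the original article:

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    """A hand-rolled fully connected layer (hypothetical name)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        # Create and initialize the parameters ourselves; registering them
        # as nn.Parameter makes them visible to optimizers and autograd
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Z = X @ W^T + b; PyTorch supplies backward automatically
        return x @ self.weight.t() + self.bias

layer = MyLinear(2, 2)
out = layer(torch.tensor([[1., 2.], [3., 4.]]))
print(out.shape)  # torch.Size([2, 2])
```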
Optimizing The Neural Network
I mentioned in the first article that this one wouldn’t just build a neuron, but would also introduce a new learning method to compete with SGD. Here it is: zero error after a single epoch. (The function we are learning triples a positive number, so the trick is simply to set the weight to 3 and the bias to 0.) This is just to demonstrate how to work with the network’s parameters.
As a class, it just needs to be initialized with a generator of the network’s weights. We can process them as a list, then convert the list back to a generator at the end. All you need to do is define a step method:
```python
class Solved():
    def __init__(self, params):
        self.params = params

    def step(self):
        # Materialize the generator so we can iterate over it and reuse it
        weights = list(self.params)
        for name, weight in weights:
            if name == 'fc1.weight':
                weight.data.fill_(3.)  # the target function is y = 3x...
            if name == 'fc1.bias':
                weight.data.fill_(0.)  # ...so no offset is needed
        # Hand back an iterator, matching what the caller passed in
        self.params = iter(weights)
```
And use named_parameters instead of parameters:
```python
solver = Solved(net.named_parameters())
```
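Wired together, the whole thing is only a few lines. This sketch uses the modern tensor API rather than Variable, and the training samples are made up (any positive inputs work, since ReLU(3x) = 3x for x > 0):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        return F.relu(self.fc1(x))

class Solved():
    def __init__(self, params):
        self.params = params

    def step(self):
        weights = list(self.params)
        for name, weight in weights:
            if name == 'fc1.weight':
                weight.data.fill_(3.)
            if name == 'fc1.bias':
                weight.data.fill_(0.)
        self.params = iter(weights)

net = Net()
solver = Solved(net.named_parameters())
criterion = nn.MSELoss()

x = torch.tensor([[1.], [2.], [4.]])  # positive inputs (made up)
y = 3 * x                             # target: triple the input
solver.step()                         # one "optimization" step
loss = criterion(net(x), y)
print(loss.item())  # 0.0
```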
Voilà, incredible performance:
```
Epoch 0 - loss: Variable containing:
 0
[torch.FloatTensor of size 1]
```
I’ve taken a few shortcuts here:
- There are heuristics for initializing layer weights and biases, which I have ignored, simply setting them to zero
- The optimizer is obviously ‘fake’
Nevertheless: this is a neural network from scratch. To go from here to state-of-the-art only requires adding more of the same. Let’s go play! As usual you can find the final notebook from this article on my Github.