
PyTorch From First Principles: Part II

In the first part of this series, we built a multi-layer perceptron from scratch to learn an arbitrary function, though we relied on some of PyTorch’s conveniences. In this part, we’ll ditch those conveniences, after which developing any kind of neural network becomes easy!

As usual, we need some basics:

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

Previously, the neural network looked something like this:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(1,1)
    def forward(self, x):
        x = F.relu(self.fc1(x))
        return x

Let’s say that to really understand neural nets, we need to understand what nn.Linear is. It is their basic building block, and depending on our understanding of (and belief in) neurobiology, we might want to develop something else.
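As a quick illustration, here is a sketch (assuming a recent PyTorch, where plain tensors replace `Variable`) checking that nn.Linear amounts to a matrix multiplication plus a bias. The input values are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random initialization is repeatable

fc1 = nn.Linear(1, 1)  # same shape as the layer in Net above

x = torch.tensor([[2.0], [5.0]])  # arbitrary mini-batch of two one-feature samples

# nn.Linear stores weight with shape (out_features, in_features)
# and computes x @ weight.T + bias.
manual = x @ fc1.weight.t() + fc1.bias

print(fc1.weight.shape, fc1.bias.shape)  # torch.Size([1, 1]) torch.Size([1])
print(torch.allclose(fc1(x), manual))    # True
```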

Making a Neuron, MS Paint Style

Recall that the basic neuron multiplies its inputs by weights, sums the results, and applies a non-linear function to the sum. Here’s an example where the non-linear function is ReLU and there is no bias term:


(Inputs in red, weights in green, result in black).
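The same computation in plain Python, as a minimal sketch. The helper names `relu` and `neuron` are hypothetical, and because the figure’s values are not reproduced here, the inputs and weights below are made up for illustration.

```python
def relu(z):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return max(0.0, z)


def neuron(inputs, weights):
    """Hypothetical helper: multiply each input by its weight, sum, apply ReLU (no bias)."""
    z = sum(x * w for x, w in zip(inputs, weights))
    return relu(z)


# Illustrative values only; they are not the ones from the figure.
print(neuron([1.0, 2.0, 3.0], [0.5, -1.0, 2.0]))  # 4.5
print(neuron([1.0, 2.0, 3.0], [-2.0, 0.0, 0.0]))  # 0.0 (negative sum clamped by ReLU)
```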

Written in the familiar linear algebra notation, what you see here is:


This calculation extends to a neural network of any size, where each layer builds on the output of the last: it’s a series of matrix multiplications, each followed by a non-linearity.
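A framework-free sketch of that chaining, under illustrative assumptions: the helpers `matmul`, `relu` and `forward` are hypothetical, each weight matrix stores one column per neuron (so a layer is simply inputs times weights, with no transpose), and all the numbers are made up.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    """Apply ReLU element-wise to a matrix."""
    return [[max(0.0, v) for v in row] for row in m]


def forward(x, layers):
    """Feed activations through each layer in turn: a = relu(a @ w)."""
    a = x
    for w in layers:
        a = relu(matmul(a, w))
    return a


x = [[1.0, 2.0]]        # one sample with two inputs
w1 = [[1.0, -3.0],      # hypothetical layer-1 weights: 2 inputs -> 2 neurons (one column each)
      [1.0, 1.0]]
w2 = [[2.0],            # hypothetical layer-2 weights: 2 inputs -> 1 neuron
      [5.0]]
print(forward(x, [w1, w2]))  # [[6.0]]
```

The second neuron of the first layer receives a negative sum (-1), so ReLU zeroes it and it contributes nothing to the next layer.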

Here’s another example, this time with two inputs and two neurons.


(The top neuron outputs 19 and the lower one outputs 22.)
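Checking those two outputs by hand in plain Python. The input [1, 2] and the per-neuron weights [5, 7] and [6, 8] are inferred from the numbers given in this section (they reproduce 19 and 22), not copied from the figure.

```python
x = [1, 2]                  # the two inputs (inferred from the text)
neuron_weights = [[5, 7],   # weights into the top neuron
                  [6, 8]]   # weights into the lower neuron

# Each neuron: weighted sum of the inputs, then ReLU (no bias).
outputs = [max(0, sum(xi * wi for xi, wi in zip(x, w))) for w in neuron_weights]
print(outputs)  # [19, 22]
```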

Say the mini-batch size is 2: one training sample of [1,2] and another of [3,4]. The layer’s outputs for both samples can be computed simultaneously in a single matrix multiplication, which makes better use of a GPU:


Conventionally, the green weight matrix is written the other way round, as [[5,7],[6,8]], and then transposed. In general form:


The weighted inputs Z are put through the non-linear function sigma (σ) to yield the output activations A.
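In symbols, taking the mini-batch samples as the rows of X and one neuron’s weights per row of W (notation chosen here to match the description above), the layer computes the following; the second line plugs in the example’s numbers, with no bias:

```latex
Z = X W^{\top} + \mathbf{b}, \qquad A = \sigma(Z)

Z = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
    \begin{bmatrix} 5 & 7 \\ 6 & 8 \end{bmatrix}^{\top}
  = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
    \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}
  = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}
```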

You can confirm this in PyTorch, as follows:
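A sketch of that confirmation, assuming a recent PyTorch where plain tensors are used instead of `Variable`. It computes the whole mini-batch with the conventional weight layout, then compares against F.linear, which performs the same transposed multiplication.

```python
import torch
import torch.nn.functional as F

X = torch.tensor([[1., 2.],   # first training sample
                  [3., 4.]])  # second training sample
W = torch.tensor([[5., 7.],   # top neuron's weights (conventional layout: one row per neuron)
                  [6., 8.]])  # lower neuron's weights

Z = X @ W.t()   # weighted inputs for the whole mini-batch at once
A = F.relu(Z)   # output activations

print(A.tolist())                      # [[19.0, 22.0], [43.0, 50.0]]
print(torch.equal(F.linear(X, W), Z))  # True
```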


Chains of operations like this can be stacked to make a deep network; behind the scenes, PyTorch records a graph of these computations, which its autograd engine uses to work out gradients on the backward pass. Additionally, inputs do not have to be fed into the network in one long line. Because the inputs are tensors, this basic calculation extends into additional dimensions (e.g. the three RGB channels of an image), which preserves spatial information that would be lost if the inputs to a network were just a long vector.
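To illustrate the point about extra dimensions, here is a sketch with a hypothetical three-channel input. F.linear (like nn.Linear) applies its weighted sum along the last dimension only, so the leading dimensions are left intact. It demonstrates shapes being preserved, not a full image model, which would more typically use convolutions.

```python
import torch
import torch.nn.functional as F

W = torch.tensor([[5., 7.],
                  [6., 8.]])  # one row of weights per neuron

# Hypothetical input: 3 channels (think R, G, B), each holding a 2x2 grid of values.
x = torch.arange(12, dtype=torch.float32).reshape(3, 2, 2)

# The same weighted sum is applied along the last dimension only,
# so the channel and row dimensions survive untouched.
y = F.linear(x, W)
print(x.shape, y.shape)   # torch.Size([3, 2, 2]) torch.Size([3, 2, 2])

# Flattening into one long vector discards that layout.
print(x.flatten().shape)  # torch.Size([12])
```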

Building a fully connected layer of neurons is a doddle: inherit from the utility class nn.Module, set the numbers of inputs and outputs, create the parameters and initialize them, and define a forward method. PyTorch’s autograd takes care of the corresponding backward pass and the gradients of the layer’s parameters, so there is no need to write a backward method yourself.
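A minimal sketch of such a layer follows. The class name `MyLinear` is hypothetical, and the zero initialization mirrors the shortcut admitted at the end of this article rather than a recommended scheme.

```python
import torch
import torch.nn as nn


class MyLinear(nn.Module):
    """Hypothetical hand-rolled fully connected layer: y = x @ W.T + b."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # nn.Parameter registers the tensors with the module, so they appear in
        # parameters()/named_parameters() and receive gradients.
        self.weight = nn.Parameter(torch.zeros(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x):
        # Only the forward pass is written; autograd derives the backward pass.
        return x @ self.weight.t() + self.bias


layer = MyLinear(1, 1)
x = torch.tensor([[1.0], [2.0]])
loss = ((layer(x) - 3 * x) ** 2).mean()  # learn to triple the input
loss.backward()

print([name for name, _ in layer.named_parameters()])  # ['weight', 'bias']
print(layer.weight.grad.item(), layer.bias.grad.item())  # -15.0 -9.0
```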


Optimizing The Neural Network

I mentioned in the first article that this one wouldn’t just build a neuron, but would also introduce a new learning method to compete with SGD. Here it is: zero error after a single epoch! (The function we are learning triples a positive number, so the “optimizer” simply sets the weight to 3 and the bias to 0.) This is just to demonstrate how to work with the network’s parameters.

As a class, it only needs to be initialized with an iterator over the network’s named parameters. We can process them as a list, then turn the list back into an iterator at the end. All that remains is to define a step method:

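class Solved():
    def __init__(self, params):
        self.params = params  # iterator of (name, parameter) pairs
    def step(self):
        weights = list(self.params)
        for name, weight in weights:
            if name == 'fc1.weight':
                weight.data.fill_(3.0)  # the known answer: y = 3x
            if name == 'fc1.bias':
                weight.data.fill_(0.0)
        self.params = iter(weights)  # so the next step() can read them again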

Because step matches parameters by name, pass it named_parameters() instead of parameters():

solver = Solved(net.named_parameters())
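For reference, a self-contained sketch of a full run. It repeats the Net and Solved classes from this article (using `fill_` to write the answer into the parameters), while the training data and loop are hypothetical stand-ins for the ones from Part I, which are not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(1, 1)

    def forward(self, x):
        return F.relu(self.fc1(x))


class Solved:
    """Fake optimizer: writes the known answer straight into the parameters."""

    def __init__(self, params):
        self.params = params

    def step(self):
        weights = list(self.params)
        for name, weight in weights:
            if name == 'fc1.weight':
                weight.data.fill_(3.0)
            if name == 'fc1.bias':
                weight.data.fill_(0.0)
        self.params = iter(weights)


net = Net()
criterion = nn.MSELoss()
solver = Solved(net.named_parameters())

# Hypothetical training data: positive inputs and their triples.
xs = torch.tensor([[1.0], [2.0], [5.0], [10.0]])
ys = 3 * xs

for epoch in range(1):
    for x, y in zip(xs, ys):
        net.zero_grad()
        loss = criterion(net(x), y)
        loss.backward()  # gradients are computed, but Solved ignores them
        solver.step()
    print(f"Epoch {epoch} - loss: {loss.item()}")  # Epoch 0 - loss: 0.0
```

The first sample still sees the random initial weights; every later sample in the epoch is predicted exactly, so the last reported loss is zero.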

Voilà, incredible performance:

Epoch 0 - loss: Variable containing:
[torch.FloatTensor of size 1]


I’ve taken a few shortcuts here:

  • There are heuristics for initializing layer weights and biases, which I have ignored; I just set them all to zero
  • The optimizer is obviously ‘fake’
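As an example of such a heuristic, here is a sketch using torch.nn.init’s Xavier (Glorot) uniform scheme in place of zeros. It is one common choice among several, not the one any particular model requires; note that nn.Linear already applies its own Kaiming-uniform-based default when it is constructed.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 3)

# The shortcut used in this article: start every parameter at zero.
nn.init.zeros_(layer.weight)
nn.init.zeros_(layer.bias)

# A common heuristic instead: Xavier/Glorot uniform, which draws weights
# from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)).
nn.init.xavier_uniform_(layer.weight)
nn.init.zeros_(layer.bias)

bound = (6 / (4 + 3)) ** 0.5
# Small tolerance for float32 rounding of the bound.
print(layer.weight.abs().max().item() <= bound + 1e-6)  # True
```

Zero initialization is also why the shortcut is only a shortcut: every neuron in a layer starts out identical and receives identical updates, so the neurons can never learn different features.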

Nevertheless, this is a neural network from scratch. Getting from here to the state of the art is largely a matter of adding more of the same. Let’s go play! As usual, you can find the final notebook from this article on my GitHub.
