Points on PyTorch

Today, December 8th, 2018, PyTorch 1.0 stable was released. It is a milestone, and I’d like to keep notes on PyTorch as I learn and use it. The material mainly comes from the official PyTorch tutorial and the Intro to Deep Learning with PyTorch course on Udacity.

Tensors

Simply put, tensors are a generalization of vectors and matrices. In PyTorch, a tensor is a multi-dimensional array containing elements of a single data type.

# From the slide PyTorch under the hood
# by Christian S. Perone
# https://speakerdeck.com/perone/pytorch-under-the-hood?slide=12
>>> import torch
>>> t = torch.tensor([[1., -1.], [1., -1.]])
>>> t
tensor([[ 1., -1.],
        [ 1., -1.]])

>>> t.dtype # They have a type
torch.float32

>>> t.shape # a shape
torch.Size([2, 2])

>>> t.device # and live in some device
device(type='cpu')

Resizing the tensor

There are a few options to use: .reshape(), .resize_() and .view().

  • w.reshape(a, b) returns a tensor of size (a, b) with the same data as w. Sometimes this is a view of w, and sometimes it is a clone, i.e. the data is copied to another part of memory.
  • w.resize_(a, b) returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, the new elements will be uninitialized in memory.
  • w.view(a, b) returns a new tensor of size (a, b) that shares its data with w.

The above three methods are introduced in Intro to Deep Learning with PyTorch. The official PyTorch tutorial only introduces w.view(), so generally I’d use w.view() for resizing.
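To make the difference between the three methods concrete, here is a minimal sketch. The tensor names and values are made up for illustration; the point is only to show when data is shared and when it is copied.

```python
import torch

w = torch.arange(6.)            # six elements, contiguous in memory

# .view() never copies: v and w share the same storage.
v = w.view(2, 3)
v[0, 0] = 100.                  # this also changes w[0]

# .reshape() on a contiguous tensor behaves like .view() (no copy).
r = w.reshape(3, 2)

# A transposed tensor is non-contiguous: .view() refuses, .reshape() copies.
m = w.view(2, 3).t()
try:
    m.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True
c = m.reshape(6)                # new memory, independent of w

# .resize_() changes the shape in place; here it keeps the first four elements.
s = torch.arange(6.)
s.resize_(2, 2)
```

In this sketch, writing through v is visible in w, while c lives in its own memory.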

Convenience of -1

When resizing a tensor, -1 is a convenient placeholder for the one dimension whose size we have not specified. PyTorch infers it from the other dimensions and the total number of elements.

E.g.,

import torch

x = torch.randn(4, 4)
y = x.view(-1, 8)  # -1 is inferred as 16 / 8 = 2

The size of y is torch.Size([2, 8]), just as we want.

In-place operation

An in-place operation directly changes the content of a given tensor without making a copy. In-place operations in PyTorch are always suffixed with a _, like .add_(). The .resize_() mentioned above is also an in-place operation.
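A small sketch of the difference, using .add() and .add_() on a made-up tensor:

```python
import torch

x = torch.ones(3)

y = x.add(1)          # out-of-place: returns a new tensor, x is unchanged
x_before = x.clone()  # snapshot of x before the in-place call

z = x.add_(1)         # in-place: modifies x itself and returns it
```

After the last line, x holds twos, and z refers to the same memory as x.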

NumPy to Torch and back

PyTorch has a great feature for converting between NumPy arrays and Torch tensors. To create a tensor from a NumPy array, use torch.from_numpy(). To convert a tensor to a NumPy array, use the .numpy() method.

import numpy as np
import torch

a = np.random.rand(4, 3)
b = torch.from_numpy(a)  # NumPy array -> tensor
b.numpy()                # tensor -> NumPy array

The memory is shared between the NumPy array and the Torch tensor, so changing one in place also changes the other.
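The sharing is easy to check with a short sketch (the values are arbitrary). For contrast, it also uses torch.tensor(), which copies the data instead of sharing it:

```python
import numpy as np
import torch

a = np.ones(3)
b = torch.from_numpy(a)   # b reuses a's memory
b.mul_(2)                 # in-place change on the tensor shows up in a

c = b.numpy()             # c is again a view on the same memory
c[0] = -1.0               # visible in both a and b

d = torch.tensor(a)       # torch.tensor() makes an independent copy
a[1] = 7.0                # changes a and b, but not d
```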

sum() method

Passing the keyword dim=0 to sum() sums across the rows, i.e., it computes the sum of each column. Similarly, dim=1 sums across the columns, i.e., it computes the sum of each row.
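A quick way to remember this is that dim names the dimension that disappears. A minimal example on a made-up 2x3 matrix:

```python
import torch

m = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])   # shape (2, 3)

col_sums = m.sum(dim=0)   # collapses the rows:    tensor([5., 7., 9.])
row_sums = m.sum(dim=1)   # collapses the columns: tensor([ 6., 15.])
total = m.sum()           # no dim: sum of all elements, tensor(21.)
```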

Neural Network

The general process with PyTorch:

  • Make a forward pass through the network
  • Use the network output to calculate the loss
  • Perform a backward pass through the network with loss.backward() to calculate the gradients
  • Take a step with the optimizer to update the weights
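The four steps above can be sketched as a small training loop. Everything concrete here is an assumption for illustration: the one-layer model, the random data, the SGD optimizer and its learning rate. The loop also calls optimizer.zero_grad() first, because PyTorch accumulates gradients across backward passes.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical toy setup: 8 samples, 4 features, 3 classes.
model = nn.Linear(4, 3)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(8, 4)
labels = torch.randint(0, 3, (8,))

losses = []
for step in range(20):
    optimizer.zero_grad()             # clear gradients from the previous step
    output = model(inputs)            # 1. forward pass
    loss = criterion(output, labels)  # 2. compute the loss
    loss.backward()                   # 3. backward pass: compute gradients
    optimizer.step()                  # 4. update the weights
    losses.append(loss.item())
```

On this toy problem the recorded loss should go down over the 20 steps.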

Constructing Neural Networks

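import torch
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # 784 inputs (e.g. a flattened 28x28 image) -> 256 hidden units
        self.hidden = nn.Linear(784, 256)
        # 256 hidden units -> 10 output classes
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        # F.sigmoid is deprecated in favor of torch.sigmoid
        x = torch.sigmoid(self.hidden(x))
        # Softmax over dim=1 turns each row into class probabilities
        x = F.softmax(self.output(x), dim=1)
        return x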

It is mandatory to inherit from nn.Module when creating a class for our network. The name of the class itself can be anything.

PyTorch networks created with nn.Module must define a forward method. It takes in a tensor x and passes it through the layers you defined in the __init__ method. The backward function (where gradients are computed) is defined for you automatically by autograd.

Another way is mentioned in the course: nn.Sequential (see the documentation for details).

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
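The snippet above leaves input_size, hidden_sizes and output_size undefined. Here is a runnable sketch with assumed values (784 inputs, hidden layers of 128 and 64 units, 10 outputs); the sizes are placeholders, not values from the course.

```python
import torch
from torch import nn

# Assumed sizes, chosen only for illustration.
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))

x = torch.randn(5, input_size)   # a batch of 5 random inputs
probs = model(x)                 # shape (5, 10), each row sums to 1

first_layer = model[0]           # layers can be accessed by index
```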

Loss

PyTorch provides losses such as the cross-entropy loss (nn.CrossEntropyLoss). You’ll usually see the loss assigned to criterion. This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.

The input is expected to contain raw, unnormalized scores (logits) for each class, so the network should not apply softmax itself before this loss.

>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
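To check the claim that nn.CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss(), the two routes can be compared on the same scores. The scores and targets below are made up.

```python
import torch
from torch import nn

torch.manual_seed(0)
scores = torch.randn(3, 5)           # raw scores: 3 samples, 5 classes
target = torch.tensor([1, 0, 4])     # one class index per sample

# One step: cross-entropy directly on the raw scores.
ce = nn.CrossEntropyLoss()(scores, target)

# Two steps: log-softmax first, then negative log-likelihood.
log_probs = nn.LogSoftmax(dim=1)(scores)
nll = nn.NLLLoss()(log_probs, target)
```

Both routes should give the same value up to floating-point error.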

Autograd

The autograd package provides automatic differentiation for all operations on tensors. If a tensor’s requires_grad attribute is set to True, autograd tracks all operations on it. When you finish your computation, call .backward() to have all the gradients computed automatically. The gradient for this tensor is accumulated into its .grad attribute.
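A minimal sketch of this on a function whose gradient is easy to verify by hand (y is the sum of squares of x, so dy/dx = 2x). It also shows that .grad accumulates across calls to .backward().

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

y = (x ** 2).sum()        # y = x1^2 + x2^2 + x3^2
y.backward()              # computes dy/dx = 2x
first = x.grad.clone()    # tensor([2., 4., 6.])

# A second backward pass adds to the existing gradient.
z = (x ** 2).sum()
z.backward()
second = x.grad.clone()   # tensor([ 4.,  8., 12.])

x.grad.zero_()            # reset the accumulated gradient in place
```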

For more information, refer to autograd tutorial and autograd doc.

Testing and Validating

A common method to reduce overfitting is dropout, where we randomly drop (zero out) units of a layer’s input during training. Adding dropout in PyTorch is straightforward using the nn.Dropout module.

self.dropout = nn.Dropout(p=0.2)

Turning off gradient tracking when testing or validating speeds up the process and saves memory, since no gradients are needed there. Generally we can run the test code inside the following context:

with torch.no_grad():
    # testing code goes here
    pass

During training we use dropout to prevent overfitting, but during inference we want to use the entire network. So we need to turn off dropout during validation, testing, and whenever we use the network to make predictions. To do this, call model.eval(), which puts the model in evaluation mode, where dropout layers pass their input through unchanged. Turn dropout back on by returning to training mode with model.train().
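The effect of model.train() and model.eval() can be seen on a toy model that contains only a dropout layer. The model and input are assumptions for illustration. In training mode, nn.Dropout zeroes elements with probability p and scales the rest by 1 / (1 - p).

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model: just a dropout layer, to isolate its behavior.
model = nn.Sequential(nn.Dropout(p=0.2))
x = torch.ones(1000)

model.train()                 # dropout active
train_out = model(x)          # about 20% zeros, the rest scaled to 1.25

model.eval()                  # dropout disabled
with torch.no_grad():         # no gradient tracking during evaluation
    eval_out = model(x)       # identical to x
```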

Convolutional Neural Network

Convolutional Layer

We typically define a convolutional layer in PyTorch using nn.Conv2d (see the Conv2d documentation):

nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

  • in_channels: number of channels in the input image (1 for a grayscale image, 3 for an RGB image)
  • out_channels: number of channels produced by the convolution
  • kernel_size: size of the convolutional kernel (most commonly 3 for a 3x3 kernel)
  • stride: stride of the convolution (default: 1, a single number or a tuple)
  • padding: zero-padding added to both sides of the input (default: 0, a number or a tuple)
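To see how these parameters determine the output shape, here is a short sketch on a made-up batch of four 32x32 RGB images. Each spatial dimension of the output is floor((size + 2 * padding - kernel_size) / stride) + 1.

```python
import torch
from torch import nn

images = torch.randn(4, 3, 32, 32)   # (batch, channels, height, width)

# 3x3 kernel with padding=1 and stride=1 keeps the spatial size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3,
                 stride=1, padding=1)
features = conv(images)              # shape (4, 16, 32, 32)

# No padding and stride=2: floor((32 - 3) / 2) + 1 = 15.
strided = nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=0)
smaller = strided(images)            # shape (4, 16, 15, 15)
```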

Pooling Layer

nn.MaxPool2d(kernel_size, stride=None)
nn.AvgPool2d(kernel_size, stride=None)

  • kernel_size: the size of the window to pool over (take the max or the average)
  • stride: the stride of the window (default value is kernel_size)
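A small worked example on a made-up 4x4 input shows both layers. With kernel_size=2 and the default stride, the windows do not overlap and the spatial size is halved.

```python
import torch
from torch import nn

# One image, one channel, 4x4 pixels: shape (1, 1, 4, 4).
x = torch.tensor([[1., 2., 3., 4.],
                  [5., 6., 7., 8.],
                  [9., 10., 11., 12.],
                  [13., 14., 15., 16.]]).view(1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2)   # stride defaults to kernel_size
avg_pool = nn.AvgPool2d(kernel_size=2)

max_out = max_pool(x)   # [[ 6.,  8.], [14., 16.]]
avg_out = avg_pool(x)   # [[ 3.5,  5.5], [11.5, 13.5]]
```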