PyTorch

📌 Introduction

In this blog post, I will walk you through the basics of PyTorch, an open-source deep learning framework used to build AI products such as Tesla Autopilot, GPT-3, and Stable Diffusion. It’s very popular because:

  • Its API is easy to read and write, much like plain Python
  • It has a strong open-source community
  • It is widely used in research

Before we start, this is how to import PyTorch:

import torch

🔢 Tensors

I think that it’s pretty reasonable to think of PyTorch as:
PyTorch = Python + Tensors + Autograd

ℹ️ Note:

This is a simplified way to think about PyTorch. In reality, it also includes other things such as modules and GPU acceleration, but for the purpose of this blog post, I’ll use this analogy.
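
For completeness, here’s a tiny sketch of what GPU acceleration looks like (we won’t need it for the rest of this post): you can move tensors to a CUDA device if one is available.

# move a tensor to the GPU if one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(2, 3).to(device)
print(x.device)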

The first thing I will talk about is tensors.
Tensors are the heart of PyTorch, but a tensor is essentially just a (multi-dimensional) array with several useful extra features.
This is how you initialize a tensor:

x = torch.tensor([3.14, 1.59])
print(x)

Output:

tensor([3.1400, 1.5900])

Now, I will introduce some basic functions for creating tensors:

# uninitialized
x = torch.empty(2, 3)
print(f"empty(2, 3):\n{x}\n")

# random
x = torch.rand(4, 2)
print(f"rand(4, 2):\n{x}\n")

# zeros, ones
x = torch.zeros(3)
print(f"zeros(3):\n{x}\n")
x = torch.ones(2, 3)
print(f"ones(2, 3):\n{x}\n")

Output:

empty(2, 3):
tensor([[1.4013e-45, 0.0000e+00, 3.9236e-44],
        [0.0000e+00, 0.0000e+00, 0.0000e+00]])

rand(4, 2):
tensor([[0.6997, 0.2806],
        [0.2932, 0.3877],
        [0.0806, 0.2674],
        [0.5697, 0.7124]])

zeros(3):
tensor([0., 0., 0.])

ones(2, 3):
tensor([[1., 1., 1.],
        [1., 1., 1.]])

Here you might ask, “What’s the difference between torch.empty and torch.zeros?” The difference is that torch.empty allocates a tensor without initializing its values, while torch.zeros fills the tensor with zeros. Because it skips the initialization step, torch.empty runs slightly faster than torch.zeros. The difference is usually not noticeable, but at a large scale it can add up.
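
If you’re curious, you can time the difference yourself. Here’s a rough micro-benchmark sketch; the exact numbers (and whether the gap matters at all) depend on your machine and allocator:

import timeit

# allocate a large tensor 100 times each way and compare wall-clock time
n = 10_000_000
t_empty = timeit.timeit(lambda: torch.empty(n), number=100)
t_zeros = timeit.timeit(lambda: torch.zeros(n), number=100)
print(f"empty: {t_empty:.3f}s, zeros: {t_zeros:.3f}s")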

These are some functions that let us inspect and change the sizes and data types of tensors:

# size
print(f"size:\n{x.size()}\n")
print(f"shape:\n{x.shape}\n")

# data type
print(f"data type:\n{x.dtype}\n")

# specify types (float32 is default)
x = torch.ones(1, 2, dtype=torch.float16)
print(f"new data type:\n{x.dtype}\n")

Output:

size:
torch.Size([2, 3])

shape:
torch.Size([2, 3])

data type:
torch.float32

new data type:
torch.float16
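
You can also convert an existing tensor to a different data type after creating it, for example with .to() or shortcuts like .long():

x = torch.ones(2, 3)       # float32 by default
y = x.to(torch.float64)    # convert to double precision
z = x.long()               # convert to 64-bit integers
print(y.dtype, z.dtype)

Output:

torch.float64 torch.int64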

In machine learning, linear algebra is used heavily because it lets computations run efficiently. As a machine learning library, PyTorch lets us do arithmetic directly on tensors; here are some basic element-wise operations (which are pretty straightforward):

x = torch.ones(2, 2)
y = torch.rand(2, 2)

# add
z = x + y
print(z)

# subtract
z = x - y
print(z)

# multiply
z = x * y
print(z)

# divide
z = x / y
print(z)

Output:

tensor([[1.6110, 1.0638],
        [1.9629, 1.8989]])
tensor([[0.3890, 0.9362],
        [0.0371, 0.1011]])
tensor([[0.6110, 0.0638],
        [0.9629, 0.8989]])
tensor([[ 1.6368, 15.6769],
        [ 1.0386,  1.1125]])
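
The operations above are all element-wise. For actual linear algebra, such as matrix multiplication, you can use torch.matmul or the @ operator. A quick sketch:

a = torch.rand(2, 3)
b = torch.rand(3, 4)

# matrix multiplication: (2, 3) @ (3, 4) -> (2, 4)
c = a @ b  # same as torch.matmul(a, b)
print(c.shape)

Output:

torch.Size([2, 4])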

These are some other functions that might be useful:

x = torch.rand(5, 3)
print(x)

# Slicing
print(f"x[:, 0]: {x[:, 0]}") # all rows, column 0
print(f"x[1, :]: {x[1, :]}") # row 1, all columns
print(f"x[1, 1]: {x[1, 1]}\n") # element at 1, 1

# Reshape (x has 5 * 3 = 15 elements)
y = x.view(1, 15)
print(f"x.view(1, 15):\n{y}")

# -1 tells PyTorch to infer that dimension (here 15 / 3 = 5)
z = x.view(3, -1)
print(f"x.view(3, -1):\n{z}\n")

# torch to numpy
y = x.numpy()
print(y)
print(type(y))

Output:

tensor([[0.1636, 0.0014, 0.1437],
        [0.5086, 0.3499, 0.9198],
        [0.2150, 0.2378, 0.7212],
        [0.9720, 0.6074, 0.4831],
        [0.5725, 0.7592, 0.3244]])
x[:, 0]: tensor([0.1636, 0.5086, 0.2150, 0.9720, 0.5725])
x[1, :]: tensor([0.5086, 0.3499, 0.9198])
x[1, 1]: 0.34985458850860596

x.view(1, 15):
tensor([[0.1636, 0.0014, 0.1437, 0.5086, 0.3499, 0.9198, 0.2150, 0.2378, 0.7212,
         0.9720, 0.6074, 0.4831, 0.5725, 0.7592, 0.3244]])
x.view(3, -1):
tensor([[0.1636, 0.0014, 0.1437, 0.5086, 0.3499],
        [0.9198, 0.2150, 0.2378, 0.7212, 0.9720],
        [0.6074, 0.4831, 0.5725, 0.7592, 0.3244]])

[[0.16361332 0.0014295  0.1437028 ]
 [0.508555   0.3498546  0.91982764]
 [0.21500635 0.2377767  0.7212155 ]
 [0.97201943 0.6073574  0.483128  ]
 [0.57247156 0.7592116  0.3243752 ]]
<class 'numpy.ndarray'>
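
One thing to keep in mind: for a tensor on the CPU, the array returned by .numpy() shares memory with the tensor (and torch.from_numpy shares memory in the other direction), so modifying one modifies the other:

import numpy as np

a = torch.ones(3)
b = a.numpy()
a.add_(1)                # in-place add on the tensor...
print(b)                 # ...is visible in the NumPy array: [2. 2. 2.]

c = np.zeros(3)
d = torch.from_numpy(c)
c += 1                   # changing the array...
print(d)                 # ...changes the tensor too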

⚙️ Autograd

Remember I said PyTorch = Python + Tensors + Autograd? So far, PyTorch might look no different from NumPy (a Python library for working with arrays). However, autograd is what makes PyTorch special.
Autograd stands for ‘automatic differentiation’. In machine learning, training a model means minimizing the cost, and to minimize the cost you have to tweak the model’s weights. Gradients (vectors of partial derivatives) tell us how much to tweak each weight to lower the cost. PyTorch’s autograd applies the chain rule of calculus for us automatically, and that’s a big part of why so many people use it.
Now I’ll show you how to use autograd in PyTorch.
To enable gradient tracking for a tensor, set requires_grad=True (it’s False by default). Here’s an example of using autograd:

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2 + 1

# .backward() runs backpropagation
y.backward()
# .grad holds the gradient of y with respect to x
print(x.grad)

# the gradient accumulates
# so it's good practice to set the gradient back to 0
x.grad.zero_()
print(x.grad)

Output:

tensor([4.])
tensor([0.])
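
Autograd also handles compositions of operations, applying the chain rule through intermediate variables. A quick sketch:

x = torch.tensor([2.0], requires_grad=True)
y = x ** 2       # dy/dx = 2x
z = 3 * y + 1    # dz/dy = 3

z.backward()
# chain rule: dz/dx = dz/dy * dy/dx = 3 * 2x = 12 at x = 2
print(x.grad)

Output:

tensor([12.])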

This is how you stop autograd:

# stopping gradient tracking
# .requires_grad_(False)
x = torch.ones(2, requires_grad=True)
print(x.requires_grad)
x.requires_grad_(False)
print(x.requires_grad)

Output:

True
False

If you want a new tensor that is not connected to the computational graph (the chain), you can use .detach():

x = torch.tensor([2.0], requires_grad=True)
y = x * 3
z = y.detach() 

print(y.requires_grad)
print(z.requires_grad)

Output:

True
False

Finally, you can use the with torch.no_grad(): context manager to temporarily stop gradient tracking:

x = torch.randn(2, 2, requires_grad=True)
print(x.requires_grad)
with torch.no_grad():
    y = x ** 2
    print(y.requires_grad)

Output:

True
False

🏗️ Building and Training a Simple Model

Now we have all the building blocks, and it’s time to actually build something! Here, we will build a simple linear regression model (if you’re not familiar with linear regression, I encourage you to read this blog post):

# Import libraries
import torch
import torch.nn as nn


# Create dataset
torch.manual_seed(314)
X = torch.rand(100, 1) * 10
# y = 2x + 3 + noise
y = 2 * X + 3 + torch.randn(100, 1)


# Define model
class LinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        # input_dim=1, output_dim=1
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)


# Model, loss, and optimizer
model = LinearRegressionModel()
# Mean squared error
criterion = nn.MSELoss()
# Stochastic gradient descent
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)


# Training loop
epochs = 1000
for epoch in range(epochs):
    # Forward propagation
    y_pred = model(X)

    # Compute loss
    loss = criterion(y_pred, y)

    # Backward propagation & optimization
    optimizer.zero_grad()
    loss.backward()
    # Update weights
    optimizer.step()

    if (epoch+1) % 100 == 0:
        print(f'Epoch {epoch+1}/{epochs}, Loss: {loss.item():.4f}')

w, b = model.parameters()
print(f"Learned weight: {w.item():.2f}, bias: {b.item():.2f}")

Output:

Epoch 100/1000, Loss: 1.4866
Epoch 200/1000, Loss: 1.1636
Epoch 300/1000, Loss: 1.0480
Epoch 400/1000, Loss: 1.0067
Epoch 500/1000, Loss: 0.9919
Epoch 600/1000, Loss: 0.9866
Epoch 700/1000, Loss: 0.9847
Epoch 800/1000, Loss: 0.9840
Epoch 900/1000, Loss: 0.9838
Epoch 1000/1000, Loss: 0.9837
Learned weight: 2.00, bias: 2.92
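
Once the model is trained, we can use it to make predictions. Since we’re no longer training, we can wrap the forward pass in torch.no_grad() (a small sketch; the exact value you get depends on the learned parameters):

with torch.no_grad():
    x_new = torch.tensor([[4.0]])
    y_new = model(x_new)
    print(f"Prediction for x=4: {y_new.item():.2f}")

Since the data was generated from y = 2x + 3, the prediction should come out close to 2 * 4 + 3 = 11.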

This is it! We’ve now built and trained our first model 😊

If you’d like to learn more about PyTorch, the PyTorch Documentation is a great place to start. It will guide you from installation through all the main components of PyTorch.

✅ Summary

In this blog post, we’ve learned the basics of PyTorch, one of the essential tools in deep learning. We’ve covered tensors, autograd, and training a simple model, which covers most of what you need to start building and training real models. Thanks for reading, and I’ll see you in the next blog post!