`.numel()`: returns the number of elements in the tensor
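As a quick sketch (the two-layer model here is a hypothetical example), `.numel()` is handy for counting a model's parameters:

```python
import torch.nn as nn

# A hypothetical two-layer model, purely for illustration
model = nn.Sequential(nn.Linear(64, 128), nn.Linear(128, 256))

# Sum .numel() over all parameter tensors to count the parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total_params}")  # 8320 + 33024 = 41344
```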
Updating Weights with SGD
```python
import torch.optim as optim

sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
```
- Two parameters:
  - learning rate: controls the step size
  - momentum: controls the inertia of the optimizer
- Bad values can lead to:
  - long training times
  - poor overall performance (low accuracy)
| Learning Rate | Momentum |
| --- | --- |
| Controls the step size | Controls the inertia of the optimizer |
| Too small leads to long training times | Zero momentum can leave the optimizer stuck in a local minimum |
| Too high leads to poor performance | Non-zero momentum can help find the function's minimum |
| Typical values between \(10^{-4}\) and \(10^{-2}\) | Typical values between 0.85 and 0.99 |
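A typical update cycle with this optimizer, as a minimal sketch (assuming `model`, `criterion`, `features`, and `labels` are already defined):

```python
sgd.zero_grad()                    # Reset the gradients from the previous step
outputs = model(features)          # Run the forward pass
loss = criterion(outputs, labels)  # Calculate the loss
loss.backward()                    # Compute the gradients
sgd.step()                         # Update the weights using lr and momentum
```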
Layer Initialization
- A layer's weights are initialized to small values
- A layer's outputs can explode if its inputs and weights are not normalized
- Weights can be initialized using different methods (for example, drawing from a uniform distribution)
```python
nn.init.uniform_(layer.weight)
```
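A self-contained sketch of the same idea (the layer shape is arbitrary, and zero-initializing the bias is just one common choice):

```python
import torch.nn as nn

layer = nn.Linear(64, 128)

# Re-initialize the weights in place from a uniform distribution over [0, 1)
nn.init.uniform_(layer.weight)

# The bias can be initialized too, e.g. to zero
nn.init.zeros_(layer.bias)
```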
Transfer Learning and Fine-Tuning
- Fine-tuning is a type of transfer learning
- Use a smaller learning rate
- Not every layer is trained (we freeze some of them)
- Rule of thumb: freeze the early layers of the network and fine-tune the layers closer to the output layer, as in the snippet below
```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128),
                      nn.Linear(128, 256))

# Freeze the weights of the first layer
for name, param in model.named_parameters():
    if name == '0.weight':
        param.requires_grad = False
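```

A natural follow-up, sketched here assuming the `model` above: pass only the parameters that remain trainable to the optimizer, with a smaller learning rate for fine-tuning:

```python
import torch.optim as optim

# Optimize only the unfrozen parameters, with a smaller learning rate
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable_params, lr=1e-4, momentum=0.95)
```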
TensorDataset
```python
import numpy as np
import torch
from torch.utils.data import TensorDataset

# Example data, assumed here for illustration: 10 samples, 4 features each
X = np.random.rand(10, 4)
y = np.random.rand(10, 1)

# Instantiate the dataset class
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())

# Access an individual sample
sample = dataset[0]
input_sample, label_sample = sample
print('input sample:', input_sample)
print('label sample:', label_sample)
```
DataLoader
```python
from torch.utils.data import DataLoader

batch_size = 2
shuffle = True  # Tell the dataloader to reshuffle at each epoch

# Create a DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

# Iterate over the dataloader
for batch_inputs, batch_labels in dataloader:
    print('batch inputs', batch_inputs)
    print('batch labels', batch_labels)
```
Calculating Training Loss
- For each epoch:
  - we sum up the loss over each iteration of the training set dataloader
  - at the end of the epoch, we calculate the mean training loss
```python
training_loss = 0.0
for i, data in enumerate(trainloader, 0):
    # Run the forward pass
    ...
    # Calculate the loss
    loss = criterion(outputs, labels)
    # Calculate the gradients
    ...
    # Sum the loss
    training_loss += loss.item()
epoch_loss = training_loss / len(trainloader)
```
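Filling in the elided steps, a minimal sketch of one full training epoch (assuming `model`, `criterion`, `optimizer`, and `trainloader` are defined) could look like:

```python
training_loss = 0.0
for inputs, labels in trainloader:
    optimizer.zero_grad()              # Reset gradients from the previous step
    outputs = model(inputs)            # Run the forward pass
    loss = criterion(outputs, labels)  # Calculate the loss
    loss.backward()                    # Calculate the gradients
    optimizer.step()                   # Update the weights
    training_loss += loss.item()       # Sum the loss
epoch_loss = training_loss / len(trainloader)
```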
Calculating Validation Loss
After the training epoch, we iterate over the validation set and calculate the average validation loss
```python
validation_loss = 0.0
model.eval()  # Put the model in evaluation mode

with torch.no_grad():  # Disable gradient tracking to speed up the forward pass
    for i, data in enumerate(validationloader, 0):
        # Run the forward pass
        ...
        # Calculate the loss
        loss = criterion(outputs, labels)
        validation_loss += loss.item()

epoch_loss = validation_loss / len(validationloader)
model.train()  # Put the model back in training mode
```
Calculating Accuracy with Torchmetrics
```python
import torchmetrics

# Create an accuracy metric
metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)

for i, data in enumerate(dataloader, 0):
    features, labels = data
    outputs = model(features)
    # Calculate accuracy over the batch
    acc = metric(outputs, labels.argmax(dim=-1))

# Calculate accuracy over the whole epoch
acc = metric.compute()
print(f"Accuracy on all data: {acc}")

# Reset the metric for the next epoch (training or validation)
metric.reset()
```
Overfitting
| Problem | Solutions |
| --- | --- |
| Dataset is not large enough | Get more data / use data augmentation |
| Model has too much capacity | Reduce model size / add dropout |
| Weights are too large | Weight decay |
"Regularization" using a dropout layer
- Randomly zeroes out elements of the input tensor during training
```python
model = nn.Sequential(nn.Linear(8, 4),
                      nn.ReLU(),
                      nn.Dropout(p=0.5))
features = torch.randn((1, 8))
model(features)
```
- Dropout is added after the activation function
- Dropout behaves differently during training and evaluation; we must remember to switch modes using `model.train()` and `model.eval()`, as in the sketch below
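A quick sketch of that difference, reusing the `model` and `features` defined above:

```python
# In training mode, dropout randomly zeroes activations,
# so repeated calls give different outputs
model.train()
print(model(features))

# In evaluation mode, dropout is disabled and the output is deterministic
model.eval()
print(model(features))
```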
Regularization with weight decay
```python
optimizer = optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
```
- The optimizer's `weight_decay` parameter takes values between zero and one
- It is typically a small value, e.g. 1e-3
- Weight decay adds a penalty to the loss function to discourage large weights and biases
- The higher the parameter, the less likely the model is to overfit
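Concretely, for plain SGD a `weight_decay` value \(\lambda\) is equivalent to adding an L2 penalty \(\frac{\lambda}{2}\lVert w \rVert^2\) to the loss, so each update becomes \(w \leftarrow w - \mathrm{lr}\,(\nabla L + \lambda w)\), nudging the weights toward zero at every step.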