• .numel(): returns the number of elements in the tensor
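
For example, calling numel() on a \(3 \times 4\) tensor returns its element count:

import torch

x = torch.zeros(3, 4)
print(x.numel())  # 12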

Updating Weights with SGD

import torch.optim as optim

sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
  • Two parameters:
    • learning rate: controls the step size
    • momentum: controls the inertia of the optimizer
  • Bad values can lead to:
    • long training times
    • poor overall performance (low accuracy)
  • Learning rate:
    • Controls the step size
    • Too small leads to long training times
    • Too high leads to poor performance
    • Typical values between \(10^{-4}\) and \(10^{-2}\)
  • Momentum:
    • Controls the inertia of the optimizer
    • Zero momentum can leave the optimizer stuck in a local minimum
    • Non-zero momentum can help find the function's minimum
    • Typical values between 0.85 and 0.99

Layer Initialization

  • A layer's weights are initialized to small values
  • The outputs of a layer would explode if the inputs and the weights were not normalized
  • The weights can be initialized using different methods (for example, using a uniform distribution)
  • nn.init.uniform_(layer.weight)
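
A minimal sketch of re-initializing a linear layer's weights in place from a uniform distribution (the layer shape is just illustrative):

import torch.nn as nn

layer = nn.Linear(64, 128)
# Draw new weights from U(0, 1), the default bounds, in place
nn.init.uniform_(layer.weight)
print(layer.weight.min().item(), layer.weight.max().item())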

Transfer Learning and Fine-Tuning

  • Fine-Tuning = A type of transfer learning
    • Smaller learning rate
    • Not every layer is trained (we freeze some of them)
    • Rule of thumb: freeze early layers of network and fine-tune layers closer to output layer
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(64, 128),
                          nn.Linear(128, 256))

    # Freeze the first layer's weights so they are not updated during training
    for name, param in model.named_parameters():
        if name == '0.weight':
            param.requires_grad = False
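
    • To double-check what will actually be trained, inspect requires_grad after freezing. A small sketch building on the model above (also freezing the first layer's bias, per the rule of thumb):

    # Freeze every parameter of the first layer (weight and bias)
    for name, param in model.named_parameters():
        if name.startswith('0.'):
            param.requires_grad = False

    # Verify which parameters remain trainable
    for name, param in model.named_parameters():
        print(name, param.requires_grad)
    # 0.weight False, 0.bias False, 1.weight True, 1.bias True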
    

TensorDataset

import torch
from torch.utils.data import TensorDataset

# Instantiate the dataset class (X and y here are assumed to be NumPy arrays of features and targets)
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())

# Access an individual sample
sample = dataset[0]
input_sample, label_sample = sample
print('input sample:', input_sample)
print('label sample:', label_sample)

DataLoader

from torch.utils.data import DataLoader

batch_size = 2
shuffle = True  # Tell the dataloader to reshuffle the data at every epoch

# Create a DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

# Iterate over the dataloader
for batch_inputs, batch_labels in dataloader:
    print('batch inputs', batch_inputs)
    print('batch labels', batch_labels)

Calculating Training Loss

  • For each epoch:
    • we sum up the loss for each iteration of the training set dataloader
    • at the end of the epoch, we calculate the mean training loss
    training_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Run the forward pass
        ...
        
        # Calculate the loss
        loss = criterion(outputs, labels)
        # Calculate the gradients
        ...
        # Accumulate the loss over the epoch
        training_loss += loss.item()
    epoch_loss = training_loss / len(trainloader)
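
For reference, a minimal sketch of what the elided steps typically look like in a full training epoch (assuming model, criterion, optimizer, and trainloader are already defined):

training_loss = 0.0
for i, data in enumerate(trainloader, 0):
    features, labels = data
    optimizer.zero_grad()              # Reset gradients from the previous step
    outputs = model(features)          # Run the forward pass
    loss = criterion(outputs, labels)  # Calculate the loss
    loss.backward()                    # Calculate the gradients
    optimizer.step()                   # Update the weights
    training_loss += loss.item()       # Accumulate the loss over the epoch
epoch_loss = training_loss / len(trainloader)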
    

Calculating Validation Loss

After the training epoch, we iterate over the validation set and calculate the average validation loss

validation_loss = 0.0
model.eval() # Put model in evaluation mode
with torch.no_grad():   # Disable gradient tracking to speed up the forward pass
    for i, data in enumerate(validationloader, 0):
        # Run the forward pass
        ...
        # Calculate the loss
        loss = criterion(outputs, labels)
        validation_loss += loss.item()
epoch_loss = validation_loss / len(validationloader)
model.train()  # Put model back in training mode for the next epoch

Calculating Accuracy with Torchmetrics

import torchmetrics

# Create an accuracy metric using torchmetrics
metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)
for i, data in enumerate(dataloader, 0):
    features, labels = data
    outputs = model(features)
    # Calculate accuracy over the batch (argmax converts one-hot labels to class indices)
    acc = metric(outputs, labels.argmax(dim=-1))
# Calculate accuracy over the whole epoch
acc = metric.compute()
print(f"Accuracy on all data: {acc}")
# Reset the metric for the next epoch (training or validation)
metric.reset()

Overfitting

Common problems and their solutions:

  • Dataset is not large enough: get more data / use data augmentation
  • Model has too much capacity: reduce model size / add dropout
  • Weights are too large: use weight decay

"Regularization" using a dropout layer

  • Randomly zeroes out elements of the input tensor during training
    model = nn.Sequential(nn.Linear(8, 4),
                          nn.ReLU(),
                          nn.Dropout(p=0.5))
    features = torch.randn((1, 8))
    model(features)

  • Dropout is added after the activation function
  • Behaves differently during training and evaluation; we must remember to switch modes using model.train() and model.eval()

Regularization with weight decay

optimizer = optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
  • The optimizer's weight_decay parameter takes values between zero and one
    • Typically a small value, e.g. 1e-3
  • Weight decay adds a penalty to the loss function to discourage large weights and biases
  • The higher the parameter, the less likely the model is to overfit
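
As a sketch of the mechanism: with plain SGD (no momentum), setting weight_decay to \(\lambda\) adds \(\lambda w\) to the gradient, so each update is roughly \(w \leftarrow w - \text{lr}\,(\nabla L(w) + \lambda w)\); the extra term pulls the weights toward zero at every step.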