• .numel(): returns the number of elements in the tensor
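
For example, calling numel() on a \(3 \times 4\) tensor returns its element count:

import torch

x = torch.zeros(3, 4)
print(x.numel())  # 12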

Updating Weights with SGD

import torch.optim as optim

sgd = optim.SGD(model.parameters(), lr=0.01, momentum=0.95)
  • Two parameters:
    • learning rate: controls the step size
    • momentum: controls the inertia of the optimizer
  • Bad values can lead to:
    • long training times
    • poor overall performance (low accuracy)
  • Learning rate:
    • Controls the step size
    • Too small leads to long training times
    • Too high leads to poor performance
    • Typical values between \(10^{-4}\) and \(10^{-2}\)
  • Momentum:
    • Controls the inertia of the optimizer
    • Zero momentum can leave the optimizer stuck in a local minimum
    • Non-zero momentum can help find the function's minimum
    • Typical values between 0.85 and 0.99

Layer Initialization

  • A layer's weights are initialized to small values
  • The outputs of a layer would explode if the inputs and the weights were not normalized
  • The weights can be initialized using different methods (for example, using a uniform distribution)
  • nn.init.uniform_(layer.weight)
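
A minimal sketch of re-initializing a linear layer's weights in place from a uniform distribution (the layer shape is just illustrative):

import torch.nn as nn

layer = nn.Linear(64, 128)
# Draw new weights from U(0, 1), the default bounds, in place
nn.init.uniform_(layer.weight)
print(layer.weight.min().item(), layer.weight.max().item())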

Transfer Learning and Fine-Tuning

  • Fine-Tuning = A type of transfer learning
    • Smaller learning rate
    • Not every layer is trained (we freeze some of them)
    • Rule of thumb: freeze early layers of network and fine-tune layers closer to output layer
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(64, 128),
                          nn.Linear(128, 256))

    # Freeze the first layer's weights so they are not updated during training
    for name, param in model.named_parameters():
        if name == '0.weight':
            param.requires_grad = False
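
    • To double-check what will actually be trained, inspect requires_grad after freezing. A small sketch building on the model above (also freezing the first layer's bias, per the rule of thumb):

    # Freeze every parameter of the first layer (weight and bias)
    for name, param in model.named_parameters():
        if name.startswith('0.'):
            param.requires_grad = False

    # Verify which parameters remain trainable
    for name, param in model.named_parameters():
        print(name, param.requires_grad)
    # 0.weight False, 0.bias False, 1.weight True, 1.bias True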
    

TensorDataset

import torch
from torch.utils.data import TensorDataset

# Instantiate the dataset class (X and y here are assumed to be NumPy arrays of features and targets)
dataset = TensorDataset(torch.tensor(X).float(), torch.tensor(y).float())

# Access an individual sample
sample = dataset[0]
input_sample, label_sample = sample
print('input sample:', input_sample)
print('label sample:', label_sample)

DataLoader

from torch.utils.data import DataLoader

batch_size = 2
shuffle = True  # Tell the dataloader to reshuffle the data at every epoch

# Create a DataLoader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

# Iterate over the dataloader
for batch_inputs, batch_labels in dataloader:
    print('batch inputs', batch_inputs)
    print('batch labels', batch_labels)

Calculating Training Loss

  • For each epoch:
    • we sum up the loss for each iteration of the training set dataloader
    • at the end of the epoch, we calculate the mean training loss
    training_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # Run the forward pass
        ...
        
        # Calculate the loss
        loss = criterion(outputs, labels)
        # Calculate the gradients
        ...
        # Accumulate the loss over the epoch
        training_loss += loss.item()
    epoch_loss = training_loss / len(trainloader)
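
For reference, a minimal sketch of what the elided steps typically look like in a full training epoch (assuming model, criterion, optimizer, and trainloader are already defined):

training_loss = 0.0
for i, data in enumerate(trainloader, 0):
    features, labels = data
    optimizer.zero_grad()              # Reset gradients from the previous step
    outputs = model(features)          # Run the forward pass
    loss = criterion(outputs, labels)  # Calculate the loss
    loss.backward()                    # Calculate the gradients
    optimizer.step()                   # Update the weights
    training_loss += loss.item()       # Accumulate the loss over the epoch
epoch_loss = training_loss / len(trainloader)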
    

Calculating Validation Loss

After the training epoch, we iterate over the validation set and calculate the average validation loss

validation_loss = 0.0
model.eval() # Put model in evaluation mode
with torch.no_grad():   # Disable gradient tracking to speed up the forward pass
    for i, data in enumerate(validationloader, 0):
        # Run the forward pass
        ...
        # Calculate the loss
        loss = criterion(outputs, labels)
        validation_loss += loss.item()
epoch_loss = validation_loss / len(validationloader)
model.train()  # Put model back in training mode for the next epoch

Calculating Accuracy with Torchmetrics

import torchmetrics

# Create an accuracy metric using torchmetrics
metric = torchmetrics.Accuracy(task="multiclass", num_classes=3)
for i, data in enumerate(dataloader, 0):
    features, labels = data
    outputs = model(features)
    # Calculate accuracy over the batch (argmax converts one-hot labels to class indices)
    acc = metric(outputs, labels.argmax(dim=-1))
# Calculate accuracy over the whole epoch
acc = metric.compute()
print(f"Accuracy on all data: {acc}")
# Reset the metric for the next epoch (training or validation)
metric.reset()

Overfitting

Common problems and their solutions:

  • Dataset is not large enough: get more data / use data augmentation
  • Model has too much capacity: reduce model size / add dropout
  • Weights are too large: use weight decay

"Regularization" using a dropout layer

  • Randomly zeroes out elements of the input tensor during training
    model = nn.Sequential(nn.Linear(8, 4),
                          nn.ReLU(),
                          nn.Dropout(p=0.5))
    features = torch.randn((1, 8))
    model(features)

  • Dropout is added after the activation function
  • Behaves differently during training and evaluation; we must remember to switch modes using model.train() and model.eval()

Regularization with weight decay

optimizer = optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-4)
  • The optimizer's weight_decay parameter takes values between zero and one
    • Typically a small value, e.g. 1e-3
  • Weight decay adds a penalty to the loss function to discourage large weights and biases
  • The higher the parameter, the less likely the model is to overfit
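
As a sketch of the mechanism: with plain SGD (no momentum), setting weight_decay to \(\lambda\) adds \(\lambda w\) to the gradient, so each update is roughly \(w \leftarrow w - \text{lr}\,(\nabla L(w) + \lambda w)\); the extra term pulls the weights toward zero at every step.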