Tutorials Logic, IN info@tutorialslogic.com

PyTorch Capstone: Build, Train and Serve an Image Classifier

PyTorch Capstone

This capstone brings the PyTorch tutorial together as a real developer project. You will structure an image classifier so data loading, model definition, training, evaluation, checkpointing, and inference are separated into files that can be tested and maintained.

The project is intentionally practical. It teaches the decisions behind the code: where transforms belong, how labels are mapped, why training and validation modes are different, what to save in a checkpoint, how to load on another device, and how to write inference code that behaves like the training pipeline.


Study this capstone as a practical PyTorch lesson rather than a list of files. For each stage, name the input, the operation that changes it, and the result you should be able to predict. For example, a JPEG passes through Resize, ToTensor, and Normalize and should come out as a float tensor of shape [3, 224, 224].

Each section connects a definition with a working scenario, a mistake beginners actually make, and the check that proves the fix. That makes the material useful for coding, debugging, and interview revision.

Mental Model

A PyTorch project becomes production-ready when the training path and inference path share the same assumptions about shape, dtype, normalization, labels, and model architecture.
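One way to make those shared assumptions explicit is to keep them in a single record that is saved with the checkpoint and compared when inference starts. This is a minimal sketch: PreprocessContract and check_contract are hypothetical names for illustration, not part of PyTorch.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PreprocessContract:
    """Hypothetical record of what training assumed about its inputs."""
    image_size: int
    mean: tuple
    std: tuple
    class_names: tuple

    def to_dict(self) -> dict:
        # Plain lists and ints only, so the dict can be stored inside a checkpoint
        return {key: list(value) if isinstance(value, tuple) else value
                for key, value in asdict(self).items()}

    @classmethod
    def from_dict(cls, data: dict) -> "PreprocessContract":
        return cls(
            image_size=int(data["image_size"]),
            mean=tuple(data["mean"]),
            std=tuple(data["std"]),
            class_names=tuple(data["class_names"]),
        )


def check_contract(saved: PreprocessContract, current: PreprocessContract) -> list:
    """Return human-readable mismatches; an empty list means the two sides agree."""
    problems = []
    for field_name in ("image_size", "mean", "std", "class_names"):
        saved_value = getattr(saved, field_name)
        current_value = getattr(current, field_name)
        if saved_value != current_value:
            problems.append(f"{field_name}: saved={saved_value} current={current_value}")
    return problems


training = PreprocessContract(224, (0.485, 0.456, 0.406), (0.229, 0.224, 0.225), ("cats", "dogs"))
serving = PreprocessContract(256, training.mean, training.std, training.class_names)
print(check_contract(training, serving))  # ['image_size: saved=224 current=256']
```

Failing loudly at startup on a mismatch is usually cheaper than discovering a silent accuracy drop in production.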

Project Architecture

A complete PyTorch project should not hide everything inside one notebook. Notebooks are useful for exploration, but durable projects need files with clear responsibilities. This makes experiments reproducible and deployment simpler.

The capstone supports two modeling choices: a small custom CNN for learning fundamentals, or a transfer learning model for better accuracy on limited data. The rest of the project structure stays almost the same.

  • data/: image folders arranged by class name.
  • src/data.py: transforms, datasets, dataloaders, and class names.
  • src/model.py: model factory for CNN or transfer learning.
  • src/train.py: training loop, validation loop, metrics, and checkpoints.
  • src/infer.py: safe prediction code for one image or a batch.
  • checkpoints/: saved weights, metadata, metrics, and label mapping.
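The model factory in this capstone only implements the transfer learning option. For the custom CNN option, a minimal sketch could look like the following. build_small_cnn and its layer sizes are illustrative assumptions, chosen so the model accepts the same [batch, 3, 224, 224] input and returns one raw logit per class, which lets it plug into the same training loop.

```python
import torch
from torch import nn


def build_small_cnn(num_classes: int) -> nn.Module:
    """Hypothetical from-scratch CNN with the same input/output contract as the ResNet18 factory."""
    def block(in_channels: int, out_channels: int) -> nn.Sequential:
        # conv -> batch norm -> ReLU -> halve the spatial size
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )

    return nn.Sequential(
        block(3, 16),
        block(16, 32),
        block(32, 64),
        nn.AdaptiveAvgPool2d(1),  # collapses any remaining H x W to 1 x 1
        nn.Flatten(),
        nn.Dropout(p=0.25),
        nn.Linear(64, num_classes),  # raw logits, no softmax
    )


if __name__ == "__main__":
    model = build_small_cnn(num_classes=2).eval()
    with torch.inference_mode():
        logits = model(torch.randn(4, 3, 224, 224))
    print(logits.shape)  # torch.Size([4, 2])
```

Because the output contract matches, switching between this model and the ResNet18 factory should not require changes to the training or inference code.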

Dataset Contract

Use a folder-per-class layout because it works well with ImageFolder and makes label mapping explicit. Keep validation data separate from training data. If you split programmatically, save the split so future runs evaluate on the same examples.
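If you split programmatically, a minimal sketch is to shuffle the file list with a fixed seed and write the result to disk, so later runs reload the saved split instead of re-splitting. The function names and the split.json filename are illustrative assumptions.

```python
import json
import random
from pathlib import Path


def make_split(files, val_fraction=0.2, seed=42):
    """Deterministically split file paths into train/val lists."""
    ordered = sorted(str(f) for f in files)  # sort first so the input order does not matter
    rng = random.Random(seed)  # private generator: the global random state is untouched
    rng.shuffle(ordered)
    n_val = int(len(ordered) * val_fraction)
    return {"train": sorted(ordered[n_val:]), "val": sorted(ordered[:n_val])}


def save_split(split, path="split.json"):
    Path(path).write_text(json.dumps(split, indent=2))


def load_split(path="split.json"):
    # Later runs call this instead of make_split, so the validation files never change
    return json.loads(Path(path).read_text())


files = [f"cats/{i}.jpg" for i in range(10)]
split = make_split(files)
print(len(split["train"]), len(split["val"]))  # 8 2
```

Storing split.json next to the checkpoint means a later run, or a teammate, evaluates on exactly the same images.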

  • Resize or crop images to a stable input size.
  • Normalize using the same mean and standard deviation during training and inference.
  • Use augmentation only for training, not validation or inference.
  • Save class_names so prediction output is human-readable and stable.

PyTorch Capstone: Build, Train and Serve an Image Classifier in Real Work

In real projects, this capstone changes how the code is written, tested, and debugged. The normal flow is: the developer writes the data, model, training, and inference files; PyTorch runs the training loop and saves a checkpoint; a separate inference script loads that checkpoint and returns a class name with a confidence score.

Avoid stopping at syntax. For each file, ask why it exists, what problem it removes, and what would become harder without it. For example, without a saved class_names list, a prediction of index 1 cannot be turned back into a label.

  • Concrete problem: keeping training and inference consistent, so a model that scores well on validation data also behaves correctly in deployment.
  • Normal input, operation, and output: an RGB image file goes through resize, tensor conversion, and normalization, the model produces logits, and the script returns a class name and confidence.
  • Nearby alternative: a small custom CNN trained from scratch. It is easier to read, but the transfer learning model usually reaches better accuracy when data is limited.
  • Real project task: loading checkpoints/best.pt in a separate process, such as an API endpoint, and getting the same prediction the validation loop would produce.

Recommended Project Layout

This layout keeps the model lifecycle readable from raw data to prediction.

pytorch-image-classifier/
  requirements.txt
  data/
    train/
      cats/
      dogs/
    val/
      cats/
      dogs/
  checkpoints/
  src/
    data.py
    model.py
    train.py
    infer.py

  • The class folder names become labels, so rename them carefully before training.
  • Keep train and validation folders separate to avoid data leakage.
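A worked example of the normal path versus a boundary case: ImageFolder builds its label mapping separately for train/ and val/, so if one split is missing a class folder, the indices silently shift. The helper below (hypothetical names, not a torchvision API) compares the class folder names before training starts, assuming the data/train and data/val layout shown above.

```python
import tempfile
from pathlib import Path


def class_folders(split_dir: Path) -> list:
    # ImageFolder's rule: each subdirectory is a class, indexed in sorted order
    return sorted(p.name for p in split_dir.iterdir() if p.is_dir())


def check_class_folders(data_dir="data") -> list:
    """Return the shared class list, or raise ValueError if train/ and val/ disagree."""
    data_dir = Path(data_dir)
    train_classes = class_folders(data_dir / "train")
    val_classes = class_folders(data_dir / "val")
    if train_classes != val_classes:
        missing_in_val = sorted(set(train_classes) - set(val_classes))
        missing_in_train = sorted(set(val_classes) - set(train_classes))
        raise ValueError(
            f"class folders differ: missing in val={missing_in_val}, "
            f"missing in train={missing_in_train}"
        )
    return train_classes


with tempfile.TemporaryDirectory() as root:
    for split in ("train", "val"):
        for name in ("cats", "dogs"):
            (Path(root) / split / name).mkdir(parents=True)

    # Normal path: both splits agree, so index 0 means "cats" everywhere
    print(check_class_folders(root))  # ['cats', 'dogs']

    # Boundary case: val/cats is missing, so ImageFolder would map dogs -> 0 in val
    (Path(root) / "val" / "cats").rmdir()
    try:
        check_class_folders(root)
    except ValueError as err:
        print(err)  # class folders differ: missing in val=['cats'], missing in train=[]
```

In the boundary case, validation accuracy would be measured against the wrong labels without any error being raised, which is why the check belongs before the first epoch.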

DataLoaders with Training and Validation Transforms

Training data gets augmentation. Validation data should represent real evaluation without random distortions.

# src/data.py
from pathlib import Path

import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

IMAGE_SIZE = 224
# ImageNet statistics, matching the pretrained ResNet18 weights
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def build_transforms(train: bool):
    if train:
        # Random augmentation is applied to training images only
        return transforms.Compose([
            transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomRotation(degrees=8),
            transforms.ToTensor(),
            transforms.Normalize(MEAN, STD),
        ])

    # Deterministic pipeline shared by validation and inference
    return transforms.Compose([
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.ToTensor(),
        transforms.Normalize(MEAN, STD),
    ])

def build_loaders(data_dir="data", batch_size=32, num_workers=2):
    data_dir = Path(data_dir)
    # ImageFolder maps sorted class folder names to indices 0..N-1
    train_ds = datasets.ImageFolder(data_dir / "train", transform=build_transforms(True))
    val_ds = datasets.ImageFolder(data_dir / "val", transform=build_transforms(False))
    # Pinned memory only helps when batches are copied to a CUDA device
    pin_memory = torch.cuda.is_available()

    train_loader = DataLoader(
        train_ds,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
    val_loader = DataLoader(
        val_ds,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=pin_memory,
    )
    return train_loader, val_loader, train_ds.classes

  • Shuffle training data, but keep validation order stable.
  • pin_memory can speed CPU-to-GPU transfer when using CUDA, so it is enabled only when CUDA is available.

Model Factory with Transfer Learning

Transfer learning starts from pretrained visual features and replaces the classifier head for your classes.

# src/model.py
from torch import nn
from torchvision.models import ResNet18_Weights, resnet18

def build_model(num_classes: int, freeze_backbone: bool = True):
    weights = ResNet18_Weights.DEFAULT
    model = resnet18(weights=weights)

    if freeze_backbone:
        for parameter in model.parameters():
            parameter.requires_grad = False

    in_features = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(p=0.25),
        nn.Linear(in_features, num_classes),
    )
    return model

  • Freeze the backbone for a fast baseline, then unfreeze later layers if validation accuracy stalls.
  • The final layer must match the number of classes in your dataset.

Complete Training and Validation Loop

This loop tracks train loss, validation loss, validation accuracy, and saves the best checkpoint.

# src/train.py
from pathlib import Path

import torch
from torch import nn

from src.data import IMAGE_SIZE, MEAN, STD, build_loaders
from src.model import build_model

def run_epoch(model, loader, loss_fn, optimizer, device, train: bool):
    model.train(train)  # train(False) is the same as eval(): dropout off, BatchNorm uses running stats
    total_loss, total_correct, total_items = 0.0, 0, 0

    # Build the autograd graph only when training; inference_mode skips it entirely
    context = torch.enable_grad() if train else torch.inference_mode()
    with context:
        for images, labels in loader:
            images = images.to(device)
            labels = labels.to(device)

            logits = model(images)
            loss = loss_fn(logits, labels)

            if train:
                optimizer.zero_grad(set_to_none=True)
                loss.backward()
                optimizer.step()

            # Weight by batch size so a smaller final batch is averaged correctly
            batch_size = labels.size(0)
            total_loss += loss.item() * batch_size
            total_correct += (logits.argmax(dim=1) == labels).sum().item()
            total_items += batch_size

    return total_loss / total_items, total_correct / total_items

def train(epochs=10, batch_size=32, lr=3e-4):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader, val_loader, class_names = build_loaders(batch_size=batch_size)

    model = build_model(num_classes=len(class_names)).to(device)
    loss_fn = nn.CrossEntropyLoss()
    # Only unfrozen parameters (the new head, by default) are optimized
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad],
        lr=lr,
        weight_decay=1e-4,
    )

    best_acc = 0.0
    Path("checkpoints").mkdir(exist_ok=True)

    for epoch in range(1, epochs + 1):
        train_loss, train_acc = run_epoch(model, train_loader, loss_fn, optimizer, device, True)
        val_loss, val_acc = run_epoch(model, val_loader, loss_fn, optimizer, device, False)

        print(
            f"epoch={epoch} "
            f"train_loss={train_loss:.4f} train_acc={train_acc:.3f} "
            f"val_loss={val_loss:.4f} val_acc={val_acc:.3f}"
        )

        # Validation accuracy, not training loss, decides which weights to keep
        if val_acc > best_acc:
            best_acc = val_acc
            # Store the preprocessing metadata next to the weights
            torch.save({
                "model_state_dict": model.state_dict(),
                "class_names": class_names,
                "image_size": IMAGE_SIZE,
                "mean": MEAN,
                "std": STD,
                "best_val_acc": best_acc,
            }, "checkpoints/best.pt")

if __name__ == "__main__":
    train()

  • Run the script from the project root with python -m src.train so the src.data and src.model imports resolve; the same applies to python -m src.infer.
  • CrossEntropyLoss expects raw logits, not softmax probabilities.
  • set_to_none=True is a common efficient way to clear gradients.
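A small numeric check of the logits note: nn.CrossEntropyLoss applies log-softmax itself, so passing probabilities applies softmax twice, and the loss cannot get close to zero even for a confident, correct prediction. The logits and label below are arbitrary example values.

```python
import torch
from torch import nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.tensor([[4.0, 0.5, -1.0]])  # raw scores for 3 classes
target = torch.tensor([0])                 # the correct class is index 0

correct = loss_fn(logits, target)
manual = -torch.log_softmax(logits, dim=1)[0, 0]  # what CrossEntropyLoss computes internally
double_softmax = loss_fn(torch.softmax(logits, dim=1), target)  # the mistake

print(torch.allclose(correct, manual))  # True
print(f"logits: {correct.item():.4f}  softmax first: {double_softmax.item():.4f}")
```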

Deployment-Safe Inference Script

Inference loads the same architecture, restores weights, applies the same preprocessing, and disables gradients.

# src/infer.py
import torch
from PIL import Image

from src.data import build_transforms
from src.model import build_model

def load_model(checkpoint_path, device):
    # weights_only=True restricts loading to tensors and plain Python containers
    checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=True)
    # The code rebuilds the architecture; the checkpoint only supplies the weights
    model = build_model(num_classes=len(checkpoint["class_names"]), freeze_backbone=False)
    model.load_state_dict(checkpoint["model_state_dict"])
    model.to(device)
    model.eval()  # dropout off, BatchNorm uses its running statistics
    return model, checkpoint["class_names"]

def preprocess(image_path):
    # Reuse the validation pipeline so inference cannot drift from evaluation
    transform = build_transforms(train=False)
    image = Image.open(image_path).convert("RGB")
    return transform(image).unsqueeze(0)  # add a batch dimension: [1, 3, H, W]

@torch.inference_mode()
def predict(image_path, checkpoint_path="checkpoints/best.pt"):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model, class_names = load_model(checkpoint_path, device)
    batch = preprocess(image_path).to(device)

    logits = model(batch)
    probabilities = torch.softmax(logits, dim=1)[0]
    confidence, class_id = probabilities.max(dim=0)

    return {
        "class": class_names[class_id.item()],
        "confidence": round(confidence.item(), 4),
    }

if __name__ == "__main__":
    print(predict("sample.jpg"))

  • The preprocessing constants match training, which prevents silent accuracy drops.
  • weights_only=True avoids loading arbitrary pickled Python objects when loading a state dict checkpoint.
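A minimal round trip showing why the checkpoint sticks to plain types: a dict of tensors, strings, lists, ints, and floats loads under weights_only=True (available since PyTorch 1.13). The in-memory io.BytesIO buffer and the tiny nn.Linear model are stand-ins for checkpoints/best.pt and the ResNet18 classifier.

```python
import io

import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for the trained classifier
buffer = io.BytesIO()    # stand-in for checkpoints/best.pt
torch.save({
    "model_state_dict": model.state_dict(),
    "class_names": ["cats", "dogs"],
    "image_size": 224,
    "best_val_acc": 0.9,
}, buffer)

buffer.seek(0)  # rewind before reading
checkpoint = torch.load(buffer, map_location="cpu", weights_only=True)

restored = nn.Linear(4, 2)  # the code rebuilds the architecture...
restored.load_state_dict(checkpoint["model_state_dict"])  # ...and the checkpoint fills it
print(checkpoint["class_names"])                  # ['cats', 'dogs']
print(torch.equal(restored.weight, model.weight))  # True
```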

Key Takeaways
  • Complete PyTorch projects separate data, model, training, checkpointing, and inference code.
  • Training mode and evaluation mode must be controlled deliberately.
  • Save class names and preprocessing metadata with model weights.
  • Validation accuracy should decide the best checkpoint, not training loss alone.
  • Understand why each stage exists, for example why validation uses deterministic transforms, before memorizing the code.
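A minimal demonstration of why the mode matters, using a stand-in head with the same Dropout(p=0.25) as the classifier: in training mode dropout zeroes random activations, so repeated calls on the same input disagree; after eval() the output is deterministic. The 512-feature input is random example data.

```python
import torch
from torch import nn

torch.manual_seed(0)
head = nn.Sequential(nn.Dropout(p=0.25), nn.Linear(512, 2))  # stand-in classifier head
features = torch.randn(1, 512)

head.train()
with torch.no_grad():
    train_a, train_b = head(features), head(features)

head.eval()
with torch.no_grad():
    eval_a, eval_b = head(features), head(features)

print(torch.equal(train_a, train_b))  # False: two different dropout masks
print(torch.equal(eval_a, eval_b))    # True: dropout is disabled in eval mode
```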

Common Mistakes to Avoid
WRONG Apply random augmentation during validation and inference.
RIGHT Use deterministic transforms for validation and inference.
Random validation transforms make metrics noisy and hard to trust.
WRONG Forget to save class_names with the checkpoint.
RIGHT Save label mapping beside the weights.
A model that predicts class index 2 is not useful unless you know what index 2 means.
WRONG Use softmax before CrossEntropyLoss.
RIGHT Pass raw logits to CrossEntropyLoss.
CrossEntropyLoss already applies log-softmax internally.
WRONG Memorize the training and inference code without knowing which assumption each step protects.
RIGHT Tie each step to the failure it prevents, such as model.eval() keeping dropout off during inference.
Purpose makes syntax easier to recall.

Practice Tasks

  • Replace ResNet18 with EfficientNet or MobileNet and compare validation accuracy and inference speed.
  • Add early stopping when validation loss stops improving for five epochs.
  • Write a FastAPI endpoint that accepts an uploaded image and returns class plus confidence.
  • Add a confusion matrix after validation to identify classes the model mixes up.
  • Write a small script that runs predict on every image in data/val and reports accuracy per class.
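For the serving tasks above, a common pattern is to load the model once at startup and reuse it for every request, instead of calling load_model inside each prediction as the script does. The predict_batch helper below is a hypothetical sketch that takes an already-loaded model, its class_names, and a preprocessed batch of shape [N, 3, H, W]; the tiny stand-in model only exists so the example runs without a checkpoint.

```python
import torch
from torch import nn


@torch.inference_mode()
def predict_batch(model: nn.Module, class_names, batch: torch.Tensor):
    """Return one {class, confidence} dict per image in a preprocessed batch."""
    model.eval()  # defensive: inference should never run in training mode
    device = next(model.parameters()).device
    logits = model(batch.to(device))
    probabilities = torch.softmax(logits, dim=1)  # softmax only for reporting
    confidences, class_ids = probabilities.max(dim=1)
    return [
        {"class": class_names[i], "confidence": round(c, 4)}
        for i, c in zip(class_ids.tolist(), confidences.tolist())
    ]


if __name__ == "__main__":
    # Stand-in model: flatten a small image and map it to 2 logits
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 2))
    batch = torch.randn(4, 3, 8, 8)
    for result in predict_batch(model, ["cats", "dogs"], batch):
        print(result)
```

In a web endpoint, the model and class_names would be created once when the app starts, and each request would only preprocess its image and call the helper.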

Frequently Asked Questions

Can this structure be reused for data other than images?
The structure still applies, but the Dataset, transforms, model architecture, and inference preprocessing will change for text, tabular, audio, or time-series data.

Why does inference need the model code as well as the checkpoint?
The checkpoint stores learned parameters, not the architecture. Your code defines the architecture, then load_state_dict fills that architecture with trained values.
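A short check of that answer: the architecture your code builds must match the saved tensors exactly. This sketch uses small nn.Linear heads as stand-ins; loading a 2-class head's weights into a 3-class head fails with a size mismatch, which is one reason num_classes is derived from the class_names stored in the checkpoint.

```python
import torch
from torch import nn

saved_state = nn.Linear(512, 2).state_dict()  # head trained for 2 classes

matching = nn.Linear(512, 2)
matching.load_state_dict(saved_state)  # shapes match: loads cleanly

wrong = nn.Linear(512, 3)  # built with the wrong number of classes
try:
    wrong.load_state_dict(saved_state)
except RuntimeError as err:
    print("load failed:", str(err).splitlines()[0])
```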

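What is the most common mistake in a project like this?
Letting training and inference drift apart: different preprocessing, a missing model.eval(), or a lost class_names mapping. The code still runs, but predictions become silently wrong.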

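How should I revise this capstone?
Remember the problem each file solves, then attach the code to that problem: data.py defines preprocessing, model.py defines the architecture, train.py selects the best checkpoint, and infer.py reproduces the validation path.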
