This capstone brings the PyTorch tutorial together as a real developer project. You will structure an image classifier so data loading, model definition, training, evaluation, checkpointing, and inference are separated into files that can be tested and maintained.
The project is intentionally practical. It teaches the decisions behind the code: where transforms belong, how labels are mapped, why training and validation modes are different, what to save in a checkpoint, how to load on another device, and how to write inference code that behaves like the training pipeline.
A PyTorch project becomes production-ready when the training path and inference path share the same assumptions about shape, dtype, normalization, labels, and model architecture.
A complete PyTorch project should not hide everything inside one notebook. Notebooks are useful for exploration, but durable projects need files with clear responsibilities. This makes experiments reproducible and deployment simpler.
The capstone supports two modeling choices: a small custom CNN for learning fundamentals, or a transfer learning model for better accuracy on limited data. The rest of the project structure stays almost the same.
Use a folder-per-class layout because it works well with ImageFolder and makes label mapping explicit. Keep validation data separate from training data. If you split programmatically, save the split so future runs evaluate on the same examples.
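One way to save a programmatic split is a small JSON manifest. The sketch below is only an illustration under stated assumptions: `make_split`, `save_split`, `load_split`, and the example file names are hypothetical, and the split is a plain seeded shuffle that does not balance classes.

```python
import json
import random
from pathlib import Path


def make_split(paths, val_fraction=0.2, seed=42):
    """Shuffle paths with a fixed seed and cut off a validation slice."""
    # Sort first so the result does not depend on directory listing order.
    ordered = sorted(str(p) for p in paths)
    rng = random.Random(seed)
    rng.shuffle(ordered)
    n_val = int(len(ordered) * val_fraction)
    return {"train": ordered[n_val:], "val": ordered[:n_val]}


def save_split(split, manifest_path):
    """Write the split to JSON so later runs can reuse the same examples."""
    Path(manifest_path).write_text(json.dumps(split, indent=2))


def load_split(manifest_path):
    """Read a previously saved split back into a dict."""
    return json.loads(Path(manifest_path).read_text())


# Hypothetical file names, used only for illustration.
example_paths = [f"cats/{i:03d}.jpg" for i in range(5)]
example_paths += [f"dogs/{i:03d}.jpg" for i in range(5)]
split = make_split(example_paths)
print(len(split["train"]), len(split["val"]))  # 8 2
```

Loading the manifest at the start of each run, instead of reshuffling, keeps validation accuracy comparable across experiments.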
This layout keeps the model lifecycle readable from raw data to prediction.
pytorch-image-classifier/
    requirements.txt
    data/
        train/
            cats/
            dogs/
        val/
            cats/
            dogs/
    checkpoints/
    src/
        data.py
        model.py
        train.py
        infer.py
Training data gets augmentation. Validation data should represent real evaluation without random distortions.
# src/data.py
from pathlib import Path

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

IMAGE_SIZE = 224
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def build_transforms(train: bool):
    if train:
        # Random augmentation is applied to training data only.
        return transforms.Compose([
            transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
            transforms.RandomHorizontalFlip(p=0.5),
            transforms.RandomRotation(degrees=8),
            transforms.ToTensor(),
            transforms.Normalize(MEAN, STD),
        ])
    # Validation uses deterministic preprocessing.
    return transforms.Compose([
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.ToTensor(),
        transforms.Normalize(MEAN, STD),
    ])


def build_loaders(data_dir="data", batch_size=32, num_workers=2):
    data_dir = Path(data_dir)
    train_ds = datasets.ImageFolder(data_dir / "train", transform=build_transforms(True))
    val_ds = datasets.ImageFolder(data_dir / "val", transform=build_transforms(False))
    train_loader = DataLoader(
        train_ds,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    val_loader = DataLoader(
        val_ds,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )
    return train_loader, val_loader, train_ds.classes
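ImageFolder assigns class indices from the sorted names of the class folders and exposes the result as `class_to_idx`. The standard-library sketch below reproduces that ordering rule without importing torchvision; `folder_class_mapping` is a hypothetical helper written for this illustration, not part of any library.

```python
import tempfile
from pathlib import Path


def folder_class_mapping(root):
    """Map class folder names to indices in sorted order."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as tmp:
    # Folders are created out of alphabetical order on purpose.
    for name in ["dogs", "cats"]:
        (Path(tmp) / name).mkdir()
    mapping = folder_class_mapping(tmp)

print(mapping)  # {'cats': 0, 'dogs': 1}
```

Because the indices depend on which folders exist, adding a class such as `birds` later would shift `cats` and `dogs` to new indices. That is one reason to store the class names with the trained weights.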
Transfer learning starts from pretrained visual features and replaces the classifier head for your classes.
# src/model.py
from torch import nn
from torchvision.models import ResNet18_Weights, resnet18


def build_model(num_classes: int, freeze_backbone: bool = True):
    weights = ResNet18_Weights.DEFAULT
    model = resnet18(weights=weights)
    if freeze_backbone:
        # Freeze every pretrained parameter; the new head below stays trainable.
        for parameter in model.parameters():
            parameter.requires_grad = False
    in_features = model.fc.in_features
    model.fc = nn.Sequential(
        nn.Dropout(p=0.25),
        nn.Linear(in_features, num_classes),
    )
    return model
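The freeze-then-replace order matters: a module constructed after the freezing loop has `requires_grad=True` on its parameters, so only the new head is trained. The sketch below shows the effect on a tiny stand-in network. The layer sizes are arbitrary assumptions, and this is not the ResNet-18 from `src/model.py`.

```python
from torch import nn

# Stand-in for a pretrained backbone plus head; sizes are arbitrary.
model = nn.Sequential(
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 4),
)

# Freeze everything first, as build_model does when freeze_backbone=True.
for parameter in model.parameters():
    parameter.requires_grad = False

# Replace the last layer; a freshly constructed module is trainable by default.
model[2] = nn.Linear(8, 2)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(trainable, frozen)  # 18 136
```

The same filter, `p.requires_grad`, is what lets an optimizer receive only the head's parameters.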
This loop tracks training and validation loss and accuracy for each epoch, and saves a checkpoint whenever validation accuracy improves.
# src/train.py
from pathlib import Path

import torch
from torch import nn

from src.data import IMAGE_SIZE, build_loaders
from src.model import build_model


def run_epoch(model, loader, loss_fn, optimizer, device, train: bool):
    # train() enables dropout and batch-norm updates; eval() disables them.
    if train:
        model.train()
    else:
        model.eval()
    total_loss, total_correct, total_items = 0.0, 0, 0
    context = torch.enable_grad() if train else torch.inference_mode()
    with context:
        for images, labels in loader:
            images = images.to(device)
            labels = labels.to(device)
            logits = model(images)
            loss = loss_fn(logits, labels)
            if train:
                optimizer.zero_grad(set_to_none=True)
                loss.backward()
                optimizer.step()
            # Weight the batch mean by batch size so a short last batch is counted fairly.
            batch_size = labels.size(0)
            total_loss += loss.item() * batch_size
            total_correct += (logits.argmax(dim=1) == labels).sum().item()
            total_items += batch_size
    return total_loss / total_items, total_correct / total_items


def train(epochs=10, batch_size=32, lr=3e-4):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    train_loader, val_loader, class_names = build_loaders(batch_size=batch_size)
    model = build_model(num_classes=len(class_names)).to(device)
    loss_fn = nn.CrossEntropyLoss()
    # Only parameters that still require gradients are optimized.
    optimizer = torch.optim.AdamW(
        [p for p in model.parameters() if p.requires_grad],
        lr=lr,
        weight_decay=1e-4,
    )
    best_acc = 0.0
    Path("checkpoints").mkdir(exist_ok=True)
    for epoch in range(1, epochs + 1):
        train_loss, train_acc = run_epoch(model, train_loader, loss_fn, optimizer, device, True)
        val_loss, val_acc = run_epoch(model, val_loader, loss_fn, optimizer, device, False)
        print(
            f"epoch={epoch} "
            f"train_loss={train_loss:.4f} train_acc={train_acc:.3f} "
            f"val_loss={val_loss:.4f} val_acc={val_acc:.3f}"
        )
        if val_acc > best_acc:
            best_acc = val_acc
            # Save the label mapping and input size beside the weights.
            torch.save({
                "model_state_dict": model.state_dict(),
                "class_names": class_names,
                "image_size": IMAGE_SIZE,
                "best_val_acc": best_acc,
            }, "checkpoints/best.pt")


if __name__ == "__main__":
    train()
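`run_epoch` multiplies each batch's mean loss by the batch size before summing. A short worked example shows why: when the last batch is smaller, a plain average of batch means gives it too much weight. The loss values and batch sizes below are made up for illustration.

```python
# Hypothetical per-batch mean losses for 70 items with batch_size=32.
batch_losses = [0.50, 0.40, 1.00]
batch_sizes = [32, 32, 6]

# What run_epoch computes: weight each batch mean by its item count.
weighted = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / sum(batch_sizes)

# Naive alternative: average the batch means directly.
naive = sum(batch_losses) / len(batch_losses)

print(round(weighted, 4), round(naive, 4))  # 0.4971 0.6333
```

In this made-up case the six-item batch with a high loss pulls the naive average well above the true per-item mean.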
Inference loads the same architecture, restores weights, applies the same preprocessing, and disables gradients.
# src/infer.py
import torch
from PIL import Image
from torchvision import transforms

from src.data import IMAGE_SIZE, MEAN, STD
from src.model import build_model


def load_model(checkpoint_path, device):
    checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=True)
    model = build_model(num_classes=len(checkpoint["class_names"]), freeze_backbone=False)
    model.load_state_dict(checkpoint["model_state_dict"])
    model.to(device)
    model.eval()
    return model, checkpoint["class_names"]


def preprocess(image_path):
    # Same deterministic steps as the validation transforms in src/data.py.
    transform = transforms.Compose([
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.ToTensor(),
        transforms.Normalize(MEAN, STD),
    ])
    image = Image.open(image_path).convert("RGB")
    # Add a batch dimension: (3, H, W) -> (1, 3, H, W).
    return transform(image).unsqueeze(0)


@torch.inference_mode()
def predict(image_path, checkpoint_path="checkpoints/best.pt"):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model, class_names = load_model(checkpoint_path, device)
    batch = preprocess(image_path).to(device)
    logits = model(batch)
    probabilities = torch.softmax(logits, dim=1)[0]
    confidence, class_id = probabilities.max(dim=0)
    return {
        "class": class_names[class_id.item()],
        "confidence": round(confidence.item(), 4),
    }


if __name__ == "__main__":
    print(predict("sample.jpg"))
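The save-and-restore round trip can be checked in isolation before wiring it into the project. The sketch below uses a tiny `nn.Linear` as a stand-in for the real model, which is an assumption made to keep the example fast. It saves a checkpoint dictionary with the same kinds of fields, loads it with `map_location="cpu"`, and confirms that a freshly built model produces identical outputs.

```python
import os
import tempfile

import torch
from torch import nn

# Tiny stand-in model; the real project would call build_model instead.
model = nn.Linear(4, 2)
checkpoint = {
    "model_state_dict": model.state_dict(),
    "class_names": ["cats", "dogs"],
    "image_size": 224,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best.pt")
    torch.save(checkpoint, path)
    # map_location="cpu" places the loaded tensors on the CPU,
    # even if they were saved from a GPU.
    restored = torch.load(path, map_location="cpu", weights_only=True)

# Rebuild the same architecture, then fill it with the saved values.
fresh = nn.Linear(4, 2)
fresh.load_state_dict(restored["model_state_dict"])

x = torch.ones(1, 4)
same = torch.equal(model(x), fresh(x))
print(same, restored["class_names"])
```

If the rebuilt architecture differed from the saved one, for example in the number of output classes, `load_state_dict` would raise an error instead of loading silently.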
Mistake: Applying random augmentation during validation and inference.
Fix: Use deterministic transforms for validation and inference.
Mistake: Forgetting to save class_names with the checkpoint.
Fix: Save the label mapping beside the weights.
Mistake: Applying softmax before CrossEntropyLoss.
Fix: Pass raw logits to CrossEntropyLoss.
Mistake: Memorizing the capstone code without knowing the situation where each part is useful.
Fix: Connect each part of the capstone to a concrete PyTorch task.
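The softmax mistake is easy to miss because the code still runs and still returns a loss. The sketch below compares the two calls on a single made-up example where the model is fairly confident in the correct class. The logits and label are illustrative values only.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.0]])
label = torch.tensor([0])

# Correct: cross_entropy applies log-softmax internally.
correct = F.cross_entropy(logits, label)

# Mistake: softmax is applied here and then again inside the loss.
double = F.cross_entropy(torch.softmax(logits, dim=1), label)

# Reference value computed by hand from log-softmax.
manual = -torch.log_softmax(logits, dim=1)[0, 0]

print(f"correct={correct.item():.4f} double={double.item():.4f}")
```

Here the correct loss is about 0.127, while the double-softmax loss is about 0.383. Since probabilities lie between 0 and 1, the second softmax can never become sharply peaked, so the loss stays well above zero no matter how confident the model is.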
For text, tabular, audio, or time-series data, the same project structure still applies, but the Dataset, transforms, model architecture, and inference preprocessing will change.
The checkpoint stores learned parameters. Your code defines the architecture, then load_state_dict fills that architecture with trained values.