PyTorch Tensors and Autograd: Shapes, Gradients and Backpropagation

PyTorch Tensors and Autograd

PyTorch tensors are multidimensional arrays that can run on CPU or GPU. They store model inputs, outputs, labels, weights, gradients, and intermediate activations. If you understand tensors, shapes, dtypes, devices, and broadcasting, most PyTorch code becomes much easier to debug.

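Autograd is PyTorch's automatic differentiation engine. When a tensor has `requires_grad=True`, PyTorch records the operations applied to it and builds a dynamic computation graph. Calling `backward()` on a scalar result such as the loss fills in the `.grad` attribute of each leaf tensor that requires gradients, and optimizers use those gradients to update model parameters.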

This page covers tensors and autograd through practical explanations, several examples, and beginner-focused checks, so the ideas can be learned from this page alone.

Read the concept first, then trace the example line by line. The important habit is to connect the rule to visible behavior instead of memorizing only the name.

Mental Model

A tensor carries data; autograd records how the data was produced so PyTorch can calculate how each parameter affected the loss.

Tensor Essentials

A tensor has a shape, a dtype, a device, and values. The shape tells you the dimensions, the dtype tells you the numeric type, and the device tells you where the tensor lives. Shape mismatches are among the most common beginner errors in PyTorch.

Use `.shape` to inspect dimensions before passing tensors into a model.
Use floating tensors for neural network inputs and parameters.
Use long integer tensors for class labels passed to `CrossEntropyLoss`.
Move both model and tensors to the same device with `.to(device)`.
Use `view`, `reshape`, `permute`, and `unsqueeze` deliberately when changing shapes.
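The shape helpers listed above can be tried on a dummy image batch. This is a small sketch; the `[batch, channels, height, width]` layout and the sizes are arbitrary choices for illustration. It also shows why `view` and `reshape` are not interchangeable: `permute` returns a non-contiguous tensor, which `view` cannot flatten, while `reshape` copies the data when it has to.

```python
import torch

# A batch of 2 "images" with 3 channels and 4x5 pixels: [batch, channels, height, width]
images = torch.randn(2, 3, 4, 5)

# permute reorders dimensions into a channels-last layout: [batch, height, width, channels]
channels_last = images.permute(0, 2, 3, 1)
print(channels_last.shape)  # torch.Size([2, 4, 5, 3])

# permute only changes strides, so the result is no longer contiguous in memory
print(channels_last.is_contiguous())  # False

# view() needs compatible strides and fails here; reshape() copies when necessary
view_failed = False
try:
    channels_last.view(2, -1)
except RuntimeError:
    view_failed = True
print("view failed:", view_failed)  # view failed: True

flat = channels_last.reshape(2, -1)
print(flat.shape)  # torch.Size([2, 60])

# unsqueeze adds a size-1 dimension, e.g. turning one sample into a batch of one
sample = torch.randn(3, 4, 5)
batch_of_one = sample.unsqueeze(0)
print(batch_of_one.shape)  # torch.Size([1, 3, 4, 5])
```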

Create and Inspect Tensors

import torch

x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
labels = torch.tensor([0, 1], dtype=torch.long)

print(x.shape)      # torch.Size([2, 2])
print(x.dtype)      # torch.float32
print(labels.dtype) # torch.int64
print(x.device)     # cpu

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = x.to(device)
print(x.device)
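The dtype rules from the checklist (floating tensors for inputs and parameters) can be checked directly. A minimal sketch, assuming the default float dtype has not been changed: tensors built from Python integers come out as `int64`, and such tensors cannot track gradients until they are converted to a floating type.

```python
import torch

# Python ints produce int64 tensors; Python floats produce float32 tensors
counts = torch.tensor([1, 2, 3])
print(counts.dtype)  # torch.int64

# Only floating point (or complex) tensors can require gradients
grad_blocked = False
try:
    counts.requires_grad_(True)
except RuntimeError:
    grad_blocked = True
print("int64 blocked from requires_grad:", grad_blocked)  # True

# Convert to floating point before using the values as model inputs
features = counts.to(torch.float32)
print(features.dtype)   # torch.float32
print(features.mean())  # tensor(2.)
```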

Autograd Flow

Autograd tracks operations only when at least one participating tensor requires gradients. Model parameters normally require gradients automatically. Data tensors usually do not need gradients unless you are doing special optimization on inputs.

Call `loss.backward()` to compute gradients.
Read gradients from `parameter.grad` after backward.
Call `optimizer.zero_grad()` before the next backward pass.
Use `torch.no_grad()` or `torch.inference_mode()` during evaluation and inference.
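The effect of the last rule can be observed on a model's output. This sketch uses an arbitrary `nn.Linear(4, 1)` model: outputs computed normally are attached to the graph, while outputs computed under `torch.no_grad()` or `torch.inference_mode()` are not. `model.eval()` is a separate switch that changes how layers such as dropout and batch norm behave; it does not turn off gradient tracking on its own.

```python
import torch
from torch import nn

model = nn.Linear(4, 1)
batch = torch.randn(8, 4)

# Normal forward pass: the output is connected to the autograd graph
train_out = model(batch)
print(train_out.requires_grad)  # True

model.eval()  # evaluation behavior for layers such as dropout and batch norm
with torch.no_grad():
    eval_out = model(batch)
print(eval_out.requires_grad)  # False

# inference_mode is a stricter, usually faster alternative for pure inference
with torch.inference_mode():
    infer_out = model(batch)
print(infer_out.requires_grad)   # False
print(infer_out.is_inference())  # True
```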

Manual Gradient Example

import torch

w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

x = torch.tensor(3.0)
y_true = torch.tensor(10.0)

y_pred = w * x + b
loss = (y_pred - y_true) ** 2

loss.backward()

print("loss:", loss.item())
print("dLoss/dw:", w.grad.item())
print("dLoss/db:", b.grad.item())

Broadcasting and Shape Bugs

Broadcasting lets PyTorch combine tensors with compatible shapes, but accidental broadcasting can hide bugs. Always verify prediction and target shapes before computing loss.

For regression, predictions and targets should usually have the same shape.
For classification with `CrossEntropyLoss`, logits are shaped `[batch, classes]` and labels are shaped `[batch]`.
Use assertions in training code while developing.
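A typical silent broadcasting bug looks like this. The sketch assumes a model that outputs `[batch, 1]` while the targets were loaded as `[batch]`: subtracting them broadcasts to `[batch, batch]` instead of comparing each prediction with its own target. The fix is to align the shapes and assert them before computing the loss; the same kind of check applies to `CrossEntropyLoss` inputs.

```python
import torch
from torch import nn

predictions = torch.randn(8, 1)  # model output: [batch, 1]
targets = torch.randn(8)         # targets loaded as [batch]: a common mismatch

# Broadcasting silently expands the difference to [8, 8] instead of [8, 1]
print((predictions - targets).shape)  # torch.Size([8, 8])

# Fix: make the shapes match, then assert them while developing
targets = targets.unsqueeze(1)  # [8] -> [8, 1]
assert predictions.shape == targets.shape, (predictions.shape, targets.shape)
loss = nn.MSELoss()(predictions, targets)

# CrossEntropyLoss expects logits [batch, classes] and int64 class labels [batch]
logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
assert logits.shape[0] == labels.shape[0] and labels.dtype == torch.long
ce_loss = nn.CrossEntropyLoss()(logits, labels)

print(loss.item(), ce_loss.item())
```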

Detailed Explanation of PyTorch

PyTorch becomes much easier when you separate the concept from the tool syntax. First identify the problem being solved, then identify the data or resource being changed, and finally identify the proof that the change worked.

In PyTorch, this topic should be studied through tensor shape, dtype, device, gradient flow, loss movement, and reproducibility. Those points explain not only how to use the feature, but also why it fails when the wrong assumption is made.

A good way to learn this page is to read the normal path once, run or trace the example, then intentionally change one input to observe the different result. That one change teaches more than memorizing several definitions.

Write down what the code should achieve before touching code or configuration.
Identify the normal case, edge case, and failure case.
Trace what changes before and after the operation.
Use printed output, an error message, a log, a metric, or a table to verify the result.
Record the mistake that would confuse a beginner and the exact fix.

Beginner-Friendly Walkthrough for PyTorch

Start with a tiny scenario. For example, imagine one function call or one small batch of data. Keep the scenario small enough that every step can be explained without skipping details.

Next, describe the movement of information. Where does the input start? Which rule or component handles it? What result should appear? If the result is wrong, where would you inspect first?

Finally, compare two outcomes. The correct outcome proves that you understand the main rule. The incorrect outcome teaches the symptom, which is what you will recognize later during debugging or interviews.

Normal path: valid input produces the expected result.
Boundary path: the smallest, largest, empty, or unusual input still behaves predictably.
Error path: a realistic mistake creates a visible symptom.
Fix path: one focused correction removes the symptom without changing unrelated code.
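The four paths can be traced on a single linear layer. In this sketch the layer size and inputs are arbitrary; the error path assumes a common real-world mistake, a `float64` batch (for example one converted from a NumPy array) fed to a `float32` layer, and the fix changes only the input dtype.

```python
import torch
from torch import nn

layer = nn.Linear(4, 2)

# Normal path: a [batch, 4] float32 input gives a [batch, 2] output
normal = layer(torch.randn(3, 4))
print(normal.shape)  # torch.Size([3, 2])

# Boundary path: an empty batch still yields a correctly shaped, empty output
empty = layer(torch.randn(0, 4))
print(empty.shape)  # torch.Size([0, 2])

# Error path: float64 input does not match the layer's float32 weights
bad_input = torch.randn(3, 4, dtype=torch.float64)
error_seen = False
try:
    layer(bad_input)
except RuntimeError as err:
    error_seen = True
    print("error:", err)

# Fix path: convert only the input dtype, leaving the layer unchanged
fixed = layer(bad_input.float())
print(fixed.shape)  # torch.Size([3, 2])
```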

Training Step with Correct Gradient Handling

import torch
from torch import nn

model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

features = torch.randn(8, 4)
targets = torch.randn(8, 1)

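predictions = model(features)         # forward pass: [8, 4] -> [8, 1]
loss = loss_fn(predictions, targets)  # scalar loss

optimizer.zero_grad()  # clear gradients left over from any previous step
loss.backward()        # compute gradients of the loss for weight and bias
optimizer.step()       # update the parameters using those gradients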

print("loss:", loss.item())

Shape-First Example

import torch

x = torch.randn(4, 3)
print('topic:', 'PyTorch')
print('shape:', x.shape)
print('dtype:', x.dtype)
print('device:', x.device)

# Shape, dtype, and device checks catch many PyTorch mistakes early.

Train-Step Example with a Small Network

import torch
from torch import nn

model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.MSELoss()

x = torch.randn(8, 3)
y = torch.randn(8, 1)
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))

Key Takeaways

Inspect tensor shape, dtype, and device when debugging.
Use `requires_grad=True` only for values that need gradients.
Call `zero_grad`, `backward`, and `step` in the correct order.
Disable gradient tracking during inference.
Explain the purpose of PyTorch in your own words.
Run or trace a small PyTorch example.
Test a normal case, a boundary case, and a broken case.
Verify the result with printed output, logs, metrics, error messages, or a table.
Summarize the common mistake and the correction.

Common Mistakes to Avoid

WRONG Forget `optimizer.zero_grad()`.

RIGHT Clear gradients before each backward pass.

PyTorch accumulates gradients by default.
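Accumulation is easy to see with one parameter. In this sketch the loss (3w)^2 has gradient 18w, which is 36 at w = 2. Two backward passes without clearing add up to 72; resetting `.grad`, as `optimizer.zero_grad()` does for every parameter it manages, brings it back to 36.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# Two backward passes without clearing: gradients are summed into w.grad
for _ in range(2):
    loss = (w * 3.0) ** 2  # d(loss)/dw = 2 * (3w) * 3 = 18w = 36 at w = 2
    loss.backward()
accumulated = w.grad.item()
print(accumulated)  # 72.0 (36 + 36)

# Reset the gradient before the next pass, then compute it once
w.grad = None
loss = (w * 3.0) ** 2
loss.backward()
cleared = w.grad.item()
print(cleared)  # 36.0
```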

WRONG Move model to GPU but leave batches on CPU.

RIGHT Move model and batch tensors to the same device.

Device mismatch causes runtime errors.
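A device-agnostic pattern avoids this mismatch. The sketch moves each batch to whatever device the model's parameters live on, so the same code runs on CPU or GPU. It assumes the batch arrives on the CPU, which is the default for data loaders.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 1).to(device)
batch = torch.randn(8, 4)  # batches typically start on the CPU

# Move each batch to the same device as the model's parameters
param_device = next(model.parameters()).device
batch = batch.to(param_device)
output = model(batch)
print(output.device == param_device)  # True
```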

WRONG Learning PyTorch terms without running any code.

RIGHT Learn it through a working example, a boundary case, and a failure case.

Concept plus behavior is easier to remember than definition alone.

WRONG Skipping verification.

RIGHT Always check printed output, tensor shapes, logs, or metrics.

Verification turns confidence into evidence.

WRONG Changing many things at once while debugging.

RIGHT Change one setting, input, or line, then inspect the result.

Small changes reveal the real cause.

Practice Tasks

Create tensors with different shapes and test which operations broadcast.
Print gradients for a two-parameter linear equation.
Add shape assertions before a loss function in a training step.
Create a small demo that shows one tensor or autograd rule clearly.
Add one edge case and write the expected result before running it.
Break the demo intentionally and document the error symptom.
Fix the broken version and explain why the fix works.

Frequently Asked Questions

Do all tensors need gradients?

No. Inputs and labels usually do not need gradients. Model parameters normally do.

Why is `.item()` used on loss?

It converts a one-value tensor into a Python number for logging. Do not use it inside differentiable calculations, because the returned number is no longer connected to the computation graph.
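Both answers can be checked on a tiny model. In this sketch, built with an arbitrary `nn.Linear(2, 1)` and random data, the parameters require gradients by default while freshly created input and label tensors do not, and `.item()` returns a plain Python `float` while the loss tensor itself stays in the graph.

```python
import torch
from torch import nn

model = nn.Linear(2, 1)
inputs = torch.randn(5, 2)
labels = torch.randn(5, 1)

# Parameters track gradients by default; freshly created data tensors do not
print(all(p.requires_grad for p in model.parameters()))  # True
print(inputs.requires_grad, labels.requires_grad)        # False False

loss = nn.MSELoss()(model(inputs), labels)
logged = loss.item()          # plain Python float: fine for logging, no graph
print(type(logged).__name__)  # float
print(loss.requires_grad)     # True: the loss tensor is still part of the graph
```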

What is the fastest way to understand PyTorch?

Start with one tiny example, trace every step, then compare it with a broken version.

What should I verify after using PyTorch?

Verify the visible result: printed output, tensor shapes, gradient values, logged loss, or evaluation metrics.

Why does PyTorch feel confusing at first?

It often combines vocabulary with behavior. The confusion drops when you trace the input, rule, result, and failure path.
