The test suite validates micrograd's gradient computations by comparing them against PyTorch as a ground-truth oracle. This chapter explains the testing strategy, what the two test functions cover, and what edge cases they expose about the autograd engine's correctness.
The testing strategy in `test_engine.py` is called **oracle testing** or **reference implementation testing**: rather than hand-computing expected gradient values, the tests run the same computation in both micrograd and PyTorch and assert that the results match. This is powerful because PyTorch's autograd is battle-tested; any discrepancy almost certainly points to a bug in micrograd rather than in the reference.
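The oracle idea can be sketched without PyTorch at all. In this simplified stand-in (a hypothetical function `f`, not taken from the test suite), a central finite difference plays the role of the trusted reference, and a hand-derived gradient plays the role of the implementation under test:

```python
def f(x):
    # Hypothetical scalar function for illustration.
    return x * x + 3.0 * x

def analytic_grad(x):
    # Hand-derived df/dx -- the "implementation under test".
    return 2.0 * x + 3.0

def numeric_grad(fn, x, h=1e-6):
    # Central finite difference -- the "oracle" we trust.
    return (fn(x + h) - fn(x - h)) / (2.0 * h)

x = 1.7
assert abs(analytic_grad(x) - numeric_grad(f, x)) < 1e-5
```

The real tests replace the finite-difference oracle with PyTorch, which is both more precise and exercises the same backpropagation structure as micrograd.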
For each test, the same arithmetic expression is written twice — once using `Value` objects and once using `torch.tensor` objects with `requires_grad=True`. Both forward passes produce equivalent scalar outputs, and both call `.backward()`. The test then compares `.grad` from the micrograd `Value`s against `.grad.item()` from the PyTorch tensors. PyTorch tensors are created with `dtype=torch.float64` to match Python's native float precision, reducing numerical noise in the comparison.
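The comparison pattern can be illustrated with a minimal `Value` sketch, assuming only `+` and `*` support (micrograd's real class covers more operations); here a hand-computed analytic gradient stands in for the PyTorch reference:

```python
class Value:
    """Minimal scalar autograd node (sketch; add/mul only)."""
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological sort, then apply the chain rule output-to-input.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

x = Value(-4.0)
y = x * x + x * 2.0   # y = x^2 + 2x, so dy/dx = 2x + 2
y.backward()
assert x.grad == 2.0 * -4.0 + 2.0   # the known gradient plays PyTorch's role
```

In the actual tests the reference value is not hand-derived but produced by running the identical expression through `torch.tensor(..., dtype=torch.float64, requires_grad=True)` and calling `.backward()` on the result.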
The `test_more_ops` function exercises a richer set of operations including division, negative exponents, and `relu` on negative values (the 'dead neuron' case). It also demonstrates **gradient accumulation at shared nodes**: the variable `d` is used in two separate sub-expressions, so its final gradient must be the sum of two contributions. The test verifies this is handled correctly by micrograd's `+=` accumulation in the backward closures.
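Why the accumulation must be a sum can be traced by hand. In this illustrative example (hypothetical values, not those used in `test_more_ops`), `d` feeds two sub-expressions of `z = d*a + d*b`, and the backward pass adds one contribution per use:

```python
a, b, d = 3.0, 5.0, 2.0
grad = {"d": 0.0}
# Backward through z = u + v, where u = d*a and v = d*b.
# Addition passes dz/du = dz/dv = 1 to both branches.
grad["d"] += a * 1.0   # contribution from u = d*a
grad["d"] += b * 1.0   # contribution from v = d*b
assert grad["d"] == a + b   # the sum of both contributions, not either alone
```

If the backward closures assigned with `=` instead of accumulating with `+=`, the second contribution would overwrite the first and the gradient would be wrong for any node used more than once.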
The assertions in `test_more_ops` use `abs(val - ref) < tol` with `tol=1e-5` rather than exact equality. This tolerance is necessary because floating-point arithmetic is not associative — evaluating the same mathematical expression in a different order (as micrograd and PyTorch may do) can produce results that differ in the least significant bits. The tolerance is tight enough to catch real bugs while being loose enough to tolerate normal floating-point rounding. The `test_sanity_check` function, by contrast, uses exact equality checks, which works because the expressions are simple enough that both implementations follow the same evaluation path.
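A classic demonstration of non-associativity (the specific values are illustrative, not drawn from the tests) shows why the tolerance-based comparison is the right choice:

```python
# The same mathematical sum, evaluated in two different orders.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
assert left != right             # differs in the least significant bits
assert abs(left - right) < 1e-5  # the test suite's style of comparison passes
```

An exact-equality check here would fail even though both results are "correct" to within rounding, which is exactly the situation `test_more_ops` guards against with its tolerance.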