Founding Cohort — 150 seats max
Full Curriculum

The Trail Map

Six modules plus a bonus trail. One tiny codebase. Each checkpoint builds on the last until you can explain every major block, modify the code with confidence, and build your own variant.

MODULE 0

Trailhead

Orientation, prerequisites, and what a GPT is actually trying to do. Set up your environment and understand the landscape before the hike begins.

Outcome

Learner understands the project scope, has a working environment, and can articulate what next-token prediction means at a high level.

Lessons

  • Welcome to the Trail
  • What is a GPT, really?
  • Environment Setup
  • The microGPT Codebase Tour
  • Your First Forward Pass

Trailhead Complete

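For a first look before the hike: a minimal sketch of what a "forward pass" means, using a made-up 4-token vocabulary and plain Python lists rather than the actual microGPT code.

```python
import math
import random

random.seed(0)

# Hypothetical toy setup (not the course codebase): a 4-token vocabulary,
# an embedding table, and one linear layer that produces logits.
vocab_size, dim = 4, 8
emb = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
w_out = [[random.gauss(0, 0.1) for _ in range(vocab_size)] for _ in range(dim)]

def forward(token_id):
    """One forward pass: token ID -> embedding -> logits -> probabilities."""
    x = emb[token_id]                                    # look up embedding
    logits = [sum(x[i] * w_out[i][j] for i in range(dim))
              for j in range(vocab_size)]                # linear projection
    m = max(logits)                                      # stabilized softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

probs = forward(2)
print(probs)   # one probability per vocabulary entry, summing to 1
```

Everything a GPT does at inference time is a fancier version of this: turn a token into numbers, push them through layers, and read off a probability for every possible next token.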
MODULE 1

Token Trail

Documents, BOS, vocabulary, tokenization, and next-token prediction. Understand what the model sees and what it's trying to predict.

Outcome

Learner can explain what the model is predicting and why, trace from raw text to token IDs, and describe the vocabulary.

Lessons

  • From Text to Tokens
  • Building a Vocabulary
  • BOS and Special Tokens
  • Next-Token Prediction
  • What Goes In, What Comes Out

Token Trail Cleared

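A taste of this module in code: a hypothetical character-level tokenizer with a reserved BOS token. The course's real vocabulary and special-token handling may differ; only the text-to-IDs-to-targets pipeline is the point.

```python
# Assumed toy corpus; the real dataset is whatever the course uses.
docs = ["hi", "ha"]
chars = sorted(set("".join(docs)))
BOS = 0                                   # reserve ID 0 for the BOS token
stoi = {ch: i + 1 for i, ch in enumerate(chars)}

def encode(doc):
    """Raw text -> token IDs, with BOS prepended to mark the document start."""
    return [BOS] + [stoi[ch] for ch in doc]

ids = encode("hi")
print(ids)    # [0, 2, 3] with this toy vocabulary ('a'->1, 'h'->2, 'i'->3)

# Next-token prediction pairs: at each position, the target is the next ID.
pairs = list(zip(ids[:-1], ids[1:]))
print(pairs)  # [(0, 2), (2, 3)]
```

That last line is the whole training signal: given everything so far, predict the very next token.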
MODULE 2

Gradient Gorge

Bigram intuition, loss functions, autograd, backpropagation, and learning dynamics. Where the actual learning happens.

Outcome

Learner can explain where learning happens, how parameters update, and what the loss function is measuring.

Lessons

  • The Bigram Baseline
  • Cross-Entropy Loss
  • Autograd Under the Hood
  • Backpropagation Step by Step
  • Learning Rate and Dynamics

Survived Gradient Gorge

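The heart of this module, sketched by hand: cross-entropy on a softmax, and one gradient-descent step that lowers the loss. The gradient formula (probabilities minus one-hot target) is the standard result the autograd lessons derive; the 3-token vocabulary and learning rate here are made up for illustration.

```python
import math

logits = [0.0, 0.0, 0.0]   # model's raw scores over a 3-token vocabulary
target = 2                 # the token that actually came next

def loss_and_grad(logits, target):
    """Cross-entropy loss of softmax(logits) against target, plus its gradient."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[target])              # cross-entropy
    # d(loss)/d(logit_j) = probs_j - 1[j == target]
    grad = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    return loss, grad

lr = 0.5
before, grad = loss_and_grad(logits, target)
logits = [l - lr * g for l, g in zip(logits, grad)]  # one gradient-descent step
after, _ = loss_and_grad(logits, target)
print(before, after)   # loss drops after the update
```

With uniform logits the starting loss is ln(3), and a single step pushes probability toward the target. Real training just repeats this, with autograd computing the gradients through every layer.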
MODULE 3

Attention Pass

Embeddings, positional information, self-attention, residual connections, MLP blocks, and layer normalization. The transformer core.

Outcome

Learner can trace a complete forward pass and explain every major component of the transformer architecture.

Lessons

  • Token and Position Embeddings
  • Self-Attention from Scratch
  • Multi-Head Attention
  • Residual Connections
  • MLP and Layer Norm

Reached Attention Pass

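A stripped-down version of the idea at the core of this module: causal self-attention in plain Python. For brevity, queries, keys, and values are all just the raw token vectors; real attention, including microGPT's, uses learned projections and multiple heads.

```python
import math

def attention(x):
    """x: list of T token vectors. Each position attends to positions at or
    before it (causal mask) and returns a softmax-weighted average of them."""
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        # scaled dot-product scores against positions 0..t only
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]          # softmax over the past
        out.append([sum(w * x[s][i] for s, w in enumerate(weights))
                    for i in range(d)])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(x):
    print(row)   # position 0 can only attend to itself, so row 0 equals x[0]
```

Everything else in the transformer block (residual connections, the MLP, layer norm) wraps around this mixing step.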
MODULE 4

Optimizer Ridge

Adam optimizer, training loops, logits interpretation, sampling strategies, temperature, and inference. From training to generation.

Outcome

Learner can configure training, interpret logits, and generate text with different sampling strategies.

Lessons

  • The Training Loop
  • Adam Optimizer
  • Interpreting Logits
  • Temperature and Sampling
  • From Training to Inference

Optimizer Ridge Conquered

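One piece of this module, sketched: temperature sampling over raw logits. The logits here are invented; only the scale-then-softmax-then-sample recipe is the point.

```python
import math
import random

def sample(logits, temperature=1.0):
    """Divide logits by temperature, softmax, then sample one token ID.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    r = random.random()                 # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

random.seed(0)
logits = [2.0, 1.0, 0.1]
print([sample(logits, 0.2) for _ in range(5)])  # low temp: sharply peaked on token 0
print([sample(logits, 2.0) for _ in range(5)])  # high temp: more varied picks
```

Temperature is a single knob on the same logits the loss was computed from, which is why this module treats training and generation as two views of one object.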
MODULE 5

Summit Project

Modify the model, swap the dataset or architecture, and build your own tiny GPT variant. Ship something real.

Outcome

Learner ships a working variant and can defend every modification they made.

Lessons

  • Choosing Your Variant
  • Dataset Preparation
  • Architecture Modifications
  • Training Your Variant
  • Capstone Defense

Summit Unlocked

BONUS

Beyond the Map

What scales from tiny GPTs to real-world systems, and what changes in production. The bridge from learning to building.

Outcome

Learner understands the gap between microGPT and production systems, and knows where to go next.

Lessons

  • From Tiny to Large
  • What Changes at Scale
  • Production Considerations
  • Where to Go Next

I Followed the Karpath