Founding Cohort — 150 seats max
Full Curriculum

The Trail Map

Six modules plus a bonus trail. One tiny codebase. Each checkpoint builds on the last until you can explain every major block, modify the code with confidence, and build your own variant.

MODULE 0

Trailhead

Orientation, prerequisites, and what a GPT is actually trying to do. Set up your environment and understand the landscape before the hike begins.

Outcome

Learner understands the project scope, has a working environment, and can articulate what next-token prediction means at a high level.

Lessons

  • Welcome to the Trail
  • What is a GPT, really?
  • Environment Setup
  • The microGPT Codebase Tour
  • Your First Forward Pass

Trailhead Complete

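For a first look before the hike: a minimal sketch of what a "forward pass" means, using a made-up 4-token vocabulary and plain Python lists rather than the actual microGPT code.

```python
import math
import random

random.seed(0)

# Hypothetical toy setup (not the course codebase): a 4-token vocabulary,
# an embedding table, and one linear layer that produces logits.
vocab_size, dim = 4, 8
emb = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]
w_out = [[random.gauss(0, 0.1) for _ in range(vocab_size)] for _ in range(dim)]

def forward(token_id):
    """One forward pass: token ID -> embedding -> logits -> probabilities."""
    x = emb[token_id]                                    # look up embedding
    logits = [sum(x[i] * w_out[i][j] for i in range(dim))
              for j in range(vocab_size)]                # linear projection
    m = max(logits)                                      # stabilized softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

probs = forward(2)
print(probs)   # one probability per vocabulary entry, summing to 1
```

Everything a GPT does at inference time is a fancier version of this: turn a token into numbers, push them through layers, and read off a probability for every possible next token.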
MODULE 1

Token Trail

Documents, BOS, vocabulary, tokenization, and next-token prediction. Understand what the model sees and what it's trying to predict.

Outcome

Learner can explain what the model is predicting and why, trace from raw text to token IDs, and describe the vocabulary.

Lessons

  • From Text to Tokens
  • Building a Vocabulary
  • BOS and Special Tokens
  • Next-Token Prediction
  • What Goes In, What Comes Out

Token Trail Cleared

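A taste of this module in code: a hypothetical character-level tokenizer with a reserved BOS token. The course's real vocabulary and special-token handling may differ; only the text-to-IDs-to-targets pipeline is the point.

```python
# Assumed toy corpus; the real dataset is whatever the course uses.
docs = ["hi", "ha"]
chars = sorted(set("".join(docs)))
BOS = 0                                   # reserve ID 0 for the BOS token
stoi = {ch: i + 1 for i, ch in enumerate(chars)}

def encode(doc):
    """Raw text -> token IDs, with BOS prepended to mark the document start."""
    return [BOS] + [stoi[ch] for ch in doc]

ids = encode("hi")
print(ids)    # [0, 2, 3] with this toy vocabulary ('a'->1, 'h'->2, 'i'->3)

# Next-token prediction pairs: at each position, the target is the next ID.
pairs = list(zip(ids[:-1], ids[1:]))
print(pairs)  # [(0, 2), (2, 3)]
```

That last line is the whole training signal: given everything so far, predict the very next token.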
MODULE 2

Gradient Gorge

Bigram intuition, loss functions, autograd, backpropagation, and learning dynamics. Where the actual learning happens.

Outcome

Learner can explain where learning happens, how parameters update, and what the loss function is measuring.

Lessons

  • The Bigram Baseline
  • Cross-Entropy Loss
  • Autograd Under the Hood
  • Backpropagation Step by Step
  • Learning Rate and Dynamics

Survived Gradient Gorge

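The heart of this module, sketched by hand: cross-entropy on a softmax, and one gradient-descent step that lowers the loss. The gradient formula (probabilities minus one-hot target) is the standard result the autograd lessons derive; the 3-token vocabulary and learning rate here are made up for illustration.

```python
import math

logits = [0.0, 0.0, 0.0]   # model's raw scores over a 3-token vocabulary
target = 2                 # the token that actually came next

def loss_and_grad(logits, target):
    """Cross-entropy loss of softmax(logits) against target, plus its gradient."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    loss = -math.log(probs[target])              # cross-entropy
    # d(loss)/d(logit_j) = probs_j - 1[j == target]
    grad = [p - (1.0 if j == target else 0.0) for j, p in enumerate(probs)]
    return loss, grad

lr = 0.5
before, grad = loss_and_grad(logits, target)
logits = [l - lr * g for l, g in zip(logits, grad)]  # one gradient-descent step
after, _ = loss_and_grad(logits, target)
print(before, after)   # loss drops after the update
```

With uniform logits the starting loss is ln(3), and a single step pushes probability toward the target. Real training just repeats this, with autograd computing the gradients through every layer.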
MODULE 3

Attention Pass

Embeddings, positional information, self-attention, residual connections, MLP blocks, and layer normalization. The transformer core.

Outcome

Learner can trace a complete forward pass and explain every major component of the transformer architecture.

Lessons

  • Token and Position Embeddings
  • Self-Attention from Scratch
  • Multi-Head Attention
  • Residual Connections
  • MLP and Layer Norm

Reached Attention Pass

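A stripped-down version of the idea at the core of this module: causal self-attention in plain Python. For brevity, queries, keys, and values are all just the raw token vectors; real attention, including microGPT's, uses learned projections and multiple heads.

```python
import math

def attention(x):
    """x: list of T token vectors. Each position attends to positions at or
    before it (causal mask) and returns a softmax-weighted average of them."""
    T, d = len(x), len(x[0])
    out = []
    for t in range(T):
        # scaled dot-product scores against positions 0..t only
        scores = [sum(a * b for a, b in zip(x[t], x[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]          # softmax over the past
        out.append([sum(w * x[s][i] for s, w in enumerate(weights))
                    for i in range(d)])
    return out

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(x):
    print(row)   # position 0 can only attend to itself, so row 0 equals x[0]
```

Everything else in the transformer block (residual connections, the MLP, layer norm) wraps around this mixing step.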
MODULE 4

Optimizer Ridge

Adam optimizer, training loops, logits interpretation, sampling strategies, temperature, and inference. From training to generation.

Outcome

Learner can configure training, interpret logits, and generate text with different sampling strategies.

Lessons

  • The Training Loop
  • Adam Optimizer
  • Interpreting Logits
  • Temperature and Sampling
  • From Training to Inference

Optimizer Ridge Conquered

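One piece of this module, sketched: temperature sampling over raw logits. The logits here are invented; only the scale-then-softmax-then-sample recipe is the point.

```python
import math
import random

def sample(logits, temperature=1.0):
    """Divide logits by temperature, softmax, then sample one token ID.
    Low temperature sharpens the distribution; high temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    r = random.random()                 # inverse-CDF sampling
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

random.seed(0)
logits = [2.0, 1.0, 0.1]
print([sample(logits, 0.2) for _ in range(5)])  # low temp: sharply peaked on token 0
print([sample(logits, 2.0) for _ in range(5)])  # high temp: more varied picks
```

Temperature is a single knob on the same logits the loss was computed from, which is why this module treats training and generation as two views of one object.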
MODULE 5

Summit Project

Modify the model, swap the dataset or architecture, and build your own tiny GPT variant. Ship something real.

Outcome

Learner ships a working variant and can defend every modification they made.

Lessons

  • Choosing Your Variant
  • Dataset Preparation
  • Architecture Modifications
  • Training Your Variant
  • Capstone Defense

Summit Unlocked

BONUS

Beyond the Map

What scales from tiny GPTs to real-world systems, and what changes in production. The bridge from learning to building.

Outcome

Learner understands the gap between microGPT and production systems, and knows where to go next.

Lessons

  • From Tiny to Large
  • What Changes at Scale
  • Production Considerations
  • Where to Go Next

I Followed the Karpath