Module 3: Beyond Prediction

Reasoning and Alignment in Large Language Models

The Paradigm Shift

This module explores how language models transcend simple pattern prediction to develop genuine reasoning capabilities and align with human values through reinforcement learning.

From Imitation to Experience

📚 Supervised Learning: learn from labeled examples.
🎯 Reinforcement Learning: learn from feedback on outcomes.
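
To make the contrast concrete, here is a minimal toy sketch (assuming PyTorch; the vocabulary size, label, and reward below are invented for illustration): the supervised loss needs the correct label, while the RL loss only needs a scalar reward for a sampled action.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    logits = torch.randn(1, 8, requires_grad=True)  # toy scores over 8 "tokens"

    # Supervised learning: the loss requires the correct label for this input.
    label = torch.tensor([3])
    supervised_loss = F.cross_entropy(logits, label)

    # Reinforcement learning (REINFORCE): sample an action, then learn only
    # from a scalar reward on the outcome; nothing says what was "correct".
    probs = F.softmax(logits, dim=-1)
    action = torch.multinomial(probs, num_samples=1).item()
    reward = 1.0 if action == 3 else -1.0           # feedback, not a label
    rl_loss = -torch.log(probs[0, action]) * reward

    (supervised_loss + rl_loss).backward()          # both yield gradients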

At a glance: 6 interactive demos, 4 core sessions, 2 key applications, 1 paradigm shift.
3.0

Beyond Prediction - Learning Without Labels

Why language models need reinforcement learning and how it enables new capabilities

Visualization: Learning Paradigm Shift, an interactive comparison of supervised and reinforcement learning approaches.

3.1

The Alignment Problem and RLHF

How reinforcement learning from human feedback transforms text predictors into helpful assistants

Visualizations:
RLHF Pipeline Demo: a complete walkthrough of the RLHF process, from generation to policy update.
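
A heavily simplified toy of one such update step, assuming PyTorch. The linear "policy", frozen "reference", and "reward model" below are illustrative stand-ins for real networks, and production pipelines use PPO-style objectives rather than this bare REINFORCE step with a KL penalty:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    VOCAB, HIDDEN = 16, 32

    # Toy stand-ins (illustrative): the policy being tuned, a frozen
    # reference copy of the supervised model, and a learned reward model.
    policy = torch.nn.Linear(HIDDEN, VOCAB)
    reference = torch.nn.Linear(HIDDEN, VOCAB)
    reference.load_state_dict(policy.state_dict())
    reward_model = torch.nn.Linear(VOCAB, 1)
    optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

    prompt = torch.randn(HIDDEN)              # pretend prompt encoding

    # 1. Generation: sample one "token" from the current policy.
    logits = policy(prompt)
    probs = F.softmax(logits, dim=-1)
    token = torch.multinomial(probs, 1).item()

    # 2. Scoring: the reward model rates the sampled output.
    with torch.no_grad():
        one_hot = F.one_hot(torch.tensor(token), VOCAB).float()
        reward = reward_model(one_hot).item()
        ref_logprob = F.log_softmax(reference(prompt), dim=-1)[token]

    # 3. Policy update: REINFORCE on the reward, minus a KL-style penalty
    #    that keeps the policy close to the reference model.
    logprob = F.log_softmax(logits, dim=-1)[token]
    kl = (logprob - ref_logprob).detach()     # one-sample KL estimate
    loss = -(reward - 0.1 * kl) * logprob
    optimizer.zero_grad(); loss.backward(); optimizer.step()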

Preference Learning Demo: an interactive A/B testing interface showing how human preferences train reward models.
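
In code, the core of preference learning is a pairwise (Bradley-Terry style) loss that pushes the preferred response's score above the rejected one's. A minimal sketch, assuming PyTorch, with random vectors standing in for real response encodings:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    FEATURES = 32

    # Toy reward model: maps a response encoding to a scalar score.
    reward_model = torch.nn.Linear(FEATURES, 1)
    optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

    # A batch of human comparisons: for each prompt, an annotator preferred
    # one of two candidate responses (random stand-ins for encodings).
    chosen = torch.randn(8, FEATURES)     # preferred responses
    rejected = torch.randn(8, FEATURES)   # dispreferred responses

    # Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected).
    margin = reward_model(chosen) - reward_model(rejected)
    loss = -F.logsigmoid(margin).mean()
    optimizer.zero_grad(); loss.backward(); optimizer.step()
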
3.2

Beyond Pattern Matching to Reasoning

How models develop sophisticated reasoning through chain-of-thought and test-time computation

Visualizations:
Chain-of-Thought vs Direct Prediction: a side-by-side comparison showing how step-by-step reasoning improves accuracy.
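
In prompt form, the difference between the two modes is small. A toy illustration with an invented question (no model is called; the strings only show the two framings):

    question = "A shop sells pens at 3 for $2. How much do 12 pens cost?"

    # Direct prediction: push the model to commit to an answer immediately.
    direct_prompt = f"Q: {question}\nA: The answer is"

    # Chain-of-thought: invite intermediate steps first. A correct chain:
    # 12 / 3 = 4 groups of pens, and 4 * $2 = $8.
    cot_prompt = f"Q: {question}\nA: Let's think step by step."

    print(direct_prompt)
    print(cot_prompt)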

Test-Time Computation Explorer: an interactive demonstration of how "thinking time" improves solution quality.
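
One common form of test-time computation is self-consistency: sample several reasoning chains and majority-vote their final answers. A toy sketch in which the hypothetical sample_answer stands in for one sampled chain from a model:

    import random
    from collections import Counter

    random.seed(0)

    def sample_answer(question: str) -> str:
        """Hypothetical stand-in for one sampled reasoning chain; a real
        implementation would call an LLM at nonzero temperature."""
        return "8" if random.random() < 0.7 else random.choice(["6", "9"])

    def self_consistency(question: str, k: int = 25) -> str:
        """Spend more inference-time compute: sample k chains and return
        the majority answer. Larger k buys more 'thinking time'."""
        answers = [sample_answer(question) for _ in range(k)]
        return Counter(answers).most_common(1)[0][0]

    print(self_consistency("How much do 12 pens cost at 3 for $2?"))
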
3.3

From Prediction to Reasoning - The Complete Journey

Synthesis of the entire evolution from n-grams to sophisticated reasoning systems

Visualization:
Evolution Timeline Interactive: the complete journey from simple n-grams to reasoning systems, tied back to core ML principles.

Key Concepts in Module 3

🚫 Limits of Prediction
Why next-token prediction alone can't teach values, safety, or complex reasoning.
🔄 Learning from Feedback
How reinforcement learning enables models to learn from outcomes rather than just examples.
🎯 Human Alignment
Using human preferences to train helpful, harmless, and honest AI assistants through RLHF.
🧠 Chain-of-Thought
Making AI reasoning explicit and step-by-step to improve accuracy and transparency.
⚡ Test-Time Computation
Dynamic reasoning during inference that allows models to "think longer" for better solutions.
📈 The Complete Evolution
Understanding the full journey from statistical patterns to reasoning and alignment.