Module 2: Transformer Architecture

From static embeddings to dynamic attention - understanding the architecture that powers modern LLMs

Interactive Transformer Explorations

This module takes you inside the transformer architecture through hands-on visualizations. See how attention mechanisms enable dynamic focus, how position embeddings solve order problems, and how the complete transformer block processes language.

13+ interactive demos · 4 core sessions · 8+ key concepts · 0 prerequisites

2.0 The Big Picture: Generative Search Engines

Conceptual overview of transformers as generative search systems. 3 visualizations:

- Generative Search Engine Demo: see how transformers differ from traditional search engines (Tags: Concepts, Search, Generation)
- Architecture Evolution: trace the evolution from Bengio's model to modern transformers and understand the scaling advantages (Tags: Evolution, Bengio to Transformer, Scaling)
- Attention vs Knowledge Storage: an interactive comparison of the attention mechanism and FFN knowledge storage (Tags: Architecture, Concepts)

2.1 From Text to Transformer Inputs

Tokenization, knowledge storage, and the selection problem. 3 visualizations:

- Tokenization Explorer: discover how subword tokenization handles the long tail of language (Tags: Tokenization, BPE, Power Laws); a minimal BPE sketch follows this list
- FFN Knowledge Storage: see how feed-forward networks store knowledge through their expand-contract architecture (Tags: FFN, Knowledge, Architecture); a toy FFN sketch also follows this list
- Selection Problem Demo: why Bengio's fixed concatenation fails for dynamic language understanding (Tags: Selection, Context, Problems)
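
The Tokenization Explorer above is built around subword tokenization. As a rough companion, here is a minimal byte-pair-encoding (BPE) sketch, assuming the standard merge-the-most-frequent-pair procedure; the toy corpus, the `</w>` end-of-word marker, and the number of merges are illustrative choices, not taken from the demo itself.

```python
# Toy BPE sketch: repeatedly merge the most frequent adjacent symbol pair.
# Corpus and parameters are illustrative assumptions.
from collections import Counter

def bpe_merges(words, num_merges=10):
    # Start from character-level tokens; "</w>" marks a word boundary.
    corpus = [list(w) + ["</w>"] for w in words]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for toks in corpus:
            for a, b in zip(toks, toks[1:]):
                pairs[(a, b)] += 1
        if not pairs:
            break
        best = max(pairs, key=pairs.get)   # most frequent adjacent pair
        merges.append(best)
        merged = "".join(best)
        for toks in corpus:                # apply the merge in place
            i = 0
            while i < len(toks) - 1:
                if (toks[i], toks[i + 1]) == best:
                    toks[i:i + 2] = [merged]
                else:
                    i += 1
    return merges

print(bpe_merges(["low", "lower", "lowest", "newest", "widest"], num_merges=5))
```

Frequent character sequences become single tokens after a few merges, which is how subword vocabularies cover the long tail of rare words without storing every word form.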
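
The FFN Knowledge Storage demo describes an expand-contract sublayer. Below is a minimal NumPy sketch of that shape, assuming the common conventions of a 4x expansion and a GELU nonlinearity; all dimensions and weights here are illustrative.

```python
# Minimal sketch of a transformer feed-forward (FFN) sublayer: expand the
# hidden dimension (commonly ~4x), apply a nonlinearity, then contract back.
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, W1, b1, W2, b2):
    # x: (seq_len, d_model); W1: (d_model, 4*d_model); W2: (4*d_model, d_model)
    return gelu(x @ W1 + b1) @ W2 + b2

d_model = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(3, d_model))
W1 = rng.normal(size=(d_model, 4 * d_model)) * 0.1
W2 = rng.normal(size=(4 * d_model, d_model)) * 0.1
out = ffn(x, W1, np.zeros(4 * d_model), W2, np.zeros(d_model))
print(out.shape)  # (3, 8): same shape in and out
```

The output shape matches the input, so these sublayers can be stacked; whatever the sublayer "stores" lives entirely in the weight matrices W1 and W2.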

2.2 Attention and the Transformer Block

Understanding attention mechanisms and the complete transformer block architecture. 4 visualizations:

- Attention Visualizer: an interactive exploration of attention weights and dynamic selection (Tags: Attention, Weights, Selection); a single-head attention sketch follows this list
- Position Embeddings: why attention needs position information to understand word order (Tags: Position, Order, Embeddings); a sinusoidal-embedding sketch follows this list
- Multi-Head Attention: how different heads specialize in different types of relationships (Tags: Multi-Head, Specialization, Parallel); a head-splitting sketch follows this list
- Transformer Block Builder: build and understand the complete transformer block architecture (Tags: Architecture, Builder, Complete)
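
For the Attention Visualizer, here is a minimal single-head scaled dot-product attention sketch: each query scores every key, softmax turns the scores into weights, and the output is a weighted sum of the values. The shapes and random inputs are illustrative only.

```python
# Minimal scaled dot-product attention over one head.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (seq, seq) pairwise relevance
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights          # weighted mix of values

rng = np.random.default_rng(0)
seq, d_k = 4, 8
Q, K, V = (rng.normal(size=(seq, d_k)) for _ in range(3))
out, w = attention(Q, K, V)
print(w.round(2))  # rows are the "dynamic selection" weights per position
```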
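
For the Position Embeddings demo, here is a short sketch of the sinusoidal scheme from the original transformer paper: each position gets a distinct pattern of sines and cosines, which is added to the token embeddings so attention, otherwise order-blind, can see word order. Dimensions are illustrative.

```python
# Sinusoidal position embeddings (the original transformer recipe).
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]            # (seq, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even dims: sine
    pe[:, 1::2] = np.cos(angles)                 # odd dims: cosine
    return pe

pe = sinusoidal_positions(seq_len=6, d_model=8)
print(pe.shape)  # (6, 8): added to token embeddings before attention
```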
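
For Multi-Head Attention, the sketch below shows only the head-splitting mechanics: the model dimension is split into several smaller heads that attend independently and are then concatenated. The learned projections (W_Q, W_K, W_V, W_O) are omitted to keep it short, so this is the reshape-and-attend skeleton rather than a full implementation, and random inputs will not show the specialization the demo explores.

```python
# Multi-head attention as a reshape: split d_model into n_heads smaller
# heads, attend per head, then concatenate the head outputs.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head(Q, K, V, n_heads):
    seq, d_model = Q.shape
    d_head = d_model // n_heads
    def split(M):  # (seq, d_model) -> (n_heads, seq, d_head)
        return M.reshape(seq, n_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(Q), split(K), split(V)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # per-head scores
    out = softmax(scores) @ v                            # (n_heads, seq, d_head)
    return out.transpose(1, 0, 2).reshape(seq, d_model)  # concatenate heads

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 16)) for _ in range(3))
print(multi_head(Q, K, V, n_heads=4).shape)  # (6, 16)
```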

2.3 Training and Scaling Modern LLMs

Scaling laws and the supervised fine-tuning transformation. 3 visualizations:

- Scaling Laws Explorer: discover how AI performance improves predictably with scale (Tags: Scaling, Power Laws, Performance); a toy power-law sketch follows this list
- SFT Transformation: see how supervised fine-tuning transforms text predictors into assistants (Tags: SFT, Training, Assistant); a loss-masking sketch also follows this list
- Pre-training vs Fine-tuning: an interactive comparison of the two training phases and their effects (Tags: Pre-training, Fine-tuning, Comparison)
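
For the Scaling Laws Explorer, here is a toy power-law curve of the general form scaling-law studies fit: loss falls as a power of model size, which looks like a straight line on log-log axes. The constants below are made up for illustration and are not fitted values from any paper.

```python
# Toy scaling-law curve: L(N) = a + c / N**b, with an irreducible loss floor
# a and scaling exponent b. All constants are illustrative, not fitted.
def power_law_loss(n_params, a=1.7, b=0.3, c=400.0):
    return a + c / n_params**b

for n in [1e7, 1e8, 1e9, 1e10, 1e11]:
    print(f"{n:>10.0e} params -> loss {power_law_loss(n):.3f}")
```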
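
For the SFT Transformation demo, the sketch below shows one mechanical detail commonly used in supervised fine-tuning: the next-token objective is kept, but the loss is masked on the prompt so the model is trained only to produce the assistant response. The token strings and masking convention here are illustrative assumptions, not the course's actual training setup.

```python
# SFT data layout sketch: same next-token objective as pre-training, but loss
# is typically weighted 0 on prompt tokens and 1 on response tokens.
prompt_tokens = ["<user>", "What", "is", "2+2", "?", "<assistant>"]
response_tokens = ["4", "<eos>"]

tokens = prompt_tokens + response_tokens
loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)

for tok, m in zip(tokens, loss_mask):
    print(f"{tok:12s} loss_weight={m}")
```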