From Bigram to Transformer
Published:
Character-level name generation is a simple but effective task for understanding modern generative sequence models. This post walks through a PyTorch implementation that progressively builds name generators, starting from a count-based Bigram model and moving through a Neural Bigram, MLP, RNN, GRU, LSTM, and a tiny decoder-only Transformer. Inspired by Karpathy’s makemore, the project uses a names dataset on which each model learns autoregressive next-character prediction with a special start/end token. Along the way, the post covers embeddings, hidden states, gated memory, and causal self-attention. The implementation can be found in the repository.
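To make the task concrete before any neural model appears, here is a minimal sketch of the count-based bigram starting point. It is written in plain Python rather than PyTorch so it runs without dependencies, and it is not the repository's code. The toy `names` list, the choice of `.` as a single token for both start and end, and the helper names `build_training_pairs`, `count_bigrams`, and `sample_name` are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical toy dataset; the real project trains on a much larger names file.
names = ["emma", "olivia", "ava", "mia"]

# Assumption: one special token marks both the start and the end of a name.
BOUNDARY = "."


def build_training_pairs(words):
    """Return (current_char, next_char) pairs for next-character prediction."""
    pairs = []
    for word in words:
        chars = [BOUNDARY] + list(word) + [BOUNDARY]
        pairs.extend(zip(chars, chars[1:]))
    return pairs


def count_bigrams(pairs):
    """Count how often each character follows each other character."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in pairs:
        counts[cur][nxt] += 1
    return counts


def sample_name(counts, rng, max_len=20):
    """Generate one name autoregressively, one character at a time."""
    out = []
    cur = BOUNDARY  # generation always starts from the boundary token
    for _ in range(max_len):
        next_chars = list(counts[cur].keys())
        weights = [counts[cur][c] for c in next_chars]
        # Sample the next character in proportion to its observed count.
        cur = rng.choices(next_chars, weights=weights, k=1)[0]
        if cur == BOUNDARY:  # the boundary token also ends the name
            break
        out.append(cur)
    return "".join(out)


pairs = build_training_pairs(names)
counts = count_bigrams(pairs)
rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [sample_name(counts, rng) for _ in range(5)]
print(samples)
```

Every later model in the progression keeps this same loop of "predict the next character, sample it, feed it back in, stop at the end token". What changes is how the next-character distribution is computed: here it comes from a table of counts conditioned on one previous character, while the neural models replace the table with learned parameters and a longer context.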
