A course by Andrej Karpathy on building neural networks, from scratch, in code. We start with the basics of backpropagation and build up to modern deep neural networks, like GPT. In my opinion language models are an excellent place to learn deep learning, even if your intention is to eventually go to other areas like computer vision, because most of what you learn will be immediately transferable. This is why we dive into and focus on language models.

Prerequisites: solid programming (Python), intro-level math (e.g. derivative, Gaussian).

Syllabus

2h25m. This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school. (A minimal backpropagation sketch follows the syllabus.)

1h57m. We implement a bigram character-level language model, which we will further complexify in follow-up videos into a modern Transformer language model, like GPT. In this video, the focus is on (1) introducing torch.Tensor and its subtleties and use in efficiently evaluating neural networks, and (2) the overall framework of language modeling, which includes model training, sampling, and the evaluation of a loss (e.g. the negative log likelihood for classification; see the bigram sketch below).

1h15m. We implement a multilayer perceptron (MLP) character-level language model. In this video we also introduce many basics of machine learning (e.g. model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.). (The split convention is sketched below.)

1h55m. We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward pass activations, the backward pass gradients, and some of the pitfalls when they are improperly scaled. We also look at the typical diagnostic tools and visualizations you'd want to use to understand the health of your deep network. We learn why training deep neural nets can be fragile, and introduce the first modern innovation that made doing so much easier: Batch Normalization (see the forward-pass sketch below). Residual ...
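To give a taste of the first lecture, here is a minimal sketch of reverse-mode backpropagation on scalars: each operation records its local derivative rule, and backward() replays those rules in reverse topological order. The Value class and its method names here are illustrative stand-ins, not the exact code built in the video.

```python
class Value:
    """A scalar that remembers how it was produced, so gradients can flow back."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None  # leaves have nothing to propagate

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(out)/d(self) = 1 and d(out)/d(other) = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(out)/d(self) = other.data, and vice versa
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # order nodes so every node is visited after the nodes it feeds into
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0  # dL/dL = 1
        for v in reversed(topo):
            v._backward()

# tiny expression: L = a*b + c
a, b, c = Value(2.0), Value(-3.0), Value(10.0)
L = a * b + c
L.backward()
print(L.data, a.grad, b.grad, c.grad)  # 4.0 -3.0 2.0 1.0
```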
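The second lecture's framework can be sketched in a few lines of PyTorch: count character bigrams into a torch.Tensor, normalize the counts into probabilities, then evaluate the negative log likelihood of the data. The three-word toy dataset and the +1 smoothing constant below are made up for illustration; the lecture trains on a real dataset of names.

```python
import torch

words = ["emma", "olivia", "ava"]  # hypothetical toy dataset
chars = sorted(set("".join(words)))
stoi = {c: i + 1 for i, c in enumerate(chars)}
stoi["."] = 0  # '.' marks both the start and the end of a word
V = len(stoi)

# count matrix N[i, j]: how often character j follows character i
N = torch.zeros((V, V), dtype=torch.int32)
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        N[stoi[c1], stoi[c2]] += 1

# rows -> probabilities; +1 smoothing avoids log(0) on unseen bigrams
P = (N + 1).float()
P /= P.sum(dim=1, keepdim=True)

# average negative log likelihood of the data under the model (lower is better)
log_likelihood, n = 0.0, 0
for w in words:
    seq = ["."] + list(w) + ["."]
    for c1, c2 in zip(seq, seq[1:]):
        log_likelihood += torch.log(P[stoi[c1], stoi[c2]])
        n += 1
avg_nll = -(log_likelihood / n)
print(f"average nll: {avg_nll.item():.4f}")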
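The train/dev/test split mentioned in the MLP lecture is conventionally done like this: shuffle, then cut into three disjoint pieces, tuning hyperparameters on dev and touching test only for the final evaluation. The 80/10/10 proportions are the common convention, not a prescription from the course text.

```python
import random

data = list(range(1000))  # stand-in for a list of training examples
random.seed(42)
random.shuffle(data)      # shuffle before splitting so the splits are i.i.d.

n1 = int(0.8 * len(data))
n2 = int(0.9 * len(data))
train, dev, test = data[:n1], data[n1:n2], data[n2:]
print(len(train), len(dev), len(test))  # 800 100 100
```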
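Finally, a small sketch of the Batch Normalization idea named in the last lecture above: standardize each hidden unit over the batch, then let a learned gain and bias restore expressiveness. This shows training-mode statistics only; the running-mean machinery used at inference time is omitted, and the function name is illustrative.

```python
import torch

def batchnorm_forward(x, gain, bias, eps=1e-5):
    # x: (batch, features) pre-activations of one hidden layer
    mean = x.mean(dim=0, keepdim=True)           # per-feature batch mean
    var = x.var(dim=0, keepdim=True)             # per-feature batch variance
    xhat = (x - mean) / torch.sqrt(var + eps)    # roughly unit gaussian per feature
    return gain * xhat + bias                    # learnable scale and shift

x = torch.randn(32, 100) * 5 + 3                 # badly scaled activations
gain, bias = torch.ones(100), torch.zeros(100)
h = batchnorm_forward(x, gain, bias)
print(h.mean().item(), h.std().item())           # ~0 and ~1 after normalizing
```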