Learning to Play Tic-Tac-Toe with Jax

In this article we’ll learn how to train a neural network to play Tic-Tac-Toe using reinforcement learning in Jax. The article aims to be pedagogical, so the code we end up with won’t be heavily optimized, but it is still fast enough to train a model to perfect play in about 15 seconds on a laptop. The code from this page is available in the article’s GitHub repo and in a Colab notebook (although the Colab notebook runs considerably more slowly).

Playing Tic-Tac-Toe in Jax

Before we get to the fancy neural networks and reinforcement learning, we’ll first look at how a Tic-Tac-Toe game might be represented in Jax. For this we’ll use the PGX library, which implements a number of games in pure Jax. PGX represents a game’s state with a dataclass called State, which has several fields:

current_player: This is simply a 0 or a 1, and it alternates on every turn. What may be confusing is that there is no fixed relationship between player 0 and X or O. Player 0 is randomly assigned X or O in each game, and X always goes first. This is helpful because you can have your neural net always play as player 0, and it will then play as X (moving first) in roughly half of the games and as O (moving second) in the other half.

observation: This tells us what the board looks like at the current step. PGX represents it as a boolean array of shape (3, 3, 2). The first two axes index the 3x3 grid as you might expect. Along the last axis, the first channel is True wherever the current player has a piece and the second channel is True wherever the opponent has a piece. (Note that the two channels swap roles on every turn, since current_player switches.)
For example, consider the following board. X and O have two pieces each, so X (the current player) is to move:

    . O O
    . X .
    X . .

This gets represented as:

    Array([[[False, False],
            [False,  True],
            [False,  True]],
           [[False, False],
            [ True, False],
            [False, False]],
           [[ True, False],
            [False, False],
            [False, False]]], dtype=bool)

legal_action_mask: This is ...
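The (3, 3, 2) observation layout described above can be sketched in a few lines of JAX. This is not PGX’s implementation: the integer cell encoding (EMPTY, X, O) and the helpers to_observation and render are hypothetical, chosen only to illustrate the layout. Encoding the example position with X as the current player should reproduce the array above, and encoding the same position from O’s perspective simply swaps the two channels.

```python
import jax.numpy as jnp

# Hypothetical cell encoding used only in this sketch (not PGX's internals).
EMPTY, X, O = 0, 1, 2


def to_observation(board, current_mark):
    """Encode a (3, 3) int board as a (3, 3, 2) bool observation.

    Channel 0 is True where the current player has a piece,
    channel 1 is True where the opponent has a piece.
    """
    # jnp.where keeps this usable if current_mark is ever a traced value.
    opponent_mark = jnp.where(current_mark == X, O, X)
    return jnp.stack([board == current_mark, board == opponent_mark], axis=-1)


def render(obs, current_symbol="X", opponent_symbol="O"):
    """Turn a (3, 3, 2) observation back into three rows of text."""
    rows = []
    for r in range(3):
        cells = []
        for c in range(3):
            if obs[r, c, 0]:      # current player's piece
                cells.append(current_symbol)
            elif obs[r, c, 1]:    # opponent's piece
                cells.append(opponent_symbol)
            else:                 # empty cell
                cells.append(".")
        rows.append(" ".join(cells))
    return "\n".join(rows)


# The example position: two X's and two O's, so X moves next.
board = jnp.array([[EMPTY, O, O],
                   [EMPTY, X, EMPTY],
                   [X, EMPTY, EMPTY]])

obs_x = to_observation(board, X)  # X's perspective: matches the array above
obs_o = to_observation(board, O)  # O's perspective: same board, channels swapped

print(render(obs_x))
# . O O
# . X .
# X . .
```

Calling render(obs_o, current_symbol="O", opponent_symbol="X") produces the same picture, since only the channel order changed. One consequence of keeping the mover’s pieces in channel 0 is that a network reading these observations always sees the board from the point of view of the player who is about to move.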