King – man + woman is queen; but why? (2017)

Summary

Intro

word2vec is an algorithm that transforms words into vectors, so that words with similar meanings end up lying close to each other. Moreover, it allows us to use vector arithmetic to work with analogies, for example the famous king - man + woman = queen. I will try to explain how it works, with special emphasis on the meaning of vector differences, while omitting as many technicalities as possible.

If you would rather explore than read, here is an interactive exploration by my mentee Julia Bazińska, now a freshman in computer science at the University of Warsaw: Word2viz, using GloVe pre-trained vectors (it takes 30MB to load, so please be patient).

Counts, coincidences and meaning

I love letter co-occurrence in the word co-occurrence. Sometimes a seemingly naive technique gives powerful results. It turns out that merely looking at word coincidences, while ignoring all grammar and context, can give us insight into the meaning of a word. Consider this sentence:

A small, fluffy roosety climbed a tree.

What’s a roosety? I would say it is something like a squirrel, since the two words can easily be interchanged. Such reasoning is called the distributional hypothesis and can be summarized as:

"a word is characterized by the company it keeps" (John Rupert Firth)

If we want to teach this to a computer, the simplest approximate approach is to make it look only at word pairs. Let $P(a|b)$ be the conditional probability that, given a word b, there is a word a within a short distance (let’s say, separated by no more than 2 words). Then we claim that two words a and b are similar if

$P(w|a) = P(w|b)$

for every word w. In other words, if this equality holds, all other words occur with the same frequency no matter whether the word is a or b.

Even simple word counts, compared by source, can give interesting results. For example, in the lyrics of metal songs, words such as cries, eternity or ashes are popular, while words such as particularly or approximately are not,...
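The pair-counting idea can be sketched in a few lines of Python. This is a minimal illustration, not word2vec itself: it estimates $P(a|b)$ by counting, for every occurrence of b, the words at most `window` positions away and normalizing those counts. Reading "a short distance" as "at most 2 positions away" is our assumption. To compare two words it uses the total variation distance between $P(\cdot|a)$ and $P(\cdot|b)$, which is zero exactly when $P(w|a) = P(w|b)$ for every w; that choice of distance, the helper names `cooccurrence_probs` and `context_distance`, and the toy corpus are all hypothetical.

```python
from collections import Counter, defaultdict


def cooccurrence_probs(tokens, window=2):
    """Estimate P(a|b) for every pair of words from raw pair counts.

    Assumption: "within a short distance" means at most `window`
    positions away from b, on either side.
    """
    counts = defaultdict(Counter)
    for i, b in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[b][tokens[j]] += 1
    # Normalize each word's context counts into a probability distribution.
    probs = {}
    for b, ctx in counts.items():
        total = sum(ctx.values())
        probs[b] = {a: n / total for a, n in ctx.items()}
    return probs


def context_distance(probs, a, b):
    """Total variation distance between P(.|a) and P(.|b).

    It is 0 exactly when P(w|a) == P(w|b) for every word w.
    """
    pa, pb = probs.get(a, {}), probs.get(b, {})
    words = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(w, 0.0) - pb.get(w, 0.0)) for w in words)


# Toy corpus (hypothetical): "roosety" appears in the same contexts as "squirrel".
tokens = ("a small fluffy squirrel climbed a tree "
          "a small fluffy roosety climbed a tree").split()
probs = cooccurrence_probs(tokens)

print(context_distance(probs, "squirrel", "roosety"))         # 0.0
print(round(context_distance(probs, "squirrel", "tree"), 3))  # 0.333
```

In this tiny corpus "squirrel" and "roosety" keep exactly the same company, so their distance is zero, while "tree" does not. On real text the distributions are never exactly equal, so in practice one looks for words whose context distributions are close rather than identical.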
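To make the king - man + woman = queen arithmetic concrete, here is a toy sketch with hand-picked 3-dimensional vectors whose dimensions loosely mean royalty, masculinity and edibility. These numbers are invented for illustration; they are not trained word2vec or GloVe vectors, which have hundreds of dimensions and satisfy such analogies only approximately. The sketch computes vec(a) - vec(b) + vec(c) and returns the nearest remaining word by cosine similarity, skipping the three query words, as is common when evaluating analogies. The names `VECTORS`, `cosine` and `analogy` are hypothetical.

```python
import math

# Hand-made toy vectors (hypothetical, NOT trained embeddings).
# Dimensions loosely mean: (royalty, masculinity, edibility).
VECTORS = {
    "king":     [0.9,  0.8, 0.1],
    "queen":    [0.9, -0.8, 0.1],
    "prince":   [0.7,  0.9, 0.1],
    "princess": [0.7, -0.9, 0.1],
    "man":      [0.1,  0.8, 0.1],
    "woman":    [0.1, -0.8, 0.1],
    "apple":    [0.0,  0.0, 1.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Solve "a - b + c = ?" by the nearest cosine neighbour of
    vec(a) - vec(b) + vec(c), skipping the three query words."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman"))   # queen
print(analogy("queen", "woman", "man"))  # king

# The difference vectors woman - man and queen - king point the same way here.
gender = [w - m for w, m in zip(VECTORS["woman"], VECTORS["man"])]
royal_gender = [q - k for q, k in zip(VECTORS["queen"], VECTORS["king"])]
print(round(cosine(gender, royal_gender), 6))  # 1.0
```

The last lines show why the arithmetic works in this toy setting: woman - man and queen - king are the same direction, so adding that difference to king lands on queen.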