I recently worked on https://github.com/ClickHouse/ClickHouse/pull/87366. The idea is simple but learned a lot of ClickHouse aggregation, so want to jot it down.BackgroundThe original issue stated it clear, if you run almost identical queries, their performances varies a lot!milovidov-pc :) SELECT number % 10000 AS k, uniq(number) AS u FROM numbers_mt(1e9) GROUP BY k ORDER BY u DESC LIMIT 10 SELECT number % 10000 AS k, uniq(number) AS u FROM numbers_mt(1000000000.) GROUP BY k ORDER BY u DESC LIMIT 10 Query id: d3e22e17-1a32-4615-bf1a-a2da6e0510eb โโโโโkโโฌโโโโโโuโโ 1. โ 4759 โ 101196 โ 2. โ 4587 โ 101079 โ 3. โ 6178 โ 101034 โ 4. โ 6567 โ 101032 โ 5. โ 9463 โ 101013 โ 6. โ 298 โ 101009 โ 7. โ 2049 โ 100993 โ 8. โ 8167 โ 100989 โ 9. โ 5530 โ 100973 โ 10. โ 1968 โ 100973 โ โโโโโโโโดโโโโโโโโโ 10 rows in set. Elapsed: 62.793 sec. Processed 1.00 billion rows, 8.00 GB (15.93 million rows/s., 127.40 MB/s.) Peak memory usage: 11.30 GiB. milovidov-pc :) SELECT 0 + number % 10000 AS k, uniq(number) AS u FROM numbers_mt(1e9) GROUP BY k ORDER BY u DESC LIMIT 10 SELECT 0 + (number % 10000) AS k, uniq(number) AS u FROM numbers_mt(1000000000.) GROUP BY k ORDER BY u DESC LIMIT 10 Query id: e6a24292-54cf-47cb-8e39-81584736d41a โโโโโkโโฌโโโโโโuโโ 1. โ 4759 โ 101196 โ 2. โ 4587 โ 101079 โ 3. โ 6178 โ 101034 โ 4. โ 6567 โ 101032 โ 5. โ 9463 โ 101013 โ 6. โ 298 โ 101009 โ 7. โ 2049 โ 100993 โ 8. โ 8167 โ 100989 โ 9. โ 5530 โ 100973 โ 10. โ 1968 โ 100973 โ โโโโโโโโดโโโโโโโโโ 10 rows in set. Elapsed: 8.547 sec. Processed 1.00 billion rows, 8.00 GB (116.99 million rows/s., 935.95 MB/s.) Peak memory usage: 10.09 GiB.The only difference is that second query is using 0 + (number % 10000) for the group by value k.ClickHouse treats the k as UInt16 in first query and UInt64 (or other types that UInt16). But why this matters? Let's delve into the aggregation technical details a bit in ClickHouse.How Aggregation WorksWhen group by a number smaller than UInt16, could use array for hashmap.Otherwise use...
First seen: 2025-12-20 05:23
Last seen: 2025-12-20 05:23