
From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: custom hash-based COUNT(DISTINCT) aggregate - unexpectedly high memory consumption
Date: 2013-10-05 18:22:54
Message-ID: 525058FE.1060609@fuzzy.cz
Lists: pgsql-hackers

Hi,

I've been playing with a custom COUNT(DISTINCT) aggregate based on hash
tables - it seems to be working as planned, except for memory
consumption, which is much higher than I expected.

The code is available at https://github.com/tvondra/count_distinct and
in short it works like this:

1) There's a simple hash table for each group (so with HashAggregate
it's effectively a hash table of hash tables). The hash table
contains only the distinct values for that particular group, so the
result is simply the number of items in the table.

2) The table starts very small (4 buckets), and when it grows too large
(more than 20 items per bucket on average) it gets resized on the fly
to 4x the size. So the number of buckets grows 4 -> 16 -> 64 -> 256
-> ... (and the max number of items in the table grows 80 -> 320 ->
1280 -> ...).
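The growth policy in (2) can be sketched like this (my own minimal
reading of it, not the extension's actual code - buckets_for is just an
illustrative helper):

#include <stdio.h>

/* Sketch of the growth policy: start at 4 buckets, and whenever the
 * average bucket load would exceed 20 items, grow the bucket array 4x. */
static unsigned buckets_for(unsigned nitems)
{
    unsigned nbuckets = 4;

    while (nitems > 20 * nbuckets)  /* avg load > 20 => resize */
        nbuckets *= 4;

    return nbuckets;
}

int main(void)
{
    /* 20k distinct values => resizes 4 -> 16 -> 64 -> 256 -> 1024 */
    printf("%u\n", buckets_for(20000));   /* prints 1024 */
    return 0;
}

So with ~20k distinct values per group, the table ends up at 1024
buckets, which matches the estimate further down.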

Let's assume I have a table like this

CREATE TABLE test_table (a int, b int);

INSERT INTO test_table SELECT mod(i,4000), (100000*random())::int
FROM generate_series(1,80000000) s(i);

And let's run a query like this:

SELECT a, count_distinct(b) FROM test_table GROUP BY a;

I.e. there are ~4k groups with ~20k distinct values for each group. This
query however consumes >5GB of memory, which is much more than I
expected, and I've spent a fair amount of time looking for memory leaks
in the code so I'm wondering if I'm missing something important.

The custom hash tables are built from these very simple structures:

typedef struct hash_element_t {
    uint32  hash;
    uint32  length;
    char   *value;          /* the actual value (say 4B for int4) */
} hash_element_t;

typedef struct hash_bucket_t {
    uint32          nitems;
    hash_element_t *items;  /* array of hash_element_t items */
} hash_bucket_t;

typedef struct hash_table_t {
    uint16          nbits;    /* number of significant bits of the hash */
    uint32          nbuckets; /* number of buckets (2^nbits) */
    uint32          nitems;   /* current number of elements in the table */
    hash_bucket_t  *buckets;  /* array of hash_bucket_t */
} hash_table_t;

I'm on 64-bit architecture and the example works with int32, which means
the sizes should be about this:

hash_element_t => 20B
hash_bucket_t => 4B + (20B * items in the bucket [in steps of 5])
hash_table_t => 4B + space for buckets
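As an aside, the raw struct sizes on a 64-bit build come out somewhat
larger than the per-struct figures above, because the compiler pads each
struct so the pointer member is 8-byte aligned. A quick standalone check
(stdint typedefs standing in for the PostgreSQL ones; sizes as on a
typical LP64 platform such as x86-64 Linux):

#include <stdio.h>
#include <stdint.h>

typedef uint32_t uint32;
typedef uint16_t uint16;

typedef struct hash_element_t {
    uint32  hash;
    uint32  length;
    char   *value;
} hash_element_t;

typedef struct hash_bucket_t {
    uint32          nitems;   /* 4B + 4B padding before the pointer */
    hash_element_t *items;
} hash_bucket_t;

typedef struct hash_table_t {
    uint16          nbits;
    uint32          nbuckets;
    uint32          nitems;
    hash_bucket_t  *buckets;
} hash_table_t;

int main(void)
{
    printf("element: %zu\n", sizeof(hash_element_t)); /* 16 (+4B value) */
    printf("bucket:  %zu\n", sizeof(hash_bucket_t));  /* 16 */
    printf("table:   %zu\n", sizeof(hash_table_t));   /* 24 */
    return 0;
}

That shifts the estimate a little, but nowhere near enough to explain
the numbers below.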

In the example above, there's ~20k unique values in each group. The
threshold is 20 items per bucket on average, so that's 1024 buckets, and
the buckets are almost full.

So for a single group, the hash table size is about

4B + 1024 * (4B + 20 * 20B) = 413700B = ~ 400 kB

There are 4000 groups, so the total estimate is ~1.6GB.
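Re-running that arithmetic mechanically (same numbers as above, wrapped
in a throwaway helper just for the check):

#include <stdio.h>

/* per-group estimate: table header + 1024 nearly-full buckets,
 * each holding 20 elements of ~20B */
static long long group_bytes(void)
{
    return 4LL + 1024LL * (4LL + 20LL * 20LL);
}

int main(void)
{
    long long total = 4000LL * group_bytes();       /* 4000 groups */

    printf("per group: %lld B\n", group_bytes());   /* 413700 */
    printf("total: %.2f GiB\n",
           total / (1024.0 * 1024.0 * 1024.0));     /* ~1.54 */
    return 0;
}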

However when executed (9.2, 9.3 and HEAD all behave exactly the same),
the query consumes ~5GB of RAM (excluding shared buffers).

These are the memory context stats: http://pastebin.com/JHjnehvC

The interesting part is

ExecutorState: 24576 total in 2 blocks; 16368 free (6 chunks); 8208 used
  ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  AggContext: 5106224640 total in 4612 blocks; 565779368 free
              (1072086 chunks); 4540445272 used
    TupleHashTable: 253952 total in 5 blocks; 55360 free
                    (16 chunks); 198592 used
  ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used

So there's ~5GB in the AggContext, ~500MB of that is free.

I'm aware that the HashAggregate itself allocates some more memory, and
although I'm not sure exactly how much that should be, I'd expect a
"small amount" compared to the 1.6GB.

So I'm wondering what uses those remaining ~3.5GB. Is this overhead of
the HashAggregate itself, or am I missing something (e.g. a memory leak
in my code)?

Tomas
