Re: jsonb and nested hstore

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: jsonb and nested hstore
Date: 2014-03-11 22:58:30
Message-ID: 531F9516.8080808@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've spent a few hours stress-testing this a bit - loading a mail
archive with ~1M of messages (with headers stored in a jsonb column) and
then doing queries on that. Good news - no crashes or any such issues so
far. The queries that I ran manually seem to return sane results.

The only problem I ran into is with limited index row size with GIN
indexes. I understand it's not a bug, but I admit I haven't realized I
might run into it in this case ...

The data I used for testing is just a bunch of e-mail messages, with
headers stored as jsonb, so each row has something like this in
"headers" column:

{
"from" : "John Doe <john(at)example(dot)com>",
"to" : ["Jane Doe <jane(at)example(dot)com>", "Jack Doe <jack(at)example(dot)com>"],
"cc" : ...,
"bcc" : ...,
... various other headers ...
}

The snag is that some of the header values may be very long, exceeding
the limit of 1352 bytes and causing errors like this:

ERROR: index row size 1416 exceeds maximum 1352 for index "gin_idx"

A good example of such header is "dkim-signature" which basically
contains the whole message digitally signed with DKIM. The signature
tends to be long and non-compressible, thanks to the signature.

I'm wondering what's the best way around this, because I suspect many
new users (especially those attracted by jsonb and GIN improvements)
will run into this. Maybe not immediately, but eventully they'll try to
insert a jsonb with long value, and it will fail ...

With btree indexes on text I would probably create an index on
substr(column,0,1000) or something like that, but doing that with JSON
seems a bit strange.

I assume we need to store the actual values in the GIN index (so a hash
is not sufficient), right?

GIST indexes work, but with that I have to give up the significant
performance gains that we got thanks to Alexander's GIN patches.

regards
Tomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-03-11 23:41:00 Re: db_user_namespace a "temporary measure"
Previous Message Tomas Vondra 2014-03-11 22:09:24 Re: GIN improvements part2: fast scan