Re: Hstore: Query speedups with Gin index

From: Blake Smith <blakesmith0(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-03 14:24:07
Message-ID: CAPxT4eHJfC=kTr17spGKuHTy_5C0U3U8uC3Cmn-V+k++7CT7=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the feedback everyone. I've attached the patch that we are now
running in production to service our hstore include queries. We rebuilt the
index to account for the on-disk incompatibility. I've submitted the patch
to commitfest here:
https://commitfest.postgresql.org/action/patch_view?id=1203

Michael: I don't have a formal benchmark, but several of our worst queries
went from 10-20 seconds per query down to 50-400 ms. These are numbers
we've seen when testing real production queries against our production
dataset with real world access patterns.
Oleg: Thanks for your thoughts on this change. As for the spgist / gin work
you're doing, is there anything you need help with or are you still in the
research phase? I'd love to help get something more robust merged into
mainline if you think there's collaborative work to be done (even if it's
only user testing).

Thanks,

Blake

On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> > > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com>
> wrote:
> > > >> The combined entry is used to support "contains (@>)" queries, and
> the key
> > > >> only item is used to support "key contains (?)" queries. This
> change seems
> > > >> to help especially with hstore keys that have high cardinalities.
> Downsides
> > > >> of this change is that it requires an index rebuild, and the index
> will be
> > > >> larger in size.
> > >
> > > > Index rebuild would be a problem only for minor releases,
> > >
> > > That's completely false; people have expected major releases to be
> > > on-disk-compatible for several years now. While there probably will be
> > > future releases in which we are willing to break storage compatibility,
> > > a contrib module doesn't get to dictate that.
> > >
> > > What might be a practical solution, especially if this isn't always a
> > > win (which seems likely given the index-bloat risk), is to make hstore
> > > offer two different GIN index opclasses, one that works the traditional
> > > way and one that works this way.
> > >
> > > Another thing that needs to be taken into account here is Oleg and
> > > Teodor's in-progress work on extending hstore:
> > > https://www.pgcon.org/2013/schedule/events/518.en.html
> > > I'm not sure if this patch would conflict with that at all, but it
> > > needs to be considered.
> >
> > We can disallow in-place upgrades for clusters that use certain contrib
> > modules --- we have done that in the past.
>
> But that really cannot be acceptable for hstore. The probably most
> widely used extension there is.
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

--
Blake Smith
http://blakesmith.me
@blakesmith

Attachment Content-Type Size
hstore_gin_speedup.patch application/octet-stream 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2013-09-03 14:43:06 Re: [9.4] Make full_page_writes only settable on server start?
Previous Message Tom Lane 2013-09-03 14:13:12 Re: operator precedence issues