Re: Hstore: Query speedups with Gin index

Lists: pgsql-hackers
From: Blake Smith <blakesmith0(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Hstore: Query speedups with Gin index
Date: 2013-08-22 14:55:45
Message-ID: CAPxT4eEe-MNs4FbVN1Le_nWsTgyw3N38tENbN5W7GPBrWB7cJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hey everyone,

I'm looking for feedback on a contrib/hstore patch.

We've been experiencing slow "@>" queries involving an hstore column that's
covered by a Gin index. At the current postgresql git HEAD, the hstore <->
gin interface produces the following text items to be indexed:

hstore: "'a'=>'1234', 'b'=>'test'"
Produces indexed text items: "Ka", "V1234", "Kb", "Vtest"

For the size of our production table (10s of millions of rows), I observed
significant query speedups by changing the index strategy to the following:

hstore: "'a'=>'1234', 'b'=>'test'"
Produces indexed text items: "Ka", "KaV1234", "Kb", "KbVtest"

The combined entry is used to support "contains (@>)" queries, and the key
only item is used to support "key contains (?)" queries. This change seems
to help especially with hstore keys that have high cardinalities. Downsides
of this change is that it requires an index rebuild, and the index will be
larger in size.

Patch attached. Any thoughts on this change?

Thanks,

Blake

Attachment Content-Type Size
hstore_gin_speedup.patch application/octet-stream 3.4 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Blake Smith <blakesmith0(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-08-26 01:36:54
Message-ID: CAB7nPqTZdPCY-hh9kr8L2MzW1cEeJukEbutiP3dMhhrkKXyF+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:
> We've been experiencing slow "@>" queries involving an hstore column that's
> covered by a Gin index. At the current postgresql git HEAD, the hstore <->
> gin interface produces the following text items to be indexed:
>
> hstore: "'a'=>'1234', 'b'=>'test'"
> Produces indexed text items: "Ka", "V1234", "Kb", "Vtest"
>
> For the size of our production table (10s of millions of rows), I observed
> significant query speedups by changing the index strategy to the following:
What is the order of the speedup?

> hstore: "'a'=>'1234', 'b'=>'test'"
> Produces indexed text items: "Ka", "KaV1234", "Kb", "KbVtest"
I am not a gin expert, but do you see the same speedup for tables with
a lower number of rows, or even a degradation in performance?

> The combined entry is used to support "contains (@>)" queries, and the key
> only item is used to support "key contains (?)" queries. This change seems
> to help especially with hstore keys that have high cardinalities. Downsides
> of this change is that it requires an index rebuild, and the index will be
> larger in size.
Index rebuild would be a problem only for minor releases, this patch
would be applied only on the current master branch for 9.4 and above.

> Patch attached. Any thoughts on this change?
Please add your patch to the next commit fest that will begin in 3
weeks so as you could get more formal review.
https://commitfest.postgresql.org/action/commitfest_view?id=19

Regards,
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Blake Smith <blakesmith0(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-08-26 02:11:50
Message-ID: 5170.1377483110@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:
>> The combined entry is used to support "contains (@>)" queries, and the key
>> only item is used to support "key contains (?)" queries. This change seems
>> to help especially with hstore keys that have high cardinalities. Downsides
>> of this change is that it requires an index rebuild, and the index will be
>> larger in size.

> Index rebuild would be a problem only for minor releases,

That's completely false; people have expected major releases to be
on-disk-compatible for several years now. While there probably will be
future releases in which we are willing to break storage compatibility,
a contrib module doesn't get to dictate that.

What might be a practical solution, especially if this isn't always a
win (which seems likely given the index-bloat risk), is to make hstore
offer two different GIN index opclasses, one that works the traditional
way and one that works this way.

Another thing that needs to be taken into account here is Oleg and
Teodor's in-progress work on extending hstore:
https://www.pgcon.org/2013/schedule/events/518.en.html
I'm not sure if this patch would conflict with that at all, but it
needs to be considered.

regards, tom lane


From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Blake Smith <blakesmith0(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-08-26 12:26:40
Message-ID: CAF4Au4zNqezd2Vx5By11J+YDB=uCLF-Z8N1MRYOcgO65Ns1yNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael, take a look on http://obartunov.livejournal.com/171959.html
As for the indexing stuff we already thought many times about key&value
mixing, but real solution, probably, could come from spgist and gin
combination. I mean, spgist (suffix array) instead of btree for avoiding
key duplication, which is real stopper for key.value mixing, especially,
for deep nesting structures. We'll research further and probably will
develop a prototype of such hybrid search tree.

Oleg

On Mon, Aug 26, 2013 at 6:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com>
> wrote:
> >> The combined entry is used to support "contains (@>)" queries, and the
> key
> >> only item is used to support "key contains (?)" queries. This change
> seems
> >> to help especially with hstore keys that have high cardinalities.
> Downsides
> >> of this change is that it requires an index rebuild, and the index will
> be
> >> larger in size.
>
> > Index rebuild would be a problem only for minor releases,
>
> That's completely false; people have expected major releases to be
> on-disk-compatible for several years now. While there probably will be
> future releases in which we are willing to break storage compatibility,
> a contrib module doesn't get to dictate that.
>
> What might be a practical solution, especially if this isn't always a
> win (which seems likely given the index-bloat risk), is to make hstore
> offer two different GIN index opclasses, one that works the traditional
> way and one that works this way.
>
> Another thing that needs to be taken into account here is Oleg and
> Teodor's in-progress work on extending hstore:
> https://www.pgcon.org/2013/schedule/events/518.en.html
> I'm not sure if this patch would conflict with that at all, but it
> needs to be considered.
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Blake Smith <blakesmith0(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-08-28 17:31:22
Message-ID: 20130828173122.GC27334@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:
> >> The combined entry is used to support "contains (@>)" queries, and the key
> >> only item is used to support "key contains (?)" queries. This change seems
> >> to help especially with hstore keys that have high cardinalities. Downsides
> >> of this change is that it requires an index rebuild, and the index will be
> >> larger in size.
>
> > Index rebuild would be a problem only for minor releases,
>
> That's completely false; people have expected major releases to be
> on-disk-compatible for several years now. While there probably will be
> future releases in which we are willing to break storage compatibility,
> a contrib module doesn't get to dictate that.
>
> What might be a practical solution, especially if this isn't always a
> win (which seems likely given the index-bloat risk), is to make hstore
> offer two different GIN index opclasses, one that works the traditional
> way and one that works this way.
>
> Another thing that needs to be taken into account here is Oleg and
> Teodor's in-progress work on extending hstore:
> https://www.pgcon.org/2013/schedule/events/518.en.html
> I'm not sure if this patch would conflict with that at all, but it
> needs to be considered.

We can disallow in-place upgrades for clusters that use certain contrib
modules --- we have done that in the past.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Blake Smith <blakesmith0(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-08-28 17:40:13
Message-ID: 20130828174013.GA32052@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
> On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:
> > >> The combined entry is used to support "contains (@>)" queries, and the key
> > >> only item is used to support "key contains (?)" queries. This change seems
> > >> to help especially with hstore keys that have high cardinalities. Downsides
> > >> of this change is that it requires an index rebuild, and the index will be
> > >> larger in size.
> >
> > > Index rebuild would be a problem only for minor releases,
> >
> > That's completely false; people have expected major releases to be
> > on-disk-compatible for several years now. While there probably will be
> > future releases in which we are willing to break storage compatibility,
> > a contrib module doesn't get to dictate that.
> >
> > What might be a practical solution, especially if this isn't always a
> > win (which seems likely given the index-bloat risk), is to make hstore
> > offer two different GIN index opclasses, one that works the traditional
> > way and one that works this way.
> >
> > Another thing that needs to be taken into account here is Oleg and
> > Teodor's in-progress work on extending hstore:
> > https://www.pgcon.org/2013/schedule/events/518.en.html
> > I'm not sure if this patch would conflict with that at all, but it
> > needs to be considered.
>
> We can disallow in-place upgrades for clusters that use certain contrib
> modules --- we have done that in the past.

But that really cannot be acceptable for hstore. The probably most
widely used extension there is.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Blake Smith <blakesmith0(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-03 14:24:07
Message-ID: CAPxT4eHJfC=kTr17spGKuHTy_5C0U3U8uC3Cmn-V+k++7CT7=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for the feedback everyone. I've attached the patch that we are now
running in production to service our hstore include queries. We rebuilt the
index to account for the on-disk incompatibility. I've submitted the patch
to commitfest here:
https://commitfest.postgresql.org/action/patch_view?id=1203

Michael: I don't have a formal benchmark, but several of our worst queries
went from 10-20 seconds per query down to 50-400 ms. These are numbers
we've seen when testing real production queries against our production
dataset with real world access patterns.
Oleg: Thanks for your thoughts on this change. As for the spgist / gin work
you're doing, is there anything you need help with or are you still in the
research phase? I'd love to help get something more robust merged into
mainline if you think there's collaborative work to be done (even if it's
only user testing).

Thanks,

Blake

On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> > > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com>
> wrote:
> > > >> The combined entry is used to support "contains (@>)" queries, and
> the key
> > > >> only item is used to support "key contains (?)" queries. This
> change seems
> > > >> to help especially with hstore keys that have high cardinalities.
> Downsides
> > > >> of this change is that it requires an index rebuild, and the index
> will be
> > > >> larger in size.
> > >
> > > > Index rebuild would be a problem only for minor releases,
> > >
> > > That's completely false; people have expected major releases to be
> > > on-disk-compatible for several years now. While there probably will be
> > > future releases in which we are willing to break storage compatibility,
> > > a contrib module doesn't get to dictate that.
> > >
> > > What might be a practical solution, especially if this isn't always a
> > > win (which seems likely given the index-bloat risk), is to make hstore
> > > offer two different GIN index opclasses, one that works the traditional
> > > way and one that works this way.
> > >
> > > Another thing that needs to be taken into account here is Oleg and
> > > Teodor's in-progress work on extending hstore:
> > > https://www.pgcon.org/2013/schedule/events/518.en.html
> > > I'm not sure if this patch would conflict with that at all, but it
> > > needs to be considered.
> >
> > We can disallow in-place upgrades for clusters that use certain contrib
> > modules --- we have done that in the past.
>
> But that really cannot be acceptable for hstore. The probably most
> widely used extension there is.
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

--
Blake Smith
http://blakesmith.me
@blakesmith

Attachment Content-Type Size
hstore_gin_speedup.patch application/octet-stream 3.4 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Blake Smith <blakesmith0(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-04 01:04:58
Message-ID: 1378256698.28745.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2013-09-03 at 09:24 -0500, Blake Smith wrote:
> Thanks for the feedback everyone. I've attached the patch that we are
> now running in production to service our hstore include queries. We
> rebuilt the index to account for the on-disk incompatibility. I've
> submitted the patch to commitfest
> here: https://commitfest.postgresql.org/action/patch_view?id=1203

I'm getting the attached failure from the hstore regression test.

Attachment Content-Type Size
regression.diffs text/x-patch 638.8 KB

From: Blake Smith <blakesmith0(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-05 18:42:56
Message-ID: CAPxT4eEYDTz2sY+tEwf+nZrp=Z5zxaWgq+Bm_WUhfi7Qf+LkDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Peter,

Thanks for checking the tests. I wasn't able to duplicate your test
results. Did you run the hstore regression tests with the revised patch I
attached in the thread? Attached is the output I got with the latest patch
applied.

Thanks!

Blake

On Tue, Sep 3, 2013 at 8:04 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> On Tue, 2013-09-03 at 09:24 -0500, Blake Smith wrote:
> > Thanks for the feedback everyone. I've attached the patch that we are
> > now running in production to service our hstore include queries. We
> > rebuilt the index to account for the on-disk incompatibility. I've
> > submitted the patch to commitfest
> > here: https://commitfest.postgresql.org/action/patch_view?id=1203
>
> I'm getting the attached failure from the hstore regression test.
>
>
>
>

--
Blake Smith
http://blakesmith.me
@blakesmith

Attachment Content-Type Size
regressions.diff application/octet-stream 1.0 KB

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Blake Smith <blakesmith0(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-06 18:47:31
Message-ID: CAF4Au4z05LhmA-=BzwPAx6stcu0Qkzd+jm231NEcNHcbuoyN4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Blake,

I think it's better to implement this patch as a separate opclass, so users
will have option to choose indexing.

Oleg

On Tue, Sep 3, 2013 at 6:24 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:

> Thanks for the feedback everyone. I've attached the patch that we are now
> running in production to service our hstore include queries. We rebuilt the
> index to account for the on-disk incompatibility. I've submitted the patch
> to commitfest here:
> https://commitfest.postgresql.org/action/patch_view?id=1203
>
> Michael: I don't have a formal benchmark, but several of our worst queries
> went from 10-20 seconds per query down to 50-400 ms. These are numbers
> we've seen when testing real production queries against our production
> dataset with real world access patterns.
> Oleg: Thanks for your thoughts on this change. As for the spgist / gin
> work you're doing, is there anything you need help with or are you still in
> the research phase? I'd love to help get something more robust merged into
> mainline if you think there's collaborative work to be done (even if it's
> only user testing).
>
> Thanks,
>
> Blake
>
>
>
>
> On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
>> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
>> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
>> > > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <
>> blakesmith0(at)gmail(dot)com> wrote:
>> > > >> The combined entry is used to support "contains (@>)" queries, and
>> the key
>> > > >> only item is used to support "key contains (?)" queries. This
>> change seems
>> > > >> to help especially with hstore keys that have high cardinalities.
>> Downsides
>> > > >> of this change is that it requires an index rebuild, and the index
>> will be
>> > > >> larger in size.
>> > >
>> > > > Index rebuild would be a problem only for minor releases,
>> > >
>> > > That's completely false; people have expected major releases to be
>> > > on-disk-compatible for several years now. While there probably will
>> be
>> > > future releases in which we are willing to break storage
>> compatibility,
>> > > a contrib module doesn't get to dictate that.
>> > >
>> > > What might be a practical solution, especially if this isn't always a
>> > > win (which seems likely given the index-bloat risk), is to make hstore
>> > > offer two different GIN index opclasses, one that works the
>> traditional
>> > > way and one that works this way.
>> > >
>> > > Another thing that needs to be taken into account here is Oleg and
>> > > Teodor's in-progress work on extending hstore:
>> > > https://www.pgcon.org/2013/schedule/events/518.en.html
>> > > I'm not sure if this patch would conflict with that at all, but it
>> > > needs to be considered.
>> >
>> > We can disallow in-place upgrades for clusters that use certain contrib
>> > modules --- we have done that in the past.
>>
>> But that really cannot be acceptable for hstore. The probably most
>> widely used extension there is.
>>
>> Greetings,
>>
>> Andres Freund
>>
>> --
>> Andres Freund http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>
>
> --
> Blake Smith
> http://blakesmith.me
> @blakesmith
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Blake Smith <blakesmith0(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-06 19:17:37
Message-ID: 522A2A51.3090002@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9/5/13 2:42 PM, Blake Smith wrote:
> Thanks for checking the tests. I wasn't able to duplicate your test
> results. Did you run the hstore regression tests with the revised patch
> I attached in the thread? Attached is the output I got with the latest
> patch applied.

See
http://pgci.eisentraut.org/jenkins/job/postgresql_commitfest_world/46/consoleFull

Perhaps you didn't build with --enable-cassert?


From: Blake Smith <blakesmith0(at)gmail(dot)com>
To: obartunov(at)gmail(dot)com
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-09 14:55:01
Message-ID: CAPxT4eHn7BctMjiUo-LjB4-JU6hHfByeRnkn+e7pDXtx1HhOZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for getting back to me about this change Oleg. I took your advice
and reworked the patch by adding a new hstore gin opclass
(gin_hstore_combined_ops) and leaving the functionality of the default
hstore gin opclass the same. This should prevent the on-disk compatibility
issues from the first patch, and allow users to select the different
indexing method when they build the index. The hstore regression suite is
passing for me locally with the --enable-cassert configure flag. Please let
me know what you think and if there is any other work that would need to be
done (style cleanups, updating documentation, etc) to get this merged.

Thanks!

Blake

On Fri, Sep 6, 2013 at 1:47 PM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:

> Blake,
>
> I think it's better to implement this patch as a separate opclass, so
> users will have option to choose indexing.
>
> Oleg
>
>
> On Tue, Sep 3, 2013 at 6:24 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:
>
>> Thanks for the feedback everyone. I've attached the patch that we are now
>> running in production to service our hstore include queries. We rebuilt the
>> index to account for the on-disk incompatibility. I've submitted the patch
>> to commitfest here:
>> https://commitfest.postgresql.org/action/patch_view?id=1203
>>
>> Michael: I don't have a formal benchmark, but several of our worst
>> queries went from 10-20 seconds per query down to 50-400 ms. These are
>> numbers we've seen when testing real production queries against our
>> production dataset with real world access patterns.
>> Oleg: Thanks for your thoughts on this change. As for the spgist / gin
>> work you're doing, is there anything you need help with or are you still in
>> the research phase? I'd love to help get something more robust merged into
>> mainline if you think there's collaborative work to be done (even if it's
>> only user testing).
>>
>> Thanks,
>>
>> Blake
>>
>>
>>
>>
>> On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>>
>>> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
>>> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
>>> > > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>>> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <
>>> blakesmith0(at)gmail(dot)com> wrote:
>>> > > >> The combined entry is used to support "contains (@>)" queries,
>>> and the key
>>> > > >> only item is used to support "key contains (?)" queries. This
>>> change seems
>>> > > >> to help especially with hstore keys that have high cardinalities.
>>> Downsides
>>> > > >> of this change is that it requires an index rebuild, and the
>>> index will be
>>> > > >> larger in size.
>>> > >
>>> > > > Index rebuild would be a problem only for minor releases,
>>> > >
>>> > > That's completely false; people have expected major releases to be
>>> > > on-disk-compatible for several years now. While there probably will
>>> be
>>> > > future releases in which we are willing to break storage
>>> compatibility,
>>> > > a contrib module doesn't get to dictate that.
>>> > >
>>> > > What might be a practical solution, especially if this isn't always a
>>> > > win (which seems likely given the index-bloat risk), is to make
>>> hstore
>>> > > offer two different GIN index opclasses, one that works the
>>> traditional
>>> > > way and one that works this way.
>>> > >
>>> > > Another thing that needs to be taken into account here is Oleg and
>>> > > Teodor's in-progress work on extending hstore:
>>> > > https://www.pgcon.org/2013/schedule/events/518.en.html
>>> > > I'm not sure if this patch would conflict with that at all, but it
>>> > > needs to be considered.
>>> >
>>> > We can disallow in-place upgrades for clusters that use certain contrib
>>> > modules --- we have done that in the past.
>>>
>>> But that really cannot be acceptable for hstore. The probably most
>>> widely used extension there is.
>>>
>>> Greetings,
>>>
>>> Andres Freund
>>>
>>> --
>>> Andres Freund http://www.2ndQuadrant.com/
>>> PostgreSQL Development, 24x7 Support, Training & Services
>>>
>>
>>
>>
>> --
>> Blake Smith
>> http://blakesmith.me
>> @blakesmith
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>>
>

--
Blake Smith
http://blakesmith.me
@blakesmith

Attachment Content-Type Size
0001-Add-gin_hstore_combined_ops-hstore-indexing-opclass.patch application/octet-stream 9.3 KB

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Blake Smith <blakesmith0(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-10 11:58:44
Message-ID: CAF4Au4w77bbezSRTBWx2fB=RHyR3W0R7uLW8f=sifUQRECnofA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Blake,

Teodor will review your patch, but I have one consideration about the patch
in context of future hstore, which supports hierarchical structures. In
that case overhead of composite keys will be enormous and the only way in
this direction is to think about idea suffix array instead of btree to
store keys. But this is another big task and I afraid to think about this
now.

Oleg

On Mon, Sep 9, 2013 at 6:55 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:

> Thanks for getting back to me about this change Oleg. I took your advice
> and reworked the patch by adding a new hstore gin opclass
> (gin_hstore_combined_ops) and leaving the functionality of the default
> hstore gin opclass the same. This should prevent the on-disk compatibility
> issues from the first patch, and allow users to select the different
> indexing method when they build the index. The hstore regression suite is
> passing for me locally with the --enable-cassert configure flag. Please let
> me know what you think and if there is any other work that would need to be
> done (style cleanups, updating documentation, etc) to get this merged.
>
> Thanks!
>
> Blake
>
>
>
>
>
>
> On Fri, Sep 6, 2013 at 1:47 PM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
>
>> Blake,
>>
>> I think it's better to implement this patch as a separate opclass, so
>> users will have option to choose indexing.
>>
>> Oleg
>>
>>
>> On Tue, Sep 3, 2013 at 6:24 PM, Blake Smith <blakesmith0(at)gmail(dot)com>wrote:
>>
>>> Thanks for the feedback everyone. I've attached the patch that we are
>>> now running in production to service our hstore include queries. We rebuilt
>>> the index to account for the on-disk incompatibility. I've submitted the
>>> patch to commitfest here:
>>> https://commitfest.postgresql.org/action/patch_view?id=1203
>>>
>>> Michael: I don't have a formal benchmark, but several of our worst
>>> queries went from 10-20 seconds per query down to 50-400 ms. These are
>>> numbers we've seen when testing real production queries against our
>>> production dataset with real world access patterns.
>>> Oleg: Thanks for your thoughts on this change. As for the spgist / gin
>>> work you're doing, is there anything you need help with or are you still in
>>> the research phase? I'd love to help get something more robust merged into
>>> mainline if you think there's collaborative work to be done (even if it's
>>> only user testing).
>>>
>>> Thanks,
>>>
>>> Blake
>>>
>>>
>>>
>>>
>>> On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>>>
>>>> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
>>>> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
>>>> > > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>>>> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <
>>>> blakesmith0(at)gmail(dot)com> wrote:
>>>> > > >> The combined entry is used to support "contains (@>)" queries,
>>>> and the key
>>>> > > >> only item is used to support "key contains (?)" queries. This
>>>> change seems
>>>> > > >> to help especially with hstore keys that have high
>>>> cardinalities. Downsides
>>>> > > >> of this change is that it requires an index rebuild, and the
>>>> index will be
>>>> > > >> larger in size.
>>>> > >
>>>> > > > Index rebuild would be a problem only for minor releases,
>>>> > >
>>>> > > That's completely false; people have expected major releases to be
>>>> > > on-disk-compatible for several years now. While there probably
>>>> will be
>>>> > > future releases in which we are willing to break storage
>>>> compatibility,
>>>> > > a contrib module doesn't get to dictate that.
>>>> > >
>>>> > > What might be a practical solution, especially if this isn't always
>>>> a
>>>> > > win (which seems likely given the index-bloat risk), is to make
>>>> hstore
>>>> > > offer two different GIN index opclasses, one that works the
>>>> traditional
>>>> > > way and one that works this way.
>>>> > >
>>>> > > Another thing that needs to be taken into account here is Oleg and
>>>> > > Teodor's in-progress work on extending hstore:
>>>> > > https://www.pgcon.org/2013/schedule/events/518.en.html
>>>> > > I'm not sure if this patch would conflict with that at all, but it
>>>> > > needs to be considered.
>>>> >
>>>> > We can disallow in-place upgrades for clusters that use certain
>>>> contrib
>>>> > modules --- we have done that in the past.
>>>>
>>>> But that really cannot be acceptable for hstore. The probably most
>>>> widely used extension there is.
>>>>
>>>> Greetings,
>>>>
>>>> Andres Freund
>>>>
>>>> --
>>>> Andres Freund http://www.2ndQuadrant.com/
>>>> PostgreSQL Development, 24x7 Support, Training & Services
>>>>
>>>
>>>
>>>
>>> --
>>> Blake Smith
>>> http://blakesmith.me
>>> @blakesmith
>>>
>>>
>>> --
>>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>>
>>>
>>
>
>
> --
> Blake Smith
> http://blakesmith.me
> @blakesmith
>


From: David Fetter <david(at)fetter(dot)org>
To: Blake Smith <blakesmith0(at)gmail(dot)com>
Cc: obartunov(at)gmail(dot)com, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-09-16 05:37:10
Message-ID: 20130916053710.GA9493@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Blake,

I've taken the liberty of adding this patch to the current Commitfest.

In future, please continue to send patches both to this thread and to
the commitfest application when you have a message ID for them :)

Cheers,
David.
On Mon, Sep 09, 2013 at 09:55:01AM -0500, Blake Smith wrote:
> Thanks for getting back to me about this change Oleg. I took your advice
> and reworked the patch by adding a new hstore gin opclass
> (gin_hstore_combined_ops) and leaving the functionality of the default
> hstore gin opclass the same. This should prevent the on-disk compatibility
> issues from the first patch, and allow users to select the different
> indexing method when they build the index. The hstore regression suite is
> passing for me locally with the --enable-cassert configure flag. Please let
> me know what you think and if there is any other work that would need to be
> done (style cleanups, updating documentation, etc) to get this merged.
>
> Thanks!
>
> Blake
>
>
>
>
>
>
> On Fri, Sep 6, 2013 at 1:47 PM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:
>
> > Blake,
> >
> > I think it's better to implement this patch as a separate opclass, so
> > users will have option to choose indexing.
> >
> > Oleg
> >
> >
> > On Tue, Sep 3, 2013 at 6:24 PM, Blake Smith <blakesmith0(at)gmail(dot)com> wrote:
> >
> >> Thanks for the feedback everyone. I've attached the patch that we are now
> >> running in production to service our hstore include queries. We rebuilt the
> >> index to account for the on-disk incompatibility. I've submitted the patch
> >> to commitfest here:
> >> https://commitfest.postgresql.org/action/patch_view?id=1203
> >>
> >> Michael: I don't have a formal benchmark, but several of our worst
> >> queries went from 10-20 seconds per query down to 50-400 ms. These are
> >> numbers we've seen when testing real production queries against our
> >> production dataset with real world access patterns.
> >> Oleg: Thanks for your thoughts on this change. As for the spgist / gin
> >> work you're doing, is there anything you need help with or are you still in
> >> the research phase? I'd love to help get something more robust merged into
> >> mainline if you think there's collaborative work to be done (even if it's
> >> only user testing).
> >>
> >> Thanks,
> >>
> >> Blake
> >>
> >>
> >>
> >>
> >> On Wed, Aug 28, 2013 at 12:40 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
> >>
> >>> On 2013-08-28 13:31:22 -0400, Bruce Momjian wrote:
> >>> > On Sun, Aug 25, 2013 at 10:11:50PM -0400, Tom Lane wrote:
> >>> > > Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> >>> > > > On Thu, Aug 22, 2013 at 11:55 PM, Blake Smith <
> >>> blakesmith0(at)gmail(dot)com> wrote:
> >>> > > >> The combined entry is used to support "contains (@>)" queries,
> >>> and the key
> >>> > > >> only item is used to support "key contains (?)" queries. This
> >>> change seems
> >>> > > >> to help especially with hstore keys that have high cardinalities.
> >>> Downsides
> >>> > > >> of this change is that it requires an index rebuild, and the
> >>> index will be
> >>> > > >> larger in size.
> >>> > >
> >>> > > > Index rebuild would be a problem only for minor releases,
> >>> > >
> >>> > > That's completely false; people have expected major releases to be
> >>> > > on-disk-compatible for several years now. While there probably will
> >>> be
> >>> > > future releases in which we are willing to break storage
> >>> compatibility,
> >>> > > a contrib module doesn't get to dictate that.
> >>> > >
> >>> > > What might be a practical solution, especially if this isn't always a
> >>> > > win (which seems likely given the index-bloat risk), is to make
> >>> hstore
> >>> > > offer two different GIN index opclasses, one that works the
> >>> traditional
> >>> > > way and one that works this way.
> >>> > >
> >>> > > Another thing that needs to be taken into account here is Oleg and
> >>> > > Teodor's in-progress work on extending hstore:
> >>> > > https://www.pgcon.org/2013/schedule/events/518.en.html
> >>> > > I'm not sure if this patch would conflict with that at all, but it
> >>> > > needs to be considered.
> >>> >
> >>> > We can disallow in-place upgrades for clusters that use certain contrib
> >>> > modules --- we have done that in the past.
> >>>
> >>> But that really cannot be acceptable for hstore. The probably most
> >>> widely used extension there is.
> >>>
> >>> Greetings,
> >>>
> >>> Andres Freund
> >>>
> >>> --
> >>> Andres Freund http://www.2ndQuadrant.com/
> >>> PostgreSQL Development, 24x7 Support, Training & Services
> >>>
> >>
> >>
> >>
> >> --
> >> Blake Smith
> >> http://blakesmith.me
> >> @blakesmith
> >>
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >>
> >>
> >
>
>
> --
> Blake Smith
> http://blakesmith.me
> @blakesmith

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Blake Smith <blakesmith0(at)gmail(dot)com>
Cc: obartunov(at)gmail(dot)com, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hstore: Query speedups with Gin index
Date: 2013-11-30 13:08:05
Message-ID: 1385816885.27340.15.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Are you still working on this patch?