Re: more than one index in a single heap pass?

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: more than one index in a single heap pass?
Date: 2009-07-14 19:33:14
Message-ID: 4A5CDD7A.3080101@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I was just wondering idly today if we could usefully build a number of
indexes at the same time in a single pass over the heap, or could it be
that we wouldn't gain much? I haven't even got around to thinking about
any syntax for it.

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-14 19:36:14
Message-ID: 20090714193614.GN4799@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
> I was just wondering idly today if we could usefully build a number of
> indexes at the same time in a single pass over the heap, or could it be
> that we wouldn't gain much? I haven't even got around to thinking about
> any syntax for it.

Could we make it work on two backends building one index each in
synchronized scans?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-14 19:50:06
Message-ID: 13828.1247601006@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Andrew Dunstan wrote:
>> I was just wondering idly today if we could usefully build a number of
>> indexes at the same time in a single pass over the heap, or could it be
>> that we wouldn't gain much? I haven't even got around to thinking about
>> any syntax for it.

> Could we make it work on two backends building one index each in
> synchronized scans?

Don't we more or less have that already?

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-14 23:04:59
Message-ID: 407d949e0907141604m30309312ua7e805a085c5d84c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 14, 2009 at 8:50 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Andrew Dunstan wrote:
>>> I was just wondering idly today if we could usefully build a number of
>>> indexes at the same time in a single pass over the heap, or could it be
>>> that we wouldn't gain much? I haven't even got around to thinking about
>>> any syntax for it.
>
>> Could we make it work on two backends building one index each in
>> synchronized scans?
>
> Don't we more or less have that already?

Wasn't that a big part of the point of the "parallel pg_restore" feature?

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-14 23:32:05
Message-ID: 4A5D1575.1070602@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Tue, Jul 14, 2009 at 8:50 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>
>>> Andrew Dunstan wrote:
>>>
>>>> I was just wondering idly today if we could usefully build a number of
>>>> indexes at the same time in a single pass over the heap, or could it be
>>>> that we wouldn't gain much? I haven't even got around to thinking about
>>>> any syntax for it.
>>>>
>>> Could we make it work on two backends building one index each in
>>> synchronized scans?
>>>
>> Don't we more or less have that already?
>>
>
> Wasn't that a big part of the point of the "parallel pg_restore" feature?
>
>

Well, yes, it's some of it, and in theory Tom's late addition of a queue
that gets all the dependencies of a table as soon as the table data is
restored should make that work better. But of course, that's not the
only time indexes are created, and each index creation command will be
doing its own heap processing, albeit that synchronised scanning will
make that lots cheaper.

As I said originally, it was just an idle thought that came to me today.

cheers

andrew


From: Glen Parker <glenebob(at)nwlink(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-15 00:01:15
Message-ID: 4A5D1C4B.1030008@nwlink.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Well, yes, it's some of it, and in theory Tom's late addition of a queue
> that gets all the dependencies of a table as soon as the table data is
> restored should make that work better. But of course, that's not the
> only time indexes are created, and each index creation command will be
> doing its own heap processing, albeit that synchronised scanning will
> make that lots cheaper.
>
> As I said originally, it was just an idle thought that came to me today.

Sounds to me like another reason to separate index definition from
creation. If an index can be defined but not yet created or valid, then
you could imagine syntax like:

DEFINE INDEX blahblah1 ON mytable (some fields);
DEFINE INDEX blahblah2 ON mytable (some other fields);
[RE]INDEX TABLE mytable;

...provided that REINDEX TABLE could recreate all indexes simultaneously
as you suggest.

-Glen


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-15 14:45:56
Message-ID: 200907151445.n6FEjuT23471@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> > Wasn't that a big part of the point of the "parallel pg_restore" feature?
> >
> >
>
> Well, yes, it's some of it, and in theory Tom's late addition of a queue
> that gets all the dependencies of a table as soon as the table data is
> restored should make that work better. But of course, that's not the
> only time indexes are created, and each index creation command will be
> doing its own heap processing, albeit that synchronised scanning will
> make that lots cheaper.
>
> As I said originally, it was just an idle thought that came to me today.

Well, TODO has:

Allow multiple indexes to be created concurrently, ideally via a
single heap scan, and have pg_restore use it

Isn't this already largely done by parallel pg_restore work?

so we have to decide if we still want that item. I think what we don't
have is a way to create multiple indexes simultaneously via SQL.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-15 14:53:24
Message-ID: 22494.1247669604@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Well, TODO has:

> Allow multiple indexes to be created concurrently, ideally via a
> single heap scan, and have pg_restore use it

> Isn't this already largely done by parallel pg_restore work?

> so we have to decide if we still want that item. I think what we don't
> have is a way to create multiple indexes simultaneously via SQL.

And if we did build that, people would be bleating because it could only
make use of one CPU. I think multiple backends using the existing
syncscan infrastructure meets this need pretty well already.

regards, tom lane


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Glen Parker <glenebob(at)nwlink(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-15 19:52:58
Message-ID: 00DEECD0-9F56-4D24-9229-13FC2D1DBBD1@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Le 15 juil. 09 à 02:01, Glen Parker a écrit :
> Sounds to me like another reason to separate index definition from
> creation. If an index can be defined but not yet created or valid,
> then you could imagine syntax like:
>
> DEFINE INDEX blahblah1 ON mytable (some fields);
> DEFINE INDEX blahblah2 ON mytable (some other fields);
> [RE]INDEX TABLE mytable;
>
> ...provided that REINDEX TABLE could recreate all indexes
> simultaneously as you suggest.

Well to me it sounded much more like:
BEGIN;
CREATE INDEX idx_a ON t(a) DEFERRED;
CREATE INDEX idx_b ON t(b) DEFERRED;
COMMIT;

And at commit time, PostgreSQL would build all the transaction indexes
in one pass over the heap, but as Tom already pointed out, using only
1 CPU. Maybe that'd be a way to limit the overall io bandwidth usage
while not consuming too many CPU resources at the same time.

I mean now we have a choice to either sync scan the table heap on
multiple CPU, saving IO but using 1 CPU per index, or to limit CPU to
only 1 but then scan the heap once per index. The intermediary option
of using 1 CPU while still making a single heap scan sure can be
worthwhile to some?

Regards,
--
dim


From: decibel <decibel(at)decibel(dot)org>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Glen Parker <glenebob(at)nwlink(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-17 16:57:51
Message-ID: C27DBC18-9BC8-499C-94BA-60FA3FD4739E@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 15, 2009, at 2:52 PM, Dimitri Fontaine wrote:
> Le 15 juil. 09 à 02:01, Glen Parker a écrit :
>> Sounds to me like another reason to separate index definition from
>> creation. If an index can be defined but not yet created or
>> valid, then you could imagine syntax like:
>>
>> DEFINE INDEX blahblah1 ON mytable (some fields);
>> DEFINE INDEX blahblah2 ON mytable (some other fields);
>> [RE]INDEX TABLE mytable;
>>
>> ...provided that REINDEX TABLE could recreate all indexes
>> simultaneously as you suggest.
>
> Well to me it sounded much more like:
> BEGIN;
> CREATE INDEX idx_a ON t(a) DEFERRED;
> CREATE INDEX idx_b ON t(b) DEFERRED;
> COMMIT;
>
> And at commit time, PostgreSQL would build all the transaction
> indexes in one pass over the heap, but as Tom already pointed out,
> using only 1 CPU. Maybe that'd be a way to limit the overall io
> bandwidth usage while not consuming too many CPU resources at the
> same time.
>
> I mean now we have a choice to either sync scan the table heap on
> multiple CPU, saving IO but using 1 CPU per index, or to limit CPU
> to only 1 but then scan the heap once per index. The intermediary
> option of using 1 CPU while still making a single heap scan sure
> can be worthwhile to some?

Here's an off-the-wall thought... since most of the CPU time is in
the sort, what about allowing a backend to fork off dedicated sort
processes? Aside from building multiple indexes at once, that
functionality could also be useful in general queries.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: decibel <decibel(at)decibel(dot)org>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Glen Parker <glenebob(at)nwlink(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: more than one index in a single heap pass?
Date: 2009-07-17 19:15:56
Message-ID: 4A60CDEC.7080506@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

decibel wrote:
> Here's an off-the-wall thought... since most of the CPU time is in the
> sort, what about allowing a backend to fork off dedicated sort
> processes? Aside from building multiple indexes at once, that
> functionality could also be useful in general queries.

Sure, that would be cool. And also a lot of work :-). The comparison
operators can be arbitrarily complex, potentially querying other tables
etc, so you would indeed need pretty much all the infrastrucutre you
need to solve the general case.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com