Lists: | pgsql-performance |
---|
From: | Sergey Burladyan <eshkinkot(at)gmail(dot)com> |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Why creating GIN table index is so slow than inserting data into empty table with the same index? |
Date: | 2009-03-24 00:59:34 |
Message-ID: | 874oxjrb0p.fsf@seb.progtech.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
example:
select version();
version
--------------------------------------------------------------------------------------------
PostgreSQL 8.3.6 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.3-3) 4.3.3
show maintenance_work_mem ;
maintenance_work_mem
----------------------
128MB
create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
INSERT 0 100000
Время: 570,110 мс
create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
CREATE INDEX
Время: 203068,314 мс
truncate a;
drop index arr_gin ;
create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
CREATE INDEX
Время: 3,246 мс
insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
INSERT 0 100000
Время: 2405,481 мс
select pg_size_pretty(pg_total_relation_size('a')) as total,
pg_size_pretty(pg_relation_size('a')) as table;
total | table
---------+---------
9792 kB | 5096 kB
203068.314 ms VS 2405.481 ms, is this behaviour normal ?
Thanks !
--
Sergey Burladyan
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Sergey Burladyan <eshkinkot(at)gmail(dot)com> |
Cc: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Why creating GIN table index is so slow than inserting data into empty table with the same index? |
Date: | 2009-03-24 03:35:12 |
Message-ID: | 18581.1237865712@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Sergey Burladyan <eshkinkot(at)gmail(dot)com> writes:
> show maintenance_work_mem ;
> maintenance_work_mem
> ----------------------
> 128MB
> create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
> insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
> create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
[ takes forever ]
Seems the reason this is so awful is that the incoming data is exactly
presorted, meaning that the tree structure that ginInsertEntry() is
trying to build degenerates to a linear list (ie, every new element
becomes the right child of the prior one). So the processing becomes
O(N^2) up till you reach maintenance_work_mem and flush the tree. With
a large setting for maintenance_work_mem it gets spectacularly slow.
I think a reasonable solution for this might be to keep an eye on
maxdepth and force a flush if that gets too large (more than a few
hundred, perhaps?). Something like this:
/* If we've maxed out our available memory, dump everything to the index */
+ /* Also dump if the tree seems to be getting too unbalanced */
- if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
+ if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
+ buildstate->accum.maxdepth > DEPTH_LIMIT)
{
The new fast-insert code likely will need a similar defense.
Comments?
regards, tom lane
From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Sergey Burladyan <eshkinkot(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Why creating GIN table index is so slow than inserting data into empty table with the same index? |
Date: | 2009-03-24 11:07:34 |
Message-ID: | 49C8BEF6.3080909@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Tom Lane wrote:
> Sergey Burladyan <eshkinkot(at)gmail(dot)com> writes:
>> show maintenance_work_mem ;
>> maintenance_work_mem
>> ----------------------
>> 128MB
>
>> create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);
>> insert into a select n, n, n, n, n, n from generate_series(1, 100000) as n;
>> create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
>
> [ takes forever ]
>
> Seems the reason this is so awful is that the incoming data is exactly
> presorted, meaning that the tree structure that ginInsertEntry() is
> trying to build degenerates to a linear list (ie, every new element
> becomes the right child of the prior one). So the processing becomes
> O(N^2) up till you reach maintenance_work_mem and flush the tree. With
> a large setting for maintenance_work_mem it gets spectacularly slow.
Yes, this is probably the same issue I bumped into a while ago:
http://archives.postgresql.org/message-id/49350A13.3020105@enterprisedb.com
> I think a reasonable solution for this might be to keep an eye on
> maxdepth and force a flush if that gets too large (more than a few
> hundred, perhaps?). Something like this:
>
> /* If we've maxed out our available memory, dump everything to the index */
> + /* Also dump if the tree seems to be getting too unbalanced */
> - if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
> + if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
> + buildstate->accum.maxdepth > DEPTH_LIMIT)
> {
>
> The new fast-insert code likely will need a similar defense.
I fooled around with a balanced tree, which solved the problem but
unfortunately made the unsorted case slower. Limiting the depth like
that should avoid that so it's worth trying.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Sergey Burladyan <eshkinkot(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Why creating GIN table index is so slow than inserting data into empty table with the same index? |
Date: | 2009-03-24 13:45:13 |
Message-ID: | 26264.1237902313@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-performance |
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> I think a reasonable solution for this might be to keep an eye on
>> maxdepth and force a flush if that gets too large (more than a few
>> hundred, perhaps?). Something like this:
> I fooled around with a balanced tree, which solved the problem but
> unfortunately made the unsorted case slower.
Yeah, rebalancing the search tree would fix that, but every balanced
tree algorithm I know about is complicated, slow, and needs extra
memory. It's really unclear that it'd be worth the trouble for a
transient data structure like this one.
regards, tom lane