From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-22 07:02:32
Message-ID: CAK-MWwQxjfug8CqBjQXQJJQLQ2bb5kcagEm4GwFvVTgHkLuSBA@mail.gmail.com
Lists: pgsql-performance
Hi all,
Maybe I am completely wrong, but I have always assumed that access to an
array element in PostgreSQL should take close to constant time.
But in my tests I found that access speed degrades as O(N) of the array size.
Test case (performed on a large, otherwise idle server with 1GB work_mem to
ensure I am working with memory only):
WITH
t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000) as
g(i);
Results for N between 1 and 10,000,000 (measured in a locally connected psql
with \timing):
N:     Time:
1      5.8ms
10     5.8ms
100    5.8ms
1000   6.7ms
-- up to here everything looks reasonable
5k     21ms
10k    34ms
50k    177ms
100k   321ms
500k   4100ms
1M     8100ms
2M     22000ms
5M     61000ms
10M    220000ms  (= 22ms per single array element access)
Is this behaviour correct?
PS: What I am actually looking for is a tuplestore with constant-time access
by position, for use with recursive queries and/or PL/pgSQL, but without
resorting to C programming.
--
Maxim Boguk
Senior PostgreSQL DBA.
Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678
Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com
МойКруг: http://mboguk.moikrug.ru/
"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."
From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 05:03:26
Message-ID: 4FE9429E.5070905@krogh.cc
Lists: pgsql-performance
On 22/06/12 09:02, Maxim Boguk wrote:
> Hi all,
>
> Maybe I am completely wrong, but I have always assumed that access
> to an array element in PostgreSQL should take close to constant time.
> But in my tests I found that access speed degrades as O(N) of the array size.
>
> Test case (performed on a large, otherwise idle server with 1GB work_mem to
> ensure I am working with memory only):
>
> WITH
> t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
> SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000)
> as g(i);
>
> Results for N between 1 and 10,000,000 (measured in a locally connected psql
> with \timing):
>
> N:     Time:
> 1      5.8ms
> 10     5.8ms
> 100    5.8ms
> 1000   6.7ms
> -- up to here everything looks reasonable
> 5k     21ms
> 10k    34ms
> 50k    177ms
> 100k   321ms
> 500k   4100ms
> 1M     8100ms
> 2M     22000ms
> 5M     61000ms
> 10M    220000ms  (= 22ms per single array element access)
>
> Is this behaviour correct?
>
> PS: What I am actually looking for is a tuplestore with constant-time access
> by position, for use with recursive queries and/or PL/pgSQL, but without
> resorting to C programming.
Default column storage for large values is "compress it, and store it in
TOAST".
This is what is causing the shift. Try changing the storage of the column
to EXTERNAL instead and rerun the test:
ALTER TABLE <tablename> ALTER COLUMN <column name> SET STORAGE EXTERNAL;
The default is EXTENDED, which compresses the value, and that in turn makes
it hard to seek into the array without reading and decompressing everything.
http://www.postgresql.org/docs/9.1/static/sql-altertable.html
Let us know what you get.
Jesper
From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Jesper Krogh" <jesper(at)krogh(dot)cc>, "Maxim Boguk" <maxim(dot)boguk(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 07:53:07
Message-ID: C4DAC901169B624F933534A26ED7DF310861B5E8@JENMAIL01.ad.intershop.net
Lists: pgsql-performance
>> On 22/06/12 09:02, Maxim Boguk wrote:
>> Maybe I am completely wrong, but I have always assumed that access to an array element in PostgreSQL should take close to constant time.
>> But in my tests I found that access speed degrades as O(N) of the array size.
>> Is this behaviour correct?
> From: pgsql-performance-owner(at)postgresql(dot)org On Behalf Of Jesper Krogh
> Default column storage for large values is "compress it, and store it in TOAST".
> This is what is causing the shift. Try changing the storage of the column
> to EXTERNAL instead and rerun the test.
Hello,
I've repeated your test in a simplified form:
you are right :-(
create table t1 ( _array int[]);
alter table t1 alter _array set storage external;
insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));
create table t2 ( _array int[]);
alter table t2 alter _array set storage external;
insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));
explain analyze SELECT _array[1] FROM t1;
Total runtime: 0.125 ms
explain analyze SELECT _array[1] FROM t2;
Total runtime: 8.649 ms
best regards,
Marc Mamin
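For comparison, the same data can also be kept with the default EXTENDED storage; a sketch (the table name t2_compressed is illustrative, and the timing comment is an expectation based on Jesper's explanation, not a measured result):

```sql
-- same data as t2, but with the default EXTENDED storage
-- (the value is compressed before being moved out to TOAST)
create table t2_compressed ( _array int[]);
insert into t2_compressed SELECT ARRAY(SELECT * FROM generate_series(1,5000000));

explain analyze SELECT _array[1] FROM t2_compressed;
-- expectation: slower than the EXTERNAL t2 above, because the whole
-- array must be decompressed before any element can be read
```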
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 08:04:22
Message-ID: CAFj8pRD4jxJxr5i6WOCtRLGAwecXg-i6=taMpseuDxZh=NVx7g@mail.gmail.com
Lists: pgsql-performance
2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>
>>> On 22/06/12 09:02, Maxim Boguk wrote:
>
>>> Maybe I am completely wrong, but I have always assumed that access to an array element in PostgreSQL should take close to constant time.
>>> But in my tests I found that access speed degrades as O(N) of the array size.
>
>>> Is this behaviour correct?
Yes - accessing position n in PostgreSQL means skipping n-1 elements.
Regards
Pavel
From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Jesper Krogh" <jesper(at)krogh(dot)cc>, "Maxim Boguk" <maxim(dot)boguk(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 08:19:45
Message-ID: C4DAC901169B624F933534A26ED7DF310861B5E9@JENMAIL01.ad.intershop.net
Lists: pgsql-performance
> -----Original Message-----
> From: Pavel Stehule [mailto:pavel(dot)stehule(at)gmail(dot)com]
>
> 2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
> >>> On 22/06/12 09:02, Maxim Boguk wrote:
> >>> Maybe I am completely wrong, but I have always assumed that access to
> >>> an array element in PostgreSQL should take close to constant time.
> >>> But in my tests I found that access speed degrades as O(N) of the
> >>> array size.
> >>> Is this behaviour correct?
>
> yes - accessing position n in PostgreSQL means skipping n-1 elements
Hmmm...
How many elements need to be skipped here?
SELECT _array[1] FROM t2;
I wonder whether the time is rather spent retrieving the array itself,
before its elements are even accessed.
regards,
Marc Mamin
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 08:22:41
Message-ID: CAFj8pRCddmsCZFJgEG5zqGd5wj3RuhODJKD4SLpdT5gnfdrEGw@mail.gmail.com
Lists: pgsql-performance
2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>
>> -----Original Message-----
>> From: Pavel Stehule [mailto:pavel(dot)stehule(at)gmail(dot)com]
>>
>> 2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>> >>> On 22/06/12 09:02, Maxim Boguk wrote:
>> >>> Maybe I am completely wrong, but I have always assumed that access to
>> >>> an array element in PostgreSQL should take close to constant time.
>> >>> But in my tests I found that access speed degrades as O(N) of the
>> >>> array size.
>> >>> Is this behaviour correct?
>>
>> yes - accessing position n in PostgreSQL means skipping n-1 elements
>
> Hmmm...
>
> How many elements need to be skipped here?
There are two independent stages:
a) detoast - loading and decompression (the complete array is detoasted)
b) access
If you have very large arrays, then (a) is significant.
Regards
Pavel
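One way to gauge how much work stage (a) implies is to compare the stored (possibly compressed) size of the datum with the raw size of its elements; a sketch against Marc's t2 table from earlier in the thread (with EXTERNAL storage the two numbers should be close, while with the default EXTENDED storage the stored size would be much smaller, indicating decompression work on every access):

```sql
-- stored size of the datum vs. the raw size of its int4 elements
SELECT pg_column_size(_array)       AS stored_bytes,
       array_length(_array, 1) * 4  AS raw_int4_bytes
FROM t2;
```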
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 09:03:14
Message-ID: CAFj8pRA03UgeHmpau+Ze-V8dmpSxhJbtk87Va3_9pbHT2UeHTA@mail.gmail.com
Lists: pgsql-performance
2012/6/26 Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>:
>
>
> On Tue, Jun 26, 2012 at 6:04 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>>
>> 2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>> >
>> >>> On 22/06/12 09:02, Maxim Boguk wrote:
>> >
>> >>> Maybe I am completely wrong, but I have always assumed that access
>> >>> to an array element in PostgreSQL should take close to constant time.
>> >>> But in my tests I found that access speed degrades as O(N) of the array size.
>> >
>> >>> Is this behaviour correct?
>>
>> yes - accessing position n in PostgreSQL means skipping n-1 elements
>>
>
> Hi,
>
> I understand what you mean, but in my test, for all values of N, the test
> accessed only the first 10000 elements of the array, independent of
> the array size...
> So I still can't see why access time degrades so fast for N > 10000:
>
> WITH
> -- GENERATE a single-entry table with a single ARRAY field of size N
> t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
> -- iterate over the first 10000 elements of that ARRAY
> SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000) as
> g(i);
>
> ... If access time depended only on position, then beyond 10k there should
> not be any serious degradation; in fact the performance degradation is
> almost linear:
> 10k    34ms
> 50k    177ms
> 100k   321ms
> 500k   4100ms
> 1M     8100ms
> 2M     22000ms
> 5M     61000ms
> 10M    220000ms  (= 22ms per single array element access)
In this use case TOAST/detoast is not involved, but there is probably a
problem with array copying. Taking the n-th element of an array means
calling a function, and PostgreSQL passes parameters by value - so a copy
of the large array is made, which takes more time the larger the array is.
This issue is partially solved in 9.2, where you can use the FOREACH
loop: http://www.postgresql.org/docs/9.1/interactive/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY
Regards
Pavel
>
> And I think no toasting/detoasting happens in my test case.
>
> Kind Regards,
> Maksym
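The 9.2 FOREACH loop Pavel mentions walks the array once instead of paying the per-access cost for every index; a minimal PL/pgSQL sketch (the function name sum_elems is illustrative, not from the thread):

```sql
CREATE OR REPLACE FUNCTION sum_elems(arr int[]) RETURNS bigint AS $$
DECLARE
    x int;
    total bigint := 0;
BEGIN
    -- one pass over the array; no per-element arr[i] lookup
    -- (and no implied array copy) for each index
    FOREACH x IN ARRAY arr LOOP
        total := total + x;
    END LOOP;
    RETURN total;
END;
$$ LANGUAGE plpgsql;

SELECT sum_elems(ARRAY(SELECT * FROM generate_series(1, 100000)));
```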
From: Cédric Villemain <cedric(at)2ndquadrant(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, Jesper Krogh <jesper(at)krogh(dot)cc>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-28 10:22:35
Message-ID: 201206281222.45642.cedric@2ndquadrant.com
Lists: pgsql-performance
> >> >>> Is this behaviour correct?
> >>
> >> yes - accessing position n in PostgreSQL means skipping n-1 elements
> >
> > Hmmm...
> >
> > How many elements need to be skipped here?
>
> There are two independent stages:
>
> a) detoast - loading and decompression (the complete array is detoasted)
> b) access
>
> If you have very large arrays, then (a) is significant.
There is a place to add PG_GETARG_ARRAY_P_SLICE. The code is just not done yet.
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation