Performance of a large array access by position (tested version 9.1.3)

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-22 07:02:32
Message-ID: CAK-MWwQxjfug8CqBjQXQJJQLQ2bb5kcagEm4GwFvVTgHkLuSBA@mail.gmail.com
Lists: pgsql-performance

Hi all,

Maybe I'm completely wrong, but I always assumed that access to an
array element in PostgreSQL should be close to constant time.
But in tests I found that the access speed degrades as O(N) of the array size.

Test case (performed on a large, not-busy server with 1GB work_mem to ensure I'm
working with memory only):

WITH
t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000) as
g(i);

Results for N between 1 and 10,000,000 (measured with a locally connected psql
with \timing):

N:      Time:
1       5.8ms
10      5.8ms
100     5.8ms
1000    6.7ms
-- up to here everything looks reasonable
5k      21ms
10k     34ms
50k     177ms
100k    321ms
500k    4100ms
1M      8100ms
2M      22000ms
5M      61000ms
10M     220000ms = 22ms per single array element access.

Is that behaviour correct?

PS: What I'm actually looking for is a tuplestore with constant-time access
by position, for use with recursive queries and/or pl/pgsql, but without
using C programming.

--
Maxim Boguk
Senior Postgresql DBA.

Phone RU: +7 910 405 4718
Phone AU: +61 45 218 5678

Skype: maxim.boguk
Jabber: maxim(dot)boguk(at)gmail(dot)com
МойКруг: http://mboguk.moikrug.ru/

"People problems are solved with people.
If people cannot solve the problem, try technology.
People will then wish they'd listened at the first stage."


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 05:03:26
Message-ID: 4FE9429E.5070905@krogh.cc
Lists: pgsql-performance

On 22/06/12 09:02, Maxim Boguk wrote:
> Hi all,
>
> Maybe I'm completely wrong, but I always assumed that access to an
> array element in PostgreSQL should be close to constant time.
> But in tests I found that the access speed degrades as O(N) of the array size.
>
> Test case (performed on a large, not-busy server with 1GB work_mem to
> ensure I'm working with memory only):
>
> WITH
> t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
> SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000)
> as g(i);
>
> Results for N between 1 and 10,000,000 (measured with a locally connected psql
> with \timing):
>
> N: Time:
> 1 5.8ms
> 10 5.8ms
> 100 5.8ms
> 1000 6.7ms
> -- up to here everything looks reasonable
> 5k 21ms
> 10k 34ms
> 50k 177ms
> 100k 321ms
> 500k 4100ms
> 1M 8100ms
> 2M 22000ms
> 5M 61000ms
> 10M 220000ms = 22ms per single array element access.
>
>
> Is that behaviour correct?
>
> PS: What I'm actually looking for is a tuplestore with constant-time access
> by position, for use with recursive queries and/or pl/pgsql, but without
> using C programming.

With large values, the default column storage is to "compress it, and store
it in TOAST".
That is what is causing the shift. Try changing the storage of the column
to EXTERNAL instead and rerun the test:

ALTER TABLE <tablename> ALTER COLUMN <column name> SET STORAGE EXTERNAL

The default is EXTENDED, which compresses the value, which in turn makes it
hard to position into it without reading and decompressing everything.

http://www.postgresql.org/docs/9.1/static/sql-altertable.html
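
A quick way to check which storage a column currently uses is the attstorage
flag in pg_attribute ('x' = extended, 'e' = external, 'm' = main, 'p' = plain);
a minimal sketch, with the table and column names made up for illustration:

SELECT attname, attstorage
FROM pg_attribute
WHERE attrelid = 'mytable'::regclass   -- 'mytable' is a made-up name
  AND attname = '_array';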

Let us know what you get.

Jesper


From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Jesper Krogh" <jesper(at)krogh(dot)cc>, "Maxim Boguk" <maxim(dot)boguk(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 07:53:07
Message-ID: C4DAC901169B624F933534A26ED7DF310861B5E8@JENMAIL01.ad.intershop.net
Lists: pgsql-performance


>> On 22/06/12 09:02, Maxim Boguk wrote:

>> Maybe I'm completely wrong, but I always assumed that access to an array element in PostgreSQL should be close to constant time.
>> But in tests I found that the access speed degrades as O(N) of the array size.

>> Is that behaviour correct?

> From: pgsql-performance-owner(at)postgresql(dot)org On Behalf Of Jesper Krogh

> With large values, the default column storage is to "compress it, and store it in TOAST".
> That is what is causing the shift. Try changing the storage of the column
> to EXTERNAL instead and rerun the test.

Hello,

I've repeated your test in a simplified form:
you are right :-(

create table t1 ( _array int[]);
alter table t1 alter _array set storage external;
insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));

create table t2 ( _array int[]);
alter table t2 alter _array set storage external;
insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));

explain analyze SELECT _array[1] FROM t1;
Total runtime: 0.125 ms

explain analyze SELECT _array[1] FROM t2;
Total runtime: 8.649 ms
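
For comparison, the same test could be run against a table that keeps the
default EXTENDED (compressed) storage, to see how much of the time is
decompression rather than plain detoasting of the out-of-line value; a sketch,
with t3 as a made-up table name:

create table t3 ( _array int[]);   -- storage left at the default EXTENDED
insert into t3 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));

explain analyze SELECT _array[1] FROM t3;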

best regards,

Marc Mamin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 08:04:22
Message-ID: CAFj8pRD4jxJxr5i6WOCtRLGAwecXg-i6=taMpseuDxZh=NVx7g@mail.gmail.com
Lists: pgsql-performance

2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>
>>> On 22/06/12 09:02, Maxim Boguk wrote:
>
>>> Maybe I'm completely wrong, but I always assumed that access to an array element in PostgreSQL should be close to constant time.
>>> But in tests I found that the access speed degrades as O(N) of the array size.
>
>>> Is that behaviour correct?

yes - access to the n-th position in PostgreSQL means skipping the n-1 preceding elements

Regards

Pavel

>
>
>> From: pgsql-performance-owner(at)postgresql(dot)org On Behalf Of Jesper Krogh
>
> >> With large values, the default column storage is to "compress it, and store it in TOAST".
> >> That is what is causing the shift. Try changing the storage of the column
> >> to EXTERNAL instead and rerun the test.
>
>
> Hello,
>
> I've repeated your test in a simplified form:
> you are right :-(
>
> create table t1 ( _array int[]);
> alter table t1 alter _array set storage external;
> insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));
>
> create table t2 ( _array int[]);
> alter table t2 alter _array set storage external;
> insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));
>
> explain analyze SELECT _array[1] FROM t1;
> Total runtime: 0.125 ms
>
> explain analyze SELECT _array[1] FROM t2;
> Total runtime: 8.649 ms
>
>
> best regards,
>
> Marc Mamin
>
>
>


From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: "Jesper Krogh" <jesper(at)krogh(dot)cc>, "Maxim Boguk" <maxim(dot)boguk(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 08:19:45
Message-ID: C4DAC901169B624F933534A26ED7DF310861B5E9@JENMAIL01.ad.intershop.net
Lists: pgsql-performance

> -----Original Message-----
> From: Pavel Stehule [mailto:pavel(dot)stehule(at)gmail(dot)com]
>
> 2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
> >
> >>> On 22/06/12 09:02, Maxim Boguk wrote:
> >
> >>> Maybe I'm completely wrong, but I always assumed that access to an array element
> >>> in PostgreSQL should be close to constant time.
> >>> But in tests I found that the access speed degrades as O(N) of the array size.
> >
> >>> Is that behaviour correct?
>
> yes - access to the n-th position in PostgreSQL means skipping the n-1 preceding elements

Hmmm...

how many elements need to be skipped here?

SELECT _array[1] FROM t2;

I wonder if the time is rather spent first retrieving the array itself, before accessing its elements.

regards,

Marc Mamin

>
> Regards
>
> Pavel
>
> >
> >
> >> From: pgsql-performance-owner(at)postgresql(dot)org On Behalf Of Jesper Krogh
> >
> >> With large values, the default column storage is to "compress it, and
> >> store it in TOAST".
> >> That is what is causing the shift. Try changing the storage of the
> >> column to EXTERNAL instead and rerun the test.
> >
> >
> > Hello,
> >
> > I've repeated your test in a simplified form:
> > you are right :-(
> >
> > create table t1 ( _array int[]);
> > alter table t1 alter _array set storage external;
> > insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));
> >
> > create table t2 ( _array int[]);
> > alter table t2 alter _array set storage external;
> > insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));
> >
> > explain analyze SELECT _array[1] FROM t1;
> > Total runtime: 0.125 ms
> >
> > explain analyze SELECT _array[1] FROM t2;
> > Total runtime: 8.649 ms
> >
> >
> > best regards,
> >
> > Marc Mamin
> >
> >
> >


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 08:22:41
Message-ID: CAFj8pRCddmsCZFJgEG5zqGd5wj3RuhODJKD4SLpdT5gnfdrEGw@mail.gmail.com
Lists: pgsql-performance

2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>
>
>> -----Original Message-----
>> From: Pavel Stehule [mailto:pavel(dot)stehule(at)gmail(dot)com]
>>
>> 2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>> >
>> >>> On 22/06/12 09:02, Maxim Boguk wrote:
>> >
>> >>> Maybe I'm completely wrong, but I always assumed that access to an array element
>> >>> in PostgreSQL should be close to constant time.
>> >>> But in tests I found that the access speed degrades as O(N) of the array size.
>> >
>> >>> Is that behaviour correct?
>>
>> yes - access to the n-th position in PostgreSQL means skipping the n-1 preceding elements
>
>
> Hmmm...
>
> how many elements need to be skipped here?

there are two independent stages:

a) detoast - loading and decompression (complete array is detoasted)
b) access

if you have very large arrays, then (a) is significant
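
One way to separate the two stages, using the t2 table from Marc's earlier
test (5,000,000 elements), is to compare access to the first and the last
element of the same array: the detoast cost (a) is identical in both queries,
so any difference between them is the positional access cost (b). A sketch:

explain analyze SELECT _array[1]       FROM t2;
explain analyze SELECT _array[5000000] FROM t2;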

Regards

Pavel

>
> SELECT _array[1] FROM t2;
>
> I wonder if the time is rather spent first retrieving the array itself, before accessing its elements.
>
> regards,
>
> Marc Mamin
>
>>
>> Regards
>>
>> Pavel
>>
>> >
>> >
>> >> From: pgsql-performance-owner(at)postgresql(dot)org On Behalf Of Jesper Krogh
>> >
>> >> With large values, the default column storage is to "compress it, and
>> >> store it in TOAST".
>> >> That is what is causing the shift. Try changing the storage of the
>> >> column to EXTERNAL instead and rerun the test.
>> >
>> >
>> > Hello,
>> >
>> > I've repeated your test in a simplified form:
>> > you are right :-(
>> >
>> > create table t1 ( _array int[]);
>> > alter table t1 alter _array set storage external;
>> > insert into t1 SELECT ARRAY(SELECT * FROM generate_series(1,50000));
>> >
>> > create table t2 ( _array int[]);
>> > alter table t2 alter _array set storage external;
>> > insert into t2 SELECT ARRAY(SELECT * FROM generate_series(1,5000000));
>> >
>> > explain analyze SELECT _array[1] FROM t1;
>> > Total runtime: 0.125 ms
>> >
>> > explain analyze SELECT _array[1] FROM t2;
>> > Total runtime: 8.649 ms
>> >
>> >
>> > best regards,
>> >
>> > Marc Mamin
>> >
>> >
>> >


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-26 09:03:14
Message-ID: CAFj8pRA03UgeHmpau+Ze-V8dmpSxhJbtk87Va3_9pbHT2UeHTA@mail.gmail.com
Lists: pgsql-performance

2012/6/26 Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>:
>
>
> On Tue, Jun 26, 2012 at 6:04 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>>
>> 2012/6/26 Marc Mamin <M(dot)Mamin(at)intershop(dot)de>:
>> >
>> >>> On 22/06/12 09:02, Maxim Boguk wrote:
>> >
>> >>> Maybe I'm completely wrong, but I always assumed that access to an array element
>> >>> in PostgreSQL should be close to constant time.
>> >>> But in tests I found that the access speed degrades as O(N) of the array size.
>> >
>> >>> Is that behaviour correct?
>>
>> yes - access to the n-th position in PostgreSQL means skipping the n-1 preceding elements
>>
>
> Hi,
>
> I understand what you mean, but in my test, for every value of N, the test
> accessed only the first 10000 elements of the array, independent of
> the array size.
> So I still can't see why the access time degrades so fast for N>10000:
>
> WITH
> --GENERATE single entry table with single ARRAY field of size N
> t AS (SELECT ARRAY(SELECT * FROM generate_series(1,N)) AS _array)
> --iterate over first 10000 elements of that ARRAY
> SELECT count((SELECT _array[i] FROM t)) FROM generate_series(1,10000) as
> g(i);
>
> ... if the access time depended only on position, then beyond 10k there should not
> be any serious degradation; in fact the performance degradation is almost
> linear.
> 10k        34ms
> 50k       177ms
> 100k      321ms
> 500k     4100ms
> 1M       8100ms
> 2M      22000ms
> 5M      61000ms
> 10M    220000ms = 22ms per single array element access.

in this use case TOAST/DETOAST is not used, but there is probably a
problem with array copying. Taking the n-th element of an array means calling
a function, and PostgreSQL only passes parameters by value - so
the copy of a large array takes more time.

this issue is partially solved in 9.2, where you can use the FOREACH
loop: http://www.postgresql.org/docs/9.1/interactive/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY
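
A minimal sketch of that FOREACH form (the array size here is arbitrary); it
walks the elements sequentially instead of evaluating one subscript expression
per element:

DO $$
DECLARE
  a int[] := ARRAY(SELECT * FROM generate_series(1, 1000000));
  x int;
  s bigint := 0;
BEGIN
  -- FOREACH iterates over the already-detoasted array value,
  -- so there is no per-element a[i] subscript evaluation
  FOREACH x IN ARRAY a LOOP
    s := s + x;
  END LOOP;
  RAISE NOTICE 'sum = %', s;
END;
$$;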

Regards

Pavel

>
> And I think no toasting/detoasting happens in my test case.
>
> Kind Regards,
> Maksym


From: Cédric Villemain <cedric(at)2ndquadrant(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, Jesper Krogh <jesper(at)krogh(dot)cc>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Subject: Re: Performance of a large array access by position (tested version 9.1.3)
Date: 2012-06-28 10:22:35
Message-ID: 201206281222.45642.cedric@2ndquadrant.com
Lists: pgsql-performance

> >> >>> Is that behaviour correct?
> >>
> >> yes - access to the n-th position in PostgreSQL means skipping the n-1 preceding elements
> >
> > Hmmm...
> >
> > how many elements need to be skipped here?
>
> there are two independent stages:
>
> a) detoast - loading and decompression (complete array is detoasted)
> b) access
>
> if you have very large arrays, then (a) is significant

There is a place to add PG_GETARG_ARRAY_P_SLICE. The code is just not done
yet.
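
For comparison, the slice optimization hinted at here already exists for
text/bytea: with EXTERNAL (uncompressed) storage, substr() is documented to
fetch only the TOAST chunks it needs rather than detoasting the whole value.
A sketch of that behaviour (table name and sizes made up):

create table t_txt ( _txt text );
alter table t_txt alter _txt set storage external;
insert into t_txt
  SELECT string_agg(i::text, ',') FROM generate_series(1,5000000) AS i;

-- stays fast regardless of total value size, because only the
-- leading TOAST chunks are fetched
explain analyze SELECT substr(_txt, 1, 10) FROM t_txt;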

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation