Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-02 16:56:51
Message-ID: CAFj8pRAKuDU_0md-dg6Ftk0wSupvMLyrV1PB+HyC+GUBZz346w@mail.gmail.com
Lists: pgsql-hackers

Hello

This proposal is related to the issue reported at
http://www.postgresql.org/message-id/E1VQuta-0007Y4-Ip@wrigleys.postgresql.org

We can directly modify fields of int, float, and double arrays (when the
result size is immutable). This trick is already used to accelerate some
aggregates.
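
For readers who haven't seen it, here is a minimal C sketch of that aggregate
trick, loosely modelled on float8_accum(): when a function runs as an
aggregate transition function, the transition array is its own private,
untoasted copy, so its fixed-length elements can be overwritten in place. The
function name sketch_accum and the inlined checks are illustrative only, not
part of any patch.

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_type.h"
#include "utils/array.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(sketch_accum);

Datum
sketch_accum(PG_FUNCTION_ARGS)
{
    ArrayType  *transarray = PG_GETARG_ARRAYTYPE_P(0);
    float8      newval = PG_GETARG_FLOAT8(1);
    float8     *transvalues;

    /* simplified sanity check (the real code uses check_float8_array) */
    if (ARR_NDIM(transarray) != 1 ||
        ARR_DIMS(transarray)[0] != 3 ||
        ARR_HASNULL(transarray) ||
        ARR_ELEMTYPE(transarray) != FLOAT8OID)
        elog(ERROR, "expected a 3-element float8 array");

    if (!AggCheckCallContext(fcinfo, NULL))
        elog(ERROR, "sketch_accum may only be called as an aggregate");

    /*
     * As an aggregate transition value the array is our own, so we can
     * scribble on its fixed-length elements directly instead of building
     * a new array for every input row.
     */
    transvalues = (float8 *) ARR_DATA_PTR(transarray);
    transvalues[0] += 1.0;                 /* N        */
    transvalues[1] += newval;              /* sum(X)   */
    transvalues[2] += newval * newval;     /* sum(X*X) */

    PG_RETURN_ARRAYTYPE_P(transarray);
}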

Ideas, comments?

Regards

Pavel


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-02 17:01:49
Message-ID: 20131002170149.GA19661@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2013-10-02 18:56:51 +0200, Pavel Stehule wrote:
> Hello
>
> This proposal is related to the issue reported at
> http://www.postgresql.org/message-id/E1VQuta-0007Y4-Ip@wrigleys.postgresql.org
>
>
> We can directly modify fields of int, float, and double arrays (when the
> result size is immutable). This trick is already used to accelerate some
> aggregates.
>
> Ideas, comments?

A specific array might be located directly on a buffer - starting to
manipulate it in place will lead to havoc in such scenarios. I don't
think we have the infrastructure for detecting such cases.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-02 17:05:59
Message-ID: CAFj8pRB8BFpx9xD43Sf3hAxG_V0bELPnZeWkzuGb3n_MVKr7JA@mail.gmail.com
Lists: pgsql-hackers

2013/10/2 Andres Freund <andres(at)2ndquadrant(dot)com>

> Hi,
>
> On 2013-10-02 18:56:51 +0200, Pavel Stehule wrote:
> > Hello
> >
> > This proposal is related to the issue reported at
> >
> http://www.postgresql.org/message-id/E1VQuta-0007Y4-Ip@wrigleys.postgresql.org
> >
> >
> > We can directly modify fields of int, float, and double arrays (when the
> > result size is immutable). This trick is already used to accelerate some
> > aggregates.
> >
> > Ideas, comments?
>
> A specific array might be located directly on a buffer - starting to
> manipulate it in place will lead to havoc in such scenarios. I don't
> think we have the infrastructure for detecting such cases.
>

If you update an array in plpgsql now, you always work with a local copy
only. That is a necessary precondition, and I think it holds.

Pavel

>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-03 07:14:59
Message-ID: 44766.1380784499@sss.pgh.pa.us
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> If you update an array in plpgsql now, you always work with a local copy
> only. That is a necessary precondition, and I think it holds.

If the proposal only relates to assignments to elements of plpgsql local
variables, it's probably safe, but it's also probably not of much value.
plpgsql has enough overhead that I'm doubting you'd get much real-world
speedup. I'm also not very excited about putting even more low-level
knowledge about array representation into plpgsql.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-03 07:23:48
Message-ID: CAFj8pRDxCGTZbFJ4CM2v+PS4OTy9Co9d2prwW407kYU9KdLbUQ@mail.gmail.com
Lists: pgsql-hackers

2013/10/3 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > If you update an array in plpgsql now, you always work with a local copy
> > only. That is a necessary precondition, and I think it holds.
>
> If the proposal only relates to assignments to elements of plpgsql local
> variables, it's probably safe, but it's also probably not of much value.
> plpgsql has enough overhead that I'm doubting you'd get much real-world
> speedup. I'm also not very excited about putting even more low-level
> knowledge about array representation into plpgsql.
>

I looked at the code, and I think this can be done inside the array-related
routines. We just have to signal a request for an in-place update (when we
hold a local copy).
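
For illustration only, a rough C sketch of what such an in-place fast path
could look like for the simple cases; the helper name and its exact
preconditions are assumptions made for the example, not an actual patch:

#include "postgres.h"
#include "access/tupmacs.h"
#include "utils/array.h"

/*
 * Overwrite one fixed-length, not-null element of a 1-D array that the
 * caller is known to own (e.g. a plpgsql local copy).  Returns false when
 * the fast path does not apply, in which case the caller falls back to the
 * usual copy-and-rebuild done by array_set().
 */
static bool
array_set_element_inplace(ArrayType *array, int index, Datum newval,
                          int16 elmlen, bool elmbyval, char elmalign)
{
    int         lb;
    Size        stride;
    char       *dest;

    /* only the simple cases qualify: 1-D, no nulls, fixed-length element */
    if (ARR_NDIM(array) != 1 || ARR_HASNULL(array) || elmlen <= 0)
        return false;

    lb = ARR_LBOUND(array)[0];
    if (index < lb || index >= lb + ARR_DIMS(array)[0])
        return false;       /* assignment would grow the array: fall back */

    /* fixed-length elements are stored at a constant, aligned stride */
    stride = att_align_nominal(elmlen, elmalign);
    dest = ARR_DATA_PTR(array) + (index - lb) * stride;

    if (elmbyval)
        store_att_byval(dest, newval, elmlen);
    else
        memcpy(dest, DatumGetPointer(newval), elmlen);

    return true;
}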

I have no idea how significant the speedup will be (if any), but the current
behaviour is not friendly (and for multidimensional arrays there is no
workaround), so it is an interesting direction - I have been thinking about a
similar optimization for a long time.

Regards

Pavel

>
> regards, tom lane
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-03 20:06:46
Message-ID: CAFj8pRAGAyCUZq+5c7-pUg3Pew-kGcuHBG=52-Tafs39ZgN1NQ@mail.gmail.com
Lists: pgsql-hackers

Hello

A very ugly test shows the possibility of about a 100% speedup on the
reported example (on small arrays only; the patch is buggy and doesn't work
for larger arrays yet).

I also modified the reported code to be read-only:

CREATE OR REPLACE FUNCTION public.fill_2d_array(rows integer, cols integer)
 RETURNS integer
 LANGUAGE plpgsql
AS $function$
DECLARE
  img double precision[][];
  i integer; j integer;
  cont integer; r double precision;
BEGIN
  img := ARRAY( SELECT 0 FROM generate_series(1, rows * cols) );
  cont := 0;
  FOR i IN 1..rows LOOP
    FOR j IN 1..cols LOOP
      r := img[i * cols + j];
      r := (i * cols + j)::double precision;
      cont := cont + 1; --raise notice '%', img;
    END LOOP;
  END LOOP;
  RETURN cont;
END;
$function$

It still executes all the expressions:

-- original
postgres=# select fill_2d_array(200,200);
fill_2d_array
---------------
40000
(1 row)

Time: 12726.117 ms

-- read only version
postgres=# select fill_2d_array(200,200); fill_2d_array
---------------
40000
(1 row)

Time: 245.894 ms

so the array element assignments account for roughly a 50x slowdown

2013/10/3 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>

>
>
>
> 2013/10/3 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>
>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> > If you can do a update of some array in plpgsql now, then you have to
>> work
>> > with local copy only. It is a necessary precondition, and I am think it
>> is
>> > valid.
>>
>> If the proposal only relates to assignments to elements of plpgsql local
>> variables, it's probably safe, but it's also probably not of much value.
>> plpgsql has enough overhead that I'm doubting you'd get much real-world
>> speedup. I'm also not very excited about putting even more low-level
>> knowledge about array representation into plpgsql.
>>
>
> I looked to code, and I am thinking so this can be done inside array
> related routines. We just have to signalize request for inplace update (if
> we have a local copy).
>
> I have not idea, how significant speedup can be (if any), but current
> behave is not friendly (and for multidimensional arrays there are no
> workaround), so it is interesting way - and long time I though about some
> similar optimization.
>
> Regards
>
> Pavel
>
>
>>
>> regards, tom lane
>>
>
>

Attachment Content-Type Size
fast_array_update.patch application/octet-stream 4.0 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-04 21:18:44
Message-ID: CAFj8pRAeyqhvKNaeaz8V+schvno4grbv-7_o49Q8urNXO4LbFg@mail.gmail.com
Lists: pgsql-hackers

Hello

I am sending a slightly cleaned-up patch.

There is a significant speedup (on the example from the bug report) - more
than 100x on my Dell D830.

postgres=# select fill_2d_array(300,300,1);
fill_2d_array
───────────────
90000
(1 row)

Time: 751.572 ms -- patched
postgres=# \q
bash-4.1$ psql postgres
psql (9.4devel)
Type "help" for help.

postgres=# select fill_2d_array(300,300,2);
fill_2d_array
───────────────
90000
(1 row)

Time: 87453.351 ms -- original

I checked the result and it is the same.

There are probably some remaining issues, because the domain tests fail, but
these numbers show that a significantly faster array update is possible.

Oddly, I did some profiling and didn't find anything interesting :(

Regards

Pavel

2013/10/3 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > If you can do a update of some array in plpgsql now, then you have to
> work
> > with local copy only. It is a necessary precondition, and I am think it
> is
> > valid.
>
> If the proposal only relates to assignments to elements of plpgsql local
> variables, it's probably safe, but it's also probably not of much value.
> plpgsql has enough overhead that I'm doubting you'd get much real-world
> speedup. I'm also not very excited about putting even more low-level
> knowledge about array representation into plpgsql.
>
> regards, tom lane
>

Attachment Content-Type Size
fast-array-update.patch application/octet-stream 7.2 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-07 14:00:54
Message-ID: CAFj8pRB8=cGxi1XAtjiTD0Sn3ZqYkTomEd96OdzGPUVs7tX9Cg@mail.gmail.com
Lists: pgsql-hackers

Hello

I fixed the patch - there was a missing cast to the domain type when one was
used (and all regression tests pass now).

Some performance tests:

set array_fast_update to off;
postgres=# select fill_2d_array(300,300);
fill_2d_array
───────────────
90000
(1 row)

Time: 33570.087 ms
postgres=# set array_fast_update to on;
SET
Time: 0.279 ms
postgres=# select fill_2d_array(300,300);
fill_2d_array
───────────────
90000
(1 row)

Time: 505.202 ms

CREATE OR REPLACE FUNCTION public.quicksort(l integer, r integer, a integer[])
 RETURNS integer[]
 LANGUAGE plpgsql
 IMMUTABLE STRICT
AS $function$
DECLARE
  akt int[] := a;
  i integer := l; j integer := r; x integer := akt[(l+r) / 2];
  w integer;
BEGIN
  LOOP
    WHILE akt[i] < x LOOP i := i + 1; END LOOP;
    WHILE x < akt[j] LOOP j := j - 1; END LOOP;
    IF i <= j THEN
      w := akt[i];
      akt[i] := akt[j]; akt[j] := w;
      i := i + 1; j := j - 1;
    END IF;
    EXIT WHEN i > j;
  END LOOP;
  IF l < j THEN akt := quicksort(l, j, akt); END IF;
  IF i < r THEN akt := quicksort(i, r, akt); END IF;
  RETURN akt;
END;
$function$

postgres=# set array_fast_update to off;
SET
Time: 0.282 ms
postgres=# SELECT array_upper(quicksort(1,21000,array_agg(a)),1) FROM test;
array_upper
─────────────
21000
(1 row)

Time: 5086.781 ms
postgres=# set array_fast_update to on;
SET
Time: 0.702 ms
postgres=# SELECT array_upper(quicksort(1,21000,array_agg(a)),1) FROM test;
array_upper
─────────────
21000
(1 row)

Time: 1751.920 ms

So in the first test the fast update is 66x faster, and in the second test
3x faster.

I found that for small arrays (about 1000 elements) the difference is not
significant.

This code can still be cleaned up (domains could generally be optimized
better - an I/O cast can be expensive for large arrays, and domain_check
could be used there instead), but it is good enough for discussing whether we
want this optimization or not.

Regards

Pavel

2013/10/3 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>

> Hello
>
> a very ugly test shows a possibility about 100% speedup on reported
> example (on small arrays, a patch is buggy and doesn't work for larger
> arrays).
>
> I updated a code to be read only
>
> CREATE OR REPLACE FUNCTION public.fill_2d_array(rows integer, cols integer)
> RETURNS integer
> LANGUAGE plpgsql
> AS $function$
> DECLARE
> img double precision[][];
> i integer; j integer;
> cont integer; r double precision;
> BEGIN
> img := ARRAY( SELECT 0 FROM generate_series(1, rows * cols) ) ;
> cont:= 0;
> For i IN 1..rows LOOP
> For j IN 1..cols LOOP r := img[i * cols + j];
> r := (i * cols + j)::double precision;
> cont := cont + 1; --raise notice '%', img;
> END LOOP;
> END LOOP;
> return cont;
> END;
> $function$
>
> It exec all expressions
>
> -- original
> postgres=# select fill_2d_array(200,200);
> fill_2d_array
> ---------------
> 40000
> (1 row)
>
> Time: 12726.117 ms
>
> -- read only version
> postgres=# select fill_2d_array(200,200); fill_2d_array
> ---------------
> 40000
> (1 row)
>
> Time: 245.894 ms
>
> so there is about 50x slowdown
>
>
> 2013/10/3 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
>
>>
>>
>>
>> 2013/10/3 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>>
>>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>>> > If you can do a update of some array in plpgsql now, then you have to
>>> work
>>> > with local copy only. It is a necessary precondition, and I am think
>>> it is
>>> > valid.
>>>
>>> If the proposal only relates to assignments to elements of plpgsql local
>>> variables, it's probably safe, but it's also probably not of much value.
>>> plpgsql has enough overhead that I'm doubting you'd get much real-world
>>> speedup. I'm also not very excited about putting even more low-level
>>> knowledge about array representation into plpgsql.
>>>
>>
>> I looked to code, and I am thinking so this can be done inside array
>> related routines. We just have to signalize request for inplace update (if
>> we have a local copy).
>>
>> I have not idea, how significant speedup can be (if any), but current
>> behave is not friendly (and for multidimensional arrays there are no
>> workaround), so it is interesting way - and long time I though about some
>> similar optimization.
>>
>> Regards
>>
>> Pavel
>>
>>
>>>
>>> regards, tom lane
>>>
>>
>>
>

Attachment Content-Type Size
fast-array-update.patch application/octet-stream 8.7 KB

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-07 14:05:51
Message-ID: 20131007140551.GE15202@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-10-07 16:00:54 +0200, Pavel Stehule wrote:
> /*
> * We need to do subscript evaluation, which might require
> @@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
> oldarrayval = (ArrayType *) DatumGetPointer(oldarraydatum);
>
> /*
> + * support fast update for array scalar variable is enabled only
> + * when target is a scalar variable and variable holds a local
> + * copy of some array.
> + */
> + inplace_update = (((PLpgSQL_datum *) target)->dtype == PLPGSQL_DTYPE_VAR
> + && ((PLpgSQL_var *) target)->freeval);
> +
> + /*
> * Build the modified array value.
> */

Will this recognize if the local Datum is just a reference to an
external toast Datum (i.e. varattrib_1b_e)?

I don't know much about plpgsql's implementation, so please excuse if
the question is stupid.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-10-07 14:22:10
Message-ID: CAFj8pRB+RzMmy4K2_DeHecz5k6MigmD=uQmayQ4SWQNZesj2TA@mail.gmail.com
Lists: pgsql-hackers

2013/10/7 Andres Freund <andres(at)2ndquadrant(dot)com>

> On 2013-10-07 16:00:54 +0200, Pavel Stehule wrote:
> > /*
> > * We need to do subscript evaluation, which might require
> > @@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
> > oldarrayval = (ArrayType *) DatumGetPointer(oldarraydatum);
> >
> > /*
> > + * support fast update for array scalar variable is enabled only
> > + * when target is a scalar variable and variable holds a local
> > + * copy of some array.
> > + */
> > + inplace_update = (((PLpgSQL_datum *) target)->dtype == PLPGSQL_DTYPE_VAR
> > + && ((PLpgSQL_var *) target)->freeval);
> > +
> > + /*
> > * Build the modified array value.
> > */
>
> Will this recognize if the local Datum is just a reference to an
> external toast Datum (i.e. varattrib_1b_e)?
>
>
This works on plpgsql local copies only (where cleanup is controlled by
plpgsql) - so it should already be untoasted every time.

BTW, when I tested this patch I found a second plpgsql array issue - repeated
detoasting :( - we should not forget about it.
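
For illustration, a rough C sketch of the kind of defensive check being asked
about: before any in-place update, verify that the datum is a plain,
in-memory array rather than a compressed or external toast pointer. The
helper name is hypothetical and this is not taken from the patch.

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"

static ArrayType *
get_writable_array(Datum d, bool *made_private_copy)
{
    struct varlena *ptr = (struct varlena *) DatumGetPointer(d);

    if (VARATT_IS_EXTENDED(ptr))
    {
        /*
         * Toasted (external, compressed, or short-header) value: force a
         * detoasted copy, which is freshly palloc'd and therefore safe to
         * modify in place afterwards.
         */
        *made_private_copy = true;
        return DatumGetArrayTypePCopy(d);
    }

    /*
     * Already a plain varlena.  Whether we own it has to be decided from
     * other information (in plpgsql: var->freeval for a local copy).
     */
    *made_private_copy = false;
    return (ArrayType *) ptr;
}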

> I don't know much about plpgsql's implementation, so please excuse if
> the question is stupid.
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-11-21 09:21:40
Message-ID: CAGPqQf3RB6s+AdKuF6OyzkegVSKQbm=N_F-XYQfe02RGJHXOfw@mail.gmail.com
Lists: pgsql-hackers

Hi,

I started reviewing this patch. I went through the mail thread discussion to
understand the need for the patch. With the patch there is a significant
performance improvement when updating an array scalar variable.

- The patch applies cleanly on the master branch
- make, make install and make check work fine

I did a code walk-through, and the code change looks good to me. I also did
some testing around scalar array variables and domain types. For big array
sizes I noticed a significant performance improvement. So far I haven't found
any issues with the code.

Patch looks good to me.

Just FYI: the array_fast_update GUC needs to be removed in the final version
before commit.

Regards,
Rushabh

On Mon, Oct 7, 2013 at 7:52 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:

>
>
>
> 2013/10/7 Andres Freund <andres(at)2ndquadrant(dot)com>
>
>> On 2013-10-07 16:00:54 +0200, Pavel Stehule wrote:
>> > /*
>> > * We need to do subscript evaluation, which might require
>> > @@ -4321,6 +4322,14 @@ exec_assign_value(PLpgSQL_execstate *estate,
>> > oldarrayval = (ArrayType *) DatumGetPointer(oldarraydatum);
>> >
>> > /*
>> > + * support fast update for array scalar variable is enabled only
>> > + * when target is a scalar variable and variable holds a local
>> > + * copy of some array.
>> > + */
>> > + inplace_update = (((PLpgSQL_datum *) target)->dtype == PLPGSQL_DTYPE_VAR
>> > + && ((PLpgSQL_var *) target)->freeval);
>> > +
>> > + /*
>> > * Build the modified array value.
>> > */
>>
>> Will this recognize if the local Datum is just a reference to an
>> external toast Datum (i.e. varattrib_1b_e)?
>>
>>
> this works on plpgsql local copies only (when cleaning is controlled by
> plpgsql) - so it should be untoasted every time.
>
> btw when I tested this patch I found a second plpgsql array issue -
> repeated untoasting :( and we should not forget it.
>
>
>
>
>> I don't know much about plpgsql's implementation, so please excuse if
>> the question is stupid.
>>
>> Greetings,
>>
>> Andres Freund
>>
>> --
>> Andres Freund http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>

--
Rushabh Lathia


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-11-25 21:28:41
Message-ID: 5293C109.90206@vmware.com
Lists: pgsql-hackers

(sorry for possible duplicates, used wrong account on first try)

On 07.10.2013 17:00, Pavel Stehule wrote:
> Hello
>
> I fixed patch - there was a missing cast to domain when it was used (and
> all regress tests are ok now).

This doesn't work correctly for varlen datatypes. I modified your
quicksort example to work with text[], and got this:

postgres=# SELECT
array_upper(quicksort(1,20000,array_agg((g::text))),1) FROM
generate_series(1,20000) g;
ERROR: invalid input syntax for integer: ""
CONTEXT: PL/pgSQL function quicksort(integer,integer,text[]) line 10 at
assignment
PL/pgSQL function quicksort(integer,integer,text[]) line 16 at assignment

The handling of array domains is also wrong. You put the new value into the
array and only check the domain constraint after that. If the constraint is
violated, it's too late to roll that back. For example:

postgres=# create domain posarray as int[] check (value[1] > 0);
CREATE DOMAIN
postgres=# do $$
declare
iarr posarray;
begin
begin
iarr[1] := 1;
iarr[1] := -1;
exception when others then raise notice 'got error: %', sqlerrm; end;
raise notice 'value: %', iarr[1];
end;
$$;
NOTICE: got error: value for domain posarray violates check constraint
"posarray_check"
NOTICE: value: -1
DO

Without the patch, it prints 1, but with the patch, -1.

In general, I'm not convinced this patch is worth the trouble. The
speedup isn't all that great; manipulating large arrays in PL/pgSQL is
still so slow that if you care about performance you'll want to rewrite
your function in some other language or use temporary tables. And you
only get a gain with arrays of fixed-length element type with no NULLs.

So I think we should drop this patch in its current form. If we want to
make array manipulation in PL/pgSQL faster, I think we'll have to do something
similar to how we handle "row" variables, or something else entirely.
But that's a much bigger patch, and I'm not sure it's worth the trouble
either.

Looking at the perf profile and the functions involved, though, I see some
low-hanging fruit: in array_set, the new array is allocated with palloc0, but
we only need to zero out a few alignment padding bytes where the new value is
copied. Replacing it with palloc() will save some cycles, per the attached
patch. Nowhere near as much as your patch, but this is pretty
uncontroversial.
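
For illustration (the attached patch is the authoritative change), the idea
boils down to a rough sketch like the one below: build the new array with
palloc() and zero only the padding bytes after the newly stored element. The
helper name and parameters are invented for the example.

#include "postgres.h"
#include "utils/array.h"

static ArrayType *
rebuild_array_sketch(Size newsize,
                     const char *prefix, Size prefixlen,
                     const char *newitem, Size itemlen, Size itemlen_aligned,
                     const char *suffix, Size suffixlen)
{
    ArrayType  *newarray;
    char       *p;

    Assert(newsize == prefixlen + itemlen_aligned + suffixlen);

    newarray = (ArrayType *) palloc(newsize);   /* was: palloc0(newsize) */
    p = (char *) newarray;

    memcpy(p, prefix, prefixlen);       /* header + elements before the slot */
    p += prefixlen;

    memcpy(p, newitem, itemlen);        /* the replacement element */
    /* zero just the alignment pad bytes that follow the new value */
    memset(p + itemlen, 0, itemlen_aligned - itemlen);
    p += itemlen_aligned;

    memcpy(p, suffix, suffixlen);       /* elements after the slot */

    return newarray;
}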

- Heikki

Attachment Content-Type Size
quicksort-example.sql text/x-sql 630 bytes
avoid-unnecessary-zeroing-in-array_set-1.patch text/x-diff 1.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-11-25 21:33:29
Message-ID: 21502.1385415209@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com> writes:
> In general, I'm not convinced this patch is worth the trouble. The
> speedup isn't all that great; manipulating large arrays in PL/pgSQL is
> still so slow that if you care about performance you'll want to rewrite
> your function in some other language or use temporary tables. And you
> only get a gain with arrays of fixed-length element type with no NULLs.

> So I think we should drop this patch in its current form. If we want to
> make array manipulation in PL/pgSQL faster, I think we'll have to do
> something similar to how we handle "row" variables, or something else
> entirely.

I think that this area would be a fruitful place to make use of the
noncontiguous datatype storage ideas that we were discussing with the
PostGIS guys recently. I agree that tackling it in the context of plpgsql
alone is not a good way to go at it.

I'm not saying this in a vacuum of information, either. Some of the guys
at Salesforce have been poking at noncontiguous storage for arrays and
have gotten nice speedups --- but their patch is for plpgsql only and
only addresses arrays, which makes it enough of a kluge that I've not
wanted to bring it to the community. I think we should work towards
a general solution instead.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-11-26 12:08:28
Message-ID: CAFj8pRDg9oLcOHrGrZC_A-Nv_HHa-TR7AYy5q3SxSPqrFyJZWQ@mail.gmail.com
Lists: pgsql-hackers

2013/11/25 Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com>

> On 07.10.2013 17:00, Pavel Stehule wrote:
>
>> Hello
>>
>> I fixed patch - there was a missing cast to domain when it was used (and
>> all regress tests are ok now).
>>
>
> This doesn't work correctly for varlen datatypes. I modified your
> quicksort example to work with text[], and got this:
>

> postgres=# SELECT array_upper(quicksort(1,20000,array_agg((g::text))),1)
> FROM generate_series(1,20000) g;
> ERROR: invalid input syntax for integer: " "
> CONTEXT: PL/pgSQL function quicksort(integer,integer,text[]) line 10 at
> assignment
> PL/pgSQL function quicksort(integer,integer,text[]) line 16 at assignment
>
> The handling of array domains is also wrong. You replace the new value to
> the array and only check the domain constraint after that. If the
> constraint is violated, it's too late to roll that back. For example:
>

Both are bugs, and they should be fixed.

>
> postgres=# create domain posarray as int[] check (value[1] > 0);
> CREATE DOMAIN
> postgres=# do $$
> declare
> iarr posarray;
> begin
> begin
> iarr[1] := 1;
> iarr[1] := -1;
> exception when others then raise notice 'got error: %', sqlerrm; end;
> raise notice 'value: %', iarr[1];
> end;
> $$;
> NOTICE: got error: value for domain posarray violates check constraint
> "posarray_check"
> NOTICE: value: -1
> DO
>
> Without the patch, it prints 1, but with the patch, -1.
>
>
> In general, I'm not convinced this patch is worth the trouble. The speedup
> isn't all that great; manipulating large arrays in PL/pgSQL is still so
> slow that if you care about performance you'll want to rewrite your
> function in some other language or use temporary tables. And you only get a
> gain with arrays of fixed-length element type with no NULLs.
>

I understand your objection, but I disagree that a partial solution is not
enough. Updating an array of fixed-length elements should be fast - there is
no workaround for this task in plpgsql, so using an array as a fast cache for
a calculation carries some (sometimes very high) hidden costs. Very advanced
users can switch to a different PL or write a C extension, but for ordinary
users this cost is hidden and unsolvable.

>
> So I think we should drop this patch in its current form. If we want to
> make array manipulation in PL/pgSQL faster, I think we'll have to do something
> similar to how we handle "row" variables, or something else entirely. But
> that's a much bigger patch, and I'm not sure it's worth the trouble either.
>

Sorry, I disagree again - using the same mechanism for arrays of fixed-length
types and arrays of varlena types would be inefficient. The slow part of this
area is the manipulation of large memory blocks - there we are fighting
physical CPU and memory-access limits.

Now, I know of a few significant PL/pgSQL issues related to manipulating
variables:

* repeated detoasting
* missing preallocation
* inefficient manipulation of large arrays
* inefficient casting (using I/O casting instead of binary casting)

A final solution for these issues needs a redesign of how PL/pgSQL works with
variables - maybe somewhat in the Perl style. That would be nice, but I don't
know of anyone who will work on it any time soon.

I would like to concentrate on integrating plpgsql_check into core - and
maybe later I can work on this topic (perhaps in two years).

So any partial solution (it doesn't have to be mine) could serve well for two
or three years. This code is 100% hidden from the user-visible level, and we
can change it safely, so a partial solution can be enough.

Regards

Pavel

> Looking at perf profile and the functions involved, though, I see some
> low-hanging fruit: in array_set, the new array is allocated with palloc0'd,
> but we only need to zero out a few alignment bytes where the new value is
> copied. Replacing it with palloc() will save some cycles, per the attached
> patch. Nowhere near as much as your patch, but this is pretty
> uncontroversial.
>
> - Heikki
>
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-11-26 12:12:21
Message-ID: CAFj8pRBtqPrsrcRB9Q=OcGrs6BgU0PTqsPMiPzET6hPF_Dj2pg@mail.gmail.com
Lists: pgsql-hackers

2013/11/25 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com> writes:
> > In general, I'm not convinced this patch is worth the trouble. The
> > speedup isn't all that great; manipulating large arrays in PL/pgSQL is
> > still so slow that if you care about performance you'll want to rewrite
> > your function in some other language or use temporary tables. And you
> > only get a gain with arrays of fixed-length element type with no NULLs.
>
> > So I think we should drop this patch in its current form. If we want to
> > make array manipulation in PL/pgSQL, I think we'll have to do something
> > similar to how we handle "row" variables, or something else entirely.
>
> I think that this area would be a fruitful place to make use of the
> noncontiguous datatype storage ideas that we were discussing with the
> PostGIS guys recently. I agree that tackling it in the context of plpgsql
> alone is not a good way to go at it.
>
> I'm not saying this in a vacuum of information, either. Some of the guys
> at Salesforce have been poking at noncontiguous storage for arrays and
> have gotten nice speedups --- but their patch is for plpgsql only and
> only addresses arrays, which makes it enough of a kluge that I've not
> wanted to bring it to the community. I think we should work towards
> a general solution instead.
>

I am for a general solution (because these issues are real performance
traps), but an early partial solution can be valuable for a lot of users too
- mainly if the general solution is on a two-to-three-year horizon.

Regards

Pavel

>
> regards, tom lane
>