Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ToDo: fast update of arrays with fixed length fields for PL/pgSQL
Date: 2013-11-26 12:08:28
Message-ID: CAFj8pRDg9oLcOHrGrZC_A-Nv_HHa-TR7AYy5q3SxSPqrFyJZWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2013/11/25 Heikki Linnakangas <heikki(at)localhost(dot)vmware(dot)com>

> On 07.10.2013 17:00, Pavel Stehule wrote:
>
>> Hello
>>
>> I fixed patch - there was a missing cast to domain when it was used (and
>> all regress tests are ok now).
>>
>
> This doesn't work correctly for varlen datatypes. I modified your
> quicksort example to work with text[], and got this:
>

> postgres=# SELECT array_upper(quicksort(1,20000,array_agg((g::text))),1)
> FROM generate_series(1,20000) g;
> ERROR: invalid input syntax for integer: " "
> CONTEXT: PL/pgSQL function quicksort(integer,integer,text[]) line 10 at
> assignment
> PL/pgSQL function quicksort(integer,integer,text[]) line 16 at assignment
>
> The handling of array domains is also wrong. You replace the new value to
> the array and only check the domain constraint after that. If the
> constraint is violated, it's too late to roll that back. For example:
>

both are bug, and should be fixed

>
> postgres=# create domain posarray as int[] check (value[1] > 0);
> CREATE DOMAIN
> postgres=# do $$
> declare
> iarr posarray;
> begin
> begin
> iarr[1] := 1;
> iarr[1] := -1;
> exception when others then raise notice 'got error: %', sqlerrm; end;
> raise notice 'value: %', iarr[1];
> end;
> $$;
> NOTICE: got error: value for domain posarray violates check constraint
> "posarray_check"
> NOTICE: value: -1
> DO
>
> Without the patch, it prints 1, but with the patch, -1.
>
>
> In general, I'm not convinced this patch is worth the trouble. The speedup
> isn't all that great; manipulating large arrays in PL/pgSQL is still so
> slow that if you care about performance you'll want to rewrite your
> function in some other language or use temporary tables. And you only get a
> gain with arrays of fixed-length element type with no NULLs.
>

I understand to your objection, but I disagree so particular solution is
not enough. Update of fixed array should be fast - for this task there are
no any workaround in plpgsql, so using a array as fast cache for
calculation has some (sometimes very high) hidden costs. A very advanced
users can use a different PL or can prepare C extensions, but for common
users, this cost is hidden and unsolvable.

>
> So I think we should drop this patch in its current form. If we want to
> make array manipulation in PL/pgSQL, I think we'll have to do something
> similar to how we handle "row" variables, or something else entirely. But
> that's a much bigger patch, and I'm not sure it's worth the trouble either.
>

Sorry, I disagree again - using same mechanism for arrays of fixed types
and arrays of varlena types should be ineffective. Slow part of this area
is manipulation with large memory blocks - there we fighting with physical
CPU and access to memory limits.

Now, I know about few significant PL/pgSQL issues related to manipulation
with variables

* repeated detoast
* missing preallocation
* low effectiveness manipulation with large arrays
* low effectiveness with casting (using IO casting instead binary)

A final solution of these issues needs redesign of work with variables in
PL/pgSQL - some in Perl style (maybe). It can be nice, but I don't know
somebody who will work on this early.

I would to concentrate to integration of plpgsql_check to core - and maybe
later I can work on this topic (maybe after 2 years).

So any particular solution (should not be mine) can serve well two three
years. This code is 100% hidden on top, and we can change it safely, so
particular solution can be enough.

Regards

Pavel

> Looking at perf profile and the functions involved, though, I see some
> low-hanging fruit: in array_set, the new array is allocated with palloc0'd,
> but we only need to zero out a few alignment bytes where the new value is
> copied. Replacing it with palloc() will save some cycles, per the attached
> patch. Nowhere near as much as your patch, but this is pretty
> uncontroversial.
>
> - Heikki
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Rajeev rastogi 2013-11-26 12:09:31 Re: PostgreSQL Service on Windows does not start. ~ "is not a valid Win32 application"
Previous Message Sawada Masahiko 2013-11-26 12:05:24 psql shows line number