Re: COPY and Volatile default expressions

From: David Fetter <david(at)fetter(dot)org>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COPY and Volatile default expressions
Date: 2013-04-15 15:24:27
Message-ID: 20130415152427.GH5337@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 15, 2013 at 03:00:34PM +0100, Simon Riggs wrote:
> COPY cannot be optimised correctly if we have before triggers or
> volatile default expressions.
>
> The multi-insert code detects those cases and falls back to the
> single row mechanism in those cases.
>
> There a common class of volatile functions that wouldn't cause
> problems: any volatile function that doesn't touch the table being
> loaded and still works correctly when called with alternately
> ordered data.

"Doesn't touch already-existing rows?" Makes a lot of sense :)

> I claim this is a common class, since sequence next_val functions and
> uuid generators meet that criteria and most common forms of auditing
> trigger, as well as any other form of data-reformatting trigger. Since
> this is a common case, it seems worth optimising.

Do you have numbers on this, or ways to gather same? In other words,
how do we know what resources (time, CPU cycles, disk seeks, etc.) are
being consumed here?

> What I'd like to do is to invent a new form of labelling that allows
> us to understand that COPY can still be optimised. I'm thinking to add
> a new function label, something like one of
> * IDEMPOTENT
> * ORDER INDEPENDENT
> * BATCHABLE
> * NON SELF REFERENCING
> * GO FASTER DAMMIT
> etc
>
> I'm sure many people will have a much more exact description and a
> better name than I do.
>
> This becomes more important when we think about parallelising SQL,
> since essentially the same problem exists with parallel SQL calling
> volatile functions. Oracle handles that by having a pragma to allow a
> function to be declared as parallel safe.

What happens when you misinform Oracle about this? Does it attempt to
check? More importantly, what *should* happen?

> I was also thinking that the best way to do this would be to invent a
> new flexible function labelling scheme, so use something like hstore
> to store a list of function attributes. Something that would mean we
> don't have to invent new keywords every time we have a new function
> label.
>
> Suggestions please.

JSON's in core. How about using that?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2013-04-15 15:27:37 Re: WIP: index support for regexp search
Previous Message hubert depesz lubaczewski 2013-04-15 15:16:57 Why are JSON extraction functions STABLE and not IMMUTABLE?