IDENTITY/GENERATED patch

From: Zoltan Boszormenyi <zb(at)cybertec(dot)at>
To: Postgresql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject: IDENTITY/GENERATED patch
Date: 2008-02-18 13:18:36
Message-ID: 47B985AC.2050907@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

as the wishlist for PostgreSQL 8.4 still has my
IDENTITY/GENERATED patches, I thought I refresh it.
Before actually doing it though, I wanted to ask for opinions
on implementation and ideas.

Here are the general ideas that were done by my patch:

1. IDENTITY columns.

The IDENTITY columns are similar to SERIALs as featured
by current PostgreSQL with the differences below.

They can be of almost any basic type, i.e. other than INTEGER or BIGINT.

There are two types of them:
GENERATED BY DEFAULT [ AS ( sequence_options ) ]
and
GENERATED ALWAYS [ AS ( sequence_options ) ]

The first behaves more or less as SERIAL in PostgreSQL currently,
i.e. upon INSERT and UPDATE such a field can be assigned a value
explicitely, they can be updated, etc. So, this can be viewed as
nothing more than a SERIAL pseudo-type column, followed by
the statements below:

ALTER TABLE ... ALTER COLUMN ... TYPE ...;
ALTER SEQUENCE seq_name sequence_options;

So, the above form is mostly a "syntax sugar" over already existing
features. In fact, this was the reason the acceptance was shot down.
The standard describes some behavioural difference that may make
it worth to distinguish from plain SERIALs.

However, IDENTITY GENERATED ALWAYS has some more
constraints. This is more like the autoincrementing fields in other
DBMSs, i.e. upon INSERT, usually the value "0" indicates the
generation of the next value. With the standard behaviour, any value
is ignored and the sequence next value is generated. UPDATE can only
use the DEFAULT keyword, not an explicit value.

It seems the last draft for SQL:2008 largely clarified details for
IDENTITY columns that were either conflicting or unclear in SQL:2003.

2. GENERATED columns

The GENERATED column is an automatic way of computing
expressions over columns in the same row. This is a feature of DB2.
Currently this can be implemented as a BEFORE TRIGGER
in PostgreSQL but that requires being a little familiar writing
PLPGSQL functions. The syntax of GENERATED columns
allows using plain expressions, so it's an easy to use feature.

The GENERATED columns are real, stored columns, not
on-the-fly computed virtual columns. This means that the value
computation cost is in the INSERT and UPDATE statements
instead of the SELECTs. This provides a nice speedup which
I tested but cannot currently find the mail in the archive.
Despite this demonstrable speedup, exactly the non-virtual nature
was the cause why Tom Lane didn't like it.

There's a problem however with GENERATED columns that
was pointed out, regarding BEFORE TRIGGERs. For speed,
the GENERATED columns' values can be computed after all
BEFORE TRIGGERs were executed. However, this causes
that functions running as BEFORE TRIGGERs see the old value
of the columns. This detail was mentioned in some of the
previous drafts but not in the latest SQL:2008 draft. This means
such columns MUST be recomputed before running every
BEFORE TRIGGERs and after the last one, too.
The performance hit in this case is large. A GUC variable can be
introduced to make both the fast and the correct behaviour possible.

Based on my last patch:
http://archives.postgresql.org/pgsql-patches/2007-05/msg00076.php
what problems can you see in it that needs fixing? I know, it extends
on the standard. (e.g. DROP IDENTITY, etc.) However, would a strictly
standard compliant IDENTITY/GENERATED get accepted into
PostgreSQL? Which extensions in the patch are worth keeping?

There are two possible problems that I know of.
I need to fix the OVERRIDING USER VALUE case as it isn't a NOP.
This convinced me that PostgreSQL SERIAL is not equivalent
to the IDENTITY GENERATED BY DEFAULT column and
the requirement that only one IDENTITY column may exists in
a table is strongly assumed by the OVERRIDING clauses.

And I just discovered an extension of TRUNCATE TABLE
in SQL:2008 that make it possible to restart the sequence behind
the identity column upon TRUNCATE.

What else?

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Doug Knight 2008-02-18 13:27:14 Error building 32 bit on 64 bit linux system
Previous Message Pattu, Susanta Kumar 2008-02-18 13:12:11 Error"Failed to run initdb: 128!"