Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild

Lists: pgsql-hackers
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 20:26:56
Message-ID: 4A243990.9090701@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

I just realized that even if you do this:

table foo (
id serial,
bar varchar(200)
)

ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000)

... it triggers a heap & index rebuild, even though it's completely
unnecessary. Is this a special case of VARCHAR, or are there other
types where we should be allowing typemod changes without rebuilding?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 20:40:48
Message-ID: 23990.1243888848@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> I just realized that even if you do this:
> ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000)
> ... it triggers a heap & index rebuild, even though it's completely
> unnecessary.

Yeah, this has been discussed before; I think it's even in the TODO
list. The stumbling block has been to identify a reasonably clean way
of determining which datatype changes don't require a scan.

> Is this a special case of VARCHAR, or are there other
> types where we should be allowing typemod changes without rebuilding?

There are any number of other cases where it's potentially interesting.
Consider:

* NUMERIC -> NUMERIC with a larger precision and/or scale

* VARBIT lengthening

* TIMESTAMP precision increase

* VARCHAR(anything) -> TEXT

and that's without considering the potential uses for user-defined
types. Now that we allow user-defined types to have usable typmods,
I'm sure there will be applications for them too. There are also
cases where a change might require a scan to ensure a new constraint
is met, but not a rewrite (eg, reducing the max length of VARCHAR).

We could certainly put in a quick hack that just covered a few of the
cases for built-in types, but it's not very pleasing ...

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 20:49:48
Message-ID: 1243889388.12209.39.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2009-06-01 at 13:26 -0700, Josh Berkus wrote:
> All,
>
> I just realized that even if you do this:
>
> table foo (
> id serial,
> bar varchar(200)
> )
>
> ALTER TABLE foo ALTER COLUMN bar TYPE VARCHAR(1000)
>
> ... it triggers a heap & index rebuild, even though it's completely
> unnecessary. Is this a special case of VARCHAR, or are there other
> types where we should be allowing typemod changes without rebuilding?

NUMERIC(x, y) comes to mind, although that might be a more dangerous
case. If you turn a NUMERIC(5,0) into a NUMERIC(5,1), then '1.2' may be
stored as either '1' or '1.2' depending on whether you did the insert
before or after the change. That's because, with NUMERIC, it's not
really a constraint, but a rule for rounding.

There may be other interesting cases involving constraints. For
instance, if you have CHECK(i < 200), you should be able to add CHECK(i
< 1000) without an exclusive lock or recheck. Then, with an exclusive
lock, you can remove the original tighter constraint, but at least it
wouldn't have to recheck the entire table.

Not sure how much effort that is worth -- VARCHAR and NUMERIC typmods
are probably more common problems and easier to fix.

Regards,
Jeff Davis


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 21:17:30
Message-ID: 4136ffa0906011417l6f4b1875o8f97207ca78ac592@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 1, 2009 at 9:49 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
>
> NUMERIC(x, y) comes to mind, although that might be a more dangerous
> case. If you turn a NUMERIC(5,0) into a NUMERIC(5,1), then '1.2' may be
> stored as either '1' or '1.2' depending on whether you did the insert
> before or after the change. That's because, with NUMERIC, it's not
> really a constraint, but a rule for rounding.

Well it's not like rewriting the table is going to accomplish anything though...

> There may be other interesting cases involving constraints. For
> instance, if you have CHECK(i < 200), you should be able to add CHECK(i
> < 1000) without an exclusive lock or recheck. Then, with an exclusive
> lock, you can remove the original tighter constraint, but at least it
> wouldn't have to recheck the entire table.

We have the infrastructure for this kind of check actually, it's the
same kind of thing we do for partition exclusion...

--
greg


From: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 21:39:08
Message-ID: 1d4e0c10906011439r45ee5aa7jc40b1dda4df0ee51@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 1, 2009 at 10:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> We could certainly put in a quick hack that just covered a few of the
> cases for built-in types, but it's not very pleasing ...

Jonah proposed a patch for that a long time ago.

See http://archives.postgresql.org/pgsql-patches/2006-10/msg00154.php .

--
Guillaume


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 21:39:50
Message-ID: 4A244AA6.4090907@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Yeah, this has been discussed before; I think it's even in the TODO
> list.

I couldn't find it. At least, not under data types, and also not with
the keyword "typemod". Anyone see it?

> The stumbling block has been to identify a reasonably clean way
> of determining which datatype changes don't require a scan.

Yep. One possibility I'm thinking is supplying a function for each type
which takes two typemods (old and new) and returns a value (none, check,
rebuild) which defines what we need to do: nothing, check but not
rebuild, or rebuild. Default would be rebuild. Then the logic is
simple for each data type.

Note that this doesn't deal with the special case of VARCHAR-->TEXT, but
just with changing typemods. Are there other cases of data *type*
conversions where no check or rebuild is required? Otherwise we might
just special case VARCHAR-->TEXT.

Oh, here's a general case: changing DOMAINs on the same base type should
only be a check, and changing from a DOMAIN to its own base type should
be a none.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 21:58:45
Message-ID: 1243893525.12209.51.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2009-06-01 at 14:39 -0700, Josh Berkus wrote:
> Note that this doesn't deal with the special case of VARCHAR-->TEXT, but
> just with changing typemods. Are there other cases of data *type*
> conversions where no check or rebuild is required? Otherwise we might
> just special case VARCHAR-->TEXT.

I observe that the casts (VARCHAR -> TEXT and TEXT -> VARCHAR) are
marked WITHOUT FUNCTION. If that's the case, can't we use that to say
that no heap rebuild is required? Perhaps we'll need to combine this
with the typmod checks to see if we need to check the heap.

Regards,
Jeff Davis


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 22:10:46
Message-ID: 4A2451E6.3060703@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff,

> I observe that the casts (VARCHAR -> TEXT and TEXT -> VARCHAR) are
> marked WITHOUT FUNCTION. If that's the case, can't we use that to say
> that no heap rebuild is required? Perhaps we'll need to combine this
> with the typmod checks to see if we need to check the heap.

yeah, you're right .. that would give us a short list of conversions
which don't require a rewrite. However, as Tom points out, that
doesn't mean that they might not need a reindex (as well as OID, there's
also XML).

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 22:39:05
Message-ID: 25609.1243895945@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> yeah, you're right .. that would give us a short list of conversions
> which don't require a rewrite. However, as Tom points out, that
> doesn't mean that they might not need a reindex (as well as OID, there's
> also XML).

Um. I had actually forgotten about the reindexing point, but yup that
is a stumbling block to any "no work" conversions. It might be best to
only handle cases where the column's base type is not changing, so that
we don't have any index semantics changes happening. I think we could
still handle the varchar->text case (since they share index opclasses)
but that could be a hardwired special case.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-01 22:52:53
Message-ID: 25899.1243896773@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Yeah, this has been discussed before; I think it's even in the TODO
>> list.

> I couldn't find it. At least, not under data types, and also not with
> the keyword "typemod". Anyone see it?

It's the last item under ALTER:

* Don't require table rewrite on ALTER TABLE ... ALTER COLUMN TYPE, when the old and new data types are binary compatible
http://archives.postgresql.org/message-id/200903040137.n241bAUV035002@wwwmaster.postgresql.org
http://archives.postgresql.org/pgsql-patches/2006-10/msg00154.php

regards, tom lane


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Suggested TODO: allow ALTERing of typemods without heap/index rebuild
Date: 2009-06-02 08:37:28
Message-ID: 87prdn2fqv.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> The stumbling block has been to identify a reasonably clean way
>> of determining which datatype changes don't require a scan.
>
> Yep. One possibility I'm thinking is supplying a function for each type
> which takes two typemods (old and new) and returns a value (none, check,
> rebuild) which defines what we need to do: nothing, check but not rebuild,
> or rebuild. Default would be rebuild. Then the logic is simple for each
> data type.

That seems like a good idea, I don't see how the current infrastructure
could provide enough information to skip this here. Add in there whether
a reindex is needed, too, in the accepted return values (maybe a mask is
needed, such as NOREWRITE|REINDEX).

> Note that this doesn't deal with the special case of VARCHAR-->TEXT, but
> just with changing typemods. Are there other cases of data *type*
> conversions where no check or rebuild is required? Otherwise we might just
> special case VARCHAR-->TEXT.

It seems there's some new stuff for this in 8.4, around the notions of
binary coercibility and type categories, which allow user defined types
to be declared IO compatible with built-in types, e.g. citext/text.

Maybe the case is not so special anymore?

http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=22ff6d46991447bffaff343f4e333dcee188094d
http://git.postgresql.org/gitweb?p=postgresql.git;a=commit;h=4a3be7e52d7e87d2c05ecc59bc4e7d20f0bc9b17

> Oh, here's a general case: changing DOMAINs on the same base type should
> only be a check, and changing from a DOMAIN to its own base type should be a
> none.

DOMAINs and CASTs are still on the todo list IIRC, so I'm not sure the
current infrastructure around DOMAINs would be flexible (or complete)
enough for the system to determine when the domain A to domain B type
change is binary coercible. It has no CAST information to begin with, I
guess.

As far as reindexing is concerned, talking with RhodiumToad (Andrew
Gierth) on IRC gave insights, as usual. Standard PostgreSQL supports two
data type change without reindex needs: varchar to text and cidr to
inet. In both cases, the types share the indexing infrastructure: same
PROCEDUREs are in use in the OPERATORs that the index is using.

Could it be that we already have the information we need in order to
dynamically decide whether a heap rewrite and a reindex are necessary,
even in case of user defined type conversions?

Regards,
--
dim