Possible patch for better index name choosing

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Possible patch for better index name choosing
Date: 2009-12-21 03:17:17
Message-ID: 25441.1261365437@sss.pgh.pa.us

Attached is a WIP patch for addressing the problems mentioned in this
thread:
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php

The main things that it does are (1) consider all index columns, not
just the first one as formerly; and (2) try to generate a usable name
for index expression columns, rather than just ignoring them which was
the effective behavior formerly.
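
To make point (1) concrete, here is a minimal sketch of the difference between naming from the first column only and naming from all columns. The function name `choose_index_name`, its parameters, and the plain underscore-joined format are assumptions for illustration; this is not the code in the attached patch.

```python
def choose_index_name(table, columns, label, all_columns=True):
    """Build a default index name of the form table_col[_col...]_label.

    all_columns=False mimics the former behavior described above, where
    only the first index column contributes to the name.
    """
    used = columns if all_columns else columns[:1]
    return "_".join([table, *used, label])


# A two-column UNIQUE constraint on foo(a, b), named both ways.
old = choose_index_name("foo", ["a", "b"], "key", all_columns=False)
new = choose_index_name("foo", ["a", "b"], "key")
print(old)  # foo_a_key
print(new)  # foo_a_b_key
```

Single-column definitions come out the same either way, so in this model only multi-column definitions see a different default name.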

There are several changes in the regression test outputs, mostly as a
result of choice (1). I've not bothered to update the expected files
yet but just attached the output diffs to show what happens.

There is one thing that is not terribly nice about the behavior, which
is that CREATE TABLE LIKE INCLUDING INDEXES is unable to generate smart
names for expression indexes; it falls back to "expr", as for example
in

regression=# create table foo (f1 text, exclude (lower(f1) with =));
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo_lower_exclusion" for table "foo"
CREATE TABLE
regression=# create table foo2 (like foo including all);
NOTICE: CREATE TABLE / EXCLUDE will create implicit index "foo2_expr_exclusion" for table "foo2"
CREATE TABLE

The reason for this is that the patch depends on FigureColname which
works on untransformed parse trees, and we don't have access to such
a tree when copying an existing index. There seem to be three possible
responses to that:

1. Decide this isn't worth worrying about and use the patch as-is.

2. Change FigureColname to work on already-transformed expressions.
I don't care for this idea much, for two reasons. First, FigureColname
would become significantly slower (eg, it would have to do catalog
lookups to resolve names of Vars, instead of just pulling the name out
of a ColumnRef), and this is objectionable considering it's part of
the required parsing path for even very simple commands. Second, there
are various corner cases where we'd get different results, which would
likely break applications that are expecting specific result column
names from given queries.

3. Implement a separate FigureIndexColname function that works as much
like FigureColname as it can, but takes a transformed parse tree.
This fixes the LIKE case and also removes the need for the iexprname
field that the attached patch adds to IndexElem. I think it largely
overcomes the two objections to idea #2, since an extra few lookups
during index creation are hardly a performance problem, and exact
application compatibility shouldn't be an issue here either. It's
a bit ugly to have to keep two such functions in sync though.

I'm not real sure whether to go with the patch as-is or use idea #3.
It seems to depend on how annoyed you are by the LIKE behavior.
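
The LIKE limitation and idea #3 can be modelled with a toy example. Everything here is hypothetical: the node classes, `figure_name`, `figure_index_name`, the made-up function OID, and the dictionaries standing in for catalog lookups are illustrative assumptions, not PostgreSQL's actual data structures.

```python
from dataclasses import dataclass


# Raw (untransformed) parse nodes still carry the names the user typed.
@dataclass
class ColumnRef:
    name: str


@dataclass
class FuncCall:
    funcname: str
    arg: object


# Transformed nodes refer to catalog objects by number, not by name.
@dataclass
class Var:
    attno: int  # column position within its table


@dataclass
class FuncExpr:
    funcid: int  # function OID
    arg: object


def figure_name(node, fallback="expr"):
    """Name an index column from a raw parse node, else use the fallback."""
    if isinstance(node, ColumnRef):
        return node.name
    if isinstance(node, FuncCall):
        return node.funcname
    return fallback  # transformed nodes carry no names to read


def figure_index_name(node, attnames, funcnames, fallback="expr"):
    """Idea #3 in miniature: resolve names through (mock) catalog lookups."""
    if isinstance(node, Var):
        return attnames.get(node.attno, fallback)
    if isinstance(node, FuncExpr):
        return funcnames.get(node.funcid, fallback)
    return fallback


raw = FuncCall("lower", ColumnRef("f1"))  # as typed: lower(f1)
copied = FuncExpr(12345, Var(1))          # as stored; 12345 is a made-up OID

print(f"foo_{figure_name(raw)}_exclusion")      # foo_lower_exclusion
print(f"foo2_{figure_name(copied)}_exclusion")  # foo2_expr_exclusion
print(figure_index_name(copied, {1: "f1"}, {12345: "lower"}))  # lower
```

The two dictionaries stand in for the extra lookups that idea #3 would perform during index creation.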

A different consideration is whether it's really a good idea to be
messing with default index names at all. As illustrated in the attached
regression diffs, this does impact the error messages returned to
applications for unique-index failures. I don't think this is a serious
problem across a major version update, but maybe someone thinks
differently.

Comments?

regards, tom lane

Attachment             Content-Type   Size
index-naming-1.patch   text/x-patch   23.2 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible patch for better index name choosing
Date: 2009-12-21 04:11:25
Message-ID: 603c8f070912202011n45349c2dybca8b040d44ed4f@mail.gmail.com

On Sun, Dec 20, 2009 at 10:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Attached is a WIP patch for addressing the problems mentioned in this
> thread:
> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php
>
> The main things that it does are (1) consider all index columns, not
> just the first one as formerly; and (2) try to generate a usable name
> for index expression columns, rather than just ignoring them which was
> the effective behavior formerly.

I'm not really sure there's any point to this. Anyone who cares about
giving their index an intelligible name should manually assign one.
If they don't bother doing that, I don't really see why we should
worry about it either. If anything, it seems like we should err on
the side of simplicity, since some users (or even applications) might
attempt to identify or predict automatically generated names.

> A different consideration is whether it's really a good idea to be
> messing with default index names at all.  As illustrated in the attached
> regression diffs, this does impact the error messages returned to
> applications for unique-index failures.  I don't think this is a serious
> problem across a major version update, but maybe someone thinks
> differently.

Maybe I'll reserve final judgement pending further discussion, but my
first reaction is to say it's not worth the risk. Probably this
shouldn't be an issue for a well-designed application, but the world
is full of badly-written code. We shouldn't throw up barriers (even
relatively trivial ones) to updating applications unless we get
something out of it, and I'm not convinced that's the case here.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible patch for better index name choosing
Date: 2009-12-21 05:03:09
Message-ID: 27234.1261371789@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Dec 20, 2009 at 10:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Attached is a WIP patch for addressing the problems mentioned in this
>> thread:
>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php

> I'm not really sure there's any point to this. Anyone who cares about
> giving their index an intelligible name should manually assign one.
> If they don't bother doing that, I don't really see why we should
> worry about it either.

Mainly because we historically *have* put some work into it, and it
would be inconsistent to not pay attention to the point now as we extend
the set of possible index-building constraints further. In particular
we're going to see a lot of exclusion constraints named foo_exclusionN
if we don't expend any effort on it now. I also claim that this is
necessary infrastructure if we are going to accept Peter's proposal of
allowing CREATE INDEX without an explicit index name. That is really
dependent on the assumption that the system will expend more than no
effort on picking useful names.
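
As an illustration of the foo_exclusionN concern, here is a minimal sketch that assumes a name collision is resolved by appending the first unused integer to the base name. The helper `choose_unique_name` is hypothetical and not taken from the PostgreSQL source.

```python
def choose_unique_name(base, existing):
    """Return base if it is free, else base1, base2, ... (first unused)."""
    if base not in existing:
        return base
    n = 1
    while f"{base}{n}" in existing:
        n += 1
    return f"{base}{n}"


# Three exclusion constraints on table foo, named without any column
# information: the names differ only by a counter.
names = set()
for _ in range(3):
    name = choose_unique_name("foo_exclusion", names)
    names.add(name)
    print(name)
# prints foo_exclusion, foo_exclusion1, foo_exclusion2

# With column names in the base, the same helper rarely needs a counter.
described = set()
for base in ("foo_a_exclusion", "foo_b_exclusion", "foo_c_exclusion"):
    described.add(choose_unique_name(base, described))
print(sorted(described))
```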

> Maybe I'll reserve final judgement pending further discussion, but my
> first reaction is to say it's not worth the risk. Probably this
> shouldn't be an issue for a well-designed application, but the world
> is full of badly-written code. We shouldn't throw up barriers (even
> relatively trivial ones) to updating applications unless we get
> something out of it, and I'm not convinced that's the case here.

Well, we could tamp down the risks considerably if we undid my point
(1), namely to still consider only the first index column when
generating a name. I am not really happy with that answer though.
I could turn your first point back on you: if an app is concerned about
the exact names assigned to indexes, why isn't it specifying them?

It's worth noting that pg_dump does preserve index names, so this isn't
going to be an issue in any case for existing apps that dump and reload
their databases. AFAICS the only case where it would actually create a
compatibility issue is if an existing app creates multi-column UNIQUE
(non-PKEY) constraints on-the-fly, without a constraint name, and
depends on the generated name being the same as before.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible patch for better index name choosing
Date: 2009-12-21 05:29:19
Message-ID: 603c8f070912202129o27da735dka1b8c383fb6d8ac2@mail.gmail.com

On Mon, Dec 21, 2009 at 12:03 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Dec 20, 2009 at 10:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Attached is a WIP patch for addressing the problems mentioned in this
>>> thread:
>>> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php
>
>> I'm not really sure there's any point to this.  Anyone who cares about
>> giving their index an intelligible name should manually assign one.
>> If they don't bother doing that, I don't really see why we should
>> worry about it either.
>
> Mainly because we historically *have* put some work into it, and it
> would be inconsistent to not pay attention to the point now as we extend
> the set of possible index-building constraints further.  In particular
> we're going to see a lot of exclusion constraints named foo_exclusionN
> if we don't expend any effort on it now.

Maybe that's worth fixing and maybe it isn't. My first reaction is
"so what"? In all likelihood, you're going to have to look at the
index definition to see what the thing does anyway.

> I also claim that this is
> necessary infrastructure if we are going to accept Peter's proposal of
> allowing CREATE INDEX without an explicit index name.  That is really
> dependent on the assumption that the system will expend more than no
> effort on picking useful names.

That's a point to consider, though perhaps if they aren't specifying a
name it means they don't care that much.

>> Maybe I'll reserve final judgement pending further discussion, but my
>> first reaction is to say it's not worth the risk.  Probably this
>> shouldn't be an issue for a well-designed application, but the world
>> is full of badly-written code.  We shouldn't throw up barriers (even
>> relatively trivial ones) to updating applications unless we get
>> something out of it, and I'm not convinced that's the case here.
>
> Well, we could tamp down the risks considerably if we undid my point
> (1), namely to still consider only the first index column when
> generating a name.  I am not really happy with that answer though.
> I could turn your first point back on you: if an app is concerned about
> the exact names assigned to indexes, why isn't it specifying them?
>
> It's worth noting that pg_dump does preserve index names, so this isn't
> going to be an issue in any case for existing apps that dump and reload
> their databases.  AFAICS the only case where it would actually create a
> compatibility issue is if an existing app creates multi-column UNIQUE
> (non-PKEY) constraints on-the-fly, without a constraint name, and
> depends on the generated name being the same as before.

Right. Imagine, for example, a poorly written initialization script
for an app. Existing instances that are dumped and reloaded will be
OK, but new instances might not come out as expected.

I don't think that what you're proposing here is completely stupid;
I'm just wondering if it's not an ultimately somewhat pointless
activity. I'm not convinced that it's possible or sensible to try to
stringify all the things that people put in their index definitions,
or that we're going to be able to do it well enough to really add any
value. Perhaps I should RTFP before sticking my neck out too far,
but... will you serialize EXCLUDE (a =), EXCLUDE (a &&), and EXCLUDE
(a <some other operator>) differently? And if so, do you expect the
user to be able to reconstruct what the constraint is doing by looking
at the serialized version? It seems like something reasonably sane
can be done when the definition uses mostly column names and
functions, but operators seem like more of a problem. I think mostly
people are going to see the constraint name that got violated and then
run \d on the table and look for it. foo_exclusion3 may not be very
informative, but it's easy to remember for long enough to find it in
the \d output, whereas something long and hairy may not be.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible patch for better index name choosing
Date: 2009-12-21 05:39:16
Message-ID: 27774.1261373956@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Perhaps I should RTFP before sticking my neck out too far,
> but... will you serialize EXCLUDE (a =), EXCLUDE (a &&), and EXCLUDE
> (a <some other operator>) differently?

No, and I'm not proposing to expose ASC/DESC/NULLS FIRST/LAST or
nondefault opclasses (to say nothing of non-btree AMs) or index
predicates either. The proposed patch is to my mind just a logical
extension of what we have always done --- namely, to pay attention
to index column names --- to some new cases that were never exposed
before.

We could certainly make it pay attention to all that stuff, but I have
the same feeling you do that it wouldn't produce readable results.
And it would make any compatibility issues a lot worse.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible patch for better index name choosing
Date: 2009-12-21 21:58:34
Message-ID: 1261432714.9031.4.camel@vanquo.pezone.net

On mån, 2009-12-21 at 00:03 -0500, Tom Lane wrote:
> Well, we could tamp down the risks considerably if we undid my point
> (1), namely to still consider only the first index column when
> generating a name.

I think putting all the column names into the index names instead of
only the first is a significant improvement that should be kept. If we
can't do it properly in some cases, we should punt in some obvious way,
not pretend to do the correct thing but actually omit some bits.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Possible patch for better index name choosing
Date: 2009-12-21 22:37:25
Message-ID: 21780.1261435045@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On mån, 2009-12-21 at 00:03 -0500, Tom Lane wrote:
>> Well, we could tamp down the risks considerably if we undid my point
>> (1), namely to still consider only the first index column when
>> generating a name.

> I think putting all the column names into the index names instead of
> only the first is a significant improvement that should be kept.

Yeah, I think so too. It's well worth any risk of application
incompatibility --- we make much bigger changes in every major
release without blinking.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Possible patch for better index name choosing
Date: 2009-12-22 21:01:42
Message-ID: 1534.1261515702@sss.pgh.pa.us

I wrote:
> Attached is a WIP patch for addressing the problems mentioned in this
> thread:
> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01764.php
> ...
> There is one thing that is not terribly nice about the behavior, which
> is that CREATE TABLE LIKE INCLUDING INDEXES is unable to generate smart
> names for expression indexes;
> ...
> The reason for this is that the patch depends on FigureColname which
> works on untransformed parse trees, and we don't have access to such
> a tree when copying an existing index. There seem to be three possible
> responses to that:
> ...
> 3. Implement a separate FigureIndexColname function that works as much
> like FigureColname as it can, but takes a transformed parse tree.

I fooled around with this solution and decided that it is a lot messier
than it's worth.

In the first place, we can't make a FigureColname-like function that
just takes a transformed tree: there is no way to interpret Vars without
some context. You need at least a table OID, and more than that if
you'd like to handle cases like multiple-relation expressions or
non-RELATION RTEs. For the case at hand of index expressions, a table
OID would be enough, but that doesn't leave much room for imagining the
function could be used for anything else in future. Worse, for the
problematic case (CREATE TABLE LIKE) we actually do not have a table OID
because the target table doesn't exist yet. We could finesse that by
passing the source table's OID instead, but that seems pretty klugy
itself.

In the second place, the number of "corner cases" where we'd generate
output different from FigureColname is much greater than I realized.
As an example, if foo is a type name then foo(x) and x::foo produce
the same parsed tree, but FigureColname will treat them differently.
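
A toy model of this corner case follows. The node classes and both functions are hypothetical, and the naming rule is an assumption made for illustration: a function call is named after the function, and a cast is named after its argument.

```python
from dataclasses import dataclass


@dataclass
class ColumnRef:
    name: str


@dataclass
class FuncCall:  # raw node for foo(x)
    funcname: str
    arg: object


@dataclass
class TypeCast:  # raw node for x::foo
    arg: object
    typename: str


@dataclass
class CoerceExpr:  # hypothetical transformed node: convert a column to a type
    attno: int
    typename: str


def raw_name(node):
    """Assumed rule: calls use the function name, casts the argument's name."""
    if isinstance(node, ColumnRef):
        return node.name
    if isinstance(node, FuncCall):
        return node.funcname
    if isinstance(node, TypeCast):
        return raw_name(node.arg)
    return "expr"


def transform(node):
    """Both spellings collapse to the same node when foo is a type name."""
    if isinstance(node, FuncCall):
        return CoerceExpr(attno=1, typename=node.funcname)
    if isinstance(node, TypeCast):
        return CoerceExpr(attno=1, typename=node.typename)
    raise TypeError(f"unsupported node: {node!r}")


call = FuncCall("foo", ColumnRef("x"))
cast = TypeCast(ColumnRef("x"), "foo")
print(raw_name(call), raw_name(cast))      # foo x
print(transform(call) == transform(cast))  # True
```

Once the two spellings have been transformed into the same node, a function that sees only the transformed tree has no way to recover which name the raw-tree rule would have chosen.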

Seeing that CREATE TABLE LIKE doesn't try to reproduce the source table's
index names anyway, I'm inclined to just go with the patch as-is and not
try to make it handle this one case nicely.

regards, tom lane