Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Ian Barwick <ian(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-06-17 03:56:39
Message-ID: 362.1402977399@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Mon, Jun 16, 2014 at 7:09 PM, Ian Barwick <ian(at)2ndquadrant(dot)com> wrote:
>> Howver in this particular use case, as long as it doesn't produce false
>> positives (I haven't looked at the patch) I don't think it would cause
>> any problems (of the kind which would require actively excluding certain
>> languages/character sets), it just wouldn't be quite as useful.

> I'm not sure what you mean by false positives. The patch just shows a
> HINT, where before there was none. It's possible for any number of
> reasons that it isn't the most useful possible suggestion, since
> Levenshtein distance is used as opposed to any other scheme that might
> be better sometimes. I think that the hint given is a generally useful
> piece of information in the event of an ERRCODE_UNDEFINED_COLUMN
> error. Obviously I think the patch is worthwhile, but fundamentally
> the HINT given is just a guess, as with the existing HINTs.

Not having looked at the patch, but: I think the probability of
useless-noise HINTs could be substantially reduced if the code prints a
HINT only when there is a single available alternative that is clearly
better than the others in Levenshtein distance. I'm not sure how much
better is "clearly better", but I exclude "zero" from that. I see that
the original description of the patch says that it will arbitrarily
choose one alternative when there are several with equal Levenshtein
distance, and I'd say that's a bad idea.

You could possibly answer this objection by making the HINT list *all*
the alternatives meeting the minimum Levenshtein distance. But I think
that's probably overcomplicated and of uncertain value anyhow. I'd rather
have a rule that "we print only the choice that is at least K units better
than any other choice", where K remains to be determined exactly.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2014-06-17 04:19:35 Re: postgresql.auto.conf read from wrong directory
Previous Message Ian Barwick 2014-06-17 03:38:15 Re: Doing better at HINTing an appropriate column within errorMissingColumn()