Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-06-17 20:46:39
Message-ID: CA+TgmoY7C8=3SaEaii4extc5NT6-z6qe2p0JGL++iy2oZSTZ0g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 17, 2014 at 12:51 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Mon, Jun 16, 2014 at 8:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Not having looked at the patch, but: I think the probability of
>> useless-noise HINTs could be substantially reduced if the code prints a
>> HINT only when there is a single available alternative that is clearly
>> better than the others in Levenshtein distance. I'm not sure how much
>> better is "clearly better", but I exclude "zero" from that. I see that
>> the original description of the patch says that it will arbitrarily
>> choose one alternative when there are several with equal Levenshtein
>> distance, and I'd say that's a bad idea.
>
> I disagree. I happen to think that making some guess is better than no
> guess at all here, given the fact that there aren't too many
> possibilities to choose from. I think that it might be particularly
> annoying to not show some suggestion in the event of a would-be
> ambiguous column reference where the column name is itself wrong,
> since both mistakes are common. For example, "order_id" was specified
> instead of one of either "o.orderid" or "ol.orderid", as in my
> original examples. If some correct alias was specified, that would
> make the new code prefer the appropriate Var, but it might not be, and
> that should be okay in my view.
>
> I'm not trying to remove the need for human judgement here. We've all
> heard stories about people who did things like input "Portland" into
> their GPS only to end up in Maine rather than Oregon, but I think in
> general you can only go so far in worrying about those cases.

Emitting a suggestion with a large distance seems like it could be
rather irritating. If the user types in "SELECT prodct_id FROM orders",
and that column does not exist, suggesting "product_id", if such a
column exists, will likely be well-received. Suggesting a column
named, say, "price", however, will likely make at least some users say
"no I didn't mean that you stupid @%!#" - because the issue there is
probably that the user selected from the completely wrong table,
rather than that they got 6 of the 9 characters they typed wrong.

One existing tool that does something along these lines is 'git',
which seems to have some kind of a heuristic to know when to give up:

[rhaas pgsql]$ git gorp
git: 'gorp' is not a git command. See 'git --help'.

Did you mean this?
        grep
[rhaas pgsql]$ git goop
git: 'goop' is not a git command. See 'git --help'.

Did you mean this?
        grep
[rhaas pgsql]$ git good
git: 'good' is not a git command. See 'git --help'.
[rhaas pgsql]$ git puma
git: 'puma' is not a git command. See 'git --help'.

Did you mean one of these?
        pull
        push

I suspect that the maximum useful distance is a function of the string
length. Certainly, if the distance is greater than or equal to the
length of one of the strings involved, it's just a totally unrelated
string and thus not worth suggesting. A useful heuristic might be
something like "distance at most 3, or at most half the string length,
whichever is less".
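
To make that heuristic concrete, here is a minimal sketch in Python
rather than in the parser's C. It is not the patch's code. The function
names (levenshtein, max_useful_distance, suggest) are made up for
illustration, "string length" is assumed to mean the length of the name
the user typed, halved with integer division, and returning every
candidate tied for the smallest distance is just one possible policy
for ties.

```python
def levenshtein(a, b):
    """Plain edit distance: insert, delete, substitute, each costing 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def max_useful_distance(typed):
    # "distance at most 3, or at most half the string length,
    # whichever is less". Using len(typed) is an assumption.
    return min(3, len(typed) // 2)


def suggest(typed, candidates):
    """Return the closest candidates, or [] if none is close enough."""
    limit = max_useful_distance(typed)
    scored = [(levenshtein(typed, c), c) for c in candidates]
    within = [(d, c) for d, c in scored if d <= limit]
    if not within:
        return []
    best = min(d for d, _ in within)
    # Ties are all returned; the caller decides whether to show them.
    return [c for d, c in within if d == best]


print(suggest("prodct_id", ["product_id", "price"]))
print(suggest("prodct_id", ["price"]))
```

With these assumptions, "prodct_id" gets "product_id" suggested
(distance 1), while "price" is never offered because it is more than 3
edits away. Checked only against the candidates shown in the git
transcript above, the same cutoff happens to suggest "grep" for "gorp"
and "goop", nothing for "good", and both "pull" and "push" for "puma";
that says nothing about how git actually decides.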

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
