Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-11-19 19:33:23
Message-ID: CAM3SWZQRsJuXaeo1Qk5a6jnvz5p-8ZAULu5dX1TjeaAKWHFBAA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 19, 2014 at 11:13 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> That's precisely the time I think it's *most* important. In a very
> long string, the threshold should be LESS than 50%. My original
> proposal was "no more than 3 characters of difference, but in any
> event not more than half the length of the shorter string".

We can only hint based on the information given by the user. If they
give a lot of badly matching information, we still have something to
go on.

> That one's right at 50% too, but it's certainly more than 3 characters
> of difference. I think it's going to be pretty hard to emit a
> suggestion in that case but not in a whole lot of cases that don't
> make any sense.

I don't think that's the case. Other RTEs are penalized for having
non-matching aliases here.
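
To make the alias penalty concrete, here is a minimal sketch. It assumes
the penalty is simply the edit distance between the alias the user typed
and the RTE's alias, added to the column-name distance. The function
names and the additive scoring are illustrative assumptions, not the
patch's actual code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def penalized_distance(typed_alias: str, typed_column: str,
                       rte_alias: str, rte_column: str) -> int:
    """Hypothetical score: column distance plus an alias mismatch penalty."""
    # Assumption: an unqualified column reference carries no alias penalty.
    penalty = levenshtein(typed_alias, rte_alias) if typed_alias else 0
    return penalty + levenshtein(typed_column, rte_column)


# The user typed "o.custmer_id"; both RTEs have a customer_id column.
same_alias = penalized_distance("o", "custmer_id", "o", "customer_id")
other_alias = penalized_distance("o", "custmer_id", "c", "customer_id")
```

With this scoring, the column under the alias the user actually typed
always ranks ahead of an identically named column in another RTE.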

In general, I think the cost of a bad suggestion is much lower than
the benefit of a good one. You seem to be suggesting that they're
equal, or that they're equally likely in an organic situation. In my
estimation, this is not the case at all.

I'm curious about your thoughts on the compromise of a ramped-up
distance threshold that tests the absolute quality of a match. I think
the fact that git gives bad suggestions with terse strings tells us a
lot, though. Note that unlike git, with terse strings we may well have
many more equidistant matches, and as soon as the number of would-be
matches exceeds 2, we give no matches at all. So that's an additional
protection against poor matches with terse strings.
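
A rough sketch of how the two protections combine. For the absolute
threshold it uses the quoted proposal (at most 3 edits, and never more
than half the shorter string) as a stand-in, since the ramped-up variant
is not specified here. Rounding the half down, and the function names,
are assumptions for illustration only.

```python
def max_distance(typed: str, candidate: str) -> int:
    """Quoted proposal: at most 3 edits, capped at half the shorter string."""
    shorter = min(len(typed), len(candidate))
    return min(3, shorter // 2)  # assumption: the half is rounded down


def choose_hints(typed: str, scored: list[tuple[str, int]]) -> list[str]:
    """Pick columns to hint from (column_name, distance) pairs."""
    # Absolute quality test: drop candidates that are too far away.
    eligible = [(name, dist) for name, dist in scored
                if dist <= max_distance(typed, name)]
    if not eligible:
        return []
    best = min(dist for _, dist in eligible)
    tied = [name for name, dist in eligible if dist == best]
    # More than two equidistant matches means no hint at all.
    return tied if len(tied) <= 2 else []


long_name = choose_hints("custmer_id",
                         [("customer_id", 1), ("customer_name", 5)])
terse_three = choose_hints("ab", [("ac", 1), ("ad", 1), ("ae", 1)])
terse_two = choose_hints("ab", [("ac", 1), ("ad", 1)])
```

The terse case is where the tie rule matters: three two-letter columns
each one edit away pass the threshold, but no hint is emitted because
there are more than two of them.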

--
Peter Geoghegan
