Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-06-17 21:58:18
Message-ID: 53A0B9FA.6030904@agliodbs.com
Lists: pgsql-hackers

On 06/17/2014 02:53 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> On 06/17/2014 02:36 PM, Tom Lane wrote:
>>> Another issue is whether to print only those having exactly the minimum
>>> observed Levenshtein distance, or to print everything less than some
>>> cutoff. The former approach seems to me to be placing a great deal of
>>> faith in something that's only a heuristic.
>
>> Well, that depends on what the cutoff is. If it's high, like 0.5, that
>> could be a LOT of columns. Like, I plan to test this feature with a
>> 3-table join that has a combined 300 columns. I can completely imagine
> >> coming up with a string which is within 0.5 or even 0.3 of 40 column names.
>
> I think Levenshtein distances are integers, though that's just a minor
> point.

I was giving distance/length ratios. That is, 0.5 would mean that up to
50% of the characters could be replaced/changed. 0.2 would mean that
only one character could be changed in a five-character name. Etc.
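
To make the ratio concrete, here is a minimal Python sketch. The names `levenshtein` and `distance_ratio` are hypothetical, and dividing by the longer of the two strings is an assumption, since the message does not say which length is the denominator. This is only an illustration of the arithmetic, not the code under discussion in the patch.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def distance_ratio(typed: str, candidate: str) -> float:
    """Distance divided by length (assumption: length of the longer string)."""
    longest = max(len(typed), len(candidate))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return levenshtein(typed, candidate) / longest


print(distance_ratio("prise", "price"))  # one change in five characters: 0.2
print(distance_ratio("nmae", "name"))    # two changes in four characters: 0.5
```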

The problem with these ratios is that they behave differently with long
strings than short ones. I think realistically we'd need a double
threshold, i.e. ( distance <= 2 OR ratio <= 0.4 ). Otherwise the
obvious case, getting two characters wrong in a 4-character column name
(or one in a two-character name), doesn't get a HINT.
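
A minimal sketch of that double threshold in Python, reading the absolute test as "a distance of at most 2", which is what the two examples in the paragraph imply. The function name `qualifies_for_hint` and its parameter names are hypothetical; the cutoffs 2 and 0.4 are the ones floated above, not settled values.

```python
def qualifies_for_hint(distance: int, name_length: int,
                       max_distance: int = 2, max_ratio: float = 0.4) -> bool:
    """Accept a candidate if EITHER the absolute or the relative test passes."""
    if name_length <= 0:
        return False  # nothing sensible to compare against
    return distance <= max_distance or distance / name_length <= max_ratio


# Short names: the ratio alone would reject these (0.5 > 0.4),
# but the absolute cutoff lets them through.
print(qualifies_for_hint(2, 4))    # True
print(qualifies_for_hint(1, 2))    # True

# Long names: the absolute cutoff fails, so the ratio decides.
print(qualifies_for_hint(6, 20))   # True  (ratio 0.3)
print(qualifies_for_hint(10, 20))  # False (ratio 0.5)
```

The absolute cutoff covers short names, where a single typo already produces a large ratio, while the ratio cutoff lets long names tolerate proportionally more edits.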

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
