Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Ian Barwick <ian(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()
Date: 2014-06-17 21:53:40
Message-ID: 16109.1403042020@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> On 06/17/2014 02:36 PM, Tom Lane wrote:
>> Another issue is whether to print only those having exactly the minimum
>> observed Levenshtein distance, or to print everything less than some
>> cutoff. The former approach seems to me to be placing a great deal of
>> faith in something that's only a heuristic.

> Well, that depends on what the cutoff is. If it's high, like 0.5, that
> could be a LOT of columns. Like, I plan to test this feature with a
> 3-table join that has a combined 300 columns. I can completely imagine
> coming up with a string which is within 0.5 or even 0.3 of 40 column names.

I think Levenshtein distances are integers, though that's just a minor
point.

> So if we want to list everything below a cutoff, we'd need to make that
> cutoff fairly narrow, like 0.2. But that means we'd miss a lot of
> potential matches on short column names.

I'm not proposing an immutable cutoff. Something that scales with the
string length might be a good idea, or we could make it a multiple of
the minimum observed distance, or probably there are a dozen other things
we could do. I'm just saying that if we have an alternative at distance
3, and another one at distance 4, it's not clear to me that we should
assume that the first one is certainly what the user had in mind.
Especially not if all the other alternatives are distance 10 or more.
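
To make the two cutoff ideas above concrete, here is a minimal Python sketch. It is not the code from the patch under discussion and not PostgreSQL's implementation. The function names and the parameters `length_ratio` and `min_multiple` (with defaults 0.5 and 1.5) are hypothetical, chosen only for illustration. A candidate is kept when its distance is within a fraction of the misspelled name's length and also within a multiple of the minimum observed distance, so implausible candidates yield no suggestion at all.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance; the result is always an integer."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


def pick_candidates(scored, name_length, length_ratio=0.5, min_multiple=1.5):
    """Filter (distance, name) pairs; both thresholds are illustrative assumptions.

    length_ratio: the cutoff scales with the length of the misspelled name.
    min_multiple: the cutoff is a multiple of the minimum observed distance.
    """
    scored = sorted(scored)
    if not scored:
        return []
    best = scored[0][0]
    length_cap = length_ratio * name_length
    relative_cap = min_multiple * best
    return [name for dist, name in scored
            if dist <= length_cap and dist <= relative_cap]


def suggest_columns(missing, columns):
    """Score every known column against the missing name, then filter."""
    scored = [(levenshtein(missing, col), col) for col in columns]
    return pick_candidates(scored, len(missing))
```

With these assumed defaults and a 12-character name, candidates at distances 3, 4 and 10 would yield the first two and drop the third, while a name that resembles no column at all yields an empty list, meaning no hint would be printed.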

> I really think we're overthinking this: it is just a HINT, and we can
> improve it in future PostgreSQL versions, and most of our users will
> ignore it anyway because they'll be using a client which doesn't display
> HINTs.

Agreed that we can make it better later. But whether it prints exactly
one suggestion, and whether it does that no matter how silly the
suggestion is, are rather fundamental decisions.

regards, tom lane
