Re: MySQL search query is not executing in Postgres DB

From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Don Baccus <dhogaza(at)pacifier(dot)com>
Cc: Rob Wultsch <wultsch(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MySQL search query is not executing in Postgres DB
Date: 2012-02-18 21:43:12
Message-ID: CAFNqd5U1nSHRRugSg2L75NpiZNNT_MKnRuLKKAD-6gY8nz=w4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 18, 2012 at 4:12 PM, Don Baccus <dhogaza(at)pacifier(dot)com> wrote:
>
> On Feb 18, 2012, at 12:57 PM, Rob Wultsch wrote:
>>
>> Where first_name is string the queries above have very different
>> behaviour in MySQL. The first does a full table scan and coerces
>> first_name to an integer (so '5adfs' -> 5)
>
> Oh my, I can't wait to see someone rise to the defense of *this* behavior!

I can see a use, albeit a clumsy one, to the notion of looking for values
WHERE integer_id_column like '1%'

It's entirely common for companies to organize general ledger account
numbers by having numeric prefixes that are somewhat meaningful.

A hierarchy like the following is perfectly logical:
- 0000 to 0999 :: Cash accounts [1]
- 1000 to 1999 :: Short Term Assets
- 2000 to 2999 :: Long Term Assets
- 3000 to 3999 :: Incomes
- 4000 to 4999 :: Costs of Goods Sold
- 5000 to 5999 :: Other Expenses
- 6000 to 6999 :: Share Capital
- 7000 to 7999 :: Retained Earnings and such

And back in the pre-computer days, accountants got very comfortable
with the shorthands that, for instance, "Income is in the 3000
series."

We are much smarter today (well, not necessarily!) and can use other
ways to indicate hierarchy, so that there's no reason to *care* what
that account number is.

But if old-school accountants that think "3000 series" *demand* that,
and as they're likely senior enough to assert their way, they're
likely to succeed in that demand, then it's pretty easy to this to
lead to somewhat clumsy "account_id like '3%'" as a search for income.

If I put my purist hat on, then the *right* answer is a range query, thus
WHERE account_id between 3000 and 3999

The new RANGE stuff that Jeff Davis has been adding into 9.2 should,
in principle, be the even better way to represent this kind of thing.

I'd think it nearly insane if someone was expecting '3%' to match not
only the '3000 thru 3999' series, but also '300 to 399' and "30 to 39"
and "3". A situation where that is the right set of results requires
a mighty strangely designed numbering system. I imagine a designer
would want to rule out the range 0-999, in such a design.

Nonetheless, the need for "where account_id like '1%'" comes from a
system designed with the above kind of thinking about account numbers,
and that approach fits mighty well with the way people thought back
when a "computer" was a person whose job it was to work out sums.

Notes:
[1] A careful observer will notice that the prefix notion doesn't
work for the first range without forcing leading zeroes onto
numbers...
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Don Baccus 2012-02-18 22:34:24 Re: MySQL search query is not executing in Postgres DB
Previous Message Dimitri Fontaine 2012-02-18 21:24:01 Re: Command Triggers