Re: Bad query plan when the wrong data type is used

From: Laszlo Nagy <gandalf(at)shopzeus(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Bad query plan when the wrong data type is used
Date: 2011-02-08 14:15:27
Message-ID: 4D514FFF.9060308@shopzeus.com
Lists: pgsql-performance

This query:

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4

runs in 43 msec. The "visa.id" column has the int4 data type. The query
plan uses an index condition:

"Nested Loop (cost=0.00..26.19 rows=1 width=28)"
" -> Nested Loop (cost=0.00..17.75 rows=1 width=24)"
" -> Index Scan using variation_item_sellingsite_asin_pkey on
variation_item_sellingsite_asin visa (cost=0.00..8.58 rows=1 width=16)"
" Index Cond: (id = 4)"
" -> Index Scan using pk_product_id on product p
(cost=0.00..9.16 rows=1 width=16)"
" Index Cond: (p.id = visa.product_id)"
" -> Index Scan using pk_variation_item_id on variation_item vi
(cost=0.00..8.43 rows=1 width=12)"
" Index Cond: (vi.id = visa.variation_item_id)"

This query:

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where visa.id = 4.0

runs for 1144 msec! The query plan uses a seq scan + filter:

"Nested Loop (cost=33957.27..226162.68 rows=14374 width=28)"
" -> Hash Join (cost=33957.27..106190.76 rows=14374 width=20)"
" Hash Cond: (visa.variation_item_id = vi.id)"
" -> Seq Scan on variation_item_sellingsite_asin visa
(cost=0.00..71928.04 rows=14374 width=16)"
" Filter: ((id)::numeric = 4.0)"
" -> Hash (cost=22026.01..22026.01 rows=954501 width=12)"
" -> Seq Scan on variation_item vi (cost=0.00..22026.01
rows=954501 width=12)"
" -> Index Scan using pk_product_id on product p (cost=0.00..8.33
rows=1 width=16)"
" Index Cond: (p.id = visa.product_id)"

Which is silly. I think that PostgreSQL converts the int side to a
float, and then compares them.

It would be better to do this, for each item in the loop:

* evaluate the right side (which is float)
* tell if it is an integer or not
* if not an integer, then discard the row immediately
* otherwise use its integer value for the index scan

The result is identical, but it makes it possible to use the index scan. Of
course, I know that the query itself is wrong, because I should not use a
float where an int is expected. But this CAN be optimized, so I think it
should be! My idea for the query optimizer is not to use the "wider"
data type, but to use the data type that has an index on it instead.
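
Expressed in SQL, the rewrite I have in mind would look roughly like this
(a hand-written equivalent for this particular query only, so take it as a
sketch):

select p.id, p.producer_id, visa.variation_item_id, vi.qtyavail
from variation_item_sellingsite_asin visa
inner join product p on p.id = visa.product_id
inner join variation_item vi on vi.id = visa.variation_item_id
where 4.0 = trunc(4.0)        -- the literal really is a whole number...
  and visa.id = 4.0::int4    -- ...so the comparison runs on the int4 column and can use the index

For a non-integer literal like 3.5 the first condition is constant false, so
no rows are returned, which is the same result the original query gives.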

(I spent an hour figuring out what is wrong with my program. In some
cases it was slow, in other cases it was really fast, and I never got an
error message.)

What do you think?

Laszlo


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Laszlo Nagy <gandalf(at)shopzeus(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Re: Bad query plan when the wrong data type is used
Date: 2011-02-08 22:04:59
Message-ID: 4D51BE0B.1060404@agliodbs.com
Lists: pgsql-performance

Laszlo,

> Which is silly. I think that PostgreSQL converts the int side to a
> float, and then compares them.
>
> It would be better to do this, for each item in the loop:
>
> * evaluate the right side (which is float)
> * tell if it is an integer or not
> * if not an integer, then discard the row immediately
> * otherwise use its integer value for the index scan

Not terribly likely, I'm afraid. Data type coercion is *way* more
complex than you realize (consider the number of data types we have, and
the ability to add UDTs, and then square it). And the functionality you
propose would break backwards compatibility; many people currently use
".0" currently in order to force a coercion to Float or Numeric.

I'm not saying that PostgreSQL couldn't do better on this kind of case,
but that doing better is a major project, not a minor one.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Dave Crooke <dcrooke(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Laszlo Nagy <gandalf(at)shopzeus(dot)com>, pgsql-performance(at)postgresql(dot)org, Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Re: Bad query plan when the wrong data type is used
Date: 2011-02-08 23:14:20
Message-ID: AANLkTikWGEAgaOe4Z3UfT-k=9jZ6Z0GRTD3AzX3hRmcC@mail.gmail.com
Lists: pgsql-performance

You will get the same behaviour from any database product where the query as
written requires type coercion - the coercion has to go in the direction of
the "wider" type. I have seen the exact same scenario with Oracle, and I
view it as a problem with the way the query is written, not with the
database server.

Whoever coded the application which is making this query presumably knows
that the visa.id field is an integer type in the schema they designed, so
why are they passing a float? Convert the 4.0 to 4 on the application side
instead; it's one function call or cast.
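
For example (illustrative only, I haven't run these against your schema), in
the query from the original post either of these keeps the fast plan:

    where visa.id = 4                   -- pass an integer in the first place
    where visa.id = cast(4.0 as int)    -- or cast; this folds to the constant 4 at plan time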

It's not reasonable to expect the query compiler to pick up the slack for
poorly written SQL.

Cheers
Dave

On Tue, Feb 8, 2011 at 4:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> Laszlo,
>
> > Which is silly. I think that PostgreSQL converts the int side to a
> > float, and then compares them.
> >
> > It would be better to do this, for each item in the loop:
> >
> > * evaluate the right side (which is float)
> > * tell if it is an integer or not
> > * if not an integer, then discard the row immediately
> > * otherwise use its integer value for the index scan
>
> Not terribly likely, I'm afraid. Data type coercion is *way* more
> complex than you realize (consider the number of data types we have, and
> the ability to add UDTs, and then square it). And the functionality you
> propose would break backwards compatibility; many people currently use
> ".0" currently in order to force a coercion to Float or Numeric.
>
> I'm not saying that PostgreSQL couldn't do better on this kind of case,
> but that doing better is a major project, not a minor one.
>
> --
> -- Josh Berkus
> PostgreSQL Experts Inc.
> http://www.pgexperts.com
>


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Dave Crooke <dcrooke(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Laszlo Nagy <gandalf(at)shopzeus(dot)com>, pgsql-performance(at)postgresql(dot)org, Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Re: Bad query plan when the wrong data type is used
Date: 2011-02-09 09:52:44
Message-ID: 4D5263EC.9000205@gmail.com
Lists: pgsql-performance

On 09.02.11 01:14, Dave Crooke wrote:
> You will get the same behaviour from any database product where the
> query as written requires type coercion - the coercion has to go in
> the direction of the "wider" type. I have seen the exact same scenario
> with Oracle, and I view it as a problem with the way the query is
> written, not with the database server.
>
> Whoever coded the application which is making this query presumably
> knows that the visa.id field is an integer type in
> the schema they designed, so why are they passing a float? Convert the
> 4.0 to 4 on the application side instead; it's one function call or cast.
Actually the problem may be hidden in intermediate layers, and it may not
even be noticed until it's too late. As far as I remember from this list,
there are recurring problems with a column being integer while the
parameter is prepared as bigint or numeric. The same goes for numeric vs
double vs float.
As for me, it would be great for the optimizer to consider rewriting:
1) val1::narrow = val2::wide as (val1::narrow = val2::narrow and val2::narrow = val2::wide)
2) val1::narrow < val2::wide as (val1::narrow < val2::narrow and val1::wide < val2::wide)
3) val1::narrow > val2::wide as (val1::narrow + 1 > val2::narrow and val1::wide > val2::wide)
Of course, it should apply the additional check only if this allows an
index to be used.
Surely this is not an easy thing to implement, but similar questions are
raised quite often on this list.
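
Applied to the query above, as I read rule 1, the planner could treat

    where (visa.id)::numeric = 4.0

roughly as

    where visa.id = 4.0::int4 and (4.0::int4)::numeric = 4.0

so the first, narrow-typed condition can drive the index scan, while the
second one rejects literals that are not whole numbers.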

Best regards, Vitalii Tymchyshyn


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Laszlo Nagy <gandalf(at)shopzeus(dot)com>, pgsql-performance(at)postgresql(dot)org, Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Re: Bad query plan when the wrong data type is used
Date: 2011-02-27 18:16:55
Message-ID: AANLkTim3q+L0=tmipSQ8nE5JRt3ay8Gcy-LgxHtbZTu8@mail.gmail.com
Lists: pgsql-performance

On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Laszlo,
>
>> Which is silly. I think that PostgreSQL converts the int side to a
>> float, and then compares them.
>>
>> It would be better to do this, for each item in the loop:
>>
>>     * evaluate the right side (which is float)
>>     * tell if it is an integer or not
>>     * if not an integer, then discard the row immediately
>>     * otherwise use its integer value for the index scan
>
> Not terribly likely, I'm afraid.  Data type coercion is *way* more
> complex than you realize (consider the number of data types we have, and
> the ability to add UDTs, and then square it).  And the functionality you
> propose would break backwards compatibility; many people currently use
> ".0" currently in order to force a coercion to Float or Numeric.
>
> I'm not saying that PostgreSQL couldn't do better on this kind of case,
> but that doing better is a major project, not a minor one.

Specifically, the problem is that x = 4.0, where x is an integer, is
defined to mean x::numeric = 4.0, not x = 4.0::integer. If it meant
the latter, then testing x = 3.5 would throw an error, whereas what
actually happens is it just returns false.
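
For instance (easy to verify with a quick test of the current behavior):

    select 4 = 4.0;   -- true: the int side is compared as numeric
    select 4 = 3.5;   -- false, not an error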

Now, in this particular case, we all know that the only way x::numeric
= 4.0 can be true is if x = 4::int. But that's a property of the
numeric and integer data types that doesn't hold in general. Consider
t = 'foo'::citext, where t has type text. That could be true if t =
'Foo' or t = 'foO' or t = 'FOO', etc.

We could fix this by adding some special case logic that understands
properties of integers and numeric values and optimizes x =
4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
That would be cool, in a way, but I'm not sure it's really worth the
code it would take, unless it falls naturally out of some larger
project in that area.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Laszlo Nagy <gandalf(at)shopzeus(dot)com>, pgsql-performance(at)postgresql(dot)org, Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Re: Bad query plan when the wrong data type is used
Date: 2011-02-27 18:39:41
Message-ID: 18767.1298831981@sss.pgh.pa.us
Lists: pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
>> but that doing better is a major project, not a minor one.

> Specifically, the problem is that x = 4.0, where x is an integer, is
> defined to mean x::numeric = 4.0, not x = 4.0::integer. If it meant
> the latter, then testing x = 3.5 would throw an error, whereas what
> actually happens is it just returns false.

> We could fix this by adding some special case logic that understands
> properties of integers and numeric values and optimizes x =
> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
> That would be cool, in a way, but I'm not sure it's really worth the
> code it would take, unless it falls naturally out of some larger
> project in that area.

I think that most of the practical problems around this case could be
solved without such a hack. What we should do instead is invent
cross-type operators "int = numeric" etc and make them members of both
the integer and numeric index opclasses. There are reasons why that
wouldn't work for integer versus float (read the last section of
src/backend/access/nbtree/README) but right offhand it seems like it
ought to be safe enough for numeric. Now, it wouldn't be quite as fast
as if we somehow downconverted numeric to integer beforehand, but at
least you'd only be talking about a slow comparison operator and not a
fundamentally stupider plan. That's close enough for me, for what is
in the end a stupidly written query.

Of course, the above is still not exactly a small project, since you'd
be talking about something like 36 new operators to cover all of int2,
int4, int8. But it's a straightforward extension.
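
(For comparison, and if I'm remembering the catalog contents correctly, the
existing cross-type members of the btree integer_ops opfamily are what make
a query like

    explain select * from variation_item_sellingsite_asin where id = 4::bigint;

use the pkey index shown upthread even though the literal is int8. New
"int = numeric" members would buy "where id = 4.0" the same treatment.)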

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Laszlo Nagy <gandalf(at)shopzeus(dot)com>, pgsql-performance(at)postgresql(dot)org, Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Re: Bad query plan when the wrong data type is used
Date: 2011-02-28 19:04:53
Message-ID: AANLkTi=7f1v1bG1FU_gXxzarKgp6TxFjoea88ez6jqP1@mail.gmail.com
Lists: pgsql-performance

On Sun, Feb 27, 2011 at 1:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
>>> but that doing better is a major project, not a minor one.
>
>> Specifically, the problem is that x = 4.0, where x is an integer, is
>> defined to mean x::numeric = 4.0, not x = 4.0::integer.  If it meant
>> the latter, then testing x = 3.5 would throw an error, whereas what
>> actually happens is it just returns false.
>
>> We could fix this by adding some special case logic that understands
>> properties of integers and numeric values and optimizes x =
>> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
>> That would be cool, in a way, but I'm not sure it's really worth the
>> code it would take, unless it falls naturally out of some larger
>> project in that area.
>
> I think that most of the practical problems around this case could be
> solved without such a hack.  What we should do instead is invent
> cross-type operators "int = numeric" etc and make them members of both
> the integer and numeric index opclasses.  There are reasons why that
> wouldn't work for integer versus float (read the last section of
> src/backend/access/nbtree/README) but right offhand it seems like it
> ought to be safe enough for numeric.  Now, it wouldn't be quite as fast
> as if we somehow downconverted numeric to integer beforehand, but at
> least you'd only be talking about a slow comparison operator and not a
> fundamentally stupider plan.  That's close enough for me, for what is
> in the end a stupidly written query.
>
> Of course, the above is still not exactly a small project, since you'd
> be talking about something like 36 new operators to cover all of int2,
> int4, int8.  But it's a straightforward extension.

Interesting. Worth a TODO?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Laszlo Nagy <gandalf(at)shopzeus(dot)com>, pgsql-performance(at)postgresql(dot)org, Daniel Fekete <danieleff(at)gmail(dot)com>
Subject: Re: Bad query plan when the wrong data type is used
Date: 2012-09-01 16:25:37
Message-ID: 20120901162537.GA32319@momjian.us
Lists: pgsql-performance

On Mon, Feb 28, 2011 at 02:04:53PM -0500, Robert Haas wrote:
> On Sun, Feb 27, 2011 at 1:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Tue, Feb 8, 2011 at 5:04 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >>> I'm not saying that PostgreSQL couldn't do better on this kind of case,
> >>> but that doing better is a major project, not a minor one.
> >
> >> Specifically, the problem is that x = 4.0, where x is an integer, is
> >> defined to mean x::numeric = 4.0, not x = 4.0::integer.  If it meant
> >> the latter, then testing x = 3.5 would throw an error, whereas what
> >> actually happens is it just returns false.
> >
> >> We could fix this by adding some special case logic that understands
> >> properties of integers and numeric values and optimizes x =
> >> 4.0::numeric to x = 4::int and x = 3.5::numeric to constant false.
> >> That would be cool, in a way, but I'm not sure it's really worth the
> >> code it would take, unless it falls naturally out of some larger
> >> project in that area.
> >
> > I think that most of the practical problems around this case could be
> > solved without such a hack.  What we should do instead is invent
> > cross-type operators "int = numeric" etc and make them members of both
> > the integer and numeric index opclasses.  There are reasons why that
> > wouldn't work for integer versus float (read the last section of
> > src/backend/access/nbtree/README) but right offhand it seems like it
> > ought to be safe enough for numeric.  Now, it wouldn't be quite as fast
> > as if we somehow downconverted numeric to integer beforehand, but at
> > least you'd only be talking about a slow comparison operator and not a
> > fundamentally stupider plan.  That's close enough for me, for what is
> > in the end a stupidly written query.
> >
> > Of course, the above is still not exactly a small project, since you'd
> > be talking about something like 36 new operators to cover all of int2,
> > int4, int8.  But it's a straightforward extension.
>
> Interesting. Worth a TODO?

Since we are discussing int2 casting, I wanted to bring up this other
casting issue from 2011, in case it helps the discussion.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +