regression between 8.4.8 and 8.4.2?

Lists: pgsql-general
From: Ben Chobot <bench(at)silentmedia(dot)com>
To: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: regression between 8.4.8 and 8.4.2?
Date: 2011-08-30 22:24:38
Message-ID: 760C0206-B5F4-4DC6-9296-B7A730B7F403@silentmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

We recently took a copy of our production data (running on 8.4.2), scrubbed many data fields, and then loaded it onto a qa server (running 8.4.8). We're seeing some odd planner performance that I think might be a bug, though I'm hoping it's just idiocy on my part. I've analyzed things and looked into pg_stats and it seems as if the relevant columns have about the same statistics.

I've managed to simplify the query, but if I make it any simpler, then the two servers end up with the same good plan. The query is down to:

SELECT machines.quota_purchased
FROM machines
WHERE NOT deleted AND machines.user_id IN (
SELECT id FROM users WHERE user_group_id IN (
select 607547 offset 0
) OFFSET 0
);

(Those "offset 0" are in there to protect us from planner regressions we saw when moving to 8.4. When we move to 9, they can hopefully go away.)

On the production server, this returns a fairly accurate plan:

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=843.59..1447.90 rows=243 width=8) (actual time=0.044..0.045 rows=1 loops=1)
-> HashAggregate (cost=843.59..845.59 rows=200 width=4) (actual time=0.027..0.027 rows=1 loops=1)
-> Limit (cost=0.02..823.90 rows=1575 width=4) (actual time=0.024..0.025 rows=1 loops=1)
-> Nested Loop (cost=0.02..823.90 rows=1575 width=4) (actual time=0.023..0.024 rows=1 loops=1)
-> HashAggregate (cost=0.02..0.03 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1)
-> Limit (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.002 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.000 rows=1 loops=1)
-> Index Scan using users_user_groups_idx on users (cost=0.00..804.18 rows=1575 width=8) (actual time=0.017..0.018 rows=1 loops=1)
Index Cond: (users.user_group_id = (607547))
-> Index Scan using machines_sid_un on machines (cost=0.00..3.00 rows=1 width=12) (actual time=0.015..0.015 rows=1 loops=1)
Index Cond: (machines.user_id = users.id)
Total runtime: 0.121 ms
(12 rows)

On the QA server, things are not so accurate. It doesn't hurt the timing of this simplified query much, but when put into the actual query, the row estimation being off by 6 orders of magnitude really throws the planning in the wrong direction. The plan on the QA server is:

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1887.16..3671.20 rows=1192462 width=8) (actual time=0.049..0.051 rows=1 loops=1)
-> HashAggregate (cost=1887.16..1889.16 rows=200 width=4) (actual time=0.032..0.033 rows=1 loops=1)
-> Limit (cost=0.02..1868.20 rows=1517 width=4) (actual time=0.027..0.029 rows=1 loops=1)
-> Nested Loop (cost=0.02..1868.20 rows=1517 width=4) (actual time=0.027..0.028 rows=1 loops=1)
-> HashAggregate (cost=0.02..0.03 rows=1 width=4) (actual time=0.008..0.008 rows=1 loops=1)
-> Limit (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
-> Index Scan using users_user_groups_idx on users (cost=0.00..1849.20 rows=1517 width=8) (actual time=0.015..0.016 rows=1 loops=1)
Index Cond: (users.user_group_id = (607547))
-> Index Scan using machines_sid_un on machines (cost=0.00..8.90 rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
Index Cond: (machines.user_id = users.id)
Total runtime: 0.148 ms
(12 rows)

The problem here (I think) seems to be that the QA server believes that running a nested loop over 200 users.id values and joining that against machines.user_id will result in >1M rows. The production servers sees this more accurately as the nearly 1:1 relationship that it is.

The reason I wonder if this might be a bug is because if I change the obtuse clause "WHERE user_group_id IN (select 607547 offset 0)" to simply "where user_group_id in (607547)" then the plan collapses to the same plan on both servers:

explain analyze SELECT machines.quota_purchased
FROM machines
WHERE NOT deleted AND machines.user_id IN (
SELECT id FROM users WHERE user_group_id IN (
607547
) OFFSET 0
);

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=82.27..636.70 rows=62 width=8) (actual time=0.042..0.043 rows=1 loops=1)
-> HashAggregate (cost=82.27..82.88 rows=61 width=4) (actual time=0.024..0.025 rows=1 loops=1)
-> Limit (cost=0.00..81.51 rows=61 width=4) (actual time=0.017..0.018 rows=1 loops=1)
-> Index Scan using users_user_groups_idx on users (cost=0.00..81.51 rows=61 width=4) (actual time=0.015..0.016 rows=1 loops=1)
Index Cond: (user_group_id = 607547)
-> Index Scan using machines_sid_un on machines (cost=0.00..9.07 rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
Index Cond: (machines.user_id = users.id)
Total runtime: 0.106 ms
(8 rows)

But, as I understand it, that remaining OFFSET 0 should keep the planner from caring that it's a single value or the result of a subselect.

Anyway, anything I can check? The hardware is different so the configuration files are different, but they're not that different in anything other than effective_cache_size. Certainly not different in any way I can think that would affect this kind of planning mis-estimation.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ben Chobot <bench(at)silentmedia(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: regression between 8.4.8 and 8.4.2?
Date: 2011-08-31 07:42:54
Message-ID: 1314776574.27073.1.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I don't have an answer for you, but this report looks suspiciously
similar to the one I posted the other day at
<http://archives.postgresql.org/pgsql-hackers/2011-08/msg01224.php>,
which, now that I think about it, also manifested itself after the
upgrade to 8.4.8.

On tis, 2011-08-30 at 15:24 -0700, Ben Chobot wrote:
> We recently took a copy of our production data (running on 8.4.2),
> scrubbed many data fields, and then loaded it onto a qa server
> (running 8.4.8). We're seeing some odd planner performance that I
> think might be a bug, though I'm hoping it's just idiocy on my part.
> I've analyzed things and looked into pg_stats and it seems as if the
> relevant columns have about the same statistics.
>
>
> I've managed to simplify the query, but if I make it any simpler, then
> the two servers end up with the same good plan. The query is down to:
>
>
> SELECT machines.quota_purchased
> FROM machines
> WHERE NOT deleted AND machines.user_id IN (
> SELECT id FROM users WHERE user_group_id IN (
> select 607547 offset 0
> ) OFFSET 0
> );
>
>
>
>
> (Those "offset 0" are in there to protect us from planner regressions
> we saw when moving to 8.4. When we move to 9, they can hopefully go
> away.)
>
>
> On the production server, this returns a fairly accurate plan:
>
>
>
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=843.59..1447.90 rows=243 width=8) (actual
> time=0.044..0.045 rows=1 loops=1)
> -> HashAggregate (cost=843.59..845.59 rows=200 width=4) (actual
> time=0.027..0.027 rows=1 loops=1)
> -> Limit (cost=0.02..823.90 rows=1575 width=4) (actual
> time=0.024..0.025 rows=1 loops=1)
> -> Nested Loop (cost=0.02..823.90 rows=1575 width=4)
> (actual time=0.023..0.024 rows=1 loops=1)
> -> HashAggregate (cost=0.02..0.03 rows=1
> width=4) (actual time=0.005..0.005 rows=1 loops=1)
> -> Limit (cost=0.00..0.01 rows=1 width=0)
> (actual time=0.001..0.002 rows=1 loops=1)
> -> Result (cost=0.00..0.01 rows=1
> width=0) (actual time=0.000..0.000 rows=1 loops=1)
> -> Index Scan using users_user_groups_idx on
> users (cost=0.00..804.18 rows=1575 width=8) (actual time=0.017..0.018
> rows=1 loops=1)
> Index Cond: (users.user_group_id =
> (607547))
> -> Index Scan using machines_sid_un on machines (cost=0.00..3.00
> rows=1 width=12) (actual time=0.015..0.015 rows=1 loops=1)
> Index Cond: (machines.user_id = users.id)
> Total runtime: 0.121 ms
> (12 rows)
>
>
>
>
> On the QA server, things are not so accurate. It doesn't hurt the
> timing of this simplified query much, but when put into the actual
> query, the row estimation being off by 6 orders of magnitude really
> throws the planning in the wrong direction. The plan on the QA server
> is:
>
>
>
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=1887.16..3671.20 rows=1192462 width=8) (actual
> time=0.049..0.051 rows=1 loops=1)
> -> HashAggregate (cost=1887.16..1889.16 rows=200 width=4) (actual
> time=0.032..0.033 rows=1 loops=1)
> -> Limit (cost=0.02..1868.20 rows=1517 width=4) (actual
> time=0.027..0.029 rows=1 loops=1)
> -> Nested Loop (cost=0.02..1868.20 rows=1517 width=4)
> (actual time=0.027..0.028 rows=1 loops=1)
> -> HashAggregate (cost=0.02..0.03 rows=1
> width=4) (actual time=0.008..0.008 rows=1 loops=1)
> -> Limit (cost=0.00..0.01 rows=1 width=0)
> (actual time=0.001..0.001 rows=1 loops=1)
> -> Result (cost=0.00..0.01 rows=1
> width=0) (actual time=0.001..0.001 rows=1 loops=1)
> -> Index Scan using users_user_groups_idx on
> users (cost=0.00..1849.20 rows=1517 width=8) (actual
> time=0.015..0.016 rows=1 loops=1)
> Index Cond: (users.user_group_id =
> (607547))
> -> Index Scan using machines_sid_un on machines (cost=0.00..8.90
> rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
> Index Cond: (machines.user_id = users.id)
> Total runtime: 0.148 ms
> (12 rows)
>
>
>
>
>
>
> The problem here (I think) seems to be that the QA server believes
> that running a nested loop over 200 users.id values and joining that
> against machines.user_id will result in >1M rows. The production
> servers sees this more accurately as the nearly 1:1 relationship that
> it is.
>
>
> The reason I wonder if this might be a bug is because if I change the
> obtuse clause "WHERE user_group_id IN (select 607547 offset 0)" to
> simply "where user_group_id in (607547)" then the plan collapses to
> the same plan on both servers:
>
>
> explain analyze SELECT machines.quota_purchased
> FROM machines
> WHERE NOT deleted AND machines.user_id IN (
> SELECT id FROM users WHERE user_group_id IN (
> 607547
> ) OFFSET 0
> );
>
>
>
>
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------------------------
> Nested Loop (cost=82.27..636.70 rows=62 width=8) (actual
> time=0.042..0.043 rows=1 loops=1)
> -> HashAggregate (cost=82.27..82.88 rows=61 width=4) (actual
> time=0.024..0.025 rows=1 loops=1)
> -> Limit (cost=0.00..81.51 rows=61 width=4) (actual
> time=0.017..0.018 rows=1 loops=1)
> -> Index Scan using users_user_groups_idx on users
> (cost=0.00..81.51 rows=61 width=4) (actual time=0.015..0.016 rows=1
> loops=1)
> Index Cond: (user_group_id = 607547)
> -> Index Scan using machines_sid_un on machines (cost=0.00..9.07
> rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
> Index Cond: (machines.user_id = users.id)
> Total runtime: 0.106 ms
> (8 rows)
>
>
>
>
> But, as I understand it, that remaining OFFSET 0 should keep the
> planner from caring that it's a single value or the result of a
> subselect.
>
>
>
>
> Anyway, anything I can check? The hardware is different so the
> configuration files are different, but they're not that different in
> anything other than effective_cache_size. Certainly not different in
> any way I can think that would affect this kind of planning
> mis-estimation.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ben Chobot <bench(at)silentmedia(dot)com>
Cc: pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: regression between 8.4.8 and 8.4.2?
Date: 2011-08-31 07:53:24
Message-ID: 1314777204.27073.2.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On ons, 2011-08-31 at 10:42 +0300, Peter Eisentraut wrote:
> I don't have an answer for you, but this report looks suspiciously
> similar to the one I posted the other day at
> <http://archives.postgresql.org/pgsql-hackers/2011-08/msg01224.php>,
> which, now that I think about it, also manifested itself after the
> upgrade to 8.4.8.

See this thread:
http://archives.postgresql.org/pgsql-performance/2011-08/msg00248.php

It looks like there are a number of users affected by this.

>
> On tis, 2011-08-30 at 15:24 -0700, Ben Chobot wrote:
> > We recently took a copy of our production data (running on 8.4.2),
> > scrubbed many data fields, and then loaded it onto a qa server
> > (running 8.4.8). We're seeing some odd planner performance that I
> > think might be a bug, though I'm hoping it's just idiocy on my part.
> > I've analyzed things and looked into pg_stats and it seems as if the
> > relevant columns have about the same statistics.
> >
> >
> > I've managed to simplify the query, but if I make it any simpler, then
> > the two servers end up with the same good plan. The query is down to:
> >
> >
> > SELECT machines.quota_purchased
> > FROM machines
> > WHERE NOT deleted AND machines.user_id IN (
> > SELECT id FROM users WHERE user_group_id IN (
> > select 607547 offset 0
> > ) OFFSET 0
> > );
> >
> >
> >
> >
> > (Those "offset 0" are in there to protect us from planner regressions
> > we saw when moving to 8.4. When we move to 9, they can hopefully go
> > away.)
> >
> >
> > On the production server, this returns a fairly accurate plan:
> >
> >
> >
> > QUERY PLAN
> >
> > ----------------------------------------------------------------------------------------------------------------------------------------------------------
> > Nested Loop (cost=843.59..1447.90 rows=243 width=8) (actual
> > time=0.044..0.045 rows=1 loops=1)
> > -> HashAggregate (cost=843.59..845.59 rows=200 width=4) (actual
> > time=0.027..0.027 rows=1 loops=1)
> > -> Limit (cost=0.02..823.90 rows=1575 width=4) (actual
> > time=0.024..0.025 rows=1 loops=1)
> > -> Nested Loop (cost=0.02..823.90 rows=1575 width=4)
> > (actual time=0.023..0.024 rows=1 loops=1)
> > -> HashAggregate (cost=0.02..0.03 rows=1
> > width=4) (actual time=0.005..0.005 rows=1 loops=1)
> > -> Limit (cost=0.00..0.01 rows=1 width=0)
> > (actual time=0.001..0.002 rows=1 loops=1)
> > -> Result (cost=0.00..0.01 rows=1
> > width=0) (actual time=0.000..0.000 rows=1 loops=1)
> > -> Index Scan using users_user_groups_idx on
> > users (cost=0.00..804.18 rows=1575 width=8) (actual time=0.017..0.018
> > rows=1 loops=1)
> > Index Cond: (users.user_group_id =
> > (607547))
> > -> Index Scan using machines_sid_un on machines (cost=0.00..3.00
> > rows=1 width=12) (actual time=0.015..0.015 rows=1 loops=1)
> > Index Cond: (machines.user_id = users.id)
> > Total runtime: 0.121 ms
> > (12 rows)
> >
> >
> >
> >
> > On the QA server, things are not so accurate. It doesn't hurt the
> > timing of this simplified query much, but when put into the actual
> > query, the row estimation being off by 6 orders of magnitude really
> > throws the planning in the wrong direction. The plan on the QA server
> > is:
> >
> >
> >
> > QUERY PLAN
> >
> > -----------------------------------------------------------------------------------------------------------------------------------------------------------
> > Nested Loop (cost=1887.16..3671.20 rows=1192462 width=8) (actual
> > time=0.049..0.051 rows=1 loops=1)
> > -> HashAggregate (cost=1887.16..1889.16 rows=200 width=4) (actual
> > time=0.032..0.033 rows=1 loops=1)
> > -> Limit (cost=0.02..1868.20 rows=1517 width=4) (actual
> > time=0.027..0.029 rows=1 loops=1)
> > -> Nested Loop (cost=0.02..1868.20 rows=1517 width=4)
> > (actual time=0.027..0.028 rows=1 loops=1)
> > -> HashAggregate (cost=0.02..0.03 rows=1
> > width=4) (actual time=0.008..0.008 rows=1 loops=1)
> > -> Limit (cost=0.00..0.01 rows=1 width=0)
> > (actual time=0.001..0.001 rows=1 loops=1)
> > -> Result (cost=0.00..0.01 rows=1
> > width=0) (actual time=0.001..0.001 rows=1 loops=1)
> > -> Index Scan using users_user_groups_idx on
> > users (cost=0.00..1849.20 rows=1517 width=8) (actual
> > time=0.015..0.016 rows=1 loops=1)
> > Index Cond: (users.user_group_id =
> > (607547))
> > -> Index Scan using machines_sid_un on machines (cost=0.00..8.90
> > rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
> > Index Cond: (machines.user_id = users.id)
> > Total runtime: 0.148 ms
> > (12 rows)
> >
> >
> >
> >
> >
> >
> > The problem here (I think) seems to be that the QA server believes
> > that running a nested loop over 200 users.id values and joining that
> > against machines.user_id will result in >1M rows. The production
> > servers sees this more accurately as the nearly 1:1 relationship that
> > it is.
> >
> >
> > The reason I wonder if this might be a bug is because if I change the
> > obtuse clause "WHERE user_group_id IN (select 607547 offset 0)" to
> > simply "where user_group_id in (607547)" then the plan collapses to
> > the same plan on both servers:
> >
> >
> > explain analyze SELECT machines.quota_purchased
> > FROM machines
> > WHERE NOT deleted AND machines.user_id IN (
> > SELECT id FROM users WHERE user_group_id IN (
> > 607547
> > ) OFFSET 0
> > );
> >
> >
> >
> >
> >
> > QUERY PLAN
> >
> > -------------------------------------------------------------------------------------------------------------------------------------------------
> > Nested Loop (cost=82.27..636.70 rows=62 width=8) (actual
> > time=0.042..0.043 rows=1 loops=1)
> > -> HashAggregate (cost=82.27..82.88 rows=61 width=4) (actual
> > time=0.024..0.025 rows=1 loops=1)
> > -> Limit (cost=0.00..81.51 rows=61 width=4) (actual
> > time=0.017..0.018 rows=1 loops=1)
> > -> Index Scan using users_user_groups_idx on users
> > (cost=0.00..81.51 rows=61 width=4) (actual time=0.015..0.016 rows=1
> > loops=1)
> > Index Cond: (user_group_id = 607547)
> > -> Index Scan using machines_sid_un on machines (cost=0.00..9.07
> > rows=1 width=12) (actual time=0.013..0.013 rows=1 loops=1)
> > Index Cond: (machines.user_id = users.id)
> > Total runtime: 0.106 ms
> > (8 rows)
> >
> >
> >
> >
> > But, as I understand it, that remaining OFFSET 0 should keep the
> > planner from caring that it's a single value or the result of a
> > subselect.
> >
> >
> >
> >
> > Anyway, anything I can check? The hardware is different so the
> > configuration files are different, but they're not that different in
> > anything other than effective_cache_size. Certainly not different in
> > any way I can think that would affect this kind of planning
> > mis-estimation.
>
>
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Ben Chobot <bench(at)silentmedia(dot)com>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: regression between 8.4.8 and 8.4.2?
Date: 2011-08-31 17:47:11
Message-ID: 21181.1314812831@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> I don't have an answer for you, but this report looks suspiciously
> similar to the one I posted the other day at
> <http://archives.postgresql.org/pgsql-hackers/2011-08/msg01224.php>,
> which, now that I think about it, also manifested itself after the
> upgrade to 8.4.8.

I think there is more than one thing going on here. I've identified a
logic error in this 8.4 change:
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
which is that it is relying on vardata[12]->rel->rows to provide the
number of rows coming in to the semijoin, but that's only accurate for a
single level of join. With two nested semijoins you get a pretty wacko
answer --- unless you prevent them from being folded by inserting OFFSET
0. So that's definitely a bug, but it doesn't explain Ben's complaint
because he's griping about a case where he did have OFFSET 0. (I wonder
though if this is the "8.4 planner regression" that he put in the OFFSET
for originally. Sure would be nice if people reported such things
instead of hacking around them and imagining that they'll get fixed
magically.) It also doesn't explain Mark Kirkwood's complaint, since
he's showing test cases that involve only one join.

The only relevant-looking change I can find between 8.4.7 and 8.4.8 is
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=0ae8b300388c2a3eaf90e6e6f13d6be1f4d4ac2d
which again should not have caused the amount of excitement we're seeing
on this thread, since it should represent no worse than a reversion to
8.3 behavior. Possibly what we're after is some earlier 8.4.x patch.

regards, tom lane


From: Ben Chobot <bench(at)silentmedia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: regression between 8.4.8 and 8.4.2?
Date: 2011-08-31 17:56:28
Message-ID: ACAB6730-B89D-4A0A-A776-B2DF78AB7684@silentmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Aug 31, 2011, at 10:47 AM, Tom Lane wrote:

> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> I don't have an answer for you, but this report looks suspiciously
>> similar to the one I posted the other day at
>> <http://archives.postgresql.org/pgsql-hackers/2011-08/msg01224.php>,
>> which, now that I think about it, also manifested itself after the
>> upgrade to 8.4.8.
>
> I think there is more than one thing going on here. I've identified a
> logic error in this 8.4 change:
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=7f3eba30
> which is that it is relying on vardata[12]->rel->rows to provide the
> number of rows coming in to the semijoin, but that's only accurate for a
> single level of join. With two nested semijoins you get a pretty wacko
> answer --- unless you prevent them from being folded by inserting OFFSET
> 0. So that's definitely a bug, but it doesn't explain Ben's complaint
> because he's griping about a case where he did have OFFSET 0. (I wonder
> though if this is the "8.4 planner regression" that he put in the OFFSET
> for originally. Sure would be nice if people reported such things
> instead of hacking around them and imagining that they'll get fixed
> magically.) It also doesn't explain Mark Kirkwood's complaint, since
> he's showing test cases that involve only one join.
>
> The only relevant-looking change I can find between 8.4.7 and 8.4.8 is
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=0ae8b300388c2a3eaf90e6e6f13d6be1f4d4ac2d
> which again should not have caused the amount of excitement we're seeing
> on this thread, since it should represent no worse than a reversion to
> 8.3 behavior. Possibly what we're after is some earlier 8.4.x patch.

Tom, if there's anything else we can provide that might you out, let me know. We're currently about to install an earlier 8.4 version to see if the problem goes away. Is there a particular version you'd be interested to know about?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ben Chobot <bench(at)silentmedia(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: regression between 8.4.8 and 8.4.2?
Date: 2011-08-31 18:10:51
Message-ID: 21693.1314814251@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Ben Chobot <bench(at)silentmedia(dot)com> writes:
> Tom, if there's anything else we can provide that might you out, let me know.

If you could extract a self-contained test case for the bad estimation,
that would be useful.

regards, tom lane


From: Ben Chobot <bench(at)silentmedia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: regression between 8.4.8 and 8.4.2?
Date: 2011-08-31 18:53:39
Message-ID: B98B268E-E59E-4992-B69C-D43CB010BCBE@silentmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Aug 31, 2011, at 11:10 AM, Tom Lane wrote:

> Ben Chobot <bench(at)silentmedia(dot)com> writes:
>> Tom, if there's anything else we can provide that might you out, let me know.
>
> If you could extract a self-contained test case for the bad estimation,
> that would be useful.

OK, we'll pull something together. In the meantime, we can confirm that reverting from 8.4.8 to 8.4.3 fixes things.


From: Ben Chobot <bench(at)silentmedia(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-general General <pgsql-general(at)postgresql(dot)org>
Subject: Re: regression between 8.4.8 and 8.4.2?
Date: 2011-08-31 22:49:18
Message-ID: 0DA23DFF-FA67-41C8-83EF-D9901AE06141@silentmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Aug 31, 2011, at 11:53 AM, Ben Chobot wrote:

> On Aug 31, 2011, at 11:10 AM, Tom Lane wrote:
>
>> Ben Chobot <bench(at)silentmedia(dot)com> writes:
>>> Tom, if there's anything else we can provide that might you out, let me know.
>>
>> If you could extract a self-contained test case for the bad estimation,
>> that would be useful.
>
> OK, we'll pull something together. In the meantime, we can confirm that reverting from 8.4.8 to 8.4.3 fixes things.

.... and FWIW 9.0.3 does the right thing as well.