Quick Links

Re: Queries joining views

Lists:	pgsql-generalpgsql-hackers

From:	DelGurth <delgurth(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-21 20:57:52
Message-ID:	10268b3e0608211357n4f0e8b99u689803c079af25d7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

>
> Alban Hertroys <alban ( at ) magproductions ( dot ) nl> writes:
> > Is there a trick to make this work a bit faster?
>
> Have you really shown us the right queries for those explain results?
> I don't see where the second plan is testing "dir <> 1" at all.
> It looks like the first one is faster because it's using a partial
> index that has predicate dir <> 1, while the second one is using
> a much larger full index. But I don't see where the second plan
> is applying that restriction, so I wonder if you forgot it in the
> query.
>
> regards, tom lane

He has really shown the right queries. But I see the table definition if
mm_insrel_table (including the indexes) is not in the e-mail, so you don't
see why the dir <> 1 is not in the query plan. Here is the table definition,
with the indexes. As you can see we tried some indexes, to see if we could
get the queries on the views to become faster.

zorgweb_solaris=> \d mm_insrel_table
Table "public.mm_insrel_table"
Column | Type | Modifiers
---------+---------+-----------
number | integer | not null
snumber | integer | not null
dnumber | integer | not null
rnumber | integer | not null
dir | integer |
Indexes:
"mm_insrel_table_pkey" PRIMARY KEY, btree (number)
"mm_insrel_dir_not_one_idx" btree (dnumber, snumber) WHERE dir <> 1
"mm_insrel_dnumber_dir_not_one_idx" btree (dnumber) WHERE dir <> 1
"mm_insrel_full_idx" btree (snumber, dnumber, rnumber, dir) WHERE dir <>
1
"mm_insrel_relation_idx" btree (snumber, dnumber, rnumber)
Foreign-key constraints:
"mm_insrel_table_dnumber_fkey" FOREIGN KEY (dnumber) REFERENCES
mm_object(number)
"mm_insrel_table_rnumber_fkey" FOREIGN KEY (rnumber) REFERENCES
mm_object(number)
"mm_insrel_table_snumber_fkey" FOREIGN KEY (snumber) REFERENCES
mm_object(number)

I hope this explains you why the dir <> 1 is not in the view query. Why the
other query plan thinks it needs to recheck the condition is not clear to
me, but I'm not an expert on PostgreSQL query plans.

Regards,
Wessel van Norel

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	DelGurth <delgurth(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-21 21:32:33
Message-ID:	20119.1156195953@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

DelGurth <delgurth(at)gmail(dot)com> writes:
> As you can see we tried some indexes, to see if we could
> get the queries on the views to become faster.

> Indexes:
> "mm_insrel_table_pkey" PRIMARY KEY, btree (number)
> "mm_insrel_dir_not_one_idx" btree (dnumber, snumber) WHERE dir <> 1
> "mm_insrel_dnumber_dir_not_one_idx" btree (dnumber) WHERE dir <> 1
> "mm_insrel_full_idx" btree (snumber, dnumber, rnumber, dir) WHERE dir <>
> 1
> "mm_insrel_relation_idx" btree (snumber, dnumber, rnumber)

Hmph ... it certainly appears to be choosing the wrong index in the
second case. I wonder why --- can you show the relpages and reltuples
stats from pg_class for these indexes?

It might be interesting also to examine the output of just

explain select * from mm_insrel_table where dnumber=558332 and dir<>1

with different subsets of these indexes in place. I'd like to see what
it's deriving as the cost estimates for these indexes. If you can get
an EXPLAIN selecting each one of the indexes, that would help diagnose
what's going on.

Dunno if you know this trick already, but it's possible to experiment
with different index subsets without physically dropping and recreating
the indexes. Try

begin;
drop [unwanted indexes]
explain ...
rollback;

This will hold exclusive lock on the table until you rollback, so if
it's a production database you want to be quick about it --- maybe put
the whole thing in a SQL script. But it sure beats rebuilding indexes.

BTW, what PG version is this exactly?

regards, tom lane

From:	DelGurth <delgurth(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Cc:	alban(at)magproductions(dot)nl
Subject:	Re: Queries joining views
Date:	2006-08-21 22:23:36
Message-ID:	10268b3e0608211523x24cb472er737af03657785113@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Hmph ... it certainly appears to be choosing the wrong index in the
> second case. I wonder why --- can you show the relpages and reltuples
> stats from pg_class for these indexes?

I'm personally not aware how to do that, perhaps Alban will (tell me how to)
do that tomorrow.

It might be interesting also to examine the output of just
>
> explain select * from mm_insrel_table where dnumber=558332 and dir<>1
>
> with different subsets of these indexes in place. I'd like to see what
> it's deriving as the cost estimates for these indexes. If you can get
> an EXPLAIN selecting each one of the indexes, that would help diagnose
> what's going on.

Ok. Did that (with your trick, thanks!). The output is attached to
this e-mail. The script I used to drop the indexes dropped them in the
order it was using them
(partially by accident, partially because I assumed it would work in that
order).

I'm not sure if you want to see more permutations, if so please tell me.

BTW, what PG version is this exactly?

Our PG version is the version downloadable from
http://www.sunfreeware.com/programlistsparc10.html#postgresql, so 8.0.1 for
solaris sparc.

(it's a sun T2000 test machine).

regards, tom lane
>

Regards,
Wessel van Norel

Attachment	Content-Type	Size
index_differences.txt	text/plain	3.3 KB

From:	DelGurth <delgurth(at)gmail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-21 22:37:55
Message-ID:	10268b3e0608211537r7df3cd5cyc60629187d438a35@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

>
>
> BTW, what PG version is this exactly?
>
>
> Our PG version is the version downloadable from http://www.sunfreeware.com/programlistsparc10.html#postgresql
> , so 8.0.1 for solaris sparc.
>

Sorry I was wrong on this point, it's 8.1.4

-bash-3.00$ pg_config --version
PostgreSQL 8.1.4

And it's the version from blastwave.org:
http://www.blastwave.org/packages.php/postgresql

Regards,
Wessel van Norel

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	DelGurth <delgurth(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org, alban(at)magproductions(dot)nl
Subject:	Re: Queries joining views
Date:	2006-08-22 00:09:38
Message-ID:	26781.1156205378@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

DelGurth <delgurth(at)gmail(dot)com> writes:
> On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It might be interesting also to examine the output of just
>> explain select * from mm_insrel_table where dnumber=558332 and dir<>1
>> with different subsets of these indexes in place.

> Ok. Did that (with your trick, thanks!). The output is attached to
> this e-mail.

Well, no smoking gun there, it clearly knows that mm_insrel_full_idx
is much more expensive for this query than the other two...

Looking back at Alban's original post, I finally see what the planner
is up to:

-> Merge Join (cost=0.00..165.07 rows=1 width=28) (actual time=53.890..129.310 rows=1 loops=1)
Merge Cond: ("outer".number = "inner".number)
-> Nested Loop (cost=0.00..2796.82 rows=30 width=28) (actual time=44.088..117.487 rows=2 loops=1)
-> Nested Loop (cost=0.00..2682.38 rows=30 width=24) (actual time=44.034..117.375 rows=2 loops=1)
-> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2512.97 rows=30 width=20) (actual time=43.975..117.246 rows=2 loops=1)
Index Cond: (dnumber = 558332)

The reason it's choosing this indexscan is that that will give it data
sorted by mm_insrel_table.number, which it can feed into the mergejoin
without an extra sort step. Now sorting 30 rows is not going to take
nearly as much time as the indexscan eats up, so this still doesn't
make sense --- until you notice that it's estimating the top merge join
at considerably less than the cost of its inputs (165.07, vss 2796.82
just for this input). That means it thinks it won't have to run the
inputs to completion in order to finish the mergejoin, and so it's
picking a sub-plan that has zero start cost.

What this means is that the planner thinks the range of "number" values
in mm_product_table (the other side of the mergejoin) is much less than
the range in mm_insrel_table. Is that the case? Perhaps your ANALYZE
stats for these tables are out of date. If not I'd like to see the
pg_stats entries for the two "number" columns.

regards, tom lane

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 09:50:26
Message-ID:	44EAD362.8050501@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> DelGurth <delgurth(at)gmail(dot)com> writes:
>> As you can see we tried some indexes, to see if we could
>> get the queries on the views to become faster.
>
>> Indexes:
>> "mm_insrel_table_pkey" PRIMARY KEY, btree (number)
>> "mm_insrel_dir_not_one_idx" btree (dnumber, snumber) WHERE dir <> 1
>> "mm_insrel_dnumber_dir_not_one_idx" btree (dnumber) WHERE dir <> 1
>> "mm_insrel_full_idx" btree (snumber, dnumber, rnumber, dir) WHERE dir <>
>> 1
>> "mm_insrel_relation_idx" btree (snumber, dnumber, rnumber)
>
> Hmph ... it certainly appears to be choosing the wrong index in the
> second case. I wonder why --- can you show the relpages and reltuples
> stats from pg_class for these indexes?

Here they are:

relname | relpages | reltuples
-----------------------------------+----------+-----------
mm_insrel_dir_not_one_idx | 899 | 323628
mm_insrel_dnumber_dir_not_one_idx | 899 | 323628
mm_insrel_table_pkey | 1237 | 323628
mm_insrel_relation_idx | 1849 | 323628
mm_insrel_full_idx | 1260 | 323628

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 10:12:41
Message-ID:	44EAD899.3050906@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> DelGurth <delgurth(at)gmail(dot)com> writes:
>> On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> It might be interesting also to examine the output of just
>>> explain select * from mm_insrel_table where dnumber=558332 and dir<>1
>>> with different subsets of these indexes in place.
>
>> Ok. Did that (with your trick, thanks!). The output is attached to
>> this e-mail.
>
> Well, no smoking gun there, it clearly knows that mm_insrel_full_idx
> is much more expensive for this query than the other two...
>
> Looking back at Alban's original post, I finally see what the planner
> is up to:
>
> -> Merge Join (cost=0.00..165.07 rows=1 width=28) (actual time=53.890..129.310 rows=1 loops=1)
> Merge Cond: ("outer".number = "inner".number)
> -> Nested Loop (cost=0.00..2796.82 rows=30 width=28) (actual time=44.088..117.487 rows=2 loops=1)
> -> Nested Loop (cost=0.00..2682.38 rows=30 width=24) (actual time=44.034..117.375 rows=2 loops=1)
> -> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2512.97 rows=30 width=20) (actual time=43.975..117.246 rows=2 loops=1)
> Index Cond: (dnumber = 558332)
>
> The reason it's choosing this indexscan is that that will give it data
> sorted by mm_insrel_table.number, which it can feed into the mergejoin
> without an extra sort step. Now sorting 30 rows is not going to take
> nearly as much time as the indexscan eats up, so this still doesn't
> make sense --- until you notice that it's estimating the top merge join
> at considerably less than the cost of its inputs (165.07, vss 2796.82
> just for this input). That means it thinks it won't have to run the
> inputs to completion in order to finish the mergejoin, and so it's
> picking a sub-plan that has zero start cost.

I see. Obviously that's not right for our case, so we'll have to figure
out why it thinks that.

> What this means is that the planner thinks the range of "number" values
> in mm_product_table (the other side of the mergejoin) is much less than
> the range in mm_insrel_table. Is that the case? Perhaps your ANALYZE

Very much so. The mm_product_table only contains about 1500 products,
while the mm_insrel_table contains a record for every relation in the
application (This is MMBase; it handles all relations with a relation
table). That's about 330,000 records.

> stats for these tables are out of date. If not I'd like to see the
> pg_stats entries for the two "number" columns.

Especially during optimization sessions like this one we tend to analyse
rather frequently. All these indices were created yesterday (except for
the primary key index) and the corresponding tables were analyzed after
each index creation (aboutish). Data hasn't changed since a while -
we're in the middle of a test migration from mysql[1].

As for the stats, I included the one for mm_object as well, as every
view contains at least a join with that table - thus it contains 1284556
records... I suspect we're in for another few painful surprises there.

zorgweb_solaris=> select * from pg_stats where attname = 'number' and
tablename IN ('mm_insrel_table', 'mm_product_table', 'mm_object');
-[ RECORD 1
]-----+------------------------------------------------------------------------------------
schemaname | public
tablename | mm_product_table
attname | number
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070}
correlation | 0.993398
-[ RECORD 2
]-----+------------------------------------------------------------------------------------
schemaname | public
tablename | mm_insrel_table
attname | number
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds |
{615920,689286,750855,812003,872741,933041,1004672,1068250,1134894,1198559,1261685}
correlation | 0.664637
-[ RECORD 3
]-----+------------------------------------------------------------------------------------
schemaname | public
tablename | mm_object
attname | number
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds |
{287,124412,256534,375896,505810,643940,770327,899229,1028933,1153260,1262338}
correlation | 0.858558

As a side note, I'm in the progress of rewriting MMBase code to use
explicit joins where applicable. AFAIK the planner can handle those better.

> regards, tom lane

[1]: With MySQL(4) there was no way for us to enhance performance any
more. There are a number of tree-like structures in our data model, and
MySQL just lacks the features to cope with that. With PostgreSQL we at
least can make use of the ltree contrib package - quite an improvement
in performance so far. Even though the ltrees are applied on text
columns with text2ltree() conversions in both the queries and the GIST
index.

We also figured we could use some triggers to generate data that could
improve query performance (moving conversions from SELECT-time to
INSERT-time), but unfortunately MMBase's caches are in the way there.

Regards, Alban.
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 10:16:53
Message-ID:	44EAD995.8090705@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Looking back at Alban's original post, I finally see what the planner
> is up to:
>
> -> Merge Join (cost=0.00..165.07 rows=1 width=28) (actual time=53.890..129.310 rows=1 loops=1)
> Merge Cond: ("outer".number = "inner".number)
> -> Nested Loop (cost=0.00..2796.82 rows=30 width=28) (actual time=44.088..117.487 rows=2 loops=1)
> -> Nested Loop (cost=0.00..2682.38 rows=30 width=24) (actual time=44.034..117.375 rows=2 loops=1)
> -> Index Scan using mm_insrel_full_idx on mm_insrel_table (cost=0.00..2512.97 rows=30 width=20) (actual time=43.975..117.246 rows=2 loops=1)
> Index Cond: (dnumber = 558332)
>
> The reason it's choosing this indexscan is that that will give it data
> sorted by mm_insrel_table.number, which it can feed into the mergejoin
> without an extra sort step. Now sorting 30 rows is not going to take
> nearly as much time as the indexscan eats up, so this still doesn't
> make sense --- until you notice that it's estimating the top merge join
> at considerably less than the cost of its inputs (165.07, vss 2796.82
> just for this input). That means it thinks it won't have to run the
> inputs to completion in order to finish the mergejoin, and so it's
> picking a sub-plan that has zero start cost.

I'm thinking that removing the indexes it's erroneously using now could
help performance, as it can no longer use that index. It may however
pick the primary key index (likely), or - if we remove even that one - a
sequential scan... Experimenting will answer that.

Thanks for your answers so far, at least now we know what's going on.

Regards,
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 10:30:13
Message-ID:	44EADCB5.9020401@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alban Hertroys wrote:
> Tom Lane wrote:
> I'm thinking that removing the indexes it's erroneously using now could
> help performance, as it can no longer use that index. It may however
> pick the primary key index (likely), or - if we remove even that one - a
> sequential scan... Experimenting will answer that.

Well, look at the attachhed explain output of:

zorgweb_solaris=> BEGIN;
DROP INDEX mm_insrel_full_idx; DROP INDEX mm_insrel_relation_idx;
EXPLAIN ANALYZE
SELECT
insrel.rnumber,insrel.dnumber,insrel.snumber,insrel.dir,insrel.number,medical_care_container.number,product.number
FROM mm_medical_care_container medical_care_container,
mm_insrel insrel,
mm_product product
WHERE medical_care_container.number=558332
AND (medical_care_container.number=insrel.dnumber
AND product.number=insrel.snumber AND insrel.dir<>1);
ROLLBACK;

Only 13ms as opposed to 130-ish :)

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

Attachment	Content-Type	Size
explain_plan.txt	text/plain	2.6 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 12:56:37
Message-ID:	2072.1156251397@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alban Hertroys <alban(at)magproductions(dot)nl> writes:
> zorgweb_solaris=> select * from pg_stats where attname = 'number' and
> tablename IN ('mm_insrel_table', 'mm_product_table', 'mm_object');

> tablename | mm_product_table
> histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070}

> tablename | mm_insrel_table
> {615920,689286,750855,812003,872741,933041,1004672,1068250,1134894,1198559,1261685}

> tablename | mm_object
> histogram_bounds |
> {287,124412,256534,375896,505810,643940,770327,899229,1028933,1153260,1262338}

OK, so here's our problem: according to those stats, the ranges of
"number" in mm_product_table and mm_insrel_table don't overlap at all.
So the cost model for mergejoin predicts that a mergejoin on "number"
will have to read all of mm_product_table but only the first record from
mm_insrel_table, and given the difference in size of the two tables,
that looks like a pretty good deal.

Given that the plan is not actually very fast, I suppose that the
histogram is not telling the whole truth --- probably there are a few
outlying records in one table or the other causing there to be a more
significant overlap than the planner expects. If so, you can probably
fix it by increasing the statistics target for that table.

regards, tom lane

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 13:08:09
Message-ID:	44EB01B9.6000705@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Alban Hertroys <alban(at)magproductions(dot)nl> writes:
>> zorgweb_solaris=> select * from pg_stats where attname = 'number' and
>> tablename IN ('mm_insrel_table', 'mm_product_table', 'mm_object');
>
>> tablename | mm_product_table
>> histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070}
>
>> tablename | mm_insrel_table
>> {615920,689286,750855,812003,872741,933041,1004672,1068250,1134894,1198559,1261685}
>
>> tablename | mm_object
>> histogram_bounds |
>> {287,124412,256534,375896,505810,643940,770327,899229,1028933,1153260,1262338}
>
> OK, so here's our problem: according to those stats, the ranges of
> "number" in mm_product_table and mm_insrel_table don't overlap at all.

That's correct, the numbers are generated by a global sequence.
Insrel.number can never match a product.number.

However, mm_product.number always matches either mm_insrel.snumber or
mm_insrel.dnumber (source and destination respectively). The other way
around this isn't the case; then snumber and dnumber match number-fields
in other tables (they always do).

> So the cost model for mergejoin predicts that a mergejoin on "number"
> will have to read all of mm_product_table but only the first record from
> mm_insrel_table, and given the difference in size of the two tables,
> that looks like a pretty good deal.
>
> Given that the plan is not actually very fast, I suppose that the
> histogram is not telling the whole truth --- probably there are a few
> outlying records in one table or the other causing there to be a more
> significant overlap than the planner expects. If so, you can probably
> fix it by increasing the statistics target for that table.

That's a bit odd, as the number fields of the different tables are
globally unique by definition.

> regards, tom lane

Regards,
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 13:21:46
Message-ID:	2446.1156252906@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alban Hertroys <alban(at)magproductions(dot)nl> writes:
> However, mm_product.number always matches either mm_insrel.snumber or
> mm_insrel.dnumber (source and destination respectively). The other way
> around this isn't the case; then snumber and dnumber match number-fields
> in other tables (they always do).

Oh, then we are looking at the wrong things: we should be comparing the
histograms of the fields that are being used as the join keys in this
query. I had thought they were both "number", but I must be confused.

regards, tom lane

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 13:59:12
Message-ID:	44EB0DB0.1060602@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Alban Hertroys <alban(at)magproductions(dot)nl> writes:
>> However, mm_product.number always matches either mm_insrel.snumber or
>> mm_insrel.dnumber (source and destination respectively). The other way
>> around this isn't the case; then snumber and dnumber match number-fields
>> in other tables (they always do).
>
> Oh, then we are looking at the wrong things: we should be comparing the
> histograms of the fields that are being used as the join keys in this
> query. I had thought they were both "number", but I must be confused.

The design is certainly a bit confusing until you get used to it. It
usually takes new devs here a while to find their way around MMBase
(www.mmbase.org) and its peculiarities. It doesn't help that the
documentation is in rather bad English.

> regards, tom lane

So this is what we're looking for, right? I can't say I understand how
to interpret this, let alone come to conclusions. I'm afraid I totally
depend on your interpretation here...

zorgweb_solaris=> select * from pg_stats where (attname in ('snumber',
'dnumber') and tablename = 'mm_insrel_table') or (attname = 'number' and
tablename = 'mm_product_table');
-[ RECORD 1
]-----+-----------------------------------------------------------------------------------------------------------
schemaname | public
tablename | mm_product_table
attname | number
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070}
correlation | 0.993398
-[ RECORD 2
]-----+-----------------------------------------------------------------------------------------------------------
schemaname | public
tablename | mm_insrel_table
attname | snumber
null_frac | 0
avg_width | 4
n_distinct | 14336
most_common_vals | {4300,5210,5366,2994,3724,4118,2982,3058,3072,3460}
most_common_freqs |
{0.00266667,0.002,0.002,0.00166667,0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333}
histogram_bounds |
{135,3768,4780,14822,57048,92958,125442,158954,433002,502836,610034}
correlation | 0.083602
-[ RECORD 3
]-----+-----------------------------------------------------------------------------------------------------------
schemaname | public
tablename | mm_insrel_table
attname | dnumber
null_frac | 0
avg_width | 4
n_distinct | 11028
most_common_vals | {1117583,279,415,291,343,389,635,839,1043,319}
most_common_freqs |
{0.00433333,0.00333333,0.003,0.00266667,0.00266667,0.00266667,0.00266667,0.00266667,0.00266667,0.00233333}
histogram_bounds |
{147,717,3770,263126,327054,429524,461026,490094,518872,544098,1117603}
correlation | 0.0571927

Regards,
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 14:48:07
Message-ID:	3222.1156258087@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alban Hertroys <alban(at)magproductions(dot)nl> writes:
> tablename | mm_product_table
> attname | number
> histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070}

> tablename | mm_insrel_table
> attname | snumber
> histogram_bounds |
> {135,3768,4780,14822,57048,92958,125442,158954,433002,502836,610034}

Hmm ... if I'm not still confused, these are the two columns being
mergejoined in your slow query (would you double-check that?).
But the numbers don't seem to add up. Given those stats the estimate
should be that something over 20% of the mm_insrel_table has to be
scanned to complete the join (since 6070 falls into the third decile
of the other histogram). But we saw from Alban's original post that
the planner must be estimating well under 10% of the table needs to
be scanned. Either we're still confused about which columns are being
joined, or there's some weird bug in the computation.

regards, tom lane

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 15:07:19
Message-ID:	44EB1DA7.9040000@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Alban Hertroys <alban(at)magproductions(dot)nl> writes:
>> tablename | mm_product_table
>> attname | number
>> histogram_bounds | {2930,3244,3558,3872,4186,4500,4814,5128,5442,5756,6070}
>
>> tablename | mm_insrel_table
>> attname | snumber
>> histogram_bounds |
>> {135,3768,4780,14822,57048,92958,125442,158954,433002,502836,610034}
>
> Hmm ... if I'm not still confused, these are the two columns being
> mergejoined in your slow query (would you double-check that?).

That's correct.

I read up some on the meaning of the pg_stats values, and I noticed that
mm_insrel_tables' snumber and dnumber columns seem to have a rather
bad correlation. I think this could be improved by clustering on an
index over (number, snumber, dnumber); is that correct?

> But the numbers don't seem to add up. Given those stats the estimate
> should be that something over 20% of the mm_insrel_table has to be
> scanned to complete the join (since 6070 falls into the third decile
> of the other histogram). But we saw from Alban's original post that
> the planner must be estimating well under 10% of the table needs to
> be scanned. Either we're still confused about which columns are being
> joined, or there's some weird bug in the computation.

Since the start of this thread the insrel table has grown to 339195
records (it was closer to 330,000), maybe that changed the statistics a
bit. To be sure, attached is the query plan of the problematic query
again at this moment.

The other table involved, mm_medical_care_container_table, has the
following stats on number:

zorgweb_solaris=> select * from pg_stats where attname = 'number' and
tablename = 'mm_medical_care_container_table';
-[ RECORD 1
]-----+-------------------------------------------------------------------------------
schemaname | public
tablename | mm_medical_care_container_table
attname | number
null_frac | 0
avg_width | 4
n_distinct | -1
most_common_vals |
most_common_freqs |
histogram_bounds |
{418768,436686,455444,473600,490610,508680,527182,545038,562786,578528,595132}
correlation | 0.339138

I sure hope we get this mystery unveiled...

Regards,
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

Attachment	Content-Type	Size
query.plan	text/plain	2.7 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	DelGurth <delgurth(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 15:14:24
Message-ID:	3465.1156259664@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alban Hertroys <alban(at)magproductions(dot)nl> writes:
> I sure hope we get this mystery unveiled...

I'm confused too. Would it be possible for you to send me a dump of
your database? I need the full schema definitions of these tables,
indexes, and views, but you could null out all but the various
"number" columns being used in the joins and query conditions,
so there shouldn't be any privacy issues.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 19:46:24
Message-ID:	10968.1156275984@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alban Hertroys <alban(at)magproductions(dot)nl> writes:
>> I'm confused too. Would it be possible for you to send me a dump of
>> your database?

> Attached is a cleaned out database, the full schema is included, but
> only the relevant tables contain any data.

Thanks. After digging through it a bit, I understand what's happening,
but I'm not seeing any simple fix. The problem is basically that
you've got

create or replace view mm_product as
SELECT ... FROM mm_product_table JOIN mm_object mmop USING (number);

and then the problem query has WHERE mm_product.number = insrel.snumber
which causes the planner to conclude that mm_product_table.number,
mm_object.number, and mm_insrel_table.snumber are all basically
interchangeable. In particular it ends up performing the join between
mm_product_table.number and mm_object.number as though
mm_product_table.number were being joined to mm_insrel_table.snumber.
Which is fine, except that it's thinking that the statistics for
mm_object.number are applicable in this context, and they're completely
misleading. After the join to mm_insrel_table, the statistics of the
variable are really like mm_insrel_table.number --- in particular the
fraction of the table that has to be visited is much larger than it
would've been for mm_object as a whole.

This is a problem we've understood in a generic form for awhile:
a join or selection might change the statistics of a variable,
and so the info stored in the catalogs ought to be modified somehow
to predict what will happen at upper join levels. We've not seen
it in this particular form before, though.

I'm not sure if there's a whole lot you can do about it in the near term
other than refactor your schema to avoid having different tables joining
to different subranges of mm_object.number. (You don't necessarily have
to get rid of mm_object --- just try assigning its keys from a serial,
or something, so that there's no correlation to the ranges of keys in
other tables.)

We might be able to do something about actually solving the statistical
problem in 8.3, but I fear it's too late to think about it for 8.2.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alban Hertroys <alban(at)magproductions(dot)nl>, pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [HACKERS] Queries joining views
Date:	2006-08-22 19:51:59
Message-ID:	20060822195159.GI25475@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:

> We might be able to do something about actually solving the statistical
> problem in 8.3, but I fear it's too late to think about it for 8.2.

I take it you mean you already have a very concrete idea on how to solve
it. Come on, illuminate us poor dumb souls.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-22 19:52:30
Message-ID:	11034.1156276350@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

I wrote:
> Thanks. After digging through it a bit, I understand what's happening,
> but I'm not seeing any simple fix.

I forgot to mention that although I could reproduce your bad plan in
8.1, CVS HEAD doesn't fall into the trap. I don't believe we've done
anything to fix the fundamental problem however --- it may just be a
side effect of the changes in the indexscan cost model that cause it
to not go for the bogus plan.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Alban Hertroys <alban(at)magproductions(dot)nl>, pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [HACKERS] Queries joining views
Date:	2006-08-22 20:30:15
Message-ID:	11384.1156278615@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> We might be able to do something about actually solving the statistical
>> problem in 8.3, but I fear it's too late to think about it for 8.2.

> I take it you mean you already have a very concrete idea on how to solve
> it. Come on, illuminate us poor dumb souls.

No, I don't :-( ... that was intended to suggest that we might think of
a solution given months to work on it rather than days.

regards, tom lane

From:	Alban Hertroys <alban(at)magproductions(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-23 12:07:49
Message-ID:	44EC4515.3000103@magproductions.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Tom Lane wrote:
> Alban Hertroys <alban(at)magproductions(dot)nl> writes:
>>> I'm confused too. Would it be possible for you to send me a dump of
>>> your database?
>
>> Attached is a cleaned out database, the full schema is included, but
>> only the relevant tables contain any data.
>
> Thanks. After digging through it a bit, I understand what's happening,
> but I'm not seeing any simple fix. The problem is basically that
> you've got
>
> create or replace view mm_product as
> SELECT ... FROM mm_product_table JOIN mm_object mmop USING (number);
>
> and then the problem query has WHERE mm_product.number = insrel.snumber
> which causes the planner to conclude that mm_product_table.number,
> mm_object.number, and mm_insrel_table.snumber are all basically
> interchangeable. In particular it ends up performing the join between
> mm_product_table.number and mm_object.number as though
> mm_product_table.number were being joined to mm_insrel_table.snumber.

It's even worse, I guess, as the mm_insrel view joins mm_insrel_table
with mm_object again. So basically the query performs a self-join on
mm_object with a detour through mm_insrel_table and mm_product_table...

> Which is fine, except that it's thinking that the statistics for
> mm_object.number are applicable in this context, and they're completely
> misleading. After the join to mm_insrel_table, the statistics of the
> variable are really like mm_insrel_table.number --- in particular the
> fraction of the table that has to be visited is much larger than it
> would've been for mm_object as a whole.

I don't entirely understand what you're saying here.

Mm_object is always larger than any other table in the database, as
every table joins with (different) records in it to determine it's otype
and owner. So I don't understand how a fraction of any of those tables
could be larger than mm_object as a whole...

In fact, originally the schema used inheritance; every table inherited
(directly or indirectly) from mm_object. As this resulted in unions,
which caused much more performance problems than the current
view-approach, I implemented the current approach.
In fact, this approach was lent from what MMBase uses for the MSSQL layer.

Well, as I implemented the way the views are defined, there is room for
changes in that area. Suggestions are welcome.

> This is a problem we've understood in a generic form for awhile:
> a join or selection might change the statistics of a variable,
> and so the info stored in the catalogs ought to be modified somehow
> to predict what will happen at upper join levels. We've not seen
> it in this particular form before, though.
>
> I'm not sure if there's a whole lot you can do about it in the near term
> other than refactor your schema to avoid having different tables joining
> to different subranges of mm_object.number. (You don't necessarily have
> to get rid of mm_object --- just try assigning its keys from a serial,
> or something, so that there's no correlation to the ranges of keys in
> other tables.)

Unfortunately the number key is required to correlate to the number keys
in other tables. That's the whole point of that table. It's also already
generated from a sequence...

I am looking at a view options at the moment:

1.) Cluster mm_object on an index over otype - I'm not sure how that
would influence the statistics; if it doesn't then this wouldn't change
much.

2.) Change mm_object into a view over the tables that now join with it.
I'll have to devise some way to get the otype and owner columns into the
other tables.

3.) An extension to option 2; Creating seperate tables, only containing
the relevant sections from mm_object, combining them into a view-version
of mm_object. Like this:

CREATE TABLE mm_product_object (
number integer PRIMARY KEY,
otype integer,
owner text
);
CREATE TABLE mm_insrel_object (
number integer PRIMARY KEY,
otype integer,
owner text
);

(I recall seeing an inheritance-like statement that makes copies of
table definitions - seems useful in this case)

CREATE OR REPLACE VIEW mm_object AS
SELECT * FROM mm_product_object
UNION ALL
SELECT * FROM mm_insrel_object;

It remains to be seen that MMBase can handle mm_object being a view, but
(if not) it probably will work if it's an updatable view.

I'm leaning to option 3, but if option 1 helps it saves me a lot of trouble.

> We might be able to do something about actually solving the statistical
> problem in 8.3, but I fear it's too late to think about it for 8.2.

Well, I had hoped for a suitable workaround, and I believe I may have a
few options now. Waiting for the next PostgreSQL release never really
was an option for us (deadline is somewhere next week). So it doesn't
really matter to us that there won't be a solution until 8.3, or maybe
even later.

Thanks for the help so far, glad to be able to point out an actual problem.

Regards,
--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alban Hertroys <alban(at)magproductions(dot)nl>
Cc:	pgsql-general(at)postgreSQL(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Queries joining views
Date:	2006-08-23 12:50:32
Message-ID:	18053.1156337432@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Alban Hertroys <alban(at)magproductions(dot)nl> writes:
> Mm_object is always larger than any other table in the database, as
> every table joins with (different) records in it to determine it's otype
> and owner. So I don't understand how a fraction of any of those tables
> could be larger than mm_object as a whole...

No, I said a larger fraction, not a larger absolute number of tuples.
The problem is that because mm_product contains only very small values
of "number", a mergejoin looks like a great way to join it to mm_object:
only the first 5% of mm_object will need to be scanned. The bug
consists in applying that 5% number to mm_insrel, for which it's not
correct.

regards, tom lane