Quick Links

Re: planner/optimizer question

Lists:	pgsql-performance

From:	brad-pgperf(at)duttonbros(dot)com
To:	pgsql-performance(at)postgresql(dot)org
Subject:	planner/optimizer question
Date:	2004-04-26 17:11:03
Message-ID:	20040426171103.45600.qmail@uno.mnl.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Hi,

I have a query which I think should be using an index all of the time but
postgres only uses the index part of the time. The index
(ticket_crm_map_crm_id_suppid) has the where clause column (crm_id) listed
first followed by the selected column (support_person_id). Wouldn't the
most efficient plan be to scan the index regardless of crm_id because the
only columns needed are in the index? Below is the table, 2 queries showing
the difference in plans, followed by the record distribution of
ticket_crm_map. I first did a 'vacuum analyze' and am running postgres
7.4.2.

Thanks,
Brad

athenapost=> \d ticket_crm_map
Table "public.ticket_crm_map"
Column | Type |
Modifiers
------------------------+-----------------------------+---------------------
-----------------------
tcrm_map_id | integer | not null
ticket_id | integer | not null
crm_id | integer | not null
support_person_id | integer | not null
escalated_to_person_id | integer | not null
status | character varying(50) | not null default
'Open'::character varying
close_date | timestamp without time zone |
updated_date | timestamp without time zone |
updated_by | character varying(255) |
created_date | timestamp without time zone |
created_by | character varying(255) |
additional_info | text |
subject | character varying(255) |
Indexes:
"ticket_crm_map_pkey" primary key, btree (tcrm_map_id)
"ticket_crm_map_crm_id_key" unique, btree (crm_id, ticket_id)
"ticket_crm_map_crm_id_suppid" btree (crm_id, support_person_id)
"ticket_crm_map_status" btree (status)
"ticket_crm_map_ticket_id" btree (ticket_id)
Foreign-key constraints:
"$1" FOREIGN KEY (ticket_id) REFERENCES ticket(ticket_id)
"$2" FOREIGN KEY (crm_id) REFERENCES company_crm(crm_id)
"$3" FOREIGN KEY (support_person_id) REFERENCES person(person_id)
"$4" FOREIGN KEY (escalated_to_person_id) REFERENCES person(person_id)
"$5" FOREIGN KEY (status) REFERENCES ticket_status(status)

athenapost=> explain analyze select distinct support_person_id from
ticket_crm_map where crm_id = 7;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----------
Unique (cost=1262.99..1265.27 rows=1 width=4) (actual time=15.335..18.245
rows=20 loops=1)
-> Sort (cost=1262.99..1264.13 rows=456 width=4) (actual
time=15.332..16.605 rows=2275 loops=1)
Sort Key: support_person_id
-> Index Scan using ticket_crm_map_crm_id_suppid on ticket_crm_map
(cost=0.00..1242.85 rows=456 width=4) (actual time=0.055..11.281 rows=2275
loops=1)
Index Cond: (crm_id = 7)
Total runtime: 18.553 ms
(6 rows)

Time: 20.598 ms
athenapost=> explain analyze select distinct support_person_id from
ticket_crm_map where crm_id = 1;
QUERY PLAN
----------------------------------------------------------------------------
-----------------------------------------------------
Unique (cost=10911.12..11349.26 rows=32 width=4) (actual
time=659.102..791.517 rows=24 loops=1)
-> Sort (cost=10911.12..11130.19 rows=87628 width=4) (actual
time=659.090..713.285 rows=93889 loops=1)
Sort Key: support_person_id
-> Seq Scan on ticket_crm_map (cost=0.00..3717.25 rows=87628
width=4) (actual time=0.027..359.299 rows=93889 loops=1)
Filter: (crm_id = 1)
Total runtime: 814.601 ms
(6 rows)

Time: 817.095 ms
athenapost=> select count(*), crm_id from ticket_crm_map group by crm_id;
count | crm_id
-------+--------
2554 | 63
129 | 25
17 | 24
110 | 23
74 | 22
69 | 21
2 | 20
53 | 82
10 | 17
16 | 81
46637 | 16
14 | 80
2 | 15
1062 | 79
87 | 78
93 | 77
60 | 44
363 | 76
225 | 10
4 | 74
83 | 9
27 | 73
182 | 8
2275 | 7
15 | 71
554 | 6
44 | 70
631 | 5
37 | 4
190 | 3
112 | 2
93889 | 1
(32 rows)

Time: 436.697 ms

From:	"Atesz" <atesz(at)ritek(dot)hu>
To:	<pgsql-performance(at)postgresql(dot)org>
Subject:	Re: planner/optimizer question
Date:	2004-04-27 12:54:27
Message-ID:	002901c42c56$c8e077a0$0b02010a@atesz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Hi,

You should try the next queries:

select support_person_id from ticket_crm_map where crm_id = 7 GROUP BY
support_person_id;
select support_person_id from ticket_crm_map where crm_id = 1 GROUP BY
support_person_id;

It can use the 'ticket_crm_map_crm_id_suppid' index.

Generally the Postgres use an k-column index if columns of your
conditions are prefix of the index column.
For example:
CREATE INDEX test_idx on test(col1,col2,col3,col4);
SELECT * FROM test WHERE col1=3 AND col2=13; -- This can use the index.

But the next queries cannot use the index:
SELECT * FROM test WHERE col1=3 AND col3=13;.
SELECT * FROM test WHERE col2=3;

If you have problem with seq_scan or sort, you can disable globally and
locally:
SET enable_seqscan=0;
SET enable_sort = 0;

Regards, Antal Attila

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	brad-pgperf(at)duttonbros(dot)com
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-28 04:27:59
Message-ID:	23032.1083126479@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

brad-pgperf(at)duttonbros(dot)com writes:
> ... Wouldn't the most efficient plan be to scan the index regardless
> of crm_id because the only columns needed are in the index?

No. People coming from other databases often have the misconception
that queries can be answered by looking only at an index. That is never
true in Postgres because row validity info is only stored in the table;
so we must always visit the table entry to make sure the row is still
valid/visible for the current query.

Accordingly, columns added to the index that aren't constrained by the
WHERE clause are not very useful ...

regards, tom lane

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-28 06:35:41
Message-ID:	408F5ECD.29849.CA91084@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

I know you will shoot me down, but...

Why is there an entry in the index for a row if the row is not valid?
Wouldn't it be better for the index entry validity to track the row validity.
If a particular data value for a query (join, where etc.) can be satisfied
by the index entry itself this would be a big performance gain.

Cheers,
Gary.

On 28 Apr 2004 at 0:27, Tom Lane wrote:

> brad-pgperf(at)duttonbros(dot)com writes:
> > ... Wouldn't the most efficient plan be to scan the index regardless
> > of crm_id because the only columns needed are in the index?
>
> No. People coming from other databases often have the misconception
> that queries can be answered by looking only at an index. That is never
> true in Postgres because row validity info is only stored in the table;
> so we must always visit the table entry to make sure the row is still
> valid/visible for the current query.
>
> Accordingly, columns added to the index that aren't constrained by the
> WHERE clause are not very useful ...
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend

From:	Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To:	Gary Doades <gpd(at)gpdnet(dot)co(dot)uk>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-28 07:04:14
Message-ID:	408F576E.1080504@familyhealth.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

> Why is there an entry in the index for a row if the row is not valid?
> Wouldn't it be better for the index entry validity to track the row validity.
> If a particular data value for a query (join, where etc.) can be satisfied
> by the index entry itself this would be a big performance gain.

For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big
performance loss.

Chris

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-28 07:08:03
Message-ID:	408F6663.22977.CC6AF7D@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

I can understand the performance loss on non-selects for keeping the
index validity state tracking the row validity, but would that outweigh the
performance gains on selects? Depends on your mix of selects to non
selects I guess, but other database systems seem to imply that keeping
the index on track is worth it overall.

Cheers,
Gary.

On 28 Apr 2004 at 15:04, Christopher Kings-Lynne wrote:

> > Why is there an entry in the index for a row if the row is not valid?
> > Wouldn't it be better for the index entry validity to track the row validity.
> > If a particular data value for a query (join, where etc.) can be satisfied
> > by the index entry itself this would be a big performance gain.
>
> For SELECTs, yes - but for INSERT, UPDATE and DELETE it would be a big
> performance loss.
>
> Chris
>

From:	Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To:	Gary Doades <gpd(at)gpdnet(dot)co(dot)uk>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-28 07:23:48
Message-ID:	408F5C04.7020101@familyhealth.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

> I can understand the performance loss on non-selects for keeping the
> index validity state tracking the row validity, but would that outweigh the
> performance gains on selects? Depends on your mix of selects to non
> selects I guess, but other database systems seem to imply that keeping
> the index on track is worth it overall.

Yes, some sort of flag on index creation would be sweet :)

Chris

From:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
To:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-28 07:34:59
Message-ID:	j8nu80566omg3pne8bs1qtco2s8aeggnnq@email.aon.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Wed, 28 Apr 2004 07:35:41 +0100, "Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
wrote:
>Why is there an entry in the index for a row if the row is not valid?

Because whether a row is seen as valid or not lies in the eye of the
transaction looking at it. Full visibility information is stored in the
heap tuple header. The developers' consensus is that this overhead
should not be in every index tuple.

Servus
Manfred

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-28 13:05:04
Message-ID:	27806.1083157504@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> On Wed, 28 Apr 2004 07:35:41 +0100, "Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
> wrote:
>> Why is there an entry in the index for a row if the row is not valid?

> Because whether a row is seen as valid or not lies in the eye of the
> transaction looking at it. Full visibility information is stored in the
> heap tuple header. The developers' consensus is that this overhead
> should not be in every index tuple.

Storing that information would at least double the overhead space used
for each index tuple. The resulting index bloat would significantly
slow index operations by requiring more I/O. So it's far from clear
that this would be a win, even for those who care only about select
speed.

regards, tom lane

From:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 17:03:03
Message-ID:	pvc290584alms4s6hurvsh5ra09ki7se1s@email.aon.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> [ ... visibility information in index tuples ... ]

>Storing that information would at least double the overhead space used
>for each index tuple. The resulting index bloat would significantly
>slow index operations by requiring more I/O. So it's far from clear
>that this would be a win, even for those who care only about select
>speed.

While the storage overhead could be reduced to 1 bit (not a joke) we'd
still have the I/O overhead of locating and updating index tuples for
every heap tuple deleted/updated.

Servus
Manfred

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 18:13:51
Message-ID:	409153EF.24647.144E9BF7@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 29 Apr 2004 at 19:03, Manfred Koizar wrote:

> While the storage overhead could be reduced to 1 bit (not a joke) we'd
> still have the I/O overhead of locating and updating index tuples for
> every heap tuple deleted/updated.

But this is what a lot of DBMSs do and seem to do well enough. I can see that the
MVCC system gives additional problems, but maybe it shouldn't be dismissed so lightly.

Coming from a MS SQLServer platform I have spent a lot of time optimising SQL in
PostgreSQL to be comparable to SQLServer. For the most part I have done this, but
some things are just slower in PostgreSQL.

Recently I have been looking at raw performance (CPU, IO) rather than the plans. I
have some test queries that (as far as I can determine) use the same access plans on
PostgreSQL and SQLServer. Getting to the detail, an index scan of an index on a
integer column (222512 rows) takes 60ms on SQLServer and 540ms on PostgreSQL.
A full seq table scan on the same table without the index on the other hand takes 370ms
in SQLServer and 420ms in PostgreSQL.

I know that the platforms are different (windows 2000 vs Linux 2.6.3), but the statement
was executed several times to make sure the index and data was in cache (no disk io)
on both systems. Same data, Same CPU, Same disks, Same memory, Same
motherboards.

The only thing I can think of is the way that the index scan is performed on each
platform, SQLServer can use the data directly from the index. This makes the biggest
difference in multi join statements where several of the intermediate tables do not need
to be accessed at all, the data is contained in the join indexes. This results in almost an
order of magnitude performance difference for the same data.

I would be nice to get a feel for how much performance loss would be incurred in
maintaining the index flags against possible performance gains for getting the data back
out again.

Regards,
Gary.

From:	Rod Taylor <pg(at)rbt(dot)ca>
To:	Gary Doades <gpd(at)gpdnet(dot)co(dot)uk>
Cc:	Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: planner/optimizer question
Date:	2004-04-29 19:12:15
Message-ID:	1083265934.30065.359.camel@jester
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

> I would be nice to get a feel for how much performance loss would be incurred in
> maintaining the index flags against possible performance gains for getting the data back
> out again.

I guess the real question is, why maintain index flags and not simply
drop the index entry altogether?

A more interesting case would be to have the backend process record
index tuples that it would invalidate (if committed), then on commit
send that list to a garbage collection process.

It's still vacuum -- just the reaction time for it would be much
quicker.

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 19:23:19
Message-ID:	40916437.19555.148E3602@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

>
> I guess the real question is, why maintain index flags and not simply
> drop the index entry altogether?
>
> A more interesting case would be to have the backend process record
> index tuples that it would invalidate (if committed), then on commit
> send that list to a garbage collection process.
>
> It's still vacuum -- just the reaction time for it would be much
> quicker.
>
This was my original question.

I guess the problem is with MVCC. The row may have gone from your
current view of the table but not from someone elses. I don't (yet)
understand the way it works to say for sure, but I still think it is worth
pursuing further for someone who does know the deep stuff. They seem
to have concluded that it is not worth it however.

Cheers,
Gary.

From:	"Rosser Schwarz" <rschwarz(at)totalcardinc(dot)com>
To:	"'Gary Doades'" <gpd(at)gpdnet(dot)co(dot)uk>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: planner/optimizer question
Date:	2004-04-29 20:12:29
Message-ID:	003101c42e26$4c384e90$2500fa0a@CardServices.TCI.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

while you weren't looking, Gary Doades wrote:

> Recently I have been looking at raw performance (CPU, IO)
> rather than the plans. I have some test queries that (as far
> as I can determine) use the same access plans on PostgreSQL
> and SQLServer. Getting to the detail, an index scan of an
> index on a integer column (222512 rows) takes 60ms on
> SQLServer and 540ms on PostgreSQL.

After a recent power outage, I had the opportunity to watch both
PostgreSQL and MS SQL come back from forced shutdowns (clean,
though there were active connections, in one case a bulk insert).
PostgreSQL was available and responsive as soon as the postmaster
had started. MS SQL, on the other hand, took the better part of
an hour to become remotely usable again -- on a radically faster
machine (Dell 6650, versus the 6450 we run PostgreSQL on).

Digging a bit, I noted that once MS SQL was up again, it was
using nearly 2GB main memory even when more or less idle. From
this, and having observed the performance differences between
the two, I'm left with little alternative but to surmise that
part of MS SQL's noted performance advantage [1] is due to its
forcibly storing its indices in main memory. Its startup lag
(during which it was utterly unusable; even SELECTs blocked)
could be accounted for by reindexing the tables. [2]

Granted, this is only a hypothesis, is rather unverifyable, and
probably belongs more on ADVOCACY than it does PERFORM, but it
seemed relevant.

It's also entirely possible your indices are using inaccurate
statistical information. Have you ANALYZEd recently?

/rls

[1] Again, at least in our case, the comparison is entirely
invalid, as MS SQL gets a hell of a lot more machine than
PostgreSQL. Even so, for day-to-day work and queries, even
our DBA, an until-recently fervent MS SQL advocate can't
fault PostgreSQL's SELECT, INSERT or DELETE performance.
We still can't get UPDATEs (at least bulk such) to pass
muster.

[2] This is further supported by having observed MS SQL run a
"recovery process" on databases that were entirely unused,
even for SELECT queries, at the time of the outage. The
only thing it might conceivably need to recover on them
is in-memory indices that were lost when power was lost.

--
Rosser Schwarz
Total Card, Inc.

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 20:26:18
Message-ID:	409172FA.1509.14C7E00E@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

> It's also entirely possible your indices are using inaccurate
> statistical information. Have you ANALYZEd recently?
>

In this example the statistics don't matter. The plans used were the same for
MSSQL and Postgres. I was trying to eliminate the difference in plans
between the two, which obviously does make a difference, sometimes in
MSSQL favour and sometimes the other way round. Both systems, having
decided to do the same index scan, took noticably different times. The
Postgres database was fully vacuumed and analysed anyway.

I agree about MSSQL recovery time. it sucks. This is why they are making a
big point about the improved recovery time in "yukon". Although the recovery
time is important, I see this as an exception, whereas at the moment I am
interested in the everyday.

Cheers,
Gary.

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 20:54:33
Message-ID:	200404291354.33304.josh@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Gary,

> In this example the statistics don't matter. The plans used were the same
for
> MSSQL and Postgres. I was trying to eliminate the difference in plans
> between the two, which obviously does make a difference, sometimes in
> MSSQL favour and sometimes the other way round. Both systems, having
> decided to do the same index scan, took noticably different times. The
> Postgres database was fully vacuumed and analysed anyway.

It's also quite possble the MSSQL simply has more efficient index scanning
implementation that we do. They've certainly had incentive; their storage
system sucks big time for random lookups and they need those fast indexes.
(just try to build a 1GB adjacency list tree on SQL Server. I dare ya).

Certainly the fact that MSSQL is essentially a single-user database makes
things easier for them. They don't have to maintain multiple copies of the
index tuples in memory. I think that may be our main performance loss.

--
-Josh Berkus
Aglio Database Solutions
San Francisco

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 21:08:28
Message-ID:	40917CDC.3078.14EE7897@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 29 Apr 2004 at 15:35, Kenneth Marshall wrote:

> Did you try to cluster based on the index?
>
> --Ken

Yes, This speeds up the index scan a little (12%). This to me just
reinforces the overhead that subsequently having to go and fetch the
data tuple actually has on the performance.

Cheers,
Gary.

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 21:15:12
Message-ID:	40917E70.21362.14F4A330@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 29 Apr 2004 at 13:54, Josh Berkus wrote:

> Gary,
>
>
> It's also quite possble the MSSQL simply has more efficient index scanning
> implementation that we do. They've certainly had incentive; their storage
> system sucks big time for random lookups and they need those fast indexes.
> (just try to build a 1GB adjacency list tree on SQL Server. I dare ya).
>
> Certainly the fact that MSSQL is essentially a single-user database makes
> things easier for them. They don't have to maintain multiple copies of the
> index tuples in memory. I think that may be our main performance loss.
>

Possibly, but MSSQL certainly uses data from indexes and cuts out the
subsequent (possibly random seek) data fetch. This is also why the
"Index Tuning Wizard" often recommends multi column compound
indexes in some cases. I've tried these recommendations on occasions
and they certainly speed up the selects significantly. If anyhing the index
scan on the new compound index must be slower then the original single
column index and yet it still gets the data faster.

This indicates to me that it is not the scan (or IO) performance that is
making the difference, but not having to go get the data row.

Cheers,
Gary.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 21:54:31
Message-ID:	17175.1083275671@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk> writes:
> In this example the statistics don't matter.

Don't they?

A prior poster mentioned that he thought MSSQL tries to keep all its
indexes in memory. I wonder whether you are giving Postgres a fair
chance to do the same. What postgresql.conf settings are you using?

regards, tom lane

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 22:31:00
Message-ID:	40919034.20317.153A0B6F@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 29 Apr 2004 at 17:54, Tom Lane wrote:

> "Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk> writes:
> > In this example the statistics don't matter.
>
> Don't they?
>
> A prior poster mentioned that he thought MSSQL tries to keep all its
> indexes in memory. I wonder whether you are giving Postgres a fair
> chance to do the same. What postgresql.conf settings are you using?
>
> regards, tom lane

As far as I understand it the statistics only contribute to determining the query plan.
Once the access methods are determined, the stats don't matter during the running of
the query.

I believe I have given Postgres exactly the same chance. The data is small enough to fit
into RAM (all the tables in the query add up to around 50meg) and I executed the query
several times to get a consistent figure for the explain analyze.

Having picked out an index scan as being the highest time user I concentrated on that in
this case and compared the same index scan on MSSQL. At least MSSQL reported it as
an index scan on the same index for the same number of rows.

There was nothing wrong with the query plan that Postgres used. As far as I could see it
was probably the best one to use, it just physically took longer than the same access
plan on MSSQL.

The query and plan are included below, the main thing I was looking at was the index
scan on staff_booking_pkey being 676ms long.

The only postgresql.conf parameters changed from the default are:

shared_buffers = 3000
sort_mem = 4096
effective_cache_size = 15000
default_statistics_target = 100

There was no disk IO (above the small background IO) during the final run of the query
as reported by vmstat (Task Mangler on Windows).

SELECT B.CONTRACT_ID,SUM(R.DURATION+1)/60.0 AS SUMDUR FROM
SEARCH_REQT_RESULT TSR
JOIN STAFF_BOOKING B ON (B.STAFF_ID = TSR.STAFF_ID)
JOIN ORDER_REQT R ON (R.REQT_ID = B.REQT_ID)
JOIN BOOKING_PLAN BP ON (BP.BOOKING_ID = B.BOOKING_ID) AND
BP.BOOKING_DATE BETWEEN '2004-04-12' AND '2004-04-18' AND
TSR.SEARCH_ID = 8 GROUP BY B.CONTRACT_ID

QUERY PLAN
HashAggregate (cost=11205.80..11209.81 rows=401 width=6) (actual
time=1179.729..1179.980 rows=50 loops=1)
-> Nested Loop (cost=326.47..11203.79 rows=401 width=6) (actual
time=39.700..1177.149 rows=652 loops=1)
-> Hash Join (cost=326.47..9990.37 rows=401 width=8) (actual
time=39.537..1154.807 rows=652 loops=1)
Hash Cond: ("outer".staff_id = "inner".staff_id)
-> Merge Join (cost=320.39..9885.06 rows=3809 width=12) (actual
time=38.316..1143.953 rows=4079 loops=1)
Merge Cond: ("outer".booking_id = "inner".booking_id)
-> Index Scan using staff_booking_pkey on staff_booking b
(cost=0.00..8951.94 rows=222612 width=16) (actual time=0.218..676.219 rows=222609
loops=1)
-> Sort (cost=320.39..329.91 rows=3808 width=4) (actual
time=26.225..32.754 rows=4079 loops=1)
Sort Key: bp.booking_id
-> Index Scan using booking_plan_idx2 on booking_plan bp
(cost=0.00..93.92 rows=3808 width=4) (actual time=0.223..14.186 rows=4079 loops=1)
Index Cond: ((booking_date >= '2004-04-12'::date) AND
(booking_date <= '2004-04-18'::date))
-> Hash (cost=5.59..5.59 rows=193 width=4) (actual time=1.139..1.139
rows=0 loops=1)
-> Index Scan using fk_idx_search_reqt_result on search_reqt_result tsr
(cost=0.00..5.59 rows=193 width=4) (actual time=0.213..0.764 rows=192 loops=1)
Index Cond: (search_id = 8)
-> Index Scan using order_reqt_pkey on order_reqt r (cost=0.00..3.01 rows=1
width=6) (actual time=0.023..0.025 rows=1 loops=652)
Index Cond: (r.reqt_id = "outer".reqt_id)
Total runtime: 1181.239 ms

Cheers,
Gary.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	josh(at)agliodbs(dot)com
Cc:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-29 23:17:49
Message-ID:	17898.1083280669@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Certainly the fact that MSSQL is essentially a single-user database makes
> things easier for them.

Our recent testing (cf the "Xeon" thread) says that the interlocking we
do to make the world safe for multiple backends has a fairly high cost
(at least on some hardware) compared to the rest of the work in
scenarios where you are doing zero-I/O scans of data already in memory.
Especially so for index scans. I'm not sure this completely explains
the differential that Gary is complaining about, but it could be part of
it. Is it really true that MSSQL doesn't support concurrent operations?

regards, tom lane

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-30 06:33:11
Message-ID:	40920137.13806.16F37CC5@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 29 Apr 2004 at 19:17, Tom Lane wrote:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > Certainly the fact that MSSQL is essentially a single-user database makes
> > things easier for them.
>
> Our recent testing (cf the "Xeon" thread) says that the interlocking we
> do to make the world safe for multiple backends has a fairly high cost
> (at least on some hardware) compared to the rest of the work in
> scenarios where you are doing zero-I/O scans of data already in memory.
> Especially so for index scans. I'm not sure this completely explains
> the differential that Gary is complaining about, but it could be part of
> it. Is it really true that MSSQL doesn't support concurrent operations?
>
> regards, tom lane

As far as I am aware SQLSever supports concurrent operations. It
certainly creates more threads for each connection. None of my
observations of the system under load (50 ish concurrent users, 150 ish
connections) suggest that it is serializing queries.

These tests are currentl on single processor Athlon XP 2000+ systems.

Regards,
Gary.

From:	Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-30 17:46:24
Message-ID:	409290F0.10509@oli.tudelft.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Manfred Koizar wrote:
> On Wed, 28 Apr 2004 09:05:04 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> [ ... visibility information in index tuples ... ]
>>
>> Storing that information would at least double the overhead space used
>> for each index tuple. The resulting index bloat would significantly
>> slow index operations by requiring more I/O. So it's far from clear
>> that this would be a win, even for those who care only about select
>> speed.
>
> While the storage overhead could be reduced to 1 bit (not a joke)

You mean adding an isLossy bit and only where it is set the head
tuple has to be checked for visibility, if it is not set the head
tuple does not have to be checked?

> we'd
> still have the I/O overhead of locating and updating index tuples for
> every heap tuple deleted/updated.

Would there be additional I/O for the additional bit in the index
tuple (I am unable to find the layout of index tuple headers in
the docs)?

Jochem

--
I don't get it
immigrants don't work
and steal our jobs
- Loesje

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-30 18:59:38
Message-ID:	4092B02A.15469.199EE3DE@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 30 Apr 2004 at 9:37, Kevin Barnard wrote:

>
> I was always under the impression that MSSQL used leaf and row level locking and therefore
> was not a concurrent, in the same sense that postgres is, database. It would still allow for
> concurrent connections and such but updates will get blocked/ delayed. I might be wrong.
>

Ultimately you may be right. I don't know enough about SQLServer
internals to say either way. Anyway, most of our system is in selects for
70% of the time. I could try and set up a test for this when I get a bit
more time.

Unfortunately I suspect that this topic won't get taken much further. In
order to test this it would mean modifying quite a bit of code. Whether
putting additional info in the index header and not visiting the data row
if all the required data is in the index would be beneficial would require
quite a bit of work by someone who knows more than I do. I reckon that
no-one has the time to do this at the moment.

Regards,
Gary.

From:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
To:	Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-04-30 21:36:58
Message-ID:	toe590tgu39ilahg3p1os2mjlgf4iss582@email.aon.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Fri, 30 Apr 2004 19:46:24 +0200, Jochem van Dieten
<jochemd(at)oli(dot)tudelft(dot)nl> wrote:
>> While the storage overhead could be reduced to 1 bit (not a joke)
>
>You mean adding an isLossy bit and only where it is set the head
>tuple has to be checked for visibility, if it is not set the head
>tuple does not have to be checked?

Yes, something like this. Actually I imagined it the other way round: a
visible-to-all flag similar to the existing dead-to-all flag (search for
LP_DELETE and ItemIdDeleted in nbtree.c).

>> we'd
>> still have the I/O overhead of locating and updating index tuples for
>> every heap tuple deleted/updated.
>
>Would there be additional I/O for the additional bit in the index
>tuple (I am unable to find the layout of index tuple headers in
>the docs)?

Yes, the visible-to-all flag would be set as a by-product of an index
scan, if the heap tuple is found to be visible to all active
transactions. This update is non-critical and, I think, not very
expensive.

Deleting (and hence updating) a tuple is more critical, regarding both
consistency and performance. We'd have to locate all index entries
pointing to the heap tuple and set their visible-to-all flags to false.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
Cc:	Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-05-01 01:19:39
Message-ID:	8881.1083374379@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
> Yes, the visible-to-all flag would be set as a by-product of an index
> scan, if the heap tuple is found to be visible to all active
> transactions. This update is non-critical

Oh really? I think you need to think harder about the transition
conditions.

Dead-to-all is reasonably safe to treat as a hint bit because *it does
not ever need to be undone*. Visible-to-all does not have that
property.

regards, tom lane

From:	Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-05-01 11:18:04
Message-ID:	4093876C.2060409@oli.tudelft.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Tom Lane wrote:
> Manfred Koizar <mkoi-pg(at)aon(dot)at> writes:
>>
>> Yes, the visible-to-all flag would be set as a by-product of an index
>> scan, if the heap tuple is found to be visible to all active
>> transactions. This update is non-critical
>
> Oh really? I think you need to think harder about the transition
> conditions.
>
> Dead-to-all is reasonably safe to treat as a hint bit because *it does
> not ever need to be undone*. Visible-to-all does not have that
> property.

Yes, really :-)

When a tuple is inserted the visible-to-all flag is set to false.
The effect of this is that every index scan that finds this tuple
has to visit the heap to verify visibility. If it turns out the
tuple is not only visible to the current transaction, but to all
current transactions, the visible-to-all flag can be set to true.
This is non-critical, because if it is set to false scans will
not miss the tuple, they will just visit the heap to verify
visibility.

The moment the heap tuple is updated/deleted the visible-to-all
flag needs to be set to false again in all indexes. This is
critical, and the I/O and (dead)lock costs of unsetting the
visible-to-all flag are unknown and might be big enough to ofset
any advantage on the selects.

But I believe that for applications with a "load, select, drop"
usage pattern (warehouses, archives etc.) having this
visible-to-all flag would be a clear winner.

Jochem

--
I don't get it
immigrants don't work
and steal our jobs
- Loesje

From:	"Gary Doades" <gpd(at)gpdnet(dot)co(dot)uk>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-05-01 12:48:35
Message-ID:	4093AAB3.21370.742972@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On 1 May 2004 at 13:18, Jochem van Dieten wrote:

> Yes, really :-)
>
> When a tuple is inserted the visible-to-all flag is set to false.
> The effect of this is that every index scan that finds this tuple
> has to visit the heap to verify visibility. If it turns out the
> tuple is not only visible to the current transaction, but to all
> current transactions, the visible-to-all flag can be set to true.
> This is non-critical, because if it is set to false scans will
> not miss the tuple, they will just visit the heap to verify
> visibility.
>
> The moment the heap tuple is updated/deleted the visible-to-all
> flag needs to be set to false again in all indexes. This is
> critical, and the I/O and (dead)lock costs of unsetting the
> visible-to-all flag are unknown and might be big enough to ofset
> any advantage on the selects.
>
> But I believe that for applications with a "load, select, drop"
> usage pattern (warehouses, archives etc.) having this
> visible-to-all flag would be a clear winner.
>
> Jochem
>

If needs be this index maintenance could be set as a configuration
option. It is likely that database usage patterns are reasonably well
known for a particular installation. This option could be set on or off
dependant on typical transactions.

In my case with frequent large/complex selects and few very short
insert/updates I think it could be a big performance boost. If it works :-)

Regards,
Gary.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>
Cc:	Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-05-01 16:21:28
Message-ID:	14614.1083428488@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl> writes:
> The moment the heap tuple is updated/deleted the visible-to-all
> flag needs to be set to false again in all indexes. This is
> critical,

Exactly. This gets you out of the hint-bit semantics and into a ton
of interesting problems, such as race conditions. (Process A determines
that tuple X is visible-to-all, and goes to update the index tuple.
Before it can reacquire lock on the index page, process B updates the
heap tuple and visits the index to clear the flag bit. Once A obtains
lock it will set the flag bit. Oops.)

Basically what you are buying into with such a thing is multiple copies
of critical state. It may be only one bit rather than several words,
but updating it is no less painful than if it were a full copy of the
tuple's commit status.

regards, tom lane

From:	Manfred Koizar <mkoi-pg(at)aon(dot)at>
To:	Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: planner/optimizer question
Date:	2004-05-02 08:03:36
Message-ID:	ra9990hea0ats52frhjd6u0c5dsbr7jkmd@email.aon.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

On Sat, 01 May 2004 13:18:04 +0200, Jochem van Dieten
<jochemd(at)oli(dot)tudelft(dot)nl> wrote:
>Tom Lane wrote:
>> Oh really? I think you need to think harder about the transition
>> conditions.

Indeed.

>>
>> Dead-to-all is reasonably safe to treat as a hint bit because *it does
>> not ever need to be undone*. Visible-to-all does not have that
>> property.
>
>Yes, really :-)

No, not really :-(

As Tom has explained in a nearby message his concern is that -- unlike
dead-to-all -- visible-to-all starts as false, is set to true at some
point in time, and is eventually set to false again. Problems arise if
one backend wants to set visible-to-all to true while at the same time
another backend wants to set it to false.

This could be curable by using a second bit as a deleted flag (might be
even the same bit that's now used as dead-to-all, but I'm not sure). An
index tuple having both the visible flag (formerly called
visible-to-all) and the deleted flag set would cause a heap tuple access
to check visibility. But that leaves the question of what to do after
the deleting transaction has rolled back. I see no clean way from the
visible-and-deleted state to visible-to-all.

This obviously needs another round of hard thinking ...

From:	"Jochem van Dieten" <jochemd(at)oli(dot)tudelft(dot)nl>
To:	<mkoi-pg(at)aon(dot)at>
Cc:	<tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-performance(at)postgresql(dot)org>
Subject:	Re: planner/optimizer question
Date:	2004-05-02 14:30:47
Message-ID:	1529.130.161.199.221.1083508247.squirrel@secure.oli.tudelft.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-performance

Manfred Koizar said:
>
> As Tom has explained in a nearby message his concern is that --
> unlike dead-to-all -- visible-to-all starts as false, is set to true
> at some point in time, and is eventually set to false again.
> Problems arise if one backend wants to set visible-to-all to true
> while at the same time another backend wants to set it to false.

Got it, I misinterpreted his concern as "visible-to-all should not be
set to true when the tuple is inserted".

> This could be curable by using a second bit as a deleted flag (might
> be even the same bit that's now used as dead-to-all, but I'm not
> sure). An index tuple having both the visible flag (formerly called
> visible-to-all) and the deleted flag set would cause a heap tuple
> access to check visibility.

Or in a more generalized way: with 2 bits written at the same time you
can express 4 states. But only 3 actions need to be signalled:
dead-to-all, visible-to-all and check-heap. So we can have 2 states
that both signal check-heap.

The immediate solution to the race condition Tom presented would be to
have the transaction that invalidates the heap tuple switch the index
tuple from the one check-heap state to the other. The transaction that
wants to update to visible-to-all can now see that the state has
changed (but not the meaning) and aborts its change.

> But that leaves the question of what to
> do after the deleting transaction has rolled back. I see no clean
> way from the visible-and-deleted state to visible-to-all.

I'm afraid I don't know enough about the inner workings of rollbacks
to determine how the scenario "A determines visible-to-all should be
set, B invalidates tuple, B rolls back, C invalidates tuple, C
commits, A reaquires lock on index" would work out. I guess I have
some more reading to do.

But if you don't roll back too often it wouldn't even be a huge
problem to just leave them in visible-and-deleted state until
eventually they go into the dead-to-all state.

Jochem