BUG #1552: massive performance hit between 7.4 and 8.0.1

From: "Brian O'Reilly" <fade(at)deepsky(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-18 23:21:02
Message-ID: 20050318232102.08FA0F0E5D@svr2.postgresql.org
Lists: pgsql-bugs pgsql-performance


The following bug has been logged online:

Bug reference: 1552
Logged by: Brian O'Reilly
Email address: fade(at)deepsky(dot)com
PostgreSQL version: 8.0.1
Operating system: Linux 2.6.11
Description: massive performance hit between 7.4 and 8.0.1
Details:

When doing a lot of inserts to an empty table with a foreign key to
another table, there is an incredible performance degradation on 8.0.1.
I have a program that inserts rows in an iterative loop; in this form it
inserts about 110,000 rows. On PostgreSQL 7.4 on a Debian machine it
takes a shade over 2 minutes to complete. On an amd64 box running
Gentoo, it takes over an hour and forty minutes. The query plan on the
Debian host that completes quickly follows:

"Fast" machine, Debian, PSQL 7.4:

----------------------------------------------------------------------------
----------------------------------------------------
Index Scan using requirements_pkey on requirements (cost=0.00..4.82 rows=2
width=0) (actual time=0.013..0.013 rows=0 loops=1)
Index Cond: (reqid = 10::bigint)
Total runtime: 0.134 ms
(3 rows)

and the query plan on the 'slow' machine:

                                  QUERY PLAN
------------------------------------------------------------------------------
 Seq Scan on requirements  (cost=0.00..0.00 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1)
   Filter: (reqid = 10::bigint)
 Total runtime: 0.040 ms
(3 rows)

The script I am using to show this behaviour follows:

CREATE TABLE packages
  (name text PRIMARY KEY);
CREATE TABLE binary_packages
  (name text REFERENCES packages,
   version text,
   PRIMARY KEY(name, version));
CREATE TABLE requirements
  (reqid bigint PRIMARY KEY,
   name text,
   version text,
   FOREIGN KEY (name, version) REFERENCES binary_packages);
CREATE TABLE constraints
  (constid bigint PRIMARY KEY,
   reqid bigint REFERENCES requirements,
   type text,
   name text REFERENCES packages,
   version text DEFAULT '',
   relation character(2));

explain analyze select 1 from only requirements where reqid='10';

The query optimiser seems to settle on a default strategy of sequential
scans for a table that is empty at plan time. That is a fast strategy
while the table is still empty or nearly so, but once the table grows
large, each per-row check rescans the whole table and the total insert
time is O(N^2). This is clearly a bug. Please let me know if I can
provide any more information.

Brian O'Reilly
System Architect,
DeepSky Media Resources


From: Zeki <zekispam(at)freewebz(dot)com>
To: "Brian O'Reilly" <fade(at)deepsky(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-21 15:27:17
Message-ID: 423EE7D5.7090506@freewebz.com
Lists: pgsql-bugs pgsql-performance

Have you tried running ANALYZE after the first 1,000 or so inserts?
Also, you should be able to disable sequential scans for the duration of
the connection using SET enable_seqscan = false.
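
A minimal sketch of both suggestions, assuming psycopg2 and the bug
report's packages table (placeholder data; with the FK tables it is the
referenced table that wants the ANALYZE):

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

# Discourage sequential scans for this connection only.
cur.execute("SET enable_seqscan = false")

for i in range(10000):
    cur.execute("INSERT INTO packages (name) VALUES (%s)", ("pkg%d" % i,))
    # Refresh planner statistics every 1,000 rows so later plans
    # see the table's real size.
    if i % 1000 == 999:
        cur.execute("ANALYZE packages")

conn.commit()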

-Zeki

Brian O'Reilly wrote:

<snip>


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Brian O'Reilly <fade(at)deepsky(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-23 08:40:30
Message-ID: 1111567230.11750.596.camel@localhost.localdomain
Lists: pgsql-bugs pgsql-performance

On Fri, 2005-03-18 at 23:21 +0000, Brian O'Reilly wrote:
> The following bug has been logged online:
>
> Bug reference: 1552
> Logged by: Brian O'Reilly
> Email address: fade(at)deepsky(dot)com
> PostgreSQL version: 8.0.1
> Operating system: Linux 2.6.11
> Description: massive performance hit between 7.4 and 8.0.1
> Details:
>
> When doing a lot of inserts to an empty table with a foreign key to
> another table, there is an incredible performance degradation on 8.0.1.
> I have a program that inserts rows in an iterative loop; in this form it
> inserts about 110,000 rows. On PostgreSQL 7.4 on a Debian machine it
> takes a shade over 2 minutes to complete. On an amd64 box running
> Gentoo, it takes over an hour and forty minutes. The query plan on the
> Debian host that completes quickly follows:
>

This may be a bug, thanks for filing it.

However, we can't tell at the moment from what you've said.

The EXPLAINs you've enclosed are for SELECTs, yet your bug report
describes INSERTs as being the things that are slow.
[You may find better performance from using COPY]
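
For illustration, a rough sketch of the COPY route using psycopg2's
copy_from with an in-memory buffer, so no flat file needs to be written
to disk (placeholder data; column layout from the bug report's packages
table):

import io
import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

# Build the COPY stream in memory instead of in a flat file.
buf = io.StringIO()
for i in range(10000):
    buf.write("pkg%d\n" % i)
buf.seek(0)

# One COPY replaces thousands of individual INSERTs.
cur.copy_from(buf, "packages", columns=("name",))
conn.commit()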

Also, your tests have compared two systems, so it might be that the
hardware or configuration of one system is different from the other.

If you could repeat the test on one single system, then this would
assist in the diagnosis of this bug report. Also, if you could describe
the workload that is giving you a problem more exactly, that would help.
Specifically, can you confirm that you have run ANALYZE on the tables,
and also give us some idea of numbers of rows in each table at the time
you first run your programs.

> The query optimiser seems to settle on a default strategy of sequential
> scans for a table that is empty at plan time. That is a fast strategy
> while the table is still empty or nearly so, but once the table grows
> large, each per-row check rescans the whole table and the total insert
> time is O(N^2).

> This is clearly a bug.

There is clearly a problem, but it is not yet clearly a bug. If it is a
bug, we're as interested in solving it as you are.

> Please let me know if I can
> provide any more information.

Yes, all of the above, plus more.

Best Regards, Simon Riggs


From: Keith Browne <tuxedo(at)deepsky(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-bugs(at)postgresql(dot)org
Cc: Brian O'Reilly <fade(at)deepsky(dot)com>
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-23 19:22:07
Message-ID: 4241C1DF.2020000@deepsky.com
Lists: pgsql-bugs pgsql-performance

Simon Riggs wrote:

> The EXPLAINs you've enclosed are for SELECTs, yet your bug report
> describes INSERTs as being the things that are slow.
> [You may find better performance from using COPY]

Simon,

Brian and I are working together on this problem.

We're starting with an empty database, creating four tables, and
populating those tables with a total of 180,000-200,000 rows. Each
table has a primary key, and several of the tables reference foreign
keys in other tables. We've written a Python script, using psycopg,
which executes all the queries to create the tables and insert the rows.
The database is running on the same machine where the script runs.

I've seen similar performance when issuing a COMMIT after each
insertion, and also after batching insertions in blocks of 250 per
COMMIT, so batching the commits is not helping much. I've looked at the
possibility of using COPY, but in our production environment it will be
prohibitive to build a flat file with all this data. I'd rather
generate it on the fly, as we've been able to do with PostgreSQL 7.4.

> Also, your tests have compared two systems, so it might be that the
> hardware or configuration of one system is different from the other.

When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were
able to insert all this data in 5-7 minutes. It's taken a while to
install Postgres 8.0.1 on the same machine, but now I have, and it's
taking 40-45 minutes to run the same insert script. This is similar to
the performance we saw on another machine, a fast single-CPU AMD64 box
running Gentoo.

I don't think it's a hardware issue. I dug around a bit, and found
suggestions that this sort of problem could be worked around by breaking
the database connection and restarting it after the tables had been
partially filled. I modified our script to break and re-establish the
database connection when each table first has 4,000 records inserted,
and the performance is greatly improved; it now takes only about 3.5
minutes to insert 180,000+ rows.
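
A sketch of that workaround for one pair of tables from the bug report,
assuming psycopg2 (placeholder data; the 4,000-row threshold is the one
described above):

import psycopg2

def connect():
    return psycopg2.connect("dbname=test")

conn = connect()
cur = conn.cursor()

for i in range(180000):
    name = "pkg%d" % i
    cur.execute("INSERT INTO packages (name) VALUES (%s)", (name,))
    # This insert fires the FK check against packages.
    cur.execute(
        "INSERT INTO binary_packages (name, version) VALUES (%s, %s)",
        (name, "1.0"))
    if i == 4000:
        # Dropping the connection discards the FK-check plan cached
        # while packages was still tiny; the new session replans
        # against a now-populated table.
        conn.commit()
        conn.close()
        conn = connect()
        cur = conn.cursor()

conn.commit()
conn.close()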

I've since modified this script to build and populate a fifth table with
over 1.3 million rows. The fifth table has no primary key, but has a
foreign key into one of the first four tables. With the above
modification (break and re-build the DB connection after 4,000 rows have
been inserted), the whole database can be populated in about 15 minutes.
I wouldn't have dared try to build a one-million-plus-row table until
I found this speed-up.

> If you could repeat the test on one single system, then this would
> assist in the diagnosis of this bug report. Also, if you could describe
> the workload that is giving you a problem more exactly, that would help.
> Specifically, can you confirm that you have run ANALYZE on the tables,
> and also give us some idea of numbers of rows in each table at the time
> you first run your programs.

Just to see if it would help, I tried modifying the script to run an
ANALYZE against each table after 4,000 insertions, instead of breaking
and re-establishing the DB connection. I still saw ~45-minute times to
insert 180,000 rows. I then tried running ANALYZE against each table
after *each* 4,000 rows inserted, and again, it took about 45 minutes to
run the insert.

Each table is empty when I first run the program. I am dropping and
re-creating the database for each test run.

> There is clearly a problem, but it is not yet clearly a bug. If it is a
> bug, we're as interested in solving it as you are.

I'd be happy to run further tests or provide more details, if they'll
help. We now have a workaround which is allowing us to proceed with our
project, but I'd like to know if there's another way to do this. While
I understand that large or complex databases require careful tuning, I
was surprised to see a six- or seven-fold increase in run times between
PostgreSQL 7.4 and 8.0.1 on the same hardware, on an operation which
seems fairly straightforward: populating an empty table.

One other thing which puzzled me: as a test, I tried modifying our
script to spit out raw SQL statements instead of connecting to the
database and performing the inserts itself. Normally, our script
populates two tables in one pass, and then populates the third and
fourth tables in a second pass. I massaged the SQL by hand to group the
inserts together by table, so that the first table would be entirely
populated, then the second, etc. When I ran this SQL script by piping
it straight into psql, it finished in about four minutes. This is
comparable to the time it takes to run my modified script which breaks
and re-establishes the connection to the database.

It would appear that psql is doing something right here which we have
had to go out of our way to get with psycopg.

Keith Browne
tuxedo(at)deepsky(dot)com


From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-23 19:46:50
Message-ID: slrnd43hta.rhd.andrew+nonews@trinity.supernews.net
Lists: pgsql-bugs pgsql-performance

On 2005-03-23, Keith Browne <tuxedo(at)deepsky(dot)com> wrote:
> One other thing which puzzled me: as a test, I tried modifying our
> script to spit out raw SQL statements instead of connecting to the
> database and performing the inserts itself. Normally, our script
> populates two tables in one pass, and then populates the third and
> fourth tables in a second pass. I massaged the SQL by hand to group the
> inserts together by table, so that the first table would be entirely
> populated, then the second, etc. When I ran this SQL script by piping
> it straight into psql, it finished in about four minutes.

Changing the order so that the referenced table is fully populated, or at
least populated with more than a handful of pages of rows, before doing
_any_ insert on a referencing table in the same session will avoid the
misplan of the FK trigger queries, because when the first insert happens
on a referencing table, there will be no reason for the planner to prefer
a sequential scan. So this result is not surprising at all.
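
A sketch of that ordering with the bug report's schema, assuming
psycopg2 (placeholder data): each referenced table is filled completely
before anything that references it is touched, so the first FK check in
the session is planned against a well-populated table.

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

# 1. packages first: it is referenced by binary_packages.
for i in range(10000):
    cur.execute("INSERT INTO packages (name) VALUES (%s)", ("pkg%d" % i,))

# 2. binary_packages next: referenced by requirements.
for i in range(10000):
    cur.execute(
        "INSERT INTO binary_packages (name, version) VALUES (%s, %s)",
        ("pkg%d" % i, "1.0"))

# 3. Only now load requirements; its first FK lookup is planned against
#    a binary_packages table that is already many pages long.
for i in range(10000):
    cur.execute(
        "INSERT INTO requirements (reqid, name, version) "
        "VALUES (%s, %s, %s)", (i, "pkg%d" % i, "1.0"))

conn.commit()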

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andrew(at)supernews(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org, Keith Browne <tuxedo(at)deepsky(dot)com>, "Brian O'Reilly" <fade(at)deepsky(dot)com>
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-23 20:12:43
Message-ID: 13841.1111608763@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-performance

Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:
> Changing the order so that the referenced table is fully populated, or at
> least populated with more than a handful of pages of rows, before doing
> _any_ insert on a referencing table in the same session will avoid the
> misplan of the FK trigger queries, because when the first insert happens
> on a referencing table, there will be no reason for the planner to prefer
> a sequential scan. So this result is not surprising at all.

I'm still looking for an example that demonstrates why this is a common
problem that we need to worry about. ISTM that if an FK reference is
hit when there are still zero entries in the referenced table, that
insertion will fail anyway, and so people wouldn't try to load data in
such an order.

In the long term it would be good to replan the FK plans when the
referenced tables have grown so much that the plan ought to change.
Once we have the plan invalidation machinery that Neil is working on,
it might be fairly practical to do that; but no such thing is going
to appear in existing release branches of course.

We could band-aid this in 8.0 as previously suggested (have the planner
assume > 0 pages when it sees actually 0 pages) but without seeing a
concrete example I can't tell if that will fix the complaint or not.

regards, tom lane


From: Keith Browne <tuxedo(at)deepsky(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Cc: andrew(at)supernews(dot)com, Brian O'Reilly <fade(at)deepsky(dot)com>
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-23 20:55:07
Message-ID: 4241D7AB.4030607@deepsky.com
Lists: pgsql-bugs pgsql-performance

Tom Lane wrote:

> I'm still looking for an example that demonstrates why this is a common
> problem that we need to worry about. ISTM that if an FK reference is
> hit when there are still zero entries in the referenced table, that
> insertion will fail anyway, and so people wouldn't try to load data in
> such an order.

Tom,

We're filling pairs of tables with rows having nearly a one-to-one
mapping; very rarely, the second table will have multiple rows
corresponding to one row in the first table. When we insert the first
row in the second table, therefore, we've just put the corresponding row
into the first table, so the foreign key constraint is satisfied.

I can't say how common this sort of thing will be. It appears to me
that BUG #1541 is similar to what we're seeing, and a search of the
mailing lists also turns up this message:

http://archives.postgresql.org/pgsql-performance/2004-11/msg00416.php

which also describes symptoms similar to what I'm seeing.

> We could band-aid this in 8.0 as previously suggested (have the planner
> assume > 0 pages when it sees actually 0 pages) but without seeing a
> concrete example I can't tell if that will fix the complaint or not.

It sounds like this could work for us, if it would keep the planner from
choosing sequential scans on a table which grows from 0 to more than
60,000 rows in one session. Is breaking and re-establishing the database
session the best workaround, or is there a better way to provide a hint
to the planner?

Regards,

Keith Browne
tuxedo(at)deepsky(dot)com


From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-23 21:26:55
Message-ID: slrnd43nov.rhd.andrew+nonews@trinity.supernews.net
Lists: pgsql-bugs pgsql-performance

On 2005-03-23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:
>> Changing the order so that the referenced table is fully populated, or at
>> least populated with more than a handful of pages of rows, before doing
>> _any_ insert on a referencing table in the same session will avoid the
>> misplan of the FK trigger queries, because when the first insert happens
>> on a referencing table, there will be no reason for the planner to prefer
>> a sequential scan. So this result is not surprising at all.
>
> I'm still looking for an example that demonstrates why this is a common
> problem that we need to worry about. ISTM that if an FK reference is
> hit when there are still zero entries in the referenced table, that
> insertion will fail anyway, and so people wouldn't try to load data in
> such an order.

Think "1 row", not "0 rows".

It is not reasonable to assume that _all_ cases of data loading (other than
perhaps the very largest) will be done by loading entire tables at a time,
especially when importing from external sources where the data is
differently structured.

> We could band-aid this in 8.0 as previously suggested (have the planner
> assume > 0 pages when it sees actually 0 pages) but without seeing a
> concrete example I can't tell if that will fix the complaint or not.

It won't; the problem is with 1 page, not 0.

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Keith Browne <tuxedo(at)deepsky(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, andrew(at)supernews(dot)com, "Brian O'Reilly" <fade(at)deepsky(dot)com>
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-23 22:13:40
Message-ID: 14846.1111616020@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-performance

Keith Browne <tuxedo(at)deepsky(dot)com> writes:
> Tom Lane wrote:
>> I'm still looking for an example that demonstrates why this is a common
>> problem that we need to worry about.

> We're filling pairs of tables with rows having nearly a one-to-one
> mapping; very rarely, the second table will have multiple rows
> corresponding to one row in the first table. When we insert the first
> row in the second table, therefore, we've just put the corresponding row
> into the first table, so the foreign key constraint is satisfied.

Hmm ...

>> We could band-aid this in 8.0 as previously suggested (have the planner
>> assume > 0 pages when it sees actually 0 pages) but without seeing a
>> concrete example I can't tell if that will fix the complaint or not.

> It sounds like this could work for us,

No, it wouldn't, because by the time you do the first FK trigger you'd
have one row/one page in the referenced table, so it'd still look like a
seqscan situation to the planner. The only way we could make that work
is to effectively disable seqscans entirely, by *always* pretending the
table size is large enough to trigger an indexscan, even when the
planner can plainly see that it's not. This is not an acceptable answer
IMHO.

[ thinks for a bit... ] The reason 7.4 and before worked reasonably
for you is that they assumed the 10/1000 statistics for any
never-yet-vacuumed table, whether it is empty or not. (This worked fine
for your problem but shot a lot of other people in the foot, because
that's what the estimate would stay at even if the table grew vastly
larger, so long as it wasn't vacuumed.) Maybe we could
put in a hack that detects whether a table has yet been vacuumed, and
sets 10/1000 as the minimum stats --- not fixed values, but minimum
values that can be overridden when the table is actually larger ---
until it has been vacuumed. I'm not sure if this is workable. It looks
to me like we'd have to approximate the "never vacuumed" condition by
checking whether pg_class.reltuples and relpages are both zero, which
is the initial condition all right but would also arise after a vacuum
finds nothing in the table. So basically the planner would never
optimize the entirely-empty-table condition properly, even after vacuum.
Maybe this is the least bad alternative for 8.0.*.

regards, tom lane


From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-24 09:14:25
Message-ID: slrnd4517h.rhd.andrew+nonews@trinity.supernews.net
Lists: pgsql-bugs pgsql-performance

On 2005-03-23, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> No, it wouldn't, because by the time you do the first FK trigger you'd
> have one row/one page in the referenced table, so it'd still look like a
> seqscan situation to the planner. The only way we could make that work
> is to effectively disable seqscans entirely, by *always* pretending the
> table size is large enough to trigger an indexscan, even when the
> planner can plainly see that it's not. This is not an acceptable answer
> IMHO.

I'm not yet convinced the planner is right to _ever_ choose a seqscan for
FK triggers. The idea that a seqscan is faster on small tables is
traditional, and it has some justification in the case where nothing is
in the cache (since index scan will touch the disk twice in that case),
but I'm finding that for tables on the order of 50 rows (easily fitting
in one page), index scans are as fast as or faster than seqscans for
simple one-row lookups, provided the tables are in cache.
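
One way to check this on any given box, assuming psycopg2 and the bug
report's schema: run the same one-row lookup with sequential scans
allowed and then disallowed, and compare the EXPLAIN ANALYZE timings.

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

for setting in ("on", "off"):
    cur.execute("SET enable_seqscan = " + setting)
    cur.execute(
        "EXPLAIN ANALYZE SELECT 1 FROM ONLY requirements WHERE reqid = 10")
    print("enable_seqscan =", setting)
    for (line,) in cur.fetchall():
        print("   ", line)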

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Keith Browne <tuxedo(at)deepsky(dot)com>, pgsql-bugs(at)postgresql(dot)org, andrew(at)supernews(dot)com, "Brian O'Reilly" <fade(at)deepsky(dot)com>
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-24 19:22:22
Message-ID: 4841.1111692142@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-performance

I wrote:
> ... Maybe we could
> put in a hack that detects whether a table has yet been vacuumed, and
> sets 10/1000 as the minimum stats --- not fixed values, but minimum
> values that can be overridden when the table is actually larger ---
> until it has been vacuumed.

For lack of any better suggestions, I've done this in HEAD and 8.0
branches. It proved simplest to just limit the page estimate to be
at least 10 pages when relpages == 0. The tuple estimate will be
derived from that using pre-existing code that estimates the average
tuple size.

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Keith Browne <tuxedo(at)deepsky(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, Brian O'Reilly <fade(at)deepsky(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-25 10:18:37
Message-ID: 1111745917.11750.713.camel@localhost.localdomain
Lists: pgsql-bugs pgsql-performance

On Wed, 2005-03-23 at 14:22 -0500, Keith Browne wrote:
> Simon Riggs wrote:
>
> > The EXPLAINs you've enclosed are for SELECTs, yet your bug report
> > describes INSERTs as being the things that are slow.
> > [You may find better performance from using COPY]

> We're starting with an empty database, creating four tables, and
> populating those tables with a total of 180,000-200,000 rows. Each
> table has a primary key, and several of the tables reference foreign
> keys in other tables. We've written a Python script, using psycopg,
> which executes all the queries to create the tables and insert the rows.
> The database is running on the same machine where the script runs.

<snip>

> One other thing which puzzled me: as a test, I tried modifying our
> script to spit out raw SQL statements instead of connecting to the
> database and performing the inserts itself. Normally, our script
> populates two tables in one pass, and then populates the third and
> fourth tables in a second pass. I massaged the SQL by hand to group the
> inserts together by table, so that the first table would be entirely
> populated, then the second, etc. When I ran this SQL script by piping
> it straight into psql, it finished in about four minutes. This is
> comparable to the time it takes to run my modified script which breaks
> and re-establishes the connection to the database.

OK. Not-a-bug.

Your situation is covered in the manual, with some sage advice:
http://www.postgresql.org/docs/8.0/static/populate.html
It doesn't go into much detail about why those recommendations are good
ones, but they are clear.

There isn't anything in there (yet) that says "turn off Referential
Integrity too", and perhaps it should...

The tables you are loading all refer to one another with referential
constraints? Possibly a master-detail relationship, or two major
entities joined via an associative one. The plan is bad because your FKs
point to what are initially empty tables. The best thing to do would be
to add the RI constraints after the tables are loaded, rather than
adding them before.
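
A sketch of that approach for the requirements table, assuming psycopg2
(placeholder data; binary_packages assumed already created and loaded):
create the table without the FK, bulk-load it, then add the constraint
once at the end.

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

cur.execute("""CREATE TABLE requirements
  (reqid bigint PRIMARY KEY,
   name text,
   version text)""")

for i in range(10000):
    cur.execute(
        "INSERT INTO requirements (reqid, name, version) "
        "VALUES (%s, %s, %s)", (i, "pkg%d" % i, "1.0"))

# The constraint is validated once, in bulk, instead of firing a
# per-row trigger whose plan was cached while the table was empty.
cur.execute("""ALTER TABLE requirements
  ADD FOREIGN KEY (name, version) REFERENCES binary_packages""")
conn.commit()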

Your program issues a Prepare statement followed by thousands of
Execute statements. This avoids much of the overhead of optimization,
since the plan is cached early in that sequence of executes. The plan
thus remains the same all the way through, though as you observe, it
isn't optimal: the initial plan saw an empty table, and the table
didn't stay empty for long. Breaking the connection and reattaching
forces the plan to be re-evaluated; when this is done after the point
at which a more optimal plan would be generated, your further inserts
use the better plan, and work continues as fast as before.

psql doesn't suffer from this problem because it doesn't use prepared
statements. That means you pay the cost of planning each SQL statement
at execution time, but gain the benefit of an immediate plan change at
the optimal moment.
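
A toy demonstration of the difference, assuming psycopg2 and the bug
report's schema: the plan for the SQL-level prepared statement is fixed
when PREPARE runs against the empty table and reused by every EXECUTE,
while a plain query is planned afresh each time it is run.

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

# Planned now, while requirements is empty, and cached for the session.
cur.execute(
    "PREPARE fk_check (bigint) AS "
    "SELECT 1 FROM ONLY requirements WHERE reqid = $1")

cur.execute("EXECUTE fk_check (10)")      # reuses the cached plan

# A plain statement is re-planned at execution time, so it can switch
# to the index once the table has grown.
cur.execute("SELECT 1 FROM ONLY requirements WHERE reqid = %s", (10,))

# Starting a new session (or DEALLOCATE + re-PREPARE) forces a replan.
cur.execute("DEALLOCATE fk_check")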

I think we should spawn a TODO item from this:

* Coerce FK lookups to always use an available index

but that in itself isn't a certain fix and might cause other
difficulties elsewhere.

Best Regards, Simon Riggs


From: Karim Nassar <karim(dot)nassar(at)NAU(dot)EDU>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Keith Browne <tuxedo(at)deepsky(dot)com>, Brian O'Reilly <fade(at)deepsky(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #1552: massive performance hit between 7.4
Date: 2005-03-25 10:50:36
Message-ID: 1111747837.898.19.camel@k2.cet.nau.edu
Lists: pgsql-bugs pgsql-performance

On Fri, 2005-03-25 at 10:18 +0000, Simon Riggs wrote:
> > When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
> > with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were
> > able to insert all this data in 5-7 minutes. It's taken a while to
> > install Postgres 8.0.1 on the same machine, but now I have, and it's
> > taking 40-45 minutes to run the same insert script.

<snip>

> OK. Not-a-bug.
>
> Your situation is covered in the manual, with some sage advice:
> http://www.postgresql.org/docs/8.0/static/populate.html
> It doesn't go into much detail about why those recommendations are good
> ones, but they are clear.

Simon, this raises the question: what changed from 7.4 to 8.0 that
requires him to modify his script?

TIA,
--
Karim Nassar
Department of Computer Science
Box 15600, College of Engineering and Natural Sciences
Northern Arizona University, Flagstaff, Arizona 86011
Office: (928) 523-5868 -=- Mobile: (928) 699-9221


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Keith Browne <tuxedo(at)deepsky(dot)com>, pgsql-bugs(at)postgresql(dot)org, "Brian O'Reilly" <fade(at)deepsky(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: BUG #1552: massive performance hit between 7.4 and 8.0.1
Date: 2005-03-25 14:41:09
Message-ID: 23228.1111761669@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-performance

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> I think we should spawn a TODO item from this:
> * Coerce FK lookups to always use an available index

No, we aren't doing that.

The correct TODO item is "Replan cached plans when table size has
changed a lot" which of course depends on having a framework to do
replanning at all. I intend to take a look at that once Neil has
created such a framework ...

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Karim Nassar <karim(dot)nassar(at)NAU(dot)EDU>
Cc: Keith Browne <tuxedo(at)deepsky(dot)com>, Brian O'Reilly <fade(at)deepsky(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #1552: massive performance hit
Date: 2005-03-25 15:38:25
Message-ID: 1111765105.11750.749.camel@localhost.localdomain
Lists: pgsql-bugs pgsql-performance

On Fri, 2005-03-25 at 03:50 -0700, Karim Nassar wrote:
> On Fri, 2005-03-25 at 10:18 +0000, Simon Riggs wrote:
> > > When running with PostgreSQL 7.4 on a dual-CPU Athlon MP2400+ machine
> > > with a gigabyte of RAM, running Debian Linux version 2.6.8.1, we were
> > > able to insert all this data in 5-7 minutes. It's taken a while to
> > > install Postgres 8.0.1 on the same machine, but now I have, and it's
> > > taking 40-45 minutes to run the same insert script.
>
> <snip>
>
> > OK. Not-a-bug.
> >
> > Your situation is covered in the manual, with some sage advice:
> > http://www.postgresql.org/docs/8.0/static/populate.html
> > It doesn't go into much detail about why those recommendations are good
> > ones, but they are clear.

> Simon, this raises the question: what changed from 7.4 to 8.0 that
> requires him to modify his script?

Good question. Clearly, some combination of the statistics and
index-selection code changed, but I suspect what is affecting us here is
more accuracy, not less.

The FK code literally generates SQL statements, then prepares them.
AFAICS it should be possible to add code to
src/backend/utils/adt/ri_triggers.c to force the prepare of the FK
queries to avoid seq scans by executing "SET enable_seqscan = off;"
I'll have a play....

But the wider point raised by this is whether Prepare should be more
conservative in the plan it generates. When we execute a single query,
it is perfectly OK to go for the "best" plan, since it is being executed
only this once and we can tell, right now, which one is best.

A Prepared query is clearly going to be executed many times, so we
should consider that the optimal plan may change over time.

Index access has more overhead for small tables, but its cost grows
only as (I think) log N in the number of rows N, while sequential scan
cost grows linearly in N. Thus, as N increases from zero, a seq scan is
at first the best plan, though only marginally better than index
access; this changes at some value of N, and after that index access is
the best plan. As N keeps increasing, seq scan cost clearly diverges
from indexed access.

The conservative choice for unknown or varying N would be index access,
rather than whatever plan happens to be best at the moment the query is
prepared.

I propose a more general TODO item:

* Make Prepared queries always use indexed access, if it is available

Best Regards, Simon Riggs