pg_dump issues

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_dump issues
Date: 2011-10-01 21:08:02
Message-ID: 4E878132.4080903@dunslane.net
Lists: pgsql-hackers

While investigating a client problem I just observed that pg_dump takes
a surprisingly large amount of time to dump a schema with a large number
of views. The client's hardware is quite spiffy, and yet pg_dump is
taking many minutes to dump a schema with some 35,000 views. Here's a
simple test case:

create schema views;
do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
|| i ||$$ as select current_date as d, current_timestamp as ts,
$_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
loop; end;';

On my modest hardware this database took 4m18.864s for pg_dump to run.
Should we be looking at replacing the retail operations which consume
most of this time with something that runs faster?
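
To be concrete about "retail operations": pg_dump issues one
pg_get_viewdef() query per view (one such query appears in the error
log further down). Purely as a sketch of the idea, not a worked-out
proposal, a single batched query could fetch every view definition in
one round trip:

select c.oid, pg_catalog.pg_get_viewdef(c.oid) as viewdef
from pg_catalog.pg_class c
where c.relkind = 'v';   -- 'v' = view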

There is also this gem of behaviour, which is where I started:

p1                       p2
begin;
drop view foo;
                         pg_dump
commit;
                         boom.

with this error:

2011-10-01 16:38:20 EDT [27084] 30063 ERROR: could not open
relation with OID 133640
2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT: SELECT
pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef

Of course, this isn't caused by having a large catalog, but it's
terrible nevertheless. I'm not sure what to do about it.

cheers

andrew


From: Joe Abbate <jma(at)freedomcircle(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-01 21:48:10
Message-ID: 4E878A9A.20509@freedomcircle.com
Lists: pgsql-hackers

On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
> While investigating a client problem I just observed that pg_dump takes
> a surprisingly large amount of time to dump a schema with a large number
> of views. The client's hardware is quite spiffy, and yet pg_dump is
> taking many minutes to dump a schema with some 35,000 views. Here's a
> simple test case:
>
> create schema views;
> do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
> || i ||$$ as select current_date as d, current_timestamp as ts,
> $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
> loop; end;';
>
>
> On my modest hardware this database took 4m18.864s for pg_dump to run.
> Should we be looking at replacing the retail operations which consume
> most of this time with something that runs faster?

How modest? Was there anything else in the database? I tried with 9000
views (because I didn't want to bother increasing
max_locks_per_transaction), and pg_dump ran in less than 10 seconds
(8.991s) with plain-text output redirected to a file (this is on a Core i5).

> There is also this gem of behaviour, which is where I started:
>
> p1                       p2
> begin;
> drop view foo;
>                          pg_dump
> commit;
>                          boom.
>
> with this error:
>
> 2011-10-01 16:38:20 EDT [27084] 30063 ERROR: could not open
> relation with OID 133640
> 2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT: SELECT
> pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>
> Of course, this isn't caused by having a large catalog, but it's
> terrible nevertheless. I'm not sure what to do about it.

Couldn't you run pg_dump with --lock-wait-timeout?

Joe


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joe Abbate <jma(at)freedomcircle(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-02 01:46:49
Message-ID: 4E87C289.7060102@dunslane.net
Lists: pgsql-hackers

On 10/01/2011 05:48 PM, Joe Abbate wrote:
> On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
>> While investigating a client problem I just observed that pg_dump takes
>> a surprisingly large amount of time to dump a schema with a large number
>> of views. The client's hardware is quite spiffy, and yet pg_dump is
>> taking many minutes to dump a schema with some 35,000 views. Here's a
>> simple test case:
>>
>> create schema views;
>> do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>> || i ||$$ as select current_date as d, current_timestamp as ts,
>> $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>> loop; end;';
>>
>>
>> On my modest hardware this database took 4m18.864s for pg_dump to run.
>> Should we be looking at replacing the retail operations which consume
>> most of this time with something that runs faster?
> How modest? Was there anything else in the database? I tried with 9000
> views (because I didn't want to bother increasing
> max_locks_per_transaction), and pg_dump ran in less than 10 seconds
> (8.991s) with plain-text output redirected to a file (this is on a Core i5).

Yeah, it must be pretty modest :-) On more powerful h/w I get the same.
I need to dig further into why it's taking so long to dump my client's
schema on server class hardware.

>> There is also this gem of behaviour, which is where I started:
>>
>> p1                       p2
>> begin;
>> drop view foo;
>>                          pg_dump
>> commit;
>>                          boom.
>>
>> with this error:
>>
>> 2011-10-01 16:38:20 EDT [27084] 30063 ERROR: could not open
>> relation with OID 133640
>> 2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT: SELECT
>> pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>>
>> Of course, this isn't caused by having a large catalog, but it's
>> terrible nevertheless. I'm not sure what to do about it.
> Couldn't you run pg_dump with --lock-wait-timeout?
>

How would that help? This isn't a lock failure.

cheers

andrew


From: Joe Abbate <jma(at)freedomcircle(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-02 21:15:53
Message-ID: 4E88D489.7060102@freedomcircle.com
Lists: pgsql-hackers

Hi Andrew,

On 10/01/2011 09:46 PM, Andrew Dunstan wrote:
>
>
> On 10/01/2011 05:48 PM, Joe Abbate wrote:
>> On 10/01/2011 05:08 PM, Andrew Dunstan wrote:
>>> There is also this gem of behaviour, which is where I started:
>>>
>>> p1                       p2
>>> begin;
>>> drop view foo;
>>>                          pg_dump
>>> commit;
>>>                          boom.
>>>
>>> with this error:
>>>
>>> 2011-10-01 16:38:20 EDT [27084] 30063 ERROR: could not open
>>> relation with OID 133640
>>> 2011-10-01 16:38:20 EDT [27084] 30064 STATEMENT: SELECT
>>> pg_catalog.pg_get_viewdef('133640'::pg_catalog.oid) AS viewdef
>>>
>>> Of course, this isn't caused by having a large catalog, but it's
>>> terrible nevertheless. I'm not sure what to do about it.
>> Couldn't you run pg_dump with --lock-wait-timeout?
>>
>
> How would that help? This isn't a lock failure.

I misinterpreted the error. I have confirmed the behavior you see. I'm
somewhat surprised there appears to be no way to lock a database
exclusively for something like pg_dump/pg_restore, so that it isn't
surprised by concurrent activity against the catalogs. I'm guessing the
assumption is that MVCC will take care of that?

Joe


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-03 04:47:18
Message-ID: 23954.1317617238@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> While investigating a client problem I just observed that pg_dump takes
> a surprisingly large amount of time to dump a schema with a large number
> of views. The client's hardware is quite spiffy, and yet pg_dump is
> taking many minutes to dump a schema with some 35,000 views. Here's a
> simple test case:

> create schema views;
> do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
> || i ||$$ as select current_date as d, current_timestamp as ts,
> $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
> loop; end;';

> On my modest hardware this database took 4m18.864s for pg_dump to run.

It takes about that on my machine too ... with --enable-cassert.
oprofile said that 90% of the runtime was going into AllocSetCheck,
so I rebuilt without cassert, and the runtime dropped to 16 seconds.
What were you testing?

(Without cassert, it looks like LockReassignCurrentOwner is the next
biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
in there.)

regards, tom lane


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Joe Abbate <jma(at)freedomcircle(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-03 08:30:55
Message-ID: 24B1FA58-50E4-41ED-B237-05540DABA25B@phlo.org
Lists: pgsql-hackers

On Oct2, 2011, at 23:15 , Joe Abbate wrote:
> I'm
> somewhat surprised there appears to be no ability to lock a database
> exclusively for something like pg_dump/pg_restore, so you're not
> surprised by concurrent activity against the catalogs. I'm guessing the
> assumption is that MVCC will take care of that?

I think the hope is more that it will, one day. Currently, postgres internally
accesses the catalog with SnapshotNow, not with an MVCC snapshot. This is
necessary to ensure, for example, that rows inserted into a table also get
inserted into a newly created index. This wouldn't affect pg_dump if it only
accessed the catalog via SQL, but it doesn't. pg_dump also depends on some server-side
functions to do its work, and since these functions in turn use SnapshotNow-based
internal backend functions, pg_dump essentially uses a mix of SnapshotNow and
its transaction's MVCC snapshot.
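
The mixture is visible without pg_dump at all. Roughly, as a sketch
(exact behaviour and error text may vary by version):

-- session 1: take an MVCC snapshot that still sees the view
begin transaction isolation level repeatable read;
select oid from pg_class where relname = 'v_1';   -- note the OID

-- session 2: drop the view
drop view views.v_1;

-- session 1: the transaction's snapshot still shows the pg_class row,
-- but pg_get_viewdef()'s internal lookups see the current catalog
-- state, so it fails much like pg_dump did above:
select pg_catalog.pg_get_viewdef(133640::pg_catalog.oid);  -- substitute the OID noted above
-- ERROR:  could not open relation with OID 133640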

There has been talk about reducing the use of SnapshotNow for catalog access,
but AFAIK there's no concrete proposal, and certainly no patch, available.

best regards,
Florian Pflug


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joe Abbate <jma(at)freedomcircle(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-03 15:04:31
Message-ID: CA+TgmobrML0=8YbrkKvsGv-1Sc9HQ7oXZ8bhdmunoOz_i-tYSg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Oct 1, 2011 at 9:46 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> How would that help? This isn't a lock failure.

It is, rather, a failure to lock. Currently, LOCK TABLE only works on
tables, and pg_dump only applies it to tables. If the offending
object had been a table rather than a view, pg_dump would (I believe)
have blocked trying to obtain an AccessShareLock against the existing
AccessExclusiveLock. We talked about allowing locks on other types of
relations, but due to some bad syntax choices in the past it's not
completely obvious how to shoehorn that in.

http://archives.postgresql.org/pgsql-hackers/2011-06/msg00119.php
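
For instance (a quick illustration on releases of this vintage; the
exact error text may differ):

lock table views.v_1 in access share mode;
-- ERROR:  "v_1" is not a table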

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joe Abbate <jma(at)freedomcircle(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-03 15:21:02
Message-ID: 15826.1317655262@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Oct 1, 2011 at 9:46 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> How would that help? This isn't a lock failure.

> It is, rather, a failure to lock. Currently, LOCK TABLE only works on
> tables, and pg_dump only applies it to tables. If the offending
> object had been a table rather than a view, pg_dump would (I believe)
> have blocked trying to obtain an AccessShareLock against the existing
> AccessExclusiveLock.

Yeah, and it would still have failed once the lock was released.

Short of providing some sort of global DDL-blocking lock (with attendant
performance consequences) it's not clear how to create an entirely
bulletproof solution here. This isn't a new problem --- we've been
aware of pg_dump's limitations in this area for many years.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-03 15:38:27
Message-ID: 1317656240-sup-777@alvh.no-ip.org
Lists: pgsql-hackers


Excerpts from Tom Lane's message of Mon Oct 03 01:47:18 -0300 2011:

> (Without cassert, it looks like LockReassignCurrentOwner is the next
> biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
> in there.)

That seems fishy. Even if there weren't quadratic behavior, should this
be called at all? AFAIK it should only be used in cases where there are
subtransactions at work, and I don't think pg_dump uses them.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-03 15:39:05
Message-ID: 4E89D719.80705@dunslane.net
Lists: pgsql-hackers

On 10/03/2011 12:47 AM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> While investigating a client problem I just observed that pg_dump takes
>> a surprisingly large amount of time to dump a schema with a large number
>> of views. The client's hardware is quite spiffy, and yet pg_dump is
>> taking many minutes to dump a schema with some 35,000 views. Here's a
>> simple test case:
>> create schema views;
>> do 'begin for i in 1 .. 10000 loop execute $$create view views.v_$$
>> || i ||$$ as select current_date as d, current_timestamp as ts,
>> $_$a$_$::text || n as t, n from generate_series(1,5) as n$$; end
>> loop; end;';
>> On my modest hardware this database took 4m18.864s for pg_dump to run.
> It takes about that on my machine too ... with --enable-cassert.
> oprofile said that 90% of the runtime was going into AllocSetCheck,
> so I rebuilt without cassert, and the runtime dropped to 16 seconds.
> What were you testing?

Yeah, you're right, that must have been it. That's a big hit; I didn't
realise cassert was so heavy. (Note to self: test with production build
settings.) I don't seem to be batting 1000 ...

I still need to get to the bottom of why the client's machine is taking
so long.

I do notice that we seem to be doing a lot of repetitive tasks, e.g.
calling pg_format_type() over and over for the same arguments. Would we
be better off caching that?
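
Illustration only -- I haven't checked the exact query text pg_dump
builds -- but the shape of the repeated lookup, versus a single query
that could feed a client-side cache, would be roughly:

-- issued over and over, once per lookup:
select pg_catalog.format_type('23'::pg_catalog.oid, NULL);

-- versus fetching them all once and caching the results in pg_dump:
select oid, pg_catalog.format_type(oid, NULL) from pg_catalog.pg_type;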

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump issues
Date: 2011-10-03 16:02:14
Message-ID: 16626.1317657734@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Tom Lane's message of lun oct 03 01:47:18 -0300 2011:
>> (Without cassert, it looks like LockReassignCurrentOwner is the next
>> biggest time sink; I'm wondering if there's some sort of O(N^2) behavior
>> in there.)

> That seems fishy. Even if there weren't quadratic behavior, should this
> be called at all? AFAIK it should only be used on cases where there are
> subtransactions at work, and I don't think pg_dump uses them.

I wondered that too, but the calls are legit --- they're coming from
PortalDrop.

It appears that most of the calls don't actually have anything to do,
but they're iterating through a rather large local lock table to find
that out. We probably ought to think of a way to avoid that. The trick
is to not make performance worse for typical small transactions that
aren't holding many locks (which I think was the design center for this
to begin with).

regards, tom lane