Re: [PERFORM] Slow BLOBs restoring

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Slow BLOBs restoring
Date: 2010-12-09 13:05:33
Message-ID: AANLkTinvRa_XM88FV8vJjjqkRONeNiQiY=Lxg5iARqtU@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Thu, Dec 9, 2010 at 12:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Vlad Arkhipov <arhipov(at)dc(dot)baikal(dot)ru> writes:
>> 08.12.2010 22:46, Tom Lane writes:
>>> Are you by any chance restoring from an 8.3 or older pg_dump file made
>>> on Windows?  If so, it's a known issue.
>
>> No, I tried Linux only.
>
> OK, then it's not the missing-data-offsets issue.
>
>> I think you can reproduce it. First I created a database full of many
>> BLOBs on Postgres 8.4.5. Then I created a dump:
>
> Oh, you should have said how many was "many".  I had tried with several
> thousand large blobs yesterday and didn't see any problem.  However,
> with several hundred thousand small blobs, indeed it gets pretty slow
> as soon as you use -j.
>
> oprofile shows all the time is going into reduce_dependencies during the
> first loop in restore_toc_entries_parallel (ie, before we've actually
> started doing anything in parallel).  The reason is that for each blob,
> we're iterating through all of the several hundred thousand TOC entries,
> uselessly looking for anything that depends on the blob.  And to add
> insult to injury, because the blobs are all marked as SECTION_PRE_DATA,
> we don't get to parallelize at all.  I think we won't get to parallelize
> the blob data restoration either, since all the blob data is hidden in a
> single TOC entry :-(
>
> So the short answer is "don't bother to use -j in a mostly-blobs restore,
> because it isn't going to help you in 9.0".
>
> One fairly simple, if ugly, thing we could do about this is skip calling
> reduce_dependencies during the first loop if the TOC object is a blob;
> effectively assuming that nothing could depend on a blob.  But that does
> nothing about the point that we're failing to parallelize blob
> restoration.  Right offhand it seems hard to do much about that without
> some changes to the archive representation of blobs.  Some things that
> might be worth looking at for 9.1:
>
> * Add a flag to TOC objects saying "this object has no dependencies",
> to provide a generalized and principled way to skip the
> reduce_dependencies loop.  This is only a good idea if pg_dump knows
> that or can cheaply determine it at dump time, but I think it can.
>
> * Mark BLOB TOC entries as SECTION_DATA, or somehow otherwise make them
> parallelizable.  Also break the BLOBS data item apart into an item per
> BLOB, so that that part's parallelizable.  Maybe we should combine the
> metadata and data for each blob into one TOC item --- if we don't, it
> seems like we need a dependency, which will put us back behind the
> eight-ball.  I think the reason it's like this is we didn't originally
> have a separate TOC item per blob; but now that we added that to support
> per-blob ACL data, the monolithic BLOBS item seems pretty pointless.
> (Another thing that would have to be looked at here is the dependency
> between a BLOB and any BLOB COMMENT for it.)
>
> Thoughts?
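The cost Tom describes can be illustrated with a toy model of the serial first loop. This is a minimal sketch, not the actual pg_restore code. `TocEntry`, its `no_dependents` field, `reduce_dependencies`, `first_pass` and `make_blob_toc` are hypothetical stand-ins. The model assumes, as described above, that each `reduce_dependencies` call scans the whole TOC. It represents the proposed flag (or the blob-only skip) as a boolean meaning "nothing else in the archive waits on this entry", which lets the loop bypass that scan.

```python
from dataclasses import dataclass, field


@dataclass
class TocEntry:
    """Hypothetical, simplified stand-in for a pg_restore TOC entry."""
    dump_id: int
    desc: str                                     # e.g. "SCHEMA", "TABLE", "BLOB"
    deps: set[int] = field(default_factory=set)   # dump IDs this entry still waits on
    no_dependents: bool = False                   # hypothetical "nothing depends on me" flag


def reduce_dependencies(toc: list[TocEntry], done: TocEntry) -> int:
    """Remove `done` from every entry's dependency set.

    Scans the whole TOC, so each call costs O(len(toc)) no matter how many
    entries actually depend on `done`. Returns the number of entries examined.
    """
    examined = 0
    for te in toc:
        examined += 1
        te.deps.discard(done.dump_id)
    return examined


def first_pass(toc: list[TocEntry], use_flag: bool) -> int:
    """Serially 'restore' every entry; return total entries examined."""
    examined = 0
    for te in toc:
        # ... the object itself would be restored here ...
        if use_flag and te.no_dependents:
            continue  # skip the scan: by assumption nothing waits on this entry
        examined += reduce_dependencies(toc, te)
    return examined


def make_blob_toc(n_blobs: int) -> list[TocEntry]:
    """One non-blob entry plus n_blobs BLOB entries flagged as having no dependents."""
    toc = [TocEntry(0, "SCHEMA")]
    toc += [TocEntry(i, "BLOB", no_dependents=True) for i in range(1, n_blobs + 1)]
    return toc
```

For 1000 blobs, `first_pass(make_blob_toc(1000), use_flag=False)` examines 1001 x 1001 entries, while `use_flag=True` examines only 1001, because only the single non-blob entry triggers a scan. The shortcut is only safe if pg_dump can guarantee the flag at dump time, which is the caveat raised above.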

Is there any use case for restoring a BLOB but not the BLOB COMMENT or
BLOB ACLs? Can we just smush everything together into one section?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
