Inefficient handling of LO-restore + Patch

From: "Mario Weilguni" <mario(dot)weilguni(at)icomedias(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Inefficient handling of LO-restore + Patch
Date: 2002-04-11 08:56:02
Message-ID: D143FBF049570C4BB99D962DC25FC2D204B858@freedom.icomedias.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've detected that the restoring of large objects may consume huge amounts of
diskspace when using unusual blocksizes (e.g. 32KB). My setup is Postgresql-7.2.1
+ 32KB blocks + LOBLKSIZE 16KB, a unusual combination I think, , because this setup gave
the very best performance. I wanted to restore a database containing 2 gigabytes of
large objects, and noticed that it took around 6 gigabytes of diskspace to finish.
After it finished I ran "VACUUM FULL VERBOSE pg_largeobject",
and had around 140000 of live tuples, and around 480000 of dead tuples (I don't remember the exact
values, but I think there were 3 times dead tuples to live tuples).

I checked the pg_dump sources and found out that data is writen in 4KB chunks to the large object.
Since in my database the LO tuples are 16KB each, that would mean:
1. write 4KB -> have 1 live 4KB tuple
2. write 4KB -> 1 live 8KB tuple and 1 dead 4KB tuple
3. write 4KB -> 1 live 12KB tuple and 2 dead tuples
3. write 4KB -> 1 live 16KB tuple and 3 dead tuples

So creating a 16KB chunk took 16+12+8+4 => 40KB of diskspace, so recovering 2GB large objects
takes around 40/16 * 2 => 5GB diskspace and leaves 3 times the number of dead tuples (supposing
all LO's have sizes which are multples of 16KB).

I've written a patch which buffers LO's in 32KB blocks and tested again, and had 140000 live tuples
and nearly no dead tuples (around 10000, I'm still not sure where they're coming from).

Is there a better way to fix this? Can I post the patch to this list (~150 lines).And I did not find out how I can detect the large object chunksize, either from getting it from the headers (include "storage/large_object.h" did not work) or how to get it from the database I restore to. Any hints?

Best regards,
Mario Weilguni

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Christopher Kings-Lynne 2002-04-11 09:10:59 Make text output more generic
Previous Message Nicolas Bazin 2002-04-11 06:35:51 Re: 7.3 schedule