Re: Dump size bigger than pgdata size?

From: "Nicola Mauri" <Nicola(dot)Mauri(at)saga(dot)it>
To: pgsql-admin(at)postgresql(dot)org
Subject: Dump size bigger than pgdata size?
Date: 2006-06-21 14:30:17
Message-ID: OF406B6E88.F5B37D76-ONC1257194.004D5FAC-C1257194.004F9F44@saga.it
Lists: pgsql-admin

[sorry if this was previously asked: list searches seem to be down]

I'm using pg_dump to take a full backup of my database using a compressed
format:
$ pg_dump -Fc my_db > /backup/my_db.dmp

It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
$ ls -l /backup
-rw-r--r-- 6592715242 my_db.dmp
$ du -b /data
5372269196 /data

How can that be?
As far as I know, a dump should be smaller than the on-disk data files,
since it does not store indexes, etc.

The database contains about one hundred thousand binary images, some of
which may already be compressed. So I tried the --compress=0 option, but
that produces a dump that does not fit on my disk (more than 11 GB).
I'm using PostgreSQL 8.1.2 on RHEL4.

So, what can I do to diagnose the problem?
Thanks in advance,

Nicola


From: <alex(dot)cotarlan(at)thomson(dot)com>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Dump size bigger than pgdata size?
Date: 2006-06-21 14:37:32
Message-ID: ED5A90704BD7744380E00432EEE165230276A8B4@TFUSNYNYCMBX01.ERF.THOMSON.COM
Lists: pgsql-admin

It might happen because of the type of data you have (binary images).
Compression of binary files is notoriously poor, since there is little
chance of repeated byte sequences for the compressor to exploit.

In other words, the dump can grow: compression adds extra bytes for
checksums and bookkeeping, and on incompressible data there is nothing
to offset that overhead.

This would normally only be noticeable on small binary files, though.
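
A quick way to see the effect (a sketch; any already-compressed file,
such as a JPEG, will do):

$ gzip -c photo.jpg | wc -c             # compress once
$ gzip -c photo.jpg | gzip -c | wc -c   # compress the compressed output

The second number is usually slightly larger than the first: gzip adds
its header and CRC but finds nothing left to squeeze.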

-----Original Message-----
From: pgsql-admin-owner(at)postgresql(dot)org
[mailto:pgsql-admin-owner(at)postgresql(dot)org] On Behalf Of Nicola Mauri
Sent: Wednesday, June 21, 2006 10:30 AM
To: pgsql-admin(at)postgresql(dot)org
Subject: [ADMIN] Dump size bigger than pgdata size?

[sorry if this was previously asked: list searches seem to be down]

I'm using pg_dump to take a full backup of my database using a
compressed format:
$ pg_dump -Fc my_db > /backup/my_db.dmp

It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:

$ ls -l /backup
-rw-r--r-- 6592715242 my_db.dmp
$ du -b /data
5372269196 /data

How can that be?
As far as I know, a dump should be smaller than the on-disk data files,
since it does not store indexes, etc.

The database contains about one hundred thousand binary images, some of
which may already be compressed. So I tried the --compress=0 option, but
that produces a dump that does not fit on my disk (more than 11 GB).
I'm using PostgreSQL 8.1.2 on RHEL4.

So, what can I do to diagnose the problem?
Thanks in advance,

Nicola


From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: "Nicola Mauri" <Nicola(dot)Mauri(at)saga(dot)it>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Dump size bigger than pgdata size?
Date: 2006-06-21 14:54:50
Message-ID: bf05e51c0606210754o489d2fb7h2c0c6de60f93a2c5@mail.gmail.com
Lists: pgsql-admin

I would dare guess, and it seems you suspect as well, that the binary data
is why you are not getting very good compression.

You may try dumping the tables individually with --table=table to see
which tables are taking the most space in your dump. Once you know
which ones are largest, you can check what is in those tables and
provide more details on the problem.
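
For example (a sketch, assuming the size functions that shipped with
8.1; my_table stands for whichever table you want to isolate):

$ psql my_db -c "SELECT relname, pg_size_pretty(pg_relation_size(oid))
      FROM pg_class WHERE relkind = 'r'
      ORDER BY pg_relation_size(oid) DESC LIMIT 10;"
$ pg_dump -Fc --table=my_table my_db > /backup/my_table.dmp

The first command lists the biggest tables on disk; the second lets you
compare disk size against dump size table by table.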

Personally, I don't use the built-in compression in pg_dump but pipe
the output to gzip instead (not sure if it makes any difference). See
http://manual.intl.indoglobal.com/ch06s07.html for details.
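
Something like (a sketch; plain-text dump piped through gzip):

$ pg_dump my_db | gzip > /backup/my_db.sql.gz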

-Aaron

On 6/21/06, Nicola Mauri <Nicola(dot)Mauri(at)saga(dot)it> wrote:
>
>
> [sorry if this was previously asked: list searches seem to be down]
>
> I'm using pg_dump to take a full backup of my database using a compressed
> format:
> $ pg_dump -Fc my_db > /backup/my_db.dmp
>
> It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
> $ ls -l /backup
> -rw-r--r-- 6592715242 my_db.dmp
> $ du -b /data
> 5372269196 /data
>
> How can that be?
> As far as I know, a dump should be smaller than the on-disk data files,
> since it does not store indexes, etc.
>
> The database contains about one hundred thousand binary images, some of
> which may already be compressed. So I tried the --compress=0 option, but
> that produces a dump that does not fit on my disk (more than 11 GB).
> I'm using PostgreSQL 8.1.2 on RHEL4.
>
> So, what can I do to diagnose the problem?
> Thanks in advance,
>
> Nicola
>
>
>
==================================================================
Aaron Bono
President Aranya Software Technologies, Inc.
http://www.aranya.com We take care of your technology needs.
Phone: (816) 695-6071
==================================================================


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Nicola Mauri" <Nicola(dot)Mauri(at)saga(dot)it>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Dump size bigger than pgdata size?
Date: 2006-06-21 17:00:22
Message-ID: 11486.1150909222@sss.pgh.pa.us
Lists: pgsql-admin

"Nicola Mauri" <Nicola(dot)Mauri(at)saga(dot)it> writes:
> I'm using pg_dump to take a full backup of my database using a compressed
> format:
> $ pg_dump -Fc my_db > /backup/my_db.dmp

> It produces a 6 GB file whereas the pgdata uses only 5 GB of disk space:
> ...
> The database contains about one hundred thousand binary images, some of
> which may already be compressed.

Are those stored as bytea fields, or large objects, or what?

I can easily imagine bytea inflating to be larger in a dump than it is
on disk, since the text format for non-ASCII byte values looks like "\\nnn",
i.e. 5 bytes for only one byte on disk. Assuming that ASCII bytes make up
about half of the data, the average expansion factor would be about 3x
(half the bytes at one character each, half at five). Compression of the
dump text would buy back some of this bloat but probably not all.
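
One way to see the expansion per row (a sketch; the images table and
its data column are hypothetical, and COPY text output doubles the
backslash on top of what encode() shows):

$ psql my_db -c "SELECT octet_length(data) AS raw_bytes,
      length(encode(data, 'escape')) AS escape_chars
      FROM images LIMIT 5;"

Rows where escape_chars is several times raw_bytes are the ones
inflating the dump.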

This could be avoided by using COPY BINARY format, but I don't see any
very nice way to do that in the context of pg_dump --- it needs to
intermix COPY data with SQL commands ...
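
For a single table it can be done by hand, outside pg_dump (a sketch;
a server-side output file needs superuser, and the table name is
hypothetical):

$ psql my_db -c "COPY images TO '/tmp/images.dat' WITH BINARY"

The file can be reloaded with COPY ... FROM ... WITH BINARY, but only
into a table with exactly matching column types.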

regards, tom lane


From: Jim Nasby <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Nicola Mauri" <Nicola(dot)Mauri(at)saga(dot)it>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Dump size bigger than pgdata size?
Date: 2006-06-23 01:58:14
Message-ID: 5F1778F3-ACEA-44C3-B895-D651EF62CBBA@pervasive.com
Lists: pgsql-admin

On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
> This could be avoided by using COPY BINARY format, but I don't see any
> very nice way to do that in the context of pg_dump --- it needs to
> intermix COPY data with SQL commands ...

Would the tar or custom format allow for this? IIRC, at least tar
puts all the copied data into separate files...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <jnasby(at)pervasive(dot)com>
Cc: "Nicola Mauri" <Nicola(dot)Mauri(at)saga(dot)it>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Dump size bigger than pgdata size?
Date: 2006-06-23 02:39:06
Message-ID: 10263.1151030346@sss.pgh.pa.us
Lists: pgsql-admin

Jim Nasby <jnasby(at)pervasive(dot)com> writes:
> On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
>> This could be avoided by using COPY BINARY format, but I don't see any
>> very nice way to do that in the context of pg_dump --- it needs to
>> intermix COPY data with SQL commands ...

> Would the tar or custom format allow for this? IIRC, at least tar
> puts all the copied data into separate files...

Well, you could sorta do that, but the case that would stop working is
pg_restore output to a plain text SQL script (and related issues such as
the ability to use the feature in the context of pg_dumpall). Having
just gotten done fixing similar inconsistencies in pg_dump/pg_restore
for BLOBs, I'm not eager to re-introduce 'em for COPY BINARY ...

regards, tom lane


From: Jim Nasby <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Nicola Mauri" <Nicola(dot)Mauri(at)saga(dot)it>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Dump size bigger than pgdata size?
Date: 2006-06-28 00:15:39
Message-ID: 219F801D-2039-4729-8CEC-B9EEC0C2F3BF@pervasive.com
Lists: pgsql-admin

On Jun 22, 2006, at 9:39 PM, Tom Lane wrote:
> Jim Nasby <jnasby(at)pervasive(dot)com> writes:
>> On Jun 21, 2006, at 12:00 PM, Tom Lane wrote:
>>> This could be avoided by using COPY BINARY format, but I don't see
>>> any very nice way to do that in the context of pg_dump --- it needs
>>> to intermix COPY data with SQL commands ...
>
>> Would the tar or custom format allow for this? IIRC, at least tar
>> puts all the copied data into separate files...
>
> Well, you could sorta do that, but the case that would stop working is
> pg_restore output to a plain text SQL script (and related issues such
> as the ability to use the feature in the context of pg_dumpall). Having
> just gotten done fixing similar inconsistencies in pg_dump/pg_restore
> for BLOBs, I'm not eager to re-introduce 'em for COPY BINARY ...

Yeah, but how many people actually do that anyway? I can't really come
up with a use case for it, and I'm pretty sure there are other gains
to be had by turning the custom or tar format into more of a 'binary
dump'. For one thing, that should ease the need to run the newer
version of pg_dump when upgrading (if we put the requisite brains into
pg_restore).

I suppose we could put support in pg_restore to convert between BINARY
and escaped format as needed; or just disallow pg_restore from dumping
SQL if there's binary data (maybe have it include COPY statements that
reference the specific files).
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461