Re: size of pg_dump files containing bytea values

Lists: pgsql-performance
From: "Steve McWilliams" <smcwilliams(at)EmprisaNetworks(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: size of pg_dump files containing bytea values
Date: 2006-07-12 18:36:27
Message-ID: 3985.10.1.1.126.1152729387.squirrel@portal.emprisanetworks.com

I notice that non-printables in bytea values are being spit out by pg_dump
using escaped octet sequences even when the "-Fc" option is present
specifying use of the custom binary output format rather than plain text
format. This bloats the size of bytea values in the dump file by a factor
of 3+ typically. When you have a lot of large bytea values in your db this
can add up very quickly.

Shouldn't the custom format be smart and just write the raw bytes to the
output file rather than trying to make them ascii readable?
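
To make the size effect concrete, here is a minimal Python sketch (not PostgreSQL code) of the escaped-octet encoding described above. It assumes that printable ASCII is written as-is, that a backslash is doubled, that every other byte becomes a backslash followed by three octal digits, and that the text COPY format used in dumps doubles each backslash once more. The function names are hypothetical.

```python
def bytea_escape(data: bytes) -> str:
    """Approximate the bytea 'escape' text output.

    Printable ASCII is kept, a backslash is doubled, and every other
    byte is written as a backslash plus three octal digits.
    """
    out = []
    for b in data:
        if b == 0x5C:                # backslash
            out.append("\\\\")
        elif 0x20 <= b <= 0x7E:      # printable ASCII
            out.append(chr(b))
        else:                        # non-printable: \ooo
            out.append("\\%03o" % b)
    return "".join(out)


def copy_text_escape(field: str) -> str:
    """Assumption: the text COPY format doubles every backslash in a field."""
    return field.replace("\\", "\\\\")


def expansion_factor(data: bytes) -> float:
    """Characters written to a plain-text dump per byte of raw data."""
    return len(copy_text_escape(bytea_escape(data))) / len(data)


if __name__ == "__main__":
    sample = bytes(range(256))       # every byte value exactly once
    print(bytea_escape(b"xyz\x01"))
    print(round(expansion_factor(sample), 2))
```

Under these assumptions, data whose byte values are spread evenly over 0..255 grows by a factor of roughly 3.5 before any compression, which is consistent with the "3+" figure above. Mostly printable data would grow far less.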

Thanks.

Steve McWilliams
Software Engineer
Emprisa Networks
703-691-0433x21
smcwilliams(at)emprisanetworks(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Steve McWilliams" <smcwilliams(at)EmprisaNetworks(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: size of pg_dump files containing bytea values
Date: 2006-07-13 02:53:25
Message-ID: 1820.1152759205@sss.pgh.pa.us
Lists: pgsql-performance

"Steve McWilliams" <smcwilliams(at)EmprisaNetworks(dot)com> writes:
> I notice that non-printables in bytea values are being spit out by pg_dump
> using escaped octet sequences even when the "-Fc" option is present
> specifying use of the custom binary output format rather than plain text
> format. This bloats the size of bytea values in the dump file by a factor
> of 3+ typically.

No, because the subsequent compression step should buy back most of
that.
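
One rough way to check this claim on your own data is to compare sizes before and after compression. The Python sketch below is only an approximation: it uses zlib from the standard library as a stand-in for the compression step, random bytes as a worst case for the raw data, and a hypothetical dump_escape helper that assumes non-printable bytes are written as two backslashes plus three octal digits.

```python
import random
import zlib


def dump_escape(data: bytes) -> bytes:
    """Assumed dump-file form of a bytea value (escape format plus COPY doubling)."""
    out = []
    for b in data:
        if b == 0x5C:
            out.append("\\\\\\\\")        # one backslash becomes four characters
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))            # printable ASCII is kept as-is
        else:
            out.append("\\\\%03o" % b)    # two backslashes plus three octal digits
    return "".join(out).encode("ascii")


def compare_sizes(n: int = 20000, seed: int = 0) -> dict:
    """Return raw, escaped, and zlib-compressed sizes for n pseudo-random bytes."""
    rng = random.Random(seed)
    raw = bytes(rng.randrange(256) for _ in range(n))
    escaped = dump_escape(raw)
    return {
        "raw": len(raw),
        "escaped": len(escaped),
        "raw_zlib": len(zlib.compress(raw)),
        "escaped_zlib": len(zlib.compress(escaped)),
    }


if __name__ == "__main__":
    for name, size in compare_sizes().items():
        print(f"{name:>12}: {size}")
```

The comparison of interest is "escaped_zlib" against "raw": the closer they are, the more of the escaping overhead the compressor has recovered. The result depends heavily on the data, so it is worth running on a sample of real values.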

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Steve McWilliams" <smcwilliams(at)EmprisaNetworks(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: size of pg_dump files containing bytea values
Date: 2006-07-13 16:30:50
Message-ID: 87k66hii11.fsf@stark.xeocode.com
Lists: pgsql-performance

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Steve McWilliams" <smcwilliams(at)EmprisaNetworks(dot)com> writes:
> > I notice that non-printables in bytea values are being spit out by pg_dump
> > using escaped octet sequences even when the "-Fc" option is present
> > specifying use of the custom binary output format rather than plain text
> > format. This bloats the size of bytea values in the dump file by a factor
> > of 3+ typically.
>
> No, because the subsequent compression step should buy back most of
> that.

Didn't byteas used to get printed as hex? Even in psql they're now being
printed in the escaped octet sequence. When did this change?

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Steve McWilliams" <smcwilliams(at)EmprisaNetworks(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: size of pg_dump files containing bytea values
Date: 2006-07-13 17:42:41
Message-ID: 16771.1152812561@sss.pgh.pa.us
Lists: pgsql-performance

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Didn't byteas used to get printed as hex?

No, not that I recall. I don't have anything older than 7.0 running,
but it behaves the same as now:

play=> select 'xyz\\001'::bytea;
 ?column?
----------
 xyz\001
(1 row)

play=>

regards, tom lane


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Steve McWilliams" <smcwilliams(at)EmprisaNetworks(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: size of pg_dump files containing bytea values
Date: 2006-07-14 07:05:31
Message-ID: 82k66g64zo.fsf@mid.bfk.de
Lists: pgsql-performance

* Greg Stark:

> Didn't byteas used to get printed as hex?

No, they didn't. It would be useful to support hexadecimal BYTEA
literals, though. Unfortunately, X'DEADBEEF' has already been taken
by bit strings.
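
The appeal of a hexadecimal form is its fixed cost: two characters per byte, whatever the content. The following Python sketch only illustrates that property and is not PostgreSQL syntax; hex_encode and hex_decode are hypothetical helper names.

```python
def hex_encode(data: bytes) -> str:
    """Encode bytes as upper-case hexadecimal, two characters per byte."""
    return data.hex().upper()


def hex_decode(text: str) -> bytes:
    """Decode a hexadecimal string back into the original bytes."""
    return bytes.fromhex(text)


if __name__ == "__main__":
    value = b"\xde\xad\xbe\xef"
    encoded = hex_encode(value)
    print(encoded, len(encoded) / len(value))
    assert hex_decode(encoded) == value
```

Because the expansion is always exactly 2x, the encoded size is predictable, unlike the escaped-octet form, whose size depends on how many bytes are non-printable.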

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Durlacher Allee 47 tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99