Always include encoding of database in pg_dumpall

Lists: pgsql-hackers
From: Jeremy Evans <code(at)jeremyevans(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Always include encoding of database in pg_dumpall
Date: 2012-10-18 17:44:33
Message-ID: 20121018174433.GL4122@jeremyevans.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've have a PostgreSQL database cluster that I've continually upgraded
from 7.1 to 9.1 without problems using pg_dumpall and psql. When
migrating to 9.2, I decided to change the default encoding for the
database cluster from SQL_ASCII to UTF8. When I went to restore my
database backup (created using 9.1's pg_dumpall), the tables
containing data not in UTF8 format were empty.

This appears to be caused by pg_dumpall omitting the ENCODING when
dumping if it is the same as the current database cluster's default
encoding. I believe this is a bad idea, as there is no guarantee the
backup will be restored on a cluster with the same default encoding.

This patch always sets the ENCODING for the databases, even if it is
the same as the current database cluster default encoding. Thoughts?

I'm not sure if similar changes should be made for LC_COLLATE or
LC_CTYPE, but that's something that should be considered.

Please CC me when responding as I don't currently subscribe to the
list.

Thanks,
Jeremy

Attachment Content-Type Size
pg_dumpall_encoding.diff text/plain 660 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Evans <code(at)jeremyevans(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Always include encoding of database in pg_dumpall
Date: 2012-10-19 22:55:02
Message-ID: 13332.1350687302@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Evans <code(at)jeremyevans(dot)net> writes:
> I've have a PostgreSQL database cluster that I've continually upgraded
> from 7.1 to 9.1 without problems using pg_dumpall and psql. When
> migrating to 9.2, I decided to change the default encoding for the
> database cluster from SQL_ASCII to UTF8. When I went to restore my
> database backup (created using 9.1's pg_dumpall), the tables
> containing data not in UTF8 format were empty.

I assume there were some errors reported?

> This appears to be caused by pg_dumpall omitting the ENCODING when
> dumping if it is the same as the current database cluster's default
> encoding. I believe this is a bad idea, as there is no guarantee the
> backup will be restored on a cluster with the same default encoding.

I'm not persuaded by this argument. The choice to omit default encoding
and locale settings was intentional; if you choose to load into an
installation with different defaults, it's probably because you *want*
to change the encoding/locale you're using. What you propose would make
it impossible to do that. Indeed, I'd have thought that was exactly
the use-case you were trying to accomplish here.

> I'm not sure if similar changes should be made for LC_COLLATE or
> LC_CTYPE, but that's something that should be considered.

It would be a particularly bad idea to change the behavior for the
locale names, because those aren't consistent across platforms.
If we didn't suppress them when possible, cross-platform restores
would likely become impossible.

regards, tom lane


From: Jeremy Evans <code(at)jeremyevans(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Always include encoding of database in pg_dumpall
Date: 2012-10-19 23:19:19
Message-ID: 20121019231919.GP4122@jeremyevans.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/19 06:55, Tom Lane wrote:
> Jeremy Evans <code(at)jeremyevans(dot)net> writes:
> > I've have a PostgreSQL database cluster that I've continually upgraded
> > from 7.1 to 9.1 without problems using pg_dumpall and psql. When
> > migrating to 9.2, I decided to change the default encoding for the
> > database cluster from SQL_ASCII to UTF8. When I went to restore my
> > database backup (created using 9.1's pg_dumpall), the tables
> > containing data not in UTF8 format were empty.
>
> I assume there were some errors reported?

Yes. Of course, using psql to restore the dump, the error messages are
interspersed with all of the notices output, making it harder to see the
errors.

> > This appears to be caused by pg_dumpall omitting the ENCODING when
> > dumping if it is the same as the current database cluster's default
> > encoding. I believe this is a bad idea, as there is no guarantee the
> > backup will be restored on a cluster with the same default encoding.
>
> I'm not persuaded by this argument. The choice to omit default encoding
> and locale settings was intentional; if you choose to load into an
> installation with different defaults, it's probably because you *want*
> to change the encoding/locale you're using. What you propose would make
> it impossible to do that. Indeed, I'd have thought that was exactly
> the use-case you were trying to accomplish here.

My use case was just to change the default encoding to UTF8 so I
wouldn't have to set the encoding manually when creating databases in
the future, I didn't want to change the encoding of any of my current
databases.

I think the current behavior gives preference to usability at the
expense of reliability, but I understand that that might be an
appropriate decision to make.

> > I'm not sure if similar changes should be made for LC_COLLATE or
> > LC_CTYPE, but that's something that should be considered.
>
> It would be a particularly bad idea to change the behavior for the
> locale names, because those aren't consistent across platforms.
> If we didn't suppress them when possible, cross-platform restores
> would likely become impossible.

That makes sense.

Thanks for considering my patch. :)

Jeremy