Re: Ripping out pg_restore's attempts to parse SQL before sending it

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Ripping out pg_restore's attempts to parse SQL before sending it
Date: 2011-07-28 00:30:49
Message-ID: 20451.1311813049@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

In
http://archives.postgresql.org/message-id/201107270042.22427.julian@mehnle.net
it's pointed out that pg_restore in direct-to-database mode is pretty
badly broken if standard_conforming_strings=on. The reason is that it
tries to lex SQL commands well enough to find line boundaries, and the
code that does so has never heard of standard_conforming_strings.
While we could possibly teach it about that, I have a more radical
solution in mind: I think we should get rid of that code entirely.

The reason it's attempting to do this is so that it can switch between
send-SQL and send-copy-data modes at line boundaries. However, there
really is no reason to do that, because COPY data is stored separately
from SQL commands in pg_dump archives, and has been for a very long
time. So we could simply shove the whole archive entry into either
PQexec or PQputCopyData, saving a lot of complexity and probably a
nontrivial number of cycles.

While I've not yet done any excavation in the commit logs to confirm
this, the nearby comments in the code indicate that separation of COPY
data from SQL commands was adopted in archive format version 1.3, which
is ancient. In fact, it's so ancient that there was never a production
release of pg_dump that created pre-1.3 archives --- only 7.1devel
versions ever produced those. It's thus a fairly safe bet that no such
archive files exist in the field. And just to add insult to injury,
there's this in RestoreArchive():

if (AH->version < K_VERS_1_3)
die_horribly(AH, modulename, "direct database connections are not supported in pre-1.3 archives\n");

which means that we wouldn't get to the code that is trying to parse
the SQL commands if we did have a pre-1.3 archive to read.

So as far as I can tell, this code is as dead as code can possibly be,
and we ought to rip it out instead of adding still another layer of
complexity to it.

I don't have a problem with back-patching a fix of that ilk to 9.1,
but I'm less sure that it's appropriate to do so in pre-9.1 branches.
Given the lack of prior complaints, maybe we don't have to fix the
older branches, although certainly something must be done in 9.1.

Opinions?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Ripping out pg_restore's attempts to parse SQL before sending it
Date: 2011-07-28 03:28:00
Message-ID: 24743.1311823680@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> While I've not yet done any excavation in the commit logs to confirm
> this, the nearby comments in the code indicate that separation of COPY
> data from SQL commands was adopted in archive format version 1.3, which
> is ancient. In fact, it's so ancient that there was never a production
> release of pg_dump that created pre-1.3 archives --- only 7.1devel
> versions ever produced those.

I've now poked around in the commit history, and found that pg_dump's
archive format, as well as pg_restore itself, were introduced in commit
500b62b05 of 2000-07-04. The 1.3 format revision was added in commit
e8f69be05 of 2000-07-21. So not only did the prior formats not get
written by any production releases, there was only an interval of a
couple of weeks where they'd even have been emitted by development HEAD.

regards, tom lane