Re: pg_dump's checkSeek() seems inadequate

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: pg_dump's checkSeek() seems inadequate
Date: 2010-06-27 17:42:11
Message-ID: 28254.1277660531@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While testing a fix for the pg_restore issues mentioned a few days ago,
I noticed that checkSeek() returns true on my old HPUX box even when the
input is in fact not seekable, for instance a pipe. This leads to
pg_restore failing completely in cases where it ought to work, like this:

$ cat vector.dump | pg_restore -d vector
pg_restore: [custom archiver] WARNING: ftell mismatch with expected position -- ftell used
pg_restore: [custom archiver] error during file seek: Illegal seek
$

The test that checkSeek() is using is to see whether this works:

fseeko(fp, 0, SEEK_CUR)

and apparently on this platform that's a no-op even on an otherwise
unseekable file. I don't know how common this behavior is; I don't
recall having seen complaints about pg_restore failing for other people.

If I change the test to be

fseeko(fp, 0, SEEK_SET)

then it does the right thing. Since checkSeek() is applied immediately
after opening the input file this should be OK, but it does limit the
scope of usefulness of that function.

Any thoughts about whether or not to apply such a patch? If it should
be changed, should we back-patch it?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-27 18:33:41
Message-ID: 28889.1277663621@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> The test that checkSeek() is using is to see whether this works:
> fseeko(fp, 0, SEEK_CUR)
> and apparently on this platform that's a no-op even on an otherwise
> unseekable file.

BTW, I looked in the archives for related discussions and found only the
thread in which the current fseek-checking code was designed. There
were a couple of mentions that SEEK_CUR should be changed to SEEK_SET, eg
http://archives.postgresql.org/pgsql-hackers/2002-10/msg01088.php
but it was apparently focused around coping with specific seek APIs
that didn't have SEEK_CUR at all. We ended up not using those so the
change was never made. The thought that a no-op seek might not give
the right answer wasn't discussed AFAICT.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-27 22:05:49
Message-ID: AANLkTiko4KJ0Bc8Rbw_7eby8r9vuNPSu_oLd6ImCqeQq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jun 27, 2010 at 1:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> If I change the test to be
>
>        fseeko(fp, 0, SEEK_SET)
>
> then it does the right thing.  Since checkSeek() is applied immediately
> after opening the input file this should be OK, but it does limit the
> scope of usefulness of that function.
>
> Any thoughts about whether or not to apply such a patch?  If it should
> be changed, should we back-patch it?

Well, I guess it depends on what you think the chances are that the
revised test will fail on some other obscure platform. Have there
been any reports from the field? If not, I might apply to HEAD and
await developments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-27 22:19:02
Message-ID: 2508.1277677142@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Jun 27, 2010 at 1:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If I change the test to be
>> fseeko(fp, 0, SEEK_SET)
>> then it does the right thing.

> Well, I guess it depends on what you think the chances are that the
> revised test will fail on some other obscure platform.

To believe that, you'd have to believe that fseeko(fp, 0, SEEK_SET)
will fail but fseeko(fp, something-not-zero, SEEK_SET) will succeed.

A somewhat more plausible scenario is that somebody might hope that
they could do something like this:

echo 'some custom header' >pg.dump
pg_dump -Fc >>pg.dump

I believe that (at least on most Unixen) doing fseeko(fp, 0, SEEK_SET)
would result in overwriting the custom header, where it would not have
been overwritten before. However the usefulness of the above is at
best far-fetched; and I'm not very sure that it works today anyway,
since pg_dump/pg_restore seem to assume that manual byte counting should
match the results of ftell().

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-28 00:42:43
Message-ID: AANLkTinfAT40PNuyiOPdjcgRWzgqO0nDmOSoukf-CxcE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jun 27, 2010 at 6:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Jun 27, 2010 at 1:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If I change the test to be
>>>        fseeko(fp, 0, SEEK_SET)
>>> then it does the right thing.
>
>> Well, I guess it depends on what you think the chances are that the
>> revised test will fail on some other obscure platform.
>
> To believe that, you'd have to believe that fseeko(fp, 0, SEEK_SET)
> will fail but fseeko(fp, something-not-zero, SEEK_SET) will succeed.
>
> A somewhat more plausible scenario is that somebody might hope that
> they could do something like this:
>
>        echo 'some custom header' >pg.dump
>        pg_dump -Fc >>pg.dump
>
> I believe that (at least on most Unixen) doing fseeko(fp, 0, SEEK_SET)
> would result in overwriting the custom header, where it would not have
> been overwritten before.  However the usefulness of the above is at
> best far-fetched; and I'm not very sure that it works today anyway,
> since pg_dump/pg_restore seem to assume that manual byte counting should
> match the results of ftell().

That doesn't actually sound all that far-fetched.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-28 00:55:52
Message-ID: 4908.1277686552@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Jun 27, 2010 at 6:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> A somewhat more plausible scenario is that somebody might hope that
>> they could do something like this:
>>
>> echo 'some custom header' >pg.dump
>> pg_dump -Fc >>pg.dump

> That doesn't actually sound all that far-fetched.

I've got my doubts, but I guess we could write an even-more-bulletproof
test by doing ftello() and then seeing if fseeko to that position
complains. This might be a good thing anyway since most of the other
uses of ftello aren't really checking for errors ...

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-28 01:25:18
Message-ID: 4C27F9FE.10209@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> A somewhat more plausible scenario is that somebody might hope that
> they could do something like this:
>
> echo 'some custom header' >pg.dump
> pg_dump -Fc >>pg.dump
>
> I believe that (at least on most Unixen) doing fseeko(fp, 0, SEEK_SET)
> would result in overwriting the custom header, where it would not have
> been overwritten before. However the usefulness of the above is at
> best far-fetched; and I'm not very sure that it works today anyway,
> since pg_dump/pg_restore seem to assume that manual byte counting should
> match the results of ftell().
>
>
>

What would anyone hope to achieve by such a manoeuvre, even if it
worked, which I am close the dead sure it would not?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-28 02:24:21
Message-ID: 6345.1277691861@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> A somewhat more plausible scenario is that somebody might hope that
>> they could do something like this:
>>
>> echo 'some custom header' >pg.dump
>> pg_dump -Fc >>pg.dump

> What would anyone hope to achieve by such a manoeuvre, even if it
> worked, which I am close the dead sure it would not?

It looks to me like it probably would actually work, so far as pg_dump
is concerned, but _discoverArchiveFormat() would break it because that
tries to do an unconditional fseeko(fp, 0, SEEK_SET) (and the position
counting is screwed up even if the fseeko fails). That could probably
be fixed if anyone thought this scenario was interesting enough to
justify work directed specifically at it. I did the ftello/fseeko dance
in checkSeek() just now because it seems sensible to me to have
checkSeek() actually verify functionality of both of those calls,
not because I think it's real likely that the position won't be 0.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-28 11:32:17
Message-ID: 4C288841.7030601@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> A somewhat more plausible scenario is that somebody might hope that
>>> they could do something like this:
>>>
>>> echo 'some custom header' >pg.dump
>>> pg_dump -Fc >>pg.dump
>>>
>
>
>> What would anyone hope to achieve by such a manoeuvre, even if it
>> worked, which I am close the dead sure it would not?
>>
>
> It looks to me like it probably would actually work, so far as pg_dump
> is concerned, but _discoverArchiveFormat() would break it because that
> tries to do an unconditional fseeko(fp, 0, SEEK_SET) (and the position
> counting is screwed up even if the fseeko fails). That could probably
> be fixed if anyone thought this scenario was interesting enough to
> justify work directed specifically at it.
>

IIRC pg_restore expects the archive to begin with the header and TOC
produced by pg_dump. Maybe I'm being dense, but I'm not able to see how
prefixing that with something else could possibly do something useful or
workable.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-28 14:08:55
Message-ID: 26772.1277734135@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>>>> echo 'some custom header' >pg.dump
>>>> pg_dump -Fc >>pg.dump

> IIRC pg_restore expects the archive to begin with the header and TOC
> produced by pg_dump. Maybe I'm being dense, but I'm not able to see how
> prefixing that with something else could possibly do something useful or
> workable.

You'd have to do something like

(1) open the file as stdin
(2) read the custom header
(3) exec pg_restore, telling it to read stdin

A possible application for this would be for the header to contain
information needed to prepare pg_restore's arguments, like where to
contact the server. I still think it's a bit too klugy to justify
the effort though.

In a more abstract sense, what this would be is basically a custom
label for a dump file. I could see that being useful, but if we
wanted to support it then it ought to be an actual Feature and not
a kluge like this. Something like

pg_dump --label 'any string' ... other args ...

pg_restore --print-label dump.file

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump's checkSeek() seems inadequate
Date: 2010-06-28 15:19:19
Message-ID: 4C28BD77.8070907@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> In a more abstract sense, what this would be is basically a custom
> label for a dump file. I could see that being useful, but if we
> wanted to support it then it ought to be an actual Feature and not
> a kluge like this. Something like
>
> pg_dump --label 'any string' ... other args ...
>
> pg_restore --print-label dump.file
>
>
>

Right, or possibly a file that would be added. I don't think we should
waste any effort at all on the kluge, or worrying about whether or not
it would work.

cheers

andrew