Re: 64-bit API for large objects

Lists: pgsql-hackers
From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: 64-bit API for large objects
Date: 2005-09-18 16:13:00
Message-ID: 432D920C.5030709@markdilger.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

My company has written a 64-bit large object API, extending the postgresql
server to be able to read/write/seek/tell/open/close objects larger than 2GB.
If the hackers community considers this valuable, we will submit the changes
back for the rest of the community to share.

From one of my programmers, Jeremy Drake:

I tested this out on my box with a 4gb dvd iso image, and it appears to
work correctly. The test code I found for large object things does not
really seem to exercise the api very well though. And the regression
tests do not seem to even touch large objects (they all still pass after
this change).

Mark, can you take a look at this and make sure I haven't broken anything
too obviously?

I wrote it into the same file as the large object code, since that file
has some static stuff for caching things which I would like to share. I
opted to add new functions tell64 and seek64 rather than changing the
existing ones for backwards compatibility. I plugged them into the
pg_proc catalog, but everything in that file has an explicit OID, and I do
not feel comfortable (yet) grabbing up OIDs for stuff. So I set them to
an OID of zero, which means the scripts will assign it one which is not
used (in the range 10000-something). Since the convention is that such
functions have explicit assigned OIDs, it would probably be required to
get real ones if this were ever to be submitted back. Also, in the libpq
stuff, at the moment I have it fail if it cannot find the seek64 or tell64
functions. It may be best to have it work as long as you don't try to
call them, in order to preserve backwards compatibility with other server
versions.

If you think this is a reasonable patch, it might be nice to send it to
them, be a good neighbor and all that...


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: Mark Dilger <pgsql(at)markdilger(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 64-bit API for large objects
Date: 2005-09-19 11:56:43
Message-ID: 36e68292050919045617795fd3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark,

If you don't mind contributing the changes, we'd be glad to take a look at
them. Thanks.

-Jonah

On 9/18/05, Mark Dilger <pgsql(at)markdilger(dot)com> wrote:
>
> My company has written a 64-bit large object API, extending the postgresql
> server to be able to read/write/seek/tell/open/close objects larger than
> 2GB.
> If the hackers community considers this valuable, we will submit the
> changes
> back for the rest of the community to share.
>
>
> From one of my programmers, Jeremy Drake:
>
> I tested this out on my box with a 4gb dvd iso image, and it appears to
> work correctly. The test code I found for large object things does not
> really seem to exercise the api very well though. And the regression
> tests do not seem to even touch large objects (they all still pass after
> this change).
>
> Mark, can you take a look at this and make sure I haven't broken anything
> too obviously?
>
> I wrote it into the same file as the large object code, since that file
> has some static stuff for caching things which I would like to share. I
> opted to add new functions tell64 and seek64 rather than changing the
> existing ones for backwards compatibility. I plugged them into the
> pg_proc catalog, but everything in that file has an explicit OID, and I do
> not feel comfortable (yet) grabbing up OIDs for stuff. So I set them to
> an OID of zero, which means the scripts will assign it one which is not
> used (in the range 10000-something). Since the convention is that such
> functions have explicit assigned OIDs, it would probably be required to
> get real ones if this were ever to be submitted back. Also, in the libpq
> stuff, at the moment I have it fail if it cannot find the seek64 or tell64
> functions. It may be best to have it work as long as you don't try to
> call them, in order to preserve backwards compatibility with other server
> versions.
>
> If you think this is a reasonable patch, it might be nice to send it to
> them, be a good neighbor and all that...
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/


From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 64-bit API for large objects
Date: 2005-09-20 00:47:39
Message-ID: 432F5C2B.6040003@markdilger.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jonah H. Harris wrote:
> Mark,
>
> If you don't mind contributing the changes, we'd be glad to take a look
> at them. Thanks.
>
> -Jonah
>

Ok, we will post it back soon. We have tested it on two different 64-bit
architectures (Sparc and AMD) and are now testing on pentium before posting up
to the list.

mark


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: Mark Dilger <pgsql(at)markdilger(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 64-bit API for large objects
Date: 2005-09-20 12:15:22
Message-ID: 36e68292050920051566c86ea9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Cool. We look forward to it.

On 9/19/05, Mark Dilger <pgsql(at)markdilger(dot)com> wrote:
>
> Jonah H. Harris wrote:
> > Mark,
> >
> > If you don't mind contributing the changes, we'd be glad to take a look
> > at them. Thanks.
> >
> > -Jonah
> >
>
> Ok, we will post it back soon. We have tested it on two different 64-bit
> architectures (Sparc and AMD) and are now testing on pentium before
> posting up
> to the list.
>
> mark
>

--
Respectfully,

Jonah H. Harris, Database Internals Architect
EnterpriseDB Corporation
http://www.enterprisedb.com/


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-23 18:07:30
Message-ID: Pine.LNX.4.63.0509231040590.5158@garibaldi.apptechsys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This patch implements the ability for large objects to be larger than 2GB.
I believe the limit to now be about 2TB, based on the fact that the large
object page size is 2048 bytes, and the page number is still 32 bits.

There are a few things about this patch which probably require tweaking or
at least a closer inspection from the list.

1) The lo_*64 functions are added to the catalog/pg_proc.h (spacing exact
location atm) with OID set to 0, all other entries in this file have OIDs
explicitly defined.

2) The lo_*64, in order to be convenient from the client end, have
functions added to libpq as the existing lo_* functions. The client side
of libpq did not previously know anything about int64 or how to
send/receive them. I added an include of postgres-fe.h (which according
to the comment in that file looks like it should go there) so int64 would
be defined, also implemented functions (code mostly stolen from the server
libpq format functions for same) to convert them to/from network byte
order. I did this in a somewhat inconsistent way between the get and put,
as I did not want to change the existing api at all, and existing code as
little as possible.

3) The 32 bit box I tested this on was a PII 300MHz laptop. Not exactly
the fastest. The "test" consisted entirely of making sure it compiled.
Perhaps someone with a fast IA32 box and spare cycles can test it? Also,
so far the only platforms I have tried to compile this on have been:

* Linux 2.6 (gentoo), AMD64, gcc-3.4.4
* Solaris 8, SPARCv9, gcc-3.4.2
* Linux 2.6 (debian unstable), i686, gcc-3.4.x (laptop, don't remember
exact version).

Would probably be a good idea to verify this on other platforms as well,
or at least other compilers.

Hopefully I did not break anything too badly with this. All of the
regression tests still pass after the patch, and I made a version of the
tests/examples/testlo which uses 64bit (in the patch) which works also. I
grepped in the regression tests, and I could not find any usage of large
objects in them, which I found to be rather odd, which is why I used
testlo and my new testlo64 to test them instead.

On Tue, 20 Sep 2005, Jonah H. Harris wrote:

> Cool. We look forward to it.
>
> On 9/19/05, Mark Dilger <pgsql(at)markdilger(dot)com> wrote:
> >
> > Jonah H. Harris wrote:
> > > Mark,
> > >
> > > If you don't mind contributing the changes, we'd be glad to take a look
> > > at them. Thanks.
> > >
> > > -Jonah
> > >
> >
> > Ok, we will post it back soon. We have tested it on two different 64-bit
> > architectures (Sparc and AMD) and are now testing on pentium before
> > posting up
> > to the list.
> >
> > mark
> >
>
>
>
> --
> Respectfully,
>
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> http://www.enterprisedb.com/
>

--
Mere nonexistence is a feeble excuse for declaring a thing unseeable. You
*can* see dragons. You just have to look in the right direction.
-- John Hasler

Attachment Content-Type Size
postgres-8.0.3-64bit-lo.patch text/plain 22.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-23 21:40:09
Message-ID: 6801.1127511609@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> 2) The lo_*64, in order to be convenient from the client end, have
> functions added to libpq as the existing lo_* functions. The client side
> of libpq did not previously know anything about int64 or how to
> send/receive them. I added an include of postgres-fe.h (which according
> to the comment in that file looks like it should go there) so int64 would
> be defined,

Unfortunately that's completely unacceptable from a namespace-pollution
point of view.

The real problem here is that int64 isn't a well-defined portable
datatype, and so it's going to be very hard to export these functions in
a way that won't break on different platforms, applications compiled
with a different compiler than libpq was, etc.

For that matter, we can't even guarantee that they work at all: not all
platforms even *have* int64 types. We have so far avoided putting any
fundamental dependencies on int64 arithmetic into the system, and I'm a
bit worried that this patch will break LO support entirely on platforms
that don't have working int64 arithmetic.

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-23 22:32:20
Message-ID: 20050923223220.GD4164@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 23, 2005 at 05:40:09PM -0400, Tom Lane wrote:
> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
>
> The real problem here is that int64 isn't a well-defined portable
> datatype, and so it's going to be very hard to export these
> functions in a way that won't break on different platforms,
> applications compiled with a different compiler than libpq was, etc.
>
> For that matter, we can't even guarantee that they work at all: not
> all platforms even *have* int64 types. We have so far avoided
> putting any fundamental dependencies on int64 arithmetic into the
> system, and I'm a bit worried that this patch will break LO support
> entirely on platforms that don't have working int64 arithmetic.

What platforms that PG supports don't have int64 arithmetic?

Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Jeremy Drake <jeremyd(at)apptechsys(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 02:57:43
Message-ID: Pine.LNX.4.63.0509231936220.5158@garibaldi.apptechsys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 23 Sep 2005, Tom Lane wrote:

> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > 2) The lo_*64, in order to be convenient from the client end, have
> > functions added to libpq as the existing lo_* functions. The client side
> > of libpq did not previously know anything about int64 or how to
> > send/receive them. I added an include of postgres-fe.h (which according
> > to the comment in that file looks like it should go there) so int64 would
> > be defined,
>
> Unfortunately that's completely unacceptable from a namespace-pollution
> point of view.

I don't quite understand. Allow me to cite the source, so we both are
referring to the same thing here...

jeremyd(at)pluto postgresql-8.0.3 $ head -n17 src/include/postgres_fe.h
/*-------------------------------------------------------------------------
*
* postgres_fe.h
* Primary include file for PostgreSQL client-side .c files
*
* This should be the first file included by PostgreSQL client libraries and
* application programs --- but not by backend modules, which should include
* postgres.h.
*
*
* Portions Copyright (c) 1996-2005, PostgreSQL Global Development Group
* Portions Copyright (c) 1995, Regents of the University of California
*
* $PostgreSQL: pgsql/src/include/postgres_fe.h,v 1.10 2004/12/31 22:03:19 pgsql Exp $
*
*-------------------------------------------------------------------------
*/

Now I may not completely understand the term "client", but I think libpq
is a "client library" and anything which may use it would be an
"application program". So it seems it was an oversight on the part of
libpq to not include it. Does the term "client" not mean what I thought
it did (anything which connects to a postgresql server)?

>
> The real problem here is that int64 isn't a well-defined portable
> datatype, and so it's going to be very hard to export these functions in
> a way that won't break on different platforms, applications compiled
> with a different compiler than libpq was, etc.

Umm, what wouldn't break if you switched compilers in a way that redefined
sizeof(things)? I happen to know, even using the same compiler but just
changing a compile flag (-m64) which changes sizes of integral types
(sizeof(long) from 32 to 64 bits) will make such actions stop working on
one of my tested platform. It sucks, I happen to not be fond of this
because I tend not to have every library which is on my box built for
both, but it is the way life is. I do not know of a platform where the
size of an integral type can change and still be able to link against
libraries and things. And if the size of some type is not changing, then
things should already be correctly set for the platform. But I admit I
have not met every platform in existance. Do you happen to be able to
cite a platform where this is the case?

>
> For that matter, we can't even guarantee that they work at all: not all
> platforms even *have* int64 types. We have so far avoided putting any
> fundamental dependencies on int64 arithmetic into the system, and I'm a
> bit worried that this patch will break LO support entirely on platforms
> that don't have working int64 arithmetic.

They should in fact break gracefully on such platforms, or at least as
gracefully as any other int64-using code might. I did check a couple
places for #ifdef INT64_BROKEN (or whatever it was called) to make sure
that on those platforms something at least somewhat sane would happen.
(they use 32 bits instead). Also, on those platforms, you could always
use the non-64 versions if you were concerned about that. The patches
would allow seeking past the old limit using the 32 function in stages
(seek 2G, seek 2G, seek 2G would put you at 6G) if you do not mind wierd
return values and tell not working. And if you use a platform which
does not support 64bit integral types, then you cannot reasonably expect
those functions to work correctly anyway. But they should compile at
least.

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

--
I don't wanna argue, and I don't wanna fight,
But there will definitely be a party tonight...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <jeremyd(at)apptechsys(dot)com>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 03:14:12
Message-ID: 8919.1127531652@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Drake <jeremyd(at)apptechsys(dot)com> writes:
> On Fri, 23 Sep 2005, Tom Lane wrote:
>> Unfortunately that's completely unacceptable from a namespace-pollution
>> point of view.

> I don't quite understand.

postgresql-fe.h defines a ton of stuff that has no business being
visible to libpq's client applications. It's designed to be used by
our *own* client-side code (psql and the like), but we have not made
any attempt to keep it from defining stuff that would likely break
other peoples' code.

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 05:35:54
Message-ID: Pine.LNX.4.63.0509232214290.5158@garibaldi.apptechsys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 23 Sep 2005, Tom Lane wrote:

> postgresql-fe.h defines a ton of stuff that has no business being
> visible to libpq's client applications. It's designed to be used by
> our *own* client-side code (psql and the like), but we have not made
> any attempt to keep it from defining stuff that would likely break
> other peoples' code.

So does this mean that there is a different, more advanced and more likely
to break random other code, client library where this call would fit
better? If so, I would be happy to change the patch to put it there. I
did not see it, but I did not look very hard.

If not, what is a client side programmer to do if they want to pass int64s
around? Every client app has to basically write their own htonll (or
whatever you want to call it) and perform their own detection of what type
is a 64bit int, and cache the oids for the fastcall interface themselves?
There seems to be a lot of overhead which libpq saves you from. Or the
client program could perform the detection of the type, and also detect a
function which would reasonably serve as an atoll on the platform, and
snprintf(buf, 1024, "SELECT lo_seek64(%d, %lld, %d)", fh, offset,
SEEK_SET); exec the buf, check to see if any tuples came back, if so (get
the first column of the first tuple, call atoll on that) else handle
error, and in either case free the result?

In any case, are there any comments on the changes below libpq (the
functions visible to queries on down)? I don't want to get hung up in the
client issues just to find out later that the server stuff was completely
insane anyway... The client library seems to me to be less important
anyway. If the server can support it, the client can always manage to do
it some how, and then once the client lib can support it, it should be
fairly transparent to swap that out later, so that code that worked around
could be updated without immediately breaking all other code working
around.

So that means that if I get good feedback on the server side code, I could
start having people code to it using one of the above workaround methods
listed, and then if we manage to come up with some way which would be more
"correct" (if that is the right word) than the libpq hack I did then they
could gradually switch over to that (or use sed -i).

--
All that glitters has a high refractive index.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 15:20:38
Message-ID: 12776.1127575238@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> On Fri, 23 Sep 2005, Tom Lane wrote:
>> postgresql-fe.h defines a ton of stuff that has no business being
>> visible to libpq's client applications. It's designed to be used by
>> our *own* client-side code (psql and the like), but we have not made
>> any attempt to keep it from defining stuff that would likely break
>> other peoples' code.

> So does this mean that there is a different, more advanced and more likely
> to break random other code, client library where this call would fit
> better?

I've been thinking more about this and come to these conclusions:

1. libpq_fe.h definitely cannot include postgres_fe.h; in fact, it has
no business even defining a type named "int64". That is way too likely
to collide with symbols coming from elsewhere in a client compilation.
I think what we need is to declare a type named "pg_int64" and use that
in the externally visible declarations. The most reasonable place to
put the typedef is postgres_ext.h. This will mean making configure
generate postgres_ext.h from a template postgres_ext.h.in, but that's
no big deal.

2. We need a strategy for what to do when configure doesn't find a
working int64 type. My inclination is to just not export the functions
in that case. So normally, postgres_ext.h would contain something
like

#define HAVE_PG_INT64 1
typedef long long int pg_int64;

but neither of these would appear if configure couldn't find a working
type. In libpq-fe.h, we'd have

#ifdef HAVE_PG_INT64
extern pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
extern pg_int64 lo_tell64(PGconn *conn, int fd);
#endif

and similarly for all the code inside libpq. The reason this seems like
a good idea is that client code could key off "#ifdef HAVE_PG_INT64"
to detect whether the lo64 functions are available; which is useful even
if you don't care about machines without int64, because you still need
to think about machines with pre-8.2 PG installations.

3. This is still not 100% bulletproof, as it doesn't address situations
like building PG with gcc and then trying to compile client apps with a
vendor cc that doesn't understand "long long int". The compile would
choke on the typedef even if you weren't trying to use large objects at
all. I don't see any very nice way around that. It might be worth
doing this in postgres_ext.h:

#ifndef NO_PG_INT64
#define HAVE_PG_INT64 1
typedef long long int pg_int64;
#endif

which would at least provide an escape hatch for such situations: define
NO_PG_INT64 before trying to build.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 15:56:06
Message-ID: 13028.1127577366@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> In any case, are there any comments on the changes below libpq (the
> functions visible to queries on down)?

Within the backend, I don't see the point in maintaining a distinction
between 32- and 64-bit APIs for inv_api.c: you should just switch 'em
to use int64. You did it that way for inv_getsize but then proceeded
to add separate inv_seek64/tell64 functions, which is inconsistent.
The submitted version of lo_tell64 isn't even calling the right one :-(

The 32-bit version of lo_tell will need to check and error out if the
value it'd need to return doesn't fit in int32. (Come to think of it,
so should the 32-bit version of lo_lseek.)

All of the LO code needs to be eyeballed to make sure it still behaves
sanely if "int64" is really only 32 bits.

It would probably be a good idea also to introduce some overflow checks
to detect cases where the current LO offset would overflow int64 after a
read, write, or seek. This is missing from the existing code :-(
It is possible to code overflow checks that work regardless of the size
of "int64"; see int8.c for some inspiration. I'd suggest also that the
offset be defined as signed not unsigned (int64 not uint64) as this will
simplify the overflow checks and eliminate the prospect of scenarios
where lo_tell64 cannot return a correct value.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 16:13:11
Message-ID: 13145.1127578391@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Fetter <david(at)fetter(dot)org> writes:
> On Fri, Sep 23, 2005 at 05:40:09PM -0400, Tom Lane wrote:
>> For that matter, we can't even guarantee that they work at all: not
>> all platforms even *have* int64 types.

> What platforms that PG supports don't have int64 arithmetic?

We claim to build with any ANSI C compiler, and there is no requirement
for a 64-bit type in ANSI C.

The historical project policy is that we should still build without
such a type, and everything should still work except that the effective
bounds of bigint data correspond to int32 instead of int64 limits.
I see no reason to back off that policy. It's not very much harder
to do it right.

regards, tom lane


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 16:38:52
Message-ID: 20050924163852.GD12857@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hey,

While you guys are hacking at the LO code, it would be nice to consider
the suggestions outlined here:

http://archives.postgresql.org/pgsql-bugs/2004-07/msg00143.php

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
Y una voz del caos me habló y me dijo
"Sonríe y sé feliz, podría ser peor".
Y sonreí. Y fui feliz.
Y fue peor.


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 19:47:26
Message-ID: Pine.LNX.4.63.0509241229030.5158@garibaldi.apptechsys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 24 Sep 2005, Tom Lane wrote:

> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > In any case, are there any comments on the changes below libpq (the
> > functions visible to queries on down)?
>
> Within the backend, I don't see the point in maintaining a distinction
> between 32- and 64-bit APIs for inv_api.c: you should just switch 'em
> to use int64. You did it that way for inv_getsize but then proceeded
> to add separate inv_seek64/tell64 functions, which is inconsistent.

Right. I did it the way you describe my first cut (I did this several
times and changed my mind and started over). I was concerned (perhaps
needlessly) about breaking the signatures of the inv_* functions which are
visible outside of the .c file. This is why I did the getsize differently
- it is static. But it sounds like there is no concern about changing the
signatures of these functions, so I will change my patch to not maintain
the seperate code paths in inv_api.c.

> The submitted version of lo_tell64 isn't even calling the right one :-(

Oops. That's what I get for lots of copy/paste and doing it multiple
times... Bonehead mistake, thanks for catching it.

> The 32-bit version of lo_tell will need to check and error out if the
> value it'd need to return doesn't fit in int32. (Come to think of it,
> so should the 32-bit version of lo_lseek.)

That makes sense. Or it could return some value (INT_MAX?) which could
mean that it is outside the range, so someone could still get at the data
even if they are using a backwards client box? I don't know if that makes
sense at all, it sounds pretty wacky since these clients would have no
way of knowing where they are in the file. Erroring would probably be
best.

> All of the LO code needs to be eyeballed to make sure it still behaves
> sanely if "int64" is really only 32 bits.

Of course.

> It would probably be a good idea also to introduce some overflow checks
> to detect cases where the current LO offset would overflow int64 after a
> read, write, or seek. This is missing from the existing code :-(
> It is possible to code overflow checks that work regardless of the size
> of "int64"; see int8.c for some inspiration.

Yes. That would be good. These would be in the inv_* functions, correct?

> I'd suggest also that the
> offset be defined as signed not unsigned (int64 not uint64) as this will
> simplify the overflow checks and eliminate the prospect of scenarios
> where lo_tell64 cannot return a correct value.

I intended to do that. I think the only place I made uint64 vs int64 was
the getsize function, and that could be int64 also. I will look at that
code and make sure I am not mixing them in ways that are not necessary and
useful.

I will take these suggestions and make another revision of the patch
shortly. Also, I was considering exposing up an lo_getsize or lo_stat
function which would tell you how big a large object was without having to
seek to the end and look at the return value, requiring you to have the
large object open at the time, to loose your old seek position (unless you
do a tell first), and requires several more server round-trips (if not
open, would involve open/seek/close, if open, could require
tell/seek/seek).


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 20:00:10
Message-ID: Pine.LNX.4.63.0509241249570.5158@garibaldi.apptechsys.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 24 Sep 2005, Alvaro Herrera wrote:

> Hey,
>
> While you guys are hacking at the LO code, it would be nice to consider
> the suggestions outlined here:
>
> http://archives.postgresql.org/pgsql-bugs/2004-07/msg00143.php

Included from that message for easier reference:

> 0) In "Oid lo_creat(PGconn *conn, int mode)," why is there a mode on
> lo_create? The mode is determined when the object is lo_open()ed, right?

I think the docs basically said it is a vestigial feature, it used to be
useful but the code evolved in such a way that it ceased being useful. It
is probably still there to allow old code to continue to compile against
newer servers without being recompiled.

> 1) There is no lo_truncate(PGconn *conn, int fd, off_t len).

Did not notice that one. That is a good one to add if adding functions is
in the cards. I bet when the person/people who are attempting to write to
this api here get far enough, they would have noticed that too ;)

> 2) There is no lo_length(PGconn *conn, int fd).

We did notice this one however. There is also no lo_stat(PGconn *conn,
Oid lobjId). I have been thinking about implementing these two.

I think I will make a revision of the patch at some point with these. The
size ones will be extremely easy, the functionality is already there, just
a matter of exposing it. The truncate is not too difficult, but actually
requires me to think a little more ;)

--
When does summertime come to Minnesota, you ask? Well, last year, I
think it was a Tuesday.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-24 21:19:47
Message-ID: 23730.1127596787@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
>> 0) In "Oid lo_creat(PGconn *conn, int mode)," why is there a mode on
>> lo_create? The mode is determined when the object is lo_open()ed, right?

> I think the docs basically said it is a vestigial feature, it used to be
> useful but the code evolved in such a way that it ceased being useful. It
> is probably still there to allow old code to continue to compile against
> newer servers without being recompiled.

Yeah. There were once multiple types of large objects, and I suppose
the mode argument told lo_creat which kind to create. I have no idea
how the read/write bits got included into that --- it doesn't make any
sense. As of PG 8.1, lo_creat just ignores the mode argument. We can't
delete the argument though without causing a lot of compatibility
headaches.

regards, tom lane


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-26 21:21:48
Message-ID: 20050926212148.GU30974@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 24, 2005 at 12:13:11PM -0400, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
> > On Fri, Sep 23, 2005 at 05:40:09PM -0400, Tom Lane wrote:
> >> For that matter, we can't even guarantee that they work at all: not
> >> all platforms even *have* int64 types.
>
> > What platforms that PG supports don't have int64 arithmetic?
>
> We claim to build with any ANSI C compiler, and there is no requirement
> for a 64-bit type in ANSI C.
>
> The historical project policy is that we should still build without
> such a type, and everything should still work except that the effective
> bounds of bigint data correspond to int32 instead of int64 limits.
> I see no reason to back off that policy. It's not very much harder
> to do it right.

So what happens if you attempt to put a value greater than 2^32 into a
bigint on a non-int64 platform?

I would argue that by default we should not allow users to even create
bigints on any platform where bigint = int. And if the default is
overridden, we should still throw a warning.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-09-26 21:31:12
Message-ID: 4440.1127770272@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jim C. Nasby" <jnasby(at)pervasive(dot)com> writes:
> So what happens if you attempt to put a value greater than 2^32 into a
> bigint on a non-int64 platform?

You get the same error as if you tried to store a value greater than 2^64.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2005-10-13 21:01:15
Message-ID: 200510132101.j9DL1FL00779@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


This has been saved for the 8.2 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Tom Lane wrote:
> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > On Fri, 23 Sep 2005, Tom Lane wrote:
> >> postgresql-fe.h defines a ton of stuff that has no business being
> >> visible to libpq's client applications. It's designed to be used by
> >> our *own* client-side code (psql and the like), but we have not made
> >> any attempt to keep it from defining stuff that would likely break
> >> other peoples' code.
>
> > So does this mean that there is a different, more advanced and more likely
> > to break random other code, client library where this call would fit
> > better?
>
> I've been thinking more about this and come to these conclusions:
>
> 1. libpq_fe.h definitely cannot include postgres_fe.h; in fact, it has
> no business even defining a type named "int64". That is way too likely
> to collide with symbols coming from elsewhere in a client compilation.
> I think what we need is to declare a type named "pg_int64" and use that
> in the externally visible declarations. The most reasonable place to
> put the typedef is postgres_ext.h. This will mean making configure
> generate postgres_ext.h from a template postgres_ext.h.in, but that's
> no big deal.
>
> 2. We need a strategy for what to do when configure doesn't find a
> working int64 type. My inclination is to just not export the functions
> in that case. So normally, postgres_ext.h would contain something
> like
>
> #define HAVE_PG_INT64 1
> typedef long long int pg_int64;
>
> but neither of these would appear if configure couldn't find a working
> type. In libpq-fe.h, we'd have
>
> #ifdef HAVE_PG_INT64
> extern pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
> extern pg_int64 lo_tell64(PGconn *conn, int fd);
> #endif
>
> and similarly for all the code inside libpq. The reason this seems like
> a good idea is that client code could key off "#ifdef HAVE_PG_INT64"
> to detect whether the lo64 functions are available; which is useful even
> if you don't care about machines without int64, because you still need
> to think about machines with pre-8.2 PG installations.
>
> 3. This is still not 100% bulletproof, as it doesn't address situations
> like building PG with gcc and then trying to compile client apps with a
> vendor cc that doesn't understand "long long int". The compile would
> choke on the typedef even if you weren't trying to use large objects at
> all. I don't see any very nice way around that. It might be worth
> doing this in postgres_ext.h:
>
> #ifndef NO_PG_INT64
> #define HAVE_PG_INT64 1
> typedef long long int pg_int64;
> #endif
>
> which would at least provide an escape hatch for such situations: define
> NO_PG_INT64 before trying to build.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: 64-bit API for large objects
Date: 2006-06-14 19:33:19
Message-ID: 200606141933.k5EJXJ424400@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Thread added to TODO. As far as I can see, this thread never produced
an patch that could be applied.

---------------------------------------------------------------------------

Tom Lane wrote:
> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > On Fri, 23 Sep 2005, Tom Lane wrote:
> >> postgresql-fe.h defines a ton of stuff that has no business being
> >> visible to libpq's client applications. It's designed to be used by
> >> our *own* client-side code (psql and the like), but we have not made
> >> any attempt to keep it from defining stuff that would likely break
> >> other peoples' code.
>
> > So does this mean that there is a different, more advanced and more likely
> > to break random other code, client library where this call would fit
> > better?
>
> I've been thinking more about this and come to these conclusions:
>
> 1. libpq_fe.h definitely cannot include postgres_fe.h; in fact, it has
> no business even defining a type named "int64". That is way too likely
> to collide with symbols coming from elsewhere in a client compilation.
> I think what we need is to declare a type named "pg_int64" and use that
> in the externally visible declarations. The most reasonable place to
> put the typedef is postgres_ext.h. This will mean making configure
> generate postgres_ext.h from a template postgres_ext.h.in, but that's
> no big deal.
>
> 2. We need a strategy for what to do when configure doesn't find a
> working int64 type. My inclination is to just not export the functions
> in that case. So normally, postgres_ext.h would contain something
> like
>
> #define HAVE_PG_INT64 1
> typedef long long int pg_int64;
>
> but neither of these would appear if configure couldn't find a working
> type. In libpq-fe.h, we'd have
>
> #ifdef HAVE_PG_INT64
> extern pg_int64 lo_lseek64(PGconn *conn, int fd, pg_int64 offset, int whence);
> extern pg_int64 lo_tell64(PGconn *conn, int fd);
> #endif
>
> and similarly for all the code inside libpq. The reason this seems like
> a good idea is that client code could key off "#ifdef HAVE_PG_INT64"
> to detect whether the lo64 functions are available; which is useful even
> if you don't care about machines without int64, because you still need
> to think about machines with pre-8.2 PG installations.
>
> 3. This is still not 100% bulletproof, as it doesn't address situations
> like building PG with gcc and then trying to compile client apps with a
> vendor cc that doesn't understand "long long int". The compile would
> choke on the typedef even if you weren't trying to use large objects at
> all. I don't see any very nice way around that. It might be worth
> doing this in postgres_ext.h:
>
> #ifndef NO_PG_INT64
> #define HAVE_PG_INT64 1
> typedef long long int pg_int64;
> #endif
>
> which would at least provide an escape hatch for such situations: define
> NO_PG_INT64 before trying to build.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +