Re: 64-bit API for large object

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us, anzai(at)sraoss(dot)co(dot)jp, nagata(at)sraoss(dot)co(dot)jp
Subject: Re: 64-bit API for large object
Date: 2012-09-20 05:06:02
Message-ID: CADyhKSXTsg5w4LRdXiXHsXZvzJOsOZRC1M3Ujkt3yNhsD6cV-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I checked this patch. It can be applied onto the latest master branch
without any problems. My comments are below.

2012/9/11 Tatsuo Ishii <ishii(at)postgresql(dot)org>:
> Ok, here is the patch to implement 64-bit API for large object, to
> allow to use up to 4TB large objects(or 16TB if BLCKSZ changed to
> 32KB). The patch is based on Jeremy Drake's patch posted on September
> 23, 2005
> (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php)
> and reasonably updated/edited to adopt PostgreSQL 9.3 by Nozomi Anzai
> for the backend part and Yugo Nagata for the rest(including
> documentation patch).
>
> Here are changes made in the patch:
>
> 1) Frontend lo_* libpq functions(fe-lobj.c)(Yugo Nagata)
>
> lo_initialize() gathers backend 64-bit large object handling
> function's oid, namely lo_lseek64, lo_tell64, lo_truncate64.
>
> If client calls lo_*64 functions and backend does not support them,
> lo_*64 functions return error to caller. There might be an argument
> since calls to lo_*64 functions can automatically be redirected to
> 32-bit older API. I don't know this is worth the trouble though.
>
I think it should definitely return an error code when user tries to
use lo_*64 functions towards the backend v9.2 or older, because
fallback to 32bit API can raise unexpected errors if application
intends to seek the area over than 2GB.

> Currently lo_initialize() throws an error if one of oids are not
> available. I doubt we do the same way for 64-bit functions since this
> will make 9.3 libpq unable to access large objects stored in pre-9.2
> PostgreSQL servers.
>
It seems to me the situation to split the case of pre-9.2 and post-9.3
using a condition of "conn->sversion >= 90300".

> To pass 64-bit integer to PQfn, PQArgBlock is used like this: int *ptr
> is a pointer to 64-bit integer and actual data is placed somewhere
> else. There might be other way: add new member to union u to store
> 64-bit integer:
>
> typedef struct
> {
> int len;
> int isint;
> union
> {
> int *ptr; /* can't use void (dec compiler barfs) */
> int integer;
> int64 bigint; /* 64-bit integer */
> } u;
> } PQArgBlock;
>
> I'm a little bit worried about this way because PQArgBlock is a public
> interface.
>
I'm inclined to add a new field for the union; that seems to me straight
forward approach.
For example, the manner in lo_seek64() seems to me confusable.
It set 1 on "isint" field even though pointer is delivered actually.

+ argv[1].isint = 1;
+ argv[1].len = 8;
+ argv[1].u.ptr = (int *) &len;

> Also we add new type "pg_int64":
>
> #ifndef NO_PG_INT64
> #define HAVE_PG_INT64 1
> typedef long long int pg_int64;
> #endif
>
> in postgres_ext.h per suggestion from Tom Lane:
> http://archives.postgresql.org/pgsql-hackers/2005-09/msg01062.php
>
I'm uncertain about context of this discussion.

Does it make matter if we include <stdint.h> and use int64_t instead
of the self defined data type?

> 2) Backend lo_* functions (be-fsstubs.c)(Nozomi Anzai)
>
> Add lo_lseek64, lo_tell64, lo_truncate64 so that they can handle
> 64-bit seek position and data length. loread64 and lowrite64 are not
> added because if a program tries to read/write more than 2GB at once,
> it would be a sign that the program need to be re-designed anyway.
>
I think it is a reasonable.

> 3) Backend inv_api.c functions(Nozomi Anzai)
>
> No need to add new functions. Just extend them to handle 64-bit data.
>
> BTW , what will happen if older 32-bit libpq accesses large objects
> over 2GB?
>
> lo_read and lo_write: they can read or write lobjs using 32-bit API as
> long as requested read/write data length is smaller than 2GB. So I
> think we can safely allow them to access over 2GB lobjs.
>
> lo_lseek: again as long as requested offset is smaller than 2GB, there
> would be no problem.
>
> lo_tell:if current seek position is beyond 2GB, returns an error.
>
Even though iteration of lo_lseek() may move the offset to 4TB, it also
makes unavailable to use lo_tell() to obtain the current offset, so I think
it is reasonable behavior.

However, error code is not an appropriate one.

+ if (INT_MAX < offset)
+ {
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("invalid large-object
descriptor: %d", fd)));
+ PG_RETURN_INT32(-1);
+ }

According to the manpage of lseek(2)
EOVERFLOW
The resulting file offset cannot be represented in an off_t.

Please add a new error code such as ERRCODE_BLOB_OFFSET_OVERFLOW.

> 4) src/test/examples/testlo64.c added for 64-bit API example(Yugo Nagata)
>
> Comments and suggestions are welcome.
>
miscellaneous comments are below.

Regression test is helpful. Even though no need to try to create 4TB large
object, it is helpful to write some chunks around the design boundary.
Could you add some test cases that writes some chunks around 4TB offset.

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-09-20 05:10:23 Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)
Previous Message Peter Eisentraut 2012-09-20 02:56:03 newline conversion in SQL command strings