Re: Large objects.

From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Large objects.
Date: 2010-09-24 13:13:45
Message-ID: AANLkTi=UAXxCVaOPvBtQLQYji5ZJ30my8bO3-CdcM7Dx@mail.gmail.com
Lists: pgsql-hackers

Hey all,

Here is a simple test case of LOB usage; please note the comments:

#include <libpq-fe.h>
#include <libpq/libpq-fs.h>

int main(int argc, char* argv[])
{
    PGconn* c = PQconnectdb("password=test");

    PGresult* r = PQexec(c, "BEGIN");
    PQclear(r);

    const Oid id = lo_create(c, 0);

    int fd1 = lo_open(c, id, INV_READ | INV_WRITE);
    int nBytes = lo_write(c, fd1, "D", 1);
    int fd1Pos = lo_lseek(c, fd1, 2147483647, SEEK_SET);
    fd1Pos = lo_lseek(c, fd1, 1, SEEK_CUR);
    nBytes = lo_write(c, fd1, "Dima", 4); // nBytes == 4 ! Should be 0, IMO.
                                          // If not, where will my name
                                          // be written?

    r = PQexec(c, "COMMIT");
    PQclear(r);

    r = PQexec(c, "BEGIN");
    PQclear(r);

    fd1 = lo_open(c, id, INV_READ | INV_WRITE);
    fd1Pos = lo_lseek(c, fd1, 0, SEEK_END); // fd1Pos == -2147483647 !

    char buf[16];
    nBytes = lo_read(c, fd1, buf, 4); // nBytes == 0 ! Correct, IMO.

    r = PQexec(c, "COMMIT");
    PQclear(r);

    PQfinish(c);
    return 0;
}

Tell me please, why does lo_write() return the number of bytes "actually
written" when the current write location is beyond 2GB? IMO, in this
case it should return zero at the least.
lo_read() returns zero in this case, which is correct, IMO.

--
Regards,
Dmitriy


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large objects.
Date: 2010-09-26 16:14:38
Message-ID: AANLkTimygJfJRHjEa1bGMwkU-oPGMqdX62FWCSQcCXO6@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 24, 2010 at 9:13 AM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com> wrote:
> Tell me please, why does lo_write() return the number of bytes "actually
> written" when the current write location is beyond 2GB? IMO, in this
> case it should return zero at the least.
> lo_read() returns zero in this case, which is correct, IMO.

Hmm, are you sure? If the behavior of lo_read and lo_write is not
symmetric, that's probably not good, but I don't see anything obvious
in the code to make me think that's the case. Returning 0 for a value
>= 2^31 seems problematic unless there is no possibility of a short
read (or write).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large objects.
Date: 2010-09-26 16:21:05
Message-ID: AANLkTi=iKHEVS4cRwrTXYgs+Eavp7UmR5D2nxfAG1yyv@mail.gmail.com
Lists: pgsql-hackers

Hey Robert,

Yes, I am sure. I've tested it with the test case in my original post.
Could you compile it and reproduce the issue, please?

--
// Dmitriy.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large objects.
Date: 2010-09-27 14:35:00
Message-ID: AANLkTik_dzHw-iCu8WfjzYMBmzOd1xEqk6d3Lf3iM1yR@mail.gmail.com
Lists: pgsql-hackers

On Sun, Sep 26, 2010 at 12:21 PM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com> wrote:
> Yes, I am sure. I've tested it with the test case in my original post.
> Could you compile it and reproduce the issue, please?

I think the reason lo_read is returning 0 is because it's not reading
anything. See attached test case, cleaned up a bit from yours and
with some error checks added.

According to the documentation, the maximum size of a large object is
2 GB, which may be the reason for this behavior.

http://www.postgresql.org/docs/9.0/static/lo-intro.html

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment Content-Type Size
d.c application/octet-stream 2.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large objects.
Date: 2010-09-27 14:50:34
Message-ID: 14109.1285599034@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> According to the documentation, the maximum size of a large object is
> 2 GB, which may be the reason for this behavior.

In principle, since pg_largeobject stores an integer pageno, we could
support large objects of up to LOBLKSIZE * 2^31 bytes = 4TB without any
incompatible change in on-disk format. This'd require converting a lot
of the internal LO access logic to track positions as int64 not int32,
but now that we require platforms to have working int64 that's no big
drawback. The main practical problem is that the existing lo_seek and
lo_tell APIs use int32 positions. I'm not sure if there's any cleaner
way to deal with that than to add "lo_seek64" and "lo_tell64" functions,
and have the existing ones throw error if asked to deal with positions
past 2^31.

In the particular case here, I think that lo_write may actually be
writing past the 2GB boundary, while the coding in lo_read is a bit
different and stops at the 2GB "limit".

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large objects.
Date: 2010-09-27 15:01:05
Message-ID: AANLkTinr5s-jKyESwAbX5qW9-Oh6WWUdZZODFNeKw0Kc@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 27, 2010 at 10:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> According to the documentation, the maximum size of a large object is
>> 2 GB, which may be the reason for this behavior.
>
> In principle, since pg_largeobject stores an integer pageno, we could
> support large objects of up to LOBLKSIZE * 2^31 bytes = 4TB without any
> incompatible change in on-disk format.  This'd require converting a lot
> of the internal LO access logic to track positions as int64 not int32,
> but now that we require platforms to have working int64 that's no big
> drawback.  The main practical problem is that the existing lo_seek and
> lo_tell APIs use int32 positions.  I'm not sure if there's any cleaner
> way to deal with that than to add "lo_seek64" and "lo_tell64" functions,
> and have the existing ones throw error if asked to deal with positions
> past 2^31.
>
> In the particular case here, I think that lo_write may actually be
> writing past the 2GB boundary, while the coding in lo_read is a bit
> different and stops at the 2GB "limit".

Ouch. Letting people write data to where they can't get it back from
seems double-plus ungood.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large objects.
Date: 2010-09-27 18:25:04
Message-ID: AANLkTin3XrKsdx+YXgqQC+bMbzEe_yhBAncVQuS-SY7E@mail.gmail.com
Lists: pgsql-hackers

Hey Robert, Tom,

Tom, thank you for the explanation!

> Ouch. Letting people write data to where they can't get it back from
> seems double-plus ungood.

Robert, yes, I agree with you. This is exactly what I wanted to say.
I've implemented a stream class in C++, and this circumstance makes
the code less clean because I need to account for the behavior of
lo_write() at the 2GB limit.

--
// Dmitriy.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large objects.
Date: 2010-09-27 22:16:56
Message-ID: AANLkTi=9RGkFExVkjhMHjzdJWRsg4x8_VEW67E7piKPa@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 27, 2010 at 2:25 PM, Dmitriy Igrishin <dmitigr(at)gmail(dot)com> wrote:
> Hey Robert, Tom
>
> Tom, thank you for explanation!
>
>> Ouch.  Letting people write data to where they can't get it back from
>> seems double-plus ungood.
>>
> Robert, yes, I agree with you. This is exactly what I wanted to say.
> I've implemented a stream class in C++, and this circumstance makes
> the code less clean because I need to account for the behavior of
> lo_write() at the 2GB limit.

On further examination, it appears we're not doing this. The reason
lo_read wasn't returning any data in your earlier example is that you
called it after seeking to the end of the object. If you seek to the
position where the data was written, it works fine.

A fairly plausible argument could be made that we shouldn't allow
reading or writing past 2^31-1, but it now appears to me that the
behavior is at least self-consistent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company