Re: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)

Lists: pgsql-hackers
From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea, text, etc)
Date: 2014-12-01 06:55:22
Message-ID: 547C10DA.7070903@2ndquadrant.com

Hi all

Currently the client must know the size of a large lob/clob field, like
a 'bytea' or 'text' field, in order to send it to the server. This can
force the client to buffer all the data before sending it to the server.

It would be helpful if the v4 protocol permitted the client to specify
the field length as unknown / TBD, then stream data until an end marker
is read. Some encoding would be required for binary data to ensure that
occurrences of the end marker in the streamed data were properly
handled, but there are many well established schemes for doing this.
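As a rough illustration of one such scheme, here is a minimal byte-stuffing sketch in Python. The escape byte 0xFF, the end code 0x00, and the names encode_stream and decode_stream are all assumptions made up for this example; nothing here is part of any existing or proposed PostgreSQL protocol.

```python
import io

ESC = 0xFF  # hypothetical escape byte
END = 0x00  # hypothetical end code, only meaningful right after ESC


def encode_stream(chunks):
    """Yield escaped chunks, then the two-byte ESC END terminator.

    Every 0xFF in the data is doubled, so the pair ESC END can never
    appear inside the payload by accident.
    """
    for chunk in chunks:
        yield chunk.replace(b"\xff", b"\xff\xff")
    yield bytes([ESC, END])


def decode_stream(reader):
    """Read a binary file-like object up to the terminator; return the payload."""
    out = bytearray()
    while True:
        b = reader.read(1)
        if not b:
            raise ValueError("stream ended before end marker")
        if b[0] != ESC:
            out += b
            continue
        nxt = reader.read(1)
        if not nxt:
            raise ValueError("stream ended inside escape sequence")
        if nxt[0] == ESC:
            out.append(ESC)      # escaped literal 0xFF
        elif nxt[0] == END:
            return bytes(out)    # real end of value
        else:
            raise ValueError("invalid escape sequence")
```

The sender never needs the total length: it can call encode_stream on chunks as they arrive, and the receiver stops at the first unescaped terminator, leaving any following bytes unread.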

I'm aware that this is possible for pg_largeobject, but this is with
reference to big varlena fields.

This would be a useful change to have in connection with the
already-TODO'd lazy fetching of large TOASTed values, as part of a
general improvement in Pg's handling of big values in tuples.

Thoughts/comments?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: David Fetter <david(at)fetter(dot)org>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
Date: 2014-12-01 14:38:11
Message-ID: 20141201143811.GA7121@fetter.org

On Mon, Dec 01, 2014 at 02:55:22PM +0800, Craig Ringer wrote:
> Hi all
>
> Currently the client must know the size of a large lob/clob field, like
> a 'bytea' or 'text' field, in order to send it to the server. This can
> force the client to buffer all the data before sending it to the server.

Yes, this is not good.

> It would be helpful if the v4 protocol permitted the client to specify
> the field length as unknown / TBD, then stream data until an end marker
> is read.

What's wrong with specifying its length in advance instead? Are you
thinking of one or more use cases where it's both large and unknown?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea,text,etc)
Date: 2014-12-01 14:54:08
Message-ID: 547C8110.305@2ndquadrant.com

On 12/01/2014 10:38 PM, David Fetter wrote:
> On Mon, Dec 01, 2014 at 02:55:22PM +0800, Craig Ringer wrote:
>> Hi all
>>
>> Currently the client must know the size of a large lob/clob field, like
>> a 'bytea' or 'text' field, in order to send it to the server. This can
>> force the client to buffer all the data before sending it to the server.
>
> Yes, this is not good.
>
>> It would be helpful if the v4 protocol permitted the client to specify
>> the field length as unknown / TBD, then stream data until an end marker
>> is read.
>
> What's wrong with specifying its length in advance instead? Are you
> thinking of one or more use cases where it's both large and unknown?

I am - specifically, the JDBC setBlob(...) and setClob(...) APIs that
accept streams without a specified length:

https://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#setBlob(int,%20java.io.InputStream)

https://docs.oracle.com/javase/7/docs/api/java/sql/PreparedStatement.html#setClob(int,%20java.io.Reader)

There are variants that do take a length, so PgJDBC can (and now does)
implement the no-length variants by internally buffering the stream
until EOF. It'd be nice to get rid of that though.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROTOCOL TODO] Permit streaming of unknown-length lob/clob (bytea, text, etc)
Date: 2014-12-01 14:57:01
Message-ID: 8075.1417445821@sss.pgh.pa.us

Craig Ringer <craig(at)2ndquadrant(dot)com> writes:
> Currently the client must know the size of a large lob/clob field, like
> a 'bytea' or 'text' field, in order to send it to the server. This can
> force the client to buffer all the data before sending it to the server.

> It would be helpful if the v4 protocol permitted the client to specify
> the field length as unknown / TBD, then stream data until an end marker
> is read. Some encoding would be required for binary data to ensure that
> occurrences of the end marker in the streamed data were properly
> handled, but there are many well established schemes for doing this.

I think this is pretty much a non-starter as stated, because the v3
protocol requires all messages to have a preceding length word. That's
not very negotiable.
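
To make the constraint concrete, here is a small Python sketch of v3-style framing: a one-byte message type, then a 4-byte big-endian length that counts itself plus the payload. The helper names frame_message and read_message are made up for illustration, and the type byte used in the usage note is arbitrary.

```python
import io
import struct


def frame_message(msg_type: bytes, payload: bytes) -> bytes:
    """Frame one message as: type byte, Int32 length, payload.

    The length covers the 4 length bytes plus the payload, but not the
    type byte, so the sender must hold the whole payload (or at least
    know its size) before writing the first byte.
    """
    if len(msg_type) != 1:
        raise ValueError("message type must be exactly one byte")
    return msg_type + struct.pack("!i", len(payload) + 4) + payload


def read_message(reader):
    """Read one framed message from a binary file-like object."""
    header = reader.read(5)
    if len(header) != 5:
        raise EOFError("truncated message header")
    msg_type = header[:1]
    (length,) = struct.unpack("!i", header[1:])
    payload = reader.read(length - 4)
    if len(payload) != length - 4:
        raise EOFError("truncated message payload")
    return msg_type, payload
```

For example, frame_message(b"d", b"hello") produces the type byte, the length 9, and then the five payload bytes. An "unknown length" field has no value to put in that length word.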

What's already on the TODO list is to allow large field values to be sent
or received in segments, perhaps with a cursor-like arrangement. You can
do that today for blobs, but not for oversize regular table fields.
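
A segmented transfer could look roughly like the Python sketch below. It is purely hypothetical: the segment size, the zero-length terminator, and the names iter_segments and reassemble are assumptions for illustration, not an existing protocol feature or a concrete proposal.

```python
import io
import struct

SEGMENT_SIZE = 8192  # assumed segment size, chosen arbitrarily


def iter_segments(reader, segment_size=SEGMENT_SIZE):
    """Yield length-prefixed segments from a binary stream of unknown size.

    Each segment carries its own Int32 length, so every unit on the wire
    still has a preceding length word. A zero-length segment marks the end.
    """
    while True:
        data = reader.read(segment_size)
        if not data:
            break
        yield struct.pack("!i", len(data)) + data
    yield struct.pack("!i", 0)


def reassemble(reader):
    """Read segments until the zero-length terminator; return the full value."""
    out = bytearray()
    while True:
        header = reader.read(4)
        if len(header) != 4:
            raise EOFError("truncated segment header")
        (length,) = struct.unpack("!i", header)
        if length == 0:
            return bytes(out)
        data = reader.read(length)
        if len(data) != length:
            raise EOFError("truncated segment body")
        out += data
```

With this shape the sender buffers at most one segment at a time, and no escaping of the data is needed because the end is signalled by a length rather than by a marker inside the byte stream.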

Of course, considering that the maximum practical size of a regular field
is probably in the dozens of megabytes, and that RAM is getting cheaper
all the time, it's not clear that it's all that much of a hardship for
clients to buffer the whole thing. If we've not gotten around to this
in the last dozen years, it's unlikely we'll get to it in the future
either ...

regards, tom lane