Re: BLOB support

From: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: BLOB support
Date: 2011-06-02 20:54:53
Message-ID: 20110602205453.GC15641@rice.edu
Lists: pgsql-hackers

On Thu, Jun 02, 2011 at 01:43:16PM -0400, Tom Lane wrote:
> Radosław Smogura <rsmogura(at)softperience(dot)eu> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> Thursday 02 of June 2011 16:42:42
> >> Yes. I think the appropriate problem statement is "provide streaming
> >> access to large field values, as an alternative to just fetching/storing
> >> the entire value at once". I see no good reason to import the entire
> >> messy notion of LOBS/CLOBS. (The fact that other databases have done it
> >> is not a good reason.)
>
> > In context of LOBs streaming is resolved... I use current LO functionallity
> > (so driver may be able to read LOBs as psql \lo_export does it or using COPY
> > subprotocol) and client should get just LO's id.
>
> Just to be clear: I do not want to expose a concept of object IDs for
> field values in the first place. All of the problems you enumerate stem
> from the idea that LOBs ought to be a distinct kind of field, and I
> don't buy that.
>

I think you're saying no OIDs exposed at the SQL level, i.e. actually stored
in a field and returned by a SELECT? (Which seems to be the proposal.)

As I mentioned recently on another list, I've wrapped a block-oriented
"streaming" interface over bytea in Python for a web app, specifically
to deal with the latency and memory footprint issues of storing
'largish' files directly in the db. I find that with a 64K blocksize,
latency is 'good enough' and substr() seems to be constant time for a
given size, no matter what part of the bytea value I'm fetching: TOAST
does a fine job of random access.
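
For concreteness, the wrapper looks roughly like this (a minimal sketch
only: the table and column names are placeholders rather than the app's
real schema, and it assumes psycopg2 on the client):

import psycopg2

BLOCK_SIZE = 64 * 1024  # 64K blocks: latency was 'good enough' at this size

def stream_bytea(conn, file_id, block_size=BLOCK_SIZE):
    """Yield a bytea column in fixed-size chunks, slicing with substr() server-side."""
    with conn.cursor() as cur:
        offset = 1  # substr() is 1-based
        while True:
            cur.execute(
                "SELECT substr(file_data, %s, %s)"
                " FROM my_file_table WHERE id = %s",
                (offset, block_size, file_id),
            )
            chunk = cur.fetchone()[0]
            if not chunk:
                break
            yield bytes(chunk)
            offset += block_size

Each round trip ships only one block back to the client, which is where
the latency and memory-footprint win comes from.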

I was musing about providing a way to use the existing client lo
streaming interface (rather than the backend bits) for this type of
access. The thing called an OID in the client interface is really just a
nonce to tell the backend what data to send. With a single generator
function:

SELECT CASE WHEN is_lo THEN my_loid ELSE make_lo_oid(my_bytea) END
FROM my_file_table WHERE id = 34534;

Then plugging that back into the lo interface from the client side
would let me use bytea as I currently do for files under 1GB, lo for
larger, and gain client-side streaming that is transparent to the
storage of that particular value. Admittedly, application software would
still need to know how to _store_ different values, and manage large
objects, with all the pain that entails. But there's some gain in
unifying the reading part.
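
Client-side, the unified read path would then look something like this
(a sketch only: make_lo_oid() is the hypothetical generator function
above, not something that exists today; conn.lobject() is psycopg2's
existing client lo streaming interface):

import psycopg2

def stream_file(conn, file_id, block_size=64 * 1024):
    # Ask the server for something the lo interface can read, whether the
    # value is stored as a large object or as plain bytea.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT CASE WHEN is_lo THEN my_loid"
            " ELSE make_lo_oid(my_bytea) END"
            " FROM my_file_table WHERE id = %s",
            (file_id,),
        )
        oid = cur.fetchone()[0]
    # From here on, how the value is stored is invisible to the client.
    lob = conn.lobject(oid, mode="rb")
    try:
        while True:
            chunk = lob.read(block_size)
            if not chunk:
                break
            yield chunk
    finally:
        lob.close()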

Hard not to call it an OID, since that's what the client libraries
already document it as (at least, Python's does).

Ross
--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
