Re: Large Objects

From: Dan Boitnott <dan(at)mcneese(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Large Objects
Date: 2004-12-27 16:39:48
Message-ID: EC42F2F9-5825-11D9-B847-000D932E24AA@mcneese.edu
Lists: pgsql-general

I need to do some investigation into the way Postgres handles large
objects for a major project. My questions are:
  * Can large objects be stored within a field value or must they be
    referenced by OID?
  * Are large objects backed up in the normal way or does special
    action have to be taken?
  * If the objects are restored or migrated will they retain their OID?
  * If not, is there another means of referencing them that would be
    persistent through migrations?
  * Is it practical/desirable to store files MIME-Encoded inside a
    text field?
      * The obvious advantages:
          * definitely portable across migrations and backups
          * based on universal technology
          * easy to code
          * easy to reference and version-control
      * The obvious disadvantages:
          * slow, Slow, SLOW
          * significant increase in per-file storage requirements

Any help would be greatly appreciated. Hope I'm posting to the right
list.

Dan


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Dan Boitnott <dan(at)mcneese(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Large Objects
Date: 2004-12-31 06:21:40
Message-ID: 20041231062140.GC17555@wolff.to
Lists: pgsql-general

On Mon, Dec 27, 2004 at 10:39:48 -0600,
Dan Boitnott <dan(at)mcneese(dot)edu> wrote:
> I need to do some investigation into the way Postgres handles large
> objects for a major project. My questions are:

I don't know the answer to all of your questions.

> * Is it practical/desirable to store files MIME-Encoded inside a
> text field?

This should be possible if the files aren't too large. bytea is another type
that might be better to use.

> * The obvious disadvantages:
> * slow, Slow, SLOW

If you always need to access the whole file this might not be too bad.
But if you only need to access a small part, you are going to pay a big
cost as the whole record will need to be retrieved before you can pick
out the part you want.

> * significant increase in per-file storage requirements

It might not be too bad as large records can be compressed. That should get
back some of the bloat from uuencoding.


From: "Frank D(dot) Engel, Jr(dot)" <fde101(at)fjrhome(dot)net>
To: pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2004-12-31 15:57:11
Message-ID: A1E09AA0-5B44-11D9-A106-0050E410655F@fjrhome.net
Lists: pgsql-general


I'd advise use of BYTEA as well. It's much simpler to work with than
the OIDs, and has simpler semantics. You do need to escape data before
handing it to the query string, and handle escaped results (see the
docs), but overall much nicer than working with OIDs.
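
For example, something along these lines (a minimal sketch with a
hypothetical table; in the escaped input format, awkward bytes are
written as octal \\nnn, and client libraries offer helpers such as
libpq's PQescapeBytea):

  CREATE TABLE docs (id serial PRIMARY KEY, body bytea);
  -- a NUL, a single quote and a backslash, octal-escaped:
  INSERT INTO docs (body) VALUES ('\\000\\047\\134');
  -- results come back in the same escaped form and must be decoded:
  SELECT body FROM docs WHERE id = 1;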

On Dec 31, 2004, at 1:21 AM, Bruno Wolff III wrote:

> On Mon, Dec 27, 2004 at 10:39:48 -0600,
> Dan Boitnott <dan(at)mcneese(dot)edu> wrote:
>> I need to do some investigation into the way Postgres handles large
>> objects for a major project. My questions are:
>
> I don't know the answer to all of your questions.
>
>> * Is it practical/desirable to store files MIME-Encoded inside a
>> text field?
>
> This should be possible if the files aren't too large. bytea is
> another type
> that might be better to use.
>
>> * The obvious disadvantages:
>> * slow, Slow, SLOW
>
> If you always need to access the whole file this might not be too bad.
> But if you only need to access a small part, you are going to pay a big
> cost as the whole record will need to be retrieved before you can pick
> out the part you want.
>
>> * significant increase in per-file storage requirements
>
> It might not be too bad as large records can be compressed. That
> should get
> back some of the bloat from uuencoding.
>
-----------------------------------------------------------
Frank D. Engel, Jr. <fde101(at)fjrhome(dot)net>

$ ln -s /usr/share/kjvbible /usr/manual
$ true | cat /usr/manual | grep "John 3:16"
John 3:16 For God so loved the world, that he gave his only begotten
Son, that whosoever believeth in him should not perish, but have
everlasting life.
$



From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Frank D(dot) Engel, Jr(dot)" <fde101(at)fjrhome(dot)net>
Cc: pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2004-12-31 17:40:27
Message-ID: 41D58F0B.7020608@commandprompt.com
Lists: pgsql-general

Frank D. Engel, Jr. wrote:

> I'd advise use of BYTEA as well. It's much simpler to work with than
> the OIDs, and has simpler semantics. You do need to escape data
> before handing it to the query string, and handle escaped results (see
> the docs), but overall much nicer than working with OIDs.

BYTEA is not always pragmatic. What if the file is 100 megs? 256 megs?

pg_largeobject is more efficient than BYTEA for larger binaries.

Sincerely,

Joshua D. Drake

>
> On Dec 31, 2004, at 1:21 AM, Bruno Wolff III wrote:
>
>> On Mon, Dec 27, 2004 at 10:39:48 -0600,
>> Dan Boitnott <dan(at)mcneese(dot)edu> wrote:
>>
>>> I need to do some investigation into the way Postgres handles large
>>> objects for a major project. My questions are:
>>
>>
>> I don't know the answer to all of your questions.
>>
>>> * Is it practical/desirable to store files MIME-Encoded inside a
>>> text field?
>>
>>
>> This should be possible if the files aren't too large. bytea is
>> another type
>> that might be better to use.
>>
>>> * The obvious disadvantages:
>>> * slow, Slow, SLOW
>>
>>
>> If you always need to access the whole file this might not be too bad.
>> But if you only need to access a small part, you are going to pay a big
>> cost as the whole record will need to be retrieved before you can pick
>> out the part you want.
>>
>>> * significant increase in per-file storage requirements
>>
>>
>> It might not be too bad as large records can be compressed. That
>> should get
>> back some of the bloat from uuencoding.
>>

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd(at)commandprompt(dot)com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


From: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Frank D(dot) Engel, Jr(dot)" <fde101(at)fjrhome(dot)net>, pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2005-01-01 10:28:04
Message-ID: 41D67B34.2000103@canaan.co.il
Lists: pgsql-general

Joshua D. Drake wrote:

> Frank D. Engel, Jr. wrote:
>
>> I'd advise use of BYTEA as well. It's much simpler to work with than
>> the OIDs, and has simpler semantics. You do need to escape data
>> before handing it to the query string, and handle escaped results
>> (see the docs), but overall much nicer than working with OIDs.
>
>
> BYTEA is not always pragmatic. What if the file is 100 megs? 256 megs?
>
Interesting.
At what size does bytea become ineffective?

Currently I keep all my product images in bytea records. Is that practical?

How much slower is it than accessing an image on a file system (like ext3)?

Cheers

>
> pg_largeobject is more efficient than BYTEA for larger binaries.
>
> Sincerely,
>
> Joshua D. Drake
>
>

--
--------------------------
Canaan Surfing Ltd.
Internet Service Providers
Ben-Nes Michael - Manager
Tel: 972-4-6991122
Cel: 972-52-8555757
Fax: 972-4-6990098
http://www.canaan.net.il
--------------------------


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Frank D(dot) Engel, Jr(dot)" <fde101(at)fjrhome(dot)net>, pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2005-01-01 15:48:46
Message-ID: 20050101154836.GA1090@svana.org
Lists: pgsql-general

On Sat, Jan 01, 2005 at 01:28:04PM +0300, Michael Ben-Nes wrote:
> Joshua D. Drake wrote:
> >Frank D. Engel, Jr. wrote:
> >>I'd advise use of BYTEA as well. It's much simpler to work with than
> >>the OIDs, and has simpler semantics. You do need to escape data
> >>before handing it to the query string, and handle escaped results
> >>(see the docs), but overall much nicer than working with OIDs.
> >
> >
> >BYTEA is not always pragmatic. What if the file is 100 megs? 256 megs?
> >
> Interesting.
> At what size does bytea become ineffective?

I don't think it's so much a matter of effectiveness; it makes no
difference at all in storage space. The issue is that if you store it
in a field, accessing it becomes an all-or-nothing affair, which means
that if it's a 100MB object, the whole thing is going to be retrieved
whenever you ask for it. OTOH, large objects have lo_read/write/seek,
meaning you can access small parts at a time.

So I imagine that if you're storing large PDF files and all you're
doing is dumping them to a client when they ask, it doesn't matter. But
if the objects have structure and you might be interested in looking
inside them without pulling the whole object down, the LO interface is
better suited.
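
For instance, with the server-side large object functions, something
like this sketch (assuming a hypothetical table "files" that stores the
object's OID in a column "pdf_oid"; 262144 is INV_READ, and descriptors
only live until the end of the transaction):

  BEGIN;
  -- open for reading; returns a descriptor, say 0
  SELECT lo_open(pdf_oid, 262144) FROM files WHERE id = 1;
  SELECT lo_lseek(0, 100000, 0);  -- seek descriptor 0 to byte 100000 (0 = SEEK_SET)
  SELECT loread(0, 8192);         -- read just 8KB out of the middle
  COMMIT;

The libpq client-side calls (lo_open, lo_lseek, lo_read) follow the
same pattern.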

Also, when you delete a row, a bytea value contained in it goes away
with it, whereas large objects have a lifecycle outside of normal table
values, and so may need separate managing...

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Michael Ben-Nes <miki(at)canaan(dot)co(dot)il>
Cc: "Frank D(dot) Engel, Jr(dot)" <fde101(at)fjrhome(dot)net>, pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2005-01-01 17:40:09
Message-ID: 41D6E079.7030309@commandprompt.com
Lists: pgsql-general


>>
> Interesting.
> At what size does bytea become ineffective?
>
> Currently I keep all my product images in bytea records. Is that
> practical?

Well, I am going to make the assumption that your product images are
small... sub 100k or something. Bytea is just fine for that. The problem
is when the binary you want to store is 50 megs. When you access that
file you will be using 50 megs of RAM to do so.

Large objects don't work that way; you don't have that memory overhead.
So it really depends on what you want to store.
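
For the big-file case, the usual route is the import/export functions;
a sketch (note that server-side lo_import/lo_export are superuser-only
and read/write paths on the database server machine; the OID returned
is what you would store in your table, and 16397 below is a made-up
OID):

  SELECT lo_import('/tmp/manual.pdf');       -- returns the new object's OID
  SELECT lo_export(16397, '/tmp/copy.pdf');  -- write it back out to a file

psql also has client-side \lo_import and \lo_export commands.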

>
> How much slower is it than accessing an image on a file system (like ext3)?

Well, that would be an interesting test. Ext3 itself is fairly slow,
but I would still assume the filesystem would be faster, simply because
of the database overhead. However, you gain flexibility and
manageability from having the images in the database.

Sincerely,

Joshua D. Drake

>
>
> Cheers
>
>>
>> pg_largeobject is more efficient than BYTEA for larger binaries.
>>
>> Sincerely,
>>
>> Joshua D. Drake
>>
>>
>

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd(at)commandprompt(dot)com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


From: Dan Boitnott <dan(at)mcneese(dot)edu>
To: pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2005-01-02 01:50:30
Message-ID: AEC4C07E-5C60-11D9-83B8-000D932E24AA@mcneese.edu
Lists: pgsql-general

On Jan 1, 2005, at 11:40 AM, Joshua D. Drake wrote:

>
>>>
>> Interesting.
>> At what size does bytea become ineffective?
>>
>> Currently I keep all my product images in bytea records. Is that
>> practical?
>
> Well, I am going to make the assumption that your product images are
> small... sub 100k or something. Bytea is just fine for that. The
> problem is when the binary you want to store is 50 megs. When you
> access that file you will be using 50 megs of RAM to do so.
>
> Large objects don't work that way; you don't have that memory
> overhead. So it really depends on what you want to store.
>

I prefer the _idea_ of using large objects but am worried about the
implications. Without them I can back up the database using pg_dump
and get a single tar file which can perfectly represent the database.
This gives me (and those on high) the warm-fuzzies. If I store files
(PDFs of varying sizes by the way, say from 500k to 50M) as large
objects, will I still be able to restore the _whole_ database from a
single pg_dump tar file?

>
>>
>> How much slower is it than accessing an image on a file system (like
>> ext3)?
>
> Well, that would be an interesting test. Ext3 itself is fairly slow,
> but I would still assume the filesystem would be faster, simply
> because of the database overhead. However, you gain flexibility and
> manageability from having the images in the database.
>
> Sincerely,
>
> Joshua D. Drake
>
>
>
>>
>>
>> Cheers
>>
>>>
>>> pg_largeobject is more efficient than BYTEA for larger binaries.
>>>
>>> Sincerely,
>>>
>>> Joshua D. Drake
>>>
>>>
>>
>
>
> --
> Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
> Postgresql support, programming shared hosting and dedicated hosting.
> +1-503-667-4564 - jd(at)commandprompt(dot)com - http://www.commandprompt.com
> PostgreSQL Replicator -- production quality replication for PostgreSQL
>


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2005-01-02 16:38:24
Message-ID: 20050102173824.C566@hermes.hilbert.loc
Lists: pgsql-general

> > >BYTEA is not always pragmatic. What if the file is 100 megs? 256 megs?

> > At what size does bytea become ineffective?

> I don't think it's so much a matter of effectiveness, it makes no
> difference at all in storage space.
Ah, thanks, good to know. Something new to learn every day...

> The issue is that if you store it
> in a field, accessing it becomes an all or nothing affair, which means
> if it's a 100Mb object, it's all going to be accessed whenever you ask
> for it.
At least for reads that's not quite true: you can use substring() on
bytea quite nicely. Remember, however, that it operates on *bytes*, not
characters. Also be careful about encodings being set for the
connection. At least with PostgreSQL < 7.4 we found we had to "reset
client_encoding" despite the docs saying encodings won't affect bytea
field reads.
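
For instance (a sketch against a hypothetical table "docs" with a bytea
column "body"):

  -- fetch only the first 8KB of the value; FROM/FOR count bytes here
  SELECT substring(body FROM 1 FOR 8192) FROM docs WHERE id = 1;
  -- the workaround we needed on older versions:
  RESET client_encoding;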

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Robby Russell <robby(at)planetargon(dot)com>
To: Dan Boitnott <dan(at)mcneese(dot)edu>
Cc: pgsql general list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2005-01-03 18:15:30
Message-ID: 1104776131.12005.69.camel@linus
Lists: pgsql-general

On Sat, 2005-01-01 at 19:50 -0600, Dan Boitnott wrote:
> On Jan 1, 2005, at 11:40 AM, Joshua D. Drake wrote:
>
> >
> >>>
> > >> Interesting.
> > >> At what size does bytea become ineffective?
> > >>
> > >> Currently I keep all my product images in bytea records. Is that
> > >> practical?
> >
> > > Well, I am going to make the assumption that your product images
> > > are small... sub 100k or something. Bytea is just fine for that.
> > > The problem is when the binary you want to store is 50 megs. When
> > > you access that file you will be using 50 megs of RAM to do so.
> > >
> > > Large objects don't work that way; you don't have that memory
> > > overhead. So it really depends on what you want to store.
> >
>
> I prefer the _idea_ of using large objects but am worried about the
> implications. Without them I can back up the database using pg_dump
> and get a single tar file which can perfectly represent the database.
> This gives me (and those on high) the warm-fuzzies. If I store files
> (PDFs of varying sizes by the way, say from 500k to 50M) as large
> objects, will I still be able to restore the _whole_ database from a
> single pg_dump tar file?

Yes, you will be able to do this, as long as you tell pg_dump to
include large objects:

http://www.postgresql.org/docs/current/static/app-pgdump.html

> -b
> --blobs
>
> Include large objects in dump.
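
For example (a sketch; -b requires one of the non-plain-text output
formats, such as tar or custom):

  $ pg_dump -b -Ft mydb > mydb.tar   # tar-format dump, large objects included
  $ pg_restore -d newdb mydb.tar     # restores the large objects with the rest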

-Robby

--
/***************************************
* Robby Russell | Owner.Developer.Geek
* PLANET ARGON | www.planetargon.com
* Portland, OR | robby(at)planetargon(dot)com
* 503.351.4730 | blog.planetargon.com
* PHP/PostgreSQL Hosting & Development
* --- Now supporting PHP5 ---
****************************************/


From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: "pgsql general list" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Large Objects
Date: 2005-01-03 18:43:04
Message-ID: opsj1h922pcq72hf@musicbox
Lists: pgsql-general


> gives me (and those on high) the warm-fuzzies. If I store files (PDFs
> of varying sizes by the way, say from 500k to 50M) as large objects,
> will I still be able to restore the _whole_ database from a single
> pg_dump tar file?

Don't forget one thing:

If you put a webserver in front of this, you will have to load the
entire object in RAM before you can serve it to the client, which means
that a small number of requests for large files will kick the server
into swap-a-lot mode. Using filesystem files solves this in the easiest
of ways, though it creates backup problems of its own; still, it's not
that complex.