Storing many big files in database - should I do it?

Lists: pgsql-general
From: Rod <cckramer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Storing many big files in database- should I do it?
Date: 2010-04-27 08:42:40
Message-ID: l2kb4c00a111004270142uecd400b9r1a202ea9d154eca@mail.gmail.com

Hello,

I have a web application where users upload/share files.
After a file is uploaded it is copied to S3, and all subsequent downloads
are done from there.
So in a file's lifetime it is accessed only twice: when created and
when copied to S3.

The files are documents of varying size, from a few kilobytes to 200
megabytes. Number of files: thousands to hundreds of thousands.

My dilemma is: should I store the files in the PostgreSQL database, or
store them in the filesystem and keep only metadata in the database?

I see these possible cons of using PostgreSQL as storage:
- more network bandwidth required compared to accessing an NFS-mounted filesystem?
- if the database becomes corrupt, you can't recover individual files
- you can't back up a live database unless you install complicated
replication add-ons
- more CPU required to store/retrieve files (compared to filesystem access)
- size overhead, e.g. storing 1000 bytes will take 1000 bytes in the
database plus ~100 bytes for db metadata, indexes, etc.; with a lot of
files this adds up

Are these concerns valid?
Anyone had this kind of design problem and how did you solve it?

Thanks.


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-27 08:54:15
Message-ID: 4BD6A637.8030902@hogranch.com
Lists: pgsql-general

Rod wrote:
> I have a web application where users upload/share files.
> After a file is uploaded it is copied to S3, and all subsequent downloads
> are done from there.
> [...]

S3 storage is not suitable for running an RDBMS.

An RDBMS wants fast, low-latency storage with 8 kB block random reads and
writes; S3 is high-latency and oriented towards streaming.


From: Rod <cckramer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-27 08:58:30
Message-ID: q2yb4c00a111004270158rb76cc76dj52c7c9f709eb1a14@mail.gmail.com
Lists: pgsql-general

No, I'm not running the RDBMS on S3; I didn't write that in my post.
S3 is used as a CDN, only for downloading files.

On Tue, Apr 27, 2010 at 6:54 PM, John R Pierce <pierce(at)hogranch(dot)com> wrote:
> S3 storage is not suitable for running an RDBMS.
>
> An RDBMS wants fast, low-latency storage with 8 kB block random reads and
> writes; S3 is high-latency and oriented towards streaming.


From: "Massa, Harald Armin" <chef(at)ghum(dot)de>
To: Rod <cckramer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-27 09:03:42
Message-ID: g2ue3e180dc1004270203w9c7edf0epd25c4459f7cab5a2@mail.gmail.com
Lists: pgsql-general

> No, I'm not running the RDBMS on S3; I didn't write that in my post.
> S3 is used as a CDN, only for downloading files.

So you are storing your files on S3?

Why should you store those files additionally in a PostgreSQL database?

If you want to keep track of them / remember metadata, hashes will do the
job with much less memory.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
%s is too gigantic of an industry to bend to the whims of reality


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Rod <cckramer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-27 09:17:42
Message-ID: q2ge94e14cd1004270217n67faa189r8f5df165571a0c80@mail.gmail.com
Lists: pgsql-general

2010/4/27 Rod <cckramer(at)gmail(dot)com>:
> [...]
> Are these concerns valid?

yes

> Anyone had this kind of design problem and how did you solve it?

store your files in a filesystem, and keep the path to the file (plus
metadata, ACLs, etc.) in the database.
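
Just as a sketch, the metadata side can be as simple as this (the table and
column names here are made up for the example):

CREATE TABLE file_meta (
    id          bigserial PRIMARY KEY,
    path        text NOT NULL UNIQUE,  -- relative path under the storage root
    size_bytes  bigint NOT NULL,
    sha1_hex    char(40),              -- content hash, handy for integrity checks
    acl         text,                  -- or a separate join table
    uploaded_at timestamptz NOT NULL DEFAULT now()
);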


--
Cédric Villemain


From: Rod <cckramer(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-27 09:23:31
Message-ID: j2ib4c00a111004270223xe2764897w4e8938caa83c6eb7@mail.gmail.com
Lists: pgsql-general

S3 is not the primary storage for the files; it's a distribution system.
We want to be able to switch from S3 to another CDN if required.
So the "master" copies of the files are kept on a private server. The
question is whether that should be the database or the filesystem.

On Tue, Apr 27, 2010 at 7:03 PM, Massa, Harald Armin <chef(at)ghum(dot)de> wrote:
> So you are storing your files on S3?
>
> Why should you store those files additionally in a PostgreSQL database?


From: Adrian von Bidder <avbidder(at)fortytwo(dot)ch>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-27 10:43:28
Message-ID: 201004271243.35493@fortytwo.ch
Lists: pgsql-general

On Tuesday 27 April 2010 11.17:42 Cédric Villemain wrote:
> > Anyone had this kind of design problem and how did you solve it?
>
> store your files in a filesystem, and keep the path to the file (plus
> metadata, ACLs, etc.) in the database.

... and be careful that the db and the file storage do not go out of sync.

But if files are only ever added and possibly removed (but never changed),
this is not too hard:

* be sure to commit the db transaction only after the file has been written
to disk (use fsync or similar to be sure!); for file deletions, first delete
the db metadata, then delete the file
* be sure to detect failed writes and abort the db transaction, or otherwise
properly handle errors while storing the file
* occasionally run a clean-up to remove files that were written to the
filesystem but whose db metadata was not stored; this should be a rare case,
but it probably will happen

PostgreSQL supports 2PC (PREPARE TRANSACTION and then COMMIT PREPARED as
separate steps); you may want to use this: PREPARE the database transaction,
then do the filesystem operations; if a filesystem operation fails, you can
ROLLBACK the prepared transaction, otherwise COMMIT it. That way, you don't
lose transactional semantics.
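
A minimal sketch of that flow in SQL (the table, path and transaction names
are invented; max_prepared_transactions must be set above zero for PREPARE
TRANSACTION to work):

BEGIN;
INSERT INTO file_meta (path, size_bytes)
    VALUES ('ab/cd/abcd1234.doc', 123456);
PREPARE TRANSACTION 'file-abcd1234';
-- now write the file to disk and fsync it, then run exactly one of:
COMMIT PREPARED 'file-abcd1234';    -- the file write succeeded
ROLLBACK PREPARED 'file-abcd1234';  -- the file write failed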

Backup requires some more thought. I guess you could use some kind of
volume management to get filesystem snapshots, but you have to be sure the
fs snapshot reflects the point in time when the database backup was made.
Depending on load/availability requirements, you may get away with stopping
data modification at the application level for a few seconds until the db
backup has started and the filesystem snapshot has been created.

cheers
-- vbi

--
featured product: PostgreSQL - http://postgresql.org


From: Anthony <osm(at)inbox(dot)org>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Rod <cckramer(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-28 00:45:43
Message-ID: m2n71cd4dd91004271745z3d75b328z258e1a5818f1779b@mail.gmail.com
Lists: pgsql-general

On Tue, Apr 27, 2010 at 5:17 AM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:

> store your files in a filesystem, and keep the path to the file (plus
> metadata, ACLs, etc.) in the database.

What type of filesystem is good for this? A filesystem with support for
storing tens of thousands of files in a single directory, or should one play
the 41/56/34/41563489.ext game?

Are there any open source systems which handle keeping a filesystem and
database in sync for this purpose, or is it a wheel that keeps getting
reinvented?

I know "store your files in a filesystem" is the best long-term solution.
But it's just so much easier to throw everything in the database.


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Anthony <osm(at)inbox(dot)org>, Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>, Rod <cckramer(at)gmail(dot)com>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-28 02:32:38
Message-ID: 201004271932.38494.adrian.klaver@gmail.com
Lists: pgsql-general

On Tuesday 27 April 2010 5:45:43 pm Anthony wrote:
> Are there any open source systems which handle keeping a filesystem and
> database in sync for this purpose, or is it a wheel that keeps getting
> reinvented?

In the for-what-it's-worth department, check out this wiki:
http://sourceforge.net/apps/mediawiki/fuse/index.php?title=DatabaseFileSystems

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: adrian(dot)klaver(at)gmail(dot)com
Cc: pgsql-general(at)postgresql(dot)org, Anthony <osm(at)inbox(dot)org>, Rod <cckramer(at)gmail(dot)com>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 09:10:48
Message-ID: o2ie94e14cd1004290210y354cbf68s43422977260cecea@mail.gmail.com
Lists: pgsql-general

2010/4/28 Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>:
> On Tuesday 27 April 2010 5:45:43 pm Anthony wrote:
>> What type of filesystem is good for this? A filesystem with support for
>> storing tens of thousands of files in a single directory, or should one
>> play the 41/56/34/41563489.ext game?

I'd prefer to go with XFS or ext{3,4}, in both cases with a path game.
The path game will let you handle the scalability of your uploads (so the
first increment is the first directory), something like 1/2/3/4/foo.file,
2/2/3/4/bar.file, etc. You might explore a hash function, or something
that splits a SHA-1 (or other) sum of the file to get the path.
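
For example, deriving the path from a hash entirely in SQL (the built-in
md5() is used here just for illustration; a SHA-1 from pgcrypto's digest()
would work the same way):

SELECT substr(h, 1, 2) || '/' || substr(h, 3, 2) || '/'
       || substr(h, 5, 2) || '/' || h || '.file' AS path
FROM (SELECT md5('the file contents') AS h) AS s;
-- two hex characters per level caps each directory at 256 entries,
-- e.g. ab/cd/ef/abcdef0123....file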

>> Are there any open source systems which handle keeping a filesystem and
>> database in sync for this purpose, or is it a wheel that keeps getting
>> reinvented?
>
> In the for-what-it's-worth department, check out this wiki:
> http://sourceforge.net/apps/mediawiki/fuse/index.php?title=DatabaseFileSystems

and postgres fuse also :-D


--
Cédric Villemain


From: David Wall <d(dot)wall(at)computer(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 16:07:52
Message-ID: 4BD9AED8.5060502@computer.org
Lists: pgsql-general

Things to consider when /not/ storing them in the DB:

1) Backups of the DB are incomplete without a corresponding backup of the files.

2) No transactional integrity between the filesystem and the DB, so you will
have to deal with orphans from both INSERT and DELETE (assuming you don't
also update the files).

3) No built-in ability for replication, such as WAL shipping.

The big downside for the DB is that all large objects appear to be stored
together in pg_catalog.pg_largeobject, which seems axiomatically troubling:
you know you have lots of big data, so you then store it all together, and
then worry about running out of 'loids' (large-object OIDs).

David



From: Justin Graf <justin(at)magwerks(dot)com>
To: David Wall <d(dot)wall(at)computer(dot)org>, Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 16:45:55
Message-ID: 4BD9B7C3.7090309@magwerks.com
Lists: pgsql-general

On 4/29/2010 12:07 PM, David Wall wrote:
> The big downside for the DB is that all large objects appear to be stored
> together in pg_catalog.pg_largeobject, which seems axiomatically troubling:
> you know you have lots of big data, so you then store it all together, and
> then worry about running out of 'loids' (large-object OIDs).
Huh??? Isn't that the point of using the bytea or text datatypes?

I could have sworn bytea does not use the large object interface; it uses
TOAST. Or have I gone insane?

Many people encode the binary data in Base64 and store it as the text data
type, and then never have to deal with escaping the bytea data type, which
I have found can be a pain.
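
For what it's worth, that round trip is just encode()/decode() in SQL (the
"docs" table here is invented):

-- store: base64 text instead of raw bytea
INSERT INTO docs (content) VALUES (encode(E'\\000\\001\\002'::bytea, 'base64'));
-- retrieve: decode back to bytea
SELECT decode(content, 'base64') FROM docs;
-- the price is the usual base64 inflation of roughly 33% on disk and on the wire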



From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Justin Graf <justin(at)magwerks(dot)com>
Cc: David Wall <d(dot)wall(at)computer(dot)org>, Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 16:52:20
Message-ID: 4BD9B944.4090604@lelarge.info
Lists: pgsql-general

On 29/04/2010 18:45, Justin Graf wrote:
> Huh??? Isn't that the point of using the bytea or text datatypes?
>
> I could have sworn bytea does not use the large object interface; it uses
> TOAST. Or have I gone insane?

You're not insane :)

To put it another way: bytea values are not stored in the pg_largeobject
catalog.
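
You can see that for yourself (assuming a throwaway table with a bytea
column):

CREATE TABLE docs (id serial PRIMARY KEY, data bytea);
-- every table with TOASTable columns gets its own toast relation:
SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'docs';
-- big bytea values are pushed out to that relation; only objects created
-- with lo_create()/lo_import() land in pg_catalog.pg_largeobject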

--
Guillaume.
http://www.postgresqlfr.org
http://dalibo.com


From: David Wall <d(dot)wall(at)computer(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 17:51:53
Message-ID: 4BD9C739.70708@computer.org
Lists: pgsql-general


>> Huh??? Isn't that the point of using the bytea or text datatypes?
>>
>> I could have sworn bytea does not use the large object interface; it uses
>> TOAST. Or have I gone insane?
>
> You're not insane :)
>
> To put it another way: bytea values are not stored in the pg_largeobject
> catalog.

I missed the part that BYTEA was being used, since it's generally not a
good way to store large binary data: you are right that BYTEA requires
escaping across the wire (client to backend) in both directions, which for
true binary data (like compressed/encrypted data, images or other non-text
files) makes for a lot of expansion in size and related memory use.

BYTEA and TEXT can both store up to 1 GB of data (the max field length),
which means even less "file size" supported if you use TEXT with base64
coding. LOs support 2 GB of data. In JDBC, BYTEA is typically used with
byte[] or a binary stream, while LOs are used with BLOB. I think LOs also
allow streaming with the backend, but I'm not sure about that, whereas I'm
pretty sure BYTEA/TEXT move all the data together, so it will be in memory
all or nothing.

Of course, to support file storage larger than 1 GB or 2 GB, you'll have
to create your own "toast"-like capability to split files into multiple
rows, as sketched below.
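
A minimal sketch of such a chunked layout (all names invented):

CREATE TABLE file_chunk (
    file_id  bigint NOT NULL,
    seq      int    NOT NULL,   -- chunk order within the file
    data     bytea  NOT NULL,   -- e.g. a few megabytes per chunk
    PRIMARY KEY (file_id, seq)
);
-- the client reassembles a file by reading its chunks in order:
SELECT data FROM file_chunk WHERE file_id = 42 ORDER BY seq;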

David


From: Justin Graf <justin(at)magwerks(dot)com>
To: David Wall <d(dot)wall(at)computer(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 18:07:31
Message-ID: 4BD9CAE3.2000804@magwerks.com
Lists: pgsql-general

On 4/29/2010 1:51 PM, David Wall wrote:
> BYTEA and TEXT can both store up to 1 GB of data (the max field length),
> which means even less "file size" supported if you use TEXT with base64
> coding. LOs support 2 GB of data.
> [...]
Outside of videos/media streams, what other kind of data is going to be
1 GB in size? That's a lot of data, even today.

We all talk about the 1 GB and 2 GB limits, but really, who has bumped into
them on a regular basis??? Every time I hear about them not being big
enough, the person is trying to shoehorn media files into the database,
which is insane.



From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: David Wall <d(dot)wall(at)computer(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 18:32:19
Message-ID: j2ob42b73151004291132n923d4a5i161a6d30e57802fe@mail.gmail.com
Lists: pgsql-general

On Thu, Apr 29, 2010 at 1:51 PM, David Wall <d(dot)wall(at)computer(dot)org> wrote:
> I missed the part that BYTEA was being used, since it's generally not a
> good way to store large binary data: you are right that BYTEA requires
> escaping across the wire (client to backend) in both directions, which for
> true binary data (like compressed/encrypted data, images or other non-text
> files) makes for a lot of expansion in size and related memory use.

What?? PostgreSQL supports binary data in both directions without
escaping. Here is how I do it with libpqtypes:

#include <libpqtypes.h>

PGbytea b;
b.data = some_pointer;  /* the raw bytes, sent in binary format */
b.len = data_length;    /* number of bytes */

/* %bytea* takes a pointer to the PGbytea */
PGresult *res = PQexecf(conn, "insert into mytable values (%bytea*);", &b);

merlin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Justin Graf <justin(at)magwerks(dot)com>
Cc: David Wall <d(dot)wall(at)computer(dot)org>, Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 19:11:40
Message-ID: 20100429191140.GE3296@alvh.no-ip.org
Lists: pgsql-general

Justin Graf wrote:
> I could have sworn bytea does not use the large object interface; it uses
> TOAST. Or have I gone insane?

Each toasted object also requires an OID, so you cannot have more than 4
billion toasted attributes in a table.

I've never seen this be a problem in real life, but if you're talking
about having that many large objects, then it will be a problem with
TOAST too.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Justin Graf <justin(at)magwerks(dot)com>, David Wall <d(dot)wall(at)computer(dot)org>, Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 19:18:11
Message-ID: 17549.1272568691@sss.pgh.pa.us
Lists: pgsql-general

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Each toasted object also requires an OID, so you cannot have more than 4
> billion toasted attributes in a table.

> I've never seen this be a problem in real life, but if you're talking
> about having that many large objects, then it will be a problem with
> TOAST too.

However, that toast limit is per-table, whereas the pg_largeobject limit
is per-database. So for example if you have a partitioned table then
the toast limit only applies per partition. With large objects you'd
fall over at 4G objects (probably quite a bit less in practice) no
matter what.
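
A sketch of what that buys you with plain inheritance partitioning (table
names invented):

CREATE TABLE docs (id bigint NOT NULL, data bytea);
CREATE TABLE docs_part_0 (CHECK (id % 2 = 0)) INHERITS (docs);
CREATE TABLE docs_part_1 (CHECK (id % 2 = 1)) INHERITS (docs);
-- each child table gets its own toast relation, and with it its own
-- 4G OID space for toasted values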

regards, tom lane


From: Justin Graf <justin(at)magwerks(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, David Wall <d(dot)wall(at)computer(dot)org>, Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 19:33:46
Message-ID: 4BD9DF1A.2080906@magwerks.com
Lists: pgsql-general

On 4/29/2010 3:18 PM, Tom Lane wrote:
> However, that toast limit is per-table, whereas the pg_largeobject limit
> is per-database. So for example if you have a partitioned table then
> the toast limit only applies per partition. With large objects you'd
> fall over at 4G objects (probably quite a bit less in practice) no
> matter what.
>
Has there been any thought of doing something similar to MS SQL Server's
FILESTREAM?
http://msdn.microsoft.com/en-us/library/cc949109.aspx

It seems to overcome all the drawbacks of storing files in the DB.



From: Scott Ribe <scott_ribe(at)killerbytes(dot)com>
To: Justin Graf <justin(at)magwerks(dot)com>
Cc: David Wall <d(dot)wall(at)computer(dot)org>, Postgres General Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Storing many big files in database- should I do it?
Date: 2010-04-29 19:38:39
Message-ID: 11DB60AB-60F6-4EF2-8CFD-1009D12B8456@killerbytes.com
Lists: pgsql-general

On Apr 29, 2010, at 10:45 AM, Justin Graf wrote:

> Many people encode the binary data in Base64 and store it as the text data
> type, and then never have to deal with escaping the bytea data type, which
> I have found can be a pain.

Damn. Wish I'd thought of that ;-)

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice