Need suggestion

From: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Need suggestion
Date: 2011-06-01 08:08:14
Message-ID: BANLkTi=Vt21VW7TXzr0df8DdwDRgF9o9rw@mail.gmail.com
Lists: pgsql-general

Hello Everyone,

I got a new project with 100 users in Europe. I need to handle
production and sales processes and their documentation in PostgreSQL
with PHP. The load of the sales process is negligible, but every user
produces 2 transactions in the production process, with 10-30 scanned
documents (each 400 kB - 800 kB) and 30-50 high resolution pictures
(each 3-8 MB), and they want to upload them 'somewhere'. 'Somewhere'
could be the server's file system, with a link in the PostgreSQL
database to the location of the files (plus some metadata), or it could
be the PostgreSQL database itself.

My question is: what is your opinion about storing the scanned
documentation and the pictures in the database? This is a huge amount
of data (between 188 MB and 800 MB daily, about 1 TB in an average
year), but it must be searchable, and any document must be retrievable
within 1 hour. All documentation must be stored for up to 5 years... It
means the database could be about 6-7 TB after 5 years, and then we can
start to archive documents. Any other data size is negligible.

If you suggest storing all of the data in PostgreSQL, what is your
recommendation about table and index structure, clustering, and
archiving?

Thank you in advance!
Regards,
Carl


From: Ognjen Blagojevic <ognjen(dot)d(dot)blagojevic(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-02 09:19:03
Message-ID: 4DE75587.60900@gmail.com
Lists: pgsql-general

Carl,

I don't have experience with databases that big, but I have implemented
both solutions; here are the pros of each:

1. Files stored on the filesystem:
- Small database footprint
- Faster backup, export and import

2. Files stored in the database
- RDBMS takes care of transactions and ref. int.
- Slower backup, export and import but all done in one step
- Easier continuous archiving

I slightly prefer option no. 2, since transaction handling, rollback and
referential integrity are not so easy to implement when you have two
different storage systems (FS and RDB).

As for indexes and tables, it is not clear from your message whether you
need just a regular search (field LIKE 'something'), full text search of
the metadata, or full text search of the scanned documents (in case they
are OCRed).
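
If it turns out you need full text search of the metadata, a minimal
sketch would be something like this (table and column names are
invented):

-- a tsvector column plus a GIN index over the document metadata
ALTER TABLE document ADD COLUMN meta_tsv tsvector;
CREATE INDEX document_meta_tsv_idx ON document USING gin (meta_tsv);

-- find documents whose metadata mentions both terms
SELECT id FROM document
WHERE meta_tsv @@ to_tsquery('english', 'invoice & shipment');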

Regards,
Ognjen

On 1.6.2011 10:08, Carl von Clausewitz wrote:
> [...]


From: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>
To: Ognjen Blagojevic <ognjen(dot)d(dot)blagojevic(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-02 10:22:15
Message-ID: BANLkTimqLS2h74pVKJc=ChFessQ7xW22xQ@mail.gmail.com
Lists: pgsql-general

Dear Ognjen,

thank you - that was my idea too, but I've never seen a workload like
this before. The documents (which hopefully won't be processed by any
OCR) and the pictures themselves are of course not indexed; only some
metadata related to each document or picture is. For example:

productions_docu1:
- sent date
- received date
- type
- owner
- case_id
- etc.

image_001:
- picture taken date
- case_id
- image_type

Only these metadata fields need to be searched. My question about the
structure was this: do you recommend storing the images and the
documents in the same table (CREATE TABLE docu_img_store (id BIGSERIAL,
case_id BIGINT, content_type INTEGER, content bytea)), or in two
different tables? Are there any special settings at table creation that
I should set for optimal operation (like indexes, storage parameters,
TOAST, etc.)?

(:-) I know that this project could be a high-revenue engagement for
any DB consultancy, but this is a small country with small project
fees, and I'm an employee, not a contractor, at my company :-)

Thank you in advance,
Regards,
Carl

2011/6/2 Ognjen Blagojevic <ognjen(dot)d(dot)blagojevic(at)gmail(dot)com>

> [...]


From: Ognjen Blagojevic <ognjen(dot)d(dot)blagojevic(at)gmail(dot)com>
To: Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Need suggestion
Date: 2011-06-02 15:33:31
Message-ID: 4DE7AD4B.9050803@gmail.com
Lists: pgsql-general

Carl,

Please keep in mind that I am neither a Postgres expert nor a
consultant; I'm just sharing my experience. I would also like to hear
the opinion of other people who have worked on projects with similar
database sizes.

I would keep all files in a single table -- most probably they will be
served to the user by the same code (e.g. a file download servlet or
something similar), so it is good to have all the relevant info in one
table, something like:

CREATE TABLE file (
    id                bigserial PRIMARY KEY,
    mime_type         text,
    name              text,
    content           bytea,
    creation_date     timestamp,
    modification_date timestamp,
    creation_user     text,
    modification_user text
);

Since both image and document inherit file, you may choose any of the
common RDB inheritance modeling strategies (one table per hierarchy, one
table per class...), but since there are just a few fields, I would put
everything in the same table.

Consider the cardinality between cases and files/users: can one file be
related to two cases, and so on?

The TOAST table will be split into 1 GB segments on disk.
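
One storage parameter that may be worth a look (an assumption on my
part, not something I have measured): scans and JPEGs are already
compressed, so TOAST compression buys little; EXTERNAL keeps the
out-of-line storage but skips the compression step.

-- reusing the table above: bytea stored out of line, uncompressed
ALTER TABLE file ALTER COLUMN content SET STORAGE EXTERNAL;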

Create indexes considering ways your users will browse or search data.

Regards,
Ognjen

On 2.6.2011 12:22, Carl von Clausewitz wrote:
> [...]


From: Ben Chobot <bench(at)silentmedia(dot)com>
To: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Need suggestion
Date: 2011-06-02 16:32:54
Message-ID: FFCD031D-068D-4C52-B37F-1FE99E1B213A@silentmedia.com
Lists: pgsql-general

On Jun 1, 2011, at 1:08 AM, Carl von Clausewitz wrote:

> [...]

So, you're mostly storing ~1 TB of images per year? That doesn't seem so bad. How will the documents be searched? Will their contents be OCR'd and put into a full text search? How many searches will be going on?

If you're asking whether or not it makes sense to store 7TB of images in the database, as opposed to storing links to those images and keeping the images themselves on a normal filesystem, there's no clear answer. Check the archives for pros and cons of each method.


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-02 16:58:51
Message-ID: 4DE7C14B.3080902@hogranch.com
Lists: pgsql-general

On 06/02/11 2:19 AM, Ognjen Blagojevic wrote:
> 1. Files stored on the filesystem:
> - Small database footprint
> - Faster backup, export and import
>
> 2. Files stored in the database
> - RDBMS takes care of transactions and ref. int.
> - Slower backup, export and import but all done in one step
> - Easier continuous archiving

With many terabytes of large file data accumulating, the database will
become very unwieldy to do any maintenance on: a simple pg_dump will
take many hours vs a few minutes.

I would almost certainly use a filesystem for an app like this, and just
store the metadata in the database.
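
Roughly, for illustration (all names invented):

-- metadata lives in Postgres, the file content on the filesystem
CREATE TABLE document (
    id            bigserial PRIMARY KEY,
    case_id       bigint NOT NULL,
    content_type  integer NOT NULL,
    file_path     text NOT NULL,  -- e.g. /data/docs/0001.pdf
    received_date date
);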

--
john r pierce N 37, W 123
santa cruz ca mid-left coast


From: tomas(at)tuxteam(dot)de
To: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Need suggestion
Date: 2011-06-03 05:15:40
Message-ID: 20110603051540.GA16132@tomas
Lists: pgsql-general

On Wed, Jun 01, 2011 at 10:08:14AM +0200, Carl von Clausewitz wrote:
> [...]

Something to consider too -- if you decide to store the big objects in
the database, that is -- is PostgreSQL's large object interface
<http://www.postgresql.org/docs/9.1/static/largeobjects.html>. The
problems with backup someone else mentioned in this thread would remain,
but you wouldn't have large blobs of data clobbering your "regular"
queries. You could pass the scans and pics piecemeal between client and
database without having to store them in the middleware (which may be an
advantage or a disadvantage, mind you).

Don't know whether PHP has bindings for that, though.
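
A minimal server-side sketch, just to illustrate (table name and paths
are made up; note that lo_import/lo_export on the server side require
superuser rights):

-- keep the large object's OID next to the metadata
CREATE TABLE scan (
    id      bigserial PRIMARY KEY,
    case_id bigint,
    content oid  -- reference into pg_largeobject
);

INSERT INTO scan (case_id, content)
VALUES (42, lo_import('/tmp/scan_0001.pdf'));

-- and back out to a file on the server
SELECT lo_export(content, '/tmp/scan_0001_copy.pdf')
FROM scan WHERE id = 1;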

Regards
-- tomás


From: tomas(at)tuxteam(dot)de
To: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Need suggestion
Date: 2011-06-03 05:28:47
Message-ID: 20110603052847.GB16132@tomas
Lists: pgsql-general

Following up on myself:

Just stumbled upon this in the hackers mailing list, which might be
interesting to you, since it highlights pros & cons of current
implementations:

<http://archives.postgresql.org/pgsql-hackers/2011-06/threads.php#00049>

Regards
-- tomás


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-03 08:56:09
Message-ID: 20110603085609.GC2365@hermes.hilbert.loc
Lists: pgsql-general

On Fri, Jun 03, 2011 at 07:15:40AM +0200, tomas(at)tuxteam(dot)de wrote:

> but you wouldn't have large blobs of data clobbering your "regular" queries.

You would want to write better queries than

select * from my_table_with_bytea_column;

anyway.

> You could pass the scans and pics piecemeal between client and database

At least for retrieval even BYTEA can be accessed piecemeal:

select substring(bytea_column from <start> for <number_of_bytes>)
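
For instance, fetching just the first 64 kB of a (hypothetical) image
column:

select substring(content from 1 for 65536)
from docu_img_store
where id = 42;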

Karsten
--
GPG key ID E4071346 @ gpg-keyserver.de
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-03 16:33:30
Message-ID: BANLkTikLBj_zpyE5dCvG9ycHEf3ahX8Ncw@mail.gmail.com
Lists: pgsql-general

Thanks for the replies and suggestions from Ognjen, Ben Chobot, John R
Pierce, Tomás, and Karsten... I checked the links, and I decided that I
cannot decide :-) because I don't know how large the infrastructure for
this could be. If I store the images and scanned documents in the
database, a radically larger environment will be needed than if I store
only the links. My boss will decide, and I will implement whatever he
wants; I just wanted to collect some of the experience that you have,
and you have provided it for me - many thanks for it :-)

Thanks again,
Regards,
Carl.

2011/6/3 Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>

> [...]


From: Esmin Gracic <esmin(dot)gracic(at)gmail(dot)com>
To: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-03 22:09:29
Message-ID: BANLkTim+LvXO7Qa6Qaz+Lc7nyU=+ZCiR3w@mail.gmail.com
Lists: pgsql-general

Another option is using sqlite for storing the images. All data is in a
single file (or several files, if you organize it that way), easier
backup, etc. You get some db benefits while retaining solid speed vs
the file system. I haven't used this, but it seems a viable option to
explore.

Esmin

On Fri, Jun 3, 2011 at 6:33 PM, Carl von Clausewitz
<clausewitz45(at)gmail(dot)com> wrote:

> [...]


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-03 22:44:47
Message-ID: 4DE963DF.4000405@hogranch.com
Lists: pgsql-general

On 06/03/11 3:09 PM, Esmin Gracic wrote:
> Another option is using sqlite for storing the images. All data is in
> a single file (or several files, if you organize it that way), easier
> backup, etc. You get some db benefits while retaining solid speed vs
> the file system. I haven't used this, but it seems a viable option to
> explore.

a single multi-terabyte file? what a *wonderful* idea. *NOT*


From: Esmin Gracic <esmin(dot)gracic(at)gmail(dot)com>
To: John R Pierce <pierce(at)hogranch(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-04 10:18:35
Message-ID: BANLkTiks4eHUuv7xC1r0ABW_TRF1ddG38g@mail.gmail.com
Lists: pgsql-general

*(or several files, if you organize it that way)*
If the problem were that simple, I guess Carl would not have asked the
question in the first place.

There could be one sqlite db file for each day, week or month (1 TB
over 365 days) - something like partitioning on the date dimension.
Actually, sqlite scales well up to 2 TB, so it's not such an
*unwonderful* idea even with one file per year (especially on an XFS
file system).
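
For comparison, the same slicing-by-date idea inside Postgres itself
would be inheritance-based partitioning - a rough sketch, assuming the
parent table carries a creation date column:

-- one child table per month; an old month can later be archived by
-- dumping and dropping just that child table
CREATE TABLE docu_img_store_2011_06 (
    CHECK (created >= DATE '2011-06-01' AND created < DATE '2011-07-01')
) INHERITS (docu_img_store);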

I would use postgres + sqlite as the image-storing engine.

"...but is must be searchable, and any document must be retrieved within 1
hour..." - this is not bleeding edge requirement, so I would use filesystem
if this was 10 sec or something, but having 1 hour timeframe would
definitely made me choose db over filesystem.

Esmin.

On Sat, Jun 4, 2011 at 12:44 AM, John R Pierce <pierce(at)hogranch(dot)com> wrote:

> a single multi-terabyte file? what a *wonderful* idea. *NOT*


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Esmin Gracic <esmin(dot)gracic(at)gmail(dot)com>
Cc: Carl von Clausewitz <clausewitz45(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Need suggestion
Date: 2011-06-07 16:51:11
Message-ID: BANLkTimATh81X=XXE+TaMFxc1LNb66Og8g@mail.gmail.com
Lists: pgsql-general

On Fri, Jun 3, 2011 at 5:09 PM, Esmin Gracic <esmin(dot)gracic(at)gmail(dot)com> wrote:
> Another option is using sqlite for storing the images. All data is in
> a single file (or several files, if you organize it that way), easier
> backup, etc. You get some db benefits while retaining solid speed vs
> the file system. I haven't used this, but it seems a viable option to
> explore.

My postgres database is stored on a single file as well... on my
workstation it's /dev/sda2. Using a loopback device, I could turn that
into a classic file. The point being: having a single file doesn't
eliminate or simplify fragmentation and sync issues -- it just moves
them from one place to another.

sqlite has fundamentally different operational characteristics due to
its architecture. For a number of reasons, it is unsuited to problems
where a multi-user database is typically the tool of choice. For
example, sqlite's locking model is exceptionally crude by comparison,
and intentionally so. Being able to run inside an application's process
is a huge asset, though.

merlin