Re: Large rows number, and large objects

From: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Large rows number, and large objects
Date: 2011-06-19 04:06:02
Message-ID: BANLkTikXvotWjfWBra=6+66WZKfHkie4AQ@mail.gmail.com
Lists: pgsql-performance

Greetings,

I have been thinking a lot about pgsql performance when it is dealing
with tables with lots of rows in one table (several million, maybe
thousands of millions). Say, the Large Object use case:

one table holds pointers to large objects.
The large object table stores each large object in 2000-byte chunks
(iirc), so, if we have something like 1TB of data stored in large
objects, the large objects table would have something like 550M rows;
if we get to 8TB, we will have 4400M rows (or so).
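
(Just as a sanity check on those numbers, something like this should show
how big pg_largeobject already is on an existing cluster -- only the
standard catalogs are involved, and reltuples is of course just an
estimate:)

  -- approximate size and row count of the large object catalog
  SELECT pg_size_pretty(pg_total_relation_size('pg_largeobject')) AS total_size,
         (SELECT reltuples::bigint
            FROM pg_class
           WHERE relname = 'pg_largeobject') AS estimated_rows;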

I have read in several places that huge tables should be partitioned
to improve performance.... now, my first question comes: does the
large objects system automatically partition itself? If not: will
Large Objects system performance degrade as we add more data? (I guess
it would.)

Now... I can't fully understand this: why does the performance
actually degrade? I mean, when we do partitioning, we take a
certain parameter to "divide" the data, and then use the same parameter
to issue the request against the correct table... shouldn't the DB
actually do something similar with the indexes? I mean, I have always
thought about indexes exactly like that: an approximate
search. I know I'm looking for, say, a date that is less than
2010-03-02, and the system should just position itself on the index
around that date and scan from that point backward... as far as my
understanding goes, partitioning only adds this kind of "auxiliary"
index, making the system, for example, go to a certain table if the
query targets one particular year (assuming we partitioned by
year). What if the DB actually implemented something like an index for
the index (so that the first search on huge tables scans a smaller
index that points to a position in the larger index, thus avoiding the
scan of the large index initially)?

Well.... I'm writing all of this half-asleep now, so... I'll re-read it
tomorrow... in the meantime, just ignore anything that doesn't make a
lot of sense :) .

Thanks!

Ildefonso Camargo


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large rows number, and large objects
Date: 2011-06-19 11:37:59
Message-ID: BANLkTimrZuk8kvftwuwRakxKD0DRS61hAQ@mail.gmail.com
Lists: pgsql-performance

On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa <
ildefonso(dot)camargo(at)gmail(dot)com> wrote:

> Greetings,
>
> I have been thinking a lot about pgsql performance when it is dealing
> with tables with lots of rows on one table (several millions, maybe
> thousands of millions). Say, the Large Object use case:
>
> one table has large objects (have a pointer to one object).
> The large object table stores the large object in 2000 bytes chunks
> (iirc), so, if we have something like 1TB of data stored in large
> objects, the large objects table would have something like 550M rows,
> if we get to 8TB, we will have 4400M rows (or so).
>
> I have read at several places that huge tables should be partitioned,
> to improve performance.... now, my first question comes: does the
> large objects system automatically partitions itself? if no: will
> Large Objects system performance degrade as we add more data? (I guess
> it would).
>
> Now... I can't fully understand this: why does the performance
> actually goes lower? I mean, when we do partitioning, we take a
> certain parameter to "divide" the data,and then use the same parameter
> to issue the request against the correct table... shouldn't the DB
> actually do something similar with the indexes? I mean, I have always
> thought about the indexes, well, exactly like that: approximation
> search, I know I'm looking for, say, a date that is less than
> 2010-03-02, and the system should just position itself on the index
> around that date, and scan from that point backward... as far as my
> understanding goes, the partitioning only adds like this "auxiliary"
> index, making the system, for example, go to a certain table if the
> query goes toward one particular year (assuming we partitioned by
> year), what if the DB actually implemented something like an Index for
> the Index (so that the first search on huge tables scan on an smaller
> index that points to a position on the larger index, thus avoiding the
> scan of the large index initially).
>
> Well.... I'm writing all of this half-sleep now, so... I'll re-read it
> tomorrow... in the meantime, just ignore anything that doesn't make a
> lot of sense :) .
>

Partitioning helps in a number of ways. First, if a query must scan an
entire table and the table is very large, that scan will be expensive.
Partitioning can allow the query planner to do a sequential scan over just
some of the data and skip the rest (or process it via some other manner,
such as an index lookup).

Also, the larger an index is, the more expensive the index is to maintain.
Inserts and lookups will both take longer. Partitioning will give you n
indexes, each with m/n entries (assuming fairly even distribution of data
among partitions), so any given index will be smaller, which means inserts
into a partition will potentially be much faster. Since large tables often
also have very high insert rates, this can be a big win.

You can also gain better resource utilization by moving less frequently
used partitions onto slower media (via a tablespace), freeing up space on
your fastest i/o devices for the most important data. A lot of
partitioning tends to happen by time, and the most frequently run queries
are often on the partitions containing the most recent data, so it often
can be very beneficial to keep only the most recent partitions on the
fastest storage.

Then there is caching. Indexes and tables are cached by page. Without
clustering a table on a particular index, the contents of a single page
may be quite arbitrary. Even with clustering, depending upon the usage
patterns of the table in question, it is entirely possible that any given
page may only have some fairly small percentage of highly relevant data if
the table is very large. By partitioning, you can (potentially) ensure
that any given page in cache will have a higher density of highly relevant
entries, so you'll get better efficiency out of the caching layers. And
with smaller indexes, it is less likely that loading an index into
shared buffers will push some other useful chunk of data out of the cache.
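
To make that concrete, here is a minimal sketch of the classic
inheritance-style setup (all names are made up for illustration, and the
insert-routing trigger on the parent is omitted):

  -- parent table plus one child per year; the CHECK constraint is what
  -- lets the planner skip irrelevant partitions
  CREATE TABLE measurements (
      id        bigserial,
      logged_at timestamptz NOT NULL,
      payload   text
  );

  CREATE TABLE measurements_2010 (
      CHECK (logged_at >= '2010-01-01' AND logged_at < '2011-01-01')
  ) INHERITS (measurements);

  CREATE TABLE measurements_2011 (
      CHECK (logged_at >= '2011-01-01' AND logged_at < '2012-01-01')
  ) INHERITS (measurements);

  -- rarely-touched historical data can live on slower storage
  ALTER TABLE measurements_2010 SET TABLESPACE slow_disks;

  -- each partition gets its own, much smaller, index
  CREATE INDEX measurements_2011_logged_at_idx
      ON measurements_2011 (logged_at);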

As for the large object tables, I'm not sure about the internals. Assuming
that each table gets its own table for large objects, partitioning the main
table will have the effect of partitioning the large object table, too -
keeping index maintenance more reasonable and ensuring that lookups are as
fast as possible. There's probably a debate to be had on the benefit of
storing very large numbers of large objects in the db, too (as opposed to
keeping references to them in the db and actually accessing them via some
other mechanism). Both product requirements and performance are significant
factors in that discussion.

As for your suggestion that the db maintain an index on an index, how would
the database do so in an intelligent manner? It would have to maintain such
indexes on every index and guess as to which values to use as boundaries for
each bucket. Partitioning solves the same problem, but allows you to direct
the database such that it only does extra work where the dba, who is much
more knowledgeable about the structure of the data and how it will be used
than the database itself, tells it to. And the dba gets to tell the db what
buckets to use when partitioning the database - via the check constraints on
the partitions. Without that, the db would have to guess as to appropriate
bucket sizes and the distribution of values within them.
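
For what it's worth, the planner only uses those CHECK constraints to prune
partitions when constraint exclusion is enabled; continuing the hypothetical
yearly layout sketched above:

  -- 'partition' (the default since 8.4) applies exclusion only to
  -- inheritance and UNION ALL queries
  SET constraint_exclusion = partition;

  -- only measurements_2011 (plus the empty parent) should appear in this
  -- plan, because the other children's constraints exclude them
  EXPLAIN SELECT count(*)
    FROM measurements
   WHERE logged_at >= '2011-06-01' AND logged_at < '2011-07-01';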

I'm sure there are reasons beyond even those I've listed here. I'm not one
of the postgresql devs, so my understanding of how it benefits from
partitioning is shallow, at best. If the usage pattern of your very large
table is such that every query tends to use all of the table, then I'm not
sure partitioning really offers much gain. The benefits of partitioning
are, at least in part, predicated on only a subset of the data being useful
to any one query, and the benefits get that much stronger if some portion of
the data is rarely used by any query.


From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large rows number, and large objects
Date: 2011-06-19 15:49:28
Message-ID: 4DFE1A88.4000304@emolecules.com
Lists: pgsql-performance

On 6/19/11 4:37 AM, Samuel Gendler wrote:
> On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com <mailto:ildefonso(dot)camargo(at)gmail(dot)com>> wrote:
>
> Greetings,
>
> I have been thinking a lot about pgsql performance when it is dealing
> with tables with lots of rows on one table (several millions, maybe
> thousands of millions). Say, the Large Object use case:
>
> one table has large objects (have a pointer to one object).
> The large object table stores the large object in 2000 bytes chunks
> (iirc), so, if we have something like 1TB of data stored in large
> objects, the large objects table would have something like 550M rows,
> if we get to 8TB, we will have 4400M rows (or so).
>
> I have read at several places that huge tables should be partitioned,
> to improve performance.... now, my first question comes: does the
> large objects system automatically partitions itself? if no: will
> Large Objects system performance degrade as we add more data? (I guess
> it would).
>
You should consider "partitioning" your data in a different way: Separate the relational/searchable data from the bulk data that is merely being stored.

Relational databases are just that: relational. The thing they do well is to store relationships between various objects, and they are very good at finding objects using relational queries and logical operators.

But when it comes to storing bulk data, a relational database is no better than a file system.

In our system, each "object" is represented by a big text object of a few kilobytes. Searching that text file is essentially useless -- the only reason it's there is for visualization and to pass on to other applications. So it's separated out into its own table, which only has the text record and a primary key.

We then use other tables to hold extracted fields and computed data about the primary object, and the relationships between the objects. That means we've effectively "partitioned" our data into searchable relational data and non-searchable bulk data. The result is that we have around 50 GB of bulk data that's never searched, and about 1GB of relational, searchable data in a half-dozen other tables.

With this approach, there's no need for table partitioning, and full table scans are quite reasonable.
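
A stripped-down version of that layout might look like this (table and
column names invented for the example):

  -- bulk data: never searched, just stored and retrieved by key
  CREATE TABLE object_blobs (
      object_id bigint PRIMARY KEY,
      raw_text  text NOT NULL
  );

  -- extracted fields and relationships live in small, searchable tables
  CREATE TABLE object_attributes (
      object_id bigint REFERENCES object_blobs (object_id),
      name      text NOT NULL,
      value     text,
      PRIMARY KEY (object_id, name)
  );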

Craig


From: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Large rows number, and large objects
Date: 2011-06-20 02:19:24
Message-ID: BANLkTi=2tGT+aR13bihXfoOM4UkVS+36Qg@mail.gmail.com
Lists: pgsql-performance

Hi!

Thanks (you both, Samuel and Craig) for your answers!

On Sun, Jun 19, 2011 at 11:19 AM, Craig James
<craig_james(at)emolecules(dot)com> wrote:
> On 6/19/11 4:37 AM, Samuel Gendler wrote:
>
> On Sat, Jun 18, 2011 at 9:06 PM, Jose Ildefonso Camargo Tolosa
> <ildefonso(dot)camargo(at)gmail(dot)com> wrote:
>>
>> Greetings,
>>
>> I have been thinking a lot about pgsql performance when it is dealing
>> with tables with lots of rows on one table (several millions, maybe
>> thousands of millions).  Say, the Large Object use case:
>>
>> one table has large objects (have a pointer to one object).
>> The large object table stores the large object in 2000 bytes chunks
>> (iirc), so, if we have something like 1TB of data stored in large
>> objects, the large objects table would have something like 550M rows,
>> if we get to 8TB, we will have 4400M rows (or so).
>>
>> I have read at several places that huge tables should be partitioned,
>> to improve performance.... now, my first question comes: does the
>> large objects system automatically partitions itself? if no: will
>> Large Objects system performance degrade as we add more data? (I guess
>> it would).
>
> You should consider "partitioning" your data in a different way: Separate
> the relational/searchable data from the bulk data that is merely being
> stored.
>
> Relational databases are just that: relational.  The thing they do well is
> to store relationships between various objects, and they are very good at
> finding objects using relational queries and logical operators.
>
> But when it comes to storing bulk data, a relational database is no better
> than a file system.
>
> In our system, each "object" is represented by a big text object of a few
> kilobytes.  Searching that text file is essential useless -- the only reason
> it's there is for visualization and to pass on to other applications.  So
> it's separated out into its own table, which only has the text record and a
> primary key.

Well, my original schema does exactly that (I mimic the LO schema):

files (searchable): id, name, size, hash, mime_type, number_chunks
files_chunks : id, file_id, hash, chunk_number, data (bytea)

So, my bulk data is in the files_chunks table, but because each chunk is
restricted (by me) to 2000 bytes, the total number of rows in the
files_chunks table can get *huge*.

So, the system would search the files table and then the files_chunks
table (to get each of the chunks and, maybe, send them out to the web
client).

So, with a prospect of ~4500M rows for that table, I really thought it
could be a good idea to partition the files_chunks table. Given that I'm
thinking of relatively small files (<100MB), table partitioning should
do great here, because I could manage to keep all of the chunks for a
given file in the same partition. Now, even if the system were to get
larger files (>5GB), this approach should still work.
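
Roughly, the layout I have in mind looks like this (just a sketch: the
partition boundaries are arbitrary, and the insert-routing trigger is
left out):

  CREATE TABLE files (
      id            bigserial PRIMARY KEY,
      name          text NOT NULL,
      size          bigint,
      hash          text,
      mime_type     text,
      number_chunks integer
  );

  CREATE TABLE files_chunks (
      id           bigserial,
      file_id      bigint NOT NULL,
      hash         text,
      chunk_number integer NOT NULL,
      data         bytea
  );

  -- partitioning on file_id keeps every chunk of a given file inside
  -- one partition
  CREATE TABLE files_chunks_p0 (
      CHECK (file_id >= 0 AND file_id < 1000000)
  ) INHERITS (files_chunks);

  CREATE TABLE files_chunks_p1 (
      CHECK (file_id >= 1000000 AND file_id < 2000000)
  ) INHERITS (files_chunks);

  CREATE INDEX files_chunks_p0_file_idx
      ON files_chunks_p0 (file_id, chunk_number);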

The original question was about Large Objects, and partitioning...
see, according to documentation:
http://www.postgresql.org/docs/9.0/static/lo-intro.html

"All large objects are placed in a single system table called pg_largeobject."

So, the question is, if I were to store 8TB worth of data into large
objects system, it would actually make the pg_largeobject table slow,
unless it was automatically partitioned.

Thanks for taking the time to discuss this matter with me!

Sincerely,

Ildefonso Camargo


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
Cc: Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Large rows number, and large objects
Date: 2011-07-19 20:27:54
Message-ID: CA+TgmoYU73jnHa07ZuRZau3EiCUvYu0aBbf_4MnTHkq1O5uuLw@mail.gmail.com
Lists: pgsql-performance

On Sun, Jun 19, 2011 at 10:19 PM, Jose Ildefonso Camargo Tolosa
<ildefonso(dot)camargo(at)gmail(dot)com> wrote:
> So, the question is, if I were to store 8TB worth of data into large
> objects system, it would actually make the pg_largeobject table slow,
> unless it was automatically partitioned.

I think it's a bit of an oversimplification to say that large,
unpartitioned tables are automatically going to be slow. Suppose you
had 100 tables that were each 80GB instead of one table that is 8TB.
The index lookups would be a bit faster on the smaller tables, but it
would take you some non-zero amount of time to figure out which index
to read in the first place. It's not clear that you are really
gaining all that much.

Many of the advantages of partitioning have to do with maintenance
tasks. For example, if you gather data on a daily basis, it's faster
to drop the partition that contains Thursday's data than it is to do a
DELETE that finds the rows and deletes them one at a time. And VACUUM
can be a problem on very large tables as well, because only one VACUUM
can run on a table at any given time. If the frequency with which the
table needs to be vacuumed is less than the time it takes for VACUUM
to complete, then you've got a problem.
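
In other words, with daily partitions the difference is between (table
names here are hypothetical):

  -- dropping Thursday's partition: essentially instantaneous, and it
  -- leaves no dead tuples behind
  DROP TABLE measurements_2011_07_14;

  -- versus deleting the same rows from one huge table: every row has to
  -- be found and marked dead, and VACUUM has to clean up afterwards
  DELETE FROM measurements
   WHERE logged_at >= '2011-07-14' AND logged_at < '2011-07-15';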

But I think that if we want to optimize pg_largeobject, we'd probably
gain a lot more by switching to a different storage format than we
could ever gain by partitioning the table. For example, we might
decide that any object larger than 16MB should be stored in its own
file. Even somewhat smaller objects would likely benefit from being
stored in larger chunks - say, a bunch of 64kB chunks, with any
overage stored in the 2kB chunks we use now. While this might be an
interesting project, it's probably not going to be anyone's top
priority, because it would be a lot of work for the amount of benefit
you'd get. There's an easy workaround: store the files in the
filesystem, and a path to those files in the database.
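
(That is, something as simple as this, with invented names:)

  CREATE TABLE stored_files (
      id        bigserial PRIMARY KEY,
      file_path text NOT NULL,   -- location on the filesystem
      size      bigint,
      sha1      text             -- checksum, so drift can be detected
  );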

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Large rows number, and large objects
Date: 2011-07-20 15:57:29
Message-ID: CAETJ_S-2CwPD+oCzRmnM4i66S6o-ih_qBzdKDknSBCdJFwQvzA@mail.gmail.com
Lists: pgsql-performance

On Tue, Jul 19, 2011 at 3:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Jun 19, 2011 at 10:19 PM, Jose Ildefonso Camargo Tolosa
> <ildefonso(dot)camargo(at)gmail(dot)com> wrote:
> > So, the question is, if I were to store 8TB worth of data into large
> > objects system, it would actually make the pg_largeobject table slow,
> > unless it was automatically partitioned.
>
> I think it's a bit of an oversimplification to say that large,
> unpartitioned tables are automatically going to be slow. Suppose you
> had 100 tables that were each 80GB instead of one table that is 8TB.
> The index lookups would be a bit faster on the smaller tables, but it
> would take you some non-zero amount of time to figure out which index
> to read in the first place. It's not clear that you are really
> gaining all that much.
>

Certainly.... but it is still very blurry to me *when* it is better to
partition and when it is not.

>
> Many of the advantages of partitioning have to do with maintenance
> tasks. For example, if you gather data on a daily basis, it's faster
> to drop the partition that contains Thursday's data than it is to do a
> DELETE that finds the rows and deletes them one at a time. And VACUUM
> can be a problem on very large tables as well, because only one VACUUM
> can run on a table at any given time. If the frequency with which the
> table needs to be vacuumed is less than the time it takes for VACUUM
> to complete, then you've got a problem.
>

And.... doesn't the pg_largeobject table get vacuumed? I mean, isn't that
table just like any other table?

>
> But I think that if we want to optimize pg_largeobject, we'd probably
> gain a lot more by switching to a different storage format than we
> could ever gain by partitioning the table. For example, we might
> decide that any object larger than 16MB should be stored in its own
> file. Even somewhat smaller objects would likely benefit from being
> stored in larger chunks - say, a bunch of 64kB chunks, with any
> overage stored in the 2kB chunks we use now. While this might be an
> interesting project, it's probably not going to be anyone's top
> priority, because it would be a lot of work for the amount of benefit
> you'd get. There's an easy workaround: store the files in the
> filesystem, and a path to those files in the database.
>

Ok, one reason for storing a file *in* the DB is to be able to do PITR on
wrongly deleted (or overwritten, and that kind of stuff) files; at the
filesystem level you would need a versioning filesystem (and I don't yet
know of any that is stable in the Linux world).

Also, you can use streaming replication: at the same time as you stream
your data, your files are also streamed to a secondary server (yes, at the
FS level you could use DRBD or similar).

Ildefonso.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
Cc: Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Large rows number, and large objects
Date: 2011-07-20 19:30:25
Message-ID: CA+TgmobiK855HKR336ScFPzXe6syWjDdUKE+EZHV5ZennX8xMw@mail.gmail.com
Lists: pgsql-performance

On Wed, Jul 20, 2011 at 11:57 AM, Jose Ildefonso Camargo Tolosa
<ildefonso(dot)camargo(at)gmail(dot)com> wrote:
> On Tue, Jul 19, 2011 at 3:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Jun 19, 2011 at 10:19 PM, Jose Ildefonso Camargo Tolosa
>> <ildefonso(dot)camargo(at)gmail(dot)com> wrote:
>> > So, the question is, if I were to store 8TB worth of data into large
>> > objects system, it would actually make the pg_largeobject table slow,
>> > unless it was automatically partitioned.
>>
>> I think it's a bit of an oversimplification to say that large,
>> unpartitioned tables are automatically going to be slow.  Suppose you
>> had 100 tables that were each 80GB instead of one table that is 8TB.
>> The index lookups would be a bit faster on the smaller tables, but it
>> would take you some non-zero amount of time to figure out which index
>> to read in the first place.  It's not clear that you are really
>> gaining all that much.
>
> Certainly.... but it is still very blurry to me on *when* it is better to
> partition than not.

I think that figuring that out is as much an art as it is a science.
It's better to partition when most of your queries are going to touch
only a single partition; when you are likely to want to remove
partitions in their entirety; when VACUUM starts to have trouble
keeping up... but the reality is that in some cases you probably have
to try it both ways and see which one works better.

>> Many of the advantages of partitioning have to do with maintenance
>> tasks.  For example, if you gather data on a daily basis, it's faster
>> to drop the partition that contains Thursday's data than it is to do a
>> DELETE that finds the rows and deletes them one at a time.  And VACUUM
>> can be a problem on very large tables as well, because only one VACUUM
>> can run on a table at any given time.  If the frequency with which the
>> table needs to be vacuumed is less than the time it takes for VACUUM
>> to complete, then you've got a problem.
>
> And.... pg_largeobject table doesn't get vacuumed? I mean, isn't that table
> just as any other table?

Yes, it is. So, I agree: putting 8TB of data in there is probably
going to hurt.

>> But I think that if we want to optimize pg_largeobject, we'd probably
>> gain a lot more by switching to a different storage format than we
>> could ever gain by partitioning the table.  For example, we might
>> decide that any object larger than 16MB should be stored in its own
>> file.  Even somewhat smaller objects would likely benefit from being
>> stored in larger chunks - say, a bunch of 64kB chunks, with any
>> overage stored in the 2kB chunks we use now.  While this might be an
>> interesting project, it's probably not going to be anyone's top
>> priority, because it would be a lot of work for the amount of benefit
>> you'd get.  There's an easy workaround: store the files in the
>> filesystem, and a path to those files in the database.
>
> Ok, one reason for storing a file *in* the DB is to be able to do PITR of a
> wrongly deleted files (or overwritten, and that kind of stuff), on the
> filesystem level you would need a versioning filesystem (and I don't, yet,
> know any that is stable in the Linux world).
>
> Also, you can use streaming replication and at the same time you stream your
> data, your files are also streamed to a secondary server (yes, on the
> FS-level you could use drbd or similar).

Well, those are good arguments for putting the functionality in the
database and making it all play nicely with write-ahead logging. But
nobody's felt motivated to write the code yet, so...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrzej Nakonieczny <dzemik-pgsql-performance(at)e-list(dot)pingwin(dot)eu(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Cc: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig James <craig_james(at)emolecules(dot)com>
Subject: Re: Large rows number, and large objects
Date: 2011-07-20 19:33:53
Message-ID: 4E272DA1.6080205@e-list.pingwin.eu.org
Lists: pgsql-performance

On 20.07.2011 17:57, Jose Ildefonso Camargo Tolosa wrote:

[...]

> Many of the advantages of partitioning have to do with maintenance
> tasks. For example, if you gather data on a daily basis, it's faster
> to drop the partition that contains Thursday's data than it is to do a
> DELETE that finds the rows and deletes them one at a time. And VACUUM
> can be a problem on very large tables as well, because only one VACUUM
> can run on a table at any given time. If the frequency with which the
> table needs to be vacuumed is less than the time it takes for VACUUM
> to complete, then you've got a problem.
>
>
> And.... pg_largeobject table doesn't get vacuumed? I mean, isn't that
> table just as any other table?

Vacuum is a real problem on a big pg_largeobject table. I have a 1.6 TB
database, mostly large objects, and vacuuming that table on a fast SAN
takes about 4 hours:

         now         |        start        |   time   |  datname   |                current_query
---------------------+---------------------+----------+------------+----------------------------------------------
 2011-07-20 20:12:03 | 2011-07-20 16:21:20 | 03:50:43 | bigdb      | autovacuum: VACUUM pg_catalog.pg_largeobject
(1 row)

LO generates a lot of dead tuples when objects are being added:

     relname      | n_dead_tup
------------------+------------
 pg_largeobject   |     246731

Adding an LO is very fast when the table is vacuumed. But when there are
a lot of dead tuples, adding an LO is very slow (50-100 times slower) and
eats 100% of CPU.
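
(For anyone who wants to watch this on their own system, the dead tuple
counter is exposed by the statistics views, e.g. something like:)

  SELECT relname, n_dead_tup, last_autovacuum
    FROM pg_stat_all_tables
   WHERE relname = 'pg_largeobject';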

It looks like the better way is writing the objects directly as bytea
into partitioned tables, although it's a bit slower than the LO interface
on a vacuumed table.

Regards,
Andrzej


From: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
To: Andrzej Nakonieczny <dzemik-pgsql-performance(at)e-list(dot)pingwin(dot)eu(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig James <craig_james(at)emolecules(dot)com>
Subject: Re: Large rows number, and large objects
Date: 2011-07-22 00:46:40
Message-ID: CAETJ_S9UtECSQUezPL6a-ZgnLNBFC_o6-k7WFrYdnsuqYXgcRw@mail.gmail.com
Lists: pgsql-performance

On Wed, Jul 20, 2011 at 3:03 PM, Andrzej Nakonieczny
<dzemik-pgsql-performance(at)e-list(dot)pingwin(dot)eu(dot)org> wrote:
> W dniu 20.07.2011 17:57, Jose Ildefonso Camargo Tolosa pisze:
>
> [...]
>
>>    Many of the advantages of partitioning have to do with maintenance
>>    tasks.  For example, if you gather data on a daily basis, it's faster
>>    to drop the partition that contains Thursday's data than it is to do a
>>    DELETE that finds the rows and deletes them one at a time.  And VACUUM
>>    can be a problem on very large tables as well, because only one VACUUM
>>    can run on a table at any given time.  If the frequency with which the
>>    table needs to be vacuumed is less than the time it takes for VACUUM
>>    to complete, then you've got a problem.
>>
>>
>> And.... pg_largeobject table doesn't get vacuumed? I mean, isn't that
>> table just as any other table?
>
> Vacuum is a real problem on big pg_largeobject table. I have 1.6 TB database
> mostly with large objects and vacuuming that table on fast SAN takes about 4
> hours:
>
>         now         |        start        |   time   |  datname   |                current_query
> ---------------------+---------------------+----------+------------+----------------------------------------------
>  2011-07-20 20:12:03 | 2011-07-20 16:21:20 | 03:50:43 | bigdb      | autovacuum: VACUUM pg_catalog.pg_largeobject
> (1 row)
>
>
> LO generates a lot of dead tuples when object are adding:
>
>     relname      | n_dead_tup
> ------------------+------------
>  pg_largeobject   |     246731
>
> Adding LO is very fast when table is vacuumed. But when there is a lot of
> dead tuples adding LO is very slow (50-100 times slower) and eats 100% of
> CPU.
>
> It looks that better way is writing object directly as a bytea on paritioned
> tables althought it's a bit slower than LO interface on a vacuumed table.

Well... yes... I thought about that, but then, what happens when
you need to fetch the file from the DB? Will it be fetched
completely at once? I'm thinking about large files here, say
(hypothetically speaking) you have 1GB files stored.... if the system
fetches the whole 1GB at once, it would take 1GB of RAM (or not?), and
that's what I wanted to avoid by dividing the file into 2kB chunks
(bytea chunks, actually).... I don't quite remember where I got the
2kB size from... but I decided I wanted to avoid using TOAST too.
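
(What I want to avoid is pulling the whole thing into memory at once; with
the chunked layout the application can stream the file a few chunks at a
time, for example with a cursor -- just a sketch against the files_chunks
table I described earlier, with an arbitrary file_id:)

  BEGIN;
  DECLARE file_cur CURSOR FOR
      SELECT data
        FROM files_chunks
       WHERE file_id = 42
       ORDER BY chunk_number;
  -- the application loops over FETCH until no rows come back
  FETCH 100 FROM file_cur;
  CLOSE file_cur;
  COMMIT;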

>
>
> Regards,
> Andrzej
>