Re: PostgreSQL Arrays and Performance

From: Marc Philipp <mail(at)marcphilipp(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: PostgreSQL Arrays and Performance
Date: 2006-01-03 15:46:35
Message-ID: 927255D6-5FD9-4D06-93B6-C3FE25395C61@marcphilipp.de
Lists: pgsql-general

A few performance issues with PostgreSQL's arrays led us to the
question of how Postgres actually stores variable-length arrays.
First, let me explain our situation.

We have a rather large table containing a simple integer primary key
and a couple more columns of fixed size. However, there is a dates
column of type "timestamp without time zone[]" that is apparently
causing some severe performance problems.

During a daily update process new timestamps are collected and
existing data rows are being updated (new rows are also being added).
These changes affect a large percentage of the existing rows.

What we have observed over the last few weeks is that the overall
database size is increasing rapidly because of this table, and that
vacuum processes seem to deadlock with other processes querying data
from this table.

As a result, the database keeps growing and becomes more and more
unusable. The only thing that helps is dumping and restoring it,
which is not something you want to do on a large live system on a
daily basis.

This problem led us to the question of how these arrays are stored
internally. Are they stored "in-place" with the other columns or
merely as a pointer to another file?

Would it be more efficient not to use an array for this purpose, but
to split the table into two parts?
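
Sketched with made-up names, the two layouts we are weighing would
look roughly like this:

```sql
-- Current layout (sketch): all timestamps in one array column
CREATE TABLE items (
    id    integer PRIMARY KEY,
    dates timestamp without time zone[]  -- rewritten on every update
    -- ... a couple more fixed-size columns
);

-- Alternative layout (sketch): one row per timestamp
CREATE TABLE item_dates (
    item_id integer REFERENCES items (id),
    d       timestamp without time zone
);
```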

Any help is appreciated!

Marc Philipp


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marc Philipp <mail(at)marcphilipp(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Arrays and Performance
Date: 2006-01-03 18:40:44
Message-ID: 18853.1136313644@sss.pgh.pa.us
Lists: pgsql-general

Marc Philipp <mail(at)marcphilipp(dot)de> writes:
> A few performance issues with PostgreSQL's arrays led us to the
> question of how Postgres actually stores variable-length arrays.
> First, let me explain our situation.

> We have a rather large table containing a simple integer primary key
> and a couple more columns of fixed size. However, there is a dates
> column of type "timestamp without time zone[]" that is apparently
> causing some severe performance problems.

How large are the arrays? PG is definitely not designed to do well with
very large arrays (say more than a couple hundred elements). You should
reconsider your data design if you find yourself trying to do that.
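
You can check easily enough with something like this (substitute your
real table and column names):

```sql
-- array_upper() returns the highest subscript along the given dimension
SELECT max(array_upper(dates, 1)) AS longest_array FROM your_table;
```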

regards, tom lane


From: Joe Conway <mail(at)joeconway(dot)com>
To: Marc Philipp <mail(at)marcphilipp(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Arrays and Performance
Date: 2006-01-03 19:13:15
Message-ID: 43BACCCB.7010903@joeconway.com
Lists: pgsql-general

Marc Philipp wrote:
> During a daily update process new timestamps are collected and existing
> data rows are being updated (new rows are also being added). These
> changes affect a large percentage of the existing rows.
>
> What we have observed over the last few weeks is that the overall
> database size is increasing rapidly because of this table, and that
> vacuum processes seem to deadlock with other processes querying data
> from this table.

This sounds like it has more to do with inadequate freespace map
settings than use of arrays. Every update creates a dead tuple, and if
it is large (because the array is large) and leaked (because you have no
room in your freespace map), that would explain a rapidly increasing
database size.
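
(For reference, the settings in question live in postgresql.conf; the
numbers below are placeholders, not recommendations -- size them from
your actual update volume:)

```
# postgresql.conf -- free space map sizing
max_fsm_pages     = 200000   # must cover all pages with reclaimable space
max_fsm_relations = 1000     # number of relations tracked for free space
```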

> As a result, the database keeps growing and becomes more and more
> unusable. The only thing that helps is dumping and restoring it,
> which is not something you want to do on a large live system on a
> daily basis.

Arrays are stored as compressed varlenas, pretty much exactly like
varchar or text fields. However, if your use of arrays causes you to
need to perform updates to a large percentage of your rows on a daily
basis, instead of just inserting new rows, you should probably rethink
your data model.
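
As a sketch of the difference (invented names, assuming a child table
like the split you mention): appending to an array rewrites the whole
row, while the insert-only pattern leaves existing rows untouched.

```sql
-- Array pattern: every daily update rewrites the entire (large) row
UPDATE items SET dates = dates || now()::timestamp WHERE id = 42;

-- Insert-only pattern: no existing row is touched, so no dead tuples
INSERT INTO item_dates (item_id, d) VALUES (42, now()::timestamp);
```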

> Would it be more efficient not to use an array for this purpose, but
> to split the table into two parts?

I think so.

Joe


From: Marc Philipp <mail(at)marcphilipp(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Arrays and Performance
Date: 2006-01-08 09:22:22
Message-ID: 1136712142.9076.22.camel@localhost.localdomain
Lists: pgsql-general

> This sounds like it has more to do with inadequate freespace map
> settings than use of arrays. Every update creates a dead tuple, and
> if it is large (because the array is large) and leaked (because you
> have no room in your freespace map), that would explain a rapidly
> increasing database size.

Can you tell me more about free-space map settings? What exactly is the
free-space map? The information in the documentation is not very
helpful. How can dead tuples leak?

Regards, Marc Philipp


From: Marc Philipp <mail(at)marcphilipp(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Arrays and Performance
Date: 2006-01-08 09:26:32
Message-ID: 1136712392.9076.27.camel@localhost.localdomain
Lists: pgsql-general

> How large are the arrays? PG is definitely not designed to do well
> with very large arrays (say more than a couple hundred elements).
> You should reconsider your data design if you find yourself trying
> to do that.

At the moment, the arrays are no larger than 200 entries, but there is
no upper bound on their size.

Regards, Marc Philipp


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Marc Philipp <mail(at)marcphilipp(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Arrays and Performance
Date: 2006-01-10 20:34:19
Message-ID: 20060110203419.GF3902@pervasive.com
Lists: pgsql-general

On Sun, Jan 08, 2006 at 10:22:22AM +0100, Marc Philipp wrote:
> > This sounds like it has more to do with inadequate freespace map
> > settings than use of arrays. Every update creates a dead tuple,
> > and if it is large (because the array is large) and leaked
> > (because you have no room in your freespace map), that would
> > explain a rapidly increasing database size.
>
> Can you tell me more about free-space map settings? What exactly is the
> free-space map? The information in the documentation is not very
> helpful. How can dead tuples leak?

http://www.pervasive-postgres.com/lp/newsletters/2005/Insights_opensource_Nov.asp#3
is an article I wrote that might clear things up.
http://www.pervasive-postgres.com/lp/newsletters/2005/Insights_opensource_Dec.asp#2
might also be an interesting read, though it's just about MVCC in
general.
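
As a quick practical check: run a database-wide VACUUM VERBOSE (as a
superuser) and look at the INFO lines at the very end; they report how
many free space map pages are needed versus how many your
max_fsm_pages setting allows (exact wording varies by version).

```sql
-- The summary printed at the end shows FSM pages needed vs. allocated
VACUUM VERBOSE;
```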
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461