Re: performance problems inserting random vals in index

Lists: pgsql-general
From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: performance problems inserting random vals in index
Date: 2010-04-19 16:30:05
Message-ID: 790487.49845.qm@web29017.mail.ird.yahoo.com

Hi,

I have a simple table that has indexes on 2 integer columns.
Data is inserted very often (no updates, no deletes, just inserts):
at least 4000/5000 rows per second.
The input for the 2 indexed columns is very random.
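(For reference, the setup described above would look something like this sketch; the table and column names are made up:)

```sql
-- Hypothetical schema matching the description above
CREATE TABLE events (
    a integer,
    b integer,
    payload text
);
CREATE INDEX events_a_idx ON events (a);
CREATE INDEX events_b_idx ON events (b);
-- inserts arrive at ~4000-5000 rows/s, with random values for a and b
```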

Everything is "fine" for the first 10-20M rows; after that, performance
gets worse and worse, and by 50M rows I can't insert more than
1500 rows per second.

How can I improve it?

1) the table is already partitioned; at the moment it's based on
30 partitions. What kind of "select" performance impact would I
hit going to 3000 partitions? I know in the 3000 partition case there
will be fewer rows per partition, but index seek times are O(log N),
right? So, if a query has to probe every partition's index:

3000*(log(N/3000)) is much bigger than 30*(log(N/30))
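That back-of-envelope comparison is easy to check (a rough model only: it assumes base-2 logs and that a query has to probe every partition's index; real costs also include per-partition planning overhead):

```python
import math

# Rough cost model: a B-tree lookup in a partition holding N/k rows takes
# about log2(N/k) page comparisons.  If a query cannot be pruned to one
# partition, every partition's index is probed, for k * log2(N/k) total.
def total_probe_cost(total_rows: int, partitions: int) -> float:
    return partitions * math.log2(total_rows / partitions)

N = 50_000_000
cost_30 = total_probe_cost(N, 30)
cost_3000 = total_probe_cost(N, 3000)
print(f"30 partitions:   {cost_30:.0f}")    # ~620
print(f"3000 partitions: {cost_3000:.0f}")  # ~42000
```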

2) How much improvement can I expect going to RAID1+0 on
15k rpm disks instead of RAID5 on 7200 rpm disks? I know, I know, I
shouldn't even be testing such a huge workload with such a
setup, but I can't try anything else at the moment. The disk array is
pretty fast anyway: 190 MB/s seq output, 320 MB/s seq input, 1600
random seeks/s (bonnie++)

3) Is there anything else I can try to "help" postgres update those
indexes faster?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-04-19 17:04:30
Message-ID: 17147.1271696670@sss.pgh.pa.us

Leonardo F <m_lists(at)yahoo(dot)it> writes:
> I have a simple table that has indexes on 2 integer columns.
> Data is inserted very often (no updates, no deletes, just inserts):
> at least 4000/5000 rows per second.
> The input for the 2 indexed columns is very random.

> Everything is "fine" for the first 10-20M rows; after that, performance
> gets worse and worse, and by 50M rows I can't insert more than
> 1500 rows per second.

> How can I improve it?

Does it help to reindex the index at that point?

> 1) the table is already partitioned; at the moment it's based on
> 30 partitions. What kind of "select" performance impact would I
> hit going to 3000 partitions?

Bad. The partitioning code isn't designed to scale beyond a few dozen
partitions.

regards, tom lane


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Leonardo F <m_lists(at)yahoo(dot)it>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-04-19 17:06:18
Message-ID: q2xdcc563d11004191006sfeae98a0r73e4f336b6bbed1c@mail.gmail.com

On Mon, Apr 19, 2010 at 11:04 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Leonardo F <m_lists(at)yahoo(dot)it> writes:
>> I have a simple table that has indexes on 2 integer columns.
>> Data is inserted very often (no updates, no deletes, just inserts):
>> at least 4000/5000 rows per second.
>> The input for the 2 indexed columns is very random.
>
>> Everything is "fine" for the first 10-20M rows; after that, performance
>> gets worse and worse, and by 50M rows I can't insert more than
>> 1500 rows per second.
>
>> How can I improve it?
>
> Does it help to reindex the index at that point?
>
>> 1) the table is already partitioned; at the moment it's based on
>> 30 partitions. What kind of "select" performance impact would I
>> hit going to 3000 partitions?
>
> Bad.  The partitioning code isn't designed to scale beyond a few dozen
> partitions.

On a few very narrow applications I've gotten good performance in the
low hundreds. After that things fall apart quickly.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-04-19 23:42:37
Message-ID: 4BCCEA6D.8070800@2ndquadrant.com

Leonardo F wrote:
> Is there anything else I can try to "help" postgres update those
> indexes faster?
>

It sounds like your problem is that read/write activities on the indexes
are becoming increasingly large and random as more data is inserted.
There are two standard ways to improve on that:

1) Periodically rebuild the indexes and hope that the new version is
laid out better than what you've ended up with after the random key
value insertions.

2) Optimize your database server configuration to perform better under
this particular situation. The usual set of tricks is to increase
shared_buffers, checkpoint_segments, and checkpoint_timeout to cut down
on the amount of time that the database has to write to the index
blocks, and improve the odds that ones it needs to read are already in
its cache.
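As an illustration only, the kind of settings being described would look like this in postgresql.conf (the values are hypothetical starting points, not recommendations; checkpoint_segments is the pre-9.5 name for what later became max_wal_size):

```
# postgresql.conf -- illustrative values only; tune for your own hardware
shared_buffers = 4GB                # keep more hot index blocks in cache
checkpoint_segments = 64            # larger, less frequent checkpoints
checkpoint_timeout = 30min          # spread checkpoint I/O over a longer window
checkpoint_completion_target = 0.9  # smooth checkpoint writes across the interval
```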

It's hard to say whether increasing raw disk speed will help you as much
as you'd like or not. Index-related performance is often driven by
whether the working set needed to work on them efficiently can fit in
RAM or not. Once you've exceeded that, performance drops really fast,
and a linear increase in disk speed may not recover very much of that.
You can look at the size of all the active indexes using something like
the first query at http://wiki.postgresql.org/wiki/Disk_Usage to get an
idea how big they are relative to RAM. Sometimes having more memory is
the only good way to scale upwards in this situation while retaining
something close to original performance.
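A minimal sketch of that kind of size query (the wiki page above has a more complete version):

```sql
-- Total on-disk size of each index, largest first
SELECT c.relname AS index_name,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
WHERE c.relkind = 'i'
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;
```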

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-04-20 06:46:58
Message-ID: 468728.53329.qm@web29006.mail.ird.yahoo.com

> Does it help to reindex the index at that point?

Didn't try; but I guess a reindex of such a big table
would block inserts for a long time... but I'll try

> Bad. The partitioning code isn't designed to scale
> beyond a few dozen partitions.

What kind of problems am I going to experience?
Is it just N-times access to the indexes, or something
worse? Is it the fact that it will take a linear amount of
time to "pick" the right partitions?


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-04-20 06:51:01
Message-ID: 655851.114.qm@web29001.mail.ird.yahoo.com

> On a few very narrow applications I've gotten good
> performance in the
> low hundreds. After that things fall apart
> quickly.

Ehm... what exactly does "fall apart quickly" mean?
I can trade some "select" speed for "insert" speed...

I don't have experience with partitioning, if some of
you already know what's going to happen I could
avoid testing or at least testing for the wrong setup...

Thank you


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-04-20 06:52:43
Message-ID: 938636.36439.qm@web29008.mail.ird.yahoo.com

> The usual set of tricks is to
> increase shared_buffers, checkpoint_segments, and checkpoint_timeout to cut down

Uh, didn't know shared_buffers played a role in index insertion as well...
got to try that. Thank you.


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-04-21 10:36:16
Message-ID: 516768.87883.qm@web29016.mail.ird.yahoo.com

> The usual set of tricks is to increase shared_buffers,
> checkpoint_segments, and checkpoint_timeout to cut down

That did it. Setting a much higher shared_buffers helped quite a lot.

Thank you everybody for your suggestions.


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: performance problems inserting random vals in index
Date: 2010-05-06 12:27:22
Message-ID: 541118.25935.qm@web29013.mail.ird.yahoo.com

I'm sorry I have to come back to this, but the solution the list gave helped,
but didn't fully solve my problem...

To sum up:

I have a simple table that has indexes on 2 integer columns.
Data is inserted very often (no updates, no deletes, just inserts): at
least 4000/5000 rows per second. The input for the 2 indexed columns
is very random.

I was experiencing bad insertion times past 20M rows.

Some of you suggested that it might have to do with the indexes being
too big to fit in shared_buffers, so I raised it and, in fact, that solved the
issue... but only for another 50M rows: at roughly 70M rows I'm
back to poor insertion times (disk always 100% busy).

The indexes at this point are 1.7GB each. My shared_buffers is 10GB
(the machine has 32GB). So I expect the whole indexes to fit in RAM, yet
the disk array is 100% busy...

What's going on? What can I try to get back at good insertion
performance?