Re: Inserts or Updates

Lists: pgsql-performance
From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Olga Vingurt <olgavi(at)checkpoint(dot)com>, Netta Kabala <nettak(at)checkpoint(dot)com>
Subject: Inserts or Updates
Date: 2012-02-07 10:18:35
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056BD4F@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi all,

We are currently "stuck" with a performance bottleneck in our server using PG and we are thinking of two potential solutions which I would be happy to hear your opinion about.

Our system has a couple of tables that hold client generated information. The clients communicate every minute with the server and thus we perform an update on these two tables every minute. We are talking about ~50K clients (and therefore records).

These constant updates have made the table sizes to grow drastically and index bloating. So the two solutions that we are talking about are:

1. Configure autovacuum to work more intensively in both time and cost parameters.
Pros:
Not a major architectural change.
Cons:
Autovacuum does not handle index bloating and thus we will need to periodically reindex the tables.
Perhaps we will also need to run vacuum full periodically if the autovacuum cleaning is not at the required pace and therefore defragmentation of the tables is needed?

1. Creating a new table every minute and inserting the data into this new temporary table (only inserts). This process will happen every minute. Note that in this process we will also need to copy missing data (clients that didn't communicate) from older table.
Pros:
Tables are always compact.
We will not reach a limit of autovacuum.
Cons:
Major architectural change.

So to sum it up, we would be happy to refrain from performing a major change to the system (solution #2), but we are not certain that the correct way to work in our situation, constant updates of records, is to configure an aggressive autovacuum or perhaps the "known methodology" is to work with temporary tables that are always inserted into?

Thank you,
Ofer


From: Andy Colson <andy(at)squeakycode(dot)net>
To: Ofer Israeli <oferi(at)checkpoint(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Olga Vingurt <olgavi(at)checkpoint(dot)com>, Netta Kabala <nettak(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-07 14:47:22
Message-ID: 4F31397A.5090304@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 2/7/2012 4:18 AM, Ofer Israeli wrote:
> Hi all,
>
> We are currently “stuck” with a performance bottleneck in our server
> using PG and we are thinking of two potential solutions which I would be
> happy to hear your opinion about.
>
> Our system has a couple of tables that hold client generated
> information. The clients communicate *every* minute with the server and
> thus we perform an update on these two tables every minute. We are
> talking about ~50K clients (and therefore records).
>
> These constant updates have made the table sizes to grow drastically and
> index bloating. So the two solutions that we are talking about are:
>

You dont give any table details, so I'll have to guess. Maybe you have
too many indexes on your table? Or, you dont have a good primary index,
which means your updates are changing the primary key?

If you only have a primary index, and you are not changing it, Pg should
be able to do HOT updates.

If you have lots of indexes, you should review them, you probably don't
need half of them.

And like Kevin said, try the simple one first. Wont hurt anything, and
if it works, great!

-Andy


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Olga Vingurt <olgavi(at)checkpoint(dot)com>, Netta Kabala <nettak(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-07 17:40:11
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056BE6F@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Andy,

The two tables I am referring to have the following specs:
Table 1:
46 columns
23 indexes on fields of the following types:
INTEGER - 7
TIMESTAMP - 2
VARCHAR - 12
UUID - 2

23 columns
12 indexes on fields of the following types:
INTEGER - 3
TIMESTAMP - 1
VARCHAR - 6
UUID - 2

All indexes are default indexes.

The primary index is INTERGER and is not updated.

The indexes are used for sorting and filtering purposes in our UI.

I will be happy to hear your thoughts on this.

Thanks,
Ofer

-----Original Message-----
From: Andy Colson [mailto:andy(at)squeakycode(dot)net]
Sent: Tuesday, February 07, 2012 4:47 PM
To: Ofer Israeli
Cc: pgsql-performance(at)postgresql(dot)org; Olga Vingurt; Netta Kabala
Subject: Re: [PERFORM] Inserts or Updates

On 2/7/2012 4:18 AM, Ofer Israeli wrote:
> Hi all,
>
> We are currently "stuck" with a performance bottleneck in our server
> using PG and we are thinking of two potential solutions which I would be
> happy to hear your opinion about.
>
> Our system has a couple of tables that hold client generated
> information. The clients communicate *every* minute with the server and
> thus we perform an update on these two tables every minute. We are
> talking about ~50K clients (and therefore records).
>
> These constant updates have made the table sizes to grow drastically and
> index bloating. So the two solutions that we are talking about are:
>

You dont give any table details, so I'll have to guess. Maybe you have
too many indexes on your table? Or, you dont have a good primary index,
which means your updates are changing the primary key?

If you only have a primary index, and you are not changing it, Pg should
be able to do HOT updates.

If you have lots of indexes, you should review them, you probably don't
need half of them.

And like Kevin said, try the simple one first. Wont hurt anything, and
if it works, great!

-Andy

Scanned by Check Point Total Security Gateway.


From: Andy Colson <andy(at)squeakycode(dot)net>
To: Ofer Israeli <oferi(at)checkpoint(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Olga Vingurt <olgavi(at)checkpoint(dot)com>, Netta Kabala <nettak(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-07 19:30:06
Message-ID: 4F317BBE.4020508@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> -----Original Message-----
> From: Andy Colson [mailto:andy(at)squeakycode(dot)net]
> Sent: Tuesday, February 07, 2012 4:47 PM
> To: Ofer Israeli
> Cc: pgsql-performance(at)postgresql(dot)org; Olga Vingurt; Netta Kabala
> Subject: Re: [PERFORM] Inserts or Updates
>
> On 2/7/2012 4:18 AM, Ofer Israeli wrote:
>> Hi all,
>>
>> We are currently "stuck" with a performance bottleneck in our server
>> using PG and we are thinking of two potential solutions which I would be
>> happy to hear your opinion about.
>>
>> Our system has a couple of tables that hold client generated
>> information. The clients communicate *every* minute with the server and
>> thus we perform an update on these two tables every minute. We are
>> talking about ~50K clients (and therefore records).
>>
>> These constant updates have made the table sizes to grow drastically and
>> index bloating. So the two solutions that we are talking about are:
>>
>
> You dont give any table details, so I'll have to guess. Maybe you have
> too many indexes on your table? Or, you dont have a good primary index,
> which means your updates are changing the primary key?
>
> If you only have a primary index, and you are not changing it, Pg should
> be able to do HOT updates.
>
> If you have lots of indexes, you should review them, you probably don't
> need half of them.
>
>
> And like Kevin said, try the simple one first. Wont hurt anything, and
> if it works, great!
>
> -Andy
>

On 2/7/2012 11:40 AM, Ofer Israeli wrote:
> Hi Andy,
>
> The two tables I am referring to have the following specs:
> Table 1:
> 46 columns
> 23 indexes on fields of the following types:
> INTEGER - 7
> TIMESTAMP - 2
> VARCHAR - 12
> UUID - 2
>
> 23 columns
> 12 indexes on fields of the following types:
> INTEGER - 3
> TIMESTAMP - 1
> VARCHAR - 6
> UUID - 2
>
> All indexes are default indexes.
>
> The primary index is INTERGER and is not updated.
>
> The indexes are used for sorting and filtering purposes in our UI.
>
>
> I will be happy to hear your thoughts on this.
>
> Thanks,
> Ofer
>

Fixed that top post for ya.

Wow, so out of 46 columns, half of them have indexes? That's a lot.
I'd bet you could drop a bunch of them. You should review them and see
if they are actually helping you. You already found out that maintain
all those indexes is painful. If they are not speeding up your SELECT's
by a huge amount, you should drop them.

Sounds like you went thru your sql statements and any field that was
either in the where or order by clause you added an index for?

You need to find the columns that are the most selective. An index
should be useful at cutting the number of rows down. Once you have it
cut down, an index on another field wont really help that much. And
after a result set has been collected, an index may or may not help for
sorting.

Running some queries with EXPLAIN ANALYZE would be helpful. Give it a
run, drop an index, try it again to see if its about the same, or if
that index made a difference.

-Andy


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Ofer Israeli" <oferi(at)checkpoint(dot)com>, "Andy Colson" <andy(at)squeakycode(dot)net>
Cc: "Netta Kabala" <nettak(at)checkpoint(dot)com>, "Olga Vingurt" <olgavi(at)checkpoint(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-07 19:40:41
Message-ID: 4F3129D90200002500044F85@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Andy Colson <andy(at)squeakycode(dot)net> wrote:

> Wow, so out of 46 columns, half of them have indexes? That's a
> lot. I'd bet you could drop a bunch of them. You should review
> them and see if they are actually helping you. You already found
> out that maintain all those indexes is painful. If they are not
> speeding up your SELECT's by a huge amount, you should drop them.

You might want to review usage counts in pg_stat_user_indexes.

-Kevin


From: Andy Colson <andy(at)squeakycode(dot)net>
To: Ofer Israeli <oferi(at)checkpoint(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Olga Vingurt <olgavi(at)checkpoint(dot)com>, Netta Kabala <nettak(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-07 19:43:13
Message-ID: 4F317ED1.6080907@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Oh, I knew I'd seen index usage stats someplace.

give this a run:

select * from pg_stat_user_indexes where relname = 'SuperBigTable';

http://www.postgresql.org/docs/current/static/monitoring-stats.html

-Andy


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Olga Vingurt <olgavi(at)checkpoint(dot)com>, Netta Kabala <nettak(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-08 19:22:48
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056BFBD@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Andy Colson wrote:
> Oh, I knew I'd seen index usage stats someplace.
>
> give this a run:
>
> select * from pg_stat_user_indexes where relname = 'SuperBigTable';
>
> http://www.postgresql.org/docs/current/static/monitoring-stats.html
>
> -Andy
>
> Scanned by Check Point Total Security Gateway.

Thanks. We have begun analyzing the indexes and indeed found many are pretty useless and will be removed.


From: Vik Reykja <vikreykja(at)gmail(dot)com>
To: Ofer Israeli <oferi(at)checkpoint(dot)com>
Cc: Andy Colson <andy(at)squeakycode(dot)net>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Olga Vingurt <olgavi(at)checkpoint(dot)com>, Netta Kabala <nettak(at)checkpoint(dot)com>
Subject: Re: Inserts or Updates
Date: 2012-02-09 10:09:25
Message-ID: CALDgxVtboBUYPCQDPa1GROeJ6a1nyQTAO+SQtze6ycTGvMW3Ag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Feb 8, 2012 at 20:22, Ofer Israeli <oferi(at)checkpoint(dot)com> wrote:

> Andy Colson wrote:
> > Oh, I knew I'd seen index usage stats someplace.
> >
> > give this a run:
> >
> > select * from pg_stat_user_indexes where relname = 'SuperBigTable';
> >
> > http://www.postgresql.org/docs/current/static/monitoring-stats.html
> >
> > -Andy
> >
> > Scanned by Check Point Total Security Gateway.
>
>
> Thanks. We have begun analyzing the indexes and indeed found many are
> pretty useless and will be removed.
>

A quick word of warning: not all indexes are used for querying, some are
used for maintaining constraints and foreign keys. These show up as
"useless" in the above query.


From: Frank Lanitz <frank(at)frank(dot)uvena(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Inserts or Updates
Date: 2012-02-09 13:28:35
Message-ID: 4F33CA03.3080106@frank.uvena.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Am 07.02.2012 18:40, schrieb Ofer Israeli:
> Table 1:
> 46 columns
> 23 indexes on fields of the following types:
> INTEGER - 7
> TIMESTAMP - 2
> VARCHAR - 12
> UUID - 2
>
> 23 columns
> 12 indexes on fields of the following types:
> INTEGER - 3
> TIMESTAMP - 1
> VARCHAR - 6
> UUID - 2

Are you regularly updating all columns? If not, maybe a good idea to
split the tables so highly updated columns don't effect complete line.

cheers,
Frank


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Frank Lanitz <frank(at)frank(dot)uvena(dot)de>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-12 10:48:48
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056C1C3@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Frank Lanitz wrote:
> Am 07.02.2012 18:40, schrieb Ofer Israeli:
>> Table 1:
>> 46 columns
>> 23 indexes on fields of the following types:
>> INTEGER - 7
>> TIMESTAMP - 2
>> VARCHAR - 12
>> UUID - 2
>>
>> 23 columns
>> 12 indexes on fields of the following types:
>> INTEGER - 3
>> TIMESTAMP - 1
>> VARCHAR - 6
>> UUID - 2
>
> Are you regularly updating all columns? If not, maybe a good idea to
> split the tables so highly updated columns don't effect complete
> line.

We're not always updating all of the columns, but the reason for consolidating all the columns into one table is for UI purposes - in the past, they had done benchmarks and found the JOINs to be extremely slow and so all data was consolidated into one table.

Thanks,
Ofer


From: Frank Lanitz <frank(at)frank(dot)uvena(dot)de>
To: Ofer Israeli <oferi(at)checkpoint(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-12 18:29:13
Message-ID: 4F3804F9.4080607@frank.uvena.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Am 12.02.2012 11:48, schrieb Ofer Israeli:
> Frank Lanitz wrote:
>>> Am 07.02.2012 18:40, schrieb Ofer Israeli:
>>>>> Table 1: 46 columns 23 indexes on fields of the following
>>>>> types: INTEGER - 7 TIMESTAMP - 2 VARCHAR - 12 UUID - 2
>>>>>
>>>>> 23 columns 12 indexes on fields of the following types:
>>>>> INTEGER - 3 TIMESTAMP - 1 VARCHAR - 6 UUID - 2
>>>
>>> Are you regularly updating all columns? If not, maybe a good idea
>>> to split the tables so highly updated columns don't effect
>>> complete line.
> We're not always updating all of the columns, but the reason for
> consolidating all the columns into one table is for UI purposes - in
> the past, they had done benchmarks and found the JOINs to be
> extremely slow and so all data was consolidated into one table.

Ah... I see. Maybe you can check whether all of the data are really
needed to fetch with one select but this might end up in tooo much
guessing and based on your feedback you already did this step.

Cheers,
Frank


From: Ofer Israeli <oferi(at)checkpoint(dot)com>
To: Frank Lanitz <frank(at)frank(dot)uvena(dot)de>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inserts or Updates
Date: 2012-02-12 18:32:11
Message-ID: 217DDBC2BB1E394CA9E7446337CBDEF20102C056C2A6@il-ex01.ad.checkpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Frank Lanitz wrote:
> Am 12.02.2012 11:48, schrieb Ofer Israeli:
>> Frank Lanitz wrote:
>>>> Am 07.02.2012 18:40, schrieb Ofer Israeli:
>>>>>> Table 1: 46 columns 23 indexes on fields of the following
>>>>>> types: INTEGER - 7 TIMESTAMP - 2 VARCHAR - 12 UUID - 2
>>>>>>
>>>>>> 23 columns 12 indexes on fields of the following types:
>>>>>> INTEGER - 3 TIMESTAMP - 1 VARCHAR - 6 UUID - 2
>>>>
>>>> Are you regularly updating all columns? If not, maybe a good idea
>>>> to split the tables so highly updated columns don't effect complete
>>>> line.
>> We're not always updating all of the columns, but the reason for
>> consolidating all the columns into one table is for UI purposes - in
>> the past, they had done benchmarks and found the JOINs to be
>> extremely slow and so all data was consolidated into one table.
>
> Ah... I see. Maybe you can check whether all of the data are really
> needed to fetch with one select but this might end up in tooo much
> guessing and based on your feedback you already did this step.

This was indeed checked, but I'm not sure it was thorough enough so we're having a go at it again. In the meanwhile, the autovacuum configurations have proved to help us immensely so for now we're good (will probably be asking around soon when we hit our next bottleneck :)). Thanks for your help!