Optimize update query

From: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Optimize update query
Date: 2012-11-28 12:57:49
Message-ID: 1C199C0D-13DF-47B3-944A-B0560C6D2E5A@autouncle.com
Lists: pgsql-performance

Hi, I have these update queries that run very often and take too long for us to reach the throughput we are aiming at. However, the update query is very simple, and I can't figure out any way to improve the situation. The query looks like this:

UPDATE "adverts" SET "last_observed_at" = '2012-11-28 00:02:30.265154', "data_source_id" ='83d024a57bc2958940f3ca281bddcbf4' WHERE"adverts"."id" IN ( 1602382, 4916432, 3221246, 4741057, 3853335, 571429, 3222740, 571736, 3544903, 325378,5774338, 5921451, 4295768, 3223170, 5687001, 4741966, 325519, 580867, 325721, 4412200, 4139598, 325567, 1616653,1616664, 6202007, 3223748, 325613, 3223764, 325615, 4296536, 3854595, 4971428, 3224146, 5150522, 4412617, 5073048,325747, 325771, 1622154, 5794384, 5736581, 1623767, 5686945, 3224627, 5073009, 3224747, 3224749, 325809, 5687051,3224811, 5687052, 4917824, 5073013, 3224816, 3224834, 4297331, 1623907, 325864, 1623947, 6169706, 325869, 325877,3225074, 3225112, 325893, 325912, 3225151, 3225184, 3225175, 1624659, 325901, 4033926, 325904, 325911, 4412835,1624737, 5073004, 5921434, 325915, 3225285, 3225452, 4917672, 1624984, 3225472, 325940, 5380611, 325957, 5073258,3225500, 1625002, 5923489, 4413009, 325952, 3961122, 3637777 ) ;

An EXPLAIN gives me the following:

"Update on adverts (cost=0.12..734.27 rows=95 width=168)"
" -> Index Scan using adverts_pkey on adverts (cost=0.12..734.27 rows=95 width=168)"
" Index Cond: (id = ANY ('{1602382,4916432,3221246,4741057,3853335,571429,3222740,571736,3544903,325378,5774338,5921451,4295768,3223170,5687001,4741966,325519,580867,325721,4412200,4139598,325567,1616653,1616664,6202007,3223748,325613,3223764,325615,4296536,3854595,4971428,3224146,5150522,4412617,5073048,325747,325771,1622154,5794384,5736581,1623767,5686945,3224627,5073009,3224747,3224749,325809,5687051,3224811,5687052,4917824,5073013,3224816,3224834,4297331,1623907,325864,1623947,6169706,325869,325877,3225074,3225112,325893,325912,3225151,3225184,3225175,1624659,325901,4033926,325904,325911,4412835,1624737,5073004,5921434,325915,3225285,3225452,4917672,1624984,3225472,325940,5380611,325957,5073258,3225500,1625002,5923489,4413009,325952,3961122,3637777}'::integer[]))"

So as you can see, it's already pretty optimized - it's just not enough :-) So what can I do? The two columns last_observed_at and data_source_id each have an index, and those are needed elsewhere, so I can't drop them.

PS. I'm on Postgres 9.2 on a server with 32GB RAM, 8 cores and two 3TB disks in a software RAID 1 setup.

Is the only way out of this really an SSD?


From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimize update query
Date: 2012-11-28 14:07:59
Message-ID: 50B61ABF.4070000@optionshouse.com
Lists: pgsql-performance

On 11/28/2012 06:57 AM, Niels Kristian Schjødt wrote:

Before I go crazy, here... you really need to tell us what "not enough"
means. You didn't provide an explain analyze, so we don't know what your
actual performance is. But I have my suspicions.
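
For reference, the kind of output that helps is EXPLAIN (ANALYZE, BUFFERS), which 9.2 supports. A minimal sketch (id list shortened for illustration; wrapped in a transaction because EXPLAIN ANALYZE really executes the update):

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
  UPDATE "adverts"
     SET "last_observed_at" = '2012-11-28 00:02:30.265154',
         "data_source_id"   = '83d024a57bc2958940f3ca281bddcbf4'
   WHERE "adverts"."id" IN (1602382, 4916432, 3221246);
ROLLBACK;  -- undo the test update after capturing the plan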

> So as you can see, it's already pretty optimized, it's just not
> enough :-) So what can I do? the two columns last_observed_at and
> data_source_id has an index, and it is needed elsewhere, so I can't
> delete those.

Ok, so part of your problem is that you're tying an advertising system
directly to the database for direct updates. That's a big no-no. Any
time you got a huge influx of views, there would be a logjam. You need
to decouple this so you can use a second tool to load the database in
larger batches. You'll get much higher throughput this way.
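
A minimal sketch of that batched pattern, using the table and column names from your query (the staging table itself is an assumption, not part of your schema):

-- collect the crawler's observations into a staging table, e.g. via COPY
CREATE UNLOGGED TABLE advert_observations (
    advert_id        integer PRIMARY KEY,
    last_observed_at timestamp,
    data_source_id   text
);

-- COPY advert_observations FROM '/path/to/observations.csv' WITH (FORMAT csv);

-- then apply everything in one set-based statement instead of many IN-list updates
UPDATE adverts a
   SET last_observed_at = o.last_observed_at,
       data_source_id   = o.data_source_id
  FROM advert_observations o
 WHERE a.id = o.advert_id;

TRUNCATE advert_observations;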

If you absolutely must use this approach, you're going to have to beef
up your hardware.

> PS. I'm on postgres 9.2 on a server with 32gb ram, 8 cores and two 3T
> disks in a software raid 1 setup.

This is not sufficient for a high-bandwidth stream of updates. Not even
close. Even if those 3T disks are 7200 RPM, and even in RAID-1, you're
going to have major problems with concurrent reads and writes. You need
to do several things:

1. Move your transaction logs (pg_xlog) to another pair of disks
entirely. Do not put these on the same disks as your data if you need
high write throughput.
2. Get a better disk architecture. You need 10k, or 15k RPM disks.
Starting with 6 or more of them in a RAID-10 would be a good beginning.
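
For point 1, the usual relocation procedure is roughly this (a sketch assuming a Debian/Ubuntu-style install with the default data directory and a new volume mounted at /mnt/xlog; adjust paths to your layout):

# the cluster must be stopped while pg_xlog moves
service postgresql stop
mv /var/lib/postgresql/9.2/main/pg_xlog /mnt/xlog/pg_xlog
ln -s /mnt/xlog/pg_xlog /var/lib/postgresql/9.2/main/pg_xlog
chown -R postgres:postgres /mnt/xlog/pg_xlog
service postgresql start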

You never told us your postgresql.conf settings, so I'm just going with
very generic advice. Essentially, you're expecting too much for too
little. That machine would have been low-spec three years ago, and
unsuited to database use simply due to the 2-disk RAID.

> Is the only way out of this really a SSD disk?

No. There are many, many steps you can and should take before going this
route. You need to know the problem you're solving before making
potentially expensive hardware decisions.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


From: Marcin Mirosław <marcin(at)mejor(dot)pl>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize update query
Date: 2012-11-28 14:39:55
Message-ID: 50B6223B.6000100@mejor.pl
Lists: pgsql-performance

On 28.11.2012 at 15:07, Shaun Thomas wrote:
> On 11/28/2012 06:57 AM, Niels Kristian Schjødt wrote:
>
> Before I go crazy, here... you really need to tell us what "not enough"
> means. You didn't provide an explain analyze, so we don't know what your
> actual performance is. But I have my suspicions.
>
>> So as you can see, it's already pretty optimized, it's just not
>> enough :-) So what can I do? the two columns last_observed_at and
>> data_source_id has an index, and it is needed elsewhere, so I can't
>> delete those.
>
> Ok, so part of your problem is that you're tying an advertising system
> directly to the database for direct updates. That's a big no-no. Any
> time you got a huge influx of views, there would be a logjam. You need
> to decouple this so you can use a second tool to load the database in
> larger batches. You'll get much higher throughput this way.

+1, SQL databases have a limited number of inserts/updates per second. Even
with high-end hardware you won't get more than XXX operations per second.
As Thomas said, you should feed something like a NoSQL database from the
www server and use another tool to do aggregation and batch inserts into
PostgreSQL. It will scale much better.

Marcin


From: Willem Leenen <willem_leenen(at)hotmail(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimize update query
Date: 2012-11-28 15:11:46
Message-ID: DUB104-W48E83E17609D570C42B5AD8F5D0@phx.gbl
Lists: pgsql-performance

I assume that SQL databases (banks? telecom?) can handle a used car shop. No need for an unstructured data tool.

> +1, SQL databases have a limited number of inserts/updates per second. Even
> with high-end hardware you won't get more than XXX operations per second.
> As Thomas said, you should feed something like a NoSQL database from the
> www server and use another tool to do aggregation and batch inserts into
> PostgreSQL. It will scale much better.
>
> Marcin


From: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>
To: Willem Leenen <willem_leenen(at)hotmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimize update query
Date: 2012-11-28 16:19:17
Message-ID: CAC9DA03-64C2-45E2-9B88-703F9768F574@autouncle.com
Lists: pgsql-performance

Okay guys,

Thanks for all the great help and advice already! Let me clear up a few things to make my question a little easier to answer :-)
My site is a search engine for used cars - not just a car shop with a few hundred cars.
The update query you are looking at is executed once a day, in chunks, for all active adverts, so we know they are still for sale (one car can be advertised in several places, hence several "adverts"). So it's not a constant stream, but it does have a fairly high volume, especially at night.

A compressed version of my .conf looks like this (note: there are some tweaks at the end of the file):
data_directory = '/var/lib/postgresql/9.2/main'
hba_file = '/etc/postgresql/9.2/main/pg_hba.conf'
ident_file = '/etc/postgresql/9.2/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/9.2-main.pid'
listen_addresses = '192.168.0.2, localhost'
port = 5432
max_connections = 1000
unix_socket_directory = '/var/run/postgresql'
wal_level = hot_standby
synchronous_commit = off
archive_mode = on
archive_command = 'rsync -a %p postgres(at)192(dot)168(dot)0(dot)4:/var/lib/postgresql/9.2/wals/%f </dev/null'
max_wal_senders = 1
wal_keep_segments = 32
logging_collector = on
log_min_messages = debug1
log_min_error_statement = debug1
log_min_duration_statement = 0
log_checkpoints = on
log_connections = on
log_disconnections = on
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '
log_lock_waits = on
log_temp_files = 0
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
default_statistics_target = 100
maintenance_work_mem = 1GB
checkpoint_completion_target = 0.7
effective_cache_size = 22GB
work_mem = 160MB
wal_buffers = 4MB
checkpoint_segments = 16
shared_buffers = 7680MB

# All the log stuff is mainly temporary requirement for pgBadger
# The database has been tuned with pgtuner

You might be familiar with New Relic; I use it for quite a lot of monitoring, and this is what I see at night time (a lot of I/O). So I went to play around with pgBadger to get some insights at the database level.
https://rpm.newrelic.com/public/charts/h2dtedghfsv

This shows me that by far the most time-consuming queries are updates (in general). On average a query like the one I showed you takes 1.3 s (but it often takes several minutes, which makes me wonder). So correct me if I'm wrong here: my theory is that I have too many too-slow update queries, which then often end up in a situation where they "wait" for each other to finish, hence the sometimes VERY long execution times. So my basic idea is that if I could reduce the cost of the updates, I could get a higher throughput overall.
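
I guess I could test that theory with something like the following (on 9.2 pg_stat_activity has a boolean "waiting" column, so this should show whether the slow statements are actually blocked):

SELECT pid, waiting, now() - query_start AS runtime, query
  FROM pg_stat_activity
 WHERE state = 'active'
 ORDER BY runtime DESC;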

Here is a sample of the pgBadger analysis:

Queries that took up the most time:

Rank  Total duration   Times executed  Avg. duration
1     1d15h28m38.71s   948,711         0.15s
      COMMIT;

2     1d2h17m55.43s    401,002         0.24s
      INSERT INTO "car_images" ( "car_id", "created_at", "image", "updated_at" ) VALUES ( '', '', '', '' ) returning "id";

3     23h18m33.68s     195,093         0.43s
      SELECT DISTINCT "cars".id FROM "cars" LEFT OUTER JOIN "adverts" ON "adverts"."car_id" = "cars"."id" LEFT OUTER JOIN "sellers" ON "sellers"."id" = "adverts"."seller_id" WHERE "cars"."sales_state" = '' AND "cars"."year" = 0 AND "cars"."engine_size" = 0.0 AND ( ( "cars"."id" IS NOT NULL AND "cars"."brand" = '' AND "cars"."model_name" = '' AND "cars"."fuel" = '' AND "cars"."km" = 0 AND "cars"."price" = 0 AND "sellers"."kind" = '' ) ) LIMIT 0;

4     22h45m26.52s     3,374,133       0.02s
      SELECT "adverts".* FROM "adverts" WHERE ( source_name = '' AND md5 ( url ) = md5 ( '' ) ) LIMIT 0;

5     10h31m37.18s     29,671          1.28s
      UPDATE "adverts" SET "last_observed_at" = '', "data_source_id" = '' WHERE "adverts"."id" IN ( ... ) ;

6     7h18m40.65s      396,393         0.07s
      UPDATE "cars" SET "updated_at" = '' WHERE "cars"."id" = 0;

7     7h6m7.87s        241,294         0.11s
      UPDATE "cars" SET "images_count" = COALESCE ( "images_count", 0 ) + 0 WHERE "cars"."id" = 0;

8     6h56m11.78s      84,571          0.30s
      INSERT INTO "failed_adverts" ( "active_record_object_class", "advert_candidate", "created_at", "exception_class", "exception_message", "from_rescraper", "last_retried_at", "retry_count", "source_name", "stack_trace", "updated_at", "url" ) VALUES ( NULL, '', '', '', '', NULL, NULL, '', '', '', '', '' ) returning "id";

9     5h47m25.45s      188,402         0.11s
      INSERT INTO "adverts" ( "availability_state", "car_id", "created_at", "data_source_id", "deactivated_at", "first_extraction", "last_observed_at", "price", "seller_id", "source_id", "source_name", "updated_at", "url" ) VALUES ( '', '', '', '', NULL, '', '', '', '', '', '', '', '' ) returning "id";

10    3h4m26.86s       166,235         0.07s
      UPDATE "adverts" SET "deactivated_at" = '', "availability_state" = '', "updated_at" = '' WHERE "adverts"."id" = 0;

(Yes I'm already on the task of improving the selects)

On 28/11/2012 at 16.11, Willem Leenen <willem_leenen(at)hotmail(dot)com> wrote:

>
> I assume that SQL databases (banks? telecom?) can handle a used car shop. No need for an unstructured data tool.
>
>
>
> > +1, SQL databases have a limited number of inserts/updates per second. Even
> > with high-end hardware you won't get more than XXX operations per second.
> > As Thomas said, you should feed something like a NoSQL database from the
> > www server and use another tool to do aggregation and batch inserts into
> > PostgreSQL. It will scale much better.
> >
> > Marcin


From: Bèrto ëd Sèra <berto(dot)d(dot)sera(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize update query
Date: 2012-11-28 16:29:24
Message-ID: CAKwGa_-gQY79w_syKm6TUJ7=K_Fk_bw1urzyQdNiAEtmNJ3=Pw@mail.gmail.com
Lists: pgsql-performance

max_connections = 1000 looks bad... why not put a pooler in place?
Cheers
Bèrto

On 28 November 2012 16:19, Niels Kristian Schjødt
<nielskristian(at)autouncle(dot)com> wrote:
> max_connections = 1000
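
A transaction-pooling pgbouncer in front of the server would look roughly like this (a sketch; the database name, addresses and pool sizes are assumptions):

; pgbouncer.ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 192.168.0.2
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20

The application then connects to port 6432, and max_connections in postgresql.conf can drop to something far smaller.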

--
==============================
If Pac-Man had affected us as kids, we'd all be running around in a
darkened room munching pills and listening to repetitive music.


From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>
Cc: Willem Leenen <willem_leenen(at)hotmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimize update query
Date: 2012-11-28 16:54:05
Message-ID: 50B641AD.3020204@optionshouse.com
Lists: pgsql-performance

On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote:

> https://rpm.newrelic.com/public/charts/h2dtedghfsv

Doesn't this answer your question?

That iowait is crushing your server into the ground. It's no surprise
updates are taking several seconds. That update you sent us *should*
execute on the order of only a few milliseconds.

So I'll reiterate that you *must* move your pg_xlog location elsewhere.
You've got row lookup bandwidth conflicting with writes. There are a
couple other changes you should probably make to your config:

> checkpoint_segments = 16

This is not enough for the workload you describe. Every time the
database checkpoints, all of those changes in pg_xlog are applied to the
backend data files. You should set these values:

checkpoint_segments = 100
checkpoint_timeout = 10m
checkpoint_completion_target = 0.9

This will reduce your overall write workload by making checkpoints less frequent.
Too many checkpoints massively reduce write throughput. With the
settings you have, it's probably checkpointing constantly while your
load runs. Start with this, but experiment with increasing
checkpoint_segments further.

If you check your logs now, you probably see a ton of "checkpoint
starting: xlog" in there. That's very bad. It should say "checkpoint
starting: time" meaning it's keeping up with your writes naturally.

> work_mem = 160MB

This is probably way too high. work_mem is used for every sort operation
in a query, so each connection could have several of these allocated at
once, starving your system of memory and reducing what's available for the
page cache. Change it to 8MB, and increase it in small increments if necessary.
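
If a handful of reporting queries genuinely need a large sort budget, you can raise it for just those sessions instead of globally, e.g.:

SET work_mem = '160MB';   -- only for this session
-- run the big sort/aggregate here
RESET work_mem;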

> So correct me if I'm wrong here: my theory is, that I have too many
> too slow update queries, that then often end up in a situation, where
> they "wait" for each other to finish, hence the sometimes VERY long
> execution times.

Sometimes this is the case, but for you, you're running into IO
contention, not lock contention. Your 3TB RAID-1 is simply insufficient
for this workload.

If you check your logs after making the changes I've suggested, take a
look at your checkpoint sync times. That will tell you how long it took
the kernel to physically commit those blocks to disk and get a
confirmation back from the controller. If those take longer than a
second or two, you're probably running into controller buffer overflows.
You have a large amount of RAM, so you should also make these two kernel
changes to sysctl.conf:

vm.dirty_ratio = 10
vm.dirty_writeback_ratio = 1

Then run this:

sysctl -p

This will help prevent large IO write spikes caused when the kernel
decides to write out dirty memory. That can make checkpoints take
minutes to commit in some cases, which basically stops all write traffic
to your database entirely.

That should get you going, anyway. You still need more/better disks so
you can move your pg_xlog directory. With your write load, that will
make a huge difference.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimize update query
Date: 2012-11-28 18:01:41
Message-ID: 50B65185.6010005@optionshouse.com
Lists: pgsql-performance

On 11/28/2012 11:44 AM, Niels Kristian Schjødt wrote:

> Thanks a lot - on the server I already have one additional 250GB SSD
> that I don't use for anything at the moment.

Goooood. An SSD would actually be better for your data, as it follows
more random access patterns, and xlogs are more sequential. But it's
better than nothing.

And yes, you'd be better off with a RAID-1 of two of these SSDs, because
the xlogs are critical to database health. You have your archived copy
due to the rsync, which helps. But if you had a crash, there could
potentially be a need to replay unarchived transaction logs, and you'd
end up with some data loss.
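
For reference, a software RAID-1 of two SSDs for pg_xlog would be built roughly like this (a sketch; device names and the mount point are assumptions):

mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc /dev/sdd
mkfs.ext4 /dev/md1
mkdir -p /mnt/pg_xlog
mount -o noatime /dev/md1 /mnt/pg_xlog
# then move pg_xlog onto it with the stop / mv / symlink procedure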

> BTW. as you might have seen from the .conf I have a second slave
> server with the exact same setup, which currently runs as a hot
> streaming replication slave. I might ask a stupid question here, but
> this does not affect the performance of the master does it?

Only if you're using synchronous replication. From what I saw in the
config, that isn't the case.
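
You can confirm it on the master with a quick check:

SHOW synchronous_standby_names;   -- empty means no synchronous standby
SELECT application_name, state, sync_state
  FROM pg_stat_replication;       -- sync_state should read 'async'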

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


From: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>
To: <sthomas(at)optionshouse(dot)com>
Cc: Willem Leenen <willem_leenen(at)hotmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimize update query
Date: 2012-11-29 03:32:11
Message-ID: F9939192-448B-41FF-9FB1-61343754D050@autouncle.com
Lists: pgsql-performance

Hi, I have started to implement your suggestions. I have hit a small error so far, though. The "vm.dirty_writeback_ratio = 1" setting returns:
error: "vm.dirty_writeback_ratio" is an unknown key
I'm on Ubuntu 12.04.

On 28/11/2012 at 17.54, Shaun Thomas <sthomas(at)optionshouse(dot)com> wrote:

> On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote:
>
>> https://rpm.newrelic.com/public/charts/h2dtedghfsv
>
> Doesn't this answer your question?
>
> That iowait is crushing your server into the ground. It's no surprise updates are taking several seconds. That update you sent us *should* execute on the order of only a few milliseconds.
>
> So I'll reiterate that you *must* move your pg_xlog location elsewhere. You've got row lookup bandwidth conflicting with writes. There are a couple other changes you should probably make to your config:
>
>> checkpoint_segments = 16
>
> This is not enough for the workload you describe. Every time the database checkpoints, all of those changes in pg_xlog are applied to the backend data files. You should set these values:
>
> checkpoint_segments = 100
> checkpoint_timeout = 10m
> checkpoint_completion_target = 0.9
>
> This will reduce your overall write workload by making checkpoints less frequent. Too many checkpoints massively reduce write throughput. With the settings you have, it's probably checkpointing constantly while your load runs. Start with this, but experiment with increasing checkpoint_segments further.
>
> If you check your logs now, you probably see a ton of "checkpoint starting: xlog" in there. That's very bad. It should say "checkpoint starting: time" meaning it's keeping up with your writes naturally.
>
>> work_mem = 160MB
>
> This is probably way too high. work_mem is used for every sort operation in a query, so each connection could have several of these allocated at once, starving your system of memory and reducing what's available for the page cache. Change it to 8MB, and increase it in small increments if necessary.
>
>> So correct me if I'm wrong here: my theory is, that I have too many
>> too slow update queries, that then often end up in a situation, where
>> they "wait" for each other to finish, hence the sometimes VERY long
>> execution times.
>
> Sometimes this is the case, but for you, you're running into IO contention, not lock contention. Your 3TB RAID-1 is simply insufficient for this workload.
>
> If you check your logs after making the changes I've suggested, take a look at your checkpoint sync times. That will tell you how long it took the kernel to physically commit those blocks to disk and get a confirmation back from the controller. If those take longer than a second or two, you're probably running into controller buffer overflows. You have a large amount of RAM, so you should also make these two kernel changes to sysctl.conf:
>
> vm.dirty_ratio = 10
> vm.dirty_writeback_ratio = 1
>
> Then run this:
>
> sysctl -p
>
> This will help prevent large IO write spikes caused when the kernel decides to write out dirty memory. That can make checkpoints take minutes to commit in some cases, which basically stops all write traffic to your database entirely.
>
> That should get you going, anyway. You still need more/better disks so you can move your pg_xlog directory. With your write load, that will make a huge difference.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas(at)optionshouse(dot)com
>
> ______________________________________________
>
> See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>
Cc: sthomas(at)optionshouse(dot)com, Willem Leenen <willem_leenen(at)hotmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize update query
Date: 2012-11-29 04:30:52
Message-ID: 50B6E4FC.8040605@catalyst.net.nz
Lists: pgsql-performance

In later kernels these have been renamed:

Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-32-generic x86_64)

$ sysctl -a|grep dirty
vm.dirty_background_ratio = 5
vm.dirty_background_bytes = 0
vm.dirty_ratio = 10
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000

You have the option of specifying either a ratio, or - more usefully for
machines with a lot of RAM - bytes.
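
So on a 32GB box the byte-based variants in /etc/sysctl.conf might look something like this (values purely illustrative, not a recommendation from this thread):

vm.dirty_background_bytes = 268435456   # start background writeback at ~256MB of dirty data
vm.dirty_bytes = 1073741824             # hard limit around 1GB

Note that setting the *_bytes variants overrides the corresponding *_ratio settings.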

Regards

Mark

P.S.: People on this list usually prefer it if you *bottom* post (i.e.
reply underneath the original).

On 29/11/12 16:32, Niels Kristian Schjødt wrote:
> Hi, I have started to implement your suggestions. I have hit a small error so far, though. The "vm.dirty_writeback_ratio = 1" setting returns:
> error: "vm.dirty_writeback_ratio" is an unknown key
> I'm on Ubuntu 12.04.
>
>


From: Niels Kristian Schjødt <nielskristian(at)autouncle(dot)com>
To: <sthomas(at)optionshouse(dot)com>
Cc: Willem Leenen <willem_leenen(at)hotmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Optimize update query
Date: 2012-11-30 00:59:00
Message-ID: A7A3104F-D847-4352-ACD6-4E64067E2D3A@autouncle.com
Lists: pgsql-performance


On 28/11/2012 at 17.54, Shaun Thomas <sthomas(at)optionshouse(dot)com> wrote:

> On 11/28/2012 10:19 AM, Niels Kristian Schjødt wrote:
>
>> https://rpm.newrelic.com/public/charts/h2dtedghfsv
>
> Doesn't this answer your question?
>
> That iowait is crushing your server into the ground. It's no surprise updates are taking several seconds. That update you sent us *should* execute on the order of only a few milliseconds.
>
> So I'll reiterate that you *must* move your pg_xlog location elsewhere. You've got row lookup bandwidth conflicting with writes. There are a couple other changes you should probably make to your config:
>
>> checkpoint_segments = 16
>
> This is not enough for the workload you describe. Every time the database checkpoints, all of those changes in pg_xlog are applied to the backend data files. You should set these values:
>
> checkpoint_segments = 100
> checkpoint_timeout = 10m
> checkpoint_completion_target = 0.9
>
> This will reduce your overall write workload by making checkpoints less frequent. Too many checkpoints massively reduce write throughput. With the settings you have, it's probably checkpointing constantly while your load runs. Start with this, but experiment with increasing checkpoint_segments further.
>
> If you check your logs now, you probably see a ton of "checkpoint starting: xlog" in there. That's very bad. It should say "checkpoint starting: time" meaning it's keeping up with your writes naturally.
>
>> work_mem = 160MB
>
> This is probably way too high. work_mem is used for every sort operation in a query, so each connection could have several of these allocated at once, starving your system of memory and reducing what's available for the page cache. Change it to 8MB, and increase it in small increments if necessary.
>
>> So correct me if I'm wrong here: my theory is, that I have too many
>> too slow update queries, that then often end up in a situation, where
>> they "wait" for each other to finish, hence the sometimes VERY long
>> execution times.
>
> Sometimes this is the case, but for you, you're running into IO contention, not lock contention. Your 3TB RAID-1 is simply insufficient for this workload.
>
> If you check your logs after making the changes I've suggested, take a look at your checkpoint sync times. That will tell you how long it took the kernel to physically commit those blocks to disk and get a confirmation back from the controller. If those take longer than a second or two, you're probably running into controller buffer overflows. You have a large amount of RAM, so you should also make these two kernel changes to sysctl.conf:
>
> vm.dirty_ratio = 10
> vm.dirty_writeback_ratio = 1
>
> Then run this:
>
> sysctl -p
>
> This will help prevent large IO write spikes caused when the kernel decides to write out dirty memory. That can make checkpoints take minutes to commit in some cases, which basically stops all write traffic to your database entirely.
>
> That should get you going, anyway. You still need more/better disks so you can move your pg_xlog directory. With your write load, that will make a huge difference.
>
> --
> Shaun Thomas
> OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
> 312-444-8534
> sthomas(at)optionshouse(dot)com
>
> ______________________________________________
>
> See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Okay, I'm now done with the updates described above. I made the postgresql.conf changes, I made the kernel changes, and I added two SSDs in a software RAID-1 where pg_xlog is now located - unfortunately the picture is still the same :-(
When the database is under "heavy" load there is almost no improvement in performance compared to before the changes. A lot of both reads and writes take more than 1,000 times as long as they usually do under "lighter" overall load.

I have added all the overview charts I can get hold of from New Relic below. What am I overlooking? There must be an obvious bottleneck? Where should I dive in?

Database server CPU usage
https://rpm.newrelic.com/public/charts/cEdIvvoQZCr

Database server load average
https://rpm.newrelic.com/public/charts/cMNdrYW51QJ

Database server physical memory
https://rpm.newrelic.com/public/charts/c3dZBntNpa1

Database server disk I/O utilization
https://rpm.newrelic.com/public/charts/9YEVw6RekFG

Database server network I/O (Mb/s)
https://rpm.newrelic.com/public/charts/lKiZ0Szmwe7

Top 5 database operations by wall clock time
https://rpm.newrelic.com/public/charts/dCt45YH12FK

Database throughput
https://rpm.newrelic.com/public/charts/bIbtQ1mDzMI

Database response time
https://rpm.newrelic.com/public/charts/fPcNL8WA6xx