From: Russ Brown <pickscrape(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Optimising a query requiring seqscans=0
Date: 2006-09-14 15:15:34
Message-ID: 1158246934.23051.131.camel@aeolian.my-domain.com
Lists: pgsql-general pgsql-hackers

Hi,

We recently upgraded our trac backend from sqlite to postgres, and I
decided to have a little fun and write some reports that delve into
trac's subversion cache, and got stuck with a query optimisation
problem.

Table revision contains 2800+ rows.
Table node_change contains 370000+ rows.

rev is a TEXT field on both tables, containing numerical data (not my
choice: trac does it like this to support VCS backends with
non-numerical revision identifiers).

I've got stuck with this query:

SELECT author, COUNT(DISTINCT r.rev)
FROM revision AS r
LEFT JOIN node_change AS nc
ON r.rev=nc.rev
WHERE r.time >= EXTRACT(epoch FROM (NOW() - interval '30 days'))::integer
GROUP BY r.author;

Currently it shows the number of commits per author in the last 30 days.
The join to node_change is superfluous for this purpose, but I added it
because I intended to add new columns that draw on this table, such as
the number of files added, deleted, etc.

I never got that far, however, due to this planner problem:

GroupAggregate  (cost=56755.41..56852.28 rows=2 width=17) (actual time=4836.433..4897.458 rows=25 loops=1)
  ->  Sort  (cost=56755.41..56787.69 rows=12913 width=17) (actual time=4836.233..4851.968 rows=22419 loops=1)
        Sort Key: r.author
        ->  Merge Left Join  (cost=53886.10..55873.68 rows=12913 width=17) (actual time=4600.733..4641.749 rows=22419 loops=1)
              Merge Cond: ("outer".rev = "inner".rev)
              ->  Sort  (cost=93.78..96.24 rows=982 width=17) (actual time=7.050..7.383 rows=1088 loops=1)
                    Sort Key: r.rev
                    ->  Index Scan using revision_time_idx on revision r  (cost=0.01..44.98 rows=982 width=17) (actual time=0.191..4.014 rows=1088 loops=1)
                          Index Cond: ("time" >= (date_part('epoch'::text, (now() - '30 days'::interval)))::integer)
              ->  Sort  (cost=53792.32..54719.09 rows=370707 width=8) (actual time=4203.665..4443.748 rows=346238 loops=1)
                    Sort Key: nc.rev
                    ->  Seq Scan on node_change nc  (cost=0.00..12852.07 rows=370707 width=8) (actual time=0.054..663.719 rows=370707 loops=1)
Total runtime: 4911.430 ms

If I disable sequential scans I get the following explain:

GroupAggregate  (cost=221145.13..221242.01 rows=2 width=17) (actual time=286.348..348.268 rows=25 loops=1)
  ->  Sort  (cost=221145.13..221177.42 rows=12913 width=17) (actual time=286.183..302.239 rows=22419 loops=1)
        Sort Key: r.author
        ->  Nested Loop Left Join  (cost=0.01..220263.40 rows=12913 width=17) (actual time=0.339..86.626 rows=22419 loops=1)
              ->  Index Scan using revision_time_idx on revision r  (cost=0.01..44.98 rows=982 width=17) (actual time=0.194..4.056 rows=1088 loops=1)
                    Index Cond: ("time" >= (date_part('epoch'::text, (now() - '30 days'::interval)))::integer)
              ->  Index Scan using node_change_rev_idx on node_change nc  (cost=0.00..223.18 rows=86 width=8) (actual time=0.009..0.058 rows=21 loops=1088)
                    Index Cond: ("outer".rev = nc.rev)
Total runtime: 350.103 ms
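
For reference, here's roughly the session that produces the second plan
(enable_seqscan is the setting behind the "seqscans=0" in the subject):

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT author, COUNT(DISTINCT r.rev)
FROM revision AS r
LEFT JOIN node_change AS nc
ON r.rev=nc.rev
WHERE r.time >= EXTRACT(epoch FROM (NOW() - interval '30 days'))::integer
GROUP BY r.author;
SET enable_seqscan = on;  -- restore the default afterwards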

Statistics are set to 20, and I have ANALYZEd both tables.

The report itself isn't important, but I'm using this as an exercise in
PostgreSQL query optimisation and planner tuning, so any help/hints
would be appreciated.

Thanks.

--

Russ


From: Jim Nasby <jim(at)nasby(dot)net>
To: Russ Brown <pickscrape(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimising a query requiring seqscans=0
Date: 2006-09-22 03:39:16
Message-ID: 4E6B4E5B-DD9C-45DE-A915-0EADD9988E28@nasby.net
Lists: pgsql-general pgsql-hackers

On Sep 14, 2006, at 11:15 AM, Russ Brown wrote:
> We recently upgraded our trac backend from sqlite to postgres, and I
> decided to have a little fun and write some reports that delve into
> trac's subversion cache, and got stuck with a query optimisation
> problem.
>
> Table revision contains 2800+ rows.
> Table node_change contains 370000+ rows.
<...>
> I've got stuck with this query:
>
> SELECT author, COUNT(DISTINCT r.rev)
> FROM revision AS r
> LEFT JOIN node_change AS nc
> ON r.rev=nc.rev
> WHERE r.time >= EXTRACT(epoch FROM (NOW() - interval '30 days'))::integer

Man I really hate when people store time_t in a database...

> GROUP BY r.author;
>
> Statistics are set to 20, and I have ANALYZEd both tables.
>
> The report itself isn't important, but I'm using this as an
> exercise in
> PostgreSQL query optimisation and planner tuning, so any help/hints
> would be appreciated.

Setting statistics higher (100-200), at least for the large table,
will likely help. Also make sure that you've set effective_cache_size
correctly (I generally set it to total memory - 1G, assuming the
server has at least 4G in it).
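
For example, something along these lines (just a sketch; the column name
comes from the query in your post, and 200 is an arbitrary target):

ALTER TABLE node_change ALTER COLUMN rev SET STATISTICS 200;
ANALYZE node_change;

And in postgresql.conf (the value is counted in 8kB pages):

effective_cache_size = 393216   # 3GB, i.e. total - 1G on a 4GB box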
--
Jim Nasby jimn(at)enterprisedb(dot)com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Russ Brown <pickscrape(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimising a query requiring seqscans=0
Date: 2006-09-23 22:58:48
Message-ID: 1159052328.8670.65.camel@aeolian.my-domain.com
Lists: pgsql-general pgsql-hackers

On Thu, 2006-09-21 at 23:39 -0400, Jim Nasby wrote:
> On Sep 14, 2006, at 11:15 AM, Russ Brown wrote:
> > We recently upgraded our trac backend from sqlite to postgres, and I
> > decided to have a little fun and write some reports that delve into
> > trac's subversion cache, and got stuck with a query optimisation
> > problem.
> >
> > Table revision contains 2800+ rows.
> > Table node_change contains 370000+ rows.
> <...>
> > I've got stuck with this query:
> >
> > SELECT author, COUNT(DISTINCT r.rev)
> > FROM revision AS r
> > LEFT JOIN node_change AS nc
> > ON r.rev=nc.rev
> > WHERE r.time >= EXTRACT(epoch FROM (NOW() - interval '30 days'))::integer
>
> Man I really hate when people store time_t in a database...
>

I know. Probably something to do with database engine independence. I
don't know if sqlite even has a date type (probably does, but I haven't
checked).

> > GROUP BY r.author;
> >
> > Statistics are set to 20, and I have ANALYZEd both tables.
> >
> > The report itself isn't important, but I'm using this as an
> > exercise in
> > PostgreSQL query optimisation and planner tuning, so any help/hints
> > would be appreciated.
>
> Setting statistics higher (100-200), at least for the large table,
> will likely help. Also make sure that you've set effective_cache_size
> correctly (I generally set it to total memory - 1G, assuming the
> server has at least 4G in it).

Thank you: the problem was the effective_cache_size (which I hadn't
changed from the default of 1000). This machine doesn't have loads of
RAM, but I knocked it up to 65536 and now the query uses the index,
without having to change the statistics.
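
In postgresql.conf terms the whole fix was (the value is counted in 8kB
pages, so 65536 works out to 512MB):

#effective_cache_size = 1000    # old default, about 8MB
effective_cache_size = 65536    # 65536 * 8kB = 512MB

It can also be tested per session with SET effective_cache_size = 65536;
before touching the config file.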

Thanks a lot!

> --
> Jim Nasby jimn(at)enterprisedb(dot)com
> EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Russ Brown <pickscrape(at)gmail(dot)com>
Subject: Increase default effective_cache_size?
Date: 2006-09-23 23:10:33
Message-ID: 29442.1159053033@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Russ Brown <pickscrape(at)gmail(dot)com> writes on pgsql-general:
> On Thu, 2006-09-21 at 23:39 -0400, Jim Nasby wrote:
>> Also make sure that you've set effective_cache_size
>> correctly (I generally set it to total memory - 1G, assuming the
>> server has at least 4G in it).

> Thank you: the problem was the effective_cache_size (which I hadn't
> changed from the default of 1000). This machine doesn't have loads of
> RAM, but I knocked it up to 65536 and now the query uses the index,
> without having to change the statistics.

Considering recent discussion about how 8.2 is probably noticeably more
sensitive to effective_cache_size than prior releases, I wonder whether
it's not time to adopt a larger default value for that setting. The
current default of 1000 pages (8Mb) seems really pretty silly for modern
machines; we could certainly set it to 10 times that without problems,
and maybe much more. Thoughts?

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Russ Brown <pickscrape(at)gmail(dot)com>
Subject: Re: Increase default effective_cache_size?
Date: 2006-09-24 00:14:45
Message-ID: 4515CDF5.9030808@commandprompt.com
Lists: pgsql-general pgsql-hackers


>> Thank you: the problem was the effective_cache_size (which I hadn't
>> changed from the default of 1000). This machine doesn't have loads of
>> RAM, but I knocked it up to 65536 and now the query uses the index,
>> without having to change the statistics.
>
> Considering recent discussion about how 8.2 is probably noticeably more
> sensitive to effective_cache_size than prior releases, I wonder whether
> it's not time to adopt a larger default value for that setting. The
> current default of 1000 pages (8Mb) seems really pretty silly for modern
> machines; we could certainly set it to 10 times that without problems,
> and maybe much more. Thoughts?

I think that 128 megs is probably a reasonable starting point. I know
plenty of people who run postgresql on 512 megs of ram, and even after
you take shared buffers and work_mem into account, 128 megs would still
fit on such a machine.

Joshua D. Drake
>
> regards, tom lane

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/


From: Gevik Babakhani <pgdev(at)xs4all(dot)nl>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Russ Brown <pickscrape(at)gmail(dot)com>
Subject: Re: Increase default effective_cache_size?
Date: 2006-09-24 00:37:29
Message-ID: 1159058249.5087.10.camel@voyager.truesoftware.net
Lists: pgsql-general pgsql-hackers

On Sat, 2006-09-23 at 17:14 -0700, Joshua D. Drake wrote:
> >> Thank you: the problem was the effective_cache_size (which I hadn't
> >> changed from the default of 1000). This machine doesn't have loads of
> >> RAM, but I knocked it up to 65536 and now the query uses the index,
> >> without having to change the statistics.
> >
> > Considering recent discussion about how 8.2 is probably noticeably more
> > sensitive to effective_cache_size than prior releases, I wonder whether
> > it's not time to adopt a larger default value for that setting. The
> > current default of 1000 pages (8Mb) seems really pretty silly for modern
> > machines; we could certainly set it to 10 times that without problems,
> > and maybe much more. Thoughts?
>
> I think that 128 megs is probably a reasonable starting point. I know
> plenty of people that run postgresql on 512 megs of ram. If you take
> into account shared buffers and work mem, that seems like a reasonable
> starting point.
>

I agree. Adopting a higher effective_cache_size seems to be a good thing
to do.

(Hmmm... I must be dreaming again... but I cannot stop wondering what it
would be like to have a smart "agent" that configures these values by
analyzing the machine's power and statistics gathered from database
usage...)


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Russ Brown <pickscrape(at)gmail(dot)com>
Subject: Re: Increase default effective_cache_size?
Date: 2006-09-24 00:59:58
Message-ID: 20060924005958.GE24675@kenobi.snowman.net
Lists: pgsql-general pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Russ Brown <pickscrape(at)gmail(dot)com> writes on pgsql-general:
> > Thank you: the problem was the effective_cache_size (which I hadn't
> > changed from the default of 1000). This machine doesn't have loads of
> > RAM, but I knocked it up to 65536 and now the query uses the index,
> > without having to change the statistics.
>
> Considering recent discussion about how 8.2 is probably noticeably more
> sensitive to effective_cache_size than prior releases, I wonder whether
> it's not time to adopt a larger default value for that setting. The
> current default of 1000 pages (8Mb) seems really pretty silly for modern
> machines; we could certainly set it to 10 times that without problems,
> and maybe much more. Thoughts?

I'd have to agree 100% with this. Though don't we now have something
automated for shared_buffers? I'd think effective_cache_size would
definitely be a candidate for automation (say, half or a quarter of the
RAM in the box...).

Barring the ability to do something along those lines, yes, I'd
recommend upping it to at least 128M or 256M.

Thanks,

Stephen


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Russ Brown <pickscrape(at)gmail(dot)com>
Subject: Re: Increase default effective_cache_size?
Date: 2006-09-25 08:11:06
Message-ID: 45178F1A.3030500@sigaev.ru
Lists: pgsql-general pgsql-hackers

> current default of 1000 pages (8Mb) seems really pretty silly for modern
> machines; we could certainly set it to 10 times that without problems,
> and maybe much more. Thoughts?

Maybe set the default effective_cache_size equal to the number of shared
buffers? If pgsql is configured to use a quarter or half of total memory
for shared buffers, then effective_cache_size will be a good
approximation...
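
Something like this in postgresql.conf, for example (a sketch for a box
giving 512MB to shared buffers; both settings are counted in 8kB pages):

shared_buffers = 65536          # 512MB
effective_cache_size = 65536    # same value, per the above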

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Russ Brown <pickscrape(at)gmail(dot)com>
Subject: Re: Increase default effective_cache_size?
Date: 2006-09-25 12:17:01
Message-ID: 4517C8BD.2080908@dunslane.net
Lists: pgsql-general pgsql-hackers

Teodor Sigaev wrote:
>> current default of 1000 pages (8Mb) seems really pretty silly for modern
>> machines; we could certainly set it to 10 times that without problems,
>> and maybe much more. Thoughts?
>
> Maybe set the default effective_cache_size equal to the number of
> shared buffers?
> If pgsql is configured to use a quarter or half of total memory for
> shared buffers, then effective_cache_size will be a good approximation...
>
>

Initdb does not currently make any attempt to discover the extent of
physical or virtual memory; it simply tries to start postgres with
certain shared_buffers settings, starting at 4000 and going down until
we get a success.

max_fsm_pages is now set proportionally to shared_buffers, and I guess
we could do something similar with effective_cache_size, but since IIRC
this doesn't involve shared memory, I'm inclined to agree with Tom that
it should just be fixed at some substantially higher level.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org, Russ Brown <pickscrape(at)gmail(dot)com>
Subject: Re: Increase default effective_cache_size?
Date: 2006-09-25 17:07:17
Message-ID: 11617.1159204037@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Initdb does not currently make any attempt to discover the extent of
> physical or virtual memory; it simply tries to start postgres with
> certain shared_buffers settings, starting at 4000 and going down until
> we get a success.

> max_fsm_pages is now set proportionally to shared_buffers, and I guess
> we could do something similar with effective_cache_size, but since IIRC
> this doesn't involve shared memory, I'm inclined to agree with Tom that
> it should just be fixed at some substantially higher level.

Right, the default shared_buffers doesn't have much of anything to do
with actual RAM size. If the user has altered it, then it might (or
might not) ... but that doesn't help us for setting a default
effective_cache_size.

Barring objections, I'll change it to Josh Drake's suggestion of ~ 128Mb
(versus current 8Mb).
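
In postgresql.conf terms the proposed new default would look roughly
like this (the setting is counted in 8kB pages):

effective_cache_size = 16384    # 16384 * 8kB = 128MB (old default: 1000, ~8MB)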

regards, tom lane