Re: Query much faster with enable_seqscan=0

From: Ogden <lists(at)darkstatic(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Query much faster with enable_seqscan=0
Date: 2010-09-21 17:32:01
Message-ID: 0D5B9ED0-00C3-4EEB-8336-0B2F8A884620@darkstatic.com
Lists: pgsql-performance

Hello,

I have received some help on the IRC channel; however, the problem still exists. When the following query is run with enable_seqscan set to 0, it takes less than a second, whereas with it set to 1 it takes 14 seconds. The machine has 8 GB of RAM and runs PostgreSQL 9.0 on Debian Lenny. The database is about 7 GB.

Query:
SELECT tr.id, tr.sid
FROM
    test_registration tr
    INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
WHERE
    tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
GROUP BY tr.id, tr.sid

demo=# \d test_registration
Table "public.test_registration"
Column | Type | Modifiers
------------------------+-----------------------------+------------------------
id | uuid | not null
sid | character varying(36) | not null
created_date | timestamp without time zone | not null default now()
modified_date | timestamp without time zone | not null
test_administration_id | uuid | not null
teacher_number | character varying(15) |
test_version_id | uuid |
Indexes:
"test_registration_pkey" PRIMARY KEY, btree (id)
"test_registration_sid_key" UNIQUE, btree (sid, test_administration_id)
"test_registration_teacher" btree (teacher_number)
"test_registration_test_id" btree (test_administration_id)

demo=# \d test_registration_result
Table "public.test_registration_result"
Column | Type | Modifiers
----------------------+-----------------------+-----------
answer | character varying(15) |
question_id | uuid | not null
score | double precision |
test_registration_id | uuid | not null
Indexes:
"test_registration_result_pkey" PRIMARY KEY, btree (question_id, test_registration_id)
"test_registration_result_answer" btree (test_registration_id, answer, score)
"test_registration_result_test" btree (test_registration_id)

Explain Analyze:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=951169.97..951198.37 rows=2840 width=25) (actual time=14669.039..14669.843 rows=2972 loops=1)
-> Hash Join (cost=2988.07..939924.85 rows=2249024 width=25) (actual time=551.464..14400.061 rows=638980 loops=1)
Hash Cond: (r.test_registration_id = tr.id)
-> Seq Scan on test_registration_result r (cost=0.00..681946.72 rows=37199972 width=16) (actual time=0.015..6073.101 rows=37198734 loops=1)
-> Hash (cost=2952.57..2952.57 rows=2840 width=25) (actual time=2.516..2.516 rows=2972 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 160kB
-> Bitmap Heap Scan on test_registration tr (cost=44.29..2952.57 rows=2840 width=25) (actual time=0.528..1.458 rows=2972 loops=1)
Recheck Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
-> Bitmap Index Scan on test_registration_test_administration_id (cost=0.00..43.58 rows=2840 width=0) (actual time=0.507..0.507 rows=2972 loops=1)
Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
Total runtime: 14670.337 ms
(11 rows)

real 0m14.698s
user 0m0.000s
sys 0m0.008s

With "set enable_seqscan=0;"

SET
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=1225400.19..1225428.59 rows=2840 width=25) (actual time=748.397..749.160 rows=2972 loops=1)
-> Nested Loop (cost=0.00..1214155.07 rows=2249024 width=25) (actual time=0.107..465.165 rows=638980 loops=1)
-> Index Scan using test_registration_test_administration_id on test_registration tr (cost=0.00..4413.96 rows=2840 width=25) (actual time=0.050..1.610 rows=2972 loops=1)
Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
-> Index Scan using test_registration_result_answer on test_registration_result r (cost=0.00..416.07 rows=792 width=16) (actual time=0.019..0.106 rows=215 loops=2972)
Index Cond: (r.test_registration_id = tr.id)
Total runtime: 749.745 ms
(7 rows)

real 0m0.759s
user 0m0.008s
sys 0m0.000s

The following parameters are changed in postgresql.conf and I have routinely vacuum analyzed the tables and database:

shared_buffers = 2048MB
work_mem = 8MB
maintenance_work_mem = 256MB
wal_buffers = 640kB
random_page_cost = 4.0
effective_cache_size = 7000MB
default_statistics_target = 200

free -m:
total used free shared buffers cached
Mem: 8003 7849 153 0 25 7555
-/+ buffers/cache: 268 7735
Swap: 7640 0 7639

Any help would be appreciated. Thank you very much.

Ogden


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Ogden <lists(at)darkstatic(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 18:06:18
Message-ID: 20100921180618.GI17152@aart.is.rice.edu
Lists: pgsql-performance

Your DB is more than likely cached. You should adjust your
page costs to better reflect reality; the planner can then
make more accurate estimates and choose the proper plan.
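
For example (an illustrative sketch, not part of the original message; the
2.0 value is only an assumption to demonstrate the idea), a lower
random_page_cost can be tried out in a single session before touching
postgresql.conf:

-- SET LOCAL only lasts until the end of the transaction, so nothing is
-- changed permanently by this experiment.
BEGIN;
SET LOCAL random_page_cost = 2.0;
EXPLAIN ANALYZE
SELECT tr.id, tr.sid
FROM test_registration tr
INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
GROUP BY tr.id, tr.sid;
ROLLBACK;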

Cheers,
Ken

On Tue, Sep 21, 2010 at 12:32:01PM -0500, Ogden wrote:
> Hello,
>
> I have received some help from the IRC channel, however, the problem still exists. When running the following query with enable_seqscan set to 0, it takes less than a second, whereas with it set to 1, the query returns in 14 seconds. The machine itself has 8GB Ram and is running PostgreSQL 9.0 on Debian Lenny. The database size is about 7GB.
>
>
> Query:
> SELECT tr.id, tr.sid
> FROM
>     test_registration tr
>     INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
> WHERE
>     tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> GROUP BY tr.id, tr.sid
>
>
>
> demo=# \d test_registration
> Table "public.test_registration"
> Column | Type | Modifiers
> ------------------------+-----------------------------+------------------------
> id | uuid | not null
> sid | character varying(36) | not null
> created_date | timestamp without time zone | not null default now()
> modified_date | timestamp without time zone | not null
> test_administration_id | uuid | not null
> teacher_number | character varying(15) |
> test_version_id | uuid |
> Indexes:
> "test_registration_pkey" PRIMARY KEY, btree (id)
> "test_registration_sid_key" UNIQUE, btree (sid, test_administration_id)
> "test_registration_teacher" btree (teacher_number)
> "test_registration_test_id" btree (test_administration_id)
>
> demo=# \d test_registration_result
> Table "public.test_registration_result"
> Column | Type | Modifiers
> ----------------------+-----------------------+-----------
> answer | character varying(15) |
> question_id | uuid | not null
> score | double precision |
> test_registration_id | uuid | not null
> Indexes:
> "test_registration_result_pkey" PRIMARY KEY, btree (question_id, test_registration_id)
> "test_registration_result_answer" btree (test_registration_id, answer, score)
> "test_registration_result_test" btree (test_registration_id)
>
>
> Explain Analyze:
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=951169.97..951198.37 rows=2840 width=25) (actual time=14669.039..14669.843 rows=2972 loops=1)
> -> Hash Join (cost=2988.07..939924.85 rows=2249024 width=25) (actual time=551.464..14400.061 rows=638980 loops=1)
> Hash Cond: (r.test_registration_id = tr.id)
> -> Seq Scan on test_registration_result r (cost=0.00..681946.72 rows=37199972 width=16) (actual time=0.015..6073.101 rows=37198734 loops=1)
> -> Hash (cost=2952.57..2952.57 rows=2840 width=25) (actual time=2.516..2.516 rows=2972 loops=1)
> Buckets: 1024 Batches: 1 Memory Usage: 160kB
> -> Bitmap Heap Scan on test_registration tr (cost=44.29..2952.57 rows=2840 width=25) (actual time=0.528..1.458 rows=2972 loops=1)
> Recheck Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
> -> Bitmap Index Scan on test_registration_test_administration_id (cost=0.00..43.58 rows=2840 width=0) (actual time=0.507..0.507 rows=2972 loops=1)
> Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
> Total runtime: 14670.337 ms
> (11 rows)
>
>
> real 0m14.698s
> user 0m0.000s
> sys 0m0.008s
>
>
> With "set enable_seqscan=0;"
>
>
> SET
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> HashAggregate (cost=1225400.19..1225428.59 rows=2840 width=25) (actual time=748.397..749.160 rows=2972 loops=1)
> -> Nested Loop (cost=0.00..1214155.07 rows=2249024 width=25) (actual time=0.107..465.165 rows=638980 loops=1)
> -> Index Scan using test_registration_test_administration_id on test_registration tr (cost=0.00..4413.96 rows=2840 width=25) (actual time=0.050..1.610 rows=2972 loops=1)
> Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
> -> Index Scan using test_registration_result_answer on test_registration_result r (cost=0.00..416.07 rows=792 width=16) (actual time=0.019..0.106 rows=215 loops=2972)
> Index Cond: (r.test_registration_id = tr.id)
> Total runtime: 749.745 ms
> (7 rows)
>
>
> real 0m0.759s
> user 0m0.008s
> sys 0m0.000s
>
>
> The following parameters are changed in postgresql.conf and I have routinely vacuum analyzed the tables and database:
>
> shared_buffers = 2048MB
> work_mem = 8MB
> maintenance_work_mem = 256MB
> wal_buffers = 640kB
> random_page_cost = 4.0
> effective_cache_size = 7000MB
> default_statistics_target = 200
>
>
> free -m:
> total used free shared buffers cached
> Mem: 8003 7849 153 0 25 7555
> -/+ buffers/cache: 268 7735
> Swap: 7640 0 7639
>
>
> Any help would be appreciated. Thank you very much.
>
> Ogden


From: Ogden <lists(at)darkstatic(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 18:21:52
Message-ID: D707045F-C689-403A-9729-8DBCC39F53A1@darkstatic.com
Lists: pgsql-performance

I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?

Thank you

Ogden

On Sep 21, 2010, at 1:06 PM, Kenneth Marshall wrote:

> Your DB is more than likely cached. You should adjust your
> page costs to better reflect reality; the planner can then
> make more accurate estimates and choose the proper plan.
>
> Cheers,
> Ken
>
> On Tue, Sep 21, 2010 at 12:32:01PM -0500, Ogden wrote:
>> Hello,
>>
>> I have received some help from the IRC channel, however, the problem still exists. When running the following query with enable_seqscan set to 0, it takes less than a second, whereas with it set to 1, the query returns in 14 seconds. The machine itself has 8GB Ram and is running PostgreSQL 9.0 on Debian Lenny. The database size is about 7GB.
>>
>>
>> Query:
>> SELECT tr.id, tr.sid
>> FROM
>>     test_registration tr
>>     INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
>> WHERE
>>     tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>> GROUP BY tr.id, tr.sid
>>
>>
>>
>> demo=# \d test_registration
>> Table "public.test_registration"
>> Column | Type | Modifiers
>> ------------------------+-----------------------------+------------------------
>> id | uuid | not null
>> sid | character varying(36) | not null
>> created_date | timestamp without time zone | not null default now()
>> modified_date | timestamp without time zone | not null
>> test_administration_id | uuid | not null
>> teacher_number | character varying(15) |
>> test_version_id | uuid |
>> Indexes:
>> "test_registration_pkey" PRIMARY KEY, btree (id)
>> "test_registration_sid_key" UNIQUE, btree (sid, test_administration_id)
>> "test_registration_teacher" btree (teacher_number)
>> "test_registration_test_id" btree (test_administration_id)
>>
>> demo=# \d test_registration_result
>> Table "public.test_registration_result"
>> Column | Type | Modifiers
>> ----------------------+-----------------------+-----------
>> answer | character varying(15) |
>> question_id | uuid | not null
>> score | double precision |
>> test_registration_id | uuid | not null
>> Indexes:
>> "test_registration_result_pkey" PRIMARY KEY, btree (question_id, test_registration_id)
>> "test_registration_result_answer" btree (test_registration_id, answer, score)
>> "test_registration_result_test" btree (test_registration_id)
>>
>>
>> Explain Analyze:
>>
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> HashAggregate (cost=951169.97..951198.37 rows=2840 width=25) (actual time=14669.039..14669.843 rows=2972 loops=1)
>> -> Hash Join (cost=2988.07..939924.85 rows=2249024 width=25) (actual time=551.464..14400.061 rows=638980 loops=1)
>> Hash Cond: (r.test_registration_id = tr.id)
>> -> Seq Scan on test_registration_result r (cost=0.00..681946.72 rows=37199972 width=16) (actual time=0.015..6073.101 rows=37198734 loops=1)
>> -> Hash (cost=2952.57..2952.57 rows=2840 width=25) (actual time=2.516..2.516 rows=2972 loops=1)
>> Buckets: 1024 Batches: 1 Memory Usage: 160kB
>> -> Bitmap Heap Scan on test_registration tr (cost=44.29..2952.57 rows=2840 width=25) (actual time=0.528..1.458 rows=2972 loops=1)
>> Recheck Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>> -> Bitmap Index Scan on test_registration_test_administration_id (cost=0.00..43.58 rows=2840 width=0) (actual time=0.507..0.507 rows=2972 loops=1)
>> Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>> Total runtime: 14670.337 ms
>> (11 rows)
>>
>>
>> real 0m14.698s
>> user 0m0.000s
>> sys 0m0.008s
>>
>>
>> With "set enable_seqscan=0;"
>>
>>
>> SET
>> QUERY PLAN
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> HashAggregate (cost=1225400.19..1225428.59 rows=2840 width=25) (actual time=748.397..749.160 rows=2972 loops=1)
>> -> Nested Loop (cost=0.00..1214155.07 rows=2249024 width=25) (actual time=0.107..465.165 rows=638980 loops=1)
>> -> Index Scan using test_registration_test_administration_id on test_registration tr (cost=0.00..4413.96 rows=2840 width=25) (actual time=0.050..1.610 rows=2972 loops=1)
>> Index Cond: (test_administration_id = 'e26a165a-c19f-11df-be2f-778af560e5a2'::uuid)
>> -> Index Scan using test_registration_result_answer on test_registration_result r (cost=0.00..416.07 rows=792 width=16) (actual time=0.019..0.106 rows=215 loops=2972)
>> Index Cond: (r.test_registration_id = tr.id)
>> Total runtime: 749.745 ms
>> (7 rows)
>>
>>
>> real 0m0.759s
>> user 0m0.008s
>> sys 0m0.000s
>>
>>
>> The following parameters are changed in postgresql.conf and I have routinely vacuum analyzed the tables and database:
>>
>> shared_buffers = 2048MB
>> work_mem = 8MB
>> maintenance_work_mem = 256MB
>> wal_buffers = 640kB
>> random_page_cost = 4.0
>> effective_cache_size = 7000MB
>> default_statistics_target = 200
>>
>>
>> free -m:
>> total used free shared buffers cached
>> Mem: 8003 7849 153 0 25 7555
>> -/+ buffers/cache: 268 7735
>> Swap: 7640 0 7639
>>
>>
>> Any help would be appreciated. Thank you very much.
>>
>> Ogden
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Ogden <lists(at)darkstatic(dot)com>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 18:51:14
Message-ID: 4C98FEA2.8030607@krogh.cc
Lists: pgsql-performance

On 2010-09-21 20:21, Ogden wrote:
> I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?
>

It should be lowered to a bit over seq_page_cost. More importantly, make
sure your statistics are up to date: run "ANALYZE";
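
For instance (a small sketch added for illustration, not from the original
message), statistics can be refreshed on just the two tables in the query:

-- Refresh planner statistics; autovacuum normally handles this, but an
-- explicit run is cheap after large data changes.
ANALYZE VERBOSE test_registration;
ANALYZE VERBOSE test_registration_result;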

--
Jesper


From: Ogden <lists(at)darkstatic(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 19:02:11
Message-ID: 3062DF4E-7460-4487-818A-BF77F176E501@darkstatic.com
Lists: pgsql-performance

How odd, I set the following:

seq_page_cost = 1.0
random_page_cost = 2.0

And now the query runs in milliseconds as opposed to 14 seconds. Could this really be the change? I am running ANALYZE now - how often is it recommended to do this?

Thank you

Ogden

On Sep 21, 2010, at 1:51 PM, Jesper Krogh wrote:

> On 2010-09-21 20:21, Ogden wrote:
>> I assume you mean random_page_cost? It is currently set to 4.0 - is it better to increase or decrease this value?
>>
>
> It should be lowered to a bit over seq_page_cost. More importantly, make
> sure your statistics are up to date: run "ANALYZE";
>
> --
> Jesper
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Ogden <lists(at)darkstatic(dot)com>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 19:07:11
Message-ID: 1285096031.15919.68.camel@jd-desktop.unknown.charter.com
Lists: pgsql-performance

On Tue, 2010-09-21 at 14:02 -0500, Ogden wrote:
> How odd, I set the following:
>
> seq_page_cost = 1.0
> random_page_cost = 2.0
>
> And now the query runs in milliseconds as opposed to 14 seconds. Could this really be the change? I am running ANALYZE now - how often is it recommended to do this?

PostgreSQL's defaults are based on extremely small (some would say
non-production) databases. As a matter of course I always recommend
bringing seq_page_cost and random_page_cost more in line.

However, you may want to try moving random_page_cost back to 4 and
increasing cpu_tuple_cost instead.
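
For example (a hedged sketch, not from the original message; the
cpu_tuple_cost value below is purely illustrative):

-- Keep the default page costs but make per-row CPU work look more
-- expensive, which penalizes the plan that scans 37M rows.
SET random_page_cost = 4.0;  -- the default
SET cpu_tuple_cost = 0.03;   -- default is 0.01; 0.03 is an illustrative guess
-- then re-run EXPLAIN ANALYZE on the original query to compare plans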

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: jd(at)commandprompt(dot)com
Cc: Ogden <lists(at)darkstatic(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 19:16:57
Message-ID: 4C9904A9.5080301@2ndquadrant.com
Lists: pgsql-performance

Joshua D. Drake wrote:
> PostgreSQL's defaults are based on extremely small and some would say
> (non production) size databases. As a matter of course I always
> recommend bringing seq_page_cost and random_page_cost more in line.
>

Also, they presume that not all of your data is going to be in memory,
and that the query optimizer needs to be careful about what it does and
doesn't pull from disk. If that's not the case, like here where there's
8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost
and random_page_cost can make sense. Don't be afraid that lowering them
below 1.0 is going too far--something more like 0.01 for sequential and
0.02 for random may actually reflect reality here.
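
One way to apply that (a sketch using the values suggested above; both
settings take effect on a reload, no restart needed) is to edit
postgresql.conf:

seq_page_cost = 0.01
random_page_cost = 0.02

and then, from psql, reload and confirm the running values:

SELECT pg_reload_conf();
SHOW seq_page_cost;
SHOW random_page_cost;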

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Ogden <lists(at)darkstatic(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 19:34:42
Message-ID: D551B381-C707-43C1-A3F5-A3064F09152C@darkstatic.com
Lists: pgsql-performance


On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:

> Joshua D. Drake wrote:
>> PostgreSQL's defaults are based on extremely small and some would say
>> (non production) size databases. As a matter of course I always
>> recommend bringing seq_page_cost and random_page_cost more in line.
>>
>
> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful about what it does and doesn't pull from disk. If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense. Don't be afraid to think lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here.
>

I have done just that, per your recommendations, and what took 14 seconds now takes less than a second, so it was certainly these figures that made the difference. I have set:

seq_page_cost = 0.01
random_page_cost = 0.02
cpu_tuple_cost = 0.01

Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.

I truly appreciate everyone's help.

Ogden


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: jd(at)commandprompt(dot)com, Ogden <lists(at)darkstatic(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, Kenneth Marshall <ktm(at)rice(dot)edu>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 23:24:47
Message-ID: 11632.1285111487@sss.pgh.pa.us
Lists: pgsql-performance

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> and the query optimizer needs to be careful about what it does and
> doesn't pull from disk. If that's not the case, like here where there's
> 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost
> and random_page_cost can make sense. Don't be afraid to think lowering
> below 1.0 is going too far--something more like 0.01 for sequential and
> 0.02 for random may actually reflect reality here.

If you are tuning for an all-in-RAM situation, you should set
random_page_cost equal to seq_page_cost (and usually set both smaller
than 1). By definition, those costs are equal if you're fetching from
RAM. If it's only mostly-in-RAM then keeping random_page_cost a bit
higher makes sense.
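
For instance (the values below only illustrate the equal-costs idea and are
not a recommendation from this message):

SET seq_page_cost = 0.1;
SET random_page_cost = 0.1;  -- equal to seq_page_cost for the all-in-RAM case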

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ogden <lists(at)darkstatic(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-21 23:30:32
Message-ID: 11723.1285111832@sss.pgh.pa.us
Lists: pgsql-performance

Ogden <lists(at)darkstatic(dot)com> writes:
> SELECT tr.id, tr.sid
> FROM
>     test_registration tr
>     INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
> WHERE
>     tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> GROUP BY tr.id, tr.sid

Seeing that tr.id is a primary key, I think you might be a lot better
off if you avoided the inner join and group by. I think what you really
want here is something like

SELECT tr.id, tr.sid
FROM
    test_registration tr
WHERE
    tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
    AND EXISTS (SELECT 1 FROM test_registration_result r
                WHERE tr.id = r.test_registration_id)

regards, tom lane


From: Ogden <lists(at)darkstatic(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-22 13:36:43
Message-ID: 47D7FDEE-BC85-4254-A994-4E8312EF0C6A@darkstatic.com
Lists: pgsql-performance


On Sep 21, 2010, at 2:34 PM, Ogden wrote:

>
> On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:
>
>> Joshua D. Drake wrote:
>>> PostgreSQL's defaults are based on extremely small and some would say
>>> (non production) size databases. As a matter of course I always
>>> recommend bringing seq_page_cost and random_page_cost more in line.
>>>
>>
>> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful about what it does and doesn't pull from disk. If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense. Don't be afraid to think lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here.
>>
>
> I have done just that, per your recommendations and now what took 14 seconds, only takes less than a second, so it was certainly these figures I messed around with. I have set:
>
> seq_page_cost = 0.01
> random_page_cost = 0.02
> cpu_tuple_cost = 0.01
>
> Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.
>
> I truly appreciate everyone's help.
>
> Ogden
>

I spoke too soon - this morning I reran the query that had been sped up so much yesterday by tweaking those numbers. The first run took 16 seconds, whereas every subsequent run took about 2 seconds, so I assume OS caching is at work on the later runs. Is this normal, or could the lag on the first run be down to the speed of my disks (RAID 5 across 6 disks)? What should those settings really be - perhaps 0.01 is too low?

Thank you

Ogden


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Ogden <lists(at)darkstatic(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-24 17:34:41
Message-ID: F99BA3A3-9D60-4943-81C8-9F11CD55E1C1@richrelevance.com
Lists: pgsql-performance


On Sep 22, 2010, at 6:36 AM, Ogden wrote:

>
> On Sep 21, 2010, at 2:34 PM, Ogden wrote:
>
>>
>> On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:
>>
>>> Joshua D. Drake wrote:
>>>> PostgreSQL's defaults are based on extremely small and some would say
>>>> (non production) size databases. As a matter of course I always
>>>> recommend bringing seq_page_cost and random_page_cost more in line.
>>>>
>>>
>>> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful about what it does and doesn't pull from disk. If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense. Don't be afraid to think lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here.
>>>
>>
>> I have done just that, per your recommendations and now what took 14 seconds, only takes less than a second, so it was certainly these figures I messed around with. I have set:
>>
>> seq_page_cost = 0.01
>> random_page_cost = 0.02
>> cpu_tuple_cost = 0.01
>>
>> Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.
>>
>> I truly appreciate everyone's help.
>>
>> Ogden
>>
>
>
> I spoke too soon - well I came in this morning and reran the query that was speeded up yesterday by a lot after tweaking those numbers. This morning the first time I ran it, it took 16 seconds whereas every subsequent run was a matter of 2 seconds. I assume there is OS caching going on for those results. Is this normal or could it also be the speed of my disks which is causing a lag when I first run it (it's RAID 5 across 6 disks). Is there any explanation for this or what should those settings really be? Perhaps 0.01 is too low?
>
> Thank you
>
> Ogden

When not cached, the plan with sequential scans will almost always be much faster.

When cached in memory, the ones using indexes are almost always faster.

The tuning parameters essentially tell Postgres how likely it is that data will be found on disk versus in memory. The default parameters amount to "probably not in memory, with a somewhat slow disk".
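
One way to see which case a particular run fell into (a sketch; the BUFFERS
option of EXPLAIN is available in 9.0) is to check how many pages came from
shared buffers versus being read in:

-- "shared hit" blocks came from PostgreSQL's buffer cache; "read" blocks
-- had to be fetched from the OS (page cache or disk).
EXPLAIN (ANALYZE, BUFFERS)
SELECT tr.id, tr.sid
FROM test_registration tr
INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
GROUP BY tr.id, tr.sid;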

> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ogden <lists(at)darkstatic(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-09-28 16:03:10
Message-ID: AANLkTikCg558pUy0dFXN7G3xne7Q2-cfz_93a2+V1SrR@mail.gmail.com
Lists: pgsql-performance

On Wed, Sep 22, 2010 at 9:36 AM, Ogden <lists(at)darkstatic(dot)com> wrote:
>
> On Sep 21, 2010, at 2:34 PM, Ogden wrote:
>
>>
>> On Sep 21, 2010, at 2:16 PM, Greg Smith wrote:
>>
>>> Joshua D. Drake wrote:
>>>> PostgreSQL's defaults are based on extremely small and some would say
>>>> (non production) size databases. As a matter of course I always
>>>> recommend bringing seq_page_cost and random_page_cost more in line.
>>>>
>>>
>>> Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful about what it does and doesn't pull from disk.  If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense.  Don't be afraid to think lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here.
>>>
>>
>> I have done just that, per your recommendations and now what took 14 seconds, only takes less than a second, so it was certainly these figures I messed around with. I have set:
>>
>> seq_page_cost = 0.01
>> random_page_cost = 0.02
>> cpu_tuple_cost = 0.01
>>
>> Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days.
>>
>> I truly appreciate everyone's help.
>>
>> Ogden
>>
>
>
> I spoke too soon - well I came in this morning and reran the query that was speeded up yesterday by a lot after tweaking those numbers. This morning the first time I ran it, it took 16 seconds whereas every subsequent run was a matter of 2 seconds. I assume there is OS caching going on for those results. Is this normal or could it also be the speed of my disks which is causing a lag when I first run it (it's RAID 5 across 6 disks). Is there any explanation for this or what should those settings really be? Perhaps 0.01 is too low?

Yeah, I think those numbers are a bit low. Your database probably
isn't fully cached. Keep in mind there's going to be some fluctuation
as to what is and is not in cache, and you can't expect whatever plan
the planner picks to be exactly perfect for both cases. I might try
something more like 0.2 / 0.1. If you really need the query to be
fast, though, you might need to do more than jigger the page costs.
Did you try Tom's suggested rewrite?
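
A sketch of trying both suggestions together in one session (the cost values
and the rewritten query come from earlier messages; nothing here was run
against the original data):

SET seq_page_cost = 0.1;
SET random_page_cost = 0.2;

EXPLAIN ANALYZE
SELECT tr.id, tr.sid
FROM test_registration tr
WHERE tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
  AND EXISTS (SELECT 1 FROM test_registration_result r
              WHERE tr.id = r.test_registration_id);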

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Ogden <lists(at)darkstatic(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-10-13 00:23:24
Message-ID: 9EE3D98A-CF01-43A3-B3AC-636F5D407FA8@darkstatic.com
Lists: pgsql-performance


On Sep 21, 2010, at 6:30 PM, Tom Lane wrote:

> Ogden <lists(at)darkstatic(dot)com> writes:
>> SELECT tr.id, tr.sid
>> FROM
>>     test_registration tr
>>     INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
>> WHERE
>>     tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>> GROUP BY tr.id, tr.sid
>
> Seeing that tr.id is a primary key, I think you might be a lot better
> off if you avoided the inner join and group by. I think what you really
> want here is something like
>
> SELECT tr.id, tr.sid
> FROM
> test_registration tr
> WHERE
> tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> AND EXISTS(SELECT 1 FROM test_registration_result r
> WHERE tr.id = r.test_registration_id)
>
> regards, tom lane
>

Thank you for this suggestion; however, what if I want some columns from test_registration_result? That wouldn't work if, for example, I wanted test_registration_result.answer to be fetched, which is why I had the JOIN with test_registration_result and the GROUP BY. I am still not happy with my query - the EXISTS version executes very quickly, but with it I cannot retrieve any columns from that table.

Thank you

Ogden


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ogden <lists(at)darkstatic(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-10-13 02:28:55
Message-ID: AANLkTi=GVn5pB_Xt2PkoWAbZWrBQTbARtcrLNtBJ8N-4@mail.gmail.com
Lists: pgsql-performance

On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Ogden <lists(at)darkstatic(dot)com> writes:
> > SELECT tr.id, tr.sid
> > FROM
> >     test_registration tr
> >     INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
> > WHERE
> >     tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> > GROUP BY tr.id, tr.sid
>
> Seeing that tr.id is a primary key, I think you might be a lot better
> off if you avoided the inner join and group by. I think what you really
> want here is something like
>
> SELECT tr.id, tr.sid
> FROM
> test_registration tr
> WHERE
>
> tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
> AND EXISTS(SELECT 1 FROM test_registration_result r
> WHERE tr.id = r.test_registration_id)
>
> regards, tom lane
>
>
Could you explain the logic behind why this structure is better than the
other? Should one always use the 'exists(select 1 from x...)' structure
when stripping out rows that don't join, or only when the rows that do
join are a fairly limited subset? Does the same advantage hold when
filtering rows in the joined table on some criterion, or is it better at
that point to use an inner join and add a where clause to filter the
joined rows?

select table1.columns
from table1, table2
where table1.column = 'some_value'
and table1.fk = table2.pk
AND table2.column = 'some_other_value'

versus

select table1.columns
from table1
where table1.column = 'some_value'
and exists(select 1 from table2 where table1.fk = table2.pk
and table2.column ='some_other_value')


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ogden <lists(at)darkstatic(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query much faster with enable_seqscan=0
Date: 2010-10-28 16:21:01
Message-ID: AANLkTi=cvPjxDLJHpJ82F=zXtGqXRjb7mtvUod-vx6ZC@mail.gmail.com
Lists: pgsql-performance

On Tue, Oct 12, 2010 at 10:28 PM, Samuel Gendler
<sgendler(at)ideasculptor(dot)com> wrote:
>
>
> On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Ogden <lists(at)darkstatic(dot)com> writes:
>> > SELECT tr.id, tr.sid
>> > FROM
>> >     test_registration tr
>> >     INNER JOIN test_registration_result r ON (tr.id = r.test_registration_id)
>> > WHERE
>> >     tr.test_administration_id = '32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>> > GROUP BY tr.id, tr.sid
>>
>> Seeing that tr.id is a primary key, I think you might be a lot better
>> off if you avoided the inner join and group by.  I think what you really
>> want here is something like
>>
>> SELECT tr.id, tr.sid
>>            FROM
>>            test_registration tr
>>            WHERE
>>
>>  tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid
>>            AND EXISTS(SELECT 1 FROM test_registration_result r
>>                       WHERE tr.id = r.test_registration_id)
>>
>>                        regards, tom lane
>>
>
> Could you explain the logic behind why this structure is better than the
> other? Is it always the case that one should just always use the
> 'exists(select 1 from x...)' structure when trying to strip rows that don't
> join or is it just the case when you know that the rows which do join are a
> fairly limited subset?  Does the same advantage exist if filtering rows in
> the joined table on some criteria, or is it better at that point to use an
> inner join and add a where clause to filter the joined rows.
> select table1.columns
> from  table1, table2
> where table1.column = 'some_value'
>    and table1.fk = table2.pk
>   AND table2.column = 'some_other_value'
> versus
> select table1.columns
>   from table1
> where table1.column = 'some_value'
>    and exists(select 1 from table2 where table1.fk = table2.pk
>                       and table2.column ='some_other_value')

I don't think there's much difference between those two cases. I
think Tom's point was that GROUP BY can be expensive - which it
certainly can. It's absolutely necessary and unavoidable for certain
queries, of course, but don't include it unless you need it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company