Lists: | pgsql-general |
---|
From: | "Oliver Weichhold" <oliver(at)weichhold(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | Indexing problem with OFFSET LIMIT |
Date: | 2008-08-29 20:38:28 |
Message-ID: | a13da490808291338u449390aco49b466cb8932927d@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Hello
I have problem in my applications and don't know how to fix it.
This is the table and one of the indexes:
CREATE TABLE foo
(
id serial NOT NULL,
foo_name character varying(100),
realm_id integer
... and about 50 other columns
)
CREATE INDEX idx_foo_name_realm
ON foo
USING btree
(realm_id, foo_name);
Table foo contains about 8 Million Rows.
The problem:
Consider this query:
SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
15000
And it's execution plan:
"Limit (cost=57527.13..58294.16 rows=200 width=575) (actual
time=182.302..184.971 rows=200 loops=1)"
" -> Index Scan using idx_foo_name_realm on foo (cost=0.00..62159.98
rows=16208 width=575) (actual time=0.085..166.861 rows=15200 loops=1)"
" Index Cond: (realm_id = 228)"
"Total runtime: 185.591 ms"
And now look at this:
SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
15999
"Limit (cost=59601.92..59602.42 rows=200 width=575) (actual
time=1069.759..1072.310 rows=200 loops=1)"
" -> Sort (cost=59561.92..59602.44 rows=16208 width=575) (actual
time=929.948..1052.620 rows=16199 loops=1)"
" Sort Key: foo_name"
" Sort Method: external merge Disk: 8984kB"
" -> Bitmap Heap Scan on foo (cost=306.69..54270.62 rows=16208
width=575) (actual time=9.612..235.902 rows=21788 loops=1)"
" Recheck Cond: (realm_id = 228)"
" -> Bitmap Index Scan on foo_realm_id (cost=0.00..302.64
rows=16208 width=0) (actual time=8.733..8.733 rows=21810 loops=1)"
" Index Cond: (realm_id = 228)"
"Total runtime: 1084.706 ms"
Execution time increases tenfold because postgres stopped using the index.
Can anybody explain to me what's going on and what can be done? Is this a
memory problem?
From: | "David Rowley" <dgrowley(at)gmail(dot)com> |
---|---|
To: | "'Oliver Weichhold'" <oliver(at)weichhold(dot)com>, <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Indexing problem with OFFSET LIMIT |
Date: | 2008-08-30 01:14:15 |
Message-ID: | AA105C7BB2F141BA91A6C7F0BD44BFE6@amd64 |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
I'm no expert at reading query plans, but I'm guessing the planner chose the
other plan because your offset + limit went beyond the row estimate.
Look's like it's then doing a disk based sort in the other plan which
probably explain why it's slow.
Someone please correct me if I'm wrong.
Perhaps if you can spare a little more work_mem in postgesql.conf it might
go back to a memory sort.
David.
-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Oliver Weichhold
Sent: 29 August 2008 21:38
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Indexing problem with OFFSET LIMIT
Hello
I have problem in my applications and don't know how to fix it.
This is the table and one of the indexes:
CREATE TABLE foo
(
id serial NOT NULL,
foo_name character varying(100),
realm_id integer
... and about 50 other columns
)
CREATE INDEX idx_foo_name_realm
ON foo
USING btree
(realm_id, foo_name);
Table foo contains about 8 Million Rows.
The problem:
Consider this query:
SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
15000
And it's execution plan:
"Limit (cost=57527.13..58294.16 rows=200 width=575) (actual
time=182.302..184.971 rows=200 loops=1)"
" -> Index Scan using idx_foo_name_realm on foo (cost=0.00..62159.98
rows=16208 width=575) (actual time=0.085..166.861 rows=15200 loops=1)"
" Index Cond: (realm_id = 228)"
"Total runtime: 185.591 ms"
And now look at this:
SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
15999
"Limit (cost=59601.92..59602.42 rows=200 width=575) (actual
time=1069.759..1072.310 rows=200 loops=1)"
" -> Sort (cost=59561.92..59602.44 rows=16208 width=575) (actual
time=929.948..1052.620 rows=16199 loops=1)"
" Sort Key: foo_name"
" Sort Method: external merge Disk: 8984kB"
" -> Bitmap Heap Scan on foo (cost=306.69..54270.62 rows=16208
width=575) (actual time=9.612..235.902 rows=21788 loops=1)"
" Recheck Cond: (realm_id = 228)"
" -> Bitmap Index Scan on foo_realm_id (cost=0.00..302.64
rows=16208 width=0) (actual time=8.733..8.733 rows=21810 loops=1)"
" Index Cond: (realm_id = 228)"
"Total runtime: 1084.706 ms"
Execution time increases tenfold because postgres stopped using the index.
Can anybody explain to me what's going on and what can be done? Is this a
memory problem?
From: | "Merlin Moncure" <mmoncure(at)gmail(dot)com> |
---|---|
To: | "Oliver Weichhold" <oliver(at)weichhold(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Indexing problem with OFFSET LIMIT |
Date: | 2008-08-30 02:11:12 |
Message-ID: | b42b73150808291911j3aad986fu151a7c516ab5afb7@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
On Fri, Aug 29, 2008 at 4:38 PM, Oliver Weichhold <oliver(at)weichhold(dot)com> wrote:
> Hello
>
> I have problem in my applications and don't know how to fix it.
>
> This is the table and one of the indexes:
>
> CREATE TABLE foo
> (
> id serial NOT NULL,
> foo_name character varying(100),
> realm_id integer
>
> ... and about 50 other columns
> )
>
> CREATE INDEX idx_foo_name_realm
> ON foo
> USING btree
> (realm_id, foo_name);
>
> Table foo contains about 8 Million Rows.
>
>
> The problem:
>
> Consider this query:
>
> SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
> 15000
try this:
SELECT * FROM foo WHERE realm_id = 228 order by realm_id, foo_name
LIMIT 200 OFFSET
15000
Or even better don't use 'offset' at all. It's simply lousy. If
you want to skip ahead 200 rows at a time, save off the previous last
extracted rows in the app:
1st time:
select * from foo order by realm_id, foo_name limit 200;
times after that:
select * from foo where (realm_id, foo_name) > (last_realm_id,
last_foo_name) order by realm_id, foo_name limit 200;
you should be pleasantly surprised :-). This is also a little bit
more graceful if other sessions are deleting/inserting rows while you
are browsing.
merlin
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Merlin Moncure" <mmoncure(at)gmail(dot)com> |
Cc: | "Oliver Weichhold" <oliver(at)weichhold(dot)com>, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Indexing problem with OFFSET LIMIT |
Date: | 2008-08-30 02:58:34 |
Message-ID: | 7115.1220065114@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
"Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> On Fri, Aug 29, 2008 at 4:38 PM, Oliver Weichhold <oliver(at)weichhold(dot)com> wrote:
>> Consider this query:
>>
>> SELECT * FROM foo WHERE realm_id = 228 order by foo_name LIMIT 200 OFFSET
>> 15000
> try this:
> SELECT * FROM foo WHERE realm_id = 228 order by realm_id, foo_name
> LIMIT 200 OFFSET
> 15000
> Or even better don't use 'offset' at all. It's simply lousy. If
> you want to skip ahead 200 rows at a time, save off the previous last
> extracted rows in the app:
Yeah, large OFFSET values are always going to suck performance-wise,
because there is no magic way to skip over those rows --- the backend
has to read them, and then throw them away internally. Avoid OFFSET.
Aside from the type of trick Merlin mentions, have you considered using
a cursor?
regards, tom lane