Re: [SQL] OFFSET impact on Performance???

Lists: pgsql-performance pgsql-sql
From: "Andrei Bintintan" <klodoma(at)ar-sd(dot)net>
To: <pgsql-sql(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: OFFSET impact on Performance???
Date: 2005-01-20 11:13:44
Message-ID: 019f01c4fee1$1b58d600$0b00a8c0@forge

Hi to all,

I have the following two examples. What is the impact on query performance when the OFFSET is small (10) versus large (>50000)? I've noticed that if I return more columns, or add more joins, the query runs even slower as the OFFSET grows. How can I improve the performance of this?

Best regards,
Andy.

explain analyze
SELECT o.id
FROM report r
INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
ORDER BY 1 LIMIT 10 OFFSET 10

Limit  (cost=44.37..88.75 rows=10 width=4) (actual time=0.160..0.275 rows=10 loops=1)
  ->  Merge Join  (cost=0.00..182150.17 rows=41049 width=4) (actual time=0.041..0.260 rows=20 loops=1)
        Merge Cond: ("outer".id_order = "inner".id)
        ->  Index Scan using report_id_order_idx on report r  (cost=0.00..157550.90 rows=42862 width=4) (actual time=0.018..0.075 rows=20 loops=1)
        ->  Index Scan using orders_pkey on orders o  (cost=0.00..24127.04 rows=42501 width=4) (actual time=0.013..0.078 rows=20 loops=1)
              Filter: (id_status = 6)
Total runtime: 0.373 ms

explain analyze
SELECT o.id
FROM report r
INNER JOIN orders o ON o.id=r.id_order AND o.id_status=6
ORDER BY 1 LIMIT 10 OFFSET 1000000

Limit  (cost=31216.85..31216.85 rows=1 width=4) (actual time=1168.152..1168.152 rows=0 loops=1)
  ->  Sort  (cost=31114.23..31216.85 rows=41049 width=4) (actual time=1121.769..1152.246 rows=42693 loops=1)
        Sort Key: o.id
        ->  Hash Join  (cost=2329.99..27684.03 rows=41049 width=4) (actual time=441.879..925.498 rows=42693 loops=1)
              Hash Cond: ("outer".id_order = "inner".id)
              ->  Seq Scan on report r  (cost=0.00..23860.62 rows=42862 width=4) (actual time=38.634..366.035 rows=42864 loops=1)
              ->  Hash  (cost=2077.74..2077.74 rows=42501 width=4) (actual time=140.200..140.200 rows=0 loops=1)
                    ->  Seq Scan on orders o  (cost=0.00..2077.74 rows=42501 width=4) (actual time=0.059..96.890 rows=42693 loops=1)
                          Filter: (id_status = 6)
Total runtime: 1170.586 ms


From: Richard Huxton <dev(at)archonet(dot)com>
To: Andrei Bintintan <klodoma(at)ar-sd(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 12:10:59
Message-ID: 41EF9FD3.5020400@archonet.com

Andrei Bintintan wrote:
> Hi to all,
>
> I have the following two examples. What is the impact on query
> performance when the OFFSET is small (10) versus large (>50000)? I've
> noticed that if I return more columns, or add more joins, the query
> runs even slower as the OFFSET grows. How can I improve the
> performance of this?

There's really only one way to do an offset of 1000 and that's to fetch
1000 rows and then some and discard the first 1000.

If you're using this to provide "pages" of results, could you use a cursor?

--
Richard Huxton
Archonet Ltd
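Richard's point (the engine must produce and then discard every skipped row) can be illustrated outside the database with a pure-Python sketch. This is an illustrative model of LIMIT/OFFSET semantics only, not PostgreSQL's executor:

```python
from itertools import islice

def rows():
    """Stand-in for a query result stream; counts how many rows it produces."""
    rows.produced = 0
    for i in range(1_000_000):
        rows.produced += 1
        yield i

def page(limit, offset):
    """LIMIT/OFFSET semantics: skip `offset` rows, return the next `limit`."""
    return list(islice(rows(), offset, offset + limit))

print(page(10, 10))       # 10 rows returned; only 20 rows were produced
print(rows.produced)
print(page(10, 50_000))   # still 10 rows returned, but 50,010 were produced
print(rows.produced)
```

Either call returns just ten rows, but the second one forced the producer through 50,010 of them, which is why runtime grows roughly linearly with the offset.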


From: "Andrei Bintintan" <klodoma(at)ar-sd(dot)net>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: <pgsql-sql(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 13:45:47
Message-ID: 020d01c4fef6$595bb9d0$0b00a8c0@forge

> If you're using this to provide "pages" of results, could you use a
> cursor?
What do you mean by that? Cursor?

Yes, I'm using this to provide "pages", but if I jump to the last pages it
gets very slow.

Andy.


From: Richard Huxton <dev(at)archonet(dot)com>
To: Andrei Bintintan <klodoma(at)ar-sd(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 15:20:59
Message-ID: 41EFCC5B.5010302@archonet.com

Andrei Bintintan wrote:
>> If you're using this to provide "pages" of results, could you use a
>> cursor?
>
> What do you mean by that? Cursor?
>
> Yes I'm using this to provide "pages", but If I jump to the last pages
> it goes very slow.

DECLARE mycursor CURSOR FOR SELECT * FROM ...
FETCH FORWARD 10 IN mycursor;
CLOSE mycursor;

Repeated FETCHes would let you step through your results. That won't
work if you have a web-app making repeated connections.

If you've got a web-application then you'll probably want to insert the
results into a cache table for later use.

--
Richard Huxton
Archonet Ltd
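Within a single connection, the same stepping can also be driven from client code with the DB-API's fetchmany(). A sketch using SQLite from Python's standard library as a stand-in for PostgreSQL (the table and data here are invented), with the same caveat: it doesn't survive across web requests:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, id_status INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, 6) for i in range(1, 101)])

# Run the query once, then step through the result set page by page,
# analogous to repeated FETCH FORWARD 10 IN mycursor.
cur = conn.execute("SELECT id FROM orders WHERE id_status = 6 ORDER BY id")
page1 = cur.fetchmany(10)   # first ten rows
page2 = cur.fetchmany(10)   # next ten, without re-running the query
cur.close()
```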


From: Alex Turner <armtuk(at)gmail(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Andrei Bintintan <klodoma(at)ar-sd(dot)net>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 16:39:16
Message-ID: 33c6269f05012008391490448b@mail.gmail.com

I am also very interested in this question. Is there any way to
declare a persistent cursor that remains open between pg sessions?
This would be better than a temp table because you would not have to
do the initial select and insert into a fresh table and incur those I/O
costs, which are often very heavy, and the reason why one would want
to use a cursor.

Alex Turner
NetEconomist



From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 16:49:39
Message-ID: 41EFE123.9020209@cheapcomplexdevices.com

Richard Huxton wrote:
>
> If you've got a web-application then you'll probably want to insert the
> results into a cache table for later use.
>

If I have quite a bit of activity like this (people selecting 10000 out
of a few million rows and paging through them in a web browser), would
it be good to have a single table with a userid column shared by all
users, or a separate table for each user that can be truncated/dropped?

I started out with one table; but with people doing tens of thousands
of inserts and deletes per session, I had a pretty hard time figuring
out a reasonable vacuum strategy.

Eventually I started doing a whole bunch of create table tmp_XXXX
tables where XXXX is a userid; and a script to drop these tables - but
that's quite ugly in a different way.

With 8.0 I guess I'll try the single table again - perhaps what I
want is to always have an I/O-throttled vacuum running... hmm.

Any suggestions?
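For reference, the single shared table Ron describes might be structured like this (an illustrative SQLite sketch; the table and column names are invented, and vacuum behaviour is of course outside its scope):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE result_cache (
                 session_id TEXT, seq INTEGER, order_id INTEGER,
                 PRIMARY KEY (session_id, seq))""")

def cache_results(session_id, ids):
    """Run the expensive query once, store numbered rows for this session."""
    db.executemany("INSERT INTO result_cache VALUES (?, ?, ?)",
                   [(session_id, n, i) for n, i in enumerate(ids)])

def fetch_page(session_id, page, per_page=10):
    """Any later request, on any connection, pages by seq, not by OFFSET."""
    lo = page * per_page
    rows = db.execute("""SELECT order_id FROM result_cache
                         WHERE session_id = ? AND seq >= ? AND seq < ?
                         ORDER BY seq""", (session_id, lo, lo + per_page))
    return [r[0] for r in rows]

cache_results("sess-42", list(range(1000, 1100)))
print(fetch_page("sess-42", 3))   # the fourth page of cached ids
```

Because seq is assigned once at insert time, every page is an indexed range scan on the primary key, regardless of how deep the page is.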


From: Richard Huxton <dev(at)archonet(dot)com>
To: alex(at)neteconomist(dot)com
Cc: Andrei Bintintan <klodoma(at)ar-sd(dot)net>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 16:53:14
Message-ID: 41EFE1FA.60004@archonet.com

Alex Turner wrote:
> I am also very interested in this question. Is there any way
> to declare a persistent cursor that remains open between pg sessions?

Not sure how this would work. What do you do with multiple connections?
Only one can access the cursor, so which should it be?

> This would be better than a temp table because you would not have to
> do the initial select and insert into a fresh table and incur those
> IO costs, which are often very heavy, and the reason why one would
> want to use a cursor.

I'm pretty sure two things mean there's less difference than you might
expect:
1. Temp tables don't fsync
2. A cursor will spill to disk beyond a certain size

--
Richard Huxton
Archonet Ltd


From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Andrei Bintintan" <klodoma(at)ar-sd(dot)net>
Cc: "Richard Huxton" <dev(at)archonet(dot)com>, <pgsql-sql(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 16:59:34
Message-ID: 87pt00c9zd.fsf@stark.xeocode.com

"Andrei Bintintan" <klodoma(at)ar-sd(dot)net> writes:

> > If you're using this to provide "pages" of results, could you use a cursor?
> What do you mean by that? Cursor?
>
> Yes I'm using this to provide "pages", but If I jump to the last pages it goes
> very slow.

The best way to do pages is not to use offset or cursors but to use an
index. This only works if you can enumerate all the sort orders the
application might be using and can have an index on each of them.

To do this the query would look something like:

SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50

Then you take note of the last value used on a given page and if the user
selects "next" you pass that as the starting point for the next page.

This query takes the same amount of time no matter how many records are in the
table and no matter what page of the result set the user is on. It should
actually be instantaneous even if the user is on the hundredth page of
millions of records, because it uses an index both for finding the right
point to start and for the ordering.

It also has the advantage that it works even if the list of items changes as
the user navigates. If you use OFFSET and someone inserts a record in the
table then the "next" page will overlap the current page. Worse, if someone
deletes a record then "next" will skip a record.

The disadvantages of this are a) it's hard (but not impossible) to go
backwards. And b) it's impossible to give the user a list of pages and let
them skip around willy nilly.

(If this is for a web page then I specifically don't recommend cursors. It
will mean you'll have to have some complex session management system that
guarantees the user will always come back to the same postgres session, with
some garbage collection if the user disappears. And it means the URL is only
good for a limited amount of time. If they bookmark it, it'll break if they
come back the next day.)

--
greg
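Greg's scheme, sketched end-to-end (SQLite from Python's standard library standing in for PostgreSQL; the table layout follows his example query):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tab (col INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO tab VALUES (?)", [(i,) for i in range(1, 501)])

def next_page(last_col, limit=50):
    """Keyset pagination: start strictly after the last value already seen."""
    rows = db.execute("SELECT col FROM tab WHERE col > ? ORDER BY col LIMIT ?",
                      (last_col, limit))
    return [r[0] for r in rows]

page = next_page(0)          # first page: col 1..50
page = next_page(page[-1])   # "next": resumes after the last value seen
```

Each call is an index range scan starting at the last key, so the hundredth page costs the same as the first.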


From: Richard Huxton <dev(at)archonet(dot)com>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: 'Postgresql Performance' <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 17:04:23
Message-ID: 41EFE497.2070607@archonet.com

Ron Mayer wrote:
> Richard Huxton wrote:
>
>>
>> If you've got a web-application then you'll probably want to insert
>> the results into a cache table for later use.
>>
>
> If I have quite a bit of activity like this (people selecting 10000 out
> of a few million rows and paging through them in a web browser), would
> it be good to have a single table with a userid column shared by all
> users, or a separate table for each user that can be truncated/dropped?
>
> I started out with one table; but with people doing tens of thousands
> of inserts and deletes per session, I had a pretty hard time figuring
> out a reasonable vacuum strategy.

As often as you can, and make sure your config allocates enough
free-space-map for them. Unless, of course, you end up I/O saturated.

> Eventually I started doing a whole bunch of create table tmp_XXXX
> tables where XXXX is a userid; and a script to drop these tables - but
> that's quite ugly in a different way.
>
> With 8.0 I guess I'll try the single table again - perhaps what I
> want is to always have an I/O-throttled vacuum running... hmm.

Well, there have been some tweaks, but I don't know if they'll help in
this case.

--
Richard Huxton
Archonet Ltd


From: Richard Huxton <dev(at)archonet(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Andrei Bintintan <klodoma(at)ar-sd(dot)net>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 17:24:36
Message-ID: 41EFE954.7080205@archonet.com

Greg Stark wrote:
> "Andrei Bintintan" <klodoma(at)ar-sd(dot)net> writes:
>
>
>>>If you're using this to provide "pages" of results, could you use a cursor?
>>
>>What do you mean by that? Cursor?
>>
>>Yes I'm using this to provide "pages", but If I jump to the last pages it goes
>>very slow.
>
>
> The best way to do pages is not to use offset or cursors but to use an
> index. This only works if you can enumerate all the sort orders the
> application might be using and can have an index on each of them.
>
> To do this the query would look something like:
>
> SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
>
> Then you take note of the last value used on a given page and if the user
> selects "next" you pass that as the starting point for the next page.

Greg's is the most efficient, but you need to make sure you have a
suitable key available in the output of your select.

Also, since you are repeating the query you could get different results
as people insert/delete rows. This might or might not be what you want.

A similar solution is to partition by date/alphabet or similar, then
page those results. That can reduce your resultset to a manageable size.
--
Richard Huxton
Archonet Ltd


From: Ragnar Hafstað <gnari(at)simnet(dot)is>
To: pgsql-performance(at)postgresql(dot)org
Cc: Andrei Bintintan <klodoma(at)ar-sd(dot)net>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 19:12:06
Message-ID: 1106248326.22416.22.camel@localhost.localdomain

On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote:

> The best way to do pages is not to use offset or cursors but to use an
> index. This only works if you can enumerate all the sort orders the
> application might be using and can have an index on each of them.
>
> To do this the query would look something like:
>
> SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
>
> Then you take note of the last value used on a given page and if the user
> selects "next" you pass that as the starting point for the next page.

This will only work unchanged if the index is unique. Imagine, for
example, if you have more than 50 rows with the same value of col.

One way to fix this is to use ORDER BY col, oid.

gnari


From: Ragnar Hafstað <gnari(at)simnet(dot)is>
To: pgsql-performance(at)postgresql(dot)org
Cc: Andrei Bintintan <klodoma(at)ar-sd(dot)net>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-20 19:23:12
Message-ID: 1106248992.22416.25.camel@localhost.localdomain

On Thu, 2005-01-20 at 19:12 +0000, Ragnar Hafstað wrote:
> On Thu, 2005-01-20 at 11:59 -0500, Greg Stark wrote:
>
> > The best way to do pages is not to use offset or cursors but to use an
> > index. This only works if you can enumerate all the sort orders the
> > application might be using and can have an index on each of them.
> >
> > To do this the query would look something like:
> >
> > SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
> >
> > Then you take note of the last value used on a given page and if the user
> > selects "next" you pass that as the starting point for the next page.
>
> this will only work unchanged if the index is unique. imagine , for
> example if you have more than 50 rows with the same value of col.
>
> one way to fix this is to use ORDER BY col,oid

and a slightly more complex WHERE clause as well, of course

gnari
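The "slightly more complex WHERE clause" is the usual tie-breaking pattern: compare on (col, id) pairs rather than col alone. An illustrative sketch (SQLite stand-in; id plays the role of oid, and the data deliberately has heavy duplication in col):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, col INTEGER)")
# 120 rows sharing only 3 distinct col values: paging must tie-break on id.
db.executemany("INSERT INTO tab VALUES (?, ?)",
               [(i, i % 3) for i in range(1, 121)])

def next_page(last_col, last_id, limit=50):
    """Resume strictly after the (col, id) pair last seen."""
    rows = db.execute(
        """SELECT col, id FROM tab
           WHERE col > ? OR (col = ? AND id > ?)
           ORDER BY col, id LIMIT ?""",
        (last_col, last_col, last_id, limit))
    return rows.fetchall()

p1 = next_page(-1, -1)     # first 50 (col, id) pairs
p2 = next_page(*p1[-1])    # resumes exactly after the last row of p1
```

p2 follows p1 with no overlap and no skipped rows, even though only three distinct col values exist.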


From: "Andrei Bintintan" <klodoma(at)ar-sd(dot)net>
To: Ragnar Hafstað <gnari(at)simnet(dot)is>, <pgsql-performance(at)postgresql(dot)org>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-21 09:20:48
Message-ID: 00c001c4ff9a$809918e0$0b00a8c0@forge

Now I have read all the posts and I have some answers.

Yes, I have a web application.
I HAVE to know exactly how many pages I have, and I have to allow the user to
jump to a specific page (this is where I used LIMIT and OFFSET). We have this
feature and I cannot take it out.

>> > SELECT * FROM tab WHERE col > ? ORDER BY col LIMIT 50
Now this solution looks very fast, but I cannot implement it, because I
cannot jump from page 1 to page xxxx, only to page 2, since this approach only
tells me where page 1 ended. And we have some really complicated WHEREs, and
about 10 tables are involved in the SQL query.

About CURSORs, I have to read more about them, because this is the first
time I've heard of them.
I don't think temporary tables are a solution; there are a lot of users
working at the same time on the same page.

So... still DIGGING for solutions.

Andy.



From: Greg Stark <gsstark(at)mit(dot)edu>
To: alex(at)neteconomist(dot)com
Cc: Richard Huxton <dev(at)archonet(dot)com>, Andrei Bintintan <klodoma(at)ar-sd(dot)net>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-25 18:28:46
Message-ID: 87y8eh1hy9.fsf@stark.xeocode.com


Alex Turner <armtuk(at)gmail(dot)com> writes:

> I am also very interested in this question. Is there any way to
> declare a persistent cursor that remains open between pg sessions?
> This would be better than a temp table because you would not have to
> do the initial select and insert into a fresh table and incur those I/O
> costs, which are often very heavy, and the reason why one would want
> to use a cursor.

TANSTAAFL. How would such a persistent cursor be implemented if not by
building a temporary table somewhere behind the scenes?

There could be some advantage if the data were stored in a temporary table
marked as not having to be WAL logged. Instead it could be automatically
cleared on every database start.

--
greg


From: "Andrei Bintintan" <klodoma(at)ar-sd(dot)net>
To: <alex(at)neteconomist(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>
Cc: "Richard Huxton" <dev(at)archonet(dot)com>, <pgsql-sql(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-26 10:11:49
Message-ID: 015a01c5038f$73a70720$0b00a8c0@forge

The problem still stays open.

The thing is that I have about 20-30 clients that are using that SQL query
where the offset and limit are involved. So I cannot create a temp table,
because that means I'd have to make a temp table for each session... which is
a very bad idea. Cursors are somehow the same. In my application the WHERE
conditions can be very different for each user (session).

The only solution I see at the moment is to work on the query, or to write a
more complex WHERE function to limit the result output. So, no replacement
for OFFSET/LIMIT.

Best regards,
Andy.



From: Alex Turner <armtuk(at)gmail(dot)com>
To: Andrei Bintintan <klodoma(at)ar-sd(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Richard Huxton <dev(at)archonet(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-26 13:47:32
Message-ID: 33c6269f05012605473319ef40@mail.gmail.com

As I read the docs, a temp table doesn't solve our problem, as it does
not persist between sessions. With a web page there is no guarantee
that you will receive the same connection between requests, so a temp
table doesn't solve the problem. It looks like you either have to
create a real table (which is undesirable because it has to be
physically synced, and TTFB will be very poor) or create an application
tier in between the web tier and the database tier to allow data to
persist between requests tied to a unique session id.

It looks like the solution to this problem is not an RDBMS, IMHO.

Alex Turner
NetEconomist



From: Richard Huxton <dev(at)archonet(dot)com>
To: alex(at)neteconomist(dot)com
Cc: Andrei Bintintan <klodoma(at)ar-sd(dot)net>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-sql(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-26 13:57:00
Message-ID: 41F7A1AC.1060406@archonet.com

Alex Turner wrote:
> As I read the docs, a temp table doesn't solve our problem, as it does
> not persist between sessions. With a web page there is no guarantee
> that you will receive the same connection between requests, so a temp
> table doesn't solve the problem. It looks like you either have to
> create a real table (which is undesirable because it has to be
> physically synced, and TTFB will be very poor) or create an application
> tier in between the web tier and the database tier to allow data to
> persist between requests tied to a unique session id.
>
> It looks like the solution to this problem is not an RDBMS, IMHO.

It's less the RDBMS than the web application. You're trying to mix a
stateful setup (the application) with a stateless presentation layer
(the web). If you're using PHP (which doesn't offer a "real" middle
layer) you might want to look at memcached.

--
Richard Huxton
Archonet Ltd


From: David Brown <time(at)bigpond(dot)net(dot)au>
To: Andrei Bintintan <pgsql-performance(at)postgresql(dot)org>
Subject: Re: OFFSET impact on Performance???
Date: 2005-01-27 03:50:25
Message-ID: 41F86501.1060006@bigpond.net.au

Although larger offsets have some effect, your real problem is the sort
(of 42693 rows).

Try:

SELECT r.id_order
FROM report r
WHERE r.id_order IN
(SELECT id
FROM orders
WHERE id_status = 6
ORDER BY 1
LIMIT 10 OFFSET 1000)
ORDER BY 1

The subquery doesn't *have* to sort because the table is already ordered
on the primary key.
You can still add a join to orders outside the subselect without
significant cost.

Incidentally, I don't know how you got the first plan - it should
include a sort as well.
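David's rewrite can be checked on a toy dataset (an SQLite sketch with invented data; SQLite, like PostgreSQL, accepts LIMIT/OFFSET inside an IN subquery):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, id_status INTEGER)")
db.execute("CREATE TABLE report (id_order INTEGER)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(i, 6) for i in range(1, 2001)])
db.executemany("INSERT INTO report (id_order) VALUES (?)",
               [(i,) for i in range(1, 2001)])

# The subquery pages over the primary key, so no explicit sort is needed;
# the outer query then joins/filters against the small id list.
rows = db.execute("""
    SELECT r.id_order
    FROM report r
    WHERE r.id_order IN
          (SELECT id FROM orders
           WHERE id_status = 6
           ORDER BY id
           LIMIT 10 OFFSET 1000)
    ORDER BY r.id_order
""").fetchall()
```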



From: PFC <lists(at)boutiquenumerique(dot)com>
To: alex(at)neteconomist(dot)com, "Andrei Bintintan" <klodoma(at)ar-sd(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-02-01 09:16:47
Message-ID: opslihd9o0th1vuj@musicbox



Did you miss the proposal to store arrays of the found rows' ids in a
"cache" table? Is 4 bytes per result row still too large?

If it's still too large, you can implement the same cache in the
filesystem!

If you want to fetch 100,000 rows containing just an integer, in my case
(psycopg) it's a lot faster to use an array aggregate. Time to get the
data into the application (including the query):

select id from temp
=> 849 ms
select int_array_aggregate(id) as ids from temp
=> 300 ms

So you can always fetch the whole query result (in the form of one
integer per row) and cache it in the filesystem. It won't work if you have
10 million rows, though!
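PFC's approach (pay for the full id list once, then page in the application) can be sketched as follows. This is illustrative only: it holds the ids in process memory rather than the filesystem, uses a smaller dataset for brevity, and int_array_aggregate itself is a PostgreSQL contrib function not reproduced here:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE temp_ids (id INTEGER PRIMARY KEY)")
db.executemany("INSERT INTO temp_ids VALUES (?)", [(i,) for i in range(10_000)])

# Pay the full-query cost once per search...
all_ids = [r[0] for r in db.execute("SELECT id FROM temp_ids ORDER BY id")]

# ...then every page is a cheap in-memory slice (one small int per cached row).
def page(n, per_page=10):
    return all_ids[n * per_page:(n + 1) * per_page]

print(page(0))     # first page of ids
print(page(999))   # last page, at no extra query cost
```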