Re: Performance issues when the number of records are around 10 Million

From: venu madhav <venutaurus539(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance issues when the number of records are around 10 Million
Date: 2010-05-12 06:59:11
Message-ID: AANLkTikKeQExTNXN7XppB81pEOk26aGPHLAx51GcrKXa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, May 12, 2010 at 3:20 AM, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov
> wrote:

> venu madhav <venutaurus539(at)gmail(dot)com> wrote:
>
> > When I try to get the last twenty records from the database, it
> > takes around 10-15 mins to complete the operation.
>
> Making this a little easier to read (for me, at least) I get this:
>
> select e.cid, timestamp, s.sig_class, s.sig_priority, s.sig_name,
> e.sniff_ip, e.sniff_channel, s.sig_config, e.wifi_addr_1,
> e.wifi_addr_2, e.view_status, bssid
> FROM event e,
> signature s
> WHERE s.sig_id = e.signature
> AND e.timestamp >= '1270449180'
> AND e.timestamp < '1273473180'
> ORDER BY
> e.cid DESC,
> e.cid DESC
> limit 21
> offset 10539780
> ;
>
> Why the timestamp range, the order by, the limit, *and* the offset?
> On the face of it, that seems a bit confused. Not to mention that
> your ORDER BY has the same column twice.
>
[Venu] The second column acts as a secondary key for sorting if the primary
sorting key is a different column. For this query both of them are same.
This query is part of an application which allows user to select time ranges
and retrieve the data in that interval. Hence the time stamp. To have it in
some particular order we're doing order by. If the records are more in the
interval, we display in sets of 20/30 etc. The user also has the option to
browse through any of those records hence the limit and offset.

>
> Perhaps that OFFSET is not needed? It is telling PostgreSQL that
> whatever results are generated based on the rest of the query, read
> through and ignore the first ten and a half million. Since you said
> you had about ten million rows, you wanted the last 20, and the
> ORDER by is DESCending, you're probably not going to get what you
> want.
>
> What, exactly, *is* it you want again?
>
> [Venu] As explain above this query is part of the application where user
wishes to see the records from the database between any start and end times.
They get rendered as a HTML page with pagination links to traverse through
the data. The user has option to go to any set of records. When the user
asks for the last set of 20 records, this query gets executed.
Hope it is clear now. Please let me know if you need any further info.

Thank you,
Venu

> -Kevin
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2010-05-12 11:55:26 Re: Performance issues when the number of records are around 10 Million
Previous Message venu madhav 2010-05-12 06:42:33 Re: Performance issues when the number of records are around 10 Million