Re: Question regarding indices

Lists: pgsql-sql
From: "Steve" <steeeeeveee(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Question regarding indices
Date: 2010-09-11 12:29:23
Message-ID: 20100911122923.75010@gmx.net

Hello List,

I have a small question about the order of values in a query. Assume I have a table with the following fields:
uid INT,
data BIGINT,
hits INT

And a unique index on (uid, data). I use the libpq C API to query data from the table. The query is something like this:
SELECT uid,data,hits FROM mytable WHERE uid=2 AND data IN (2033,2499,590,19,201,659)

Would the speed of the query be influenced if I sorted the data? I can imagine that just querying a handful of bigints would not make a big difference, but what about several thousand values? Would sorting them and sending the SQL query with ordered data influence the speed of the query?

// Steve


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Steve" <steeeeeveee(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question regarding indices
Date: 2010-09-11 15:04:16
Message-ID: 17448.1284217456@sss.pgh.pa.us

"Steve" <steeeeeveee(at)gmx(dot)net> writes:
> I have a small question about the order of values in a query. Assume I have a table with the following fields:
> uid INT,
> data BIGINT,
> hits INT

> And an unique index on (uid, data). I use libpq C API to query data from the table. The query is something like this:
> SELECT uid,data,hits FROM mytable WHERE uid=2 AND data IN (2033,2499,590,19,201,659)

> Would the speed of the query be influenced if I would sort the data? I can imagine that just querying a bunch of bigint would not make a big difference but what about several thousand of values? Would sorting them and sending the SQL query with ordered data influence the speed of the query?

It's unlikely to make enough difference to be worth the trouble.

regards, tom lane


From: Lew <noone(at)lewscanon(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question regarding indices
Date: 2010-09-11 15:08:00
Message-ID: i6g60e$345$1@news.albasani.net

On 09/11/2010 08:29 AM, Steve wrote:
> I have a small question about the order of values in a query.
> Assume I have a table with the following fields:
> uid INT,
> data BIGINT,
> hits INT
> And an unique index on (uid, data). I use libpq C API to query
> data from the table. The query is something like this:
> SELECT uid,data,hits FROM mytable WHERE uid=2
> AND data IN (2033,2499,590,19,201,659)
>
> Would the speed of the query be influenced if I would sort the data?

What do you mean by "sort the data"? Which data?

> I can imagine that just querying a bunch of bigint would not make a
> big difference but what about several thousand of values? Would sorting
> them and sending the SQL query with ordered data influence the speed of the query?

Send the query from where to where?

Are you referring to a sort of the items in the IN subselect? My guess is
that sorting that won't matter but it's only a WAG.

--
Lew


From: "Steve" <steeeeeveee(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question regarding indices
Date: 2010-09-11 17:23:32
Message-ID: 20100911172332.240150@gmx.net


-------- Original Message --------
> Date: Sat, 11 Sep 2010 11:04:16 -0400
> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> To: "Steve" <steeeeeveee(at)gmx(dot)net>
> CC: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] Question regarding indices

> "Steve" <steeeeeveee(at)gmx(dot)net> writes:
> > I have a small question about the order of values in a query. Assume I
> have a table with the following fields:
> > uid INT,
> > data BIGINT,
> > hits INT
>
> > And an unique index on (uid, data). I use libpq C API to query data from
> the table. The query is something like this:
> > SELECT uid,data,hits FROM mytable WHERE uid=2 AND data IN
> (2033,2499,590,19,201,659)
>
> > Would the speed of the query be influenced if I would sort the data? I
> can imagine that just querying a bunch of bigint would not make a big
> difference but what about several thousand of values? Would sorting them and
> sending the SQL query with ordered data influence the speed of the query?
>
> It's unlikely to make enough difference to be worth the trouble.
>
Making a quick sort is ultra easy in C. Anyway... is there a difference in the speed of the query with pre-sorted values or not? If there is one then I will go and sort the values. Right now I have a quick sort implemented, but I will probably do a lazy quick sort followed by a final insertion sort, because insertion sort is faster than quick sort on a nearly sorted dataset.

I am probably splitting hairs here about the speed, but I really want to minimize the time PostgreSQL needs to return the data. I personally am happy with the speed when using PostgreSQL, but the application I use has a MySQL driver too, and I have some users claiming that MySQL is faster than PostgreSQL, transfers less data over the wire, etc., and I want to optimize the PostgreSQL part to be on the same level as the MySQL part. So everything that helps to squeeze the last nanosecond out of the PostgreSQL part is welcome. I already switched to binary transmission in order to minimize the data sent over the wire when using PostgreSQL, and I have added a function that does more or less what the MySQL proprietary "INSERT ON DUPLICATE KEY UPDATE" does. I hate it when users compare apples with oranges, but what can I do? You cannot explain to them that PostgreSQL is different and more standards-compliant, and that the chance of losing data is lower with PostgreSQL than with everything MySQL is doing (MyISAM tables, etc.). It's pointless to explain it to them. It's like trying to explain to a mole how the sun shines.

So all I want is to explore the available capabilities of PostgreSQL to get the best possible performance out of libpq. If you have any recommendations on what I should look at to get better speed, let me know.

> regards, tom lane
>
// Steve


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Steve" <steeeeeveee(at)gmx(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question regarding indices
Date: 2010-09-11 18:22:08
Message-ID: 20432.1284229328@sss.pgh.pa.us

"Steve" <steeeeeveee(at)gmx(dot)net> writes:
>> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> It's unlikely to make enough difference to be worth the trouble.
>>
> Making a quick sort is ultra easy in C. Anyway... is there a
> difference in the speed of the query with pre-sorted values or not?
> If there is one then I will go and sort the values.

I didn't opine on whether it was "easy" or not. I said it was unlikely
to be worth the trouble. You could very well spend more time sorting
the values than you buy in whatever you might save on the server side.

Each value in the IN list is going to require a separate index probe.
The sorting might buy something in locality of reference for successive
probes, but most likely not enough to notice.

regards, tom lane
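
For readers who want to measure this themselves, here is a minimal sketch of the client-side step under discussion: sorting the BIGINT values with the standard `qsort()` and building the query text from the thread. The helper name `build_sorted_in_query` is hypothetical (it is not part of libpq), and the sketch only produces the SQL string; it does not show that sorting makes the query any faster.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort(): orders int64_t values ascending.
 * Uses comparisons instead of subtraction to avoid signed overflow. */
static int cmp_int64(const void *a, const void *b)
{
    int64_t x = *(const int64_t *)a;
    int64_t y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper (not a libpq function): sorts `values` in place and
 * returns a malloc'd query string for the given uid. The caller frees it.
 * Returns NULL if n == 0 (an empty IN list is not valid SQL) or if the
 * allocation fails. */
char *build_sorted_in_query(int uid, int64_t *values, size_t n)
{
    static const char prefix_fmt[] =
        "SELECT uid,data,hits FROM mytable WHERE uid=%d AND data IN (";

    if (n == 0)
        return NULL;
    assert(values != NULL);

    qsort(values, n, sizeof values[0], cmp_int64);

    /* Worst case: the prefix plus up to 11 characters for the uid,
     * 20 characters plus a comma per value, then ")" and the NUL. */
    size_t cap = sizeof prefix_fmt + 11 + n * 21 + 2;
    char *sql = malloc(cap);
    if (sql == NULL)
        return NULL;

    size_t len = (size_t)snprintf(sql, cap, prefix_fmt, uid);
    for (size_t i = 0; i < n; i++)
        len += (size_t)snprintf(sql + len, cap - len, "%s%" PRId64,
                                i ? "," : "", values[i]);
    snprintf(sql + len, cap - len, ")");
    return sql;
}
```

Called with uid 2 and the six values from the original post, the helper yields the sorted form of the query. Timing that string against the unsorted one (for example with EXPLAIN ANALYZE on the server, plus the client-side sort time) is the only way to tell whether the sort pays for itself.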


From: "Steve" <steeeeeveee(at)gmx(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question regarding indices
Date: 2010-09-13 00:03:06
Message-ID: 20100913000306.202150@gmx.net

-------- Original Message --------
> Date: Sat, 11 Sep 2010 11:08:00 -0400
> From: Lew <noone(at)lewscanon(dot)com>
> To: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] Question regarding indices

> On 09/11/2010 08:29 AM, Steve wrote:
> > I have a small question about the order of values in a query.
> > Assume I have a table with the following fields:
> > uid INT,
> > data BIGINT,
> > hits INT
> > And an unique index on (uid, data). I use libpq C API to query
> > data from the table. The query is something like this:
> > SELECT uid,data,hits FROM mytable WHERE uid=2
> > AND data IN (2033,2499,590,19,201,659)
> >
> > Would the speed of the query be influenced if I would sort the data?
>
> What do you mean by "sort the data"? Which data?
>
I mean sorting the values in the brackets. Instead of:
SELECT uid,data,hits FROM mytable WHERE uid=2 AND data IN (2033,2499,590,19,201,659)

I would then send this here:
SELECT uid,data,hits FROM mytable WHERE uid=2 AND data IN (19,201,590,659,2033,2499)

Of course this is a small dataset, but the query usually has thousands of elements, not just the 6 elements above.

> > I can imagine that just querying a bunch of bigint would not make a
> > big difference but what about several thousand of values? Would sorting
> > them and sending the SQL query with ordered data influence the speed of
> the query?
>
> Send the query from where to where?
>
Sending the query from my application to the PostgreSQL server.

> Are you referring to a sort of the items in the IN subselect?
>
Yes.

> My guess is
> that sorting that won't matter but it's only a WAG.
>
What is "WAG"?

> --
> Lew
>
SteveB


From: Frank Bax <fbax(at)sympatico(dot)ca>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Question regarding indices
Date: 2010-09-14 11:31:33
Message-ID: BLU0-SMTP822F4EE03686BA05451C63AC780@phx.gbl

Steve wrote:
> -------- Original Message --------
>> Date: Sat, 11 Sep 2010 11:08:00 -0400
>> From: Lew <noone(at)lewscanon(dot)com>
>> To: pgsql-sql(at)postgresql(dot)org
>> Subject: Re: [SQL] Question regarding indices
>
>> On 09/11/2010 08:29 AM, Steve wrote:
>>> I have a small question about the order of values in a query.
>>> Assume I have a table with the following fields:
>>> uid INT,
>>> data BIGINT,
>>> hits INT
>>> And an unique index on (uid, data). I use libpq C API to query
>>> data from the table. The query is something like this:
>>> SELECT uid,data,hits FROM mytable WHERE uid=2
>>> AND data IN (2033,2499,590,19,201,659)
>>>
>>> Would the speed of the query be influenced if I would sort the data?
>> What do you mean by "sort the data"? Which data?
>>
> I mean sorting the values in the brackets. Instead of:
> SELECT uid,data,hits FROM mytable WHERE uid=2 AND data IN (2033,2499,590,19,201,659)
>
> I would then send this here:
> SELECT uid,data,hits FROM mytable WHERE uid=2 AND data IN (19,201,590,659,2033,2499)
>
> Of course this is a small dataset, but the query usually has thousands of elements, not just the 6 elements above.

If there will be thousands, why not create a temp table containing these
values and then join it to the table? Might that be faster?
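
The temp-table idea could be sketched from the C side roughly as follows. This is an illustration under assumptions, not something anyone in the thread posted: the temp table name `wanted`, the constant names, and the helper `build_wanted_insert` are all hypothetical, and the code only prepares the SQL text that the application would then send through libpq. Whether it beats the IN list depends on the planner and the data, so it would need to be measured.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical statements; the table name "wanted" is an assumption.
 * ON COMMIT DROP only makes sense inside a BEGIN ... COMMIT block. */
const char CREATE_TMP_SQL[] =
    "CREATE TEMP TABLE wanted (data BIGINT) ON COMMIT DROP";
const char JOIN_SQL[] =
    "SELECT m.uid,m.data,m.hits FROM mytable m "
    "JOIN wanted w ON m.data = w.data WHERE m.uid = $1";

/* Hypothetical helper: builds one multi-row INSERT that loads all values
 * into the temp table. Returns a malloc'd string the caller frees, or
 * NULL if there are no values or the allocation fails. */
char *build_wanted_insert(const int64_t *values, size_t n)
{
    static const char prefix[] = "INSERT INTO wanted (data) VALUES ";

    if (values == NULL || n == 0)
        return NULL;

    /* Worst case per value: "(", 20 characters, ")" and a comma. */
    size_t cap = sizeof prefix + n * 23;
    char *sql = malloc(cap);
    if (sql == NULL)
        return NULL;

    size_t len = (size_t)snprintf(sql, cap, "%s", prefix);
    for (size_t i = 0; i < n; i++)
        len += (size_t)snprintf(sql + len, cap - len, "%s(%" PRId64 ")",
                                i ? "," : "", values[i]);
    assert(len < cap); /* sanity check: the estimate above was sufficient */
    return sql;
}
```

The intended order would be BEGIN, `CREATE_TMP_SQL`, the generated INSERT, `JOIN_SQL` with the uid bound as parameter `$1`, then COMMIT. For very large value sets, loading the temp table with COPY instead of INSERT is another option worth comparing.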