Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

Lists: pgsql-hackers
From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?
Date: 2008-10-12 12:44:45
Message-ID: 05A4CFF2-E348-4967-A769-25F8F0AE998E@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

so I wonder, quite few times ppl asked me about my-word database TOP %
PERCENT (and I guess ms-word db has it too).
Now that postgres has limit(subselect) - postgresql can do the same
thing. But, using a count(*) in subselects isn't very efficient. The
backend gets information from limit X, so I figure it shouldn't be a
very big deal to add LIMIT %d PERCENT , and pass it on as percentage,
instead of just number of rows.
I can give it a go and try to implement it myself, but wanted to ask
here first for opinion on syntax (I have no idea what would be right
syntax by sql standards), and how keen would you guys be on accepting
such patch.

thanks.


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Grzegorz Jaskiewicz" <gj(at)pointblue(dot)com(dot)pl>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?
Date: 2008-10-12 12:54:03
Message-ID: e08cc0400810120554q38ef1769wf41c7ddb3d28dccf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/10/12 Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>:
> so I wonder, quite few times ppl asked me about my-word database TOP %
> PERCENT (and I guess ms-word db has it too).
> Now that postgres has limit(subselect) - postgresql can do the same thing.
> But, using a count(*) in subselects isn't very efficient. The backend gets
> information from limit X, so I figure it shouldn't be a very big deal to add
> LIMIT %d PERCENT , and pass it on as percentage, instead of just number of
> rows.
> I can give it a go and try to implement it myself, but wanted to ask here
> first for opinion on syntax (I have no idea what would be right syntax by
> sql standards), and how keen would you guys be on accepting such patch.
>
> thanks.
>
>

I'm afraid you will not succeed it because PostgreSQL doesn't know how
many rows are totally returned until it returns the last row, which is
as ineficient as count(*) as you point out.

Regards,

--
Hitoshi Harada


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?
Date: 2008-10-12 13:07:14
Message-ID: B119A552-BB7C-4FE9-8EDE-7D4884BC603D@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

that's a shame.
I figure, with the "WITH ... AS .." you can kind of speed up counts,
just by doing it once - now. But still, it would be great to be able
to use PERCENT, against estimate.
After all, imagine a table FOO with 5 rows, and something like this:

WITH c AS (select count(*)/4 as n from FOO) select * from FOO limit
(select n from c) offset (select n*X from c);

now X is for every 25%, but you wan't get to see last row if you use
offset 75 percent (X=3). Which kind of puts me off the idea of using
PERCENT anyway, because it isn't quite accurate.

So thanks for answer, I'll pass on this one. Hopefully there would be
another chance for me to write some patch in future.

ta.


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Grzegorz Jaskiewicz" <gj(at)pointblue(dot)com(dot)pl>
Cc: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?
Date: 2008-10-12 13:46:11
Message-ID: 603c8f070810120646i7fbe3f24x295516476ca41f1f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wonder if this could be implemented using the window-function
infrastructure...

...Robert

On Sun, Oct 12, 2008 at 9:07 AM, Grzegorz Jaskiewicz
<gj(at)pointblue(dot)com(dot)pl> wrote:
> that's a shame.
> I figure, with the "WITH ... AS .." you can kind of speed up counts, just by
> doing it once - now. But still, it would be great to be able to use PERCENT,
> against estimate.
> After all, imagine a table FOO with 5 rows, and something like this:
>
> WITH c AS (select count(*)/4 as n from FOO) select * from FOO limit (select
> n from c) offset (select n*X from c);
>
> now X is for every 25%, but you wan't get to see last row if you use offset
> 75 percent (X=3). Which kind of puts me off the idea of using PERCENT
> anyway, because it isn't quite accurate.
>
> So thanks for answer, I'll pass on this one. Hopefully there would be
> another chance for me to write some patch in future.
>
> ta.
>
>


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Grzegorz Jaskiewicz" <gj(at)pointblue(dot)com(dot)pl>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?
Date: 2008-10-12 15:22:33
Message-ID: e08cc0400810120822y7ba25d68h7f83ab00517746c0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/10/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
> I wonder if this could be implemented using the window-function
> infrastructure...
>

Yeah, actually window functions I am working now has percent_rank() or
something he wants. That is better than WITH clauses but still we must
see the last row before determine the percent. Better than count(*) of
course, but I'm not sure it meets his needs.

Regards,

--
Hitoshi Harada


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?
Date: 2008-10-12 23:14:14
Message-ID: 3CF9CDC2-F186-42E6-A24A-A8EAD5668C99@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2008-10-12, at 16:22, Hitoshi Harada wrote:

> 2008/10/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> I wonder if this could be implemented using the window-function
>> infrastructure...
>>
>
> Yeah, actually window functions I am working now has percent_rank() or
> something he wants. That is better than WITH clauses but still we must
> see the last row before determine the percent. Better than count(*) of
> course, but I'm not sure it meets his needs.
>
excuse my ignorance, apparently I don't recall that patches discussion
in here. If it pops out, and gets into cvs, can someone remind me of
that - please ?
Also, will that patch also be able to speed up count(*) , or is it for
different purpose ? Sorry if I am repeating the subject, if so -
please just respond in private.

thanks folks.


From: "Hitoshi Harada" <umi(dot)tanuki(at)gmail(dot)com>
To: "Grzegorz Jaskiewicz" <gj(at)pointblue(dot)com(dot)pl>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?
Date: 2008-10-13 05:06:51
Message-ID: e08cc0400810122206s2e3241feu25d2cb65795c0cf7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2008/10/13 Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>:
>
> On 2008-10-12, at 16:22, Hitoshi Harada wrote:
>
>> 2008/10/12 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>>
>>> I wonder if this could be implemented using the window-function
>>> infrastructure...
>>>
>>
>> Yeah, actually window functions I am working now has percent_rank() or
>> something he wants. That is better than WITH clauses but still we must
>> see the last row before determine the percent. Better than count(*) of
>> course, but I'm not sure it meets his needs.
>>
> excuse my ignorance, apparently I don't recall that patches discussion in
> here. If it pops out, and gets into cvs, can someone remind me of that -
> please ?

You can track pgsql news on David Fetter's blog:
http://people.planetpostgresql.org/dfetter/

> Also, will that patch also be able to speed up count(*) , or is it for
> different purpose ? Sorry if I am repeating the subject, if so - please just
> respond in private.

I guess cume_dist() will do for you:

SELECT id, cume_dist
FROM(
SELECT id, cume_dist() OVER (ORDER BY id)
FROM foo
)s
WHERE cume_dist < 0.3

will produce a result with id of 30% ascending order. I am working on
this window functions infrastructure to get into 8.4 but we're still
not sure if it will.

You can google "window functions" or my design docs on
http://umitanuki.net/pgsql/wfv06/design.html to see what it is for.

Regards,

--
Hitoshi Harada