Re: Highly Efficient Custom Sorting

From: Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Highly Efficient Custom Sorting
Date: 2010-07-03 20:17:56
Message-ID: AANLkTiln7OejssTCZzsnbvYnXOyJL_v5CkEhZ9asHy6X@mail.gmail.com
Lists: pgsql-performance

Read RFC 2782 on random weighted load balancing of SRV records inside DNS.
That algorithm is what I need implemented, but with one extension: I have
groups of records the algorithm must be applied to, and each group is
treated separately from the others. I understand the operational
complexity of what I'm doing. It is more like N^3, or more precisely G*P*W
where G is the number of groups, P the number of priorities per group, and W
the number of different weights per priority. But, the complexity of the
algorithm means nothing in terms of performance or run time because it will
only ever deal with very small sets of records (maybe 20 rows of data,
tops). Even if the algorithm were N^4, it wouldn't matter with that few
records. But, more importantly, there are constraints on how the data is
subdivided. Primarily, G < P < W. Further, G and P are groupings which
subdivide the entire set of data and the groups do not have overlapping
data. So, maybe it's more like N^2.2 or something. But, regardless, we're
only talking about 20 rows, tops.
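
To make that concrete, the selection step within a single (group, priority)
bucket looks roughly like this in C (a simplified, untested sketch; the names
are illustrative, not from my actual code):

#include <stdlib.h>

typedef struct {
    int id;
    int weight;
    int picked;    /* set once the entry has been emitted */
} Entry;

/*
 * Pick one not-yet-picked entry from entries[0..n-1] with probability
 * proportional to its weight, mark it picked, and return its index.
 * Repeating this until the bucket is empty yields the RFC 2782-style
 * weighted ordering within one priority level.
 */
static int pick_weighted(Entry *entries, int n)
{
    int total = 0;
    int r;

    for (int i = 0; i < n; i++)
        if (!entries[i].picked)
            total += entries[i].weight;

    /* random point in [0, total); if every weight is 0, just take the
     * first remaining entry */
    r = (total > 0) ? (int) (rand() % total) : 0;

    for (int i = 0; i < n; i++)
    {
        if (entries[i].picked)
            continue;
        if (total == 0 || r < entries[i].weight)
        {
            entries[i].picked = 1;
            return i;
        }
        r -= entries[i].weight;
    }
    return -1;    /* bucket exhausted */
}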

The issue is how efficiently the languages can deal with arrays. In Perl, I
have to parse a string into an array of data, then break it up into
sub-arrays inside associative arrays just to work with the input. I also have
to splice the array to remove elements, which I don't think is very efficient.
Every approach I could come up with for removing elements involved rebuilding
the entire array. The same goes for pl/pgsql. Dealing with arrays there is
also not very efficient. I do a lot of constructing of arrays from sets of
data using myvar = array(select blah);. While pl/pgsql was considerably
faster than Perl, it cannot come close to what I did in C++ using a hash of
a hash of a linked list. The two hash tables provide my groupings and the
linked list gives me something that is highly efficient for removing
elements as I pick them.
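
Ignoring the hash tables themselves (with ~20 rows the lookup structure
hardly matters), the layout is roughly this, again as an untested sketch with
illustrative names:

#include <stddef.h>    /* NULL */

typedef struct Node {
    int id;
    int weight;
    struct Node *prev, *next;
} Node;

typedef struct {
    int   priority;
    Node *head;              /* entries sharing this (group, priority) */
} PriorityBucket;

typedef struct {
    char           *group;
    PriorityBucket *buckets; /* one bucket per priority within the group */
    int             nbuckets;
} Group;

/* Removing a chosen entry is a constant-time unlink; nothing gets copied
 * or re-packed. */
static void unlink_node(PriorityBucket *b, Node *n)
{
    if (n->prev)
        n->prev->next = n->next;
    else
        b->head = n->next;
    if (n->next)
        n->next->prev = n->prev;
    n->prev = n->next = NULL;
}

That constant-time removal is exactly what the splice and array rebuilds in
Perl and pl/pgsql end up paying for on every pick.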

I've looked through the documentation on how to re-write this in C, but I
cannot seem to find good documentation on working with the input array
(which is an array of a complex type). I also don't see good documentation
for working with the complex type. I found stuff that talks about
constructing a complex type in C and returning it. However, I'm not sure how
to take an input complex type and deconstruct it into something I can work
with in C. Also, the memory context management stuff is not entirely clear.
Specifically, how do I go about preserving the pointers to the data that I
allocate in the multi-call memory context so that they still point to the data
on the next call to the function for the next result row? Am I supposed to
set up some global variables to do that, or am I supposed to take a
different approach? If I need to use global variables, then how do I deal
with concurrency?
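
From the set-returning-function examples in the docs, the closest I have
gotten is the pattern below. It is an untested sketch that only shows the
plumbing (deconstructing the input array of glbtype and keeping state across
calls in user_fctx rather than globals), not the actual selection logic, so
please correct me if this is the wrong approach:

#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"
#include "access/htup_details.h"
#include "executor/executor.h"     /* GetAttributeByName */
#include "utils/array.h"
#include "utils/lsyscache.h"       /* get_typlenbyvalalign */

PG_MODULE_MAGIC;

/* State that must survive from one call to the next; it lives in
 * funcctx->user_fctx rather than in global variables, so there is no
 * concurrency problem. */
typedef struct
{
    int    nelems;
    Datum *elems;     /* the individual glbtype composites */
    bool  *nulls;
} lb_state;

PG_FUNCTION_INFO_V1(grouped_random_weighted_lb);

Datum
grouped_random_weighted_lb(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;
    lb_state        *state;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldctx;
        ArrayType    *input;
        int16         elmlen;
        bool          elmbyval;
        char          elmalign;
        TupleDesc     tupdesc;

        funcctx = SRF_FIRSTCALL_INIT();

        /* Everything needed across calls is allocated in the multi-call
         * context; pointers into it stay valid until SRF_RETURN_DONE. */
        oldctx = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        /* _COPY so the detoasted array itself lives in this context too */
        input = PG_GETARG_ARRAYTYPE_P_COPY(0);

        state = (lb_state *) palloc0(sizeof(lb_state));
        get_typlenbyvalalign(ARR_ELEMTYPE(input),
                             &elmlen, &elmbyval, &elmalign);
        deconstruct_array(input, ARR_ELEMTYPE(input),
                          elmlen, elmbyval, elmalign,
                          &state->elems, &state->nulls, &state->nelems);

        /* glbtype_result rowtype for building the output tuples */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "return type must be a composite type");
        funcctx->tuple_desc = BlessTupleDesc(tupdesc);

        funcctx->user_fctx = state;
        funcctx->max_calls = state->nelems;

        MemoryContextSwitchTo(oldctx);
    }

    funcctx = SRF_PERCALL_SETUP();
    state = (lb_state *) funcctx->user_fctx;

    if (funcctx->call_cntr < funcctx->max_calls)
    {
        /* the real code would pick the next row per the weighted
         * algorithm; this sketch just walks the input in order */
        HeapTupleHeader rec =
            DatumGetHeapTupleHeader(state->elems[funcctx->call_cntr]);
        bool      isnull;
        Datum     values[4];
        bool      vnulls[4] = {false, false, false, false};
        HeapTuple tuple;

        values[0] = GetAttributeByName(rec, "id", &isnull);
        values[1] = GetAttributeByName(rec, "priority", &isnull);
        values[2] = GetAttributeByName(rec, "weight", &isnull);
        values[3] = Int64GetDatum((int64) funcctx->call_cntr + 1); /* "order" */

        tuple = heap_form_tuple(funcctx->tuple_desc, values, vnulls);
        SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
    }

    SRF_RETURN_DONE(funcctx);
}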

On Sat, Jul 3, 2010 at 2:08 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Fri, Jul 2, 2010 at 11:17 PM, Eliot Gable
> <egable+pgsql-performance(at)gmail(dot)com>
> wrote:
> > Well, I re-wrote the algorithm in Perl. However, it did not solve the
> speed
> > issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was
> > getting before (15.2 of which is sorting). Here is the Perl code on the
> > sorting. I won't post the pl/pgsql code, because this is far more clear
> (in
> > my opinion) on what the algorithm does:
> > DROP TYPE IF EXISTS glbtype CASCADE;
> > CREATE TYPE glbtype AS (
> > id INTEGER,
> > "group" TEXT,
> > priority INTEGER,
> > weight INTEGER
> > );
> > DROP TYPE IF EXISTS glbtype_result CASCADE;
> > CREATE TYPE glbtype_result AS (
> > id INTEGER,
> > priority INTEGER,
> > weight INTEGER,
> > "order" BIGINT
> > );
> > CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS
> SETOF
> > glbtype_result AS
>
> ok, I didn't take the time to read your implementation and completely
> understand it, but it looks like you're looking at a N^2 sorting at
> best.
>
> You probably want to do something like this (it might not be quite
> right, you need to explain what each of your input array fields is
> supposed to represent):
> CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
> glbtype_result AS
> $$
> with g as (select unnest($1) as t)
> select ((t).id, (t).priority, (t).weight, 0)::glbtype_result
> from g order by (t)."group", (t).priority, random() * (t).weight;
> $$ language sql;
>
> (not sure what "order" is, is that the rownum, can't that just be
> inferred from the array position?)
>
> merlin
>

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero
