Re: Sorted union

From: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>
To: "Scott Lamb" <slamb(at)slamb(dot)org>
Cc: <pgsql-performance(at)postgresql(dot)org>, "Dustin Sallings" <dustin(at)spy(dot)net>
Subject: Re: Sorted union
Date: 2005-11-03 18:21:07
Message-ID: 6EE64EF3AB31D5448D0007DD34EEB3417DD79E@Herge.rcsinc.local
Lists: pgsql-performance

> Wow. I hadn't known about generate_series, but there are a bunch of
> places I've needed it.

It's a wonder tool :).
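
For example, off the top of my head (made-up values, just a sketch), it
turns a range into a set of rows you can select from or join against:

    -- one row per integer from 1 to 5
    SELECT n FROM generate_series(1, 5) AS g(n);

    -- one row per day for a week, by adding an integer offset to a date
    SELECT DATE '2005-11-01' + n AS day
    FROM generate_series(0, 6) AS g(n);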

> But I think there is something I can do: I can just do a query of the
> transaction table sorted by start time. My graph tool can keep a

Reading the previous paragraphs I was just about to suggest this. This
is a much more elegant method...you are reaping the benefits of having
normalized your working set. You were trying to denormalize it back to
what you were used to. Yes, now you can drop your index and simplify
your queries...normalized data is always more 'natural'.

> Mind you, I still think PostgreSQL should be able to perform that
> sorted union fast. Maybe sometime I'll have enough free time to take
> my first plunge into looking at a database query planner.

I'm not so sure I agree; by using union you were basically pulling two
independent sets (even if they were from the same table) that needed to
be ordered. There is zero chance of using the index here for ordering
because you are ordering a different set than the one being indexed.
Had I not been able to talk you out of denormalizing your table, I was
going to suggest rigging up a materialized view and indexing that:

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html
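
The rough idea (sketching from memory, with made-up table and column
names) is to snapshot the flattened event stream into its own table and
index that:

    -- hypothetical schema: transactions(id, start_time, end_time)
    CREATE TABLE transaction_events AS
        SELECT id, start_time AS event_time, 'start'::text AS kind
          FROM transactions
        UNION ALL
        SELECT id, end_time AS event_time, 'end'::text AS kind
          FROM transactions;

    CREATE INDEX transaction_events_time_idx
        ON transaction_events (event_time);

    -- refresh by truncating and re-inserting (or re-creating) as needed

You pay the refresh cost up front, and the ordered read becomes a plain
index scan over one table.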

Merlin


From: Scott Lamb <slamb(at)slamb(dot)org>
To: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>, "Dustin Sallings" <dustin(at)spy(dot)net>
Subject: Re: Sorted union
Date: 2005-11-03 18:49:50
Message-ID: DE2ABD40-3B22-49EE-BC76-FF2AE68F7B4E@slamb.org
Lists: pgsql-performance

On Nov 3, 2005, at 10:21 AM, Merlin Moncure wrote:
> Reading the previous paragraphs I was just about to suggest this. This
> is a much more elegant method...you are reaping the benefits of having
> normalized your working set. You were trying to denormalize it back to
> what you were used to. Yes, now you can drop your index and simplify
> your queries...normalized data is always more 'natural'.

I'm not sure normalized is the right word. In either case, I'm
storing it in the same form. In either case, my ConcurrencyProcessor
class gets the same form. The only difference is whether the database
splits the rows or my application does.

But we're essentially agreed. This is the algorithm I'm going to try
implementing, and I think it will work out well. It also means
sending about half as much data from the database to the application.
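
Roughly (my real column names differ), instead of the two-branch union
I can just do:

    -- one row per transaction, both timestamps at once
    SELECT id, start_time, end_time
      FROM transactions
     ORDER BY start_time;

and split each row into its start and end events in the application,
rather than having the database send two rows per transaction.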

>> Mind you, I still think PostgreSQL should be able to perform that
>> sorted union fast. Maybe sometime I'll have enough free time to take
>> my first plunge into looking at a database query planner.
>
> I'm not so sure I agree; by using union you were basically pulling two
> independent sets (even if they were from the same table) that needed to
> be ordered.

Yes.

> There is zero chance of using the index here for ordering
> because you are ordering a different set than the one being indexed.

I don't think that's true. It just needs to compare the cost of
independently ordering each element of the union and then merging the
results against the cost of grabbing the whole union and then ordering
it. In this case, the former cost is about zero: it has already ordered
each branch independently, and the merge algorithm is trivial.
<http://en.wikipedia.org/wiki/Merge_algorithm>
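
Concretely, for a query shaped roughly like this (again, my real column
names differ):

    SELECT id, start_time AS t FROM transactions
    UNION ALL
    SELECT id, end_time AS t FROM transactions
    ORDER BY t;

with indexes on start_time and end_time, each branch could come back
already ordered from an index scan, and the only remaining work would
be merging the two ordered streams rather than sorting the whole union.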

Regards,
Scott

--
Scott Lamb <http://www.slamb.org/>