Re: Query only slow on first run
The query's spending nearly all its time in the scan of "posts", and
I'm wondering why --- doesn't seem like it should take 6400msec to fetch
646 rows, unless perhaps the data is just horribly misordered relative
to the index. Which may in fact be the case ...
Yes, they probably are. I use the random_number column in order to
receive a semi random sample subset from the large amount of rows. The
technique is described in [1]. This subset is later used for some
statistical investigation, but this is somewhat irrelevant here. In
order to receive the sample fast, I have made an index on the
random_number column.
what exactly is that
"random_number" column
A random float that is initialized when the row is created and never
modified afterwards. The physical row ordering will clearly not match
the random_number ordering. However, other queries uses a row ordering
by the primary key so I don't think it would make much sense to make the
index on random_number a clustering index just in order to speed up this
single query.
and why are you desirous of ordering by it?
In order to simulate a random pick of K rows. See [1].
For that matter, if it is what it sounds like, why is it sane to group
by it? You'll probably always get groups of one row ...
For each random_number, another table (question_tags) holds zero or more
rows satisfying a number of constraints. I need to count(*) the number
of corresponding question_tag rows for each random_number.
We have primarily two tables of interest here: questions (~100k rows)
and posts (~400k rows). Each post refers to a question, but only the
"posts" rows for which the corresponding "question.status = 1" are
relevant. This reduces the number of relevant question rows to about
10k. Within the post rows corresponding to these 10k questions I would
like to pick a random sample of size K.
[1] http://archives.postgresql.org/pgsql-general/2007-10/msg01240.php
Home |
Main Index |
Thread Index