Re: I: About "Our CLUSTER implementation is pessimal" patch

From: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: I: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-10-04 20:47:30
Message-ID: 863800.83599.qm@web29017.mail.ird.yahoo.com
Lists: pgsql-hackers

> It sounds like the costing model might need a bit more work before we commit
> this.

I reran the simple SQL tests I posted a while ago, and I still get the same
ratios. I tested the applied patch on a dual Opteron + disk array Solaris
machine.

I really don't get how a laptop hard drive can be faster at reading data with
random seeks (which the original cluster method requires) than seq scan + sort
for the 5M-row test case. The same goes for the "cluster vs bloat" test: seq
scan + sort is faster on my machine.

I've just noticed that Josh used shared_buffers = 16MB for the "cluster vs
bloat" test. I'm using a much higher shared_buffers (something like 200MB, I
think), since I thought that would be a more appropriate value for tables this
big. Maybe that's what makes the difference?

Can someone else test the patch?

Also: I don't have deep knowledge of how PostgreSQL deletes rows, but I
thought that something like:

DELETE FROM mybloat WHERE RANDOM() < 0.9;

would only delete data, not index entries. So the patch should perform even
better in this case (as it does, in fact, on my test machine), because:

- the original cluster method would read the whole index and fetch only the
  rows that are still alive
- the new method would read the table using a seq scan and sort the few rows
  still alive in memory

But, as I said, maybe I'm getting this part wrong...
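
The two points above can be put into a rough back-of-the-envelope model. The
sketch below is only an illustration of that reasoning, not the planner's
actual costing code: the function names, the rows-per-page figures, the
comparison cost, and the cache_hit_ratio parameter are all assumptions made up
for this example. The two page costs simply reuse PostgreSQL's default
seq_page_cost and random_page_cost values.

```python
import math

# Illustrative constants; none of these are measurements from the tests above.
SEQ_PAGE_COST = 1.0        # same value as PostgreSQL's default seq_page_cost
RANDOM_PAGE_COST = 4.0     # same value as PostgreSQL's default random_page_cost
CPU_COMPARE_COST = 0.005   # assumed relative cost of one sort comparison


def index_scan_cost(total_rows, live_fraction,
                    index_entries_per_page=300, cache_hit_ratio=0.0):
    """Old method as described above: read the whole index, fetch live rows."""
    index_pages = math.ceil(total_rows / index_entries_per_page)
    live_rows = round(total_rows * live_fraction)
    # Assumption: each live row needs one random heap read unless it is cached.
    heap_cost = live_rows * (1.0 - cache_hit_ratio) * RANDOM_PAGE_COST
    return index_pages * SEQ_PAGE_COST + heap_cost


def seqscan_sort_cost(total_rows, live_fraction, rows_per_page=100):
    """New method as described above: seq scan the heap, sort live rows in memory."""
    heap_pages = math.ceil(total_rows / rows_per_page)
    live_rows = round(total_rows * live_fraction)
    # n * log2(n) comparisons for the in-memory sort of the surviving rows.
    sort_cost = live_rows * math.log2(max(live_rows, 2)) * CPU_COMPARE_COST
    return heap_pages * SEQ_PAGE_COST + sort_cost


if __name__ == "__main__":
    rows, live = 5_000_000, 0.1   # 5M rows, about 90% deleted
    print("index scan, cold cache:", index_scan_cost(rows, live))
    print("index scan, 99% cached:", index_scan_cost(rows, live, cache_hit_ratio=0.99))
    print("seq scan + sort:       ", seqscan_sort_cost(rows, live))
```

With these made-up numbers, seq scan + sort comes out cheaper when every heap
fetch is a random read, and the index scan comes out cheaper once almost all
heap fetches are served from cache. That is at least consistent with the idea
that caching could explain the different results, but it proves nothing about
the real tests.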
