Re: About "Our CLUSTER implementation is pessimal" patch

Lists: pgsql-hackers
From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Cc: stark(at)enterprisedb(dot)com
Subject: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-15 11:45:24
Message-ID: 42957.48433.qm@web29016.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I read the thread "Our CLUSTER implementation is pessimal" http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .

I would like to try/integrate that patch as we use CLUSTER a lot on our system.

I was going to try to add the proper cost_index/cost_sort calls to decide which "path" should be executed, as in:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php

I don't think it will be easy without help... I'll ask here a lot I'm afraid...

About that patch:

1) would it be possible to use the tuplesort_*tupleslot set of functions instead of writing new ones for HeapTuple? That is: is it that difficult/impossible/nonsense to construct TupleTableSlot from HeapTuple and use those?

2) The patch doesn't check "HeapTupleSatisfiesVacuum" before passing it to tuplesort_putrawtuple: would it be reasonable to check the "isdead" flag before calling tuplesort_putrawtuple for each tuple?

Sorry if I talked nonsense...

Leonardo


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org, stark(at)enterprisedb(dot)com
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-15 12:05:32
Message-ID: 4B505A0C.4070403@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F wrote:
> I read the thread "Our CLUSTER implementation is pessimal" http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .
>
> I would like to try/integrate that patch as we use CLUSTER a lot on our system.

Great!

> About that patch:
>
> 1) would it be possible to use the tuplesort_*tupleslot set of functions instead of writing new ones for HeapTuple? That is: is it that difficult/impossible/nonsense to construct TupleTableSlot from HeapTuple and use those?

Yeah, I think you could do that, I agree it feels better that way.
You'll still need new copytup and comparetup functions, though, to deal
with HeapTupleHeaders instead of MinimalTuples, or modify the existing
ones to handle both. And some way to indicate that you want to preserve
the visibility information when you create the tuplesort, maybe a new
parameter to tuplesort_begin_heap().

> 2) The patch doesn't check "HeapTupleSatisfiesVacuum" before passing it to tuplesort_putrawtuple: would it be reasonable to check the "isdead" flag before calling tuplesort_putrawtuple for each tuple?

Yeah, seems reasonable, to avoid sorting dead tuples unnecessarily.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, stark(at)enterprisedb(dot)com
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-15 15:48:29
Message-ID: 907930.87593.qm@web29012.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Yeah, I think you could do that, I agree it feels better that way.
> You'll still need new copytup and comparetup functions, though, to deal
> with HeapTupleHeaders instead of MinimalTuples, or modify the existing
> ones to handle both.

You meant HeapTuple, not HeapTupleHeaders, right?

Mmh, didn't think of those two functions; I might as well start with Gregory
Stark's patch (that is: using HeapTuple)

> And some way to indicate that you want to preserve
> the visibility information when you create the tuplesort, maybe a new
> parameter to tuplesort_begin_heap().

I guess that using Gregory Stark's patch there's no need for it, since it uses
HeapTuples, right?

A patch that:

1) uses always the old CLUSTER method for non-btree indexes and for
expression indexes
2) add a whole set of new functions to tuplesort (as in Gregory Stark's patch)

would be rejected "for sure"? Or can be thought as a "better than nothing,
works in 90% cases" patch?

Leonardo


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org, stark(at)enterprisedb(dot)com
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-15 18:23:13
Message-ID: 4B50B291.30102@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F wrote:
>> Yeah, I think you could do that, I agree it feels better that way.
>> You'll still need new copytup and comparetup functions, though, to deal
>> with HeapTupleHeaders instead of MinimalTuples, or modify the existing
>> ones to handle both.
>
> You meant HeapTuple, not HeapTupleHeaders, right?

No, I did mean HeapTupleHeader. MinimalTuple struct is cut-down version
HeapTupleHeader, while HeapTuple is structure that holds a pointer to
HeapTupleHeader + some extra information. SortTuple takes the role of
HeapTUple in tuplesort.c. A bit confusing, yes.

That said, I didn't really look closely, maybe I'm missing something and
HeapTuple is in fact the right struct to pass around.

>> And some way to indicate that you want to preserve
>> the visibility information when you create the tuplesort, maybe a new
>> parameter to tuplesort_begin_heap().
>
> I guess that using Gregory Stark's patch there's no need for it, since it uses
> HeapTuples, right?

Hmm, you still need to set different comparison function in
Tuplesortstate->comparetup, so you'll still need a separate begin()
function too, or a flag to the existing one.

> A patch that:
>
> 1) uses always the old CLUSTER method for non-btree indexes and for
> expression indexes
> 2) add a whole set of new functions to tuplesort (as in Gregory Stark's patch)
>
> would be rejected "for sure"? Or can be thought as a "better than nothing,
> works in 90% cases" patch?

I'm fine with 1), though I wish we didn't have to add all that
boilerplate code 2). I guess it's not a show-stopper.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-20 17:48:00
Message-ID: 925824.32998.qm@web29018.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I read the thread "Our CLUSTER implementation is pessimal"
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .
>
> I would like to try/integrate that patch as we use CLUSTER a lot on our system.
>
> I was going to try to add the proper cost_index/cost_sort calls to decide which
> "path" should be executed, as in:
>
> http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php

I think I got something up and running to check if a table scan + sort is supposed
to be faster than an index scan for a certain CLUSTER operation.

The way I did it is (I guess...) wrong: I created the elements needed by
get_relation_info, create_seqscan_path, create_index_path, cost_sort.

It has been, obviously, a trial and error approach: I added the member values as
soon as one function call crashed... and I bet I didn't get all the corner cases.
Is there any better way of doing it?

Leonardo

(this is called in copy_heap_data to decide which path to choose:)

static bool use_index_scan(Oid tableOid, Oid indexOid)
{
RelOptInfo *rel;
PlannerInfo *root;
Query *query;
PlannerGlobal *glob;
Path *seqAndSortPath;
IndexPath *indexPath;
RangeTblEntry *rte;

rel = makeNode(RelOptInfo);
rel->reloptkind = RELOPT_BASEREL;
rel->relid = 1;
rel->rtekind = RTE_RELATION;

/* needed by get_relation_info */
glob = makeNode(PlannerGlobal);

/* needed by get_relation_info: */
query = makeNode(Query);
query->resultRelation = 0;

root = makeNode(PlannerInfo);

root->parse = query;
root->glob = glob;

get_relation_info(root, tableOid, false, rel);
seqAndSortPath = create_seqscan_path(NULL, rel);

rel->rows = rel->tuples;

rte = makeNode(RangeTblEntry);
rte->rtekind = RTE_RELATION;
rte->relid = tableOid;

root->simple_rel_array_size = 2;
root->simple_rte_array = (RangeTblEntry **)
palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *));
root->simple_rte_array[1] = rte;

root->total_table_pages = rel->pages;

indexPath = create_index_path(root, (IndexOptInfo*)(list_head(rel->indexlist)->data.ptr_value), NULL, NULL, ForwardScanDirection, NULL);
cost_sort(seqAndSortPath, root, NULL, seqAndSortPath->total_cost, rel->tuples, rel->width, -1);

return indexPath->path.total_cost < seqAndSortPath->total_cost;
}


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 08:18:01
Message-ID: 556775.51248.qm@web29015.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Anyone? I'd like some feedback before moving on to do the seq scan + sort in those
CLUSTER cases where "use_index_scan" returns false...

----- Messaggio originale -----
> Da: Leonardo F <m_lists(at)yahoo(dot)it>
> A: pgsql-hackers(at)postgresql(dot)org
> Inviato: Mer 20 gennaio 2010, 18:48:00
> Oggetto: Re: [HACKERS] About "Our CLUSTER implementation is pessimal" patch
>
> > I read the thread "Our CLUSTER implementation is pessimal"
> > http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php .
> >
> > I was going to try to add the proper cost_index/cost_sort calls to decide
> which
> > "path" should be executed, as in:
> >
> > http://archives.postgresql.org/pgsql-hackers/2008-09/msg00517.php
>
> I think I got something up and running to check if a table scan + sort is
> supposed
> to be faster than an index scan for a certain CLUSTER operation.
>
> The way I did it is (I guess...) wrong: I created the elements needed by
> get_relation_info, create_seqscan_path, create_index_path, cost_sort.
>
> It has been, obviously, a trial and error approach: I added the member values as
> soon as one function call crashed... and I bet I didn't get all the corner
> cases.
> Is there any better way of doing it?
>
> Leonardo
>
> (this is called in copy_heap_data to decide which path to choose:)
>
> static bool use_index_scan(Oid tableOid, Oid indexOid)
> {
> RelOptInfo *rel;
> PlannerInfo *root;
> Query *query;
> PlannerGlobal *glob;
> Path *seqAndSortPath;
> IndexPath *indexPath;
> RangeTblEntry *rte;
>
> rel = makeNode(RelOptInfo);
> rel->reloptkind = RELOPT_BASEREL;
> rel->relid = 1;
> rel->rtekind = RTE_RELATION;
>
> /* needed by get_relation_info */
> glob = makeNode(PlannerGlobal);
>
> /* needed by get_relation_info: */
> query = makeNode(Query);
> query->resultRelation = 0;
>
> root = makeNode(PlannerInfo);
>
> root->parse = query;
> root->glob = glob;
>
> get_relation_info(root, tableOid, false, rel);
> seqAndSortPath = create_seqscan_path(NULL, rel);
>
> rel->rows = rel->tuples;
>
> rte = makeNode(RangeTblEntry);
> rte->rtekind = RTE_RELATION;
> rte->relid = tableOid;
>
> root->simple_rel_array_size = 2;
> root->simple_rte_array = (RangeTblEntry **)
> palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *));
> root->simple_rte_array[1] = rte;
>
> root->total_table_pages = rel->pages;
>
> indexPath = create_index_path(root,
> (IndexOptInfo*)(list_head(rel->indexlist)->data.ptr_value), NULL, NULL,
> ForwardScanDirection, NULL);
> cost_sort(seqAndSortPath, root, NULL, seqAndSortPath->total_cost, rel->tuples,
> rel->width, -1);
>
> return indexPath->path.total_cost < seqAndSortPath->total_cost;
> }


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 08:34:20
Message-ID: 20100121173419.D19E.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Leonardo F <m_lists(at)yahoo(dot)it> wrote:

> Anyone? I'd like some feedback before moving on to do the seq scan + sort in those
> CLUSTER cases where "use_index_scan" returns false...

+1 for CLUSTER using sort.

I have a couple of comments for the current implementation:
* Do we need to disable sort-path for tables clustered on a gist index?
* I'd prefer to separate cost calculation routines from create_index_path()
and cost_sort(), rather than using a dummy planner.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Greg Stark <stark(at)mit(dot)edu>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 11:06:04
Message-ID: 407d949e1001210306l2e70b438j8194120896f86b22@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

one idea could be to actually prepare a query using SPI for "select * from
table order by <cols>" and then peek inside to see which plan was generated.
perhaps you could do this using the existing planner hook.

you might have to watch out for the user's rules or planner hooks (though I
don't think referential integrity triggers take any precautions about those
dangers)

greg

On 20 Jan 2010 17:48, "Leonardo F" <m_lists(at)yahoo(dot)it> wrote:

> I read the thread "Our CLUSTER implementation is pessimal" >
http://archives.postgresql.org/pgsql...
I think I got something up and running to check if a table scan + sort is
supposed
to be faster than an index scan for a certain CLUSTER operation.

The way I did it is (I guess...) wrong: I created the elements needed by
get_relation_info, create_seqscan_path, create_index_path, cost_sort.

It has been, obviously, a trial and error approach: I added the member
values as
soon as one function call crashed... and I bet I didn't get all the corner
cases.
Is there any better way of doing it?

Leonardo

(this is called in copy_heap_data to decide which path to choose:)

static bool use_index_scan(Oid tableOid, Oid indexOid)
{
RelOptInfo *rel;
PlannerInfo *root;
Query *query;
PlannerGlobal *glob;
Path *seqAndSortPath;
IndexPath *indexPath;
RangeTblEntry *rte;

rel = makeNode(RelOptInfo);
rel->reloptkind = RELOPT_BASEREL;
rel->relid = 1;
rel->rtekind = RTE_RELATION;

/* needed by get_relation_info */
glob = makeNode(PlannerGlobal);

/* needed by get_relation_info: */
query = makeNode(Query);
query->resultRelation = 0;

root = makeNode(PlannerInfo);

root->parse = query;
root->glob = glob;

get_relation_info(root, tableOid, false, rel);
seqAndSortPath = create_seqscan_path(NULL, rel);

rel->rows = rel->tuples;

rte = makeNode(RangeTblEntry);
rte->rtekind = RTE_RELATION;
rte->relid = tableOid;

root->simple_rel_array_size = 2;
root->simple_rte_array = (RangeTblEntry **)
palloc0(root->simple_rel_array_size * sizeof(RangeTblEntry *));
root->simple_rte_array[1] = rte;

root->total_table_pages = rel->pages;

indexPath = create_index_path(root,
(IndexOptInfo*)(list_head(rel->indexlist)->data.ptr_value), NULL, NULL,
ForwardScanDirection, NULL);
cost_sort(seqAndSortPath, root, NULL, seqAndSortPath->total_cost,
rel->tuples, rel->width, -1);

return indexPath->path.total_cost < seqAndSortPath->total_cost;

} -- Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org) To
make changes to you...


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 12:00:09
Message-ID: 879360.78742.qm@web29019.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> * Do we need to disable sort-path for tables clustered on a gist index?

Yes; as I said in a previous mail, only plain btree indexes (that is, not
custom expression indexes) would have that option (at least in a first
version...)

> * I'd prefer to separate cost calculation routines from create_index_path()
> and cost_sort(), rather than using a dummy planner.

How? Copy&paste (removing unnecessary code) of the existing
create_index_path() and cost_sort()?

Leonardo


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Leonardo F <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 15:30:09
Message-ID: 7942.1264087809@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> * I'd prefer to separate cost calculation routines from create_index_path()
> and cost_sort(), rather than using a dummy planner.

Don't go that way. The cost functions have enough dependencies on
low-level planner functionality that making them be standalone would be
a serious mess, both immediately and in terms of future maintenance.
(As an example, someday we'll probably get around to having cost_sort
actually look at the specific columns being sorted by, and that's
going to involve a lot of interaction with pathkeys.)

What I do think is that the quoted code snippet has no business being
outside the planner proper. It'd be better to put it in planner.c
or someplace like that.

regards, tom lane


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 15:44:19
Message-ID: 363421.156.qm@web29006.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>one idea could be to actually prepare a query using SPI for "select * from table order by <cols>" and then peek inside
> to see which plan was generated.

I like that!!!
Here's a first attempt, it looks like it's working...
(I still have to skip non-btree indexes and expression indexes, plus
add a ASC/DESC to the select)

static bool use_index_scan(Relation OldHeap, Oid indexOid)
{
HeapTuple indexTuple;
Form_pg_index indexForm;
struct _SPI_plan *spiPlan;
char st[(NAMEDATALEN+1)*INDEX_MAX_KEYS+NAMEDATALEN+100];
int i;
TupleDesc oldTupDesc;
bool retval = true;
PlannedStmt *plan;
CachedPlanSource *cps;

oldTupDesc = RelationGetDescr(OldHeap);

sprintf(st, "select * from %s order by ", OldHeap->rd_rel->relname.data);
indexTuple = SearchSysCache(INDEXRELID, ObjectIdGetDatum(indexOid),0, 0, 0);

if (!HeapTupleIsValid(indexTuple))
elog(ERROR, "cache lookup failed for index %u", indexOid);

indexForm = (Form_pg_index) GETSTRUCT(indexTuple);

for (i = 0; i < indexForm->indnatts; i++)
{
strcat(st, oldTupDesc->attrs[indexForm->indkey.values[i]-1]->attname.data);
if (i+1 < indexForm->indnatts)
{
strcat(st, ",");
}
}

ReleaseSysCache(indexTuple);

if (SPI_connect() != SPI_OK_CONNECT)
return false;

spiPlan = SPI_prepare(st, 0, NULL);
if (spiPlan == NULL)
{
SPI_finish();
return false;
}

cps = (CachedPlanSource*)(list_head(spiPlan->plancache_list)->data.ptr_value);
plan = (PlannedStmt*)(list_head(cps->plan->stmt_list)->data.ptr_value);
if (IsA(plan->planTree, Sort))
{
retval = false;
}

SPI_freeplan(spiPlan);
SPI_finish();

return retval;
}


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 16:03:55
Message-ID: 8581.1264089835@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F <m_lists(at)yahoo(dot)it> writes:
>> one idea could be to actually prepare a query using SPI for "select * from table order by <cols>" and then peek inside
>> to see which plan was generated.

> I like that!!!
> Here's a first attempt, it looks like it's working...
> (I still have to skip non-btree indexes and expression indexes, plus
> add a ASC/DESC to the select)

By the time you make this actually work in all cases, it's probably
going to be more of a mess than the other way; not to mention that it
doesn't work *at all* without violating SPI internals.

regards, tom lane


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 16:13:25
Message-ID: 366059.67064.qm@web29018.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> By the time you make this actually work in all cases, it's probably
> going to be more of a mess than the other way;

I meant to add only ASC/DESC; I would leave all other cases
(non-btrees, custom expression btrees) to use the old index-scan method.

> not to mention that it
> doesn't work *at all* without violating SPI internals.

You lost me there...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 16:19:51
Message-ID: 8918.1264090791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F <m_lists(at)yahoo(dot)it> writes:
>> By the time you make this actually work in all cases, it's probably
>> going to be more of a mess than the other way;

> I meant to add only ASC/DESC; I would leave all other cases
> (non-btrees, custom expression btrees) to use the old index-scan method.

That hardly seems acceptable.

>> not to mention that it
>> doesn't work *at all* without violating SPI internals.

> You lost me there...

You're poking into a data structure you shouldn't be poking into:

/* Plans are opaque structs for standard users of SPI */
typedef struct _SPI_plan *SPIPlanPtr;

I hardly think that keeping yourself at arm's length from the planner
by getting cozy with SPI internals is a net improvement in modularity.

regards, tom lane


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 17:00:20
Message-ID: 638245.69950.qm@web29020.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > I meant to add only ASC/DESC; I would leave all other cases
> > (non-btrees, custom expression btrees) to use the old index-scan method.
>
> That hardly seems acceptable.

Well I brought up that in an earlier post:

http://old.nabble.com/Re%3A-About-%22Our-CLUSTER-implementation-is-pessimal%22-patch-p27179107.html

I hoped that since people mostly (>95%?) use plain btree indexes,
a patch that helped CLUSTER with using such indexes would be fine
(at least at first...). I guess that a patch that deals with all other types of
indexes would be way more complicated (not at the "planning" stage,
but in the scan+sort phase)?

> I hardly think that keeping yourself at arm's length from the planner
> by getting cozy with SPI internals is a net improvement in modularity.

So you think that code snippet that I sent earlier (the function that uses
create_index_path etc) could be put in planner.c (almost) as it is? It looked
clumsy to me (I liked the SPI code better)

Leonardo


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 17:12:10
Message-ID: 9900.1264093930@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F <m_lists(at)yahoo(dot)it> writes:
> I hoped that since people mostly (>95%?) use plain btree indexes,
> a patch that helped CLUSTER with using such indexes would be fine
> (at least at first...). I guess that a patch that deals with all other types of
> indexes would be way more complicated (not at the "planning" stage,
> but in the scan+sort phase)?

Well, the expression cases would be more likely to cost more if
implemented as a sort, but that doesn't mean that a sort couldn't be a
win. Besides, even if you blow off the expression case, what about
nulls first/last, nondefault opclasses, etc?

regards, tom lane


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 18:28:34
Message-ID: 51641.59582.qm@web29009.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Well, the expression cases would be more likely to cost more if
> implemented as a sort, but that doesn't mean that a sort couldn't be a
> win. Besides, even if you blow off the expression case, what about
> nulls first/last, nondefault opclasses, etc?

Ok, let's split the problem in 2 parts:

a) "choosing the right path"
b) "using seq scan + sort in case the planner (somehow) said it's faster"

You're right: for a) nondefault opclasses and other things would make the
SPI version "wrong". So let's stick for the moment with the cost_sort
create_index_path etc calls for a). I think that the calls to create_index_path
would also deal with every other possible index type (something that's
impossible with the SPI version).

For b):

I'm using the code in

http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php

That doesn't deal with expression indexes, nor with anything that is not
a btree index. But it's much faster on what people use the most (and not
slower on anything else).

So my proposal would be: do the test seq_scan vs sort/index_scan only for
regular btree index, and integrate that test in the planner.

That doesn't mean that sometime in the future we're not going to have full
support for all index types in seq scan + sort CLUSTER... but I would like
to have something that works faster on what people use, and not slower
in the other cases without waiting ages to have the "whole" thing...


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-21 18:32:45
Message-ID: 603c8f071001211032k3168aac2w432e74ae6bda6c17@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 21, 2010 at 1:28 PM, Leonardo F <m_lists(at)yahoo(dot)it> wrote:
>> Well, the expression cases would be more likely to cost more if
>> implemented as a sort, but that doesn't mean that a sort couldn't be a
>> win.  Besides, even if you blow off the expression case, what about
>> nulls first/last, nondefault opclasses, etc?
>
>
> Ok, let's split the problem in 2 parts:
>
> a) "choosing the right path"
> b) "using seq scan + sort in case the planner (somehow) said it's faster"
>
> You're right: for a) nondefault opclasses and other things would make the
> SPI version "wrong". So let's stick for the moment with the cost_sort
> create_index_path etc calls for a). I think that the calls to create_index_path
> would also deal with every other possible index type (something that's
> impossible with the SPI version).
>
> For b):
>
> I'm using the code in
>
> http://archives.postgresql.org/pgsql-hackers/2008-08/msg01371.php
>
> That doesn't deal with expression indexes, nor with anything that is not
> a btree index. But it's much faster on what people use the most (and not
> slower on anything else).
>
> So my proposal would be: do the test seq_scan vs sort/index_scan only for
> regular btree index, and integrate that test in the planner.
>
> That doesn't mean that sometime in the future we're not going to have full
> support for all index types in seq scan + sort CLUSTER... but I would like
> to have something that works faster on what people use, and not slower
> in the other cases without waiting ages to have the "whole" thing...

Keep in mind that this patch was after the deadline for 9.0, so there
is probably not a huge rush to get this done.

...Robert


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Leonardo F <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-22 00:45:04
Message-ID: 20100122094504.E4A7.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> What I do think is that the quoted code snippet has no business being
> outside the planner proper. It'd be better to put it in planner.c
> or someplace like that.

Ah, I see. My concern was the dummy planner approach is using internal
functions of planner. It would be better if planner module exports
a cost estimate function for cluster.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-22 08:11:28
Message-ID: 788271.67623.qm@web29002.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > So my proposal would be: do the test seq_scan vs sort/index_scan only for
> > regular btree index, and integrate that test in the planner.
>
> Keep in mind that this patch was after the deadline for 9.0, so there
> is probably not a huge rush to get this done.

That's true; I'll try to get the whole thing done then...


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Leonardo F <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-22 14:19:47
Message-ID: 407d949e1001220619t58b43210tbc03c217a6d4f2e7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 21, 2010 at 4:19 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> You're poking into a data structure you shouldn't be poking into:
>
> /* Plans are opaque structs for standard users of SPI */
> typedef struct _SPI_plan *SPIPlanPtr;
>
> I hardly think that keeping yourself at arm's length from the planner
> by getting cozy with SPI internals is a net improvement in modularity.

You could do it without poking into the SPI structure. You could
define a planner_hook which calls the regular planner and then just
use SPI to drive the parser etc and invoke your planner hook. The
planner hook could inspect the plan, even modify it.

One of the plans I considered early on which I skipped on because I
thought it might get too complex was to add special syntax like
"SELECT __heaptuple__ from table ORDER BY ..." and then just run it
through the regular executor (and do some unspecified hackery around
the visibility rules).

Part of the reason I'm interested in using the regular planner as much
as possible is that if you have a partial index it could be a huge win
to use some unexpected different index to scan the part of the table
you need.

--
greg


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-22 16:54:39
Message-ID: 491989.57840.qm@web29017.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

So, if I'm not mistaken:

hash indexes -> can't be used in CLUSTER
gin indexes -> can't be used in CLUSTER

that leaves:

btree -> ok
expression btree -> I have to find a way to compute the expression for
each tuple: hints?
gist -> how can I get something "comparable" by tuplesort? Or should I rule it
out from the seq scan + sort path?

Leonardo


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-22 17:31:33
Message-ID: 3533.1264181493@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F <m_lists(at)yahoo(dot)it> writes:
> gist -> how can I get something "comparable" by tuplesort? Or should I rule it
> out from the seq scan + sort path?

Rule it out. Note you should be looking at pg_am.amcanorder, not
hardwiring knowledge of particular index types.

regards, tom lane


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-22 17:56:55
Message-ID: 2601.38715.qm@web29001.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Note you should be looking at pg_am.amcanorder, not
> hardwiring knowledge of particular index types.

Ok.

Would it make sense to use

FormIndexDatum

to get the index value to be used by tuplesort?

I'm having trouble avoiding the call to _bt_mkscankey_nodata
to get the scanKeys... that relates to btree only, I can't find
the "general" function...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-01-22 18:09:28
Message-ID: 4251.1264183768@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F <m_lists(at)yahoo(dot)it> writes:
> Would it make sense to use
> FormIndexDatum
> to get the index value to be used by tuplesort?

That would probably work. You might want to look at the code associated
with the recently-added exclusion constraint feature; that also has a
need to reproduce index entries sometimes.

regards, tom lane