Re: Enhanced containment selectivity function

Lists: pgsql-hackerspgsql-patches
From: Matteo Beccati <php(at)beccati(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Enhanced containment selectivity function
Date: 2005-08-04 14:36:54
Message-ID: 42F22806.20503@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

I've recently had problems with slow queries caused by the selectivity
of the <@ ltree operator, as you may see in my post here:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00473.php

Someone on IRC (AndrewSN if I'm not wrong) pointed out that the
restriction selectivity function for <@ is contsel, which returns a
constant value of 0.001. So I started digging in the source code trying
to understand how the default behaviour could be enhanced, and ended up
writing a little patch which adds an alternative containment selectivity
function (called "contstatsel") which is able to deliver better results.

This first version is based on the eqsel function and uses only
histogram values to calculate the selectivity and uses the 0.001
constant as a fallback.

This also made me think: is there a reason why geometric selectivity
functions return constant values rather than checking statistics for a
better result?

Attached you will find a patch suitable for current CVS HEAD. My C
skills are a bit rusty and my knowledge of pg internals are very poor,
so I'm sure it could be improved and modified to better fit the pg
coding standards.

Here are the results on a slow query:

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <@ '1041' AND t_stamp > '2005-07-01';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..553.02 rows=8 width=364) (actual
time=2.423..19787.259 rows=6785 loops=1)
-> Index Scan using gw_users_gisttree_key on gw_users
(cost=0.00..21.63 rows=5 width=156) (actual time=0.882..107.434
rows=4696 loops=1)
Index Cond: (tree <@ '1041'::ltree)
-> Index Scan using gw_batches_t_stamp_u_id_key on gw_batches
(cost=0.00..106.09 rows=15 width=212) (actual time=3.898..4.171 rows=1
loops=4696)
Index Cond: ((gw_batches.t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone) AND ("outer".u_id =
gw_batches.u_id))
Total runtime: 19805.447 ms
(6 rows)

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <<@ '1041' AND t_stamp > '2005-07-01';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=245.26..1151.80 rows=7671 width=364) (actual
time=69.562..176.966 rows=6785 loops=1)
Hash Cond: ("outer".u_id = "inner".u_id)
-> Bitmap Heap Scan on gw_batches (cost=57.74..764.39 rows=8212
width=212) (actual time=8.330..39.542 rows=7819 loops=1)
Recheck Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp
with time zone)
-> Bitmap Index Scan on gw_batches_t_stamp_u_id_key
(cost=0.00..57.74 rows=8212 width=0) (actual time=8.120..8.120 rows=7819
loops=1)
Index Cond: (t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone)
-> Hash (cost=175.79..175.79 rows=4692 width=156) (actual
time=61.046..61.046 rows=4696 loops=1)
-> Seq Scan on gw_users (cost=0.00..175.79 rows=4692
width=156) (actual time=0.083..34.200 rows=4696 loops=1)
Filter: (tree <<@ '1041'::ltree)
Total runtime: 194.621 ms
(10 rows)

The second query uses a custom <<@ operator I added to test the
alternative selectivity function:

CREATE FUNCTION contstatsel(internal, oid, internal, integer) RETURNS
double precision AS 'contstatsel' LANGUAGE internal;
CREATE OPERATOR <<@ (
LEFTARG = ltree,
LEFTARG = ltree,
PROCEDURE = ltree_risparent,
COMMUTATOR = '@>',
RESTRICT = contstatsel,
JOIN = contjoinsel
);

Of course any comments/feedback are welcome.

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

Attachment Content-Type Size
contstatsel.diff text/plain 6.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-04 15:12:58
Message-ID: 15936.1123168378@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Matteo Beccati <php(at)beccati(dot)com> writes:
> This also made me think: is there a reason why geometric selectivity
> functions return constant values rather than checking statistics for a
> better result?

Because no one's ever bothered to work on them. You should talk to
the PostGIS guys, however, because they *have* been working on real
statistics and real estimation functions for geometric types. It'd
be nice to get some of that work back-ported into the core database.

http://www.postgis.org/

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-04 15:49:55
Message-ID: 16193.1123170595@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Matteo Beccati <php(at)beccati(dot)com> writes:
> Someone on IRC (AndrewSN if I'm not wrong) pointed out that the
> restriction selectivity function for <@ is contsel, which returns a
> constant value of 0.001. So I started digging in the source code trying
> to understand how the default behaviour could be enhanced, and ended up
> writing a little patch which adds an alternative containment selectivity
> function (called "contstatsel") which is able to deliver better results.

After looking at this a little, it doesn't seem like it has much to do
with the ordinary 2-D notion of containment. In most of the core
geometric types, the "histogram" ordering is based on area, and so
testing the histogram samples against the query doesn't seem like it's
able to give very meaningful containment results --- the items shown
in the histogram could have any locations whatever.

The approach might be sensible for ltree's isparent operator --- I don't
have a very good feeling for the behavior of that operator, but it looks
like it has at least some relationship to the ordering induced by the
ltree < operator.

So my thought is that (assuming Oleg and Teodor agree this is sensible
for ltree) we should put the selectivity function into contrib/ltree,
not directly into the core. It might be best to call it something like
"parentsel", too, to avoid giving the impression that it has something
to do with 2-D containment.

Also, you should think about using the most-common-values list as well
as the histogram. I would guess that many ltree applications would have
enough duplicate entries that the MCV list represents a significant
fraction of the total population. Keep in mind when thinking about this
that the histogram describes the population of data *exclusive of the
MCV entries*.

regards, tom lane


From: Matteo Beccati <php(at)beccati(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-04 17:09:30
Message-ID: 42F24BCA.2030802@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> After looking at this a little, it doesn't seem like it has much to do
> with the ordinary 2-D notion of containment. In most of the core
> geometric types, the "histogram" ordering is based on area, and so
> testing the histogram samples against the query doesn't seem like it's
> able to give very meaningful containment results --- the items shown
> in the histogram could have any locations whatever.
>
> The approach might be sensible for ltree's isparent operator --- I don't
> have a very good feeling for the behavior of that operator, but it looks
> like it has at least some relationship to the ordering induced by the
> ltree < operator.

Actually, this was one of my doubts. The custom function seem to work
well with ltree, but this also could be dependant from the way my
dataset is organized.

> So my thought is that (assuming Oleg and Teodor agree this is sensible
> for ltree) we should put the selectivity function into contrib/ltree,
> not directly into the core. It might be best to call it something like
> "parentsel", too, to avoid giving the impression that it has something
> to do with 2-D containment.
>
> Also, you should think about using the most-common-values list as well
> as the histogram. I would guess that many ltree applications would have
> enough duplicate entries that the MCV list represents a significant
> fraction of the total population. Keep in mind when thinking about this
> that the histogram describes the population of data *exclusive of the
> MCV entries*.

I also agree that "parentsel" would better fit its purpose.

My patch was originally using MCV without good results, until I realized
that MCV was empty because the column contains unique values :)
I'll look into adding a MCV check to it.

Moving it in contrib/ltree would be more difficult to me because it
depends on other functions declared in selfuncs.c
(get_restriction_variable, etc).

Thank you for your feedback

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-04 17:15:28
Message-ID: 26303.1123175728@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Matteo Beccati <php(at)beccati(dot)com> writes:
> Moving it in contrib/ltree would be more difficult to me because it
> depends on other functions declared in selfuncs.c
> (get_restriction_variable, etc).

I'd be willing to consider exporting those functions from selfuncs.c.

regards, tom lane


From: Matteo Beccati <php(at)beccati(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-06 13:22:42
Message-ID: 42F4B9A2.4050102@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

>>Moving it in contrib/ltree would be more difficult to me because it
>>depends on other functions declared in selfuncs.c
>>(get_restriction_variable, etc).
>
> I'd be willing to consider exporting those functions from selfuncs.c.

In the meanwhile here is the latest patch which uses both mcv and
histogram values.

BTW, when restoring my test database I've found out that there were many
errors on ALTER INDEX "something" OWNER TO ... :

ERROR: "something" is not a table, view, or sequence

This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be
wrong, but I didn't get those errors a few days ago (some cvs updates ago).

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

Attachment Content-Type Size
parentsel.diff text/plain 11.5 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-13 02:32:37
Message-ID: 200508130232.j7D2Wb201076@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.2 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Matteo Beccati wrote:
> Hi,
>
> >>Moving it in contrib/ltree would be more difficult to me because it
> >>depends on other functions declared in selfuncs.c
> >>(get_restriction_variable, etc).
> >
> > I'd be willing to consider exporting those functions from selfuncs.c.
>
> In the meanwhile here is the latest patch which uses both mcv and
> histogram values.
>
>
> BTW, when restoring my test database I've found out that there were many
> errors on ALTER INDEX "something" OWNER TO ... :
>
> ERROR: "something" is not a table, view, or sequence
>
> This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be
> wrong, but I didn't get those errors a few days ago (some cvs updates ago).
>
>
> Best regards
> --
> Matteo Beccati
> http://phpadsnew.com/
> http://phppgads.com/

> Index: contrib/ltree/ltree.sql.in
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/contrib/ltree/ltree.sql.in,v
> retrieving revision 1.9
> diff -c -r1.9 ltree.sql.in
> *** contrib/ltree/ltree.sql.in 30 Mar 2004 15:45:32 -0000 1.9
> --- contrib/ltree/ltree.sql.in 6 Aug 2005 13:10:35 -0000
> ***************
> *** 230,236 ****
> RIGHTARG = ltree,
> PROCEDURE = ltree_isparent,
> COMMUTATOR = '<@',
> ! RESTRICT = contsel,
> JOIN = contjoinsel
> );
>
> --- 230,236 ----
> RIGHTARG = ltree,
> PROCEDURE = ltree_isparent,
> COMMUTATOR = '<@',
> ! RESTRICT = parentsel,
> JOIN = contjoinsel
> );
>
> ***************
> *** 248,254 ****
> RIGHTARG = ltree,
> PROCEDURE = ltree_risparent,
> COMMUTATOR = '@>',
> ! RESTRICT = contsel,
> JOIN = contjoinsel
> );
>
> --- 248,254 ----
> RIGHTARG = ltree,
> PROCEDURE = ltree_risparent,
> COMMUTATOR = '@>',
> ! RESTRICT = parentsel,
> JOIN = contjoinsel
> );
>
> Index: src/backend/utils/adt/selfuncs.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
> retrieving revision 1.187
> diff -c -r1.187 selfuncs.c
> *** src/backend/utils/adt/selfuncs.c 21 Jul 2005 04:41:43 -0000 1.187
> --- src/backend/utils/adt/selfuncs.c 6 Aug 2005 13:10:46 -0000
> ***************
> *** 1306,1311 ****
> --- 1306,1488 ----
> return (Selectivity) selec;
> }
>
> + #define DEFAULT_PARENT_SEL 0.001
> +
> + /*
> + * parentsel - Selectivity of parent relationship for ltree data types.
> + */
> + Datum
> + parentsel(PG_FUNCTION_ARGS)
> + {
> + PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0);
> + Oid operator = PG_GETARG_OID(1);
> + List *args = (List *) PG_GETARG_POINTER(2);
> + int varRelid = PG_GETARG_INT32(3);
> + VariableStatData vardata;
> + Node *other;
> + bool varonleft;
> + Datum *values;
> + int nvalues;
> + float4 *numbers;
> + int nnumbers;
> + double selec = 0.0;
> +
> + /*
> + * If expression is not variable <@ something or something <@ variable,
> + * then punt and return a default estimate.
> + */
> + if (!get_restriction_variable(root, args, varRelid,
> + &vardata, &other, &varonleft))
> + PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL);
> +
> + /*
> + * If the something is a NULL constant, assume operator is strict and
> + * return zero, ie, operator will never return TRUE.
> + */
> + if (IsA(other, Const) &&
> + ((Const *) other)->constisnull)
> + {
> + ReleaseVariableStats(vardata);
> + PG_RETURN_FLOAT8(0.0);
> + }
> +
> + if (HeapTupleIsValid(vardata.statsTuple))
> + {
> + Form_pg_statistic stats;
> + double mcvsum = 0.0;
> + double mcvsel = 0.0;
> + double hissel = 0.0;
> +
> + stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple);
> +
> + if (IsA(other, Const))
> + {
> + /* Variable is being compared to a known non-null constant */
> + Datum constval = ((Const *) other)->constvalue;
> + bool match = false;
> + int i;
> +
> + /*
> + * Is the constant "<@" to any of the column's most common
> + * values?
> + */
> + if (get_attstatsslot(vardata.statsTuple,
> + vardata.atttype, vardata.atttypmod,
> + STATISTIC_KIND_MCV, InvalidOid,
> + &values, &nvalues,
> + &numbers, &nnumbers))
> + {
> + FmgrInfo contproc;
> +
> + fmgr_info(get_opcode(operator), &contproc);
> +
> + for (i = 0; i < nvalues; i++)
> + {
> + /* be careful to apply operator right way 'round */
> + if (varonleft)
> + match = DatumGetBool(FunctionCall2(&contproc,
> + values[i],
> + constval));
> + else
> + match = DatumGetBool(FunctionCall2(&contproc,
> + constval,
> + values[i]));
> +
> + /* calculate total selectivity of all most-common-values */
> + mcvsum += numbers[i];
> +
> + /* calculate selectivity of matching most-common-values */
> + if (match)
> + mcvsel += numbers[i];
> + }
> + }
> + else
> + {
> + /* no most-common-values info available */
> + values = NULL;
> + numbers = NULL;
> + i = nvalues = nnumbers = 0;
> + }
> +
> + free_attstatsslot(vardata.atttype, values, nvalues,
> + NULL, 0);
> +
> +
> + /*
> + * Is the constant "<@" to any of the column's histogram
> + * values?
> + */
> + if (get_attstatsslot(vardata.statsTuple,
> + vardata.atttype, vardata.atttypmod,
> + STATISTIC_KIND_HISTOGRAM, InvalidOid,
> + &values, &nvalues,
> + NULL, NULL))
> + {
> + FmgrInfo contproc;
> +
> + fmgr_info(get_opcode(operator), &contproc);
> +
> + for (i = 0; i < nvalues; i++)
> + {
> + /* be careful to apply operator right way 'round */
> + if (varonleft)
> + match = DatumGetBool(FunctionCall2(&contproc,
> + values[i],
> + constval));
> + else
> + match = DatumGetBool(FunctionCall2(&contproc,
> + constval,
> + values[i]));
> + /* count matching histogram values */
> + if (match)
> + hissel++;
> + }
> +
> + if (hissel > 0.0)
> + {
> + /*
> + * some matching values found inside histogram, divide matching entries number
> + * by total histogram entries to get the histogram related selectivity
> + */
> + hissel /= nvalues;
> + }
> + }
> + else
> + {
> + /* no histogram info available */
> + values = NULL;
> + i = nvalues = 0;
> + }
> +
> + free_attstatsslot(vardata.atttype, values, nvalues,
> + NULL, 0);
> +
> +
> + /*
> + * calculate selectivity based on MCV and histogram result
> + * histogram selectivity needs to be scaled down if there are any most-common-values
> + */
> + selec = mcvsel + hissel * (1.0 - mcvsum);
> +
> + /* don't return 0.0 selectivity unless all table values are inside mcv */
> + if (selec == 0.0 && mcvsum != 1.0)
> + selec = DEFAULT_PARENT_SEL;
> + }
> + else
> + selec = DEFAULT_PARENT_SEL;
> + }
> + else
> + selec = DEFAULT_PARENT_SEL;
> +
> +
> + ReleaseVariableStats(vardata);
> +
> + /* result should be in range, but make sure... */
> + CLAMP_PROBABILITY(selec);
> +
> + PG_RETURN_FLOAT8((float8) selec);
> + }
> +
> /*
> * eqjoinsel - Join selectivity of "="
> */
> Index: src/include/catalog/pg_proc.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v
> retrieving revision 1.380
> diff -c -r1.380 pg_proc.h
> *** src/include/catalog/pg_proc.h 2 Aug 2005 16:11:57 -0000 1.380
> --- src/include/catalog/pg_proc.h 6 Aug 2005 13:10:59 -0000
> ***************
> *** 3750,3755 ****
> --- 3750,3758 ----
> DATA(insert OID = 2592 ( gist_circle_compress PGNSP PGUID 12 f f t f i 1 2281 "2281" _null_ _null_ _null_ gist_circle_compress - _null_ ));
> DESCR("GiST support");
>
> + DATA(insert OID = 2600 ( parentsel PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_ _null_ parentsel - _null_ ));
> + DESCR("enhanced restriction selectivity for ltree isparent comparison operators");
> +
>
> /*
> * Symbolic values for provolatile column: these indicate whether the result
> Index: src/include/utils/selfuncs.h
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v
> retrieving revision 1.23
> diff -c -r1.23 selfuncs.h
> *** src/include/utils/selfuncs.h 5 Jun 2005 22:32:58 -0000 1.23
> --- src/include/utils/selfuncs.h 6 Aug 2005 13:11:00 -0000
> ***************
> *** 95,100 ****
> --- 95,102 ----
> extern Datum nlikesel(PG_FUNCTION_ARGS);
> extern Datum icnlikesel(PG_FUNCTION_ARGS);
>
> + extern Datum parentsel(PG_FUNCTION_ARGS);
> +
> extern Datum eqjoinsel(PG_FUNCTION_ARGS);
> extern Datum neqjoinsel(PG_FUNCTION_ARGS);
> extern Datum scalarltjoinsel(PG_FUNCTION_ARGS);
>

>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 18:33:09
Message-ID: 200604261833.k3QIX9l14013@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Cleaned-up patch attached and applied. Catalog version updated in
separate patch.

---------------------------------------------------------------------------

Matteo Beccati wrote:
> Hi,
>
> >>Moving it in contrib/ltree would be more difficult to me because it
> >>depends on other functions declared in selfuncs.c
> >>(get_restriction_variable, etc).
> >
> > I'd be willing to consider exporting those functions from selfuncs.c.
>
> In the meanwhile here is the latest patch which uses both mcv and
> histogram values.
>
>
> BTW, when restoring my test database I've found out that there were many
> errors on ALTER INDEX "something" OWNER TO ... :
>
> ERROR: "something" is not a table, view, or sequence
>
> This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be
> wrong, but I didn't get those errors a few days ago (some cvs updates ago).
>
>
> Best regards
> --
> Matteo Beccati
> http://phpadsnew.com/
> http://phppgads.com/

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
unknown_filename text/plain 8.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 19:12:46
Message-ID: 20368.1146078766@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Cleaned-up patch attached and applied.

Please revert this patch. This has not been updated to satisfy the
previous agreement about how it should work. It is completely
inappropriate to be dropping code that's specific to one contrib module
into the core selfuncs.c file. What we had agreed to do was look at
exporting some of the currently-static functions in selfuncs.c so that
contrib modules could make use of them from outside the core --- but
this patch doesn't do that.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 20:26:53
Message-ID: 200604262026.k3QKQr301927@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Cleaned-up patch attached and applied.
>
> Please revert this patch. This has not been updated to satisfy the
> previous agreement about how it should work. It is completely
> inappropriate to be dropping code that's specific to one contrib module
> into the core selfuncs.c file. What we had agreed to do was look at
> exporting some of the currently-static functions in selfuncs.c so that
> contrib modules could make use of them from outside the core --- but
> this patch doesn't do that.

OK, reverted, but I saw it using contsel() so I figured we were allowing
it, but I see contsel() is used by our "box", so ltree was just using
something that was already there. Let me see if I can break out the new
selectivity function into /contrib.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 20:33:06
Message-ID: 20963.1146083586@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> OK, reverted, but I saw it using contsel() so I figured we were allowing
> it, but I see contsel() is used by our "box", so ltree was just using
> something that was already there. Let me see if I can break out the new
> selectivity function into /contrib.

What really needs to happen next is to think about which bits of
selfuncs.c should be exposed --- what's generally useful, and do we
think that it has an API clean/stable enough to expose? The reason I
kept all that stuff static so far was because it got whacked around
every release or two, and I didn't want to be constrained by worries
about breaking outside modules. I'd still prefer to minimize the number
of routines exposed, so some thought is needed.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 20:45:48
Message-ID: 200604262045.k3QKjmX06034@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > OK, reverted, but I saw it using contsel() so I figured we were allowing
> > it, but I see contsel() is used by our "box", so ltree was just using
> > something that was already there. Let me see if I can break out the new
> > selectivity function into /contrib.
>
> What really needs to happen next is to think about which bits of
> selfuncs.c should be exposed --- what's generally useful, and do we
> think that it has an API clean/stable enough to expose? The reason I
> kept all that stuff static so far was because it got whacked around
> every release or two, and I didn't want to be constrained by worries
> about breaking outside modules. I'd still prefer to minimize the number
> of routines exposed, so some thought is needed.

Well, the ltree routine just needs struct VariableStatData and
macro ReleaseVariableStats(), and we need to change one function from
static to global. That and some #includes, and it works. Do we want to
be more formal about it?

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 22:36:26
Message-ID: 21789.1146090986@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> What really needs to happen next is to think about which bits of
>> selfuncs.c should be exposed --- what's generally useful, and do we
>> think that it has an API clean/stable enough to expose?

> Well, the ltree routine just needs struct VariableStatData and
> macro ReleaseVariableStats(), and we need to change one function from
> static to global. That and some #includes, and it works. Do we want to
> be more formal about it?

If we're going to expose VariableStatData then I think the minimum
reasonable set of supporting functions would be examine_variable(),
get_restriction_variable(), get_join_variable(). Matteo's patch only
uses the second one but that's because he's only implementing one type
of estimator (no joinsel function for instance).

I'd be inclined to create a new header file, too, instead of cluttering
builtins.h with this stuff. utils/selfuncs.h probably.

I'm willing to work on this, but it doesn't look like you reverted the
prior patch yet?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 22:39:09
Message-ID: 200604262239.k3QMd9E19372@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> What really needs to happen next is to think about which bits of
> >> selfuncs.c should be exposed --- what's generally useful, and do we
> >> think that it has an API clean/stable enough to expose?
>
> > Well, the ltree routine just needs struct VariableStatData and
> > macro ReleaseVariableStats(), and we need to change one function from
> > static to global. That and some #includes, and it works. Do we want to
> > be more formal about it?
>
> If we're going to expose VariableStatData then I think the minimum
> reasonable set of supporting functions would be examine_variable(),
> get_restriction_variable(), get_join_variable(). Matteo's patch only
> uses the second one but that's because he's only implementing one type
> of estimator (no joinsel function for instance).
>
> I'd be inclined to create a new header file, too, instead of cluttering
> builtins.h with this stuff. utils/selfuncs.h probably.
>
> I'm willing to work on this, but it doesn't look like you reverted the
> prior patch yet?

Uh, I just moved the selectivity function over to /contrib/ltree, and
moved what I needed, so it now works. You can continue with the plan
above, or I can.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 22:46:34
Message-ID: 21905.1146091594@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> I'm willing to work on this, but it doesn't look like you reverted the
>> prior patch yet?

> Uh, I just moved the selectivity function over to /contrib/ltree, and
> moved what I needed, so it now works. You can continue with the plan
> above, or I can.

I'll see what I can do with it. I've got an evening of watching my
laptop reinstall anyway (just replaced a failing hard drive :-()

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matteo Beccati <php(at)beccati(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-04-26 22:48:45
Message-ID: 200604262248.k3QMmjk21076@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > Tom Lane wrote:
> >> I'm willing to work on this, but it doesn't look like you reverted the
> >> prior patch yet?
>
> > Uh, I just moved the selectivity function over to /contrib/ltree, and
> > moved what I needed, so it now works. You can continue with the plan
> > above, or I can.
>
> I'll see what I can do with it. I've got an evening of watching my
> laptop reinstall anyway (just replaced a failing hard drive :-()

Interesting. I took one of my laptops into service today and will get it
back tomorrow with a new empty harddrive. One big thing I have found to
help is to document all laptop changes in a backed-up server file, so
when I get a new laptop, I can just go through the list and make the
changes in 1/2 hour. I also keep installed program tarballs on the
server so those can be installed with the same versions as the other
laptops.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Matteo Beccati <php(at)beccati(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-05-12 08:02:08
Message-ID: 44644100.2090705@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

Bruce Momjian ha scritto:
> Uh, I just moved the selectivity function over to /contrib/ltree, and
> moved what I needed, so it now works. You can continue with the plan
> above, or I can.

I've just been able to install a 8.2-devel to test the ltree selectivity
improvements I suggested.

It was a big surprise having no improvements at all in the query I used
for all my previous tests, until I found out that the test against
histogram values was removed by Tom. I strongly think it should be
reintroduced: ltree columns are likely to have a unique constraint and
the current ltreeparentsel function will behave just like contsel in
these cases.

Here is the commit:

> Log Message:
> -----------
> Fix ltreeparentsel so it actually works ...
>
> Modified Files:
> --------------
> pgsql/contrib/ltree:
> ltree_op.c (r1.10 -> r1.11)
> (http://developer.postgresql.org/cvsweb.cgi/pgsql/contrib/ltree/ltree_op.c.diff?r1=1.10&r2=1.11)
>

Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-05-12 13:33:45
Message-ID: 23644.1147440825@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Matteo Beccati <php(at)beccati(dot)com> writes:
> It was a big surprise having no improvements at all in the query I used
> for all my previous tests, until I found out that the test against
> histogram values was removed by Tom. I strongly think it should be
> reintroduced: ltree columns are likely to have a unique constraint and
> the current ltreeparentsel function will behave just like contsel in
> these cases.

The histogram values seem completely meaningless in this context ---
for containment purposes, they are just ten or so randomly chosen
values. I don't believe that the estimator works better with them.
Certainly, whether the column is unique or not is totally irrelevant
to whether they are representative.

What would seem saner to me is to add a datatype-specific analyze
function that collects some statistics that are actually relevant
to containment, and then make use of those in the estimator.

regards, tom lane


From: Matteo Beccati <php(at)beccati(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Enhanced containment selectivity function
Date: 2006-05-12 13:56:03
Message-ID: 446493F3.3020503@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

> The histogram values seem completely meaningless in this context ---
> for containment purposes, they are just ten or so randomly chosen
> values. I don't believe that the estimator works better with them.
> Certainly, whether the column is unique or not is totally irrelevant
> to whether they are representative.

Right, but if the column has a high number of stats, I think that the
samples found in the histogram could put the estimator on the right way:
i.e. in my case 80% of the values have '1041' as their root leaf and
most of the values in the histogram reflect this.

You're right saying that the column uniqueness isn't relevant to the
histogram, but if the column is unique, there won't be any mcv, and the
patch becomes useless.

> What would seem saner to me is to add a datatype-specific analyze
> function that collects some statistics that are actually relevant
> to containment, and then make use of those in the estimator.

Perhaps you're right, but unfortunately it's not a thing I can do
myself, because of lack of knowledge about both pg and ltree internals :(

Best regards
--
Matteo Beccati
http://phpadsnew.com
http://phppgads.com