Re: [PATCHES] Eliminate more detoast copies for packed varlenas

Lists: pgsql-hackerspgsql-patches
From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: PostgreSQL-development Hackers <pgsql-patches(at)postgresql(dot)org>
Subject: Eliminate more detoast copies for packed varlenas
Date: 2007-09-21 16:47:14
Message-ID: 876424ufsd.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Ok, this removes what should be most if not all of the call sites where we're
detoasting text or byteas. In particular it gets all the regexp/like functions
and all the trim/pad functions. It also gets hashtext and hash_any.

$ zcat packed-varlena-efficiency_v0.patch.gz | diffstat
backend/access/hash/hashfunc.c | 12 !!
backend/utils/adt/like.c | 80 !!!!!!!!!!!!!!!!!!!
backend/utils/adt/oracle_compat.c | 157 !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
backend/utils/adt/regexp.c | 119 !!!!!!!!!!!!!!!!!!!!!!!!!!!!
include/fmgr.h | 1
5 files changed, 5 insertions(+), 364 modifications(!)

Attachment Content-Type Size
packed-varlena-efficiency_v0.patch.gz application/octet-stream 6.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL-development Hackers <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Eliminate more detoast copies for packed varlenas
Date: 2007-09-21 22:53:56
Message-ID: 27477.1190415236@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Ok, this removes what should be most if not all of the call sites where we're
> detoasting text or byteas. In particular it gets all the regexp/like functions
> and all the trim/pad functions. It also gets hashtext and hash_any.

Applied with some fixes --- you'd missed like_match.c, which doubtless
explains Guillame's complaint that it didn't work ...

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-09-23 12:44:04
Message-ID: 87ejgpv9ez.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> Ok, this removes what should be most if not all of the call sites where we're
>> detoasting text or byteas. In particular it gets all the regexp/like functions
>> and all the trim/pad functions. It also gets hashtext and hash_any.
>
> Applied with some fixes --- you'd missed like_match.c, which doubtless
> explains Guillame's complaint that it didn't work ...

Strange. It passed all regression tests for me and it seems like this is
something that would have been caught even in single-byte mode by a simple
test. It seems to me that like_match.c only used for SIMILAR is that right?
That would explain it as there don't appear to be any tests of SIMILAR.

Separately:

> Although I'd misdiagnosed the reason for the recent failures on buildfarm
> member grebe, I see no reason to revert the 1-byte-header-friendly changes I
> made in varlena.c. Instead, tweak the code a little bit to get more
> advantage out of that.

This may have been a misdiagnosis of the buildfarm failures but it looks like
a correct bug fix to me. That is, it looks like regexp_split_to_array() was
capable of passing a packed varlena to setup_regexp_matches which wasn't
expecting it. But this I don't understand why it wouldn't cause regression
failures and indeed when I tested it with my build it didn't cause any
problems.

This all brings up the question of what other files should be considered for
fixing. There is the possibility that some of the other sites could turn out
to be performance critical for a given application. In particular I'm
primarily concerned with calls which could be invoked during a data load or
index build.

Of the following list it seems to me the calls in adt/acl.c are probably
interesting to fix. Especially since it seems we could just replace all those
text* parameters with Datum parameters since they're only going to be passed
to textin anyways.

After that the tsearch and xml data types are probably interesting, and the
various data type input functions and contrib gist/gin operator classes.

I'm fine doing the drudge work but wanted to check before I do it that I'm not
doing something we don't think is worth doing at this point in time.

src/backend/access/transam/xlog.c:3
src/backend/catalog/pg_conversion.c:2
src/backend/commands/sequence.c:1
src/backend/libpq/be-fsstubs.c:2
src/backend/tsearch/dict.c:3
src/backend/tsearch/to_tsany.c:6
src/backend/tsearch/wparser.c:6
src/backend/utils/adt/acl.c:61
src/backend/utils/adt/char.c:1
src/backend/utils/adt/date.c:3
src/backend/utils/adt/dbsize.c:2
src/backend/utils/adt/encode.c:1
src/backend/utils/adt/formatting.c:14
src/backend/utils/adt/genfile.c:3
src/backend/utils/adt/not_in.c:2
src/backend/utils/adt/quote.c:2
src/backend/utils/adt/regproc.c:1
src/backend/utils/adt/ruleutils.c:6
src/backend/utils/adt/tid.c:1
src/backend/utils/adt/timestamp.c:8
src/backend/utils/adt/tsquery_rewrite.c:1
src/backend/utils/adt/tsvector_op.c:3
src/backend/utils/adt/xml.c:24
src/backend/utils/mb/mbutils.c:1
src/tutorial/funcs_new.c:3

contrib/adminpack/adminpack.c:6
contrib/chkpass/chkpass.c:2
contrib/dblink/dblink.c:41
contrib/fuzzystrmatch/dmetaphone.c:2
contrib/fuzzystrmatch/fuzzystrmatch.c:6
contrib/hstore/hstore_gin.c:1
contrib/hstore/hstore_gist.c:1
contrib/hstore/hstore_op.c:6
contrib/intarray/_int_op.c:1
contrib/ltree/ltree_op.c:3
contrib/pageinspect/btreefuncs.c:3
contrib/pageinspect/rawpage.c:1
contrib/pg_trgm/trgm_gin.c:2
contrib/pg_trgm/trgm_gist.c:1
contrib/pg_trgm/trgm_op.c:3
contrib/pgcrypto/pgcrypto.c:10
contrib/pgcrypto/pgp-pgsql.c:1
contrib/pgrowlocks/pgrowlocks.c:1
contrib/pgstattuple/pgstatindex.c:2
contrib/pgstattuple/pgstattuple.c:1
contrib/sslinfo/sslinfo.c:2
contrib/tablefunc/tablefunc.c:14
contrib/tsearch2/dict.c:3
contrib/tsearch2/dict_ex.c:1
contrib/tsearch2/dict_ispell.c:1
contrib/tsearch2/dict_snowball.c:3
contrib/tsearch2/dict_syn.c:1
contrib/tsearch2/dict_thesaurus.c:1
contrib/tsearch2/query.c:4
contrib/tsearch2/query_rewrite.c:1
contrib/tsearch2/ts_cfg.c:1
contrib/tsearch2/ts_stat.c:2
contrib/tsearch2/tsvector.c:2
contrib/tsearch2/wparser.c:9
contrib/uuid-ossp/uuid-ossp.c:2
contrib/xml2/xpath.c:25
contrib/xml2/xslt_proc.c:3

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-09-23 13:09:50
Message-ID: 46F6659E.80501@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
>
>> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>>
>>> Ok, this removes what should be most if not all of the call sites where we're
>>> detoasting text or byteas. In particular it gets all the regexp/like functions
>>> and all the trim/pad functions. It also gets hashtext and hash_any.
>>>
>> Applied with some fixes --- you'd missed like_match.c, which doubtless
>> explains Guillame's complaint that it didn't work ...
>>
>
> Strange. It passed all regression tests for me and it seems like this is
> something that would have been caught even in single-byte mode by a simple
> test. It seems to me that like_match.c only used for SIMILAR is that right?
> That would explain it as there don't appear to be any tests of SIMILAR.
>
>

No. like_match.c contains the template for all the various incarnations
of LIKE and ILIKE functions. It is included multiple times with
different sets of #defines in like.c to create those functions
(currently there are 4 of them). It also supplies the template for the
like_escape functions, and this is where the macros are used. Those
functions are apparently only called if there is an explicit ESCAPE
clause. Some of our regression tests do have this, so I'm not sure what
happened.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-09-23 15:38:46
Message-ID: 15738.1190561926@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> This may have been a misdiagnosis of the buildfarm failures but it looks like
> a correct bug fix to me. That is, it looks like regexp_split_to_array() was
> capable of passing a packed varlena to setup_regexp_matches which wasn't
> expecting it. But this I don't understand why it wouldn't cause regression
> failures and indeed when I tested it with my build it didn't cause any
> problems.

The particular regression tests we have for those functions seem to pass
constants to them, not table columns, and so they don't see packed inputs.

(It might be interesting to make textin produce a packed result when
possible, just to see what breaks; but I would be afraid to try to do
that for production...)

> This all brings up the question of what other files should be considered for
> fixing.

I'm very much against such a wholesale edit as you seem to have in mind
here. We already had some destabilization from the limited patch that
went in; now when we're trying to get to beta is not the time for more.
Maybe at the beginning of 8.4 devel cycle would be a reasonable time
to consider touching a lot of files.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-09-23 19:36:29
Message-ID: 878x6xuqbm.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>
> (It might be interesting to make textin produce a packed result when
> possible, just to see what breaks; but I would be afraid to try to do
> that for production...)
>
>> This all brings up the question of what other files should be considered for
>> fixing.
>
> I'm very much against such a wholesale edit as you seem to have in mind
> here. We already had some destabilization from the limited patch that
> went in; now when we're trying to get to beta is not the time for more.
> Maybe at the beginning of 8.4 devel cycle would be a reasonable time
> to consider touching a lot of files.

Well I did expect that sort of concern if I went ahead and did all of them, or
nearly all of them. That's why I'm asking if any of the list seem like they
might be important enough to do now.

For 8.4 I'm starting to think it would make sense to make the distinction
between a "real" varlena and a possibly-unaligned pointer so text* wouldn't be
an ambiguous type which might not be what it appears to be.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-09-24 14:15:01
Message-ID: 87abrcf8uy.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> (It might be interesting to make textin produce a packed result when
> possible, just to see what breaks; but I would be afraid to try to do
> that for production...)

Reassuringly all checks pass with a hack like that in place. (attached)

I think the right approach here is to define a new type text_packed * (which
would just be a char* or varattrib_1b* or something like that). Then
PG_GETARG_*_PP would return one of these new pointers.

This does leave us with warnings whenever an old-style function calls a
new-style function but I think there would be relatively few of those since
we'll tackle the higher traffic areas first which will be the lower level
functions.

The benefit is that it will give us a warning if we try to pass a pointer from
a new-style function to an old-style function which isn't prepared to receive
it.

Attachment Content-Type Size
debug-packed-varlena_v0.patch text/x-diff 832 bytes

From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Eliminate more detoast copies for packed varlenas
Date: 2007-09-27 12:53:06
Message-ID: 37ed240d0709270553u20e5cd3fp9827d665bb387fd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 9/22/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> Ok, this removes what should be most if not all of the call sites where we're
> detoasting text or byteas. In particular it gets all the regexp/like functions
> and all the trim/pad functions. It also gets hashtext and hash_any.

Looks like there's some more of this in src/tutorial/funcs.c and funcs_new.c.

On a related note, while I was trawling through header files trying to
wrap my head around all this toast and varlena business, I found the
following comment, in fmgr.h and reiterated in postgres.h:

<>
WARNING: It is only safe to use PG_DETOAST_DATUM_UNPACKED() and
VARDATA_ANY() if you really don't care about the alignment.
</>

Shouldn't this be PG_DETOAST_DATUM_PACKED()? I'm emboldened by the
fact that there is no macro called PG_TOAST_DATUM_UNPACKED defined
anywhere in postgres.

Patch attached, in case I've got the right idea.

Regards,
BJ

Attachment Content-Type Size
packed-comment.diff text/plain 1.8 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Eliminate more detoast copies for packed varlenas
Date: 2007-09-27 21:04:32
Message-ID: 14816.1190927072@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> On 9/22/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>> Ok, this removes what should be most if not all of the call sites where we're
>> detoasting text or byteas. In particular it gets all the regexp/like functions
>> and all the trim/pad functions. It also gets hashtext and hash_any.

> Looks like there's some more of this in src/tutorial/funcs.c and funcs_new.c.

Well, those are just tutorial code, so I'd vote for keeping it simple.

> On a related note, while I was trawling through header files trying to
> wrap my head around all this toast and varlena business, I found the
> following comment, in fmgr.h and reiterated in postgres.h:
> WARNING: It is only safe to use PG_DETOAST_DATUM_UNPACKED() and
> VARDATA_ANY() if you really don't care about the alignment.
> Shouldn't this be PG_DETOAST_DATUM_PACKED()?

Yup, good catch. In the context of fmgr.h, it seems like the comment
is more sensible if it refers to the underlying function
pg_detoast_datum_packed (which is what the preceding paragraph is
speaking of), so I made it say that instead.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-10-11 13:42:47
Message-ID: 87myupydgo.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Gregory Stark" <stark(at)enterprisedb(dot)com> writes:

> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
>> (It might be interesting to make textin produce a packed result when
>> possible, just to see what breaks; but I would be afraid to try to do
>> that for production...)
>
> Reassuringly all checks pass with a hack like that in place. (attached)

For the record I've been doing some more testing and found one place that
could be a problem down the road. I'm not sure why it didn't show up
previously. In selfuncs.c we use VARDATA/VARSIZE on data that is taken from
parser Const nodes and from the histogram arrays without detoasting them.

Currently this is safe as array elements are not packed and parser nodes
contain values read using textin and never stored in a tuple. But down the
road I expect we'll want to pack array element so this code would need to
detoast the elements or prepare to handle packed elements.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-10-11 20:03:36
Message-ID: 5324.1192133016@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> For the record I've been doing some more testing and found one place that
> could be a problem down the road. I'm not sure why it didn't show up
> previously. In selfuncs.c we use VARDATA/VARSIZE on data that is taken from
> parser Const nodes and from the histogram arrays without detoasting them.

> Currently this is safe as array elements are not packed and parser nodes
> contain values read using textin and never stored in a tuple. But down the
> road I expect we'll want to pack array element so this code would need to
> detoast the elements or prepare to handle packed elements.

Hmmm ... I think this should be fixed now, actually. I'm far from
convinced that a Const could never contain a toasted datum. Consider
constant-folding in the planner --- it just stuffs the result of a
function into a Const node.

In fact, it seems there's a different risk here: if such a datum were
toasted out-of-line, the reference in a cached plan might live longer
than the underlying toast-table data. Maybe we need a forcible detoast
in evaluate_function().

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-10-11 20:50:45
Message-ID: 5986.1192135845@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I wrote:
> In fact, it seems there's a different risk here: if such a datum were
> toasted out-of-line, the reference in a cached plan might live longer
> than the underlying toast-table data. Maybe we need a forcible detoast
> in evaluate_function().

Sure enough, it seems we do. The attached script fails in every release
back to 7.3. It's a rather contrived case, because a function marked
immutable probably shouldn't be reading from a table at all, especially
not one you are likely to change or drop. That's probably why we've not
heard reports of this before. But it's definitely a bug.

regards, tom lane

create table big(f1 text);
insert into big values(repeat('xyzzy',100000));

create or replace function getbig() returns text as
'select f1 from big' language sql immutable;

create or replace function usebig(text) returns bool as '
begin return $1 ~ ''xyzzy''; end
' language plpgsql;

prepare foo as select usebig(getbig()) from int4_tbl;

execute foo;

drop table big;

execute foo;


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2007-11-03 22:01:16
Message-ID: 200711032201.lA3M1Gb26872@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> >> Ok, this removes what should be most if not all of the call sites where we're
> >> detoasting text or byteas. In particular it gets all the regexp/like functions
> >> and all the trim/pad functions. It also gets hashtext and hash_any.
> >
> > Applied with some fixes --- you'd missed like_match.c, which doubtless
> > explains Guillame's complaint that it didn't work ...
>
> Strange. It passed all regression tests for me and it seems like this is
> something that would have been caught even in single-byte mode by a simple
> test. It seems to me that like_match.c only used for SIMILAR is that right?
> That would explain it as there don't appear to be any tests of SIMILAR.
>
> Separately:
>
> > Although I'd misdiagnosed the reason for the recent failures on buildfarm
> > member grebe, I see no reason to revert the 1-byte-header-friendly changes I
> > made in varlena.c. Instead, tweak the code a little bit to get more
> > advantage out of that.
>
> This may have been a misdiagnosis of the buildfarm failures but it looks like
> a correct bug fix to me. That is, it looks like regexp_split_to_array() was
> capable of passing a packed varlena to setup_regexp_matches which wasn't
> expecting it. But this I don't understand why it wouldn't cause regression
> failures and indeed when I tested it with my build it didn't cause any
> problems.
>
>
> This all brings up the question of what other files should be considered for
> fixing. There is the possibility that some of the other sites could turn out
> to be performance critical for a given application. In particular I'm
> primarily concerned with calls which could be invoked during a data load or
> index build.
>
> Of the following list it seems to me the calls in adt/acl.c are probably
> interesting to fix. Especially since it seems we could just replace all those
> text* parameters with Datum parameters since they're only going to be passed
> to textin anyways.
>
> After that the tsearch and xml data types are probably interesting, and the
> various data type input functions and contrib gist/gin operator classes.
>
> I'm fine doing the drudge work but wanted to check before I do it that I'm not
> doing something we don't think is worth doing at this point in time.
>
> src/backend/access/transam/xlog.c:3
> src/backend/catalog/pg_conversion.c:2
> src/backend/commands/sequence.c:1
> src/backend/libpq/be-fsstubs.c:2
> src/backend/tsearch/dict.c:3
> src/backend/tsearch/to_tsany.c:6
> src/backend/tsearch/wparser.c:6
> src/backend/utils/adt/acl.c:61
> src/backend/utils/adt/char.c:1
> src/backend/utils/adt/date.c:3
> src/backend/utils/adt/dbsize.c:2
> src/backend/utils/adt/encode.c:1
> src/backend/utils/adt/formatting.c:14
> src/backend/utils/adt/genfile.c:3
> src/backend/utils/adt/not_in.c:2
> src/backend/utils/adt/quote.c:2
> src/backend/utils/adt/regproc.c:1
> src/backend/utils/adt/ruleutils.c:6
> src/backend/utils/adt/tid.c:1
> src/backend/utils/adt/timestamp.c:8
> src/backend/utils/adt/tsquery_rewrite.c:1
> src/backend/utils/adt/tsvector_op.c:3
> src/backend/utils/adt/xml.c:24
> src/backend/utils/mb/mbutils.c:1
> src/tutorial/funcs_new.c:3
>
> contrib/adminpack/adminpack.c:6
> contrib/chkpass/chkpass.c:2
> contrib/dblink/dblink.c:41
> contrib/fuzzystrmatch/dmetaphone.c:2
> contrib/fuzzystrmatch/fuzzystrmatch.c:6
> contrib/hstore/hstore_gin.c:1
> contrib/hstore/hstore_gist.c:1
> contrib/hstore/hstore_op.c:6
> contrib/intarray/_int_op.c:1
> contrib/ltree/ltree_op.c:3
> contrib/pageinspect/btreefuncs.c:3
> contrib/pageinspect/rawpage.c:1
> contrib/pg_trgm/trgm_gin.c:2
> contrib/pg_trgm/trgm_gist.c:1
> contrib/pg_trgm/trgm_op.c:3
> contrib/pgcrypto/pgcrypto.c:10
> contrib/pgcrypto/pgp-pgsql.c:1
> contrib/pgrowlocks/pgrowlocks.c:1
> contrib/pgstattuple/pgstatindex.c:2
> contrib/pgstattuple/pgstattuple.c:1
> contrib/sslinfo/sslinfo.c:2
> contrib/tablefunc/tablefunc.c:14
> contrib/tsearch2/dict.c:3
> contrib/tsearch2/dict_ex.c:1
> contrib/tsearch2/dict_ispell.c:1
> contrib/tsearch2/dict_snowball.c:3
> contrib/tsearch2/dict_syn.c:1
> contrib/tsearch2/dict_thesaurus.c:1
> contrib/tsearch2/query.c:4
> contrib/tsearch2/query_rewrite.c:1
> contrib/tsearch2/ts_cfg.c:1
> contrib/tsearch2/ts_stat.c:2
> contrib/tsearch2/tsvector.c:2
> contrib/tsearch2/wparser.c:9
> contrib/uuid-ossp/uuid-ossp.c:2
> contrib/xml2/xpath.c:25
> contrib/xml2/xslt_proc.c:3
>
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Eliminate more detoast copies for packed varlenas
Date: 2008-03-24 18:24:21
Message-ID: 200803241824.m2OIOLs10022@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Added to TODO:

* Research reducing deTOASTing in more places

http://archives.postgresql.org/pgsql-hackers/2007-09/msg00895.php

---------------------------------------------------------------------------

Gregory Stark wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> >> Ok, this removes what should be most if not all of the call sites where we're
> >> detoasting text or byteas. In particular it gets all the regexp/like functions
> >> and all the trim/pad functions. It also gets hashtext and hash_any.
> >
> > Applied with some fixes --- you'd missed like_match.c, which doubtless
> > explains Guillame's complaint that it didn't work ...
>
> Strange. It passed all regression tests for me and it seems like this is
> something that would have been caught even in single-byte mode by a simple
> test. It seems to me that like_match.c only used for SIMILAR is that right?
> That would explain it as there don't appear to be any tests of SIMILAR.
>
> Separately:
>
> > Although I'd misdiagnosed the reason for the recent failures on buildfarm
> > member grebe, I see no reason to revert the 1-byte-header-friendly changes I
> > made in varlena.c. Instead, tweak the code a little bit to get more
> > advantage out of that.
>
> This may have been a misdiagnosis of the buildfarm failures but it looks like
> a correct bug fix to me. That is, it looks like regexp_split_to_array() was
> capable of passing a packed varlena to setup_regexp_matches which wasn't
> expecting it. But this I don't understand why it wouldn't cause regression
> failures and indeed when I tested it with my build it didn't cause any
> problems.
>
>
> This all brings up the question of what other files should be considered for
> fixing. There is the possibility that some of the other sites could turn out
> to be performance critical for a given application. In particular I'm
> primarily concerned with calls which could be invoked during a data load or
> index build.
>
> Of the following list it seems to me the calls in adt/acl.c are probably
> interesting to fix. Especially since it seems we could just replace all those
> text* parameters with Datum parameters since they're only going to be passed
> to textin anyways.
>
> After that the tsearch and xml data types are probably interesting, and the
> various data type input functions and contrib gist/gin operator classes.
>
> I'm fine doing the drudge work but wanted to check before I do it that I'm not
> doing something we don't think is worth doing at this point in time.
>
> src/backend/access/transam/xlog.c:3
> src/backend/catalog/pg_conversion.c:2
> src/backend/commands/sequence.c:1
> src/backend/libpq/be-fsstubs.c:2
> src/backend/tsearch/dict.c:3
> src/backend/tsearch/to_tsany.c:6
> src/backend/tsearch/wparser.c:6
> src/backend/utils/adt/acl.c:61
> src/backend/utils/adt/char.c:1
> src/backend/utils/adt/date.c:3
> src/backend/utils/adt/dbsize.c:2
> src/backend/utils/adt/encode.c:1
> src/backend/utils/adt/formatting.c:14
> src/backend/utils/adt/genfile.c:3
> src/backend/utils/adt/not_in.c:2
> src/backend/utils/adt/quote.c:2
> src/backend/utils/adt/regproc.c:1
> src/backend/utils/adt/ruleutils.c:6
> src/backend/utils/adt/tid.c:1
> src/backend/utils/adt/timestamp.c:8
> src/backend/utils/adt/tsquery_rewrite.c:1
> src/backend/utils/adt/tsvector_op.c:3
> src/backend/utils/adt/xml.c:24
> src/backend/utils/mb/mbutils.c:1
> src/tutorial/funcs_new.c:3
>
> contrib/adminpack/adminpack.c:6
> contrib/chkpass/chkpass.c:2
> contrib/dblink/dblink.c:41
> contrib/fuzzystrmatch/dmetaphone.c:2
> contrib/fuzzystrmatch/fuzzystrmatch.c:6
> contrib/hstore/hstore_gin.c:1
> contrib/hstore/hstore_gist.c:1
> contrib/hstore/hstore_op.c:6
> contrib/intarray/_int_op.c:1
> contrib/ltree/ltree_op.c:3
> contrib/pageinspect/btreefuncs.c:3
> contrib/pageinspect/rawpage.c:1
> contrib/pg_trgm/trgm_gin.c:2
> contrib/pg_trgm/trgm_gist.c:1
> contrib/pg_trgm/trgm_op.c:3
> contrib/pgcrypto/pgcrypto.c:10
> contrib/pgcrypto/pgp-pgsql.c:1
> contrib/pgrowlocks/pgrowlocks.c:1
> contrib/pgstattuple/pgstatindex.c:2
> contrib/pgstattuple/pgstattuple.c:1
> contrib/sslinfo/sslinfo.c:2
> contrib/tablefunc/tablefunc.c:14
> contrib/tsearch2/dict.c:3
> contrib/tsearch2/dict_ex.c:1
> contrib/tsearch2/dict_ispell.c:1
> contrib/tsearch2/dict_snowball.c:3
> contrib/tsearch2/dict_syn.c:1
> contrib/tsearch2/dict_thesaurus.c:1
> contrib/tsearch2/query.c:4
> contrib/tsearch2/query_rewrite.c:1
> contrib/tsearch2/ts_cfg.c:1
> contrib/tsearch2/ts_stat.c:2
> contrib/tsearch2/tsvector.c:2
> contrib/tsearch2/wparser.c:9
> contrib/uuid-ossp/uuid-ossp.c:2
> contrib/xml2/xpath.c:25
> contrib/xml2/xslt_proc.c:3
>
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +