Re: Text <-> C string

Lists: pgsql-hackerspgsql-patches
From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Text <-> C string
Date: 2007-09-21 18:43:18
Message-ID: 37ed240d0709211143v6c55bbebx2f1ffbe354208d8c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi hackers,

I've noticed that there is a lot of code, particularly in src/backend,
that goes through the motions of making a text datum into a cstring to
perform some work on it, and likewise for making a cstring into a text
datum.

Is there not a nice macro somewhere to handle this consistently? And
if not, shouldn't there be?

I noticed a comment for StrNCpy() in src/include/c.h that seems related:

/* BTW: when you need to copy a non-null-terminated string (like a
text datum) and add a null, do not do it with StrNCpy [snip] Do it
honestly with "memcpy(dst,src,len); dst[len] = '\0'; instead."

Okay, I can see why using StrNCpy is a bad idea, but why not "Do it
honestly with TEXT_CSTRING(src, dst)", or similar?

Surely having the exact same four lines of code written out in dozens
of places is a Bad Thing, but perhaps there is some reasoning behind
this that I am missing?

Thanks for your time,
BJ


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text <-> C string
Date: 2007-09-21 20:00:06
Message-ID: 877imjrdq1.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:

> Surely having the exact same four lines of code written out in dozens
> of places is a Bad Thing, but perhaps there is some reasoning behind
> this that I am missing?

The canonical way to do it is with

DatumGetCString(DirectFunctionCall1(textout, t))

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-21 20:12:46
Message-ID: 37ed240d0709211312r11f1fb47s36d37a4b48a5f63c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 9/22/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> The canonical way to do it is with
>
> DatumGetCString(DirectFunctionCall1(textout, t))

Ah, I see. Thanks.

In that case, would it be helpful if I submitted a patch for the
various code fragments that do this locally, updating them to use
DatumGetCString?


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text <-> C string
Date: 2007-09-21 20:33:38
Message-ID: 873ax7rc65.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


"Brendan Jurd" <direvus(at)gmail(dot)com> writes:

> On 9/22/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>> The canonical way to do it is with
>>
>> DatumGetCString(DirectFunctionCall1(textout, t))
>
> Ah, I see. Thanks.
>
> In that case, would it be helpful if I submitted a patch for the
> various code fragments that do this locally, updating them to use
> DatumGetCString?

I would be interested in seeing just a list of such places if you have it
handy. I don't think we consider it wrong to violate the text data type
abstraction barrier like you describe though.

I'm interested because any such code is possibly either failing to take into
account toasted data or is unnecessarily detoasting packed varlenas.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-21 20:39:21
Message-ID: 37ed240d0709211339m1e1be9a5u7f474a5e1dcae953@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 9/22/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> The canonical way to do it is with
>
> DatumGetCString(DirectFunctionCall1(textout, t))

I just noticed a couple of macros defined in src/include/tsearch/ts_utils.h:

#define TextPGetCString(t)
DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(t)))
#define CStringGetTextP(c) DatumGetTextP(DirectFunctionCall1(textin,
CStringGetDatum(c)))

Seems these would actually be convenient in quite a lot of places in
the backend. Is there any downside to moving these two into
src/include/postgres.h?


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-21 21:02:20
Message-ID: 37ed240d0709211402j187df41fr2e42dabf6f6d94f3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Well, a couple of specific cases that I came across are
quote_identifier() in src/backend/utils/adt/quote.c, and
do_to_timestamp() in src/backend/utils/adt/formatting.c (line 3349).

I was getting a rough notion of how common the duplication was using

$ egrep -Rn -C 2 'memcpy.*VARDATA' src/backend

Not all of these are genuine duplications of textout and textin (you
have to eyeball them individually to work that out) but it's a
reasonable starting point.

The files matched under src/backend are as follows.

src/backend/libpq/be-fsstubs.c
src/backend/utils/mb/mbutils.c
src/backend/utils/adt/timestamp.c
src/backend/utils/adt/nabstime.c
src/backend/utils/adt/xml.c
src/backend/utils/adt/quote.c
src/backend/utils/adt/oracle_compat.c
src/backend/utils/adt/varchar.c
src/backend/utils/adt/ruleutils.c
src/backend/utils/adt/varlena.c
src/backend/utils/adt/tsginidx.c
src/backend/utils/adt/cash.c
src/backend/utils/adt/date.c
src/backend/utils/adt/genfile.c
src/backend/utils/adt/network.c
src/backend/utils/adt/selfuncs.c
src/backend/utils/adt/formatting.c
src/backend/utils/adt/version.c
src/backend/utils/adt/pgstatfuncs.c
src/backend/access/heap/tuptoaster.c
src/backend/access/common/heaptuple.c
src/backend/storage/large_object/inv_api.c
src/backend/executor/execQual.c
src/backend/catalog/pg_conversion.c

On 9/22/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
>
> "Brendan Jurd" <direvus(at)gmail(dot)com> writes:
>
> > On 9/22/07, Gregory Stark <stark(at)enterprisedb(dot)com> wrote:
> >> The canonical way to do it is with
> >>
> >> DatumGetCString(DirectFunctionCall1(textout, t))
> >
> > Ah, I see. Thanks.
> >
> > In that case, would it be helpful if I submitted a patch for the
> > various code fragments that do this locally, updating them to use
> > DatumGetCString?
>
> I would be interested in seeing just a list of such places if you have it
> handy. I don't think we consider it wrong to violate the text data type
> abstraction barrier like you describe though.
>
> I'm interested because any such code is possibly either failing to take into
> account toasted data or is unnecessarily detoasting packed varlenas.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-21 23:09:02
Message-ID: 27654.1190416142@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> I just noticed a couple of macros defined in src/include/tsearch/ts_utils.h:

> #define TextPGetCString(t)
> DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(t)))
> #define CStringGetTextP(c) DatumGetTextP(DirectFunctionCall1(textin,
> CStringGetDatum(c)))

> Seems these would actually be convenient in quite a lot of places in
> the backend. Is there any downside to moving these two into
> src/include/postgres.h?

I think if you look around you'll find several similar things in various
contrib modules. It would make some sense to try to unify all this.
I'm not particularly for making it macros in postgres.h though ---
that's no help if the macros require referencing stuff in builtins.h.

On grounds of code-space savings I think it might be worth making
these things be simple functions declared in builtins.h; that would
also make it much easier to change their implementations.

regards, tom lane


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-22 21:55:59
Message-ID: 37ed240d0709221455o20b681ect9def043fd665cc9c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 9/22/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> > I just noticed a couple of macros defined in src/include/tsearch/ts_utils.h:
>
> > #define TextPGetCString(t)
> > DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(t)))
> > #define CStringGetTextP(c) DatumGetTextP(DirectFunctionCall1(textin,
> > CStringGetDatum(c)))
>
> I think if you look around you'll find several similar things in various
> contrib modules. It would make some sense to try to unify all this.
> I'm not particularly for making it macros in postgres.h though ---
> that's no help if the macros require referencing stuff in builtins.h.
>
> On grounds of code-space savings I think it might be worth making
> these things be simple functions declared in builtins.h; that would
> also make it much easier to change their implementations.

You're right about finding similar things in various places. Even
varlena.c has a set of these macros (PG_TEXT_GET_STR etc), but it
doesn't look they've really been utilised.

I'm happy to take a swing at this. Declaring in builtins.h makes sense.

The thing that's got me confused at the moment is what naming
convention to use for the functions. Looking in builtins.h you might
get the impression that we use lower_underscore for functions that are
called via fmgr, UPPER_UNDERSCORE for macros and CamelCase for
ordinary internal C functions, but there are plenty of exceptions to
disprove that rule. I see camel cased macros and lowercased internal
functions. Camel cased identifiers sometimes start with uppercase,
sometimes lowercase.

So the name for the text -> cstring function could be any of:

text_cstr
text_to_cstr
textToCString
TextToCString

Is there any kind of authoritative naming convention I can refer to?

Thanks for your time,
BJ


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-22 23:45:24
Message-ID: 6194.1190504724@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> The thing that's got me confused at the moment is what naming
> convention to use for the functions.

Well, almost any convention you like has some precedent somewhere in
the PG code, given all the contributors over the years. Almost the
only thing we actively discourage is Hungarian notation, and I think
there's even some of that in some corners.

Personally I would vote against something like TextPGetCString because
it would look like one of the family of macros that are named FooGetBar.
Maybe use text_to_cstring and cstring_to_text? It's not real important
though.

regards, tom lane


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-27 13:41:52
Message-ID: 37ed240d0709270641t1b680ccw553ecfdff3efc2d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 9/22/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> On grounds of code-space savings I think it might be worth making
> these things be simple functions declared in builtins.h; that would
> also make it much easier to change their implementations.

I've noticed that this pattern isn't exclusive to the text type; other
varlena types like bytea and xmltype seem to have a common requirement
to translate to and fro C strings for various jobs.

Does it make sense to go one level lower, and make these functions
work for any varlena?

So far, I've got the following functions doing the work:

char * text_cstring(text *t)
char * text_cstring_limit(text *t, int len)
text * cstring_text(char *s)

It wouldn't be difficult at this point to make those functions
'varlena' rather than 'text', and then bytea and xmltype (and any
other future types that want to inherit from varlena) can take
advantage of them.

Thanks for your time,
BJ


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2007-09-27 13:48:54
Message-ID: 27144.1190900934@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> So far, I've got the following functions doing the work:

> char * text_cstring(text *t)
> char * text_cstring_limit(text *t, int len)
> text * cstring_text(char *s)

> It wouldn't be difficult at this point to make those functions
> 'varlena' rather than 'text', and then bytea and xmltype (and any
> other future types that want to inherit from varlena) can take
> advantage of them.

Mmm, but the conversions are generally not identical --- for instance,
bytea needs to do escaping/de-escaping, and I doubt that XML will stick
to dumb flat-string representation for long, and for that matter text
itself is likely to change someday for better locale support. Where the
representations and conversions *are* identical, one can just cast.
I'd vote for keeping the names focused on text ...

regards, tom lane


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2007-10-01 19:20:12
Message-ID: 37ed240d0710011220w5d49c3e6u89d814a9f1578df9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

As discussed on -hackers, I'm trying to get rid of some redundant code
by creating a widely useful set of functions to convert between text
and C string in the backend.

The new extern functions, declared in include/utils/builtins.h and
defined in backend/utils/adt/varlena.c, are:

char * text_cstring(const text *t)
char * text_cstring_limit(const text *t, int len)
text * cstring_text(const char *s)
text * cstring_text_limit(const char *s, int len)

Within varlena.c, the actual conversions are performed by:

char * do_text_cstring(const text *t, const int len)
text * do_cstring_text(const char *s, const int len)

These functions now do the work for the fmgr functions textin and
textout, as well as being directly accessible by backend code.

I've searched through the backend for any code which converted between
text and C string manually (with memcpy and VARDATA), replacing with
calls to one of the four new functions as appropriate.

I came across some areas which were using the same, or similar,
conversion technique on other varlena data types, such as bytea or
xmltype. In cases where the conversion was completely identical I
used the new functions. In cases with any differences (even if they
seemed minor) I played it safe and left them alone.

I'd now like to submit my work so far for review. This patch compiled
cleanly on Linux and passed all parallel regression tests. It appears
to be performance-neutral based on a few rough tests; I haven't tried
to profile the changes in detail.

There is still a lot of code out there using DirectFunctionCall1 to
call text(in|out)). I've decided to wait for some community feedback
on the patch as it stands before replacing those calls. There are a
great many, and it would be a shame to have to go through them more
than once.

I would naively expect that replacing fmgr calls with direct calls
would lead to a performance gain (no fmgr overhead), but honestly I'm
not sure whether that would actually make a difference.

Thanks for your time,
BJ

Attachment Content-Type Size
text-cstring_1.diff.gz application/x-gzip 9.8 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2007-11-03 19:47:03
Message-ID: 200711031947.lA3Jl3f00902@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Brendan Jurd wrote:
> As discussed on -hackers, I'm trying to get rid of some redundant code
> by creating a widely useful set of functions to convert between text
> and C string in the backend.
>
> The new extern functions, declared in include/utils/builtins.h and
> defined in backend/utils/adt/varlena.c, are:
>
> char * text_cstring(const text *t)
> char * text_cstring_limit(const text *t, int len)
> text * cstring_text(const char *s)
> text * cstring_text_limit(const char *s, int len)
>
> Within varlena.c, the actual conversions are performed by:
>
> char * do_text_cstring(const text *t, const int len)
> text * do_cstring_text(const char *s, const int len)
>
> These functions now do the work for the fmgr functions textin and
> textout, as well as being directly accessible by backend code.
>
> I've searched through the backend for any code which converted between
> text and C string manually (with memcpy and VARDATA), replacing with
> calls to one of the four new functions as appropriate.
>
> I came across some areas which were using the same, or similar,
> conversion technique on other varlena data types, such as bytea or
> xmltype. In cases where the conversion was completely identical I
> used the new functions. In cases with any differences (even if they
> seemed minor) I played it safe and left them alone.
>
> I'd now like to submit my work so far for review. This patch compiled
> cleanly on Linux and passed all parallel regression tests. It appears
> to be performance-neutral based on a few rough tests; I haven't tried
> to profile the changes in detail.
>
> There is still a lot of code out there using DirectFunctionCall1 to
> call text(in|out)). I've decided to wait for some community feedback
> on the patch as it stands before replacing those calls. There are a
> great many, and it would be a shame to have to go through them more
> than once.
>
> I would naively expect that replacing fmgr calls with direct calls
> would lead to a performance gain (no fmgr overhead), but honestly I'm
> not sure whether that would actually make a difference.
>
> Thanks for your time,
> BJ

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2007-11-04 21:58:08
Message-ID: 200711042158.lA4Lw8l01393@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Brendan Jurd wrote:
> As discussed on -hackers, I'm trying to get rid of some redundant code
> by creating a widely useful set of functions to convert between text
> and C string in the backend.
>
> The new extern functions, declared in include/utils/builtins.h and
> defined in backend/utils/adt/varlena.c, are:
>
> char * text_cstring(const text *t)
> char * text_cstring_limit(const text *t, int len)
> text * cstring_text(const char *s)
> text * cstring_text_limit(const char *s, int len)
>
> Within varlena.c, the actual conversions are performed by:
>
> char * do_text_cstring(const text *t, const int len)
> text * do_cstring_text(const char *s, const int len)
>
> These functions now do the work for the fmgr functions textin and
> textout, as well as being directly accessible by backend code.
>
> I've searched through the backend for any code which converted between
> text and C string manually (with memcpy and VARDATA), replacing with
> calls to one of the four new functions as appropriate.
>
> I came across some areas which were using the same, or similar,
> conversion technique on other varlena data types, such as bytea or
> xmltype. In cases where the conversion was completely identical I
> used the new functions. In cases with any differences (even if they
> seemed minor) I played it safe and left them alone.
>
> I'd now like to submit my work so far for review. This patch compiled
> cleanly on Linux and passed all parallel regression tests. It appears
> to be performance-neutral based on a few rough tests; I haven't tried
> to profile the changes in detail.
>
> There is still a lot of code out there using DirectFunctionCall1 to
> call text(in|out)). I've decided to wait for some community feedback
> on the patch as it stands before replacing those calls. There are a
> great many, and it would be a shame to have to go through them more
> than once.
>
> I would naively expect that replacing fmgr calls with direct calls
> would lead to a performance gain (no fmgr overhead), but honestly I'm
> not sure whether that would actually make a difference.
>
> Thanks for your time,
> BJ

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-19 16:51:35
Message-ID: 3611.1205945495@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> As discussed on -hackers, I'm trying to get rid of some redundant code
> by creating a widely useful set of functions to convert between text
> and C string in the backend.

> The new extern functions, declared in include/utils/builtins.h and
> defined in backend/utils/adt/varlena.c, are:

> char * text_cstring(const text *t)
> char * text_cstring_limit(const text *t, int len)
> text * cstring_text(const char *s)
> text * cstring_text_limit(const char *s, int len)

I started to look at applying this patch and immediately decided that
I didn't like these names --- it's exceeding un-obvious which direction
of conversion any one of the functions performs. Seems like every time
you wanted to call one, you'd be going back to look at the source code
to remember which to use.

What do people think of text_to_cstring? Or should we go with
TextPGetCString for consistency with the Datum-whacking macros? (I seem
to recall having argued against the latter idea, but am reconsidering.)
Or something else?

BTW, I suspect that the _limit functions are mostly useless ---
a quick look through the patch did not suggest that any of the places
using them really needed a limit. The point of, for instance,
TZ_STRLEN_MAX is to be able to use fixed-size local buffers, and
if you're going to pay for a palloc anyway then you might as well
forget it. (There might be some value in a strlcpy equivalent that
copies from a text datum into a limited-size caller-supplied buffer,
but that's not what we've got here.)

regards, tom lane


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-19 17:44:36
Message-ID: 20080319174436.GC6870@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, Mar 19, 2008 at 12:51:35PM -0400, Tom Lane wrote:
> "Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> > char * text_cstring(const text *t)
>
> What do people think of text_to_cstring?

I tend to put things the other way around in my code, i.e:

char * cstring_of_text(const text *t)

mainly because things read more easily---type definitions of new
variables are next to the first part of the word.

char * str = cstring_of_text(src_text);
vs.
char * str = text_to_cstring(src_text);

I think I got my original inspiration for doing it this way around from
the Caml language.

Sam


From: Volkan YAZICI <yazicivo(at)ttmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-19 17:55:22
Message-ID: 87y78e7g51.fsf@alamut.mobiliz.com.tr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 19 Mar 2008, Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> ...
> char * str = cstring_of_text(src_text);
> ...
>
> I think I got my original inspiration for doing it this way around from
> the Caml language.

Also, used in Common Lisp as class accessors:

char *s = cstring_of(text);
text *t = text_of(cstring);

But I'd vote for TextPGetCString style Tom suggested for the eye-habit
compatibility with the rest of the code.

Regards.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Volkan YAZICI <yazicivo(at)ttmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-20 00:10:47
Message-ID: 23162.1205971847@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Volkan YAZICI <yazicivo(at)ttmail(dot)com> writes:
> But I'd vote for TextPGetCString style Tom suggested for the eye-habit
> compatibility with the rest of the code.

If there are not additional votes, I'll go with TextPGetCString
and CStringGetTextP.

regards, tom lane


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-20 02:03:56
Message-ID: 37ed240d0803191903u6c242cc1n12f10e069223e035@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 20/03/2008, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I started to look at applying this patch and immediately decided that
> I didn't like these names --- it's exceeding un-obvious which direction
> of conversion any one of the functions performs. Seems like every time
> you wanted to call one, you'd be going back to look at the source code
> to remember which to use.
>

That's a fair criticism. I wanted to make the function names as
compact as possible, but your comment about the directionality of the
functions rings true.

> What do people think of text_to_cstring? Or should we go with
> TextPGetCString for consistency with the Datum-whacking macros? (I seem
> to recall having argued against the latter idea, but am reconsidering.)
> Or something else?
>

Your original argument against FooGetBar was that it would be *too*
consistent with the Datum macros, leading people to think that these
functions actually were macros.

As long as we don't want people getting confused about the
function/macro distinction, that argument still makes sense to me, and
I'd be more inclined towards the foo_to_bar() convention.

> BTW, I suspect that the _limit functions are mostly useless ---
> a quick look through the patch did not suggest that any of the places
> using them really needed a limit. The point of, for instance,
> TZ_STRLEN_MAX is to be able to use fixed-size local buffers, and
> if you're going to pay for a palloc anyway then you might as well
> forget it.

What about callers like dotrim() in oracle_compat.c, which only want
to copy characters from the source string up to a particular length?
Doesn't that indicate a legitimate requirement for a
cstring_to_text_limit() (the call site was palloc'ing the text value
anyway)?

On the other hand, we do have those call sites (TZ_STRLEN_MAX is a
good example) where the caller just wanted to use a local buffer. In
which case your strlcpy-equivalent function would probably be the
right thing to offer.

> (There might be some value in a strlcpy equivalent that
> copies from a text datum into a limited-size caller-supplied buffer,
> but that's not what we've got here.)
>

Perhaps we keep cstring_to_text_limit(), but make
text_to_cstring_limit() behave like strlcpy() instead?

One of the questions in the original patch submission was whether it
would be worth changing all those DirectFunctionCall(textin) and
(textout) calls to use the new functions. Is it worthwhile avoiding
the fmgr overhead?

Thanks for taking the time to review my patch and provide this
feedback. I'm happy to send in an updated version once we settle on
the naming convention for the functions.

Last time I looked, the codebase had shifted quite a bit since I
originally wrote the patch. So it probably needs some work to apply
cleanly on the latest sources anyway.

Regards,
BJ


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-20 04:43:22
Message-ID: 26079.1205988202@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> One of the questions in the original patch submission was whether it
> would be worth changing all those DirectFunctionCall(textin) and
> (textout) calls to use the new functions. Is it worthwhile avoiding
> the fmgr overhead?

I think that's worth doing just on notational clarity grounds.
The small cycle savings doesn't excite me, but understanding
DirectFunctionCall1(textin, CStringGetDatum(foo)) just involves
more different bits of trivia than cstring_to_text(foo).

> Last time I looked, the codebase had shifted quite a bit since I
> originally wrote the patch. So it probably needs some work to apply
> cleanly on the latest sources anyway.

Yeah, with wide-impact patches like this you are always going to have
that problem. One point though is that we don't have to improve every
call site at the same time. I'd be inclined to put in the new functions
and hit some representative sample of utils/adt/ files in the first
commit, and then incrementally fix other stuff.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Volkan YAZICI" <yazicivo(at)ttmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text <-> C string
Date: 2008-03-20 08:52:24
Message-ID: 87lk4d7p6f.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Volkan YAZICI <yazicivo(at)ttmail(dot)com> writes:
>> But I'd vote for TextPGetCString style Tom suggested for the eye-habit
>> compatibility with the rest of the code.
>
> If there are not additional votes, I'll go with TextPGetCString
> and CStringGetTextP.

I would have voted for text_to_cstring etc. I can see the logic for the above
but it's just such a pain to type...

Fwiw I didn't actually find text_cstring confusing because all our sql cast
functions are defined that way.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Volkan YAZICI <yazicivo(at)ttmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-20 12:36:51
Message-ID: 20080320123651.GA6235@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Gregory Stark wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Volkan YAZICI <yazicivo(at)ttmail(dot)com> writes:
> >> But I'd vote for TextPGetCString style Tom suggested for the eye-habit
> >> compatibility with the rest of the code.
> >
> > If there are not additional votes, I'll go with TextPGetCString
> > and CStringGetTextP.
>
> I would have voted for text_to_cstring etc. I can see the logic for the above
> but it's just such a pain to type...

+1 for text_to_cstring et al.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Volkan YAZICI <yazicivo(at)ttmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-20 23:56:49
Message-ID: 26245.1206057409@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Gregory Stark wrote:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> If there are not additional votes, I'll go with TextPGetCString
>>> and CStringGetTextP.
>>
>> I would have voted for text_to_cstring etc. I can see the logic for the above
>> but it's just such a pain to type...

> +1 for text_to_cstring et al.

Well, that's all right with me too.

It occurs to me that this proposal is still leaving something on the
table. Consider a SQL-callable function that takes a text argument and
wants to turn it into a C string for processing. With the proposal as
it stands you'd have to do something like

text *t = PG_GETARG_TEXT_P(0);
char *c = text_to_cstring(t);

Now you might be smart enough to optimize that to

text *t = PG_GETARG_TEXT_PP(0);
char *c = text_to_cstring(t);

which would avoid a useless copy for short-header text datums, but it's
still leaking an extra copy of the text if the input is compressed or
toasted out-of-line. I'm imagining instead

char *c = PG_GETARG_TEXT_AS_CSTRING(0);

This would expand to a call on say text_datum_to_cstring(Datum d)
which would detoast, convert, and then free the detoasted copy if
needed.

On the other hand, it could be argued that this is usually a waste of
effort. (I frequently point out to people that retail pfree's in
SQL-callable functions are often less efficient than letting the next
context reset clean up.) And one thing I don't like about this notation
is that the declared type of the local variable no longer matches up
with the SQL-level declaration of the function.

Comments anyone?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-25 16:34:43
Message-ID: 23752.1206462883@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I've been working some more on Brendan Jurd's patch to simplify text <->
C string conversions. It seems we have consensus on the names for the
base operations:

extern text *cstring_to_text(const char *s);
extern char *text_to_cstring(const text *t);

Brendan's patch also included "cstring_text_limit(const char *s, int len)"
which was defined as copying Min(len, strlen(s)) bytes. I didn't find
this to be particularly useful. In the first place, all potential
callers are passing the exact desired length, so the strlen() call is
just a waste of cycles. In the second place, at least some callers pass
text that is not embedded in a known-to-be-null-terminated string (it
could be a section of a text datum instead); which means there is a
nonzero chance of the strlen running off the end of memory and dumping
core. So I propose instead

extern text *cstring_to_text_with_len(const char *s, int len);

which just takes the given length as gospel. Brendan had also proposed
"text_to_cstring_limit(const text *t, int len)" with similar Min()
semantics, but what this was doing was replacing copies into
limited-size local buffers with a palloc. If we did that we might
as well just use text_to_cstring. What I think is more useful is
a strlcpy()-like function that copies into a caller-supplied buffer
of limited size. For lack of a better idea I propose defining it
*exactly* like strlcpy:

extern size_t textlcpy(char *dst, const text *src, size_t siz);

I've also found that there are lots and lots of places where the
text end of the conversion needs to be a Datum not a text *,
so it seems worthwhile to introduce a couple of macros to minimize
notation in that case:

#define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
#define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))

Lastly, the originally submitted text-to-something functions would
work correctly on plain and 1-byte-header datums, but not on
compressed or toasted-out-of-line datums. There are a whole lot of
places where that's not good enough. Rather than expecting the caller
to use the right detoasting macro everywhere, it seems best to make
these functions cope with any variant. That also avoids memory
leakage by allowing the intermediate copy to be pfree'd. (I had
suggested that the pfree might be pointless, but I reconsidered ---
if the text object is large enough to be compressed or toasted,
we're talking about at least several K, so it's worth not leaking.)

In short, the infrastructure I'm currently testing is the above
definitions with the attached implementation. Last call for
objections ...

regards, tom lane

/*
* cstring_to_text
*
* Create a text value from a null-terminated C string.
*
* The new text value is freshly palloc'd with a full-size VARHDR.
*/
text *
cstring_to_text(const char *s)
{
return cstring_to_text_with_len(s, strlen(s));
}

/*
* cstring_to_text_with_len
*
* Same as cstring_to_text except the caller specifies the string length;
* the string need not be null_terminated.
*/
text *
cstring_to_text_with_len(const char *s, int len)
{
text *result = (text *) palloc(len + VARHDRSZ);

SET_VARSIZE(result, len + VARHDRSZ);
memcpy(VARDATA(result), s, len);

return result;
}

/*
* text_to_cstring
*
* Create a palloc'd, null-terminated C string from a text value.
*
* We support being passed a compressed or toasted text value.
* This is a bit bogus since such values shouldn't really be referred to as
* "text *", but it seems useful for robustness. If we didn't handle that
* case here, we'd need another routine that did, anyway.
*/
char *
text_to_cstring(const text *t)
{
char *result;
text *tunpacked = pg_detoast_datum_packed((struct varlena *) t);
int len = VARSIZE_ANY_EXHDR(tunpacked);

result = (char *) palloc(len + 1);
memcpy(result, VARDATA_ANY(tunpacked), len);
result[len] = '\0';

if (tunpacked != t)
pfree(tunpacked);

return result;
}

/*
* textlcpy --- exactly like strlcpy(), except source is a text value.
*
* Copy src to string dst of size siz. At most siz-1 characters
* will be copied. Always NUL terminates (unless siz == 0).
* Returns strlen(src); if retval >= siz, truncation occurred.
*
* We support being passed a compressed or toasted text value.
* This is a bit bogus since such values shouldn't really be referred to as
* "text *", but it seems useful for robustness. If we didn't handle that
* case here, we'd need another routine that did, anyway.
*/
size_t
textlcpy(char *dst, const text *src, size_t siz)
{
text *srcunpacked = pg_detoast_datum_packed((struct varlena *) src);
size_t srclen = VARSIZE_ANY_EXHDR(srcunpacked);

if (siz > 0)
{
siz--;
if (siz >= srclen)
siz = srclen;
else /* ensure truncation is encoding-safe */
siz = pg_mbcliplen(VARDATA_ANY(srcunpacked), srclen, siz);
memcpy(dst, VARDATA_ANY(srcunpacked), siz);
dst[siz] = '\0';
}

if (srcunpacked != src)
pfree(srcunpacked);

return srclen;
}


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-25 17:33:13
Message-ID: 37ed240d0803251033h60fe993bp694dd4ff657f442a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 26/03/2008, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Brendan's patch also included "cstring_text_limit(const char *s, int len)"
> which was defined as copying Min(len, strlen(s)) bytes. I didn't find
> this to be particularly useful. In the first place, all potential
> callers are passing the exact desired length, so the strlen() call is
> just a waste of cycles. In the second place, at least some callers pass
> text that is not embedded in a known-to-be-null-terminated string (it
> could be a section of a text datum instead); which means there is a
> nonzero chance of the strlen running off the end of memory and dumping
> core. So I propose instead
>
> extern text *cstring_to_text_with_len(const char *s, int len);
>

That all makes sense to me. I think the new name is good. It's
pretty long, but I'm not seeing a shorter name that accurately
describes the function.

> which just takes the given length as gospel. Brendan had also proposed
> "text_to_cstring_limit(const text *t, int len)" with similar Min()
> semantics, but what this was doing was replacing copies into
> limited-size local buffers with a palloc. If we did that we might
> as well just use text_to_cstring. What I think is more useful is
> a strlcpy()-like function that copies into a caller-supplied buffer
> of limited size. For lack of a better idea I propose defining it
> *exactly* like strlcpy:
>
> extern size_t textlcpy(char *dst, const text *src, size_t siz);
>

I'm all for providing a function with this behaviour, but is
textlcpy() a bit ambiguous? It's not clear from the name whether the
function copies text -> text, text -> cstring or cstring -> text. In
fact, if I didn't already know better I'd probably assume that the
function copied text -> text with length, in the same way strlcpy
copies string -> string.

A text_to_cstring_with_len() or text_to_cstring_limit() might be more
to the point, and more consistent with the other functions in the
family.

On the other hand, maybe some difference in naming would help make it
obvious to callers that, unlike its siblings, textlcpy() takes the
destination string as an argument rather than returning it.
text_to_cstring_lcpy()?

> I've also found that there are lots and lots of places where the
> text end of the conversion needs to be a Datum not a text *,
> so it seems worthwhile to introduce a couple of macros to minimize
> notation in that case:
>
> #define CStringGetTextDatum(s) PointerGetDatum(cstring_to_text(s))
> #define TextDatumGetCString(d) text_to_cstring((text *) DatumGetPointer(d))
>

Yes, I recall coming across a number of sites where these macros would
come in handy.

> Lastly, the originally submitted text-to-something functions would
> work correctly on plain and 1-byte-header datums, but not on
> compressed or toasted-out-of-line datums. There are a whole lot of
> places where that's not good enough. Rather than expecting the caller
> to use the right detoasting macro everywhere, it seems best to make
> these functions cope with any variant. That also avoids memory
> leakage by allowing the intermediate copy to be pfree'd. (I had
> suggested that the pfree might be pointless, but I reconsidered ---
> if the text object is large enough to be compressed or toasted,
> we're talking about at least several K, so it's worth not leaking.)
>

Excellent. My patch didn't contemplate dealing with
compressed/toasted datums because, quite frankly, I didn't know *how*
to deal with them correctly. Much to learn about varlenas, I still
have.

Cheers,
BJ


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-25 17:45:52
Message-ID: 28064.1206467152@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> On 26/03/2008, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... What I think is more useful is
>> a strlcpy()-like function that copies into a caller-supplied buffer
>> of limited size. For lack of a better idea I propose defining it
>> *exactly* like strlcpy:
>>
>> extern size_t textlcpy(char *dst, const text *src, size_t siz);

> I'm all for providing a function with this behaviour, but is
> textlcpy() a bit ambiguous?

Fair enough, I'm not wedded to that name. Search-and-replace is
still easy enough at this point ...

> A text_to_cstring_with_len() or text_to_cstring_limit() might be more
> to the point, and more consistent with the other functions in the
> family.

Hmm. The thing that's bothering me is that the length is the size
of the *destination*, which is not like cstring_to_text_with_len,
so using a closely similar name might be confusing. Of those two
I'd go with text_to_cstring_limit. Another thought that comes to
mind is

void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)

Anyone have other ideas?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-25 18:05:57
Message-ID: 200803251805.m2PI5wu07323@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> > A text_to_cstring_with_len() or text_to_cstring_limit() might be more
> > to the point, and more consistent with the other functions in the
> > family.
>
> Hmm. The thing that's bothering me is that the length is the size
> of the *destination*, which is not like cstring_to_text_with_len,
> so using a closely similar name might be confusing. Of those two
> I'd go with text_to_cstring_limit. Another thought that comes to
> mind is
>
> void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)

I think I like buffer.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-25 18:08:19
Message-ID: 20080325180819.GE12129@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane escribió:
> "Brendan Jurd" <direvus(at)gmail(dot)com> writes:

> > A text_to_cstring_with_len() or text_to_cstring_limit() might be more
> > to the point, and more consistent with the other functions in the
> > family.
>
> Hmm. The thing that's bothering me is that the length is the size
> of the *destination*, which is not like cstring_to_text_with_len,
> so using a closely similar name might be confusing. Of those two
> I'd go with text_to_cstring_limit. Another thought that comes to
> mind is
>
> void text_to_cstring_buffer(const text *src, char *dst, size_t dst_len)

text_to_cstring_buffer seems okay. I did wonder for a bit whether it
should be

void text_to_cstring_buffer(const text *src, char *buf, size_t buf_len)

but then the src/dst pair seems better than src/buf.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Text <-> C string
Date: 2008-03-25 19:03:44
Message-ID: 162867790803251203k4177b668h53ace468f5ae4aa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

>
> extern text *cstring_to_text_with_len(const char *s, int len);

buffer_to_text ???

Regards
Pavel Stehule


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-25 22:48:04
Message-ID: 1139.1206485284@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> As discussed on -hackers, I'm trying to get rid of some redundant code
> by creating a widely useful set of functions to convert between text
> and C string in the backend.

Applied with revisions --- the functions were modified as per recent
discussion, and I fixed a lot more potential call sites.

There are no textout/textin calls left, but I may have missed some
places that were doing it the hard way with direct palloc/memcpy
manipulations. It might be worth trolling all the VARDATA() references
to see if any more are easily replaceable.

I notice in particular that xfunc.sgml contains sample C functions to
copy and concatenate text. While these aren't directly replaceable
with the new functions, I wonder whether we ought to change the examples
to make them less certain to break if we ever change text's
representation.

regards, tom lane


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-28 16:12:51
Message-ID: 37ed240d0803280912u530b6107xfdd14bcbd42a2338@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 26/03/2008, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There are no textout/textin calls left, but I may have missed some
> places that were doing it the hard way with direct palloc/memcpy
> manipulations. It might be worth trolling all the VARDATA() references
> to see if any more are easily replaceable.
>

I think there are perhaps a dozen such sites, and I hope to submit a
cleanup patch that gets rid of these soon.

I did come across one site I'm not sure about in utils/adt/xml.c:291

/*
* We need a null-terminated string to pass to parse_xml_decl(). Rather
* than make a separate copy, make the temporary result one byte bigger
* than it needs to be.
*/
result = palloc(nbytes + 1 + VARHDRSZ);
SET_VARSIZE(result, nbytes + VARHDRSZ);
memcpy(VARDATA(result), str, nbytes);
str = VARDATA(result);
str[nbytes] = '\0';

The author seemed pretty sure he didn't want to make a separate copy
of the string, so I'm thinking there's not a whole lot I can do to
improve this site.

> I notice in particular that xfunc.sgml contains sample C functions to
> copy and concatenate text. While these aren't directly replaceable
> with the new functions, I wonder whether we ought to change the examples
> to make them less certain to break if we ever change text's
> representation.
>

Yes, these sample C functions are shown in tutorial/funcs.c and
funcs_new.c as well. I agree that the examples could do with
changing, but don't have any keen insight on what to change them to.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-28 19:26:21
Message-ID: 13684.1206732381@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> On 26/03/2008, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are no textout/textin calls left, but I may have missed some
>> places that were doing it the hard way with direct palloc/memcpy
>> manipulations. It might be worth trolling all the VARDATA() references
>> to see if any more are easily replaceable.

> I think there are perhaps a dozen such sites, and I hope to submit a
> cleanup patch that gets rid of these soon.
> I did come across one site I'm not sure about in utils/adt/xml.c:291

I intentionally didn't touch xml.c, nor anyplace that is not dealing
in text, even if it happens to be binary-compatible with text.

regards, tom lane


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Text <-> C string
Date: 2008-03-28 19:40:46
Message-ID: 37ed240d0803281240g73e4433i4cb356410045f5c7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On 29/03/2008, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I intentionally didn't touch xml.c, nor anyplace that is not dealing
> in text, even if it happens to be binary-compatible with text.
>

Hmm, okay. My original submission did include a few such changes; for
example, in xml_in and xml_out_internal I saw that the conversion
between xmltype and cstring was identical to the text conversion, so I
went ahead and used the text functions. Feedback upthread suggested
that it was okay to go ahead with casting in identical cases. [1]

I saw that these changes made it into the commit, so I assumed that it
was the right call.

If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
have to revert those changes, and I'll have to seriously scale back
the cleanup patch I was about to post.

Regards,
BJ

[1] http://archives.postgresql.org/pgsql-hackers/2007-09/msg01094.php


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2008-04-16 06:52:22
Message-ID: 37ed240d0804152352r17d74dc7n33cb966291c08a3f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- -----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Mar 29, 2008 at 5:40 AM, Brendan Jurd wrote:
> On 29/03/2008, Tom Lane wrote:
> > I intentionally didn't touch xml.c, nor anyplace that is not dealing
> > in text, even if it happens to be binary-compatible with text.
> >
>
> Hmm, okay. My original submission did include a few such changes; for
> example, in xml_in and xml_out_internal I saw that the conversion
> between xmltype and cstring was identical to the text conversion, so I
> went ahead and used the text functions. Feedback upthread suggested
> that it was okay to go ahead with casting in identical cases. [1]
>
> I saw that these changes made it into the commit, so I assumed that it
> was the right call.
>
> If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
> have to revert those changes, and I'll have to seriously scale back
> the cleanup patch I was about to post.

Still not sure where we stand on the above. To cast, or not to cast?

Cheers,
BJ
- -----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBaFy5YBsbHkuyV0RAsMmAKDHaEb7aMudKJgVxcf5RRcOtAJ+bwCgivLI
5B3xJze46NGWjEyOpq/TSaY=
=BObd
- -----END PGP SIGNATURE-----

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBaIl5YBsbHkuyV0RArDDAJ0QThLXAhzy+NX+2YsF+q4z/sIy1QCeJPiW
s/rVns3FFQVP5g9DTOshDfQ=
=4Tdh
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2008-04-16 15:16:25
Message-ID: 8456.1208358985@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
>> If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
>> have to revert those changes, and I'll have to seriously scale back
>> the cleanup patch I was about to post.

> Still not sure where we stand on the above. To cast, or not to cast?

I dunno. I know there was previously some handwaving about representing
XML values in some more intelligent fashion than a plain text string,
but I have no idea if anyone is likely to do anything about it in the
foreseeable future.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2008-04-16 15:46:38
Message-ID: 48061F5E.5070809@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> "Brendan Jurd" <direvus(at)gmail(dot)com> writes:
>
>>> If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
>>> have to revert those changes, and I'll have to seriously scale back
>>> the cleanup patch I was about to post.
>>>
>
>
>> Still not sure where we stand on the above. To cast, or not to cast?
>>
>
> I dunno. I know there was previously some handwaving about representing
> XML values in some more intelligent fashion than a plain text string,
> but I have no idea if anyone is likely to do anything about it in the
> foreseeable future.
>
>
>

It seems very unlikely to me.

cheers

andrew


From: "Brendan Jurd" <direvus(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2008-04-16 21:19:55
Message-ID: 37ed240d0804161419h9d84f4bkfef2edb4966d23ef@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Apr 17, 2008 at 1:16 AM, Tom Lane wrote:
> "Brendan Jurd" writes:
>
> >> If we don't want to meddle with xmltype/bytea/VarChar at all, we'll
> >> have to revert those changes, and I'll have to seriously scale back
> >> the cleanup patch I was about to post.
>
> > Still not sure where we stand on the above. To cast, or not to cast?
>
> I dunno. I know there was previously some handwaving about representing
> XML values in some more intelligent fashion than a plain text string,
> but I have no idea if anyone is likely to do anything about it in the
> foreseeable future.
>

Well ... if somebody does want to change the representation of xml
down the road, he's going to have to touch all the sites where the
code converts to and from cstring anyway, right?

So the only real difference this will make is that, instead of having
to replace four lines of VARDATA/memcpy per site, he'll have to
replace a single function call. That doesn't seem like a negative.

With that in mind, please find attached my followup patch. It cleans
up another 21 sites manually copying between cstring and varlena, for
a net reduction of 115 lines of code.

I didn't attempt to work through every reference to VARDATA, but I did
look at every hit from a `grep -Rn VARDATA . | grep memcpy`.

All regression tests passed (contrib tests included) on gentoo.

Patch added to commitfest queue.

Cheers,
BJ
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: http://getfiregpg.org

iD8DBQFIBm105YBsbHkuyV0RAmqfAKCfyNyFdciqX4QV81sG9MhPt+KXuACfe694
3d/ICZF6yqV6K20X3TVX+So=
=CvRM
-----END PGP SIGNATURE-----

Attachment Content-Type Size
text-cstring-followup.diff.bz2 application/x-bzip2 3.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Brendan Jurd" <direvus(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Text <-> C string
Date: 2008-05-04 16:49:15
Message-ID: 22724.1209919755@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Brendan Jurd" <direvus(at)gmail(dot)com> writes:
> Well ... if somebody does want to change the representation of xml
> down the road, he's going to have to touch all the sites where the
> code converts to and from cstring anyway, right?

True.

> With that in mind, please find attached my followup patch. It cleans
> up another 21 sites manually copying between cstring and varlena, for
> a net reduction of 115 lines of code.

I applied most of this, but not the parts that were dependent on the
assumption that bytea and text are the same. That is unlikely to remain
true if we ever get around to putting locale/encoding information into
text values. Furthermore it's a horrid type pun, because bytea can
contain embedded nulls whereas cstrings can't; you were making
unwarranted assumptions about whether, say, cstring_to_text_with_len
would allow embedded nulls to go by. And lastly, quite a few of those
changes were just plain broken, eg several places in selfuncs.c where
you allowed strlen() to be applied to a "bytea converted to cstring".

regards, tom lane