Lists: | pgsql-bugspgsql-general |
---|
From: | "Michael Enke" <michael(dot)enke(at)wincor-nixdorf(dot)com> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | BUG #2574: C function: arg TEXT data corrupt |
Date: | 2006-08-14 13:53:34 |
Message-ID: | 200608141353.k7EDrYPi038307@wwwmaster.postgresql.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-general |
The following bug has been logged online:
Bug reference: 2574
Logged by: Michael Enke
Email address: michael(dot)enke(at)wincor-nixdorf(dot)com
PostgreSQL version: 8.1.4
Operating system: Open Suse 10.1 and CentOS 4.3
Description: C function: arg TEXT data corrupt
Details:
I created a C function:
extern Datum test_arg(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(test_arg);
Datum test_arg(PG_FUNCTION_ARGS) {
elog(INFO, "arg: %s", VARDATA(PG_GETARG_TEXT_P(0)));
PG_RETURN_INT16(0);
}
and used the
CREATE FUNCTION test_arg(TEXT) RETURNS INT4 AS 'path_to_lib.so' LANGUAGE
'C';
If I call this function "the first time"
after connecting with psql, the info output shows
corrupted data, the second and further call is ok:
me(at)noteme:~/uwx9/SVN/tpl/trunk/db/postgresql/src> psql -U tplinux
Welcome to psql 8.1.3, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
tplinux=> select test_arg('1');
INFO: arg: 1s(at)3
test_arg
----------
0
(1 row)
tplinux=> select test_arg('1');
INFO: arg: 1
test_arg
----------
0
(1 row)
The same problem is with input length of 2, 3, 4, 8 char (and may be more).
With input of 4 char I never get correct value.
No problem for first call with 5, 6 , 7 char,
but with 5 char problem if before called with 6 char.
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Michael Enke" <michael(dot)enke(at)wincor-nixdorf(dot)com> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #2574: C function: arg TEXT data corrupt |
Date: | 2006-08-14 15:27:00 |
Message-ID: | 9678.1155569220@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-general |
"Michael Enke" <michael(dot)enke(at)wincor-nixdorf(dot)com> writes:
> I created a C function:
> extern Datum test_arg(PG_FUNCTION_ARGS);
> PG_FUNCTION_INFO_V1(test_arg);
> Datum test_arg(PG_FUNCTION_ARGS) {
> elog(INFO, "arg: %s", VARDATA(PG_GETARG_TEXT_P(0)));
> PG_RETURN_INT16(0);
The VARDATA of a TEXT datum is not a C string; in particular it is not
guaranteed to be null-terminated. This is an error in your code not
a bug.
The usual way to get a C string from a TEXT datum is to call textout,
eg
str = DatumGetCString(DirectFunctionCall1(textout, datumval));
regards, tom lane
From: | Reece Hart <reece(at)harts(dot)net> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | Michael Enke <michael(dot)enke(at)wincor-nixdorf(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | text datum VARDATA and strings |
Date: | 2006-08-14 18:04:30 |
Message-ID: | 1155578671.4158.45.camel@tallac.gene.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-general |
Michael Enke recently asked in pgsql-bugs about VARDATA and C strings
(BUG #2574: C function: arg TEXT data corrupt). Since that's not a bug,
I've moved this follow-up to pgsql-general.
On Mon, 2006-08-14 at 11:27 -0400, Tom Lane wrote:
> The usual way to get a C string from a TEXT datum is to call textout,
> eg
> str = DatumGetCString(DirectFunctionCall1(textout, datumval));
Yikes! I've been accessing VARDATA text data like Michael for years
(code below). I account for length and don't expect null-termination,
but I don't use anything like Tom's suggestion above. (I always try to
do what Tom says because that usually hurts less.)
I have three questions:
1) I based everything I did on examples lifted nearly verbatim from a
7.x manual, and I bet Michael did similarly. I've never heard of
DatumGetCString, DirectFunctionCall1, or textout. Are these and other
treasures documented somewhere?
2) Does DatumGetCString(DirectFunctionCall1(textout, datumval)) do
something other than null terminate a string? All of the strings are
from [-A-Z0-1*]; server_encoding has been either SQL_ASCII or UTF8 in
case that's relevant.
3) Is there any reason to believe that the code below is problematic?
Thanks,
Reece
#include <postgres.h>
#include <fmgr.h>
#include <ctype.h>
#include <string.h>
static char* clean_sequence(const char* in, int32 n);
PG_FUNCTION_INFO_V1(pg_clean_sequence);
Datum pg_clean_sequence(PG_FUNCTION_ARGS)
{
text* t0; /* in */
text* t1; /* out */
char* tmp;
int32 tmpl;
if ( PG_ARGISNULL(0) )
{ PG_RETURN_NULL(); }
t0 = PG_GETARG_TEXT_P(0);
tmp = clean_sequence( VARDATA(t0), VARSIZE(t0)-VARHDRSZ );
tmpl = (int32) strlen(tmp);
/* copy temp sequence into new pg variable */
t1 = (text*) palloc( tmpl + VARHDRSZ );
if (!t1)
{ elog( ERROR, "couldn't palloc (%d bytes)", tmpl+VARHDRSZ ); }
memcpy(VARDATA(t1),tmp,tmpl);
VARATT_SIZEP(t1) = tmpl + VARHDRSZ;
pfree(tmp);
PG_RETURN_TEXT_P(t1);
}
/* clean_sequence -- strip non-IUPAC symbols
The intent is to strip non-sequence data which might result from
copy-pasting a fasta file or some such.
in: char*, length
out: char*, |out|<=length, NULL-TERMINATED
out is palloc'd memory; caller must free
allow chars from IUPAC std 20
+ selenocysteine (U) + ambiguity (BZX) + gap (-) + stop (*)
*/
#define isseq(c) ( ((c)>='A' && (c)<='Z' && (c)!='J' && (c)!='O') \
|| ((c)=='-') \
|| ((c)=='*') )
char* clean_sequence(const char* in, int32 n) {
char* out;
char* oi;
int32 i;
out = palloc( n + 1 ); /* w/null */
if (!out)
{ elog( ERROR, "couldn't palloc (%d bytes)", n+1 ); }
for( i=0, oi=out; i<=n-1; i++ ) {
char c = toupper(in[i]);
if ( isseq(c) ) {
*oi++ = c;
}
}
*oi = '\0';
return(out);
}
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Reece Hart <reece(at)harts(dot)net> |
Cc: | pgsql-general(at)postgresql(dot)org, Michael Enke <michael(dot)enke(at)wincor-nixdorf(dot)com> |
Subject: | Re: text datum VARDATA and strings |
Date: | 2006-08-14 19:51:22 |
Message-ID: | 13746.1155585082@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-general |
Reece Hart <reece(at)harts(dot)net> writes:
> On Mon, 2006-08-14 at 11:27 -0400, Tom Lane wrote:
>> The usual way to get a C string from a TEXT datum is to call textout,
>> eg
>> str = DatumGetCString(DirectFunctionCall1(textout, datumval));
> Yikes! I've been accessing VARDATA text data like Michael for years
> (code below). I account for length and don't expect null-termination,
> but I don't use anything like Tom's suggestion above.
Sure, that works. The problem with what Michael was doing was that he
was passing the string to elog, which expects a null-terminated string.
Possibly I should have written "usual way to get a null-terminated string"
above, just to be clear.
> 1) I based everything I did on examples lifted nearly verbatim from a
> 7.x manual, and I bet Michael did similarly. I've never heard of
> DatumGetCString, DirectFunctionCall1, or textout. Are these and other
> treasures documented somewhere?
Whose 7.x manual? This stuff has been there since we invented the
"version 1" function call convention, which was 7.3 or before. There
is some documentation in the SGML docs, but really we kind of expect you
to look at the standard built-in functions to see how things are done...
> 2) Does DatumGetCString(DirectFunctionCall1(textout, datumval)) do
> something other than null terminate a string?
At the moment that's all it does ... assuming that you've already
detoasted the text datum ... but it's not impossible that someday
it will do something different. For instance, sooner or later we are
going to support multiple locales/encodings within a single database,
and I wouldn't be surprised if that involves sticking extra data into
text values. So it's best not to assume that you know what is inside a
text datum, if possible,
> 3) Is there any reason to believe that the code below is problematic?
The only thing I'd suggest is that checking for a null return from
palloc is a waste of effort. It doesn't return to you if it runs
out of memory.
regards, tom lane
From: | Reece Hart <reece(at)harts(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org, Michael Enke <michael(dot)enke(at)wincor-nixdorf(dot)com> |
Subject: | Re: text datum VARDATA and strings |
Date: | 2006-08-14 22:11:49 |
Message-ID: | 1155593509.4158.134.camel@tallac.gene.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-general |
On Mon, 2006-08-14 at 15:51 -0400, Tom Lane wrote:
> Whose 7.x manual? This stuff has been there since we invented the
> "version 1" function call convention, which was 7.3 or before. There
> is some documentation in the SGML docs, but really we kind of expect you
> to look at the standard built-in functions to see how things are done...
The PostgreSQL manual. I wrote these functions early in the 7.x series
and I don't know which manual version exactly. For example, the sec
9.5.4 of
http://www.postgresql.org/docs/7.3/static/xfunc-c.html shows code for
concat_text, which I remember was the basis for what I wrote. I now see
and understand the text regarding detoasting the 'DatumGetXXX' macros;
the relevance of these were not obvious to me at the time.
> So it's best not to assume that you know what is inside a
> text datum, if possible,
Okay. Does that mean that code in 9.5.4 should have a warning to that
effect?
> > 3) Is there any reason to believe that the code below is problematic?
>
> The only thing I'd suggest is that checking for a null return from
> palloc is a waste of effort. It doesn't return to you if it runs
> out of memory.
Okay. Thanks for the advice, Tom.
-Reece
--
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0