Re: psql and bytea

Lists: pgsql-general
From: Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: psql and bytea
Date: 2003-05-15 10:29:21
Message-ID: 1052994560.18364.6.camel@one.myworld
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sorry for my poor English.

First, postgresql is a beautiful DBMS !

I try to use bytea type with psql to store some files (i know, there are
\lo_*).

There are some informations in the man page of psql :

------------------------------------
Another possible use of this mechanism is to copy the contents of a file
into a field. First load the file into a variable and then proceed as
above.

testdb=> \set content ’\’’ ‘cat my_file.txt‘ ’\’’
testdb=> INSERT INTO my_table VALUES (:content);

One possible problem with this approach is that my_file.txt might
contain single quotes. These need to be escaped so that they don’t cause
a syntax error when the third line is processed. This could be done
with the program sed:

testdb=> \set content ’\’’ ‘sed -e "s/’/\\\\\\’/g" < my_file.txt‘ ’\’’

But this does not work with binary string (bytea) since i want to store
any kind of files.
------------------------------------

I check the documentation (really fantastic documentation ! Great job):
http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=datatype-binary.html

I build two very simple tools to convert a binary stream to a bytea
stream.

------------------------------------
tobytea_sql.c :
/*
* convert a stream to bytea.
* The '\' is escaped, so it can be used in a SQL statement.
*/
#include <stdio.h>
int main(void) {
int c ;
while ((c = getchar()) != EOF) {
if ( c == '\'' || c == '\\' ||
c < 32 || c > 126 ) {
printf("\\\\%d%d%d",
(c % (8*8*8)) / (8*8),
(c % (8*8)) / 8,
c % 8) ;
}
else {
putchar(c) ;
}
}
return 0 ;
}
------------------------------------
/*
* convert a bytea stream
* This is not compatible with the output of tobytea_sql !
* tobytea_sql put two \ . frombytea espect only one !
*/
#include <stdio.h>
int main(void) {
int c ;
int car ;
while ((c = getchar()) != EOF) {
if ( c < 32 || c > 126) {
fprintf(stderr, "Char ignored\n") ;
break ;
}
if ( c == '\\' ) {
if ((c = getchar()) == EOF) return 0 ;
car = (c - '0') * 8 * 8 ;
if ((c = getchar()) == EOF) return 0 ;
car += (c - '0') * 8 ;
if ((c = getchar()) == EOF) return 0 ;
car += c - '0' ;
putchar(car) ;
}
else {
putchar(c) ;
}
}
return 0 ;
}
------------------------------------

How to use it ?
insert into a table :
$ psql
[...]
one=> create temp table tmp (data bytea) ;
CREATE TABLE
one=> -- push some data in a psql variable
one=> \set content '\''`tobytea_sql < data`'\''
one=> \echo :content
'o\\030\\251\\273C7\\266K\\331 ... [snip] ... \323W\\027\\256D'
one=> -- insert
one=> insert into tmp (data) values (:content) ;
INSERT 35042356 1
one=> -- vrite the content of temp.data to a file
one=> \a
Output format is unaligned.
one=> \t
Showing only tuples.
one=> \o | frombytea > data
one=> select data from tmp ;
one=> Char ignored -- psql send a CR at the end of the row
(frombytea drop it since CR = 10 and it is not a "normal" char for a
bytea stream).

Is there a bug in psql ? Check this :

WARNING : this example take a lot of memory !

Create a big file with some random data :
$ dd if=/dev/urandom bs=1M count=20 of=data
$ #the bytea stream is about 74 Mo !)
$ tobytea < data | wc
0 81574 74379444
$ psql
[...]
one=> \set content '\''`tobytea_sql < data`'\''
one=> insert into tmp (data) values(:content) ;
one'> -- Problem !
one'> ' -- try to continue
one(> ) ;
INSERT 35042373 1
one=> select length(data) from tmp ;
length
--
18924736
(1 row)

one=> -- this is not the original size (20M : 20971520).

I don't have any problems with smaller files (10Mo is always fine).

All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 .

--
Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: psql and bytea
Date: 2003-05-15 14:47:53
Message-ID: 8509.1053010073@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

=?ISO-8859-1?Q?F=E9liciano?= Matias <feliciano(dot)matias(at)free(dot)fr> writes:
> one=3D> -- this is not the original size (20M : 20971520).
> I don't have any problems with smaller files (10Mo is always fine).
> All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 .

Hmm. There used to be some off-by-one type bugs in psql's
variable-substitution code, but those were fixed long before 7.3.2.
In any case, it's hard to see why such a problem would only arise
when you got past 10Mb string lengths.

I couldn't duplicate the problem here, so I'm going to suggest that
maybe you have a hardware problem? Perhaps there's a flaky RAM chip in
an area of memory that doesn't get used until you push up the size
of psql quite a bit. It'd be worth running memtest86 for awhile to
check.

regards, tom lane


From: Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: psql and bytea
Date: 2003-05-16 04:56:02
Message-ID: 1053060938.4573.53.camel@one.myworld
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le jeu 15/05/2003 à 16:47, Tom Lane a écrit :
> =?ISO-8859-1?Q?F=E9liciano?= Matias <feliciano(dot)matias(at)free(dot)fr> writes:
> > one=3D> -- this is not the original size (20M : 20971520).
> > I don't have any problems with smaller files (10Mo is always fine).
> > All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 .
>
> Hmm. There used to be some off-by-one type bugs in psql's
> variable-substitution code, but those were fixed long before 7.3.2.
> In any case, it's hard to see why such a problem would only arise
> when you got past 10Mb string lengths.
>

More then 10Mb string lengths :
> > $ tobytea < data | wc
> > 0 81574 74379444 (71 Mo)

> I couldn't duplicate the problem here,

It seems it's a bug in glibc-2.3.2-27.9.i686.rpm coming with RH9.

> so I'm going to suggest that
> maybe you have a hardware problem? Perhaps there's a flaky RAM chip in
> an area of memory that doesn't get used until you push up the size
> of psql quite a bit. It'd be worth running memtest86 for awhile to
> check.

The hardware is ok.

I found the problem. It's bug in sprintf()
===============================================
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZE (1024*70000)
int main(void) {
char * s = malloc(SIZE) ;
char * d = malloc(SIZE) ;
memset(s, 'a', SIZE-1) ;
d[SIZE-1] = '\0' ;
sprintf(d,"%s",s) ;
printf("%zi\n", strlen(d)) ;
return 0 ;
}
===============================================
$ ./a.out
67108863 (2^26-1)

My libc have a problem. I will fill a bug report to
http://bugzilla.redhat.com/ .

I apply this patch to the postgresql source :
===============================================
diff -urN postgresql-7.3.2.orig/src/bin/psql/mainloop.c postgresql-7.3.2/src/bin/psql/mainloop.c
--- postgresql-7.3.2.orig/src/bin/psql/mainloop.c 2002-10-13 01:09:34.000000000 +0200
+++ postgresql-7.3.2/src/bin/psql/mainloop.c 2003-05-16 04:46:03.000000000 +0200
@@ -389,8 +389,9 @@
exit(EXIT_FAILURE);
}

- sprintf(new, "%.*s%s%s", i, line, value,
- &line[i + thislen + in_length]);
+ sprintf(new, "%.*s", i, line) ;
+ strcat(&new[i], value) ;
+ strcat(&new[i+out_length], &line[i + thislen + in_length]) ;

free(line);
line = new;
===============================================

Also, i want to know if someone is interesting by tools such as
(to|from)bytea. I can put this little toys (with some enhancement.
"--help" :-) ) in contrib and update the man page of psql to show how
to use this tools to copy the content of a file (or stream) into a
field.
Since Postgresql have binary string, it is interesting and better than
\lo_* for little files (this use more memory than \lo_*).

By the way, i can add an option to psql that toggle the output of
carrier return at the end of each records. Perhaps this can be add to
\t.

>
> regards, tom lane
>

--
Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>


From: "Wayne Armstrong" <wdarmst(at)bacchus(dot)com(dot)au>
To: Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: psql and bytea
Date: 2003-05-16 05:09:28
Message-ID: 200305160509.h4G59V8f003784@mail.bacchus.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

** Reply to message from Féliciano Matias <feliciano(dot)matias(at)free(dot)fr> on 16 May
2003 06:56:02 +0200
Hi,
I'm interested :)

Regards,
Wayne

> Le jeu 15/05/2003 à 16:47, Tom Lane a écrit :
> > =?ISO-8859-1?Q?F=E9liciano?= Matias <feliciano(dot)matias(at)free(dot)fr> writes:
> > > one=3D> -- this is not the original size (20M : 20971520).
> > > I don't have any problems with smaller files (10Mo is always fine).
> > > All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9 .
> >
> > Hmm. There used to be some off-by-one type bugs in psql's
> > variable-substitution code, but those were fixed long before 7.3.2.
> > In any case, it's hard to see why such a problem would only arise
> > when you got past 10Mb string lengths.
> >
>
> More then 10Mb string lengths :
> > > $ tobytea < data | wc
> > > 0 81574 74379444 (71 Mo)
>
> > I couldn't duplicate the problem here,
>
> It seems it's a bug in glibc-2.3.2-27.9.i686.rpm coming with RH9.
>
> > so I'm going to suggest that
> > maybe you have a hardware problem? Perhaps there's a flaky RAM chip in
> > an area of memory that doesn't get used until you push up the size
> > of psql quite a bit. It'd be worth running memtest86 for awhile to
> > check.
>
> The hardware is ok.
>
> I found the problem. It's bug in sprintf()
> ===============================================
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #define SIZE (1024*70000)
> int main(void) {
> char * s = malloc(SIZE) ;
> char * d = malloc(SIZE) ;
> memset(s, 'a', SIZE-1) ;
> d[SIZE-1] = '\0' ;
> sprintf(d,"%s",s) ;
> printf("%zi\n", strlen(d)) ;
> return 0 ;
> }
> ===============================================
> $ ./a.out
> 67108863 (2^26-1)
>
> My libc have a problem. I will fill a bug report to
> http://bugzilla.redhat.com/ .
>
> I apply this patch to the postgresql source :
> ===============================================
> diff -urN postgresql-7.3.2.orig/src/bin/psql/mainloop.c postgresql-7.3.2/src/bin/psql/mainloop.c
> --- postgresql-7.3.2.orig/src/bin/psql/mainloop.c 2002-10-13 01:09:34.000000000 +0200
> +++ postgresql-7.3.2/src/bin/psql/mainloop.c 2003-05-16 04:46:03.000000000 +0200
> @@ -389,8 +389,9 @@
> exit(EXIT_FAILURE);
> }
>
> - sprintf(new, "%.*s%s%s", i, line, value,
> - &line[i + thislen + in_length]);
> + sprintf(new, "%.*s", i, line) ;
> + strcat(&new[i], value) ;
> + strcat(&new[i+out_length], &line[i + thislen + in_length]) ;
>
> free(line);
> line = new;
> ===============================================
>
> Also, i want to know if someone is interesting by tools such as
> (to|from)bytea. I can put this little toys (with some enhancement.
> "--help" :-) ) in contrib and update the man page of psql to show how
> to use this tools to copy the content of a file (or stream) into a
> field.
> Since Postgresql have binary string, it is interesting and better than
> \lo_* for little files (this use more memory than \lo_*).
>
> By the way, i can add an option to psql that toggle the output of
> carrier return at the end of each records. Perhaps this can be add to
> \t.
>
> >
> > regards, tom lane
> >
>
> --
> Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>


From: Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: psql and bytea
Date: 2003-05-16 05:21:43
Message-ID: 1053062502.4573.57.camel@one.myworld
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> My libc have a problem. I will fill a bug report to
> http://bugzilla.redhat.com/ .
>
done :
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90987

--
Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>


From: Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: psql and bytea
Date: 2003-06-10 06:21:32
Message-ID: 1055226052.6253.11.camel@one.myworld
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le ven 16/05/2003 à 06:56, Féliciano Matias a écrit :

> > All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9

> [...]

> I found the problem. It's bug in sprintf()
> ===============================================
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #define SIZE (1024*70000)
> int main(void) {
> char * s = malloc(SIZE) ;
> char * d = malloc(SIZE) ;
> memset(s, 'a', SIZE-1) ;
> d[SIZE-1] = '\0' ;
> sprintf(d,"%s",s) ;
> printf("%zi\n", strlen(d)) ;
> return 0 ;
> }
> ===============================================
> $ ./a.out
> 67108863 (2^26-1)

> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90987

from

bugzilla(at)redhat(dot)com

------- Additional Comments From drepper(at)redhat(dot)com 2003-06-09 23:22 -------
The current glibc CVS code has been changed to not have this liimtation anymore.

--
Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>


From: Dennis Gearon <gearond(at)cvc(dot)net>
To: Féliciano Matias <feliciano(dot)matias(at)free(dot)fr>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql and bytea
Date: 2003-06-10 17:45:30
Message-ID: 3EE6193A.8060805@cvc.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I've seen that come up several times now.

Féliciano Matias wrote:

> Le ven 16/05/2003 à 06:56, Féliciano Matias a écrit :
>
>
>>>All of this is used with postgresql 7.3.2 shipped with Red Hat Linux 9
>
>
>>[...]
>
>
>>I found the problem. It's bug in sprintf()
>>===============================================
>>#include <stdio.h>
>>#include <stdlib.h>
>>#include <string.h>
>>#define SIZE (1024*70000)
>>int main(void) {
>> char * s = malloc(SIZE) ;
>> char * d = malloc(SIZE) ;
>> memset(s, 'a', SIZE-1) ;
>> d[SIZE-1] = '\0' ;
>> sprintf(d,"%s",s) ;
>> printf("%zi\n", strlen(d)) ;
>> return 0 ;
>>}
>>===============================================
>>$ ./a.out
>>67108863 (2^26-1)
>
>
>>https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=90987
>
>
> from
>
> bugzilla(at)redhat(dot)com
>
>
> ------- Additional Comments From drepper(at)redhat(dot)com 2003-06-09 23:22 -------
> The current glibc CVS code has been changed to not have this liimtation anymore.
>