BUG #4866: ECPG and BYTEA

Lists: pgsql-bugs
From: "Rick Levine" <Richard_D_Levine(at)raytheon(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4866: ECPG and BYTEA
Date: 2009-06-19 21:31:50
Message-ID: 200906192131.n5JLVoMo044178@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 4866
Logged by: Rick Levine
Email address: Richard_D_Levine(at)raytheon(dot)com
PostgreSQL version: 8.3.7
Operating system: Windows Vista
Description: ECPG and BYTEA
Details:

ECPG does not handle BYTEA columns properly. When I encode a unsigned char
array using PQescapeByteaConn and send it to the server, it is not stored as
the original bytes, but rather is stored as the escaped string (much
larger).

//This doesn't work. Stored encoded.

bytea_var =
PQescapeByteaConn(connection, bytea_hostvar,
bytea_len, &new_len);
EXEC SQL AT :connection INSERT INTO Btable
(index, bytea_col)
VALUES
(:index_var, :bytea_var);

//This doesn't work. Stored encoded.

bytea_var =
PQescapeByteaConn(connection, bytea_hostvar,
bytea_len, new_len);
EXEC SQL AT :connection INSERT INTO Btable
(index, bytea_col)
VALUES
(:index_var, decode(:bytea_var, 'escape'));

//This doesn't work. Error.
bytea_var =
PQescapeByteaConn(connection, bytea_hostvar,
bytea_len, new_len);
EXEC SQL AT :connection INSERT INTO Btable
(index, bytea_col)
VALUES
(:index_var, :bytea_var::BYTEA);

Furthermore, when I fetch the BYTEA column value back, I have to decode it
twice (using PQunescapeBytea) to get back the original array of bytes.

I see three problems. I have to use functions from the C interface, not
documented as part of ECPG, to get this to work at all; my storage size is
quadrupled on disk; and the data communicated between the client and server
is even bigger than that.

The problem, as I see it, is that there's no way for the ECPG parser to
unequivocally determine the size of the byte array pointed to by the host
variable. Sure, if it's declared as

EXEC SQL BEGIN DECLARE SECTION;
unsigned char bytea_hostvar[1024];
EXEC SQL END DECLARE SECTION;

you can figure it out, but otherwise not. That is how I declared it BTW.
This causes the need to create a null terminated string to send to the
server, rather than just sending the original bytes.

We know the coder knows the size of the buffer, but ECPG doesn't, so the
best solution (to my mind) would be to allow the coder to tell ECPG the
buffer size directly. A clean way to do this would be to allow an indicator
variable containing the size, e.g.

EXEC SQL BEGIN DECLARE SECTION;
unsigned char bytea_hostvar[1024];
int hostvar_ind = 1024;
EXEC SQL END DECLARE SECTION;

EXEC SQL AT :connection INSERT INTO Btable
(index, bytea_col)
VALUES
(:index_var, :bytea_hostvar:hostvar_ind);

I'm just sayin... ;)


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Rick Levine <Richard_D_Levine(at)raytheon(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4866: ECPG and BYTEA
Date: 2009-06-23 08:09:28
Message-ID: 4A408DB8.3060109@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Rick Levine wrote:
> ECPG does not handle BYTEA columns properly. When I encode a unsigned char
> array using PQescapeByteaConn and send it to the server, it is not stored as
> the original bytes, but rather is stored as the escaped string (much
> larger).

Yeah, that's clearly not the right way to do it. You could just use
libpq PGExecParams for those queries. I understand that you'd want to
stick to the ECPG way of doing things, but that's a good work-around.

> We know the coder knows the size of the buffer, but ECPG doesn't, so the
> best solution (to my mind) would be to allow the coder to tell ECPG the
> buffer size directly. A clean way to do this would be to allow an indicator
> variable containing the size, e.g.
>
> EXEC SQL BEGIN DECLARE SECTION;
> unsigned char bytea_hostvar[1024];
> int hostvar_ind = 1024;
> EXEC SQL END DECLARE SECTION;
>
> EXEC SQL AT :connection INSERT INTO Btable
> (index, bytea_col)
> VALUES
> (:index_var, :bytea_hostvar:hostvar_ind);
>
> I'm just sayin... ;)

Yeah, that seems like a clean way to do it. Any idea how this is done in
other databases with embedded C support, like DB2 / Informix? Or SQL
spec, if it has anything to say about this. It would be good to stay
compatible.

(I've added this to the TODO list)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Richard D Levine <Richard_D_Levine(at)raytheon(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org, Ken J Lehto <Ken_J_Lehto(at)raytheon(dot)com>, Gary G Scheitlin <Gary_G_Scheitlin(at)raytheon(dot)com>, Loretta J Rice <Loretta_J_Rice(at)raytheon(dot)com>
Subject: Re: BUG #4866: ECPG and BYTEA
Date: 2009-06-23 18:09:54
Message-ID: OF68C45A35.758403CF-ON862575DE.00614026-852575DE.0063C6D0@mck.us.ray.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

This is how Oracle does it. The equivalent data type to BYTEA in Oracle
is RAW. You can use a host variable to specify the length as well.

EXEC SQL BEGIN DECLARE SECTION;
unsigned char *bytea_hostvar;
EXEC SQL VAR bytea_hostvar IS RAW(1024);
EXEC SQL END DECLARE SECTION;

ANSI SQL 92 specifies the BIT [VARYING] type, but no mention of a byte
data type.

ANSI SQL 92, section 11.28, specifies a <character set definition> CREATE
CHARACTER SET....

You might be able to shoehorn raw bytes in with that, but it wouldn't be
very intuitive, to say the least. It doesn't solve the basic problem of
conveying a buffer size to the embedded language parser for a data type
without a length specified in the schema.

I ended up using Postgres' BIT data type, which works just fine, but
really expands the data passed between client and server.

From:
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:
Rick Levine <Richard_D_Levine(at)raytheon(dot)com>
Cc:
pgsql-bugs(at)postgresql(dot)org
Date:
06/23/2009 04:09 AM
Subject:
Re: [BUGS] BUG #4866: ECPG and BYTEA

Rick Levine wrote:
> ECPG does not handle BYTEA columns properly. When I encode a unsigned
char
> array using PQescapeByteaConn and send it to the server, it is not
stored as
> the original bytes, but rather is stored as the escaped string (much
> larger).

Yeah, that's clearly not the right way to do it. You could just use
libpq PGExecParams for those queries. I understand that you'd want to
stick to the ECPG way of doing things, but that's a good work-around.

> We know the coder knows the size of the buffer, but ECPG doesn't, so the
> best solution (to my mind) would be to allow the coder to tell ECPG the
> buffer size directly. A clean way to do this would be to allow an
indicator
> variable containing the size, e.g.
>
> EXEC SQL BEGIN DECLARE SECTION;
> unsigned char bytea_hostvar[1024];
> int hostvar_ind = 1024;
> EXEC SQL END DECLARE SECTION;
>
> EXEC SQL AT :connection INSERT INTO Btable
> (index, bytea_col)
> VALUES
> (:index_var, :bytea_hostvar:hostvar_ind);
>
> I'm just sayin... ;)

Yeah, that seems like a clean way to do it. Any idea how this is done in
other databases with embedded C support, like DB2 / Informix? Or SQL
spec, if it has anything to say about this. It would be good to stay
compatible.

(I've added this to the TODO list)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

The following line is added for your protection and will be used for
analysis if this message is reported as spam:

(Raytheon Analysis: IP=64.18.2.219;
e-from=heikki(dot)linnakangas(at)enterprisedb(dot)com;
from=heikki(dot)linnakangas(at)enterprisedb(dot)com; date=Jun 23, 2009 8:09:36 AM;
subject=Re: [BUGS] BUG #4866: ECPG and BYTEA)