Re: Sending messages inside a function

Lists: pgsql-interfaces
From: Chris Campbell <chris(at)bignerdranch(dot)com>
To: pgsql-interfaces(at)postgresql(dot)org
Subject: Sending messages inside a function
Date: 2005-11-29 15:44:11
Message-ID: 06AA7287-656A-4E49-B1B5-68ACCC03741B@bignerdranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces

I'm trying to write a set of C functions that mimic the COPY TO and
COPY FROM SQL commands so I can use them to stream binary data to/
from the client.

This seems more efficient to me than passing BYTEAs back and forth.
Maybe that's the best way to go, but the COPY mechanism seems perfect
for this. Just look how straightforward the send_to_client() function
was to write (below). I've been using large objects to accomplish
this for the past few years, but as customers' databases get larger
and larger (the bulk being entirely large objects), backing up and
restoring and recovering from corruption is taking a really long
time. So I'm going to store the data elsewhere on the server (outside
the database) and piggy-back the PostgreSQL connection to send the
data back and forth. This way, the database stays much smaller and
makes backup and restore much easier.

I'm envisioning something like this:

(1) Data transfer from Server -> Client
---------------------------------------
Client: SELECT send_to_client()
Server: Sends an 'H' message to begin a COPY TO
Server: Sends a bunch of 'd' messages with the data
Server: Sends a 'c' message to end the COPY operation
Client: Gets PGRES_COPY_OUT from PQresultStatus()
Client: Calls PQgetCopyData() to retrieve data until it returns -1
Client: Calls PQgetResult() to retrieve the results of the SELECT query

(2) Data transfer from Client -> Server
---------------------------------------
Client: SELECT receive_from_client()
Server: Sends a 'G' message to begin a COPY FROM
Client: Gets PGRES_COPY_IN from PQresultStatus()
Client: Calls PQputCopyData() to send the data
Server: Receives a bunch of 'd' messages
Server: Receives a 'c' message to end the COPY operation
Client: Calls PQgetResult() to retrieve the results of the SELECT query

I've gotten #1 working. But the client isn't able to retrieve the
SELECT results from calling the send_to_client() function (which
really doesn't bother me too much -- the send_to_client() function
returns NULL). Instead, I just call PQgetResult() until it returns
NULL and ignore all results. However, my receive_from_client()
function will need to return a status, so I'd like to get this
figured out for send_to_client() before moving on to
receive_from_client().

I'm thinking that the server sends a tuple description message ('T')
before my function is invoked (and sends all the 'H', 'd', and 'c'
messages) and then sends the tuple data message ('D') with the
function's return value afterwards. So, by spitting out messages
inside my function, I'm disconnecting the tuple data from the
descriptions.

I get the following error message:

server sent data ("D" message) without prior row description
("T" message)

I could remedy this by "faking" my own 'T' message after I've
performed the COPY operation (to match the 'D' message that's sent
later). But that seems ugly.

If the 'T' message wasn't sent until after my function was invoked,
that would solve all my problems (if my assumption about the 'T'
message being sent before the function's invoked is correct). I'm
trying to find where in the code the 'T' message is sent and try to
delay it until after the function is actually invoked, but I'm not
having much success. Any comments or pointers?

Here's what my send_to_client() function looks like:

Datum
send_to_client(PG_FUNCTION_ARGS)
{
// Begin a COPY TO STDOUT operation

StringInfoData buf;

pq_beginmessage(&buf, 'H');
pq_sendbyte(&buf, 1); // binary format (1)
pq_sendint(&buf, 0, 2); // 0 attributes
pq_endmessage(&buf);

// TODO: Send the data to the client
// For now, just send a simple binary message

char *buffer = "Hello, world!";

pq_putmessage('d', buffer, strlen(buffer));

// End the COPY operation

pq_putemptymessage('c');

PG_RETURN_NULL();
}

I've called the function from my libpq-based program ("SELECT
send_to_client()") and then retrieved the data using PQgetCopyData()
and it works, as long as I don't care about the results of the SELECT
query and just throw away any PGresults I get after the COPY
operation. But that's going to cause me problems when I implement
receive_from_client().

Thanks!

- Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Campbell <chris(at)bignerdranch(dot)com>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Sending messages inside a function
Date: 2005-11-29 18:28:18
Message-ID: 8583.1133288898@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces

Chris Campbell <chris(at)bignerdranch(dot)com> writes:
> I'm trying to write a set of C functions that mimic the COPY TO and
> COPY FROM SQL commands so I can use them to stream binary data to/
> from the client.

You cannot control the FE/BE protocol from inside a user function.

> This seems more efficient to me than passing BYTEAs back and forth.

Why? A byte is a byte. There is no possible way that you will ever
save enough microseconds from this to repay the development time you
will waste trying to kluge the system to do it.

If it's the overhead of parsing the SELECT you are unhappy with,
you could look at using the "fast path" function call protocol
to invoke your functions.

regards, tom lane


From: Chris Campbell <chris(at)bignerdranch(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-interfaces(at)postgresql(dot)org
Subject: Re: Sending messages inside a function
Date: 2005-11-29 19:10:08
Message-ID: 6181AE7C-E0E5-44A1-9E02-5E6A246A06B9@bignerdranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces

On Nov 29, 2005, at 13:28, Tom Lane wrote:

> You cannot control the FE/BE protocol from inside a user function.

"cannot" is a bit strong -- the function in my original email did it,
and it works (with caveats). It's good to know that I "should" not,
though. I'm still curious why I got the error message:

server sent data ("D" message) without prior row description
("T" message)

Is the 'T' message sent back to the client before my function was
invoked? Can the 'T' message be suppressed until immediately before
the first piece of tuple data is returned to the client? That way, I
could do my COPY operation before the 'T' message is sent and no one
would be the wiser.

>> This seems more efficient to me than passing BYTEAs back and forth.
>
> Why? A byte is a byte. There is no possible way that you will ever
> save enough microseconds from this to repay the development time you
> will waste trying to kluge the system to do it.

Does that still apply to BYTEAs that are tens (hundreds?) of
megabytes? I was hoping to stream the data so that I wouldn't have to
read it all into memory and pass it to the function (or return it
from the function) as one big chunk. That way, the first part could
be going out over the network while the function was reading in the
rest of the data. It wouldn't have to wait to read the entire piece
of data into memory before transmitting it. I could also put up a
nice little progress bar for the transfer.

If I pass in the data as a BYTEA parameter to the function, is the
data transmitted completely before the function is invoked? If the
function fails before it even needs the data (due to an invalid
argument, etc), it seems wasteful to have transmitted it across the
wire.

I also wanted to avoid the hassle of having to massage the binary
data (escaping values, etc) before transmitting it, but perhaps using
the binary format in libpq is all that's needed. Can I easily turn a
buffer of data into a BYTEA and pass it to PQexecParams()?

Thanks!

- Chris