GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements

Lists: pgsql-hackers
From: Marko Kreen <markokr(at)gmail(dot)com>
To: Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 14:59:53
Message-ID: CACMqXCKkGrGXxQhjHCKCe0B8hn6sTt-1sdgHZOSGQMxrusOsQA@mail.gmail.com

On Sun, Jan 22, 2012 at 11:47 PM, Mikko Tiihonen
<mikko(dot)tiihonen(at)nitorcreations(dot)com> wrote:
> * introduced a new GUC variable array_output copying the current
>  bytea_output type, with values "full" (old value) and
>  "smallfixed" (new default)
> * added documentation for the new GUC variable

If this variable changes protocol-level layout
and is user-settable, shouldn't it be GUC_REPORT?

Now that I think about it, same applies to bytea_output?

You could say the problem does not appear if the
clients always accept the server default. But how can
the client know the default? If the client is required
to do "SHOW" before it can talk to the server, that
seems to hint those vars should be GUC_REPORT.

Same story when clients are always expected to set
the vars to their preferred values. Then you get
clients with different settings on one server.
This breaks transaction-pooling setups (pgbouncer).
Again, such protocol-changing tunables should be
GUC_REPORT.

--
marko


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 15:52:20
Message-ID: CA+TgmoZvFfFW6qktzfbozzBX6MUJp_3UhMP+_ywHGC4oyzF67g@mail.gmail.com

On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> On Sun, Jan 22, 2012 at 11:47 PM, Mikko Tiihonen
> <mikko(dot)tiihonen(at)nitorcreations(dot)com> wrote:
>> * introduced a new GUC variable array_output copying the current
>>  bytea_output type, with values "full" (old value) and
>>  "smallfixed" (new default)
>> * added documentation for the new GUC variable
>
> If this variable changes protocol-level layout
> and is user-settable, shouldn't it be GUC_REPORT?
>
> Now that I think about it, same applies to bytea_output?
>
> You could say the problem does not appear if the
> clients always accept the server default.  But how can
> the client know the default?  If the client is required
> to do "SHOW" before it can talk to the server, that
> seems to hint those vars should be GUC_REPORT.
>
> Same story when clients are always expected to set
> the vars to their preferred values.  Then you get
> clients with different settings on one server.
> This breaks transaction-pooling setups (pgbouncer).
> Again, such protocol-changing tunables should be
> GUC_REPORT.

Probably so. But I think we need not introduce quite so many new
threads on this patch. This is, I think, at least thread #4, and
that's making the discussion hard to follow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 16:20:52
Message-ID: 778.1327335652@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
>> Now that I think about it, same applies to bytea_output?

> Probably so. But I think we need not introduce quite so many new
> threads on this patch. This is, I think, at least thread #4, and
> that's making the discussion hard to follow.

Well, this is independent of the proposed patch, so I think a separate
thread is okay. The question is "shouldn't bytea_output be marked
GUC_REPORT"? I think that probably it should be, though I wonder
whether we're not too late. Clients relying on it to be transmitted are
not going to work with existing 9.0 or 9.1 releases; so maybe changing
it to be reported going forward would just make things worse.

regards, tom lane


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 17:38:50
Message-ID: 20120123173850.GA13695@gmail.com

On Mon, Jan 23, 2012 at 11:20:52AM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Mon, Jan 23, 2012 at 9:59 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> >> Now that I think about it, same applies to bytea_output?
>
> > Probably so. But I think we need not introduce quite so many new
> > threads on this patch. This is, I think, at least thread #4, and
> > that's making the discussion hard to follow.
>
> Well, this is independent of the proposed patch, so I think a separate
> thread is okay. The question is "shouldn't bytea_output be marked
> GUC_REPORT"? I think that probably it should be, though I wonder
> whether we're not too late. Clients relying on it to be transmitted are
> not going to work with existing 9.0 or 9.1 releases; so maybe changing
> it to be reported going forward would just make things worse.

Well, in a complex setup it can change under you at will,
but since clients can process the data without knowing the
server state, maybe it's not a big problem. (Unless there
are old clients in the mix...)

Perhaps we can leave it as-is?

But this leaves the question of future policy for
data format changes in the protocol. Note I'm talking
about both text and binary formats here together,
although we could have different policies for them.

Also note that any kind of per-session flag is basically a GUC.

Question 1 - how does the client know which format the data is in?

1) new format is detectable from lossy GUC
2) new format is detectable from GUC_REPORT
3) new format is detectable from Postgres version
4) new format was requested in query (V4 proto)
5) new format is detectable from data (\x in bytea)

1. obviously does not work.
2. works, but requires changes across all infrastructure.
3. works and is simple, but painful.
4. is good, but in the future
5. is good, now
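
Option 5 is what bytea already does in its text output: hex output always
begins with "\x", so a decoder can branch on the data alone. A minimal
client-side sketch (the enum and function names are invented here for
illustration, not part of any API):

```c
#include <string.h>

typedef enum { BYTEA_TEXT_ESCAPE, BYTEA_TEXT_HEX } ByteaTextFormat;

/* Decide which bytea text format the server sent, from the data itself:
 * hex output (the 9.0+ default) always begins with "\x"; the old
 * "escape" format never does.  No GUC lookup is needed. */
static ByteaTextFormat
bytea_text_format(const char *val)
{
    return strncmp(val, "\\x", 2) == 0 ? BYTEA_TEXT_HEX : BYTEA_TEXT_ESCAPE;
}
```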

Question 2 - how does the client request the new format?

1) Postgres new version forces it.
2) GUC_REPORT + non-detectable data
3) Lossy GUC + autodetectable data
4) GUC_REPORT + autodetectable data
5) Per-request data (V4 proto)

1. is painful.
2. is painful - all infra components need to know about the GUC.
3&4. are both ugly and non-maintainable in the long term. The only
difference is that with 3) the infrastructure can give slight
guarantees that it does not change under the client.
4. seems good...

Btw, it does not seem that a per-request metainfo change requires
a "major version". The client can simply send an extra metainfo packet
before bind+execute, if it knows the server version is good enough.
For older servers it can simply skip the extra info. [Oh yeah,
that requires that the data format is always autodetectable.]

My conclusions:

1. Any change in data format should be compatible with old data.
IOW - if the client requested the new data format, it should always
accept the old format too.

2. Can we postpone minor data format changes on the wire until there
is a proper way for clients to request on-the-wire formats?

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 19:49:09
Message-ID: 7427.1327348149@sss.pgh.pa.us

Marko Kreen <markokr(at)gmail(dot)com> writes:
> [ bytea_output doesn't need to be GUC_REPORT because format is autodetectable ]

Fair enough. Anyway we're really about two years too late to revisit that.

> Btw, it does not seem that a per-request metainfo change requires
> a "major version". The client can simply send an extra metainfo packet
> before bind+execute, if it knows the server version is good enough.

That is nonsense. You're changing the protocol, and then saying
that clients should consult the server version instead of the
protocol version to know what to do.

> 2. Can we postpone minor data format changes on the wire until there
> is a proper way for clients to request on-the-wire formats?

I think that people are coming around to that position, ie, we need
a well-engineered solution to the versioning problem *first*, and
should not accept incompatible minor improvements until we have that.

regards, tom lane


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 20:00:38
Message-ID: 58D58B50-ACC1-4483-B083-04D40EBAFA89@themactionfaction.com


On Jan 23, 2012, at 2:49 PM, Tom Lane wrote:

> Marko Kreen <markokr(at)gmail(dot)com> writes:
>> [ bytea_output doesn't need to be GUC_REPORT because format is autodetectable ]
>
> Fair enough. Anyway we're really about two years too late to revisit that.
>
>> Btw, it does not seem that a per-request metainfo change requires
>> a "major version". The client can simply send an extra metainfo packet
>> before bind+execute, if it knows the server version is good enough.
>
> That is nonsense. You're changing the protocol, and then saying
> that clients should consult the server version instead of the
> protocol version to know what to do.
>
>> 2. Can we postpone minor data format changes on the wire until there
>> is a proper way for clients to request on-the-wire formats?
>
> I think that people are coming around to that position, ie, we need
> a well-engineered solution to the versioning problem *first*, and
> should not accept incompatible minor improvements until we have that.

One simple way clients could detect the binary encoding at startup would be to pass known test parameters and match against the returned values. If the client cannot match the response, then it should choose the text representation.

Alternatively, the 16-bit int in the Bind and RowDescription messages could be incremented to indicate a new format and then clients can specify the highest "version" of the binary format which they support.

Cheers,
M


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 21:45:13
Message-ID: CAHyXU0zST1SFM5syBEMxOG4uMpTKpysgFUnkPqQSSkCZedG_JA@mail.gmail.com

On Mon, Jan 23, 2012 at 2:00 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
> One simple way clients could detect the binary encoding at startup would be to pass known test parameters and match against the returned values. If the client cannot match the response, then it should choose the text representation.
>
> Alternatively, the 16-bit int in the Bind and RowDescription messages could be incremented to indicate a new format and then clients can specify the highest "version" of the binary format which they support.

Prefer the version. But why send this over and over with each Bind?
Wouldn't you negotiate that when connecting? Most likely optionally,
inferring as much as you can from the server version? Personally I'm not
really enthusiastic about a solution that adds an unavoidable penalty
to all queries.

Also, a small nit: this problem is not specific to binary formats.
Text formats can and do change, albeit rarely, with predictable
headaches for the client. I see no reason to deal with text/binary
differently. The only difference between text/binary wire formats in
my eyes is that the text formats are documented.

merlin


From: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 22:12:32
Message-ID: D82FF865-F9F7-45BB-9276-C562164F2CD8@themactionfaction.com


On Jan 23, 2012, at 4:45 PM, Merlin Moncure wrote:

> On Mon, Jan 23, 2012 at 2:00 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
>> One simple way clients could detect the binary encoding at startup would be to pass known test parameters and match against the returned values. If the client cannot match the response, then it should choose the text representation.
>>
>> Alternatively, the 16-bit int in the Bind and RowDescription messages could be incremented to indicate a new format and then clients can specify the highest "version" of the binary format which they support.
>
> Prefer the version. But why send this over and over with each bind?
> Wouldn't you negotiate that when connecting? Most likely, optionally,
> doing as much as you can from the server version? Personally I'm not
> really enthusiastic about a solution that adds a non-avoidable penalty
> to all queries.
>
> Also, a small nit: this problem is not specific to binary formats.
> Text formats can and do change, albeit rarely, with predictable
> headaches for the client. I see no reason to deal with text/binary
> differently. The only difference between text/binary wire formats in
> my eyes are that the text formats are documented.
>
> merlin

In terms of backwards compatibility (to support the widest range of clients), wouldn't it make sense to freeze each format option? That way, an updated text version could also assume a new int16 format identifier. The client would simply pass its preferred format. This could also allow for multiple in-flight formats; for example, if a client anticipates a large in-bound bytea column, it could specify format X, which indicates the server should gzip the result before sending. That same format may not be preferable on a different request.

Cheers,
M


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 22:49:42
Message-ID: CAHyXU0x5kVgaBqP-i5zCZG9RKriw3UVsA8ff2EL9O3MnwN60ZA@mail.gmail.com

On Mon, Jan 23, 2012 at 4:12 PM, A.M. <agentm(at)themactionfaction(dot)com> wrote:
> On Jan 23, 2012, at 4:45 PM, Merlin Moncure wrote:
>> Prefer the version.  But why send this over and over with each bind?
>> Wouldn't you negotiate that when connecting? Most likely, optionally,
>> doing as much as you can from the server version?  Personally I'm not
>> really enthusiastic about a solution that adds a non-avoidable penalty
>> to all queries.
>
> In terms of backwards compatibility (to support the widest range of clients), wouldn't it make sense to freeze each format option? That way, an updated text version could also assume a new int16 format identifier. The client would simply pass its preferred format. This could also allow for multiple in-flight formats; for example, if a client anticipates a large in-bound bytea column, it could specify format X, which indicates the server should gzip the result before sending. That same format may not be preferable on a different request.

hm. well, I'd say that you're much better off if you can hold to the
principle that newer versions of the format are always better and
should be used if both the application and the server agree. Using
your example, since you can already do something like:

select zlib_compress(byteacol) from foo;

I'm not sure that you're getting anything with that user-facing
complexity. The only realistic case I can see for explicit control of
the wire formats chosen is to defend your application from format changes
in the server when upgrading the server and/or libpq. This isn't a
"let's get better compression" problem, this is an "I upgraded my
database and my application broke" problem.

Fixing this problem in a non-documentation fashion is going to require a
full protocol change, period. It's the only way we can safely get all
the various players (libpq, jdbc, etc) on the same page without
breaking/recompiling millions of lines of old code that is currently
in production. The new protocol should *require*, at minimum, that the
application, not libpq, explicitly send the version of the database
it was coded against. That's just not getting sent now, and without
that information there's no realistic way to prevent application
breakage -- depending on libpq versions is useless since libpq can be
upgraded, and there's always jdbc to deal with.

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-24 14:26:59
Message-ID: CA+TgmoaHj53jt239u80EZKM2wuo5s5EjC9zNLyx+9jVBQdSf4Q@mail.gmail.com

On Mon, Jan 23, 2012 at 5:49 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> I'm not sure that you're getting anything with that user facing
> complexity.  The only realistic case I can see for explicit control of
> wire formats chosen is to defend your application from format changes
> in the server when upgrading the server and/or libpq.   This isn't a
> "let's get better compression problem", this is "I upgraded my
> database and my application broke" problem.
>
> Fixing this problem in non documentation fashion is going to require a
> full protocol change, period.

Our current protocol allocates a 2-byte integer for the purpose of
specifying the format of each parameter, and another 2-byte integer for
the purpose of specifying each result format... but only one bit is
really needed at present: text or binary. If we revise the protocol
version at some point, we might want to use some of that bit space to
allow more fine-grained negotiation of the wire format. So,
for example, we might define the top 5 bits as reserved (always pass
zero), the next bit as a text/binary flag, and the remaining 10 bits
as a 10-bit "format version number". When a change like this comes
along, we can bump the highest binary format version recognized by the
server, and clients who request the new version can get it.

Alternatively, we might conclude that a 2-byte integer for each
parameter is overkill and try to cut back... but the point is there's
a bunch of unused bitspace there now. In theory we could even do
something like this without bumping the protocol version, since the
documentation seems clear that any value other than 0 and 1 yields
undefined behavior, but in practice that seems like it might be a bit
too edgy.
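
A sketch of how such a format word could be packed and unpacked; the bit
positions and macro names here are hypothetical, derived only from the
division described above (5 reserved bits, 1 text/binary bit, 10 version
bits):

```c
#include <stdint.h>

/* Hypothetical layout of the 16-bit per-column format word:
 * bits 15-11 reserved (always zero), bit 10 = text/binary flag,
 * bits 9-0 = format version number. */
#define FMT_BINARY_FLAG   ((uint16_t) 0x0400)
#define FMT_VERSION_MASK  ((uint16_t) 0x03FF)

static uint16_t
make_format_code(int is_binary, uint16_t version)
{
    return (uint16_t) ((is_binary ? FMT_BINARY_FLAG : 0) |
                       (version & FMT_VERSION_MASK));
}

static int
format_is_binary(uint16_t code)
{
    return (code & FMT_BINARY_FLAG) != 0;
}

static uint16_t
format_version(uint16_t code)
{
    return (uint16_t) (code & FMT_VERSION_MASK);
}
```

Note that under this layout today's code 0 still reads as "text, version
0", but today's binary code 1 would read as "text, version 1" -- which
illustrates why doing this without a protocol bump would be a bit too
edgy.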

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-24 16:16:02
Message-ID: CAHyXU0yO=uxLyABa2+xtDnOVd0dhAxG-auvJ2-mX-3w46RA_pA@mail.gmail.com

On Tue, Jan 24, 2012 at 8:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jan 23, 2012 at 5:49 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> I'm not sure that you're getting anything with that user facing
>> complexity.  The only realistic case I can see for explicit control of
>> wire formats chosen is to defend your application from format changes
>> in the server when upgrading the server and/or libpq.   This isn't a
>> "let's get better compression problem", this is "I upgraded my
>> database and my application broke" problem.
>>
>> Fixing this problem in non documentation fashion is going to require a
>> full protocol change, period.
>
> Our current protocol allocates a 2-byte integer for the purposes of
> specifying the type of each parameter, and another 2-byte integer for
> the purpose of specifying the result type... but only one bit is
> really needed at present: text or binary.  If we revise the protocol
> version at some point, we might want to use some of that bit space to
> allow some more fine-grained negotiation of the protocol version.  So,
> for example, we might define the top 5 bits as reserved (always pass
> zero), the next bit as a text/binary flag, and the remaining 10 bits
> as a 10-bit "format version number".  When a change like this comes
> along, we can bump the highest binary format version recognized by the
> server, and clients who request the new version can get it.
>
> Alternatively, we might conclude that a 2-byte integer for each
> parameter is overkill and try to cut back... but the point is there's
> a bunch of unused bitspace there now.  In theory we could even do
> something like this without bumping the protocol version since the
> documentation seems clear that any value other than 0 and 1 yields
> undefined behavior, but in practice that seems like it might be a bit
> too edgy.

Yeah. But again, this isn't a contract between libpq and the server,
but between the application and the server...unless you want libpq to
do format translation to something the application can understand (but
even then the application is still involved). I'm not very
enthusiastic about encouraging libpq application authors to pass
format #defines for every single parameter and consumed datum to get
future proofing on wire formats. So I'd vote against any format code
beyond the text/binary switch that currently exists (which, by the
way, while useful, is one of the great sins of libpq that we have to
deal with basically forever). While wire formatting is granular down
to the type level, applications should not have to deal with that.
They should Just Work. So who decides what format code to stuff into
the protocol? Where are the codes defined?

I'm very much in the camp that sometime, presumably during connection
startup, the protocol accepts a non-#defined-in-libpq token (database
version?) from the application that describes to the server what wire
formats can be used and the server sends one back.  There probably have
to be some additional facilities for non-core types, but let's put that
aside for the moment. Those two tokens allow the server to pick the
highest supported wire format (text and binary!) that everybody
understands. The server's token is useful if we're being fancy and we
want libpq to translate an older server's wire format to a newer one
for the application. This of course means moving some of the type
system into the client, which is something we might not want to do
since among other things it puts a heavy burden on non-libpq driver
authors (but then again, they can always stay on the v3 protocol,
which can benefit from being frozen in terms of wire formats).
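
The core of that handshake reduces to each side advertising the newest
wire-format revision it understands and both settling on the lower. A
toy sketch (nothing like this exists in the v3 protocol; the token is
modeled as a plain integer here):

```c
/* Toy model of the startup negotiation described above: the application
 * supplies the newest wire-format revision it was coded against, the
 * server answers with its own, and both sides use the minimum. */
static int
negotiate_wire_format(int client_max, int server_max)
{
    return client_max < server_max ? client_max : server_max;
}
```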

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-24 17:55:56
Message-ID: CA+TgmoYmM1wgN4Qpmh4qeBCuC68OHFnVUBjfFx7erUwXZCSiqg@mail.gmail.com

On Tue, Jan 24, 2012 at 11:16 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> Our current protocol allocates a 2-byte integer for the purposes of
>> specifying the type of each parameter, and another 2-byte integer for
>> the purpose of specifying the result type... but only one bit is
>> really needed at present: text or binary.  If we revise the protocol
>> version at some point, we might want to use some of that bit space to
>> allow some more fine-grained negotiation of the protocol version.  So,
>> for example, we might define the top 5 bits as reserved (always pass
>> zero), the next bit as a text/binary flag, and the remaining 10 bits
>> as a 10-bit "format version number".  When a change like this comes
>> along, we can bump the highest binary format version recognized by the
>> server, and clients who request the new version can get it.
>>
>> Alternatively, we might conclude that a 2-byte integer for each
>> parameter is overkill and try to cut back... but the point is there's
>> a bunch of unused bitspace there now.  In theory we could even do
>> something like this without bumping the protocol version since the
>> documentation seems clear that any value other than 0 and 1 yields
>> undefined behavior, but in practice that seems like it might be a bit
>> too edgy.
>
> Yeah.  But again, this isn't a contract between libpq and the server,
> but between the application and the server...

I don't see how this is relevant. The text/binary format flag is
there in both libpq and the underlying protocol.

>  So I'd vote against any format code
> beyond the text/binary switch that currently exists (which, by the
> way, while useful, is one of the great sins of libpq that we have to
> deal with basically forever).  While wire formatting is granular down
> to the type level, applications should not have to deal with that.
> They should Just Work.  So who decides what format code to stuff into
> the protocol?  Where are the codes defined?
>
> I'm very much in the camp that sometime, presumably during connection
> startup, the protocol accepts a non-#defined-in-libpq token (database
> version?) from the application that describes to the server what wire
> formats can be used and the server sends one back.  There probably has
> to be some additional facilities for non-core types but let's put that
> aside for the moment.  Those two tokens allow the server to pick the
> highest supported wire format (text and binary!) that everybody
> understands.  The server's token is useful if we're being fancy and we
> want libpq to translate an older server's wire format to a newer one
> for the application.  This of course means moving some of the type
> system into the client, which is something we might not want to do
> since among other things it puts a heavy burden on non-libpq driver
> authors (but then again, they can always stay on the v3 protocol,
> which can benefit from being frozen in terms of wire formats).

I think it's sensible for the server to advertise a version to the
client, but I don't see how you can dismiss add-on types so blithely.
The format used to represent any given type is logically a property of
that type, and only for built-in types is that associated with the
server version.

I do wonder whether we are making a mountain out of a mole-hill here,
though. If I properly understand the proposal on the table (and it's
possible that I don't), the new format is
self-identifying: when the optimization is in use, it sets a bit that
previously would always have been clear. So if we just go ahead and
change this, clients that have been updated to understand the new
format will work just fine. The server uses the proposed optimization
only for arrays that meet certain criteria, so any properly updated
client must still be able to handle the case where that bit isn't set.
On the flip side, clients that aren't expecting the new optimization
might break. But that's, again, no different than what happened when
we changed the default bytea output format. If you get bit, you
either update your client or shut off the optimization and deal with
the performance consequences of so doing. In fact, the cases are
almost perfectly analogous, because in each case the proposal was
based on the size of the output format being larger than necessary,
and wanting to squeeze it down to a smaller size for compactness.
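
That self-identification is possible because the binary array header
already carries a 32-bit flags word in which only the has-nulls bit is
ever set today. A client-side check might look like this; the new bit's
name and position are invented here for illustration, not taken from the
patch:

```c
#include <stdint.h>

#define ARR_HASNULL_FLAG    ((uint32_t) 0x00000001)  /* existing: NULLs present */
#define ARR_FIXEDELEM_FLAG  ((uint32_t) 0x00000002)  /* hypothetical: per-element
                                                        lengths omitted */

/* Branch on the data itself: an updated client handles both layouts,
 * while an old client sees an unexpected flag bit and can fail cleanly
 * instead of misparsing the element data. */
static int
array_has_fixed_elem_optimization(uint32_t flags)
{
    return (flags & ARR_FIXEDELEM_FLAG) != 0;
}
```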

And more generally, does anyone really expect that we're never going
to change the output format of any type we support ever again, without
retaining infinite backward compatibility? I didn't hear any screams
of outrage when we updated the hyphenation rules for contrib/isbn -
well, ok, there were some howls, but that was because the rules were
still incomplete and US-centric, not so much because people thought it
was unacceptable for the hyphenation rules to be different in major
release N+1 than they were in major release N. If the IETF goes and
defines a new standard for formatting IPv6 addresses, we're likely to
eventually support it via the inet and cidr datatypes. The only
things that seem reasonably immune to future changes are text and
numeric, but even with numeric it's not impossible that the maximum
available precision or scale could eventually be different than what
it is now. I think it's unrealistic to suppose that new major
releases won't ever require drivers or applications to make any
updates. My first experience with this was an application that got
broken by the addition of attisdropped, and sure, I spent a day
cursing, but would I be happier if PostgreSQL didn't support dropping
columns? No, not really.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-24 21:33:46
Message-ID: CAHyXU0xNajU_a9Kq=AwMD-vUyeBNoT1g=de7PaawtYTLRwBZeA@mail.gmail.com

On Tue, Jan 24, 2012 at 11:55 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I do wonder whether we are making a mountain out of a mole-hill here,
> though.  If I properly understand the proposal on the table, which
> it's possible that I don't, but if I do, the new format is
> self-identifying: when the optimization is in use, it sets a bit that
> previously would always have been clear.  So if we just go ahead and
> change this, clients that have been updated to understand the new
> format will work just fine.  The server uses the proposed optimization
> only for arrays that meet certain criteria, so any properly updated
> client must still be able to handle the case where that bit isn't set.
>  On the flip side, clients that aren't expecting the new optimization
> might break.  But that's, again, no different than what happened when
> we changed the default bytea output format.  If you get bit, you
> either update your client or shut off the optimization and deal with
> the performance consequences of so doing.

Well, the bytea experience was IMNSHO a complete disaster (it was
earlier mentioned that jdbc clients were silently corrupting bytea
datums) and should be held up as an example of how *not* to do things;
it's better to avoid having to depend on the GUC or defensive
programmatic intervention to prevent further occurrences of
application failure since the former doesn't work and the latter won't
be reliably done. Waiting for applications to break in the field only
to point affected users at the GUC is weak sauce. It's creating a
user culture that is terrified of database upgrades, which hurts
everybody.

Database apps tend to have long lives in computer terms such that they
can greatly outlive the service life of a particular postgres dot
release or even the programmers who originally wrote the application.
I'm not too concerned about the viability of a programming department
with Robert Haas at the helm, but what about when he leaves? What
about critical 3rd party software that is no longer maintained?

In regards to the array optimization, I think it's great -- but if you
truly want to avoid blowing up user applications, it needs to be
disabled automatically.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 01:13:48
Message-ID: 24418.1327454028@sss.pgh.pa.us
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Tue, Jan 24, 2012 at 11:55 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I do wonder whether we are making a mountain out of a mole-hill here,
>> though. If I properly understand the proposal on the table, which
>> it's possible that I don't, but if I do, the new format is
>> self-identifying: when the optimization is in use, it sets a bit that
>> previously would always have been clear. So if we just go ahead and
>> change this, clients that have been updated to understand the new
>> format will work just fine. The server uses the proposed optimization
>> only for arrays that meet certain criteria, so any properly updated
>> client must still be able to handle the case where that bit isn't set.
>> On the flip side, clients that aren't expecting the new optimization
>> might break. But that's, again, no different than what happened when
>> we changed the default bytea output format. If you get bit, you
>> either update your client or shut off the optimization and deal with
>> the performance consequences of so doing.

> Well, the bytea experience was IMNSHO a complete disaster (It was
> earlier mentioned that jdbc clients were silently corrupting bytea
> datums) and should be held up as an example of how *not* to do things;

Yeah. In both cases, the (proposed) new output format is
self-identifying *to clients that know what to look for*. Unfortunately
it would only be the most anally-written pre-existing client code that
would be likely to spit up on the unexpected variations. What's much
more likely to happen, and did happen in the bytea case, is silent data
corruption. The lack of redundancy in binary data makes this even more
likely, and the documentation situation makes it even worse. If we had
had a clear binary-data format spec from day one that told people that
they must check for unexpected contents of the flag field and fail, then
maybe we could get away with considering not doing so to be a
client-side bug ... but I really don't think we have much of a leg to
stand on given the poor documentation we've provided.
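Tom's point about checking the flag field can be made concrete. Below is a minimal Python sketch of a defensive client-side decoder (hypothetical code, not taken from any actual driver), assuming the binary array header layout of ndim, flags, and element OID: it fails loudly on any flag bit it does not recognize, instead of silently misreading the element data that follows.

```python
import struct

KNOWN_FLAGS = 0x00000001  # bit 0: array contains NULLs (the existing format)

def parse_array_header(buf):
    """Parse the fixed header of a binary array value and reject any
    flag bit we were not written to understand (hypothetical client)."""
    # Header: int32 ndim, int32 flags, uint32 element OID,
    # then per dimension: int32 length, int32 lower bound.
    ndim, flags, elemoid = struct.unpack_from('!iiI', buf, 0)
    unknown = flags & ~KNOWN_FLAGS
    if unknown:
        # Fail instead of guessing - this is the check a pre-existing
        # client would need in order to survive a format change safely.
        raise ValueError('unrecognized array flag bits: 0x%x' % unknown)
    dims = [struct.unpack_from('!ii', buf, 12 + 8 * i) for i in range(ndim)]
    return ndim, flags, elemoid, dims
```

A client written this way would reject a value carrying an unexpected optimization bit cleanly, rather than corrupting data.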

> In regards to the array optimization, I think it's great -- but if you
> truly want to avoid blowing up user applications, it needs to be
> disabled automatically.

Right. We need to fix things so that this format will not be sent to
clients unless the client code has indicated ability to accept it.
A GUC is a really poor proxy for that.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 02:33:52
Message-ID: CA+TgmoY=BF-dW2V9Jos_MCXt45zU=M2c0KC_9L=GYpv9QM4vMA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 24, 2012 at 8:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, the bytea experience was IMNSHO a complete disaster (It was
>> earlier mentioned that jdbc clients were silently corrupting bytea
>> datums) and should be held up as an example of how *not* to do things;
>
> Yeah.  In both cases, the (proposed) new output format is
> self-identifying *to clients that know what to look for*.  Unfortunately
> it would only be the most anally-written pre-existing client code that
> would be likely to spit up on the unexpected variations.  What's much
> more likely to happen, and did happen in the bytea case, is silent data
> corruption.  The lack of redundancy in binary data makes this even more
> likely, and the documentation situation makes it even worse.  If we had
> had a clear binary-data format spec from day one that told people that
> they must check for unexpected contents of the flag field and fail, then
> maybe we could get away with considering not doing so to be a
> client-side bug ... but I really don't think we have much of a leg to
> stand on given the poor documentation we've provided.
>
>> In regards to the array optimization, I think it's great -- but if you
>> truly want to avoid blowing up user applications, it needs to be
>> disabled automatically.
>
> Right.  We need to fix things so that this format will not be sent to
> clients unless the client code has indicated ability to accept it.
> A GUC is a really poor proxy for that.

OK. It seems clear to me at this point that there is no appetite for
this patch in its present form:

https://commitfest.postgresql.org/action/patch_view?id=715

Furthermore, while we haven't settled the question of exactly what a
good negotiation facility would look like, we seem to agree that a GUC
isn't it. I think that means this isn't going to happen for 9.2, so
we should mark this patch Returned with Feedback and return to this
topic for 9.3.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 10:10:13
Message-ID: 1327486213.732.2.camel@fsopti579.F-Secure.com
Lists: pgsql-hackers

On tis, 2012-01-24 at 20:13 -0500, Tom Lane wrote:
> Yeah. In both cases, the (proposed) new output format is
> self-identifying *to clients that know what to look for*.
> Unfortunately it would only be the most anally-written pre-existing
> client code that would be likely to spit up on the unexpected
> variations. What's much more likely to happen, and did happen in the
> bytea case, is silent data corruption.

The problem in the bytea case is that the client libraries are written
to ignore encoding errors. No amount of protocol versioning will help
you in that case.


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 11:08:49
Message-ID: 20120125110849.GA20084@gmail.com
Lists: pgsql-hackers

On Tue, Jan 24, 2012 at 09:33:52PM -0500, Robert Haas wrote:
> Furthermore, while we haven't settled the question of exactly what a
> good negotiation facility would look like, we seem to agree that a GUC
> isn't it. I think that means this isn't going to happen for 9.2, so
> we should mark this patch Returned with Feedback and return to this
> topic for 9.3.

Simply extending the text/bin flags should be a quite
uncontroversial first step. How to express the
capability in the startup packet, I leave to others to decide.

But my proposal would be the following:

bit 0 : text/bin
bit 1..15 : format version number, maps to best formats in some
Postgres version.

It does not solve the resultset problem, where I'd like to say
"gimme well-known types in optimal representation, others in text".
I don't know the perfect solution for that, but I suspect the
biggest danger here is the urge to go to maximal complexity
immediately. So perhaps a good idea would be to simply give one
additional bit (0x8000?) in the result flag to say that only
well-known types should be optimized. That should cover 95%
of use-cases, and we can design more flexible packet format
when we know more about actual needs.

libpq suggestions:

PQsetformatcodes(bool)
only if it's called with TRUE, it starts interpreting
text/bin codes as non-bools. IOW, we will be compatible
with old code using -1 as TRUE.

protocol suggestions:

On startup the server sends the highest supported text/bin codes,
and gives an error if it finds a higher code than supported.
Poolers/proxies with different server versions in the pool
will simply give out the lowest common code.
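Pulled together, the format-code layout and the pooler behaviour above might look like the following (a hypothetical Python sketch following the mail's bit layout; the function names are invented for illustration, not an actual libpq API):

```python
TEXT, BINARY = 0, 1
WELL_KNOWN_ONLY = 0x8000  # tentative result-only bit from the proposal

def make_format_code(binary, version, well_known_only=False):
    # bit 0: text/bin; the version lands in bits 1..14 here, since
    # bit 15 is tentatively reserved for the well-known-only flag.
    assert 0 <= version <= 0x3FFF
    code = (version << 1) | (BINARY if binary else TEXT)
    if well_known_only:
        code |= WELL_KNOWN_ONLY
    return code

def pooler_advertised_version(client_max, server_versions):
    # A pooler fronting servers of mixed versions hands out the
    # lowest common format version, so no client ever requests a
    # format some backend cannot produce.
    return min([client_max] + list(server_versions))
```

Under this scheme a legacy client that only ever sends 0 or 1 keeps getting today's formats, since version 0 maps to the current representations.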

Small Q&A, to put obvious aspects into writing
----------------------------------------------

* Does that mean we need to keep old formats around infinitely?

Yes. On-wire formats have *much* higher visibility than
on-disk formats. Also, except for some basic types, they are
not parsed in adapters but in client code; libpq offers the
least help in that respect.

Basically - changing on-wire formatting is a big deal;
don't do it willy-nilly.

* Does that mean we cannot turn on new formats automatically?

Yes. Should be obvious.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 15:23:14
Message-ID: 27705.1327504994@sss.pgh.pa.us
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> On Tue, Jan 24, 2012 at 09:33:52PM -0500, Robert Haas wrote:
>> Furthermore, while we haven't settled the question of exactly what a
>> good negotiation facility would look like, we seem to agree that a GUC
>> isn't it. I think that means this isn't going to happen for 9.2, so
>> we should mark this patch Returned with Feedback and return to this
>> topic for 9.3.

> Simply extending the text/bin flags should be quite
> uncontroversial first step. How to express the
> capability in startup packet, I leave to others to decide.

> But my proposal would be following:

> bit 0 : text/bin
> bit 1..15 : format version number, maps to best formats in some
> Postgres version.

> It does not solve the resultset problem, where I'd like to say
> "gimme well-known types in optimal representation, others in text".
> I don't know the perfect solution for that, but I suspect the
> biggest danger here is the urge to go to maximal complexity
> immediately. So perhaps the good idea would simply give one
> additional bit (0x8000?) in result flag to say that only
> well-known types should be optimized. That should cover 95%
> of use-cases, and we can design more flexible packet format
> when we know more about actual needs.

Huh? How can that work? If we decide to change the representation of
some other "well known type", say numeric, how do we decide whether a
client setting that bit is expecting that change or not?

regards, tom lane


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 15:53:45
Message-ID: 20120125155345.GA22639@gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
> > On Tue, Jan 24, 2012 at 09:33:52PM -0500, Robert Haas wrote:
> >> Furthermore, while we haven't settled the question of exactly what a
> >> good negotiation facility would look like, we seem to agree that a GUC
> >> isn't it. I think that means this isn't going to happen for 9.2, so
> >> we should mark this patch Returned with Feedback and return to this
> >> topic for 9.3.
>
> > Simply extending the text/bin flags should be quite
> > uncontroversial first step. How to express the
> > capability in startup packet, I leave to others to decide.
>
> > But my proposal would be following:
>
> > bit 0 : text/bin
> > bit 1..15 : format version number, maps to best formats in some
> > Postgres version.
>
> > It does not solve the resultset problem, where I'd like to say
> > "gimme well-known types in optimal representation, others in text".
> > I don't know the perfect solution for that, but I suspect the
> > biggest danger here is the urge to go to maximal complexity
> > immediately. So perhaps the good idea would simply give one
> > additional bit (0x8000?) in result flag to say that only
> > well-known types should be optimized. That should cover 95%
> > of use-cases, and we can design more flexible packet format
> > when we know more about actual needs.
>
> Huh? How can that work? If we decide to change the representation of
> some other "well known type", say numeric, how do we decide whether a
> client setting that bit is expecting that change or not?

It sets that bit *and* the version code - which means that it is
up-to-date with all "well-known" type formats in that version.

The key here is to sanely define the "well-known" types
and document them, so clients can be up to date with them.

Variants:
- All built-in and contrib types in some Postgres version
- All built-in types in some Postgres version
- Most common types (text, numeric, bytes, int, float, bool, ..)

Also, as we have only one bit, the set of types cannot be
extended. (Unless we provide more bits for that, but that
may get too confusing?)

Basically, I see 2 scenarios here:

1) Client knows the result types and can set the
text/bin/version code safely, without further restrictions.

2) There is a generic framework that does not know query contents
but can be expected to track Postgres versions closely.
Such a framework cannot say "binary" for results safely,
but *could* do it for some well-defined subset of types.

Of course, it may be that 2) is not worth supporting, as
frameworks can throw errors on their own if they find a
format that they cannot parse. Then the user needs
to either register their own parser, or simply turn off
optimized formats to get the plain-text values.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 16:40:28
Message-ID: 298.1327509628@sss.pgh.pa.us
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
>> Huh? How can that work? If we decide to change the representation of
>> some other "well known type", say numeric, how do we decide whether a
>> client setting that bit is expecting that change or not?

> It sets that bit *and* version code - which means that it is
> up-to-date with all "well-known" type formats in that version.

Then why bother with the bit in the format code? If you've already done
some other negotiation to establish what datatype formats you will
accept, this doesn't seem to be adding any value.

> Basically, I see 2 scenarios here:

> 1) Client knows the result types and can set the
> text/bin/version code safely, without further restrictions.

> 2) There is generic framework, that does not know query contents
> but can be expected to track Postgres versions closely.
> Such framework cannot say "binary" for results safely,
> but *could* do it for some well-defined subset of types.

The hole in approach (2) is that it supposes that the client side knows
the specific datatypes in a query result in advance. While this is
sometimes workable for application-level code that knows what query it's
issuing, it's really entirely untenable for a framework or library.
The only way that a framework can deal with arbitrary queries is to
introduce an extra round trip (Describe step) to see what datatypes
the query will produce so it can decide what format codes to issue
... and that will pretty much eat up any time savings you might get
from a more efficient representation.

You really want to do the negotiation once, at connection setup, and
then be able to process queries without client-side prechecking of what
data types will be sent back.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 17:03:21
Message-ID: CA+TgmobW0xS6zOdZOsy4OsGD5ivAXWD29xbidUygad8+yCD9oA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 11:40 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
>> On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
>>> Huh?  How can that work?  If we decide to change the representation of
>>> some other "well known type", say numeric, how do we decide whether a
>>> client setting that bit is expecting that change or not?
>
>> It sets that bit *and* version code - which means that it is
>> up-to-date with all "well-known" type formats in that version.
>
> Then why bother with the bit in the format code?  If you've already done
> some other negotiation to establish what datatype formats you will
> accept, this doesn't seem to be adding any value.
>
>> Basically, I see 2 scenarios here:
>
>> 1) Client knows the result types and can set the
>> text/bin/version code safely, without further restrictions.
>
>> 2) There is generic framework, that does not know query contents
>> but can be expected to track Postgres versions closely.
>> Such framework cannot say "binary" for results safely,
>> but *could* do it for some well-defined subset of types.
>
> The hole in approach (2) is that it supposes that the client side knows
> the specific datatypes in a query result in advance.  While this is
> sometimes workable for application-level code that knows what query it's
> issuing, it's really entirely untenable for a framework or library.
> The only way that a framework can deal with arbitrary queries is to
> introduce an extra round trip (Describe step) to see what datatypes
> the query will produce so it can decide what format codes to issue
> ... and that will pretty much eat up any time savings you might get
> from a more efficient representation.
>
> You really want to do the negotiation once, at connection setup, and
> then be able to process queries without client-side prechecking of what
> data types will be sent back.

What might work is for clients to advertise a list of capability
strings, like "compact_array_format", at connection startup time. The
server can then adjust its behavior based on that list. But the
problem with that is that as we make changes to the wire protocol, the
list of capabilities clients need to advertise could get pretty long
in a hurry. A simpler alternative is to have the client send a server
version along with the initial connection attempt and have the server
do its best not to use any features that weren't present in that
server version - but that seems to leave user-defined types out in the
cold.
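As an illustration only, the capability-string idea could amount to a simple set intersection at startup (the feature names and functions here are invented for the sketch, not an actual protocol extension):

```python
# Optional wire-format changes this server knows how to suppress;
# 'compact_array_format' is a hypothetical example name.
SERVER_OPTIONAL_FEATURES = {'compact_array_format'}

def enabled_features(client_advertised):
    """The server enables only the optional wire-format changes the
    client explicitly claimed to understand at connection startup;
    everything else stays in the fully backward-compatible form."""
    return SERVER_OPTIONAL_FEATURES & set(client_advertised)
```

The cost Robert notes is visible even in the sketch: every future format change adds another string that every up-to-date client must remember to advertise.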

I reiterate my previous view that we don't have time to engineer a
good solution to this problem right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 17:24:58
Message-ID: 20120125172458.GA23353@gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 11:40:28AM -0500, Tom Lane wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
> > On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
> >> Huh? How can that work? If we decide to change the representation of
> >> some other "well known type", say numeric, how do we decide whether a
> >> client setting that bit is expecting that change or not?
>
> > It sets that bit *and* version code - which means that it is
> > up-to-date with all "well-known" type formats in that version.
>
> Then why bother with the bit in the format code? If you've already done
> some other negotiation to establish what datatype formats you will
> accept, this doesn't seem to be adding any value.

The "other negotiation" is done via Postgres release notes...

I specifically want to avoid any sort of per-connection
negotiation, except the "max format version supported",
because it will mess up multiplexed usage of a single connection.
Then they need to either disable advanced formats completely,
or still do it per-query somehow (via GUCs?), which is a mess.

Also I don't see any market for "flexible" negotiations;
instead I see that people want 2 things:

- Updated formats are easily available
- Old apps not to break

I might be mistaken here, and if so please correct me,
but currently I'm designing for simplicity.

> > Basically, I see 2 scenarios here:
>
> > 1) Client knows the result types and can set the
> > text/bin/version code safely, without further restrictions.
>
> > 2) There is generic framework, that does not know query contents
> > but can be expected to track Postgres versions closely.
> > Such framework cannot say "binary" for results safely,
> > but *could* do it for some well-defined subset of types.
>
> The hole in approach (2) is that it supposes that the client side knows
> the specific datatypes in a query result in advance. While this is
> sometimes workable for application-level code that knows what query it's
> issuing, it's really entirely untenable for a framework or library.

No, the list of "well-known" types is documented and fixed.
The bit is specifically for frameworks, so that they can say
"I support all well-known types in Postgres version X.Y".

Note that I earlier said the list cannot be extended, but that is wrong.
When this bit and the version code are taken together, they clearly
define "the list as in version X.Y". So considering that the
client should not send any higher version than the server supports,
the server always knows which list the client refers to.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 17:58:15
Message-ID: 3074.1327514295@sss.pgh.pa.us
Lists: pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> On Wed, Jan 25, 2012 at 11:40:28AM -0500, Tom Lane wrote:
>> Then why bother with the bit in the format code? If you've already done
>> some other negotiation to establish what datatype formats you will
>> accept, this doesn't seem to be adding any value.

> The "other negotiation" is done via Postgres release notes...

That is really not going to work if the requirement is to not break old
apps. They haven't read the release notes.

> I specifically want to avoid any sort of per-connection
> negotiation, except the "max format version supported",
> because it will mess up multiplexed usage of a single connection.
> Then they need to either disable advanced formats completely,
> or still do it per-query somehow (via GUCs?), which is a mess.

Hmm, that adds yet another level of not-obvious-how-to-meet requirement.
I tend to concur with Robert that we are not close to a solution.

> No, the list of "well-known" types is documented and fixed.
> The bit is specifically for frameworks, so that they can say
> "I support all well-known types in Postgres version X.Y".

So in other words, if we have a client that contains a framework that
knows about version N, and we connect it up to a server that speaks
version N+1, it suddenly loses the ability to use any version-N
optimizations? That does not meet my idea of not breaking old apps.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 18:54:00
Message-ID: CAHyXU0w-Mftre-oTHXmyekFM91Fr7rPMqBK2hqhA+0_DJMUNcw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> I specifically want to avoid any sort of per-connection
> negotiation, except the "max format version supported",
> because it will mess up multiplexed usage of a single connection.
> Then they need to either disable advanced formats completely,
> or still do it per-query somehow (via GUCs?), which is a mess.

Being able to explicitly pick a format version other than the one the
application was specifically written against adds a lot of complexity
and needs to be justified. Maybe you're trying to translate data
between two differently versioned servers? I'm trying to understand
the motive behind your wanting finer-grained control over picking the
format version...

merlin


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 19:17:13
Message-ID: 20120125191713.GA23859@gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 12:58:15PM -0500, Tom Lane wrote:
> Marko Kreen <markokr(at)gmail(dot)com> writes:
> > On Wed, Jan 25, 2012 at 11:40:28AM -0500, Tom Lane wrote:
> >> Then why bother with the bit in the format code? If you've already done
> >> some other negotiation to establish what datatype formats you will
> >> accept, this doesn't seem to be adding any value.
>
> > The "other negotiation" is done via Postgres release notes...
>
> That is really not going to work if the requirement is to not break old
> apps. They haven't read the release notes.

Yes, but they also keep requesting the old formats, so everything is fine?
Note that formats are under the full control of the client; the server has
no way to send newer formats to a client that has not requested them.

> > I specifically want to avoid any sort of per-connection
> > negotiation, except the "max format version supported",
> > because it will mess up multiplexed usage of a single connection.
> > Then they need to either disable advanced formats completely,
> > or still do it per-query somehow (via GUCs?), which is a mess.
>
> Hmm, that adds yet another level of not-obvious-how-to-meet requirement.
> I tend to concur with Robert that we are not close to a solution.

Well, my simple scheme seems to work fine with such a requirement.

[My scheme - client-supplied 16bit type code is only thing
that decides format.]
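
As a rough illustration only (hypothetical Python, not anything in PostgreSQL), the scheme amounts to a per-(type, format-code) lookup on the server side, where old codes are kept forever and an unknown code is an error rather than a silent fallback:

```python
# Hypothetical sketch of the scheme: the 16-bit format code supplied by the
# client is the only thing that decides how a value is serialized.  Codes 0
# (text) and 1 (binary) mirror the existing protocol; anything else is
# illustrative.
TEXT, BINARY_V1 = 0, 1

# Per-type serializers keyed by (type name, format code).  Old entries stay
# forever, since old clients keep requesting the old codes.
SERIALIZERS = {
    ("int4", TEXT): lambda v: str(v).encode(),
    ("int4", BINARY_V1): lambda v: v.to_bytes(4, "big", signed=True),
}

def serialize(typ, value, requested_code):
    """Serialize using exactly the format the client asked for.

    The server never "upgrades" a client: if it does not know the requested
    code for this type, that is an error, not a silent fallback.
    """
    try:
        return SERIALIZERS[(typ, requested_code)](value)
    except KeyError:
        raise ValueError(f"format code {requested_code} not supported for {typ}")
```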

> > No, the list of "well-known" types is documented and fixed.
> > The bit is specifically for frameworks, so that they can say
> > "I support all well-known types in Postgres version X.Y".
>
> So in other words, if we have a client that contains a framework that
> knows about version N, and we connect it up to a server that speaks
> version N+1, it suddenly loses the ability to use any version-N
> optimizations? That does not meet my idea of not breaking old apps.

That is up to the Postgres maintainers to decide, whether they want
to phase out some type from the list. But my main point was that
it's OK to add types to the list. I missed that aspect in my
previous mail.

--
marko


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 19:24:10
Message-ID: 20120125192410.GB23859@gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 12:54:00PM -0600, Merlin Moncure wrote:
> On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> > I specifically want to avoid any sort of per-connection
> > negotation, except the "max format version supported",
> > because it will mess up multiplexed usage of single connection.
> > Then they need to either disabled advanced formats completely,
> > or still do it per-query somehow (via GUCs?) which is mess.
>
> Being able to explicitly pick format version other than the one the
> application was specifically written against adds a lot of complexity
> and needs to be justified. Maybe you're trying to translate data
> between two differently versioned servers? I'm trying to understand
> the motive behind your wanting finer grained control of picking format
> version...

You mean if a client was written with version N formats, but connects
to a server with version N-1 formats? True, simply not supporting
such a case simplifies the client-side API.

But note that it does not change anything at the protocol level; it's purely
client-API specific. It may well be that some higher-level APIs
(JDBC, Npgsql, Psycopg) support such a downgrade, while with lower-level
APIs (raw libpq), it may be optional whether the client wants to
support such usage or not.

--
marko


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 19:43:03
Message-ID: CAHyXU0yBx5_WUZzjmJo_UyoPpX1mabfbPVDa_3NEVpQiW3kGcw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 1:24 PM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> On Wed, Jan 25, 2012 at 12:54:00PM -0600, Merlin Moncure wrote:
>> On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
>> > I specifically want to avoid any sort of per-connection
>> > negotation, except the "max format version supported",
>> > because it will mess up multiplexed usage of single connection.
>> > Then they need to either disabled advanced formats completely,
>> > or still do it per-query somehow (via GUCs?) which is mess.
>>
>> Being able to explicitly pick format version other than the one the
>> application was specifically written against adds a lot of complexity
>> and needs to be justified.  Maybe you're trying to translate data
>> between two differently versioned servers?  I'm trying to understand
>> the motive behind your wanting finer grained control of picking format
>> version...
>
> You mean if client has written with version N formats, but connects
> to server with version N-1 formats?  True, simply not supporting
> such case simplifies client-side API.
>
> But note that it does not change anything on protocol level, it's purely
> client-API specific.  It may well be that some higher-level APIs
> (JDBC, Npgsql, Psycopg) may support such downgrade, but with lower-level
> API-s (raw libpq), it may be optional whether the client wants to
> support such usage or not.

well, I see the following cases:
1) Vserver > Vapplication: server downgrades wire formats to the
application's version
2) Vapplication > Vlibpq > Vserver: since the application is
reading/writing formats the server can't understand, an error should
be raised if they are used in either direction
3) Vlibpq >= VApplication > Vserver: same as above, but libpq can
'upconvert' low version wire format to application's wire format or
error otherwise.

By far, the most common cause of problems (both in terms of severity
and frequency) is case #1. #3 allows a 'compatibility mode' via
libpq, but that comes at significant cost of complexity since libpq
needs to be able to translate wire formats up (but not down). #2/3 is
a less common problem though as it's more likely the application can
be adjusted to get up to speed: so to keep things simple we can maybe
just error out in those scenarios.

In the database, we need to maintain outdated send/recv functions
basically forever and as much as possible try and translate old wire
format data to and from newer backend structures (maybe in very
specific cases that will be impossible such that the application is
SOL, but that should be rare). All send/recv functions, including
user created ones need to be stamped with a version token (database
version?). With the versions of the application, libpq, and all
server functions, we can determine all wire formats as long as we
assume the application's targeted database version represents all the
wire formats it was using.
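
The version-stamping idea could be sketched, in hypothetical Python (none of these names exist in PostgreSQL), as a registry where each send function carries the version that introduced its wire format, and the server picks the newest format no newer than the version the application targets:

```python
# Illustrative sketch only: type -> list of (introduced_in_version, send
# function), oldest first.  Outdated entries are kept "basically forever".
SEND_FUNCS = {
    "intarray": [
        (90100, lambda v: b"full:" + repr(v).encode()),        # baseline format
        (90200, lambda v: b"smallfixed:" + repr(v).encode()),  # newer format
    ],
}

def pick_send_func(typ, app_version):
    # Choose the newest wire format the application's targeted database
    # version is assumed to understand.
    candidates = [(ver, fn) for ver, fn in SEND_FUNCS[typ] if ver <= app_version]
    if not candidates:
        raise ValueError(f"no wire format of {typ} old enough for {app_version}")
    return max(candidates, key=lambda c: c[0])[1]
```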

My good ideas stop there: the exact mechanics of how the usable set of
functions are determined, how exactly the adjusted type look ups will
work, etc. would all have to be sorted out. Most of the nastier parts
though (protocol changes notwithstanding) are not in libpq, but the
server. There's just no quick fix on the client side I can see.

merlin


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 20:29:32
Message-ID: 20120125202932.GA24268@gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 01:43:03PM -0600, Merlin Moncure wrote:
> On Wed, Jan 25, 2012 at 1:24 PM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> > On Wed, Jan 25, 2012 at 12:54:00PM -0600, Merlin Moncure wrote:
> >> On Wed, Jan 25, 2012 at 11:24 AM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> >> > I specifically want to avoid any sort of per-connection
> >> > negotation, except the "max format version supported",
> >> > because it will mess up multiplexed usage of single connection.
> >> > Then they need to either disabled advanced formats completely,
> >> > or still do it per-query somehow (via GUCs?) which is mess.
> >>
> >> Being able to explicitly pick format version other than the one the
> >> application was specifically written against adds a lot of complexity
> >> and needs to be justified.  Maybe you're trying to translate data
> >> between two differently versioned servers?  I'm trying to understand
> >> the motive behind your wanting finer grained control of picking format
> >> version...
> >
> > You mean if client has written with version N formats, but connects
> > to server with version N-1 formats?  True, simply not supporting
> > such case simplifies client-side API.
> >
> > But note that it does not change anything on protocol level, it's purely
> > client-API specific.  It may well be that some higher-level APIs
> > (JDBC, Npgsql, Psycopg) may support such downgrade, but with lower-level
> > API-s (raw libpq), it may be optional whether the client wants to
> > support such usage or not.
>
> well, I see the following cases:
> 1) Vserver > Vapplication: server downgrades wire formats to
> applications version
> 2) Vapplication > Vlibpq > Vserver: since the application is
> reading/writing formats the server can't understand, an error should
> be raised if they are used in either direction
> 3) Vlibpq >= VApplication > Vserver: same as above, but libpq can
> 'upconvert' low version wire format to application's wire format or
> error otherwise.

I don't see why you special-case libpq here. There is no reason
libpq cannot pass older/newer formats through. The only thing that
matters is the parser/formatter version. If that is done in libpq,
then the app version does not matter. If it's done in the app, then
the libpq version does not matter.

> By far, the most common cause of problems (both in terms of severity
> and frequency) is case #1. #3 allows a 'compatibility mode' via
> libpq, but that comes at significant cost of complexity since libpq
> needs to be able to translate wire formats up (but not down). #2/3 is
> a less common problem though as it's more likely the application can
> be adjusted to get up to speed: so to keep things simple we can maybe
> just error out in those scenarios.

I don't like the idea of "conversion". Instead, the client either
writes values through an API that picks the format based on server version,
or it writes them for a specific version only. In the latter case it cannot
work with an older server, unless the fixed version is the baseline.

> In the database, we need to maintain outdated send/recv functions
> basically forever and as much as possible try and translate old wire
> format data to and from newer backend structures (maybe in very
> specific cases that will be impossible such that the application is
> SOL, but that should be rare). All send/recv functions, including
> user created ones need to be stamped with a version token (database
> version?). With the versions of the application, libpq, and all
> server functions, we can determine all wire formats as long as we
> assume the application's targeted database version represents all the
> wire formats it was using.
>
> My good ideas stop there: the exact mechanics of how the usable set of
> functions are determined, how exactly the adjusted type look ups will
> work, etc. would all have to be sorted out. Most of the nastier parts
> though (protocol changes notwithstanding) are not in libpq, but the
> server. There's just no quick fix on the client side I can see.

It does not need to be complex: just pass the version number to the
i/o function and let it decide whether it cares about it or not.
Most functions will not. Only those that we want to change in a
compatible manner need to look at it.
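
A minimal sketch of that idea, in hypothetical Python (the function names and layouts are invented for illustration, not PostgreSQL's actual formats): the i/o function receives the negotiated wire version, and only a type whose format actually changed needs to branch on it.

```python
def int4_send(value, wire_version):
    # This type's format never changed, so the version argument is ignored.
    return value.to_bytes(4, "big", signed=True)

def array_send(elems, wire_version):
    # Only types whose format changed need an if() on the version.
    if wire_version >= 1:
        # hypothetical "smallfixed" layout: element length stated once
        return bytes([len(elems), 4]) + b"".join(
            e.to_bytes(4, "big", signed=True) for e in elems)
    # baseline layout: length repeated before every element
    return bytes([len(elems)]) + b"".join(
        bytes([4]) + e.to_bytes(4, "big", signed=True) for e in elems)
```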

But I don't see a danger of regular changes in wire formats,
so most of the functions will ignore the versioning, including
the ones that don't care about compatibility.

But seriously - on-wire compatibility is good thing, do not fear it...

--
marko


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 20:50:09
Message-ID: CAHyXU0ypQSyWO4DU9rwLo1QyuhxLF7Q3xH7cZ5sGtrRXzjAu6g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 2:29 PM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
>> well, I see the following cases:
>> 1) Vserver > Vapplication: server downgrades wire formats to
>> applications version
>> 2) Vapplication > Vlibpq > Vserver: since the application is
>> reading/writing formats the server can't understand, an error should
>> be raised if they are used in either direction
>> 3) Vlibpq >= VApplication > Vserver: same as above, but libpq can
>> 'upconvert' low version wire format to application's wire format or
>> error otherwise.
>
> I don't see why you special-case libpq here.  There is no reason
> libpq cannot pass older/newer formats through.  Only thing that
> matters it parser/formatter version.  If that is done in libpq,
> then app version does not matter.  If it's done in app, then
> libpq version does not matter.

Only because if the app is targeting wire format N, but the server can
only handle N-1, libpq has the opportunity to fix it up. That could
be just overthinking it though.

>> By far, the most common cause of problems (both in terms of severity
>> and frequency) is case #1.  #3 allows a 'compatibility mode' via
>> libpq, but that comes at significant cost of complexity since libpq
>> needs to be able to translate wire formats up (but not down).  #2/3 is
>> a less common problem though as it's more likely the application can
>> be adjusted to get up to speed: so to keep things simple we can maybe
>> just error out in those scenarios.
>
> I don't like the idea of "conversion".  Instead either client
> writes values through API that picks format based on server version,
> or it writes them for specific version only.  In latter case it cannot
> work with older server.  Unless the fixed version is the baseline.

ok. another point about that: libpq isn't really part of the solution
anyway, since there are other popular fully native protocol consumers,
including (and especially) JDBC, but also Python, node.js, etc.

that's why I was earlier insisting on a protocol bump, so that we
could in the new protocol force application version to be advertised.
v3 would remain caveat emptor for wire formats but v4 would not.

>> In the database, we need to maintain outdated send/recv functions
>> basically forever and as much as possible try and translate old wire
>> format data to and from newer backend structures (maybe in very
>> specific cases that will be impossible such that the application is
>> SOL, but that should be rare).  All send/recv functions, including
>> user created ones need to be stamped with a version token (database
>> version?).  With the versions of the application, libpq, and all
>> server functions, we can determine all wire formats as long as we
>> assume the application's targeted database version represents all the
>> wire formats it was using.
>>
>> My good ideas stop there: the exact mechanics of how the usable set of
>> functions are determined, how exactly the adjusted type look ups will
>> work, etc. would all have to be sorted out.  Most of the nastier parts
>> though (protocol changes notwithstanding) are not in libpq, but the
>> server.  There's just no quick fix on the client side I can see.
>
> It does not need to be complex - just bring the version number to
> i/o function and let it decide whether it cares about it or not.
> Most functions will not..  Only those that we want to change in
> compatible manner need to look at it.

well, maybe instead of passing the version number around, the server
installs the proper compatibility send/recv functions just once at
session start-up, so your code isn't littered with stuff like
if(version > n) do this; else do this;?
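
The alternative floated here could look roughly like this (hypothetical Python; all names invented for illustration): the session resolves the compatibility functions once at start-up, so the per-value path contains no version checks at all.

```python
def timestamp_send_v0(v): return f"old:{v}".encode()  # baseline format
def timestamp_send_v1(v): return f"new:{v}".encode()  # changed format

# type -> {wire version that introduced it: send function}
CATALOG = {"timestamp": {0: timestamp_send_v0, 1: timestamp_send_v1}}

class Session:
    def __init__(self, wire_version):
        # One-time lookup at session start replaces the scattered
        # if(version > n) branching inside each send function.
        self.send = {typ: vers[max(v for v in vers if v <= wire_version)]
                     for typ, vers in CATALOG.items()}

    def send_value(self, typ, value):
        return self.send[typ](value)  # hot path: no version check
```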

> But seriously - on-wire compatibility is good thing, do not fear it...

sure -- but for postgres I just don't think it's realistic, especially
for the binary wire formats. a json based data payload could give it
to you (and I'm only half kidding) :-).

merlin


From: Marko Kreen <markokr(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-25 21:19:36
Message-ID: 20120125211936.GA24482@gmail.com
Lists: pgsql-hackers

On Wed, Jan 25, 2012 at 02:50:09PM -0600, Merlin Moncure wrote:
> On Wed, Jan 25, 2012 at 2:29 PM, Marko Kreen <markokr(at)gmail(dot)com> wrote:
> >> well, I see the following cases:
> >> 1) Vserver > Vapplication: server downgrades wire formats to
> >> applications version
> >> 2) Vapplication > Vlibpq > Vserver: since the application is
> >> reading/writing formats the server can't understand, an error should
> >> be raised if they are used in either direction
> >> 3) Vlibpq >= VApplication > Vserver: same as above, but libpq can
> >> 'upconvert' low version wire format to application's wire format or
> >> error otherwise.
> >
> > I don't see why you special-case libpq here.  There is no reason
> > libpq cannot pass older/newer formats through.  Only thing that
> > matters it parser/formatter version.  If that is done in libpq,
> > then app version does not matter.  If it's done in app, then
> > libpq version does not matter.
>
> Only because if the app is targeting wire format N, but the server can
> only handle N-1, libpq has the opportunity to fix it up. That's could
> be just over thinking it though.

I think it's overthinking. The value should be formatted/parsed just
once. The server side must support processing different versions;
whether the client side supports downgrading is up to client-side
programmers.

If you want to write a compatible client, you have a choice of using
a proper wrapper API, or simply writing baseline formatting, ignoring
format changes in new versions.

Both are valid approaches and I think we should keep it that way.

> >> By far, the most common cause of problems (both in terms of severity
> >> and frequency) is case #1.  #3 allows a 'compatibility mode' via
> >> libpq, but that comes at significant cost of complexity since libpq
> >> needs to be able to translate wire formats up (but not down).  #2/3 is
> >> a less common problem though as it's more likely the application can
> >> be adjusted to get up to speed: so to keep things simple we can maybe
> >> just error out in those scenarios.
> >
> > I don't like the idea of "conversion".  Instead either client
> > writes values through API that picks format based on server version,
> > or it writes them for specific version only.  In latter case it cannot
> > work with older server.  Unless the fixed version is the baseline.
>
> ok. another point about that: libpq isn't really part of the solution
> anyways since there are other popular fully native protocol consumers,
> including (and especially) jdbc, but also python, node.js etc etc.
>
> that's why I was earlier insisting on a protocol bump, so that we
> could in the new protocol force application version to be advertised.
> v3 would remain caveat emptor for wire formats but v4 would not.

We can bump major/minor anyway to inform clients about new
functionality. I don't particularly care about that. What
I'm interested in is what the actual type negotiation looks like.

It might be possible we could get away without bumping anything.
But I have not thought about that angle too deeply yet.

> >> In the database, we need to maintain outdated send/recv functions
> >> basically forever and as much as possible try and translate old wire
> >> format data to and from newer backend structures (maybe in very
> >> specific cases that will be impossible such that the application is
> >> SOL, but that should be rare).  All send/recv functions, including
> >> user created ones need to be stamped with a version token (database
> >> version?).  With the versions of the application, libpq, and all
> >> server functions, we can determine all wire formats as long as we
> >> assume the application's targeted database version represents all the
> >> wire formats it was using.
> >>
> >> My good ideas stop there: the exact mechanics of how the usable set of
> >> functions are determined, how exactly the adjusted type look ups will
> >> work, etc. would all have to be sorted out.  Most of the nastier parts
> >> though (protocol changes notwithstanding) are not in libpq, but the
> >> server.  There's just no quick fix on the client side I can see.
> >
> > It does not need to be complex - just bring the version number to
> > i/o function and let it decide whether it cares about it or not.
> > Most functions will not..  Only those that we want to change in
> > compatible manner need to look at it.
>
> well, maybe instead of passing version number around, the server
> installs the proper compatibility send/recv functions just once on
> session start up so your code isn't littered with stuff like
> if(version > n) do this; else do this;?

Seems confusing. Note that type i/o functions are user-callable;
how should they act then?

Also note that if()s are needed only for types that want to change their
on-wire formatting. Considering the mess an incompatible on-wire format
change can cause, it's a good price to pay.

> > But seriously - on-wire compatibility is good thing, do not fear it...
>
> sure -- but for postgres I just don't think it's realistic, especially
> for the binary wire formats. a json based data payload could give it
> to you (and I'm only half kidding) :-).

I think we are pretty compatible already... Minus the bytea mess.
But mostly thanks to not changing the i/o formats. The question is
how to keep compatibility while allowing changes in type formatting.

--
marko


From: Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>, pgsql-hackers(at)postgresql(dot)org, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: GUC_REPORT for protocol tunables was: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-26 07:47:54
Message-ID: 4F21052A.2020508@nitorcreations.com
Lists: pgsql-hackers

On 01/25/2012 06:40 PM, Tom Lane wrote:
> Marko Kreen<markokr(at)gmail(dot)com> writes:
>> On Wed, Jan 25, 2012 at 10:23:14AM -0500, Tom Lane wrote:
>>> Huh? How can that work? If we decide to change the representation of
>>> some other "well known type", say numeric, how do we decide whether a
>>> client setting that bit is expecting that change or not?
>
>> It sets that bit *and* version code - which means that it is
>> up-to-date with all "well-known" type formats in that version.
>
> Then why bother with the bit in the format code? If you've already done
> some other negotiation to establish what datatype formats you will
> accept, this doesn't seem to be adding any value.
>
>> Basically, I see 2 scenarios here:
>
>> 1) Client knows the result types and can set the
>> text/bin/version code safely, without further restrictions.
>
>> 2) There is generic framework, that does not know query contents
>> but can be expected to track Postgres versions closely.
>> Such framework cannot say "binary" for results safely,
>> but *could* do it for some well-defined subset of types.
>
> The hole in approach (2) is that it supposes that the client side knows
> the specific datatypes in a query result in advance. While this is
> sometimes workable for application-level code that knows what query it's
> issuing, it's really entirely untenable for a framework or library.
> The only way that a framework can deal with arbitrary queries is to
> introduce an extra round trip (Describe step) to see what datatypes
> the query will produce so it can decide what format codes to issue
> ... and that will pretty much eat up any time savings you might get
> from a more efficient representation.

This is pretty much what the JDBC driver already does, since it does not have
100% coverage of even the current binary formats. The first time it executes a
query it requests text encoding, but caches the Describe results. The next
time, it sets the binary bits on all return columns that it knows how to
decode.
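
That behaviour could be sketched as follows (simplified, hypothetical Python; not the actual JDBC driver code): the first execution returns all-text results and caches the described column types, and later executions request binary only for columns the driver can decode.

```python
# Illustrative subset of types the driver knows how to decode in binary.
BINARY_CAPABLE = {"int4", "int8", "float8"}

class QueryCache:
    def __init__(self):
        self.described = {}  # query text -> list of result column type names

    def format_codes(self, query):
        cols = self.described.get(query)
        if cols is None:
            return None  # not yet described: ask for all-text results
        # per-column result format: 1 = binary, 0 = text
        return [1 if t in BINARY_CAPABLE else 0 for t in cols]

    def remember(self, query, column_types):
        # cache the Describe results from the first execution
        self.described[query] = column_types
```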

> You really want to do the negotiation once, at connection setup, and
> then be able to process queries without client-side prechecking of what
> data types will be sent back.

I think my original minor_version patch tried to do that. It introduced a
per-connection setting for the version. The server GUC_REPORTed the maximum
supported minor_version but defaulted to the baseline wire format.
The JDBC client could bump minor_version to a supported higher
value (error if the value is larger than what the server advertised).

A way was provided for the application using the JDBC driver to
override the requested minor_version in the rare event that something
broke (rare, because the JDBC driver generally does not expose the
wire encoding to applications).

Now if pgbouncer and the other pooling solutions reset minor_version
to 0, it should work.

Scenarios where the other end is too old to know about minor_version:
Vserver>>Vlibpq => client does nothing -> use baseline version
Vlibpq>>Vserver => no supported_minor_version in GUC_REPORT -> use baseline

Normal 9.2+ scenarios:
Vserver>Vlibpq => libpq sets minor_version to the largest that it supports
-> libpq-requested version used
Vlibpq>Vserver => libpq notices that the server-supported value is lower than
its own, so it sets minor_version to the server-supported value
-> server version used
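
The scenarios above collapse into one rule, sketched here in hypothetical Python (parameter names are illustrative): take the minimum of what both ends support, falling back to the baseline when either end is too old to know about minor_version at all.

```python
def negotiate_minor_version(client_max, server_reported):
    """Pick the effective wire-format minor_version for a connection.

    client_max      -- largest minor_version the client supports, or None if
                       the client predates the mechanism (it "does nothing")
    server_reported -- the GUC_REPORTed maximum, or None if the server does
                       not advertise supported_minor_version
    """
    if client_max is None or server_reported is None:
        return 0  # either end too old: baseline wire format
    return min(client_max, server_reported)
```

A pooler such as pgbouncer resetting minor_version to 0 then simply forces every pooled client back to the baseline.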

For a driver (such as the Perl one) that exposes the wire format to the
application by default, I can envision that the driver needs to add a new API
that applications need to use to explicitly bump minor_version up, instead of
defaulting to the largest supported by the driver as in JDBC/libpq.

The reason why I proposed an incrementing minor_version instead of bit flags
for new encodings was that it takes less space and is easier to document and
understand, so that exposing it to applications is possible.

But how do we handle Postgres extensions that change their wire format?
Maybe we do need to have "oid:minor_version,oid:ver,oid_ver" as the
negotiated version variable?
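
If it went that way, the negotiated value might be parsed roughly like this (hypothetical Python; the "oid:version" string format is only the tentative shape raised above, not anything specified):

```python
def parse_type_versions(value):
    """Parse a hypothetical "oid:ver,oid:ver" negotiation string into a
    per-type-oid wire-format version map."""
    versions = {}
    for item in value.split(","):
        oid, _, ver = item.partition(":")
        versions[int(oid)] = int(ver)
    return versions
```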

-Mikko