Re: Optimize binary serialization format of arrays with fixed size elements

From: Noah Misch <noah(at)leadboat(dot)com>
To: Mikko Tiihonen <mikko(dot)tiihonen(at)nitorcreations(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimize binary serialization format of arrays with fixed size elements
Date: 2012-01-23 03:16:57
Message-ID: 20120123031657.GE15693@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jan 22, 2012 at 11:47:06PM +0200, Mikko Tiihonen wrote:
> On 01/20/2012 04:45 AM, Noah Misch wrote:
>> On Thu, Jan 19, 2012 at 02:00:20PM -0500, Robert Haas wrote:
>>> Having said that, if we're to follow the precedent set by
>>> bytea_format, maybe we ought to just add
>>> binary_array_format={huge,ittybitty} and be done with it, rather than
>>> inventing a minor protocol version GUC for something that isn't really
>>> a protocol version change at all. We could introduce a
>>> differently-named general mechanism, but I guess I'm not seeing the
>>> point of that either. Just because someone has a
>>> backward-compatibility issue with one change of this type doesn't mean
>>> they have a similar issue with all of them. So I think adding a
>>> special-purpose GUC is more logical and more parallel to what we've
>>> done in the past, and it doesn't saddle us with having to be certain
>>> that we've designed the mechanism generally enough to handle all the
>>> cases that may come later.
>>
>> That makes sense. An attraction of a single binary format version was avoiding
>> the "Is this worth a GUC?" conversation for each change. However, adding a GUC
>> should be no more notable than bumping a binary format version.
>
> I see the main difference between the GUC per feature vs minor version being that
> in versioned changes old clients keep working because the have to explicitly
> request a specific version. Whereas in separate GUC variables each feature will be
> enabled by default and users have to either keep up with new client versions or
> figure out how to explicitly disable the changes.

No, we can decide that anew for each GUC. If you'd propose that array_output
= full be the default, that works for me.

> Here is a second version of the patch. The binary array encoding changes
> stay the same but all code around was rewritten.
>
> Changes from previous versions based on received comments:
> * removed the minor protocol version concept
> * introduced a new GUC variable array_output copying the current
> bytea_output type, with values "full" (old value) and
> "smallfixed" (new default)

How about the name "array_binary_output"?

> * added documentation for the new GUC variable
> * used constants for the array flags variable values

> *** a/doc/src/sgml/config.sgml
> --- b/doc/src/sgml/config.sgml
> *************** COPY postgres_log FROM '/full/path/to/lo
> *** 4888,4893 ****
> --- 4888,4910 ----
> </listitem>
> </varlistentry>
>
> + <varlistentry id="guc-array-output" xreflabel="array_output">
> + <term><varname>array_output</varname> (<type>enum</type>)</term>
> + <indexterm>
> + <primary><varname>array_output</> configuration parameter</primary>
> + </indexterm>
> + <listitem>
> + <para>
> + Sets the output format for binary encoding of values of
> + type <type>array</type>. Valid values are
> + <literal>smallfixed</literal> (the default)
> + and <literal>full</literal> (the traditional PostgreSQL
> + format). The <type>array</type> type always

It's not "The array type" but "Array types", a class.

> + accepts both formats on input, regardless of this setting.
> + </para>
> + </listitem>
> + </varlistentry>

The section "Array Input and Output Syntax" should reference this GUC.

> *** a/src/backend/utils/misc/guc.c
> --- b/src/backend/utils/misc/guc.c
> ***************
> *** 64,69 ****
> --- 64,70 ----
> #include "storage/predicate.h"
> #include "tcop/tcopprot.h"
> #include "tsearch/ts_cache.h"
> + #include "utils/array.h"
> #include "utils/builtins.h"
> #include "utils/bytea.h"
> #include "utils/guc_tables.h"
> *************** static const struct config_enum_entry by
> *** 225,230 ****
> --- 226,243 ----
> };
>
> /*
> + * Options for enum values defined in this module.
> + *
> + * NOTE! Option values may not contain double quotes!
> + */

Don't replicate this comment.

> +
> + static const struct config_enum_entry array_output_options[] = {
> + {"full", ARRAY_OUTPUT_FULL, false},
> + {"smallfixed", ARRAY_OUTPUT_SMALLFIXED, false},
> + {NULL, 0, false}
> + };
> +
> + /*
> * We have different sets for client and server message level options because
> * they sort slightly different (see "log" level)
> */
> *************** static struct config_enum ConfigureNames
> *** 3047,3052 ****
> --- 3060,3075 ----
> NULL, NULL, NULL
> },
>
> + {
> + {"array_output", PGC_USERSET, COMPAT_OPTIONS_PREVIOUS,

You've put the GUC in COMPAT_OPTIONS_PREVIOUS, but you've put its
documentation in the section for CLIENT_CONN_STATEMENT. I don't have a strong
opinion on which one to use, but be consistent.

> + gettext_noop("Sets the binary output format for arrays."),
> + NULL
> + },
> + &bytea_output,

&array_output

> + ARRAY_OUTPUT_SMALLFIXED, array_output_options,
> + NULL, NULL, NULL
> + },
> +
> {
> {"client_min_messages", PGC_USERSET, LOGGING_WHEN,
> gettext_noop("Sets the message levels that are sent to the client."),

Thanks,
nm

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-01-23 04:42:28 Re: Inline Extension
Previous Message Noah Misch 2012-01-23 02:30:45 Re: [PATCH] Support for foreign keys with arrays