psql NUL record and field separator

Lists: pgsql-hackers
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: psql NUL record and field separator
Date: 2012-01-14 12:23:49
Message-ID: 1326543829.31492.13.camel@vanquo.pezone.net

Inspired by this question http://stackoverflow.com/questions/6857265 I
have implemented a way to set the psql record and field separators to a
zero byte (ASCII NUL character). This can be very useful in shell
scripts to have an unambiguous separator. GNU tools such as find, grep,
sort, and xargs also support this. So with this you could, for example,
do

psql --record-separator-zero -At -c 'select something from somewhere' | xargs -0 dosomething
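
To illustrate why a NUL separator is unambiguous, here is a minimal sketch
that needs no database. The helpers emit_nul and emit_newline are
hypothetical stand-ins for psql output (they are not part of psql), and the
three sample values are made up; one of them contains an embedded newline.

```shell
# Hypothetical stand-ins for psql output; no database is involved.
# The second record deliberately contains an embedded newline.
emit_nul() { printf '%s\0' 'first value' 'second
value' 'third'; }
emit_newline() { printf '%s\n' 'first value' 'second
value' 'third'; }

# With NUL delimiters, xargs -0 passes each record as exactly one argument.
nul_count=$(emit_nul | xargs -0 sh -c 'echo "$#"' sh)

# With newline delimiters, plain xargs splits on blanks and newlines,
# so the record boundaries are lost.
nl_count=$(emit_newline | xargs sh -c 'echo "$#"' sh)

echo "NUL-delimited: $nul_count arguments"
echo "newline-delimited: $nl_count arguments"
```

Only the NUL-delimited pipeline reports three arguments, one per record.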

I have thought about two different ways to implement this. Attempt one
was to make the backslash command option parsing zero-byte proof top to
bottom by using PQExpBuffers, so you could then write \R '\000'. But
that turned out to be very invasive and complicated. Worse, you
couldn't use it from the command line, because psql -R '\000' doesn't
work (the octal escape syntax is not used on the command line).

So attempt two, which I present here, is to just have separate syntax to
set the separators to zero bytes. From the command line it would be
--record-separator-zero and --field-separator-zero, and from within psql
it would be \pset recordsep_zero and \pset fieldsep_zero. I don't care
much for the verbosity of this, so I'm still thinking about ways to
abbreviate this. I think the most common use of this would be to set
the record separator from the command line, so we could use a short
option such as -0 or -z for that.

Patch attached. Comments welcome.

Attachment Content-Type Size
psql-nul-sep.patch text/x-patch 14.0 KB

From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql NUL record and field separator
Date: 2012-01-26 13:30:26
Message-ID: 20120126133026.GA30769@toroid.org

At 2012-01-14 14:23:49 +0200, peter_e(at)gmx(dot)net wrote:
>
> Inspired by this question http://stackoverflow.com/questions/6857265 I
> have implemented a way to set the psql record and field separators to
> a zero byte (ASCII NUL character).

Since this patch is in the commitfest, I had a look at it.

I agree that the feature is useful. The patch applies and builds cleanly
with HEAD@9f9135d1, but needs a further minor tweak to work (attached).
Without it, both zero separators get overwritten with the default value
after option parsing. The code looks good otherwise.

There's one problem:

> psql --record-separator-zero -At -c 'select something from somewhere' | xargs -0 dosomething

If you run find -print0 and it finds one file, it will still print
"filename\0", and xargs -0 will work fine.

But psql --record-separator-zero -At -c 'select 1' will print "1\n", not
"1\0" or even "1\0\n", so xargs -0 will use the value "1\n", not "1". If
you're doing this in a shell script, handling the last argument specially
would be painful.
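
The difference can be reproduced without psql. In this minimal sketch,
printf stands in for the two possible outputs of 'select 1' (an assumption
made for illustration only), and the brackets make the trailing newline
visible.

```shell
# Output as reported above for the unpatched code: "1\n", no NUL at all.
# xargs -0 treats the whole input, newline included, as one argument.
got_newline=$(printf '1\n' | xargs -0 sh -c 'printf "[%s]" "$1"' sh)

# Output in the style of find -print0: "1\0".
got_nul=$(printf '1\0' | xargs -0 sh -c 'printf "[%s]" "$1"' sh)

echo "newline-terminated: $got_newline"
echo "NUL-terminated:     $got_nul"
```

In the first case the closing bracket lands on the next line, because the
argument itself ends in a newline; in the second case the argument is the
bare value.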

At issue are (at least) these three lines from print_unaligned_text in
src/bin/psql/print.c:

    /* the last record needs to be concluded with a newline */
    if (need_recordsep)
        fputc('\n', fout);

Perhaps the right thing to do would be to change this to output \0 if
--record-separator-zero was used (but leave it at \n otherwise)? That
is what my second attached patch does:

$ bin/psql --record-separator-zero --field-separator-zero -At -c 'select 1,2 union select 3,4'|xargs -0 echo
1 2 3 4

Thoughts?
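
When both separators are NUL, fields and records arrive as one flat stream,
so the consumer has to know the column count to recover the rows. A minimal
sketch, with printf standing in for the psql output above (no database is
involved) and a column count of 2 assumed to be known to the caller:

```shell
# printf emits "1\0" "2\0" "3\0" "4\0", mimicking rows (1,2) and (3,4)
# with NUL as both field and record separator.
# -n 2 makes xargs run echo once per pair, which restores the row grouping.
rows=$(printf '%s\0' 1 2 3 4 | xargs -0 -n 2 echo)
echo "$rows"
```

Each echo invocation then receives the fields of exactly one row.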

> I think the most common use of this would be to set the record
> separator from the command line, so we could use a short option
> such as -0 or -z for that.

I agree. The current option names are very unwieldy to type.

-- ams

Attachment Content-Type Size
petere-zero-fix.diff text/x-diff 766 bytes
petere-zero-lastrecord.diff text/x-diff 486 bytes

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql NUL record and field separator
Date: 2012-02-07 11:20:43
Message-ID: 1328613643.24489.4.camel@vanquo.pezone.net

On Thu, 2012-01-26 at 19:00 +0530, Abhijit Menon-Sen wrote:
> At issue are (at least) these three lines from print_unaligned_text in
> src/bin/psql/print.c:
>
>     /* the last record needs to be concluded with a newline */
>     if (need_recordsep)
>         fputc('\n', fout);
>
> Perhaps the right thing to do would be to change this to output \0 if
> --record-separator-zero was used (but leave it at \n otherwise)? That
> is what my second attached patch does:
>
> $ bin/psql --record-separator-zero --field-separator-zero -At -c 'select 1,2 union select 3,4'|xargs -0 echo
> 1 2 3 4
>
> Thoughts?
>
> > I think the most common use of this would be to set the record
> > separator from the command line, so we could use a short option
> > such as -0 or -z for that.
>
> I agree. The current option names are very unwieldy to type.
>
I have incorporated your two patches and added short options. Updated
patch attached.

This made me wonder, however. The existing -F and -R options set the
field and record *separators*. The new record option, however, sets the
record *terminator*. This is the small distinction that you had discovered.

Should we rename the options and/or add that to the documentation, or is
the new behavior obvious and any new terminology would be too confusing?
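
The separator/terminator distinction can be sketched in a few lines of
shell. The functions join_separated and join_terminated are hypothetical
names invented for this illustration and do not mirror the actual psql
code; a visible ';' is used in place of NUL so the difference can be seen,
and the trailing newline in the first function follows the comment in the
print_unaligned_text excerpt quoted earlier.

```shell
# Separator semantics: the delimiter goes between records only, and the
# last record is concluded with a newline.
join_separated() {
    sep=$1; shift
    first=1
    for rec in "$@"; do
        [ "$first" -eq 1 ] || printf '%s' "$sep"
        printf '%s' "$rec"
        first=0
    done
    printf '\n'
}

# Terminator semantics: every record, including the last, is followed by
# the delimiter, and nothing else is appended.
join_terminated() {
    term=$1; shift
    for rec in "$@"; do
        printf '%s%s' "$rec" "$term"
    done
}

sep_out=$(join_separated ';' a b c)
term_out=$(join_terminated ';' a b c)
echo "separated:  $sep_out"
echo "terminated: $term_out"
```

The terminated form is what a consumer like xargs -0 expects, since it can
treat every delimiter as the end of one complete record.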

Attachment Content-Type Size
psql-nul-sep.patch text/x-patch 14.8 KB

From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: psql NUL record and field separator
Date: 2012-02-09 05:41:51
Message-ID: 20120209054151.GC17036@toroid.org

At 2012-02-07 13:20:43 +0200, peter_e(at)gmx(dot)net wrote:
>
> Should we rename the options and/or add that to the documentation, or is
> the new behavior obvious and any new terminology would be too confusing?

I agree there is potential for confusion either way. I tried to come up
with a complete and not-confusing wording for all four options, but did
not manage to improve on the current state of affairs significantly. I
think it can stay the way it is. The reference to xargs -0 is probably
enough to set the right expectations about how it works.

We can always add a sentence later to clarify the special-case behaviour
of -0 if anyone is actually confused (and the best wording will be more
clear in that situation too).

-- Abhijit