Re: Proposal to add --single-row to psql

Lists: pgsql-hackers
From: Christopher Manning <c(at)christophermanning(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proposal to add --single-row to psql
Date: 2013-04-23 15:30:59
Message-ID: CAEYkp92z2w3VBs4uxWPwub7k4hGW-vEPW_WNSui9R5T+cGpLYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

psql currently collects the query result rows in memory before writing them
to a file and can cause out of memory problems for large results in low
memory environments like ec2. I can't use COPY TO STDOUT or FETCH_COUNT
since I'm using Redshift and it doesn't support [writing to STDOUT](
http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) or [CURSOR](
https://forums.aws.amazon.com/thread.jspa?threadID=122664).

[Single Row Mode is available in Postgres 9.2](
http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html) but
[it doesn't look like](
http://www.postgresql.org/docs/9.2/static/app-psql.html) you can tell psql
to use single row mode when writing to a file (using --output).

I'm proposing to add a --single-row option to psql that would allow the
result rows of a query to be streamed to a file without collecting them in
memory first.

I'm new to the postgres source, but I was considering doing this by adding
an elseif at [this line in bin/psql/common.c](
https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L955)
that
would call [PQsetSingleRowMode](
https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1581)
and
ideally use something very similar to [ExecQueryUsingCursor](
https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L1081
)

Please let me know if that would be an acceptable addition and if there's
anything in particular I should be aware of when adding the feature.

Thank you,
Christopher


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Christopher Manning <c(at)christophermanning(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-23 15:48:01
Message-ID: CAFj8pRD395WOcqx7sieAKVbbNP_G1tkY84SJ7rdCWC1A2KkwYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

It is redundant to current FETCH_COUNT implementation, so I don't see sense
to use it together. Maybe we can drop FETCH_COUNT and replace it by
--single-row mode and probably it can simplify code.

Regards

Pavel

2013/4/23 Christopher Manning <c(at)christophermanning(dot)org>

> psql currently collects the query result rows in memory before writing
> them to a file and can cause out of memory problems for large results in
> low memory environments like ec2. I can't use COPY TO STDOUT or FETCH_COUNT
> since I'm using Redshift and it doesn't support [writing to STDOUT](
> http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) or [CURSOR](
> https://forums.aws.amazon.com/thread.jspa?threadID=122664).
>
> [Single Row Mode is available in Postgres 9.2](
> http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html) but
> [it doesn't look like](
> http://www.postgresql.org/docs/9.2/static/app-psql.html) you can tell
> psql to use single row mode when writing to a file (using --output).
>
> I'm proposing to add a --single-row option to psql that would allow the
> result rows of a query to be streamed to a file without collecting them in
> memory first.
>
> I'm new to the postgres source, but I was considering doing this by adding
> an elseif at [this line in bin/psql/common.c](
> https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L955) that
> would call [PQsetSingleRowMode](
> https://github.com/postgres/postgres/blob/master/src/interfaces/libpq/fe-exec.c#L1581) and
> ideally use something very similar to [ExecQueryUsingCursor](
> https://github.com/postgres/postgres/blob/master/src/bin/psql/common.c#L1081
> )
>
> Please let me know if that would be an acceptable addition and if there's
> anything in particular I should be aware of when adding the feature.
>
> Thank you,
> Christopher
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Manning <c(at)christophermanning(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-23 16:05:25
Message-ID: 5106.1366733125@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Manning <c(at)christophermanning(dot)org> writes:
> I'm proposing to add a --single-row option to psql that would allow the
> result rows of a query to be streamed to a file without collecting them in
> memory first.

Isn't there already a way to set FETCH_COUNT from the command line?
(ie, I think there's a generic variable-assignment facility that could
do this)

regards, tom lane


From: Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Manning <c(at)christophermanning(dot)org>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-24 12:04:04
Message-ID: CAFcNs+pBhK=zL_a_TKxRNU1JPsdq+A+Bdoty8+_nryJcg1bwnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Isn't there already a way to set FETCH_COUNT from the command line?
> (ie, I think there's a generic variable-assignment facility that could
> do this)
>

Christopher,

Tom is all right... from psql [1] command line we can do that:

$ bin/psql --variable=FETCH_COUNT=100
psql (9.3devel)
Type "help" for help.

fabrizio=# \echo :FETCH_COUNT
100

Regards,

[1]
http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


From: Christopher Manning <c(at)christophermanning(dot)org>
To: fabriziomello(at)gmail(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-24 22:25:26
Message-ID: CAEYkp91gpZ9MdYXNHf1aZ7zfBLkXufAvd_gDRwk9_dmeSxPgqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fabrízio and Tom,

I know that you can use --variable="FETCH_COUNT=10000" from the
psql command line, but internally that uses a CURSOR to batch the rows and
[Redshift doesn't support CURSOR](
https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
not an option when using psql to download data from Redshift.

Pavel's idea of having a single row mode option to replace FETCH_COUNT is
interesting, does anyone have any problems with that or alternative ideas?

Regards,
Christopher

On Wed, Apr 24, 2013 at 7:04 AM, Fabrízio de Royes Mello <
fabriziomello(at)gmail(dot)com> wrote:

>
> On Tue, Apr 23, 2013 at 1:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > Isn't there already a way to set FETCH_COUNT from the command line?
> > (ie, I think there's a generic variable-assignment facility that could
> > do this)
> >
>
> Christopher,
>
> Tom is all right... from psql [1] command line we can do that:
>
> $ bin/psql --variable=FETCH_COUNT=100
> psql (9.3devel)
> Type "help" for help.
>
> fabrizio=# \echo :FETCH_COUNT
> 100
>
> Regards,
>
> [1]
> http://www.postgresql.org/docs/current/interactive/app-psql.html#AEN84903
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
> >> Blog sobre TI: http://fabriziomello.blogspot.com
> >> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
> >> Twitter: http://twitter.com/fabriziomello
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Manning <c(at)christophermanning(dot)org>
Cc: fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-25 14:42:53
Message-ID: 23704.1366900973@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Manning <c(at)christophermanning(dot)org> writes:
> Fabrzio and Tom,
> I know that you can use --variable="FETCH_COUNT=10000" from the
> psql command line, but internally that uses a CURSOR to batch the rows and
> [Redshift doesn't support CURSOR](
> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
> not an option when using psql to download data from Redshift.

I don't know what redshift is, but I have very little patience with the
idea of burdening psql with yet another random feature in order to work
around deficiencies in somebody else's software. Considering that the
earliest any such thing could reach the field would be 9.4, it seems not
unlikely that the need for it would be gone by next year anyway.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-25 14:55:34
Message-ID: 517943E6.4030402@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 04/25/2013 10:42 AM, Tom Lane wrote:
> Christopher Manning <c(at)christophermanning(dot)org> writes:
>> Fabrízio and Tom,
>> I know that you can use --variable="FETCH_COUNT=10000" from the
>> psql command line, but internally that uses a CURSOR to batch the rows and
>> [Redshift doesn't support CURSOR](
>> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
>> not an option when using psql to download data from Redshift.
> I don't know what redshift is,

It's an Amazon product based on release 8.0, but with many many features
removed (e.g. Indexes!)

> but I have very little patience with the
> idea of burdening psql with yet another random feature in order to work
> around deficiencies in somebody else's software. Considering that the
> earliest any such thing could reach the field would be 9.4, it seems not
> unlikely that the need for it would be gone by next year anyway.
>
>

Plus there is the fact that we have no way to test it against redshift
anyway.

It should be up to Amazon to produce a useful psql program that works
with redshift, not us. We have enough to do to support our own product.

If this is to be justified at all it needs to be without reference to
redshift.

cheers

andrew


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-25 14:57:41
Message-ID: 51794465.8010800@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 04/25/2013 07:42 AM, Tom Lane wrote:
> Christopher Manning <c(at)christophermanning(dot)org> writes:
>> Fabr�zio and Tom,
>> I know that you can use --variable="FETCH_COUNT=10000" from the
>> psql command line, but internally that uses a CURSOR to batch the rows and
>> [Redshift doesn't support CURSOR](
>> https://forums.aws.amazon.com/thread.jspa?threadID=122664&tstart=0) so it's
>> not an option when using psql to download data from Redshift.
>
> I don't know what redshift is,

It is a PostgreSQL fork based on ancient source code. From Amazon:

Amazon Redshift is based on PostgreSQL 8.0.2. Amazon Redshift and
PostgreSQL have a number of very important differences that you must be
aware of as you design and develop your data warehouse applications.

> but I have very little patience with the
> idea of burdening psql with yet another random feature in order to work
> around deficiencies in somebody else's software. Considering that the
> earliest any such thing could reach the field would be 9.4, it seems not
> unlikely that the need for it would be gone by next year anyway.

+1 this is really an amazon problem not a postgresql problem.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-28 12:50:28
Message-ID: 517D1B14.5000408@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/25/2013 10:55 PM, Andrew Dunstan wrote:
>
> It's an Amazon product based on release 8.0, but with many many
> features removed (e.g. Indexes!)
More specifically, it's a hacked-up column-store-ized Pg for OLAP and
analytics work. As I understand it Amazon didn't develop it themselves;
they bought/licensed Paraccel ( http://www.paraccel.com/) and customised
to integrated into Amazon's IAM authentication/authorization, usage
accounting, dynamic provisioning, etc.

http://docs.aws.amazon.com/redshift/latest/dg/c_redshift-and-postgres-sql.html
http://en.wikipedia.org/wiki/ParAccel

I find it frustrating that I've never seen an @paraccel email address
here and that few of the other vendors of highly customised Pg offshoots
are contributing back. It's almost enough to make me like the GPL.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <jim(at)nasby(dot)net>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-04-29 20:33:41
Message-ID: 517ED925.4010003@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 4/28/13 7:50 AM, Craig Ringer wrote:
> I find it frustrating that I've never seen an @paraccel email address here and that few of the other vendors of highly customised Pg offshoots are contributing back. It's almost enough to make me like the GPL.

FWIW, I think there's a pretty large barrier to these folks contributing back. Would the community really want to add a bunch of hooks to support something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have to change significant amounts of PG code, so much so that it's actually hard for them to stay current (which is why most of them just fork).

I do think this is a shame, but I'm not sure of any good way to fix it.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-02 00:36:42
Message-ID: CA+TgmoYNrj-qr3kw8B86iXZbmkfML4f=0OzFZETivnf9POO6iw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On 4/28/13 7:50 AM, Craig Ringer wrote:
>>
>> I find it frustrating that I've never seen an @paraccel email address here
>> and that few of the other vendors of highly customised Pg offshoots are
>> contributing back. It's almost enough to make me like the GPL.
>
> FWIW, I think there's a pretty large barrier to these folks contributing
> back. Would the community really want to add a bunch of hooks to support
> something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
> to change significant amounts of PG code, so much so that it's actually hard
> for them to stay current (which is why most of them just fork).
>
> I do think this is a shame, but I'm not sure of any good way to fix it.

Yep. There are plenty of things that we do at EDB for good and valid
business reasons that I can't imagine the community accepting under
any circumstances. For example, Oracle compatibility is not something
the community values as highly as EnterpriseDB (and our customers) do.
I'm sure that many of those vendors are in similar situations - they
write code that only runs on specialized hardware, or (rather
commonly, I suspect) they remove parts of the functionality in order
to make certain things very fast. Those are not trade-offs that make
sense for PostgreSQL, but I find it hard to understand what we'd gain
from preventing other people from making them. There are in fact a
pretty large number of companies - EnterpriseDB, obviously, but there
are many, many others - that are choosing to build businesses around
PostgreSQL precisely because it *isn't* GPL. Personally, I think
that's a good thing for our community in terms of mindshare even when
companies choose not to contribute back - and it's even better when
they do.

I was at the MySQL show a couple of years back and they had a vendor
area. And I got talking to one the vendors there who had implemented
some new kind of database - I forget the details exactly - and he told
me that it spoke the PostgreSQL wire protocol. I was of course a bit
surprised as PostgreSQL was not exactly what people at this show were
excited about. So I asked him why not the MySQL wire protocol, and
he, basically, that they were afraid of being accused of a GPL
violation, because MySQL AB had a very expansive interpretation of
what the GPL did and did not allow. We could perhaps argue about
whether he was right to be worried about that... but he was. I can't
think of all the names right now, but I've talked with a bunch of
other companies over the last few years who also chose PostgreSQL for
licensing reasons. I'm pretty happy about that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-02 00:44:00
Message-ID: 5181B6D0.7060400@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I find it frustrating that I've never seen an @paraccel email address
> here and that few of the other vendors of highly customised Pg offshoots
> are contributing back. It's almost enough to make me like the GPL.

Well, Paraccel never ended up contributing any code, but in years back
(2006-2008) they did contribute money to PostgreSQL (event sponsorships,
SPI). I actually talked to them about contributing code, but somehow it
never happened.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-02 02:00:22
Message-ID: 5181C8B6.7000901@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/02/2013 08:36 AM, Robert Haas wrote:

> I've talked with a bunch of other companies over the last few years
> who also chose PostgreSQL for licensing reasons. I'm pretty happy
> about that.

I think it's a pretty good thing too, personally ... but I do wish
they'd contribute a bit more to the client-side drivers like psqlODBC
and PgJDBC.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-02 02:16:30
Message-ID: 9974.1367460990@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> ... There are in fact a
> pretty large number of companies - EnterpriseDB, obviously, but there
> are many, many others - that are choosing to build businesses around
> PostgreSQL precisely because it *isn't* GPL. Personally, I think
> that's a good thing for our community in terms of mindshare even when
> companies choose not to contribute back - and it's even better when
> they do.

Yeah, exactly. People forking PG may not contribute much directly,
but they still add to mindshare and general familiarity with the PG
code base. In the long run, that cannot help but add to our goal of
world domination ;-)

We had the GPL-vs-BSD-license discussion in excruciating length years
ago, mid-2000 if memory serves, so you can find a whole lot more about
this topic in the mailing list archives from back then. But personally
I wouldn't consider changing even if it were somehow practical. The
license choice has served us well and will continue to do so.

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-08 23:08:28
Message-ID: 518ADAEC.8050201@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/1/13 7:36 PM, Robert Haas wrote:
> On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby<jim(at)nasby(dot)net> wrote:
>> >On 4/28/13 7:50 AM, Craig Ringer wrote:
>>> >>
>>> >>I find it frustrating that I've never seen an @paraccel email address here
>>> >>and that few of the other vendors of highly customised Pg offshoots are
>>> >>contributing back. It's almost enough to make me like the GPL.
>> >
>> >FWIW, I think there's a pretty large barrier to these folks contributing
>> >back. Would the community really want to add a bunch of hooks to support
>> >something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
>> >to change significant amounts of PG code, so much so that it's actually hard
>> >for them to stay current (which is why most of them just fork).
>> >
>> >I do think this is a shame, but I'm not sure of any good way to fix it.
> Yep. There are plenty of things that we do at EDB for good and valid
> business reasons that I can't imagine the community accepting under
> any circumstances. For example, Oracle compatibility is not something
> the community values as highly as EnterpriseDB (and our customers) do.
> I'm sure that many of those vendors are in similar situations - they
> write code that only runs on specialized hardware, or (rather
> commonly, I suspect) they remove parts of the functionality in order
> to make certain things very fast. Those are not trade-offs that make
> sense for PostgreSQL, but I find it hard to understand what we'd gain
> from preventing other people from making them. There are in fact a
> pretty large number of companies - EnterpriseDB, obviously, but there
> are many, many others - that are choosing to build businesses around
> PostgreSQL precisely because it*isn't* GPL. Personally, I think
> that's a good thing for our community in terms of mindshare even when
> companies choose not to contribute back - and it's even better when
> they do.

FWIW, one point I was trying to make that was overlooked is that it seems to be exceptionally difficult for companies to fork Postgres and then stay current (AFAIK EnterpriseDB and Mammoth are the only products that have pulled that feat off). I believe that makes it significantly harder for them to actually contribute code back that doesn't give them a business advantage, as well as making it harder to justify hacking on the community codebase because they'll just face a very large hurdle when it comes to pulling that code back into their proprietary product.

I don't know of any good way to solve that problem. Maybe it's not worth solving... but I do suspect there's some useful stuff that the community has lost out on because of this.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: David Fetter <david(at)fetter(dot)org>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-09 00:12:37
Message-ID: 20130509001237.GH13274@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
> On 5/1/13 7:36 PM, Robert Haas wrote:
> >On Mon, Apr 29, 2013 at 4:33 PM, Jim Nasby<jim(at)nasby(dot)net> wrote:
> >>>On 4/28/13 7:50 AM, Craig Ringer wrote:
> >>>>>
> >>>>>I find it frustrating that I've never seen an @paraccel email address here
> >>>>>and that few of the other vendors of highly customised Pg offshoots are
> >>>>>contributing back. It's almost enough to make me like the GPL.
> >>>
> >>>FWIW, I think there's a pretty large barrier to these folks contributing
> >>>back. Would the community really want to add a bunch of hooks to support
> >>>something like Redshift? Or Greenplum? Or etc, etc.? Most of these guys have
> >>>to change significant amounts of PG code, so much so that it's actually hard
> >>>for them to stay current (which is why most of them just fork).
> >>>
> >>>I do think this is a shame, but I'm not sure of any good way to fix it.
> >Yep. There are plenty of things that we do at EDB for good and valid
> >business reasons that I can't imagine the community accepting under
> >any circumstances. For example, Oracle compatibility is not something
> >the community values as highly as EnterpriseDB (and our customers) do.
> > I'm sure that many of those vendors are in similar situations - they
> >write code that only runs on specialized hardware, or (rather
> >commonly, I suspect) they remove parts of the functionality in order
> >to make certain things very fast. Those are not trade-offs that make
> >sense for PostgreSQL, but I find it hard to understand what we'd gain
> >from preventing other people from making them. There are in fact a
> >pretty large number of companies - EnterpriseDB, obviously, but there
> >are many, many others - that are choosing to build businesses around
> >PostgreSQL precisely because it*isn't* GPL. Personally, I think
> >that's a good thing for our community in terms of mindshare even when
> >companies choose not to contribute back - and it's even better when
> >they do.
>
> FWIW, one point I was trying to make that was overlooked is that it
> seems to be exceptionally difficult for companies to fork Postgres
> and then stay current (AFAIK EnterpriseDB and Mammoth are the only
> products that have pulled that feat off).

VMware and CitusDB are doing pretty well so far, but it's early days.

> I believe that makes it significantly harder for them to actually
> contribute code back that doesn't give them a business advantage, as
> well as making it harder to justify hacking on the community
> codebase because they'll just face a very large hurdle when it comes
> to pulling that code back into their proprietary product.
>
> I don't know of any good way to solve that problem. Maybe it's not
> worth solving... but I do suspect there's some useful stuff that the
> community has lost out on because of this.

Some of this is getting solved by making PostgreSQL more pluggable in
ways that isolate the proprietary stuff, i.e. make people not have to
touch the PostgreSQL core code much, if at all, in order to provide
whatever special features they provide. Hooks and FDWs are two such
pluggable components.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-09 11:36:52
Message-ID: CAB7nPqQJvT3dfWr07F4y78q7ke-3vAeXmxvanoEwrsd8SXFq_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 9, 2013 at 9:12 AM, David Fetter <david(at)fetter(dot)org> wrote:

> On Wed, May 08, 2013 at 06:08:28PM -0500, Jim Nasby wrote:
> > I believe that makes it significantly harder for them to actually
> > contribute code back that doesn't give them a business advantage, as
> > well as making it harder to justify hacking on the community
> > codebase because they'll just face a very large hurdle when it comes
> > to pulling that code back into their proprietary product.
> >
> > I don't know of any good way to solve that problem. Maybe it's not
> > worth solving... but I do suspect there's some useful stuff that the
> > community has lost out on because of this.
>
> Some of this is getting solved by making PostgreSQL more pluggable in
> ways that isolate the proprietary stuff, i.e. make people not have to
> touch the PostgreSQL core code much, if at all, in order to provide
> whatever special features they provide. Hooks and FDWs are two such
> pluggable components.
>
Extensions and the lovely background workers as well.
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Jim Nasby <jim(at)nasby(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-11 15:17:03
Message-ID: CA+TgmoaaELFkE-QBoktUNNgWhx1qnZ06VwCqxeLH7Hb+G8UqJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 9, 2013 at 7:36 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:

>> Some of this is getting solved by making PostgreSQL more pluggable in
>> ways that isolate the proprietary stuff, i.e. make people not have to
>> touch the PostgreSQL core code much, if at all, in order to provide
>> whatever special features they provide. Hooks and FDWs are two such
>> pluggable components.
>
> Extensions and the lovely background workers as well.

Some kind of extendable parser would be awesome. It would need to tie
into the rewriter also.

No, I don't have a clue what the design looks like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: David Fetter <david(at)fetter(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-11 15:34:17
Message-ID: 20130511153417.GA17951@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
> On Thu, May 9, 2013 at 7:36 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>
> >> Some of this is getting solved by making PostgreSQL more pluggable in
> >> ways that isolate the proprietary stuff, i.e. make people not have to
> >> touch the PostgreSQL core code much, if at all, in order to provide
> >> whatever special features they provide. Hooks and FDWs are two such
> >> pluggable components.
> >
> > Extensions and the lovely background workers as well.
>
> Some kind of extendable parser would be awesome. It would need to tie
> into the rewriter also.
>
> No, I don't have a clue what the design looks like.

That's a direction several of the proprietary RDBMS vendors have
proposed. I think it'd be great :)

Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API
to pass expression trees in and out. This could have other benefits
as to clustering space, shortening the planning cycle, etc., but let's
not go there for now. My knowledge is very, very sketchy, but when I
squint, the expression trees we use look a lot like JSON. Are they
isomorphic?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-11 16:27:36
Message-ID: 18726.1368289656@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Fetter <david(at)fetter(dot)org> writes:
> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>> Some kind of extendable parser would be awesome. It would need to tie
>> into the rewriter also.
>>
>> No, I don't have a clue what the design looks like.

> That's a direction several of the proprietary RDBMS vendors have
> proposed. I think it'd be great :)

> Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API
> to pass expression trees in and out. This could have other benefits
> as to clustering space, shortening the planning cycle, etc., but let's
> not go there for now. My knowledge is very, very sketchy, but when I
> squint, the expression trees we use look a lot like JSON. Are they
> isomorphic?

By the time you've got an expression tree, the problem is mostly solved,
at least so far as parser extension is concerned.

More years ago than I care to admit, I worked on systems that had
run-time-extensible parsers at Hewlett-Packard, so technology for this
does exist. But my (vague) memory of those systems is that the parser's
language capabilities were more limited than bison's, perhaps only
LL(1). Parsing spec-compatible SQL that way might be a challenge.

A larger issue is that if you don't have the whole grammar available
to check, it's difficult to be sure there are no parsing conflicts.
I seem to remember that we hit some conflicts between different
extension ROMs back at HP :-(

Another point is that extensions that are actually interesting require
a lot more than new syntax. Robert mentioned the rewriter, but you'd
typically need planner and executor additions as well. It's possible to
see how whole new plan node types might be added by a plugin so far as
the executor is concerned, but I haven't a clue how we'd get the planner
to emit them ...

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-11 23:22:55
Message-ID: CA+TgmoZs9nBCyk9KFiJbeRhGMRw0kBcU2MddGhwGm+7qkk-Q3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.

Right.

> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist. But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1). Parsing spec-compatible SQL that way might be a challenge.

If I understand bison correctly, it basically looks at the current
parser state and the next token and decides to either shift that token
onto the stack or reduce the stack using some rule. If there's no
matching rule, we error out. If someone wants to inject new rules
into the grammar, those state tables are all going to change. But if
we could contrive things so that the state tables are built
dynamically and can be change as rules are added and removed, then it
seems to me that we could let a loadable module supply (or delete)
whatever grammar rules it likes. Whenever it does this, we recompile
the grammar on next use (and complain if we get ambiguities). This
does not sound all that easy to code, but at least in theory it seems
doable.

We'd also need a way to add keywords. To be quite frank, I think our
whole approach to keywords is massive screwed up right now. Aside
from the parser bloat, adding even unreserved keywords breaks stuff,
e.g. "SELECT 1 this_is_not_yet_a_keyword". EnterpriseDB's internal
bug-tracking system must have at least half a dozen bug reports open
right now that are attributable to keywords either existing at all or
being more reserved than they are in Oracle. That aside, the amount
of trouble we've caused for PostgreSQL users over the years, either by
adding new keywords or disallowing them in contexts where they used to
work, is not small. I don't have a specific proposal for what we
should do to make this problem less painful, but I think if we ever
revise our parser infrastructure we ought to use that as an
opportunity to try to come up with something better than what exists
today, because the status quo is awfully painful.

> Another point is that extensions that are actually interesting require
> a lot more than new syntax. Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well. It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...

I don't have any ideas either. Still, solving some of the problems
would be better than solving none of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-12 20:18:35
Message-ID: 518FF91B.9060300@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 5/11/13 11:27 AM, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
>> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>>> Some kind of extendable parser would be awesome. It would need to tie
>>> into the rewriter also.
>>>
>>> No, I don't have a clue what the design looks like.
>
>> That's a direction several of the proprietary RDBMS vendors have
>> proposed. I think it'd be great :)
>
>> Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API
>> to pass expression trees in and out. This could have other benefits
>> as to clustering space, shortening the planning cycle, etc., but let's
>> not go there for now. My knowledge is very, very sketchy, but when I
>> squint, the expression trees we use look a lot like JSON. Are they
>> isomorphic?
>
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.
>
> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist. But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1). Parsing spec-compatible SQL that way might be a challenge.
>
> A larger issue is that if you don't have the whole grammar available
> to check, it's difficult to be sure there are no parsing conflicts.
> I seem to remember that we hit some conflicts between different
> extension ROMs back at HP :-(
>
> Another point is that extensions that are actually interesting require
> a lot more than new syntax. Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well. It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...

FWIW, I've wanted the ability to plug into the parser not for an extension, but so that I could programmaticly enforce certain coding conventions. TBH, it would be ideal if that was possible without requiring a backend (ie, some kind of stand-alone library/tool), but exposing it via a backend would be better than nothing.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-13 06:53:53
Message-ID: CAM3SWZRP2r7yMF__kiPuC-rmTfVTAq1jiVWQu_weyKwt44uy1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, May 12, 2013 at 1:18 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> FWIW, I've wanted the ability to plug into the parser not for an extension,
> but so that I could programmaticly enforce certain coding conventions.

Depending on the exact requirements, that probably wouldn't be too
difficult. It'd likely entail adding a hook for plugins to inspect raw
parse trees.

The pg_stat_statements core infrastructure changes for 9.2 added a
hook that allowed pg_stat_statements to inspect the post-parse
analysis (query) tree. That wouldn't be suitable for what you want,
though, because that tree loses many of the details that you are
likely to care about but are inconsequential to the semantics of the
query. For example, you couldn't tell from the post-parse analysis
tree if the query text was originally "select all, the, columns from
table" or "select * from table".

--
Peter Geoghegan


From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'David Fetter'" <david(at)fetter(dot)org>, "'Michael Paquier'" <michael(dot)paquier(at)gmail(dot)com>, "'Jim Nasby'" <jim(at)nasby(dot)net>, "'Craig Ringer'" <craig(at)2ndquadrant(dot)com>, "'Andrew Dunstan'" <andrew(at)dunslane(dot)net>, "'Christopher Manning'" <c(at)christophermanning(dot)org>, <fabriziomello(at)gmail(dot)com>, "'Pgsql Hackers'" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-14 09:06:48
Message-ID: 005901ce5082$5f20aab0$1d620010$@kapila@huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday, May 12, 2013 4:53 AM Robert Haas wrote:
> On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > By the time you've got an expression tree, the problem is mostly
> solved,
> > at least so far as parser extension is concerned.
>
> Right.
>
> > More years ago than I care to admit, I worked on systems that had
> > run-time-extensible parsers at Hewlett-Packard, so technology for
> this
> > does exist. But my (vague) memory of those systems is that the
> parser's
> > language capabilities were more limited than bison's, perhaps only
> > LL(1). Parsing spec-compatible SQL that way might be a challenge.
>
> If I understand bison correctly, it basically looks at the current
> parser state and the next token and decides to either shift that token
> onto the stack or reduce the stack using some rule. If there's no
> matching rule, we error out. If someone wants to inject new rules
> into the grammar, those state tables are all going to change. But if
> we could contrive things so that the state tables are built
> dynamically and can be change as rules are added and removed, then it
> seems to me that we could let a loadable module supply (or delete)
> whatever grammar rules it likes. Whenever it does this, we recompile
> the grammar on next use (and complain if we get ambiguities). This
> does not sound all that easy to code, but at least in theory it seems
> doable.

Can we think of a pluggable parser whose responsibility is to produce raw
parse tree structure similar to PostgreSQL, but internally it can handle
different database syntax (e.g Oracle).
For example below syntaxes of Oracle can be converted to PostgreSQL raw
parse tree

Oracle syntax:
SELECT 1 + 10 * 100 + 1000 from DUAL;
PostgreSQL syntax:
SELECT 1 + 10 * 100 + 1000;

Oracle syntax:
CREATE USER xxxx IDENTIFIED BY 'password';
ALTER USER xxxx IDENTIFiED BY 'password';
PostgreSQL syntax:
CREATE USER xxxx PASSWORD 'password';
ALTER USER xxxx PASSWORD 'password';

Currently if somebody has to migrate from Sybase, Oracle, SQL Server, MySQL
to PostgreSQL, he has to either use some converter tool or enhance existing
gram.y to support it.
In case current PG parser is pluggable, he can write it's parser which can
produce PG specific parse tree for another database syntax. User can load
database specific
Parser during server start time by specifying corresponding parser dll/so in
shared_preload_libraries.
Advantage of writing PG based parser for other database syntax is user can
directly use script files of other database, no need of conversion every
time.

Though I am not sure, but I think 70~80% of commonly used SQL syntax is
common among different databases as all standard databases are SQL-spec
compliant, so if such a parser facility
is provided, it can save lot of time taken for conversion for customers who
want to migrate from other database syntax to PostgreSQL.

With Regards,
Amit Kapila.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-20 19:34:08
Message-ID: 20130520193408.GA3820@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, May 11, 2013 at 07:22:55PM -0400, Robert Haas wrote:
> On Sat, May 11, 2013 at 12:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > By the time you've got an expression tree, the problem is mostly solved,
> > at least so far as parser extension is concerned.
>
> Right.
>
> > More years ago than I care to admit, I worked on systems that had
> > run-time-extensible parsers at Hewlett-Packard, so technology for this
> > does exist. But my (vague) memory of those systems is that the parser's
> > language capabilities were more limited than bison's, perhaps only
> > LL(1). Parsing spec-compatible SQL that way might be a challenge.
>
> If I understand bison correctly, it basically looks at the current
> parser state and the next token and decides to either shift that token
> onto the stack or reduce the stack using some rule. If there's no
> matching rule, we error out. If someone wants to inject new rules
> into the grammar, those state tables are all going to change. But if
> we could contrive things so that the state tables are built
> dynamically and can be change as rules are added and removed, then it
> seems to me that we could let a loadable module supply (or delete)
> whatever grammar rules it likes. Whenever it does this, we recompile
> the grammar on next use (and complain if we get ambiguities). This
> does not sound all that easy to code, but at least in theory it seems
> doable.
>
> We'd also need a way to add keywords. To be quite frank, I think our
> whole approach to keywords is massive screwed up right now. Aside

FYI, one trick that is often used when parsing C code is to allow parsed
code to add to the keyword list, so when the parser goes to get the next
token, it might get "keyword" back based on what it has seen already.
This is required to handle C typedefs, and might be a method we could
use.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Christopher Manning <c(at)christophermanning(dot)org>, fabriziomello(at)gmail(dot)com
Subject: Re: Proposal to add --single-row to psql
Date: 2013-05-21 02:18:16
Message-ID: 519AD968.3030209@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have actually been working on the task discussed in this thread, most relevant
parts quoted below, for awhile now, and hope to have something concrete that you
can use by the end of this summer.

My in-development Muldis D language is homoiconic as a core feature, its source
code is data to it, and the native syntax of the language resembles an abstract
syntax tree. This tree of nodes primarily defines behavior of the code, but it
also supports arbitrary metadata per node, which for example can be used to
preserve concrete syntax for any programming language that can be parsed into
Muldis D nodes or conversely generated from said.

For example, you can have one type of node defining a function, and its details
are defined by its attributes or child nodes, such as its result type, its
parameters, whether it is declared associative/commutative/etc or not, and the
expression(s) defining its body. Another type of node defines a call to a
function, another type defines a text literal, another a relation literal, and
so on. Conversely, a node can define a schema or package etc. Example
metadata, which is also structured, could include line numbers or code comments
or whitespace or exact numeric literal formats or quoting formats or exact
keyword choices, for example.

Using these node data types, we have a fairly normalized representation of any
source code (or data) that is easy to introspect or transform with code. A key
design of Muldis D is the clear separation of syntax and behavior.

A parser is just a function that takes (typically) a character string as input
and produces a node (tree) as output. A compiler is just a function or
operation that takes a node (tree) as input and produces machine code. An
optimizer can be a function that derives one node tree from another, either as
its own operation or typically as part of the compiler stage.

A compiler or optimizer generally can trivially ignore the node metadata, but a
code generator or debugger can use it; metadata can be stripped without
affecting behavior. The canonical tree form can also easily be mapped
losslessly with relation forms, such as typical information schemas have.

Practically all behavior is defined in terms of generic type and routine
definitions. Practically all system-defined types and routines are defined in
one or more libraries/modules/packages that have the same format as those users
would write like extensions. So, all the relational operators have the same
syntax as say string or numeric or array operators.

I envision that the most effective way to use Muldis D to handle an arbitrary
programming language, including the native SQL syntax+behavior of each SQL DBMS,
is to have a Muldis D library+parser pair for it.

For example, to represent PostgreSQL 9.2 most directly, we have a library with
an explicitly defined type and routine for every built-in that Pg 9.2 has, and
we also have a parser function that takes the SQL syntax that Pg 9.2 understands
and produces a Muldis D node tree consisting of calls to the routines of that
library or value selectors of types in that library (things like SELECT and
INSERT etc are each mapped to a routine too). That way, even with a standard
node format that isn't specific to a typical language or version, the code for
parsing Pg 9.2 SQL has the minimal amount of indirection that it has to deal
with, as each syntax element would have a direct library call counterpart.
Similarly, the existing Pg 9.2 SQL compiler would have the least indirection to
take said nodes and execute them. (The library would be named eg "Postgres_9_2"
for example, which makes it easier say to also have one side-by-side for other
versions, shims, legacy code you want to more easily support compatibility with.)

Where one decides to do cross-compilation, say make Oracle SQL run on Pg, that
could be done as simply as defining a library for Oracle SQL with the
routines+types that has, and then mapping it to a Pg one just in terms of shim
calls (which the compiler can optimize away as applicable), and so parsers or
compilers never necessarily have to deal with behavior compatibility issues.

I am presently working out the exact set of such language nodes, and making a
reference implementation which is mostly self-defined and would compile to Perl
code, and hope to have the first Muldis D executable by the end of this summer.

I am confident that an adaption of this design into C or whatever would serve
Postgres greatly in letting it effectively parse multiple languages, anywhere
from application programming languages to multiple SQL dialects or versions.

Even if you don't decide to use my design specifically (which is open source and
you can influence it), I think you should find some of the general design
principles I stated above to be useful. Representing behavior as libraries the
AST being flexible enough for any concrete language without being too specific
to details of one. And of course, cross-invocation of code written in multiple
languages is made much easier.

Note, just to be clear, my proposal does not necessitate that all of a node tree
has to be kept in memory at once. This design should be adaptable to a
streaming approach, especially as it is expected to be able to handle database
dumps or transfers of arbitrary size, same as SQL engines can today. That is in
contrast to probably what most application languages would assume, where
everything would fit in memory at once. But the ability to stream or not would
largely be an implementation detail. Realistically, all code should fit in
memory at once, and anything that would have to be buffered out of memory would
be say embedded data literals (whether large strings or simply large relations).

If you're not sure how my proposal would address any of the needs or wants
raised in the thread, go ahead and ask, and I will try and answer as time permits.

-- Darren Duncan

On 2013.05.11 9:27 AM, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
>> On Sat, May 11, 2013 at 11:17:03AM -0400, Robert Haas wrote:
>>> Some kind of extendable parser would be awesome. It would need to tie
>>> into the rewriter also.
>>>
>>> No, I don't have a clue what the design looks like.
>
>> That's a direction several of the proprietary RDBMS vendors have
>> proposed. I think it'd be great :)
>
>> Pre-coffee (yeah, I know. Bad idea.) sketch of an idea: create an API
>> to pass expression trees in and out. This could have other benefits
>> as to clustering space, shortening the planning cycle, etc., but let's
>> not go there for now. My knowledge is very, very sketchy, but when I
>> squint, the expression trees we use look a lot like JSON. Are they
>> isomorphic?
>
> By the time you've got an expression tree, the problem is mostly solved,
> at least so far as parser extension is concerned.
>
> More years ago than I care to admit, I worked on systems that had
> run-time-extensible parsers at Hewlett-Packard, so technology for this
> does exist. But my (vague) memory of those systems is that the parser's
> language capabilities were more limited than bison's, perhaps only
> LL(1). Parsing spec-compatible SQL that way might be a challenge.
>
> A larger issue is that if you don't have the whole grammar available
> to check, it's difficult to be sure there are no parsing conflicts.
> I seem to remember that we hit some conflicts between different
> extension ROMs back at HP :-(
>
> Another point is that extensions that are actually interesting require
> a lot more than new syntax. Robert mentioned the rewriter, but you'd
> typically need planner and executor additions as well. It's possible to
> see how whole new plan node types might be added by a plugin so far as
> the executor is concerned, but I haven't a clue how we'd get the planner
> to emit them ...