libpq URL syntax vs SQLAlchemy

Lists: pgsql-hackers
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: libpq URL syntax vs SQLAlchemy
Date: 2012-05-09 18:17:33
Message-ID: 1336587453.8747.6.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I have been reviewing how our new libpq URL syntax compares against
existing implementations of URL syntaxes in other drivers or
higher-level access libraries. In the case of SQLAlchemy, there is an
incompatibility regarding how Unix-domain sockets are specified.

First, here is the documentation on that:
http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html

The recommended way to access a server over a Unix-domain socket is to
leave off the host, as in:

postgresql://user:password@/dbname

In libpq, this is parsed as host='/dbname', no database.

To specify a socket path in SQLAlchemy, you use:

postgresql://user:password@/dbname?host=/var/lib/postgresql

This also works in libpq (bizarrely, perhaps, considering the previous
case).

This libpq behavior is a problem for several reasons:

- It's incompatible with a popular existing implementation.

- It violates RFC 3986, which doesn't allow slashes in the
"authority" (host, port, user, password) part.

- As a consequence of this, URLs like this will be parsed differently
(or will fail to be parsed) by existing URL parsing libraries (tried
Perl URI and Python urllib, for instance).

- Moreover, if these libraries can't parse the URL, it might mean those
drivers can't adopt that URL syntax.

- It's internally inconsistent, as shown above.

- In most places in PostgreSQL clients, no host means Unix-domain
socket, but not here.

- It favors the case of non-default Unix-domain socket plus default
database over default Unix-domain socket plus non-default database.

- It's not obvious how to get to the default Unix-domain socket at all.
"postgresql:///dbname" doesn't work, but "postgresql:///dbname?host="
does.

I think this whole approach of using unescaped slashes in the "host"
part of the URL is going to cause lots of problems like this. We should
consider one or more of:

- Requiring percent escapes

- Requiring specifying the socket path as a parameter, like in the
above example

- Requiring some delimiters like for IPv6 addresses (which had the
same problem of reusing a reserved character) (probably a bad idea,
since we can't make existing URL parsing libraries understand
this)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-10 06:01:29
Message-ID: CA+TgmobxFMZR5bAns+eozJB2oxJxYS6-jkB4qf3tWmuust8wvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, May 9, 2012 at 2:17 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>    postgresql://user:password@/dbname
>
> In libpq, this is parsed as host='/dbname', no database.

That is flat wrong.

> - Requiring percent escapes

And this is, IMHO, the right fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alex <ash(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-10 20:15:38
Message-ID: 874nrn6bdh.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Peter Eisentraut <peter_e(at)gmx(dot)net> writes:

> I have been reviewing how our new libpq URL syntax compares against
> existing implementations of URL syntaxes in other drivers or
> higher-level access libraries. In the case of SQLAlchemy, there is an
> incompatibility regarding how Unix-domain sockets are specified.
>
> First, here is the documentation on that:
> http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
>
> The recommended way to access a server over a Unix-domain socket is to
> leave off the host, as in:
>
> postgresql://user:password@/dbname
>
> In libpq, this is parsed as host='/dbname', no database.

Ah, good catch: thanks for heads up.

I believe this was introduced lately in the dev cycle when we've noticed
that users will have to specify some defaults explicitly to be able to
override other defaults, while avoiding the whole "?keyword=value&..."
business.

I'll give this another look and will get back with a proposal to fix
this in form of a patch.

--
Regards,
Alex


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-12 09:32:31
Message-ID: CA+U5nMLjMu7XrG__-=03BWvrOZqR-Z=Y0K84ei2TNMi5eNPCGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 9 May 2012 19:17, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> I have been reviewing how our new libpq URL syntax compares against
> existing implementations of URL syntaxes in other drivers or
> higher-level access libraries.  In the case of SQLAlchemy, there is an
> incompatibility regarding how Unix-domain sockets are specified.

Is there an open standard that already defines this? If there is an
existing standard we should follow it, so we can quote "we now follow
standard X".

If there isn't one, can we create one? Can we propose an RFC that
works for many data stores?

If somebody can define that, I can push that through the relevant
processes. Not because I wish PostgreSQL syntax to be badged as a
standard, but because the world clearly needs a useful, open standard
here.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-12 20:19:39
Message-ID: 1336853979.578.17.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On lör, 2012-05-12 at 10:32 +0100, Simon Riggs wrote:
> On 9 May 2012 19:17, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>
> > I have been reviewing how our new libpq URL syntax compares against
> > existing implementations of URL syntaxes in other drivers or
> > higher-level access libraries. In the case of SQLAlchemy, there is
> an
> > incompatibility regarding how Unix-domain sockets are specified.
>
> Is there an open standard that already defines this?

As I wrote upthread, RFC 3986 is the latest version of the standard for
URIs. But it's a multileveled matter, because in the simplest instance,
a URI is

scheme:something

(compare mailto:), so in theory almost any URI can comply. But now that
I read it once again, since our "something" starts with "//", we are
bound to the more specific syntax defined there, and that makes our
current implementation just plain invalid on the matter that I
complained about in my earlier message.


From: Alex Shulgin <ash(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-14 15:16:19
Message-ID: 87bolq3i9o.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alex <ash(at)commandprompt(dot)com> writes:

> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>
>> I have been reviewing how our new libpq URL syntax compares against
>> existing implementations of URL syntaxes in other drivers or
>> higher-level access libraries. In the case of SQLAlchemy, there is an
>> incompatibility regarding how Unix-domain sockets are specified.
>>
>> First, here is the documentation on that:
>> http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
>>
>> The recommended way to access a server over a Unix-domain socket is to
>> leave off the host, as in:
>>
>> postgresql://user:password@/dbname
>>
>> In libpq, this is parsed as host='/dbname', no database.
>
> Ah, good catch: thanks for heads up.
>
> I believe this was introduced lately in the dev cycle when we've noticed
> that users will have to specify some defaults explicitly to be able to
> override other defaults, while avoiding the whole "?keyword=value&..."
> business.
>
> I'll give this another look and will get back with a proposal to fix
> this in form of a patch.

Upon closer inspection of the issue I came to believe that the proper
fix is to drop support for special treatment of "host part" starting
with slash altogether.

Attached is a patch to do that.

While the following type of URIs is still valid, the interpretation is
now different and is standards-conforming: the part after the
double-slash is treated as "path" specification and not authority
("host".)

postgres:///path-spec

Since the path from a URI translates into dbname connection option, the
only meaningful use of this type of URIs is the following:

postgres:///mydb

The host part in this case is empty (it is "hidden" between the "//" and
the following "/",) thus local socket connection is employed for this
type of URIs. To specify non-standard path to the local sockets
directory use the familiar URI parameter:

postgres:///db?host=/path/to/socket/dir

Also, if my reading of the RFC is correct, the username, password and
port specifiers may be omitted even if the corresponding designators are
present in URI, so we need to remove some checks on empty URI parts.

The test input and expected output files, the docs and code comments are
also updated, of course.

Finally, to complete the RFC compliance, I've added code to properly
handle percent-encoding in query parameter keywords.

At this point, I feel rather silly that we've produced and committed an
"almost" compliant version, which still requires quite a bit of
patching to become an RFC-conforming implementation...

--
Regards,
Alex

Attachment Content-Type Size
libpq-uri-rfc-compliance.patch application/octet-stream 22.0 KB

From: Alex <ash(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-25 09:11:15
Message-ID: 87likgsk0c.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Alex Shulgin <ash(at)commandprompt(dot)com> writes:
>
> Upon closer inspection of the issue I came to believe that the proper
> fix is to drop support for special treatment of "host part" starting
> with slash altogether.
>
> Attached is a patch to do that.

Well, I understand I might be asking for too much, but did anyone had a
chance to look at the correcting patch? We're having the first
CommitFest of 9.3 in three weeks, so this better be dealt with before
it's too late.

I believe the correcting patch makes our implementation comply to RFC
3986.

I can produce a patch against 9.1 for improved readability: the removal
of special handling of '/' at the start of URI created a bit of mess in
the correcting patch, so it might be easier to look at the combined
effect of the committed and this one.

Comments please?

--
Regards,
Alex


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Alex Shulgin <ash(at)commandprompt(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-28 19:50:00
Message-ID: 1338234600.24935.21.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote:
> Upon closer inspection of the issue I came to believe that the proper
> fix is to drop support for special treatment of "host part" starting
> with slash altogether.
>
> Attached is a patch to do that.

Committed.

I also updated the documentation and tests to show that percent-encoding
a host part starting with slash also works, as discussed upthread.


From: Alex <ash(at)commandprompt(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: libpq URL syntax vs SQLAlchemy
Date: 2012-05-29 04:27:21
Message-ID: 8762bf4no6.fsf@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Peter Eisentraut <peter_e(at)gmx(dot)net> writes:

> On mån, 2012-05-14 at 18:16 +0300, Alex Shulgin wrote:
>> Upon closer inspection of the issue I came to believe that the proper
>> fix is to drop support for special treatment of "host part" starting
>> with slash altogether.
>>
>> Attached is a patch to do that.
>
> Committed.

Many thanks!

> I also updated the documentation and tests to show that percent-encoding
> a host part starting with slash also works, as discussed upthread.

Yes, that's a side-effect, but still might be useful to know.

--
Regards,
Alex