libpq URL syntax vs SQLAlchemy

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: libpq URL syntax vs SQLAlchemy
Date: 2012-05-09 18:17:33
Message-ID: 1336587453.8747.6.camel@vanquo.pezone.net
Lists: pgsql-hackers

I have been reviewing how our new libpq URL syntax compares against
existing implementations of URL syntaxes in other drivers or
higher-level access libraries. In the case of SQLAlchemy, there is an
incompatibility regarding how Unix-domain sockets are specified.

First, here is the documentation on that:
http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html

The recommended way to access a server over a Unix-domain socket is to
leave off the host, as in:

postgresql://user:password@/dbname

In libpq, however, this is parsed as host='/dbname' with no database
name.

To specify a socket path in SQLAlchemy, you use:

postgresql://user:password@/dbname?host=/var/lib/postgresql

This also works in libpq (bizarrely, perhaps, considering the previous
case).
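
For comparison, here is a minimal sketch of how a generic URL parser
splits the two URLs above. It uses Python's urllib.parse; the helper
name describe() is made up for illustration, and this is neither
SQLAlchemy's nor libpq's actual parsing code.

```python
from urllib.parse import urlsplit, parse_qs


def describe(url):
    """Hypothetical helper: show how a generic parser splits a URL."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "host": parts.hostname,  # None when the authority has no host
        "path": parts.path,      # the part after the authority
        "params": parse_qs(parts.query),
    }


no_host = describe("postgresql://user:password@/dbname")
with_param = describe(
    "postgresql://user:password@/dbname?host=/var/lib/postgresql"
)

print(no_host)
print(with_param)
```

In both cases the generic parser reports no host and a path of
"/dbname"; the socket directory only shows up as the "host" query
parameter in the second URL.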

This libpq behavior is a problem for several reasons:

- It's incompatible with a popular existing implementation.

- It violates RFC 3986, which doesn't allow slashes in the
"authority" (host, port, user, password) part.

- As a consequence of this, URLs like this will be parsed differently
(or will fail to be parsed) by existing URL parsing libraries (I tried
Perl's URI module and Python's urllib, for instance).

- Moreover, if these libraries can't parse the URL, it might mean those
drivers can't adopt that URL syntax.

- It's internally inconsistent, as shown above.

- In most places in PostgreSQL clients, no host means Unix-domain
socket, but not here.

- It favors the case of non-default Unix-domain socket plus default
database over default Unix-domain socket plus non-default database.

- It's not obvious how to get to the default Unix-domain socket at all.
"postgresql:///dbname" doesn't work, but "postgresql:///dbname?host="
does.
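
The RFC 3986 point can be checked with the reference regular
expression from Appendix B of that RFC, in which the authority
component is matched by [^/?#]* and therefore ends at the first slash.
The sketch below wraps that expression in a small function; the
function name split_rfc3986() is an assumption made for this example.

```python
import re

# Reference regular expression from RFC 3986, Appendix B.
RFC3986 = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_rfc3986(url):
    """Hypothetical helper: split a URL into RFC 3986 components."""
    m = RFC3986.match(url)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),  # cannot contain "/", "?" or "#"
        "path": m.group(5),
        "query": m.group(7),      # None when there is no "?"
    }


print(split_rfc3986("postgresql://user:password@/dbname"))
print(split_rfc3986("postgresql:///dbname"))
```

Under this grammar, "/dbname" always belongs to the path, and
"postgresql:///dbname" has an empty authority, that is, no host.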

I think this whole approach of using unescaped slashes in the "host"
part of the URL is going to cause lots of problems like this. We should
consider one or more of:

- Requiring percent escapes

- Requiring specifying the socket path as a parameter, like in the
above example

- Requiring some delimiters, as for IPv6 addresses, which had the
same problem of reusing a reserved character. This is probably a bad
idea, since we can't make existing URL parsing libraries understand
such delimiters.
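
As a rough sketch of the first two options, the following shows how a
client could recover a socket directory from either a percent-escaped
host or a "host" parameter using only a stock URL parser. The function
socket_dir() and the example URLs are assumptions for illustration;
this is not how libpq behaves today.

```python
from urllib.parse import urlsplit, parse_qs, unquote


def socket_dir(url):
    """Hypothetical helper: return a Unix-domain socket directory, or None."""
    parts = urlsplit(url)

    # Option 1: percent-escaped path in the host position.
    # Drop any userinfo and port, then decode the escapes.
    hostinfo = parts.netloc.rpartition("@")[2]
    host = unquote(hostinfo.partition(":")[0])
    if host.startswith("/"):
        return host

    # Option 2: socket path given as the "host" query parameter.
    values = parse_qs(parts.query).get("host", [])
    if values and values[0].startswith("/"):
        return values[0]

    return None


print(socket_dir("postgresql://user@%2Fvar%2Flib%2Fpostgresql/dbname"))
print(socket_dir("postgresql://user@/dbname?host=/var/lib/postgresql"))
print(socket_dir("postgresql://user@localhost/dbname"))
```

Both forms keep slashes out of the authority part, so an existing URL
parsing library can split them without special cases.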
