Proof of concept: standalone backend with full FE/BE protocol

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Proof of concept: standalone backend with full FE/BE protocol
Date: 2012-09-03 00:23:11
Message-ID: 12511.1346631791@sss.pgh.pa.us
Lists: pgsql-hackers

Attached is a proof-of-concept patch for talking to a standalone backend
using libpq and a pair of pipes. It works, to the extent that psql and
pg_dump can run without any postmaster present:

$ psql regression
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
$ psql "standalone_datadir = $PGDATA dbname = regression"
psql (9.3devel)
Type "help" for help.

regression=> \d
                    List of relations
 Schema |            Name             |   Type   | Owner
--------+-----------------------------+----------+-------
 public | a                           | table    | tgl
 public | a_star                      | table    | tgl
 public | abstime_tbl                 | table    | tgl
 public | aggtest                     | table    | tgl
 public | array_index_op_test         | table    | tgl
...

but there's quite a bit of work to do yet before this could be a
committable patch. Some notes:

1. As you can see above, the feature is triggered by specifying the new
connection option "standalone_datadir", whose value must be the location
of the data directory. I also invented an option "standalone_backend",
which can be set to specify which postgres executable to launch. If the
latter isn't specified, libpq defaults to trying the installation PGBINDIR
that was selected by configure. (I don't think it can use the relative
path tricks we use in pg_ctl and elsewhere, since there's no good reason
to assume that it's running in a Postgres-supplied program.) I'm not
particularly wedded to these names or the syntax, but I think this is the
basic functionality we'd need.

2. As far as the backend is concerned, use of FE/BE protocol rather than
traditional standalone mode is triggered by writing "--child" instead of
"--single" as the first argument on the postgres command line. (I'm not
wedded to that name either ... anybody have a better idea?)

3. The bulk of the changes have to do with the fact that we need to keep
track of two file descriptors, not one. This was a bit tedious, but fairly
straightforward --- though I was surprised to find that send() and recv()
don't work on pipes, at least not on Linux (they fail with ENOTSOCK). You
have to use read() and write() instead.

4. As coded, the backend assumes the incoming pipe is on its FD 0 and the
outgoing pipe is on its FD 1. This made the command line simple but I'm
having second thoughts about it: if anything inside the backend tries to
read stdin or write stdout, unpleasant things will happen. It might be
better to not reassign the pipe FD numbers. In that case we'd have to
pass them on the command line, so that the syntax would be something
like "postgres --child 4,5 -D pgdata ...".

5. The fork/exec code is pretty primitive with respect to error handling.
I didn't put much time into it since I'm afraid we may need to refactor
it entirely before a Windows equivalent can be written. (And I need
somebody to write/test the Windows equivalent - any volunteers?)

6. I didn't bother with passing anything except -D and the database name
to the standalone backend. Probably we'd like to be able to pass other
command-line switches too. Again, it's not clear that it's worth doing
much here until we have equivalent Windows code available.

7. I haven't tried to make pg_upgrade use this yet.

8. PQcancel needs some work. It can't do what it does now (open a new
connection to the postmaster and send a cancel request), since there is no
postmaster, but it could do kill(conn->postgres_pid, SIGINT) instead, at
least on Unix. I have no idea what we'd do on Windows. This doesn't matter
for pg_upgrade of course, but it'd be important for manual use of this mode.
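
The observation in note 3 about pipes is easy to check in isolation. The
following is a minimal sketch, not code from the patch; the function names
send_errno_on_pipe and pipe_round_trip are made up for illustration. It
tries send() on the write end of a pipe and records the errno, then pushes
a short message through a pipe with write() and read().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try send() on the write end of a fresh pipe.
 * Returns the errno that send() reported, or 0 if send() succeeded.
 * (If pipe() itself fails, its errno is returned instead.)
 */
int
send_errno_on_pipe(void)
{
    int fds[2];
    int saved = 0;

    if (pipe(fds) != 0)
        return errno;
    if (send(fds[1], "x", 1, 0) < 0)
        saved = errno;          /* a pipe is not a socket */
    close(fds[0]);
    close(fds[1]);
    return saved;
}

/*
 * Push a short message through a pipe with write() and pull it back
 * with read().  Returns the number of bytes read back, or -1 on failure.
 * The message must be small enough to fit in the pipe buffer, since
 * nobody is draining the pipe while we write.
 */
ssize_t
pipe_round_trip(const char *msg, char *buf, size_t buflen)
{
    int     fds[2];
    ssize_t n = -1;
    size_t  len;

    assert(msg != NULL && buf != NULL);
    len = strlen(msg);

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) == (ssize_t) len)
        n = read(fds[0], buf, buflen);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

On Linux, send_errno_on_pipe() should come back with ENOTSOCK, while
pipe_round_trip("hello", buf, sizeof buf) should return 5 with the same
bytes in buf. That is the reason the I/O layer has to switch between
send()/recv() and write()/read() depending on what kind of descriptor it
was handed.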

Although the immediate use of this would be for pg_upgrade, I think that
people would soon drop the traditional --single mode and do anything they
need to do in standalone mode using this method, since psql is so vastly
more user-friendly than a --single backend.

In the longer run, this could provide a substitute for the "embedded
database" mode that we keep saying we're not going to implement. That is,
applications could fire up a standalone backend as a child process and not
need a postmaster anywhere, which would be a lot more convenient for an
app that wants a private database and doesn't want to involve its users in
managing a Postgres server. However, there are some additional things
we'd need to think about before advertising it as a fit solution for that.
Notably, while the lack of any background processes is just what you want
for pg_upgrade and disaster recovery, an ordinary application is probably
going to want to rely on autovacuum; and we need bgwriter and other
background processes for best performance. So I'm speculating about
having a postmaster process that isn't listening on any ports, but is
managing background processes in addition to a single child backend.
That's for another day though.
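
For readers who want to see the "child process on a pair of pipes"
arrangement concretely, here is a minimal sketch. It is not the patch's
fork/exec code: the ChildConn struct and the spawn_child and finish_child
names are hypothetical, and error handling is only slightly better than
none. It follows the layout described in note 4, with the incoming pipe on
the child's FD 0 and the outgoing pipe on its FD 1.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one child; not a libpq structure. */
typedef struct ChildConn
{
    pid_t pid;          /* child's PID, e.g. for kill(pid, SIGINT) */
    int   to_child;     /* parent writes requests here */
    int   from_child;   /* parent reads replies here */
} ChildConn;

/*
 * Launch argv[0] with its FD 0 and FD 1 attached to two pipes.
 * Assumes the parent's own FDs 0 and 1 are open, so that all four
 * pipe descriptors are greater than 1.  Returns 0 on success, -1 on error.
 */
int
spawn_child(ChildConn *conn, char *const argv[])
{
    int down[2];        /* parent -> child */
    int up[2];          /* child -> parent */

    assert(conn != NULL && argv != NULL && argv[0] != NULL);

    if (pipe(down) != 0)
        return -1;
    if (pipe(up) != 0)
    {
        close(down[0]);
        close(down[1]);
        return -1;
    }

    conn->pid = fork();
    if (conn->pid < 0)
    {
        close(down[0]);
        close(down[1]);
        close(up[0]);
        close(up[1]);
        return -1;
    }

    if (conn->pid == 0)
    {
        /* Child: move the pipe ends onto FD 0 and FD 1, then exec. */
        if (dup2(down[0], 0) < 0 || dup2(up[1], 1) < 0)
            _exit(127);
        close(down[0]);
        close(down[1]);
        close(up[0]);
        close(up[1]);
        execvp(argv[0], argv);
        _exit(127);             /* exec failed */
    }

    /* Parent: keep only its own ends of the two pipes. */
    close(down[0]);
    close(up[1]);
    conn->to_child = down[1];
    conn->from_child = up[0];
    return 0;
}

/*
 * Close both pipes (the child sees EOF on its FD 0) and reap the child.
 * Returns the child's exit status, or -1 if it did not exit normally.
 */
int
finish_child(ChildConn *conn)
{
    int status;

    close(conn->to_child);
    close(conn->from_child);
    if (waitpid(conn->pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A quick way to exercise this without a backend is to spawn "cat" as a
stand-in child: whatever the parent writes to to_child comes back on
from_child. Keeping the PID around is what would make the
kill(pid, SIGINT) approach to cancellation possible on Unix.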

Comments? Anyone want to have a go at fixing this for Windows?

regards, tom lane

Attachment: new-standalone-mode-1.patch (text/x-patch, 61.2 KB)
