Re: Hard to Use WAS: Hard limit on WAL space

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Daniel Farina <daniel(at)heroku(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hard to Use WAS: Hard limit on WAL space
Date: 2013-06-15 04:43:10
Message-ID: 51BBF0DE.3050906@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 06/15/2013 02:16 AM, Josh Berkus wrote:
> On 06/12/2013 02:03 PM, Joshua D. Drake wrote:
>> What concerns me is we seem to be trying to make this "easy". It isn't
>> supposed to be easy. This is hard stuff. Smart people built it and it
>> takes a smart person to run it. When did it become a bad thing to be
>> something that smart people need to run?
> 1997, last I checked.
>
> Our unofficial motto: "PostgreSQL: making very hard things possible, and
> simple things hard."
>
> It *is* hard. But that's because we've *made* it hard to understand and
> manage, not because the problem is inherently hard.

I have to agree with all this... Pg has some of the best docs around, a
really nice SQL level interface, and some truly shocking usability
outside that nice zone.

Once a user steps into the "admin zone" they're confronted with a lot of
settings they'll really struggle to understand and manage.

I don't want this to be used as an argument not to commit early stages
of work, though. I think iterative development with exposure to
real-world testing and experience is necessary when you're getting to
the complexity of things that are now going in to Pg. It's more that
"commited" != "done"; right now, once its usable at that power-user
stage further management and improvement gets farmed out to external
tools and the usability of the core feature stays rather ... rough.

Some examples:

fsync=off
------------

We have a giant foot-cannon in the config files, "fsync" with the "off"
option neatly documented alongside all the others. No note saying
"setting fsync=off is equivalent to setting yes_you_can_eat_my_data=on".
No WARNING in the logs, not that a user who'd set that without
understanding it would look at the logs. The fsync section of
http://www.postgresql.org/docs/current/static/runtime-config-wal.html
<http://www.postgresql.org/docs/9.2/static/runtime-config-wal.html> is
ok, though it could do with a more prominent warning... but the user
needs to know where to look. I've repeatedly asked to change this -
usually after yet another user comes onto -general with data loss due to
not understanding fsync=off - and haven't been able to get agreement on
even a config file comment.

Proposed fix 9.3, config file comment saying "Warning, fsync=off may
cause data loss, see the user manual."

Proposed fix 9.4+: Remove fsync=off from docs. Leave the GUC enum there
but have the postmaster FATAL when it sees it with a message saying
"fsync=off has been replaced with unsafe_writes=on, please change your
postgresql.conf". Add the corresponding new GUC.

max_connections
------------------------

max_connections is another one. I see systems with max_connections=3000
in the wild... performing terribly, as you'd expect. Yet there's no
indication (even in the docs) that this is often a terrible idea, and
that you should really look into a connection pooler if you're going
above a few hundred (hardware/workload dependent).
http://www.postgresql.org/docs/current/static/runtime-config-connection.html
<http://www.postgresql.org/docs/9.1/static/runtime-config-connection.html>
doesn't mention it, there's no config file comment, etc.

Proposed fix: Comment in the config file saying something like "See the
documentation before raising this above a few hundred". In the docs, a
note about the perf impact of high max_connections with a brief mention
of external connection pooling and links to pgbouncer/pgpool-II, mention
that many app frameworks have built-in connection pools. Brief comment
about there being an optimum workload-and-hardware dependent level of
concurrency above which performance degrades. I'll be happy to write a
draft patch for this if there's agreement on the idea.

vacuum/autovacuum
---------------------------

autovaccum tuning. We've just had this conversation and there seems to
be agreement that it needs some love, but unlike the above two there's
no easy fix and it's an ongoing process. I don't have any right to
complain about it unless I do more to help fix it.

Bloat
------

Table bloat. Table bloat has been a major issue with PostgreSQL
users/admins for years. Anyone care to explain to me in a simple
paragraph how to find out if you have table or index bloat issues in
your database and what to do about it? (Maybe we need
"pg_catalog.pg_index_bloat" and "pg_catalog.pg_table_bloat" views
including FILLFACTOR correction?)

I think I'll draft up a patch to add exactly that.

Dump/restore and globals
----------------------------------

Dump and restore. The standard advice I give is to do a "pg_dumpall
--globals-only" followed by a "pg_dump -Fc" of each database, since we
don't have "pg_dumpfall -Fc". Users often seem to do single-DB dumps
then find themselves having trouble restoring them due to missing user
accounts, etc. Or they do a pg_dumpall then want to restore just one
DB/table.

There's also a lot of confusion around restoring dumps due to the
different formats. This has improved now that pg_restore tells the user
to restore a SQL dump using psql:

$ pg_restore regress.sql
pg_restore: [archiver] input file appears to be a text format dump.
Please use psql.

... though psql still chokes horribly on a pg_dump -Fc file:

psql:regress.out:1: ERROR: syntax error at or near "PGDMP"
LINE 1: PGDMP^A^L^A^A^A^A^A^A^A^AREVOKE ALL ON SCHEMA public FROM postgres;
^
psql:regress.out:2: WARNING: no privileges were granted for "public"
GRANT

Proposed fix: Should we have a pg_dumpall that produces per-database -Fc
or -Fd output? Or perhaps --include-roles / --include-tablespaces
options to pg_dump that stashes a pg_dumpall --globals-only inside the
-Fc archive?

Proposed fix : If psql sees the pseudo-command PGDMP it should quit
immediately with an error saying "This is a PostgreSQL custom format
dump file and must be restored with the pg_restore command". (Happy to
try to come up with a patch for this).

Proposed fix: Instead of just telling the user to run psql, pg_restore
should, if there are no selective restore options, propose a psql
command. Or even just invoke psql, though I'm hesitant about that
because of issues where the psql on the PATH is one version and the user
runs /path/to/pg_restore for another version. Or, if we built a full
path using $0, the case where pg_restore is being run from within the
source tree so there's no psql in the same directory.

pg_hba.conf
----------------

The #1 question I see on Stack Overflow has to be confusion about
pg_hba.conf, mostly from people who have no idea it exists, don't
understand how to configure it, etc. They can't tell the difference
between peer/ident/trust, don't understand that if you set 'md5' but
don't set a password then the password will always be wrong, etc.

I list this last because I think Pg's client authentication is well
documented, it lacks obvious foot-guns, and it's really not that hard. I
have little sympathy for people who respond to a docs link with "I don't
have time for that, I'm not a DBA, can you just tell me what I need to
change?". Authentication and authorization isn't simple, and attempts to
make it too simple usually also make it wrong. At some point I want to
think about how to make it easier to manage Pg's auth, but I know there
are some big complexities around it because there's no DB available at
the time pg_hba.conf checking is done on an incoming connection so not
even a shared/global table may be read from.

Proposed fix: None required at this time.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Brendan Jurd 2013-06-15 05:39:41 Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
Previous Message Craig Ringer 2013-06-15 04:03:31 Re: MD5 aggregate