Quick Links

Re: Restart after power outage: createdb

Lists:	pgsql-generalpgsql-hackers

From:	Jon Lapham <lapham(at)jandr(dot)org>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Restart after power outage: createdb
Date:	2006-09-26 11:06:21
Message-ID:	451909AD.4080503@jandr.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

While trying to reproduce power outage restart problem I reported
earlier, I have found something odd when pulling the plug on a VM during
a createdb operation.

If I run...
sleep 3; echo starting; createdb bar
...and power off the VM while the "createdb bar" is running.

Upon restart, about 50% of the time I can reproduce the following error
message:

[lapham(at)localhost ~]$ psql bar
psql: FATAL: database "bar" does not exist
[lapham(at)localhost ~]$ createdb bar
createdb: database creation failed: ERROR: could not create directory
"base/65536": File exists
[lapham(at)localhost ~]$ createdb bar
CREATE DATABASE

So, running "createdb bar" a second time works. This is probably not a
bug, but I thought I'd mention it anyway as it could potentially cause
problems for people running scripts.

--
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---
Jon Lapham <lapham(at)jandr(dot)org> Rio de Janeiro, Brasil
Personal: http://www.jandr.org/
***-*--*----*-------*------------*--------------------*---------------

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	lapham(at)jandr(dot)org
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Restart after power outage: createdb
Date:	2006-09-27 05:34:06
Message-ID:	B7B1E607-38F5-4274-9B0F-7C25BDB3541B@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

On Sep 26, 2006, at 7:06 AM, Jon Lapham wrote:
> While trying to reproduce power outage restart problem I reported
> earlier, I have found something odd when pulling the plug on a VM
> during a createdb operation.
>
> If I run...
> sleep 3; echo starting; createdb bar
> ...and power off the VM while the "createdb bar" is running.
>
> Upon restart, about 50% of the time I can reproduce the following
> error message:
>
> [lapham(at)localhost ~]$ psql bar
> psql: FATAL: database "bar" does not exist
> [lapham(at)localhost ~]$ createdb bar
> createdb: database creation failed: ERROR: could not create
> directory "base/65536": File exists
> [lapham(at)localhost ~]$ createdb bar
> CREATE DATABASE
>
> So, running "createdb bar" a second time works. This is probably
> not a bug, but I thought I'd mention it anyway as it could
> potentially cause problems for people running scripts.

Could this be a timing issue? IE: are you just issuing the first
createdb bar too soon after the database has started?

I suspect this is a bug...
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Jon Lapham <lapham(at)jandr(dot)org>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Restart after power outage: createdb
Date:	2006-09-27 10:24:53
Message-ID:	451A5175.5060106@jandr.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Jim Nasby wrote:
>> [lapham(at)localhost ~]$ psql bar
>> psql: FATAL: database "bar" does not exist
>> [lapham(at)localhost ~]$ createdb bar
>> createdb: database creation failed: ERROR: could not create directory
>> "base/65536": File exists
>> [lapham(at)localhost ~]$ createdb bar
>
> Could this be a timing issue? IE: are you just issuing the first
> createdb bar too soon after the database has started?
>
> I suspect this is a bug...

Probably not too soon, since postgresql is started by init.d during boot
and it is probably a minute or so after that before I can get logged in,
open a shell, issue the commands, etc.

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	lapham(at)jandr(dot)org
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Restart after power outage: createdb
Date:	2006-09-27 14:18:58
Message-ID:	27C72FDE-D289-475B-BA77-3C4A73D3A4F4@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

On Sep 27, 2006, at 6:24 AM, Jon Lapham wrote:
> Jim Nasby wrote:
>>> [lapham(at)localhost ~]$ psql bar
>>> psql: FATAL: database "bar" does not exist
>>> [lapham(at)localhost ~]$ createdb bar
>>> createdb: database creation failed: ERROR: could not create
>>> directory "base/65536": File exists
>>> [lapham(at)localhost ~]$ createdb bar
>> Could this be a timing issue? IE: are you just issuing the first
>> createdb bar too soon after the database has started?
>> I suspect this is a bug...
>
> Probably not too soon, since postgresql is started by init.d during
> boot and it is probably a minute or so after that before I can get
> logged in, open a shell, issue the commands, etc.

My brain is foggy on exactly what rules are used to clean $PGDATA up
after a crash, but I'm pretty sure it's not supposed to work like
that. If you can re-test with 8.2RC1 and still reproduce it (please
watch the logs to make sure crash recovery is complete before you try
the createdb), I'd ask on pgsql-bugs.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	lapham(at)jandr(dot)org
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [GENERAL] Restart after power outage: createdb
Date:	2006-09-27 20:13:34
Message-ID:	6199.1159388014@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

Jon Lapham <lapham(at)jandr(dot)org> writes in pgsql-general:
> If I run...
> sleep 3; echo starting; createdb bar
> ...and power off the VM while the "createdb bar" is running.

> Upon restart, about 50% of the time I can reproduce the following error
> message:

> [lapham(at)localhost ~]$ psql bar
> psql: FATAL: database "bar" does not exist
> [lapham(at)localhost ~]$ createdb bar
> createdb: database creation failed: ERROR: could not create directory
> "base/65536": File exists

What apparently is happening here is that the same OID has been assigned
to the new database both times. Even though the createdb didn't
complete, the directory it started to build is there and so there's a
filename collision.

> So, running "createdb bar" a second time works.

Yeah, because the OID counter has been advanced, and so the second
createdb uses a nonconflicting OID.

In theory this scenario should not happen, because a crash-and-restart
is supposed to guarantee that the OID counter comes up at or beyond
where it was before the crash.

After thinking about it for awhile, I believe the problem is that
CREATE DATABASE is breaking the "WAL rule": it's allowing a data change
(specifically, creation of the new DB subdirectory) to hit disk without
having guaranteed that associated WAL entries were flushed first.
Specifically, if we generated an XLOG_NEXTOID WAL entry to record the
consumption of an OID for the database, there isn't anything ensuring
that record gets to disk before the mkdir occurs. (ie, the comment in
XLogPutNextOid is correct as far as it goes, but it fails to account
for outside-the-database effects such as creation of a directory named
after the OID.) Hence after restart the OID counter might not get
advanced as far as it should have been.

We could fix this two different ways:

1. Put an XLogFlush into createdb() somewhere between making the
pg_database entry and starting to create subdirectories.

2. Check for conflicting database directories while assigning the OID,
comparable to what GetNewRelFileNode() does for table files.

#2 has some appeal because it could deal with random junk in
$PGDATA/base regardless of how the junk got there. However, to do that
in a really bulletproof way we'd have to check all the tablespace
directories too, and that's starting to get a tad tedious for something
that shouldn't happen anyway.

So I'm leaning to #1 as a suitably low-effort fix. Thoughts?

regards, tom lane

From:	"Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	lapham(at)jandr(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [GENERAL] Restart after power outage: createdb
Date:	2006-09-27 20:44:36
Message-ID:	20060927204436.GU19827@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

On Wed, Sep 27, 2006 at 04:13:34PM -0400, Tom Lane wrote:
> Jon Lapham <lapham(at)jandr(dot)org> writes in pgsql-general:
> > If I run...
> > sleep 3; echo starting; createdb bar
> > ...and power off the VM while the "createdb bar" is running.
>
> > Upon restart, about 50% of the time I can reproduce the following error
> > message:
>
> > [lapham(at)localhost ~]$ psql bar
> > psql: FATAL: database "bar" does not exist
> > [lapham(at)localhost ~]$ createdb bar
> > createdb: database creation failed: ERROR: could not create directory
> > "base/65536": File exists
>
> What apparently is happening here is that the same OID has been assigned
> to the new database both times. Even though the createdb didn't
> complete, the directory it started to build is there and so there's a
> filename collision.
>
> > So, running "createdb bar" a second time works.
>
> Yeah, because the OID counter has been advanced, and so the second
> createdb uses a nonconflicting OID.
>
> In theory this scenario should not happen, because a crash-and-restart
> is supposed to guarantee that the OID counter comes up at or beyond
> where it was before the crash.
>
> After thinking about it for awhile, I believe the problem is that
> CREATE DATABASE is breaking the "WAL rule": it's allowing a data change
> (specifically, creation of the new DB subdirectory) to hit disk without
> having guaranteed that associated WAL entries were flushed first.
> Specifically, if we generated an XLOG_NEXTOID WAL entry to record the
> consumption of an OID for the database, there isn't anything ensuring
> that record gets to disk before the mkdir occurs. (ie, the comment in
> XLogPutNextOid is correct as far as it goes, but it fails to account
> for outside-the-database effects such as creation of a directory named
> after the OID.) Hence after restart the OID counter might not get
> advanced as far as it should have been.
>
> We could fix this two different ways:
>
> 1. Put an XLogFlush into createdb() somewhere between making the
> pg_database entry and starting to create subdirectories.
>
> 2. Check for conflicting database directories while assigning the OID,
> comparable to what GetNewRelFileNode() does for table files.
>
> #2 has some appeal because it could deal with random junk in
> $PGDATA/base regardless of how the junk got there. However, to do that
> in a really bulletproof way we'd have to check all the tablespace
> directories too, and that's starting to get a tad tedious for something
> that shouldn't happen anyway.
>
> So I'm leaning to #1 as a suitably low-effort fix. Thoughts?

It'd be nice to clean things up, but I understand the reluctance to do
so. Maybe a good compromise would be to warn about files that are
present in $PGDATA but don't show up in any catalogs.

Then again, if we're doing that, we could probably just nuke 'em...
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc:	lapham(at)jandr(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [GENERAL] Restart after power outage: createdb
Date:	2006-09-27 20:52:51
Message-ID:	6579.1159390371@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> Then again, if we're doing that, we could probably just nuke 'em...

This has been considered and rejected before, on the grounds that
removing files you don't know the source of is a good way to lose data.

Come to think of it, that argument bears on the immediate problem too.
The way createdb() is coded, if it gets a failure (like "File exists")
trying to create the database's directories, it will attempt to apply
remove_dbtablespaces() to clean up after itself. This would result in
removing the pre-existing directory, which violates the principle of
not removing unexpected files. So now I'm starting to think we do need
a check-for-conflicting-files step in createdb.

regards, tom lane

From:	"Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	lapham(at)jandr(dot)org, pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: [GENERAL] Restart after power outage: createdb
Date:	2006-09-27 21:36:43
Message-ID:	20060927213643.GZ19827@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general pgsql-hackers

On Wed, Sep 27, 2006 at 04:52:51PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > Then again, if we're doing that, we could probably just nuke 'em...
>
> This has been considered and rejected before, on the grounds that
> removing files you don't know the source of is a good way to lose data.
>
> Come to think of it, that argument bears on the immediate problem too.
> The way createdb() is coded, if it gets a failure (like "File exists")
> trying to create the database's directories, it will attempt to apply
> remove_dbtablespaces() to clean up after itself. This would result in
> removing the pre-existing directory, which violates the principle of
> not removing unexpected files. So now I'm starting to think we do need
> a check-for-conflicting-files step in createdb.

I think it would be really useful to tell the DBA that there's a bunch
of files in $PGDATA that are probably dead. If stuff had suddenly
disappeared out of the catalog I'd certainly like to know it.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)