Re: BUG #5395: UPDATE on shutdown overwrites table

Lists: pgsql-bugs
From: "Lothar Bongartz" <lotharbongartz(at)hotmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5395: UPDATE on shutdown overwrites table
Date: 2010-03-30 15:27:45
Message-ID: 201003301527.o2UFRj8e022515@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5395
Logged by: Lothar Bongartz
Email address: lotharbongartz(at)hotmail(dot)com
PostgreSQL version: 8.4.3
Operating system: Windows Server 2003 R2
Description: UPDATE on shutdown overwrites table
Details:

I have ported a community with about one million members from MS SQL to
Postgres.
While the overall performance is comparable, I have noticed, that
conditional bulk writing can be extremely slow in Postgres:
SELECT ... INTO ... WHERE
UPDATE ... FROM ... WHERE
For this reason the database is stalling from time to time and I have to
restart the server. For the second time I have detected, that Postgres
overwrites a table when shutting down. The table <onlineinfo> is only
updated with NOW() and only for a single matching row:
UPDATE onlineinfo SET date_end=NOW() WHERE memb_id=v_id
When this command is executed while Postgres shuts down, all rows contain a
"random" date like '2007-06-25' in the <date_end> column afterwards.
To repair this, I have created a table from a backup:
CREATE TEMP TABLE temp_onlineinfo_bak
(
memb_id integer NOT NULL,
date_end timestamp NOT NULL
)
After filling the temporary table from the backup, I use it to repair the
overwritten table:
UPDATE onlineinfo SET date_end=B.date_end
FROM onlineinfo O, temp_onlineinfo_bak B
WHERE O.date_end<'2010-03-21' AND B.memb_id=O.memb_id;
The WHERE condition is needed to prevent overwriting rows, which are updated
in the meantime.

Unfortunately this command needs several hours to complete, since this is
again a conditional bulk writing.

I tried all kind of variations like removing the index on
<onlineinfo><date_end> before the update or setting up an index on
<temp_onlineinfo_bak><memb_id>

Nothing helps. These are the data of the table:
CREATE TABLE onlineinfo
(
memb_id integer NOT NULL,
log_count integer NOT NULL,
date_end timestamp NOT NULL,
ip integer NOT NULL,
CONSTRAINT onlineinfo_pkey PRIMARY KEY (memb_id),
CONSTRAINT fk__onlineinf__memb___55009f39 FOREIGN KEY (memb_id)
REFERENCES member (memb_id) MATCH FULL
ON UPDATE NO ACTION ON DELETE NO ACTION
)
WITH (
OIDS=FALSE
);
ALTER TABLE onlineinfo OWNER TO postgres;

-- Index: onlineinfo_date_end

-- DROP INDEX onlineinfo_date_end;

CREATE INDEX onlineinfo_date_end
ON onlineinfo
USING btree
(date_end);


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Lothar Bongartz" <lotharbongartz(at)hotmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5395: UPDATE on shutdown overwrites table
Date: 2010-03-30 17:47:39
Message-ID: 26181.1269971259@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Lothar Bongartz" <lotharbongartz(at)hotmail(dot)com> writes:
> For this reason the database is stalling from time to time and I have to
> restart the server. For the second time I have detected, that Postgres
> overwrites a table when shutting down. The table <onlineinfo> is only
> updated with NOW() and only for a single matching row:
> UPDATE onlineinfo SET date_end=NOW() WHERE memb_id=v_id
> When this command is executed while Postgres shuts down, all rows contain a
> "random" date like '2007-06-25' in the <date_end> column afterwards.

This is quite hard to believe. Can you provide a reproducible test
case?

I have seen cases where someone wrote what he thought was a single-row
update, but it turned out to be a whole-table update because the WHERE
clause actually reduced to constant TRUE. Your reference to "v_id"
makes me think that you are issuing this query inside a plpgsql
function. One of the common ways to shoot yourself in the foot like
that is to be careless about whether a name could match both a table
column and a plpgsql variable or parameter; could that have happened in
your situation?

BTW, there are easier ways to cancel a single query than restarting the
whole server.

regards, tom lane


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Lothar Bongartz <lotharbongartz(at)hotmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5395: UPDATE on shutdown overwrites table
Date: 2010-03-31 02:28:42
Message-ID: 4BB2B35A.4080804@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 30/03/2010 11:27 PM, Lothar Bongartz wrote:

> For this reason the database is stalling from time to time and I have to
> restart the server.

Are those stalls accompanied by any mention of checkpoints in the logs?

The default PostgreSQL configuration isn't tuned for larger workloads -
it's very minimalist. You probably need to increase your
checkpoint_segments . It may also be a good idea to look at shared_buffers .

Before just re-starting the server, you should look at `iostat',
`vmstat', `top', `iotop' or other suitable performance monitoring tools
and see if the server's actually DOING anything. You'll probably find
it's busy with disk I/O and not really stalled at all.

--
Craig Ringer