Re: Avoid MVCC using exclusive lock possible?

Lists: pgsql-hackers
From: "Stephen" <private(at)xxxxxxx(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Avoid MVCC using exclusive lock possible?
Date: 2004-02-24 16:43:52
Message-ID: r4L_b.49293$AE3.13622@nntp-post.primus.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Recently, I ran a huge update on an Integer column affecting 100 million
rows in my database. What happened was my disk space increased in size and
my IO load was very high. It appears that MVCC wants to rewrite each row
(each row was about 5kB due to a bytea column). In addition, VACUUM needs to
run to recover space eating up even more IO bandwidth.

It came to my mind that what if there could be a mechanism in place to allow
overwriting portions of the same row *whenever possible* instead of creating
a new row as MVCC would require. This would work well for timestamp, char,
integer, float, boolean columns etc..

A user must explicitly call:

EXCLUSIVE LOCK ON TABLE
UPDATE ROWs
RELEASE LOCK ON TABLE.

It basically immitates the behavior of MySQL. Surely, this would be faster
than recreating the new row and marking the old one as invalid at the
expense of locking the table. MySQL users can then use Postgres and get
similar performance simply by locking the table first.

It probably works well when the transaction volume is low, when you need a
quick counter, when your IO bandwidth is saturated or when you want to avoid
VACUUMing after a massive update.

Any thoughts?


From: Jon Jensen <jon(at)endpoint(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-02-28 21:21:05
Message-ID: Pine.LNX.4.58.0402282119440.13411@louche.swelter.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 24 Feb 2004, Stephen wrote:

> It came to my mind that what if there could be a mechanism in place to allow
> overwriting portions of the same row *whenever possible* instead of creating
> a new row as MVCC would require.

How would you do a rollback if an error occurred mid-query? How would you
keep your table from being a big pile of junk if a power failure happened
during the query?

Jon


From: Neil Conway <neilc(at)samurai(dot)com>
To: Jon Jensen <jon(at)endpoint(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-02-29 14:15:57
Message-ID: 4041F41D.4050308@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jon Jensen wrote:
> How would you do a rollback if an error occurred mid-query? How would you
> keep your table from being a big pile of junk if a power failure happened
> during the query?

As most non-MVCC database do: by writing WAL records.

In theory, it seems to me that we could implement an overwriting
storage manager for PostgreSQL (i.e. we'd have both MVCC and non-MVCC,
and allow users to choose at runtime). It would require a *lot* of
work, but I can't see any fundamental reason why it wouldn't be possible.

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Jon Jensen <jon(at)endpoint(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-02-29 15:43:34
Message-ID: 15047.1078069414@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Jon Jensen wrote:
>> How would you do a rollback if an error occurred mid-query? How would you
>> keep your table from being a big pile of junk if a power failure happened
>> during the query?

> As most non-MVCC database do: by writing WAL records.

> In theory, it seems to me that we could implement an overwriting
> storage manager for PostgreSQL (i.e. we'd have both MVCC and non-MVCC,
> and allow users to choose at runtime). It would require a *lot* of
> work, but I can't see any fundamental reason why it wouldn't be possible.

It would be possible, but what's the point? There's no evidence that
this approach is superior to what we do, and a fair number of reasons
to think it's inferior.

In particular, with this approach you have to maintain the entire
history of a transaction in WAL, so that you have the info available to
roll back if it aborts. So the OP's concern about needing 2X the disk
space to update his table still applies, it's just in a different place.
It's true that it's easier to get the space back when no longer needed
--- VACUUM FULL is expensive if it's moving lots of records. But in
general I think our VACUUM-based approach is superior to the
Oracle-style UNDO approach, because it pushes the maintenance overhead
out of foreground transaction processing and into a schedulable
background process. Certainly any Oracle DBA will tell you that huge
transactions are a headache to handle in Oracle.

I think what the OP actually wanted us to offer was non-transaction-safe
overwrite-in-place updating. That might be technically feasible (not
sure how we'd deal with changes in indexed columns though). But it's so
contrary to the basic design goals of the project that I don't think
anyone will take the suggestion seriously.

regards, tom lane


From: Shridhar Daithankar <shridhar(at)frodo(dot)hserus(dot)net>
To: "Stephen" <private(at)xxxxxxx(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-03-01 07:14:22
Message-ID: 200403011244.22899.shridhar@frodo.hserus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 24 February 2004 22:13, Stephen wrote:
> Hi,
>
> Recently, I ran a huge update on an Integer column affecting 100 million
> rows in my database. What happened was my disk space increased in size and
> my IO load was very high. It appears that MVCC wants to rewrite each row
> (each row was about 5kB due to a bytea column). In addition, VACUUM needs
> to run to recover space eating up even more IO bandwidth.

I am sure people have answered the approach you have suggested so let me
suggest a workaround for your problem.

You could run following in a transaction.

- begin
- Create another table with exact same structure
- write a procedure that reads from input table and updates the value in
between
- drop the original table
- rename new table to old one
- commit
- analyze new table

Except for increased disk space, this approach has all the good things
postgresql offers. Especially using transactable DDLs it is huge benefit. You
certainly do save on vacuum.

If the entire table is updated then you can almost certainly get things done
faster this way.

HTH

Shridhar


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-03-01 13:54:10
Message-ID: 20040301135410.GA8345@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 29, 2004 at 10:43:34AM -0500, Tom Lane wrote:
> general I think our VACUUM-based approach is superior to the
> Oracle-style UNDO approach, because it pushes the maintenance overhead
> out of foreground transaction processing and into a schedulable
> background process. Certainly any Oracle DBA will tell you that huge

I completely agree with this. If the recent work on lowering the
overall cost ov VACUUM on loaded systems pays off, then I think there
can be no argument that the work-now, vacuum-later strategy is the
best approach, simply because it deals with the outlying and
unexpected cases better than the alternatives. I know too many
people who have been burned by running out of rollback segments when
some use pattern emerged that they hadn't planned for.

A
--
Andrew Sullivan


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: <shridhar(at)frodo(dot)hserus(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-03-02 00:27:51
Message-ID: 003601c3ffed$34cdcb20$5baa87d9@LaptopDellXP
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>Shridhar Daithankar
> > Recently, I ran a huge update on an Integer column affecting 100
million
> > rows in my database. What happened was my disk space increased in
size
> and
> > my IO load was very high. It appears that MVCC wants to rewrite each
row
> > (each row was about 5kB due to a bytea column). In addition, VACUUM
> needs
> > to run to recover space eating up even more IO bandwidth.
>
> I am sure people have answered the approach you have suggested so let
me
> suggest a workaround for your problem.
>
> You could run following in a transaction.
>
> - begin
> - Create another table with exact same structure
> - write a procedure that reads from input table and updates the value
in
> between
> - drop the original table
> - rename new table to old one
> - commit
> - analyze new table
>
> Except for increased disk space, this approach has all the good things
> postgresql offers. Especially using transactable DDLs it is huge
benefit.
> You
> certainly do save on vacuum.
>
> If the entire table is updated then you can almost certainly get
things
> done
> faster this way.

I second this approach. Space usage similar, time faster.

For a procedure, you can use an Insert Select with a Case statement
instead of the UPDATE step.

This type of approach is in use in many other DBMS. I would vote
strongly (how is that possible?) against doing anything at all to MVCC.
It works, it's brilliant and doesn't fall foul of anybody else's patent
rights. It would be very hard to re-write anything as good and the time
spent would double other requirements. MySQL has multiple storage
managers and that must slow them down enormously trying to test or even
fine tune things.

Best Regards, Simon Riggs


From: Paul Tillotson <pntil(at)shentel(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-03-02 00:59:27
Message-ID: 4043DC6F.80001@shentel.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I use this type of approach when mirroring data from a foxpro database
(yuck) to a read-only postgres database. It is quicker and cleaner than
deleting all of the rows and inserting them again (TRUNCATE is not
transaction safe, which I need).

However, for this to be useful, your table must not have any indexes,
views, foreign keys, sequences, triggers, etc., or else you must be
prepared to re-create all of them using application level code.

I imagine this would break lots of things, but it would be nice if
instead of Shridhar's rename step (see below) one could do this:

$table1node = query("SELECT relfilenode FROM pg_class WHERE relname =
'$old_table';");
$table2node = query("SELECT relfilenode FROM pg_class WHERE relname =
'$new_table';");
exec("UPDATE pg_class SET relfilenode = $table2node WHERE relname =
'$old_table';");
exec("UPDATE pg_class SET relfilenode = $table1node WHERE relname =
'$new_table';");

You would of course need to change the relfilenode for all of the
toasted columns and indexes as well in the same atomic step, but it
seems like this might be more compatible with postgresql's MVCC model
than other ideas suggested.

Regards,
Paul Tillotson

Shridhar Daithankar wrote:

>I am sure people have answered the approach you have suggested so let me
>suggest a workaround for your problem.
>
>You could run following in a transaction.
>
>- begin
>- Create another table with exact same structure
>- write a procedure that reads from input table and updates the value in
>between
>- drop the original table
>- rename new table to old one
>- commit
>- analyze new table
>
>Except for increased disk space, this approach has all the good things
>postgresql offers. Especially using transactable DDLs it is huge benefit. You
>certainly do save on vacuum.
>
>If the entire table is updated then you can almost certainly get things done
>faster this way.
>
>HTH
>
> Shridhar
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
>
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paul Tillotson <pntil(at)shentel(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-03-02 01:30:39
Message-ID: 26727.1078191039@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Paul Tillotson <pntil(at)shentel(dot)net> writes:
> I use this type of approach when mirroring data from a foxpro database
> (yuck) to a read-only postgres database. It is quicker and cleaner than
> deleting all of the rows and inserting them again (TRUNCATE is not
> transaction safe, which I need).

Just FYI, TRUNCATE *is* transaction safe as of 7.4.

regards, tom lane


From: Shridhar Daithankar <shridhar(at)frodo(dot)hserus(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-03-02 06:58:22
Message-ID: 200403021228.22731.shridhar@frodo.hserus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 02 March 2004 06:29, Paul Tillotson wrote:
> However, for this to be useful, your table must not have any indexes,
> views, foreign keys, sequences, triggers, etc., or else you must be
> prepared to re-create all of them using application level code.

Which isn't a big deal is it? You can write a single function to create entire
object and it's dependency. It is one time job but can save lots of time at
runtime.

> I imagine this would break lots of things, but it would be nice if
> instead of Shridhar's rename step (see below) one could do this:
>
> $table1node = query("SELECT relfilenode FROM pg_class WHERE relname =
> '$old_table';");
> $table2node = query("SELECT relfilenode FROM pg_class WHERE relname =
> '$new_table';");
> exec("UPDATE pg_class SET relfilenode = $table2node WHERE relname =
> '$old_table';");
> exec("UPDATE pg_class SET relfilenode = $table1node WHERE relname =
> '$new_table';");
>
> You would of course need to change the relfilenode for all of the
> toasted columns and indexes as well in the same atomic step, but it
> seems like this might be more compatible with postgresql's MVCC model
> than other ideas suggested.

I still don't understand what is not so good about rename. All the indexes
remain there. Views need updation, I agree. Missed that last time. But what
you are suggesting is a part of rename if not complete of it.

I would always prefer to let PG handle these kind of details. Not very
comfortable with mucking around catalogs especially if there exists an
alternative.

Shridhar


From: Yves Darmaillac <yves(dot)darmaillac(at)club-internet(dot)fr>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoid MVCC using exclusive lock possible?
Date: 2004-03-25 13:04:41
Message-ID: 4062D8E9.8020009@club-internet.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Sullivan a écrit :

>On Sun, Feb 29, 2004 at 10:43:34AM -0500, Tom Lane wrote:
>
>
>>general I think our VACUUM-based approach is superior to the
>>Oracle-style UNDO approach, because it pushes the maintenance overhead
>>out of foreground transaction processing and into a schedulable
>>background process. Certainly any Oracle DBA will tell you that huge
>>
>>
>
>I completely agree with this. If the recent work on lowering the
>overall cost ov VACUUM on loaded systems pays off, then I think there
>can be no argument that the work-now, vacuum-later strategy is the
>best approach, simply because it deals with the outlying and
>unexpected cases better than the alternatives. I know too many
>people who have been burned by running out of rollback segments when
>some use pattern emerged that they hadn't planned for.
>
>A
>
I agree too. The VACUUM aproach is better as it reduces fragmentation
and chained rows due to columns size change.