Alter/update large tables - VERRRY annoying behaviour!

From: Dmitry Tkach <dmitry(at)openratings(dot)com>
To: pgsql-admin(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 17:07:20
Message-ID: 3CBB08C8.7000705@openratings.com
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches

Hi, everybody!

I was wondering if this is a known "feature" to begin with, and if there are any plans to fix
it in the future?
I had two very large tables in my database (about 30 million rows each), connected by a foreign key,
and wanted to merge them together...

Something like this:

create table a
(
id int primary key,
some_data int
);
create table b
(
id int unique references a,
other_data int
);

So, what I did was:

alter table a add other_data int;
update a set other_data = b.other_data from b where b.id=a.id;

This took me awfully long, but worked (I guess).
I say 'I guess', because I wasn't able so far to verify that - when I tried to do

select * from a limit 1;

It just hangs on me ... at least, it looks like it does.

Lucky me, I had compiled the backend from sources with full debug info, because if I hadn't done that
(as most users wouldn't have), I would certainly have thought that my database was hopelessly corrupted, and would have to
recreate it from scratch :-(
Instead, I loaded the whole thing into a debugger, because that seems to be the only way to figure out what
the hell it is thinking about...
So, what I found out was that it seems to have recreated my entire table when I updated it, and left all the
old tuples in it as well - so my 'select *...limit 1' query was cycling through 30 million deleted tuples, trying
to find the first one that was still valid, and that's what was taking so long...

First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
Or is my problem specific to the fact that I altered the table and added new columns?

Now, I understand that if I had vacuum'ed it, the problem would have been resolved... The problems I have with that,
though, are:

- As I said, the behaviour was so unexpected that I would have trashed the whole database if I hadn't been able to debug
it... If there is no other possible solution, I think that, at the very least, it should give the user some indication
that it's not hopelessly hung when running that query...

- Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished).
I was hoping to complete modifying my schema first, and then just vacuum everything once. So, it would be
REALLY, REALLY helpful for situations like that if PG were smart enough to keep track somehow of those deleted
tuples, to avoid having to scan through them all every time...
In my particular situation, the solution would be trivial (just remembering the address of the first valid
tuple would suffice, because the entire table was updated)... I am not familiar enough with the internals to suggest
anything more general than this, but EVEN fixing only this particular scenario would, I believe, be extremely
useful...
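
(For comparison, here is an untried sketch of the one-pass route I could have taken instead; a_new is a made-up name, and the foreign key on b would need recreating against the new table afterwards:)

```sql
BEGIN;
-- Build the merged table in a single pass: each row is written exactly
-- once, so no dead tuples are left behind in the result.
CREATE TABLE a_new AS
    SELECT a.id, a.some_data, b.other_data
    FROM a LEFT JOIN b ON b.id = a.id;
DROP TABLE a;
ALTER TABLE a_new RENAME TO a;
-- Stand-in for the primary key (avoids relying on newer ALTER TABLE syntax):
CREATE UNIQUE INDEX a_id_key ON a (id);
COMMIT;
```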

Do you agree?

Thanks a lot!

Dima.


From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: "Dmitry Tkach" <dmitry(at)openratings(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 18:24:51
Message-ID: 20020415142451.1d8a21d0.nconway@klamath.dyndns.org
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches

On Mon, 15 Apr 2002 13:07:20 -0400
"Dmitry Tkach" <dmitry(at)openratings(dot)com> wrote:
> Hi, everybody!

Hi Dmitry! Don't cross-post! It's annoying!

> This took me awfully long, but worked (I guess).
> I say 'I guess', because I wasn't able so far to verify that - when I tried to do
>
> select * from a limit 1;
>
> It just hangs on me ... at least, it looks like it does.

This didn't hang, it just requires a sequential scan of the whole table.
As you observe below, it will also need to scan through dead tuples,
but that is just a product of MVCC and there's no real way around
it. Once you VACUUM the dead tuples will be removed and sequential
scans should be fast once more.

And before assuming that something has hung, it's a good idea to
look at the output of EXPLAIN for that query, as well as monitoring
system performance (through top, vmstat, etc) to see what the
system is doing.

> Lucky me, I have compiled the backend from sources with full debug info, because if I hadn't done that,
> (as most users), I would certainly had thought, that my database is hopelessly corrupted, and would have to
> recreate it from scratch :-(

That's a ludicrous conclusion.

> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?

Yes, AFAIK -- MVCC requires this.
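
You can see it for yourself, by the way (a sketch; ctid is the tuple's
physical address, which changes with every new row version):

```sql
SELECT ctid FROM a WHERE id = 1;
-- Even a no-op update writes a brand new row version...
UPDATE a SET some_data = some_data WHERE id = 1;
-- ...so ctid comes back different; the old version is now a dead tuple
-- that sits in the table until VACUUM reclaims it.
SELECT ctid FROM a WHERE id = 1;
```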

> - Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished).

Is this 7.2? If not, VACUUM should be substantially faster in 7.2.
In any case, you'll always want to VACUUM or VACUUM FULL (and
ANALYZE) when you change your tables in such a dramatic fashion.
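
Concretely, something like this (note that VACUUM FULL takes an
exclusive lock on the table, so plan for downtime):

```sql
VACUUM a;        -- marks dead tuples reusable; runs alongside readers
VACUUM FULL a;   -- also compacts the table file, but locks it exclusively
ANALYZE a;       -- refreshes the planner statistics for the rewritten table
```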

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Dmitry Tkach <dmitry(at)openratings(dot)com>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 20:34:10
Message-ID: 200204152034.g3FKYAx21352@candle.pha.pa.us
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches

> Hi Dmitry! Don't cross-post! It's annoying!

Is overly cross posting or HTML mail more annoying? I can't decide. :-)

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Dmitry Tkach <dmitry(at)openratings(dot)com>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 21:25:40
Message-ID: 3CBB4554.90701@openratings.com
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches

Neil Conway wrote:
> On Mon, 15 Apr 2002 13:07:20 -0400
> "Dmitry Tkach" <dmitry(at)openratings(dot)com> wrote:
>> Hi, everybody!
>
> Hi Dmitry! Don't cross-post! It's annoying!

What do you mean by 'cross-post'?
Are you saying that posting to several lists at a time is annoying?
I just thought that this problem might be interesting to people who read
those (and not necessarily ALL of them)... What's annoying about it?

>> This took me awfully long, but worked (I guess).
>> I say 'I guess', because I wasn't able so far to verify that - when I tried to do
>>
>> select * from a limit 1;
>>
>> It just hangs on me ... at least, it looks like it does.
>
> This didn't hang, it just requires a sequential scan of the whole table.

I know it does (as I said below). The point is that it SHOULD NOT, and especially
that I can't imagine anyone not familiar with postgres internals expecting
that it would - all it needs to do is grab the first row and return immediately.
That's what it would do if you just created a new table and populated it with
data.

> As you observe below, it will also need to scan through dead tuples,

Not 'also' - JUST the dead ones! That's what's especially annoying about
it - as soon as it finds the first tuple that's not dead, it returns.

> but that is just a product of MVCC and there's no real way around
> it.

My whole point is that I don't believe it (that there is no way around) :-)
For one thing, I have never seen ANY database engine (Oracle, Informix, DB2)
that would take more than a second to get the first row from a table, regardless
of what has been done to that table before.
That (and my common sense too) tells me that there MUST be a 'way around it'.
I can see that it's not currently implemented in postgres, but I do believe
(and that's the whole point of me posting that message in the first place)
that it is a huge usability issue and really needs to be fixed.

> Once you VACUUM the dead tuples will be removed and sequential
> scans should be fast once more.

Yeah... I hope so. I am running vacuum on that table. It's been running for
6 hours now and still has not finished.
Doesn't it look to you like a little too much trouble to go through just
to take a look at the first row of a table? :-)

And, once again, I am not done modifying that schema - this is just an intermediate
step, which means I will have to do the vacuum all over when I am finished...

This seems like WAY too much trouble to me :-(

> And before assuming that something has hung, it's a good idea to
> look at the output of EXPLAIN for that query, as well as monitoring
> system performance (through top, vmstat, etc) to see what the
> system is doing.

Yeah, right...

explain select * from a limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..1.01 rows=1 width=46)
  ->  Seq Scan on a  (cost=0.00..32529003.00 rows=32243660 width=46)

EXPLAIN

There is absolutely nothing in this plan that would suggest it will go on
executing for ages...
Look at the 'cost' value, for example...
In any event, there is nothing different in this plan from what I was getting
before I modified the table (when the query would take just a few milliseconds
to execute).

As for monitoring system performance... Well, I could see it maxing out on
CPU usage and disk IO at times... How exactly does that help me realize
it did not hang?

(Let me clarify that - by 'hung' I mean 'not going to return the results
in any reasonable time', not necessarily 'not doing anything at all')

>> Lucky me, I had compiled the backend from sources with full debug info, because if I hadn't done that
>> (as most users wouldn't have), I would certainly have thought that my database was hopelessly corrupted, and would have to
>> recreate it from scratch :-(
>
> That's a ludicrous conclusion.

Why is it?

>> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
>
> Yes, AFAIK -- MVCC requires this.

What's MVCC?

>> - Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished).
>
> Is this 7.2? If not, VACUUM should be substantially faster in 7.2.

Yes, it is 7.2

> In any case, you'll always want to VACUUM or VACUUM FULL (and
> ANALYZE) when you change your tables in such a dramatic fashion.

I know... Once again, I was hoping to be able to complete my changes before
doing the vacuum :-(

Thanks for your reply!

Dima


From: Dmitry Tkach <dmitry(at)openratings(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org, Dmitry Tkach <dmitry(at)openratings(dot)com>
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 22:10:11
Message-ID: 3CBB4FC3.20408@openratings.com
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches

Bruce Momjian wrote:
>> Hi Dmitry! Don't cross-post! It's annoying!
>
> Is overly cross posting or HTML mail more annoying? I can't decide. :-)

Do you mean I am posting in HTML????
I am sorry if that's the case - I was sure that I had turned it off :-(
Somehow, when I CC to myself, I am getting a text-only message...
Do you see it as HTML?

Dima


From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 22:59:49
Message-ID: Pine.LNX.4.21.0204152335270.20382-100000@ponder.fairway2k.co.uk
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches


On Mon, 15 Apr 2002, Dmitry Tkach wrote:

> Neil Conway wrote:
>
>> On Mon, 15 Apr 2002 13:07:20 -0400
>> "Dmitry Tkach" <dmitry(at)openratings(dot)com> wrote:
>
>>> This took me awfully long, but worked (I guess).
>>> I say 'I guess', because I wasn't able so far to verify that - when I tried to do
>>> select * from a limit 1;
>>> It just hangs on me ... at least, it looks like it does.
>
>> This didn't hang, it just requires a sequential scan of the whole table.
>
> I know it does (as I said below). The point is that it SHOULD NOT, and
> especially, that I can't imagine anyone, not familiar with postgres
> internals to expect that it would - all it needs to do is to grab the
> first row and return immediately.
> That's what it would do, if you just create a new table and populate it
> with data.
>
>> As you observe below, it will also need to scan through dead tuples,
>
> Not 'also' - JUST the dead ones! That's what's especially annoying about
> it - as soon as it finds the first tuple, that's not dead, it returns.
>
>> but that is just a product of MVCC and there's no real way around
>> it.
>
> My whole point is that I don't believe it (that there is no way around)
> :-)
> For one thing, I have never seen ANY database engine (Oracle, Informix,
> DB2) that would take more than a second to get the first row from a
> table, regardless of what has been done to that table before.
> That (and my common sense too) tells me that there MUST be a 'way around
> it'.
> I can see, that it's not currently implemented in postgres, but do
> believe (and that's the whole point of me posting that message in the
> first place) that it is a huge usability issue and really needs to be
> fixed.

Ok, so I realise that addressing this issue of using a limit of 1 doesn't
address anything associated with using this table for something useful; however,
I find it an interesting point.

All that has been asked for is that a single row be returned. This doesn't have
to be the first valid row at all. Indeed, isn't it one of the axioms of SQL that
data is returned in some arbitrary order unless specifically forced into an
order? Therefore, shouldn't SELECT * FROM x LIMIT 1 use an index, any
index, to just fetch a single row?

Or is this an effect of a row needing to be visited in order to determine its
'visibility', as proposed as a reason why a sequential scan was being performed
for one of my queries I posted about recently? I am still a little confused
about that, though. If there is an index and deletes have been committed,
shouldn't the index have been updated and forced to forget about the deleted
rows, unless there is a transaction open that can still access those deleted
items?
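
One could at least check what the planner would do (a sketch, untested; my
understanding is that the index cannot answer visibility on its own, since
xmin/xmax live only in the heap, so each index entry still costs a heap visit):

```sql
-- With an ORDER BY on the indexed column the planner can pick an index
-- scan, but every candidate entry must still be checked against the heap,
-- and after a full-table update the index points at old and new versions alike.
EXPLAIN SELECT * FROM a ORDER BY id LIMIT 1;
```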

As for the rest of the argument, is it constructive? I for one thought the
original poster had done more than a normal user, me included, would have done
before posting.

>[stuff deleted]
>
>>> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
>
>> Yes, AFAIK -- MVCC requires this.
>
> What's MVCC?

Funny, I was about to ask that question. Something about variable size of
fields in the physical storage?

>
>>> - Vacuum isn't the speediest thing in the world either (it's been running for an hour now, and still has not finished).
>
>> Is this 7.2? If not, VACUUM should be substantially faster in 7.2.
>
> Yes, it is 7.2

Pleased I haven't tried what I'm doing in a version older than 7.2 in that
case, and my table's only 1 million rows.

> [more deletions]

However, a useful thread, since it explains a 'feature' I ran into earlier
today after managing about 4/5 of an update of every row of my table. I had
been thinking the problem was large storage usage by the txtidx type; I
realise now it is the updated = deleted+inserted storage requirement. I'm
thinking of changing what I'm doing to a way I considered earlier [without
thinking about this thread] so I can stay within the limits of my resources.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants


From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-15 23:16:07
Message-ID: Pine.LNX.4.21.0204160014030.20382-100000@ponder.fairway2k.co.uk
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches


It's sad replying to my own question [of sorts] but...

On Mon, 15 Apr 2002, Nigel J. Andrews wrote:
>
> On Mon, 15 Apr 2002, Dmitry Tkach wrote:
>
> > Neil Conway wrote:
> >
> >> On Mon, 15 Apr 2002 13:07:20 -0400
> >> "Dmitry Tkach" <dmitry(at)openratings(dot)com> wrote:
> >
> >>> First of all, a question for you - is ANY update to a table equivalent (in this respect) to a delete+insert?
> >
> >> Yes, AFAIK -- MVCC requires this.
> >
> > What's MVCC?
>
> Funny, I was about to ask that question. Something about variable size of
> fields in the physical storage?

I've remembered, almost: Multi Version ConCurrency?

[plenty deleted from those quoted messages, but then you all realised that or
don't care]

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants


From: Jochem van Dieten <jochemd(at)oli(dot)tudelft(dot)nl>
To: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-16 10:19:12
Message-ID: 3CBBFAA0.1040305@oli.tudelft.nl
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches

Nigel J. Andrews wrote:
> On Mon, 15 Apr 2002, Nigel J. Andrews wrote:
>>On Mon, 15 Apr 2002, Dmitry Tkach wrote:
>>
>>>What's MVCC?
>>
>>Funny, I was about to ask that question. Something about variable size of
>>fields in the physical storage?
>
> I've remembered, almost: Multi Version ConCurrency?

Almost: Multi-Version Concurrency Control
http://www.postgresql.org/idocs/index.php?mvcc.html
http://www.onlamp.com/pub/a/onlamp/2001/05/25/postgresql_mvcc.html
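
The short version (my own sketch, not from those links): every row version
carries hidden xmin/xmax system columns recording which transactions created
and deleted it, and you can inspect them directly:

```sql
-- xmin: transaction that created this row version; xmax: transaction that
-- deleted or updated it (0 while the version is still live). Roughly, a
-- version is visible to you only if xmin committed before your snapshot
-- started and xmax has not.
SELECT xmin, xmax, ctid, * FROM a LIMIT 5;
```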

If you need more info, Google is your friend.

Jochem


From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Alter/update large tables - VERRRY annoying behaviour!
Date: 2002-04-19 12:14:13
Message-ID: Pine.LNX.4.21.0204191312460.20382-100000@ponder.fairway2k.co.uk
Lists: pgsql-admin pgsql-bugs pgsql-general pgsql-patches

Revisiting this thread (at least I think it was this thread), I don't remember
anyone pointing out contrib/pgstattuple for monitoring the number of dead
tuples.

Just thought I'd mention it.
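
For example (going by the contrib README; the exact output format varies
between versions):

```sql
-- Scans the named table and reports, among other things, the number of
-- dead tuples and the percentage of dead space, without changing anything.
SELECT pgstattuple('a');
```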

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants