long running transactions

Lists: pgsql-performance
From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: long running transactions
Date: 2006-10-10 16:10:27
Message-ID: 20061010161027.GA18908@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

While doing a verbose vacuum, I'm constantly hitting things like:

DETAIL: 3606 dead row versions cannot be removed yet.

I believe this is a problem, because I still do have some empty tables
requireing up to 3-400 ms just to check if the table is empty (see
thread "slow queue-like empty table").

If pg_stat_activity.query_start actually is the start time of the
transaction, then we've gotten rid of all the real long-running
transactions. Then again, if pg_stat_activity.query_start actually was
the start time of the transaction, the attribute would have been called
pg_stat_activity.transaction_start, right?

Is there any way to find the longest running transaction?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tobias Brox <tobias(at)nordicbet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 16:23:40
Message-ID: 25404.1160497420@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tobias Brox <tobias(at)nordicbet(dot)com> writes:
> If pg_stat_activity.query_start actually is the start time of the
> transaction,

... but it isn't.

> Is there any way to find the longest running transaction?

Look in pg_locks to see the lowest-numbered transaction ID --- each
transaction will be holding exclusive lock on its own XID. You can
correlate that back to pg_stat_activity via the PID.

regards, tom lane


From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tobias Brox <tobias(at)nordicbet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 16:39:13
Message-ID: 20061010163913.GA8097@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[Tom Lane - Tue at 12:23:40PM -0400]
> Look in pg_locks to see the lowest-numbered transaction ID --- each
> transaction will be holding exclusive lock on its own XID. You can
> correlate that back to pg_stat_activity via the PID.

Thanks a lot for the quick reply - I've already identified one
long-running transaction.

(I'm not allowed to order by xid, and not allowed to cast it to
anything, how come?)


From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: Tobias Brox <tobias(at)nordicbet(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 16:41:42
Message-ID: 20061010164142.GA5360@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[Tobias Brox - Tue at 06:39:13PM +0200]
> Thanks a lot for the quick reply - I've already identified one
> long-running transaction.

belonging to autovacuum ... how come?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tobias Brox <tobias(at)nordicbet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 16:42:52
Message-ID: 25631.1160498572@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tobias Brox <tobias(at)nordicbet(dot)com> writes:
> [Tobias Brox - Tue at 06:39:13PM +0200]
>> Thanks a lot for the quick reply - I've already identified one
>> long-running transaction.

> belonging to autovacuum ... how come?

Blocked on someone else's lock, maybe?

regards, tom lane


From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tobias Brox <tobias(at)nordicbet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 17:06:39
Message-ID: 20061010170639.GB5360@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[Tom Lane - Tue at 12:42:52PM -0400]
> > belonging to autovacuum ... how come?
>
> Blocked on someone else's lock, maybe?

hardly, the autovacuum is the only one having such a low transaction id,
and also the only one hanging around when waiting a bit and rechecking
the pg_locks table.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tobias Brox <tobias(at)nordicbet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 17:09:52
Message-ID: 25953.1160500192@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tobias Brox <tobias(at)nordicbet(dot)com> writes:
>> Blocked on someone else's lock, maybe?

> hardly, the autovacuum is the only one having such a low transaction id,
> and also the only one hanging around when waiting a bit and rechecking
> the pg_locks table.

Hmph. Is the autovac process actually doing anything (strace would be
revealing)? If not, can you attach to the autovac process with gdb and
get a stack trace to see where it's blocked?

regards, tom lane


From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tobias Brox <tobias(at)nordicbet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 17:13:04
Message-ID: 20061010171304.GC5360@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[Tom Lane - Tue at 01:09:52PM -0400]
> Hmph. Is the autovac process actually doing anything (strace would be
> revealing)? If not, can you attach to the autovac process with gdb and
> get a stack trace to see where it's blocked?

Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the
config would restart autovacuum. Well, whatever, we still have the
nightly vacuum crontab.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tobias Brox <tobias(at)nordicbet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 17:18:27
Message-ID: 26123.1160500707@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tobias Brox <tobias(at)nordicbet(dot)com> writes:
> [Tom Lane - Tue at 01:09:52PM -0400]
>> Hmph. Is the autovac process actually doing anything (strace would be
>> revealing)? If not, can you attach to the autovac process with gdb and
>> get a stack trace to see where it's blocked?

> Sorry ... I SIGINT'ed it, and now it's gone :-( I thought reloading the
> config would restart autovacuum.

It'll come back after the autovacuum naptime. If it gets stuck again,
please investigate.

regards, tom lane


From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tobias Brox <tobias(at)nordicbet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 17:49:55
Message-ID: 20061010174955.GA284@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[Tom Lane - Tue at 01:18:27PM -0400]
> >> Hmph. Is the autovac process actually doing anything (strace would be
> >> revealing)?

It's definitively doing something; mostly reading, but also some few
writes, semops and opens.

> If not, can you attach to the autovac process with gdb and
> >> get a stack trace to see where it's blocked?

(gdb) bt
#0 0xb7c599f8 in select () from /lib/tls/libc.so.6
#1 0x08253c53 in pg_usleep ()
#2 0x0812ee93 in vacuum_delay_point ()
#3 0x0812f2a5 in lazy_vacuum_rel ()
#4 0x0812ef7b in lazy_vacuum_rel ()
#5 0x0812b4b6 in vac_update_relstats ()
#6 0x0812a995 in vacuum ()
#7 0x0818d2ca in autovac_stopped ()
#8 0x0818ceae in autovac_stopped ()
#9 0x0818c848 in autovac_stopped ()
#10 0x0818c4e2 in autovac_start ()
#11 0x08192c11 in PostmasterMain ()
#12 0x08191dcf in PostmasterMain ()
#13 0x081541b1 in main ()

> It'll come back after the autovacuum naptime. If it gets stuck again,
> please investigate.

It seems stuck, has had the same transid for a long while, and the
number of undeletable dead rows in our tables are increasing.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tobias Brox <tobias(at)nordicbet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 18:04:55
Message-ID: 528.1160503495@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tobias Brox <tobias(at)nordicbet(dot)com> writes:
> (gdb) bt
> #0 0xb7c599f8 in select () from /lib/tls/libc.so.6
> #1 0x08253c53 in pg_usleep ()
> #2 0x0812ee93 in vacuum_delay_point ()
> #3 0x0812f2a5 in lazy_vacuum_rel ()
> #4 0x0812ef7b in lazy_vacuum_rel ()
> #5 0x0812b4b6 in vac_update_relstats ()

That doesn't look particularly blocked, and if you are seeing
reads/writes too, then it's doing something.

> It seems stuck, has had the same transid for a long while, and the
> number of undeletable dead rows in our tables are increasing.

Perhaps you have overly aggressive vacuum cost delay settings?

regards, tom lane


From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tobias Brox <tobias(at)nordicbet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 18:19:53
Message-ID: 20061010181953.GA7584@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[Tom Lane - Tue at 02:04:55PM -0400]
> > It seems stuck, has had the same transid for a long while, and the
> > number of undeletable dead rows in our tables are increasing.
>
> Perhaps you have overly aggressive vacuum cost delay settings?

Perhaps, though I wouldn't expect it to sleep in the middle of a
transaction - and also, it really did seem to me that it's doing work
rather than only sleeping.

The transaction id for the vacuum process is the same now as when I
wrote the previous email, and the number of dead unremovable rows have
increased steadily.

The settings in effect are:

autovacuum_vacuum_cost_delay = 500
autovacuum_vacuum_cost_limit = 200


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tobias Brox <tobias(at)nordicbet(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 18:26:53
Message-ID: 772.1160504813@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Tobias Brox <tobias(at)nordicbet(dot)com> writes:
>> Perhaps you have overly aggressive vacuum cost delay settings?

> autovacuum_vacuum_cost_delay = 500
> autovacuum_vacuum_cost_limit = 200

Well, that's going to cause it to sleep half a second after every dozen
or so page I/Os. I think you'd be well advised to reduce the delay.

regards, tom lane


From: Tobias Brox <tobias(at)nordicbet(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tobias Brox <tobias(at)nordicbet(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: long running transactions
Date: 2006-10-10 18:43:54
Message-ID: 20061010184354.GA24098@oppetid.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

[Tom Lane - Tue at 02:26:53PM -0400]
> > autovacuum_vacuum_cost_delay = 500
> > autovacuum_vacuum_cost_limit = 200
>
> Well, that's going to cause it to sleep half a second after every dozen
> or so page I/Os. I think you'd be well advised to reduce the delay.

Modified it to 20/250, and it definitively helped. Sorry for the
list verbosity; I should have been able to resolve this myself already
some 2-3 emails ago :-) I wanted a "soft" introduction of autovac in
production, and assumed that it was better to begin with too much sleep
than too little! Well, well.