Chronic performance issue with Replication Failover and FSM.

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Chronic performance issue with Replication Failover and FSM.
Date: 2012-03-13 23:53:53
Message-ID: 4F5FDE11.3030407@agliodbs.com
Lists: pgsql-hackers

All,

I've discovered a built-in performance issue with replication failover
at one site, which I couldn't find searching the archives. I don't
really see what we can do to fix it, so I'm posting it here in case
others might have clever ideas.

1. The Free Space Map is not replicated between servers.

2. Thus, when we fail over to a replica, it starts with a blank FSM.

3. I believe the replica also starts with zero counters for autovacuum.

4. On a high-UPDATE workload, this means that the replica assumes tables
have no free space until it starts to build a new FSM or autovacuum
kicks in on some of the tables, much later on.

5. If your hosting is such that you fail over a lot (such as on AWS),
then this causes cumulative table bloat which can only be cured by a
VACUUM FULL.

I can't see any way around this which wouldn't also bog down
replication. Clever ideas, anyone?
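
As a quick empirical check of point 2, something like the following, run
on a freshly promoted standby, should show whether the FSM really is
blank. This is just a sketch using the contrib pg_freespacemap module
(CREATE EXTENSION on 9.1; the contrib SQL script on earlier releases),
and the table name is made up:

    -- Count how many pages the FSM thinks have any free space at all.
    -- A blank FSM would report (near) zero even for a heavily-updated
    -- table that is known to be full of holes.
    CREATE EXTENSION pg_freespacemap;
    SELECT count(*) AS pages_with_tracked_free_space
    FROM pg_freespace('my_updated_table')   -- hypothetical table name
    WHERE avail > 0;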

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-03-14 01:41:48
Message-ID: CAAZKuFZ7rDAjMZayCbhnqUhsX-SxRYaypfAq56MHvJrbRQAhcg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 13, 2012 at 4:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> All,
>
> I've discovered a built-in performance issue with replication failover
> at one site, which I couldn't find searching the archives.  I don't
> really see what we can do to fix it, so I'm posting it here in case
> others might have clever ideas.
>
> 1. The Free Space Map is not replicated between servers.
>
> 2. Thus, when we fail over to a replica, it starts with a blank FSM.
>
> 3. I believe the replica also starts with zero counters for autovacuum.
>
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.
>
> 5. If your hosting is such that you fail over a lot (such as on AWS),
> then this causes cumulative table bloat which can only be cured by a
> VACUUM FULL.
>
> I can't see any way around this which wouldn't also bog down
> replication.  Clever ideas, anyone?

Would it bog it down by "much"?

(1 byte per 8KB page) * 2TB ≈ 256MB. Even if you doubled or tripled it
for pointer-overhead reasons it's still pretty small, whereas VACUUM
traffic is already pretty intense. Still, it's clearly... work.
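
For what it's worth, the arithmetic checks out in SQL -- one FSM byte
per 8KB heap page over 2TB of heap:

    -- 2TB of heap / 8KB per page = pages; one FSM leaf byte per page.
    SELECT pg_size_pretty(2::bigint * 1024 * 1024 * 1024 * 1024 / 8192);
    -- => 256 MB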

--
fdr


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-03-14 02:05:02
Message-ID: CAHGQGwECLh2tV1+MHapJg8+SAWErzvBbrFfen_F_6+PY7zACrA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 14, 2012 at 8:53 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> All,
>
> I've discovered a built-in performance issue with replication failover
> at one site, which I couldn't find searching the archives.  I don't
> really see what we can do to fix it, so I'm posting it here in case
> others might have clever ideas.
>
> 1. The Free Space Map is not replicated between servers.
>
> 2. Thus, when we fail over to a replica, it starts with a blank FSM.
>
> 3. I believe the replica also starts with zero counters for autovacuum.
>
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.

If it's really a high-UPDATE workload, wouldn't autovacuum start soon?
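
The counters that drive that decision are worth checking on a
just-promoted standby: autovacuum fires roughly when n_dead_tup exceeds
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples,
and those dead-tuple counts come from the statistics collector. A
minimal look, with a hypothetical table name:

    SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'my_updated_table';  -- hypothetical name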

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-03-14 02:39:24
Message-ID: CAAZKuFZeq__fbbss8pVWhq=_XdLjmON9U5BFQ+EmyjKdEuinbw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 13, 2012 at 7:05 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> If it's really a high-UPDATE workload, wouldn't autovacuum start soon?

Also, while vacuum cleanup records are applied, could not the standby
also update its free space map, without having to send the actual FSM
updates? I guess that's bogging down of another variety.

--
fdr


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-03-14 07:16:32
Message-ID: 4F6045D0.6020403@enterprisedb.com
Lists: pgsql-hackers

On 14.03.2012 01:53, Josh Berkus wrote:
> 1. The Free Space Map is not replicated between servers.
>
> 2. Thus, when we fail over to a replica, it starts with a blank FSM.

The FSM is included in the base backup, and it is updated when VACUUM
records are replayed.

It is also updated when insert/update/delete records are replayed,
although there's some fuzziness there: records with full page images
don't update the FSM, and the FSM is only updated when the page has less
than 20% of free space left. But that would cause an error in the other
direction, with the FSM claiming that some pages have more free space
than they do in reality.
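
One rough way to see how far the two copies drift in practice is to run
the same aggregate against the master and the standby and compare -- a
sketch, assuming the contrib pg_freespacemap module is installed and
using a made-up table name:

    SELECT count(*) AS pages_tracked,
           pg_size_pretty(sum(avail)::bigint) AS fsm_reported_free
    FROM pg_freespace('my_updated_table');  -- hypothetical name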

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-03-20 21:41:56
Message-ID: 4F68F9A4.3080802@agliodbs.com
Lists: pgsql-hackers

Heikki,

> The FSM is included in the base backup, and it is updated when VACUUM
> records are replayed.

Oh? Hmmmm. In that case, the issue I'm seeing in production is
something else. Unless that was a change for 9.1?

> It is also updated when insert/update/delete records are replayed,
> although there's some fuzziness there: records with full page images
> don't update the FSM, and the FSM is only updated when the page has less
> than 20% of free space left. But that would cause an error in the other
> direction, with the FSM claiming that some pages have more free space
> than they do in reality.

Thanks.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-03-21 07:11:40
Message-ID: 4F697F2C.6000708@enterprisedb.com
Lists: pgsql-hackers

On 20.03.2012 23:41, Josh Berkus wrote:
> Heikki,
>
>> The FSM is included in the base backup, and it is updated when VACUUM
>> records are replayed.
>
> Oh? Hmmmm. In that case, the issue I'm seeing in production is
> something else. Unless that was a change for 9.1?

No, it's been like that since 8.4, when the FSM was rewritten.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-08-30 07:54:38
Message-ID: CAAZKuFYwLBKrkq85xuT4O_M-5UWjr0WavxnVN6enuf7=cLax7A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 13, 2012 at 4:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> 4. On a high-UPDATE workload, this means that the replica assumes tables
> have no free space until it starts to build a new FSM or autovacuum
> kicks in on some of the tables, much later on.
>
> 5. If your hosting is such that you fail over a lot (such as on AWS),
> then this causes cumulative table bloat which can only be cured by a
> VACUUM FULL.

I'd like to revive this thread. Like other people, I thought this was
not a huge problem -- or at least maybe not directly from the mechanism
proposed -- but sometimes it's a pretty enormous one, and I've started
to notice it. I filed a bug report here
(http://archives.postgresql.org/pgsql-bugs/2012-08/msg00108.php, plots
in http://archives.postgresql.org/pgsql-performance/2012-08/msg00181.php),
but just today we promoted another system via streaming replication to
pick up the planner fix in 9.1.5 (did you know that planner bug seems
to make GIN FTS indexes go unused in non-exotic cases, falling back to
a seqscan?), and then a 40MB GIN index bloated to two gigabytes on a
1.5GB table over the course of maybe six hours.

In addition, the thread on pgsql-performance that has the plot I
linked to indicates someone having the same problem with 8.3 after a
warm-standby promotion.

So I think there are some devils at work here, and I am not even sure
if they are hard to reproduce -- yet, people use standby promotion
("unfollow") on Heroku all the time and we have not been plagued
mightily by support issues involving such incredible bloating, so
there's something about the access pattern. In my two cases, there is
a significant number of UPDATEs vs actual number of INSERTs/DELETES of
records (the ratio is probably 10000+ to 1), even though neither of
these would be close to what one could consider a large or even
medium-sized database in terms of TPS or database size. In fact, the
latter system bloated even though it comfortably fits entirely in
memory.

--
fdr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Chronic performance issue with Replication Failover and FSM.
Date: 2012-08-30 14:05:04
Message-ID: 4805.1346335504@sss.pgh.pa.us
Lists: pgsql-hackers

Daniel Farina <daniel(at)heroku(dot)com> writes:
> but just today we promoted another system via streaming replication to
> pick up the planner fix in 9.1.5 (did you know that planner bug seems
> to make GIN FTS indexes go unused in non-exotic cases, falling back to
> a seqscan?), and then a 40MB GIN index bloated to two gigabytes on a
> 1.5GB table over the course of maybe six hours.

I think this is probably unrelated to what Josh was griping about:
even granting that the system forgot any free space that had been
available within the original 40MB, that couldn't in itself lead
to eating more than another 40MB, no? My guess is something is
broken about the oldest-xmin-horizon mechanism, such that VACUUM
is failing to recover space. Can you put together a self-contained
test case that exhibits similar growth?
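
A self-contained starting point might look something like the sketch
below -- purely illustrative, all names made up, driving the heavily
UPDATE-skewed pattern described upthread against a GIN FTS index and
watching whether VACUUM keeps the index size bounded:

    CREATE TABLE bloat_test (id int PRIMARY KEY, body tsvector);
    CREATE INDEX bloat_test_fts ON bloat_test USING gin (body);
    INSERT INTO bloat_test
        SELECT g, to_tsvector('english', 'payload ' || g)
        FROM generate_series(1, 100000) g;

    -- Repeat from a driver script to approximate the 10000:1
    -- UPDATE-to-INSERT ratio reported above:
    UPDATE bloat_test
        SET body = to_tsvector('english', 'payload ' || random())
        WHERE id = (1 + random() * 99999)::int;

    -- Track index growth over time:
    SELECT pg_size_pretty(pg_relation_size('bloat_test_fts'));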

regards, tom lane