Re: Hot Standby 0.2.1

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot Standby 0.2.1
Date: 2009-09-22 09:53:59
Message-ID: 4AB89EB7.8010606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In testing, it looks like there's still something wrong with the
subtransaction handling. I created a test function to create a large
number of subtransactions:

CREATE LANGUAGE plpgsql;
CREATE TABLE bar (id int4);

CREATE OR REPLACE FUNCTION subxids (n integer) RETURNS void LANGUAGE
plpgsql AS $$
BEGIN
IF n <= 0 THEN RETURN; END IF;
INSERT INTO bar VALUES (n);
PERFORM subxids(n - 1);
RETURN;
EXCEPTION WHEN raise_exception THEN NULL; END;
$$;

And used that to created 100 nested subtransactions in the primary server:

SELECT subxids(100);

I got this in the standby:

...
LOG: REDO @ 0/A000EA8; LSN 0/A000FCC: prev 0/A000E6C; xid 2662; len
264: Transaction - xid assignment xtop 2599 nsubxids 64; 2600 2601 2602
2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615 2616
2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629 2630
2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643 2644
2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657 2658
2659 2660 2661 2662 0
LOG: extend subtrans xid 2600 page 1 last_page 0
CONTEXT: rmgr 1 redo xid assignment xtop 2599 nsubxids 64; 2600 2601
2602 2603 2604 2605 2606 2607 2608 2609 2610 2611 2612 2613 2614 2615
2616 2617 2618 2619 2620 2621 2622 2623 2624 2625 2626 2627 2628 2629
2630 2631 2632 2633 2634 2635 2636 2637 2638 2639 2640 2641 2642 2643
2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 2654 2655 2656 2657
2658 2659 2660 2661 2662 0
TRAP: FailedAssertion("!(((newestXact) >= ((TransactionId) 3)))", File:
"subtrans.c", Line: 299)
LOG: startup process (PID 28594) was terminated by signal 6: Aborted
LOG: terminating any other active server processes

Apparently an InvalidXid is sneaking into the unreported xid array
collected in the primary.

I actually bumped into this while I was testing a simpler implementation
of the logic to collect the unreported xids in the primary. As the patch
stands, we keep track of how many of the childXids at each
subtransaction level have already been reported, but it occurs to me
that it would be a lot simpler to populate a separate array of
unreported xids on-the-fly, as xids are assigned. With the simpler
implementation I bumped into another issue (see below), which is why I
wanted to test if it worked without the simplification.

So with the simpler logic, I had this problem. When I do this in the
primary:

postgres=# SELECT subxids(10000);
ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth", after
ensuring the platform's stack depth limit is adequate.
CONTEXT: SQL statement "SELECT subxids( $1 - 1)"
PL/pgSQL function "subxids" line 4 at PERFORM
SQL statement "SELECT subxids( $1 - 1)"
...

I get this in the standby:

...
LOG: REDO @ 0/1000B6C4; LSN 0/1000B6F0: prev 0/1000B698; xid 4325; len
16: Transaction - abort: 2009-09-22 12:45:00.938243+03
LOG: record known xact 4325 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG: remove KnownAssignedXid 4325
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938243+03
LOG: REDO @ 0/1000B6F0; LSN 0/1000B71C: prev 0/1000B6C4; xid 4324; len
16: Transaction - abort: 2009-09-22 12:45:00.938438+03
LOG: record known xact 4324 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG: remove KnownAssignedXid 4324
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938438+03
LOG: REDO @ 0/1000B71C; LSN 0/1000B748: prev 0/1000B6F0; xid 4323; len
16: Transaction - abort: 2009-09-22 12:45:00.938632+03
LOG: record known xact 4323 latestObservedXid 4334
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: remove KnownAssignedXid 4323
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: 1 KnownAssignedXids 3619
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
FATAL: cannot remove KnownAssignedXid 4323
CONTEXT: rmgr 1 redo abort: 2009-09-22 12:45:00.938632+03
LOG: startup process (PID 31284) exited with exit code 1
LOG: terminating any other active server processes

It looks like the standby tries to remove XID 4323 from the
known-assigned hash table, but it's not there because it was removed and
set in pg_subtrans by an XLOG_XACT_ASSIGNMENT record earlier. I guess we
should just not throw an error in that case, but is there a way we could
catch that narrow case and still keep the check in
KnownAssignedXidsRemove()? It seems like the check might help catch
other bugs, so I'd rather keep it if possible.

I've pushed the simplified code to my git repository if you want to take
a look.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2009-09-22 10:01:02 Re: Hot Standby 0.2.1
Previous Message Simon Riggs 2009-09-22 09:37:50 RULEs and Read Only transactions