Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level

Lists: pgsql-hackers pgsql-patches
From: Marko Kreen <markokr(at)gmail(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-21 14:17:07
Message-ID: 20060721141649.GA22826@l-t.ee
Lists: pgsql-hackers pgsql-patches


Intro
-----

The following patch exports 8-byte transaction IDs (txids) and snapshots
to the user level, allowing their use in regular SQL. It is based on the
Slony-I xxid module. It provides a special 'snapshot' type for snapshots
but uses regular int8 for transaction IDs.

Exported API
------------

Type: snapshot

Functions:

current_txid() returns int8
current_snapshot() returns snapshot
snapshot_xmin(snapshot) returns int8
snapshot_xmax(snapshot) returns int8
snapshot_active_list(snapshot) returns setof int8
snapshot_contains(snapshot, int8) returns bool
pg_sync_txid(int8) returns int8
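For illustration only: a minimal C sketch of how a 64-bit txid can be packed from the wraparound count (epoch) and the ordinary 32-bit xid, assuming the epoch occupies the high 32 bits. The helper names are hypothetical and not taken from the patch. As long as the epoch stays below 2^31, the packed value is a non-negative int8, which is why plain int8 is enough for transaction IDs.

```c
#include <stdint.h>

/* Hypothetical packing: epoch in the high 32 bits, xid in the low 32 bits. */
typedef uint64_t txid64;

static txid64 txid_from_parts(uint32_t epoch, uint32_t xid)
{
    return ((txid64) epoch << 32) | xid;
}

/* Recover the wraparound count from a packed txid. */
static uint32_t txid_epoch(txid64 txid)
{
    return (uint32_t) (txid >> 32);
}

/* Recover the ordinary 32-bit xid from a packed txid. */
static uint32_t txid_xid(txid64 txid)
{
    return (uint32_t) (txid & 0xFFFFFFFFu);
}
```

With this layout, every txid of epoch 1 compares greater than every txid of epoch 0, so ordinary integer ordering matches issue order.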

Operation
---------

The extension to 8 bytes is done by keeping track of the wraparound
count (epoch) in pg_control. On every checkpoint, nextxid is compared
to the one stored in pg_control. If the new value is smaller, a
wraparound has happened and the epoch is increased.
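A minimal sketch of the checkpoint-time bookkeeping described above, assuming pg_control holds the epoch plus the nextxid seen at the previous checkpoint. The struct and function names are hypothetical. The plain unsigned comparison only detects a wrap reliably if fewer than 2^32 transactions run between two checkpoints.

```c
#include <stdint.h>

/* Hypothetical stand-in for the epoch fields the patch keeps in pg_control. */
typedef struct TxidControl
{
    uint32_t epoch;     /* number of observed 32-bit wraparounds */
    uint32_t last_xid;  /* nextxid recorded at the previous checkpoint */
} TxidControl;

/* Called at every checkpoint with the current nextxid. If the counter is
 * now smaller than at the previous checkpoint, it must have wrapped
 * (assuming fewer than 2^32 transactions have run in between). */
static void checkpoint_update_epoch(TxidControl *ctl, uint32_t next_xid)
{
    if (next_xid < ctl->last_xid)
        ctl->epoch++;
    ctl->last_xid = next_xid;
}
```

For example, with last_xid = 0xFFFFFF00, a checkpoint that sees nextxid = 10 increments the epoch; one that sees 0xFFFFFFF0 does not.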

When a long txid or snapshot is requested, pg_control is locked in
LW_SHARED mode to read the epoch value from it. The patch does not
affect core functionality in any other way.
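Because the epoch is only updated at checkpoints, an xid handed out since the last checkpoint may already belong to the next epoch, and an xid taken from an older snapshot may belong to the previous one. One way to widen an xid correctly, sketched here under the assumption that the xid lies within 2^31 of the nextxid recorded in pg_control, is to compare it with that recorded value using wraparound-aware comparison. The function names are hypothetical; the patch may handle this differently.

```c
#include <stdint.h>

/* Modulo-2^32 "a is older than b", in the spirit of TransactionIdPrecedes().
 * The uint32 -> int32 conversion is implementation-defined in ISO C but
 * wraps as expected on common two's-complement compilers. */
static int xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical: widen xid using (epoch, last_xid) read from pg_control
 * under LW_SHARED. Assumes xid was really assigned and lies within 2^31
 * of last_xid, so the epoch adjustment never underflows. */
static uint64_t xid_to_txid(uint32_t xid, uint32_t epoch, uint32_t last_xid)
{
    uint64_t e = epoch;

    if (xid > last_xid && xid_precedes(xid, last_xid))
        e--;            /* xid was assigned just before the recorded wrap */
    else if (xid < last_xid && !xid_precedes(xid, last_xid))
        e++;            /* counter wrapped after the last checkpoint */
    return (e << 32) | xid;
}
```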

Backup/restore of txid data
---------------------------

Currently I made pg_dumpall output the following statement:

"SELECT pg_sync_txid(%d)", current_txid()

Then, on the target database, pg_sync_txid checks whether its current
(epoch + GetTopTransactionId()) is larger than the given argument.
If not, it bumps the epoch until it is, thus guaranteeing that newly
issued txids are larger than those in the source database. If restored
into the same database instance, nothing happens.
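The pg_sync_txid() rule can be modelled as a pure function of (epoch, current xid, source txid). This is a hypothetical sketch, assuming the (epoch << 32) | xid packing and a non-negative int8 argument; instead of bumping the epoch in a loop it computes the required epoch directly, which gives the same result.

```c
#include <stdint.h>

/* Hypothetical model of pg_sync_txid(int8). Returns the epoch the target
 * server should use so that its next txid, (epoch << 32) | cur_xid, is
 * larger than source_txid. The epoch is never lowered. source_txid is
 * assumed non-negative, so the result always fits in 32 bits. */
static uint32_t sync_epoch(uint32_t epoch, uint32_t cur_xid, int64_t source_txid)
{
    uint64_t src = (uint64_t) source_txid;
    uint64_t cur = ((uint64_t) epoch << 32) | cur_xid;
    uint64_t need;

    if (cur > src)
        return epoch;                   /* already ahead: nothing happens */

    need = src >> 32;                   /* smallest epoch that could work */
    if (((need << 32) | cur_xid) <= src)
        need++;                         /* same epoch as source is not enough */
    return (uint32_t) need;
}
```

Restoring into the same instance returns the current epoch unchanged, matching the "nothing happens" case above.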

Advantages of 8-byte txids
--------------------------

* Indexes won't break silently. There is no need for a mandatory periodic
truncate, which may not happen for various reasons.
* Values from different databases can be kept in one table/index.
* Data can be moved to a different server and processing continued there.
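To see why indexes on 32-bit xids can break silently: a wraparound-aware comparison in the style of PostgreSQL's TransactionIdPrecedes() is not transitive, while a btree assumes a consistent total order. The sketch below assumes an index opclass that compares xids this way; the function names are hypothetical. It contrasts that comparison with plain 64-bit comparison.

```c
#include <stdint.h>

/* Wraparound-aware 32-bit comparison: a precedes b if b is less than
 * 2^31 steps ahead of a (modulo 2^32). The conversion to int32_t assumes
 * the usual two's-complement wrapping. */
static int xid32_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}

/* 64-bit txids need no tricks: plain integer order is a total order. */
static int txid64_precedes(uint64_t a, uint64_t b)
{
    return a < b;
}
```

With a = 0, b = 0x40000000 and c = 0x80000000, xid32_precedes() reports a < b, b < c and c < a, a cycle that no btree can store consistently; txid64_precedes() orders the same values normally.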

Advantages in being in core
---------------------------

* Core code can guarantee that the wraparound check runs at least once
every 2G transactions.
* Core code can update pg_control non-transactionally. A module
needs to operate inside a user transaction when updating the epoch
row, which brings various problems (READ COMMITTED vs. SERIALIZABLE,
long transactions, locking, etc).
* Core code has only one place that needs updating; a module
needs an epoch table in each database.

Todo, tothink
-------------

* Flesh out the documentation. It probably needs some background.
* Better names for some functions?
* pg_sync_txid allows using pg_dump to move a database,
but it also makes it possible to shoot yourself in the foot by allowing
epoch wraparound to happen. Is "Don't do it then" enough?
* Currently txid keeps its own copy of nextxid in pg_control,
which makes the data dependencies clear. It's possible to drop it
and use ->checkPointCopy.nextXid directly, thus saving 4 bytes.
* Should pg_sync_txid() be issued by pg_dump instead of pg_dumpall?

--
marko

Attachment Content-Type Size
txid.diff text/plain 33.2 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-26 17:35:34
Message-ID: 200607261735.k6QHZYE00677@momjian.us
Lists: pgsql-hackers pgsql-patches


I am sure you worked hard on this, but I don't see the use case, nor
have I heard people in the community requesting such functionality.
Perhaps pgfoundry would be a better place for this.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-26 20:04:11
Message-ID: 17604.1153944251@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I am sure you worked hard on this, but I don't see the use case, nor
> have I heard people in the community requesting such functionality.
> Perhaps pgfoundry would be a better place for this.

The part of this that would actually be useful to put in core is
maintaining a 64-bit XID counter, ie, keep an additional counter that
bumps every time XID wraps around. This cannot be done very well from
outside core but it would be nearly trivial, and nearly free, to add
inside. Everything else in the patch could be done just as well as an
extension datatype.

(I wouldn't do it like this though --- TransactionIdAdvance itself is
the place to bump the secondary counter.)
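A rough sketch of what bumping the secondary counter inside the advance step could look like, modelled on the wrap handling of TransactionIdAdvance (when the counter wraps, it skips the reserved xids 0 to 2 and restarts at 3, FirstNormalTransactionId). The struct and function are hypothetical and assume the counter always starts at a normal xid.

```c
#include <stdint.h>

#define FIRST_NORMAL_XID 3u   /* xids 0..2 are reserved in PostgreSQL */

/* Hypothetical 2-word counter: epoch plus the ordinary 32-bit xid. */
typedef struct XidCounter
{
    uint32_t epoch;
    uint32_t xid;
} XidCounter;

/* Advance by one, counting a wrap exactly where the 32-bit counter
 * overflows. Assumes c->xid is already a normal xid (>= 3). */
static void xid_counter_advance(XidCounter *c)
{
    c->xid++;
    if (c->xid < FIRST_NORMAL_XID)
    {
        /* wrapped past 2^32 - 1: skip the reserved xids, count the wrap */
        c->xid = FIRST_NORMAL_XID;
        c->epoch++;
    }
}
```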

The question though is if we did that, would Slony actually use it?

regards, tom lane


From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-26 20:41:09
Message-ID: 200607261341.10727.darcyb@commandprompt.com
Lists: pgsql-hackers pgsql-patches

On Wednesday 26 July 2006 13:04, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > I am sure you worked hard on this, but I don't see the use case, nor
> > have I heard people in the community requesting such functionality.
> > Perhaps pgfoundry would be a better place for this.
>
> The part of this that would actually be useful to put in core is
> maintaining a 64-bit XID counter, ie, keep an additional counter that
> bumps every time XID wraps around. This cannot be done very well from
> outside core but it would be nearly trivial, and nearly free, to add
> inside. Everything else in the patch could be done just as well as an
> extension datatype.
>
> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> the place to bump the secondary counter.)
>
> The question though is if we did that, would Slony actually use it?

If it made sense to do it, then yes we would do it. The problem ends up being
Slony is designed to work across a multitude of versions of PG, and unless
this was backported to at least 7.4, it would take a while (ie when we
stopped supporting versions older than it was ported into) before we would
make use of it.

>
> regards, tom lane

--
Darcy Buskermolen
CommandPrompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
http://www.commandprompt.com


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-07-26 21:01:15
Message-ID: 1153947675.2928.12.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches

On a fine day, Wed, 2006-07-26 at 13:41, Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 13:04, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > I am sure you worked hard on this, but I don't see the use case, nor
> > > have I heard people in the community requesting such functionality.
> > > Perhaps pgfoundry would be a better place for this.
> >
> > The part of this that would actually be useful to put in core is
> > maintaining a 64-bit XID counter, ie, keep an additional counter that
> > bumps every time XID wraps around. This cannot be done very well from
> > outside core but it would be nearly trivial, and nearly free, to add
> > inside. Everything else in the patch could be done just as well as an
> > extension datatype.
> >
> > (I wouldn't do it like this though --- TransactionIdAdvance itself is
> > the place to bump the secondary counter.)
> >
> > The question though is if we did that, would Slony actually use it?

It seems that Slony people still hope to circumvent the known brokenness
of xxid btree indexes by dropping and creating them often enough and/or
trying other workarounds.

> If it made sense to do it, then yes we would do it. The problem ends up being
> Slony is designed to work across a multitude of versions of PG, and unless
> this was backported to at least 7.4, it would take a while (ie when we
> stopped supporting versions older than it was ported into) before we would
> make use of it.

We already have an external implementation, which requires a function
call to be executed at least once every few hundred million
transactions to bump up the higher int4 when needed.

It would probably be easy to backport it to any version of postgres
which is supported by slony.

Being in core just makes the overflow accounting part more robust.

The function to retrieve the 8-byte trx id will look exactly the same
from userland in both cases.

> >
> > regards, tom lane
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-26 21:03:23
Message-ID: 18071.1153947803@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
>> The question though is if we did that, would Slony actually use it?

> If it made sense to do it, then yes we would do it. The problem ends up being
> Slony is designed to work across a multitude of versions of PG, and unless
> this was backported to at least 7.4, it would take a while (ie when we
> stopped supporting versions older than it was ported into) before we would
> make use of it.

[ shrug... ] That's not happening; for one thing the change requires a
layout change in pg_control and we have no mechanism to do that without
initdb.

regards, tom lane


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-07-26 21:16:33
Message-ID: 1153948593.2928.28.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches

On a fine day, Wed, 2006-07-26 at 13:35, Bruce Momjian wrote:
> I am sure you worked hard on this, but I don't see the use case,

The use case is any slony-like replication system or queueing system
which needs a consistent means of knowing which batches of transactions
have finished during some period.

You can think of this as a core component for building slony that does
*not* break at 2G trx.

> nor
> have I heard people in the community requesting such functionality.

You will, once more Slony users reach the 2-billion-transaction limit and
start silently losing data, and find out a few weeks later.

> Perhaps pgfoundry would be a better place for this.

At least the part that manages epoch should be in core.

The rest can actually be on pgfoundry as a separate project, or inside
skytools/pgQ.



From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-07-26 21:18:41
Message-ID: 200607262118.k6QLIfE29763@momjian.us
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > I am sure you worked hard on this, but I don't see the use case, nor
> > have I heard people in the community requesting such functionality.
> > Perhaps pgfoundry would be a better place for this.
>
> The part of this that would actually be useful to put in core is
> maintaining a 64-bit XID counter, ie, keep an additional counter that
> bumps every time XID wraps around. This cannot be done very well from
> outside core but it would be nearly trivial, and nearly free, to add
> inside. Everything else in the patch could be done just as well as an
> extension datatype.
>
> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> the place to bump the secondary counter.)

Agreed.



From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-26 21:27:17
Message-ID: 200607261427.19317.darcyb@commandprompt.com
Lists: pgsql-hackers pgsql-patches

On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
> >> The question though is if we did that, would Slony actually use it?
> >
> > If it made sense to do it, then yes we would do it. The problem ends up
> > being Slony is designed to work across a multitude of versions of PG, and
> > unless this was backported to at least 7.4, it would take a while (ie
> > when we stopped supporting versions older than it was ported into)
> > before we would make use of it.
>
> [ shrug... ] That's not happening; for one thing the change requires a
> layout change in pg_control and we have no mechanism to do that without
> initdb.

I'll take a bit more of a look through the patch and see if it is a real boon
to use it on those platforms that support it, and that we have a suitable way
around it on those that don't. But at this point I wouldn't hold my breath
on that.

>
> regards, tom lane



From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-26 21:35:42
Message-ID: 20060726213542.GC10799@surnet.cl
Lists: pgsql-hackers pgsql-patches

Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> > Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
> > >> The question though is if we did that, would Slony actually use it?
> > >
> > > If it made sense to do it, then yes we would do it. The problem ends up
> > > being Slony is designed to work across a multitude of versions of PG, and
> > > unless this was backported to at least 7.4, it would take a while (ie
> > > when we stopped supporting versions older than it was ported into)
> > > before we would make use of it.
> >
> > [ shrug... ] That's not happening; for one thing the change requires a
> > layout change in pg_control and we have no mechanism to do that without
> > initdb.
>
> I'll take a bit more of a look through the patch and see if it is a real boon
> to use it on those platforms that support it, and that we have a suitable way
> around it on those that don't. But at this point I wouldn't hold my breath
> on that

The alternative seems to be that the Slony-I team doesn't feel they have
a need for it, nobody else pushes hard enough for the feature to be in
core, and thus Slony-I and all the rest stay broken forever.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-07-26 21:43:05
Message-ID: 1153950186.2928.46.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches

On a fine day, Wed, 2006-07-26 at 14:27, Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> > Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
> > >> The question though is if we did that, would Slony actually use it?
> > >
> > > If it made sense to do it, then yes we would do it. The problem ends up
> > > being Slony is designed to work across a multitude of versions of PG, and
> > > unless this was backported to at least 7.4, it would take a while (ie
> > > when we stopped supporting versions older than it was ported into)
> > > before we would make use of it.
> >
> > [ shrug... ] That's not happening; for one thing the change requires a
> > layout change in pg_control and we have no mechanism to do that without
> > initdb.
>
> I'll take a bit more of a look through the patch and see if it is a real boon
> to use it on those platforms that support it, and that we have a suitable way
> around it on those that don't.

This patch is actually 2 things together:

1) fixing the xid wraparound and related btree brokenness by moving to
8-byte txids represented as int8

2) cleaning up and exposing slony's snapshot usage.

Slony stored snapshots in tables as separate xmin, xmax and
list-of-running-transactions and then constructed the snapshot struct
and used it internally.

This patch exposes the snapshot by providing a single snapshot type
and operators for telling whether any int8 trx is committed before or
after this snapshot.

This makes it possible to use txids and snapshots in a query that does

SELECT records FROM logtable WHERE txid BETWEEN snap1 AND snap2;

That is, it gets all records which were committed between the two snapshots.
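A sketch of the visibility rule such a query relies on, assuming the snapshot holds xmin, xmax and the list of transactions still in progress when it was taken. All names are hypothetical, and the patch's snapshot_contains() may define "contains" differently. Aborted transactions need no special handling here, because their log rows never become visible anyway.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the 'snapshot' type. */
typedef struct TxidSnapshot
{
    uint64_t        xmin;     /* all txids below this had finished */
    uint64_t        xmax;     /* no txid at or above this had started */
    size_t          nactive;
    const uint64_t *active;   /* in-progress txids, xmin <= t < xmax */
} TxidSnapshot;

/* True if txid had already finished when the snapshot was taken. */
static bool snapshot_sees(const TxidSnapshot *s, uint64_t txid)
{
    if (txid < s->xmin)
        return true;
    if (txid >= s->xmax)
        return false;
    for (size_t i = 0; i < s->nactive; i++)
        if (s->active[i] == txid)
            return false;
    return true;
}

/* Rows written by txid belong to the batch between snap1 and snap2
 * if snap2 sees the transaction but snap1 did not. */
static bool txid_between(const TxidSnapshot *snap1, const TxidSnapshot *snap2,
                         uint64_t txid)
{
    return !snapshot_sees(snap1, txid) && snapshot_sees(snap2, txid);
}
```

Note that a transaction that was still running at snap1 (such as one listed in active) falls into the later batch even though its txid is below snap1's xmax, which is exactly why a plain txid range on xmin/xmax alone is not enough.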

> But at this point I wouldn't hold my breath on that

Well, switching to using stuff from this patch would fix the
data-corruption-after-2G problem for slony.

That is, unless there are some bugs or thinkos of its own in this
patch :)



From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [PATCHES] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-07-26 22:26:11
Message-ID: 44C7EC03.5010507@dunslane.net
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Darcy Buskermolen wrote:
>
>> I'll take a bit more of a look through the patch and see if it is a real boon
>> to use it on those platforms that support it, and that we have a suitable way
>> around it on those that don't. But at this point I wouldn't hold my breath
>> on that
>>
>
> The alternative seems to be that the Slony-I team doesn't feel they have
> a need for it, nobody else pushes hard enough for the feature to be in
> core, and thus Slony-I and all the rest stay broken forever.
>
>

Some things are going to take a few generations to be generally useful,
ISTM. At least let's go with the bit that Tom says should be in core.

cheers

andrew


From: Darcy Buskermolen <darcyb(at)commandprompt(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Marko Kreen <markokr(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-27 15:44:58
Message-ID: 200607270844.59894.darcyb@commandprompt.com
Lists: pgsql-hackers pgsql-patches

On Wednesday 26 July 2006 14:27, Darcy Buskermolen wrote:
> On Wednesday 26 July 2006 14:03, Tom Lane wrote:
> > Darcy Buskermolen <darcyb(at)commandprompt(dot)com> writes:
> > >> The question though is if we did that, would Slony actually use it?
> > >
> > > If it made sense to do it, then yes we would do it. The problem ends up
> > > being Slony is designed to work across a multitude of versions of PG,
> > > and unless this was backported to at least 7.4, it would take a while
> > > (ie when we stopped supporting versions older than it was ported into)
> > > before we would make use of it.
> >
> > [ shrug... ] That's not happening; for one thing the change requires a
> > layout change in pg_control and we have no mechanism to do that without
> > initdb.
>
> I'll take a bit more of a look through the patch and see if it is a real
> boon to use it on those platforms that support it, and that we have a
> suitable way around it on those that don't. But at this point I wouldn't
> hold my breath on that

In one of those 3am lightbulbs I believe I have a way to make use of the 64-bit
XID counter and still maintain the ability to have backwards compatibility.
Is there any chance you could break this patch up into the 2 separate
components that Hannu mentions, and rework the XID stuff into
TransactionIdAdvance as per Tom's recommendation? And in the meantime I'll
pencil out the slony stuff to utilize this.

>
> > regards, tom lane



From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Darcy Buskermolen" <darcyb(at)commandprompt(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-07-28 21:13:52
Message-ID: e51f66da0607281413n6b72f275j349257f1c752de31@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 7/27/06, Darcy Buskermolen <darcyb(at)commandprompt(dot)com> wrote:
> In one of those 3am lightbulbs I believe I have a way to make use of the 64-bit
> XID counter and still maintain the ability to have backwards compatibility.
> Is there any chance you could break this patch up into the 2 separate
> components that Hannu mentions, and rework the XID stuff into
> TransactionIdAdvance as per Tom's recommendation? And in the meantime I'll
> pencil out the slony stuff to utilize this.

Yes, I can. As I am on vacation right now, my computer time is rather
unpredictable; hopefully I can do it over the weekend.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-08-20 21:33:01
Message-ID: 840.1156109581@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> The part of this that would actually be useful to put in core is
>> maintaining a 64-bit XID counter, ie, keep an additional counter that
>> bumps every time XID wraps around. This cannot be done very well from
>> outside core but it would be nearly trivial, and nearly free, to add
>> inside. Everything else in the patch could be done just as well as an
>> extension datatype.
>>
>> (I wouldn't do it like this though --- TransactionIdAdvance itself is
>> the place to bump the secondary counter.)

> Agreed.

I reconsidered after trying to do it that way --- although fixing
TransactionIdAdvance itself to maintain a 2-word counter isn't hard,
there are a whole lot of other places that can advance nextXid,
mostly bits like this in WAL recovery:

    /* Make sure nextXid is beyond any XID mentioned in the record */
    max_xid = xid;
    for (i = 0; i < xlrec->nsubxacts; i++)
    {
        if (TransactionIdPrecedes(max_xid, sub_xids[i]))
            max_xid = sub_xids[i];
    }
    if (TransactionIdFollowsOrEquals(max_xid,
                                     ShmemVariableCache->nextXid))
    {
        ShmemVariableCache->nextXid = max_xid;
        TransactionIdAdvance(ShmemVariableCache->nextXid);
    }

We could hack all these places to know about maintaining an XID-epoch
value, but it's not looking like a simple single-place-to-touch fix :-(

So I'm now agreeing that the approach of maintaining an epoch counter
in checkpoints is best after all. That will work so long as the system
doesn't exceed 4G transactions between checkpoints ... and you'd have a
ton of other problems before that, so this restriction does not bother
me. Putting this in the core code still beats the alternatives
available to non-core code because of the impossibility of being sure
you get control on any fixed schedule, not to mention considerations of
what happens during WAL replay and PITR.

There's still a lot more cruft in the submitted patch than I think
belongs in core, but I'll work on extracting something we can apply.

There was some worry upthread about whether Slony would actually use
this in the near future, but certainly if we don't put it in then
they'll *never* be able to use it.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, Marko Kreen <markokr(at)gmail(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCH] Provide 8-byte transaction IDs to user level
Date: 2006-08-21 16:18:33
Message-ID: 16124.1156177113@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Marko Kreen <markokr(at)gmail(dot)com> writes:
> Following patch exports 8 byte txid and snapshot to user level
> allowing its use in regular SQL. It is based on Slony-I xxid
> module. It provides special 'snapshot' type for snapshot but
> uses regular int8 for transaction ID's.

Per discussion, I've applied a patch that just implements tracking of
"XID epoch" in checkpoints. This should be sufficient to let xxid be
handled as an external module.

regards, tom lane


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject: Re: [HACKERS] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-08-21 16:18:54
Message-ID: e51f66da0608210918j693fa54fyee77d721fdead601@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom Lane wrote:
> >> (I wouldn't do it like this though --- TransactionIdAdvance itself is
> >> the place to bump the secondary counter.)
>
> > Agreed.
>
> I reconsidered after trying to do it that way --- although fixing
> TransactionIdAdvance itself to maintain a 2-word counter isn't hard,
> there are a whole lot of other places that can advance nextXid,
> mostly bits like this in WAL recovery:
>
>     /* Make sure nextXid is beyond any XID mentioned in the record */
>     max_xid = xid;
>     for (i = 0; i < xlrec->nsubxacts; i++)
>     {
>         if (TransactionIdPrecedes(max_xid, sub_xids[i]))
>             max_xid = sub_xids[i];
>     }
>     if (TransactionIdFollowsOrEquals(max_xid,
>                                      ShmemVariableCache->nextXid))
>     {
>         ShmemVariableCache->nextXid = max_xid;
>         TransactionIdAdvance(ShmemVariableCache->nextXid);
>     }
>
> We could hack all these places to know about maintaining an XID-epoch
> value, but it's not looking like a simple single-place-to-touch fix :-(

Since I was asked to rework the patch, I planned to use
TransactionIdAdvance(ShmemVariableCache), although that would
be conceptually ugly. The Right Thing for this approach would be
to have a special struct, but that would touch half the codebase.

That was also the reason I did not want to go that path.

> There's still a lot more cruft in the submitted patch than I think
> belongs in core, but I'll work on extracting something we can apply.

The only cruft I see is the snapshot on-disk "compression" and maybe
the pg_sync_txid() functionality. Dropping the compression would not
matter much: snapshots would waste space, but at least for our
usage that would not be a problem. The rest of the functions are all
required for efficient handling.

Dropping pg_sync_txid() would be a loss, because it means that the
user cannot simply dump and restore the data and continue where
it left off. Maybe it's not a problem for replication, but for generic
queueing it would need delicate juggling when restoring a backup.

Although I must admit pg_sync_txid() is indeed an ugly part
of the patch, and it creates a new failure mode: epoch
wraparound. So I can kind of agree with removing it.

I hope you don't mean that none of the user-level functions belong
in core. It's not as if there are several ways to expose the info,
and there aren't many more interesting ways to use the long xid
at the C level. Having the long xid available at the SQL level
means that efficient async replication can be done without any
use of C.

Now that I am back from vacation I can do some coding myself,
if you give me hints about what needs rework.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Marko Kreen" <markokr(at)gmail(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject: Re: [PATCHES] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-08-21 16:46:04
Message-ID: 22409.1156178764@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Marko Kreen" <markokr(at)gmail(dot)com> writes:
> Dropping pg_sync_txid() would be a loss, because it means that the
> user cannot simply dump and restore the data and continue where
> it left off. Maybe it's not a problem for replication, but for generic
> queueing it would need delicate juggling when restoring a backup.

I'm not following the point here. Dump and restore has never intended
to preserve the transaction counter, so why should it preserve
high-order bits of the transaction counter?

There is another problem with pg_sync_txid, too: because it is willing
to advance the extended XID counter in multiples of 4G XIDs, it turns
wraparound of the extended counter from a never-will-happen scenario
into something that could happen in a poorly-managed installation.
If you've got to be prepared to cope with wraparound of the extended
counter, then what the heck is the point at all? You might as well just
work with XIDs as they stand.

So I think pg_sync_txid is a bad idea. In the patch as committed,
anyone who's really intent on munging the epoch can do it with
pg_resetxlog, but there's not a provision for doing it short of that.

regards, tom lane


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject: Re: [PATCHES] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-08-21 17:15:43
Message-ID: e51f66da0608211015q4497c6f8i806856a2c1912701@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Marko Kreen" <markokr(at)gmail(dot)com> writes:
> > Dropping pg_sync_txid() would be a loss, because it means that the
> > user cannot simply dump and restore the data and continue where
> > it left off. Maybe it's not a problem for replication, but for generic
> > queueing it would need delicate juggling when restoring a backup.
>
> I'm not following the point here. Dump and restore has never intended
> to preserve the transaction counter, so why should it preserve
> high-order bits of the transaction counter?

Because it guarantees that any newly issued long txids will be larger
than the existing ones in tables. That way code can depend on
monotonic growth.

> There is another problem with pg_sync_txid, too: because it is willing
> to advance the extended XID counter in multiples of 4G XIDs, it turns
> wraparound of the extended counter from a never-will-happen scenario
> into something that could happen in a poorly-managed installation.
> If you've got to be prepared to cope with wraparound of the extended
> counter, then what the heck is the point at all? You might as well just
> work with XIDs as they stand.

Indeed. I also don't like that scenario.

> So I think pg_sync_txid is a bad idea. In the patch as committed,
> anyone who's really intent on munging the epoch can do it with
> pg_resetxlog, but there's not a provision for doing it short of that.

I like it. It is indeed better than having pg_dump issue a function
call. This is fully satisfactory.

--
marko


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Marko Kreen" <markokr(at)gmail(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject: Re: [PATCHES] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-08-21 17:29:48
Message-ID: 28412.1156181388@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Marko Kreen" <markokr(at)gmail(dot)com> writes:
> On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not following the point here. Dump and restore has never intended
>> to preserve the transaction counter, so why should it preserve
>> high-order bits of the transaction counter?

> Because it guarantees that any newly issued long txids will be larger
> than the existing ones in tables. That way code can depend on
> monotonic growth.

Within a single installation, sure, but I don't buy that we ought to try
to preserve XIDs across installations.

regards, tom lane


From: "Marko Kreen" <markokr(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, "Jan Wieck" <JanWieck(at)yahoo(dot)com>
Subject: Re: [PATCHES] [PATCH] Provide 8-byte transaction IDs to
Date: 2006-08-21 18:10:10
Message-ID: e51f66da0608211110kda30ed0v417eceb1cbcdf8b5@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Marko Kreen" <markokr(at)gmail(dot)com> writes:
> > On 8/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> I'm not following the point here. Dump and restore has never intended
> >> to preserve the transaction counter, so why should it preserve
> >> high-order bits of the transaction counter?
>
> > Because it guarantees that any newly issued long txids will be larger
> > than the existing ones in tables. That way code can depend on
> > monotonic growth.
>
> Within a single installation, sure, but I don't buy that we ought to try
> to preserve XIDs across installations.

I think you are right in the respect that we should not do it
automatically.

But now that long xids may end up in data tables, the user may need
to dump/restore them into another installation. If the application
is, e.g., a Slony-like queue that depends on xid growth, the user needs
either the ability to bump the epoch or application-level support for
migration. If he has neither, he basically needs to extract the old
contents by hand (as the app would not work reliably) and reset everything.

Probably the right thing would be for the application to have a
"we moved, fix everything" function. But bumping the epoch is such a
simple way of fixing it that it should still be available.

And pg_resetxlog is fine for that, especially as using it signals
"What you are doing is dangerous!"

--
marko