Re: tracking commit timestamps

From: Steve Singer <steve(at)ssinger(dot)info>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Subject: Re: tracking commit timestamps
Date: 2014-11-15 04:32:59
Message-ID: BLU436-SMTP92E792FBEECCC26B120BECDC8D0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-www

On 11/14/2014 08:21 PM, Simon Riggs wrote:
> The requested information is already available, as discussed. Logical
> decoding adds commit ordering for *exactly* the purpose of using it
> for replication, available to all solutions. This often requested
> feature has now been added and doesn't need to be added twice.
>
> So what we are discussing is adding a completely superfluous piece of
> information.
>
> Not including the LSN info does nothing to trigger based replication,
> which will no doubt live on happily for many years. But adding LSN
> will slow down logical replication, for no purpose at all.
>

Simon,
The use cases I'm talking about aren't really replication related. Often
I have come across systems that want to do something such as 'select *
from orders where X > the_last_row_I_saw order by X' and then do further
processing on the order.

This is kind of awkard to do today because you don't have a good
candidate for 'X' to order on. Using either a sequence or insert-row
timestamp doesn't work well because a transaction with a lower value for
X might end up committing after the higher value in in a query result.

Yes you could setup a logical wal slot and listen on the stream of
inserts into your order table but thats a lot of administration overhead
compared to just issuing an SQL query for what really is a query type
operation.

Using the commit timestamp for my X sounded very tempting but could
allow duplicates.

One could argue that this patch is about replication features, and
providing commit ordering for query purposes should be a separate patch
to add that on top of this infrastructure. I see merit to smaller more
focused patches but that requires leaving the door open to easily
extending things later.

It could also be that I'm the only one who wants to order and filter
queries in this manner (but that would surprise me). If the commit lsn
has limited appeal and we decide we don't want it at all then we
shouldn't add it. I've seen this type of requirement in a number of
different systems at a number of different companies. I've generally
seen it dealt with by either selecting rows behind the last now()
timestamp seen and then filtering out already processed rows or by
tracking the 'processed' state of each row individually (ie performing
an update on each row once its been processed) which performs poorly.

Steve

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-11-15 05:12:55 Re: New storage parameter pages_per_range not mentioned in CREATE INDEX doc
Previous Message Alvaro Herrera 2014-11-15 04:27:30 Re: New storage parameter pages_per_range not mentioned in CREATE INDEX doc

Browse pgsql-www by date

  From Date Subject
Next Message Simon Riggs 2014-11-15 12:36:53 Re: tracking commit timestamps
Previous Message Simon Riggs 2014-11-15 01:21:45 Re: tracking commit timestamps