Re: ctid & updates (or speedy updates/deletes)

Lists: pgsql-general
From: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: ctid & updates
Date: 2002-06-03 18:25:57
Message-ID: Pine.BSO.4.44.0206031301140.11800-100000@kitten.greentechnologist.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

1#
I saw a post from Jan Wieck about how ctid can be used for a fast update.
I noticed that ctid changes on update (as expected since it's really a new
row). Is there anyway to get the new ctid from the update so later
updates to the row can continue to use ctid to zero in on the row
location?

2#
Also, is ctid unique for each row? I would guess so since it looks like
(and I'm guessing) that ctid is a page/row offset. So there should be only
one thing at each ctid address. Is that correct that ctid is unique to a
table? Can anything interesting be done with the empty space? Is there any
way to find the maximum ctid and look for quantities of empty space? I
assume a user-side program could use that data to see how much unused,
yet allocated space there is in a table?

That might be the "You don't have to be a PostgreSQL hacker" version of
looking for compressible (via relocation) tables.

Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200
1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230
200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200
1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232
200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22


From: Jan Wieck <janwieck(at)yahoo(dot)com>
To: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ctid & updates
Date: 2002-06-03 20:00:47
Message-ID: 200206032000.g53K0mh03880@saturn.janwieck.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joshua b. Jore wrote:
> 1#
> I saw a post from Jan Wieck about how ctid can be used for a fast update.
> I noticed that ctid changes on update (as expected since it's really a new
> row). Is there anyway to get the new ctid from the update so later
> updates to the row can continue to use ctid to zero in on the row
> location?

That's one of the details I'm still thinking about. And in
the case of a cursor using a junk attribute it gets worse,
because there is no easy way to push that new value back into
the cursor's result set.

But I doubt that this would become a real world problem ever.
People who need to update one and the same DB row again and
again during the same transaction are spaghetti-code-script-
kiddies who know for sure that "a cursor's that thingy on ya
screen that ya move withe mouse", so we're pretty safe here
:-p

> 2#
> Also, is ctid unique for each row?

Yes, per table.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>
To: Jan Wieck <janwieck(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ctid & updates
Date: 2002-06-03 21:21:53
Message-ID: Pine.BSO.4.44.0206031612380.2992-100000@kitten.greentechnologist.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, 3 Jun 2002, Jan Wieck wrote:

> Joshua b. Jore wrote:
> > 1#
> > I saw a post from Jan Wieck about how ctid can be used for a fast update.
> > I noticed that ctid changes on update (as expected since it's really a new
> > row). Is there anyway to get the new ctid from the update so later
> > updates to the row can continue to use ctid to zero in on the row
> > location?
>
> That's one of the details I'm still thinking about. And in
> the case of a cursor using a junk attribute it gets worse,
> because there is no easy way to push that new value back into
> the cursor's result set.
>
> But I doubt that this would become a real world problem ever.
> People who need to update one and the same DB row again and
> again during the same transaction are spaghetti-code-script-
> kiddies who know for sure that "a cursor's that thingy on ya
> screen that ya move withe mouse", so we're pretty safe here

True, some really bad code could be written. In my case I just wanted to
do something less than optimal as a short term fix which might mean the
same row is updated multiple times. That's a queueing/aggregation cache
just waiting to get implemented to get around that (that's in my app, not
Postgresql for others who are reading this) ;-). I just figured that if I
was going to be dumb and update the same row multiple times it might as
well be speedy.

I did notice that this puts a wrinkle into MVCC. While MVCC operates
normally this method of finding the row makes the conflicting
update/delete miss the row and alter nothing. I was only playing with Read
Committed but if I started two transactions, updated the row in one, then
updated in another it acted normally (the second waited until the first
completed). When the first txn committed the second went on and found that
the row had disappeared and returned zero rows updated. I might just use
that behaviour to detect conflicts (of course, only when possible).

Neato keen. Thanks Jan

Josh


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ctid & updates
Date: 2002-06-04 00:05:17
Message-ID: 13244.1023149117@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Joshua b. Jore" <josh(at)greentechnologist(dot)org> writes:
> I noticed that ctid changes on update (as expected since it's really a new
> row). Is there anyway to get the new ctid from the update so later
> updates to the row can continue to use ctid to zero in on the row
> location?

There's a function called something like currtid that takes the
CTID of the possibly-obsoleted row and returns the CTID of its latest
updated version. I believe this is exported because the ODBC driver
uses it, so it's unlikely to go away, even though AFAIR it's not
documented anywhere. A risk of using it is that CTID of an updated
row cannot be trusted for very long --- once VACUUM has come by,
you might find that CTID reassigned to some other row entirely.

> Can anything interesting be done with the empty space? Is there any
> way to find the maximum ctid and look for quantities of empty space?

I don't think CTID gives you any useful hint about the amount of free
space available on a page.

regards, tom lane


From: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>
To:
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ctid & updates (or speedy updates/deletes)
Date: 2002-06-04 01:32:12
Message-ID: Pine.BSO.4.44.0206031939050.21627-100000@kitten.greentechnologist.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Cool thanks Tom. This is awesome - as long as I don't mind being tied to
PostgreSQL then this sort of thing is a nifty thing to make things go
faster.

Josh

For the archives and others who are just reading this (I know I often just
follow interesting threads to get ideas)

I plan to use the tid two ways: fast access for updates to a row,
detecting updates. I'll do something like UPDATE ... WHERE ctid =
'(23,22)'::tid to do a really fast update. The number of modified rows
will be either one (there is only one row per tid address) or zero (the
row was altered since I took the row's tid). My application will notice
the change and react accordingly - say get a new copy of the row from the
table and see what the user wants to do. This addresses the issue
of multiple users sending updates to the same row.

Functions from src/backend/utils/adt/tid.c
Both functions return the current tid for a row indicated by a tid. From
my (very inexpert) reading of the source this might take any previous tid
and get the current one. currtid takes the table OID and currtid2 takes
the table name

currtid(OID,TID) RETURNS TID
currtid2(TEXT,TID) RETURNS TID

An example:

CREATE TABLE a (
j INTEGER
);
INSERT INTO a (j) VALUES (0);
INSERT INTO a (j) VALUES (1);
SELECT tableoid,ctid,oid,* FROM a;
tableoid | oid | ctid | i
----------+----------+-------+---
63993118 | 63993128 | (0,1) | 0
63993118 | 63993129 | (0,2) | 1
/*
* ctid starts counting from page zero, row 1
*/

UPDATE a SET i = 0 WHERE i = 0;
SELECT tableoid,ctid,oid,* FROM a;
tableoid | oid | ctid | i
----------+----------+-------+---
63993118 | 63993129 | (0,2) | 1
63993118 | 63993128 | (0,3) | 0
/*
* ctid incremented on an update (since an update is really just a sneaky
* sort of insert)
*/

SELECT currtid2('a'::text, '(0,1)'::tid)
currtid2
----------
(0,3)
/*
* locate the current row based on the old address
*/

UPDATE a SET i = 0 WHERE i = 0;
SELECT tableoid,ctid,oid,* FROM a;
tableoid | oid | ctid | i
----------+----------+-------+---
63993118 | 63993129 | (0,2) | 1
63993118 | 63993128 | (0,4) | 0
/*
* ctid incremented again, as expected
*/

SELECT currtid2('a'::text, '(0,1)'::tid)
currtid2
----------
(0,4)
/*
* currtid2 still finds the current row by using the old
* tid which is two revisions old. Maybe this just sticks
* around until the table is vacuumed. Make sure to not
* try to use the tid over any vacuum operation since that
* just moves everything around anyway.
*/

So if I tried to do an UPDATE operation where the row is located by ctid
and nothing was updated then either the table was vacuumed or the row was
obsoleted by another unknown update. The new ctid is available but it
might be better to check with the user and get the new values or something
like that.

Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200
1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230
200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200
1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232
200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22

On Mon, 3 Jun 2002, Tom Lane wrote:

> "Joshua b. Jore" <josh(at)greentechnologist(dot)org> writes:
> > I noticed that ctid changes on update (as expected since it's really a new
> > row). Is there anyway to get the new ctid from the update so later
> > updates to the row can continue to use ctid to zero in on the row
> > location?
>
> There's a function called something like currtid that takes the
> CTID of the possibly-obsoleted row and returns the CTID of its latest
> updated version. I believe this is exported because the ODBC driver
> uses it, so it's unlikely to go away, even though AFAIR it's not
> documented anywhere. A risk of using it is that CTID of an updated
> row cannot be trusted for very long --- once VACUUM has come by,
> you might find that CTID reassigned to some other row entirely.
>
> > Can anything interesting be done with the empty space? Is there any
> > way to find the maximum ctid and look for quantities of empty space?
>
> I don't think CTID gives you any useful hint about the amount of free
> space available on a page.
>
> regards, tom lane
>


From: Jan Wieck <janwieck(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: ctid & updates
Date: 2002-06-04 14:16:10
Message-ID: 200206041416.g54EGAH04751@saturn.janwieck.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
> "Joshua b. Jore" <josh(at)greentechnologist(dot)org> writes:
> > I noticed that ctid changes on update (as expected since it's really a new
> > row). Is there anyway to get the new ctid from the update so later
> > updates to the row can continue to use ctid to zero in on the row
> > location?
>
> There's a function called something like currtid that takes the
> CTID of the possibly-obsoleted row and returns the CTID of its latest
> updated version. I believe this is exported because the ODBC driver
> uses it, so it's unlikely to go away, even though AFAIR it's not
> documented anywhere. A risk of using it is that CTID of an updated
> row cannot be trusted for very long --- once VACUUM has come by,
> you might find that CTID reassigned to some other row entirely.

But it should be safe within the transaction that did the
update, right? And since cursors (in PG) cannot span
multiple transactions, this would be the last piece I'm
looking for to use in UPDATE ... WHERE CURRENT OF.

Thanks.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <janwieck(at)yahoo(dot)com>
Cc: "Joshua b(dot) Jore" <josh(at)greentechnologist(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: ctid & updates
Date: 2002-06-04 23:03:00
Message-ID: 19781.1023231780@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Jan Wieck <janwieck(at)yahoo(dot)com> writes:
> Tom Lane wrote:
>> There's a function called something like currtid that takes the
>> CTID of the possibly-obsoleted row and returns the CTID of its latest
>> updated version. I believe this is exported because the ODBC driver
>> uses it, so it's unlikely to go away, even though AFAIR it's not
>> documented anywhere. A risk of using it is that CTID of an updated
>> row cannot be trusted for very long --- once VACUUM has come by,
>> you might find that CTID reassigned to some other row entirely.

> But it should be safe within the transaction that did the
> update, right?

Sure; VACUUM won't risk deleting tuples that were visible as of the
start of the oldest open transaction, so anything that you found earlier
in the current transaction will surely still be there, even if it's not
the latest committed version anymore. I wouldn't trust a CTID older
than the current transaction, however.

regards, tom lane