Quick Links

preserving forensic information when we freeze

Lists:	pgsql-hackers

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	preserving forensic information when we freeze
Date:	2013-05-28 13:21:27
Message-ID:	CA+TgmoaEmnoLZmVbb8gvY69NA8zw9BWpiZ9+TLz-LnaBOZi7JA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Various people, including at least Heikki, Andres, and myself, have
proposed various schemes for avoiding freezing that amount to doing it
sooner, when we're already writing WAL anyway, or at least when the
buffer is already dirty anyway, or at least while the buffer is
already in shared_buffers anyway. Various people, including at least
Tom and Andres, have raised the point that this would lose
possibly-useful forensic information that they have in the past found
to be of tangible value in previous debugging of databases that have
somehow gotten messed up. I don't know who originally proposed it,
but I've had many conversations about how we could address this
concern: instead of replacing xmin when we freeze, just set an
infomask bit that means "xmin is frozen" and leave the old, literal
xmin in place. FrozenTransactionId would still exist and still be
understood, of course, but new freezing operations wouldn't use it.

I have attempted to implement this. Trouble is, we're out of infomask
bits. Using an infomask2 bit might work but we don't have many of
them left either, so it's worth casting about a bit for a better
solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
this purpose, but I think we're better off trying to reclaim those
bits in a future release. Exactly how to do that is a topic for
another email, but I believe it's very possible. What I'd like to
propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
indicate that xmin is frozen. This bit pattern isn't used for
anything else, so there's no confusion possible with existing data
already on disk, but it's necessary to audit all code that checks
HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done
this, and there's little enough of it that it seems pretty easy to
handle.

A somewhat larger problem is that this requires auditing every place
that looks at a tuple xmin and deciding whether any changes are needed
to handle the possibility that the tuple may be frozen even though
xmin != FrozenTransactionId. This is a somewhat more significant
change, but it doesn't seem to be too bad. But there are a couple of
cases that are tricky enough that they seem worth expounding upon:

- When we follow HOT chains, we determine where the HOT chain ends by
matching the xmax of each tuple with the xmin of the next tuple. If
they don't match, we conclude that the HOT chain has ended. I
initially thought this logic might be buggy even as things stand today
if the latest tuple in the chain is frozen, but as Andres pointed out
to me, that's not so. If the newest tuple in the chain is all-visible
(the earliest point at which we can theoretically freeze it), all
earlier tuples are dead altogether, and heap_page_prune() is always
called after locking the buffer and before freezing, so any tuple we
freeze must be the first in its HOT chain. For the same reason, this
logic doesn't need any adjustment for the new freezing system: it's
never looking at anything old enough to be frozen in the first place.

- Various procedural languages use the combination of TID and XMIN to
determine whether a function needs to be recompiled. Although the
possibility of this doing the wrong thing seems quite remote, it's not
obvious to me why it's theoretically correct even as things stand
today. Suppose that previously-frozen tuple is vacuumed away and
another tuple is placed at the same TID and then frozen. Then, we
check whether the cache is still valid and, behold, it is. This would
actually get better with this patch, since it wouldn't be enough
merely for the old and new tuples to both be frozen; they'd have to
have had the same XID prior to freezing. I think that could only
happen if a backend sticks around for at least 2^32 transactions, but
I don't know what would prevent it in that case.

- heap_get_latest_tid() appears broken even without this patch. It's
only used on user-specified TIDs, either in a TID scan, or due to the
use of currtid_byreloid() and currtid_byrelname(). It attempts find
the latest version of the tuple referenced by the given TID by
following the CTID links. Like HOT, it uses XMAX/XMIN matching to
detect when the chain is broken. However, unlike HOT, update chains
can in general span multiple blocks. It is not now nor has it ever
been safe to assume that the next tuple in the chain can't be frozen
before the previous one is vacuumed away. Andres came up with the
best example: suppose the tuple to be frozen physically precedes its
predecessor; then, an in-progress vacuum might reach the to-be-frozen
tuple before it reaches (and removes) the previous row version. In
newer releases, the same problem could be caused by vacuum's
occasional page-skipping behavior. As with the previous point, the
"don't actually change xmin when we freeze" approach actually makes it
harder for a chain to get "broken" when it shouldn't, but I suspect
it's just moving us from one set of extremely-obscure failure cases to
another.

This patch probably needs some documentation updates. Suggestions as
to where would be appreciated.

As a general statement, I view this work as something that is likely
needed no matter which one of the "remove freezing" approaches that
have been proposed we choose to adopt. It does not fix anything in
and of itself, but it (hopefully) removes an objection to the entire
line of inquiry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
freeze-forensically-v1.patch	application/octet-stream	19.9 KB

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-28 16:39:13
Message-ID:	51A4DDB1.10605@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 05/28/2013 06:21 AM, Robert Haas wrote:
> As a general statement, I view this work as something that is likely
> needed no matter which one of the "remove freezing" approaches that
> have been proposed we choose to adopt. It does not fix anything in
> and of itself, but it (hopefully) removes an objection to the entire
> line of inquiry.

Agreed. I have some ideas on how to reduce the impact of freezing as
well (of course), and the description of your approach certainly seems
to benefit them, especially as it removes the whole "forensic
information" objection.

One question though: if we're not removing the xmin, how do we know the
maximum xid to which we can prune clog? I can imagine several ways
given your approach.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-28 23:27:35
Message-ID:	20130528232735.GB818@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
> On 05/28/2013 06:21 AM, Robert Haas wrote:
> > As a general statement, I view this work as something that is likely
> > needed no matter which one of the "remove freezing" approaches that
> > have been proposed we choose to adopt. It does not fix anything in
> > and of itself, but it (hopefully) removes an objection to the entire
> > line of inquiry.
>
> Agreed. I have some ideas on how to reduce the impact of freezing as
> well (of course), and the description of your approach certainly seems
> to benefit them, especially as it removes the whole "forensic
> information" objection.
>
> One question though: if we're not removing the xmin, how do we know the
> maximum xid to which we can prune clog? I can imagine several ways
> given your approach.

Simply don't count xids which are frozen. Currently we ignore an xid
because its a special value, after this because the tuple has a certain
hint bit (combination) set.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-28 23:33:14
Message-ID:	CA+TgmoZiGiSKRn6udHqsdnGRBG_0S3G0YhZqG_8DVJdYqt432g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, May 28, 2013 at 7:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-05-28 09:39:13 -0700, Josh Berkus wrote:
>> On 05/28/2013 06:21 AM, Robert Haas wrote:
>> > As a general statement, I view this work as something that is likely
>> > needed no matter which one of the "remove freezing" approaches that
>> > have been proposed we choose to adopt. It does not fix anything in
>> > and of itself, but it (hopefully) removes an objection to the entire
>> > line of inquiry.
>>
>> Agreed. I have some ideas on how to reduce the impact of freezing as
>> well (of course), and the description of your approach certainly seems
>> to benefit them, especially as it removes the whole "forensic
>> information" objection.
>>
>> One question though: if we're not removing the xmin, how do we know the
>> maximum xid to which we can prune clog? I can imagine several ways
>> given your approach.
>
> Simply don't count xids which are frozen. Currently we ignore an xid
> because its a special value, after this because the tuple has a certain
> hint bit (combination) set.

Right, what he said. Calculating the XID before which we no longer
need CLOG is just a matter of looking at all the tuples that we don't
know to be frozen and taking the oldest XID from among those. This
patch changes the definition of "frozen" but that's a pretty minor
detail of the CLOG-truncation calculation. So, in essence, this patch
doesn't really make much difference in that area either way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-29 00:00:42
Message-ID:	20130529000042.GC818@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
> I have attempted to implement this. Trouble is, we're out of infomask
> bits. Using an infomask2 bit might work but we don't have many of
> them left either, so it's worth casting about a bit for a better
> solution. Andres proposed using HEAP_MOVED_IN|HEAP_MOVED_OFF for
> this purpose, but I think we're better off trying to reclaim those
> bits in a future release. Exactly how to do that is a topic for
> another email, but I believe it's very possible. What I'd like to
> propose instead is using HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID to
> indicate that xmin is frozen. This bit pattern isn't used for
> anything else, so there's no confusion possible with existing data
> already on disk, but it's necessary to audit all code that checks
> HEAP_XMIN_INVALID to make sure it doesn't get confused. I've done
> this, and there's little enough of it that it seems pretty easy to
> handle.

I only suggested MOVED_IN/OFF because I didn't remember
HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
combination could have been produced in the past in a way I couldn't
find so far, I am all for it. I don't see a problem with breaking
backward compatibility on that level and I don't think there's any way
to get there without some level of low level compat break.

> - When we follow HOT chains, we determine where the HOT chain ends by
> matching the xmax of each tuple with the xmin of the next tuple. If
> they don't match, we conclude that the HOT chain has ended. I
> initially thought this logic might be buggy even as things stand today
> if the latest tuple in the chain is frozen, but as Andres pointed out
> to me, that's not so. If the newest tuple in the chain is all-visible
> (the earliest point at which we can theoretically freeze it), all
> earlier tuples are dead altogether, and heap_page_prune() is always
> called after locking the buffer and before freezing, so any tuple we
> freeze must be the first in its HOT chain. For the same reason, this
> logic doesn't need any adjustment for the new freezing system: it's
> never looking at anything old enough to be frozen in the first place.

I am all for adding a comment why this is safe though. We thought about
it for some while before we were convinced...

> - Various procedural languages use the combination of TID and XMIN to
> determine whether a function needs to be recompiled. Although the
> possibility of this doing the wrong thing seems quite remote, it's not
> obvious to me why it's theoretically correct even as things stand
> today. Suppose that previously-frozen tuple is vacuumed away and
> another tuple is placed at the same TID and then frozen. Then, we
> check whether the cache is still valid and, behold, it is. This would
> actually get better with this patch, since it wouldn't be enough
> merely for the old and new tuples to both be frozen; they'd have to
> have had the same XID prior to freezing. I think that could only
> happen if a backend sticks around for at least 2^32 transactions, but
> I don't know what would prevent it in that case.

Hm. As previously said, I am less than convinced of those adhoc
mechanisms and I think this should get properly integrated into the
normal cache invalidation mechanisms.
But: I think this is safe since we compare the stored/cached xmin/tid
with one gotten from the SearchSysCache just before which will point to
the correct location as of the last AcceptInvalidationMessages(). I
can't think of a way were this would allow the case you describe.

> - heap_get_latest_tid() appears broken even without this patch. It's
> only used on user-specified TIDs, either in a TID scan, or due to the
> use of currtid_byreloid() and currtid_byrelname(). It attempts find
> the latest version of the tuple referenced by the given TID by
> following the CTID links. Like HOT, it uses XMAX/XMIN matching to
> detect when the chain is broken. However, unlike HOT, update chains
> can in general span multiple blocks. It is not now nor has it ever
> been safe to assume that the next tuple in the chain can't be frozen
> before the previous one is vacuumed away. Andres came up with the
> best example: suppose the tuple to be frozen physically precedes its
> predecessor; then, an in-progress vacuum might reach the to-be-frozen
> tuple before it reaches (and removes) the previous row version. In
> newer releases, the same problem could be caused by vacuum's
> occasional page-skipping behavior. As with the previous point, the
> "don't actually change xmin when we freeze" approach actually makes it
> harder for a chain to get "broken" when it shouldn't, but I suspect
> it's just moving us from one set of extremely-obscure failure cases to
> another.

I don't think this is especially problematic though. If you do a tidscan
starting from a tid that is so old that it can be removed: you're doing
it wrong. The tid could have been reused for something else anyway. I
think the ctid chaining is only meaningful if the tuple got updated very
recently, i.e. you hold a snapshot that prevents the removal of the
root tuple's snapshot.

Nice work!

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-29 01:26:49
Message-ID:	CA+TgmoZUaryhXiaGrKaTjukaEQmDsqj7y8010ZqRGg75+FwVCQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, May 28, 2013 at 8:00 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I only suggested MOVED_IN/OFF because I didn't remember
> HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
> combination could have been produced in the past in a way I couldn't
> find so far, I am all for it. I don't see a problem with breaking
> backward compatibility on that level and I don't think there's any way
> to get there without some level of low level compat break.

I'm not sure why you call it a compatibility break. It's true that an
old PostgreSQL can't read new heaps, but we've never guaranteed that
direction anyway, and every time we add or reinterpret any infomask
bits, that's true. For example, the fklocks patch did tat. What's
important is that a new PostgreSQL will still be able to read old
heaps; that is, pg_upgrade compatibility is preserved.

> I am all for adding a comment why this is safe though. We thought about
> it for some while before we were convinced...

I'm fine with that, but the logic is present in multiple places, and I
did not want to comment them all identically. If there's a central
place in which a comment would be appropriate, let's add it there; or
IOW, what do you suggest in detail?

> Hm. As previously said, I am less than convinced of those adhoc
> mechanisms and I think this should get properly integrated into the
> normal cache invalidation mechanisms.
> But: I think this is safe since we compare the stored/cached xmin/tid
> with one gotten from the SearchSysCache just before which will point to
> the correct location as of the last AcceptInvalidationMessages(). I
> can't think of a way were this would allow the case you describe.

I don't understand why it can't. AcceptInvalidationMessages()
guarantees that we're looking at the latest version of the catalog,
but it doesn't say anything about whether the latest catalog state
happens to look like any earlier catalog state.

> I don't think this is especially problematic though. If you do a tidscan
> starting from a tid that is so old that it can be removed: you're doing
> it wrong. The tid could have been reused for something else anyway. I
> think the ctid chaining is only meaningful if the tuple got updated very
> recently, i.e. you hold a snapshot that prevents the removal of the
> root tuple's snapshot.

That logic seems sound to me.

> Nice work!

Thanks!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-29 02:08:44
Message-ID:	20130529020844.GG818@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-05-28 21:26:49 -0400, Robert Haas wrote:
> On Tue, May 28, 2013 at 8:00 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I only suggested MOVED_IN/OFF because I didn't remember
> > HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
> > combination could have been produced in the past in a way I couldn't
> > find so far, I am all for it. I don't see a problem with breaking
> > backward compatibility on that level and I don't think there's any way
> > to get there without some level of low level compat break.
>
> I'm not sure why you call it a compatibility break. It's true that an
> old PostgreSQL can't read new heaps, but we've never guaranteed that
> direction anyway, and every time we add or reinterpret any infomask
> bits, that's true. For example, the fklocks patch did tat. What's
> important is that a new PostgreSQL will still be able to read old
> heaps; that is, pg_upgrade compatibility is preserved.

Oh, not the on-disk format. But as you said, code that looks at xmin is
going to need to change. fklocks broke code that relied on
HeapTupleHeaderGetXmax(), this would break code that looks at
xmin. Removing/renaming *GetXmin similarly sounds like a good idea to
make the breakage explicit.

> > I am all for adding a comment why this is safe though. We thought about
> > it for some while before we were convinced...
>
> I'm fine with that, but the logic is present in multiple places, and I
> did not want to comment them all identically. If there's a central
> place in which a comment would be appropriate, let's add it there; or
> IOW, what do you suggest in detail?

That's a good point. Generally lots of this is underdocumented/commented
and can only learned by spending a year or so in the postgres code. I'd
say rename README.HOT to README and add a section there and reference it
from those two places? It already has a mostly general explanation of
concurrent index builds. Don't have a better idea.

> > Hm. As previously said, I am less than convinced of those adhoc
> > mechanisms and I think this should get properly integrated into the
> > normal cache invalidation mechanisms.
> > But: I think this is safe since we compare the stored/cached xmin/tid
> > with one gotten from the SearchSysCache just before which will point to
> > the correct location as of the last AcceptInvalidationMessages(). I
> > can't think of a way were this would allow the case you describe.
>
> I don't understand why it can't. AcceptInvalidationMessages()
> guarantees that we're looking at the latest version of the catalog,
> but it doesn't say anything about whether the latest catalog state
> happens to look like any earlier catalog state.

Well, AcceptInvalidationMessages() will get a new version of the tuple
with a new tid/xmin combo. So what would need to happen is the function
being updated (to a new location), then the old version needs to get
removed, then the new one updated again, reusing to the old
location. Allthewhile either both versions need to have been frozen or
we need to have wrapped around to the same xid. All that without the
function being executed inbetween which would have invalidated the old
state and stored a new xmin/tid.
But you're right, nothing except chance prevents that from happening,
not sure what I thought of earlier.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-29 11:42:57
Message-ID:	CA+Tgmob9dfwO=utBomDzoWj7UiaVQ5G3QMRxadouD6eyRb2o3w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, May 28, 2013 at 10:08 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-05-28 21:26:49 -0400, Robert Haas wrote:
>> On Tue, May 28, 2013 at 8:00 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > I only suggested MOVED_IN/OFF because I didn't remember
>> > HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID was still free ;). So, unless that
>> > combination could have been produced in the past in a way I couldn't
>> > find so far, I am all for it. I don't see a problem with breaking
>> > backward compatibility on that level and I don't think there's any way
>> > to get there without some level of low level compat break.
>>
>> I'm not sure why you call it a compatibility break. It's true that an
>> old PostgreSQL can't read new heaps, but we've never guaranteed that
>> direction anyway, and every time we add or reinterpret any infomask
>> bits, that's true. For example, the fklocks patch did tat. What's
>> important is that a new PostgreSQL will still be able to read old
>> heaps; that is, pg_upgrade compatibility is preserved.
>
> Oh, not the on-disk format. But as you said, code that looks at xmin is
> going to need to change. fklocks broke code that relied on
> HeapTupleHeaderGetXmax(), this would break code that looks at
> xmin. Removing/renaming *GetXmin similarly sounds like a good idea to
> make the breakage explicit.

I thought about that, but there are enough callers that don't care
that it seemed like the wrong choice to me.

>> > I am all for adding a comment why this is safe though. We thought about
>> > it for some while before we were convinced...
>>
>> I'm fine with that, but the logic is present in multiple places, and I
>> did not want to comment them all identically. If there's a central
>> place in which a comment would be appropriate, let's add it there; or
>> IOW, what do you suggest in detail?
>
> That's a good point. Generally lots of this is underdocumented/commented
> and can only learned by spending a year or so in the postgres code. I'd
> say rename README.HOT to README and add a section there and reference it
> from those two places? It already has a mostly general explanation of
> concurrent index builds. Don't have a better idea.

Anyone else have a suggestion?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-29 13:58:46
Message-ID:	51A60996.4030805@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 5/28/13 8:00 PM, Andres Freund wrote:
>> - Various procedural languages use the combination of TID and XMIN to
>> > determine whether a function needs to be recompiled.

> Hm. As previously said, I am less than convinced of those adhoc
> mechanisms and I think this should get properly integrated into the
> normal cache invalidation mechanisms.

Yes please.

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-05-29 15:55:02
Message-ID:	20130529155502.GB15045@eldon.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:
> On Tue, May 28, 2013 at 10:08 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-05-28 21:26:49 -0400, Robert Haas wrote:

> >> > I am all for adding a comment why this is safe though. We thought about
> >> > it for some while before we were convinced...
> >>
> >> I'm fine with that, but the logic is present in multiple places, and I
> >> did not want to comment them all identically. If there's a central
> >> place in which a comment would be appropriate, let's add it there; or
> >> IOW, what do you suggest in detail?
> >
> > That's a good point. Generally lots of this is underdocumented/commented
> > and can only learned by spending a year or so in the postgres code. I'd
> > say rename README.HOT to README and add a section there and reference it
> > from those two places? It already has a mostly general explanation of
> > concurrent index builds. Don't have a better idea.
>
> Anyone else have a suggestion?

I support the idea of using README files. I don't have an opinion on
whether it's best to have a single file for everything (i.e. rename
README.HOT and add to it) or just explain this in README.freeze or
something like that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-06-24 12:12:34
Message-ID:	20130624121234.GD6471@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-05-28 09:21:27 -0400, Robert Haas wrote:
> As a general statement, I view this work as something that is likely
> needed no matter which one of the "remove freezing" approaches that
> have been proposed we choose to adopt. It does not fix anything in
> and of itself, but it (hopefully) removes an objection to the entire
> line of inquiry.

Agreed. And it also improves the status quo when debugging. I wish this
would have been the representation since the beginning.

Some remarks:
* I don't really like that HeapTupleHeaderXminFrozen() now is frequently
performed without checking for FrozenTransactionId. I think the places
where that's done are currently safe, but it seems likely that we
introduce bugs due to people writing similar code.
I think replacing *GetXmin by a wrapper that returns
FrozenTransactionId if the hint bit tell us so would be a good
idea. Then add *GetRawXmin for the rest (probably mostly display).
Seems like it would make the patch actually smaller.

* The PLs imo should set fn_xmin to FrozenTransactionId if the hint bit
tell us the tuple is frozen.

* I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
store that. We might looking at a chain which partially was done in
<9.4. Not sure if that's a realistic scenario, but I'd rather be safe.

Otherwise I don't see much that needs to be done before this can get
committed.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-07-02 20:29:56
Message-ID:	CA+TgmoYya2Xb+JO+J9ZwyCa=7CJZ19OWLJ0pGxNsxj4u-7EAKA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 24, 2013 at 8:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Agreed. And it also improves the status quo when debugging. I wish this
> would have been the representation since the beginning.
>
> Some remarks:
> * I don't really like that HeapTupleHeaderXminFrozen() now is frequently
> performed without checking for FrozenTransactionId. I think the places
> where that's done are currently safe, but it seems likely that we
> introduce bugs due to people writing similar code.
> I think replacing *GetXmin by a wrapper that returns
> FrozenTransactionId if the hint bit tell us so would be a good
> idea. Then add *GetRawXmin for the rest (probably mostly display).
> Seems like it would make the patch actually smaller.

I did think about this approach, but it seemed like it would add
cycles in a significant number of places. For example, consider the
HeapTupleSatisfies() routines, which are perhaps PostgreSQL's finest
example of a place where you DON'T want to add any cycles. All of the
checks on xmin are conditional on HeapTupleHeaderXminCommitted()
having been found already to be false. That implies that the frozen
bits aren't set either, so if HeapTupleHeaderGetXmin() were to recheck
the bits it would be a waste. As I got to the end of the patch I was
a little dismayed by the number of places that did need adjustment to
check both things, but there are still plenty of important places that
don't.

> * The PLs imo should set fn_xmin to FrozenTransactionId if the hint bit
> tell us the tuple is frozen.

Why? I thought about that, but it seemed to me that the purpose of
that caching was to avoid confusing two functions whose pg_proc tuples
ended up at the same TID. So there's a failure mechanism there: the
tuple can get vacuumed away and replaced with a new tuple which then
gets frozen, and everything (bogusly) matches. If the actual
XID-prior-to-freezing has to match, ISTM that the chances of a false
match are far lower. You have to get the new tuple at the same TID
slot *and* the XID counter has to wrap back around to the
exactly-right XID to get a false match on XID, within the lifetime of
a single backend. That's not bloody likely. Remember, the whole
point of the patch is to start freezing things sooner, so the scenario
where both the original and replacement tuples are frozen is going to
become more common.

We also don't particularly *want* the freezing of a pg_proc tuple to
force recompilations in every backend.

> * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
> store that. We might looking at a chain which partially was done in
> <9.4. Not sure if that's a realistic scenario, but I'd rather be safe.

IIUC, you're talking about the scenario where we have an update chain
X -> Y, where X is dead but not actually removed and Y is
(forensically) frozen. We're examining tuple Y and trying to
determine whether X has been entered in rs_unresolved_tups. If, as I
think you're proposing, we consider the xmin of Y to be
FrozenTransactionId, we will definitely not find it - because the way
it got into the table in the first place was based on the value
returned by HeapTupleHeaderGetUpdateXid(). And that value is certain
not to be FrozenTransactionId, because we never set the xmax of a
tuple to FrozenTransactionId.

There's no possibility of getting confused here; if X is still around
at all, it's xmax is of the same generation as Y's xmin. Otherwise,
we've had an undetected XID wraparound.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-07-03 17:07:15
Message-ID:	20130703170715.GD5667@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-07-02 16:29:56 -0400, Robert Haas wrote:
> On Mon, Jun 24, 2013 at 8:12 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Agreed. And it also improves the status quo when debugging. I wish this
> > would have been the representation since the beginning.
> >
> > Some remarks:
> > * I don't really like that HeapTupleHeaderXminFrozen() now is frequently
> > performed without checking for FrozenTransactionId. I think the places
> > where that's done are currently safe, but it seems likely that we
> > introduce bugs due to people writing similar code.
> > I think replacing *GetXmin by a wrapper that returns
> > FrozenTransactionId if the hint bit tell us so would be a good
> > idea. Then add *GetRawXmin for the rest (probably mostly display).
> > Seems like it would make the patch actually smaller.
>
> I did think about this approach, but it seemed like it would add
> cycles in a significant number of places. For example, consider the
> HeapTupleSatisfies() routines, which are perhaps PostgreSQL's finest
> example of a place where you DON'T want to add any cycles. All of the
> checks on xmin are conditional on HeapTupleHeaderXminCommitted()
> having been found already to be false. That implies that the frozen
> bits aren't set either, so if HeapTupleHeaderGetXmin() were to recheck
> the bits it would be a waste. As I got to the end of the patch I was
> a little dismayed by the number of places that did need adjustment to
> check both things, but there are still plenty of important places that
> don't.

Well, nothing would prevent using the HeapTupleHeaderGetRawXmin() in
those places. Exactly the number of callsites is what makes me think
that somebody will get this wrong in the future.

> > * The PLs imo should set fn_xmin to FrozenTransactionId if the hint bit
> > tell us the tuple is frozen.
>
> Why? I thought about that, but it seemed to me that the purpose of
> that caching was to avoid confusing two functions whose pg_proc tuples
> ended up at the same TID.
> [good reasoning]

Man. I hate this hack. I wonder what happens if somebody does a VACUUM
FULL of pg_class and there are a bunch of functions created in the same
transaction...

> > * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
> > store that. We might looking at a chain which partially was done in
> > <9.4. Not sure if that's a realistic scenario, but I'd rather be safe.
>
> IIUC, you're talking about the scenario where we have an update chain
> X -> Y, where X is dead but not actually removed and Y is
> (forensically) frozen. We're examining tuple Y and trying to
> determine whether X has been entered in rs_unresolved_tups. If, as I
> think you're proposing, we consider the xmin of Y to be
> FrozenTransactionId, we will definitely not find it - because the way
> it got into the table in the first place was based on the value
> returned by HeapTupleHeaderGetUpdateXid(). And that value is certain
> not to be FrozenTransactionId, because we never set the xmax of a
> tuple to FrozenTransactionId.

I am thinking of something slightly different. rewrite_heap_dead_tuple()
now removes tuples/xids from the unresolved table that could be from a
different xid epoch since it unconditionally does a HASH_REMOVE if it
finds an entry doing a lookup using the *preserved* xid. Earlier that
was harmless since for frozen tuples it only ever used
FrozenTransactionId which obviously cannot be part of a valid chain and
couldn't even get entered into unresolved_tups.

I am not sure at all if that actually can be harmful but there isn't any
reason we would need to do the delete so I wouldn't. There can be
complex enough situation where later parts of a ctid chain are dead and
earlier ones are recently dead and such that I would rather be cautious.

> There's no possibility of getting confused here; if X is still around
> at all, it's xmax is of the same generation as Y's xmin. Otherwise,
> we've had an undetected XID wraparound.

Another issue I thought about is what we will return for SELECT xmin
FROM blarg; Some people use that in their applications (IIRC
skytools/pqg/londiste does so) and they might get confused if we
suddently return xids from the future. On the other hand, not returning
the original xid would be a shame as well...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Marko Kreen <markokr(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-07-03 18:54:37
Message-ID:	CACMqXCJ0OLDV6bfSj+QNrm2yZUaxYuXNzben5scWydLwX4C2Tg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 3, 2013 at 8:07 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-07-02 16:29:56 -0400, Robert Haas wrote:
>> There's no possibility of getting confused here; if X is still around
>> at all, it's xmax is of the same generation as Y's xmin. Otherwise,
>> we've had an undetected XID wraparound.
>
> Another issue I thought about is what we will return for SELECT xmin
> FROM blarg; Some people use that in their applications (IIRC
> skytools/pqg/londiste does so) and they might get confused if we
> suddently return xids from the future. On the other hand, not returning
> the original xid would be a shame as well...

Skytools/pqg/londiste use only txid_* APIs, they don't look at
4-byte xids on table rows.

--
marko

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-07-03 18:59:27
Message-ID:	CA+TgmoZ8Lcv+b614DsrZeROf1bosLy-MAxjzj0uVgXnRURc=yA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 3, 2013 at 1:07 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Well, nothing would prevent using the HeapTupleHeaderGetRawXmin() in
> those places. Exactly the number of callsites is what makes me think
> that somebody will get this wrong in the future.

Well, I guess I could go rework the whole patch that way. It's a fair
request, but I kinda doubt it's going to make the patch smaller.

>> > * I think rewrite_heap_dead_tuple needs to check for a frozen xmin and
>> > store that. We might looking at a chain which partially was done in
>> > <9.4. Not sure if that's a realistic scenario, but I'd rather be safe.
>>
>> IIUC, you're talking about the scenario where we have an update chain
>> X -> Y, where X is dead but not actually removed and Y is
>> (forensically) frozen. We're examining tuple Y and trying to
>> determine whether X has been entered in rs_unresolved_tups. If, as I
>> think you're proposing, we consider the xmin of Y to be
>> FrozenTransactionId, we will definitely not find it - because the way
>> it got into the table in the first place was based on the value
>> returned by HeapTupleHeaderGetUpdateXid(). And that value is certain
>> not to be FrozenTransactionId, because we never set the xmax of a
>> tuple to FrozenTransactionId.
>
> I am thinking of something slightly different. rewrite_heap_dead_tuple()
> now removes tuples/xids from the unresolved table that could be from a
> different xid epoch since it unconditionally does a HASH_REMOVE if it
> finds an entry doing a lookup using the *preserved* xid. Earlier that
> was harmless since for frozen tuples it only ever used
> FrozenTransactionId which obviously cannot be part of a valid chain and
> couldn't even get entered into unresolved_tups.
>
> I am not sure at all if that actually can be harmful but there isn't any
> reason we would need to do the delete so I wouldn't. There can be
> complex enough situation where later parts of a ctid chain are dead and
> earlier ones are recently dead and such that I would rather be cautious.

OK, I think I see your point, and I think you're right.

>> There's no possibility of getting confused here; if X is still around
>> at all, it's xmax is of the same generation as Y's xmin. Otherwise,
>> we've had an undetected XID wraparound.
>
> Another issue I thought about is what we will return for SELECT xmin
> FROM blarg; Some people use that in their applications (IIRC
> skytools/pqg/londiste does so) and they might get confused if we
> suddently return xids from the future. On the other hand, not returning
> the original xid would be a shame as well...

Yeah. I think the system columns that we have today are pretty much
crap. I wonder if we couldn't throw them out and replace them with
some kind of functions that you can pass a row to. That would allow
us to expose a lot more detail without adding a bajillion hidden
columns, and for a bonus we'd save substantially on catalog bloat.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-07-08 18:34:33
Message-ID:	51DB0639.9050707@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 07/03/2013 11:59 AM, Robert Haas wrote:
> Yeah. I think the system columns that we have today are pretty much
> crap. I wonder if we couldn't throw them out and replace them with
> some kind of functions that you can pass a row to. That would allow
> us to expose a lot more detail without adding a bajillion hidden
> columns, and for a bonus we'd save substantially on catalog bloat.

Where are we on this patch? Should I mark it Returned and you'll work
on it for September?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-07-08 20:38:00
Message-ID:	47A7586E-2681-4BDD-AFF9-D5AF259BD4FE@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jul 8, 2013, at 1:34 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 07/03/2013 11:59 AM, Robert Haas wrote:
>> Yeah. I think the system columns that we have today are pretty much
>> crap. I wonder if we couldn't throw them out and replace them with
>> some kind of functions that you can pass a row to. That would allow
>> us to expose a lot more detail without adding a bajillion hidden
>> columns, and for a bonus we'd save substantially on catalog bloat.
>
> Where are we on this patch? Should I mark it Returned and you'll work
> on it for September?

Sure.

...Robert

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: preserving forensic information when we freeze
Date:	2013-11-21 14:47:25
Message-ID:	20131121144725.GB31748@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

As promised I'm currently updating the patch. Some questions arose
during that:

* Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
It seems quite possible that people think they've delt with frozen
xmin entirely after checking, but they still might get
FrozenTransactionId back in a pg_upgraded cluster.
* Existing htup_details boolean checks contain an 'Is', but
HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
HeapTupleHeaderXminFrozen don't contain any verb. Not sure.
* EvalPlanQualFetch() uses priorXmax like logic to find updated versions
of tuples. I am not 100% sure there aren't cases where that's
problematic even with the current code, but I think it's better not to
use the raw xid there - priorXmax can never be FrozenTransactionId, so
comparing it to the GetXmin() seems better.
It also has the following wrong comment:
* .... (Should be safe to examine xmin without getting
* buffer's content lock, since xmin never changes in an existing
* tuple.)
But I don't see that causing any problems.
* ri_trigger.c did do a TransactionIdIsCurrentTransactionId() on a raw
xmin value, the consequences are minor though.

The rewrite to introduce HeapTupleHeaderGet[Raw]Xmin() indeed somewhat
increases the patchsize - but I think it's enough increase in
expressiveness to be well worth the cost. Indeed it led me to find at
least one issue (with minor consequences).

I think once we have this we should start opportunistically try to
freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
the page is already dirty.

Patch 01 is a rebased version of Robert's patch without further changes,
02 contains my suggested modifications.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-freeze-forensically-v1.patch	text/x-patch	20.0 KB
0002-Updates-to-the-patch-from-Andres.patch	text/x-patch	26.2 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-11-21 20:59:35
Message-ID:	CA+TgmoZP7OS0vk7iU5jvdNxZVCi83RXBqqnM22QCFBmVE4Y_=A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Nov 21, 2013 at 9:47 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> As promised I'm currently updating the patch. Some questions arose
> during that:
>
> * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
> It seems quite possible that people think they've delt with frozen
> xmin entirely after checking, but they still might get
> FrozenTransactionId back in a pg_upgraded cluster.

The reason I originally wrote the patch the way I did, rather than the
way that you prefer, is that it minimizes the number of places where
we might perform extra tests that are known not to be needed in
context. These code paths are hot. If you do this sort of thing,
then after macro expansion we may end up with a lot of things like:
(flags & FROZEN) || (rawxid == 2) ? 2 : rawxid. I want to avoid that.
That macros is intended, specifically, to be a test for flag bits,
and I think it should do precisely that. If that's not what you want,
then don't use that macro.

> * Existing htup_details boolean checks contain an 'Is', but
> HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
> HeapTupleHeaderXminFrozen don't contain any verb. Not sure.

We could say XminIsComitted, XminIsInvalid, XminIsFrozen, etc. I
don't particularly care for it, but I can see the argument for it.

> * EvalPlanQualFetch() uses priorXmax like logic to find updated versions
> of tuples. I am not 100% sure there aren't cases where that's
> problematic even with the current code, but I think it's better not to
> use the raw xid there - priorXmax can never be FrozenTransactionId, so
> comparing it to the GetXmin() seems better.
> It also has the following wrong comment:
> * .... (Should be safe to examine xmin without getting
> * buffer's content lock, since xmin never changes in an existing
> * tuple.)

Hmm. The new tuple can't be frozen unless it's all-visible, and it
can't be all-visible if our scan can still see its predecessor. That
would imply that if it's frozen, it must be an unrelated tuple. I
think.

> But I don't see that causing any problems.
> * ri_trigger.c did do a TransactionIdIsCurrentTransactionId() on a raw
> xmin value, the consequences are minor though.
>
> The rewrite to introduce HeapTupleHeaderGet[Raw]Xmin() indeed somewhat
> increases the patchsize - but I think it's enough increase in
> expressiveness to be well worth the cost. Indeed it led me to find at
> least one issue (with minor consequences).
>
> I think once we have this we should start opportunistically try to
> freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
> the page is already dirty.

Separate patch, but yeah, something like that. If we have to mark the
page all-visible, we might as well freeze it while we're there. We
should think about how it interacts with Heikki's freeze-without-write
patch though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-11-21 21:51:02
Message-ID:	20131121215102.GF27838@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-11-21 15:59:35 -0500, Robert Haas wrote:
> > * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
> > It seems quite possible that people think they've delt with frozen
> > xmin entirely after checking, but they still might get
> > FrozenTransactionId back in a pg_upgraded cluster.
>
> The reason I originally wrote the patch the way I did, rather than the
> way that you prefer, is that it minimizes the number of places where
> we might perform extra tests that are known not to be needed in
> context. These code paths are hot.

The patch as sent shouldn't really do that in any of paths I know to be
hot - it uses *RawXmin() there.

> If you do this sort of thing, then after macro expansion we may end up with a lot of things like:
> (flags & FROZEN) || (rawxid == 2) ? 2 : rawxid. I want to avoid that.

But in which cases would that actually be slower? There'll be no
additional code executed if the hint bits for frozen are set, and in
case not it will usually safe us an external function call to
TransactionIdPrecedes().

> That macros is intended, specifically, to be a test for flag bits,
> and I think it should do precisely that. If that's not what you want,
> then don't use that macro.

That's a fair argument. Although there's several HeapTupleHeader* macros
that muck with stuff besides infomask.

> > * Existing htup_details boolean checks contain an 'Is', but
> > HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
> > HeapTupleHeaderXminFrozen don't contain any verb. Not sure.
>
> We could say XminIsComitted, XminIsInvalid, XminIsFrozen, etc. I
> don't particularly care for it, but I can see the argument for it.

I don't have a clear preference either, I just noticed the inconsistency
and wasn't sure whether it was intentional.

> > * EvalPlanQualFetch() uses priorXmax like logic to find updated versions
> > of tuples. I am not 100% sure there aren't cases where that's
> > problematic even with the current code, but I think it's better not to
> > use the raw xid there - priorXmax can never be FrozenTransactionId, so
> > comparing it to the GetXmin() seems better.
> > It also has the following wrong comment:
> > * .... (Should be safe to examine xmin without getting
> > * buffer's content lock, since xmin never changes in an existing
> > * tuple.)
>
> Hmm. The new tuple can't be frozen unless it's all-visible, and it
> can't be all-visible if our scan can still see its predecessor. That
> would imply that if it's frozen, it must be an unrelated tuple. I
> think.

Yes, that's my reasoning as well - and why I think we should check for
new-style frozen xids. Either via my version of GetXmin() or
HeapTupleHeaderXminFrozen() if we go with yours.

> > I think once we have this we should start opportunistically try to
> > freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
> > the page is already dirty.
>
> Separate patch, but yeah, something like that. If we have to mark the
> page all-visible, we might as well freeze it while we're there. We
> should think about how it interacts with Heikki's freeze-without-write
> patch though.

Definitely separate yes. And I agree, it's partially moot if Heikki's
patch gets in, but I am not sure it will make it into 9.4. There seems
to be quite some work left.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-11-24 16:09:04
Message-ID:	CA+U5nMLkw4R_Ze7KjzhQ44Z5-Rs7M2vs0T+uC06QEsWUoLzajA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 21 November 2013 16:51, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> Definitely separate yes. And I agree, it's partially moot if Heikki's
> patch gets in, but I am not sure it will make it into 9.4. There seems
> to be quite some work left.

I'd prefer to do all 3 patches in one release. Don't mind which one.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-11 19:17:27
Message-ID:	CA+TgmoZYuFDUvKkgrg1tQP+TtUTn39Mjo9zt7hVLSNCLrLTZrA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-11-21 15:59:35 -0500, Robert Haas wrote:
>> > * Should HeapTupleHeaderXminFrozen also check for FrozenTransactionId?
>> > It seems quite possible that people think they've delt with frozen
>> > xmin entirely after checking, but they still might get
>> > FrozenTransactionId back in a pg_upgraded cluster.
>>
>> The reason I originally wrote the patch the way I did, rather than the
>> way that you prefer, is that it minimizes the number of places where
>> we might perform extra tests that are known not to be needed in
>> context. These code paths are hot.
>
> The patch as sent shouldn't really do that in any of paths I know to be
> hot - it uses *RawXmin() there.
>
>> If you do this sort of thing, then after macro expansion we may end up with a lot of things like:
>> (flags & FROZEN) || (rawxid == 2) ? 2 : rawxid. I want to avoid that.
>
> But in which cases would that actually be slower? There'll be no
> additional code executed if the hint bits for frozen are set, and in
> case not it will usually safe us an external function call to
> TransactionIdPrecedes().

Dunno. It's at least more code generation.

>> That macros is intended, specifically, to be a test for flag bits,
>> and I think it should do precisely that. If that's not what you want,
>> then don't use that macro.
>
> That's a fair argument. Although there's several HeapTupleHeader* macros
> that muck with stuff besides infomask.

Sure, but that doesn't mean they ALL have to.

>> > * Existing htup_details boolean checks contain an 'Is', but
>> > HeapTupleHeaderXminCommitted, HeapTupleHeaderXminInvalid,
>> > HeapTupleHeaderXminFrozen don't contain any verb. Not sure.
>>
>> We could say XminIsComitted, XminIsInvalid, XminIsFrozen, etc. I
>> don't particularly care for it, but I can see the argument for it.
>
> I don't have a clear preference either, I just noticed the inconsistency
> and wasn't sure whether it was intentional.

It was intentional enough. :-)

>>> > I think once we have this we should start opportunistically try to
>> > freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
>> > the page is already dirty.
>>
>> Separate patch, but yeah, something like that. If we have to mark the
>> page all-visible, we might as well freeze it while we're there. We
>> should think about how it interacts with Heikki's freeze-without-write
>> patch though.
>
> Definitely separate yes. And I agree, it's partially moot if Heikki's
> patch gets in, but I am not sure it will make it into 9.4. There seems
> to be quite some work left.

I haven't heard anything further from Heikki, so I'm thinking we
should proceed with this approach. It seems to be the path of least
resistance, if not essential, for making CLUSTER freeze everything
automatically, a change almost everyone seems to really want. Even if
we did have Heikki's stuff, making cluster freeze more aggressively is
still a good argument for doing this. The pages can then be marked
all-visible (something Bruce is working on) and never need to be
revisited. Without this, I don't think we can get there. If we also
handle the vacuum-dirtied-it-already case as you propose here, I think
we'd have quite a respectable improvement in vacuum behavior for 9.4,
even without Heikki's stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-11 19:25:30
Message-ID:	52A8BC2A.2030505@vmware.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 12/11/2013 09:17 PM, Robert Haas wrote:
> On Thu, Nov 21, 2013 at 4:51 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> On 2013-11-21 15:59:35 -0500, Robert Haas wrote:
>>> Separate patch, but yeah, something like that. If we have to mark the
>>> page all-visible, we might as well freeze it while we're there. We
>>> should think about how it interacts with Heikki's freeze-without-write
>>> patch though.
>>
>> Definitely separate yes. And I agree, it's partially moot if Heikki's
>> patch gets in, but I am not sure it will make it into 9.4. There seems
>> to be quite some work left.
>
> I haven't heard anything further from Heikki, so I'm thinking we
> should proceed with this approach.

+1. It seems unlikely that my patch is going to make it into 9.4.

- Heikki

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-11 22:01:04
Message-ID:	20131211220104.GB536@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-11 14:17:27 -0500, Robert Haas wrote:
> > But in which cases would that actually be slower? There'll be no
> > additional code executed if the hint bits for frozen are set, and in
> > case not it will usually safe us an external function call to
> > TransactionIdPrecedes().
>
> Dunno. It's at least more code generation.

IMO it looks cleaner and is less error prone, so an additional
instruction or two in the slow path doesn't seem a high price. And we're
really not talking about more.

But I won't do more than roll my eyes loudly if you decide to go ahead
with your version as long as you change rewriteheap.c accordingly.

> >>> > I think once we have this we should start opportunistically try to
> >> > freeze tuples during vacuum using OldestXmin instead of FreezeLimit if
> >> > the page is already dirty.
> >>
> >> Separate patch, but yeah, something like that. If we have to mark the
> >> page all-visible, we might as well freeze it while we're there. We
> >> should think about how it interacts with Heikki's freeze-without-write
> >> patch though.
> >
> > Definitely separate yes. And I agree, it's partially moot if Heikki's
> > patch gets in, but I am not sure it will make it into 9.4. There seems
> > to be quite some work left.
>
> I haven't heard anything further from Heikki, so I'm thinking we
> should proceed with this approach.

Yea, I think by now it's pretty unlikely to get into 9.4. I hope for
early 9.5 tho.

What's your plan to commit this? I'd prefer to wait till Alvaro's
freezing changes get in, so his patch will look the same in HEAD and
9.3. But I think he plans to commit soon.

> If we also
> handle the vacuum-dirtied-it-already case as you propose here, I think
> we'd have quite a respectable improvement in vacuum behavior for 9.4,
> even without Heikki's stuff.

Yes, and it seems like a realistic goal, so let's go for it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-11 22:25:10
Message-ID:	20131211222510.GC5777@eldon.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund escribió:

> What's your plan to commit this? I'd prefer to wait till Alvaro's
> freezing changes get in, so his patch will look the same in HEAD and
> 9.3. But I think he plans to commit soon.

Yes, I do. I'm waiting on feedback on the patch I posted this
afternoon, so if there's nothing more soon I will push it.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-17 21:00:14
Message-ID:	CA+TgmoZhRTs=ZD4n2x7YM-kP=cpjw_hwtdo3WKZ-OChWYX5MBw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 11, 2013 at 5:25 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Andres Freund escribió:
>> What's your plan to commit this? I'd prefer to wait till Alvaro's
>> freezing changes get in, so his patch will look the same in HEAD and
>> 9.3. But I think he plans to commit soon.
>
> Yes, I do. I'm waiting on feedback on the patch I posted this
> afternoon, so if there's nothing more soon I will push it.

That's done now, so I've rebased this patch and hacked on it a bit
more. The latest version is attached. Review would be appreciate in
case I've goofed up anything critical, especially around adjusting
things over top of Alvaro's freezing changes. But I think this is
more or less ready to go.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
freeze-forensically-v3.patch	text/x-patch	29.7 KB

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-18 22:54:46
Message-ID:	20131218225446.GG26481@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2013-12-17 16:00:14 -0500, Robert Haas wrote:
> @@ -5874,19 +5858,27 @@ heap_prepare_freeze_tuple(HeapTupleHeader tuple, TransactionId cutoff_xid,
> void
> heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz)
> {
> + tuple->t_infomask = frz->t_infomask;
> + tuple->t_infomask2 = frz->t_infomask2;
> +
> if (frz->frzflags & XLH_FREEZE_XMIN)
> - HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
> + HeapTupleHeaderSetXminFrozen(tuple);
>
> HeapTupleHeaderSetXmax(tuple, frz->xmax);
>
> if (frz->frzflags & XLH_FREEZE_XVAC)
> + {
> HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
> + /* If we somehow haven't hinted the tuple previously, do it now. */
> + HeapTupleHeaderSetXminCommitted(tuple);
> + }

What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
here?

> @@ -823,14 +823,14 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
> * NB: Like with per-tuple hint bits, we can't set the
> * PD_ALL_VISIBLE flag if the inserter committed
> * asynchronously. See SetHintBits for more info. Check
> - * that the HEAP_XMIN_COMMITTED hint bit is set because of
> - * that.
> + * that the tuple is hinted xmin-committed hint bit because
> + * of that.
> */

Looks like there's a "hint bit" too much here.

> @@ -65,6 +65,9 @@ manage to be a conflict it would merely mean that one bit-update would
> be lost and need to be done again later. These four bits are only hints
> (they cache the results of transaction status lookups in pg_clog), so no
> great harm is done if they get reset to zero by conflicting updates.
> +Note, however, that a tuple is frozen by setting both HEAP_XMIN_INVALID
> +and HEAP_XMIN_COMMITTED; this is a critical update and accordingly requires
> +an exclusive buffer lock.

I think it'd be appropriate to mention that this needs to be preserved
via WAL logging, it sounds like it's suficient to set a hint bit without
persistenc guarantees.

(not sure if I already wrote this, but whatever)

Looking good.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 02:42:25
Message-ID:	CA+TgmoYq0GgRPGxiv9MONcmVipQeAMpd-WvnEf4F05gBhnx=Zw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Dec 18, 2013 at 5:54 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> if (frz->frzflags & XLH_FREEZE_XVAC)
>> + {
>> HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
>> + /* If we somehow haven't hinted the tuple previously, do it now. */
>> + HeapTupleHeaderSetXminCommitted(tuple);
>> + }
>
> What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
> here?

I'm just copying the existing logic. See the final stanza of
heap_prepare_freeze_tuple.

> [ snip ]

Will fix.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 10:44:20
Message-ID:	20131219104420.GB1759@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
> On Wed, Dec 18, 2013 at 5:54 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> if (frz->frzflags & XLH_FREEZE_XVAC)
> >> + {
> >> HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
> >> + /* If we somehow haven't hinted the tuple previously, do it now. */
> >> + HeapTupleHeaderSetXminCommitted(tuple);
> >> + }
> >
> > What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
> > here?
>
> I'm just copying the existing logic. See the final stanza of
> heap_prepare_freeze_tuple.

Yes, but why don't you keep that in heap_prepare_freeze_tuple()? Just
because of HeapTupleHeaderSetXminCommitted()? I dislike transporting the
infomask in the wal record and then changing it away from that again afterwards.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 12:40:40
Message-ID:	CA+TgmobuMF09aAVhwWknRRuSgrj_PvtkVP+MQo1VYzF9YkLmHg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
>> On Wed, Dec 18, 2013 at 5:54 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> >> if (frz->frzflags & XLH_FREEZE_XVAC)
>> >> + {
>> >> HeapTupleHeaderSetXvac(tuple, FrozenTransactionId);
>> >> + /* If we somehow haven't hinted the tuple previously, do it now. */
>> >> + HeapTupleHeaderSetXminCommitted(tuple);
>> >> + }
>> >
>> > What's the reasoning behind adding HeapTupleHeaderSetXminCommitted()
>> > here?
>>
>> I'm just copying the existing logic. See the final stanza of
>> heap_prepare_freeze_tuple.
>
> Yes, but why don't you keep that in heap_prepare_freeze_tuple()? Just
> because of HeapTupleHeaderSetXminCommitted()?

Yes, that's pretty much it.

> I dislike transporting the
> infomask in the wal record and then changing it away from that again afterwards.

I don't really see a problem with it. Relying on the macros to tweak
the bits seems more future-proof than inventing some other way to do
it (that might even get copied into other parts of the code where it's
even less safe). I actually think transporting the infomask is kind
of a funky way to handle this in the first instance, but I don't think
it's this patch's job to kibitz that decision.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 14:19:14
Message-ID:	20131219141914.GA11483@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-19 07:40:40 -0500, Robert Haas wrote:
> On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
>
> > I dislike transporting the
> > infomask in the wal record and then changing it away from that again afterwards.
>
> I don't really see a problem with it. Relying on the macros to tweak
> the bits seems more future-proof than inventing some other way to do
> it (that might even get copied into other parts of the code where it's
> even less safe).

Then there should be a macro to twiddle the infomask, without touching
the tuple.

> I actually think transporting the infomask is kind
> of a funky way to handle this in the first instance, but I don't think
> it's this patch's job to kibitz that decision.

It's not nice, I grant you that, but I don't see how to do it
otherwise. We can't yet set the hint bits in
heap_prepare_freeze_tuple(), as we're not in a critical section, and
thus haven't replaced eventual multixacts by plain xids.
Running it inside a critical section isn't really realistic, as we'd
either have to iterate over the whole page, including memory
allocations, inside one, or we'd have to WAL log each individual item.

We could obviously encode all the infomask setting required in flags
instructing heap_execute_freeze_tuple() to set them, but that seems more
complex without accompanying benefit.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 14:37:09
Message-ID:	CA+Tgmob_D+0WNz-uFD8BQ7VRrn1mY0t8zy1t5ujq4wcBSHf6AQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 19, 2013 at 9:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-19 07:40:40 -0500, Robert Haas wrote:
>> On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
>>
>> > I dislike transporting the
>> > infomask in the wal record and then changing it away from that again afterwards.
>>
>> I don't really see a problem with it. Relying on the macros to tweak
>> the bits seems more future-proof than inventing some other way to do
>> it (that might even get copied into other parts of the code where it's
>> even less safe).
>
> Then there should be a macro to twiddle the infomask, without touching
> the tuple.

Sure, we can invent that. I personally don't like it as well.

>> I actually think transporting the infomask is kind
>> of a funky way to handle this in the first instance, but I don't think
>> it's this patch's job to kibitz that decision.
>
> It's not nice, I grant you that, but I don't see how to do it
> otherwise. We can't yet set the hint bits in
> heap_prepare_freeze_tuple(), as we're not in a critical section, and
> thus haven't replaced eventual multixacts by plain xids.
> Running it inside a critical section isn't really realistic, as we'd
> either have to iterate over the whole page, including memory
> allocations, inside one, or we'd have to WAL log each individual item.
>
> We could obviously encode all the infomask setting required in flags
> instructing heap_execute_freeze_tuple() to set them, but that seems more
> complex without accompanying benefit.

Abstraction is a benefit unto itself.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 20:21:52
Message-ID:	CA+TgmoYFV9o4fLaE1di_McpY2ZpRG2hbou-Qq_-Z1POD5sDU7Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 19, 2013 at 9:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Dec 19, 2013 at 9:19 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> On 2013-12-19 07:40:40 -0500, Robert Haas wrote:
>>> On Thu, Dec 19, 2013 at 5:44 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> > On 2013-12-18 21:42:25 -0500, Robert Haas wrote:
>>>
>>> > I dislike transporting the
>>> > infomask in the wal record and then changing it away from that again afterwards.
>>>
>>> I don't really see a problem with it. Relying on the macros to tweak
>>> the bits seems more future-proof than inventing some other way to do
>>> it (that might even get copied into other parts of the code where it's
>>> even less safe).
>>
>> Then there should be a macro to twiddle the infomask, without touching
>> the tuple.
>
> Sure, we can invent that. I personally don't like it as well.

After some off-list discussion via IM I propose the following
compromise: it reverts my changes to do some of the infomask bit
twaddling in the execute function, but doesn't invent new macros
either. My main concern about inventing new macros is that I don't
want to encourage people to write code that knows specifically which
parts of the heap tuple header are in which fields. I think it may
have been a mistake to divide responsibility between the prepare and
execute functions the way we did in this case, because it doesn't
appear to be a clean separation of concerns. But it's not this
patch's job to kibitz that decision, so this version just fits in with
the way things are already being done there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
freeze-forensically-v4.patch	text/x-patch	29.7 KB

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 20:36:43
Message-ID:	20131219203643.GM11006@eldon.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:

> I think it may have been a mistake to divide responsibility between
> the prepare and execute functions the way we did in this case, because
> it doesn't appear to be a clean separation of concerns. But it's not
> this patch's job to kibitz that decision, so this version just fits in
> with the way things are already being done there.

If you want to change how it works, feel free to propose a patch; I'm
not in love with what we're doing, honestly, and I did propose the idea
of using some flag bits instead of the whole mask, but didn't get any
traction. (Not that that thread had a lot of participants.)

Or are you suggesting that I should do it? I would have welcomed
feedback when I was on that, but I have moved on to other things now,
and I don't want to be a blocker for the forensic freeze patch.

Anyway I think if we want to change it, the time is now, before we
release 9.3.3.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-19 21:33:00
Message-ID:	CA+TgmoZonR8RbWkUKYE_oHLfseuwZyKr-=12Jo6Q6VTUZSB6Rw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 19, 2013 at 3:36 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Robert Haas escribió:
>> I think it may have been a mistake to divide responsibility between
>> the prepare and execute functions the way we did in this case, because
>> it doesn't appear to be a clean separation of concerns. But it's not
>> this patch's job to kibitz that decision, so this version just fits in
>> with the way things are already being done there.
>
> If you want to change how it works, feel free to propose a patch; I'm
> not in love with what we're doing, honestly, and I did propose the idea
> of using some flag bits instead of the whole mask, but didn't get any
> traction. (Not that that thread had a lot of participants.)
>
> Or are you suggesting that I should do it? I would have welcomed
> feedback when I was on that, but I have moved on to other things now,
> and I don't want to be a blocker for the forensic freeze patch.
>
> Anyway I think if we want to change it, the time is now, before we
> release 9.3.3.

I am sorry I wasn't able to follow that thread in more detail at the
time, and I'm not trying to create extra work for you now. You've put
a lot of work into stabilizing the fkeylocks stuff and it's not my
purpose to cast aspersions on that, nor do I think that this is so bad
we can't live with it. The somewhat ambiguous phrasing of that email
is attributable to the fact that I really don't have a clear idea what
would be better than what you've got here now, and even if I did, I'm
not eager to be the guy who insists on refactoring and breaks things
again in the process.

It's tempting to think that the prepare function log a set of flags
indicating what logical operations should be performed on the target
tuple, rather than the new infomask and infomask2 fields per se; or
else that we ought to just log the whole HeapTupleHeader, so that the
execute function just copies the data in, splat. In other words, make
the logging either purely logical, or purely physical, not a mix. But
making it purely logical would involve translating between multiple
sets of flags in a way that might not accomplish much beyond
increasing the possibility for error, and making it purely physical
would make the WAL record bigger for no real gain. So perhaps the way
you've chosen is best after all, despite my reservations.

But in either case, it was not my purpose to hijack this thread to
talk about that patch, just to explain how I've updated this patch to
(hopefully) satisfy Andres's concerns.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 02:09:47
Message-ID:	52B3A6EB.2030606@nasby.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

One thing users will lose in this patch is the ability to reliably see if a tuple is frozen via SQL. Today you can do that just by selecting xmin from the table.

Obviously people don't generally need to do that... but it's one of those things that when you do need it it's incredibly handy to have... would it be difficult to expose infomask(2) via SQL, the same way xmin et all are?
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Jim Nasby <jim(at)nasby(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 02:22:02
Message-ID:	20131220022202.GO11006@eldon.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jim Nasby escribió:
> One thing users will lose in this patch is the ability to reliably see if a tuple is frozen via SQL. Today you can do that just by selecting xmin from the table.
>
> Obviously people don't generally need to do that... but it's one of those things that when you do need it it's incredibly handy to have... would it be difficult to expose infomask(2) via SQL, the same way xmin et all are?

It's already exposed via the pageinspect extension. It's doubtful that
there are many valid cases where you need infomask but don't have access
to that module.

The real fix seems to ensure that the module is always available.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 12:12:01
Message-ID:	CA+Tgmob7gyp=SAgXDXTFgyzsP2a9kYcFT7e9at5_mLSbnpCjBQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Dec 19, 2013 at 9:22 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Jim Nasby escribió:
>> One thing users will lose in this patch is the ability to reliably see if a tuple is frozen via SQL. Today you can do that just by selecting xmin from the table.
>>
>> Obviously people don't generally need to do that... but it's one of those things that when you do need it it's incredibly handy to have... would it be difficult to expose infomask(2) via SQL, the same way xmin et all are?
>
> It's already exposed via the pageinspect extension. It's doubtful that
> there are many valid cases where you need infomask but don't have access
> to that module.
>
> The real fix seems to ensure that the module is always available.

We could make the code that displays this column do GetXmin rather
than GetRawXmin, which would allow this question to be answered, but
lose the ability to find the original xmin once the tuple is frozen
and break precedent with the xmax and cmin/cmax fields, which return
the "raw" value from the tuple header. But I'm OK to go whichever
direction people prefer.

I think the root of the problem is that nobody's very eager to add
more hidden system catalog columns because each one bloats
pg_attribute significantly. Currently, we have xmin, xmax, cmin, and
cmax columns, and that's basically just a historical artifact at this
point. cmin and cmax always return the same value, and the value
returned may be neither a cmin nor a cmax but a combo CID. And if
you're looking for information that's only in the infomask, good luck.

Maybe what we should do is add a function something like
pg_tuple_header(tableoid, ctid) that returns a record, maybe something
like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
int, hoff int). Or perhaps some slightly more cooked version of that
information. And then delete the xmin, xmax, cmin, and cmax system
columns. That'd save significantly on pg_attribute entries while, at
the same time, actually providing more information than we do today.

pageinspect is useful, too, but it seems to deal mostly with pages, so
I'm not sure it's a natural substitute for something like this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 12:22:07
Message-ID:	20131220122207.GA32720@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
> I think the root of the problem is that nobody's very eager to add
> more hidden system catalog columns because each one bloats
> pg_attribute significantly.

I think that part is actually solveable - there's no real need for them
to be real columns, present in pg_attribute, things could easily enough be
setup during parse analysis. The bigger problem I see is that adding
more useful columns will cause name clashes, which will probably
prohibit improvements in that direction.

> Maybe what we should do is add a function something like
> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> int, hoff int). Or perhaps some slightly more cooked version of that
> information. And then delete the xmin, xmax, cmin, and cmax system
> columns. That'd save significantly on pg_attribute entries while, at
> the same time, actually providing more information than we do today.

I was wondering whether we couldn't just pass pg_tuple_header() a whole
row, instead of having the user manually pass in reloid and ctid. I
think that should actually work in the interesting scenarios.

> pageinspect is useful, too, but it seems to deal mostly with pages, so
> I'm not sure it's a natural substitute for something like this.

Agreed. I also think that we really need something to investigate
problems in core, not in a contrib module people won't have installed,
especially in bigger setups. That said I plan to either submit some
improvements to pageinspect myself, or convince somebody else to do so
in the not too far away future.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 12:47:17
Message-ID:	CA+TgmoYw69a_qmXH4mjyC7-+MrHzsfiTdgcYjVp-i0146EnSnA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> Maybe what we should do is add a function something like
>> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> int, hoff int). Or perhaps some slightly more cooked version of that
>> information. And then delete the xmin, xmax, cmin, and cmax system
>> columns. That'd save significantly on pg_attribute entries while, at
>> the same time, actually providing more information than we do today.
>
> I was wondering whether we couldn't just pass pg_tuple_header() a whole
> row, instead of having the user manually pass in reloid and ctid. I
> think that should actually work in the interesting scenarios.

I wondered that, too, but it's not well-defined for all tuples. What
happens if you pass in constructed tuple rather than an on-disk tuple?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 12:51:52
Message-ID:	20131220125152.GB32720@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> Maybe what we should do is add a function something like
> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> >> int, hoff int). Or perhaps some slightly more cooked version of that
> >> information. And then delete the xmin, xmax, cmin, and cmax system
> >> columns. That'd save significantly on pg_attribute entries while, at
> >> the same time, actually providing more information than we do today.
> >
> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
> > row, instead of having the user manually pass in reloid and ctid. I
> > think that should actually work in the interesting scenarios.
>
> I wondered that, too, but it's not well-defined for all tuples. What
> happens if you pass in constructed tuple rather than an on-disk tuple?

Those should be discernible I think, t_self/t_tableOid won't be set for
generated tuples.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 12:58:46
Message-ID:	CA+TgmoaQRwZTuDM3Z_VyP0vPoU8yk=Sd_Ec9dcXetsZAjx3f7A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-20 07:47:17 -0500, Robert Haas wrote:
>> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> >> Maybe what we should do is add a function something like
>> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> >> int, hoff int). Or perhaps some slightly more cooked version of that
>> >> information. And then delete the xmin, xmax, cmin, and cmax system
>> >> columns. That'd save significantly on pg_attribute entries while, at
>> >> the same time, actually providing more information than we do today.
>> >
>> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
>> > row, instead of having the user manually pass in reloid and ctid. I
>> > think that should actually work in the interesting scenarios.
>>
>> I wondered that, too, but it's not well-defined for all tuples. What
>> happens if you pass in constructed tuple rather than an on-disk tuple?
>
> Those should be discernible I think, t_self/t_tableOid won't be set for
> generated tuples.

Hmm. That might work.

I think the immediate problem is to decide whether this patch ought to
make the xmin column display the result of GetXmin() or GetRawXmin().
Thoughts on that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 13:01:09
Message-ID:	20131220130109.GC32720@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-20 07:58:46 -0500, Robert Haas wrote:
> I think the immediate problem is to decide whether this patch ought to
> make the xmin column display the result of GetXmin() or GetRawXmin().
> Thoughts on that?

I slightly favor GetRawXmin().

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Jim Nasby <jim(at)nasby(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 13:13:13
Message-ID:	20131220131313.GQ11006@eldon.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:

+1 for this general idea. I proposed this some time ago and got shot
down because of pageinspect. I don't know about taking the system
columns out completely -- not sure how much third party code we're going
to break that way, certainly a lot -- but the extra functionality would
be useful nonetheless.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 18:41:24
Message-ID:	20131220184124.GC22570@eldon.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:
> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> Maybe what we should do is add a function something like
> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
> >> int, hoff int). Or perhaps some slightly more cooked version of that
> >> information. And then delete the xmin, xmax, cmin, and cmax system
> >> columns. That'd save significantly on pg_attribute entries while, at
> >> the same time, actually providing more information than we do today.
> >
> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
> > row, instead of having the user manually pass in reloid and ctid. I
> > think that should actually work in the interesting scenarios.
>
> I wondered that, too, but it's not well-defined for all tuples. What
> happens if you pass in constructed tuple rather than an on-disk tuple?

I assume without checking that passing reloid/ctid would allow this to
work for tuples in a RETURNING clause; and if we ever have an OLD
reference for the RETURNING clause of an UPDATE, that it would work
there, too, showing the post-update status of the updated tuple.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 19:12:12
Message-ID:	CA+TgmoaV7xCjDfCzaeuJ=m5U_k-nr_f_VYuFCH7DESgQAdabTg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Robert Haas escribió:
>> On Fri, Dec 20, 2013 at 7:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> >> Maybe what we should do is add a function something like
>> >> pg_tuple_header(tableoid, ctid) that returns a record, maybe something
>> >> like (rawxmin xid, rawxmax xid, rawcid cid, infomask int, infomask2
>> >> int, hoff int). Or perhaps some slightly more cooked version of that
>> >> information. And then delete the xmin, xmax, cmin, and cmax system
>> >> columns. That'd save significantly on pg_attribute entries while, at
>> >> the same time, actually providing more information than we do today.
>> >
>> > I was wondering whether we couldn't just pass pg_tuple_header() a whole
>> > row, instead of having the user manually pass in reloid and ctid. I
>> > think that should actually work in the interesting scenarios.
>>
>> I wondered that, too, but it's not well-defined for all tuples. What
>> happens if you pass in constructed tuple rather than an on-disk tuple?
>
> I assume without checking that passing reloid/ctid would allow this to
> work for tuples in a RETURNING clause; and if we ever have an OLD
> reference for the RETURNING clause of an UPDATE, that it would work
> there, too, showing the post-update status of the updated tuple.

I don't understand what you're saying here. Are you saying that
reloid/ctid is a better approach, a worse approach, or just a
different approach?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 19:17:19
Message-ID:	20131220191718.GD22570@eldon.alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:
> On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:

> > I assume without checking that passing reloid/ctid would allow this to
> > work for tuples in a RETURNING clause; and if we ever have an OLD
> > reference for the RETURNING clause of an UPDATE, that it would work
> > there, too, showing the post-update status of the updated tuple.
>
> I don't understand what you're saying here. Are you saying that
> reloid/ctid is a better approach, a worse approach, or just a
> different approach?

That probably wasn't worded very well. I am just saying that whatever
approach we end up with, it would be nice that it worked somehow with
RETURNING clauses.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-20 20:05:24
Message-ID:	CA+TgmoZHxHkj=GAuW_Jqop2JCMSv+0b96MzyEJ5UEVKngnqVGw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 20, 2013 at 2:17 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Robert Haas escribió:
>> On Fri, Dec 20, 2013 at 1:41 PM, Alvaro Herrera
>> <alvherre(at)2ndquadrant(dot)com> wrote:
>
>> > I assume without checking that passing reloid/ctid would allow this to
>> > work for tuples in a RETURNING clause; and if we ever have an OLD
>> > reference for the RETURNING clause of an UPDATE, that it would work
>> > there, too, showing the post-update status of the updated tuple.
>>
>> I don't understand what you're saying here. Are you saying that
>> reloid/ctid is a better approach, a worse approach, or just a
>> different approach?
>
> That probably wasn't worded very well. I am just saying that whatever
> approach we end up with, it would be nice that it worked somehow with
> RETURNING clauses.

Hmm. It's not evident to me why that particular case would be any
different than anything else, but that might be my ignorance showing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-21 02:56:42
Message-ID:	CA+TgmoZWZZw_wr8aDHWc8iEjWEiHXLPXRaNqyrfb8mw0schEhg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> I wondered that, too, but it's not well-defined for all tuples. What
>> happens if you pass in constructed tuple rather than an on-disk tuple?
>
> Those should be discernible I think, t_self/t_tableOid won't be set for
> generated tuples.

I went looking for existing precedent for code that does things like
this and found record_out, which does this:

HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
...
/* Extract type info from the tuple itself */
tupType = HeapTupleHeaderGetTypeId(rec);
tupTypmod = HeapTupleHeaderGetTypMod(rec);
tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
ncolumns = tupdesc->natts;

/* Build a temporary HeapTuple control structure */
tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
ItemPointerSetInvalid(&(tuple.t_self));
tuple.t_tableOid = InvalidOid;
tuple.t_data = rec;

This appears to be a typical pattern, although interestingly I noticed
that row_to_json() doesn't bother setting t_tableOid or t_self, which
I think it's supposed to do. The problem I see here is that this code
seems to imply that a function passed a record doesn't actually have
enough information to know what sort of a thing it's getting. The use
of HeapTupleHeaderGetTypeId and HeapTupleHeaderGetTypMod implies that
it's safe to assume that t_choice will contain DatumTupleFields rather
than HeapTupleFields, which doesn't seem to bode well for your
approach.

Am I missing a trick?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 00:57:44
Message-ID:	CA+Tgmob0hZWa1gi+s3Zho3r_mFabcwshoL8HswNCg-hw-YE7GQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 20, 2013 at 8:01 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-20 07:58:46 -0500, Robert Haas wrote:
>> I think the immediate problem is to decide whether this patch ought to
>> make the xmin column display the result of GetXmin() or GetRawXmin().
>> Thoughts on that?
>
> I slightly favor GetRawXmin().

Committed that way for now. It seems more consistent with what we do
elsewhere. We neglected to write documentation for this change, so
here's a patch for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
new-freeze-doc.patch	text/x-patch	7.3 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 12:33:32
Message-ID:	CA+TgmoaiT2UsBkbBUyqnzGOpri5arPnCrC8sqEU=DCBQQSEZBg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 20, 2013 at 9:56 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>>> I wondered that, too, but it's not well-defined for all tuples. What
>>> happens if you pass in constructed tuple rather than an on-disk tuple?
>>
>> Those should be discernible I think, t_self/t_tableOid won't be set for
>> generated tuples.
>
> I went looking for existing precedent for code that does things like
> this and found record_out, which does this:
>
> HeapTupleHeader rec = PG_GETARG_HEAPTUPLEHEADER(0);
> ...
> /* Extract type info from the tuple itself */
> tupType = HeapTupleHeaderGetTypeId(rec);
> tupTypmod = HeapTupleHeaderGetTypMod(rec);
> tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
> ncolumns = tupdesc->natts;
>
> /* Build a temporary HeapTuple control structure */
> tuple.t_len = HeapTupleHeaderGetDatumLength(rec);
> ItemPointerSetInvalid(&(tuple.t_self));
> tuple.t_tableOid = InvalidOid;
> tuple.t_data = rec;
>
> This appears to be a typical pattern, although interestingly I noticed
> that row_to_json() doesn't bother setting t_tableOid or t_self, which
> I think it's supposed to do. The problem I see here is that this code
> seems to imply that a function passed a record doesn't actually have
> enough information to know what sort of a thing it's getting. The use
> of HeapTupleHeaderGetTypeId and HeapTupleHeaderGetTypMod implies that
> it's safe to assume that t_choice will contain DatumTupleFields rather
> than HeapTupleFields, which doesn't seem to bode well for your
> approach.
>
> Am I missing a trick?

If not, here's a patch done the way I originally proposed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
pg-tuple-header.patch	application/octet-stream	4.5 KB

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 12:45:04
Message-ID:	20131223124504.GB19795@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-20 13:22:07 +0100, Andres Freund wrote:
> On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
> > I think the root of the problem is that nobody's very eager to add
> > more hidden system catalog columns because each one bloats
> > pg_attribute significantly.
>
> I think that part is actually solveable - there's no real need for them
> to be real columns, present in pg_attribute, things could easily enough be
> setup during parse analysis.

I've spent some time yesterday hacking up a prototype for this. The
amount of effort seems reasonable, although the attached patch certainly
doesn't do all the neccessary things. The regression tests pass.

In the prototype only oid keeps it's pg_attribute entry, everything else
uses the static entries via
SystemAttributeDefinition()/SystemAttributeByName(). We might want to
remove oids as well, but it seems reasonable to continue allowing
defining column permissions on it. And it's likely the attribute that's
most likely ingrained deeply in user applications.

> The bigger problem I see is that adding
> more useful columns will cause name clashes, which will probably
> prohibit improvements in that direction.

I wonder if we could solve that by naming new columns like
"system:attribute" or similar. Not pretty, but maybe better than the
alternatives.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
0001-prototype-patch-remove-system-columns-from-pg_attrib.patch	text/x-patch	20.3 KB

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 12:56:32
Message-ID:	20131223125632.GC19795@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-20 21:56:42 -0500, Robert Haas wrote:
> On Fri, Dec 20, 2013 at 7:51 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> I wondered that, too, but it's not well-defined for all tuples. What
> >> happens if you pass in constructed tuple rather than an on-disk tuple?
> >
> > Those should be discernible I think, t_self/t_tableOid won't be set for
> > generated tuples.
>
> I went looking for existing precedent for code that does things like
> this and found record_out, which does this:

I think there's plenty precedent for C code, but here the problem seems
to be that, when we pass a record that's originally a plain row, it's
coerced (via IO functions) into a record with typeid/typemod set.

So, for this to work via SQL, we'd need to add a notation that allows
passing table rows to functions without being modified. That might be a
good idea in general, but likely more work than it's work tackling in
this context.

> This appears to be a typical pattern, although interestingly I noticed
> that row_to_json() doesn't bother setting t_tableOid or t_self, which
> I think it's supposed to do.

Yes, that seems to be a minor bug. Andrew?

> Am I missing a trick?

No, don't think so, I wasn't thinking on the SQL level.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 15:26:49
Message-ID:	CA+TgmobH_q340rzbkyaSeBRDPcCAgNz+j0qH+F4tHKCB-eOP3A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 23, 2013 at 7:45 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-20 13:22:07 +0100, Andres Freund wrote:
>> On 2013-12-20 07:12:01 -0500, Robert Haas wrote:
>> > I think the root of the problem is that nobody's very eager to add
>> > more hidden system catalog columns because each one bloats
>> > pg_attribute significantly.
>>
>> I think that part is actually solveable - there's no real need for them
>> to be real columns, present in pg_attribute, things could easily enough be
>> setup during parse analysis.
>
> I've spent some time yesterday hacking up a prototype for this. The
> amount of effort seems reasonable, although the attached patch certainly
> doesn't do all the neccessary things. The regression tests pass.

Well, I'm all in favor of de-bloating pg_attribute, but I don't
particularly like cooked_xmax. Even if it were better-named
(updatexid?), I'm still not sure I'd like it very much. And I
definitely think that getting rid of the pg_attribute entries ought to
be a separate patch from adding any new system columns we want to
have.

> In the prototype only oid keeps it's pg_attribute entry, everything else
> uses the static entries via
> SystemAttributeDefinition()/SystemAttributeByName(). We might want to
> remove oids as well, but it seems reasonable to continue allowing
> defining column permissions on it. And it's likely the attribute that's
> most likely ingrained deeply in user applications.

Seems fine to me.

>> The bigger problem I see is that adding
>> more useful columns will cause name clashes, which will probably
>> prohibit improvements in that direction.
>
> I wonder if we could solve that by naming new columns like
> "system:attribute" or similar. Not pretty, but maybe better than the
> alternatives.

If we're going to add a prefix, I think it should be one that doesn't
require quoting the identifier. In retrospect pg_xmin or _pg_xmin or
__pg_xmin would have been a better name than xmin.

But TBH, I'm kind of enamored of the function-based approach at the
moment. It just seems a lot more flexible. Adding a function is
nearly free and we can add as many as we need doing whatever we want,
and they can even go into contrib modules if we like it better that
way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 15:42:04
Message-ID:	20131223154204.GC26564@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-23 10:26:49 -0500, Robert Haas wrote:
> On Mon, Dec 23, 2013 at 7:45 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I've spent some time yesterday hacking up a prototype for this. The
> > amount of effort seems reasonable, although the attached patch certainly
> > doesn't do all the neccessary things. The regression tests pass.
>
> Well, I'm all in favor of de-bloating pg_attribute, but I don't
> particularly like cooked_xmax. Even if it were better-named
> (updatexid?), I'm still not sure I'd like it very much. And I
> definitely think that getting rid of the pg_attribute entries ought to
> be a separate patch from adding any new system columns we want to
> have.

Yea, that was just a demo attribute, I am far from sure we should add
any. It was just important to me to test that we're easily able to do
so. I don't even think it's semantics are necessarily all that useful.

I think we should do this, independent from adding additional
attributes. The reduction of rows in pg_attribute is quite noticeable,
and I can't see us just dropping the old system columns.

> But TBH, I'm kind of enamored of the function-based approach at the
> moment. It just seems a lot more flexible. Adding a function is
> nearly free and we can add as many as we need doing whatever we want,
> and they can even go into contrib modules if we like it better that
> way.

Well, it really depends on the usecase. Reading
SELECT header.xmin, header.xmax
FROM sometable tbl,
pg_tuple_header(tbl.tableoid, tbl.ctid) header;
is surely harder to understand than
SELECT tbl.xmin, tbl.xmax
FROM sometable tbl;
and the latter is noticeably cheaper since we're not re-fetching the
tuple, especially when the tuple is the result of a more complicated
query including a seqscan, we might often need to re-fetch the tuple
from disk, thanks to the ringbuffer.

That really makes me less than convinced that the function based
approach is the right tool for everything.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 15:56:07
Message-ID:	CA+TgmoYokS-f-SQ-b2U095Ri0ju+hhkzCobyv-vPJVb2a=WB5Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 23, 2013 at 10:42 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> But TBH, I'm kind of enamored of the function-based approach at the
>> moment. It just seems a lot more flexible. Adding a function is
>> nearly free and we can add as many as we need doing whatever we want,
>> and they can even go into contrib modules if we like it better that
>> way.
>
> Well, it really depends on the usecase. Reading
> SELECT header.xmin, header.xmax
> FROM sometable tbl,
> pg_tuple_header(tbl.tableoid, tbl.ctid) header;
> is surely harder to understand than
> SELECT tbl.xmin, tbl.xmax
> FROM sometable tbl;
> and the latter is noticeably cheaper since we're not re-fetching the
> tuple, especially when the tuple is the result of a more complicated
> query including a seqscan, we might often need to re-fetch the tuple
> from disk, thanks to the ringbuffer.
>
> That really makes me less than convinced that the function based
> approach is the right tool for everything.

Meh. Who are all of these people who are fetching xmin, xmax, cmin,
and cmax in complex queries, and why are they doing that? If those
columns are just for forensic purposes, then whatever performance
disadvantages the function-based approach has don't really matter. If
people are using them as integral parts of their application, then
that's more of a concern, but how are those people not getting broken
every release or three anyway? We keep inventing things like
combo-cids and multi-xacts and multi-xacts that also contain an update
and now freezing without changing xmin, and if you're actually relying
on those columns for anything but forensics your application is
presumably going to break when we change whatever part it depends on.
Is anyone really doing that, and if so, why?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Kevin Grittner <kgrittn(at)ymail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 18:34:15
Message-ID:	1387823655.72626.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Meh. Who are all of these people who are fetching xmin, xmax, cmin,
> and cmax in complex queries, and why are they doing that? If those
> columns are just for forensic purposes, then whatever performance
> disadvantages the function-based approach has don't really matter. If
> people are using them as integral parts of their application, then
> that's more of a concern, but how are those people not getting broken
> every release or three anyway? We keep inventing things like
> combo-cids and multi-xacts and multi-xacts that also contain an update
> and now freezing without changing xmin, and if you're actually relying
> on those columns for anything but forensics your application is
> presumably going to break when we change whatever part it depends on.
> Is anyone really doing that, and if so, why?

I've seen an ORM use xmin for a sort of optimistic concurrency
control scheme. Let's say that you bring up an address by primary
key, and change an incorrect apartment number. At the same time,
someone else is entering a change of address, to a whole new
location where a totally different apartment number is supplied.
The developers want to prevent portions of the two addresses from
being intermingled, they don't want to hold a database transaction
open for the time the user has the window up, they want to avoid
blocking as much as possible, and they don't want to store an extra
column with a version number or timestamp just to do that -- so
they use xmin. When they go to rewrite the row they use the PK
values plus the xmin in the UPDATE statement's WHERE clause. If
the row is not found, the user gets an error message. Currently
that does mean that the users can get a spurious error message when
there is a tuple freeze while the window is open, so the current
changes are probably good news for those using this approach.

Other than that and ctid (for similar reasons) I have not seen
system columns used in applications or application frameworks.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 18:57:07
Message-ID:	20131223185707.GB31909@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-23 10:56:07 -0500, Robert Haas wrote:
> > Well, it really depends on the usecase. Reading
> > SELECT header.xmin, header.xmax
> > FROM sometable tbl,
> > pg_tuple_header(tbl.tableoid, tbl.ctid) header;
> > is surely harder to understand than
> > SELECT tbl.xmin, tbl.xmax
> > FROM sometable tbl;
> > and the latter is noticeably cheaper since we're not re-fetching the
> > tuple, especially when the tuple is the result of a more complicated
> > query including a seqscan, we might often need to re-fetch the tuple
> > from disk, thanks to the ringbuffer.
> >
> > That really makes me less than convinced that the function based
> > approach is the right tool for everything.

> Meh. Who are all of these people who are fetching xmin, xmax, cmin,
> and cmax in complex queries, and why are they doing that?

I have seen it done to avoid ABA problems in cache invalidation
mechanisms by simply checking ctid, xmin, xmax to stay the same.

> If those
> columns are just for forensic purposes, then whatever performance
> disadvantages the function-based approach has don't really matter. If
> people are using them as integral parts of their application, then
> that's more of a concern, but how are those people not getting broken
> every release or three anyway? We keep inventing things like
> combo-cids and multi-xacts and multi-xacts that also contain an update
> and now freezing without changing xmin, and if you're actually relying
> on those columns for anything but forensics your application is
> presumably going to break when we change whatever part it depends on.

Well, all of the fundamental changes (combocids, the initial multixact
introduction) have been quite some time ago. I think backward compat has
a much higher value these days (I also don't see much point in looking
at cmin/cmax for anything but diagnostic purposes). I don't think any of
the usecases I've seen would be broken by either fk-locks (multixacts
have been there before, doesn't matter much that they now contain
updates) nor by forensic freezing. The latter because they really only
checked that xmin/xmax were the same, and we hopefully haven't broken
that...

But part of my point really was the usability, not only the
performance. Requiring LATERAL queries to be usable sensibly causes a
"Meh" from my side ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-23 19:15:29
Message-ID:	CA+Tgmoam6Hf5kOnh2XFNREm_hJNnZKegoF7Od8nynZJHbTgRQA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 23, 2013 at 1:57 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Well, all of the fundamental changes (combocids, the initial multixact
> introduction) have been quite some time ago. I think backward compat has
> a much higher value these days (I also don't see much point in looking
> at cmin/cmax for anything but diagnostic purposes). I don't think any of
> the usecases I've seen would be broken by either fk-locks (multixacts
> have been there before, doesn't matter much that they now contain
> updates) nor by forensic freezing. The latter because they really only
> checked that xmin/xmax were the same, and we hopefully haven't broken
> that...
>
> But part of my point really was the usability, not only the
> performance. Requiring LATERAL queries to be usable sensibly causes a
> "Meh" from my side ;)

I simply can't imagine that we're going to want to add a system column
every time we change something, or even enough of them to cover all of
the things we've already got. We'd need at least infomask, infomask2,
and hoff, and maybe some functions for peeking through mxacts to find
the updater and locker XIDs and lock modes. With a couple of
functions we can do all that and not look back.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-24 14:22:56
Message-ID:	20131224142256.GH26564@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-23 14:15:29 -0500, Robert Haas wrote:
> On Mon, Dec 23, 2013 at 1:57 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > Well, all of the fundamental changes (combocids, the initial multixact
> > introduction) have been quite some time ago. I think backward compat has
> > a much higher value these days (I also don't see much point in looking
> > at cmin/cmax for anything but diagnostic purposes). I don't think any of
> > the usecases I've seen would be broken by either fk-locks (multixacts
> > have been there before, doesn't matter much that they now contain
> > updates) nor by forensic freezing. The latter because they really only
> > checked that xmin/xmax were the same, and we hopefully haven't broken
> > that...
> >
> > But part of my point really was the usability, not only the
> > performance. Requiring LATERAL queries to be usable sensibly causes a
> > "Meh" from my side ;)
>
> I simply can't imagine that we're going to want to add a system column
> every time we change something, or even enough of them to cover all of
> the things we've already got. We'd need at least infomask, infomask2,
> and hoff, and maybe some functions for peeking through mxacts to find
> the updater and locker XIDs and lock modes. With a couple of
> functions we can do all that and not look back.

If system columns don't have an overhead anymore, I fail to see the
advantage that functions have over simply accessing parts of the row in
the normal way parts of rows are accessed. The only reasoning I can see
is lessening the likelihood of conflicts - but that's essentially only
solved via namespaced pg_ prefixes for functions as well.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-26 23:25:41
Message-ID:	CA+TgmoZtedJr4uY_jgtbekbVbRLAGySwY_X7uXo5fyhf+rqt2Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Dec 24, 2013 at 6:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-23 14:15:29 -0500, Robert Haas wrote:
>> On Mon, Dec 23, 2013 at 1:57 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > Well, all of the fundamental changes (combocids, the initial multixact
>> > introduction) have been quite some time ago. I think backward compat has
>> > a much higher value these days (I also don't see much point in looking
>> > at cmin/cmax for anything but diagnostic purposes). I don't think any of
>> > the usecases I've seen would be broken by either fk-locks (multixacts
>> > have been there before, doesn't matter much that they now contain
>> > updates) nor by forensic freezing. The latter because they really only
>> > checked that xmin/xmax were the same, and we hopefully haven't broken
>> > that...
>> >
>> > But part of my point really was the usability, not only the
>> > performance. Requiring LATERAL queries to be usable sensibly causes a
>> > "Meh" from my side ;)
>>
>> I simply can't imagine that we're going to want to add a system column
>> every time we change something, or even enough of them to cover all of
>> the things we've already got. We'd need at least infomask, infomask2,
>> and hoff, and maybe some functions for peeking through mxacts to find
>> the updater and locker XIDs and lock modes. With a couple of
>> functions we can do all that and not look back.
>
> If system columns don't have an overhead anymore, I fail to see the
> advantage that functions have over simply accessing parts of the row in
> the normal way parts of rows are accessed. The only reasoning I can see
> is lessening the likelihood of conflicts - but that's essentially only
> solved via namespaced pg_ prefixes for functions as well.

I dunno, I just have an uneasy feeling about it. I guess if
everyone's happy to add pg_infomask and pg_infomask2 as new system
columns, we could go that route. For my part, I think that functions
are a real extensibility mechanism and hidden system columns are a
crock I'd rather not propagate. But I just work here, and I'll give
way if the consensus is otherwise.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-27 09:09:41
Message-ID:	20131227090941.GD31909@alap2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2013-12-26 15:25:41 -0800, Robert Haas wrote:
> On Tue, Dec 24, 2013 at 6:22 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > If system columns don't have an overhead anymore, I fail to see the
> > advantage that functions have over simply accessing parts of the row in
> > the normal way parts of rows are accessed. The only reasoning I can see
> > is lessening the likelihood of conflicts - but that's essentially only
> > solved via namespaced pg_ prefixes for functions as well.
>
> I dunno, I just have an uneasy feeling about it. I guess if
> everyone's happy to add pg_infomask and pg_infomask2 as new system
> columns, we could go that route. For my part, I think that functions
> are a real extensibility mechanism and hidden system columns are a
> crock I'd rather not propagate. But I just work here, and I'll give
> way if the consensus is otherwise.

I am happy enough with functions as well, so I certainly won't
fundamentally block that path after a bit more discussion. My problems
with the function approach may in parts even be fixable, making me
prefer it:
* It's slower. It refetches the tuple from s_b/disk, it builds a row
type to return, which then needs to get accessed for the individual
columns.
* By nature of being a record returning function it's awkward to use if you
want the individual columns because you essentially need to use
LATERAL or you re-execute the function for each column.
* It's awkward to use because you need to need to pass
tableoid/ctid. That's quite some notational overhead, relying on
system columns itself...
* It's imo harder to spot differenced between versions in record types than
columns.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-27 14:15:34
Message-ID:	20131227141534.GV2543@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

* Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> On 2013-12-26 15:25:41 -0800, Robert Haas wrote:
> > I dunno, I just have an uneasy feeling about it. I guess if
> > everyone's happy to add pg_infomask and pg_infomask2 as new system
> > columns, we could go that route. For my part, I think that functions
> > are a real extensibility mechanism and hidden system columns are a
> > crock I'd rather not propagate. But I just work here, and I'll give
> > way if the consensus is otherwise.
>
> I am happy enough with functions as well, so I certainly won't
> fundamentally block that path after a bit more discussion. My problems
> with the function approach may in parts even be fixable, making me
> prefer it:
> * It's slower. It refetches the tuple from s_b/disk, it builds a row
> type to return, which then needs to get accessed for the individual
> columns.
> * By nature of being a record returning function it's awkward to use if you
> want the individual columns because you essentially need to use
> LATERAL or you re-execute the function for each column.
> * It's awkward to use because you need to need to pass
> tableoid/ctid. That's quite some notational overhead, relying on
> system columns itself...
> * It's imo harder to spot differenced between versions in record types than
> columns.

For my 2c, I tend to agree w/ Andres on this one. I like the *idea* of
having a function instead, but, practically, it doesn't really work.
Perhaps if the 'function' approach was one-function-per-column and we
could remove the need for the function to take any arguments, but I'm
not sure we've really got something better than what we have now.

Hindsight being what it is, perhaps we should have stuffed the system
columns into a complex type instead of having individual columns, but
I'm not sure changing that now would be worth the backwards
compatibility break (yes, I know they're system columns, but I've seen
more than one case of using ctid to break ties in otherwise identical
rows..).

Thanks,

Stephen

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-27 22:09:09
Message-ID:	CA+Tgmobb1c9hkQiFX0KrbXNhJRP-z-kdxcPD3stQ5T6Jh2Ee5g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 27, 2013 at 6:15 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
>> On 2013-12-26 15:25:41 -0800, Robert Haas wrote:
>> > I dunno, I just have an uneasy feeling about it. I guess if
>> > everyone's happy to add pg_infomask and pg_infomask2 as new system
>> > columns, we could go that route. For my part, I think that functions
>> > are a real extensibility mechanism and hidden system columns are a
>> > crock I'd rather not propagate. But I just work here, and I'll give
>> > way if the consensus is otherwise.
>>
>> I am happy enough with functions as well, so I certainly won't
>> fundamentally block that path after a bit more discussion. My problems
>> with the function approach may in parts even be fixable, making me
>> prefer it:
>> * It's slower. It refetches the tuple from s_b/disk, it builds a row
>> type to return, which then needs to get accessed for the individual
>> columns.
>> * By nature of being a record returning function it's awkward to use if you
>> want the individual columns because you essentially need to use
>> LATERAL or you re-execute the function for each column.
>> * It's awkward to use because you need to need to pass
>> tableoid/ctid. That's quite some notational overhead, relying on
>> system columns itself...
>> * It's imo harder to spot differenced between versions in record types than
>> columns.
>
> For my 2c, I tend to agree w/ Andres on this one. I like the *idea* of
> having a function instead, but, practically, it doesn't really work.
> Perhaps if the 'function' approach was one-function-per-column and we
> could remove the need for the function to take any arguments, but I'm
> not sure we've really got something better than what we have now.

I'm not sure what you mean by "doesn't work", because it clearly does
work. I've already posted a patch. You may find it ugly, but that's
not the same as not working.

> Hindsight being what it is, perhaps we should have stuffed the system
> columns into a complex type instead of having individual columns, but
> I'm not sure changing that now would be worth the backwards
> compatibility break (yes, I know they're system columns, but I've seen
> more than one case of using ctid to break ties in otherwise identical
> rows..).

Well, if the consensus is in favor of adding more system columns,
that's not my first choice, but I'm OK with it. However, I wonder how
we plan to name them. If we add pg_infomask and pg_infomask2, it
won't be consistent with the existing naming convention which doesn't
include any kind of pg-prefix, but if we don't use such a prefix then
every column we add further pollutes the namespace.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2013-12-28 02:24:44
Message-ID:	20131228022444.GW2543@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert,

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Fri, Dec 27, 2013 at 6:15 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > For my 2c, I tend to agree w/ Andres on this one. I like the *idea* of
> > having a function instead, but, practically, it doesn't really work.
> > Perhaps if the 'function' approach was one-function-per-column and we
> > could remove the need for the function to take any arguments, but I'm
> > not sure we've really got something better than what we have now.
>
> I'm not sure what you mean by "doesn't work", because it clearly does
> work. I've already posted a patch. You may find it ugly, but that's
> not the same as not working.

I meant *practically*, it doesn't work. By which, I mean, "it sucks" as
a solution. :)

> > Hindsight being what it is, perhaps we should have stuffed the system
> > columns into a complex type instead of having individual columns, but
> > I'm not sure changing that now would be worth the backwards
> > compatibility break (yes, I know they're system columns, but I've seen
> > more than one case of using ctid to break ties in otherwise identical
> > rows..).
>
> Well, if the consensus is in favor of adding more system columns,
> that's not my first choice, but I'm OK with it. However, I wonder how
> we plan to name them. If we add pg_infomask and pg_infomask2, it
> won't be consistent with the existing naming convention which doesn't
> include any kind of pg-prefix, but if we don't use such a prefix then
> every column we add further pollutes the namespace.

Yeah, I agree that it gets a bit ugly... What would you think of doing
*both*? Keep the existing system columns for backwards compatibility,
but then also have a complex 'pg_header' type which provides all of the
existing columns, as well as infomask && infomask2 ...?

I've not looked into any of the implementation complexity, but it might
give us a path to providing *just* the 'pg_header' (or whatever color we
end up making that bike shed...) complex type with all the system
columns available under it, maybe only using that in 10.0?

Thanks,

Stephen

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 02:33:54
Message-ID:	CA+TgmoYdWOZa_UiewbqE73AQCM2etpMjukkcmYPkGatH8kPTow@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Dec 27, 2013 at 9:24 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> I'm not sure what you mean by "doesn't work", because it clearly does
>> work. I've already posted a patch. You may find it ugly, but that's
>> not the same as not working.
>
> I meant *practically*, it doesn't work. By which, I mean, "it sucks" as
> a solution. :)

I fail to see why. What is so ugly about this:

select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;

It may be true that that query wouldn't have worked before Tom added
LATERAL, but he did add LATERAL and we have it now and nobody's going
to care about this function on versions that haven't got LATERAL,
because it won't exist there anyway. The whole point of adding
features like LATERAL (which doesn't even appear in that query,
interestingly enough) is that it was really annoying to use functions
like this before we had it. But now we *do* have it, so the fact a
function like this would have been annoying to use in older releases
isn't a reason not to add it now.

I will admit that performance may be a reason. After pgbench -i:

rhaas=# explain analyze select xmax from pgbench_accounts;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on pgbench_accounts (cost=0.00..2640.00 rows=100000
width=4) (actual time=0.009..14.950 rows=100000 loops=1)
Total runtime: 20.354 ms
(2 rows)

rhaas=# explain analyze select (pg_tuple_header(tableoid,ctid)).xmax
from pgbench_accounts;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on pgbench_accounts (cost=0.00..2890.00 rows=100000
width=10) (actual time=0.023..314.783 rows=100000 loops=1)
Total runtime: 322.714 ms
(2 rows)

>> > Hindsight being what it is, perhaps we should have stuffed the system
>> > columns into a complex type instead of having individual columns, but
>> > I'm not sure changing that now would be worth the backwards
>> > compatibility break (yes, I know they're system columns, but I've seen
>> > more than one case of using ctid to break ties in otherwise identical
>> > rows..).
>>
>> Well, if the consensus is in favor of adding more system columns,
>> that's not my first choice, but I'm OK with it. However, I wonder how
>> we plan to name them. If we add pg_infomask and pg_infomask2, it
>> won't be consistent with the existing naming convention which doesn't
>> include any kind of pg-prefix, but if we don't use such a prefix then
>> every column we add further pollutes the namespace.
>
> Yeah, I agree that it gets a bit ugly... What would you think of doing
> *both*? Keep the existing system columns for backwards compatibility,
> but then also have a complex 'pg_header' type which provides all of the
> existing columns, as well as infomask && infomask2 ...?

I don't really see what that accomplishes, TBH. If we further modify
the header format in the future, we'll need to modify the definition
of that type, and that will be a backward-compatibility break for
anyone using it. Adding new system columns is probably less
intrusive.

Maybe it's best to just add pg_infomask and pg_infomask2 as system
columns and not worry about the inconsistency with the naming
convention for existing columns.

Other opinions?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 05:26:26
Message-ID:	CAM-w4HO47osezyR=x0hGabhExp1T0z0VfDEmMq4d4Vk2nvGUcA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> I fail to see why. What is so ugly about this:

> select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;

Two points:

1) it's a bit weird to go to this effort to eliminate system columns by
using a scheme that depends on having a system column -- ctid

2) refetching a row could conceivably end up retrieving different data than
was present when the row was originally read. (In some cases that might
actually be the intended behaviour)

If this came up earlier I'm sorry but I suppose it's too hard to have a
function like foo(tab.*) which magically can tell that the record is a heap
tuple and look in the header? And presumably throw an error if passed a non
heap tuple.

--
greg
On 1 Jan 2014 21:34, "Robert Haas" <robertmhaas(at)gmail(dot)com> wrote:

> On Fri, Dec 27, 2013 at 9:24 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> >> I'm not sure what you mean by "doesn't work", because it clearly does
> >> work. I've already posted a patch. You may find it ugly, but that's
> >> not the same as not working.
> >
> > I meant *practically*, it doesn't work. By which, I mean, "it sucks" as
> > a solution. :)
>
> I fail to see why. What is so ugly about this:
>
> select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;
>
> It may be true that that query wouldn't have worked before Tom added
> LATERAL, but he did add LATERAL and we have it now and nobody's going
> to care about this function on versions that haven't got LATERAL,
> because it won't exist there anyway. The whole point of adding
> features like LATERAL (which doesn't even appear in that query,
> interestingly enough) is that it was really annoying to use functions
> like this before we had it. But now we *do* have it, so the fact a
> function like this would have been annoying to use in older releases
> isn't a reason not to add it now.
>
> I will admit that performance may be a reason. After pgbench -i:
>
> rhaas=# explain analyze select xmax from pgbench_accounts;
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------
> Seq Scan on pgbench_accounts (cost=0.00..2640.00 rows=100000
> width=4) (actual time=0.009..14.950 rows=100000 loops=1)
> Total runtime: 20.354 ms
> (2 rows)
>
> rhaas=# explain analyze select (pg_tuple_header(tableoid,ctid)).xmax
> from pgbench_accounts;
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------
> Seq Scan on pgbench_accounts (cost=0.00..2890.00 rows=100000
> width=10) (actual time=0.023..314.783 rows=100000 loops=1)
> Total runtime: 322.714 ms
> (2 rows)
>
> >> > Hindsight being what it is, perhaps we should have stuffed the system
> >> > columns into a complex type instead of having individual columns, but
> >> > I'm not sure changing that now would be worth the backwards
> >> > compatibility break (yes, I know they're system columns, but I've seen
> >> > more than one case of using ctid to break ties in otherwise identical
> >> > rows..).
> >>
> >> Well, if the consensus is in favor of adding more system columns,
> >> that's not my first choice, but I'm OK with it. However, I wonder how
> >> we plan to name them. If we add pg_infomask and pg_infomask2, it
> >> won't be consistent with the existing naming convention which doesn't
> >> include any kind of pg-prefix, but if we don't use such a prefix then
> >> every column we add further pollutes the namespace.
> >
> > Yeah, I agree that it gets a bit ugly... What would you think of doing
> > *both*? Keep the existing system columns for backwards compatibility,
> > but then also have a complex 'pg_header' type which provides all of the
> > existing columns, as well as infomask && infomask2 ...?
>
> I don't really see what that accomplishes, TBH. If we further modify
> the header format in the future, we'll need to modify the definition
> of that type, and that will be a backward-compatibility break for
> anyone using it. Adding new system columns is probably less
> intrusive.
>
> Maybe it's best to just add pg_infomask and pg_infomask2 as system
> columns and not worry about the inconsistency with the naming
> convention for existing columns.
>
> Other opinions?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 08:48:26
Message-ID:	20140102084826.GA2683@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2014-01-02 05:26:26 +0000, Greg Stark wrote:
> 2) refetching a row could conceivably end up retrieving different data than
> was present when the row was originally read. (In some cases that might
> actually be the intended behaviour)

That's possible with system columns as well. In the normal cases we'll
only have copied the HeapTuple, not the HeapTupleHeader, so it will be
re-fetched from the (pinned) buffer.

> If this came up earlier I'm sorry but I suppose it's too hard to have a
> function like foo(tab.*) which magically can tell that the record is a heap
> tuple and look in the header? And presumably throw an error if passed a non
> heap tuple.

I don't see how that could be a good API. What happens if you have two
relations in a query?
Even if that wouldn't be a query, why would this be a helpful? Seems
like a poor reinvention of system columns.

Andres

PS: Could you not always include the full quoted message below your --
signature?

Greetings,

Andres Freund

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Greg Stark <stark(at)mit(dot)edu>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 12:35:11
Message-ID:	CA+TgmoYt-8YV4ucafntVu0BTAaR8fw-YWko79c+fCJ1=BM+vVA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> I fail to see why. What is so ugly about this:
>
>> select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;
>
> Two points:
>
> 1) it's a bit weird to go to this effort to eliminate system columns by
> using a scheme that depends on having a system column -- ctid
>
> 2) refetching a row could conceivably end up retrieving different data than
> was present when the row was originally read. (In some cases that might
> actually be the intended behaviour)
>
> If this came up earlier I'm sorry but I suppose it's too hard to have a
> function like foo(tab.*) which magically can tell that the record is a heap
> tuple and look in the header? And presumably throw an error if passed a non
> heap tuple.

Yeah, that was tried. Doesn't work:

http://www.postgresql.org/message-id/CA+TgmoZWZZw_wr8aDHWc8iEjWEiHXLPXRaNqyrfb8mw0schEhg@mail.gmail.com

At any rate, my goal isn't really to get rid of system columns, but to
address Jim Nasby's gripe that the changes thus far committed make it
difficult to use system columns to determine whether a given tuple has
been frozen. It sounds like the consensus is that a system column is
a better way of doing that than a function, so I suppose the next step
is to try to hammer out the details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 12:40:18
Message-ID:	20140102124018.GA842@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-01-02 07:35:11 -0500, Robert Haas wrote:
> On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> >> I fail to see why. What is so ugly about this:
> >
> >> select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;
> >
> > Two points:
> >
> > 1) it's a bit weird to go to this effort to eliminate system columns by
> > using a scheme that depends on having a system column -- ctid
> >
> > 2) refetching a row could conceivably end up retrieving different data than
> > was present when the row was originally read. (In some cases that might
> > actually be the intended behaviour)
> >
> > If this came up earlier I'm sorry but I suppose it's too hard to have a
> > function like foo(tab.*) which magically can tell that the record is a heap
> > tuple and look in the header? And presumably throw an error if passed a non
> > heap tuple.
>
> Yeah, that was tried. Doesn't work:
>
> http://www.postgresql.org/message-id/CA+TgmoZWZZw_wr8aDHWc8iEjWEiHXLPXRaNqyrfb8mw0schEhg@mail.gmail.com
>
> At any rate, my goal isn't really to get rid of system columns, but to
> address Jim Nasby's gripe that the changes thus far committed make it
> difficult to use system columns to determine whether a given tuple has
> been frozen. It sounds like the consensus is that a system column is
> a better way of doing that than a function, so I suppose the next step
> is to try to hammer out the details.

So, I think something like my POC patch would need to get in before we
can go there, right? I won't be able to work on it in the next 10days or
so, there's a bunch of stuff that's more pressing unfortunately. There's
a fair bit of work left there...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 12:59:18
Message-ID:	CA+Tgmoak8DaHOzfq87mVeLRh=ib+M1P0ONAZGNB2ETGTQBAroQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 2, 2014 at 7:40 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-01-02 07:35:11 -0500, Robert Haas wrote:
>> On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> >> I fail to see why. What is so ugly about this:
>> >
>> >> select x.* from pgbench_accounts a, pg_tuple_header(a.tableoid, a.ctid) x;
>> >
>> > Two points:
>> >
>> > 1) it's a bit weird to go to this effort to eliminate system columns by
>> > using a scheme that depends on having a system column -- ctid
>> >
>> > 2) refetching a row could conceivably end up retrieving different data than
>> > was present when the row was originally read. (In some cases that might
>> > actually be the intended behaviour)
>> >
>> > If this came up earlier I'm sorry but I suppose it's too hard to have a
>> > function like foo(tab.*) which magically can tell that the record is a heap
>> > tuple and look in the header? And presumably throw an error if passed a non
>> > heap tuple.
>>
>> Yeah, that was tried. Doesn't work:
>>
>> http://www.postgresql.org/message-id/CA+TgmoZWZZw_wr8aDHWc8iEjWEiHXLPXRaNqyrfb8mw0schEhg@mail.gmail.com
>>
>> At any rate, my goal isn't really to get rid of system columns, but to
>> address Jim Nasby's gripe that the changes thus far committed make it
>> difficult to use system columns to determine whether a given tuple has
>> been frozen. It sounds like the consensus is that a system column is
>> a better way of doing that than a function, so I suppose the next step
>> is to try to hammer out the details.
>
> So, I think something like my POC patch would need to get in before we
> can go there, right? I won't be able to work on it in the next 10days or
> so, there's a bunch of stuff that's more pressing unfortunately. There's
> a fair bit of work left there...

Well, that's the question, I guess. I think the options are:

1. Add pg_infomask now. Leave your patch for a future optimization.
Suck up the catalog bloat.
2. Wait for your patch to be done. Then add pg_infomask afterwards.
Accept that it may not make 9.4, or else accept that we may have to
accept that patch after the nominal deadline.
3. Decide that exposing the frozen status of tuples is not a priority
and just don't worry about it.

One concern I have is that if we spend too much time now worrying
about these system column problems, it may reduce the amount of
optimization that gets done on top of this optimization in the 9.4
time frame. And that, I think, would be sad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 14:40:54
Message-ID:	27592.1388673654@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>> 1) it's a bit weird to go to this effort to eliminate system columns by
>> using a scheme that depends on having a system column -- ctid

> At any rate, my goal isn't really to get rid of system columns, but to
> address Jim Nasby's gripe that the changes thus far committed make it
> difficult to use system columns to determine whether a given tuple has
> been frozen. It sounds like the consensus is that a system column is
> a better way of doing that than a function, so I suppose the next step
> is to try to hammer out the details.

Actually, I thought the function approach was a good proposal. You are
right that func(tab.*) isn't going to work, because it's going to get a
Datum-ified tuple not a pointer to raw on-disk storage. But an inspection
function that's handed a ctid could work.

I don't think that Greg's objection above has much force. The fact of the
matter is that we are never going to be able to get rid of the exposed
tableoid, oid, or ctid columns, because there are too many situations
where those are useful, and are being used, in actual client-application
queries. This is less true however of xmin/xmax/cmin/cmax; we might hope
to deprecate those at the SQL level someday, especially in view of the
fact that we keep whacking around their detailed meanings, with few user
complaints.

And I really really *don't* want to see us bloating pg_attribute, nor
infringing further on application column namespace, by adding even more
exposed columns. So -1 for just blindly doing that.

regards, tom lane

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 14:49:27
Message-ID:	20140102144927.GB842@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Thu, Jan 2, 2014 at 12:26 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> >> 1) it's a bit weird to go to this effort to eliminate system columns by
> >> using a scheme that depends on having a system column -- ctid
>
> > At any rate, my goal isn't really to get rid of system columns, but to
> > address Jim Nasby's gripe that the changes thus far committed make it
> > difficult to use system columns to determine whether a given tuple has
> > been frozen. It sounds like the consensus is that a system column is
> > a better way of doing that than a function, so I suppose the next step
> > is to try to hammer out the details.
>
> Actually, I thought the function approach was a good proposal. You are
> right that func(tab.*) isn't going to work, because it's going to get a
> Datum-ified tuple not a pointer to raw on-disk storage. But an inspection
> function that's handed a ctid could work.

Well, we discussed that upthread, and the overhead of going through a
function is quite noticeable because the tuple needs to be fetched from
the heap again.
Check http://archives.postgresql.org/message-id/CA%2BTgmoYdWOZa_UiewbqE73AQCM2etpMjukkcmYPkGatH8kPTow%40mail.gmail.com

To get rid of part of the performance problem, we could possibly invent
a way to pass a HeapTuple instead of a blessed HeapTupleHeader to
functions, but that might not be so easy.

> And I really really *don't* want to see us bloating pg_attribute, nor
> infringing further on application column namespace, by adding even more
> exposed columns. So -1 for just blindly doing that.

Upthread there's a POC patch of mine, that started to explore what's
necessary to simply never store system columns (except maybe oid) in
pg_attribute. While it passes the regression tests it's not complete,
but the amount of work looks reasonable. Check
http://archives.postgresql.org/message-id/20131223124504.GB19795%40alap2.anarazel.de

Now, that obviously doesn't get rid of the namespace problem. I'd
suggested a system: prefix, Robert _pg_. Neither is pretty, but imo
should work.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 17:46:34
Message-ID:	32543.1388684794@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
>> Actually, I thought the function approach was a good proposal. You are
>> right that func(tab.*) isn't going to work, because it's going to get a
>> Datum-ified tuple not a pointer to raw on-disk storage. But an inspection
>> function that's handed a ctid could work.

> Well, we discussed that upthread, and the overhead of going through a
> function is quite noticeable because the tuple needs to be fetched from
> the heap again.

Yeah, I read those results, but that seems like it could probably be
optimized. I'm guessing the function was doing a new heap_open every
time? There's probably a way around that.

In any case, upon further reflection I'm not convinced that doing this
with a SELECT-based query is the right thing, no matter whether the query
looks at a function or a system column; because by definition, you'll only
be able to see tuples that are visible to your current snapshot. For real
forensics work, you need to be able to see all tuples, which makes me
think that something akin to pgstattuple is the right API; that is "return
a set of the header info for all tuples on such-and-such pages of this
relation". That should dodge any performance problem, because the
heap_open overhead could be amortized across lots of tuples, and it also
sidesteps all problems with adding new system columns.

> Upthread there's a POC patch of mine, that started to explore what's
> necessary to simply never store system columns (except maybe oid) in
> pg_attribute. While it passes the regression tests it's not complete,
> but the amount of work looks reasonable.

I think this will inevitably break a lot of code, not all of it ours,
so I'm not in favor of pursuing that direction.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 18:53:23
Message-ID:	CA+TgmoZVk6DF6t_6H_VTKarppX29QsXBCtYvph_Mu-ZMQ4Cdrw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 2, 2014 at 12:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
>>> Actually, I thought the function approach was a good proposal. You are
>>> right that func(tab.*) isn't going to work, because it's going to get a
>>> Datum-ified tuple not a pointer to raw on-disk storage. But an inspection
>>> function that's handed a ctid could work.
>
>> Well, we discussed that upthread, and the overhead of going through a
>> function is quite noticeable because the tuple needs to be fetched from
>> the heap again.
>
> Yeah, I read those results, but that seems like it could probably be
> optimized. I'm guessing the function was doing a new heap_open every
> time? There's probably a way around that.

Yeah, it was. I don't see any plausible way around that, but I'm all ears.

> In any case, upon further reflection I'm not convinced that doing this
> with a SELECT-based query is the right thing, no matter whether the query
> looks at a function or a system column; because by definition, you'll only
> be able to see tuples that are visible to your current snapshot. For real
> forensics work, you need to be able to see all tuples, which makes me
> think that something akin to pgstattuple is the right API; that is "return
> a set of the header info for all tuples on such-and-such pages of this
> relation". That should dodge any performance problem, because the
> heap_open overhead could be amortized across lots of tuples, and it also
> sidesteps all problems with adding new system columns.

I both agree and disagree with this. I think that pgstattuple is
useful, and I further agree that adding a stat to it about how much of
the heap is frozen would be worthwhile. However, an aggregate number
isn't always what you want, and being able to scrutinize specific
tuples is, I think, a useful thing. I'm not sure that it needs to be
fast, which is why I think a function rather than a system column
might be OK, but I think it ought to be possible.

>> Upthread there's a POC patch of mine, that started to explore what's
>> necessary to simply never store system columns (except maybe oid) in
>> pg_attribute. While it passes the regression tests it's not complete,
>> but the amount of work looks reasonable.
>
> I think this will inevitably break a lot of code, not all of it ours,
> so I'm not in favor of pursuing that direction.

I thought that that approach had been discussed previously and found
desirable on the grounds that it would cut down on the size of
pg_attribute, but it's not something I want to rush through when we
have a release to get out the door.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 19:03:32
Message-ID:	3433.1388689412@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jan 2, 2014 at 12:46 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In any case, upon further reflection I'm not convinced that doing this
>> with a SELECT-based query is the right thing, no matter whether the query
>> looks at a function or a system column; because by definition, you'll only
>> be able to see tuples that are visible to your current snapshot. For real
>> forensics work, you need to be able to see all tuples, which makes me
>> think that something akin to pgstattuple is the right API; that is "return
>> a set of the header info for all tuples on such-and-such pages of this
>> relation". That should dodge any performance problem, because the
>> heap_open overhead could be amortized across lots of tuples, and it also
>> sidesteps all problems with adding new system columns.

> I both agree and disagree with this. I think that pgstattuple is
> useful, and I further agree that adding a stat to it about how much of
> the heap is frozen would be worthwhile. However, an aggregate number
> isn't always what you want, and being able to scrutinize specific
> tuples is, I think, a useful thing.

Oh, I guess I referenced the wrong contrib module, because what I was
trying to propose is a function that returns a setof record, one row for
each physical tuple it finds. There are some functions in
contrib/pageinspect that work like that, but not pgstattuple. I don't
think pageinspect's API is ideal because it's tightly tied to processing
one page at a time, but it does show information a bit like what we need
here.

regards, tom lane

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 19:27:18
Message-ID:	20140102192718.GE842@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-01-02 12:46:34 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-01-02 09:40:54 -0500, Tom Lane wrote:
> >> Actually, I thought the function approach was a good proposal. You are
> >> right that func(tab.*) isn't going to work, because it's going to get a
> >> Datum-ified tuple not a pointer to raw on-disk storage. But an inspection
> >> function that's handed a ctid could work.
>
> > Well, we discussed that upthread, and the overhead of going through a
> > function is quite noticeable because the tuple needs to be fetched from
> > the heap again.
>
> Yeah, I read those results, but that seems like it could probably be
> optimized. I'm guessing the function was doing a new heap_open every
> time? There's probably a way around that.

How? Everything I can think of is either fragile or too ugly.

I think it really depends - quite often you really want to just
investigate tuples you actually can see, because you can easily write
normal queries and such. It sure wouldn't replace pageinspect.

> For real
> forensics work, you need to be able to see all tuples, which makes me
> think that something akin to pgstattuple is the right API; that is "return
> a set of the header info for all tuples on such-and-such pages of this
> relation". That should dodge any performance problem, because the
> heap_open overhead could be amortized across lots of tuples, and it also
> sidesteps all problems with adding new system columns.

The biggest problem with such an API is that it's painful to use - I've
used pageinspect a fair bit, and not being able to easily get the
content of the rows you're looking at makes it really far less useful in
many scenarios. That could partially be improved by a neater API
(returning t_self would help greatly, accepting a page without a wrapper
function as well), but it still pretty fundamentally sucks.

And I really don't see any page-at-a-time access that's going to be
convenient. A big step would be returning the tuple's contents in a
normal fashion, but that's not easy without big notational overhead.

> > Upthread there's a POC patch of mine, that started to explore what's
> > necessary to simply never store system columns (except maybe oid) in
> > pg_attribute. While it passes the regression tests it's not complete,
> > but the amount of work looks reasonable.
>
> I think this will inevitably break a lot of code, not all of it ours,
> so I'm not in favor of pursuing that direction.

Are you thinking of client or extension code? From what I've looked at I
don't think it's all that likely too break much of either. Extension
code isn't likely to do its own catalog lookups and thus doesn't really
care. Most client code isn't interested in system catalog attribute
numbers - the few examples of client code looking at pg_attribute I
remember all explicitly excluded attnum < 0.

It would make pg_attribute a fair bit smaller, especially on systems
with lots of narrow relations. There's also the possibility of only
going that route for new system catalog attributes, and leave the old
ones where they are and removed them in a release or two.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 19:38:45
Message-ID:	CA+TgmobVCUwRC4E_9eF+gmnKWsGf80qa58=LcVw=1=oB5P-3zQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 2, 2014 at 2:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I both agree and disagree with this. I think that pgstattuple is
>> useful, and I further agree that adding a stat to it about how much of
>> the heap is frozen would be worthwhile. However, an aggregate number
>> isn't always what you want, and being able to scrutinize specific
>> tuples is, I think, a useful thing.
>
> Oh, I guess I referenced the wrong contrib module, because what I was
> trying to propose is a function that returns a setof record, one row for
> each physical tuple it finds. There are some functions in
> contrib/pageinspect that work like that, but not pgstattuple. I don't
> think pageinspect's API is ideal because it's tightly tied to processing
> one page at a time, but it does show information a bit like what we need
> here.

Sure, Álvaro mentioned the same thing upthread. Also, if you notice,
the function I was proposing to add does basically the same thing as
pageinsect, except one tuple at a time rather than one page at a time.
I think both are useful, though. pageinspect is superuser-only, and
needing work by pages isn't always convenient. I thought about making
the pg_tuple_header() function I proposed scan using SnapshotAny
rather than the active snapshot, but then I think it'd need to be
superuser-only. I also thought about making it use SnapshotAny for
superusers and the active snapshot for everybody else, but that seemed
like it could be confusing.

We could certainly add a function that returns SETOF record, taking
e.g. regclass as an argument, but it doesn't seem a stretch to me to
think that you might want to get tuple header information for some but
not all tuples in the relation, and I don't see any real good way to
tell the function exactly what tuples you want except by invoking it
once per TID.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)2ndquadrant(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 19:44:34
Message-ID:	14138.1388691874@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-02 12:46:34 -0500, Tom Lane wrote:
>> For real
>> forensics work, you need to be able to see all tuples, which makes me
>> think that something akin to pgstattuple is the right API; that is "return
>> a set of the header info for all tuples on such-and-such pages of this
>> relation". That should dodge any performance problem, because the
>> heap_open overhead could be amortized across lots of tuples, and it also
>> sidesteps all problems with adding new system columns.

> The biggest problem with such an API is that it's painful to use - I've
> used pageinspect a fair bit, and not being able to easily get the
> content of the rows you're looking at makes it really far less useful in
> many scenarios. That could partially be improved by a neater API

Surely. Why couldn't you join against the table on ctid?

> And I really don't see any page-at-a-time access that's going to be
> convenient.

As I commented to Robert, the page-at-a-time behavior of pageinspect
is not an API detail we'd want to copy for this. I envision something
like

select hdr.*, foo.*
from tuple_header_details('foo'::regclass) as hdr
left join foo on hdr.ctid = foo.ctid;

On a large table you might want a version that restricts its scan
to pages M through N, but that's just optimization. More useful
would be to improve the planner's intelligence about joins on ctid ...

>>> [ removing system columns from pg_attribute ]]
>> I think this will inevitably break a lot of code, not all of it ours,
>> so I'm not in favor of pursuing that direction.

> Are you thinking of client or extension code? From what I've looked at I
> don't think it's all that likely too break much of either.

It will break anything that assumes that every column is represented in
pg_attribute. I think if you think this assumption is easily removed,
you've not looked hard enough.

> It would make pg_attribute a fair bit smaller, especially on systems
> with lots of narrow relations.

I'd like to do that too, but I think getting rid of xmin/xmax/cmin/cmax
would be enough to get most of the benefit, and we could do that without
any inconsistency if we stopped exposing those as system columns.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 19:48:41
Message-ID:	CA+TgmoavG+NqOR2bi_zgfV0BtW0w3=eBOdFwR=xTwjr1g3xRPg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 2, 2014 at 2:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> As I commented to Robert, the page-at-a-time behavior of pageinspect
> is not an API detail we'd want to copy for this. I envision something
> like
>
> select hdr.*, foo.*
> from tuple_header_details('foo'::regclass) as hdr
> left join foo on hdr.ctid = foo.ctid;
>
> On a large table you might want a version that restricts its scan
> to pages M through N, but that's just optimization. More useful
> would be to improve the planner's intelligence about joins on ctid ...

/me blinks.

Surely you're not serious. That's going to scan the whole darn table
even if we only want the details for one row. And making the planner
smarter about joins on CTID doesn't help a bit, unless you can push
the join qual down *inside the tuple_header_details() function call*.
That'd be pretty a pretty sweet thing to allow for SRFs in general,
but it doesn't sound like something we're going to conjure up at the
drop of a hat.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 19:52:33
Message-ID:	14350.1388692353@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> We could certainly add a function that returns SETOF record, taking
> e.g. regclass as an argument, but it doesn't seem a stretch to me to
> think that you might want to get tuple header information for some but
> not all tuples in the relation, and I don't see any real good way to
> tell the function exactly what tuples you want except by invoking it
> once per TID.

I have no objection to having a function that retrieves the details for
a given TID alongside one that does it for a whole relation. The point
here is just that we should be headed in the direction of removing as
many system columns as we can, not adding more; especially not ones that
(a) have no purpose except forensics and (b) are virtually certain to
change across system versions.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 19:56:16
Message-ID:	14451.1388692576@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jan 2, 2014 at 2:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> As I commented to Robert, the page-at-a-time behavior of pageinspect
>> is not an API detail we'd want to copy for this. I envision something
>> like
>>
>> select hdr.*, foo.*
>> from tuple_header_details('foo'::regclass) as hdr
>> left join foo on hdr.ctid = foo.ctid;
>>
>> On a large table you might want a version that restricts its scan
>> to pages M through N, but that's just optimization. More useful
>> would be to improve the planner's intelligence about joins on ctid ...

> /me blinks.

> Surely you're not serious. That's going to scan the whole darn table
> even if we only want the details for one row.

And? The complaint about the other function was that it was inefficient
when getting the details for a whole table, so I don't think you can
complain about an approach that handles that case well. Use the other
function if you just want info for one row (that you can see).

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 20:02:56
Message-ID:	CA+TgmoagVZPF2ZXaK6XByg8MH3Y4dbzBF0d4e05arSahYF_9dA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jan 2, 2014 at 2:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jan 2, 2014 at 2:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> As I commented to Robert, the page-at-a-time behavior of pageinspect
>>> is not an API detail we'd want to copy for this. I envision something
>>> like
>>>
>>> select hdr.*, foo.*
>>> from tuple_header_details('foo'::regclass) as hdr
>>> left join foo on hdr.ctid = foo.ctid;
>>>
>>> On a large table you might want a version that restricts its scan
>>> to pages M through N, but that's just optimization. More useful
>>> would be to improve the planner's intelligence about joins on ctid ...
>
>> /me blinks.
>
>> Surely you're not serious. That's going to scan the whole darn table
>> even if we only want the details for one row.
>
> And? The complaint about the other function was that it was inefficient
> when getting the details for a whole table, so I don't think you can
> complain about an approach that handles that case well. Use the other
> function if you just want info for one row (that you can see).

Well, that's fair enough. I don't mind having two functions. Should
the whole-table function also include invisible tuples?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 20:06:34
Message-ID:	14780.1388693194@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, that's fair enough. I don't mind having two functions. Should
> the whole-table function also include invisible tuples?

Certainly, that's exactly why I was proposing it. You can do a join
if you want to suppress them.

regards, tom lane

From:	Andres Freund <andres(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Jim Nasby <jim(at)nasby(dot)net>
Subject:	Re: preserving forensic information when we freeze
Date:	2014-01-02 20:09:52
Message-ID:	20140102200952.GB22022@awork2.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2014-01-02 14:44:34 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-01-02 12:46:34 -0500, Tom Lane wrote:
> >> For real
> >> forensics work, you need to be able to see all tuples, which makes me
> >> think that something akin to pgstattuple is the right API; that is "return
> >> a set of the header info for all tuples on such-and-such pages of this
> >> relation". That should dodge any performance problem, because the
> >> heap_open overhead could be amortized across lots of tuples, and it also
> >> sidesteps all problems with adding new system columns.
>
> > The biggest problem with such an API is that it's painful to use - I've
> > used pageinspect a fair bit, and not being able to easily get the
> > content of the rows you're looking at makes it really far less useful in
> > many scenarios. That could partially be improved by a neater API
>
> Surely. Why couldn't you join against the table on ctid?

For the case of pageinspect it's because pageinspect doesn't return the
ctid of a tuple in a useful way - its t_ctid is HeapTupleHeader->t_ctid,
not HeapTuple->t_self...
In many cases bulk access really isn't all that useful - you do a SELECT
searching for data that's looking strange and then need the forensic
data for those. That's just painful with any of the proposed fast APIs
afaics.

> > And I really don't see any page-at-a-time access that's going to be
> > convenient.
>
> As I commented to Robert, the page-at-a-time behavior of pageinspect
> is not an API detail we'd want to copy for this. I envision something
> like
>
> select hdr.*, foo.*
> from tuple_header_details('foo'::regclass) as hdr
> left join foo on hdr.ctid = foo.ctid;
>
> On a large table you might want a version that restricts its scan
> to pages M through N, but that's just optimization. More useful
> would be to improve the planner's intelligence about joins on ctid ...

That really makes for but ugly queries. E.g. the database I found the
multixact bugs on was ~300GB and I had to look about 80GB of it. So I
would have had to write chunking code for individual tables. Not what
you want to do when shit has hit the fan.

> >>> [ removing system columns from pg_attribute ]]
> >> I think this will inevitably break a lot of code, not all of it ours,
> >> so I'm not in favor of pursuing that direction.
>
> > Are you thinking of client or extension code? From what I've looked at I
> > don't think it's all that likely too break much of either.
>
> It will break anything that assumes that every column is represented in
> pg_attribute. I think if you think this assumption is easily removed,
> you've not looked hard enough.

Uh. And how much code actually is that? Note that system columns already
aren't in a Relation's TupleDesc. So it's not they would be missing from
that - and if that were the issue we could easily add them there when
the cache entry is built.

There really isn't much code in postgres itself that iterates over all
columns including system columns. Some bits around heap.c, tablecmd.c,
lsyscache.c do lookups by name, but they are easily converted to
SystemAttributeByName()/SystemAttributeDefinition().

Most non DDL code doesn't care at all.

> > It would make pg_attribute a fair bit smaller, especially on systems
> > with lots of narrow relations.
>
> I'd like to do that too, but I think getting rid of xmin/xmax/cmin/cmax
> would be enough to get most of the benefit, and we could do that without
> any inconsistency if we stopped exposing those as system columns.

Well, I have yet to see any realistic proposal to get rid of
them. Having to write significantly more complex and/or significantly more
expensive queries doesn't qualify.

And there really is code out there using xmin/xmax as part of their
code, so I think the rumor of nobody crying about their near death is
just that, a rumor.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services