Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

From: Jameison Martin <jameisonb(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap
Date: 2012-08-09 15:56:14
Message-ID: 1344527774.12166.YahooMailNeo@web39404.mail.mud.yahoo.com
Lists: pgsql-hackers

Simon, Tom is correct: the patch doesn't change the existing row format contract or the format of the null bitmap. The change only affects how new rows are written out, and it uses the same supported format that has always been there (which is why alter table add col null works the way it does). It also keeps to the same MAXALIGN boundaries that are there today.
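
To make the "same format, only new rows differ" point concrete, here is a minimal sketch in Python rather than the actual C code. It is a toy model, not the patch: `form_row` and `read_attr` are hypothetical helpers, a row is modeled as a tuple of (stored attribute count, null bitmap, non-null values), and a set bit in the bitmap means the attribute is present. The reader treats any attribute past the stored count as null, so rows written with and without truncation read back identically.

```python
def form_row(values, truncate_trailing_nulls=True):
    """Toy model of row formation (hypothetical helper, not PostgreSQL code).

    Returns (natts, null_bitmap, data), where natts is the number of
    attributes physically stored in the row.
    """
    natts = len(values)
    if truncate_trailing_nulls:
        # Drop trailing nulls: they are simply not stored at all.
        while natts > 0 and values[natts - 1] is None:
            natts -= 1
    stored = values[:natts]
    # A bitmap is only needed if a stored attribute is null.
    has_nulls = any(v is None for v in stored)
    bitmap = bytearray((natts + 7) // 8) if has_nulls else bytearray()
    data = []
    for i, v in enumerate(stored):
        if v is not None:
            if has_nulls:
                bitmap[i >> 3] |= 1 << (i & 7)  # bit set = value present
            data.append(v)
    return natts, bytes(bitmap), data


def read_attr(row, attnum):
    """Read attribute attnum (0-based) from a row built by form_row."""
    natts, bitmap, data = row

    def present(i):
        return not bitmap or (bitmap[i >> 3] >> (i & 7)) & 1

    if attnum >= natts:
        return None  # not stored in this row: reads as null
    if not present(attnum):
        return None
    # Index into data = number of non-null attributes before attnum.
    return data[sum(1 for i in range(attnum) if present(i))]


values = [1, None, 3] + [None] * 20
full = form_row(values, truncate_trailing_nulls=False)
trimmed = form_row(values)
print(len(full[1]), len(trimmed[1]))  # bitmap bytes before and after trimming
```

In this model the untrimmed row carries a 3-byte bitmap and the trimmed row a 1-byte bitmap, yet `read_attr` returns the same value for every column of either row.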

One could argue that different row formats could make sense in different circumstances, and I'm certainly open to that kind of discussion, but this change is far more modest and perhaps can be made on its own, since it doesn't perturb the code base much, improves performance (marginally), and reduces the size of rows with lots of trailing nulls.

[separate topic: pluggable heap manager]
I'm quite interested in pursuing more aggressive compression strategies, and I'd like to do so in the context of the heap manager. I'm exploring having a pluggable heap manager implementation and would be interested in feedback on that as a general approach. My thinking is that I'd like PostgreSQL to be able to support multiple heap implementations along the lines of how multiple index types are supported, though probably only the existing heap manager implementation would be part of the actual codeline. I've done a little exploratory work looking at the heap interface. I was planning on doing a little prototyping before suggesting anything concrete, but, assuming the concept of a layered heap manager is not inherently objectionable, I was thinking of cleaning up the heap interface a little (e.g. some HOT stuff has bled across a little), then taking a whack at formalizing the interface along the lines of the index layering. So ideally I'd make a few separate submissions, and if all goes according to plan I'd be able to have a pluggable heap manager implementation that I could work on independently and which could in theory use the same hooks as the existing heap implementation. And if it turns out that my implementation is deemed to be general enough, it could be released to the community.

If I do decide to pursue this, can anyone suggest the best way to solicit feedback? I see that some proposals get shared on the postgres wiki. I could put something up there to frame the issue and encourage some back-and-forth dialog. Or is email the way that this kind of exchange tends to happen? Ultimately I'd like to get into a bit of detail about what the actual heap manager contract is and so forth.

Note that I'm a ways from really knowing if this is feasible on my end, so this is quite speculative at this point. But I'd like to introduce the topic and get some feedback on the right way to communicate as early as possible.

Thanks.

-Jamie

________________________________
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Jameison Martin <jameisonb(at)yahoo(dot)com>; "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Sent: Thursday, August 9, 2012 7:27 AM
Subject: Re: [HACKERS] patch submission: truncate trailing nulls from heap rows to reduce the size of the null bitmap

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On 17 April 2012 17:22, Jameison Martin <jameisonb(at)yahoo(dot)com> wrote:
>> The following patch truncates trailing null attributes from heap rows to
>> reduce the size of the row bitmap.

> This is an interesting patch, but it has had various comments made about it.

> When I look at this I see that it would change the NULL bitmap for all
> existing rows, which means it forces a complete unload/reload of data.

Huh?  I thought it would only change how *new* tuples were stored.
Old tuples ought to continue to work fine.

I'm not really convinced that it's a good idea in the larger scheme
of things --- your point in a nearby thread that micro-optimizing
storage space at the expense of all else is not good engineering
applies here.  But I don't see that it forces data reload.  Or if
it does, that should be easily fixable.

> ...  Have another flag which indicates
> when a partial trailing col trimmed NULL bitmap is in use.

That might be useful for forensic purposes, but on the whole I suspect
it's just added complexity (and eating up a valuable infomask bit)
for relatively little gain.

> ... decide whether a table will benefit from full or partial bitmap and
> set that in the tupledesc. That way the tupledesc will show
> heap_form_tuple which kind of null bitmap is preferred for new tuples.
> That preference might be settable by user on or off, but the default
> would be for postgres to decide that for us based upon null stats etc,
> which we would decide at ANALYZE time.

And that seems like huge overcomplication.  I think we could probably
do fine with some very simple fixed policy, like "don't bother with
this for tables of less than N columns", where N is maybe 64 or so
and chosen to match the MAXALIGN boundary where there actually could
be some savings from trimming the null bitmap.
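
A rough way to see where such a boundary falls is to compute the aligned header size for a given number of stored attributes. The sketch below is illustrative Python, not PostgreSQL code, and rests on stated assumptions: a 23-byte fixed tuple header ahead of the null bitmap, 8-byte MAXALIGN, and no OID column. The names `header_size` and `savings` are hypothetical.

```python
FIXED_HEADER_BYTES = 23  # assumption: fixed header bytes before the null bitmap
MAXALIGN = 8             # assumption: typical 64-bit platform


def bitmap_len(natts):
    """Bytes needed for one null-bitmap bit per stored attribute."""
    return (natts + 7) // 8


def header_size(natts, has_nulls=True):
    """Header size rounded up to MAXALIGN for natts stored attributes."""
    raw = FIXED_HEADER_BYTES + (bitmap_len(natts) if has_nulls else 0)
    return (raw + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def savings(total_atts, stored_atts):
    """Header bytes saved by storing only the first stored_atts attributes."""
    return header_size(total_atts) - header_size(stored_atts)


for total, stored in [(8, 3), (64, 9), (72, 8), (100, 5)]:
    print(total, stored, savings(total, stored))
```

Under these assumptions the header only shrinks when trimming crosses an 8-byte step, and the first steps fall at 8 and 72 stored attributes: trimming a 64-column row down to 9 stored attributes saves nothing, while trimming a 72-column row down to 8 saves 8 bytes. The right N for a fixed policy would depend on the real header layout and alignment.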

(Note: I've not read the patch, so maybe Jameison already did something
of the sort.)

            regards, tom lane
