Re: Performance Enhancement/Fix for Array Utility Functions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Farina <drfarina(at)acm(dot)org>, Mike Lewis <mikelikespie(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance Enhancement/Fix for Array Utility Functions
Date: 2010-07-28 03:43:37
Message-ID: AANLkTi=QfXOpURdN0RN05bZoWzOaR0ym-esuh6_6K6pW@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 16, 2010 at 4:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Daniel Farina <drfarina(at)acm(dot)org> writes:
>> Generally I think the delimited untoasting of metadata from arrays
>> separately from the payload is Not A Bad Idea.
>
> I looked at this patch a bit.  I agree that it could be a big win for
> large external arrays, but ...
>
> 1. As-is, it's a significant *pessimization* for small arrays, because
> the heap_tuple_untoast_attr_slice code does a palloc/copy even when one
> is not needed because the data is already not toasted.  I think there
> needs to be a code path that avoids that.

This seems like it shouldn't be too hard to fix, and I think it should be fixed.
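For what it's worth, the fast path could be as simple as checking whether
the datum is extended at all before going through the slice machinery.
Rough sketch only -- not code from the patch, and fetch_array_header /
header_bytes are made-up names:

    /*
     * Sketch: grab the first header_bytes of an array datum.  Only call
     * heap_tuple_untoast_attr_slice() (which always pallocs and copies)
     * when the datum is compressed, external, or short-header; a plain
     * inline varlena can be read in place with no copy at all.
     */
    static struct varlena *
    fetch_array_header(struct varlena *attr, int32 header_bytes)
    {
        if (VARATT_IS_EXTENDED(attr))
            return heap_tuple_untoast_attr_slice(attr, 0, header_bytes);

        return attr;        /* already detoasted; use it directly */
    }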

> 2. Arrays that are large enough to be pushed out to toast storage are
> almost certainly going to get compressed first.  The potential win in
> this case is very limited because heap_tuple_untoast_attr_slice will
> fetch and decompress the whole thing.  Admittedly this is a limitation
> of the existing code and not a fault of the patch proper, but still, if
> you want to make something that's generically useful, you need to do
> something about that.  Perhaps pglz_decompress() could be extended with
> an argument to say "decompress no more than this much" --- although that
> would mean adding another test to its inner loop, so we'd need to check
> for performance degradation.  I'm also unsure how to predict how much
> compressed data needs to be read in to get at least N bytes of
> decompressed data.

But this part seems to me to be something that can, and probably
should, be left for a separate patch. I don't see that we're losing
anything this way; we're just not winning as much as we otherwise
might. To really fix this, I suspect we'd need a version of
pglz_decompress that can operate in "streaming mode"; you tell it how
many bytes you want, and it returns a code indicating that either (a)
it decompressed that many bytes or (b) it decompressed fewer than that
many bytes and you can call it again with another chunk of data if you
want to keep going. That'd probably be a good thing to have, but I
don't think it needs to be a prerequisite for this patch.
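Just to make that concrete, the interface might look something like this
(sketch only -- none of these names exist in pg_lzcompress.h today):

    /*
     * Hypothetical streaming decompressor state.  The caller hands in an
     * output buffer; the function reports how much it wrote and whether
     * there is more output to come, so it can be called repeatedly.
     */
    typedef struct PGLZ_DecompressState
    {
        const char *srcptr;     /* next compressed byte to read */
        const char *srcend;     /* end of the compressed input */
        /* ... plus whatever literal/match state must carry over ... */
    } PGLZ_DecompressState;

    typedef enum
    {
        PGLZ_DONE,              /* all output has been produced */
        PGLZ_MORE               /* buffer filled; call again to continue */
    } PGLZ_Result;

    extern PGLZ_Result pglz_decompress_chunk(PGLZ_DecompressState *state,
                                             char *dst, int32 dstlen,
                                             int32 *written);

A slice fetch would then just loop on pglz_decompress_chunk() until it has
the bytes it wants, without ever materializing the rest of the datum.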

I'm going to mark this patch "Waiting on Author".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
