Re: sortsupport for text

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sortsupport for text
Date: 2012-06-14 16:35:04
Message-ID: CA+TgmobCEMwZSSkuG7Vhjm_6iwU9z49bcPm_KpZFNKSekuoVnA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> On 18 March 2012 15:08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> One other thing I've always wondered about in this connection is the
>> general performance of sorting toasted datums.  Is it better to detoast
>> them in every comparison, or pre-detoast to save comparison cycles at
>> the cost of having to push much more data around?  I didn't see any
>> discussion of this point in Robert's benchmarks, but I don't think we
>> should go very far towards enabling sortsupport for text until we
>> understand the issue and know whether we need to add more infrastructure
>> for it.  If you cross your eyes a little bit, this is very much like
>> the strxfrm question...
>
> I see the parallels.

The problem with pre-detoasting to save comparison cycles is that you
can now fit far fewer tuples in work_mem. There might be cases where
it wins (for example, because the entire data set still fits even
after detoasting everything) but in most cases it seems like a loser.
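
To put rough numbers on the work_mem point, here is a back-of-the-envelope
sketch. Every figure in it is an assumption chosen for illustration: a 4MB
work_mem, an 18-byte on-disk TOAST pointer, and a made-up 48 bytes of
per-tuple sort overhead. It is not a model of what tuplesort actually
allocates.

```python
# Back-of-the-envelope only: how many sort tuples fit in work_mem when the
# sort key is kept as a TOAST pointer versus fully detoasted.
# All sizes below are illustrative assumptions, not measured values.

WORK_MEM_BYTES = 4 * 1024 * 1024      # assumed work_mem of 4MB
TOAST_POINTER_BYTES = 18              # assumed size of an out-of-line pointer
PER_TUPLE_OVERHEAD_BYTES = 48         # hypothetical bookkeeping per tuple
DETOASTED_BYTES = 1024 * 1024         # a 1MB string once detoasted


def tuples_that_fit(work_mem_bytes: int, key_bytes: int,
                    overhead_bytes: int = PER_TUPLE_OVERHEAD_BYTES) -> int:
    """Whole tuples that fit in work_mem for a given per-tuple key size."""
    return work_mem_bytes // (key_bytes + overhead_bytes)


if __name__ == "__main__":
    as_pointers = tuples_that_fit(WORK_MEM_BYTES, TOAST_POINTER_BYTES)
    detoasted = tuples_that_fit(WORK_MEM_BYTES, DETOASTED_BYTES)
    print("tuples in memory, keys left toasted:", as_pointers)
    print("tuples in memory, keys detoasted:   ", detoasted)
```

Under those assumptions the difference is several orders of magnitude,
which is the whole cost side of the trade-off.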

Also, my guess is that most values people sort by are pretty short,
making this concern mostly academic. Suppose you are sorting a bunch
of strings which might be either 100 characters in length or 1MB. If
they're all 100 characters, you probably don't need to detoast. If
they're all 1MB, you probably can't detoast without eating up a ton of
memory (and even if you have it, this might not be the best use for
it). If you have a mix, detoasting might be affordable provided that
the percentage of long strings is small, but it's also not going to
save you much, because if the percentage of long strings is small,
then most comparisons will be between two short strings where we don't
save anything anyway.
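
The mixed case can be sketched the same way. This assumes, purely for
illustration, that a fraction p of the strings are long and that each
comparison pairs two strings picked independently at random. Real sort
comparisons are not uniformly random pairs, so treat it as a rough
intuition pump and nothing more.

```python
# Rough illustration of the mixed short/long case.
# Assumption: each comparison involves two independently chosen strings,
# each long with probability p. Real sorts do not compare uniformly
# random pairs, so this is only an approximation.


def fraction_both_short(p_long: float) -> float:
    """Share of comparisons where neither side is a long string."""
    return (1.0 - p_long) ** 2


def detoast_memory_bytes(n_rows: int, p_long: float, long_bytes: int) -> int:
    """Memory needed to hold every long string in detoasted form."""
    return round(n_rows * p_long) * long_bytes


if __name__ == "__main__":
    p = 0.01                      # hypothetical: 1% of strings are 1MB
    rows = 1_000_000              # hypothetical row count
    print("comparisons with no long string:", fraction_both_short(p))
    print("bytes to keep long strings detoasted:",
          detoast_memory_bytes(rows, p, 1024 * 1024))
```

With a small p, nearly all comparisons never touch a long string, so
there is little to save, while the memory needed to hold the long
strings detoasted still scales with the row count.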

All things considered, this seems to me to be aiming at a pretty
narrow target, but maybe I'm just not thinking about it creatively
enough.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
