Re: Parallel Sort

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Parallel Sort
Date: 2013-05-15 18:32:29
Message-ID: CAM3SWZTz3FtcNT=_gOOhX_5Qt_QRvG-5oma6x1GsuR6VC1AWzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, May 13, 2013 at 7:28 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> We should decide whether to actually sort in parallel based on the comparator
> cost and the data size. The system currently has no information on comparator
> cost: bt*cmp (and indeed almost all built-in functions) all have procost=1,
> but bttextcmp is at least 1000x slower than btint4cmp.

I think that this effort could justify itself independently of any
attempt to introduce parallelism to in-memory sorting. I abandoned a
patch to introduce timsort to Postgres, because I knew that there was
no principled way to reap the benefits. Unless you introduce
parallelism, it's probably going to be virtually impossible to come up
with an alogorithm that does in-memory sorting faster (in terms of the
amount of system time taken) than a highly optimized quicksort when
sorting integers. But sorting types with really expensive comparators
(even considerably more expensive than bttextcmp) for
pass-by-reference Datums (where the memory locality advantage of
quicksort doesn't really help so much) makes timsort much more
compelling. That's why it's used for Python lists.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Cédric Villemain 2013-05-15 18:32:37 Re: PostgreSQL 9.3 beta breaks some extensions "make install"
Previous Message Kevin Grittner 2013-05-15 18:18:35 Re: counting algorithm for incremental matview maintenance