Re: Parallel Sort

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Sort
Date: 2013-05-13 19:52:43
Message-ID: CADyhKSUieOzpqS0erkTxTHtZr999f5jJR68DfT=nv=TVipbZ3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2013/5/13 Noah Misch <noah(at)leadboat(dot)com>

> * Planner & Similar Issues
>
> We should decide whether to actually sort in parallel based on the
> comparator
> cost and the data size. The system currently has no information on
> comparator
> cost: bt*cmp (and indeed almost all built-in functions) all have procost=1,
> but bttextcmp is at least 1000x slower than btint4cmp. Let's improve the
> procost estimates of all core B-tree and hash operators. This will have
> other
> benefits, but we will need to be cognizant of the risk of upsetting setups
> that have tuned cpu_operator_cost based on the present situation.
>
> The choice of whether to parallelize can probably be made a manner similar
> to
> the choice to do an external sort: the planner guesses the outcome for
> costing
> purposes, but the actual decision is made at execution time. The planner
> would determine a tuple count cutoff at which parallelism becomes
> favorable,
> and tuplesort would check that to establish its actual decision.
>
> It probably crossovers my problem consciousness to off-load CPU bounds
workloads; that I partially tried to implement on writable foreign table
feature.
Not only sorting stuff, I think it may be worthful to have capability to
push
heavy workload (like sort, aggregate or complex target-list) out external
computing resources.
However, I doubt whether the decision to parallelize should be done in
execution time, rather than plan stage. For example, in case when we
have enough number of records and 10-core multiprocessor, the wise
plan may take parallel data load by 10-processors, partial-sort by 10-
processors individually, then merge-sort. It needs fundamental different
tree structure from the traditional single-processors based plan-tree.
So, it seems to me we should take an enhancement to allow to inject
plan-tree special purpose parallel processing plan node.
How about your opinion?

Thanks,

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2013-05-13 21:27:36 Re: Parallel Sort
Previous Message Tom Lane 2013-05-13 19:16:17 Re: erroneous restore into pg_catalog schema