Re: getting the most of out multi-core systems for repeated complex SELECT statements

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, gnuoytr(at)rcn(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: getting the most of out multi-core systems for repeated complex SELECT statements
Date: 2011-02-04 04:19:48
Message-ID: 4D4B7E64.4030203@squeakycode.net
Lists: pgsql-performance

On 02/03/2011 10:00 PM, Greg Smith wrote:
> Andy Colson wrote:
>> CPUs won't get faster, but HDs and SSDs will. For one database connection running one query to run fast, it's going to need multi-core support.
>
> My point was that situations where people need to run one query on one database connection that aren't in fact limited by disk I/O are far less common than people think. My troublesome database servers aren't ones with a single CPU at its max but wishing there were more workers, they're the ones that have >25% waiting for I/O. And even that crowd is still a subset, distinct from people who don't care about the speed of any one core, they need lots of connections to go at once.
>

Yes, I agree... for today. But gaze 5 years ahead: double the core count (but not the per-core speed), double the I/O rate. What do you see?

>> My point is, there must be levels of threading, yes? If a backend has data to sort, has it collected, nothing locked, what would it hurt to use multi-core sorting?
>
> Optimizer nodes don't run that way. The executor "pulls" rows out of the top of the node tree, which then pulls from its children, etc. If you just blindly ran off and executed every individual node to completion in parallel, that's not always going to be faster--could be a lot slower, if the original query never even needed to execute portions of the tree.
>
> When you start dealing with all of the types of nodes that are out there it gets very messy in a hurry. Decomposing the nodes of the query tree into steps that can be executed in parallel usefully is the hard problem hiding behind the simple idea of "use all the cores!"
>
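
To make sure I follow the "pull" model, here is a minimal sketch in Python. It is not PostgreSQL code: generators stand in for executor nodes, and seq_scan, append, limit, and the executed list are names made up for illustration. It shows the case described above, where a parent stops pulling and a whole branch of the tree never runs.

```python
# Illustrative only: Python generators standing in for executor nodes.
# None of these names come from PostgreSQL.
executed = []  # records which scan produced each row that was pulled


def seq_scan(name, rows):
    """Leaf node: produces one row each time the parent pulls."""
    for row in rows:
        executed.append(name)
        yield row


def append(left, right):
    """Pulls everything from the left child, then from the right child."""
    yield from left
    yield from right


def limit(child, n):
    """Stops pulling from the child once n rows have been returned."""
    if n <= 0:
        return
    count = 0
    for row in child:
        yield row
        count += 1
        if count >= n:
            return


plan = limit(append(seq_scan("a", [1, 2, 3]), seq_scan("b", [4, 5, 6])), 2)
rows = list(plan)  # the top of the tree pulls; children run only on demand
```

Here scan "b" is never started, because the limit node is satisfied by the first two rows from scan "a". Running both scans to completion in parallel would have done work the query never needed.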

What if... the nodes were run in separate threads and interconnected via queues? A node would not have to run to completion either. A queue could be set up with a maximum number of items. When a node has filled the queue (say, 5 out of 5 items), it would go to sleep. Its parent node, on removing one of the items, could wake it up.
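
A rough sketch of the queue idea, in Python rather than anything resembling the backend's C. The node functions, the DONE sentinel, and run_pipeline are all hypothetical names for illustration. Python's queue.Queue with a maxsize already behaves the way I describe: put() blocks while the queue is full, and a get() by the consumer lets the blocked producer continue.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a node's output


def scan_node(rows, out):
    """Child node: pushes rows into a bounded queue."""
    for row in rows:
        out.put(row)  # blocks (sleeps) while the queue is full
    out.put(DONE)


def filter_node(pred, inp, out):
    """Parent node: pulls from the child's queue, pushes matches upward."""
    while True:
        row = inp.get()  # removing an item lets a blocked producer continue
        if row is DONE:
            break
        if pred(row):
            out.put(row)
    out.put(DONE)


def run_pipeline(rows, pred, max_items=5):
    """Runs scan -> filter in two threads joined by bounded queues."""
    q1 = queue.Queue(maxsize=max_items)
    q2 = queue.Queue(maxsize=max_items)
    threads = [
        threading.Thread(target=scan_node, args=(rows, q1)),
        threading.Thread(target=filter_node, args=(pred, q1, q2)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        row = q2.get()
        if row is DONE:
            break
        result.append(row)
    for t in threads:
        t.join()
    return result


evens = run_pipeline(range(20), lambda r: r % 2 == 0)
```

This only shows the handoff between nodes. CPython threads would not actually spread CPU-bound work across cores, and the sketch has no way for a parent to tell a child to stop early, so a node could still produce up to a queue's worth of rows that nobody ends up needing.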

-Andy
