Re: Critical performance problems on large databases

From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Critical performance problems on large databases
Date: 2002-04-11 16:54:22
Message-ID: 20020411094638.E34142-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


> The constructive responses suggested that I use LIMIT/OFFSET and
> CURSORs. I can see how that could be a workaround the problem, but
> I still believe that something is wrong with the PostgreSQL query
> executer. Loading the entire result set into a buffer without
> need just makes no sense. Good data base engines try to provide
> for parallel execution of the query plan as much as possible, and
> that implies streaming. There's a crowd of literature about this
> testifying for it's importance.
>
> The main reasons for this is (a) the use on multi-processor machines
> where one CPU does one task and the other does another task on the
> same query plan and the results from CPU 1 is streamed to CPU 2 (hey,
> I have a 6 processor machine in my basement.) Perhaps more importantly
> (b) buffering (without need) is inherently bad, because it wastes
> memory resources leads to bursty demand on CPU and network, and slow
> perceived response times. Buffering is a complete waste if the buffer
> is being paged out to disk again and it isn't flexible or scaleable
> if buffer pages are fixed into physical memory. Straming is
> especially important if you want to do distributed joins (and though
> pgsql doesn't support that yet it would be foolish to close your eyes
> before a fundamental problem and then being forced to rework this in
> a hurry when the time comes for distributed PostgreSQL.)
>
> So, while my client application might benefit from such things as
> cursors and OFFSET/LIMIT, the query planning and executing may
> suffer from the buffering. And of course, the point is that it makes
> sense to design the server such that streaming results to the
> client is transparent because it automatically relieves the strain
> on all resources, CPU, storage and network! Isn't that obvious?

My limited understanding of the internals is that except for steps
that want their entire data set for providing output (sort) and
the final send everything to the client, internally each node
asks other nodes for rows as they need them and those nodes can
provide results on an as needed basis without necessarily buffering
their result set.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2002-04-11 16:56:57 Re: Critical performance problems on large databases
Previous Message Rick Szeto 2002-04-11 16:48:24 Re: "NOT IN" predicate hangs result