Re: Critical performance problems on large databases

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Shaun Thomas <sthomas(at)townnews(dot)com>
Cc: Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Critical performance problems on large databases
Date: 2002-04-11 17:51:16
Message-ID: Pine.GSO.4.44.0204112045140.9764-100000@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

The big issue with LIMIT,OFFSET is that it still use all rows
for sorting. I already suggested to use partial sorting to avoid
sorting all rows if one selected only first 20 row, for example.
It's very important for Web applications because web users usually
read first 1-2 pages. Our experimnets have shown 6 times performance
win when using partial sorting.

Oleg
On Thu, 11 Apr 2002, Shaun Thomas wrote:

> On Wed, 10 Apr 2002, Gunther Schadow wrote:
>
> > SELECT * FROM Bigtable;
> >
> > it takes a long time for it to come up with the first page
> > of results using up lots of computer resources etc and after
> > the first page is returned the backend basically goes into
> > idle mode for all the rest of the query results retrieval.
>
> Very simple. PHP and PERL DBI, when doing non-cursor queries, will try
> and buffer the *entire* result into memory before letting you have it.
> In PHP, when you run a pg_exec, or mysql_query, or whatever, it actually
> executes the query, and stores all of the results in memory. Don't
> believe me? Try closing the database connection, and then do a fetch on
> the result identifier. Heck, fetch until you reach the end.
>
> You'll get all of your results. Even though the database is closed.
>
> So when you are doing a "SELECT * FROM bigtable", you're telling PHP to
> not only buffer over a million rows, but to transfer it from the
> database as well. "SELECT COUNT(*) FROM bigtable" doesn't have
> this overhead. The database just returns a single row, which you
> can view basically as soon as the database is done.
>
> There's also something to be said for the LIMIT clause. Somehow I
> doubt you need every single row in the entire table in order to do
> your work on that page. Just adding LIMIT/OFFSET will increase your
> speed significantly.
>
> Or, as someone else mentioned, use a cursor, and let the database buffer
> your query. It'll be slow too, but you can at least fetch the rows
> individually and get the perception of speed in your application.
>
> > no work needed other than to retrieve the tuples out of
> > physical storage, the response should be immediate and resource
> > usage low. There should not be large buffer allocations.
>
> I see how you can get into this kind of thinking. Since select * has no
> ordering, no calculations, nothing but just returning raw results from
> the table in whatever format they may be in, how could it possibly be
> resource intensive? But the logic is wrong. If you want all the data,
> it'll give it to you. Whether you do it at the console, and it has to
> throw everything at the screen buffer, or your application, which has
> to put it in some temporary storage until it gets what it needs and
> deallocates the memory. That data has to go *somewhere*, it can't just
> exist in limbo until you decide you want to use some of it.
>
> You can use cursors to get around this, because the database basically
> runs the query and doesn't send back squat until you actually ask it to.
> But I'll tell you a well selected limit clause will work almost as well,
> and reduce the need for the database to maintain a cursor.
>
> Say you're on page 5, and you are showing 20 results per page. Knowing
> that results start at 0, you can get the offset just by doing:
>
> (PAGE - 1) * PER_PAGE = (5-1) * 20 = 4*20 = 80. So your query becomes:
>
> SELECT * FROM Bigtable LIMIT 20 OFFSET 80.
>
> And viola. You'll get back 20 rows right where you want them. Also,
> take care of what you select from the table. Maybe you don't actually
> need all of the data, but only certain rows. The less data that has to
> be transferred from database to application, the faster you can get/show
> your data. Especially if your database is on a separate machine from
> your application server. Network transfers are *not* instantaneous.
>
> Try:
>
> SELECT col1,col2,col3,etc FROM Bigtable LIMIT x OFFSET y
>
> instead. You'll save yourself some time, save the database the effort
> of loading all million rows of every column in the table into the
> network interface, and save your application the need to fetch them.
>
> > COUNT(smallcolumn) behaves much faster than COUNT(*).
>
> Or you can do what every database optimization book in the entire
> universe says, and do COUNT(1). Since all you're testing is the
> existence of the row, not any value of anything in it.
>
> > Again, count should be streamed as well such as to use no
> > significant memory resources other than the counter.
>
> Actually, it's converted into: "*, eh? I think not. Let me rewrite
> this query and put a 1 here instead."
>
> The database tries to keep people from shooting themselves in the foot,
> but even the most Herculean effort will fail when the user is bound and
> determined to blow off their big toe.
>
> > Any enlightenments? Am I wrong? Will this be fixed soon?
> > Is it hard to change pgsql to do better streaming of its
> > operations.
>
> Read a few books on database design and optimization, the basics of
> application memory allocation, and the TCP/IP stack. If you want to
> understand how to make an application fast from front to back, you have
> to understand the components that make it work. Your knowledge of
> application memory performance and network latency seems inherently
> flawed, and until you get over the assumption that network transfers are
> free and optimizing queries is a fool's errand, you'll continue to have
> problems in any database you choose.
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Fran Fabrizio 2002-04-11 17:51:43 Re: where to find info about data types?
Previous Message Papp, Gyozo 2002-04-11 17:31:20 Re: SPI_execp() failed in RI_FKey_cascade_del()