Re: [HACKERS] Slow count(*) again...

From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow count(*) again...
Date: 2010-10-09 23:26:18
Message-ID: 201010091626.18636.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

I know that there have been many discussions on the slowness of count(*), even
when an index is involved, because the visibility of the rows has to be
checked. In the past I have seen many suggestions about using triggers and
tables to keep track of counts, and while this works fine in a situation where
you know what the report is going to be ahead of time, it is simply not an
option when an unknown WHERE clause is to be used (dynamically generated).

I ran into a fine example of this when I was searching this mailing list:
"Searching in 856,646 pages took 13.48202 seconds. Site search powered by
PostgreSQL 8.3." Obviously at some point count(*) came into play here, because
the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
list of pages from search results, and the biggest time killer here is the
count(*) portion. Even worse, I sometimes have to hit the database with two
SELECT statements: one with OFFSET and LIMIT to get the page of results I
need, and another to get the total number of rows so I can estimate how many
pages of results are available.

The point I am driving at here is that since building a list of pages of
results is such a common thing to do, there needs to be some specific
high-speed way to do this in one query. Maybe an estimate(*) that works like
count but gives an answer from the index without checking visibility? I am
sure that this would be good enough to make a page list; it is really no big
deal if it errs on the positive side, maybe giving the list of pages an extra
page off the end. I can live with that. What I can't live with is taking 13
seconds to get a page of results from 850,000 rows in a table.
-Neil-


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 01:47:34
Message-ID: AANLkTimayTM__hHu9RoiLaNhzfpNY6Vo5y4OkbX_sV=o@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> I know that there haven been many discussions on the slowness of count(*) even
> when an index is involved because the visibility of the rows has to be
> checked. In the past I have seen many suggestions about using triggers and
> tables to keep track of counts and while this works fine in a situation where
> you know what the report is going to be ahead of time, this is simply not an
> option when an unknown WHERE clause is to be used (dynamically generated).
> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here because
> the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
> list of pages from search results, and the biggest time killer here is the
> count(*) portion, even worse yet, I sometimes have to hit the database with
> two SELECT statements, one with OFFSET and LIMIT to get the page of results I
> need and another to get the amount of total rows so I can estimate how many
> pages of results are available. The point I am driving at here is that since
> building a list of pages of results is such a common thing to do, there need
> to be some specific high speed ways to do this in one query. Maybe an
> estimate(*) that works like count but gives an answer from the index without
> checking visibility? I am sure that this would be good enough to make a page
> list, it is really no big deal if it errors on the positive side, maybe the
> list of pages has an extra page off the end. I can live with that. What I
> can't live with is taking 13 seconds to get a page of results from 850,000
> rows in a table.

99% of the time, in situations like this, you don't need an exact measure;
assuming ANALYZE has run recently, selecting reltuples from pg_class for
a given table is more than close enough. I'm sure wrapping that in a
simple estimated_rows() function would be easy enough to do.
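
A minimal sketch of such a function (the name is illustrative, and the
result is only as fresh as the last ANALYZE or autovacuum run):

CREATE OR REPLACE FUNCTION estimated_rows(tbl regclass)
RETURNS bigint AS $$
    -- Return the planner's stored row estimate for the table:
    -- no scan, no visibility checks, just the pg_class statistics.
    SELECT reltuples::bigint FROM pg_class WHERE oid = $1;
$$ LANGUAGE sql STABLE;

-- Usage:
--   SELECT estimated_rows('some_table');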


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 01:54:15
Message-ID: 4CB11CC7.8070300@vmsinfo.com
Lists: pgsql-hackers pgsql-performance

Neil Whelchel wrote:
> I know that there haven been many discussions on the slowness of count(*) even
> when an index is involved because the visibility of the rows has to be
> checked. In the past I have seen many suggestions about using triggers and
> tables to keep track of counts and while this works fine in a situation where
> you know what the report is going to be ahead of time, this is simply not an
> option when an unknown WHERE clause is to be used (dynamically generated).
> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here because
> the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
> list of pages from search results, and the biggest time killer here is the
> count(*) portion, even worse yet, I sometimes have to hit the database with
> two SELECT statements, one with OFFSET and LIMIT to get the page of results I
> need and another to get the amount of total rows so I can estimate how many
> pages of results are available. The point I am driving at here is that since
> building a list of pages of results is such a common thing to do, there need
> to be some specific high speed ways to do this in one query. Maybe an
> estimate(*) that works like count but gives an answer from the index without
> checking visibility? I am sure that this would be good enough to make a page
> list, it is really no big deal if it errors on the positive side, maybe the
> list of pages has an extra page off the end. I can live with that. What I
> can't live with is taking 13 seconds to get a page of results from 850,000
> rows in a table.
> -Neil-
>
>
Unfortunately, the problem is in the rather primitive way PostgreSQL
does I/O. It didn't change in 9.0, so there is nothing you could gain by
upgrading. If you execute strace -o /tmp/pg.out -e read -p <PID of the
sequential scan process> and inspect the file /tmp/pg.out when the query
finishes, you will notice a gazillion read requests, all of them 8192
bytes in size. That means that PostgreSQL is reading the table block by
block, without any merging of the requests. You can alleviate the pain
with OS tricks, like specifying the deadline I/O scheduler in grub.conf
and setting read-ahead on the filesystem's block devices using blockdev,
but there is nothing special that can be done, short of rewriting the
way PostgreSQL does I/O. There were rumors about version 9.0 getting
asynchronous I/O, but that didn't materialize. That is really strange to
me, because PostgreSQL tables are files (or groups of files, if the
table size exceeds 1GB). It wouldn't be very hard to try reading 1MB at
a time, and that would speed up the full table scan significantly.
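
For reference, the commands described above look like this (a sketch;
the PID and the device name are placeholders for your own system):

# Log every read() issued by the backend doing the sequential scan:
strace -o /tmp/pg.out -e trace=read -p <PID>

# Inspect and raise read-ahead on the block device (the value is in
# 512-byte sectors, so 4096 means 2MB):
blockdev --getra /dev/sdX
blockdev --setra 4096 /dev/sdX
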
The problem with single-block I/O is that there is a context switch for
each request, and the I/O scheduler has to work hard to merge requests
appropriately when there is really no need for that: tables are files,
and navigating through files is not a problem, even with much larger
blocks. In another database, whose name I will not mention, there is a
parameter db_file_multiblock_read_count which specifies how many blocks
will be read by a single read when doing a full table scan. PostgreSQL
is in dire need of something similar, and it wouldn't even be that hard
to implement.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Joe Conway <mail(at)joeconway(dot)com>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 02:10:38
Message-ID: 4CB1209E.9000706@joeconway.com
Lists: pgsql-hackers pgsql-performance

On 10/09/2010 06:54 PM, Mladen Gogala wrote:
> In another database, whose name I will not mention, there is a parameter
> db_file_multiblock_read_count which specifies how many blocks will be
> read by a single read when doing a full table scan. PostgreSQL is in
> dire need of something similar and it wouldn't even be that hard to
> implement.

You're correct in that it isn't particularly difficult to implement for
sequential scans. But I have done some testing with aggressive read
ahead, and although it is clearly a big win with a single client, the
benefit was less clear as concurrency was increased.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 02:44:14
Message-ID: 4CB1287E.7070202@vmsinfo.com
Lists: pgsql-hackers pgsql-performance

Joe Conway wrote:
> On 10/09/2010 06:54 PM, Mladen Gogala wrote:
>
>> In another database, whose name I will not mention, there is a parameter
>> db_file_multiblock_read_count which specifies how many blocks will be
>> read by a single read when doing a full table scan. PostgreSQL is in
>> dire need of something similar and it wouldn't even be that hard to
>> implement.
>>
>
> You're correct in that it isn't particularly difficult to implement for
> sequential scans. But I have done some testing with aggressive read
> ahead, and although it is clearly a big win with a single client, the
> benefit was less clear as concurrency was increased.
>
> Joe
>
>
Well, in my opinion, that should be left to the DBA, the same as in the
"other database". The mythical DBA, the creature that mighty Larry
Ellison himself is on a crusade against, can usually figure out the
right value for the database he or she is in charge of. I humbly
confess to being an Oracle DBA for more than two decades, now branching
into Postgres because my employer is less than enthusiastic about
Oracle, with a special accent on their pricing.

Modern databases, Postgres included, are quite complex, and companies
need DBA personnel to help fine-tune the applications. I know that good
DBA personnel are quite expensive, but without a competent DBA who knows
the database software well enough, companies can and will suffer from
blunders with performance, downtime, lost data and the like. In a world
where almost every application is written for the web, performance,
uptime and user experience are of critical importance. The architects
of the Postgres database would be well advised to operate under the
assumption that every production database has a competent DBA keeping
an eye on it.

Every application has its own mix of sequential and index scans; you
cannot possibly test all possible applications. Aggressive read-ahead
or "multi-block reads" can be a performance problem, and they will
complicate the optimizer, because the optimizer now has a new variable
to account for: the block size, potentially making seq_page_cost even
cheaper and random_page_cost even more expensive, depending on the
blocking. However, slow sequential scans are, in my humble opinion, the
single biggest performance problem of PostgreSQL databases and should
be improved, the sooner the better. You should, however, count on the
DBA personnel to help with the tuning.
We're the Tinkerbells of the database world. I am 6'4", 240 lbs, no wings.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 03:02:12
Message-ID: 201010092002.13537.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
wrote:
> > I know that there haven been many discussions on the slowness of count(*)
> > even when an index is involved because the visibility of the rows has to
> > be checked. In the past I have seen many suggestions about using
> > triggers and tables to keep track of counts and while this works fine in
> > a situation where you know what the report is going to be ahead of time,
> > this is simply not an option when an unknown WHERE clause is to be used
> > (dynamically generated). I ran into a fine example of this when I was
> > searching this mailing list, "Searching in 856,646 pages took 13.48202
> > seconds. Site search powered by PostgreSQL 8.3." Obviously at some point
> > count(*) came into play here because the site made a list of pages (1 2
> > 3 4 5 6 > next). I very commonly make a list of pages from search
> > results, and the biggest time killer here is the count(*) portion, even
> > worse yet, I sometimes have to hit the database with two SELECT
> > statements, one with OFFSET and LIMIT to get the page of results I need
> > and another to get the amount of total rows so I can estimate how many
> > pages of results are available. The point I am driving at here is that
> > since building a list of pages of results is such a common thing to do,
> > there need to be some specific high speed ways to do this in one query.
> > Maybe an estimate(*) that works like count but gives an answer from the
> > index without checking visibility? I am sure that this would be good
> > enough to make a page list, it is really no big deal if it errors on the
> > positive side, maybe the list of pages has an extra page off the end. I
> > can live with that. What I can't live with is taking 13 seconds to get a
> > page of results from 850,000 rows in a table.
>
> 99% of the time in the situations you don't need an exact measure, and
> assuming analyze has run recently, select rel_tuples from pg_class for
> a given table is more than close enough. I'm sure wrapping that in a
> simple estimated_rows() function would be easy enough to do.

This is a very good approach and it works very well when you are counting
the entire table, but when you have no control over the WHERE clause, it
doesn't help. IE: someone puts in a word to look for in a web form.

From my perspective, this issue is the biggest problem there is when using
Postgres to create web pages, and it is so commonly encountered that I think
there should be a specific way to deal with it, so that you don't have to
run the same WHERE clause twice.
IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total number of
items to make page navigation links, then:
SELECT <columns> FROM <table> WHERE <clause> LIMIT <items_per_page> OFFSET
<(page_no-1)*items_per_page>; to get the actual page contents.
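
Concretely, with made-up table and column names, the pair looks like:

SELECT count(*) FROM articles WHERE body ILIKE '%postgres%';

SELECT id, title, created_at
FROM articles
WHERE body ILIKE '%postgres%'
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;   -- page 3, at 20 items per page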

It's bad enough that count(*) is slow; then you have to do it all over again
to get the results you need! I have not dug into this much yet, but would it
be possible to return the number of rows that a WHERE clause would actually
return if the LIMIT and OFFSET were not applied? IE: When a normal query is
executed, the server returns the number of rows alongside the actual row
data. Would it be a big deal to modify this to allow it to also return the
number of rows before the LIMIT and OFFSET are applied? This would sure cut
down on the time it takes to run the same WHERE clause twice... I have
considered using a cursor to do this; however, this requires a transfer of
all of the rows to the client to get a total count, then setting the cursor
to get the rows that you are interested in. Or is there a way around this
that I am not aware of?
-Neil-


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: Joe Conway <mail(at)joeconway(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 03:07:00
Message-ID: AANLkTik5kXP7tzESuX+Px=WvKpt-1q0YB6gTzfSZ8p1E@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Sat, Oct 9, 2010 at 7:44 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>wrote:

> The architects of Postgres database would be well advised to operate under
> the assumption that every production database has a competent DBA keeping an
> eye on the database.
>

I'd actually go so far as to say that they have already made this
assumption. The out-of-the-box config needs modification for all but the
most low-volume applications, and postgres really benefits from having some
attention paid to performance. Not only does tuning the db provide enormous
gains, but it is often possible to dramatically improve query responsiveness
by simply restructuring a query (assuming an aggregating query over a fairly
large table with a few joins thrown in). My team does not have a competent
DBA (though I've got 15+ years of experience developing on top of various
dbs and certainly don't make overly naive assumptions about how things work),
and the gains that we made, when I finally just sat down and read everything
I could get my hands on about postgres and started reading this list, were
really quite impressive. I intend to take some of the courses offered by
some of the companies that are active on this list when my schedule allows,
in order to expand my knowledge even further, as a DBA is a luxury we cannot
really afford at the moment.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 06:56:15
Message-ID: 4CB1638F.2080405@postnewspapers.com.au
Lists: pgsql-hackers pgsql-performance

On 10/10/2010 11:02 AM, Neil Whelchel wrote:
> On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
>> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com>
> wrote:
>>> I know that there haven been many discussions on the slowness of count(*)
>>> even when an index is involved because the visibility of the rows has to
>>> be checked. In the past I have seen many suggestions about using
>>> triggers and tables to keep track of counts and while this works fine in
>>> a situation where you know what the report is going to be ahead of time,
>>> this is simply not an option when an unknown WHERE clause is to be used
>>> (dynamically generated). I ran into a fine example of this when I was
>>> searching this mailing list, "Searching in 856,646 pages took 13.48202
>>> seconds. Site search powered by PostgreSQL 8.3." Obviously at some point
>>> count(*) came into play here because the site made a list of pages (1 2
>>> 3 4 5 6> next). I very commonly make a list of pages from search
>>> results, and the biggest time killer here is the count(*) portion, even
>>> worse yet, I sometimes have to hit the database with two SELECT
>>> statements, one with OFFSET and LIMIT to get the page of results I need
>>> and another to get the amount of total rows so I can estimate how many
>>> pages of results are available. The point I am driving at here is that
>>> since building a list of pages of results is such a common thing to do,
>>> there need to be some specific high speed ways to do this in one query.
>>> Maybe an estimate(*) that works like count but gives an answer from the
>>> index without checking visibility? I am sure that this would be good
>>> enough to make a page list, it is really no big deal if it errors on the
>>> positive side, maybe the list of pages has an extra page off the end. I
>>> can live with that. What I can't live with is taking 13 seconds to get a
>>> page of results from 850,000 rows in a table.
>>
>> 99% of the time in the situations you don't need an exact measure, and
>> assuming analyze has run recently, select rel_tuples from pg_class for
>> a given table is more than close enough. I'm sure wrapping that in a
>> simple estimated_rows() function would be easy enough to do.
>
> This is a very good approach and it works very well when you are counting the
> entire table, but when you have no control over the WHERE clause, it doesn't
> help. IE: someone puts in a word to look for in a web form.

For that sort of thing, there isn't much that'll help you except
visibility-aware indexes, covering indexes, etc if/when they're
implemented. Even then, they'd only help when it was a simple
index-driven query with no need to hit the table to recheck any test
conditions, etc.

I guess there could be *some* way to expose the query planner's cost
estimates in a manner useful for result count estimation ... but given
how coarse its stats are and how wildly out the estimates can be, I kind
of doubt it. It's really intended for query planning decisions and more
interested in orders of magnitude, "0, 1, or more than that" measures,
etc, and seems to consider 30% here or there to be pretty insignificant
most of the time.
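
A well-known workaround along those lines is to run EXPLAIN on the query
and scrape the planner's row estimate out of the plan text. A rough
PL/pgSQL sketch (the function name is hypothetical, and the answer is
only as good as the planner's guess):

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    plan_line text;
BEGIN
    -- The first line of EXPLAIN output describes the top plan node,
    -- including its estimated row count, e.g. "... rows=12345 ...".
    EXECUTE 'EXPLAIN ' || query INTO plan_line;
    RETURN substring(plan_line FROM 'rows=([0-9]+)')::bigint;
END;
$$ LANGUAGE plpgsql;

-- Usage:
--   SELECT count_estimate('SELECT * FROM some_table WHERE col > 50');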

> It's bad enough that count(*) is slow, then you have to do it all over again
> to get the results you need! I have not dug into this much yet, but would it
> be possible to return the amount of rows that a WHERE clause would actually
> return if the LIMIT and OFFSET were not applied. IE: When a normal query is
> executed, the server returns the number of rows aside from the actual row
> data. Would it be a big deal to modify this to allow it to return the amount
> of rows before the LIMIT and OFFSET is applied as well?

It'd force the server to fully execute the query. Then again, it sounds
like you're doing that anyway.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 07:14:18
Message-ID: 4CB167CA.9070109@postnewspapers.com.au
Lists: pgsql-hackers pgsql-performance

On 10/10/2010 9:54 AM, Mladen Gogala wrote:

> Unfortunately, the problem is in the rather primitive way PostgreSQL
> does I/O. It didn't change in 9.0 so there is nothing you could gain by
> upgrading. If you execute strace -o /tmp/pg.out -e read <PID of the
> sequential scan process> and inspect the file /tmp/pg.out when the query
> finishes, you will notice a gazillion of read requests, all of them 8192
> bytes in size. That means that PostgreSQL is reading the table block by
> block, without any merging of the requests.

I'd be really interested in any measurements you've done to determine
the cost of this over doing reads in larger chunks. If they're properly
detailed and thought out, the -hackers list is likely to be interested
as well.

The Linux kernel, at least, does request merging (and splitting, and
merging, and more splitting) along the request path, and I'd personally
expect that most of the cost of 8k requests would be in the increased
number of system calls, buffer copies, etc required. Measurements
demonstrating or contradicting this would be good to see.

It's worth being aware that there are memory costs to doing larger
reads, especially when you have many backends each of which want to
allocate a larger buffer for reading. If you can use a chunk of
shared_buffers as the direct destination for the read that's OK, but
otherwise you're up for (1mb-8kb)*num_backends extra memory use on I/O
buffers that could otherwise be used as shared_buffers or OS cache.

Async I/O, too, has costs.

> PostgreSQL is in
> dire need of something similar and it wouldn't even be that hard to
> implement.

I'd really like to see both those assertions backed with data or patches ;-)

Personally, I know just enough about how PG's I/O path works to suspect
that "not that hard to implement" is probably a little ...
over-optimistic. Sure, it's not that hard to implement in a new program
with no wired-in architectural and design choices; that doesn't mean
it's easy to retrofit onto existing code, especially a bunch of
co-operating processes with their own buffer management.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 10:29:42
Message-ID: 201010100329.43629.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

On Saturday 09 October 2010 23:56:15 Craig Ringer wrote:
> On 10/10/2010 11:02 AM, Neil Whelchel wrote:
> > On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> >> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com>
> >
> > wrote:
> >>> I know that there haven been many discussions on the slowness of
> >>> count(*) even when an index is involved because the visibility of the
> >>> rows has to be checked. In the past I have seen many suggestions about
> >>> using triggers and tables to keep track of counts and while this works
> >>> fine in a situation where you know what the report is going to be
> >>> ahead of time, this is simply not an option when an unknown WHERE
> >>> clause is to be used (dynamically generated). I ran into a fine
> >>> example of this when I was searching this mailing list, "Searching in
> >>> 856,646 pages took 13.48202 seconds. Site search powered by PostgreSQL
> >>> 8.3." Obviously at some point count(*) came into play here because the
> >>> site made a list of pages (1 2 3 4 5 6> next). I very commonly make a
> >>> list of pages from search results, and the biggest time killer here is
> >>> the count(*) portion, even worse yet, I sometimes have to hit the
> >>> database with two SELECT statements, one with OFFSET and LIMIT to get
> >>> the page of results I need and another to get the amount of total rows
> >>> so I can estimate how many pages of results are available. The point I
> >>> am driving at here is that since building a list of pages of results
> >>> is such a common thing to do, there need to be some specific high
> >>> speed ways to do this in one query. Maybe an estimate(*) that works
> >>> like count but gives an answer from the index without checking
> >>> visibility? I am sure that this would be good enough to make a page
> >>> list, it is really no big deal if it errors on the positive side,
> >>> maybe the list of pages has an extra page off the end. I can live with
> >>> that. What I can't live with is taking 13 seconds to get a page of
> >>> results from 850,000 rows in a table.
> >>
> >> 99% of the time in the situations you don't need an exact measure, and
> >> assuming analyze has run recently, select rel_tuples from pg_class for
> >> a given table is more than close enough. I'm sure wrapping that in a
> >> simple estimated_rows() function would be easy enough to do.
> >
> > This is a very good approach and it works very well when you are counting
> > the entire table, but when you have no control over the WHERE clause, it
> > doesn't help. IE: someone puts in a word to look for in a web form.
>
> For that sort of thing, there isn't much that'll help you except
> visibility-aware indexes, covering indexes, etc if/when they're
> implemented. Even then, they'd only help when it was a simple
> index-driven query with no need to hit the table to recheck any test
> conditions, etc.

Good point. Maybe this is turning more into a discussion of how to generate
a list of pages of results and one page of results with one query, so we
don't have to do the same painfully slow query twice to do a very common
task.

On the other hand, I copied a table out of one of my production servers that
has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
text). The first numeric column has numbers evenly spread between 0 and 100,
and it is indexed. I put the table in a pair of database servers, both running
on the same physical hardware. One server is Postgres, the other is a popular
server (I am not mentioning names here). On Postgres, SELECT count(*) FROM
table WHERE column > 50; takes about 8 seconds to run. The other database
server took less than one second (about 25 ms), as it is using the index (I
assume) to come up with the results. It is true that this is not a fair test,
because both servers were tested with their default settings, and the defaults
for Postgres are much more conservative; however, I don't think that any
amount of settings tweaking will bring them even into the same ballpark.

There has been discussion about the other server returning an incorrect count
because all of the indexed rows may not be live at the time. This is not a
problem for the intended use; that is why I suggested another function like
estimate(*). Its name suggests that the result will be close, not 100%
correct, which is plenty good enough for generating a list of results pages
in most cases.

I am faced with a very serious problem here. If the query to make a list of
pages takes, say, 6 seconds, and it takes another 6 seconds to generate a
page of results, the customer is waiting 12 seconds. This is not going to
work. If count made a quick estimate, say in less than a second, and it took
6 seconds to come up with the actual results, I could live with that. Or if
the query that comes up with the window of results (via OFFSET and LIMIT)
also returned the total number of rows that would have matched, then I would
still have everything I need to render the page in a reasonable time. I
really think that this needs to be addressed somewhere. It's not like I am
the only one that does this; you see it nearly everywhere a long list of
results is (expected to be) returned on a web site. Among the people I work
with, this seems to be the most mentioned reason that they claim they don't
use Postgres for their projects.

It would be nice to see how the server comes up with the search results and
list of links to pages of results for this mailing list.
(http://search.postgresql.org/search?q=slow+count%28%29&m=1&l=&d=365&s=r) I am
guessing that it probably uses the count-and-query method I am talking about.

>
> I guess there could be *some* way to expose the query planner's cost
> estimates in a manner useful for result count estimation ... but given
> how coarse its stats are and how wildly out the estimates can be, I kind
> of doubt it. It's really intended for query planning decisions and more
> interested in orders of magnitude, "0, 1, or more than that" measures,
> etc, and seems to consider 30% here or there to be pretty insignificant
> most of the time.
>
> > It's bad enough that count(*) is slow, then you have to do it all over
> > again to get the results you need! I have not dug into this much yet,
> > but would it be possible to return the amount of rows that a WHERE
> > clause would actually return if the LIMIT and OFFSET were not applied.
> > IE: When a normal query is executed, the server returns the number of
> > rows aside from the actual row data. Would it be a big deal to modify
> > this to allow it to return the amount of rows before the LIMIT and
> > OFFSET is applied as well?
>
> It'd force the server to fully execute the query. Then again, it sounds
> like you're doing that anyway.


From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 12:02:03
Message-ID: AANLkTimikExy7yCkB23R-Zmx+DaRh0uQhZE-jAE1B_xj@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

2010/10/10 Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>

> On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> > On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
> wrote:
> > > I know that there haven been many discussions on the slowness of
> count(*)
> > > even when an index is involved because the visibility of the rows has
> to
> > > be checked. In the past I have seen many suggestions about using
> > > triggers and tables to keep track of counts and while this works fine
> in
> > > a situation where you know what the report is going to be ahead of
> time,
> > > this is simply not an option when an unknown WHERE clause is to be used
> > > (dynamically generated). I ran into a fine example of this when I was
> > > searching this mailing list, "Searching in 856,646 pages took 13.48202
> > > seconds. Site search powered by PostgreSQL 8.3." Obviously at some
> point
> > > count(*) came into play here because the site made a list of pages (1 2
> > > 3 4 5 6 > next). I very commonly make a list of pages from search
> > > results, and the biggest time killer here is the count(*) portion, even
> > > worse yet, I sometimes have to hit the database with two SELECT
> > > statements, one with OFFSET and LIMIT to get the page of results I need
> > > and another to get the amount of total rows so I can estimate how many
> > > pages of results are available. The point I am driving at here is that
> > > since building a list of pages of results is such a common thing to do,
> > > there need to be some specific high speed ways to do this in one query.
> > > Maybe an estimate(*) that works like count but gives an answer from the
> > > index without checking visibility? I am sure that this would be good
> > > enough to make a page list, it is really no big deal if it errors on
> the
> > > positive side, maybe the list of pages has an extra page off the end. I
> > > can live with that. What I can't live with is taking 13 seconds to get
> a
> > > page of results from 850,000 rows in a table.
> >
> > 99% of the time in the situations you don't need an exact measure, and
> > assuming analyze has run recently, select rel_tuples from pg_class for
> > a given table is more than close enough. I'm sure wrapping that in a
> > simple estimated_rows() function would be easy enough to do.
>
> This is a very good approach and it works very well when you are counting
> the
> entire table, but when you have no control over the WHERE clause, it
> doesn't
> help. IE: someone puts in a word to look for in a web form.
>
> From my perspective, this issue is the biggest problem there is when using
> Postgres to create web pages, and it is so commonly used, I think that
> there
> should be a specific way to deal with it so that you don't have to run the
> same WHERE clause twice.
> IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total amount of
> items to make page navigation links, then:
> SELECT <columns> FROM table WHERE <clause> LIMIT <items_per_page> OFFSET
> <(page_no-1)*items_per_page>; to get the actual page contents.
>
How about:

select * from (select *, count(*) over () as total_count from <table> where
<clause>) a LIMIT <items_per_page> OFFSET <(page_no-1)*items_per_page>

It will return a total_count column with the same value in each row. You may
have problems if no rows are returned (e.g. the page number is too high).
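
For instance, with hypothetical names filled in (note that window
functions such as count(*) OVER () require PostgreSQL 8.4 or later):

SELECT *
FROM (SELECT t.*, count(*) OVER () AS total_count
      FROM items t
      WHERE title ILIKE '%widget%'
      ORDER BY id) a
LIMIT 20 OFFSET 40;  -- page 3, at 20 items per page

The window count is computed over the full result set before LIMIT and
OFFSET are applied, so the page rows and the total arrive in one round
trip.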
--
Best regards,
Vitalii Tymchyshyn


From: Reid Thompson <reid(dot)thompson(at)ateb(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 15:02:32
Message-ID: 4CB1D588.4080503@ateb.com
Lists: pgsql-hackers pgsql-performance

On 10/10/2010 6:29 AM, Neil Whelchel wrote:
> On Saturday 09 October 2010 23:56:15 Craig Ringer wrote:
>> On 10/10/2010 11:02 AM, Neil Whelchel wrote:
>>> On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
>>>> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com>
>>> wrote:
>>>>> I know that there haven been many discussions on the slowness of
>>>>> count(*) even when an index is involved because the visibility of the
>>>>> rows has to be checked. In the past I have seen many suggestions about
>>>>> using triggers and tables to keep track of counts and while this works
>>>>> fine in a situation where you know what the report is going to be
>>>>> ahead of time, this is simply not an option when an unknown WHERE
>>>>> clause is to be used (dynamically generated). I ran into a fine
>>>>> example of this when I was searching this mailing list, "Searching in
>>>>> 856,646 pages took 13.48202 seconds. Site search powered by PostgreSQL
>>>>> 8.3." Obviously at some point count(*) came into play here because the
>>>>> site made a list of pages (1 2 3 4 5 6> next). I very commonly make a
>>>>> list of pages from search results, and the biggest time killer here is
>>>>> the count(*) portion, even worse yet, I sometimes have to hit the
>>>>> database with two SELECT statements, one with OFFSET and LIMIT to get
>>>>> the page of results I need and another to get the amount of total rows
>>>>> so I can estimate how many pages of results are available. The point I
>>>>> am driving at here is that since building a list of pages of results
>>>>> is such a common thing to do, there need to be some specific high
>>>>> speed ways to do this in one query. Maybe an estimate(*) that works
>>>>> like count but gives an answer from the index without checking
>>>>> visibility? I am sure that this would be good enough to make a page
>>>>> list, it is really no big deal if it errors on the positive side,
>>>>> maybe the list of pages has an extra page off the end. I can live with
>>>>> that. What I can't live with is taking 13 seconds to get a page of
>>>>> results from 850,000 rows in a table.
> Good point, maybe this is turning more into a discussion of how to generate a
> list of pages of results and one page of results with one query so we don't
> have to do the same painfully slow query twice to do a very common task.
>
> On the other hand, I copied a table out of one of my production servers that
> has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
> text). The first numeric column has numbers evenly spread between 0 and 100
> and it is indexed. I put the table in a pair of database servers both running
> on the same physical hardware. One server is Postgres, the other is a popular
> server (I am not mentioning names here). on Postgres: SELECT count(*) FROM
> table where column>50; takes about 8 seconds to run. The other database server
> took less than one second (about 25 ms) as it is using the index (I assume) to
> come up with the results. It is true that this is not a fair test because both
> servers were tested with their default settings, and the defaults for Postgres
> are much more conservative, however, I don't think that any amount of settings
> tweaking will bring them even in the same ball park. There has been discussion
> about the other server returning an incorrect count because all of the indexed
> rows may not be live at the time. This is not a problem for the intended use,
> that is why I suggested another function like estimate(*). It's name suggests
> that the result will be close, not 100% correct, which is plenty good enough
> for generating a list of results pages in most cases. I am faced with a very
> serious problem here. If the query to make a list of pages takes say 6 seconds
> and it takes another 6 seconds to generate a page of results, the customer is
> waiting 12 seconds. This is not going to work. If count made a quick estimate,
> say less than a second, and it took 6 seconds to come up with the actual
> results, I could live with that. Or if coming up with the window of results
> via (OFFSET and LIMIT) and returned the total number of rows that would have
> matched the query, then I would still have everything I need to render the
> page in a reasonable time. I really think that this needs to be addressed
> somewhere. It's not like I am the only one that does this. You see it nearly
> everywhere a long list of results is (expected to be) returned in a web site.
> Among the people I work with, this seems to be the most mentioned reason that
> they claim that they don't use Postgres for their projects.
>
> It would be nice to see how the server comes up with the search results and
> list of links to pages of results for this mailing list.
> (http://search.postgresql.org/search?q=slow+count%28%29&m=1&l=&d=365&s=r) I am
> guessing that it probably uses the count and query method I am talking about.
>
>> I guess there could be *some* way to expose the query planner's cost
>> estimates in a manner useful for result count estimation ... but given
>> how coarse its stats are and how wildly out the estimates can be, I kind
>> of doubt it. It's really intended for query planning decisions and more
>> interested in orders of magnitude, "0, 1, or more than that" measures,
>> etc, and seems to consider 30% here or there to be pretty insignificant
>> most of the time.
>>
>>> It's bad enough that count(*) is slow, then you have to do it all over
>>> again to get the results you need! I have not dug into this much yet,
>>> but would it be possible to return the amount of rows that a WHERE
>>> clause would actually return if the LIMIT and OFFSET were not applied.
>>> IE: When a normal query is executed, the server returns the number of
>>> rows aside from the actual row data. Would it be a big deal to modify
>>> this to allow it to return the amount of rows before the LIMIT and
>>> OFFSET is applied as well?
>> It'd force the server to fully execute the query. Then again, it sounds
>> like you're doing that anyway.
How big is your DB?
How fast is your disk access?
Any chance disks/RAM can be addressed?

My disk access is pitiful...
first run, 2.3 million rows.. 0m35.38s, subsequent runs.. real 0m2.55s

rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
count
---------
2340704
(1 row)

real 0m35.38s
user 0m0.25s
sys 0m0.03s

subsequent runs.... (count changes due to inserts.)

rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
count
---------
2363707
(1 row)

real 0m2.70s
user 0m0.27s
sys 0m0.02s
rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
count
---------
2363707
(1 row)

real 0m2.55s
user 0m0.26s
sys 0m0.02s
rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
count
---------
2363707
(1 row)

real 0m2.50s
user 0m0.26s
sys 0m0.02s

reporting=# SELECT pg_size_pretty(pg_total_relation_size('my_production_table'));
pg_size_pretty
----------------
1890 MB
(1 row)


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 15:30:09
Message-ID: 4CB1DC01.4000409@postnewspapers.com.au
Lists: pgsql-hackers pgsql-performance

On 10/10/2010 6:29 PM, Neil Whelchel wrote:
> On the other hand, I copied a table out of one of my production servers that
> has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
> text). The first numeric column has numbers evenly spread between 0 and 100
> and it is indexed. I put the table in a pair of database servers both running
> on the same physical hardware. One server is Postgres, the other is a popular
> server (I am not mentioning names here).

Please do. Your comment is pretty meaningless otherwise.

If you're talking about MySQL: Were you using InnoDB or MyISAM table
storage? Of course it's fast with MyISAM: it relies on locks to do
updates and has bugger all capability for write concurrency, or to
permit readers while writing is going on.

If you're using InnoDB, then I'd like to know how they've managed that.

If you're talking about some *other* database, please name it and
provide any useful details, because the hand waving is not helpful.

> I don't think that any amount of settings
> tweaking will bring them even in the same ball park.

If you are, in fact, comparing MySQL+MyISAM and PostgreSQL, then you're
quite right. Pg will never have such a fast count() as MyISAM does or
the same insanely fast read performance, and MyISAM will never be as
reliable, robust or concurrency-friendly as Pg is. Take your pick, you
can't have both.

> There has been discussion
> about the other server returning an incorrect count because all of the indexed
> rows may not be live at the time. This is not a problem for the intended use,
> that is why I suggested another function like estimate(*). It's name suggests
> that the result will be close, not 100% correct, which is plenty good enough
> for generating a list of results pages in most cases.

Do you have any practical suggestions for generating such an estimate,
though? I find it hard to think of any way the server can do that
doesn't involve executing the query. The table stats are WAY too general
and a bit hit-and-miss, and there isn't really any other way to do it.

If all you want is a way to retrieve both a subset of results AND a
count of how many results would've been generated, it sounds like all
you really need is a way to get the total number of results returned by
a cursor query, which isn't a big engineering challenge. I expect that
in current Pg versions a trivial PL/PgSQL function could be used to
slurp and discard unwanted results, but a better in-server option to
count the results from a cursor query would certainly be nice.
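
Something like this rough sketch (the function name is hypothetical; it
still executes the full query server-side, so it pays the full execution
cost, but it avoids shipping all the rows to the client):

CREATE OR REPLACE FUNCTION query_row_count(q text) RETURNS bigint AS $$
DECLARE
    r record;
    n bigint := 0;
BEGIN
    -- Run the query and count the rows without returning them.
    FOR r IN EXECUTE q LOOP
        n := n + 1;
    END LOOP;
    RETURN n;
END;
$$ LANGUAGE plpgsql;

-- Usage:
--   SELECT query_row_count('SELECT * FROM some_table WHERE col > 50');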

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 17:14:22
Message-ID: 4CB1F46E.80205@vmsinfo.com
Lists: pgsql-hackers pgsql-performance

Craig Ringer wrote:
> On 10/10/2010 9:54 AM, Mladen Gogala wrote:
>
>
>> Unfortunately, the problem is in the rather primitive way PostgreSQL
>> does I/O. It didn't change in 9.0 so there is nothing you could gain by
>> upgrading. If you execute strace -o /tmp/pg.out -e read <PID of the
>> sequential scan process> and inspect the file /tmp/pg.out when the query
>> finishes, you will notice a gazillion of read requests, all of them 8192
>> bytes in size. That means that PostgreSQL is reading the table block by
>> block, without any merging of the requests.
>>
>
> I'd be really interested in any measurements you've done to determine
> the cost of this over doing reads in larger chunks. If they're properly
> detailed and thought out, the -hackers list is likely to be interested
> as well.
>
I can provide measurements, but from Oracle RDBMS. Postgres doesn't
allow tuning of that aspect, so no measurement can be done. Would the
numbers from Oracle RDBMS be acceptable?

> The Linux kernel, at least, does request merging (and splitting, and
> merging, and more splitting) along the request path, and I'd personally
> expect that most of the cost of 8k requests would be in the increased
> number of system calls, buffer copies, etc required. Measurements
> demonstrating or contradicting this would be good to see.
>

Even the cost of hundreds of thousands of context switches is far from
negligible. What kind of measurements do you expect me to do with a
database which doesn't support tweaking that aspect of its operation?

> It's worth being aware that there are memory costs to doing larger
> reads, especially when you have many backends each of which want to
> allocate a larger buffer for reading.

Oh, it's not only the larger memory: the buffer management would have to
be changed too, to prevent a process doing a sequential scan from
inundating the shared buffers. Alternatively, the blocks would have to
be read into private memory and thrown away immediately after that.
However, the experience with Oracle tells me that this is well worth it.
Here are the numbers:

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> show parameter db_file_multi

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_file_multiblock_read_count        integer     16
SQL> alter session set db_file_multiblock_read_count=1;

Session altered.
SQL> select count(*) from ni_occurrence;

COUNT(*)
----------
402062638

Elapsed: 00:08:20.88
SQL> alter session set db_file_multiblock_read_count=128;

Session altered.

Elapsed: 00:00:00.50
SQL> select count(*) from ni_occurrence;

COUNT(*)
----------
402062638

Elapsed: 00:02:17.58

In other words, when I batched the sequential scan to do 128-block I/O, it
was almost 4 times faster (2:17.58 vs. 8:20.88) than when I did single-block
I/O. Does that provide enough evidence and, if not, why not?

> If you can use a chunk of
> shared_buffers as the direct destination for the read that's OK, but
> otherwise you're up for (1mb-8kb)*num_backends extra memory use on I/O
> buffers that could otherwise be used as shared_buffers or OS cache.
>
> Async I/O, too, has costs.
>

There is a common platitude that says that there is no such thing as a
free lunch. However, both Oracle RDBMS and IBM DB2 use asynchronous I/O,
probably because they're unaware of the danger. Let me now give you a
full table scan of a much smaller table located in a Postgres database:

news=> select count(*) from internet_web_sites;
count
---------
1290133
(1 row)

Time: 12838.958 ms

Oracle counts 400 million records in 2 minutes, while Postgres 9.0.1 takes
12.8 seconds to count 1.2 million records? Do you see the disparity?

Both databases, Oracle and Postgres, are utilizing the same 3Par SAN
device, the machines housing both databases are comparable HP 64 bit
Linux machines, both running 64 bit version of Red Hat 5.5. Respective
table sizes are here:

SQL> select bytes/1048576 as MB from user_segments
2 where segment_name='NI_OCCURRENCE';

MB
----------
35329

news=> select pg_size_pretty(pg_table_size('moreover.internet_web_sites'));
pg_size_pretty
----------------
216 MB
(1 row)

So, I really pushed Oracle much harder than I pushed Postgres.

>> PostgreSQL is in
>> dire need of something similar and it wouldn't even be that hard to
>> implement.
>
> I'd really like to see both those assertions backed with data or patches ;-)
>

With a database that doesn't allow tuning of that aspect, it's a
self-defeating proposition. However, I did my best to give you the numbers.

> Personally, I know just enough about how PG's I/O path works to suspect
> that "not that hard to implement" is probably a little ...
> over-optimistic. Sure, it's not that hard to implement in a new program
> with no wired-in architectural and design choices; that doesn't mean
> it's easy to retrofit onto existing code, especially a bunch of
> co-operating processes with their own buffer management.
>
>
It may be so, but the slow sequential scan is still the largest single
performance problem of PostgreSQL. The frequency with which that topic
appears on the mailing lists should serve as good evidence for that. I
did my best to prove my case. Again, requiring "hard numbers" when
using a database which doesn't allow tweaking of the I/O size is a
self-defeating proposition. Other databases, like DB2 and Oracle, both
allow tweaking of that aspect of their operation, Oracle even on a
per-session basis. If you still claim that it wouldn't make a
difference, the onus to prove it is on you.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Reid Thompson <reid(dot)thompson(at)ateb(dot)com>
To: reid(dot)thompson(at)ateb(dot)com
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 18:33:04
Message-ID: 4CB206E0.6080707@ateb.com
Lists: pgsql-hackers pgsql-performance

On 10/10/2010 11:02 AM, Reid Thompson wrote:
>>>> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com>
>>>>
>> On the other hand, I copied a table out of one of my production servers that
>> has about 60,000 rows with 6 columns (numeric, numeric, bool, bool, timestamp,
>> text). The first numeric column has numbers evenly spread between 0 and 100
>> and it is indexed. I put the table in a pair of database servers both running
>> on the same physical hardware. One server is Postgres, the other is a popular
>> server (I am not mentioning names here). on Postgres: SELECT count(*) FROM
>> table where column>50; takes about 8 seconds to run. The other database server
>> took less than one second (about 25 ms) as it is using the index (I assume) to
>> come up with the results. It is true that this is not a fair test because both
>> servers were tested with their default settings, and the defaults for Postgres
>> are much more conservative, however, I don't think that any amount of settings
>> tweaking will bring them even in the same ball park. There has been discussion
>> about the other server returning an incorrect count because all of the indexed
>> rows may not be live at the time. This is not a problem for the intended use,
>> that is why I suggested another function like estimate(*). It's name suggests
>> that the result will be close, not 100% correct, which is plenty good enough
>> for generating a list of results pages in most cases. I am faced with a very
>> serious problem here. If the query to make a list of pages takes say 6 seconds
>> and it takes another 6 seconds to generate a page of results, the customer is
>> waiting 12 seconds. This is not going to work. If count made a quick estimate,
>> say less than a second, and it took 6 seconds to come up with the actual
>> results, I could live with that. Or if coming up with the window of results
>> via (OFFSET and LIMIT) and returned the total number of rows that would have
>> matched the query, then I would still have everything I need to render the
>> page in a reasonable time. I really think that this needs to be addressed
>> somewhere. It's not like I am the only one that does this. You see it nearly
>> everywhere a long list of results is (expected to be) returned in a web site.
>> Among the people I work with, this seems to be the most mentioned reason that
>> they claim that they don't use Postgres for their projects.
>
> How big is your DB?
> How fast is your disk access?
> Any chance disks/RAM can be addressed?
>
> My disk access is pitiful...
> first run, 2.3 million rows.. 0m35.38s, subsequent runs.. real 0m2.55s
>
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2340704
> (1 row)
>
>
> real 0m35.38s
> user 0m0.25s
> sys 0m0.03s
>
> subsequent runs.... (count changes due to inserts.)
>
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2363707
> (1 row)
>
>
> real 0m2.70s
> user 0m0.27s
> sys 0m0.02s
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2363707
> (1 row)
>
>
> real 0m2.55s
> user 0m0.26s
> sys 0m0.02s
> rthompso(at)hw-prod-repdb1> time psql -c "select count(*) from my_production_table" reporting
> count
> ---------
> 2363707
> (1 row)
>
>
> real 0m2.50s
> user 0m0.26s
> sys 0m0.02s
>
> reporting=# SELECT pg_size_pretty(pg_total_relation_size('my_production_table'));
> pg_size_pretty
> ----------------
> 1890 MB
> (1 row)
>
>
Forgot to note: my table's schema is significantly larger.

rthompso(at)hw-prod-repdb1> time psql -c "\d my_production_table_201010" reporting
Table "public.my_production_table_201010"
Column | Type | Modifiers
-----------------------------+-----------------------------+----------------------------------------------------------------
| integer | not null default nextval('my_production_table_parent_id_seq'::regclass)
| character varying(20) |
| character(1) |
| character varying(32) |
| character varying(32) |
| character varying(20) |
| character varying(5) |
| character varying(5) |
| date |
| character(1) |
| character varying(32) |
| character varying(32) |
| character varying(32) |
| character varying(2) |
| character varying(10) |
| character varying(10) |
| character varying(32) |
| character varying(7) |
| character varying(10) |
| character varying(2) |
| character varying(9) |
| character varying(9) |
| character varying(9) |
| character varying(10) |
| character varying(32) |
| character varying(32) |
| character varying(20) |
| character varying(5) |
| character varying(5) |
| character varying(32) |
| character varying(32) |
| character varying(32) |
| character varying(2) |
| character varying(10) |
| character varying(10) |
| character varying(10) |
| character varying(10) |
| integer |
| character varying(2) |
| character varying(32) |
| character varying(32) |
| integer |
| integer |
| text |
| character varying(3) |
| date |
| date |
| date |
| integer |
| integer |
| integer |
| integer |
| character varying(6) |
| character varying(10) |
| character varying(32) |
| character varying(32) |
| character varying(32) |
| character varying(10) |
| character varying(6) |
| character varying(8) |
| boolean |
| character(1) |
| date |
| integer |
| date |
| character varying(11) |
| character varying(4) |
| character(1) |
| date |
| character varying(5) |
| character varying(20) |
| date |
| character(1) |
| character(1) |
| character varying(2) |
| text |
| integer |
| integer |
| timestamp without time zone | default now()
| timestamp without time zone |
| character varying(64) |
| character varying(64) |
| character varying(64) |
Indexes:
"my_production_table_201010_pkey" PRIMARY KEY, btree (id)
"my_production_table_201010_date_idx" btree (xxxxdate), tablespace "indexspace"
"my_production_table_201010_epatient_idx" btree (storeid, xxxxxxxxxxxxx), tablespace "indexspace"
"my_production_table_201010_medicationname_idx" btree (xxxxxxxxxxxxxx), tablespace "indexspace"
"my_production_table_201010_ndc_idx" btree (xxx), tablespace "indexspace"
Check constraints:
"my_production_table_201010_filldate_check" CHECK (xxxxdate >= '2010-10-01'::date AND xxxxdate <
'2010-11-01'::date)
Foreign-key constraints:
"my_production_table_201010_pkgfileid_fkey" FOREIGN KEY (pkgfileid) REFERENCES my_production_tablefiles(id)
Inherits: my_production_table_parent


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 21:59:48
Message-ID: 201010101459.49336.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

On Sunday 10 October 2010 05:02:03 Віталій Тимчишин wrote:
> 2010/10/10 Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
>
> > On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> > > On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> > > > [...]
> > >
> > > 99% of the time in these situations you don't need an exact measure, and
> > > assuming analyze has run recently, select reltuples from pg_class for
> > > a given table is more than close enough. I'm sure wrapping that in a
> > > simple estimated_rows() function would be easy enough to do.
> >
> > This is a very good approach and it works very well when you are counting
> > the entire table, but when you have no control over the WHERE clause, it
> > doesn't help. IE: someone puts in a word to look for in a web form.
> >
> > From my perspective, this issue is the biggest problem there is when
> > using Postgres to create web pages, and it is such a common task that I
> > think there should be a specific way to deal with it so that you don't
> > have to run the same WHERE clause twice.
> > IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total amount
> > of items to make page navigation links, then:
> > SELECT <columns> FROM <table> WHERE <clause> LIMIT <items_per_page> OFFSET
> > <(page_no-1)*items_per_page>; to get the actual page contents.
> >
> How about:
>
> select * from (select *, count(*) over () as total_count from <table> where
> <clause>) a LIMIT <items_per_page> OFFSET <(page_no-1)*items_per_page>
>
> It will return a total_count column with the same value in every row. You may
> have problems if no rows are returned (e.g. the page number is too high).

I have done this before, but the speedup over the two hits to the database
that I mentioned above is tiny, just a few ms. It seems to end up doing about
the same amount of work on the database end. The reason that I don't commonly
do this is what you said about not getting a count result if you run off the end.
-Neil-
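
For concreteness, a minimal sketch of the single-query window-function
pattern discussed above (it needs 8.4 or later; the table and column names
are placeholders):

SELECT *
FROM (SELECT t.*,
             count(*) OVER () AS total_count
      FROM items t
      WHERE description LIKE '%widget%') a
ORDER BY id
LIMIT 25 OFFSET 50;   -- page 3 at 25 rows per page

Every row returned carries the same total_count, computed before LIMIT and
OFFSET are applied, so one scan yields both the page and the total. As noted
above, if OFFSET runs past the last matching row, no rows (and therefore no
count) come back.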


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 22:41:16
Message-ID: 4CB2410C.6010603@postnewspapers.com.au
Lists: pgsql-hackers pgsql-performance

On 10/11/2010 01:14 AM, Mladen Gogala wrote:

> I can provide measurements, but from Oracle RDBMS. Postgres doesn't
> allow tuning of that aspect, so no measurement can be done. Would the
> numbers from Oracle RDBMS be acceptable?

Well, they'd tell me a lot about Oracle's performance as I/O chunk size
scales, but almost nothing about the cost of small I/O operations vs
larger ones in general.

Typically dedicated test programs that simulate the database read
patterns would be used for this sort of thing. I'd be surprised if
nobody on -hackers has already done suitable testing; I was mostly
asking because I was interested in how you were backing your assertions.

PostgreSQL isn't Oracle; their designs are in many ways very different.
Most importantly, Oracle uses a redo log, whereas PostgreSQL stores old
rows with visibility information directly in the tables. It is possible
that a larger proportion of Oracle's I/O costs are fixed per-block
overheads rather than per-byte costs, so it seeks to batch requests into
larger chunks. Of course, it's also possible that 8k chunk I/O is just
universally expensive and is something Pg should avoid too, but we can't
know that without dedicated testing, which I at least haven't done. I
don't follow -hackers closely, and wouldn't have seen discussion about
testing done there. The archives are likely to contain useful discussions.

Then again, IIRC Pg's page size is also its I/O size, so you could
actually get larger I/O chunking by rebuilding Pg with larger pages.
Having never had the need, I haven't examined the effect of page size
changes on I/O performance.

>> The Linux kernel, at least, does request merging (and splitting, and
>> merging, and more splitting) along the request path, and I'd
>> personally expect that most of the cost of 8k requests would be in the
>> increased number of system calls, buffer copies, etc required.
>> Measurements demonstrating or contradicting this would be good to see.
>
> Even the cost of hundreds of thousands of context switches is far from
> negligible. What kind of measurements do you expect me to do with the
> database which doesn't support tweaking of that aspect of its operation?

Test programs, or references to testing done by others that demonstrates
these costs in isolation. Of course, they still wouldn't show what gain
Pg might obtain (nothing except hacking on Pg's sources really will) but
they'd help measure the costs of doing I/O that way.

I suspect you're right that large I/O chunks would be desirable and a
potential performance improvement. What I'd like to know is *how*
*much*, or at least how much the current approach costs in pure
overheads like context switches and scheduler delays.

> Does that provide enough of an evidence and, if not, why not?

It shows that it helps Oracle a lot ;-)

Without isolating how much of that is raw costs of the block I/O and how
much is costs internal to Oracle, it's still hard to have much idea how
much it'd benefit Pg to take a similar approach.

I'm sure the folks on -hackers have been over this and know a whole lot
more about it than I do, though.

> Oracle counts 400 million records in 2 minutes and Postgres 9.01 takes
> 12.8 seconds to count 1.2 million records? Do you see the disparity?

Sure. What I don't know is how much of that is due to block sizes. There
are all sorts of areas where Oracle could be gaining.

> It may be so, but slow sequential scan is still the largest single
> performance problem of PostgreSQL. The frequency with which that topic
> appears on the mailing lists should serve as good evidence for that.

I'm certainly not arguing that it could use improvement; it's clearly
hurting some users. I just don't know if I/O chunking is the answer - I
suspect that if it were, then it would've become a priority for one or
more people working on Pg much sooner.

It's quite likely that it's one of those things where it makes a huge
difference for Oracle because Oracle has managed to optimize out most of
the other bigger costs. If Pg still has other areas that make I/O more
expensive per-byte (say, visibility checks) and low fixed per-block
costs, then there'd be little point in chunking I/O. My understanding is
that that's pretty much how things stand at the moment, but I'd love
verification from someone who's done the testing.

> If you still claim that it wouldn't make the difference,
> the onus to prove it is on you.

I didn't mean to claim that it would make no difference. If I sounded
like it, sorry.

I just want to know how _much_ , or more accurately how great the
overheads of the current approach in Pg are vs doing much larger reads.

--
Craig Ringer


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-10 22:50:22
Message-ID: AANLkTi=gbaApWtgDEwH0A_tyjVroqMzaGEaqvjPKpCpO@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Sun, Oct 10, 2010 at 12:14 PM, Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>
>
>
> In other words, when I batched the sequential scan to do 128 blocks I/O, it
> was 4 times faster than when I did the single block I/O.
> Does that provide enough of an evidence and, if not, why not?

These numbers tell us nothing because, unless you dropped the caches
between runs, at least part of some runs was very probably cached.

--
Jon


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: mladen(dot)gogala(at)vmsinfo(dot)com, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 00:27:53
Message-ID: 4cb25a0e.2a978e0a.7797.33d6@mx.google.com
Lists: pgsql-hackers pgsql-performance

On Mon, Oct 11, 2010 at 06:41:16AM +0800, Craig Ringer wrote:
> On 10/11/2010 01:14 AM, Mladen Gogala wrote:
>
>> I can provide measurements, but from Oracle RDBMS. Postgres doesn't
>> allow tuning of that aspect, so no measurement can be done. Would the
>> numbers from Oracle RDBMS be acceptable?
>
> Well, they'd tell me a lot about Oracle's performance as I/O chunk size
> scales, but almost nothing about the cost of small I/O operations vs
> larger ones in general.
>
> Typically dedicated test programs that simulate the database read
> patterns would be used for this sort of thing. I'd be surprised if
> nobody on -hackers has already done suitable testing; I was mostly
> asking because I was interested in how you were backing your assertions.

One thing a test program would have to take into account is multiple
concurrent users. What speeds up the single user case may well hurt the
multi user case, and the behaviors that hurt single user cases may have been
put in place on purpose to allow decent multi-user performance. Of course, all
of that is "might" and "maybe", and I can't prove any assertions about block
size either. But the fact of multiple users needs to be kept in mind.

It was asserted that reading bigger chunks would help performance; a response
suggested that, at least in Linux, setting readahead on a device would
essentially do the same thing. Or that's what I got from the thread, anyway.
I'm interested to know how similar performance might be between the large
block size case and the large readahead case. Comments, anyone?

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: mladen(dot)gogala(at)vmsinfo(dot)com, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 00:51:43
Message-ID: 4CB25F9F.4050204@postnewspapers.com.au
Lists: pgsql-hackers pgsql-performance

On 10/11/2010 08:27 AM, Joshua Tolley wrote:

> One thing a test program would have to take into account is multiple
> concurrent users. What speeds up the single user case may well hurt the
> multi user case, and the behaviors that hurt single user cases may have been
> put in place on purpose to allow decent multi-user performance. Of course, all
> of that is "might" and "maybe", and I can't prove any assertions about block
> size either. But the fact of multiple users needs to be kept in mind.

Agreed. I've put together a simple test program to test I/O chunk sizes.
It only tests single-user performance, but it'd be pretty trivial to
adapt it to spawn a couple of worker children or run several threads,
each with a suitable delay as it's rather uncommon to have a bunch of
seqscans all fire off at once.

From this test it's pretty clear that with buffered I/O of an uncached
700mb file under Linux, the I/O chunk size makes very little difference,
with all chunk sizes taking 9.8s to read the test file, with
near-identical CPU utilization. Caches were dropped between each test run.

For direct I/O (by ORing the O_DIRECT flag to the open() flags), chunk
size is *hugely* significant, with 4k chunk reads of the test file
taking 38s, 8k 22s, 16k 14s, 32k 10.8s, 64k - 1024k 9.8s, then rising a
little again over 1024k.

Apparently Oracle is almost always configured to use direct I/O, so it
would benefit massively from large chunk sizes. PostgreSQL is almost
never used with direct I/O, and at least in terms of the low-level costs
of syscalls and file system activity, shouldn't care at all about read
chunk sizes.

Bumping readahead from 256 to 8192 made no significant difference for
either case. Of course, I'm on a crappy laptop disk...

I'm guessing this is the origin of the OP's focus on I/O chunk sizes.

Anyway, for the single-seqscan case, I see little evidence here that
using a bigger read chunk size would help PostgreSQL reduce overheads or
improve performance.

OP: Is your Oracle instance using direct I/O?

--
Craig Ringer

Attachment Content-Type Size
testio.c text/plain 918 bytes
results_buffered_ra256 text/plain 1.1 KB
results_buffered_ra4096 text/plain 1.1 KB
results_odirect_ra256 text/plain 1016 bytes
results_odirect_ra4096 text/plain 1019 bytes

From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 03:14:43
Message-ID: 4CB28123.5080300@vmsinfo.com
Lists: pgsql-hackers pgsql-performance

On 10/10/2010 8:27 PM, Joshua Tolley wrote:
> It was asserted that reading bigger chunks would help performance; a response
> suggested that, at least in Linux, setting readahead on a device would
> essentially do the same thing. Or that's what I got from the thread, anyway.
> I'm interested to know how similar performance might be between the large
> block size case and the large readahead case. Comments, anyone?
>

Craig may be right: the fact that Oracle is doing direct I/O probably
does account for the difference. The fact remains, however, that questions
about slow sequential scans appear with some regularity on the PostgreSQL
forums. My guess that a larger chunk would be helpful may not be
correct, but I do believe that there is a problem with overly slow
sequential scans. Bigger chunks are a very traditional solution; they
may not work here, but the problem is still there.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 03:21:54
Message-ID: 4cb282d7.2405720a.5ca5.fffff05b@mx.google.com
Lists: pgsql-hackers pgsql-performance

On Sun, Oct 10, 2010 at 11:14:43PM -0400, Mladen Gogala wrote:
> The fact is, however, that the question
> about slow sequential scan appears with some regularity on PostgreSQL
> forums.

Definitely. Whether that's because there's something pathologically wrong with
sequential scans, or just because they're the slowest of the common
operations, remains to be seen. After all, if sequential scans were suddenly
fast, something else would be the slowest thing postgres commonly did.

All that said, if there's gain to be had by increasing block size, or
something else, esp. if it's low hanging fruit, w00t.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 04:11:58
Message-ID: 4CB28E8E.8060703@postnewspapers.com.au
Lists: pgsql-hackers pgsql-performance

On 11/10/10 11:14, Mladen Gogala wrote:
> On 10/10/2010 8:27 PM, Joshua Tolley wrote:
>> It was asserted that reading bigger chunks would help performance; a
>> response
>> suggested that, at least in Linux, setting readahead on a device would
>> essentially do the same thing. Or that's what I got from the thread,
>> anyway.
>> I'm interested to know how similar performance might be between the large
>> block size case and the large readahead case. Comments, anyone?
>>
>
> Craig may be right: the fact that Oracle is doing direct I/O probably
> does account for the difference. The fact remains, however, that questions
> about slow sequential scans appear with some regularity on the PostgreSQL
> forums. My guess that a larger chunk would be helpful may not be
> correct, but I do believe that there is a problem with overly slow
> sequential scans. Bigger chunks are a very traditional solution; they
> may not work here, but the problem is still there.

Now that, I agree with.

BTW, I casually looked into async I/O a little, and it seems the general
situation for async I/O on Linux is "pretty bloody awful". POSIX async
I/O uses signal-driven completion handlers - but signal queue depth
limits mean they aren't necessarily reliable, so you can get lost
completions and have to scan the event buffer periodically to catch
them. The alternative is running completion handlers in threads, but
apparently there are queue depth limit issues there too, as well as the
joy that is getting POSIX threading right. I think there was some talk
about this on -HACKERS a while ago. Here's the main discussion on async
I/O I've found:

http://archives.postgresql.org/pgsql-hackers/2006-10/msg00820.php

... from which it seems that async buffered I/O is poorly supported, if
at all, on current Linux kernels. Don't know about the BSDs. As Pg is
*really* poorly suited to direct I/O, relying on the OS buffer cache as
it does, unbuffered direct I/O isn't really an option.

Linux async I/O seems to be designed for network I/O and for monitoring
lots of files for changes, rather than for highly concurrent I/O on one
or a few files. It shows.

Re slow seqscans, there is still plenty of room to move:

- Sequential scans cannot (AFAIK) use the visibility map introduced in
8.4 to skip sections of tables that are known to contain only free space
or only dead tuples not visible to any transaction. This potential
optimization could make a big difference in tables with a low FILLFACTOR
or with holes created by certain update patterns.

- Covering indexes ("index oriented" table columns) and/or indexes with
embedded visibility information could dramatically improve the
performance of certain queries by eliminating the need to hit the heap
at all, albeit at the cost of trade-offs elsewhere. This would be
particularly useful for those classic count() queries. There have been
discussions about these on -hackers, but I'm not up with the current
status or lack thereof.

- There's been recent talk of using pread() rather than lseek() and
read() to save on syscall overhead. The difference is probably minimal,
but it'd be nice.

It is worth being aware of a few other factors:

- Sometimes seqscans are actually the fastest option, and people don't
realize this, so they try to force index use where it doesn't make
sense. This is the cause of a significant number of list complaints.

- Slow sequential scans are often a consequence of table bloat. It's
worth checking for this; a quick check is sketched after this message.
Pg's autovacuum and manual vacuum have improved in performance and
usability dramatically over time, but still have room to move. Sometimes
people disable autovacuum in the name of a short-lived performance boost,
not realizing it'll have horrible effects on performance in the mid- to
long term.

- Seqscans can be chosen when index scans are more appropriate if
random_page_cost and seq_page_cost aren't set sensibly, which they
usually aren't. This doesn't make seqscans any faster, but it's even
worse when you have a good index you're not using. I can't help but
wonder if a bundled "quick and dirty benchmark" tool for Pg would be
beneficial in helping to determine appropriate values for these settings
and for effective_io_concurrency.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/
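
As a rough sanity check for the bloat mentioned above (a sketch only; the
table name is a placeholder, and reltuples is just the estimate from the
last analyze):

SELECT relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS on_disk,
       reltuples::bigint AS est_live_rows,
       pg_relation_size(c.oid) / GREATEST(reltuples::bigint, 1)
           AS approx_bytes_per_row
FROM pg_class c
WHERE relname = 'items';

If approx_bytes_per_row is far larger than the table's rows could plausibly
be, the table is likely bloated; vacuum (or CLUSTER) and autovacuum tuning
are the usual remedies.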


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 04:15:56
Message-ID: 201010102115.58729.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

On Sunday 10 October 2010 15:41:16 you wrote:
> On 10/11/2010 01:14 AM, Mladen Gogala wrote:
> > I can provide measurements, but from Oracle RDBMS. Postgres doesn't
> > allow tuning of that aspect, so no measurement can be done. Would the
> > numbers from Oracle RDBMS be acceptable?
>
> Well, they'd tell me a lot about Oracle's performance as I/O chunk size
> scales, but almost nothing about the cost of small I/O operations vs
> larger ones in general.
>
> Typically dedicated test programs that simulate the database read
> patterns would be used for this sort of thing. I'd be surprised if
> nobody on -hackers has already done suitable testing; I was mostly
> asking because I was interested in how you were backing your assertions.
>
> PostgreSQL isn't Oracle; their design is in many ways very different.
> Most importantly, Oracle uses a redo log, where PostgreSQL stores old
> rows with visibility information directly in the tables. It is possible
> that a larger proportion of Oracle's I/O costs are fixed per-block
> overheads rather than per-byte costs, so it seeks to batch requests into
> larger chunks. Of course, it's also possible that 8k chunk I/O is just
> universally expensive and is something Pg should avoid, too, but we
> can't know that without
> dedicated testing, which I at least haven't done. I don't follow
> -hackers closely, and wouldn't have seen discussion about testing done
> there. The archives are likely to contain useful discussions.
>
> Then again, IIRC Pg's page size is also it's I/O size, so you could
> actually get larger I/O chunking by rebuilding Pg with larger pages.
> Having never had the need, I haven't examined the performance of page
> size changes on I/O performance.

This is a really good idea! I will look into doing this and I will post the
results as soon as I can get it done.

Right now, I am building a test machine with two dual-core Intel processors
and two 15K RPM mirrored hard drives, with 1 GB of RAM. I am using a small
amount of RAM because I will be using small test tables. I may do testing in
the future with more RAM and bigger tables, but I think I can accomplish what
we are all after with what I have. The machine will be limited to running the
database server under test, init, bash, and ssh; no other processes will be
running except for what is directly involved with testing. I will post exact
specs when I post test results. I will create some test tables, and the same
tables will be used in all tests. Suggestions for optimal Postgres and system
configuration are welcome. I will try any suggested settings that I have time
to test.
-Neil-


From: "Pierre C" <lists(at)peufeu(dot)com>
To: pgsql-performance(at)postgresql(dot)org, "Neil Whelchel" <neil(dot)whelchel(at)gmail(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 10:09:02
Message-ID: op.vkeo5cw2eorkce@apollo13
Lists: pgsql-hackers pgsql-performance


> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here

Well, tsearch full text search is excellent, but it has to work inside the
limits of the postgres database itself, which means row visibility checks,
and therefore, yes, extremely slow count(*) on large result sets when the
tables are not cached in RAM.

Also, if you want to use custom sorting (like by date, thread, etc)
possibly all the matching rows will have to be read and sorted.

Consider, for example, the Xapian full text search engine. It is not MVCC
(it is single writer, multiple reader, so only one process can update the
index at a time, but readers are not locked out during writes). Of course,
you would never want something like that for your main database! However,
in its particular application, which is multi-criteria full text search
(and flexible sorting of results), it just nukes tsearch2 on datasets not
cached in RAM, simply because everything in it, including the disk layout,
has been optimized for that application. Lucene is similar (but I have not
benchmarked it against tsearch2, so I can't tell).

So, if your full text search is a problem, just use Xapian. You can update
the Xapian index from a postgres trigger (using an independent process, or
simply, a plpython trigger using the python Xapian bindings). You can
query it using an extra process acting as a server, or you can write a
set-returning plpython function which performs Xapian searches, and you
can join the results to your tables.

> Pg will never have such a fast count() as MyISAM does or the same
> insanely fast read performance,

Benchmark it and you'll see: MyISAM is faster than Postgres for "small
simple selects" only if:
- pg doesn't use prepared queries (planning time takes longer than a
really simple select)
- myisam can use index-only access
- no one is writing to the myisam table at the moment, obviously

On equal grounds (ie, SELECT * FROM table WHERE pk = value) there is no
difference. The TCP/IP overhead is larger than the query anyway, and you
have to use unix sockets on both to get valid timings. Since by default on
localhost MySQL seems to use unix sockets and PG uses TCP/IP, PG appears 2x
slower, which is in fact not true.
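
For what it's worth, a minimal sketch of taking planning time out of such a
micro-benchmark with a prepared statement (the table and key names are
placeholders):

PREPARE get_row (int) AS
    SELECT * FROM items WHERE id = $1;

EXECUTE get_row(42);   -- parsing and planning happened once, at PREPARE time

Most client libraries expose the same facility through their
prepared-statement APIs.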


From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-11 17:46:17
Message-ID: 4CB34D69.6060108@emolecules.com
Lists: pgsql-hackers pgsql-performance

On 10/9/10 6:47 PM, Scott Marlowe wrote:
> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com> wrote:
>> [...]
>
> 99% of the time in these situations you don't need an exact measure, and
> assuming analyze has run recently, select reltuples from pg_class for
> a given table is more than close enough. I'm sure wrapping that in a
> simple estimated_rows() function would be easy enough to do.

First of all, it's not true. There are plenty of applications that need an exact answer. Second, even if it is only 1%, that means it's 1% of the queries, not 1% of people. Sooner or later a large fraction of developers will run into this. It's probably been the most-asked question I've seen on this forum in the four years I've been here. It's a real problem, and it needs a real solution.

I know it's a hard problem to solve, but can we stop hinting that those of us who have this problem are somehow being dense?

Thanks,
Craig


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-11 19:54:57
Message-ID: 201010111254.58479.neil.whelchel@gmail.com
Lists: pgsql-hackers pgsql-performance

On Monday 11 October 2010 10:46:17 Craig James wrote:
> On 10/9/10 6:47 PM, Scott Marlowe wrote:
> > On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> > > [...]
> >
> > 99% of the time in these situations you don't need an exact measure, and
> > assuming analyze has run recently, select reltuples from pg_class for
> > a given table is more than close enough. I'm sure wrapping that in a
> > simple estimated_rows() function would be easy enough to do.
>
> First of all, it's not true. There are plenty of applications that need an
> exact answer. Second, even if it is only 1%, that means it's 1% of the
> queries, not 1% of people. Sooner or later a large fraction of developers
> will run into this. It's probably been the most-asked question I've seen
> on this forum in the four years I've been here. It's a real problem, and
> it needs a real solution.
>
> I know it's a hard problem to solve, but can we stop hinting that those of
> us who have this problem are somehow being dense?
>
> Thanks,
> Craig

That is why I suggested an estimate(*) that works like (a faster) count(*)
except that it may be off a bit. I think that is what he was talking about
when he wrote this.

I don't think that anyone here is trying to cast any blame; we are just
pointing out that there is a real problem that involves what seems to be a
very common task, and it puts Postgres at a huge disadvantage to other
systems that can do the same thing in less time. There doesn't seem to be
any disagreement that count(*) is slower than it could be, due to MVCC and
other reasons, which is fine. However, at the business end of things, if a
slow count(*) makes a Postgres-driven website take, say, a minute to render
a web page, Postgres is completely useless if it can be replaced with a
database engine that can do the same thing in (much) less time. On my
servers, this is the major sticking point. There are so many people waiting
on count(*) that the server runs out of memory and is forced to stop
accepting more connections until some of the threads finish. This makes for
many unhappy customers.
When it comes to serving up web pages that contain a slice of a table with
links to other slices, knowing roughly how many slices there are is very
important. But I think we can all agree that the exact amount is not
make-or-break (even better if the estimate is a bit high), so an estimate(*)
function that takes some shortcuts here to get a much faster response (maybe
off a bit) would solve a huge problem.

What it all boils down to is webserver response time, and there are really
two things slowing it down more than should be needed.
So there are really two possible solutions, either of which would be a big
help:
1. A faster count(*), or something like my proposed estimate(*).
2. A way to get the total rows matched when using LIMIT and OFFSET, before
LIMIT and OFFSET are applied.

If you are making a web page that contains a few results out of many
possible results, you need two things for sure, which means that there are
really two problems with Postgres for doing this task.
1. You need to know (about) how many total rows there are. This requires a
hit to the database, which requires a scan of the table; there is no way to
do this faster than count(*) as far as I know.
2. You need a slice of the data, which requires another scan of the table
using the same WHERE clause as above. This seems like a total waste, because
we just did that scan, short of actually fetching the data.

Why do it twice? If there were a way to get a slice using OFFSET and LIMIT
and also get back the number of rows that matched before OFFSET and LIMIT
were applied, the scan could be done once. I think that this would improve
things and give Postgres an edge over other systems.

I hope this makes sense to at least one person in the right place. ;)

-Neil-
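
In the meantime, one way to approximate an estimate(*) for an arbitrary
WHERE clause is to scrape the planner's row estimate out of EXPLAIN. This is
only as accurate as the table's statistics, and the function below is a
sketch for illustration, not a built-in:

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS integer AS $$
DECLARE
    rec record;
    rows integer;
BEGIN
    -- Run EXPLAIN on the supplied query and pull the estimated row count
    -- of the top plan node out of the text output.
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql STRICT;

-- e.g. SELECT count_estimate('SELECT 1 FROM items WHERE col > 50');

Like reltuples, this answers in planning time rather than scan time, but
note that it can err in either direction, not only on the positive side.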


From: Mladen Gogala <mgogala(at)vmsinfo(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 20:58:37
Message-ID: 4CB37A7D.90502@vmsinfo.com
Lists: pgsql-hackers pgsql-performance

Neil Whelchel wrote:
>
>
> That is why I suggested an estimate(*) that works like (a faster) count(*)
> except that it may be off a bit. I think that is what he was talking about
> when he wrote this.
>
>
The main problem with "select count(*)" is that it gets seriously
misused. Using "select count(*)" to establish existence is bad both for
performance and for code readability.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-11 22:03:38
Message-ID: AANLkTimRuS-z3-8x7RpgqsQ9-rxVxWB3VyHG806UUyvR@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Mon, Oct 11, 2010 at 12:54 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>wrote:

>
> 2. You need a slice of the data which requires another scan to the table to
> get, and using the same WHERE clause as above. This seems like a total
> waste,
> because we just did that with the exception of actually fetching the data.
>
> Why do it twice when if there was a way to get a slice using OFFSET and
> LIMIT
> and get the amount of rows that matched before the OFFSET and LIMIT was
> applied you could do the scan once? I think that this would improve things
> and
> give Postgres an edge over other systems.
>
>
I'd go even farther with number 2 and suggest a form of offset/limit that
can either return the total count or accept a previously computed total as
an input and return it the same way. That would make the API even easier to
use, since you could keep re-using the number returned the first time
without changing the API you call depending upon context. Of course, you
could contrive to set that up via a stored proc relatively easily: do the
count(*) once, then append it to each row of the offset/limit query by
including it in the select statement. Let the proc optionally receive the
total as an input parameter; if it is not null, the count(*) block is
skipped. You'd incur the full cost of the table scan plus the offset/limit
query once, but then not for each and every page. Since the modified
offset/limit API you suggest would surely have to perform the table scan
once anyway, that solution really isn't giving much more value than
implementing it as a stored proc, other than the flexibility of executing
an arbitrary query. A modified offset/limit combined with the
count_estimate functionality would be very useful in this circumstance,
though - especially if the estimate would fall back to a full count when
the estimate is below a certain threshold. A 25% discrepancy when counting
millions of rows is a lot less of an issue than a 25% discrepancy when
counting 10 rows.

One issue with an estimation is that you must be certain that the estimate
>= actual count or else the app must always attempt to load the page BEYOND
the last page of the estimate in order to determine if the estimate must be
revised upward. Otherwise, you risk leaving rows out entirely. Probably ok
when returning search results. Not so much when displaying a list of
assets.


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-11 23:50:36
Message-ID: 4CB3A2CC.9070009@vmsinfo.com
Lists: pgsql-hackers pgsql-performance

On 10/11/2010 3:54 PM, Neil Whelchel wrote:
> 1. A faster count(*), or something like my proposed estimate(*).
> 2. A way to get the total rows matched when using LIMIT and OFFSET before
> LIMIT and OFFSET are applied.

The biggest single problem with "select count(*)" is that it is
seriously overused. People use that idiom to establish existence, which
usually leads to a performance disaster in the application using it,
unless the table has no more than a few hundred records. The SQL language,
of which PostgreSQL offers an excellent implementation, has offered the
[NOT] EXISTS clause since its inception in the Jurassic era. The problem
is with the sequential scan, not with counting. I'd even go as far as to
suggest that 99% of instances of the "select count(*)" idiom are probably
bad uses of the SQL language.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
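
To make the point concrete, with placeholder table and column names:

-- Counts every matching row just to compare the result with zero:
SELECT count(*) > 0 FROM orders WHERE customer_id = 42;

-- Stops at the first matching row:
SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42);

On a large table the difference between the two can be orders of magnitude.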


From: Scott Carey <scott(at)richrelevance(dot)com>
To: "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 02:02:51
Message-ID: 90E30938-1E5E-4DCA-A1FB-AEECEB76A70D@richrelevance.com
Lists: pgsql-hackers pgsql-performance


On Oct 10, 2010, at 10:14 AM, Mladen Gogala wrote:

>
> SQL> show parameter db_file_multi
>
> NAME TYPE VALUE
> ------------------------------------ -----------
> ------------------------------
> db_file_multiblock_read_count integer 16
> SQL> alter session set db_file_multiblock_read_count=1;
>
> Session altered.
> SQL> select count(*) from ni_occurrence;
>
> COUNT(*)
> ----------
> 402062638
>
> Elapsed: 00:08:20.88
> SQL> alter session set db_file_multiblock_read_count=128;
>
> Session altered.
>
> Elapsed: 00:00:00.50
> SQL> select count(*) from ni_occurrence;
>
> COUNT(*)
> ----------
> 402062638
>
> Elapsed: 00:02:17.58
>
>
> In other words, when I batched the sequential scan to do 128 blocks I/O,
> it was 4 times faster than when I did the single block I/O.
> Does that provide enough of an evidence and, if not, why not?
>

Did you tune the linux FS read-ahead first? You can get large gains by doing that if you are on ext3.
blockdev --setra 2048 <device>

would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer-lived postgres DB will get extreme
file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.

> It may be so, but slow sequential scan is still the largest single
> performance problem of PostgreSQL. The frequency with which that topic
> appears on the mailing lists should serve as good evidence for that. I
> did my best to prove my case.

I'm not sure it's all the I/O, however. It seems that Postgres uses a lot more CPU than other DBs to crack open a tuple and inspect it. Testing on unindexed tables with count(*), I can get between 200MB and 800MB per second off disk, max, with full CPU utilization (depending on the average tuple size and contents). This is on a disk array that can do 1200MB/sec. It always feels disappointing to not be able to max out the disk throughput on the simplest possible query.

> Again, requiring "hard numbers" when
> using the database which doesn't allow tweaking of the I/O size is self
> defeating proposition. The other databases, like DB2 and Oracle both
> allow tweaking of that aspect of its operation, Oracle even on the per
> session basis. If you still claim that it wouldn't make the difference,
> the onus to prove it is on you.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, mladen(dot)gogala(at)vmsinfo(dot)com, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 02:19:04
Message-ID: 4CB3C598.2010302@2ndquadrant.com
Lists: pgsql-hackers pgsql-performance

Joshua Tolley wrote:
> It was asserted that reading bigger chunks would help performance; a response
> suggested that, at least in Linux, setting readahead on a device would
> essentially do the same thing. Or that's what I got from the thread, anyway.
> I'm interested to know how similar performance might be between the large
> block size case and the large readahead case.

Large read-ahead addresses the complaint here (bulk reads are slow) just
fine, which is one reason why this whole topic isn't nearly as
interesting as claimed. Larger chunk sizes in theory will do the same
thing, but then you're guaranteed to be reading larger blocks than
necessary in all cases. The nice thing about a good adaptive read-ahead
is that it can target small blocks normally, and only kick into heavy
read-ahead mode when the I/O pattern justifies it.

This is a problem for the operating system to solve, and such solutions
out there are already good enough that PostgreSQL has little reason to
try and innovate in this area. I routinely see seq scan throughput
double on Linux just by tweaking read-ahead from the tiny defaults to a
sane value.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 02:21:04
Message-ID: BAE5A1D8-E236-4220-81F1-C01BF8DBA611@richrelevance.com
Lists: pgsql-hackers pgsql-performance


On Oct 11, 2010, at 7:02 PM, Scott Carey wrote:

>
> On Oct 10, 2010, at 10:14 AM, Mladen Gogala wrote:
>
>> [...]
>
> Did you tune the linux FS read-ahead first? You can get large gains by doing that if you are on ext3.
> blockdev --setra 2048 <device>
>
Scratch that, if you are using DirectIO, block read-ahead does nothing. The default is 128K for buffered I/O read-ahead.

> would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme
> file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.
>
>> It maybe so, but slow sequential scan is still the largest single
>> performance problem of PostgreSQL. The frequency with which that topic
>> appears on the mailing lists should serve as a good evidence for that. I
>> did my best to prove my case.
>
> I'm not sure its all the I/O however. It seems that Postgres uses a lot more CPU than other DB's to crack open a tuple and inspect it. Testing on unindexed tables with count(*) I can get between 200MB and 800MB per second off disk max with full cpu utilization (depending on the average tuple size and contents). This is on a disk array that can do 1200MB/sec. It always feels dissapointing to not be able to max out the disk throughput on the simplest possible query.
>
>> Again, requiring "hard numbers" when
>> using the database which doesn't allow tweaking of the I/O size is self
>> defeating proposition. The other databases, like DB2 and Oracle both
>> allow tweaking of that aspect of its operation, Oracle even on the per
>> session basis. If you still claim that it wouldn't make the difference,
>> the onus to prove it is on you.
>>
>> --
>> Mladen Gogala
>> Sr. Oracle DBA
>> 1500 Broadway
>> New York, NY 10036
>> (212) 329-5251
>> www.vmsinfo.com
>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org >> \"pgsql-performance(at)postgresql(dot)org\"" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 02:23:46
Message-ID: 4CB3C6B2.8050501@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/11/2010 10:02 PM, Scott Carey wrote:
> Did you tune the linux FS read-ahead first? You can get large gains by doing that if you are on ext3.
> blockdev --setra 2048<device>

Actually, I have blockdev --setra 32768

> would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme
> file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.

I agree, but I am afraid that after the demise of SGI, XFS isn't being
developed. The company adopted the policy of using only plain
vanilla Ext3, which is unfortunate, but I can't do much about it. There
is a lesson to be learned from the story of ReiserFS. I am aware of the
fact that Ext3 is a rather basic, block-oriented file system which doesn't
perform well when compared to HPFS, VxFS or JFS2 and has no notion of
extents, but I believe that I am stuck with it until the advent of
Ext4. BTW, there is no defragmenter for Ext4, not even on Ubuntu, which
is a rather bleeding-edge distribution.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 02:36:45
Message-ID: 201010111936.46207.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Monday 11 October 2010 16:50:36 you wrote:
> On 10/11/2010 3:54 PM, Neil Whelchel wrote:
> > 1. A faster count(*), or something like my proposed estimate(*).
> > 2. A way to get the total rows matched when using LIMIT and OFFSET before
> > LIMIT and OFFSET are applied.
>
> The biggest single problem with "select count(*)" is that it is
> seriously overused. People use that idiom to establish existence, which
> usually leads to a performance disaster in the application using it,
> unless the table has no more than a few hundred records. The SQL language, of
> which PostgreSQL offers an excellent implementation, has offered a [NOT]
> EXISTS clause since its inception in the Jurassic era. The problem is
> with the sequential scan, not with counting. I'd even go as far as to
> suggest that 99% of instances of the "select count(*)" idiom are probably
> bad use of the SQL language.

I agree, I have seen many very bad examples of using count(*). I will go so
far as to question the use of count(*) in my examples here. Is there a better
way to come up with a page list than using count(*)? What is the best method
to make a page of results and a list of links to other pages of results? Am I
barking up the wrong tree here?
-Neil-
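
A minimal sketch of both ideas against a hypothetical "items" table (all
names are illustrative only): an EXISTS test instead of count(*) for
existence checks, and a single query that returns one page of results
together with the total number of rows matched, using the count(*) OVER ()
window aggregate available since 8.4. The window form still visits every
matching row, but it avoids issuing the count as a second query:

-- existence: stops at the first matching row instead of counting them all
SELECT EXISTS (SELECT 1 FROM items WHERE owner_id = 42);

-- one page of results plus the total matched, in a single query
SELECT *, count(*) OVER () AS total_rows
  FROM items
 WHERE title LIKE 'foo%'
 ORDER BY id
 LIMIT 50 OFFSET 100;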


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 03:42:42
Message-ID: 201010112042.43504.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Monday 11 October 2010 19:23:46 Mladen Gogala wrote:
> On 10/11/2010 10:02 PM, Scott Carey wrote:
> > Did you tune the linux FS read-ahead first? You can get large gains by
> > doing that if you are on ext3. blockdev --setra 2048<device>
>
> Actually, I have blockdev --setra 32768
>
> > would give you a 1MB read-ahead. Also, consider XFS and its built-in
> > defragmentation. I have found that a longer lived postgres DB will get
> > extreme file fragmentation over time and sequential scans end up mostly
> > random. On-line file defrag helps tremendously.
>
> I agree, but I am afraid that after the demise of SGI, XFS isn't being
> developed. The company adopted the policy of using only plain
> vanilla Ext3, which is unfortunate, but I can't do much about it. There
> is a lesson to be learned from the story of ReiserFS. I am aware of the
> fact that Ext3 is a rather basic, block-oriented file system which doesn't
> perform well when compared to HPFS, VxFS or JFS2 and has no notion of
> extents, but I believe that I am stuck with it until the advent of
> Ext4. BTW, there is no defragmenter for Ext4, not even on Ubuntu, which
> is a rather bleeding-edge distribution.

When it comes to a database that has many modifications to its tables, it
seems that XFS pulls way ahead of other filesystems, likely because of its
on-line defragmentation among other reasons. I am also not convinced that
XFS is unmaintained. The people at xfs.org seem to be making steady
progress and high quality updates. I have been using it for some time now (9+
years), and as it is, it does everything I need it to do, and it is very
reliable. I really can't see anything changing in the next few years that
would affect its usability as a filesystem for Postgres, so until something
proves to be better, I can't understand why you wouldn't want to use it,
maintained or not.
-Neil-


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, mladen(dot)gogala(at)vmsinfo(dot)com, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 03:58:45
Message-ID: AANLkTi=WE-CsGVC3j4CX9ddGGsRZ3Mmk6aJGKk4NT=g6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Mon, Oct 11, 2010 at 7:19 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>
>
> This is a problem for the operating system to solve, and such solutions out
> there are already good enough that PostgreSQL has little reason to try and
> innovate in this area. I routinely see seq scan throughput double on Linux
> just by tweaking read-ahead from the tiny defaults to a sane value.
>

I spent some time going through the various tuning docs on the wiki while
bringing some new hardware up and I can't remember seeing any discussion of
tweaking read-ahead at all in the normal performance-tuning references. Do
you have any documentation of the kinds of tweaking you have done and its
effects on different types of workloads?


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 04:06:07
Message-ID: E2D86063-1FD0-48D0-BB0D-0E78F4A77FD2@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

I can't speak to documentation, but it is something that helps as your I/O subsystem gets more powerful, and how much it helps depends more on your hardware, which may have adaptive read-ahead on its own, and your file system, which may be more or less efficient at sequential I/O. For example, ext3 out of the box gets a much bigger gain from tuning read-ahead than XFS on a Dell PERC6 RAID card (but still ends up slower).

Linux read-ahead has no effect on random access performance. A workload consisting of mixed sequential scans and random reads can be tuned to favor one over the other based on a combination of the I/O scheduler used and the amount of read-ahead. Larger read-ahead helps sequential scans, and the Deadline scheduler tends to favor throughput (sequential scans) over latency (random access) compared to the cfq scheduler.

On Oct 11, 2010, at 8:58 PM, Samuel Gendler wrote:

On Mon, Oct 11, 2010 at 7:19 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

This is a problem for the operating system to solve, and such solutions out there are already good enough that PostgreSQL has little reason to try and innovate in this area. I routinely see seq scan throughput double on Linux just by tweaking read-ahead from the tiny defaults to a sane value.

I spent some time going through the various tuning docs on the wiki while bringing some new hardware up and I can't remember seeing any discussion of tweaking read-ahead at all in the normal performance-tuning references. Do you have any documentation of the kinds of tweaking you have done and its effects on different types of workloads?


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 04:21:59
Message-ID: AANLkTimjsbV6W5JcMKs02b1wJLhhXaxHxq7=mURY50nG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey <scott(at)richrelevance(dot)com>wrote:

> I can't speak to documentation, but it is something that helps as your I/O
> subsystem gets more powerful, and how much it helps depends more on your
> hardware, which may have adaptive read ahead on its own, and your file
> system which may be more or less efficient at sequential I/O. For example
> ext3 out of the box gets a much bigger gain from tuning read-ahead than XFS
> on a DELL PERC6 RAID card (but still ends up slower).
>
>
Geez. I wish someone would have written something quite so bold as 'xfs is
always faster than ext3' in the standard tuning docs. I couldn't find
anything that made a strong filesystem recommendation. How does xfs compare
to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected
hardware failure on a production system caused my test system to get thrown
into production before I could do any serious testing of xfs. If there is a
strong consensus that xfs is simply better, I could afford the downtime to
switch.

As it happens, this is a system where all of the heavy workload is in the
form of sequential scan type load. The OLTP workload is very minimal (tens
of queries per minute on a small number of small tables), but there are a
lot of reporting queries that wind up doing sequential scans of large
partitions (millions to tens of millions of rows). We've sized the new
hardware so that the most commonly used partitions fit into memory, but if
we could speed the queries that touch less frequently used partitions, that
would be good. I'm the closest thing our team has to a DBA, which really
only means that I'm the one person on the dev team or the ops team to have
read all of the postgres docs and wiki and the mailing lists. I claim no
actual DBA experience or expertise and have limited cycles to devote to
tuning and testing, so if there is an established wisdom for filesystem
choice and read ahead tuning, I'd be very interested in hearing it.


From: david(at)lang(dot)hm
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 04:35:25
Message-ID: alpine.DEB.2.00.1010112130540.21889@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Mon, 11 Oct 2010, Samuel Gendler wrote:

> On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey <scott(at)richrelevance(dot)com>wrote:
>
>> I can't speak to documentation, but it is something that helps as your I/O
>> subsystem gets more powerful, and how much it helps depends more on your
>> hardware, which may have adaptive read ahead on its own, and your file
>> system which may be more or less efficient at sequential I/O. For example
>> ext3 out of the box gets a much bigger gain from tuning read-ahead than XFS
>> on a DELL PERC6 RAID card (but still ends up slower).
>>
>>
> Geez. I wish someone would have written something quite so bold as 'xfs is
> always faster than ext3' in the standard tuning docs. I couldn't find
> anything that made a strong filesystem recommendation. How does xfs compare
> to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected
> hardware failure on a production system caused my test system to get thrown
> into production before I could do any serious testing of xfs. If there is a
> strong consensus that xfs is simply better, I could afford the downtime to
> switch.

unfortunately you are not going to get a clear opinion here.

ext3 has a long track record, and since it is the default, it gets a lot
of testing. it does have known issues.

xfs had problems on linux immediately after it was ported. It continues to
be improved and many people have been using it for years and trust it. XFS
does have a weakness in creating/deleting large numbers of small files.

ext4 is the new kid on the block. it claims good things, but it's so new
that many people don't trust it yet.

btrfs is the 'future of filesystems' that is supposed to be better than
anything else, but it's definitely not stable yet, and time will tell if
it really lives up to its promises.

and this is just on linux.

on BSD or solaris (or with out-of-kernel patches) you also have ZFS, which
some people swear by, and other people swear at.

David Lang

> As it happens, this is a system where all of the heavy workload is in the
> form of sequential scan type load. The OLTP workload is very minimal (tens
> of queries per minute on a small number of small tables), but there are a
> lot of reporting queries that wind up doing sequential scans of large
> partitions (millions to tens of millions of rows). We've sized the new
> hardware so that the most commonly used partitions fit into memory, but if
> we could speed the queries that touch less frequently used partitions, that
> would be good. I'm the closest thing our team has to a DBA, which really
> only means that I'm the one person on the dev team or the ops team to have
> read all of the postgres docs and wiki and the mailing lists. I claim no
> actual DBA experience or expertise and have limited cycles to devote to
> tuning and testing, so if there is an established wisdom for filesystem
> choice and read ahead tuning, I'd be very interested in hearing it.
>


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, "pgsql-performance(at)postgresql(dot)org >> \"pgsql-performance(at)postgresql(dot)org\"" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 06:32:09
Message-ID: 4CB400E9.401@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> I agree, but I am afraid that after the demise of SGI, XFS isn't being
> developed.

It's back to being well maintained again; see
http://blog.2ndquadrant.com/en/2010/04/the-return-of-xfs-on-linux.html
for some history here and why it's become relevant to RedHat in
particular recently.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, mladen(dot)gogala(at)vmsinfo(dot)com, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 06:39:32
Message-ID: 4CB402A4.9070109@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Samuel Gendler wrote:
> I spent some time going through the various tuning docs on the wiki
> while bringing some new hardware up and I can't remember seeing any
> discussion of tweaking read-ahead at all in the normal
> performance-tuning references. Do you have any documentation of the
> kinds of tweaking you have done and its effects on different types of
> workloads?

Much of my recent research has gone into the book you'll see plugged
below rather than the wiki. The basic idea of read-ahead tuning is that
bonnie++ sequential read results keep increasing as you raise the setting,
up to a point. Get to that point and stop, and you should be in good shape.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 07:56:13
Message-ID: 4CB4149D.8030803@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 11.10.10 20:46, Craig James wrote:
>
> First of all, it's not true. There are plenty of applications that
> need an exact answer. Second, even if it is only 1%, that means it's
> 1% of the queries, not 1% of people. Sooner or later a large fraction
> of developers will run into this. It's probably been the most-asked
> question I've seen on this forum in the four years I've been here.
> It's a real problem, and it needs a real solution.
>
> I know it's a hard problem to solve, but can we stop hinting that
> those of us who have this problem are somehow being dense?
>
BTW: There is a lot of talk about MVCC, but is the following solution possible:
1) Create a page information map that, for each page in the table, tells
you how many rows are within and whether any write (either successful or
not) was done to this page. This could even be two maps, to make the second
one really small (a bit per page) - so that it could be in memory most of
the time.
2) When you need to do a count(*) or an index check - first check whether
there were any writes to the page. If not - you can use the count information
from the page info/index data without going to the page itself.
3) Let vacuum clear the bit after freezing all the tuples in the page (am
I using terminology correctly?).

In this case all read-only (archive) data will have this bit off and
index/count(*) will be really fast.
Am I missing something?

Best regards, Vitalii Tymchyshyn.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 08:14:58
Message-ID: 4CB41902.1020503@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:

> BTW: There is a lot of talk about MVCC, but is the following solution possible:
> 1) Create a page information map that, for each page in the table, tells
> you how many rows are within and whether any write (either successful or
> not) was done to this page. This could even be two maps, to make the second
> one really small (a bit per page) - so that it could be in memory most of
> the time.
> 2) When you need to do a count(*) or an index check - first check whether
> there were any writes to the page. If not - you can use the count information
> from the page info/index data without going to the page itself.
> 3) Let vacuum clear the bit after freezing all the tuples in the page (am
> I using terminology correctly?).

Part of this already exists. It's called the visibility map, and is
present in 8.4 and above. It's not currently used for queries, but can
potentially be used to aid some kinds of query.

http://www.postgresql.org/docs/8.4/static/storage-vm.html

> In this case all read-only (archive) data will have this bit off and
> index/count(*) will be really fast.

A count with any joins or filter criteria would still have to scan all
pages with visible tuples in them. So the visibility map helps speed up
scanning of bloated tables, but doesn't provide a magical "fast count"
except in the utterly trivial "select count(*) from tablename;" case,
and can probably only be used for accurate results when there are no
read/write transactions currently open. Even if you kept a count of
tuples in each page along with the mvcc transaction ID information
required to determine for which transactions that count is valid, it'd
only be useful if you didn't have to do any condition checks, and it'd
be yet another thing to update with every insert/delete/update.

Perhaps for some users that'd be worth having, but it seems to me like
it'd have pretty narrow utility. I'm not sure that's the answer.

--
Craig Ringer


From: david(at)lang(dot)hm
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 08:22:39
Message-ID: alpine.DEB.2.00.1010120118030.21889@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, 12 Oct 2010, Craig Ringer wrote:

>
>> BTW: There is a lot of talk about MVCC, but is the following solution possible:
>> 1) Create a page information map that, for each page in the table, tells
>> you how many rows are within and whether any write (either successful or
>> not) was done to this page. This could even be two maps, to make the second
>> one really small (a bit per page) - so that it could be in memory most of
>> the time.
>> 2) When you need to do a count(*) or an index check - first check whether
>> there were any writes to the page. If not - you can use the count information
>> from the page info/index data without going to the page itself.
>> 3) Let vacuum clear the bit after freezing all the tuples in the page (am
>> I using terminology correctly?).
>
> Part of this already exists. It's called the visibility map, and is present
> in 8.4 and above. It's not currently used for queries, but can potentially be
> used to aid some kinds of query.
>
> http://www.postgresql.org/docs/8.4/static/storage-vm.html
>
>> In this case all read-only (archive) data will have this bit off and
>> index/count(*) will be really fast.
>
> A count with any joins or filter criteria would still have to scan all pages
> with visible tuples in them. So the visibility map helps speed up scanning of
> bloated tables, but doesn't provide a magical "fast count" except in the
> utterly trivial "select count(*) from tablename;" case, and can probably only
> be used for accurate results when there are no read/write transactions
> currently open. Even if you kept a count of tuples in each page along with
> the mvcc transaction ID information required to determine for which
> transactions that count is valid, it'd only be useful if you didn't have to
> do any condition checks, and it'd be yet another thing to update with every
> insert/delete/update.
>
> Perhaps for some users that'd be worth having, but it seems to me like it'd
> have pretty narrow utility. I'm not sure that's the answer.

from a PR point of view, speeding up the trivial count(*) case could be
worth it, just to avoid people complaining about it not being fast.

in the case where you are doing a count(*) query and the where clause is on
an indexed column, could the search just look at the index + the
visibility map rather than doing a sequential scan through the
table?

as for your worries about the accuracy of a visibility-based count in the
face of other transactions, wouldn't you run into the same issues if you
are doing a sequential scan with the same transactions in process?

David Lang


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 08:34:22
Message-ID: 4CB41D8E.2010302@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 12.10.10 11:14, Craig Ringer wrote:
> On 10/12/2010 03:56 PM, Vitalii Tymchyshyn wrote:
>
>> BTW: There is a lot of talk about MVCC, but is the following solution possible:
>> 1) Create a page information map that, for each page in the table, tells
>> you how many rows are within and whether any write (either successful or
>> not) was done to this page. This could even be two maps, to make the second
>> one really small (a bit per page) - so that it could be in memory most of
>> the time.
>> 2) When you need to do a count(*) or an index check - first check whether
>> there were any writes to the page. If not - you can use the count information
>> from the page info/index data without going to the page itself.
>> 3) Let vacuum clear the bit after freezing all the tuples in the page (am
>> I using terminology correctly?).
>
> Part of this already exists. It's called the visibility map, and is
> present in 8.4 and above. It's not currently used for queries, but can
> potentially be used to aid some kinds of query.
>
> http://www.postgresql.org/docs/8.4/static/storage-vm.html
>
>> In this case all read-only (archive) data will have this bit off and
>> index/count(*) will be really fast.
>
> A count with any joins or filter criteria would still have to scan all
> pages with visible tuples in them.
Only if one doesn't use partitioning. With proper partitioning, the filter
can simply select partitions.

Also, filtering can be mapped onto an index lookup. And if one could join
an index scan with the visibility map, much like two indexes can be
bitmap-joined now, count could be really fast for all but non-frozen tuples.
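
For reference, "proper partitioning" in current releases means inheritance
plus CHECK constraints; a minimal sketch (all names hypothetical), where
constraint exclusion lets a filtered count skip the other children:

CREATE TABLE events (created_at date NOT NULL, payload text);

CREATE TABLE events_y2010m09 (
    CHECK (created_at >= DATE '2010-09-01' AND created_at < DATE '2010-10-01')
) INHERITS (events);

CREATE TABLE events_y2010m10 (
    CHECK (created_at >= DATE '2010-10-01' AND created_at < DATE '2010-11-01')
) INHERITS (events);

-- with constraint_exclusion = partition (the default since 8.4), this
-- skips events_y2010m09 entirely; only the empty parent and the matching
-- child are scanned
SELECT count(*) FROM events
 WHERE created_at >= DATE '2010-10-01' AND created_at < DATE '2010-11-01';
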
> So the visibility map helps speed up scanning of bloated tables, but
> doesn't provide a magical "fast count" except in the utterly trivial
> "select count(*) from tablename;" case, and can probably only be used
> for accurate results when there are no read/write transactions
> currently open.
Why so? You simply has to recount the pages that are marked dirty in
usual way. But count problem usually occurs when there are a lot of
archive data (you need to count over 100K records) that is not modified.

Best regards, Vitalii Tymchyshyn


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 11:44:59
Message-ID: 4CB44A3B.5090409@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/12/2010 04:22 PM, david(at)lang(dot)hm wrote:

> from a PR point of view, speeding up the trivial count(*) case could be
> worth it, just to avoid people complaining about it not being fast.

At the cost of a fair bit more complexity, though, and slowing
everything else down.

The proper solution here remains, IMO, support for visibility
information in indexes, whether by storing it once in the index and once
in the heap (ouch!), storing it out-of-line, or using a covering index
where one or more columns are stored wholly in the index not in the
table heap at all.

Here are a few of the many past discussions about this that have already
covered some of the same ground:

http://stackoverflow.com/questions/839015/postgres-could-an-index-organized-tables-paved-way-for-faster-select-count-fr

http://osdir.com/ml/db.postgresql.performance/2003-10/msg00075.html
(and the rest of the thread)

A decent look with Google will find many, many more.

> in the case where you are doing a count(*) query and the where clause is
> on an indexed column, could the search just look at the index + the
> visibility map rather than doing a sequential scan through the
> table?

Nope, because the visibility map, which is IIRC only one bit per page,
doesn't record how many tuples there are on the page, or enough
information about them to determine how many of them are visible to the
current transaction*.

> as for your worries about the accuracy of a visibility based count in
> the face of other transactions, wouldn't you run into the same issues if
> you are doing a sequential scan with the same transactions in process?

No. Every tuple in a table heap in postgresql has hidden fields, some of
which are used to determine whether the current transaction* can "see"
the tuple - it may have been inserted after this transaction started, or
deleted before this transaction started, so it's not visible to this
transaction but may still be to others.

http://www.postgresql.org/docs/current/static/ddl-system-columns.html

This information isn't available in the visibility map, or in indexes.
That's why PostgreSQL has to hit the heap to find it.

* current transaction should really be "current snapshot". The snapshot
is taken at the start of the whole transaction for SERIALIZABLE
isolation, and at the start of each statement for READ COMMITTED isolation.
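
Those hidden fields can be queried directly if named explicitly; a quick
illustration (any table name will do):

-- xmin / xmax are the inserting and deleting transaction IDs that the
-- visibility checks are based on; ctid is the tuple's physical location
SELECT xmin, xmax, ctid, * FROM some_table LIMIT 5;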

--
Craig Ringer


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: "david(at)lang(dot)hm" <david(at)lang(dot)hm>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 12:27:26
Message-ID: 4CB4542E.5010406@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

david(at)lang(dot)hm wrote:
> from a PR point of view, speeding up the trivial count(*) case could be
> worth it, just to avoid people complaining about it not being fast.
>
>
Fixing PR stuff is not the approach that I would take. People are
complaining about select count(*) because they're using it in all the
wrong places. My assessment that there is a problem with sequential
scan was wrong. Now, let's again take Oracle as the measure.
Someone asked me about caching the data. Here it is:

SQL> connect system/*********
Connected.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:12.68
SQL> connect adbase/*********
Connected.
SQL> alter session set db_file_multiblock_read_Count=128;

Session altered.

Elapsed: 00:00:00.41
SQL> select count(*) from ni_occurrence;

COUNT(*)
----------
402062638

Elapsed: 00:02:37.77

SQL> select bytes/1048576 MB from user_segments
2 where segment_name='NI_OCCURRENCE';

MB
----------
35329

Elapsed: 00:00:00.20
SQL>

So, the results weren't cached the first time around. The explanation is
the fact that Oracle, as of version 10.2.0, reads the table into
private process memory, not into the shared buffers. This table alone is
35GB in size; Oracle took 2 minutes 37 seconds to read it using a
full table scan. If I do the same thing with PostgreSQL and a comparable
table, Postgres is, in fact, faster:

psql (9.0.1)
Type "help" for help.

news=> \timing
Timing is on.
news=> select count(*) from moreover_documents_y2010m09;
count
----------
17242655
(1 row)

Time: 113135.114 ms
news=> select pg_size_pretty(pg_table_size('moreover_documents_y2010m09'));
pg_size_pretty
----------------
27 GB
(1 row)

Time: 100.849 ms
news=>

The number of rows is significantly smaller, but the table contains a
rather significant "text" field which consumes quite a bit of TOAST
storage, and the sizes are comparable. Postgres read through 27GB in 113
seconds, less than 2 minutes, and Oracle took 2 minutes 37 seconds to
read through 35GB. I stand corrected: there is nothing wrong with the
speed of the Postgres sequential scan.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 13:07:46
Message-ID: AANLkTina3zfQ01UwPqps_hogvR-ge=ur9zHCSsxvKAm8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Oct 12, 2010 at 7:27 AM, Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>
> So, the results weren't cached the first time around. The explanation is the
> fact that Oracle, as of version 10.2.0, reads the table into private
> process memory, not into the shared buffers.  This table alone is  35GB in
> size;  Oracle took 2 minutes 37 seconds to read it using a full table
> scan. If I do the same thing with PostgreSQL and a comparable table,
> Postgres is, in fact, faster:

Well, I didn't quite mean that - having no familiarity with Oracle I
don't know what the alter system statement does, but I was talking
specifically about the linux buffer and page cache. The easiest way to
drop the linux caches in one fell swoop is:

echo 3 > /proc/sys/vm/drop_caches

Is there a command to tell postgresql to drop/clear/reset its buffer_cache?

Clearing/dropping both the system (Linux) and the DB caches is
important when doing benchmarks that involve I/O.

--
Jon


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: mladen(dot)gogala(at)vmsinfo(dot)com, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 13:18:34
Message-ID: 4CB4602A.3040800@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jon Nelson wrote:
> Is there a command to tell postgresql to drop/clear/reset its buffer_cache?
>

No. Usually the sequence used to remove all cached data from RAM before
a benchmark is:

pg_ctl stop
sync
echo 3 > /proc/sys/vm/drop_caches
pg_ctl start

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Luca Tettamanti <kronos(dot)it(at)gmail(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: mladen(dot)gogala(at)vmsinfo(dot)com, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 13:19:52
Message-ID: AANLkTimyeTQDh22tZUbzCy+KkdzWVv2aEX=5SJ3vhjv=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Oct 12, 2010 at 3:07 PM, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:
> On Tue, Oct 12, 2010 at 7:27 AM, Mladen Gogala
> <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>>
>> So, the results weren't cached the first time around. The explanation is the
>> fact that Oracle, as of version 10.2.0, reads the table into private
>> process memory, not into the shared buffers.  This table alone is  35GB in
>> size;  Oracle took 2 minutes 37 seconds to read it using a full table
>> scan. If I do the same thing with PostgreSQL and a comparable table,
>> Postgres is, in fact, faster:
>
> Well, I didn't quite mean that - having no familiarity with Oracle I
> don't know what the alter system statement does, but I was talking
> specifically about the linux buffer and page cache. The easiest way to
> drop the linux caches in one fell swoop is:
>
> echo 3 > /proc/sys/vm/drop_caches

AFAIK this won't affect Oracle when using direct IO (which bypasses
the page cache).

Luca


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: mladen(dot)gogala(at)vmsinfo(dot)com, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 13:20:14
Message-ID: AANLkTik_wN6Wn=G5NsxpbnEpaa=kdcAruLn8s0XBimNH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Oct 12, 2010 at 8:18 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> No.  Usually the sequence used to remove all cached data from RAM before a
> benchmark is:

All cached data (as cached in postgresql - *not* the Linux system
caches)..., right?

--
Jon


From: Mladen Gogala <mgogala(at)vmsinfo(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 13:55:39
Message-ID: 4CB468DB.8040902@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jon Nelson wrote:
>
> Well, I didn't quite mean that - having no familiarity with Oracle I
> don't know what the alter system statement does, but I was talking
> specifically about the linux buffer and page cache.
>

Those are not utilized by Oracle. This is a RAC instance, running on
top of ASM, which is an Oracle volume manager, using raw devices. There
is no file system on those disks:

SQL> select file_name from dba_data_files
2 where tablespace_name='ADBASE_DATA';

FILE_NAME
--------------------------------------------------------------------------------
+DGDATA/stag3/datafile/adbase_data.262.727278257
+DGDATA/stag3/datafile/adbase_data.263.727278741
+DGDATA/stag3/datafile/adbase_data.264.727280145
+DGDATA/stag3/datafile/adbase_data.265.727280683

[oracle(at)lpo-oracle-30 ~]$ $ORA_CRS_HOME/bin/crs_stat -l
NAME=ora.STAG3.STAG31.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.STAG3.STAG32.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.STAG3.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-30.ASM1.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.LISTENER_LPO-ORACLE-30.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-30.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-30

NAME=ora.lpo-oracle-31.ASM2.asm
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.LISTENER_LPO-ORACLE-31.lsnr
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.gsd
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.ons
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

NAME=ora.lpo-oracle-31.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on lpo-oracle-31

The only way to flush cache is the aforementioned "alter system"
command. AFAIK, Postgres doesn't have anything like that. Oracle uses
raw devices precisely to avoid double buffering.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 13:56:30
Message-ID: 23516.1286891790@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> writes:
> The number of rows is significantly smaller, but the table contains
> rather significant "text" field which consumes quite a bit of TOAST
> storage and the sizes are comparable. Postgres read through 27GB in 113
> seconds, less than 2 minutes and oracle took 2 minutes 37 seconds to
> read through 35GB. I stand corrected: there is nothing wrong with the
> speed of the Postgres sequential scan.

Um ... the whole point of TOAST is that the data isn't in-line.
So what Postgres was actually reading through was probably quite a
lot less than 27Gb. It's probably hard to make a completely
apples-to-apples comparison because the two databases are so different,
but I don't think this one proves that PG is faster than Oracle.
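
A quick way to see how that space splits, using the same size functions
already used upthread (main heap fork only versus heap plus TOAST, indexes
excluded in both):

SELECT pg_size_pretty(pg_relation_size('moreover_documents_y2010m09')) AS main_heap,
       pg_size_pretty(pg_table_size('moreover_documents_y2010m09')) AS heap_plus_toast;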

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Neil Whelchel" <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 13:56:33
Message-ID: 4CB422C102000025000367F7@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:

> What is the best method to make a page of results and a list of
> links to other pages of results?

For our most heavily used web app we decided to have the renderer
just read the list of cases and render the pages to disk, and then
present the first one. We set a limit of 500 entries on the list;
if we get past 500 we put up a page telling them to refine their
search criteria. That won't work for all circumstances, but it
works well for our web app.
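
The cap check itself needs no count(*); a sketch with hypothetical names,
fetching one row more than the limit:

-- 501 rows back means "more than 500": show the refine-your-search page
SELECT * FROM cases
 WHERE last_name LIKE 'SM%'
 ORDER BY case_id
 LIMIT 501;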

-Kevin


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 14:04:18
Message-ID: 4CB46AE2.7030006@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:
> Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> writes:
>
>> The number of rows is significantly smaller, but the table contains
>> rather significant "text" field which consumes quite a bit of TOAST
>> storage and the sizes are comparable. Postgres read through 27GB in 113
>> seconds, less than 2 minutes and oracle took 2 minutes 37 seconds to
>> read through 35GB. I stand corrected: there is nothing wrong with the
>> speed of the Postgres sequential scan.
>>
>
> Um ... the whole point of TOAST is that the data isn't in-line.
> So what Postgres was actually reading through was probably quite a
> lot less than 27Gb. It's probably hard to make a completely
> apples-to-apples comparison because the two databases are so different,
> but I don't think this one proves that PG is faster than Oracle.
>
> regards, tom lane
>

As is usually the case, you're right. I will try copying the table to
Postgres over the weekend; my management would not look kindly upon my
copying 35GB of production data during working hours, for scientific
reasons. I have the storage and I can test; I will post the
result. I developed quite an efficient Perl script which does copying
without the intervening CSV file, so that the copy should not take more
than 2 hours. I will be able to impose a shared lock on the table over
the weekend, so that I don't blow away the UNDO segments.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Joe Uhl <joeuhl(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 14:19:57
Message-ID: 4CB46E8D.1010800@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

>> The biggest single problem with "select count(*)" is that it is
>> seriously overused. People use that idiom to establish existence, which
>> usually leads to a performance disaster in the application using it,
>> unless the table has no more than a few hundred records. The SQL language, of
>> which PostgreSQL offers an excellent implementation, has offered a [NOT]
>> EXISTS clause since its inception in the Jurassic era. The problem is
>> with the sequential scan, not with counting. I'd even go as far as to
>> suggest that 99% of instances of the "select count(*)" idiom are probably
>> bad use of the SQL language.
>
> I agree, I have seen many very bad examples of using count(*). I will go so
> far as to question the use of count(*) in my examples here. Is there a better
> way to come up with a page list than using count(*)? What is the best method
> to make a page of results and a list of links to other pages of results? Am I
> barking up the wrong tree here?
One way I have dealt with this on very large tables is to cache the
count(*) at the application level (using memcached, terracotta, or
something along those lines) and then increment that cache whenever you
add a row to the relevant table. On application restart that cache is
re-initialized with a regular old count(*). This approach works really
well and all large systems in my experience need caching in front of the
DB eventually. If you have a simpler system with say a single
application/web server you can simply store the value in a variable, the
specifics would depend on the language and framework you are using.

Another more all-DB approach is to create a statistics table into which
you place aggregated statistics rows (num deleted, num inserted, totals,
etc) at an appropriate time interval in your code. So you have rows
containing aggregated statistics information for the past and some tiny
portion of the new data happening right now that hasn't yet been
aggregated. Queries then look like a summation of the aggregated values
in the statistics table plus a count(*) over just the newest portion of
the data table and are generally very fast.
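
A minimal sketch of that approach (all names hypothetical, assuming a
"hits" table with a "logged_at" timestamp):

-- rolled-up history: one row per already-aggregated interval
CREATE TABLE hit_counts (
    interval_end timestamptz PRIMARY KEY,  -- intervals are [start, end)
    row_count    bigint NOT NULL
);

-- total = aggregated history + exact count over the unaggregated tail
SELECT (SELECT coalesce(sum(row_count), 0) FROM hit_counts)
     + (SELECT count(*) FROM hits
         WHERE logged_at >= (SELECT coalesce(max(interval_end), '-infinity')
                               FROM hit_counts)) AS total_rows;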

Overall I have found that once things get big the layers of your app
stack start to blend together and have to be combined in clever ways to
keep speed up. Postgres is a beast but when you run into things it
can't do well just find a way to cache it or make it work together with
some other persistence tech to handle those cases.


From: bricklen <bricklen(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:12:53
Message-ID: AANLkTimXSWapozrZNdiRXDSm2hdsha1As94NwuS967As@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Sat, Oct 9, 2010 at 4:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> Maybe an
> estimate(*) that works like count but gives an answer from the index without
> checking visibility? I am sure that this would be good enough to make a page
> list, it is really no big deal if it errors on the positive side, maybe the
> list of pages has an extra page off the end. I can live with that. What I
> can't live with is taking 13 seconds to get a page of results from 850,000
> rows in a table.
> -Neil-
>

FWIW, Michael Fuhr wrote a small function to parse the EXPLAIN plan a
few years ago and it works pretty well assuming your stats are up to
date.

http://markmail.org/message/gknqthlwry2eoqey
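
For readers who want the flavor of it: the planner's estimate can also be
read straight out of the statistics for a whole table, and the
EXPLAIN-parsing approach looks roughly like the sketch below (in the same
spirit as, but not identical to, Michael's function; the "items" table is
hypothetical, and both forms are only as good as your last ANALYZE):

-- cheap whole-table estimate from the last ANALYZE
SELECT reltuples::bigint FROM pg_class WHERE relname = 'items';

-- estimate for an arbitrary query by scraping its EXPLAIN output
CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    rec  record;
    cnt  bigint;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        cnt := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN cnt IS NOT NULL;
    END LOOP;
    RETURN cnt;
END;
$$ LANGUAGE plpgsql;

-- usage: SELECT count_estimate('SELECT 1 FROM items WHERE title LIKE ''foo%''');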


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Jon Nelson" <jnelson+pgsql(at)jamponi(dot)net>
Cc: "Vitalii Tymchyshyn" <tivv00(at)gmail(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>, <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:29:22
Message-ID: 4CB43882020000250003681C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:
> Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

>> Usually the sequence used to remove all cached data from RAM
>> before a benchmark is:
>
> All cached data (as cached in postgresql - *not* the Linux system
> caches)..., right?

No. Stopping and starting PostgreSQL empties PostgreSQL's own
caches. These lines, in between the stop and start, force the Linux
cache to be empty (on recent kernel versions):

sync
echo 3 > /proc/sys/vm/drop_caches

-Kevin


From: Dan Harris <fbsd(at)drivefaster(dot)net>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:39:19
Message-ID: 4CB48127.6090200@drivefaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/11/10 8:02 PM, Scott Carey wrote:
> would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme
> file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.
>
We just had a corrupt table caused by an XFS online defrag. I'm scared
to use this again while the db is live. Has anyone else found this to
be safe? But I can vouch for the fragmentation issue; it happens very
quickly in our system.

-Dan


From: david(at)lang(dot)hm
To: Joe Uhl <joeuhl(at)gmail(dot)com>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:48:34
Message-ID: alpine.DEB.2.00.1010120846410.21889@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, 12 Oct 2010, Joe Uhl wrote:

>>> The biggest single problem with "select count(*)" is that it is
>>> seriously overused. People use that idiom to establish existence, which
>>> usually leads to a performance disaster in the application using it,
>>> unless the table has no more than a few hundred records. The SQL language, of
>>> which PostgreSQL offers an excellent implementation, has offered a [NOT]
>>> EXISTS clause since its inception in the Jurassic era. The problem is
>>> with the sequential scan, not with counting. I'd even go as far as to
>>> suggest that 99% of instances of the "select count(*)" idiom are probably
>>> bad use of the SQL language.
>>
>> I agree, I have seen many very bad examples of using count(*). I will go so
>> far as to question the use of count(*) in my examples here. Is there a
>> better
>> way to come up with a page list than using count(*)? What is the best
>> method
>> to make a page of results and a list of links to other pages of results? Am
>> I
>> barking up the wrong tree here?
> One way I have dealt with this on very large tables is to cache the count(*)
> at the application level (using memcached, terracotta, or something along
> those lines) and then increment that cache whenever you add a row to the
> relevant table. On application restart that cache is re-initialized with a
> regular old count(*). This approach works really well and all large systems
> in my experience need caching in front of the DB eventually. If you have a
> simpler system with say a single application/web server you can simply store
> the value in a variable, the specifics would depend on the language and
> framework you are using.

this works if you know ahead of time what the criteria of the search is
going to be.

so it will work for

select count(*) from table;

what this won't work for is cases where the criteria of the search is
unpredictable, i.e.

ask the user for input

select count(*) from table where field=$input;

David Lang

> Another more all-DB approach is to create a statistics table into which you
> place aggregated statistics rows (num deleted, num inserted, totals, etc) at
> an appropriate time interval in your code. So you have rows containing
> aggregated statistics information for the past and some tiny portion of the
> new data happening right now that hasn't yet been aggregated. Queries then
> look like a summation of the aggregated values in the statistics table plus a
> count(*) over just the newest portion of the data table and are generally
> very fast.
>
> Overall I have found that once things get big the layers of your app stack
> start to blend together and have to be combined in clever ways to keep speed
> up. Postgres is a beast but when you run into things it can't do well just
> find a way to cache it or make it work together with some other persistence
> tech to handle those cases.
>
>
>


From: david(at)lang(dot)hm
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:52:48
Message-ID: alpine.DEB.2.00.1010120849380.21889@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, 12 Oct 2010, Mladen Gogala wrote:

> david(at)lang(dot)hm wrote:
>> from a PR point of view, speeding up the trivial count(*) case could be
>> worth it, just to avoid people complaining about it not being fast.
>>
>>
> Fixing PR stuff is not the approach that I would take. People are complaining
> about select count(*) because they're using it in all the wrong places.

that may be the case, but if it's possible to make it less painful it will
mean more people use postgres, both because it works better for them when
they are using the suboptimal programs, and because when people do
their trivial testing of databases to decide which one they will use, they
won't rule out postgres because "it's so slow"

the fact of the matter is that people do use count(*), and even though
there are usually ways to avoid doing so, having the programmer have to do
something different for postgres than they do for other databases is
raising a barrier against postgres untilization in anything.

David Lang

> My
> assessment that there is a problem with sequential scan was wrong. Now, let's
> again take Oracle as the measure.
> Someone asked me about caching the data. Here it is:
>
> SQL> connect system/*********
> Connected.
> SQL> alter system flush buffer_cache;
>
> System altered.
>
> Elapsed: 00:00:12.68
> SQL> connect adbase/*********
> Connected.
> SQL> alter session set db_file_multiblock_read_Count=128;
>
> Session altered.
>
> Elapsed: 00:00:00.41
> SQL> select count(*) from ni_occurrence;
>
> COUNT(*)
> ----------
> 402062638
>
> Elapsed: 00:02:37.77
>
> SQL> select bytes/1048576 MB from user_segments
> 2 where segment_name='NI_OCCURRENCE';
>
> MB
> ----------
> 35329
>
> Elapsed: 00:00:00.20
> SQL>
>
>
> So, the results weren't cached the first time around. The explanation is the
> fact that Oracle, as of version 10.2.0, reads the table into the private
> process memory, not into the shared buffers. This table alone is 35GB in
> size; Oracle took 2 minutes 37 seconds to read it using a full table scan.
> If I do the same thing with PostgreSQL and a comparable table, Postgres is,
> in fact, faster:
>
> psql (9.0.1)
> Type "help" for help.
>
> news=> \timing
> Timing is on.
> news=> select count(*) from moreover_documents_y2010m09;
> count
> ----------
> 17242655
> (1 row)
>
> Time: 113135.114 ms
> news=> select pg_size_pretty(pg_table_size('moreover_documents_y2010m09'));
> pg_size_pretty
> ----------------
> 27 GB
> (1 row)
>
> Time: 100.849 ms
> news=>
>
> The number of rows is significantly smaller, but the table contains a rather
> significant "text" field which consumes quite a bit of TOAST storage, and the
> sizes are comparable. Postgres read through 27GB in 113 seconds, less than 2
> minutes, and Oracle took 2 minutes 37 seconds to read through 35GB. I stand
> corrected: there is nothing wrong with the speed of the Postgres sequential
> scan.


From: david(at)lang(dot)hm
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:54:24
Message-ID: alpine.DEB.2.00.1010120852530.21889@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, 12 Oct 2010, Craig Ringer wrote:

> On 10/12/2010 04:22 PM, david(at)lang(dot)hm wrote:
>
>> from a PR point of view, speeding up the trivial count(*) case could be
>> worth it, just to avoid people complaining about it not being fast.
>
> At the cost of a fair bit more complexity, though, and slowing everything
> else down.

complexity probably, although given how complex the planner already is, is
this significant?

as far as slowing everything else down, why would it do that? (beyond the
simple fact that any new thing the planner can do makes the planner take a
little longer)

David Lang

> The proper solution here remains, IMO, support for visibility information in
> indexes, whether by storing it once in the index and once in the heap
> (ouch!), storing it out-of-line, or using a covering index where one or more
> columns are stored wholly in the index not in the table heap at all.
>
> Here are a few of the many past discussions about this that have already
> covered some of the same ground:
>
> http://stackoverflow.com/questions/839015/postgres-could-an-index-organized-tables-paved-way-for-faster-select-count-fr
>
> http://osdir.com/ml/db.postgresql.performance/2003-10/msg00075.html
> (and the rest of the thread)
>
> A decent look with Google will find many, many more.
>
>> in the case where you are doing a count(*) where query and the where is
>> on an indexed column, could the search just look at the index + the
>> visibility mapping rather than doing an sequential search through the
>> table?
>
> Nope, because the visibility map, which is IIRC only one bit per page,
> doesn't record how many tuples there are on the page, or enough information
> about them to determine how many of them are visible to the current
> transaction*.
>
>> as for your worries about the accuracy of a visibility based count in
>> the face of other transactions, wouldn't you run into the same issues if
>> you are doing a sequential scan with the same transactions in process?
>
> No. Every tuple in a table heap in postgresql has hidden fields, some of
> which are used to determine whether the current transaction* can "see" the
> tuple - it may have been inserted after this transaction started, or deleted
> before this transaction started, so it's not visible to this transaction but
> may still be to others.
>
> http://www.postgresql.org/docs/current/static/ddl-system-columns.html
>
> This information isn't available in the visibility map, or in indexes. That's
> why PostgreSQL has to hit the heap to find it.
>
> * current transaction should really be "current snapshot". The snapshot is
> taken at the start of the whole transaction for SERIALIZABLE isolation, and
> at the start of each statement for READ COMMITTED isolation.
>
> --
> Craig Ringer
>


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:02:39
Message-ID: 0CA660FA-F59C-494E-BFA9-98CE8E527378@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Oct 11, 2010, at 9:21 PM, Samuel Gendler wrote:

On Mon, Oct 11, 2010 at 9:06 PM, Scott Carey <scott(at)richrelevance(dot)com<mailto:scott(at)richrelevance(dot)com>> wrote:
I can't speak to documentation, but it is something that helps as your I/O subsystem gets more powerful, and how much it helps depends more on your hardware, which may have adaptive read ahead on its own, and your file system which may be more or less efficient at sequential I/O. For example ext3 out of the box gets a much bigger gain from tuning read-ahead than XFS on a DELL PERC6 RAID card (but still ends up slower).

Geez. I wish someone would have written something quite so bold as 'xfs is always faster than ext3' in the standard tuning docs. I couldn't find anything that made a strong filesystem recommendation. How does xfs compare to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected hardware failure on a production system caused my test system to get thrown into production before I could do any serious testing of xfs. If there is a strong consensus that xfs is simply better, I could afford the downtime to switch.

As it happens, this is a system where all of the heavy workload is in the form of sequential scan type load. The OLTP workload is very minimal (tens of queries per minute on a small number of small tables), but there are a lot of reporting queries that wind up doing sequential scans of large partitions (millions to tens of millions of rows). We've sized the new hardware so that the most commonly used partitions fit into memory, but if we could speed the queries that touch less frequently used partitions, that would be good. I'm the closest thing our team has to a DBA, which really only means that I'm the one person on the dev team or the ops team to have read all of the postgres docs and wiki and the mailing lists. I claim no actual DBA experience or expertise and have limited cycles to devote to tuning and testing, so if there is an established wisdom for filesystem choice and read ahead tuning, I'd be very interested in hearing it.

ext4 is a very fast file system. It's faster than ext2, but has many more features and has the all-important journaling.

However, for large reporting queries and sequential scans, XFS will win in the long run if you use the online defragmenter. Otherwise, your sequential scans won't be all that sequential on any file system over time if your tables aren't written once, forever, serially. Parallel restore will result in a system that is fragmented -- ext4 will do best at limiting this on the restore, but only xfs has online defragmentation. We schedule ours daily and it noticeably improves sequential scan I/O.

Supposedly, an online defragmenter is in the works for ext4 but it may be years before it's available.


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:03:57
Message-ID: 87iq179z4i.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

sgendler(at)ideasculptor(dot)com (Samuel Gendler) writes:
> Geez.  I wish someone would have written something quite so bold as
> 'xfs is always faster than ext3' in the standard tuning docs.  I
> couldn't find anything that made a strong filesystem
> recommendation.  How does xfs compare to ext4?  I wound up on ext4 on
> a dell perc6 raid card when an unexpected hardware failure on a
> production system caused my test system to get thrown into production
> before I could do any serious testing of xfs.  If there is a strong
> consensus that xfs is simply better, I could afford the downtime to
> switch.

It's news to me (in this thread!) that XFS is actually "getting some
developer love," which is a pretty crucial factor in considering it
relevant.

XFS was an SGI creation, and, with:

a) the not-scintillating performance of the company,

b) the lack of a lot of visible work going into the filesystem,

c) the paucity of support by Linux vendors (for a long time, if you
told RHAT you were having problems, and were using XFS, the next
step would be to park the ticket until you installed a
"supported filesystem")

it didn't look like XFS was a terribly good bet. Those issues were
certainly causing concern a couple of years ago.

Faster "raw performance" isn't much good if it comes with a risk of:
- Losing data
- Losing support from vendors

If XFS now *is* getting support from both the development and support
perspectives, then the above concerns may have been invalidated. It
would be very encouraging, if so.
--
output = ("cbbrowne" "@" "gmail.com")
Rules of the Evil Overlord #228. "If the hero claims he wishes to
confess in public or to me personally, I will remind him that a
notarized deposition will serve just as well."


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:23:06
Message-ID: AANLkTinwiEWnTLNJ7e++KdxynQyt4p2_s7Vq_PQtYbZ8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Oct 12, 2010 at 9:02 AM, Scott Carey <scott(at)richrelevance(dot)com>wrote:
>
>
> However, for large reporting queries and sequential scans, XFS will win in
> the long run if you use the online defragmenter. Otherwise, your sequential
> scans won't be all that sequential on any file system over time if your
> tables aren't written once, forever, serially. Parallel restore will
> result in a system that is fragmented -- ext4 will do best at limiting this
> on the restore, but only xfs has online defragmentation. We schedule ours
> daily and it noticeably improves sequential scan I/O.
>
>
Our reporting tables are written sequentially and left unmodified until
entire partitions are dropped. However, equivalent partitions tend to get a
little bit larger over time, so newer partitions won't necessarily fit into
the gaps left by prior partition drops, so it is possible that partitions
will be split into two sections, but should still be very sequential, if not
perfectly so. It would seem that we stumbled into an ideal architecture for
doing this kind of work - mostly by virtue of starting with 8.2.x and having
huge problems with autovacuum and vacuum taking forever and dragging the db
to a halt, which caused us to move to an architecture which aggregates and
then drops older data in entire partitions instead of updating aggregates
individually and then deleting rows. Partitions are sized such that most
reporting queries run over entire partitions, too (which was completely
accidental, since I had not yet delved into individual query optimization at
the time), so even though we are doing sequential scans, we at least run as
few of them as possible and are able to keep hot data in memory.

--sam


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:35:46
Message-ID: AFDEB033-DC31-46BF-ABF9-561A9D1AE4E0@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

>>
>
> A count with any joins or filter criteria would still have to scan all
> pages with visible tuples in them. So the visibility map helps speed up
> scanning of bloated tables, but doesn't provide a magical "fast count"
> except in the utterly trivial "select count(*) from tablename;" case,
> and can probably only be used for accurate results when there are no
> read/write transactions currently open.

select count(*) from tablename where [condition or filter that can use an index] [group by on columns in the index]

will also work, I think.

Additionally, I think it can work if other open transactions exist, provided they haven't written to the table being scanned. If they have, then only those pages that have been altered and marked in the visibility map need to be cracked open the normal way.
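
A concrete instance of that pattern, with made-up table and column names;
under the hypothetical index-plus-visibility-map scheme being discussed, an
index on (zipcode) could in principle answer this while cracking open only
the pages not marked all-visible:

SELECT zipcode, count(*)
FROM customers
WHERE zipcode BETWEEN '92000' AND '92999'
GROUP BY zipcode;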

> Even if you kept a count of
> tuples in each page along with the mvcc transaction ID information
> required to determine for which transactions that count is valid, it'd
> only be useful if you didn't have to do any condition checks, and it'd
> be yet another thing to update with every insert/delete/update.
>

Yes, lots of drawbacks and added complexity.

> Perhaps for some users that'd be worth having, but it seems to me like
> it'd have pretty narrow utility. I'm not sure that's the answer.
>
> --
> Craig Ringer
>


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Samuel Gendler <sgendler(at)ideasculptor(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Joshua Tolley <eggyknap(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "mladen(dot)gogala(at)vmsinfo(dot)com" <mladen(dot)gogala(at)vmsinfo(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:38:12
Message-ID: 4CB48EF4.5030502@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-12 18:02, Scott Carey wrote:
> However, for large reporting queries and sequential scans, XFS will
> win in the long run if you use the online defragmenter. Otherwise,
> your sequential scans won't be all that sequential on any file system
> over time if your tables aren't written once, forever, serially.
> Parallel restore will result in a system that is fragmented -- ext4
> will do best at limiting this on the restore, but only xfs has online
> defragmentation. We schedule ours daily and it noticeably improves
> sequential scan I/O.
>
> Supposedly, an online defragmenter is in the works for ext4 but it
> may be years before its available.

If some clever postgres hacker could teach postgres to allocate blocks
using posix_fallocate in quite large batches, say .. something like:
fallocate(min(current_relation_size * 0.1, 1073741824));

So if you have a relation filling 10GB already, then the next "file" for
the relation is just fully allocated on the first byte by the filesystem.
That would ensure that a large table sits efficiently on the filesystem
level with a minimum of fragmentation on ext4 (and other FS's supporting
posix_fallocate), and for small systems it would only fill 10% more of
disk space... ..

.. last night I spent an hour looking for where it's done but couldn't
find the source file where extension of an existing relation takes place..
can someone give directions?

--
Jesper


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:44:02
Message-ID: 8BE535FE-F9DB-4CBC-A932-EB4F07905698@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Oct 12, 2010, at 8:39 AM, Dan Harris wrote:

> On 10/11/10 8:02 PM, Scott Carey wrote:
>> would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme
>> file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.
>>
> We just had a corrupt table caused by an XFS online defrag. I'm scared
> to use this again while the db is live. Has anyone else found this to
> be safe? But, I can vouch for the fragmentation issue, it happens very
> quickly in our system.
>

What version? I'm using the latest CentOS extras build.

We've been doing online defrag for a while now on a very busy database with > 8TB of data. Not that that means there are no bugs...

It is a relatively simple thing in xfs -- it writes a new file to temp in a way that allocates contiguous space if available, then if the file has not been modified since it was re-written it is essentially moved on top of the other one. This should be safe provided the journaling and storage is safe, etc.

> -Dan
>


From: Scott Carey <scott(at)richrelevance(dot)com>
To: "<david(at)lang(dot)hm>" <david(at)lang(dot)hm>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:46:08
Message-ID: 0D179340-9AD4-4118-B114-37C0B9D82FC5@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Oct 12, 2010, at 8:54 AM, <david(at)lang(dot)hm> wrote:

> On Tue, 12 Oct 2010, Craig Ringer wrote:
>
>> On 10/12/2010 04:22 PM, david(at)lang(dot)hm wrote:
>>
>>> from a PR point of view, speeding up the trivial count(*) case could be
>>> worth it, just to avoid people complaining about it not being fast.
>>
>> At the cost of a fair bit more complexity, though, and slowing everything
>> else down.
>
> complexity probably, although given how complex the planner already is, is
> this significant?
>
> as far as slowing everything else down, why would it do that? (beyond the
> simple fact that any new thing the planner can do makes the planner take a
> little longer)
>
> David Lang
>
I wouldn't even expect the planner to do more work. An Index Scan can simply avoid going to the tuples for visibility under some circumstances.


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "<david(at)lang(dot)hm>" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 16:50:40
Message-ID: AFE68F36-2B41-4065-A6EB-78C908FBEC95@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Oct 12, 2010, at 9:46 AM, Scott Carey wrote:

>
> On Oct 12, 2010, at 8:54 AM, <david(at)lang(dot)hm> wrote:
>
>> On Tue, 12 Oct 2010, Craig Ringer wrote:
>>
>>> On 10/12/2010 04:22 PM, david(at)lang(dot)hm wrote:
>>>
>>>> from a PR point of view, speeding up the trivial count(*) case could be
>>>> worth it, just to avoid people complaining about it not being fast.
>>>
>>> At the cost of a fair bit more complexity, though, and slowing everything
>>> else down.
>>
>> complexity probably, although given how complex the planner already is, is
>> this significant?
>>
>> as far as slowing everything else down, why would it do that? (beyond the
>> simple fact that any new thing the planner can do makes the planner take a
>> little longer)
>>
>> David Lang
>>
> I wouldn't even expect the planner to do more work. An Index Scan can simply avoid going to the tuples for visibility under some circumstances.
>
>
Of course, the planner has to do a bit more work. Otherwise it won't choose the Index Scan over the sequential scan. So the cost estimate for index scans, when all the info other than visibility is in the index, would need to be lowered.



From: Dan Harris <fbsd(at)drivefaster(dot)net>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 17:06:47
Message-ID: 4CB495A7.6010700@drivefaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/12/10 10:44 AM, Scott Carey wrote:
> On Oct 12, 2010, at 8:39 AM, Dan Harris wrote:
>
>> On 10/11/10 8:02 PM, Scott Carey wrote:
>>> would give you a 1MB read-ahead. Also, consider XFS and its built-in defragmentation. I have found that a longer lived postgres DB will get extreme
>>> file fragmentation over time and sequential scans end up mostly random. On-line file defrag helps tremendously.
>>>
>> We just had a corrupt table caused by an XFS online defrag. I'm scared
>> to use this again while the db is live. Has anyone else found this to
>> be safe? But, I can vouch for the fragmentation issue, it happens very
>> quickly in our system.
>>
> What version? I'm using the latest CentOS extras build.
>
> We've been doing online defrag for a while now on a very busy database with > 8TB of data. Not that that means there are no bugs...
>
> It is a relatively simple thing in xfs -- it writes a new file to temp in a way that allocates contiguous space if available, then if the file has not been modified since it was re-written it is essentially moved on top of the other one. This should be safe provided the journaling and storage is safe, etc.
>
I'm not sure how to figure out what version of XFS we're on.. but it's
Linux kernel 2.6.24-24 x86_64 on Ubuntu Server 8.04.3. Postgres version 8.3

We're due for an upgrade on that server soon so we'll do some more
testing once we upgrade. Right now we are just living with the
fragmentation. I'm glad to hear the regular on-line defrag is working
successfully, at least that gives me hope we can rely on it in the future.

-Dan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 17:07:58
Message-ID: 27107.1286903278@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

So I spent a bit of quality time with oprofile this morning, and found
once again that there's no substitute for having actual data before
theorizing.

Test case software: current Git HEAD (plus one code change explained
below), compiled with --enable-debug to support oprofile, cassert off;
no other special configure options. Running on current Fedora 13 (gcc
4.4.4 in particular). All postgresql.conf options are out-of-the-box.

Test case hardware: recently purchased mid-grade desktop, dual Xeon
E5503 processors (Nehalem cores, 2GHz), 4GB DDR3-800 RAM, no-name
SATA disk.

Test query: "select count(*) from t" where t has 4 nonnull integer
columns and 81920000 rows, occupying 3459MB. I chose that size
specifically to fit into available RAM, so that on repeated executions
no physical I/O will occur.
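
For anyone wanting to reproduce a comparable table, something along these
lines should be close; the column definitions are stand-ins, since the real
ones aren't shown:

CREATE TABLE t AS
    SELECT i AS c1, i % 10 AS c2, i % 100 AS c3, i % 1000 AS c4
    FROM generate_series(1, 81920000) AS s(i);
VACUUM ANALYZE t;        -- set hint bits and visibility info before timing
SELECT count(*) FROM t;  -- run twice so the second timing is fully cached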

On this setup I find that "select count(*)" runs in about 7.5sec when
the data is fully cached in RAM, for a scanning speed of 460MB/sec.
This is well in excess of what the machine's disk hardware can do:
bonnie++ rates the machine's disk read speed at 152MB/sec. So in theory
PG should be able to completely saturate the disk when processing a table
bigger than RAM. In reality the test case run time if I've just flushed
cache is about 28sec, working out to a scan rate of 123MB/sec. I expect
if I'd bothered to tune the kernel readahead parameters as outlined
earlier in this thread, I could get to 150MB/sec.

Now of course this disk setup is far from industrial strength, but the
processor isn't what you'd put in a serious database server either (in
particular, its available memory bandwidth is well behind the curve).
Also, the table is pretty narrow (only 16 payload bytes per row), and
any wider test table would show a pretty much linear scaling of achievable
scan rate versus table width. So I don't see much support here at all
for the notion that we scan slower than available disk bandwidth.

Further details from poking at it with oprofile: in the fully-cached
case the CPU time is about 80% Postgres and 20% kernel. That kernel
time is of course all to do with moving pages from kernel disk buffers
into Postgres shared memory. Although I've not bothered to increase
shared_buffers from the default 32MB, it wouldn't matter on this benchmark
unless I were able to make shared_buffers hold the entire table ... and
even then I'd only save 20%.

oprofile further shows that (with stock Postgres sources) the userspace
runtime breaks down like this:

samples % symbol name
141267 13.0810 heapgettup_pagemode
85947 7.9585 advance_aggregates
83031 7.6885 ExecProject
78975 7.3129 advance_transition_function
75060 6.9504 heapgetpage
73540 6.8096 ExecClearTuple
69355 6.4221 ExecProcNode
59288 5.4899 heap_getnext
57745 5.3470 ExecScan
55618 5.1501 HeapTupleSatisfiesMVCC
47057 4.3574 MemoryContextReset
41904 3.8802 ExecStoreTuple
37146 3.4396 SeqNext
32206 2.9822 ExecAgg
22135 2.0496 XidInMVCCSnapshot
21142 1.9577 int8inc
19280 1.7853 AllocSetReset
18211 1.6863 hash_search_with_hash_value
16285 1.5079 TransactionIdPrecedes

I also looked at the source-line-level breakdown, though that's too bulky
to post here. The most interesting fact here is that tuple visibility
testing (MVCC) overhead is simply nonexistent: it'd be in heapgetpage()
if it were being done, which it isn't because all the pages of the table
have the PageIsAllVisible bit set. In a previous run where those bits
weren't set but the per-tuple hint bits were, visibility testing still
only ate a percent or two of the runtime. So the theory some people have
espoused in this thread that visibility testing is the bottleneck doesn't
hold water either. If you go back and look at previous pgsql-hackers
discussions about that, what people have been worried about is not the CPU
cost of visibility testing but the need for indexscan queries to visit
the heap for no other purpose than to check the visibility flags. In a
seqscan it's not going to matter.

I looked a bit more closely at the heapgettup_pagemode timing. The
lines shown by opannotate as more than 0.1 percent of the runtime are

22545 2.2074 :{ /* heapgettup_pagemode total: 153737 15.0528 */
5685 0.5566 : bool backward = ScanDirectionIsBackward(dir);
5789 0.5668 : if (!scan->rs_inited)
5693 0.5574 : lineindex = scan->rs_cindex + 1;
11429 1.1190 : dp = (Page) BufferGetPage(scan->rs_cbuf);
5693 0.5574 : linesleft = lines - lineindex;
5766 0.5646 : while (linesleft > 0)
5129 0.5022 : lineoff = scan->rs_vistuples[lineindex];
44461 4.3533 : tuple->t_data = (HeapTupleHeader) PageGetItem((Page) dp, lpp);
11135 1.0903 : tuple->t_len = ItemIdGetLength(lpp);
5692 0.5573 : if (key != NULL)
5773 0.5653 : HeapKeyTest(tuple, RelationGetDescr(scan->rs_rd),
5674 0.5556 : scan->rs_cindex = lineindex;
11406 1.1168 :}

There doesn't seem to be a whole lot of room for improvement there.
Maybe we could shave a couple percent with some tenser coding (I'm
wondering why HeapKeyTest is being reached, in particular, when there's
no WHERE clause). But any local changes here will be marginal at best.

One thing I did find is that the time spent in ExecProject/ExecClearTuple,
amounting to nearly 15% of the runtime, is just for evaluating the
arguments of the aggregate ... and count(*) hasn't got any arguments.
So a patch like this improves the run speed by about 15%:

diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c
index a7dafeb..051e70c 100644
*** a/src/backend/executor/nodeAgg.c
--- b/src/backend/executor/nodeAgg.c
*************** advance_aggregates(AggState *aggstate, A
*** 480,486 ****
TupleTableSlot *slot;

/* Evaluate the current input expressions for this aggregate */
! slot = ExecProject(peraggstate->evalproj, NULL);

if (peraggstate->numSortCols > 0)
{
--- 480,489 ----
TupleTableSlot *slot;

/* Evaluate the current input expressions for this aggregate */
! if (peraggstate->evalproj)
! slot = ExecProject(peraggstate->evalproj, NULL);
! else
! slot = peraggstate->evalslot;

if (peraggstate->numSortCols > 0)
{
*************** ExecInitAgg(Agg *node, EState *estate, i
*** 1728,1738 ****
peraggstate->evalslot = ExecInitExtraTupleSlot(estate);
ExecSetSlotDescriptor(peraggstate->evalslot, peraggstate->evaldesc);

! /* Set up projection info for evaluation */
! peraggstate->evalproj = ExecBuildProjectionInfo(aggrefstate->args,
! aggstate->tmpcontext,
! peraggstate->evalslot,
! NULL);

/*
* If we're doing either DISTINCT or ORDER BY, then we have a list of
--- 1731,1744 ----
peraggstate->evalslot = ExecInitExtraTupleSlot(estate);
ExecSetSlotDescriptor(peraggstate->evalslot, peraggstate->evaldesc);

! /* Set up projection info for evaluation, if agg has any args */
! if (aggrefstate->args)
! peraggstate->evalproj = ExecBuildProjectionInfo(aggrefstate->args,
! aggstate->tmpcontext,
! peraggstate->evalslot,
! NULL);
! else
! peraggstate->evalproj = NULL;

/*
* If we're doing either DISTINCT or ORDER BY, then we have a list of

bringing the oprofile results to

samples % symbol name
181660 17.9017 heapgettup_pagemode
138049 13.6040 advance_transition_function
102865 10.1368 advance_aggregates
80948 7.9770 ExecProcNode
79943 7.8780 heap_getnext
73384 7.2316 ExecScan
60607 5.9725 MemoryContextReset
53889 5.3105 ExecStoreTuple
46666 4.5987 SeqNext
40535 3.9945 ExecAgg
33481 3.2994 int8inc
32202 3.1733 heapgetpage
26068 2.5689 AllocSetReset
18493 1.8224 hash_search_with_hash_value
8679 0.8553 LWLockAcquire
6615 0.6519 ExecSeqScan
6583 0.6487 LWLockRelease
3928 0.3871 hash_any
3715 0.3661 ReadBuffer_common

(note that this, not the stock code, is what corresponds to the 7.5sec
runtime I quoted above --- it's about 8.5sec without that change).

At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
and it's difficult to see how to get any real improvement without tackling
that. Rather than apply the patch shown above, I'm tempted to think about
hard-wiring COUNT(*) as a special case in nodeAgg.c such that we don't go
through advance_aggregates/advance_transition_function at all, but just
increment a counter directly. However, that would very clearly be
optimizing COUNT(*) and nothing else. Given the opinions expressed
elsewhere in this thread that heavy reliance on COUNT(*) represents
bad application design, I'm not sure that such a patch would meet with
general approval.

Actually the patch shown above is optimizing COUNT(*) and nothing else,
too, since it's hard to conceive of any other zero-argument aggregate.

Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
I don't think any of the previous discussion in this thread is on-point
at all, except for the parts where people suggested avoiding it.

regards, tom lane


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: "david(at)lang(dot)hm" <david(at)lang(dot)hm>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 17:36:46
Message-ID: 4CB49CAE.8060106@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

david(at)lang(dot)hm wrote:
> On Tue, 12 Oct 2010, Mladen Gogala wrote:
>
>
>> david(at)lang(dot)hm wrote:
>>
>>> from a PR point of view, speeding up the trivial count(*) case could be
>>> worth it, just to avoid people complaining about it not being fast.
>>>
>>>
>>>
>> Fixing PR stuff is not the approach that I would take. People are complaining
>> about select count(*) because they're using it in all the wrong places.
>>
>
> that may be the case, but if it's possible to make it less painful it will
> mean more people use postgres, both because it works better for them when
> they are using the suboptimal programs, and because when people do
> their trivial testing of databases to decide which one they will use, they
> won't rule out postgres because "it's so slow"
>
>

There is no free lunch. If the count field is maintained somewhere, the
concurrency will suffer. I find the idea of fixing the "count delusion"
ridiculous, may Richard Dawkins forgive me for this pun. Saying that
something is slow without testing and proper consideration is ridiculous.
As a DBA, I usually get complaints like "the database is slow today" 3
times before lunch, every day. The database is never slow; the database is
a warehouse where you keep your data. What is slow is the access to the
data, and that is done by, guess what, the application program. Almost
always, it's the application that's slow, not the database.

As for the "select count(*)" idiom, what are you trying to do? Where are
you using it? If you are using it for pagination, consider the possibility
of not specifying the number of pages on the website, just the links "next
-->" and "prev <--". Alternatively, you can fetch a small amount into the
web page and direct the users who would like to see the complete
information to a background reporting tool. Mixing batch reports and
online reports is a very easy thing to do. If you are using it to
establish existence, you're doing it wrong.

I've had a problem like that this morning. A developer came to me with the
usual phrase that the "database is slow". It was a PHP form which should
write an output file and let the user know where the file is. The function
looks like this:

function put_xls($sth) {
global $FNAME;
$FNAME=$FNAME.".xls";
$lineno=0;
$ncols=$sth->FieldCount();
for ($i = 0;$i <= $ncols;$i++) {
$cols[$i] = $sth->FetchField($i);
$colnames[$i]=$cols[$i]->name;
}
$workbook = new Spreadsheet_Excel_Writer("/software$FNAME");
$format_bold =& $workbook->addFormat();
$format_bold->setBold();
$format_bold->setAlign('left');
$format_left =& $workbook->addFormat();
$format_left->setAlign('left');
$worksheet = & $workbook->addWorksheet('Moreover Search');
$worksheet->writeRow($lineno++,0,$colnames,$format_bold);
while($row=$sth->FetchRow()) {
$worksheet->writeRow($lineno++,0,$row,$format_left);
}
$workbook->close();
$cnt=$sth->Recordcount();
return($cnt);
}

The relevant includes are here:

require ('Date.php');
require ('adodb5/tohtml.inc.php');
require_once ('adodb5/adodb.inc.php');
require_once ('adodb5/adodb-exceptions.inc.php');
require_once 'Spreadsheet/Excel/Writer.php';
$ADODB_FETCH_MODE = ADODB_FETCH_NUM;

So, what is the problem here? Why was the "database slow"? As it turns
out, the PEAR module for writing Excel spreadsheets, which is the tool
used here, creates the entire spreadsheet in memory and writes it out
on the "close" command. What was spinning was the "httpd" process; the
database was completely and utterly idle, twiddling its thumbs and
awaiting orders. Using "fputcsv" instead made the function fly. The only
thing that was lost was the bold column titles. Changing little things
can result in big performance gains. Making "select count(*)"
unnaturally fast would be pandering to bad programming practices. I am
not sure that this is a desirable development. You can't expect people
to adjust the database software to your application. Applications are
always database specific. Writing an application that will access a
PostgreSQL database is not the same as writing an application that will
access an Oracle database.
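
As a concrete sketch of the existence-check point above, with made-up table
and column names: the EXISTS form stops at the first matching row instead
of counting them all.

-- slow: counts every matching row just to test existence
SELECT count(*) FROM orders WHERE customer_id = 42;

-- fast: stops at the first match
SELECT EXISTS (SELECT 1 FROM orders WHERE customer_id = 42);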

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 18:22:01
Message-ID: 4CB4A749.3080705@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-12 19:07, Tom Lane wrote:
> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
> I don't think any of the previous discussion in this thread is on-point
> at all, except for the parts where people suggested avoiding it.
>

I would have to say that although it is nice to get count(*) faster, I
think your testing is way too simple.

It pretty much proves that in terms of the code involved in the
count(*) process there is not much to be achieved. But your table
has way too little payload. As PG currently is, it will start by pushing
data off to TOAST when the tuple size reaches 1KB,
and the speed of count(*) is very much dominated by the amount
of "dead weight" it has to draw in together with the heap access for the
row on accessing the table. Creating a case where the table is this
slim is (in my viewpoint) very much to the extreme on the small side.

Just having 32 bytes of "payload" would more or less double
your time to count, if I read your test results correctly .. and in the
situation where disk access would be needed .. way more.

Dividing pg_relation_size by the amount of tuples in our production
system, I end up having no avg tuple size less than 100 bytes.

.. without having complete insight.. a visibility map that could be used in
conjunction with indices would solve that. What the cost would be
of maintaining it is also a factor.

Jesper

--
Jesper


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 18:58:13
Message-ID: 4292.1286909893@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jesper Krogh <jesper(at)krogh(dot)cc> writes:
> On 2010-10-12 19:07, Tom Lane wrote:
>> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.

> Just having 32 bytes of "payload" would more or less double
> your time to count, if I read your test results correctly .. and in the
> situation where disk access would be needed .. way more.

> Dividing pg_relation_size by the amount of tuples in our production
> system, I end up having no avg tuple size less than 100 bytes.

Well, yeah. I deliberately tested with a very narrow table so as to
stress the per-row CPU costs as much as possible. With any wider table
you're just going to be I/O bound.

> .. without having complete insight.. a visibility map that could be used in
> conjunction with indices would solve that. What the cost would be
> of maintaining it is also a factor.

I'm less than convinced that that approach will result in a significant
win. It's certainly not going to do anything to convert COUNT(*) into
an O(1) operation, which frankly is what the complainants are expecting.
There's basically no hope of solving the "PR problem" without somehow
turning COUNT(*) into a materialized-view reference. We've discussed
that in the past, and know how to do it in principle, but the complexity
and distributed overhead are daunting.

regards, tom lane


From: "Pierre C" <lists(at)peufeu(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "Neil Whelchel" <neil(dot)whelchel(at)gmail(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 21:35:01
Message-ID: op.vkhfkn1yeorkce@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


>> suggest that 99% instances of the "select count(*)" idiom are probably
>> bad use of the SQL language.

Well, suppose you paginate results. If the user sees that the search query
returns 500 pages, there are two options :

- you're google, and your sorting algorithms are so good that the answer
the user wants is in the first page
- or the user will refine his search by entering more keywords to get a
manageable result set

So, in both cases, the count(*) was useless anyway. And the slowest ones
are the most useless, since the user will immediately discard the result
and refine his query.

If your full text search is slow, try Xapian or Lucene.
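
A minimal sketch of count-free pagination along these lines, with made-up
table and column names: fetch one extra row to decide whether a "next" link
is needed, and skip count(*) entirely.

-- page 3 at 50 rows per page: ask for 51 rows
SELECT id, title
FROM search_results
ORDER BY rank DESC, id
LIMIT 51 OFFSET 100;
-- if 51 rows come back, display the first 50 and show a "next" link;
-- if fewer, this is the last page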


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: Joe Uhl <joeuhl(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 22:21:31
Message-ID: 201010121521.32086.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tuesday 12 October 2010 07:19:57 you wrote:
> >> The biggest single problem with "select count(*)" is that it is
> >> seriously overused. People use that idiom to establish existence, which
> >> usually leads to a performance disaster in the application using it,
> >> unless the table has no more than few hundred records. SQL language, of
> >> which PostgreSQL offers an excellent implementation, offers [NOT]
> >> EXISTS clause since its inception in the Jurassic era. The problem is
> >> with the sequential scan, not with counting. I'd even go as far as to
> >> suggest that 99% instances of the "select count(*)" idiom are probably
> >> bad use of the SQL language.
> >
> > I agree, I have seen many very bad examples of using count(*). I will go
> > so far as to question the use of count(*) in my examples here. Is there
> > a better way to come up with a page list than using count(*)? What is
> > the best method to make a page of results and a list of links to other
> > pages of results? Am I barking up the wrong tree here?
>
> One way I have dealt with this on very large tables is to cache the
> count(*) at the application level (using memcached, terracotta, or
> something along those lines) and then increment that cache whenever you
> add a row to the relevant table. On application restart that cache is
> re-initialized with a regular old count(*). This approach works really
> well and all large systems in my experience need caching in front of the
> DB eventually. If you have a simpler system with say a single
> application/web server you can simply store the value in a variable, the
> specifics would depend on the language and framework you are using.

I use this method whenever possible. I talked about it in my first post.
I generally keep a table around I call counts. It has many rows that store
count numbers from frequently used views.
The one that I can't do anything about is the case where you have no control
over the WHERE clause (or where there may be simply too many options to count
everything ahead of time without making things even slower). That is the point
of this entire thread, or was... ;)
-Neil-

>
> Another more all-DB approach is to create a statistics tables into which
> you place aggregated statistics rows (num deleted, num inserted, totals,
> etc) at an appropriate time interval in your code. So you have rows
> containing aggregated statistics information for the past and some tiny
> portion of the new data happening right now that hasn't yet been
> aggregated. Queries then look like a summation of the aggregated values
> in the statistics table plus a count(*) over just the newest portion of
> the data table and are generally very fast.
>
> Overall I have found that once things get big the layers of your app
> stack start to blend together and have to be combined in clever ways to
> keep speed up. Postgres is a beast but when you run into things it
> can't do well just find a way to cache it or make it work together with
> some other persistence tech to handle those cases.


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Pierre C <lists(at)peufeu(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 22:30:38
Message-ID: 4CB4E18E.2060906@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Pierre C wrote:
>>
>
> Well, suppose you paginate results. If the user sees that the search query
> returns 500 pages, there are two options :
>
With Google, I usually lose patience on page 3. All that I, as an
end user, need to know is whether there are more than 10 pages. The
fact that there are 1776 pages in the result set is not particularly
useful to me. I couldn't care less whether the number of returned pages
is 1492, 1776 or 1861; I'm going to look at, at most, the first 5 of them.

> - you're google, and your sorting algorithms are so good that the answer
> the user wants is in the first page
> - or the user will refine his search by entering more keywords tu get a
> manageable result set
>
> So, in both cases, the count(*) was useless anyway. And the slowest ones
> are the most useless, since the user will immediatey discard the result
> and refine his query.
>
> If your full text search is slow, try Xapian or Lucene.
>
>
May I also recommend Sphinx? It's a very nice text search engine, with
a price equal to that of Lucene.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-12 22:33:33
Message-ID: 201010121533.34047.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tuesday 12 October 2010 08:39:19 Dan Harris wrote:
> On 10/11/10 8:02 PM, Scott Carey wrote:
> > would give you a 1MB read-ahead. Also, consider XFS and its built-in
> > defragmentation. I have found that a longer lived postgres DB will get
> > extreme file fragmentation over time and sequential scans end up mostly
> > random. On-line file defrag helps tremendously.
>
> We just had a corrupt table caused by an XFS online defrag. I'm scared
> to use this again while the db is live. Has anyone else found this to
> be safe? But, I can vouch for the fragmentation issue, it happens very
> quickly in our system.
>
> -Dan

I would like to know the details of what was going on that caused your
problem. I have been using XFS for over 9 years, and it has never caused any
trouble at all in a production environment. Sure, I had many problems with it
on the test bench, but in most cases the issues were very clear and easy to
avoid in production. There were some (older) XFS tools that caused some
problems, but that is in the past, and as time goes on, it seems to take less and
less planning to make it work properly.
-Neil-


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: "Pierre C" <lists(at)peufeu(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 23:19:33
Message-ID: 201010121619.34839.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tuesday 12 October 2010 14:35:01 you wrote:
> >> suggest that 99% instances of the "select count(*)" idiom are probably
> >> bad use of the SQL language.
>
> Well, suppose you paginate results. If the user sees that the search query
> returns 500 pages, there are two options :
>
> - you're google, and your sorting algorithms are so good that the answer
> the user wants is in the first page
> - or the user will refine his search by entering more keywords to get a
> manageable result set
>
> So, in both cases, the count(*) was useless anyway. And the slowest ones
> are the most useless, since the user will immediately discard the result
> and refine his query.
>
> If your full text search is slow, try Xapian or Lucene.

I guess I have to comment here again and point out that while I am having this
issue with text searches, I avoid using count(*) in such cases; I just use
next and previous links. The real problem (for me) is when someone
searches a date or time range. My application keeps track of huge amounts of
realtime transactional data. So an administrator might want a report as to
what some data point did yesterday between 3 and 4 PM. Under normal conditions
the range of records that match can be between 0 and over 5,000. This is
really killing me, especially when the reporting people want a list of how many
transactions each point in a given zipcode had this morning
between 8 and 9 AM; it takes about 5 minutes to run on a server that has
enough ram to hold the entire table!

Pseudo query:
Show how many transactions per node in zipcode 92252 between 8:00 and 9:00
today:

point_number | number_of_transactions
65889 | 31
34814 | 4865
28349 | 0
3358 | 364
...

24 total rows, > 5 minutes.
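
Written out, that pseudo query corresponds to something like the following;
the table name and time bounds here are illustrative guesses:

SELECT point_number, count(*) AS number_of_transactions
FROM transactions
WHERE zipcode = '92252'
  AND t_stamp >= current_date + time '08:00'
  AND t_stamp <  current_date + time '09:00'
GROUP BY point_number;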

Then they want every node to be a link to a list of the actual data within the
specified timeframe.
This is where I have to do the same query twice: first to find out how many for
the page links, then again to get a page of results.
Sure, I could keep tables around that have numbers by the hour, minute, day or
whatever to cache up results for speeding things up, but then the problem is
that when the data is put into the server, there are so many statistics tables
to update that the front end becomes a huge problem. Also, it makes for a huge
mess of tables to think about when I need to make a report.

-Neil-


From: Dan Harris <fbsd(at)drivefaster(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 00:00:11
Message-ID: 4CB4F68B.1010709@drivefaster.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/12/10 4:33 PM, Neil Whelchel wrote:
> On Tuesday 12 October 2010 08:39:19 Dan Harris wrote:
>> On 10/11/10 8:02 PM, Scott Carey wrote:
>>> would give you a 1MB read-ahead. Also, consider XFS and its built-in
>>> defragmentation. I have found that a longer lived postgres DB will get
>>> extreme file fragmentation over time and sequential scans end up mostly
>>> random. On-line file defrag helps tremendously.
>> We just had a corrupt table caused by an XFS online defrag. I'm scared
>> to use this again while the db is live. Has anyone else found this to
>> be safe? But, I can vouch for the fragmentation issue, it happens very
>> quickly in our system.
>>
>> -Dan
> I would like to know the details of what was going on that caused your
> problem. I have been using XFS for over 9 years, and it has never caused any
> trouble at all in a production environment. Sure, I had many problems with it
> on the test bench, but in most cases the issues were very clear and easy to
> avoid in production. There were some (older) XFS tools that caused some
> problems, but that is in the past, and as time goes on, it seems to take less and
> less planning to make it work properly.
> -Neil-
>
There were roughly 50 transactions/sec going on at the time I ran it.
xfs_db reported 99% fragmentation before it ran ( we haven't been
running it via cron ). The operation completed in about 15 minutes (
360GB of used data on the file system ) with no errors. Everything
seemed fine until the next morning when a user went to query a table and we
got a message about a "missing" file inside the pg cluster. We were
unable to query the table at all via psql. It was a bit of a panic
situation so we restored that table from backup immediately and the
problem was solved without doing more research.

This database has been running for years with no problem (and none
since). That was the first time I tried to do an on-line defrag, and
that was the only unusual variable introduced into the system at that
time, so it was a strong enough correlation for me to believe that caused
it. Hopefully this was just a corner case..

-Dan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 06:45:16
Message-ID: AANLkTimQcMKPVDuHgQC11COJ9_ad6zY1dwXtqMj8LxKq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Oct 12, 2010 at 1:07 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
> I don't think any of the previous discussion in this thread is on-point
> at all, except for the parts where people suggested avoiding it.

I kind of hope that index-only scans help with this, too. If you have
a wide table and a narrow (but not partial) index, and if the
visibility map bits are mostly set, it ought to be cheaper to read the
index than the table - certainly in the case where any disk I/O is
involved, and maybe even if it isn't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 06:47:19
Message-ID: 201010122347.20542.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Sunday 10 October 2010 21:15:56 Neil Whelchel wrote:

> Right now, I am building a test machine with two dual core Intel processors
> and two 15KRPM mirrored hard drives, 1 GB ram. I am using a small amount of
> ram because I will be using small test tables. I may do testing in the
> future with more ram and bigger tables, but I think I can accomplish what
> we are all after with what I have. The machine will be limited to running
> the database server in test, init, bash, and ssh, no other processes will
> be running except for what is directly involved with testing. I will post
> exact specs when I post test results. I will create some test tables, and
> the same tables will be used in all tests. Suggestions for optimal
> Postgres and system configuration are welcome. I will try any suggested
> settings that I have time to test. -Neil-
>

Ok the test machine is up and running:
A few more details: the hard drives are SCSI Ultra-320, the CPUs are 2.8 GHz,
533 MHz FSB. I wanted to make a more memory-cramped machine to keep the table
to RAM ratio closer to the production machines, but for now, all I have are
1GB DDRs, and the machine requires pairs, so total memory is 2GB. Swap is
turned off.

The data I will be using is a couple of days of raw data from a production
system. The columns of interest are numeric and timestamp. I will use the
exact same data for all tests.

Table "public.log"
Column | Type | Modifiers
------------------+-----------------------------+------------------------
batch_id | integer |
t_stamp | timestamp without time zone | not null default now()
raw_data | numeric |
data_value | numeric |
data_value_delta | numeric |
journal_value | numeric |
journal_data | numeric |
machine_id | integer | not null
group_number | integer |
Indexes:
"log_idx" btree (group_number, batch_id)
"log_oid_idx" btree (oid)
"log_t_stamp" btree (t_stamp)

The initial test is with XFS with write barriers turned on, which makes for
very slow writes. The point of the first test is to get a baseline of
everything out-of-the-box. So, here are the numbers:

Insert the data into one table:
crash:~# time psql -U test test -q < log.sql
real 679m43.678s
user 1m4.948s
sys 13m1.893s

crash:~# echo 3 > /proc/sys/vm/drop_caches
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m11.812s
user 0m0.000s
sys 0m0.004s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m3.737s
user 0m0.000s
sys 0m0.000s

As can be seen here, the cache helps..
And the numbers are not all that bad, so let's throw a sabot into the gears:
crash:~# time psql -U test test -c "UPDATE log SET raw_data=raw_data+1"
UPDATE 10050886

real 14m13.802s
user 0m0.000s
sys 0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 3m32.757s
user 0m0.000s
sys 0m0.000s

Just to be sure:
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 2m38.631s
user 0m0.000s
sys 0m0.000s

It looks like the cache knocked about a minute off; still quite sad.
So, I shut down Postgres, ran xfs_fsr, and started Postgres:
crash:~# echo 3 > /proc/sys/vm/drop_caches
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 1m36.304s
user 0m0.000s
sys 0m0.000s

So it seems that defragmentation knocked another minute off.
Let's see how much cache helps now:
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 1m34.873s
user 0m0.000s
sys 0m0.000s

Not much... And we are a long way from the 3.7 seconds with a freshly inserted
table. Maybe the maid can help here.
crash:~# time psql -U test test -c "VACUUM log;"
VACUUM

real 22m31.931s
user 0m0.000s
sys 0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 1m30.927s
user 0m0.000s
sys 0m0.000s

Nope...
So, possible conclusions are:
1. Even with VACUUM database table speed degrades as tables are updated.
2. Time testing on a freshly INSERTed table gives results that are not real-
world.
3. Filesystem defragmentation helps (some).
4. Cache only makes a small difference once a table has been UPDATEd.

I am going to leave this configuration running for the next day or so. This
way I can try any suggestions and play with any more ideas that I have.
I will try these same tests on ext4 later, along with any good suggested
tests.
I will try MySQL with the same data with both XFS and ext4.
-Neil-


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 07:19:26
Message-ID: 4CB55D7E.8080902@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 13/10/10 19:47, Neil Whelchel wrote:
>
> Nope...
> So, possible conclusions are:
> 1. Even with VACUUM database table speed degrades as tables are updated.
> 2. Time testing on a freshly INSERTed table gives results that are not real-
> world.
> 3. Filesystem defragmentation helps (some).
> 4. Cache only makes a small difference once a table has been UPDATEd.
>
> I am going to leave this configuration running for the next day or so. This
> way I can try any suggestions and play with any more ideas that I have.
> I will try these same tests on ext4 later, along with any good suggested
> tests.
> I will try MySQL with the same data with both XFS and ext4.
> -Neil-
>
>

I think the major effect you are seeing here is that the UPDATE has
made the table twice as big on disk (even after VACUUM etc), and it has
gone from fitting in ram to not fitting in ram - so it cannot be
effectively cached anymore.

This would not normally happen in real life (assuming UPDATEs only
modify a small part of a table per transaction). However, administration
updates (e.g. 'oh! - ref 1 should now be ref 2, please update
everything') *will* cause the table size to double.

This is an artifact of Postgres's non-overwriting storage manager -
MySQL will update in place and you will not see this.

Try VACUUM FULL on the table and retest.

regards

Mark


From: "Pierre C" <lists(at)peufeu(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "Neil Whelchel" <neil(dot)whelchel(at)gmail(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 07:46:25
Message-ID: op.vkh7vno8eorkce@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> I guess I have to comment here again and point out that while I am
> having this
> issue with text searches, I avoid using count(*) in such cases, I just
> use
> next and previous links.

Unfortunately, sometimes you have to do an ORDER BY on search results, and
then all the rows have to be read...

> Where the real problem (for me) is that when someone
> searches a date or time range. My application keeps track of huge

Have you tried CLUSTER ?

Also, it is sad to say, but if you need an engine able to use index-only
scans which would fit this type of query, replicate the table to MyISAM.
Unfortunately, the MySQL optimizer is really not so smart about complex
reporting queries (no hash joins, no hash aggregates), so if you don't have
a covering multicolumn index that you can use for an index-only scan in your
query, you'll get either a really huge sort or a really nasty nested-loop
index scan...
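
For illustration, something along these lines - column names borrowed from
the log table posted earlier in this thread, the index name is made up:

CREATE INDEX log_range_idx ON log (t_stamp, machine_id);

SELECT count(*) FROM log
WHERE t_stamp >= '2010-10-01' AND t_stamp < '2010-10-08';

On MyISAM a count like this can be satisfied from the index alone (EXPLAIN
shows "Using index"), since every column the query touches is in the index.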


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 08:38:38
Message-ID: 201010130138.39893.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wednesday 13 October 2010 00:19:26 Mark Kirkwood wrote:
> On 13/10/10 19:47, Neil Whelchel wrote:
> > Nope...
> > So, possible conclusions are:
> > 1. Even with VACUUM database table speed degrades as tables are updated.
> > 2. Time testing on a freshly INSERTed table gives results that are not
> > real- world.
> > 3. Filesystem defragmentation helps (some).
> > 4. Cache only makes a small difference once a table has been UPDATEd.
> >
> > I am going to leave this configuration running for the next day or so.
> > This way I can try any suggestions and play with any more ideas that I
> > have. I will try these same tests on ext4 later, along with any good
> > suggested tests.
> > I will try MySQL with the same data with both XFS and ext4.
> > -Neil-
>
> I think the major effect you are seeing here is that the UPDATE has
> made the table twice as big on disk (even after VACUUM etc), and it has
> gone from fitting in ram to not fitting in ram - so it cannot be
> effectively cached anymore.
>
> This would not normally happen in real life (assuming UPDATEs only
> modify a small part of a table per transaction). However, administration
> updates (e.g. 'oh! - ref 1 should now be ref 2, please update
> everything') *will* cause the table size to double.
>
> This is an artifact of Postgres's non-overwriting storage manager -
> MySQL will update in place and you will not see this.
>
> Try VACUUM FULL on the table and retest.
>
> regards
>
> Mark

There seems to be a lot of discussion about VACUUM FULL, and its problems. The
overall buzz seems to be that VACUUM FULL is a bad idea (I could be wrong
here). It has been some time since I have read the changelogs, but I seem to
remember that there have been some major changes to VACUUM FULL recently.
Maybe this needs to be revisited in the documentation.

crash:~# time psql -U test test -c "VACUUM FULL log;"
VACUUM

real 4m49.055s
user 0m0.000s
sys 0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m9.665s
user 0m0.000s
sys 0m0.004s

A huge improvement from the minute and a half before the VACUUM FULL.
crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
count
----------
10050886
(1 row)

real 0m3.786s
user 0m0.000s
sys 0m0.000s

And the cache helps...
So, we are right back to within 10ms of where we started after INSERTing the
data, but it took a VACUUM FULL to accomplish this (by making the table fit in
RAM).
This is a big problem on a production machine as the VACUUM FULL is likely to
get in the way of INSERTing realtime data into the table.

So to add to the conclusion pile:
5. When you have no control over the WHERE clause, count(*) may be sent
through more rows of a table than would fit in RAM, and your performance will
be too slow; count(*) is missing a LIMIT feature to avoid this (see the
workaround sketch below).
6. Keep tables that are to be updated frequently as narrow as possible: link
them to wider tables to store the columns that are less frequently updated.
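
(A workaround sketch for point 5: push a LIMIT into a subquery, so the scan
stops as soon as the cap is reached. The WHERE criteria here are just an
example against the log table from above:

SELECT count(*) AS capped_count
FROM (SELECT 1 FROM log WHERE machine_id = 1 LIMIT 100000) AS capped;

If capped_count comes back as 100000, all you know is "at least 100000 rows
match", which is usually enough to render a page list.)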

So with our conclusion pile so far we can deduce that if we were to keep all
of our data in two column tables (one to link them together, and the other to
store one column of data), we stand a much better chance of making the entire
table to be counted fit in RAM, so we simply apply the WHERE clause to a
specific table as opposed to a column within a wider table... This seems to
defeat the entire goal of the relational database...

-Neil-


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 08:40:53
Message-ID: 4CB57095.6060306@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/13/2010 2:47 AM, Neil Whelchel wrote:
> Even with VACUUM database table speed degrades

What the heck is the "database table speed"? Tables don't do anything.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 08:44:09
Message-ID: 4CB57159.2090501@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/13/2010 3:19 AM, Mark Kirkwood wrote:
> I think that major effect you are seeing here is that the UPDATE has
> made the table twice as big on disk (even after VACUUM etc), and it has
> gone from fitting in ram to not fitting in ram - so cannot be
> effectively cached anymore.
>
In the real world, tables are larger than the available memory. I have
tables of several hundred gigabytes in size. Tables shouldn't be
"effectively cached", the next step would be to measure "buffer cache
hit ratio", tables should be effectively used.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 08:50:23
Message-ID: 4CB572CF.4080709@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 13/10/10 21:38, Neil Whelchel wrote:
>
> So with our conclusion pile so far we can deduce that if we were to keep all
> of our data in two column tables (one to link them together, and the other to
> store one column of data), we stand a much better chance of making the entire
> table to be counted fit in RAM, so we simply apply the WHERE clause to a
> specific table as opposed to a column within a wider table... This seems to
> defeat the entire goal of the relational database...
>
>

That is a bit excessive I think - a more reasonable conclusion to draw
is that tables bigger than ram will drop to IO max speed to scan, rather
than DIMM max speed...

There are things you can do to radically improve IO throughput - e.g. a
pair of AMC or ARECA 12-slot RAID cards set up as RAID 10 and tuned properly
should give you a max sequential throughput of something like 12*100
MB/s = 1.2 GB/s. So your example table (estimated at 2GB) should be able to
be counted by Postgres in about 3-4 seconds...

This assumes a more capable machine than you are testing on I suspect.

Cheers

Mark


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 10:16:11
Message-ID: 201010130316.12922.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wednesday 13 October 2010 01:50:23 Mark Kirkwood wrote:
> On 13/10/10 21:38, Neil Whelchel wrote:
> > So with our conclusion pile so far we can deduce that if we were to keep
> > all of our data in two column tables (one to link them together, and the
> > other to store one column of data), we stand a much better chance of
> > making the entire table to be counted fit in RAM, so we simply apply the
> > WHERE clause to a specific table as opposed to a column within a wider
> > table... This seems to defeat the entire goal of the relational
> > database...
>
> That is a bit excessive I think - a more reasonable conclusion to draw
> is that tables bigger than ram will drop to IO max speed to scan, rather
> than DIMM max speed...
>
> There are things you can do to radically improve IO throughput - e.g. a
> pair of AMC or ARECA 12-slot RAID cards set up as RAID 10 and tuned properly
> should give you a max sequential throughput of something like 12*100
> MB/s = 1.2 GB/s. So your example table (estimated at 2GB) should be able to
> be counted by Postgres in about 3-4 seconds...
>
> This assumes a more capable machine than you are testing on I suspect.
>
> Cheers
>
> Mark
The good ol' brute force approach! I knew I'd see this one sooner or later.
Though I was not sure if I was going to see the 16TB of RAM suggestion first.
Seriously though, as the title of this thread suggests, everything is
relative. Sure count(*) and everything else will work faster with more system
power. It just seems to me that count(*) is slower than it could be given a
set of conditions. I started this thread because I think that there must be a
better way to count matches from an INDEXed column than shoving the entire
table through RAM (including columns that you are not interested in at the
minute). And even worse, when you have no (reasonable) control of the WHERE
clause, there is nothing preventing your system from thrashing for the next
week because somebody put in criteria that matched a few TB of records, and
there is no way to LIMIT count(*) other than externally timing the query and
aborting it if it takes too long. What is needed is a way to determine how
many rows are likely to match a given WHERE clause so we can cut off useless
queries, but we need a fast count(*) for that, or a limit on the existing
one... I seem to remember saying something about an index driven estimate(*)
at one point...
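
(For the record, there are two standard tricks along these lines today:
pg_class.reltuples holds the planner's whole-table estimate, and for an
arbitrary WHERE clause you can scrape the planner's row estimate out of
EXPLAIN. A sketch - the function name is made up, but the technique is
essentially the one described on the PostgreSQL wiki:

SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'log';

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS integer AS $$
DECLARE
    rec  record;
    rows integer;
BEGIN
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        -- pick the row estimate off the top plan node
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql VOLATILE;

SELECT count_estimate('SELECT 1 FROM log WHERE group_number = 1');

The answer is only as good as the statistics, but it comes back in
milliseconds no matter how big the table is.)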

I might go as far as to rattle the cage of the developers to see if it makes
any sense to add some column oriented storage capability to Postgres. That
would be the hot ticket to be able to specify an attribute on a column so that
the back end could shadow or store a column in a column oriented table so
aggregate functions could work on them with good efficiency, or is that an
INDEX?

Since the thread has started, I have had people ask about different system
configurations, especially the filesystem (XFS, ext4...). I have never tested
ext4, and since we are all involved here, I thought that I could do so and
share my results for others, that is why I got into time testing stuff.
Time testing count(*) in my later postings is really not the point as count is
simply dragging the entire table off of the RAID through RAM, I can use any
other function like max()... No, that can narrow down its scan with an INDEX...
Ok, sum(), there we go!
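
For example, with the log table and its log_t_stamp index from my earlier
post:

SELECT max(t_stamp) FROM log;   -- planner rewrites this to an index scan
                                -- on log_t_stamp plus LIMIT 1
SELECT sum(raw_data) FROM log;  -- no index can help; drags the whole table
                                -- through RAM, just like count(*)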

-Neil-


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 10:41:52
Message-ID: 4CB58CF0.4020601@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 12.10.10 14:44, Craig Ringer wrote:
>
>> in the case where you are doing a count(*) where query and the where is
>> on an indexed column, could the search just look at the index + the
>> visibility mapping rather than doing an sequential search through the
>> table?
>
> Nope, because the visibility map, which is IIRC only one bit per page,
> doesn't record how many tuples there are on the page, or enough
> information about them to determine how many of them are visible to
> the current transaction*.
I'd say it can tell you that you need not recheck a given tuple, can't it?
You still have to count all index tuples and recheck the ones that are
uncertain. Does it work this way? This could help a lot for wide tuples
in the table, with a narrow index and mostly read-only data.

Best regards, Vitalii Tymchyshyn


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 10:54:19
Message-ID: 4CB58FDB.3080505@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 12.10.10 21:58, Tom Lane wrote:
>
> I'm less than convinced that that approach will result in a significant
> win. It's certainly not going to do anything to convert COUNT(*) into
> an O(1) operation, which frankly is what the complainants are expecting.
> There's basically no hope of solving the "PR problem" without somehow
> turning COUNT(*) into a materialized-view reference. We've discussed
> that in the past, and know how to do it in principle, but the complexity
> and distributed overhead are daunting.
>
>
I've thought about "aggregate" indexes, something like
create index index_name on table_name(count(*) group by column1, column2);
OR
create index index_name on table_name(count(*));
for a table-wide count.

To make it usable one would need:
1) A third aggregate function, SMERGE, that can merge one aggregate
state into another
2) The index should be a regular index (e.g. btree) on column1, column2
that for each pair has a list of the pages to which its data may belong
(in past/currently running transactions), plus an aggregate state for
each page that was frozen previously
When the index is used, it can use the precalculated values for "pages with
all tuples vacuumed" (I suspect this is information from the visibility map)
and should do the regular calculation, with visibility checks and everything
that's needed, for all non-frozen pages.
When vacuum processes a page, it should (in a sync or async way)
calculate the aggregate values for that page.

IMHO such indexes would make materialized views/triggers/high-level
caches unneeded in most cases.

Best regards, Vitalii Tymchyshyn


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 11:42:00
Message-ID: 4CB59B08.6030900@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 13/10/2010 12:38 AM, Jesper Krogh wrote:

> If some clever postgres hacker could teach postgres to allocate blocks
> using posix_fallocate in quite large batches, say .. something like:
> fallocate(min(current_relation_size *0.1,1073741824))

There doesn't seem to be any use of posix_fallocate in the sources, at
least according to git grep. The patch that introduced posix_fadvise use
apparently had posix_fallocate in it, but that use appears to have been
removed down the track.

It's worth noting that posix_fallocate sucks if your file system doesn't have
intelligent support for it. IIRC it's horrible on ext3, where it can
take a while to return while it allocates (and IIRC zeroes!) all those
blocks. This may be part of why it's not used. In past testing with
posix_fallocate for other tools I've also found rather mixed performance
results - it can slow things down rather than speed them up, depending
on the file system in use and all sorts of other factors.

If Pg was to use posix_fallocate, it'd probably need control over it on
a per-tablespace basis.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Dan Harris <fbsd(at)drivefaster(dot)net>
Cc: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 12:12:19
Message-ID: 4CB5A223.4070609@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Dan Harris wrote:
> I'm not sure how to figure out what version of XFS we're on.. but it's
> Linux kernel 2.6.24-24 x86_64 on Ubuntu Server 8.04.3. Postgres
> version 8.3

There's the kernel side support that matches your kernel, as well as the
xfsprogs package. The latter is where a lot of distributions haven't
kept up with upstream changes, and where I suspect the defragmenter bug
you ran into is located.

Hardy ships with 2.9.4-2: http://packages.ubuntu.com/hardy/xfsprogs

The work incorporating a more stable XFS into RHEL started with xfsprogs
3.0.1-6 going into Fedora 11, and 3.1.X would represent a current
release. So your Ubuntu kernel is two major improvement releases
behind; 3.0 and 3.1 were the upgrades to xfsprogs where things really
got going again, making that code modern and solid. Ubuntu Lucid
switched to 3.1.0, RHEL6 will probably ship 3.1.0 too.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 12:33:28
Message-ID: 4CB5A718.6050206@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/13/2010 8:12 AM, Greg Smith wrote:
> The work incorporating a more stable XFS into RHEL started with xfsprogs
> 3.0.1-6 going into Fedora 11, and 3.1.X would represent a current
> release. So your Ubuntu kernel is two major improvement releases
> behind; 3.0 and 3.1 were the upgrades to xfsprogs where things really
> got going again, making that code modern and solid. Ubuntu Lucid
> switched to 3.1.0, RHEL6 will probably ship 3.1.0 too.
>

I am afraid that my management will not let me use anything that doesn't
exist as an RPM package in the current Red Hat distribution. No Ubuntu,
no Fedora, no manual linking. There will always be that ominous
question: how many other companies are using XFS? From the business
perspective, questions like that make perfect sense.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 13:02:21
Message-ID: 4CB5ADDD.7020402@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> I am afraid that my management will not let me use anything that
> doesn't exist as an RPM package in the current Red Hat distribution. No
> Ubuntu, no Fedora, no manual linking. There will always be that
> ominous question: how many other companies are using XFS? From the
> business perspective, questions like that make perfect sense.

XFS support is available as an optional module starting in RHEL 5.5. In
CentOS, you just grab it, so that's what I've been doing. My
understanding is that you may have to ask your sales rep to enable
access to it under the official RedHat Network channels if you're using
a subscription from them. I'm not sure exactly what the support
situation is with it, but it's definitely available as an RPM from RedHat.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 13:27:34
Message-ID: AANLkTimtCk7Xrc7SZi11EUXKoGioP0TqwSnBYKekdkrx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 13, 2010 at 4:38 AM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> There seems to be a lot of discussion about VACUUM FULL, and its problems. The
> overall buzz seems to be that VACUUM FULL is a bad idea (I could be wrong
> here). It has been some time since I have read the changelogs, but I seem to
> remember that there have been some major changes to VACUUM FULL recently.
> Maybe this needs to be revisited in the documentation.

In 9.0, VACUUM FULL does something similar to what CLUSTER does. This
is a much better idea than what it did in 8.4 and prior.

> crash:~# time psql -U test test -c "VACUUM FULL log;"
> VACUUM
>
> real    4m49.055s
> user    0m0.000s
> sys     0m0.000s
>
> crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
>  count
> ----------
>  10050886
> (1 row)
>
> real    0m9.665s
> user    0m0.000s
> sys     0m0.004s
>
> A huge improvement from the minute and a half before the VACUUM FULL.

This is a very surprising result that I would like to understand
better. Let's assume that your UPDATE statement bloated the table by
2x (you could use pg_relation_size to find out exactly; the details
probably depend on fillfactor which you might want to lower if you're
going to do lots of updates). That ought to mean that count(*) has to
grovel through twice as much data, so instead of taking 9 seconds it
ought to take 18 seconds. Where the heck is the other 1:12 going?
This might sort of make sense if the original table was laid out
sequentially on disk and the updated table was not, but how and why
would that happen?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 13:28:43
Message-ID: AANLkTi=Cn1N8keywMbePZPiezg9cn_EXQZiLzF7cym+q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> I might go as far as to rattle the cage of the developers to see if it makes
> any sense to add some column oriented storage capability to Postgres. That
> would be the hot ticket to be able to specify an attribute on a column so that
> the back end could shadow or store a column in a column oriented table so
> aggregate functions could work on them with good efficiency, or is that an
> INDEX?

I'd love to work on that, but without funding it's tough to find the
time. It's a big project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Neil Whelchel" <neil(dot)whelchel(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 13:45:00
Message-ID: 4CB5718D02000025000368E0@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:

> crash:~# time psql -U test test -c "UPDATE log SET
> raw_data=raw_data+1"
> UPDATE 10050886
>
> real 14m13.802s
> user 0m0.000s
> sys 0m0.000s
>
> crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
> count
> ----------
> 10050886
> (1 row)
>
> real 3m32.757s
> user 0m0.000s
> sys 0m0.000s
>
> Just to be sure:
> crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
> count
> ----------
> 10050886
> (1 row)
>
> real 2m38.631s
> user 0m0.000s
> sys 0m0.000s
>
> It looks like cache knocked about a minute off

That's unlikely to be caching, since you just updated the rows.
It's much more likely to be one or both of rewriting the rows as you
read them to set hint bits, or competing with autovacuum.

The large increase after the update probably means you went from a
table which was fully cached to something larger than the total
cache.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 13:49:12
Message-ID: 24768.1286977752@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> writes:
> Insert the data into one table:
> crash:~# time psql -U test test -q < log.sql
> real 679m43.678s
> user 1m4.948s
> sys 13m1.893s

> crash:~# echo 3 > /proc/sys/vm/drop_caches
> crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
> count
> ----------
> 10050886
> (1 row)

> real 0m11.812s
> user 0m0.000s
> sys 0m0.004s

> crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
> count
> ----------
> 10050886
> (1 row)

> real 0m3.737s
> user 0m0.000s
> sys 0m0.000s

> As can be seen here, the cache helps..

That's probably got little to do with caching and everything to do with
setting hint bits on the first SELECT pass.

I concur with Mark's question about whether your UPDATE pushed the table
size across the limit of what would fit in RAM.

regards, tom lane


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 17:59:48
Message-ID: 4CB5F394.6090909@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-13 15:28, Robert Haas wrote:
> On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com> wrote:
>
>> I might go as far as to rattle the cage of the developers to see if it makes
>> any sense to add some column oriented storage capability to Postgres. That
>> would be the hot ticket to be able to specify an attribute on a column so that
>> the back end could shadow or store a column in a column oriented table so
>> aggregate functions could work on them with good efficiency, or is that an
>> INDEX?
>>
> I'd love to work on that, but without funding it's tough to find the
> time. It's a big project.
>
Is it hugely different from just getting the visibility map suitable
for doing index-only scans and extracting values from the index
directly as Heikki has explained?

That would essentially do a column-oriented table (the index itself)
of specific columns (or a column set).

... still a huge task though.

--
Jesper


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Slow count(*) again...
Date: 2010-10-13 18:17:19
Message-ID: AANLkTik_EOb3VjD2NxWNUDL91sG-iJTOS6MCAYkv0O9t@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 13, 2010 at 07:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> writes:

> I concur with Mark's question about whether your UPDATE pushed the table
> size across the limit of what would fit in RAM.

Yeah, you said you have ~2GB of ram; just counting the bytes and the
number of rows (not including padding or overhead) puts you around
~670MB. Some quick testing here on a 64-bit box:

=> create table log (batch_id int, t_stamp timestamp without time zone
not null default now(), raw_data numeric, data_value numeric,
data_value_delta numeric, journal_value numeric, journal_data numeric,
machine_id integer not null, group_number integer) with oids;
CREATE TABLE
Time: 34.310 ms

=> insert into log (batch_id, data_value, data_value_delta,
journal_value, journal_data, group_number, machine_id, raw_data)
select 1, 1, 1, 1, 1, 1, 1, 1 from generate_series(1, 10050886);
INSERT 0 10050886
Time: 32818.529 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
969 MB

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 65805.741 ms

=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
1939 MB

=> SELECT count(*) from log;
count
----------
10050886
(1 row)

Time: 11181.005 ms

=> SELECT count(*) from log;
count
----------
10050886
(1 row)

Time: 2825.569 ms

This box has ~6GB ram.

BTW did anyone else hear the below in a Valeris voice?
> And the numbers are not all that bad, so let's throw a sabot into the gears:
> crash:~# time psql -U test test -c "UPDATE log SET raw_data=raw_data+1"


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 19:09:22
Message-ID: AANLkTimjmZX9+pUbCaXf1yrAfiWOrKVDrs8_LiQ1BWxT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 13, 2010 at 02:38, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:

> And the cache helps...
> So, we are right back to within 10ms of where we started after INSERTing the
> data, but it took a VACUUM FULL to accomplish this (by making the table fit in
> RAM).
> This is a big problem on a production machine as the VACUUM FULL is likely to
> get in the way of INSERTing realtime data into the table.

Right, but the real point is how often do you plan on mass updating
the table? That's (hopefully) the only time a vacuum full should be
needed. Otherwise (auto) vacuum will probably work most of the time.

> 6. Keep tables that are to be updated frequently as narrow as possible: Link
> them to wider tables to store the columns that are less frequently updated.

Again, I don't think it's updated frequently so much as mass updated. I
run some databases here that have tens to hundreds of updates every
second. The difference is I don't update *all* 26 million rows at the
same time that often. But if I did, I'd probably want to lower the
fillfactor.

For example:
=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 59387.021 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
1939 MB

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 70549.425 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
2909 MB

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 78551.544 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
3879 MB

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 74443.945 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
4848 MB

Here you see basically linear growth, after some vacuuming:

=> VACUUM log;
VACUUM
Time: 193055.857 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
4848 MB

=> VACUUM log;
VACUUM
Time: 38281.541 ms
whopper=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
4848 MB

=> VACUUM log;
VACUUM
Time: 28.531 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
4848 MB

Hey... it's not shrinking it at all...:
=> VACUUM verbose log;
INFO: vacuuming "public.log"
INFO: "log": found 0 removable, 0 nonremovable row versions in 31 out
of 620425 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 2511 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.01u sec elapsed 0.01 sec.
INFO: vacuuming "pg_toast.pg_toast_10544753"
INFO: index "pg_toast_10544753_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_10544753": found 0 removable, 0 nonremovable row
versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
Time: 29.070 ms

-- ok, let's start over and this time set fillfactor to 50;
=> alter table log set (fillfactor = 50);
=> vacuum full log;
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
1963 MB

-- 2x the default size, let's see what an update does now
=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 70424.752 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
1963 MB

-- hey ! same size

=> update log set raw_data = raw_data+1;
UPDATE 10050886
Time: 58112.895 ms
=> SELECT pg_size_pretty(pg_total_relation_size('log'));
pg_size_pretty
----------------
1963 MB
(1 row)

-- Still the same

So in short... vacuum seems to fall flat with mass updates, so set a
lower fillfactor :).
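
One way to sanity-check that the lower fillfactor is paying off (this works
as long as the updated column isn't indexed, so the updates can be HOT):

=> SELECT n_tup_upd, n_tup_hot_upd
   FROM pg_stat_user_tables WHERE relname = 'log';

If n_tup_hot_upd tracks n_tup_upd, the new row versions are going into the
free space on the same pages instead of bloating the table.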

> So with our conclusion pile so far we can deduce that if we were to keep all
> of our data in two column tables (one to link them together, and the other to
> store one column of data), we stand a much better chance of making the entire
> table to be counted fit in RAM,

I dunno about that... Seems like if you only had 2 tables both would
fail to fit in ram fairly quickly :)

> so we simply apply the WHERE clause to a
> specific table as opposed to a column within a wider table... This seems to
> defeat the entire goal of the relational database...

Sure... that's one answer. See
http://wiki.postgresql.org/wiki/Slow_Counting for more. But the basic
ideas are:
1) estimate the count
2) use triggers and keep the count somewhere else (a sketch of this is below)
3) keep it in ram
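
A minimal sketch of 2), with made-up names, for a table-wide count:

CREATE TABLE log_count (n bigint NOT NULL);
INSERT INTO log_count VALUES (0);

CREATE FUNCTION log_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE log_count SET n = n + 1;
    ELSE  -- DELETE
        UPDATE log_count SET n = n - 1;
    END IF;
    RETURN NULL;  -- AFTER trigger, the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER log_count_mod AFTER INSERT OR DELETE ON log
    FOR EACH ROW EXECUTE PROCEDURE log_count_trig();

SELECT n FROM log_count;  -- instead of SELECT count(*) FROM log

The catch: the single counter row serializes concurrent writers, so busy
systems usually insert +1/-1 delta rows instead and sum them up periodically.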


From: david(at)lang(dot)hm
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 19:37:45
Message-ID: alpine.DEB.2.00.1010131237010.30818@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, 13 Oct 2010, Tom Lane wrote:

> Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> writes:
>
> That's probably got little to do with caching and everything to do with
> setting hint bits on the first SELECT pass.
>
> I concur with Mark's question about whether your UPDATE pushed the table
> size across the limit of what would fit in RAM.

Neil, can you just double the size of your initial test to make sure that
it's too large to fit in ram to start with?

David Lang


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 20:08:26
Message-ID: 201010131308.27760.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wednesday 13 October 2010 05:33:28 Mladen Gogala wrote:
> On 10/13/2010 8:12 AM, Greg Smith wrote:
> > The work incorporating a more stable XFS into RHEL started with xfsprogs
> > 3.0.1-6 going into Fedora 11, and 3.1.X would represent a current
> > release. So your Ubuntu kernel is two major improvement releases
> > behind; 3.0 and 3.1 were the upgrades to xfsprogs where things really
> > got going again, making that code modern and solid. Ubuntu Lucid
> > switched to 3.1.0, RHEL6 will probably ship 3.1.0 too.
>
> I am afraid that my management will not let me use anything that doesn't
> exist as an RPM package in the current Red Hat distribution. No Ubuntu,
> no Fedora, no manual linking. There will always be that ominous
> question: how many other companies are using XFS? From the business
> perspective, questions like that make perfect sense.

XFS sees extensive use in the billing departments of many phone and utility
companies. Maybe not the code that you see in Linux, but the on-disk format,
which I think is unchanged since its original release. (You can use the modern
XFS code in Linux to mount a filesystem from an older SGI machine that used
XFS.) The code in Linux is based on the code that SGI released some time in
2000, which worked at that time very well for the SGI machine. At the time
that SGI came up with XFS, they had realtime in mind. They added specific
features to the filesystem to guarantee IO at a specific rate; this was
intended for database and other realtime applications. I have not looked at
the Linux version to see if it contains these extensions. I will be doing this
soon, however, as my next big project will require a true realtime system.
-Neil-


From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 20:19:06
Message-ID: 201010131319.06792.neil.whelchel@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wednesday 13 October 2010 06:27:34 you wrote:
> On Wed, Oct 13, 2010 at 4:38 AM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
wrote:
> > There seems to be a lot of discussion about VACUUM FULL, and its
> > problems. The overall buzz seems to be that VACUUM FULL is a bad idea (I
> > could be wrong here). It has been some time since I have read the
> > changelogs, but I seem to remember that there have been some major
> > changes to VACUUM FULL recently. Maybe this needs to be revisited in
> > the documentation.
>
> In 9.0, VACUUM FULL does something similar to what CLUSTER does. This
> is a much better idea than what it did in 8.4 and prior.
>
> > crash:~# time psql -U test test -c "VACUUM FULL log;"
> > VACUUM
> >
> > real 4m49.055s
> > user 0m0.000s
> > sys 0m0.000s
> >
> > crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
> > count
> > ----------
> > 10050886
> > (1 row)
> >
> > real 0m9.665s
> > user 0m0.000s
> > sys 0m0.004s
> >
> > A huge improvement from the minute and a half before the VACUUM FULL.
>
> This is a very surprising result that I would like to understand
> better. Let's assume that your UPDATE statement bloated the table by
> 2x (you could use pg_relation_size to find out exactly; the details
> probably depend on fillfactor which you might want to lower if you're
> going to do lots of updates). That ought to mean that count(*) has to
> grovel through twice as much data, so instead of taking 9 seconds it
> ought to take 18 seconds. Where the heck is the other 1:12 going?
> This might sort of make sense if the original table was laid out
> sequentially on disk and the updated table was not, but how and why
> would that happen?
This is likely due to the table not fitting in memory before the VACUUM FULL.
I am glad that you suggested using pg_relation_size; I somehow didn't think of
it at the time. I will redo the test and publish the results of
pg_relation_size.
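
Something like:

SELECT pg_size_pretty(pg_relation_size('log')) AS heap_only,
       pg_size_pretty(pg_total_relation_size('log')) AS with_indexes_and_toast;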
-Neil-


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 21:07:07
Message-ID: 4CB61F7B.4030404@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 13/10/10 23:16, Neil Whelchel wrote:
>
> The good ol' brute force approach! I knew I'd see this one sooner or later.
> Though I was not sure if I was going to see the 16TB of RAM suggestion first.
> Seriously though, as the title of this thread suggests, everything is
> relative. Sure count(*) and everything else will work faster with more system
> power. It just seems to me that count(*) is slower than it could be given a
> set of conditions....
>
> Since the thread has started, I have had people ask about different system
> configurations, especially the filesystem (XFS, ext4...). I have never tested
> ext4, and since we are all involved here, I thought that I could do so and
> share my results for others, that is why I got into time testing stuff.
> Time testing count(*) in my later postings is really not the point as count is
> simply dragging the entire table off of the RAID through RAM, I can use any
> other function like max()... No, that can narrow down its scan with an INDEX...
> Ok, sum(), there we go!
>
>
>

Well in some (quite common) use cases, the queries cannot be known in
advance, and the tables are considerably bigger than ram... this makes
the fast IO a good option - sometimes better (and in the end cheaper)
than trying to maintain every conceivable covering index.

Of course it would be great if Postgres could use the indexes alone to
execute certain queries - we may see some of that capability in the next
few releases (keep an eye on messages concerning the 'Visibility Map').

regards

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-13 21:48:21
Message-ID: 4CB62925.2070401@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 13/10/10 21:44, Mladen Gogala wrote:
> On 10/13/2010 3:19 AM, Mark Kirkwood wrote:
>> I think the major effect you are seeing here is that the UPDATE has
>> made the table twice as big on disk (even after VACUUM etc), and it has
>> gone from fitting in ram to not fitting in ram - so it cannot be
>> effectively cached anymore.
>>
> In the real world, tables are larger than the available memory. I have
> tables of several hundred gigabytes in size. Tables shouldn't be
> "effectively cached", the next step would be to measure "buffer cache
> hit ratio", tables should be effectively used.
>
Sorry Mladen,

I didn't mean to suggest that all tables should fit into ram... but was
pointing out (one reason) why Neil would expect to see a different
sequential scan speed after the UPDATE.

I agree that in many interesting cases, tables are bigger than ram [1].

Cheers

Mark

[1] Having said that, these days 64GB of ram is not unusual for a
server... and we have many real customer databases smaller than this
where I work.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-14 02:18:14
Message-ID: AANLkTi=vwE5-Gh_JmGLV=PTpaNwthaoFHnwhEaV7OOoS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 13, 2010 at 1:59 PM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> On 2010-10-13 15:28, Robert Haas wrote:
>>
>> On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com>
>>  wrote:
>>
>>>
>>> I might go as far as to rattle the cage of the developers to see if it
>>> makes
>>> any sense to add some column oriented storage capability to Postgres.
>>> That
>>> would be the hot ticket to be able to specify an attribute on a column so
>>> that
>>> the back end could shadow or store a column in a column oriented table so
>>> aggregate functions could work on them with good efficiency, or is that
>>> an
>>> INDEX?
>>>
>>
>> I'd love to work on that, but without funding it's tough to find the
>> time.  It's a big project.
>>
>
> Is it hugely different from just getting the visibility map suitable
> for doing index-only scans and extracting values from the index
> directly as Heikki has explained?

I think that there's a lot more to a real column-oriented database
than index-only scans, although, of course, index-only scans are very
important.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "mark" <dvlhntr(at)gmail(dot)com>
To: "'Robert Haas'" <robertmhaas(at)gmail(dot)com>, "'Neil Whelchel'" <neil(dot)whelchel(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-14 04:22:16
Message-ID: 008701cb6b57$634f2890$29ed79b0$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Could this be an interesting test use of https://www.fossexperts.com/ ?

'Community' driven proposal - multiple people / orgs agree to pay various
portions? Maybe with multiple funders a reasonable target fund amount could
be reached.

Just throwing around ideas here.

Mark

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org
[mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Robert Haas
Sent: Wednesday, October 13, 2010 7:29 AM
To: Neil Whelchel
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Slow count(*) again...

On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
wrote:
> I might go as far as to rattle the cage of the developers to see if it
makes
> any sense to add some column oriented storage capability to Postgres. That
> would be the hot ticket to be able to specify an attribute on a column so
that
> the back end could shadow or store a column in a column oriented table so
> aggregate functions could work on them with good efficiency, or is that an
> INDEX?

I'd love to work on that, but without funding it's tough to find the
time. It's a big project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: mark <dvlhntr(at)gmail(dot)com>
Cc: 'Robert Haas' <robertmhaas(at)gmail(dot)com>, 'Neil Whelchel' <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-14 15:29:40
Message-ID: 4CB721E4.90103@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-14 06:22, mark wrote:
> Could this be an interesting test use of https://www.fossexperts.com/ ?
>
> 'Community' driven proposal - multiple people / orgs agree to pay various
> portions? Maybe with multiple funders a reasonable target fund amount could
> be reached.
>
I might convince my boss to chip in... but how do we get the task
up there... should we find someone to give an estimate first?

--
Jesper


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: mark <dvlhntr(at)gmail(dot)com>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-14 19:56:05
Message-ID: AANLkTi=X46EjYEyJ1jZKCPNci7=SHQqbnj1cHU=uv8Ko@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 14, 2010 at 12:22 AM, mark <dvlhntr(at)gmail(dot)com> wrote:
> Could this be an interesting test use of https://www.fossexperts.com/ ?
>
> 'Community' driven proposal - multiple people / orgs agree to pay various
> portions? Maybe with multiple funders a reasonable target fund amount could
> be reached.
>
> Just throwing around ideas here.

This is a bit off-topic, but as of now, they're only accepting
proposals for projects to be performed by CommandPrompt itself. So
that doesn't help me much (note the sig).

But in theory it's a good idea. Of course, when and if they open it
up, then what? If more than one developer or company is interested in
a project, who determines who gets to do the work and get paid for it?
If that determination is made by CommandPrompt itself, or if it's
just a free-for-all to see who can get their name on the patch that
ends up being committed, it's going to be hard to get other
people/companies to take it very seriously.

Another problem is that even when they do open it up, they apparently
intend to charge 7.5 - 15% of the contract value as a finder's fee.
That's a lot of money. For a $100 project it's totally reasonable,
but for a $10,000 project it's far more expensive than the value of
the service they're providing can justify. (Let's not even talk about
a $100,000 project.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: mark <dvlhntr(at)gmail(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-15 05:04:43
Message-ID: 4CB7E0EB.4090707@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-14 21:56, Robert Haas wrote:
> On Thu, Oct 14, 2010 at 12:22 AM, mark<dvlhntr(at)gmail(dot)com> wrote:
>
>> Could this be an interesting test use of https://www.fossexperts.com/ ?
>>
>> 'Community' driven proposal - multiple people / orgs agree to pay various
>> portions? Maybe with multiple funders a reasonable target fund amount could
>> be reached.
>>
>> Just throwing around ideas here.
>>
> This is a bit off-topic, but as of now, they're only accepting
> proposals for projects to be performed by CommandPrompt itself. So
> that doesn't help me much (note the sig).
>
> But in theory it's a good idea. Of course, when and if they open it
> up, then what? If more than one developer or company is interested in
> a project, who determines who gets to do the work and get paid for it?
> If that determination is made by CommandPrompt itself, or if it's
> just a free-for-all to see who can get their name on the patch that
> ends up being committed, it's going to be hard to get other
> people/companies to take it very seriously.
>
Couldn't you open up a dialog about it?
> Another problem is that even when they do open it up, they apparently
> intend to charge 7.5 - 15% of the contract value as a finder's fee.
> That's a lot of money. For a $100 project it's totally reasonable,
> but for a $10,000 project it's far more expensive than the value of
> the service they're providing can justify. (Let's not even talk about
> a $100,000 project.)
>

Hi Robert.

I can definitely see your arguments, but you haven't described
a "better" way.

Many of us rely heavily on PostgreSQL and would
like to get "this feature", but sponsoring it all alone does not seem
like a viable option (just a guess), especially considering that we
don't even have an estimate of how big it is. I did see the estimate of
15K USD on the "ALTER column position" description, and the
visibility map is most likely in the "same ballpark" (from my
perspective).

So in order to get something like a visibility map (insert your
favorite big feature here), you have these options:

* Sponsor it all by yourself (where it is most likely going to be too
big; or, if it is at the center of your applications, you definitely
turn to an RDBMS that has supported it for a long time already, if you
can).
* Wait for someone else to sponsor it all by themselves (that happens
occasionally, but for particular features it is hard to see when and
what, and the actual sponsor would still have the dilemma in the first
point).
* Hack it yourself (many of us have neither the time nor the skills to
do it, and my employer actually wants me to focus on the stuff that
brings the most direct value for my time, which is a category hacking
PG does not fall into when the business is about something else
entirely).
* A kind of micro-sponsoring like the above?
* Your proposal here?

To me, the 4th bullet point looks like the most viable so far.

To be honest, whether it is EDB, Redpill, Command Prompt, 2nd Quadrant
or whoever that ends up doing the job is, seen from this perspective,
not important; what matters is that it ends up in the hands of someone
"capable" of doing it... although Heikki has done some work on this
task already.

Preferably I would like to get it coordinated by the PG project itself,
but I can see that it is really hard to do that kind of stuff, and you
would still face the challenge of who should end up doing the thing.

Jesper .. added Joshua Drake on CC; he may have given all of this some
seconds of thought already.

--
Jesper


From: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-15 08:36:26
Message-ID: 1287131786.2663.7.camel@hp-laptop02.gunduz.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, 2010-10-13 at 09:02 -0400, Greg Smith wrote:

> XFS support is available as an optional module starting in RHEL 5.5.
> In CentOS, you just grab it, so that's what I've been doing. My
> understanding is that you may have to ask your sales rep to enable
> access to it under the official RedHat Network channels if you're
> using a subscription from them. I'm not sure exactly what the support
> situation is with it, but it's definitely available as an RPM from
> RedHat.

Right. It is called "Red Hat Scalable File System", and once paid for,
it is available via RHN.
--
Devrim GÜNDÜZ
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
PostgreSQL RPM Repository: http://yum.pgrpms.org
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, mark <dvlhntr(at)gmail(dot)com>, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Subject: Re: Slow count(*) again...
Date: 2010-10-16 03:28:29
Message-ID: 4CB91BDD.2010809@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jesper Krogh wrote:
> To be honest, if it is EDB, Redpill, Command Prompt, 2nd Quadrant or
> whoever end up doing the job is, seen from this perspective not
> important, just it ends in the hands of someone "capable" of doing
> it. ... although Heikki has done some work on this task already.

Now you're closing in on why this is a touchy subject. Heikki has
already done work here funded by EDB. As such, the idea of anyone else
being put in charge of fund-raising and allocation for this particular
feature would be a political mess. While it would be nice if there were
a completely fair sponsorship model for developing community PostgreSQL
features, overseen by a benevolent, free, and completely unaffiliated
overlord, we're not quite there yet. In cases like these, where there's
evidence that a company with a track record of delivering features is
already involved, you're probably better off contacting someone from
there directly--rather than trying to fit that into the public bounty
model through which some PostgreSQL work has been getting done lately.
The visibility map is a particularly troublesome one, because the list
of "capable" people who could work on that, but who aren't already
working at a company with some relationship to EDB, is rather slim.

I know that's kind of frustrating to hear, for people who would like to
get a feature done but can't finance the whole thing themselves. But
look on the bright side--the base price is free, and when you give most
PostgreSQL companies money to work on something it's at least possible
to get what you want done. You'd have to pay a whole lot more than the
$15K number you threw out there before any of the commercial database
vendors would pay any attention to your particular feature request.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: bricklen <bricklen(at)gmail(dot)com>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-21 04:07:22
Message-ID: 201010210407.o9L47M912017@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

bricklen wrote:
> On Sat, Oct 9, 2010 at 4:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> > Maybe an
> > estimate(*) that works like count but gives an answer from the index without
> > checking visibility? I am sure that this would be good enough to make a page
> > list, it is really no big deal if it errors on the positive side, maybe the
> > list of pages has an extra page off the end. I can live with that. What I
> > can't live with is taking 13 seconds to get a page of results from 850,000
> > rows in a table.
> > -Neil-
> >
>
> FWIW, Michael Fuhr wrote a small function to parse the EXPLAIN plan a
> few years ago and it works pretty well assuming your stats are up to
> date.
>
> http://markmail.org/message/gknqthlwry2eoqey

What I recommend is to execute the query with EXPLAIN, and look at the
estimated rows and costs. If the row number is large, just round it to
the nearest thousand and return it to the application as a count ---
this is what Google does for searches (just try it).

If the row count/cost are low, run the query and return an exact count.
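
A minimal sketch of that approach in PL/pgSQL (along the lines of the
Fuhr function referenced above, not his exact code; the function name
and regexp here are illustrative):

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS bigint AS $$
DECLARE
    plan_line text;
    estimate  bigint;
BEGIN
    -- EXPLAIN the caller's query; the first line printed is the top
    -- plan node, which carries the planner's estimate as "rows=NNN".
    FOR plan_line IN EXECUTE 'EXPLAIN ' || query LOOP
        estimate := coalesce(substring(plan_line FROM 'rows=([[:digit:]]+)'), '0');
        EXIT;  -- only the top node is needed
    END LOOP;
    RETURN estimate;
END;
$$ LANGUAGE plpgsql;

-- e.g.: SELECT count_estimate('SELECT * FROM pages WHERE title LIKE ''foo%''');

The application can then round a large estimate to the nearest thousand,
as above, and fall back to a real count(*) only when the estimate comes
back small.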

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-21 04:47:24
Message-ID: B54FDA83-1DBD-4313-AF44-EF53F3EB85D4@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Oct 12, 2010, at 11:58 AM, Tom Lane wrote:

> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>> On 2010-10-12 19:07, Tom Lane wrote:
>>> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
>
>> Just having 32 bytes of "payload" would more or less double
>> your time to count, if I read your test results correctly... and in the
>> situation where disk access would be needed, way more.
>
>> Dividing pg_relation_size by the amount of tuples in our production
>> system, I end up having no avg tuple size less than 100 bytes.
>
> Well, yeah. I deliberately tested with a very narrow table so as to
> stress the per-row CPU costs as much as possible. With any wider table
> you're just going to be I/O bound.

On a wimpy disk, I/O bound for sure. But my disks go 1000MB/sec. No query can go fast enough for them. The best I've gotten is 800MB/sec, on a wide row (average 800 bytes). Most tables go 300MB/sec or so. And with 72GB of RAM, many scans are in-memory anyway.

A single SSD with supercapacitor will go about 500MB/sec by itself next spring. I will easily be able to build a system with 2GB/sec I/O for under $10k.


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-21 18:13:24
Message-ID: 4CC082C4.4070704@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-21 06:47, Scott Carey wrote:
> On a wimpy disk, I/O bound for sure. But my disks go 1000MB/sec.
> No query can go fast enough for them. The best I've gotten is
> 800MB/sec, on a wide row (average 800 bytes). Most tables go
> 300MB/sec or so. And with 72GB of RAM, many scans are in-memory
> anyway.

Is it CPU or I/O bound while doing it?

Can you scan it faster using time cat relation-oid.* > /dev/null
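
To spell that out (a sketch; the table name and file path are
illustrative, and pg_relation_filepath() requires 9.0):

SELECT pg_relation_filepath('mytable');  -- e.g. base/16384/16723
-- then, from the shell:
--   cd $PGDATA && time cat base/16384/16723* > /dev/null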

> A single SSD with supercapacitor will go about 500MB/sec by itself
> next spring. I will easily be able to build a system with 2GB/sec
> I/O for under $10k.

What filesystem are you using? Readahead?
Can you try to check the filesystem fragmentation of the table using filefrag?

--
Jesper


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-21 23:11:22
Message-ID: CF6DEAD9-50AC-4EE4-A4CC-FA0B31D82D2C@richrelevance.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Oct 21, 2010, at 11:13 AM, Jesper Krogh wrote:

> On 2010-10-21 06:47, Scott Carey wrote:
> > On a wimpy disk, I/O bound for sure. But my disks go 1000MB/sec.
> > No query can go fast enough for them. The best I've gotten is
> > 800MB/sec, on a wide row (average 800 bytes). Most tables go
> > 300MB/sec or so. And with 72GB of RAM, many scans are in-memory
> > anyway.
>
> Is it CPU or I/O bound while doing it?
I/O bound with the fio benchmark tool if 16K blocks or greater, CPU bound with 8K blocks or smaller. CentOS 5.5.
CPU bound with postgres.
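
For reference, an illustrative fio job along those lines (not the exact
one used; the filename and size are made up):

[seqread]
filename=/data/fio.testfile
rw=read
bs=16k
size=8g
direct=1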

> Can you scan it faster using time cat relation-oid.* > /dev/null
>

I'm not sure what you mean. In psql, select * piped to /dev/null is VERY CPU bound because of all the formatting. I haven't toyed with COPY. Do you mean the actual files? 'dd' tests on the actual files are similar to fio, but less consistent and harder to run concurrently. Those are faster than postgres.

> > A single SSD with supercapacitor will go about 500MB/sec by itself
> > next spring. I will easily be able to build a system with 2GB/sec
> > I/O for under $10k.
>
> What filesystem are you using? Readahead?
> Can you try to check the filesystem fragmentation of the table using filefrag?

XFS, defragmented once a day. Readahead 40960 (20MB, 1MB per spindle). Two RAID 10 arrays of 10 discs each (plus 2 hot spares), with software RAID 0 tying those together (md, 1MB blocks). Two Adaptec 5805 (or 5085, the external SAS one). A third RAID card for the OS/xlog with 4x 10krpm SAS drives internal.

Fragmentation quickly takes this down a lot, as do small files and concurrent activity, since it's only enough spindles for ~2000 iops. But it's almost all large reporting queries on partitioned tables (500,000 partitions). A few smaller tables are starting to cause too many seeks, so those might end up on a smaller, high-iops tablespace later.

Over time the disks have filled up and there is a significant slowdown in sequential transfer at the end of the partition -- 600MB/sec max. That is still CPU bound on most scans, but postgres can go that fast on some scans.
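
For anyone reproducing this setup: readahead at that level is typically
set per block device, in 512-byte sectors (so 40960 sectors = 20MB); the
device name here is illustrative:

blockdev --getra /dev/md0        # check current readahead
blockdev --setra 40960 /dev/md0  # set it to 20MB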

Off topic:
Another interesting feature is how this setup causes the system tables to bloat by factors of 2x to 8x each week, and requires frequent VACUUM FULL + REINDEX on several of them, or else they grow to 1.5GB in size. Nothing like lots of temp table work + hour-long concurrent transactions to make the system catalog bloat. I suppose with 8.4 many temp tables could be replaced using WITH queries, but in other cases analyzing a temp table is the only way to get a sane query plan.



From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PERFORM] Slow count(*) again...
Date: 2011-02-01 22:47:06
Message-ID: 201102012247.p11Ml6u02682@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:
> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
> and it's difficult to see how to get any real improvement without tackling
> that. Rather than apply the patch shown above, I'm tempted to think about
> hard-wiring COUNT(*) as a special case in nodeAgg.c such that we don't go
> through advance_aggregates/advance_transition_function at all, but just
> increment a counter directly. However, that would very clearly be
> optimizing COUNT(*) and nothing else. Given the opinions expressed
> elsewhere in this thread that heavy reliance on COUNT(*) represents
> bad application design, I'm not sure that such a patch would meet with
> general approval.
>
> Actually the patch shown above is optimizing COUNT(*) and nothing else,
> too, since it's hard to conceive of any other zero-argument aggregate.
>
> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
> I don't think any of the previous discussion in this thread is on-point
> at all, except for the parts where people suggested avoiding it.

Do we want a TODO about optimizing COUNT(*) to avoid aggregate
processing overhead?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PERFORM] Slow count(*) again...
Date: 2011-02-01 23:03:39
Message-ID: 4D48914B.4050403@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/01/2011 05:47 PM, Bruce Momjian wrote:
> Tom Lane wrote:
>> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
>> and it's difficult to see how to get any real improvement without tackling
>> that. Rather than apply the patch shown above, I'm tempted to think about
>> hard-wiring COUNT(*) as a special case in nodeAgg.c such that we don't go
>> through advance_aggregates/advance_transition_function at all, but just
>> increment a counter directly. However, that would very clearly be
>> optimizing COUNT(*) and nothing else. Given the opinions expressed
>> elsewhere in this thread that heavy reliance on COUNT(*) represents
>> bad application design, I'm not sure that such a patch would meet with
>> general approval.
>>
>> Actually the patch shown above is optimizing COUNT(*) and nothing else,
>> too, since it's hard to conceive of any other zero-argument aggregate.
>>
>> Anyway, if anyone is hot to make COUNT(*) faster, that's where to look.
>> I don't think any of the previous discussion in this thread is on-point
>> at all, except for the parts where people suggested avoiding it.
> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
> processing overhead?

Whether or not it's bad application design, it's ubiquitous, and we
should make it work as best we can, IMNSHO. This often generates
complaints about Postgres, and if we really plan for world domination
this needs to be part of it.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PERFORM] Slow count(*) again...
Date: 2011-02-01 23:12:44
Message-ID: 5100.1296601964@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 02/01/2011 05:47 PM, Bruce Momjian wrote:
>> Tom Lane wrote:
>>> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
>>> and it's difficult to see how to get any real improvement without tackling
>>> that.

>> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
>> processing overhead?

> Whether or not it's bad application design, it's ubiquitous, and we
> should make it work as best we can, IMNSHO. This often generates
> complaints about Postgres, and if we really plan for world domination
> this needs to be part of it.

I don't think that saving ~25% on COUNT(*) runtime will help that at all.
The people who complain about it expect it to be instantaneous.

If this sort of hack were free, I'd be all for doing it anyway; but I'm
concerned that adding tests to enable a fast path will slow down every
other aggregate, or else duplicate a lot of code that we'll then have to
maintain.

regards, tom lane


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PERFORM] Slow count(*) again...
Date: 2011-02-01 23:21:04
Message-ID: 4D489560.2040107@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2/1/2011 5:47 PM, Bruce Momjian wrote:
> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
> processing overhead?
>

Definitely not. In my opinion, and I've seen more than a few database
designs, having count(*) is almost always an error.
If I am counting a large table like the one below, waiting for 30
seconds more is not going to make much of a difference.
To paraphrase Kenny Rogers, it will be time enough for counting when the
application is done.

Timing is on.
news=> select count(*) from moreover_documents_y2011m01;
  count
----------
 20350907
(1 row)

Time: 124142.437 ms
news=>

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-01 23:44:17
Message-ID: 4D489AD1.1010105@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2/1/2011 6:03 PM, Andrew Dunstan wrote:
> Whether or not it's bad application design, it's ubiquitous, and we
> should make it work as best we can, IMNSHO. This often generates
> complaints about Postgres, and if we really plan for world domination
> this needs to be part of it.

There are many other things to fix first. One of them would be optimizer
decisions when a temp table is involved. I would also vote for a wait
event interface, tracing and hints, much rather than speeding up
count(*). World domination will not be achieved by speeding up count(*);
it will be achieved by providing overall performance akin to that of the
player who has already achieved world domination. I believe that
company is called "Oracle Corp." or something like that?

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 03:13:38
Message-ID: AANLkTikftsY8=sFf9X6XD6B37-LBHykHzY4ByMPNRZFA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Feb 1, 2011 at 3:44 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>wrote:

> On 2/1/2011 6:03 PM, Andrew Dunstan wrote:
>
>> Whether or not it's bad application design, it's ubiquitous, and we
>> should make it work as best we can, IMNSHO. This often generates
>> complaints about Postgres, and if we really plan for world domination
>> this needs to be part of it.
>>
>
> There are many other things to fix first. One of them would be optimizer
> decisions when a temp table is involved. I would also vote for wait event
> interface, tracing and hints, much rather than speeding up count(*). World
> domination will not be achieved by speeding up count(*), it will be achieved
> by providing overall performance akin to what the player who has already
> achieved the world domination. I believe that the company is called "Oracle
> Corp." or something like that?

Don't listen to him. He's got an oracle bias. Slashdot already announced
that NoSQL is actually going to dominate the world, so postgres has already
lost that battle. Everything postgres devs do now is just an exercise in
relational masturbation. Trust me.


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 03:40:16
Message-ID: 4D48D220.9030104@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Samuel Gendler wrote:
> Don't listen to him. He's got an oracle bias.
And bad sinuses, too.
> Slashdot already announced that NoSQL is actually going to dominate
> the world, so postgres has already lost that battle. Everything
> postgres devs do now is just an exercise in relational masturbation.
> Trust me.
>
I knew that there is some entertainment value on this list. Samuel, your
point of view is very..., er, refreshing. Trust me.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: mladen(dot)gogala(at)vmsinfo(dot)com
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 04:07:47
Message-ID: AANLkTinvAvUZGj7_XBUco5GPwiBcwGUvix-06rHPCFEO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Feb 1, 2011 at 7:40 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>wrote:

> Samuel Gendler wrote:
>
>>
>> Don't listen to him. He's got an oracle bias.
>>
> And bad sinuses, too.
>
>> Slashdot already announced that NoSQL is actually going to dominate the
>> world, so postgres has already lost that battle. Everything postgres devs
>> do now is just an exercise in relational masturbation. Trust me.
>>
> I knew that there is some entertainment value on this list. Samuel, your
> point of view is very..., er, refreshing. Trust me.

You get that that was sarcasm, right?



From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PERFORM] Slow count(*) again...
Date: 2011-02-02 16:03:37
Message-ID: 201102021603.p12G3bb09925@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > On 02/01/2011 05:47 PM, Bruce Momjian wrote:
> >> Tom Lane wrote:
> >>> At this point what we've got is 25% of the runtime in nodeAgg.c overhead,
> >>> and it's difficult to see how to get any real improvement without tackling
> >>> that.
>
> >> Do we want a TODO about optimizing COUNT(*) to avoid aggregate
> >> processing overhead?
>
> > Whether or not it's bad application design, it's ubiquitous, and we
> > should make it work as best we can, IMNSHO. This often generates
> > complaints about Postgres, and if we really plan for world domination
> > this needs to be part of it.
>
> I don't think that saving ~25% on COUNT(*) runtime will help that at all.
> The people who complain about it expect it to be instantaneous.
>
> If this sort of hack were free, I'd be all for doing it anyway; but I'm
> concerned that adding tests to enable a fast path will slow down every
> other aggregate, or else duplicate a lot of code that we'll then have to
> maintain.

OK, thank you.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 17:19:08
Message-ID: AANLkTiknZi97dDwm7sHqy0EL7tpNEP9sY5uPV6AN+q=u@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tue, Feb 1, 2011 at 6:44 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> On 2/1/2011 6:03 PM, Andrew Dunstan wrote:
>>
>> Whether or not it's bad application design, it's ubiquitous, and we
>> should make it work as best we can, IMNSHO. This often generates
>> complaints about Postgres, and if we really plan for world domination
>> this needs to be part of it.
>
> There are many other things to fix first. One of them would be optimizer
> decisions when a temp table is involved.

It would be pretty hard to make autoanalyze work on such tables
without removing some of the performance benefits of having such
tables in the first place - namely, the local buffer manager. But you
could ANALYZE them by hand.
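
For example (illustrative table and column names; the point is just the
explicit ANALYZE after populating the temp table):

create temporary table scratch as
  select g as id, g % 10 as bucket from generate_series(1, 100000) g;
analyze scratch;  -- by hand, since autovacuum never touches temp tables
explain select * from scratch where bucket = 3;  -- now planned with real stats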

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 18:11:33
Message-ID: 4D499E55.4020107@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Robert Haas wrote:
> It would be pretty hard to make autoanalyze work on such tables
> without removing some of the performance benefits of having such
> tables in the first place - namely, the local buffer manager. But you
> could ANALYZE them by hand.

Not necessarily autoanalyze; some default rules for the situations when
stats are not there should be put in place, like the following:
1) If there is a usable index on the temp table - use it.
2) If there isn't a usable index on the temp table and there is a join,
make the temp table the first table in the nested loop join.

People are complaining all over the place about the optimizer not using
the indexes; there should be a way to make the optimizer explicitly
prefer the indexes, as was the case with Oracle's venerable RBO
(rule-based optimizer). RBO didn't use statistics; it ranked the access
methods and used the one with the highest rank of those available. In
practice, that translated into: if an index exists - use it.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 18:19:20
Message-ID: AANLkTimo2J-BagObs_V1bGaHOO9qL+kTF6s6-wZ=M3KW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Feb 2, 2011 at 12:11 PM, Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> Robert Haas wrote:
>> It would be pretty hard to make autoanalyze work on such tables
>> without removing some of the performance benefits of having such
>> tables in the first place - namely, the local buffer manager. But you
>> could ANALYZE them by hand.
>
> Not necessarily autoanalyze, some default rules for the situations when
> stats is not there should be put in place,
> like the following:
> 1) If there is a usable index on the temp table - use it.
> 2) It there isn't a usable index on the temp table and there is a join, make
> the temp table the first table
>   in the nested loop join.
>
> People are complaining about the optimizer not using the indexes all over
> the place, there should be a way to
> make the optimizer explicitly prefer the indexes, like was the case with
> Oracle's venerable RBO (rules based
> optimizer). RBO didn't use statistics, it had a rank of access method and
> used the access method with the highest
> rank of all available access methods. In practice, it translated into: if an
> index exists - use it.

However, sometimes using an index results in a HORRIBLE HORRIBLE plan.
I recently encountered the issue myself, and plopping an ANALYZE
$tablename in there, since I was using a temporary table anyway, made
all the difference. The planner switched from an index-based query to
a sequential scan, and a sequential scan was (is) vastly more
efficient in this particular case.

Personally, I'd get rid of autovacuum/autoanalyze support on temporary
tables (they typically have short lives and are often accessed
immediately after creation, preventing the auto* stuff from being
useful anyway), *AND* every time I ask I'm always told to "make sure to
ANALYZE the table before you use it".

--
Jon


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 18:20:59
Message-ID: AANLkTinqLbFettmezJP9iecGncxb2pp+OqfmZiqf0tCW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Feb 2, 2011 at 1:11 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> Not necessarily autoanalyze, some default rules for the situations when
> stats is not there should be put in place,
> like the following:
> 1) If there is a usable index on the temp table - use it.
> 2) It there isn't a usable index on the temp table and there is a join, make
> the temp table the first table
>   in the nested loop join.

The default selectivity estimates ought to make this happen already.

create temporary table foo (a integer, b text);
CREATE TABLE
insert into foo select g, random()::text||random()::text from
generate_series(1, 10000) g;
INSERT 0 10000
alter table foo add primary key (a);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"
ALTER TABLE
explain select * from foo where a = 1;
                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using foo_pkey on foo  (cost=0.00..8.27 rows=1 width=36)
   Index Cond: (a = 1)
(2 rows)

You're going to need to come up with actual examples of situations
that you think can be improved upon if you want to get anywhere here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 18:32:28
Message-ID: AANLkTin7A_Q=Yc9Wf_5=POp2mVqmaiwqUDsFMLG68H45@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Feb 2, 2011 at 1:19 PM, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:
> However, sometimes using an index results in a HORRIBLE HORRIBLE plan.
> I recently encountered the issue myself, and plopping an ANALYZE
> $tablename in there, since I was using a temporary table anyway, make
> all the difference. The planner switched from an index-based query to
> a sequential scan, and a sequential scan was (is) vastly more
> efficient in this particular case.

Yep...

> Personally, I'd get rid of autovacuum/autoanalyze support on temporary
> tables

We don't have any such support, which I think is the root of Mladen's complaint.

> (they typically have short lives and are often accessed
> immediately after creation preventing the auto* stuff from being
> useful anyway), *AND* every time I ask I'm always told "make sure
> ANALYZE the table before you use it".

Yeah. Any kind of bulk load into an empty table can be a problem,
even if it's not temporary. When you load a bunch of data and then
immediately plan a query against it, autoanalyze hasn't had a chance
to do its thing yet, so sometimes you get a lousy plan. In the case
of temporary tables, this can happen even if there's a delay before
you use the data. Some sort of fix for this - where the first query
that needs the stats does an analyze first - seems like it could be
quite useful (although it would suck if the transaction that took it
upon itself to do the analyze then rolled back, losing the stats and
forcing the next guy to do it all over again).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 18:47:21
Message-ID: 4D49A6B9.4060608@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> People are complaining about the optimizer not using the indexes all
> over the place, there should be a way to
> make the optimizer explicitly prefer the indexes, like was the case
> with Oracle's venerable RBO (rules based
> optimizer). RBO didn't use statistics, it had a rank of access method
> and used the access method with the highest
> rank of all available access methods. In practice, it translated into:
> if an index exists - use it.

Given that even Oracle kicked out the RBO a long time ago, I'm not so
sure longing for those good old days will go very far. I regularly see
queries that were tweaked to always use an index run at 1/10 or less the
speed of a sequential scan against the same data. The same people
complaining "all over the place" about this topic are also the sort who
write them. There are two main fallacies at play here that make this
happen:

1) Even if you use an index, PostgreSQL must still retrieve the
associated table data to execute the query, in order to implement its
version of MVCC.

2) The sort of random I/O done by index lookups can be as much as 50X as
expensive on standard hard drives as sequential I/O, if every block goes
to physical hardware.

If I were to work on improving this area, it would be executing on some
plans a few of us have sketched out for exposing some notion about what
indexes are actually in memory to the optimizer. There are more obvious
fixes to the specific case of temp tables though.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 20:54:26
Message-ID: 4D49C482.4010707@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Greg Smith wrote:
> Given that even Oracle kicked out the RBO a long time ago, I'm not so
> sure longing for those good old days will go very far. I regularly see
> queries that were tweaked to always use an index run at 1/10 or less the
> speed of a sequential scan against the same data. The same people
> complaining "all over the place" about this topic are also the sort who
> write them. There are two main fallacies at play here that make this
> happen:
>
Oracle just gives the impression that the RBO is gone. It's actually
still there, even in 11.2:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing
options

SQL> alter session set optimizer_mode=rule;

Session altered.

Oracle people were just as puritanical as Postgres people, if not more
so. However, the huge backlash made them reconsider the decision. RBO is
officially de-supported, obsolete and despised, but it is also widely
used, even in Oracle's own SYS schema. Oracle is having huge problems
getting people onto the cost-based optimizer, and they are not quite
done yet.

> 1) Even if you use an index, PostgreSQL must still retrieve the
> associated table data to execute the query in order to execute its
> version of MVCC
>
Of course. Nobody contests that. However, index scans for OLTP are
indispensable. Sequential scans just don't do the trick in some situations.

> 2) The sort of random I/O done by index lookups can be as much as 50X as
> expensive on standard hard drives as sequential, if every block goes to
> physical hardware.
>

Greg, how many questions about queries not using an index have you seen?
There is a reason why people keep asking that. The sheer number of
questions like that on this group should tell you that there is a
problem there.
There must be a relatively simple way of influencing optimizer
decisions. With all due respect, I consider myself smarter than the
optimizer. I'm 6'4", 235 lbs, so telling me that you disagree and that I
am more stupid than a computer program would not be a smart thing to do.
Please, do not misunderestimate me.

> If I were to work on improving this area, it would be executing on some
> plans a few of us have sketched out for exposing some notion about what
> indexes are actually in memory to the optimizer. There are more obvious
> fixes to the specific case of temp tables though.
>
>
I've had a run-in with a temporary table that I had to resolve by
disabling hash join and merge join, and that really irritated me.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 21:11:25
Message-ID: 201102022111.p12LBP800917@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> > 2) The sort of random I/O done by index lookups can be as much as 50X as
> > expensive on standard hard drives as sequential, if every block goes to
> > physical hardware.
> >
>
> Greg, how many questions about queries not using an index have you seen?
> There is a reason why people keep asking that. The sheer number of
> questions like that on this group should tell you that there is a
> problem there.

Very few of those reports found that an index scan was indeed faster ---
they just assumed so but when they actually tested it, they understood.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 21:14:06
Message-ID: 20110202211406.GN24931@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Feb 02, 2011 at 03:54:26PM -0500, Mladen Gogala wrote:
> Greg Smith wrote:
>> Given that even Oracle kicked out the RBO a long time ago, I'm not so sure
>> longing for those good old days will go very far. I regularly see queries
>> that were tweaked to always use an index run at 1/10 or less the speed of
>> a sequential scan against the same data. The same people complaining "all
>> over the place" about this topic are also the sort who write them. There
>> are two main fallacies at play here that make this happen:
>>
> Oracle just gives an impression that RBO is gone. It's actually still
> there, even in 11.2:
>
> Connected to:
> Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
> With the Partitioning, OLAP, Data Mining and Real Application Testing
> options
>
> SQL> alter session set optimizer_mode=rule;
>
> Session altered.
>
> Oracle people were just as puritanical as Postgres people, if not more so.
> However, the huge backlash made them reconsider the decision. RBO is
> officially de-supported, obsolete and despised but it is also widely used,
> even in the Oracle's own SYS schema. Oracle is having huge problems with
> trying to get people to the cost based optimizer, but they are not yet
> quite done.
>

This problem in getting people to migrate to the cost-based optimizer
seems to stem from the original use of the rule based optimizer and
the ability to (mis)hint every option in the DB. If I were running
a shop with 100k-1m lines of SQL code with embedded hints, I would
run screaming at the QA required to move to the cost-based system.
In many ways, the RBO itself + hints is hindering the adoption of
the CBO. Are there any stats on the adoption/use of the CBO on new
Oracle users/shops?

>> 1) Even if you use an index, PostgreSQL must still retrieve the associated
>> table data to execute the query in order to execute its version of MVCC
>>
> Of course. Nobody contests that. However, index scans for OLTP are
> indispensable. Sequential scans just don't do the trick in some situations.
>
>
>> 2) The sort of random I/O done by index lookups can be as much as 50X as
>> expensive on standard hard drives as sequential, if every block goes to
>> physical hardware.
>>
>
> Greg, how many questions about queries not using an index have you seen?
> There is a reason why people keep asking that. The sheer number of
> questions like that on this group should tell you that there is a problem
> there. There must be a relatively simple way of influencing optimizer
> decisions. With all due respect, I consider myself smarter than the
> optimizer. I'm 6'4", 235LBS so telling me that you disagree and that I am
> more stupid than a computer program, would not be a smart thing to do.
> Please, do not misunderestimate me.
>

I see them come up regularly. However, there really are not all that
many when you consider how many people are using PostgreSQL. Its
optimizer works quite well. Knowing how hints can be misused, I would
rather have the developers use their resources to improve the optimizer
than spend time on a hint system that would be misused over and over
by beginners, with the attendant posts to the HACKERS/PERFORM/NOVICE/...
groups. I certainly have had a fun time or two in my limited Oracle
experience tracking down a hint-based performance problem, so it
works both ways.

Regards,
Ken


From: Justin Pitts <justinpitts(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 21:25:00
Message-ID: AANLkTinpDcOU79KZ8UZ6JrTQ6j+E0T=yVeJ+=dy227TO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> With all
> due respect, I consider myself smarter than the optimizer.  I'm 6'4", 235LBS
> so telling me that you disagree and that I am more stupid than a computer
> program,  would not be a smart thing to do. Please, do not misunderestimate
> me.

I don't see computer programs make thinly veiled threats, especially
in a public forum.
I'll do what you claim is not the smart thing and disagree with you.
You are wrong.
You are dragging the signal-to-noise ratio of this discussion down.
You owe Greg an apology.


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 21:59:50
Message-ID: 4D49D3D6.2010002@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Kenneth Marshall wrote:
>
>
> I see them come up regularly. However, there really are not all that
> many when you consider how many people are using PostgreSQL. Its
> optimizer works quite well. Knowing how hints can be misused, I would
> rather have the developers use their resource to improve the optimizer
> than spend time on a hint system that would be mis-used over and over
> by beginners, with the attendent posts to HACKERS/PERFORM/NOVICE/...
> groups. I certainly have had a fun time or two in my limited Oracle
> experience tracking down a hint-based performance problem, so it
> works both ways.
>
> Regards,
> Ken
>

Ken, the story is really simple: when a problem with a bad query arises,
the DBA has to make it work, one way or another. The weapons of choice
are usually hints, but there is also the ability to set the critical
statistics variables to the desired values. If my users are screaming
that the application response time is slow, I cannot afford to wait for
developers to fix the optimizer. I will therefore not use Postgres for
my mission-critical applications as long as there are no hints.

Oracle is expensive, but not as expensive as the downtime. And that's
the bottom line. Yes, hints can cause problems, but the absence of hints
and a wait interface can cause even bigger problems. This is not a
choice between good and evil, as in the Nick Cage movies; it is a choice
between evil and a lesser evil. I would love to be able to use Postgres
for some of my mission-critical applications. Saving tens of thousands
of dollars would make me a company hero and earn me a hefty bonus, so I
have a personal incentive to do so. Performance is normally not a
problem. If the application is carefully crafted and designed, it will
work more or less the same as on Oracle. However, applications sometimes
need maintenance. Ruth from sales wants IT to start ingesting data in
UTF8 because we have clients in other countries. She also wants us to
track languages and countries. Columns have to be added to the tables,
applications have to be changed, foreign keys added, triggers altered,
etc., etc. What you end up with is usually less than optimal.
Applications have a life cycle, and they move from being young and sexy
to being old fart applications, just as people do. Hints are Viagra for
applications. Under ideal conditions they are not needed, but once the
app is past a certain age....

The other problem is that plans change with the stats, not necessarily
for the better. People clean out a large table, Postgres runs
auto-vacuum, stats change and all the plans change, too. If some of the
new plans are unacceptable, there isn't much you can do about it but
hint it toward the proper plan. Let's not pretend: Postgres does support
a sort of hint with "set enable_<access method>" and the
random/sequential scan costs. Also, effective cache size is openly used
to trick the optimizer into believing that there is more memory than
there actually is. Hints are already there, they're just not as elegant
as Oracle's solution. If I set the sequential page cost to 4 and the
random page cost to 1, I have, effectively, introduced a rule-based
optimizer into Postgres. I am not sure why there is such puritanical
resistance to hints on one side when, on the other side, there are means
to achieve exactly the same thing. As my signature line says, I am a
senior Oracle DBA with quite a bit of experience. What I need to approve
moving mission-critical applications to Postgres are better monitoring
tools and something to help me with quick and dirty fixes when
necessary. I am willing to learn, and I got the company to invest some
money and do pilot projects, but I am not prepared to have my boss
saying "we could have fixed the problem, had we stayed on Oracle".

BTW:
On my last airplane trip, I saw Nick Cage in the "Sorcerer's Apprentice"
and my brain still hurts.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Justin Pitts <justinpitts(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-02 22:03:28
Message-ID: 4D49D4B0.5080803@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Justin Pitts wrote:
>> With all
>> due respect, I consider myself smarter than the optimizer. I'm 6'4", 235LBS
>> so telling me that you disagree and that I am more stupid than a computer
>> program, would not be a smart thing to do. Please, do not misunderestimate
>> me.
>>
>
> I don't see computer programs make thinly veiled threats, especially
> in a public forum.
> I'll do what you claim is not the smart thing and disagree with you.
> You are wrong.
> You are dragging the signal-to-noise ratio of this discussion down.
> You owe Greg an apology.
>
I apologize if that was understood as a threat. It was actually a joke.
I thought that my use of the word "misunderestimate" made it
abundantly clear. Apparently, G.W. doesn't have as many fans as I had
previously thought. Once again, it was a joke; I humbly apologize if
that was misunderstood.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 00:03:06
Message-ID: 4D49F0BA.9000901@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> Greg, how many questions about queries not using an index have you
> seen? There is a reason why people keep asking that. The sheer number
> of questions like that on this group should tell you that there is a
> problem there. There must be a relatively simple way of influencing
> optimizer decisions.

I think that's not quite the right question. For every person like
yourself who is making an informed "the optimizer is really picking the
wrong index" request, I think there are more who are asking for that but
are not actually right that it will help. I think you would agree that
this area is hard to understand, and easy to make mistakes about, yes?
So the right question is "how many questions about queries not using an
index would have actually benefitted from the behavior they asked for?"
That's a much fuzzier and harder to answer question.

I agree that it would be nice to provide a UI for the informed.
Unfortunately, the problem I was pointing out is that doing so could, on
average, make PostgreSQL appear to run worse to people who use it.
Things like which index and merge type are appropriate change as data
comes in, and some of the plan switches that occur because of that are
the right thing to do--not a mistake on the optimizer's part. I'm sure
you've seen people put together plan rules for the RBO that worked fine
on small data sets, but were very wrong as production data volume went
up. That problem should be less likely to happen with a CBO approach. It
isn't always avoided, of course, but trying to build an RBO-style
approach from scratch now to resolve those cases isn't necessarily the
right way to proceed.

Given limited resources as a development community, it's hard to justify
working on hinting--which has its own complexity to do right--when there
are so many things that I think are more likely to help *everyone* that
could be done instead. The unfortunate situation we're in, unlike
Oracle, is that there isn't a practically infinite amount of money
available to fund every possible approach here, then see which turn out
to work later after our customers suffer through the bad ones for a while.

> With all due respect, I consider myself smarter than the optimizer.
> I'm 6'4", 235LBS so telling me that you disagree and that I am more
> stupid than a computer program, would not be a smart thing to do.
> Please, do not misunderestimate me.

I remember when I used to only weigh that much. You are lucky to be
such a slim little guy!

Oh, I guess I should add, :)

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 00:13:44
Message-ID: 201102030013.p130DiH08690@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Greg Smith wrote:
> Mladen Gogala wrote:
> > Greg, how many questions about queries not using an index have you
> > seen? There is a reason why people keep asking that. The sheer number
> > of questions like that on this group should tell you that there is a
> > problem there. There must be a relatively simple way of influencing
> > optimizer decisions.
>
> I think that's not quite the right question. For every person like
> yourself who is making an informed "the optimizer is really picking the
> wrong index" request, I think there are more who are asking for that but
> are not actually right that it will help. I think you would agree that
> this area is hard to understand, and easy to make mistakes about, yes?
> So the right question is "how many questions about queries not using an
> index would have actually benefitted from the behavior they asked for?"
> That's a much fuzzier and harder to answer question.

Agreed. I created an FAQ entry years ago to explain this point and tell
people how to test it:

http://wiki.postgresql.org/wiki/FAQ#Why_are_my_queries_slow.3F_Why_don.27t_they_use_my_indexes.3F

Once I added that FAQ we had many fewer email questions about index
choice.

> > With all due respect, I consider myself smarter than the optimizer.
> > I'm 6'4", 235LBS so telling me that you disagree and that I am more
> > stupid than a computer program, would not be a smart thing to do.
> > Please, do not misunderestimate me.
>
> I remember when I used to only weigh that much. You are lucky to be
> such a slim little guy!
>
> Oh, I guess I should add, :)

Oh, wow, what a great retort. :-)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Justin Pitts <justinpitts(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 00:29:20
Message-ID: AANLkTinUoAb5yHJ6fDm1Rc2jhHL7=GuQ_4xNgB2GV5Gy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Thank you.

It appears I owe an apology also, for jumping to that conclusion. It
was rash and unfair of me. I am sorry.

On Wed, Feb 2, 2011 at 5:03 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> Justin Pitts wrote:
>>>
>>> With all
>>> due respect, I consider myself smarter than the optimizer.  I'm 6'4",
>>> 235LBS
>>> so telling me that you disagree and that I am more stupid than a computer
>>> program,  would not be a smart thing to do. Please, do not
>>> misunderestimate
>>> me.
>>>
>>
>> I don't see computer programs make thinly veiled threats, especially
>> in a public forum.
>> I'll do what you claim is not the smart thing and disagree with you.
>> You are wrong.
>> You are dragging the signal-to-noise ratio of this discussion down.
>> You owe Greg an apology.
>>
>
> I apologize if that was understood as a threat. It was actually a joke. I
> thought that my use of the word "misunderestimate" made it abundantly
> clear. Apparently, G.W. doesn't have as many fans as I had previously
> thought. Once again, it was a joke; I humbly apologize if that was
> misunderstood.
>
> --
>
> Mladen Gogala Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 02:01:07
Message-ID: AANLkTik1TqM4cBKGm--pfZZ8p7j-70-Z6ocbJTRGDNfo@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Feb 2, 2011 at 7:03 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Given limited resources as a development community, it's hard to justify
> working on hinting--which has its own complexity to do right--when there are
> so many things that I think are more likely to help *everyone* that could be
> done instead.  The unfortunate situation we're in, unlike Oracle, is that
> there isn't a practically infinite amount of money available to fund every
> possible approach here, then see which turn out to work later after our
> customers suffer through the bad ones for a while.

There are actually very few queries where I want to force the
planner to use a particular index, which is the sort of thing Oracle
lets you do. If it's a simple query and
random_page_cost/seq_page_cost are reasonably well adjusted, the
planner's choice is very, very likely to be correct. If it's a
complex query, the planner is more likely to go wrong, but
forcing it to use an index on one table isn't going to help much if
that table is being used on the inner side of a hash join. You almost
need to be able to force the entire plan into the shape you've chosen,
and that's a lot of work and not terribly robust. The most common
type of "hard to fix" query problem - by far - is a bad selectivity
estimate. Being able to hint that would be worth more than any number
of hints about which indexes to use, in my book.
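
To make that concrete, here is a sketch with a hypothetical table: the
city and zip columns below are strongly correlated, but the planner
multiplies their individual selectivities as if they were independent,
so it can underestimate the row count by orders of magnitude.

-- Estimated rows ~= rows(city) * rows(zip) / total_rows, which is far
-- below the true count when the two predicates overlap almost entirely:
SELECT * FROM addresses WHERE city = 'Chicago' AND zip = '60604';

A selectivity hint would let you tell the planner "expect about fifty
thousand rows here" instead of letting it guess.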

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 02:45:19
Message-ID: 4D4A16BF.1020304@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2/2/2011 7:03 PM, Greg Smith wrote:
> I think that's not quite the right question. For every person like
> yourself who is making an informed "the optimizer is really picking the
> wrong index" request, I think there are more who are asking for that but
> are not actually right that it will help. I think you would agree that
> this area is hard to understand, and easy to make mistakes about, yes?
> So the right question is "how many questions about queries not using an
> index would have actually benefitted from the behavior they asked for?"
> That's a much fuzzier and harder to answer question.
>
> I agree that it would be nice to provide a UI for the informed.
> Unfortunately, the problem I was pointing out is that doing so could, on
> average, make PostgreSQL appear to run worse to people who use it.
Greg, I understand your concerns, but let me point out two things:
1) The basic mechanism is already there. PostgreSQL has a myriad of
ways to actually control the optimizer. One, completely analogous to
Oracle's mechanisms, is to control the cost of sequential vs. random page
scans. The other, completely analogous to Oracle hints, is based on the
group of switches for turning various join and access methods on and
off. This also includes setting join_collapse_limit to 1, to force
the desired join order. The third way is to actually make the optimizer
work a lot harder by setting geqo_effort to 10 and
default_statistics_target to 1000 or more, which will increase the size
of the histograms and increase the time and CPU spent on planning. I can
literally force the plan of my choosing on the Postgres optimizer; a
rough sketch follows below. The mechanisms are already there; I am only
pleading for a more elegant version.

2) The guys who may spread Postgres and help it achieve the desired
world domination, discussed here the other day, are the database
administrators in the big companies. If you get people from JP Morgan
Chase, Bank of America, Goldman Sachs or Lehman Brothers to start using
Postgres for serious projects, the rest will follow suit. People
from some of these companies have already been seen at NYC Postgres
meetings.
Granted, MySQL started on the other end of the spectrum, by being used
for ordering downloaded MP3 collections, but it has found its way into
the corporate server rooms, too. The techies at big companies are the
guys who will or will not make it happen. And these guys are not
beginners. Appeasing them may actually go a long way.
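
A rough sketch of the plan-forcing I mean in point 1, with hypothetical
tables t1, t2 and t3 (every setting is a stock PostgreSQL GUC):

SET join_collapse_limit = 1;   -- join in exactly the order written
SET enable_hashjoin = off;     -- rule out hash joins
SET enable_mergejoin = off;    -- ...and merge joins; nested loops remain
EXPLAIN SELECT * FROM t1 JOIN t2 USING (id) JOIN t3 USING (id);

Not elegant, but it does pin the plan down.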

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 06:16:36
Message-ID: 4D4A4844.3020409@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> The techies at big companies are the guys who will or will not make it
> happen. And these guys are not beginners. Appeasing them may actually
> go a long way.

The PostgreSQL community isn't real big on appeasing people if it's at
the expense of robustness or correctness, and this issue falls into that
category. There are downsides to that, but good things too. Chasing
after whatever made people happy regardless of its impact on the server
code itself has in my mind contributed to why Oracle is so bloated and
MySQL so buggy, to pick two examples from my favorite horse to whip.
Trying to graft an alternate UI for the stuff that needs to be tweaked
here to do better, one flexible enough to actually handle the complexity
of the job, is going to add some code with a new class of bugs and
continuous maintenance headaches. Being picky about rejecting such
things is part of the reason why the PostgreSQL code has developed a
good reputation.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 09:54:49
Message-ID: 4D4A7B69.2020704@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

02.02.11 20:32, Robert Haas wrote:
>
> Yeah. Any kind of bulk load into an empty table can be a problem,
> even if it's not temporary. When you load a bunch of data and then
> immediately plan a query against it, autoanalyze hasn't had a chance
> to do its thing yet, so sometimes you get a lousy plan.

Maybe introducing something like an 'AutoAnalyze' threshold would help? I
mean that any insert/update/delete statement that changes more than x%
of a table (and no less than y records) would run an analyze right after
it finished.
Defaults like x=50, y=10000 should be quite good, I think.
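
Until something like that exists, the manual equivalent after a big load
is simply (with a hypothetical "orders" table):

COPY orders FROM '/tmp/orders.csv' WITH (FORMAT csv);
ANALYZE orders;   -- refresh the statistics by hand before querying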

Best regards, Vitalii Tymchyshyn


From: david(at)lang(dot)hm
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 10:11:58
Message-ID: alpine.DEB.2.00.1102030208440.8162@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:

> 02.02.11 20:32, Robert Haas wrote:
>>
>> Yeah. Any kind of bulk load into an empty table can be a problem,
>> even if it's not temporary. When you load a bunch of data and then
>> immediately plan a query against it, autoanalyze hasn't had a chance
>> to do its thing yet, so sometimes you get a lousy plan.
>
> Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean
> that any insert/update/delete statement that changes more than x% of a table
> (and no less than y records) would run an analyze right after it finished.
> Defaults like x=50, y=10000 should be quite good, I think.

If I am understanding things correctly, a full Analyze is going over all
the data in the table to figure out patterns.

If this is the case, wouldn't it make sense in the situation where you are
loading an entire table from scratch to run the Analyze as you are
processing the data? If you don't want to slow down the main thread that's
inserting the data, you could copy the data to a second thread and do the
analysis while it's still in RAM rather than having to read it off of disk
afterwards.

This doesn't make sense for updates to existing databases, but the use
case of loading a bunch of data and then querying it right away isn't
_that_ uncommon.

David Lang


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 13:41:42
Message-ID: 20110203134142.GO24931@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 03, 2011 at 02:11:58AM -0800, david(at)lang(dot)hm wrote:
> On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:
>
>> 02.02.11 20:32, Robert Haas wrote:
>>> Yeah. Any kind of bulk load into an empty table can be a problem,
>>> even if it's not temporary. When you load a bunch of data and then
>>> immediately plan a query against it, autoanalyze hasn't had a chance
>>> to do its thing yet, so sometimes you get a lousy plan.
>>
>> Maybe introducing something like an 'AutoAnalyze' threshold would help? I
>> mean that any insert/update/delete statement that changes more than x% of
>> a table (and no less than y records) would run an analyze right after it
>> finished.
>> Defaults like x=50, y=10000 should be quite good, I think.
>
> If I am understanding things correctly, a full Analyze is going over all
> the data in the table to figure out patterns.
>
> If this is the case, wouldn't it make sense in the situation where you are
> loading an entire table from scratch to run the Analyze as you are
> processing the data? If you don't want to slow down the main thread that's
> inserting the data, you could copy the data to a second thread and do the
> analysis while it's still in RAM rather than having to read it off of disk
> afterwards.
>
> This doesn't make sense for updates to existing databases, but the use case
> of loading a bunch of data and then querying it right away isn't _that_
> uncommon.
>
> David Lang
>

+1 for in-flight ANALYZE. This would be great for bulk loads of
real tables as well as temp tables.

Cheers,
Ken


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 14:20:01
Message-ID: AANLkTinUDcJJpHfbRaboKVVgoKPhmTgPebbRZmNAJvZh@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 7:41 AM, Kenneth Marshall <ktm(at)rice(dot)edu> wrote:
> On Thu, Feb 03, 2011 at 02:11:58AM -0800, david(at)lang(dot)hm wrote:
>> On Thu, 3 Feb 2011, Vitalii Tymchyshyn wrote:
>>
>>> 02.02.11 20:32, Robert Haas wrote:
>>>> Yeah.  Any kind of bulk load into an empty table can be a problem,
>>>> even if it's not temporary.  When you load a bunch of data and then
>>>> immediately plan a query against it, autoanalyze hasn't had a chance
>>>> to do its thing yet, so sometimes you get a lousy plan.
>>>
>>> Maybe introducing something like an 'AutoAnalyze' threshold would help? I
>>> mean that any insert/update/delete statement that changes more than x% of
>>> a table (and no less than y records) would run an analyze right after it
>>> finished.
>>> Defaults like x=50, y=10000 should be quite good, I think.
>>
>> If I am understanding things correctly, a full Analyze is going over all
>> the data in the table to figure out patterns.
>>
>> If this is the case, wouldn't it make sense in the situation where you are
>> loading an entire table from scratch to run the Analyze as you are
>> processing the data? If you don't want to slow down the main thread that's
>> inserting the data, you could copy the data to a second thread and do the
>> analysis while it's still in RAM rather than having to read it off of disk
>> afterwards.
>>
>> This doesn't make sense for updates to existing databases, but the use case
>> of loading a bunch of data and then querying it right away isn't _that_
>> uncommon.
>>
>> David Lang
>>
>
> +1 for in-flight ANALYZE. This would be great for bulk loads of
> real tables as well as temp tables.

Yes, please, that would be really nice.

--
Jon


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 15:31:27
Message-ID: AANLkTim0AbX71fPTOQa++q9PAGn7SL2ziYpRnb_JMyNR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 4:54 AM, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com> wrote:
> 02.02.11 20:32, Robert Haas wrote:
>>
>> Yeah.  Any kind of bulk load into an empty table can be a problem,
>> even if it's not temporary.  When you load a bunch of data and then
>> immediately plan a query against it, autoanalyze hasn't had a chance
>> to do its thing yet, so sometimes you get a lousy plan.
>
> Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean
> that any insert/update/delete statement that changes more than x% of a table
> (and no less than y records) would run an analyze right after it finished.
> Defaults like x=50, y=10000 should be quite good, I think.

That would actually be a pessimization for many real world cases. Consider:

COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
SELECT

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 15:35:43
Message-ID: AANLkTi=Thet0VZ-X6b7tek+iSLVkE27JjiKrB5D8U0Jd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 5:11 AM, <david(at)lang(dot)hm> wrote:
> If I am understanding things correctly, a full Analyze is going over all the
> data in the table to figure out patterns.

No. It's going over a small, fixed-size sample which depends on
default_statistics_target but NOT on the table size. It's really
important to come up with a solution that's not susceptible to running
ANALYZE over and over again, in many cases unnecessarily.
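
For reference, the knob in question, sketched against a hypothetical
table; the sample size follows the statistics target, not the table size:

SET default_statistics_target = 1000;   -- larger sample and histograms
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
ANALYZE orders;   -- resamples using the new target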

> If this is the case, wouldn't it make sense in the situation where you are
> loading an entire table from scratch to run the Analyze as you are
> processing the data? If you don't want to slow down the main thread that's
> inserting the data, you could copy the data to a second thread and do the
> analysis while it's still in RAM rather than having to read it off of disk
> afterwards.

Well that's basically what autoanalyze is going to do anyway, if the
table is small enough to fit in shared_buffers. And it's actually
usually BAD if it starts running while you're doing a large bulk load,
because it competes for I/O bandwidth and the buffer cache and slows
things down. Especially when you're bulk loading for a long time and
it tries to run over and over. I'd really like to suppress all those
asynchronous ANALYZE operations and instead do ONE synchronous one at
the end, when we try to use the data.
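
The closest hand-rolled version of that today is something like this
(hypothetical table; autovacuum_enabled is a real per-table storage
parameter):

ALTER TABLE orders SET (autovacuum_enabled = false); -- keep autoanalyze away
COPY orders FROM '/tmp/orders.csv';                  -- the long bulk load
ANALYZE orders;                                      -- one synchronous pass
ALTER TABLE orders SET (autovacuum_enabled = true);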

Of course, the devil is in the nontrivial details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 15:43:47
Message-ID: 4D4ACD33.903@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

03.02.11 17:31, Robert Haas wrote:
>
>> Maybe introducing something like an 'AutoAnalyze' threshold would help? I mean
>> that any insert/update/delete statement that changes more than x% of a table
>> (and no less than y records) would run an analyze right after it finished.
>> Defaults like x=50, y=10000 should be quite good, I think.
> That would actually be a pessimization for many real world cases. Consider:
>
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> COPY
> SELECT
If all the copies are roughly the same size, and large, this will make it:

COPY
ANALYZE
COPY
ANALYZE
COPY
COPY
ANALYZE
COPY
COPY
COPY
COPY
ANALYZE
COPY
COPY
COPY
COPY
COPY
SELECT

instead of

COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
COPY
ANALYZE (manual, if one is clever enough)
SELECT

So, yes, this will add 3 more ANALYZE runs, but:
1) An analyze is pretty cheap compared to a large data load. I'd say it
would add a few percent of overhead, while NOT doing a manual analyze
before the select can raise the select's cost by orders of magnitude.
2) How often in the real world is a single table loaded by many COPY
statements? (I don't say it's not often; I really don't know.) At least
for a restore it is not the case, is it?
3) The default thresholds are a thing to discuss. You could make x=90 or
x=200 (the latter would make it run only for massive load/insert
operations). You could even make it disabled by default for people to
test, or enabled by default for temp tables only (and have two sets of
thresholds).
4) As with most other settings, this threshold could be changed on a
per-query basis.
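
For comparison, the background autoanalyze thresholds can already be
tuned per table today (sketched for a hypothetical "orders" table):

ALTER TABLE orders SET (
    autovacuum_analyze_scale_factor = 0.5,   -- roughly my x, as a fraction
    autovacuum_analyze_threshold    = 10000  -- roughly my y, in rows
);

What I propose differs in that it would fire synchronously, as part of
the loading statement itself, rather than in the background.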

P.S. I would also like to have an index analyze as part of any CREATE
INDEX process.

Best regards, Vitalii Tymchyshyn


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 16:38:14
Message-ID: 4D4AD9F6.6040903@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Greg Smith wrote:
> Mladen Gogala wrote:
>
>> The techies at big companies are the guys who will or will not make it
>> happen. And these guys are not beginners. Appeasing them may actually
>> go a long way.
>>
>
> The PostgreSQL community isn't real big on appeasing people if it's at
> the expense of robustness or correctness, and this issue falls into that
> category.
With all due respect, I don't see how the issue of hints falls into
this category. As I explained, the mechanisms are already there; they're
just not elegant enough. The verb "appease" doesn't convey the meaning
that I had in mind quite correctly. The phrase "target population" would
have described what I wanted to say in a much better way.
> There are downsides to that, but good things too. Chasing
> after whatever made people happy regardless of its impact on the server
> code itself has in my mind contributed to why Oracle is so bloated and
> MySQL so buggy, to pick two examples from my favorite horse to whip.
>
Well, those two databases are also used much more widely than Postgres,
which means that they're doing something better than Postgres.

Hints are not even that complicated to program. The SQL parser should
compile the list of hints into a table, and the optimizer should check
whether any of the applicable access methods exist in the table. If one
does, use it. If not, ignore it. This looks to me like a philosophical
issue, not a programming issue. Basically, the current Postgres
philosophy can be described like this: if the database were a gas stove,
it would occasionally catch fire. However, bundling a fire extinguisher
with the stove is somehow seen as bad. When the stove catches fire,
users are expected to report the issue and wait for a better stove to be
developed. It is a very rough analogy, but a rather accurate one, too.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 16:56:56
Message-ID: 201102031656.p13Guug09499@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> Greg Smith wrote:
> > Mladen Gogala wrote:
> >
> >> The techies at big companies are the guys who will or will not make it
> >> happen. And these guys are not beginners. Appeasing them may actually
> >> go a long way.
> >>
> >
> > The PostgreSQL community isn't real big on appeasing people if it's at
> > the expense of robustness or correctness, and this issue falls into that
> > category.
>
> With all due respect, I don't see how the issue of hints falls into
> this category. As I explained, the mechanisms are already there; they're
> just not elegant enough. The verb "appease" doesn't convey the meaning
> that I had in mind quite correctly. The phrase "target population" would
> have described what I wanted to say in a much better way.

The settings are currently there to better model the real world
(random_page_cost), or for testing (enable_seqscan). They are not there
to force certain plans. They can be used for that, but that is not
their purpose and they would not have been added if that was their
purpose.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 16:57:30
Message-ID: 201102031657.p13GvUc09567@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> Hints are not even that complicated to program. The SQL parser should
> compile the list of hints into a table, and the optimizer should check
> whether any of the applicable access methods exist in the table. If one
> does, use it. If not, ignore it. This looks to me like a philosophical
> issue, not a programming issue. Basically, the current Postgres
> philosophy can be described like this: if the database were a gas stove,
> it would occasionally catch fire. However, bundling a fire extinguisher
> with the stove is somehow seen as bad. When the stove catches fire,
> users are expected to report the issue and wait for a better stove to be
> developed. It is a very rough analogy, but a rather accurate one, too.

That might be true.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 17:10:06
Message-ID: 4D4AE16E.9040807@peak6.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/03/2011 10:38 AM, Mladen Gogala wrote:

> With all due respect, I don't see how the issue of hints falls
> into this category.

You have a few good arguments, and if you hadn't said this, it wouldn't
have been so obvious that there was a fundamental philosophical
disconnect. I asked this same question almost ten years ago, and the
answer Tom gave me was more than sufficient.

It all boils down to the database. Hints, whether they're
well-intentioned or not, effectively cover up bugs in the optimizer,
planner, or some other approach the database is using to build its
execution plan. Your analogy is that PG is a gas stove, so bundle a fire
extinguisher. Well, the devs believe that the stove should be upgraded
to electric or possibly even induction to remove the need for the
extinguisher.

If they left hints in, it would just be one more thing to deprecate as
the original need for the hint was removed. If you really need hints
that badly, EnterpriseDB cobbled the Oracle syntax into the planner, and
it seems to work alright. That doesn't mean it's right, just that it
works. EnterpriseDB will now have to support those query hints forever,
even if the planner gets so advanced they're effectively useless.

> Well, those two databases are also used much more widely than
> Postgres, which means that they're doing something better than
> Postgres.

Please don't make arguments like this. "Better" is such a subjective
evaluation it means nothing. Are Honda Accords "better" than Lamborghini
Gallardos because more people buy Accords? The MySQL/PostgreSQL flame
war is a long and sometimes bitter one, and bringing it up to try and
persuade the devs to "see reason" is just going to backfire.

> Hints are not even that complicated to program.

Then write a contrib module. It's not part of the core DB, and it
probably never will be. This is a *very* old argument. There's literally
nothing you can say, no argument you can bring, that hasn't been heard a
million times in the last decade.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com

______________________________________________

See http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 17:27:54
Message-ID: 24765.1296754074@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> writes:
> Hints are not even that complicated to program.

With all due respect, you don't know what you're talking about.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 17:28:29
Message-ID: AANLkTi=rfLJwboH7r3kXvDHmn2vJU1wr8YK+mUJfRdy1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 11:56 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> The settings are currently there to better model the real world
> (random_page_cost), or for testing (enable_seqscan).  They are not there
> to force certain plans.  They can be used for that, but that is not
> their purpose and they would not have been added if that was their
> purpose.

Sure. But Mladen's point is that this is rather narrow-minded. I
happen to agree. We are not building an ivory tower. We are building
a program that real people will use to solve real problems, and it is
not our job to artificially prevent them from achieving their
objectives so that we remain motivated to improve future versions of
the code.

I don't, however, agree with his contention that this is easy to
implement. It would be easy to implement something that sucked. It
would be hard to implement something that actually helped in the cases
where the existing settings aren't already sufficient.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 17:44:23
Message-ID: 87d3n92etk.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

mladen(dot)gogala(at)vmsinfo(dot)com (Mladen Gogala) writes:
> Hints are not even that complicated to program. The SQL parser should
> compile the list of hints into a table, and the optimizer should check
> whether any of the applicable access methods exist in the table. If one
> does, use it. If not, ignore it. This looks to me like a
> philosophical issue, not a programming issue.

It's worth looking back to what has already been elaborated on in the
ToDo.

http://wiki.postgresql.org/wiki/Todo
-----------------------------------
Optimizer hints (not wanted)

Optimizer hints are used to work around problems in the optimizer and
introduce upgrade and maintenance issues. We would rather have the
problems reported and fixed. We have discussed a more sophisticated
system of per-class cost adjustment instead, but a specification remains
to be developed.
-----------------------------------

The complaint is that kludging hints into a particular query attacks the
problem from the wrong direction.

The alternative recommended is to collect some declarative information
that *won't* be part of the query, that *won't* be processed by the
parser, and that *won't* kludge up the query with information that is
liable to turn into crud over time.

Tom Lane was pretty specific about some kinds of declarative information
that seemed useful:
<http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php>

On Jeopardy, participants are expected to phrase their answers in the
form of a question, and doing so is rewarded.

Based on the presence of "query hints" on the Not Wanted portion of the
ToDo list, it's pretty clear that participants here are expected to
propose optimizer hints in ways that do NOT involve decorating queries
with crud. You'll get a vastly friendlier response if you at least make
an attempt to attack the problem in the "declarative information"
fashion.

Perhaps we're all wrong in believing that pushing query optimization
information into application queries by decorating the application with
hints is the wrong idea, but it's a belief that certainly seems to be
regularly agreed upon by gentle readers.
--
"cbbrowne","@","linuxdatabases.info"
The people's revolutionary committee has decided that the name "e" is
retrogressive, unmulticious and reactionary, and has been flushed.
Please update your abbrevs.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 17:46:48
Message-ID: 25146.1296755208@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I don't, however, agree with his contention that this is easy to
> implement. It would be easy to implement something that sucked. It
> would be hard to implement something that actually helped in the cases
> where the existing settings aren't already sufficient.

Exactly. A hint system that actually did more good than harm would be a
very nontrivial project. IMO such effort is better spent on making the
optimizer smarter.

regards, tom lane


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 18:17:08
Message-ID: 4D4AF124.60904@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> With all due respect, I don't see how the issue of hints falls
> into this category. As I explained, the mechanisms are already there;
> they're just not elegant enough.

You're making some assumptions about what it would take to develop a
more elegant mechanism, and they oversimplify the actual situation here.
If you take a survey of everyone who has ever worked on this area of the
code, and responses to this thread are already approaching a significant
percentage of such people, you'll discover that doing what you want is
more difficult--and very much "not elegant enough" from the perspective
of the code involved--than you think it would be.

It's actually kind of funny...I've run into more than one person who
charged into the PostgreSQL source code with the goal of "I'm going to
add good hinting!" But it seems like the minute anyone gets enough
understanding of how it fits together to actually do that, they realize
there are just plain better things to be done in there instead. I used
to be in the same situation you're in--thinking that all it would take
is a better UI for tweaking the existing parameters. But now that I've
actually done such tweaking for long enough to get a feel for what's
really wrong with the underlying assumptions, I can name 3 better uses
of development resources that I'd rather work on instead. I mentioned
incorporating cache visibility already, Robert has talked about
improvements to the selectivity estimates, and the third one is
improving pooling of work_mem so individual clients can get more of it
safely.

> Well, those two databases are also used much more widely than
> Postgres, which means that they're doing something better than Postgres.

"Starting earlier" is the only "better" here. Obviously Oracle got a
much earlier start than either open-source database. The real
divergence in MySQL adoption relative to PostgreSQL was when they
released a Windows port in January of 1998. PostgreSQL didn't really
match that with a fully native port until January of 2005.

Check out
http://www.indeed.com/jobtrends?q=postgres%2C+mysql%2C+oracle&relative=1&relative=1
if you want to see the real story here. Oracle has a large installed
base, but it's considered a troublesome legacy product being replaced
whenever possible now in every place I visit. Obviously my view of the
world as seen through my client feedback is skewed a bit toward
PostgreSQL adoption. But you would be hard pressed to support any view
that suggests Oracle usage is anything other than flat or decreasing at
this point. When usage of one product is growing at an exponential
rate and the other is not growing at all, eventually the market share
curves always cross too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 19:09:35
Message-ID: 4D4AFD6F.2010203@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Chris Browne wrote:
> It's worth looking back to what has already been elaborated on in the
> ToDo.
>

And that precisely is what I am trying to contest.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 19:24:42
Message-ID: 20110203192442.GF31926@rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 03, 2011 at 12:44:23PM -0500, Chris Browne wrote:
> mladen(dot)gogala(at)vmsinfo(dot)com (Mladen Gogala) writes:
> > Hints are not even that complicated to program. The SQL parser should
> > compile the list of hints into a table, and the optimizer should check
> > whether any of the applicable access methods exist in the table. If one
> > does, use it. If not, ignore it. This looks to me like a
> > philosophical issue, not a programming issue.
>
> It's worth looking back to what has already been elaborated on in the
> ToDo.
>
> http://wiki.postgresql.org/wiki/Todo
> -----------------------------------
> Optimizer hints (not wanted)
>
> Optimizer hints are used to work around problems in the optimizer and
> introduce upgrade and maintenance issues. We would rather have the
> problems reported and fixed. We have discussed a more sophisticated
> system of per-class cost adjustment instead, but a specification remains
> to be developed.

And as to 'wait around for a new version to fix that': there are
constantly excellent examples of exactly this happening with
PostgreSQL - the most recent example I've seen:
http://archives.postgresql.org/pgsql-performance/2011-01/msg00337.php

The wait often isn't long at all.

Ross
--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE


From: david(at)lang(dot)hm
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 20:54:02
Message-ID: alpine.DEB.2.00.1102031243190.10088@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 3 Feb 2011, Robert Haas wrote:

> On Thu, Feb 3, 2011 at 5:11 AM, <david(at)lang(dot)hm> wrote:
>> If I am understanding things correctly, a full Analyze is going over all the
>> data in the table to figure out patterns.
>
> No. It's going over a small, fixed-size sample which depends on
> default_statistics_target but NOT on the table size. It's really
> important to come up with a solution that's not susceptible to running
> ANALYZE over and over again, in many cases unnecessarily.
>
>> If this is the case, wouldn't it make sense in the situation where you are
>> loading an entire table from scratch to run the Analyze as you are
>> processing the data? If you don't want to slow down the main thread that's
>> inserting the data, you could copy the data to a second thread and do the
>> analysis while it's still in RAM rather than having to read it off of disk
>> afterwards.
>
> Well that's basically what autoanalyze is going to do anyway, if the
> table is small enough to fit in shared_buffers. And it's actually
> usually BAD if it starts running while you're doing a large bulk load,
> because it competes for I/O bandwidth and the buffer cache and slows
> things down. Especially when you're bulk loading for a long time and
> it tries to run over and over. I'd really like to suppress all those
> asynchronous ANALYZE operations and instead do ONE synchronous one at
> the end, when we try to use the data.

If the table is not large enough to fit in ram, then it will compete for
I/O, and the user will have to wait.

what I'm proposing is that as the records are created, the process doing
the creation makes copies of the records (either all of them, or some of
them if not all are needed for the analysis, possibly via shareing memory
with the analysis process), this would be synchronous with the load, not
asynchronous.

this would take zero I/O bandwidth, it would take up some ram, memory
bandwidth, and cpu time, but a load of a large table like this is I/O
contrained.

it would not make sense for this to be the default, but as an option it
should save a significant amount of time.

I am making the assumption that an ANALYZE run only has to go over the
data once (a sequential scan of the table if it's >> RAM, for example)
and gathers stats as it goes.

With the current code, this is a completely separate process that knows
nothing about the load, so if you kick it off when you start the load, it
makes a pass over the table (competing for I/O), finishes, you continue to
update the table, so it makes another pass, etc. As you say, this is a bad
thing to do. I am saying to have an option that ties the two together,
essentially making the data feed into the ANALYZE run be a fork of the
data coming out of the insert run going to disk. So the ANALYZE run
doesn't do any I/O and isn't going to complete until the insert is
complete, at which time it will have seen one copy of the entire table.

David Lang


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 21:01:40
Message-ID: 4D4B17B4.9030406@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Shaun Thomas wrote:
> On 02/03/2011 10:38 AM, Mladen Gogala wrote:
>
>
> It all boils down to the database. Hints, whether they're
> well-intentioned or not, effectively cover up bugs in the optimizer,
> planner, or some other approach the database is using to build its
> execution.
Hints don't cover up bugs; they simply make it possible for the user to
circumvent the bugs and keep the users happy. As I hinted before, this
is actually a purist argument, made by someone who has never
had to support a massive production database with many users for a living.
> Your analogy is that PG is a gas stove, so bundle a fire
> extinguisher. Well, the devs believe that the stove should be upgraded
> to electric or possibly even induction to remove the need for the
> extinguisher.
>
In the meantime, the fire is burning. What should the hapless owner of
the database application do in the meantime? Tell the users that it will
be better in the next version? As I've said before: hints are a
make-it-or-break-it point. Without hints, I cannot consider Postgres for
mission-critical projects. I am managing big databases for a living, and
I flatter myself that after more than two decades of doing it, I am not
too bad at it.

> If they left hints in, it would just be one more thing to deprecate as
> the original need for the hint was removed. If you really need hints
> that badly, EnterpriseDB cobbled the Oracle syntax into the planner, and
> it seems to work alright. That doesn't mean it's right, just that it
> works. EnterpriseDB will now have to support those query hints forever,
> even if the planner gets so advanced they're effectively useless.
>

I don't foresee that happening in my lifetime. And I plan to go on for
quite a while. There will always be optimizer bugs, and users will be
smarter and know more about their data than computer programs for the
foreseeable future. What this attitude boils down to is that the
developers don't trust their users enough to give them control of the
execution path. I profoundly disagree with that type of philosophy. DB2
also has hints: http://tinyurl.com/48fv7w7
So does SQL Server:
http://www.sql-server-performance.com/tips/hints_general_p1.aspx
Finally, even Postgres's greatest open source competitor, MySQL,
supports hints: http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

I must say that this purist attitude is extremely surprising to me. All
the major DB vendors support optimizer hints, yet in the Postgres
community they are considered bad with almost religious fervor. The
Postgres community is quite unique in its fatwa against hints.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 21:18:41
Message-ID: 871v3o3jgu.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

mladen(dot)gogala(at)vmsinfo(dot)com (Mladen Gogala) writes:
> I must say that this purist attitude is extremely surprising to
> me. All the major DB vendors support optimizer hints, yet in the
> Postgres community, they are considered bad with almost religious
> fervor.
> Postgres community is quite unique with the fatwa against hints.

Well, the community declines to add hints until there is actual
consensus on a good way to add hints.

Nobody has ever proposed a way to add hints where consensus was arrived
at that the way was good, so...
--
http://linuxfinances.info/info/nonrdbms.html
Rules of the Evil Overlord #192. "If I appoint someone as my consort,
I will not subsequently inform her that she is being replaced by a
younger, more attractive woman. <http://www.eviloverlord.com/>


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "Mladen Gogala" <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 21:29:25
Message-ID: 4D4AC9D5020000250003A33F@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:

> In the meantime, the fire is burning. What should the hapless
> owner of the database application do in the meantime? Tell the
> users that it will be better in the next version? As I've said
> before: hints are a make-it-or-break-it point. Without hints, I
> cannot consider Postgres for mission-critical projects. I am
> managing big databases for a living and I flatter myself that after
> more than two decades of doing it, I am not too bad at it.

Well, I've been at it since 1972, and I'm OK with the current
situation because I push hard for *testing* in advance of production
deployment. So I generally discover that leaving a pan of grease on
maximum flame unattended is a bad idea in the test lab, where no
serious damage is done. Then I take steps to ensure that this
doesn't happen in the user world.

We've got about 100 production databases, some at 2TB and growing,
and 100 development, testing, and staging databases. About 3,000
directly connected users and millions of web hits per day generating
tens of millions of queries. Lots of fun replication and automated
interfaces to business partners -- DOT, county sheriffs, local
police agencies, district attorneys, public defenders offices,
Department of Revenue (for tax intercept collections), Department of
Justice, etc. (That was really just the tip of the iceberg.)

Almost all of this was converted inside of a year with minimal fuss
and only one user complaint that I can recall. Most users
described it as a "non-event", with the only visible difference
being that applications were "snappier" than under the commercial
database product. One type of query was slow in Milwaukee County
(our largest). We tuned seq_page_cost and random_page_cost until
all queries were running with good plans. It did not require any
down time to sort this out and fix it -- same day turnaround. This
is not a matter of hinting; it's a matter of creating a cost model
for the planner which matches reality. (We don't set this or any
other "hint" per query, we tune the model.) When the cost estimates
mirror reality, good plans are chosen.
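
A minimal sketch of that kind of tuning, with illustrative values only
(the right numbers depend on how much of the database the OS keeps
cached):

-- Try candidate cost settings in a session while testing the slow query:
SET seq_page_cost = 0.5;
SET random_page_cost = 2.0;
EXPLAIN ANALYZE SELECT ... ;  -- re-check the plan and the timing

-- Once good values are found, set them in postgresql.conf:
--   seq_page_cost = 0.5
--   random_page_cost = 2.0
-- and reload the configuration without any downtime:
SELECT pg_reload_conf();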

-Kevin


From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: undisclosed-recipients:;
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 21:34:19
Message-ID: 4D4B1F5B.4060403@peak6.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/03/2011 03:01 PM, Mladen Gogala wrote:

> As I hinted before, this is actually a purist argument which was made
> by someone who has never had to support a massive production database
> with many users for a living.

Our database handles 9,000 transactions per second and over 200 million
transactions per day just fine, thanks. It may not be a "real database"
in your world, but it's real enough for us.

> I must say that this purist attitude is extremely surprising to me.
> All the major DB vendors support optimizer hints, yet in the
> Postgres community, they are considered bad with almost religious
> fervor. Postgres community is quite unique with the fatwa against
> hints.

You missed the argument. The community, or at least the devs, see hints
as an ugly hack. Do I agree? Not completely, but I can definitely
understand the perspective. Saying every other "vendor" has hints is
really just admitting every other vendor has a crappy optimizer. Is that
something to be proud of?

In almost every single case I've seen a query with bad performance, it's
the fault of the author or the DBA: not enough WHERE clauses; not paying
attention to cardinality or selectivity; inappropriate or misapplied
indexes; insufficient table statistics... the list of such grievances
out there is endless.

And here's something I never saw you consider: hints making performance
worse. Sure, for now, forcing a sequence scan or forcing it to use
indexes on a specific table is faster for some specific edge case. But
hints are like most code, and tend to miss frequent refactoring. As the
optimizer improves, hints likely won't, meaning code is likely to be
slower than if the hints didn't exist. This of course ignores that the
contents of a table are likely to evolve or grow in volume, which can
also drastically alter the path the optimizer would choose, but can't,
because a hint is forcing it to take a specific path.

Want to remove a reverse index scan? Reindex with DESC on the column
being reversed. That was added in 8.3. Getting too many calls for nested
loops when a merge or hash would be faster? Increase the statistics
target for the column causing the problems and re-analyze. Find an
actual bug in the optimizer? Tell the devs and they'll fix it. Just stay
current, and you get all those benefits. This is true for any database;
bugs get fixed, things get faster and more secure.
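
For reference, those specific remedies look something like this (the
table and column names are made up for illustration):

-- Replace a reverse index scan with a forward scan of a descending
-- index (DESC index ordering is available as of 8.3):
CREATE INDEX orders_created_desc_idx ON orders (created_at DESC);

-- Give the planner better statistics for a problem column, then
-- re-analyze so the new sample size takes effect:
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;
ANALYZE orders;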

Or like I said, if you really need hints that badly, use EnterpriseDB
instead. It's basically completely Oracle-compatible at this point. But
pestering the PostgreSQL dev community about how inferior they are, and
how they're doing it wrong, and how they're just another vendor making a
database product that can't support massive production databases, is
doing nothing but ensuring they'll ignore you. Flies, honey, vinegar, etc.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 21:50:20
Message-ID: 4D4B231C.3030802@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Chris Browne wrote:
> Well, the community declines to add hints until there is actual
> consensus on a good way to add hints.
>
OK. That's another matter entirely. Who should make that decision? Is
there a committee or a person who would be capable of making that decision?

> Nobody has ever proposed a way to add hints where consensus was arrived
> at that the way was good, so...
>

So, I will have to go back on my decision to use Postgres and
re-consider MySQL? I would rather throw away the effort invested in
studying Postgres than risk unfixable application downtime. I am
not sure about the world domination thing, though. Optimizer hints are a
big feature that everybody else has and Postgres does not have, because
of religious reasons.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 21:51:24
Message-ID: 4D4B235C.1080308@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 04/02/11 10:01, Mladen Gogala wrote:
> In the meantime, the fire is burning. What should the hapless owner of
> the database application do in the meantime? Tell the users that it
> will be better in the next version? As I've said before: hints are a
> make-it-or-break-it point. Without hints, I cannot consider Postgres
> for mission-critical projects. I am managing big databases for a
> living and I flatter myself that after more than two decades of doing
> it, I am not too bad at it.

This is somewhat of a straw man argument. The sort of query that the
optimizer does badly usually gets noticed during the test cycle, i.e.
before production, so there is some lead time to get a fix into the
code, or add or subtract indexes, or redesign the query concerned.

The cases I've seen in production typically involve "outgrowing"
optimizer parameter settings (e.g. work_mem, effective_cache_size) as
the application dataset gets bigger over time. I would note that this is
*more* likely to happen with hints, as they lobotomize the optimizer so
it *cannot* react to dataset size or distribution changes.
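
For what it's worth, those settings can at least be revisited per
session or per role as the dataset grows, rather than being frozen into
the query text the way a hint is (the values and the role name below are
placeholders, not recommendations):

-- Raise the sort/hash memory budget for one reporting session:
SET work_mem = '256MB';

-- Or attach a bigger budget to a dedicated reporting role:
ALTER ROLE reporting SET work_mem = '256MB';

-- Tell the planner how much of the database the OS is likely caching:
SET effective_cache_size = '12GB';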

regards

Mark


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:03:07
Message-ID: 4D4B261B.6010303@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Shaun Thomas wrote:
> You missed the argument. The community, or at least the devs, see hints
> as an ugly hack. Do I agree? Not completely, but I can definitely
> understand the perspective. Saying every other "vendor" has hints is
> really just admitting every other vendor has a crappy optimizer. Is that
> something to be proud of?
>
This is funny? Everybody else has a crappy optimizer? That's a funny way
of looking at the fact that every other major database supports hints. I
would be tempted to call that a major missing feature, but the statement
that everybody else has a crappy optimizer sounds kind of funny. No
disrespect meant. It's not unlike claiming that the Earth is 6000 years old.

>
> And here's something I never saw you consider: hints making performance
> worse.
>
Sure. If you give me the steering wheel, there is a chance that I might
drive the car off a cliff or even run someone over, but that doesn't mean
that there is no need for having one. After all, we're talking about the
ability to control the optimizer's decisions.

> Want to remove a reverse index scan? Reindex with DESC on the column
> being reversed. That was added in 8.3. Getting too many calls for nested
> loops when a merge or hash would be faster? Increase the statistics
> target for the column causing the problems and re-analyze. Find an
> actual bug in the optimizer? Tell the devs and they'll fix it. Just stay
> current, and you get all those benefits. This is true for any database;
> bugs get fixed, things get faster and more secure.
>
In the meantime, the other databases provide hints which help me bridge
the gap. As I said before: hints are there, even if they were not meant
to be used that way. I can do things in a way that I consider very
non-elegant. The hints are there because they are definitely needed.
Yet, there is a religious zeal and a fatwa against them.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Ben Chobot <bench(at)silentmedia(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org Performance" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:04:06
Message-ID: 9ED07E3C-D1DF-4BC7-B0C4-8126578605A8@silentmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Feb 3, 2011, at 1:50 PM, Mladen Gogala wrote:

> So, I will have to go back on my decision to use Postgres and re-consider MySQL? I would rather throw away the effort invested in studying Postgres than risk unfixable application downtime. I am not sure about the world domination thing, though. Optimizer hints are a big feature that everybody else has and Postgres does not have, because of religious reasons.

As always, you should use the tool you consider best for the job. If you think MySQL as both a product and a community has a better chance of giving you what you want, then you should use MySQL.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:05:50
Message-ID: AANLkTing43D0F5zDBvJnMDLFOZACCAydyx4Zq_HfX230@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

2011/2/3 Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>:
> Chris Browne wrote:
>>
>> Well, the community declines to add hints until there is actual
>> consensus on a good way to add hints.
>>
>
> OK. That's another matter entirely.   Who should make that decision? Is
> there a committee or a person who would be capable of making that decision?
>

Because there is no consensus about hints, hints are not in pg.

And community development must be based on consensus. There is no
second way.

Hints are not a win, for several reasons.

The situation isn't immutable. There are a lot of features that were
rejected the first time - like replication. But it needs a different
approach. You have to show tests, use cases, and code, and you have to
convince everybody that your request is good and necessary. The argument
that other databases have this feature is the last one in the top ten.

>> Nobody has ever proposed a way to add hints where consensus was arrived
>> at that the way was good, so...
>>
>
> So, I will have to go back on my decision to use Postgres and re-consider
> MySQL? I would rather throw away the effort invested in studying Postgres
> than risk unfixable application downtime.  I am not sure about the
> world domination thing, though. Optimizer hints are a big feature that
> everybody else has and Postgres does not have, because of religious reasons.

That's not fair of you. There are real arguments against hints.

>

You can try EDB. There is also an external module:

http://postgresql.1045698.n5.nabble.com/contrib-plantuner-enable-PostgreSQL-planner-hints-td1924794.html

Regards

Pavel Stehule

> --
>
> Mladen Gogala Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:08:00
Message-ID: 4D4B2740.3070902@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2/3/11 1:18 PM, Chris Browne wrote:
> mladen(dot)gogala(at)vmsinfo(dot)com (Mladen Gogala) writes:
>> I must say that this purist attitude is extremely surprising to
>> me. All the major DB vendors support optimizer hints,

I don't think that's actually accurate. Can you give me a list of
DBMSes which support hints other than Oracle?

> Well, the community declines to add hints until there is actual
> consensus on a good way to add hints.
>
> Nobody has ever proposed a way to add hints where consensus was arrived
> at that the way was good, so...

Well, we did actually have some pretty good proposals (IIRC) for
selectively adjusting the cost model to take into account DBA knowledge.
These needed some refinement, but in general seem like the right way to go.

However, since this system wasn't directly compatible with Oracle Hints,
folks pushing for hints dropped the solution as unsatisfactory. This is
the discussion we have every time: the users who want hints specifically
want hints which work exactly like Oracle's, and aren't interested in a
system designed for PostgreSQL. It's gotten very boring; it's like the
requests to support MySQL-only syntax.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Maciek Sakrejda <msakrejda(at)truviso(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:09:33
Message-ID: AANLkTi=6_7N4EuT-trfPbxb9UmanJsnL+ineO-iJUFtJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> The hints are there because they are definitely needed. Yet, there is a
> religious zeal and a fatwa against them.

The opposition is philosophical, not "religious". There is no "fatwa".
If you want a serious discussion, avoid inflammatory terms.

---
Maciek Sakrejda | System Architect | Truviso

1065 E. Hillsdale Blvd., Suite 215
Foster City, CA 94404
www.truviso.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:10:52
Message-ID: AANLkTin=KHhsP=05U9=f28vMUtHOczKUt-Z9EC03ngmv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> In the meantime, the other databases provide hints which help me bridge the
> gap. As I said before: hints are there, even if they were not meant to be
> used that way. I can do things in a way that I consider very non-elegant.
> The hints are there because they are definitely needed. Yet, there is a
> religious zeal and a fatwa against them.
>

Other databases have a different development model; it isn't based on
consensus. There is no such commercial model for PostgreSQL - it is not
possible to simply pay the programmers. So either you pay and, as a
customer, you are the boss, or you use it freely and search for
consensus - a common dialogue.

Regards

Pavel Stehule

> --
>
> Mladen Gogala Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:12:07
Message-ID: 4D4B2837.6080601@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Josh Berkus wrote:
> However, since this system wasn't directly compatible with Oracle Hints,
> folks pushing for hints dropped the solution as unsatisfactory. This is
> the discussion we have every time: the users who want hints specifically
> want hints which work exactly like Oracle's, and aren't interested in a
> system designed for PostgreSQL. It's gotten very boring; it's like the
> requests to support MySQL-only syntax.
>
Actually, I don't want Oracle hints. Oracle hints are ugly and
cumbersome. I would prefer something like this:

http://dev.mysql.com/doc/refman/5.0/en/index-hints.html

That should also answer the question about other databases supporting hints.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:13:09
Message-ID: 4D4B2875.2000500@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> Actually, I don't want Oracle hints. Oracle hints are ugly and
> cumbersome. I would prefer something like this:
>
> http://dev.mysql.com/doc/refman/5.0/en/index-hints.html
>
> That should also answer the question about other databases supporting hints.
>

Sorry. I forgot that MySQL too is now an Oracle product.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:17:06
Message-ID: 4D4B2962.7040801@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 04/02/11 11:08, Josh Berkus wrote:
> I don't think that's actually accurate. Can you give me a list of
> DBMSes which support hints other than Oracle?
>
DB2 LUW (Linux, Unix, Win32 code base) has hint profiles:

http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html


From: Michael Glaesemann <grzm(at)seespotcode(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:19:04
Message-ID: 592C8787-E659-49C8-AFD1-96A8B25331DC@seespotcode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


On Feb 3, 2011, at 17:08, Josh Berkus wrote:

> On 2/3/11 1:18 PM, Chris Browne wrote:
>> mladen(dot)gogala(at)vmsinfo(dot)com (Mladen Gogala) writes:
>>> I must say that this purist attitude is extremely surprising to
>>> me. All the major DB vendors support optimizer hints,
>
> I don't think that's actually accurate. Can you give me a list of
> DBMSes which support hints other than Oracle?

1 minute of Googling shows results for:

db2:
<http://publib.boulder.ibm.com/infocenter/dzichelp/v2r2/index.jsp?topic=/com.ibm.db2.doc.admin/p9li375.htm>

informix:
<http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0502fan/0502fan.html>

sybase:
<http://searchenterpriselinux.techtarget.com/answer/Query-hints-to-override-optimizer>

mysql:
<http://dev.mysql.com/doc/refman/5.0/en/index-hints.html>

I haven't read much of the rest of this thread, so others may have brought these up before.

Michael Glaesemann
grzm seespotcode net


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: david(at)lang(dot)hm
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:35:12
Message-ID: 4D4B2DA0.3040601@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

david(at)lang(dot)hm wrote:
> I am making the assumption that an Analyze run only has to go over the
> data once (a sequential scan of the table if it's >> RAM, for example)
> and gathers stats as it goes.

And that's the part there's some confusion about here. ANALYZE grabs a
random set of samples from the table, the number of which is guided by
the setting for default_statistics_target. The amount of time it takes
is not proportional to the table size; it's only proportional to the
sampling size. Adding a process whose overhead is proportional to the
table size, such as the continuous analyze idea you're proposing, is
quite likely to be a big step backwards relative to just running a
single ANALYZE after the loading is finished.
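
To make that concrete (a sketch; the stock default is 100 in current
releases, and the sampler targets roughly 300 rows per unit of
statistics target, regardless of how large the table is):

-- Inspect and adjust the sample size ANALYZE will use:
SHOW default_statistics_target;
SET default_statistics_target = 200;  -- larger sample, larger pg_statistic entries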

What people should be doing if concerned about multiple passes happening
is something like this:

CREATE TABLE t (s serial, i integer) WITH (autovacuum_enabled=off);  -- no autoanalyze during the load
[populate table]
ANALYZE t;                                  -- one manual pass over the completed data
ALTER TABLE t SET (autovacuum_enabled=on);  -- resume normal maintenance

I'm not optimistic the database will ever get smart enough to recognize
bulk loading and do this sort of thing automatically, but as the
workaround is so simple it's hard to get motivated to work on trying.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Maciek Sakrejda <msakrejda(at)truviso(dot)com>
Cc: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:39:06
Message-ID: 4D4B2E8A.7010402@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Maciek Sakrejda wrote:
>> The hints are there because they are definitely needed. Yet, there is a
>> religious zeal and a fatwa against them.
>>
>
> The opposition is philosophical, not "religious". There is no "fatwa".
> If you want a serious discussion, avoid inflammatory terms.
>
>
>
I don't want to insult anybody, but the whole thing does look strange.
Maybe we can agree to remove that ridiculous "we don't want hints" note
from the Postgresql wiki? That would make it look less like a, hmph,
philosophical issue and more like a "not yet implemented" issue,
especially if we keep in mind that hints are already here, in the form
of "enable_<method>" switches.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 22:40:17
Message-ID: 4D4B2ED1.1070602@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mark Kirkwood wrote:
> On 04/02/11 11:08, Josh Berkus wrote:
>
>> I don't think that's actually accurate. Can you give me a list of
>> DBMSes which support hints other than Oracle?
>>
>>
> DB2 LUW (Linux, Unix, Win32 code base) has hint profiles:
>
> http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html
>
>
>
SQL Server and MySQL too.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Maciek Sakrejda <msakrejda(at)truviso(dot)com>, sthomas(at)peak6(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 23:00:37
Message-ID: 4D4B3395.6010007@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> I don't want to insult anybody, but the whole thing does look strange.
> Maybe we can agree to remove that ridiculous "we don't want hints" note
> from the Postgresql wiki? That would make it look less like a, hmph,
> philosophical issue and more like a "not yet implemented" issue,
> especially if we keep in mind that hints are already here, in the form
> of "enable_<method>" switches.

Link? There's a lot of stuff on the wiki.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Maciek Sakrejda" <msakrejda(at)truviso(dot)com>, "Mladen Gogala" <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 23:00:50
Message-ID: 4D4ADF42020000250003A34C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:

> Maybe we can agree to remove that ridiculous "we don't want hints"
> note from Postgresql wiki?

I'd be against that. This is rehashed less frequently since that
went in. Less wasted time and bandwidth with it there.

> That would make it look less like , hmph, philosophical issue and
> more "not yet implemented" issue,

Exactly what we don't want.

> especially if we have in mind that hints are already here, in the
> form of "enable_<method>" switches.

Those aren't intended as hints for production use. They're there
for diagnostic purposes. In our shop we've never used any of those
flags in production.

That said, there are ways to force an optimization barrier when
needed, which I have occasionally seen people find useful. And
there are sometimes provably logically equivalent ways to write a
query which result in different plans with different performance.
It's rare that someone presents a poorly performing query on the
list and doesn't get a satisfactory resolution fairly quickly -- if
they present sufficient detail and work nicely with others who are
volunteering their time to help.
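
Two commonly cited tricks of that sort, sketched on an invented orders
table (this reflects the planner behavior of this era; WITH queries
require 8.4 or later):

-- OFFSET 0 acts as an optimization fence around a subquery:
SELECT *
  FROM (SELECT * FROM orders WHERE status = 'open' OFFSET 0) AS o
 WHERE o.customer_id = 42;

-- A CTE is planned separately from the outer query, with a similar
-- fencing effect:
WITH open_orders AS (SELECT * FROM orders WHERE status = 'open')
SELECT * FROM open_orders WHERE customer_id = 42;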

-Kevin


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 23:25:02
Message-ID: 4D4B394E.9080505@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Josh Berkus wrote:
>> I don't want to insult anybody, but the whole thing does look strange.
>> Maybe we can agree to remove that ridiculous "we don't want hints" note
>> from the Postgresql wiki? That would make it look less like a, hmph,
>> philosophical issue and more like a "not yet implemented" issue,
>> especially if we keep in mind that hints are already here, in the form
>> of "enable_<method>" switches.
>>
>
> Link? There's a lot of stuff on the wiki.
>
>
>
http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want

No. 2 on the list.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 23:29:54
Message-ID: AANLkTikmChsPODbUD4V-gQ8k2yYvvjJY6b9K_29+k7nu@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 3:54 PM, <david(at)lang(dot)hm> wrote:
> with the current code, this is a completely separate process that knows
> nothing about the load, so if you kick it off when you start the load, it
> makes a pass over the table (competing for I/O), finishes, you continue to
> update the table, so it makes another pass, etc. As you say, this is a bad
> thing to do. I am saying to have an option that ties the two together,
> essentially making the data feed into the Analyze run be a fork of the data
> coming out of the insert run going to disk. So the Analyze run doesn't do
> any I/O and isn't going to complete until the insert is complete. At which
> time it will have seen one copy of the entire table.

Yeah, but you'll be passing the entire table through this separate
process that may only need to see 1% of it or less on a large table.
If you want to write the code and prove it's better than what we have
now, or some other approach that someone else may implement in the
meantime, hey, this is an open source project, and I like improvements
as much as the next guy. But my prediction for what it's worth is
that the results will suck. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 23:33:21
Message-ID: 4D4B3B41.7060803@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Kevin Grittner wrote:
> Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>
>
>> Maybe we can agree to remove that ridiculous "we don't want hints"
>> note from Postgresql wiki?
>>
>
> I'd be against that. This is rehashed less frequently since that
> went in. Less wasted time and bandwidth with it there.
>

Well, the problem will not go away. As I've said before, all other
databases have that feature and none of the reasons listed here
convinced me that everybody else has a crappy optimizer. The problem
may go away altogether if people stop using PostgreSQL.
>
>
>> That would make it look less like , hmph, philosophical issue and
>> more "not yet implemented" issue,
>>
>
> Exactly what we don't want.
>
Who is "we"?

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-03 23:56:57
Message-ID: 1296777417.18411.97.camel@jd-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2011-02-03 at 18:33 -0500, Mladen Gogala wrote:
>
> >
> > Exactly what we don't want.
> >
> Who is "we"?

The majority of long term hackers.

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:08:10
Message-ID: 4D4B436A.5030002@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2/3/11 1:34 PM, Shaun Thomas wrote:
>> I must say that this purist attitude is extremely surprising to me.
>> All the major DB vendors support optimizer hints, yet in the
>> Postgres community, they are considered bad with almost religious
>> fervor. Postgres community is quite unique with the fatwa against
>> hints.
>
> You missed the argument. The community, or at least the devs, see hints
> as an ugly hack.

Let's kill the myth right now that Postgres doesn't have hints. It DOES have hints.

Just read this forum for a few days and see how many times there are suggestions like "disable nested loops" or "disable seqscan", or "change the random page cost", or "change the join collapse limit".

All of these options are nothing more than a way of altering the planner's choices so that it will pick the plan that the designer already suspects is more optimal.

If that's not a hint, I don't know what is.
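
And because those are plain configuration settings, they can even be
scoped to a single transaction, which is about as hint-like as it gets
(a sketch; the query itself is elided):

BEGIN;
SET LOCAL enable_nestloop = off;  -- affects only this transaction
SELECT ... ;                      -- runs under the altered planner settings
COMMIT;                           -- the setting reverts automatically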

Craig


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:08:21
Message-ID: AANLkTinMdsacnWJUADLg8URiSEKjXV3EmU=1XTAaCT-D@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 6:33 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> Kevin Grittner wrote:
>> Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>>>
>>> Maybe we can agree to remove that ridiculous "we don't want hints"
>>> note from Postgresql wiki?
>>>
>>
>>  I'd be against that.  This is rehashed less frequently since that
>> went in.  Less wasted time and bandwidth with it there.
>
> Well, the problem will not go away.  As I've said before, all other
> databases have that feature and none of the reasons listed here convinced me
> that everybody else has a crappy optimizer.  The problem may go away
> altogether if people stop using PostgreSQL.

You seem to be asserting that without hints, problem queries can't be
fixed. But you haven't offered any evidence for that proposition, and
it doesn't match my experience, or the experience of other people on
this list who have been using PostgreSQL for a very long time. If you
want to seriously advance this conversation, you should (1) learn how
people who use PostgreSQL solve these problems and then (2) if you
think there are cases where those methods are inadequate, present
them, and let's have a discussion about it. People in this community
DO change their mind about things - but they do so in response to
*evidence*. You haven't presented one tangible example of where the
sort of hints you seem to want would actually help anything, and yet
you're accusing the people who don't agree with you of being engaged
in a religious war. It seems to me that the shoe is on the other
foot. Religion is when you believe something first and then look for
evidence to support it. Science goes the other direction.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:13:17
Message-ID: 4D4B449D.6080402@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Joshua D. Drake wrote:
> On Thu, 2011-02-03 at 18:33 -0500, Mladen Gogala wrote:
>
>>
>>
>>>
>>> Exactly what we don't want.
>>>
>>>
>> Who is "we"?
>>
>
> The majority of long term hackers.
>
>
If that is so, I don't see "world domination" in the future; exactly
the opposite. A database whose creators don't trust their users cannot
count on a very bright future. All other databases do have that
feature. I must say, this debate gave me a good deal of stuff to think
about.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:29:22
Message-ID: 4D4B4862.20203@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2011-02-03 23:29, Robert Haas wrote:
> Yeah, but you'll be passing the entire table through this separate
> process that may only need to see 1% of it or less on a large table.

It doesn't sound too impossible to pass only a percentage, starting high
and dropping towards 1% once the loaded size has become "large".
--
Jeremy


From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:30:50
Message-ID: D247E79EFD801E40A9449A9724F6295B4C5F5FBB@spswchi6mail1.peak6.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> All other databases do have that feature. I must say, this
> debate gave me a good deal of stuff to think about.

Aaaaand, I think we're done here. The idea that the lack of hints will kill
PostgreSQL is already demonstrably false. This is sounding more and
more like a petulant tantrum.

Folks, I apologize for ever taking part in this conversation and contributing
to the loss of signal to noise. Please forgive me.

--
Shaun Thomas
Peak6 | 141 W. Jackson Blvd. | Suite 800 | Chicago, IL 60604
312-676-8870
sthomas(at)peak6(dot)com


From: david(at)lang(dot)hm
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:39:12
Message-ID: alpine.DEB.2.00.1102031637480.30983@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 3 Feb 2011, Robert Haas wrote:

> On Thu, Feb 3, 2011 at 3:54 PM, <david(at)lang(dot)hm> wrote:
>> with the current code, this is a completely separate process that knows
>> nothing about the load, so if you kick it off when you start the load, it
>> makes a pass over the table (competing for I/O), finishes, you continue to
>> update the table, so it makes another pass, etc. As you say, this is a bad
>> thing to do. I am saying to have an option that ties the two together,
>> essentially making the data feed into the Analyze run be a fork of the data
>> coming out of the insert run going to disk. So the Analyze run doesn't do
>> any I/O and isn't going to complete until the insert is complete. At which
>> time it will have seen one copy of the entire table.
>
> Yeah, but you'll be passing the entire table through this separate
> process that may only need to see 1% of it or less on a large table.
> If you want to write the code and prove it's better than what we have
> now, or some other approach that someone else may implement in the
> meantime, hey, this is an open source project, and I like improvements
> as much as the next guy. But my prediction for what it's worth is
> that the results will suck. :-)

I will point out that 1% of a very large table can still be a lot of disk
I/O that is avoided (especially if it's random I/O that's avoided).

David Lang


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:39:42
Message-ID: 4D4B4ACE.5090908@vmsinfo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Robert Haas wrote:
> On Thu, Feb 3, 2011 at 6:33 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>
>> Kevin Grittner wrote:
>>
>>> Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>>>
>>>> Maybe we can agree to remove that ridiculous "we don't want hints"
>>>> note from Postgresql wiki?
>>>>
>>>>
>>> I'd be against that. This is rehashed less frequently since that
>>> went in. Less wasted time and bandwidth with it there.
>>>
>> Well, the problem will not go away. As I've said before, all other
>> databases have that feature and none of the reasons listed here convinced me
>> that everybody else has a crappy optimizer. The problem may go away
>> altogether if people stop using PostgreSQL.
>>
>
> You seem to be asserting that without hints, problem queries can't be
> fixed. But you haven't offered any evidence for that proposition, and
> it doesn't match my experience, or the experience of other people on
> this list who have been using PostgreSQL for a very long time. If you
> want to seriously advance this conversation, you should (1) learn how
> people who use PostgreSQL solve these problems and then (2) if you
> think there are cases where those methods are inadequate, present
> them, and let's have a discussion about it. People in this community
> DO change their mind about things - but they do so in response to
> *evidence*. You haven't presented one tangible example of where the
> sort of hints you seem to want would actually help anything, and yet
> you're accusing the people who don't agree with you of being engaged
> in a religious war. It seems to me that the shoe is on the other
> foot. Religion is when you believe something first and then look for
> evidence to support it. Science goes the other direction.
>
>
Actually, it is not unlike a religious dogma, only stating that "hints
are bad". It even says so in the wiki. The arguments are:
1) Refusal to implement hints is motivated by distrust toward users,
citing that some people may mess things up. Yes, they can, with and
without hints.
2) All other databases have them. This is a major feature, and if I were
in the MySQL camp, I would use it as an argument. Asking me for some
"proof" is missing the point. All other databases have hints precisely
because they are useful. The assertion that only Postgres is so smart
that it can operate without hints doesn't match reality. As a matter of
fact, Oracle RDBMS on the same machine will regularly beat PgSQL in
performance. That has been my experience so far. I even posted counting
query results.
3) Hints are a "make it or break it" feature. They're absolutely needed
in fire-extinguishing situations.

I see no arguments to say otherwise, and as long as that ridiculous "we
don't want hints" dogma is on the wiki, this is precisely what it is: a
dogma. Dogmas do not change, and I am sorry that you don't see it that
way. However, this discussion did convince me that I need to take
another look at MySQL and tone down my engagement with the PostgreSQL
community. This is my last post on the subject, because posts are
becoming increasingly personal. This level of irritation is also
characteristic of a religious community chastising a sinner. Let me
remind you again: all other major databases have that possibility:
Oracle, MySQL, DB2, SQL Server and Informix. Requiring a burden of proof
about hints is equivalent to saying that all these databases are
developed by idiots and have a crappy optimizer. I am not going to back
down, but I may stop using Postgres altogether. If that was your goal,
you almost achieved it. Oh yes, and good luck with the world domination.
If there is not enough common sense even to take down that stupid dogma
on the wiki, there isn't much hope left. With this post, my
participation in this group is finished, for the foreseeable future.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 00:49:52
Message-ID: 4D4B4D30.2060007@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2011-02-03 21:51, Mark Kirkwood wrote:
> The cases I've seen in production typically involve "outgrowing" optimizer parameter settings: (e.g work_mem, effective_cache_size) as the application dataset gets bigger over time.

An argument in favour of the DBMS maintaining a running estimate of such things.
--
Jeremy


From: Grant Johnson <grant(at)amadensor(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 01:18:28
Message-ID: 4D4B53E4.6010701@amadensor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On PostgreSQL, the difference between no hints and hints for that one
query with skewed data is that the query finishes a little faster. On
some others, which shall remain nameless, it is the difference between
finishing in seconds or days, or maybe never. Hints can be useful, but I
can also see why they are not a top priority: they are rarely needed,
and only when working around a bug. If you want them so badly, you have
the source - write a contrib module (can you do that on Oracle or
MSSQL?). If I have a choice between the developers spending time on
implementing hints and spending time on improving the optimiser, I'll
take the optimiser.

Tom Kyte agrees:
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:8912905298920
http://tkyte.blogspot.com/2006/08/words-of-wisdom.html

Oracle can be faster on count queries, but that is the only case I have
seen. Generally, on most other queries, especially those involving
complex joins or indexes on text fields, PostgreSQL is faster on the
same hardware.


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 01:28:08
Message-ID: 4D4B5628.8030100@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 04/02/11 13:49, Jeremy Harris wrote:
> On 2011-02-03 21:51, Mark Kirkwood wrote:
>> The cases I've seen in production typically involve "outgrowing"
>> optimizer parameter settings: (e.g work_mem, effective_cache_size) as
>> the application dataset gets bigger over time.
>
> An argument in favour of the DBMS maintaining a running estimate of
> such things.

That is an interesting idea - I'm not quite sure how it could apply to
server config settings (e.g. work_mem) for which it would be dangerous to
allow the server to increase on the fly, but it sure would be handy to
have some sort of query execution "memory" so that alerts like:

"STATEMENT: SELECT blah : PARAMETERS blah: using temp file(s), last
execution used memory"

could be generated (this could be quite complex I guess, requiring some
sort of long-lived statement plan cache).

Cheers

Mark


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 01:29:14
Message-ID: AANLkTi=Kx6rRFs3Kz2m32nLKwLGv0Ee87UF8ngKhppgM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 7:39 PM, <david(at)lang(dot)hm> wrote:
>> Yeah, but you'll be passing the entire table through this separate
>> process that may only need to see 1% of it or less on a large table.
>> If you want to write the code and prove it's better than what we have
>> now, or some other approach that someone else may implement in the
>> meantime, hey, this is an open source project, and I like improvements
>> as much as the next guy.  But my prediction for what it's worth is
>> that the results will suck.  :-)
>
> I will point out that 1% of a very large table can still be a lot of disk
> I/O that is avoided (especially if it's random I/O that's avoided)

Sure, but I think that trying to avoid it will be costly in other ways
- you'll be streaming a huge volume of data through some auxiliary
process, which will have to apply some algorithm that's very different
from the one we use today. The reality is that I think there's little
evidence that the way we do ANALYZE now is too expensive. It's
typically very cheap and works very well. It's a bit annoying when it
fires off in the middle of a giant data load, so we might need to
change the time of it a little, but if there's a problem with the
operation itself being too costly, this is the first I'm hearing of
it. We've actually worked *really* hard to make it cheap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 01:36:32
Message-ID: AANLkTi=V9J0Z-QTM=yJUctR7ECFRE3CJYB8NPXcxzuTO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 7:39 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>  reality. As a matter of fact, Oracle RDBMS on the same machine will
> regularly beat PgSQL in performance.
> That has been my experience so far. I even posted counting query results.

It sure is, but those count queries didn't run faster because of query
planner hints. They ran faster because of things like index-only
scans, fast full index scans, asynchronous I/O, and parallel query.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: david(at)lang(dot)hm
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 01:37:14
Message-ID: alpine.DEB.2.00.1102031732470.30983@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 3 Feb 2011, Robert Haas wrote:

> On Thu, Feb 3, 2011 at 7:39 PM, <david(at)lang(dot)hm> wrote:
>>> Yeah, but you'll be passing the entire table through this separate
>>> process that may only need to see 1% of it or less on a large table.
>>> If you want to write the code and prove it's better than what we have
>>> now, or some other approach that someone else may implement in the
>>> meantime, hey, this is an open source project, and I like improvements
>>> as much as the next guy.  But my prediction for what it's worth is
>>> that the results will suck.  :-)
>>
>> I will point out that 1% of a very large table can still be a lot of disk
>> I/O that is avoided (especially if it's random I/O that's avoided)
>
> Sure, but I think that trying to avoid it will be costly in other ways
> - you'll be streaming a huge volume of data through some auxiliary
> process, which will have to apply some algorithm that's very different
> from the one we use today. The reality is that I think there's little
> evidence that the way we do ANALYZE now is too expensive. It's
> typically very cheap and works very well. It's a bit annoying when it
> fires off in the middle of a giant data load, so we might need to
> change the time of it a little, but if there's a problem with the
> operation itself being too costly, this is the first I'm hearing of
> it. We've actually worked *really* hard to make it cheap.

I could be misunderstanding things here, but my understanding is that it's
'cheap' in that it has little impact on the database while it is running.

the issue here is that the workflow is

load data
analyze
start work

so the cost of analyze in this workflow is not "1% impact on query speed
for the next X time", it's "the database can't be used for the next X time
while we wait for analyze to finish running"

I don't understand why the algorithm would have to be so different than
what's done today, surely the analyze thread could easily be tweaked to
ignore the rest of the data (assuming we don't have the thread sending the
data to analyze do the filtering)

David Lang


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-04 01:46:27
Message-ID: 4D4B5A73.3080302@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> http://wiki.postgresql.org/wiki/Todo#Features_We_Do_Not_Want
>
> No. 2 on the list.

Heck, *I* wrote that text.

I quote:

"Optimizer hints are used to work around problems in the optimizer and
introduce upgrade and maintenance issues. We would rather have the
problems reported and fixed. We have discussed a more sophisticated
system of per-class cost adjustment instead, but a specification remains
to be developed."

That seems pretty straightforward. There are even links to prior
discussions about what kind of system would work. I don't think this
text needs any adjustment; that's our clear consensus on the hint issue:
we want a tool which works better than what we've seen in other databases.

Quite frankly, the main reason why most DBMSes have a hinting system has
nothing to do with the quality of the optimizer and everything to do with
DBAs who think they're smarter than the optimizer (incorrectly). Oracle
has a darned good query optimizer, and SQL Server's is even better.
However, there are a lot of undereducated or fossilized DBAs out there
who don't trust the query planner and want to override it in fairly
arbitrary ways; I refer you to the collected works of Dan Tow, for example.

In many cases hints are used by DBAs in "emergency" situations because
they are easier than figuring out what the underlying issue is, even
when that could be done relatively simply. Improving diagnostic query
tools would be a much better approach here; for example, the team
working on hypothetical indexes has a lot to offer. If you can figure
out what's really wrong with the query in 10 minutes, you don't need a hint.

Yes, I occasionally run across cases where having a query tweaking
system would help me fix a pathological failure in the planner.
However, even on data warehouses that's less than 0.1% of the queries I
deal with, so this isn't exactly a common event. And any hinting system
we develop needs to address those specific cases, NOT a hypothetical
case which can't be tested. Otherwise we'll implement hints which
actually don't improve queries.

Commercial DBMSes have to give in to what their big paying customers
want, no matter how stupid it is. I'm grateful that I can work on a DBMS
-- the third most popular SQL DBMS in the world -- which can focus on
quality instead.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 02:05:52
Message-ID: AANLkTikN14hsVVVLJ=BQdqEcpmzkLLqSbEpqx-th4Ohx@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 8:37 PM, <david(at)lang(dot)hm> wrote:
> On Thu, 3 Feb 2011, Robert Haas wrote:
>
>> On Thu, Feb 3, 2011 at 7:39 PM,  <david(at)lang(dot)hm> wrote:
>>>>
>>>> Yeah, but you'll be passing the entire table through this separate
>>>> process that may only need to see 1% of it or less on a large table.
>>>> If you want to write the code and prove it's better than what we have
>>>> now, or some other approach that someone else may implement in the
>>>> meantime, hey, this is an open source project, and I like improvements
>>>> as much as the next guy.  But my prediction for what it's worth is
>>>> that the results will suck.  :-)
>>>
>>> I will point out that 1% of a very large table can still be a lot of disk
>>> I/O that is avoided (especially if it's random I/O that's avoided)
>>
>> Sure, but I think that trying to avoid it will be costly in other ways
>> - you'll be streaming a huge volume of data through some auxiliary
>> process, which will have to apply some algorithm that's very different
>> from the one we use today.  The reality is that I think there's little
>> evidence that the way we do ANALYZE now is too expensive.  It's
>> typically very cheap and works very well.  It's a bit annoying when it
>> fires off in the middle of a giant data load, so we might need to
>> change the time of it a little, but if there's a problem with the
>> operation itself being too costly, this is the first I'm hearing of
>> it.  We've actually worked *really* hard to make it cheap.
>
> I could be misunderstanding things here, but my understanding is that it's
> 'cheap' in that it has little impact on the database while it is running.

I mean that it's cheap in that it usually takes very little time to complete.

> the issue here is that the workflow is
>
> load data
> analyze
> start work
>
> so the cost of analyze in this workflow is not "1% impact on query speed for
> the next X time", it's "the database can't be used for the next X time while
> we wait for analyze to finish running"

OK.

> I don't understand why the algorithm would have to be so different than
> what's done today, surely the analyze thread could easily be tweaked to
> ignore the rest of the data (assuming we don't have the thread sending the
> data to analyze do the filtering)

If you want to randomly pick 10,000 rows out of all the rows that are
going to be inserted in the table without knowing in advance how many
there will be, how do you do that? Maybe there's an algorithm, but
it's not obvious to me. But mostly, I question how expensive it is to
have a second process looking at the entire table contents vs. going
back and rereading a sample of rows at the end. I can't remember
anyone ever complaining "ANALYZE took too long to run". I only
remember complaints of the form "I had to remember to manually run it
and I wish it had just happened by itself".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Conor Walsh <ctw(at)adverb(dot)ly>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 02:12:57
Message-ID: AANLkTi=XUprgXqV84iY1sukC_6Pos+DfVPFveaK1CSHf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> I can't remember
> anyone ever complaining "ANALYZE took too long to run".  I only
> remember complaints of the form "I had to remember to manually run it
> and I wish it had just happened by itself".

Robert,

This sounds like an argument in favor of an implicit ANALYZE after all
COPY statements, and/or an implicit autoanalyze check after all
INSERT/UPDATE statements.

-Conor


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Conor Walsh <ctw(at)adverb(dot)ly>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 02:33:30
Message-ID: 1296786810.18411.102.camel@jd-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2011-02-03 at 18:12 -0800, Conor Walsh wrote:
> > I can't remember
> > anyone ever complaining "ANALYZE took too long to run". I only
> > remember complaints of the form "I had to remember to manually run it
> > and I wish it had just happened by itself".
>
> Robert,
>
> This sounds like an argument in favor of an implicit ANALYZE after all
> COPY statements, and/or an implicit autoanalyze check after all
> INSERT/UPDATE statements.

Well, that already happens, assuming you insert/update or copy a greater
amount than the threshold implied by

autovacuum_analyze_scale_factor

Then autovacuum is going to analyze on the next run. The default is 0.1,
so it certainly doesn't take much.
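
The arithmetic is easy to sketch. A back-of-the-envelope illustration in
Python, using the documented defaults rather than the actual autovacuum
code:

    # autoanalyze fires when tuples changed since the last analyze
    # exceed: autovacuum_analyze_threshold
    #         + autovacuum_analyze_scale_factor * reltuples
    AUTOVACUUM_ANALYZE_THRESHOLD = 50      # default
    AUTOVACUUM_ANALYZE_SCALE_FACTOR = 0.1  # default

    def needs_autoanalyze(reltuples, changed_since_last_analyze):
        limit = (AUTOVACUUM_ANALYZE_THRESHOLD
                 + AUTOVACUUM_ANALYZE_SCALE_FACTOR * reltuples)
        return changed_since_last_analyze > limit

    # a 1,000,000-row table needs just over 100,050 changed tuples
    print(needs_autoanalyze(1000000, 200000))  # True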

JD

>
> -Conor
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Conor Walsh <ctw(at)adverb(dot)ly>
To: jd(at)commandprompt(dot)com
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 02:45:09
Message-ID: AANLkTikR77m7ttGb5Y1y7HWa0Os95TQXj_+wgqa9Xkfm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 6:33 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
> Well, that already happens...

My understanding is that auto-analyze will fire only after my
transaction is completed, because it is a separate daemon. If I do
like so:

BEGIN;
COPY ...;
-- Dangerously un-analyzed
SELECT complicated-stuff ...;
END;

Auto-analyze does not benefit me, or might not because it won't fire
often enough. I agree that analyze is very fast, and it often seems
to me like the cost/benefit ratio suggests making auto-analyze even
more aggressive.
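
(For what it's worth, an explicit ANALYZE issued between the COPY and
the SELECT does work here, since ANALYZE, unlike VACUUM, can run inside
a transaction block; but that is exactly the "I had to remember to run
it manually" case described above.)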

Disclaimer/disclosure: I deal exclusively with very large data sets
these days, so analyzing all the time is almost a highly effective
worst-case amortization. I understand that constant analyze is not so
great in, say, an OLTP setting. But if the check is cheap, making
auto-analyze more integrated and less daemon-driven might be a net
win. I'm not sure.

-Conor


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 02:59:46
Message-ID: AANLkTinsC5=TgpUoK_bZqQojuof4hu36=N_GXo2+MzBy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 5:39 PM, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> Actually, it is not unlike a religious dogma, only stating that "hints are
> bad". It even says so in the wiki. The arguments are

There's been considerably more output than "hints bad! Hulk Smash!"

> 1) Refusal to implement hints is motivated by distrust toward users, citing
> that some people may mess things up.

It's more about creating a knob that will create more problems than it
solves. Which I get. And making sure that if you make such a knob
that it'll do the least damage and give the most usefulness. Until a
good proposal and some code to do it shows up, we're all just waving
our hands around describing different parts of the elephant.

> 2) All other databases have them. This is a major feature and if I were in
> the MySQL camp, I would use it as an
>  argument. Asking me for some "proof" is missing the point. All other
> databases have hints precisely because
>  they are useful.

Uh, two points. 1: Argumentum Ad Populum. Just because it's popular
doesn't mean it's right. 2: Other databases have them because their
optimizers can't make the right decision even most of the time. Yes
they're useful, but like a plastic bad covering a broken car window,
they're useful because they cover something that's inherently broken.

> Assertion that only Postgres is so smart that can operate
> without hints doesn't match the
>  reality.

Again, you're twisting what people have said. The point is that
while PostgreSQL makes mistakes, we'd rather concentrate on making the
planner smarter than giving it a lobotomy and running it remotely like
a robot.

> As a matter of fact, Oracle RDBMS on the same machine will
> regularly beat PgSQL in performance.

Yes. And this has little to do with hints. It has to do with years
of development lead with THOUSANDS of engineers who can work on the
most esoteric corner cases in their spare time. Find the pg project a
couple hundred software engineers and maybe we'll catch Oracle a
little quicker. Otherwise we'll have to marshal our resources to do
the best we can on the project, and that means avoiding maintenance
black holes and having the devs work on the things that give the most
benefit for the cost. Hints are something only a tiny percentage of
users could actually use and use well.

Write a check, hire some developers and get the code done and present
it to the community. If it's good and works it'll likely get
accepted. Or use EDB, since it has oracle compatibility in it.

>  That has been my experience so far.   I even posted counting query results.
> 3) Hints are "make it or break it" feature. They're absolutely needed in the
> fire extinguishing situations.

I've been using pg since 6.5.2. I've used Oracle since version 8 or
so. I have never been in a situation with postgresql where I couldn't
fix the problem with either tuning, query editing, or asking Tom for a
patch for a problem I found in it. Turnaround time on the last patch
that was made to fix my problem was somewhere in the 24 hour range.
If Oracle can patch their planner that fast, let me know.

> I see no arguments to say otherwise and until that ridiculous "we don't want
> hints" dogma is on wiki, this is precisely what it is:  a dogma. Dogmas do
> not change and I am sorry that you don't see it that way. However, this
> discussion

No, it's not dogma; you need to present a strong, coherent argument,
not threaten people on the list, etc.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 03:01:10
Message-ID: 1296788470.16666.119.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2011-02-03 at 16:50 -0500, Mladen Gogala wrote:
> Chris Browne wrote:
> > Well, the community declines to add hints until there is actual
> > consensus on a good way to add hints.
> >
> OK. That's another matter entirely. Who should make that decision? Is
> there a committee or a person who would be capable of making that decision?

Admittedly I haven't read this whole discussion, but it seems like
"hints" might be too poorly defined right now.

If by "hints" we mean some mechanism to influence the planner in a more
fine-grained way, I could imagine that some proposal along those lines
might gain significant support.

But, as always, it depends on the content and quality of the proposal
more than the title. If someone has a thoughtful proposal that tries to
balance things like:
* DBA control versus query changes/comments
* compatibility across versions versus finer plan control
* allowing the existing optimizer to optimize portions of the
query while controlling other portions
* indicating costs and cardinalities versus plans directly

I am confident that such a proposal will gain traction among the
community as a whole.

However, a series of proposals for individual hacks for specific purposes
will probably be rejected. I am in no way implying that you are
approaching it this way -- I am just trying to characterize an approach
that won't make progress.

Regards,
Jeff Davis


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 03:13:22
Message-ID: AANLkTi=s7yBPnZEbO6GAk_tUeRH_bZL=fi3NsfKxPKFf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 7:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> If you want to randomly pick 10,000 rows out of all the rows that are
> going to be inserted in the table without knowing in advance how many
> there will be, how do you do that?

Maybe you could instead just have it use some % of the rows going by?
Just a guess.
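
Something like this, say (a toy sketch in Python, nothing to do with
the actual code, and the names are made up):

    import random

    def percent_sample(rows, fraction=0.01):
        """Keep each row independently with probability `fraction`."""
        for row in rows:
            if random.random() < fraction:
                yield row

The wrinkle is that the sample then grows with the (unknown) input size
instead of staying at a fixed 10,000 rows, which is the part the
reservoir technique mentioned later in the thread addresses.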


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Conor Walsh <ctw(at)adverb(dot)ly>
Cc: jd(at)commandprompt(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Does auto-analyze work on dirty writes? (was: Re: [HACKERS] Slow count(*) again...)
Date: 2011-02-04 03:31:35
Message-ID: 4D4B7317.3070803@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/03/2011 09:45 PM, Conor Walsh wrote:
> My understanding is that auto-analyze will fire only after my
> transaction is completed, because it is a separate daemon. If I do
> like so:
>
> BEGIN;
> COPY ...;
> -- Dangerously un-analyzed
> SELECT complicated-stuff ...;
> END;
>
> Auto-analyze does not benefit me, or might not because it won't fire
> often enough. I agree that analyze is very fast, and it often seems
> to me like the cost/benefit ratio suggests making auto-analyze even
> more aggressive.

The count discussion is boring. Nothing new there. But auto-analyze on
dirty writes does interest me. :-)

My understanding is:

1) Background daemon wakes up and checks whether a number of changes
have happened to the database, regardless of transaction boundaries.

2) Background daemon analyzes a percentage of rows in the database for
statistical data, regardless of row visibility.

3) Analyze is important for both visible rows and invisible rows, as
plan execution is impacted by invisible rows. As long as they are part
of the table, they may impact the queries performed against the table.

4) It doesn't matter if the invisible rows are invisible because they
are not yet committed, or because they are not yet vacuumed.

Would somebody in the know please confirm the above understanding, for my
own peace of mind?

Thanks,
mark

--
Mark Mielke<mark(at)mielke(dot)cc>


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 03:40:31
Message-ID: 4D4B752F.6000605@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Scott Marlowe wrote:
> Yes they're useful, but like a plastic bad covering a broken car window,
> they're useful because they cover something that's inherently broken.
>

Awesome. Now we have a car analogy, with a funny typo no less.
"Plastic bad", I love it. This is real progress toward getting all the
common list argument idioms aired out. All we need now is a homage to
Mike Godwin and we can close this down.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 03:48:46
Message-ID: AANLkTi=2X8ZQ=yGmrLTHQ7U-XKB_EP4GKEi2ZV9Pi59W@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 8:40 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Scott Marlowe wrote:
>>
>> Yes they're useful, but like a plastic bad covering a broken car window,
>> they're useful because they cover something that's inherently broken.
>>
>
> Awesome.  Now we have a car analogy, with a funny typo no less.  "Plastic
> bad", I love it.  This is real progress toward getting all the common list
> argument idioms aired out.  All we need now is a homage to Mike Godwin and
> we can close this down.

It's not so much a car analogy as a plastic bad analogy.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 03:56:12
Message-ID: 4D4B78DC.8030405@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Scott Marlowe wrote:
> It's not so much a car analogy as a plastic bad analogy.
>

Is that like a Plastic Ono Band? Because I think one of those is the
only thing holding the part of my bumper I smashed in the snow on right
now. I could be wrong about the name.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 04:00:01
Message-ID: AANLkTim+Gwzs1fTmwr_O9ETEnNTQjkda3wWpw1DqUsLp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 8:56 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Scott Marlowe wrote:
>>
>> It's not so much a car analogy as a plastic bad analogy.
>>
>
> Is that like a Plastic Ono Band?  Because I think one of those is the only
> thing holding the part of my bumper I smashed in the snow on right now.  I
> could be wrong about the name.

No, that's a plastic oh no! band you have.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 04:10:41
Message-ID: 4D4B7C41.4080001@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Scott Marlowe wrote:
> No, that's a plastic oh no! band you have.
>

Wow, right you are. So with this type holding together my Japanese car,
if it breaks and parts fall off, I'm supposed to yell "Oh, no! There
goes Tokyo!", yes?

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 05:06:18
Message-ID: AANLkTi=A-70EgJ8yWDCn4hbyCaA0nFJFA3+zedan=nLS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 6:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> If you want to randomly pick 10,000 rows out of all the rows that are
> going to be inserted in the table without knowing in advance how many
> there will be, how do you do that?
>

Reservoir sampling, as the most well-known option:
http://en.wikipedia.org/wiki/Reservoir_sampling
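
For the archives, the classic version of this ("Algorithm R") is only a
few lines. A toy sketch in Python, not anything resembling the actual
ANALYZE code:

    import random

    def reservoir_sample(rows, k=10000):
        """Uniform random sample of k rows from a stream of unknown length."""
        sample = []
        for i, row in enumerate(rows):
            if i < k:
                sample.append(row)        # fill the reservoir first
            else:
                j = random.randint(0, i)  # uniform over all rows seen so far
                if j < k:
                    sample[j] = row       # row i survives with probability k/(i+1)
        return sample

Every row ends up in the sample with equal probability k/N, no matter
how large N turns out to be.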

--
- David T. Wilson
david(dot)t(dot)wilson(at)gmail(dot)com


From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 06:36:20
Message-ID: AANLkTi=AqJ_GohhV1usnr7tKxDuoPy5pte7Hz_xtxUjZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Neat. That was my 'you learn something every day' moment. Thanks.

On Thu, Feb 3, 2011 at 9:06 PM, David Wilson <david(dot)t(dot)wilson(at)gmail(dot)com> wrote:

>
>
> On Thu, Feb 3, 2011 at 6:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>>
>> If you want to randomly pick 10,000 rows out of all the rows that are
>> going to be inserted in the table without knowing in advance how many
>> there will be, how do you do that?
>>
>
> Reservoir sampling, as the most well-known option:
> http://en.wikipedia.org/wiki/Reservoir_sampling
>
> --
> - David T. Wilson
> david(dot)t(dot)wilson(at)gmail(dot)com
>


From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 07:08:59
Message-ID: AANLkTimko0CviRbwkvB=XXci4h7FU8ue7iKPnhM645jX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

2011/2/3 <david(at)lang(dot)hm>

>
> If the table is too large to fit in ram, then it will compete for
> I/O, and the user will have to wait.
>
> what I'm proposing is that as the records are created, the process doing
> the creation makes copies of the records (either all of them, or some of
> them if not all are needed for the analysis, possibly via sharing memory
> with the analysis process), this would be synchronous with the load, not
> asynchronous.
>
> this would take zero I/O bandwidth, it would take up some ram, memory
> bandwidth, and cpu time, but a load of a large table like this is I/O
> constrained.
>
> it would not make sense for this to be the default, but as an option it
> should save a significant amount of time.
>
> I am making the assumption that an Analyze run only has to go over the data
> once (a sequential scan of the table if it's >> ram for example) and gathers
> stats as it goes.
>
> with the current code, this is a completely separate process that knows
> nothing about the load, so if you kick it off when you start the load, it
> makes a pass over the table (competing for I/O), finishes, you continue to
> update the table, so it makes another pass, etc. As you say, this is a bad
> thing to do. I am saying to have an option that ties the two together,
> essentially making the data feed into the Analyze run be a fork of the data
> coming out of the insert run going to disk. So the Analyze run doesn't do
> any I/O and isn't going to complete until the insert is complete. At which
> time it will have seen one copy of the entire table.
>

Actually those are two different problems. One is to make analyze more
automatic, so that a select right after an insert is more clever because
statistics have been provided to it.
The other is to make it take fewer IO resources.
I don't like it being embedded into insert (unless the threshold can be
determined before the insert starts), simply because it is more
CPU/memory work that will slow down each insert. And if you add a knob
that is disabled by default, this will be no better than a manual
analyze.

--
Best regards,
Vitalii Tymchyshyn


From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 07:24:20
Message-ID: AANLkTin-SkbdjBqu=yNaO+b-b53Dxf3DB8ax4gyOiT=S@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

2011/2/4 Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>

> Josh Berkus wrote:
>
>> However, since this system wasn't directly compatible with Oracle Hints,
>> folks pushing for hints dropped the solution as unsatisfactory. This is
>> the discussion we have every time: the users who want hints specifically
>> want hints which work exactly like Oracle's, and aren't interested in a
>> system designed for PostgreSQL. It's gotten very boring; it's like the
>> requests to support MySQL-only syntax.
>>
>>
> Actually, I don't want Oracle hints. Oracle hints are ugly and cumbersome.
> I would prefer something like this:
>
>
> http://dev.mysql.com/doc/refman/5.0/en/index-hints.html
>

As far as I can see, this would have to be embedded into the query, would
it not? You can achieve something like this by setting variables right
before the query (usually even in the same call, by embedding multiple
statements into the execute-query call).
E.g. "SET random_page_cost = 1; SELECT something that needs the index;
SET random_page_cost TO DEFAULT;". Yes, this is as ugly as a hack may
look, and it can't be used on a per-table basis in a complex statement,
but you have it.

--
Best regards,
Vitalii Tymchyshyn


From: david(at)lang(dot)hm
To: Віталій Тимчишин <tivv00(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 07:32:47
Message-ID: alpine.DEB.2.00.1102032327430.8162@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, 4 Feb 2011, Віталій Тимчишин wrote:

> 2011/2/3 <david(at)lang(dot)hm>
>
>>
>> If the table is too large to fit in ram, then it will compete for
>> I/O, and the user will have to wait.
>>
>> what I'm proposing is that as the records are created, the process doing
>> the creation makes copies of the records (either all of them, or some of
>> them if not all are needed for the analysis, possibly via sharing memory
>> with the analysis process), this would be synchronous with the load, not
>> asynchronous.
>>
>> this would take zero I/O bandwidth, it would take up some ram, memory
>> bandwidth, and cpu time, but a load of a large table like this is I/O
>> constrained.
>>
>> it would not make sense for this to be the default, but as an option it
>> should save a significant amount of time.
>>
>> I am making the assumption that an Analyze run only has to go over the data
>> once (a sequential scan of the table if it's >> ram for example) and gathers
>> stats as it goes.
>>
>> with the current code, this is a completely separate process that knows
>> nothing about the load, so if you kick it off when you start the load, it
>> makes a pass over the table (competing for I/O), finishes, you continue to
>> update the table, so it makes another pass, etc. As you say, this is a bad
>> thing to do. I am saying to have an option that ties the two together,
>> essentially making the data feed into the Analyze run be a fork of the data
>> coming out of the insert run going to disk. So the Analyze run doesn't do
>> any I/O and isn't going to complete until the insert is complete. At which
>> time it will have seen one copy of the entire table.
>>
> Actually those are two different problems. One is to make analyze more
> automatic, so that a select right after an insert is more clever because
> statistics have been provided to it.
> The other is to make it take fewer IO resources.
> I don't like it being embedded into insert (unless the threshold can be
> determined before the insert starts), simply because it is more
> CPU/memory work that will slow down each insert. And if you add a knob
> that is disabled by default, this will be no better than a manual analyze.

if it can happen during the copy instead of being a step after the copy it
will speed things up. things like the existing parallel restore could use
this instead of needing a separate pass. so I don't think that having to
turn it on manually makes it useless, any more than the fact that you have
to explicitly disable fsync makes that disabling feature useless (and the
two features would be likely to be used together)

when a copy command is issued, I assume that there is some indication of
how much data is going to follow. I know that it's not just 'insert
everything until the TCP connection terminates' because that would give
you no way of knowing if the copy got everything in or was interrupted
part way through. think about what happens with ftp if the connection
drops, you get a partial file 'successfully' as there is no size provided,
but with HTTP you get a known-bad transfer that you can abort or resume.

David Lang


From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 07:39:38
Message-ID: AANLkTimju9TDj_YNaFMJzRMDQtmYGC1=mRsci7Mo+yo7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 4 February 2011 09:32, <david(at)lang(dot)hm> wrote:

>
>
> when a copy command is issued, I assume that there is some indication of
> how much data is going to follow. I know that it's not just 'insert
> everything until the TCP connection terminates' because that would give you
> no way of knowing if the copy got everything in or was interrupted part way
> through. think about what happens with ftp if the connection drops, you get
> a partial file 'successfully' as there is no size provided, but with HTTP
> you get a known-bad transfer that you can abort or resume.
>

I don't think so, since you can do 'cat my_large_copy.sql | psql'. AFAIR it
simply looks for an end-of-data marker, either in the protocol or in the
stream itself (run COPY FROM STDIN in psql and it will tell you what the
marker is; in the text format it is a line containing just "\.").

--
Best regards,
Vitalii Tymchyshyn


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: david(at)lang(dot)hm
Cc: Віталій Тимчишин <tivv00(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 07:59:06
Message-ID: 4D4BB1CA.6020400@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/04/2011 02:32 AM, david(at)lang(dot)hm wrote:
>
> when a copy command is issued, I assume that there is some indication
> of how much data is going to follow.
>
>

No, of course there isn't. How would we do that with a stream like STDIN?
Read the code.

cheers

andrew


From: Torsten Zühlsdorff <foo(at)meisterderspiele(dot)de>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 08:43:23
Message-ID: iige7c$ctl$1@news.eternal-september.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala schrieb:

> Well, the problem will not go away. As I've said before, all other
> databases have that feature and none of the reasons listed here
> convinced me that everybody else has a crappy optimizer. The problem
> may go away altogether if people stop using PostgreSQL.

A common problem among programmers is that they want a solution they
already know for a problem they already know, even if it is the worst
solution they could choose.

There are so many possibilities for solving a given problem, and you even
have time to consider them before your application gets released.

Also: if you rely so heavily on hints, then use a database which
supports hints. A basic mantra in every training I have given is: use
the tool/technique/people that fit the needs of the project best.
There are many databases out there - choose for every project the one
which fits best!

Greetings from Germany,
Torsten
--
http://www.dddbl.de - a database layer that abstracts the work with 8
different database systems,
separates queries from applications, and can evaluate query results
automatically.


From: Grant Johnson <grant(at)amadensor(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 13:05:33
Message-ID: 4D4BF99D.7050901@amadensor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> Yes. And this has little to do with hints. It has to do with years
> of development lead with THOUSANDS of engineers who can work on the
> most esoteric corner cases in their spare time. Find the pg project a
> couple hundred software engineers and maybe we'll catch Oracle a
> little quicker. Otherwise we'll have to marshall our resources to do
> the best we can on the project ,and that means avoiding maintenance
> black holes and having the devs work on the things that give the most
> benefit for the cost. Hints are something only a tiny percentage of
> users could actually use and use well.
>
> Write a check, hire some developers and get the code done and present
> it to the community. If it's good and works it'll likely get
> accepted. Or use EDB, since it has oracle compatibility in it.
>
I have to disagree with you here. I have never seen Oracle outperform
PostgreSQL on complex joins, which is where the planner comes in.
Perhaps on certain throughput things, but that is likely due to how we
handle dead rows (counts certainly are), and the easier maintenance
makes up for those. Also, both of those differ by only a small
percentage.

I have many times had Oracle queries that never finish (OK maybe not
never, but not over a long weekend) on large hardware, but can be
finished on PostgreSQL in a matter of minutes on cheap hardware. This
happens to the point that often I have set up a PostgreSQL database to
copy the data to for querying and running the complex reports, even
though the origin of the data was Oracle, since the application was
Oracle specific. It took less time to duplicate the database and run
the query on PostgreSQL than it did to just run it on Oracle.


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: david(at)lang(dot)hm
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 14:33:15
Message-ID: 20110204143315.GA1261@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 03, 2011 at 04:39:12PM -0800, david(at)lang(dot)hm wrote:
> On Thu, 3 Feb 2011, Robert Haas wrote:
>
>> On Thu, Feb 3, 2011 at 3:54 PM, <david(at)lang(dot)hm> wrote:
>>> with the current code, this is a completely separate process that knows
>>> nothing about the load, so if you kick it off when you start the load, it
>>> makes a pass over the table (competing for I/O), finishes, you continue
>>> to
>>> update the table, so it makes another pass, etc. As you say, this is a
>>> bad
>>> thing to do. I am saying to have an option that ties the two together,
>>> essentially making the data feed into the Analyze run be a fork of the
>>> data
>>> coming out of the insert run going to disk. So the Analyze run doesn't
>>> do
>>> any I/O and isn't going to complete until the insert is complete. At
>>> which
>>> time it will have seen one copy of the entire table.
>>
>> Yeah, but you'll be passing the entire table through this separate
>> process that may only need to see 1% of it or less on a large table.
>> If you want to write the code and prove it's better than what we have
>> now, or some other approach that someone else may implement in the
>> meantime, hey, this is an open source project, and I like improvements
>> as much as the next guy. But my prediction for what it's worth is
>> that the results will suck. :-)
>
> I will point out that 1% of a very large table can still be a lot of disk
> I/O that is avoided (especially if it's random I/O that's avoided)
>
> David Lang
>

In addition, the streaming ANALYZE can provide better statistics at
any time during the load, and it will be complete the moment the load
is. As far as passing the entire table through the ANALYZE process
goes, a simple counter can be used to send only the required samples,
based on the statistics target. Where this would seem to help the most
is with temporary tables, which currently do not work with autovacuum;
it would streamline their use for more complicated queries that need
an analyze to perform well.
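
To make the counter idea concrete, a rough Python sketch of the
tee-plus-thinning approach (purely illustrative: the function names and
the thinning policy are invented here, and the real thing would live in
the backend):

    import random

    def load_and_sample(rows, write_row, k=10000, offer_every=100):
        """Write every row, but offer only every Nth row to a reservoir."""
        sample, offered = [], 0
        for n, row in enumerate(rows):
            write_row(row)          # the normal load path, unchanged
            if n % offer_every:     # counter-based thinning to limit CPU cost
                continue
            if offered < k:
                sample.append(row)
            else:
                j = random.randint(0, offered)
                if j < k:
                    sample[j] = row
            offered += 1
        return sample

One caveat: a fixed stride can bias the sample if the input has periodic
structure, so a production version would want to randomize the stride.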

Regards,
Ken


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: david(at)lang(dot)hm, Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 14:38:30
Message-ID: 4D4C0F66.1080507@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 04.02.11 16:33, Kenneth Marshall wrote:
>
> In addition, the streaming ANALYZE can provide better statistics at
> any time during the load, and it will be complete the moment the load
> is. As far as passing the entire table through the ANALYZE process
> goes, a simple counter can be used to send only the required samples,
> based on the statistics target. Where this would seem to help the most
> is with temporary tables, which currently do not work with autovacuum;
> it would streamline their use for more complicated queries that need
> an analyze to perform well.
>
Actually, for me the main "con" with streaming analyze is that it adds a
significant CPU burden to an already not-too-fast load process, especially
if it's done automatically for any load operation performed (and I can't
see how it could be enabled only past some threshold).
And you can't start it after some threshold of data has passed by, since
you may lose significant information (like minimal values).

Best regards, Vitalii Tymchyshyn


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 14:52:20
Message-ID: 20110204145220.GC1261@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 03, 2011 at 09:05:52PM -0500, Robert Haas wrote:
> On Thu, Feb 3, 2011 at 8:37 PM, <david(at)lang(dot)hm> wrote:
> > On Thu, 3 Feb 2011, Robert Haas wrote:
> >
> >> On Thu, Feb 3, 2011 at 7:39 PM, <david(at)lang(dot)hm> wrote:
> >>>>
> >>>> Yeah, but you'll be passing the entire table through this separate
> >>>> process that may only need to see 1% of it or less on a large table.
> >>>> If you want to write the code and prove it's better than what we have
> >>>> now, or some other approach that someone else may implement in the
> >>>> meantime, hey, this is an open source project, and I like improvements
> >>>> as much as the next guy. But my prediction for what it's worth is
> >>>> that the results will suck. :-)
> >>>
> >>> I will point out that 1% of a very large table can still be a lot of disk
> >>> I/O that is avoided (especially if it's random I/O that's avoided)
> >>
> >> Sure, but I think that trying to avoid it will be costly in other ways
> >> - you'll be streaming a huge volume of data through some auxiliary
> >> process, which will have to apply some algorithm that's very different
> >> from the one we use today. The reality is that I think there's little
> >> evidence that the way we do ANALYZE now is too expensive. It's
> >> typically very cheap and works very well. It's a bit annoying when it
> >> fires off in the middle of a giant data load, so we might need to
> >> change the time of it a little, but if there's a problem with the
> >> operation itself being too costly, this is the first I'm hearing of
> >> it. We've actually worked *really* hard to make it cheap.
> >
> > I could be misunderstanding things here, but my understanding is that it's
> > 'cheap' in that it has little impact on the database while it is running.
>
> I mean that it's cheap in that it usually takes very little time to complete.
>
> > the issue here is that the workflow is
> >
> > load data
> > analyze
> > start work
> >
> > so the cost of analyze in this workflow is not "1% impact on query speed for
> > the next X time", it's "the database can't be used for the next X time while
> > we wait for analyze to finish running"
>
> OK.
>
> > I don't understand why the algorithm would have to be so different from
> > what's done today; surely the analyze thread could easily be tweaked to
> > ignore the rest of the data (assuming we don't have the thread sending the
> > data to analyze do the filtering)
>
> If you want to randomly pick 10,000 rows out of all the rows that are
> going to be inserted in the table without knowing in advance how many
> there will be, how do you do that? Maybe there's an algorithm, but
> it's not obvious to me. But mostly, I question how expensive it is to
> have a second process looking at the entire table contents vs. going
> back and rereading a sample of rows at the end. I can't remember
> anyone ever complaining "ANALYZE took too long to run". I only
> remember complaints of the form "I had to remember to manually run it
> and I wish it had just happened by itself".
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Probably doomed to be shot down, but since you are effectively inline,
you could sample assuming a range of table row counts: start at the
size of table where random (index) lookups are faster than a sequential
scan, then keep samples at appropriate multiples of that (100x,
100*100x, ...); from those you should be able to generate appropriate
statistics. I have not actually looked at how that would happen, but
it would certainly allow you to process far, far fewer rows than the
entire table.
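
For what it's worth, the question above -- how to pick a uniform random
sample of fixed size from a stream whose length isn't known in advance --
has a textbook answer: reservoir sampling. A rough sketch in PL/pgSQL,
with generate_series standing in for the incoming rows (an illustration
of the algorithm only, not proposed code):

    CREATE OR REPLACE FUNCTION reservoir_sample_demo(sample_size int)
    RETURNS SETOF bigint AS $$
    DECLARE
        reservoir bigint[] := '{}';
        n bigint := 0;    -- rows seen so far
        v bigint;
        j int;
    BEGIN
        FOR v IN SELECT g FROM generate_series(1, 1000000) g LOOP
            n := n + 1;
            IF n <= sample_size THEN
                reservoir := reservoir || v;        -- fill the reservoir
            ELSE
                j := 1 + floor(random() * n)::int;  -- uniform in 1..n
                IF j <= sample_size THEN
                    reservoir[j] := v;  -- replace with probability k/n
                END IF;
            END IF;
        END LOOP;
        RETURN QUERY SELECT unnest(reservoir);
    END;
    $$ LANGUAGE plpgsql;

Every row still has to pass through the sampler, which is exactly the
CPU-burden objection raised earlier in the thread, but the memory needed
is only proportional to the sample size.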

Regards,
Ken


From: Nick Lello <nick(dot)lello(at)rentrakmail(dot)com>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 14:55:01
Message-ID: AANLkTikK86APLfOTdxekudXgK3DNPusbAQZMFfDXoas_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Informix IDS supports hints as well; normally the only need for hints in
this engine is when the Table/Index statistics are not being updated on a
regular basis (i.e., a lazy DBA).

On 3 February 2011 22:17, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> On 04/02/11 11:08, Josh Berkus wrote:
>
>> I don't think that's actually accurate. Can you give me a list of
>> DBMSes which support hints other than Oracle?
>
> DB2 LUW (Linux, Unix, Win32 code base) has hint profiles:
>
>
> http://justdb2chatter.blogspot.com/2008/06/db2-hints-optimizer-selection.html
>

--

Nick Lello | Web Architect
o +44 (0) 8433309374 | m +44 (0) 7917 138319
Email: nick.lello at rentrak.com
RENTRAK | www.rentrak.com | NASDAQ: RENT


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Conor Walsh <ctw(at)adverb(dot)ly>, jd(at)commandprompt(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does auto-analyze work on dirty writes? (was: Re: [HACKERS] Slow count(*) again...)
Date: 2011-02-04 15:41:04
Message-ID: 22836.1296834064@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> My understanding is:

> 1) Background daemon wakes up and checks whether a number of changes
> have happened to the database, irrelevant of transaction boundaries.

> 2) Background daemon analyzes a percentage of rows in the database for
> statistical data, irrelevant of row visibility.

> 3) Analyze is important for both visible rows and invisible rows, as
> plan execution is impacted by invisible rows. As long as they are part
> of the table, they may impact the queries performed against the table.

> 4) It doesn't matter if the invisible rows are invisible because they
> are not yet committed, or because they are not yet vacuumed.

> Would somebody in the know please confirm the above understanding for my
> own piece of mind?

No.

1. Autovacuum fires when the stats collector's insert/update/delete
counts have reached appropriate thresholds. Those counts are
accumulated from messages sent by backends at transaction commit or
rollback, so they take no account of what's been done by transactions
still in progress.

2. Only live rows are included in the stats computed by ANALYZE.
(IIRC it uses SnapshotNow to decide whether rows are live.)

Although the stats collector does track an estimate of the number of
dead rows for the benefit of autovacuum, this isn't used by planning.
Table bloat is accounted for only in terms of growth of the physical
size of the table in blocks.
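
Both sets of numbers are easy to look at on a live system, if anyone
wants to check this (a sketch; 'accounts' is an invented table name):

    -- the stats collector's live/dead-row estimates (drive autovacuum)
    SELECT relname, n_live_tup, n_dead_tup
      FROM pg_stat_user_tables
     WHERE relname = 'accounts';

    -- what planning actually sees: physical size in blocks, plus the
    -- row count recorded by the last VACUUM/ANALYZE
    SELECT relpages, reltuples
      FROM pg_class
     WHERE relname = 'accounts';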

regards, tom lane


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Grant Johnson <grant(at)amadensor(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 16:05:32
Message-ID: AANLkTinDJkF8J9-W64JCMc1Q+q_xfCFw1aMT2ZdLgs4B@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 4, 2011 at 6:05 AM, Grant Johnson <grant(at)amadensor(dot)com> wrote:
>
>> Yes.  And this has little to do with hints.  It has to do with years
>> of development lead with THOUSANDS of engineers who can work on the
>> most esoteric corner cases in their spare time.  Find the pg project a
>> couple hundred software engineers and maybe we'll catch Oracle a
>> little quicker.  Otherwise we'll have to marshal our resources to do
>> the best we can on the project, and that means avoiding maintenance
>> black holes and having the devs work on the things that give the most
>> benefit for the cost.  Hints are something only a tiny percentage of
>> users could actually use and use well.
>>
>> Write a check, hire some developers and get the code done and present
>> it to the community.  If it's good and works it'll likely get
>> accepted.  Or use EDB, since it has oracle compatibility in it.
>>
> I have to disagree with you here.  I have never seen Oracle outperform
> PostgreSQL on complex joins, which is where the planner comes in.  Perhaps
> on certain throughput things, but that is likely due to how we handle dead
> rows, and counts, which are slow for the same reason; the easier
> maintenance makes up for those.  Also, both of those differ by only a
> small percentage.
>
> I have many times had Oracle queries that never finish (OK maybe not never,
> but not over a long weekend) on large hardware, but can be finished on
> PostgreSQL in a matter of minutes on cheap hardware.  This happens to the
> point that often I have set up a PostgreSQL database to copy the data to for
> querying and running the complex reports, even though the origin of the data
> was Oracle, since the application was Oracle specific.  It took less time
> to duplicate the database and run the query on PostgreSQL than it did to
> just run it on Oracle.

It very much depends on the query. With lots of tables to join, and
with pg 8.1, which is what I used when we were running Oracle 9, Oracle
won. With fewer tables to join in an otherwise complex reporting
query, PostgreSQL won. I did the exact thing you're talking about: I
actually wrote a simple replication system from Oracle to PostgreSQL
(it was allowed to be imperfect because it was stats data and we could
recreate it at a moment's notice).

PostgreSQL on a PIV workstation with 2G ram and 4 SATA drives in
RAID-10 stomped Oracle on much bigger Sun hardware into the ground for
reporting queries. Queries that ran for hours or didn't finish in
Oracle ran in 5 to 30 minutes on the pg box.

But not all queries were like that.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, david(at)lang(dot)hm, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 18:48:32
Message-ID: AANLkTik1OYXBvXryop=SiRg_54_7vwmAZMaYTK9Jt7_L@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 4, 2011 at 9:38 AM, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com> wrote:
> Actually, for me the main "con" with streaming ANALYZE is that it adds
> a significant CPU burden to an already not-too-fast load process.

Exactly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "david(at)lang(dot)hm" <david(at)lang(dot)hm>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 22:58:17
Message-ID: 201102042258.p14MwIV28696@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Greg Smith wrote:
> Check out
> http://www.indeed.com/jobtrends?q=postgres%2C+mysql%2C+oracle&relative=1&relative=1
> if you want to see the real story here. Oracle has a large installed
> base, but it's considered a troublesome legacy product being replaced

+1 for Oracle being a "troublesome legacy product".

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-04 23:17:12
Message-ID: 201102042317.p14NHCK12314@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> Chris Browne wrote:
> > Well, the community declines to add hints until there is actual
> > consensus on a good way to add hints.
> >
> OK. That's another matter entirely. Who should make that decision? Is
> there a committee or a person who would be capable of making that decision?
>
> > Nobody has ever proposed a way to add hints where consensus was arrived
> > at that the way was good, so...
> >
>
> So, I will have to go back on my decision to use Postgres and
> re-consider MySQL? I will rather throw away the effort invested in

You want to reconsider using MySQL because Postgres doesn't have hints.
Hard to see how that logic works.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-05 00:17:10
Message-ID: 201102050017.p150HAm22228@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Mladen Gogala wrote:
> Actually, it is not unlike a religious dogma, only stating that "hints
> are bad". It even says so in the wiki. The arguments are
> 1) Refusal to implement hints is motivated by distrust toward users,
> citing that some people may mess things up.
> Yes, they can, with and without hints.
> 2) All other databases have them. This is a major feature and if I were
> in the MySQL camp, I would use it as an
> argument. Asking me for some "proof" is missing the point. All other
> databases have hints precisely because
> they are useful. The assertion that only Postgres is so smart that it can
> operate without hints doesn't match the
> reality. As a matter of fact, Oracle RDBMS on the same machine will
> regularly beat PgSQL in performance.
> That has been my experience so far. I even posted counting query
> results.
> 3) Hints are a "make it or break it" feature. They're absolutely needed in
> the fire extinguishing situations.
>
> I see no arguments to say otherwise and until that ridiculous "we don't
> want hints" dogma is on wiki, this is precisely what it is: a dogma.

Uh, that is kind of funny considering that text is on a 'wiki', meaning
everything there is open to change if the group agrees.

> Dogmas do not change and I am sorry that you don't see it that way.
> However, this discussion
> did convince me that I need to take another look at MySQL and tone down
> my engagement with PostgreSQL community. This is my last post on the
> subject because posts are becoming increasingly personal. This level of
> irritation is also
> characteristic of a religious community chastising a sinner. Let me
> remind you again: all other major databases have that possibility:
> Oracle, MySQL, DB2, SQL Server and Informix. Requiring burden of proof
> about hints is equivalent to saying that all these databases are
> developed by idiots and have a crappy optimizer.

You need to state the case for hints independent of what other databases
do, and independent of fixing the problems where the optimizer doesn't
match reality.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Conor Walsh <ctw(at)adverb(dot)ly>, jd(at)commandprompt(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does auto-analyze work on dirty writes?
Date: 2011-02-05 01:50:13
Message-ID: 4D4CACD5.7060104@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/04/2011 10:41 AM, Tom Lane wrote:
> 1. Autovacuum fires when the stats collector's insert/update/delete
> counts have reached appropriate thresholds. Those counts are
> accumulated from messages sent by backends at transaction commit or
> rollback, so they take no account of what's been done by transactions
> still in progress.
>
> 2. Only live rows are included in the stats computed by ANALYZE.
> (IIRC it uses SnapshotNow to decide whether rows are live.)
>
> Although the stats collector does track an estimate of the number of
> dead rows for the benefit of autovacuum, this isn't used by planning.
> Table bloat is accounted for only in terms of growth of the physical
> size of the table in blocks.

Thanks, Tom.

Does this un-analyzed "bloat" not impact queries? I guess the worst case
here is if autovacuum is disabled for some reason and 99% of the table
is dead rows. If I understand the above correctly, I think analyze might
generate a bad plan under this scenario, thinking that a value is
unique, using the index - but every tuple in the index has the same
value and each has to be looked up in the table to see if it is visible?
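
A contrived way to produce that worst case, for anyone who wants to
watch what the planner does (table name invented; the reloption syntax
is from 8.3 and later):

    CREATE TABLE one_row (id int PRIMARY KEY, payload text);
    ALTER TABLE one_row SET (autovacuum_enabled = false);
    INSERT INTO one_row VALUES (1, 'x');
    -- each update of the indexed column leaves a dead tuple behind in
    -- both heap and index; repeat a few thousand times and one live
    -- row is hiding behind thousands of dead ones
    UPDATE one_row SET id = id + 1;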

Still, I guess the idea here is not to disable autovacuum, making dead
rows insignificant in the grand scheme of things. I haven't specifically
noticed any performance problems here - PostgreSQL is working great for
me as usual. Just curiosity...

Cheers,
mark

--
Mark Mielke<mark(at)mielke(dot)cc>


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Maciek Sakrejda <msakrejda(at)truviso(dot)com>, "sthomas(at)peak6(dot)com" <sthomas(at)peak6(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-05 04:45:05
Message-ID: AANLkTik6rEiwi8+76=P__Pa9mwBAAKqMQxS=yx6Z=MSK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 4, 2011 at 5:17 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Mladen Gogala wrote:
>> characteristic of a religious community chastising a sinner. Let me
>> remind you again: all other major databases have that possibility:
>> Oracle, MySQL, DB2, SQL Server and Informix. Requiring burden of proof
>> about hints is equivalent to saying that all these databases are
>> developed by idiots and have a crappy optimizer.
>
> You need to state the case for hints independent of what other databases
> do, and independent of fixing the problems where the optimizer doesn't
> match reality.

And that kind of limits it to an area where we would want the ability
to nudge costs instead of just setting them for an individual part of
a query, e.g. join b on (a.a=b.b set selectivity=0.01) or (a.a=b.b set
selectivity=1.0) or something like that -- i.e. a.a and b.b have a lot
of matches or few, etc. If there's any thought of hinting, it should
be something that a DBA, knowing his data model well, WILL know more
about than the current planner, because the planner can't get
cross-table statistics yet.

But then, why not do something to allow cross-table indexes and/or
statistics? To me that would go much further toward helping fix the
issues where the current planner "flies blind".

--
To understand recursion, one must first understand recursion.


From: david(at)lang(dot)hm
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-05 05:46:30
Message-ID: alpine.DEB.2.00.1102042142571.8162@asgard.lang.hm
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, 4 Feb 2011, Vitalii Tymchyshyn wrote:

> On 04.02.11 16:33, Kenneth Marshall wrote:
>>
>> In addition, a streaming ANALYZE can provide better statistics at
>> any time during the load, and the statistics are complete the moment
>> the load finishes. As for passing the entire table through the
>> ANALYZE process, a simple counter can be used to send only the
>> samples required by the statistics target. Where this would seem to
>> help most is with temporary tables, which currently do not work with
>> autovacuum; it would streamline their use in more complicated
>> queries that need an ANALYZE to perform well.
>>
> Actually, for me the main "con" with streaming ANALYZE is that it adds a
> significant CPU burden to an already not-too-fast load process, especially
> if it's done automatically for every load operation (and I can't see how
> it could be enabled only above some threshold).

two thoughts

1. if it's a large enough load, isn't it I/O bound?

2. this could be done in a separate process or thread from the load
itself; that way the overhead on the load is just copying the data in
memory to the other process.

with a multi-threaded load, this would eat up some cpu that could be used
for the load, but cores per chip are still climbing rapidly, so I expect
that it's still pretty easy to end up with enough CPU to handle the extra
load.

David Lang

> And you can't start it after some threshold of data has passed by, since
> you may lose significant information (like minimal values).
>
> Best regards, Vitalii Tymchyshyn
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: david(at)lang(dot)hm
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-05 06:37:49
Message-ID: AANLkTi=EkCUkQs9VTtW4aOBNNNV0gi8jYZhNgdHuFTKM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Sat, Feb 5, 2011 at 12:46 AM, <david(at)lang(dot)hm> wrote:
>> Actually, for me the main "con" with streaming ANALYZE is that it adds a
>> significant CPU burden to an already not-too-fast load process, especially
>> if it's done automatically for every load operation (and I can't see how
>> it could be enabled only above some threshold).
>
> two thoughts
>
> 1. if it's a large enough load, isn't it I/O bound?

Sometimes. Our COPY is not as cheap as we'd like it to be.

> 2. this could be done in a separate process or thread from the load itself;
> that way the overhead on the load is just copying the data in memory to the
> other process.

I think that's more expensive than you're giving it credit for.

But by all means implement it and post the patch if it works out...!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-05 08:38:40
Message-ID: AANLkTikKTc2GPm87Liztk3UTdRY+_THquaaW3qoMe2+d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 4, 2011 at 11:37 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, Feb 5, 2011 at 12:46 AM,  <david(at)lang(dot)hm> wrote:
>>> Actually, for me the main "con" with streaming ANALYZE is that it adds a
>>> significant CPU burden to an already not-too-fast load process, especially
>>> if it's done automatically for every load operation (and I can't see how
>>> it could be enabled only above some threshold).
>>
>> two thoughts
>>
>> 1. if it's a large enough load, isn't it I/O bound?
>
> Sometimes.  Our COPY is not as cheap as we'd like it to be.

With a 24 drive RAID-10 array that can read at ~1GB/s I am almost
always CPU bound during copies. This isn't wholly bad as it leaves
spare IO for the rest of the machine so regular work carries on just
fine.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, david(at)lang(dot)hm, Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Kenneth Marshall <ktm(at)rice(dot)edu>, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-05 08:49:05
Message-ID: 4D4D0F01.6070407@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Scott Marlowe wrote:
> With a 24 drive RAID-10 array that can read at ~1GB/s I am almost
> always CPU bound during copies. This isn't wholly bad as it leaves
> spare IO for the rest of the machine so regular work carries on just
> fine.
>

And you don't need nearly that much I/O bandwidth to reach that point.
I've hit being CPU bound on COPY...FROM on systems with far fewer drives
than 24.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Gorshkov <gorshkovlists(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Slow count(*) again...
Date: 2011-02-10 01:58:12
Message-ID: 4D534634.7090902@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2011-02-03 22:48, Scott Marlowe wrote:
> On Thu, Feb 3, 2011 at 8:40 PM, Greg Smith<greg(at)2ndquadrant(dot)com> wrote:
>> Scott Marlowe wrote:
>>>
>>> Yes they're useful, but like a plastic bad covering a broken car window,
>>> they're useful because they cover something that's inherently broken.
>>>
>>
>> Awesome. Now we have a car analogy, with a funny typo no less. "Plastic
>> bad", I love it. This is real progress toward getting all the common list
>> argument idioms aired out. All we need now is a homage to Mike Godwin and
>> we can close this down.
>
> It's not so much a car analogy as a plastic bad analogy.
>

Don't be such an analogy Nazi.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 15:50:40
Message-ID: AANLkTi=mgusFAd-iU7w0G+_R3MJbJxeSFeU73HiB7b8i@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 3, 2011 at 8:46 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> "Optimizer hints are used to work around problems in the optimizer and
> introduce upgrade and maintenance issues. We would rather have the
> problems reported and fixed. We have discussed a more sophisticated
> system of per-class cost adjustment instead, but a specification remains
> to be developed."
>
> That seems pretty straightforwards.  There are even links to prior
> discussions about what kind of system would work.  I don't think this
> text needs any adjustment; that's our clear consensus on the hint issue:
> we want a tool which works better than what we've seen in other databases.

I think it's just dumb to say we don't want hints. We want hints, or
at least many of us do. We just want them to actually work, and to
not suck. Can't we just stop saying we don't want them, and say that
we do want something, but it has to be really good?

> Yes, I occasionally run across cases where having a query tweaking
> system would help me fix a pathological failure in the planner.
> However, even on data warehouses that's less than 0.1% of the queries I
> deal with, so this isn't exactly a common event.  And any hinting system
> we develop needs to address those specific cases, NOT a hypothetical
> case which can't be tested.  Otherwise we'll implement hints which
> actually don't improve queries.

No argument.

The bottom line here is that a lot of features that we don't have are
things that we don't want in the sense that we're not interested in
working on them over other things that seem more pressing, and we have
finite manpower. But if someone feels motivated to work on it, and
can actually come up with something good, then why should we give the
impression that such a thing would be rejected out of hand? I think
we ought to nuke that item and replace it with some items in the
optimizer section that express what we DO want, which is some better
ways of fixing the few queries that suck despite our best (and
very successful) efforts to produce a top-notch optimizer.

The problem with multi-column statistics is a particularly good
example of something in this class. We may have a great solution to
that problem for PostgreSQL 11.0. But between now and then, if you
have that problem, there is no good way to adjust the selectivity
estimates. If this were an academic research project or just being
used for toy projects that didn't really matter, we might not care.
But this is a real database that people are relying on for their
livelihood, and we should be willing to provide a way for those people
to not get fired when they hit the 0.1% of queries that can't be fixed
using existing methods. I don't know exactly what the right solution
is off the top of my head, but digging in our heels is not it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 16:45:20
Message-ID: 4D53C1C0020000250003A783@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think it's just dumb to say we don't want hints. We want hints,
> or at least many of us do.

Well, yeah. Even those most ostensibly opposed to hints have been
known to post that they would rather not have the optimizer
recognize two logically equivalent constructs and optimize them the
same because they find the current difference "useful to coerce the
optimizer" to choose a certain plan. That's implementing hints but
refusing to document them. And it sometimes bites those who don't
know they're accidentally using a hint construct. An explicit and
documented hint construct would be better. Probably not a "use this
plan" type hint, but some form of optimization barrier hint, maybe.
You know, like OFFSET 0, but more explicitly hint-like.
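
For anyone who hasn't run into that particular trick: OFFSET 0 in a
subquery keeps the planner from flattening the subquery into the outer
query, so it acts as an undocumented optimization barrier. A made-up
example (expensive_predicate is hypothetical):

    SELECT *
      FROM (SELECT * FROM orders
             WHERE status = 'open'
            OFFSET 0) o   -- fence: subquery is not pulled up
     WHERE expensive_predicate(o.payload);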

> The bottom line here is that a lot of features that we don't have
> are things that we don't want in the sense that we're not
> interested in working on them over other things that seem more
> pressing, and we have finite manpower. But if someone feels
> motivated to work on it, and can actually come up with something
> good, then why should we give the impression that such a thing
> would be rejected out of hand? I think we ought to nuke that item
> and replace it with some items in the optimizer section that
> express what we DO want, which is some better ways of fixing
> queries the few queries that suck despite our best (and very
> successful) efforts to produce a top-notch optimizer.
>
> The problem with multi-column statistics is a particularly good
> example of something in this class. We may have a great solution
> to that problem for PostgreSQL 11.0. But between now and then, if
> you have that problem, there is no good way to adjust the
> selectivity estimates.

Yeah, this is probably the most important area in which to devise
some explicit way for a DBA -- one who knows that such multicolumn
selections are going to be used and is capable of calculating some
correlation factor -- to supply that factor to the optimizer and
override the naive calculation it currently does. Even there I would
tend to think that the sort of "do it this way" hints that people seem
to initially want wouldn't be good; it should be a way to override the
costing factor which the optimizer gets wrong, so it can do its usual
excellent job of evaluating plans with accurate costs.

> I don't know exactly what the right solution is off the top of my
> head, but digging in our heels is not it.

Well, I'm comfortable digging in my heels against doing *lame* hints
just because "it's what all the other kids are doing," which I think
is the only thing which would have satisfied the OP on this thread.
From both on-list posts and ones exchanged off-list with me, it
seems he was stubbornly resistant to properly tuning the server to
see if any problems remained, or posting particular problems to see
how they would be most effectively handled in PostgreSQL. We
obviously can't be drawn into dumb approaches because of
ill-informed demands like that.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:01:42
Message-ID: 17455.1297357302@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I don't know exactly what the right solution is off the top of my
>> head, but digging in our heels is not it.

> Well, I'm comfortable digging in my heels against doing *lame* hints
> just because "it's what all the other kids are doing," which I think
> is the only thing which would have satisfied the OP on this thread.

Right. If someone comes up with a design that avoids the serious
pitfalls of traditional hinting schemes, that'd be great. But I'm
not interested in implementing Oracle-like hints just because Oracle
has them, which I think was basically what the OP wanted. I haven't
seen a hinting scheme that didn't suck (and that includes the aspects
of our own current behavior that are hint-like). I don't say that
there can't be one.

I believe that the FAQ entry is meant to answer people who come along
and say "oh, this is easily solved, just do what $PRODUCT does". The
generic answer to that is "no, it's not that easy". But maybe the FAQ
should be rephrased to be more like "we don't want traditional hints
because of problems X, Y, and Z. If you have an idea that avoids those
problems, let us know."

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:02:58
Message-ID: AANLkTinrzwbAapmtmxr1_8eOXhUcgPWSFCuH5Gt2qkqf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Well, I'm comfortable digging in my heels against doing *lame* hints
> just because "it's what all the other kids are doing," which I think
> is the only thing which would have satisfied the OP on this thread.
> From both on-list posts and ones exchanged off-list with me, it
> seems he was stubbornly resistant to properly tuning the server to
> see if any problems remained, or posting particular problems to see
> how they would be most effectively handled in PostgreSQL.  We
> obviously can't be drawn into dumb approaches because of
> ill-informed demands like that.

Nor was I proposing any such thing. But that doesn't make "we don't
want hints" an accurate statement. Despite the impression that OP
went away with, the real situation is a lot more nuanced than that,
and the statement on the Todo list gives the wrong impression, IMHO.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:06:42
Message-ID: 4D541B22.9080000@peak6.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/10/2011 10:45 AM, Kevin Grittner wrote:

> Even there I would tend to think that the sort of "do it this way"
> hints that people seem to initially want wouldn't be good; it should
> be a way to override the costing factor which the optimizer gets
> wrong, so it can do its usual excellent job of evaluating plans with
> accurate costs.

You know... that's an interesting approach. We already do that with
functions by allowing users to specify the estimated cost, rows
returned, and even override config settings. It's an inexact science at
best, but it might help the optimizer out.
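
For reference, the per-function knobs are already there (real syntax;
the function itself is invented for illustration):

    CREATE FUNCTION lookup_scores(int) RETURNS SETOF numeric AS
        $$ SELECT score FROM scores WHERE player_id = $1 $$
        LANGUAGE sql STABLE
        COST 50                 -- planner's per-call cost estimate
        ROWS 1000               -- expected rows from this SRF
        SET work_mem = '64MB';  -- config override while it runs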

Really... how difficult would it be to add that syntax to the JOIN
statement, for example?

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com


From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:09:18
Message-ID: 4D541BBE.50705@peak6.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/10/2011 11:01 AM, Tom Lane wrote:

> But I'm not interested in implementing Oracle-like hints just because
> Oracle has them, which I think was basically what the OP wanted.

Hilariously, I'm not so sure that's what the OP wanted. Several of us
pointed him to EnterpriseDB and their Oracle-style syntax, and the only
thing he said about that was to use it as further evidence that
PostgreSQL should implement them. I'm very tempted to say he wanted
something for free, and was angry he couldn't get it.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:19:34
Message-ID: AANLkTikM8j8R75G2Ji-r59Oxq86a12HchpVtR28hJ-6i@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 10, 2011 at 12:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I don't know exactly what the right solution is off the top of my
>>> head, but digging in our heels is not it.
>
>> Well, I'm comfortable digging in my heels against doing *lame* hints
>> just because "it's what all the other kids are doing," which I think
>> is the only thing which would have satisfied the OP on this thread.
>
> Right.  If someone comes up with a design that avoids the serious
> pitfalls of traditional hinting schemes, that'd be great.  But I'm
> not interested in implementing Oracle-like hints just because Oracle
> has them, which I think was basically what the OP wanted.  I haven't
> seen a hinting scheme that didn't suck (and that includes the aspects
> of our own current behavior that are hint-like).  I don't say that
> there can't be one.
>
> I believe that the FAQ entry is meant to answer people who come along
> and say "oh, this is easily solved, just do what $PRODUCT does".  The
> generic answer to that is "no, it's not that easy".  But maybe the FAQ
> should be rephrased to be more like "we don't want traditional hints
> because of problems X, Y, and Z.  If you have an idea that avoids those
> problems, let us know."

That's closer to where I think the community is on this issue, for sure.

Frankly, I think we should also have some much better documentation
about how to fix problems in the optimizer. Before the OP went off on
a rant, he actually showed up at a webinar I did looking for advice on
how to fix queries in PG, which wasn't exactly the topic of the
webinar, so he didn't get his answer. But the only way you're going
to find out about a lot of the tricks that we rely on is to read the
mailing lists, and that's below our usual standard of documentation.
Sure, it's a bunch of ugly hacks, but they're useful when you're being
eaten by a crocodile, and the need for them isn't limited to people
who have time to spend all day reading pgsql-whatever.

I also think that we have enough knowledge between us to identify the
areas where some better hints, or hint-ish mechanisms, would actually
be useful. I feel like I have a pretty good idea where the bodies are
buried, and what some of the solutions might look like. But I'm not
sure I want to open that can of worms while we're trying to close out
this CommitFest. In fact I'm pretty sure I don't. But I would like
to change the Todo text to say something less misleading.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <sthomas(at)peak6(dot)com>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:21:51
Message-ID: 4D53CA4F020000250003A789@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Shaun Thomas <sthomas(at)peak6(dot)com> wrote:

> how difficult would it be to add that syntax to the JOIN
> statement, for example?

Something like this syntax?:

JOIN WITH (correlation_factor=0.3)

Where 1.0 might mean that for each value on the left there was only
one distinct value on the right, and 0.0 would mean that they were
entirely independent? (Just as an off-the-cuff example -- I'm not
at all sure that this makes sense, let alone is the best thing to
specify. I'm trying to get at *syntax* here, not particular knobs.)

-Kevin


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints
Date: 2011-02-10 17:25:37
Message-ID: 87tygbyf72.fsf@cbbrowne.afilias-int.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

robertmhaas(at)gmail(dot)com (Robert Haas) writes:
> On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Well, I'm comfortable digging in my heels against doing *lame* hints
>> just because "it's what all the other kids are doing," which I think
>> is the only thing which would have satisfied the OP on this thread.
>> From both on-list posts and ones exchanged off-list with me, it
>> seems he was stubbornly resistant to properly tuning the server to
>> see if any problems remained, or posting particular problems to see
>> how they would be most effectively handled in PostgreSQL.  We
>> obviously can't be drawn into dumb approaches because of
>> ill-informed demands like that.
>
> Nor was I proposing any such thing. But that doesn't make "we don't
> want hints" an accurate statement. Despite the impression that OP
> went away with, the real situation is a lot more nuanced than that,
> and the statement on the Todo list gives the wrong impression, IMHO.

I have added the following comment to the ToDo:

We are not interested in implementing hints in the ways they are
commonly implemented on other databases, and proposals based on
"because they've got them" will not be welcomed. If you have an idea
that avoids the problems that have been observed with other hint
systems, that could lead to valuable discussion.

That seems to me to characterize the nuance.
--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://www3.sympatico.ca/cbbrowne/languages.html
If only women came with pull-down menus and online help.


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:27:18
Message-ID: 4D53CB96020000250003A790@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> maybe the FAQ should be rephrased to be more like "we don't want
>> traditional hints because of problems X, Y, and Z. If you have
>> an idea that avoids those problems, let us know."
>
> That's closer to where I think the community is on this issue

That sounds pretty good to me.

-Kevin


From: Shaun Thomas <sthomas(at)peak6(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:30:46
Message-ID: 4D5420C6.3030608@peak6.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/10/2011 11:21 AM, Kevin Grittner wrote:

> Something like this syntax?:
>
> JOIN WITH (correlation_factor=0.3)

I was thinking more:

JOIN foo_tab USING (foo_id) WITH (COST=50)

or something, to exploit the hooks that already exist for functions, for
example. But it's still an interesting concept. Tell the optimizer what
you want and how the data is really related in cases where it's wrong,
and let it figure out the best path.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas(at)peak6(dot)com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <sthomas(at)peak6(dot)com>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:44:29
Message-ID: 4D53CF9D020000250003A797@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Shaun Thomas <sthomas(at)peak6(dot)com> wrote:

> I was thinking more:
>
> JOIN foo_tab USING (foo_id) WITH (COST=50)

The problem I have with that syntax is that it would be hard to read
when you have some nested set of joins or a (SELECT) in the JOIN
instead of simple table name. For me, at least, it would "get lost"
less easily if it were right next to the JOIN keyword.

The problem with a COST factor is that it's not obvious to me what
it would apply to:
- each row on the left?
- each row on the right?
- each row in the result of the JOIN step?
- the entire step?

How would it scale based on other criteria which affected the number
of rows on either side of the join?

If I'm understanding the problem correctly, the part the optimizer
gets wrong (because we don't yet have statistics to support a better
assumption) is assuming that selection criteria on opposite sides of
a join affect entirely independent sets of what would be in the
result without the criteria. To use an oft-cited example, when one
table is selected by zip code and the other by city, that's (1) a bad
assumption about the correlation, leading to (2) bad estimates,
leading to (3) bad costing, leading to (4) bad plans. The OP wanted
to override step 4, and a COST setting would try to override step 3,
but I think we would want to override step 1 (until we get statistics
which let us compute that accurately).
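
To put rough (invented) numbers on it: in a ten-million-row table where
city = 'Madison' matches 0.1% of rows and zip = '53703' matches 0.05%,
the independence assumption estimates

    10,000,000 * 0.001 * 0.0005 = 5 rows

while, because that zip code lies entirely within that city, the true
result is

    10,000,000 * 0.0005 = 5,000 rows

Three orders of magnitude is easily the difference between a sane plan
and a pathological one.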

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Conor Walsh <ctw(at)adverb(dot)ly>, jd(at)commandprompt(dot)com, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Does auto-analyze work on dirty writes?
Date: 2011-02-10 17:51:31
Message-ID: AANLkTimeTk_reda4x3WwsgOfXN6NO6Be63YyjJ5UnZLz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Feb 4, 2011 at 8:50 PM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
> On 02/04/2011 10:41 AM, Tom Lane wrote:
>>
>> 1. Autovacuum fires when the stats collector's insert/update/delete
>> counts have reached appropriate thresholds.  Those counts are
>> accumulated from messages sent by backends at transaction commit or
>> rollback, so they take no account of what's been done by transactions
>> still in progress.
>>
>> 2. Only live rows are included in the stats computed by ANALYZE.
>> (IIRC it uses SnapshotNow to decide whether rows are live.)
>>
>> Although the stats collector does track an estimate of the number of
>> dead rows for the benefit of autovacuum, this isn't used by planning.
>> Table bloat is accounted for only in terms of growth of the physical
>> size of the table in blocks.
>
> Thanks, Tom.
>
> Does this un-analyzed "bloat" not impact queries? I guess the worst case
> here is if autovacuum is disabled for some reason and 99% of the table is
> dead rows. If I understand the above correctly, I think analyze might
> generate a bad plan under this scenario, thinking that a value is unique,
> using the index - but every tuple in the index has the same value and each
> has to be looked up in the table to see if it is visible?

It sounds like you're describing something like a one-row table with a
unique index on one of its columns, getting updates that can't be made
HOT, and not getting vacuumed. That scenario does suck - I had a test
case I was using a while back that generated something similar -
but I'm not sure how much it's worth worrying about the plan, because
either an index scan or a sequential scan is going to be awful.

To put that another way, I've found that the optimizer copes pretty
well with adjusting plans as tables get bloated - mostly by using
index scans rather than sequential scans. It's possible there is some
improvement still to be had there, but I would be a lot more
interested in fixing the bloat, at least based on my own experiences.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: sthomas(at)peak6(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 17:56:10
Message-ID: 4D5426BA.80400@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Shaun Thomas wrote:
> Hilariously, I'm not so sure that's what the OP wanted.

Someone to blame as a scapegoat for why his badly planned project had
failed. I've done several Oracle conversions before, and never met
someone who was so resistant to doing the right things for such a
conversion. You have to be relatively flexible in your thinking to work
with the good and away from the bad parts of PostgreSQL for such a
project to succeed. I didn't hear a whole lot of "flexible" in that
discussion.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Greg Smith" <greg(at)2ndquadrant(dot)com>,<sthomas(at)peak6(dot)com>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 18:26:44
Message-ID: 4D53D984020000250003A79F@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Shaun Thomas wrote:
>> Hilariously, I'm not so sure that's what the OP wanted.
>
> Someone to blame as a scapegoat for why his badly planned project
> had failed. I've done several Oracle conversions before, and
> never met someone who was so resistant to doing the right things
> for such a conversion.  You have to be relatively flexible in your
> thinking to work with the good and away from the bad parts of
> PostgreSQL for such a project to succeed. I didn't hear a whole
> lot of "flexible" in that discussion.

I was thinking along the same lines, but couldn't find the words to
put it so politely, so I held back. Still biting my tongue, but I
appreciate your milder summary.

-Kevin


From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 18:32:31
Message-ID: 4D542F3F.60002@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2/10/11 9:21 AM, Kevin Grittner wrote:
> Shaun Thomas<sthomas(at)peak6(dot)com> wrote:
>
>> how difficult would it be to add that syntax to the JOIN
>> statement, for example?
>
> Something like this syntax?:
>
> JOIN WITH (correlation_factor=0.3)
>
> Where 1.0 might mean that for each value on the left there was only
> one distinct value on the right, and 0.0 would mean that they were
> entirely independent? (Just as an off-the-cuff example -- I'm not
> at all sure that this makes sense, let alone is the best thing to
> specify. I'm trying to get at *syntax* here, not particular knobs.)

There are two types of problems:

1. The optimizer is imperfect and makes a sub-optimal choice.

2. There are theoretical reasons why it's hard for the optimizer. For example, in a table with 50 columns, there is a staggering number of possible correlations. An optimizer can't possibly figure these out, but a human might know them from the start. The City/Postal-code correlation is a good example.

For #1, Postgres should never offer any sort of hint mechanism. As many have pointed out, it's far better to spend the time fixing the optimizer than adding hacks.

For #2, it might make sense to give a designer a way to tell Postgres stuff that it couldn't possibly figure out. But ... not until the problem is clearly defined.

What should happen is that someone writes with an example query, and the community realizes that no amount of cleverness from Postgres could ever solve it (for solid theoretical reasons). Only then, when the problem is clearly defined, should we talk about solutions and SQL extensions.

Craig


From: Tobias Brox <tobixen(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 20:55:29
Message-ID: AANLkTikCG8QUGpt7X=WEBnQCOio=R=zMLa_AoH4KCG2J@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 4 February 2011 04:46, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> "Optimizer hints are used to work around problems in the optimizer and
> introduce upgrade and maintenance issues. We would rather have the
> problems reported and fixed. We have discussed a more sophisticated
> system of per-class cost adjustment instead, but a specification remains
> to be developed."

I have no clue about how hints work in Oracle ... I've never worked
"enterprise level" on anything other than Postgres. Anyway, today I
came across an interesting problem in our production database - and I
think it would be a benefit to be able to explicitly tell the planner
what index to use (the dev team is adding redundant attributes and
more indexes to solve the problem - which worries me, because we will
run into serious problems as soon as there is no longer enough memory
for all the frequently-used indexes).

We have users and transactions, and we have transaction types. The
transaction table is huge. The users are able to interactively check
their transaction listings online, and they have some simple filter
options available as well. Slightly simplified, the queries look like
this:

select * from account_transaction where account_id=? order by
created desc limit 25;

select * from account_transaction where trans_type_id in ( ...
long, hard-coded list ...) and account_id=? order by created desc
limit 25;

and we have indexes on:

account_transaction(account_id, created)

account_transaction(account_id, trans_type_id, created)

(At this point, someone would probably suggest making three
single-key indexes and using a bitmap index scan ... well, pulling 25
rows from the end of an index may be orders of magnitude faster than
doing bitmap index scans on huge indexes)

For the second query, the planner would choose the first index - and
maybe it makes sense - most of our customers have between 10% and 30%
of their transactions from the long list of transaction types, slim
indexes are good, and on average the slimmer index would probably do
the job a bit faster. The problem is with the corner cases - for some
of our extreme customers, thousands of transaction index tuples may
need to be scanned before 25 rows with the correct transaction type
are pulled out, and if the index happens to be on disk, it may take
tens of seconds to pull out the answer. Tens of seconds of waiting is
a lot nowadays in an interactive session, and it leads to frustration.
Also, I haven't really checked, but it may very well be that these are
exactly the kind of customers we want to retain.

To summarize, there are two things the planner doesn't know - it
doesn't know that such corner cases exist where the real cost is far
larger than the estimated cost, and it doesn't know that it's more
important to keep the worst-case cost at a reasonable level than to
minimize the average cost. In the ideal world postgres would have
sufficiently good statistics to know that for user #77777 it is better
to choose the second index, but I suppose it would be easier if I were
able to explicitly hide the account_transaction(account_id, created)
index for this query. Well, I know of one way to do it ... but I
suppose it's not a good idea to put "drop index foo; select ...;
rollback;" into production ;-)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tobias Brox <tobixen(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-10 21:12:12
Message-ID: 4660.1297372332@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Tobias Brox <tobixen(at)gmail(dot)com> writes:
> I have no clue about how hints work in Oracle ... I've never worked
> "enterprise level" on anything other than Postgres. Anyway, today I
> came across an interesting problem in our production database - and I
> think it would be a benefit to be able to explicitly tell the planner
> what index to use (the dev team is adding redundant attributes and
> more indexes to solve the problem - which worries me, because we will
> run into serious problems as soon as there is no longer enough memory
> for all the frequently-used indexes).

> We have users and transactions, and we have transaction types. The
> transaction table is huge. The users are able to interactively check
> their transaction listings online, and they have some simple filter
> options available as well. Slightly simplified, the queries look like
> this:

> select * from account_transaction where account_id=? order by
> created desc limit 25;

> select * from account_transaction where trans_type_id in ( ...
> long, hard-coded list ...) and account_id=? order by created desc
> limit 25;

> and we have indexes on:

> account_transaction(account_id, created)

> account_transaction(account_id, trans_type_id, created)

Well, in this case the optimizer *is* smarter than you are, and the
reason is that it remembers the correct rules for when indexes are
useful. That second index is of no value for either query, because
"in" doesn't work the way you're hoping.

I understand the larger point you're trying to make, but this example
also nicely illustrates the point being made on the other side, that
"force the optimizer to use the index I think it should use" isn't a
very good solution.

regards, tom lane


From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Tobias Brox <tobixen(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-11 08:19:01
Message-ID: AANLkTi=tUc33HgqKdZOYVsaLSp0zQ7eeMyb+HcTWdNfH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

2011/2/10 Tobias Brox <tobixen(at)gmail(dot)com>

> On 4 February 2011 04:46, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> > "Optimizer hints are used to work around problems in the optimizer and
> > introduce upgrade and maintenance issues. We would rather have the
> > problems reported and fixed. We have discussed a more sophisticated
> > system of per-class cost adjustment instead, but a specification remains
> > to be developed."
>
> I have no clue about how hints work in Oracle ... I've never worked
> "enterprise level" on anything other than Postgres. Anyway, today I
> came across an interesting problem in our production database - and I
> think it would be a benefit to be able to explicitly tell the planner
> what index to use (the dev team is adding redundant attributes and
> more indexes to solve the problem - which worries me, because we will
> run into serious problems as soon as there is no longer enough memory
> for all the frequently-used indexes).
>
> We have users and transactions, and we have transaction types. The
> transaction table is huge. The users are able to interactively check
> their transaction listings online, and they have some simple filter
> options available as well. Slightly simplified, the queries look like
> this:
>
> select * from account_transaction where account_id=? order by
> created desc limit 25;
>
> select * from account_transaction where trans_type_id in ( ...
> long, hard-coded list ...) and account_id=? order by created desc
> limit 25;
>
> and we have indexes on:
>
> account_transaction(account_id, created)
>
> account_transaction(account_id, trans_type_id, created)
>
If the list is hard-coded, you can create a partial index on
account_transaction(account_id, created desc) where trans_type_id in ( ...
long, hard-coded list ...)
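
A minimal sketch, with hypothetical type ids standing in for the long
hard-coded list:

create index account_transaction_typelist_idx
  on account_transaction (account_id, created desc)
  where trans_type_id in (1, 2, 3);

The second query can then use this index directly, as long as its IN
list matches the indexed predicate.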

--
Best regards,
Vitalii Tymchyshyn


From: Tobias Brox <tobixen(at)gmail(dot)com>
To: Віталій Тимчишин <tivv00(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-11 09:29:06
Message-ID: AANLkTim+D5W3OyFWmXi0BA9xvP5yb5+Aja+Yfq0yOUFZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

2011/2/11 Віталій Тимчишин <tivv00(at)gmail(dot)com>:
> If the list is hard-coded, you can create a partial index on
> account_transaction(account_id, created desc) where trans_type_id in ( ...
> long, hard-coded list ...)

My idea as well, though it looks ugly and it would be a maintenance
head-ache (upgrading the index as new transaction types are added
would mean "costly" write locks on the table, and we can't rely on
manual processes to get it right ... we might need to set up scripts
to either upgrade the index or alert us if the index needs upgrading).


From: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
To: Tobias Brox <tobixen(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-11 09:44:05
Message-ID: 4D5504E5.3060901@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 11.02.11 11:29, Tobias Brox wrote:
> 2011/2/11 Віталій Тимчишин<tivv00(at)gmail(dot)com>:
>> If the list is hard-coded, you can create a partial index on
>> account_transaction(account_id, created desc) where trans_type_id in ( ...
>> long, hard-coded list ...)
> My idea as well, though it looks ugly and it would be a maintenance
> head-ache (upgrading the index as new transaction types are added
> would mean "costly" write locks on the table,
Create new one concurrently.
> and we can't rely on
> manual processes to get it right ... we might need to set up scripts
> to either upgrade the index or alert us if the index needs upgrading).
Yep. Another option could be to rewrite the query as

select * from (
select * from account_transaction where trans_type_id =type1 and
account_id=? order by created desc limit 25 union all
select * from account_transaction where trans_type_id =type2 and
account_id=? order by created desc limit 25 union all
...
union all
select * from account_transaction where trans_type_id =typeN and
account_id=? order by created desc limit 25
) a
order by created desc limit 25

This will allow the three-column index to be used in the way it can be
used for such a query. Yet if N is large, the query will look ugly.
And I am not sure whether the optimizer is smart enough not to fetch
25*N rows.

Best regards, Vitalii Tymchyshyn


From: Tobias Brox <tobixen(at)gmail(dot)com>
To: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-11 11:26:01
Message-ID: AANLkTikwHQO2TVBYu81CLsBPjcyin3xrK+o7r=WrJ3g7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

2011/2/11 Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>:
>> My idea as well, though it looks ugly and it would be a maintenance
>> head-ache (upgrading the index as new transaction types are added
>> would mean "costly" write locks on the table,
>
> Create new one concurrently.

Concurrently? Are there any ways to add large indexes without
blocking inserts to the table for the time it takes to create the
index?

> Yep. Another option could be to rewrite the query as
>
> select  * from (
> select * from account_transaction where trans_type_id =type1 and
> account_id=? order by created desc limit 25 union all
> select * from account_transaction where trans_type_id =type2 and
> account_id=? order by created desc limit 25 union all
> ...
> union all
> select * from account_transaction where trans_type_id =typeN and
> account_id=? order by created desc limit 25
> ) a
> order by created desc limit 25

I actually considered that. For the given test case it works very
fast. Not sure if it would work universally ... it scales well when a
user has extreme amounts of transactions outside the given transaction
list (the case we have problems with now), but it wouldn't scale if
some user has an extreme amount of transactions within the list.
However, I think our "extreme amount of transactions" problem is
mostly limited to the transaction types outside the list.


From: Andrea Suisani <sickpig(at)opinioni(dot)net>
To: Tobias Brox <tobixen(at)gmail(dot)com>
Cc: Vitalii Tymchyshyn <tivv00(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-11 11:33:22
Message-ID: 4D551E82.7030300@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 02/11/2011 12:26 PM, Tobias Brox wrote:
> 2011/2/11 Vitalii Tymchyshyn<tivv00(at)gmail(dot)com>:
>>> My idea as well, though it looks ugly and it would be a maintenance
>>> head-ache (upgrading the index as new transaction types are added
>>> would mean "costly" write locks on the table,
>>
>> Create new one concurrently.
>
> Concurrently? Are there any ways to add large indexes without
> blocking inserts to the table for the time it takes to create the
> index?

yep, AFAIR since 8.2
see: http://www.postgresql.org/docs/8.2/static/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY
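
A sketch of the concurrent build for the partial index discussed
upthread (hypothetical name and type ids). Note that CREATE INDEX
CONCURRENTLY cannot run inside a transaction block, and it scans the
table twice, so it is slower - but it does not block writes:

create index concurrently account_transaction_typelist_idx
  on account_transaction (account_id, created desc)
  where trans_type_id in (1, 2, 3);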

[cut]

Andrea


From: "Pierre C" <lists(at)peufeu(dot)com>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Tobias Brox" <tobixen(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, "Mladen Gogala" <mladen(dot)gogala(at)vmsinfo(dot)com>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-11 14:51:44
Message-ID: op.vqqt8ihaeorkce@apollo13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> select * from account_transaction where trans_type_id in ( ...
> long, hard-coded list ...) and account_id=? order by created desc
> limit 25;

You could use an index on (account_id, created, trans_type_id), in
place of your index on (account_id, created). This will not prevent
the "Index Scan Backwards", but at least index rows with a
trans_type_id not matching the WHERE clause will not generate any heap
access...
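
A sketch of that replacement (hypothetical index name):

create index account_transaction_wide_idx
  on account_transaction (account_id, created desc, trans_type_id);

The trans_type_id condition can then be checked against the index
tuples during the backward scan, before any heap fetch.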


From: Rob Wultsch <wultsch(at)gmail(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints
Date: 2011-02-13 20:40:09
Message-ID: AANLkTinJjH4838Aa3wq6VrNm+MqB6hOKwC_-iei-eEj-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 10, 2011 at 9:25 AM, Chris Browne <cbbrowne(at)acm(dot)org> wrote:
> robertmhaas(at)gmail(dot)com (Robert Haas) writes:
>> On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner
>> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>> Well, I'm comfortable digging in my heels against doing *lame* hints
>>> just because "it's what all the other kids are doing," which I think
>>> is the only thing which would have satisfied the OP on this thread.
>>> From both on-list posts and ones exchanged off-list with me, it
>>> seems he was stubbornly resistant to properly tuning the server to
>>> see if any problems remained, or posting particular problems to see
>>> how they would be most effectively handled in PostgreSQL.  We
>>> obviously can't be drawn into dumb approaches because of
>>> ill-informed demands like that.
>>
>> Nor was I proposing any such thing.  But that doesn't make "we don't
>> want hints" an accurate statement.  Despite the impression that OP
>> went away with, the real situation is a lot more nuanced than that,
>> and the statement on the Todo list gives the wrong impression, IMHO.
>
> I have added the following comment to the ToDo:
>
>   We are not interested in implementing hints in the ways they are
>   commonly implemented in other databases, and proposals based on
>   "because they've got them" will not be welcomed.  If you have an
>   idea that avoids the problems that have been observed with other
>   hint systems, that could lead to valuable discussion.
>
> That seems to me to characterize the nuance.

Where exactly are the problems with other systems noted? Most other
systems have this option, so saying "they have problems" is a giant
cop-out.

--
Rob Wultsch
wultsch(at)gmail(dot)com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints
Date: 2011-02-13 22:29:32
Message-ID: 4D585B4C.5020602@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

I've wordsmithed Chris's changes some, and then spun off a completely
separate page for Hints discussion, since the NotToDo item was becoming
too long.

> Something like this syntax?:
>
> JOIN WITH (correlation_factor=0.3)

Please, NO!

This is exactly the kind of hint that I regard as a last resort if we
run out of implementation alternatives. Any hint which gets coded into
the actual queries becomes a *massive* maintenance and upgrade headache
thereafter. If we're implementing a hint alternative, we should look at
stuff in this priority order:

1. Useful tuning of additional cost parameters by GUC (e.g.
cursor_tuple_fraction)
2. Modifying cost parameters on database *objects* (e.g. "ndistinct=500")
3. Adding new parameters to modify on database objects (e.g.
"distribution=normal(1.5,17)", "new_rows=0.1")
4. Query hints (if all of the above fails to give fixes for some tested
problem)
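
For items 1 and 2, minimal sketches of what already exists today (the
column-level option is spelled n_distinct, available since 9.0),
reusing the account_transaction example from upthread:

set cursor_tuple_fraction = 0.2;  -- per-session GUC tuning

alter table account_transaction
  alter column trans_type_id set (n_distinct = 500);  -- per-object override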

> Where exactly are the problems with other systems noted? Most other
> systems have this option, so saying "they have problems" is a giant
> cop-out.

I've put my list down:
http://wiki.postgresql.org/wiki/OptimizerHintsDiscussion#Problems_with_existing_Hint_stystems

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints
Date: 2011-02-13 22:52:22
Message-ID: AANLkTi=_+snMF0MxUnYquTh6TB-nKtOAUfVK5VwM15D6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Sun, Feb 13, 2011 at 3:29 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I've wordsmithed Chris's changes some, and then spun off a completely
> separate page for Hints discussion, since the NotToDo item was becoming
> too long.
>
>> Something like this syntax?:
>>
>> JOIN WITH (correlation_factor=0.3)
>
> Please, NO!
>
> This is exactly the kind of hint that I regard as a last resort if we
> run out of implementation alternatives.  Any hint which gets coded into
> the actual queries becomes a *massive* maintenance and upgrade headache
> thereafter.  If we're implementing a hint alternative, we should look at
> stuff in this priority order:
>
> 1. Useful tuning of additional cost parameters by GUC (e.g.
> cursor_tuple_fraction)
> 2. Modifying cost parameters on database *objects* (e.g. "ndistinct=500")
> 3. Adding new parameters to modify on database objects (e.g.
> "distribution=normal(1.5,17)", "new_rows=0.1")
> 4. Query hints (if all of the above fails to give fixes for some tested
> problem)

I fail to see how 1 through 3 can tell the planner the correlation
between two fields in two separate tables.


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints
Date: 2011-02-14 07:01:37
Message-ID: AANLkTik8hhy9UOAMgQnM-=fi0eZ0x=0=o=b=_N=qJxHY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Sun, Feb 13, 2011 at 10:49 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> I fail to see how 1 through 3 can tell the planner the correlation
>> between two fields in two separate tables.
>
> CREATE CORRELATION_ESTIMATE ( table1.colA ) = ( table2.colB ) IS 0.3
>
> ... and then it fixes the correlation for *every* query in the database, not
> just that one.  And is easy to fix if the correlation changes.

I like that. Even better, could we set up some kind of simple command
to tell analyze to collect stats for the two columns together?
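
A sketch of what such a command might look like - loosely modeled on
the extended-statistics syntax that eventually appeared in PostgreSQL
10, and using the City/Postal-code example from upthread (hypothetical
table and column names):

create statistics city_zip_stats (dependencies)
  on city, postal_code from addresses;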


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: sthomas(at)peak6(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-16 21:22:26
Message-ID: 201102162122.p1GLMQS29138@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Kevin Grittner wrote:
> Shaun Thomas <sthomas(at)peak6(dot)com> wrote:
>
> > how difficult would it be to add that syntax to the JOIN
> > statement, for example?
>
> Something like this syntax?:
>
> JOIN WITH (correlation_factor=0.3)
>
> Where 1.0 might mean that for each value on the left there was only
> one distinct value on the right, and 0.0 would mean that they were
> entirely independent? (Just as an off-the-cuff example -- I'm not
> at all sure that this makes sense, let alone is the best thing to
> specify. I'm trying to get at *syntax* here, not particular knobs.)

I am not excited about the idea of putting these correlations in
queries. What would be more interesting would be for analyze to build
a correlation coefficient matrix showing how columns are correlated:

        a     b     c
   a    1    .4     0
   b   .1     1   -.3
   c   .2    .3     1

and those correlations could be used to weigh how the single-column
statistics should be combined.
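
As one illustration (a sketch, not a worked-out proposal) of how such
a matrix might weigh the combination: today the planner assumes
independence, sel(A and B) = sel(A) * sel(B); with a pairwise
coefficient r one could interpolate toward the fully-correlated bound,
e.g.

sel(A and B) = sel(A)*sel(B) + r * (min(sel(A), sel(B)) - sel(A)*sel(B))

which reduces to the current behavior at r = 0.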

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, sthomas(at)peak6(dot)com, Josh Berkus <josh(at)agliodbs(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-02-23 02:22:10
Message-ID: AANLkTimjt7PGuSyXFsedjCsANzMiVYNDfUUaPJ6_wqrr@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Feb 16, 2011 at 4:22 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I am not excited about the idea of putting these correlations in
> queries.  What would be more interesting would be for analyze to build a
> correlation coefficient matrix showing how columns are correlated:
>
>        a   b   c
>    a   1   .4  0
>    b   .1  1   -.3
>    c   .2  .3  1
>
> and those correlations could be used to weigh how the single-column
> statistics should be combined.

If you can make it work, I'll take it... it's (much) easier said than
done, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: pasman pasmański <pasman(dot)p(at)gmail(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-06-05 15:25:39
Message-ID: BANLkTimE3isrTRmZFG8obsvYJCUxSmJzaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Hi. I have an idea: attach hints to a function. For example, instead of

WHERE table1.field1=table2.field2

write:

WHERE specificeq(table1.field1,table2.field2)

and add the hints to the declaration of the specificeq function.
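
A minimal sketch of the idea (hypothetical function). Today only a
planner COST attribute - plus ROWS for set-returning functions - can
be declared on a function, so the hint itself is the imagined part:

create function specificeq(integer, integer) returns boolean
  language sql immutable
  cost 0.5  -- existing planner attribute; the proposal would add hints here
  as 'select $1 = $2';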

2011/2/23, Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Feb 16, 2011 at 4:22 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> I am not excited about the idea of putting these correlations in
>> queries.  What would be more interesting would be for analyze to build a
>> correlation coefficient matrix showing how columns are correlated:
>>
>>        a   b   c
>>    a   1   .4  0
>>    b   .1  1   -.3
>>    c   .2  .3  1
>>
>> and those correlations could be used to weigh how the single-column
>> statistics should be combined.
>
> If you can make it work, I'll take it... it's (much) easier said than
> done, though.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
------------
pasman


From: Robert Klemme <shortcutter(at)googlemail(dot)com>
To: Craig James <craig_james(at)emolecules(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Why we don't want hints Was: Slow count(*) again...
Date: 2011-06-06 08:14:43
Message-ID: BANLkTikOjbBmgbEY0Z3iJAMNOgaxGYr-Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Feb 10, 2011 at 7:32 PM, Craig James <craig_james(at)emolecules(dot)com> wrote:
> On 2/10/11 9:21 AM, Kevin Grittner wrote:
>>
>> Shaun Thomas<sthomas(at)peak6(dot)com>  wrote:
>>
>>> how difficult would it be to add that syntax to the JOIN
>>> statement, for example?
>>
>> Something like this syntax?:
>>
>> JOIN WITH (correlation_factor=0.3)
>>
>> Where 1.0 might mean that for each value on the left there was only
>> one distinct value on the right, and 0.0 would mean that they were
>> entirely independent?  (Just as an off-the-cuff example -- I'm not
>> at all sure that this makes sense, let alone is the best thing to
>> specify.  I'm trying to get at *syntax* here, not particular knobs.)
>
> There are two types of problems:
>
> 1. The optimizer is imperfect and makes a sub-optimal choice.
>
> 2. There are theoretical reasons why it's hard for the optimizer. For
> example, in a table with 50 columns, there is a staggering number of
> possible correlations.  An optimizer can't possibly figure these out, but a
> human might know them from the start.  The City/Postal-code correlation is a
> good example.
>
> For #1, Postgres should never offer any sort of hint mechanism.  As many
> have pointed out, it's far better to spend the time fixing the optimizer
> than adding hacks.
>
> For #2, it might make sense to give a designer a way to tell Postgres stuff
> that it couldn't possibly figure out. But ... not until the problem is
> clearly defined.
>
> What should happen is that someone writes with an example query, and the
> community realizes that no amount of cleverness from Postgres could ever
> solve it (for solid theoretical reasons). Only then, when the problem is
> clearly defined, should we talk about solutions and SQL extensions.

I don't have one such query handy. However, I think your posting is a
good starting point for a discussion of how to figure out what we need
and what a good solution could look like. For example, one thing I
dislike about hints is that they go into the query. There are a few
drawbacks to this approach:

- Applications need to be changed to benefit, which is not always possible.
- One important class of such applications is those that use OR
mappers - hinting would then have to be buried in OR mapper code or
configuration.
- Hints in the query work only for exactly that query (this might be
an advantage, depending on point of view).

I think the solution should rather be to tell Postgres what "it
couldn't possibly figure out". I imagine that could be some form of
description of the distribution of data in columns and / or
correlations between columns. The advantage would be that the
optimizer gets additional input which it can use (i.e. the usage can
change between releases), and that the information is separate from
the queries (more like metadata for tables), so all queries using a
table augmented with this metadata would benefit. Usage of this
metadata could be controlled by a flag per session (as well as
globally), so it would be relatively easy to find out whether the
metadata has become obsolete (because the data changed or a new
release of the database is in use).

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/