Quick Links

Re: understanding pg_stat* numbers

Lists:	pgsql-hackers

From:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	understanding pg_stat* numbers
Date:	2005-03-26 10:41:38
Message-ID:	Pine.GSO.4.62.0503261259050.17555@ra.sai.msu.su
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi there,

I'm investigating one performance issue with tsearch2 index and trying to
interperet io statiscs from pg_statio_user_tables, pg_stat_user_tables.
But from documentation it's not clear what numbers I shoud take into
account and I'm a bit confused :)
I'm looking for blocks *actually* read from disk, since IO is the most
important factor.

I reseted stats and run my query and then obtained statistics:

=# select pg_stat_reset();

=# explain analyze select 1 from message_parts where message_parts.index_fts @@ '\'star\'';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Scan using a_gist_key on message_parts (cost=0.00..1381.92 rows=469 width=0) (actual time=0.785..1236.086 rows=5142 loops=1)
Index Cond: (index_fts @@ '\'star\''::tsquery)
Total runtime: 1240.274 ms
(3 rows)

=# select 'StatB:',heap_blks_read,heap_blks_hit,idx_blks_read, idx_blks_hit from pg_statio_user_tables where relname='message_parts';
?column? | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit
----------+----------------+---------------+---------------+--------------
StatB: | 1888 | 1700 | 1056 | 7226
(1 row)

=# select 'StatR:',seq_scan,seq_tup_read,idx_scan,idx_tup_fetch from pg_stat_user_tables where relname='message_parts';
?column? | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch
----------+----------+--------------+----------+---------------
StatR: | 0 | 0 | 1 | 5939
(1 row)

>From documentation:

the total number of disk blocks read from table - 1888
the number of buffer hits from table - 1700
the numbers of disk blocks read from indices - 1056
the number of buffer hits from indices - 7226

total numbers of rows returned by index scan - 5939

So, the total number of table blocks read is (1888+1700), and index blocks
is (1056+7226) ? Or from 1888 table blocks read there were 1700 blocks
already in buffer, but then I dont' understand index stats.

Since disk io is the most important performance factor,
should I look mostly on heap_blks_read and idx_blks_read ?

My query returns 5142 rows, while I see from idx_tup_fetch that
index returns 5939 rows. So, does it means that 5939 table rows was actually read
from disk and checked for lossines (index is lossy) and 797 hits was
actually false drops ?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: understanding pg_stat* numbers
Date:	2005-03-26 21:05:04
Message-ID:	18156.1111871104@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
> I'm looking for blocks *actually* read from disk, since IO is the most
> important factor.

Well, you won't find that out from Postgres, since it has no idea
whether a read() request was satisfied from kernel disk cache or had
to actually go to disk.

You could turn on log_statement_stats to get some possibly-more-reliable
numbers from the kernel via getrusage(), but this will only tell you
about totals across the whole backend process, not per-relation ...

regards, tom lane

From:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: understanding pg_stat* numbers
Date:	2005-03-26 21:35:36
Message-ID:	Pine.GSO.4.62.0503270012360.17555@ra.sai.msu.su
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 26 Mar 2005, Tom Lane wrote:

> Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> writes:
>> I'm looking for blocks *actually* read from disk, since IO is the most
>> important factor.
>
> Well, you won't find that out from Postgres, since it has no idea
> whether a read() request was satisfied from kernel disk cache or had
> to actually go to disk.

so, what's the usefulness of pg_stat* ?

From my experiments I feel heap_blks_read is the table blocks read from disk,
well, sort of, and heap_blks_hit - from buffer. At least, I observed when I
repeat the same query several times (heap_blks_read+heap_blks_hit) doesn't
changed while more blocks come from heap_blks_hit.

>
> You could turn on log_statement_stats to get some possibly-more-reliable
> numbers from the kernel via getrusage(), but this will only tell you
> about totals across the whole backend process, not per-relation ...
>

Hmm, it's impossible to do researching :( We have so many parameters and
almost no reliable stats. How do you believe you did a good choice ?
It's common in db worlds to have IO statistics (timings are not important)
to research algorithms and implementation.

> regards, tom lane
>

From:	"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: understanding pg_stat* numbers
Date:	2005-03-31 14:28:06
Message-ID:	d2h1f2$1fr2$1@news.hub.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> writes
> From my experiments I feel heap_blks_read is the table blocks read from
disk,
> well, sort of, and heap_blks_hit - from buffer. At least, I observed when
I
> repeat the same query several times (heap_blks_read+heap_blks_hit) doesn't
> changed while more blocks come from heap_blks_hit.
>

PG relies on the OS to schedule the *real* IO you mean. So heap_blks_read
may come from OS kernel buffer, or from disk. You can minimize/disable your
OS file buffers (if it supports this function) or change the mdopen() using
O_DIRECT to make the heap_blks_read approaches the *real* IO if you want -
but for sure this hurts performance.

> It's common in db worlds to have IO statistics (timings are not important)
> to research algorithms and implementation.
>

The *real* IO you mean might be an elegant choice in academic, but is a
subjective matter in practice. For example, we can only assume statistically
N random IO costs the same time with another N random IO, because disk
header has its own scheduling logic controlled by the layout of these
blocks, etc, which you can't decide. Not to speak of the IOs in concurrent
environment. If so, use heap_blks_read/write is already good enough?

Regards,
Qingqing

From:	Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To:	Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: understanding pg_stat* numbers
Date:	2005-04-01 07:54:30
Message-ID:	Pine.GSO.4.62.0504011111340.14187@ra.sai.msu.su
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Qingqing,

Thanks for the info, I do know all issues you mentioned. I want to know
if there is any possibility to research performance problem in PostgreSQL
without voodoo techiques, at least at the level of pg itself. At this aspect,
I could answer myself: heap_blks_read is the # of blocks
'ordered' to read from disk and heap_blks_hit is the # of blocks
read from pg buffers.

One mystery remains, why stats show heap_blks_read > 0 for indexed search ?
select 1 from foo where id=5
I did pg_stat_reset() before run query.

Oleg

On Thu, 31 Mar 2005, Qingqing Zhou wrote:

>
> "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> writes
>> From my experiments I feel heap_blks_read is the table blocks read from
> disk,
>> well, sort of, and heap_blks_hit - from buffer. At least, I observed when
> I
>> repeat the same query several times (heap_blks_read+heap_blks_hit) doesn't
>> changed while more blocks come from heap_blks_hit.
>>
>
> PG relies on the OS to schedule the *real* IO you mean. So heap_blks_read
> may come from OS kernel buffer, or from disk. You can minimize/disable your
> OS file buffers (if it supports this function) or change the mdopen() using
> O_DIRECT to make the heap_blks_read approaches the *real* IO if you want -
> but for sure this hurts performance.
>
>> It's common in db worlds to have IO statistics (timings are not important)
>> to research algorithms and implementation.
>>
>
> The *real* IO you mean might be an elegant choice in academic, but is a
> subjective matter in practice. For example, we can only assume statistically
> N random IO costs the same time with another N random IO, because disk
> header has its own scheduling logic controlled by the layout of these
> blocks, etc, which you can't decide. Not to speak of the IOs in concurrent
> environment. If so, use heap_blks_read/write is already good enough?
>
> Regards,
> Qingqing
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

From:	"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: understanding pg_stat* numbers
Date:	2005-04-01 09:29:57
Message-ID:	d2j4op$1c9t$1@news.hub.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> writes
> One mystery remains, why stats show heap_blks_read > 0 for indexed search
?
> select 1 from foo where id=5
> I did pg_stat_reset() before run query.
>

There is no clustered index in PG so far, so all the data item has to be
stay in the heap. In brief, the executor has to check the visibility of each
row in the heap pointed by the index. For performance consideration, if one
row is invisible("killed"), PG could remember its status it in the index
item identifier, so next time it just ignore it.

Regards,
Qingqing