Script to compute random page cost

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Script to compute random page cost
Date: 2002-09-09 05:05:49
Message-ID: 200209090505.g8955nq14584@candle.pha.pa.us
Lists: pgsql-hackers

Because we have seen many complaints about sequential vs. index scans, I
wrote a script that computes the value for your OS/hardware
combination.

Under BSD/OS on one SCSI disk, I get a random_page_cost around 60. Our
current postgresql.conf default is 4.

What do other people get for this value?

Keep in mind that if we increase this value, we will get more
sequential scans vs. index scans.

One flaw in this test is that it randomly reads blocks from different
files rather than randomly reading from the same file. Do people have a
suggestion on how to correct this? Does it matter?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment: unknown_filename (text/plain, 2.2 KB)

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 06:13:54
Message-ID: 200209090613.g896Dsj12335@candle.pha.pa.us
Lists: pgsql-hackers


OK, turns out that the loop for sequential scan ran fewer times and was
skewing the numbers. I have a new version at:

ftp://candle.pha.pa.us/pub/postgresql/randcost

I get _much_ lower numbers now for random_page_cost.

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Because we have seen many complaints about sequential vs. index scans, I
> wrote a script that computes the value for your OS/hardware
> combination.
>
> Under BSD/OS on one SCSI disk, I get a random_page_cost around 60. Our
> current postgresql.conf default is 4.
>
> What do other people get for this value?
>
> Keep in mind that if we increase this value, we will get more
> sequential scans vs. index scans.
>
> One flaw in this test is that it randomly reads blocks from different
> files rather than randomly reading from the same file. Do people have a
> suggestion on how to correct this? Does it matter?
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073

> #!/bin/bash
>
> trap "rm -f /tmp/$$" 0 1 2 3 15
>
> BLCKSZ=8192
>
> if [ "$RANDOM" = "$RANDOM" ]
> then echo "Your shell does not support \$RANDOM. Try using bash." 1>&2
> exit 1
> fi
>
> # XXX We assume 0 <= random <= 32767
>
> echo "Collecting sizing information ..."
>
> TEMPLATE1=`du -s "$PGDATA/base/1" | awk '{print $1}'`
> FULL=`du -s "$PGDATA/base" | awk '{print $1}'`
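> # The du units cancel out here: only the ratio of the full data
> # directory to template1 matters.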
> if [ "$FULL" -lt `expr "$TEMPLATE1" \* 4` ]
> then echo "Your installation should have at least four times the data stored in template1 to yield meaningful results" 1>&2
> exit 1
> fi
>
> find "$PGDATA/base" -type f -exec ls -ld {} \; |
> awk '$5 % '"$BLCKSZ"' == 0 {print $5 / '"$BLCKSZ"', $9}' |
> grep -v '^0 ' > /tmp/$$
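> # /tmp/$$ now holds one line per data file: its size in pages, then its path.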
>
> TOTAL=`awk 'BEGIN {sum=0}
> {sum += $1}
> END {print sum}' /tmp/$$`
>
> echo "Running random access timing test ..."
>
> START=`date '+%s'`
> PAGES=1000
>
> while [ "$PAGES" -ne 0 ]
> do
> BIGRAND=`expr "$RANDOM" \* 32768 + "$RANDOM"`
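> # $RANDOM is only 0..32767, so combine two draws to get BIGRAND,
> # uniform over 0..2^30-1; it is scaled to a page number below.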
>
> OFFSET=`awk 'BEGIN{printf "%d\n", ('"$BIGRAND"' / 2^30) * '"$TOTAL"'}'`
>
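> # Walk the per-file page counts to map the global page number OFFSET
> # onto a particular file and a page offset within that file.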
> RESULT=`awk ' BEGIN {offset = 0}
> offset + $1 > '"$OFFSET"' \
> {print $2, '"$OFFSET"' - offset ; exit}
> {offset += $1}' /tmp/$$`
> FILE=`echo "$RESULT" | awk '{print $1}'`
> OFFSET=`echo "$RESULT" | awk '{print $2}'`
>
> dd bs="$BLCKSZ" seek="$OFFSET" count=1 if="$FILE" of="/dev/null" >/dev/null 2>&1
> PAGES=`expr "$PAGES" - 1`
> done
>
> STOP=`date '+%s'`
> RANDTIME=`expr "$STOP" - "$START"`
>
> echo "Running sequential access timing test ..."
>
> START=`date '+%s'`
> # We read 10 times as many pages as in the random test because
> # sequential reads are quicker and the test needs to run for a while
> # to get accurate results.
> PAGES=10000
>
> while [ "$PAGES" -ne 0 ]
> do
> BIGRAND=`expr "$RANDOM" \* 32768 + "$RANDOM"`
>
> OFFSET=`awk 'BEGIN{printf "%d\n", ('"$BIGRAND"' / 2^30) * '"$TOTAL"'}'`
>
> RESULT=`awk ' BEGIN {offset = 0}
> offset + $1 > '"$OFFSET"' \
> {print $2, $1; exit}
> {offset += $1}' /tmp/$$`
> FILE=`echo "$RESULT" | awk '{print $1}'`
> FILEPAGES=`echo "$RESULT" | awk '{print $2}'`
>
> if [ "$FILEPAGES" -gt "$PAGES" ]
> then FILEPAGES="$PAGES"
> fi
>
> dd bs="$BLCKSZ" count="$FILEPAGES" if="$FILE" of="/dev/null" >/dev/null 2>&1
> PAGES=`expr "$PAGES" - "$FILEPAGES"`
> done
>
> STOP=`date '+%s'`
> SEQTIME=`expr "$STOP" - "$START"`
>
> echo
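> # The sequential loop read 10 times as many pages as the random loop,
> # so scale the ratio by 10.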
> awk 'BEGIN {printf "random_page_cost = %f\n", ('"$RANDTIME"' / '"$SEQTIME"') * 10}'


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 07:20:42
Message-ID: GNELIHDDFBOCMGBFGEFOMECFCEAA.chriskl@familyhealth.com.au
Lists: pgsql-hackers

I got:

random_page_cost = 0.807018

For FreeBSD 4.4/i386

With 512MB RAM & SCSI HDD

Chris

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Bruce Momjian
> Sent: Monday, 9 September 2002 2:14 PM
> To: PostgreSQL-development
> Subject: Re: [HACKERS] Script to compute random page cost
>
>
>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers. I have a new version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.
>
> ---------------------------------------------------------------------------
>
> Bruce Momjian wrote:
> > Because we have seen many complaints about sequential vs. index scans, I
> > wrote a script that computes the value for your OS/hardware
> > combination.
> >
> > Under BSD/OS on one SCSI disk, I get a random_page_cost around 60. Our
> > current postgresql.conf default is 4.
> >
> > What do other people get for this value?
> >
> > Keep in mind that if we increase this value, we will get more
> > sequential scans vs. index scans.
> >
> > One flaw in this test is that it randomly reads blocks from different
> > files rather than randomly reading from the same file. Do people have a
> > suggestion on how to correct this? Does it matter?
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>
> > [randcost script snipped]


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: pgman(at)candle(dot)pha(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Script to compute random page cost
Date: 2002-09-09 07:57:53
Message-ID: 20020909.165753.41634704.t-ishii@sra.co.jp
Lists: pgsql-hackers

> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers. I have a new version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.

I got:

random_page_cost = 1.047619

Linux kernel 2.4.18
Pentium III 750MHz
Memory 256MB
IDE HDD

(A notebook/SONY VAIO PCG-Z505CR/K)
--
Tatsuo Ishii


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 10:52:46
Message-ID: Pine.NEB.4.44.0209091944340.479-100000@angelic.cynic.net
Lists: pgsql-hackers

On Mon, 9 Sep 2002, Bruce Momjian wrote:

> What do other people get for this value?

With your new script, with a 1.5 GHz Athlon, 512 MB RAM, and a nice fast
IBM 7200 RPM IDE disk, I get random_page_cost = 0.933333.

> One flaw in this test is that it randomly reads blocks from different
> files rather than randomly reading from the same file. Do people have a
> suggestion on how to correct this? Does it matter?

From my quick glance, it also does a lot of work to read each
block, including forking off several other programs. This would tend to
push up the cost of a random read. You might want to look at modifying
the randread program (http://randread.sourceforge.net) to do what you
want....

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Oliver Elphick <olly(at)lfix(dot)co(dot)uk>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 11:52:38
Message-ID: 1031572358.24419.214.camel@linda
Lists: pgsql-hackers

On Mon, 2002-09-09 at 07:13, Bruce Momjian wrote:
>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers. I have a new version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.
>
> ---------------------------------------------------------------------------

Five successive runs:

random_page_cost = 0.947368
random_page_cost = 0.894737
random_page_cost = 0.947368
random_page_cost = 0.894737
random_page_cost = 0.894737

linux 2.4.18 SMP
dual Athlon MP 1900+
512Mb RAM
SCSI

--
Oliver Elphick Oliver(dot)Elphick(at)lfix(dot)co(dot)uk
Isle of Wight, UK
http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"Submit yourselves therefore to God. Resist the devil,
and he will flee from you." James 4:7


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 12:00:45
Message-ID: 1031572846.15580.36.camel@jester
Lists: pgsql-hackers

On Mon, 2002-09-09 at 02:13, Bruce Momjian wrote:
>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers. I have a new version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.

The current script pulls way more data for Sequential scan than random
scan now.

Random is pulling a single page (count=1 for dd) with every loop.
Sequential does the same number of loops, but pulls count > 1 in each.

In effect, sequential is random with more data load -- which explains
all of the 0.9's.

Rod Taylor


From: "Nick Fankhauser" <nickf(at)ontko(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Ray Ontko" <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 16:25:08
Message-ID: NEBBLAAHGLEEPCGOBHDGGENKFLAA.nickf@ontko.com
Lists: pgsql-hackers

Bruce-

With the change in the script that I mentioned to you off-list (which I
believe just pointed it at our "real world" data), I got the following
results with 6 successive runs on each of our two development platforms:

(We're running PGSQL 7.2.1 on Debian Linux 2.4)

System 1:
1.2 GHz Athlon Processor, 512MB RAM, Database on IDE hard drive
random_page_cost = 0.857143
random_page_cost = 0.809524
random_page_cost = 0.809524
random_page_cost = 0.809524
random_page_cost = 0.857143
random_page_cost = 0.884615

System 2:
Dual 1.2Ghz Athlon MP Processors, SMP enabled, 1 GB RAM, Database on Ultra
SCSI RAID 5 with Hardware controller.
random_page_cost = 0.894737
random_page_cost = 0.842105
random_page_cost = 0.894737
random_page_cost = 0.894737
random_page_cost = 0.842105
random_page_cost = 0.894737

I was surprised that the SCSI RAID drive is generally slower than IDE, but
the values are in line with the results that others have been getting.

-Nick

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Bruce Momjian
> Sent: Monday, September 09, 2002 1:14 AM
> To: PostgreSQL-development
> Subject: Re: [HACKERS] Script to compute random page cost
>
>
>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers. I have a new version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.


From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 17:46:00
Message-ID: Pine.LNX.4.33.0209091145210.12848-100000@css120.ihs.com
Lists: pgsql-hackers

I'm getting an infinite wait on that file, could someone post it to the
list please?

On Mon, 9 Sep 2002, Bruce Momjian wrote:

>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers. I have a new version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I get _much_ lower numbers now for random_page_cost.


From: "Nick Fankhauser" <nickf(at)ontko(dot)com>
To: <nickf(at)ontko(dot)com>, "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Ray Ontko" <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 18:22:55
Message-ID: NEBBLAAHGLEEPCGOBHDGAEOCFLAA.nickf@ontko.com
Lists: pgsql-hackers

Hi again-

I bounced these numbers off of Ray Ontko here at our shop, and he pointed
out that random page cost is measured in multiples of a sequential page
fetch. It seems almost impossible that a random fetch would be less
expensive than a sequential fetch, yet we all seem to be getting results <
1. I can't see anything obviously wrong with the script, but something very
odd is going on.

-Nick

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Nick Fankhauser
> Sent: Monday, September 09, 2002 11:25 AM
> To: Bruce Momjian; PostgreSQL-development
> Cc: Ray Ontko
> Subject: Re: [HACKERS] Script to compute random page cost
>
>
> Bruce-
>
> With the change in the script that I mentioned to you off-list (which I
> believe just pointed it at our "real world" data), I got the following
> results with 6 successive runs on each of our two development platforms:
>
> (We're running PGSQL 7.2.1 on Debian Linux 2.4)
>
> System 1:
> 1.2 GHz Athlon Processor, 512MB RAM, Database on IDE hard drive
> random_page_cost = 0.857143
> random_page_cost = 0.809524
> random_page_cost = 0.809524
> random_page_cost = 0.809524
> random_page_cost = 0.857143
> random_page_cost = 0.884615
>
> System 2:
> Dual 1.2Ghz Athlon MP Processors, SMP enabled, 1 GB RAM, Database on Ultra
> SCSI RAID 5 with Hardware controller.
> random_page_cost = 0.894737
> random_page_cost = 0.842105
> random_page_cost = 0.894737
> random_page_cost = 0.894737
> random_page_cost = 0.842105
> random_page_cost = 0.894737
>
>
> I was surprised that the SCSI RAID drive is generally slower than IDE, but
> the values are in line with the results that others have been getting.
>
> -Nick
>
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Bruce Momjian
> > Sent: Monday, September 09, 2002 1:14 AM
> > To: PostgreSQL-development
> > Subject: Re: [HACKERS] Script to compute random page cost
> >
> >
> >
> > OK, turns out that the loop for sequential scan ran fewer times and was
> > skewing the numbers. I have a new version at:
> >
> > ftp://candle.pha.pa.us/pub/postgresql/randcost
> >
> > I get _much_ lower numbers now for random_page_cost.
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: nickf(at)ontko(dot)com
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Ray Ontko" <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-09 21:09:36
Message-ID: 13844.1031605776@sss.pgh.pa.us
Lists: pgsql-hackers

"Nick Fankhauser" <nickf(at)ontko(dot)com> writes:
> I bounced these numbers off of Ray Ontko here at our shop, and he pointed
> out that random page cost is measured in multiples of a sequential page
> fetch. It seems almost impossible that a random fetch would be less
> expensive than a sequential fetch, yet we all seem to be getting results <
> 1. I can't see anything obviously wrong with the script, but something very
> odd is going on.

The big problem with the script is that it involves an invocation of
"dd" - hence, at least one process fork --- for every page read
operation. The seqscan part of the test is even worse, as it adds a
test(1) call and a shell if/then/else to the overhead. My guess is that
we are measuring script overhead here, and not the desired I/O quantities
at all --- the script overhead is completely swamping the latter. The
apparent stability of the results across a number of different platforms
bolsters that thought.

Someone else opined that the script was also not comparing equal
numbers of pages read for the random and sequential cases. I haven't
tried to decipher the logic enough to see if that allegation is true,
but it's not obviously false.

Finally, I wouldn't believe the results for a moment if they were taken
against databases that are not several times the size of physical RAM
on the test machine, with a total I/O volume also much more than
physical RAM. We are trying to measure the behavior when kernel
caching is not helpful; if the database fits in RAM then you are just
naturally going to get random_page_cost close to 1, because the kernel
will avoid doing any I/O at all.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: nickf(at)ontko(dot)com
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 01:24:28
Message-ID: 200209100124.g8A1OSJ27459@candle.pha.pa.us
Lists: pgsql-hackers

Nick Fankhauser wrote:
> Hi again-
>
> I bounced these numbers off of Ray Ontko here at our shop, and he pointed
> out that random page cost is measured in multiples of a sequential page
> fetch. It seems almost impossible that a random fetch would be less
> expensive than a sequential fetch, yet we all seem to be getting results <
> 1. I can't see anything obviously wrong with the script, but something very
> odd is going on.

OK, new version at:

ftp://candle.pha.pa.us/pub/postgresql/randcost

What I have done is to take all of the computation stuff out of the
timed loop so only the 'dd' is done in the loop.

I am getting 1.0 for random_page_cost with this new code, but I don't
have much data in the database, so it is very possible I have it all
cached. Would others please test it?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 02:54:07
Message-ID: Pine.NEB.4.44.0209101153230.13186-100000@angelic.cynic.net
Lists: pgsql-hackers

On Mon, 9 Sep 2002, Tom Lane wrote:

> Finally, I wouldn't believe the results for a moment if they were taken
> against databases that are not several times the size of physical RAM
> on the test machine, with a total I/O volume also much more than
> physical RAM. We are trying to measure the behavior when kernel
> caching is not helpful; if the database fits in RAM then you are just
> naturally going to get random_page_cost close to 1, because the kernel
> will avoid doing any I/O at all.

Um...yeah; another reason to use randread against a raw disk device.
(A little hard to use on linux systems, I bet, but works fine on
BSD systems.)

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 03:43:08
Message-ID: 16319.1031629388@sss.pgh.pa.us
Lists: pgsql-hackers

Curt Sampson <cjs(at)cynic(dot)net> writes:
> On Mon, 9 Sep 2002, Tom Lane wrote:
>> ... We are trying to measure the behavior when kernel
>> caching is not helpful; if the database fits in RAM then you are just
>> naturally going to get random_page_cost close to 1, because the kernel
>> will avoid doing any I/O at all.

> Um...yeah; another reason to use randread against a raw disk device.
> (A little hard to use on linux systems, I bet, but works fine on
> BSD systems.)

Umm... not really; surely randread wouldn't know anything about
read-ahead logic?

The reason this is a difficult topic is that we are trying to measure
certain kernel behaviors --- namely readahead for sequential reads ---
and not others --- namely caching, because we have other parameters
of the cost models that purport to deal with that.

Mebbe this is an impossible task and we need to restructure the cost
models from the ground up. But I'm not convinced of that. The fact
that a one-page shell script can't measure the desired quantity doesn't
mean we can't measure it with more effort.

regards, tom lane


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 04:19:46
Message-ID: Pine.NEB.4.44.0209101302540.13186-100000@angelic.cynic.net
Lists: pgsql-hackers

On Mon, 9 Sep 2002, Tom Lane wrote:

> Curt Sampson <cjs(at)cynic(dot)net> writes:
> > On Mon, 9 Sep 2002, Tom Lane wrote:
> >> ... We are trying to measure the behavior when kernel
> >> caching is not helpful; if the database fits in RAM then you are just
> >> naturally going to get random_page_cost close to 1, because the kernel
> >> will avoid doing any I/O at all.
>
> > Um...yeah; another reason to use randread against a raw disk device.
> > (A little hard to use on linux systems, I bet, but works fine on
> > BSD systems.)
>
> Umm... not really; surely randread wouldn't know anything about
> read-ahead logic?

Randread doesn't know anything about read-ahead logic, but I don't
see how that matters one way or the other. The chances of it reading
blocks sequentially are pretty much infinitesimal if you're reading
across a reasonably large area of disk (I recommend at least 4GB),
so readahead will never be triggered.

> The reason this is a difficult topic is that we are trying to measure
> certain kernel behaviors --- namely readahead for sequential reads ---
> and not others --- namely caching, because we have other parameters
> of the cost models that purport to deal with that.

Well, for the sequential reads, the readahead should be triggered
even when reading from a raw device. So just use dd to measure
that. If you want to model postgres' behaviour slightly more
accurately, you probably want to pick a random area of the disk,
read a gigabyte, switch areas, read another gigabyte, and so on.
This will model the "split into 1GB files" behaviour.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, nickf(at)ontko(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 06:01:43
Message-ID: 200209100601.g8A61hK04081@candle.pha.pa.us
Lists: pgsql-hackers

OK, I have a better version at:

ftp://candle.pha.pa.us/pub/postgresql/randcost

I have added a null loop which does a dd on a single file without
reading any data, and by netting that loop out of the total computation
and increasing the number of tests, I have gotten the following results
for three runs:

random test: 36
sequential test: 33
null timing test: 27

random_page_cost = 1.500000


random test: 38
sequential test: 32
null timing test: 27

random_page_cost = 2.200000

random test: 40
sequential test: 31
null timing test: 27

random_page_cost = 3.250000
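
Netting the null loop time out of both of the others, the computation is:

    random_page_cost = (random - null) / (sequential - null)

e.g. (36 - 27) / (33 - 27) = 1.500000 for the first run.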

Interesting that random time is increasing, while the others were
stable. I think this may have to do with other system activity at the
time of the test. I will run it some more tomorrow but clearly we are
seeing reasonable numbers now.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Curt Sampson" <cjs(at)cynic(dot)net>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <nickf(at)ontko(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Ray Ontko" <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 06:23:29
Message-ID: GNELIHDDFBOCMGBFGEFOAECMCEAA.chriskl@familyhealth.com.au
Lists: pgsql-hackers

I got somewhat different results:

$ ./randcost /usr/local/pgsql/data
Collecting sizing information ...
Running random access timing test ...
Running sequential access timing test ...
Running null loop timing test ...
random test: 13
sequential test: 15
null timing test: 11

random_page_cost = 0.500000

Chris

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org]On Behalf Of Bruce Momjian
> Sent: Tuesday, 10 September 2002 2:02 PM
> To: Curt Sampson
> Cc: Tom Lane; nickf(at)ontko(dot)com; PostgreSQL-development; Ray Ontko
> Subject: Re: [HACKERS] Script to compute random page cost
>
>
> OK, I have a better version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> I have added a null loop which does a dd on a single file without
> reading any data, and by netting that loop out of the total computation
> and increasing the number of tests, I have gotten the following results
> for three runs:
>
> random test: 36
> sequential test: 33
> null timing test: 27
>
> random_page_cost = 1.500000
>
>
> random test: 38
> sequential test: 32
> null timing test: 27
>
> random_page_cost = 2.200000
>
>
> random test: 40
> sequential test: 31
> null timing test: 27
>
> random_page_cost = 3.250000
>
> Interesting that random time is increasing, while the others were
> stable. I think this may have to do with other system activity at the
> time of the test. I will run it some more tomorrow but clearly we are
> seeing reasonable numbers now.
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square,
> Pennsylvania 19073
>


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <nickf(at)ontko(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 06:24:21
Message-ID: Pine.NEB.4.44.0209101520240.13186-100000@angelic.cynic.net
Lists: pgsql-hackers

On Tue, 10 Sep 2002, Bruce Momjian wrote:

> Interesting that random time is increasing, while the others were
> stable. I think this may have to do with other system activity at the
> time of the test.

Actually, the random versus sequential time may also differ depending
on how many processes are competing for disk access. If the OS isn't
maintaining readahead for whatever reason,
sequential access could, in theory, degrade to being the same speed
as random access. It might be interesting to test this, too.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Oliver Elphick <olly(at)lfix(dot)co(dot)uk>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 10:47:53
Message-ID: 1031654873.25060.508.camel@linda
Lists: pgsql-hackers

On Mon, 2002-09-09 at 07:13, Bruce Momjian wrote:
>
> OK, turns out that the loop for sequential scan ran fewer times and was
> skewing the numbers. I have a new version at:
>
> ftp://candle.pha.pa.us/pub/postgresql/randcost

Latest version:

olly(at)linda$
random test: 14
sequential test: 11
null timing test: 9
random_page_cost = 2.500000

olly(at)linda$ for a in 1 2 3 4 5
> do
> ~/randcost
> done
Collecting sizing information ...
random test: 11
sequential test: 11
null timing test: 9
random_page_cost = 1.000000

random test: 11
sequential test: 10
null timing test: 9
random_page_cost = 2.000000

random test: 11
sequential test: 11
null timing test: 9
random_page_cost = 1.000000

random test: 11
sequential test: 10
null timing test: 9
random_page_cost = 2.000000

random test: 10
sequential test: 10
null timing test: 10
Sequential time equals null time. Increase TESTCYCLES and rerun.

Available memory (512M) exceeds the total database size, so sequential
and random are almost the same for the second and subsequent runs.

Since, in production, I would hope to have all active tables permanently
in RAM, would there be a case for my using a page cost of 1 on the
assumption that no disk reads would be needed?

--
Oliver Elphick Oliver(dot)Elphick(at)lfix(dot)co(dot)uk
Isle of Wight, UK
http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
========================================
"Draw near to God and he will draw near to you.
Cleanse your hands, you sinners; and purify your
hearts, you double minded." James 4:8


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 14:00:49
Message-ID: 19140.1031666449@sss.pgh.pa.us
Lists: pgsql-hackers

Curt Sampson <cjs(at)cynic(dot)net> writes:
> Well, for the sequential reads, the readahead should be triggered
> even when reading from a raw device.

That strikes me as an unportable assumption.

Even if true, we can't provide a test mechanism that requires root
access to run it --- raw-device testing is out of the question just on
that basis, never mind that it is not measuring what we want to measure.

Perhaps it's time to remind people that what we want to measure
is the performance seen by a C program issuing write() and read()
commands, transferring 8K at a time, on a regular Unix filesystem.
A shell script invoking dd is by definition going to see a very
different performance ratio, even if what dd does under the hood
is 8K read() and write() (another not-well-supported assumption,
IMHO). If you try to "improve" the results by using a raw device,
you're merely moving even further away from the scenario of interest.
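
A minimal sketch of such a testbed, assuming only standard POSIX calls
(this is an illustration, not an existing program), could time the same
number of sequential 8K read() calls and random lseek()+read() pairs
against a regular file:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/time.h>

#define BLCKSZ 8192     /* match PostgreSQL's page size */
#define NREADS 4096     /* pages read in each test */

static double
seconds(struct timeval t0, struct timeval t1)
{
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}

int
main(int argc, char **argv)
{
    char        buf[BLCKSZ];
    struct stat st;
    struct timeval t0, t1;
    double      seqtime, randtime;
    long        nblocks, i;
    int         fd;

    if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0)
    {
        fprintf(stderr, "usage: %s datafile\n", argv[0]);
        return 1;
    }
    fstat(fd, &st);
    nblocks = st.st_size / BLCKSZ;
    if (nblocks < NREADS)
    {
        fprintf(stderr, "file too small\n");
        return 1;
    }

    /* Sequential test: NREADS consecutive 8K reads from the start. */
    gettimeofday(&t0, NULL);
    for (i = 0; i < NREADS; i++)
        read(fd, buf, BLCKSZ);
    gettimeofday(&t1, NULL);
    seqtime = seconds(t0, t1);

    /* Random test: seek to a random page before each 8K read. */
    srandom((unsigned) getpid());
    gettimeofday(&t0, NULL);
    for (i = 0; i < NREADS; i++)
    {
        off_t blk = random() % nblocks;

        lseek(fd, blk * (off_t) BLCKSZ, SEEK_SET);
        read(fd, buf, BLCKSZ);
    }
    gettimeofday(&t1, NULL);
    randtime = seconds(t0, t1);

    /* Both loops read the same page count, so the raw ratio is the estimate. */
    printf("sequential: %.3f s  random: %.3f s  ratio: %.2f\n",
           seqtime, randtime, randtime / seqtime);
    close(fd);
    return 0;
}

The ratio is only meaningful when the file is several times larger than
physical RAM, so that the random reads cannot be satisfied from the
kernel cache.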

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, nickf(at)ontko(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 14:09:14
Message-ID: 19229.1031666954@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> I will run it some more tomorrow but clearly we are
> seeing reasonable numbers now.

... which still have no provable relationship to the ratio we need to
measure. See my previous comments to Curt; I don't think you can
possibly get trustworthy results out of a shell script + dd approach,
because we do not implement Postgres using dd.

If you implemented a C testbed and then proved by experiment that the
shell script got comparable numbers, then I'd believe its results.
Without that confirmation, these are just meaningless numbers.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Oliver Elphick <olly(at)lfix(dot)co(dot)uk>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 15:27:06
Message-ID: 200209101527.g8AFR6Z05598@candle.pha.pa.us
Lists: pgsql-hackers


OK, what you are seeing here is that for your platform the TESTCYCLES
size isn't enough; the numbers are too close to measure the difference.

I am going to increase the TESTCYCLES from 5k to 10k. That should
provide better numbers.

---------------------------------------------------------------------------

Oliver Elphick wrote:
> On Mon, 2002-09-09 at 07:13, Bruce Momjian wrote:
> >
> > OK, turns out that the loop for sequential scan ran fewer times and was
> > skewing the numbers. I have a new version at:
> >
> > ftp://candle.pha.pa.us/pub/postgresql/randcost
>
> Latest version:
>
> olly(at)linda$
> random test: 14
> sequential test: 11
> null timing test: 9
> random_page_cost = 2.500000
>
> olly(at)linda$ for a in 1 2 3 4 5
> > do
> > ~/randcost
> > done
> Collecting sizing information ...
> random test: 11
> sequential test: 11
> null timing test: 9
> random_page_cost = 1.000000
>
> random test: 11
> sequential test: 10
> null timing test: 9
> random_page_cost = 2.000000
>
> random test: 11
> sequential test: 11
> null timing test: 9
> random_page_cost = 1.000000
>
> random test: 11
> sequential test: 10
> null timing test: 9
> random_page_cost = 2.000000
>
> random test: 10
> sequential test: 10
> null timing test: 10
> Sequential time equals null time. Increase TESTCYCLES and rerun.
>
>
> Available memory (512M) exceeds the total database size, so sequential
> and random are almost the same for the second and subsequent runs.
>
> Since, in production, I would hope to have all active tables permanently
> in RAM, would there be a case for my using a page cost of 1 on the
> assumption that no disk reads would be needed?
>
> --
> Oliver Elphick Oliver(dot)Elphick(at)lfix(dot)co(dot)uk
> Isle of Wight, UK
> http://www.lfix.co.uk/oliver
> GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C
> ========================================
> "Draw near to God and he will draw near to you.
> Cleanse your hands, you sinners; and purify your
> hearts, you double minded." James 4:8
>
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Oliver Elphick <olly(at)lfix(dot)co(dot)uk>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Script to compute random page cost
Date: 2002-09-10 15:28:23
Message-ID: 200209101528.g8AFSOW14990@candle.pha.pa.us
Lists: pgsql-hackers

Oliver Elphick wrote:
> Available memory (512M) exceeds the total database size, so sequential
> and random are almost the same for the second and subsequent runs.
>
> Since, in production, I would hope to have all active tables permanently
> in RAM, would there be a case for my using a page cost of 1 on the
> assumption that no disk reads would be needed?

Yes, in your case random_page_cost would be 1 once the data gets into
RAM.

In fact, that is the reason I used only /data/base for testing, so that
installations where the data can load into RAM will see lower random
page costs.

I could just create a random file and test on that, but it isn't the
same.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-11 01:51:41
Message-ID: Pine.NEB.4.44.0209111044501.23252-100000@angelic.cynic.net
Lists: pgsql-hackers

On Tue, 10 Sep 2002, Tom Lane wrote:

> Curt Sampson <cjs(at)cynic(dot)net> writes:
> > Well, for the sequential reads, the readahead should be triggered
> > even when reading from a raw device.
>
> That strikes me as an unportable assumption.

Not only unportable, but false. :-) NetBSD, at least, does read-ahead
only through the buffer cache. Thinking about it, you *can't* do
read-ahead on a raw device, because you're not buffering. Doh!

> Perhaps it's time to remind people that what we want to measure
> is the performance seen by a C program issuing write() and read()
> commands, transferring 8K at a time, on a regular Unix filesystem.

Right. Which is what randread does, if you give it a file rather
than a raw device. I'm actually just now working on some modifications
for it that will let you work against a bunch of files, rather than
just one, so it will very accurately emulate a postgres random read
of blocks from a table.

There are two other tricky things related to the behaviour, however:

1. The buffer cache. You really need to be working against your
entire database, not just a few gigabytes of its data, or sample
data.

2. Multiple users. You really want a mix of simultaneous accesses
going on, with as many processes as you normally have users querying
the database.

These can probably both be taken care of with shell scripts, though.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Mark Kirkwood <markir(at)slingshot(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-11 05:47:25
Message-ID: 3D7ED8ED.3050709@slingshot.co.nz
Lists: pgsql-hackers

Tom Lane wrote:
> Perhaps it's time to remind people that what we want to measure
> is the performance seen by a C program issuing write() and read()
> commands, transferring 8K at a time, on a regular Unix filesystem.

Yes... and at the risk of being accused of marketing ;-), that is
exactly what the 3 programs in my archive do (see previous post for the
url):
- one called 'write' creates a suitably sized data file (8k at a time -
configurable), using the write() call
- another called 'read' does sequential reads (8k at a time -
configurable), using the read() call
- finally one called 'seek' does random reads (8k chunks -
configurable), using the lseek() and read() calls

I tried to use code as similar as possible to how Postgres does its
I/O... so the results *should* be meaningful!
Large file support is enabled too (as you need to use a file several
times bigger than your RAM, and everyone seems to have >1GB of it these
days...)

I think the code is reasonably readable too....
It's been *tested* on Linux, FreeBSD, Solaris, and Mac OS X.

The only downer is that they don't automatically compute
random_page_cost for you (I was more interested in the raw sequential
read, write and random read rates at the time). However, it would be a
fairly simple modification to combine all 3 programs into one
executable that outputs random_page_cost...

regards

Mark


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Mark Kirkwood <markir(at)slingshot(dot)co(dot)nz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <nickf(at)ontko(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-11 07:18:14
Message-ID: Pine.NEB.4.44.0209111548560.24427-100000@angelic.cynic.net
Lists: pgsql-hackers

On Wed, 11 Sep 2002, Mark Kirkwood wrote:

> Yes...and at the risk of being accused of marketing ;-) , that is
> exactly what the 3 programs in my archive do (see previous post for url) :

Hm, it appears we've both been working on something similar. However,
I've just released version 0.2 of randread, which has the following
features:

Written in C, uses read(2) and write(2), pretty much like postgres.

Reads or writes random blocks from a specified list of files,
treated as a contiguous range of blocks, again like postgres. This
allows you to do random reads from the actual postgres data files
for a table, if you like.

You can specify the block size to use, and the number of reads to do.

Allows you to specify how many blocks you want to read before you
start reading again at a new random location. (The default is 1.)
This allows you to model various sequential and random read mixes.

If you want to do writes, I suggest you create your own set of files to
write, rather than destroying postgresql data. This can easily be done
with something like this Bourne shell script:

for i in 1 2 3 4; do
    dd if=/dev/zero of=file.$i bs=1m count=1024
done

However, it doesn't calculate the random vs. sequential ratio for you;
you've got to do that yourself. E.g.:

$ ./randread -l 512 -c 256 /u/cjs/z?
256 reads of 512 x 8.00 KB blocks (4096.00 KB)
totalling 131072 blocks (1024.00 MB)
from 524288 blocks (4092.00 MB) in 4 files.
256 reads in 36.101119 sec. (141019 usec/read, 7 reads/sec, 29045.53 KB/sec)

$ ./randread -c 4096 /u/cjs/z?
4096 reads of 1 x 8.00 KB blocks (8.00 KB)
totalling 4096 blocks (32.00 MB)
from 524288 blocks (4095.99 MB) in 4 files.
4096 reads in 34.274582 sec. (8367 usec/read, 120 reads/sec, 956.04 KB/sec)

In this case, across 4 GB in 4 files on my 512 MB, 1.5 GHz Athlon
with an IBM 7200 RPM IDE drive, I read about 30 times faster doing
a full sequential read of the files than I do reading 32 MB randomly
from it. But because of the size of this, there's basically no
buffer cache involved. If I do this on a single 512 MB file:

$ ./randread -c 4096 /u/cjs/z1:0-65536
4096 reads of 1 x 8.00 KB blocks (8.00 KB)
totalling 4096 blocks (32.00 MB)
from 65536 blocks (511.99 MB) in 1 files.
4096 reads in 28.064573 sec. (6851 usec/read, 146 reads/sec, 1167.59 KB/sec)

$ ./randread -l 65535 -c 1 /u/cjs/z1:0-65536
1 reads of 65535 x 8.00 KB blocks (524280.00 KB)
totalling 65535 blocks (511.99 MB)
from 65536 blocks (0.01 MB) in 1 files.
1 reads in 17.107867 sec. (17107867 usec/read, 0 reads/sec, 30645.55 KB/sec)

$ ./randread -c 4096 /u/cjs/z1:0-65536
4096 reads of 1 x 8.00 KB blocks (8.00 KB)
totalling 4096 blocks (32.00 MB)
from 65536 blocks (511.99 MB) in 1 files.
4096 reads in 19.413738 sec. (4739 usec/read, 215 reads/sec, 1687.88 KB/sec)

Well, there you see some of the buffer cache effect from starting
with about half the file in memory. If you want to see serious buffer
cache action, just use the first 128 MB of my first test file:

$ ./randread -c 4096 /u/cjs/z1:0-16536
4096 reads of 1 x 8.00 KB blocks (8.00 KB)
totalling 4096 blocks (32.00 MB)
from 16536 blocks (129.18 MB) in 1 files.
4096 reads in 20.220791 sec. (4936 usec/read, 204 reads/sec, 1620.51 KB/sec)

$ ./randread -l 16535 -c 1 /u/cjs/z1:0-16536
1 reads of 16535 x 8.00 KB blocks (132280.00 KB)
totalling 16535 blocks (129.18 MB)
from 16536 blocks (0.01 MB) in 1 files.
1 reads in 3.469231 sec. (3469231 usec/read, 0 reads/sec, 38129.49 KB/sec)

$ ./randread -l 16535 -c 64 /u/cjs/z1:0-16536
64 reads of 16535 x 8.00 KB blocks (132280.00 KB)
totalling 1058240 blocks (8267.50 MB)
from 16536 blocks (0.01 MB) in 1 files.
64 reads in 23.643026 sec. (369422 usec/read, 2 reads/sec, 358072.59 KB/sec)

For those last three, we're basically limited completely by the
CPU, as there's not much disk I/O going on at all. The many-block
one is going to be slower because it's got to generate a lot more
random numbers and do a lot more lseek operations.

Anyway, looking at the real difference between truly sequential
and truly random reads on a large data file (30:1 or so),
it looks to me like people getting much less than that are getting
good work out of their buffer cache. You've got to wonder if there's
some way to auto-tune for this sort of thing....

Anyway, feel free to download and play. If you want to work on the
program, I'm happy to give developer access on sourceforge.

http://sourceforge.net/project/showfiles.php?group_id=55994

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Script to compute random page cost
Date: 2002-09-11 07:55:48
Message-ID: 3D7EF704.3000609@cybertec.at
Lists: pgsql-hackers

AMD Athlon 500
512MB Ram
IBM 120GB IDE

Tested with:
BLCKSZ=8192
TESTCYCLES=500000

Result:
Collecting sizing information ...
Running random access timing test ...
Running sequential access timing test ...
Running null loop timing test ...
random test: 2541
sequential test: 2455
null timing test: 2389

random_page_cost = 2.303030

Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/1/913 68 09; +43/664/233 90 75
www.postgresql.at, cluster.postgresql.at, www.cybertec.at, kernel.cybertec.at


From: Mark Kirkwood <markir(at)slingshot(dot)co(dot)nz>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-09-11 08:04:48
Message-ID: 3D7EF920.7070503@slingshot.co.nz
Lists: pgsql-hackers

Curt Sampson wrote:
> On Wed, 11 Sep 2002, Mark Kirkwood wrote:
>
> Hm, it appears we've both been working on something similar. However,
> I've just released version 0.2 of randread, which has the following
> features:
>

Funny how often that happens... (I think it's often worth the effort to
write your own benchmarking / measurement tool in order to gain a good
understanding of what you intend to measure.)

> Anyway, feel free to download and play. If you want to work on the
> program, I'm happy to give developer access on sourceforge.
>
> http://sourceforge.net/project/showfiles.php?group_id=55994

I'll take a look.

best wishes

Mark


From: Guido Goldstein <news(at)a-nugget(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Script to compute random page cost
Date: 2002-09-11 23:47:59
Message-ID: m2lm68w0k0.fsf@mail.a-nugget.de
Lists: pgsql-hackers


Hi!

On Tue, 10 Sep 2002 14:01:11 +0000 (UTC)
tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) wrote:
[...]
> Perhaps it's time to remind people that what we want to measure
> is the performance seen by a C program issuing write() and read()
> commands, transferring 8K at a time, on a regular Unix filesystem.
[...]

I've written something like that.
It is not C but might be useful.
Any comments are welcome.

http://www.a-nugget.org/downloads/randread.py

Bye
Guido


From: Justin Clift <justin(at)postgresql(dot)org>
To: Mark Kirkwood <markir(at)slingshot(dot)co(dot)nz>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, nickf(at)ontko(dot)com, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-10-12 12:28:02
Message-ID: 3DA81552.EA79ADB0@postgresql.org
Lists: pgsql-hackers

Hi all,

As an end result of all this, do we now have a decent utility that
end-user admins can run against the same disk/array that their
PostgreSQL installation is on, and get a reasonably accurate number for
random_page_cost?

ie:

$ ./get_calc_cost
Try using random_page_cost = foo

$

:-)

Regards and best wishes,

Justin Clift

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Justin Clift <justin(at)postgresql(dot)org>
Cc: Mark Kirkwood <markir(at)slingshot(dot)co(dot)nz>, Curt Sampson <cjs(at)cynic(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, nickf(at)ontko(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ray Ontko <rayo(at)ontko(dot)com>
Subject: Re: Script to compute random page cost
Date: 2002-10-12 14:45:32
Message-ID: 200210121445.g9CEjXw16090@candle.pha.pa.us
Lists: pgsql-hackers

Justin Clift wrote:
> Hi all,
>
> As an end result of all this, do we now have a decent utility that
> end-user admins can run against the same disk/array that their
> PostgreSQL installation is on, and get a reasonably accurate number for
> random_page_cost?
>
> ie:
>
> $ ./get_calc_cost
> Try using random_page_cost = foo
>
> $
>
> :-)

Right now we only have my script:

ftp://candle.pha.pa.us/pub/postgresql/randcost

It uses dd, so it forks on every loop, and it shows a value of around
2.5 on my machine. I need to code the loop in C to get more accurate
numbers.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073