Re: [PERFORM] pgbench to the MAXINT

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: pgbench to the MAXINT
Date: 2011-01-08 01:59:01
Message-ID: 4D27C4E5.5000609@2ndquadrant.com
Lists: pgsql-hackers pgsql-performance

At one point I was working on a patch to pgbench to have it adopt 64-bit
math internally even when running on 32-bit platforms, which are
currently limited to a database scale of ~4000 before the whole process
crashes and burns. But since the range was still plenty high on a
64-bit system, I stopped working on that. People who are only running
32-bit servers at this point in time aren't doing anything serious
anyway, right?

So what is the upper limit now? The way it degrades when you cross it
amuses me:

$ pgbench -i -s 21475 pgbench
creating tables...
set primary key...
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_branches_pkey" for table "pgbench_branches"
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_tellers_pkey" for table "pgbench_tellers"
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index
"pgbench_accounts_pkey" for table "pgbench_accounts"
vacuum...done.
$ pgbench -S -t 10 pgbench
starting vacuum...end.
setrandom: invalid maximum number -2147467296

It doesn't throw any error during the initialization step, neither via
client nor database logs, even though it doesn't do anything whatsoever.
It just turns into the quickest pgbench init ever. That's the exact
threshold, because this works:

$ pgbench -i -s 21474 pgbench
creating tables...
10000 tuples done.
20000 tuples done.
30000 tuples done.
...

So where we're at now is that the maximum database pgbench can create is
a scale of 21474. That makes approximately a 313GB database. I can
tell you the size for sure when that init finishes running, which is not
going to be soon. That's not quite as big as I'd like to exercise a
system with 128GB of RAM, the biggest size I run into regularly now, but
it's close enough for now. This limit will need to finally get pushed
upward soon though, because 256GB servers are getting cheaper every
day--and the current pgbench can't make a database big enough to really
escape cache on one of them.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: pgbench to the MAXINT
Date: 2011-01-10 05:17:14
Message-ID: 4D2A965A.3000603@timbira.com

Em 07-01-2011 22:59, Greg Smith escreveu:
> setrandom: invalid maximum number -2147467296
>
It is failing at atoi() circa pgbench.c:1036. But that's just the first
one. There are some variables and constants that need to be converted to
int64, and some functions that must speak 64-bit, such as getrand(). Are
you working on a patch?

> It doesn't throw any error during the initialization step, neither via
> client or database logs, even though it doesn't do anything whatsoever.
> It just turns into the quickest pgbench init ever. That's the exact
> threshold, because this works:
>
AFAICS that is because atoi() is so fragile.

> So where we're at now is that the maximum database pgbench can create is
> a scale of 21474.
>
That's because 21475 * 100,000 > INT_MAX. We must provide an alternative to
atoi() that deals with 64-bit integers.

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: pgbench to the MAXINT
Date: 2011-01-10 08:25:23
Message-ID: 4D2AC273.3010709@2ndquadrant.com

Euler Taveira de Oliveira wrote:
> Em 07-01-2011 22:59, Greg Smith escreveu:
>> setrandom: invalid maximum number -2147467296
>>
> It is failing at atoi() circa pgbench.c:1036. But it just the first
> one. There are some variables and constants that need to be converted
> to int64 and some functions that must speak 64-bit such as getrand().
> Are you working on a patch?

http://archives.postgresql.org/pgsql-hackers/2010-01/msg02868.php
http://archives.postgresql.org/message-id/4C326F46.4050801@2ndquadrant.com

I thought we really needed to fix that before 9.0 shipped, but it turned
out the limit wasn't so bad after all on 64-bit systems; I dropped
worrying about it for a while. It's starting to look like it's back on
the critical list for 9.1 again though.

If anyone here wanted to pick that up and help with review, I could
easily update it to current git HEAD and re-post. There are enough
people on this list who do tests on large machines that I was mainly
alerting them to where the breaking point is; the fix required has
already been worked on a bit. Someone with more patience than I have to
play around with multi-platform string conversion trivia is what I think
is really needed next, followed by some performance tests on 32-bit
systems.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgbench to the MAXINT
Date: 2011-01-11 21:34:17
Message-ID: 4D2CCCD9.802@timbira.com

Em 10-01-2011 05:25, Greg Smith escreveu:
> Euler Taveira de Oliveira wrote:
>> Em 07-01-2011 22:59, Greg Smith escreveu:
>>> setrandom: invalid maximum number -2147467296
>>>
>> It is failing at atoi() circa pgbench.c:1036. But it just the first
>> one. There are some variables and constants that need to be converted
>> to int64 and some functions that must speak 64-bit such as getrand().
>> Are you working on a patch?
>
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg02868.php
> http://archives.postgresql.org/message-id/4C326F46.4050801@2ndquadrant.com
>
Greg, I just improved your patch. I tried to work around the problems pointed
out in the above threads. Also, I want to raise some points:

(i) If we want to support a scale factor greater than 21474, we have to
convert some columns to bigint; it will change the test. From the
portability point of view it is a pity, but as we have never supported it
I'm not too worried about it. Why? Because it will use bigint columns only
if the scale factor is greater than 21474. Is it a problem? I don't think
so, because generally people compare tests with the same scale factor.

(ii) From the performance perspective, we need to test that the
modifications don't impact performance. I didn't create another code path
for the 64-bit modifications (it is too ugly), and I'm afraid some
modifications affect 32-bit performance. I'm in no position to test it,
though, because I don't have a big machine ATM. Greg, could you lead these
tests?

(iii) I decided to copy scanint8() (called strtoint64 there) from the
backend (Robert's suggestion [1]) because Tom pointed out that strtoll()
has portability issues. I replaced atoi() with strtoint64() but didn't do
any performance tests.

Comments?

[1] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00173.php

--
Euler Taveira de Oliveira
http://www.timbira.com/

Attachment Content-Type Size
pgbench-110111.diff text/x-patch 11.6 KB

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-01-18 18:42:59
Message-ID: 4D35DF33.6060901@2ndquadrant.com

Euler Taveira de Oliveira wrote:
> (i) If we want to support and scale factor greater than 21474 we have
> to convert some columns to bigint; it will change the test. From the
> portability point it is a pity but as we have never supported it I'm
> not too worried about it. Why? Because it will use bigint columns only
> if the scale factor is greater than 21474. Is it a problem? I don't
> think so because generally people compare tests with the same scale
> factor.
>
> (ii) From the performance perspective, we need to test if the
> modifications don't impact performance. I don't create another code
> path for 64-bit modifications (it is too ugly) and I'm afraid some
> modifications affect the 32-bit performance. I'm in a position to test
> it though because I don't have a big machine ATM. Greg, could you lead
> these tests?
>
> (iii) I decided to copy scanint8() (called strtoint64 there) from
> backend (Robert suggestion [1]) because Tom pointed out that strtoll()
> has portability issues. I replaced atoi() with strtoint64() but didn't
> do any performance tests.

(i): Completely agreed.

(ii): There is no such thing as a "big machine" that is 32 bits now;
anything that's 32 bits is a tiny system here in 2011. What I can do is
check for degradation on the only 32-bit system I have left here, my
laptop. I'll pick a sensitive test case and take a look.

(iii) This is an important thing to test, particularly given it has the
potential to impact 64-bit results too.

Thanks for picking this up again and finishing the thing off. I'll add
this into my queue of performance tests to run and we can see if this is
worth applying. Probably take a little longer than the usual CF review
time. But as this doesn't interfere with other code people are working
on and is sort of a bug fix, I don't think it will be a problem if it
takes a little longer to get this done.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-01-30 20:09:20
Message-ID: AANLkTinJ0mqkS5sSx8kCm9x+miNeRXdPWaxfVEO8Szjg@mail.gmail.com

On Tue, Jan 18, 2011 at 1:42 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Thanks for picking this up again and finishing the thing off.  I'll add this
> into my queue of performance tests to run and we can see if this is worth
> applying.  Probably take a little longer than the usual CF review time.  But
> as this doesn't interfere with other code people are working on and is sort
> of a bug fix, I don't think it will be a problem if it takes a little longer
> to get this done.

At least in my book, we need to get this committed in the next two
weeks, or wait for 9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-04 03:28:55
Message-ID: 4D4B7277.80902@2ndquadrant.com

Robert Haas wrote:
> At least in my book, we need to get this committed in the next two
> weeks, or wait for 9.2.
>

Yes, I was just suggesting that I was not going to get started in the
first week or two, given the other pgbench-related tests I had queued up
already. Those are closing up nicely, and I'll start testing the
performance of this change over the weekend.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-07 16:03:42
Message-ID: 4D5017DE.90404@2ndquadrant.com

The update on the work to push towards a bigger pgbench is that I now
have the patch running and generating databases larger than any
previously possible scale:

$ time pgbench -i -s 25000 pgbench
...
2500000000 tuples done.
...
real 258m46.350s
user 14m41.970s
sys 0m21.310s

$ psql -d pgbench -c "select
pg_size_pretty(pg_relation_size('pgbench_accounts'));"
pg_size_pretty
----------------
313 GB

$ psql -d pgbench -c "select
pg_size_pretty(pg_relation_size('pgbench_accounts_pkey'));"
pg_size_pretty
----------------
52 GB

$ time psql -d pgbench -c "select count(*) from pgbench_accounts"
count
------------
2500000000

real 18m48.363s
user 0m0.010s
sys 0m0.000s

The only thing wrong with the already-sent patch that needed fixing to
reach this point was this line:

for (k = 0; k < naccounts * scale; k++)

Which needed an (int64) cast on the multiplied value in the middle there.

Unfortunately the actual test itself doesn't run yet. Every line I see
when running the SELECT-only test says:

client 0 sending SELECT abalance FROM pgbench_accounts WHERE aid = 1;

So something about the updated random generation code isn't quite right
yet. Now that I have this monster built, I'm going to leave it on the
server until I can sort that out, which hopefully will finish up in the
next day or so.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-09 08:38:25
Message-ID: 4D525281.1000201@2ndquadrant.com

Attached is an updated 64-bit pgbench patch that works as expected for
all of the most common pgbench operations, including support for scales
above the previous boundary of just over 21,000. Here's the patched
version running against a 303GB database with a previously unavailable
scale factor:

$ pgbench -T 300 -j 2 -c 4 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 25000
query mode: simple
number of clients: 4
number of threads: 2
duration: 300 s
number of transactions actually processed: 21681
tps = 72.249999 (including connections establishing)
tps = 72.250610 (excluding connections establishing)

And some basic Q/A that the values it touched were in the right range:

$ psql -d pgbench -c "select min(aid),max(aid) from pgbench_accounts";

min | max
-----+------------
1 | 2500000000

$ psql -d pgbench -c "select min(aid),max(aid),count(*) from
pgbench_accounts where abalance!=0" &

min | max | count
-------+------------+-------
51091 | 2499989587 | 21678

(This system was doing 300MB/s on reads while executing that count, and
it still took 19 minutes)

The clever way Euler updated the patch, you don't pay for the larger
on-disk data (bigint columns) unless you use a range that requires it,
which greatly reduces the number of ways the test results can suffer
from this change. I felt the way that was coded was a bit more
complicated than it needed to be though, as it made the point where that
switch happens get computed at runtime based on the true size of the
integers. I took that complexity out and just put a hard line in there
instead: if scale>=20000, you get bigints. That's not very different
from the real limit, and it made documenting when the switch happens
easy to write and to remember.

The main performance concern with this change was whether using int64
more internally for computations would slow things down on a 32-bit
system. I thought I'd test that on my few years old laptop. It turns
out that even though I've been running an i386 Linux on here, it's
actually a 64-bit CPU. (I think the fact that it has a 32-bit install
may be an artifact of Adobe Flash install issues, sadly.) So this may not
be as good of a test case as I'd hoped. Regardless, running a test aimed to
stress simple SELECTs, the thing I'd expect to suffer most from
additional CPU overhead, didn't show any difference in performance:

$ createdb pgbench
$ pgbench -i -s 10 pgbench
$ psql -c "show shared_buffers"
shared_buffers
----------------
256MB
(1 row)
$ pgbench -S -j 2 -c 4 -T 60 pgbench

i386 x86_64
6932 6924
6923 6926
6923 6922
6688 6772
6914 6791
6902 6916
6917 6909
6943 6837
6689 6744

6688 6744 min
6943 6926 max
6870 6860 average

Given the noise level of pgbench tests, I'm happy saying that is the
same speed. I suspect the real overhead in pgbench's processing relates
to how it is constantly parsing text to turn it into statements, and
that the size of the integers it uses is barely detectable over that.

So...where does that leave this patch? I feel that pgbench will become
less relevant very quickly in 9.1 unless something like this is
committed. And there don't seem to be significant downsides to this in
terms of performance. There are however a few rough points left in here
that might raise concern:

1) A look into the expected range of the rand() function suggests the
glibc implementation normally provides 30 bits of resolution, so about 1
billion numbers. You'll have >1B rows in a pgbench database once the
scale goes over 10,000. So without a major overhaul of how random
number generation is treated here, people can expect the distribution of
rows touched by a test run to get less even once the database scale gets
very large. I added another warning paragraph to the end of the docs in
this update to mention this. Long-term, I suspect we may need to adopt
a superior 64-bit RNG approach, something like a Mersenne Twister
perhaps. That's a bit more than can be chewed on during 9.1 development
though.

2) I'd rate the odds good that there are one or more corner-case bugs in
\setrandom or \setshell I haven't found yet, just from the way that code
was converted. Those have some changes I haven't specifically tested
exhaustively yet. I don't see any issues when running the two most
common pgbench tests, but that doesn't mean every part of that 32 -> 64
bit conversion was done correctly.

Given how I use pgbench, for data generation and rough load testing, I'd
say neither of those concerns outweighs the need to expand the size
range of this program. I would be happy to see this go in, followed by
some alpha and beta testing aimed to see if any of the rough spots I'm
concerned about actually appear. Unfortunately I can't fit all of those
tests in right now, as throwing around one of these 300GB data sets is
painful--when you're only getting 72 TPS, looking for large scale
patterns in the transactions takes a long time to do. For example, if I
really wanted a good read on how bad the data distribution skew due to
the small random range is, I'd need to let some things run for a week
just for a first pass.

I'd like to see this go in, but the problems I've spotted are such that
I would completely understand this being considered not ready by
others. Just having this patch available here is a very useful step
forward in my mind, because now people can always just grab it and do a
custom build if they run into a larger system.

Wavering between Returned with Feedback and Ready for Committer here.
Thoughts?

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attachment Content-Type Size
pgbench-64-v5.patch text/x-patch 10.2 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-09 19:40:08
Message-ID: 20110209194008.GX4116@tamriel.snowman.net

Greg,

* Greg Smith (greg(at)2ndquadrant(dot)com) wrote:
> I took that complexity out and just put a hard line
> in there instead: if scale>=20000, you get bigints. That's not
> very different from the real limit, and it made documenting when the
> switch happens easy to write and to remember.

Agreed completely on this.

> It turns out that even though I've been running an i386 Linux on
> here, it's actually a 64-bit CPU. (I think that it has a 32-bit
> install may be an artifact of Adobe Flash install issues, sadly) So
> this may not be as good of a test case as I'd hoped.

Actually, I would think it'd still be sufficient.. If you're under a
32-bit kernel you're not going to be using the extended registers, etc,
that would be available under a 64-bit kernel.. That said, the idea that
we should care about 32-bit systems these days, in a benchmarking tool,
is, well, silly, imv.

> 1) A look into the expected range of the rand() function suggests
> the glibc implementation normally proves 30 bits of resolution, so
> about 1 billion numbers. You'll have >1B rows in a pgbench database
> once the scale goes over 10,000. So without a major overhaul of how
> random number generation is treated here, people can expect the
> distribution of rows touched by a test run to get less even once the
> database scale gets very large.

Just wondering, did you consider just calling random() twice and
smashing the result together..?

> I added another warning paragraph
> to the end of the docs in this update to mention this. Long-term, I
> suspect we may need to adopt a superior 64-bit RNG approach,
> something like a Mersenne Twister perhaps. That's a bit more than
> can be chewed on during 9.1 development though.

I tend to agree that we should be able to improve the random number
generation in the future. Additionally, imv, we should be able to say
"pgbench version X isn't comparable to version Y" in the release notes
or something, or have separate version #s for it which make it clear
what can be compared to each other and what can't. Painting ourselves
into a corner by saying we can't ever make pgbench generate results that
can't be compared to every other released version of pgbench just isn't
practical.

> 2) I'd rate odds are good there's one or more corner-case bugs in
> \setrandom or \setshell I haven't found yet, just from the way that
> code was converted. Those have some changes I haven't specifically
> tested exhaustively yet. I don't see any issues when running the
> most common two pgbench tests, but that's doesn't mean every part of
> that 32 -> 64 bit conversion was done correctly.

I'll take a look. :)

Thanks,

Stephen


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-11 02:27:30
Message-ID: 4D549E92.4010709@2ndquadrant.com

Stephen Frost wrote:
> Just wondering, did you consider just calling random() twice and
> smashing the result together..?
>

I did. The problem is that even within the 32 bits that random()
returns, it's not uniformly distributed. Combining two of them isn't
really going to solve the distribution problem, just move it around.
Some number of lower-order bits are less random than the others, and
which they are is implementation dependent.

Poking around a bit more, I just discovered another possible approach:
use erand48() instead of rand() in pgbench, which is either provided by
the OS or emulated in src/port/erand48.c. That's way more resolution
than needed here, given that 2^48 pgbench accounts would be a scale of
2.8M, which makes for a database of about 42 petabytes.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-11 03:18:38
Message-ID: 12987.1297394318@sss.pgh.pa.us

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> Poking around a bit more, I just discovered another possible approach is
> to use erand48 instead of rand in pgbench, which is either provided by
> the OS or emulated in src/port/erand48.c That's way more resolution
> than needed here, given that 2^48 pgbench accounts would be a scale of
> 2.8M, which makes for a database of about 42 petabytes.

I think that might be a good idea --- it'd reduce the cross-platform
variability of the results quite a bit, I suspect. random() is not
to be trusted everywhere, but I think erand48 is pretty much the same
wherever it exists at all (and src/port/ provides it elsewhere).

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-11 13:35:51
Message-ID: 20110211133551.GE4116@tamriel.snowman.net

Greg,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> > Poking around a bit more, I just discovered another possible approach is
> > to use erand48 instead of rand in pgbench, which is either provided by
> > the OS or emulated in src/port/erand48.c That's way more resolution
> > than needed here, given that 2^48 pgbench accounts would be a scale of
> > 2.8M, which makes for a database of about 42 petabytes.
>
> I think that might be a good idea --- it'd reduce the cross-platform
> variability of the results quite a bit, I suspect. random() is not
> to be trusted everywhere, but I think erand48 is pretty much the same
> wherever it exists at all (and src/port/ provides it elsewhere).

Works for me. Greg, will you be able to work on this change? If not, I
might be able to.

Thanks,

Stephen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-16 02:41:24
Message-ID: AANLkTinQadRK_DEYf2ezwGebzppxvwYpwHSBcx7i0_am@mail.gmail.com

On Fri, Feb 11, 2011 at 8:35 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Greg,
>
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> Greg Smith <greg(at)2ndquadrant(dot)com> writes:
>> > Poking around a bit more, I just discovered another possible approach is
>> > to use erand48 instead of rand in pgbench, which is either provided by
>> > the OS or emulated in src/port/erand48.c  That's way more resolution
>> > than needed here, given that 2^48 pgbench accounts would be a scale of
>> > 2.8M, which makes for a database of about 42 petabytes.
>>
>> I think that might be a good idea --- it'd reduce the cross-platform
>> variability of the results quite a bit, I suspect.  random() is not
>> to be trusted everywhere, but I think erand48 is pretty much the same
>> wherever it exists at all (and src/port/ provides it elsewhere).
>
> Works for me.  Greg, will you be able to work on this change?  If not, I
> might be able to.

Seeing as how this patch has not been updated, I think it's time to
mark this one Returned with Feedback.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-16 13:15:41
Message-ID: 4D5BCDFD.2010703@2ndquadrant.com

Tom Lane wrote:
> I think that might be a good idea --- it'd reduce the cross-platform
> variability of the results quite a bit, I suspect. random() is not
> to be trusted everywhere, but I think erand48 is pretty much the same
> wherever it exists at all (and src/port/ provides it elsewhere).
>

Given that pgbench will run with threads in some multi-worker
configurations, after some more portability research I think odds are
good we'd get nailed by
http://sourceware.org/bugzilla/show_bug.cgi?id=10320 : "erand48
implementation not thread safe but POSIX says it should be". The AIX
docs have a similar warning on them, so who knows how many versions of
that library have the same issue.

Maybe we could make sure the one in src/port/ is thread safe and make
sure pgbench only uses it. This whole area continues to be messy enough
that I think the patch needs to brew for another CF before it will all
be sorted out properly. I'll mark it accordingly and can pick this back
up later.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2011-02-16 15:40:31
Message-ID: 25309.1297870831@sss.pgh.pa.us

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> Given that pgbench will run with threads in some multi-worker
> configurations, after some more portability research I think odds are
> good we'd get nailed by
> http://sourceware.org/bugzilla/show_bug.cgi?id=10320 : "erand48
> implementation not thread safe but POSIX says it should be". The AIX
> docs have a similar warning on them, so who knows how many versions of
> that library have the same issue.

FWIW, I think that bug report is effectively complaining that if you use
both drand48 and erand48, the former can impact the latter. If you use
only erand48, I don't see that there's any problem.

regards, tom lane


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2012-12-21 06:16:12
Message-ID: CABwTF4WdTZuooQVHSOSLrWWfyMQeBcGwLKzSBSsTuQxVY8hDTQ@mail.gmail.com

On Wed, Feb 16, 2011 at 8:15 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> Tom Lane wrote:
>
>> I think that might be a good idea --- it'd reduce the cross-platform
>> variability of the results quite a bit, I suspect. random() is not
>> to be trusted everywhere, but I think erand48 is pretty much the same
>> wherever it exists at all (and src/port/ provides it elsewhere).
>>
>>
>
> Given that pgbench will run with threads in some multi-worker
> configurations, after some more portability research I think odds are good
> we'd get nailed by http://sourceware.org/bugzilla/show_bug.cgi?id=10320 : "erand48 implementation not thread safe but POSIX says it should be".
> The AIX docs have a similar warning on them, so who knows how many
> versions of that library have the same issue.
>
> Maybe we could make sure the one in src/port/ is thread safe and make sure
> pgbench only uses it. This whole area continues to be messy enough that I
> think the patch needs to brew for another CF before it will all be sorted
> out properly. I'll mark it accordingly and can pick this back up later.
>

Hi Greg,

I spent some time rebasing this patch to current master. Attached is
the patch, based on a master a couple of commits old.

Your concern of using erand48() has been resolved since pgbench now
uses thread-safe and concurrent pg_erand48() from src/port/.

The patch is very much what you had posted, except for a couple of
differences due to bit-rot: (i) I didn't have to #define MAX_RANDOM_VALUE64,
since its cousin MAX_RANDOM_VALUE is no longer used by the code, and (ii) I
used a ternary operator in the DDLs[] array to decide when to use bigint vs.
int columns.

Please review.

As for tests, I am currently running 'pgbench -i -s 21474' using
unpatched pgbench, and am recording the time taken; scale factor 21475 had
actually failed to do anything meaningful using unpatched pgbench. Next
I'll run with '-s 21475' on the patched version to see if it does the right
thing, and in acceptable time compared to '-s 21474'.

What tests would you and others like to see, to get some confidence in
the patch? The machine that I have access to has 62 GB RAM, 16-core
64-hw-threads, and about 900 GB of disk space.

Linux <host> 3.2.6-3.fc16.ppc64 #1 SMP Fri Feb 17 21:41:20 UTC 2012 ppc64
ppc64 ppc64 GNU/Linux

Best regards,

PS: The primary source of patch is this branch:
https://github.com/gurjeet/postgres/tree/64bit_pgbench
--
Gurjeet Singh

http://gurjeet.singh.im/

Attachment Content-Type Size
pgbencg-64-v6.patch application/octet-stream 8.5 KB

From: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2013-01-27 04:24:27
Message-ID: CAA8sozcukQOfmio7RGPUX3HnAHxPs=VV22u1xkaE49L0dHGm0Q@mail.gmail.com

Hi,

I have reviewed this patch.

https://commitfest.postgresql.org/action/patch_view?id=1068

2012/12/21 Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>:
> The patch is very much what you had posted, except for a couple of
> differences due to bit-rot. (i) I didn't have to #define MAX_RANDOM_VALUE64
> since its cousin MAX_RANDOM_VALUE is not used by code anymore, and (ii) I
> used ternary operator in DDLs[] array to decide when to use bigint vs int
> columns.
>
> Please review.
>
> As for tests, I am currently running 'pgbench -i -s 21474' using
> unpatched pgbench, and am recording the time taken; scale factor 21475 had
> actually failed to do anything meaningful using unpatched pgbench. Next I'll
> run with '-s 21475' on patched version to see if it does the right thing,
> and in acceptable time compared to '-s 21474'.
>
> What tests would you and others like to see, to get some confidence in
> the patch? The machine that I have access to has 62 GB RAM, 16-core
> 64-hw-threads, and about 900 GB of disk space.

I have tested this patch, and have confirmed that the columns
for aid are switched to using bigint, instead of int,
when the scale factor >= 20,000.
(aid columns would exceed the upper bound of int when sf > 21474.)

Also, I added a few fixes on it.

- Fixed to apply for the current git master.
- Fixed to suppress a few more warnings about INT64_FORMAT.
- Minor improvement in the docs. (just my suggestion)

I attached the revised one.

Regards,
--
Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Uptime Technologies, LLC http://www.uptime.jp/

Attachment Content-Type Size
pgbench-64-v7.patch application/octet-stream 9.4 KB

From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Satoshi Nagayasu <snaga(at)uptime(dot)jp>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2013-01-28 21:30:51
Message-ID: CABwTF4WpWqYHWYU9TaRmefDYDBWxFjjNJ0TWadGzBAP-mGxuKw@mail.gmail.com

On Sat, Jan 26, 2013 at 11:24 PM, Satoshi Nagayasu <snaga(at)uptime(dot)jp> wrote:

> Hi,
>
> I have reviewed this patch.
>
> https://commitfest.postgresql.org/action/patch_view?id=1068
>
> 2012/12/21 Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>:
> > The patch is very much what you had posted, except for a couple of
> > differences due to bit-rot. (i) I didn't have to #define
> MAX_RANDOM_VALUE64
> > since its cousin MAX_RANDOM_VALUE is not used by code anymore, and (ii) I
> > used ternary operator in DDLs[] array to decide when to use bigint vs int
> > columns.
> >
> > Please review.
> >
> > As for tests, I am currently running 'pgbench -i -s 21474' using
> > unpatched pgbench, and am recording the time taken; scale factor 21475 had
> > actually failed to do anything meaningful using unpatched pgbench. Next
> I'll
> > run with '-s 21475' on patched version to see if it does the right thing,
> > and in acceptable time compared to '-s 21474'.
> >
> > What tests would you and others like to see, to get some confidence
> in
> > the patch? The machine that I have access to has 62 GB RAM, 16-core
> > 64-hw-threads, and about 900 GB of disk space.
>
> I have tested this patch, and have confirmed that the columns
> for aid are switched to using bigint, instead of int,
> when the scale factor >= 20,000.
> (aid columns would exceed the upper bound of int when sf > 21474.)
>
> Also, I added a few fixes on it.
>
> - Fixed to apply for the current git master.
> - Fixed to suppress a few more warnings about INT64_FORMAT.
> - Minor improvement in the docs. (just my suggestion)
>
> I attached the revised one.
>

Looks good to me. Thanks!

--
Gurjeet Singh

http://gurjeet.singh.im/


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: Satoshi Nagayasu <snaga(at)uptime(dot)jp>, Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] pgbench to the MAXINT
Date: 2013-01-29 10:12:41
Message-ID: 5107A099.6030208@vmware.com

On 28.01.2013 23:30, Gurjeet Singh wrote:
> On Sat, Jan 26, 2013 at 11:24 PM, Satoshi Nagayasu<snaga(at)uptime(dot)jp> wrote:
>
>> 2012/12/21 Gurjeet Singh<singh(dot)gurjeet(at)gmail(dot)com>:
>>> The patch is very much what you had posted, except for a couple of
>>> differences due to bit-rot. (i) I didn't have to #define
>> MAX_RANDOM_VALUE64
>>> since its cousin MAX_RANDOM_VALUE is not used by code anymore, and (ii) I
>>> used ternary operator in DDLs[] array to decide when to use bigint vs int
>>> columns.
>>>
>>> Please review.
>>>
>>> As for tests, I am currently running 'pgbench -i -s 21474' using
>>> unpatched pgbench, and am recording the time taken; scale factor 21475 had
>>> actually failed to do anything meaningful using unpatched pgbench. Next
>> I'll
>>> run with '-s 21475' on patched version to see if it does the right thing,
>>> and in acceptable time compared to '-s 21474'.
>>>
>>> What tests would you and others like to see, to get some confidence
>> in
>>> the patch? The machine that I have access to has 62 GB RAM, 16-core
>>> 64-hw-threads, and about 900 GB of disk space.
>>
>> I have tested this patch, and have confirmed that the columns
>> for aid are switched to using bigint, instead of int,
>> when the scale factor >= 20,000.
>> (aid columns would exceed the upper bound of int when sf > 21474.)
>>
>> Also, I added a few fixes on it.
>>
>> - Fixed to apply for the current git master.
>> - Fixed to suppress a few more warnings about INT64_FORMAT.
>> - Minor improvement in the docs. (just my suggestion)
>>
>> I attached the revised one.
>
> Looks good to me. Thanks!

Ok, committed.

At some point, we might want to have a strtoll() implementation in src/port.

- Heikki