Re: 64-bit pgbench V2

Lists: pgsql-hackers
From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: 64-bit pgbench V2
Date: 2010-07-05 23:48:22
Message-ID: 4C326F46.4050801@2ndquadrant.com

Attached is an updated second rev of the patch I sent a few months ago,
to expand pgbench to support database scales larger than around
4,294--where the 32-bit integer for the account number overflows in the
current version. The current limit makes for about a 60GB database.
Last week I ran this on a system with 72GB of RAM, a size that is already
quite common, and wasn't able to get a test that didn't fit in RAM.
Without a bug fix here I am concerned that pgbench will ship in 9.0
already obsolete for the generation of hardware it is going to be
deployed on.

The main tricky part was figuring how to convert the \setshell
implementation. That uses strtol to parse the number that should have
been returned by the shell call. It turns out there are a stack of ways
to do something similar but return 64 bits instead:

* strtoll is defined by ISO C99
* strtoq was used on some earlier BSD systems
* MSVC has _strtoi64 for signed and _strtoui64 for unsigned 64bit integers

According to the gnulib docs at
http://www.gnu.org/software/gnulib/manual/html_node/strtoll.html ,
strtoll is missing on HP-UX 11, OSF/1 5.1, Interix 3.5, so one of the
HP-UX boxes might be a useful testbed for what works on a trickier platform.

For prototype purposes, I wrote the patch to include some minimal logic
to map the facility available to strtoint64, falling back to the 32-bit
strtol if that's the best available. There are three ways I could
foresee this going:

1) Keep this ugly bit of code isolated to pgbench
2) Move it to src/include/c.h where the other 64-bit int abstraction is done
3) Push the problem toward autoconf

I don't have a clear argument for or against those individual options;
they all seem reasonable from some perspectives.
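
The mapping described above might look roughly like the sketch below. It
is an illustration, not the attached patch: HAVE_STRTOLL and HAVE_STRTOQ
are assumed to be configure-provided macros, and STRTOINT64_BITS and
parse_shell_result are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * HAVE_STRTOLL and HAVE_STRTOQ are assumed to come from a configure-style
 * header; they are illustrative here, not taken from the patch.
 */
#if defined(_MSC_VER)
#define strtoint64(str, end, base) ((int64_t) _strtoi64((str), (end), (base)))
#define STRTOINT64_BITS 64
#elif defined(HAVE_STRTOLL) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
#define strtoint64(str, end, base) ((int64_t) strtoll((str), (end), (base)))
#define STRTOINT64_BITS 64
#elif defined(HAVE_STRTOQ)
#define strtoint64(str, end, base) ((int64_t) strtoq((str), (end), (base)))
#define STRTOINT64_BITS 64
#else
/* Last resort: values are limited to the width of long on this platform. */
#define strtoint64(str, end, base) ((int64_t) strtol((str), (end), (base)))
#define STRTOINT64_BITS 32		/* nominal; long may be wider */
#endif

/* Parse the text a shell command printed; yields 0 when no digits are found. */
static int64_t
parse_shell_result(const char *text)
{
	char	   *end;

	assert(text != NULL);
	return strtoint64(text, &end, 10);
}
```

With this layout callers only ever see strtoint64, and STRTOINT64_BITS
records whether the 32-bit fallback was selected.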

The only open issue I'm not sure about is whether the situation where
the code falls back to 32 bits should be documented, or even a warning
produced if you create something at such a scale without some strtoll
available. Given that it only impacts the \setshell case, it's not
really a disaster that it might not work, so long as there's
documentation explaining the potential limitations. I'll write that if
necessary, but I think that some testing on known tricky platforms that
I don't have set up here is the best next step, so I'm looking for
feedback on that.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

Attachment: pgbench-64-v2.patch (text/x-patch, 4.9 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit pgbench V2
Date: 2010-07-06 00:17:24
Message-ID: 13407.1278375444@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> The main tricky part was figuring how to convert the \setshell
> implementation. That uses strtol to parse the number that should have
> been returned by the shell call. It turns out there are a stack of ways
> to do something similar but return 64 bits instead:

Please choose a way that doesn't introduce new portability assumptions.
The backend gets along fine without strtoll, and I don't see why pgbench
should have to require it.

(BTW, I don't actually believe that the proposed code works at all,
since in general strtoll or other variants aren't going to be macros,
but plain functions.)
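
The macro-versus-function point can be illustrated with a small sketch.
It assumes the availability test was written as an #ifdef on the function
name itself, which this thread does not show; both helper names below are
hypothetical.

```c
#include <stdlib.h>

/*
 * Returns 1 only if the preprocessor can see a macro named strtoll.
 * When the C library declares strtoll as an ordinary function, this
 * returns 0 even though the function exists and links fine.
 */
static int
strtoll_visible_to_preprocessor(void)
{
#ifdef strtoll
	return 1;
#else
	return 0;
#endif
}

/* The function itself is still callable when the C library provides it. */
static long long
call_strtoll(const char *text)
{
	return strtoll(text, NULL, 10);
}
```

An #ifdef of this kind only sees macros, so on a library where strtoll is
a plain function the first helper reports 0 and such a test would always
take the fallback branch. A configure-time probe that defines something
like HAVE_STRTOLL avoids that.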

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit pgbench V2
Date: 2010-07-06 00:32:17
Message-ID: AANLkTill9PpjmUxiHx7hScosM7E2zB6f4h2Z1lnQrpRd@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 5, 2010 at 8:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Smith <greg(at)2ndquadrant(dot)com> writes:
>> The main tricky part was figuring how to convert the \setshell
>> implementation.  That uses strtol to parse the number that should have
>> been returned by the shell call.  It turns out there are a stack of ways
>> to do something similar but return 64 bits instead:
>
> Please choose a way that doesn't introduce new portability assumptions.
> The backend gets along fine without strtoll, and I don't see why pgbench
> should have to require it.

It doesn't seem very palatable to have multiple handwritten integer
parsers floating around the code base either. Maybe we should try to
standardize something and ship it in src/port, or somesuch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit pgbench V2
Date: 2010-07-06 15:01:36
Message-ID: 4C334550.9090607@2ndquadrant.com
Lists: pgsql-hackers

Robert Haas wrote:
> It doesn't seem very palatable to have multiple handwritten integer
> parsers floating around the code base either. Maybe we should try to
> standardize something and ship it in src/port, or somesuch

I was considering at one point making two trips through strtol, each
allowed to gobble 10 characters, then combining the two--just to cut
down a little bit on the roll your own parser aspects here. I hadn't
really considered how the main server does this job though. If there's
something reasonable to expose by refactoring some code that's already
there, I could take a stab at that. I'm not exactly sure where the
server's integer parsing code that would be appropriate to break out
lives, though.
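
A rough sketch of the two-trips idea follows. It departs from the
description above in one respect: each trip takes at most 9 digits rather
than 10, since a 10-digit chunk can overflow a 32-bit long. That limits
the input to 18 digits. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK_DIGITS 9			/* 9 decimal digits always fit in 32 bits */

/*
 * Parse an optional sign followed by 1 to 18 decimal digits using two
 * strtol calls. Returns false on anything else, including trailing text.
 */
static bool
parse_int64_two_trips(const char *str, int64_t *result)
{
	bool		negative = false;
	size_t		ndigits;
	size_t		high_len;
	char		high[CHUNK_DIGITS + 1] = "0";
	char		low[CHUNK_DIGITS + 1];
	int64_t		value;

	if (*str == '-' || *str == '+')
		negative = (*str++ == '-');

	ndigits = strspn(str, "0123456789");
	if (ndigits == 0 || ndigits > 2 * CHUNK_DIGITS || str[ndigits] != '\0')
		return false;

	/* Everything before the last 9 digits goes into the high chunk. */
	high_len = (ndigits > CHUNK_DIGITS) ? ndigits - CHUNK_DIGITS : 0;
	if (high_len > 0)
	{
		memcpy(high, str, high_len);
		high[high_len] = '\0';
	}
	memcpy(low, str + high_len, ndigits - high_len);
	low[ndigits - high_len] = '\0';

	value = (int64_t) strtol(high, NULL, 10) * INT64_C(1000000000)
		+ (int64_t) strtol(low, NULL, 10);
	*result = negative ? -value : value;
	return true;
}
```

The digit counting and splitting still amount to a small hand-rolled
parser, so this saves less code than it might first appear.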

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit pgbench V2
Date: 2010-07-06 15:03:24
Message-ID: AANLkTilfU2ZxPo_YdspkDDz2ENM8Yh5PGOUHLqtN8TBo@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 6, 2010 at 11:01 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Robert Haas wrote:
>>
>> It doesn't seem very palatable to have multiple handwritten integer
>> parsers floating around the code base either.  Maybe we should try to
>> standardize something and ship it in src/port, or somesuch
>
> I was considering at one point making two trips through strtol, each allowed
> to gobble 10 characters, then combining the two--just to cut down a little
> bit on the roll your own parser aspects here.  I hadn't really considered
> how the main server does this job though.  If there's something reasonable
> to expose by refactoring some code that's already there, I could take a stab
> at that.  I'm not exactly sure where the server's integer parsing code that
> would be appropriate to break out lives, though.

Take a look at int8in. It's got some backend-specific stuff in it ATM
but maybe it would be reasonable to try to factor that out somehow.
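
For illustration, a standalone 64-bit parser with no strtoll dependency
could look something like the sketch below. It is not the backend's
int8in code, only a minimal example of the same general approach, and
parse_int64 is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Parse optional whitespace, an optional sign, decimal digits, and
 * optional trailing whitespace. Returns false on bad syntax or overflow.
 */
static bool
parse_int64(const char *str, int64_t *result)
{
	const char *ptr = str;
	int64_t		acc = 0;		/* accumulated as a negative number */
	bool		negative = false;

	while (*ptr == ' ' || *ptr == '\t' || *ptr == '\n' || *ptr == '\r')
		ptr++;

	if (*ptr == '-' || *ptr == '+')
		negative = (*ptr++ == '-');

	if (*ptr < '0' || *ptr > '9')
		return false;			/* need at least one digit */

	while (*ptr >= '0' && *ptr <= '9')
	{
		int			digit = *ptr++ - '0';

		/* Reject before acc * 10 - digit could drop below INT64_MIN. */
		if (acc < (INT64_MIN + digit) / 10)
			return false;
		acc = acc * 10 - digit;
	}

	while (*ptr == ' ' || *ptr == '\t' || *ptr == '\n' || *ptr == '\r')
		ptr++;
	if (*ptr != '\0')
		return false;			/* trailing garbage */

	if (!negative)
	{
		if (acc == INT64_MIN)
			return false;		/* magnitude has no positive counterpart */
		acc = -acc;
	}
	*result = acc;
	return true;
}
```

Accumulating on the negative side lets the most negative 64-bit value
parse without a special case, since its magnitude does not fit in a
positive int64_t.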

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit pgbench V2
Date: 2010-07-12 19:56:51
Message-ID: 4C3B7383.5010200@2ndquadrant.com
Lists: pgsql-hackers

Tom Lane wrote:
> Please choose a way that doesn't introduce new portability assumptions.
> The backend gets along fine without strtoll, and I don't see why pgbench
> should have to require it.
>

Funny you should mention this...it turns out there is some code already
there. I just didn't notice it before because only the unsigned 64-bit
strtoull is used, not the signed one I was looking for, and it's only
called in one place I didn't previously check.
src/interfaces/ecpg/ecpglib/data.c does this:

    *((unsigned long long int *) (var + offset * act_tuple)) =
        strtoull(pval, &scan_length, 10);

The appropriate autoconf magic was in the code all along for both
versions, so my bad not noticing it until now. It even transparently
remaps the BSD-ism of calling it strtoq.
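
As a point of reference, a call of that shape is usually paired with
checks on the end pointer and errno. The sketch below shows one common
pattern, not what ecpglib itself does after the call; parse_unsigned_64
is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Convert pval with strtoull and report failure when no digits were
 * consumed or the value is out of range. Note that strtoull accepts a
 * leading minus sign and negates the result, which this sketch does not
 * try to reject.
 */
static bool
parse_unsigned_64(const char *pval, unsigned long long *out)
{
	char	   *scan_end;
	unsigned long long value;

	errno = 0;
	value = strtoull(pval, &scan_end, 10);

	if (scan_end == pval)
		return false;			/* nothing parsed */
	if (errno == ERANGE)
		return false;			/* larger than ULLONG_MAX */

	*out = value;
	return true;
}
```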

I suspect that this alone isn't sufficient to make the code I'm trying
to wedge into pgbench always work on the platforms I consider must
haves, because of the weird names like _strtoi64 that Windows uses:
http://msdn.microsoft.com/en-us/library/h80404d3(v=VS.80).aspx . In fact,
I wouldn't be surprised to discover the ECPG code above doesn't do the
right thing if compiled with a 64-bit MSVC version. I don't expect that's
a popular combination to test explicitly in a way that hits the code
path containing this line.

The untested (I need to set up a Windows build environment to really
confirm this works) next patch attempt I've attached does what I think is
the right general sort of thing here. It extends the autoconf remapping
that was already being done to include the second variation on how the
function needed can be named in an MSVC build. This might improve the
ECPG compatibility issue I theorize could be there on that platform.
Given the autoconf stuff and use of the unsigned version was already a
dependency, I'd rather improve that code (so it's more obvious when it
is broken) than do the refactoring work suggested to re-use the server's
internal 64-bit parsing method instead. I could split this into two
patches instead--"add 64-bit strtoull/strtoll support for MSVC" on the
presumption it's actually broken now (possibly wrong on my part) and
"make pgbench use 64-bit values"--but it's not so complicated as one.

I expect there is almost zero overlap between "needs pgbench setshell to
return >32 bit return values" and "not on a platform with a working
64-bit strtoull variation". What I did to hedge against that was add a
little check to pgbench that lets you confirm whether setshell lines are
limited to 32 bits or not, depending on whether the appropriate function
was found. It tries to fall back to the existing strtol in that case,
and I've put a note when that happens (and matching documentation to
look for it) into the debug output of the program.
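
One way such a check could work, independent of any configure macros, is
to parse a value just past the 32-bit range and see whether it survives.
This is only a sketch of the idea, not the check in the attached patch;
the type, the function names, and the message wording are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Signature shared by strtoll and by wrappers around narrower parsers. */
typedef long long (*int_parser) (const char *, char **, int);

/* Wrapper so the strtol fallback can be probed through the same type. */
static long long
parse_with_strtol(const char *str, char **end, int base)
{
	return strtol(str, end, base);
}

/* True if the parser round-trips 2^32, which needs more than 32 bits. */
static bool
parser_handles_64_bits(int_parser parse)
{
	return parse("4294967296", NULL, 10) == 4294967296LL;
}

/* Emit a debug note describing the limit (the wording is illustrative). */
static void
report_setshell_width(FILE *out, int_parser parse)
{
	if (parser_handles_64_bits(parse))
		fprintf(out, "setshell values: 64-bit parsing available\n");
	else
		fprintf(out, "setshell values: limited to 32 bits\n");
}
```

Calling report_setshell_width(stderr, parse_with_strtol) shows which
branch the strtol fallback would take on a given platform; the answer
depends on the width of long there.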

I'll continue with testing work here, but what's attached is now the
first form I think this could potentially be committed in once it's
known to be free of obvious bugs (testing at this database scale takes
forever). I can revisit avoiding the library function if Tom
or someone else really opposes this new approach. Given most of the
autoconf bits are already there and the limited number of platforms
where this is a problem, I think there's little gain from doing that work
though.

Style/functional suggestions appreciated.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us

Attachment: pgbench-64-v3.patch (text/x-patch, 7.8 KB)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit pgbench V2
Date: 2011-02-06 16:09:42
Message-ID: 201102061609.p16G9gT17831@momjian.us
Lists: pgsql-hackers


What happened to this idea/patch?


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64-bit pgbench V2
Date: 2011-02-06 19:04:38
Message-ID: 4D4EF0C6.4090801@timbira.com
Lists: pgsql-hackers

On 06-02-2011 13:09, Bruce Momjian wrote:
>
> What happened to this idea/patch?
>
I refactored the patch [1] to not depend on strtoll.

[1] http://archives.postgresql.org/message-id/4D2CCCD9.802@timbira.com

--
Euler Taveira de Oliveira
http://www.timbira.com/