From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pgbench --unlogged-tables
Date: 2011-07-22 20:40:37
Message-ID: CA+TgmoZJqPcnGFDoWasz0EdnB=7d6wawvDLCoTzSmT9F1TZteQ@mail.gmail.com
Lists: pgsql-hackers

I know I'm not the only one to hack up pgbench to create unlogged
tables, so I thought maybe it would be useful to have an option to do
that.

I wasn't excited about picking a single-letter option name, so I
modified pgbench to use getopt_long. Patch attached.
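
The shape of it is roughly like this (a minimal sketch rather than the
patch itself; --unlogged-tables is the real option name, everything
else here is illustrative):

#include <getopt.h>
#include <stdio.h>

static int unlogged_tables = 0;

int main(int argc, char **argv)
{
    static const struct option long_options[] = {
        /* long-only option: getopt_long sets the flag itself */
        {"unlogged-tables", no_argument, &unlogged_tables, 1},
        {NULL, 0, NULL, 0}
    };
    int c;

    while ((c = getopt_long(argc, argv, "i", long_options, NULL)) != -1)
    {
        switch (c)
        {
            case 0:
                /* a long option set its flag; nothing more to do */
                break;
            case 'i':
                /* existing single-letter options keep working */
                break;
            default:
                fprintf(stderr, "unknown option\n");
                return 1;
        }
    }

    printf("create %stable pgbench_accounts (...)\n",
           unlogged_tables ? "unlogged " : "");
    return 0;
}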

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: pgbench-unlogged.patch (application/octet-stream, 4.0 KB)

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench --unlogged-tables
Date: 2011-07-22 21:15:37
Message-ID: 4E29E879.4040709@2ndQuadrant.com
Lists: pgsql-hackers

That looks straightforward enough. The other thing I keep realizing
would be useful is to allow specifying a different tablespace to
switch to when creating all of the indexes. The old "data here,
indexes on faster storage here" trick was already popular in some
environments. But it's becoming a really big win for environments that
put indexes on SSD, and being able to simulate that easily with pgbench
would be nice.
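
In DDL terms all pgbench would have to do is tack a USING INDEX
TABLESPACE clause onto the statements that build its primary keys.
A sketch of the string-building side, with a made-up option value
(none of this is from Robert's patch):

#include <stdio.h>

int main(void)
{
    /* hypothetical value of a --index-tablespace=NAME switch */
    const char *index_tablespace = "ssd1";
    char ddl[256];

    if (index_tablespace != NULL)
        snprintf(ddl, sizeof(ddl),
                 "alter table pgbench_accounts add primary key (aid)"
                 " using index tablespace %s", index_tablespace);
    else
        snprintf(ddl, sizeof(ddl),
                 "alter table pgbench_accounts add primary key (aid)");
    puts(ddl);
    return 0;
}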

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: David Fetter <david(at)fetter(dot)org>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench --unlogged-tables
Date: 2011-07-23 00:15:13
Message-ID: 20110723001513.GB28783@fetter.org
Lists: pgsql-hackers

On Fri, Jul 22, 2011 at 05:15:37PM -0400, Greg Smith wrote:
> That looks straightforward enough. The other thing I keep realizing
> would be useful is to allow specifying a different tablespace to
> switch to when creating all of the indexes. The old
> "data here, indexes on faster storage here" trick was already
> popular in some environments. But it's becoming a really big win
> for environments that put indexes on SSD, and being able to simulate
> that easily with pgbench would be nice.

Do you have any theories as to how indexing on SSD speeds things up?
IIRC you found only marginal benefit in putting WALs there. Are there
cases that SSD helps more than others when it comes to indexing?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench --unlogged-tables
Date: 2011-07-23 02:15:08
Message-ID: 4E2A2EAC.5060002@2ndQuadrant.com
Lists: pgsql-hackers

On 07/22/2011 08:15 PM, David Fetter wrote:
> Do you have any theories as to how indexing on SSD speeds things up?
> IIRC you found only marginal benefit in putting WALs there. Are there
> cases that SSD helps more than others when it comes to indexing?
>

Yes, I've found a variety of workloads where using an SSD turns out to
be slower than the old-school array of drives with a battery-backed
write cache. Tiny commits are slower, sequential writes can easily be
slower, and if there isn't a random I/O component to the job, the SSD
never gets a chance to make up for that.

In the standard pgbench case, the heavy UPDATE traffic does a lot of
random writes to the index blocks of the pgbench_accounts table. Even
in cases where the whole index fits into RAM, having the indexes backed
by a faster store can end up speeding those up, particularly at
checkpoint time. And if you can't quite fit the whole index in RAM, but
it does fit on the SSD, being able to shuffle it in/out of flash as
needed to look up pointers to data blocks is a whole lot better than
seeking around a regular drive. That case is where the biggest win
seems to be.

I'd like to publish some hard numbers on all this, but have realized I
need to relocate just the pgbench indexes to do a good simulation. And
I'm getting tired of doing that manually. If I'm going to put time into
testing this unlogged table variation that Robert has submitted, and I
expect to, I'm just pointing out that I'd like to have the "index on
alternate tablespace" option available then too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench --unlogged-tables
Date: 2011-07-25 13:23:01
Message-ID: CA+Tgmob54uZNwQ7ZJ5aPmsGjfPfhRY93Nk4VuOrX1p0j2Hpb9A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 22, 2011 at 5:15 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> That looks straightforward enough.

OK, committed.

> The other thing I keep realizing would
> be useful is to allow specifying a different tablespace to switch
> to when creating all of the indexes.  The old "data here, indexes on faster
> storage here" trick was already popular in some environments.  But it's
> becoming a really big win for environments that put indexes on SSD, and
> being able to simulate that easily with pgbench would be nice.

Hearing no objections, I did this, too.

At some point, we also need to sort out the scale factor limit issues,
so you can make these things bigger.
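
For anyone who hasn't bumped into it: pgbench_accounts holds
scale * 100,000 rows, and aid is a plain 32-bit integer, so the
arithmetic runs out fast:

    scale 20000: 20,000 * 100,000 = 2,000,000,000 rows  (still fits)
    scale 21475: 21,475 * 100,000 = 2,147,500,000 rows  > 2^31 - 1 = 2,147,483,647

Anything much past scale 21474 overflows the key space.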

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench --unlogged-tables
Date: 2011-07-25 20:52:41
Message-ID: 4E2DD799.4000305@2ndQuadrant.com
Lists: pgsql-hackers

On 07/25/2011 09:23 AM, Robert Haas wrote:
> At some point, we also need to sort out the scale factor limit issues,
> so you can make these things bigger.
>

I had a patch to improve that whole situation, but it hasn't seemed to nag
at me recently. I forget why it seemed less important, but I doubt I'll
make it another six months without coming to some resolution there.

The two systems I have in for benchmarking right now have 128GB and
192GB of RAM in them, so large scales are exactly what I should be testing.
Unfortunately, it looks like the real-world limiting factor on doing
lots of tests at big scales is how long it takes to populate the data
set. For example, here's pgbench creation time on a big server (48
cores, 128GB RAM) with a RAID10 array, when scale=20000 (292GB):

real 174m12.055s
user 17m35.994s
sys 0m52.358s

And here's the same server putting the default tablespace (but not the
WAL) on [much faster flash device I can't talk about yet]:

Creating new pgbench tables, scale=20000
real 169m59.541s
user 18m19.527s
sys 0m52.833s

I was hoping for a bigger drop here; maybe I needed to use unlogged
tables? (ha!) I think I need to start looking at the pgbench data
generation stage as its own optimization problem. Given how expensive
systems this large are, I never get them for very long before they are
rushed into production. People don't like hearing that just generating
the data set for a useful test is going to take 3 hours; that tends to
limit how many test runs I can schedule.
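
One experiment on my list, sketched here only (this is not how
pgbench -i works today, and I haven't benchmarked it): stop streaming
the COPY data over the wire entirely and have the server generate
pgbench_accounts itself:

#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    /* connection parameters are illustrative */
    PGconn *con = PQconnectdb("dbname=pgbench");
    PGresult *res;

    if (PQstatus(con) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(con));
        return 1;
    }

    /* scale=20000 hard-coded for the sketch; 100,000 accounts per
     * branch, abalance 0, filler left blank, same as pgbench -i */
    res = PQexec(con,
                 "insert into pgbench_accounts "
                 "select aid, (aid - 1) / 100000 + 1, 0, '' "
                 "from generate_series(1, 20000 * 100000::bigint) aid");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s", PQerrorMessage(con));

    PQclear(res);
    PQfinish(con);
    return 0;
}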

And, yes, I'm going to try to sneak in some time to test fast-path
locking on one of these before they head into production.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: David Fetter <david(at)fetter(dot)org>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgbench --unlogged-tables
Date: 2011-07-25 23:16:01
Message-ID: 20110725231601.GF28754@fetter.org
Lists: pgsql-hackers

On Fri, Jul 22, 2011 at 10:15:08PM -0400, Greg Smith wrote:
> On 07/22/2011 08:15 PM, David Fetter wrote:
> >Do you have any theories as to how indexing on SSD speeds things
> >up? IIRC you found only marginal benefit in putting WALs there.
> >Are there cases that SSD helps more than others when it comes to
> >indexing?
>
> Yes, I've found a variety of workloads where using an SSD turns out
> to be slower than the old-school array of drives with a
> battery-backed write cache. Tiny commits are slower, sequential
> writes can easily be slower, and if there isn't a random I/O
> component to the job, the SSD never gets a chance to make up for that.

So you're saying this is more of a flash thing than an SSD thing? I
haven't heard of systems with PCM (phase-change memory) having this
limitation.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate