From: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: <pgsql-advocacy(at)postgresql(dot)org>
Subject: Re: Fwd: Re: [PERFORM] MySQL vs PG TPC-H benchmarks
Date: 2004-04-21 18:51:34
Message-ID: 6EE64EF3AB31D5448D0007DD34EEB34101ADC9@Herge.rcsinc.local
Lists: pgsql-advocacy


> Perhaps one of the advocacy team will pick up the baton?
He is using COPY to load the data...I can't think of any earthly
reason why it takes more than a day to load 10 GB of data. It
probably all boils down to the default shared_buffers setting. I
don't even really consider this 'optimizing', just basic
configuration to match the software to the machine.
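
For reference, the current value is easy to check from psql; the exact
default varies by version, but a stock install is sized far too small
for a 10 GB data set (raising it means editing postgresql.conf and
restarting the postmaster):

    -- Check the current setting; on an untuned install this usually
    -- comes back very small (a modest number of 8 kB buffers).
    SHOW shared_buffers;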

Also, the create table statements do not have primary/foreign key
definitions...from his comments on the results page it's not clear if
this is intentional...
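
If he does want RI, the keys could be added after the load along these
lines (just a sketch; the column names follow the TPC-H spec, but his
actual DDL may differ):

    -- Primary keys per the TPC-H schema
    ALTER TABLE orders   ADD PRIMARY KEY (o_orderkey);
    ALTER TABLE lineitem ADD PRIMARY KEY (l_orderkey, l_linenumber);

    -- Foreign key from lineitem back to orders
    ALTER TABLE lineitem ADD FOREIGN KEY (l_orderkey)
        REFERENCES orders (o_orderkey);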

If RI isn't set up properly, that may explain why the results are off.
Perhaps the data-generating app is not functioning properly in some
way (this might explain the TPC errors as well). The fact that his
results are not returning the correct row counts is setting off
warning bells. Most of the use cases are relatively simple joins,
actually. Maybe one of the key columns is all nulls, or some similar
strangeness.

It would be useful to know whether his server is I/O- or CPU-bound.
My guess is that the server is swapping stuff all the while...

Running ANALYZE after the import can work wonders...or does it? I
don't usually use COPY to do the import, so I'm not certain. Perhaps
create the indexes/constraints after the import?
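
Roughly the load order I have in mind (the table, file, and index
names here are just placeholders, not taken from his kit):

    -- Bulk load first, with no indexes or constraints in place
    COPY lineitem FROM '/tmp/lineitem.tbl' WITH DELIMITER '|';

    -- Then build indexes and gather planner statistics
    CREATE INDEX lineitem_orderkey_idx ON lineitem (l_orderkey);
    ANALYZE lineitem;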

Some EXPLAIN output might be helpful. Still, shared_buffers is the
first thing to look at. Maybe if I get around to it, I'll try TPC-H
out here.
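
Even something like this against his loaded data would tell us a lot
(a cut-down take on Q1 just to see the plan, not the full query from
the spec):

    EXPLAIN ANALYZE
    SELECT l_returnflag, l_linestatus,
           sum(l_quantity) AS sum_qty,
           count(*)        AS count_order
    FROM lineitem
    WHERE l_shipdate <= date '1998-12-01' - interval '90 days'
    GROUP BY l_returnflag, l_linestatus;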

Merlin


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-advocacy(at)postgresql(dot)org
Subject: Re: Fwd: Re: [PERFORM] MySQL vs PG TPC-H benchmarks
Date: 2004-04-21 21:39:58
Message-ID: Pine.LNX.4.58.0404220738290.27222@linuxworld.com.au
Lists: pgsql-advocacy

On Wed, 21 Apr 2004, Merlin Moncure wrote:

>
> > Perhaps one of the advocacy team will pick up the baton?
> He is using COPY to load the data...I can't think of any earthly
> reason why it takes more than a day to load 10 GB of data. It
> probably all boils down to the default shared_buffers setting. I
> don't even really consider this 'optimizing', just basic
> configuration to match the software to the machine.

The indexes should be created after the data is loaded. There is no
mention of VACUUM, ANALYZE or tuning.
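
Even a single database-wide pass after the load would presumably help
(the lazy version; per-table ANALYZE runs would do just as well):

    -- Reclaims dead space and collects planner statistics for
    -- every table in the current database.
    VACUUM ANALYZE;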

I've generated the workload overnight and am going to have a quick
look at it when I get a minute.

Gavin


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-advocacy(at)postgresql(dot)org
Subject: Re: Fwd: Re: [PERFORM] MySQL vs PG TPC-H benchmarks
Date: 2004-04-22 08:55:06
Message-ID: Pine.LNX.4.58.0404221852310.2034@linuxworld.com.au
Lists: pgsql-advocacy

On Thu, 22 Apr 2004, Gavin Sherry wrote:

> On Wed, 21 Apr 2004, Merlin Moncure wrote:
>
> >
> > > Perhaps one of the advocacy team will pick up the baton?
> > He is using COPY to load the data...I can't think of any earthly
> > reason why it takes more than a day to load 10 GB of data. It
> > probably all boils down to the default shared_buffers setting. I
> > don't even really consider this 'optimizing', just basic
> > configuration to match the software to the machine.
>
> The indexes should be created after the data is loaded. There is no
> mention of VACUUM, ANALYZE or tuning.
>
> I've generated the workload overnight and am going to have a quick
> look at it when I get a minute.

With a 10 GB workload on an old Duron with 512 MB of RAM and an IDE
disk, it took 30-odd minutes to load the data and 9 hours to build
the indexes. Unfortunately I'm out of disk! I'll run the 1 GB
workload instead when I get some time, but I can imagine that an
ANALYZE will make a dramatic difference to those query times.

Gavin