Re: Performance comparison between Postgres and Greenplum

Lists: pgsql-performance
From: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Performance comparison between Postgres and Greenplum
Date: 2009-07-13 11:23:41
Message-ID: OF2EBBDB72.9891A893-ON652575F2.0037D572-652575F2.003E5221@tcs.com

Hi,

I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
as well as a Greenplum DB.

The Primary key is a composite one comprising 2 columns (so_no,
serial_no).

The execution of the following query takes 8214.016 ms in Greenplum but
only 729.134 ms in Postgres:
select * from observation_all order by so_no, serial_no;

I believe that execution time in Greenplum should be less compared to
Postgres. Can anybody shed some light on this? It would be of great help.

Regards,

Suvankar Roy
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 03:40:38
Message-ID: dcc563d10907142040u21a48979g56dc197ff926848d@mail.gmail.com

On Mon, Jul 13, 2009 at 5:23 AM, Suvankar Roy<suvankar(dot)roy(at)tcs(dot)com> wrote:
>
> Hi,
>
> I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as
> well as a Greenplum DB.
>
> The Primary key is a composite one comprising 2 columns (so_no,
> serial_no).
>
> The execution of the following query takes 8214.016 ms in Greenplum but only
> 729.134 ms in Postgres:
> select * from observation_all order by so_no, serial_no;
>
> I believe that execution time in Greenplum should be less compared to
> Postgres. Can anybody shed some light on this? It would be of great help.

What versions are you comparing?


From: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 05:33:59
Message-ID: OFB530BC8D.5DC071A7-ON652575F4.0019F071-652575F4.001E4DE8@tcs.com

Hi Scott,

This is what I have got -

In Greenplum, the following query returns:

test_db1=# select version();
version
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat
4.1.2-44) compiled on Jun 4 2009 16:30:49
(1 row)

In Postgres, the same query returns:

postgres=# select version();
version
-----------------------------------------------------
PostgreSQL 8.3.7, compiled by Visual C++ build 1400
(1 row)

Regards,

Suvankar Roy
Tata Consultancy Services
Ph:- +91 33 66367352
Cell:- +91 9434666898



From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 09:30:41
Message-ID: dcc563d10907150230o67c7426bnb733ded769f997a8@mail.gmail.com

On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar(dot)roy(at)tcs(dot)com> wrote:
>
> Hi Scott,
>
> This is what I have got -
> In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
> i686-pc-linux-gnu, compiled by GCC gcc (GCC)

> In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
> (1 row)

I wouldn't expect 8.2.x to outrun 8.3.x


From: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 09:39:21
Message-ID: OF98B28EF5.74EAD5B9-ON652575F4.00344843-652575F4.0034BAA6@tcs.com

Hi Scott,

Thanks for your input Scott.

But then, being a Massively Parallel Processing database, is Greenplum not
expected to outperform even versions of Postgres newer than the one on
which it is based?

My notion was that GP 3.3 (based on PostgreSQL 8.2.13) would exceed PG
8.3.7.

It seems that I was wrong here.

Regards,

Suvankar Roy



From: Alex Goncharov <alex-goncharov(at)comcast(dot)net>
To: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 12:37:32
Message-ID: E1MR3jo-00079u-Ao@daland.home

,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) ----*
| I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
| as well as a Greenplum DB.
|
| The Primary key is a composite one comprising 2 columns (so_no,
| serial_no).
|
| The execution of the following query takes 8214.016 ms in Greenplum but
| only 729.134 ms in Postgres:
| select * from observation_all order by so_no, serial_no;
|
| I believe that execution time in Greenplum should be less compared to
| Postgres. Can anybody shed some light on this? It would be of great help.

Why do you believe so?

Is your data distributed and served by separate segment hosts? By how
many? Is the network connectivity not a factor? What happens with
the times if you don't sort your result set?

-- Alex -- alex-goncharov(at)comcast(dot)net --


From: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
To: Alex Goncharov <alex-goncharov(at)comcast(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 13:02:12
Message-ID: OF1BC69A48.3BA89FC8-ON652575F4.0045498B-652575F4.00474CE3@tcs.com

Hi Alex,

Yes, I have got 2 segments and a master host. So, in a way processing
should be faster in Greenplum.

Actually this is only a sort of Proof of Concept trial that I am carrying
out to notice differences between greenplum and postgres, if any.

For other queries though, results are satisfactory or at least comparable,
like-

select distinct so_no, serial_no from observation_all;
in postgres it takes - 1404.238 ms
in gp it takes - 1217.283 ms

Regards,

Suvankar Roy



From: Alex Goncharov <alex-goncharov(at)comcast(dot)net>
To: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
Cc: alex-goncharov(at)comcast(dot)net, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 13:18:12
Message-ID: E1MR4NA-0007Gd-FJ@daland.home

,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----*
| Yes, I have got 2 segments and a master host. So, in a way processing
| should be faster in Greenplum.

No, it should not: it all depends on your data, SQL statements and
setup.

In my own experiments, with small amounts of stored data, PostgreSQL
beats Greenplum, which doesn't surprise me a bit.

You need to know where most of the execution time goes -- maybe to
sorting? And sorting in Greenplum, isn't it done on one machine, the
master host? Why would that be faster than in PostgreSQL?
|
| For other queries though, results are satisfactory or at least comparable,
| like-
|
| select distinct so_no, serial_no from observation_all;
| in postgres it takes - 1404.238 ms
| in gp it takes - 1217.283 ms

No surprise here: the data is picked by multiple segment hosts and
never sorted on the master.
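
One way to check where the time goes, as asked above, is to run the same
scan with and without the ORDER BY and compare the reported timings on both
systems. The following is only a sketch: it uses the table and column names
from the original post, and it assumes EXPLAIN ANALYZE behaves on the
Greenplum side the way it does in stock PostgreSQL, which has not been
verified in this thread.

```sql
-- 1. Scan only, no sort. EXPLAIN ANALYZE executes the query and reports
--    per-node timings, but the result rows are discarded rather than
--    sent to the client.
EXPLAIN ANALYZE
SELECT * FROM observation_all;

-- 2. Same scan plus the sort on the composite primary key.
EXPLAIN ANALYZE
SELECT * FROM observation_all ORDER BY so_no, serial_no;
```

If the two plans report similar total runtimes, the sort is probably not
the main cost. Because the rows are not shipped to the client here, a large
gap between these numbers and the wall-clock time of the plain query would
point at data transfer instead.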

-- Alex -- alex-goncharov(at)comcast(dot)net --


From: Scott Mead <scott(dot)lists(at)enterprisedb(dot)com>
To: Alex Goncharov <alex-goncharov(at)comcast(dot)net>
Cc: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-15 15:33:03
Message-ID: d3ab2ec80907150833i4a0144b6h278f68807f23f2ec@mail.gmail.com

On Wed, Jul 15, 2009 at 9:18 AM, Alex Goncharov
<alex-goncharov(at)comcast(dot)net> wrote:

> ,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----*
> | Yes, I have got 2 segments and a master host. So, in a way processing
> | should be faster in Greenplum.
>
> No, it should not: it all depends on your data, SQL statements and
> setup.
>
> In my own experiments, with small amounts of stored data, PostgreSQL
> beats Greenplum, which doesn't surprise me a bit.

Agreed. You're only operating on 99,000 rows. That isn't really
enough rows to exercise the architecture of shared-nothing clusters.
Now, I don't know Greenplum very well, but I am familiar with another
warehousing product with approximately the same architecture behind
it. From all the testing I've done, you need to get into the 50
million plus row range before the architecture starts to be really
effective. 99,000 rows probably fit completely into memory on the
machine that you're testing PG with, so your test really isn't fair.
On one PG box, you're just doing memory reads, and maybe some high-speed
disk access; on the Greenplum setup, you've got network overhead on top of
all that. Bottom line: You need to do a test with a number of rows that
won't fit into memory, and won't be very quickly scanned from disk into
memory. You need a LOT of data.

--Scott


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-16 01:14:46
Message-ID: alpine.GSO.2.01.0907152101370.15586@westnet.com

On Mon, 13 Jul 2009, Suvankar Roy wrote:

> I believe that execution time in greenplum should be less compared to postgres.

Well, first off you don't even mention which PostgreSQL or Greenplum
version you're comparing, which leaves a lot of variables we can't account
for. Second, you'd need to make sure that the two servers had as close to
identical server parameter configurations as possible to get a fair
comparison (the postgresql.conf file). Next, you need to make sure the
data has been loaded and analyzed similarly on the two. Try using "VACUUM
ANALYZE" on both systems before running your query, then "EXPLAIN ANALYZE"
on both setups to get an idea if they're using the same plan to pull data
from the disk; you may discover there's a radical difference there.
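
The checks described above could be run roughly as follows on each system.
This is a sketch, not a complete checklist: the table and column names come
from the original post, and shared_buffers and work_mem are just two
settings picked as examples of postgresql.conf parameters worth comparing.

```sql
-- Confirm exactly what is being compared.
SELECT version();

-- Spot-check a couple of server parameters (examples only; compare the
-- full postgresql.conf where possible).
SHOW shared_buffers;
SHOW work_mem;

-- Clean up dead rows and refresh planner statistics before timing.
VACUUM ANALYZE observation_all;

-- Capture the plan and actual timings for the query in question.
EXPLAIN ANALYZE
SELECT * FROM observation_all ORDER BY so_no, serial_no;
```

Comparing the two EXPLAIN ANALYZE outputs side by side should show whether
both systems chose the same scan and sort strategy.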

...and even if you did all that, this still wouldn't be the right place to
ask about Greenplum's database product. You'll end up with everyone mad
at you. Nobody likes having benchmarks that show their product in a bad
light published, particularly if they aren't completely fair. And this
list is dedicated to talking about the open-source PostgreSQL versions.
Your question would be more appropriate to throw in Greenplum's direction.
The list I gave above is by no means even comprehensive--there are plenty
of other ways you can end up doing an unfair comparison here (using
different partitions on the same disk, which usually end up with different
speeds, comes to mind).

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-16 01:17:45
Message-ID: alpine.GSO.2.01.0907152115460.15586@westnet.com

On Wed, 15 Jul 2009, Scott Marlowe wrote:

> On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar(dot)roy(at)tcs(dot)com> wrote:
>>
>> Hi Scott,
>>
>> This is what I have got -
>> In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
>> i686-pc-linux-gnu, compiled by GCC gcc (GCC)
>
>> In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
>> (1 row)
>
> I wouldn't expect 8.2.x to outrun 8.3.x

And you can't directly compare performance of a system running Linux with
one running Windows, even if they're the same hardware. Theoretically,
Linux should have an advantage, but only if you're accounting for a whole
stack of other variables.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Justin Pitts <justinpitts(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Ibrahim Harrani <ibrahim(dot)harrani(at)gmail(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>
Subject: Re: cluster index on a table
Date: 2009-07-16 02:36:51
Message-ID: dcc563d10907151936y22d01025qcd27d6420c695d4@mail.gmail.com

I'd love to see it.

On Wed, Jul 15, 2009 at 8:17 PM, Justin Pitts<justinpitts(at)gmail(dot)com> wrote:
> Is there any interest in adding that (continual/automatic cluster
> order maintenance) to a future release?
>
> On Wed, Jul 15, 2009 at 8:33 PM, Scott Carey<scott(at)richrelevance(dot)com> wrote:
>> If you have a lot of insert/update/delete activity on a table fillfactor can
>> help.
>>
>> I don’t believe that postgres will try and maintain the table in the cluster
>> order however.
>>
>>
>> On 7/15/09 8:04 AM, "Ibrahim Harrani" <ibrahim(dot)harrani(at)gmail(dot)com> wrote:
>>
>> Hi,
>>
>> thanks for your suggestion.
>> Is there any benefit of setting fillfactor to 70 or 80 on this table?
>>
>>
>>
>> On Wed, Jun 24, 2009 at 8:42 PM, Scott Marlowe<scott(dot)marlowe(at)gmail(dot)com>
>> wrote:
>>> As another poster pointed out, you cluster on ONE index and one index
>>> only. However, you can cluster on a multi-column index.

-- 
When fascism comes to America, it will be intolerance sold as diversity.

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Suvankar Roy <suvankar(dot)roy(at)tcs(dot)com>
Cc: Alex Goncharov <alex-goncharov(at)comcast(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Performance comparison between Postgres and Greenplum
Date: 2009-07-17 06:24:35
Message-ID: dcc563d10907162324o142f1f35r8f9b3c61d6a78203@mail.gmail.com

On Wed, Jul 15, 2009 at 7:02 AM, Suvankar Roy<suvankar(dot)roy(at)tcs(dot)com> wrote:
>
> Hi Alex,
>
> Yes, I have got 2 segments and a master host. So, in a way processing should
> be faster in Greenplum.
>
> Actually this is only a sort of Proof of Concept trial that I am carrying
> out to notice differences between greenplum and postgres, if any.

You're definitely gonna want more data to test with. I run regular
vanilla pgsql for stats at work, and we average 0.8M to 2M rows of
stats every day. We keep them for up to two years. So, when we reach
our max of two years, we're talking somewhere in the range of a
billion rows to mess about with.

During a not so busy day, the 99,000th row entered into stats
happens at about 3am. Once they're loaded into memory it takes 435 ms
to access those 99k rows.

Start testing in the millions, at a minimum. Hundreds of millions is
more likely to start showing a difference.
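
A larger test set along these lines could be generated with
generate_series. This is a rough sketch under stated assumptions: the real
columns of observation_all are not shown in this thread, so the table
observation_big and its payload column are hypothetical stand-ins, and the
10 million row count is only a starting point taken from the "millions, at
a minimum" advice above.

```sql
-- Hypothetical stand-in for observation_all; only the composite key
-- (so_no, serial_no) is taken from the original post.
CREATE TABLE observation_big (
    so_no     integer NOT NULL,
    serial_no integer NOT NULL,
    payload   text,
    PRIMARY KEY (so_no, serial_no)
);

-- 10 million rows: 100,000 distinct so_no values, each with
-- serial_no running from 0 to 99, so every key pair is unique.
INSERT INTO observation_big (so_no, serial_no, payload)
SELECT s.i / 100,
       s.i % 100,
       'row ' || CAST(s.i AS text)
FROM generate_series(0, 9999999) AS s(i);

-- Refresh planner statistics before timing anything.
ANALYZE observation_big;
```

Raising the upper bound of generate_series scales the row count; the same
script would have to be run on both systems so that they hold identical
data.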