Re: postgresql and openmosix migration

Lists: pgsql-performance
From: "Bill" <bill(at)math(dot)uchicago(dot)edu>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: postgresql and openmosix migration
Date: 2004-06-22 14:29:39
Message-ID: 004001c45865$59cc9f20$dfc58780@billnotebook

Hi, I am trying to make a cluster out of any free database: PostgreSQL, MySQL, or any other. I have looked at openMosix patched with the migshm patch for shared memory support, and it seems that neither fully works. PostgreSQL in particular uses "shared memory but not the system semaphores for locking it", so apparently it won't benefit from an openMosix cluster. In addition, MySQL doesn't seem to migrate because it is multithreaded. Any ideas of how I can cluster my database (around 800 GB in size, so even partial replication is not really practical)?

If interested, this is my source for openMosix and migshm information: http://howto.ipng.be/MigSHM-openMosix/x90.html

Thanks.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Bill" <bill(at)math(dot)uchicago(dot)edu>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: postgresql and openmosix migration
Date: 2004-06-22 16:31:23
Message-ID: 200406220931.23780.josh@agliodbs.com
Lists: pgsql-performance

Bill,

> Any ideas of how I can cluster my database (around 800 GB
> in size so even partial replication is not really practical)?

Um, raise $150,000 to pay for a clustering implementation?

Various techniques of "shared memory clustering" have been tried with
PostgreSQL, and none work. Neither does LinuxLabs' "ClusGres", which is
based on similar principles -- unfortunately. (At least, LL repeatedly
postponed the demo they said they'd give me; I've yet to see anything
working.)

Frankly, we're waiting for a well-funded corporation to jump in and decide
they want PostgreSQL clustering. Database server clustering is a "big
ticket item" requiring roughly 1,000 hours of programming and
troubleshooting. As such, you're not likely to see it come out of the OSS
community unaided.

Oh, and FYI, MySQL's "clustering" doesn't work either. It requires your
entire database to fit into available RAM ....

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: "Bill" <bill(at)math(dot)uchicago(dot)edu>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: postgresql and openmosix migration
Date: 2004-06-22 17:31:15
Message-ID: 008201c4587e$b8856510$dfc58780@billnotebook
Lists: pgsql-performance

Ok, so maybe someone on this group will have a better idea. We have a
database of financial information, and this has literally millions of
entries. I have installed indices, but for the rather computationally
demanding processes we like to use, like a select query to find the
commodity with the highest monthly or annual returns, the computer generally
runs unacceptably slowly. So, other than clustering, how could I achieve a
speed increase in these complex queries? Is this better in MySQL or
PostgreSQL?

Thanks.
----- Original Message -----
From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: "Bill" <bill(at)math(dot)uchicago(dot)edu>; <pgsql-performance(at)postgresql(dot)org>
Sent: Tuesday, June 22, 2004 11:31 AM
Subject: Re: [PERFORM] postgresql and openmosix migration

> Bill,
>
> > Any ideas of how I can cluster my database (around 800 GB
> > in size so even partial replication is not really practical)?
>
> Um, raise $150,000 to pay for a clustering implementation?
>
> Various techniques of "shared memory clustering" have been tried with
> PostgreSQL, and none work. Neither does LinuxLabs' "ClusGres", which is
> based on similar principles -- unfortunately. (At least, LL repeatedly
> postponed the demo they said they'd give me; I've yet to see anything
> working.)
>
> Frankly, we're waiting for a well-funded corporation to jump in and decide
> they want PostgreSQL clustering. Database server clustering is a "big
> ticket item" requiring roughly 1,000 hours of programming and
> troubleshooting. As such, you're not likely to see it come out of the OSS
> community unaided.
>
> Oh, and FYI, MySQL's "clustering" doesn't work either. It requires your
> entire database to fit into available RAM ....
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Bill <bill(at)math(dot)uchicago(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: postgresql and openmosix migration
Date: 2004-06-22 17:53:28
Message-ID: 20040622175328.GA20086@wolff.to
Lists: pgsql-performance

On Tue, Jun 22, 2004 at 12:31:15 -0500,
Bill <bill(at)math(dot)uchicago(dot)edu> wrote:
> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slowly. So, other than clustering, how could I achieve a
> speed increase in these complex queries? Is this better in MySQL or
> PostgreSQL?

Queries using max (or min) can often be rewritten as queries using ORDER BY
and LIMIT so that they can take advantage of indexes. Doing this might help
with some of the problems you are seeing.
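As an illustrative sketch of that rewrite (table and column names here are invented, since the actual schema isn't shown in the thread):

```sql
-- An aggregate like this cannot use an index in PostgreSQL of this vintage;
-- it scans the whole table:
SELECT max(monthly_return) FROM commodity_returns;

-- The equivalent ORDER BY / LIMIT form can walk an index on monthly_return
-- backwards and stop after one row:
SELECT monthly_return
  FROM commodity_returns
 ORDER BY monthly_return DESC
 LIMIT 1;
```

What makes the second form fast is an index on the ordered column, e.g.
CREATE INDEX commodity_returns_mr_idx ON commodity_returns (monthly_return);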
If you commonly query on aggregated data it might be better to create
derived tables of the aggregated data maintained by triggers, and query
against them. If you do lots of selects relative to inserts and updates,
this could be a big win.
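A minimal sketch of the derived-table idea, assuming a hypothetical trades base table (all table, column, and function names are invented for illustration; the quoted function body matches the 7.x style, which predates dollar quoting):

```sql
-- Summary table holding one row per commodity per month.
CREATE TABLE monthly_summary (
    commodity    text,
    month        date,
    total_return numeric,
    PRIMARY KEY (commodity, month)
);

-- Trigger function: fold each new trade into its summary row.
CREATE OR REPLACE FUNCTION update_monthly_summary() RETURNS trigger AS '
BEGIN
    UPDATE monthly_summary
       SET total_return = total_return + NEW.trade_return
     WHERE commodity = NEW.commodity
       AND month = date_trunc(''month'', NEW.trade_date)::date;
    IF NOT FOUND THEN
        INSERT INTO monthly_summary (commodity, month, total_return)
        VALUES (NEW.commodity,
                date_trunc(''month'', NEW.trade_date)::date,
                NEW.trade_return);
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER trades_monthly_summary
    AFTER INSERT ON trades
    FOR EACH ROW EXECUTE PROCEDURE update_monthly_summary();
```

Selects then run against monthly_summary instead of aggregating over the base
table; the cost is a small amount of extra work on every insert, which is the
trade-off Bruno describes.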


From: Richard Welty <rwelty(at)averillpark(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: postgresql and openmosix migration
Date: 2004-06-22 18:08:10
Message-ID: Mahogany-0.66.0-22456-20040622-140810.00@averillpark.net
Lists: pgsql-performance

On Tue, 22 Jun 2004 12:31:15 -0500 Bill <bill(at)math(dot)uchicago(dot)edu> wrote:
> I have installed indices,

But are there any statistics? VACUUM ANALYZE is your friend.

> but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slowly. So, other than clustering, how could I achieve a
> speed increase in these complex queries?

1) have you gone to the effort to tune the values in postgresql.conf?

2) have you tried using explain to find out what the query planner is
up to?
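For point 1, an illustrative postgresql.conf fragment (the numbers below are placeholders, not recommendations; the right values depend entirely on available RAM and workload, and the parameter names are from the 7.x series, whose shipped defaults were famously conservative):

```
# postgresql.conf -- illustrative values for a dedicated database server
shared_buffers = 10000          # in 8 kB pages, so ~80 MB; the default is tiny
sort_mem = 16384                # kB per sort/hash operation; raise with care
effective_cache_size = 100000   # in 8 kB pages, ~800 MB; hint about the OS cache
```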

> Is this better in MySQL or
> PostgreSQL?

If there is any complexity to the queries, PostgreSQL will serve you better
if you learn how to use it properly.

richard
--
Richard Welty rwelty(at)averillpark(dot)net
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security


From: "Matthew Nuzum" <matt(at)followers(dot)net>
To: "'Bill'" <bill(at)math(dot)uchicago(dot)edu>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: postgresql and openmosix migration
Date: 2004-06-22 18:49:48
Message-ID: 200406221849.i5MInpnb027568@ms-smtp-02.tampabay.rr.com
Lists: pgsql-performance

Hi Bill, I am more often in the "needing help" category than the "giving
help" category when it comes to advice about using PostgreSQL. I have found
it to be an extremely powerful tool and by far the best performance/price
for my work.

I think you will get some excellent answers and help to your performance
questions if you send the list details about specific queries that are
running too slow. If you are willing to throw more/bigger hardware at the
problem, let people know that when you ask and they will tell you if your
bottleneck can be alleviated through more ram, disks, cpu or whatever.
Having been watching this list for some time now, I suspect most of the
performance problems can be improved using non-intuitive query or
configuration modifications (for example, replacing min()/max() as suggested
by Mr. Wolf).

The heavy hitters on the list will usually ask for an "explain analyze" of
your query. If your query is "select * from foo", then change it to
"EXPLAIN ANALYZE select * from foo" and post the output. It will look
something like this:
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..1.04 rows=4 width=44) (actual time=8.46..8.47 rows=4 loops=1)
 Total runtime: 19.63 msec
(2 rows)

I'm sure your data is confidential; mine is too. The good news is that none
of your data is included in the output, only technical details about what
the database is doing.

If your problem might involve the application that works with the data, give
some details about that. For example, if you're using a Java application,
let people know what driver version you use, what jvm and other related
info. There are lurkers on this list using just about every programming
language imaginable on more platforms than you can shake a stick at (I don't
care how good you are at shaking sticks, either).

The more details you give the better help you're going to get and you'd be
amazed at the results I've seen people get with a judicious amount of
tweaking. The other day someone had a query that took hours decrease to less
than 10 minutes by using some techniques prescribed by members on the list.
Bringing 30 - 60 second queries down to 2-3 seconds is commonplace.

You seem to be ready to throw money at the problem by investing in new
hardware but I would suggest digging into the performance problems first.
Too many times we've seen people on the list say, "I've just spent $x0,000
on a new xyz and I'm still having problems with this query." Oftentimes
the true solution is rewriting queries, tweaking config parameters, adding
RAM and upgrading disks (in that order I believe).

As I found out even today on the SQL list, it's best to ask questions in
this form:
"I want to do this... I've been trying this... I'm getting this... which
is problematic because..."

The more clearly you state the abstract goal the more creative answers
you'll get with people often suggesting things you'd never considered.

I hope this helps and I hope that you achieve your goals of a well
performing application.

Matthew Nuzum | Makers of "Elite Content Management System"
www.followers.net | View samples of Elite CMS in action
matt(at)followers(dot)net | http://www.followers.net/portfolio/

> -----Original Message-----
> From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-
> owner(at)postgresql(dot)org] On Behalf Of Bill
> Sent: Tuesday, June 22, 2004 1:31 PM
> To: Josh Berkus
> Cc: pgsql-performance(at)postgresql(dot)org
> Subject: Re: [PERFORM] postgresql and openmosix migration
>
> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer
> generally runs unacceptably slowly. So, other than clustering, how could I
> achieve a speed increase in these complex queries? Is this better in MySQL
> or PostgreSQL?
>
> Thanks.


From: Joe Conway <mail(at)joeconway(dot)com>
To: Bill <bill(at)math(dot)uchicago(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: postgresql and openmosix migration
Date: 2004-06-23 01:25:16
Message-ID: 40D8DBFC.2090208@joeconway.com
Lists: pgsql-performance

Bill wrote:
> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer generally
> runs unacceptably slowly. So, other than clustering, how could I achieve a
> speed increase in these complex queries? Is this better in MySQL or
> PostgreSQL?

If the bottleneck is really computational, not I/O, you might try PL/R
in conjunction with the rpvm R package. rpvm allows R to make use of pvm
to split its load among a cluster. See:

R:
http://www.r-project.org/

PL/R:
http://www.joeconway.com/plr/

rpvm:
http://cran.r-project.org/src/contrib/Descriptions/rpvm.html
http://cran.r-project.org/doc/packages/rpvm.pdf

I haven't had a chance to play with this myself yet, but I hope to
relatively soon.
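For what it's worth, the PL/R side of that looks roughly like this (a hypothetical sketch, not taken from the thread; it assumes PL/R is already installed in the database, and relies on PL/R exposing unnamed function arguments to R as arg1, arg2, ...):

```sql
-- A trivial PL/R function: median of a float8 array, computed in R.
CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS '
    median(arg1)
' LANGUAGE plr;

-- Usage:
SELECT r_median(ARRAY[2.0, 7.0, 1.0]);
```

The rpvm part would then be R code inside such a function farming work out to
PVM nodes, which is exactly the piece Joe says still needs experimentation.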

HTH,

Joe


From: Andrew Hammond <ahammond(at)ca(dot)afilias(dot)info>
To: Bill <bill(at)math(dot)uchicago(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: postgresql and openmosix migration
Date: 2004-06-23 14:47:21
Message-ID: 40D997F9.7000700@ca.afilias.info
Lists: pgsql-performance


Bill wrote:
| Ok, so maybe someone on this group will have a better idea. We have a
| database of financial information, and this has literally millions of
| entries. I have installed indices, but for the rather computationally
| demanding processes we like to use, like a select query to find the
| commodity with the highest monthly or annual returns, the computer
| generally runs unacceptably slowly. So, other than clustering, how could I
| achieve a speed increase in these complex queries? Is this better in MySQL
| or PostgreSQL?

Postgres generally beats MySQL on complex queries. The easiest solution
to speed issues is to throw hardware at it. Generally, you're bound first
by disk, then RAM, then CPU.

1) Move your data over to an array of smallish 15k RPM disks. The more
spindles the better.
2) Use a 64-bit platform and take advantage of more than 4 GB of memory.

There are dozens of options for the disk array. For the processing
platform, I'd recommend looking at Opteron. I've heard only good things
and their price is much more reasonable than the other options.

--
Andrew Hammond 416-673-4138 ahammond(at)ca(dot)afilias(dot)info
Database Administrator, Afilias Canada Corp.
CB83 2838 4B67 D40F D086 3568 81FC E7E5 27AF 4A9A


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Bill" <bill(at)math(dot)uchicago(dot)edu>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: postgresql and openmosix migration
Date: 2004-06-23 17:31:45
Message-ID: 200406231031.45401.josh@agliodbs.com
Lists: pgsql-performance

Bill,

> Ok, so maybe someone on this group will have a better idea. We have a
> database of financial information, and this has literally millions of
> entries. I have installed indices, but for the rather computationally
> demanding processes we like to use, like a select query to find the
> commodity with the highest monthly or annual returns, the computer
> generally runs unacceptably slowly. So, other than clustering, how could I
> achieve a speed increase in these complex queries?

Well, you can do this two ways:
1) you can pick out one query at a time, and send us complete information on
it, like Matt's really nice e-mail describes. People on this list will
help you troubleshoot it. It will take a lot of time, but no money.

2) You can hire a PG database expert. This will be much faster, but cost
you a lot of money.

> Is this better in MySQL
> or PostgreSQL?

Complex queries? Large databases? That's us. MySQL is optimized for
simple queries on small databases.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bill <bill(at)math(dot)uchicago(dot)edu>, Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: postgresql and openmosix migration
Date: 2004-06-23 17:52:39
Message-ID: 1088013159.95078.21.camel@jester
Lists: pgsql-performance

> 2) You can hire a PG database expert. This will be much faster, but cost
> you a lot of money.

I wouldn't exactly say "a lot of money". Lots of consultants out there
are willing to put in a week's worth of effort, on site, for
significantly less than a support contract with most commercial DB
organizations (including MySQL) -- and they often give better results
since they're on-site rather than working over phone or email.

But yes, doing it via this mailing list is probably the cheapest option.


From: Richard Welty <rwelty(at)averillpark(dot)net>
To: Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: postgresql and openmosix migration
Date: 2004-06-23 18:20:24
Message-ID: Mahogany-0.66.0-22456-20040623-142024.00@averillpark.net
Lists: pgsql-performance

On Wed, 23 Jun 2004 13:52:39 -0400 Rod Taylor <pg(at)rbt(dot)ca> wrote:
> But yes, doing it via this mailing list is probably the cheapest option.

yes, he just needs to decide how big a hurry he's in.

also, if he does decide to hire a consultant, i suggest he pop over
to pgsql-jobs and ask there.

richard
--
Richard Welty rwelty(at)averillpark(dot)net
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security