Re: Multiple Postmasters on Beowulf cluster

From: "Jan Hartmann" <jhart(at)frw(dot)uva(dot)nl>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Multiple Postmasters on Beowulf cluster
Date: 2002-07-29 16:43:26
Message-ID: DIEALLGCLLCNIHBDCMAEOEHFCDAA.jhart@frw.uva.nl
Lists: pgsql-admin

Thanks a lot for the reactions. I experimented a bit further with the
answers in mind and got the following result:

(Tom Lane)
> In that case, make 45 copies of the database ...

Without expecting much I created 3 data-directories and made symbolic links
from everything in the original data-directory, except postmaster.pid. Next
I started PostgreSQL on 3 nodes with PGDATA set to a different directory and
PGPORT to a different port. Surprisingly it worked! First startup gave a
message on each node about not having had a proper shutdown, but afterwards
everything ran ok from all servers, even restarting PostgreSQL. MapServer
didn't have a problem at all in producing a map from layers requested from
different nodes, although without any time gain (see next point). Probably
this gives PostgreSQL wizards the creeps and I wouldn't recommend it to anyone,
but just for curiosity's sake: what risks am I running, given that only
read access is needed?
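
For illustration only, the symlink layout described above could be sketched
roughly as follows in Python. The function name and the example paths are
hypothetical and not part of the original experiment; the only detail taken
from the description is that every entry of the original data directory is
linked except postmaster.pid. This shows the layout and makes no claim that
running several postmasters this way is safe.

```python
import os
import tempfile

# The one file that is NOT shared between the per-node directories.
SKIP = {"postmaster.pid"}


def make_linked_datadir(original, clone):
    """Create `clone` and symlink each top-level entry of `original` into it.

    Hypothetical helper: entries listed in SKIP are left out so that each
    node can create its own lock file. Returns the names that were linked.
    """
    original = os.path.abspath(original)
    os.makedirs(clone, mode=0o700, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(original)):
        if name in SKIP:
            continue
        os.symlink(os.path.join(original, name), os.path.join(clone, name))
        linked.append(name)
    return linked


if __name__ == "__main__":
    # Demo on a throwaway directory tree, not on a real cluster.
    root = tempfile.mkdtemp()
    source = os.path.join(root, "data")
    os.makedirs(os.path.join(source, "base"))
    open(os.path.join(source, "postmaster.pid"), "w").close()
    for node in ("node1", "node2", "node3"):
        print(node, make_linked_datadir(source, os.path.join(root, node)))
```

Each node would then be started with PGDATA pointing at its own directory
and PGPORT set to its own port, as described above.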

(Paul Ramsey)
> It is worth noting that layers are not really all that independent from a
> display point of view. (...) The process of creating the final visual
> product is the result of sequential application of layers.

Yes, I didn't realise that MapServer waits until a layer has been returned
from PostgreSQL before starting with the next, essentially doing nothing in
the meantime. I thought it worked like a web browser retrieving images,
which is done asynchronously. It should work, however, when requesting
complete maps from different browser frames, or when retrieving a map in one
frame and getting statistics for it in another, using separate PHP scripts
targeted at different nodes. This would already help me enormously.
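
The idea of sending independent requests to different nodes at the same
time, rather than one after another, can be sketched as below. This is a
minimal Python sketch under assumptions: `fetch_all`, the node names and the
`fake_fetch` stand-in are all hypothetical, and a real setup would replace
`fake_fetch` with an actual request to the script on each node.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(requests, fetch):
    """Run fetch(node, query) for every (node, query) pair concurrently.

    Results come back in the same order as `requests`, regardless of
    which node answers first.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(requests))) as pool:
        futures = [pool.submit(fetch, node, query) for node, query in requests]
        return [future.result() for future in futures]


def fake_fetch(node, query):
    """Stand-in for a real request; sleeps to imitate a slow query."""
    time.sleep(0.1)
    return f"{query} from {node}"


if __name__ == "__main__":
    jobs = [("node1", "map"), ("node2", "statistics")]
    for result in fetch_all(jobs, fake_fetch):
        print(result)
```

The two jobs wait on their nodes in parallel, so the total wall time is
close to that of the slowest single request instead of the sum of both.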

(Bob Meyer)
> Since disk is typically the slowest part of
> any system, I would imagine that 45 nodes, all beating on one network
> file system (or a multiport filesystem for that matter) would tend to
> slow things down dramatically. I would think that it would be better to
> make 45 separate copies of the database and then if there are updates,
> make some kind of process to pass all of the transactions to each
> instantiation of the DB. Granted, the disk space would increase to 45X
> the original estimate. How much updating/changing goes on in the Db?

I am trying this out for population statistics in Dutch municipalities
within specified distances (1, 2, 5, 10, 25 km, etc.) from the railway
network. Number of railway lines: 419 (each with numerous line segments);
number of municipalities: 633; size of the municipality map: about 10 MB.
It takes about 30 seconds of wall time to produce a map (good compared to
desktop GIS systems; I have no experience with Oracle Spatial). The next step
would be using the road network
(much larger of course, but still in the range of tens of megabytes, perhaps
a hundred), and data from very diverse sources and years, including raster
bitmaps, all not excessively large. Lots of different buffers have to be put
around all kinds of selections (road types, geographical selections) and
compared with each other. Last step is animating the results in Flash
movies: MapServer will be supporting Flash in the very near future, and I
already got some preliminary results. This will require even more
computations of intermediate results, to get flowing movies. So the problem
is not data access, it is computing power and administration of a very
disparate bunch of data. I certainly have enough computing power, and
probably also enough disk space for a 45-fold duplication of the data, but I
know from experience how error-prone this is, even with duplication scripts.
Even so, unless I am very much mistaken, the MapServer-PostgreSQL-Beowulf
combination should offer some very interesting prospects in GIS.

Thanks for the answers

Jan

Jan Hartmann
Department of Geography
University of Amsterdam
jhart(at)frw(dot)uva(dot)nl
