Re: Two billion records ok?

From: Brent Wood <b.wood@niwa.co.nz>
To: Nick Bower <nick@nickbower.com>
Cc: pgsql-general@postgresql.org
Subject: Re: Two billion records ok?
Date: 2006-09-05 20:39:32
Message-ID: 44FDE084.4020108@niwa.co.nz
Lists: pgsql-general

Nick Bower wrote:
> We're considering using Postgresql for storing gridded metadata - each point
> of our grids has a variety of metadata attached to it (including lat/lon,
> measurements, etc) and would constitute a record in Postgresql+Postgis.
>
> Size-wise, grids are about 4000x700 and are collected twice daily over say 10
> years. As mentioned, each record would have up to 50 metadata attributes
> (columns) including geom, floats, varchars etc.
>
> So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if we
> will be wanting to query on datetimes, Postgis lat/lon, and integer-based
> metadata flags?
>

Hmmmm... 2 billion looks optimistic... 4000 x 700 x 2 x 365 x 10 = 2.044e+10, which is about 20 billion.

I'm currently working with a "table" of over 200,000,000 records. With a
clustered index & underlying partitioning, response times are more than
acceptable: a 25-column-wide self join (left outer join) over 3 months of
data (records arrive at around 40/minute) takes about 2 minutes, and a
simple query with a where clause on timestamp takes a few seconds at most.
This is on a 32-bit Intel system with only 2Gb memory & mirrored 7200RPM
SATA hard drives.

I'd suggest partitioning on timestamp, per year at least, & using a
clustered index on timestamp; see the sketch below. It should be viable
if your hardware is up to it.

I'd also strongly recommend a suitable platform, 64 bit Linux on AMD 64
or Opteron with as much memory & the fastest RAID setup you can afford.
Make sure you use a fully 64 bit version of Postgres/Postgis on this
platform as well.

If the same grid is being resampled, then a separate table defining the
grid, and a join on grid ID to the main (partitioned) table may improve
performance (& reduce data volume).
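Moving the geometry out of the hypothetical grid_obs table above, that
would look something like this (again, names are illustrative only):

-- Grid geometry stored once (~2.8 million cells), not per observation.
CREATE TABLE grid_cell (
  cell_id serial PRIMARY KEY,
  geom    geometry   -- PostGIS point (or cell polygon)
);

-- Observations reference a cell instead of repeating the geometry.
CREATE TABLE grid_obs (
  cell_id  integer NOT NULL REFERENCES grid_cell (cell_id),
  obs_time timestamp NOT NULL,
  flags    integer
  -- ... remaining metadata columns ...
);

-- Typical query: spatial filter on the small table, time filter on the
-- big (partitioned) one.
SELECT o.*
FROM grid_cell c
JOIN grid_obs o ON o.cell_id = c.cell_id
WHERE c.geom && GeomFromText(
        'POLYGON((160 -50, 180 -50, 180 -30, 160 -30, 160 -50))', 4326)
  AND o.obs_time >= '2006-06-01' AND o.obs_time < '2006-07-01';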

I wouldn't expect instantaneous answers from it, but I don't know of any
reason it won't work. It depends very much on what level of performance
is considered acceptable.

I'm also using Postgis grids with up to a few million cells and spatial
joins to a million or so tracklines to generate gridded models for
analysis in R. You seem to be scaling this up from where I'm at, so I'd
like to know how you get on.
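For reference, the kind of grid/trackline spatial join I mean looks
roughly like this (the trackline table is made up, and it assumes
polygon cells with GiST indexes on both geometry columns):

-- Count tracklines crossing each cell; && uses the GiST indexes for a
-- cheap bounding-box filter, intersects() does the exact geometry test.
SELECT c.cell_id, count(t.track_id) AS n_tracks
FROM grid_cell c
LEFT JOIN trackline t
  ON t.geom && c.geom AND intersects(t.geom, c.geom)
GROUP BY c.cell_id;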

Cheers,

Brent Wood
