Two billion records ok?

From: Nick Bower <nick(at)nickbower(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Two billion records ok?
Date: 2006-09-05 01:26:59
Message-ID: 200609050926.59910.nick@nickbower.com
Lists: pgsql-general

We're considering using PostgreSQL for storing gridded metadata - each point
of our grids has a variety of metadata attached to it (including lat/lon,
measurements, etc.) and would constitute a record in PostgreSQL+PostGIS.

Size-wise, grids are about 4000x700 and are collected twice daily over, say,
10 years. As mentioned, each record would have up to 50 metadata attributes
(columns) including geom, floats, varchars, etc.

So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
we'll want to query on datetimes, PostGIS lat/lon, and integer-based
metadata flags?

If, however, I'm forced to sub-sample the grid, what rule of thumb should I
be looking to stay within?

Thanks for any pointers, Nick

PS - Feel free to throw in any other ideas of grid-suitable databases :)


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Nick Bower <nick(at)nickbower(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Two billion records ok?
Date: 2006-09-05 02:26:51
Message-ID: 20060905022651.GA36214@winnie.fuhr.org
Lists: pgsql-general

On Tue, Sep 05, 2006 at 09:26:59AM +0800, Nick Bower wrote:
> So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
> we'll want to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags?

That figure is about 20 billion, which is indeed > 2 billion :-).
If you plan to use integer row IDs then you'll need to use 64-bit
bigint/bigserial instead of 32-bit integer/serial. I haven't worked
with a database that large; maybe somebody else can give additional
advice.
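
For example (a minimal sketch; the table and column names are just
placeholders):

  -- serial is a 32-bit integer and tops out at 2,147,483,647 (~2.1e9);
  -- bigserial is 64-bit and comfortably covers 20 billion rows.
  CREATE TABLE grid_point (
      id     bigserial PRIMARY KEY,
      obs_ts timestamp NOT NULL,
      flags  integer   NOT NULL
  );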

--
Michael Fuhr


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Nick Bower <nick(at)nickbower(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Two billion records ok?
Date: 2006-09-05 05:40:00
Message-ID: Pine.GSO.4.63.0609050937530.16344@ra.sai.msu.su
Lists: pgsql-general

Nick,

If you need very fast spherical spatial queries, you may use our
Q3C module for PostgreSQL (q3c.sf.net). We use it to provide access
to very big astronomical catalogs.
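
For example, a cone search with Q3C looks roughly like this (a sketch;
the stars table and its ra/dec columns, in degrees, are placeholders):

  -- Index on the Q3C pixelization of the coordinates.
  CREATE INDEX stars_q3c_idx ON stars (q3c_ang2ipix(ra, dec));

  -- All rows within 0.1 degrees of (ra, dec) = (11.0, 12.0).
  SELECT * FROM stars
  WHERE q3c_radial_query(ra, dec, 11.0, 12.0, 0.1);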

Oleg

On Tue, 5 Sep 2006, Nick Bower wrote:

> We're considering using PostgreSQL for storing gridded metadata - each point
> of our grids has a variety of metadata attached to it (including lat/lon,
> measurements, etc.) and would constitute a record in PostgreSQL+PostGIS.
>
> Size-wise, grids are about 4000x700 and are collected twice daily over, say,
> 10 years. As mentioned, each record would have up to 50 metadata attributes
> (columns) including geom, floats, varchars, etc.
>
> So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
> we'll want to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags?
>
> If, however, I'm forced to sub-sample the grid, what rule of thumb should I
> be looking to stay within?
>
> Thanks for any pointers, Nick
>
> PS - Feel free to throw in any other ideas of grid-suitable databases :)

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Nick Bower <nick(at)nickbower(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Two billion records ok?
Date: 2006-09-05 05:53:08
Message-ID: 200609051353.08230.nick@nickbower.com
Lists: pgsql-general

Thanks - but what do you call big?

My application is satellite data btw so the reference could be useful.

On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:
> Nick,
>
> If you need very fast spherical spatial queries, you may use our
> Q3C module for PostgreSQL (q3c.sf.net). We use it to provide access
> to very big astronomical catalogs.
>
>
> Oleg
>
> On Tue, 5 Sep 2006, Nick Bower wrote:
> > We're considering using PostgreSQL for storing gridded metadata - each
> > point of our grids has a variety of metadata attached to it (including
> > lat/lon, measurements, etc.) and would constitute a record in
> > PostgreSQL+PostGIS.
> >
> > Size-wise, grids are about 4000x700 and are collected twice daily over,
> > say, 10 years. As mentioned, each record would have up to 50 metadata
> > attributes (columns) including geom, floats, varchars, etc.
> >
> > So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
> > we'll want to query on datetimes, PostGIS lat/lon, and integer-based
> > metadata flags?
> >
> > If, however, I'm forced to sub-sample the grid, what rule of thumb should
> > I be looking to stay within?
> >
> > Thanks for any pointers, Nick
> >
> > PS - Feel free to throw in any other ideas of grid-suitable databases :)
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Nick Bower <nick(at)nickbower(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Two billion records ok?
Date: 2006-09-05 05:58:02
Message-ID: Pine.GSO.4.63.0609050956100.16344@ra.sai.msu.su
Lists: pgsql-general

On Tue, 5 Sep 2006, Nick Bower wrote:

> Thanks - but what do you call big?

Several billion stars. You can try our Cone Search service at
http://vo.astronet.ru/cas/conesearch.php

Oleg

>
> My application is satellite data btw so the reference could be useful.
>
> On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:
>> Nick,
>>
>> If you need very fast spherical spatial queries, you may use our
>> Q3C module for PostgreSQL (q3c.sf.net). We use it to provide access
>> to very big astronomical catalogs.
>>
>>
>> Oleg
>>
>> On Tue, 5 Sep 2006, Nick Bower wrote:
>>> We're considering using PostgreSQL for storing gridded metadata - each
>>> point of our grids has a variety of metadata attached to it (including
>>> lat/lon, measurements, etc.) and would constitute a record in
>>> PostgreSQL+PostGIS.
>>>
>>> Size-wise, grids are about 4000x700 and are collected twice daily over,
>>> say, 10 years. As mentioned, each record would have up to 50 metadata
>>> attributes (columns) including geom, floats, varchars, etc.
>>>
>>> So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
>>> we'll want to query on datetimes, PostGIS lat/lon, and integer-based
>>> metadata flags?
>>>
>>> If, however, I'm forced to sub-sample the grid, what rule of thumb should
>>> I be looking to stay within?
>>>
>>> Thanks for any pointers, Nick
>>>
>>> PS - Feel free to throw in any other ideas of grid-suitable databases :)
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Tim Allen <tim(at)proximity(dot)com(dot)au>
To: Nick Bower <nick(at)nickbower(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Two billion records ok?
Date: 2006-09-05 05:58:03
Message-ID: 44FD11EB.9080500@proximity.com.au
Lists: pgsql-general

Nick Bower wrote:
> Thanks - but what do you call big?

How many stars do you think there are? :-)

> My application is satellite data btw so the reference could be useful.
>
> On Tue, 5 Sep 2006 01:40 pm, Oleg Bartunov wrote:
>
>>Nick,
>>
>>If you need very fast spherical spatial queries, you may use our
>>Q3C module for PostgreSQL (q3c.sf.net). We use it to provide access
>>to very big astronomical catalogs.
>>
>>Oleg

Tim

--
-----------------------------------------------
Tim Allen tim(at)proximity(dot)com(dot)au
Proximity Pty Ltd http://www.proximity.com.au/


From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Two billion records ok?
Date: 2006-09-05 10:01:31
Message-ID: 44FD4AFB.9020705@cox.net
Lists: pgsql-general

Nick Bower wrote:
> We're considering using PostgreSQL for storing gridded metadata - each point
> of our grids has a variety of metadata attached to it (including lat/lon,
> measurements, etc.) and would constitute a record in PostgreSQL+PostGIS.
>
> Size-wise, grids are about 4000x700 and are collected twice daily over, say,
> 10 years. As mentioned, each record would have up to 50 metadata attributes
> (columns) including geom, floats, varchars, etc.
>
> So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
> we'll want to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags?
>
> If, however, I'm forced to sub-sample the grid, what rule of thumb should I
> be looking to stay within?
>
> Thanks for any pointers, Nick

Tablespaces and table partitioning will be crucial to your needs.
I'm not sure if you can partition indexes, though.

And it's too bad that compressed bitmap indexes haven't been
implemented yet. For indexes with high "key cardinality", they save
a *lot* of space, and queries can run a lot faster.
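
A rough sketch of the inheritance-based partitioning available today,
plus a tablespace to spread partitions across disks (all names here
are placeholders):

  -- Put a partition on its own disk via a tablespace.
  CREATE TABLESPACE fastdisk LOCATION '/mnt/raid/pgdata';

  -- Parent table holds no rows; one child per time range.
  CREATE TABLE obs (
      id     bigserial,
      obs_ts timestamp NOT NULL,
      flags  integer
  );
  CREATE TABLE obs_2006 (
      CHECK (obs_ts >= '2006-01-01' AND obs_ts < '2007-01-01')
  ) INHERITS (obs) TABLESPACE fastdisk;

  -- Lets the planner skip partitions whose CHECK rules them out.
  SET constraint_exclusion = on;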

--
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.


From: Brent Wood <b(dot)wood(at)niwa(dot)co(dot)nz>
To: Nick Bower <nick(at)nickbower(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Two billion records ok?
Date: 2006-09-05 20:39:32
Message-ID: 44FDE084.4020108@niwa.co.nz
Lists: pgsql-general

Nick Bower wrote:
> We're considering using PostgreSQL for storing gridded metadata - each point
> of our grids has a variety of metadata attached to it (including lat/lon,
> measurements, etc.) and would constitute a record in PostgreSQL+PostGIS.
>
> Size-wise, grids are about 4000x700 and are collected twice daily over, say,
> 10 years. As mentioned, each record would have up to 50 metadata attributes
> (columns) including geom, floats, varchars, etc.
>
> So given 4000x700x2x365x10 > 2 billion, is this going to be a problem if
> we'll want to query on datetimes, PostGIS lat/lon, and integer-based
> metadata flags?
>

Hmmmm... 2 billion looks optimistic... I get 2.044e+10, which is 20 billion.

I'm currently working with a "table" of over 200,000,000 records. With a
clustered index & underlying partitioning, response times are more than
acceptable: a 25-wide self relation (left outer join) over 3 months of
data (records arrive at around 40/minute) takes about 2 minutes, and a
simple query with a WHERE clause on timestamp takes a few seconds at
most. This is on a 32-bit Intel system with only 2 GB of memory &
mirrored 7200 RPM SATA hard drives.

I'd suggest partitioning on timestamp, per year at least, & using a
clustered index on timestamp. It should be viable if your hardware is up
to it.
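
Concretely, the clustering step looks like this (reusing the placeholder
obs_2006 yearly partition from the sketch earlier in the thread):

  CREATE INDEX obs_2006_ts_idx ON obs_2006 (obs_ts);

  -- CLUSTER rewrites the table in index order so timestamp range
  -- scans read contiguous pages; re-run it after bulk loads.
  CLUSTER obs_2006_ts_idx ON obs_2006;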

I'd also strongly recommend a suitable platform: 64-bit Linux on AMD64
(e.g. Opteron) with as much memory & the fastest RAID setup you can
afford. Make sure you use a fully 64-bit build of PostgreSQL/PostGIS on
this platform as well.

If the same grid is being resampled, then a separate table defining the
grid, and a join on grid ID to the main (partitioned) table, may improve
performance (& reduce data volume).
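
Roughly, with placeholder names (the PostGIS geometry is then stored
once per cell instead of once per record):

  CREATE TABLE grid_cell (
      cell_id serial PRIMARY KEY,
      geom    geometry       -- cell geometry, stored once
  );
  CREATE TABLE sample (
      cell_id integer NOT NULL REFERENCES grid_cell,
      obs_ts  timestamp NOT NULL,
      value   real
  );

  -- Queries join on the narrow integer key:
  SELECT s.obs_ts, s.value
  FROM sample s
  JOIN grid_cell c USING (cell_id)
  WHERE s.obs_ts >= '2006-01-01' AND s.obs_ts < '2006-02-01';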

I wouldn't expect instantaneous answers from it, but I don't know of any
reason it won't work. It depends very much on what level of performance
is considered acceptable.

I'm also using PostGIS grids with up to a few million cells and spatial
joins to a million or so tracklines to generate gridded models for
analysis in R. You seem to be scaling this up from where I'm at, so I'd
like to know how you get on.

Cheers,

Brent Wood