Re: Design Question (Time Series Data)

From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: "Andreas Strasser" <kontakt(at)andreas-strasser(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Design Question (Time Series Data)
Date: 2007-10-04 09:20:19
Message-ID: 162867790710040220j3297053bo1b41f12af000273f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2007/10/4, Andreas Strasser <kontakt(at)andreas-strasser(dot)com>:
> Hello,
>
> i'm currently designing an application that will retrieve economic data
> (mainly time series)from different sources and distribute it to clients.
> It is supposed to manage around 20.000 different series with differing
> numbers of observations (some have only a few dozen observations, others
> several thousand) and i'm now faced with the decision where and how to
> store the data.
>
> So far, i've come up with 3 possible solutions
>
> 1) Storing the observations in one big table with fields for the series,
> position within the series and the value (float)
> 2) Storing the observations in an array (either in the same table as the
> series or in an extra data-table)
> 3) Storing the observations in CSV-files on the hard disk and only
> putting a reference to it in the database
>

I did good experience with 2 variant. PostgreSQL needs 24bytes for
head of every row, so isn't too much efective store one field to one
row. You can simply do transformation between array and table now.

Pavel

> I expect that around 50 series will be updated daily - which would mean
> that for solution nr. 1 around 50.000 rows would be deleted and appended
> (again) every day.
>
> I personally prefer solution 1, because it is the easiest to implement
> (i need to make different calculations and be able to transform the data
> easily), but i'm concerned about perfomance and overhead. It effectively
> triples the space needed (over solutions nr. 2) and will result in huge
> index files.
>
> Are there any other storage methods which are better suited for this
> kind of data? How can i avoid trouble resulting from the daily updates
> (high number of deleted rows)? Which method would you prefer?
>
> Thanks in advance!
>
> Andreas
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jorge Godoy 2007-10-04 09:34:01 Re: Design Question (Time Series Data)
Previous Message Albe Laurenz 2007-10-04 08:37:59 Re: About selected data dump