Re: Using Postgres to store high volume streams of sensor readings

From: "Nikolas Everett" <nik9000(at)gmail(dot)com>
To: "Ciprian Dorin Craciun" <ciprian(dot)craciun(at)gmail(dot)com>
Cc: "Gerhard Heift" <ml-postgresql-20081012-3518(at)gheift(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date: 2008-11-21 15:33:18
Message-ID: d4e11e980811210733k541768ecg28187c742b760a87@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

You might want to look into how OpenNMS uses RRDTool. It is able to handle
a huge number of nodes by queuing inserts into the RRDs and using JRobin.

I'm not sure if it is a great solution for what you are looking for, but
I've found its performance scales quite well. I'm getting well over 500
updates per second using JRobin and an NFS disk. I'm sure I could do better
but by my application is limited by hibernate. Each of my files stores an
MRTG's worth of data and keeps the average and max of four points in 136k.

Here is an iostat -dmx 1 of the rrd update:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
avgqu-sz await svctm %util
xvda 7.00 33.00 122.00 2093.00 0.48 8.34 8.16
150.70 69.92 0.45 100.00

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
avgqu-sz await svctm %util
xvda 493.00 26.00 248.00 540.00 2.84 2.21 13.13
43.70 55.42 1.26 99.60

My big problem with RRD is not being able to query it like you can a
database and it tending not to keep exact values. Oh, and it being pretty
cryptic.

Sorry to clutter up the list of RRD stuff. I just thought it might be
pertinent to Ciprian.

On Fri, Nov 21, 2008 at 8:03 AM, Ciprian Dorin Craciun <
ciprian(dot)craciun(at)gmail(dot)com> wrote:

> On Fri, Nov 21, 2008 at 2:55 PM, Gerhard Heift
> <ml-postgresql-20081012-3518(at)gheift(dot)de> wrote:
> > On Fri, Nov 21, 2008 at 02:50:45PM +0200, Ciprian Dorin Craciun wrote:
> >> Hello all!
> >>
> >> I would like to ask some advice about the following problem
> >> (related to the Dehems project: http://www.dehems.eu/ ):
> >> * there are some clients; (the clients are in fact house holds;)
> >> * each device has a number of sensors (about 10), and not all the
> >> clients have the same sensor; also sensors might appear and disappear
> >> dynamicaly; (the sensors are appliances;)
> >> * for each device and each sensor a reading is produced (at about
> >> 6 seconds); (the values could be power consumptions;)
> >> * I would like to store the following data: (client, sensor,
> >> timestamp, value);
> >> * the usual queries are:
> >> * for a given client (and sensor), and time interval, I need
> >> the min, max, and avg of the values;
> >> * for a given time interval (and sensor), I need min, max, and
> >> avg of the values;
> >> * other statistics;
> >>
> >> Currently I'm benchmarking the following storage solutions for this:
> >> * Hypertable (http://www.hypertable.org/) -- which has good insert
> >> rate (about 250k inserts / s), but slow read rate (about 150k reads /
> >> s); (the aggregates are manually computed, as Hypertable does not
> >> support other queries except scanning (in fact min, and max are easy
> >> beeing the first / last key in the ordered set, but avg must be done
> >> by sequential scan);)
> >> * BerkeleyDB -- quite Ok insert rate (about 50k inserts / s), but
> >> fabulos read rate (about 2M reads / s); (the same issue with
> >> aggregates;)
> >> * Postgres -- which behaves quite poorly (see below)...
> >> * MySQL -- next to be tested;
> >
> > For this case I would use RRDTool: http://oss.oetiker.ch/rrdtool/
> >
> > Regards,
> > Gerhard
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.9 (GNU/Linux)
> >
> > iEYEARECAAYFAkkmr9wACgkQa8fhU24j2flejQCeKl650qmSlomovOXDl6IKrADb
> > cTAAnRebAFq420MuW9aMmhoFOo+sPIje
> > =Zcoo
> > -----END PGP SIGNATURE-----
>
> Hy Gerhard, I know about RRDTool, but it has some limitations:
> * I must know in advance the number of sensors;
> * I must create for each client a file (and If I have 10 thousand
> clients?);
> * I have a limited amount of history;
> * (I'm not sure about this one but i think that) I must insert
> each data point by executing a command;
> * and also I can not replicate (distribute) it easily;
>
> Or have you used RRDTool in a similar context as mine? Do you have
> some benchmarks?
>
> Ciprian.
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Pavel Stehule 2008-11-21 16:02:36 Re: converter pgplsql funcion
Previous Message Joshua D. Drake 2008-11-21 15:30:13 Re: Postgres mail list traffic over time