Re: Using Postgres to store high volume streams of sensor readings

From: "V S P" <toreason(at)fastmail(dot)fm>
To: "Ciprian Dorin Craciun" <ciprian(dot)craciun(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Shane Ambler" <pgsql(at)sheeky(dot)biz>, "Diego Schulz" <dschulz(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Using Postgres to store high volume streams of sensor readings
Date: 2008-11-23 16:06:27
Message-ID: 1227456387.24744.1286313811@webmail.messagingengine.com
Lists: pgsql-general

While most of my experience is with Oracle/Informix, I would also
recommend:
a) partitioning on the DB level (see the first sketch after this list)
Put the partitions on separate hard disks, make the system at least
dual core, and attach the disks via a SCSI controller (not IDE) for
parallel performance.

b) partitioning on the application level, that is, having the insert
code dynamically figure out which DB and which table each row goes to
(this complicates the application for inserts as well as for reports;
see the note after the first sketch)

c) maybe there is a chance to drop the index (if all you are doing
is inserts) and then recreate it later? (see the second sketch)

d) I did not see the type of index, but if the values of at least
some of the indexed fields repeat a lot, Oracle has what's called a
'bitmap index'.
PostgreSQL might have something similar, where that type of index
is optimized for the fact that the values are the same for the
majority of rows (it becomes much smaller, and therefore quicker to
update). (see the third sketch)

e) review that there are no insert triggers or constraints (either
field-level or foreign key) on those tables; if there are, validate
why they are there and see if they can be removed -- the application
would then need to guarantee correctness (see the fourth sketch)
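
For (a), and for Tom's timestamp-partitioning suggestion quoted
below, a minimal sketch using PostgreSQL's inheritance-based
partitioning. The readings table and all column names here are
invented for illustration; adjust them to the real schema:

    -- a tablespace puts a partition on its own disk
    CREATE TABLESPACE disk2 LOCATION '/mnt/disk2/pgdata';

    CREATE TABLE readings (
        sensor_id  integer,
        ts         timestamptz,
        value      double precision
    );

    -- one child table per time range; the CHECK constraint lets the
    -- planner skip irrelevant partitions (with constraint_exclusion
    -- turned on)
    CREATE TABLE readings_2008_11 (
        CHECK (ts >= '2008-11-01' AND ts < '2008-12-01')
    ) INHERITS (readings) TABLESPACE disk2;

    -- each partition's index stays small enough to fit in RAM
    CREATE INDEX readings_2008_11_idx
        ON readings_2008_11 (sensor_id, ts);

The application-level variant in (b) is then simply: have the insert
code format the child table name (readings_2008_11) from the row's
timestamp and INSERT into it directly, instead of routing through
the parent.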
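
For (c), the usual bulk-load pattern is drop, load, rebuild: one
CREATE INDEX at the end is generally far cheaper than maintaining the
index row by row across millions of inserts. Same hypothetical table
as above:

    DROP INDEX readings_2008_11_idx;
    -- ... run the bulk INSERTs / COPY here ...
    CREATE INDEX readings_2008_11_idx
        ON readings_2008_11 (sensor_id, ts);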
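
For (d), as far as I know PostgreSQL has no persistent bitmap index
(it builds bitmaps only in memory during scans), but a partial index
gives a similar size win when the vast majority of rows share one
value. A sketch, assuming the table also had a status column (invented
for this example) where almost every row is 'ok':

    -- index only the rare rows; the common 'ok' rows stay out of it,
    -- so the index is small and cheap to update
    CREATE INDEX readings_alarm_idx
        ON readings_2008_11 (ts)
        WHERE status <> 'ok';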
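
For (e), psql's \d shows a table's triggers and constraints, or you
can query the catalogs directly (note that foreign-key checks also
show up as internally generated triggers in pg_trigger):

    -- constraints: c = check, f = foreign key, p = primary key
    SELECT conname, contype
      FROM pg_constraint
     WHERE conrelid = 'readings_2008_11'::regclass;

    SELECT tgname
      FROM pg_trigger
     WHERE tgrelid = 'readings_2008_11'::regclass;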

VSP

On Sun, 23 Nov 2008 08:34:57 +0200, "Ciprian Dorin Craciun"
<ciprian(dot)craciun(at)gmail(dot)com> said:
> On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> >> The problem is, most likely, on updating the indexes. Heap inserts
> >> should always take more or less the same time, but index insertion
> >> requires walking down the index struct for each insert, and the path to
> >> walk gets larger the more data you have.
> >
> > It's worse than that: his test case inserts randomly ordered keys, which
> > means that there's no locality of access during the index updates. Once
> > the indexes get bigger than RAM, update speed goes into the toilet,
> > because the working set of index pages that need to be touched also
> > is bigger than RAM. That effect is going to be present in *any*
> > standard-design database, not just Postgres.
> >
> > It's possible that performance in a real-world situation would be
> > better, if the incoming data stream isn't so random; but it's
> > hard to tell about that with the given facts.
> >
> > One possibly useful trick is to partition the data by timestamp with
> > partition sizes chosen so that the indexes don't get out of hand.
> > But the partition management might be enough of a PITA to negate
> > any win.
> >
> > regards, tom lane
>
> Thanks for your feedback! This is just as I supposed, but I didn't
> have the Postgres experience to be certain.
> I'll include your conclusion in my report.
>
> Ciprian Craciun.
>
--
V S P
toreason(at)fastmail(dot)fm

