Re: select count() out of memory

From: Adrian Klaver <aklaver(at)comcast(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: tfinneid(at)student(dot)matnat(dot)uio(dot)no, "Gregory Stark" <stark(at)enterprisedb(dot)com>
Subject: Re: select count() out of memory
Date: 2007-10-27 21:34:07
Message-ID: 200710271434.08061.aklaver@comcast.net
Lists: pgsql-general

On Friday 26 October 2007 8:56 am, tfinneid(at)student(dot)matnat(dot)uio(dot)no wrote:
>
> Serious engineering does not imply perfect engineering; I have analyzed it
> and made my tradeoffs. What you are forgetting here is that you clearly
> don't understand the entire solution, so I will try to explain it again. And
> if you still think it's bonkers, then I urge you to come up with a solution
> that meets the requirements.
>
> Every predefined X seconds (e.g. 3, 6, 9, 12, etc.) a bunch of data arrives,
> which must be stored by discrete time groups, e.g. second 3, 6, 9, 12. The
> data that arrives is approx 4 MB per second, so in this case it's 12 MB. This
> has to be processed by the server and written to the db within 1 second.
> There can be up to 5 writers at the same time. Within that same second, at
> least 16 readers should be able to read all the data, *each*. Writers and
> readers are only concerned with the latest data, i.e. data from the latest
> time group, e.g. second 9.
> This has to go on at every predefined interval for the next 6-12 weeks,
> without stop, pause or any errors. These are the requirements.
>
> When I performed performance tests, I found several unwanted effects across
> the test scenarios. Here are the most important ones:
>
> - single large table, with indexes created when the table is created.
>   - this leads to insert performance degrading as more data is added; by
>     the time I got to 1 billion rows, it took 50 seconds to add the data.
>
> My lessons from this are:
> - single inserts can never be efficient enough
> - indexes cause a linear performance drop as data volume increases
>
> So I tried a different approach, which would address both issues:
>
> - separate tables for each bulk of data
> - use of bulk insert through JDBC COPY
> - add indexes to the newly created table after the copy is finished
>
> My lessons from this are:
> - inserts take constant time, no matter how much data is in the database
> - adding the indexes after the insert takes constant time, i.e. some
> milliseconds (see the sketch below).
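>
> Roughly, the loading step looks like this (a simplified sketch: the
> connection details, the "readings_t9" table and its two columns are
> placeholders, and I am assuming the JDBC driver's CopyManager API for
> the COPY):
>
>     import java.io.StringReader;
>     import java.sql.Connection;
>     import java.sql.DriverManager;
>     import java.sql.Statement;
>     import org.postgresql.copy.CopyManager;
>     import org.postgresql.core.BaseConnection;
>
>     // Sketch only: table name, columns and sample data are placeholders.
>     public class GroupLoad {
>         public static void main(String[] args) throws Exception {
>             Connection conn = DriverManager.getConnection(
>                     "jdbc:postgresql://localhost/test", "user", "secret");
>
>             // A fresh table for this time group, named after the group.
>             Statement st = conn.createStatement();
>             st.execute("CREATE TABLE readings_t9 (id int, val float8)");
>
>             // Load the whole batch with one COPY instead of row-by-row
>             // INSERTs.
>             CopyManager cm = new CopyManager((BaseConnection) conn);
>             cm.copyIn("COPY readings_t9 FROM STDIN",
>                     new StringReader("1\t42.0\n2\t43.5\n"));
>
>             // Add the index only after the data is in, so the load itself
>             // stays flat no matter how big the database gets.
>             st.execute("CREATE INDEX readings_t9_id_idx ON readings_t9 (id)");
>             st.close();
>             conn.close();
>         }
>     }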
>
> From this I realised that using either standalone tables or partitions is
> the way to go, since I only need to access the latest data, i.e. the newest
> table, in normal situations.
>
> After thinking about it and discussing it with this group, I found that
> using partitions would be more practical, for two reasons:
> - changes to the parent table are automatically propagated to all child
> tables, so the schema remains consistent and the server won't break because
> of differences between the tables.
> - it is more maintainable to keep the short "create with inheritance" SQL
> in the source code than the entire DDL of the table (see the sketch below).
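>
> In code, each new time group then costs just one line of DDL (sketch; the
> parent table "readings" is a placeholder):
>
>     // The child inherits all columns from the parent, and later ALTERs on
>     // the parent propagate to every child, so the schemas cannot drift.
>     static void createGroupTable(java.sql.Connection conn, int t)
>             throws java.sql.SQLException {
>         java.sql.Statement st = conn.createStatement();
>         st.execute("CREATE TABLE readings_t" + t + " () INHERITS (readings)");
>         st.close();
>     }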
>
> So now I have tested the server 24/7 for a week and a half, with 1 writer
> and 16 readers processing all the mentioned data, and everything works fine.
> Except for the select on the parent table, which now runs out of memory.
> Which in itself is not a problem, since I will never use the parent table
> in production in any case.
>
> regards
>
> tom
I might be missing the point, but couldn't you do a COPY to a single table
instead of multiple inserts and avoid the index overhead?
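
Something along these lines (untested sketch; the table name, the row format
and the connection handling are all invented):

    // One COPY per batch straight into the single table; the per-row INSERT
    // overhead disappears even without creating a new table per batch.
    static void loadBatch(org.postgresql.core.BaseConnection conn, String rows)
            throws java.sql.SQLException, java.io.IOException {
        org.postgresql.copy.CopyManager cm =
                new org.postgresql.copy.CopyManager(conn);
        cm.copyIn("COPY readings FROM STDIN", new java.io.StringReader(rows));
    }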
--
Adrian Klaver
aklaver(at)comcast(dot)net
