Re: In-Memory Columnar Store

From: knizhnik <knizhnik(at)garret(dot)ru>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 10:02:24
Message-ID: 52A989B0.5020407@garret.ru
Lists: pgsql-hackers

On 12/12/2013 11:42 AM, Pavel Stehule wrote:
>
> it is an interesting idea. For me, the significant information from
> the comparison is that we are doing something significantly wrong. A
> memory engine should naturally be faster, but I don't think it can be 1000x.
>

Sorry, but I didn't fabricate these results.
Below is a snapshot from my computer:

postgres=# select DbItem_load();
dbitem_load
-------------
9999998
(1 row)

postgres=# \timing
Timing is on.
postgres=# select cs_used_memory();
cs_used_memory
----------------
4441894912
(1 row)

postgres=# select agg_val, cs_cut(group_by,'c22c30c10')
           from (select (cs_project_agg(ss1.*)).*
                 from (select (s1).sum/(s2).sum, (s1).groups
                       from DbItem_get() q,
                            cs_hash_sum(q.score*q.volenquired,
                                        q.trader||q.desk||q.office) s1,
                            cs_hash_sum(q.volenquired,
                                        q.trader||q.desk||q.office) s2) ss1) ss2;
agg_val | cs_cut
------------------+------------------------------------------------------------
1.50028393511844 | ("John Coltrane","New York Corporates","New York")
....
Time: 506.125 ms

postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem
group by (trader,desk,office);
...
Time: 449328.645 ms
postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem
group by (trader,desk,office);
...
Time: 441530.689 ms

Please notice that the time of the second execution is almost the same as
the first, although all the data can fit in cache!
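
(As a side note, not part of the original run: one way to check whether the
repeated execution really reads everything from cache is to look at the
buffer statistics, e.g.:)

explain (analyze, buffers)
select sum(score*volenquired)/sum(volenquired)
from DbItem
group by trader,desk,office;
-- "shared hit" vs. "read" in the Buffers lines shows whether pages came
-- from shared buffers or had to be fetched from the OS page cache / disk.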

Certainly it was interesting for me to understand the reason for such bad
performance, and I found out two things:

1.
     select sum(score*volenquired)/sum(volenquired) from DbItem
     group by (trader,desk,office);
and
     select sum(score*volenquired)/sum(volenquired) from DbItem
     group by trader,desk,office;

are not the same queries (which is hard for a C programmer to understand :)
The first one executes significantly slower, because the parenthesized list
is parsed as a row constructor, so grouping is done on a single composite
value instead of three separate columns (see the sketch after point 2).

2. It is not enough to increase the "shared_buffers" parameter in
postgresql.conf: "work_mem" is also very important. When I increased it
from the default 1MB to 1GB, the query execution time dropped to
7107.146 ms. So the real difference is ten times, not 1000 times.
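
To make both points concrete, here is a minimal sketch against the same
DbItem table (nothing here is from the original run; the SET could equally
be a postgresql.conf change followed by a reload):

-- Point 1: the parenthesized list is a row constructor, so the server
-- groups on a single composite (record) value:
select sum(score*volenquired)/sum(volenquired)
from DbItem
group by (trader, desk, office);

-- ...whereas this form groups on three separate columns and can be
-- planned as an ordinary (hash) aggregate:
select sum(score*volenquired)/sum(volenquired)
from DbItem
group by trader, desk, office;

-- Point 2: give the aggregation enough memory, per session here
-- (the 1GB value matches the one mentioned above):
set work_mem = '1GB';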

> Yesterday we did some tests which show that for large tables (5G)
> our hashing is not effective. Disabling hash join and using merge join
> increased speed 2x.
> On 9. 12. 2013 20:41, "knizhnik" <knizhnik(at)garret(dot)ru
> <mailto:knizhnik(at)garret(dot)ru>> wrote:
> >
> > Hello!
> >
> > I want to announce my implementation of an In-Memory Columnar Store
> > extension for PostgreSQL:
> >
> > Documentation: http://www.garret.ru/imcs/user_guide.html
> > Sources: http://www.garret.ru/imcs-1.01.tar.gz
> >
> > Any feedback, bug reports and suggestions are welcome.
> >
> > The vertical representation of data is stored in PostgreSQL shared memory.
> > This is why it is important to be able to utilize all available
> > physical memory.
> > Nowadays servers with a terabyte or more of RAM are not something
> > exotic, especially in the financial world.
> > But in Linux with standard 4KB pages there is a limit on the maximal
> > size of a mapped memory segment: 256GB.
> > It is possible to overcome this limitation either by creating multiple
> > segments - but that requires too many changes in the PostgreSQL memory
> > manager - or by just setting the MAP_HUGETLB flag (assuming that huge
> > pages have been allocated in the system).
> >
> > I found several messages related to the MAP_HUGETLB flag; the most
> > recent one was from 21 November:
> > http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
> >
> > I wonder what is the current status of this patch?
> >
>
