Re: In-Memory Columnar Store

From: knizhnik <knizhnik(at)garret(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: In-Memory Columnar Store
Date: 2013-12-09 19:40:41
Message-ID: 52A61CB9.3080906@garret.ru
Lists: pgsql-hackers

Hello!

I want to announce my implementation of an In-Memory Columnar Store
extension for PostgreSQL:

Documentation: http://www.garret.ru/imcs/user_guide.html
Sources: http://www.garret.ru/imcs-1.01.tar.gz

Any feedback, bug reports, and suggestions are welcome.

The vertical representation of data is stored in PostgreSQL shared memory,
which is why it is important to be able to utilize all available physical
memory.
Servers with a terabyte or more of RAM are no longer exotic, especially in
the financial world.
But Linux with standard 4kB pages imposes a limit on the maximal size of a
mapped memory segment: 256GB.
It is possible to overcome this limitation either by creating multiple
segments - but that requires too many changes in the PostgreSQL memory
manager - or by just setting the MAP_HUGETLB flag (assuming that huge
pages have been allocated in the system).

I found several messages related to the MAP_HUGETLB flag; the most recent
one was from November 21:
http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org

I wonder what the current status of this patch is?


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-11 14:33:20
Message-ID: CAHyXU0zUMHVmPE5HapY9ufhQdGJ5jJmCRaFuFst5zzhbrMPVLA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 9, 2013 at 1:40 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store
> extension for PostgreSQL:
>
> Documentation: http://www.garret.ru/imcs/user_guide.html
> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports, and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared memory,
> which is why it is important to be able to utilize all available physical
> memory. Servers with a terabyte or more of RAM are no longer exotic,
> especially in the financial world. But Linux with standard 4kB pages
> imposes a limit on the maximal size of a mapped memory segment: 256GB.
> It is possible to overcome this limitation either by creating multiple
> segments - but that requires too many changes in the PostgreSQL memory
> manager - or by just setting the MAP_HUGETLB flag (assuming that huge
> pages have been allocated in the system).
>
> I found several messages related to the MAP_HUGETLB flag; the most recent
> one was from November 21:
> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>
> I wonder what the current status of this patch is?

I looked over your extension. I think it's a pretty amazing example
of the postgres extension and type systems -- up there with postgis.
Very well done. How long did this take you to write?

The MAP_HUGETLB patch was marked 'returned with feedback'.
https://commitfest.postgresql.org/action/patch_view?id=1308. It seems
likely to be revived, perhaps in time for 9.4.

Honestly, I think your efforts here provide a further argument for adding
huge TLB support.

merlin


From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-11 15:14:17
Message-ID: 20131211151417.GM19242@aart.rice.edu
Lists: pgsql-hackers

On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote:
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store
> extension for PostgreSQL:
>
> Documentation: http://www.garret.ru/imcs/user_guide.html
> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports, and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared
> memory, which is why it is important to be able to utilize all
> available physical memory.

Hi,

This is very neat! The question I have, which applies to the matview
support as well, is "How can we transparently substitute usage of the
in-memory columnar store/matview in a SQL query?".

Regards,
Ken


From: knizhnik <knizhnik(at)garret(dot)ru>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-11 16:08:28
Message-ID: 52A88DFC.5020601@garret.ru
Lists: pgsql-hackers

Hello!

The implementation of IMCS itself took me about two months (including
testing and writing documentation).
But a large part of the code had been previously written by me for other
projects, so I was able to reuse it.
Most of the time was spent on integrating this code with PostgreSQL
(I was not so familiar with it before).

Certainly the columnar store implementations for Oracle (Oracle Database
In-Memory Option), DB2 (BLU Acceleration), ... are more convenient for
users: they can execute normal SQL queries and do not require users to
learn new functions and a new approach. But that requires a complete
redesign of the query engine (or an alternative implementation), which I
was not able to do.

This is why I try to provide the advantages of vertical data representation
(vector operations, parallel execution, data skipping) as well as the
advantages of fast access to in-memory data, all as a standard PostgreSQL
extension. There are obviously some limitations, and queries look more
complicated than in standard SQL...

But on the other hand, it is possible to write queries which can hardly be
expressed in standard SQL.
For example, calculating split-adjusted prices cannot be done in SQL
without using stored procedures.
To make usage of IMCS functions as simple as possible, I defined a large
number of operators for the most popular operations.
For example, the Volume-Weighted Average Price can be calculated just as:

select Volume//Close as VWAP from Quote_get();

It is even shorter than the analogous SQL statement:

select sum(Close*Volume)/sum(Volume) as VWAP from Quote;

Concerning integration with PostgreSQL, there were several problems.
Some of them seem to have no easy solution, but others are IMHO
imperfections in PostgreSQL which I hope will be fixed some day:

1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my
computer. Just defining a per-row insert trigger with an empty procedure
doubles the insertion time for 6 million records - from 7 to 15 seconds.
If the trigger procedure is not empty, then the time increases
proportionally to the number of performed calls.
In my case, inserting data with propagation to the columnar store using a
trigger takes about 80 seconds. But if I first load the data into the
PostgreSQL table without triggers and then insert it into the columnar
store using a load function (implemented in C), the total time is
7+9=16 seconds.

Certainly I realize that PL/pgSQL is an interpreted language. But, for
example, the equally interpreted Python is able to do 100 times more calls
per second. Unfortunately the profiler doesn't show any single bottleneck -
it looks like the long call time is caused by the large overhead of
initializing and resetting memory contexts and copying argument data.
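
To make the overhead concrete, here is a minimal sketch of the kind of
empty per-row trigger used for this measurement (the table and function
names are illustrative):

CREATE FUNCTION noop_trigger() RETURNS trigger AS $$
BEGIN
    RETURN NEW;  -- does nothing at all
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER quote_noop
    BEFORE INSERT ON Quote
    FOR EACH ROW EXECUTE PROCEDURE noop_trigger();

-- with this trigger in place, bulk insertion of 6M rows takes ~2x longer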

2. Inefficient implementation of expanding composite type columns using
the (foo()).* clause. In this case the function foo() will be invoked as
many times as there are fields in the returned composite type. Even when
the call is placed in the FROM list (thanks to LATERAL joins in 9.3),
PostgreSQL still sometimes performs redundant calls, which can be avoided
using a hack with an added "OFFSET 1" clause, as illustrated below.
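
A hedged illustration of the multiple-evaluation problem and the
workarounds, with foo() standing in for any function returning a
composite type:

SELECT (foo(t.x)).* FROM t;              -- foo() re-evaluated per column

SELECT f.* FROM t, LATERAL foo(t.x) f;   -- 9.3+: one call per input row

-- the OFFSET fence (commonly written OFFSET 0) keeps the planner from
-- flattening the subquery and re-expanding the call:
SELECT (f).* FROM (SELECT foo(t.x) AS f FROM t OFFSET 0) s;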

3. The 256GB limit on the size of a mapped shared memory segment on Linux.

Concerning the last problem: I have included in the IMCS distribution a
much simpler patch which just sets the MAP_HUGETLB flag when
a) it is defined in the system headers, and
b) the requested memory size is larger than 256GB.

In this case PostgreSQL currently just fails to start.
But certainly it would be more correct to control this flag through a
configuration parameter, because large pages can minimize MMU overhead
and so increase speed even if the size of used memory is less than 256GB
(this is why Oracle uses them widely).

On 12/11/2013 06:33 PM, Merlin Moncure wrote:
> On Mon, Dec 9, 2013 at 1:40 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> Hello!
>>
>> I want to announce my implementation of an In-Memory Columnar Store
>> extension for PostgreSQL:
>>
>> Documentation: http://www.garret.ru/imcs/user_guide.html
>> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>>
>> Any feedback, bug reports, and suggestions are welcome.
>>
>> The vertical representation of data is stored in PostgreSQL shared
>> memory, which is why it is important to be able to utilize all available
>> physical memory. Servers with a terabyte or more of RAM are no longer
>> exotic, especially in the financial world. But Linux with standard 4kB
>> pages imposes a limit on the maximal size of a mapped memory segment:
>> 256GB. It is possible to overcome this limitation either by creating
>> multiple segments - but that requires too many changes in the PostgreSQL
>> memory manager - or by just setting the MAP_HUGETLB flag (assuming that
>> huge pages have been allocated in the system).
>>
>> I found several messages related to the MAP_HUGETLB flag; the most
>> recent one was from November 21:
>> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>>
>> I wonder what the current status of this patch is?
> I looked over your extension. I think it's a pretty amazing example
> of the postgres extension and type systems -- up there with postgis.
> Very well done. How long did this take you to write?
>
> MAP_HUGETLB patch was marked 'returned with feedback'.
> https://commitfest.postgresql.org/action/patch_view?id=1308. It seems
> likely to be revived, perhaps in time for 9.4.
>
> Honestly, I think your efforts here provide a further argument for adding
> huge TLB support.
>
> merlin


From: knizhnik <knizhnik(at)garret(dot)ru>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org, Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-11 16:25:33
Message-ID: 52A891FD.8090901@garret.ru
Lists: pgsql-hackers

Hi,

It depends on what you mean by "transparently substitute".
If you want to be able to execute standard SQL queries using the columnar
store, then it seems to be impossible without rewriting the executor.
I provide another approach, based on calling standard functions which
perform manipulations not with scalar types but with timeseries.

For example, instead of the standard SQL

select sum(ClosePrice) from Quote;

I will have to write:

select cs_sum(ClosePrice) from Quote_get();

It looks similar, but not quite the same.
And for more complex queries the difference is larger.
For example, the query

select sum(score*volenquired)/sum(volenquired) from DbItem group by
(trader,desk,office);

can be written as

select agg_val, cs_cut(group_by,'c22c30c10') from
  (select (cs_project_agg(ss1.*)).* from
    (select (s1).sum/(s2).sum, (s1).groups from DbItem_get() q,
       cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1,
       cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2;

Looks too complex, doesn't it?
But the first two lines are just responsible for the reverse mapping: from
the vertical data representation back to normal horizontal tuples.
The good thing is that this query is executed more than 1000 times faster
(with default PostgreSQL configuration parameters except shared_buffers,
which was set large enough to fit all data in memory).

On 12/11/2013 07:14 PM, ktm(at)rice(dot)edu wrote:
> On Mon, Dec 09, 2013 at 11:40:41PM +0400, knizhnik wrote:
>> Hello!
>>
>> I want to announce my implementation of an In-Memory Columnar Store
>> extension for PostgreSQL:
>>
>> Documentation: http://www.garret.ru/imcs/user_guide.html
>> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>>
>> Any feedback, bug reports, and suggestions are welcome.
>>
>> The vertical representation of data is stored in PostgreSQL shared
>> memory, which is why it is important to be able to utilize all
>> available physical memory.
> Hi,
>
> This is very neat! The question I have, which applies to the matview
> support as well, is "How can we transparently substitute usage of the
> in-memory columnar store/matview in a SQL query?".
>
> Regards,
> Ken


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-11 18:33:08
Message-ID: CAHyXU0zpYpPpK5-keDrhBhXhBSA_qdK1DMhybwLRPzB_omsqdA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 11, 2013 at 10:08 AM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> 1. Calls in PL/pgSQL are very slow - about 1-2 microseconds on my
> computer. Just defining a per-row insert trigger with an empty procedure
> doubles the insertion time for 6 million records - from 7 to 15 seconds.
> If the trigger procedure is not empty, then the time increases
> proportionally to the number of performed calls.
> In my case, inserting data with propagation to the columnar store using a
> trigger takes about 80 seconds. But if I first load the data into the
> PostgreSQL table without triggers and then insert it into the columnar
> store using a load function (implemented in C), the total time is
> 7+9=16 seconds.

Yeah. For this problem, we unfortunately either have to try to
use standard sql functions in such a way that supports inlining (this
is a black art mostly, and fragile), or move logic out of the function
and into the query via things like window functions, or just deal with
the performance hit. postgres flavored SQL is pretty much the most
productive language on the planet AFAIC, but the challenge is always
performance.
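
For example, a sufficiently simple LANGUAGE sql function can be inlined by
the planner, making the per-call overhead disappear entirely; a hedged
sketch using the Quote table from upthread (the function name is
illustrative):

-- inlinable: a single SELECT expression with no side effects
CREATE FUNCTION weighted(price numeric, volume numeric) RETURNS numeric AS
$$ SELECT price * volume $$ LANGUAGE sql IMMUTABLE;

-- the planner substitutes the expression directly into the plan, unlike
-- an equivalent plpgsql function, which would be called once per row:
SELECT sum(weighted(Close, Volume)) / sum(Volume) AS vwap FROM Quote;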

Down the line, I am optimistic that per-call function overhead can be
optimized, probably by expanding what can be inlined somehow. The
problem is that this requires cooperation from the language executors,
which is not currently possible through the SPI interface, so I really
don't know.

> Certainly I realize that PL/pgSQL is an interpreted language. But, for
> example, the equally interpreted Python is able to do 100 times more
> calls per second. Unfortunately the profiler doesn't show any single
> bottleneck - it looks like the long call time is caused by the large
> overhead of initializing and resetting memory contexts and copying
> argument data.
>
> 2. Inefficient implementation of expanding composite type columns using
> the (foo()).* clause. In this case the function foo() will be invoked as
> many times as there are fields in the returned composite type. Even when
> the call is placed in the FROM list (thanks to LATERAL joins in 9.3),
> PostgreSQL still sometimes performs redundant calls, which can be avoided
> using a hack with an added "OFFSET 1" clause.

Yeah, this is a long-standing headache. LATERAL mostly deals with this,
but most cases (even pre-9.3) can be worked around one way or
another.

> 3. The 256GB limit on the size of a mapped shared memory segment on Linux.

I figure this will be solved fairly soon. It's a nice problem to have.

merlin


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>, knizhnik <knizhnik(at)garret(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-11 18:52:22
Message-ID: 1386787942.33209.YahooMailNeo@web162901.mail.bf1.yahoo.com
Lists: pgsql-hackers

"ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu> wrote:

> The question I have, which applies to the matview support as
> well, is "How can we transparently substitute usage of the
> in-memory columnar store/matview in a SQL query?".

My take on that regarding matviews is:

(1)  It makes no sense to start work on this without a far more
sophisticated concept of matview "freshness" (or "staleness", as
some products prefer to call it).

(2)  Work on query rewrite to use sufficiently fresh matviews to
optimize the execution of a query and work on "freshness" tracking
are orthogonal to work on incremental maintenance.

I have no plans to work on either matview freshness or rewrite, as
there seem to be several years' worth of work to get incremental
maintenance up to a level matching other products.  I welcome
anyone else to take on those other projects.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: desmodemone <desmodemone(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 00:06:44
Message-ID: CAEs9oFn920CSw_0k+TTa79cdF6zQC+TdHz3xwvbdXmXa_iEMZQ@mail.gmail.com
Lists: pgsql-hackers

2013/12/9 knizhnik <knizhnik(at)garret(dot)ru>

> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store
> extension for PostgreSQL:
>
> Documentation: http://www.garret.ru/imcs/user_guide.html
> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports, and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared memory,
> which is why it is important to be able to utilize all available physical
> memory. Servers with a terabyte or more of RAM are no longer exotic,
> especially in the financial world. But Linux with standard 4kB pages
> imposes a limit on the maximal size of a mapped memory segment: 256GB.
> It is possible to overcome this limitation either by creating multiple
> segments - but that requires too many changes in the PostgreSQL memory
> manager - or by just setting the MAP_HUGETLB flag (assuming that huge
> pages have been allocated in the system).
>
> I found several messages related to the MAP_HUGETLB flag; the most recent
> one was from November 21:
> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>
> I wonder what the current status of this patch is?

Hello,
excellent work! I began testing and it is very fast. By the way, I found a
strange case of an "endless" query with CPU at 100% when the value used as
a filter does not exist.

I am testing with postgres 9.3.1 on Debian, and I used the default values
for the extension except memory (512MB).

How to recreate the test case:

## create a table :

create table endless ( col1 int , col2 char(30) , col3 int ) ;

## insert some values:

insert into endless values ( 1, 'ahahahaha', 3);

insert into endless values ( 2, 'ghghghghg', 4);

## create the column store objects:

select cs_create('endless','col1','col2');
cs_create
-----------

(1 row)

## try and test column store :

select cs_avg(col3) from endless_get('ahahahaha');
cs_avg
--------
3
(1 row)

select cs_avg(col3) from endless_get('ghghghghg');
cs_avg
--------
4
(1 row)

## now select with a value that does not exist :

select cs_avg(col3) from endless_get('testing');

# and now it starts to loop on CPU and seems to never end; I had to
terminate the backend

Bye

Mat


From: knizhnik <knizhnik(at)garret(dot)ru>
To: desmodemone <desmodemone(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 06:53:00
Message-ID: 52A95D4C.1030204@garret.ru
Lists: pgsql-hackers

Thank you very much for reporting the problem.
And sorry for this bug and the lack of negative tests.

An attempt to access a nonexistent value causes autoloading of data from
the table to the columnar store (because the autoload property is enabled
by default), and since this entry is not present in the table, the code
falls into infinite recursion.
A patched version of IMCS is available at
http://www.garret.ru/imcs-1.01.tar.gz

I am going to place IMCS under version control now. Just looking for a
proper place for the repository...

On 12/12/2013 04:06 AM, desmodemone wrote:
>
>
>
> 2013/12/9 knizhnik <knizhnik(at)garret(dot)ru>
>
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store
> extension for PostgreSQL:
>
> Documentation: http://www.garret.ru/imcs/user_guide.html
> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports, and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared
> memory, which is why it is important to be able to utilize all
> available physical memory. Servers with a terabyte or more of RAM are
> no longer exotic, especially in the financial world. But Linux with
> standard 4kB pages imposes a limit on the maximal size of a mapped
> memory segment: 256GB. It is possible to overcome this limitation
> either by creating multiple segments - but that requires too many
> changes in the PostgreSQL memory manager - or by just setting the
> MAP_HUGETLB flag (assuming that huge pages have been allocated in the
> system).
>
> I found several messages related to the MAP_HUGETLB flag; the most
> recent one was from November 21:
> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>
> I wonder what the current status of this patch is?
>
> Hello,
> excellent work! I began testing and it is very fast. By the way, I found
> a strange case of an "endless" query with CPU at 100% when the value
> used as a filter does not exist.
>
> I am testing with postgres 9.3.1 on Debian, and I used the default
> values for the extension except memory (512MB).
>
> How to recreate the test case:
>
> ## create a table :
>
> create table endless ( col1 int , col2 char(30) , col3 int ) ;
>
> ## insert some values:
>
> insert into endless values ( 1, 'ahahahaha', 3);
>
> insert into endless values ( 2, 'ghghghghg', 4);
>
> ## create the column store objects:
>
> select cs_create('endless','col1','col2');
> cs_create
> -----------
>
> (1 row)
>
> ## try and test column store :
>
> select cs_avg(col3) from endless_get('ahahahaha');
> cs_avg
> --------
> 3
> (1 row)
>
> select cs_avg(col3) from endless_get('ghghghghg');
> cs_avg
> --------
> 4
> (1 row)
>
> ## now select with a value that does not exist :
>
> select cs_avg(col3) from endless_get('testing');
>
> # and now it starts to loop on CPU and seems to never end; I had to
> terminate the backend
>
> Bye
>
> Mat


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 07:42:44
Message-ID: CAFj8pRDv7dHivPVrCQFT3zz2a7-d9+nm6cwB-mQcxOWtkdrrDg@mail.gmail.com
Lists: pgsql-hackers

It is an interesting idea. For me, the significant information comes from
the comparison: it suggests we are doing something significantly wrong. A
memory engine should naturally be faster, but I don't think it can be
1000x.

Yesterday we did some tests which show that for large tables (5GB) our
hashing is not effective. Disabling hash join and using merge join
increased speed 2x.
On 9. 12. 2013 20:41, "knizhnik" <knizhnik(at)garret(dot)ru> wrote:
>
> Hello!
>
> I want to announce my implementation of an In-Memory Columnar Store
> extension for PostgreSQL:
>
> Documentation: http://www.garret.ru/imcs/user_guide.html
> Sources: http://www.garret.ru/imcs-1.01.tar.gz
>
> Any feedback, bug reports, and suggestions are welcome.
>
> The vertical representation of data is stored in PostgreSQL shared
> memory, which is why it is important to be able to utilize all
> available physical memory. Servers with a terabyte or more of RAM are
> no longer exotic, especially in the financial world. But Linux with
> standard 4kB pages imposes a limit on the maximal size of a mapped
> memory segment: 256GB. It is possible to overcome this limitation
> either by creating multiple segments - but that requires too many
> changes in the PostgreSQL memory manager - or by just setting the
> MAP_HUGETLB flag (assuming that huge pages have been allocated in the
> system).
>
> I found several messages related to the MAP_HUGETLB flag; the most
> recent one was from November 21:
> http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
>
> I wonder what the current status of this patch is?


From: knizhnik <knizhnik(at)garret(dot)ru>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 10:02:24
Message-ID: 52A989B0.5020407@garret.ru
Lists: pgsql-hackers

On 12/12/2013 11:42 AM, Pavel Stehule wrote:
>
> It is an interesting idea. For me, the significant information comes from
> the comparison: it suggests we are doing something significantly wrong. A
> memory engine should naturally be faster, but I don't think it can be
> 1000x.
>

Sorry, but I didn't fabricate these results;
below is a snapshot from my computer:

postgres=# select DbItem_load();
dbitem_load
-------------
9999998
(1 row)

postgres=# \timing
Timing is on.
postgres=# select cs_used_memory();
cs_used_memory
----------------
4441894912
(1 row)

postgres=# select agg_val, cs_cut(group_by,'c22c30c10') from
  (select (cs_project_agg(ss1.*)).* from
    (select (s1).sum/(s2).sum, (s1).groups from DbItem_get() q,
       cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1,
       cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2;
agg_val | cs_cut
------------------+------------------------------------------------------------
1.50028393511844 | ("John Coltrane","New York Corporates","New York")
....
Time: 506.125 ms

postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem
group by (trader,desk,office);
...
Time: 449328.645 ms
postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem
group by (trader,desk,office);
...
Time: 441530.689 ms

Please notice that the time of the second execution is almost the same as
the first, although all data can fit in the cache!

Certainly it was interesting to me to understand the reason for such bad
performance, and I found out two things:

1.
select sum(score*volenquired)/sum(volenquired) from DbItem
    group by (trader,desk,office);
and
select sum(score*volenquired)/sum(volenquired) from DbItem
    group by trader,desk,office;

are not the same queries (which is hard for a C programmer to understand :)
- the parenthesized form apparently groups by a single composite ROW value
rather than by three separate columns. And the first one is executed
significantly slower.

2. It is not enough to increase the "shared_buffers" parameter in
postgresql.conf; "work_mem" is also very important. When I increased it
from the default 1MB to 1GB, the query execution time was reduced to
7107.146 ms. So the real difference is ten times, not 1000 times.
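
For reference, a sketch of that tuning step (session level, so no edit of
postgresql.conf is needed):

SET work_mem = '1GB';
select sum(score*volenquired)/sum(volenquired) from DbItem
    group by trader,desk,office;
-- now ~7 seconds instead of ~450 seconds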

> Yesterday we did some tests which show that for large tables (5GB) our
> hashing is not effective. Disabling hash join and using merge join
> increased speed 2x.
> On 9. 12. 2013 20:41, "knizhnik" <knizhnik(at)garret(dot)ru> wrote:
> >
> > Hello!
> >
> > I want to announce my implementation of an In-Memory Columnar Store
> > extension for PostgreSQL:
> >
> > Documentation: http://www.garret.ru/imcs/user_guide.html
> > Sources: http://www.garret.ru/imcs-1.01.tar.gz
> >
> > Any feedback, bug reports, and suggestions are welcome.
> >
> > The vertical representation of data is stored in PostgreSQL shared
> > memory, which is why it is important to be able to utilize all
> > available physical memory. Servers with a terabyte or more of RAM are
> > no longer exotic, especially in the financial world. But Linux with
> > standard 4kB pages imposes a limit on the maximal size of a mapped
> > memory segment: 256GB. It is possible to overcome this limitation
> > either by creating multiple segments - but that requires too many
> > changes in the PostgreSQL memory manager - or by just setting the
> > MAP_HUGETLB flag (assuming that huge pages have been allocated in the
> > system).
> >
> > I found several messages related to the MAP_HUGETLB flag; the most
> > recent one was from November 21:
> > http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org
> >
> > I wonder what the current status of this patch is?


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 15:03:46
Message-ID: CAHyXU0wUtR6o4G1KTyCrxEnkHA96=wSNLcEDZHip4zgH2k0s9Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 12, 2013 at 4:02 AM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> On 12/12/2013 11:42 AM, Pavel Stehule wrote:
>
> It is an interesting idea. For me, the significant information comes from
> the comparison: it suggests we are doing something significantly wrong. A
> memory engine should naturally be faster, but I don't think it can be
> 1000x.
>
>
> Sorry, but I didn't fabricate these results;
> below is a snapshot from my computer:
>
>
> postgres=# select DbItem_load();
> dbitem_load
> -------------
> 9999998
> (1 row)
>
> postgres=# \timing
> Timing is on.
> postgres=# select cs_used_memory();
> cs_used_memory
> ----------------
> 4441894912
> (1 row)
>
> postgres=# select agg_val, cs_cut(group_by,'c22c30c10') from
>   (select (cs_project_agg(ss1.*)).* from
>     (select (s1).sum/(s2).sum, (s1).groups from DbItem_get() q,
>        cs_hash_sum(q.score*q.volenquired, q.trader||q.desk||q.office) s1,
>        cs_hash_sum(q.volenquired, q.trader||q.desk||q.office) s2) ss1) ss2;
> agg_val | cs_cut
> ------------------+------------------------------------------------------------
> 1.50028393511844 | ("John Coltrane","New York Corporates","New York")
> ....
> Time: 506.125 ms
>
> postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group
> by (trader,desk,office);
> ...
> Time: 449328.645 ms
> postgres=# select sum(score*volenquired)/sum(volenquired) from DbItem group
> by (trader,desk,office);
> ...
> Time: 441530.689 ms
>
> Please notice that the time of the second execution is almost the same
> as the first, although all data can fit in the cache!
>
> Certainly it was interesting to me to understand the reason for such bad
> performance, and I found out two things:
>
> 1.
> select sum(score*volenquired)/sum(volenquired) from DbItem
>     group by (trader,desk,office);
> and
> select sum(score*volenquired)/sum(volenquired) from DbItem
>     group by trader,desk,office;
>
> are not the same queries (which is hard for a C programmer to understand
> :) - the parenthesized form apparently groups by a single composite ROW
> value rather than by three separate columns. And the first one is
> executed significantly slower.
>
> 2. It is not enough to increase the "shared_buffers" parameter in
> postgresql.conf; "work_mem" is also very important. When I increased it
> from the default 1MB to 1GB, the query execution time was reduced to
> 7107.146 ms. So the real difference is ten times, not 1000 times.

Yeah. It's not fair to compare against an implementation that is
constrained to use only 1MB. For analytics work a huge work_mem is a
pretty typical setting. A 10x improvement is believable considering
you've removed all MVCC overhead, locking, buffer management, etc. and
have a simplified data structure.

merlin


From: knizhnik <knizhnik(at)garret(dot)ru>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 18:18:24
Message-ID: 52A9FDF0.6000604@garret.ru
Lists: pgsql-hackers

On 12/12/2013 07:03 PM, Merlin Moncure wrote:
> On Thu, Dec 12, 2013 at 4:02 AM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> Yeah. It's not fair to compare against an implementation that is
> constrained to use only 1MB. For analytics work a huge work_mem is a
> pretty typical setting. A 10x improvement is believable considering
> you've removed all MVCC overhead, locking, buffer management, etc. and
> have a simplified data structure. merlin
I agree that it is not a fair comparison. As an excuse I can say that I am
not an experienced PostgreSQL user, so I thought that setting
shared_buffers was enough to avoid disk access by PostgreSQL. Only after
getting such strange results did I start investigating how to properly
tune PostgreSQL parameters.

IMHO it is strange to see such small default values in the PostgreSQL
configuration - PostgreSQL is not an embedded database, and nowadays even
mobile devices have several gigabytes of memory...
Also it would be nice to have one single switch - how much physical memory
PostgreSQL may use - and let PostgreSQL split it in an optimal way.
For example, I have no idea how to optimally split memory between
"shared_buffers", "temp_buffers", "work_mem", and "maintenance_work_mem".
PostgreSQL itself should do this work much better than an inexperienced
administrator.

And one of the possible values of such a parameter could be "auto": make
it possible to automatically determine the available memory (it is not a
big deal to check the amount of available RAM in the system). I know that
vendors of big databases never try to simplify configuration and tuning of
their products, just because most of their profit comes from consulting.
But I think that this is not true for PostgreSQL.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: knizhnik <knizhnik(at)garret(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: In-Memory Columnar Store
Date: 2013-12-12 18:26:47
Message-ID: CAHyXU0zDetDik7X8zEbSgYK3VncvVL_su3qrgSshVrJOtTyN_w@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 12, 2013 at 12:18 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> IMHO it is strange to see such small default values in the PostgreSQL
> configuration.

This (the low default work_mem) is because of three things:

1) Most queries do not really need a lot of work_mem.
2) work_mem stacks with each query using it -- so with your 1MB
setting vs 1000 connections, you get a gigabyte. So some
conservatism is justified, although this setting tended to be much more
dangerous in the old days when we measured memory in megabytes.
3) Postgres does not query available physical memory for default
settings due to portability issues. So we tend to tune to the "common
denominator".

merlin