
Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL

Lists:	pgsql-announce pgsql-hackers

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	pgsql-announce(at)postgresql(dot)org
Subject:	IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-02 16:48:24
Message-ID:	52C59858.9090500@garret.ru
Lists:	pgsql-announce pgsql-hackers

I want to announce an implementation of an In-Memory Columnar Store
extension for PostgreSQL.
The vertical representation of the data is stored in PostgreSQL shared memory.
Various basic and sophisticated analytic operators are provided for
manipulating time series.

GitHub repository: https://github.com/knizhnik/imcs/
Documentation: http://www.garret.ru/imcs/user_guide.html
Sources: http://www.garret.ru/imcs-1.02.tar.gz

A columnar store manager stores data tables as sections of columns of
data rather than as rows of data.
Most traditional DBMSes store data in rows ("horizontally"): all
attributes of a record are stored together.
This approach allows the whole record to be loaded with one read
operation, which usually leads to better performance for OLTP
queries (which access or update single records). But OLAP queries
mostly perform operations on individual columns,
for example calculating the sum or average of some column. In this case
a vertical data representation, where the data for each column
is stored independently, is more efficient. There are several DBMSes on
the market which are based on the vertical model: Vertica,
SciDB, ... Most mainstream commercial databases also provide OLAP
extensions based on vertical storage:
BLU Acceleration for DB2, Oracle Database In-Memory Option, Microsoft
SQL Server columnstore indexes, ...

A columnar store (vertical representation of data) achieves
better performance than the classical horizontal
representation due to three factors:
* Reduced size of fetched data: only the columns involved in a query are
accessed.
* Vector operations. Applying an operator to a set of values (a tile) at
once minimizes interpretation cost.
Also, the SIMD instructions of modern processors accelerate execution of
vector operations.
* Data compression. Compression can of course also be applied to whole
records, but compressing each column independently can give much
better results without significant extra CPU overhead. For example, even
a simple compression algorithm like RLE
(run-length encoding) not only reduces the space used, but also
minimizes the number of operations performed.

From:	David Fetter <david(at)fetter(dot)org>
To:	knizhnik <knizhnik(at)garret(dot)ru>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-03 23:21:20
Message-ID:	20140103232120.GA4976@fetter.org
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 02, 2014 at 08:48:24PM +0400, knizhnik wrote:
> I want to announce an implementation of an In-Memory Columnar Store
> extension for PostgreSQL.
> The vertical representation of the data is stored in PostgreSQL shared memory.

Thanks for the hard work!

I noticed a couple of things about this that probably need some
improvement.

1. There are unexplained patches against other parts of PostgreSQL,
which means that they may break other parts of PostgreSQL in equally
inexplicable ways. Please rearrange the patch so it doesn't require
this. This leads to:

2. The add-on is not formatted as an EXTENSION, which would allow
people to add it or remove it cleanly.

Would you be so kind as to fix these?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-04 07:46:25
Message-ID:	52C7BC51.3030709@garret.ru
Lists:	pgsql-announce pgsql-hackers

Hi David,

Sorry, but I do not completely understand your suggestions:

1. IMCS actually contains only a single patch file, sysv_shmem.patch.
Applying this patch is not mandatory for using IMCS: it just solves the
problem of supporting more than 256 GB of shared memory.
Right now PostgreSQL is not able to use more than 256 GB of shared buffers
on Linux with standard 4 kB pages.
I found a proposal for using the MAP_HUGETLB flag in the commitfest:

http://www.postgresql.org/message-id/20131125032920.GA23793@toroid.org

but unfortunately it was rejected. Huge pages are used extensively by
Oracle, and I think they would also be useful for improving the
performance of PostgreSQL. So not just IMCS can benefit from this patch.
My patch is much simpler: I deliberately limited its scope to one file.
Of course, switching huge TLB pages on/off should really be done through
the postgresql.conf configuration file.

In any case, IMCS can be used without this patch: you just cannot
use more than 256 GB of memory, even if your system has more RAM.

2. I do not understand "The add-on is not formatted as an EXTENSION".
IMCS was created as a standard extension: I just followed the examples of
other PostgreSQL extensions included in the PostgreSQL distribution
(for example pg_stat_statements). It can be added using the "create
extension imcs" command and removed with "drop extension imcs".

If there are any violations of the PostgreSQL extension rules, please let
me know and I will fix them.
But I thought that I had done everything the proper way.

From:	David Fetter <david(at)fetter(dot)org>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-04 08:05:11
Message-ID:	20140104080511.GA12040@fetter.org
Lists:	pgsql-announce pgsql-hackers

I'm sorry I misunderstood the extension you wrote.

Is there some way not to use shared memory for it?

Cheers,
David.

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	David Fetter <david(at)fetter(dot)org>
Cc:	PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-04 08:11:35
Message-ID:	52C7C237.6080401@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/04/2014 12:05 PM, David Fetter wrote:
> I'm sorry I misunderstood the extension you wrote.
>
> Is there some way not to use shared memory for it?

No, IMCS ("In-Memory Columnar Store") is storing data in shared memory.
Of course I could allocate shared memory myself, but for portability
and ease of maintenance I decided to reuse the PostgreSQL shared memory
mechanism. The only requirement is that the IMCS extension (like the
pg_stat_statements extension) must be included in the
"shared_preload_libraries" list in postgresql.conf.

IMCS memory is not interleaved in any way with the shared memory used for
PostgreSQL shared buffers.
And the only limitation is the 256 GB limit on Linux, which can be
resolved using the patch included in the IMCS distribution.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-04 19:11:37
Message-ID:	21856.1388862697@sss.pgh.pa.us
Lists:	pgsql-announce pgsql-hackers

knizhnik <knizhnik(at)garret(dot)ru> writes:
> On 01/04/2014 12:05 PM, David Fetter wrote:
>> Is there some way not to use shared memory for it?

> No, IMCS ("In-Memory Columnar Store") is storing data in shared memory.

It would probably be better if it made use of the dynamic shared memory
features that exist in HEAD.

regards, tom lane

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-04 20:27:13
Message-ID:	52C86EA1.70703@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/04/2014 11:11 PM, Tom Lane wrote:
> knizhnik <knizhnik(at)garret(dot)ru> writes:
>> On 01/04/2014 12:05 PM, David Fetter wrote:
>>> Is there some way not to use shared memory for it?
>> No, IMCS ("In-Memory Columnar Store") is storing data in shared memory.
> It would probably be better if it made use of the dynamic shared memory
> features that exist in HEAD.
>
> regards, tom lane

Thank you, I will try it.
But I have some concerns:

1. I want IMCS to work with PostgreSQL versions that do not support DSM
(dynamic shared memory), like 9.2, 9.3.1, ...

2. IMCS uses the PostgreSQL hash table implementation (ShmemInitHash,
hash_search, ...).
Maybe I missed something (I just noticed DSM and have not had a chance to
investigate it), but it looks like a hash table cannot be allocated in DSM...

3. IMCS allocates memory using ShmemAlloc. If I use DSM, I have
to provide my own allocator (although creating a non-releasing memory
allocator should not be a big issue).

4. The current implementation of DSM still suffers from the 256 GB problem.
Of course I can create multiple segments and so provide a workaround
without using huge pages, but it complicates the allocator.

5. I wonder: if I dynamically add a new DSM segment, will it be available
to other PostgreSQL processes? For example, I run a query which loads data
into IMCS and so needs more space and allocates a new DSM segment. Then
another query is executed by another PostgreSQL process which tries to
access this data. This process is not forked from the process that created
this new DSM segment, so I do not understand how this segment will be
mapped into the address space of this process, preserving the address...
Of course I can prohibit dynamic extension of IMCS storage (hoping that
in this case there will be no such problem with DSM). But in this case
we will lose the main advantage of using DSM instead of the old scheme of
a plugin's private shared memory.

6. IMCS has some configuration parameters which have to be set through
postgresql.conf. So in any case the user has to edit the postgresql.conf file.
If DSM is used, it will not be necessary to add IMCS to the
shared_preload_libraries list. But I do not think that this is such a
restrictive and critical requirement, is it?

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-05 16:50:48
Message-ID:	CA+TgmoYPec_Awn+NM-ETnzOwyiYMmH-JaH1-LDOvFDqsFojsTw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Sat, Jan 4, 2014 at 3:27 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> 1. I want IMCS to work with PostgreSQL versions that do not support DSM
> (dynamic shared memory), like 9.2, 9.3.1, ...

Yeah. If it's loaded at postmaster start time, then it can work with
any version. On 9.4+, you could possibly make it work even if it's
loaded on the fly by using the dynamic shared memory facilities.
However, there are currently some limitations to those facilities that
make some things you might want to do tricky. There are pending
patches to lift some of these limitations.

> 2. IMCS uses the PostgreSQL hash table implementation (ShmemInitHash,
> hash_search, ...).
> Maybe I missed something (I just noticed DSM and have not had a chance to
> investigate it), but it looks like a hash table cannot be allocated in DSM...

It wouldn't be very difficult to write an analog of ShmemInitHash() on
top of the dsm_toc patch that is currently pending. A problem,
though, is that it's not currently possible to put LWLocks in dynamic
shared memory, and even spinlocks will be problematic if
--disable-spinlocks is used. I'm due to write a post about these
problems; perhaps I should go do that.

> 3. IMCS allocates memory using ShmemAlloc. If I use DSM, I have
> to provide my own allocator (although creating a non-releasing memory
> allocator should not be a big issue).

The dsm_toc infrastructure would solve this problem.

> 4. The current implementation of DSM still suffers from the 256 GB problem.
> Of course I can create multiple segments and so provide a workaround
> without using huge pages, but it complicates the allocator.

So it sounds like DSM should also support huge pages somehow. I'm not
sure what that should look like.

> 5. I wonder: if I dynamically add a new DSM segment, will it be available
> to other PostgreSQL processes? For example, I run a query which loads data
> into IMCS and so needs more space and allocates a new DSM segment. Then
> another query is executed by another PostgreSQL process which tries to
> access this data. This process is not forked from the process that created
> this new DSM segment, so I do not understand how this segment will be
> mapped into the address space of this process, preserving the address...
> Of course I can prohibit dynamic extension of IMCS storage (hoping that
> in this case there will be no such problem with DSM). But in this case
> we will lose the main advantage of using DSM instead of the old scheme of
> a plugin's private shared memory.

You can definitely dynamically add a new DSM segment; that's the point
of making it *dynamic* shared memory. What's a bit tricky as things
stand today is making sure that it sticks around. The current model
is that the DSM segment is destroyed when the last process unmaps it.
It would be easy enough to lift that limitation on systems other than
Windows; we could just add a dsm_keep_until_shutdown() API or
something similar. But on Windows, segments are *automatically*
destroyed *by the operating system* when the last process unmaps them,
so it's not quite so clear to me how we can allow it there. The main
shared memory segment is no problem because the postmaster always has
it mapped, even if no one else does, but that doesn't help for dynamic
shared memory segments.

> 6. IMCS has some configuration parameters which have to be set through
> postgresql.conf. So in any case the user has to edit the postgresql.conf file.
> If DSM is used, it will not be necessary to add IMCS to the
> shared_preload_libraries list. But I do not think that this is such a
> restrictive and critical requirement, is it?

I don't really see a problem here. One of the purposes of dynamic
shared memory (and dynamic background workers) is precisely that you
don't *necessarily* need to put extensions that use shared memory in
shared_preload_libraries - or in other words, you can add the
extension to a running server without restarting it. If you know in
advance that you will want it, you probably still *want* to put it in
shared_preload_libraries, but part of the idea is that we can get away
from requiring that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	james <james(at)mansionfamily(dot)plus(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	knizhnik(at)garret(dot)ru, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-05 17:34:23
Message-ID:	52C9979F.3060200@mansionfamily.plus.com
Lists:	pgsql-announce pgsql-hackers

On 05/01/2014 16:50, Robert Haas wrote:
> But on Windows, segments are *automatically*
> destroyed *by the operating system* when the last process unmaps them,
> so it's not quite so clear to me how we can allow it there. The main
> shared memory segment is no problem because the postmaster always has
> it mapped, even if no one else does, but that doesn't help for dynamic
> shared memory segments.
Surely you just need to DuplicateHandle into the parent process? If you
want to (tidily) dispose of it at some time, then you'll need to tell the
postmaster that you have done so and what the handle is in its process,
but if you just want it to stick around, then you can just pass it up.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	James Mansion <james(at)mansionfamily(dot)plus(dot)com>
Cc:	knizhnik <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-05 18:02:43
Message-ID:	CA+TgmoZ2EL2zdt=e4FONQPGjjn4Y2N=-cu7q+cS+vWRnDgnR+Q@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Sun, Jan 5, 2014 at 12:34 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
> On 05/01/2014 16:50, Robert Haas wrote:
>
> But on Windows, segments are *automatically*
> destroyed *by the operating system* when the last process unmaps them,
> so it's not quite so clear to me how we can allow it there. The main
> shared memory segment is no problem because the postmaster always has
> it mapped, even if no one else does, but that doesn't help for dynamic
> shared memory segments.
>
> Surely you just need to DuplicateHandle into the parent process? If you
> want to (tidily) dispose of it at some time, then you'll need to tell the
> postmaster that you have done so and what the handle is in its process,
> but if you just want it to stick around, then you can just pass it up.

Uh, I don't know, maybe? Does the postmaster have to do something to
receive the duplicated handle, or can the child just throw it over the
wall to the parent and let it rot until the postmaster finally exits?
The latter would be nicer for our purposes, perhaps, as running more
code from within the postmaster is risky for us. If a regular backend
process dies, the postmaster will restart everything and the database
will come back on line, but if the postmaster itself dies, we're hard
down.

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-05 18:28:16
Message-ID:	52C9A440.7010605@garret.ru
Lists:	pgsql-announce pgsql-hackers

From my point of view, it is not a big problem that it is not possible
to place LWLocks in DSM.
I can allocate LWLocks in the standard way, using RequestAddinLWLocks, and
use them for synchronization.

Concerning support for huge pages: actually, I do not think that it
should involve anything more than just setting the MAP_HUGETLB flag.
Allocating the corresponding number of huge pages should be done by the
system administrator.

And what I still do not completely understand is how DSM ensures that a
segment created by one PostgreSQL process will be mapped at the same
virtual memory address in all other PostgreSQL processes.
As far as I understand, right now (with standard PostgreSQL shared memory
segments) this is enforced by fork():
shared memory segments are allocated in one process, and all other
processes are forked from this process, inheriting these memory segments.

But if a new DSM segment is allocated during execution of some query,
then we have to add it to the virtual address space of all PostgreSQL
processes. Even if we somehow notify them all about the presence of the new
segment, there is absolutely no guarantee that all of them can map this
segment at the specified memory address (for some reason it may already be
used by some other shared object).
Or maybe DSM doesn't guarantee that a DSM segment is mapped at the same
address in all processes?
In that case it significantly complicates DSM usage: it will not be
possible to use direct pointers.

Could you please clarify how dynamically allocated DSM segments will be
shared by all PostgreSQL processes?

From:	james <james(at)mansionfamily(dot)plus(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	knizhnik(at)garret(dot)ru, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-05 18:44:38
Message-ID:	52C9A816.3060106@mansionfamily.plus.com
Lists:	pgsql-announce pgsql-hackers

On 05/01/2014 18:02, Robert Haas wrote:
> On Sun, Jan 5, 2014 at 12:34 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
>> On 05/01/2014 16:50, Robert Haas wrote:
>>
>> But on Windows, segments are *automatically*
>> destroyed *by the operating system* when the last process unmaps them,
>> so it's not quite so clear to me how we can allow it there. The main
>> shared memory segment is no problem because the postmaster always has
>> it mapped, even if no one else does, but that doesn't help for dynamic
>> shared memory segments.
>>
>> Surely you just need to DuplicateHandle into the parent process? If you
>> want to (tidily) dispose of it at some time, then you'll need to tell the
>> postmaster that you have done so and what the handle is in its process,
>> but if you just want it to stick around, then you can just pass it up.
> Uh, I don't know, maybe? Does the postmaster have to do something to
> receive the duplicated handle

In principle, no, so long as the child has a handle to the parent process
that has the appropriate permissions. Given that these processes have a
parent/child relationship, that shouldn't be too hard to arrange.
> , or can the child just throw it over the
> wall to the parent and let it rot until the postmaster finally exits?
Yes. Though it might be a good idea to record the handle somewhere (perhaps
in a table) so that any potential issues from an insane system spamming the
postmaster with handles are apparent.

I'm intrigued - how are the handles shared between children that are peers
in the current scheme? Some handle transfer must already be in place.

Could you share the handles to an immortal worker if you want to reduce any
potential impact on the postmaster?
> The latter would be nicer for our purposes, perhaps, as running more
> code from within the postmaster is risky for us. If a regular backend
> process dies, the postmaster will restart everything and the database
> will come back on line, but if the postmaster itself dies, we're hard
> down.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-06 03:11:33
Message-ID:	CA+TgmoZ8da3rCqA_uO0-W9G-R3Pab+AgF5mBJQFxgxx+CA5KFw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Sun, Jan 5, 2014 at 1:28 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> From my point of view it is not a big problem that it is not possible to
> place an LWLock in DSM.
> I can allocate LWLocks in the standard way, using RequestAddinLWLocks, and use
> them for synchronization.

Sure, well, that works fine if you're being loaded from
shared_preload_libraries. If you want to be able to load the
extension after startup time, though, it's no good.

> And what I still do not completely understand is how DSM ensures that a
> segment created by one PostgreSQL process will be mapped to the same
> virtual memory address in all other PostgreSQL processes.

It doesn't. One process calls dsm_create() to create a shared memory
segment. Other processes call dsm_attach() to attach it. There's no
guarantee that they'll map it at the same address; they'll just map it
somewhere.

> Or maybe DSM doesn't guarantee that a DSM segment is mapped to the same
> address in all processes?
> In that case it significantly complicates DSM usage: it will not be possible
> to use direct pointers.

Yeah, that's where we're at.
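A common way to live without same-address mappings is to store offsets from the segment base instead of raw pointers. The following is a minimal sketch in plain C, not PostgreSQL code: the `relptr_*` helpers and the `Node` type are hypothetical, and copying one buffer into another stands in for the same segment being mapped at two different addresses by dsm_attach() in two backends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relative pointer: a byte offset from the segment base.
 * Offset 0 is reserved to mean "null", so nothing may live at offset 0. */
typedef size_t relptr_t;

/* Convert an absolute pointer into an offset that is valid in any mapping. */
static relptr_t relptr_store(const char *base, const void *ptr)
{
    if (ptr == NULL)
        return 0;
    assert((const char *) ptr > base);
    return (size_t) ((const char *) ptr - base);
}

/* Convert an offset back into a pointer for *this* process's mapping. */
static void *relptr_resolve(char *base, relptr_t off)
{
    return off == 0 ? NULL : base + off;
}

/* A linked-list node that lives entirely inside the segment. */
typedef struct Node
{
    int      value;
    relptr_t next;          /* offset of the next node, 0 = end of list */
} Node;
```

Every dereference pays one extra addition, which is the usual price for position-independent data in a shared segment.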

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	James Mansion <james(at)mansionfamily(dot)plus(dot)com>
Cc:	Константин Книжник <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-06 03:14:49
Message-ID:	CA+TgmobgBhuMQ6CP8mkWfe+ktSAtfLxPwzbCo6OW0DM=JB4QJA@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Sun, Jan 5, 2014 at 1:44 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
> I'm intrigued - how are the handles shared between children that are peers
> in the current scheme? Some handle transfer must already be in place.

That's up to the application. After calling dsm_create(), you call
dsm_segment_handle() to get the 32-bit integer handle for that
segment. Then you have to get that to the other process(es) somehow.
If you're trying to share a handle with a background worker, you can
stuff it in bgw_main_arg. Otherwise, you'll probably need to store it
in the main shared memory segment, or a file, or whatever.
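As a rough illustration of the "or a file" option, the sketch below writes the 32-bit handle to a file in the creating process and reads it back in another. The helper names `publish_handle()` and `lookup_handle()` and the file path are hypothetical; in PostgreSQL the value would come from dsm_segment_handle() and be handed to dsm_attach(). For a background worker, the same value could instead be carried in bgw_main_arg, which is a Datum.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: the creating process records the 32-bit segment
 * handle in a file. Returns 0 on success, -1 on failure. */
static int publish_handle(const char *path, uint32_t handle)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%" PRIu32 "\n", handle) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: another process reads the handle back so that it
 * can attach to the segment. Returns 0 on success, -1 on failure. */
static int lookup_handle(const char *path, uint32_t *handle)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%" SCNu32, handle);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

Storing the handle in the main shared memory segment works the same way, except that readers and writers then need a lock around the slot.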

> Could you share the handles to an immortal worker if you want to reduce any
> potential impact on the postmaster?

You could, but this seems like thin justification for spawning another
process, and how immortal is that worker really?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	james(at)mansionfamily(dot)plus(dot)com
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, knizhnik(at)garret(dot)ru, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-06 04:20:39
Message-ID:	CAA4eK1+caBghCMrC6eWHNLwrUPqPZZrwONo-wLg34yJW2SBmzA@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Sun, Jan 5, 2014 at 11:04 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
> On 05/01/2014 16:50, Robert Haas wrote:
>
> But on Windows, segments are *automatically*
> destroyed *by the operating system* when the last process unmaps them,
> so it's not quite so clear to me how we can allow it there. The main
> shared memory segment is no problem because the postmaster always has
> it mapped, even if no one else does, but that doesn't help for dynamic
> shared memory segments.
>
> Surely you just need to DuplicateHandle into the parent process?

Ideally DuplicateHandle should work, but while going through the Windows
internals of the shared memory functions at the link below, I observed that
they mention it will work for child processes.
http://msdn.microsoft.com/en-us/library/ms810613.aspx
Refer to the section "Inheriting and duplicating memory-mapped file object
handles".

> If you
> want to (tidily) dispose of it at some time, then you'll need to tell the
> postmaster that you have done so and what the handle is in its process,
> but if you just want it to stick around, then you can just pass it up.

DuplicateHandle should work, but we need to communicate the handle
to the other process using IPC.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	james <james(at)mansionfamily(dot)plus(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Константин Книжник <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-06 20:59:13
Message-ID:	52CB1921.8000505@mansionfamily.plus.com
Lists:	pgsql-announce pgsql-hackers

On 06/01/2014 03:14, Robert Haas wrote:
> That's up to the application. After calling dsm_create(), you call
> dsm_segment_handle() to get the 32-bit integer handle for that
> segment. Then you have to get that to the other process(es) somehow.
> If you're trying to share a handle with a background worker, you can
> stuff it in bgw_main_arg. Otherwise, you'll probably need to store it
> in the main shared memory segment, or a file, or whatever.
Well, that works for sysv shm, sure. But I was interested (possibly
from Konstantin) in how the handle transfer takes place at the moment,
particularly whether it is possible to create additional segments
dynamically. I haven't looked at the extension at all.

From:	james <james(at)mansionfamily(dot)plus(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, knizhnik(at)garret(dot)ru, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-06 21:04:09
Message-ID:	52CB1A49.5060700@mansionfamily.plus.com
Lists:	pgsql-announce pgsql-hackers

On 06/01/2014 04:20, Amit Kapila wrote:
> DuplicateHandle should work, but we need to communicate the handle
> to the other process using IPC.
Only if the other process needs to use it. The IPC is not to transfer the
handle to the other process, just to tell it which slot in its handle table
contains the handle. If you just want to ensure that its use-count never
goes to zero, the receiver does not need to know what the handle is.

However ...

The point remains that you need to duplicate it into every process that
might want to use it subsequently, so it makes sense to DuplicateHandle
into the parent, and then to advertise that handle value publicly so that
other child processes can DuplicateHandle it back into their own process.

The handle value can change, so you also need to refer to the handle in the
parent and map it in each child to the local equivalent.
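To make the handle-counting argument concrete, here is a toy model in C. It is not the Win32 API: `install_handle()`, `duplicate_handle()` and the per-process table are hypothetical stand-ins that only mimic two rules described above, namely that the object lives while any process holds a handle to it, and that a duplicated handle can have a different value in the target process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, NOT the Win32 API. */
#define MAX_HANDLES 8

typedef struct KernelObject
{
    int  open_handles;      /* like the kernel's handle count */
    bool destroyed;         /* set when the last handle is closed */
} KernelObject;

typedef struct Process
{
    KernelObject *table[MAX_HANDLES];   /* handle value = slot index */
} Process;

/* Put obj in the first free slot of p; returns the handle value or -1. */
static int install_handle(Process *p, KernelObject *obj)
{
    for (int h = 0; h < MAX_HANDLES; h++)
    {
        if (p->table[h] == NULL)
        {
            p->table[h] = obj;
            obj->open_handles++;
            return h;
        }
    }
    return -1;                          /* handle table full */
}

/* Rough analogue of DuplicateHandle(): copy src's handle h into dst. */
static int duplicate_handle(Process *src, int h, Process *dst)
{
    assert(h >= 0 && h < MAX_HANDLES && src->table[h] != NULL);
    return install_handle(dst, src->table[h]);
}

static void close_handle(Process *p, int h)
{
    KernelObject *obj = p->table[h];
    p->table[h] = NULL;
    if (--obj->open_handles == 0)
        obj->destroyed = true;          /* the OS frees the object */
}

/* Process exit closes every handle the process still holds. */
static void process_exit(Process *p)
{
    for (int h = 0; h < MAX_HANDLES; h++)
        if (p->table[h] != NULL)
            close_handle(p, h);
}
```

In the model, once a backend duplicates its handle into the postmaster's table, every backend can exit without the segment disappearing, and other backends can duplicate the postmaster's (different) handle value back into their own tables.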

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	James Mansion <james(at)mansionfamily(dot)plus(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Константин Книжник <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-06 21:16:38
Message-ID:	CA+TgmoaAKAHtsZtXUBVy+G3zU75iGxww4vFfvhNd4sCvUjDH9A@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Mon, Jan 6, 2014 at 4:04 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
> The point remains that you need to duplicate it into every process that
> might want to use it subsequently, so it makes sense to DuplicateHandle
> into the parent, and then to advertise that handle value publicly so that
> other child processes can DuplicateHandle it back into their own process.

Well, right now we just reopen the same object from all of the
processes, which seems to work fine and doesn't require any of this
complexity. The only problem I don't know how to solve is how to make
a segment stick around for the whole postmaster lifetime. If
duplicating the handle into the postmaster without its knowledge gets
us there, it may be worth considering, but that doesn't seem like a
good reason to rework the rest of the existing mechanism.
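On POSIX systems, "reopening the same object" can be pictured as each process opening the shared object by name and mapping it independently. The sketch below uses a regular file under /tmp as the named object to stay self-contained; the file name and the helper names are assumptions made for illustration, and the sketch does not reproduce PostgreSQL's own implementation.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create the named object and give it its size; returns 0 on success. */
static int create_segment(const char *name, size_t size)
{
    int fd = open(name, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = ftruncate(fd, (off_t) size);   /* new bytes read as zero */
    close(fd);
    return rc;
}

/* Map an existing object by name. Each process does this on its own,
 * so nothing has to be handed over; the address may differ per process. */
static char *open_segment(const char *name, size_t size)
{
    int fd = open(name, O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* mapping outlives the fd */
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* Fork a child that reopens the segment by name and writes msg into it.
 * Returns the child's exit status (0 on success) or -1. */
static int child_writes(const char *name, size_t size, const char *msg)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        char *seg = open_segment(name, size);
        if (seg == NULL)
            _exit(1);
        strncpy(seg, msg, size - 1);        /* keep a trailing NUL */
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The child's mapping need not land at the same address as the parent's, which is exactly why pointers cannot be stored in such a segment.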

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Константин Книжник <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-08 03:20:50
Message-ID:	CAA4eK1+rwxm-8hyN5UmoapHxh6VoGBYBRaN+JahcTsS6Gf-RFw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jan 6, 2014 at 4:04 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
>> The point remains that you need to duplicate it into every process that
>> might want to use it subsequently, so it makes sense to DuplicateHandle
>> into the parent, and then to advertise that handle value publicly so that
>> other child processes can DuplicateHandle it back into their own process.
>
> Well, right now we just reopen the same object from all of the
> processes, which seems to work fine and doesn't require any of this
> complexity. The only problem I don't know how to solve is how to make
> a segment stick around for the whole postmaster lifetime. If
> duplicating the handle into the postmaster without its knowledge gets
> us there, it may be worth considering, but that doesn't seem like a
> good reason to rework the rest of the existing mechanism.

I think one has to try this to see if it works as per the need. If it's not
urgent, I can try this early next week?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Константин Книжник <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-08 18:51:27
Message-ID:	CA+TgmoY_XWJGMSdrtGaCZje+r_yxCpAy4v1hedftuLP1vUPLXw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Jan 6, 2014 at 4:04 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
>>> The point remains that you need to duplicate it into every process that
>>> might want to use it subsequently, so it makes sense to DuplicateHandle
>>> into the parent, and then to advertise that handle value publicly so that
>>> other child processes can DuplicateHandle it back into their own process.
>>
>> Well, right now we just reopen the same object from all of the
>> processes, which seems to work fine and doesn't require any of this
>> complexity. The only problem I don't know how to solve is how to make
>> a segment stick around for the whole postmaster lifetime. If
>> duplicating the handle into the postmaster without its knowledge gets
>> us there, it may be worth considering, but that doesn't seem like a
>> good reason to rework the rest of the existing mechanism.
>
> I think one has to try this to see if it works as per the need. If it's not
> urgent, I can try this early next week?

Anything we want to get into 9.4 has to be submitted by next Tuesday,
but I don't know that we're going to get this into 9.4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-08 19:39:01
Message-ID:	52CDA955.4030808@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/08/2014 10:51 PM, Robert Haas wrote:
> On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, Jan 6, 2014 at 4:04 PM, james <james(at)mansionfamily(dot)plus(dot)com> wrote:
>>>> The point remains that you need to duplicate it into every process that
>>>> might want to use it subsequently, so it makes sense to DuplicateHandle
>>>> into the parent, and then to advertise that handle value publicly so that
>>>> other child processes can DuplicateHandle it back into their own process.
>>> Well, right now we just reopen the same object from all of the
>>> processes, which seems to work fine and doesn't require any of this
>>> complexity. The only problem I don't know how to solve is how to make
>>> a segment stick around for the whole postmaster lifetime. If
>>> duplicating the handle into the postmaster without its knowledge gets
>>> us there, it may be worth considering, but that doesn't seem like a
>>> good reason to rework the rest of the existing mechanism.
>> I think one has to try this to see if it works as per the need. If it's not
>> urgent, I can try this early next week?
> Anything we want to get into 9.4 has to be submitted by next Tuesday,
> but I don't know that we're going to get this into 9.4.
>
I wonder what the intended use case of dynamic shared memory is.
Is it primarily oriented toward PostgreSQL extensions, or will it also be
used in PostgreSQL core?
In the case of extensions, shared memory may be needed to store some
collected/calculated information which will be used by extension functions.

The main advantage of DSM (from my point of view) compared with the existing
mechanism of preloaded extensions is that it is not necessary to restart the
server to add a new extension requiring shared memory.
A DSM segment can be attached or created by the _PG_init function of the
loaded module.
But there is not much sense in this mechanism if the segment is deleted
when there are no more processes attached to it.
So to make DSM really useful for extensions, it needs some mechanism to
pin a segment in memory for the whole server/extension lifetime.
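The pinning behavior requested here can be modeled with a reference count plus a flag. This is a toy model written for illustration: `SegmentState`, `seg_attach()`, `seg_detach()` and `seg_pin()` are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: a segment whose memory is released when the last attached
 * process detaches, unless it has been pinned. */
typedef struct SegmentState
{
    int  refcnt;        /* number of processes currently attached */
    bool pinned;        /* keep alive for the server's lifetime */
    bool released;      /* memory has been given back to the OS */
} SegmentState;

static void seg_attach(SegmentState *s)
{
    assert(!s->released);
    s->refcnt++;
}

static void seg_detach(SegmentState *s)
{
    assert(s->refcnt > 0);
    if (--s->refcnt == 0 && !s->pinned)
        s->released = true;
}

/* Hypothetical pin operation, e.g. called once after creating the segment
 * from the module's _PG_init(). */
static void seg_pin(SegmentState *s)
{
    s->pinned = true;
}
```

Without the pin, the first backend that creates the segment and then exits takes the segment with it; with the pin, later backends can still attach.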

Maybe I am wrong, but I do not see any reason for creating multiple
DSM segments in the same extension.
And the total number of DSM segments is expected to be small (<10).
The same is true for the synchronization primitives (LWLocks, for example)
needed to synchronize access to these DSM segments. So I am not sure that
the ability to place locks in DSM is really so critical...
We could just reserve some space for LWLocks that can be used by
extensions, so that LWLockAssign() can be used without
RequestAddinLWLocks(), or so that RequestAddinLWLocks() can be used not
only from preloaded extensions.

IMHO the main trouble with DSM is the lack of a guarantee that a segment is
always mapped to the same virtual address.
Without such a guarantee it is not possible to use direct (normal)
pointers inside DSM.
But there seems to be no reasonable solution.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 17:22:36
Message-ID:	CA+TgmoZSnC2ehE219cmyJt+PfLkOJb9SawB-cQnvZunEXg-gQw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Wed, Jan 8, 2014 at 2:39 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> I wonder what the intended use case of dynamic shared memory is.
> Is it primarily oriented toward PostgreSQL extensions, or will it also be
> used in PostgreSQL core?

My main motivation is that I want to use it to support parallel query.
There is unfortunately quite a bit of work left to be done before we
can make that a reality, but that's the goal.

> Maybe I am wrong, but I do not see any reason for creating multiple DSM
> segments in the same extension.

Right.

> And the total number of DSM segments is expected to be small (<10). The
> same is true for the synchronization primitives (LWLocks, for example) needed
> to synchronize access to these DSM segments. So I am not sure that the
> ability to place locks in DSM is really so critical...
> We could just reserve some space for LWLocks that can be used by extensions,
> so that LWLockAssign() can be used without RequestAddinLWLocks(), or so that
> RequestAddinLWLocks() can be used not only from preloaded extensions.

If you're doing all of this at postmaster startup time, that all works
fine. If you want to be able to load up an extension on the fly, then
it doesn't. You can only RequestAddinLWLocks() at postmaster start
time, not afterwards, so currently any extension that wants to use
lwlocks has to be loaded at postmaster startup time, or you're out of
luck.

Well. Technically we reserve something like 3 extra lwlocks that
could be assigned later. But relying on those to be available is not
very reliable, and also, 3 is not very many, considering that we have
something north of 32k core lwlocks in the default configuration.
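The limitation can be pictured as a pool whose size is fixed at postmaster start. The sketch below is a hypothetical model, not the real LWLock code: `assign_spare_lock()` simply hands out indexes from a reserve of `NUM_SPARE_LOCKS` and fails once the reserve is used up, which is what an extension loaded after startup would run into.

```c
#include <assert.h>

/* Toy model: locks set aside at startup for extensions loaded later.
 * The reserve is sized once and cannot grow while the server runs. */
#define NUM_SPARE_LOCKS 3   /* "something like 3" in the current scheme */

typedef struct SpareLockPool
{
    int next_free;          /* index of the next unassigned lock */
} SpareLockPool;

/* Returns a lock index, or -1 once the reserve is exhausted. */
static int assign_spare_lock(SpareLockPool *pool)
{
    if (pool->next_free >= NUM_SPARE_LOCKS)
        return -1;
    return pool->next_free++;
}
```

Raising `NUM_SPARE_LOCKS` (for example to 30) only moves the point at which late-loaded extensions fail; it does not remove it.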

> IMHO the main trouble with DSM is the lack of a guarantee that a segment is
> always mapped to the same virtual address.
> Without such a guarantee it is not possible to use direct (normal) pointers
> inside DSM.
> But there seems to be no reasonable solution.

Yeah, that basically sucks. But it's very hard to do any better. At
least on a 64-bit platform, there's an awful lot of address space
available, and in theory it ought to be possible to find a portion of
that address space that isn't in use by any Postgres process and have
all of the backends map the shared memory segment there. But there's
no portable way to do that, and it seems like it would require an
awful lot of IPC to achieve consensus on where to put a new mapping.

On non-Windows platforms, Noah had the idea that we could reserve a large
chunk of address space mapped as PROT_NONE and then overwrite it with
mappings later as needed. However, I'm not sure how portable that is
or whether it'll cause performance consequences (like page table
bloat) if the space doesn't end up getting used (or if it does). And
unless you have an awful lot of space available, it's hard to be sure
that new mappings are going to fit. And then there's Windows.
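For reference, this is roughly what Noah's idea looks like on Linux: reserve a range with PROT_NONE, then overlay part of it with a read/write shared mapping using MAP_FIXED. It is a sketch under stated assumptions: MAP_NORESERVE behavior is Linux-specific, and anonymous shared memory is used only to keep the example self-contained, whereas a real implementation would map a named segment so that unrelated backends could attach to it.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void)
{
    return (size_t) sysconf(_SC_PAGESIZE);
}

/* Reserve address space without committing memory: PROT_NONE pages cannot
 * be touched, and MAP_NORESERVE asks Linux not to reserve swap for them. */
static char *reserve_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* Replace part of the reservation with a real read/write shared mapping.
 * MAP_FIXED silently overwrites whatever is mapped at that address, so the
 * caller must pass a page-aligned range inside its own reservation. */
static char *map_into_region(char *region, size_t offset, size_t len)
{
    void *p = mmap(region + offset, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}
```

If every backend inherits the same reservation from the postmaster, a segment placed at the same offset lands at the same address everywhere; the open questions Robert raises (portability, page-table cost, running out of room) are not addressed by the sketch.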

It would be nice to have better operating system support for this.
For example, IIUC, 64-bit Linux has 128TB of address space available
for user processes. When you clone(), it can either share the entire
address space (i.e. it's a thread) or none of it (i.e. it's a
process). There's no option to, say, share 64TB and not the other
64TB, which would be ideal for us. We could then map dynamic shared
memory segments into the shared portion of the address space and do
backend-private allocations in the unshared part. Of course, even if
we had that, it wouldn't be portable, so who knows how much good it
would do. But it would be awfully nice to have the option.

I haven't given up hope that we'll some day find a way to make
same-address mappings work, at least on some platforms. But I don't
expect it to happen soon.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	knizhnik <knizhnik(at)garret(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 17:46:03
Message-ID:	CAGTBQpbfzaRS06L6=RFQXSNwLM8DCbJX-Q4X-1HZG=5nyjDZ5g@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It would be nice to have better operating system support for this.
> For example, IIUC, 64-bit Linux has 128TB of address space available
> for user processes. When you clone(), it can either share the entire
> address space (i.e. it's a thread) or none of it (i.e. it's a
> process). There's no option to, say, share 64TB and not the other
> 64TB, which would be ideal for us. We could then map dynamic shared
> memory segments into the shared portion of the address space and do
> backend-private allocations in the unshared part. Of course, even if
> we had that, it wouldn't be portable, so who knows how much good it
> would do. But it would be awfully nice to have the option.

You can map a segment at fork time, and unmap it after forking. That
doesn't really use RAM, since it's supposed to be lazily allocated (it
can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
but I don't think that's portable).

That guarantees it's free.

Next, you can map shared memory at explicit addresses (Linux's mmap
has support for that, and I seem to recall Windows did too).

All you have to do is some bookkeeping in shared memory (so all
processes can coordinate new mappings).
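The bookkeeping described above could be as small as a bump allocator over a range that every process reserved at the same address (for instance, one inherited across fork()). The `AddressBook` structure below is hypothetical and assumed to live in the main shared memory segment; it only decides offsets, after which each process would map the new segment at its reservation base plus that offset with MAP_FIXED. Locking and freeing are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping record shared by all backends. Real code would
 * need a lock around address_book_alloc(); this sketch omits it. */
typedef struct AddressBook
{
    size_t region_size;  /* size of the range every process reserved */
    size_t page_size;    /* must be a power of two */
    size_t next_offset;  /* first unassigned byte; never exceeds region_size */
} AddressBook;

/* Round len up to a whole number of pages. */
static size_t round_to_page(size_t len, size_t page_size)
{
    return (len + page_size - 1) & ~(page_size - 1);
}

/* Assign an offset for a new segment, or SIZE_MAX if it does not fit. */
static size_t address_book_alloc(AddressBook *book, size_t len)
{
    size_t need = round_to_page(len, book->page_size);
    if (need == 0 || need > book->region_size - book->next_offset)
        return SIZE_MAX;
    size_t off = book->next_offset;
    book->next_offset += need;
    return off;
}
```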

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Константин Книжник <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:09:25
Message-ID:	CAA4eK1JTa_iusGTHp=kmtWcg-Lqgszzvk8Ek9iu3td3Wm0_BTQ@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>
>>> Well, right now we just reopen the same object from all of the
>>> processes, which seems to work fine and doesn't require any of this
>>> complexity. The only problem I don't know how to solve is how to make
>>> a segment stick around for the whole postmaster lifetime. If
>>> duplicating the handle into the postmaster without its knowledge gets
>>> us there, it may be worth considering, but that doesn't seem like a
>>> good reason to rework the rest of the existing mechanism.
>>
>> I think one has to try this to see if it works as per the need. If it's not
>> urgent, I can try this early next week?
>
> Anything we want to get into 9.4 has to be submitted by next Tuesday,
> but I don't know that we're going to get this into 9.4.

Using DuplicateHandle(), we can make a segment stick around for the
postmaster's lifetime. I used the test below (with the dsm_demo module) to
verify this:

Session 1:
select dsm_demo_create('this message is from session-1');
dsm_demo_create
-----------------
827121111

Session 2:
select dsm_demo_read(827121111);
dsm_demo_read
----------------------------
this message is from session-1
(1 row)

Session 1:
\q

-- up to this point it works without DuplicateHandle as well

Session 2:
select dsm_demo_read(827121111);
dsm_demo_read
----------------------------
this message is from session-1
(1 row)

Session 2:
\q

Session 3:
select dsm_demo_read(827121111);
dsm_demo_read
----------------------------
this message is from session-1
(1 row)

-- the above shows that the segment stays around.

Note: currently I have to bypass the code below in dsm_attach(), as it
assumes that the segment will not survive once it is removed from the
control segment.

    /*
     * If we didn't find the handle we're looking for in the control
     * segment, it probably means that everyone else who had it mapped,
     * including the original creator, died before we got to this point.
     * It's up to the caller to decide what to do about that.
     */
    if (seg->control_slot == INVALID_CONTROL_SLOT)
    {
        dsm_detach(seg);
        return NULL;
    }

Could you let me know what exactly you are expecting in the patch: just a
call to DuplicateHandle() after CreateFileMapping(), or something else as
well?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:18:39
Message-ID:	52CEF60F.9070206@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/09/2014 09:22 PM, Robert Haas wrote:
> On Wed, Jan 8, 2014 at 2:39 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> I wonder what the intended use case of dynamic shared memory is.
>> Is it primarily oriented toward PostgreSQL extensions, or will it also be
>> used in PostgreSQL core?
> My main motivation is that I want to use it to support parallel query.
> There is unfortunately quite a bit of work left to be done before we
> can make that a reality, but that's the goal.

I do not want to waste your time, but this topic is very interesting to
me, and I would be very pleased if you could say a few words about how DSM
can help to implement parallel query processing.
It seems to me that the main complexity is in the optimizer: it needs to
split the query plan into several subplans which can be executed
concurrently and then merge their partial results.
As far as I understand, it is not possible to use multithreading for
parallel query execution because most of the PostgreSQL code is
non-reentrant. So we need to execute these subplans in several processes.
And unlike with threads, the only efficient way of exchanging data between
processes is shared memory. So it is clear why we need shared memory
for parallel query execution. But why does it have to be dynamic? Why can it
not be preallocated at startup time, like most of the other resources used by
PostgreSQL?
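The process-based model described here can be illustrated with a small C sketch: a parent splits a scan into slices (standing in for subplans), forks one worker per slice, lets each worker write its partial result into a shared anonymous mapping, and then merges the partials. This only illustrates the data-exchange pattern; `parallel_sum()` and `sum_slice()` are hypothetical functions, and because the mapping here is created before fork(), the sketch sidesteps the dynamic-allocation question entirely.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sum one slice; stands in for executing one "subplan". */
static long sum_slice(const int *data, size_t lo, size_t hi)
{
    long s = 0;
    for (size_t i = lo; i < hi; i++)
        s += data[i];
    return s;
}

/* Returns the total, or -1 if the arguments or the mapping are invalid. */
static long parallel_sum(const int *data, size_t n, int nworkers)
{
    if (nworkers < 1)
        return -1;

    size_t bytes = sizeof(long) * (size_t) nworkers;
    /* One result slot per worker, shared with every child forked below. */
    long *partial = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (partial == MAP_FAILED)
        return -1;

    for (int w = 0; w < nworkers; w++)
    {
        size_t lo = n * (size_t) w / (size_t) nworkers;
        size_t hi = n * (size_t) (w + 1) / (size_t) nworkers;
        pid_t pid = fork();
        if (pid == 0)
        {
            partial[w] = sum_slice(data, lo, hi);   /* worker process */
            _exit(0);
        }
        if (pid < 0)
            partial[w] = sum_slice(data, lo, hi);   /* fork failed: do it here */
    }

    /* Reap every child (fine for this self-contained sketch). */
    for (;;)
    {
        if (wait(NULL) < 0 && errno != EINTR)
            break;                                  /* ECHILD: none left */
    }

    /* Merge the partial results written by the workers. */
    long total = 0;
    for (int w = 0; w < nworkers; w++)
        total += partial[w];
    munmap(partial, bytes);
    return total;
}
```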

>
>> Maybe I am wrong, but I do not see any reason for creating multiple DSM
>> segments in the same extension.
> Right.
>
>> And the total number of DSM segments is expected to be small (<10). The
>> same is true for the synchronization primitives (LWLocks, for example) needed
>> to synchronize access to these DSM segments. So I am not sure that the
>> ability to place locks in DSM is really so critical...
>> We could just reserve some space for LWLocks that can be used by extensions,
>> so that LWLockAssign() can be used without RequestAddinLWLocks(), or so that
>> RequestAddinLWLocks() can be used not only from preloaded extensions.
> If you're doing all of this at postmaster startup time, that all works
> fine. If you want to be able to load up an extension on the fly, then
> it doesn't. You can only RequestAddinLWLocks() at postmaster start
> time, not afterwards, so currently any extension that wants to use
> lwlocks has to be loaded at postmaster startup time, or you're out of
> luck.
>
> Well. Technically we reserve something like 3 extra lwlocks that
> could be assigned later. But relying on those to be available is not
> very reliable, and also, 3 is not very many, considering that we have
> something north of 32k core lwlocks in the default configuration.

3 is definitely too small.
But you agreed with me that the number of DSM segments will not be very large.
And if we do not need fine-grained locking (and IMHO it is not needed for
most extensions), then we need just a few locks (most likely one) per DSM
segment.
It means that if instead of 3 we reserve, let's say, 30 LWLocks, then it
will be enough for most extensions. And there will be almost no extra
resource overhead, because, as you wrote, PostgreSQL has 32k locks in
the default configuration.

Certainly, if we need an independent lock for each page of DSM memory, then
there will be no other choice except placing locks in the DSM segment
itself. But once again, I do not think that most extensions needing
shared memory will use such fine-grained locking.

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:24:59
Message-ID:	52CEF78B.8040000@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/09/2014 09:46 PM, Claudio Freire wrote:
> On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> It would be nice to have better operating system support for this.
>> For example, IIUC, 64-bit Linux has 128TB of address space available
>> for user processes. When you clone(), it can either share the entire
>> address space (i.e. it's a thread) or none of it (i.e. it's a
>> process). There's no option to, say, share 64TB and not the other
>> 64TB, which would be ideal for us. We could then map dynamic shared
>> memory segments into the shared portion of the address space and do
>> backend-private allocations in the unshared part. Of course, even if
>> we had that, it wouldn't be portable, so who knows how much good it
>> would do. But it would be awfully nice to have the option.
> You can map a segment at fork time, and unmap it after forking. That
> doesn't really use RAM, since it's supposed to be lazily allocated (it
> can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
> but I don't think that's portable).
>
> That guarantees it's free.
>
> Next, you can map shared memory at explicit addresses (linux's mmap
> has support for that, and I seem to recall Windows did too).
>
> All you have to do, is some book-keeping in shared memory (so all
> processes can coordinate new mappings).
As far as I understand, the main advantage of DSM is that a segment can be
allocated at any time, not only at fork time.
And it is not because of memory consumption: even without unmap,
allocating some memory region doesn't cause loss of physical memory.
And there is usually no problem with exhaustion of virtual address space on a
64-bit architecture. Moreover, with some combination of flags (such as
MAP_NORESERVE), it is usually possible to completely eliminate the overhead
of reserving some address range in virtual space. But mapping a
dynamically created segment (not at fork time) to the same address
really seems to be a big challenge.

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:30:40
Message-ID:	CAGTBQpZK2vjGj=Cju7vLXKt_jr6Q6n4h9eT5or2COUayOs0A8Q@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 9, 2014 at 4:24 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> On 01/09/2014 09:46 PM, Claudio Freire wrote:
>>
>> On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>
>>> It would be nice to have better operating system support for this.
>>> For example, IIUC, 64-bit Linux has 128TB of address space available
>>> for user processes. When you clone(), it can either share the entire
>>> address space (i.e. it's a thread) or none of it (i.e. it's a
>>> process). There's no option to, say, share 64TB and not the other
>>> 64TB, which would be ideal for us. We could then map dynamic shared
>>> memory segments into the shared portion of the address space and do
>>> backend-private allocations in the unshared part. Of course, even if
>>> we had that, it wouldn't be portable, so who knows how much good it
>>> would do. But it would be awfully nice to have the option.
>>
>> You can map a segment at fork time, and unmap it after forking. That
>> doesn't really use RAM, since it's supposed to be lazily allocated (it
>> can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
>> but I don't think that's portable).
>>
>> That guarantees it's free.
>>
>> Next, you can map shared memory at explicit addresses (linux's mmap
>> has support for that, and I seem to recall Windows did too).
>>
>> All you have to do, is some book-keeping in shared memory (so all
>> processes can coordinate new mappings).
>
> As far as I undersand the main advantage of DSM is that segment can be
> allocated at any time - not only at fork time.
> And it is not because of memory consumption: even without unmap, allocation
> of some memory region doesn't cause loose pg physical memory. And there are
> usually no problem with exhaustion of virtual space at 64-bit architecture.
> But using some combination of flags (as MAP_NORESERVE), it is usually
> possible to completely eliminate overhead of reserving some address range in
> virtual space. But mapping dynamically created segment (not at fork time) to
> the same address really seems to be a big challenge.

At fork time I only wrote about reserving the address space. After
reserving it, all you have to do is implement an allocator that works
in shared memory (protected by a lwlock of course).

In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
shared memory to coordinate returning an already mapped region (same
address which is guaranteed to work since we reserved that region), or
allocate one (within the reserved address space).

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:30:59
Message-ID:	52CEF8F3.9050704@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/09/2014 11:09 PM, Amit Kapila wrote:
> On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> Well, right now we just reopen the same object from all of the
>>>> processes, which seems to work fine and doesn't require any of this
>>>> complexity. The only problem I don't know how to solve is how to make
>>>> a segment stick around for the whole postmaster lifetime. If
>>>> duplicating the handle into the postmaster without its knowledge gets
>>>> us there, it may be worth considering, but that doesn't seem like a
>>>> good reason to rework the rest of the existing mechanism.
>>> I think one has to try this to see if it works as per the need. If it's not
>>> urgent, I can try this early next week?
>> Anything we want to get into 9.4 has to be submitted by next Tuesday,
>> but I don't know that we're going to get this into 9.4.
> Using DuplicateHandle(), we can make segment stick for Postmaster
> lifetime. I have used below test (used dsm_demo module) to verify:
> Session - 1
> select dsm_demo_create('this message is from session-1');
> dsm_demo_create
> -----------------
> 827121111
>
> Session - 2
> -----------------
> select dsm_demo_read(827121111);
> dsm_demo_read
> ----------------------------
> this message is from session-1
> (1 row)
>
> Session-1
> \q
>
> -- till here it will work without DuplicateHandle as well
>
> Session -2
> select dsm_demo_read(827121111);
> dsm_demo_read
> ----------------------------
> this message is from session-1
> (1 row)
>
> Session -2
> \q
>
> Session -3
> select dsm_demo_read(827121111);
> dsm_demo_read
> ----------------------------
> this message is from session-1
> (1 row)
>
> -- above shows that handle stays around.
>
> Note -
> Currently I have to bypass below code in dsm_attach(), as it assumes
> segment will not stay if it's removed from control file.
>
> /*
> * If we didn't find the handle we're looking for in the control
> * segment, it probably means that everyone else who had it mapped,
> * including the original creator, died before we got to this point.
> * It's up to the caller to decide what to do about that.
> */
> if (seg->control_slot == INVALID_CONTROL_SLOT)
> {
> dsm_detach(seg);
> return NULL;
> }
>
>
> Could you let me know what exactly you are expecting in patch,
> just a call to DuplicateHandle() after CreateFileMapping() or something
> else as well?

As far as I understand, DuplicateHandle() should really do the trick:
protect the segment from deallocation.
But should the postmaster somehow be notified about this handle?
For example, if we really want to delete this segment (DROP EXTENSION),
we should somehow make the postmaster close this handle.
How can that be done?

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:36:48
Message-ID:	CAA4eK1LMiSvf1WALkWMUL5b+86DpgeSgikAiZb0DVkuW8eJymw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Fri, Jan 10, 2014 at 1:00 AM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
> On 01/09/2014 11:09 PM, Amit Kapila wrote:
>>
>>
>> Using DuplicateHandle(), we can make segment stick for Postmaster
>> lifetime. I have used below test (used dsm_demo module) to verify:
>
> As far as I understand DuplicateHandle() should really do the trick: protect
> segment from deallocation.
> But should postmaster be somehow notified about this handle?
> For example, if we really wants to delete this segment (drop extension), we
> should somehow make Postmaster to close this handle.
> How it can be done?

I think we need to use some form of IPC to communicate it to the postmaster.
I can't think of any other way at the moment.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:39:36
Message-ID:	52CEFAF8.5080406@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/09/2014 11:30 PM, Claudio Freire wrote:
> On Thu, Jan 9, 2014 at 4:24 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> On 01/09/2014 09:46 PM, Claudio Freire wrote:
>>> On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> It would be nice to have better operating system support for this.
>>>> For example, IIUC, 64-bit Linux has 128TB of address space available
>>>> for user processes. When you clone(), it can either share the entire
>>>> address space (i.e. it's a thread) or none of it (i.e. it's a
>>>> process). There's no option to, say, share 64TB and not the other
>>>> 64TB, which would be ideal for us. We could then map dynamic shared
>>>> memory segments into the shared portion of the address space and do
>>>> backend-private allocations in the unshared part. Of course, even if
>>>> we had that, it wouldn't be portable, so who knows how much good it
>>>> would do. But it would be awfully nice to have the option.
>>> You can map a segment at fork time, and unmap it after forking. That
>>> doesn't really use RAM, since it's supposed to be lazily allocated (it
>>> can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
>>> but I don't think that's portable).
>>>
>>> That guarantees it's free.
>>>
>>> Next, you can map shared memory at explicit addresses (linux's mmap
>>> has support for that, and I seem to recall Windows did too).
>>>
>>> All you have to do, is some book-keeping in shared memory (so all
>>> processes can coordinate new mappings).
>> As far as I undersand the main advantage of DSM is that segment can be
>> allocated at any time - not only at fork time.
>> And it is not because of memory consumption: even without unmap, allocation
>> of some memory region doesn't cause loose pg physical memory. And there are
>> usually no problem with exhaustion of virtual space at 64-bit architecture.
>> But using some combination of flags (as MAP_NORESERVE), it is usually
>> possible to completely eliminate overhead of reserving some address range in
>> virtual space. But mapping dynamically created segment (not at fork time) to
>> the same address really seems to be a big challenge.
> At fork time I only wrote about reserving the address space. After
> reserving it, all you have to do is implement an allocator that works
> in shared memory (protected by a lwlock of course).
>
> In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
> shared memory to coordinate returning an already mapped region (same
> address which is guaranteed to work since we reserved that region), or
> allocate one (within the reserved address space).
Why do we need named segments? There is a ShmemAlloc function in the
PostgreSQL API.
If RequestAddinShmemSpace could be used without the requirement to place the
module in the preload list, wouldn't that be enough for most extensions?
And ShmemInitHash can be used to maintain named regions if needed...

So if we have some reserved address space, do we actually need a
special allocator for this space to allocate new segments in it?
Why is the existing shared memory API not enough?

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:48:31
Message-ID:	CAGTBQpaiA6wu=obm-Uf5Hi3iAU5jBH1+PhfhZeX0bb1b++t47w@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 9, 2014 at 4:39 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>> At fork time I only wrote about reserving the address space. After
>> reserving it, all you have to do is implement an allocator that works
>> in shared memory (protected by a lwlock of course).
>>
>> In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
>> shared memory to coordinate returning an already mapped region (same
>> address which is guaranteed to work since we reserved that region), or
>> allocate one (within the reserved address space).
>
> Why do we need named segments? There is ShmemAlloc function in PostgreSQL
> API.
> If RequestAddinShmemSpace can be used without requirement to place module in
> preloaded list, then isn't it enough for most extensions?
> And ShmemInitHash can be used to maintain named regions if it is needed...

If you want to dynamically create the segments, you need some way to
identify them. That is, the name. Otherwise, RequestWhateverShmemSpace
won't know when to return an already-mapped region or not.

Mind you, the name can be a number. No need to make it a string.

> So if we have some reserved address space, do we actually need some special
> allocator for this space to allocate new segments in it?
> Why existed API to shared memory is not enough?

I don't know this existing API you mention. But I think this is quite
a specific case very unlikely to be serviced from existing APIs. You
need a data structure that can map names to regions, any hash map will
do, or even an array since one wouldn't expect it to be too big, or
require it to be too fast, and then you need to unmap the "reserve"
mapping and put a shared region there instead, before returning the
pointer to this shared region.

So, the special thing is, the book-keeping region sits in regular
shared memory, whereas the allocated regions sit in newly-created
segments. And segments are referenced by pointers (since the address
space is fixed and shared). Is there something like that already?
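The book-keeping described above can be sketched as a small array directory, assuming numeric region names ("the name can be a number") and a bump allocator over a reserved range of a hypothetical fixed size. RegionDirectory and region_lookup_or_alloc() are illustrative names only; the directory would live in regular shared memory, and the step that replaces the reserved pages with a real shared mapping at base + offset is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits; not taken from PostgreSQL. */
#define MAX_REGIONS   64                    /* a small array suffices */
#define RESERVED_SIZE ((size_t) 1 << 30)    /* size of the reserved range */
#define REGION_ALIGN  ((size_t) 1 << 16)    /* keep offsets page-aligned */

typedef struct {
    uint32_t name;      /* numeric "name" of the region */
    size_t   offset;    /* position inside the reserved address range */
    size_t   size;
} RegionEntry;

/* Book-keeping that would sit in regular (fixed) shared memory. */
typedef struct {
    atomic_bool lock;        /* stand-in for an LWLock */
    int         nentries;
    size_t      next_offset; /* bump pointer; freed space is never reused */
    RegionEntry entries[MAX_REGIONS];
} RegionDirectory;

static void dir_lock(RegionDirectory *dir)
{
    while (atomic_exchange_explicit(&dir->lock, true, memory_order_acquire))
        ;
}

static void dir_unlock(RegionDirectory *dir)
{
    atomic_store_explicit(&dir->lock, false, memory_order_release);
}

void region_dir_init(RegionDirectory *dir)
{
    atomic_init(&dir->lock, false);
    dir->nentries = 0;
    dir->next_offset = 0;
}

/* Return the offset of region `name`, allocating it on first request.
 * Returns SIZE_MAX if the directory or the reserved range is full. */
size_t region_lookup_or_alloc(RegionDirectory *dir, uint32_t name, size_t size)
{
    assert(size > 0);
    size_t result = SIZE_MAX;

    dir_lock(dir);
    for (int i = 0; i < dir->nentries && result == SIZE_MAX; i++)
        if (dir->entries[i].name == name)
            result = dir->entries[i].offset;   /* same offset for every process */

    if (result == SIZE_MAX && size <= RESERVED_SIZE && dir->nentries < MAX_REGIONS) {
        size_t rounded = (size + REGION_ALIGN - 1) & ~(REGION_ALIGN - 1);
        if (rounded <= RESERVED_SIZE - dir->next_offset) {
            RegionEntry *e = &dir->entries[dir->nentries++];
            e->name = name;
            e->offset = dir->next_offset;
            e->size = rounded;
            dir->next_offset += rounded;
            result = e->offset;
        }
    }
    dir_unlock(dir);
    return result;
}
```

Offsets rather than raw pointers are returned so the table means the same thing in every process that has the range reserved at the same base. A bump allocator never reuses freed space, which sidesteps fragmentation at the cost of eventually exhausting the range.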

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	knizhnik <knizhnik(at)garret(dot)ru>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 19:50:32
Message-ID:	CAGTBQpZ79frCzgtpukujPAsgFf1JhadaibqCG8NvU7oAkTV7tw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 9, 2014 at 4:48 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Thu, Jan 9, 2014 at 4:39 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>>> At fork time I only wrote about reserving the address space. After
>>> reserving it, all you have to do is implement an allocator that works
>>> in shared memory (protected by a lwlock of course).
>>>
>>> In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
>>> shared memory to coordinate returning an already mapped region (same
>>> address which is guaranteed to work since we reserved that region), or
>>> allocate one (within the reserved address space).
>>
>> Why do we need named segments? There is ShmemAlloc function in PostgreSQL
>> API.
>> If RequestAddinShmemSpace can be used without requirement to place module in
>> preloaded list, then isn't it enough for most extensions?
>> And ShmemInitHash can be used to maintain named regions if it is needed...
>
> If you want to dynamically create the segments, you need some way to
> identify them. That is, the name. Otherwise, RequestWhateverShmemSpace
> won't know when to return an already-mapped region or not.
>
> Mind you, the name can be a number. No need to make it a string.
>
>> So if we have some reserved address space, do we actually need some special
>> allocator for this space to allocate new segments in it?
>> Why existed API to shared memory is not enough?

Oh, I see where the confusion comes from now.

The "reserve" mapping I was proposing was a MAP_NORESERVE mapping with PROT_NONE.

I.e., forbidden access, which guarantees the OS won't try to allocate
physical RAM to it.

You'd have to re-map it before using, so it's not like a regular
shared memory region where you can simply allocate pointers and
intersperse bookkeeping data in-place.
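A minimal Linux-oriented sketch of that reserve-then-remap pattern, assuming glibc: the range is first reserved with PROT_NONE and MAP_NORESERVE, and a piece of it is later replaced in place by a shared read-write mapping via MAP_FIXED. The function names are hypothetical. MAP_NORESERVE is not part of POSIX, which matches the portability caveat mentioned in this thread.

```c
#define _GNU_SOURCE            /* MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: reserve `size` bytes of address space, no access. */
void *reserve_range(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: replace [base+offset, base+offset+len) of the
 * reservation with a shared read-write mapping. fd would be a shared
 * memory object other processes can open; fd < 0 maps anonymous shared
 * memory instead (visible only to children forked afterwards).
 * offset and len must be page-aligned. */
void *map_in_reservation(void *base, size_t offset, size_t len, int fd)
{
    assert(len > 0);
    int flags = MAP_SHARED | MAP_FIXED;   /* MAP_FIXED replaces the PROT_NONE pages */
    if (fd < 0)
        flags |= MAP_ANONYMOUS;
    void *p = mmap((char *) base + offset, len, PROT_READ | PROT_WRITE,
                   flags, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: hand a range back to the reservation so the
 * addresses stay claimed instead of becoming free for other allocations. */
int unmap_to_reservation(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE | MAP_FIXED,
                   -1, 0);
    return p == MAP_FAILED ? -1 : 0;
}
```

Calling mmap() with MAP_FIXED over the reserved pages replaces them in a single step, so unlike munmap() followed by mmap() there is no window in which an unrelated allocation could land in the range. To share a segment with backends that did not inherit it, fd would have to refer to a named object (for example one created with shm_open()), and each process would have to perform the same remap at the same offset.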

From:	knizhnik <knizhnik(at)garret(dot)ru>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 20:18:51
Message-ID:	52CF042B.6030902@garret.ru
Lists:	pgsql-announce pgsql-hackers

On 01/09/2014 11:48 PM, Claudio Freire wrote:
> On Thu, Jan 9, 2014 at 4:39 PM, knizhnik <knizhnik(at)garret(dot)ru> wrote:
>>> At fork time I only wrote about reserving the address space. After
>>> reserving it, all you have to do is implement an allocator that works
>>> in shared memory (protected by a lwlock of course).
>>>
>>> In essence, a hypothetical pg_dsm_alloc(region_name) would use regular
>>> shared memory to coordinate returning an already mapped region (same
>>> address which is guaranteed to work since we reserved that region), or
>>> allocate one (within the reserved address space).
>> Why do we need named segments? There is ShmemAlloc function in PostgreSQL
>> API.
>> If RequestAddinShmemSpace can be used without requirement to place module in
>> preloaded list, then isn't it enough for most extensions?
>> And ShmemInitHash can be used to maintain named regions if it is needed...
> If you want to dynamically create the segments, you need some way to
> identify them. That is, the name. Otherwise, RequestWhateverShmemSpace
> won't know when to return an already-mapped region or not.
>
> Mind you, the name can be a number. No need to make it a string.
>
>> So if we have some reserved address space, do we actually need some special
>> allocator for this space to allocate new segments in it?
>> Why existed API to shared memory is not enough?
> I don't know this existing API you mention. But I think this is quite
> a specific case very unlikely to be serviced from existing APIs. You
> need a data structure that can map names to regions, any hash map will
> do, or even an array since one wouldn't expect it to be too big, or
> require it to be too fast, and then you need to unmap the "reserve"
> mapping and put a shared region there instead, before returning the
> pointer to this shared region.
>
> So, the special thing is, the book-keeping region sits in regular
> shared memory, whereas the allocated regions sit in newly-created
> segments. And segments are referenced by pointers (since the address
> space is fixed and shared). Is there something like that already?
By the existing API I mostly mean six functions:

RequestAddinShmemSpace()
RequestAddinLWLocks()
ShmemInitStruct()
LWLockAssign()
ShmemAlloc()
ShmemInitHash()

If it were possible to use these functions without requiring the
module to be included in the "shared_preload_libraries" list, would we
really need DSM?
And that can be achieved by:
1. Reserving address space (as you suggested)
2. Reserving some fixed number of free LWLocks (not very many, < 100).

I have nothing against creating our own allocator of named
shared memory segments within the reserved address space.
I am just not sure it is actually needed. In some sense
RequestAddinShmemSpace() can be such an allocator.

From:	Jim Nasby <jim(at)nasby(dot)net>
To:	knizhnik <knizhnik(at)garret(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-09 21:04:47
Message-ID:	52CF0EEF.2050408@nasby.net
Lists:	pgsql-announce pgsql-hackers

On 1/9/14, 1:18 PM, knizhnik wrote:
> So it is clear why do we need shared memory for parallel query execution. But why it has to be dynamic? Why it can not be preallocated at start time as most of other resources used by PostgreSQL?

That would limit us to doing something like allocating a fixed maximum of parallel processes (which might be workable) and only allocating a very small amount of memory for IPC. Small as in can only handle a small number of tuples. That sounds like a really inefficient way to shuffle data to and from parallel processes, especially because one or both sides would probably have to actually copy the data if we're doing it that way.

With DSM if you want to do something like a parallel sort each process can put their results into memory that the parent process can directly access.
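A sketch of that pattern with plain POSIX processes, assuming Linux/glibc: workers sort disjoint chunks in place inside a shared mapping, and the parent merges the sorted runs by reading that memory directly, with no copying through a pipe. For simplicity the mapping is created before fork(), so it stands in for a dynamic segment rather than being one; parallel_sort() is a hypothetical name, not PostgreSQL code.

```c
#define _GNU_SOURCE              /* MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a, y = *(const int *) b;
    return (x > y) - (x < y);
}

/* Hypothetical: sort data[0..n) with nworkers child processes, each sorting
 * its chunk inside a shared mapping; the parent merges the runs straight
 * out of that mapping. Returns 0 on success, -1 on error. */
int parallel_sort(int *data, size_t n, int nworkers)
{
    assert(nworkers > 0);
    if (n == 0)
        return 0;

    size_t bytes = n * sizeof(int);
    int *shared = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    memcpy(shared, data, bytes);

    size_t chunk = (n + (size_t) nworkers - 1) / (size_t) nworkers;
    pid_t *pids = malloc((size_t) nworkers * sizeof *pids);
    size_t *pos = malloc((size_t) nworkers * sizeof *pos);
    int ok = pids != NULL && pos != NULL;
    int started = 0;

    for (int w = 0; ok && w < nworkers; w++) {
        size_t lo = (size_t) w * chunk;
        size_t hi = lo + chunk < n ? lo + chunk : n;
        pid_t pid = fork();
        if (pid < 0) {
            ok = 0;
            break;
        }
        if (pid == 0) {                       /* worker: sort its own run */
            if (lo < hi)
                qsort(shared + lo, hi - lo, sizeof(int), cmp_int);
            _exit(0);
        }
        pids[started++] = pid;
    }
    for (int i = 0; i < started; i++) {      /* wait for every worker */
        int status;
        if (waitpid(pids[i], &status, 0) < 0 || !WIFEXITED(status) ||
            WEXITSTATUS(status) != 0)
            ok = 0;
    }

    if (ok) {                                 /* k-way merge from shared memory */
        for (int w = 0; w < nworkers; w++)
            pos[w] = (size_t) w * chunk;
        for (size_t out = 0; out < n; out++) {
            int best = -1;
            for (int w = 0; w < nworkers; w++) {
                size_t end = (size_t) (w + 1) * chunk;
                if (end > n)
                    end = n;
                if (pos[w] < end && (best < 0 || shared[pos[w]] < shared[pos[best]]))
                    best = w;
            }
            data[out] = shared[pos[best]++];
        }
    }
    free(pids);
    free(pos);
    munmap(shared, bytes);
    return ok ? 0 : -1;
}
```

Only the initial memcpy() and the final merge touch the data in the parent; the sorted runs are never serialized between processes. With DSM, the shared buffer could instead be created after the workers start, sized to the actual input.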

Of course the other enormous win for DSM is it's the foundation for finally being able to resize things without a restart. For large dollar sites that ability would be hugely beneficial.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	knizhnik <knizhnik(at)garret(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-10 18:23:21
Message-ID:	CA+Tgmoab8LE1EQN6agmjXemwXWQK1JzLdb7Nv9sbq6Dr5ynfZw@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> It would be nice to have better operating system support for this.
>> For example, IIUC, 64-bit Linux has 128TB of address space available
>> for user processes. When you clone(), it can either share the entire
>> address space (i.e. it's a thread) or none of it (i.e. it's a
>> process). There's no option to, say, share 64TB and not the other
>> 64TB, which would be ideal for us. We could then map dynamic shared
>> memory segments into the shared portion of the address space and do
>> backend-private allocations in the unshared part. Of course, even if
>> we had that, it wouldn't be portable, so who knows how much good it
>> would do. But it would be awfully nice to have the option.
>
> You can map a segment at fork time, and unmap it after forking. That
> doesn't really use RAM, since it's supposed to be lazily allocated (it
> can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
> but I don't think that's portable).
>
> That guarantees it's free.

It guarantees that it is free as of the moment you unmap it, but it
doesn't guarantee that future memory allocations or shared library
loads couldn't stomp on the space.

Also, that not-portable thing is a bit of a problem. I've got no
problem with the idea that third-party code may be platform-specific,
but I think the stuff we ship in core has got to work on more or less
all reasonably modern systems.

> Next, you can map shared memory at explicit addresses (linux's mmap
> has support for that, and I seem to recall Windows did too).
>
> All you have to do, is some book-keeping in shared memory (so all
> processes can coordinate new mappings).

I did something like this back in 1998 or 1999 at the operating system
level, and it turned out not to work very well. I was working on an
experimental research operating system kernel, and we wanted to add
support for mmap(), so we set aside a portion of the virtual address
space for file mappings. That region was shared across all processes
in the system. One problem is that there's no guarantee the space is
big enough for whatever you want to map; and the other problem is that
it can easily get fragmented. Now, 64-bit address spaces go some way
to ameliorating these concerns so maybe it can be made to work, but I
would be a teeny bit cautious about using the word "just" to describe
the complexity involved.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Константин Книжник <knizhnik(at)garret(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-10 18:25:26
Message-ID:	CA+TgmoaKoGuJQbEdGeYKYSXud9EAidqx77J2_HXzRgFo3Hr46A@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Thu, Jan 9, 2014 at 2:09 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Thu, Jan 9, 2014 at 12:21 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Jan 7, 2014 at 10:20 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> On Tue, Jan 7, 2014 at 2:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>>
>>>> Well, right now we just reopen the same object from all of the
>>>> processes, which seems to work fine and doesn't require any of this
>>>> complexity. The only problem I don't know how to solve is how to make
>>>> a segment stick around for the whole postmaster lifetime. If
>>>> duplicating the handle into the postmaster without its knowledge gets
>>>> us there, it may be worth considering, but that doesn't seem like a
>>>> good reason to rework the rest of the existing mechanism.
>>>
>>> I think one has to try this to see if it works as per the need. If it's not
>>> urgent, I can try this early next week?
>>
>> Anything we want to get into 9.4 has to be submitted by next Tuesday,
>> but I don't know that we're going to get this into 9.4.
>
> Using DuplicateHandle(), we can make segment stick for Postmaster
> lifetime. I have used below test (used dsm_demo module) to verify:
> Session - 1
> select dsm_demo_create('this message is from session-1');
> dsm_demo_create
> -----------------
> 827121111
>
> Session - 2
> -----------------
> select dsm_demo_read(827121111);
> dsm_demo_read
> ----------------------------
> this message is from session-1
> (1 row)
>
> Session-1
> \q
>
> -- till here it will work without DuplicateHandle as well
>
> Session -2
> select dsm_demo_read(827121111);
> dsm_demo_read
> ----------------------------
> this message is from session-1
> (1 row)
>
> Session -2
> \q
>
> Session -3
> select dsm_demo_read(827121111);
> dsm_demo_read
> ----------------------------
> this message is from session-1
> (1 row)
>
> -- above shows that handle stays around.
>
> Note -
> Currently I have to bypass below code in dsm_attach(), as it assumes
> segment will not stay if it's removed from control file.
>
> /*
> * If we didn't find the handle we're looking for in the control
> * segment, it probably means that everyone else who had it mapped,
> * including the original creator, died before we got to this point.
> * It's up to the caller to decide what to do about that.
> */
> if (seg->control_slot == INVALID_CONTROL_SLOT)
> {
> dsm_detach(seg);
> return NULL;
> }
>
>
> Could you let me know what exactly you are expecting in patch,
> just a call to DuplicateHandle() after CreateFileMapping() or something
> else as well?

Well, I guess what I was thinking is that we could have a call
dsm_keep_segment() which would be invoked on an already-created
dsm_segment *. On Linux, that would just bump the reference count in
the control segment up by one so that it doesn't get destroyed until
postmaster shutdown. On Windows it may as well still do that for
consistency, but will also need to do this DuplicateHandle() trick.
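The reference-counting part of that proposal can be modeled as below, assuming one control-segment slot per segment. SegSlot and the seg_* functions are hypothetical names; the Windows DuplicateHandle() step and the actual unmapping are not modeled, and `destroyed` only records the point at which real code would remove the segment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical control-segment slot for one dynamic segment. */
typedef struct {
    atomic_int  refcnt;     /* attached backends, plus one if kept */
    atomic_bool kept;       /* set once by seg_keep() */
    atomic_bool destroyed;  /* set when the last reference is dropped */
} SegSlot;

void seg_create(SegSlot *s)          /* the creator holds the first reference */
{
    atomic_init(&s->refcnt, 1);
    atomic_init(&s->kept, false);
    atomic_init(&s->destroyed, false);
}

void seg_attach(SegSlot *s)
{
    int old = atomic_fetch_add(&s->refcnt, 1);
    /* real code would check this under a lock and fail gracefully */
    assert(old > 0);
}

void seg_detach(SegSlot *s)
{
    if (atomic_fetch_sub(&s->refcnt, 1) == 1)
        atomic_store(&s->destroyed, true);   /* last reference: remove segment */
}

/* Keep the segment until shutdown: one extra reference no backend owns. */
void seg_keep(SegSlot *s)
{
    if (!atomic_exchange(&s->kept, true))
        atomic_fetch_add(&s->refcnt, 1);
}

/* At postmaster shutdown, drop the extra reference. */
void seg_shutdown(SegSlot *s)
{
    if (atomic_exchange(&s->kept, false))
        seg_detach(s);
}
```

For example, after seg_create(), seg_keep(), and an attach/detach from a second session followed by the creator's own detach, refcnt is still 1, so a third session can attach just as in the dsm_demo test quoted above; only seg_shutdown() brings the count to zero.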

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Claudio Freire <klaussfreire(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	knizhnik <knizhnik(at)garret(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-10 18:35:34
Message-ID:	CAGTBQpZ2tYj9XkZS8DeYQRX2BS-fRCNS9JvVWmCZquBByC+yqA@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Fri, Jan 10, 2014 at 3:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jan 9, 2014 at 12:46 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> On Thu, Jan 9, 2014 at 2:22 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> It would be nice to have better operating system support for this.
>>> For example, IIUC, 64-bit Linux has 128TB of address space available
>>> for user processes. When you clone(), it can either share the entire
>>> address space (i.e. it's a thread) or none of it (i.e. it's a
>>> process). There's no option to, say, share 64TB and not the other
>>> 64TB, which would be ideal for us. We could then map dynamic shared
>>> memory segments into the shared portion of the address space and do
>>> backend-private allocations in the unshared part. Of course, even if
>>> we had that, it wouldn't be portable, so who knows how much good it
>>> would do. But it would be awfully nice to have the option.
>>
>> You can map a segment at fork time, and unmap it after forking. That
>> doesn't really use RAM, since it's supposed to be lazily allocated (it
>> can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
>> but I don't think that's portable).
>>
>> That guarantees it's free.
>
> It guarantees that it is free as of the moment you unmap it, but it
> doesn't guarantee that future memory allocations or shared library
> loads couldn't stomp on the space.

You would unmap only immediately before remapping, and only the
to-be-mapped portion, so I don't see a problem.
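A rough sketch of the reserve-then-remap approach, assuming Linux/glibc (MAP_NORESERVE is not POSIX) and a temporary file standing in for a real dynamic shared memory segment. demo_same_address(), the 1 GiB reservation and the chosen offset are illustrative only. Rather than an explicit munmap() followed by mmap(), it maps with MAP_FIXED over the PROT_NONE placeholder, which replaces that part of the reservation in a single call and leaves no window in which another allocation could claim the range.

```c
#define _DEFAULT_SOURCE                  /* MAP_ANONYMOUS, MAP_NORESERVE on glibc */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define RESERVE_SIZE ((size_t) 1 << 30)  /* hypothetical: 1 GiB of address space */
#define SEGMENT_SIZE ((size_t) 1 << 20)  /* hypothetical: 1 MiB segment */

/* Reserve address space only: PROT_NONE + MAP_NORESERVE commits no RAM. */
static char *reserve_region(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* Map a shared, file-backed segment at an exact address inside the
 * reservation; MAP_FIXED discards the overlapped part of the placeholder. */
static char *map_segment_at(char *addr, int fd, size_t size)
{
    void *p = mmap(addr, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    assert(p == MAP_FAILED || p == (void *) addr);  /* MAP_FIXED never relocates */
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* Reserve before fork() so the child inherits the same free range. The child
 * maps the segment and stores a raw pointer into it; the parent maps it at the
 * same address and follows that pointer unchanged. */
static bool demo_same_address(void)
{
    char *base = reserve_region(RESERVE_SIZE);
    if (base == NULL)
        return false;
    char *want = base + 4 * SEGMENT_SIZE;  /* would come from shared book-keeping */

    char path[] = "/tmp/dsm_sketchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return false;
    unlink(path);                          /* the open fd keeps the file alive */
    if (ftruncate(fd, (off_t) SEGMENT_SIZE) != 0)
        return false;

    pid_t pid = fork();
    if (pid < 0)
        return false;
    if (pid == 0)
    {
        char *seg = map_segment_at(want, fd, SEGMENT_SIZE);
        if (seg == NULL)
            _exit(1);
        char *msg = seg + 64;              /* absolute pointer into the segment */
        strcpy(msg, "hello from the child");
        memcpy(seg, &msg, sizeof msg);
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return false;

    char *seg = map_segment_at(want, fd, SEGMENT_SIZE);
    if (seg == NULL)
        return false;
    char *msg;
    memcpy(&msg, seg, sizeof msg);         /* valid only at the same address */
    bool ok = msg == want + 64 && strcmp(msg, "hello from the child") == 0;

    munmap(base, RESERVE_SIZE);            /* drops the segment and the reservation */
    close(fd);
    return ok;
}
```

demo_same_address() returns true when the parent can dereference the pointer the child wrote, which only works because both processes mapped the segment at the same address.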

> Also, that not-portable thing is a bit of a problem. I've got no
> problem with the idea that third-party code may be platform-specific,
> but I think the stuff we ship in core has got to work on more or less
> all reasonably modern systems.
>
>> Next, you can map shared memory at explicit addresses (Linux's mmap
>> has support for that, and I seem to recall Windows did too).
>>
>> All you have to do is some book-keeping in shared memory (so all
>> processes can coordinate new mappings).
>
> I did something like this back in 1998 or 1999 at the operating system
> level, and it turned out not to work very well. I was working on an
> experimental research operating system kernel, and we wanted to add
> support for mmap(), so we set aside a portion of the virtual address
> space for file mappings. That region was shared across all processes
> in the system. One problem is that there's no guarantee the space is
> big enough for whatever you want to map; and the other problem is that
> it can easily get fragmented. Now, 64-bit address spaces go some way
> to ameliorating these concerns so maybe it can be made to work, but I
> would be a teeny bit cautious about using the word "just" to describe
> the complexity involved.
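To make the fragmentation concern concrete, a toy first-fit allocator over a hypothetical 16-page reserved region. AddrMap, addr_alloc(), addr_free() and free_pages() are illustrative names; a real table would live in the main shared memory segment and be protected by a lock. After four 4-page segments are mapped and the second and fourth are released, 8 pages are free, yet a 6-page request fails because no single hole exceeds 4 pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REGION_PAGES 16            /* hypothetical reserved region, in pages */
#define MAX_RANGES   REGION_PAGES  /* at most one range per page */

/* One mapped range inside the reserved region, in page units. */
typedef struct { size_t start, npages; bool used; } Range;

/* Book-keeping every process would consult before mapping a segment. */
typedef struct { Range ranges[MAX_RANGES]; } AddrMap;

/* First-fit: claim the lowest run of npages free pages; -1 if none exists. */
static long addr_alloc(AddrMap *m, size_t npages)
{
    assert(npages > 0);
    size_t start = 0;
    while (start + npages <= REGION_PAGES)
    {
        size_t skip_to = start;
        for (size_t i = 0; i < MAX_RANGES; i++)
        {
            const Range *r = &m->ranges[i];
            bool overlaps = r->used && r->start < start + npages &&
                            start < r->start + r->npages;
            if (overlaps && r->start + r->npages > skip_to)
                skip_to = r->start + r->npages;
        }
        if (skip_to == start)              /* no overlap: the run is free */
        {
            for (size_t i = 0; i < MAX_RANGES; i++)
                if (!m->ranges[i].used)
                {
                    m->ranges[i] = (Range) { start, npages, true };
                    return (long) start;
                }
            return -1;                     /* book-keeping table is full */
        }
        start = skip_to;                   /* jump past the blocking range */
    }
    return -1;                             /* no hole is large enough */
}

static void addr_free(AddrMap *m, size_t start)
{
    for (size_t i = 0; i < MAX_RANGES; i++)
        if (m->ranges[i].used && m->ranges[i].start == start)
            m->ranges[i].used = false;
}

/* Total free pages, regardless of whether they are contiguous. */
static size_t free_pages(const AddrMap *m)
{
    size_t used = 0;
    for (size_t i = 0; i < MAX_RANGES; i++)
        if (m->ranges[i].used)
            used += m->ranges[i].npages;
    return REGION_PAGES - used;
}
```

A larger reservation only postpones the problem; with a 64-bit address space the region can be made "humongous" enough that it rarely matters, which is where the discussion ends up.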

Ok, yes, fragmentation could be an issue if the address range is not
"humongous enough".

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc:	knizhnik <knizhnik(at)garret(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-10 18:51:05
Message-ID:	CA+TgmoaSYcGo9LUvk18HXLOAEw9MoHYWsFEQ3MP_eJMj9+tP-w@mail.gmail.com
Lists:	pgsql-announce pgsql-hackers

On Fri, Jan 10, 2014 at 1:35 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>>> You can map a segment at fork time, and unmap it after forking. That
>>> doesn't really use RAM, since it's supposed to be lazily allocated (it
>>> can be forced to be so, I believe, with PROT_NONE and MAP_NORESERVE,
>>> but I don't think that's portable).
>>>
>>> That guarantees it's free.
>>
>> It guarantees that it is free as of the moment you unmap it, but it
>> doesn't guarantee that future memory allocations or shared library
>> loads couldn't stomp on the space.
>
> You would unmap only immediately before remapping, and only the
> to-be-mapped portion, so I don't see a problem.

OK, yeah, that way works. That's more or less what Noah proposed
before. But I was skeptical it would work well everywhere. I suppose
we won't know until somebody tries it. (I didn't.)

> Ok, yes, fragmentation could be an issue if the address range is not
> "humongous enough".

I've often thought that 64-bit machines are so capable that there's no
reason to go any higher. But lately I've started to wonder. There
are already machines out there with >2^40 bytes of physical memory,
and the number just keeps creeping up. When you reserve a couple of
bits to indicate user or kernel space, and then consider that virtual
address space can be many times larger than physical memory, it starts
not to seem like that much.
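Putting numbers on this, assuming the x86-64 layout with 48-bit virtual addresses and the lower half reserved for user space (which gives the 128TB figure quoted earlier in the thread), and a machine with 2^40 bytes of RAM:

```latex
\begin{aligned}
\text{user virtual address space} &= 2^{48-1} = 2^{47}\ \text{bytes} = 128\ \text{TiB}\\
\text{physical memory} &= 2^{40}\ \text{bytes} = 1\ \text{TiB}\\
\frac{2^{47}}{2^{40}} &= 2^{7} = 128 \qquad\text{(only 7 bits of headroom)}\\
\frac{2^{64}}{2^{40}} &= 2^{24} \qquad\text{(even a full 64-bit space leaves 24 bits)}
\end{aligned}
```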

But I'm not that excited about the amount of additional memory we'll
eat when somebody decides to make a pointer 16 bytes. Ugh.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Claudio Freire <klaussfreire(at)gmail(dot)com>, knizhnik <knizhnik(at)garret(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, James Mansion <james(at)mansionfamily(dot)plus(dot)com>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Date:	2014-01-10 19:02:06
Message-ID:	14942.1389380526@sss.pgh.pa.us
Lists:	pgsql-announce pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I've often thought that 64-bit machines are so capable that there's no
> reason to go any higher. But lately I've started to wonder. There
> are already machines out there with >2^40 bytes of physical memory,
> and the number just keeps creeping up. When you reserve a couple of
> bits to indicate user or kernel space, and then consider that virtual
> address space can be many times larger than physical memory, it starts
> not to seem like that much.

> But I'm not that excited about the amount of additional memory we'll
> eat when somebody decides to make a pointer 16 bytes. Ugh.

Once you really need that, you're not going to care about doubling
the size of pointers. At worst, you're giving up 1 bit of address
space to gain 64 more.
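One way to read the arithmetic, assuming pointers grow from 8 to 16 bytes and, in the worst case, memory holds nothing but pointers:

```latex
\begin{aligned}
\text{pointer size } 8\ \text{B} \to 16\ \text{B}
  &\;\Rightarrow\; \text{pointers that fit in memory halve: } \log_2 2 = 1\ \text{bit lost}\\
\text{address space } 2^{64} \to 2^{128}
  &\;\Rightarrow\; 128 - 64 = 64\ \text{bits gained}
\end{aligned}
```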

(Still, I rather doubt it'll happen in my lifetime.)

regards, tom lane