Re: can we publish a aset interface?

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: can we publish a aset interface?
Date: 2010-09-07 08:53:36
Message-ID: AANLkTimEv1DSVVwGXo9T6TgxWJGn9qXvnAEx8D+YS_9Q@mail.gmail.com

Hello

I would like to use a special memory context for shared data (based on
mmap), and I like the implementation of aset. There is only one
difference: aset is based on malloc, and I would like to use mmap.

malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
procedures would have to be overridden, but the other code and data
structures could be reused. This step could also be useful for the
previous discussion about more comfortable maintenance of shared memory.

What do you think about it?

Regards

Pavel Stehule


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 12:09:54
Message-ID: AANLkTi=oNr1CurY4mTbgmw6iKT9E3zkyZdy4xv0EUG+t@mail.gmail.com

On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I would like to use a special memory context for shared data (based on
> mmap), and I like the implementation of aset. There is only one
> difference: aset is based on malloc, and I would like to use mmap.
>
> malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
> procedures would have to be overridden, but the other code and data
> structures could be reused. This step could also be useful for the
> previous discussion about more comfortable maintenance of shared memory.
>
> What do you think about it?

What would this be good for?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 13:27:29
Message-ID: AANLkTin+W+_B2Kjx70_5Xga_LSV_sDSk8wgfTweUws0B@mail.gmail.com

2010/9/7 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I would like to use a special memory context for shared data (based on
>> mmap), and I like the implementation of aset. There is only one
>> difference: aset is based on malloc, and I would like to use mmap.
>>
>> malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
>> procedures would have to be overridden, but the other code and data
>> structures could be reused. This step could also be useful for the
>> previous discussion about more comfortable maintenance of shared memory.
>>
>> What do you think about it?
>
> What would this be good for?
>

I am trying to solve performance problems with Czech tsearch. I tried
serialization and deserialization, but this decreased the load time
only to 100 ms (from 500 ms), which is still too much for us. After
some experimenting with mmap, I think there is a chance to preallocate
mmap memory and then use a special memory context based on mmap
instead of malloc. Theoretically I could copy the aset interface - this
module will probably never be in core (the problem is probably local,
Czech only) - but that isn't nice. So I am asking.

Regards

Pavel Stehule

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 14:13:12
Message-ID: AANLkTikD_Dnu1VmCia9m0Dy5R0jB_83eHkrzY+BnOpyu@mail.gmail.com

On Tue, Sep 7, 2010 at 9:27 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2010/9/7 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> I would like to use a special memory context for shared data (based on
>>> mmap), and I like the implementation of aset. There is only one
>>> difference: aset is based on malloc, and I would like to use mmap.
>>>
>>> malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
>>> procedures would have to be overridden, but the other code and data
>>> structures could be reused. This step could also be useful for the
>>> previous discussion about more comfortable maintenance of shared memory.
>>>
>>> What do you think about it?
>>
>> What would this be good for?
>>
>
> I am trying to solve performance problems with Czech tsearch. I tried
> serialization and deserialization, but this decreased the load time
> only to 100 ms (from 500 ms), which is still too much for us. After
> some experimenting with mmap, I think there is a chance to preallocate
> mmap memory and then use a special memory context based on mmap
> instead of malloc. Theoretically I could copy the aset interface - this
> module will probably never be in core (the problem is probably local,
> Czech only) - but that isn't nice. So I am asking.

I don't see how you could do anything with this that you can't do with
the existing implementation. It's not as if you can store pointers
into an mmap'd block and then count on them being valid the next time
you map the file... it might not end up at the same offset.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 14:40:33
Message-ID: 9702.1283870433@sss.pgh.pa.us

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I would like to use a special memory context for shared data (based on
> mmap), and I like the implementation of aset. There is only one
> difference: aset is based on malloc, and I would like to use mmap.

> malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
> procedures would have to be overridden, but the other code and data
> structures could be reused. This step could also be useful for the
> previous discussion about more comfortable maintenance of shared memory.

> What do you think about it?

If you're proposing factoring aset.c into two levels, I don't think so.
That code is already a tremendous performance hot-spot and introducing
any more inefficiency into it doesn't seem like a good idea. Especially
not for shared memory allocation, which is a feature that still has
no buy-in. Also, you'd need to do more than just replace malloc: you'd
need to add locking capability. That would make the code even uglier,
and slower, if it has to support locking or no locking dynamically.

Use the mcxt.c switch. That's what it's there for.

regards, tom lane
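
A minimal sketch of the kind of switch Tom appears to be pointing at: mcxt.c dispatches every allocation through a per-context table of function pointers, so a different allocator can be plugged in without touching aset.c. Everything below is hypothetical and heavily simplified. The names DemoContext, DemoContextMethods, and the demo_* functions are invented for illustration, the real method table in PostgreSQL has more members, and the locking Tom mentions is left out entirely. The "arena" here is a plain bump allocator over a caller-supplied region, standing in for preallocated mmap memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified method table: one function pointer per operation. */
typedef struct DemoContext DemoContext;

typedef struct DemoContextMethods {
    void *(*alloc)(DemoContext *cxt, size_t size);
    void  (*reset)(DemoContext *cxt);
} DemoContextMethods;

struct DemoContext {
    const DemoContextMethods *methods;  /* the "switch" */
    char   *base;                       /* start of the preallocated region */
    size_t  capacity;                   /* region size in bytes */
    size_t  used;                       /* bytes handed out so far */
};

/* Bump allocation: round the request up to 8 bytes and advance the cursor. */
static void *arena_alloc(DemoContext *cxt, size_t size)
{
    size_t aligned = (size + 7) & ~(size_t) 7;

    if (aligned < size || aligned > cxt->capacity - cxt->used)
        return NULL;                    /* overflow or region exhausted */
    void *p = cxt->base + cxt->used;
    cxt->used += aligned;
    return p;
}

/* Resetting simply forgets every allocation made so far. */
static void arena_reset(DemoContext *cxt)
{
    cxt->used = 0;
}

static const DemoContextMethods arena_methods = { arena_alloc, arena_reset };

/* Generic entry points: callers never name the concrete allocator. */
void *demo_alloc(DemoContext *cxt, size_t size)
{
    return cxt->methods->alloc(cxt, size);
}

void demo_reset(DemoContext *cxt)
{
    cxt->methods->reset(cxt);
}

/* Bind a context to a caller-supplied region (assumed 8-byte aligned). */
void demo_arena_init(DemoContext *cxt, void *region, size_t capacity)
{
    assert(region != NULL);
    cxt->methods = &arena_methods;
    cxt->base = region;
    cxt->capacity = capacity;
    cxt->used = 0;
}
```

The point of the design is that a second allocator only needs its own method table; the generic entry points stay unchanged.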


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 14:56:35
Message-ID: 10040.1283871395@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 7, 2010 at 9:27 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I am trying to solve performance problems with Czech tsearch. I tried
>> serialization and deserialization, but this decreased the load time
>> only to 100 ms (from 500 ms), which is still too much for us. After
>> some experimenting with mmap, I think there is a chance to preallocate
>> mmap memory and then use a special memory context based on mmap
>> instead of malloc. Theoretically I could copy the aset interface - this
>> module will probably never be in core (the problem is probably local,
>> Czech only) - but that isn't nice. So I am asking.

> I don't see how you could do anything with this that you can't do with
> the existing implementation. It's not as if you can store pointers
> into an mmap'd block and then count on them being valid the next time
> you map the file... it might not end up at the same offset.

More to the point, this entire approach to speeding up dictionary loading
has already been proposed and rejected, and it'll get rejected again if
it's submitted.

The conclusion of the previous discussion was that we should build
"precompiled" dictionaries, using some pointer-free representation,
which would be stored in files that could be either mmap'd in or just
read in if running on a platform lacking mmap. There is no need for
any shmem allocator in that implementation.

regards, tom lane
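
The loading side of the approach Tom describes could look roughly like the sketch below: open a precompiled file, map it read-only if possible, and otherwise read it into ordinary memory. This assumes a POSIX system. The DictImage type, the function names, and the allow_mmap switch are hypothetical and exist only for illustration; nothing here is taken from PostgreSQL, and the file format itself is out of scope.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for a loaded, pointer-free dictionary image. */
typedef struct {
    void  *data;    /* start of the image */
    size_t len;     /* image size in bytes */
    int    mapped;  /* 1 if data came from mmap, 0 if from malloc + read */
} DictImage;

/* Load the file at path; returns 0 on success, -1 on failure. */
int dict_image_load(const char *path, int allow_mmap, DictImage *img)
{
    struct stat st;
    int fd;

    assert(path != NULL && img != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    img->len = (size_t) st.st_size;
    img->mapped = 0;
    img->data = NULL;

    if (allow_mmap) {
        /* Let the kernel pick the address; the image holds no pointers. */
        void *p = mmap(NULL, img->len, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            img->data = p;
            img->mapped = 1;
        }
    }
    if (img->data == NULL) {
        /* Fallback: plain read into heap memory. */
        char *buf = malloc(img->len);
        size_t done = 0;

        while (buf != NULL && done < img->len) {
            ssize_t n = read(fd, buf + done, img->len - done);
            if (n <= 0) {
                free(buf);
                buf = NULL;
            } else {
                done += (size_t) n;
            }
        }
        img->data = buf;
    }
    close(fd);
    return img->data != NULL ? 0 : -1;
}

/* Release the image with whichever mechanism produced it. */
void dict_image_release(DictImage *img)
{
    if (img->mapped)
        munmap(img->data, img->len);
    else
        free(img->data);
    img->data = NULL;
}
```

Because the image is never asked to land at a particular address, both paths hand back equivalent data, and no shared-memory allocator is involved.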


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 15:18:27
Message-ID: 1283872651-sup-9737@alvh.no-ip.org

Excerpts from Robert Haas's message of mar sep 07 10:13:12 -0400 2010:

> > I am trying to solve performance problems with Czech tsearch. I tried
> > serialization and deserialization, but this decreased the load time
> > only to 100 ms (from 500 ms), which is still too much for us. After
> > some experimenting with mmap, I think there is a chance to preallocate
> > mmap memory and then use a special memory context based on mmap
> > instead of malloc. Theoretically I could copy the aset interface - this
> > module will probably never be in core (the problem is probably local,
> > Czech only) - but that isn't nice. So I am asking.
>
> I don't see how you could do anything with this that you can't do with
> the existing implementation. It's not as if you can store pointers
> into an mmap'd block and then count on them being valid the next time
> you map the file... it might not end up at the same offset.

Hmm, surely you could store offsets instead of absolute pointers.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
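
Álvaro's suggestion can be illustrated with a small sketch: a linked structure whose links are byte offsets from the start of its buffer, so it stays valid wherever the buffer ends up in memory. The OffNode layout and the off_* function names are hypothetical, chosen only for this example. Offset 0 is reserved to mean "no node", so the caller starts the used counter at sizeof(OffNode).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node: the link is a byte offset from the buffer start. */
typedef struct {
    uint32_t next;   /* offset of the next node, 0 means end of list */
    int32_t  value;
} OffNode;

/*
 * Store a new node at byte position *used and link it in front of head.
 * Returns the new node's offset, or 0 if the buffer is full.
 */
uint32_t off_push(unsigned char *buf, size_t cap, size_t *used,
                  uint32_t head, int32_t value)
{
    OffNode n;
    uint32_t at;

    assert(*used >= sizeof(OffNode));   /* offset 0 is reserved */
    assert(*used <= cap);
    if (cap - *used < sizeof(OffNode))
        return 0;
    n.next = head;
    n.value = value;
    at = (uint32_t) *used;
    memcpy(buf + at, &n, sizeof n);     /* memcpy sidesteps alignment issues */
    *used += sizeof n;
    return at;
}

/* Walk the list relative to whatever base address the buffer has now. */
long off_sum(const unsigned char *base, uint32_t head)
{
    long total = 0;

    while (head != 0) {
        OffNode n;

        memcpy(&n, base + head, sizeof n);
        total += n.value;
        head = n.next;
    }
    return total;
}
```

Copying the buffer elsewhere with memcpy and walking the copy from the same head offset gives the same result, which is exactly the property absolute pointers lack when a file is mapped at a different address.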


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 15:50:24
Message-ID: AANLkTikexgcLS1kQNohFSCkWDU2SM_LLXQ7zG9q5_xPN@mail.gmail.com

On Tue, Sep 7, 2010 at 11:18 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Excerpts from Robert Haas's message of mar sep 07 10:13:12 -0400 2010:
>
>> > I am trying to solve performance problems with Czech tsearch. I tried
>> > serialization and deserialization, but this decreased the load time
>> > only to 100 ms (from 500 ms), which is still too much for us. After
>> > some experimenting with mmap, I think there is a chance to preallocate
>> > mmap memory and then use a special memory context based on mmap
>> > instead of malloc. Theoretically I could copy the aset interface - this
>> > module will probably never be in core (the problem is probably local,
>> > Czech only) - but that isn't nice. So I am asking.
>>
>> I don't see how you could do anything with this that you can't do with
>> the existing implementation.  It's not as if you can store pointers
>> into an mmap'd block and then count on them being valid the next time
>> you map the file...  it might not end up at the same offset.
>
> Hmm, surely you could store offsets instead of absolute pointers.

Surely you could. But then where does palloc come in? As Tom said
upthread, the right thing to do here is to create a pre-compiler that
outputs a pointer-free representation which you can then mmap().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 16:44:44
Message-ID: AANLkTi=qc8fVG62O8ahmiKGMdY1eydenn59R+DVyrCSX@mail.gmail.com

2010/9/7 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Sep 7, 2010 at 9:27 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2010/9/7 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> On Tue, Sep 7, 2010 at 4:53 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>> I would like to use a special memory context for shared data (based on
>>>> mmap), and I like the implementation of aset. There is only one
>>>> difference: aset is based on malloc, and I would like to use mmap.
>>>>
>>>> malloc() is used in AllocSetContextCreate and AllocSetAlloc. These
>>>> procedures would have to be overridden, but the other code and data
>>>> structures could be reused. This step could also be useful for the
>>>> previous discussion about more comfortable maintenance of shared memory.
>>>>
>>>> What do you think about it?
>>>
>>> What would this be good for?
>>>
>>
>> I am trying to solve performance problems with Czech tsearch. I tried
>> serialization and deserialization, but this decreased the load time
>> only to 100 ms (from 500 ms), which is still too much for us. After
>> some experimenting with mmap, I think there is a chance to preallocate
>> mmap memory and then use a special memory context based on mmap
>> instead of malloc. Theoretically I could copy the aset interface - this
>> module will probably never be in core (the problem is probably local,
>> Czech only) - but that isn't nice. So I am asking.
>
> I don't see how you could do anything with this that you can't do with
> the existing implementation.  It's not as if you can store pointers
> into an mmap'd block and then count on them being valid the next time
> you map the file...  it might not end up at the same offset.

you can, but you have to do preallocation and you have to use a FIXED flag.

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 18:31:14
Message-ID: AANLkTi=yTS3zkqVkBX_SRBg5nJNkUZX8YSp9XjN-JW06@mail.gmail.com

On Tue, Sep 7, 2010 at 12:44 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I don't see how you could do anything with this that you can't do with
>> the existing implementation.  It's not as if you can store pointers
>> into an mmap'd block and then count on them being valid the next time
>> you map the file...  it might not end up at the same offset.
>
> you can, but you have to do preallocation and you have to use a FIXED flag.

MAP_FIXED? As TFM says: "Because requiring a fixed address for a
mapping is less portable, the use of this option is discouraged."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-07 18:35:35
Message-ID: AANLkTi=PhUp-oFUWYMx4=w50NoZqPTzzZKb2+TAtzx8_@mail.gmail.com

2010/9/7 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Sep 7, 2010 at 12:44 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> I don't see how you could do anything with this that you can't do with
>>> the existing implementation.  It's not as if you can store pointers
>>> into an mmap'd block and then count on them being valid the next time
>>> you map the file...  it might not end up at the same offset.
>>
>> you can, but you have to do preallocation and you have to use a FIXED flag.
>
> MAP_FIXED?  As TFM says: "Because requiring a fixed address for a
> mapping is less portable, the use of this option  is  discouraged."

Yes, I know. This will be used for a proprietary Czech language
module - 95% of PostgreSQL installations are on Linux, 10% on MS
Windows (in the Czech Republic).

I don't plan to try to move this module to core. And it would be
pointless: other languages do not have our problems.

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-16 18:26:43
Message-ID: 1284661603.4696.22.camel@vanquo.pezone.net

On tis, 2010-09-07 at 20:35 +0200, Pavel Stehule wrote:
> I don't plan to try to move this module to core. And it would be
> pointless: other languages do not have our problems.

I don't know the details of what you're struggling with, but it's a bit
hard to believe that there is a problem that is absolutely unique to the
Czech language.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-16 18:43:37
Message-ID: AANLkTimTrkSmxXe=pW-fTnnJ-HVjUHdXEqm_G_p-rx+0@mail.gmail.com

2010/9/16 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> On tis, 2010-09-07 at 20:35 +0200, Pavel Stehule wrote:
>> I don't plan to try to move this module to core. And it would be
>> pointless: other languages do not have our problems.
>
> I don't know the details of what you're struggling with, but it's a bit
> hard to believe that there is a problem that is absolutely unique to the
> Czech language.
>

I think people use a stemmer dictionary, because an ispell dictionary
should be slow for any language. But there is no stemmer available for
the Czech language. People who need fast processing just use a simple
dictionary, and there are probably no pg hackers from Poland or
Slovakia.

Regards

Pavel

>


From: David Fetter <david(at)fetter(dot)org>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: can we publish a aset interface?
Date: 2010-09-16 21:52:15
Message-ID: 20100916215215.GA23371@fetter.org

On Thu, Sep 16, 2010 at 08:43:37PM +0200, Pavel Stehule wrote:
> 2010/9/16 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> > On tis, 2010-09-07 at 20:35 +0200, Pavel Stehule wrote:
> >> I don't plan to try to move this module to core. And it would be
> >> pointless: other languages do not have our problems.
> >
> > I don't know the details of what you're struggling with, but it's
> > a bit hard to believe that there is a problem that is absolutely
> > unique to the Czech language.
>
> I think people use a stemmer dictionary, because an ispell dictionary
> should be slow for any language. But there is no stemmer available for
> the Czech language. People who need fast processing just use a simple
> dictionary, and there are probably no pg hackers from Poland or
> Slovakia.

I know of at least one in Poland, and I'd be amazed if there were none
from Slovakia.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate