patch: tsearch - some memory diet

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: patch: tsearch - some memory diet
Date: 2010-09-07 07:10:19
Message-ID: AANLkTi=4fUi1zoFMpZ==Yf14RJDv_G1xgAkVQMDyEtBk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

this simple patch reduce a persistent allocated memory for tsearch
ispell dictionaries.

on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
on 64bit from 55MB to cca 27MB.

Regards

Pavel Stehule

Attachment Content-Type Size
tsearch_group_alloc.diff application/octet-stream 2.6 KB

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-09-07 16:27:18
Message-ID: 4C8667E6.6090500@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
> on 64bit from 55MB to cca 27MB.

Good results. But, I think, there are more places in ispell to use hold_memory():
- affixes and affix tree
- regis (REGex for ISpell, regis.c)

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-09-07 16:42:07
Message-ID: AANLkTikSK8Cs5MAntyD9umagQhoEur_LS1JPre=P3BOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/7 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
>> on 64bit from 55MB to cca 27MB.
>
> Good results. But, I think, there are more places in ispell to use
> hold_memory():
> - affixes and affix tree
> - regis (REGex for ISpell, regis.c)

yes, but minimally for Czech dictionary other places are not
important. It's decrease 1MB more.

Last month I moved all these unreleased parts and it's not important.
But can be interesting check it for other languages than Czech.

Regards

Pavel Stehule

>
>
> --
> Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
>                                                   WWW: http://www.sigaev.ru/
>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-09-07 16:42:34
Message-ID: 4C866B7A.5000006@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/09/10 19:27, Teodor Sigaev wrote:
>> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
>> on 64bit from 55MB to cca 27MB.
>
> Good results. But, I think, there are more places in ispell to use
> hold_memory():
> - affixes and affix tree
> - regis (REGex for ISpell, regis.c)

A more general solution would be to have a new MemoryContext
implementation that does the same your patch does. Ie. instead of
tracking each allocation, just allocate a big chunk, and have palloc()
return the next n free bytes from it, like a stack. pfree() would
obviously not work, but wholesale MemoryContextDelete of the whole
memory context would.

I remember I actually tried this years ago, trying to reduce the
overhead of parsing IIRC. The parser also does a lot of small
allocations that are not individually pfree'd. And I think it helped a
tiny bit, but I didn't pursue it further. But if there's many places
where it would help, then it might well be worth it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-09-07 16:47:54
Message-ID: AANLkTikrnFO1vCn=Go4pC+F5QPLZZLUCygjNpesH7aBq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/7 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> On 07/09/10 19:27, Teodor Sigaev wrote:
>>>
>>> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
>>> on 64bit from 55MB to cca 27MB.
>>
>> Good results. But, I think, there are more places in ispell to use
>> hold_memory():
>> - affixes and affix tree
>> - regis (REGex for ISpell, regis.c)
>
> A more general solution would be to have a new MemoryContext implementation
> that does the same your patch does. Ie. instead of tracking each allocation,
> just allocate a big chunk, and have palloc() return the next n free bytes
> from it, like a stack. pfree() would obviously not work, but wholesale
> MemoryContextDelete of the whole memory context would.
>
> I remember I actually tried this years ago, trying to reduce the overhead of
> parsing IIRC. The parser also does a lot of small allocations that are not
> individually pfree'd. And I think it helped a tiny bit, but I didn't pursue
> it further. But if there's many places where it would help, then it might
> well be worth it.

I sent patch last year - simpleAllocator, and this idea was rejected.
But now I dislike this idea too. This is unclear, and some forgotten
"free" calling can do problems. This is more simple, more readable and
secure.

Regards

Pavel Stehule

>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com
>


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-09-07 16:53:42
Message-ID: 4C866E16.7050604@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> A more general solution would be to have a new MemoryContext
> implementation that does the same your patch does. Ie. instead of
> tracking each allocation, just allocate a big chunk, and have palloc()
> return the next n free bytes from it, like a stack. pfree() would
> obviously not work, but wholesale MemoryContextDelete of the whole
> memory context would.

repalloc() will not work too. Such implementation should have possibility to
debug memory allocation/management by using some kind of red-zones or
CLOBBER_FREED_MEMORY/MEMORY_CONTEXT_CHECKING
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-09-07 17:30:54
Message-ID: 13016.1283880654@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> A more general solution would be to have a new MemoryContext
> implementation that does the same your patch does. Ie. instead of
> tracking each allocation, just allocate a big chunk, and have palloc()
> return the next n free bytes from it, like a stack. pfree() would
> obviously not work, but wholesale MemoryContextDelete of the whole
> memory context would.

The trick with that is to not crash horribly if pfree or
GetMemoryChunkSpace or GetMemoryChunkContext is applied to such a chunk.
Perhaps we can live without that requirement, but it greatly limits the
safe usage of such a context type.

In the particular case here, the dictionary structures could probably
safely use such a context type, but I'm not sure it's worth bothering
if the long-term plan is to implement a precompiler. There would be
no need for this after the precompiled representation is installed,
because that'd just be one big hunk of memory anyway.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-09-28 03:30:48
Message-ID: AANLkTi=QjrWSf5=YqDm8EZEabo_FRtS86RmZQ=xYfYtc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 7, 2010 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> In the particular case here, the dictionary structures could probably
> safely use such a context type, but I'm not sure it's worth bothering
> if the long-term plan is to implement a precompiler.  There would be
> no need for this after the precompiled representation is installed,
> because that'd just be one big hunk of memory anyway.

Rather than inventing something more complex, I'm inclined to say we
should just go ahead and apply this more or less as Pavel wrote it. I
haven't tried to reproduce Pavel's results, but I assume that they are
accurate and that's a pretty big savings for a pretty trivial amount
of code. If it gets thrown away later when/if someone codes up a
precompiler, no harm done.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-01 18:23:57
Message-ID: AANLkTimmgPrLK56JY=ho6x1wTvp8Uq=Ocr=CL1o_4ouK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 27, 2010 at 11:30 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Sep 7, 2010 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In the particular case here, the dictionary structures could probably
>> safely use such a context type, but I'm not sure it's worth bothering
>> if the long-term plan is to implement a precompiler.  There would be
>> no need for this after the precompiled representation is installed,
>> because that'd just be one big hunk of memory anyway.
>
> Rather than inventing something more complex, I'm inclined to say we
> should just go ahead and apply this more or less as Pavel wrote it.  I
> haven't tried to reproduce Pavel's results, but I assume that they are
> accurate and that's a pretty big savings for a pretty trivial amount
> of code.  If it gets thrown away later when/if someone codes up a
> precompiler, no harm done.

I tried to reproduce Pavel's results this afternoon and failed. I
read the documentation:

http://developer.postgresql.org/pgdocs/postgres/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

...and I followed the link to ispell. And I installed it from
MacPorts. And then I built it by hand, too. And I'm still confused.
Because I don't see anything in either set of results that looks like
the right set of files to use with CREATE TEXT SEARCH DICTIONARY.
What am I doing wrong?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-01 18:34:46
Message-ID: AANLkTiktajKxrE9s7_VZppECX47hby3Hx-cbTgdK3W9i@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/10/1 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Sep 27, 2010 at 11:30 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Sep 7, 2010 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> In the particular case here, the dictionary structures could probably
>>> safely use such a context type, but I'm not sure it's worth bothering
>>> if the long-term plan is to implement a precompiler.  There would be
>>> no need for this after the precompiled representation is installed,
>>> because that'd just be one big hunk of memory anyway.
>>
>> Rather than inventing something more complex, I'm inclined to say we
>> should just go ahead and apply this more or less as Pavel wrote it.  I
>> haven't tried to reproduce Pavel's results, but I assume that they are
>> accurate and that's a pretty big savings for a pretty trivial amount
>> of code.  If it gets thrown away later when/if someone codes up a
>> precompiler, no harm done.
>
> I tried to reproduce Pavel's results this afternoon and failed.  I
> read the documentation:
>
> http://developer.postgresql.org/pgdocs/postgres/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY
>
> ...and I followed the link to ispell.  And I installed it from
> MacPorts.  And then I built it by hand, too.  And I'm still confused.
> Because I don't see anything in either set of results that looks like
> the right set of files to use with CREATE TEXT SEARCH DICTIONARY.
> What am I doing wrong?
>

download http://www.pgsql.cz/data/czech.tar.gz and unpack it to
pgsql/share/tsearch_data

as superuser do

CREATE TEXT SEARCH DICTIONARY cspell
(template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
ALTER TEXT SEARCH CONFIGURATION cs
ALTER MAPPING FOR word, asciiword WITH cspell, simple;

and then postgres=# select * from ts_debug('cs','Příliš žluťoučký kůň
se napil žluté vody');

maybe try to read
http://www.april-child.com/blog/2007/06/25/tsearch2-utf8-czech-czech-utf-8-support-for-tsearch2-postgresql-82/

Regards

Pavel Stehule

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-01 19:46:32
Message-ID: AANLkTi=KcN-6K4CV3tQF8j8e36jQ+_0PCs7-5P9rztzT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 1, 2010 at 2:34 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> Hello
>
> 2010/10/1 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Mon, Sep 27, 2010 at 11:30 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Tue, Sep 7, 2010 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> In the particular case here, the dictionary structures could probably
>>>> safely use such a context type, but I'm not sure it's worth bothering
>>>> if the long-term plan is to implement a precompiler.  There would be
>>>> no need for this after the precompiled representation is installed,
>>>> because that'd just be one big hunk of memory anyway.
>>>
>>> Rather than inventing something more complex, I'm inclined to say we
>>> should just go ahead and apply this more or less as Pavel wrote it.  I
>>> haven't tried to reproduce Pavel's results, but I assume that they are
>>> accurate and that's a pretty big savings for a pretty trivial amount
>>> of code.  If it gets thrown away later when/if someone codes up a
>>> precompiler, no harm done.
>>
>> I tried to reproduce Pavel's results this afternoon and failed.  I
>> read the documentation:
>>
>> http://developer.postgresql.org/pgdocs/postgres/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY
>>
>> ...and I followed the link to ispell.  And I installed it from
>> MacPorts.  And then I built it by hand, too.  And I'm still confused.
>> Because I don't see anything in either set of results that looks like
>> the right set of files to use with CREATE TEXT SEARCH DICTIONARY.
>> What am I doing wrong?
>>
>
> download http://www.pgsql.cz/data/czech.tar.gz and unpack it to
> pgsql/share/tsearch_data
>
> as superuser do
>
> CREATE TEXT SEARCH DICTIONARY cspell
>   (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
> CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
> ALTER TEXT SEARCH CONFIGURATION cs
>   ALTER MAPPING FOR word, asciiword WITH cspell, simple;
>
> and then postgres=# select * from ts_debug('cs','Příliš žluťoučký kůň
> se napil žluté vody');
>
> maybe try to read
> http://www.april-child.com/blog/2007/06/25/tsearch2-utf8-czech-czech-utf-8-support-for-tsearch2-postgresql-82/

Thanks, I'll try that. Do you have any other examples? Why was my
attempt to do what the documentation says unsuccessful?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-03 23:02:42
Message-ID: 7073.1286146962@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 7, 2010 at 1:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In the particular case here, the dictionary structures could probably
>> safely use such a context type, but I'm not sure it's worth bothering
>> if the long-term plan is to implement a precompiler. There would be
>> no need for this after the precompiled representation is installed,
>> because that'd just be one big hunk of memory anyway.

> Rather than inventing something more complex, I'm inclined to say we
> should just go ahead and apply this more or less as Pavel wrote it. I
> haven't tried to reproduce Pavel's results, but I assume that they are
> accurate and that's a pretty big savings for a pretty trivial amount
> of code. If it gets thrown away later when/if someone codes up a
> precompiler, no harm done.

Actually, my fear about it is that it will get in the way of whoever
tries to implement a precompiler, because it confuses the memory
management quite a lot. It's not at all apparent that the code is even
safe as-is, because it's depending on the unstated assumption that that
static variable will get reset once per dictionary. The documentation
is horrible: it doesn't really explain what the patch is doing, and what
it does say is wrong. It's not the case that that memory will never be
freed, so it's critical that the allocs happen in the dictCtx of the
dictionary currently being built, and not get piggybacked onto the
memory allocated for some previous dictionary. I *think* that it's not
actively broken as-is, but it's sure fragile and badly documented.

(Of course, the same charge could be leveled against the tmpCtx hack
that's already there; but that seems like a poor excuse for making
things even messier.)

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQLHackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-04 01:08:56
Message-ID: BFC2392A-9049-4019-BCCC-C5760FA3BF2D@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Oct 3, 2010, at 7:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It's not at all apparent that the code is even
> safe as-is, because it's depending on the unstated assumption that that
> static variable will get reset once per dictionary. The documentation
> is horrible: it doesn't really explain what the patch is doing, and what
> it does say is wrong.

Yep. We certainly would need to convince ourselves that this is correct before applying it, and that is all kinds of non-obvious.

...Robert


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQLHackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-04 06:05:48
Message-ID: AANLkTik7YUnqM-7KHOYyRSw+Q4sFvKLvNjaTjfVFnFz5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/10/4 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Oct 3, 2010, at 7:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It's not at all apparent that the code is even
>> safe as-is, because it's depending on the unstated assumption that that
>> static variable will get reset once per dictionary.  The documentation
>> is horrible: it doesn't really explain what the patch is doing, and what
>> it does say is wrong.
>
> Yep. We certainly would need to convince ourselves that this is correct before applying it, and that is all kinds of non-obvious.
>

what is good documentation?

This patch doesn't do more, than it removes palloc overhead on just
one structure of TSearch2 ispell dictionary. It isn't related to some
static variable - the most important is fact so this memory is
unallocated by dropping of memory context.

Regards

Pavel Stehule

> ...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQLHackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-06 16:44:08
Message-ID: AANLkTin3av6+X-G4_wqM4R13L-Z2tjewCN4uN6jyJ-_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 4, 2010 at 2:05 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2010/10/4 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Oct 3, 2010, at 7:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> It's not at all apparent that the code is even
>>> safe as-is, because it's depending on the unstated assumption that that
>>> static variable will get reset once per dictionary.  The documentation
>>> is horrible: it doesn't really explain what the patch is doing, and what
>>> it does say is wrong.
>>
>> Yep. We certainly would need to convince ourselves that this is correct before applying it, and that is all kinds of non-obvious.
>>
>
> what is good documentation?
>
> This patch doesn't do more, than it removes palloc overhead on just
> one structure of TSearch2 ispell dictionary. It isn't related to some
> static variable - the most important is fact so this memory is
> unallocated by dropping of memory context.

After looking at this a bit more, I don't think it's too hard to fix
up the comments so that they reflect what's actually going on here.
For this patch to be correct, the only thing we really need to believe
is that no one is going to try to pfree an SPNode, which seems like
something we ought to be able to convince ourselves of. I don't see
how the fact that some of the memory may get allocated out of a
palloc'd chunk from context X rather than from context X directly can
really cause any problems otherwise. The existing code already
depends on the unstated assumption that the static variable will get
reset once per dictionary; we're not making that any worse.

I think it would be cleaner to get rid of checkTmpCtx() and instead
have dispell_init() set up and tear down the temporary context,
leaving NULL behind in the global variable after it's torn down,
perhaps by having spell.c publish an API like this:

void NISetupForDictionaryLoad();
void NICleanupAfterDictionaryLoad();

...but I don't really see why that has to be done as part of this patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQLHackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-06 17:40:27
Message-ID: 19405.1286386827@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think it would be cleaner to get rid of checkTmpCtx() and instead
> have dispell_init() set up and tear down the temporary context,

What I was thinking of doing was getting rid of the static variable
altogether: we should do what you say above, but in the form of a
state struct that's created and destroyed by additional calls from
dispell_init(). Then that state struct could also carry the
infrastructure for this additional hack. It's a little more notation to
pass an additional parameter through all these routines, but from the
standpoint of understandability and maintainability it's clearly worth
it.

> void NISetupForDictionaryLoad();
> void NICleanupAfterDictionaryLoad();

More like

NISpellState *NISpellInit();
NISpellTerm(NISpellState *stat);

> ...but I don't really see why that has to be done as part of this patch.

Because patches that reduce maintainability seldom get cleaned up after.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQLHackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-06 17:48:20
Message-ID: AANLkTinu+Re8aTn4ZbFd40xsb8VtNfrEWoFLy60196p0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 6, 2010 at 1:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I think it would be cleaner to get rid of checkTmpCtx() and instead
>> have dispell_init() set up and tear down the temporary context,
>
> What I was thinking of doing was getting rid of the static variable
> altogether: we should do what you say above, but in the form of a
> state struct that's created and destroyed by additional calls from
> dispell_init().  Then that state struct could also carry the
> infrastructure for this additional hack.  It's a little more notation to
> pass an additional parameter through all these routines, but from the
> standpoint of understandability and maintainability it's clearly worth
> it.
>
>> void NISetupForDictionaryLoad();
>> void NICleanupAfterDictionaryLoad();
>
> More like
>
>        NISpellState *NISpellInit();
>        NISpellTerm(NISpellState *stat);
>
>> ...but I don't really see why that has to be done as part of this patch.
>
> Because patches that reduce maintainability seldom get cleaned up after.

I don't think you've made a convincing argument that this patch does
that, but if you're feeling motivated to go clean this up, I'm more
than happy to get out of the way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQLHackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-06 18:00:35
Message-ID: 19763.1286388035@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Oct 6, 2010 at 1:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> ...but I don't really see why that has to be done as part of this patch.
>>
>> Because patches that reduce maintainability seldom get cleaned up after.

> I don't think you've made a convincing argument that this patch does
> that, but if you're feeling motivated to go clean this up, I'm more
> than happy to get out of the way.

Yeah, I'll take it.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-06 23:33:45
Message-ID: 14955.1286408025@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> this simple patch reduce a persistent allocated memory for tsearch
> ispell dictionaries.

> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
> on 64bit from 55MB to cca 27MB.

Applied with revisions --- I got rid of the risky static state as per
discussion, and extended the hackery to strings and Aff nodes as
suggested by Teodor.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-06 23:36:42
Message-ID: 15008.1286408202@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
>> on 64bit from 55MB to cca 27MB.

> Good results. But, I think, there are more places in ispell to use hold_memory():
> - affixes and affix tree
> - regis (REGex for ISpell, regis.c)

I fixed the affix stuff as much as possible (some of the structures are
re-palloc'd so they can't easily be included). It appears that hacking
up regis, or any of the remaining allocations, wouldn't be worth the
trouble. Using the Czech dictionary on a 32-bit machine, I see about
16MB going through the compacted-alloc code and only about 375K going
through regular small palloc's.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-06 23:44:46
Message-ID: AANLkTin4SScz_KLzp_zLeyeDYmtbw+eCgHWo37DO58UV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 6, 2010 at 7:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>>> on 32bit from 27MB (3399 blocks) to 13MB (1564 blocks)
>>> on 64bit from 55MB to cca 27MB.
>
>> Good results. But, I think, there are more places in ispell to use hold_memory():
>> - affixes and affix tree
>> - regis (REGex for ISpell, regis.c)
>
> I fixed the affix stuff as much as possible (some of the structures are
> re-palloc'd so they can't easily be included).  It appears that hacking
> up regis, or any of the remaining allocations, wouldn't be worth the
> trouble.  Using the Czech dictionary on a 32-bit machine, I see about
> 16MB going through the compacted-alloc code and only about 375K going
> through regular small palloc's.

Nice. What was the overall effect on memory consumption?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: tsearch - some memory diet
Date: 2010-10-07 00:00:56
Message-ID: 15445.1286409656@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Nice. What was the overall effect on memory consumption?

Before:
cspell: 31352608 total in 3814 blocks; 37432 free (6 chunks); 31315176 used

After:
cspell: 16214816 total in 1951 blocks; 13744 free (12 chunks); 16201072 used

This is on a 32-bit machine that uses MAXALIGN 8, and I also had
enable_cassert on (hence extra palloc chunk overhead) so it may be
overstating the amount of savings you'd see in production. But it's
in the same ballpark as what Pavel reported originally. AFAICT
practically all of the useful savings comes from the one place he
targeted originally, and the other changes are just marginal gravy.

Before I throw it away, here's some data about the allocations that
go through that code on the Czech dictionary. First column is number
of calls of the given size, second is requested size in bytes:

1 1
695 2
1310 3
1965 4
2565 5
1856 6
578 7
301 8
7 9
2 10
707733 12
520 16
107035 20
16 24
22606 28
3 32
8814 36
59 40
4305 44
2 48
2238 52
2 56
1236 60
20 64
816 68
599 76
1 80
408 84
9 88
334 92
2 96
246 100
1 104
164 108
13 112
132 116
110 124
1 128
107 132
3 136
81 140
1 144
77 148
40 156
46 164
29 172
39 180
2 184
35 188
31 196
19 204
16 212
12 220
10 228
3 244
1 304
1 400
1 1120

The spikes at 12/20/28 bytes correspond to SPNodes with 1/2/3 data
items.

BTW, on a 64-bit machine we're really paying through the nose for the
pointers in SPNodeData --- the pointers are bad enough, and their
alignment effects are worse. If we were to try to change this over to
a pointer-free representation, we could probably replace those pointers
by 32-bit offsets, which would save a full factor of 2 on 64-bit.

regards, tom lane