Re: Random penalties on GIN index updates?

From: jesper(at)krogh(dot)cc
To: pgsql-performance(at)postgresql(dot)org
Subject: Random penalties on GIN index updates?
Date: 2009-10-21 15:03:09
Message-ID: b67fc372766d66a4eb391b4d69861c3c.squirrel@shrek.krogh.cc
Lists: pgsql-performance

Hi (running PG8.4.1)

As far as I have gotten in my test of PG Full Text Search: I have over 6m
documents indexed so far, and the index has grown to 37GB. The system
didn't run any autovacuums in the process, but I manually vacuumed a few
times, and that stopped the growth for a short period of time.

 table_name | index_name   | times_used | table_size | index_size | num_writes | definition
------------+--------------+------------+------------+------------+------------+------------------------------------------------------------------------
 ftstest    | body_tfs_idx |        171 | 5071 MB    | 37 GB      |    6122086 | CREATE INDEX ftstest_tfs_idx ON ftstest USING gin (ftstest_body_fts)
(1 row)
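
For reference, output like that can be pulled from the standard statistics
views with a query roughly along these lines (not necessarily the exact one
used to produce the table above):

  -- per-index size and usage summary for the ftstest table
  SELECT i.relname                                      AS table_name,
         i.indexrelname                                 AS index_name,
         i.idx_scan                                     AS times_used,
         pg_size_pretty(pg_relation_size(i.relid))      AS table_size,
         pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
         t.n_tup_ins + t.n_tup_upd + t.n_tup_del        AS num_writes,
         pg_get_indexdef(i.indexrelid)                  AS definition
    FROM pg_stat_user_indexes i
    JOIN pg_stat_user_tables t USING (relid)
   WHERE i.relname = 'ftstest';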

This is sort of what I'd expect; it is not more scary than the Xapian
index it is being compared with. Search speed seems excellent. But I feel
I'm getting a significant drop-off in indexing speed as time goes by,
although I don't have numbers to confirm this.

If I understand the technicalities correctly, INSERTs/UPDATEs to the
index are accumulated in "maintenance_work_mem", and the "user" who is
unlucky enough to fill it up pays the penalty of merging all the
accumulated changes into the index?

I currently have "maintenance_work_mem" set to 128MB, and according to
"pg_stat_activity" I currently have an insert that has been sitting for
over 1 hour. If I strace the postgres process id, it is reading and
writing a lot on the filesystem and imposing an IO-wait load of 1 CPU.
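
For reference, the stuck statement shows up with something along these
lines (column names as of 8.4):

  -- currently running statements, oldest first
  SELECT procpid,
         now() - query_start AS runtime,
         waiting,
         current_query
    FROM pg_stat_activity
   WHERE current_query <> '<IDLE>'
   ORDER BY query_start;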

Can I do something to prevent this from happening? Is it "by design"?

--
Jesper


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jesper(at)krogh(dot)cc
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random penalties on GIN index updates?
Date: 2009-10-21 15:13:57
Message-ID: 28265.1256138037@sss.pgh.pa.us
Lists: pgsql-performance

jesper(at)krogh(dot)cc writes:
> If I understand the technicalities correctly, INSERTs/UPDATEs to the
> index are accumulated in "maintenance_work_mem", and the "user" who is
> unlucky enough to fill it up pays the penalty of merging all the
> accumulated changes into the index?

You can turn off the "fastupdate" index parameter to disable that,
but I think there may be a penalty in index bloat as well as insertion
speed. It would be better to use a more conservative work_mem
(work_mem, not maintenance_work_mem, is what limits the amount of stuff
accumulated during normal inserts).
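
Something along these lines, for instance (using the index name from the
earlier message):

  -- stop future insertions from going into the pending list; note that
  -- this does not by itself flush entries already pending (a VACUUM on
  -- the table does that)
  ALTER INDEX ftstest_tfs_idx SET (fastupdate = off);

  -- or keep fastupdate on and just run the inserting sessions with a
  -- more conservative work_mem, e.g.
  SET work_mem = '16MB';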

regards, tom lane


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random penalties on GIN index updates?
Date: 2009-10-21 17:58:34
Message-ID: 4ADF4BCA.3030603@krogh.cc
Lists: pgsql-performance

Tom Lane wrote:
> jesper(at)krogh(dot)cc writes:
>> If I understand the technicalities correctly, INSERTs/UPDATEs to the
>> index are accumulated in "maintenance_work_mem", and the "user" who is
>> unlucky enough to fill it up pays the penalty of merging all the
>> accumulated changes into the index?
>
> You can turn off the "fastupdate" index parameter to disable that,
> but I think there may be a penalty in index bloat as well as insertion
> speed. It would be better to use a more conservative work_mem
> (work_mem, not maintenance_work_mem, is what limits the amount of stuff
> accumulated during normal inserts).

Ok, I read the manual about that. Seems worth testing. What I'm seeing is
stuff like this:

2009-10-21T16:32:21
2009-10-21T16:32:25
2009-10-21T16:32:30
2009-10-21T16:32:35
2009-10-21T17:10:50
2009-10-21T17:10:59
2009-10-21T17:11:09
... then it went on steadily for another 180,000 documents.

Each row is a printout from the application doing the INSERTs; it prints
the time after each 1000 rows it gets through. It is the 38 minutes in
the middle I'm a bit worried about.

work_mem is set to 512MB; might that translate into 180,000 documents on
my system?
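
If that is the mechanism at work, it comes out to roughly 512 MB /
180,000 documents, i.e. about 3 kB of accumulated index entries per
document.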

What I seem to be missing is a way to make sure some "background"
application is the one that gets the penalty, so a random user doing a
single insert won't get stuck. Is that doable?

It also seems to lock out other inserts while it is in this state.

--
Jesper


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Random penalties on GIN index updates?
Date: 2009-10-21 18:35:15
Message-ID: 1908.1256150115@sss.pgh.pa.us
Lists: pgsql-performance

Jesper Krogh <jesper(at)krogh(dot)cc> writes:
> What I seem to be missing is a way to make sure some "background"
> application is the one that gets the penalty, so a random user doing a
> single insert won't get stuck. Is that doable?

You could force a vacuum every so often, but I don't think that will
help the locking situation. You really need to back off work_mem ---
512MB is probably not a sane global value for that anyway.
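
Concretely, something like this might be a starting point (a sketch;
"loader" is a made-up name for whatever role does the bulk indexing):

  -- postgresql.conf: keep the global default modest, e.g.
  --   work_mem = 4MB

  -- raise it only for the bulk-indexing role (or with SET in that
  -- session), so that role is the one paying the pending-list merge
  -- penalty
  ALTER ROLE loader SET work_mem = '256MB';

  -- periodic vacuum from a maintenance job; this also moves the GIN
  -- pending list into the main index structure
  VACUUM ANALYZE ftstest;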

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Random penalties on GIN index updates?
Date: 2009-10-22 03:16:28
Message-ID: 603c8f070910212016t3073b73cw3318787f81e42ad7@mail.gmail.com
Lists: pgsql-performance

On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>> What I seem to be missing is a way to make sure some "background"
>> application is the one that gets the penalty, so a random user doing a
>> single insert won't get stuck. Is that doable?
>
> You could force a vacuum every so often, but I don't think that will
> help the locking situation.  You really need to back off work_mem ---
> 512MB is probably not a sane global value for that anyway.

Yeah, it's hard to imagine a system where that doesn't threaten all
kinds of other bad results. I bet setting this to 4MB will make this
problem largely go away.

Arguably we shouldn't be using work_mem to control this particular
behavior, but...

...Robert


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Random penalties on GIN index updates?
Date: 2009-10-22 04:57:49
Message-ID: 4ADFE64D.1080608@krogh.cc
Lists: pgsql-performance

Robert Haas wrote:
> On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>>> What I seem to be missing is a way to make sure some "background"
>>> application is the one that gets the penalty, so a random user doing a
>>> single insert won't get stuck. Is that doable?
>> You could force a vacuum every so often, but I don't think that will
>> help the locking situation. You really need to back off work_mem ---
>> 512MB is probably not a sane global value for that anyway.
>
> Yeah, it's hard to imagine a system where that doesn't threaten all
> kinds of other bad results. I bet setting this to 4MB will make this
> problem largely go away.
>
> Arguably we shouldn't be using work_mem to control this particular
> behavior, but...

I came from Xapian, where you can only have one writer process, but where
batching up several GB's worth improved indexing performance dramatically.
Lowering work_mem to 16MB gives "batches" of 11,000 documents and stalls
of between 45 and 90 s; ~ 33 docs/s.

--
Jesper