From: | jesper(at)krogh(dot)cc |
---|---|
To: | pgsql-performance(at)postgresql(dot)org |
Subject: | Random penalties on GIN index updates? |
Date: | 2009-10-21 15:03:09 |
Message-ID: | b67fc372766d66a4eb391b4d69861c3c.squirrel@shrek.krogh.cc |
Lists: | pgsql-performance |
Hi (running PG 8.4.1),
So far in my test of PG Full Text Search I have indexed over 6m
documents, and the index has grown to 37GB. The system didn't run any
autovacuums in the process, but I manually vacuumed a few times and that
stopped the growth for a short period of time.
 table_name |  index_name  | times_used | table_size | index_size | num_writes |                              definition
------------+--------------+------------+------------+------------+------------+----------------------------------------------------------------------
 ftstest    | body_tfs_idx |        171 | 5071 MB    | 37 GB      |    6122086 | CREATE INDEX ftstest_tfs_idx ON ftstest USING gin (ftstest_body_fts)
(1 row)
This is roughly what I'd expect; it is no scarier than the Xapian
index I'm comparing it with. Search speed seems excellent. But I feel I'm
getting a significant drop-off in indexing speed as time goes by, though I
don't have numbers to confirm this.
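The size ratio behind "roughly what I'd expect" can be checked with a little arithmetic. This is a sketch under two assumptions: `pg_size_pretty` reports 1024-based units and rounds its output, and `num_writes` approximates the number of indexed documents.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

table_bytes = 5071 * MIB   # "5071 MB" as reported (rounded)
index_bytes = 37 * GIB     # "37 GB" as reported (rounded)
rows = 6_122_086           # num_writes, assumed to be close to the document count

print(round(index_bytes / table_bytes, 1))  # 7.5 (index is ~7.5x the table)
print(round(index_bytes / rows))            # 6489 bytes of index per document
```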
If I understand the technicalities correctly, INSERTs/UPDATEs to the
index are accumulated in "maintenance_work_mem", and the "user"
unlucky enough to fill it up pays the penalty of merging all the
changes into the index?
I currently have "maintenance_work_mem" set to 128MB, and according to
"pg_stat_activity" I have an insert that has been sitting there for over
1 hour. If I strace the postgres process ID, it is reading and writing a
lot on the filesystem and imposing an IO-wait load of 1 CPU.
Can I do something to prevent this from happening? Is it "by design"?
--
Jesper
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | jesper(at)krogh(dot)cc |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Random penalties on GIN index updates? |
Date: | 2009-10-21 15:13:57 |
Message-ID: | 28265.1256138037@sss.pgh.pa.us |
Lists: | pgsql-performance |
jesper(at)krogh(dot)cc writes:
> If I understand the technicalities correctly, INSERTs/UPDATEs to the
> index are accumulated in "maintenance_work_mem", and the "user"
> unlucky enough to fill it up pays the penalty of merging all the
> changes into the index?
You can turn off the "fastupdate" index parameter to disable that,
but I think there may be a penalty in index bloat as well as insertion
speed. It would be better to use a more conservative work_mem
(work_mem, not maintenance_work_mem, is what limits the amount of stuff
accumulated during normal inserts).
regards, tom lane
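The trade-off Tom describes can be illustrated with a toy Python model. It assumes a simplified pending list that is measured in bytes and flushed in full once it exceeds a limit, which is the role work_mem plays in 8.4 according to Tom. `ToyGinIndex` and its fields are hypothetical, and the cost of each call is counted as entries merged, not real I/O. The model does not capture the index bloat or slower per-insert cost that Tom warns about with fastupdate off.

```python
from dataclasses import dataclass


@dataclass
class ToyGinIndex:
    """Toy model of a GIN pending list (not PostgreSQL's actual code)."""
    pending_limit: int         # bytes allowed in the pending list (work_mem in 8.4)
    fastupdate: bool = True    # False: every insert goes straight to the main index
    pending_count: int = 0
    pending_bytes: int = 0
    main_entries: int = 0

    def insert(self, entry_bytes: int) -> int:
        """Add one entry; return how many entries this call merged into the main index."""
        if not self.fastupdate:
            self.main_entries += 1
            return 1
        self.pending_count += 1
        self.pending_bytes += entry_bytes
        if self.pending_bytes > self.pending_limit:
            return self.flush()
        return 0

    def flush(self) -> int:
        """Merge the whole pending list; whoever triggers this pays for all of it."""
        merged = self.pending_count
        self.main_entries += merged
        self.pending_count = 0
        self.pending_bytes = 0
        return merged


idx = ToyGinIndex(pending_limit=10_000)          # 10 kB limit, 3 kB per entry
costs = [idx.insert(3_000) for _ in range(8)]
print(costs)  # [0, 0, 0, 4, 0, 0, 0, 4]: every 4th insert pays for all four

slow = ToyGinIndex(pending_limit=10_000, fastupdate=False)
print([slow.insert(3_000) for _ in range(4)])    # [1, 1, 1, 1]: cost spread evenly
```

Shrinking `pending_limit` makes each unlucky insert pay for fewer entries, which is the intuition behind backing off work_mem.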
From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Random penalties on GIN index updates? |
Date: | 2009-10-21 17:58:34 |
Message-ID: | 4ADF4BCA.3030603@krogh.cc |
Lists: | pgsql-performance |
Tom Lane wrote:
> jesper(at)krogh(dot)cc writes:
>> If I understand the technicalities correctly, INSERTs/UPDATEs to the
>> index are accumulated in "maintenance_work_mem", and the "user"
>> unlucky enough to fill it up pays the penalty of merging all the
>> changes into the index?
>
> You can turn off the "fastupdate" index parameter to disable that,
> but I think there may be a penalty in index bloat as well as insertion
> speed. It would be better to use a more conservative work_mem
> (work_mem, not maintenance_work_mem, is what limits the amount of stuff
> accumulated during normal inserts).
OK, I read the manual about that. It seems worth testing. What I'm seeing is
stuff like this:
2009-10-21T16:32:21
2009-10-21T16:32:25
2009-10-21T16:32:30
2009-10-21T16:32:35
2009-10-21T17:10:50
2009-10-21T17:10:59
2009-10-21T17:11:09
... then it went on steadily for another 180,000 documents.
Each row is a printout from the application doing the INSERTs; it prints the
time for every 1000 rows it gets through. It is the 38 minutes in the
middle I'm a bit worried about.
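The stall can be picked out of the progress log mechanically. This sketch parses the timestamps copied from the message and flags any interval far above the median. The 10x cutoff is an arbitrary choice for illustration.

```python
from datetime import datetime

# Progress lines from the loader, one per 1000 inserted rows.
log = """\
2009-10-21T16:32:21
2009-10-21T16:32:25
2009-10-21T16:32:30
2009-10-21T16:32:35
2009-10-21T17:10:50
2009-10-21T17:10:59
2009-10-21T17:11:09"""

stamps = [datetime.fromisoformat(line) for line in log.splitlines()]
gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Median-ish "typical" interval, then anything over 10x that counts as a stall.
typical = sorted(gaps)[len(gaps) // 2]
stalls = [g for g in gaps if g > 10 * typical]

print(gaps)            # [4.0, 5.0, 5.0, 2295.0, 9.0, 10.0]
print(stalls)          # [2295.0]
print(stalls[0] / 60)  # 38.25 minutes for one batch of 1000 rows
```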
work_mem is set to 512MB; might that translate into 180,000 documents in
my system?
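Taking the 180,000-document guess at face value, a quick calculation gives the pending-list footprint per document it would imply. This assumes the flush fires exactly when the pending list reaches work_mem, which is a simplification. `bytes_per_doc` is a hypothetical helper.

```python
MIB = 1024 * 1024


def bytes_per_doc(work_mem_mib: float, docs_per_flush: int) -> float:
    """Pending-list bytes per document implied by one flush (rough estimate)."""
    return work_mem_mib * MIB / docs_per_flush


print(round(bytes_per_doc(512, 180_000)))  # 2983 bytes per document
```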
What I seem to be missing is a way to make sure some "background"
application is the one getting the penalty, so a random user doing a
single insert won't get stuck. Is that doable?
It also seems to lock out other inserts while it is in this state.
--
Jesper
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jesper Krogh <jesper(at)krogh(dot)cc> |
Cc: | pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Random penalties on GIN index updates? |
Date: | 2009-10-21 18:35:15 |
Message-ID: | 1908.1256150115@sss.pgh.pa.us |
Lists: | pgsql-performance |
Jesper Krogh <jesper(at)krogh(dot)cc> writes:
> What I seem to be missing is a way to make sure some "background"
> application is the one getting the penalty, so a random user doing a
> single insert won't get stuck. Is that doable?
You could force a vacuum every so often, but I don't think that will
help the locking situation. You really need to back off work_mem ---
512MB is probably not a sane global value for that anyway.
regards, tom lane
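The idea of forcing a vacuum every so often can be sketched as a toy simulation. It assumes a periodic background job that drains the pending list on a fixed schedule; `simulate` and `flush_every` are hypothetical names. Like Tom's caveat, the model ignores locking entirely. It only shows who pays for the merge, not whether other inserts block while it runs.

```python
from typing import List, Optional


def simulate(n_inserts: int, limit: int, entry_bytes: int,
             flush_every: Optional[int]) -> List[int]:
    """Return the numbers of the foreground inserts that had to flush the pending list.

    flush_every models a periodic background VACUUM: every `flush_every`
    inserts the background job merges the pending list instead of a user's
    INSERT. None means there is no background job.
    """
    pending = 0
    foreground_flushes = []
    for i in range(1, n_inserts + 1):
        pending += entry_bytes
        if pending > limit:
            foreground_flushes.append(i)  # this user's INSERT pays for the merge
            pending = 0
        elif flush_every and i % flush_every == 0:
            pending = 0                   # the background job pays instead
    return foreground_flushes


print(simulate(8, 10_000, 3_000, None))  # [4, 8]
print(simulate(8, 10_000, 3_000, 2))     # []
```

The background job only shields users if it runs before the pending list can reach the limit. In this toy model that means flushing at least every 3 inserts.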
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Random penalties on GIN index updates? |
Date: | 2009-10-22 03:16:28 |
Message-ID: | 603c8f070910212016t3073b73cw3318787f81e42ad7@mail.gmail.com |
Lists: | pgsql-performance |
On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>> What I seem to be missing is a way to make sure some "background"
>> application is the one getting the penalty, so a random user doing a
>> single insert won't get stuck. Is that doable?
>
> You could force a vacuum every so often, but I don't think that will
> help the locking situation. You really need to back off work_mem ---
> 512MB is probably not a sane global value for that anyway.
Yeah, it's hard to imagine a system where that doesn't threaten all
kinds of other bad results. I bet setting this to 4MB will make this
problem largely go away.
Arguably we shouldn't be using work_mem to control this particular
behavior, but...
...Robert
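A rough projection of what 4MB might mean per flush can be made by scaling an observed batch size linearly with work_mem. Linear scaling is an assumption, and `projected_batch` is a hypothetical helper. The thread's two data points (Jesper's 512MB guess and his later 16MB run) give different projections, so treat either figure as order-of-magnitude only.

```python
def projected_batch(docs_observed: int, mem_observed_mib: float,
                    mem_new_mib: float) -> float:
    """Scale an observed flush batch size linearly with work_mem (an assumption)."""
    return docs_observed * mem_new_mib / mem_observed_mib


print(projected_batch(180_000, 512, 4))  # 1406.25 (from the 512MB guess)
print(projected_batch(11_000, 16, 4))    # 2750.0  (from the later 16MB run)
```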
From: | Jesper Krogh <jesper(at)krogh(dot)cc> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Random penalties on GIN index updates? |
Date: | 2009-10-22 04:57:49 |
Message-ID: | 4ADFE64D.1080608@krogh.cc |
Lists: | pgsql-performance |
Robert Haas wrote:
> On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>>> What I seem to be missing is a way to make sure some "background"
>>> application is the one getting the penalty, so a random user doing a
>>> single insert won't get stuck. Is that doable?
>> You could force a vacuum every so often, but I don't think that will
>> help the locking situation. You really need to back off work_mem ---
>> 512MB is probably not a sane global value for that anyway.
>
> Yeah, it's hard to imagine a system where that doesn't threaten all
> kinds of other bad results. I bet setting this to 4MB will make this
> problem largely go away.
>
> Arguably we shouldn't be using work_mem to control this particular
> behavior, but...
I came from Xapian, where you can only have one writer process, but
batching up several GBs at a time improved indexing performance dramatically.
Lowering work_mem to 16MB gives "batches" of 11,000 documents and stalls
of between 45 and 90s; ~33 docs/s.
--
Jesper
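One way to read the 16MB figures is to ask what share of each batch cycle goes to the stall. This sketch assumes that ~33 docs/s is the overall rate including the stall, which the message does not state outright. `stall_share` is a hypothetical helper.

```python
def stall_share(batch_docs: int, docs_per_s: float, stall_s: float) -> float:
    """Fraction of one batch cycle spent stalled, assuming docs_per_s includes the stall."""
    cycle_s = batch_docs / docs_per_s  # ~333 s per 11,000-document batch at 33 docs/s
    return stall_s / cycle_s


for stall in (45, 90):
    print(stall, round(stall_share(11_000, 33, stall), 3))
```

Under that reading, each stall accounts for somewhere between roughly an eighth and a quarter of the wall-clock time.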