Re: GIN pending list pages not recycled promptly (was Re: GIN improvements part 1: additional information)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN pending list pages not recycled promptly (was Re: GIN improvements part 1: additional information)
Date: 2014-06-19 07:51:29
Message-ID: CA+HiwqFm7D2Vrpw4byLERS4306Btd4E6v6yq_MuKyiV1OxZs1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 19, 2014 at 2:09 PM, Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> On Wed, Jan 22, 2014 at 9:12 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>>
>> I think we need to add the deleted pages to the FSM more aggressively.
>>
>> I tried simply adding calls to RecordFreeIndexPage, after the list pages
>> have been marked as deleted, but unfortunately that didn't help. The problem
>> is that the FSM is organized into a three-level tree, and
>> RecordFreeIndexPage only updates the bottom level. The upper levels are not
>> updated until the FSM is vacuumed, so the pages are still not visible to
>> GetFreeIndexPage calls until next vacuum. The simplest fix would be to add a
>> call to IndexFreeSpaceMapVacuum after flushing the pending list, per
>> attached patch. I'm slightly worried about the performance impact of the
>> IndexFreeSpaceMapVacuum() call. It scans the whole FSM of the index, which
>> isn't exactly free. So perhaps we should teach RecordFreeIndexPage to update
>> the upper levels of the FSM in a retail-fashion instead.
>>
>
> I wonder if you pursued this further?
>
> You recently added a number of TODO items related to GIN index; is it
> worth adding this to the list?
>

In fact, I forgot to mention that I tried your patch and it seems to
be working for the particular example I am working with using pg_bigm
(bigram) (indexed data not so realistic maybe).

postgres=# CREATE TABLE test(contents text);
CREATE TABLE

postgres=# create or replace function rnd_str(length integer) returns text as
$$
declare
chars text[] :=
'{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,
あ, い, う, え, お, か, き, く, け, こ, さ, し, す, せ, そ, た, ち, つ, て, と, な, に, ぬ,
ね, の, は, ひ, ふ, へ, ほ, ま, み, む, め, も, や, ゆ, よ, ら, り, る, れ, ろ, わ, ゐ, ゑ,
を}';
result text := '';
i integer := 0;
arr_len integer;
begin
chars := array_append(chars, ' ');
arr_len := array_length(chars, 1);
if length < 0 then
raise exception 'Given length cannot be less than 0';
end if;
for i in 1..length loop
result := result || chars[1+random()*(arr_len - 1)];
end loop;
return result;
end;
$$ language plpgsql;
CREATE FUNCTION

-- HEAD

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000

Time: 76296.641 ms

postgres=# SELECT pg_size_pretty(pg_table_size('test'));
pg_size_pretty
----------------
49 MB
(1 row)

postgres=# CREATE INDEX test_bigm_idx ON test USING GIN (contents gin_bigm_ops);
CREATE INDEX
Time: 50517.912 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));
pg_size_pretty
----------------
118 MB
(1 row)

postgres=# TRUNCATE test;
TRUNCATE TABLE

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000
Time: 233369.366 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));
pg_size_pretty
----------------
747 MB
(1 row)

-- Whereas, patched (gin-recycle-deleted-list-pages-1.patch) HEAD

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000
Time: 32808.779 ms

postgres=# CREATE INDEX test_bigm_idx ON test USING GIN (contents gin_bigm_ops);
CREATE INDEX
Time: 24490.945 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));
pg_size_pretty
----------------
118 MB
(1 row)

postgres=# TRUNCATE test;
TRUNCATE TABLE

postgres=# INSERT INTO test SELECT rnd_str((random() * 500)::int) FROM
generate_series(1, 100000);
INSERT 0 100000
Time: 153878.163 ms

postgres=# SELECT pg_size_pretty(pg_relation_size('test_bigm_idx'));
pg_size_pretty
----------------
119 MB
(1 row)

That sure looks good.

--
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ian Barwick 2014-06-19 08:29:47 Re: replication commands and log_statements
Previous Message Michael Paquier 2014-06-19 07:29:35 Re: Removing dependency to wsock32.lib when compiling code on WIndows