Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search for
  Advanced Search

BUG #3875: Building GIN index on full table could be slower than filling table with existing index.


  • From: "Sokolov Yura" <funny(dot)falcon(at)gmail(dot)com>
  • To: pgsql-bugs(at)postgresql(dot)org
  • Subject: BUG #3875: Building GIN index on full table could be slower than filling table with existing index.
  • Date: Tue, 15 Jan 2008 13:20:32 GMT
  • Message-id: <200801151320(dot)m0FDKW7g035228(at)wwwmaster(dot)postgresql(dot)org>

The following bug has been logged online:

Bug reference:      3875
Logged by:          Sokolov Yura
Email address:      funny(dot)falcon(at)gmail(dot)com
PostgreSQL version: 8.3RC1 8.2.x
Operating system:   Debian 4.0
Description:        Building GIN index on full table could be slower than
filling table with existing index.
Details: 

Here is pathological case:

    \timing
    drop table if exists test_gin;
    
    create table test_gin (
      id serial primary key,
      ar int4[]
    );
    
    create index test_gin_ix_ar on test_gin
    using gin
    ( ar );
    -- And here is pathological pattern
    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 20000) as i;
    
    drop index test_gin_ix_ar;
    
    create index test_gin_ix_ar on test_gin
    using gin
    ( ar );

Timing for this case:

    Timing is on.
    DROP TABLE
    Time: 19,421 ms
    CREATE TABLE
    Time: 13,531 ms
    CREATE INDEX
    Time: 3,352 ms
    INSERT 0 20000     !
    Time: 1064,444 ms  !!
    DROP INDEX
    Time: 30,925 ms
    CREATE INDEX       !
    Time: 28180,484 ms !!!!

So that, creating index on full table is 26.5x slower than filling table
with existing index.

After applying a patch below timings are:

    .....
    INSERT 0 20000
    Time: 1036,746 ms
    .....
    CREATE INDEX
    Time: 220,073 ms

For other tested pattern patched timing usually 
slightly better (5-10%) and in rare case slightly 
worse(5-10%). Other tested patterns are:

    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 100) as i,
         generate_series(1, 200) as j;
    
    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 200) as j,
         generate_series(1, 100) as i;
    
    insert into test_gin (ar)
    select ARRAY[i, i-1, i+1, i-2, i+2]
    from generate_series(1, 200) as j,
         generate_series(1, 1000) as i;
    
    insert into test_gin (ar)
    select (select int_array_aggregate(k) from generate_series(i, i+200) as
k)
    from generate_series(1, 200) as j,
         generate_series(1, 100) as i;
    
Patch:

diff -pr postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c
postgresql-8.3RC1/src/backend/access/gin/ginbulk.c
*** postgresql-8.3RC1old/src/backend/access/gin/ginbulk.c	2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/ginbulk.c	2008-01-14
16:00:48.000000000 +0300
*************** ginInitBA(BuildAccumulator *accum)
*** 28,33 ****
--- 28,34 ----
  	accum->maxdepth = 1;
  	accum->stackpos = 0;
  	accum->entries = NULL;
+ 	accum->nentries = 0;
  	accum->stack = NULL;
  	accum->allocatedMemory = 0;
  	accum->entryallocator = NULL;
*************** EAAllocate(BuildAccumulator *accum)
*** 44,49 ****
--- 45,51 ----
  	}
  
  	accum->length++;
+ 	accum->nentries++;
  	return accum->entryallocator + accum->length - 1;
  }
  
diff -pr postgresql-8.3RC1old/src/backend/access/gin/gininsert.c
postgresql-8.3RC1/src/backend/access/gin/gininsert.c
*** postgresql-8.3RC1old/src/backend/access/gin/gininsert.c	2008-01-01
22:45:46.000000000 +0300
--- postgresql-8.3RC1/src/backend/access/gin/gininsert.c	2008-01-14
16:07:21.000000000 +0300
*************** ginBuildCallback(Relation index, HeapTup
*** 238,244 ****
  	buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);
  
  	/* If we've maxed out our available memory, dump everything to the index
*/
! 	if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
  	{
  		ItemPointerData *list;
  		Datum		entry;
--- 238,246 ----
  	buildstate->indtuples += ginHeapTupleBulkInsert(buildstate, *values,
&htup->t_self);
  
  	/* If we've maxed out our available memory, dump everything to the index
*/
! 	if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L
! 	    || (buildstate->accum.maxdepth > 16 && 
! 	        buildstate->accum.nentries < (1 << (buildstate->accum.maxdepth /
2))))
  	{
  		ItemPointerData *list;
  		Datum		entry;
diff -pr postgresql-8.3RC1old/src/include/access/gin.h
postgresql-8.3RC1/src/include/access/gin.h
*** postgresql-8.3RC1old/src/include/access/gin.h	2008-01-01
22:45:56.000000000 +0300
--- postgresql-8.3RC1/src/include/access/gin.h	2008-01-14 15:59:51.000000000
+0300
*************** typedef struct
*** 450,455 ****
--- 450,456 ----
  {
  	GinState   *ginstate;
  	EntryAccumulator *entries;
+ 	uint32		nentries;
  	uint32		maxdepth;
  	EntryAccumulator **stack;
  	uint32		stackpos;



Home | Main Index | Thread Index

Privacy Policy | PostgreSQL Archives hosted by Command Prompt, Inc. | Designed by tinysofa
Copyright © 1996 – 2008 PostgreSQL Global Development Group