Re: WIP: store additional info in GIN index

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: store additional info in GIN index
Date: 2012-12-05 08:10:58
Message-ID: CAPpHfdtvXmW=phyeks5OKD6+2vZxRkpH3jJtdVOer43M_xTJiA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 5, 2012 at 1:56 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 4.12.2012 20:12, Alexander Korotkov wrote:
> > Hi!
> >
> > On Sun, Dec 2, 2012 at 5:02 AM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> >
> > I've tried to apply the patch with the current HEAD, but I'm getting
> > segfaults whenever VACUUM runs (either called directly or from autovac
> > workers).
> >
> > The patch applied cleanly against 9b3ac49e and needed a minor fix when
> > applied on HEAD (because of an assert added to ginRedoCreatePTree), but
> > that shouldn't be a problem.
> >
> >
> > Thanks for testing! The patch is rebased against HEAD, and the bug you
> > reported is fixed.
>
> Applies fine, but I get a segfault in dataPlaceToPage at gindatapage.c.
> The whole backtrace is here: http://pastebin.com/YEPuWeuV
>
> The messages written into PostgreSQL log are quite variable - usually it
> looks like this:
>
> 2012-12-04 22:31:08 CET 31839 LOG: database system was not properly shut down; automatic recovery in progress
> 2012-12-04 22:31:08 CET 31839 LOG: redo starts at 0/68A76E48
> 2012-12-04 22:31:08 CET 31839 LOG: unexpected pageaddr 0/1BE64000 in log segment 000000010000000000000069, offset 15089664
> 2012-12-04 22:31:08 CET 31839 LOG: redo done at 0/69E63638
>
> but I've seen this message too
>
> 2012-12-04 22:20:29 CET 31709 LOG: database system was not properly shut down; automatic recovery in progress
> 2012-12-04 22:20:29 CET 31709 LOG: redo starts at 0/AEAFAF8
> 2012-12-04 22:20:29 CET 31709 LOG: record with zero length at 0/C7D5698
> 2012-12-04 22:20:29 CET 31709 LOG: redo done at 0/C7D55E
>
>
> I wasn't able to prepare a simple testcase to reproduce this, so I've
> attached two files from my "fun project" where I noticed it. It's a
> simple DB + a bit of Python for indexing mbox archives inside Pg.
>
> - create.sql - a database structure with a bunch of GIN indexes on
> tsvector columns on "messages" table
>
> - load.py - script for parsing mbox archives / loading them into the
> "messages" table (warning: it's a bit messy)
>
>
> Usage:
>
> 1) create the DB structure
> $ createdb archives
> $ psql archives < create.sql
>
> 2) fetch some archives (I consistently get SIGSEGV after first three)
> $ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-01.gz
> $ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-02.gz
> $ wget http://archives.postgresql.org/pgsql-hackers/mbox/pgsql-hackers.1997-03.gz
>
> 3) gunzip and load them using the python script
> $ gunzip pgsql-hackers.*.gz
> $ ./load.py --db archives pgsql-hackers.*
>
> 4) et voila - a SIGSEGV :-(
>
>
> I suspect this might be related to the fact that the load.py script uses
> savepoints quite heavily to handle UNIQUE_VIOLATION (duplicate messages).
>
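
For readers without load.py at hand, the savepoint-per-row pattern described above can be sketched roughly as follows. This is only an illustration and not the actual script: the table layout, column names and the load_messages helper are assumptions, and it uses Python's stdlib sqlite3 as a stand-in for PostgreSQL so that it runs without a server. Against Pg the same SAVEPOINT / ROLLBACK TO SAVEPOINT / RELEASE SAVEPOINT statements would be issued, with the driver's unique-violation error caught instead of sqlite3.IntegrityError.

```python
import sqlite3


def load_messages(conn, messages):
    """Insert messages in one transaction, skipping duplicates.

    Each insert is wrapped in its own savepoint so that a unique
    violation rolls back only that row, not the whole transaction.
    Returns a tuple (inserted, skipped).
    """
    inserted, skipped = 0, 0
    cur = conn.cursor()
    cur.execute("BEGIN")
    for message_id, body in messages:
        cur.execute("SAVEPOINT msg")
        try:
            cur.execute(
                "INSERT INTO messages (message_id, body) VALUES (?, ?)",
                (message_id, body),
            )
        except sqlite3.IntegrityError:
            # Duplicate message: undo just this insert and carry on.
            cur.execute("ROLLBACK TO SAVEPOINT msg")
            skipped += 1
        else:
            inserted += 1
        # The savepoint still exists after ROLLBACK TO, so release it
        # in both cases.
        cur.execute("RELEASE SAVEPOINT msg")
    cur.execute("COMMIT")
    return inserted, skipped


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT above control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE messages (message_id TEXT UNIQUE, body TEXT)")

# Hypothetical sample data; the third row duplicates the first.
result = load_messages(
    conn,
    [("a@example", "one"), ("b@example", "two"), ("a@example", "dup")],
)
```

With many duplicates in the input, this produces a long run of subtransactions inside a single transaction, which is the access pattern suspected above.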

Thanks for the bug report. It is fixed in the attached patch.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
ginaddinfo.3.patch.gz application/x-gzip 31.4 KB
