Re: GIN improvements part2: fast scan

From: Rod Taylor <rbt(at)simple-knowledge(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2013-11-14 23:25:37
Message-ID: CAKddOFBAp39whKbbYLvyK8sYOFXO_gkGGeFtD0rUVu=6pY18GQ@mail.gmail.com
Lists: pgsql-hackers

I checked out master and put together a test case using a small percentage
of production data for a known problem we have with Pg 9.2 and text search
scans.

A small percentage in this case means 10 million randomly selected records;
the full data set has a few billion records.

The tests ran successfully on master, and I recorded the timings.

I then applied the patch included here to master, along with
gin-packed-postinglists-14.patch.
Ran make clean; ./configure; make; make install.
make check passed (all 141 tests).

initdb, import dump
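
For anyone trying to reproduce this, the sequence above corresponds roughly
to the script below. It is a sketch under assumptions, not the exact commands
that were run: the fast-scan patch file name, the install prefix, the data
directory and the dump file name are placeholders; the patches are assumed to
apply with -p1; the dump is assumed to be a plain SQL file; and applying the
packed posting lists patch first is an assumption based on the fast-scan patch
being rebased against it. The database name rbt is the one shown in the
backtrace. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the reproduction steps. Prints the commands by default
# (DRY_RUN=1); set DRY_RUN=0 to execute them from the top of the source tree.

# Placeholders: these names are assumptions, not from the report.
FASTSCAN_PATCH=${FASTSCAN_PATCH:-gin-fast-scan.patch}  # hypothetical file name
PREFIX=${PREFIX:-$HOME/pg-gin-test}                    # hypothetical install prefix
PGDATA=${PGDATA:-$PREFIX/data}                         # hypothetical data directory
DUMP=${DUMP:-sample-dump.sql}                          # hypothetical plain SQL dump

# Names taken from the report.
PACKED_PATCH=gin-packed-postinglists-14.patch
DB=rbt                                                 # dbname shown in the backtrace

# Print a command in dry-run mode, otherwise run it and stop on failure.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@" || exit 1
    fi
}

reproduce() {
    run git checkout master
    # Assumed order: packed posting lists first, then the fast-scan patch.
    run patch -p1 -i "$PACKED_PATCH"
    run patch -p1 -i "$FASTSCAN_PATCH"
    run make clean
    run ./configure --prefix="$PREFIX"
    run make
    run make install
    run make check
    run "$PREFIX/bin/initdb" -D "$PGDATA"
    run "$PREFIX/bin/pg_ctl" -D "$PGDATA" -l "$PGDATA/server.log" -w start
    run "$PREFIX/bin/createdb" "$DB"
    # The dump is assumed to contain the CREATE INDEX statement that crashes.
    run "$PREFIX/bin/psql" -d "$DB" -f "$DUMP"
}

reproduce
```

Running it with DRY_RUN=0 from the top of the source tree executes the steps
instead of printing them.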

The GIN index fails to build with a segfault.

DETAIL: Failed process was running: CREATE INDEX textsearch_gin_idx ON kp
USING gin (to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT
NULL);

#0 XLogCheckBuffer (holdsExclusiveLock=1 '\001', lsn=lsn@entry=0x7fffcf341920,
bkpb=bkpb@entry=0x7fffcf341960, rdata=0x468f11 <ginFindLeafPage+529>,
rdata=0x468f11 <ginFindLeafPage+529>) at xlog.c:2339
#1 0x00000000004b9ddd in XLogInsert (rmid=rmid@entry=13 '\r',
info=info@entry=16 '\020', rdata=rdata@entry=0x7fffcf341bf0) at xlog.c:936
#2 0x0000000000468a9e in createPostingTree (index=0x7fa4e8d31030,
items=items@entry=0xfb55680, nitems=nitems@entry=762,
buildStats=buildStats@entry=0x7fffcf343dd0) at gindatapage.c:1324
#3 0x00000000004630c0 in buildFreshLeafTuple (buildStats=0x7fffcf343dd0,
nitem=762, items=0xfb55680, category=<optimized out>, key=34078256,
attnum=<optimized out>, ginstate=0x7fffcf341df0) at gininsert.c:281
#4 ginEntryInsert (ginstate=ginstate@entry=0x7fffcf341df0,
attnum=<optimized out>, key=34078256, category=<optimized out>,
items=0xfb55680, nitem=762,
buildStats=buildStats@entry=0x7fffcf343dd0) at gininsert.c:351
#5 0x00000000004635b0 in ginbuild (fcinfo=<optimized out>) at
gininsert.c:531
#6 0x0000000000718637 in OidFunctionCall3Coll
(functionId=functionId@entry=2738,
collation=collation@entry=0, arg1=arg1@entry=140346257507968,
arg2=arg2@entry=140346257510448, arg3=arg3@entry=32826432) at
fmgr.c:1649
#7 0x00000000004ce1da in index_build
(heapRelation=heapRelation@entry=0x7fa4e8d30680,
indexRelation=indexRelation@entry=0x7fa4e8d31030,
indexInfo=indexInfo@entry=0x1f4e440, isprimary=isprimary@entry=0
'\000', isreindex=isreindex@entry=0 '\000') at index.c:1963
#8 0x00000000004ceeaa in index_create
(heapRelation=heapRelation@entry=0x7fa4e8d30680,
indexRelationName=indexRelationName@entry=0x1f4e660
"textsearch_gin_knn_idx", indexRelationId=16395, indexRelationId@entry=0,
relFileNode=<optimized out>, indexInfo=indexInfo@entry=0x1f4e440,
indexColNames=indexColNames@entry=0x1f4f728,
accessMethodObjectId=accessMethodObjectId@entry=2742,
tableSpaceId=tableSpaceId@entry=0,
collationObjectId=collationObjectId@entry=0x1f4fcc8,
classObjectId=classObjectId@entry=0x1f4fce0,
coloptions=coloptions@entry=0x1f4fcf8,
reloptions=reloptions@entry=0, isprimary=0 '\000',
isconstraint=0 '\000', deferrable=0 '\000', initdeferred=0 '\000',
allow_system_table_mods=0 '\000', skip_build=0 '\000', concurrent=0 '\000',
is_internal=0 '\000') at index.c:1082
#9 0x0000000000546a78 in DefineIndex (stmt=<optimized out>,
indexRelationId=indexRelationId@entry=0, is_alter_table=is_alter_table@entry=0
'\000',
check_rights=check_rights@entry=1 '\001', skip_build=skip_build@entry=0
'\000', quiet=quiet@entry=0 '\000') at indexcmds.c:594
#10 0x000000000065147e in ProcessUtilitySlow
(parsetree=parsetree@entry=0x1f7fb68,
queryString=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin
(to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);",
context=<optimized out>, params=params@entry=0x0,
completionTag=completionTag@entry=0x7fffcf344c10 "", dest=<optimized out>)
at utility.c:1163
#11 0x000000000065079e in standard_ProcessUtility (parsetree=0x1f7fb68,
queryString=<optimized out>, context=<optimized out>, params=0x0,
dest=<optimized out>, completionTag=0x7fffcf344c10 "") at utility.c:873
#12 0x000000000064de61 in PortalRunUtility (portal=portal@entry=0x1f4c350,
utilityStmt=utilityStmt@entry=0x1f7fb68, isTopLevel=isTopLevel@entry=1
'\001',
dest=dest@entry=0x1f7ff08, completionTag=completionTag@entry=0x7fffcf344c10
"") at pquery.c:1187
#13 0x000000000064e9e5 in PortalRunMulti (portal=portal@entry=0x1f4c350,
isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x1f7ff08,
altdest=altdest@entry=0x1f7ff08,
completionTag=completionTag@entry=0x7fffcf344c10
"") at pquery.c:1318
#14 0x000000000064f459 in PortalRun (portal=portal@entry=0x1f4c350,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1
'\001',
dest=dest@entry=0x1f7ff08, altdest=altdest@entry=0x1f7ff08,
completionTag=completionTag@entry=0x7fffcf344c10 "") at pquery.c:816
#15 0x000000000064d2d5 in exec_simple_query (
query_string=0x1f7eb10 "CREATE INDEX textsearch_gin_idx ON kp USING gin
(to_tsvector('simple'::regconfig, string)) WHERE (score1 IS NOT NULL);") at
postgres.c:1048
#16 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1f2ad40,
dbname=0x1f2abf8 "rbt", username=<optimized out>) at postgres.c:3992
#17 0x000000000045b1b4 in BackendRun (port=0x1f47280) at postmaster.c:4085
#18 BackendStartup (port=0x1f47280) at postmaster.c:3774
#19 ServerLoop () at postmaster.c:1585
#20 0x000000000060d031 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1f28b20)
at postmaster.c:1240
#21 0x000000000045bb25 in main (argc=3, argv=0x1f28b20) at main.c:196

On Thu, Nov 14, 2013 at 12:26 PM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:

> On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
> hlinnakangas(at)vmware(dot)com> wrote:
>
>> On 28.06.2013 22:31, Alexander Korotkov wrote:
>>
>>> Now, I got the point of three state consistent: we can keep only one
>>> consistent in opclasses that support new interface. exact true and exact
>>> false values will be passed in the case of current patch consistent;
>>> exact
>>> false and unknown will be passed in the case of current patch
>>> preConsistent. That's reasonable.
>>>
>>
>> I'm going to mark this as "returned with feedback". For the next version,
>> I'd like to see the API changed per above. Also, I'd like us to do
>> something about the tidbitmap overhead, as a separate patch before this, so
>> that we can assess the actual benefit of this patch. And a new test case
>> that demonstrates the I/O benefits.
>
>
> Revised version of patch is attached.
> Changes are so:
> 1) Patch rebased against packed posting lists, not depends on additional
> information now.
> 2) New API with tri-state logic is introduced.
>
> ------
> With best regards,
> Alexander Korotkov.
