Quick Links

GiST concurrency commited

Lists:	pgsql-hackers

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	GiST concurrency commited
Date:	2005-06-27 12:49:00
Message-ID:	42BFF5BC.70006@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Have we list named something like 'test focusing for 8.1'? If it exists then
GiST concurrency and recovery testing should be added to it. Especially,
recovery after crash. Of course, now Oleg and me going to begin a large test
program.

While I'm running test with concurrent select/insert/update/delete/vacuum/vacuum
full I found, that sometimes postgres crashes in index_beginscan_internal on
FunctionCall3, because structure 'procedure' becomes zeroed. As I understand,
LockRelation can invalidate part of Relation structure. So, I moved
GET_REL_PROCEDURE after LockRelation. It seems to me, this patch should be
backpatched or it's needed another fixing. This problem was 2-4 times per
million statements executing by 4 flows.

And there is one more problem: it caused approximatly one time per 2-4 million
statements, I got traps:
TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
"vacuum.c", Line: 2766)
LOG: server process (PID 15847) was terminated by signal 6
Sorry, but I couldn't debug this trap and my knowledge about this piece of code
is very limited. Postgres didn't create a core file. I don't believe this
problem is in touch with my GiST framework, becouse it is about heap pages. I
suspect trap occurs while concurrent vacuum, but I am not sure.

PS
My concurrency testing scripts:
http://www.sigaev.ru/gist/
concur.pl - generator of SQL statements
concur.sh - simple wrapper about concur.pl which reinit db, makes db and table.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST concurrency commited
Date:	2005-06-27 14:38:07
Message-ID:	6946.1119883087@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> While I'm running test with concurrent
> select/insert/update/delete/vacuum/vacuum full I found, that sometimes
> postgres crashes in index_beginscan_internal on FunctionCall3, because
> structure 'procedure' becomes zeroed. As I understand, LockRelation
> can invalidate part of Relation structure. So, I moved
> GET_REL_PROCEDURE after LockRelation.

Oooh, good catch.

> It seems to me, this patch
> should be backpatched or it's needed another fixing.

No, it's not an issue in the back branches, because until recently
GET_REL_PROCEDURE only fetched the function OID.

> And there is one more problem: it caused approximatly one time per 2-4 million
> statements, I got traps:
> TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
> "vacuum.c", Line: 2766)
> LOG: server process (PID 15847) was terminated by signal 6

Odd. Will look at it later (after feature freeze), if you don't find
the cause beforehand.

regards, tom lane

From:	Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST concurrency commited
Date:	2005-06-28 01:37:49
Message-ID:	42C0A9ED.30600@familyhealth.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I think the whole GiST limitations page can be removed now...

http://developer.postgresql.org/docs/postgres/limitations.html

Chris

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST concurrency commited
Date:	2005-06-29 07:30:02
Message-ID:	42C24DFA.80107@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>>And there is one more problem: it caused approximatly one time per 2-4 million
>>statements, I got traps:
>>TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
>>"vacuum.c", Line: 2766)
>>LOG: server process (PID 15847) was terminated by signal 6
>
>
> Odd. Will look at it later (after feature freeze), if you don't find
> the cause beforehand.

It's definitly bug in a vaccum code, I got the same trap without any GiST
indexes (to reproduce, just comment out 'create index' command in my script).

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: GiST concurrency commited
Date:	2005-06-29 10:09:34
Message-ID:	d9ts50$e6j$1@news.hub.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Teodor Sigaev" <teodor(at)sigaev(dot)ru> writes
>
> concur.pl - generator of SQL statements

retrieving it is forbidden ...

Regards,
Qingqing

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: GiST concurrency commited
Date:	2005-06-29 10:43:11
Message-ID:	42C27B3F.4040203@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Sorry, fixed.

Qingqing Zhou wrote:
> "Teodor Sigaev" <teodor(at)sigaev(dot)ru> writes
>
>>concur.pl - generator of SQL statements
>
>
> retrieving it is forbidden ...
>
> Regards,
> Qingqing
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: GiST concurrency commited
Date:	2005-07-02 20:08:34
Message-ID:	200507022008.j62K8YU06620@candle.pha.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Christopher Kings-Lynne wrote:
> I think the whole GiST limitations page can be removed now...
>
> http://developer.postgresql.org/docs/postgres/limitations.html

Done.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-20 06:20:09
Message-ID:	20570.1124518809@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Awhile back, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
> And there is one more problem: it caused approximatly one time per 2-4 million
> statements, I got traps:
> TRAP: FailedAssertion("!((*curpage)->offsets_used == num_tuples)", File:
> "vacuum.c", Line: 2766)
> LOG: server process (PID 15847) was terminated by signal 6
> Sorry, but I couldn't debug this trap and my knowledge about this piece of code
> is very limited. Postgres didn't create a core file. I don't believe this
> problem is in touch with my GiST framework, becouse it is about heap pages. I
> suspect trap occurs while concurrent vacuum, but I am not sure.

> PS
> My concurrency testing scripts:
> http://www.sigaev.ru/gist/
> concur.pl - generator of SQL statements
> concur.sh - simple wrapper about concur.pl which reinit db, makes db and table.

I have committed changes that I believe fix this problem:
http://archives.postgresql.org/pgsql-committers/2005-08/msg00213.php
But it needs more testing. Would you update to CVS tip and see if you
still see the failure?

Also, if anyone else has some vacuum + concurrent update test cases,
any testing you can do in CVS tip would be useful. This patch is big
and ugly enough that back-patching it into all the supported back
branches is a pretty scary prospect. I don't think we have a lot of
choice --- it is a data-loss risk --- but we need to beat the heck
out of the CVS-tip version before we start pushing it into the release
branches.

My current intention is to leave it just in CVS tip for the next few
days, and not to start developing back-branch versions until after
we've made the first 8.1 beta release. The back-ports are going to
be painful (the code involved has changed often enough that I fear
each branch will need a custom tailored patch) ... so I really don't
want to start without some confidence that the CVS-tip patch is right.

In other words ... if you can test this ... HELP!!!

regards, tom lane

From:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-20 07:54:05
Message-ID:	Pine.LNX.4.58.0508201749060.3361@linuxworld.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 20 Aug 2005, Tom Lane wrote:

> I have committed changes that I believe fix this problem:
> http://archives.postgresql.org/pgsql-committers/2005-08/msg00213.php
> But it needs more testing. Would you update to CVS tip and see if you
> still see the failure?

I've written some quick scripts. One just vacuums constantly (999 vacuums
to 1 vacuum full) while three other scripts three randomly insert
into, update and delete from 3 tables. There's a mix of small and large
transactions. The tables have a single int column. It is set up to run 3
million transactions across the 3 scripts.

I will try and jump onto one of the larger OSDL machines to test as well.

Gavin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-20 14:25:58
Message-ID:	22915.1124547958@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> I've written some quick scripts. One just vacuums constantly (999 vacuums
> to 1 vacuum full) while three other scripts three randomly insert
> into, update and delete from 3 tables. There's a mix of small and large
> transactions. The tables have a single int column. It is set up to run 3
> million transactions across the 3 scripts.

Note that since the issues have mainly to do with update chains, it'd be
good to stress cases where a row is updated multiple times before being
deleted. And use at least one long-running transaction, so that VACUUM
can't just throw away the update chain.

regards, tom lane

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-22 10:32:25
Message-ID:	4309A9B9.9040503@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> In other words ... if you can test this ... HELP!!!
I'll run tests.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-23 06:06:54
Message-ID:	Pine.LNX.4.58.0508231604160.30102@linuxworld.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, 20 Aug 2005, Tom Lane wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > I've written some quick scripts. One just vacuums constantly (999 vacuums
> > to 1 vacuum full) while three other scripts three randomly insert
> > into, update and delete from 3 tables. There's a mix of small and large
> > transactions. The tables have a single int column. It is set up to run 3
> > million transactions across the 3 scripts.
>
> Note that since the issues have mainly to do with update chains, it'd be
> good to stress cases where a row is updated multiple times before being
> deleted. And use at least one long-running transaction, so that VACUUM
> can't just throw away the update chain.

Right.

I modified the test so have multiple updates of a given row mixed with
concurrent long running read transactions. Vacuum was running repeatedly
in a concurrent session. I did not encounter any problems.

However, the results are inconclusive since I ran the same test against
HEAD from 10 days ago and didn't manage to trigger the problem Teodor's
script did. I'll take a better look tomorrow.

Gavin

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-23 10:40:21
Message-ID:	430AFD15.10400@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

It seems, patch works correctly. My tests is passed with approximatly 1e8 SQL
statements without any failure. Tests runs on two boxes: PIII/1133MHz adn Quad
Xeon/500MHz and works with four threads.

Further I'm going to increase concurrency up to 12 parallel threads.

PS GiST passed this tests too.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-24 11:52:27
Message-ID:	430C5F7B.3090305@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Further I'm going to increase concurrency up to 12 parallel threads.

All is ok, test is passed with approximatly 40 millions statements

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Hannu Krosing <hannu(at)skype(dot)net>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-24 15:41:20
Message-ID:	1124898080.4827.13.camel@fuji.krosing.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On K, 2005-08-24 at 15:52 +0400, Teodor Sigaev wrote:
> > Further I'm going to increase concurrency up to 12 parallel threads.
>
> All is ok, test is passed with approximatly 40 millions statements
>

I have sent a patch to patches list enabling concurrent vacuums to
actually reclaim space while another long vacuum is running, i.e.
vacuums won't no longer block each other from removing deleted tuples.

see:

http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php

Could you perhaps test this patch as well, while you already have a
setup for testing parallel vacuums under big loads ?

Or perhaps you can share the setup/scripts/data so that I could run your
test myself as well ?

--
Hannu Krosing <hannu(at)skype(dot)net>

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-25 09:16:04
Message-ID:	430D8C54.1000509@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php
>
> Could you perhaps test this patch as well, while you already have a
> setup for testing parallel vacuums under big loads ?

Ok, I'll do it.

>
> Or perhaps you can share the setup/scripts/data so that I could run your
> test myself as well ?
>
Statements generator:
http://www.sigaev.ru/gist/concur.pl
Usage:
./concur.pl -d DATABASE [-n N] [-c N] [-l]
-d DATABASE - DATABASE name
-n N - number of rows
-c N - number of flow
-l - linear mode

Script guarantees the same result (ie the same data in table) with the same -n
and -c options, result doesn't depend on -l option. Also script guarantees
absence of dead locks.

% perl concur.pl -d qq -n 100000 -c 4
Start: parallel mode with 4 flows
3 flow finish. Stats: ni:25000 nu:555 nd:81 nv:4(nf:1) nt:232
0 flow finish. Stats: ni:25000 nu:554 nd:77 nv:2(nf:0) nt:247
1 flow finish. Stats: ni:25000 nu:548 nd:65 nv:4(nf:1) nt:249
2 flow finish. Stats: ni:25000 nu:552 nd:79 nv:7(nf:2) nt:263
All flow finish; status: 0; elapsed time: 159.25 sec

Script prints some statistics per flow:
ni - number of insert's statements (sum of this should be equal to -n N)
nu - -/- update's statements
nd - -/- delete's statements
nv - -/- vacuum (including full vacuum)
nf - number of full vacuum
nt - number of transactions

Simple wrapper for manipulating test table and loop testing (you should edit it
to right paths):
http://www.sigaev.ru/gist/concur.sh

Those scripts was wrote to test GiST concurrency.

I suspect it's needed to make some changes in generator to increase number of
updates and vacuums for your goal.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-25 21:36:13
Message-ID:	23220.1125005773@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> http://www.sigaev.ru/gist/concur.pl
> http://www.sigaev.ru/gist/concur.sh

BTW, these scripts seem to indicate that there's a GIST or
contrib/intarray problem in the 8.0 branch. I was trying to use 'em
to test REL8_0_STABLE branch tip to verify my t_ctid chain backpatch,
and I pretty consistently see "Problem with update":

Start: parallel mode with 4 flows
Problem with update {77,77}:0 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Problem with update {43,24}:3 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
Problem with update {43,43}:2 count:1 at concur.pl line 91.
Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
1 flow finish. Stats: ni:75000 nu:1661 nd:216 nv:13(nf:3) nt:780
All flow finish; status: 255; elapsed time: 265.48 sec

Is this something that can be fixed for 8.0.4?

regards, tom lane

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Hannu Krosing <hannu(at)skype(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-26 12:47:51
Message-ID:	430F0F77.3000808@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php
>
> Could you perhaps test this patch as well, while you already have a
> setup for testing parallel vacuums under big loads ?

I didn't find any problem with your patch during testing with 1e8 statements...

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Hannu Krosing <hannu(at)skype(dot)net>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-26 15:07:43
Message-ID:	1125068863.4823.25.camel@fuji.krosing.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On R, 2005-08-26 at 16:47 +0400, Teodor Sigaev wrote:
> > http://archives.postgresql.org/pgsql-patches/2005-08/msg00304.php
> >
> > Could you perhaps test this patch as well, while you already have a
> > setup for testing parallel vacuums under big loads ?
>
> I didn't find any problem with your patch during testing with 1e8 statements...

Thank You Very Much !

--
Hannu Krosing <hannu(at)skype(dot)net>

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-26 17:49:21
Message-ID:	430F5621.50904@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Finded problem in GiST isn't too simple to resolve. I'm working on it. The
problem is about update query...

Tom Lane wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>
>>http://www.sigaev.ru/gist/concur.pl
>>http://www.sigaev.ru/gist/concur.sh
>
>
> BTW, these scripts seem to indicate that there's a GIST or
> contrib/intarray problem in the 8.0 branch. I was trying to use 'em
> to test REL8_0_STABLE branch tip to verify my t_ctid chain backpatch,
> and I pretty consistently see "Problem with update":
>
> Start: parallel mode with 4 flows
> Problem with update {77,77}:0 count:1 at concur.pl line 91.
> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> Problem with update {43,24}:3 count:1 at concur.pl line 91.
> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> Problem with update {43,43}:2 count:1 at concur.pl line 91.
> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> 1 flow finish. Stats: ni:75000 nu:1661 nd:216 nv:13(nf:3) nt:780
> All flow finish; status: 255; elapsed time: 265.48 sec
>
> Is this something that can be fixed for 8.0.4?
>
> regards, tom lane

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-30 09:25:52
Message-ID:	43142620.2010405@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fixed in 8.0, 7.4 and 7.3 branches.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Mario Weilguni <mweilguni(at)sime(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-30 09:58:37
Message-ID:	200508301158.37593.mweilguni@sime.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Am Dienstag, 30. August 2005 11:25 schrieb Teodor Sigaev:
> Fixed in 8.0, 7.4 and 7.3 branches.
>
> Tom Lane wrote:
> > Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> >>http://www.sigaev.ru/gist/concur.pl
> >>http://www.sigaev.ru/gist/concur.sh
> >
> > BTW, these scripts seem to indicate that there's a GIST or
> > contrib/intarray problem in the 8.0 branch. I was trying to use 'em
> > to test REL8_0_STABLE branch tip to verify my t_ctid chain backpatch,
> > and I pretty consistently see "Problem with update":
> >
> > Start: parallel mode with 4 flows
> > Problem with update {77,77}:0 count:1 at concur.pl line 91.
> > Issuing rollback() for database handle being DESTROY'd without explicit
> > disconnect(). Problem with update {43,24}:3 count:1 at concur.pl line 91.
> > Issuing rollback() for database handle being DESTROY'd without explicit
> > disconnect(). Problem with update {43,43}:2 count:1 at concur.pl line 91.
> > Issuing rollback() for database handle being DESTROY'd without explicit
> > disconnect(). 1 flow finish. Stats: ni:75000 nu:1661 nd:216 nv:13(nf:3)
> > nt:780 All flow finish; status: 255; elapsed time: 265.48 sec
> >
> > Is this something that can be fixed for 8.0.4?
> >
> > regards, tom lane

Since 7.4 we have troubles with ltree (seldom corruption of buffer cache, not
on-disk), might this bug be somehow related to the ltree problem?
7.2 was rock-stable with ltree.

Best regards,
Mario Weilguni

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Mario Weilguni <mweilguni(at)sime(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-30 10:19:38
Message-ID:	431432BA.9030207@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Since 7.4 we have troubles with ltree (seldom corruption of buffer cache, not
> on-disk), might this bug be somehow related to the ltree problem?
> 7.2 was rock-stable with ltree.

Not sure. Fixed bug was (@ - contains operation):

update wow set a = a || '{101}'::int[] where a @ '{1,2,3}';
select a from wow where a @ '{1,2,3}' and not a @ '{101}';

After update query select must not find any rows, but it did. The problem was in
GiST code and so any GiST idexes was affected.

Can you say more about your trouble?

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Mario Weilguni <mweilguni(at)sime(dot)com>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-30 10:51:44
Message-ID:	200508301251.45018.mweilguni@sime.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Am Dienstag, 30. August 2005 12:19 schrieb Teodor Sigaev:
> > Since 7.4 we have troubles with ltree (seldom corruption of buffer cache,
> > not on-disk), might this bug be somehow related to the ltree problem? 7.2
> > was rock-stable with ltree.
>
> Not sure. Fixed bug was (@ - contains operation):
>
> update wow set a = a || '{101}'::int[] where a @ '{1,2,3}';
> select a from wow where a @ '{1,2,3}' and not a @ '{101}';
>
> After update query select must not find any rows, but it did. The problem
> was in GiST code and so any GiST idexes was affected.
>
> Can you say more about your trouble?

We have queries that use ltree for sorting too, the sort looks like this:
order by subpath(ltreefield, 0, nlevel(ltreefield) - 1)

But concurrency leads to a bug, that results in an sql-error:
ERROR: invalid positions

Now we use locking to prevent concurrenct access on the most-used concurrent
part of the program, and the problem is extremly rare now, I had only 4
occurences in one year, but still happens (there are other access paths that
do not use locking, but they are rareley accessed).

It seems the ltree length parameter is set to 0 in the tuples, the content
itself is still there: Example:
Say the tuple was before treefield='1.2.3.4.5'
After the occurence of the error, I get: treefield='' (empty, but not null)

Using a tool Tom Lane told me to use, I checked it, and on-disk I had still
"1.2.3.4.5", but the length parameter of the ltree column was 0 (sorry, I was
wrong in my first mail, on-disk was broken too.)

Might this be somehow related to the intarray bugs?

Best regards,
Mario Weilguni

p.s.: I tried hard to create a self-contained test for tracking this down, but
failed.

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Mario Weilguni <mweilguni(at)sime(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-30 11:32:27
Message-ID:	431443CB.8090002@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> We have queries that use ltree for sorting too, the sort looks like this:
> order by subpath(ltreefield, 0, nlevel(ltreefield) - 1)
>
> But concurrency leads to a bug, that results in an sql-error:
> ERROR: invalid positions

contrib_regression=# select nlevel(t), subpath(t, 0, nlevel(t)-1) from
text2ltree('1.2.3.4.5') as t;
nlevel | subpath
--------+---------
5 | 1.2.3.4
(1 row)

contrib_regression=# select nlevel(t), subpath(t, 0, nlevel(t)-1) from
text2ltree('1') as t;
nlevel | subpath
--------+---------
1 |
(1 row)

contrib_regression=# select nlevel(t), subpath(t, 0, nlevel(t)-1) from
text2ltree('') as t;
ERROR: invalid positions
contrib_regression=#

It's incorrect arguments for subpath().

>
> It seems the ltree length parameter is set to 0 in the tuples, the content
> itself is still there: Example:
> Say the tuple was before treefield='1.2.3.4.5'
> After the occurence of the error, I get: treefield='' (empty, but not null)
>
> Using a tool Tom Lane told me to use, I checked it, and on-disk I had still
> "1.2.3.4.5", but the length parameter of the ltree column was 0 (sorry, I was
> wrong in my first mail, on-disk was broken too.)

Interesting... But with some test suite or more information I'm helpless.
How often do updates/inserts of table and/or ltree column occurs? Vacuum?

>
> Might this be somehow related to the intarray bugs?

No, except case when you update your table something like to:
update tbl set ltreefield=... where ltreefield ...;

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc:	Hannu Krosing <hannu(at)skype(dot)net>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: VACUUM/t_ctid bug (was Re: GiST concurrency commited)
Date:	2005-08-30 13:54:55
Message-ID:	21232.1125410095@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> Is this something that can be fixed for 8.0.4?

> Fixed in 8.0, 7.4 and 7.3 branches.

Excellent news. Thanks.

regards, tom lane