Simple tester for MVCC in PostgreSQL

Lists: pgsql-hackers
From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Simple tester for MVCC in PostgreSQL
Date: 2005-08-29 22:56:21
Message-ID: 20050829225605.GB31767@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[Please CC any replies, thanks]

Hi,

I saw the discussion about an tester for MVCC. Since I'd never done
anything with asyncronous queries before, I figured I'd try to write
something useful with it. The result is at:

http://svana.org/kleptog/pgsql/mvcctest.tar.gz

It's a tester that takes a testing script, runs the queries one at a
time through as many connections as you ask. It uses asyncronous
queries so it can handle queries that don't return immediatly (using &
and wait model ala UNIX). It doesn't use threads. It can assign
variables and handle expected errors and test that queries unblock at
the right time. Whether this is enough for serious testing I have no
idea.

I've included 5 selftests and 3 real tests (which about covers my
knowledge of transaction isolation levels). I hope it provides a useful
basis for a real tool.

It uses Perl and the Pg module from CPAN (included in many
distributions, it's basically a wrapper for libpq). For more details
see the README. A simple test harness type program is included. Just
typing "make" will set it going.

As an example I include a sample test script below (included as
tests/in/test3.test). It opens two connections and tries several
concurrent updates. In the first block, they don't conflict so
there is no delay. In the second block the second update has to wait
for the rollback, in the third it causes a serialization failure.

I will not be able to answer my email for the next few hours due to
sleep. Please be patient if it doesn't work straight away.

Have a nice day,

# This test tests simple serialisation failure
1I drop table t
1 SET default_transaction_isolation = serializable
2 SET default_transaction_isolation = serializable

1 create table t (grp text, value int4)
1 insert into t values ('a',10)
2 insert into t values ('a',20)
1 insert into t values ('b',30)
2 insert into t values ('b',40)
# These updates don't conflict, all should be fine
1 begin
2 begin
1 update t set value = value + 10 where grp = 'a'
2 update t set value = value + 10 where grp = 'b'
1 commit
2 commit
# These updates do conflict, but the second can proceed once first transaction aborts
1 begin
2 begin
1 update t set value = value + 10 where grp = 'a'
2& update t set value = value + 10 where grp = 'a'
1 abort
2 wait
2 commit
# These updates conflict, the second transaction aborts
1 begin
2 begin
1 update t set value = value + 10 where grp = 'a'
2&e update t set value = value + 10 where grp = 'a'
1 commit
2 wait
2 abort
1 drop table t

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Matt Miller <mattm(at)epx(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Simple tester for MVCC in PostgreSQL
Date: 2005-09-06 15:51:41
Message-ID: 1126021901.3076.7.camel@dbamm01-linux
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2005-08-30 at 00:56 +0200, Martijn van Oosterhout wrote:
> I saw the discussion about an tester for MVCC. Since I'd never done
> anything with asyncronous queries before, I figured I'd try to write
> something useful with it. The result is at:
>
> http://svana.org/kleptog/pgsql/mvcctest.tar.gz

I've started using it in some simple cases and it seems to be a good
tool. The feature set looks to me to be a pretty solid core on which to
build.

> It uses Perl and the Pg module from CPAN

This dependency seems easy enough to live with.


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Matt Miller <mattm(at)epx(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Simple tester for MVCC in PostgreSQL
Date: 2005-09-06 16:21:11
Message-ID: 20050906162053.GD24388@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 06, 2005 at 03:51:41PM +0000, Matt Miller wrote:
> On Tue, 2005-08-30 at 00:56 +0200, Martijn van Oosterhout wrote:
> > http://svana.org/kleptog/pgsql/mvcctest.tar.gz
>
> I've started using it in some simple cases and it seems to be a good
> tool. The feature set looks to me to be a pretty solid core on which to
> build.

Very nice. I too think the base is sufficient for quite complicated
tests. I actually wrote a script which tested all pairs of locks to
ensure they blocked exactly as the documentation said they should. And
it passed.

Thanks for the feedback,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Matt Miller <mattm(at)epx(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Simple tester for MVCC in PostgreSQL
Date: 2005-09-06 18:56:24
Message-ID: 20050906185624.GT60481@pervasive.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 06, 2005 at 06:21:11PM +0200, Martijn van Oosterhout wrote:
> On Tue, Sep 06, 2005 at 03:51:41PM +0000, Matt Miller wrote:
> > On Tue, 2005-08-30 at 00:56 +0200, Martijn van Oosterhout wrote:
> > > http://svana.org/kleptog/pgsql/mvcctest.tar.gz
> >
> > I've started using it in some simple cases and it seems to be a good
> > tool. The feature set looks to me to be a pretty solid core on which to
> > build.
>
> Very nice. I too think the base is sufficient for quite complicated
> tests. I actually wrote a script which tested all pairs of locks to
> ensure they blocked exactly as the documentation said they should. And
> it passed.

Should we add this stuff to the regression tests (probably as a seperate
option since not everyone will want to install perl)?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: huaxin zhang <uwcssa(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: count(*) optimization
Date: 2005-09-06 19:21:16
Message-ID: f2f5625105090612215aafe162@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

not sure where to put this.

I run two queries:

1. select count(*) from table where indexed_column<10;
2. select * from table where indexed_column<10;

the indexed column is not clustered at all. I saw from the trace that
both query runs
through index scans on that index and takes the same amount of buffer
hits and disk read. However, shouldn't the optimizer notice that the
first query only needs to look at the indexes
and possibly reduce the amount of buffer/disk visits?

thanks


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: huaxin zhang <uwcssa(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: count(*) optimization
Date: 2005-09-06 19:44:13
Message-ID: 36e6829205090612446b7dac6e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

huaxin... I'll save you the time... see the topic "MUCH ADO ABOUT COUNT(*)"
and "ADVANCED INDEX USAGE".

On 9/6/05, huaxin zhang <uwcssa(at)gmail(dot)com> wrote:
>
> not sure where to put this.
>
> I run two queries:
>
> 1. select count(*) from table where indexed_column<10;
> 2. select * from table where indexed_column<10;
>
> the indexed column is not clustered at all. I saw from the trace that
> both query runs
> through index scans on that index and takes the same amount of buffer
> hits and disk read. However, shouldn't the optimizer notice that the
> first query only needs to look at the indexes
> and possibly reduce the amount of buffer/disk visits?
>
> thanks
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: huaxin zhang <uwcssa(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: count(*) optimization
Date: 2005-09-06 19:48:35
Message-ID: 20050906194835.GA23609@wolff.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 06, 2005 at 15:21:16 -0400,
huaxin zhang <uwcssa(at)gmail(dot)com> wrote:
> not sure where to put this.
>
> I run two queries:
>
> 1. select count(*) from table where indexed_column<10;
> 2. select * from table where indexed_column<10;
>
> the indexed column is not clustered at all. I saw from the trace that
> both query runs
> through index scans on that index and takes the same amount of buffer
> hits and disk read. However, shouldn't the optimizer notice that the
> first query only needs to look at the indexes
> and possibly reduce the amount of buffer/disk visits?

No, because that isn't true. Whether or not a tuple is visible to the current
transaction isn't stored in indexes. If you have more questions on this, you
should look through the archives before asking them, as this topic has been
discussed numerous times.


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: count(*) optimization
Date: 2005-09-06 21:01:06
Message-ID: 60vf1dlxm5.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

uwcssa(at)gmail(dot)com (huaxin zhang) writes:
> not sure where to put this.
>
> I run two queries:
>
> 1. select count(*) from table where indexed_column<10;
> 2. select * from table where indexed_column<10;
>
> the indexed column is not clustered at all. I saw from the trace
> that both query runs through index scans on that index and takes the
> same amount of buffer hits and disk read.

> However, shouldn't the optimizer notice that the first query only
> needs to look at the indexes and possibly reduce the amount of
> buffer/disk visits?

No, it shouldn't, because that is NOT TRUE.

Indexes do not have MVCC visibility information stored in them, so
that a query cannot depend on the index to imply whether a particular
tuple is visible or not. It must read the tuple itself as well.
--
output = ("cbbrowne" "@" "acm.org")
http://www.ntlug.org/~cbbrowne/linuxdistributions.html
"I promise you a police car on every sidewalk."
-- M. Barry Mayor of Washington, DC