Quick Links

Re: SSI patch version 8

Lists:	pgsql-hackers

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<pgsql-hackers(at)postgresql(dot)org>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<john(dot)okite(at)gmail(dot)org>
Subject:	SSI patch version 8
Date:	2011-01-10 16:03:18
Message-ID:	4D2AD96602000025000391C7@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Due to popular request (Hey, David's popular, right?), I'm posting a
patch for Serializable Snapshot Isolation (SSI), although I don't
yet have everything in it that I was planning on submitting before
the CF. I will probably be submitting another version before the
deadline with the following, but there should be plenty here for
people to test and benchmark. We're done with the major refactoring
needed to address concerns raised in earlier reviews, and I don't
expect the remaining work to destabilize what's there or to have a
significant impact on performance.

(1) A README file needs to be prepared, mostly by copy/paste from
portions of the Wiki page at:

http://wiki.postgresql.org/wiki/Serializable

(2) The Concurrency Control chapter needs some major work:

http://developer.postgresql.org/pgdocs/postgres/mvcc.html

(3) Recent changes to index page split/combine logic are still
being tested.

(4) Two Phase Commit (2PC), used for distributed transactions,
needs work to move some of the SSI logic executed during PREPARE
TRANSACTION to COMMIT PREPARED.

(5) Index AMs other than btree need to be modified to support
fine-grained predicate locking. Currently GiST, GIN, and hash
indexes work correctly (in the sense of not allowing anomalies) by
acquiring SIRead locks at the index relation level. False positive
rw-conflicts, which could ultimately lead to transaction rollbacks,
will be more frequent when using these index access methods until
this work is done.

-Kevin

Attachment	Content-Type	Size
ssi-8.patch.gz	application/octet-stream	61.0 KB

From:	David Fetter <david(at)fetter(dot)org>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	pgsql-hackers(at)postgresql(dot)org, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, john(dot)okite(at)gmail(dot)org
Subject:	Re: SSI patch version 8
Date:	2011-01-10 17:48:45
Message-ID:	20110110174844.GB25656@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 10, 2011 at 10:03:18AM -0600, Kevin Grittner wrote:
> Due to popular request (Hey, David's popular, right?),

Well, I'm a person, and "popular" originally refers to something like
that ;)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-11 12:00:49
Message-ID:	4D2C4671.9070906@thl.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/10/2011 06:03 PM, Kevin Grittner wrote:
> Due to popular request (Hey, David's popular, right?), I'm posting a
> patch for Serializable Snapshot Isolation (SSI), although I don't
> yet have everything in it that I was planning on submitting before
> the CF. I will probably be submitting another version before the
> deadline with the following, but there should be plenty here for
> people to test and benchmark. We're done with the major refactoring
> needed to address concerns raised in earlier reviews, and I don't
> expect the remaining work to destabilize what's there or to have a
> significant impact on performance.
I think I found a problem. This is using SSI v8. The table definition:

create table test_t (id integer, val1 text, val2 integer);

create index test_idx on test_t(id) where val2 = 1;

The data:

insert into test_t (select generate_series(0, 10000), 'a', 2);
insert into test_t (select generate_series(0, 10), 'a', 1);

The transactions:
T1:
hot2=> begin transaction isolation level serializable;
BEGIN
hot2=> select * from test_t where val2 = 1;
id | val1 | val2
----+------+------
0 | a | 1
1 | a | 1
2 | a | 1
3 | a | 1
4 | a | 1
5 | a | 1
6 | a | 1
7 | a | 1
8 | a | 1
9 | a | 1
10 | a | 1
(11 rows)

hot2=> update test_t set val2 = 2 where val2 = 1 and id = 10;
UPDATE 1
-- The concurrent transaction:
T2:
hot2=> begin transaction isolation level serializable;
BEGIN
hot2=> select * from test_t where val2 = 1;
id | val1 | val2
----+------+------
0 | a | 1
1 | a | 1
2 | a | 1
3 | a | 1
4 | a | 1
5 | a | 1
6 | a | 1
7 | a | 1
8 | a | 1
9 | a | 1
10 | a | 1
(11 rows)

hot2=> update test_t set val2 = 2 where val2 = 1 and id = 9;
UPDATE 1
hot2=> commit;
COMMIT
-- Now, t1 can commit, too. Even though there is a serialization anomaly
T1:
hot2=> commit;
COMMIT

If the test_idx is changed:
(outside any transaction...)
hot2=> drop index test_idx;
DROP INDEX
hot2=> create index test_idx on test_t(id, val2);
CREATE INDEX

T1:
hot2=> begin transaction isolation level serializable;
BEGIN
hot2=> select * from test_t where val2 = 1;
id | val1 | val2
----+------+------
0 | a | 1
1 | a | 1
2 | a | 1
3 | a | 1
4 | a | 1
5 | a | 1
6 | a | 1
7 | a | 1
8 | a | 1
(9 rows)

hot2=> update test_t set val2 = 2 where val2 = 1 and id = 8;
UPDATE 1

T2:
hot2=> select * from test_t where val2 = 1;
id | val1 | val2
----+------+------
0 | a | 1
1 | a | 1
2 | a | 1
3 | a | 1
4 | a | 1
5 | a | 1
6 | a | 1
7 | a | 1
8 | a | 1
(9 rows)

hot2=> update test_t set val2 = 2 where val2 = 1 and id = 7;
UPDATE 1
hot2=> commit;
ERROR: could not serialize access due to read/write dependencies among
transactions
HINT: The transaction might succeed if retried.
T1:
hot2=> commit;
COMMIT

So, something seems to be broken when using partial indexes.

- Anssi

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-11 14:28:18
Message-ID:	4D2C14A202000025000392BA@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:

> A speed test showing a significant drop in performance when using
SSI:
>
> hot2=> create table test_t2 as (select generate_series(0, 1000000));
> hot2=> \timing
> begin transaction isolation level repeatable read;
> Time: 0.185 ms
> hot2=> select count(*) from test_t2;
> Time: 134.521 ms
> hot2=> select count(*) from test_t2;
> Time: 142.132 ms
> hot2=> select count(*) from test_t2;
> Time: 147.469 ms
> hot2=> commit;
> hot2=> begin transaction isolation level serializable;
> Time: 0.165 ms
> hot2=> select count(*) from test_t2;
> Time: 349.219 ms
> hot2=> select count(*) from test_t2;
> Time: 354.010 ms
> hot2=> select count(*) from test_t2;
> Time: 355.960 ms
> hot2=> commit;
>
> So, count(*) queries are more than twice as slow compared to the old

> serializable transaction isolation level.

Thanks for the report. I'll profile tat and see what can be done about
it.

-Kevin

From:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-11 14:29:21
Message-ID:	4D2C6941.70104@thl.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/10/2011 06:03 PM, Kevin Grittner wrote:
> Due to popular request (Hey, David's popular, right?), I'm posting a
> patch for Serializable Snapshot Isolation (SSI), although I don't
> yet have everything in it that I was planning on submitting before
> the CF. I will probably be submitting another version before the
> deadline with the following, but there should be plenty here for
> people to test and benchmark. We're done with the major refactoring
> needed to address concerns raised in earlier reviews, and I don't
> expect the remaining work to destabilize what's there or to have a
> significant impact on performance.
A speed test showing a significant drop in performance when using SSI:

hot2=> create table test_t2 as (select generate_series(0, 1000000));
hot2=> \timing
begin transaction isolation level repeatable read;
Time: 0.185 ms
hot2=> select count(*) from test_t2;
Time: 134.521 ms
hot2=> select count(*) from test_t2;
Time: 142.132 ms
hot2=> select count(*) from test_t2;
Time: 147.469 ms
hot2=> commit;
hot2=> begin transaction isolation level serializable;
Time: 0.165 ms
hot2=> select count(*) from test_t2;
Time: 349.219 ms
hot2=> select count(*) from test_t2;
Time: 354.010 ms
hot2=> select count(*) from test_t2;
Time: 355.960 ms
hot2=> commit;

So, count(*) queries are more than twice as slow compared to the old
serializable transaction isolation level. The relative speed difference
stays the same if testing with more rows. Also, the same speed
difference is there if testing with more columns:

create table test_t4 as (select g g1, g g2, g g3, g g4, g g5, g g6 from
(select generate_series as g from generate_series(0, 1000000)) as t1);

repeatable read times:
140.113 ms 134.628 ms 140.113 ms 156.166 ms

serializable:
353.257 ms 366.558 ms 373.914 ms 359.682 ms

- Anssi

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-11 14:53:31
Message-ID:	4D2C1A8B02000025000392C2@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:

> I think I found a problem. This is using SSI v8.

Thanks much for testing. You're managing to exercise some code
paths I didn't think to test, which is great! I guess this is the
up side of having posted yesterday. :-)

> So, something seems to be broken when using partial indexes.

Yes there is. The predicate locks against the heap tuples should
have prevented that, but obviously we're missing a call to
PredicateLockTuple from the code path for the partial index which is
present on the code path for complete indexes. I'll go looking for
the spot to add that line of code.

-Kevin

From:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-11 15:56:20
Message-ID:	4D2C7DA4.6060900@thl.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/11/2011 04:53 PM, Kevin Grittner wrote:
> Thanks much for testing. You're managing to exercise some code
> paths I didn't think to test, which is great! I guess this is the
> up side of having posted yesterday. :-)

Glad that I can help. This feature is something that is very important
to our usage of PostgreSQL.

Just to add some positive feedback: I tried this patch with our in-house
in development EAV database. Everything was working perfectly, and time
to import current and history data for 7000 employees (total of 95000
attributes) to the DB using stored procedures went from 4 minutes 20
seconds to 4 minutes 30 seconds (the procedures are doing a lot of
validation...). So in this case the real-world performance difference
was barely noticeable. SSI was also able to block serialization
anomalies and I did not have any false rollbacks in my (very limited)
testing. So, thank you for the work on this feature. And, of course, I
will try to break it some more.:-)

- Anssi

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<pgsql-hackers(at)postgresql(dot)org>, Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>
Subject:	SSI patch version 10
Date:	2011-01-11 23:27:40
Message-ID:	4D2C930C020000250003930B@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:

> something seems to be broken when using partial indexes.

Boy do I feel dumb for taking all day to find the cause.

The problem was a misdirected optimization -- on an update it was
only checking the "after" image for conflict, assuming that it would
be redundant to check both the before and after images. The problem
is that with a partial index, you might only see one of those
tuples, and I suspect there could be bugs with updates which changed
a value later used for access.

The evil premature optimization is eliminated here:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=a9c63f9c688629e2f6f92a880c5f1ac9669bf77d

This also includes an attempt to eliminate the assertion failure Dan
found in DBT-2 testing yesterday. I'm not sure if this change is
radical enough, but I figured it was better to try the minimal
change first, and see if that was sufficient. If not, I'll have to
move some code between functions, and duplicate a bit of code.

New patch (version 10) attached.

-Kevin

Attachment	Content-Type	Size
ssi-10.patch.gz	application/octet-stream	60.9 KB

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-12 19:18:45
Message-ID:	4D2DAA350200002500039395@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:

> So, count(*) queries are more than twice as slow compared to the
> old serializable transaction isolation level.

I've looked at this enough to know that I can do something about
that, but wanted to point out that this is a good example of why you
should specify READ ONLY when possible. My numbers:

begin transaction isolation level repeatable read;
Time: 394.946 ms
Time: 248.675 ms
Time: 242.559 ms

begin transaction isolation level serializable;
Time: 494.676 ms
Time: 494.036 ms
Time: 491.712 ms

begin transaction isolation level serializable, read only;
Time: 234.075 ms
Time: 234.050 ms
Time: 234.057 ms

begin transaction isolation level serializable, read only,
deferrable;
Time: 233.494 ms
Time: 234.099 ms
Time: 235.290 ms

The slower times for REPEATABLE READ gave me pause, so I ran those
again:

begin transaction isolation level repeatable read;
Time: 233.946 ms
Time: 236.200 ms
Time: 236.414 ms

I guess the database just hadn't "warmed up" enough for the first
few tests....

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-13 00:01:56
Message-ID:	4D2DEC9402000025000393DA@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:

> So, count(*) queries are more than twice as slow compared to the
> old serializable transaction isolation level.

I got this down from more than twice the run time to running 33%
longer through remembering the last relation for which a search for
a predicate lock held by the current transaction found a match at
the coarsest (relation) level. It's a bit of a hack and 33% isn't
very impressive, even for a worst case (and this is one type of
worst case) -- especially given how often people use SELECT count(*)
FROM table_x as a performance test. :-(

I can see a way to improve on this if there's a low-cost way to
determine from within the heapam.c:heapgettup_pagemode function
whether it's returning tuples for a table scan. It seems likely
that this is somehow contained in the HeapScanDesc structure, but
I'm not seeing it. Can anyone point me in the right direction, or
tell me that this avenue is a dead end?

Thanks,

-Kevin

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-13 09:53:03
Message-ID:	4D2ECB7F.9000805@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13.01.2011 02:01, Kevin Grittner wrote:
> Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:
>
>> So, count(*) queries are more than twice as slow compared to the
>> old serializable transaction isolation level.
>
> I got this down from more than twice the run time to running 33%
> longer through remembering the last relation for which a search for
> a predicate lock held by the current transaction found a match at
> the coarsest (relation) level. It's a bit of a hack and 33% isn't
> very impressive, even for a worst case (and this is one type of
> worst case) -- especially given how often people use SELECT count(*)
> FROM table_x as a performance test. :-(
>
> I can see a way to improve on this if there's a low-cost way to
> determine from within the heapam.c:heapgettup_pagemode function
> whether it's returning tuples for a table scan. It seems likely
> that this is somehow contained in the HeapScanDesc structure, but
> I'm not seeing it. Can anyone point me in the right direction, or
> tell me that this avenue is a dead end?

Pardon my ignorance, but where exactly is the extra overhead coming
from? Searching for a predicate lock?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, <anssi(dot)kaariainen(at)thl(dot)fi>
Subject:	Re: SSI patch version 8
Date:	2011-01-13 14:51:48
Message-ID:	4D2EBD24020000250003947B@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Pardon my ignorance, but where exactly is the extra overhead
> coming from? Searching for a predicate lock?

Right. As each tuple is read we need to ensure that there is a
predicate lock to cover it. Since finer-grained locks can be
combined into coarser-grained locks we need to start with the fine
grained and move toward checking the coarser grains, to avoid
missing a lock during promotion. So for each tuple we calculate a
hash, find a partition, lock it, and lookup the tuple as a lock
target. When that's not found we do the same thing for the page.
When that's not found we do the same thing for the relation.

But we acquired a relation lock up front, when we determined that
this would be a heap scan, so we could short-circuit this whole
thing if within the heapgettup_pagemode function we could determine
that this was a scan of the whole relation.

The profiling also showed that it was spending an obscene amount of
time calculating hash values (over 10% of total run time!). I'm
inclined to think that a less sophisticated algorithm (like just
adding oid, page, and tuple offset numbers) would generate very
*real* savings with the down side being a very hypothetical
*possible* cost to longer chains in the HTAB. But that's a separate
issue, best settled on the basis of benchmarks rather than
theoretical discussions.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, <anssi(dot)kaariainen(at)thl(dot)fi>
Subject:	Re: SSI patch version 8
Date:	2011-01-13 15:02:12
Message-ID:	4D2EBF950200002500039483@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> where exactly is the extra overhead coming from?

Keep in mind that this is a sort of worst case scenario. The data
is fully cached in shared memory and we're doing a sequential pass
just counting the rows. In an earlier benchmark (which I should
re-do after all this refactoring), random access queries against a
fully cached data set only increased run time by 1.8%. Throw some
disk access into the mix, and the overhead is likely to get lost in
the noise.

But, as I said, count(*) seems to be the first thing many people try
as a benchmark, and this is a symptom of a more general issue, so
I'd like to find a good solution.

-Kevin

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, anssi(dot)kaariainen(at)thl(dot)fi
Subject:	Re: SSI patch version 8
Date:	2011-01-13 15:02:21
Message-ID:	4D2F13FD.3070009@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 13.01.2011 16:51, Kevin Grittner wrote:
> Right. As each tuple is read we need to ensure that there is a
> predicate lock to cover it. Since finer-grained locks can be
> combined into coarser-grained locks we need to start with the fine
> grained and move toward checking the coarser grains, to avoid
> missing a lock during promotion. So for each tuple we calculate a
> hash, find a partition, lock it, and lookup the tuple as a lock
> target. When that's not found we do the same thing for the page.
> When that's not found we do the same thing for the relation.
>
> But we acquired a relation lock up front, when we determined that
> this would be a heap scan, so we could short-circuit this whole
> thing if within the heapgettup_pagemode function we could determine
> that this was a scan of the whole relation.

That sounds simple enough. Add a boolean field to HeapScanDesc,
"rs_relpredicatelocked", and set it when you acquire the relation lock.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, <anssi(dot)kaariainen(at)thl(dot)fi>
Subject:	Re: SSI patch version 8
Date:	2011-01-13 15:39:30
Message-ID:	4D2EC852020000250003948B@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> That sounds simple enough. Add a boolean field to HeapScanDesc,
> "rs_relpredicatelocked", and set it when you acquire the relation
> lock.

I'll take a look at doing that. Thanks!

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, <anssi(dot)kaariainen(at)thl(dot)fi>
Subject:	Re: SSI patch version 8
Date:	2011-01-13 16:28:19
Message-ID:	4D2ED3C3020000250003949D@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 13.01.2011 16:51, Kevin Grittner wrote:

>> But we acquired a relation lock up front, when we determined that
>> this would be a heap scan, so we could short-circuit this whole
>> thing if within the heapgettup_pagemode function we could
>> determine that this was a scan of the whole relation.
>
> That sounds simple enough. Add a boolean field to HeapScanDesc,
> "rs_relpredicatelocked", and set it when you acquire the relation
> lock.

Heikki, I can't thank you enough. The fix is here:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=64ca508a0e2fa9c21dc76a5d6a5f549c27f511fa

The timings are now:

begin transaction isolation level repeatable read;
Time: 324.938 ms
Time: 228.045 ms
Time: 227.963 ms

begin transaction isolation level serializable;
Time: 311.954 ms
Time: 311.928 ms
Time: 311.848 ms

begin transaction isolation level serializable, read only;
Time: 227.471 ms
Time: 228.137 ms
Time: 227.778 ms

begin transaction isolation level serializable, read only,
deferrable;
Time: 227.899 ms
Time: 249.772 ms
Time: 228.026 ms

begin transaction isolation level repeatable read;
Time: 231.173 ms
Time: 245.041 ms
Time: 228.149 ms

I'm surprised the difference is still that high as a percentage, and
will investigate, but this seems survivable. When I do the math,
the difference comes out to 83.885 nanoseconds per row.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-14 00:21:43
Message-ID:	4D2F42B7020000250003951E@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Anssi Kääriäinen<anssi(dot)kaariainen(at)thl(dot)fi> wrote:

> I think I found a problem. This is using SSI v8. The table
> definition:
>
> create table test_t (id integer, val1 text, val2 integer);

> create index test_idx on test_t(id) where val2 = 1;

> insert into test_t (select generate_series(0, 10000), 'a', 2);
> insert into test_t (select generate_series(0, 10), 'a', 1);

> T1:
> hot2=> begin transaction isolation level serializable;

> hot2=> select * from test_t where val2 = 1;

> hot2=> update test_t set val2 = 2 where val2 = 1 and id = 10;

> T2:
> hot2=> begin transaction isolation level serializable;

> hot2=> select * from test_t where val2 = 1;

> hot2=> update test_t set val2 = 2 where val2 = 1 and id = 9;

> hot2=> commit;

> T1:
> hot2=> commit;

I hope you have no objection to having the code you wrote included
in the test suite which is part of the patch. Well, if you do, I'll
pull it back out and invent something similar... ;-)

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=2502cccbdd5e5d44be469549b91fe49c0554ec3e

-Kevin

From:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>, "john(dot)okite(at)gmail(dot)org" <john(dot)okite(at)gmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 8
Date:	2011-01-14 10:42:18
Message-ID:	4D30288A.1030303@thl.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01/14/2011 02:21 AM, Kevin Grittner wrote:
> I hope you have no objection to having the code you wrote included
> in the test suite which is part of the patch. Well, if you do, I'll
> pull it back out and invent something similar... ;-)
No objection.

- Anssi