Re: Remove hacks for old bad qsort() implementations?

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Remove hacks for old bad qsort() implementations?
Date: 2008-03-17 03:59:41
Message-ID: 23321.1205726381@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

There are several places in tuplesort.c (and perhaps elsewhere) where
we explicitly work around limitations of various platforms' qsort()
functions. Notably, there's this bit in comparetup_index_btree

/*
* If key values are equal, we sort on ItemPointer. This does not affect
* validity of the finished index, but it offers cheap insurance against
* performance problems with bad qsort implementations that have trouble
* with large numbers of equal keys.
*/

which I unquestioningly copied into comparetup_index_hash yesterday.
However, oprofile is telling me that doing this is costing
*significantly* more than just returning zero would do:

9081 0.3050 : tuple1 = (IndexTuple) a->tuple;
3759 0.1263 : tuple2 = (IndexTuple) b->tuple;
:
: {
130409 4.3800 : BlockNumber blk1 = ItemPointerGetBlockNumber(&tuple1->t_tid);
34539 1.1601 : BlockNumber blk2 = ItemPointerGetBlockNumber(&tuple2->t_tid);
:
3281 0.1102 : if (blk1 != blk2)
812 0.0273 : return (blk1 < blk2) ? -1 : 1;
: }
: {
28 9.4e-04 : OffsetNumber pos1 = ItemPointerGetOffsetNumber(&tuple1->t_tid);
1 3.4e-05 : OffsetNumber pos2 = ItemPointerGetOffsetNumber(&tuple2->t_tid);
:
1 3.4e-05 : if (pos1 != pos2)
48757 1.6376 : return (pos1 < pos2) ? -1 : 1;
: }
:
: return 0;
56705 1.9045 :}

Looks to me like we're eating more than seven percent of the total
runtime to do this :-(

Now as far as I can see, the original motivation for this (as stated in
the comment) is entirely dead anymore, since we always use our own qsort
implementation in preference to whatever bogus version a given libc
might supply. What do people think of removing this bit of code in
favor of just returning 0?

I can see a couple of possible objections:

1. Someday we might go back to using platform qsort. (But surely we
could insist on qsort behaving sanely for equal keys.)

2. If you've got lots of equal keys, it's conceivable that having the
index entries sorted by TID offers some advantage in indexscan speed.
I'm dubious that that's useful, mainly because the planner should prefer
a bitmap scan in such a case; and anyway the ordering is unlikely to
be preserved for long. But it's something to think about.

Comments?

regards, tom lane


From: "Zeugswetter Andreas OSB SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-03-17 10:55:58
Message-ID: E1539E0ED7043848906A8FF995BDA57902DD360B@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> 2. If you've got lots of equal keys, it's conceivable that having the
> index entries sorted by TID offers some advantage in indexscan speed.
> I'm dubious that that's useful, mainly because the planner should
prefer
> a bitmap scan in such a case; and anyway the ordering is unlikely to
> be preserved for long. But it's something to think about.

How about always adding the TID as last key when using qsort for create
index ?

Andreas


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-03-17 13:16:59
Message-ID: 20080317131658.GE6083@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> There are several places in tuplesort.c (and perhaps elsewhere) where
> we explicitly work around limitations of various platforms' qsort()
> functions. Notably, there's this bit in comparetup_index_btree
>
> /*
> * If key values are equal, we sort on ItemPointer. This does not affect
> * validity of the finished index, but it offers cheap insurance against
> * performance problems with bad qsort implementations that have trouble
> * with large numbers of equal keys.
> */

Hmm, wasn't this supposed to be there to fix a problem with Lehman &
Yao's btree definition, that required all keys to be distinct?

[ checks the README ]

Okay, it seems I'm wrong; it has nothing to do with what we pass to
qsort.

The requirement that all btree keys be unique is too onerous,
but the algorithm won't work correctly without it. Fortunately, it is
only necessary that keys be unique on a single tree level, because L&Y
only use the assumption of key uniqueness when re-finding a key in a
parent page (to determine where to insert the key for a split page).
Therefore, we can use the link field to disambiguate multiple
occurrences of the same user key: only one entry in the parent level
will be pointing at the page we had split.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas OSB SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-03-17 13:30:43
Message-ID: 8195.1205760643@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Zeugswetter Andreas OSB SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:
> How about always adding the TID as last key when using qsort for create
> index ?

I think you misunderstood: that's what we do now. I'm proposing
removing it because I think it's probably useless.

regards, tom lane


From: "Zeugswetter Andreas OSB SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-03-18 10:25:43
Message-ID: E1539E0ED7043848906A8FF995BDA57902EB0CB7@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> > How about always adding the TID as last key when using qsort for
create
> > index ?
>
> I think you misunderstood: that's what we do now. I'm proposing
> removing it because I think it's probably useless.

Ah, sorry, I did not look at the code, and interpreted your comment as
some exceptional handling.
I think really randomly (regarding TID) ordering highly duplicate keys
is not such a good idea.
But the point is, it does not need to be exact. Basically sorted with a
few exceptions
and jumps, or sorted by blockid only would be ok. How random does our
qsort really return those tids ?

You wrote:
> However, oprofile is telling me that doing this is costing
> *significantly* more than just returning zero would do:
>
> 9081 0.3050 : tuple1 = (IndexTuple) a->tuple;
> 3759 0.1263 : tuple2 = (IndexTuple) b->tuple;
> :
> : {
> 130409 4.3800 : BlockNumber blk1 =
ItemPointerGetBlockNumber(&tuple1->t_tid);

So why is this ItemPointerGetBlockNumber so expensive ?

> 34539 1.1601 : BlockNumber blk2 =
ItemPointerGetBlockNumber(&tuple2->t_tid);

Is it not correctly inlined ? Are the shifts for BlockNumber so
expensive ?
Or is this simply some oprofile overhead that is not real at all ?

Andreas


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-05-07 23:50:13
Message-ID: 200805072350.m47NoDE20060@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom, are you intending to remove this part of the sort code?

---------------------------------------------------------------------------

Tom Lane wrote:
> There are several places in tuplesort.c (and perhaps elsewhere) where
> we explicitly work around limitations of various platforms' qsort()
> functions. Notably, there's this bit in comparetup_index_btree
>
> /*
> * If key values are equal, we sort on ItemPointer. This does not affect
> * validity of the finished index, but it offers cheap insurance against
> * performance problems with bad qsort implementations that have trouble
> * with large numbers of equal keys.
> */
>
> which I unquestioningly copied into comparetup_index_hash yesterday.
> However, oprofile is telling me that doing this is costing
> *significantly* more than just returning zero would do:
>
> 9081 0.3050 : tuple1 = (IndexTuple) a->tuple;
> 3759 0.1263 : tuple2 = (IndexTuple) b->tuple;
> :
> : {
> 130409 4.3800 : BlockNumber blk1 = ItemPointerGetBlockNumber(&tuple1->t_tid);
> 34539 1.1601 : BlockNumber blk2 = ItemPointerGetBlockNumber(&tuple2->t_tid);
> :
> 3281 0.1102 : if (blk1 != blk2)
> 812 0.0273 : return (blk1 < blk2) ? -1 : 1;
> : }
> : {
> 28 9.4e-04 : OffsetNumber pos1 = ItemPointerGetOffsetNumber(&tuple1->t_tid);
> 1 3.4e-05 : OffsetNumber pos2 = ItemPointerGetOffsetNumber(&tuple2->t_tid);
> :
> 1 3.4e-05 : if (pos1 != pos2)
> 48757 1.6376 : return (pos1 < pos2) ? -1 : 1;
> : }
> :
> : return 0;
> 56705 1.9045 :}
>
> Looks to me like we're eating more than seven percent of the total
> runtime to do this :-(
>
> Now as far as I can see, the original motivation for this (as stated in
> the comment) is entirely dead anymore, since we always use our own qsort
> implementation in preference to whatever bogus version a given libc
> might supply. What do people think of removing this bit of code in
> favor of just returning 0?
>
> I can see a couple of possible objections:
>
> 1. Someday we might go back to using platform qsort. (But surely we
> could insist on qsort behaving sanely for equal keys.)
>
> 2. If you've got lots of equal keys, it's conceivable that having the
> index entries sorted by TID offers some advantage in indexscan speed.
> I'm dubious that that's useful, mainly because the planner should prefer
> a bitmap scan in such a case; and anyway the ordering is unlikely to
> be preserved for long. But it's something to think about.
>
> Comments?
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-05-08 15:43:37
Message-ID: 2731.1210261417@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom, are you intending to remove this part of the sort code?
http://archives.postgresql.org/message-id/23321.1205726381@sss.pgh.pa.us

Well, I got some push-back on the proposal, so I'd kind of dropped it.
But AFAIR the objections were purely hypothetical, whereas the cost
of the existing code is measurable; so maybe we should do it anyway.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-05-08 16:27:49
Message-ID: 200805081627.m48GRng19857@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom, are you intending to remove this part of the sort code?
> http://archives.postgresql.org/message-id/23321.1205726381@sss.pgh.pa.us
>
> Well, I got some push-back on the proposal, so I'd kind of dropped it.
> But AFAIR the objections were purely hypothetical, whereas the cost
> of the existing code is measurable; so maybe we should do it anyway.

Yea, that's what I was thinking. Your measurements were significant.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Remove hacks for old bad qsort() implementations?
Date: 2008-06-23 20:21:01
Message-ID: 200806232021.m5NKL1q11089@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Added to TODO:

* Consider whether duplicate keys should be sorted by block/offset

http://archives.postgresql.org/pgsql-hackers/2008-03/msg00558.php

---------------------------------------------------------------------------

Tom Lane wrote:
> There are several places in tuplesort.c (and perhaps elsewhere) where
> we explicitly work around limitations of various platforms' qsort()
> functions. Notably, there's this bit in comparetup_index_btree
>
> /*
> * If key values are equal, we sort on ItemPointer. This does not affect
> * validity of the finished index, but it offers cheap insurance against
> * performance problems with bad qsort implementations that have trouble
> * with large numbers of equal keys.
> */
>
> which I unquestioningly copied into comparetup_index_hash yesterday.
> However, oprofile is telling me that doing this is costing
> *significantly* more than just returning zero would do:
>
> 9081 0.3050 : tuple1 = (IndexTuple) a->tuple;
> 3759 0.1263 : tuple2 = (IndexTuple) b->tuple;
> :
> : {
> 130409 4.3800 : BlockNumber blk1 = ItemPointerGetBlockNumber(&tuple1->t_tid);
> 34539 1.1601 : BlockNumber blk2 = ItemPointerGetBlockNumber(&tuple2->t_tid);
> :
> 3281 0.1102 : if (blk1 != blk2)
> 812 0.0273 : return (blk1 < blk2) ? -1 : 1;
> : }
> : {
> 28 9.4e-04 : OffsetNumber pos1 = ItemPointerGetOffsetNumber(&tuple1->t_tid);
> 1 3.4e-05 : OffsetNumber pos2 = ItemPointerGetOffsetNumber(&tuple2->t_tid);
> :
> 1 3.4e-05 : if (pos1 != pos2)
> 48757 1.6376 : return (pos1 < pos2) ? -1 : 1;
> : }
> :
> : return 0;
> 56705 1.9045 :}
>
> Looks to me like we're eating more than seven percent of the total
> runtime to do this :-(
>
> Now as far as I can see, the original motivation for this (as stated in
> the comment) is entirely dead anymore, since we always use our own qsort
> implementation in preference to whatever bogus version a given libc
> might supply. What do people think of removing this bit of code in
> favor of just returning 0?
>
> I can see a couple of possible objections:
>
> 1. Someday we might go back to using platform qsort. (But surely we
> could insist on qsort behaving sanely for equal keys.)
>
> 2. If you've got lots of equal keys, it's conceivable that having the
> index entries sorted by TID offers some advantage in indexscan speed.
> I'm dubious that that's useful, mainly because the planner should prefer
> a bitmap scan in such a case; and anyway the ordering is unlikely to
> be preserved for long. But it's something to think about.
>
> Comments?
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +