From: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
---|---|
To: | Manfred Koizar <mkoi-pg(at)aon(dot)at> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, ohp(at)pyrenet(dot)fr, pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [HACKERS] Index creation takes for ever |
Date: | 2003-12-01 04:50:57 |
Message-ID: | 200312010450.hB14ovH16330@candle.pha.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
Your patch has been added to the PostgreSQL unapplied patches list at:
http://momjian.postgresql.org/cgi-bin/pgpatches
I will try to apply it within the next 48 hours.
---------------------------------------------------------------------------
Manfred Koizar wrote:
> On Mon, 01 Sep 2003 08:46:09 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> wrote:
> >ohp(at)pyrenet(dot)fr writes:
> >> it took 69 minutes to finish, 75% of this time was devoted to create 2
> >> indexes on varchar(2) with value being 'O', 'N' or null;
> >
> >I still say it's either strcoll or qsort's fault.
>
> If qsort is to blame, then maybe this patch could help. It sorts
> equal key values on item pointer. And if it doesn't help index
> creation speed, at least the resulting index has better correlation.
>
> Test script:
> CREATE TABLE t (i int NOT NULL, t text NOT NULL);
> INSERT INTO t VALUES (1, 'lajshdflasjhdflajhsdfljhasdlfjhasdf');
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t VALUES (100, 's,dmfa.,smdn.famsndfamdnsbfmansdbf');
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
> ANALYZE t;
> CREATE INDEX t_i ON t(i);
> SET enable_seqscan = 0;
> SELECT ctid FROM t WHERE i=100 LIMIT 10;
>
> Result without patch:
> ctid
> ----------
> (153,14)
> (306,23)
> (305,80)
> (152,91)
> (76,68)
> (38,34)
> (153,34)
> (305,50)
> (9,62)
> (305,40)
> (10 rows)
>
> Result with patch:
> ctid
> --------
> (0,5)
> (0,10)
> (0,15)
> (0,20)
> (0,25)
> (0,30)
> (0,35)
> (0,40)
> (0,45)
> (0,50)
> (10 rows)
>
> For testing purposes I have made a second patch that provides a
> boolean GUC variable sort_index. It is available here:
> http://www.pivot.at/pg/23.test-IdxTupleSort.diff
>
> Servus
> Manfred
> diff -ruN ../base/src/backend/utils/sort/tuplesort.c src/backend/utils/sort/tuplesort.c
> --- ../base/src/backend/utils/sort/tuplesort.c 2003-08-17 21:58:06.000000000 +0200
> +++ src/backend/utils/sort/tuplesort.c 2003-09-05 10:04:22.000000000 +0200
> @@ -2071,6 +2071,33 @@
> (errcode(ERRCODE_UNIQUE_VIOLATION),
> errmsg("could not create unique index"),
> errdetail("Table contains duplicated values.")));
> + else
> + {
> + /*
> + * If key values are equal, we sort on ItemPointer. This might help
> + * for some bad qsort implementation having performance problems
> + * with many equal items. OTOH I wouldn't trust such a weak qsort
> + * to handle pre-sorted sequences very well ...
> + *
> + * Anyway, this code doesn't hurt much, and it helps produce indices
> + * with better index correlation which is a good thing per se.
> + */
> + ItemPointer tid1 = &tuple1->t_tid;
> + ItemPointer tid2 = &tuple2->t_tid;
> + BlockNumber blk1 = ItemPointerGetBlockNumber(tid1);
> + BlockNumber blk2 = ItemPointerGetBlockNumber(tid2);
> +
> + if (blk1 != blk2)
> + return (blk1 < blk2) ? -1 : 1;
> + else
> + {
> + OffsetNumber pos1 = ItemPointerGetOffsetNumber(tid1);
> + OffsetNumber pos2 = ItemPointerGetOffsetNumber(tid2);
> +
> + if (pos1 != pos2)
> + return (pos1 < pos2) ? -1 : 1;
> + }
> + }
>
> return 0;
> }
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2003-12-01 04:52:10 | Re: ISO 8601 "Time Intervals" of the "format with time-unit |
Previous Message | Bruce Momjian | 2003-12-01 04:11:45 | Re: Wrong masklen in result of netmask()/hostmask() |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2003-12-01 04:52:10 | Re: ISO 8601 "Time Intervals" of the "format with time-unit |
Previous Message | Bruce Momjian | 2003-12-01 04:48:04 | Re: Numeric version of factorial() |