Re: Permance issues with migrated db

Lists: pgsql-general
From: Robert Fitzpatrick <lists(at)webtent(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Permance issues with migrated db
Date: 2007-05-22 15:17:40
Message-ID: 1179847060.28503.55.camel@columbus.webtent.org

I posted an issue I was having with a migration from MS SQL Server to
pgsql over the weekend. Tom's suggestion for the query I posted was
right on: I made the appropriate updates to column types by dumping,
changing and restoring the database. I then ran ANALYZE on the db and my
query performance was equal to that of MS SQL. This is my first
migration and also my first time trying to spot performance issues,
mainly because all the previous databases I've worked with were built
from scratch and never had a performance issue, but I have never worked
with so much data either (not sure if that has anything to do with my
issues).

I have developed a view in pgsql that takes over 160K ms to execute, but
when copied into MS SQL against the old database (with syntax mods of
course), runs in a few seconds. Seems the issues are with tblcontactinfo
and tblclientactivitytag. Only if I remove all references to *both*
tables do I get good performance from the query. Thanks for any help!

SELECT tblclientmaster.fldclientnumber, tblclientmaster.fldclientname, tblclientmaster.fldclienttype, tblclientmaster.fldbuyingstatus, tblclientmaster.fldsellingstatus, tblclientproductpreference.fldfullservice, tblclientproductpreference.fldlimitedservice, tblclientproductpreference.fldallsuite, tblclientproductpreference.fldbudget, tblclientproductpreference.fldconference, tblclientproductpreference.fldresort, tblclientproductpreference.flddailyfee, tblclientproductpreference.fldsemiprivate, tblclientproductpreference.fldprivate, tblclientproductpreference.fldmunicipal, tblclientroomsize.fldsize149, tblclientroomsize.fldsize299, tblclientroomsize.fldsize449, tblclientroomsize.fldsize599, tblclientroomsize.fldsize600, tblgeopreference.fldsw, tblgeopreference.fldnw, tblgeopreference.fldmw, tblgeopreference.fldw, tblgeopreference.fldma, tblgeopreference.fldse, tblgeopreference.flds, tblgeopreference.fldne, tblproductmaster.fldproductname, tblproductmaster.fldproductcode, tblcontactinfo.fldcontactnumber, tblcontactinfo.fldcontactfirstname, tblcontactinfo.fldcontactlastname, (tblcontactinfo.fldcontactaddress1::text || ' '::text) || tblcontactinfo.fldcontactaddress2::text AS fldcontactaddress, tblcontactinfo.fldcontactcity, tblcontactinfo.fldcontactstate, tblcontactinfo.fldcontactzipcode, tblcontactinfo.fldcontacttitle, tblcontactinfo.fldcontactphone2_type, tblcontactinfo.fldcontactphone2_num, tblcontactinfo.fldcontactphone3_num, tblcontactinfo.fldcontactphone4_num, tblcontactinfo.fldcontactphone5_num, tblcontactinfo.fldcontactemail, tblcontactinfo.fldenable, tblcontactinfo.fldcontactphone1_num, tblcontactinfo.fldperscomments, tblclientactivitytag.fldcontactactivitytag
FROM tblclientmaster
LEFT JOIN tblclientproductpreference ON tblclientmaster.fldclientnumber = tblclientproductpreference.fldclientnumber
LEFT JOIN tblclientroomsize ON tblclientmaster.fldclientnumber = tblclientroomsize.fldclientnumber
LEFT JOIN tblgeopreference ON tblclientmaster.fldclientnumber = tblgeopreference.fldclientnumber
LEFT JOIN tblclientproductrelation ON tblclientmaster.fldclientnumber = tblclientproductrelation.fldclientnumber
JOIN tblproductmaster ON tblclientproductrelation.fldproductnumber = tblproductmaster.fldproductnumber
LEFT JOIN tblcontactinfo ON tblclientmaster.fldclientnumber = tblcontactinfo.fldclientnumber
LEFT JOIN tblclientactivitytag ON tblclientmaster.fldclientnumber = tblclientactivitytag.fldclientnumber
ORDER BY tblclientmaster.fldclientnumber, tblclientmaster.fldclientname, tblclientmaster.fldclienttype, tblclientmaster.fldbuyingstatus, tblclientmaster.fldsellingstatus, tblclientproductpreference.fldfullservice, tblclientproductpreference.fldlimitedservice, tblclientproductpreference.fldallsuite, tblclientproductpreference.fldbudget, tblclientproductpreference.fldconference, tblclientproductpreference.fldresort, tblclientproductpreference.flddailyfee, tblclientproductpreference.fldsemiprivate, tblclientproductpreference.fldprivate, tblclientproductpreference.fldmunicipal, tblclientroomsize.fldsize149, tblclientroomsize.fldsize299, tblclientroomsize.fldsize449, tblclientroomsize.fldsize599, tblclientroomsize.fldsize600, tblgeopreference.fldsw, tblgeopreference.fldnw, tblgeopreference.fldmw, tblgeopreference.fldw, tblgeopreference.fldma, tblgeopreference.fldse, tblgeopreference.flds, tblgeopreference.fldne, tblproductmaster.fldproductname, tblproductmaster.fldproductcode, tblcontactinfo.fldcontactnumber, tblcontactinfo.fldcontactfirstname, tblcontactinfo.fldcontactlastname, (tblcontactinfo.fldcontactaddress1::text || ' '::text) || tblcontactinfo.fldcontactaddress2::text, tblcontactinfo.fldcontactcity, tblcontactinfo.fldcontactstate, tblcontactinfo.fldcontactzipcode, tblcontactinfo.fldcontacttitle, tblcontactinfo.fldcontactphone2_type, tblcontactinfo.fldcontactphone2_num, tblcontactinfo.fldcontactphone3_num, tblcontactinfo.fldcontactphone4_num, tblcontactinfo.fldcontactphone5_num, tblcontactinfo.fldcontactemail, tblcontactinfo.fldenable, tblcontactinfo.fldcontactphone1_num, tblcontactinfo.fldperscomments, tblclientactivitytag.fldcontactactivitytag;

Merge Left Join (cost=35366.42..57565.28 rows=565261 width=297) (actual time=1000.457..148111.905 rows=41866801 loops=1)
  Merge Cond: (tblclientmaster.fldclientnumber = tblclientactivitytag.fldclientnumber)
  -> Merge Join (cost=18768.38..29470.77 rows=76505 width=286) (actual time=437.703..1575.449 rows=360822 loops=1)
        Merge Cond: (tblclientmaster.fldclientnumber = tblclientproductrelation.fldclientnumber)
        -> Merge Left Join (cost=0.00..9495.89 rows=23878 width=257) (actual time=0.224..302.321 rows=24908 loops=1)
              Merge Cond: (tblclientmaster.fldclientnumber = tblcontactinfo.fldclientnumber)
              -> Merge Left Join (cost=0.00..4044.08 rows=16485 width=90) (actual time=0.172..182.162 rows=16547 loops=1)
                    Merge Cond: (tblclientmaster.fldclientnumber = tblclientproductpreference.fldclientnumber)
                    -> Merge Left Join (cost=0.00..3212.26 rows=16485 width=80) (actual time=0.134..130.524 rows=16514 loops=1)
                          Merge Cond: (tblclientmaster.fldclientnumber = tblclientroomsize.fldclientnumber)
                          -> Merge Left Join (cost=0.00..2396.89 rows=16485 width=75) (actual time=0.097..81.108 rows=16497 loops=1)
                                Merge Cond: (tblclientmaster.fldclientnumber = tblgeopreference.fldclientnumber)
                                -> Index Scan using pk_tblclientmaster on tblclientmaster (cost=0.00..1582.44 rows=16485 width=67) (actual time=0.053..22.425 rows=16484 loops=1)
                                -> Index Scan using ix_tblgeopreference_fldclientnumber on tblgeopreference (cost=0.00..606.38 rows=13398 width=16) (actual time=0.033..14.121 rows=13397 loops=1)
                          -> Index Scan using ix_tblclientroomsize_fldclientnumber on tblclientroomsize (cost=0.00..606.72 rows=13395 width=13) (actual time=0.032..14.240 rows=13414 loops=1)
                    -> Index Scan using ix_tblclientproductpreference on tblclientproductpreference (cost=0.00..623.95 rows=13391 width=18) (actual time=0.031..15.194 rows=13439 loops=1)
              -> Index Scan using ix_tblcontactinfo_fldclientnumber on tblcontactinfo (cost=0.00..5113.21 rows=23878 width=175) (actual time=0.046..50.194 rows=24512 loops=1)
        -> Sort (cost=18768.38..18900.69 rows=52924 width=37) (actual time=437.401..724.717 rows=361884 loops=1)
              Sort Key: tblclientproductrelation.fldclientnumber
              -> Merge Join (cost=0.00..12985.06 rows=52924 width=37) (actual time=0.827..299.983 rows=52904 loops=1)
                    Merge Cond: (tblclientproductrelation.fldproductnumber = tblproductmaster.fldproductnumber)
                    -> Index Scan using ix_tblclientproductrelation_fldproductnumber on tblclientproductrelation (cost=0.00..2210.71 rows=52950 width=12) (actual time=0.443..53.273 rows=52950 loops=1)
                    -> Index Scan using pk_tblproperty on tblproductmaster (cost=0.00..9987.31 rows=52924 width=37) (actual time=0.377..102.800 rows=52927 loops=1)
  -> Sort (cost=16598.04..16902.54 rows=121800 width=19) (actual time=562.569..29677.017 rows=41662721 loops=1)
        Sort Key: tblclientactivitytag.fldclientnumber
        -> Seq Scan on tblclientactivitytag (cost=0.00..3394.00 rows=121800 width=19) (actual time=12.772..115.473 rows=121800 loops=1)
Total runtime: 163914.674 ms

Indexes were added in an effort to fix the issue, but they did not help.
These indexes do not exist in MS SQL Server:

CREATE TABLE "public"."tblclientactivitytag" (
"fldclientnumber" BIGINT,
"fldcontactactivitytag" VARCHAR(100),
"fldclientname_old" VARCHAR(100),
"fldcontactfirstname" VARCHAR(50),
"fldcontactlastname" VARCHAR(50),
"fldcontactnumber" BIGINT,
"fldclientactivitytagid" BIGSERIAL,
CONSTRAINT "tblclientactivitytag_pkey" PRIMARY KEY("fldclientactivitytagid"),
CONSTRAINT "test2" FOREIGN KEY ("fldclientnumber")
REFERENCES "public"."tblclientmaster"("fldclientnumber")
ON DELETE CASCADE
ON UPDATE CASCADE
NOT DEFERRABLE
) WITHOUT OIDS;
CREATE INDEX "ix_tblclientactivitytag_fldclientactivitytagid" ON "public"."tblclientactivitytag"
USING btree ("fldclientactivitytagid");
CREATE INDEX "ix_tblclientactivitytag_fldclientnumber" ON "public"."tblclientactivitytag"
USING btree ("fldclientnumber");
CREATE INDEX "ix_tblclientactivitytag_fldcontactnumber" ON "public"."tblclientactivitytag"
USING btree ("fldcontactnumber");
CREATE INDEX "ix_tblclientactivitytag_tag" ON "public"."tblclientactivitytag"
USING btree ("fldcontactactivitytag");

CREATE TABLE "public"."tblcontactinfo" (
"fldclientnumber" BIGINT,
"fldcontactnumber" BIGSERIAL,
"fldcontactfirstname" VARCHAR(50),
"fldcontactlastname" VARCHAR(50),
"fldcontactaddress1" VARCHAR(60),
"fldcontactaddress2" VARCHAR(50),
"fldcontactcity" VARCHAR(50),
"fldcontactstate" VARCHAR(50),
"fldcontactzipcode" VARCHAR(10),
"fldclientname_remove" VARCHAR(100),
"fldcontacttype" VARCHAR(150),
"fldcontactsalutation" VARCHAR(50),
"fldcontactdear" VARCHAR(50),
"fldcontacttitle" VARCHAR(100),
"fldcontactphone1_type" VARCHAR(50),
"fldcontactphone1_num" VARCHAR(20),
"fldcontactphone1_num_ext" VARCHAR(20),
"fldcontactphone2_type" VARCHAR(50),
"fldcontactphone2_num" VARCHAR(20),
"fldcontactphone3_type" VARCHAR(50),
"fldcontactphone3_num" VARCHAR(20),
"fldcontactphone4_type" VARCHAR(50),
"fldcontactphone4_num" VARCHAR(20),
"fldcontactphone5_type" VARCHAR(50),
"fldcontactphone5_num" VARCHAR(20),
"fldcontactemail" VARCHAR(50) NOT NULL,
"fldcontactwebsite" VARCHAR(75),
"fldperscomments" VARCHAR(900),
"fldassistant" VARCHAR(100),
"fldhfcode" VARCHAR(50),
"fldenable" BOOLEAN NOT NULL,
"fldpreviousclientnumber" BIGINT,
"fldcreateddate" TIMESTAMP WITHOUT TIME ZONE,
"fldcreatedby" VARCHAR(50),
"fldlastupdateddate" TIMESTAMP WITHOUT TIME ZONE,
"fldlastupdatedby" VARCHAR(50),
"fldcontactbyemail" BOOLEAN NOT NULL,
"fldcontactbyfax" BOOLEAN NOT NULL,
"fldcontactbymail" BOOLEAN NOT NULL,
"fldcontactbyphone" BOOLEAN NOT NULL,
"fldcontactbycell" BOOLEAN NOT NULL,
"fldcontactbypager" BOOLEAN NOT NULL,
"fldcontactpassword" VARCHAR(8),
CONSTRAINT "pk_tblcontactinfo" PRIMARY KEY("fldcontactnumber"),
CONSTRAINT "test" FOREIGN KEY ("fldclientnumber")
REFERENCES "public"."tblclientmaster"("fldclientnumber")
ON DELETE CASCADE
ON UPDATE CASCADE
NOT DEFERRABLE
) WITHOUT OIDS;
CREATE INDEX "ix_tblcontactinfo_fldclientnumber" ON "public"."tblcontactinfo"
USING btree ("fldclientnumber");
CREATE INDEX "ix_tblcontactinfo_fldcontactcity" ON "public"."tblcontactinfo"
USING btree ("fldcontactcity");
CREATE INDEX "ix_tblcontactinfo_fldcontactstate" ON "public"."tblcontactinfo"
USING btree ("fldcontactstate");
CREATE INDEX "ix_tblcontactinfo_fldperscomments" ON "public"."tblcontactinfo"
USING btree ("fldperscomments");

CREATE TABLE "public"."tblclientmaster" (
"fldclientnumber" BIGSERIAL,
"fldclientname" VARCHAR(100),
"fldclienttype" VARCHAR(50),
"fldclientparentcompanyname_remove" VARCHAR(100),
"fldclientparentcompanynumber" BIGINT,
"fldchildren" SMALLINT,
"fldclientbuyerseller" VARCHAR(10),
"fldterms" VARCHAR(50),
"fldmaxdollars" VARCHAR(20),
"fldmaxdownpayment" VARCHAR(20),
"fldenable" BOOLEAN NOT NULL,
"fldcreateddate" TIMESTAMP WITHOUT TIME ZONE,
"fldcreatedby" VARCHAR(10),
"fldlastupdateddate" TIMESTAMP WITHOUT TIME ZONE,
"fldlastupdatedby" VARCHAR(10),
"fldclientwebsite" VARCHAR(100),
"fldbuyingstatus" VARCHAR(50),
"fldsellingstatus" VARCHAR(50),
"fldequitystatus" VARCHAR(50),
"fldstatusdate" TIMESTAMP WITHOUT TIME ZONE,
CONSTRAINT "pk_tblclientmaster" PRIMARY KEY("fldclientnumber")
) WITHOUT OIDS;

--
Robert


From: Richard Huxton <dev(at)archonet(dot)com>
To: Robert Fitzpatrick <lists(at)webtent(dot)net>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 16:21:48
Message-ID: 4653189C.6040609@archonet.com
Lists: pgsql-general

Robert Fitzpatrick wrote:
> I have developed a view in pgsql that takes over 160K ms to execute, but
> when copied into MS SQL against the old database (with syntax mods of
> course), runs in a few seconds.

Your query seems to produce 41.8 million rows. Are you sure MS-SQL is
returning that many rows in a few seconds?

> Merge Left Join (cost=35366.42..57565.28 rows=565261 width=297) (actual time=1000.457..148111.905 rows=41866801 loops=1)
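
A rough way to see where the 41.8 million rows come from, assuming (the
thread does not confirm this) that several of the joined tables hold more
than one row per fldclientnumber: each such table multiplies the row count
for that client. The sketch below counts rows per client in the three
larger tables and multiplies the counts. The aliases and the joined_rows
column are illustrative names, and the figure is only an estimate because
it ignores the three preference tables and any product relation rows
without a matching product.

```sql
-- Sketch: per-client row counts in the tables that can hold several rows
-- per client. Their product approximates that client's share of the result.
SELECT m.fldclientnumber,
       r.n                                       AS product_rows,
       COALESCE(c.n, 1)                          AS contact_rows,
       COALESCE(t.n, 1)                          AS tag_rows,
       r.n * COALESCE(c.n, 1) * COALESCE(t.n, 1) AS joined_rows
FROM tblclientmaster m
JOIN (SELECT fldclientnumber, count(*) AS n
      FROM tblclientproductrelation
      GROUP BY fldclientnumber) r ON r.fldclientnumber = m.fldclientnumber
LEFT JOIN (SELECT fldclientnumber, count(*) AS n
           FROM tblcontactinfo
           GROUP BY fldclientnumber) c ON c.fldclientnumber = m.fldclientnumber
LEFT JOIN (SELECT fldclientnumber, count(*) AS n
           FROM tblclientactivitytag
           GROUP BY fldclientnumber) t ON t.fldclientnumber = m.fldclientnumber
ORDER BY joined_rows DESC
LIMIT 20;
```

If a handful of clients account for most of joined_rows, the size of the
result comes from the data itself rather than from the planner.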

--
Richard Huxton
Archonet Ltd


From: Robert Fitzpatrick <lists(at)webtent(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 16:49:27
Message-ID: 1179852567.28503.69.camel@columbus.webtent.org
Lists: pgsql-general

On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote:
> Robert Fitzpatrick wrote:
> > I have developed a view in pgsql that takes over 160K ms to execute, but
> > when copied into MS SQL against the old database (with syntax mods of
> > course), runs in a few seconds.
>
> Your query seems to produce 41.8 million rows. Are you sure MS-SQL is
> returning that many rows in a few seconds?
>
I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100
PERCENT' after SELECT in the query. The Enterprise Manager does not
indicate how many rows come back. I save it as a VIEW in MS SQL and do a
'select count(*)...' and, yes, it comes back 42164877 records.

Just to be sure MS SQL hasn't done something to the structure (I noticed
dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100
percent'. Then saved as a view and did a count(*) in pgsql, got
41866801. The difference in the number of records could be due to the
age of the dataset I'm using in pgsql for migration; the MS SQL db is
still actively used. I hope I'm just missing something here... these are
the queries for mssql and pgsql, respectively:

SELECT TOP 100 PERCENT dbo.tblClientMaster.fldClientNumber, dbo.tblClientMaster.fldClientName, dbo.tblClientMaster.fldClientType,
dbo.tblClientMaster.fldBuyingStatus, dbo.tblClientMaster.fldSellingStatus, dbo.tblClientProductPreference.fldFullService,
dbo.tblClientProductPreference.fldLimitedService, dbo.tblClientProductPreference.fldAllSuite, dbo.tblClientProductPreference.fldBudget,
dbo.tblClientProductPreference.fldConference, dbo.tblClientProductPreference.fldResort, dbo.tblClientProductPreference.fldDailyFee,
dbo.tblClientProductPreference.fldSemiPrivate, dbo.tblClientProductPreference.fldPrivate, dbo.tblClientProductPreference.fldMunicipal,
dbo.tblClientRoomSize.fldSize149, dbo.tblClientRoomSize.fldSize299, dbo.tblClientRoomSize.fldSize449, dbo.tblClientRoomSize.fldSize599,
dbo.tblClientRoomSize.fldSize600, dbo.tblGeoPreference.fldSW, dbo.tblGeoPreference.fldNW, dbo.tblGeoPreference.fldMW,
dbo.tblGeoPreference.fldW, dbo.tblGeoPreference.fldMA, dbo.tblGeoPreference.fldSE, dbo.tblGeoPreference.fldS, dbo.tblGeoPreference.fldNE,
dbo.tblProductMaster.fldProductName, dbo.tblProductMaster.fldProductCode, dbo.tblContactInfo.fldContactNumber,
dbo.tblContactInfo.fldContactFirstName, dbo.tblContactInfo.fldContactLastName, dbo.tblContactInfo.fldContactCity, dbo.tblContactInfo.fldContactState,
dbo.tblContactInfo.fldContactZipCode, dbo.tblContactInfo.fldContactTitle, dbo.tblContactInfo.fldContactPhone2_Type,
dbo.tblContactInfo.fldContactPhone2_Num, dbo.tblContactInfo.fldContactPhone3_Num, dbo.tblContactInfo.fldContactPhone4_Num,
dbo.tblContactInfo.fldContactPhone5_Num, dbo.tblContactInfo.fldContactEMail, dbo.tblContactInfo.fldEnable, dbo.tblContactInfo.fldContactPhone1_Num,
dbo.tblContactInfo.fldPersComments, dbo.tblClientActivityTag.fldContactActivityTag
FROM dbo.tblClientMaster LEFT OUTER JOIN
dbo.tblClientProductPreference ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientProductPreference.fldClientNumber LEFT OUTER JOIN
dbo.tblClientRoomSize ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientRoomSize.fldClientNumber LEFT OUTER JOIN
dbo.tblGeoPreference ON dbo.tblClientMaster.fldClientNumber = dbo.tblGeoPreference.fldClientNumber LEFT OUTER JOIN
dbo.tblClientProductRelation ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientProductRelation.fldClientNumber INNER JOIN
dbo.tblProductMaster ON dbo.tblClientProductRelation.fldProductNumber = dbo.tblProductMaster.fldProductNumber LEFT OUTER JOIN
dbo.tblContactInfo ON dbo.tblClientMaster.fldClientNumber = dbo.tblContactInfo.fldClientNumber LEFT OUTER JOIN
dbo.tblClientActivityTag ON dbo.tblClientMaster.fldClientNumber = dbo.tblClientActivityTag.fldClientNumber

SELECT tblclientmaster.fldclientnumber, tblclientmaster.fldclientname, tblclientmaster.fldclienttype, tblclientmaster.fldbuyingstatus, tblclientmaster.fldsellingstatus, tblclientproductpreference.fldfullservice, tblclientproductpreference.fldlimitedservice, tblclientproductpreference.fldallsuite, tblclientproductpreference.fldbudget, tblclientproductpreference.fldconference, tblclientproductpreference.fldresort, tblclientproductpreference.flddailyfee, tblclientproductpreference.fldsemiprivate, tblclientproductpreference.fldprivate, tblclientproductpreference.fldmunicipal, tblclientroomsize.fldsize149, tblclientroomsize.fldsize299, tblclientroomsize.fldsize449, tblclientroomsize.fldsize599, tblclientroomsize.fldsize600, tblgeopreference.fldsw, tblgeopreference.fldnw, tblgeopreference.fldmw, tblgeopreference.fldw, tblgeopreference.fldma, tblgeopreference.fldse, tblgeopreference.flds, tblgeopreference.fldne, tblproductmaster.fldproductname, tblproductmaster.fldproductcode, tblcontactinfo.fldcontactnumber, tblcontactinfo.fldcontactfirstname, tblcontactinfo.fldcontactlastname, tblcontactinfo.fldcontactcity, tblcontactinfo.fldcontactstate, tblcontactinfo.fldcontactzipcode, tblcontactinfo.fldcontacttitle, tblcontactinfo.fldcontactphone2_type, tblcontactinfo.fldcontactphone2_num, tblcontactinfo.fldcontactphone3_num, tblcontactinfo.fldcontactphone4_num, tblcontactinfo.fldcontactphone5_num, tblcontactinfo.fldcontactemail, tblcontactinfo.fldenable, tblcontactinfo.fldcontactphone1_num, tblcontactinfo.fldperscomments, tblclientactivitytag.fldcontactactivitytag
FROM tblclientmaster
LEFT JOIN tblclientproductpreference ON tblclientmaster.fldclientnumber = tblclientproductpreference.fldclientnumber
LEFT JOIN tblclientroomsize ON tblclientmaster.fldclientnumber = tblclientroomsize.fldclientnumber
LEFT JOIN tblgeopreference ON tblclientmaster.fldclientnumber = tblgeopreference.fldclientnumber
LEFT JOIN tblclientproductrelation ON tblclientmaster.fldclientnumber = tblclientproductrelation.fldclientnumber
JOIN tblproductmaster ON tblclientproductrelation.fldproductnumber = tblproductmaster.fldproductnumber
LEFT JOIN tblcontactinfo ON tblclientmaster.fldclientnumber = tblcontactinfo.fldclientnumber
LEFT JOIN tblclientactivitytag ON tblclientmaster.fldclientnumber = tblclientactivitytag.fldclientnumber;

--
Robert


From: PFC <lists(at)peufeu(dot)com>
To: "Robert Fitzpatrick" <lists(at)webtent(dot)net>, PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 17:04:55
Message-ID: op.tsqm2ha3cigqcu@apollo13
Lists: pgsql-general


> I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100
> PERCENT' after SELECT in the query. The Enterprise Manager does not
> indicate how many rows come back. I save it as a VIEW in MS SQL and do a
> 'select count(*)...' and, yes, it comes back 42164877 records.

No, it comes back 1 record with the count in it, the ORDER BY is useless
for a count(*), etc.

What is it that you are trying to do exactly ?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Fitzpatrick <lists(at)webtent(dot)net>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 17:10:43
Message-ID: 14820.1179853843@sss.pgh.pa.us
Lists: pgsql-general

Robert Fitzpatrick <lists(at)webtent(dot)net> writes:
> On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote:
>> Your query seems to produce 41.8 million rows. Are you sure MS-SQL is
>> returning that many rows in a few seconds?

> I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100
> PERCENT' after SELECT in the query. The Enterprise Manager does not
> indicate how many rows come back. I save it as a VIEW in MS SQL and do a
> 'select count(*)...' and, yes, it comes back 42164877 records.

> Just to be sure MS SQL hasn't done something to the structure (I noticed
> dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100
> percent'. Then saved as a view and did a count(*) in pgsql, got
> 41866801.

How much time do the two select count(*) operations take? That would be
a reasonably fair comparison of the query engines, as opposed to
whatever might be happening on the client side (in particular, I wonder
whether the MS client is actually fetching all the rows or just the
first few).

regards, tom lane


From: Robert Fitzpatrick <lists(at)webtent(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 17:15:56
Message-ID: 1179854156.28503.80.camel@columbus.webtent.org
Lists: pgsql-general

On Tue, 2007-05-22 at 19:04 +0200, PFC wrote:
> > I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100
> > PERCENT' after SELECT in the query. The Enterprise Manager does not
> > indicate how many rows come back. I save it as a VIEW in MS SQL and do a
> > 'select count(*)...' and, yes, it comes back 42164877 records.
>
> No, it comes back 1 record with the count in it, the ORDER BY is useless
> for a count(*), etc.
>
> What is it that you are trying to do exactly ?

Yes, one record indicating over 42 million records available from the
view, correct? I realized this after my first post; there is no 'ORDER
BY' in my last post with the two query examples.

I'm trying to get my query in pgsql to return a result in about the same
amount of time as it does in the existing mssql db. The query comes
back with results using MS SQL Enterprise Manager in seconds and the
same query in pgadmin takes far longer. I just tried running the query
now and it is still going with over 200K ms clocked. If I stop the
query and remove all references to tblcontactinfo and
tblclientactivitytag, the query comes back in less than 6000 ms.

--
Robert


From: Robert Fitzpatrick <lists(at)webtent(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 17:23:35
Message-ID: 1179854615.28503.86.camel@columbus.webtent.org
Lists: pgsql-general

On Tue, 2007-05-22 at 13:10 -0400, Tom Lane wrote:
> Robert Fitzpatrick <lists(at)webtent(dot)net> writes:
> > On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote:
> >> Your query seems to produce 41.8 million rows. Are you sure MS-SQL is
> >> returning that many rows in a few seconds?
>
> > I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100
> > PERCENT' after SELECT in the query. The Enterprise Manager does not
> > indicate how many rows come back. I save it as a VIEW in MS SQL and do a
> > 'select count(*)...' and, yes, it comes back 42164877 records.
>
> > Just to be sure MS SQL hasn't done something to the structure (I noticed
> > dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100
> > percent'. Then saved as a view and did a count(*) in pgsql, got
> > 41866801.
>
> How much time do the two select count(*) operations take? That would be
> a reasonably fair comparison of the query engines, as opposed to
> whatever might be happening on the client side (in particular, I wonder
> whether the MS client is actually fetching all the rows or just the
> first few).

Takes 25K ms in pgsql. I don't see a timer in MS Ent Manager, but I
clocked only 5 seconds. Maybe I should put together a php script to run
against each so that both use the exact same client. I am doing all this
on the same server, with PostgreSQL 8.2 loaded on Windows Server 2003
alongside MS SQL Server 2000.

--
Robert


From: Richard Huxton <dev(at)archonet(dot)com>
To: Robert Fitzpatrick <lists(at)webtent(dot)net>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 18:03:04
Message-ID: 46533058.2070607@archonet.com
Lists: pgsql-general

Robert Fitzpatrick wrote:
> On Tue, 2007-05-22 at 13:10 -0400, Tom Lane wrote:
>> Robert Fitzpatrick <lists(at)webtent(dot)net> writes:
>>> On Tue, 2007-05-22 at 17:21 +0100, Richard Huxton wrote:
>>>> Your query seems to produce 41.8 million rows. Are you sure MS-SQL is
>>>> returning that many rows in a few seconds?
>>> I thought the same thing. While I'm not a MS SQL guru, I put 'TOP 100
>>> PERCENT' after SELECT in the query. The Enterprise Manager does not
>>> indicate how many rows come back. I save it as a VIEW in MS SQL and do a
>>> 'select count(*)...' and, yes, it comes back 42164877 records.
>>> Just to be sure MS SQL hasn't done something to the structure (I noticed
>>> dbo prefixes, etc.), I pasted back into pgadmin, took off 'top 100
>>> percent'. Then saved as a view and did a count(*) in pgsql, got
>>> 41866801.
>> How much time do the two select count(*) operations take? That would be
>> a reasonably fair comparison of the query engines, as opposed to
>> whatever might be happening on the client side (in particular, I wonder
>> whether the MS client is actually fetching all the rows or just the
>> first few).
>
> Takes 25K ms in pgsql, don't see a timer in MS Ent Manager, but only 5
> seconds clocked. Maybe I should put together a php script to operate on
> each to be using the exact same client. I am doing all this all on the
> same server with PostgreSQL 8.2 loaded in Windows Server 2003 also with
> MS SQL server 2000.

1. Be aware that with the real query, PG is giving you the time to fetch
the *last* row, whereas MS is probably giving you the time to the
*first* row. I'm reasonably sure of this because 41.8 million rows *
(say) 256 bytes would be over 10GB of data to return, which in a few
seconds seems optimistic.

2. If you don't want all the rows in one go, explicitly declare a cursor
with PG.

3. The EXPLAIN ANALYSE will not have the
format-and-transfer-data-to-client costs, but I think it does cover the
rest of the query.

4. We're still 5 x slower than MS-SQL (with the count). That might well
be down to having to check visibility on each row with our MVCC rather
than just going to the index.
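
Point 2 could look roughly like this. The view name vw_client_detail and
the cursor name client_cur are placeholders, since the thread never names
the view, and the batch size of 100 is arbitrary.

```sql
-- A cursor without WITH HOLD must live inside a transaction block.
BEGIN;

DECLARE client_cur CURSOR FOR
    SELECT * FROM vw_client_detail;   -- hypothetical view name

FETCH 100 FROM client_cur;            -- first batch of rows
FETCH 100 FROM client_cur;            -- next batch, continuing where we stopped

CLOSE client_cur;
COMMIT;
```

Timing the first FETCH is closer to a "time to first row" figure than
timing the whole SELECT.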

Hmm... How much of your machine is PG getting to use vs. MS-SQL? What
are your shared_buffers, work_mem, effective_cache_size (and how much
RAM on this box)?

--
Richard Huxton
Archonet Ltd


From: Robert Fitzpatrick <lists(at)webtent(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 18:30:37
Message-ID: 1179858637.28503.96.camel@columbus.webtent.org
Lists: pgsql-general

On Tue, 2007-05-22 at 19:03 +0100, Richard Huxton wrote:
> 4. We're still 5 x slower than MS-SQL (with the count). That might
> well
> be down to having to check visibility on each row with our MVCC
> rather
> than just going to the index.

Tips? I'd love to know how to see inside MVCC. I really appreciate the
help!

>
> Hmm... How much of your machine is PG getting to use vs. MS-SQL? What
> are your shared_buffers, work_mem, effective_cache_size (and how much
> RAM on this box)?

3.5GB of RAM on a Xeon 2.8GHz server. I have default shared_buffers
32MB, no defaults changed except listen_addresses. How can I check
work_mem and effective_cache_size?
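
For reference, the current value of a setting can be read from any
session with SHOW, or from the pg_settings view. A minimal example for
the three parameters mentioned above:

```sql
-- One value per statement, shown with its unit.
SHOW shared_buffers;
SHOW work_mem;
SHOW effective_cache_size;

-- The same parameters in one result set. Here "setting" is reported in
-- the parameter's internal unit (for example kB or 8kB pages).
SELECT name, setting
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem', 'effective_cache_size');
```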

--
Robert


From: Robert Fitzpatrick <lists(at)webtent(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 19:00:34
Message-ID: 1179860434.28503.103.camel@columbus.webtent.org
Lists: pgsql-general

On Tue, 2007-05-22 at 14:30 -0400, Robert Fitzpatrick wrote:
> On Tue, 2007-05-22 at 19:03 +0100, Richard Huxton wrote:
> >
> > Hmm... How much of your machine is PG getting to use vs. MS-SQL? What
> > are your shared_buffers, work_mem, effective_cache_size (and how much
> > RAM on this box)?
>
> 3.5GB of RAM on a Xeon 2.8GHz server. I have default shared_buffers
> 32MB, no defaults changed except listen_addresses. How can I check
> work_mem and effective_cache_size?

I did some googling and came up with some ideas. I now have these
settings in place, but after restarting PG they were no help.

work_mem = 5MB
shared_buffers = 128MB
effective_cache_size = 800MB

--
Robert


From: Richard Huxton <dev(at)archonet(dot)com>
To: Robert Fitzpatrick <lists(at)webtent(dot)net>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 19:11:54
Message-ID: 4653407A.1060107@archonet.com
Lists: pgsql-general

Robert Fitzpatrick wrote:
> On Tue, 2007-05-22 at 19:03 +0100, Richard Huxton wrote:
>> 4. We're still 5 x slower than MS-SQL (with the count). That might
>> well
>> be down to having to check visibility on each row with our MVCC
>> rather
>> than just going to the index.
>
> Tips? I'd love to know how to see inside MVCC. I really appreciate the
> help!

The main thing is that PostgreSQL's implementation of MVCC means that
1. (In many cases) writers need not block readers.
2. An update is effectively a delete and an insert.
3. VACUUM is needed to mark space from deleted rows for re-use.
4. The indexes don't carry visibility information, which means we need
to go to the actual row on-disk to see if the current transaction can
actually see the row.

This last point is a problem for things like count(*) where we can't
just count entries in the index because we don't know if some of the
rows they point to might be deleted. The reason we don't store
visibility info with the index is that it makes the index larger, so
using up valuable RAM more quickly.
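
As an illustration of points 2 and 3, VACUUM VERBOSE reports how many
dead row versions a table carries, and adding ANALYZE refreshes the
planner statistics in the same pass. The table name is taken from the
query earlier in the thread; whether it holds any dead rows is not known
from the thread, and a freshly restored table normally would not.

```sql
-- Reclaims space from dead row versions, prints per-table details,
-- and updates the statistics the planner uses for row estimates.
VACUUM VERBOSE ANALYZE tblclientactivitytag;
```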

For more info, see "Internals" in the manuals for a start. Then have a
quick look around these for some more bits & pieces. There are some
presentation slides somewhere. Note - I am *not* a developer, just a
long-term user.

http://www.postgresql.org/developer/
http://www.postgresql.org/docs/faqs.FAQ_DEV.html
http://www.postgresql.org/docs/techdocs

>> Hmm... How much of your machine is PG getting to use vs. MS-SQL? What
>> are your shared_buffers, work_mem, effective_cache_size (and how much
>> RAM on this box)?
>
> 3.5GB of RAM on a Xeon 2.8GHz server. I have default shared_buffers
> 32MB, no defaults changed except listen_addresses. How can I check
> work_mem and effective_cache_size?

Ah - I bet MS-SQL is squatting on a gig of RAM or some-such too. Read
through this - it's a little old, but still good advice.
http://www.powerpostgresql.com/PerfList/
You'll probably find increasing work_mem (by a lot) for this one query
will help you out.
SET work_mem = <something large>;
SELECT ....
SET work_mem = <small again>;
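
Filled in with concrete values, the pattern above might look like this.
The 64MB figure and the view name vw_client_detail are assumptions for
illustration. SET changes the value for the current session only, so no
server restart is involved.

```sql
SET work_mem = '64MB';                   -- this session only

EXPLAIN ANALYZE
SELECT count(*) FROM vw_client_detail;   -- hypothetical view name

RESET work_mem;                          -- back to the configured default
```

Comparing the Sort nodes in the EXPLAIN ANALYZE output before and after
the change shows whether the extra memory made any difference.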

--
Richard Huxton
Archonet Ltd


From: Richard Huxton <dev(at)archonet(dot)com>
To: Robert Fitzpatrick <lists(at)webtent(dot)net>
Cc: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 19:13:33
Message-ID: 465340DD.10908@archonet.com
Lists: pgsql-general

Robert Fitzpatrick wrote:
> I did some googling and came up with some ideas, I have it now with
> these settings and after restarting PG, no help.
>
> work_mem = 5MB

My last post missed yours - you're ahead of my previous reply :-)

Try 32MB, then 64MB, then perhaps 128MB. If it looks like you're not
using stupid amounts of RAM on this one sort go higher still.

--
Richard Huxton
Archonet Ltd


From: Robert Fitzpatrick <lists(at)webtent(dot)net>
To: PostgreSQL <pgsql-general(at)postgresql(dot)org>
Subject: Re: Permance issues with migrated db
Date: 2007-05-22 19:31:13
Message-ID: 1179862273.28503.114.camel@columbus.webtent.org
Lists: pgsql-general

On Tue, 2007-05-22 at 20:13 +0100, Richard Huxton wrote:
> Robert Fitzpatrick wrote:
> > I did some googling and came up with some ideas, I have it now with
> > these settings and after restarting PG, no help.
> >
> > work_mem = 5MB
>
> My last post missed yours - you're ahead of my previous reply :-)
>
> Try 32MB, then 64MB, then perhaps 128MB. If it looks like you're not
> using stupid amounts of RAM on this one sort go higher still.
>

This really has me perplexed now :\

I closed pgadmin, set work_mem to 128MB and restarted PG, went back
into pgadmin and created a script from my saved view (again, this view is
identical/copied from the same mssql view with only syntax changes). I
started the query and then started looking over that doc you sent me
(again, since I had just read through it from a google search)... after
over 300000 ms it is still going, not even finishing. I'm beginning to
worry that something is wrong with the dataset migrated from mssql. I can
migrate again tonight; the last migration was about a month ago. Since
then it has been dumped from a linux server and loaded on this Windows
server without issue.

--
Robert