Cost of XLogInsert CRC calculations

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Cost of XLogInsert CRC calculations
Date: 2005-03-06 05:17:56
Message-ID: 2541.1110086276@sss.pgh.pa.us

I was profiling a case involving UPDATEs into a table with too many
indexes (brought to mind by mysql's sql-bench, about which more later)
and got this rather surprising result for routines costing more than
1% of the total runtime:

Each sample counts as 0.01 seconds.
   %   cumulative      self               self    total
 time     seconds   seconds     calls   s/call   s/call  name
64.03       86.20     86.20    133608     0.00     0.00  XLogInsert
 3.50       90.91      4.71   2484787     0.00     0.00  _bt_compare
 2.92       94.84      3.93    839893     0.00     0.00  hash_search
 2.77       98.57      3.73   1875815     0.00     0.00  LWLockAcquire
 1.89      101.12      2.55   1887972     0.00     0.00  LWLockRelease
 1.27      102.83      1.71    125234     0.00     0.00  _bt_getroot
 1.01      104.19      1.36    403342     0.00     0.00  PinBuffer
 1.00      105.54      1.35    840002     0.00     0.00  hash_any

I suppose that the bulk of the CPU cycles being attributed to XLogInsert
are going into the inlined CRC calculations. Maybe we need to think
twice about the cost/benefit ratio of using 64-bit CRCs to protect xlog
records that are often only a few dozen bytes.

regards, tom lane
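
For readers who want to see what a CRC costs per byte, here is a minimal sketch of a table-driven CRC. It is not PostgreSQL's code. It assumes the common reflected CRC-32 polynomial 0xEDB88320 so the result can be checked against zlib.crc32, and the names POLY, build_table, TABLE and crc32_table are made up for illustration. A 64-bit CRC would use the same loop with 64-bit table entries, which on a 32-bit CPU would likely take extra instructions at every step.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (the one zlib uses); an assumption, not PostgreSQL's


def build_table() -> list:
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = build_table()


def crc32_table(data: bytes) -> int:
    """Table-driven CRC-32: one table lookup, one shift and two XORs per input byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


# The sketch agrees with the library implementation on an arbitrary small record.
record = b"a few dozen bytes of xlog record payload"
print(crc32_table(record) == zlib.crc32(record))
```

The loop body runs once per byte, so the work grows linearly with the amount of data being protected, whether that is a small record or a full block image.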


From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-06 08:24:55
Message-ID: Pine.OSF.4.61.0503060923210.1725@kosh.hut.fi
Lists: pgsql-hackers

On Sun, 6 Mar 2005, Tom Lane wrote:

> I suppose that the bulk of the CPU cycles being attributed to XLogInsert
> are going into the inlined CRC calculations. Maybe we need to think
> twice about the cost/benefit ratio of using 64-bit CRCs to protect xlog
> records that are often only a few dozen bytes.

Isn't the CRC quite important on recovery to recognize where the last
valid log record is?

Are there any better implementations of CRC-64? Would using a different
polynomial help?

Would it help to do the CRC calculation in a more wholesale fashion in
XLogWrite?

How about switching to CRC-32 or even CRC-16? I searched the archives for
the reason CRC-64 was chosen in the first place. It seems that the
difference in computation time was not considered to be significant, and
there were 8 bytes available in the record header anyway.

Just some thoughts...

- Heikki
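
The width trade-off raised above can be made concrete with a small sketch. It assumes two CRCs that happen to ship with the Python standard library (binascii.crc_hqx for a 16-bit CRC-CCITT and zlib.crc32 for a 32-bit CRC); neither is claimed to be what PostgreSQL uses, and CANDIDATES and compare are illustrative names. For each width it reports how many header bytes the checksum needs and the chance that uniformly random garbage would happen to match a stored checksum, which is 2^-n for an n-bit check.

```python
import binascii
import zlib

# (name, width in bits, function) for two CRCs available in the standard library.
CANDIDATES = [
    ("CRC-16 (CCITT, binascii.crc_hqx)", 16, lambda data: binascii.crc_hqx(data, 0)),
    ("CRC-32 (zlib.crc32)", 32, zlib.crc32),
]


def compare(payload: bytes) -> list:
    """For each candidate return (name, header bytes used, checksum value,
    chance that uniformly random garbage matches the stored checksum)."""
    rows = []
    for name, bits, func in CANDIDATES:
        rows.append((name, bits // 8, func(payload), 2.0 ** -bits))
    return rows


for name, nbytes, value, p_miss in compare(b"x" * 40):
    print(f"{name}: {nbytes} header bytes, value {value:#x}, random match chance {p_miss:.3g}")
```

By the same arithmetic, a 64-bit CRC takes 8 header bytes and leaves a 2^-64 chance of random garbage matching.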


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-03-06 10:05:12
Message-ID: 1110103512.6117.116.camel@localhost.localdomain
Lists: pgsql-hackers

On Sun, 2005-03-06 at 00:17 -0500, Tom Lane wrote:
> I was profiling a case involving UPDATEs into a table with too many
> indexes (brought to mind by mysql's sql-bench, about which more later)
> and got this rather surprising result for routines costing more than
> 1% of the total runtime:
>
> Each sample counts as 0.01 seconds.
>    %   cumulative      self               self    total
>  time     seconds   seconds     calls   s/call   s/call  name
> 64.03       86.20     86.20    133608     0.00     0.00  XLogInsert
>  3.50       90.91      4.71   2484787     0.00     0.00  _bt_compare
>  2.92       94.84      3.93    839893     0.00     0.00  hash_search
>  2.77       98.57      3.73   1875815     0.00     0.00  LWLockAcquire
>  1.89      101.12      2.55   1887972     0.00     0.00  LWLockRelease
>  1.27      102.83      1.71    125234     0.00     0.00  _bt_getroot
>  1.01      104.19      1.36    403342     0.00     0.00  PinBuffer
>  1.00      105.54      1.35    840002     0.00     0.00  hash_any
>
> I suppose that the bulk of the CPU cycles being attributed to XLogInsert
> are going into the inlined CRC calculations. Maybe we need to think
> twice about the cost/benefit ratio of using 64-bit CRCs to protect xlog
> records that are often only a few dozen bytes.

Yes, in recent performance tests sponsored by Unisys, this result was
also very clear. In those tests we used Intel VTune to identify the
precise lines of code soaking up the cycles...it was the CRC checks.

More results should be available from the Unisys testing within a few
days.

I had assumed that most of the cost of CRC checking came from the need
to log complete blocks, rather than from the fairly small xlog records
themselves?

Best Regards, Simon Riggs


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-06-01 18:56:22
Message-ID: 87ekblucrt.fsf@stark.xeocode.com
Lists: pgsql-hackers


It occurs to me that at least on some OSes the WAL logs are being synced with
O_SYNC or its ilk. In those cases the writes should be guaranteed to be
written out in the order postgres wrote them. So if the tail end of the WAL
entry is there (is there any sort of footer?) then the entire entry must be
there. In that case is there any need to calculate the CRC at all?

I suppose it's a bit of a problem in that the database doing the replay might
not know which sync method was used to write the entries. The format would
have to stay the same. Some magic value would have to be defined to always be
correct.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-06-01 19:03:48
Message-ID: 17708.1117652628@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> It occurs to me that at least on some OSes the WAL logs are being synced with
> O_SYNC or its ilk. In those cases the writes should be guaranteed to be
> written out in the order postgres wrote them. So if the tail end of the WAL
> entry is there (is there any sort of footer?) then the entire entry must be
> there. In that case is there any need to calculate the CRC at all?

Sure. How else do you know that the entry is all there *and is valid*?
There's no "footer", and if there were it might still be garbage (e.g.,
left over from a prior cycle).

Also, I doubt that O_SYNC could do anything to guarantee write order of
the individual sectors within a page.

regards, tom lane
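
The point about validity can be illustrated with a toy log replay. This is a sketch under stated assumptions, not the real xlog format: the record layout (a 4-byte length, a 4-byte CRC-32 of the payload, then the payload) and the names HEADER, make_record, replay and tear are all hypothetical. Replay stops at the first record whose CRC does not match. The torn record below still has its correct tail bytes on disk, so a check that only looked at the end of the record would accept it.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # hypothetical layout: payload length, CRC-32 of payload


def make_record(payload: bytes) -> bytes:
    """Serialize one record: header (length, CRC) followed by the payload."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list:
    """Return the payloads of all records up to the first one that fails validation."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        end = start + length
        if length == 0 or end > len(log):
            break  # zeroed or truncated space: end of log
        payload = log[start:end]
        if zlib.crc32(payload) != crc:
            break  # torn or stale record: treat as end of valid log
        records.append(payload)
        pos = end
    return records


def tear(record: bytes, stale: bytes, offset: int) -> bytes:
    """Simulate a partial write: one slice of the record still holds stale bytes,
    while everything before and after it (including the tail) reached the disk."""
    return record[:offset] + stale + record[offset + len(stale):]


good = [make_record(b"insert tuple 1"), make_record(b"update tuple 2")]
last = make_record(b"commit transaction 3")
torn = tear(last, b"\x00\x00\x00\x00", offset=HEADER.size + 4)

print(len(replay(b"".join(good) + last)))  # all records intact
print(len(replay(b"".join(good) + torn)))  # last record torn in the middle
```

In the torn case the header and the final bytes of the record are unchanged, and only the checksum over the whole payload reveals that the middle was never written.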