I wrote:
> I'm off for a little visit with oprofile...

It seems the answer is that fwrite() does have pretty significant
per-call overhead, at least on Fedora Core 4. The patch I did yesterday
still ended up making an fwrite() call every few characters when dealing
with bytea text output, because it'd effectively do two fwrite()s per
occurrence of '\' in the data being output. I've committed a further
hack that buffers a whole data row before calling fwrite(). Even though
this presumably adds one extra level of data copying, it seems to
make things noticeably faster: