Re: Fixed length data types issue

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: Mark Dilger <pgsql(at)markdilger(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Fixed length data types issue
Date: 2006-09-14 16:41:58
Message-ID: 45098656.3050607@markdilger.com
Lists: pgsql-hackers

My apologies if you are seeing this twice. I posted it last night, but
it still does not appear to have made it to the group.

Mark Dilger wrote:
> Tom Lane wrote:
>> Mark Dilger <pgsql(at)markdilger(dot)com> writes:
>>> Tom Lane wrote:
>>>> Please provide a stack trace --- AFAIK there shouldn't be any reason why
>>>> a pass-by-ref 3-byte type wouldn't work.
>>
>>> (gdb) bt
>>> #0 0xb7e01d45 in memcpy () from /lib/libc.so.6
>>> #1 0x08077ece in heap_fill_tuple (tupleDesc=0x83c2ef7,
>>> values=0x83c2e84, isnull=0x83c2e98 "", data=0x83c2ef4 "",
>>> infomask=0x83c2ef0, bit=0x0)
>>> at heaptuple.c:181
>>
>> Hm, are you sure you provided a valid pointer (not the integer value
>> itself) as the Datum output from int3_in?
>>
>> (Looks at patch ... ) Um, I think you didn't, although that coding
>> is far too cute to be actually readable ...
>>
>> regards, tom lane
>
> Ok, I have it working on my Intel-architecture machine. Here are some
> of my findings. Disk usage is calculated by running 'du -b' in
> /usr/local/pgsql/data before and after loading the table, and taking the
> difference. That directory is deleted and recreated, and initdb is rerun,
> between each test. The host system is a dual-processor, dual-core 2.4
> GHz machine with 2 GB of DDR400 memory and a 10,000 RPM Ultra160 SCSI hard
> drive, running with the default postgresql.conf file as created by initdb.
> The code is the stock postgresql-8.1.4 release tarball, compiled with gcc
> and configured without the debug or cassert options enabled.
>
>
> INT3 VS INT4
> ------------
> Using a table of 8 integers per row and 16777216 rows, I can drop the
> disk usage from 1.2 GB down to 1.0 GB by defining those integers as int3
> rather than int4. (It works out to about 70.5 bytes per row vs. 62.5
> bytes per row.) However, the load time actually increases, from 197
> seconds to 213 seconds, probably due to extra CPU/memory work. Note that
> int3 is defined pass-by-reference due to a limitation in the code that
> prevents pass-by-value for any data size other than 1, 2, or 4 bytes.
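
As a rough cross-check of the figures quoted above, the following sketch recomputes the totals from the reported bytes-per-row averages. It assumes only the numbers given in the message (70.5 and 62.5 bytes per row, 16777216 rows, 8 columns); the helper name `total_bytes` is illustrative, not anything from the patch.

```python
ROWS = 16_777_216  # 2**24 rows, as in the test described above
COLUMNS = 8        # integer columns per row


def total_bytes(bytes_per_row: float, rows: int = ROWS) -> float:
    """Total on-disk size implied by an average bytes-per-row figure."""
    return bytes_per_row * rows


int4_total = total_bytes(70.5)  # reported average with int4 columns
int3_total = total_bytes(62.5)  # reported average with int3 columns

saved_per_row = 70.5 - 62.5                          # bytes saved per row
saved_per_column = saved_per_row / COLUMNS           # bytes saved per column
saved_total_mib = (int4_total - int3_total) / 2**20  # total saving in MiB

print(f"int4 total: {int4_total / 1e9:.2f} GB")
print(f"int3 total: {int3_total / 1e9:.2f} GB")
print(f"saved: {saved_per_row} bytes/row, {saved_per_column} byte/column, "
      f"{saved_total_mib} MiB overall")
```

The saving works out to one byte per column, which is what one would expect if no padding is inserted between adjacent int3 columns in this particular table.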
>
> Using a table of only one integer per row, the table size is exactly the
> same (down to the byte) whether I use int3 or int4. I suspect this is
> due to data alignment for the row being on at least a 4 byte boundary.
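
The alignment explanation can be illustrated with a small sketch. It assumes the row's data area is rounded up to exactly a 4-byte boundary (the message only says "at least" 4 bytes); `align_up` is a hypothetical helper, similar in spirit to PostgreSQL's TYPEALIGN macro, and is not taken from the patch.

```python
def align_up(length: int, alignment: int) -> int:
    """Round length up to the next multiple of alignment (a power of two)."""
    return (length + alignment - 1) & ~(alignment - 1)


ROW_ALIGNMENT = 4  # assumption: row data padded to a 4-byte boundary

one_int3 = align_up(1 * 3, ROW_ALIGNMENT)    # 3 data bytes, padded
one_int4 = align_up(1 * 4, ROW_ALIGNMENT)    # 4 data bytes, no padding needed
eight_int3 = align_up(8 * 3, ROW_ALIGNMENT)  # 24 data bytes
eight_int4 = align_up(8 * 4, ROW_ALIGNMENT)  # 32 data bytes

print(one_int3, one_int4)      # single-column rows end up the same size
print(eight_int3, eight_int4)  # eight-column rows still differ
```

Under that assumption a single int3 is padded back up to the size of an int4, while eight of them together still save 8 bytes per row.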
>
> Creating an index on a single column of the 8-integer-per-row table, the
> index size is exactly the same whether the integers are int3 or int4.
> Once again, I suspect that data alignment is eliminating the space savings.
>
> I haven't tested this, but I suspect that if the column following an
> int3 is aligned on a 4 or 8 byte boundary, the int3 column will be
> padded with an extra byte and hence will yield no gain.
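
The untested suspicion above can at least be modelled. This is a minimal sketch, assuming int3 is declared with 1-byte alignment and int4 with 4-byte alignment (the message does not state the alignment used for int3); `data_length` is a hypothetical helper that places each column at the next offset satisfying its alignment.

```python
def data_length(columns):
    """Bytes used by a sequence of (size, alignment) columns,
    counting the padding inserted before each column."""
    offset = 0
    for size, alignment in columns:
        # Round the current offset up to this column's alignment.
        offset = (offset + alignment - 1) // alignment * alignment
        offset += size
    return offset


INT3 = (3, 1)  # assumption: 3 bytes, 1-byte alignment
INT4 = (4, 4)  # 4 bytes, 4-byte alignment

print(data_length([INT3, INT4]))  # int3 followed by an aligned int4
print(data_length([INT4, INT4]))  # the all-int4 equivalent
print(data_length([INT3, INT3]))  # adjacent int3 columns need no padding
```

In this model an int3 followed by an int4 occupies as many bytes as two int4 columns, so the byte is only saved when the next column has no stricter alignment.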
>
>
> INT1 VS INT2
> ------------
> Once again using a table of 8 integers per row and 16777216 rows, I can
> drop the disk usage from 909 MB down to 774 MB by defining those
> integers as int1 rather than int2. (54 bytes per row vs 46 bytes per
> row.) The load time also drops, from 179 seconds to 159 seconds. Note
> that int1 is defined pass-by-value.
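
For comparison, the load-time changes reported in the two experiments can be put side by side. The sketch below uses only the timings quoted in the message; `percent_change` is an illustrative helper. It does not establish why the timings differ, though the message notes that int3 is pass-by-reference while int1 is pass-by-value.

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


int3_vs_int4 = percent_change(197, 213)  # int4 load time -> int3 load time
int1_vs_int2 = percent_change(179, 159)  # int2 load time -> int1 load time

print(f"int3 instead of int4: {int3_vs_int4:+.1f}% load time")
print(f"int1 instead of int2: {int1_vs_int2:+.1f}% load time")
```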
>
>
> mark
