Re: [RFC] new digest datatypes, or generic fixed-len hex types?

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-27 11:50:30
Message-ID: 20090727115030.GA6459@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

We've developed some code to implement fixed-length datatypes for well
known digest function output (MD5, SHA1 and the various SHA2 types).
These types have minimal overhead and are quite complete, including
btree and hash opclasses.

We're wondering about proposing them for inclusion in pgcrypto. I asked
Marko Kreen but he is not sure about it; according to him it would be
better to have general fixed-length hex types. (I guess it would be
possible to implement the digest types as domains over those.)

So basically we have sha1, sha-256, sha-512 etc on one hand, and hex8,
hex16, hex32 on the other hand. In both cases there is a single body of
code that is compiled with a macro definition that provides the data
length for every separate case. (Actually in the digest code we
refactored the common routines so that each type has a light wrapper
calling a function that works on any length; this could also be done to
the fixed-len hex code as well -- that code is pretty grotty at the
moment.)

Of these two choices, which one is likely to have better acceptance
around here?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-27 14:20:18
Message-ID: 10105.1248704418@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> We've developed some code to implement fixed-length datatypes for well
> known digest function output (MD5, SHA1 and the various SHA2 types).
> These types have minimal overhead and are quite complete, including
> btree and hash opclasses.

> We're wondering about proposing them for inclusion in pgcrypto.

Wasn't this proposed and rejected before? (Or more to the point,
why'd you bother? The advantage over bytea seems negligible.)

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-27 15:12:48
Message-ID: b42b73150907270812q69f985f8x73d02a2fd198aa25@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> We've developed some code to implement fixed-length datatypes for well
>> known digest function output (MD5, SHA1 and the various SHA2 types).
>> These types have minimal overhead and are quite complete, including
>> btree and hash opclasses.
>
>> We're wondering about proposing them for inclusion in pgcrypto.
>
> Wasn't this proposed and rejected before?  (Or more to the point,
> why'd you bother?  The advantage over bytea seems negligible.)

well, one nice things about the fixed length types is that you can
keep your table from needing a toast table when you have a bytea in
it.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-27 15:36:49
Message-ID: 11267.1248709009@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Wasn't this proposed and rejected before? (Or more to the point,
>> why'd you bother? The advantage over bytea seems negligible.)

> well, one nice things about the fixed length types is that you can
> keep your table from needing a toast table when you have a bytea in
> it.

If you don't actually use the toast table, it doesn't cost anything very
noticeable ...

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-27 16:02:32
Message-ID: 4A6DCF98.7040308@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure wrote:
> On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>
>>> We've developed some code to implement fixed-length datatypes for well
>>> known digest function output (MD5, SHA1 and the various SHA2 types).
>>> These types have minimal overhead and are quite complete, including
>>> btree and hash opclasses.
>>>
>>> We're wondering about proposing them for inclusion in pgcrypto.
>>>
>> Wasn't this proposed and rejected before? (Or more to the point,
>> why'd you bother? The advantage over bytea seems negligible.)
>>
>
> well, one nice things about the fixed length types is that you can
> keep your table from needing a toast table when you have a bytea in
> it.
>
>
>

Can't you just set storage on the column to MAIN to stop it being stored
in a toast table?

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-27 17:54:59
Message-ID: b42b73150907271054v742ea9b4o535fa28bf8498706@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 27, 2009 at 12:02 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
> Merlin Moncure wrote:
>>
>> On Mon, Jul 27, 2009 at 10:20 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>>>
>>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>>
>>>>
>>>> We've developed some code to implement fixed-length datatypes for well
>>>> known digest function output (MD5, SHA1 and the various SHA2 types).
>>>> These types have minimal overhead and are quite complete, including
>>>> btree and hash opclasses.
>>>>      We're wondering about proposing them for inclusion in pgcrypto.
>>>>
>>>
>>> Wasn't this proposed and rejected before?  (Or more to the point,
>>> why'd you bother?  The advantage over bytea seems negligible.)
>>>
>>
>> well, one nice things about the fixed length types is that you can
>> keep your table from needing a toast table when you have a bytea in
>> it.
>
> Can't you just set storage on the column to MAIN to stop it being stored in
> a toast table?

of course.

hm. would the input/output functions for the fixed length types be
faster? what is the advantage of the proposal?

merlin


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-28 11:15:33
Message-ID: 200907281415.33774.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote:
> We've developed some code to implement fixed-length datatypes for well
> known digest function output (MD5, SHA1 and the various SHA2 types).
> These types have minimal overhead and are quite complete, including
> btree and hash opclasses.
>
> We're wondering about proposing them for inclusion in pgcrypto. I asked
> Marko Kreen but he is not sure about it; according to him it would be
> better to have general fixed-length hex types. (I guess it would be
> possible to implement the digest types as domains over those.)

I think equipping bytea with a length restriction would be a very natural,
simple, and useful addition. If we ever want to move the bytea type closer to
the SQL standard blob type, this will need to happen anyway.

The case for separate fixed-length data types seems very dubious, unless you
can show very impressive performance numbers. For one thing, they would make
the whole type system more complicated, or in the alternative, would have
little function and operator support.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-28 14:34:36
Message-ID: b42b73150907280734s28897444iac7f14e95c394052@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 28, 2009 at 7:15 AM, Peter Eisentraut<peter_e(at)gmx(dot)net> wrote:
> On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote:
>> We've developed some code to implement fixed-length datatypes for well
>> known digest function output (MD5, SHA1 and the various SHA2 types).
>> These types have minimal overhead and are quite complete, including
>> btree and hash opclasses.
>>
> I think equipping bytea with a length restriction would be a very natural,
> simple, and useful addition.  If we ever want to move the bytea type closer to
> the SQL standard blob type, this will need to happen anyway.
>

+1

merlin


From: decibel <decibel(at)decibel(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-29 17:16:48
Message-ID: 0ED03D5D-33F1-4220-B151-A1AE566B7533@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 28, 2009, at 6:15 AM, Peter Eisentraut wrote:
> On Monday 27 July 2009 14:50:30 Alvaro Herrera wrote:
>> We've developed some code to implement fixed-length datatypes for
>> well
>> known digest function output (MD5, SHA1 and the various SHA2 types).
>> These types have minimal overhead and are quite complete, including
>> btree and hash opclasses.
>>
>> We're wondering about proposing them for inclusion in pgcrypto. I
>> asked
>> Marko Kreen but he is not sure about it; according to him it would be
>> better to have general fixed-length hex types. (I guess it would be
>> possible to implement the digest types as domains over those.)
>
> I think equipping bytea with a length restriction would be a very
> natural,
> simple, and useful addition. If we ever want to move the bytea
> type closer to
> the SQL standard blob type, this will need to happen anyway.
>
> The case for separate fixed-length data types seems very dubious,
> unless you
> can show very impressive performance numbers. For one thing, they
> would make
> the whole type system more complicated, or in the alternative,
> would have
> little function and operator support.
bytea doesn't cast well to and from text when you're dealing with hex
data; you end up using the same amount of space as a varchar. What
would probably work well is a hex datatype that internally works like
bytea but requires that the input data is hex (I know you can use
encode/decode, but that added step is a pain). A similar argument
could be made for base64 encoded data.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: decibel <decibel(at)decibel(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [RFC] new digest datatypes, or generic fixed-len hex types?
Date: 2009-07-30 08:10:13
Message-ID: 200907301110.13603.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday 29 July 2009 20:16:48 decibel wrote:
> bytea doesn't cast well to and from text when you're dealing with hex
> data; you end up using the same amount of space as a varchar. What
> would probably work well is a hex datatype that internally works like
> bytea but requires that the input data is hex (I know you can use
> encode/decode, but that added step is a pain). A similar argument
> could be made for base64 encoded data.

There is a patch in the queue that adds hex input and output to bytea.