Fixing domain input

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Fixing domain input
Date: 2005-07-08 21:35:43
Message-ID: 28421.1120858543@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We've seen a couple of bug reports now about how domain constraints
aren't checked during input of a parameter that's been deduced to be
of a domain type, eg
http://archives.postgresql.org/pgsql-interfaces/2005-07/msg00009.php
http://archives.postgresql.org/pgsql-bugs/2005-07/msg00084.php
There's also the long-standing bugaboo that plpgsql doesn't enforce
domain constraints.

In the first of these threads, I suggested hacking the parameter type
resolution rules so that parameters wouldn't be assigned inferred types
that are domains, but only their base types. However, that only fixes
things when the parameter type is inferred --- if it's specified as a
domain by the client, we'd still see the problem. And it does nothing
for plpgsql.

It occurs to me that a cleaner solution would be to stop giving domain
types the same typinput routines as their base types. Instead, give
them all a specialized routine domain_in (comparable to array_in) that
first invokes the base type's input function and then applies any
relevant constraint checks. Likewise for typreceive (but we'd not need
to touch the output functions). This has a number of attractions:

* Solves both cases of the domain-parameter problem.

* Since plpgsql does all type coercions by calling output and input
functions, I believe this would automatically fix the bugs in plpgsql.

* Allows us to eliminate special cases for domains in parse_coerce.c,
copy.c, possibly other places.

The main disadvantage of it is that for domains that have CHECK
constraints, it's necessary to set up an ExprContext in which the check
expressions can be evaluated; and in turn that requires an
ExecutorState, plus ExecInitExpr, etc. So there's a pretty fair amount
of setup overhead involved, and doing that repeatedly in a series of
calls is not attractive from a performance standpoint. (This may be why
we didn't do it that way originally, though I don't recall any more
whether it was even considered.)

We could eliminate this overhead in the case of COPY by adding an API
kluge that lets domain_in() detect whether it's being called inside COPY
IN, and let it piggyback on COPY's EState, so that the setup overhead is
still only paid once per COPY command.

In other scenarios such as plpgsql I'm not sure we can afford to try to
amortize the setup across multiple calls --- plpgsql is pretty cavalier
about the context it calls things in, and I think we'd see huge memory
leaks if we didn't free the EState before returning from domain_in().
Still, a slow feature is better than silently failing to apply the
constraint, which is where we are now.

Thoughts?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Fixing domain input
Date: 2005-07-11 03:47:02
Message-ID: 29754.1121053622@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> It occurs to me that a cleaner solution would be to stop giving domain
> types the same typinput routines as their base types. Instead, give
> them all a specialized routine domain_in (comparable to array_in) that
> first invokes the base type's input function and then applies any
> relevant constraint checks.

I did most of the work of coding this up, only to watch the idea
crash and burn: datatype input routines aren't called at all for
NULL values, so there's no way to enforce a NOT NULL domain constraint
from the input routine.

Currently trying to think of decent alternatives ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing domain input
Date: 2006-04-04 01:39:15
Message-ID: 11781.1144114755@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Last summer, I wrote:
[ http://archives.postgresql.org/pgsql-hackers/2005-07/msg00320.php ]
>> It occurs to me that a cleaner solution would be to stop giving domain
>> types the same typinput routines as their base types. Instead, give
>> them all a specialized routine domain_in (comparable to array_in) that
>> first invokes the base type's input function and then applies any
>> relevant constraint checks.

> I did most of the work of coding this up, only to watch the idea
> crash and burn: datatype input routines aren't called at all for
> NULL values, so there's no way to enforce a NOT NULL domain constraint
> from the input routine.

The obvious solution to this, of course, is to allow datatype input
routines to be called for NULLs. We could check the functions'
strictness flag before doing so, so that the old behavior still
prevails for any input function marked strict, which is most of 'em.

When I first thought of this, a couple days ago, my immediate worry
was that it'd open a security hole: any C-language input function that
wasn't marked strict would crash the backend if passed a null input
string. However, any such function is *already* a security hole,
because it's been possible to call it manually for some time:
select int4in(null::cstring);
So there's no additional risk --- in fact, arguably having such a
function crash during normal input of NULL would be a Good Thing,
because it would make it far more likely that the mistake would get
noticed and fixed before someone used it as a form of DOS attack.

So that seems to leave only the objection that we'd still have to change
all the places that call input functions. But it's not like we've not
done that before (several times, even :-(). And making a change to
support non-strict input functions is still way nicer than introducing
explicit knowledge of domains in all these places.

Comments?

regards, tom lane


From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing domain input
Date: 2006-04-05 02:20:48
Message-ID: 0B447D7A-E810-42AD-AA63-AABA91A5DCE5@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Apr 4, 2006, at 10:39 , Tom Lane wrote:

> So there's no additional risk --- in fact, arguably having such a
> function crash during normal input of NULL would be a Good Thing,
> because it would make it far more likely that the mistake would get
> noticed and fixed before someone used it as a form of DOS attack.

Granted, finding an error earlier than later is a Good Thing, but an
Even Better Thing would be to prevent crashing the backend, and
afaics (as far as that is) the change you propose doesn't hurt
anything. Do you see any way to do that?

I'm glad to see work being done on domains. I'm definitely learning
from the discussion.

Michael Glaesemann
grzm myrealbox com


From: Christopher Kings-Lynne <chris(dot)kings-lynne(at)calorieking(dot)com>
To: Michael Glaesemann <grzm(at)myrealbox(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing domain input
Date: 2006-04-05 02:33:14
Message-ID: 44332C6A.4050806@calorieking.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I'm glad to see work being done on domains. I'm definitely learning from
> the discussion.

I wonder if we should implement 'GRANT USAGE ON DOMAINS' for spec
compliance sometime...

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Glaesemann <grzm(at)myrealbox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing domain input
Date: 2006-04-05 02:46:01
Message-ID: 23148.1144205161@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Glaesemann <grzm(at)myrealbox(dot)com> writes:
> Granted, finding an error earlier than later is a Good Thing, but an
> Even Better Thing would be to prevent crashing the backend, and
> afaics (as far as that is) the change you propose doesn't hurt
> anything. Do you see any way to do that?

For starters we'd have to forbid any user-written C functions.
Somehow that doesn't seem like a net win.

regards, tom lane


From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixing domain input
Date: 2006-04-05 02:50:48
Message-ID: 4E7B6983-2D7A-4BDB-9922-75A98C251311@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Apr 5, 2006, at 11:46 , Tom Lane wrote:

> For starters we'd have to forbid any user-written C functions.
> Somehow that doesn't seem like a net win.

Ouch.

Michael Glaesemann
grzm myrealbox com