resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)

From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Date: 2011-03-26 03:50:21
Message-ID: 4D8D627D.9010001@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Mar 25, 2011, at 9:22 PM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
>> Tom,
>>
>>> Personally I'd vote for *not* having any such dangerous semantics as
>>> that. We should have learned better by now from plpgsql experience.
>>> I think the best idea is to throw error for ambiguous references,
>>> period.
>> As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions, when PL/pgSQL functions are throwing an error. I'd end up spending hours debugging this.
>>
>> Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work.
>
> Because an identifier that previously referred unambiguously to a column might now be ambiguous, if there is a parameter with the same name.

I mention 2 possible solutions here, both which involve syntax alterations, each
between the ---------- lines. I personally like the second/lower option more.

------------

Might it be reasonable, perhaps as a 9.2 feature, to add top-level-namespaces so
that one could always explicitly qualify what they are referring to?

For example, you could have the 3 "sch", "lex", "attr" (I may have missed some
useful ones).

The "sch" TLN would unambiguously refer directly to a schema object, such as a
database table.

The "lex" TLN would unambiguously refer directly to a lexical, either a
parameter of the current routine or to a lexical variable.

The "attr" TLN would unambiguously refer to a table/etc column/attribute in the
manner typical for SQL.

Use them like:

sch.foo - the table/etc foo
lex.foo - the lexical variable foo
attr.foo - the column foo

Use of these TLN are optional where there is no ambiguity.

The TLN are not reserved words, but if one has an entity named the same, then
references to it must be TLN-qualified; eg:

lex.sch
lex.lex
lex.attr

Now these are just examples. You may find a different set works better.

--------------

There are also alternate solutions.

For example, it could be mandated that lexical-scope aliases for any
data/var-like schema object are required in routines, where the aliases are
distinct from all lexical vars/params/etc, and then all SQL/code in the routines
may only refer to the schema objects by the aliases.

Effectively this makes it so that routines can no longer see non-lexical vars
but for those from parameters, and this aliasing is defining a parameter whose
argument is supplied by the DBMS automatically rather than as an explicit
routine caller argument.

That way, inside a routine body there are only lexical names for things, and so
no namespace-qualification is ever needed by the regular SQL.

Similarly, if you always think of table column names as referring to an
attribute or element of a table variable, then just reference the column
qualified by the table name (or the lexical alias thereof). Same as you do in
any other programming language. Of course, sometimes you don't have to qualify
column name references as context could make it unambiguous. Or, a shorthand
like a simple leading "." could unambiguously say you're referring to a column
of the particular table in context.

With those in place, all unqualified references are straight to lexical
variables or parameters.

And so, this is also an effective way to resolve the ambiguity and I prefer the
latter design personally.

Here's an example in quasi-PL/PgSQL:

create function myrtn (myparam integer, mytbl ::= mydb.myschema.mytbl) as
declare
myvar integer := 5;
$body$
begin
select (.mycol + myvar * myparam) as mynewcol from mytbl;
end;
$body$

Note that I've already thought through this last example as these methods of
avoiding ambiguity are loosely-speaking how my language Muldis D avoids the
problem faced by many SQL procedures.

The ".mycol" syntax specifically was inspired originally for me by Perl 6 where
the lack of something just before the "." means that the implicit topic variable
is referred to, like if you said "$_.mycol".

A Perl 6 analogy being something like:

$mytbl.map:{ .mycol + $myvar * $myparam }

aka:

$mytbl.map:{ $_.mycol + $myvar * $myparam }

--------------

-- Darren Duncan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-03-26 04:36:43 Open issues for collations
Previous Message Tom Lane 2011-03-26 03:23:42 Re: WIP: Allow SQL-language functions to reference parameters by parameter name