Re: proposal: new contrib module plpgsql's embeded sql validator

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-08 04:31:31
Message-ID: CAFj8pRCj9AUogrQq2Na_8Mg58-YiKWN_WuNyQPdnqOt83wZ-fg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello all,

a lazy deep SQL validation inside plpgsq functions is interesting
attribute. It allows to work with temporary tables and it make testing
and debugging harder, because lot of errors in embedded queries are
detected too late. I wrote a simple module that can to help little
bit. It is based on plpgsql plugin API and it ensures a deep
validation of embedded sql early - after start of execution. I am
thinking, so this plugin is really useful and it is example of plpgsql
pluging - that is missing in contrib.

Example:

buggy function - raise error when par > 10

CREATE OR REPLACE FUNCTION public.kuku(a integer)
RETURNS integer
LANGUAGE plpgsql
AS $function$
begin
if (a > 10) then
return b + 1;
else
return a + 1;
end if;
end;
$function$

but it is works for par <= 10

postgres=# select kuku(1);
kuku
------
2
(1 row)

postgres=# load 'plpgsql';
LOAD
postgres=# load 'plpgsql_esql_checker';
LOAD
postgres=# select kuku(1);
ERROR: column "b" does not exist
LINE 1: SELECT b + 1
^
QUERY: SELECT b + 1
CONTEXT: PL/pgSQL function "kuku" line 3 at RETURN

with esql checker this bug is identified without dependency on used
parameter's value

What do you think about this idea?

The code contains a plpgsql_statement_tree walker - it should be moved
to core and used generally - statistic, coverage tests, ...

Regards

Pavel Stehule

Attachment Content-Type Size
plpgsql-checker-9.0.tgz application/x-gzip 2.8 KB

From: Jim Nasby <jim(at)nasby(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-17 20:31:45
Message-ID: 0AA9297B-E97A-4AA0-BE1E-713ABC9D65EC@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 7, 2011, at 11:31 PM, Pavel Stehule wrote:
> a lazy deep SQL validation inside plpgsq functions is interesting
> attribute. It allows to work with temporary tables and it make testing
> and debugging harder, because lot of errors in embedded queries are
> detected too late. I wrote a simple module that can to help little
> bit. It is based on plpgsql plugin API and it ensures a deep
> validation of embedded sql early - after start of execution. I am
> thinking, so this plugin is really useful and it is example of plpgsql
> pluging - that is missing in contrib.
>
> Example:
>
> buggy function - raise error when par > 10
>
>
> CREATE OR REPLACE FUNCTION public.kuku(a integer)
> RETURNS integer
> LANGUAGE plpgsql
> AS $function$
> begin
> if (a > 10) then
> return b + 1;
> else
> return a + 1;
> end if;
> end;
> $function$
>
> but it is works for par <= 10
>
> postgres=# select kuku(1);
> kuku
> ------
> 2
> (1 row)
>
> postgres=# load 'plpgsql';
> LOAD
> postgres=# load 'plpgsql_esql_checker';
> LOAD
> postgres=# select kuku(1);
> ERROR: column "b" does not exist
> LINE 1: SELECT b + 1
> ^
> QUERY: SELECT b + 1
> CONTEXT: PL/pgSQL function "kuku" line 3 at RETURN
>
> with esql checker this bug is identified without dependency on used
> parameter's value
>
> What do you think about this idea?
>
> The code contains a plpgsql_statement_tree walker - it should be moved
> to core and used generally - statistic, coverage tests, ...

I think this should at least be a contrib module; it seems very useful.

On a somewhat related note, I'd also really like to have the ability to parse things like .sql files externally, to do things like LINT checking.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-18 18:05:42
Message-ID: 1311012052-sup-1482@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Jim Nasby's message of dom jul 17 16:31:45 -0400 2011:

> On a somewhat related note, I'd also really like to have the ability to parse things like .sql files externally, to do things like LINT checking.

We talked about this during PGCon. The idea that seemed to have
consensues there was to export the parser similarly to how we build the
ecpg parser, that is, a set of perl scripts which take our gram.y as
input and modify it to emit something different. What ecpg does with it
is emit a different grammar, but it doesn't seem impossible to me to
have it emit some sort of (lint) checker.

I admit I haven't looked at the new Perl scripts we use in ecpg (they
were rewritten in the 9.1 era).

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: David Fetter <david(at)fetter(dot)org>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-19 02:30:52
Message-ID: 20110719023052.GA18335@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 18, 2011 at 02:05:42PM -0400, Alvaro Herrera wrote:
> Excerpts from Jim Nasby's message of dom jul 17 16:31:45 -0400 2011:
>
> > On a somewhat related note, I'd also really like to have the
> > ability to parse things like .sql files externally, to do things
> > like LINT checking.
>
> We talked about this during PGCon. The idea that seemed to have
> consensues there was to export the parser similarly to how we build
> the ecpg parser, that is, a set of perl scripts which take our
> gram.y as input and modify it to emit something different.

That solves the non-customized part of the problem, which is worth
solving. A complete parser for a given DB would need catalog access,
i.e. a live connection to that DB.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: david(at)fetter(dot)org
Cc: alvherre(at)commandprompt(dot)com, jim(at)nasby(dot)net, pavel(dot)stehule(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-19 02:46:48
Message-ID: 20110719.114648.244414910322696865.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> We talked about this during PGCon. The idea that seemed to have
>> consensues there was to export the parser similarly to how we build
>> the ecpg parser, that is, a set of perl scripts which take our
>> gram.y as input and modify it to emit something different.
>
> That solves the non-customized part of the problem, which is worth
> solving.

In addition to this, parsing SQL may need tree walker functions and
some support functions(e.g. memory management). These are source files
from pgpool-II's parser directory.

copyfuncs.c kwlist.h nodes.c pg_list.h pool_string.h value.c
gram.c kwlookup.c nodes.h pg_wchar.h primnodes.h value.h
gram.h list.c outfuncs.c pool_memory.c scan.c wchar.c
gramparse.h makefuncs.c parsenodes.h pool_memory.h scanner.h
keywords.c makefuncs.h parser.c pool_parser.h scansup.c
keywords.h memnodes.h parser.h pool_string.c scansup.h

> A complete parser for a given DB would need catalog access,
> i.e. a live connection to that DB.

Yes, pgpool-II does some of this. (for example, to know if particular
table is a tempory table, to expand "*" to real column lists and so on).

Just F.Y.I.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp


From: Petr Jelínek <pjmodos(at)pjmodos(dot)net>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-19 19:15:46
Message-ID: 4E25D7E2.6090805@pjmodos.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/17/2011 10:31 PM, Jim Nasby wrote:
> On Jul 7, 2011, at 11:31 PM, Pavel Stehule wrote:
>> a lazy deep SQL validation inside plpgsq functions is interesting
>> attribute. It allows to work with temporary tables and it make testing
>> and debugging harder, because lot of errors in embedded queries are
>> detected too late. I wrote a simple module that can to help little
>> bit. It is based on plpgsql plugin API and it ensures a deep
>> validation of embedded sql early - after start of execution. I am
>> thinking, so this plugin is really useful and it is example of plpgsql
>> pluging - that is missing in contrib.
> I think this should at least be a contrib module; it seems very useful.
>

Yes I agree this should be part of pg distribution.

But, I think we should add valitation hook to plpgsql plugin structure
so that you don't have to actually execute the function to check it -
curretly there are only executing hooks which is why the plugin only
works when you the func (not good for automation).

--
Petr Jelinek


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Petr Jelínek <pjmodos(at)pjmodos(dot)net>
Cc: Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-19 19:42:33
Message-ID: CAFj8pRChdnwAqXxuqAYVp48sWx401m2h3eLZaTqp3wkE5+=evg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 19. července 2011 21:15 Petr Jelínek <pjmodos(at)pjmodos(dot)net> napsal(a):
> On 07/17/2011 10:31 PM, Jim Nasby wrote:
>>
>> On Jul 7, 2011, at 11:31 PM, Pavel Stehule wrote:
>>>
>>> a lazy deep SQL validation inside plpgsq functions is interesting
>>> attribute. It allows to work with temporary tables and it make testing
>>> and debugging harder, because lot of errors in embedded queries are
>>> detected too late. I wrote a simple module that can to help little
>>> bit. It is based on plpgsql plugin API and it ensures a deep
>>> validation of embedded sql early - after start of execution. I am
>>> thinking, so this plugin is really useful and it is example of plpgsql
>>> pluging - that is missing in contrib.
>>
>> I think this should at least be a contrib module; it seems very useful.
>>
>
> Yes I agree this should be part of pg distribution.
>
> But, I think we should add valitation hook to plpgsql plugin structure so
> that you don't have to actually execute the function to check it - curretly
> there are only executing hooks which is why the plugin only works when you
> the func (not good for automation).
>

should be great, but there are still few limits in compile time

* polymorphic parameters
* triggers - there are no a info about relation in compile time

we can adapt a #option keyword for using in some plpgsql plugins

for example - for addition information that are necessary for usage of
lint in compilation time

CREATE OR REPLACE FUNCTION foo ()
RETURNS ... AS $$

#option trigger_relation some_table_name
#option replace_anyelement integer

...

with this addition info it and some compile hook it is possible

Regards

Pavel

> --
> Petr Jelinek
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelínek <pjmodos(at)pjmodos(dot)net>, Jim Nasby <jim(at)nasby(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-20 03:51:53
Message-ID: CAFj8pRC2WNKPvANGS499c4UOyLY6BWHwi2hiGMJvm6Fg1xJc7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/7/20 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> =?ISO-8859-1?Q?Petr_Jel=EDnek?= <pjmodos(at)pjmodos(dot)net> writes:
>> But, I think we should add valitation hook to plpgsql plugin structure
>> so that you don't have to actually execute the function to check it -
>> curretly there are only executing hooks which is why the plugin only
>> works when you the func (not good for automation).
>
> If you mean that such checks would be done automatically, no, they
> shouldn't be.  Consider a function that creates a table and then uses
> it, or even just depends on using a table that doesn't yet exist when
> you do CREATE FUNCTION.

yes, any deep check is not possible for function that uses a temporary tables.

A plpgsql_lint is not silver bullet - for these cases is necessary to
disable lint.

. I can't to speak generally - I have no idea, how much percent of
functions are functions with access to temporary tables - in my last
project I use 0 temp tables on cca 300 KB of plpgsql code.

The more terrible problem is a new dependency between functions. I use
a workaround - some like headers

CREATE FUNCTIONS foo(define interface here) RETURNS ... AS $$ BEGIN
RETURN; END; $$ LANGUAGE plpgsql;

....

...

--real implementation of foo
CREATE OR REPLACE FUNCTIONS foo(...)
RETURNS ...
AS ..

It works because I write a plpgsql script in hand - I don't use a dump
for plpgsql, but it is not solution for production servers. On second
hand - plpgsql_lint or some similar (and builtin or external) should
not be active on production servers. A planning only really processed
queries is necessary optimization if we have not a global plan cache.

Regards

Pavel

>
>                        regards, tom lane
>


From: Petr Jelínek <pjmodos(at)pjmodos(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-20 17:09:52
Message-ID: 4E270BE0.1080503@pjmodos.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/20/2011 05:24 AM, Tom Lane wrote:
> =?ISO-8859-1?Q?Petr_Jel=EDnek?=<pjmodos(at)pjmodos(dot)net> writes:
>> But, I think we should add valitation hook to plpgsql plugin structure
>> so that you don't have to actually execute the function to check it -
>> curretly there are only executing hooks which is why the plugin only
>> works when you the func (not good for automation).
> If you mean that such checks would be done automatically, no, they
> shouldn't be. Consider a function that creates a table and then uses
> it, or even just depends on using a table that doesn't yet exist when
> you do CREATE FUNCTION.

No, certainly not by default, I would just like to be able to do it
automatically without having to call the function.

--
Petr Jelinek


From: Jim Nasby <jim(at)nasby(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelínek <pjmodos(at)pjmodos(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-21 22:00:50
Message-ID: CD8B5F74-2D12-4B23-A5A7-F61808AF78C0@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 19, 2011, at 10:51 PM, Pavel Stehule wrote:
>> If you mean that such checks would be done automatically, no, they
>> shouldn't be. Consider a function that creates a table and then uses
>> it, or even just depends on using a table that doesn't yet exist when
>> you do CREATE FUNCTION.
>
> yes, any deep check is not possible for function that uses a temporary tables.
>
> A plpgsql_lint is not silver bullet - for these cases is necessary to
> disable lint.
>
> . I can't to speak generally - I have no idea, how much percent of
> functions are functions with access to temporary tables - in my last
> project I use 0 temp tables on cca 300 KB of plpgsql code.
>
> The more terrible problem is a new dependency between functions. I use
> a workaround - some like headers

You can work around temp table issues the same way: just define the temp table before you create the function.

In practice, if I have a function that depends on a temp table it either creates it itself if it doesn't already exist or I have a separate function to create the table; that way you have a single place that has the temp table definition, and that is in the database itself.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Petr Jelínek <pjmodos(at)pjmodos(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: new contrib module plpgsql's embeded sql validator
Date: 2011-07-22 04:50:30
Message-ID: CAFj8pRBA6Unt7ayFh1MtO8+hWPs3CQgFsZBE0dU1ixCa3uOh2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/7/22 Jim Nasby <jim(at)nasby(dot)net>:
> On Jul 19, 2011, at 10:51 PM, Pavel Stehule wrote:
>>> If you mean that such checks would be done automatically, no, they
>>> shouldn't be.  Consider a function that creates a table and then uses
>>> it, or even just depends on using a table that doesn't yet exist when
>>> you do CREATE FUNCTION.
>>
>> yes, any deep check is not possible for function that uses a temporary tables.
>>
>> A plpgsql_lint is not silver bullet - for these cases is necessary to
>> disable lint.
>>
>> . I can't to speak generally - I have no idea, how much percent of
>> functions are functions with access to temporary tables - in my last
>> project I use 0 temp tables on cca 300 KB of plpgsql code.
>>
>> The more terrible problem is a new dependency between functions. I use
>> a workaround - some like headers
>
> You can work around temp table issues the same way: just define the temp table before you create the function.
>
> In practice, if I have a function that depends on a temp table it either creates it itself if it doesn't already exist or I have a separate function to create the table; that way you have a single place that has the temp table definition, and that is in the database itself.

there is other trick - use a persistent table with same name before.
Runtime temporary table is near in search_path, so all executed SQL
will be related to this temp table.

Pavel

> --
> Jim C. Nasby, Database Architect                   jim(at)nasby(dot)net
> 512.569.9461 (cell)                         http://jim.nasby.net
>
>
>