Re: Proposal for XML Schema Validation

Lists: pgsql-hackers
From: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>, "rc(at)iiitb(dot)ac(dot)in" <rc(at)iiitb(dot)ac(dot)in>
Subject: Proposal for XML Schema Validation
Date: 2013-08-08 04:42:11
Message-ID: b008b4308f5c45b4a3effc21fd94bc91@SINPR01MB057.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello pgsql-hackers ,

With reference to "Add XML Schema validation and xmlvalidate functions (SQL:2008)" in ToDo list, we have gone through pgsql-mailing list but we didn't find any significant work in this area, so we are proposing our own model for xml schema validation . please kindly go through it and let us know how can we improve it.Please find the attached proposal document.

Thanks,
Kodamasimham Pridhvi
International Institute of Information Technology, Bangalore

Attachment Content-Type Size
proposal_v1.pdf application/pdf 276.3 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>, "rc(at)iiitb(dot)ac(dot)in" <rc(at)iiitb(dot)ac(dot)in>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-08 13:24:02
Message-ID: 52039BF2.5050709@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 08/08/2013 12:42 AM, Kodamasimham Pridhvi (MT2012066) wrote:
> Hello pgsql-hackers ,
> With reference to "Add XML Schema validation and xmlvalidate
> functions (SQL:2008)" in ToDo list, we have gone through pgsql-mailing
> list but we didn't find any significant work in this area, so we are
> proposing our own model for xml schema validation . please kindly go
> through it and let us know how can we improve it.Please find the
> attached proposal document.
>
>
>
>

Please post your proposal as text, not as a PDF attachment. That's what
is preferred on this mailing list.

cheers

andrew


From: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-08 16:39:26
Message-ID: da983329e46f481f97bc9f5bc7205f00@HKNPR01MB050.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Objective: To Add XML Schema validation and xmlvalidate functions (SQL:2008)

Description:
We’ve gone through current support of xml in postgreSQL and found that there is a check for well-formedness of xml document while inserting and updating. We want to extend this feature by adding xml schema validation.
We will be providing user with DDL commands for creating and deleting XML Schema, also provision of associating xml schema with table while creation of new table or while altering table structure, we are planning to use libxml2 library. Proposed syntax is given below.

1. End user perspective:

1.1 DDL
1.1.1 Register xmlschema
Syntax
REGSITER_XML_SCHEMA( “<URL OF SCHEMA>”, “<NAMESPACE>” , “<CONTENT OF SCHEMA .XSD FILE>")
We will save this information into system catalog

1.1.2 Delete xmlschema
Syntax
DELETE_XML_SCHEMA( “<URL OF SCHEMA>”)

1.1.3 Modification in Create Table commands
Syntax
Create table <tablename> (<col_name> <datatype>, <col_name> xml USE_SCHEMA <URL OF SCHEMA> )
We will keep a flag in catalog for xml schema validation for each table. If xml schema is specified then while every insert/update sql query we will call valdate_xml_schema() [currently built in xml_is_well_formed() is called while inserting/updating, we can place our function call just next to it]

1.1.4 Similarly for Alter Table commands

2. Developer perspective

2.1. C-Function for Validation of xml doc

2.1.1 Validating XML

Syntax
Int<err_code> validate_xml_schema(char xml[], char xml_schema[])
This function will return 0 if validate successfully else return respective error code (which we will define later)
We are planning to use libxml2
This function will called while insert/update sql query

________________________________________
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
Sent: Thursday, August 08, 2013 6:54 PM
To: Kodamasimham Pridhvi (MT2012066)
Cc: pgsql-hackers(at)postgresql(dot)org; Bisen Vikrantsingh Mohansingh MT2012036; rc(at)iiitb(dot)ac(dot)in
Subject: Re: [HACKERS] Proposal for XML Schema Validation

On 08/08/2013 12:42 AM, Kodamasimham Pridhvi (MT2012066) wrote:
> Hello pgsql-hackers ,
> With reference to "Add XML Schema validation and xmlvalidate
> functions (SQL:2008)" in ToDo list, we have gone through pgsql-mailing
> list but we didn't find any significant work in this area, so we are
> proposing our own model for xml schema validation . please kindly go
> through it and let us know how can we improve it.Please find the
> attached proposal document.
>
>
>
>

Please post your proposal as text, not as a PDF attachment. That's what
is preferred on this mailing list.

cheers

andrew


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-09 02:57:03
Message-ID: 52045A7F.3000604@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/09/2013 12:39 AM, Kodamasimham Pridhvi (MT2012066) wrote:
>
> Objective: To Add XML Schema validation and xmlvalidate functions (SQL:2008)
>
> Description:
> We’ve gone through current support of xml in postgreSQL and found that there is a check for well-formedness of xml document while inserting and updating. We want to extend this feature by adding xml schema validation.
> We will be providing user with DDL commands for creating and deleting XML Schema, also provision of associating xml schema with table while creation of new table or while altering table structure, we are planning to use libxml2 library. Proposed syntax is given below.

The first thing that comes to mind here is "what if the user wants to
update/replace the schema" ? How would you handle re-validating the fields?

Sure, updating XML schemas is not a great idea, but it doesn't stop
people doing it. It might be reasonable to say "if you want to do this
you have to drop the dependent constraints, drop the schema, re-create
the schema and re-create the schema constraints" though.

Why extend the create table / alter table syntax with "USE_SCHEMA"? Is
there a compatibility/standards reason to do this? If not, what
advantage does this provide over using a suitable CHECK constraint?

IIRC there were some memory management issues with libxml2 in Pg. Anyone
remember anything about that?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-09 09:24:21
Message-ID: 6a028c4a09e741a287dcd6af5fe89384@HKXPR01MB055.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Craig Ringer,

yeah, you are right indeed. I tried to answer your question in three section as below.

(A) XML Schema update

User may wish to update schema in future. So we will provide them one more function

UPDATE_XML_SCHEMA("<URL OF SCHEMA>","<NAMESPACE>","<NEW CONTENT OF .XSD>")

Here we are assuming URL OF SCHEMA as a primary key. And we will overwrite content of .xsd field no provision
of altering a selective portion xsd in place.

Whenever an update to schema happens
before committing changes we will run small algo as below
1. For all table in which this schema is used
2. if(!validate xml document for each row)
3. abort/exit with error;
4. commit

If user modify schema by adding some extra optional tags then their won't be any error
,error will arise in cases such as adding new compulsory/required tags, datatype in .xsd for
certain tag is modified. This is beyond our control an obvious solution as suggested by you
could be used, user will manually go through rows which are violating schema (for simplicity,
we will mention row number which are violating schema in our error message) and do
modification/deletion as required.

(
similar case happen, suppose you have a table t(a,b,c) with lot of data, later on
you want to add primary key constraints to column 'a', but if data in column 'a' is
not unique then it may fail, and user has to manually handle this situation may be by deleting or
modifying respective rows.
)

(B) Alter Table

Only sole purpose of making use of keyword USE_SCHEMA is to mimic oracle (somewhere on
oracle site i found this type of syntax), I may not be correct but check constraint is only used to
limit the value ranges. So it is better to introduce new meaningful keyword or else no problem
to work embed this feature with CHECK()

(C)

yes , there are memory management related issue with libxml as mentioned on below link
http://wiki.postgresql.org/wiki/XML_Support#Implementation_Issues
It is also mention there that this issue can be resolved(how? don't know!).

Thanks,
Vikrantsingh & Pridhvi
IIIT Bangalore
________________________________________
From: Craig Ringer <craig(at)2ndquadrant(dot)com>
Sent: Friday, August 09, 2013 8:27 AM
To: Kodamasimham Pridhvi (MT2012066)
Cc: pgsql-hackers(at)postgresql(dot)org; Bisen Vikrantsingh Mohansingh MT2012036
Subject: Re: [HACKERS] Proposal for XML Schema Validation

On 08/09/2013 12:39 AM, Kodamasimham Pridhvi (MT2012066) wrote:
>
> Objective: To Add XML Schema validation and xmlvalidate functions (SQL:2008)
>
> Description:
> We’ve gone through current support of xml in postgreSQL and found that there is a check for well-formedness of xml document while inserting and updating. We want to extend this feature by adding xml schema validation.
> We will be providing user with DDL commands for creating and deleting XML Schema, also provision of associating xml schema with table while creation of new table or while altering table structure, we are planning to use libxml2 library. Proposed syntax is given below.

The first thing that comes to mind here is "what if the user wants to
update/replace the schema" ? How would you handle re-validating the fields?

Sure, updating XML schemas is not a great idea, but it doesn't stop
people doing it. It might be reasonable to say "if you want to do this
you have to drop the dependent constraints, drop the schema, re-create
the schema and re-create the schema constraints" though.

Why extend the create table / alter table syntax with "USE_SCHEMA"? Is
there a compatibility/standards reason to do this? If not, what
advantage does this provide over using a suitable CHECK constraint?

IIRC there were some memory management issues with libxml2 in Pg. Anyone
remember anything about that?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-09 09:41:50
Message-ID: ed3acce13bc5441984af04ee868011c9@HKXPR01MB055.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Craig Ringer,

yeah, you are right indeed. I tried to answer your question in three section as below.

(A) XML Schema update

User may wish to update schema in future. So we will provide them one more function

UPDATE_XML_SCHEMA("<URL OF SCHEMA>","<NAMESPACE>","<NEW CONTENT OF .XSD>")

Here we are assuming URL OF SCHEMA as a primary key. And we will overwrite content of .xsd field no provision
of altering a selective portion xsd in place.

Whenever an update to schema happens
before committing changes we will run small algo as below
1. For all table in which this schema is used
2. if(!validate xml document for each row)
3. abort/exit with error;
4. commit

If user modify schema by adding some extra optional tags then their won't be any error
,error will arise in cases such as adding new compulsory/required tags, datatype in .xsd for
certain tag is modified. This is beyond our control an obvious solution as suggested by you
could be used, user will manually go through rows which are violating schema (for simplicity,
we will mention row number which are violating schema in our error message) and do
modification/deletion as required.

(
similar case happen, suppose you have a table t(a,b,c) with lot of data, later on
you want to add primary key constraints to column 'a', but if data in column 'a' is
not unique then it may fail, and user has to manually handle this situation may be by deleting or
modifying respective rows.
)

(B) Alter Table

Only sole purpose of making use of keyword USE_SCHEMA is to mimic oracle (somewhere on
oracle site i found this type of syntax), I may not be correct but check constraint is only used to
limit the value ranges. So it is better to introduce new meaningful keyword or else no problem
to work embed this feature with CHECK()

(C)

yes , there are memory management related issue with libxml as mentioned on below link
http://wiki.postgresql.org/wiki/XML_Support#Implementation_Issues
It is also mention there that this issue can be resolved(how? don't know!).

Thanks,
Vikrantsingh & Pridhvi
IIIT Bangalore
________________________________________
From: Craig Ringer <craig(at)2ndquadrant(dot)com>
Sent: Friday, August 09, 2013 8:27 AM
To: Kodamasimham Pridhvi (MT2012066)
Cc: pgsql-hackers(at)postgresql(dot)org; Bisen Vikrantsingh Mohansingh MT2012036
Subject: Re: [HACKERS] Proposal for XML Schema Validation

On 08/09/2013 12:39 AM, Kodamasimham Pridhvi (MT2012066) wrote:
>
> Objective: To Add XML Schema validation and xmlvalidate functions (SQL:2008)
>
> Description:
> We’ve gone through current support of xml in postgreSQL and found that there is a check for well-formedness of xml document while inserting and updating. We want to extend this feature by adding xml schema validation.
> We will be providing user with DDL commands for creating and deleting XML Schema, also provision of associating xml schema with table while creation of new table or while altering table structure, we are planning to use libxml2 library. Proposed syntax is given below.

The first thing that comes to mind here is "what if the user wants to
update/replace the schema" ? How would you handle re-validating the fields?

Sure, updating XML schemas is not a great idea, but it doesn't stop
people doing it. It might be reasonable to say "if you want to do this
you have to drop the dependent constraints, drop the schema, re-create
the schema and re-create the schema constraints" though.

Why extend the create table / alter table syntax with "USE_SCHEMA"? Is
there a compatibility/standards reason to do this? If not, what
advantage does this provide over using a suitable CHECK constraint?

IIRC there were some memory management issues with libxml2 in Pg. Anyone
remember anything about that?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: FW: Proposal for XML Schema Validation
Date: 2013-08-09 09:47:02
Message-ID: feec6d4b36004a2ba8b5e0118704148c@SINPR01MB057.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Craig Ringer,

(Sorry for reposting it as it got posted in different thread previously)
yeah, you are right indeed. I tried to answer your question in three section as below.

(A) XML Schema update

User may wish to update schema in future. So we will provide them one more function

UPDATE_XML_SCHEMA("<URL OF SCHEMA>","<NAMESPACE>","<NEW CONTENT OF .XSD>")

Here we are assuming URL OF SCHEMA as a primary key. And we will overwrite content of .xsd field no provision
of altering a selective portion xsd in place.

Whenever an update to schema happens
before committing changes we will run small algo as below
1. For all table in which this schema is used
2. if(!validate xml document for each row)
3. abort/exit with error;
4. commit

If user modify schema by adding some extra optional tags then their won't be any error
,error will arise in cases such as adding new compulsory/required tags, datatype in .xsd for
certain tag is modified. This is beyond our control an obvious solution as suggested by you
could be used, user will manually go through rows which are violating schema (for simplicity,
we will mention row number which are violating schema in our error message) and do
modification/deletion as required.

(
similar case happen, suppose you have a table t(a,b,c) with lot of data, later on
you want to add primary key constraints to column 'a', but if data in column 'a' is
not unique then it may fail, and user has to manually handle this situation may be by deleting or
modifying respective rows.
)

(B) Alter Table

Only sole purpose of making use of keyword USE_SCHEMA is to mimic oracle (somewhere on
oracle site i found this type of syntax), I may not be correct but check constraint is only used to
limit the value ranges. So it is better to introduce new meaningful keyword or else no problem
to work embed this feature with CHECK()

(C)

yes , there are memory management related issue with libxml as mentioned on below link
http://wiki.postgresql.org/wiki/XML_Support#Implementation_Issues
It is also mention there that this issue can be resolved(how? don't know!).

Thanks,
Vikrantsingh & Pridhvi
IIIT Bangalore
________________________________________
From: Craig Ringer <craig(at)2ndquadrant(dot)com>
Sent: Friday, August 09, 2013 8:27 AM
To: Kodamasimham Pridhvi (MT2012066)
Cc: pgsql-hackers(at)postgresql(dot)org; Bisen Vikrantsingh Mohansingh MT2012036
Subject: Re: [HACKERS] Proposal for XML Schema Validation

On 08/09/2013 12:39 AM, Kodamasimham Pridhvi (MT2012066) wrote:
>
> Objective: To Add XML Schema validation and xmlvalidate functions (SQL:2008)
>
> Description:
> We’ve gone through current support of xml in postgreSQL and found that there is a check for well-formedness of xml document while inserting and updating. We want to extend this feature by adding xml schema validation.
> We will be providing user with DDL commands for creating and deleting XML Schema, also provision of associating xml schema with table while creation of new table or while altering table structure, we are planning to use libxml2 library. Proposed syntax is given below.

The first thing that comes to mind here is "what if the user wants to
update/replace the schema" ? How would you handle re-validating the fields?

Sure, updating XML schemas is not a great idea, but it doesn't stop
people doing it. It might be reasonable to say "if you want to do this
you have to drop the dependent constraints, drop the schema, re-create
the schema and re-create the schema constraints" though.

Why extend the create table / alter table syntax with "USE_SCHEMA"? Is
there a compatibility/standards reason to do this? If not, what
advantage does this provide over using a suitable CHECK constraint?

IIRC there were some memory management issues with libxml2 in Pg. Anyone
remember anything about that?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Bisen Vikrantsingh Mohansingh MT2012036" <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-09 09:55:18
Message-ID: 09014d1071af4c398d67f8c26bda6a6d@SINPR01MB057.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Craig Ringer,

(Sorry for reposting it as it got posted in different thread previously)
yeah, you are right indeed. I tried to answer your question in three section as below.

(A) XML Schema update

User may wish to update schema in future. So we will provide them one more function

UPDATE_XML_SCHEMA("<URL OF SCHEMA>","<NAMESPACE>","<NEW CONTENT OF .XSD>")

Here we are assuming URL OF SCHEMA as a primary key. And we will overwrite content of .xsd field no provision
of altering a selective portion xsd in place.

Whenever an update to schema happens
before committing changes we will run small algo as below
1. For all table in which this schema is used
2. if(!validate xml document for each row)
3. abort/exit with error;
4. commit

If user modify schema by adding some extra optional tags then their won't be any error
,error will arise in cases such as adding new compulsory/required tags, datatype in .xsd for
certain tag is modified. This is beyond our control an obvious solution as suggested by you
could be used, user will manually go through rows which are violating schema (for simplicity,
we will mention row number which are violating schema in our error message) and do
modification/deletion as required.

(
similar case happen, suppose you have a table t(a,b,c) with lot of data, later on
you want to add primary key constraints to column 'a', but if data in column 'a' is
not unique then it may fail, and user has to manually handle this situation may be by deleting or
modifying respective rows.
)

(B) Alter Table

Only sole purpose of making use of keyword USE_SCHEMA is to mimic oracle (somewhere on
oracle site i found this type of syntax), I may not be correct but check constraint is only used to
limit the value ranges. So it is better to introduce new meaningful keyword or else no problem
to work embed this feature with CHECK()

(C)

yes , there are memory management related issue with libxml as mentioned on below link
http://wiki.postgresql.org/wiki/XML_Support#Implementation_Issues
It is also mention there that this issue can be resolved(how? don't know!).

Thanks,
Pridhvi & Vikrantsingh
IIIT Bangalore
________________________________________
From: Craig Ringer <craig(at)2ndquadrant(dot)com>
Sent: Friday, August 09, 2013 8:27 AM
To: Kodamasimham Pridhvi (MT2012066)
Cc: pgsql-hackers(at)postgresql(dot)org; Bisen Vikrantsingh Mohansingh MT2012036
Subject: Re: [HACKERS] Proposal for XML Schema Validation

On 08/09/2013 12:39 AM, Kodamasimham Pridhvi (MT2012066) wrote:
>
> Objective: To Add XML Schema validation and xmlvalidate functions (SQL:2008)
>
> Description:
> We’ve gone through current support of xml in postgreSQL and found that there is a check for well-formedness of xml document while inserting and updating. We want to extend this feature by adding xml schema validation.
> We will be providing user with DDL commands for creating and deleting XML Schema, also provision of associating xml schema with table while creation of new table or while altering table structure, we are planning to use libxml2 library. Proposed syntax is given below.

The first thing that comes to mind here is "what if the user wants to
update/replace the schema" ? How would you handle re-validating the fields?

Sure, updating XML schemas is not a great idea, but it doesn't stop
people doing it. It might be reasonable to say "if you want to do this
you have to drop the dependent constraints, drop the schema, re-create
the schema and re-create the schema constraints" though.

Why extend the create table / alter table syntax with "USE_SCHEMA"? Is
there a compatibility/standards reason to do this? If not, what
advantage does this provide over using a suitable CHECK constraint?

IIRC there were some memory management issues with libxml2 in Pg. Anyone
remember anything about that?

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
To: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-09 10:03:37
Message-ID: 278c3407cf634697bce8e18ed43d843f@HKXPR01MB055.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi ,

In support to proposal, I'm extending by providing use case scenario.

Consider a table student(id int, profile xml) where you are storing student id and their profile in xml format.
Let xml format for profile be as follow.

========profile.xml===========
<student>
<name>
<firstname>xyz</firstname>
<lastname>abc</lastname>
</name>
<age>20</age>
<course>DB101</course>
<address>
<city>bangalore</city>
<pincode>560100</pincode>
</address>
</student>
============================

There may a situation where course (tag) may be missing in xml file. The insertion of this file will not prompt any error message (even though it is one of important field) because current version of postgresql only check for well-formedness of xml document, no provision for validation against xml schema.

Initially user has to register his schema by using our function REGSITER_XML_SCHEMA( “<URL OF SCHEMA>”, “<NAMESPACE>” , “<CONTENT OF SCHEMA .XSD FILE>") where user has to provide the URI for the schema , Namespace to avoid namespace clashes and need to provide the complete schema in text and while creating the table user must associate the schema with column as below :

Create table <tablename> (<col_name> <datatype>, <col_name> xml USE_SCHEMA <URL OF SCHEMA> );

eg: Create table student (id int, profile xml USE_SCHEMA "http://www.example.com/profile");

So we are providing facility to register xml schema against a column with a datatype xml in a table, so that when ever an insertion in that column happens query processor (or which ever unit involved in this process) will first call our function validate_xml_schema()(as proposed in previous mail), which will decide to proceed or abort the insertion.

In this way we can ensure that all the documents in the xml column are valid against schema.

Thanks,
Vikrantsingh
________________________________________
From: Kodamasimham Pridhvi (MT2012066)
Sent: Thursday, August 08, 2013 10:09 PM
To: pgsql-hackers(at)postgresql(dot)org
Cc: Bisen Vikrantsingh Mohansingh MT2012036
Subject: RE: [HACKERS] Proposal for XML Schema Validation

Objective: To Add XML Schema validation and xmlvalidate functions (SQL:2008)

Description:
We’ve gone through current support of xml in postgreSQL and found that there is a check for well-formedness of xml document while inserting and updating. We want to extend this feature by adding xml schema validation.
We will be providing user with DDL commands for creating and deleting XML Schema, also provision of associating xml schema with table while creation of new table or while altering table structure, we are planning to use libxml2 library. Proposed syntax is given below.

1. End user perspective:

1.1 DDL
1.1.1 Register xmlschema
Syntax
REGSITER_XML_SCHEMA( “<URL OF SCHEMA>”, “<NAMESPACE>” , “<CONTENT OF SCHEMA .XSD FILE>")
We will save this information into system catalog

1.1.2 Delete xmlschema
Syntax
DELETE_XML_SCHEMA( “<URL OF SCHEMA>”)

1.1.3 Modification in Create Table commands
Syntax
Create table <tablename> (<col_name> <datatype>, <col_name> xml USE_SCHEMA <URL OF SCHEMA> )
We will keep a flag in catalog for xml schema validation for each table. If xml schema is specified then while every insert/update sql query we will call valdate_xml_schema() [currently built in xml_is_well_formed() is called while inserting/updating, we can place our function call just next to it]

1.1.4 Similarly for Alter Table commands

2. Developer perspective

2.1. C-Function for Validation of xml doc

2.1.1 Validating XML

Syntax
Int<err_code> validate_xml_schema(char xml[], char xml_schema[])
This function will return 0 if validate successfully else return respective error code (which we will define later)
We are planning to use libxml2
This function will called while insert/update sql query

________________________________________
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
Sent: Thursday, August 08, 2013 6:54 PM
To: Kodamasimham Pridhvi (MT2012066)
Cc: pgsql-hackers(at)postgresql(dot)org; Bisen Vikrantsingh Mohansingh MT2012036; rc(at)iiitb(dot)ac(dot)in
Subject: Re: [HACKERS] Proposal for XML Schema Validation

On 08/08/2013 12:42 AM, Kodamasimham Pridhvi (MT2012066) wrote:
> Hello pgsql-hackers ,
> With reference to "Add XML Schema validation and xmlvalidate
> functions (SQL:2008)" in ToDo list, we have gone through pgsql-mailing
> list but we didn't find any significant work in this area, so we are
> proposing our own model for xml schema validation . please kindly go
> through it and let us know how can we improve it.Please find the
> attached proposal document.
>
>
>
>

Please post your proposal as text, not as a PDF attachment. That's what
is preferred on this mailing list.

cheers

andrew


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-09 14:14:34
Message-ID: 5204F94A.1050308@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/09/2013 05:55 PM, Kodamasimham Pridhvi (MT2012066) wrote:

> (B) Alter Table
>
> Only sole purpose of making use of keyword USE_SCHEMA is to mimic oracle (somewhere on
> oracle site i found this type of syntax)

Well, there's certainly precedent for that - see to_char, the various
different BEGIN permutations, etc.

I would suggest doing that as a second separate step though. First
produce a function based interface that can be tried and tested without
the need to mess with the syntax and the parser. Then once that's in
good shape propose a patch that adds the compatibility syntax.

Among other things, if you're not adding new syntax you're more likely
to be able to prototype this as an extension.

I'm very far from being an expert in getting patches into Pg, though, so
please don't just take my word for it.

> I may not be correct but check constraint is only used to
> limit the value ranges.

A CHECK constraint can be any logic that refers only to the current row.

Using it with non-immutable (stable/volatile) functions isn't prevented,
but is also not a great idea, so updating an xsd would be a concern, but
it'd otherwise be fine.

> yes , there are memory management related issue with libxml as mentioned on below link
> http://wiki.postgresql.org/wiki/XML_Support#Implementation_Issues
> It is also mention there that this issue can be resolved(how? don't know!).

Well, if you're planning on relying on libxml in core (and it'll have to
be in core if you're adding new syntax) then you'll need a solid, well
researched answer to that one or an alternative XML library that's
portable and doesn't have those issues.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-09 14:35:32
Message-ID: 18726.1376058932@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <craig(at)2ndquadrant(dot)com> writes:
> On 08/09/2013 05:55 PM, Kodamasimham Pridhvi (MT2012066) wrote:
>> Only sole purpose of making use of keyword USE_SCHEMA is to mimic oracle (somewhere on
>> oracle site i found this type of syntax)

> Well, there's certainly precedent for that - see to_char, the various
> different BEGIN permutations, etc.

> I would suggest doing that as a second separate step though. First
> produce a function based interface that can be tried and tested without
> the need to mess with the syntax and the parser. Then once that's in
> good shape propose a patch that adds the compatibility syntax.

TBH I think any such syntax would be rejected. We have enough trouble
dealing with the SQL standards committee's creative ideas about weird
syntax with unnecessary keywords. Oracle compatibility is not going
to be enough of an argument for inventing another keyword. Especially
not if it has to be reserved, which seems rather likely given where
you're proposing to put it.

Having to add another catalog column for the sole use of this feature is
another thing that's unlikely to fly. (A general rule of thumb is that
if a proposed feature imposes overhead on everybody, whether they ever
use that feature or not, it had better be something that a pretty large
percentage of people *will* use. I doubt this meets that standard.)

So if you can do it along the lines of CHECK(xml_validates(xml_col_name,
'schema name')), I would strongly urge you to pursue that path.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Bisen Vikrantsingh Mohansingh MT2012036 <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>, "rc(at)iiitb(dot)ac(dot)in" <rc(at)iiitb(dot)ac(dot)in>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-10 16:31:25
Message-ID: 20130810163125.GA478418@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 08, 2013 at 04:42:11AM +0000, Kodamasimham Pridhvi (MT2012066) wrote:
> With reference to "Add XML Schema validation and xmlvalidate functions (SQL:2008)" in ToDo list, we have gone through pgsql-mailing list but we didn't find any significant work in this area, so we are proposing our own model for xml schema validation . please kindly go through it and let us know how can we improve it.Please find the attached proposal document.

Note that PostgreSQL 8.3 had xmlvalidate() for a time; commit
3bf822c4d722d6245a65abdd2502a9d26ab990d5 removed it due to security problems.
A new implementation (or revamp of the old) must avoid reintroducing those
vulnerabilities. Similar considerations apply to XML Schema.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Bisen Vikrantsingh Mohansingh MT2012036" <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-08-12 16:25:31
Message-ID: 6505b4fd6bb140039dddf15e63aa2967@SINPR01MB057.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 10 Aug 2013 09:33:05 -0700, Noah Misch , wrote:
>Note that PostgreSQL 8.3 had xmlvalidate() for a time; commit

we found that, xmlvalidate() was for checking well formedness of an xml doc, not for validating against xml schema, we inferred this from Release note of 8.2

for reference, below is the content from release documentation of version 8.2
" In contrib/xml2/, rename xml_valid() to xml_is_well_formed() (Tom)
xml_valid() will remain for backward compatibility, but its behavior will change to do schema checking in a future release."
[http://www.postgresql.org/docs/8.2/static/release-8-2.html]

>3bf822c4d722d6245a65abdd2502a9d26ab990d5 removed it due to security problems.
>A new implementation (or revamp of the old) must avoid reintroducing those
>vulnerabilities. Similar considerations apply to XML Schema.

the main vulnerability in the xmlvalidate() [http://www.postgresql.org/message-id/20080301024649.3CDCD754108@cvs.postgresql.org] was mainly because one of the parameter was file path.

But in our case, we are taking xml as a string, currently we didn't proposed file path as a input parameter to any of our function.

Thanks,
Pridhvi & Vikrantsingh
IIIT Bangalore


From: "Kodamasimham Pridhvi (MT2012066)" <Pridhvi(dot)Kodamasimham(at)iiitb(dot)org>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "Bisen Vikrantsingh Mohansingh MT2012036" <BisenVikrantsingh(dot)Mohansingh(at)iiitb(dot)org>
Subject: Re: Proposal for XML Schema Validation
Date: 2013-09-05 13:31:37
Message-ID: 1f7db2f297334d37a1076090ea7777c3@SINPR01MB057.apcprd01.prod.exchangelabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-08-09 14:14:34, Craig Ringer wrote:
>Well, if you're planning on relying on libxml in core (and it'll have to
>be in core if you're adding new syntax) then you'll need a solid, well
>researched answer to that one or an alternative XML library that's
>portable and doesn't have those issues.

After doing some research regarding this issue and going through mailing list, we were unable to find a fix for memory management issue with libxml when used with postgresql can you please suggest us any alternative to libxml or redirect to some resources.

Thanks and Regrads,
Pridhvi and Vikranthsingh,
IIITB