Re: Case Preservation disregarding case sensitivity?

Lists: pgsql-hackerspgsql-sql
From: beau hargis <beauh(at)bluefrogmobile(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Case Preservation disregarding case sensitivity?
Date: 2006-10-27 23:23:37
Message-ID: 200610271623.37525.beauh@bluefrogmobile.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Hello all.

As this is my first post to this particular mailing list, let me preface this
with a couple of items:

1) I have, indeed, searched far and wide for the answer to my question. I have
read and understood the responses to similar posts posing similar questions.
My next stop after this is the postgresql source code.

2) I develop database driven applications for telephony and mobile
applications in C/C++, perl and python. The databases I am working with
exceed 100GB. We have several production database systems.

3) I am pushing hard to replace our MySQL installations with PostgreSQL
(open-source and/or commercial) because Postgres has performed better by
several orders of magnitude and has been more stable and as consumed half the
resources in my testing.

The behavior I am seeing in Postgres in my development and migration process
is a showstopper. The manner in which Postgres handles case-sensitivity
breaks the current tools that I use and the applications that have been
written.

I am aware of the double-quote 'feature' which indicates that an element
should be treated in a case-sensitive way. This as been the 'answer' to every
question of this sort. This 'feature' does not solve the problem and
introduces other problems.

I have a data set that is designed in such a way as:

CREATE TABLE user_profile (
userProfileId int8 not null primary key,
userProfileTypeId int8 not null
);

CREATE TABLE user_profile_type (
userProfileTypeId int8 not null primary key,
userProfileType varchar not null
);

Ok, the description of the table from the database:

Table "public.user_profile"
Column | Type | Modifiers
-------------------+--------+-----------
userprofileid | bigint | not null
userprofiletypeid | bigint | not null
Indexes:
"user_profile_pkey" PRIMARY KEY, btree (userprofileid)

Ok, adding a foreign key:

ALTER table user_profile ADD CONSTRAINT fk_uproftype FOREIGN KEY
(userProfileTypeId) REFERENCES user_profile_type (userProfileTypeId);

...works successfully.

Works as it is supposed to work. Postgres is supposed to be case sensitive,
which is not a problem, but I want it to preserve the case of the column
names.

Every search has turned up the suggestion of using double-quotes around
identifiers in the creation of the tables. So we'll try that:

DROP TABLE user_profile;
DROP TABLE user_profile_type;
CREATE TABLE user_profile (
"userProfileId" int8 not null primary key,
"userProfileTypeId" int8 not null
);

CREATE TABLE user_profile_type (
"userProfileTypeId" int8 not null primary key,
"userProfileType" varchar not null
);

The table description:

Table "public.user_profile"
Column | Type | Modifiers
-------------------+--------+-----------
userProfileId | bigint | not null
userProfileTypeId | bigint | not null
Indexes:
"user_profile_pkey" PRIMARY KEY, btree ("userProfileId")

Case was preserved. Now lets add the foreign key just as we did before (note
that the case in the table definition and the ALTER TABLE query is the same):

ALTER TABLE user_profile ADD CONSTRAINT fk_uproftype FOREIGN KEY
(userProfileTypeId) REFERENCES user_profile_type (userProfileTypeId);

ERROR: column "userprofiletypeid" referenced in foreign key constraint does
not exist

The case was preserved, but the case-insensitivity only works one way. The
case of the input identifier was 'normalized' in a way, but not matched
against the column names in a case-insensitive way. This is the behavior that
breaks the tools that I use. I cannot modify these tools to add quotes around
the identifiers, and in my mind I should not have to.

OK, another query (perfectly valid SQL):

insert into user_profile_type
(userProfileTypeId,userProfileType) VALUES(1,'ABNORMAL');

ERROR: column "userprofiletypeid" of relation "user_profile_type" does not
exist

I am hoping that there is an easy way to obtain case-preservation with
case-insensitivity, or at the very least, case-preservation and complete
case-sensitivity, or case-preservation and a consistant case-conversion
strategy.

The case of the column names need to be preserved because that is the way the
schema is designed and most importantly (VERY, VERY IMPORTANT), column names
are used in apps as hash values, or as named references which are case
sensitive and as such need to be delivered to the client in exactly in the
manner specified at the time of table creation.

Again, I am looking for a way (magic, patches, whiskey, etc) that will give me
case-preservation with EITHER case-sensitivity OR case-insensitivity, but not
both as I am seeing.

Thanks in advance. I am hoping to find a solution to this so I can actually
convert one of our databases to use Postgres. And I can say that little
issues like this are precisely why Postgres was never used in this
organization before, even though several of the other database developers
like the features, stability and performance of Postgres.


From: Frank Bax <fbax(at)sympatico(dot)ca>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-10-28 01:50:22
Message-ID: 5.2.1.1.0.20061027214608.04129070@pop6.sympatico.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

At 07:23 PM 10/27/06, beau hargis wrote:
>I am aware of the double-quote 'feature' which indicates that an element
>should be treated in a case-sensitive way. This as been the 'answer' to every
>question of this sort. This 'feature' does not solve the problem and
>introduces other problems.

If you use double-quotes when creating the table, you need to use
double-quotes EVERY time you access those elements. Neither of your two
examples (that produced errors) have double quotes.

>ALTER TABLE user_profile ADD CONSTRAINT fk_uproftype FOREIGN KEY
>(userProfileTypeId) REFERENCES user_profile_type (userProfileTypeId);
>
>ERROR: column "userprofiletypeid" referenced in foreign key constraint does
>not exist

>insert into user_profile_type
>(userProfileTypeId,userProfileType) VALUES(1,'ABNORMAL');
>
>ERROR: column "userprofiletypeid" of relation "user_profile_type" does not
>exist

The second query should be:

insert into user_profile_type ("userProfileTypeId","userProfileType")
VALUES(1,'ABNORMAL');


From: Joe <dev(at)freedomcircle(dot)net>
To: beau hargis <beauh(at)bluefrogmobile(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-10-28 02:38:52
Message-ID: 1162003132.687.8.camel@pampa
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Hi Beau,

On Fri, 2006-10-27 at 16:23 -0700, beau hargis wrote:
> I am hoping that there is an easy way to obtain case-preservation with
> case-insensitivity, or at the very least, case-preservation and complete
> case-sensitivity, or case-preservation and a consistant case-conversion
> strategy.
>
> The case of the column names need to be preserved because that is the way the
> schema is designed and most importantly (VERY, VERY IMPORTANT), column names
> are used in apps as hash values, or as named references which are case
> sensitive and as such need to be delivered to the client in exactly in the
> manner specified at the time of table creation.
>
> Again, I am looking for a way (magic, patches, whiskey, etc) that will give me
> case-preservation with EITHER case-sensitivity OR case-insensitivity, but not
> both as I am seeing.
>
> Thanks in advance. I am hoping to find a solution to this so I can actually
> convert one of our databases to use Postgres. And I can say that little
> issues like this are precisely why Postgres was never used in this
> organization before, even though several of the other database developers
> like the features, stability and performance of Postgres.

I went through the same issue in my conversion from MySQL to Postgres
and (since I had a small application) I ended up changing up all my
tables and columns "UserProfile" to user_profile.

I'm afraid however, that it's MySQL that is the odd man out. I haven't
researched this completely but I believe PG follows either the FIPS-127
or SQL-92 standard with respect to what are called "delimited
identifiers". Basically, this says if you want case sensitivity in
identifier names, you have to use double quotes wherever you refer to
the identifier. Without the double quotes, the SQL implementor can
either use UPPERCASE (as I believe Oracle and DB2 do) or lowercase (as
PG does) when it displays those identifiers.

Joe


From: Richard Broersma Jr <rabroersma(at)yahoo(dot)com>
To: beau hargis <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-10-28 14:38:59
Message-ID: 467277.20359.qm@web31812.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

> Case was preserved. Now lets add the foreign key just as we did before (note
> that the case in the table definition and the ALTER TABLE query is the same):
>
> ALTER TABLE user_profile ADD CONSTRAINT fk_uproftype FOREIGN KEY
> (userProfileTypeId) REFERENCES user_profile_type (userProfileTypeId);
> ^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^
> ERROR: column "userprofiletypeid" referenced in foreign key constraint does
> not exist

When ever you defince a column with quotes, all references to it must also contain quotes. Try:
ALTER TABLE user_profile ADD CONSTRAINT fk_uproftype FOREIGN KEY
("userProfileTypeId") REFERENCES user_profile_type ("userProfileTypeId");

> OK, another query (perfectly valid SQL):
>
> insert into user_profile_type
> (userProfileTypeId,userProfileType) VALUES(1,'ABNORMAL');
> ^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^
> ERROR: column "userprofiletypeid" of relation "user_profile_type" does not
> exist

Try:
insert into user_profile_type
("userProfileTypeId","userProfileType") VALUES(1,'ABNORMAL');

>
> I am hoping that there is an easy way to obtain case-preservation with
> case-insensitivity, or at the very least, case-preservation and complete
> case-sensitivity, or case-preservation and a consistant case-conversion
> strategy.
>
> Again, I am looking for a way (magic, patches, whiskey, etc) that will give me
> case-preservation with EITHER case-sensitivity OR case-insensitivity, but not
> both as I am seeing.

Perhaps in your queries or views you use the AS keywork to respecify the column name with
upper/lower cases.

i.e.
mydb=# select id as "Id" from foo;
Id
---------
goodbye
(1 row)

Regards,

Richard Broersma Jr.


From: Richard Broersma Jr <rabroersma(at)yahoo(dot)com>
To: beau hargis <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-10-28 14:59:09
Message-ID: 879704.84815.qm@web31810.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

> Again, I am looking for a way (magic, patches, whiskey, etc) that will give me
> case-preservation with EITHER case-sensitivity OR case-insensitivity, but not
> both as I am seeing.
>
> Thanks in advance. I am hoping to find a solution to this so I can actually
> convert one of our databases to use Postgres. And I can say that little
> issues like this are precisely why Postgres was never used in this
> organization before, even though several of the other database developers
> like the features, stability and performance of Postgres.

Here is an explination from the postgresql manual:
http://www.postgresql.org/docs/8.1/interactive/sql-syntax.html#SQL-SYNTAX-IDENTIFIERS

...
Quoting an identifier also makes it case-sensitive, whereas unquoted names are always folded to
lower case. For example, the identifiers FOO, foo, and "foo" are considered the same by
PostgreSQL, but "Foo" and "FOO" are different from these three and each other. (The folding of
unquoted names to lower case in PostgreSQL is incompatible with the SQL standard, which says that
unquoted names should be folded to upper case. Thus, foo should be equivalent to "FOO" not "foo"
according to the standard. If you want to write portable applications you are advised to always
quote a particular name or never quote it.)
...

Regards,

Richard Broersma Jr.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: beau hargis <beauh(at)bluefrogmobile(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-10-31 03:23:30
Message-ID: 23177.1162265010@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

beau hargis <beauh(at)bluefrogmobile(dot)com> writes:
> Considering the differences that already exist between database systems and
> their varying compliance with SQL and the various extensions that have been
> created, I do not consider that the preservation of case for identifiers
> would violate any SQL standard.

That's not how I read the spec. It is true that we are not 100% spec
compliant, but that isn't a good argument for moving further away from
spec. Not to mention breaking backwards compatibility with our
historical behavior. The change you propose would fix your application
at the cost of breaking other people's applications. Perhaps you
should consider fixing your app instead.

regards, tom lane


From: beau hargis <beauh(at)bluefrogmobile(dot)com>
To: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-10-31 03:28:50
Message-ID: 200610301928.51255.beauh@bluefrogmobile.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On Friday 27 October 2006 19:38, Joe wrote:
> Hi Beau,
>
> On Fri, 2006-10-27 at 16:23 -0700, beau hargis wrote:
> > I am hoping that there is an easy way to obtain case-preservation with
> > case-insensitivity, or at the very least, case-preservation and complete
> > case-sensitivity, or case-preservation and a consistant case-conversion
> > strategy.
> >
> > The case of the column names need to be preserved because that is the way
> > the schema is designed and most importantly (VERY, VERY IMPORTANT),
> > column names are used in apps as hash values, or as named references
> > which are case sensitive and as such need to be delivered to the client
> > in exactly in the manner specified at the time of table creation.
> >
>
> I went through the same issue in my conversion from MySQL to Postgres
> and (since I had a small application) I ended up changing up all my
> tables and columns "UserProfile" to user_profile.
>
> I'm afraid however, that it's MySQL that is the odd man out. I haven't
> researched this completely but I believe PG follows either the FIPS-127
> or SQL-92 standard with respect to what are called "delimited
> identifiers". Basically, this says if you want case sensitivity in
> identifier names, you have to use double quotes wherever you refer to
> the identifier. Without the double quotes, the SQL implementor can
> either use UPPERCASE (as I believe Oracle and DB2 do) or lowercase (as
> PG does) when it displays those identifiers.
>
> Joe

Again, I am at the same point I was at when searching and searching for
information on the problem, and I am not sure what the SQL standard has to
say about it: I do not, and I dont think that anyone else who is struggling
to deal with the problem does, care about case-sensitivity. I am interested
in case preservation of column names. I do, indeed, want identifiers treated
in a case insensitive way, but I want the case PRESERVED in the table
definitions and I want that case, as preserved, to be reflected in the field
names as returned by the server to any client library that connects and
initiates a query.

Case-preservation is not the same as case-sensitivity; nor is
case-normalization the same as case-insensitivity. What PostgreSQL is doing
is converting any, and all, identifiers to a lower case and then matching
those against the identifiers (as stored in the table definition) in a
case-sensitive manner. It 'normalizes' the case of the identifiers so that it
has a common internal representation; the desires of the programmer and
database architect be damned.

Referenced specification details:
From FIPS-127:
=======================================
3. Delimited identifiers. In the previous ANSI SQL specification, it was not
possible for an application to specify identifiers with spaces or other
special symbols. Also, it was not possible to protect against future assaults
on the name space for (identifier) by additions to the (reserved word) list.
The new facility for (delimited identifier) allows a user to enclose all
identifiers in double-quotation marks, thereby ensuring that the name defined
or referenced may contain spaces or other special symbols and will not be
impacted by future additions to the (reserved word) list.
=======================================

From SQL-92/Sec. 5.2:
=======================================
10)The <identifier body> of a <regular identifier> is equivalent
to an <identifier body> in which every letter that is a lower-
case letter is replaced by the equivalent upper-case letter
or letters. This treatment includes determination of equiva-
lence, representation in the Information and Definition Schemas,
representation in the diagnostics area, and similar uses.

11)The <identifier body> of a <regular identifier> (with every
letter that is a lower-case letter replaced by the equivalent
upper-case letter or letters), treated as the repetition of
a <character string literal> that specifies a <character set
specification> of SQL_TEXT, shall not be equal, according to
the comparison rules in Subclause 8.2, "<comparison predicate>",
to any <reserved word> (with every letter that is a lower-case
letter replaced by the equivalent upper-case letter or letters),
treated as the repetition of a <character string literal> that
specifies a <character set specification> of SQL_TEXT.

12)Two <regular identifier>s are equivalent if their <identifier
body>s, considered as the repetition of a <character string
literal> that specifies a <character set specification> of
SQL_TEXT, compare equally according to the comparison rules
in Subclause 8.2, "<comparison predicate>".

13)A <regular identifier> and a <delimited identifier> are equiva-
lent if the <identifier body> of the <regular identifier> (with
every letter that is a lower-case letter replaced by the equiva-
lent upper-case letter or letters) and the <delimited identifier
body> of the <delimited identifier> (with all occurrences of
<quote> replaced by <quote symbol> and all occurrences of <dou-
blequote symbol> replaced by <double quote>), considered as
the repetition of a <character string literal> that specifies a
<character set specification> of SQL_TEXT and an implementation-
defined collation that is sensitive to case, compare equally
according to the comparison rules in Subclause 8.2, "<comparison
predicate>".

14)Two <delimited identifier>s are equivalent if their <delimited
identifier body>s (with all occurrences of <quote> replaced
by <quote symbol> and all occurrences of <doublequote symbol>
replaced by <doublequote>), considered as the repetition of a
<character string literal> that specifies a <character set spec-
ification> of SQL_TEXT and an implementation-defined collation
that is sensitive to case, compare equally according to the
comparison rules in Subclause 8.2, "<comparison predicate>".
========================================

Having installed DB2 Enterprise today and taking it for a spin, it does indeed
behave in a similar manner. However, after reading through both
specifications, it seems that DB2 follows more of the spec than PostgreSQL.
The specifications state that for purpose of comparing identifiers, both
shall be converted to upper-case. DB2 displays all identifiers in upper-case
whereas PostgreSQL displays all identifiers in lower-case. This alone would
be a deviation from the specification.

It seems, however, that the regular-identifier/delimited-identifier
distinction is for the embedding of spaces or normally invalid characters
from other character sets, not for actually creating case-sensitive
identifiers. It is only a side-effect. It does not enforce any method for the
representation of the identifiers which the client expects, either. Since a
query can be constructed which specifies aliases (... AS ...), the
presentation of the data is implementation/runtime specific.

Considering the differences that already exist between database systems and
their varying compliance with SQL and the various extensions that have been
created, I do not consider that the preservation of case for identifiers
would violate any SQL standard. The end result being eaiser migration, the
continued operation of cross-platform tools and applications, and a database
designer or data architect having more flexibility in the construction and
presentation of a database.

Again, the distinction between case-sensitivity and case-preservation, and,
case-insensitivity and case-normalization needs to be recognized. Having
case-preservation defined in the CREATE TABLE syntax and/or a server
configuration seems, to me, to be a valuable addition. Of course, making it
an option would not interfere with the current state of affairs.

I have looked through the source code and have identified places which need to
be patched to make this possible. Tomorrow I will work on that and see what
it will impact in the code. Let me know what you think.


From: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-10-31 06:08:31
Message-ID: EB48EBF3B239E948AC1E3F3780CF8F88012F88DD@MI8NYCMAIL02.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

At Teradata, we certainly interpreted the spec to allow case-preserving,
but case-insensitive, identifiers.
Users really liked it that way: If you re-created a CREATE TABLE
statement from the catalog, you could get back exactly the case the user
had entered, but people using the table didn't need to worry about case.
And column titles in reports would have the nice case preserving
information.
Sort of like how Windows systems treat file names... The case is
preserved, but you don't need to know it to access the file.

I know UNIX users usually think "case-preserving with case-insensitive"
a foreign concept, but that doesn't mean the average user feels the
same.

If I want my column named "WeeklyTotalSales", I really don't want to
have to always quote it and type in the exact case.

-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
Sent: Monday, October 30, 2006 7:24 PM
To: beau hargis
Cc: pgsql-sql(at)postgresql(dot)org; pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
sensitivity?

beau hargis <beauh(at)bluefrogmobile(dot)com> writes:
> Considering the differences that already exist between database
systems and
> their varying compliance with SQL and the various extensions that have
been
> created, I do not consider that the preservation of case for
identifiers
> would violate any SQL standard.

That's not how I read the spec. It is true that we are not 100% spec
compliant, but that isn't a good argument for moving further away from
spec. Not to mention breaking backwards compatibility with our
historical behavior. The change you propose would fix your application
at the cost of breaking other people's applications. Perhaps you
should consider fixing your app instead.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
Cc: "beau hargis" <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case sensitivity?
Date: 2006-10-31 06:34:56
Message-ID: 24861.1162276496@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

"Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com> writes:
> At Teradata, we certainly interpreted the spec to allow case-preserving,
> but case-insensitive, identifiers.

Really?

As I see it, the controlling parts of the SQL spec are (SQL99 sec 5.2)

26) A <regular identifier> and a <delimited identifier> are
equivalent if the <identifier body> of the <regular identifier>
(with every letter that is a lower-case letter replaced by the
corresponding upper-case letter or letters) and the <delimited
identifier body> of the <delimited identifier> (with all
occurrences of <quote> replaced by <quote symbol> and all
occurrences of <doublequote symbol> replaced by <double quote>),
considered as the repetition of a <character string literal>
that specifies a <character set specification> of SQL_IDENTIFIER
and an implementation-defined collation that is sensitive to
case, compare equally according to the comparison rules in
Subclause 8.2, "<comparison predicate>".

27) Two <delimited identifier>s are equivalent if their <delimited
identifier body>s, considered as the repetition of a <character
string literal> that specifies a <character set specification>
of SQL_IDENTIFIER and an implementation-defined collation
that is sensitive to case, compare equally according to the
comparison rules in Subclause 8.2, "<comparison predicate>".

Note well the "sensitive to case" bits there. Now consider

CREATE TABLE tab (
"foobar" int,
"FooBar" timestamp,
"FOOBAR" varchar(3)
);

We can *not* reject this as containing duplicate column names, else we
have certainly violated rule 27. Now what will you do with

SELECT fooBar FROM tab;

? The spec is unquestionably on the side of "you selected the varchar
column"; historical Postgres practice is on the side of "you selected
the int column". AFAICS a case-insensitive approach would have to
fail with some "I can't identify which column you mean" error. I am
interested to see where you find support for that in the spec...

regards, tom lane


From: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "beau hargis" <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-10-31 17:11:16
Message-ID: EB48EBF3B239E948AC1E3F3780CF8F88012F89F8@MI8NYCMAIL02.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

We treated quoted identifiers as case-specific, as the spec requires.

In the catalog, we stored TWO columns... The column name with case
converted as appropriate (as PostgreSQL already does), used for looking
up the attribute,
And a second column, which was the column name with the case exactly as
entered by the user.

So, your example would work just fine.

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Monday, October 30, 2006 10:35 PM
To: Chuck McDevitt
Cc: beau hargis; pgsql-sql(at)postgresql(dot)org; pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
sensitivity?

"Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com> writes:
> At Teradata, we certainly interpreted the spec to allow
case-preserving,
> but case-insensitive, identifiers.

Really?

As I see it, the controlling parts of the SQL spec are (SQL99 sec 5.2)

26) A <regular identifier> and a <delimited identifier> are
equivalent if the <identifier body> of the <regular
identifier>
(with every letter that is a lower-case letter replaced by
the
corresponding upper-case letter or letters) and the
<delimited
identifier body> of the <delimited identifier> (with all
occurrences of <quote> replaced by <quote symbol> and all
occurrences of <doublequote symbol> replaced by <double
quote>),
considered as the repetition of a <character string literal>
that specifies a <character set specification> of
SQL_IDENTIFIER
and an implementation-defined collation that is sensitive to
case, compare equally according to the comparison rules in
Subclause 8.2, "<comparison predicate>".

27) Two <delimited identifier>s are equivalent if their
<delimited
identifier body>s, considered as the repetition of a
<character
string literal> that specifies a <character set
specification>
of SQL_IDENTIFIER and an implementation-defined collation
that is sensitive to case, compare equally according to the
comparison rules in Subclause 8.2, "<comparison predicate>".

Note well the "sensitive to case" bits there. Now consider

CREATE TABLE tab (
"foobar" int,
"FooBar" timestamp,
"FOOBAR" varchar(3)
);

We can *not* reject this as containing duplicate column names, else we
have certainly violated rule 27. Now what will you do with

SELECT fooBar FROM tab;

? The spec is unquestionably on the side of "you selected the varchar
column"; historical Postgres practice is on the side of "you selected
the int column". AFAICS a case-insensitive approach would have to
fail with some "I can't identify which column you mean" error. I am
interested to see where you find support for that in the spec...

regards, tom lane


From: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "beau hargis" <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-10-31 17:13:31
Message-ID: EB48EBF3B239E948AC1E3F3780CF8F88012F89FA@MI8NYCMAIL02.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Oh... And Microsoft SQLServer does something similar.

At Greenplum, we've already gotten complaints from customers about this
when they were switching from MSSQL to GP's PostgreSQL-based database.

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Monday, October 30, 2006 10:35 PM
To: Chuck McDevitt
Cc: beau hargis; pgsql-sql(at)postgresql(dot)org; pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
sensitivity?

"Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com> writes:
> At Teradata, we certainly interpreted the spec to allow
case-preserving,
> but case-insensitive, identifiers.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, beau hargis <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-10-31 17:55:46
Message-ID: 45478E22.8060805@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql


There was some discussion a couple of years ago on the -hackers list
about it, so you might like to review the archives. The consensus seemed
to be that behaviour would need to be set no later than createdb time.
The options I thought of were:

. current postgres behaviour (we need to do this for legacy reasons, of
course, as well as to keep happy the legions who hate using upper case
for anything)
. strictly spec compliant (same as current behaviour, but folding to
upper case for unquoted identifiers rather than lower)
. fully case sensitive even for unquoted identifiers (not spec compliant
at all, but nevertheless possibly attractive especially for people
migrating from MS SQLServer, where it is an option, IIRC).

To this you propose, as I understand it, to have a fourth possibility
which would be spec compliant for comparison purposes but would label
result set columns with the case preserved name originally used (or
would you use the casing used in the query?).

These could be accomplished I think with a second catalog column like
you suggest, in a number of places, but making sure all the code paths
were covered might be somewhat laborious. We could probably add the
second option without being nearly so invasive, though, and some people
might feel that that would be sufficient.

cheers

andrew

Chuck McDevitt wrote:
> We treated quoted identifiers as case-specific, as the spec requires.
>
> In the catalog, we stored TWO columns... The column name with case
> converted as appropriate (as PostgreSQL already does), used for looking
> up the attribute,
> And a second column, which was the column name with the case exactly as
> entered by the user.
>
> So, your example would work just fine.
>
>
> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Monday, October 30, 2006 10:35 PM
> To: Chuck McDevitt
> Cc: beau hargis; pgsql-sql(at)postgresql(dot)org; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
> sensitivity?
>
> "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com> writes:
>
>> At Teradata, we certainly interpreted the spec to allow
>>
> case-preserving,
>
>> but case-insensitive, identifiers.
>>
>
> Really?
>
> As I see it, the controlling parts of the SQL spec are (SQL99 sec 5.2)
>
> 26) A <regular identifier> and a <delimited identifier> are
> equivalent if the <identifier body> of the <regular
> identifier>
> (with every letter that is a lower-case letter replaced by
> the
> corresponding upper-case letter or letters) and the
> <delimited
> identifier body> of the <delimited identifier> (with all
> occurrences of <quote> replaced by <quote symbol> and all
> occurrences of <doublequote symbol> replaced by <double
> quote>),
> considered as the repetition of a <character string literal>
> that specifies a <character set specification> of
> SQL_IDENTIFIER
> and an implementation-defined collation that is sensitive to
> case, compare equally according to the comparison rules in
> Subclause 8.2, "<comparison predicate>".
>
> 27) Two <delimited identifier>s are equivalent if their
> <delimited
> identifier body>s, considered as the repetition of a
> <character
> string literal> that specifies a <character set
> specification>
> of SQL_IDENTIFIER and an implementation-defined collation
> that is sensitive to case, compare equally according to the
> comparison rules in Subclause 8.2, "<comparison predicate>".
>
> Note well the "sensitive to case" bits there. Now consider
>
> CREATE TABLE tab (
> "foobar" int,
> "FooBar" timestamp,
> "FOOBAR" varchar(3)
> );
>
> We can *not* reject this as containing duplicate column names, else we
> have certainly violated rule 27. Now what will you do with
>
> SELECT fooBar FROM tab;
>
> ? The spec is unquestionably on the side of "you selected the varchar
> column"; historical Postgres practice is on the side of "you selected
> the int column". AFAICS a case-insensitive approach would have to
> fail with some "I can't identify which column you mean" error. I am
> interested to see where you find support for that in the spec...
>
>


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, beau hargis <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-10-31 18:06:13
Message-ID: 20061031180613.GB29338@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On Tue, Oct 31, 2006 at 12:55:46PM -0500, Andrew Dunstan wrote:
> To this you propose, as I understand it, to have a fourth possibility
> which would be spec compliant for comparison purposes but would label
> result set columns with the case preserved name originally used (or
> would you use the casing used in the query?).

The big issue I see with this is that it would break PQfname on the
client end, since that's case sensetive too. Most client languages are,
so you really are between a rock and a hard place.

Making PQfname case-insensetive also screws up in Tom's example.

One way to appraoch this is to consider this a setting of the collation
of the name datatype. If a case-insensetive collation is selected at
initdb time, then Tom's example would indeed fail, but that's a choice
someone made. Problem being, you'd have to export that choice to
clients to make PQfname work, and that's going to messy.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, beau hargis <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-10-31 18:22:40
Message-ID: 20061031101314.R83230@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On Tue, 31 Oct 2006, Chuck McDevitt wrote:

> We treated quoted identifiers as case-specific, as the spec requires.
>
> In the catalog, we stored TWO columns... The column name with case
> converted as appropriate (as PostgreSQL already does), used for looking
> up the attribute,
> And a second column, which was the column name with the case exactly as
> entered by the user.

Wouldn't using that second column's value tend to often violate 5.2SR10
(at least that's the reference item in SQL92)? AFAICT, that rule basically
says that the regular identifier is equivalent to the case-folded one for
purposes of information and definition schema and similar purposes which
seems like it would be intended to include things like column labeling for
output. There's a little bit of flexibility there on both similar purposes
and equivalence, though.

----

10) The <identifier body> of a <regular identifier> is equivalent
to an <identifier body> in which every letter that is a lower-
case letter is replaced by the equivalent upper-case letter
or letters. This treatment includes determination of equiva-
lence, representation in the Information and Definition Schemas,
representation in the diagnostics area, and similar uses.


From: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
To: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-10-31 18:32:42
Message-ID: EB48EBF3B239E948AC1E3F3780CF8F88012F8A53@MI8NYCMAIL02.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

-----Original Message-----
From: Stephan Szabo [mailto:sszabo(at)megazone(dot)bigpanda(dot)com]
Sent: Tuesday, October 31, 2006 10:23 AM
To: Chuck McDevitt
Cc: Tom Lane; beau hargis; pgsql-sql(at)postgresql(dot)org;
pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case

On Tue, 31 Oct 2006, Chuck McDevitt wrote:

> We treated quoted identifiers as case-specific, as the spec requires.
>
> In the catalog, we stored TWO columns... The column name with case
> converted as appropriate (as PostgreSQL already does), used for
looking
> up the attribute,
> And a second column, which was the column name with the case exactly
as
> entered by the user.

Wouldn't using that second column's value tend to often violate 5.2SR10
(at least that's the reference item in SQL92)? AFAICT, that rule
basically
says that the regular identifier is equivalent to the case-folded one
for
purposes of information and definition schema and similar purposes which
seems like it would be intended to include things like column labeling
for
output. There's a little bit of flexibility there on both similar
purposes
and equivalence, though.

Equivalent, yes. But I can interpret that clause it mean I can show
either the case folded or non-case-folded value in the information
schema, as they are equivalent.

Anyway, we have many things that are "enhancements" beyond the spec, and
this could also be considered an enhancement.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
Cc: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-11-01 06:38:25
Message-ID: 14059.1162363105@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

"Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com> writes:
> Equivalent, yes. But I can interpret that clause it mean I can show
> either the case folded or non-case-folded value in the information
> schema, as they are equivalent.

Well, that's an interesting bit of specs-lawyering, but I don't see
how you can defend it against these rules in SQL99 5.2:

21) For every <identifier body> IB there is exactly one
corresponding case-normal form CNF. CNF is an <identifier body>
derived from IB as follows.

Let n be the number of characters in IB. For i ranging from 1
(one) to n, the i-th character M(i) of IB is translated into the
corresponding character or characters of CNF as follows.

Case:

a) If M(i) is a lower case character or a title case character
for which an equivalent upper case sequence U is defined by
Unicode, then let j be the number of characters in U; the
next j characters of CNF are U.

b) Otherwise, the next character of CNF is M(i).

22) The case-normal form of the <identifier body> of a <regular
identifier> is used for purposes such as and including
determination of identifier equivalence, representation in
the Definition and Information Schemas, and representation in
diagnostics areas.

NOTE 44 - Any lower-case letters for which there are no upper-
case equivalents are left in their lower-case form.

Again, obviously we are not compliant because we fold to lower rather
than upper case, but I do not see how you can read (22) as not requiring
the information schema to show the upper-cased form. The output of
functions such as PQfname() might be considered closer to diagnostics
info than information schema, but that's covered too.

But the really serious problem with what you propose is that it would
allow two table columns with names that the system considers distinct
to show as the same string in the information schema and diagnostic
outputs. That can't be acceptable --- it's going to break any
application that does any nontrivial analysis of what it sees there,
not to mention that it violates various primary key constraints in
the information schema specification.

regards, tom lane


From: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-11-01 16:31:30
Message-ID: EB48EBF3B239E948AC1E3F3780CF8F88012F8CD6@MI8NYCMAIL02.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Sorry, my last mail wasn't well thought out. Yes, the
information_schema needs the case-folded name (although it might be ok
to add additional columns to the information_schema for extra
information).

But, stepping back from all that, what is it the users want?

1) When re-creating a CREATE TABLE statement from whatever catalog
info, they'd like the names to come back exactly as then entered them.
If I do:
CREATE TABLE BobsTable (WeeklySales numeric(10,2),
"SomeStrangeName" int);

They'd like to see exactly that when the CREATE TABLE gets
re-created, not what we do now:

CREATE TABLE bobstable (weeklysales numeric(10,2),
"SomeStrangeName" int);

2) When doing reports, they'd like the name as entered to be the title
of the column:
Select * from bobstable;

Would be nice if they saw this:
WeeklySales SomeStrangeName
----------- ---------------


For compatibility with existing apps and spec compliance, you'd still
want PQfname() to return the case-folded name.
But there isn't any reason you couldn't also return a "suggested title"
field (PQftitle?) which preserves the user's case.

You could also extend the concept of a PQftitle to make nicer titles for
expressions. Instead of
SELECT sum(WeeklySales) from BobsTable;

Producing "?column?" or somesuch to use in the report, it could return a
title like "sum(WeeklySales)"

-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Tuesday, October 31, 2006 10:38 PM
To: Chuck McDevitt
Cc: Stephan Szabo; beau hargis; pgsql-sql(at)postgresql(dot)org;
pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case

"Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com> writes:
> Equivalent, yes. But I can interpret that clause it mean I can show
> either the case folded or non-case-folded value in the information
> schema, as they are equivalent.

Well, that's an interesting bit of specs-lawyering, but I don't see
how you can defend it against these rules in SQL99 5.2:

21) For every <identifier body> IB there is exactly one
corresponding case-normal form CNF. CNF is an <identifier
body>
derived from IB as follows.

Let n be the number of characters in IB. For i ranging from
1
(one) to n, the i-th character M(i) of IB is translated into
the
corresponding character or characters of CNF as follows.

Case:

a) If M(i) is a lower case character or a title case
character
for which an equivalent upper case sequence U is defined
by
Unicode, then let j be the number of characters in U; the
next j characters of CNF are U.

b) Otherwise, the next character of CNF is M(i).

22) The case-normal form of the <identifier body> of a <regular
identifier> is used for purposes such as and including
determination of identifier equivalence, representation in
the Definition and Information Schemas, and representation
in
diagnostics areas.

NOTE 44 - Any lower-case letters for which there are no
upper-
case equivalents are left in their lower-case form.

Again, obviously we are not compliant because we fold to lower rather
than upper case, but I do not see how you can read (22) as not requiring
the information schema to show the upper-cased form. The output of
functions such as PQfname() might be considered closer to diagnostics
info than information schema, but that's covered too.

But the really serious problem with what you propose is that it would
allow two table columns with names that the system considers distinct
to show as the same string in the information schema and diagnostic
outputs. That can't be acceptable --- it's going to break any
application that does any nontrivial analysis of what it sees there,
not to mention that it violates various primary key constraints in
the information schema specification.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>, <pgsql-sql(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-11-02 10:27:44
Message-ID: 1162463264.3587.281.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On Wed, 2006-11-01 at 11:31 -0500, Chuck McDevitt wrote:

> But, stepping back from all that, what is it the users want?
>
> 1) When re-creating a CREATE TABLE statement from whatever catalog
> info, they'd like the names to come back exactly as then entered them.
> If I do:
> CREATE TABLE BobsTable (WeeklySales numeric(10,2),
> "SomeStrangeName" int);
>
> They'd like to see exactly that when the CREATE TABLE gets
> re-created, not what we do now:
>
> CREATE TABLE bobstable (weeklysales numeric(10,2),
> "SomeStrangeName" int);

This would be very good indEEd.

It can be very annoying trying to locate a table when the user swears
they called it one thing and actually the case or quotation is
different. Current behaviour isn't useful, even if it is "onspec" (or is
that OnSpec?). Would be better to make this behaviour a userset
switchable between the exactly compliant and the more intuitive.

We have namespaces to differentiate between two sources of object names,
so anybody who creates a schema where MyColumn is not the same thing as
myColumn is not following sensible rules for conceptual distance. It's
certainly an error of best practice, even if its not actually a bug.

> 2) When doing reports, they'd like the name as entered to be the title
> of the column:
> Select * from bobstable;
>
> Would be nice if they saw this:
> WeeklySales SomeStrangeName
> ----------- ---------------
...

> Producing "?column?" or somesuch to use in the report, it could return a
> title like "sum(WeeklySales)"

That would be just great. I'm not sure the spec says what the titles
should be, does it?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-11-02 15:51:25
Message-ID: 16897.1162482685@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> We have namespaces to differentiate between two sources of object names,
> so anybody who creates a schema where MyColumn is not the same thing as
> myColumn is not following sensible rules for conceptual distance.

I'd agree that that is not a good design practice, but the fact remains
that they *are* different per spec.

> Would be better to make this behaviour a userset
> switchable between the exactly compliant and the more intuitive.

That's certainly not happening --- if you make any changes in the
semantics of equality of type name, it would have to be frozen no
later than initdb time, for exactly the same reasons we freeze
locale then (hint: index ordering).

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: beau hargis <beauh(at)bluefrogmobile(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Case Preservation disregarding case sensitivity?
Date: 2006-11-14 18:43:58
Message-ID: 200611141843.kAEIhwv04665@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

beau hargis wrote:
> Having installed DB2 Enterprise today and taking it for a spin, it does indeed
> behave in a similar manner. However, after reading through both
> specifications, it seems that DB2 follows more of the spec than PostgreSQL.
> The specifications state that for purpose of comparing identifiers, both
> shall be converted to upper-case. DB2 displays all identifiers in upper-case
> whereas PostgreSQL displays all identifiers in lower-case. This alone would
> be a deviation from the specification.

True. We lowercase because historically we have, and because
all-upper-case is hard to read.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>, <pgsql-sql(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SQL] Case Preservation disregarding case
Date: 2006-11-14 21:42:33
Message-ID: 1163540553.27956.109.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On Thu, 2006-11-02 at 10:51 -0500, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > We have namespaces to differentiate between two sources of object names,
> > so anybody who creates a schema where MyColumn is not the same thing as
> > myColumn is not following sensible rules for conceptual distance.
>
> I'd agree that that is not a good design practice, but the fact remains
> that they *are* different per spec.
>
> > Would be better to make this behaviour a userset
> > switchable between the exactly compliant and the more intuitive.
>
> That's certainly not happening --- if you make any changes in the
> semantics of equality of type name, it would have to be frozen no
> later than initdb time, for exactly the same reasons we freeze
> locale then (hint: index ordering).

[Re-read all of this after Bruce's post got me thinking.]

My summary of the thread, with TODO items noted:

1. PostgreSQL doesn't follow the spec, but almost does, with regard to
comparison of unquoted and quoted identifiers. DB2 does this per spec.

2. TODO: We could follow the spec, but it would need an initdb option;
some non-SQL:2003 standard PostgreSQL programs would not work as they do
now. This is considered a minor, low priority item, though.

3. TODO: We could set column headers better if we wanted to (rather
than ?column? we could use e.g. Sum_ColumnName etc)

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Jim Nasby <jim(at)nasby(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Chuck McDevitt" <cmcdevitt(at)greenplum(dot)com>, "Stephan Szabo" <sszabo(at)megazone(dot)bigpanda(dot)com>, "beau hargis" <beauh(at)bluefrogmobile(dot)com>, <pgsql-sql(at)postgresql(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-11-15 15:47:18
Message-ID: E019F6B3-0E16-4FE8-9787-14250841C3D1@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On Nov 14, 2006, at 2:42 PM, Simon Riggs wrote:
> On Thu, 2006-11-02 at 10:51 -0500, Tom Lane wrote:
>> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>>> We have namespaces to differentiate between two sources of object
>>> names,
>>> so anybody who creates a schema where MyColumn is not the same
>>> thing as
>>> myColumn is not following sensible rules for conceptual distance.
>>
>> I'd agree that that is not a good design practice, but the fact
>> remains
>> that they *are* different per spec.
>>
>>> Would be better to make this behaviour a userset
>>> switchable between the exactly compliant and the more intuitive.
>>
>> That's certainly not happening --- if you make any changes in the
>> semantics of equality of type name, it would have to be frozen no
>> later than initdb time, for exactly the same reasons we freeze
>> locale then (hint: index ordering).
>
> [Re-read all of this after Bruce's post got me thinking.]
>
> My summary of the thread, with TODO items noted:
>
> 1. PostgreSQL doesn't follow the spec, but almost does, with regard to
> comparison of unquoted and quoted identifiers. DB2 does this per spec.
>
> 2. TODO: We could follow the spec, but it would need an initdb option;
> some non-SQL:2003 standard PostgreSQL programs would not work as
> they do
> now. This is considered a minor, low priority item, though.
>
> 3. TODO: We could set column headers better if we wanted to (rather
> than ?column? we could use e.g. Sum_ColumnName etc)

Did the idea of preserving the original case and using that for
output column names, /d, etc. get shot down? I thought it would be a
useful addition...
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, beau hargis <beauh(at)bluefrogmobile(dot)com>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [SQL] Case Preservation disregarding case
Date: 2006-11-15 17:58:15
Message-ID: 455B5537.3060905@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Jim Nasby wrote:
> On Nov 14, 2006, at 2:42 PM, Simon Riggs wrote:
>> On Thu, 2006-11-02 at 10:51 -0500, Tom Lane wrote:
>>> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>>>> We have namespaces to differentiate between two sources of object
>>>> names,
>>>> so anybody who creates a schema where MyColumn is not the same
>>>> thing as
>>>> myColumn is not following sensible rules for conceptual distance.
>>>
>>> I'd agree that that is not a good design practice, but the fact remains
>>> that they *are* different per spec.
>>>
>>>> Would be better to make this behaviour a userset
>>>> switchable between the exactly compliant and the more intuitive.
>>>
>>> That's certainly not happening --- if you make any changes in the
>>> semantics of equality of type name, it would have to be frozen no
>>> later than initdb time, for exactly the same reasons we freeze
>>> locale then (hint: index ordering).
>>
>> [Re-read all of this after Bruce's post got me thinking.]
>>
>> My summary of the thread, with TODO items noted:
>>
>> 1. PostgreSQL doesn't follow the spec, but almost does, with regard to
>> comparison of unquoted and quoted identifiers. DB2 does this per spec.
>>
>> 2. TODO: We could follow the spec, but it would need an initdb option;
>> some non-SQL:2003 standard PostgreSQL programs would not work as they do
>> now. This is considered a minor, low priority item, though.
>>
>> 3. TODO: We could set column headers better if we wanted to (rather
>> than ?column? we could use e.g. Sum_ColumnName etc)
>
> Did the idea of preserving the original case and using that for output
> column names, /d, etc. get shot down? I thought it would be a useful
> addition...

I think there is broad agreement that we need to provide optional
minimally spec compliant behaviour (fold unquoted to upper case,
otherwise as now). I am not sure how invasive either the "case preserved
+ case insensitive comparison" or the "case preserved + case sensitive
comparison" option would be. I don't know that anything has been ruled
out. Until someone produces a patch or a definite design and analysis we
are a bit in the dark. Personally, I would like to see all of these as
options ;-)

I think Simon's item 3 above is a separate issue.

cheers

andrew


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To:
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-02 07:41:37
Message-ID: 45712E31.1030004@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Chuck McDevitt wrote:
> At Teradata, we certainly interpreted the spec to allow case-preserving,
> but case-insensitive, identifiers.
> Users really liked it that way

My 2 thoughts:

1: It seems like this behavior of case sensitive-or-not-identifiers
could/should be a config option -- either globally for the server,
database, or at the connection/session level. Other databases *do*
support this type of granular config of misc SQL behavior -- its
essential for shared hosting environments. Without it some users just
*cant* make the switch. Quoting all an app's identifiers -- or renaming
camel-case to underscored -- show stopper.

2: Even though the spec state different (that identifiers should be
treated as case sensitive or else folded), precedence seems to have
changed that:

a) The databases that enforce this rule are fewer, I believe. IMO SQL
is now considered even higher than a 4GL language because it use is so
widespread - laymen need to use it.

b) the fact that different identifiers of mixed case could even coexist
in a table-columns or 'AS' or 'JOIN' -- really represents a more of an
err'd design -- and a case-insen option would detect this (unlike the
current behavior). It would throw an immediate ("fail fast") runtime
exception. So I think it's *safer*. (If tbl.rowId and tbl.rowid both
exist in a table or AS identifiers, something bad _will_ happen when
someone takes over a project)

If there were a new default behavior (or just config option added), my
vote would, without a doubt, be for case-insens (yet case preserving)
mode... even when using quoting identifiers. This case sen. behavior
doesn't seem to offer any advantage/safety.

ken


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: ken(at)kensystem(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [SQL] Case Preservation disregarding case
Date: 2006-12-02 13:35:56
Message-ID: 20061202133556.GC26365@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On Sat, Dec 02, 2006 at 12:41:37AM -0700, Ken Johanson wrote:
> 1: It seems like this behavior of case sensitive-or-not-identifiers
> could/should be a config option -- either globally for the server,
> database, or at the connection/session level. Other databases *do*
> support this type of granular config of misc SQL behavior -- its
> essential for shared hosting environments. Without it some users just
> *cant* make the switch. Quoting all an app's identifiers -- or renaming
> camel-case to underscored -- show stopper.

What about option 3: use camelcase without underscares and don't quote.
Then you get case-insensetivity and it's still readable.

You're obviously talking about an app which isn't quoting identifiers,
so I'm not sure what the actual problem is.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-02 15:21:08
Message-ID: 457199E4.3010401@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Martijn van Oosterhout wrote:
> On Sat, Dec 02, 2006 at 12:41:37AM -0700, Ken Johanson wrote:
>> 1: It seems like this behavior of case sensitive-or-not-identifiers
>> could/should be a config option -- either globally for the server,
>> database, or at the connection/session level. Other databases *do*
>> support this type of granular config of misc SQL behavior -- its
>> essential for shared hosting environments. Without it some users just
>> *cant* make the switch. Quoting all an app's identifiers -- or renaming
>> camel-case to underscored -- show stopper.
>
> What about option 3: use camelcase without underscares and don't quote.
> Then you get case-insensetivity and it's still readable.
>
> You're obviously talking about an app which isn't quoting identifiers,
> so I'm not sure what the actual problem is.
>
> Have a nice day,

Yes, I do routinely use non-quoted identifiers. The problem is, that
they are case-folded (to lower in PG's case), so my camel-case does not
match. For the query to work I MUST quote identifiers hat have camel-case.

SELECT
pers."firstName",
pers.lastname,
...

Has your experience with PG been different? If so I presume you have
have found a config that allows?:

SELECT
pers.firstName,
pers.lastname,

Ken


From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: ken(at)kensystem(dot)com
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-02 16:02:55
Message-ID: 4571A3AF.6070608@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Ken Johanson skrev:

> Has your experience with PG been different? If so I presume you have
> have found a config that allows?:
>
> SELECT
> pers.firstName,
> pers.lastname,

As long as you don't create the columns using quotes you can use that
kind of names. For example

CREATE TABLE foo (BAR int);

INSERT INTO foo VALUES (42);

SELECT BaR, bar, BAR, "bar" FROM foo;

But using "Bar" wont work.

/Dennis


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-02 16:48:38
Message-ID: 4571AE66.7050304@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Dennis Bjorklund wrote:
> Ken Johanson skrev:
>
>> Has your experience with PG been different? If so I presume you have
>> have found a config that allows?:
>>
>> SELECT
>> pers.firstName,
>> pers.lastname,
>
> As long as you don't create the columns using quotes you can use that
> kind of names. For example
>
> CREATE TABLE foo (BAR int);
>
> INSERT INTO foo VALUES (42);
>
> SELECT BaR, bar, BAR, "bar" FROM foo;
>
> But using "Bar" wont work.
>
> /Dennis
>

That definitely makes sense. If one *wants* to enforce case, they should
create the identifier with quotes.

Although, since I'm using pgAdmin (III) to design tables in this case,
or creating the tables through JDBC (a convenient way to copy tables and
data from another datasource) (using the copy-paste gesture), I believe
those tools both *are* quoting identifiers that have camel case. And
that their behavior can't be overridden.

So though I might personally be comfortable with DDL commands to
re-define my existing quoted columns (as created by JDBC and pgAdmin),
other users may not be. And having to recreate and issue the DDL to use
un-quoted col names will be tedious in my case since I have ~20 tables
to import.

So my vote would remain for having a config-option to ignore case, even
on quoted identifiers..

Ken


From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: ken(at)kensystem(dot)com
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-02 17:10:42
Message-ID: 4571B392.6020303@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Ken Johanson skrev:

> Although, since I'm using pgAdmin (III) to design tables in this case,
> or creating the tables through JDBC (a convenient way to copy tables and
> data from another datasource) (using the copy-paste gesture), I believe
> those tools both *are* quoting identifiers that have camel case. And
> that their behavior can't be overridden.

I know nothing about JDBC but I've heard that pgAdmin always quote
identifiers. What you can do is to always create tables and columns
using lower case in pgadmin and then you can refer to them using any
case in your SQL (as long as you don't "Quote" the identifiers in your
SQL code).

> other users may not be. And having to recreate and issue the DDL to use
> un-quoted col names will be tedious in my case since I have ~20 tables
> to import.

Yes, it will be some work but it is at least possible to do a bunch of
renames.

> So my vote would remain for having a config-option to ignore case, even
> on quoted identifiers..

And my vote is to not have such an option. But I'm not the one who
decide so don't worry about what I think :-) I would like to have an
option to upper case the identifiers instead of lower casing them as pg
do. The sql standard say that they should be upper cased. But as far as
I know there are no plan at the moment to add such an option either.
Some time in the future I expect it to be implemented only because it's
the standard.

/Dennis


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-02 18:08:51
Message-ID: 4571C133.1060200@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Dennis Bjorklund wrote:
>> So my vote would remain for having a config-option to ignore case,
>> even on quoted identifiers..
>
> And my vote is to not have such an option. But I'm not the one who
> decide so don't worry about what I think :-) I would like to have an
> option to upper case the identifiers instead of lower casing them as pg
> do. The sql standard say that they should be upper cased. But as far as
> I know there are no plan at the moment to add such an option either.
> Some time in the future I expect it to be implemented only because it's
> the standard.
>
> /Dennis
>

Ya, I thought about that - just designing with all-lower case column
names. But then the column names are harder to read, unless I change
over to using underscore naming.

I personally cant see any benefit at all to having the DB fold case (up
or down), or enforcing case sensitive identifiers. In fact I believe
that if here were an option to support case insensitive (at least on a
session-level) that would be largely innocuous --- beneficial even since
the optimizer would now detect that someone created both a rowId, rowid,
and that indicates a possible design error (the whole fail-fast notion).

In one way I think that even allowing creation of a separate "rowid" and
"rowId" sort of violates set theory in a 4+ GL language... a "name" in
its most abstract (human) sense doesn't (shouldn't) consider the case of
its characters. Only what the characters are. A rowid is also a rowId
(or ROWID). Who really intentionally mixes them? (only 3-4GL
*programmers* who consider all-caps to represent constants in my
experience).

thoughts,
Ken


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-03 04:57:07
Message-ID: 45725923.7010500@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Martijn van Oosterhout wrote:
> On Sat, Dec 02, 2006 at 11:08:51AM -0700, Ken Johanson wrote:
>>> And my vote is to not have such an option. But I'm not the one who
>>> decide so don't worry about what I think :-) I would like to have an
>>> option to upper case the identifiers instead of lower casing them as pg
>>> do. The sql standard say that they should be upper cased. But as far as
>>> I know there are no plan at the moment to add such an option either.
>>> Some time in the future I expect it to be implemented only because it's
>>> the standard.
>
> I think it's unlikely to happen anytime soon. The primary reason being
> that then you can no longer use indexes to search the catalog. Which

I'm pretty sure this is no the case - other DBs do allow index search on
columns/identifiers regardless of their case. Probably the typical
strategy is to use a case-insensitive hashtable (fold case for the keys
before generating the hash). If its the actual data that you're
referring to in index searches, that would be a separate topic I think.

> means it has to be fixed at initdb time. And it would break a large
> number of client apps, for no particularly good reason.

I take a different opinion on this:

-*If* the option to turn on case-insenetive behavior were selectable at
the DB or session level, the existing apps could continue to use the
case sensitve mode and be completely unaffected.

-IMO turning it on *globally* would only break apps that are built
case-sensitivly *and* refer to identifiers of the same name (but mixed
case) *and* are written for PG (since PG *had* been by and large
non-portable until recently.. the addition of standard string quoting
for example)

-It would *enhance* people's ability to "bring in" apps from so many
other DBs which don't treat identifiers as case sensitive. More of a
compatibility boon than loss. Thats is a particularly good reason to me
(since I'm the one who has to issue DDL on all my camelCase columns and
recode my identifiers).

>
> Since the way identifiers are treated is user-visible, it would mean
> that apps would have to be coded to work with any setting. What would
> probably happen is that app A would only work with case-sensetive, and
> app B would only work with case-insensetive, and you end up with two
> apps that can't work on the same database.
>
> That's *bad*, we don't want to go there.

That is a good point and I'd normally agree - entice people to use the
lowest common denominator behavior and code their apps case-sensitive.
And yet, the DBs that expect case-sens are now the minority, and we have:

a) programmers who code against MySQL or MSSQL, or;
b) are laymen try to run or port an app designed on MySQL to PG

Maybe not right per se - but the more popular way of doing things
eventually wins out.

..

>
>> In one way I think that even allowing creation of a separate "rowid" and
>> "rowId" sort of violates set theory in a 4+ GL language... a "name" in
>> its most abstract (human) sense doesn't (shouldn't) consider the case of
>> its characters. Only what the characters are. A rowid is also a rowId
>> (or ROWID). Who really intentionally mixes them? (only 3-4GL
>> *programmers* who consider all-caps to represent constants in my
>> experience).
>
> The thing is, postgresql *is* case-insensetive, as is the whole SQL
> language. It not case-preserving, that's all.

Right, it's case insensitive only if you're willing to accept case
folding (down) everything that's not quoted. Not being case-preserving,
as you say.

But thats a pita to anyone coming from those "other" DBs and wants their
column names to have mixed/camel case (for readability). PG right now
*forces* them to change/adhere to an underscore naming, or to quote
*every* mixed case identifier. You MUST tolerate having your column
names stored in all-lower case, or else you must quote all of them.

Best,
Ken


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ken Johanson <pg-user(at)kensystem(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-03 05:55:38
Message-ID: 13344.1165125338@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Ken Johanson <pg-user(at)kensystem(dot)com> writes:
> Martijn van Oosterhout wrote:
>> I think it's unlikely to happen anytime soon. The primary reason being
>> that then you can no longer use indexes to search the catalog. Which

> I take a different opinion on this:

> -*If* the option to turn on case-insenetive behavior were selectable at
> the DB or session level, the existing apps could continue to use the
> case sensitve mode and be completely unaffected.

Ken, you clearly fail to understand the point being made above. This is
not something that is "selectable at the session level". It won't work
from either a functional or performance perspective.

The real bottom line, though, is that this community has little respect
for proposals that involve moving away from the SQL spec rather than
closer to it; and that's what you're asking us to do. The spec is not
at all vague about the case-sensitivity of identifiers. Now certainly
we are not doing exactly what the spec says, but what you ask is even
less like the spec's requirements.

regards, tom lane


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-03 07:32:45
Message-ID: 45727D9D.6080409@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Tom Lane wrote:
> Ken Johanson <pg-user(at)kensystem(dot)com> writes:
>
>> -*If* the option to turn on case-insenetive behavior were selectable at
>> the DB or session level, the existing apps could continue to use the
>> case sensitve mode and be completely unaffected.
>
> Ken, you clearly fail to understand the point being made above. This is
> not something that is "selectable at the session level". It won't work
> from either a functional or performance perspective.
>

Tom,

re: "clearly fail to understand": I know it's not currently selectable
at the session-level. My proposal (quote):

*If* the option ... were selectable

was to make them selectable -- this and another item that I'd proposed
in the past (The proposal for 'standard_conforming_strings' at the
session level was a step *toward* spec without harming existing apps).
Having to set standard_conforming_strings globally for example, is a
problematic setup for shared servers (ISPs etc).

A session configurable strategy is a huge feature to have in shared
environments. (BTW that strategy is both "functional and performing" for
Mysql -- several SQL conformance items can be config'd per connection).

> ...proposals that involve moving away from the SQL spec rather than
> closer to it; and that's what you're asking us to do.

If I ever proposed changing any *default* (spec) behavior, it was
inadvertent or a mistake. (prob. in making my argument that case-insens
does more harm that good for new PG users).

Of course adhering to spec is desirable, and if that mode of operation
remains *supported by default*, good. Adding per-connection or per-db
'quirks mode' (in the case of plug-in apps already built as case-insens
is innocuous).. that's a useful thing.

Put another way how many places is PG *deviating* from spec just by
providing conveniences -- would users revolt if those were dropped? For
compatibility, whether with old PG or Mysql (which is case-preserving
but case insens.), we NEED them. Not as a default behavior per se, but
as a per-session option.

Best,
Ken


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-03 16:00:52
Message-ID: 4572F4B4.4060600@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Martijn van Oosterhout wrote:

> I think you're missing the point that clients will be using languages
> that are case sensetive. Consider the following Perl code:
>
> $dbh->do("CREATE TEMP TABLE foo (Bar int4)");
> $dbh->do("INSERT INTO foo VALUES (1)");
> my $sth = $dbh->prepare("SELECT Bar FROM foo");
> $sth->execute();
> my $row = $sth->fetchrow_hashref();
> print $row->{"bar"}; # prints 1
>
> This code will works on any version of postgresql ever released, yet
> your change will break it. By setting some options you can work around
> it, but it's still a gratuitous change.
>
> Have a nice day,

I agree, that code would break -- if the option were enabled globally --
because the named-column retrieval internals of that and many existing
modules don't do case-insens lookups. They would have to be retrofitted
to support it.

So that is the reason there was an idea proposed per database or per
connection SQL conformance controls (like what Mysql has). They'd allow
other apps to elect "less standard" modes (not just this one), for the
sake of compatibility (beit old PG modules or and other DB). You code
sample would continue to work.

Ken


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ken Johanson <pg-user(at)kensystem(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-03 16:57:06
Message-ID: 200612031657.kB3Gv6408322@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Tom Lane wrote:
> Ken Johanson <pg-user(at)kensystem(dot)com> writes:
> > Martijn van Oosterhout wrote:
> >> I think it's unlikely to happen anytime soon. The primary reason being
> >> that then you can no longer use indexes to search the catalog. Which
>
> > I take a different opinion on this:
>
> > -*If* the option to turn on case-insenetive behavior were selectable at
> > the DB or session level, the existing apps could continue to use the
> > case sensitve mode and be completely unaffected.
>
> Ken, you clearly fail to understand the point being made above. This is
> not something that is "selectable at the session level". It won't work
> from either a functional or performance perspective.
>
> The real bottom line, though, is that this community has little respect
> for proposals that involve moving away from the SQL spec rather than
> closer to it; and that's what you're asking us to do. The spec is not
> at all vague about the case-sensitivity of identifiers. Now certainly
> we are not doing exactly what the spec says, but what you ask is even
> less like the spec's requirements.

I think there are two major issues here:

o if you quote identifiers that have any upper-case characters,
do it both at table creation and use
o display of non-quoted identifiers is lower-case

I think we are OK making people either always quote, or always not
quote. What we don't currently have a good answer for is people wanting
the identifiers displayed using the original case. You can use quotes
all the time of you want such display, but it is a pain. I think this
is the crux of the complaints.

Saving the case of the original creation and displaying that does work,
but then it isn't clear if the identifier needs quotes (is the
upper-case real or just for display). This gets us into even more
confusion.

Can someone can think of an answer to all this?

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-06 04:28:42
Message-ID: 457646FA.1020804@kensystem.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

Bruce Momjian wrote:
> Tom Lane wrote:
>>
>> The real bottom line, though, is that this community has little respect
>> for proposals that involve moving away from the SQL spec rather than
>> closer to it; and that's what you're asking us to do. The spec is not
>> at all vague about the case-sensitivity of identifiers. Now certainly
>> we are not doing exactly what the spec says, but what you ask is even
>> less like the spec's requirements.
>
> I think there are two major issues here:
>
> o if you quote identifiers that have any upper-case characters,
> do it both at table creation and use
> o display of non-quoted identifiers is lower-case
>
> I think we are OK making people either always quote, or always not
> quote. What we don't currently have a good answer for is people wanting
> the identifiers displayed using the original case. You can use quotes
> all the time of you want such display, but it is a pain. I think this
> is the crux of the complaints.
>
> Saving the case of the original creation and displaying that does work,
> but then it isn't clear if the identifier needs quotes (is the
> upper-case real or just for display). This gets us into even more
> confusion.
>
> Can someone can think of an answer to all this?
>

I believe there is no one answer that will solve all cases.. but one
solution that might come close is something like this (I take no credit,
others have described this before me):

PG would support two modes of operation:

1) Standard compliant, and;

2) Quirks / compatibility mode (case preserving but case-insensitive)

I believe (and may be wrong) that the #2 mode-of-operation would only
require:

a) resultset data to have it's (unquoted) identifiers to be case-folded,
and;
b) queries with (unquoted) identifiers (joins / aliases etc) would
continue in the planner to be case folded, so would work as now (and in
effect be case-insensitive).
c) the table describe syntax would return the case-preserved id name
(which it already does if quoted?), or d:
d) in addition to a), optionally send metadata describing the
case-preserved name; this might someday allow newer drivers to display
(display only, not column lookup) those names in database-designer views
(iSQL types apps)

If #a is possible, then drivers should not break, even if the DB is
config'd to use setting #2. But I don't know the low-level protocol of
PG to know if that is possible. ..

Hopefully I'm not missing any points here, please correct me if so...

I believe what I described above (while not being standard complaint per
se) is identical to how mysql and mssql work (operationally anyway)
right now.. On the other had Oracle and others work as PG does now, but
my point in discussing this, is that the first two DBs have enough
market, that offering a compatibility mode to ease the burden of porting
apps would have substantial value (I know this from experience)

Ken


From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: "Ken Johanson" <pg-user(at)kensystem(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [HACKERS] Case Preservation disregarding case
Date: 2006-12-08 14:09:10
Message-ID: bf05e51c0612080609y2201db8cva5ec3e99391fcceb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

On 12/5/06, Ken Johanson <pg-user(at)kensystem(dot)com> wrote:
>
> Bruce Momjian wrote:
> > Tom Lane wrote:
> >>
> >> The real bottom line, though, is that this community has little respect
> >> for proposals that involve moving away from the SQL spec rather than
> >> closer to it; and that's what you're asking us to do. The spec is not
> >> at all vague about the case-sensitivity of identifiers. Now certainly
> >> we are not doing exactly what the spec says, but what you ask is even
> >> less like the spec's requirements.
> >
> > I think there are two major issues here:
> >
> > o if you quote identifiers that have any upper-case characters,
> > do it both at table creation and use
> > o display of non-quoted identifiers is lower-case
> >
> > I think we are OK making people either always quote, or always not
> > quote. What we don't currently have a good answer for is people wanting
> > the identifiers displayed using the original case. You can use quotes
> > all the time of you want such display, but it is a pain. I think this
> > is the crux of the complaints.
> >
> > Saving the case of the original creation and displaying that does work,
> > but then it isn't clear if the identifier needs quotes (is the
> > upper-case real or just for display). This gets us into even more
> > confusion.
> >
> > Can someone can think of an answer to all this?
> >
>
> I believe there is no one answer that will solve all cases.. but one
> solution that might come close is something like this (I take no credit,
> others have described this before me):
>
> PG would support two modes of operation:
>
> 1) Standard compliant, and;
>
> 2) Quirks / compatibility mode (case preserving but case-insensitive)
>
> I believe (and may be wrong) that the #2 mode-of-operation would only
> require:
>
> a) resultset data to have it's (unquoted) identifiers to be case-folded,
> and;
> b) queries with (unquoted) identifiers (joins / aliases etc) would
> continue in the planner to be case folded, so would work as now (and in
> effect be case-insensitive).
> c) the table describe syntax would return the case-preserved id name
> (which it already does if quoted?), or d:
> d) in addition to a), optionally send metadata describing the
> case-preserved name; this might someday allow newer drivers to display
> (display only, not column lookup) those names in database-designer views
> (iSQL types apps)
>
> If #a is possible, then drivers should not break, even if the DB is
> config'd to use setting #2. But I don't know the low-level protocol of
> PG to know if that is possible. ..
>
> Hopefully I'm not missing any points here, please correct me if so...
>
> I believe what I described above (while not being standard complaint per
> se) is identical to how mysql and mssql work (operationally anyway)
> right now.. On the other had Oracle and others work as PG does now, but
> my point in discussing this, is that the first two DBs have enough
> market, that offering a compatibility mode to ease the burden of porting
> apps would have substantial value (I know this from experience)
>

Wasn't the whole reason this came up because someone wanted to do SELECT *
FROM some_table and have the returned columns returned in a case that was
viewable? This thread has gone on for so long I don't remember for sure and
I have not kept up with the thread very well.

If this is the "case", no pun intended, then why not create views like this:

CREATE VIEW my_view (
"Person's ID",
"Person's Name"
) AS
SELECT
person_id,
first_name || ' ' || last_name
FROM person
;

When I do a SELECT * FROM my_view, I get "Person's ID", not "PERSON'S ID" or
"person's id".

--
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================


From: Erik Aronesty <erik(at)q32(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [SQL] Case Preservation disregarding case
Date: 2009-11-05 15:45:24
Message-ID: ccd588d90911050745o4d2e3d83m869e86ae287a1263@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-sql

An update on pg_comparator as an efficient way to do master-slave
replication.

I have been using it for 2 years on a "products" table that has grown from
12,000 rows to 24,000 rows. There are 3 slaves and 1 master. It is sync'ed
every 10 minutes.

It has never failed or caused problems.

On 23039 rows, with under 100 rows changed, over a 3mbit internet
connection, and on a the sync takes 3.3 seconds, 0.94 seconds of which is
CPU time (1.86 GHZ intel dual core). Most of the time is waiting for the
network. And that could be sped up considerably with compression (maybe
5-10 times for my data)... I don't think the postgres communications
protocol considers compression an option.

I do not synchronize all the columns ... just the 15 most important ones

Average number of bytes per row is 284

Primary Key is an autoincrement integer id

Databases are all on the internet at cheap colocation centers with suppsedly
10mbit high speed connections that realistically get about 3 mbit.

I ship a backup and restore of the table every week... in case there are
tons of changes and the system burps when there are too many.... I also
schedule scripts that might make lots of changes to happen before the
dump/restore.

In my 15 years as a DBA, I have never had "replication" (which some say this
isn't, and I say that's a matter of how you define it) work so well.