Re: Casting issues with domains

From: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Casting issues with domains
Date: 2014-12-08 10:06:07
Message-ID: 5485780F.7080200@dalibo.com
Lists: pgsql-hackers

Hello all,

We experienced some casting issues with domains. We hit the problem
while querying the information_schema btw, but here is a simpler test
case:

postgres=# create table test1 (a text);
CREATE TABLE
postgres=# insert into test1 select generate_series(1,100000);
INSERT 0 100000
postgres=# create index idx1 on test1(a);
CREATE INDEX
postgres=# analyze test1 ;
ANALYZE
postgres=# explain select * from test1 where a = 'toto';
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using idx1 on test1 (cost=0.29..8.31 rows=1 width=5)
Index Cond: (a = 'toto'::text)
(2 rows)

Now we create a tstdom domain and cast the a column to tstdom in the
view definition:
postgres=# create domain tstdom as text;
CREATE DOMAIN
postgres=# create view test2 as select a::tstdom from test1 ;
CREATE VIEW
postgres=# explain select * from test2 where a='toto';
QUERY PLAN
----------------------------------------------------------
Seq Scan on test1 (cost=0.00..1693.00 rows=500 width=5)
Filter: (((a)::tstdom)::text = 'toto'::text)
(2 rows)

As you can see, a is cast to tstdom and then back to text. These casts
prevent the optimizer from choosing an index scan to retrieve the data.
The casts are however strictly equivalent and should not prevent the
optimizer from using indexes.

The same problem also appears in the information_schema views, as every
object name is cast to information_schema.sql_identifier. Even if this
domain is declared over name, no index will be used because of this cast.

Shouldn't the planner simplify the casts when it's possible?

Regards,
Thomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Casting issues with domains
Date: 2014-12-08 15:18:42
Message-ID: 25277.1418051922@sss.pgh.pa.us
Lists: pgsql-hackers

Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com> writes:
> postgres=# explain select * from test2 where a='toto';
> QUERY PLAN
> ----------------------------------------------------------
> Seq Scan on test1 (cost=0.00..1693.00 rows=500 width=5)
> Filter: (((a)::tstdom)::text = 'toto'::text)
> (2 rows)

> As you can see, a is cast to tstdom and then back to text. These casts
> prevent the optimizer from choosing an index scan to retrieve the data.
> The casts are however strictly equivalent and should not prevent the
> optimizer from using indexes.

No, they are not equivalent. The optimizer can't simply drop the
cast-to-domain, because that cast might result in a runtime error due
to a domain CHECK constraint violation. (This is true even if no such
constraint exists at planning time, unfortunately. If we had a
mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe the
no-constraints case could be handled better, but we don't; and adding
one would also imply adding more locks around domain usage, so it's not
all that attractive to do it.)
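
For instance, with Thomas's test case (just a sketch; the constraint
name is arbitrary):

insert into test1 values ('toto');
alter domain tstdom add constraint no_toto check (value <> 'toto');
select * from test2;
-- ERROR:  value for domain tstdom violates check constraint "no_toto"
-- If the planner had dropped the a::tstdom cast from the view, this
-- error would silently disappear.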

The short answer is that SQL domains are not zero-cost type aliases.
Perhaps there would be value in having a feature that *is* a zero-cost
alias, but it wouldn't be a domain.

regards, tom lane


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Casting issues with domains
Date: 2014-12-08 21:43:14
Message-ID: 54861B72.9050804@BlueTreble.com
Lists: pgsql-hackers

On 12/8/14, 9:18 AM, Tom Lane wrote:
> The short answer is that SQL domains are not zero-cost type aliases.
> Perhaps there would be value in having a feature that *is* a zero-cost
> alias, but it wouldn't be a domain.

Note that you can actually re-use the support functions of one data type to create a new one. So if you wanted a special type called "document" that actually behaved the same as text, you could do that fairly easily (though not as easily as creating a domain).
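
For example, something along these lines should work (a rough, untested
sketch: it borrows text's built-in I/O functions through LANGUAGE
internal, so it needs superuser, and "document" is just a placeholder
name):

create type document;

create function document_in(cstring) returns document
    as 'textin' language internal strict immutable;
create function document_out(document) returns cstring
    as 'textout' language internal strict immutable;

create type document (
    input = document_in,
    output = document_out,
    like = text,
    collatable = true
);

-- binary-compatible casts so text's operators and functions stay usable
create cast (document as text) without function as implicit;
create cast (text as document) without function as assignment;

Indexing a document column would still need an operator class on top of
that, which is part of why it's more work than a domain.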

If we were going to expend energy here, I suspect it would be more useful to look at ways of creating new types without requiring C. C isn't an option in many (even most) environments in today's "cloud" world, aside from the intimidation factor. There are comments in the code that hypothesize about making cstring a full type; that might be all that's needed.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Casting issues with domains
Date: 2014-12-09 10:42:55
Message-ID: 5486D22F.2050105@dalibo.com
Lists: pgsql-hackers

On 08/12/2014 16:18, Tom Lane wrote:
> Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com> writes:
>> postgres=# explain select * from test2 where a='toto';
>> QUERY PLAN
>> ----------------------------------------------------------
>> Seq Scan on test1 (cost=0.00..1693.00 rows=500 width=5)
>> Filter: (((a)::tstdom)::text = 'toto'::text)
>> (2 rows)
>
>> As you can see, a is cast to tstdom and then back to text. These casts
>> prevent the optimizer from choosing an index scan to retrieve the data.
>> The casts are however strictly equivalent and should not prevent the
>> optimizer from using indexes.
>
> No, they are not equivalent. The optimizer can't simply drop the
> cast-to-domain, because that cast might result in a runtime error due
> to a domain CHECK constraint violation. (This is true even if no such
> constraint exists at planning time, unfortunately. If we had a
> mechanism to force replanning at ALTER DOMAIN ADD CONSTRAINT, maybe the
> no-constraints case could be handled better, but we don't; and adding
> one would also imply adding more locks around domain usage, so it's not
> all that attractive to do it.)
>
> The short answer is that SQL domains are not zero-cost type aliases.
> Perhaps there would be value in having a feature that *is* a zero-cost
> alias, but it wouldn't be a domain.

I agree regarding the feature for zero-cost aliases. It would speed up
access to the catalogs through the information_schema, for example.

Thanks for your answer. There's some room for improvement for sure, but
it's not as easy as it seems.

Regards,
Thomas


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Casting issues with domains
Date: 2014-12-10 18:41:27
Message-ID: 889339290.4895779.1418236887637.JavaMail.yahoo@jws100160.mail.ne1.yahoo.com
Lists: pgsql-hackers

Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com> wrote:

> postgres=# create table test1 (a text);
> CREATE TABLE
> postgres=# insert into test1 select generate_series(1,100000);
> INSERT 0 100000
> postgres=# create index idx1 on test1(a);
> CREATE INDEX
> postgres=# analyze test1 ;
> ANALYZE
> postgres=# explain select * from test1 where a = 'toto';
> QUERY PLAN
> -----------------------------------------------------------------------
> Index Only Scan using idx1 on test1 (cost=0.29..8.31 rows=1 width=5)
> Index Cond: (a = 'toto'::text)
> (2 rows)
>
> Now we create a tstdom domain and cast the a column to tstdom in the
> view definition :
> postgres=# create domain tstdom as text;
> CREATE DOMAIN
> postgres=# create view test2 as select a::tstdom from test1 ;
> CREATE VIEW
> postgres=# explain select * from test2 where a='toto';
> QUERY PLAN
> ----------------------------------------------------------
> Seq Scan on test1 (cost=0.00..1693.00 rows=500 width=5)
> Filter: (((a)::tstdom)::text = 'toto'::text)
> (2 rows)
>
> As you can see, a is cast to tstdom and then back to text. These casts
> prevent the optimizer from choosing an index scan to retrieve the data.
> The casts are however strictly equivalent and should not prevent the
> optimizer from using indexes.

You can create an index to be used for searching using the domain.
Following the steps in your example, you can run this:

postgres=# create index idx2 on test1 ((a::tstdom));
CREATE INDEX
postgres=# vacuum analyze test1;
VACUUM
postgres=# explain select * from test2 where a='toto';
QUERY PLAN
------------------------------------------------------------------
Index Scan using idx2 on test1 (cost=0.29..8.31 rows=1 width=5)
Index Cond: (((a)::tstdom)::text = 'toto'::text)
(2 rows)

It's even easier if "a" is defined to be a member of the domain in
the original table:

postgres=# create domain tstdom as text;
CREATE DOMAIN
postgres=# create table test1 (a tstdom);
CREATE TABLE
postgres=# insert into test1 select generate_series(1,100000);
INSERT 0 100000
postgres=# create index idx1 on test1(a);
CREATE INDEX
postgres=# analyze test1 ;
ANALYZE
postgres=# explain select * from test1 where a = 'toto';
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using idx1 on test1 (cost=0.29..8.31 rows=1 width=5)
Index Cond: (a = 'toto'::text)
(2 rows)

postgres=# create view test2 as select a::tstdom from test1 ;
CREATE VIEW
postgres=# explain select * from test2 where a='toto';
QUERY PLAN
-----------------------------------------------------------------------
Index Only Scan using idx1 on test1 (cost=0.29..8.31 rows=1 width=5)
Index Cond: (a = 'toto'::text)
(2 rows)

It's kinda hard for me to visualize where it makes sense to define
the original table column as the bare type but use a domain when
referencing it in the view.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Casting issues with domains
Date: 2014-12-10 23:01:09
Message-ID: 31143.1418252469@sss.pgh.pa.us
Lists: pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> It's kinda hard for me to visualize where it makes sense to define
> the original table column as the bare type but use a domain when
> referencing it in the view.

As far as that goes, I think the OP was unhappy about the performance
of the information_schema views, which in our implementation do exactly
that so that the exposed types of the view columns conform to the SQL
standard, even though the underlying catalogs use PG-centric types.
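
(For instance, information_schema.tables is built roughly like

SELECT CAST(nc.nspname AS sql_identifier) AS table_schema,
       CAST(c.relname AS sql_identifier) AS table_name,
       ...
  FROM pg_namespace nc JOIN pg_class c ON nc.oid = c.relnamespace
  ...

so a filter on table_name ends up comparing
((relname)::sql_identifier)::text, the same double-cast pattern as in
the simplified example.)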

I don't believe that that's the only reason why the performance of the
information_schema views tends to be sucky, but it's certainly a reason.

regards, tom lane


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Casting issues with domains
Date: 2014-12-10 23:23:46
Message-ID: 1702807885.4979365.1418253826765.JavaMail.yahoo@jws100136.mail.ne1.yahoo.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:

>> It's kinda hard for me to visualize where it makes sense to define
>> the original table column as the bare type but use a domain when
>> referencing it in the view.
>
> As far as that goes, I think the OP was unhappy about the performance
> of the information_schema views, which in our implementation do exactly
> that so that the exposed types of the view columns conform to the SQL
> standard, even though the underlying catalogs use PG-centric types.
>
> I don't believe that that's the only reason why the performance of the
> information_schema views tends to be sucky, but it's certainly a reason.

Is that schema too "edge case" to justify some functional indexes
on the cast values on the underlying catalogs? (I'm inclined to
think so, but it seemed like a question worth putting out
there....)

Or, since these particular domains are known, is there any sane way
to "special-case" these to allow the underlying types to work?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Casting issues with domains
Date: 2014-12-10 23:46:35
Message-ID: 32285.1418255195@sss.pgh.pa.us
Lists: pgsql-hackers

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> As far as that goes, I think the OP was unhappy about the performance
>> of the information_schema views, which in our implementation do exactly
>> that so that the exposed types of the view columns conform to the SQL
>> standard, even though the underlying catalogs use PG-centric types.
>>
>> I don't believe that that's the only reason why the performance of the
>> information_schema views tends to be sucky, but it's certainly a reason.

> Is that schema too "edge case" to justify some functional indexes
> on the cast values on the underlying catalogs? (I'm inclined to
> think so, but it seemed like a question worth putting out
> there....)

We don't support functional indexes on system catalogs, so whether it'd
be justified is sorta moot. On the whole though I'm inclined to agree
that the information_schema views aren't used enough to justify adding
overhead to system-catalog updates, even if the pieces for that all
existed.

> Or, since these particular domains are known, is there any sane way
> to "special-case" these to allow the underlying types to work?

I don't particularly care for a kluge solution here.

I notice that recent versions of the SQL spec contain the notion of a
"distinct type", which is a user-defined type that is representationally
identical to some base type but has its own name, and comes equipped with
assignment-grade casts to and from the base type (which in PG terms would
be binary-compatible casts, though the spec doesn't require that).
It seems like this might be intended to be the sort of "zero cost type
alias" we were talking about, except that the SQL committee seems to have
got it wrong by not specifying the cast-to-base-type as being implicit.
Which ISTM you would want so that operators/functions on the base type
would apply automatically to the distinct type. But perhaps we could
extend the spec with some option to CREATE TYPE to allow the cast to come
out that way.

Or in short, maybe we should try to replace the domains used in the
current information_schema with distinct types.
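
For reference, the spec's syntax for a distinct type looks roughly like
this (a sketch; current Postgres doesn't accept it, and I'm using our
name type just for illustration):

CREATE TYPE sql_identifier AS name FINAL;

The missing piece, as said above, would be an option to make the cast
back to the base type implicit, so that name's operators and indexes
apply directly.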

regards, tom lane


From: Thomas Reiss <thomas(dot)reiss(at)dalibo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Casting issues with domains
Date: 2014-12-11 08:41:08
Message-ID: 548958A4.4050902@dalibo.com
Lists: pgsql-hackers

On 11/12/2014 00:46, Tom Lane wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> As far as that goes, I think the OP was unhappy about the performance
>>> of the information_schema views, which in our implementation do exactly
>>> that so that the exposed types of the view columns conform to the SQL
>>> standard, even though the underlying catalogs use PG-centric types.
>>>
>>> I don't believe that that's the only reason why the performance of the
>>> information_schema views tends to be sucky, but it's certainly a reason.
>
>> Is that schema too "edge case" to justify some functional indexes
>> on the cast values on the underlying catalogs? (I'm inclined to
>> think so, but it seemed like a question worth putting out
>> there....)
>
> We don't support functional indexes on system catalogs, so whether it'd
> be justified is sorta moot. On the whole though I'm inclined to agree
> that the information_schema views aren't used enough to justify adding
> overhead to system-catalog updates, even if the pieces for that all
> existed.
>
>> Or, since these particular domains are known, is there any sane way
>> to "special-case" these to allow the underlying types to work?
>
> I don't particularly care for a kluge solution here.
>
> I notice that recent versions of the SQL spec contain the notion of a
> "distinct type", which is a user-defined type that is representationally
> identical to some base type but has its own name, and comes equipped with
> assignment-grade casts to and from the base type (which in PG terms would
> be binary-compatible casts, though the spec doesn't require that).
> It seems like this might be intended to be the sort of "zero cost type
> alias" we were talking about, except that the SQL committee seems to have
> got it wrong by not specifying the cast-to-base-type as being implicit.
> Which ISTM you would want so that operators/functions on the base type
> would apply automatically to the distinct type. But perhaps we could
> extend the spec with some option to CREATE TYPE to allow the cast to come
> out that way.
>
> Or in short, maybe we should try to replace the domains used in the
> current information_schema with distinct types.

That's interesting and could easily solve the problem.

To give some context: for some reason, Drupal queries the
information_schema views before displaying some pages. Our customer has
many tables (approx. 60000, organised "à la Oracle" with one schema per
database user), so the seq scan against pg_class takes ~50ms while the
very same scan without the cast takes less than 1ms.

Here is an example of the queries used:
SELECT column_name, data_type, column_default
  FROM information_schema.columns
 WHERE table_schema = 'one_schema'
   AND table_name = 'one_table'
   AND ( data_type = 'bytea'
         OR ( numeric_precision IS NOT NULL
              AND column_default::text LIKE '%nextval%' ) );
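
For comparison, the effect can be reproduced directly on the catalog
(a rough sketch; plans and timings will of course vary):

explain select relname from pg_class
 where relname::information_schema.sql_identifier::text = 'one_table';
-- typically a seq scan: the cast hides relname from the index

explain select relname from pg_class
 where relname = 'one_table';
-- can use the index on relname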

Regards