Re: Convert a text list to text array? Was: Denormalizing during

Lists: pgsql-sql
From: no(dot)spam(at)address(dot)com
To: pgsql-sql(at)postgresql(dot)org
Subject: Denormalizing during select
Date: 2003-02-24 17:18:24
Message-ID: u3kk5v4i276aqv2hv22pn6jh4t3a2ev08g@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

I have two tables (A and B) in a one-to-many relationship. When
retrieving data from one table (A), I need to denormalize the tables
during a select so that values of a column in the many table (B)
appear as a list in one field of the output. E.g.:

create table one_table (
name varchar(20),
age integer,
--
constraint one_table_pk
primary key (name)
);

create table many_table (
name varchar(20),
property varchar(15),
--
constraint many_table_pk
primary key (name, property),
constraint many_table_fk
foreign key (name) references one_table (name)
);

So one_table could contain:
name age
---- ---
Mickey Mouse 50
Donald Duck 60
Goofy 45

and many_table could contain:
name property
---- --------
Mickey Mouse Small
Mickey Mouse Black
Mickey Mouse Squeaks
Donald Duck Quacks

The query I'm trying to write would have as output something like:

name age properties
---- --- ----------
Mickey Mouse 50 Small, Black, Squeaks
Donald Duck 60 Quacks
Goofy 45 <null>

Could somebody please help point me in the right direction? I imagine
this is a common need...

Thanks,

...Edmund.


From: Jeff Boes <jboes(at)nexcerpt(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-02-24 19:12:03
Message-ID: 988d394c6d80ecd1396eac17ff787470@news.teranews.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Mon, 24 Feb 2003 12:18:24 -0500, no.spam wrote:

> I have two tables (A and B) in a one-to-many relationship. When
> retrieving data from one table (A), I need to denormalize the tables
> during a select so that values of a column in the many table (B) appear
> as a list in one field of the output.

Well, the straightforward way to do this would be with a function:

create or replace function my_func(text) returns text as '
declare
name_one alias for $1;
accum text;
result record;

begin
accum := \'\';
for result in select property from many_table where name = name_one
loop
if accum != \'\'
then
accum := accum || \',\';
end if;
accum := accum || result.property;
end loop;
return accum;
end;
' language 'plpgsql';

Note that the quotes internal to the function must be escaped.

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jeff Boes <jboes(at)nexcerpt(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-02-24 22:33:49
Message-ID: 200302241433.49502.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Jeff,

> > I have two tables (A and B) in a one-to-many relationship. When
> > retrieving data from one table (A), I need to denormalize the tables
> > during a select so that values of a column in the many table (B) appear
> > as a list in one field of the output.
>
> Well, the straightforward way to do this would be with a function:

Can you post the start of this thread? I think there's a more efficient way
using custom aggregates, but somhow I never received the original question.

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: no(dot)spam(at)address(dot)com
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-02-25 03:37:41
Message-ID: m3pl5vsrmhi1rbn865huhqn29cjfmoal79@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Tue, 25 Feb 2003 01:22:42 +0000 (UTC), josh(at)agliodbs(dot)com (Josh
Berkus) wrote:

>Can you post the start of this thread? I think there's a more
>efficient way using custom aggregates, but somhow I never
>received the original question.

I'm hoping there is... somehow, using a function to loop over every
row seems a bit inefficient? Here's my original question...

I have two tables (A and B) in a one-to-many relationship. When
retrieving data from one table (A), I need to denormalize the tables
during a select so that values of a column in the many table (B)
appear as a list in one field of the output. E.g.:

create table one_table (
name varchar(20),
age integer,
--
constraint one_table_pk
primary key (name)
);

create table many_table (
name varchar(20),
property varchar(15),
--
constraint many_table_pk
primary key (name, property),
constraint many_table_fk
foreign key (name) references one_table (name)
);

So one_table could contain:
name age
---- ---
Mickey Mouse 50
Donald Duck 60
Goofy 45

and many_table could contain:
name property
---- --------
Mickey Mouse Small
Mickey Mouse Black
Mickey Mouse Squeaks
Donald Duck Quacks

The query I'm trying to write would have as output something like:

name age properties
---- --- ----------
Mickey Mouse 50 Small, Black, Squeaks
Donald Duck 60 Quacks
Goofy 45 <null>

Could somebody please help point me in the right direction? I imagine
this is a common need...

Thanks,

...Edmund.


From: Edmund Lian <no(dot)spam(at)address(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-02-25 03:44:02
Message-ID: fhpl5v86vkhq26gifm9g4cb1kh2o2b6d6r@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Jeff and Josh,

I found this example in "Practical PostgreSQL"... will it do the job?

"""
The following example defines an aggregate function named sum(), for
use with the text data type. This aggregate calls the
textcat(text,text) function built into PostgreSQL to return a
concatenated "sum" of all the text found in its input values:

booktown=# CREATE AGGREGATE sum ( BASETYPE = text,
booktown(# SFUNC = textcat,
booktown(# STYPE = text,
booktown(# INITCOND = '' );
CREATE
booktown=# SELECT sum(title || ' ') FROM books WHERE title ~ '^L';
sum
-------------------------------
Little Women Learning Python
(1 row)

"""

...Edmund.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jeff Boes <jboes(at)nexcerpt(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-02-25 18:24:19
Message-ID: 200302251024.19249.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Jeff,

> > I have two tables (A and B) in a one-to-many relationship. When
> > retrieving data from one table (A), I need to denormalize the tables
> > during a select so that values of a column in the many table (B) appear
> > as a list in one field of the output.
>
> Well, the straightforward way to do this would be with a function:

Actually, it's much easier to do this using a custom aggregate:

CREATE FUNCTION "comma_cat" (text,text) RETURNS text AS '
SELECT CASE WHEN $2 IS NULL OR $2 = '''' THEN $1
WHEN $1 IS NULL OR $1 = '''' THEN $2
ELSE $1 || '', '' || $2
END
' LANGUAGE 'sql';

CREATE AGGREGATE comma_list ( BASETYPE = text, SFUNC = comma_cat, STYPE =
text,
INITCOND = '' );

SELECT a.id, a.col1, a.col2, comma_list(b.text)
FROM a JOIN b on a.id = b.a_id
GROUP BY a.id, a.col1, a.col2

The only drawback of this approach is that you cannot order the items in the
list, but it is *much* faster than the function method that Jeff outlined.

--
-Josh Berkus
Aglio Database Solutions
San Francisco


From: Edmund Lian <no(dot)spam(at)address(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-02-25 18:48:19
Message-ID: eq8n5vcad8ut4hk3dba1g4q27c54gr6dbs@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote:

>I found this example in "Practical PostgreSQL"... will it do the job?

Answering my own question: kind of. The problem with custom aggregates
is that they need to be used with a "group by" clause, and this means
that the select cannot return columns that are not aggregates of some
kind. What I'm trying to return are rows that are a combination of
columns and aggregates.

...Edmund.


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Edmund Lian <no(dot)spam(at)address(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-03-01 17:07:04
Message-ID: 1046538423.26763.177.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Tue, 2003-02-25 at 13:48, Edmund Lian wrote:
> On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote:
>
> >I found this example in "Practical PostgreSQL"... will it do the job?
>
> Answering my own question: kind of. The problem with custom aggregates
> is that they need to be used with a "group by" clause, and this means
> that the select cannot return columns that are not aggregates of some
> kind. What I'm trying to return are rows that are a combination of
> columns and aggregates.

I've been trying to figure out how to give a running total (similar
issue I think).

key value
1 5
1 5
1 5
2 1
2 2
2 1

Query output:
key value sum to point
1 5 5
1 5 10
1 5 15
2 1 1
2 2 3
2 1 4

I think I should be able to do it with an aggregate -- but the best I've
been able to come up with is a Set Returning Function.

--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


From: Rod Taylor <rbt(at)rbt(dot)ca>
To: Edmund Lian <no(dot)spam(at)address(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-03-01 17:07:06
Message-ID: 1046538423.26763.178.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Tue, 2003-02-25 at 13:48, Edmund Lian wrote:
> On Tue, 25 Feb 2003 03:44:02 GMT, Edmund Lian wrote:
>
> >I found this example in "Practical PostgreSQL"... will it do the job?
>
> Answering my own question: kind of. The problem with custom aggregates
> is that they need to be used with a "group by" clause, and this means
> that the select cannot return columns that are not aggregates of some
> kind. What I'm trying to return are rows that are a combination of
> columns and aggregates.

I've been trying to figure out how to give a running total (similar
issue I think).

key value
1 5
1 5
1 5
2 1
2 2
2 1

Query output:
key value sum to point
1 5 5
1 5 10
1 5 15
2 1 1
2 2 3
2 1 4

I think I should be able to do it with an aggregate -- but the best I've
been able to come up with is a Set Returning Function.

--
Rod Taylor <rbt(at)rbt(dot)ca>

PGP Key: http://www.rbt.ca/rbtpub.asc


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Edmund Lian <no(dot)spam(at)address(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-03-01 17:13:32
Message-ID: 12195.1046538812@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Edmund Lian <no(dot)spam(at)address(dot)com> writes:
> Answering my own question: kind of. The problem with custom aggregates
> is that they need to be used with a "group by" clause, and this means
> that the select cannot return columns that are not aggregates of some
> kind. What I'm trying to return are rows that are a combination of
> columns and aggregates.

This strikes me as evidence of fuzzy thinking. What sets of rows are
the aggregates taken over? Which column values within those sets of
rows do you expect the non-aggregated column references to return?
If the columns aren't the ones grouped by, seems like you have an
inherently undefined result.

If you know for some reason that there will be only one unique value
of a column in a grouped row set, or you don't actually much care which
one you get, then you could use MIN() or MAX() on the column reference
to make it look like a kosher query.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Rod Taylor <rbt(at)rbt(dot)ca>
Cc: Edmund Lian <no(dot)spam(at)address(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Denormalizing during select
Date: 2003-03-01 17:53:41
Message-ID: 87d6lbkm6y.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Rod Taylor <rbt(at)rbt(dot)ca> writes:

> I've been trying to figure out how to give a running total (similar
> issue I think).

Running totals are a "hard problem". They certainly cannot be solved using
aggregates. They're similar to the ranking problem of assigning a sequential
number to each item within each group.

The problem is they share certain properties of aggregate functions, namely
that they require persistent state storage and a state transition function.
But they definitely aren't aggregate functions in that they return a value for
every record, not one for the whole group.

I'm not even clear how to write an embedded (plpgsql or perl or python)
function, since I'm not clear how to allocate space for the state that will be
available for each call on each record but independent from other calls in the
same query. You have to be able to handle two running totals at the same time.

Note that running totals are not very sql-ish. Since sql deals in unordered
sets a running total is pretty ill-defined. It would have to be calculated
after the sort operation or else require you to sort the input tables in a
subquery or something.

To write a proper well-defined sql-ish query for running totals you would have
to do the very inefficient:

select employee_id, salary,
(select count(*) from employees x where x.salary < employee.salary) as salary_rank,
(select sum(salary) from employees x where x.salary < employee.salary) as running_total
from employees
order by salary desc

Finding a way to transform that into the single-scan plan that's obviously the
right way to execute it would be really cool but seems pretty far-fetched. I
don't think any database is capable of it today.

--
greg


From: Guy Fraser <guy(at)incentre(dot)net>
To: Edmund Lian <no(dot)spam(at)address(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Convert a text list to text array? Was: Denormalizing during select
Date: 2003-03-04 17:58:21
Message-ID: 3E64E93D.9050506@incentre.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

The email at the bottom gave me an idea, but it doesn't quite work:

CREATE AGGREGATE accumulate(
BASETYPE = text,
SFUNC = textcat,
STYPE = text,
INITCOND = '' );
--
SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft
FROM pg_tables
WHERE hasindexes = 'f';

cruft
---------------------------
{pg_xactlock,pg_listener}
(1 row)

This produces somthing that looks like it could be able to be converted into
an array but I cant figure out how to make it work.

Guy

Edmund Lian wrote:
> Jeff and Josh,
>
> I found this example in "Practical PostgreSQL"... will it do the job?
>
> """
> The following example defines an aggregate function named sum(), for
> use with the text data type. This aggregate calls the
> textcat(text,text) function built into PostgreSQL to return a
> concatenated "sum" of all the text found in its input values:
>
> booktown=# CREATE AGGREGATE sum ( BASETYPE = text,
> booktown(# SFUNC = textcat,
> booktown(# STYPE = text,
> booktown(# INITCOND = '' );
> CREATE
> booktown=# SELECT sum(title || ' ') FROM books WHERE title ~ '^L';
...snip...


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Guy Fraser <guy(at)incentre(dot)net>
Cc: Edmund Lian <no(dot)spam(at)address(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Convert a text list to text array? Was: Denormalizing during select
Date: 2003-03-04 18:27:27
Message-ID: 878yvvkmwg.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


You might be able to adapt the code in contrib/intagg to handle text.
I'm not sure what would be involved in doing so.

--
greg


From: Joe Conway <mail(at)joeconway(dot)com>
To: Guy Fraser <guy(at)incentre(dot)net>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Convert a text list to text array? Was: Denormalizing during
Date: 2003-03-04 19:39:32
Message-ID: 3E6500F4.8030700@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Guy Fraser wrote:
> The email at the bottom gave me an idea, but it doesn't quite work:
>
> CREATE AGGREGATE accumulate(
> BASETYPE = text,
> SFUNC = textcat,
> STYPE = text,
> INITCOND = '' );
> --
> SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft
> FROM pg_tables
> WHERE hasindexes = 'f';
>
> cruft
> ---------------------------
> {pg_xactlock,pg_listener}
> (1 row)
>
> This produces somthing that looks like it could be able to be converted
> into an array but I cant figure out how to make it work.
>

If I understand correctly, the function array_accum() distributed with
PL/R can do exactly what you're looking for (you need the function, but
you don't necessarily need to use PL/R):

CREATE OR REPLACE FUNCTION array_accum (_name, name)
RETURNS name[]
AS '$libdir/plr','array_accum'
LANGUAGE 'C';

CREATE AGGREGATE accumulate (
sfunc = array_accum,
basetype = name,
stype = _name
);

regression=# SELECT accumulate(tablename) as cruft FROM pg_tables WHERE
tablename LIKE 'c%';
cruft
---------------------------------------
{connectby_int,connectby_text,ct,cth}
(1 row)

See:
http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html
and download at:
http://www.joeconway.com/plr/

HTH,

Joe


From: Guy Fraser <guy(at)incentre(dot)net>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: Convert a text list to text array? Was: Denormalizing during
Date: 2003-03-18 18:55:10
Message-ID: 3E776B8E.9000603@incentre.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Thanks a lot

Joe Conway wrote:

> Guy Fraser wrote:
>
>> The email at the bottom gave me an idea, but it doesn't quite work:
>>
>> CREATE AGGREGATE accumulate(
>> BASETYPE = text,
>> SFUNC = textcat,
>> STYPE = text,
>> INITCOND = '' );
>> --
>> SELECT ('{' || ltrim(accumulate(',' || tablename),',') || '}') as cruft
>> FROM pg_tables
>> WHERE hasindexes = 'f';
>>
>> cruft
>> ---------------------------
>> {pg_xactlock,pg_listener}
>> (1 row)
>>
>> This produces somthing that looks like it could be able to be
>> converted into an array but I cant figure out how to make it work.
>>
>
> If I understand correctly, the function array_accum() distributed with
> PL/R can do exactly what you're looking for (you need the function,
> but you don't necessarily need to use PL/R):
>
> CREATE OR REPLACE FUNCTION array_accum (_name, name)
> RETURNS name[]
> AS '$libdir/plr','array_accum'
> LANGUAGE 'C';
>
> CREATE AGGREGATE accumulate (
> sfunc = array_accum,
> basetype = name,
> stype = _name
> );
>
> regression=# SELECT accumulate(tablename) as cruft FROM pg_tables
> WHERE tablename LIKE 'c%';
> cruft
> ---------------------------------------
> {connectby_int,connectby_text,ct,cth}
> (1 row)
>
>
> See:
> http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html
> and download at:
> http://www.joeconway.com/plr/
>
> HTH,
>
> Joe
>
>