Re: NULL values or not?

Lists: pgsql-sql
From: "Glenn MacGregor" <gtm(at)oracom(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Transaction and cascade problem
Date: 2001-12-20 13:32:26
Message-ID: 00ad01c1895a$c3d5df80$4d00a8c0@catamount
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Hi All,

I have postgres 7.1 set up with two tables (groups, users). groups has 2 columns name, groupid where id is the primary key autoincrement and name is unique. users has 3 columns name, userid, groupid. I have a constraint on users which says that groupid must exists in the group table. I also have a cascade delete on the groups table to delete any users that are in the group I am removing. I regular (no transaction) mode everything works fine. When I start a transaction and I add a group to the group table, then delete it before a commit or rollback I get the following error:

ERROR: triggered data change violation on relation "groups"

After that the transaction must be rolledback. Any clues?

Thanks

Glenn


From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: "Glenn MacGregor" <gtm(at)oracom(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Transaction and cascade problem
Date: 2001-12-20 16:08:14
Message-ID: web-532942@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Glenn,

> I have postgres 7.1 set up with two tables (groups, users). groups
> has 2 columns name, groupid where id is the primary key autoincrement
> and name is unique. users has 3 columns name, userid, groupid. I
> have a constraint on users which says that groupid must exists in the
> group table. I also have a cascade delete on the groups table to
> delete any users that are in the group I am removing. I regular (no
> transaction) mode everything works fine. When I start a transaction
> and I add a group to the group table, then delete it before a commit
> or rollback I get the following error:
>
> ERROR: triggered data change violation on relation "groups"

Yes. For some technical reason ( I'm not clear on this ) you cannot
both add and delete the same row within a PostgreSQL transaction. This
is a known issue in 7.1.x; I do not know if it is fixed in 7.2.

In the couple of functions where I need to add rows, then drop them, I
have been taking the following steps:
1. Add new rows
2. When I'm done with the rows, flag them as invalid.
3. Put SQL in another, regularly-running (1/hour) function that deletes
all invalid-flagged rows.
Sort of a messy workaround, I know, but there it is.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: "Glenn MacGregor" <gtm(at)oracom(dot)com>, pgsql-sql(at)postgresql(dot)org
Subject: Re: Transaction and cascade problem
Date: 2001-12-20 17:09:11
Message-ID: 9261.1008868151@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

"Josh Berkus" <josh(at)agliodbs(dot)com> writes:
>> ERROR: triggered data change violation on relation "groups"

> Yes. For some technical reason ( I'm not clear on this ) you cannot
> both add and delete the same row within a PostgreSQL transaction. This
> is a known issue in 7.1.x; I do not know if it is fixed in 7.2.

Actually, we've just removed that error check in 7.2. The discussion
concluded that we'd misinterpreted the spec in treating this condition
as an error. You can simply dike out the error call (it's in
src/backend/commands/trigger.c, IIRC) if it's getting in your way in
7.1.

regards, tom lane


From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Glenn MacGregor" <gtm(at)oracom(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Transaction and cascade problem
Date: 2001-12-21 02:57:55
Message-ID: GNELIHDDFBOCMGBFGEFOMEOCCAAA.chriskl@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

> > ERROR: triggered data change violation on relation "groups"
>
> Yes. For some technical reason ( I'm not clear on this ) you cannot
> both add and delete the same row within a PostgreSQL transaction. This
> is a known issue in 7.1.x; I do not know if it is fixed in 7.2.

Hmmm...I seem to recall that it is fixed in 7.2. Correct me if I'm wrong.

Chris


From: Archibald Zimonyi <archie(at)netg(dot)se>
To: pgsql-sql(at)postgresql(dot)org
Subject: NULL values or not?
Date: 2001-12-21 09:53:38
Message-ID: Pine.LNX.4.21.0112211045420.15695-100000@valdez.netg.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Hi there,

I have a question about NULL values. Lets say that we have a world with
the following info:

FirstName
LastName
PhoneNumber

Everyone has to have a FirstName and LastName but not everyone has to have
a PhoneNumber.

Personally I don't like NULL values, so I would have created to tables for
the above world roughly like this:

CREATE TABLE person
(
id int2,
firstname text,
lastname text
);

CREATE TABLE phonenumbers
(
id int2,
phonenumber text
);

with keys and indexes and such things.

A SELECT statement to retireve all info from these two tables would look
like this:

SELECT firstname, lastname, phonenumber
FROM person
LEFT JOIN phonenumbers USING (id);

Another way of doing the same world is the following

CREATE TABLE person
(
id int2,
firstname text,
lastname text,
phonenumber text
);

with all appropriate keys etc.

A SELECT statement would look like this:

SELECT firstname, lastname, phonenumber
FROM person;

As I wrote, I usually try to avoid NULL values, thus creating my tables as
the first example. What kind of thumb rules do you use when it comes to
NULL values? Again, I am referring to my vampire database which I named a
few days ago (btw, without VACUUM the SELECT statement takes less then a
second) and I am planning on making less tables where I can. But it still
feels wrong to add NULL values when I can avoid them.

Could someone give me some input please?

Thanks in advance,

Archie


From: Jan Wieck <janwieck(at)yahoo(dot)com>
To: Archibald Zimonyi <archie(at)netg(dot)se>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 12:29:56
Message-ID: 200112211229.fBLCTvD01601@saturn.jw.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Archibald Zimonyi wrote:
>
> Hi there,
>
> I have a question about NULL values. Lets say that we have a world with
> the following info:
>
> FirstName
> LastName
> PhoneNumber
>
>
> Everyone has to have a FirstName and LastName but not everyone has to have
> a PhoneNumber.

I don't know of any particular, but I can imagine cultures
where people don't have first or last names. For sure alot of
people on this earth don't have a mailing address that'd fit
into the "usual" schema. Don't make the schema too scattered
just to avoid NULLs. Wether you like them or not doesn't
matter, use them where they are handy.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com


From: darcy(at)druid(dot)net (D'Arcy J(dot)M(dot) Cain)
To: archie(at)netg(dot)se (Archibald Zimonyi)
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 12:37:59
Message-ID: 20011221123759.5760E1A69@druid.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Thus spake Archibald Zimonyi
> I have a question about NULL values. Lets say that we have a world with
> the following info:
>
> FirstName
> LastName
> PhoneNumber
>
>
> Everyone has to have a FirstName and LastName but not everyone has to have
> a PhoneNumber.
>
> Personally I don't like NULL values, so I would have created to tables for
> the above world roughly like this:

Why don't you like NULLs? NULLs are an important part os SQL. however...

> CREATE TABLE person
> (
> id int2,
> firstname text,
> lastname text
> );
>
> CREATE TABLE phonenumbers
> (
> id int2,
> phonenumber text
> );
>
> with keys and indexes and such things.
>
> A SELECT statement to retireve all info from these two tables would look
> like this:
>
> SELECT firstname, lastname, phonenumber
> FROM person
> LEFT JOIN phonenumbers USING (id);

How about this?

CREATE TABLE person
(
firstname text DEFAULT '',
lastname text NOT NULL,
phonenumber text DEFAULT ''
);

Now you can simply list everything and the phone number will be blank if
you don't have one. Note that I do the same for firstname since I am not
as convinced as you are that everyone needs two names. With your philosopy
you would need to break out another table. Worse, some people only have
first names and some only have last names.

I also put a NOT NULL constraint on lastname to force an entry there
although you can still insert a blank.

However, don't reject NULLs out of hand. They can be very important
especially when you have foreign keys and such. It's much cleaner to
put a NULL in a field when you don't know than to have to add a special
"Unknown" entry in the foreign table.

--
D'Arcy J.M. Cain <darcy(at){druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.


From: "Aasmund Midttun Godal" <postgresql(at)envisity(dot)com>
To: archie(at)netg(dot)se
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 12:41:24
Message-ID: 20011221124124.22978.qmail@213-145-170-138.dd.nextgentel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

When it comes to database design, there are many books on the topic.

However, here are the principles I use:

1.) The schema should reflect the model (i.e. you should be able to do everything you need to, and not be able to do anything you shouldn't)
2.) It should be easy to understand, maintain and use.

In my opinion null values don't go against any of these principles. Adding all sorts of id's and unnecessary tables goes against #2. If you need to be able to register several numbers on each person, then you need two tables.

Regards,

Aasmund.

On Fri, 21 Dec 2001 10:53:38 +0100 (CET), Archibald Zimonyi <archie(at)netg(dot)se> wrote:
>
> Hi there,
>
> I have a question about NULL values. Lets say that we have a world with
> the following info:
>
> FirstName
> LastName
> PhoneNumber
>
>
> Everyone has to have a FirstName and LastName but not everyone has to have
> a PhoneNumber.
>
> Personally I don't like NULL values, so I would have created to tables for
> the above world roughly like this:
>
> CREATE TABLE person
> (
> id int2,
> firstname text,
> lastname text
> );
>
> CREATE TABLE phonenumbers
> (
> id int2,
> phonenumber text
> );
>
> with keys and indexes and such things.
>
> A SELECT statement to retireve all info from these two tables would look
> like this:
>
> SELECT firstname, lastname, phonenumber
> FROM person
> LEFT JOIN phonenumbers USING (id);
>
>
> Another way of doing the same world is the following
>
> CREATE TABLE person
> (
> id int2,
> firstname text,
> lastname text,
> phonenumber text
> );
>
> with all appropriate keys etc.
>
> A SELECT statement would look like this:
>
> SELECT firstname, lastname, phonenumber
> FROM person;
>
> As I wrote, I usually try to avoid NULL values, thus creating my tables as
> the first example. What kind of thumb rules do you use when it comes to
> NULL values? Again, I am referring to my vampire database which I named a
> few days ago (btw, without VACUUM the SELECT statement takes less then a
> second) and I am planning on making less tables where I can. But it still
> feels wrong to add NULL values when I can avoid them.
>
> Could someone give me some input please?
>
> Thanks in advance,
>
> Archie
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

Aasmund Midttun Godal

aasmund(at)godal(dot)com - http://www.godal.com/
+47 40 45 20 46


From: Archibald Zimonyi <archie(at)netg(dot)se>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 15:08:07
Message-ID: Pine.LNX.4.21.0112211549310.21359-100000@valdez.netg.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Hi all,

thanks for the replies. Most of you didn't read my post correctly though,
I know that on our earth we have people with more then one FirstName, and
some with no LastName etc. I stated however that if we had a world where
people had only 1 FirstName and 1 LastName but it was not necessary to
have a PhoneNumber would my first of my second structure be advisable to
create?

> I don't know of any particular, but I can imagine cultures
> where people don't have first or last names. For sure alot of
> people on this earth don't have a mailing address that'd fit
> into the "usual" schema. Don't make the schema too scattered
> just to avoid NULLs. Wether you like them or not doesn't
> matter, use them where they are handy.
>
The schema too scattered. That is exactly the way I like to build my
tables, the question was is it advisable? I can still get my (although
fairly low amount of data) statements to work, takes about 1 second to
start the search from the webbpage until the document is done again. That
is fairly fast and well below the time limit that people tend to find
annoying when searching for something.

So in effect my databasestructure works nicely, no NULLS and lots of
tables.

Again, I will include my schema and functions as attachments.

The question I asked, and will ask again, why should I add a column with
NULL values rather then having a new table with that info instead? When
will the NULL value be worth it. I take it there is less cost of simply
having one table scan then joining two tables with a left join in order
to retrive all rows.

What if the PhoneNumber is an ID to another table that would look like
this:

CREATE TABLE phonenumber
(
phonenumber_id int2,
phonenumber
);

with all keys etc.

If I have to retrieve FirstName, LastName and PhoneNumber I would still
have to join the two tables with a left join, since phonenumber_id has to
match in both tables.

Archie


From: "Aasmund Midttun Godal" <postgresql(at)envisity(dot)com>
To: archie(at)netg(dot)se
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 15:24:55
Message-ID: 20011221152455.4672.qmail@213-145-170-138.dd.nextgentel.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

What is the problem with nulls? why are they bad??

Regards,

Aasmund.

On Fri, 21 Dec 2001 16:08:07 +0100 (CET), Archibald Zimonyi <archie(at)netg(dot)se> wrote:
>
> Hi all,
>
> thanks for the replies. Most of you didn't read my post correctly though,
> I know that on our earth we have people with more then one FirstName, and
> some with no LastName etc. I stated however that if we had a world where
> people had only 1 FirstName and 1 LastName but it was not necessary to
> have a PhoneNumber would my first of my second structure be advisable to
> create?
>
> The schema too scattered. That is exactly the way I like to build my
> tables, the question was is it advisable? I can still get my (although
> fairly low amount of data) statements to work, takes about 1 second to
> start the search from the webbpage until the document is done again. That
> is fairly fast and well below the time limit that people tend to find
> annoying when searching for something.
>
> So in effect my databasestructure works nicely, no NULLS and lots of
> tables.
>
> Again, I will include my schema and functions as attachments.
>
> The question I asked, and will ask again, why should I add a column with
> NULL values rather then having a new table with that info instead? When
> will the NULL value be worth it. I take it there is less cost of simply
> having one table scan then joining two tables with a left join in order
> to retrive all rows.
>
> What if the PhoneNumber is an ID to another table that would look like
> this:
>
> CREATE TABLE phonenumber
> (
> phonenumber_id int2,
> phonenumber
> );
>
> with all keys etc.
>
> If I have to retrieve FirstName, LastName and PhoneNumber I would still
> have to join the two tables with a left join, since phonenumber_id has to
> match in both tables.
>
> Archie
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

Aasmund Midttun Godal

aasmund(at)godal(dot)com - http://www.godal.com/
+47 40 45 20 46


From: Archibald Zimonyi <archie(at)netg(dot)se>
To: Aasmund Midttun Godal <postgresql(at)envisity(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 15:48:50
Message-ID: Pine.LNX.4.21.0112211628390.21359-100000@valdez.netg.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

> What is the problem with nulls? why are they bad??
>
NULLs are not bad, I have never claimed them to be. I try to avoid them
personally, which tends to make my number of tables many, but with fewer
lines usually since I don't have any NULLS.

However the current structure I have takes too long time (at least I think
so) and when I used VACUUM ANALYZE I went from roughly a 1 second query to
a 1 minute query. In my opinion that should not happen.

And if I have a structure with the following data:

CREATE TABLE person
(
person_id int2,
FirstName text,
LastName text,
phone_id int2
);

CREATE TABLE phonenumbers
(
phone_id int2,
phonenumber
);

where in table person phone_id could be NULL.

I would still have to join together my tables in order to retrieve all
info with a left join.

So again, I have no problems with NULLS, I just would like someone to
share when they use them contra when they don't.

Archie


From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Archibald Zimonyi <archie(at)netg(dot)se>, Aasmund Midttun Godal <postgresql(at)envisity(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 16:55:11
Message-ID: web-533913@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Archibald,

> So again, I have no problems with NULLS, I just would like someone to
> share when they use them contra when they don't.

Thank you for bringing a theory question onto the list! We spend much
of our time on pgsql-sql focused on practical minutia. Thus several of
the respondants who siezed on your hypothetical example rather than
answering the question. I'd love to see some of the more experienced DB
theorists weigh in on this issue (Tom? Joe C.?).

Relationally speaking, NULLs are a necessary evil. You are right to
approach them with caution. I cannot count the number of times one of
my database rescue attempts has involved eliminating, sub-tabling, or
consolidating NULLable columns.

By coincidence, yesterday I posted this to PGSQL-NOVICE:

============================================================
> 3) Use of NULL
>
> It would facilitate entering data into TABLE clone if seq, qual...
> were defined as NULL even though values for these columns will/does
> exist. If I defined these columns as NULL I would not have to have
> ALL of the data together at one time, but could enter it in steps.
> Is this an ill conceived notion?

Well, this purpose is what NULL is for. Cases where data is unknown or
not yet available. That being said, allowing NULLs takes data integrity
out of the table design and moves it elsewhere in the software. If you
allow NULLs for seq, then you will need to create a data integrity
report that searches for old entries with NULL in the seq column.
Otherwise, you risk having some required data never filled in.

Also, remember that you can't JOIN on a NULL value. For example, if you
allow NULLs in, say clones.gb_id, then if you do a report on clones
JOIN
gb_accessions, the clones who are missing GB will not show up with a
blank GB, instead they will not show up at all! You can get around this
with OUTER JOINS, but OUTER JOINs are not dependable for multi-column
joins.

Instead, I recommend that everywhere it is possible, you have an actual
value that indicates why the data has not been filled in. FOr example,
you could create a gb_accession with the ID of zero (0) (and accn_no,
gi_no of 0 as well) which would indicate (to you) "gb not run yet".
This gives you more information than NULL (which could indicate a
number
of things: GB not run, GB lost, data error, program error, etc.), as
well as supporting JOINs cleanly. You could even have more than one
such value to indicate different reasons for missing info.
=====================================================

For a more in-depth discussion of NULLs and their problems and
workarounds, see Fabian Pascal's "Practical Issues in Database
Management", which has the better part of a chapter on the topic.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco


From: Archibald Zimonyi <archie(at)netg(dot)se>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 17:22:54
Message-ID: Pine.LNX.4.21.0112211807470.21359-100000@valdez.netg.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Thank you, thank you, thank you, thank you....

Your post, Josh, was very informative for me. As it is now, my Vampire
schema, which I forgot to add as an attachment, doesn't allow a single
NULL value. I do have a small amount of data, which is why things might
still work as fast as they do.

The start of my problems is still the fact that when I do VACUUM ANALYZE
my main query takes almost a minute (from around a second), which has made
me perlexed by the whole thing, since the EXPLAIN shows relatively low
values (although reading that is still an art, I have a query which
results in more rows and has higher EXPLAIN values but still takes less
time then my main query).

I have read a book which I find immensly useful and good, called:

The Practical SQL Handbook

published by Addison-Wesley. It is this book that has given me my main
view of SQL and database design. I think my database uses a lot of the
normalization rules which is why I try to avoid NULL values. An example
from my own Vampire database follows:

CREATE TABLE cards
(
card_id int2 DEFAULT NEXTVAL('cards_seq'),
cardname text NOT NULL,
PRIMARY KEY(card_id)
);

CREATE TABLE cards_names
(
card_id int2 NOT NULL,
cardname text NOT NULL,
FOREIGN KEY (card_id) REFERENCES cards
);

The table cards include the primary names of all cards. Each card is
unique, giving it a unique id. Some cards have two names, due to the fact
that there were misprints etc.

So the table cards_names includes those few cards which might have two
names.

CREATE TABLE cards_costs
(
card_id int2 NOT NULL,
cost int2 NOT NULL,
pool bool NOT NULL,
FOREIGN KEY (card_id) REFERENCES cards
);

Some cards, but not all, have a certain cost to play. Instead of having a
"cost" column in "cards" and then having NULL values where the cards have
no cost, I once again decided to create a new table connecting the cost to
a card with it's id. So cards_costs also includes less cards then do the
table cards.

It is this I based my questions on, I know why I do it the way I do, but I
would also like to hear what other people think of this solution, and
perhaps why I should use NULLS instead. As I said in an earlier post,
there is a cost of joining the tables, which is higher then having one
table with NULLS.

Once again, thanks for the replies folks,

Archie

On Fri, 21 Dec 2001, Josh Berkus wrote:

> Archibald,
>
> > So again, I have no problems with NULLS, I just would like someone to
> > share when they use them contra when they don't.
>
> Thank you for bringing a theory question onto the list! We spend much
> of our time on pgsql-sql focused on practical minutia. Thus several of
> the respondants who siezed on your hypothetical example rather than
> answering the question. I'd love to see some of the more experienced DB
> theorists weigh in on this issue (Tom? Joe C.?).
>
> Relationally speaking, NULLs are a necessary evil. You are right to
> approach them with caution. I cannot count the number of times one of
> my database rescue attempts has involved eliminating, sub-tabling, or
> consolidating NULLable columns.
>
> By coincidence, yesterday I posted this to PGSQL-NOVICE:
>
> ============================================================
> > 3) Use of NULL
> >
> > It would facilitate entering data into TABLE clone if seq, qual...
> > were defined as NULL even though values for these columns will/does
> > exist. If I defined these columns as NULL I would not have to have
> > ALL of the data together at one time, but could enter it in steps.
> > Is this an ill conceived notion?
>
> Well, this purpose is what NULL is for. Cases where data is unknown or
> not yet available. That being said, allowing NULLs takes data integrity
> out of the table design and moves it elsewhere in the software. If you
> allow NULLs for seq, then you will need to create a data integrity
> report that searches for old entries with NULL in the seq column.
> Otherwise, you risk having some required data never filled in.
>
> Also, remember that you can't JOIN on a NULL value. For example, if you
> allow NULLs in, say clones.gb_id, then if you do a report on clones
> JOIN
> gb_accessions, the clones who are missing GB will not show up with a
> blank GB, instead they will not show up at all! You can get around this
> with OUTER JOINS, but OUTER JOINs are not dependable for multi-column
> joins.
>
> Instead, I recommend that everywhere it is possible, you have an actual
> value that indicates why the data has not been filled in. FOr example,
> you could create a gb_accession with the ID of zero (0) (and accn_no,
> gi_no of 0 as well) which would indicate (to you) "gb not run yet".
> This gives you more information than NULL (which could indicate a
> number
> of things: GB not run, GB lost, data error, program error, etc.), as
> well as supporting JOINs cleanly. You could even have more than one
> such value to indicate different reasons for missing info.
> =====================================================
>
> For a more in-depth discussion of NULLs and their problems and
> workarounds, see Fabian Pascal's "Practical Issues in Database
> Management", which has the better part of a chapter on the topic.
>
> -Josh
>
> ______AGLIO DATABASE SOLUTIONS___________________________
> Josh Berkus
> Complete information technology josh(at)agliodbs(dot)com
> and data management solutions (415) 565-7293
> for law firms, small businesses fax 621-2533
> and non-profit organizations. San Francisco
>


From: "Josh Berkus" <josh(at)agliodbs(dot)com>
To: Archibald Zimonyi <archie(at)netg(dot)se>, pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-21 21:09:57
Message-ID: web-534180@davinci.ethosmedia.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Archibald,

> The start of my problems is still the fact that when I do VACUUM
> ANALYZE
> my main query takes almost a minute (from around a second), which has
> made
> me perlexed by the whole thing, since the EXPLAIN shows relatively
> low
> values (although reading that is still an art, I have a query which
> results in more rows and has higher EXPLAIN values but still takes
> less
> time then my main query).

I believe that this is a seperate problem, not at all related to your
use of NULLs or not. I have queries in some of my databases that join
as many as 18 different tables (some multiple times), and they run fine
and VACUUM speeds them up, as it's supposed to. So I believe that you
have a configuration, system, or version problem that is unrelated to
your schema.

To get help from the core developers, I suggest that you post, in a
seperate e-mail:
1. Your Postgres version (and if it's not 7.1.3 you should maybe
upgrade)
2. Your schema
3. The query that behaves oddly
4. Explain & time before VACUUM
5. Explain & time after VACUUM
6. Your hardware platform data.

> I have read a book which I find immensly useful and good, called:
>
> The Practical SQL Handbook
>
> published by Addison-Wesley. It is this book that has given me my
> main
> view of SQL and database design.

I would very much appreciate it if you sent me a mini-review of the book
for Techdocs Book Reviews. See:
http://techdocs.postgresql.org/bookreviews.php
for format, etc.

> It is this I based my questions on, I know why I do it the way I do,
> but I
> would also like to hear what other people think of this solution, and
> perhaps why I should use NULLS instead. As I said in an earlier post,
> there is a cost of joining the tables, which is higher then having
> one
> table with NULLS.

The cost should not be appreciably higher for your size database.

-Josh

______AGLIO DATABASE SOLUTIONS___________________________
Josh Berkus
Complete information technology josh(at)agliodbs(dot)com
and data management solutions (415) 565-7293
for law firms, small businesses fax 621-2533
and non-profit organizations. San Francisco


From: "Wei Weng" <wweng(at)kencast(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: NULL values or not?
Date: 2001-12-21 21:37:52
Message-ID: NEBBLAIMPMNDAPBMNGJNCEHKCCAA.wweng@kencast.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Forgive my ignorance, but what exactly is a "multi-column" join?
<snip>
> gb_accessions, the clones who are missing GB will not show up with a
> blank GB, instead they will not show up at all! You can get around this
> with OUTER JOINS, but OUTER JOINs are not dependable for multi-column
> joins.
<snip>

I also used a lot of NULLs in my table schemas though never in the
primary(foreign) keys. Is that bad?

Thank you.

==========
Wei Weng
Network Software Engineer
KenCast Inc.


From: "Andrew G(dot) Hammond" <drew(at)xyzzy(dot)dhs(dot)org>
To: Archibald Zimonyi <archie(at)netg(dot)se>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-22 06:32:54
Message-ID: 1009002775.1903.3.camel@xyzzy
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


> As I wrote, I usually try to avoid NULL values, thus creating my tables as
> the first example. What kind of thumb rules do you use when it comes to
> NULL values? Again, I am referring to my vampire database which I named a
> few days ago (btw, without VACUUM the SELECT statement takes less then a
> second) and I am planning on making less tables where I can. But it still
> feels wrong to add NULL values when I can avoid them.

Using a seperate table for phone numbers makes sense if you need to keep
track of multiple phone numbers per person or multiple people per phone
number. Otherwise it's pointless. You add unnecessary and useless
complexity to your system. Null values to indicate null data make
perfect sense. I'm a strong believer in the KISS principal:

Keep It Short & Simple.

--
Andrew G. Hammond mailto:drew(at)xyzzy(dot)dhs(dot)org
http://xyzzy.dhs.org/~drew/
56 2A 54 EF 19 C0 3B 43 72 69 5B E3 69 5B A1 1F
613-389-5481
5CD3 62B0 254B DEB1 86E0 8959 093E F70A B457 84B1
"To blow recursion you must first blow recur" -- me


From: Archibald Zimonyi <archie(at)netg(dot)se>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: NULL values or not?
Date: 2001-12-22 09:30:31
Message-ID: Pine.LNX.4.21.0112221030100.5131-100000@valdez.netg.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

> To get help from the core developers, I suggest that you post, in a
> seperate e-mail:
> 1. Your Postgres version (and if it's not 7.1.3 you should maybe
> upgrade)
> 2. Your schema
> 3. The query that behaves oddly
> 4. Explain & time before VACUUM
> 5. Explain & time after VACUUM
> 6. Your hardware platform data.
>
Will do, thanks again for the input.

Archie


From: "Frank Morton" <fmorton(at)base2inc(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: JOIN question
Date: 2001-12-22 19:26:50
Message-ID: 000d01c18b1e$9ba84320$8455e5ce@edison
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

I'm looking for the most portable way to do the following,
given these two tables:

Table 1 is called "content" which contains an integer "id" column.
The value of this id is "1" for this example.

Table 2 is called "protection" and keeps track of who can read
and write the content object, so this table may contain multiple
protection settings for a single content object.

Simplifying, to consider my problem, the protection table contains
a column called "contentId", connecting it to the content table id.
This table has three rows in it for three groups that can read it
with contentId set to "1".

If I do the query:

select Content.* from Content,Protection
where (Content.id = Protection.contentId);

I get three rows back, corresponding to each group that has
access to the content. However, I would like to get back
just one row, corresponding to the content that fits the desired
protections.

How do I do this?

Thanks for answering this question for the relative sql newbie.


From: Ian Barwick <barwick(at)gmx(dot)net>
To: "Frank Morton" <fmorton(at)base2inc(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: JOIN question
Date: 2001-12-22 22:57:34
Message-ID: 200112222355.AAA29294@post.webmailer.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Saturday 22 December 2001 20:26, Frank Morton wrote:
> I'm looking for the most portable way to do the following,
> given these two tables:
>
> Table 1 is called "content" which contains an integer "id" column.
> The value of this id is "1" for this example.
>
> Table 2 is called "protection" and keeps track of who can read
> and write the content object, so this table may contain multiple
> protection settings for a single content object.
>
> Simplifying, to consider my problem, the protection table contains
> a column called "contentId", connecting it to the content table id.
> This table has three rows in it for three groups that can read it
> with contentId set to "1".
>
> If I do the query:
>
> select Content.* from Content,Protection
> where (Content.id = Protection.contentId);
>
> I get three rows back, corresponding to each group that has
> access to the content. However, I would like to get back
> just one row, corresponding to the content that fits the desired
> protections.

IIUC (if I understand correctly) you want to get each row of table
"Content" which is referenced at least once from "Protection"?

If so SELECT DISTINCT will be your friend:

SELECT DISTINCT Content.*
FROM Content,Protection
WHERE (Content.id = Protection.contentId)

or in ANSI rather than theta join style:

SELECT DISTINCT Content.*
FROM Content
INNER JOIN Protection ON (Content.id=Protection.contentId)

You could achieve the same result with a subselect, although
it may be slower:

SELECT *
FROM Content
WHERE Content.id IN (SELECT contentID FROM Protection)

HTH

Ian Barwick


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Frank Morton <fmorton(at)base2inc(dot)com>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: JOIN question
Date: 2001-12-22 23:27:07
Message-ID: Pine.LNX.4.30.0112222245230.1404-100000@peter.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Frank Morton writes:

> If I do the query:
>
> select Content.* from Content,Protection
> where (Content.id = Protection.contentId);
>
> I get three rows back, corresponding to each group that has
> access to the content. However, I would like to get back
> just one row, corresponding to the content that fits the desired
> protections.

You didn't specify the "desired protection" anywhere within your query.

Possibly you want something like

SELECT * FROM content, protection WHERE content.id = protection.contentid
AND protection.name = 'something';

This will give you the content that "something" has access to, which may
be zero, one, or many rows.

--
Peter Eisentraut peter_e(at)gmx(dot)net