Re: Storing and querying boolean fields

Lists: pgsql-general
From: "Richard Brown" <rbrown(at)openratings(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Storing and querying boolean fields
Date: 2008-01-09 23:23:04
Message-ID: 009901c85316$95c83110$5b06be0a@ad.openratings.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi All,

First, some background:

- We are using PostgreSQL 7.3.4, and am locked into this version. I would
upgrade if I could, but the decision is not mine.
- The table referred to below is 120+ million rows, and has a width of 27
columns (15 smallints, 5 integers, 4 dates, 1 integer[], 1 single char text
field, and 1 double precision). This table already has 3 multi-field
indexes, as well as a primary key index.

I need to add many boolean flags to my database that need to be retrieved by
primary key, as well as be searched upon with decent speed. By "many", I
mean about 30 booleans total. The added bonus is that some of these values
need to be searched in an OR manner, while others need to be searched as an
AND. For example, I will need to do a search that looks something like:
select * from table where A && B && C && (D || E) && F

Our first thought was to "pack" these booleans into integer fields to save
space and help with searching (so we thought). The design calls for a
smallint field and an integer field to be added to the table. UPDATE/INSERT
would be handled by calculating the values of the two fields outside of the
database. Searching would be accomplished by taking the bits that were
requested, creating a bitmask corresponding to the 2 fields, and then doing
something like: select * from table where (field & BITMASK) = BITMASK. We
felt that putting this information in this table rather than a separate one
would help performance by eliminating a JOIN, and it was assumed (ugh) that
if we built an index on those two fields, that we would be able to use those
indexes for searching in the aforementioned manner.

Unfortunately we have come up with the following problems with this
approach:
1. Postgres will only use 1 index per table, so putting these fields in the
same table as several other searchable fields actually hurts us more than
helps.
2. We haven't been able to get any index scans in preliminary testing (all
seq. scans).

After all of that background, my actual question is: what is the best way
to effectively store and query this type of data? It seems more and more
that our initial design is just flat-out wrong, but we are at a loss as to
what the other options are.

Thanks very much in advance, and I apologize for the length of the message,
but I felt background info would be important.

Regards,
Richard


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Richard Brown" <rbrown(at)openratings(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing and querying boolean fields
Date: 2008-01-10 00:09:34
Message-ID: 21546.1199923774@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Richard Brown" <rbrown(at)openratings(dot)com> writes:
> - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> upgrade if I could, but the decision is not mine.

They won't even let you update to 7.3.something-reasonably-current ?

Resign. Go find a job with a boss whose IQ is above room temperature.
You want to be out of there before the finger-pointing starts, which
it certainly will when one of the many fixed-since-7.3.4 bugs eats
your database.

> I need to add many boolean flags to my database that need to be retrieved by
> primary key, as well as be searched upon with decent speed. By "many", I
> mean about 30 booleans total. The added bonus is that some of these values
> need to be searched in an OR manner, while others need to be searched as an
> AND. For example, I will need to do a search that looks something like:
> select * from table where A && B && C && (D || E) && F

As for the problem at hand, the contrib/intarray module would probably
fill the bill nicely, though I dunno how far I'd trust either it or the
GIST index code back in 7.3. You'd represent the flags as an array of
the numbers of the flags that are ON, and search it with the @@
operator:
int[] @@ query_int - returns TRUE if array satisfies query (like '1&(2|3)')
This on-disk representation might be a bit larger than you were hoping
for, but it'd not break if you suddenly found you needed 33 flags not 32...

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Richard Brown <rbrown(at)openratings(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Storing and querying boolean fields
Date: 2008-01-10 01:09:51
Message-ID: Pine.GSO.4.64.0801091917010.26051@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 9 Jan 2008, Richard Brown wrote:

> - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> upgrade if I could, but the decision is not mine.

Just make sure you CYA so when said server eats itself the decision maker
can't point the finger at you. Give them a copy of a dated written
statement from you with text like "PostgreSQL 7.3.4 has several major bugs
that are known to cause data corruption" and quote some of the choicer
items in the trail of release notes at
http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are
in 7.3.7, 7.3.12, 7.3.19.

The bonus to this approach, where you disclaim the whole platform as known
to be unreliable, is that if you ever make a huge mistake you can shrug
off the blame by saying the old database that you told them was buggy, but
weren't allowed to upgrade, ate the data and it was totally not your
fault.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Richard Brown <rbrown(at)openratings(dot)com>
Subject: Re: Storing and querying boolean fields
Date: 2008-01-10 03:19:05
Message-ID: 200801092219.05829.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wednesday 09 January 2008 20:09, Greg Smith wrote:
> On Wed, 9 Jan 2008, Richard Brown wrote:
> > - We are using PostgreSQL 7.3.4, and am locked into this version. I would
> > upgrade if I could, but the decision is not mine.
>
> Just make sure you CYA so when said server eats itself the decision maker
> can't point the finger at you. Give them a copy of a dated written
> statement from you with text like "PostgreSQL 7.3.4 has several major bugs
> that are known to cause data corruption" and quote some of the choicer
> items in the trail of release notes at
> http://www.postgresql.org/docs/7.3/static/release.html Some juicy ones are
> in 7.3.7, 7.3.12, 7.3.19.
>
> The bonus to this approach, where you disclaim the whole platform as known
> to be unreliable, is that if you ever make a huge mistake you can shrug
> off the blame by saying the old database that you told them was buggy, but
> weren't allowed to upgrade, ate the data and it was totally not your
> fault.
>

It would probably also be nice to reference the last security announcement
(http://www.postgresql.org/about/news.905) which includes this relevant
message:

"Minor release 7.3.21 for PostgreSQL version 7.3 will be the last update to
the 7.3 branch. As version 7.3 is now over five years old, the community will
no longer release patches for it after today's release. Users of version 7.3
are encouraged to upgrade to a more current version as soon as possible, or
to seek support from a commercial support vendor who is willing to continue
backpatching for them."

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL