Re: Populating large DB from Perl script

From: andy <andy(at)squeakycode(dot)net>
To: Kynn Jones <kynnjo(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Populating large DB from Perl script
Date: 2007-11-02 20:33:20
Message-ID: 472B8990.4050700@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kynn Jones wrote:
>> So... not really an answer (other than I used a stored proc)
>
> Actually, I'm interested in your solution.
>
> Just to make sure I understood what you did: you bulk-populated (i.e.
> with $dbh->do('COPY...'), $dbh->pg_putline(...), $dbh->pg_endcopy) the
> referring tables, with their fkey constraints disabled; then you ran
> stored procedure(s) that went through these referring tables and
> filled in the missing fkeys; and finally you activated their fkey
> constraints. Is this right?
>
> I'm very much of a stored procedures dunce, so if the code for your
> stored procedure is "postable", please do.
>
> TIA,
>
> kj
>
> PS: As an aside to the list, as a programmer, when I'm starting out in
> language, I learn more than I can say from reading source code written
> by the experts, but for some reason I have had a hard time coming
> across expertly written PostgreSQL stored procedures, other than the
> occasionally didactic snippet in the docs. All these expertly-written
> procedures seem to be very STORED away indeed! If, on the contrary,
> it's just the case that I haven't looked in the right places, please
> hurl me a cluebrick!

I'm afraid it was nothing that super. My scripts where to insert a
bunch of test data into a db so I could play with some very large db's.

I have a people table, and a car table. I wanted to add ownership of
cars to people, but thought nobody should own more than 5 cars (to make
sure my "random" generator didnt bunch too many cars per person)

My perl looks like:
my $carcount = new BerkeleyDB::Btree( -Filename => 'carcount.dbm',
-Flags => DB_CREATE ) or die "Cannot open file: $!";

<..later...>
if (rand() > 0.10) {
my $person = int(rand($maxperson));
my $ok = 1;

$i = 0;
$carcount->db_get($person, $i);

if ($i >= 5)
{
#shall we allow more than 5 cars?
if (rand() > 0.90) {
$ok = 1;
} else {
$ok = 0;
}
}

if ($ok)
{
... do the insert ...
}

So I was keeping personid => carcount map. This worked great, as I
said, until I got into the gigbytes size for the BerkeleyDB.

Instead I created a stored proc:

CREATE OR REPLACE FUNCTION addowner(xpersonid integer, xcarid integer)
returns void AS $$
declare
cc integer;
begin
select into cc count(*)
from ownership
where personid = xpersonid;

if cc < 5 then
insert into ownership(personid, carid) values (xpersonid, xcarid);
end if;

return;
end
$$ LANGUAGE plpgsql;

... and here is another stored proc I wrote for our website. We have
company shirts n'stuff that employee's can order. So the webpage is
kinda like a shopping cart thing.

create or replace function updatecart(xuserid integer, xgroupid integer,
xprodid integer, xsizeid integer, xcolorid integer, xqty integer)
returns void as $$
declare
xid integer;
begin
select into xid rowid from vcs_ordertable where userid = xuserid and
groupid = xgroupid and prodid = xprodid and sizeid = xsizeid and colorid
= xcolorid;
if not found then
insert into vcs_ordertable(userid, groupid, prodid, sizeid,
colorid, quant) values (xuserid, xgroupid, xprodid, xsizeid, xcolorid,
xqty);
else
update vcs_ordertable set quant = quant + xqty where rowid = xid;
end if;
return;
end
$$ LANGUAGE plpgsql;

When you select a shirt (including the size, color, etc) I only wanted
one row per (user, group, productid, size and color), if one already
exists in the table, I just bump its count, if not I insert it.

These two procs are about as complex as I've needed to get.

-Andy

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Jeffrey W. Baker 2007-11-02 20:49:27 setting for maximum acceptable plan cost?
Previous Message Dawid Kuroczko 2007-11-02 18:57:59 Re: Populating large DB from Perl script