Re: US Census database (Tiger 2004FE) - 4.4G

Lists: pgsql-hackers
From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: US Census database (Tiger 2004FE)
Date: 2005-08-03 21:00:16
Message-ID: 20799.24.91.171.78.1123102816.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?

It's a *LOT* of data (about 40+ Gig in PostgreSQL)


From: David Fetter <david(at)fetter(dot)org>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 00:09:56
Message-ID: 20050804000956.GD673@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
> I just finished converting and loading the US census data into PostgreSQL
> would anyone be interested in it for testing purposes?
>
> It's a *LOT* of data (about 40+ Gig in PostgreSQL)

Sure. Got a torrent?

Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Mark Woodward <pgsql(at)mohawksoft(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 00:34:20
Message-ID: 42F1628C.7080003@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


David Fetter wrote:

>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
>
>
>>I just finished converting and loading the US census data into PostgreSQL
>>would anyone be interested in it for testing purposes?
>>
>>It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>>
>>
>
>Sure. Got a torrent?
>
>
>

How big is it when dumped and compressed?

cheers

andrew


From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Cc: "David Fetter" <david(at)fetter(dot)org>, "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 11:15:43
Message-ID: 22964.24.91.171.78.1123154143.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Wow! a lot of people seem to want it!
I am dumping out with pg_dump right now, it may take a few hours.
It is in PostgreSQL 8.0.3
Does anyone have access to a high bandwidth server? I could mail it on a
DVD to someone who would host it.

>
> David Fetter wrote:
>
>>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
>>
>>
>>>I just finished converting and loading the US census data into
>>> PostgreSQL
>>>would anyone be interested in it for testing purposes?
>>>
>>>It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>>>
>>>
>>
>>Sure. Got a torrent?
>>
>>
>>
>
>
> How big is it when dumped and compressed?
>
> cheers
>
> andrew
>


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 11:29:16
Message-ID: 20050804112916.GN6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
> I just finished converting and loading the US census data into PostgreSQL
> would anyone be interested in it for testing purposes?
>
> It's a *LOT* of data (about 40+ Gig in PostgreSQL)

How big dumped & compressed? I may be able to host it depending on how
big it ends up being...

Stephen


From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Stephen Frost" <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 12:40:18
Message-ID: 22897.24.91.171.78.1123159218.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> * Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
>> I just finished converting and loading the US census data into
>> PostgreSQL
>> would anyone be interested in it for testing purposes?
>>
>> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>
> How big dumped & compressed? I may be able to host it depending on how
> big it ends up being...

It's been running for about an hour now, and it is up to 3.3G.

pg_dump tiger | gzip > tiger.pgz

I'll let you know. Hopefully, it will fit on DVD.

You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 12:42:34
Message-ID: 20050804124234.GO6026@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
> > How big dumped & compressed? I may be able to host it depending on how
> > big it ends up being...
>
> It's been running for about an hour now, and it is up to 3.3G.

Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.

> pg_dump tiger | gzip > tiger.pgz

What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form? Fun questions, all of them. :)

> I'll let you know. Hopefully, it will fit on DVD.

I guess your upload pipe isn't very big? snail-mail is slow... :)

> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)

Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well. It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.

Thanks,

Stephen


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 12:59:32
Message-ID: 1123160373.15416.58.camel@sabrina.peacock.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Am Donnerstag, den 04.08.2005, 08:40 -0400 schrieb Mark Woodward:
> > * Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
> >> I just finished converting and loading the US census data into
> >> PostgreSQL
> >> would anyone be interested in it for testing purposes?
> >>
> >> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
> >
> > How big dumped & compressed? I may be able to host it depending on how
> > big it ends up being...
>
> It's been running for about an hour now, and it is up to 3.3G.
>
> pg_dump tiger | gzip > tiger.pgz
>
> I'll let you know. Hopefully, it will fit on DVD.
>
> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)
pg_dump -v maybe? ;) *hint hint*

--
Tino Wildenhain <tino(at)wildenhain(dot)de>


From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Stephen Frost" <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 13:01:27
Message-ID: 22892.24.91.171.78.1123160487.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> * Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
>> > How big dumped & compressed? I may be able to host it depending on
>> how
>> > big it ends up being...
>>
>> It's been running for about an hour now, and it is up to 3.3G.
>
> Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.

Cool.

>
>> pg_dump tiger | gzip > tiger.pgz
>
> What db version are you using, how did you load it (ogr2ogr?), is it in
> postgis form? Fun questions, all of them. :)

8.0.3, in simple pg_dump form.

I loaded it with a utility I wrote a long time ago for tigerua. It is a
fixed width text file to PG utility. It takes a "control" file that
describes the fields, field widths, and field name. It creates a SQL
"create table" statement, and also reads all the records from a control
file into a PostgreSQL copy command. A control file looks something like:

# Zip+4 codes
# Tiger 2003 Record Conversion File
# Copyright (c) 2004 Mark L. Woodward, Mohawk Software
TABLE RTZ
1:I RT
4:I VERSION
10:T TLID
3:S RTSQ
4:Z ZIP4L
4:Z ZIP4R

The first number is the field width in chars, second is an optional type
(there are a few, 'I' means ignore, 'Z' means zipcode, etc.) if no type is
given, then varchar is assumed. Last is the column name.

>
>> I'll let you know. Hopefully, it will fit on DVD.
>
> I guess your upload pipe isn't very big? snail-mail is slow... :)

Never underestimate the bandwidth of a few DVDs and FedEx. Do the math, it
is embarrasing.

>
>> You know, ... maybe pg_dump needs a progress bar? (How would it do that,
>> I
>> wonder?)
>
> Using the new functions in 8.1 which provide size-on-disk of things,
> hopefully there's also a function to give a tuple-size or similar as
> well. It'd be a high estimate due to dead tuples but should be
> sufficient for a progress bar.
>
> Thanks,
>
> Stephen
>


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 14:07:14
Message-ID: 42F22112.9070002@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> It's been running for about an hour now, and it is up to 3.3G.
>
> pg_dump tiger | gzip > tiger.pgz

| bzip2 > tiger.sql.bz2 :)

Chris


From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
Cc: "Stephen Frost" <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE)
Date: 2005-08-04 14:16:55
Message-ID: 22673.24.91.171.78.1123165015.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> It's been running for about an hour now, and it is up to 3.3G.
>>
>> pg_dump tiger | gzip > tiger.pgz
>
> | bzip2 > tiger.sql.bz2 :)
>

I find bzip2 FAR SLOWER than the gain in compression.


From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Stephen Frost" <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: US Census database (Tiger 2004FE) - 4.4G
Date: 2005-08-04 15:26:26
Message-ID: 22538.24.91.171.78.1123169186.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

It is 4.4G in space in a gzip package.

I'll mail a DVD to two people who promise to host it for Hackers.


From: "Gavin M(dot) Roy" <gmr(at)ehpg(dot)net>
To: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
Cc: "Stephen Frost" <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE) - 4.4G
Date: 2005-08-04 16:37:24
Message-ID: D02F66EB-DAAC-43EA-A493-217F40929CC8@ehpg.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

You can send it to me, and ehpg will host it. I'll send you a
private email with my info.

Gavin

On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:

> It is 4.4G in space in a gzip package.
>
> I'll mail a DVD to two people who promise to host it for Hackers.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>

Gavin M. Roy
800 Pound Gorilla
gmr(at)ehpg(dot)net


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Mark Woodward <pgsql(at)mohawksoft(dot)com>
Subject: Re: US Census database (Tiger 2004FE) - 4.4G
Date: 2005-08-04 18:41:34
Message-ID: 42F2615E.5080700@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Woodward wrote:
> It is 4.4G in space in a gzip package.
>
> I'll mail a DVD to two people who promise to host it for Hackers.

Would it be easier to release the program you did to do
this conversion?

I use this pretty short (274 line) C program:
http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.

The #define SQL line controls if it makes data
with INSERT statements or for COPY statements.


From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: US Census database (Tiger 2004FE) - 4.4G
Date: 2005-08-04 21:22:15
Message-ID: 22672.24.91.171.78.1123190535.squirrel@mail.mohawksoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I thought bout it, but it isn't the best program around, but it does work.
My program also reformats numbers, i.e. long/lat become properly
decimal-ed numerics, zips become integers, etc.

The question is...

Do you download the raw data and convert it into a database, or do you
download the pre-formatted database?

I would say the preformated database is easier to manage. There are
hundreds of individual zips files, in each of those files 10 or so data
files.

> Mark Woodward wrote:
>> It is 4.4G in space in a gzip package.
>>
>> I'll mail a DVD to two people who promise to host it for Hackers.
>
> Would it be easier to release the program you did to do
> this conversion?
>
>
> I use this pretty short (274 line) C program:
> http://www.forensiclogic.com/tmp/tgr2sql.c
> to convert the raw tiger files
> from http://www.census.gov/geo/www/tiger/index.html
> into SQL statements that can be loaded by postgresql.
>
> The #define SQL line controls if it makes data
> with INSERT statements or for COPY statements.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


From: Darcy Buskermolen <darcy(at)wavefire(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Gavin M(dot) Roy" <gmr(at)ehpg(dot)net>, "Mark Woodward" <pgsql(at)mohawksoft(dot)com>, "Stephen Frost" <sfrost(at)snowman(dot)net>
Subject: Re: US Census database (Tiger 2004FE) - 4.4G
Date: 2005-09-15 22:18:04
Message-ID: 200509151518.05052.darcy@wavefire.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
> You can send it to me, and ehpg will host it. I'll send you a
> private email with my info.
>
> Gavin
>
> On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
> > It is 4.4G in space in a gzip package.
> >
> > I'll mail a DVD to two people who promise to host it for Hackers.

I'm wondering if this is now available for consumption by the rest of us??

(ie what's the link)

--
Darcy Buskermolen
Wavefire Technologies Corp.

http://www.wavefire.com
ph: 250.717.0200
fx: 250.763.1759


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Darcy Buskermolen <darcy(at)wavefire(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Gavin M(dot) Roy" <gmr(at)ehpg(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: US Census database (Tiger 2004FE) - 4.4G
Date: 2005-09-16 15:57:37
Message-ID: 432AEB71.3050304@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Darcy Buskermolen wrote:
> On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
>
>>You can send it to me, and ehpg will host it. I'll send you a
>>private email with my info.
>>
>>Gavin
>>
>>On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
>>
>>>It is 4.4G in space in a gzip package.
>>>
>>>I'll mail a DVD to two people who promise to host it for Hackers.

Command Prompt would be willing to host it.

Sincerely,

Joshua D. Drake

>
>
> I'm wondering if this is now available for consumption by the rest of us??
>
> (ie what's the link)
>
>

--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/