Lists: | pgsql-hackers |
---|
From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | US Census database (Tiger 2004FE) |
Date: | 2005-08-03 21:00:16 |
Message-ID: | 20799.24.91.171.78.1123102816.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I just finished converting and loading the US census data into PostgreSQL
would anyone be interested in it for testing purposes?
It's a *LOT* of data (about 40+ Gig in PostgreSQL)
From: | David Fetter <david(at)fetter(dot)org> |
---|---|
To: | Mark Woodward <pgsql(at)mohawksoft(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 00:09:56 |
Message-ID: | 20050804000956.GD673@fetter.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
> I just finished converting and loading the US census data into PostgreSQL
> would anyone be interested in it for testing purposes?
>
> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
Sure. Got a torrent?
Cheers,
D
--
David Fetter david(at)fetter(dot)org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778
Remember to vote!
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | David Fetter <david(at)fetter(dot)org> |
Cc: | Mark Woodward <pgsql(at)mohawksoft(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 00:34:20 |
Message-ID: | 42F1628C.7080003@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
David Fetter wrote:
>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
>
>
>>I just finished converting and loading the US census data into PostgreSQL
>>would anyone be interested in it for testing purposes?
>>
>>It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>>
>>
>
>Sure. Got a torrent?
>
>
>
How big is it when dumped and compressed?
cheers
andrew
From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | "Andrew Dunstan" <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Cc: | "David Fetter" <david(at)fetter(dot)org>, "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 11:15:43 |
Message-ID: | 22964.24.91.171.78.1123154143.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Wow! a lot of people seem to want it!
I am dumping out with pg_dump right now, it may take a few hours.
It is in PostgreSQL 8.0.3
Does anyone have access to a high bandwidth server? I could mail it on a
DVD to someone who would host it.
>
> David Fetter wrote:
>
>>On Wed, Aug 03, 2005 at 05:00:16PM -0400, Mark Woodward wrote:
>>
>>
>>>I just finished converting and loading the US census data into
>>> PostgreSQL
>>>would anyone be interested in it for testing purposes?
>>>
>>>It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>>>
>>>
>>
>>Sure. Got a torrent?
>>
>>
>>
>
>
> How big is it when dumped and compressed?
>
> cheers
>
> andrew
>
From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Mark Woodward <pgsql(at)mohawksoft(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 11:29:16 |
Message-ID: | 20050804112916.GN6026@ns.snowman.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
> I just finished converting and loading the US census data into PostgreSQL
> would anyone be interested in it for testing purposes?
>
> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
How big dumped & compressed? I may be able to host it depending on how
big it ends up being...
Stephen
From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | "Stephen Frost" <sfrost(at)snowman(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 12:40:18 |
Message-ID: | 22897.24.91.171.78.1123159218.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> * Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
>> I just finished converting and loading the US census data into
>> PostgreSQL
>> would anyone be interested in it for testing purposes?
>>
>> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
>
> How big dumped & compressed? I may be able to host it depending on how
> big it ends up being...
It's been running for about an hour now, and it is up to 3.3G.
pg_dump tiger | gzip > tiger.pgz
I'll let you know. Hopefully, it will fit on DVD.
You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
wonder?)
From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Mark Woodward <pgsql(at)mohawksoft(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 12:42:34 |
Message-ID: | 20050804124234.GO6026@ns.snowman.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
* Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
> > How big dumped & compressed? I may be able to host it depending on how
> > big it ends up being...
>
> It's been running for about an hour now, and it is up to 3.3G.
Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.
> pg_dump tiger | gzip > tiger.pgz
What db version are you using, how did you load it (ogr2ogr?), is it in
postgis form? Fun questions, all of them. :)
> I'll let you know. Hopefully, it will fit on DVD.
I guess your upload pipe isn't very big? snail-mail is slow... :)
> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)
Using the new functions in 8.1 which provide size-on-disk of things,
hopefully there's also a function to give a tuple-size or similar as
well. It'd be a high estimate due to dead tuples but should be
sufficient for a progress bar.
Thanks,
Stephen
From: | Tino Wildenhain <tino(at)wildenhain(dot)de> |
---|---|
To: | Mark Woodward <pgsql(at)mohawksoft(dot)com> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 12:59:32 |
Message-ID: | 1123160373.15416.58.camel@sabrina.peacock.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Am Donnerstag, den 04.08.2005, 08:40 -0400 schrieb Mark Woodward:
> > * Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
> >> I just finished converting and loading the US census data into
> >> PostgreSQL
> >> would anyone be interested in it for testing purposes?
> >>
> >> It's a *LOT* of data (about 40+ Gig in PostgreSQL)
> >
> > How big dumped & compressed? I may be able to host it depending on how
> > big it ends up being...
>
> It's been running for about an hour now, and it is up to 3.3G.
>
> pg_dump tiger | gzip > tiger.pgz
>
> I'll let you know. Hopefully, it will fit on DVD.
>
> You know, ... maybe pg_dump needs a progress bar? (How would it do that, I
> wonder?)
pg_dump -v maybe? ;) *hint hint*
--
Tino Wildenhain <tino(at)wildenhain(dot)de>
From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | "Stephen Frost" <sfrost(at)snowman(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 13:01:27 |
Message-ID: | 22892.24.91.171.78.1123160487.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> * Mark Woodward (pgsql(at)mohawksoft(dot)com) wrote:
>> > How big dumped & compressed? I may be able to host it depending on
>> how
>> > big it ends up being...
>>
>> It's been running for about an hour now, and it is up to 3.3G.
>
> Not too bad. I had 2003 (iirc) loaded into 7.4 at one point.
Cool.
>
>> pg_dump tiger | gzip > tiger.pgz
>
> What db version are you using, how did you load it (ogr2ogr?), is it in
> postgis form? Fun questions, all of them. :)
8.0.3, in simple pg_dump form.
I loaded it with a utility I wrote a long time ago for tigerua. It is a
fixed width text file to PG utility. It takes a "control" file that
describes the fields, field widths, and field name. It creates a SQL
"create table" statement, and also reads all the records from a control
file into a PostgreSQL copy command. A control file looks something like:
# Zip+4 codes
# Tiger 2003 Record Conversion File
# Copyright (c) 2004 Mark L. Woodward, Mohawk Software
TABLE RTZ
1:I RT
4:I VERSION
10:T TLID
3:S RTSQ
4:Z ZIP4L
4:Z ZIP4R
The first number is the field width in chars, second is an optional type
(there are a few, 'I' means ignore, 'Z' means zipcode, etc.) if no type is
given, then varchar is assumed. Last is the column name.
>
>> I'll let you know. Hopefully, it will fit on DVD.
>
> I guess your upload pipe isn't very big? snail-mail is slow... :)
Never underestimate the bandwidth of a few DVDs and FedEx. Do the math, it
is embarrasing.
>
>> You know, ... maybe pg_dump needs a progress bar? (How would it do that,
>> I
>> wonder?)
>
> Using the new functions in 8.1 which provide size-on-disk of things,
> hopefully there's also a function to give a tuple-size or similar as
> well. It'd be a high estimate due to dead tuples but should be
> sufficient for a progress bar.
>
> Thanks,
>
> Stephen
>
From: | Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> |
---|---|
To: | Mark Woodward <pgsql(at)mohawksoft(dot)com> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 14:07:14 |
Message-ID: | 42F22112.9070002@familyhealth.com.au |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> It's been running for about an hour now, and it is up to 3.3G.
>
> pg_dump tiger | gzip > tiger.pgz
| bzip2 > tiger.sql.bz2 :)
Chris
From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> |
Cc: | "Stephen Frost" <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) |
Date: | 2005-08-04 14:16:55 |
Message-ID: | 22673.24.91.171.78.1123165015.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
>> It's been running for about an hour now, and it is up to 3.3G.
>>
>> pg_dump tiger | gzip > tiger.pgz
>
> | bzip2 > tiger.sql.bz2 :)
>
I find bzip2 FAR SLOWER than the gain in compression.
From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | "Stephen Frost" <sfrost(at)snowman(dot)net> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | US Census database (Tiger 2004FE) - 4.4G |
Date: | 2005-08-04 15:26:26 |
Message-ID: | 22538.24.91.171.78.1123169186.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
It is 4.4G in space in a gzip package.
I'll mail a DVD to two people who promise to host it for Hackers.
From: | "Gavin M(dot) Roy" <gmr(at)ehpg(dot)net> |
---|---|
To: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
Cc: | "Stephen Frost" <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) - 4.4G |
Date: | 2005-08-04 16:37:24 |
Message-ID: | D02F66EB-DAAC-43EA-A493-217F40929CC8@ehpg.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
You can send it to me, and ehpg will host it. I'll send you a
private email with my info.
Gavin
On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
> It is 4.4G in space in a gzip package.
>
> I'll mail a DVD to two people who promise to host it for Hackers.
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
Gavin M. Roy
800 Pound Gorilla
gmr(at)ehpg(dot)net
From: | Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> |
---|---|
To: | Mark Woodward <pgsql(at)mohawksoft(dot)com> |
Subject: | Re: US Census database (Tiger 2004FE) - 4.4G |
Date: | 2005-08-04 18:41:34 |
Message-ID: | 42F2615E.5080700@cheapcomplexdevices.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Mark Woodward wrote:
> It is 4.4G in space in a gzip package.
>
> I'll mail a DVD to two people who promise to host it for Hackers.
Would it be easier to release the program you did to do
this conversion?
I use this pretty short (274 line) C program:
http://www.forensiclogic.com/tmp/tgr2sql.c
to convert the raw tiger files
from http://www.census.gov/geo/www/tiger/index.html
into SQL statements that can be loaded by postgresql.
The #define SQL line controls if it makes data
with INSERT statements or for COPY statements.
From: | "Mark Woodward" <pgsql(at)mohawksoft(dot)com> |
---|---|
To: | "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: US Census database (Tiger 2004FE) - 4.4G |
Date: | 2005-08-04 21:22:15 |
Message-ID: | 22672.24.91.171.78.1123190535.squirrel@mail.mohawksoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I thought bout it, but it isn't the best program around, but it does work.
My program also reformats numbers, i.e. long/lat become properly
decimal-ed numerics, zips become integers, etc.
The question is...
Do you download the raw data and convert it into a database, or do you
download the pre-formatted database?
I would say the preformated database is easier to manage. There are
hundreds of individual zips files, in each of those files 10 or so data
files.
> Mark Woodward wrote:
>> It is 4.4G in space in a gzip package.
>>
>> I'll mail a DVD to two people who promise to host it for Hackers.
>
> Would it be easier to release the program you did to do
> this conversion?
>
>
> I use this pretty short (274 line) C program:
> http://www.forensiclogic.com/tmp/tgr2sql.c
> to convert the raw tiger files
> from http://www.census.gov/geo/www/tiger/index.html
> into SQL statements that can be loaded by postgresql.
>
> The #define SQL line controls if it makes data
> with INSERT statements or for COPY statements.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
From: | Darcy Buskermolen <darcy(at)wavefire(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | "Gavin M(dot) Roy" <gmr(at)ehpg(dot)net>, "Mark Woodward" <pgsql(at)mohawksoft(dot)com>, "Stephen Frost" <sfrost(at)snowman(dot)net> |
Subject: | Re: US Census database (Tiger 2004FE) - 4.4G |
Date: | 2005-09-15 22:18:04 |
Message-ID: | 200509151518.05052.darcy@wavefire.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
> You can send it to me, and ehpg will host it. I'll send you a
> private email with my info.
>
> Gavin
>
> On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
> > It is 4.4G in space in a gzip package.
> >
> > I'll mail a DVD to two people who promise to host it for Hackers.
I'm wondering if this is now available for consumption by the rest of us??
(ie what's the link)
--
Darcy Buskermolen
Wavefire Technologies Corp.
http://www.wavefire.com
ph: 250.717.0200
fx: 250.763.1759
From: | "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com> |
---|---|
To: | Darcy Buskermolen <darcy(at)wavefire(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org, "Gavin M(dot) Roy" <gmr(at)ehpg(dot)net>, Mark Woodward <pgsql(at)mohawksoft(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: US Census database (Tiger 2004FE) - 4.4G |
Date: | 2005-09-16 15:57:37 |
Message-ID: | 432AEB71.3050304@commandprompt.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Darcy Buskermolen wrote:
> On Thursday 04 August 2005 09:37, Gavin M. Roy wrote:
>
>>You can send it to me, and ehpg will host it. I'll send you a
>>private email with my info.
>>
>>Gavin
>>
>>On Aug 4, 2005, at 8:26 AM, Mark Woodward wrote:
>>
>>>It is 4.4G in space in a gzip package.
>>>
>>>I'll mail a DVD to two people who promise to host it for Hackers.
Command Prompt would be willing to host it.
Sincerely,
Joshua D. Drake
>
>
> I'm wondering if this is now available for consumption by the rest of us??
>
> (ie what's the link)
>
>
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/