Huge sample dataset for testing.

Lists: pgsql-general
From: Tim Uckun <timuckun(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Huge sample dataset for testing.
Date: 2009-04-28 11:36:24
Message-ID: 855e4dcf0904280436p18041dc3h31b9e991c289b564@mail.gmail.com

Does anybody know if there is a sample database or text files I can import
to do some performance testing?

I would like to have tables with tens of millions of records if possible.


From: Gerd König <koenig(at)transporeon(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Huge sample dataset for testing.
Date: 2009-04-28 11:46:34
Message-ID: 49F6EC9A.3080507@transporeon.com

Hello Tim,

you can create this yourself very easily. For example, given a table:

CREATE TABLE test1
(
    a_int serial NOT NULL,
    a_text character varying(200),
    dt timestamp without time zone DEFAULT now(),
    PRIMARY KEY (a_int)
);

you can generate a bunch of data with something like:

insert into test1 (a_text)
select 'this is row number: ' || i::text
from generate_series(1, 1000000) as i;
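
Scaled up, a variant along the same lines that also fills the timestamp
column with randomized values might look like this (the ten-million row
count and the one-year range are arbitrary choices here):

insert into test1 (a_text, dt)
select 'this is row number: ' || i,
       now() - random() * interval '365 days'
from generate_series(1, 10000000) as i;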

regards...GERD...

Tim Uckun schrieb:
> Does anybody know if there is a sample database or text files I can
> import to do some performance testing?
>
> I would like to have tables with tens of millions of records if possible.
>
>

--
/===============================\
| Gerd König
| - Infrastruktur -
|
| TRANSPOREON GmbH
| Pfarrer-Weiss-Weg 12
| DE - 89077 Ulm
|
|
| Tel: +49 [0]731 16906 16
| Fax: +49 [0]731 16906 99
| Web: www.transporeon.com
|
\===============================/


TRANSPOREON GmbH, Amtsgericht Ulm, HRB 722056
Geschäftsf.: Axel Busch, Peter Förster, Roland Hötzl, Marc-Oliver Simon


From: "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Huge sample dataset for testing.
Date: 2009-04-28 11:49:33
Message-ID: 20090428114933.GD13320@a-kretschmer.de

In response to Tim Uckun:
> Does anybody know if there is a sample database or text files I can import to
> do some performance testing?
>
> I would like to have tables with tens of millions of records if possible.

It is easy to create such a table:

test=# create table huge_data_table as select s, md5(s::text) from generate_series(1,10) s;
SELECT
test=*# select * from huge_data_table ;
s | md5
----+----------------------------------
1 | c4ca4238a0b923820dcc509a6f75849b
2 | c81e728d9d4c2f636f067f89cc14862c
3 | eccbc87e4b5ce2fe28308fd9f2a7baf3
4 | a87ff679a2f3e71d9181a67b7542122c
5 | e4da3b7fbbce2345d7772b0674a318d5
6 | 1679091c5a880faf6fb5e6087eb1b2dc
7 | 8f14e45fceea167a5a36dedd4bea2543
8 | c9f0f895fb98ab9159f51fd0297e236d
9 | 45c48cce2e2d7fbdea1afc51c7c6ad26
10 | d3d9446802a44259755d38e6d163e820
(10 rows)

Change the second parameter of generate_series() from 10 to a suitably large number.
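
For example, a ten-million row version (the row count is arbitrary; this
may take a minute or two and a few hundred MB of disk):

test=# create table huge_data_table as select s, md5(s::text) from generate_series(1,10000000) s;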

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Huge sample dataset for testing.
Date: 2009-04-28 12:29:36
Message-ID: 855e4dcf0904280529j372748f9td1494c75919f9e25@mail.gmail.com

>
> > I would like to have tables with tens of millions of records if possible.
>
> It is easy to create such a table:
>
> test=# create table huge_data_table as select s, md5(s::text) from
> generate_series(1,10) s;

Thanks, I'll try something like that.

I guess I can create some random dates or something for the other types of
fields too.

I was hoping there was already something like this available, though, because
it's going to take some time to create relations and such.


From: "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Huge sample dataset for testing.
Date: 2009-04-28 12:36:15
Message-ID: 20090428123615.GE13320@a-kretschmer.de

In response to Tim Uckun:
> Thanks I'll try something like that.
>
> I guess can create some random dates or something for other types of fields
> too.

Sure, dates for instance:

test=*# select (current_date + random() * 1000 * '1day'::interval)::date from generate_series(1,10);
date
------------
2010-12-11
2009-06-20
2009-08-13
2011-10-17
2011-10-09
2010-10-13
2010-02-04
2011-03-04
2012-01-17
2010-11-18
(10 rows)

>
> I was hoping there was already something like this available though because
> it's going to take some time to create relations and such.

Do you really want to download a database or table with 100 million rows?

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Huge sample dataset for testing.
Date: 2009-04-28 18:30:13
Message-ID: alpine.GSO.2.01.0904281405530.29444@westnet.com

On Tue, 28 Apr 2009, Tim Uckun wrote:

> Does anybody know if there is a sample database or text files I can
> import to do some performance testing? I would like to have tables with
> tens of millions of records if possible.

There is a utility that ships with PostgreSQL named pgbench that includes
a simple schema (4 tables) and a data generator. The generator
initialization step takes a database scale factor and creates 100,000
records per unit of scale. So a scale of, say, 500 would give you 50M
records. These tables are pretty simple, just having some ID number keys
and simulated bank account balances.
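
The initialization is a single command. For example, at scale factor 500
(the database name here is just a placeholder):

    pgbench -i -s 500 testdb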

If you want a more complicated schema, you might try one of those from the
various DBT projects. See
http://www.slideshare.net/markwkm/postgresql-portland-performance-practice-project-database-test-2-howto
for an intro to DBT2, which gives you 9 tables you can populate in various
ways to play with.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD