Re: psql is slow and it does not take much resources

From: "Larry Rosenman" <lrosenman(at)pervasive(dot)com>
To: "Javier de la Torre" <jatorre(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-03 14:12:46
Message-ID: F6616E0E81AC0841B1F9DD252F7C4B55041A56@ausmaildd.aus.pervasive.com
Lists: pgsql-general

Javier de la Torre wrote:
> It is inserts.
>
> I create the inserts myself with a Python program I have created to
> migrate MySQL databases to PostgreSQL (by the way, if someone wants
> it...)
Ok, that makes *EACH* insert a transaction, with all the overhead.

You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better
yet, set it up as a COPY.
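
For example, a batched loop along these lines (a minimal sketch,
assuming psycopg2 and a made-up "places" table) commits once per batch
instead of once per row:

import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

rows = [(40.4, -3.7), (48.9, 2.4)]  # stand-in for the migrated data

BATCH = 10000                       # rows per transaction; tune to taste
for i, (lat, lon) in enumerate(rows, 1):
    cur.execute("INSERT INTO places (lat, lon) VALUES (%s, %s)",
                (lat, lon))
    if i % BATCH == 0:
        conn.commit()               # one commit per batch, not per row
conn.commit()                       # flush the final partial batch
cur.close()
conn.close()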

--
Larry Rosenman
Database Support Engineer

PERVASIVE SOFTWARE, INC.
12365B RIATA TRACE PKWY
3015
AUSTIN TX 78727-6531

Tel: 512.231.6173
Fax: 512.231.6597
Email: Larry(dot)Rosenman(at)pervasive(dot)com
Web: www.pervasive.com


From: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-03 14:28:10
Message-ID: 200605031628.10244.leif@solumslekt.org
Lists: pgsql-general

On Wednesday 03 May 2006 16:12, Larry Rosenman wrote:
>Javier de la Torre wrote:
>> It is inserts.
>>
>> I create the inserts myself with a Python program I have created
>> to migrate MySQL databases to PostgreSQL (by the way, if someone
>> wants it...)
>
>Ok, that makes *EACH* insert a transaction, with all the overhead.
>
>You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better
>yet, set it up as a COPY.

I'm using essentially the same approach for my custom backup/restore
procedure. I also found it a very slow process. But when I wrapped
each table script (i.e. 20-30k of INSERTs) in a single transaction,
the time it took to populate the entire database went down from about
half an hour to 50 seconds. Very impressive ;-)

However, I'm wondering if there's a practical limit to how many rows you
can insert within one transaction?
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE


From: "Javier de la Torre" <jatorre(at)gmail(dot)com>
To: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-03 14:43:15
Message-ID: a0174d240605030743t25dfb3e5w19d7551a7c080549@mail.gmail.com
Lists: pgsql-general

Yes,

Thanks. I am doing this now...

It is definitely faster, but I will also discover now if there is a
limit on transaction size... I am going to try to insert 60 million
records into a table in one single transaction.

In any case, I still don't understand why PostgreSQL was not taking
resources before, without the transaction. If it has to create a
transaction per insert, I understand it will have to do more things,
but why is it not taking all the resources of the machine? I mean, why
is it only taking 3% of them?

Javier.

On 5/3/06, Leif B. Kristensen <leif(at)solumslekt(dot)org> wrote:
> On Wednesday 03 May 2006 16:12, Larry Rosenman wrote:
> >Javier de la Torre wrote:
> >> It is inserts.
> >>
> >> I create the inserts myself with a Python program I have created
> >> to migrate MySQL databases to PostgreSQL (by the way, if someone
> >> wants it...)
> >
> >Ok, that makes *EACH* insert a transaction, with all the overhead.
> >
> >You need to batch the inserts between BEGIN;/COMMIT; pairs, or, better
> >yet, set it up as a COPY.
>
> I'm using essentially the same approach for my custom backup/restore
> procedure. I also found it a very slow process. But when I wrapped
> each table script (i.e. 20-30k of INSERTs) in a single transaction,
> the time it took to populate the entire database went down from about
> half an hour to 50 seconds. Very impressive ;-)
>
> However, I'm wondering if there's a practical limit to how many rows you
> can insert within one transaction?
> --
> Leif Biberg Kristensen :: Registered Linux User #338009
> http://solumslekt.org/ :: Cruising with Gentoo/KDE


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-03 15:28:32
Message-ID: 20060503152832.GC27354@svana.org
Lists: pgsql-general

On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> However, I'm wondering if there's a practical limit to how many rows you
> can insert within one transaction?

There's a limit of (I think) 2-4 billion commands per transaction. Each
command can insert any number of tuples.

So if you're doing one tuple per command, that limits you to a few
billion inserts per transaction. Of course, COPY is always faster
still...
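
For illustration, a sketch (made-up table names, psycopg2 assumed): an
INSERT ... SELECT counts as a single command however many tuples it
writes, so it spends only one slot of that per-transaction budget:

import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()
# ONE command as far as the per-transaction counter is concerned,
# even if it moves millions of tuples:
cur.execute("INSERT INTO target (x) SELECT x FROM staging")
conn.commit()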

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.


From: "Javier de la Torre" <jatorre(at)gmail(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-03 15:34:47
Message-ID: a0174d240605030834v232ca2e3y387150b06137bd96@mail.gmail.com
Lists: pgsql-general

Great! Then there will be no problems.

I would use COPY but I think I cannot. While moving from MySQL to
PostgreSQL I am also transforming a pair of fields, latitude and
longitude, into a geometry field, POINT, that is understood by
PostGIS. I thought I would not be able to use COPY when inserting data
with functions.

Thanks again all.

Javier.

On 5/3/06, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> > However, I'm wondering if there's a practical limit to how many rows you
> > can insert within one transaction?
>
> There's a limit of (I think) 2-4 billion commands per transaction. Each
> command can insert any number of tuples.
>
> So if you're doing one tuple per command, that limits you to a few
> billion inserts per transaction. Of course, COPY is always faster
> still...
>
> Have a nice day,
> --
> Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-03 19:17:21
Message-ID: 25040.1146683841@sss.pgh.pa.us
Lists: pgsql-general

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
>> However, I'm wondering if there's a practical limit to how many rows you
>> can insert within one transaction?

> There's a limit of (I think) 2-4 billion commands per transaction. Each
> command can insert any number of tuples.

> So if you're doing one tuple per command, that limits you to a few
> billion inserts per transaction. Of course, COPY is always faster
> still...

If you have any deferred triggers (including foreign keys) on the table
then the practical limit is likely to be a lot less, say in the few
millions. However, in a bulk data load situation you probably don't
need to have such triggers; it's better to establish the constraint
after you load the data.
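
In outline, that pattern could look like the sketch below (psycopg2 and
every table/constraint name here are made up for illustration):

import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

# Drop the FK, bulk load, then re-establish the constraint; the ADD
# CONSTRAINT validates all rows in one pass instead of per INSERT.
cur.execute("ALTER TABLE places DROP CONSTRAINT places_country_fkey")
# ... bulk load goes here (COPY, or batched INSERTs) ...
cur.execute("""ALTER TABLE places
                 ADD CONSTRAINT places_country_fkey
                 FOREIGN KEY (country_id) REFERENCES countries (id)""")
conn.commit()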

BTW, has the OP read
http://developer.postgresql.org/docs/postgres/populate.html
? Much of this thread seems to be rehashing that page ...

regards, tom lane


From: Joe Healy <joe(at)omc-international(dot)com(dot)au>
To: Javier de la Torre <jatorre(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-03 23:54:35
Message-ID: 445942BB.5050106@omc-international.com.au
Lists: pgsql-general

Javier de la Torre wrote:
> Great! Then there will be no problems.
>
> I would use COPY but I think I cannot. While moving from MySQL to
> PostgreSQL I am also transforming a pair of fields, latitude and
> longitude, into a geometry field, POINT, that is understood by
> PostGIS. I thought I would not be able to use COPY when inserting data
> with functions.
I definitely recommend using COPY if you are inserting a large amount
of data into PostGIS. We use something like the following Python code
to read from a CSV file and insert into pgsql. I can't remember the
rate it works at, but it was much quicker than anything else we tried.

import os

def insertData(header, delimiter, filename, table, SRID, dbname, user,
               host):
    f = open(filename, 'r')

    # Open a new process to enter data (~10x faster than psycopg)
    process = os.popen('psql %s %s -c "COPY %s (geom, elevation) '
                       'from stdin;" -h %s'
                       % (dbname, user, table, host), "w")

    # Each line holds easting, northing and elevation; emit an EWKT
    # point plus the elevation, tab-separated, as COPY input.
    for a in f:
        unpackline = a[:-1].split(delimiter)
        easting, northing, elevation = unpackline
        process.write("SRID=%s;POINT(%s %s)\t%s\n"
                      % (SRID, easting, northing, elevation))

    f.close()
    process.close()

Hope that helps,

Joe


From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Javier de la Torre <jatorre(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-04 09:02:51
Message-ID: 4459C33B.7010502@magproductions.nl
Lists: pgsql-general

Javier de la Torre wrote:
> Great! Then there will be no problems.
>
> I would use COPY but I think I cannot. While moving from MySQL to
> PostgreSQL I am also transforming a pair of fields, latitude and
> longitude, into a geometry field, POINT, that is understood by
> PostGIS. I thought I would not be able to use COPY when inserting data
> with functions.

I think you'd get better results COPYing into a (temporary) table (say
it's called 'fish') and then inserting your data into your table using
INSERT (...) SELECT ... FROM fish; with the necessary conversions.

I think it'll still be faster than a couple million separate INSERTs,
and it gives you the opportunity to fix mistakes if you issue the above
command inside a transaction.
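
A minimal sketch of that approach, assuming psycopg2's copy_from() and
the old-style PostGIS GeomFromText() function (ST_GeomFromText in later
releases); every table and column name here is made up:

import psycopg2

conn = psycopg2.connect("dbname=mydb")
cur = conn.cursor()

# Stage the raw rows with COPY, then convert while moving them over.
cur.execute("CREATE TEMP TABLE fish (id int, lat float8, lon float8)")
with open("data.tsv") as f:
    cur.copy_from(f, "fish", sep="\t")    # runs COPY fish FROM stdin
cur.execute("""
    INSERT INTO places (id, geom)
    SELECT id,
           GeomFromText('POINT(' || lon || ' ' || lat || ')', 4326)
      FROM fish
""")
conn.commit()                             # all-or-nothing, as suggested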

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-04 20:30:24
Message-ID: 20060504203024.GB97354@pervasive.com
Lists: pgsql-general

On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> However, I'm wondering if there's a practical limit to how many rows you
> can insert within one transaction?

I believe transactions are limited to 4B commands, so the answer would
be 4B rows.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Javier de la Torre <jatorre(at)gmail(dot)com>
Cc: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-04 20:31:02
Message-ID: 20060504203102.GC97354@pervasive.com
Lists: pgsql-general

On Wed, May 03, 2006 at 04:43:15PM +0200, Javier de la Torre wrote:
> Yes,
>
> Thanks. I am doing this now...
>
> It is definitely faster, but I will also discover now if there is a
> limit on transaction size... I am going to try to insert 60 million
> records into a table in one single transaction.
>
> In any case, I still don't understand why PostgreSQL was not taking
> resources before, without the transaction. If it has to create a
> transaction per insert, I understand it will have to do more things,
> but why is it not taking all the resources of the machine? I mean, why
> is it only taking 3% of them?

Because a server has more than just CPU as a resource. In this case you
were undoubtedly limited by the drives that pg_xlog is on.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-04 20:58:24
Message-ID: 200605042258.24985.leif@solumslekt.org
Lists: pgsql-general

On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:
>On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
>> However, I'm wondering if there's a practical limit to how many rows
>> you can insert within one transaction?
>
>I believe transactions are limited to 4B commands, so the answer would
>be 4B rows.

That is definitely not the case. I routinely do around 36000 inserts
wrapped up in one transaction.

I know that there is one hard-wired limit due to the OID wrap-around
problem, at 2^31 commands in one transaction. But the practical limit
due to hardware resources is probably much lower.
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-04 21:06:51
Message-ID: 20060504210650.GL97354@pervasive.com
Lists: pgsql-general

On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:
> On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:
> >On Wed, May 03, 2006 at 04:28:10PM +0200, Leif B. Kristensen wrote:
> >> However, I'm wondering if there's a practical limit to how many rows
> >> you can insert within one transaction?
> >
> >I believe transactions are limited to 4B commands, so the answer would
> >be 4B rows.
>
> That is definitely not the case. I routinely do around 36000 inserts
> wrapped up in one transaction.

Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*.
And as Tom mentioned, if you have foreign keys or triggers, each insert
will burn through multiple CIDs.

> I know that there is one hard-wired limit due to the OID wrap-around
> problem, at 2^31 commands in one transaction. But the practical limit
> due to hardware resources is probably much lower.

This has nothing to do with OIDs, and in fact I don't believe there's
any intrinsic reason why you couldn't insert more than 2B records in a
table with OIDs so long as you don't have a unique index defined on
the OID column.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>, pgsql general <pgsql-general(at)postgresql(dot)org>
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-04 21:12:34
Message-ID: 1146777154.22037.59.camel@state.g2switchworks.com
Lists: pgsql-general

On Thu, 2006-05-04 at 16:06, Jim C. Nasby wrote:
> On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:

> > I know that there is one hard-wired limit due to the OID wrap-around
> > problem, at 2^31 commands in one transaction. But the practical limit
> > due to hardware resources is probably much lower.
>
> This has nothing to do with OIDs, and in fact I don't believe there's
> any intrinsic reason why you couldn't insert more than 2B records in a
> table with OIDs so long as you don't have a unique index defined on it.

But there is a truth there. Most people are far more likely to be
limited by their machine's hardware than by the database at this point.

Of course, SOMEONE will find a way to hit the limit.


From: "Leif B(dot) Kristensen" <leif(at)solumslekt(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: psql is slow and it does not take much resources
Date: 2006-05-04 21:30:22
Message-ID: 200605042330.22304.leif@solumslekt.org
Lists: pgsql-general

On Thursday 04 May 2006 23:06, Jim C. Nasby wrote:
>On Thu, May 04, 2006 at 10:58:24PM +0200, Leif B. Kristensen wrote:
>> On Thursday 04 May 2006 22:30, Jim C. Nasby wrote:
>> >I believe transactions are limited to 4B commands, so the answer
>> > would be 4B rows.
>>
>> That is definitely not the case. I routinely do around 36000 inserts
>> wrapped up in one transaction.
>
>Check your eyes or clean your monitor. ;) I said 4B as in 4 *billion*.

I've wiped my glasses now :-)

I'm not used to reading B for Billion, and I thought I saw an 8. (Here
in Scandinavia, we adhere to the French convention of 1000 millions = 1
milliard. 1000 milliards = 1 billion.) But it looks like we do agree on
the order of magnitude after all.

Another interpretation leads to reading 4B as 4 bytes, and given that a
byte equals 8 bits, 4B would be the width of a 32-bit integer.
--
Leif Biberg Kristensen :: Registered Linux User #338009
http://solumslekt.org/ :: Cruising with Gentoo/KDE