Re: Looking for tips

From: Oliver Crosby <ryusei(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 20:08:11
Message-ID: 1efd553a0507191308219a098b@mail.gmail.com
Lists: pgsql-performance

> We had low resource utilization and poor throughput on inserts of
> thousands of rows within a single database transaction. There were a
> lot of configuration parameters we changed, but the one which helped the
> most was wal_buffers -- we wound up setting it to 1000. This may be
> higher than it needs to be, but when we got to something which ran well,
> we stopped tinkering. The default value clearly caused a bottleneck.

I just tried wal_buffers = 1000, sort_mem at 10% and
effective_cache_size at 75%.
The performance refuses to budge.. I guess that's as good as it'll go?


From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: Oliver Crosby <ryusei(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 20:19:03
Message-ID: 758d5e7f0507191319636afaa3@mail.gmail.com
Lists: pgsql-performance

On 7/19/05, Oliver Crosby <ryusei(at)gmail(dot)com> wrote:
> > We had low resource utilization and poor throughput on inserts of
> > thousands of rows within a single database transaction. There were a
> > lot of configuration parameters we changed, but the one which helped the
> > most was wal_buffers -- we wound up setting it to 1000. This may be
> > higher than it needs to be, but when we got to something which ran well,
> > we stopped tinkering. The default value clearly caused a bottleneck.
>
> I just tried wal_buffers = 1000, sort_mem at 10% and
> effective_cache_size at 75%.
> The performance refuses to budge.. I guess that's as good as it'll go?

If it is possible, try:
1) wrapping many inserts into one transaction
(BEGIN;INSERT;INSERT;...INSERT;COMMIT;). Since PostgreSQL will then need to
handle fewer transactions per second (each insert is otherwise its own
transaction), it may work faster.

2) If you can do 1), you could go further and use the COPY command, which is
the fastest way to bulk-load a database.
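
For instance, a rough sketch of both (table and column names are
invented for illustration):

-- 1) many inserts, one transaction:
BEGIN;
INSERT INTO sometable (b, c) VALUES ('foo', 'first event');
INSERT INTO sometable (b, c) VALUES ('bar', 'second event');
-- ... thousands more ...
COMMIT;

-- 2) bulk load with COPY (one implicit transaction; reads
-- tab-separated rows from a file the backend can read, or from
-- the client via COPY ... FROM STDIN):
COPY sometable (b, c) FROM '/tmp/events.dat';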

Sometimes I insert data into a temporary table, and then do:
INSERT INTO sometable SELECT * FROM tmp_table;
(but I do that when I want to run some selects, updates, etc. on
the data before "committing" them to the main table; dropping the
temporary table is much cheaper than vacuuming a many-row
table).
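
A minimal sketch of that staging pattern, using the same invented
names as above:

CREATE TEMPORARY TABLE tmp_table (b text, c text);
COPY tmp_table FROM '/tmp/events.dat';
-- selects/updates/fixups on tmp_table go here ...
INSERT INTO sometable (b, c) SELECT b, c FROM tmp_table;
DROP TABLE tmp_table;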

Regards,
Dawid

PS: Where can I find benchmarks comparing PHP vs Perl vs Python in
terms of speed of executing prepared statements?


From: Oliver Crosby <ryusei(at)gmail(dot)com>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 20:28:26
Message-ID: 1efd553a050719132836c31b78@mail.gmail.com
Lists: pgsql-performance

> If it is possible, try:
> 1) wrapping many inserts into one transaction
> (BEGIN;INSERT;INSERT;...INSERT;COMMIT;). Since PostgreSQL will then need to
> handle fewer transactions per second (each insert is otherwise its own
> transaction), it may work faster.

Aye, that's what I have it doing right now. The transactions do save a
HUGE chunk of time. (Cuts it down by about 40%).

> 2) If you can do 1), you could go further and use the COPY command, which is
> the fastest way to bulk-load a database.

I don't think I can use COPY in my case because I need to do
processing on a per-line basis, and I need to check if the item I want
to insert is already there; if it is, I need to get its ID so I
can use that for further processing.


From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, "Oliver Crosby" <ryusei(at)gmail(dot)com>
Cc: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 20:28:44
Message-ID: op.st6gh6c9th1vuj@localhost
Lists: pgsql-performance


> PS: Where can I find benchmarks comparing PHP vs Perl vs Python in
> terms of speed of executing prepared statements?

I'm afraid you'll have to do these yourself!

And I don't think the Python drivers support real prepared statements
(the speed of psycopg is really good, though).
I don't think PHP does either; they don't even provide a database
interface to speak of (i.e. you have to build the query string by hand,
including quoting).


From: Sven Willenberger <sven(at)dmv(dot)com>
To: Oliver Crosby <ryusei(at)gmail(dot)com>
Cc: Dawid Kuroczko <qnex42(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 20:46:01
Message-ID: 1121805961.3674.25.camel@lanshark.dmv.com
Lists: pgsql-performance

On Tue, 2005-07-19 at 16:28 -0400, Oliver Crosby wrote:
> > If it is possible, try:
> > 1) wrapping many inserts into one transaction
> > (BEGIN;INSERT;INSERT;...INSERT;COMMIT;). Since PostgreSQL will then need to
> > handle fewer transactions per second (each insert is otherwise its own
> > transaction), it may work faster.
>
> Aye, that's what I have it doing right now. The transactions do save a
> HUGE chunk of time. (Cuts it down by about 40%).
>
> > 2) If you can do 1), you could go further and use the COPY command, which is
> > the fastest way to bulk-load a database.
>
> I don't think I can use COPY in my case because I need to do
> processing on a per-line basis, and I need to check if the item I want
> to insert is already there; if it is, I need to get its ID so I
> can use that for further processing.
>

Since triggers work with COPY, you could probably write a trigger that
looks for this condition and does the ID processing you need; you could
thereby enjoy the enormous speed gain resulting from COPY and maintain
your data continuity.
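
An untested sketch of such a trigger, assuming a table
sometable(a, b, c) where b identifies the item (all names invented):

CREATE OR REPLACE FUNCTION skip_existing_b() RETURNS trigger AS '
BEGIN
    -- if a row with this b already exists, silently drop the
    -- incoming row; any per-ID work could be done here as well
    PERFORM 1 FROM sometable WHERE b = NEW.b;
    IF FOUND THEN
        RETURN NULL;  -- returning NULL from a BEFORE trigger skips the row
    END IF;
    RETURN NEW;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER sometable_dedup BEFORE INSERT ON sometable
    FOR EACH ROW EXECUTE PROCEDURE skip_existing_b();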

Sven


From: Oliver Crosby <ryusei(at)gmail(dot)com>
To: Sven Willenberger <sven(at)dmv(dot)com>
Cc: Dawid Kuroczko <qnex42(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 21:04:04
Message-ID: 1efd553a050719140462d41468@mail.gmail.com
Lists: pgsql-performance

> Since triggers work with COPY, you could probably write a trigger that
> looks for this condition and does the ID processing you need; you could
> thereby enjoy the enormous speed gain resulting from COPY and maintain
> your data continuity.

So... (bear with me here.. trying to make sense of this)..
With triggers there's a way I can do the parsing I need to on a log
file and react to completed events in non-sequential order (you can
ignore that part.. it's just how we piece together different related
events) and then have perl/DBD::Pg invoke a copy command (which, from
what I can tell, has to operate on a file...) and the copy command can
feed the ID I need back to perl so I can work with it...
If that doesn't hurt my brain, then I'm at least kinda confused...
Anyway. Heading home now. I'll think about this more tonight/tomorrow.


From: Sven Willenberger <sven(at)dmv(dot)com>
To: Oliver Crosby <ryusei(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 21:15:08
Message-ID: 1121807708.3673.47.camel@lanshark.dmv.com
Lists: pgsql-performance

On Tue, 2005-07-19 at 17:04 -0400, Oliver Crosby wrote:
> > Since triggers work with COPY, you could probably write a trigger that
> > looks for this condition and does the ID processing you need; you could
> > thereby enjoy the enormous speed gain resulting from COPY and maintain
> > your data continuity.
>
> So... (bear with me here.. trying to make sense of this)..
> With triggers there's a way I can do the parsing I need to on a log
> file and react to completed events in non-sequential order (you can
> ignore that part.. it's just how we piece together different related
> events) and then have perl/DBD::Pg invoke a copy command (which, from
> what I can tell, has to operate on a file...) and the copy command can
> feed the ID I need back to perl so I can work with it...
> If that doesn't hurt my brain, then I'm at least kinda confused...
> Anyway. Heading home now. I'll think about this more tonight/tomorrow.
>

Well, without knowing the specifics of what you are actually trying to
accomplish, I cannot say yes or no to your question. I am not sure where
the data you are inserting into the db is coming from. However,
if the scenario is this: a) attempt to insert a row, b) if the row
already exists, grab the ID and do other db selects/inserts/deletes
based on that ID, then there is no need to feed this information back
to the perl script. Is your perl script parsing a file and then using
the parsed information to insert rows? If so, how is the returned ID
used? Can you have the trigger use that ID to perform whatever it is
that your perl script is trying to accomplish with it?

It's all kind of vague, so my answers may or may not help, but based on
the [lack of] specifics you have provided, I fear that is the best
suggestion I can offer at this point.

Sven


From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Oliver Crosby" <ryusei(at)gmail(dot)com>, "Sven Willenberger" <sven(at)dmv(dot)com>
Cc: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-19 23:01:56
Message-ID: op.st6nlincth1vuj@localhost
Lists: pgsql-performance


You could have a program pre-parse your log and put it in a format
understandable by COPY, then load it into a temporary table and write
part of your application simply as a plpgsql function (or a plperl
function), reading from this table and doing queries...
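
A hedged sketch of that shape (every name here is invented, and the
per-event logic is just a placeholder):

CREATE TEMPORARY TABLE raw_events (b text, c text);
COPY raw_events FROM '/tmp/events.dat';

CREATE OR REPLACE FUNCTION process_events() RETURNS integer AS '
DECLARE
    r RECORD;
    n integer := 0;
BEGIN
    FOR r IN SELECT * FROM raw_events LOOP
        -- per-event checks, lookups and inserts go here
        INSERT INTO sometable (b, c) VALUES (r.b, r.c);
        n := n + 1;
    END LOOP;
    RETURN n;
END;
' LANGUAGE plpgsql;

SELECT process_events();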

> So... (bear with me here.. trying to make sense of this)..
> With triggers there's a way I can do the parsing I need to on a log
> file and react to completed events in non-sequential order (you can
> ignore that part.. it's just how we piece together different related
> events) and then have perl/DBD::Pg invoke a copy command (which, from
> what I can tell, has to operate on a file...) and the copy command can
> feed the ID I need back to perl so I can work with it...
> If that doesn't hurt my brain, then I'm at least kinda confused...
> Anyway. Heading home now. I'll think about this more tonight/tomorrow.
>


From: Oliver Crosby <ryusei(at)gmail(dot)com>
To: PFC <lists(at)boutiquenumerique(dot)com>
Cc: Sven Willenberger <sven(at)dmv(dot)com>, Dawid Kuroczko <qnex42(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Looking for tips
Date: 2005-07-20 01:50:18
Message-ID: 1efd553a050719185059de2581@mail.gmail.com
Lists: pgsql-performance

Sorry for the lack of specifics...

We have a file generated as a list of events, one per line. Suppose
lines 1,2,3,5,7,11, etc. were related; then the last one would specify
that it's the last event. Gradually this gets assembled by a perl
script, and when the last event is encountered, it gets inserted into
the db. For a given table, let's say it's of the form (a,b,c) where
'a' is a pkey, 'b' is indexed, and 'c' is other related information.
The most common 'b' values are cached locally by the perl script to
save us from having to query the db. So what we end up having is:

if 'b' exists in cache, use cached 'a' value and continue
else if 'b' exists in the db, use the associated 'a' value and continue
else add a new line with 'b', return the new 'a' and continue
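
(In plpgsql, such a one-step lookup-or-insert might look roughly like
this -- names invented, and assuming 'a' is filled from a sequence:)

CREATE OR REPLACE FUNCTION get_or_insert_b(p_b text) RETURNS integer AS '
DECLARE
    v_a integer;
BEGIN
    SELECT a INTO v_a FROM sometable WHERE b = p_b;
    IF FOUND THEN
        RETURN v_a;  -- existing id
    END IF;
    INSERT INTO sometable (b) VALUES (p_b);
    RETURN currval(''sometable_a_seq'');  -- id just assigned by the sequence
END;
' LANGUAGE plpgsql;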

The local cache was a huge time saver with mysql. I've tried making a
plpgsql function that handles everything in one step on the db side,
but it didn't show any improvement. Time permitting, I'll try some new
approaches, changing the scripts and queries, though right now I was
just hoping to tune postgresql.conf to work better with the hardware
available.

Thanks to everyone for your help. Very much appreciated.