PostgreSQL Write Performance

From: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: PostgreSQL Write Performance
Date: 2010-01-05 03:36:10
Message-ID: 707105.14949.qm@web65702.mail.ac4.yahoo.com
Lists: pgsql-general

I am not sure whether I am benchmarking in the correct way.

I have the following table:

CREATE TABLE measurement_type
(
measurement_type_id bigserial NOT NULL,
measurement_type_name text NOT NULL,
CONSTRAINT pk_measurement_type_id PRIMARY KEY (measurement_type_id),
CONSTRAINT measurement_type_measurement_type_name_key UNIQUE (measurement_type_name)
)

I make the following single write operation through pgAdmin:

INSERT INTO measurement_type ( measurement_type_name )
VALUES('Width');

It takes 16ms to write a single row, according to the "Query Editor" (bottom right corner).

Am I benchmarking the correct way? Is this the expected performance? I was expecting the time to be measured in nanoseconds :p

Thanks and Regards
Yan Cheng CHEOK


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 03:45:36
Message-ID: dcc563d11001041945t2556e9aar4c2b90388da3d485@mail.gmail.com
Lists: pgsql-general

On Mon, Jan 4, 2010 at 8:36 PM, Yan Cheng Cheok <yccheok(at)yahoo(dot)com> wrote:
> I am not sure whether I am doing the correct benchmarking way.
>
> I have the following table ;
>
> CREATE TABLE measurement_type
> (
>  measurement_type_id bigserial NOT NULL,
>  measurement_type_name text NOT NULL,
>  CONSTRAINT pk_measurement_type_id PRIMARY KEY (measurement_type_id),
>  CONSTRAINT measurement_type_measurement_type_name_key UNIQUE (measurement_type_name)
> )
>
> I make the following single write operation through pgAdmin :
>
> INSERT INTO measurement_type ( measurement_type_name )
> VALUES('Width');
>
> It takes 16ms to write a single row according to "Query Editor" (bottom right corner)
>
> Am I doing the correct way to benchmarking? I am not sure whether this is expected performance? For me, I am expecting the time measurement is in nano seconds :p

It would be great if a hard drive could seek, write, acknowledge the
write and the OS could tell the db about it in nanoseconds. However,
some part of that chain would have to be lying to do that. It takes
at LEAST a full rotation of a hard drive to commit a single change to
a database, usually more. Given that the fastest HDs are 15k RPM
right now, you're looking at 250 revolutions per second, or 1/250th of
a second minimum to commit a transaction.

Now, the good news is that if you make a bunch of inserts in the same
transaction a lot of them can get committed together to the disk at
the same time, and the aggregate speed will be, per insert, much
faster.
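
For illustration, a minimal sketch of that batching, reusing the
measurement_type table from earlier in the thread (the extra values are
made up):

BEGIN;
INSERT INTO measurement_type (measurement_type_name) VALUES ('Width');
INSERT INTO measurement_type (measurement_type_name) VALUES ('Height');
INSERT INTO measurement_type (measurement_type_name) VALUES ('Depth');
-- ... many more single-row inserts ...
COMMIT;  -- only this step has to wait for the WAL flush to disk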


From: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 05:04:43
Message-ID: 63583.95240.qm@web65714.mail.ac4.yahoo.com
Lists: pgsql-general

Instead of sending 1000+ INSERT statements in one shot (which would require my application to keep track of the INSERT statements),

is it possible that I can tell PostgreSQL:

"OK, I am sending you INSERT statements, but do not perform any actual write operation. Only perform the actual write operation once the pending statements reach 1000."

Thanks and Regards
Yan Cheng CHEOK


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 05:56:56
Message-ID: 4B42D4A8.3080504@2ndquadrant.com
Lists: pgsql-general

Yan Cheng Cheok wrote:
> Instead of sending 1000++ INSERT statements in one shot, which will requires my application to keep track on the INSERT statement.
>
> Is it possible that I can tell PostgreSQL,
>
> "OK. I am sending you INSERT statement. But do not perform any actual right operation. Only perform actual write operation when the pending statement had reached 1000"
>

You can turn off synchronous_commit to get something like that:

http://www.postgresql.org/docs/current/static/runtime-config-wal.html

This should make your single-record INSERT time drop dramatically. Note
that you'll be introducing a possibility of some data loss from the
latest insert(s) if the server crashes in this situation.
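
As a rough sketch of what that looks like (the INSERT reuses the
measurement_type table from this thread):

SET synchronous_commit TO off;  -- per session; can also go in postgresql.conf
INSERT INTO measurement_type (measurement_type_name) VALUES ('Width');
-- the (implicit) commit now returns before the WAL record is flushed, so a
-- crash in the next few hundred milliseconds can lose the most recent commits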

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Yan Cheng Cheok" <yccheok(at)yahoo(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 06:03:39
Message-ID: D425483C2C5C9F49B5B7A41F89441547029626B5@postal.corporate.connx.com
Lists: pgsql-general

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> owner(at)postgresql(dot)org] On Behalf Of Yan Cheng Cheok
> Sent: Monday, January 04, 2010 9:05 PM
> To: Scott Marlowe
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] PostgreSQL Write Performance
>
> Instead of sending 1000++ INSERT statements in one shot, which will
> requires my application to keep track on the INSERT statement.
>
> Is it possible that I can tell PostgreSQL,
>
> "OK. I am sending you INSERT statement. But do not perform any actual
> right operation. Only perform actual write operation when the pending
> statement had reached 1000"

You might use the COPY command instead of INSERT, which is far faster.
If you want the fastest possible inserts, COPY is probably the way to go.
Here is copy command via API:
http://www.postgresql.org/docs/current/static/libpq-copy.html
Here is copy command via SQL:
http://www.postgresql.org/docs/8.4/static/sql-copy.html

You might (instead) use this sequence:

1. Begin transaction
2. Prepare
3. Insert 1000 times
4. Commit

If the commit fails, you will have to redo the entire set of 1000 or
otherwise handle the error.
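
A minimal sketch of that sequence against the measurement_type table from
this thread (the prepared-statement name and the values are made up):

BEGIN;
PREPARE ins (text) AS
    INSERT INTO measurement_type (measurement_type_name) VALUES ($1);
EXECUTE ins('Width');
EXECUTE ins('Height');
-- ... repeat up to ~1000 times ...
COMMIT;
DEALLOCATE ins;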

Or something along those lines. Of course, when information is not
written to disk, what will happen on power failure? If you do something
cheesy like turning fsync off to speed up inserts, then you will have
trouble if you lose power.

What is the actual problem you are trying to solve?


From: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
To: Dann Corbit <DCorbit(at)connx(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 07:30:07
Message-ID: 732376.62487.qm@web65703.mail.ac4.yahoo.com
Lists: pgsql-general

>> What is the actual problem you are trying to solve?

I am currently developing a database system for a high speed measurement machine.

The time taken to perform a measurement on each unit is on the order of ~30 milliseconds. We need to record the measurement result for every single unit. Hence, the time taken to record the measurement result should be far less than a millisecond, so that it has nearly zero impact on the machine speed (if not, the machine needs to wait for the database to finish writing before performing the measurement on the next unit).

Previously, we were using a flat file. However, using a flat file is quite a mess when it comes to generating reports for customers.

Thanks and Regards
Yan Cheng CHEOK


From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Yan Cheng Cheok" <yccheok(at)yahoo(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 07:34:09
Message-ID: D425483C2C5C9F49B5B7A41F89441547029626B6@postal.corporate.connx.com
Lists: pgsql-general

> -----Original Message-----
> From: Yan Cheng Cheok [mailto:yccheok(at)yahoo(dot)com]
> Sent: Monday, January 04, 2010 11:30 PM
> To: Dann Corbit
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] PostgreSQL Write Performance
>
> >> What is the actual problem you are trying to solve?
>
> I am currently developing a database system for a high speed
> measurement machine.
>
> The time taken to perform measurement per unit is in term of ~30
> milliseconds. We need to record down the measurement result for every
> single unit. Hence, the time taken by record down the measurement
> result shall be far more less than milliseconds, so that it will have
> nearly 0 impact on the machine speed (If not, machine need to wait for
> database to finish writing, before performing measurement on next unit)
>
> Previously, we are using flat file.. However, using flat file is quite
> a mess, when come to generating reports to customers.

Does the data volume build continuously so that the file becomes
arbitrarily large, or can you archive data that is more than X days old?
What is the format of a record?
What is the format of the incoming data?
Do you need indexes on these records?
Is the stream of incoming data continuous around the clock, or are there
periods when there is no incoming data?


From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 11:04:05
Message-ID: 8A4E4050-34DE-4459-979C-3752E9C02F21@solfertje.student.utwente.nl
Lists: pgsql-general

On 5 Jan 2010, at 8:30, Yan Cheng Cheok wrote:

>>> What is the actual problem you are trying to solve?
>
> I am currently developing a database system for a high speed measurement machine.
>
> The time taken to perform measurement per unit is in term of ~30 milliseconds. We need to record down the measurement result for every single unit. Hence, the time taken by record down the measurement result shall be far more less than milliseconds, so that it will have nearly 0 impact on the machine speed (If not, machine need to wait for database to finish writing, before performing measurement on next unit)
>
> Previously, we are using flat file.. However, using flat file is quite a mess, when come to generating reports to customers.

That flat file can help you with clustering INSERTs together and also means you'll have all your measurements ready for a single INSERT. The latter is useful if you're planning on using arrays to store your measurements, as updating array values requires the entire array to be rewritten to the database. I don't know how your measurements would arrive without the flat file, but I wouldn't be surprised if the measurements for a single unit would come out at different points in time, which would be a bit painful with arrays (not quite as much with a measurements table though).

A safe approach (with minimal risk of data loss) would be to split your flat file every n units (earlier in this thread a number of n=1000 was mentioned) and store that data using COPY in the format COPY expects. You will probably also want to keep a queue-table (which is just a normal table, but it's used like a queue) with the names of the flat files that need processing.

I haven't done this kind of thing before, but I envision it something like this:

CREATE TABLE unit (
id bigserial NOT NULL,
date date NOT NULL DEFAULT CURRENT_DATE,
measured text[],
measurements numeric(4,3)[]
);

CREATE TABLE queue (
file text NOT NULL,
definitive boolean DEFAULT False
);

---file-2010-01-05-00000001---
/* Update in its own transaction so that we know we tried to process this file
* even if the transaction rolls back.
*/
UPDATE queue SET definitive = True
WHERE file = 'file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00000001';

/* Start work */
BEGIN;
COPY unit (id, measured, measurements) FROM STDIN;
1 {Width,Height} {0.001,0.021}
2 {Width,Height} {0.002,0.019}
...
999 {Width,Height} {0.000,0.018}
\.

/* This file was processed and can be removed from the queue */
DELETE FROM queue WHERE file='file-2010-01-05-00000001';
COMMIT;

/* This will probably be the name of the next flat file, but we don't know that
* for sure yet. It needs to be outside the transaction as otherwise CURRENT_DATE
* will have the date of the start of the transaction and we need to know what the
* next batch will be regardless of whether this one succeeded.
*/
INSERT INTO queue (file, definitive)
VALUES ('file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00001000', False);
---end of file---

You'd need a little program (a script will probably work) to read that queue table and send the commands in those files to the database. Don't forget that at the start the queue table will be empty ;) I recall some of this list's members wrote up a webpage about how to implement queue tables reliably.

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 11:20:41
Message-ID: 4B432089.1070706@postnewspapers.com.au
Lists: pgsql-general

On 5/01/2010 3:30 PM, Yan Cheng Cheok wrote:
>>> What is the actual problem you are trying to solve?
>
> I am currently developing a database system for a high speed measurement machine.
>
> The time taken to perform measurement per unit is in term of ~30 milliseconds. We need to record down the measurement result for every single unit. Hence, the time taken by record down the measurement result shall be far more less than milliseconds, so that it will have nearly 0 impact on the machine speed (If not, machine need to wait for database to finish writing, before performing measurement on next unit)

The commit_delay and synchronous_commit parameters may help you if you
want to do each insert as a separate transaction. Note that with these
parameters there's some risk of very recently committed data being lost
if the server OS crashes or the server hardware is powered
off/power-cycled unexpectedly. PostgreSQL itself crashing shouldn't
cause loss of the committed data, though.

Alternately, you can accumulate small batches of measurements in your
app and do multi-valued INSERTs once you have a few (say 10) collected
up. You'd have to be prepared to lose those if the app crashed though.
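
A multi-valued INSERT of that kind might look like this sketch, reusing
the measurement_type table from the thread (the values are made up):

INSERT INTO measurement_type (measurement_type_name)
VALUES ('Width'), ('Height'), ('Depth'), ('Offset'), ('Skew');
-- one statement, one commit, five rows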

Another option is to continue using your flat file, and have a "reader"
process tailing the flat file and inserting new records into the
database as they become available in the flat file. The reader could
batch inserts intelligently, keep a record on disk of its progress,
rotate the flat file periodically, etc.

--
Craig Ringer


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 22:21:08
Message-ID: 855e4dcf1001051421t2e391c47kb9381657e0730057@mail.gmail.com
Lists: pgsql-general

> You might use the copy command instead of insert, which is far faster.
> If you want the fastest possible inserts, then probably copy is the way
> to go instead of insert.
> Here is copy command via API:
> http://www.postgresql.org/docs/current/static/libpq-copy.html
> Here is copy command via SQL:
> http://www.postgresql.org/docs/8.4/static/sql-copy.html
>

Is there a command like COPY which will insert the data but skip all
triggers and optionally integrity checks?


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 22:34:28
Message-ID: 4B43BE74.5050800@2ndquadrant.com
Lists: pgsql-general

Tim Uckun wrote:
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks.
>

Nope, skipping integrity checks is MySQL talk. When doing a bulk
loading job, it may make sense to drop constraints and triggers though;
there's more notes on this and related techniques at
http://www.postgresql.org/docs/current/interactive/populate.html

Another thing you can do is defer your constraints:

http://www.postgresql.org/docs/current/static/sql-set-constraints.html

so that they execute in a more efficient block.
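
For example (a sketch only; the foreign key name here is hypothetical, and
the constraint must have been declared DEFERRABLE for this to work):

BEGIN;
SET CONSTRAINTS measurement_type_fk DEFERRED;
-- bulk INSERTs or COPY here
COMMIT;  -- the deferred checks run here; the transaction aborts if any fail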

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-general(at)postgresql(dot)org
Cc: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 22:49:13
Message-ID: 201001052349.13382.andres@anarazel.de
Lists: pgsql-general

Hi,

On Tuesday 05 January 2010 04:36:10 Yan Cheng Cheok wrote:
> I make the following single write operation through pgAdmin :
...
> It takes 16ms to write a single row according to "Query Editor" (bottom
> right corner)
In my experience the times presented by pgAdmin vary wildly and seldom
represent an accurate timing.

Andres


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 23:33:23
Message-ID: 4B43CC43.60505@postnewspapers.com.au
Lists: pgsql-general

On 6/01/2010 6:21 AM, Tim Uckun wrote:
>> You might use the copy command instead of insert, which is far faster.
>> If you want the fastest possible inserts, then probably copy is the way
>> to go instead of insert.
>> Here is copy command via API:
>> http://www.postgresql.org/docs/current/static/libpq-copy.html
>> Here is copy command via SQL:
>> http://www.postgresql.org/docs/8.4/static/sql-copy.html
>>
>
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks.

No. If you don't want triggers and integrity checks to fire, don't
define them in the first place.

Technically you *can* disable triggers, including RI checks, but it's
VERY unwise and almost completely defeats the purpose of having the
checks. In most such situations you're much better off dropping the
constraints then adding them again at the end of the load.

--
Craig Ringer


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 23:46:51
Message-ID: 855e4dcf1001051546n2f4a481erd97f582fa12e8445@mail.gmail.com
Lists: pgsql-general

>
> Technically you *can* disable triggers, including RI checks, but it's VERY
> unwise and almost completely defeats the purpose of having the checks. In
> most such situations you're much better off dropping the constraints then
> adding them again at the end of the load.

I know that the SQL server bulk loader defaults to not firing the
triggers and I was hoping there would be an option on the COPY command
to accomplish the same thing.

pg_dump has a --disable-triggers option too.

It seems to me that the COPY FROM should have an option that bypasses
the triggers as a convenience.

Both the SQL server team and the postgres team have obviously
recognized that there will be situations where the DBA will want to
bulk load data without firing off a set of triggers for every insert.

It doesn't seem like an outrageous expectation that the COPY command
or something similar should have that option.


From: Christophe Pettus <xof(at)thebuild(dot)com>
To: "pgsql-general(at)postgresql(dot)org mailing list" <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-05 23:55:04
Message-ID: 26667536-DB10-46E2-A7AB-38304A2BF59B@thebuild.com
Lists: pgsql-general


On Jan 5, 2010, at 3:46 PM, Tim Uckun wrote:
> pg_dump has a --disable-triggers option too.

[...]

> It doesn't seem like an outrageous expectation that the COPY command
> or something similar should have that option.

Well, whether an expectation is "outrageous" or not is a matter of
viewpoint. The principle is that pg_dump and COPY have fundamentally
different use cases. pg_dump is intended to restore a backup of data
that was already in the database, and presumably was already validated
by the appropriate constraints and triggers. COPY is used to create
new records in a database, from arbitrary data, which may not be valid
based on the database's vision of data consistency.

--
-- Christophe Pettus
xof(at)thebuild(dot)com


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 01:15:07
Message-ID: 4B43E41B.4070003@postnewspapers.com.au
Lists: pgsql-general

Tim Uckun wrote:
>> Technically you *can* disable triggers, including RI checks, but it's VERY
>> unwise and almost completely defeats the purpose of having the checks. In
>> most such situations you're much better off dropping the constraints then
>> adding them again at the end of the load.
>
>
> I know that the SQL server bulk loader defaults to not firing the
> triggers and I was hoping there would be an option on the COPY command
> to accomplish the same thing.
>
> pg_dump has a --disable-triggers option too.

pg_restore? Yes, it does. However, it knows that the data you're loading
came from a PostgreSQL database where those triggers have already fired
when the data was originally inserted. It can trust that the data is OK.

I don't personally think that COPY should make it easy to disable
triggers. You can do it if you want to (see the manual for how) but in
almost all cases its much wiser to drop triggers and constraints instead.

Rather than disabling RI constraints, it'd be desirable for COPY to have
an option that *deferred* RI constraint checking, then re-checked the
constraints for all rows at once and rolled back the COPY if any failed.
That'd be a lot faster for many uses, but (importantly) would preserve
referential integrity.

While COPY doesn't offer an easy way to do that, you can emulate the
behavior by:

- Beginning a transaction
- Disabling RI constraints
- Running COPY
- Re-enabling RI constraints
- Re-checking the RI constraints and raising an exception to abort the
transaction if the checks fail.
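
A sketch of that sequence, assuming a hypothetical measurement table with a
foreign key into measurement_type (the file path is made up, and disabling
ALL triggers, which includes the internally generated RI triggers, requires
superuser privileges):

BEGIN;
ALTER TABLE measurement DISABLE TRIGGER ALL;
COPY measurement FROM '/tmp/measurements.dat';
ALTER TABLE measurement ENABLE TRIGGER ALL;
-- re-check the FK by hand; COMMIT only if no orphan rows were loaded
SELECT count(*) AS orphans
FROM measurement m
LEFT JOIN measurement_type t ON t.measurement_type_id = m.measurement_type_id
WHERE t.measurement_type_id IS NULL;
COMMIT;  -- or ROLLBACK if the count above was non-zero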

Unfortunately you can't (yet) do this with deferrable RI constraints,
because Pg isn't clever enough to notice when large numbers of
individual checks have built up and merge them into a single re-check
query that verifies the whole constraint in one pass. So you have to do
that yourself.

> It seems to me that the COPY FROM should have an option that bypasses
> the triggers as a convience.

I, for one, would loudly and firmly resist the addition of such a
feature. Almost-as-fast options such as intelligent re-checking of
deferred constraints would solve the problem better and much more
safely. If you really want the dangerous way you can already get it,
it's just a bit more work to disable the triggers yourself, and by doing
so you're saying "I understand what I am doing and take responsibility
for the dangers inherent in doing so".

If you really want to do that, look at the manual for how to disable
triggers, but understand that you are throwing away the database's data
integrity protection by doing it.

--
Craig Ringer


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 02:30:03
Message-ID: 855e4dcf1001051830x1612f600idd52387d7e73aa7a@mail.gmail.com
Lists: pgsql-general

> I, for one, would loudly and firmly resist the addition of such a
> feature. Almost-as-fast options such as intelligent re-checking of

Even if it was not the default behavior?

>
> If you really want to do that, look at the manual for how to disable
> triggers, but understand that you are throwing away the database's data
> integrity protection by doing it.
>

I guess it's a matter of philosophy. I kind of think as the DBA I
should be the final authority in determining what is right and wrong.
It's my data after all. Yes I would expect pg to perform every check I
specify and execute every trigger I write but if I want I should be
able to bypass those things "just this once".

As you point out I can already do this by manually going through and
disabling every trigger or even dropping the triggers. Many people
have said I could drop the constraints and re-set them up. The fact
that the COPY command does not have a convenient way for me to do this
doesn't prevent me from "shooting myself in the foot" if I want to.
It would just be a flag. If you want it you can enable it; if you don't,
then no harm, no foul.

Anyway this is getting offtopic. I got my question answered. COPY does
not do this. If I want to do it I have to manually iterate through all
the triggers and disable them or drop them before running copy.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 02:56:13
Message-ID: 4B43FBCD.8090109@postnewspapers.com.au
Lists: pgsql-general

Tim Uckun wrote:
>> I, for one, would loudly and firmly resist the addition of such a
>> feature. Almost-as-fast options such as intelligent re-checking of
>
> Even if it was not the default behavior?

Even if it was called

COPY (PLEASE BREAK MY DATABASE) FROM ...

... because there are *better* ways to do it that are safe for exposure
to normal users, and existing ways to do it the dangerous way if you
really want to.

> I guess it's a matter of philosophy. I kind of think as the DBA I
> should be the final authority in determining what is right and wrong.
> It's my data after all. Yes I would expect pg to perform every check I
> specify and execute every trigger I write but if I want I should be
> able to bypass those things "just this once".
>
> As you point out I can already do this by manually going through and
> disabling every trigger or even dropping the triggers.

You could alternately dynamically query pg_catalog and use PL/PgSQL (or
SQL generated by your app) to issue the appropriate statements to
control the triggers.
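
For instance (the measurement table here is hypothetical, not from the
original message):

-- list the triggers on one table
SELECT tgname FROM pg_trigger WHERE tgrelid = 'measurement'::regclass;

-- or skip the catalog query and disable just the user-level triggers;
-- the internal RI triggers keep firing
ALTER TABLE measurement DISABLE TRIGGER USER;
-- ... bulk load ...
ALTER TABLE measurement ENABLE TRIGGER USER;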

> Many people
> have said I could drop the constraints and re-set them up. The fact
> that the COPY command does not have a convenient way for me to do this
> doesn't prevent me from "shooting myself in the foot" if I want to.

I think that it would be desirable for COPY to provide a convenient way
to drop and re-create constraints, or (preferably) just disable them
while it ran then re-check them before returning success. Think:

COPY (BATCH_RI_CHECKS) FROM ...

The thing you might have missed is that dropping and re-creating
constraints is different to disabling them (as you're requesting that
COPY do). When the constraints are re-created, the creation will *fail*
if the constraint is violated, aborting the whole operation if you're
sensible enough to do it in a single transaction. At no point can you
end up with a constraint in place promising RI that is in fact violated
by the data.

By contrast, if you disable triggers and constraints, re-enabling them
does *not* re-check any constraints. So it's much, MUCH more dangerous,
since it can let bad data into the DB silently. So if you disable
constraints you MUST re-check them after re-enabling them and abort the
transaction if they're violated, or must be utterly certain that the
data you inserted/altered/deleted was really safe.

> It would just be a flag. If you want you can enable it, if you don't
> they no harm no foul.

Unfortunately my experience has been that many users (a) often don't
read documentation and (b) just try different things until something
they do makes the error "go away". They then get five steps down the
track and post a question about a query that doesn't work correctly
(because they broke their data) and it takes forever to get to the
bottom of it.

You're clearly a fairly experienced and responsible DBA who'd look
something up before using it and would be careful to preserve RI
manually in these situations. Some people who use PG aren't (an amazing
number of them just installed it to run their poker card counting
software!), and I don't think it's wise to make dangerous things *too*
obvious. They need to be there and available, but not staring you in the
face.

I don't advocate the GNOME philosophy of "make it impossible if it's not
suitable for a user who's using a computer for the first time" ... but I
don't like the "nuke my data" button to be on the default desktop either ;-)

--
Craig Ringer


From: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 06:17:17
Message-ID: 229380.83724.qm@web65702.mail.ac4.yahoo.com
Lists: pgsql-general

Thanks for the information. I wrote a plain C program to test the performance. Its time measurements are very much different from pgAdmin's.

Thanks and Regards
Yan Cheng CHEOK


From: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 06:29:30
Message-ID: 571823.76690.qm@web65714.mail.ac4.yahoo.com
Lists: pgsql-general

Thanks for the information. I performed benchmarking on a very simple table in a local database (1 table, 2 fields: one bigserial, the other text).

====================================================================
INSERT INTO measurement_type(measurement_type_name) VALUES ('Hello')
====================================================================

I turned synchronous_commit off.

To write a single row (local database), the time taken is between 0.1ms and 0.5ms.

I tried to compare this with a flat text file.

To write a single row (file), the time taken is between 0.005ms and 0.05ms.

The difference is big. Is this the expected result? Are you guys also getting similar results?

I know there will be some overhead writing to a database compared to a flat text file (network communication, interpretation of SQL statements...). However, is there any way to further improve things so that PostgreSQL write performance is near that of a file?

If not, I need to plan another strategy to migrate my flat text file system into a PostgreSQL system smoothly.

Thanks and Regards
Yan Cheng CHEOK


From: Stephen Cook <sclists(at)gmail(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 06:50:01
Message-ID: 4B443299.5010909@gmail.com
Lists: pgsql-general

Tim Uckun wrote:
>
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks.
>

I'm curious if it would be worth COPYing the data into dummy tables with
no constraints, and then using INSERT INTO ... SELECT statements to feed
from those tables into the real ones, which would check constraints and
such but as a set.

I've done it that way in SQL Server before, but I'm much less
experienced with PostgreSQL.
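
As a sketch of that approach against the measurement_type table from this
thread (the staging table name and file path are made up):

-- load into an unconstrained staging table first
CREATE TABLE measurement_type_stage (measurement_type_name text);
COPY measurement_type_stage FROM '/tmp/types.txt';

-- then move the rows as a single set; the real table's constraints are
-- still enforced, but on one big INSERT rather than row by row from the client
INSERT INTO measurement_type (measurement_type_name)
SELECT DISTINCT s.measurement_type_name
FROM measurement_type_stage s
WHERE NOT EXISTS (SELECT 1 FROM measurement_type t
                  WHERE t.measurement_type_name = s.measurement_type_name);

DROP TABLE measurement_type_stage;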

-- Stephen Cook


From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Yan Cheng Cheok" <yccheok(at)yahoo(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 07:05:36
Message-ID: D425483C2C5C9F49B5B7A41F89441547029626CC@postal.corporate.connx.com
Lists: pgsql-general

> -----Original Message-----
> From: Yan Cheng Cheok [mailto:yccheok(at)yahoo(dot)com]
> Sent: Tuesday, January 05, 2010 10:30 PM
> To: Craig Ringer
> Cc: Dann Corbit; pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] PostgreSQL Write Performance
>
> Thanks for the information. I perform benchmarking on a very simple
> table, on local database. (1 table, 2 fields with 1 is bigserial,
> another is text)
>
> ====================================================================
> INSERT INTO measurement_type(measurement_type_name) VALUES ('Hello')
> ====================================================================
>
> I turn synchronous_commit to off.
>
> To write a single row(local database), the time taken is in between
> 0.1ms and 0.5ms
>
> I try to compare this with flat text file.
>
> To write a single row(file), the time taken is in between 0.005ms and
> 0.05ms
>
> The different is big. Is this the expected result? Are you guys also
> getting the similar result?

Ten to one is not surprising. You will save quite a bit by doing a
prepare and a large number of inserts in a single transaction.

> I know there shall be some overhead to write to database compared to
> flat text file. (network communication, interpretation of SQL
> statement...) However, Is there any way to further improve so that
> PostgreSQL write performance is near to file?
>
> If not, I need to plan another strategy, to migrate my flat text file
> system, into PostgreSQL system smoothly.

Did you try the copy command? It will be very much faster.


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 09:37:53
Message-ID: 1262770673.19367.60781.camel@ebony
Lists: pgsql-general

On Tue, 2010-01-05 at 22:29 -0800, Yan Cheng Cheok wrote:
> Thanks for the information. I perform benchmarking on a very simple table, on local database. (1 table, 2 fields with 1 is bigserial, another is text)
>
> ====================================================================
> INSERT INTO measurement_type(measurement_type_name) VALUES ('Hello')
> ====================================================================
>
> I turn synchronous_commit to off.
>
> To write a single row(local database), the time taken is in between 0.1ms and 0.5ms

x32 improvement sounds pretty good, IMHO

> I try to compare this with flat text file.
>
> To write a single row(file), the time taken is in between 0.005ms and 0.05ms
>
> The different is big. Is this the expected result? Are you guys also getting the similar result?
>
> I know there shall be some overhead to write to database compared to flat text file. (network communication, interpretation of SQL statement...) However, Is there any way to further improve so that PostgreSQL write performance is near to file?

Postgres provides
* data validation on input
* foreign keys to cross-validate fields in your data input
* transactions to allow roll-back of failed data
* auto-generation timestamps, if required
* auto-generated, easily restartable generation of unique keys
* crash recovery of your database changes (apart from very recent data)
* multiple concurrent writers, so multiple measurement machines can
record to just a single database
* readers do not block writers
* SQL interface to run analytical queries against data
* allows you to index data to allow fast retrieval of results
* triggers to allow you to augment your data collection application with
additional features over time
...

If you don't want these features, then yes, they are overheads.

If you're interested in funding feature development, I'd be more than
happy to investigate further optimising your specific case. I think an
auto-batching mode for INSERT statements should be possible, so that we
save up consecutive SQL statements and apply them as a single
transaction without requiring any client changes.

--
Simon Riggs www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 09:44:01
Message-ID: 1262771041.19367.60846.camel@ebony
Lists: pgsql-general

On Wed, 2010-01-06 at 15:30 +1300, Tim Uckun wrote:
> > I, for one, would loudly and firmly resist the addition of such a
> > feature. Almost-as-fast options such as intelligent re-checking of
>
> Even if it was not the default behavior?
>
> >
> > If you really want to do that, look at the manual for how to disable
> > triggers, but understand that you are throwing away the database's data
> > integrity protection by doing it.
> >
>
> I guess it's a matter of philosophy. I kind of think as the DBA I
> should be the final authority in determining what is right and wrong.
> It's my data after all. Yes I would expect pg to perform every check I
> specify and execute every trigger I write but if I want I should be
> able to bypass those things "just this once".
>
> As you point out I can already do this by manually going through and
> disabling every trigger or even dropping the triggers. Many people
> have said I could drop the constraints and re-set them up. The fact
> that the COPY command does not have a convenient way for me to do this
> doesn't prevent me from "shooting myself in the foot" if I want to.
> It would just be a flag. If you want you can enable it, if you don't
> they no harm no foul.

The reason we don't do this is because COPY can be run concurrently with
queries, which is different to most other load utilities. The only time
it is safe to disable triggers is in a transaction in which we either
create the table or truncate it. That path is already optimised.

We did discuss a version of COPY that locks the table to allow various
performance optimisations, but that prevented running multiple COPYs
concurrently and the loss in performance from doing that was more than
the expected gain from the optimisation, so we didn't bother.

> Anyway this is getting offtopic. I got my question answered. COPY does
> not do this. If I want to do it I have to manually iterate through all
> the triggers and disable them or drop them before running copy.

There is a command to disable all triggers at once.

--
Simon Riggs www.2ndQuadrant.com


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tim Uckun <timuckun(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 14:13:43
Message-ID: 87fx6js4mg.fsf@hi-media-techno.com
Lists: pgsql-general

Tim Uckun <timuckun(at)gmail(dot)com> writes:
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks.

pg_bulkload does that AFAIK.

http://pgbulkload.projects.postgresql.org/

Regards,
--
dim


From: Tim Uckun <timuckun(at)gmail(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-06 20:46:51
Message-ID: 855e4dcf1001061246p4c684cev25ea5f37509ad98c@mail.gmail.com
Lists: pgsql-general

On Thu, Jan 7, 2010 at 3:13 AM, Dimitri Fontaine <dfontaine(at)hi-media(dot)com> wrote:
> Tim Uckun <timuckun(at)gmail(dot)com> writes:
>> Is there a command like COPY which will insert the data but skip all
>> triggers and optionally integrity checks.
>
> pg_bulkload does that AFAIK.
>

That's a great utility. Unfortunately since it bypasses the WAL I
can't use it for this project but thanks for the pointer.


From: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Optimized Select Statement
Date: 2010-01-07 01:45:31
Message-ID: 1893.70905.qm@web65716.mail.ac4.yahoo.com
Lists: pgsql-general

I have a table with 1 million rows.

I know there can be multiple "YanChengCHEOK" rows. But in certain situations, I am only interested in 1 "YanChengCHEOK".

I try to perform a SELECT query.

SemiconductorInspection=# SELECT measurement_type_id FROM measurement_type WHERE measurement_type_name='YanChengCHEOK';
measurement_type_id
---------------------
1
(1 row)
Time: 331.057 ms

I try to have it in a stored procedure.

DECLARE
_measurement_type_id int8;
BEGIN
SELECT measurement_type_id INTO _measurement_type_id FROM measurement_type WHERE measurement_type_name='YanChengCHEOK';
RAISE NOTICE '%', _measurement_type_id;
return 1;
end;

It took only 1.018ms.

There is a high chance that PostgreSQL stops looking further once it finds that the int8 variable has been filled in with a value.

Without using a stored procedure, how can I send an SQL statement to PostgreSQL to tell it that I need only 1 measurement_type_id, in order to speed up the SELECT?

Thanks and Regards
Yan Cheng CHEOK


From: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Optimized Select Statement
Date: 2010-01-07 02:02:55
Message-ID: 20100107030255.7c796dc7@dawn.webthatworks.it
Lists: pgsql-general

On Wed, 6 Jan 2010 17:45:31 -0800 (PST)
Yan Cheng Cheok <yccheok(at)yahoo(dot)com> wrote:

> situation, I will be only interested in 1 "YanChengCHEOK".

> SELECT measurement_type_id INTO _measurement_type_id FROM
> measurement_type WHERE measurement_type_name='YanChengCHEOK';

LIMIT 1

Is that what you were looking for?
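
That is, a sketch of the query with the limit applied:

SELECT measurement_type_id
FROM measurement_type
WHERE measurement_type_name = 'YanChengCHEOK'
LIMIT 1;  -- the scan can stop at the first matching row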

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-07 04:49:35
Message-ID: 4B4567DF.1080604@2ndquadrant.com
Lists: pgsql-general

Yan Cheng Cheok wrote:
> The time taken to perform measurement per unit is in term of ~30 milliseconds. We need to record down the measurement result for every single unit. Hence, the time taken by record down the measurement result shall be far more less than milliseconds, so that it will have nearly 0 impact on the machine speed (If not, machine need to wait for database to finish writing, before performing measurement on next unit)
>

Saving a piece of data to a hard disk permanently takes a few
milliseconds. As pointed out already, exactly how many depends on the
drive, but it's probably going to be 8ms or longer on your system.
There are a few options here:

1) Add a battery-backed write caching controller to your system. Then
the battery will help make sure the data doesn't get lost even if the
power goes out before the driver writes it out. This will cost you
around $300.

2) Use some other type of faster storage, such as a SSD drive that has a
battery on it to cache any unfinished writes. Probably also going to be
around that price, the cheaper SSDs (and some of the expensive ones)
don't take data integrity very seriously.

3) Write the data to a flat file. Periodically import the results into
the database in a batch.

The thing you should realize is that using (3) is going to put you in a
position where it's possible you've told the machine the measurement was
saved, but if the system crashes it won't actually be in the database.
If you're saving to a flat file now, you're already in this
position--you can't write to a flat file and make sure the result is on
disk in less than around 8ms either, you just probably haven't tested
that out yet. Just because the write has returned successfully, that
doesn't mean it's really stored permanently. Power the system off in
the window between that write and when the memory cache goes out to
disk, and you'll discover the data missing from the file after the
system comes back up.

If you're OK with the possibility of losing a measurement in the case of
a system crash, then you should just write measurements to a series of
flat files, then have another process altogether (one that isn't holding
up the machine) load those files into the database. The fact that it
takes a few ms to write to disk is a physical limitation you can't get
around without using more expensive hardware to improve the situation.
If you haven't been seeing that in your app already, I assure you it's
just because you haven't looked for the issue before--this limitation on
disk write speed has been there all along, the database is just forcing
you to address it.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-07 05:17:47
Message-ID: 336122.21521.qm@web65708.mail.ac4.yahoo.com
Lists: pgsql-general

Thanks for the valuable advice! I will take it into consideration seriously.

From my point of view, my current requirement is limited by the so-called "overhead" of communicating with the database. See the following result from the SQL Shell:

SemiconductorInspection=# \timing on
Timing is on.
SemiconductorInspection=# ;
Time: 0.660 ms
SemiconductorInspection=# ;
Time: 0.517 ms
SemiconductorInspection=# ;
Time: 2.249 ms
SemiconductorInspection=#

I assume there is no hard disk activity involved, as I am sending "empty" SQL statements over.

Thanks and Regards
Yan Cheng CHEOK


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-07 10:29:50
Message-ID: 87iqbe9pi9.fsf@hi-media-techno.com
Lists: pgsql-general

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> If you're OK with the possibility of losing a measurement in the case of a
> system crash

Then I'd say use synchronous_commit = off for the transactions doing
that, trading durability (the 'D' of ACID) against write
performance. That requires 8.3 at least, and will not fsync() before
telling the client the commit is done.
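
It can also be scoped to a single transaction with SET LOCAL; roughly (the
INSERT reuses the measurement_type table from this thread):

BEGIN;
SET LOCAL synchronous_commit TO off;
INSERT INTO measurement_type (measurement_type_name) VALUES ('Width');
COMMIT;  -- returns without waiting for the WAL flush; the flush follows shortly after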

Regards,
--
dim


From: Joe Conway <mail(at)joeconway(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Yan Cheng Cheok <yccheok(at)yahoo(dot)com>, Dann Corbit <DCorbit(at)connx(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL Write Performance
Date: 2010-01-07 17:02:25
Message-ID: 4B4613A1.30408@joeconway.com
Lists: pgsql-general

On 01/06/2010 08:49 PM, Greg Smith wrote:
> Yan Cheng Cheok wrote:
>> The time taken to perform measurement per unit is in term of ~30
>> milliseconds. We need to record down the measurement result for every
>> single unit. Hence, the time taken by record down the measurement
>> result shall be far more less than milliseconds, so that it will have
>> nearly 0 impact on the machine speed (If not, machine need to wait for
>> database to finish writing, before performing measurement on next unit)

> Saving a piece of data to a hard disk permanently takes a few
> milliseconds. As pointed out already, exactly how many depends on the
> drive, but it's probably going to be 8ms or longer on your system.
> There are a few options here:

> 3) Write the data to a flat file. Periodically import the results into
> the database in a batch.

> If you're OK with the possibility of losing a measurement in the case of
> a system crash, then you should just write measurements to a series of
> flat files, then have another process altogether (one that isn't holding
> up the machine) load those files into the database. The fact that it

At my last company we built a system for use in semiconductor/flat-panel
display equipment and faced a very similar issue -- namely we needed to
collect 40+ parameters at 6 kHz from one source, and another 200+ at 6
kHz from a second source (and then sync them so they could be properly
analyzed). Our solution was similar to #3, except we didn't bother with
the flat file. We basically had a C++ process "catch" the incoming
stream of data into a memory buffer, and periodically bulk copy it into
the database using libpq COPY functions (see:
http://www.postgresql.org/docs/8.4/interactive/libpq-copy.html)

HTH,

Joe