Re: Postgres insert performance and storage requirement compared to Oracle

From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:12:40
Message-ID: 402913.53404.qm@web65405.mail.ac4.yahoo.com

Hello Experts,
My application uses an Oracle DB and makes use of the OCI interface.
I have been able to develop a similar interface using the PostgreSQL library.
However, I have run some tests, and the results for PostgreSQL have not been
encouraging for a few of them.

My questions/scenarios are:

1. How does PostgreSQL perform when inserting data into an indexed (type: btree)
table? Is it true that as you add indexes to a table, the performance
deteriorates significantly, whereas Oracle does not show that much of a
performance decrease? I have tried almost all available PostgreSQL performance
tips. I want very good "insert" performance (with indexes); "select"
performance is not that important at this point in time.

2. What are the average storage requirements of PostgreSQL compared to Oracle?
I inserted up to 1 million records. The storage requirement of PostgreSQL is
almost double that of Oracle.

Thanks in anticipation.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:20:33
Message-ID: 1288030833.8930.30.camel@jd-desktop

On Mon, 2010-10-25 at 11:12 -0700, Divakar Singh wrote:

> My questions/scenarios are:
>
> 1. How does PostgreSQL perform when inserting data into an indexed
> (type: btree) table? Is it true that as you add indexes to a table,
> the performance deteriorates significantly, whereas Oracle does not
> show that much of a performance decrease? I have tried almost all
> available PostgreSQL performance tips. I want very good "insert"
> performance (with indexes); "select" performance is not that
> important at this point in time.

Did you test?

>
> 2. What are the average storage requirements of postgres compared to
> Oracle? I inserted up to 1 million records. The storage requirement of
> postgreSQL is almost double that of Oracle.

What was your table structure?

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:26:27
Message-ID: AANLkTinB1yN0u61EGceCXq6Tjp+RuQ0-JDRp5usZAfWp@mail.gmail.com

On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> Hello Experts,
> My application uses Oracle DB, and makes use of OCI interface.
> I have been able to develop similar interface using postgreSQL library.
> However, I have done some tests but results for PostgreSQL have not been
> encouraging for a few of them.

Tell us more about your tests and results please.


From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: jd(at)commandprompt(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:31:22
Message-ID: 772167.16310.qm@web65409.mail.ac4.yahoo.com


> My questions/scenarios are:
>
> 1. How does PostgreSQL perform when inserting data into an indexed
> (type: btree) table? Is it true that as you add indexes to a table,
> the performance deteriorates significantly, whereas Oracle does not
> show that much of a performance decrease? I have tried almost all
> available PostgreSQL performance tips. I want very good "insert"
> performance (with indexes); "select" performance is not that
> important at this point in time.

-- Did you test?

Yes. The performance was comparable when using an SQL procedure. However,
when I used libpq, PostgreSQL performed very badly. There was some
difference in environment between these two tests, but I am assuming
libpq vs. SQL was the real cause. Or was it something else?

>
> 2. What are the average storage requirements of postgres compared to
> Oracle? I inserted up to 1 million records. The storage requirement of
> postgreSQL is almost double that of Oracle.

-- What was your table structure?

Some 10-12 columns (like 2 timestamp, 4 int and 4 varchar), with 5 indexes on
varchar and int fields, including 1 implicit index because of the PK.



From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:33:10
Message-ID: AANLkTinxpLHj26-3RbEMDvSQ8_oyRpyFtnXLOvhiO56R@mail.gmail.com

On Mon, Oct 25, 2010 at 2:12 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> 1. How does PostgreSQL perform when inserting data into an indexed (type:
> btree) table? Is it true that as you add the indexes on a table, the
> performance deteriorates significantly whereas Oracle does not show that
> much performance decrease. I have tried almost all postgreSQL performance
> tips available. I want to have very good "insert" performance (with
> indexes), "select" performance is not that important at this point of time.

I don't claim to have any experience with Oracle, but this boast
smells fishy. See for example Figure 3-2 (pp. 57-58) in "The Art of
SQL", where the author presents simple charts showing the performance
impact upon INSERTs of adding indexes to a table in Oracle and MySQL:
they're both in the same ballpark, and the performance impact is
indeed significant. As Joshua Drake suggests, table schemas and test
results would help your case.

Josh


From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:36:24
Message-ID: 781874.42830.qm@web65413.mail.ac4.yahoo.com

The storage test was simple, but the data (seconds taken) for the INSERT test,
PG vs. Oracle with 1, 2, 3, 4 and 5 indexes, was:
PG:

25
30
37
42
45

Oracle:

33
43
50
65
68

Rows inserted: 100,000
The above results show good INSERT performance of PG when using SQL procedures.
But performance when I use the C++ library is very bad. I did that test some
time ago, so I do not have data for that right now.

________________________________
From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Sent: Mon, October 25, 2010 11:56:27 PM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement
compared to Oracle

On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> Hello Experts,
> My application uses Oracle DB, and makes use of OCI interface.
> I have been able to develop similar interface using postgreSQL library.
> However, I have done some tests but results for PostgreSQL have not been
> encouraging for a few of them.

Tell us more about your tests and results please.


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:38:52
Message-ID: 1288031932.8930.32.camel@jd-desktop

On Mon, 2010-10-25 at 11:36 -0700, Divakar Singh wrote:
>
> 68
> Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL
> procedures. But performance when I use C++ lib is very bad. I did that
> test some time back so I do not have data for that right now.

This is interesting; are you using libpq or libpqxx?

Joshua D. Drake

>
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: jd(at)commandprompt(dot)com
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:42:48
Message-ID: 415787.11817.qm@web65404.mail.ac4.yahoo.com

Hi Joshua,
I have only been using libpq.
Is libpqxx better than libpq?
Is there any notable facility in libpqxx that could aid fast inserts or
better performance in general?

Best Regards,
Divakar

________________________________
From: Joshua D. Drake <jd(at)commandprompt(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>; pgsql-performance(at)postgresql(dot)org
Sent: Tue, October 26, 2010 12:08:52 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement
compared to Oracle

On Mon, 2010-10-25 at 11:36 -0700, Divakar Singh wrote:
>
> 68
> Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL
> procedures. But performance when I use C++ lib is very bad. I did that
> test some time back so I do not have data for that right now.

This is interesting; are you using libpq or libpqxx?

Joshua D. Drake

>
>
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:46:46
Message-ID: 4CC5D096.5030406@ca.afilias.info

On 10-10-25 02:31 PM, Divakar Singh wrote:
>
> > My questions/scenarios are:
> >
> > 1. How does PostgreSQL perform when inserting data into an indexed
> > (type: btree) table? Is it true that as you add indexes to a table,
> > the performance deteriorates significantly, whereas Oracle does not
> > show that much of a performance decrease? I have tried almost all
> > available PostgreSQL performance tips. I want very good "insert"
> > performance (with indexes); "select" performance is not that
> > important at this point in time.
>
> -- Did you test?
>
> Yes. The performance was comparable when using an SQL procedure. However,
> when I used libpq, PostgreSQL performed very badly. There was some
> difference in environment between these two tests, but I am assuming
> libpq vs. SQL was the real cause. Or was it something else?

So you're saying that when you load the data with psql it loads fine, but
when you load it using libpq it takes much longer?

How are you using libpq?
- Are you opening and closing the database connection between each insert?
- Are you doing all of your inserts as one big transaction, or a transaction
per insert?
- Are you using prepared statements for your inserts?
- Are you using the COPY command to load your data, or the INSERT command?
(a COPY sketch follows this list)
- Are you running your libpq program on the same server as PostgreSQL?
- How is your libpq program connecting to PostgreSQL; is it using SSL?
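
Since the checklist above asks about COPY: here is a minimal sketch, not from
the thread, of streaming rows through COPY with libpq. It assumes a
hypothetical two-column table ins_test(a int, b text) and default connection
settings:

#include <stdio.h>
#include "libpq-fe.h"

int main(void)
{
    int i;
    char buf[256];

    PGconn *conn = PQconnectdb("dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* enter COPY mode once, then stream rows with no per-row round trips */
    PGresult *res = PQexec(conn, "COPY ins_test (a, b) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(res);

    for (i = 0; i < 100000; i++)
    {
        /* COPY text format: tab-separated columns, one row per line */
        int n = snprintf(buf, sizeof buf, "%d\tsome_text\n", i);
        if (PQputCopyData(conn, buf, n) != 1)
            break;
    }

    PQputCopyEnd(conn, NULL);   /* finish the stream */
    res = PQgetResult(conn);    /* status of the COPY as a whole */
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "COPY error: %s", PQerrorMessage(conn));
    PQclear(res);
    PQfinish(conn);
    return 0;
}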

>
> Some 10-12 columns (like 2 timestamp, 4 int and 4 varchar), with 5
> indexes on varchar and int fields, including 1 implicit index because
> of the PK.

If you run "VACUUM VERBOSE tablename" on the table, what does it say?

You also don't mention which version of PostgreSQL you're using.

>
>
> Joshua D. Drake
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>


From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
Cc: jd(at)commandprompt(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:52:31
Message-ID: 441423.23877.qm@web65406.mail.ac4.yahoo.com


Answers:

How are you using libpq?
- Are you opening and closing the database connection between each insert?

[Need to check, will come back on this]

- Are you doing all of your inserts as one big transaction, or a transaction
per insert?

[Answer: for the C++ program, one insert per transaction in PG as well as
Oracle. But in the stored proc, I think both use only 1 transaction for all
inserts]

- Are you using prepared statements for your inserts?

[Need to check, will come back on this]

- Are you using the COPY command to load your data, or the INSERT command?

[No]

- Are you running your libpq program on the same server as PostgreSQL?

[Yes]

- How is your libpq program connecting to PostgreSQL; is it using SSL?

[No]

If you run "VACUUM VERBOSE tablename" on the table, what does it say?

[Need to check, will come back on this]

You also don't mention which version of PostgreSQL you're using.

[Latest, 9.x]

Best Regards,
Divakar

________________________________
From: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: jd(at)commandprompt(dot)com; pgsql-performance(at)postgresql(dot)org
Sent: Tue, October 26, 2010 12:16:46 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement
compared to Oracle

On 10-10-25 02:31 PM, Divakar Singh wrote:
>
> > My questions/scenarios are:
> >
> > 1. How does PostgreSQL perform when inserting data into an indexed
> > (type: btree) table? Is it true that as you add indexes to a table,
> > the performance deteriorates significantly, whereas Oracle does not
> > show that much of a performance decrease? I have tried almost all
> > available PostgreSQL performance tips. I want very good "insert"
> > performance (with indexes); "select" performance is not that
> > important at this point in time.
>
> -- Did you test?
>
> Yes. The performance was comparable when using an SQL procedure. However,
> when I used libpq, PostgreSQL performed very badly. There was some
> difference in environment between these two tests, but I am assuming
> libpq vs. SQL was the real cause. Or was it something else?

So you're saying that when you load the data with psql it loads fine, but
when you load it using libpq it takes much longer?

How are you using libpq?
- Are you opening and closing the database connection between each insert?
- Are you doing all of your inserts as one big transaction, or a transaction
per insert?
- Are you using prepared statements for your inserts?
- Are you using the COPY command to load your data, or the INSERT command?
- Are you running your libpq program on the same server as PostgreSQL?
- How is your libpq program connecting to PostgreSQL; is it using SSL?

>
> Some 10-12 columns (like 2 timestamp, 4 int and 4 varchar), with 5
> indexes on varchar and int fields, including 1 implicit index because
> of the PK.

If you run "VACUUM VERBOSE tablename" on the table, what does it say?

You also don't mention which version of PostgreSQL you're using.

>
>
> Joshua D. Drake
>
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 18:56:13
Message-ID: 4CC5D2CD.40403@vmsinfo.com

Profiling could tell you where the time is lost and where your program is
spending time. Having experience with both Oracle and Postgres, I don't
feel that there is much of a difference in insert speed. I am not using
C++; I am using scripting languages like Perl and PHP and, as far as
inserts go, I don't see much of a difference. I have an application which
inserts approximately 600,000 records into a PostgreSQL 9.0.1 database per
day, in chunks of up to 60,000 records every hour. The table is
partitioned and there are indexes on the underlying partitions. I haven't
noticed any problems with inserts. Also, if I use "copy" instead of the
"insert" command, I can be even faster. In addition, PostgreSQL doesn't
support index-organized tables.

Divakar Singh wrote:
> Storage test was simple, but the data (seconds taken) for INSERT test
> for PG vs Oracle for 1, 2, 3,4 and 5 indexes was:
> PG:
> 25
> 30
> 37
> 42
> 45
>
>
>
> Oracle:
>
> 33
> 43
> 50
> 65
> 68
>
> Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL
> procedures. But performance when I use C++ lib is very bad. I did that
> test some time back so I do not have data for that right now.
>
> ------------------------------------------------------------------------
> *From:* Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
> *To:* Divakar Singh <dpsmails(at)yahoo(dot)com>
> *Cc:* pgsql-performance(at)postgresql(dot)org
> *Sent:* Mon, October 25, 2010 11:56:27 PM
> *Subject:* Re: [PERFORM] Postgres insert performance and storage
> requirement compared to Oracle
>
> On Mon, Oct 25, 2010 at 12:12 PM, Divakar Singh <dpsmails(at)yahoo(dot)com
> <mailto:dpsmails(at)yahoo(dot)com>> wrote:
> > Hello Experts,
> > My application uses Oracle DB, and makes use of OCI interface.
> > I have been able to develop similar interface using postgreSQL library.
> > However, I have done some tests but results for PostgreSQL have not been
> > encouraging for a few of them.
>
> Tell us more about your tests and results please.
>

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Ray Stell <stellr(at)cns(dot)vt(dot)edu>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 19:21:57
Message-ID: 20101025192157.GB14620@cns.vt.edu


On Mon, Oct 25, 2010 at 11:39:30AM -0700, Divakar Singh wrote:
> Thanks Ray,
> Already seen that, but it does not tell about storage requirements compared
> to Oracle. I find it takes twice the space of Oracle.
>
>
> Best Regards,
> Divakar
> ________________________________
> From: Ray Stell <stellr(at)cns(dot)vt(dot)edu>
> To: Divakar Singh <dpsmails(at)yahoo(dot)com>
> Sent: Tue, October 26, 2010 12:05:23 AM
> Subject: Re: [PERFORM] Postgres insert performance and storage requirement
> compared to Oracle
>
> On Mon, Oct 25, 2010 at 11:12:40AM -0700, Divakar Singh wrote:
> >
> > 2. What are the average storage requirements of postgres compared to
> > Oracle? I inserted up to 1 million records. The storage requirement of
> > postgreSQL is almost double that of Oracle.
>
> there's a fine manual:
> http://www.postgresql.org/docs/9.0/interactive/storage.html

Maybe compare to oracle's storage documentation:

http://download.oracle.com/docs/cd/E11882_01/server.112/e17118/sql_elements001.htm#SQLRF30020
http://download.oracle.com/docs/cd/E11882_01/server.112/e17120/schema007.htm#ADMIN11622

I don't believe for a second that the byte count is double in pg, but
that's just a religious expression; I've never counted.


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 19:26:41
Message-ID: AANLkTinyQME8DhkD0hGPS6-09Qz0C99rVr_AzDLr_oqD@mail.gmail.com

On Mon, Oct 25, 2010 at 12:36 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
>
> Storage test was simple, but the data (seconds taken) for INSERT test for PG vs Oracle for 1, 2, 3,4 and 5 indexes was:
> PG:
> 25
> 30
> 37
> 42
> 45
>
> Oracle:
>
> 33
> 43
> 50
> 65
> 68
> Rows inserted: 100,000
> Above results show good INSERT performance of PG when using SQL procedures. But performance when I use C++ lib is very bad. I did that test some time back so I do not have data for that right now.

So, assuming I wanted to reproduce your results, can you provide a
self-contained test case that shows these differences? I have always
gotten really good performance using libpq myself, so I'm looking for
what it is you might be doing differently from me that would make it
so slow.


From: Alan Hodgson <ahodgson(at)simkin(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 19:51:41
Message-ID: 201010251251.41812.ahodgson@simkin.ca

On October 25, 2010 11:36:24 am Divakar Singh wrote:
> Above results show good INSERT performance of PG when using SQL procedures.
> But performance when I use C++ lib is very bad. I did that test some time
> back so I do not have data for that right now.

Wrap it in a transaction.
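
A minimal sketch of this advice, assuming a hypothetical table ins_test(a int):
the point is a single BEGIN/COMMIT pair (and so a single commit fsync) around
the whole batch, instead of an implicit transaction per INSERT:

#include <stdio.h>
#include "libpq-fe.h"

int main(void)
{
    int i;
    PGconn *conn = PQconnectdb("dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    PQclear(PQexec(conn, "BEGIN"));   /* one transaction for the whole batch */
    for (i = 0; i < 100000; i++)
    {
        char val[32];
        snprintf(val, sizeof val, "%d", i);
        const char *params[1] = { val };
        PGresult *res = PQexecParams(conn,
            "INSERT INTO ins_test(a) VALUES ($1)",
            1, NULL, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        PQclear(res);
    }
    PQclear(PQexec(conn, "COMMIT"));  /* one fsync instead of 100,000 */
    PQfinish(conn);
    return 0;
}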


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 20:28:57
Message-ID: AANLkTi=0iZDF7EHebukEJ+jxFN8giCy3iXxYuSe_mLUY@mail.gmail.com

On Mon, Oct 25, 2010 at 2:12 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> Hello Experts,
> My application uses Oracle DB, and makes use of OCI interface.
> I have been able to develop similar interface using postgreSQL library.
> However, I have done some tests but results for PostgreSQL have not been
> encouraging for a few of them.
>
> My questions/scenarios are:
>
> 1. How does PostgreSQL perform when inserting data into an indexed (type:
> btree) table? Is it true that as you add the indexes on a table, the
> performance deteriorates significantly whereas Oracle does not show that
> much performance decrease. I have tried almost all postgreSQL performance
> tips available. I want to have very good "insert" performance (with
> indexes), "select" performance is not that important at this point of time.
>
> 2. What are the average storage requirements of postgres compared to Oracle?
> I inserted upto 1 million records. The storage requirement of postgreSQL is
> almost double than that of Oracle.
>
> Thanks in anticipation.

I ran the following tests w/libpqtypes. While you probably won't end
up using libpqtypes, it's illustrative to mention it because it's
generally the easiest way to get data into postgres and by far the
fastest (excluding 'COPY'). Source code follows after the sig (I
banged it out quite quickly, it's messy!) :-). I am not seeing your
results.

via libpqtypes: Inserting, begin..insert..(repeat 1000000x) commit;
local workstation: 2m24s
remote server: 8m8s

via libpqtypes, but stacking array and unstacking on server (this
could be optimized further by using local prepare):
local workstation: 43s (io bound)
remote server: 29s (first million)
remote server: 29s (second million)
create index (1.8s) remote
remote server: 33s (third million, w/index)

Obviously, insert-at-a-time tests are network bound. Throw a couple of
indexes in there and you should see some degradation, but nothing too
terrible.

merlin
libpqtypes.esilo.com

ins1.c (insert at a time)

#include <stdio.h>
#include "libpq-fe.h"
#include "libpqtypes.h"

#define INS_COUNT 1000000

int main()
{
    int i;

    PGconn *conn = PQconnectdb("host=devdb dbname=postgres port=8071");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        printf("bad connection");
        return -1;
    }

    PQtypesRegister(conn);

    PQexec(conn, "begin");

    for (i = 0; i < INS_COUNT; i++)
    {
        PGint4 a = i;
        PGtext b = "some_text";
        PGtimestamp c;
        PGbytea d;

        d.len = 8;
        d.data = b;

        c.date.isbc = 0;
        c.date.year = 2000;
        c.date.mon = 0;
        c.date.mday = 19;
        c.time.hour = 10;
        c.time.min = 41;
        c.time.sec = 6;
        c.time.usec = 0;
        c.time.gmtoff = -18000;

        /* libpqtypes formats and sends the parameters in binary */
        PGresult *res = PQexecf(conn,
            "insert into ins_test(a,b,c,d) values(%int4, %text, %timestamptz, %bytea)",
            a, b, &c, &d);

        if (!res)
        {
            printf("got %s\n", PQgeterror());
            return -1;
        }
        PQclear(res);
    }

    PQexec(conn, "commit");

    PQfinish(conn);
    return 0;
}

ins2.c (array stack/unstack)

#include <stdio.h>
#include "libpq-fe.h"
#include "libpqtypes.h"

#define INS_COUNT 1000000

int main()
{
    int i;

    PGconn *conn = PQconnectdb("host=devdb dbname=postgres port=8071");
    PGresult *res;
    if (PQstatus(conn) != CONNECTION_OK)
    {
        printf("bad connection");
        return -1;
    }

    PQtypesRegister(conn);

    /* register the table's row type so rows can be stacked into an array */
    PGregisterType type = {"ins_test", NULL, NULL};
    PQregisterComposites(conn, &type, 1);

    PGparam *p = PQparamCreate(conn);
    PGarray arr;
    arr.param = PQparamCreate(conn);
    arr.ndims = 0;

    for (i = 0; i < INS_COUNT; i++)
    {
        PGint4 a = i;
        PGtext b = "some_text";
        PGtimestamp c;
        PGbytea d;
        /* note: a fresh param is created each iteration and never freed --
           the leak fixed in the follow-up message below */
        PGparam *row = PQparamCreate(conn);

        d.len = 8;
        d.data = b;

        c.date.isbc = 0;
        c.date.year = 2000;
        c.date.mon = 0;
        c.date.mday = 19;
        c.time.hour = 10;
        c.time.min = 41;
        c.time.sec = 6;
        c.time.usec = 0;
        c.time.gmtoff = -18000;

        PQputf(row, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
        PQputf(arr.param, "%ins_test", row);
    }

    if (!PQputf(p, "%ins_test[]", &arr))
    {
        printf("putf failed: %s\n", PQgeterror());
        return -1;
    }
    /* one round trip: the server unnests the array into the table */
    res = PQparamExec(conn, p, "insert into ins_test select (unnest($1)).*", 1);

    if (!res)
    {
        printf("got %s\n", PQgeterror());
        return -1;
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-25 20:51:02
Message-ID: AANLkTi=cy67frLbx+YH_uUk5vqPsxt8u-Q_cpU74pYMq@mail.gmail.com

On Mon, Oct 25, 2010 at 4:28 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> I ran the following tests w/libpqtypes. While you probably won't end
> up using libpqtypes, it's illustrative to mention it because it's
> generally the easiest way to get data into postgres and by far the
> fastest (excluding 'COPY'). Source code follows after the sig (I
> banged it out quite quickly, it's messy!) :-). I am not seeing your
> results.

I had a really horrible bug in there -- leaking a param inside the
array push loop. Cleaning it up dropped another 5 seconds or so from
the 4th million inserted to the remote server! Using local prepare
(PQspecPrepare), probably another second or two could be shaved off.

PGparam *t = PQparamCreate(conn);

for (i = 0; i < INS_COUNT; i++)
{
    PGint4 a = i;
    PGtext b = "some_text";
    PGtimestamp c;
    PGbytea d;

    d.len = 8;
    d.data = b;

    c.date.isbc = 0;
    c.date.year = 2000;
    c.date.mon = 0;
    c.date.mday = 19;
    c.time.hour = 10;
    c.time.min = 41;
    c.time.sec = 6;
    c.time.usec = 0;
    c.time.gmtoff = -18000;

    PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
    PQputf(arr.param, "%ins_test", t);
    PQparamReset(t);   /* reuse one param instead of leaking a new one per row */
}

merlin


From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 11:44:28
Message-ID: 208728.42197.qm@web65409.mail.ac4.yahoo.com

Hi Merlin,
Thanks for your quick input.
Well, one difference is worth mentioning:
I am inserting each row in a separate transaction, due to the design of my program.

-Divakar

________________________________
From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Sent: Tue, October 26, 2010 2:21:02 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement
compared to Oracle

On Mon, Oct 25, 2010 at 4:28 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> I ran the following tests w/libpqtypes. While you probably won't end
> up using libpqtypes, it's illustrative to mention it because it's
> generally the easiest way to get data into postgres and by far the
> fastest (excluding 'COPY'). Source code follows after the sig (I
> banged it out quite quickly, it's messy!) :-). I am not seeing your
> results.

I had a really horrible bug in there -- leaking a param inside the
array push loop. Cleaning it up dropped another 5 seconds or so from
the 4th million inserted to the remote server! Using local prepare
(PQspecPrepare), probably another second or two could be shaved off.

PGparam *t = PQparamCreate(conn);

for (i = 0; i < INS_COUNT; i++)
{
    PGint4 a = i;
    PGtext b = "some_text";
    PGtimestamp c;
    PGbytea d;

    d.len = 8;
    d.data = b;

    c.date.isbc = 0;
    c.date.year = 2000;
    c.date.mon = 0;
    c.date.mday = 19;
    c.time.hour = 10;
    c.time.min = 41;
    c.time.sec = 6;
    c.time.usec = 0;
    c.time.gmtoff = -18000;

    PQputf(t, "%int4 %text %timestamptz %bytea", a, b, &c, &d);
    PQputf(arr.param, "%ins_test", t);
    PQparamReset(t);   /* reuse one param instead of leaking a new one per row */
}

merlin


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 13:20:38
Message-ID: AANLkTikeYj52q1v1RcwCYR=9Kv5mxJmh+Q+KvXcmibYK@mail.gmail.com

On Tue, Oct 26, 2010 at 7:44 AM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> Hi Merlin,
> Thanks for your quick input.
> Well 1 difference worth mentioning:
> I am inserting each row in a separate transaction, due to design of my
> program.

Well, that right there is going to define your application
performance. You have basically three major issues -- postgresql
executes each query synchronously through the protocol, transaction
overhead, and i/o issues coming from per-transaction sync. libpq
supports asynchronous queries, but only from the client's point of view
-- so this only helps if you have non-trivial work to do setting
up each query. The database is inherently capable of doing what you
want it to do...you may just have to rethink certain things if you
want to unlock the true power of postgres...

You have several broad areas of attack:
*) Client side: use prepared queries (PQexecPrepared), possibly
asynchronously (PQsendQueryPrepared). Reasonably you can expect a 5-50%
speedup if not i/o bound (see the sketch after this message).
*) Stage data to a temp table: temp tables are not WAL-logged or
synced. Periodically they can be flushed to a permanent table.
Possible data loss.
*) Relax the sync policy (synchronous_commit/fsync) -- be advised these
settings are dangerous.
*) Multiple client writers -- as long as you are not i/o bound, you
will see big improvements in tps from multiple clients.
*) Stage/queue application data before inserting it -- requires
retooling the application, but you can see an orders-of-magnitude jump
in insert performance.

merlin
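
A hedged sketch of the first item in the list above, reusing the hypothetical
ins_test(a int) table: PQprepare parses and plans the INSERT once, and
PQexecPrepared then ships only the parameter values for each row:

#include <stdio.h>
#include "libpq-fe.h"

int main(void)
{
    int i;
    PGconn *conn = PQconnectdb("dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* parse and plan the statement once */
    PGresult *res = PQprepare(conn, "ins",
        "INSERT INTO ins_test(a) VALUES ($1)", 1, NULL);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }
    PQclear(res);

    PQclear(PQexec(conn, "BEGIN"));
    for (i = 0; i < 100000; i++)
    {
        char val[32];
        snprintf(val, sizeof val, "%d", i);
        const char *params[1] = { val };
        /* execute the named statement; only the parameter crosses the wire */
        res = PQexecPrepared(conn, "ins", 1, params, NULL, NULL, 0);
        PQclear(res);
    }
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}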


From: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 15:08:20
Message-ID: 770715.39613.qm@web29003.mail.ird.yahoo.com

> temp tables are not wal logged or
> synced. Periodically they can be flushed to a permanent table.

What do you mean by "periodically they can be flushed to
a permanent table"? Just doing

insert into tabb select * from temptable

or using a proper, per-temporary-table command?


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
Cc: Divakar Singh <dpsmails(at)yahoo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 15:41:59
Message-ID: AANLkTinaW3ZtGJbzgAARG0WfAOJxSPAbdVAZa6X_2dh+@mail.gmail.com

On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>> temp  tables are not wal logged or
>> synced.  Periodically they can be flushed  to a permanent table.
>
>
> What do you mean with "Periodically they can be flushed  to
> a permanent table"? Just doing
>
> insert into tabb select * from temptable
>

yup, that's exactly what I mean -- this will give you more uniform
insert performance (your temp table doesn't even need indexes). Every
N records (say 10000) you send them to the permanent table and truncate
the temp table. Obviously, this is a more fragile approach, so weigh the
pros/cons carefully.

merlin
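
A rough sketch of this flush pattern, assuming a permanent table tabb(c1 int)
as in Leonardo's example. Per-row inserts go to a temp table (no WAL, no
fsync), and every N rows one bulk statement moves them to the permanent table:

#include <stdio.h>
#include "libpq-fe.h"

int main(void)
{
    int i;
    PGconn *conn = PQconnectdb("dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* temp table: inserts are not WAL-logged or synced */
    PQclear(PQexec(conn, "CREATE TEMP TABLE temptable (c1 int)"));

    for (i = 1; i <= 30000; i++)
    {
        char sql[64];
        snprintf(sql, sizeof sql, "INSERT INTO temptable VALUES (%d)", i);
        PQclear(PQexec(conn, sql));

        if (i % 10000 == 0)
        {
            /* every N rows: one WAL-logged bulk insert, then empty the stage */
            PQclear(PQexec(conn, "INSERT INTO tabb SELECT * FROM temptable"));
            PQclear(PQexec(conn, "TRUNCATE temptable"));
        }
    }
    PQfinish(conn);
    return 0;
}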


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 21:02:55
Message-ID: 4CC741FF.8050709@vmsinfo.com

On 10/26/2010 11:41 AM, Merlin Moncure wrote:
> yup, that's exactly what I mean -- this will give you more uniform
> insert performance (your temp table doesn't even need indexes). Every
> N records (say 10000) you send to permanent and truncate the temp
> table. Obviously, this is more fragile approach so weigh the
> pros/cons carefully.
>
> merlin

Truncate a temporary table? What horrible advice! All you need is for
the temporary table to delete rows on commit.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 21:27:23
Message-ID: AANLkTinntqfT1b9xaRfGGqVwwsGhyA7=JhLsuQ10Xzyy@mail.gmail.com

On Tue, Oct 26, 2010 at 4:02 PM, Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> On 10/26/2010 11:41 AM, Merlin Moncure wrote:
>>
>> yup, that's exactly what I mean -- this will give you more uniform
>> insert performance (your temp table doesn't even need indexes).  Every
>> N records (say 10000) you send to permanent and truncate the temp
>> table.  Obviously, this is more fragile approach so weigh the
>> pros/cons carefully.
>>
>> merlin
>
> Truncate temporary table? What a horrible advice! All that you need is the
> temporary table to delete rows on commit.

I believe Merlin was suggesting that, after doing 10000 inserts into
the temporary table, something like this might work better:

start loop:
  populate rows in temporary table
  insert from temporary table into permanent table
  truncate temporary table
loop

I do something similar, where I COPY data to a temporary table, do
lots of manipulations, and then perform a series of INSERTS from the
temporary table into a permanent table.

--
Jon


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 21:54:57
Message-ID: 4CC74E31.6060307@vmsinfo.com

On 10/26/2010 5:27 PM, Jon Nelson wrote:
> start loop:
> populate rows in temporary table
> insert from temporary table into permanent table
> truncate temporary table
> loop
>
> I do something similar, where I COPY data to a temporary table, do
> lots of manipulations, and then perform a series of INSERTS from the
> temporary table into a permanent table.
>

1) It's definitely not faster because you have to insert into the
temporary table, in addition to inserting into the permanent table.
2) This is what I had in mind:

mgogala=# create table a(c1 int);
CREATE TABLE
mgogala=# create temporary table t1(c1 int) on commit delete rows;
CREATE TABLE
mgogala=# begin;
BEGIN
mgogala=# insert into t1 select generate_series(1,1000);
INSERT 0 1000
mgogala=# insert into a select * from t1;
INSERT 0 1000
mgogala=# commit;
COMMIT
mgogala=# select count(*) from a;
count
-------
1000
(1 row)

mgogala=# select count(*) from t1;
count
-------
0
(1 row)

The table is created with the "on commit obliterate rows" option, which
means that there is no need to do "truncate". The "truncate" command is
heavy artillery. Truncating a temporary table is like shooting ducks in
a duck pond with a howitzer.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 22:14:15
Message-ID: AANLkTikOew+aGf80kZX9cYYFKBQ_AuoUXEvRVc4xpyJx@mail.gmail.com

On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> On 10/26/2010 5:27 PM, Jon Nelson wrote:
>>
>> start loop:
>>   populate rows in temporary table
>>   insert from temporary table into permanent table
>>   truncate temporary table
>>   loop
>>
>> I do something similar, where I COPY data to a temporary table, do
>> lots of manipulations, and then perform a series of INSERTS from the
>> temporary table into a permanent table.
>>
>
> 1) It's definitely not faster because you have to insert into the temporary
> table, in addition to inserting into the permanent table.
> 2) This is what I had in mind:
>
> mgogala=# create table a(c1 int);
> CREATE TABLE
> mgogala=# create temporary table t1(c1 int) on commit delete rows;
> CREATE TABLE
> mgogala=# begin;
> BEGIN
> mgogala=# insert into t1 select generate_series(1,1000);
> INSERT 0 1000
> mgogala=# insert into a select * from t1;
> INSERT 0 1000
> mgogala=# commit;
> COMMIT
> mgogala=# select count(*) from a;
>  count
> -------
>  1000
> (1 row)
>
> mgogala=# select count(*) from t1;
>  count
> -------
>     0
> (1 row)
>
> The table is created with "on commit obliterate rows" option which means
> that there is no need to do "truncate". The "truncate" command is a heavy
> artillery. Truncating a temporary table is like shooting ducks in a duck
> pond, with a howitzer.

You are not paying attention ;-). Look upthread: "I am inserting each
row in a separate transaction, due to design of my program." (Also, on
commit/drop is no picnic either, but I digress...)

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 22:50:37
Message-ID: AANLkTimf9Nj8A+Spiu9q3suvZ+h71CHXRFTXFo_5aBYj@mail.gmail.com

On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> The table is created with "on commit obliterate rows" option which means
> that there is no need to do "truncate". The "truncate" command is a heavy
> artillery. Truncating a temporary table is like shooting ducks in a duck
> pond, with a howitzer.

This is just not true. ON COMMIT DELETE ROWS simply arranges for a
TRUNCATE to happen immediately before each commit. See
PreCommit_on_commit_actions() in tablecmds.c.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-26 23:16:53
Message-ID: AANLkTi=-fK6ODDNJPntq3OeUJQzKtHbsmkdDM851waOG@mail.gmail.com

On Tue, Oct 26, 2010 at 6:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
> <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>> The table is created with "on commit obliterate rows" option which means
>> that there is no need to do "truncate". The "truncate" command is a heavy
>> artillery. Truncating a temporary table is like shooting ducks in a duck
>> pond, with a howitzer.
>
> This is just not true.  ON COMMIT DELETE ROWS simply arranges for a
> TRUNCATE to happen immediately before each commit.  See
> PreCommit_on_commit_actions() in tablecmds.c.

Quite so. If you are doing anything performance-sensitive with 'on
commit drop', you are better off organizing a cache around
txid_current() (now(), pid for older pg versions). That skips the
writes to the system catalogs and the truncate.

merlin


From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 03:10:56
Message-ID: 252538.42200.qm@web65405.mail.ac4.yahoo.com

Dear All,
Thanks for your inputs on the insert performance part.
Any suggestion on the storage requirement?
VACUUM is certainly not an option, because that is something related to
maintenance AFTER insertion.

I am talking about the plain storage requirement w.r.t. Oracle, which I
observed is twice that of Oracle when millions of rows are inserted.
Has anybody tried to analyze the average storage requirement of PG
w.r.t. Oracle?

Best Regards,
Divakar

________________________________
From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>; pgsql-performance(at)postgresql(dot)org
Sent: Wed, October 27, 2010 4:46:53 AM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement
compared to Oracle

On Tue, Oct 26, 2010 at 6:50 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Oct 26, 2010 at 5:54 PM, Mladen Gogala
> <mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
>> The table is created with "on commit obliterate rows" option which means
>> that there is no need to do "truncate". The "truncate" command is a heavy
>> artillery. Truncating a temporary table is like shooting ducks in a duck
>> pond, with a howitzer.
>
> This is just not true. ON COMMIT DELETE ROWS simply arranges for a
> TRUNCATE to happen immediately before each commit. See
> PreCommit_on_commit_actions() in tablecmds.c.

Quite so. If you are doing anything performance-sensitive with 'on
commit drop', you are better off organizing a cache around
txid_current() (now(), pid for older pg versions). That skips the
writes to the system catalogs and the truncate.

merlin


From: Ivan Voras <ivoras(at)freebsd(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 10:13:10
Message-ID: ia8tvj$15r$1@dough.gmane.org

On 10/26/10 17:41, Merlin Moncure wrote:
> On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>>> temp tables are not wal logged or
>>> synced. Periodically they can be flushed to a permanent table.
>>
>>
>> What do you mean with "Periodically they can be flushed to
>> a permanent table"? Just doing
>>
>> insert into tabb select * from temptable
>>
>
> yup, that's exactly what I mean -- this will give you more uniform

In effect, when so much data is in temporary storage, a better option
would be to simply configure "synchronous_commit = off" (better in the
sense that the application would not need to be changed). The effects
are almost the same - in both cases transactions might be lost but the
database will survive.
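
A small sketch of this option. synchronous_commit can also be turned off for
just one session from libpq (the ins_test table here is an assumption), so
only the bulk-loading connection relaxes durability:

#include "libpq-fe.h"

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* commits return before the WAL flush; a crash may lose the last few
       transactions, but the database itself stays consistent */
    PQclear(PQexec(conn, "SET synchronous_commit = off"));

    PQclear(PQexec(conn, "INSERT INTO ins_test(a) VALUES (1)"));
    PQfinish(conn);
    return 0;
}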


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Ivan Voras <ivoras(at)freebsd(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 11:05:44
Message-ID: AANLkTimdzm-U_10twZ6dVNV9abC-zuZGTs-Nui30rfvk@mail.gmail.com

On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras <ivoras(at)freebsd(dot)org> wrote:
> On 10/26/10 17:41, Merlin Moncure wrote:
>> On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>>>> temp  tables are not wal logged or
>>>> synced.  Periodically they can be flushed  to a permanent table.
>>>
>>>
>>> What do you mean with "Periodically they can be flushed  to
>>> a permanent table"? Just doing
>>>
>>> insert into tabb select * from temptable
>>>
>>
>> yup, that's exactly what I mean -- this will give you more uniform
>
> In effect, when so much data is in temporary storage, a better option
> would be to simply configure "synchronous_commit = off" (better in the
> sense that the application would not need to be changed). The effects
> are almost the same - in both cases transactions might be lost but the
> database will survive.

right -- although that's a system-wide setting and perhaps other
tables still require a full synchronous fsync. Still -- fair point
(although I bet you are still going to get better performance going
the temp route, if only by a hair).

merlin


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 17:48:35
Message-ID: E2FA14BE-A47B-4673-9DE1-851EB74009F2@richrelevance.com


On Oct 26, 2010, at 2:54 PM, Mladen Gogala wrote:

> On 10/26/2010 5:27 PM, Jon Nelson wrote:
>> start loop:
>> populate rows in temporary table
>> insert from temporary table into permanent table
>> truncate temporary table
>> loop
>>
>> I do something similar, where I COPY data to a temporary table, do
>> lots of manipulations, and then perform a series of INSERTS from the
>> temporary table into a permanent table.
>>
>
> 1) It's definitely not faster because you have to insert into the
> temporary table, in addition to inserting into the permanent table.

It is almost always significantly faster than a direct bulk load into a table.
* The temp table has no indexes; the final table usually does. Bulk operations on indexes are faster than per-row operations.
* The final table might require both updates and inserts; doing these in bulk from a temp stage table is far faster than per row (a sketch follows this list).
* You don't even have to commit after the merge from the temp table, and can loop until it's all done, then commit -- though this can have table/index bloat implications if doing updates.
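
A hedged sketch of the update-plus-insert merge from the second bullet, with
hypothetical tables final(id int primary key, val text) and stage(id int,
val text); both statements run in the caller's transaction:

#include "libpq-fe.h"

void merge_from_stage(PGconn *conn)
{
    /* bulk UPDATE of the rows already present in the final table ... */
    PQclear(PQexec(conn,
        "UPDATE final f SET val = s.val FROM stage s WHERE f.id = s.id"));

    /* ... then bulk INSERT of the rows that were not matched */
    PQclear(PQexec(conn,
        "INSERT INTO final SELECT s.* FROM stage s "
        "WHERE NOT EXISTS (SELECT 1 FROM final f WHERE f.id = s.id)"));
}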

> 2) This is what I had in mind:
>
> mgogala=# create table a(c1 int);
> CREATE TABLE
> mgogala=# create temporary table t1(c1 int) on commit delete rows;
> CREATE TABLE
> mgogala=# begin;
> BEGIN
> mgogala=# insert into t1 select generate_series(1,1000);
> INSERT 0 1000
> mgogala=# insert into a select * from t1;
> INSERT 0 1000
> mgogala=# commit;
> COMMIT
> mgogala=# select count(*) from a;
> count
> -------
> 1000
> (1 row)
>
> mgogala=# select count(*) from t1;
> count
> -------
> 0
> (1 row)
>
> The table is created with "on commit obliterate rows" option which means
> that there is no need to do "truncate". The "truncate" command is a
> heavy artillery. Truncating a temporary table is like shooting ducks in
> a duck pond, with a howitzer.

??? Test it. DELETE is slow; truncate is nearly instantaneous for normal tables, and for temp tables it's the same thing. Maybe in Oracle TRUNCATE is a howitzer; in Postgres it's lightweight. Your loop above requires a commit after every 1000 rows. What if you require that all rows are seen at once or not at all? What if you fail part way through? One big transaction is often a better idea and/or required. Especially in postgres, with no undo log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.
>
> --
>
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com
> The Leader in Integrated Media Intelligence Solutions


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:06:00
Message-ID: AANLkTik1Acnzpm62XgHADZtZ0sgkxSo=43jKrX7+N_f1@mail.gmail.com

On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> Dear All,
> Thanks for your inputs on the insert performance part.
> Any suggestion on storage requirement?
> VACUUM is certainly not an option, because this is something related to
> maintenance AFTER insertion.
> I am talking about the plain storage requirement w.r. to Oracle, which I
> observed is twice of Oracle in case millions of rows are inserted.
> Anybody who tried to analyze the average storage requirement of PG w.r. to
> Oracle?

There isn't much you can do about storage use other than avoiding
stupid things (like using char() vs varchar()), smart table layout,
toast compression, etc. Are you sure this is a problem?

merlin


From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:06:53
Message-ID: 4CC86A3D.8000507@vmsinfo.com

On 10/27/2010 1:48 PM, Scott Carey wrote:
>
> It is almost always significantly faster than a direct bulk load into a table.
> * The temp table has no indexes, the final table usually does, bulk operations on indexes are faster than per row operations.
> * The final table might require both updates and inserts, doing these in bulk from a temp stage table is far faster than per row.
> * You don't even have to commit after the merge from the temp table, and can loop until its all done, then commit -- though this can have table/index bloat implications if doing updates.

Scott, I find this very hard to believe. If you are inserting into a
temporary table and then into the target table, you will do 2 inserts
instead of just one. What you are telling me is that it is faster for me
to drive from NYC to Washington DC by driving first to Miami and then
from Miami to DC.

>> 2) This is what I had in mind:
>>
>> mgogala=# create table a(c1 int);
>> CREATE TABLE
>> mgogala=# create temporary table t1(c1 int) on commit delete rows;
>> CREATE TABLE
>> mgogala=# begin;
>> BEGIN
>> mgogala=# insert into t1 select generate_series(1,1000);
>> INSERT 0 1000
>> mgogala=# insert into a select * from t1;
>> INSERT 0 1000
>> mgogala=# commit;
>> COMMIT
>> mgogala=# select count(*) from a;
>> count
>> -------
>> 1000
>> (1 row)
>>
>> mgogala=# select count(*) from t1;
>> count
>> -------
>> 0
>> (1 row)
>>
>> The table is created with "on commit obliterate rows" option which means
>> that there is no need to do "truncate". The "truncate" command is a
>> heavy artillery. Truncating a temporary table is like shooting ducks in
>> a duck pond, with a howitzer.
> ??? Test it. DELETE is slow, truncate is nearly instantaneous for normal tables. For temp tables its the same thing. Maybe in Oracle TRUNCATE is a howitzer, in Postgres its lightweight.

Truncate has a specific list of tasks to do:
1) Lock the table in exclusive mode to prevent concurrent
transactions on the table.
2) Release the file space and update the table headers.
3) Flush any buffers possibly residing in shared memory.
4) Repeat the procedure on the indexes.

Of course, in the case of a normal table, all of these changes are
logged, possibly producing WAL archives. That is still much faster than
delete, which depends on the number of rows that need to be deleted, but
it is not exactly lightweight, either. In Postgres, truncate recognizes
that the table is a temporary table, so it takes a few shortcuts, which
makes the truncate faster.

1) No need to flush buffers.
2) Locking requirements are much less stringent.
3) No WAL archives are produced.

Temporary tables are completely different beasts in Oracle and Postgres.
Yes, you are right, truncate of a temporary table is a big no-no in the
Oracle world, especially in the RAC environment. However, I do find "ON
COMMIT DELETE ROWS" trick to be more elegant than the truncate. Here is
the classic Tom Kyte, on the topic of truncating the temporary tables:
*http://tinyurl.com/29kph3p

"*NO. truncate is DDL. DDL is expensive. Truncation is something that
should be done very infrequently.
Now, I don't mean "turn your truncates into DELETE's" -- that would
be even worse. I mean -- avoid having
to truncate or delete every row in the first place. Use a transaction
based temporary table and upon commit, it'll empty itself."
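
In Oracle terms, the "transaction based temporary table" Kyte refers to
is a global temporary table; a minimal sketch of the equivalent DDL
(table name made up):

create global temporary table t1 (c1 int) on commit delete rows;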

> Your loop above requires a commit after every 1000 rows. What if you require that all rows are seen at once or not at all? What if you fail part way through? One big transaction is often a better idea and/or required. Especially in postgres, with no undo-log, bulk inserts in one large transaction work out very well -- usually better than multiple smaller transactions.

I don't contest that. I also prefer to do things in one big transaction,
if possible.

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:13:21
Message-ID: AANLkTinGzybNd5M0ZTLV6JO4aNxJkbsi0oDtehNR9LUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 27, 2010 at 2:06 PM, Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com> wrote:
> Scott, I find this very hard to believe. If you are inserting into a
> temporary table and then into the target table, you will do 2 inserts
> instead of just one. What you are telling me is that it is faster for me to
> drive from NYC to Washington DC by driving first to Miami and then from
> Miami to DC.

The reason why, in a one-transaction-per-insert environment, staging to
a temp table first wins is very simple: non-temp-table inserts have to
be WAL-logged and fsync'd. When you batch them into the main table, you
get more efficient use of WAL and ONE sync operation. This is
especially advantageous if the inserts are coming fast and furious and
there are other things going on in the database at the time, or there
are multiple inserters.

If you have the luxury of batching data in a transaction, you don't
have to worry about it.
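
A minimal sketch of the staging pattern being described here, assuming
a permanent table named "target" (both table names are hypothetical):

-- temp-table inserts are not WAL-logged; each small transaction
-- writes only its commit record to WAL, not the row data
create table target (id int, payload varchar);
create temporary table stage (like target);
insert into stage values (1, 'abc');   -- transaction 1
insert into stage values (2, 'def');   -- transaction 2, and so on

-- periodic flush: one WAL-logged batch, ONE sync operation at commit
begin;
insert into target select * from stage;
truncate stage;
commit;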

merlin


From: Divakar Singh <dpsmails(at)yahoo(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:14:30
Message-ID: 173298.80592.qm@web65406.mail.ac4.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Yes, this is a very clearly visible problem.
The difference between Oracle and PG increases with more rows:
when Oracle takes 3 GB, PG takes around 6 GB.
I only use varchar.
I will try your tips on "smart table layout, toast compression".
I assume these suggested options do not have any performance penalty?

Best Regards,
Divakar

________________________________
From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>; Mladen Gogala
<mladen(dot)gogala(at)vmsinfo(dot)com>; pgsql-performance(at)postgresql(dot)org
Sent: Wed, October 27, 2010 11:36:00 PM
Subject: Re: [PERFORM] Postgres insert performance and storage requirement
compared to Oracle

On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> Dear All,
> Thanks for your inputs on the insert performance part.
> Any suggestion on storage requirement?
> VACUUM is certainly not an option, because this is something related to
> maintenance AFTER insertion.
> I am talking about the plain storage requirement w.r. to Oracle, which I
> observed is twice of Oracle in case millions of rows are inserted.
> Anybody who tried to analyze the average storage requirement of PG w.r. to
> Oracle?

There isn't much you can do about storage use other than avoid stupid
things (like using char() vs varchar()), smart table layout, toast
compression, etc. Are you sure this is a problem?

merlin


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:28:06
Message-ID: AANLkTin90tsRz4BNQP1gqrg6zCpQp0Yt0bQXKQHC22Ps@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 27, 2010 at 2:14 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> yes this is a very clearly visible problem.
> The difference b/w oracle and PG increases with more rows.
> when oracle takes 3 GB, PG takes around 6 GB.
> I only use varchar.
> I will try to use your tips on "smart table layout, toast compression".
> Assuming these suggested options do not have any performance penalty?

These will only be helpful in particular cases, for example if your
layout is bad :-). TOAST compression is for dealing with large datums
(on by default, IIRC). Also, it's very hard to get an apples-to-apples
comparison via a synthetic insertion benchmark. It's simply not the
whole story.

The deal with postgres is that things are pretty optimized and fairly
unlikely to get a whole lot better than they are today. The table
layout is pretty optimal already: nulls are bitmaps, data lengths use a
fancy bitwise length mechanism, etc. Each record in postgres has a
23-byte header that has to be factored into any storage estimation,
plus the index usage.
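
A quick way to check where the space actually goes (a sketch; "mytable"
is a placeholder name):

select pg_size_pretty(pg_relation_size('mytable'));       -- heap only
select pg_size_pretty(pg_total_relation_size('mytable')); -- heap + indexes + toast
select pg_size_pretty(pg_total_relation_size('mytable')
                      - pg_relation_size('mytable'));     -- index/toast overhead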

Postgres indexes are pretty compact, and oracle (whose internals I am
not familiar with) also has to do MVCC-type management, so I suspect
your measurement is off (aka operator error) or oracle is cheating by
optimizing away storage requirements via some sort of tradeoff.
However, you still fail to explain why storage size is a problem. Are
you planning to port oracle to postgres on a volume that is >50%
full? :-)

merlin


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Divakar Singh <dpsmails(at)yahoo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:42:19
Message-ID: 4CC8728B.9050205@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-27 20:28, Merlin Moncure wrote:
> Postgres indexes are pretty compact, and oracle (internals I am not
> familiar with) also has to do MVCC type management, so I am suspecting
> your measurement is off (aka, operator error) or oracle is cheating
> somehow by optimizing away storage requirements somehow via some sort
> of tradeoff. However you still fail to explain why storage size is a
> problem. Are you planning to port oracle to postgres on a volume that is
> 50% full? :-)
>
Pretty ignorant comment.. sorry ..

But when your database approaches something that does not mainly fit
in memory, space directly translates into speed, and a more compact
table utilizes the OS page cache better. This is true for both index
and table page caching.

And the more compact your table, the later you hit the stage where you
can't fit into memory anymore.

.. but if the above isn't an issue, then your statements are true.

--
Jesper


From: Steve Singer <ssinger(at)ca(dot)afilias(dot)info>
To: Divakar Singh <dpsmails(at)yahoo(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:51:02
Message-ID: 4CC87496.9070703@ca.afilias.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10-10-27 02:14 PM, Divakar Singh wrote:
> yes this is a very clearly visible problem.
> The difference b/w oracle and PG increases with more rows.
> when oracle takes 3 GB, PG takes around 6 GB.
> I only use varchar.
> I will try to use your tips on "smart table layout, toast compression".
> Assuming these suggested options do not have any performance penalty?
> Best Regards,
> Divakar

In between test runs, are you cleaning out the tables with a "DELETE
FROM aaaaa", using the TRUNCATE command, or dropping the table and
recreating it?

If you're just using DELETE, it might be that disk space is still being
used by the old versions of the rows.

Also, is postgresql using more space than oracle for storing the index
data or the main table data? And is any particular index larger on
postgresql compared to Oracle?
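
A sketch of how to get the per-index breakdown on the Postgres side,
using the placeholder table name from above:

select relname, indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) as index_size
from pg_stat_user_indexes
where relname = 'aaaaa';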

>
>
> ------------------------------------------------------------------------
> From: Merlin Moncure <mmoncure(at)gmail(dot)com>
> To: Divakar Singh <dpsmails(at)yahoo(dot)com>
> Cc: Robert Haas <robertmhaas(at)gmail(dot)com>; Mladen Gogala
> <mladen(dot)gogala(at)vmsinfo(dot)com>; pgsql-performance(at)postgresql(dot)org
> Sent: Wed, October 27, 2010 11:36:00 PM
> Subject: Re: [PERFORM] Postgres insert performance and storage
> requirement compared to Oracle
>
> On Tue, Oct 26, 2010 at 11:10 PM, Divakar Singh <dpsmails(at)yahoo(dot)com> wrote:
> > Dear All,
> > Thanks for your inputs on the insert performance part.
> > Any suggestion on storage requirement?
> > VACUUM is certainly not an option, because this is something related to
> > maintenance AFTER insertion.
> > I am talking about the plain storage requirement w.r. to Oracle, which I
> > observed is twice of Oracle in case millions of rows are inserted.
> > Anybody who tried to analyze the average storage requirement of PG
> w.r. to
> > Oracle?
>
> There isn't much you can do about storage use other than avoid stupid
> things (like using char() vs varchar()), smart table layout, toast
> compression, etc. Are you sure this is a problem?
>
> merlin
>


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Divakar Singh <dpsmails(at)yahoo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 18:51:23
Message-ID: AANLkTimMSMcqkAY9jMi8UZKveY3YS-p-V60KckvtDB6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 27, 2010 at 2:42 PM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> On 2010-10-27 20:28, Merlin Moncure wrote:
>>
>> Postgres indexes are pretty compact, and oracle (internals I am not
>> familiar with) also has to do MVCC type management, so I am suspecting
>> your measurement is off (aka, operator error) or oracle is cheating
>> somehow by optimizing away storage requirements somehow via some sort
>> of tradeoff.  However you still fail to explain why storage size is a
>> problem.  Are you planning to port oracle to postgres on a volume that is
>> 50% full? :-)
>>
>
> Pretty ignorant comment.. sorry ..
>
> But when your database approaches something that is not mainly
> fitting in memory, space directly translates into speed and a more
> compact table utillizes the OS-page cache better. This is both
> true for index and table page caching.
>
> And the more compact your table the later you hit the stage where
> you cant fit into memory anymore.
>
> .. but if above isn't issues, then your statements are true.

Yes, I am quite aware of how the o/s page cache works. All else being
equal, a more compact database obviously would be preferred. However,
'all else' is not necessarily equal. I can mount my database on a bzip
volume, that must make it faster, right? Wrong. I understand the
postgres storage architecture pretty well, and with the low hanging
fruit having been grabbed, further layout compression is only going to
come as a result of tradeoffs.

Now, comparing oracle vs postgres: mvcc works differently because
oracle uses rollback logs while postgres maintains extra/old versions
in the heap. This will add up to big storage usage depending on various
things, but it should not be reflected much in an insert-only test.

merlin


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Divakar Singh <dpsmails(at)yahoo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-27 19:47:23
Message-ID: 4CC881CB.8090806@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-27 20:51, Merlin Moncure wrote:

>> Yes, I am quite aware of how the o/s page cache works. All else being
>> equal, a more compact database obviously would be preferred. However
>> 'all else' is not necessarily equal. I can mount my database on bzip
>> volume, that must make it faster, right? wrong. I understand the
>> postgres storage architecture pretty well, and the low hanging fruit
>> having been grabbed further layout compression is only going to come
>> as a result of tradeoffs.
>>
Or configurability.. Not directly related to overall space
consumption, but I have been working on a patch that would make TOAST*
kick in earlier in the process, giving a "slimmer" main table with
visibility information and simple columns, and moving larger columns
more aggressively to TOAST.

The overall disadvantage of TOAST is the need for an extra disk seek if
you actually need the data. If the application rarely needs the large
columns but often does count/filtering on simple values, this will
eventually lead to better utilization of the OS page cache, with a very
small overhead to PG (in terms of code) and 0 overhead in the
applications that benefit.

Keep in mind that as SSD drives get more common, the cost of "the extra
disk seek" drops dramatically, but the drive itself is probably still
100-1000x slower than main memory, so keeping "the right data" in the
OS cache still matters.

If you deal with data where the individual tuple size goes up, TOAST
currently first kicks in at 2KB (compressed size), which leads to a
very sparse main table in terms of visibility information, and counts
and selects on simple values will drag a huge amount of data into the
cache layers that is not needed there.

Another suggestion could be to make the compression of text columns
kick in earlier .. if that's possible. (I don't claim that it's
achievable.)
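
For what can already be tuned by hand today, the per-column storage
strategy is adjustable; an illustrative sketch (table and column names
are made up):

alter table docs alter column body set storage extended; -- compress, then move out of line (the default for varchar)
alter table docs alter column body set storage external; -- move out of line, never compress
alter table docs alter column body set storage main;     -- prefer compression, keep inline if possible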

Unless the tuple header is hugely bloated, I have trouble constructing
a situation in my head where hammering on that one thing can change
anything significantly.

* http://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg159726.html

--
Jesper


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ivan Voras <ivoras(at)freebsd(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-28 02:01:07
Message-ID: AANLkTimB0FG2fPQPLL4SVgLWHTh8pYn3pJODTPDeaEM1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras <ivoras(at)freebsd(dot)org> wrote:
> On 10/26/10 17:41, Merlin Moncure wrote:
>> On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>>>> temp  tables are not wal logged or
>>>> synced.  Periodically they can be flushed  to a permanent table.
>>>
>>>
>>> What do you mean with "Periodically they can be flushed  to
>>> a permanent table"? Just doing
>>>
>>> insert into tabb select * from temptable
>>>
>>
>> yup, that's exactly what I mean -- this will give you more uniform
>
> In effect, when so much data is in temporary storage, a better option
> would be to simply configure "synchronous_commit = off" (better in the
> sense that the application would not need to be changed). The effects
> are almost the same - in both cases transactions might be lost but the
> database will survive.

Gee, I wonder if it would be possible for PG to automatically do an
asynchronous commit of any transaction which touches only temp tables.
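
For reference, the setting Ivan mentions can already be scoped narrowly
so that only the loading sessions give up the flush guarantee; a small
sketch (the table name is hypothetical):

set synchronous_commit to off;        -- whole session
-- or for a single transaction only:
begin;
set local synchronous_commit to off;
insert into target values (1, 'abc');
commit;                               -- returns without waiting for the WAL flush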

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ivan Voras <ivoras(at)freebsd(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-28 03:32:25
Message-ID: 3989.1288236745@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Gee, I wonder if it would be possible for PG to automatically do an
> asynchronous commit of any transaction which touches only temp tables.

Hmm ... do we need a commit at all in such a case? If our XID has only
gone into temp tables, I think we need to write to clog, but we don't
really need a WAL entry, synced or otherwise.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jesper Krogh <jesper(at)krogh(dot)cc>
Cc: Divakar Singh <dpsmails(at)yahoo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-28 13:13:22
Message-ID: AANLkTinBtTQ2mTPvA3n_yF=rLNfeK2RRW7FGOHqfWmv_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, Oct 27, 2010 at 3:47 PM, Jesper Krogh <jesper(at)krogh(dot)cc> wrote:
> On 2010-10-27 20:51, Merlin Moncure wrote:
>
>>> Yes, I am quite aware of how the o/s page cache works.  All else being
>>> equal, a more compact database obviously would be preferred.  However
>>> 'all else' is not necessarily equal.  I can mount my database on bzip
>>> volume, that must make it faster, right?  wrong.  I understand the
>>> postgres storage architecture pretty well, and the low hanging fruit
>>> having been grabbed further layout compression is only going to come
>>> as a result of tradeoffs.
>>>
>
> Or configurability.. Not directly related to overall space consumption
> but I have been working on a patch that would make TOAST* kick in
> earlier in the process, giving a "slimmer" main table with visibility
> information
> and simple columns and moving larger columns more aggressively to TOAST.

Do you have any benchmarks supporting if/when such a change would be beneficial?

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-28 14:23:05
Message-ID: AANLkTikGSRKDun7UXtEfBbkQx6OdqMmmpupka1kqCwe6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

[moving to -hackers, from -performance]

On Wed, Oct 27, 2010 at 11:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Gee, I wonder if it would be possible for PG to automatically do an
>> asynchronous commit of any transaction which touches only temp tables.
>
> Hmm ... do we need a commit at all in such a case?  If our XID has only
> gone into temp tables, I think we need to write to clog, but we don't
> really need a WAL entry, synced or otherwise.

I think we might need a commit record anyway to keep Hot Standby's
KnownAssignedXids tracking from getting confused. It might be
possible to suppress it when wal_level is less than hot_standby, but
I'm not sure it's worth it.

You definitely need to write to CLOG, because otherwise a subsequent
transaction from within the same backend might fail to see those
tuples.

Also, I think that the right test is probably "Have we done anything
that needs to be WAL-logged?". We can get that conveniently by
checking whether XactLastRecEnd.xrecoff is nonzero. One option looks to
be to just change this:

if (XactSyncCommit || forceSyncCommit || nrels > 0)

...to say ((XactSyncCommit && XactLastRecEnd.xrecoff != 0) ||
forceSyncCommit || nrels > 0).

But I'm wondering if we can instead rejigger things so that this test
moves out of the !markXidCommitted branch of the if statement and
drops down below the whole if statement.

    /*
     * If we didn't create XLOG entries, we're done here; otherwise we
     * should flush those entries the same as a commit record.  (An
     * example of a possible record that wouldn't cause an XID to be
     * assigned is a sequence advance record due to nextval() --- we want
     * to flush that to disk before reporting commit.)
     */
    if (XactLastRecEnd.xrecoff == 0)
        goto cleanup;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Divakar Singh <dpsmails(at)yahoo(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Postgres insert performance and storage requirement compared to Oracle
Date: 2010-10-28 15:28:53
Message-ID: 4CC996B5.2050203@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 2010-10-28 15:13, Merlin Moncure wrote:
> On Wed, Oct 27, 2010 at 3:47 PM, Jesper Krogh<jesper(at)krogh(dot)cc> wrote:
>
>> On 2010-10-27 20:51, Merlin Moncure wrote:
>>
>>
>>>> Yes, I am quite aware of how the o/s page cache works. All else being
>>>> equal, a more compact database obviously would be preferred. However
>>>> 'all else' is not necessarily equal. I can mount my database on bzip
>>>> volume, that must make it faster, right? wrong. I understand the
>>>> postgres storage architecture pretty well, and the low hanging fruit
>>>> having been grabbed further layout compression is only going to come
>>>> as a result of tradeoffs.
>>>>
>>>>
>> Or configurability.. Not directly related to overall space consumption
>> but I have been working on a patch that would make TOAST* kick in
>> earlier in the process, giving a "slimmer" main table with visibility
>> information
>> and simple columns and moving larger columns more aggressively to TOAST.
>>
> Do you have any benchmarks supporting if/when such a change would be beneficial?
>
>
On IO-bound queries, it pretty much translates to the ratio between
the toast-table size and the main-table size.

Trying to aggressively speed up "select count(*) from table" gives this:
http://www.mail-archive.com/pgsql-hackers(at)postgresql(dot)org/msg146153.html
With a shutdown of PG and dropped OS caches in between, the "default"
select count(*) on 50K tuples gives 4.613ms (2 tuples per page) vs.
318ms (8 tuples per page).

PG's default is in between...

--
Jesper