Improving speed of copy

From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: Pgsql-hackers(at)postgresql(dot)org
Subject: Improving speed of copy
Date: 2002-09-20 15:52:08
Message-ID: 3D8B9180.32413.19C49E9E@localhost
Lists: pgsql-hackers

Hi all,

While testing for large databases, I am trying to load 12.5M rows of data from
a text file, and it takes a lot longer than mysql, even with copy.

Mysql takes 221 sec. vs. 1121 sec. for postgres. For postgresql, that is around
11.5K rows per second. Each tuple has 23 fields with a fixed length of around
100 bytes.

I wrote a program which does inserts in batches, but none of them reaches the
performance of copy. I tried 1K/5K/10K/100K rows in a transaction, but it
cannot cross 2.5K rows/sec.
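
For the record, the two load paths being compared look roughly like this (the
table and column names here are made up, just to show the shape of the test):

-- batched inserts: one transaction per batch of N rows
BEGIN;
INSERT INTO big_table (f1, f2, f3) VALUES ('a', 'b', 'c');
-- ... repeated for the rest of the batch ...
COMMIT;

-- bulk load: a single COPY from the text file
COPY big_table FROM '/tmp/data.txt';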

The machine is an 800MHz P-III with 512MB RAM and an IDE disk. The postmaster
is started with 30K buffers, i.e. around 235MB of buffers. Kernel caching
parameters are at their defaults.

Besides, there is the issue of space. Mysql takes 1.4GB of space for 1.2GB of
text data and postgresql takes 3.2GB. Even with the 40 bytes per row overhead
mentioned in the FAQ, that should come to around 1.7GB, accounting for a 40%
increase in size. Vacuum was run on the database.

Any further help? Especially if batch inserts could be sped up, that would be
great..

Bye
Shridhar

--
Alone, adj.: In bad company. -- Ambrose Bierce, "The Devil's Dictionary"


From: "Jonah H(dot) Harris" <jharris(at)nightstarcorporation(dot)com>
To: <shridhar_daithankar(at)persistent(dot)co(dot)in>, <Pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving speed of copy
Date: 2002-09-20 16:14:30
Message-ID: 000c01c260c0$cdaf99e0$b77b2344@gemini
Lists: pgsql-hackers

Are you using copy within a transaction?

I don't know how to explain the size difference tho. I have never seen an
overhead difference that large. What type of MySQL tables were you using
and what version?

Have you tried this with Oracle or similar commercial database?


From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: Pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-20 16:18:30
Message-ID: 3D8B97AE.23803.19DCC348@localhost
Lists: pgsql-hackers

On 20 Sep 2002 at 21:22, Shridhar Daithankar wrote:

> Mysql takes 221 sec. vs. 1121 sec. for postgres. For postgresql, that is around
> 11.5K rows per second. Each tuple has 23 fields with a fixed length of around
> 100 bytes.
>
> I wrote a program which does inserts in batches, but none of them reaches the
> performance of copy. I tried 1K/5K/10K/100K rows in a transaction, but it
> cannot cross 2.5K rows/sec.

1121 sec. was the time with the postgres default of 64 buffers. With 30K buffers
it has degraded to 1393 sec.

One more issue is the time taken for composite index creation. It's 4341 sec. as
opposed to 436 sec. for mysql. These are three non-unique character fields where
the combination itself is not unique either. Would an R-tree index be a better
choice?
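
For reference, the index being built is a plain composite one, roughly like
this (the real table and column names differ):

CREATE INDEX big_table_abc_idx ON big_table (field_a, field_b, field_c);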

In a select test where approx. 15 rows were returned by a query on an indexed
field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad, but other
issues eclipse the result..

TIA once again..

Bye
Shridhar

--
revolutionary, adj.: Repackaged.


From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-20 16:22:34
Message-ID: 3D8B98A2.22352.19E07DA1@localhost
Lists: pgsql-hackers

On 20 Sep 2002 at 10:14, Jonah H. Harris wrote:

> Are you using copy within a transaction?

No. Will that help? I can try. With the utility I wrote, I could insert, say,
10K records in a transaction. Copy seems to be doing it all in one transaction;
the new rows don't show up in select count(*) from another psql session till
the copy finishes..

> I don't know how to explain the size difference tho. I have never seen an
> overhead difference that large. What type of MySQL tables were you using
> and what version?

Dunno.. Not my machine.. I am just trying to tune postgres on a friend's
machine.. not even postgres/root there.. So I can not answer these questions
quickly, but will get back on them..

> Have you tried this with Oracle or similar commercial database?

No. This requirement is specifically for an open source database..

Maybe in another test on a 4-way/4GB RAM machine, I might see a different
result. Mysql was creating an index on a 10GB table for the last 25 hours and
last I knew it wasn't finished.. Must be something with the parameters..

Will keep you guys posted..

Bye
Shridhar

--
brain, n: The apparatus with which we think that we think. -- Ambrose Bierce,
"The Devil's Dictionary"


From: "Jonah H(dot) Harris" <jharris(at)nightstarcorporation(dot)com>
To: "'Jonah H(dot) Harris'" <jharris(at)nightstarcorporation(dot)com>, <shridhar_daithankar(at)persistent(dot)co(dot)in>, <Pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving speed of copy
Date: 2002-09-20 16:26:46
Message-ID: 000e01c260c2$83e1f220$b77b2344@gemini
Lists: pgsql-hackers

Also, did you disable fsync?


From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-20 16:30:43
Message-ID: 3D8B9A8B.14691.19E7F04E@localhost
Lists: pgsql-hackers

On 20 Sep 2002 at 10:26, Jonah H. Harris wrote:

> Also, did you disable fsync?

Aarrrgggh.. If that turns out to be the culprit, I will kill him.. ;-)

Problem is I can't see postgresql.conf nor access his command history, and he
has left for the day..

I will add that to the checklist, but this is postgresql 7.1.3 on RHL7.2.. IIRC
it should have WAL, in which case -F should not matter much..
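
If it does turn out to be on, the knob is a one-liner in postgresql.conf,
roughly (unsafe against crashes, but fine for a bulk-load test):

fsync = false        # skip the fsync at commit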

On second thought, would it be worth trying 7.2.2, compiled? Will there be any
performance difference? I can see on another machine that Mandrake 8.2 has come
with 7.2-12..

I think this may be the factor as well..

Bye
Shridhar

--
A hypothetical paradox: What would happen in a battle between an Enterprise
security team, who always get killed soon after appearing, and a squad of
Imperial Stormtroopers, who can't hit the broad side of a planet? -- Tom
Galloway


From: Mike Benoit <mikeb(at)netnation(dot)com>
To: shridhar_daithankar(at)persistent(dot)co(dot)in
Cc: Pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-20 17:27:13
Message-ID: 1032542833.7433.11.camel@mikeb.staff.netnation.com
Lists: pgsql-hackers

On Fri, 2002-09-20 at 08:52, Shridhar Daithankar wrote:

> Besides there is issue of space. Mysql takes 1.4GB space for 1.2GB text data
> and postgresql takes 3.2GB of space. Even with 40 bytes per row overhead
> mentioned in FAQ, that should come to around 1.7GB, counting for 40% increase
> in size. Vacuum was run on database.
>

How did you calculate the size of the database? If you used "du", make sure you
do it in the data/base directory so as not to include the WAL files.
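
Something like this, per database (the directory name under data/base is the
database OID, which you can look up in pg_database; the path assumes a default
$PGDATA layout):

-- in psql: find the OID of the database
SELECT oid, datname FROM pg_database;

# then, in the shell
du -sh $PGDATA/base/<oid>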

--
Best Regards,

Mike Benoit
NetNation Communication Inc.
Systems Engineer
Tel: 604-684-6892 or 888-983-6600
---------------------------------------

Disclaimer: Opinions expressed here are my own and not
necessarily those of my employer


From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>
Cc: Pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-20 17:41:24
Message-ID: Pine.LNX.4.21.0209201838590.599-100000@ponder.fairway2k.co.uk
Lists: pgsql-hackers

On Fri, 20 Sep 2002, Shridhar Daithankar wrote:

> In a select test where approx. 15 rows were returned by a query on an indexed
> field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad, but other
> issues eclipse the result..

I don't know about anyone else but I find this aspect strange. That's 1 second
(approx.) per row retrieved. That is pretty dire for an index scan. The
data/index must be very non unique.

--
Nigel J. Andrews


From: Joe Conway <mail(at)joeconway(dot)com>
To: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
Cc: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>, Pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-20 18:28:15
Message-ID: 3D8B68BF.1030700@joeconway.com
Lists: pgsql-hackers

Nigel J. Andrews wrote:
> On Fri, 20 Sep 2002, Shridhar Daithankar wrote:
>
>>In a select test where approx. 15 rows were returned by a query on an indexed
>>field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad, but other
>>issues eclipse the result..
>
> I don't know about anyone else but I find this aspect strange. That's 1 second
> (approx.) per row retrieved. That is pretty dire for an index scan. The
> data/index must be very non unique.
>

Yeah, I'd agree that is strange. Can we see EXPLAIN ANALYZE for that query?

Also, in one of your earlier posts you mentioned a slowdown after raising
shared buffers from the default 64 to 30000. You might have driven the machine
into swapping. Maybe try something more like 10000 - 15000.

HTH,

Joe


From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: Pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-21 08:44:26
Message-ID: 3D8C7EC2.21798.1D636CD3@localhost
Lists: pgsql-hackers

On 20 Sep 2002 at 10:27, Mike Benoit wrote:

> On Fri, 2002-09-20 at 08:52, Shridhar Daithankar wrote:
>
> > Besides there is issue of space. Mysql takes 1.4GB space for 1.2GB text data
> > and postgresql takes 3.2GB of space. Even with 40 bytes per row overhead
> > mentioned in FAQ, that should come to around 1.7GB, counting for 40% increase
> > in size. Vacuum was run on database.
> >
>
> How did you calculate the size of database? If you used "du" make sure
> you do it in the data/base directory as to not include the WAL files.

OK, latest experiments: I turned the number of buffers down to 15K and fsync is
disabled.. Load time is now 1250 sec.

I noticed lots of notices in the log saying "XLogWrite: new log files
created".. I am pushing wal_buffers to 1000 and wal_files to 40 to test
again.. I hope it gives me the required boost..
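
That is, something along these lines in postgresql.conf (the values are just
what I am trying; wal_files is capped at 64):

wal_buffers = 1000    # default 8
wal_files = 40        # pre-created WAL segments, range 0-64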

And BTW, about disk space usage: it's 2.6G, with pg_xlog taking 65M of that.
Still not good..

Will keep you guys updated..

Bye
Shridhar

--
It is necessary to have purpose. -- Alice #1, "I, Mudd", stardate 4513.3


From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: Pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-09-23 06:34:59
Message-ID: 3D8F036B.9931.3585197@localhost
Lists: pgsql-hackers

On 20 Sep 2002 at 18:41, Nigel J. Andrews wrote:

> On Fri, 20 Sep 2002, Shridhar Daithankar wrote:
>
> > In a select test where approx. 15 rows were returned by a query on an indexed
> > field, mysql took 14 sec. and postgresql took 17.5 sec. Not bad, but other
> > issues eclipse the result..
>
> I don't know about anyone else but I find this aspect strange. That's 1 second
> (approx.) per row retrieved. That is pretty dire for an index scan. The
> data/index must be very non unique.

Sorry for the late reply.. The numbers were scaled off.. Actually my friend
forgot to add units to those numbers.. The actual numbers are 140ms for mysql
and 175ms for postgresql.. Further, since the results are obtained via 'time
psql', the higher overhead of postgres connection establishment is factored in..

Neck and neck, I would say..

Bye
Shridhar

--
Steele's Law: There exist tasks which cannot be done by more than ten men or
fewer than one hundred.


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Insert Performance
Date: 2002-09-25 20:10:30
Message-ID: 029101c264cf$99ee3bd0$4201a8c0@beeblebrox
Lists: pgsql-hackers

Hi,

I am wondering about bad INSERT performance compared against the speed of
COPY. (I use 7.2.2 on RedHat 7.2)

I have a table with about 30 fields, some constraints, some indexes, some
foreign key constraints. I use COPY to import old data. Copying about
10562 rows takes about 19 seconds.

For testing I have written a simple function in PL/pgSQL that inserts dummy
records into the same table (just a FOR loop and an INSERT INTO ...).
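
The function is roughly like this (the column names are made up and the id
generation is left out; the real INSERT fills all ~30 columns):

CREATE FUNCTION bench_invoice(integer) RETURNS integer AS '
DECLARE
    i integer;
BEGIN
    FOR i IN 1 .. $1 LOOP
        INSERT INTO invoice (amount, description)
            VALUES (i, ''dummy row'');
    END LOOP;
    RETURN $1;
END;
' LANGUAGE 'plpgsql';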

To insert another 10562 rows takes about 12 minutes now!!!

What is the problem with INSERT in postgresql? I usually don't compare mysql
and postgresql because mysql is just playing stuff, but I think that the
insert performance of mysql (even with innodb tables) is about 10 times
better than the insert performance of postgresql.

What is the reason and what can be done about it?

Best Regards,
Michael

P.S: Perhaps you want to know about my postgresql.conf

#
# Shared Memory Size
#
shared_buffers = 12288 # 2*max_connections, min 16
max_fsm_relations = 100 # min 10, fsm is free space map
max_fsm_pages = 20000 # min 1000, fsm is free space map
max_locks_per_transaction = 64 # min 10
wal_buffers = 8 # min 4

#
# Non-shared Memory Sizes
#
sort_mem = 4096 # min 32 (in Kb)
vacuum_mem = 16384 # min 1024

#
# Write-ahead log (WAL)
#
wal_files = 8 # range 0-64, default 0
wal_sync_method = fdatasync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
fsync = true


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Paesold" <mpaesold(at)gmx(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Insert Performance
Date: 2002-09-25 20:49:32
Message-ID: 27812.1032986972@sss.pgh.pa.us
Lists: pgsql-hackers

"Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
> To insert another 10562 rows takes about 12 minutes now!!!

See
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/populate.html
particularly the point about not committing each INSERT as a separate
transaction.

regards, tom lane


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insert Performance
Date: 2002-09-25 21:25:54
Message-ID: 031f01c264da$22729d20$4201a8c0@beeblebrox
Lists: pgsql-hackers

Tom Lane wrote:

> "Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
> > To insert another 10562 rows takes about 12 minutes now!!!
>
> See
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/populate.html
> particularly the point about not committing each INSERT as a separate
> transaction.
>
> regards, tom lane

As I said, I wrote a function to insert the rows (PL/pgSQL). All values were
inserted inside a single function call; I always thought that a function call
would be executed inside a transaction block. Experience says it does.

About the other points in the docs:

> Use COPY FROM:
Well, I am currently comparing INSERT to COPY ... ;)

> Remove Indexes:
Doesn't COPY also have to update indexes?

> ANALYZE Afterwards:
I have done a VACUUM FULL; VACUUM ANALYZE; just before running the test.

So is it just the planner/optimizer/etc. costs? Would a PREPARE in 7.3 help?
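
(I.e. something like the following, using the 7.3 syntax; the invoice columns
here are made up:

PREPARE ins_invoice (integer, numeric) AS
    INSERT INTO invoice (invoice_id, amount) VALUES ($1, $2);
EXECUTE ins_invoice (1, 42.0);

so that the INSERT is only planned once.)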

Best Regards,
Michael Paesold


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Paesold" <mpaesold(at)gmx(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Insert Performance
Date: 2002-09-25 22:06:55
Message-ID: 28498.1032991615@sss.pgh.pa.us
Lists: pgsql-hackers

"Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
> To insert another 10562 rows takes about 12 minutes now!!!

> As I said I wrote a function to insert the rows (PL/pgSQL). All values were
> inserted inside a single function call; I always thought that a function call
> would be executed inside a transaction block. Experience says it does.

Well, there's something fishy about your results. Using CVS tip I see
about a 4-to-1 difference between COPYing 10000 rows and INSERT'ing
10000 rows (as one transaction). That's annoyingly high, but it's still
way lower than what you're reporting ...

I used the contents of table tenk1 in the regression database for test
data, and dumped it out with "pg_dump -a" with and without -d. I then
just timed feeding the scripts to psql ...
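
In other words, roughly this (tenk1 and the regression database are the
standard regression-test data; the target database name is made up, and the
INSERT script gets a BEGIN/COMMIT wrapped around it for the single-transaction
case):

pg_dump -a -t tenk1 regression > tenk1_copy.sql        # data as COPY
pg_dump -a -d -t tenk1 regression > tenk1_insert.sql   # data as INSERTs
time psql testdb < tenk1_copy.sql
time psql testdb < tenk1_insert.sql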

regards, tom lane


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insert Performance
Date: 2002-09-25 23:44:22
Message-ID: 004101c264ed$7b21e1c0$4201a8c0@beeblebrox
Lists: pgsql-hackers

Tom Lane wrote:

> "Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
> > To insert another 10562 rows takes about 12 minutes now!!!
>
> > As I said, I wrote a function to insert the rows (PL/pgSQL). All values were
> > inserted inside a single function call; I always thought that a function call
> > would be executed inside a transaction block. Experience says it does.
>
> Well, there's something fishy about your results. Using CVS tip I see
> about a 4-to-1 difference between COPYing 10000 rows and INSERT'ing
> 10000 rows (as one transaction). That's annoyingly high, but it's still
> way lower than what you're reporting ...
>
> I used the contents of table tenk1 in the regression database for test
> data, and dumped it out with "pg_dump -a" with and without -d. I then
> just timed feeding the scripts to psql ...
>
> regards, tom lane

I have further played around with the test here. I now realized that insert
performance is much better right after a vacuum full; vacuum analyze;

I have this function bench_invoice(integer) that will insert $1 records into
the invoice table;
select bench_invoice(10000) took about 10 minutes on average. Now I executed
this with psql:

vacuum full; vacuum analyze;
select bench_invoice(1000); select bench_invoice(1000); ... (10 times)

It seems performance is degrading with every insert!
Here is the result (time in seconds in bench_invoice(), commit between
selects just under a second)

13, 24, 36, 47, 58, 70, 84, 94, 105, 117, ... (seconds per 1000 rows
inserted)

Isn't that odd?
I have tried again. vacuum analyze alone (without full) is enough to lower
times again. They will start again with 13 seconds.

I have not deleted from the table so far; the table now has about 50000 rows.
The disk is not swapping, there are no other users using postgres, and the
postmaster takes about 100% cpu time during the whole operation. There are no
special messages in the error log.

Can you explain?
Should I enable some debug logging? Disable some optimizer setting? Do
something else?
This is a development server, I have no problem with playing around.

Best Regards,
Michael Paesold


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insert Performance
Date: 2002-09-25 23:58:17
Message-ID: 004901c264ef$6baaeb40$4201a8c0@beeblebrox
Lists: pgsql-hackers

Update:

> vacuum full; vacuum analyze;
> select bench_invoice(1000); select bench_invoice(1000); ... (10 times)
>
> It seems performance is degrading with every insert!
> Here is the result (time in seconds in bench_invoice(), commit between
> selects just under a second)
>
> 13, 24, 36, 47, 58, 70, 84, 94, 105, 117, ... (seconds per 1000 rows
> inserted)
>
> Isn't that odd?
> I have tried again. vacuum analyze alone (without full) is enough to lower
> times again. They will start again with 13 seconds.

Tested further what exactly will reset insert times to lowest possible:

vacuum full; helps
vacuum analyze; helps
analyze <tablename>; of the table that I insert into doesn't help!
analyze <tablename>; of any table referenced in foreign key constraints
doesn't help!

Only vacuum will reset the insert times to the lowest possible!
What does the vacuum code do?? :-]

Regards,
Michael Paesold


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Michael Paesold <mpaesold(at)gmx(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Insert Performance
Date: 2002-09-26 00:08:07
Message-ID: Pine.LNX.4.21.0209261006080.3530-100000@linuxworld.com.au
Lists: pgsql-hackers

> Only vacuum will reset the insert times to the lowest possible!
> What does the vacuum code do?? :-]

Please see the manual and the extensive discussions on this point in the
archives. This behaviour is well known -- though undesirable. It is an
effect of the multi-version concurrency control system.

Gavin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Paesold" <mpaesold(at)gmx(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Insert Performance
Date: 2002-09-26 03:15:12
Message-ID: 525.1033010112@sss.pgh.pa.us
Lists: pgsql-hackers

"Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
> Only vacuum will reset the insert times to the lowest possible!
> What does the vacuum code do?? :-]

It removes dead tuples. Dead tuples can only arise from update or
delete operations ... so you have not been telling us the whole
truth. An insert-only test would not have this sort of behavior.

regards, tom lane


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insert Performance
Date: 2002-09-26 10:28:37
Message-ID: 001b01c26547$7ae84140$4201a8c0@beeblebrox
Lists: pgsql-hackers

Tom Lane wrote:

> "Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
> > Only vacuum will reset the insert times to the lowest possible!
> > What does the vacuum code do?? :-]
>
> It removes dead tuples. Dead tuples can only arise from update or
> delete operations ... so you have not been telling us the whole
> truth. An insert-only test would not have this sort of behavior.
>
> regards, tom lane

Sleeping is good. When I woke up this morning I had an idea of
what is causing these problems; and you are right. I had used a
self-written sequence system for the invoice_ids -- I can't use a
sequence because sequence values can skip.

So inserting an invoice would also do an update on a single row
of the cs_sequence table, which caused the problems.
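
The home-grown sequence was essentially a one-row-per-counter table, and every
invoice insert did roughly this (column names simplified), i.e. yet another
UPDATE of that same single row:

UPDATE cs_sequence
   SET last_value = last_value + 1
 WHERE seq_name = 'invoice_id';
SELECT last_value FROM cs_sequence WHERE seq_name = 'invoice_id';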

Now, with a normal sequence, it works like a charm.
17 sec. for 10000 rows and 2-3 sec. for commit.

But why is performance degrading so much? After 10000 updates
on a row, the row seems to be unusable without vacuum! I hope
the currently discussed autovacuum daemon will help in such a
situation.

So I think I will have to look for another solution. It would be
nice if one could lock a sequence! That would solve all my
troubles,...

<dreaming>
BEGIN;
LOCK SEQUENCE invoice_id_seq;
-- now only this connection can get nextval(), all others will block
INSERT INTO invoice VALUES (nextval('invoice_id_seq'), ...);
INSERT INTO invoice VALUES (nextval('invoice_id_seq'), ...);
...
COMMIT;
-- now this only helps if sequences could be rolled back -- wake up!
</dreaming>

What could you recommend? Locking the table and selecting
max(invoice_id) wouldn't really be much faster, with max(invoice_id)
not using an index...
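
(The only index-friendly way of getting the maximum that I know of, assuming
invoice_id is indexed, would be something like

SELECT invoice_id FROM invoice ORDER BY invoice_id DESC LIMIT 1;

but that still needs the whole table locked to stay gapless.)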

Best Regards,
Michael Paesold


From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Insert Performance
Date: 2002-09-26 10:30:59
Message-ID: 3D932F3B.1928.13A460C5@localhost
Lists: pgsql-hackers

On 26 Sep 2002 at 12:28, Michael Paesold wrote:
> But why is performance degrading so much? After 10000 updates
> on a row, the row seems to be unusable without vacuum! I hope
> the currently discussed autovacuum daemon will help in such a
> situation.

Let me know if it works. Use CVS, BTW.. I am eager to hear any bug reports..
Didn't have a chance to test it the way I would have liked. Maybe this
weekend..

Bye
Shridhar

--
QOTD: The forest may be quiet, but that doesn't mean the snakes have gone away.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Paesold" <mpaesold(at)gmx(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Insert Performance
Date: 2002-09-26 14:05:11
Message-ID: 3674.1033049111@sss.pgh.pa.us
Lists: pgsql-hackers

"Michael Paesold" <mpaesold(at)gmx(dot)at> writes:
> So inserting an invoice would also do an update on a single row
> of the cs_sequence table, which caused the problems.

> Now, with a normal sequence, it works like a charm.
> 17 sec. for 10000 rows and 2-3 sec. for commit.

> But why is performance degrading so much? After 10000 updates
> on a row, the row seems to be unusable without vacuum!

Probably, because the table contains 10000 dead tuples and one live one.
The system is scanning all 10001 tuples looking for the one to UPDATE.

In 7.3 it might help a little to create an index on the table. But
really this is one of the reasons that SEQUENCEs were invented ---
you have no alternative but to do frequent vacuums, if you repeatedly
update the same row of a table. You might consider issuing a selective
"VACUUM cs_sequence" command every so often (ideally every few hundred
updates).

> I hope the currently discussed autovacuum daemon will help in such a
> situation.

Probably, if we can teach it to recognize that such frequent vacuums are
needed. In the meantime, cron is your friend ...
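
For example, a crontab entry along these lines (the database name is made up,
and every ten minutes is just a guess at a reasonable interval):

*/10 * * * *  psql -c "VACUUM cs_sequence" yourdb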

regards, tom lane


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: shridhar_daithankar(at)persistent(dot)co(dot)in
Cc: pgsql-hackers(at)postgresql(dot)org, jharris(at)nightstarcorporation(dot)com
Subject: Re: Improving speed of copy
Date: 2002-10-02 07:37:04
Message-ID: 3D9AA220.7000506@paradise.net.nz
Lists: pgsql-hackers

>Have you tried this with Oracle or similar commercial database?

I have timed COPY/LOAD for Postgresql/Mysql/Oracle/Db2 --

the rough comparison is:

Db2 and Mysql fastest (Db2 slightly faster)
Oracle approx twice as slow as Db2
Postgresql about 3.5-4 times slower than Db2

However, Postgresql can sometimes create indexes faster than Mysql ...
so that the total time of COPY + CREATE INDEX can be smaller for
Postgresql than Mysql.

Oracle and Db2 seemed similar to Postgresql with respect to CREATE INDEX.

regards

Mark


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>
Cc: Pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving speed of copy
Date: 2002-10-06 15:06:11
Message-ID: Pine.NEB.4.44.0210070002510.515-100000@angelic.cynic.net
Lists: pgsql-hackers

On Fri, 20 Sep 2002, Shridhar Daithankar wrote:

> On 20 Sep 2002 at 21:22, Shridhar Daithankar wrote:
>
> > Mysql takes 221 sec. v/s 1121 sec. for postgres. For postgresql,
> > that is around 11.5K rows per second. Each tuple has 23 fields with
> > fixed length of around 100 bytes

Yes, postgres is much slower than MySQL for doing bulk loading of data.
There's not much, short of hacking on the code, that can be done about
this.

> One more issue is time taken for composite index creation. It's 4341
> sec as opposed to 436 sec for mysql. These are three non-unique
> character fields where the combination itself is not unique as well.

Setting sort_mem appropriately makes a big difference here. I generally
bump it up to 2-8 MB for everyone, and when I'm building a big index, I
set it to 32 MB or so just for that session.
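
That is, something like this (sort_mem is in kilobytes; the table and index
names are made up):

SET sort_mem = 32768;        -- 32 MB, for this session only
CREATE INDEX big_table_abc_idx ON big_table (field_a, field_b, field_c);
RESET sort_mem;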

But make sure you don't set it so high you drive your system into
swapping, or it will kill your performance. Remember also, that in
7.2.x, postgres will actually use almost three times the value you give
sort_mem (i.e., sort_mem of 32 MB will actually allocate close to 96 MB
of memory for the sort).

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC