Quick Links

Re: Sorting performance vs. MySQL

Lists:	pgsql-general

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Sorting performance vs. MySQL
Date:	2010-02-22 18:10:30
Message-ID:	9066fa251002221010o365b58f5v988f7bac933ad576@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

tpcc=# \d metarelcloud_transactionlog
Table
"public.metarelcloud_transactionlog"
Column | Type |
Modifiers
---------------------+-----------------------+--------------------------------------------------------------------------
id | integer | not null default
nextval('metarelcloud_transactionlog_id_seq'::regclass)
transactionid | integer | not null
queryid | smallint | not null
tableid | character varying(30) | not null
tupleid | integer | not null
querytype | character varying | not null
graphpartition | smallint |
replicatedpartition | smallint |
justifiedpartition | smallint |
hashpartition | smallint |
nodeid | integer |
manualpartition | smallint |
Indexes:
"metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
Check constraints:
"metarelcloud_transactionlog_graphpartition_check" CHECK
(graphpartition >= 0)
"metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0)
"metarelcloud_transactionlog_justifiedpartition_check" CHECK
(justifiedpartition >= 0)
"metarelcloud_transactionlog_manualpartition_check" CHECK
(manualpartition >= 0)
"metarelcloud_transactionlog_querytype_check" CHECK
(querytype::text = ANY (ARRAY['select'::character varying,
'insert'::character varying, 'delete'::character varying,
'update'::character varying]::text[]))
"metarelcloud_transactionlog_replicatedpartition_check" CHECK
(replicatedpartition >= 0)

In MySQL:

CREATE TABLE `metarelcloud_transactionlog` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`transactionid` int(11) NOT NULL,
`queryid` tinyint(4) NOT NULL,
`tableid` varchar(30) NOT NULL,
`tupleid` int(11) NOT NULL,
`querytype` enum('select','insert','delete','update') NOT NULL,
`graphpartition` tinyint(3) unsigned DEFAULT NULL,
`replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
`justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
`hashpartition` tinyint(3) unsigned DEFAULT NULL,
`nodeid` int(11) DEFAULT NULL,
`manualpartition` tinyint(3) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `transactionid` (`transactionid`),
KEY `tableid` (`tableid`,`tupleid`),
KEY `nodeid` (`nodeid`)
) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

I'm running:

select * from metarelcloud_transactionlog order by transactionid;

It takes MySQL 6 minutes, but Postgresql is still running after 70
minutes. Is there something like a glaring misconfiguration that I'm
overlooking? Thanks in advance.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 18:13:18
Message-ID:	162867791002221013m10c0088bsdc687df0477e3a40@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

hello

the speed depends on setting of working_memory. Try to increase a working_memory

set working_memory to '10MB';

Regards
Pavel Stehule

2010/2/22 Yang Zhang <yanghatespam(at)gmail(dot)com>:
> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
>
> tpcc=# \d metarelcloud_transactionlog
> Table
> "public.metarelcloud_transactionlog"
> Column | Type |
> Modifiers
> ---------------------+-----------------------+--------------------------------------------------------------------------
> id | integer | not null default
> nextval('metarelcloud_transactionlog_id_seq'::regclass)
> transactionid | integer | not null
> queryid | smallint | not null
> tableid | character varying(30) | not null
> tupleid | integer | not null
> querytype | character varying | not null
> graphpartition | smallint |
> replicatedpartition | smallint |
> justifiedpartition | smallint |
> hashpartition | smallint |
> nodeid | integer |
> manualpartition | smallint |
> Indexes:
> "metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
> Check constraints:
> "metarelcloud_transactionlog_graphpartition_check" CHECK
> (graphpartition >= 0)
> "metarelcloud_transactionlog_hashpartition_check" CHECK (hashpartition >= 0)
> "metarelcloud_transactionlog_justifiedpartition_check" CHECK
> (justifiedpartition >= 0)
> "metarelcloud_transactionlog_manualpartition_check" CHECK
> (manualpartition >= 0)
> "metarelcloud_transactionlog_querytype_check" CHECK
> (querytype::text = ANY (ARRAY['select'::character varying,
> 'insert'::character varying, 'delete'::character varying,
> 'update'::character varying]::text[]))
> "metarelcloud_transactionlog_replicatedpartition_check" CHECK
> (replicatedpartition >= 0)
>
> In MySQL:
>
> CREATE TABLE `metarelcloud_transactionlog` (
> `id` int(11) NOT NULL AUTO_INCREMENT,
> `transactionid` int(11) NOT NULL,
> `queryid` tinyint(4) NOT NULL,
> `tableid` varchar(30) NOT NULL,
> `tupleid` int(11) NOT NULL,
> `querytype` enum('select','insert','delete','update') NOT NULL,
> `graphpartition` tinyint(3) unsigned DEFAULT NULL,
> `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
> `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
> `hashpartition` tinyint(3) unsigned DEFAULT NULL,
> `nodeid` int(11) DEFAULT NULL,
> `manualpartition` tinyint(3) unsigned DEFAULT NULL,
> PRIMARY KEY (`id`),
> KEY `transactionid` (`transactionid`),
> KEY `tableid` (`tableid`,`tupleid`),
> KEY `nodeid` (`nodeid`)
> ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1
>
> I'm running:
>
> select * from metarelcloud_transactionlog order by transactionid;
>
> It takes MySQL 6 minutes, but Postgresql is still running after 70
> minutes. Is there something like a glaring misconfiguration that I'm
> overlooking? Thanks in advance.
> --
> Yang Zhang
> http://www.mit.edu/~y_z/
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

From:	Frank Heikens <frankheikens(at)mac(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 18:17:55
Message-ID:	21D4F741-3E28-45A4-BEDC-57BC28E961FA@mac.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

There is no index on the column transactionid in your PostgreSQL-
table, as there is in your MySQL-table. This explains the difference.

CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
(transactionid);

Op 22 feb 2010, om 19:10 heeft Yang Zhang het volgende geschreven:

> I have the exact same table of data in both MySQL and Postgresql. In
> Postgresql:
>
> tpcc=# \d metarelcloud_transactionlog
> Table
> "public.metarelcloud_transactionlog"
> Column | Type |
> Modifiers
> ---------------------+-----------------------
> +
> --------------------------------------------------------------------------
> id | integer | not null default
> nextval('metarelcloud_transactionlog_id_seq'::regclass)
> transactionid | integer | not null
> queryid | smallint | not null
> tableid | character varying(30) | not null
> tupleid | integer | not null
> querytype | character varying | not null
> graphpartition | smallint |
> replicatedpartition | smallint |
> justifiedpartition | smallint |
> hashpartition | smallint |
> nodeid | integer |
> manualpartition | smallint |
> Indexes:
> "metarelcloud_transactionlog_pkey" PRIMARY KEY, btree (id)
> Check constraints:
> "metarelcloud_transactionlog_graphpartition_check" CHECK
> (graphpartition >= 0)
> "metarelcloud_transactionlog_hashpartition_check" CHECK
> (hashpartition >= 0)
> "metarelcloud_transactionlog_justifiedpartition_check" CHECK
> (justifiedpartition >= 0)
> "metarelcloud_transactionlog_manualpartition_check" CHECK
> (manualpartition >= 0)
> "metarelcloud_transactionlog_querytype_check" CHECK
> (querytype::text = ANY (ARRAY['select'::character varying,
> 'insert'::character varying, 'delete'::character varying,
> 'update'::character varying]::text[]))
> "metarelcloud_transactionlog_replicatedpartition_check" CHECK
> (replicatedpartition >= 0)
>
> In MySQL:
>
> CREATE TABLE `metarelcloud_transactionlog` (
> `id` int(11) NOT NULL AUTO_INCREMENT,
> `transactionid` int(11) NOT NULL,
> `queryid` tinyint(4) NOT NULL,
> `tableid` varchar(30) NOT NULL,
> `tupleid` int(11) NOT NULL,
> `querytype` enum('select','insert','delete','update') NOT NULL,
> `graphpartition` tinyint(3) unsigned DEFAULT NULL,
> `replicatedpartition` tinyint(3) unsigned DEFAULT NULL,
> `justifiedpartition` tinyint(3) unsigned DEFAULT NULL,
> `hashpartition` tinyint(3) unsigned DEFAULT NULL,
> `nodeid` int(11) DEFAULT NULL,
> `manualpartition` tinyint(3) unsigned DEFAULT NULL,
> PRIMARY KEY (`id`),
> KEY `transactionid` (`transactionid`),
> KEY `tableid` (`tableid`,`tupleid`),
> KEY `nodeid` (`nodeid`)
> ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1
>
> I'm running:
>
> select * from metarelcloud_transactionlog order by transactionid;
>
> It takes MySQL 6 minutes, but Postgresql is still running after 70
> minutes. Is there something like a glaring misconfiguration that I'm
> overlooking? Thanks in advance.
> --
> Yang Zhang
> http://www.mit.edu/~y_z/
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Frank Heikens
frankheikens(at)mac(dot)com

From:	Richard Broersma <richard(dot)broersma(at)gmail(dot)com>
To:	Frank Heikens <frankheikens(at)mac(dot)com>
Cc:	Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 18:30:00
Message-ID:	396486431002221030o74166784wf06bf0e909a29496@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens <frankheikens(at)mac(dot)com> wrote:

> There is no index on the column transactionid in your PostgreSQL-table, as
> there is in your MySQL-table. This explains the difference.
>
> CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
> (transactionid);

Does an index help a sort operation in PostgreSQL?

--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 18:35:23
Message-ID:	9066fa251002221035s38a1a72j5a73d301553a72b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> hello
>
> the speed depends on setting of working_memory. Try to increase a working_memory
>
> set working_memory to '10MB';

It's already at

tpcc=# show work_mem;
work_mem
----------
20000kB
(1 row)

I also wouldn't have imagined an external merge-sort as being very
memory-intensive--wouldn't it only enough buffer space to read 2x and
write 1x in big-enough chunks for mostly-sequential access?
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Richard Broersma <richard(dot)broersma(at)gmail(dot)com>
Cc:	Frank Heikens <frankheikens(at)mac(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 18:36:48
Message-ID:	9066fa251002221036w223fef8bqe780113bee2875b1@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma
<richard(dot)broersma(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens <frankheikens(at)mac(dot)com> wrote:
>
>> There is no index on the column transactionid in your PostgreSQL-table, as
>> there is in your MySQL-table. This explains the difference.
>>
>> CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
>> (transactionid);
>
> Does an index help a sort operation in PostgreSQL?

I also share the same doubt. An external merge-sort needs to make
complete passes over the entire dataset, with no index-directed
accesses.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Frank Heikens <frankheikens(at)mac(dot)com>
To:	Richard Broersma <richard(dot)broersma(at)gmail(dot)com>
Cc:	Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 18:42:46
Message-ID:	55EDC372-1A11-42E7-A72E-D2D688702E6A@mac.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Op 22 feb 2010, om 19:30 heeft Richard Broersma het volgende geschreven:

> On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens
> <frankheikens(at)mac(dot)com> wrote:
>
>> There is no index on the column transactionid in your PostgreSQL-
>> table, as
>> there is in your MySQL-table. This explains the difference.
>>
>> CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
>> (transactionid);
>
> Does an index help a sort operation in PostgreSQL?

Yes it does, see the manual: http://www.postgresql.org/docs/8.4/interactive/indexes-ordering.html

Example without index:
"Sort (cost=804.39..829.39 rows=10000 width=4) (actual
time=16.006..17.171 rows=10000 loops=1)"
" Sort Key: bar"
" Sort Method: quicksort Memory: 491kB"
" -> Seq Scan on bla (cost=0.00..140.00 rows=10000 width=4) (actual
time=0.015..2.236 rows=10000 loops=1)"
"Total runtime: 18.098 ms"

Same query with index (btree):
"Index Scan Backward using i_bar on bla (cost=0.00..406.25 rows=10000
width=4) (actual time=0.093..4.408 rows=10000 loops=1)"
"Total runtime: 5.381 ms"

>
>
> --
> Regards,
> Richard Broersma Jr.
>
> Visit the Los Angeles PostgreSQL Users Group (LAPUG)
> http://pugs.postgresql.org/lapug

Regards,
Frank Heikens

From:	Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 18:48:20
Message-ID:	D2FE719C-F437-4264-934B-D8A90F9C201B@solfertje.student.utwente.nl
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On 22 Feb 2010, at 19:35, Yang Zhang wrote:

> I also wouldn't have imagined an external merge-sort as being very

Where's that external merge-sort coming from? Can you show an explain analyze?

If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4b82d18510442035320951!

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:03:43
Message-ID:	20100222190343.GG4629@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Yang Zhang escribió:

> I'm running:
>
> select * from metarelcloud_transactionlog order by transactionid;
>
> It takes MySQL 6 minutes, but Postgresql is still running after 70
> minutes. Is there something like a glaring misconfiguration that I'm
> overlooking? Thanks in advance.

How large is the table, and have you vacuumed it? Did you analyze it?
What Pg version is this?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:07:21
Message-ID:	9066fa251002221107x39450bb8h55b5bdc06ae48e59@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
<dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:
> On 22 Feb 2010, at 19:35, Yang Zhang wrote:
>
>> I also wouldn't have imagined an external merge-sort as being very
>
>
> Where's that external merge-sort coming from? Can you show an explain analyze?

I just assumed that the "Sort" in the EXPLAIN output meant an external
merge-sort, given that the table has over 50 million tuples and is
over 3GB, *and* there is no index on the sort key:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
Sort Key: a.transactionid
-> Seq Scan on metarelcloud_transactionlog a
(cost=0.00..925543.44 rows=50410244 width=17)
(3 rows)

Anyway, I added the INDEX as suggested by Frank, but it's been 20
minutes and it's still running. With the index, EXPLAIN says:

tpcc=# explain select * from metarelcloud_transactionlog order by transactionid;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Index Scan using i_transactionid on metarelcloud_transactionlog
(cost=0.00..4453076.81 rows=50410164 width=44)
(1 row)

> If your work-mem is too low there's a good chance that Postgres has to use your disks for sorting, which will obviously be quite slow.

Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
much faster (on the order of several minutes).
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:08:24
Message-ID:	9066fa251002221108q608c42a8j2c7b4b587ec1127@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 2:03 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Yang Zhang escribió:
>
>> I'm running:
>>
>> select * from metarelcloud_transactionlog order by transactionid;
>>
>> It takes MySQL 6 minutes, but Postgresql is still running after 70
>> minutes. Is there something like a glaring misconfiguration that I'm
>> overlooking? Thanks in advance.
>
> How large is the table, and have you vacuumed it? Did you analyze it?
> What Pg version is this?

The table has 50.4M tuples. It's been vacuumed and analyzed. I'm using
PG 8.3.8 on Fedora 10 x86_64.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Frank Heikens <frankheikens(at)mac(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:15:04
Message-ID:	C57F96F3-9A99-471E-862B-64C565CF2DA3@mac.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:

> On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
> <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:
>> On 22 Feb 2010, at 19:35, Yang Zhang wrote:
>>
>>> I also wouldn't have imagined an external merge-sort as being very
>>
>>
>> Where's that external merge-sort coming from? Can you show an
>> explain analyze?
>
> I just assumed that the "Sort" in the EXPLAIN output meant an external
> merge-sort, given that the table has over 50 million tuples and is
> over 3GB, *and* there is no index on the sort key:
>
> tpcc=# explain select * from metarelcloud_transactionlog order by
> transactionid;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
> Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
> Sort Key: a.transactionid
> -> Seq Scan on metarelcloud_transactionlog a
> (cost=0.00..925543.44 rows=50410244 width=17)
> (3 rows)
>
> Anyway, I added the INDEX as suggested by Frank, but it's been 20
> minutes and it's still running. With the index, EXPLAIN says:
>
> tpcc=# explain select * from metarelcloud_transactionlog order by
> transactionid;
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------
> Index Scan using i_transactionid on metarelcloud_transactionlog
> (cost=0.00..4453076.81 rows=50410164 width=44)
> (1 row)
>

Use EXPLAIN ANALYZE to see how the query is executed, gives you more
details.

>> If your work-mem is too low there's a good chance that Postgres has
>> to use your disks for sorting, which will obviously be quite slow.
>
> Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
> much faster (on the order of several minutes).

Make sure your index does fit into memory, what's the size of the index?

> --
> Yang Zhang
> http://www.mit.edu/~y_z/
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Frank Heikens

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:27:56
Message-ID:	20100222192756.GH4629@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Yang Zhang escribió:
> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

I just noticed two things:

[snip lots of stuff]

> ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1

You're doing a comparison to MyISAM.

> select * from metarelcloud_transactionlog order by transactionid;

You're reading the whole table.

This is unlikely to fly very far. I suggest you try some query that's
actually going to be used in the real world.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Frank Heikens <frankheikens(at)mac(dot)com>
Cc:	Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:28:33
Message-ID:	9066fa251002221128td05a3ddmaa9b2f072e97be25@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 2:15 PM, Frank Heikens <frankheikens(at)mac(dot)com> wrote:
>
> Op 22 feb 2010, om 20:07 heeft Yang Zhang het volgende geschreven:
>
>> On Mon, Feb 22, 2010 at 1:48 PM, Alban Hertroys
>> <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:
>>>
>>> On 22 Feb 2010, at 19:35, Yang Zhang wrote:
>>>
>>>> I also wouldn't have imagined an external merge-sort as being very
>>>
>>>
>>> Where's that external merge-sort coming from? Can you show an explain
>>> analyze?
>>
>> I just assumed that the "Sort" in the EXPLAIN output meant an external
>> merge-sort, given that the table has over 50 million tuples and is
>> over 3GB, *and* there is no index on the sort key:
>>
>> tpcc=# explain select * from metarelcloud_transactionlog order by
>> transactionid;
>> QUERY PLAN
>>
>> -----------------------------------------------------------------------------------------------------------------
>> Sort (cost=8408637.34..8534662.95 rows=50410244 width=17)
>> Sort Key: a.transactionid
>> -> Seq Scan on metarelcloud_transactionlog a
>> (cost=0.00..925543.44 rows=50410244 width=17)
>> (3 rows)
>>
>> Anyway, I added the INDEX as suggested by Frank, but it's been 20
>> minutes and it's still running. With the index, EXPLAIN says:
>>
>> tpcc=# explain select * from metarelcloud_transactionlog order by
>> transactionid;
>> QUERY PLAN
>>
>> -----------------------------------------------------------------------------------------------------------------
>> Index Scan using i_transactionid on metarelcloud_transactionlog
>> (cost=0.00..4453076.81 rows=50410164 width=44)
>> (1 row)
>>
>
> Use EXPLAIN ANALYZE to see how the query is executed, gives you more
> details.

As mentioned, this would take a very long time to complete running --
I have not yet seen one successful query completion yet. I'd need to
let it run overnight.

>
>
>>> If your work-mem is too low there's a good chance that Postgres has to
>>> use your disks for sorting, which will obviously be quite slow.
>>
>> Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
>> much faster (on the order of several minutes).
>
> Make sure your index does fit into memory, what's the size of the index?

How might I find out the size and whether it's being fit in memory?
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:30:43
Message-ID:	9066fa251002221130m6b4b97bby1cc6bbfd00af6353@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 2:27 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Yang Zhang escribió:
>> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
>
> I just noticed two things:
>
> [snip lots of stuff]
>
> 1.
>
>> ) ENGINE=MyISAM AUTO_INCREMENT=50410166 DEFAULT CHARSET=latin1
>
> You're doing a comparison to MyISAM.

We've actually been using innodb as well; it exhibits similar
execution times to MyISAM.

>
>
> 2.
>
>> select * from metarelcloud_transactionlog order by transactionid;
>
> You're reading the whole table.
>
> This is unlikely to fly very far. I suggest you try some query that's
> actually going to be used in the real world.

This isn't some microbenchmark. This is part of our actual analytical
application. We're running large-scale graph partitioning algorithms.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:39:39
Message-ID:	dcc563d11002221139n3c4e5af0pfc6a823894c25439@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> This isn't some microbenchmark. This is part of our actual analytical
> application. We're running large-scale graph partitioning algorithms.

It's important to see how it runs if you can fit more / most of the
data set into memory by cranking up work_mem to something really big
(like a gigabyte or two) and if the query planner can switch to some
sort of hash algorithm.

Also, can you cluster the table on transactionid ?

From:	Frank Heikens <frankheikens(at)mac(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:41:29
Message-ID:	475A81B1-52E6-4DAE-B72B-142B0E3464F7@mac.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:
>
>
>>
>>
>>>> If your work-mem is too low there's a good chance that Postgres
>>>> has to
>>>> use your disks for sorting, which will obviously be quite slow.
>>>
>>> Relative to the non-terminating 80-minute-so-far sort, Unix sort
>>> runs
>>> much faster (on the order of several minutes).
>>
>> Make sure your index does fit into memory, what's the size of the
>> index?
>
> How might I find out the size and whether it's being fit in memory?

SELECT pg_size_pretty(pg_relation_size('i_transactionid'));

Frank Heikens

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:50:46
Message-ID:	9066fa251002221150q15913begfb39c1373e6525ca@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>> This isn't some microbenchmark. This is part of our actual analytical
>> application. We're running large-scale graph partitioning algorithms.
>
> It's important to see how it runs if you can fit more / most of the
> data set into memory by cranking up work_mem to something really big
> (like a gigabyte or two) and if the query planner can switch to some
> sort of hash algorithm.

We're actually using a very small dataset right now. Being bounded by
memory capacity is not a scalable approach for our application.

>
> Also, can you cluster the table on transactionid ?
>

We can, but that's not really addressing the core issue, which matters
to us since the sort itself is only for performing a self merge join
on transactionid, and the *very next step* is a group by a.tableid,
a.tupleid, b.tableid, b.tupleid (i.e. requiring another sort for the
group-agg).
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:52:51
Message-ID:	dcc563d11002221152w2a45ed48pe702cfadd7613ce1@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:

Just wondering, are these on the same exact machine?

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:53:29
Message-ID:	9066fa251002221153t724acc8ds9c8888ad64df62d6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
>
> Just wondering, are these on the same exact machine?
>

Yes, on the same disk.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 19:54:20
Message-ID:	dcc563d11002221154h19175206x2104b1e24d0a60cf@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 12:50 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 2:39 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>> On Mon, Feb 22, 2010 at 12:30 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>>> This isn't some microbenchmark. This is part of our actual analytical
>>> application. We're running large-scale graph partitioning algorithms.
>>
>> It's important to see how it runs if you can fit more / most of the
>> data set into memory by cranking up work_mem to something really big
>> (like a gigabyte or two) and if the query planner can switch to some
>> sort of hash algorithm.
>
> We're actually using a very small dataset right now. Being bounded by
> memory capacity is not a scalable approach for our application.

But the more you can fit into work_mem the faster it will go anyway.
So it's still worth a try.

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Frank Heikens <frankheikens(at)mac(dot)com>
Cc:	Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 20:05:17
Message-ID:	9066fa251002221205w193cf9dej8606b1ae3d90a92d@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 2:41 PM, Frank Heikens <frankheikens(at)mac(dot)com> wrote:
>
> Op 22 feb 2010, om 20:28 heeft Yang Zhang het volgende geschreven:
>>
>>
>>>
>>>
>>>>> If your work-mem is too low there's a good chance that Postgres has to
>>>>> use your disks for sorting, which will obviously be quite slow.
>>>>
>>>> Relative to the non-terminating 80-minute-so-far sort, Unix sort runs
>>>> much faster (on the order of several minutes).
>>>
>>> Make sure your index does fit into memory, what's the size of the index?
>>
>> How might I find out the size and whether it's being fit in memory?
>
> SELECT pg_size_pretty(pg_relation_size('i_transactionid'));

pg_size_pretty
----------------
1080 MB
(1 row)
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Yeb Havinga <yebhavinga(at)gmail(dot)com>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 20:37:55
Message-ID:	4B82EB23.3000307@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Scott Marlowe wrote:
> On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>
>> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
>>
>
> Just wondering, are these on the same exact machine?
>
>
Just reading up on this interesting thread. WFIW, 2 years ago I and a
collegue of mine did a hardware comparison of early Intel and AMD
desktop quadcore processors to run postgres database, with most other
parts comparable. The intel processor was 20 to 30 % faster in cpu
operations to the (first generation) Phenom at almost everything, except
at index creation. The test that the AMD finished in a few minutes, we
had to stop on the Intel because it simply didn't finish. We double
checked configuration settings and could not find explainable
differences. I hesitate to post this information here, because its hard
to believe that an actual big difference between the processors exists,
and it more likely was something in our test setup. Still: the
difference was *only* in index creation, which is kindoff like a qsort yes?

egards
Yeb Havinga

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 20:44:23
Message-ID:	1079.1266871463@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Yang Zhang <yanghatespam(at)gmail(dot)com> writes:
> On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> the speed depends on setting of working_memory. Try to increase a working_memory

> It's already at
> 20000kB

According to your original posting, you're trying to sort something like
a gigabyte of data. 20MB is peanuts. I wouldn't recommend increasing
the value across-the-board, but setting it to several hundred meg for
this particular query might help. How much RAM in your machine anyway?

Also, the fact that mysql is faster suggests that having an index does help.
Possibly the data is nearly ordered by transactionid, in which case an
indexscan would not have random-access problems and would be much faster
than an explicit sort.

regards, tom lane

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 20:45:13
Message-ID:	dcc563d11002221245u36db3560t7886f3f8b0ed9735@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>> On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>>> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
>>
>> Just wondering, are these on the same exact machine?
>>
>
> Yes, on the same disk.

I'm wondering how much of this could be caching effects. Is the MySQL
database "warmed up" before you started, and the pgsql database is
"cold" and no caching has taken place?

What do things like vmstat 10 say while the query is running on each
db? First time, second time, things like that.

Also, just curios, what's shared_buffers set to on the pgsql instance?

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 20:55:54
Message-ID:	9066fa251002221255u2f17fd37t9f5c332f96874b16@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 3:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yang Zhang <yanghatespam(at)gmail(dot)com> writes:
>> On Mon, Feb 22, 2010 at 1:13 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> the speed depends on setting of working_memory. Try to increase a working_memory
>
>> It's already at
>> 20000kB
>
> According to your original posting, you're trying to sort something like
> a gigabyte of data. 20MB is peanuts. I wouldn't recommend increasing
> the value across-the-board, but setting it to several hundred meg for
> this particular query might help. How much RAM in your machine anyway?

We have 16GB of RAM, but again, Unix sort (and even our own
hand-rolled merge-sort) can operate zippily while avoiding consuming
additional memory.

All the same, we increased work_mem to 1GB, and still the query is not
completing.

>
> Also, the fact that mysql is faster suggests that having an index does help.
> Possibly the data is nearly ordered by transactionid, in which case an
> indexscan would not have random-access problems and would be much faster
> than an explicit sort.

Note that earlier in the thread I tried running this query with an
index scan, but it's still much slower.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 21:24:54
Message-ID:	1874.1266873894@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Yeb Havinga <yebhavinga(at)gmail(dot)com> writes:
> Just reading up on this interesting thread. WFIW, 2 years ago I and a
> collegue of mine did a hardware comparison of early Intel and AMD
> desktop quadcore processors to run postgres database, with most other
> parts comparable. The intel processor was 20 to 30 % faster in cpu
> operations to the (first generation) Phenom at almost everything, except
> at index creation. The test that the AMD finished in a few minutes, we
> had to stop on the Intel because it simply didn't finish. We double
> checked configuration settings and could not find explainable
> differences. I hesitate to post this information here, because its hard
> to believe that an actual big difference between the processors exists,
> and it more likely was something in our test setup. Still: the
> difference was *only* in index creation, which is kindoff like a qsort yes?

Interesting. Yes, btree index creation is essentially a sort ...

regards, tom lane

From:	"Igor Neyman" <ineyman(at)perceptron(dot)com>
To:	"Yang Zhang" <yanghatespam(at)gmail(dot)com>, "Richard Broersma" <richard(dot)broersma(at)gmail(dot)com>
Cc:	"Frank Heikens" <frankheikens(at)mac(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 22:31:58
Message-ID:	F4C27E77F7A33E4CA98C19A9DC6722A205932548@EXCHANGE.corp.perceptron.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

When in doubt - test.
Why not remove index in MySQL (or create index in PostgreSQL) and see
what happens.
Why trying compare "apples and oranges"?

Igor Neyman

> -----Original Message-----
> From: Yang Zhang [mailto:yanghatespam(at)gmail(dot)com]
> Sent: Monday, February 22, 2010 1:37 PM
> To: Richard Broersma
> Cc: Frank Heikens; pgsql-general(at)postgresql(dot)org
> Subject: Re: Sorting performance vs. MySQL
>
> On Mon, Feb 22, 2010 at 1:30 PM, Richard Broersma
> <richard(dot)broersma(at)gmail(dot)com> wrote:
> > On Mon, Feb 22, 2010 at 10:17 AM, Frank Heikens
> <frankheikens(at)mac(dot)com> wrote:
> >
> >> There is no index on the column transactionid in your
> >> PostgreSQL-table, as there is in your MySQL-table. This
> explains the difference.
> >>
> >> CREATE INDEX i_transactionid ON public.metarelcloud_transactionlog
> >> (transactionid);
> >
> > Does an index help a sort operation in PostgreSQL?
>
> I also share the same doubt. An external merge-sort needs to
> make complete passes over the entire dataset, with no
> index-directed accesses.
> --
> Yang Zhang
> http://www.mit.edu/~y_z/
>

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Igor Neyman <ineyman(at)perceptron(dot)com>
Cc:	Richard Broersma <richard(dot)broersma(at)gmail(dot)com>, Frank Heikens <frankheikens(at)mac(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 22:33:16
Message-ID:	9066fa251002221433o36d2e5e7qa0a83988ede75c8e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman <ineyman(at)perceptron(dot)com> wrote:
> When in doubt - test.
> Why not remove index in MySQL (or create index in PostgreSQL) and see
> what happens.
> Why trying compare "apples and oranges"?

Continue reading this thread -- I also tried using an index in Postgresql.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Igor Neyman <ineyman(at)perceptron(dot)com>, Richard Broersma <richard(dot)broersma(at)gmail(dot)com>, Frank Heikens <frankheikens(at)mac(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 23:51:48
Message-ID:	dcc563d11002221551m6ae5e742j889521648213dbeb@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 3:33 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 5:31 PM, Igor Neyman <ineyman(at)perceptron(dot)com> wrote:
>> When in doubt - test.
>> Why not remove index in MySQL (or create index in PostgreSQL) and see
>> what happens.
>> Why trying compare "apples and oranges"?
>
> Continue reading this thread -- I also tried using an index in Postgresql.

But have you tried cranking up work_mem to say 1G?

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Igor Neyman <ineyman(at)perceptron(dot)com>, Richard Broersma <richard(dot)broersma(at)gmail(dot)com>, Frank Heikens <frankheikens(at)mac(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-22 23:52:19
Message-ID:	dcc563d11002221552i472601d9rb8b72280ad05b5f8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

And oh yeah, what was shared_buffers set to? I'm not sure we ever got
an answer to that.

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 02:30:49
Message-ID:	34d269d41002221830p300dbe55kdbd002b9804ac6cd@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 11:10, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> I have the exact same table of data in both MySQL and Postgresql. In ?> Postgresql:

FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3
runs) 79 seconds, 26 using an index and 27 seconds with it clustered.
Now yes it goes a lot faster because im skipping all the overhead of
sending the data to the client... But still that means it has almost
nothing with the sort or indexs. Leaves pretty much your cpu, disk,
filesystem and network... BTW the first time I did it it had to write
out the hint bits so that took (a bit) longer... Is this freshly
loaded data?

---

# create table metarelcould_transactionlog (
id serial primary key,
transactionid integer not null,
queryid smallint not null,
tableid varchar(30) not null,
tupleid integer not null,
querytype varchar not null,
graphpartition smallint,
replicatedpartition smallint,
justifiedpartition smallint,
hashpartition smallint,
modeid integer,
manualpartition smallint
);

# insert into metarelcould_transactionlog (transactionid, queryid,
tableid, tupleid, querytype, graphpartition, replicatedpartition,
justifiedpartition, hashpartition, modeid, manualpartition) select
foo, 1, 'sometable', 1, 's', 1, 1, 1, 1, 1, 1 from generate_series(1,
50000000) as foo;

# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
count
----------
50000000
(1 row)

Time: 79017.186 ms

# create index idx on metarelcould_transactionlog (transactionid);
# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
count
----------
50000000
(1 row)

Time: 26230.534 ms

# cluster metarelcould_transactionlog USING idx;
CLUSTER
Time: 342381.535 ms

# select count(1) from (SELECT * from metarelcould_transactionlog
order by transactionid) as foo;
count
----------
50000000
(1 row)

Time: 27704.794 ms

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 05:51:16
Message-ID:	9066fa251002222151u87ba137lb7785e42d66e59d6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

nnnnnOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
<scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 12:53 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>> On Mon, Feb 22, 2010 at 2:52 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>>> On Mon, Feb 22, 2010 at 11:10 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>>>> I have the exact same table of data in both MySQL and Postgresql. In Postgresql:
>>>
>>> Just wondering, are these on the same exact machine?
>>>
>>
>> Yes, on the same disk.
>
> I'm wondering how much of this could be caching effects. Is the MySQL
> database "warmed up" before you started, and the pgsql database is
> "cold" and no caching has taken place?
>
> What do things like vmstat 10 say while the query is running on each
> db? First time, second time, things like that.

Awesome -- this actually led me to discover the problem.

When running the query in MySQL InnoDB:

$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 13 13733604 83020 5648 2193884 3 3 936 168 2 1
4 2 89 5 0
1 12 13749952 80164 5600 2178032 0 4354 908 4379 3586 2638
0 1 38 60 0
0 19 13762228 80576 5556 2145220 208 3527 1280 3690 3668 2635
1 1 39 59 0
0 19 13778632 79420 5560 2135228 52 4186 1046 4191 3682 2418
0 1 37 62 0
0 19 13792964 77336 5592 2082520 41 3731 1698 3804 4102 2686
1 1 53 45 0
0 14 13810356 84036 5556 2049836 36 4241 797 4246 3913 2603
0 1 68 31 0
1 14 13825640 81336 5520 2001920 0 4212 958 4220 3848 2736
1 1 73 25 0
0 17 13844952 78036 5476 1976956 8 4685 923 4689 3832 2547
0 1 69 29 0
2 13 13863828 79812 5448 1954952 3 4627 692 4634 3744 2505
0 1 70 28 0
0 15 13883828 77764 5440 1920528 249 4544 972 4548 4345 2506
0 1 70 28 0
1 20 13898900 79132 5456 1890192 28 4341 723 4438 4982 3030
0 3 64 33 0
0 11 13915252 85184 5624 1865260 79 3668 752 3764 4472 2765
0 3 57 40 0
0 12 13933964 78448 5700 1832640 120 4327 1066 4434 4484 2777
1 3 52 45 0
0 19 13951748 77640 5816 1795720 94 4005 1159 4091 4580 2762
1 3 48 49 0
0 16 13972748 79884 5780 1753676 0 4737 787 4746 4385 2766
1 3 51 45 0
0 25 13988108 78936 5884 1726068 547 3954 1468 4116 4976 3502
0 4 44 52 0
1 20 14011500 77676 5868 1689136 161 4980 843 5506 5218 3131
0 3 34 62 0
0 22 14032472 81348 5816 1647884 270 4198 943 4369 4521 2826
1 3 40 56 0
0 23 14055220 81712 5804 1626872 193 4774 1408 4856 4617 2754
1 3 38 58 0
0 21 14075848 81844 5696 1576836 0 4738 974 4742 4528 2704
1 3 40 56 0
0 25 14097260 79788 5628 1536336 213 4512 922 4639 4726 2929
1 3 27 69 0
0 24 14123900 80820 5616 1488460 319 5033 1059 5128 4895 2780
2 3 17 78 0
1 26 14142632 77276 5660 1445592 445 4605 1434 4727 5401 3364
1 3 16 79 0
0 31 14165668 83736 5976 1387048 362 4288 1444 4428 4739 2963
2 3 17 78 0
1 28 14180104 77564 6324 1369232 387 4526 2222 4677 5748 3559
1 3 16 80 0

I'm guessing the swap numbers are because MySQL uses mmap?

Anyway, when running the query in Postgresql:

$ vmstat 10
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 2 13866336 1574540 25024 7878980 3 3 936 168 2 1
4 2 89 5 0
1 3 13861520 1163596 25328 8128360 1046 0 25429 433 4368 4175
4 2 80 14 0
0 3 13856484 803024 25600 8343220 1117 0 22062 688 4492 4590
4 2 73 20 0
0 6 13855304 678868 26052 8435540 160 0 9239 598 5195 7141
1 5 70 24 0
0 6 13853644 513568 26332 8563680 401 0 12480 7100 4775 4248
3 3 68 26 0
2 2 13851804 166228 26624 8775304 634 0 21466 1497 4680 4550
6 2 64 28 0
0 5 13861556 81896 26740 8825360 860 3547 6100 3847 5142 3386
6 2 57 35 0
0 6 13867688 91368 26808 8832712 653 3326 1835 3604 4738 2762
5 2 61 32 0
0 5 13870676 88524 26872 8849392 638 3272 2578 3517 4864 2909
4 2 55 39 0
0 5 13872748 79512 27004 8864456 629 1788 2086 2949 4337 2921
1 3 55 41 0
0 7 13876760 83124 27136 8867272 1018 2253 1713 2409 4321 2889
0 3 63 33 0
0 6 13878964 82876 27240 8874540 792 2119 1854 2314 4288 2813
2 2 72 24 0
3 4 13883204 81224 27280 8887068 661 3067 2995 3385 4558 2899
4 2 72 22 0
0 6 13886636 82036 27352 8905628 594 3726 2628 4013 4744 2765
4 2 69 25 0
0 8 13899852 85604 27400 8925800 638 4423 2689 4658 4903 2808
4 2 55 40 0
1 4 13905184 80544 27484 8940040 676 3501 3006 3799 4805 2932
4 2 66 28 0
0 9 13908480 80100 27516 8948476 668 2996 1720 3192 4594 2799
4 2 60 35 0

vmstat showed no swapping-out for a while, and then suddenly it
started spilling a lot. Checking psql's memory stats showed that it
was huge -- apparently, it's trying to store its full result set in
memory. As soon as I added a LIMIT 10000, everything worked
beautifully and finished in 4m (I verified that the planner was still
issuing a Sort).

I'm relieved that Postgresql itself does not, in fact, suck, but
slightly disappointed in the behavior of psql. I suppose it needs to
buffer everything in memory to properly format its tabular output,
among other possible reasons I could imagine.

Now it all almost seems obvious. My only solace is that it wasn't obv.
to the list either (thanks to everyone for their feedback!). Should've
just started with system profiling -- saves the day again.

>
> Also, just curios, what's shared_buffers set to on the pgsql instance?

shared_buffers
----------------
320000kB
(1 row)
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Alex Hunsaker <badalex(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 05:55:33
Message-ID:	9066fa251002222155h6526c335mb4a0dc5739a1bde9@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 9:30 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 11:10, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>> I have the exact same table of data in both MySQL and Postgresql. In ?> Postgresql:
>
> FWIW on a stock (unchanged postgresql.conf) 8.3.9 I get (best of 3
> runs) 79 seconds, 26 using an index and 27 seconds with it clustered.
> Now yes it goes a lot faster because im skipping all the overhead of
> sending the data to the client...

Last sentence also contributed to my realizing the problem (the client
I was using was psql), but there's one oddity....

> # select count(1) from (SELECT * from metarelcould_transactionlog
> order by transactionid) as foo;

Does it strike anyone else that the query optimizer/rewriter should be
able to toss out the sort from such a query altogether?
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Alex Hunsaker <badalex(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 06:03:48
Message-ID:	11997.1266905028@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Yang Zhang <yanghatespam(at)gmail(dot)com> writes:
>> # select count(1) from (SELECT * from metarelcould_transactionlog
>> order by transactionid) as foo;

> Does it strike anyone else that the query optimizer/rewriter should be
> able to toss out the sort from such a query altogether?

It could, if it knew that the aggregate function didn't care about input
row order. We don't have that knowledge about aggregates ATM.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 06:09:24
Message-ID:	12128.1266905364@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Yang Zhang <yanghatespam(at)gmail(dot)com> writes:
> I'm relieved that Postgresql itself does not, in fact, suck, but
> slightly disappointed in the behavior of psql. I suppose it needs to
> buffer everything in memory to properly format its tabular output,
> among other possible reasons I could imagine.

That's half of it, and the other half is not wanting to present a
portion of query output if the query fails partway through. You could
certainly write a client that disregarded these issues (as I suppose
mysql must be doing).

regards, tom lane

From:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 06:48:25
Message-ID:	dcc563d11002222248p3cfcd35ag52fc25f376649c09@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> nnnnnOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
> <scott(dot)marlowe(at)gmail(dot)com> wrote:
>>
>> What do things like vmstat 10 say while the query is running on each
>> db? First time, second time, things like that.
>
> Awesome -- this actually led me to discover the problem.
>
> vmstat showed no swapping-out for a while, and then suddenly it
> started spilling a lot. Checking psql's memory stats showed that it
> was huge -- apparently, it's trying to store its full result set in
> memory. As soon as I added a LIMIT 10000, everything worked
> beautifully and finished in 4m (I verified that the planner was still
> issuing a Sort).
>
> I'm relieved that Postgresql itself does not, in fact, suck, but
> slightly disappointed in the behavior of psql. I suppose it needs to
> buffer everything in memory to properly format its tabular output,
> among other possible reasons I could imagine.

It's best when working with big sets to do so with a cursor and fetch
a few thousand rows at a time. It's how we handle really big sets at
work and it works like a charm in keeping the client from bogging down
with a huge memory footprint.

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 06:49:13
Message-ID:	34d269d41002222249u6b61e93fp7012b9d33c64b715@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Mon, Feb 22, 2010 at 22:51, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> vmstat showed no swapping-out for a while, and then suddenly it
> started spilling a lot. Checking psql's memory stats showed that it
> was huge -- apparently, it's trying to store its full result set in
> memory. As soon as I added a LIMIT 10000, everything worked
> beautifully and finished in 4m (I verified that the planner was still
> issuing a Sort).

Well im half surprised no one has recommend using a cursor. Have you
looked in to that? I bet that would fix most of your problems here.
>
> I'm relieved that Postgresql itself does not, in fact, suck, but
> slightly disappointed in the behavior of psql. I suppose it needs to
> buffer everything in memory to properly format its tabular output,
> among other possible reasons I could imagine.

Well AFAIK it will dump everything you asked for. So if you said
select * from 1G table; It should take at least 1G and potentially
quite a bit more formatting and overhead.

From:	Yang Zhang <yanghatespam(at)gmail(dot)com>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 07:02:33
Message-ID:	9066fa251002222302y31e1992awdddfa147d11ad9cf@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Tue, Feb 23, 2010 at 1:48 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> On Mon, Feb 22, 2010 at 10:51 PM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
>> nnnnnOn Mon, Feb 22, 2010 at 3:45 PM, Scott Marlowe
>> <scott(dot)marlowe(at)gmail(dot)com> wrote:
>>>
>>> What do things like vmstat 10 say while the query is running on each
>>> db? First time, second time, things like that.
>>
>> Awesome -- this actually led me to discover the problem.
>>
>> vmstat showed no swapping-out for a while, and then suddenly it
>> started spilling a lot. Checking psql's memory stats showed that it
>> was huge -- apparently, it's trying to store its full result set in
>> memory. As soon as I added a LIMIT 10000, everything worked
>> beautifully and finished in 4m (I verified that the planner was still
>> issuing a Sort).
>>
>> I'm relieved that Postgresql itself does not, in fact, suck, but
>> slightly disappointed in the behavior of psql. I suppose it needs to
>> buffer everything in memory to properly format its tabular output,
>> among other possible reasons I could imagine.
>
> It's best when working with big sets to do so with a cursor and fetch
> a few thousand rows at a time. It's how we handle really big sets at
> work and it works like a charm in keeping the client from bogging down
> with a huge memory footprint.
>

Thing is, this is how I got here:

- ran complex query that does SELECT INTO.
- that never terminated, so killed it and tried a simpler SELECT (the
subject of this thread) from psql to see how long that would take.

I.e., my original application doesn't receive the entire dataset.
--
Yang Zhang
http://www.mit.edu/~y_z/

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc:	Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 09:02:58
Message-ID:	407d949e1002230102u1a374386h7aa5502839b965f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Tue, Feb 23, 2010 at 6:48 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>> I'm relieved that Postgresql itself does not, in fact, suck, but
>> slightly disappointed in the behavior of psql. I suppose it needs to
>> buffer everything in memory to properly format its tabular output,
>> among other possible reasons I could imagine.
>
> It's best when working with big sets to do so with a cursor and fetch
> a few thousand rows at a time. It's how we handle really big sets at
> work and it works like a charm in keeping the client from bogging down
> with a huge memory footprint.
>

You can do \set FETCH_COUNT to have psql use a cursor automatically.

--
greg

From:	John Gage <jsmgage(at)numericable(dot)fr>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 09:22:21
Message-ID:	5A438B61-B517-41B8-BF81-685884782335@numericable.fr
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

I am under the impression that MySQL does not have anything resembling
Postgres' support for regular expressions. Though some might think
that regular expressions are a sort of poor man's SQL, in any
application which manages large amounts of text they are crucial.
Postgres definitely does not suck.

Is this the authoratative webpage for "Snowball" (which I never
realized was a play on "Snobol")?

http://snowball.tartarus.org/

Thanks,

John

On Feb 23, 2010, at 6:51 AM, Yang Zhang wrote:

> I'm relieved that Postgresql itself does not, in fact, suck,

From:	Yeb Havinga <yebhavinga(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 09:48:14
Message-ID:	4B83A45E.1080701@gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Greg Stark wrote:
>
> You can do \set FETCH_COUNT to have psql use a cursor automatically.
>
It seems like a big win in this case. What would be the downside of
having a fetch_count set default in psql?

regards
Yeb Havinga

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 10:17:01
Message-ID:	407d949e1002230217u43d55ccg9e07f0014532da44@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Tue, Feb 23, 2010 at 9:48 AM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
> Greg Stark wrote:
>>
>> You can do \set FETCH_COUNT to have psql use a cursor automatically.
>>
>
> It seems like a big win in this case. What would be the downside of having a
> fetch_count set default in psql?

They were mentioned previously in this thread:

1) If an error occurs partway through the execution of the query you
might receive part of the result set.

2) psql won't be able to align the columns properly

--
greg

From:	Alex Hunsaker <badalex(at)gmail(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-23 17:01:34
Message-ID:	34d269d41002230901na9838a4uc82617785ac5c9cc@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

On Tue, Feb 23, 2010 at 00:02, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> Thing is, this is how I got here:
>
> - ran complex query that does SELECT INTO.
> - that never terminated, so killed it and tried a simpler SELECT (the
> subject of this thread) from psql to see how long that would take.

You might have better luck if you paste the EXPLAIN ANALYZE of the
SELECT INTO here (Maybe a new thread? Maybe on -performance? use your
judgement...). But I bet if its doing something with transactionid
like your straight select was, an index would help. If you are just
using SELECT INTO to copy all of the data into a new table... COPY
might be faster or CREATE TABLE AS.

From:	Baron Schwartz <baron(at)xaprb(dot)com>
To:	Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject:	Re: Sorting performance vs. MySQL
Date:	2010-02-24 04:08:22
Message-ID:	4cfa0b031002232008q3dff3354paabede9ece442774@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-general

Hi,

On Tue, Feb 23, 2010 at 12:51 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:
> When running the query in MySQL InnoDB:
>
> $ vmstat 10
> procs -----------memory---------- ---swap-- -----io---- --system--
> -----cpu------
> r b swpd free buff cache si so bi bo in cs us sy id wa st
> 0 13 13733604 83020 5648 2193884 3 3 936 168 2 1
> 4 2 89 5 0
> 1 12 13749952 80164 5600 2178032 0 4354 908 4379 3586 2638
> 0 1 38 60 0
> 0 19 13762228 80576 5556 2145220 208 3527 1280 3690 3668 2635
> 1 1 39 59 0
> 0 19 13778632 79420 5560 2135228 52 4186 1046 4191 3682 2418
> 0 1 37 62 0
[snip]
> I'm guessing the swap numbers are because MySQL uses mmap?

InnoDB doesn't use mmap.

Baron