Re: Parallel Select query performance and shared buffers

From: Metin Doslu <metin(at)citusdata(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Parallel Select query performance and shared buffers
Date: 2013-12-03 13:49:07
Message-ID: CAL1dPcec7wMC9Z3VHrfO-2sEUqTH0Aawoi-JVJ3H45sQFFi3yA@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

We have several independent tables on a multi-core machine serving Select
queries. These tables fit into memory, and each Select query scans one
table's pages sequentially. In this experiment, there are no indexes or
table joins.

When we send concurrent Select queries to these tables, query performance
doesn't scale out with the number of CPU cores. We find that complex Select
queries scale out better than simpler ones. We also find that increasing
the block size from 8 KB to 32 KB, or increasing shared_buffers to include
the working set mitigates the problem to some extent.

For our experiments, we chose an 8-core machine with 68 GB of memory from
Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and
set shared_buffers to 4 GB.

We then generated 1, 2, 4, and 8 separate tables using the data generator
from the industry-standard TPC-H benchmark. Each table we generated, called
lineitem-1, lineitem-2, etc., had about 750 MB of data. Next, we sent 1, 2,
4, and 8 concurrent Select queries to these tables, one query per table, to
observe the scale-out behavior. Since this machine had 8 cores, we expected
run times to stay constant throughout. We also expected the machine's CPU
utilization to reach 100% at 8 concurrent queries. Neither of those
assumptions held true.
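
The driver used to issue the concurrent queries is not shown in this post. A minimal sketch of such a harness, assuming one worker thread per table and a caller-supplied `run_query` function that opens its own database connection (both `run_query` and `fake_run_query` are hypothetical names, not part of the original setup), might look like this:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(run_query, sql):
    """Run one query through the supplied callable and return elapsed seconds."""
    start = time.perf_counter()
    run_query(sql)
    return time.perf_counter() - start


def run_concurrently(run_query, table_names):
    """Issue one count(*) per table at the same time, one worker per table."""
    # Table names such as lineitem-1 contain a hyphen, so they are quoted.
    queries = ['SELECT count(*) FROM "{}"'.format(name) for name in table_names]
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        futures = [pool.submit(timed, run_query, sql) for sql in queries]
        return [f.result() for f in futures]


def fake_run_query(sql):
    # Placeholder so the sketch runs without a database. Replace it with a
    # function that opens its own connection and executes `sql`.
    time.sleep(0.01)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        tables = ["lineitem-{}".format(i) for i in range(1, n + 1)]
        elapsed = run_concurrently(fake_run_query, tables)
        print(n, "queries, slowest:", round(max(elapsed), 3), "s")
```

Each worker mostly waits on the server, so client-side threads are enough here; the point of interest is how the slowest per-query time changes as the number of tables grows.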

We found that query run times degraded as we increased the number of
concurrent Select queries. Also, CPU utilization flattened out at less than
50% for the simpler queries. Full results with a block size of 8 KB are below:

Tables / queries       select count(*)   TPC-H Simple (#6)[2]   TPC-H Complex (#1)[1]
1 Table  / 1 query     1.5 s             2.5 s                  8.4 s
2 Tables / 2 queries   1.5 s             2.5 s                  8.4 s
4 Tables / 4 queries   2.0 s             2.9 s                  8.8 s
8 Tables / 8 queries   3.3 s             4.0 s                  9.6 s

We then increased the block size (BLCKSZ) from 8 KB to 32 KB and recompiled
PostgreSQL. This change had a positive impact on query completion times.
Here are the new results with a block size of 32 KB:

Tables / queries       select count(*)   TPC-H Simple (#6)[2]   TPC-H Complex (#1)[1]
1 Table  / 1 query     1.5 s             2.3 s                  8.0 s
2 Tables / 2 queries   1.5 s             2.3 s                  8.0 s
4 Tables / 4 queries   1.6 s             2.4 s                  8.1 s
8 Tables / 8 queries   1.8 s             2.7 s                  8.3 s
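
To put the two sets of results side by side, the slowdown at N concurrent queries can be expressed as the run time at N divided by the run time at 1 (1.0 would be perfect scale-out). The sketch below only restates the numbers reported above; the dictionary layout and the `slowdown` helper are illustrative names.

```python
# Run times in seconds, copied from the two result tables above.
# Inner keys are the number of concurrent queries (one table per query).
RESULTS = {
    "8 KB": {
        "count(*)": {1: 1.5, 2: 1.5, 4: 2.0, 8: 3.3},
        "TPC-H #6": {1: 2.5, 2: 2.5, 4: 2.9, 8: 4.0},
        "TPC-H #1": {1: 8.4, 2: 8.4, 4: 8.8, 8: 9.6},
    },
    "32 KB": {
        "count(*)": {1: 1.5, 2: 1.5, 4: 1.6, 8: 1.8},
        "TPC-H #6": {1: 2.3, 2: 2.3, 4: 2.4, 8: 2.7},
        "TPC-H #1": {1: 8.0, 2: 8.0, 4: 8.1, 8: 8.3},
    },
}


def slowdown(times, n):
    """Run time with n concurrent queries relative to a single query."""
    return times[n] / times[1]


if __name__ == "__main__":
    for block_size, queries in RESULTS.items():
        for name, times in queries.items():
            print("{:>5}  {:<9} slowdown at 8 queries: {:.2f}x".format(
                block_size, name, slowdown(times, 8)))
```

With 8 KB blocks the slowdown at 8 queries comes out to about 2.20x, 1.60x, and 1.14x for count(*), #6, and #1; with 32 KB blocks it is about 1.20x, 1.17x, and 1.04x. This matches the observation that the simpler queries scale worst and that the larger block size narrows the gap.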

As a quick aside, we also repeated the same experiment on an EC2 instance
with 16 CPU cores, and found that the scale-out behavior became worse
there. (We also tried increasing shared_buffers to 30 GB. This change
completely solved the scale-out problem on this instance type, but hurt
our performance on the hi1.4xlarge instances.)
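
For reference, the working-set arithmetic behind the shared_buffers observation can be sketched as follows. It assumes each table is exactly 750 MB (the post says "about") and ignores anything else that occupies shared buffers, so it is only a rough check; the helper names are illustrative.

```python
TABLE_MB = 750  # approximate size of each lineitem table, from the post


def working_set_mb(num_tables):
    """Total size of the tables being scanned, in MB."""
    return num_tables * TABLE_MB


def fits(num_tables, shared_buffers_gb):
    """True if the scanned tables fit within shared_buffers (1 GB = 1024 MB)."""
    return working_set_mb(num_tables) <= shared_buffers_gb * 1024


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print("{} tables: {:>4} MB  fits in 4 GB: {}  fits in 30 GB: {}".format(
            n, working_set_mb(n), fits(n, 4), fits(n, 30)))
```

Under these assumptions, up to 4 tables (3000 MB) fit in the 4 GB setting, while 8 tables (6000 MB) do not; all of the table counts in the post fit in 30 GB.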

Unfortunately, increasing the block size from 8 to 32 KB has other
implications for some of our customers. Could you help us out with the
problem here?

What can we do to identify the problem's root cause? Can we work around it?

Thank you,
Metin

[1] http://examples.citusdata.com/tpch_queries.html#query-1
[2] http://examples.citusdata.com/tpch_queries.html#query-6


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Metin Doslu <metin(at)citusdata(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Parallel Select query performance and shared buffers
Date: 2013-12-03 13:53:23
Message-ID: 20131203135323.GA5158@eldon.alvh.no-ip.org
Lists: pgsql-hackers pgsql-performance

Metin Doslu wrote:

> When we send concurrent Select queries to these tables, query performance
> doesn't scale out with the number of CPU cores. We find that complex Select
> queries scale out better than simpler ones. We also find that increasing
> the block size from 8 KB to 32 KB, or increasing shared_buffers to include
> the working set mitigates the problem to some extent.

Maybe you could help test this patch:
http://www.postgresql.org/message-id/20131115194725.GG5489@awork2.anarazel.de

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Metin Doslu <metin(at)citusdata(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Parallel Select query performance and shared buffers
Date: 2013-12-03 15:56:11
Message-ID: CAGTBQpZtU0yo=eV7Xxw6gCvMoMkMZ-qFi+2BxgSisdzrn44c3Q@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Tue, Dec 3, 2013 at 10:49 AM, Metin Doslu <metin(at)citusdata(dot)com> wrote:
> We have several independent tables on a multi-core machine serving Select
> queries. These tables fit into memory; and each Select queries goes over one
> table's pages sequentially. In this experiment, there are no indexes or
> table joins.
>
> When we send concurrent Select queries to these tables, query performance
> doesn't scale out with the number of CPU cores. We find that complex Select
> queries scale out better than simpler ones. We also find that increasing the
> block size from 8 KB to 32 KB, or increasing shared_buffers to include the
> working set mitigates the problem to some extent.
>
> For our experiments, we chose an 8-core machine with 68 GB of memory from
> Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set
> shared_buffers to 4 GB.

If you are certain your tables fit in RAM, you may want to disable
synchronized sequential scans, as they will create contention between
the threads.
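
A minimal sketch of what that could look like per session, assuming the setting meant here is PostgreSQL's `synchronize_seqscans` parameter and that the statements are sent over an ordinary client connection (the `session_statements` helper is a hypothetical name for illustration):

```python
def session_statements(table_name, sync_scans=False):
    """Build the statements for one session: set the scan mode, then query."""
    setting = "on" if sync_scans else "off"
    return [
        # Applies to the current session only.
        "SET synchronize_seqscans = {};".format(setting),
        # Table names such as lineitem-1 contain a hyphen, so they are quoted.
        'SELECT count(*) FROM "{}";'.format(table_name),
    ]


if __name__ == "__main__":
    for statement in session_statements("lineitem-1"):
        print(statement)
```

Synchronized scans only coordinate backends scanning the same table, so this is worth testing only when several queries hit one table at once.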


From: Metin Doslu <metin(at)citusdata(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Parallel Select query performance and shared buffers
Date: 2013-12-03 16:24:55
Message-ID: CAL1dPccjD+J8SH40WajMZFrUVoxSsLUQV1zQN_3tGqOUVgs-_A@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

Looking into syncscan.c, it says in comments:

"When multiple backends run a sequential scan on the same table, we try to
keep them synchronized to reduce the overall I/O needed."

But in my workload, every process was running on a different table.

On Tue, Dec 3, 2013 at 5:56 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:

> On Tue, Dec 3, 2013 at 10:49 AM, Metin Doslu <metin(at)citusdata(dot)com> wrote:
> > We have several independent tables on a multi-core machine serving Select
> > queries. These tables fit into memory; and each Select queries goes over one
> > table's pages sequentially. In this experiment, there are no indexes or
> > table joins.
> >
> > When we send concurrent Select queries to these tables, query performance
> > doesn't scale out with the number of CPU cores. We find that complex Select
> > queries scale out better than simpler ones. We also find that increasing the
> > block size from 8 KB to 32 KB, or increasing shared_buffers to include the
> > working set mitigates the problem to some extent.
> >
> > For our experiments, we chose an 8-core machine with 68 GB of memory from
> > Amazon's EC2 service. We installed PostgreSQL 9.3.1 on the instance, and set
> > shared_buffers to 4 GB.
>
> If you are certain your tables fit in RAM, you may want to disable
> synchronized sequential scans, as they will create contention between
> the threads.


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Metin Doslu <metin(at)citusdata(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Parallel Select query performance and shared buffers
Date: 2013-12-03 16:32:47
Message-ID: CAGTBQpYE3oRUfrt+++E-AFenfnnC8tRZie-Gf7fZHkV48Cwxyg@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Tue, Dec 3, 2013 at 1:24 PM, Metin Doslu <metin(at)citusdata(dot)com> wrote:
> Looking into syncscan.c, it says in comments:
>
> "When multiple backends run a sequential scan on the same table, we try to
> keep them synchronized to reduce the overall I/O needed."
>
> But in my workload, every process was running on a different table.

Ah, ok, so that's what you meant by "independent tables".


From: Metin Doslu <metin(at)citusdata(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Parallel Select query performance and shared buffers
Date: 2013-12-04 13:19:29
Message-ID: CAL1dPcerFdFksv+FYsx4QWaJFdzU_LSbHSppjjNxuX4zb=v-Jw@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

> Maybe you could help test this patch:
> http://www.postgresql.org/message-id/20131115194725.GG5489@awork2.anarazel.de

Which repository should I apply these patches to? I tried the main
repository, the 9.3 stable branch, and the 9.3.1 source code, and in my
trials at least one of the patches failed. What patch command should I use?