Re: dependency between numbers keywords and parser speed

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: dependency between numbers keywords and parser speed
Date: 2011-03-14 20:11:52
Message-ID: AANLkTik=PsqxLdGFa4A71Pnx7n1WE1UyOpxVnc4RXVHb@mail.gmail.com

Hello,

there was a discussion about the impact of the number of keywords on
parser speed. I did some synthetic tests and didn't see any slowdown
in pgbench when I increased the number of keywords.

I added 30 reserved keywords and 30 unreserved keywords.

On my Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz there wasn't a
significant difference between the patched and unpatched servers.

I tested read-only queries. I am sure there is some dependency, but
showing it probably needs more keywords than I tested.

Regards

Pavel Stehule


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dependency between numbers keywords and parser speed
Date: 2011-03-14 20:34:40
Message-ID: 29940.1300134880@sss.pgh.pa.us

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> there was a discussion about the impact of the number of keywords on
> parser speed. I did some synthetic tests and didn't see any slowdown
> in pgbench when I increased the number of keywords.

I don't see any particular reason to suppose that pgbench would be a
good framework for stressing parsing speed. The queries it issues
are of trivial length.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mark Wong <markwkm(at)gmail(dot)com>
Subject: Re: dependency between numbers keywords and parser speed
Date: 2011-03-14 20:46:47
Message-ID: 4D7E7EB7.20409@agliodbs.com

On 3/14/11 1:34 PM, Tom Lane wrote:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> there was a discussion about the impact of the number of keywords on
>> parser speed. I did some synthetic tests and didn't see any slowdown
>> in pgbench when I increased the number of keywords.
>
> I don't see any particular reason to suppose that pgbench would be a
> good framework for stressing parsing speed. The queries it issues
> are of trivial length.

TPC-H might work well. Mark, is DBT3 still in usable condition?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dependency between numbers keywords and parser speed
Date: 2011-03-15 01:07:43
Message-ID: AANLkTimKepxX08TknwiTfdAu6XU6maTFKJqUwrnmEJjX@mail.gmail.com

On Mon, Mar 14, 2011 at 4:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> there was a discussion about the impact of the number of keywords on
>> parser speed. I did some synthetic tests and didn't see any slowdown
>> in pgbench when I increased the number of keywords.
>
> I don't see any particular reason to suppose that pgbench would be a
> good framework for stressing parsing speed.  The queries it issues
> are of trivial length.

I found that it was actually a fairly measurable component of the
select-only test when running with shared_buffers cranked up to a
reasonable value. But it'd probably be a lot easier to measure on a
benchmark specifically targeted at the parser.
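
As a rough illustration of what a parser-targeted workload could look like, here is a minimal Python sketch that builds one long statement that is cheap to execute but has many tokens to lex and parse. The function name `build_long_query` and the shape of the query are assumptions made for this example; nothing like it was posted in the thread.

```python
def build_long_query(n_terms: int = 200) -> str:
    """Return one SELECT whose text grows linearly with n_terms."""
    # Every select-list item and predicate adds tokens for the lexer and
    # grammar to process, while execution stays trivial (no table access).
    select_list = ", ".join(f"{i} + {i} AS c{i}" for i in range(n_terms))
    predicates = " AND ".join(f"{i} = {i}" for i in range(n_terms))
    return f"SELECT {select_list} WHERE {predicates};"


if __name__ == "__main__":
    # Print a small instance so the generated shape is easy to inspect.
    print(build_long_query(3))
```

The output could be saved to a file and run as a pgbench custom script (the `-f` option), so that patched and unpatched servers are compared on text that is long rather than trivially short.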

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dependency between numbers keywords and parser speed
Date: 2011-03-15 06:19:44
Message-ID: AANLkTinX8d8yf5z=_Mj2TV4qJ_Jxhi0J8fHgthJu6d_R@mail.gmail.com

2011/3/15 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Mar 14, 2011 at 4:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>>> there was a discussion about the impact of the number of keywords on
>>> parser speed. I did some synthetic tests and didn't see any slowdown
>>> in pgbench when I increased the number of keywords.
>>
>> I don't see any particular reason to suppose that pgbench would be a
>> good framework for stressing parsing speed.  The queries it issues
>> are of trivial length.
>
> I found that it was actually a fairly measurable component of the
> select-only test when running with shared_buffers cranked up to a
> reasonable value.  But it'd probably be a lot easier to measure on a
> benchmark specifically targeted at the parser.
>

When I tested it, all data was in memory, there was minimal (near
zero) I/O, and I ran a read-only test.

That doesn't mean the parser is free, but my numbers don't show any
potential problem with 60 new keywords.
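
One way to make "no significant difference" checkable is to compare the mean throughput of repeated runs against the run-to-run spread of the baseline. The following is only a sketch under that assumption; the function name and the noise criterion are illustrative, and no measurements from this thread are included.

```python
from statistics import mean, stdev


def relative_slowdown(unpatched_tps, patched_tps):
    """Return (slowdown, noise) as fractions of the unpatched mean tps.

    Both arguments are lists of tps values from repeated benchmark runs;
    the baseline list needs at least two entries for stdev().
    """
    base = mean(unpatched_tps)
    slowdown = (base - mean(patched_tps)) / base
    # Run-to-run spread of the baseline. A slowdown smaller than this
    # cannot be told apart from noise by this crude check.
    noise = stdev(unpatched_tps) / base
    return slowdown, noise
```

A slowdown well above the noise figure would suggest a real cost; anything below it says only that this particular benchmark cannot see one.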

Pavel


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dependency between numbers keywords and parser speed
Date: 2011-03-15 14:58:24
Message-ID: AANLkTi=01TLr1+Y9pEwOO5Lw6ndVBdxBoyab796CPSyR@mail.gmail.com

On Tue, Mar 15, 2011 at 2:19 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2011/3/15 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Mon, Mar 14, 2011 at 4:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>>>> there was a discussion about the impact of the number of keywords on
>>>> parser speed. I did some synthetic tests and didn't see any slowdown
>>>> in pgbench when I increased the number of keywords.
>>>
>>> I don't see any particular reason to suppose that pgbench would be a
>>> good framework for stressing parsing speed.  The queries it issues
>>> are of trivial length.
>>
>> I found that it was actually a fairly measurable component of the
>> select-only test when running with shared_buffers cranked up to a
>> reasonable value.  But it'd probably be a lot easier to measure on a
>> benchmark specifically targeted at the parser.
>>
>
> When I tested it, all data was in memory, there was minimal (near
> zero) I/O, and I ran a read-only test.
>
> That doesn't mean the parser is free, but my numbers don't show any
> potential problem with 60 new keywords.

That's an interesting result, although it would be more interesting if
you posted the patch and benchmark methodology. It's important for us
not to overestimate the cost of adding keywords, and I don't object to
adding them where it adds meaningful clarity that is not otherwise
available or where it is necessary to comply with the SQL spec. But I
do think it is worth being disciplined about. We should think about
wording commands in a way that won't require new keywords; if there's
not a reasonable way to do it, then we add a keyword. Our preference
should be not to add keywords where that's reasonably possible.

It is particularly important for us to avoid keywords that are
partially or fully reserved. In that case, the issue is not parser
overhead but the fact that it breaks compatibility with previous
releases. pg_dump files can't be loaded, PL/pgsql procedures break,
and so on. I have been here and it isn't fun.
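
The compatibility problem can be sketched with a toy model of a dump tool that quotes an identifier only when its own release requires it. The keyword sets below are hypothetical (`frobnicate` stands in for a word that becomes reserved in a newer release), and the quoting rule is a simplification, not the logic pg_dump actually uses.

```python
import re

# Hypothetical keyword sets for illustration only: "frobnicate" stands in
# for a word that becomes reserved in a newer release.
OLD_RESERVED = {"select", "table", "user"}
NEW_RESERVED = OLD_RESERVED | {"frobnicate"}

PLAIN_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def dump_identifier(name, reserved):
    """Quote an identifier only when it is reserved or not a plain lowercase name."""
    if name in reserved or not PLAIN_IDENT.fullmatch(name):
        return '"' + name.replace('"', '""') + '"'
    return name


def breaks_on_upgrade(name):
    """True if the old release dumps the name unquoted but the new one reserves it."""
    dumped_by_old = dump_identifier(name, OLD_RESERVED)
    return dumped_by_old == name and name in NEW_RESERVED
```

In this model a column named `frobnicate` is written unquoted by the old release, and that same dump text then fails to parse on the new one. This is a different cost from parser speed.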

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dependency between numbers keywords and parser speed
Date: 2011-03-15 16:09:47
Message-ID: AANLkTinV1RgTk1ofiUtY3jj0BkJ2dt3gJMh1Tn7jGAep@mail.gmail.com

2011/3/15 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Mar 15, 2011 at 2:19 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2011/3/15 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> On Mon, Mar 14, 2011 at 4:34 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>>>>> there was a discussion about the impact of the number of keywords on
>>>>> parser speed. I did some synthetic tests and didn't see any slowdown
>>>>> in pgbench when I increased the number of keywords.
>>>>
>>>> I don't see any particular reason to suppose that pgbench would be a
>>>> good framework for stressing parsing speed.  The queries it issues
>>>> are of trivial length.
>>>
>>> I found that it was actually a fairly measurable component of the
>>> select-only test when running with shared_buffers cranked up to a
>>> reasonable value.  But it'd probably be a lot easier to measure on a
>>> benchmark specifically targeted at the parser.
>>>
>>
>> When I tested it, all data was in memory, there was minimal (near
>> zero) I/O, and I ran a read-only test.
>>
>> That doesn't mean the parser is free, but my numbers don't show any
>> potential problem with 60 new keywords.
>
> That's an interesting result, although it would be more interesting if
> you posted the patch and benchmark methodology.  It's important for us
> not to overestimate the cost of adding keywords, and I don't object to
> adding them where it adds meaningful clarity that is not otherwise
> available or where it is necessary to comply with the SQL spec.  But I
> do think it is worth being disciplined about.  We should think about
> wording commands in a way that won't require new keywords; if there's
> not a reasonable way to do it, then we add a keyword.  Our preference
> should be not to add keywords where that's reasonably possible.
>
> It is particularly important for us to avoid keywords that are
> partially or fully reserved.  In that case, the issue is not parser
> overhead but the fact that it breaks compatibility with previous
> releases.  pg_dump files can't be loaded, PL/pgsql procedures break,
> and so on.  I have been here and it isn't fun.
>

I agree, and I understand the problems with keywords well. I just
wanted to know the real limits of bison, and I can say that 60 keywords
are not a problem.

A real test of parser speed should be done on short, quick queries.
It would be unexpected for the parser to be a bottleneck on long OLAP
queries.

The patch is attached.

Pavel

P.S. I am sure this test depends on the platform.

Attachment: keywords-test.diff (text/x-patch, 10.1 KB)

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dependency between numbers keywords and parser speed
Date: 2011-03-15 23:04:28
Message-ID: 20110315230427.GA28601@svana.org

On Tue, Mar 15, 2011 at 05:09:47PM +0100, Pavel Stehule wrote:
> A real test of parser speed should be done on short, quick queries.
> It would be unexpected for the parser to be a bottleneck on long OLAP
> queries.

Surely parsing overhead could be measured by simply PREPAREing every
query, rather than executing them.
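
A minimal sketch of that idea, assuming the benchmark queries are available as a list of strings: wrap each one in PREPARE and DEALLOCATE so the server processes the text without running it. The helper name and the `parse_test_` statement names are made up for this example. PREPARE also performs parse analysis, so the result covers somewhat more than the grammar alone.

```python
def as_prepare_script(queries):
    """Wrap each query in PREPARE/DEALLOCATE so it is parsed but never executed."""
    lines = []
    for i, query in enumerate(queries):
        # Drop surrounding whitespace and a trailing semicolon before wrapping.
        body = query.strip().rstrip(";")
        lines.append(f"PREPARE parse_test_{i} AS {body};")
        # Deallocate at once so repeated runs in one session do not collide.
        lines.append(f"DEALLOCATE parse_test_{i};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(as_prepare_script(["SELECT 1;", "SELECT 2"]))
```

The resulting script can be timed against patched and unpatched servers in the same way as any other workload.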

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle