Re: pg_stat_statements fingerprinting logic and ArrayExpr

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stat_statements fingerprinting logic and ArrayExpr
Date: 2013-12-10 21:09:02
Message-ID: CA+TgmoaC-FS45m_X+Gn4QvR4waRuex5x0C8+iFzyQOYP_kVrFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 4:30 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> pg_stat_statements' fingerprinting logic considers the following two
> statements as distinct:
>
> select 1 in (1, 2, 3);
> select 1 in (1, 2, 3, 4);
>
> This is because the ArrayExpr jumble case jumbles any ArrayExpr's list
> of elements recursively. In this case it's a list of Const nodes, and
> the fingerprinting logic jumbles those nodes indifferently.
>
> Somebody told me that they think that pg_stat_statements should not do
> that. This person felt that it would be preferable for such
> expressions to be normalized without regard to the number of distinct
> Const elements. I suppose that that would work by determing if the
> ArrayExpr elements list was a list of Const nodes and only const
> nodes. Iff that turned out to be the case, something else would be
> jumbled (something other than the list) that would essentially be a
> representation of "some list of zero or more (or maybe one or more)
> Const nodes with consttype of, in this example, 23". I think that this
> would make at least one person happy, because of course the two
> statements above would have their costs aggregated within a single
> pg_stat_statements entry.
>
> I'm not sure that I agree, but there is anecdata that suggests that it
> isn't uncommon for these sorts of queries to be broken out when
> they're all traceable back to a single point in the application
> (apparently it's common for Django apps to do so, perhaps
> questionably). If we assume that doing what I've described has no real
> downside, then it would probably be worth implementing. Plus I'm
> pretty sure that tools that do regex normalization are already doing
> something analogous. Thoughts?

Sounds like this:

// What's the worst thing that could happen?
pandoras_box.open();

I am very wary of implementing special-case logic here even though I
know it could be useful to some people, simply because I fear that
there could be a near-infinite variety of situations where, in a
particular environment, a particular distinction isn't important. We
won't be serving anyone well if we ignore all of those distinctions,
because not everyone will want to ignore all of them. And adding a
configuration option for each one doesn't sound like a good idea,
either. *Maybe* this particular case is narrow enough that it's OK to
just ignore it unconditionally and maybe that'd be OK ... but I fear
this is a rathole.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-12-10 21:12:39 Re: tracking commit timestamps
Previous Message Alvaro Herrera 2013-12-10 21:04:12 Re: tracking commit timestamps