Re: Changeset Extraction v7.6.1

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Erik Rijkers <er(at)xs4all(dot)nl>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Changeset Extraction v7.6.1
Date: 2014-02-19 18:31:06
Message-ID: CA+TgmoZEd4wbNPn-P6BrXXhs7_fPVfUWM_Nryo_XBM2RruKU_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 18, 2014 at 4:33 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-02-17 21:35:23 -0500, Robert Haas wrote:
>> What
>> I don't understand is why we're not taking the test_decoding module,
>> polishing it up a little to produce some nice, easily
>> machine-parseable output, calling it basic_decoding, and shipping
>> that. Then people who want something else can build it, but people
>> who are happy with something basic will already have it.
>
> Because every project is going to need their own plugin
> *anyway*. Londiste, slony sure are going to ignore changes to relations
> they don't need. Querying their own metadata. They will want
> compatibility to the earlier formats as far as possible. Sometime not
> too far away they will want to optionally support binary output because
> it's so much faster.
> There's just not much chance that either of these will be able to agree
> on a format short term.

Ah, so part of what you're expecting the output plugin to do is
filtering. I can certainly see where there might be considerable
variation between solutions in that area - but I think that's separate
from the question of formatting per se. Although I think we should
have an in-core output plugin with filtering capabilities eventually,
I'm happy to define that as out of scope for 9.4. But isn't there a
way that we can ship something that will due for people who want to
just see the database's entire change stream float by?

TBH, as compared to what you've got now, I think this mostly boils
down to a question of quoting and escaping. I'm not really concerned
with whether we ship something that's perfectly efficient, or that has
filtering capabilities, or that has a lot of fancy bells and whistles.
What I *am* concerned about is that if the user updates a text field
that contains characters like " or ' or : or [ or ] or , that somebody
might be using as delimiters in the output format, that a program can
still parse that output format and reliably determine what the actual
change was. I don't care all that much whether we use JSON or CSV or
something custom, but the data that gets spit out should not have
SQL-injection-like vulnerabilities.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marti Raudsepp 2014-02-19 18:39:40 Re: PoC: Partial sort
Previous Message Marti Raudsepp 2014-02-19 18:29:38 Re: Draft release notes up for review