Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ compatibility

Lists: pgsql-hackers
From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-16 08:56:43
Message-ID: 507D214B.601@2ndQuadrant.com

Hello PostgreSQL and replication hackers,

This mail is an additional RFC which proposes a simple way to extend the
new logical replication feature so it can cover most uses of
skytools/pgq/londiste.

While the current work on BDR/LCR (bi-directional replication/logical
replication) using WAL is theoretically enough to cover the _replication_
offered by Londiste, it falls short in one important way: there is
currently no support for pure queueing, that is, for "streams" of data
which do not need to be stored in the source database.

Fortunately there is a simple solution - do not store it in the source
database :)

The only thing needed for adding this is to have a table type which

a) generates an INSERT record in WAL

and

b) does not actually store the data in a local file

If implemented in userspace it would be a VIEW (or table) with a
before/instead trigger which logs the inserted data and then cancels
the insert.
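
A minimal sketch of that userspace idea, written as a toy Python model rather than as PostgreSQL code. It assumes an in-memory list stands in for the WAL stream; the class name `LogOnlyTable` and its fields are hypothetical and only illustrate the "log the row, then cancel the insert" behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class LogOnlyTable:
    """Toy model of a log-only table: inserts reach the log, never local storage."""
    name: str
    wal: list = field(default_factory=list)   # stand-in for the WAL stream (assumption)
    heap: list = field(default_factory=list)  # local row storage; stays empty by design

    def insert(self, row: dict) -> None:
        # Step 1 of the "trigger": log the inserted data.
        self.wal.append(("INSERT", self.name, dict(row)))
        # Step 2 of the "trigger": cancel the insert, so nothing is stored locally.
        return None


events = LogOnlyTable("events")
events.insert({"id": 1, "payload": "hello"})
events.insert({"id": 2, "payload": "world"})
print(len(events.wal), len(events.heap))
```

Both rows end up in the log in insertion order, while the local storage never grows.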

I'm sure this could be implemented, but I leave the technical
discussion to those who are currently deep in WAL
generation/reconstruction.

If we implement logged-only tables / queues, we would not only enable a
more performant pgQ replacement for implementing full Londiste / skytools
functionality, but PostgreSQL would also become a very strong candidate
for use as a persistent basis for message queueing solutions like
ActiveMQ, StorMQ, any Advanced Message Queuing Protocol (AMQP)
implementation, and so on.

Comments?

Hannu Krosing


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)2ndquadrant(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-16 09:18:44
Message-ID: CA+U5nMKKCRuqHMhX4_aQ+8foc6Gy+jfzK9sbKxY4reE3y=byZA@mail.gmail.com

On 16 October 2012 09:56, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> Hallo postgresql and replication hackers
>
> This mail is an additional RFC which proposes a simple way to extend the
> new logical replication feature so it can cover most usages of
> skytools/pgq/londiste
>
> While the current work for BDR/LCR (bi-directional replication/logical
> replication) using WAL is theoretically enought to cover _replication_ offered by
> Londiste it falls short in one important way - there is currently no support for pure
> queueing, that is for "streams" of data which does not need to be stored in the source
> database.
>
> Fortunately there is a simple solution - do not store it in the source
> database :)
>
> The only thing needed for adding this is to have a table type which
>
> a) generates a INSERT record in WAL
>
> and
>
> b) does not actually store the data in a local file
>
> If implemented in userspace it would be a VIEW (or table) with a
> before/instead
> trigger which logs the inserted data and then cancels the insert.
>
> I'm sure this thing could be implemented, but I leave the tech discussion to
> those who are currently deep in WAL generation/reconstruction .
>
> If we implement logged only tables / queues we would not only enable a more
> performant pgQ replacement for implementing full Londiste / skytools
> functionality
> but would also become a very strong player to be used as persistent basis
> for message queueing solutions like ActiveMQ, StorMQ, any Advanced Message
> Queuing Protocol (AMQP) and so on.

Hmm, I was assuming that we'd be able to do that by just writing extra
WAL directly. But now you've made me think about it, that would be
very ugly.

Doing it this way, as you suggest, would allow us to write WAL records
for queuing/replication to specific queue ids. It also allows us to
have privileges assigned. So this looks like a good idea and might
even be possible for 9.3.

I've got a feeling we may want the word QUEUE again in the future, so
I think we should call this a MESSAGE QUEUE.

CREATE MESSAGE QUEUE foo;
DROP MESSAGE QUEUE foo;

GRANT INSERT ON MESSAGE QUEUE foo TO ...;
REVOKE INSERT ON MESSAGE QUEUE foo FROM ...;

Rules wouldn't work. DELETE and UPDATE wouldn't work, nor would SELECT.

Things for next release: Triggers, SELECT sees a stream of changes,
CHECK clauses to constrain what can be written.

One question: would we require the INSERT statement to parse against a
tupledesc, or would it be just a single blob of TEXT or can we send
any payload? I'd suggest just a single blob of TEXT, since that can be
XML or JSON etc easily enough.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)2ndquadrant(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-16 09:29:17
Message-ID: 507D28ED.6060205@2ndQuadrant.com

On 10/16/2012 11:18 AM, Simon Riggs wrote:
> On 16 October 2012 09:56, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> Hallo postgresql and replication hackers
>>
>> This mail is an additional RFC which proposes a simple way to extend the
>> new logical replication feature so it can cover most usages of
>> skytools/pgq/londiste
>>
>> While the current work for BDR/LCR (bi-directional replication/logical
>> replication) using WAL is theoretically enought to cover _replication_ offered by
>> Londiste it falls short in one important way - there is currently no support for pure
>> queueing, that is for "streams" of data which does not need to be stored in the source
>> database.
>>
>> Fortunately there is a simple solution - do not store it in the source
>> database :)
>>
>> The only thing needed for adding this is to have a table type which
>>
>> a) generates a INSERT record in WAL
>>
>> and
>>
>> b) does not actually store the data in a local file
>>
>> If implemented in userspace it would be a VIEW (or table) with a
>> before/instead
>> trigger which logs the inserted data and then cancels the insert.
>>
>> I'm sure this thing could be implemented, but I leave the tech discussion to
>> those who are currently deep in WAL generation/reconstruction .
>>
>> If we implement logged only tables / queues we would not only enable a more
>> performant pgQ replacement for implementing full Londiste / skytools
>> functionality
>> but would also become a very strong player to be used as persistent basis
>> for message queueing solutions like ActiveMQ, StorMQ, any Advanced Message
>> Queuing Protocol (AMQP) and so on.
>
> Hmm, I was assuming that we'd be able to do that by just writing extra
> WAL directly. But now you've made me think about it, that would be
> very ugly.
>
> Doing it this was, as you suggest, would allow us to write WAL records
> for queuing/replication to specific queue ids. It also allows us to
> have privileges assigned. So this looks like a good idea and might
> even be possible for 9.3.
>
> I've got a feeling we may want the word QUEUE again in the future, so
> I think we should call this a MESSAGE QUEUE.
>
> CREATE MESSAGE QUEUE foo;
> DROP MESSAGE QUEUE foo;
I would like this to be very similar to a table, so it would be

CREATE MESSAGE QUEUE(fieldname type, ...) foo;

perhaps even allowing defaults and constraints. Again, this
depends on how complex the implementation would be.

For the receiving side it would look like a table with only inserts,
and in this case there could even be a possibility to use it as
a remote log table.

>
> GRANT INSERT ON MESSAGE QUEUE foo TO ...;
> REVOKE INSERT ON MESSAGE QUEUE foo TO ...;
>
> Rules wouldn't. DELETE and UPDATE wouldn't work, nor would SELECT.
>
> Things for next release: Triggers, SELECT sees a stream of changes,
> CHECK clauses to constrain what can be written.
>
> One question: would we require the INSERT statement to parse against a
> tupledesc, or would it be just a single blob of TEXT or can we send
> any payload? I'd suggest just a single blob of TEXT, since that can be
> XML or JSON etc easily enough.
>


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)2ndquadrant(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-16 09:43:03
Message-ID: CA+U5nMKXxQw2asK9T=OQfJiG=+M4Jp8WVfODcZE7suKwaWnncw@mail.gmail.com

On 16 October 2012 10:29, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:

> I would like this to be very similar to a table, so it would be
>
> CREATE MESSAGE QUEUE(fieldname type, ...) foo;
>
> perhaps even allowing defaults and constraints. again, this
> depends on how complecxt the implementation would be.

Presumably just CHECK constraints, not UNIQUE or FKs.
Indexes would not be allowed.

> for the receiving side it would look like a table with only inserts,
> and in this case there could even be a possibility to use it as
> a remote log table.

The queue data would be available via the API, so it can look like anything.

It would be good to identify this with a new rmgr id.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Andres Freund <andres(at)2ndquadrant(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-16 09:47:31
Message-ID: 507D2D33.8080505@2ndQuadrant.com

On 10/16/2012 11:29 AM, Hannu Krosing wrote:
> On 10/16/2012 11:18 AM, Simon Riggs wrote:
>> On 16 October 2012 09:56, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>>> Hallo postgresql and replication hackers
>>>
>>> This mail is an additional RFC which proposes a simple way to extend
>>> the
>>> new logical replication feature so it can cover most usages of
>>> skytools/pgq/londiste
>>>
>>> While the current work for BDR/LCR (bi-directional replication/logical
>>> replication) using WAL is theoretically enought to cover
>>> _replication_ offered by
>>> Londiste it falls short in one important way - there is currently no
>>> support for pure
>>> queueing, that is for "streams" of data which does not need to be
>>> stored in the source
>>> database.
>>>
>>> Fortunately there is a simple solution - do not store it in the source
>>> database :)
>>>
>>> The only thing needed for adding this is to have a table type which
>>>
>>> a) generates a INSERT record in WAL
>>>
>>> and
>>>
>>> b) does not actually store the data in a local file
>>>
>>> If implemented in userspace it would be a VIEW (or table) with a
>>> before/instead
>>> trigger which logs the inserted data and then cancels the insert.
>>>
>>> I'm sure this thing could be implemented, but I leave the tech
>>> discussion to
>>> those who are currently deep in WAL generation/reconstruction .
>>>
>>> If we implement logged only tables / queues we would not only enable
>>> a more
>>> performant pgQ replacement for implementing full Londiste / skytools
>>> functionality
>>> but would also become a very strong player to be used as persistent
>>> basis
>>> for message queueing solutions like ActiveMQ, StorMQ, any Advanced
>>> Message
>>> Queuing Protocol (AMQP) and so on.
>>
>> Hmm, I was assuming that we'd be able to do that by just writing extra
>> WAL directly. But now you've made me think about it, that would be
>> very ugly.
>>
>> Doing it this was, as you suggest, would allow us to write WAL records
>> for queuing/replication to specific queue ids. It also allows us to
>> have privileges assigned. So this looks like a good idea and might
>> even be possible for 9.3.
>>
>> I've got a feeling we may want the word QUEUE again in the future, so
>> I think we should call this a MESSAGE QUEUE.
>>
>> CREATE MESSAGE QUEUE foo;
>> DROP MESSAGE QUEUE foo;
> I would like this to be very similar to a table, so it would be
>
> CREATE MESSAGE QUEUE(fieldname type, ...) foo;
>
> perhaps even allowing defaults and constraints. again, this
> depends on how complecxt the implementation would be.
>
> for the receiving side it would look like a table with only inserts,
> and in this case there could even be a possibility to use it as
> a remote log table.

To clarify: this is intended to be a mirror image of an UNLOGGED table.

That is, as much as possible a full table, except that no data gets
written, which means that

a) indexes do not make any sense
b) exclusion and unique constraints don't make any sense
c) select, update and delete always see an empty table

All of these should probably throw an error, analogous to how VIEWs
currently work.

It could also be described as a write-only table, except that it is
possible to materialise it as a real table on the receiving side.
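
The write-only behaviour can be pictured with a small Python model. This is only a sketch, assuming the "throw an error" option is chosen over "always see an empty table"; `WriteOnlyTable` and `WriteOnlyError` are hypothetical names, and the list called `log` merely stands in for WAL records.

```python
class WriteOnlyError(Exception):
    """Raised when a log-only table is asked to read or modify stored rows."""


class WriteOnlyTable:
    def __init__(self, name):
        self.name = name
        self.log = []  # stand-in for WAL records; there is no row storage at all

    def insert(self, row):
        # The only supported operation: record the row in the log.
        self.log.append(dict(row))

    def _reject(self, operation):
        raise WriteOnlyError(f"cannot {operation} log-only table {self.name!r}")

    def select(self):
        self._reject("SELECT from")

    def update(self, changes):
        self._reject("UPDATE")

    def delete(self):
        self._reject("DELETE from")


audit = WriteOnlyTable("audit")
audit.insert({"user": "hannu", "action": "login"})

# Every operation other than INSERT is refused.
errors = []
for operation in (audit.select, lambda: audit.update({"action": "x"}), audit.delete):
    try:
        operation()
    except WriteOnlyError as exc:
        errors.append(str(exc))
```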

>
>>
>> GRANT INSERT ON MESSAGE QUEUE foo TO ...;
>> REVOKE INSERT ON MESSAGE QUEUE foo TO ...;
>>
>> Rules wouldn't. DELETE and UPDATE wouldn't work, nor would SELECT.
>>
>> Things for next release: Triggers, SELECT sees a stream of changes,
>> CHECK clauses to constrain what can be written.
>>
>> One question: would we require the INSERT statement to parse against a
>> tupledesc, or would it be just a single blob of TEXT or can we send
>> any payload? I'd suggest just a single blob of TEXT, since that can be
>> XML or JSON etc easily enough.
>>
>
>
>


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-16 22:03:53
Message-ID: 507DD9C9.6020903@agliodbs.com

Hannu,

Can you explain in more detail how this would be used on the receiving
side? I'm unable to picture it from your description.

I'm also a bit reluctant to call this a "message queue", since it lacks
the features required for it to be used as an application-level queue.
"REPLICATION MESSAGE", maybe?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Marko Kreen <markokr(at)gmail(dot)com>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-17 06:28:01
Message-ID: CA+U5nMJZTXGm_DL0GgkSXzdvFkcUNpxzKCTBqD1KEJLZFnjK+Q@mail.gmail.com

On 16 October 2012 23:03, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> Can you explain in more detail how this would be used on the receiving
> side? I'm unable to picture it from your description.

This will allow implementation of pgq in core, as discussed many times
at cluster hackers meetings.

> I'm also a bit reluctant to call this a "message queue", since it lacks
> the features required for it to be used as an application-level queue.

It's the input end of an application-level queue. In this design the
queue is like a table, so we need SQL grammar to support this new type
of object. "Replication message" doesn't describe this, since it has
little if anything to do with replication and, if anything, it's a
message type, not a message.

You're right that Hannu needs to specify the rest of the design and
outline the API. The storage of the queue is "in WAL", which raises
questions about how the API will guarantee we read just once from the
queue and what happens when queue overflows. The simple answer would
be we put everything in a table somewhere else, but that needs more
careful specification to show we have both ends of the queue and a
working design.

Do we need a new object at all? Can we not just define a record type,
then define messages using that type? At the moment I think the
named-object approach works better, but we should consider that.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-17 10:26:24
Message-ID: 507E87D0.8030301@2ndQuadrant.com

On 10/17/2012 12:03 AM, Josh Berkus wrote:
> Hannu,
>
> Can you explain in more detail how this would be used on the receiving
> side? I'm unable to picture it from your description.
It would be used similarly to how the event tables in pgQ (from skytools)
are used: as a source of "events" to be replayed on the subscriber side.

(For discussion's sake let's just call this a LOGGED ONLY TABLE, as opposed
to the UNLOGGED TABLE we already have.)

The simplest usage would be implementing "remote log tables", that is,
tables where you do an INSERT on the master side, but it "inserts" only
a logical WAL record and nothing else.

On the subscriber side your replay process reads this WAL record as an
"insert event" and, if the table is declared as an ordinary table on
the subscriber, it performs an insert there.

This would make it trivial to implement a persistent remote log table
with the minimal required amount of writing on the master side.
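
The replay step described above can be sketched as a toy Python model. It assumes the logical WAL stream is visible as a list of `(operation, table, row)` tuples; the function `replay` and the table names are hypothetical and are not an existing skytools or PostgreSQL interface.

```python
def replay(wal, subscriber_tables):
    """Apply INSERT events from the log to tables declared on the subscriber."""
    applied = 0
    for op, table, row in wal:
        # Only tables declared as ordinary tables on the subscriber get rows.
        if op == "INSERT" and table in subscriber_tables:
            subscriber_tables[table].append(dict(row))
            applied += 1
    return applied


# Provider side: only log records exist, no stored rows.
wal = [
    ("INSERT", "app_log", {"level": "info", "msg": "started"}),
    ("INSERT", "app_log", {"level": "warn", "msg": "disk 90% full"}),
    ("INSERT", "other_queue", {"x": 1}),
]

# The subscriber declares app_log as an ordinary table; other_queue is not declared.
subscriber = {"app_log": []}
count = replay(wal, subscriber)
```

Events for tables the subscriber has not declared are simply skipped here; a real replay agent could just as well route them to some other handler.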

We could even implement a log table which also captures log entries
from aborted transactions, by treating ROLLBACK as COMMIT for this
table.

But the subscriber side could also do other things instead of (or in
addition to) filling a log table. For example, it could create a
partitioned table instead of the plain table defined on the provider side.

The skytools package has support for this, with several example replay
agents based on pgQ.

Or you could do computations/materialised views based on "events" from
the table.

Or you could use the "insert events"/WAL records as a base for some
other remote processing, like sending out e-mails.

There is also support for these kinds of things in skytools.

> I'm also a bit reluctant to call this a "message queue", since it lacks
> the features required for it to be used as an application-level queue.
> "REPLICATION MESSAGE", maybe?
>
Initially I'd just stick with LOG ONLY TABLE or QUEUE based on what
it does, not on how it could be used.

LOGGED ONLY TABLE is a very technical description of the realisation. I'd
prefer it to work as much like a table as possible, similar to how a VIEW
currently works: for all usages that make sense, you can simply
substitute it for a TABLE.

QUEUE emphasizes the aspect of a logged-only table that it accepts
"records" in a certain order, persists these and then guarantees
that they can be read out in exactly the same order, all this being
guaranteed by existing WAL mechanisms.

It is not meant to be a full implementation of an application-level
queuing system, though, but just the capture, persisting and
distribution parts.

Using this as an "application-level queue" needs a set of interface
functions to extract the events and also to keep track of the processed
events. As there is no general consensus on what these should be (for
example, whether processing the same event twice is allowed), this part
is left to specific queue consumer implementations.
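
One possible shape for such a consumer, sketched in Python under stated assumptions: the ordered event stream is a plain list, and the consumer remembers only a position in it. The class `Consumer` and its methods are hypothetical. This variant allows the same event to be processed twice (at-least-once delivery), which is just one of the choices a consumer implementation could make.

```python
class Consumer:
    """Tracks how far one consumer has read in an ordered, append-only log."""

    def __init__(self, name):
        self.name = name
        self.position = 0  # index of the next unprocessed event

    def next_batch(self, log, limit=10):
        # Reading does not advance the position; only ack() does.
        return log[self.position:self.position + limit]

    def ack(self, count):
        # Record that `count` more events have been fully processed.
        self.position += count


log = [f"event-{i}" for i in range(5)]
mailer = Consumer("mailer")

batch = mailer.next_batch(log, limit=3)
# Suppose the consumer crashes here, before acknowledging the batch:
# the next read delivers the same events again.
redelivered = mailer.next_batch(log, limit=3)
mailer.ack(len(redelivered))
rest = mailer.next_batch(log, limit=3)
```

A consumer that must never process an event twice would have to record its position atomically with the side effects of processing, which is exactly the kind of policy left open above.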

--------------------
Hannu Krosing


From: Greg Stark <stark(at)mit(dot)edu>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-17 17:17:29
Message-ID: CAM-w4HOcgdYQxs2Ce0psGCf3QDP2HvioOD4XPQwh+BLNch9T_g@mail.gmail.com

On Wed, Oct 17, 2012 at 11:26 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> The simplest usage would be implementing "remote log tables" that is
> tables, where you do INSERT on the master side, but it "inserts" only
> a logical WAL record and nothing else.
>
> On subscriber side your replay process reads this WAL record as an
> "insert event" and if the table is declared as an ordinary table on
> subscriber, it performs an insert there.

What kinds of applications would need that?

--
greg


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-17 18:48:45
Message-ID: CAFNqd5X8XXHC+zsL6wn7qAwAtCa1F0jCpBdqs2mArji7LHx0GA@mail.gmail.com

Well, replication is arguably a relevant case.

For Slony, the origin/master node never cares about logged changes - that
data is only processed on replicas. Now, that's certainly a little
weaselly - the log data (sl_log_*) has got to get read to get to the
replica.

This suggests, nonetheless, a curiously different table structure than is
usual, and I could see this offering interesting possibilities.

The log tables are only useful to read in transaction order, which is
pretty well the order data gets written to WAL, so perhaps we could have
savings by only writing data to WAL...

It occurs to me that this notion might exist as a special sort of table,
interesting for pgq as well as Slony, which consists of:

- table data is stored only in WAL
- an index supports quick access to this data, residing in WAL
- TOASTing perhaps unneeded?
- index might want to be on additional attributes
- the triggers-on-log-tables thing Slony 2.2 does means we want these
tables to support triggers
- if data is only held in WAL, we need to hold the WAL until (mumble,
later, when known to be replicated)
- might want to mix local updates with updates imported from remote nodes

I think it's a misnomer to think this is about having the data not locally
accessible. Rather, it has a pretty curious access and storage pattern.

And a slick pgq queue would likely make a good Slony log, too.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-17 20:25:17
Message-ID: 507F142D.1080905@agliodbs.com


> It is not meant to be a full implementation of application level queuing
> system though but just the capture, persisting and distribution parts
>
> Using this as an "application level queue" needs a set of interface
> functions to extract the events and also to keep track of the processed
> events. As there is no general consensus what these shoul be (like if
> processing same event twice is allowed) this part is left for specific
> queue consumer implementations.

Well, but AFAICT, you've already prohibited features through your design
which are essential to application-level queues, and are implemented by,
for example, pgQ.

1. your design only allows the queue to be read on replicas, not on the
node where the item was inserted.

2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
earth would a client know which items they have executed and which they
haven't?

3. Double-down on #2 in a multithreaded environment.

For an application-level queue, the base functionality is:

ADD ITEM
READ NEXT (#) ITEM(S)
LOCK ITEM
DELETE ITEM

More sophisticated and useful queues also allow:

READ NEXT UNLOCKED ITEM
LOCK NEXT UNLOCKED ITEM
UPDATE ITEM
READ NEXT (#) UNSEEN ITEM(S)

The design you describe seems to prohibit pretty much all of the above
operations after READ NEXT. This makes it completely useless as an
application-level queue.

And, for that matter, if your new queue only accepts INSERTs, why not
just improve LISTEN/NOTIFY so that it's readable on replicas? What does
this design buy you that that doesn't?

Quite possibly you have plans which answer all of the above, but they
aren't at all clear in your RFC.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-17 21:42:39
Message-ID: CAFNqd5VVA8n3m+6o6-CcmEkj=fgW+-oGqE2O3bKa96g2a6hA=w@mail.gmail.com

On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> It is not meant to be a full implementation of application level queuing
>> system though but just the capture, persisting and distribution parts
>>
>> Using this as an "application level queue" needs a set of interface
>> functions to extract the events and also to keep track of the processed
>> events. As there is no general consensus what these shoul be (like if
>> processing same event twice is allowed) this part is left for specific
>> queue consumer implementations.
>
> Well, but AFAICT, you've already prohibited features through your design
> which are essential to application-level queues, and are implemented by,
> for example, pgQ.
>
> 1. your design only allows the queue to be read on replicas, not on the
> node where the item was inserted.

I commented separately on this; I'm pretty sure there needs to be a
way to read the queue on a replica, yes, indeed.

> 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
> earth would a client know which items they have executed and which they
> haven't?

If the items are actually stored in WAL, then it seems well and truly
impossible to do any of those three things directly.

What could be done, instead, would be to add "successor" items to
indicate that they have been dealt with, in effect, back-references.

You don't get to UPDATE or DELETE; instead, you do something like:

INSERT into queue (reference_to_xid, reference_to_id_in_xid, action)
values (old_xid_1, old_id_within_xid_1, 'COMPLETED'), (old_xid_2,
old_id_within_xid_2, 'CANCELLED');
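
The "successor item" idea can be modelled in a few lines of Python. This is a sketch only: it assumes each log entry is a tuple `(xid, id_in_xid, action, reference)`, where `reference` is the `(xid, id_in_xid)` of an earlier entry or `None`; the `'WORK'` action and the function name `pending` are hypothetical.

```python
def pending(log):
    """Return work items that have no COMPLETED/CANCELLED successor yet."""
    open_items = {}  # dicts keep insertion order, so log order is preserved
    for xid, seq, action, reference in log:
        if action == "WORK":
            open_items[(xid, seq)] = True
        elif action in ("COMPLETED", "CANCELLED"):
            # A successor item closes the earlier item it refers to.
            open_items.pop(reference, None)
    return list(open_items)


log = [
    (100, 1, "WORK", None),
    (100, 2, "WORK", None),
    (101, 1, "WORK", None),
    (102, 1, "COMPLETED", (100, 1)),
    (103, 1, "CANCELLED", (101, 1)),
]
still_open = pending(log)
print(still_open)
```

Nothing is ever updated or deleted; the state of an item is derived by reading the log forward.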

In a distributed context, it's possible that multiple nodes could be
reading from the same queue, so that while "process at least once" is
no trouble, "process at most once" is just plain troublesome.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-18 07:19:45
Message-ID: CA+U5nMJujvmB1cMU7pFOuAeHZXE5xqXc5r22tQk8DJy3QXNDhQ@mail.gmail.com

On 17 October 2012 21:25, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> It is not meant to be a full implementation of application level queuing
>> system though but just the capture, persisting and distribution parts
>>
>> Using this as an "application level queue" needs a set of interface
>> functions to extract the events and also to keep track of the processed
>> events. As there is no general consensus what these shoul be (like if
>> processing same event twice is allowed) this part is left for specific
>> queue consumer implementations.
>
> Well, but AFAICT, you've already prohibited features through your design
> which are essential to application-level queues, and are implemented by,
> for example, pgQ.
>
> 1. your design only allows the queue to be read on replicas, not on the
> node where the item was inserted.
>
> 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
> earth would a client know which items they have executed and which they
> haven't?
>
> 3. Double-down on #2 in a multithreaded environment.

It's hard to work out how to reply to this because it's just so off
base. I don't agree with the restrictions you think you see at all,
saying it politely rather than giving a one word answer.

The problem here is you phrase these things with too much certainty,
seeing only barriers. The "how on earth?" vibe is not appropriate at
all. It's perfectly fine to ask for answers to those difficult
questions, but don't presume that there are no answers, or that you
know with certainty they are even hard ones. By phrasing things in
such a closed way the only way forwards is through you, which does not
help.

All we're discussing is moving a successful piece of software into
core, which has been discussed for years at the international
technical meetings we've both been present at. I think an open
viewpoint on the feasibility of that would be reasonable, especially
when it comes from one of the original designers.

I apologise for making a personal comment, but this does affect the
technical discussion.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-18 07:36:39
Message-ID: CA+U5nMKGWYULpbM3OqnooxZRZ-oWfcr=Bit9rSi4hgXvM1t9Dw@mail.gmail.com

On 17 October 2012 11:26, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:

> LOGGED ONLY TABLE is very technical description of realisation - I'd
> prefer it to work as mush like a table as possible, similar to how VIEW
> currently works - for all usages that make sense, you can simply
> substitute it for a TABLE
>
> QUEUE emphasizes the aspect of logged only table that it accepts
> "records" in a certain order, persists these and then quarantees
> that they can be read out in exact the same order - all this being
> guaranteed by existing WAL mechanisms.
>
> It is not meant to be a full implementation of application level queuing
> system though but just the capture, persisting and distribution parts
>
> Using this as an "application level queue" needs a set of interface
> functions to extract the events and also to keep track of the processed
> events. As there is no general consensus what these shoul be (like if
> processing same event twice is allowed) this part is left for specific
> queue consumer implementations.

The two halves of the queue are the TAIL/entry point and the HEAD/exit
point. As you point out, these could be on different servers,
wherever the logical changes flow to, but could also be on the same
server. When the head and tail are on the same server, the MESSAGE
QUEUE syntax seems appropriate, but I agree that calling it that when
it's just a head or just a tail seems slightly misleading.

I guess the question is whether we provide a full implementation or
just the first half.

We do, I think, want a full queue implementation in core. We also want
to allow other queue implementations to interface with Postgres, so we
probably want to allow "first half" only as well. Meaning we want both
head and tail separately in core code. The question is whether we
require both head and tail in core before we allow commit, to which I
would say I think adding the tail first is OK, and adding the head
later when we know exactly the design.

Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I should also add that this is a switchable synchronous/asynchronous
transactional queue, whereas LISTEN/NOTIFY is a synchronous
transactional queue.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-18 17:33:38
Message-ID: 50803D72.7010600@agliodbs.com
Lists: pgsql-hackers

Simon,

> It's hard to work out how to reply to this because its just so off
> base. I don't agree with the restrictions you think you see at all,
> saying it politely rather than giving a one word answer.

You have inside knowledge of Hannu's design. I am merely going from his
description *on this list*, because that's all I have to go on.

He requested comments, so here I am, commenting. I'm *hoping* that it's
merely the description which is poor and not the conception of the
feature. *As Hannu described the feature* it sounds useless and
obscure, and miles away from powering any kind of general queueing
mechanism. Or anything we discussed at the clustering meetings.

And, again, if you didn't want comments, you shouldn't have posted an RFC.

> All we're discussing is moving a successful piece of software into
> core, which has been discussed for years at the international
> technical meetings we've both been present at. I think an open
> viewpoint on the feasibility of that would be reasonable, especially
> when it comes from one of the original designers.

When I ask you for technical clarification or bring up potential
problems with a 2Q feature, you consistently treat it as a personal
attack and are emotionally defensive instead of answering my technical
questions. This, in turn, frustrates the heck out of me (and others)
because we can't get the technical questions answered. I don't want you
to justify yourself, I want a clear technical spec.

I'm asking these questions because I'm excited about ReplicationII, and
I want it to be the best feature it can possibly be.

Or, as we tell many new contributors, "We wouldn't bring up potential
problems and ask lots of questions if we weren't interested in the feature."

Now, on to the technical questions:

>> QUEUE emphasizes the aspect of logged only table that it accepts
>> "records" in a certain order, persists these and then quarantees
>> that they can be read out in exact the same order - all this being
>> guaranteed by existing WAL mechanisms.
>>
>> It is not meant to be a full implementation of application level queuing
>> system though but just the capture, persisting and distribution parts
>>
>> Using this as an "application level queue" needs a set of interface
>> functions to extract the events and also to keep track of the processed
>> events. As there is no general consensus what these shoul be (like if
>> processing same event twice is allowed) this part is left for specific
>> queue consumer implementations.

While implementations vary, I think you'll find that the set of
operations required for a full-featured application queue is remarkably
similar across projects. Personally, I've worked with celery, Redis,
AMQ, and RabbitMQ, as well as a custom solution on top of pgQ. The
design, as you've described it, makes several of these requirements
unreasonably convoluted to implement.

It sounds to me like the needs of internal queueing and application
queueing may be hopelessly divergent. That was always possible, and
maybe the answer is to forget about application queueing and focus on
making this mechanism work for replication and for matviews, the two
features we *know* we want it for. Which don't need the application
queueing features I described AFAIK.

> The two halves of the queue are the TAIL/entry point and the HEAD/exit
> point. As you point out these could be on the different servers,
> wherever the logical changes flow to, but could also be on the same
> server. When the head and tail are on the same server, the MESSAGE
> QUEUE syntax seems appropriate, but I agree that calling it that when
> its just a head or just a tail seems slightly misleading.

Yeah, that's why I was asking for clarification; the way Hannu described
it, it sounded like it *couldn't* be read on the insert node, but only
on a replica.

> We do, I think, want a full queue implementation in core. We also want
> to allow other queue implementations to interface with Postgres, so we
> probably want to allow "first half" only as well. Meaning we want both
> head and tail separately in core code. The question is whether we
> require both head and tail in core before we allow commit, to which I
> would say I think adding the tail first is OK, and adding the head
> later when we know exactly the design.

I'm just pointing out that some of the requirements of the design for
the replication queue may conflict with a design for a full-featured
application queue.

I don't quite follow you on what you mean by "head" vs. "tail". Explain?

> Having said that, the LOGGING ONLY syntax makes me shiver. Better name?

I suck at names. Sorry.

> I should also add that this is an switchable sync/asynchronous
> transactional queue, whereas LISTEN/NOTIFY is a synchronous
> transactional queue.

Thanks for explaining.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-18 18:36:09
Message-ID: CAGTBQpZVMmvNk5GqqutuQeWaBikEgHqJZT3ntA-d1QDPqJjobw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 18, 2012 at 2:33 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I should also add that this is an switchable sync/asynchronous
>> transactional queue, whereas LISTEN/NOTIFY is a synchronous
>> transactional queue.
>
> Thanks for explaining.

New here, I missed half the conversation, but since it's been brought
up and (to me wrongfully) dismissed, I'd like to propose:

NOTIFY [ALL|ONE] [REMOTE|LOCAL|CLUSTER|DOWNSTREAM] ASYNCHRONOUSLY
LISTEN [REMOTE|LOCAL|CLUSTER|UPSTREAM] too for good measure.

That ought to work out fine as SQL constructs go, implementation aside.

That's not enough for matviews, but it is IMO a good starting point.
All you need after that, are triggers for notifying automatically upon
insert, and some mechanism to attach triggers to a channel for the
receiving side.

Since channels are limited to short strings, maybe a different kind of
object (but with similar manipulation syntax) ought to be created. The
CREATE QUEUE command, in fact, could be creating such a channel. The
channel itself won't be WAL-only, just the messages going through it.
This (I think) would solve locking issues.


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-18 18:56:33
Message-ID: 508050E1.5050202@2ndQuadrant.com
Lists: pgsql-hackers

On 10/18/2012 07:33 PM, Josh Berkus wrote:
> Simon,
>
>
>> It's hard to work out how to reply to this because its just so off
>> base. I don't agree with the restrictions you think you see at all,
>> saying it politely rather than giving a one word answer.
> You have inside knowledge of Hannu's design.
Actually, Simon currently has no more knowledge of this specific
design than you do - I posted this on this list as soon as I had figured
it out as a possible solution to a specific problem: supporting
full pgQ/Londiste functionality in WAL-based logical replication
with minimal overhead.

(Well, actually I let it settle for a few weeks, but I did not discuss
this off-list before.)

Simon may have a better grasp of it thanks to having done work
on the BDR/Logical Replication design and thus having a better, or
at least more recent, understanding of the issues involved in Logical
Replication.

When mapping londiste/Slony message capture to Logical WAL,
the WAL already _is_ the event queue for replication.
LOG ONLY tables make it usable for non-replication things as well,
using the same mechanisms (the equivalent in a trigger-based
system would be a log trigger which captures the insert event and then
cancels the insert).

> I am merely going from his
> description *on this list*, because that's all I have to go in.
>
> He requested comments, so here I am, commenting. I'm *hoping* that it's
> merely the description which is poor and not the conception of the
> feature. *As Hannu described the feature* it sounds useless and
> obscure, and miles away from powering any kind of general queueing
> mechanism.
If we describe a queue as something you put stuff in at one end and
get it out in same or some other specific order at the other end, then
WAL _is_ a queue when you use it for replication (if you just write to it,
then it is "Log", if you write and read, it is "Queue")

That is, the WAL already is a form of persistent and ordered (that is
how WAL works) stream of messages ("WAL records") that are generated
on the "master" and replayed on one or more consumers (called "slaves"
in the case of simple replication).

All it takes to make this scenario work is keeping track of LSN or simply
log position on the slave side.
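
To illustrate the point about tracking the log position, here is a
minimal Python sketch (a toy model, not PostgreSQL code). The names
Log, Consumer, read_from and next_lsn are made up for the example, and
a list index stands in for the LSN:

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only, ordered stream of records (a stand-in for the WAL)."""
    records: list = field(default_factory=list)

    def append(self, payload):
        # The index of the new record plays the role of an LSN here.
        self.records.append(payload)
        return len(self.records) - 1

    def read_from(self, position):
        # Return (lsn, payload) pairs at or after the given position, in order.
        return list(enumerate(self.records))[position:]


@dataclass
class Consumer:
    """Remembers only the next position it still has to read."""
    next_lsn: int = 0

    def poll(self, log):
        batch = log.read_from(self.next_lsn)
        if batch:
            # Advance past the last record that was handed out.
            self.next_lsn = batch[-1][0] + 1
        return [payload for _, payload in batch]


log = Log()
log.append("a")
log.append("b")
consumer = Consumer()
first = consumer.poll(log)    # everything written so far
log.append("c")
second = consumer.poll(log)   # only what arrived since the last poll
```

The only state the consumer keeps is next_lsn; the ordering comes
entirely from the log itself.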

What you seem to want is support for cooperative consumers,
that is, multiple consumers on the same queue working together and
sharing the work of processing the incoming events.

This can be easily achieved using a single ordered event stream and
extra bookkeeping structures on the consumer side (look at cooperative
consumer samples in skytools).
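
One possible shape of such consumer-side bookkeeping, as a Python
sketch. This is an assumption made for illustration and is not how
skytools implements cooperative consumers; CooperativeGroup, claim,
ack and safe_position are hypothetical names:

```python
class CooperativeGroup:
    """Hypothetical consumer-side bookkeeping for one shared, ordered stream."""

    def __init__(self, events):
        self.events = events    # the single ordered event stream
        self.next_pos = 0       # first position not yet handed out
        self.claimed = {}       # position -> name of the worker that took it
        self.done = set()       # positions acknowledged as processed

    def claim(self, worker):
        # Hand the next unclaimed event to exactly one worker.
        if self.next_pos >= len(self.events):
            return None
        pos = self.next_pos
        self.next_pos += 1
        self.claimed[pos] = worker
        return pos, self.events[pos]

    def ack(self, pos):
        # Record that the event at this position has been processed.
        self.done.add(pos)

    def safe_position(self):
        # Everything below this position is processed, so a restart
        # can resume reading the stream from here.
        pos = 0
        while pos in self.done:
            pos += 1
        return pos


group = CooperativeGroup(["e0", "e1", "e2"])
p0, _ = group.claim("worker-a")
p1, _ = group.claim("worker-b")
group.ack(p1)                    # worker-b finishes first
before = group.safe_position()   # e0 is still in flight
group.ack(p0)
after = group.safe_position()    # e0 and e1 are both done
```

The stream stays a plain ordered log; who took which event and what
has been acknowledged lives entirely on the consumer side.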

What I suggested was optimisation for the case where you know that you
will never need the data on the master side and are only interested in it
on the slave side.

By writing rows/events/messages only to the log (or stream or queue), you
avoid the need to clean them up later on the master by DELETE,
TRUNCATE or rotating tables.

For both physical and logical streaming the WAL _is_ the queue of events
that were recorded on the master and need to be replayed on the slave.

Thanks to the introduction of logical replication, it now makes sense to
have actions recorded _only_ in this queue, and this is what the whole
RFC was about.

I recommend that you introduce yourself a bit to skytools/pgQ to get a
better feel for the things I am talking about. Londiste is just one
application built on pgQ, a general event logging, transport and
transform/replay (that is what I'd call queueing :) ) system.

pgQ does have its roots in Slony (and earlier) replication systems,
but it is by no means _only_ a replication system.

The LOG ONLY tables are _not_ needed for pure replication (like Slony), but
they make replication + queueing type solutions like skytools/pgQ much more
efficient, as they do away with the need to maintain the queued data on
the master side, where it will never be needed (just to repeat this once
more).

> Or anything we discussed at the clustering meetings.
>
> And, again, if you didn't want comments, you shouldn't have posted an RFC.
I did want comments and as far as I know I do not see you as hostile :)

I do understand that what you mean by QUEUE (and especially by
MESSAGE QUEUE) is different from what I described.
You seem to want specifically an implementation of cooperative
consumers for a generic queue.

The answer is yes, it is possible to build this on WAL, or on the
table-based event logs/queues of londiste / slony. It just takes a
little extra management on the receiving side to do the record locking
and distribution between cooperating consumers.
>> All we're discussing is moving a successful piece of software into
>> core, which has been discussed for years at the international
>> technical meetings we've both been present at. I think an open
>> viewpoint on the feasibility of that would be reasonable, especially
>> when it comes from one of the original designers.
> When I ask you for technical clarification or bring up potential
> problems with a 2Q feature, you consistently treat it as a personal
> attack and are emotionally defensive instead of answering my technical
> questions. This, in turn, frustrates the heck out of me (and others)
> because we can't get the technical questions answered. I don't want you
> to justify yourself, I want a clear technical spec.
Currently the "clear tech spec" is just this:

* works as a table on INSERTs up to inserting the logical WAL record
describing the insert, but no data is inserted locally.

with all things that follow from the local table having no data
- unique constraints don't make sense
- indexes make no sense
- updates and deletes hit no data
- etc. . .
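
Spelled out as a toy model in Python (a sketch of the behaviour listed
above, not a proposed implementation; LogOnlyTable and the list used
as a stand-in for the WAL are made up for the example):

```python
class LogOnlyTable:
    """Toy model: INSERT emits a log record, nothing is stored locally."""

    def __init__(self, name, wal):
        self.name = name
        self.wal = wal    # shared list standing in for the WAL

    def insert(self, row):
        # Emit the record describing the insert; keep no local copy.
        self.wal.append(("INSERT", self.name, dict(row)))
        return 1          # one row "inserted" from the caller's point of view

    def select(self):
        return []         # the local table never has any data

    def update(self, predicate, changes):
        return 0          # nothing local to match

    def delete(self, predicate):
        return 0          # nothing local to match


wal = []
events = LogOnlyTable("events", wal)
events.insert({"id": 1, "msg": "hello"})
```

After the insert, the only trace of the row is the record in wal;
select(), update() and delete() on the table itself see nothing.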
>
> I'm asking these questions because I'm excited about ReplicationII, and
> I want it to be the best feature it can possibly be.
>
> Or, as we tell many new contributors, "We wouldn't bring up potential
> problems and ask lots of questions if we weren't interested in the feature."
>
> Now, on to the technical questions:
>
>>> QUEUE emphasizes the aspect of logged only table that it accepts
>>> "records" in a certain order, persists these and then quarantees
>>> that they can be read out in exact the same order - all this being
>>> guaranteed by existing WAL mechanisms.
>>>
>>> It is not meant to be a full implementation of application level queuing
>>> system though but just the capture, persisting and distribution parts
>>>
>>> Using this as an "application level queue" needs a set of interface
>>> functions to extract the events and also to keep track of the processed
>>> events. As there is no general consensus what these shoul be (like if
>>> processing same event twice is allowed) this part is left for specific
>>> queue consumer implementations.
> While implementations vary, I think you'll find that the set of
> operations required for a full-featured application queue are remarkably
> similar across projects. Personally, I've worked with celery, Redis,
> AMQ, and RabbitMQ, as well as a custom solution on top of pgQ. The
> design, as you've described it, make several of these requirements
> unreasonably convoluted to implement.
As Simon explained, the initial RFC was just about not keeping the
data in a local table if we know it will never be accessed (at least not
for anything except vacuum and delete/truncate).

This is something that made no sense for physical replication.

> It sounds to me like the needs of internal queueing and application
> queueing may be hopelessly divergent. That was always possible, and
> maybe the answer is to forget about application queueing and focus on
> making this mechanism work for replication and for matviews, the two
> features we *know* we want it for. Which don't need the application
> queueing features I described AFAIK.
>
>> The two halves of the queue are the TAIL/entry point and the HEAD/exit
>> point. As you point out these could be on the different servers,
>> wherever the logical changes flow to, but could also be on the same
>> server. When the head and tail are on the same server, the MESSAGE
>> QUEUE syntax seems appropriate, but I agree that calling it that when
>> its just a head or just a tail seems slightly misleading.
> Yeah, that's why I was asking for clarification; the way Hannu described
> it, it sounded like it *couldn't* be read on the insert node, but only
> on a replica.
Well, the reading is done the same way any WAL reading is done -
you subscribe to the stream and from that point on get the records
in LSN order.

It is very hard for me to tell for sure whether the walsender->walreceiver
combo "reads the events" on the master or the slave side.
>
>> We do, I think, want a full queue implementation in core. We also want
>> to allow other queue implementations to interface with Postgres, so we
>> probably want to allow "first half" only as well. Meaning we want both
>> head and tail separately in core code. The question is whether we
>> require both head and tail in core before we allow commit, to which I
>> would say I think adding the tail first is OK, and adding the head
>> later when we know exactly the design.
> I'm just pointing out that some of the requirements of the design for
> the replication queue may conflict with a design for a full-featured
> application queue.
>
> I don't quite follow you on what you mean by "head" vs. "tail". Explain?
HEAD is the queue producer, where the events go in (any insert on master).

TAIL (to avoid another word) is where they come out
(walsender -> walreceiver moving the events to the slave).

Think of an analogy with a snake feeding on berries used by
an ant colony to get the nutrients in the berries to its nest :)

And there is no processing inside the snake - the work of
distributing said nutrients once they have arrived at the nest has
to be organised by the cooperative colony of ants on that end; the
snake just guarantees that the berries arrive in the same order they
went in.

I guess this organisation of work after the events are delivered is
what you were after when asking about "an application level queue".

>> Having said that, the LOGGING ONLY syntax makes me shiver. Better name?
>
I guess WRITE ONLY tables would get us more publicity, but would not be
entirely correct, as the data is readable from the log.

Hannu


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-18 19:03:21
Message-ID: 50805279.5030503@2ndQuadrant.com
Lists: pgsql-hackers

On 10/18/2012 08:36 PM, Claudio Freire wrote:
> The CREATE QUEUE command, in fact, could be creating
> such a channel. The channel itself won't be WAL-only, just
> the messages going through it. This (I think) would solve locking issues.

Hmm. Maybe we should think of implementing this as REMOTE TABLE, that
is, a table which gets no real data stored locally, but where all inserts
go through WAL and are replayed as real inserts on the slave side.

Then if you want matviews or partitioned table, you just attach triggers to
the table on slave side to do them.

This would be tangential to their use as pure queues which would happen
at the level of plugins to logical replication.

--------------
Hannu


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-18 19:18:19
Message-ID: CAFNqd5VCr721=CGHe+eD4-A0Y4+T3TyF0Nhsve8MezY2pwX-sg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 18, 2012 at 2:56 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> * works as table on INSERTS up to inserting logical WAL record describing
> the
> insert but no data is inserted locally.
>
> with all things that follow from the local table having no data
> - unique constraints don't make sense
> - indexes make no sense
> - updates and deletes hit no data
> - etc. . .

Yep, I think I was understanding those aspects.

I think I disagree that "indexes make no sense."

I think that it would be meaningful to have an index type for this,
one that is a pointer at WAL records, to enable efficiently jumping to
the right WAL log to start accessing a data stream, given an XID.
That's a fundamentally different sort of index than we have today
(much the way that hash indexes, GiST indexes, and BTrees differ from
one another).
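
Roughly the kind of lookup I have in mind, as a Python sketch. All of
it is hypothetical: XidIndex and its methods are invented for the
illustration, the positions are plain integers standing in for WAL
locations, and XID wraparound and interleaved transactions are ignored:

```python
import bisect


class XidIndex:
    """Hypothetical index: for each XID, the log position of its first record."""

    def __init__(self):
        self.xids = []         # sorted XIDs
        self.positions = []    # positions[i] is the first position for xids[i]

    def add(self, xid, position):
        i = bisect.bisect_left(self.xids, xid)
        if i < len(self.xids) and self.xids[i] == xid:
            # Already indexed: keep the earliest position seen for this XID.
            self.positions[i] = min(self.positions[i], position)
        else:
            self.xids.insert(i, xid)
            self.positions.insert(i, position)

    def start_position(self, xid):
        # First record of the smallest indexed XID that is >= xid,
        # or None if no such XID has been indexed.
        i = bisect.bisect_left(self.xids, xid)
        return self.positions[i] if i < len(self.xids) else None


idx = XidIndex()
idx.add(100, 5000)
idx.add(105, 7200)
idx.add(100, 5100)    # a later record of XID 100 does not move its start
```

Here idx.start_position(100) gives the place to start reading for XID
100 without scanning the log from the beginning.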

I'm having a hard time thinking about what happens if you have
cascaded replication, and want to carry records downstream. In that
case, the XIDs from the original system aren't miscible with the XIDs
in a message queue on a downstream database, and I'm not sure what
we'd want to do. Keep the original XIDs in a side attribute, maybe?
It seems weird, at any rate. Or perhaps data from foreign sources has
got to go into a separate queue/'sorta-table', and thereby have two
XIDs, the "source system XID" and the "when we loaded it in locally
XID."
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-19 02:26:54
Message-ID: CA+CSw_u9U6NSSCFkNYzG2DA2nf=zxAhA3Jzbp3dy=tRk5YHZOA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 18, 2012 at 10:03 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> Hmm. Maybe we should think of implementing this as REMOTE TABLE, that
> is a table which gets no real data stored locally but all insert got through
> WAL
> and are replayed as real inserts on slave side.

FWIW, MySQL calls this exact concept the "black hole" storage engine.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-19 11:17:03
Message-ID: 508136AF.6040407@krosing.net
Lists: pgsql-hackers

On 10/19/2012 04:26 AM, Ants Aasma wrote:
> On Thu, Oct 18, 2012 at 10:03 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> Hmm. Maybe we should think of implementing this as REMOTE TABLE, that
>> is a table which gets no real data stored locally but all insert got through
>> WAL
>> and are replayed as real inserts on slave side.
> FWIW, MySQL calls this exact concept the "black hole" storage engine.
In this case calling this WRITE ONLY TABLE does not seem so strange
anymore :)

Or even PERSISTENT WRITE ONLY TABLE to make the paradox more explicit.
>
> Regards,
> Ants Aasma


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-19 11:53:16
Message-ID: 50813F2C.9040101@krosing.net
Lists: pgsql-hackers

On 10/18/2012 09:18 PM, Christopher Browne wrote:
> On Thu, Oct 18, 2012 at 2:56 PM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> * works as table on INSERTS up to inserting logical WAL record describing
>> the
>> insert but no data is inserted locally.
>>
>> with all things that follow from the local table having no data
>> - unique constraints don't make sense
>> - indexes make no sense
>> - updates and deletes hit no data
>> - etc. . .
> Yep, I think I was understanding those aspects.
>
> I think I disagree that "indexes make no sense."
>
> I think that it would be meaningful to have an index type for this,
> one that is a pointer at WAL records, to enable efficiently jumping to
> the right WAL log to start accessing a data stream, given an XID.
> That's a fundamentally different sort of index than we have today
> (much the way that hash indexes, GiST indexes, and BTrees differ from
> one another).
>
> I'm having a hard time thinking about what happens if you have
> cascaded replication, and want to carry records downstream.
I'd try to keep it as similar as possible to how the "real" tables
behave in this multi-master (or "bidirectional" as the original
logical wal case was named) scenario.

I assume that the current thinking is that the replicated changes
will carry the original (node id, transaction id) info, which is used to
determine when to stop replicating in case there is more than
one node in the replication ring.
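
A small Python sketch of that stopping rule, under my assumption about
how the origin info would be used (Change and replicate_around_ring
are names invented for the example, not anything from the BDR design):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    origin_node: str    # node where the change was first made
    origin_xid: int     # transaction id on that node
    payload: str


def replicate_around_ring(ring, change):
    """Pass a change around the ring, stopping when it would return to its origin."""
    applied_on = []
    start = ring.index(change.origin_node)
    for step in range(1, len(ring) + 1):
        node = ring[(start + step) % len(ring)]
        if node == change.origin_node:
            break       # back at the origin: stop replicating
        applied_on.append(node)
    return applied_on


ring = ["A", "B", "C"]
hops = replicate_around_ring(ring, Change("B", 42, "INSERT ..."))
```

A change that originated on B is applied on C and then on A, and is
dropped when it would arrive back at B.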

In case any changes to the resulting table are performed due to
conflict resolution, this "original (node id, transaction id)" gets
replaced (or added?) by the info from the node that did the
latest changes, so that the original origin node gets a chance
to examine the changes too.

This has to be pondered carefully so that the conflict resolution
chain will end at some point.

(I guess that the whole logrep design is something that should
be discussed in Prague. Simon and Andres are doing a
presentation on it there, and in case this ignites more discussion
it may be something warranting a separate discussion session
among all interested parties.)

Hannu

> In that
> case, the XIDs from the original system aren't miscible with the XIDs
> in a message queue on a downstream database, and I'm not sure what
> we'd want to do. Keep the original XIDs in a side attribute, maybe?
> It seems weird, at any rate. Or perhaps data from foreign sources has
> got to go into a separate queue/'sorta-table', and thereby have two
> XIDs, the "source system XID" and the "when we loaded it in locally
> XID."


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-19 12:06:36
Message-ID: CA+U5nM+QkzOePgWqZjpAhDKkKcqjF985dDjRd86QwphDztB17w@mail.gmail.com
Lists: pgsql-hackers

On 18 October 2012 18:33, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

>> All we're discussing is moving a successful piece of software into
>> core, which has been discussed for years at the international
>> technical meetings we've both been present at. I think an open
>> viewpoint on the feasibility of that would be reasonable, especially
>> when it comes from one of the original designers.
>
> When I ask you for technical clarification or bring up potential
> problems with a 2Q feature, you consistently treat it as a personal
> attack and are emotionally defensive instead of answering my technical
> questions. This, in turn, frustrates the heck out of me (and others)
> because we can't get the technical questions answered. I don't want you
> to justify yourself, I want a clear technical spec.

Well, this isn't "a 2Q feature"; perhaps that is part of the problem,
but I couldn't say.

I didn't know this was coming at all, nor is that a problem for me.
Since we've talked about that general feature enough at meetings we've
all been present at (and indeed, you chaired), I recognised it as that
and treated it positively in that light. (I think even that Hannu may
not have been present, just Marko).

You made claims that were completely unfounded and yet also strangely
negative. I picked you up on it because you'll kill discussion of the
feature if I don't speak out, not because the speaker works with me.
I'm not otherwise involved in the feature.

So your assumption of off-list collusion is wrong, as is your claim of
any emotional aspect to this from me. I don't think you can turn this
back onto me.

If a design is not clear, ask for clarification. Don't tell the world
in general that the design is bad or flawed until you actually know it
is. I hear "there is a problem with that patch" discussed too often.
Unfounded negativity is as certain a killer as any real technical
flaw, so we must be careful to avoid it. That comment goes to
everybody, for any patch, but in this case to you because this is the
second thread this week I've seen it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Любен Каравелов <karavelov(at)mail(dot)bg>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-19 12:36:05
Message-ID: e673b0d46aca6654c741d0bf7a82fa22.mailbg@beta.mail.bg
Lists: pgsql-hackers

----- Quote from Hannu Krosing (hannu(at)krosing(dot)net), on 19.10.2012 at 14:17 -----
> On 10/19/2012 04:26 AM, Ants Aasma wrote:
>> On Thu, Oct 18, 2012 at 10:03 PM, Hannu Krosing wrote:
>>> Hmm. Maybe we should think of implementing this as REMOTE TABLE, that
>>> is a table which gets no real data stored locally but all insert got
>>> through WAL
>>> and are replayed as real inserts on slave side.
>> FWIW, MySQL calls this exact concept the "black hole" storage engine.
> In this case calling this WRITE ONLY TABLE does not seem so strange
> anymore :)
>
> Or even PERSISTENT WRITE ONLY TABLE to make the paradox more explicit.

Oracle calls this "Streams", and they build application queues ("Advanced
Queuing") and a replication solution ("Advanced Replication") on them.

Why not call the feature "STREAM TABLE"?

Best regards

--
Luben Karavelov


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-19 18:26:35
Message-ID: 50819B5B.7030507@agliodbs.com
Lists: pgsql-hackers


> If we describe a queue as something you put stuff in at one end and
> get it out in same or some other specific order at the other end, then
> WAL _is_ a queue when you use it for replication (if you just write to it,
> then it is "Log", if you write and read, it is "Queue")

For that matter, WAL is a queue you use for recovery. But, for that
matter, BerkeleyDB is a database just as PostgreSQL is a database. That
doesn't mean you can use BerkeleyDB and PostgreSQL for all the same tasks.

> All it takes to make this scenario work is keeping track of LSN or simply
> log position on the slave side.
>
> What you seem to be wanting is support for a cooperative consumers,
> that is multiple consumers on the same queue working together and
> sharing the work to process the incoming event .
>
> This can be easily achieved using a single ordered event stream and
> extra bookkeeping structures on the consumer side (look at cooperative
> consumer samples in skytools).

What I'm saying is, we'll get nowhere promoting an application queue
which is permanently inferior to existing, popular open source software.
My advice: Forget about the application queue aspects of this. Focus
on making it work for replication and matviews, which are already hard
use cases to optimize.

If someone can turn this feature into the base for a distributed
queueing system later, then great. But let's not complicate this
feature by worrying about a use case it may never fulfill.

> Thanks to introducing logical replication, it now makes sense to have
> actions recorded _only_ in this queue and this is what the whole RFC was
> about.

Yes, I agree.

I'm just pointing out that the needs of a replication queue and of an
application queue are divergent.

> Currently the "clear tech spec" is just this:
>
> * works as a table on INSERTs up to inserting a logical WAL record
>   describing the insert, but no data is inserted locally.

Yeah, I think where you confused a bunch of people here is the
definition of "locally". Let me see if I understand this:

* a Writer would INSERT data into the LOG ONLY TABLE (L.O.T.), which
write would be synched to WAL but there would be no in-memory or on-disk
version of the table updated.

* Readers could subscribe to the LSN for the L.O.T. and would receive a
stream of INSERTs, which they could handle as they wished.

Is my understanding correct? If it is, I have more questions!
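
To make that reading concrete, here is a minimal sketch in plain Python. It is a toy model, not PostgreSQL code: the `Wal`, `LogOnlyTable` and `Reader` classes, the integer "LSN" counter and the `mail_queue` name are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Wal:
    """Toy stand-in for the WAL: an append-only list of (lsn, table, row)."""
    records: list = field(default_factory=list)

    def append(self, table, row):
        lsn = len(self.records) + 1  # assumption: the LSN is a simple counter
        self.records.append((lsn, table, row))
        return lsn


class LogOnlyTable:
    """INSERT emits a log record; no local copy of the row is kept."""

    def __init__(self, name, wal):
        self.name = name
        self.wal = wal

    def insert(self, row):
        return self.wal.append(self.name, row)


class Reader:
    """Remembers the last LSN it has seen and returns only newer INSERTs."""

    def __init__(self, wal, table):
        self.wal = wal
        self.table = table
        self.last_lsn = 0

    def poll(self):
        start = self.last_lsn
        rows = []
        for lsn, table, row in self.wal.records:
            if lsn <= start:
                continue  # already handled in an earlier poll
            if table == self.table:
                rows.append(row)
            self.last_lsn = lsn
        return rows


wal = Wal()
mail_queue = LogOnlyTable("mail_queue", wal)
reader = Reader(wal, "mail_queue")

mail_queue.insert({"to": "alice"})
first_batch = reader.poll()
second_batch = reader.poll()
mail_queue.insert({"to": "bob"})
third_batch = reader.poll()
```

The only state the Reader keeps is its last seen position, which is the "keeping track of LSN" part of the proposal; the Writer never holds the rows anywhere except the log.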

> with all things that follow from the local table having no data
> - unique constraints don't make sense
> - indexes make no sense
> - updates and deletes hit no data
> - etc. . .

Right.

> As Simon explained, the initial RFC was just about not keeping the
> data in local table if we know it will never be accessed

Ah, so to answer Simon's question: no, this RFC makes no sense without a
description of expected Reader activity.

> (at least not
> for anything except vacuum and delete/truncate)

If the table is not being represented as a table in the catalog or on
disk, why would it ever need to be vacuumed?

> It is very hard for me to tell for sure if walsender->walreceiver combo
> "reads the events" on master or slave side

Well, presumably the only way a Reader on the master could get the queue
would be for the master to subscribe to its own LSN. No?

> HEAD is the queue producer, where the events go in (any insert on master)
>
> TAIL (to avoid another word) is where they come out
> (walsender -> walreceiver moving the events to slave)

BTW, I suggest using "Writer" and "Reader" for the queue roles, not
"Head" and "Tail", which terms are rather unclear.

> Think of an analogy with a snake feeding on berries used by
> an ant colony to get the nutrients in the berries to its nest :)

That's a very ... unique analogy. ;-)

>>> Having said that, the LOGGING ONLY syntax makes me shiver. Better name?
>>
> I guess WRITE ONLY tables would get us more publicity but would not be
> entirely correct, as the data is readable from the log.

I like LOG ONLY TABLES, actually; it's the mirror of UNLOGGED TABLEs.
Or REPLICATION MESSAGE TABLE.

Now, since I've pointed out what use case this mechanism does not apply
to (replacing a generic application queue), let me point out some
which it *does* apply to, and handily:

* Updating matviews on a replica
* Updating a cache (assuming an autonomous LSN reader)
* Remote security logging (especially if combined with command triggers)

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Jim Nasby <jim(at)nasby(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-22 19:58:37
Message-ID: 5085A56D.2090707@nasby.net
Lists: pgsql-hackers

On 10/19/12 1:26 PM, Josh Berkus wrote:
> What I'm saying is, we'll get nowhere promoting an application queue
> which is permanently inferior to existing, popular open source software.
> My advice: Forget about the application queue aspects of this. Focus
> on making it work for replication and matviews, which are already hard
> use cases to optimize.
>
> If someone can turn this feature into the base for a distributed
> queueing system later, then great. But let's not complicate this
> feature by worrying about a use case it may never fulfill.

And as someone else mentioned... we should call this a stream and not a queue, since this would be lacking in many queue features.

It certainly sounds like a useful framework to have.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Greg Stark <stark(at)mit(dot)edu>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-22 23:31:54
Message-ID: CAM-w4HON+iV-t3zBsO_NGg4XgGbjRT8pdgi3igYkYGMGwUuddg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 17, 2012 at 7:48 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
> Well, replication is arguably a relevant case.
>
> For Slony, the origin/master node never cares about logged changes - that
> data is only processed on replicas. Now, that's certainly a little weaselly
> - the log data (sl_log_*) has got to get read to get to the replica.

Well this is a clever way for Slony to use existing infrastructure to
get data into the WAL. But wouldn't it be more logical for an in-core
system to just annotate the existing records with enough information
to replay them logically? Instead of synthesizing inserts into an
imaginary table containing data that can be extracted to retrieve info
about some other record, just add the info needed to the relevant
record.

The minimum needed for DML afaict is DELETE and UPDATE records need
the primary key of the record being deleted and updated. It might make
sense to include the whole tupledesc or at least key parts of it like
the attlen and atttyp array so that replay can be more robust. But the
logical place for this data, it seems to me, is *in* the update or
insert record that already exists. Otherwise managing logical
standbys will require a whole duplicate set of infrastructure to keep
track of what has and hasn't been replayed. For instance what if an
update record is covered by a checkpoint but the logical record falls
after the checkpoint and the system crashes before writing it out?
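
A rough illustration of the "minimum needed for DML" point, as a toy Python model rather than anything resembling the real WAL format: the `Change` record and `apply_change` function are hypothetical names, and the replica is just a dict keyed by primary key.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    op: str                          # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: tuple                       # primary key of the affected row
    new_values: Optional[dict] = None


def apply_change(replica, change):
    """Replay one logical change on a replica modelled as {table: {key: row}}."""
    rows = replica.setdefault(change.table, {})
    if change.op == "INSERT":
        rows[change.key] = dict(change.new_values)
    elif change.op == "UPDATE":
        # Without the primary key there would be no way to find this row.
        rows[change.key].update(change.new_values)
    elif change.op == "DELETE":
        del rows[change.key]
    else:
        raise ValueError(f"unknown operation: {change.op}")


replica = {}
for change in [
    Change("INSERT", "users", (1,), {"name": "alice", "active": True}),
    Change("INSERT", "users", (2,), {"name": "bob", "active": True}),
    Change("UPDATE", "users", (1,), {"active": False}),
    Change("DELETE", "users", (2,)),
]:
    apply_change(replica, change)
```

The sketch deliberately ignores updates that change the key itself and says nothing about the checkpoint ordering problem raised above.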

--
greg


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-23 10:41:06
Message-ID: 50867442.7070309@2ndQuadrant.com
Lists: pgsql-hackers

On 10/23/2012 01:31 AM, Greg Stark wrote:
> On Wed, Oct 17, 2012 at 7:48 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
>> Well, replication is arguably a relevant case.
>>
>> For Slony, the origin/master node never cares about logged changes - that
>> data is only processed on replicas. Now, that's certainly a little weaselly
>> - the log data (sl_log_*) has got to get read to get to the replica.
> Well this is a clever way for Slony to use existing infrastructure to
> get data into the WAL. But wouldn't it be more logical for an in-core
> system to just annotate the existing records with enough information
> to replay them logically?
The QUEUE / LOG ONLY TABLES / WRITE ONLY TABLES :) proposal
was _not_ for use in standard replication - it is already covered by
what is being done - but for cases where the data is needed _only_
on the slave/replay side.

One typical case is sending e-mail on some database actions, like
sending a greeting or confirmation mail when creating a new user.

On a busy system you often want to offload the things that can be
done asynchronously to other hosts.

My RFC was for a proposal to skip writing the unneeded info in local
tables and put it _only_ in WAL.
> Instead of synthesizing inserts into an
> imaginary table containing data that can be extracted to retrieve info
> about some other record, just add the info needed to the relevant
> record.
This is more or less how the current system is being designed,
only the "add enough relevant info" part is offloaded to the logical
version of WALSender.
> The minimum needed for DML afaict is DELETE and UPDATE records need
> the primary key of the record being deleted and updated. It might make
> sense to include the whole tupledesc or at least key parts of it like
> the attlen and atttyp array so that replay can be more robust. But the
> logical place for this data, it seems to me, is *in* the update or
> insert record that already exists. Otherwise managing logical
> standbys will require a whole duplicate set of infrastructure to keep
> track of what has and hasn't been replayed. For instance what if an
> update record is covered by a checkpoint but the logical record falls
> after the checkpoint and the system crashes before writing it out?
>
This complexity (which is really a lot more than you briefly
described here) is the reason the construction of the "update records"
from WAL records was moved back to master side. In the original design
it was hoped that it could all be done on the slave by keeping its own
time-synced copy of the system catalog.

Currently it seems to play out reasonably well, but I'd not completely
rule out some new complexities arising which would force the creation
of (more of the) full logical DML records as part of WAL.

The downside would be performance, which in the current case is mostly
unaffected on the write side, but would be affected a lot more if the WAL
volume had to increase significantly to accommodate all needed info for
LogRep.

---------------
Hannu


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Christopher Browne <cbbrowne(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Mailing Lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-23 14:13:50
Message-ID: 17539.1351001630@sss.pgh.pa.us
Lists: pgsql-hackers

[ hadn't been following this thread, sorry ]

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> My RFC was for a proposal to skip writing the unneeded info in local
> tables and put it _only_ in WAL.

This concept seems fundamentally broken. What will happen if the master
crashes immediately after emitting the WAL record? It will replay it
locally, that's what, and thus you have uncertainty about whether the
master will contain the data or not.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-23 16:47:48
Message-ID: CA+TgmoaDWLDmHkq2ORB1Er77YLWsHCrhpgOsyy3mXeFYH64WKQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> It is not meant to be a full implementation of application level queuing
>> system though but just the capture, persisting and distribution parts
>>
>> Using this as an "application level queue" needs a set of interface
>> functions to extract the events and also to keep track of the processed
>> events. As there is no general consensus what these should be (like if
>> processing same event twice is allowed) this part is left for specific
>> queue consumer implementations.
>
> Well, but AFAICT, you've already prohibited features through your design
> which are essential to application-level queues, and are implemented by,
> for example, pgQ.
>
> 1. your design only allows the queue to be read on replicas, not on the
> node where the item was inserted.
>
> 2. if you can't UPDATE or DELETE queue items -- or LOCK them -- how on
> earth would a client know which items they have executed and which they
> haven't?
>
> 3. Double-down on #2 in a multithreaded environment.
>
> For an application-level queue, the base functionality is:
>
> ADD ITEM
> READ NEXT (#) ITEM(S)
> LOCK ITEM
> DELETE ITEM
>
> More sophisticated and useful queues also allow:
>
> READ NEXT UNLOCKED ITEM
> LOCK NEXT UNLOCKED ITEM
> UPDATE ITEM
> READ NEXT (#) UNSEEN ITEM(S)
>
> The design you describe seems to prohibit pretty much all of the above
> operations after READ NEXT. This makes it completely useless as an
> application-level queue.
>
> And, for that matter, if your new queue only accepts INSERTs, why not
> just improve LISTEN/NOTIFY so that it's readable on replicas? What does
> this design buy you that that doesn't?

I've read the whole thread, but I still don't see that anyone's said
it better than this, and I agree with these comments. (I don't find
them ad hominem, either.)

It's also worth noting that in order to be useful, this feature
intrinsically requires the logical replication stuff to be committed
first. It's not entirely clear that there's enough time to get
logical replication committed for 9.3, and the chances of getting any
follow-on features committed as well seem remote. Besides
the shortness of the time, I think all experience has shown that it's
best not to rush into the design of follow-on features before we've
got the basic feature well nailed down. This certainly can't be said
of logical replication at this point. Andres seems to be making good
progress and I'm grateful for his work on it, but I think there's a
lot left to do before that one is in the bag (as I think Andres would
agree).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-24 12:00:18
Message-ID: 5087D852.9050903@krosing.net
Lists: pgsql-hackers

On 10/23/2012 04:13 PM, Tom Lane wrote:
> [ hadn't been following this thread, sorry ]
>
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
>> My RFC was for a proposal to skip writing the unneeded info in local
>> tables and put it _only_ in WAL.
> This concept seems fundamentally broken. What will happen if the master
> crashes immediately after emitting the WAL record? It will replay it
> locally, that's what, and thus you have uncertainty about whether the
> master will contain the data or not.
I agree that emitting a record indistinguishable from current insert
record would probably be a bad idea as it would require the WAL
replay to examine the table description to find that the corresponding
table does not accept local data.

It surely would be better to use a special record type so crash
recovery on the master knows not to replay it.
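
A toy sketch of that idea in Python, under loud assumptions: the record type names `HEAP_INSERT` and `LOG_ONLY_INSERT` and both functions are invented for illustration and do not correspond to actual PostgreSQL WAL record types or recovery code.

```python
HEAP_INSERT = "HEAP_INSERT"          # ordinary insert, replayed into local storage
LOG_ONLY_INSERT = "LOG_ONLY_INSERT"  # hypothetical record type for a log-only table


def crash_recovery(wal_records):
    """Rebuild local table contents from the log after a crash.

    Log-only records are recognised by their type alone, so replay does not
    need to look up the table definition to decide whether to apply them.
    """
    heap = {}
    for rec_type, table, row in wal_records:
        if rec_type == HEAP_INSERT:
            heap.setdefault(table, []).append(row)
        elif rec_type == LOG_ONLY_INSERT:
            continue  # never stored locally, so nothing to redo
        else:
            raise ValueError(f"unknown record type: {rec_type}")
    return heap


def stream_log_only(wal_records):
    """What a downstream reader would pick out of the same log."""
    return [(table, row) for rec_type, table, row in wal_records
            if rec_type == LOG_ONLY_INSERT]


wal_records = [
    (HEAP_INSERT, "users", {"id": 1, "name": "alice"}),
    (LOG_ONLY_INSERT, "mail_queue", {"to": "alice", "template": "welcome"}),
    (HEAP_INSERT, "users", {"id": 2, "name": "bob"}),
]

local_state = crash_recovery(wal_records)
events = stream_log_only(wal_records)
```

With a distinct record type the master's state after recovery is deterministic: the queue data is never present locally, whether or not a crash happened right after the record was emitted.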

The syntax and mechanics of what would essentially be a simple QUEUEing
feature being declared and defined in a similar way to a table were chosen
for 2 reasons -
* familiarity - easy to adapt
* most structure can be shared with tables & views - easy to implement

--------------------
Hannu

> regards, tom lane
>
>


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-24 12:15:11
Message-ID: 5087DBCF.2020504@krosing.net
Lists: pgsql-hackers

On 10/23/2012 06:47 PM, Robert Haas wrote:
> On Wed, Oct 17, 2012 at 4:25 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
...
>> 3. Double-down on #2 in a multithreaded environment.
>>
>> For an application-level queue, the base functionality is:
>>
>> ADD ITEM
>> READ NEXT (#) ITEM(S)
>> LOCK ITEM
>> DELETE ITEM
>>
>> More sophisticated and useful queues also allow:
>>
>> READ NEXT UNLOCKED ITEM
>> LOCK NEXT UNLOCKED ITEM
>> UPDATE ITEM
>> READ NEXT (#) UNSEEN ITEM(S)
>>
>> The design you describe seems to prohibit pretty much all of the above
>> operations after READ NEXT. This makes it completely useless as an
>> application-level queue.

By the above logic MVCC "prohibits" UPDATES and DELETES on table data ;)

WAL-only tables/queues "prohibit" none of what you claim above, you just
implement them in a (loosely) MVCC way by keeping track of what events are
processed.

>>
>> And, for that matter, if your new queue only accepts INSERTs, why not
>> just improve LISTEN/NOTIFY so that it's readable on replicas? What does
>> this design buy you that that doesn't?
I get the ability to easily keep track of which events are already acted on
and which are not.

And you really can't fall back on processing LISTEN/NOTIFY - they
come when they come.

For a WAL-based event stream you only need to track LSN and, in case
of multiple cooperative consumers (which I think Josh meant by
"multithreaded" above), a small structure to keep track of locking
and event consumption, while the WAL part takes care of consistency,
order and durability.
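
As a rough sketch of what such a "small structure" could look like, here is a toy Python model. It is not how skytools implements cooperative consumers; the `CooperativeQueue` class, its method names and the integer LSNs are assumptions for illustration only.

```python
class CooperativeQueue:
    """Consumer-side bookkeeping over an immutable, ordered event stream."""

    def __init__(self, events):
        self.events = events   # list of (lsn, payload), ordered by lsn
        self.claimed = {}      # lsn -> name of the consumer holding the "lock"
        self.done = set()      # lsns that have been fully processed

    def claim_next(self, consumer):
        """LOCK NEXT UNLOCKED ITEM: hand out the oldest unclaimed event."""
        for lsn, payload in self.events:
            if lsn not in self.claimed and lsn not in self.done:
                self.claimed[lsn] = consumer
                return lsn, payload
        return None

    def finish(self, consumer, lsn):
        """The stream itself is never modified; 'delete' is just a done-mark."""
        if self.claimed.get(lsn) != consumer:
            raise ValueError(f"{consumer} does not hold event {lsn}")
        del self.claimed[lsn]
        self.done.add(lsn)

    def release(self, consumer, lsn):
        """Give an event back, e.g. after a worker failure."""
        if self.claimed.get(lsn) != consumer:
            raise ValueError(f"{consumer} does not hold event {lsn}")
        del self.claimed[lsn]


queue = CooperativeQueue([(10, "a"), (20, "b"), (30, "c")])
first = queue.claim_next("worker1")
second = queue.claim_next("worker2")
queue.finish("worker1", 10)
queue.release("worker2", 20)           # worker2 gives up on event 20
retry = queue.claim_next("worker1")    # event 20 is handed out again
```

The events themselves are never updated, locked or deleted; all mutable state lives in the two bookkeeping structures on the consumer side.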

> I've read the whole thread, but I still don't see that anyone's said
> it better than this, and I agree with these comments. (I don't find
> them ad hominem, either.)
>
> It's also worth noting that in order to be useful, this feature
> intrinsically requires the logical replication stuff to be committed
> first.
I agree that this feature - at least if implemented as
proposed - does need some underlying features from the Logical
Replication.

Otoh it does not really _need_ to have full logical replication
integrated - just having a special WAL record type and an easy way to
run your own WAL reader (something like pg_basebackup could work well
as a sample).

Without WAL-based logical replication I already can do the same
thing in a bit more expensive way by having a before trigger which
logs the insert in Slony/Londiste style event table and then cancels it
on the main table.
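
The "log it, then cancel the insert" pattern can be tried out with a small runnable sketch. Note the assumptions: it uses SQLite through Python's sqlite3 module as a stand-in, not PostgreSQL or a real Slony/Londiste event table, and the table, column and trigger names are invented. In SQLite, RAISE(IGNORE) abandons the triggering insert without rolling back what the trigger already wrote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE new_user_mail (user_name TEXT, email TEXT);

-- hypothetical event table standing in for a Slony/Londiste style log
CREATE TABLE event_log (
    ev_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    ev_data TEXT
);

CREATE TRIGGER new_user_mail_log_only
BEFORE INSERT ON new_user_mail
BEGIN
    -- log the event ...
    INSERT INTO event_log (ev_data)
        VALUES (NEW.user_name || ' <' || NEW.email || '>');
    -- ... then cancel the insert on the main table
    SELECT RAISE(IGNORE);
END;
""")

conn.execute("INSERT INTO new_user_mail VALUES (?, ?)",
             ("alice", "alice@example.com"))
conn.execute("INSERT INTO new_user_mail VALUES (?, ?)",
             ("bob", "bob@example.com"))
conn.commit()

stored = conn.execute("SELECT count(*) FROM new_user_mail").fetchone()[0]
events = [row[0] for row in
          conn.execute("SELECT ev_data FROM event_log ORDER BY ev_id")]
```

In PostgreSQL the equivalent would be a BEFORE INSERT row trigger that writes the event and returns NULL to skip the row; the extra cost compared to a WAL-only table is that the event table itself is still stored and logged locally.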
> It's not entirely clear that there's enough time to get
> logical replication committed for 9.3, and the chances of getting any
> follow-on features committed as well seem remote. Besides
> the shortness of the time, I think all experience has shown that it's
> best not to rush into the design of follow-on features before we've
> got the basic feature well nailed down. This certainly can't be said
> of logical replication at this point. Andres seems to be making good
> progress and I'm grateful for his work on it, but I think there's a
> lot left to do before that one is in the bag (as I think Andres would
> agree).
>


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-25 13:14:38
Message-ID: 50893B3E.2040804@agliodbs.com
Lists: pgsql-hackers


> WAL-only tables/queues "prohibit" none of what you claim above, you just
> implement them in a (loosely) MVCC way by keeping track of what events are
> processed.

Well, per our discussion here in person, I'm not convinced that this
buys us anything in the "let's replace AMQ" case. However, as I pointed
out in my last email, this feature doesn't need to replace AMQ to be
useful. Let's focus on the original use case of supplying a queue which
Londiste and Slony can use, which is a sufficient motivation to push the
feature if the Slony and Londiste folks think it's good enough (and it
seems that they do).

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility
Date: 2012-10-26 20:27:54
Message-ID: 508AF24A.7040206@agliodbs.com
Lists: pgsql-hackers


> Well, per our discussion here in person, I'm not convinced that this
> buys us anything in the "let's replace AMQ" case. However, as I pointed
> out in my last email, this feature doesn't need to replace AMQ to be
> useful. Let's focus on the original use case of supplying a queue which
> Londiste and Slony can use, which is a sufficient motivation to push the
> feature if the Slony and Londiste folks think it's good enough (and it
> seems that they do).

BTW, I talked to Marko Kreen about this feature at the boat party, and
he thought it would work for pgQ.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com