Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader
Date: 2012-06-14 21:38:33
Message-ID: 201206142338.33897.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
> On 13.06.2012 14:28, Andres Freund wrote:
> > Features:
> > - streaming reading/writing
> > - filtering
> > - reassembly of records
> >
> > Reusing the ReadRecord infrastructure in situations where the code that
> > wants to do so is not tightly integrated into xlog.c is rather hard and
> > would require changes to rather integral parts of the recovery code
> > which doesn't seem to be a good idea.
> It would be nice refactor ReadRecord and its subroutines out of xlog.c.
> That file has grown over the years to be really huge, and separating the
> code to read WAL sounds like it should be a pretty natural split. I
> don't want to duplicate all the WAL reading code, so we really should
> find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
> wrapper that just calls the new xlogreader code.
I aggree that it is not very nice to duplicate it. But I also don't want to go
the route of replacing ReadRecord with it for a while, we can replace
ReadRecord later if we want. As long as it is in flux like it is right now I
don't really see the point in investing energy in it.
Also I am not that sure how a callback oriented API fits into the xlog.c
workflow?

> > Missing:
> > - "compressing" the stream when removing uninteresting records
> > - writing out correct CRCs
> > - validating CRCs
> > - separating reader/writer
>
> - comments.
> At a quick glance, I couldn't figure out how this works. There seems to
> be some callback functions? If you want to read an xlog stream using
> this facility, what do you do?
You currently have to fill out 4 callbacks:

XLogReaderStateInterestingCB is_record_interesting;
XLogReaderStateWriteoutCB writeout_data;
XLogReaderStateFinishedRecordCB finished_record;
XLogReaderStateReadPageCB read_page;

As an example how to use it (from the walsender support for
START_LOGICAL_REPLICATION):

if(!xlogreader_state){
xlogreader_state = XLogReaderAllocate();
xlogreader_state->is_record_interesting =
RecordRelevantForLogicalReplication;
xlogreader_state->finished_record = ProcessRecord;
xlogreader_state->writeout_data = WriteoutData;
xlogreader_state->read_page = XLogReadPage;

/* startptr is the current XLog position */
xlogreader_state->startptr = startptr;

XLogReaderReset(xlogreader_state);
}

/* how far does valid data go */
xlogreader_state->endptr = endptr;

XLogReaderRead(xlogreader_state);

The last step will then call the above callbacks till it reaches endptr. I.e.
it first reads a page with "read_page"; then checks whether a record is
interesting for the use-case ("is_record_interesting"); in case it is
interesting, it gets reassembled and passed to the "finished_record" callback.
Then the bytestream gets written out again with "writeout_data".

In this case it gets written to the buffer the walsender has allocated. In
others it might just get thrown away.

> Can this be used for writing WAL, as well as reading? If so, what do you
> need the write support for?
It currently can replace records which are not interesting (e.g. index changes
in the case of logical rep). Filtered records are replaced with XLOG_NOOP
records with correct length currently. In future the actual amount of data
should really be reduced. I don't know yet know how to map LSNs of
uncompressed/compressed stream onto each other...
The filtered data is then passed to a writeout callback (in a streaming
fashion).

The whole writing out part is pretty ugly at the moment and I just bolted it
ontop because it was convenient for the moment. I am not yet sure how the api
for that should look....

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-06-14 21:39:40 Re: measuring spinning
Previous Message Heikki Linnakangas 2012-06-14 21:34:05 Re: WIP: relation metapages