Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
Date: 2014-02-08 02:09:29
Message-ID: CADyhKSXCKu33hbLfdwi5_GGfqf20fd1F6QSScO+2pMOPkjBncg@mail.gmail.com
Lists: pgsql-hackers

Hello,

Because of time pressure in the January commit-fest, I tried to simplify the
patch for cache-only scan by splitting it into three portions: (1) add a hook
on heap_page_prune for cache invalidation when a particular page is vacuumed,
(2) add a check to accept InvalidBuffer in SetHintBits, and (3) a
proof-of-concept module for cache-only scan.

(1) pgsql-v9.4-heap_page_prune_hook.v1.patch
Once an on-memory columnar cache is constructed, it needs to be invalidated
if the heap page behind the cache is modified. In the usual DML cases, an
extension can get control for invalidation using row-level trigger functions;
however, right now we have no way to get control when a page is vacuumed,
which is usually handled by the autovacuum process.
This patch adds a callback on heap_page_prune() to allow extensions to prune
dead entries in their own cache, not only in heap pages.
I'd also like to hear about any other scenarios in which we need to invalidate
columnar cache entries, if any exist. It seems to me object_access_hook makes
sense to cover the DDL and VACUUM FULL scenarios...
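
As a rough illustration of how an extension might consume such a hook, here
is a standalone sketch. It is not taken from the patch: the typedefs are
stand-ins for the server headers, the void return type of the hook is an
assumption (only the call site is shown in the diff), and every cache_scan_*
and mock_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types; in the server these come from the PostgreSQL headers. */
typedef struct RelationData { uint32_t relid; } *Relation;
typedef int Buffer;
typedef uint32_t TransactionId;

/* Hook type inferred from the call site in the patch (return type assumed). */
typedef void (*heap_page_prune_hook_type) (Relation relation,
                                           Buffer buffer,
                                           int ndeleted,
                                           TransactionId OldestXmin,
                                           TransactionId latestRemovedXid);

/* Lives in the core in the patch; defined here so the sketch links. */
heap_page_prune_hook_type heap_page_prune_hook = NULL;

/* Hypothetical extension state. */
static heap_page_prune_hook_type prev_prune_hook = NULL;
static int cache_pages_invalidated = 0;

/* Hypothetical callback: drop cached entries for a page that lost tuples. */
static void
cache_scan_on_page_prune(Relation relation, Buffer buffer, int ndeleted,
                         TransactionId OldestXmin,
                         TransactionId latestRemovedXid)
{
    assert(relation != NULL);

    /* Chain to any hook that was installed before ours. */
    if (prev_prune_hook)
        prev_prune_hook(relation, buffer, ndeleted,
                        OldestXmin, latestRemovedXid);

    /* Only pages that actually had tuples removed need cache cleanup. */
    if (ndeleted > 0)
        cache_pages_invalidated++;
}

/* What an extension would do from its _PG_init(). */
void
cache_scan_install_hook(void)
{
    prev_prune_hook = heap_page_prune_hook;
    heap_page_prune_hook = cache_scan_on_page_prune;
}

int
cache_scan_invalidation_count(void)
{
    return cache_pages_invalidated;
}

/* Mimics only the tail of heap_page_prune() after the patch is applied. */
int
mock_heap_page_prune(Relation relation, Buffer buffer, int ndeleted,
                     TransactionId OldestXmin, TransactionId latestRemovedXid)
{
    if (heap_page_prune_hook)
        (*heap_page_prune_hook) (relation, buffer, ndeleted,
                                 OldestXmin, latestRemovedXid);
    return ndeleted;
}
```

Saving the previous hook value and calling it first follows the usual
convention for PostgreSQL hooks, so that several extensions can share one
hook variable.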

(2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
When we want to check the visibility of tuples in cache entries (which have
no particular shared buffer associated) using HeapTupleSatisfiesVisibility,
it internally tries to update the hint bits of the tuples. However, that does
not make sense for tuples that are not associated with a particular shared
buffer. By definition, tuple entries in the cache are not connected with
a particular shared buffer. Loading the whole buffer page just to set
hint bits would be nonsense, because the purpose of the on-memory cache is
to reduce disk accesses.
This patch adds an exceptional condition to SetHintBits() so that it does
nothing if the given buffer is InvalidBuffer. It allows extensions to check
tuple visibility using the regular visibility check functions, without
reinventing the wheel.
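
To make the intended behavior concrete, here is a standalone sketch of the
early return. It is a simplification, not the real function: the macros
mirror their PostgreSQL definitions, but MockTupleHeader, mock_set_hint_bits
and the dirty counter are hypothetical stand-ins, and the WAL-flush check
that the real SetHintBits() performs for a valid xid is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int Buffer;
typedef uint32_t TransactionId;

#define InvalidBuffer           0
#define BufferIsInvalid(b)      ((b) == InvalidBuffer)
#define InvalidTransactionId    ((TransactionId) 0)
#define HEAP_XMIN_COMMITTED     0x0100

/* Stand-in for HeapTupleHeaderData; only the field used here. */
typedef struct MockTupleHeader
{
    uint16_t    t_infomask;
} MockTupleHeader;

/* Counts how often the sketch would have marked a shared buffer dirty. */
static int buffers_dirtied = 0;

int
mock_buffers_dirtied(void)
{
    return buffers_dirtied;
}

/*
 * Simplified SetHintBits() with the proposed check: a tuple that lives only
 * in an extension's cache is passed with InvalidBuffer and is left untouched.
 */
void
mock_set_hint_bits(MockTupleHeader *tuple, Buffer buffer,
                   uint16_t infomask, TransactionId xid)
{
    assert(tuple != NULL);

    if (BufferIsInvalid(buffer))
        return;                 /* cached copy: no buffer to update */

    (void) xid;                 /* the real code checks xid's commit LSN here */
    tuple->t_infomask |= infomask;
    buffers_dirtied++;          /* stands in for marking the buffer dirty */
}
```

With this check in place, a caller can hand a cached tuple and InvalidBuffer
to the regular visibility functions and get an answer without any page
being read or dirtied.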

(3) pgsql-v9.4-contrib-cache-scan.v1.patch
Unlike (1) and (2), this patch is just a proof of concept that implements
cache-only scan on top of the custom-scan interface.
It tries to offer an alternative scan path on a table that has row-level
triggers for cache invalidation, if the total width of the referenced columns
is less than 30% of the total width of the table definition. Thus, it can keep
a larger number of records in main memory, holding only their meaningful
portion.
This cache is invalidated in step with updates to the main heap, by three
mechanisms: the first is the row-level trigger, the second is
object_access_hook on DDL, and the third is the heap_page_prune hook. Once a
column-reduced tuple gets cached, it is copied from the shared buffer to the
cache memory, so it needs the feature that lets the visibility check functions
ignore InvalidBuffer.
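
The 30% rule above amounts to a simple width comparison. A minimal sketch,
assuming widths are given in bytes; the function name and the constant are
hypothetical and not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the description above: referenced width < 30% of total. */
#define CACHE_SCAN_MAX_WIDTH_PERCENT 30

/*
 * Returns true if the referenced columns are narrow enough, relative to the
 * whole row, for the cache-only scan path to be offered.  Integer arithmetic
 * avoids floating-point rounding at the boundary.
 */
bool
cache_scan_width_ok(int referenced_width, int total_width)
{
    assert(referenced_width >= 0 && total_width > 0);

    return (long) referenced_width * 100 <
           (long) total_width * CACHE_SCAN_MAX_WIDTH_PERCENT;
}
```

For example, referencing 4 bytes out of a 40-byte row (10%) qualifies, while
referencing 8 bytes out of 24 (about 33%) does not.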

Volunteers to review the patches are welcome, especially for (1) and (2),
which are very small.

Thanks,

2014-01-21 KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> Hello,
>
> I revisited the patch for the contrib/cache_scan extension.
> The previous one had a problem: when a T-tree node had to be rebalanced,
> it crashed on merging the node.
>
> Even though the contrib/cache_scan portion has more than 2KL of code,
> what I'd like to discuss first is the portion of core enhancements
> to run an MVCC snapshot on cached tuples, and to get a callback on
> vacuumed pages for cache synchronization.
>
> Any comments are welcome.
>
> Thanks,
>
>
> (2014/01/15 0:06), Kohei KaiGai wrote:
>>
>> Hello,
>>
>> The attached patch is what we discussed just before the November commit-fest.
>>
>> It implements an alternative way to scan a particular table using an
>> on-memory cache instead of the usual heap access method. Unlike the buffer
>> cache, this mechanism caches a limited number of columns in memory, so
>> memory consumption per tuple is much smaller than with the regular heap
>> access method, and a much larger number of tuples can be kept in memory.
>>
>> I'd like to extend this idea to cache data in a column-oriented data
>> structure, to utilize parallel calculation processors like CPU SIMD
>> operations or simple GPU cores. (It probably makes sense to evaluate
>> multiple records with a single vector instruction if the contents of
>> a particular column are stored as a large array.)
>> However, this patch still keeps all the tuples in row-oriented data format,
>> because row <=> column translation would make this patch bigger than its
>> current form (about 2KL), and GPU integration needs to link a proprietary
>> library (cuda or opencl), so I thought it is not preferable for the
>> upstream code.
>>
>> Also note that this patch needs the part-1 ~ part-3 patches of the
>> CustomScan APIs as prerequisites because it is implemented on top of them.
>>
>> One thing I have to apologize for is the lack of documentation and source
>> code comments around the contrib/ code. Please give me a couple of days to
>> clean up the code.
>> Aside from the extension code, I put two enhancements into the core code
>> as follows. I'd like to have a discussion about the adequacy of these
>> enhancements.
>>
>> The first enhancement is a hook on heap_page_prune() to synchronize the
>> internal state of an extension with changes to the heap image on disk.
>> The cache unavoidably accumulates garbage over time, so it needs to be
>> cleaned up as the vacuum process does. The best time to do so is when
>> dead tuples are reclaimed, because it is certain that nobody will
>> reference those tuples any more.
>>
>> diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
>> --- a/src/backend/access/heap/pruneheap.c
>> +++ b/src/backend/access/heap/pruneheap.c
>>      bool        marked[MaxHeapTuplesPerPage + 1];
>>  } PruneState;
>>
>> +/* Callback for each page pruning */
>> +heap_page_prune_hook_type heap_page_prune_hook = NULL;
>> +
>>  /* Local functions */
>>  static int heap_prune_chain(Relation relation, Buffer buffer,
>>                   OffsetNumber rootoffnum,
>> @@ -294,6 +297,16 @@ heap_page_prune(Relation relation, Buffer buffer, TransactionId OldestXmin,
>>       * and update FSM with the remaining space.
>>       */
>>
>> +    /*
>> +     * This callback allows extensions to synchronize their own status with
>> +     * heap image on the disk, when this buffer page is vacuumed.
>> +     */
>> +    if (heap_page_prune_hook)
>> +        (*heap_page_prune_hook)(relation,
>> +                                buffer,
>> +                                ndeleted,
>> +                                OldestXmin,
>> +                                prstate.latestRemovedXid);
>>      return ndeleted;
>>  }
>>
>>
>> The second enhancement makes SetHintBits() accept InvalidBuffer and
>> skip all its work. We need to check the visibility of cached tuples when a
>> custom-scan node scans the cached table instead of the heap.
>> Even though we can use an MVCC snapshot to check a tuple's visibility,
>> it may internally set hint bits on tuples, so we always need to give
>> a valid buffer pointer to HeapTupleSatisfiesVisibility(). Unfortunately,
>> it kills all the benefit of the table cache if we have to load the heap
>> buffer associated with the cached tuple.
>> So, I'd like to have special-case handling in SetHintBits() for a
>> dry run when InvalidBuffer is given.
>>
>> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
>> index f626755..023f78e 100644
>> --- a/src/backend/utils/time/tqual.c
>> +++ b/src/backend/utils/time/tqual.c
>> @@ -103,11 +103,18 @@ static bool XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot);
>>   *
>>   * The caller should pass xid as the XID of the transaction to check, or
>>   * InvalidTransactionId if no check is needed.
>> + *
>> + * In case when the supplied HeapTuple is not associated with a particular
>> + * buffer, it just returns without any jobs. It may happen when an extension
>> + * caches tuple with their own way.
>>   */
>>  static inline void
>>  SetHintBits(HeapTupleHeader tuple, Buffer buffer,
>>              uint16 infomask, TransactionId xid)
>>  {
>> +    if (BufferIsInvalid(buffer))
>> +        return;
>> +
>>      if (TransactionIdIsValid(xid))
>>      {
>>          /* NB: xid must be known committed here! */
>>
>> Thanks,
>>
>> 2013/11/13 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
>>>
>>> 2013/11/12 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>>>
>>>> Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> writes:
>>>>>
>>>>> So, are you thinking it is a feasible approach to focus on the custom-scan
>>>>> APIs during the upcoming CF3, then the table-caching feature as a use-case
>>>>> of these APIs in CF4?
>>>>
>>>>
>>>> Sure. If you work on this extension after CF3, and it reveals that the
>>>> custom scan stuff needs some adjustments, there would be time to do that
>>>> in CF4. The policy about what can be submitted in CF4 is that we don't
>>>> want new major features that no one has seen before, not that you can't
>>>> make fixes to previously submitted stuff. Something like a new hook
>>>> in vacuum wouldn't be a "major feature", anyway.
>>>>
>>> Thanks for this clarification.
>>> Three days are too short to write a patch; however, two months may be
>>> sufficient to develop a feature on top of the scheme being discussed in
>>> the previous commitfest.
>>>
>>> Best regards,
>>> --
>>> KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
>>
>>
>>
>>
>>
>>
>
> --
> OSS Promotion Center / The PG-Strom Project
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

Attachment                                                    Content-Type              Size
pgsql-v9.4-contrib-cache-scan.v1.patch                        application/octet-stream  74.8 KB
pgsql-v9.4-heap_page_prune_hook.v1.patch                      application/octet-stream  1.7 KB
pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch  application/octet-stream  840 bytes
