Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
Date: 2014-02-12 05:59:50
Message-ID: CAJrrPGcjd0xgR5cyFTerVbweSHCScDCdNjMh9ey4H432XKVWXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 8, 2014 at 1:09 PM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:

> Hello,
>
> Because of time pressure in the commit-fest:Jan, I tried to simplifies the
> patch
> for cache-only scan into three portions; (1) add a hook on heap_page_prune
> for cache invalidation on vacuuming a particular page. (2) add a check to
> accept
> InvalidBuffer on SetHintBits (3) a proof-of-concept module of cache-only
> scan.
>
> (1) pgsql-v9.4-heap_page_prune_hook.v1.patch
> Once on-memory columnar cache is constructed, then it needs to be
> invalidated
> if heap page on behalf of the cache is modified. In usual DML cases,
> extension
> can get control using row-level trigger functions for invalidation,
> however, we right
> now have no way to get control on a page is vacuumed, usually handled by
> autovacuum process.
> This patch adds a callback on heap_page_prune(), to allow extensions to
> prune
> dead entries on its cache, not only heap pages.
> I'd also like to see any other scenario we need to invalidate columnar
> cache
> entries, if exist. It seems to me object_access_hook makes sense to conver
> DDL and VACUUM FULL scenario...
>
>
> (2) pgsql-v9.4-HeapTupleSatisfies-accepts-InvalidBuffer.v1.patch
> In case when we want to check visibility of the tuples on cache entries
> (thus
> no particular shared buffer is associated) using
> HeapTupleSatisfiesVisibility,
> it internally tries to update hint bits of tuples. However, it does
> not make sense
> onto the tuples being not associated with a particular shared buffer.
> Due to its definition, tuple entries being on cache does not connected with
> a particular shared buffer. If we need to load whole of the buffer page to
> set
> hint bits, it is totally nonsense because the purpose of on-memory cache is
> to reduce disk accesses.
> This patch adds an exceptional condition on SetHintBits() to skip anything
> if the given buffer is InvalidBuffer. It allows to check tuple
> visibility using regular
> visibility check functions, without re-invention of the wheel by
> themselves.
>
>
> (3) pgsql-v9.4-contrib-cache-scan.v1.patch
> Unlike (1) and (2), this patch is just a proof of the concept to
> implement cache-
> only scan on top of the custom-scan interface.
> It tries to offer an alternative scan path on the table with row-level
> triggers for
> cache invalidation if total width of referenced columns are less than 30%
> of the
> total width of table definition. Thus, it can keep larger number of
> records with
> meaningful portion on the main memory.
> This cache shall be invalidated according to the main heap update. One is
> row-level trigger, second is object_access_hook on DDL, and the third is
> heap_page_prune hook. Once a columns reduced tuple gets cached, it is
> copied to the cache memory from the shared buffer, so it needs a feature
> to ignore InvalidBuffer for visibility check functions.
>

I reviewed all the three patches. The first 1 and 2 core PostgreSQL patches
are fine.
And I have comments in the third patch related to cache scan.

1. +# contrib/dbcache/Makefile

Makefile header comment is not matched with file name location.

2.+ /*
+ * Estimation of average width of cached tuples - it does not make
+ * sense to construct a new cache if its average width is more than
+ * 30% of the raw data.
+ */

Move the estimation of average width calculation of cached tuples into
the case where the cache is not found,
otherwise it is an overhead for cache hit scenario.

3. + if (old_cache)
+ attrs_used = bms_union(attrs_used, &old_cache->attrs_used);

can't we need the check to see the average width is more than 30%?
During estimation it doesn't
include the existing other attributes.

4. + lchunk->right = cchunk;
+ lchunk->l_depth = TTREE_DEPTH(lchunk->right);

I think it should be lchunk->r_depth needs to be set in a clock wise
rotation.

5. can you add some comments in the code with how the block is used?

6. In do_insert_tuple function I felt moving the tuples and rearranging
their addresses is little bit costly. How about the following way?

Always insert the tuple from the bottom of the block where the empty
space is started and store their corresponding reference pointers
in the starting of the block in an array. As and when the new tuple
inserts this array increases from block start and tuples from block end.
Just need to sort this array based on item pointers, no need to update
their reference pointers.

In this case the movement is required only when the tuple is moved from
one block to another block and also whenever if the continuous
free space is not available to insert the new tuple. you can decide
based on how frequent the sorting will happen in general.

7. In ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups);
can go wrong when only one tuple present in the block
with the equal item pointer what we are searching in the forward scan
direction.

8. I am not able to find a protection mechanism in insert/delete and etc of
a tuple in Ttree. As this is a shared memory it can cause problems.

9. + /* merge chunks if this chunk has enough space to merge */
+ ccache_merge_chunk(ccache, cchunk);

calling the merge chunks for every call back of heap page prune is a
overhead for vacuum. After the merge which may again leads
to node splits because of new data.

10. "columner" is present in some places of the patch. correct it.

11. In cache_scan_next function, incase of cache insert fails because of
shared memory the tuple pointer is not reset and cache is NULL.
Because of this during next record fetch it leads to assert as cache !=
NULL.

12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
+ cs_put_ccache(ccache);

The cache is created with refcnt as 2 and in some times two times put
cache is called to eliminate it and in some times with a different approach.
It is little bit confusing, can you explain in with comments with why 2
is required and how it maintains?

13. A performance report is required to see how much impact it can cause on
insert/delete/update operations because of cache synchronizer.

14. The Guc variable "cache_scan_disabled" is missed in docs description.

please let me know if you need any support.

Regards,
Hari Babu
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Inoue, Hiroshi 2014-02-12 06:31:20 Re: narwhal and PGDLLIMPORT
Previous Message Magnus Hagander 2014-02-12 05:12:43 Re: narwhal and PGDLLIMPORT