Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: contrib/cache_scan (Re: What's needed for cache-only table scan?)
Date: 2014-02-12 15:42:29
Message-ID: CADyhKSVsKkXn1QH3i49ozWv-hT5WCX7z7xB-_5o3B0m+KUNv7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2014-02-12 14:59 GMT+09:00 Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>:
> I reviewed all the three patches. The first 1 and 2 core PostgreSQL patches
> are fine.
> And I have comments in the third patch related to cache scan.
>
Thanks for your volunteering.

> 1. +# contrib/dbcache/Makefile
>
> Makefile header comment is not matched with file name location.
>
Ahh, it's an old name when I started to implement.

> 2.+ /*
> + * Estimation of average width of cached tuples - it does not make
> + * sense to construct a new cache if its average width is more than
> + * 30% of the raw data.
> + */
>
> Move the estimation of average width calculation of cached tuples into
> the case where the cache is not found,
> otherwise it is an overhead for cache hit scenario.
>
You are right. If and when existing cache is found and match, its width is
obviously less than 30% of total width.

> 3. + if (old_cache)
> + attrs_used = bms_union(attrs_used, &old_cache->attrs_used);
>
> can't we need the check to see the average width is more than 30%? During
> estimation it doesn't
> include the existing other attributes.
>
Indeed. It should drop some attributes on the existing cache if total average
width grows more than the threshold. Probably, we need to have a statistical
variable to track how many times or how recently referenced.

> 4. + lchunk->right = cchunk;
> + lchunk->l_depth = TTREE_DEPTH(lchunk->right);
>
> I think it should be lchunk->r_depth needs to be set in a clock wise
> rotation.
>
Oops, nice cache.

> 5. can you add some comments in the code with how the block is used?
>
Sorry, I'll add it. A block is consumed from the head to store pointers of
tuples, and from the tail to store contents of the tuples. A block can hold
multiple tuples unless usage of tuple pointers from the head does not cross
the area for tuple contents. Anyway, I'll put it on the source code.

> 6. In do_insert_tuple function I felt moving the tuples and rearranging
> their addresses is little bit costly. How about the following way?
>
> Always insert the tuple from the bottom of the block where the empty
> space is started and store their corresponding reference pointers
> in the starting of the block in an array. As and when the new tuple
> inserts this array increases from block start and tuples from block end.
> Just need to sort this array based on item pointers, no need to update
> their reference pointers.
>
> In this case the movement is required only when the tuple is moved from
> one block to another block and also whenever if the continuous
> free space is not available to insert the new tuple. you can decide based
> on how frequent the sorting will happen in general.
>
It seems to me a reasonable suggestion.
Probably, an easier implementation is replacing an old block with dead-
spaces by a new block that contains only valid tuples, if and when dead-
space grows threshold of block-usage.

> 7. In ccache_find_tuple function this Assert(i_min + 1 < cchunk->ntups); can
> go wrong when only one tuple present in the block
> with the equal item pointer what we are searching in the forward scan
> direction.
>
It shouldn't happen, because the first or second ItemPointerCompare will
handle the condition. Please assume the cchunk->ntups == 1. In this case,
any given ctid shall match either of them, because any ctid is less, equal or
larger to the tuple being only cached, thus, it moves to the right or left node
according to the scan direction.

> 8. I am not able to find a protection mechanism in insert/delete and etc of
> a tuple in Ttree. As this is a shared memory it can cause problems.
>
For design simplification, I put a giant lock per columnar-cache.
So, routines in cscan.c acquires exclusive lwlock prior to invocation of
ccache_insert_tuple / ccache_delete_tuple.

> 9. + /* merge chunks if this chunk has enough space to merge */
> + ccache_merge_chunk(ccache, cchunk);
>
> calling the merge chunks for every call back of heap page prune is a
> overhead for vacuum. After the merge which may again leads
> to node splits because of new data.
>
OK, I'll check the condition to merge the chunks, to prevent too frequent
merge / split.

> 10. "columner" is present in some places of the patch. correct it.
>
Ahh, it should be "columnar".

> 11. In cache_scan_next function, incase of cache insert fails because of
> shared memory the tuple pointer is not reset and cache is NULL.
> Because of this during next record fetch it leads to assert as cache !=
> NULL.
>
You are right. I had to modify the state of scan as if normal-seqscan path,
not just setting NULL on csstate->ccache.
I left an older manner during try & error during implementation.

> 12. + if (ccache->status != CCACHE_STATUS_IN_PROGRESS)
> + cs_put_ccache(ccache);
>
> The cache is created with refcnt as 2 and in some times two times put
> cache is called to eliminate it and in some times with a different approach.
> It is little bit confusing, can you explain in with comments with why 2
> is required and how it maintains?
>
I thought, 2 is same as create + get, so putting the cache at end of the scan
does not release the cache. However, it might be confusing as you pointed out.
The process who create the cache knows it is the creator process. So, all it
needs to do is exiting the scan without putting the cache, if it successfully
create the cache and wants to leave the cache for later scan.

> 13. A performance report is required to see how much impact it can cause on
> insert/delete/update operations because of cache synchronizer.
>
OK, I'll try to measure the difference between them on next patch submission.

> 14. The Guc variable "cache_scan_disabled" is missed in docs description.
>
OK,

Thanks, I'll submit a revised one within a couple of days.
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-02-12 15:56:09 Re: Changeset Extraction v7.5
Previous Message Stefan Seifert 2014-02-12 14:54:31 Docs incorrectly claiming equivalence between show and pg_settings