Re: patch: preload dictionary new version

From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: preload dictionary new version
Date: 2010-07-08 02:50:51
Message-ID: 20100708115051.FFE1.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> this version has enhanced AllocSet allocator - it can use a mmap API.

I review your patch and will report some comments. However, I don't have
test cases for the patch because there is no large dictionaries in the
default postgres installation. I'd like to ask you to supply test data
for the patch.

This patch allocates memory with non-file-based mmap() to preload text search
dictionary files at the server start. Note that dist files are not mmap'ed
directly in the patch; mmap() is used for reallocatable shared memory.

The dictinary loader is also modified a bit to use simple_alloc() instead
of palloc() for long-lived cache. It can reduce calls of AllocSetAlloc(),
that have some overheads to support pfree(). Since the cache is never
released, simple_alloc() seems to have better performance than palloc().
Note that the optimization will also work for non-preloaded dicts.

=== Questions ===
- How do backends share the dict cache? You might expect postmaster's
catalog is inherited to backends with fork(), but we don't use fork()
on Windows.

- Why are SQL functions dpreloaddict_init() and dpreloaddict_lexize()
defined but not used?

=== Design ===
- You added 3 custom parameters (dict_preload.dictfile/afffile/stopwords),
but I think text search configuration names is better than file names.
However, it requires system catalog access but we cannot access any
catalog at the moment of preloading. If config-name-based setting is
difficult, we need to write docs about where we can get the dict names
to be preloaded instead. (from \dFd+ ?)

- Do we need to support multiple preloaded dicts? I think dict_preload.*
should accept a list of items to be loaded. GUC_LIST_INPUT will be a help.

- Server doesn't start when I added dict_preload to
shared_preload_libraries and didn't add any custom parameters.
FATAL: missing AffFile parameter
But server should start with no effects or print WARNING messages
for "no dicts are preloaded" in such case.

- We could replace simple_alloc() to a new MemoryContextMethods that
doesn't support pfree() but has better performance. It doesn't look
ideal for me to implement simple_alloc() on the top of palloc().

=== Implementation ===
I'm sure that your patch is WIP, but I'll note some issues just in case.

- We need Makefile for contrib/dict_preload.

- mmap() is not always portable. We should check the availability
in configure, and also have an alternative implementation for Win32.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message marcin mank 2010-07-08 03:52:14 Re: Proposal for 9.1: WAL streaming from WAL buffers
Previous Message Robert Haas 2010-07-08 02:26:15 Re: ALTER TABLE SET STATISTICS requires AccessExclusiveLock