Re: using Tsearch2 for chemical text



Naz, a link to the dict_regex dictionary for tsearch2 has been posted: http://lynx.sao.ru/~karpov/software/postgres_dict_regex.html

Feel free to test it and send us feedback. It's rather general, of course,
since it uses regex (the pcre library).

Oleg
On Thu, 26 Jul 2007, Naz Gassiep wrote:


I think you might need to write a custom lexer to divide the strings
into meaningful units.  If there are subsections of these names that
make sense to search for, then tsearch2 can certainly handle the
mechanics of that, but I doubt that the standard rules will divide
these names into lexemes usefully.

A custom lexer for tsearch2 that recognized chemistry-related lexical components (di-, tetra-, acetyl-, ethan-, -oic, -ane, -ene, etc.) would *hugely* increase the out-of-the-box applicability of PostgreSQL to scientific applications. Such an effort could perhaps be coordinated with a physics-based lexer and a biology-related lexer to provide a unified lexer with full scientific capabilities, in the way that PostGIS provides unified geospatial capabilities.
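
To make the idea of chemistry-aware lexical components concrete, here is a rough standalone Python sketch of splitting a name into such units with a regex. It is only an illustration under stated assumptions: it is not the tsearch2 parser API and not dict_regex, the function name split_chemical_name is hypothetical, and the morpheme list is a toy one (the first seven entries are the ones named above, the rest are made up for the example).

```python
import re

# Hypothetical, deliberately tiny morpheme list. The first seven entries are
# the components named in the message; the remainder are illustrative only.
MORPHEMES = ["di", "tetra", "acetyl", "ethan", "oic", "ane", "ene",
             "chloro", "meth", "eth", "prop", "but", "yl", "ol"]

# Put longer alternatives first so that "ethan" is tried before "eth".
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, MORPHEMES), key=len, reverse=True))
)


def split_chemical_name(name):
    """Split a name into known morphemes plus any unrecognized fragments."""
    lexemes = []
    # Treat digits, commas, hyphens and spaces as separators.
    for part in re.split(r"[^a-z]+", name.lower()):
        pos = 0
        for match in _PATTERN.finditer(part):
            if match.start() > pos:
                # Keep text between recognized morphemes as its own fragment.
                lexemes.append(part[pos:match.start()])
            lexemes.append(match.group())
            pos = match.end()
        if pos < len(part):
            lexemes.append(part[pos:])
    return lexemes


print(split_chemical_name("dichloromethane"))  # ['di', 'chloro', 'meth', 'ane']
print(split_chemical_name("ethanoic acid"))    # ['ethan', 'oic', 'acid']
```

A real lexer would need a far larger vocabulary and rules for ambiguous splits; the left-to-right, longest-alternative-first matching here is just the simplest choice for a sketch.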

I don't know how best to bring such an effort about, but I do know that if such a thing were created it would be a boon for PostgreSQL, giving it a very significant leg up in terms of functionality, not to mention the great positive impact that the wide, free availability of such a tool would have on the scientific research community.




	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


