Re: multi terabyte fulltext searching


  • From: Arturo Perez <aperez(at)hayesinc(dot)com>
  • To: pgsql-general(at)postgresql(dot)org
  • Subject: Re: multi terabyte fulltext searching
  • Date: Thu, 22 Mar 2007 14:57:10 -0400
  • Message-id: <pan(dot)2007(dot)03(dot)22(dot)18(dot)57(dot)10(dot)22444(at)hayesinc(dot)com>

On Wed, 21 Mar 2007 08:57:39 -0700, Benjamin Arai wrote:

> Hi Oleg,
> 
> I am currently using GIST indexes because I receive about 10GB of new data
> a week (then again I am not deleting any information).  I do not expect
> to be able to stop receiving text for about 5 years, so the data is not
> going to become static any time soon.  The reason I am concerned with
> performance is that I am providing a search system for several newspapers
> since essentially the beginning of time.  Many bibliographers etc. would
> like to use this utility but if each search takes too long I am not going
> to be able to support many concurrent users.
> 
> Benjamin
>


At a previous job, I built a system to do this.  We had 3,000 publications
and approx 70M newspaper articles.  Total content size (postprocessed) was
more than 100GB, IIRC.  We used a proprietary (closed-source, not ours)
search engine.

In order to reach subsecond response time we needed to horizontally scale
to about 50-70 machines, each a low-end Dell 1650.  This was after about 5
years of trying to vertically scale.
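
To illustrate what horizontal scaling means for a search like this, here
is a minimal scatter-gather sketch in Python.  It is not the proprietary
engine we used, and every name in it (search_shard, scatter_gather, the
toy shards) is hypothetical.  It assumes each machine holds a slice of the
articles, that a query goes to all slices in parallel, and that the
partial results are merged at the end; the scoring is a plain term count,
chosen only to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def search_shard(shard, term):
    """Score each article in one shard by how often the term occurs."""
    term = term.lower()
    hits = []
    for doc_id, text in shard.items():
        score = text.lower().split().count(term)
        if score:
            hits.append((score, doc_id))
    return hits


def scatter_gather(shards, term, limit=10):
    """Query every shard in parallel, then merge the per-shard hits."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(lambda s: search_shard(s, term), shards)
        all_hits = [hit for part in partials for hit in part]
    # Keep only the best `limit` hits across all shards.
    return heapq.nlargest(limit, all_hits)


# Toy data: two "machines", each holding a few articles (hypothetical).
shards = [
    {1: "flood hits city", 2: "city council votes on city budget"},
    {3: "election results", 4: "city flood warning"},
]
top = scatter_gather(shards, "city", limit=2)
```

In a real deployment each shard would be a separate machine queried over
the network, so response time is bounded by the slowest shard rather than
by the total corpus size.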

-arturo



