Re: Slow running to_tsvector (tsearch2 in PG 8.2.3)

Lists: pgsql-general
From: Henrik Zagerholm <henke(at)mac(dot)se>
To: pgsql-general(at)postgresql(dot)org
Subject: Slow running to_tsvector (tsearch2 in PG 8.2.3)
Date: 2007-02-21 09:59:32
Message-ID: 81E2F3F6-0543-4D78-B205-3EC575D3A081@mac.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hello list,

I've been testing tsearch2 for a while and I recently noticed some
really slow queries.

This is a quite big document so the times are maybe accurate.

The document has about 194 000 words.

I put all the data in tbl_fulltext.fulltext_text and then did a

update tbl_fulltext set vectors = to_tsvector(fulltext_text);

Which takes about 80 seconds to complete.
The hardware is a Pentium 4 2.8GHz with 1GB HyperX memory.

Is this normal? What can I tweak in postgresql.conf to speed up big
to_tsvector()?

Regards,
Henrik


From: Richard Huxton <dev(at)archonet(dot)com>
To: Henrik Zagerholm <henke(at)mac(dot)se>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow running to_tsvector (tsearch2 in PG 8.2.3)
Date: 2007-02-21 11:05:58
Message-ID: 45DC2796.4030102@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Henrik Zagerholm wrote:
> The document has about 194 000 words.

> update tbl_fulltext set vectors = to_tsvector(fulltext_text);
>
> Which takes about 80 seconds to complete.
> The hardware is a Pentium 4 2.8GHz with 1GB HyperX memory.
>
> Is this normal? What can I tweak in postgresql.conf to speed up big
> to_tsvector()?

Is this a problem? How many 194,000 word documents do you index? If you
have many, you might want to separate them into chapters. I think
tsearch2 is more targetted at web-pages or short PDFs rather than whole
books.

--
Richard Huxton
Archonet Ltd


From: Markus Schiltknecht <markus(at)bluegap(dot)ch>
To: Henrik Zagerholm <henke(at)mac(dot)se>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow running to_tsvector (tsearch2 in PG 8.2.3)
Date: 2007-02-21 11:09:14
Message-ID: 45DC285A.4010106@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

Henrik Zagerholm wrote:
> Which takes about 80 seconds to complete.
> The hardware is a Pentium 4 2.8GHz with 1GB HyperX memory.
>
> Is this normal? What can I tweak in postgresql.conf to speed up big
> to_tsvector()?

Hm.. seems not too unreasonable to me.

Take a look at the stemmers or dictionaries involved. What do you use there?

Regards

Markus


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Henrik Zagerholm <henke(at)mac(dot)se>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow running to_tsvector (tsearch2 in PG 8.2.3)
Date: 2007-02-21 12:01:25
Message-ID: Pine.LNX.4.64.0702211458570.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 21 Feb 2007, Henrik Zagerholm wrote:

> Hello list,
>
> I've been testing tsearch2 for a while and I recently noticed some really
> slow queries.
>
> This is a quite big document so the times are maybe accurate.
>
> The document has about 194 000 words.
>
> I put all the data in tbl_fulltext.fulltext_text and then did a
>
> update tbl_fulltext set vectors = to_tsvector(fulltext_text);
>
> Which takes about 80 seconds to complete.
> The hardware is a Pentium 4 2.8GHz with 1GB HyperX memory.
>
> Is this normal? What can I tweak in postgresql.conf to speed up big
> to_tsvector()?

What is your configuration ? to_tsvector does a lot of work.

>
> Regards,
> Henrik
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Markus Schiltknecht <markus(at)bluegap(dot)ch>
Cc: Henrik Zagerholm <henke(at)mac(dot)se>, pgsql-general(at)postgresql(dot)org
Subject: Re: Slow running to_tsvector (tsearch2 in PG 8.2.3)
Date: 2007-02-21 14:44:43
Message-ID: 13205.1172069083@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Markus Schiltknecht <markus(at)bluegap(dot)ch> writes:
> Henrik Zagerholm wrote:
>> Is this normal? What can I tweak in postgresql.conf to speed up big
>> to_tsvector()?

> Hm.. seems not too unreasonable to me.
> Take a look at the stemmers or dictionaries involved. What do you use there?

Also, I wonder how much of the time went into to_tsvector() and how much
into updating the GIST or GIN index that I suppose is on the column.
For a full update like this it might make sense to drop the index,
update the column, rebuild the index from scratch.

regards, tom lane


From: cedric <cedric(at)over-blog(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Slow running to_tsvector (tsearch2 in PG 8.2.3)
Date: 2007-02-21 15:01:16
Message-ID: 200702211601.17661.cedric@over-blog.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le mercredi 21 février 2007 15:44, Tom Lane a écrit :
> Markus Schiltknecht <markus(at)bluegap(dot)ch> writes:
> > Henrik Zagerholm wrote:
> >> Is this normal? What can I tweak in postgresql.conf to speed up big
> >> to_tsvector()?
> >
> > Hm.. seems not too unreasonable to me.
> > Take a look at the stemmers or dictionaries involved. What do you use
> > there?
>
> Also, I wonder how much of the time went into to_tsvector() and how much
> into updating the GIST or GIN index that I suppose is on the column.
> For a full update like this it might make sense to drop the index,
> update the column, rebuild the index from scratch.
a count of distinct word in those 194 000 can be interesting also, I presume.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


From: Henrik Zagerholm <henke(at)mac(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Markus Schiltknecht <markus(at)bluegap(dot)ch>, pgsql-general(at)postgresql(dot)org
Subject: Re: Slow running to_tsvector (tsearch2 in PG 8.2.3)
Date: 2007-02-21 15:03:04
Message-ID: A6488B00-7975-49DD-B6D5-EA6B34C27449@mac.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


21 feb 2007 kl. 15:44 skrev Tom Lane:

> Markus Schiltknecht <markus(at)bluegap(dot)ch> writes:
>> Henrik Zagerholm wrote:
>>> Is this normal? What can I tweak in postgresql.conf to speed up big
>>> to_tsvector()?
>
>> Hm.. seems not too unreasonable to me.
>> Take a look at the stemmers or dictionaries involved. What do you
>> use there?
I tried both the default english and simple stemmers. Simple was a
little bit faster but not much.
>
What is your configuration ? to_tsvector does a lot of work.
I haven't tweaked the postgresql.conf yet. What should I focus on?
work_mem, shared_buffers?

> Also, I wonder how much of the time went into to_tsvector() and how
> much
> into updating the GIST or GIN index that I suppose is on the column.
> For a full update like this it might make sense to drop the index,
> update the column, rebuild the index from scratch.
I actually tested to remove the GIN index on the vectors column and
it didn't have any noticeable effect. Maybe because its the only
tuple in the database.

Thanks for all your input!

regards, henrik

>
> regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings