Re: Big table with UNION ALL or partitioning with Tsearch2

Lists: pgsql-general
From: "Benjamin Arai" <me(at)benjaminarai(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Big table with UNION ALL or partitioning with Tsearch2
Date: 2007-07-12 16:38:56
Message-ID: 1832.131.107.65.118.1184258336.squirrel@webmail.benjaminarai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

I have a really big Tsearch2 table (100s GB) that takes a while to perform
queries and takes days to index. Is there any way to fix these issues
using UNIONs or partitioning? I was thinking that I could partition the
data by date but since I am always performing queries on the Tsearch2
field I do not know if this will help performance. I think paritioning
will help the indexing problem since I can incrementally re-index the data
but again I figured it would be better to ask.

Any suggestions will be greatly appreciated. Thanks in advance.

System I am running on:

-Raid 5 with 16x drives
-Quad core XEON
16 GB of memory (Any suggestion on the postgresql.conf setup would also be
great! Currently I am just setting shared mem to 8192MB)
-x86_64 but Redhat 5 Ent

Benjamin


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: me(at)benjaminarai(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Big table with UNION ALL or partitioning with Tsearch2
Date: 2007-07-12 17:03:16
Message-ID: 46965ED4.8090600@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Benjamin Arai wrote:
> Hi,
>
> I have a really big Tsearch2 table (100s GB) that takes a while to perform
> queries and takes days to index. Is there any way to fix these issues
> using UNIONs or partitioning? I was thinking that I could partition the
> data by date but since I am always performing queries on the Tsearch2
> field I do not know if this will help performance. I think paritioning
> will help the indexing problem since I can incrementally re-index the data
> but again I figured it would be better to ask.
>
> Any suggestions will be greatly appreciated. Thanks in advance.
>
> System I am running on:
>
> -Raid 5 with 16x drives

RAID 5 with 16 spindles? RAID 10 will give you better performance I
would think.

> -Quad core XEON
> 16 GB of memory (Any suggestion on the postgresql.conf setup would also be
> great! Currently I am just setting shared mem to 8192MB)

Assuming 8.1+ I would try something much more aggressive, like 4GB.
Dont' forget your effective_cache_size.

Joshua D. Drake

> -x86_64 but Redhat 5 Ent
>
> Benjamin
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


From: "Benjamin Arai" <me(at)benjaminarai(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: me(at)benjaminarai(dot)com, pgsql-general(at)postgresql(dot)org
Subject: Re: Big table with UNION ALL or partitioning with Tsearch2
Date: 2007-07-12 17:20:42
Message-ID: 5208.131.107.65.118.1184260842.squirrel@webmail.benjaminarai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> Benjamin Arai wrote:
>> Hi,
>>
>> I have a really big Tsearch2 table (100s GB) that takes a while to
>> perform
>> queries and takes days to index. Is there any way to fix these issues
>> using UNIONs or partitioning? I was thinking that I could partition the
>> data by date but since I am always performing queries on the Tsearch2
>> field I do not know if this will help performance. I think paritioning
>> will help the indexing problem since I can incrementally re-index the
>> data
>> but again I figured it would be better to ask.
>>
>> Any suggestions will be greatly appreciated. Thanks in advance.
>>
>> System I am running on:
>>
>> -Raid 5 with 16x drives
>
> RAID 5 with 16 spindles? RAID 10 will give you better performance I
> would think.
>
>
>> -Quad core XEON
>> 16 GB of memory (Any suggestion on the postgresql.conf setup would also
>> be
>> great! Currently I am just setting shared mem to 8192MB)
>
> Assuming 8.1+ I would try something much more aggressive, like 4GB.
> Dont' forget your effective_cache_size.
How is 4GB more aggressive? How large should the effective_cache_size be?
>
> Joshua D. Drake
>
>
>> -x86_64 but Redhat 5 Ent
>>
>> Benjamin
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>> message can get through to the mailing list cleanly
>>
>
>
> --
>
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive PostgreSQL solutions since 1997
> http://www.commandprompt.com/
>
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> PostgreSQL Replication: http://www.commandprompt.com/products/
>
>


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Benjamin Arai <me(at)benjaminarai(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Big table with UNION ALL or partitioning with Tsearch2
Date: 2007-07-12 18:07:49
Message-ID: Pine.LNX.4.64.0707122204330.20068@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Benjamin,

what version of postgres and what type of index you used ?
The best setup is to use partitioning with rather small table for
new data and GiST index and big archive table with static data and
GIN index. I have some slides from PGDay
http://www.sai.msu.su/~megera/postgres/talks/fts-pgday-2007.pdf
Also, did you consider using dblink/dbilink to scale your search ?

Oleg
On Thu, 12 Jul 2007, Benjamin Arai wrote:

> Hi,
>
> I have a really big Tsearch2 table (100s GB) that takes a while to perform
> queries and takes days to index. Is there any way to fix these issues
> using UNIONs or partitioning? I was thinking that I could partition the
> data by date but since I am always performing queries on the Tsearch2
> field I do not know if this will help performance. I think paritioning
> will help the indexing problem since I can incrementally re-index the data
> but again I figured it would be better to ask.
>
> Any suggestions will be greatly appreciated. Thanks in advance.
>
> System I am running on:
>
> -Raid 5 with 16x drives
> -Quad core XEON
> 16 GB of memory (Any suggestion on the postgresql.conf setup would also be
> great! Currently I am just setting shared mem to 8192MB)
> -x86_64 but Redhat 5 Ent
>
> Benjamin
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: "Benjamin Arai" <me(at)benjaminarai(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Benjamin Arai" <me(at)benjaminarai(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Big table with UNION ALL or partitioning with Tsearch2
Date: 2007-07-12 18:19:18
Message-ID: 10109.131.107.65.118.1184264358.squirrel@webmail.benjaminarai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Oleg,

I am running the latest 8.2.4. I am using GIN. The data is static. I do
a batch upload every week of about 500GB and the ata is never touched
again, it is always add and never delete or update.

From your slides you state:

GIN_FUZZY_SEARCH_LIMIT - maximum number of
returned rows
– GIN_FUZZY_SEARCH_LIMIT=0, disabled on
default

When I do a search with say LIMIT 100 isn't this essentially the same thing?

Benjamin

> Benjamin,
>
> what version of postgres and what type of index you used ?
> The best setup is to use partitioning with rather small table for
> new data and GiST index and big archive table with static data and
> GIN index. I have some slides from PGDay
> http://www.sai.msu.su/~megera/postgres/talks/fts-pgday-2007.pdf
> Also, did you consider using dblink/dbilink to scale your search ?
>
> Oleg
> On Thu, 12 Jul 2007, Benjamin Arai wrote:
>
>> Hi,
>>
>> I have a really big Tsearch2 table (100s GB) that takes a while to
>> perform
>> queries and takes days to index. Is there any way to fix these issues
>> using UNIONs or partitioning? I was thinking that I could partition the
>> data by date but since I am always performing queries on the Tsearch2
>> field I do not know if this will help performance. I think paritioning
>> will help the indexing problem since I can incrementally re-index the
>> data
>> but again I figured it would be better to ask.
>>
>> Any suggestions will be greatly appreciated. Thanks in advance.
>>
>> System I am running on:
>>
>> -Raid 5 with 16x drives
>> -Quad core XEON
>> 16 GB of memory (Any suggestion on the postgresql.conf setup would also
>> be
>> great! Currently I am just setting shared mem to 8192MB)
>> -x86_64 but Redhat 5 Ent
>>
>> Benjamin
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>> message can get through to the mailing list cleanly
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>


From: Hannes Dorbath <light(at)theendofthetunnel(dot)de>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Benjamin Arai <me(at)benjaminarai(dot)com>
Subject: Re: Big table with UNION ALL or partitioning with Tsearch2
Date: 2007-07-12 22:04:55
Message-ID: 4696A587.8020804@theendofthetunnel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joshua D. Drake wrote:
> Benjamin Arai wrote:
> RAID 5 with 16 spindles? RAID 10 will give you better performance I
> would think.

I'd say RAID 5 is not that bad for this kind of query, at least if the
controller is worth anything. RAID 10 is the best choice for OLTP, but
if the main job for this query is to read an index as fast as possible,
RAID 5 might be well suited. I have good experience with TSearch queries
on a 8 drive RAID 6 setup.

Even if those 16 drives are cheap desktop SATAs, he should be able to
read with around 800MB/sec.

Benjamin, could you try if the following does change your query
performance in any way?

blockdev --setra 16384 /dev/sdX

--
Best regards,
Hannes Dorbath


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: <me(at)benjaminarai(dot)com>
Cc: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Big table with UNION ALL or partitioning with Tsearch2
Date: 2007-07-13 10:37:43
Message-ID: 1184323063.4512.121.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 2007-07-12 at 11:19 -0700, Benjamin Arai wrote:

> I am running the latest 8.2.4. I am using GIN. The data is static. I do
> a batch upload every week of about 500GB and the ata is never touched
> again, it is always add and never delete or update.

Partitioning will certainly help the index build times, even if it
doesn't specifically help with the queries.

> >From your slides you state:
>
> GIN_FUZZY_SEARCH_LIMIT - maximum number of
> returned rows
> – GIN_FUZZY_SEARCH_LIMIT=0, disabled on
> default
>
> When I do a search with say LIMIT 100 isn't this essentially the same thing?

Both restrict the number of matches found, but they're not the same
thing. One is for the query as a whole, the other is for one index scan
on a GIN index.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com