Re: substring implementation (long string)

From: Joe Conway <mail(at)joeconway(dot)com>
To: Scott Cain <cain(at)cshl(dot)org>
Cc: pgsql General List <pgsql-general(at)postgresql(dot)org>
Subject: Re: substring implementation (long string)
Date: 2003-07-30 04:46:56
Message-ID: 3F274DC0.3040307@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Scott Cain wrote:
> I am wondering about the implementation of substring for very large
> strings. I've got strings that are several million characters long and
> frequently need to extract relatively small substrings (5000-40000
> characters) (that's right, it's DNA). Before I cared much about
> performance, I retrieved the whole string and and substr'ed it in perl.
> I realized recently it is better to do the substring in postgres
> (performance increase by an order of magnitude). So here is what I am
> wondering: does postgres read the whole string into memory before it
> does the substring, or does it have some sort of smart way of reading
> just the substring from disk?

I believe that if you store the text uncompressed (which is not the
default) then the substring function can just grab the section of
interest. Normally text columns this large are compressed though, which
requires them to be read completely from disk before they are sliced.

See:
http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=sql-altertable.html

In particular:
ALTER TABLE [ ONLY ] table [ * ] ALTER [ COLUMN ] column
SET STORAGE { PLAIN | EXTERNAL | EXTENDED | MAIN }

SET STORAGE

This form sets the storage mode for a column. This controls whether
this column is held inline or in a supplementary table, and whether the
data should be compressed or not. PLAIN must be used for fixed-length
values such as INTEGER and is inline, uncompressed. MAIN is for inline,
compressible data. EXTERNAL is for external, uncompressed data and
EXTENDED is for external, compressed data. EXTENDED is the default for
all data types that support it. The use of EXTERNAL will make substring
operations on a TEXT column faster, at the penalty of increased storage
space.

I think you'll want
ALTER TABLE mytable ALTER COLUMN bigtextcol SET STORAGE EXTERNAL;

As far as I know, there is no way to create a table with STORAGE
EXTERNAL; you have to create the table and then alter it with the above
statement.

HTH,

Joe

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Henshall, Stuart - TNP Southwest 2003-07-30 08:19:30 Re: SQL SUM query limited by dates
Previous Message Castle, Lindsay 2003-07-30 04:01:01 SQL SUM query limited by dates