Re: insert and query performance on big string table with pg_trgm

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Matthew Hall <mhall(at)mhcomputing(dot)net>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: insert and query performance on big string table with pg_trgm
Date: 2017-11-21 01:42:50
Message-ID: CAMkU=1yPVyiyyXOxDyDCA18zqaorkoZrfns-2CZXvGi5uZw8xQ@mail.gmail.com
Lists: pgsql-performance

On Mon, Nov 20, 2017 at 2:54 PM, Matthew Hall <mhall(at)mhcomputing(dot)net> wrote:

While I have not done exhaustive testing, from the tests I have done I've
never found gist to be better than gin with trgm indexes.

>
> Here is the table:
>
>                                  Unlogged table "public.huge_table"
>    Column    |           Type           | Collation | Nullable |                Default
> -------------+--------------------------+-----------+----------+----------------------------------------
>  id          | bigint                   |           | not null | nextval('huge_table_id_seq'::regclass)
>  inserted_ts | timestamp with time zone |           |          | transaction_timestamp()
>  value       | character varying        |           |          |
> Indexes:
>     "huge_table_pkey" PRIMARY KEY, btree (id)
>     "huge_table_value_idx" UNIQUE, btree (value)
>     "huge_table_value_trgm" gin (value gin_trgm_ops)
>

Do you really need the artificial primary key, when you already have
another column that could be used as the primary key? If you need to use
it as a foreign key in another table, then you very well might. But
maintaining two unique indexes doesn't come free.
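As a sketch (assuming nothing else references huge_table.id as a foreign key), the table could be keyed directly on value, so only one unique btree index has to be maintained during the load:

```sql
-- Hypothetical alternative definition: value itself is the primary key,
-- replacing both huge_table_pkey and huge_table_value_idx with one index.
CREATE UNLOGGED TABLE huge_table (
    value       character varying PRIMARY KEY,
    inserted_ts timestamp with time zone DEFAULT transaction_timestamp()
);
```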

Are all indexes present at the time you insert? It will probably be much
faster to insert without the gin index (at least) and build it after the
load.
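Something along these lines (a sketch; the COPY source path is a placeholder):

```sql
-- Drop the expensive trigram index before the bulk load...
DROP INDEX huge_table_value_trgm;

-- ...load the data without per-row GIN maintenance (hypothetical file path)...
COPY huge_table (value) FROM '/path/to/data.txt';

-- ...then build the index once, over the finished table.
CREATE INDEX huge_table_value_trgm ON huge_table
    USING gin (value gin_trgm_ops);
```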

Without knowing this key fact, it is hard to interpret the rest of your
data.

>
> I managed to load the table initially in about 9 hours, after doing some
> optimizations below based on various documentation (the server is 8-core Xeon
> E5504, 16 GB RAM, 4 Hitachi 1TB 7200 RPM in a RAID 5 via Linux MD):
> ...

> * maintenance_work_mem 512 MB

Building a gin index in bulk could benefit from more memory here.
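For instance (the 2GB figure is an assumption; size it to your available RAM):

```sql
-- Raise the limit for this session only, just for the index build.
SET maintenance_work_mem = '2GB';

CREATE INDEX huge_table_value_trgm ON huge_table
    USING gin (value gin_trgm_ops);
```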

> * synchronous_commit off

If you are already using unlogged tables, this might not be so helpful, but
it does increase the risk to the rest of your system.

>   PID USER     PR NI    VIRT    RES    SHR S  %CPU %MEM    TIME+ COMMAND
> 29578 postgres 20  0 6575672 6.149g 6.139g R  86.0 39.7 45:24.97 postgres
>

You should expand the command line (by hitting 'c', at least in my version
of top) so we can see which postgres process this is.

>
> As for queries, doing a simple query like this one seems to require around 30
> seconds to a minute. My volume is not crazy high but I am hoping I could get
> this down to less than 30 seconds, because other stuff above this code will
> start to time out otherwise:
>
> osint=# explain analyze select * from huge_table where value ilike '%keyword%';
>

explain (analyze, buffers), please. And hopefully with track_io_timing=on.
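That is, something like this (note that setting track_io_timing normally requires superuser):

```sql
-- Record actual read/write times in the plan output.
SET track_io_timing = on;

-- BUFFERS shows shared-buffer hits vs. reads for each plan node,
-- which tells us whether the time is going to I/O or CPU.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM huge_table WHERE value ILIKE '%keyword%';
```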

If you repeat the same query, is it then faster, or is it still slow?

Cheers,

Jeff
