Re: Efficient map looking table

From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Yoann MOREAU'" <yoann(dot)moreau(at)univ-avignon(dot)fr>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Efficient map looking table
Date: 2012-01-10 22:05:32
Message-ID: 020901cccfe3$f93026b0$eb907410$@yahoo.com
Lists: pgsql-general

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org
[mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Yoann MOREAU
Sent: Tuesday, January 10, 2012 4:45 PM
To: pgsql-general(at)postgresql(dot)org
Subject: [GENERAL] Efficient map looking table

Hello, I store XML-tag data in a database. I have one table for the tag names,
with an integer primary key for each unique name (tag_names), and one table for
the tags themselves (tags), where each row references a tag name. The tag_names
table is not big, but the tags table is huge. When I insert a row into tags, I
have to check whether its tag name already exists so I can get its id_tag_name,
and insert it into tag_names if not.

My question: is there an efficient way to check/get the tag name? It does the
same job as a hash map. It needs to stay up to date in the database while new
tag names are inserted, because different inserts into the tags table can occur
at the same time, each checking for a tag name. I would therefore like the
whole tag_names table to be held in memory, in as "map"-like a way as possible.
Is there anything special to do?

Thanks !
Yoann Moreau

------------------------------------------------------

Cache the tag_names table in your language of choice; whenever you get a miss,
simply execute an INSERT and then refresh the cache. If the INSERT fails with a
duplicate-key exception, that is OK: once you refresh the cache, your re-lookup
attempt should succeed.
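A minimal sketch of that insert-on-miss cache, with the database stubbed by an
in-memory dict so the example runs standalone (the names TagStore and
TagNameCache are illustrative, not from the original post; in practice the
insert and refresh calls would be SQL statements):

```python
class TagStore:
    """Stand-in for the tag_names table: name -> serial id."""

    def __init__(self):
        self._rows = {}

    def insert(self, name):
        # Mimics an INSERT against a unique key: duplicates are ignored,
        # just as the duplicate-key failure is ignored in the real approach.
        if name not in self._rows:
            self._rows[name] = len(self._rows) + 1

    def all_rows(self):
        # Stand-in for "SELECT name, id FROM tag_names".
        return dict(self._rows)


class TagNameCache:
    def __init__(self, store):
        self.store = store
        self.cache = store.all_rows()  # initial full load

    def get_id(self, name):
        if name in self.cache:
            return self.cache[name]
        # Cache miss: insert (a duplicate-key failure would be harmless),
        # then refresh the whole cache and re-look-up.
        self.store.insert(name)
        self.cache = self.store.all_rows()
        return self.cache[name]


store = TagStore()
cache = TagNameCache(store)
a = cache.get_id("title")  # miss -> insert -> new id
b = cache.get_id("title")  # hit -> same id
c = cache.get_id("body")   # miss -> insert -> next id
print(a, b, c)  # 1 1 2
```

The full-table refresh is cheap here precisely because, as noted above,
tag_names stays small even when tags is huge.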

You can do this directly in PostgreSQL via an insert trigger that uses a
savepoint when it attempts to insert the new tag. Catch and ignore any
unique-key violation raised within the savepoint.
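A hedged sketch of that in-database approach: a BEFORE INSERT trigger on tags
that resolves a tag-name string into id_tag_name, inserting into tag_names on a
miss. A PL/pgSQL EXCEPTION block runs inside an implicit savepoint, so a
concurrent duplicate insert can be caught and ignored as described. The column
names (tag_name, name, id) are assumptions, since the original post does not
give the schema:

```sql
CREATE OR REPLACE FUNCTION resolve_tag_name() RETURNS trigger AS $$
BEGIN
    SELECT id INTO NEW.id_tag_name
      FROM tag_names WHERE name = NEW.tag_name;
    IF NOT FOUND THEN
        BEGIN
            INSERT INTO tag_names (name) VALUES (NEW.tag_name)
                RETURNING id INTO NEW.id_tag_name;
        EXCEPTION WHEN unique_violation THEN
            -- Another session inserted the same name first; re-read it.
            SELECT id INTO NEW.id_tag_name
              FROM tag_names WHERE name = NEW.tag_name;
        END;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tags_resolve_name
    BEFORE INSERT ON tags
    FOR EACH ROW EXECUTE PROCEDURE resolve_tag_name();
```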

With a small table and lots of memory you probably won't save a huge amount
with the out-of-DB cache.

You can also issue the SELECTs and INSERTs manually, without the trigger, if
you so desire. The trigger should perform well compared to doing it manually,
though probably not as well as the out-of-DB cache.
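The manual variant, sketched (again with assumed column names, and without
ON CONFLICT, which this era of PostgreSQL does not have): the application
issues the lookup, and only on a miss issues the INSERT; a concurrent
duplicate raises a unique violation that the application catches before
retrying the SELECT.

```sql
-- Step 1: look up the tag name.
SELECT id FROM tag_names WHERE name = 'title';

-- Step 2 (only if step 1 returned no row): insert it.
-- On a unique-violation error, retry step 1 instead.
INSERT INTO tag_names (name) VALUES ('title') RETURNING id;
```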

You should probably test both the cache and the Trigger and see which one
performs better. You could also deploy both and just have a run-time flag
to disable the cache (and surrounding code) in your application(s) and you
can toggle the Trigger on-the-fly as well.

David J.
