Re: Question about OID and TCID

Lists: pgsql-general
From: Zhipan Wang <wzhipan(at)soe(dot)ucsc(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Question about OID and TCID
Date: 2010-11-27 08:52:56
Message-ID: 464718628.766821290847976572.JavaMail.root@mail-01.cse.ucsc.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read several pages of data in the table starting from this tuple. I guess CTID could be translated to physical address on the disk to retrieve this tuple, right? If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL? Can I use OID to do this equally efficiently?

Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old CTID, and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean it actually inserts a new tuple and the out-dated tuple still occupies the space? How can I write the updated tuple back to its original position to utilize disk space more efficiently?

Thanks!


From: Dmitriy Igrishin <dmitigr(at)gmail(dot)com>
To: Zhipan Wang <wzhipan(at)soe(dot)ucsc(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question about OID and TCID
Date: 2010-11-28 08:41:06
Message-ID: AANLkTimmE-8PqUhw-OByW8R36SYkTFuxfpBaY8VdRuNM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hey Zhipan,

2010/11/27 Zhipan Wang <wzhipan(at)soe(dot)ucsc(dot)edu>

> Hi,
>
> I want to access part of a table on the disk sequentially, i,e., when I get
> to a tuple in the table, I need to read several pages of data in the table
> starting from this tuple. I guess CTID could be translated to physical
> address on the disk to retrieve this tuple, right? If so, how do I use CTID
> to retrieve a particular tuple (or a page) in SQL? Can I use OID to do this
> equally efficiently?
>
Consider to use cursors to read sequentially by FETCH.
http://www.postgresql.org/docs/9.0/static/sql-fetch.html

>
> Another question is: when I update a tuple in a table, this tuple will get
> a new CTID and it leaves a gap at the old CTID, and when I insert a new
> tuple, it's appended to the end of the table, so the gap is always there.
> Does this mean it actually inserts a new tuple and the out-dated tuple still
> occupies the space? How can I write the updated tuple back to its original
> position to utilize disk space more efficiently?
>
I believe that VACUUM works well on it.
http://www.postgresql.org/docs/9.0/static/sql-vacuum.html

>
> Thanks!
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
// Dmitriy.


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Zhipan Wang <wzhipan(at)soe(dot)ucsc(dot)edu>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question about OID and TCID
Date: 2010-11-28 08:59:46
Message-ID: 4CF21A02.3010608@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le 27/11/2010 09:52, Zhipan Wang a écrit :
> Hi,
>
> I want to access part of a table on the disk sequentially, i,e., when I get to a tuple in the table, I need to read several pages of data in the table starting from this tuple.

You shouldn't rely on the order on disk. It will change as soon as you
update one.

> I guess CTID could be translated to physical address on the disk to retrieve this tuple, right?

Yes, first number is the page number. The second one is the record
number in that page.

> If so, how do I use CTID to retrieve a particular tuple (or a page) in SQL?

For example, SELECT * FROM yourtable WHERE ctid='(0,5)';

> Can I use OID to do this equally efficiently?

No, they don't reflect the order on disk.

> Another question is: when I update a tuple in a table, this tuple will get a new CTID and it leaves a gap at the old CTID, and when I insert a new tuple, it's appended to the end of the table, so the gap is always there. Does this mean it actually inserts a new tuple and the out-dated tuple still occupies the space?

Yes. Other sessions could still need to see the old tuple values.

> How can I write the updated tuple back to its original position to utilize disk space more efficiently?

You don't need to. PostgreSQL will deal with that. As soon as you do a
VACUUM, PostgreSQL will be able to use the dead space for the next
UPDATE or INSERT query (if there is enough space of course).

--
Guillaume
http://www.postgresql.fr
http://dalibo.com


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: zhipan Wang <wzhipan(at)yahoo(dot)com>
Cc: postgresql Forums <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question about OID and TCID
Date: 2010-11-28 09:55:09
Message-ID: 4CF226FD.8030403@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Please send your answer (also) to the list.

Le 28/11/2010 10:23, zhipan Wang a écrit :
> [...]
> Thanks a lot for your detailed reply. I am trying to implement the idea proposed
> in the paper Database Cracking. The basic idea is like this:
>
> When a query with a range predicate like "a<col" comes, a copy of the column col
> is created as a stand-alone table, and I need to partially sort (so physically
> reorganize) the table, and insert a node into the index to point to a tuple in
> the table, such that all values in the new table before that tuple are less
> than a and all values in col after that tuple are greater than or equal to a. As
> more and more queries come in, the table becomes better ordered, and a qeury can
> benefit from sequential scan enabled by this order.
>

Finding a solution to your issue is quite hard, because we don't know
which issue you have. Anyways, some thoughts...

PostgreSQL can already use an index to quickly sort a table. But CLUSTER
may be a more appropriate solution for you. You have to remember that
you'll need to launch CLUSTER from time to time so that the table get
reorganized.

> So the in-place update is a vital part of the program, because a range of values
> of column col should be stored together on the disk. Is there any way to do such
> a in-place update without generating much extra overhead?
>

If you need in-place update, then it's possible that PostgreSQL is not
the right tool for you.

But, as I already said, it's difficult to know for sure as we don't know
what issue you're trying to fix.

--
Guillaume
http://www.postgresql.fr
http://dalibo.com


From: Matthew Walden <matthew(dot)walden(at)bcs(dot)org>
To: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
Cc: zhipan Wang <wzhipan(at)yahoo(dot)com>, postgresql Forums <pgsql-general(at)postgresql(dot)org>
Subject: Re: Question about OID and TCID
Date: 2010-11-28 11:54:49
Message-ID: AANLkTikTL1ny49nqYC62VqqEme5PHp3-a3d-5FnZVg2q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> So the in-place update is a vital part of the program, because a range of
values
> of column col should be stored together on the disk. Is there any way to
do such
> a in-place update without generating much extra overhead?
>

Although in-place update is not possible, can he not use partitioning to at
least try to store the data together based on the value of column col?