Quick Links

Re: Postgres code for a query intermediate dataset

Lists:	pgsql-hackers

From:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Postgres code for a query intermediate dataset
Date:	2014-09-13 17:36:11
Message-ID:	CANqGtSu3MGxyTdVmtLdjA8c-KX=-pXYLd67c80X8UeBNpYFUFw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi All,

I want to work on the code of intermediate dataset of select and update
query.

For example.

Rohit's salary has been updated 4 times, so it has 4 different version of
salary.

I want to select salary of person named Rohit. Now suppose , in
intermediate result, I found 4 different versions of the data. I want to
know the code portion which i need to look for working on all 4 versions in
dataset. :)

Thanks in advance!!

Regards,
Rohit Goyal

--
Regards,
Rohit Goyal

From:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-13 17:53:14
Message-ID:	CAOeZVicef5+6TfLnTYdmD1vFopAJ7FRAqpoTbAPZJHnKSE1Vhw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com> wrote:

>
> Hi All,
>
> I want to work on the code of intermediate dataset of select and update
> query.
>
> For example.
>
> Rohit's salary has been updated 4 times, so it has 4 different version of
> salary.
>
> I want to select salary of person named Rohit. Now suppose , in
> intermediate result, I found 4 different versions of the data. I want to
> know the code portion which i need to look for working on all 4 versions in
> dataset. :)
>
> Thanks in advance!!
>
>
>
Not sure what you are looking for, but each update is an insert of a new
tuple with the new values and marking the old tuple as deleted.

There is no need for tracking the versions of any changes in data set. Each
update operation leaves only one visible tuple. If the transaction commits,
inserted tuple becomes visible and old row is marked deleted. If the
transaction rollbacks, only the old tuple shall remain visible.
--
Regards,

Atri
*l'apprenant*

From:	David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-13 18:22:47
Message-ID:	1410632567941-5818935.post@n5.nabble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Atri Sharma wrote
> On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <

> rhtgyl.87@

> > wrote:
>
>>
>> Hi All,
>>
>> I want to work on the code of intermediate dataset of select and update
>> query.
>>
>> For example.
>>
>> Rohit's salary has been updated 4 times, so it has 4 different version of
>> salary.
>>
>> I want to select salary of person named Rohit. Now suppose , in
>> intermediate result, I found 4 different versions of the data. I want to
>> know the code portion which i need to look for working on all 4 versions
>> in
>> dataset. :)
>>
>> Thanks in advance!!
>>
>>
>>
> Not sure what you are looking for, but each update is an insert of a new
> tuple with the new values and marking the old tuple as deleted.
>
> There is no need for tracking the versions of any changes in data set.
> Each
> update operation leaves only one visible tuple. If the transaction
> commits,
> inserted tuple becomes visible and old row is marked deleted. If the
> transaction rollbacks, only the old tuple shall remain visible.
> --
> Regards,
>
> Atri
> *l'apprenant*

Or rather even if you want to be able to reference the older versions of
that record there is nothing in PostgreSQL to facilitate that. You have to
manually create and manage the data so that you know during what time period
a given record is valid.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Postgres-code-for-a-query-intermediate-dataset-tp5818931p5818935.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

From:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To:	David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-13 18:35:18
Message-ID:	CAOeZVifSzN9cU9i=zXRKEVc26M0GkpKkqGDNbZrY59nMs3TgOw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> Atri Sharma wrote
> > On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <
>
>
>
> Or rather even if you want to be able to reference the older versions of
> that record there is nothing in PostgreSQL to facilitate that. You have to
> manually create and manage the data so that you know during what time
> period
> a given record is valid.
>
> David J.
>
>
>
>
>
Sometimes I do miss 'time travel' we used to have :)

Regards,

Atri
--
Regards,

Atri
*l'apprenant*

From:	Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-13 21:31:56
Message-ID:	5414B7CC.8070709@archidevsys.co.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14/09/14 06:35, Atri Sharma wrote:
>
>
> On Sat, Sep 13, 2014 at 11:52 PM, David G Johnston
> <david(dot)g(dot)johnston(at)gmail(dot)com <mailto:david(dot)g(dot)johnston(at)gmail(dot)com>> wrote:
>
> Atri Sharma wrote
> > On Sat, Sep 13, 2014 at 11:06 PM, Rohit Goyal <
>
>
>
> Or rather even if you want to be able to reference the older
> versions of
> that record there is nothing in PostgreSQL to facilitate that. You
> have to
> manually create and manage the data so that you know during what
> time period
> a given record is valid.
>
> David J.
>
>
>
>
>
> Sometimes I do miss 'time travel' we used to have :)
>
> Regards,
>
> Atri
> --
> Regards,
> Atri
> /l'apprenant/
That is only because the Guild of Time Travellers was formed, and we are
very selective in whom we allow to join. It was a massive undertaking
to purge the knowledge of effective time travel from the general
populace (H. G. Wells had to be expelled with a partial brain wipe)! :-)

On a more serious note:
I did design and implement a system to allow what the original poster
was after, it involved 2 tables for each logical table, and used both an
EFFECTIVE_DATE & an AS_AT_DATE. This allowed insurance quotes to be
valid for a given of time, even if the insurance rates were set change
after the quote was given (but before the quote expired). This was
about 15 years ago. It was amusing that my wife joined that team 10
years after I left, and found 2 of my original colleagues still there!

Cheers,
Gavin

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 06:57:01
Message-ID:	54153C3D.2070009@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14/09/14 05:36, Rohit Goyal wrote:
> Hi All,
>
> I want to work on the code of intermediate dataset of select and update
> query.
>
> For example.
>
> Rohit's salary has been updated 4 times, so it has 4 different version
> of salary.
>
> I want to select salary of person named Rohit. Now suppose , in
> intermediate result, I found 4 different versions of the data. I want to
> know the code portion which i need to look for working on all 4 versions
> in dataset. :)
>
>

Hi Rohit,

Currently in Postgres, these intermediate versions all exist - however a
given session can only see one of them. Also VACUUM is allowed to
destroy versions that no other transactions can see.

So if I'm understanding you correctly, you would like to have some way
for a session to see *all* these versions (and I guess preventing VACUUM
from destroying them).

It is certainly possible (or used to be via snapshot manipulation, I
haven't looked at that code in a while sorry) to enable a session to see
all the old versions, and is quite a cool idea (Postgres used to have
this ability in older versions - called Time Travel).

For pure practicality, this can be achieved without any code
modifications using TRIGGERs and an extra table (as Gavin alludes to).

Do tell us a bit more about what you are wanting to do!

Cheers

Mark

From:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 07:25:47
Message-ID:	CAOeZVidenRCTO98bBz3J6v96Zf21-joXp_YGbOE1k7xjotXZ+g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sunday, September 14, 2014, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
wrote:

> On 14/09/14 05:36, Rohit Goyal wrote:
>
>> Hi All,
>>
>> I want to work on the code of intermediate dataset of select and update
>> query.
>>
>> For example.
>>
>> Rohit's salary has been updated 4 times, so it has 4 different version
>> of salary.
>>
>> I want to select salary of person named Rohit. Now suppose , in
>> intermediate result, I found 4 different versions of the data. I want to
>> know the code portion which i need to look for working on all 4 versions
>> in dataset. :)
>>
>>
>>
> Hi Rohit,
>
> Currently in Postgres, these intermediate versions all exist - however a
> given session can only see one of them. Also VACUUM is allowed to destroy
> versions that no other transactions can see.
>
> So if I'm understanding you correctly, you would like to have some way for
> a session to see *all* these versions (and I guess preventing VACUUM from
> destroying them).
>
>
>
Any modifications of that sort are bound to introduce lots of pain, not to
mention performance degradation and the added responsibility of ensuring
that dead tuples don't bloat up the system (prevent vacuum from running at
regular intervals and you can have a xid wraparound).

I just mentioned that in case you are planning to go in that direction. If
you only want the data, use the triggers as Gavin mentioned.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 08:00:39
Message-ID:	54154B27.70409@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14/09/14 19:25, Atri Sharma wrote:
>
>
> On Sunday, September 14, 2014, Mark Kirkwood
> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz <mailto:mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>>
> wrote:
>
> On 14/09/14 05:36, Rohit Goyal wrote:
>
> Hi All,
>
> I want to work on the code of intermediate dataset of select and
> update
> query.
>
> For example.
>
> Rohit's salary has been updated 4 times, so it has 4 different
> version
> of salary.
>
> I want to select salary of person named Rohit. Now suppose , in
> intermediate result, I found 4 different versions of the data. I
> want to
> know the code portion which i need to look for working on all 4
> versions
> in dataset. :)
>
>
>
> Hi Rohit,
>
> Currently in Postgres, these intermediate versions all exist -
> however a given session can only see one of them. Also VACUUM is
> allowed to destroy versions that no other transactions can see.
>
> So if I'm understanding you correctly, you would like to have some
> way for a session to see *all* these versions (and I guess
> preventing VACUUM from destroying them).
>
>
>
> Any modifications of that sort are bound to introduce lots of pain, not
> to mention performance degradation and the added responsibility of
> ensuring that dead tuples don't bloat up the system (prevent vacuum from
> running at regular intervals and you can have a xid wraparound).
>
> I just mentioned that in case you are planning to go in that direction.
> If you only want the data, use the triggers as Gavin mentioned.
>

Obviously in the general case sure - but (as yet) we don't have much
idea about Rohit's use case and workload. If retrieving past versions is
the *primary* workload bias and high update concurrency is not required
then this could well work better than a trigger based solution.

And it does not seem too onerous to have the ability to switch this on
as required, viz:

ALTER TABLE table1 VERSIONING;

(or similar syntax) which makes VACUUM leave this table alone. It might
make more sense to make such a concept apply to a TABLESPACE instead
mind you (i.e things in here are for archive/versioning purposes)...

Clearly we'd need to see the code for any of this and evaluate if it is
good or terrible, but I'm not seeing the idea as bad as stated.

Cheers

Mark

From:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
To:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 08:11:17
Message-ID:	CANqGtSuM_7VZAN_isMMR24hU7hK6u_cYobOe=GVJrdrxAfNi7Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Mark,

On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood <
mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> On 14/09/14 05:36, Rohit Goyal wrote:
>
>> Hi All,
>>
>> I want to work on the code of intermediate dataset of select and update
>> query.
>>
>> For example.
>>
>> Rohit's salary has been updated 4 times, so it has 4 different version
>> of salary.
>>
>> I want to select salary of person named Rohit. Now suppose , in
>> intermediate result, I found 4 different versions of the data. I want to
>> know the code portion which i need to look for working on all 4 versions
>> in dataset. :)
>>
>>
>>
> Hi Rohit,
>
> Currently in Postgres, these intermediate versions all exist - however a
> given session can only see one of them. Also VACUUM is allowed to destroy
> versions that no other transactions can see.
>
> Exactly, one visible version is there per session. But, I want to test my
algorithm in which i myself want to work on all the intermediate version
and find the correct one for the session.

> So if I'm understanding you correctly, you would like to have some way for
> a session to see *all* these versions (and I guess preventing VACUUM from
> destroying them).
>
yes and I also want to know the portion of code where i can find all the
historical or intermediate versions

>
> It is certainly possible (or used to be via snapshot manipulation, I
> haven't looked at that code in a while sorry) to enable a session to see
> all the old versions, and is quite a cool idea (Postgres used to have this
> ability in older versions - called Time Travel).
>
> For pure practicality, this can be achieved without any code modifications
> using TRIGGERs and an extra table (as Gavin alludes to).
>
Can you explain me more about how to starting working using trigger?

>
> Do tell us a bit more about what you are wanting to do!
>
> Cheers
>
> Mark
>
>

--
Regards,
Rohit Goyal

From:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 08:24:06
Message-ID:	CAOeZVifBFOCaepCmj+JaqkVDQ2vfemTBFrdjsy0AkavXkduWBQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood <
mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> On 14/09/14 19:25, Atri Sharma wrote:
>
>>
>>
>> On Sunday, September 14, 2014, Mark Kirkwood
>> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz <mailto:mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>>
>>
>> wrote:
>>
>> On 14/09/14 05:36, Rohit Goyal wrote:
>>
>> Hi All,
>>
>> I want to work on the code of intermediate dataset of select and
>> update
>> query.
>>
>> For example.
>>
>> Rohit's salary has been updated 4 times, so it has 4 different
>> version
>> of salary.
>>
>> I want to select salary of person named Rohit. Now suppose , in
>> intermediate result, I found 4 different versions of the data. I
>> want to
>> know the code portion which i need to look for working on all 4
>> versions
>> in dataset. :)
>>
>>
>>
>> Hi Rohit,
>>
>> Currently in Postgres, these intermediate versions all exist -
>> however a given session can only see one of them. Also VACUUM is
>> allowed to destroy versions that no other transactions can see.
>>
>> So if I'm understanding you correctly, you would like to have some
>> way for a session to see *all* these versions (and I guess
>> preventing VACUUM from destroying them).
>>
>>
>>
>> Any modifications of that sort are bound to introduce lots of pain, not
>> to mention performance degradation and the added responsibility of
>> ensuring that dead tuples don't bloat up the system (prevent vacuum from
>> running at regular intervals and you can have a xid wraparound).
>>
>> I just mentioned that in case you are planning to go in that direction.
>> If you only want the data, use the triggers as Gavin mentioned.
>>
>>
> Obviously in the general case sure - but (as yet) we don't have much idea
> about Rohit's use case and workload. If retrieving past versions is the
> *primary* workload bias and high update concurrency is not required then
> this could well work better than a trigger based solution.
>
> And it does not seem too onerous to have the ability to switch this on as
> required, viz:
>
> ALTER TABLE table1 VERSIONING;
>
> (or similar syntax) which makes VACUUM leave this table alone.
>

How do you plan to do all that VACUUM does for this table then?

It seems to me that you are saying to VACUUM that it need not be concerned
with table 'A' and you are assuming ownership of all the tasks performed by
VACUUM for this table. Seems pretty broken to me, not to mention the
performance degradations.

Regards,

Atri

Regards,

Atri

From:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 08:36:51
Message-ID:	CAOeZVied1mQ5CUKGL7rMTOG0tmh1yd+Wow_VHwGWMmisWE7g+g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood <
mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

What I think can be done is have a tuplestore which has the delta of
updated rows i.e. only have the changes made in an update statement stored
in a tuplestore (it could be a part of RelationData). It should be simple
enough to have tuplestore store the oid of the inserted tuple and the
difference between new tuple and the old tuple. No changes need to be done
for old tuple since it can be marked as deleted and VACUUM can remove it as
normal logic.

Not a clean way, but should work for what you proposed.

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 08:43:55
Message-ID:	5415554B.2080205@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14/09/14 20:24, Atri Sharma wrote:
>
>
> On Sun, Sep 14, 2014 at 1:30 PM, Mark Kirkwood
> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz <mailto:mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>>
> wrote:
>
> On 14/09/14 19:25, Atri Sharma wrote:
>
>
>
> On Sunday, September 14, 2014, Mark Kirkwood
> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz
> <mailto:mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
> <mailto:mark(dot)kirkwood(at)__catalyst(dot)net(dot)nz
> <mailto:mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>>>
>
> wrote:
>
> On 14/09/14 05:36, Rohit Goyal wrote:
>
> Hi All,
>
> I want to work on the code of intermediate dataset of
> select and
> update
> query.
>
> For example.
>
> Rohit's salary has been updated 4 times, so it has 4
> different
> version
> of salary.
>
> I want to select salary of person named Rohit. Now
> suppose , in
> intermediate result, I found 4 different versions of
> the data. I
> want to
> know the code portion which i need to look for working
> on all 4
> versions
> in dataset. :)
>
>
>
> Hi Rohit,
>
> Currently in Postgres, these intermediate versions all exist -
> however a given session can only see one of them. Also
> VACUUM is
> allowed to destroy versions that no other transactions can see.
>
> So if I'm understanding you correctly, you would like to
> have some
> way for a session to see *all* these versions (and I guess
> preventing VACUUM from destroying them).
>
>
>
> Any modifications of that sort are bound to introduce lots of
> pain, not
> to mention performance degradation and the added responsibility of
> ensuring that dead tuples don't bloat up the system (prevent
> vacuum from
> running at regular intervals and you can have a xid wraparound).
>
> I just mentioned that in case you are planning to go in that
> direction.
> If you only want the data, use the triggers as Gavin mentioned.
>
>
> Obviously in the general case sure - but (as yet) we don't have much
> idea about Rohit's use case and workload. If retrieving past
> versions is the *primary* workload bias and high update concurrency
> is not required then this could well work better than a trigger
> based solution.
>
> And it does not seem too onerous to have the ability to switch this
> on as required, viz:
>
> ALTER TABLE table1 VERSIONING;
>
> (or similar syntax) which makes VACUUM leave this table alone.
>
>
> How do you plan to do all that VACUUM does for this table then?
>
> It seems to me that you are saying to VACUUM that it need not be
> concerned with table 'A' and you are assuming ownership of all the tasks
> performed by VACUUM for this table. Seems pretty broken to me, not to
> mention the performance degradations.
>

I think the whole point of such a modification is that nothing is done
to such tables, as you want to see all the previous versions.

Clearly this is less performant for standard workloads...but we are
talking about non standard workloads surely...

Regards

Mark

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 08:53:13
Message-ID:	54155779.4050107@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14/09/14 20:11, Rohit Goyal wrote:
> Hi Mark,
>
> On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood
> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz <mailto:mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>>
> wrote:

> Currently in Postgres, these intermediate versions all exist -
> however a given session can only see one of them. Also VACUUM is
> allowed to destroy versions that no other transactions can see.
>
> Exactly, one visible version is there per session. But, I want to test
> my algorithm in which i myself want to work on all the intermediate
> version and find the correct one for the session.
>
> So if I'm understanding you correctly, you would like to have some
> way for a session to see *all* these versions (and I guess
> preventing VACUUM from destroying them).
>
> yes and I also want to know the portion of code where i can find all the
> historical or intermediate versions

Well that's going to be a bit of a learning curve for you :-), the
concept to get familiar with is snapshots (see
src/backand/access/heap/heapam.c to get started).

>
>
> It is certainly possible (or used to be via snapshot manipulation, I
> haven't looked at that code in a while sorry) to enable a session to
> see all the old versions, and is quite a cool idea (Postgres used to
> have this ability in older versions - called Time Travel).
>
> For pure practicality, this can be achieved without any code
> modifications using TRIGGERs and an extra table (as Gavin alludes to).
>
> Can you explain me more about how to starting working using trigger?

I'm not sure we have specif examples in the docs for what you want to
do, but generally see
http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html

Cheers

Mark

From:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
To:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 09:18:51
Message-ID:	CANqGtSs9jTZEP3y1PWP5vMKzWOag8Ln_KNiaLy8WykO8SUHAGA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Mark & Atri, :)

Thanks for reply. But, I think i confused you. I am talking about access
using indexes. So, I assume that B+ tree store key-value pair where rohit
is the key and all the versions are its value.

Another way to think is I have a secondary index on emp. name and there are
4 rohit exist in DB. So, now B+ tree gives me 4 different tuple pointer for
each Rohit. I want to know the code portion for this where i can see all 4
tuple pointer before each one have I/O access to fetch its tuple.

Are the suggestions still valid?

On Sun, Sep 14, 2014 at 10:53 AM, Mark Kirkwood <
mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> wrote:

> On 14/09/14 20:11, Rohit Goyal wrote:
>
>> Hi Mark,
>>
>> On Sun, Sep 14, 2014 at 8:57 AM, Mark Kirkwood
>> <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz <mailto:mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>>
>> wrote:
>>
>
> Currently in Postgres, these intermediate versions all exist -
>> however a given session can only see one of them. Also VACUUM is
>> allowed to destroy versions that no other transactions can see.
>>
>> Exactly, one visible version is there per session. But, I want to test
>> my algorithm in which i myself want to work on all the intermediate
>> version and find the correct one for the session.
>>
>> So if I'm understanding you correctly, you would like to have some
>> way for a session to see *all* these versions (and I guess
>> preventing VACUUM from destroying them).
>>
>> yes and I also want to know the portion of code where i can find all the
>> historical or intermediate versions
>>
>
> Well that's going to be a bit of a learning curve for you :-), the concept
> to get familiar with is snapshots (see src/backand/access/heap/heapam.c
> to get started).

Thanks I will read it. :) Can you please tel me some specifics from this c
file, if you already know :)

>
>
>
>>
>> It is certainly possible (or used to be via snapshot manipulation, I
>> haven't looked at that code in a while sorry) to enable a session to
>> see all the old versions, and is quite a cool idea (Postgres used to
>> have this ability in older versions - called Time Travel).
>>
>> For pure practicality, this can be achieved without any code
>> modifications using TRIGGERs and an extra table (as Gavin alludes to).
>>
>> Can you explain me more about how to starting working using trigger?
>>
>
> I'm not sure we have specif examples in the docs for what you want to do,
> but generally see http://www.postgresql.org/docs/9.4/static/plpgsql-
> trigger.html
>
> Cheers
>
> Mark
>

--
Regards,
Rohit Goyal

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-14 09:57:51
Message-ID:	5415669F.2030506@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14/09/14 21:18, Rohit Goyal wrote:
> Hi Mark & Atri, :)
>
> Thanks for reply. But, I think i confused you. I am talking about access
> using indexes. So, I assume that B+ tree store key-value pair where
> rohit is the key and all the versions are its value.
>
> Another way to think is I have a secondary index on emp. name and there
> are 4 rohit exist in DB. So, now B+ tree gives me 4 different tuple
> pointer for each Rohit. I want to know the code portion for this where i
> can see all 4 tuple pointer before each one have I/O access to fetch its
> tuple.
>
> Are the suggestions still valid?
>

Visibility rules mentioned earlier apply equally to tables and indexes
(strictly speaking what happens is index tuples are checked against the
relevant tables to see if your session can see them), so discussion of
whether tuples are retrieved via index or table scans is not really
relevant (i.e query planning/optimization is separate from tuple
visibility).

Cheers

Mark

From:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To:	Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-15 06:09:15
Message-ID:	5416828B.1030900@catalyst.net.nz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 14/09/14 20:43, Mark Kirkwood wrote:
> On 14/09/14 20:24, Atri Sharma wrote:
>>
>> How do you plan to do all that VACUUM does for this table then?
>>
>> It seems to me that you are saying to VACUUM that it need not be
>> concerned with table 'A' and you are assuming ownership of all the tasks
>> performed by VACUUM for this table. Seems pretty broken to me, not to
>> mention the performance degradations.
>>
>
> I think the whole point of such a modification is that nothing is done
> to such tables, as you want to see all the previous versions.
>
> Clearly this is less performant for standard workloads...but we are
> talking about non standard workloads surely...

To be fair with respect to what Atri is saying, I should have said
something like:

Clearly this is *horribly* less performant for standard workloads...etc :-)

Also there is the good point he raised about transaction xid wrap, so
some messing about with that part of VACUUM would be required too (it's
the little complications that all add up)!

The TRIGGER based approach is clearly a lot simpler! However for an
interest project to understand Postgres internals the other approach is
worthwhile.

Cheers

Mark

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com>
Cc:	Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Postgres code for a query intermediate dataset
Date:	2014-09-15 16:41:25
Message-ID:	CA+TgmoY7_4aK9oaVn+z2FZP_OnNk1w=VL8HjKAPCyPQESPuJkg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Sep 14, 2014 at 5:18 AM, Rohit Goyal <rhtgyl(dot)87(at)gmail(dot)com> wrote:
> Thanks for reply. But, I think i confused you. I am talking about access
> using indexes. So, I assume that B+ tree store key-value pair where rohit is
> the key and all the versions are its value.
>
> Another way to think is I have a secondary index on emp. name and there are
> 4 rohit exist in DB. So, now B+ tree gives me 4 different tuple pointer for
> each Rohit. I want to know the code portion for this where i can see all 4
> tuple pointer before each one have I/O access to fetch its tuple.

You may want to look at index_getnext(), index_getnext_tid(), and/or
heap_hot_search_buffer().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company