Re: About tapes

Lists: pgsql-hackers
From: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: About tapes
Date: 2010-06-18 18:36:31
Message-ID: BLU0-SMTP7902622414A2DDE6E2A978E6C00@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi to all.

Please take a look at the initial comment contained into the logtape.c file:
http://doxygen.postgresql.org/logtape_8c-source.html

Almost at the beginning of that file, it is affirmed that implementing
tapes on disk (quote: /by creating a separate file for each "tape"/)
will require more space than implementing merge on tapes themselves.
Now, taking in account that tuplesort.c and logtape.c actually DO
implement tapes on disk, in which case it would require between 2x and
4x the input space?

Thanks for your attention.
Best regards.

Manolo


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-18 19:00:15
Message-ID: AANLkTinERwYkDbw-EFuuUQo1R5YfhSiFIAQ_s_UwyRiB@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 18, 2010 at 2:36 PM, mac_man2005(at)hotmail(dot)it
<mac_man2005(at)hotmail(dot)it> wrote:
> Please take a look at the initial comment contained into the logtape.c file:
> http://doxygen.postgresql.org/logtape_8c-source.html
>
> Almost at the beginning of that file, it is affirmed that implementing tapes
> on disk (quote: by creating a separate file for each "tape") will require
> more space than implementing merge on tapes themselves.
> Now, taking in account that tuplesort.c and logtape.c actually DO implement
> tapes on disk, in which case it would require between 2x and 4x the input
> space?

Did you read the rest of the comment? It explains how the code avoids this...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-18 19:11:57
Message-ID: BLU0-SMTP93C48E617F26B5CC5AFADE6C00@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Il 18/06/2010 21:00, Robert Haas ha scritto:
> On Fri, Jun 18
> Did you read the rest of the comment? It explains how the code avoids this...
>
>

Robert, thanks for your reply.
I read the rest of the document, but please take in account that my
question wasn't about "avoiding".
My question is "in which cases"?

I repeat my question. Tuplesort.c and logtape.c DO implement tapes on
disk and currently they do not request 2x or 4x of the input space: so,
again, in which case implementing tapes on disks requires between 2x and
4x of input space?

Thanks for your attention.
Manolo.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-18 19:16:28
Message-ID: AANLkTikDd6EDyueHcjvkm6tyqoBVMEKYpdPc51o2fMUa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 18, 2010 at 3:11 PM, mac_man2005(at)hotmail(dot)it
<mac_man2005(at)hotmail(dot)it> wrote:
> Il 18/06/2010 21:00, Robert Haas ha scritto:
>>
>> On Fri, Jun 18
>> Did you read the rest of the comment?  It explains how the code avoids
>> this...
>>
>>
>
> Robert, thanks for your reply.
> I read the rest of the document, but please take in account that my question
> wasn't about "avoiding".
> My question is "in which cases"?
>
> I repeat my question. Tuplesort.c and logtape.c DO implement tapes on disk
> and currently they do not request 2x or 4x of the input space: so, again, in
> which case implementing tapes on disks requires between 2x and 4x of input
> space?

I think that the comment is saying that it *would* take 2x or 4x the
input space IF we created a separate file for each input. So instead
we don't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-18 19:37:13
Message-ID: 17176.1276889833@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jun 18, 2010 at 3:11 PM, mac_man2005(at)hotmail(dot)it
> <mac_man2005(at)hotmail(dot)it> wrote:
>> I repeat my question. Tuplesort.c and logtape.c DO implement tapes on disk
>> and currently they do not request 2x or 4x of the input space: so, again, in
>> which case implementing tapes on disks requires between 2x and 4x of input
>> space?

> I think that the comment is saying that it *would* take 2x or 4x the
> input space IF we created a separate file for each input. So instead
> we don't.

The point of the comment (and indeed of the whole module) is that if we
don't want peak space usage to be at least twice the data volume, we
have to recycle the space used by "input tapes" before the tapes have
been fully read. There's no way to do that if each "tape" is an
independent operating-system file.

regards, tom lane


From: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-18 19:46:35
Message-ID: BLU0-SMTP53C0E8BB213EDAEE55F362E6C00@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ok, so let's try asking the question in another way.

Which is the difference between having more than one tape into a file
and having one tape per file?
I mean, we are peaking runs belonging to different tapes and merge those
runs.

Moreover, why space is reduced taking in account that we can free the
read tuples, when both using one tape per file or more tapes per file?

Il 18/06/2010 21:16, Robert Haas ha scritto:
> I think that the comment is saying that it *would* take 2x or 4x the
> input space IF we created a separate file for each input. So instead
> we don't.
>
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-18 21:29:23
Message-ID: AANLkTimEuAyvn34NykLpbMISZGLIZc8sEffsZy5SeMKP@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 18, 2010 at 3:46 PM, mac_man2005(at)hotmail(dot)it
<mac_man2005(at)hotmail(dot)it> wrote:
> Which is the difference between having more than one tape into a file and
> having one tape per file?

It makes it easier to recycle space a little at a time. Suppose
you're merging two runs of 100 blocks each. You read in a block from
each run and write out two output blocks. Now that you've done that,
the first block of each of the input runs is garbage and can be
recycled - but if the input runs and the output run are in three
separate files, there's no easy way to do that. You can truncate a
file (and throw away the end) but there's no easy way to throw away
the BEGINNING of a file. So you'll probably have to hold on to the
entirety of both inputs until you've written the entirety of the
output.

On the other hand, suppose you have all the blocks in one big file.
The first input run is in blocks 1-100; the second is in blocks
101-200. You can read blocks 1 and 101, say, and write the results to
blocks 201 and 202, using extra storage, but only a little bit. When
you then read blocks 2 and 102, you write the results to blocks 1 and
100, which are no longer needed, because you've already merged them.
When you get done with that, blocks 2 and 102 are now no longer needed
and can be used to write the next part of the output. Of course, you
have to keep track of which order to reread the blocks in when the
sort is done: 201, 202, 1, 101, ... but that's a manageable problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-19 08:57:19
Message-ID: BLU0-SMTP6566E398824C4CCE630A2FE6C10@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom, Robert,
thank you.

Now it is clearer how space on tapes is recycled.

I tried to follow Robert's example but storing one tape per separate file.
Read in the first block of each run (hosted by separate tapes and so by
separate files) and output them into extra storage, wherever this extra
storage is.
Again, those first input blocks are now garbage: is it correct?
In this case, what happens when trying to recycle those garbage blocks
by hosting the result of merging the second block of each run?

Il 18/06/2010 23:29, Robert Haas ha scritto:
> On Fri, Jun 18, 2010 at 3:46 PM, mac_man2005(at)hotmail(dot)it
> <mac_man2005(at)hotmail(dot)it> wrote:
>
>> Which is the difference between having more than one tape into a file and
>> having one tape per file?
>>
> It makes it easier to recycle space a little at a time. Suppose
> you're merging two runs of 100 blocks each. You read in a block from
> each run and write out two output blocks. Now that you've done that,
> the first block of each of the input runs is garbage and can be
> recycled - but if the input runs and the output run are in three
> separate files, there's no easy way to do that. You can truncate a
> file (and throw away the end) but there's no easy way to throw away
> the BEGINNING of a file. So you'll probably have to hold on to the
> entirety of both inputs until you've written the entirety of the
> output.
>
> On the other hand, suppose you have all the blocks in one big file.
> The first input run is in blocks 1-100; the second is in blocks
> 101-200. You can read blocks 1 and 101, say, and write the results to
> blocks 201 and 202, using extra storage, but only a little bit. When
> you then read blocks 2 and 102, you write the results to blocks 1 and
> 100, which are no longer needed, because you've already merged them.
> When you get done with that, blocks 2 and 102 are now no longer needed
> and can be used to write the next part of the output. Of course, you
> have to keep track of which order to reread the blocks in when the
> sort is done: 201, 202, 1, 101, ... but that's a manageable problem.
>
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-20 21:20:18
Message-ID: AANLkTinEkolhYvV2jE2FSS6abmHP7o5Bv5VOuoLIRXRL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jun 19, 2010 at 4:57 AM, mac_man2005(at)hotmail(dot)it
<mac_man2005(at)hotmail(dot)it> wrote:
> Tom, Robert,
> thank you.
>
> Now it is clearer how space on tapes is recycled.
>
> I tried to follow Robert's example but storing one tape per separate file.
> Read in the first block of each run (hosted by separate tapes and so by
> separate files) and output them into extra storage, wherever this extra
> storage is.
> Again, those first input blocks are now garbage: is it correct?

Yes.

> In this case, what happens when trying to recycle those garbage blocks by
> hosting the result of merging the second block of each run?

You just overwrite them with the new data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-21 00:39:03
Message-ID: BLU0-SMTP71314FEA77C360F9446065E6C30@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert, so in my example:
- tapes are stored in different files (one tape per file)
- you confirm those first blocks are garbage
- you confirm they could be rewritten with new data

This means that we can do recycle space using one tape per file. Correct?

So, in this case, why do we need to use logical tapesets?
In other words, why Tom affirmed it was impossible to recycle space
implementing one tape per file?

Il 20/06/2010 23:20, Robert Haas ha scritto:
> On Sat, Jun 19, 2010 at 4:57 AM, mac_man2005(at)hotmail(dot)it
> <mac_man2005(at)hotmail(dot)it> wrote:
>
>> Tom, Robert,
>> thank you.
>>
>> Now it is clearer how space on tapes is recycled.
>>
>> I tried to follow Robert's example but storing one tape per separate file.
>> Read in the first block of each run (hosted by separate tapes and so by
>> separate files) and output them into extra storage, wherever this extra
>> storage is.
>> Again, those first input blocks are now garbage: is it correct?
>>
> Yes.
>
>
>> In this case, what happens when trying to recycle those garbage blocks by
>> hosting the result of merging the second block of each run?
>>
> You just overwrite them with the new data.
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-21 02:25:12
Message-ID: 21674.1277087112@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it> writes:
> Robert, so in my example:
> - tapes are stored in different files (one tape per file)
> - you confirm those first blocks are garbage
> - you confirm they could be rewritten with new data

> This means that we can do recycle space using one tape per file. Correct?

No. You could do that if the rate at which you need to write data to
the file is <= the rate at which you extract it. But for what we
are doing, namely merging runs from several tapes into one output run,
it's pretty much guaranteed that you need new space faster than you are
consuming data from any one input tape. It balances out as long as you
keep *all* the tapes in one operating-system file; otherwise not.

regards, tom lane


From: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-21 17:16:06
Message-ID: BLU0-SMTP78822EB589D2A6726BF302E6C30@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Il 21/06/2010 04:25, Tom Lane ha scritto:
> No. You could do that if the rate at which you need to write data to
> the file is<= the rate at which you extract it. But for what we
> are doing, namely merging runs from several tapes into one output run,
> it's pretty much guaranteed that you need new space faster than you are
> consuming data from any one input tape. It balances out as long as you
> keep *all* the tapes in one operating-system file; otherwise not.
>
> regards, tom lane
>
>
Tom, hope you could clarify the issue of the rates.

During the initialisation phase (loading blocks into heap) of course we
can mark as garbage more space than we are consuming (since we haven't
still begun merging blocks). The time to do that is after prereading as
much tuples as possible. Of course even during the algorithm we cannot
output more tuples than we preread. So there is no problem in terms of
total number of tuples read and output: at each time, read tuples are >=
output tuples.

Of course, in this case, output blocks should be placed in the free
space spread around the various files and we should keep track of this
placement.

But, recall that even in case of using a LogicalTapeSet we should keep
track of the output blocks, as Robert said in his example.

What's wrong in my picture?

Thank you.
Manolo.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-21 19:27:06
Message-ID: 16635.1277148426@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it> writes:
> Of course, in this case, output blocks should be placed in the free
> space spread around the various files and we should keep track of this
> placement.

And once you've done that, what benefit have you got over the current
design? None that I can see. It's only more complicated.

regards, tom lane


From: "mac_man2005(at)hotmail(dot)it" <mac_man2005(at)hotmail(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: About tapes
Date: 2010-06-21 19:44:20
Message-ID: BLU0-SMTP4798AEBCE8C8E7D3DCCAABE6C30@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom, you are right: it is just more complicated.

In fact, I did not pretend to demonstrate that it was easier or faster
using one file per tape.
As you can remember, I just did not understand why you said it was
*impossible* to recycle space in that case.

So, the conclusion is: you can do recycle space when using one file per
tape, but it is just more complicated than current design, isn't it?

PD: are we sure it is more complicated?

Thanks.

Manolo.

Il 21/06/2010 21:27, Tom Lane ha scritto:
>
> And once you've done that, what benefit have you got over the current
> design? None that I can see. It's only more complicated.
>
> regards, tom lane
>
>