[no subject]
Therefore, the vast majority of the time, representing a data set by Key
will use less space than the original record. Less space used means
less IO to scan the data set, which means faster scan times.
This is why index files work in the first place, right?
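
To put rough numbers on that, here is a back-of-the-envelope sketch. The page size, row count, and record and key widths are all made-up assumptions for illustration, and it ignores page headers and per-row overhead, so treat the result as an order-of-magnitude estimate only.

```python
import math

# All figures below are illustrative assumptions, not measurements.
PAGE_BYTES = 8192      # assumed size of one disk page
ROWS = 1_000_000       # assumed number of rows in the data set
RECORD_BYTES = 200     # assumed width of a full record
KEY_BYTES = 16         # assumed width of the Key alone


def pages_to_scan(row_count, bytes_per_row, page_bytes=PAGE_BYTES):
    """Pages read by a full sequential scan, ignoring per-page and per-row overhead."""
    rows_per_page = page_bytes // bytes_per_row
    return math.ceil(row_count / rows_per_page)


record_pages = pages_to_scan(ROWS, RECORD_BYTES)
key_pages = pages_to_scan(ROWS, KEY_BYTES)

print(f"full records: {record_pages} pages")
print(f"keys only:    {key_pages} pages")
print(f"ratio:        {record_pages / key_pages:.1f}x fewer pages for keys")
```

Under those assumptions the key-only representation needs roughly a twelfth of the pages, and scan time at a fixed IO rate shrinks by about the same factor.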
>Perhaps I believe this because you can now buy as much sequential I/O
>as you want. Random I/O is the only real savings.
>
1= No, you cannot "buy as much sequential IO as you want". Even
with an infinite budget, there are physical and engineering limits. Long
before you reach those limits, you will pay exponentially increasing costs
for linearly increasing performance gains. So even if you _can_ buy a
certain level of sequential IO, it may not be the most efficient way to
spend money.
2= Most RW IT professionals have far from an infinite budget. Just the traffic
on these lists shows how severe the typical cost constraints usually are.
OTOH, if you have an infinite IT budget, care to help a few less fortunate
than yourself? After all, even a large constant subtracted from infinity
is still infinity... ;-)
3= No matter how fast you can do IO, IO remains the most expensive
part of the performance equation. The fastest and cheapest IO you can
do is _no_ IO. As long as we trade cheaper RAM and even cheaper CPU
operations for IO correctly, more space efficient data representations will
always be a Win because of this.