Re: intarray internals

From: Volkan YAZICI <yazicivo(at)ttnet(dot)net(dot)tr>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: intarray internals
Date: 2006-05-06 20:43:15
Message-ID: 20060506204315.GC202@alamut
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On May 06 05:38, Volkan YAZICI wrote:
> g_int_compress():
> if (integer array length > constant limit)
> {
> Transfrom {A, B, C, ..., Z} array into
> {A, A, B, B, ..., Z, Z}
>
> while (integer array length > constant limit)
> {
> Select two couples whose difference is minimum
> and remove them from the list.
> }
> }
>
> g_int_decompress():
> for (iterate over compressed array items)
> {
> Transform {..., 47, 50, ...} into {..., 47, 48, 49, 50, ...}
> }
>
> As you can see both compression and decompression methods are quite
> lossy. I'm not sure if this has any negative impact on the traversing
> of nodes stuff for more accurate predicates, but I am currently
> considering "performance gain * physical storage gain / cpu
> consumation loss" ratio if we'd instead use a lossless data
> compression method.

After considering the idea, I conclude up with that, current lossy
compression method seems like the best for now. But here goes my
proposal:

g_int_compress():
/* Input */
arr = {16, 20, 40, 52, 58}
len = 5

orig_len = len

while (len(arr) > LIM)
{
Find the couple with minimum diff: (16, 20)

Remove(arr, 20) /* Removing right bound. If it's the
last item of the list, remove
left item. */
--len
}

/*
* Suppose that, above function reduced array with below steps:
* 0: {16, 20, 40, 52, 58} (16, 20*)
* 1: {16, 40, 52, 58} (52*, 58)
* 2: {16, 40, 58}
*/

/* Prepend orig_len to array */
arr = {_5_, 16, 40, 58}

g_int_decompress():
/* Import info from input array. */
orig_len = 5
arr = {16, 40, 58}
len = 3

while (len < orig_len)
{
Divide every interval into two equal parts:
{16, 40, 58} -> {16, 28*, 40, 49*, 58}
}

I didn't test above method on real-world data chunks but, AFAIC, above
method would result with more accurate ranges in the decompressed arrays
and probably skip some redundant loop recursions when compared to
previous method.

Your comments?

Regards.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-05-06 20:46:15 Re: intarray internals
Previous Message kmh496 2006-05-06 19:42:34 i need perl help for gborg project about mysql conversion

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-05-06 20:46:15 Re: intarray internals
Previous Message Bruce Momjian 2006-05-06 16:30:37 Re: InsertXLogFile in pg_resetxlog