Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array

Lists: pgsql-general
From: Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Best way to construct PostgreSQL ArrayType (_int4) from C int array
Date: 2011-04-27 11:02:36
Message-ID: BANLkTim6L6NCgDP3cUvA4hZjPLpPU1ZLeA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

At the moment I am using the following code to construct a PostgreSQL
array from a C array in my C extension but I am not so sure if this is
really the best solution:

const int *data = array.data(); // C array
Datum *d = (Datum *) palloc(sizeof(Datum) * size);

for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);

ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');

Is this okay or is there a better solution (existing function in the
PostgreSQL source for example)?

Cheers,

Adrian


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array
Date: 2011-04-27 13:49:34
Message-ID: BANLkTi=8mc6+5t+BYXkaMNhp6miChNS=7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk> wrote:
> At the moment I am using the following code to construct a PostgreSQL
> array from a C array in my C extension but I am not so sure if this is
> really the best solution:
>
> const int *data = array.data(); // C array
> Datum *d = (Datum *) palloc(sizeof(Datum) * size);
>
> for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
>
> ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
>
> Is this okay or is there a better solution (existing function in the
> PostgreSQL source for example)?

That's pretty much the best way AFAIK. Int32GetDatum doesn't do
anything fancy -- it's just a 32 bit mask/assignment. constructing
the array at once is going to be a lot better than incrementally
creating it. Do you expect the arrays to be large, say bigger than
10k elements?

merlin


From: Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array
Date: 2011-04-27 17:00:47
Message-ID: BANLkTimsm5eyzoAF0KSf4cFPkb+uFZNutQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

The largest arrays I expect at the moment are more or less sparse
vectors of around 4.8k elements and I have noticed that the
input/output (C/C++ extension) does not scale well with the number of
elements in the array.

Using a function that sums all elements in the array, this is the time
it takes for ~150k arrays of various sizes (including ordering desc
and limit 10):

128: 61ms
256: 80ms
512: 681ms
1024 1065ms
2048 7682ms
4096 21332ms

That's why I thought that the construction of the PostgreSQL array was
not optimal.

On Wed, Apr 27, 2011 at 14:49, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk> wrote:
>> At the moment I am using the following code to construct a PostgreSQL
>> array from a C array in my C extension but I am not so sure if this is
>> really the best solution:
>>
>> const int *data = array.data(); // C array
>> Datum *d = (Datum *) palloc(sizeof(Datum) * size);
>>
>> for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
>>
>> ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
>>
>> Is this okay or is there a better solution (existing function in the
>> PostgreSQL source for example)?
>
> That's pretty much the best way AFAIK. Int32GetDatum doesn't do
> anything fancy -- it's just a 32 bit mask/assignment.  constructing
> the array at once is going to be a lot better than incrementally
> creating it.  Do you expect the arrays to be large, say bigger than
> 10k elements?
>
> merlin
>


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array
Date: 2011-04-27 17:06:01
Message-ID: BANLkTi=qYY9gdR+S2VAZAq028TEQyywmVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk> wrote:
> The largest arrays I expect at the moment are more or less sparse
> vectors of around 4.8k elements and I have noticed that the
> input/output (C/C++ extension) does not scale well with the number of
> elements in the array.
>
> Using a function that sums all elements in the array, this is the time
> it takes for ~150k arrays of various sizes (including ordering desc
> and limit 10):
>
> 128: 61ms
> 256: 80ms
> 512: 681ms
> 1024 1065ms
> 2048 7682ms
> 4096 21332ms

hm, I'm not following you exactly -- what sql are you running? This
scales pretty well for me:
select array_dims(array(select generate_series(1,1000000)));
etc

merlin


From: Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array
Date: 2011-04-27 18:43:16
Message-ID: BANLkTikoSkCERiYMQfnNCY-+=8y_mhZqXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Apr 27, 2011 at 18:06, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Wed, Apr 27, 2011 at 12:00 PM, Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk> wrote:
>> The largest arrays I expect at the moment are more or less sparse
>> vectors of around 4.8k elements and I have noticed that the
>> input/output (C/C++ extension) does not scale well with the number of
>> elements in the array.
>>
>> Using a function that sums all elements in the array, this is the time
>> it takes for ~150k arrays of various sizes (including ordering desc
>> and limit 10):
>>
>> 128: 61ms
>> 256: 80ms
>> 512: 681ms
>> 1024 1065ms
>> 2048 7682ms
>> 4096 21332ms
>
> hm, I'm not following you exactly -- what sql are you running?  This
> scales pretty well for me:
> select array_dims(array(select generate_series(1,1000000)));
> etc
>
> merlin
>
I have a C extension function that creates _int4 arrays of a specified
size with random elements, in this case 128,256,512 etc. Another
function from my extension returns the sum of the array. In this case
I created a table with around 150k arrays to benchmark the extension.
The query sums each array in the table and returns the 10 highest
numbers. The C extension is actually a wrapper around the Eigen 3
template library, which works pretty well - now I am trying to tweak
the input/output functions to get better performance with larger
arrays.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Best way to construct PostgreSQL ArrayType (_int4) from C int array
Date: 2011-04-28 04:34:20
Message-ID: BANLkTiki7G-O2DcGGGJno-S=q+HE+hD00A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

2011/4/27 Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk>:
> The largest arrays I expect at the moment are more or less sparse
> vectors of around 4.8k elements and I have noticed that the
> input/output (C/C++ extension) does not scale well with the number of
> elements in the array.
>
> Using a function that sums all elements in the array, this is the time
> it takes for ~150k arrays of various sizes (including ordering desc
> and limit 10):
>

PostgreSQL doesn't use a index for access to array fields. So access
to fields of packed arrays can be slower for higher subscripts.

Regards

Pavel Stehule

> 128: 61ms
> 256: 80ms
> 512: 681ms
> 1024 1065ms
> 2048 7682ms
> 4096 21332ms
>
> That's why I thought that the construction of the PostgreSQL array was
> not optimal.
>
> On Wed, Apr 27, 2011 at 14:49, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>> On Wed, Apr 27, 2011 at 6:02 AM, Adrian Schreyer <ams214(at)cam(dot)ac(dot)uk> wrote:
>>> At the moment I am using the following code to construct a PostgreSQL
>>> array from a C array in my C extension but I am not so sure if this is
>>> really the best solution:
>>>
>>> const int *data = array.data(); // C array
>>> Datum *d = (Datum *) palloc(sizeof(Datum) * size);
>>>
>>> for (int i = 0; i < size; i++) d[i] = Int32GetDatum(data[i]);
>>>
>>> ArrayType *a = construct_array(d, size, INT4OID, sizeof(int4), true, 'i');
>>>
>>> Is this okay or is there a better solution (existing function in the
>>> PostgreSQL source for example)?
>>
>> That's pretty much the best way AFAIK. Int32GetDatum doesn't do
>> anything fancy -- it's just a 32 bit mask/assignment.  constructing
>> the array at once is going to be a lot better than incrementally
>> creating it.  Do you expect the arrays to be large, say bigger than
>> 10k elements?
>>
>> merlin
>>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>