Re: Modifying update_attstats of analyze.c for C Strings

Lists: pgsql-hackers
From: Ashoke <s(dot)ashoke(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Modifying update_attstats of analyze.c for C Strings
Date: 2014-07-08 05:22:00
Message-ID: CALpszJOkbYcGXehaLqDMqT6P-BfurPZOHT-ywGqzGMxE+R3gSQ@mail.gmail.com

Hi,

I am trying to implement functionality similar to ANALYZE, but it needs to
store different values for the MCV list and histogram bounds when the
column under consideration is varchar (C strings). These values are valid
and are stored in inp->str[][]. I have written a function
*dummy_update_attstats* with the following changes; everything else is the
same as in *update_attstats* in *~/src/backend/commands/analyze.c*:

---
{
    ArrayType  *arry;

    if (strcmp(col_type, "varchar") == 0)
        /* cstring: typlen -2 (null-terminated), by reference, char-aligned */
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               CSTRINGOID,
                               -2,
                               false,
                               'c');
    else
        arry = construct_array(stats->stavalues[k],
                               stats->numvalues[k],
                               stats->statypid[k],
                               stats->statyplen[k],
                               stats->statypbyval[k],
                               stats->statypalign[k]);
    values[i++] = PointerGetDatum(arry);    /* stavaluesN */
}
---

and I set hist_values in the appropriate function as follows:
---
if (strcmp(col_type, "varchar") == 0)
    hist_values[i] = datumCopy(CStringGetDatum(inp->str[i][j]),
                               false,   /* typByVal */
                               -2);     /* typLen: null-terminated C string */
---
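For reference, here is a minimal standalone sketch (not PostgreSQL code) of what datumCopy() does in the typLen == -2 case used above, as I understand it: the datum is treated as a pointer to a null-terminated string, its size is taken as strlen() + 1, and that many bytes are copied. The name copy_cstring_datum is hypothetical, and malloc() stands in for the backend's palloc() only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for datumCopy(value, false, -2): copy a
 * null-terminated C string, including its terminating '\0'.
 * The backend allocates with palloc() in the current memory context;
 * malloc() is used here only so this compiles outside PostgreSQL.
 */
char *
copy_cstring_datum(const char *src)
{
    size_t size;
    char  *dst;

    assert(src != NULL);            /* a cstring datum must point somewhere */
    size = strlen(src) + 1;         /* typLen -2: size is strlen() + 1 */
    dst = malloc(size);
    if (dst != NULL)
        memcpy(dst, src, size);
    return dst;
}
```

Because the copy stops at the first '\0', the strings in inp->str[][] must be properly terminated; unlike a varlena text/varchar datum, a cstring carries no length header.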

I based this on the following reference:
http://www.postgresql.org/message-id/attachment/20352/vacattrstats-extend.diff

My issue is: when I use my approach for strings, the MCV/histogram_bounds
values in pg_stats are not surrounded by double quotes (" "). That is,

If the normal *update_attstats* is used, histogram_bounds for *TPCH
nation(n_name)* are: *"ALGERIA ","ARGENTINA ",...*
If I use *dummy_update_attstats* as above, histogram_bounds for *TPCH
nation(n_name)* are: *ALGERIA,ARGENTINA,...*

This becomes a problem when a string contains commas (','), as in the
*n_comment* column of the *nation* table.

Could someone point out the problem and suggest a solution?

Thank you.

--
Regards,
Ashoke


From: Ashoke <s(dot)ashoke(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Modifying update_attstats of analyze.c for C Strings
Date: 2014-07-08 06:34:35
Message-ID: CALpszJO5ax+yVcjU3t_A3S94gM+s4u+SYjLZO5Ydf+VGkZPhDg@mail.gmail.com

As a follow-up question:

I found some varchar columns whose histogram_bounds are not surrounded by
double quotes (" ") even with the default implementation.
Ex: *c_name* column of the *Customer* table

I also found histogram_bounds in which only some strings are surrounded by
double quotes and others are not.
Ex: *c_address* column of the *Customer* table

Why are there such inconsistencies? How is this determined?

Thank you.

--
Regards,
Ashoke


From: Ashoke <s(dot)ashoke(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Modifying update_attstats of analyze.c for C Strings
Date: 2014-07-08 07:11:41
Message-ID: CALpszJOqw23bRBNtNUw5eiS1vmFBF0xFBkU0Zv4Psds-JOFtaw@mail.gmail.com

OK, I was able to figure out that when strings contain spaces, PostgreSQL
surrounds them with double quotes.
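Spaces appear to be only one of the triggers. Below is a standalone sketch of the per-element quoting rule that, as far as I can tell from array_out() in src/backend/utils/adt/arrayfuncs.c, decides whether an element of the array shown in pg_stats is double-quoted: the empty string, the word NULL (in any case), and any element containing whitespace, the delimiter, braces, double quotes or backslashes get quotes, and '"' and '\' are backslash-escaped inside them. The function names are hypothetical, NULL array elements and non-default dimension decorations are not handled, and the delimiter is passed in (',' for the types discussed here). Because the decision is made per element, this would explain a column showing no quotes at all next to one showing a mix; under this rule an element containing a comma would be quoted as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Whitespace as recognized by array_isspace() in arrayfuncs.c. */
static bool
is_array_space(char ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\r' || ch == '\v' || ch == '\f';
}

/* True if s equals "NULL", compared case-insensitively. */
static bool
is_null_word(const char *s)
{
    const char *w = "NULL";

    for (; *s != '\0' && *w != '\0'; s++, w++)
        if (toupper((unsigned char) *s) != *w)
            return false;
    return *s == '\0' && *w == '\0';
}

/* Hypothetical helper: does this element need double quotes? */
bool
element_needs_quotes(const char *s, char delim)
{
    if (*s == '\0' || is_null_word(s))
        return true;            /* empty string or literal NULL */
    for (; *s != '\0'; s++)
        if (strchr("\"\\{}", *s) != NULL || *s == delim ||
            is_array_space(*s))
            return true;
    return false;
}

/* Append one character, always leaving room for the final '\0'. */
static bool
put_char(char *buf, size_t bufsize, size_t *pos, char c)
{
    if (*pos + 1 >= bufsize)
        return false;
    buf[(*pos)++] = c;
    return true;
}

/*
 * Hypothetical helper: render vals[0..n-1] as an array literal such as
 * {"ALGERIA ",ARGENTINA}.  Returns false if buf is too small.
 */
bool
format_array_literal(const char *const *vals, int n, char delim,
                     char *buf, size_t bufsize)
{
    size_t pos = 0;

    assert(buf != NULL && (vals != NULL || n == 0));
    if (!put_char(buf, bufsize, &pos, '{'))
        return false;
    for (int i = 0; i < n; i++)
    {
        bool quote = element_needs_quotes(vals[i], delim);

        if (i > 0 && !put_char(buf, bufsize, &pos, delim))
            return false;
        if (quote && !put_char(buf, bufsize, &pos, '"'))
            return false;
        for (const char *p = vals[i]; *p != '\0'; p++)
        {
            /* inside quotes, '"' and '\\' get a backslash escape */
            if (quote && (*p == '"' || *p == '\\') &&
                !put_char(buf, bufsize, &pos, '\\'))
                return false;
            if (!put_char(buf, bufsize, &pos, *p))
                return false;
        }
        if (quote && !put_char(buf, bufsize, &pos, '"'))
            return false;
    }
    if (!put_char(buf, bufsize, &pos, '}'))
        return false;
    buf[pos] = '\0';
    return true;
}
```

For example, the padded values "ALGERIA " and "ARGENTINA " render as {"ALGERIA ","ARGENTINA "}, while the unpadded ALGERIA and ARGENTINA render as {ALGERIA,ARGENTINA}; this would be consistent with the difference reported in the first message if the values in inp->str[][] lack the trailing blanks.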

--
Regards,
Ashoke