postgres8.3beta encodding problem?

Lists: pgsql-general
From: marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>
To: pgsql-general(at)postgresql(dot)org
Subject: postgres8.3beta encodding problem?
Date: 2007-12-17 16:53:37
Message-ID: 22821.9051.qm@web32105.mail.mud.yahoo.com


Folks,

select chr(165);
ERROR: requested character too large for encoding: 165

This is from an old script; if I remember correctly, it worked in
PostgreSQL 8.2.4.
Any clue?

Best regards,
mdc

info:
select version().
"PostgreSQL 8.3beta3 on i686-pc-linux-gnu, compiled by
GCC gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.1)"

show all

"add_missing_from";"off"
"allow_system_table_mods";"off"
"archive_command";"(disabled)"
"archive_mode";"off"
"archive_timeout";"0"
"array_nulls";"on"
"authentication_timeout";"1min"
"autovacuum";"on"
"autovacuum_analyze_scale_factor";"0.1"
"autovacuum_analyze_threshold";"50"
"autovacuum_freeze_max_age";"200000000"
"autovacuum_max_workers";"3"
"autovacuum_naptime";"1min"
"autovacuum_vacuum_cost_delay";"20ms"
"autovacuum_vacuum_cost_limit";"-1"
"autovacuum_vacuum_scale_factor";"0.2"
"autovacuum_vacuum_threshold";"50"
"backslash_quote";"safe_encoding"
"bgwriter_delay";"200ms"
"bgwriter_lru_maxpages";"100"
"bgwriter_lru_multiplier";"2"
"block_size";"8192"
"bonjour_name";""
"check_function_bodies";"on"
"checkpoint_completion_target";"0.5"
"checkpoint_segments";"3"
"checkpoint_timeout";"5min"
"checkpoint_warning";"30s"
"client_encoding";"latin1"
"client_min_messages";"notice"
"commit_delay";"0"
"commit_siblings";"5"
"config_file";"/usr/local/pgsql/data/postgresql.conf"
"constraint_exclusion";"off"
"cpu_index_tuple_cost";"0.005"
"cpu_operator_cost";"0.0025"
"cpu_tuple_cost";"0.01"
"custom_variable_classes";""
"data_directory";"/usr/local/pgsql/data"
"DateStyle";"ISO, DMY"
"db_user_namespace";"off"
"deadlock_timeout";"1s"
"debug_assertions";"off"
"debug_pretty_print";"off"
"debug_print_parse";"off"
"debug_print_plan";"off"
"debug_print_rewritten";"off"
"default_statistics_target";"10"
"default_tablespace";""
"default_text_search_config";"pg_catalog.spanish"
"default_transaction_isolation";"read committed"
"default_transaction_read_only";"off"
"default_with_oids";"off"
"dynamic_library_path";"$libdir"
"effective_cache_size";"128MB"
"enable_bitmapscan";"on"
"enable_hashagg";"on"
"enable_hashjoin";"on"
"enable_indexscan";"on"
"enable_mergejoin";"on"
"enable_nestloop";"on"
"enable_seqscan";"on"
"enable_sort";"on"
"enable_tidscan";"on"
"escape_string_warning";"on"
"explain_pretty_print";"on"
"external_pid_file";""
"extra_float_digits";"0"
"from_collapse_limit";"8"
"fsync";"on"
"full_page_writes";"on"
"geqo";"on"
"geqo_effort";"5"
"geqo_generations";"0"
"geqo_pool_size";"0"
"geqo_selection_bias";"2"
"geqo_threshold";"12"
"gin_fuzzy_search_limit";"0"
"hba_file";"/usr/local/pgsql/data/pg_hba.conf"
"ident_file";"/usr/local/pgsql/data/pg_ident.conf"
"ignore_system_indexes";"off"
"integer_datetimes";"off"
"join_collapse_limit";"8"
"krb_caseins_users";"off"
"krb_realm";""
"krb_server_hostname";""
"krb_server_keyfile";""
"krb_srvname";"postgres"
"lc_collate";"es_AR"
"lc_ctype";"es_AR"
"lc_messages";"es_AR"
"lc_monetary";"es_AR"
"lc_numeric";"es_AR"
"lc_time";"es_AR"
"listen_addresses";"*"
"local_preload_libraries";""
"log_autovacuum_min_duration";"-1"
"log_checkpoints";"off"
"log_connections";"off"
"log_destination";"stderr"
"log_directory";"pg_log"
"log_disconnections";"off"
"log_duration";"off"
"log_error_verbosity";"default"
"log_executor_stats";"off"
"log_filename";"postgresql-%Y-%m-%d_%H%M%S.log"
"log_hostname";"off"
"log_line_prefix";""
"log_lock_waits";"off"
"log_min_duration_statement";"-1"
"log_min_error_statement";"error"
"log_min_messages";"notice"
"log_parser_stats";"off"
"log_planner_stats";"off"
"log_rotation_age";"1d"
"log_rotation_size";"10MB"
"log_statement";"all"
"log_statement_stats";"off"
"log_temp_files";"-1"
"log_timezone";"America/Buenos_Aires"
"log_truncate_on_rotation";"off"
"logging_collector";"off"
"maintenance_work_mem";"16MB"
"max_connections";"100"
"max_files_per_process";"1000"
"max_fsm_pages";"153600"
"max_fsm_relations";"1000"
"max_function_args";"100"
"max_identifier_length";"63"
"max_index_keys";"32"
"max_locks_per_transaction";"64"
"max_prepared_transactions";"5"
"max_stack_depth";"2MB"
"password_encryption";"on"
"port";"5432"
"post_auth_delay";"0"
"pre_auth_delay";"0"
"random_page_cost";"4"
"regex_flavor";"advanced"
"search_path";""$user",public"
"seq_page_cost";"1"
"server_encoding";"LATIN1"
"server_version";"8.3beta3"
"server_version_num";"80300"
"session_replication_role";"origin"
"shared_buffers";"24MB"
"shared_preload_libraries";""
"silent_mode";"off"
"sql_inheritance";"on"
"ssl";"off"
"standard_conforming_strings";"off"
"statement_timeout";"0"
"superuser_reserved_connections";"3"
"synchronous_commit";"on"
"syslog_facility";"LOCAL0"
"syslog_ident";"postgres"
"tcp_keepalives_count";"9"
"tcp_keepalives_idle";"7200"
"tcp_keepalives_interval";"75"
"temp_buffers";"1024"
"temp_tablespaces";""
"TimeZone";"America/Buenos_Aires"
"timezone_abbreviations";"Default"
"trace_notify";"off"
"trace_sort";"off"
"track_activities";"on"
"track_counts";"on"
"transaction_isolation";"read committed"
"transaction_read_only";"off"
"transform_null_equals";"off"
"unix_socket_directory";""
"unix_socket_group";""
"unix_socket_permissions";"511"
"update_process_title";"on"
"vacuum_cost_delay";"0"
"vacuum_cost_limit";"200"
"vacuum_cost_page_dirty";"20"
"vacuum_cost_page_hit";"1"
"vacuum_cost_page_miss";"10"
"vacuum_freeze_min_age";"100000000"
"wal_buffers";"64kB"
"wal_sync_method";"fdatasync"
"wal_writer_delay";"200ms"
"work_mem";"1MB"
"xmlbinary";"base64"
"xmloption";"content"
"zero_damaged_pages";"off"



From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres8.3beta encodding problem?
Date: 2007-12-17 18:13:54
Message-ID: 1197915234.28804.325.camel@dogma.ljc.laika.com

On Mon, 2007-12-17 at 13:53 -0300, marcelo Cortez wrote:
> select chr(165);
> ERROR: requested character too large for encoding: 165
> This is from an old script; if I remember correctly, it worked in
> PostgreSQL 8.2.4.
> Any clue?

http://www.postgresql.org/docs/8.3/static/release-8-3.html

"Ensure that chr() cannot create invalidly-encoded values (Andrew)

In UTF8-encoded databases the argument of chr() is now treated as a
Unicode code point. In other multi-byte encodings chr()'s argument must
designate a 7-bit ASCII character. Zero is no longer accepted. ascii()
has been adjusted to match."

Regards,
Jeff Davis
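
To make the quoted release note concrete: treating chr()'s argument as a Unicode code point in a UTF8 database means the result is the UTF-8 byte sequence for that code point. The following is only a minimal sketch of standard UTF-8 encoding, not PostgreSQL's actual code. The name encode_utf8 is hypothetical, and rejecting surrogates is an assumption of this sketch; rejecting zero follows the release note.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Encode a Unicode code point as UTF-8 into buf (room for 4 bytes).
 * Returns the number of bytes written, or 0 if the value is rejected.
 * Hypothetical helper for illustration; not taken from PostgreSQL.
 */
size_t encode_utf8(unsigned int cp, unsigned char *buf)
{
    assert(buf != NULL);

    /* Zero is rejected per the release note; surrogates and values
     * above U+10FFFF are rejected as an assumption of this sketch. */
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    if (cp < 0x80)
    {
        /* 7-bit ASCII: one byte, unchanged */
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        /* two bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xC0 | (cp >> 6));
        buf[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        buf[0] = (unsigned char) (0xE0 | (cp >> 12));
        buf[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    /* four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    buf[0] = (unsigned char) (0xF0 | (cp >> 18));
    buf[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    buf[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    buf[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}
```

With this sketch, code point 165 becomes the two bytes 0xC2 0xA5, whereas in a single-byte encoding such as LATIN1 the same character is the single byte 0xA5.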


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>, pgsql-general(at)postgresql(dot)org
Subject: Re: postgres8.3beta encodding problem?
Date: 2007-12-18 10:09:25
Message-ID: 20071218100925.GA13268@svana.org

On Mon, Dec 17, 2007 at 10:13:54AM -0800, Jeff Davis wrote:
> http://www.postgresql.org/docs/8.3/static/release-8-3.html
>
> "Ensure that chr() cannot create invalidly-encoded values (Andrew)

Ok, but that doesn't apply in this case, his database appears to be
LATIN1 and this character is valid for that encoding...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: postgres8.3beta encodding problem?
Date: 2007-12-18 15:35:39
Message-ID: 29305.1197992139@sss.pgh.pa.us

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Mon, Dec 17, 2007 at 10:13:54AM -0800, Jeff Davis wrote:
>> http://www.postgresql.org/docs/8.3/static/release-8-3.html

> Ok, but that doesn't apply in this case, his database appears to be
> LATIN1 and this character is valid for that encoding...

You know what, I think the test in the code is backwards.

    is_mb = pg_encoding_max_length(encoding) > 1;

    if ((is_mb && (cvalue > 255)) || (!is_mb && (cvalue > 127)))
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("requested character too large for encoding: %d",
                        cvalue)));

Shouldn't we be allowing up-to-255 for single-byte encodings, not
multibyte?

regards, tom lane
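
The effect of the quoted condition can be checked in isolation. The sketch below copies the condition from the message above into a standalone predicate and compares it with a version whose two limits are swapped. The swapped version is only an assumption about the shape of a fix, as the question above suggests; it is not the committed PostgreSQL code. All function names are hypothetical, and is_mb here stands for a non-UTF8 multibyte encoding.

```c
#include <assert.h>
#include <stdbool.h>

/* Condition exactly as quoted in the thread; true means chr() raises the error. */
bool rejects_as_quoted(bool is_mb, unsigned int cvalue)
{
    return (is_mb && (cvalue > 255)) || (!is_mb && (cvalue > 127));
}

/* Same condition with the limits swapped (assumed fix, not the committed code). */
bool rejects_with_limits_swapped(bool is_mb, unsigned int cvalue)
{
    return (is_mb && (cvalue > 127)) || (!is_mb && (cvalue > 255));
}

/* Self-check for the reported case: chr(165). */
void check_reported_case(void)
{
    /* Single-byte database (e.g. LATIN1): the quoted test rejects 165. */
    assert(rejects_as_quoted(false, 165));
    /* With the limits swapped, 165 is accepted in a single-byte database. */
    assert(!rejects_with_limits_swapped(false, 165));
    /* Multibyte database: the quoted test lets a lone byte 165 through. */
    assert(!rejects_as_quoted(true, 165));
    /* With the limits swapped, only 7-bit ASCII passes in a multibyte database. */
    assert(rejects_with_limits_swapped(true, 165));
}
```

Under the quoted condition a single-byte database rejects 165 while a multibyte one accepts it, which is the opposite of what the release note describes.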


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: postgres8.3beta encodding problem?
Date: 2007-12-18 15:54:03
Message-ID: 20071218155403.GF13268@svana.org

On Tue, Dec 18, 2007 at 10:35:39AM -0500, Tom Lane wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > Ok, but that doesn't apply in this case, his database appears to be
> > LATIN1 and this character is valid for that encoding...
>
> You know what, I think the test in the code is backwards.
>
> is_mb = pg_encoding_max_length(encoding) > 1;
>
> if ((is_mb && (cvalue > 255)) || (!is_mb && (cvalue > 127)))

It does seem to be a bit weird. For single-byte encodings anything
up to 255 is OK, well, sort of. It depends on what you want chr() to do
(oh no, not this discussion again). If you subscribe to the idea that
it should use Unicode code points then the test is completely bogus,
since whether or not the character is valid has nothing to do with whether
the encoding is multibyte or not.

If you want the output of chr() to (logically) depend on the encoding
then the test makes more sense, but then it's inverted. Single-byte
encodings are by definition defined for values up to 255. And multibyte
encodings (other than UTF-8 I suppose) can only see the ASCII subset.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: postgres8.3beta encodding problem?
Date: 2007-12-18 16:00:08
Message-ID: 29674.1197993608@sss.pgh.pa.us

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> It does seem to be a bit weird. For single-byte encodings anything
> up to 255 is OK, well, sort of. It depends on what you want chr() to do
> (oh no, not this discussion again). If you subscribe to the idea that
> it should use Unicode code points then the test is completely bogus,
> since whether or not the character is valid has nothing to do with whether
> the encoding is multibyte or not.

Well, the advertised purpose of the chr() changes was to prevent
generation of invalid multibyte sequences, not to cut off
potentially-useful functionality. So I don't think it should be
preventing people from generating non-ASCII single-byte characters.

The test is clearly backwards, because in an MB encoding it will in fact
let you generate invalid encoding ...

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: postgres8.3beta encodding problem?
Date: 2007-12-18 16:01:28
Message-ID: 4767EED8.3040609@dunslane.net

Martijn van Oosterhout wrote:
> On Tue, Dec 18, 2007 at 10:35:39AM -0500, Tom Lane wrote:
>
>> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
>>
>>> Ok, but that doesn't apply in this case, his database appears to be
>>> LATIN1 and this character is valid for that encoding...
>>>
>> You know what, I think the test in the code is backwards.
>>
>> is_mb = pg_encoding_max_length(encoding) > 1;
>>
>> if ((is_mb && (cvalue > 255)) || (!is_mb && (cvalue > 127)))
>>

Yes.

>
> It does seem to be a bit weird. For single-byte encodings anything
> up to 255 is OK, well, sort of. It depends on what you want chr() to do
> (oh no, not this discussion again). If you subscribe to the idea that
> it should use Unicode code points then the test is completely bogus,
> since whether or not the character is valid has nothing to do with whether
> the encoding is multibyte or not.
>

We are certainly not going to revisit that discussion at this stage. It
was thrashed out months ago.

> If you want the output of chr() to (logically) depend on the encoding
> then the test makes more sense, but then it's inverted. Single-byte
> encodings are by definition defined for values up to 255. And multibyte
> encodings (other than UTF-8 I suppose) can only see the ASCII subset.
>

Right. There is a simple thinko on my part in the line of code Tom
pointed to, which needs to be fixed.

cheers

andrew