Lists: | pgsql-general |
---|
From: | Alexey Borzov <borz_off(at)rdw(dot)ru> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | sorting/grouping/(non-)unique indexes bug |
Date: | 2002-06-07 09:52:59 |
Message-ID: | 12476244223.20020607135259@rdw.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Greetings!
Before I present the following, I must say that 'simple' index
corruption is highly improbable; the server did not suffer hard
reboots.
rdw=# \d reg_user
Table "reg_user"
Column | Type | Modifiers
------------------+------------------------+------------------------------------------------
user_id | integer | not null default nextval('reg_user_seq'::text)
user_email | character varying(50) | not null
user_passwd | character varying(32) | not null
user_active | boolean | not null default 't'
user_allow_pauth | boolean | default 'f'
user_full_name | character varying(100) |
user_pseudonym | character varying(100) |
user_who | character(1) |
Primary key: reg_user_pkey
Unique keys: reg_user_email_key
rdw=# \d reg_user_email_key
Index "reg_user_email_key"
Column | Type
------------+-----------------------
user_email | character varying(50)
unique btree
rdw=# create table broken as select user_email from reg_user;
SELECT
rdw=# -- one may think that emails are unique...
rdw=# select count(user_email) from broken;
count
--------
212400
(1 row)
rdw=# select count(distinct user_email) from broken;
count
--------
212397
(1 row)
rdw=# -- look closely at the numbers
rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[duplicated email] | 2
[some email] | 1
[some email] | 1
[some email] | 1
rdw=# delete from broken where user_email = '[duplicated email]';
DELETE 2
rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[another email] | 2
[some email] | 1
[some email] | 1
rdw=# delete from broken where user_email = '[another email]';
DELETE 2
rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
user_email | count
----------------------------------------------------+-------
[some email] | 1
[some email] | 1
[some email] | 1
rdw=# --finally, no more duplicates!
rdw=# create unique index broken_email_key on broken (user_email);
ERROR: Cannot create unique index. Table contains non-unique values
rdw=# select version();
version
-------------------------------------------------------------
PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
(1 row)
$psql -l
List of databases
Name | Owner | Encoding
-----------+----------+-----------
...
rdw | alex | WIN
...
The server is run under ru_RU.CP1251 locale, initdb was done under the
same locale.
As you probably guessed, reg_user table contains actual email
addresses of our site's registered users. I can provide them for
testing (without passwords and stuff, of course) if this is going to
help.
--
Yours, Alexey V. Borzov, Webmaster of RDW.ru
From: | Jean-Luc Lachance <jllachan(at)nsd(dot)ca> |
---|---|
To: | Alexey Borzov <borz_off(at)rdw(dot)ru> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: sorting/grouping/(non-)unique indexes bug |
Date: | 2002-06-07 16:26:55 |
Message-ID: | 3D00DECF.7FDB34EE@nsd.ca |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Unbelievable, but you may have a couple of record with null user_email.
Try:
select * from broken where user_email is null;
select * from reg_user where user_email is null;
Alexey Borzov wrote:
>
> Greetings!
>
> Before I present the following, I must say that 'simple' index
> corruption is highly improbable; the server did not suffer hard
> reboots.
>
> rdw=# \d reg_user
> Table "reg_user"
> Column | Type | Modifiers
> ------------------+------------------------+------------------------------------------------
> user_id | integer | not null default nextval('reg_user_seq'::text)
> user_email | character varying(50) | not null
> user_passwd | character varying(32) | not null
> user_active | boolean | not null default 't'
> user_allow_pauth | boolean | default 'f'
> user_full_name | character varying(100) |
> user_pseudonym | character varying(100) |
> user_who | character(1) |
> Primary key: reg_user_pkey
> Unique keys: reg_user_email_key
>
> rdw=# \d reg_user_email_key
> Index "reg_user_email_key"
> Column | Type
> ------------+-----------------------
> user_email | character varying(50)
> unique btree
>
> rdw=# create table broken as select user_email from reg_user;
> SELECT
>
> rdw=# -- one may think that emails are unique...
> rdw=# select count(user_email) from broken;
> count
> --------
> 212400
> (1 row)
>
> rdw=# select count(distinct user_email) from broken;
> count
> --------
> 212397
> (1 row)
>
> rdw=# -- look closely at the numbers
> rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
> user_email | count
> ----------------------------------------------------+-------
> [duplicated email] | 2
> [some email] | 1
> [some email] | 1
> [some email] | 1
>
> rdw=# delete from broken where user_email = '[duplicated email]';
> DELETE 2
>
> rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
> user_email | count
> ----------------------------------------------------+-------
> [another email] | 2
> [some email] | 1
> [some email] | 1
>
> rdw=# delete from broken where user_email = '[another email]';
> DELETE 2
>
> rdw=# select user_email, count(user_email) from broken group by user_email order by 2 desc;
> user_email | count
> ----------------------------------------------------+-------
> [some email] | 1
> [some email] | 1
> [some email] | 1
>
> rdw=# --finally, no more duplicates!
> rdw=# create unique index broken_email_key on broken (user_email);
> ERROR: Cannot create unique index. Table contains non-unique values
>
> rdw=# select version();
> version
> -------------------------------------------------------------
> PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
> (1 row)
>
> $psql -l
> List of databases
> Name | Owner | Encoding
> -----------+----------+-----------
> ...
> rdw | alex | WIN
> ...
>
> The server is run under ru_RU.CP1251 locale, initdb was done under the
> same locale.
>
> As you probably guessed, reg_user table contains actual email
> addresses of our site's registered users. I can provide them for
> testing (without passwords and stuff, of course) if this is going to
> help.
>
> --
> Yours, Alexey V. Borzov, Webmaster of RDW.ru
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
From: | "Alexey V(dot) Borzov" <borz_off(at)rdw(dot)ru> |
---|---|
To: | Jean-Luc Lachance <jllachan(at)nsd(dot)ca> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: sorting/grouping/(non-)unique indexes bug |
Date: | 2002-06-07 18:09:13 |
Message-ID: | 1852175648.20020607220913@rdw.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Hello Jean-Luc,
Friday, June 07, 2002, 8:26:55 PM, you wrote:
JLL> Unbelievable, but you may have a couple of record with null user_email.
JLL> Try:
JLL> select * from broken where user_email is null;
JLL> select * from reg_user where user_email is null;
Nope, there are none. Which is expected, 'cause user_email was
declared not null:
rdw=# select * from reg_user where user_email is null;
user_id | user_email | user_passwd | user_active | user_allow_pauth | user_full_name | user_pseudonym | user_who
---------+------------+-------------+-------------+------------------+----------------+----------------+----------
(0 rows)
rdw=# select * from broken where user_email is null;
user_email
------------
(0 rows)
And the locale is probably OK, too. I tried dumping the table in
question and passing it through sort/uniq. It worked OK, unlike GROUP BY in
Postgres.
JLL> Alexey Borzov wrote:
>>
>> Greetings!
>>
>> Before I present the following, I must say that 'simple' index
>> corruption is highly improbable; the server did not suffer hard
>> reboots.
>>
[skip]
--
Yours, Alexey V. Borzov, webmaster of RDW.ru
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alexey Borzov <borz_off(at)rdw(dot)ru> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: sorting/grouping/(non-)unique indexes bug |
Date: | 2002-06-09 01:43:11 |
Message-ID: | 101.1023586991@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-general |
Alexey Borzov <borz_off(at)rdw(dot)ru> writes:
> rdw=# --finally, no more duplicates!
> rdw=# create unique index broken_email_key on broken (user_email);
> ERROR: Cannot create unique index. Table contains non-unique values
Seems pretty broken :-(
> rdw=# select version();
> version
> -------------------------------------------------------------
> PostgreSQL 7.2.1 on i686-pc-linux-gnu, compiled by GCC 2.96
Uh, what is the platform exactly?
> As you probably guessed, reg_user table contains actual email
> addresses of our site's registered users. I can provide them for
> testing (without passwords and stuff, of course) if this is going to
> help.
If you could send me the email address data (off-list), I could try
to reproduce the problem here.
regards, tom lane