Re: unique index violation after pg_upgrade to PG10

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unique index violation after pg_upgrade to PG10
Date: 2017-10-25 05:39:26
Message-ID: CAH2-Wz=0s6zxoj4JEL+2dvXo4QyE4gif3OHYVQ-TBPhyR0BMCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 24, 2017 at 10:20 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> I think you must have compared these:

Yes, I did. My mistake.

> On Tue, Oct 24, 2017 at 03:11:44PM -0500, Justin Pryzby wrote:
>> ts=# SELECT * FROM bt_page_items(get_raw_page('sites_idx', 1));
>>
>> itemoffset | 48
>> ctid | (1,37)
>> itemlen | 32
>> nulls | f
>> vars | t
>> data | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 0b 31 31 31 31 00 00 00 00 00 00
> ...
>> itemoffset | 37
>> ctid | (0,97)
>> itemlen | 24
>> nulls | f
>> vars | t
>> data | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 03 00 00
>
> ..but note those are both items in sites_idx (48 and 37, which I seem to have
> pasted out of order).. I included both because I'm not confident I know what
> the "index tid=(1,37)" referred to, but I gather it means item at offset=37
> (and not item with ctid=(1,37).)
>
> | [pryzbyj(at)database amcheck]$ psql --port 5678 ts -c "SELECT bt_index_check('sites_idx'::regclass::oid, heapallindexed=>True)"
> | ERROR: high key invariant violated for index "sites_idx"
> | DETAIL: Index tid=(1,37) points to heap tid=(0,97) page lsn=0/0.

This means that the item at (1,37) in the index is not <= the high
key, which is located at (1,1). (The high key comes first, despite
being an upper bound rather than a lower bound, per pageinspect docs.)

I find it suspicious that the page lsn is 0 here, btw.

> ts=# SELECT * FROM page_header(get_raw_page('sites_idx', 1));
> lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> -----+----------+-------+-------+-------+---------+----------+---------+-----------
> 0/0 | 0 | 0 | 872 | 1696 | 8176 | 8192 | 4 | 0
>
> Here is its heap page:
>
> ts=# SELECT * FROM heap_page_items(get_raw_page('sites', 0)) WHERE lp=97;
> lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
> ----+--------+----------+--------+--------+----------+----------+--------+-------------+------------+--------+------------------+-------+--------------------------------------------
> 97 | 968 | 1 | 52 | 21269 | 33567444 | 0 | (3,27) | 8204 | 2307 | 32 | 1110100000010000 | | \x700000001b4352434c4d542d43454d5330030303
>
> Which I see ends with 0303 vs 0000..

Looks like I was accidentally right, then -- the heap and index do differ.

You might try this tool I published recently, to get a better sense of
details like this:

https://github.com/petergeoghegan/pg_hexedit

(Don't do so with a data directory that you cannot afford to corrupt
again, though!)

> Maybe this is relevant ?
> ts=# SELECT * FROM heap_page_items(get_raw_page('sites', 3)) WHERE lp=27;
> lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
> ----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+--------
> 27 | 0 | 0 | 0 | | | | | | | | | |

This looks like an LP_DEAD item.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-10-25 05:53:49 Re: Current int & float overflow checking is slow.
Previous Message Robert Haas 2017-10-25 05:33:46 Re: Current int & float overflow checking is slow.