Re: unique index violation after pg_upgrade to PG10

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Kenneth Marshall <ktm(at)rice(dot)edu>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: unique index violation after pg_upgrade to PG10
Date: 2017-10-25 05:20:39
Message-ID: 20171025052039.GY21735@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 24, 2017 at 02:57:47PM -0700, Peter Geoghegan wrote:
> On Tue, Oct 24, 2017 at 1:11 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > ..which I gather just verifies that the index is corrupt, not sure if there's
> > anything else to do with it? Note, we've already removed the duplicate rows.
>
> Yes, the index itself is definitely corrupt -- this failed before the
> new "heapallindexed" check even started. Though it looks like that
> would have failed too, if you got that far, since the index points to
> a row that does not contain the same data. (I only know this because
> you dumped the heap tuple and the index tuple.)

I think you must have compared these:

On Tue, Oct 24, 2017 at 03:11:44PM -0500, Justin Pryzby wrote:
> ts=# SELECT * FROM bt_page_items(get_raw_page('sites_idx', 1));
>
> itemoffset | 48
> ctid | (1,37)
> itemlen | 32
> nulls | f
> vars | t
> data | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 0b 31 31 31 31 00 00 00 00 00 00
...
> itemoffset | 37
> ctid | (0,97)
> itemlen | 24
> nulls | f
> vars | t
> data | 1b 43 52 43 4c 4d 54 2d 43 45 4d 53 30 03 00 00

..but note those are both items in sites_idx (48 and 37, which I seem to have
pasted out of order).. I included both because I'm not confident I know what
the "index tid=(1,37)" referred to, but I gather it means item at offset=37
(and not item with ctid=(1,37).)

| [pryzbyj(at)database amcheck]$ psql --port 5678 ts -c "SELECT bt_index_check('sites_idx'::regclass::oid, heapallindexed=>True)"
| ERROR: high key invariant violated for index "sites_idx"
| DETAIL: Index tid=(1,37) points to heap tid=(0,97) page lsn=0/0.

ts=# SELECT * FROM page_header(get_raw_page('sites_idx', 1));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----+----------+-------+-------+-------+---------+----------+---------+-----------
0/0 | 0 | 0 | 872 | 1696 | 8176 | 8192 | 4 | 0

Here is its heap page:

ts=# SELECT * FROM heap_page_items(get_raw_page('sites', 0)) WHERE lp=97;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+----------+----------+--------+-------------+------------+--------+------------------+-------+--------------------------------------------
97 | 968 | 1 | 52 | 21269 | 33567444 | 0 | (3,27) | 8204 | 2307 | 32 | 1110100000010000 | | \x700000001b4352434c4d542d43454d5330030303

Which I see ends with 0303 vs 0000..

t_infomask=2307=2048+256+3 =>
#define HEAP_HASNULL 0x0001 /* has null attribute(s) */
#define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */

t_infomask2=8204 => 8192+12 =>
#define HEAP_KEYS_UPDATED 0x2000 /* tuple was updated and key cols modified, or tuple deleted */

Maybe this is relevant ?
ts=# SELECT * FROM heap_page_items(get_raw_page('sites', 3)) WHERE lp=27;
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+--------
27 | 0 | 0 | 0 | | | | | | | | | |

Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-10-25 05:33:46 Re: Current int & float overflow checking is slow.
Previous Message Michael Paquier 2017-10-25 04:36:13 Re: Remove secondary checkpoint