Column missing from pg_statistics

From: Kadri Raudsepp <kadri(dot)raudsepp(at)nordicgaming(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Column missing from pg_statistics
Date: 2014-01-10 14:31:48
Message-ID: CAJ_Sy+yzsbj6QNCQMyuh+PsnnHYUMJ+1_9La4=U9XXryeZsmbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Hello,

I'm DBA for several quite large postgres databases on debian servers.
Current version:
PostgreSQL 9.2.6 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
4.7.2-5) 4.7.2, 64-bit
but the problem I'm about to describe was discovered before upgrade, while
running v9.2.4.

I have set up a cron-job that sends me daily reports on bloat amount in
tables and indices, which I calculate using pg_stats, not pgstattuple, for
performance and I/O reasons. If the bloat amount or percentage are big
enough, I use pg_repack to get rid of it. At some point I noticed, that
some tables keep showing up in the reports with the same amount of bloat,
which pg_repack was seemingly unable to remove. Investigation showed that
pgstattuple gave very different results than my bloat-finding query.
Reason - for some tables there are some columns that never show up in
pg_statistics.

I'm going to give an example of a table that has this issue. I have
vacuumed it, analyzed it, used "alter column set statistic" specifically on
the missing column - no results. I hope someone has encountered this issue
before or at least could give me some new ideas, why this is or how to fix
it.

Here's the example:

# \d+ d3s_log
Table
"public.d3s_log"
Column | Type |
Modifiers | Storage | Stats target | Description
---------------+-----------------------------+------------------------------------------------------+----------+--------------+-------------
id | bigint | not null default
nextval('d3s_log_id_seq'::regclass) | plain | |
users_id | integer | not
null | plain |
|
bet_id | integer | not
null | plain |
|
bet_state | integer
| | plain
| |
d3s_trans_id | character varying(255) | not
null | extended |
|
operation | d3s_operation_enum | not
null | plain |
|
created | timestamp without time zone | not null default
now() | plain | |
success | boolean | not
null | plain |
|
response_time | timestamp without time zone
| | plain
| |
path | character varying(255) | default NULL::character
varying | extended | |
params | text
| | extended
| |
request | text
| | extended |
1000 |
response | text
| | extended
| |
Indexes:
"d3s_log_pkey" PRIMARY KEY, btree (id)
"d3s_log_bet_id_operation_success" btree (bet_id, operation, success)
"d3s_log_bets_id_index" btree (bet_id)
"d3s_log_created" btree (created)
"d3s_log_users_id_index" btree (users_id)
Has OIDs: no

# SELECT s.staattnum, a.attname FROM pg_statistic s INNER JOIN pg_attribute
a ON (a.attrelid,a.attnum)=(s.starelid,s.staattnum) WHERE starelid =
'd3s_log'::regclass;
staattnum | attname
-----------+---------------
1 | id
2 | users_id
3 | bet_id
4 | bet_state
5 | d3s_trans_id
6 | operation
7 | created
8 | success
9 | response_time
10 | path
11 | params
13 | response
(12 rows)

## As can be seen, column 12, "request", is missing.

# select tablename, attname, null_frac, avg_width from pg_stats where
tablename = 'd3s_log';
tablename | attname | null_frac | avg_width
-----------+---------------+------------+-----------
d3s_log | id | 0 | 8
d3s_log | users_id | 0 | 4
d3s_log | bet_id | 0 | 4
d3s_log | bet_state | 1 | 4
d3s_log | d3s_trans_id | 0 | 23
d3s_log | operation | 0 | 4
d3s_log | created | 0 | 8
d3s_log | success | 0 | 1
d3s_log | response_time | 0 | 8
d3s_log | path | 0.00303333 | 48
d3s_log | params | 0 | 180
d3s_log | response | 0.00123333 | 456
(12 rows)

## Again, "request" is missing, but there were small changes in avg_width
for other columns after analyze, so some statistics calculation seems to be
working.

The difference between pgstattuple and calculated bloat is 6.5 GB out of 15
GB total table size, which is approximately the amount of data stored in
the extended "request" column.
I also did pg_dump of this table and restored it on another server, running
v9.2.4 - no errors, analyze calculates statistics, but "request" is still
missing there, too.
I have also checked other tables with the same problem - the common thing
seems to be that all the missing columns are extended, type text or
varchar, but I'm not 100% sure, haven't checked them all yet. The size of
the missing column is not a common thing - on some tables the missing
column is almost empty.

Looking forward to your answers and ideas,
Katie

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2014-01-10 16:15:14 Re: [ADMIN] Column missing from pg_statistics
Previous Message Renato Forti 2014-01-09 16:30:38 RES: RES: Remote Backup (pg_dump_all) Windows/Linux.

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-01-10 14:49:35 Re: [PATCH] Negative Transition Aggregate Functions (WIP)
Previous Message Merlin Moncure 2014-01-10 14:10:00 Re: array_length(anyarray)