Re: Still a bug in the VACUUM ??? !!!

Lists: pgsql-adminpgsql-hackers
From: Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de>
To: pgsql-admin(at)postgresql(dot)org
Subject: Still a bug in the VACUUM ??? !!!
Date: 2003-02-14 14:52:32
Message-ID: 200302141552.32907.a.schmitz@cityweb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers


Hello *,

I have a problem with the "vacuum full". every time I run the vacuum command I
loose data from the parent tables. maybe also from the subtables (haven't
checked yet). I tried it a few times up to now an I can reproduce the
phenomena.

I am running postgresql 7.3.2 on solaris 8 (E450 4x 400 sun4u 1.5 GB)

regards

-Andreas

--
Andreas Schmitz - Phone +49 201 8501 318
Cityweb-Technik-Service-Gesellschaft mbH
Friedrichstr. 12 - Fax +49 201 8501 104
45128 Essen - email a(dot)schmitz(at)cityweb(dot)de


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: a(dot)schmitz(at)cityweb(dot)de
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-14 16:55:01
Message-ID: 7967.1045241701@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de> writes:
> I have a problem with the "vacuum full". every time I run the vacuum command I
> loose data from the parent tables. maybe also from the subtables (haven't
> checked yet). I tried it a few times up to now an I can reproduce the
> phenomena.

That sounds ugly ... but are you sure you don't have a hardware problem?
I don't think anyone's ever reported such behavior before.

If it is a Postgres bug, we can't do much to help you without a lot more
detail.

regards, tom lane


From: Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-15 11:51:09
Message-ID: 200302151251.09162.a.schmitz@cityweb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

On Friday 14 February 2003 17:55, Tom Lane wrote:
> Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de> writes:
> > I have a problem with the "vacuum full". every time I run the vacuum
> > command I loose data from the parent tables. maybe also from the
> > subtables (haven't checked yet). I tried it a few times up to now an I
> > can reproduce the phenomena.
>
> That sounds ugly ... but are you sure you don't have a hardware problem?
> I don't think anyone's ever reported such behavior before.
>
> If it is a Postgres bug, we can't do much to help you without a lot more
> detail.
>
> regards, tom lane

hi,

it does sound ugly. I checked the hardware. I can't see any problems with it.
I know, somestimes you need a lot of luck to see a CPU problem under solaris.
But I think the hardware is ok. however, what kind of details do you need to
qualify if it's a postgres problem or not ?

regards

-andreas schmitz


From: daniel alvarez <d-alvarez(at)gmx(dot)de>
To: a(dot)schmitz(at)cityweb(dot)de
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-15 13:14:42
Message-ID: 26148.1045314882@www38.gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers


> > > I have a problem with the "vacuum full". every time I run the vacuum
> > > command I loose data from the parent tables. maybe also from the
> > > subtables (haven't checked yet). I tried it a few times up to now an I
> > > can reproduce the phenomena.
> >
> > That sounds ugly ... but are you sure you don't have a hardware problem?
> > I don't think anyone's ever reported such behavior before.
> >
> > If it is a Postgres bug, we can't do much to help you without a lot more
> > detail.
> >
> > regards, tom lane
>

I have a similiar problem with VACUUM FULL ANALYZE.

I do not loose any data, but get hundreds of uninitialized pages and
a segmentation fault. Processing is very slow (twenty minutes).

The only thing unusual about my configuration is that system indices
are bloated. I expect the hardwhere to be ok, but I can not verify it
because the sever is hosted elsewhere.

The last lines of output were:

NOTICE: Rel pg_class: Uninitialized page 3344 - fixing

NOTICE: Rel pg_class: Uninitialized page 3345 - fixing

NOTICE: Rel pg_class: Uninitialized page 3346 - fixing

NOTICE: Rel pg_class: Uninitialized page 3347 - fixing

NOTICE: Rel pg_class: Uninitialized page 3348 - fixing

NOTICE: Rel pg_class: Uninitialized page 3349 - fixing

NOTICE: Rel pg_class: Uninitialized page 3350 - fixing

NOTICE: Rel pg_class: Uninitialized page 3351 - fixing

NOTICE: Rel pg_class: Uninitialized page 3352 - fixing

NOTICE: Rel pg_class: Uninitialized page 3353 - fixing

NOTICE: Rel pg_class: Uninitialized page 3354 - fixing

NOTICE: Rel pg_class: Uninitialized page 3355 - fixing

NOTICE: Rel pg_class: Uninitialized page 3356 - fixing

batch/nachts.sh: line 3: 30855 Segmentation fault /usr/bin/php -q
/usr/local/httpd/htdocs/kunden/web41/html/wcopy/batch/vacuum.php

Running VACUUM FULL ANALYZE another time there were no errors.

> hi,
>
> it does sound ugly. I checked the hardware. I can't see any problems with
> i
> t.
> I know, somestimes you need a lot of luck to see a CPU problem under
> solari
> s.
> But I think the hardware is ok. however, what kind of details do you need
> t
> o
> qualify if it's a postgres problem or not ?
>
> regards
>
> -andreas schmitz

--
+++ GMX - Mail, Messaging & more http://www.gmx.net +++
Bitte lächeln! Fotogalerie online mit GMX ohne eigene Homepage!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: a(dot)schmitz(at)cityweb(dot)de
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-15 16:03:58
Message-ID: 16338.1045325038@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de> writes:
> however, what kind of details do you need to
> qualify if it's a postgres problem or not ?

Ultimately, we need a way to reproduce the problem for debugging.
If it is a Postgres bug, it should be possible to reproduce it.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: daniel alvarez <d-alvarez(at)gmx(dot)de>
Cc: a(dot)schmitz(at)cityweb(dot)de, pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-15 16:11:20
Message-ID: 16387.1045325480@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

daniel alvarez <d-alvarez(at)gmx(dot)de> writes:
> NOTICE: Rel pg_class: Uninitialized page 3344 - fixing
> NOTICE: Rel pg_class: Uninitialized page 3345 - fixing
> NOTICE: Rel pg_class: Uninitialized page 3346 - fixing
> NOTICE: Rel pg_class: Uninitialized page 3347 - fixing
> NOTICE: Rel pg_class: Uninitialized page 3348 - fixing
> NOTICE: Rel pg_class: Uninitialized page 3349 - fixing
> [etc]

This is a known and not very serious problem --- see
http://archives.postgresql.org/pgsql-hackers/2002-11/msg00486.php

> batch/nachts.sh: line 3: 30855 Segmentation fault /usr/bin/php -q
> /usr/local/httpd/htdocs/kunden/web41/html/wcopy/batch/vacuum.php

As best I can tell, that's your own client code crashing, not Postgres.

regards, tom lane


From: Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de>
To: daniel alvarez <d-alvarez(at)gmx(dot)de>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-17 12:23:36
Message-ID: 200302171323.36606.a.schmitz@cityweb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers


> The last lines of output were:
>
>
> NOTICE: Rel pg_class: Uninitialized page 3344 - fixing
>
[...]
> NOTICE: Rel pg_class: Uninitialized page 3355 - fixing
>
> NOTICE: Rel pg_class: Uninitialized page 3356 - fixing
>
> batch/nachts.sh: line 3: 30855 Segmentation fault /usr/bin/php -q
> /usr/local/httpd/htdocs/kunden/web41/html/wcopy/batch/vacuum.php
>
>
> Running VACUUM FULL ANALYZE another time there were no errors.

Hi,

I think it is not the same. When I ran the vaccum when no other clients where
connected to the database. I did the "update dpa_text set
titleidx=txt2txtidx(volltext);" When I ran "vacuumdb --full --verbose
--analyze -d newsdb2 >>/tmp/vacuum.log 2>&1" Maybe the log will provide some
information. I noticed a few messages like this in the database log while
running the vacuum:

Feb 17 11:19:36 postgres2 postgres[1803]: [ID 553393 local0.info] [5] LOG:
pq_flush: send() failed: Broken pipe

regards

-andreas

--
Andreas Schmitz - Phone +49 201 8501 318
Cityweb-Technik-Service-Gesellschaft mbH
Friedrichstr. 12 - Fax +49 201 8501 104
45128 Essen - email a(dot)schmitz(at)cityweb(dot)de

Attachment Content-Type Size
vacuum.log text/x-log 56.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: a(dot)schmitz(at)cityweb(dot)de
Cc: daniel alvarez <d-alvarez(at)gmx(dot)de>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-17 18:56:20
Message-ID: 9919.1045508180@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de> writes:
> I think it is not the same. When I ran the vaccum when no other clients whe=
> re=20
> connected to the database.

The vacuum that reports the NOTICEs is not the one that created the
problem. The scenario I was talking about requires concurrent clients
during the preceding vacuum.

regards, tom lane


From: Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: daniel alvarez <d-alvarez(at)gmx(dot)de>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-17 20:42:11
Message-ID: 200302172142.11697.a.schmitz@cityweb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

On Monday 17 February 2003 19:56, Tom Lane wrote:
> Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de> writes:
> > I think it is not the same. When I ran the vaccum when no other clients
> > whe= re=20
> > connected to the database.
>
> The vacuum that reports the NOTICEs is not the one that created the
> problem. The scenario I was talking about requires concurrent clients
> during the preceding vacuum.
>
> regards, tom lane

Hi,

ok. I got that one. I was able to reproduce it. but it still doesn't solve the
problem. fact is that I loose data and that is a big problem.

regards

-andreas


From: Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-28 09:16:29
Message-ID: 200302281016.29799.a.schmitz@cityweb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers


> Ultimately, we need a way to reproduce the problem for debugging.
> If it is a Postgres bug, it should be possible to reproduce it.
>
> regards, tom lane

Hello,

the vacuum seems to be ok. I discovered that the "cluster" kills the data for
some reason. I could reproduce it on to sun (SS20 sol9, e450 sol8, redhat 7.3
-> postgresql-7.3.2). I took the same dump to all of these machines und ran
the cluster command.

tables: multicom(pk)->multithumb(fk) (see attached schema)

I checked before the cluster run:

select id from multithumb where id not in (select id from multicom) order by
id desc;

and it returned 0 row(s).

after this I ran "cluster multicom_txt_t_idx on multicom;" and repeated the
statement above. 33 rows of data from the multicom table where gon and 32
references in multithumb were still there. the same 33 rows (id and data)
were killed on all three databases. I tried it a few times and I always got
the output. same id's same data.

I picked the id's and had a look at the data.

select * from multicom where id=18379;
id | zulieferer | prioritaet | rubrik | kategorie | datum
| kennung | dachzeile | ueberschrift | unterzeile |
autor | zwischentitel | vorspann | kurztext | gueltig_bis |
region | dateiname | volltext
| titleidx
-------+------------+------------+------------+-----------+------------------------+------------------------------+-----------+--------------+------------+-------+---------------+----------+----------+------------------------+----------+-------------------------------------------+--------------------------------+----------
18379 | otz | 10 | Wirtschaft | BIL | 2003-02-26
19:47:00+01 | onbildotzBILWirNational37676 | | |
| | | | | 2003-03-02 19:47:00+01 |
National | /var/data/multikom/otz/m/WIRT/bulaptop.as | <p
class="contentfliess"> </p> |
(1 row)

No txtidx was build on that datarows 'cause no long text was inserted. I
filled the empty fields with the "<NODATA>" string. But it didn't help.

can someone give me some advise on this.

thanks and regards

-andreas

--
Andreas Schmitz - Phone +49 201 8501 318
Cityweb-Technik-Service-Gesellschaft mbH
Friedrichstr. 12 - Fax +49 201 8501 104
45128 Essen - email a(dot)schmitz(at)cityweb(dot)de


From: Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!! (SCHEMA)
Date: 2003-02-28 09:17:40
Message-ID: 200302281017.40071.a.schmitz@cityweb.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers


ops ... forgot to attach the schema

--
Andreas Schmitz - Phone +49 201 8501 318
Cityweb-Technik-Service-Gesellschaft mbH
Friedrichstr. 12 - Fax +49 201 8501 104
45128 Essen - email a(dot)schmitz(at)cityweb(dot)de

Attachment Content-Type Size
tables.sql text/plain 2.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: a(dot)schmitz(at)cityweb(dot)de
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)stack(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Still a bug in the VACUUM ??? !!!
Date: 2003-02-28 20:40:54
Message-ID: 6523.1046464854@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

Andreas Schmitz <a(dot)schmitz(at)cityweb(dot)de> writes:
> the vacuum seems to be ok. I discovered that the "cluster" kills the
> data for some reason.

Hmm. I'm not sure whether clustering on a GIST index is really useful
(seeing that there's no linear ordering associated with GIST). But it
shouldn't lose data.

I was able to reproduce a problem as follows: run the tsearch regression
test, then do "cluster wowidx on test_txtidx". This appears to lose
one row:

regression=# select count(*) from test_txtidx;
count
-------
502
(1 row)

regression=# cluster wowidx on test_txtidx;
CLUSTER
regression=# select count(*) from test_txtidx;
count
-------
501
(1 row)

Oleg, Teodor, can you investigate this?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: a(dot)schmitz(at)cityweb(dot)de
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)stack(dot)net>, pgsql-hackers(at)postgreSQL(dot)org
Subject: CLUSTER loses nulls (was Re: [ADMIN] Still a bug in the VACUUM)
Date: 2003-02-28 23:54:13
Message-ID: 8187.1046476453@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-hackers

I said:
> I was able to reproduce a problem as follows: run the tsearch regression
> test, then do "cluster wowidx on test_txtidx". This appears to lose
> one row:

Ahh ... it took me way too long to realize what was happening. The
problem is simply that GiST indexes do not index nulls (at least not in
the first column of an index). So if you CLUSTER, you lose any rows
that contain NULLs in the indexed column --- they're not in the index,
so they're not seen by the indexscan that copies the data over to the
new table.

Having CLUSTER lose data is obviously not acceptable :-(. I can see two
possible solutions:

* Make CLUSTER error out if the target index is not of an 'amindexnulls'
index AM. This would amount to restricting CLUSTER to b-trees, which is
annoying.

* If the index is not amindexnulls and the first target column is not
marked attnotnull, make an extra seqscan pass over the source table to
look for rows containing nulls. Copy these rows separately. This would
work but adds a good deal of overhead.

Approach #2 is even worse for functional indexes --- attnotnull is not
helpful. We'd have to actually evaluate the function at every single
row to see if it yields NULL there. Yech.

It occurs to me also that the same kind of pitfall exists for partial
indexes: cluster on a partial index, you lose. However, I don't have
a problem with simply refusing to cluster on partial indexes.

Comments? Anyone want to do the legwork to fix this?

regards, tom lane