implicit abort harmful?

From: "Wayne Armstrong" <wdarmst(at)bacchus(dot)com(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: implicit abort harmful?
Date: 2003-05-25 00:04:30
Message-ID: 200305250004.h4P04XSY019198@mail.bacchus.com.au
Lists: pgsql-general

Hi,
I have been using postgresql for about 3 months now (after about 15 years of
using various other dbms).

I love it :) but (and you just knew that "but" was coming :) there is one thing
that really causes me grief.

It is the tendency for Postgres to roll back automatically on some errors.

What this leads to is the need for an application to commit much more
frequently than is really desirable.

Example1.

During an import of 120 thousand records from an ISAM file system, say 3 records
fail integrity checks (files in non-database systems tend not to have
referential integrity implemented on them except at an application level,
meaning not at all :). The desired result is to drop the records failing the
integrity checks.
Importing into DB2 or Oracle, say, I have the option to ignore the referential
integrity errors (I get the error code on the insert anyway) and continue with
the insert of the remaining 120 thousand - 3 records.
In Postgres, I either have to commit after every record write, or guarantee the
"cleanness" of the data before I begin the import - which is sometimes
difficult given the data sources I may be importing from (often I neither own
them nor can modify them).
Worse, if the intention behind the automatic rollback is to guarantee data
purity (from the PostgreSQL manual, section 10.4.1:
" Turn off autocommit and just do one commit at the end. (In plain SQL, this
means issuing BEGIN at the start and COMMIT at the end. Some client libraries
may do this behind your back, in which case you need to make sure the library
does it when you want it done.) If you allow each insertion to be committed
separately, PostgreSQL is doing a lot of work for each record added. An
additional benefit of doing all insertions in one transaction is that if the
insertion of one record were to fail then the insertion of all records inserted
up to that point would be rolled back, so you won't be stuck with partially
loaded data." ),
it actually makes more of a mess than leaving the decision to roll back under
application control.
If in this example the three "bad" records are scattered throughout the
dataset, I could end up (if I ignore the error as I would do for most other
dbms) with a random number of records in the table, and so on.

Of course, to do robust imports of the insert, on-error-update style
available in most other dbms I have worked with, it is also necessary to
issue a commit before each insert/update attempt, as sketched below.
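
A minimal sketch of that per-record-commit workaround (the table "target" and
its rows are made up for illustration); every row needs its own transaction,
because a single failing statement aborts everything since the last commit:

BEGIN;
INSERT INTO target VALUES (1, 'ok');
COMMIT;                                -- must commit row by row...
BEGIN;
INSERT INTO target VALUES (2, 'bad');  -- suppose this row fails a constraint
ROLLBACK;                              -- ...because this transaction is now aborted
BEGIN;
INSERT INTO target VALUES (3, 'ok');   -- row 1 is still safely committed
COMMIT;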

Example2 - ODBC driver rollbacks :-

Here is a snippet from a postgres log :-
2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec

The rollback here is generated by the ODBC driver to clear an error created by
the ODBC driver itself, which incorrectly parsed the "select" keyword of a
subselect statement as a table name.
The application is totally unaware that an error has occurred and a rollback
has been issued.
This is very likely to lead to data loss unless the application using ODBC is
in autocommit mode or commits after every SQL statement issued.

This concerns me to the point where I would really recommend not using the
parse-statement option in the ODBC driver (even though that reduces the ODBC
driver's ability to mimic a prepare, which is also problematic :) unless you
are using autocommit.

For any application complex enough to be working with autocommit turned off in
the ODBC driver, the automatic rollback is a real problem.

Please consider making the automatic abort/rollback behaviour optional?

Regards,
Wayne Armstrong
Bacchus Management Systems
http://www.bacchus.com.au


From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: Wayne Armstrong <wdarmst(at)bacchus(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-25 01:32:21
Message-ID: Pine.LNX.4.21.0305250200020.4741-100000@ponder.fairway2k.co.uk
Lists: pgsql-general

On Sun, 25 May 2003, Wayne Armstrong wrote:

> Hi,
> I have been using postgresql for about 3 months now (after about 15 years of
> using various other dbms).
>
> I love it :) but - (and you just knew that but was coming :) there is one thing
> that really causes me grief.
> .
> It is the tendency for postgres to rollback automatically on some errors.
>
> ...
>
> It actually makes more of a mess than leaving the decision to rollback under
> application control.
> If in this example the three "bad" records are scattered throughout the
> dataset, I could end up (if I ignore the error as i would do for most other
> dbms), with a random number of records in the table. etc etc.

I'm confused on this. You're complaining that PostgreSQL lets you either insert
and commit a known number of records or roll back and insert none, and yet
here, in support of not applying that level of data-integrity checking, you
seem to be saying that skipping that level of checking is itself messy because
you end up with a random number of inserted tuples.

> Of course also, to do robust imports of the insert ifError update style
> avaiilable in most other dbms I have worked with, it is also nescessary to
> issue a commit before each insert/update attempt.

Ok. So you're faced with a client wanting to do a bulk import of data. 3 out
of a million records fail the import. Do you suggest the client determine the
business reasons why those records fail and are stopping the entire import, or
do you import the rest and then spend the next six months alternately trying to
a) get the client to determine the business reasons for the failure of those
records, because the lack of them isn't causing a problem yet, and b) explain
to the client exactly why what they are trying to do won't work because of the
lack of those business records?

Ok, so after a couple of months you do get used to it and it becomes water off
a duck's back, but it still gets seen as a failing on your part by the
client. BTW, I do believe in giving a client what they want, in case you're
wondering; it's just that sometimes you have to try and educate them in what
they _really_ want.

>
> Example2 - ODBC driver rollbacks :-
>
> Here is a snippet from a postgres log :-
> 2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
> 2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
> 2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
> 2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace n, pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
> 2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
> 2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec
>
> The rollback here is generated by the odbc driver to clear an error created by
> the odbc driver incorrectly parsing the select from a subselect statement as a
> table name.
> The application is totally unaware an error has occured and a rollback has been
> issued.
> This is most likely to lead to data loss, unless, basically, an application
> using odbc is in autocommit mode or commits after every sql statement issued.

Ok. I can see that would be a pain, but that is an ODBC issue. If the ODBC
layer is misparsing something and issuing a rollback, then that has nothing to
do with the server; you'll hit that whether or not a single error on the server
side forces a transaction abort.


> This concerns me to the piont where I would really recommend not using the
> parse statement option in the odbc driver (even though that reduces the odbc
> drivers ability to mimic a prepare which is also problematic :) unless you are
> using autocommit.
>
> For any application complex enough to be working with autocommit turned off in
> the odbc driver, the auto rollback is a real problem.
>
> Please consider making the automatic abort/rollback feature an option ?

I've got to disagree with this strongly. Like pain, errors are there to tell
you something is wrong, not to be ignored. Although maybe you're only on about
the ODBC thing. Even in that case I would suggest that the error is in ODBC not
flagging the offending query as being 'in error' when it, rather than the
backend, generated the error.

...Actually, rereading the log above I see what you mean. ODBC generates a
query which raises an error, thus aborting the current transaction, and then
ODBC issues a rollback because it noticed the error...however, my comment
still stands: ODBC should show that error to the client application. Indeed the
standard behaviour of PostgreSQL is correct in this situation, and ODBC trying
to be clever has broken it. Until the client app acknowledges that an error
occurred, by causing the rollback to be issued, all further queries should
raise additional errors, stopping the partial data commit you are seeing. If
ODBC were not trying to be clever and hide that fact from the application, then
any partial data commit would be the application's fault.

Of course, I could have completely misunderstood your post considering the hour
and everything.

--
Nigel J. Andrews


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Wayne Armstrong <wdarmst(at)bacchus(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-25 02:29:38
Message-ID: 20030525022938.GB17757@svana.org
Lists: pgsql-general

On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
> During import of 120 thousand records from an isam file system, 3 say records
> fail integrity checks ( files in non-database systems tend not to have
> referential integrity implemented on them except at an application level
> (meaning not at all :). The desired result is to drop the records failing
> integrity checks.

Yes, I have that problem too. I actually wrote a script that took an input
file and automatically reissued the queries that succeeded but got rolled back.
I called it quickloader.pl :)

Probably the biggest problem is that you can't use COPY to load the data.
I've thought about loading into another table and transferring later, but I
haven't sorted out the details.

> Example2 - ODBC driver rollbacks :-
>
> Here is a snippet from a postgres log :-
> 2003-05-24 23:09:14 [1525] LOG: duration: 0.134503 sec
> 2003-05-24 23:09:14 [1525] LOG: query: select nspname from pg_namespace n, p
> _class c where c.relnamespace=n.oid and c.oid='select'::regclass
> 2003-05-24 23:09:14 [1525] ERROR: Relation "select" does not exist
> 2003-05-24 23:09:14 [1525] LOG: statement: select nspname from pg_namespace
> , pg_class c where c.relnamespace=n.oid and c.oid='select'::regclass
> 2003-05-24 23:09:14 [1525] LOG: query: ROLLBACK
> 2003-05-24 23:09:14 [1525] LOG: duration: 0.000538 sec
>
> The rollback here is generated by the odbc driver to clear an error created by
> the odbc driver incorrectly parsing the select from a subselect statement as a
> table name.

Sorry, that's an ODBC problem. For internal stuff it should use queries that
can't fail like that. In this case it should use c.relname = 'select'; that
won't abort if the table doesn't exist.
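
For illustration, the safer form of the catalog query from the log above,
matching on c.relname instead of casting to regclass, so a missing relation
simply returns zero rows instead of raising an error:

select n.nspname
  from pg_namespace n, pg_class c
 where c.relnamespace = n.oid
   and c.relname = 'select';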

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
> - Samuel P. Huntington


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Wayne Armstrong <wdarmst(at)bacchus(dot)com(dot)au>, pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-30 00:25:56
Message-ID: 3ED6A514.6070102@Yahoo.com
Lists: pgsql-general

Martijn van Oosterhout wrote:
> On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
>
>> During import of 120 thousand records from an isam file system, 3 say records
>>fail integrity checks ( files in non-database systems tend not to have
>>referential integrity implemented on them except at an application level
>>(meaning not at all :). The desired result is to drop the records failing
>>integrity checks.
>
>
> Yes, I have that problem too. I actually wrote a script that took an input
> file and automatically reissued queries that succeeded but got rolled-back.
> I called it quickloader.pl :)
>
> Probably the biggest problem is that you can't use COPY to load the data.
> I've thought about loading into another table and transferring later but I
> havn't sorted out the details.

The general idea would be to set up a table that has exactly the same
structure as the final target table, but with no constraints at all. As
long as your data passes all input functions you can even COPY it in.

Now you run check queries that show you all tuples in that staging table
that would fail constraints on the final table. Fix those and you can do

INSERT INTO final SELECT * FROM staging;

If step one fails because of data that doesn't pass the input functions
of our data types, you have to go through another level of staging with
a table that has text fields only and move it by explicit casting after
cleaning up those problems.
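
A rough sketch of that staging workflow; the tables final(id, parent_id,
amount) and parent, and the file path, are made-up examples:

-- staging has the same columns as final, but no constraints
CREATE TABLE staging (id integer, parent_id integer, amount numeric);
COPY staging FROM '/tmp/import.dat';

-- check query: rows that would violate final's foreign key on parent_id
SELECT s.* FROM staging s LEFT JOIN parent p ON p.id = s.parent_id WHERE p.id IS NULL;

-- fix or drop the offenders, then move everything across in one statement
DELETE FROM staging WHERE NOT EXISTS
    (SELECT 1 FROM parent p WHERE p.id = staging.parent_id);
INSERT INTO final SELECT * FROM staging;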

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: "Wayne Armstrong" <wdarmst(at)bacchus(dot)com(dot)au>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-30 18:43:41
Message-ID: 200305301843.h4UIhf7v022234@mail.bacchus.com.au
Lists: pgsql-general

** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Thu, 29 May 2003
20:25:56 -0400
Hi Jan,
Yup, staging tables are probably going to be the answer in this case.
However, my point is, you shouldn't need workarounds for this (or for the more
general insert, on failure update, on failure ignore type of logic used in a
lot of import scenarios).
Taking the decision whether or not to roll back a transaction (SQL error or no)
away from the application, and enforcing it in the dbm, results in kludges,
workarounds, and a generally less robust dbm.

Regards,
Wayne

> Martijn van Oosterhout wrote:
> > On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
> >
> >> During import of 120 thousand records from an isam file system, 3 say records
> >>fail integrity checks ( files in non-database systems tend not to have
> >>referential integrity implemented on them except at an application level
> >>(meaning not at all :). The desired result is to drop the records failing
> >>integrity checks.
> >
> >
> > Yes, I have that problem too. I actually wrote a script that took an input
> > file and automatically reissued queries that succeeded but got rolled-back.
> > I called it quickloader.pl :)
> >
> > Probably the biggest problem is that you can't use COPY to load the data.
> > I've thought about loading into another table and transferring later but I
> > havn't sorted out the details.
>
> The general idea would be to setup a table that has exactly the same
> structure as the final target table, but with no constraints at all. As
> long as your data passes all input functions you can even COPY it in.
>
> Now you run check queries that show you all tuples in that staging table
> that would fail constraints on the final table. Fix those and you can do
>
> INSERT INTO final SELECT * FROM staging;
>
> If step one fails because of data that doesn't pass the input functions
> of our data types, you have to go through another level of staging with
> a table that has text fields only and move it by explicit casting after
> cleaning up those problems.
>
>
> Jan
>
> --
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me. #
> #================================================== JanWieck(at)Yahoo(dot)com #


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Wayne Armstrong <wdarmst(at)bacchus(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-30 19:06:01
Message-ID: 3ED7AB99.9070702@Yahoo.com
Lists: pgsql-general

Wayne Armstrong wrote:
> ** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Thu, 29 May 2003
> 20:25:56 -0400
> Hi Jan,
> Yup, staging tables are probably going to be the answer in this case.
> However, my point is, you shouldn't need workarounds for this. (or for the
> more general insert on failure update on failure ignore type of logic used in a
> lot of import scenarios)
> Taking the decision if or not to rollback a transaction (sql error or no) away
> from the application, and enforcing it in the dbm, results in kluges,
> workarounds, and a generally less robust dbm.

You obviously didn't search the mail archives too much, did you? If
INSERT ... SELECT ... fails halfway through due to a duplicate key
error - how do you get rid of the tuples inserted so far?

This problem is well known, has often been discussed, and is not yet
solvable because we do not have subtransactions. They are coming, but I
don't know if the proposed implementation will cope well with 120
million single INSERTs each running in its own subtransaction.
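
For what it's worth, a sketch of what per-row subtransactions could look like
at the SQL level, assuming a SAVEPOINT-style syntax that PostgreSQL did not yet
offer when this was written; the table and values are made up:

BEGIN;
SAVEPOINT row_1;
INSERT INTO final VALUES (1, 100, 9.95);
RELEASE SAVEPOINT row_1;                  -- keep the successful insert
SAVEPOINT row_2;
INSERT INTO final VALUES (2, 999, 1.50);  -- suppose this row fails a constraint
ROLLBACK TO SAVEPOINT row_2;              -- discard only the failed insert
COMMIT;                                   -- the rows that succeeded are kept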

And I disagree with your last statement. It doesn't result in a less
robust dbm, it results in more complex applications that (if written by
more primitive coders) will instruct a perfectly robust dbm to do the
wrong thing to the data - from a business point of view.

Jan

>
> Regards,
> Wayne
>
>> Martijn van Oosterhout wrote:
>> > On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
>> >
>> >> During import of 120 thousand records from an isam file system, 3 say records
>> >>fail integrity checks ( files in non-database systems tend not to have
>> >>referential integrity implemented on them except at an application level
>> >>(meaning not at all :). The desired result is to drop the records failing
>> >>integrity checks.
>> >
>> >
>> > Yes, I have that problem too. I actually wrote a script that took an input
>> > file and automatically reissued queries that succeeded but got rolled-back.
>> > I called it quickloader.pl :)
>> >
>> > Probably the biggest problem is that you can't use COPY to load the data.
>> > I've thought about loading into another table and transferring later but I
>> > havn't sorted out the details.
>>
>> The general idea would be to setup a table that has exactly the same
>> structure as the final target table, but with no constraints at all. As
>> long as your data passes all input functions you can even COPY it in.
>>
>> Now you run check queries that show you all tuples in that staging table
>> that would fail constraints on the final table. Fix those and you can do
>>
>> INSERT INTO final SELECT * FROM staging;
>>
>> If step one fails because of data that doesn't pass the input functions
>> of our data types, you have to go through another level of staging with
>> a table that has text fields only and move it by explicit casting after
>> cleaning up those problems.
>>
>>
>> Jan
>>
>> --
>> #======================================================================#
>> # It's easier to get forgiveness for being wrong than for being right. #
>> # Let's break this rule - forgive me. #
>> #================================================== JanWieck(at)Yahoo(dot)com #

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: "Wayne Armstrong" <wdarmst(at)bacchus(dot)com(dot)au>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-30 19:18:45
Message-ID: 200305301918.h4UJIj7v022391@mail.bacchus.com.au
Lists: pgsql-general

** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Fri, 30 May 2003
15:06:01 -0400
Hi,
Subtransactions here are likely as onerous as the solution that I bet gets
used most of the time in this scenario, which is to commit after every insert.
And it's not a matter of good or bad coding here. There are cases where the
response to an SQL or insert error should be a rollback. There are as many
cases where (given that the application is informed there was a problem) the
problem can safely be ignored. Again, the decision should be up to the
application, not the database manager. It is the dbms's prerogative to reject
the statement, not the transaction.

Regards,
Wayne

> Wayne Armstrong wrote:
> > ** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Thu, 29 May 2003
> > 20:25:56 -0400
> > Hi Jan,
> > Yup, staging tables are probably going to be the answer in this case.
> > However, my point is, you shouldn't need workarounds for this. (or for the
> > more general insert on failure update on failure ignore type of logic used in a
> > lot of import scenarios)
> > Taking the decision if or not to rollback a transaction (sql error or no) away
> > from the application, and enforcing it in the dbm, results in kluges,
> > workarounds, and a generally less robust dbm.
>
> You obviously didn't search the mail archives too much, did you? If
> INSERT ... SELECT ... fails half way through due to a duplicate key
> error - how to get rid of the so far inserted tuples?
>
> This problem is well known, has been often discussed and is yet not
> solveable because we do not have subtransactions. They are coming, but I
> don't know if the proposed implementation will cope well with 120
> million single INSERTs each running in it's own subtransaction.
>
> And I disagree with your last statement. It doesn't result in a less
> robust dbm, it results in more complex applications that (if written by
> more primitive coders) will instruct a perfectly robust dbm to do the
> wrong thing to the data - from a business point of view.
>
>
> Jan
>
> >
> > Regards,
> > Wayne
> >
> >> Martijn van Oosterhout wrote:
> >> > On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
> >> >
> >> >> During import of 120 thousand records from an isam file system, 3 say records
> >> >>fail integrity checks ( files in non-database systems tend not to have
> >> >>referential integrity implemented on them except at an application level
> >> >>(meaning not at all :). The desired result is to drop the records failing
> >> >>integrity checks.
> >> >
> >> >
> >> > Yes, I have that problem too. I actually wrote a script that took an input
> >> > file and automatically reissued queries that succeeded but got rolled-back.
> >> > I called it quickloader.pl :)
> >> >
> >> > Probably the biggest problem is that you can't use COPY to load the data.
> >> > I've thought about loading into another table and transferring later but I
> >> > havn't sorted out the details.
> >>
> >> The general idea would be to setup a table that has exactly the same
> >> structure as the final target table, but with no constraints at all. As
> >> long as your data passes all input functions you can even COPY it in.
> >>
> >> Now you run check queries that show you all tuples in that staging table
> >> that would fail constraints on the final table. Fix those and you can do
> >>
> >> INSERT INTO final SELECT * FROM staging;
> >>
> >> If step one fails because of data that doesn't pass the input functions
> >> of our data types, you have to go through another level of staging with
> >> a table that has text fields only and move it by explicit casting after
> >> cleaning up those problems.
> >>
> >>
> >> Jan
> >>
> >> --
> >> #======================================================================#
> >> # It's easier to get forgiveness for being wrong than for being right. #
> >> # Let's break this rule - forgive me. #
> >> #================================================== JanWieck(at)Yahoo(dot)com #
>
>
>
> --
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me. #
> #================================================== JanWieck(at)Yahoo(dot)com #


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Wayne Armstrong <wdarmst(at)bacchus(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-30 19:40:43
Message-ID: 3ED7B3BB.1050901@Yahoo.com
Lists: pgsql-general

Wayne,

yes, I totally agree. Will you implement UNDO for that, and if so, how do
you propose to get rid of the to-be-removed index tuples without the
risk of deadlocking during UNDO? Or is there some other possibility we
have overlooked so far?

By subtransactions I didn't mean application-controlled ones, but
automatic ones that allow the system to return to the state at the
beginning of the failing command. It's not just your little INSERT
problem that needs to be covered. If you want the system to continue
after an ERROR without a rollback, you need to undo every triggered action
done during this statement before the ERROR occurred: rule-generated
additional queries run before the failing one, triggers, whatnot.

So please, do you have a context diff, do you have a detailed
implementation proposal, or are you just waving hands and telling us what
you think the user-visible behaviour should be?

Jan

Wayne Armstrong wrote:
> ** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Fri, 30 May 2003
> 15:06:01 -0400
> Hi,
> Subtransactions here are likely as onerous as the solution that I bet gets
> used most of the time in this scenario, that is to commit after every insert.
> And, it's not a matter of good or bad coding here. There are cases where the
> response to an sql or insert error should be a rollback. There are as many
> cases where (given that the application is informed there was a problem), the
> problem can safely be ignored. Again, the decision should be upto the
> application not the database manager. It is the dbms perogative to reject the
> statement, not the transaction.
>
> Regards,
> Wayne
>
>> Wayne Armstrong wrote:
>> > ** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Thu, 29 May 2003
>> > 20:25:56 -0400
>> > Hi Jan,
>> > Yup, staging tables are probably going to be the answer in this case.
>> > However, my point is, you shouldn't need workarounds for this. (or for the
>> > more general insert on failure update on failure ignore type of logic used in a
>> > lot of import scenarios)
>> > Taking the decision if or not to rollback a transaction (sql error or no) away
>> > from the application, and enforcing it in the dbm, results in kluges,
>> > workarounds, and a generally less robust dbm.
>>
>> You obviously didn't search the mail archives too much, did you? If
>> INSERT ... SELECT ... fails half way through due to a duplicate key
>> error - how to get rid of the so far inserted tuples?
>>
>> This problem is well known, has been often discussed and is yet not
>> solveable because we do not have subtransactions. They are coming, but I
>> don't know if the proposed implementation will cope well with 120
>> million single INSERTs each running in it's own subtransaction.
>>
>> And I disagree with your last statement. It doesn't result in a less
>> robust dbm, it results in more complex applications that (if written by
>> more primitive coders) will instruct a perfectly robust dbm to do the
>> wrong thing to the data - from a business point of view.
>>
>>
>> Jan
>>
>> >
>> > Regards,
>> > Wayne
>> >
>> >> Martijn van Oosterhout wrote:
>> >> > On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
>> >> >
>> >> >> During import of 120 thousand records from an isam file system, 3 say records
>> >> >>fail integrity checks ( files in non-database systems tend not to have
>> >> >>referential integrity implemented on them except at an application level
>> >> >>(meaning not at all :). The desired result is to drop the records failing
>> >> >>integrity checks.
>> >> >
>> >> >
>> >> > Yes, I have that problem too. I actually wrote a script that took an input
>> >> > file and automatically reissued queries that succeeded but got rolled-back.
>> >> > I called it quickloader.pl :)
>> >> >
>> >> > Probably the biggest problem is that you can't use COPY to load the data.
>> >> > I've thought about loading into another table and transferring later but I
>> >> > havn't sorted out the details.
>> >>
>> >> The general idea would be to setup a table that has exactly the same
>> >> structure as the final target table, but with no constraints at all. As
>> >> long as your data passes all input functions you can even COPY it in.
>> >>
>> >> Now you run check queries that show you all tuples in that staging table
>> >> that would fail constraints on the final table. Fix those and you can do
>> >>
>> >> INSERT INTO final SELECT * FROM staging;
>> >>
>> >> If step one fails because of data that doesn't pass the input functions
>> >> of our data types, you have to go through another level of staging with
>> >> a table that has text fields only and move it by explicit casting after
>> >> cleaning up those problems.
>> >>
>> >>
>> >> Jan
>> >>
>> >> --
>> >> #======================================================================#
>> >> # It's easier to get forgiveness for being wrong than for being right. #
>> >> # Let's break this rule - forgive me. #
>> >> #================================================== JanWieck(at)Yahoo(dot)com #
>>
>>
>>
>> --
>> #======================================================================#
>> # It's easier to get forgiveness for being wrong than for being right. #
>> # Let's break this rule - forgive me. #
>> #================================================== JanWieck(at)Yahoo(dot)com #

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: "Wayne Armstrong" <wdarmst(at)bacchus(dot)com(dot)au>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: implicit abort harmful?
Date: 2003-05-30 19:54:54
Message-ID: 200305301954.h4UJst7v022537@mail.bacchus.com.au
Lists: pgsql-general

** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Fri, 30 May 2003
15:40:43 -0400
Hi Jan,
I'm grabbing source right now :)
That ought to be enough to shut me up for a couple months :)

Regards,
Wayne

> Wayne,
>
> yes, I totally agree. Will you implement UNDO for that and if so, how do
> you propose to get rid of the to be removed index tuples without the
> risk of deadlocking during UNDO? Or is there some other possibility we
> have overlooked so far?
>
> With subtransactions I didn't mean application controlled ones, but
> automatic ones that allow the system to return to the state at the
> beginning of the failing command. It's not just your little INSERT
> problem that needs to be covered. If you want the system to continue
> after an ERROR without rollback, you need to undo every triggered action
> done during this statement before the ERROR occured. Rule generated
> additional queries run before the failing one, triggers, whatnot.
>
> So please, do you have a context diff, do you have a detailed
> implementation proposal, or are you just waving hands telling us what
> you think the user visible behaviour should be?
>
>
> Jan
>
> Wayne Armstrong wrote:
> > ** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Fri, 30 May 2003
> > 15:06:01 -0400
> > Hi,
> > Subtransactions here are likely as onerous as the solution that I bet gets
> > used most of the time in this scenario, that is to commit after every insert.
> > And, it's not a matter of good or bad coding here. There are cases where the
> > response to an sql or insert error should be a rollback. There are as many
> > cases where (given that the application is informed there was a problem), the
> > problem can safely be ignored. Again, the decision should be upto the
> > application not the database manager. It is the dbms perogative to reject the
> > statement, not the transaction.
> >
> > Regards,
> > Wayne
> >
> >> Wayne Armstrong wrote:
> >> > ** Reply to message from Jan Wieck <JanWieck(at)Yahoo(dot)com> on Thu, 29 May 2003
> >> > 20:25:56 -0400
> >> > Hi Jan,
> >> > Yup, staging tables are probably going to be the answer in this case.
> >> > However, my point is, you shouldn't need workarounds for this. (or for the
> >> > more general insert on failure update on failure ignore type of logic used in a
> >> > lot of import scenarios)
> >> > Taking the decision if or not to rollback a transaction (sql error or no) away
> >> > from the application, and enforcing it in the dbm, results in kluges,
> >> > workarounds, and a generally less robust dbm.
> >>
> >> You obviously didn't search the mail archives too much, did you? If
> >> INSERT ... SELECT ... fails half way through due to a duplicate key
> >> error - how to get rid of the so far inserted tuples?
> >>
> >> This problem is well known, has been often discussed and is yet not
> >> solveable because we do not have subtransactions. They are coming, but I
> >> don't know if the proposed implementation will cope well with 120
> >> million single INSERTs each running in it's own subtransaction.
> >>
> >> And I disagree with your last statement. It doesn't result in a less
> >> robust dbm, it results in more complex applications that (if written by
> >> more primitive coders) will instruct a perfectly robust dbm to do the
> >> wrong thing to the data - from a business point of view.
> >>
> >>
> >> Jan
> >>
> >> >
> >> > Regards,
> >> > Wayne
> >> >
> >> >> Martijn van Oosterhout wrote:
> >> >> > On Sun, May 25, 2003 at 10:04:30AM +1000, Wayne Armstrong wrote:
> >> >> >
> >> >> >> During import of 120 thousand records from an isam file system, 3 say records
> >> >> >>fail integrity checks ( files in non-database systems tend not to have
> >> >> >>referential integrity implemented on them except at an application level
> >> >> >>(meaning not at all :). The desired result is to drop the records failing
> >> >> >>integrity checks.
> >> >> >
> >> >> >
> >> >> > Yes, I have that problem too. I actually wrote a script that took an input
> >> >> > file and automatically reissued queries that succeeded but got rolled-back.
> >> >> > I called it quickloader.pl :)
> >> >> >
> >> >> > Probably the biggest problem is that you can't use COPY to load the data.
> >> >> > I've thought about loading into another table and transferring later but I
> >> >> > havn't sorted out the details.
> >> >>
> >> >> The general idea would be to setup a table that has exactly the same
> >> >> structure as the final target table, but with no constraints at all. As
> >> >> long as your data passes all input functions you can even COPY it in.
> >> >>
> >> >> Now you run check queries that show you all tuples in that staging table
> >> >> that would fail constraints on the final table. Fix those and you can do
> >> >>
> >> >> INSERT INTO final SELECT * FROM staging;
> >> >>
> >> >> If step one fails because of data that doesn't pass the input functions
> >> >> of our data types, you have to go through another level of staging with
> >> >> a table that has text fields only and move it by explicit casting after
> >> >> cleaning up those problems.
> >> >>
> >> >>
> >> >> Jan
> >> >>
> >> >> --
> >> >> #======================================================================#
> >> >> # It's easier to get forgiveness for being wrong than for being right. #
> >> >> # Let's break this rule - forgive me. #
> >> >> #================================================== JanWieck(at)Yahoo(dot)com #
> >>
> >>
> >>
> >> --
> >> #======================================================================#
> >> # It's easier to get forgiveness for being wrong than for being right. #
> >> # Let's break this rule - forgive me. #
> >> #================================================== JanWieck(at)Yahoo(dot)com #
>
>
>
> --
> #======================================================================#
> # It's easier to get forgiveness for being wrong than for being right. #
> # Let's break this rule - forgive me. #
> #================================================== JanWieck(at)Yahoo(dot)com #
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly