A patch which implements HOT is attached with this email. This patch applies cleanly to PostgresQL 8.2 BETA3 and passes all the regression tests with two exceptions which are explained later. This patch has shown a good performance gain when tested with pgbench, DBT2 and some other benchmarking tests. This an early version of the patch and list of known issues/bugs and outstanding items is included with this email. We would appreciate all the review comments and experiences with the patch. What is HOT ? ------------ HOT stands for Heap Overflow Tuple and this is an attempt to solve some of the problems associated with frequently updated tables. This design optimizies the updates when none of the index columns are modified and length of the tuple remains the same after update. In this particular case, the updated tuple is stored in a seperate overflow relation and pulled-back into the main relation when the tuple in the main relation becomes dead. For a detail design document and related discussion please see the pgsql-hackers mailing list. How to Use HOT ? -------------- A new WITH option for tables is required to take advantage of HOT: CREATE TABLE test (a int) WITH (hot_update='true'); The default is set by an additional GUC parameter "default_use_hot_update", which defaults to false. This may not be supported in the final release, but it comes very handy to test some existing application with HOT-update without changing any of the table creation scripts. An initdb is required after patch application. Testing: ------- * Correctness Testing: - Regression tests When tables are created with HOT-update turned OFF, all the regression tests pass. The exceptions are the 'type_sanity' test which fails because a new system attribute called "btid" is added and 'rules' test which fails because of additional pg_stats columns added with the patch. With the current behaviour, when a tuple is updated, the updated tuple is inserted at the end of the table. So during a sequential scan, the updated tuple is returned at the end. With HOT-updates, the updated tuple goes into the overflow relation (and subsequently copied-back to the root tuple). So in a sequential scan, the tuple is returned early and in the same order before the update. This causes a difference in the scan order for HOT-updated and normal tables. Because of the same issue, few of the regression tests seem failing when HOT is turned ON. These not really bugs. The only exeption is hash_index test which fails because of a bug in the code and is described in the section of Outstanding Bugs. Since changes to the regression tests is a major topic of discussion, we made a separate patch which fixes the regression tests (by adding ORDER BY clause) for testing HOT-updates and we would post that as a seperate patch. - Concurrent psql tests We implemented a concurrent version of psql to support multiple sessions and transactions from the same psql prompt. We also wrote several test cases to test the correctness of HOT-updates in various serializable modes. All of these tests are passing fine. We should be posting those tests and psql patch very soon. * Performance Testing: We ran performance benchmark tests using various open source benchmarking tools such as pgbench and DBT2 and some specially developed tests such as truckin which was posted to the mailing lists earlier. We shall post the results of these tests to the list. * Crash Recovery: We have tested the crash recovery to some extent, but a thorough testing is still required. Till now we have mostly tested the recovery of pgbench running with large number of clients. Extensive crash recovery testing is needed, especially with heavy concurrent updates/selects/deletes and mix of HOT and non-HOT updates. Outstanding BUGS: ---------------- The following items don't work yet, but no problems are foreseen in doing so: - Vacuuming copied-back tuple When a tuple from the overflow relation is copied-back to the main relation, the overflow tuple must not be vacuumed until there are backends which might have references to the tuple. The tuple should be vacuumed once its clear that the tuple is now unreachable. In the current code, we set the xmax of the overflow tuple to the transaction id of the transaction doing the copy-back operation. But this is clearly a wrong thing to do. Need to fix this. - Bitmap heapscan The hash_index regression test fails because of this bug. We should follow the overflow tuple chain while checking the visibility of the root tuples. - Tuple freezing during vacuum This code is very recently added to PostgresQL Beta3. The HOT code need some rework to deal with this change. Outstanding Work: ---------------------- The following items have not yet been implemented, though the current design does specify them: - Overflow tuple header In the current prototype tuples in the main heap also have the overflow header, though this can be removed. At some of the places, we have used this assumption and this needs to be fixed. Outstanding TODO: ------------------- - Vacuum full disabled Vacuum full is currently disabled on the HOT-updatable and overflow relations. Though we did this for simplifying the early implementation, we need to revisit this and see if we can remove this constraint. Since vacuum full is disabled, we could reuse the HEAP_UPDATED flag for marking copied-back tuples. So if we support vacuum full, then this would need slight rework. - Pagemode seqscan disabled Again for simplifying the first version, pagemode seqscan is disabled for HOT-updatable relations. Another reason for doing so is that root tuples may get overwritten and we may need to reapply visibility checks anyways. This might become a permanent feature. - HOT-update for toasted tuples We should be able to support HOT-updates for toasted tuples. But this need more thinking. - dump/restore may not work Dump/restore may not work for HOT-updatable relation when it is created by setting the "default_hot_update" GUC parameter to true. This should be fixed if we decide to support the GUC parameter. - HOT-updates for tables with expression index For simplicity reasons, tables with expression index can not use HOT updates. We need to think again and check if this can be relaxed. - Alter table support We may want to add ALTER TABLE support for HOT-update. So a table which is not created with HOT-update ON can be altered to use HOT-update and vice versa. - Turning HOT-update ON/OFF We may want to add support to turn HOT on/off while table is being used. Unresolved Issues: ------------------ - Handling ctid references As per discussion of the mailing list, there are postgresql clients which might cache the ctid of a tuple and use it later. Since we replace a dead root tuple with the oldtest live tuple in the tuple chain, there are two copies of the live tuple. The working copy of the tuple is in the main relation, but there could be backends still holding reference to the overflow copy. If the working copy is updated/deleted, the backends may not see that effect if the tuple is refetched using the ctid. - Index creation Assuming a relation is created with HOT-update on and the tuples are HOT-updated, there could be tuples in the overflow chain with different values for an attribute. If an index is now created on that attribute, we must ensure that the semantics of having the same index value in all the tuples in the overflow chain is maintained. That means, we should either pull-up the chain so that there are no tuples in the chain or break the chain and have multiple entries in the index.
Attachment:
HOT-pg82beta3-v1.0.patch.gz
Description: GNU Zip compressed data
Attachment:
HOT-pg82beta3-regression-v1.0.patch.gz
Description: GNU Zip compressed data