Partitioning and Performance, Revisited-Partitioning-2

Much like the hash partitioned tables we investigated earlier, Oracle will take the OWNER value, hash it to a partition between 1 and 16, and place the index entry in there.
Now when we review the output from autotrace, we are much closer to the work performed by the nonpartitioned table earlier—that is, we have not negatively impacted the work performed by our queries:

It should be noted, however, that a hash partitioned index cannot be range scanned; in general, it is most suitable for exact equality (equals or in-lists). If you were to query “WHERE OWNER > :X” using the preceding index, it would not be able to perform a simple range scan using partition elimination. You would be back to inspecting all 16 hash partitions.

USING ORDER BY

This example brought to mind an unrelated but very important fact. When looking at hash partitioned indexes, we are faced with another case where the use of an index to retrieve data would not automatically retrieve the data sorted.

Many people assume that if the query plan shows an index is used to retrieve the data, the data will be retrieved sorted. This has never been true.
The only way we can retrieve data in any sort of sorted order is to use an ORDER BY clause on the query. If your query does not contain an ORDER BY statement, you cannot make any assumptions about the sorted order of the data.

A quick example demonstrates this. We create a small table as a copy of ALL_USERS and create a hash partitioned index with four partitions on the USER_ID column:

So, even though Oracle used the index in a range scan, the data is obviously not sorted. In fact, you might observe a pattern in this data.
There are four sorted results here: the … replaces values that were increasing in value; and between the rows with USER_ID = 13 and 97, the values were increasing in the output.
Then the row with USER_ID = 22 appeared. What we are observing is Oracle returning “sorted data” from each of the four hash partitions, one after the other.

This is just a warning that unless your query has an ORDER BY, you have no reason to anticipate the data being returned to you in any kind of sorted order whatsoever. (And no, GROUP BY doesn’t have to sort either! There is no substitute for ORDER BY.)

Does that mean partitioning won’t affect OLTP performance at all in a positive sense? No, not entirely—you just have to look in a different place. In general, it will not positively impact the performance of your data retrieval in OLTP; rather, care has to be taken to ensure data retrieval isn’t affected negatively. But on data modification, partitioning may provide salient benefits in highly concurrent environments.

Consider the preceding a rather simple example of a single table with a single index, and add into the mix a primary key. Without partitioning, there is a single table: all insertions go into this single table. There is contention perhaps for the freelists on this table.

Additionally, the primary key index that would be on the OBJECT_ID column would be a heavy right-hand-side index, as we discussed in Chapter 11. Presumably, it would be populated by a sequence; hence, all inserts would go after the rightmost block leading to buffer busy waits.
Also, there would be a single index structure T_IDX that people would be contending for. So far, a lot of single items.

Enter partitioning. You hash partition the table by OBJECT_ID into 16 partitions.
There are now 16 tables to contend for, and each table has one-sixteenth the number of users hitting it simultaneously.

You locally partition the primary key index on OBJECT_ID into 16 partitions. You now have 16 right-hand sides, and each index structure will receive one-sixteenth the workload it had before. And so on.

That is, you can use partitioning in a highly concurrent environment to reduce contention, much like we used a reverse key index in Chapter 11 to reduce the buffer busy waits. However, you must be aware that the very process of partitioning out the data consumes more CPU itself than not having partitioning.
That is, it takes more CPU to figure out where to put the data than it would if the data had but one place to go.

So, as with everything, before applying partitioning to a system to increase performance, make sure you understand what that system needs. If your system is currently CPU bound, but that CPU usage is not due to contention and latch waits, introducing partitioning could make the problem worse, not better!

Leave a Reply

Your email address will not be published. Required fields are marked *