Partitioning and Performance, Revisited

Much like the hash partitioned tables we investigated earlier, Oracle will take the OWNER value, hash it to one of the 16 partitions, and place the index entry there.
Now when we review the output from autotrace, we are much closer to the work performed by the nonpartitioned table earlier—that is, we have not negatively impacted the work performed by our queries:

It should be noted, however, that a hash partitioned index cannot be range scanned; in general, it is most suitable for exact equality (equals or in-lists). If you were to query “WHERE OWNER > :X” using the preceding index, it would not be able to perform a simple range scan using partition elimination. You would be back to inspecting all 16 hash partitions.

USING ORDER BY

This example brings to mind an unrelated but very important fact. When looking at hash partitioned indexes, we are faced with another case where using an index to retrieve data does not automatically retrieve it sorted.

Many people assume that if the query plan shows an index is used to retrieve the data, the data will be retrieved sorted. This has never been true.
The only way we can retrieve data in sorted order is to use an ORDER BY clause on the query. If your query does not contain an ORDER BY clause, you cannot make any assumptions about the order of the data.

A quick example demonstrates this. We create a small table as a copy of ALL_USERS and create a hash partitioned index with four partitions on the USER_ID column:
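The original listing is not reproduced here, but a sketch of what it described might look like the following (the table name T, index name T_IDX, and the trailing query are illustrative assumptions):

```sql
-- a small copy of ALL_USERS to experiment with
create table t
as
select * from all_users;

-- a hash partitioned index with four partitions on USER_ID
create index t_idx
    on t(user_id)
global partition by hash (user_id)
partitions 4;

-- a range predicate that can use the index, but whose
-- output comes back partition by partition, not sorted
select user_id
  from t
 where user_id > 0;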

So, even though Oracle used the index in a range scan, the data is obviously not sorted. In fact, you might observe a pattern in this data.
There are four sorted result sets here: the … replaces values that were increasing, and between the rows with USER_ID = 13 and 97, the values were increasing in the output.
Then the row with USER_ID = 22 appeared. What we are observing is Oracle returning "sorted data" from each of the four hash partitions, one after the other.

This is just a warning that unless your query has an ORDER BY, you have no reason to anticipate the data being returned to you in any kind of sorted order whatsoever. (And no, GROUP BY doesn’t have to sort either! There is no substitute for ORDER BY.)
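Using the same illustrative table, the only reliable fix is to state the desired order explicitly; Oracle may still scan each hash partition, but the sort guarantees the final order:

```sql
select user_id
  from t
 where user_id > 0
 order by user_id;  -- without this clause, no ordering is promised
```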

Does that mean partitioning won’t affect OLTP performance at all in a positive sense? No, not entirely—you just have to look in a different place. In general, it will not positively impact the performance of your data retrieval in OLTP; rather, care has to be taken to ensure data retrieval isn’t affected negatively. But on data modification, partitioning may provide salient benefits in highly concurrent environments.

Consider the preceding rather simple example of a single table with a single index, and add a primary key into the mix. Without partitioning, there is a single table: all insertions go into it, and there is contention, perhaps, for the freelists on this table.

Additionally, the primary key index on the OBJECT_ID column would be a heavy right-hand-side index, as we discussed in Chapter 11. Presumably, it would be populated by a sequence; hence, all inserts would go into the rightmost block, leading to buffer busy waits.
Also, there would be a single index structure T_IDX that people would be contending for. So far, a lot of single items.

Enter partitioning. You hash partition the table by OBJECT_ID into 16 partitions.
There are now 16 "tables" to contend for, and each has one-sixteenth the number of users hitting it simultaneously.

You locally partition the primary key index on OBJECT_ID into 16 partitions. You now have 16 right-hand sides, and each index structure will receive one-sixteenth the workload it had before. And so on.
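A sketch of such a setup might look like the following (the column list is abbreviated and the names are illustrative; a unique local index on OBJECT_ID is permitted here because OBJECT_ID is also the partition key):

```sql
create table t
( object_id   number,
  object_name varchar2(128),
  constraint t_pk primary key (object_id)
    using index
    ( create unique index t_pk on t(object_id) local )
)
partition by hash (object_id)
partitions 16;
```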

That is, you can use partitioning in a highly concurrent environment to reduce contention, much like we used a reverse key index in Chapter 11 to reduce the buffer busy waits. However, you must be aware that the very process of partitioning the data consumes more CPU than not partitioning it.
That is, it takes more CPU to figure out where to put the data than it would if the data had only one place to go.

So, as with everything, before applying partitioning to a system to increase performance, make sure you understand what that system needs. If your system is currently CPU bound, but that CPU usage is not due to contention and latch waits, introducing partitioning could make the problem worse, not better!

Partitioning and Performance, Revisited

Many times, I hear people say, “I’m very disappointed in partitioning. We partitioned our largest table and it went much slower. So much for partitioning being a performance increasing feature!” Partitioning can do one of the following three things to overall query performance:

• Make your queries go faster
• Not impact the performance of your queries at all
• Make your queries go much slower and use many times the resources of the nonpartitioned implementation

In a data warehouse, with an understanding of the questions being asked of the data, the first bullet point is very much achievable. Partitioning can positively impact queries that frequently full scan large database tables by eliminating large sections of data from consideration.

Suppose you have a table with one billion rows in it. There is a timestamp attribute. Your query is going to retrieve one year’s worth of data from this table (and it has ten years of data). Your query uses a full table scan to retrieve this data. Had it been partitioned by this timestamp entry—say, a partition per month—then you could have full scanned one-tenth the data (assuming a uniform distribution of data over the years).
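A hedged sketch of the layout just described (the table and column names are assumptions, and only the first two monthly partitions are written out):

```sql
create table big_t
( sale_date date,
  payload   varchar2(4000)
)
partition by range (sale_date)
( partition p_2010_01 values less than (to_date('2010-02-01','yyyy-mm-dd')),
  partition p_2010_02 values less than (to_date('2010-03-01','yyyy-mm-dd'))
  -- one partition per month, 120 in all for ten years of data
);

-- this predicate lets the optimizer eliminate roughly 90 percent
-- of the partitions before the full scan begins
select count(*)
  from big_t
 where sale_date >= to_date('2010-01-01','yyyy-mm-dd')
   and sale_date <  to_date('2011-01-01','yyyy-mm-dd');
```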

Partition elimination would have removed the other 90 percent of the data from consideration. Your query would likely run faster.

Now, take a similar table in an OLTP system. You would never retrieve ten percent of a one billion–row table in that type of application.

Therefore, the massive increase in speed seen by the data warehouse just would not be achievable in a transactional system. You are not doing the same sort of work, and the same possible improvements are just not realistic.

Therefore, in general, in your OLTP system the first bullet point is not achievable, and you won’t be applying partitioning predominantly for increased performance. Increased availability—absolutely.

Administrative ease of use—very much so. But in an OLTP system, I say you have to work hard to make sure you achieve the second point: that you do not impact the performance of your queries at all, negatively or positively. Many times, your goal is to apply partitioning without affecting query response time.

On many occasions, I've seen an implementation team with a medium-sized table, say of 100 million rows. Now, 100 million sounds like an incredibly large number (and five or ten years ago, it would have been, but time changes all things).

So the team decides to partition the data. But in looking at the data, there are no logical attributes that make sense for RANGE partitioning, and likewise, LIST partitioning doesn't make sense.

Nothing pops out of this table as being the right thing to partition by. So, the team opts for hash partitioning on the primary key, which just happens to be populated by an Oracle sequence number. It looks perfect, it is unique and easy to hash, and many queries are of the form SELECT * FROM T WHERE PRIMARY_KEY = :X.

But the problem is there are many other queries against this object that are not of that form. For illustrative purposes, assume the table in question is really the ALL_OBJECTS dictionary view, and while internally many queries would be of the form WHERE OBJECT_ID = :X, the end users frequently have these requests of the application as well:

• Show me the details of SCOTT's EMP table (where owner = :o and object_type = :t and object_name = :n).
• Show me all of the tables SCOTT owns (where owner = :o and object_type = :t).
• Show me all of the objects SCOTT owns (where owner = :o).

In support of those queries, you have an index on (OWNER, OBJECT_TYPE, OBJECT_NAME). But you also read that local indexes are more available, and you would like your system to be more available, so you implement them. You end up re-creating your table like this, with 16 hash partitions:
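The stripped-out DDL would look something like the following sketch, assuming the table is modeled on the ALL_OBJECTS view as described:

```sql
-- 16 hash partitions on the sequence-populated primary key column
create table t
partition by hash (object_id)
partitions 16
as
select * from all_objects;

-- the supporting index, created LOCAL for availability
create index t_idx
    on t(owner, object_type, object_name)
 local;
```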

However, when you run this with autotrace on and review the output, you notice the following performance characteristics:

You compare that to the same table, only with no partitioning implemented, and discover the following:

You might immediately jump to the (erroneous) conclusion that partitioning causes an increase in I/O. The query takes 7 consistent gets without partitioning and 36 with it. If your system had an issue with high consistent gets (logical I/Os) before, it is worse now. If it didn't have one before, it might well get one.

The root cause? The index partitioning scheme. Look at the following output from an explain plan for the partitioned version of this table:

This query has to look at each and every index partition. The reason is that entries for SCOTT may well be (and probably are) in each and every index partition.

The index is logically hash partitioned by OBJECT_ID; any query that uses this index and that does not also refer to the OBJECT_ID in the predicate must consider every index partition! So, what is the solution here? You should globally partition your index. Using the previous case as the example, we could choose to hash partition the index:
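A global hash partitioned index along those lines might be created as follows; partitioning by OWNER means the common "where owner = :o" predicates hash to a single index partition:

```sql
drop index t_idx;

create index t_idx
    on t(owner, object_type, object_name)
global partition by hash (owner)
partitions 16;
```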

Note There are considerations to be taken into account with hash partitioned indexes regarding range scans, which we’ll discuss later in this section.