Many times, I hear people say, “I’m very disappointed in partitioning. We partitioned our largest table and it went much slower. So much for partitioning being a performance-increasing feature!” Partitioning can do one of the following three things to overall query performance:
•	Make your queries go faster
•	Not impact the performance of your queries at all
•	Make your queries go much slower and use many times the resources of the nonpartitioned implementation
In a data warehouse, with an understanding of the questions being asked of the data, the first bullet point is very much achievable. Partitioning can positively impact queries that frequently full scan large database tables by eliminating large sections of data from consideration.
Suppose you have a table with one billion rows in it. There is a timestamp column. Your query is going to retrieve one year’s worth of data from this table (and it has ten years of data). Your query uses a full table scan to retrieve this data. Had the table been partitioned by this timestamp column—say, a partition per month—then you could have full scanned one-tenth of the data: twelve monthly partitions out of 120 (assuming a uniform distribution of data over the years).
Partition elimination would have removed the other 90 percent of the data from consideration. Your query would likely run faster.
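To make the warehouse scenario concrete, here is a minimal sketch of what such a table might look like; the names (SALES, SALE_DT) are illustrative, a DATE column stands in for the timestamp, and only the first two of the 120 monthly partitions are spelled out:

create table sales
( sale_id  number,
  sale_dt  date,
  amount   number
)
partition by range (sale_dt)
( partition p_2015_01 values less than (to_date('01-02-2015','DD-MM-YYYY')),
  partition p_2015_02 values less than (to_date('01-03-2015','DD-MM-YYYY'))
  -- ... and so on, one partition per month for ten years ...
);

-- Only the twelve partitions covering the requested year are scanned;
-- the rest are eliminated before any I/O is performed against them.
select sum(amount)
  from sales
 where sale_dt >= to_date('01-01-2015','DD-MM-YYYY')
   and sale_dt <  to_date('01-01-2016','DD-MM-YYYY');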
Now, take a similar table in an OLTP system. You would never retrieve ten percent of a one billion–row table in that type of application.
Therefore, the massive increase in speed seen by the data warehouse just would not be achievable in a transactional system. You are not doing the same sort of work, and the same possible improvements are just not realistic.
Therefore, in general, in your OLTP system the first bullet point is not achievable, and you won’t be applying partitioning predominantly for increased performance. Increased availability—absolutely.
Administrative ease of use—very much so. But in an OLTP system, I say you have to work hard to make sure you achieve the second point: that you do not impact the performance of your queries at all, negatively or positively. Many times, your goal is to apply partitioning without affecting query response time.
On many occasions, I’ve seen an implementation team look at a medium-sized table, say one of 100 million rows. Now, 100 million sounds like an incredibly large number (and five or ten years ago, it would have been, but time changes all things).
So the team decides to partition the data. But in looking at the data, there are no logical attributes that make sense for RANGE partitioning, and LIST partitioning doesn’t make sense either.
Nothing pops out of this table as being the right thing to partition by. So, the team opts for hash partitioning on the primary key, which just happens to be populated by an Oracle sequence number. It looks perfect: it is unique and easy to hash, and many queries are of the form SELECT * FROM T WHERE PRIMARY_KEY = :X.
But the problem is there are many other queries against this object that are not of that form. For illustrative purposes, assume the table in question is really the ALL_OBJECTS dictionary view, and while internally many queries would be of the form WHERE OBJECT_ID = :X, the end users frequently have these requests of the application as well:
•	Show me the details of SCOTT’s EMP table (where owner = :o and object_type = :t and object_name = :n).
•	Show me all of the tables SCOTT owns (where owner = :o and object_type = :t).
•	Show me all of the objects SCOTT owns (where owner = :o).
In support of those queries, you have an index on (OWNER,OBJECT_TYPE,OBJECT_NAME). But you also read that local indexes are more available, and you would like your system to be more available, so you implement them. You end up re-creating your table like this, with 16 hash partitions:
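What follows is a sketch of that setup, not a verbatim listing: the column list from ALL_OBJECTS is abbreviated, and the index and constraint names (T_IDX, T_PK) are ones chosen here for illustration.

create table t
partition by hash (object_id)
partitions 16
as
select object_id, owner, object_name, object_type
  from all_objects;

-- Primary key on OBJECT_ID, the column the table is hash partitioned by
-- (populated by a sequence in the scenario described):
alter table t add constraint t_pk primary key (object_id);

-- The LOCAL index in support of the three end-user queries:
create index t_idx on t ( owner, object_type, object_name ) LOCAL;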
However, when you run this with autotrace on and review the output, you notice the following performance characteristics:
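One way to take that measurement, sketched here in SQL*Plus (the bind values are the SCOTT/EMP example from above; the exact figures depend on your data):

variable o varchar2(128)
variable t varchar2(128)
variable n varchar2(128)

begin
  :o := 'SCOTT';
  :t := 'TABLE';
  :n := 'EMP';
end;
/

set autotrace traceonly statistics

select *
  from t
 where owner = :o
   and object_type = :t
   and object_name = :n;

-- The statistics section reports the consistent gets for the run; against
-- the 16-way hash-partitioned table, that is the figure of 36 discussed below.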
You compare that to the same table, only with no partitioning implemented, and discover the following:
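The comparison table can be built the same way, minus the partitioning; T_NOPART and its index name are again illustrative:

create table t_nopart
as
select object_id, owner, object_name, object_type
  from all_objects;

alter table t_nopart add constraint t_nopart_pk primary key (object_id);

create index t_nopart_idx
on t_nopart ( owner, object_type, object_name );

select *
  from t_nopart
 where owner = :o
   and object_type = :t
   and object_name = :n;

-- Same query, same autotrace settings; here the consistent gets are the
-- much smaller figure (7) discussed next.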
You might immediately jump to the (erroneous) conclusion that partitioning causes an increase in I/O. The query gets 7 consistent gets without partitioning and 36 with partitioning. If your system had an issue with high consistent gets (logical I/Os) before, it is worse now. If it didn’t have one before, it might well get one.
The root cause? The index partitioning scheme. Look at the following output from an explain plan for the partitioned version of this table:
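A sketch of how to produce that plan, using DBMS_XPLAN against the partitioned table T and its LOCAL index T_IDX from the earlier listing:

set autotrace off

explain plan for
select *
  from t
 where owner = :o
   and object_type = :t
   and object_name = :n;

select * from table(dbms_xplan.display);

-- In the resulting plan, the Pstart/Pstop columns for the index range scan
-- on T_IDX read 1 and 16: every one of the 16 local index partitions must be
-- probed, because the predicate says nothing about OBJECT_ID, the hash
-- partitioning key.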
This query has to look at each and every index partition. The reason is that entries for SCOTT may well be, and probably are, in each and every index partition.
The index is logically hash partitioned by OBJECT_ID; any query that uses this index and that does not also refer to the OBJECT_ID in the predicate must consider every index partition! So, what is the solution here? You should globally partition your index. Using the previous case as the example, we could choose to hash partition the index:
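Here is a sketch of that approach, reusing the earlier names: drop the LOCAL index and re-create it as a GLOBAL index, hash partitioned on its leading column, OWNER, so any query that supplies OWNER in the predicate can be directed to exactly one index partition:

drop index t_idx;

create index t_idx
on t ( owner, object_type, object_name )
global
partition by hash (owner)
partitions 16;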
Note: There are considerations to take into account with hash partitioned indexes regarding range scans, which we’ll discuss later in this section.