Summary

Partitioning is extremely useful for scaling up large database objects. This scaling is visible from the perspective of performance, availability, and administration. All three are extremely important to different people.

The DBA is concerned with administrative scaling. The owners of the system are concerned with availability, because downtime is lost money, and anything that reduces downtime—or reduces the impact of downtime—boosts the payback for a system.

The end users of the system are concerned with performance scaling. No one likes to use a slow system, after all.

We also looked at the fact that in an OLTP system, partitions may not increase performance, especially if applied improperly. Partitions can increase the performance of certain classes of queries, but those classes of queries are generally not issued in an OLTP system.

This point is important to understand, as many people associate partitioning with “free performance increase.” This does not mean that partitions should not be used in OLTP systems—they do provide many other salient benefits in this environment—just don’t expect a massive increase in throughput.

Expect reduced downtime. Expect the same good performance (partitioning will not slow you down when applied appropriately). Expect easier manageability, which may in turn lead to increased performance, because some maintenance operations become practical enough that DBAs actually perform them more frequently.

We investigated the various table partitioning schemes offered by Oracle—range, hash, list, interval, reference, interval reference, virtual column, and composite—and talked about when they are most appropriately used.

We spent the bulk of our time looking at partitioned indexes, examining the differences between prefixed and nonprefixed indexes and between local and global indexes. We investigated partition operations in data warehouses combined with global indexes, and the trade-off between resource consumption and availability.

We also looked at the ease of maintenance features such as the ability to perform maintenance operations on multiple partitions at a time, cascade truncate, and cascade exchange. Oracle continues to update and improve partitioning with each new release.

Over time, I see this feature becoming more relevant to a broader audience as the size and scale of database applications grow.

The Internet, with its database-hungry nature, along with legislation requiring longer retention of audit data, is leading to more and more extremely large collections of data, and partitioning is a natural tool to help manage that problem.

Auditing and Segment Space Compression

Not too many years ago, US government constraints such as those imposed by HIPAA (www.hhs.gov/ocr/hipaa) were not in place. Companies such as Enron were still in business, and the US government’s Sarbanes-Oxley compliance requirements did not exist. Back then, auditing was considered something that “we might do someday, maybe.” Today, however, auditing is at the forefront, and many DBAs are challenged to retain online up to seven years of audit trail information for their financial, business, and health-care databases.

Audit trail information is the one piece of data in your database that you might well insert but never retrieve during the normal course of operation. It is there predominantly as a forensic, after-the-fact trail of evidence. We need to have it, but from many perspectives, it is just something that sits on our disks and consumes space—lots and lots of space.
And then every month or year or some other time interval, we have to purge or archive it. Auditing is something that, if not properly designed for from the beginning, can kill you in the end. Seven years from now, when you are faced with your first purge or archive of the old data, is not when you want to be thinking about how to accomplish it. Unless you designed for it, getting that old information out is going to be painful.

Enter two technologies that make auditing not only bearable but also pretty easy to manage, while consuming less space: partitioning and segment space compression, as we discussed in Chapter 10. That second one might not be as obvious, since basic segment space compression only works with large bulk operations like a direct path load (OLTP compression is a feature of the Advanced Compression Option, not available with all database editions), and audit trail records are typically inserted one row at a time, as events happen. The trick is to combine sliding window partitions with segment space compression.

Suppose we decide to partition the audit trail by month. During the first month of business, we just insert into the partitioned table; these inserts go in using the conventional path, not a direct path, and hence are not compressed.

Now, before the month ends, we’ll add a new partition to the table to accommodate next month’s auditing activity. Shortly after the beginning of next month, we will perform a large bulk operation on last month’s audit trail—specifically, we’ll use the ALTER TABLE command to move last month’s partition, which will have the effect of compressing the data as well.
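As a minimal sketch of that monthly operation (the AUD_TRAIL table and partition names here are illustrative, assuming a table range partitioned by month):

SQL> alter table aud_trail add partition aud_2024_08
       values less than (to_date('01-09-2024','dd-mm-yyyy'));
SQL> alter table aud_trail move partition aud_2024_07
       compress update indexes;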

If we, in fact, take this a step further, we could move this partition from a read-write tablespace, which it must have been in, into a tablespace that is normally read-only (and contains other partitions for this audit trail). In that fashion, we can back up that tablespace once a month, after we move the partition in there; ensure we have a good, clean, current readable copy of the tablespace; and then not back it up anymore that month.
We might have the following tablespaces for our audit trail:

• A current online, read-write tablespace that gets backed up like every other normal tablespace in our system: The audit trail information in this tablespace is not compressed, and it is constantly inserted into.
• A read-only tablespace containing “this year to date” audit trail partitions in a compressed format: At the beginning of each month, we make this tablespace read-write, move and compress last month’s audit information into this tablespace, make it read-only again, and back it up.
• A series of tablespaces for last year, the year before, and so on: These are all read-only and might even be on slow, cheap media. In the event of a media failure, we just need to restore from backup. We would occasionally pick a year at random from our backup sets to ensure they are still restorable (tapes go bad sometimes).

In this fashion, we have made purging easy (i.e., drop a partition). We have made archiving easy, too—we could just transport a tablespace off and restore it later. We have reduced our space utilization by implementing compression. We have reduced our backup volumes, as in many systems, the single largest set of data is audit trail data. If you can remove some or all of that from your day-to-day backups, the difference will be measurable.
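For instance, the eventual purge collapses to a single DDL statement (again with illustrative names):

SQL> alter table aud_trail drop partition aud_2017_07 update global indexes;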

In short, audit trail requirements and partitioning are two things that go hand in hand, regardless of the underlying system type, be it data warehouse or OLTP.

Tip Consider using Oracle’s Flashback Data Archive feature for auditing requirements. When enabled for a table, the Flashback Data Archive will automatically create an underlying partitioned table to record transactional information.
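A minimal sketch of enabling that for a table T (the archive name FDA_7YR, its tablespace, and quota are assumptions):

SQL> create flashback archive fda_7yr
       tablespace users quota 10g retention 7 year;
SQL> alter table t flashback archive fda_7yr;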

Cascade Truncate

Oracle allows you to truncate parent/child tables in tandem as a single atomic DDL statement. While the truncate cascade is taking place, any queries issued against the parent/child table combination are always presented with a read-consistent view of the data, meaning that queries will see either both tables populated or both tables truncated.

The truncate cascade functionality is initiated with a TRUNCATE … CASCADE statement on the parent table. For the cascade truncate to take place, any child tables must be defined with a foreign key constraint specifying ON DELETE CASCADE. What does cascade truncate have to do with partitioning? In a reference partitioned table, you can truncate a parent table partition and have it cascade to the child table partition in one transaction.

Let’s look at an example of this. Applying the TRUNCATE … CASCADE functionality to reference partitioned tables, the parent ORDERS table is created here, and the ORDER_LINE_ITEMS table is created with ON DELETE CASCADE applied to the foreign key constraint:

$ sqlplus eoda/foo@PDB1
SQL> create table orders
       (order# number primary key,
        order_date date,
        data varchar2(30))
     PARTITION BY RANGE (order_date)
       (PARTITION part_2020 VALUES LESS THAN (to_date('01-01-2021','dd-mm-yyyy')),
        PARTITION part_2021 VALUES LESS THAN (to_date('01-01-2022','dd-mm-yyyy')));
Table created.
SQL> insert into orders values ( 1, to_date( '01-jun-2020', 'dd-mon-yyyy' ), 'xyz' );
1 row created.
SQL> insert into orders values ( 2, to_date( '01-jun-2021', 'dd-mon-yyyy' ), 'xyz' );
1 row created.

And now we’ll create the ORDER_LINE_ITEMS table, ensuring we include the ON DELETE CASCADE clause:
SQL> create table order_line_items
       (order# number,
        line# number,
        data varchar2(30),
        constraint c1_pk primary key(order#,line#),
        constraint c1_fk_p foreign key(order#) references orders on delete cascade)
     partition by reference(c1_fk_p);
SQL> insert into order_line_items values ( 1, 1, 'yyy' );
1 row created.
SQL> insert into order_line_items values ( 2, 1, 'yyy' );
1 row created.

Now we can issue a TRUNCATE … CASCADE that truncates both the parent table partition and the child table partition as a single transaction:
SQL> alter table orders truncate partition PART_2020 cascade;
Table truncated.

In other words, the TRUNCATE … CASCADE functionality prevents applications from seeing the child table truncated before the parent table is truncated.
You can also truncate all partitions in the parent and child tables via
SQL> truncate table orders cascade;
Table truncated.

Again, just to be clear, the ability to cascade truncate parent/child tables is not exclusively a partitioning feature. It also applies to nonpartitioned parent/child tables. It allows you to initiate truncate operations with one DDL statement and also ensures the database application is always presented with a consistent view of the parent/child tables.

Cascade Exchange

Oracle allows you to exchange the combination of parent/child reference partitioned tables in one atomic DDL statement. A small example will demonstrate this. First, a reference partitioned parent and child table is created to set this up:
$ sqlplus eoda/foo@PDB1
SQL> create table orders
       (order# number primary key,
        order_date date,
        data varchar2(30))
     PARTITION BY RANGE (order_date)
       (PARTITION part_2020 VALUES LESS THAN (to_date('01-01-2021','dd-mm-yyyy')),
        PARTITION part_2021 VALUES LESS THAN (to_date('01-01-2022','dd-mm-yyyy')));
SQL> insert into orders values (1, to_date( '01-jun-2020', 'dd-mon-yyyy' ), 'xyz');
SQL> insert into orders values (2, to_date( '01-jun-2021', 'dd-mon-yyyy' ), 'xyz');
SQL> create table order_line_items
       (order# number,
        line# number,
        data varchar2(30),
        constraint c1_pk primary key(order#,line#),
        constraint c1_fk_p foreign key(order#) references orders on delete cascade)
     partition by reference(c1_fk_p);
SQL> insert into order_line_items values ( 1, 1, 'yyy' );
SQL> insert into order_line_items values ( 2, 1, 'yyy' );

Next, an empty partition is added to the reference partitioned table:
SQL> alter table orders add partition part_2022
     values less than (to_date('01-01-2023','dd-mm-yyyy'));

Next, a parent table and a child table are created and loaded with data. These are the tables that will be exchanged with the empty partitions in the reference partitioned table. The child exchange table mirrors ORDER_LINE_ITEMS, with its foreign key referencing the PART_2022 table (the child table and constraint names here are illustrative):

SQL> create table part_2022
       (order# number primary key,
        order_date date,
        data varchar2(30));
SQL> insert into part_2022 values (3, to_date('01-jun-2022', 'dd-mon-yyyy'), 'xyz');
SQL> create table part_2022_child
       (order# number,
        line# number,
        data varchar2(30),
        constraint c2_pk primary key(order#,line#),
        constraint c2_fk foreign key(order#) references part_2022 on delete cascade);
SQL> insert into part_2022_child values (3, 1, 'yyy');

Now we can exchange the prior two tables in one transaction into the reference partitioned tables. Notice the CASCADE option is specified:
SQL> alter table orders exchange partition part_2022
     with table part_2022
     without validation
     CASCADE
     update global indexes;

That’s it. With one DDL statement, we simultaneously exchanged two tables related by a foreign key constraint into a reference partitioned table. Anybody accessing the database will see the parent and child table partitions added seamlessly as one unit of work.

Ease of Maintenance Features

At the beginning of this chapter, I stated the goal was to provide a practical guide to implement applications with partitioning, and that I wouldn’t be focusing so much on administration. However, there are a few new administrative features available that deserve some discussion, namely:

• Multiple partition maintenance operations
• Cascade truncate
• Cascade exchange

These features have a positive impact in terms of ease of maintenance, data integrity, and performance. Therefore, it’s important to be aware of these features when implementing partitioning.

Multiple Partition Maintenance Operations
This feature eases the administration of partitioning and in some scenarios reduces the database resources required to perform maintenance operations. Oracle allows you to combine more than one partition maintenance operation in one DDL statement. Consider the following example:

$ sqlplus eoda/foo@PDB1
SQL> create table p_table(a int)
     partition by range (a)
     (partition p1 values less than (1000), partition p2 values less than (2000));
Table created.

Now say you want to add multiple partitions to the table that was just created. Oracle allows you to perform multiple partition operations in one statement:

SQL> alter table p_table add
     partition p3 values less than (3000),
     partition p4 values less than (4000);
Table altered.

Note In addition to adding partitions, multiple partition maintenance operations can be applied to dropping, merging, splitting, and truncating.
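For example, the two partitions just added could also be dropped in a single statement (a sketch):

SQL> alter table p_table drop partitions p3, p4;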

Performing multiple maintenance partition operations in one DDL statement is particularly advantageous for splitting partitions and thus deserves more discussion. A small example will illustrate this. Let’s set this up by creating a table and loading it with data:

SQL> CREATE TABLE sales(sales_id int, s_date date)
     PARTITION BY RANGE (s_date)
     (PARTITION P2021 VALUES LESS THAN (to_date('01-jan-2022','dd-mon-yyyy')));
Table created.
SQL> insert into sales
     select level, to_date('01-jan-2021','dd-mon-yyyy') + ceil(dbms_random.value(1,364))
     from dual connect by level < 100000;
99999 rows created.

Next, we create a small utility function to help us measure the resources consumed while performing an operation:

SQL> create or replace function get_stat_val( p_name in varchar2 ) return number
     as
       l_val number;
     begin
       select b.value
         into l_val
         from v$statname a, v$mystat b
        where a.statistic# = b.statistic#
          and a.name = p_name;
       return l_val;
     end;
     /
Function created.

Next, we’ll split the P2021 partition into four partitions in one DDL statement and measure the resources consumed:

SQL> var r1 number
SQL> exec :r1 := get_stat_val('redo size');
PL/SQL procedure successfully completed.
SQL> var c1 number
SQL> exec :c1 := dbms_utility.get_cpu_time;
PL/SQL procedure successfully completed.
SQL> alter table sales split partition P2021 into
       (partition Q1 values less than (to_date('01-apr-2021','dd-mon-yyyy')),
        partition Q2 values less than (to_date('01-jul-2021','dd-mon-yyyy')),
        partition Q3 values less than (to_date('01-oct-2021','dd-mon-yyyy')),
        partition Q4);
Table altered.
SQL> set serverout on
SQL> exec dbms_output.put_line(get_stat_val('redo size') - :r1);
47068
SQL> exec dbms_output.put_line(dbms_utility.get_cpu_time - :c1);
12

The amount of redo generated via the single DDL statement is relatively low. Depending on the number of partitions being split, and whether you’re updating indexes at the same time, the amount of redo generated and CPU consumed can be considerably less than when performing the splits as multiple separate statements.

Partitioning and Performance, Revisited (Part 2)

Much like the hash partitioned tables we investigated earlier, Oracle will take the OWNER value, hash it to one of the 16 partitions, and place the index entry in there.
Now when we review the output from autotrace, we are much closer to the work performed by the nonpartitioned table earlier; that is, we have not negatively impacted the work performed by our queries.

It should be noted, however, that a hash partitioned index cannot be range scanned; in general, it is most suitable for exact equality (equals or in-lists). If you were to query “WHERE OWNER > :X” using the preceding index, it would not be able to perform a simple range scan using partition elimination. You would be back to inspecting all 16 hash partitions.

USING ORDER BY

This example brought to mind an unrelated but very important fact. When looking at hash partitioned indexes, we are faced with another case where the use of an index to retrieve data would not automatically retrieve the data sorted.

Many people assume that if the query plan shows an index is used to retrieve the data, the data will be retrieved sorted. This has never been true.
The only way we can retrieve data in a guaranteed sorted order is to use an ORDER BY clause on the query. If your query does not contain an ORDER BY clause, you cannot make any assumptions about the sorted order of the data.

A quick example demonstrates this. We create a small table as a copy of ALL_USERS and create a hash partitioned index with four partitions on the USER_ID column:
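A sketch of that setup (the table name T and index name T_IDX are illustrative):

SQL> create table t as select * from all_users;
SQL> create index t_idx on t(user_id)
     global partition by hash(user_id) partitions 4;
SQL> select user_id from t where user_id > 0;

The USER_ID > 0 predicate gives the optimizer a reason to range scan the index within each of the four hash partitions.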

So, even though Oracle used the index in a range scan, the data is obviously not sorted. In fact, you might observe a pattern in this data. There are four sorted runs here: the … replaces values that were increasing in value, and between the rows with USER_ID = 13 and 97, the values were increasing in the output. Then the row with USER_ID = 22 appeared. What we are observing is Oracle returning “sorted data” from each of the four hash partitions, one after the other.

This is just a warning that unless your query has an ORDER BY, you have no reason to anticipate the data being returned to you in any kind of sorted order whatsoever. (And no, GROUP BY doesn’t have to sort either! There is no substitute for ORDER BY.)

Does that mean partitioning won’t affect OLTP performance at all in a positive sense? No, not entirely—you just have to look in a different place. In general, it will not positively impact the performance of your data retrieval in OLTP; rather, care has to be taken to ensure data retrieval isn’t affected negatively. But on data modification, partitioning may provide salient benefits in highly concurrent environments.

Consider the preceding a rather simple example of a single table with a single index, and add into the mix a primary key. Without partitioning, there is a single table: all insertions go into this single table. There is contention perhaps for the freelists on this table.

Additionally, the primary key index that would be on the OBJECT_ID column would be a heavy right-hand-side index, as we discussed in Chapter 11. Presumably, it would be populated by a sequence; hence, all inserts would go into the rightmost block, leading to buffer busy waits.
Also, there would be a single index structure T_IDX that people would be contending for. So far, a lot of single items.

Enter partitioning. You hash partition the table by OBJECT_ID into 16 partitions.
There are now 16 tables to contend for, and each table has one-sixteenth the number of users hitting it simultaneously.

You locally partition the primary key index on OBJECT_ID into 16 partitions. You now have 16 right-hand sides, and each index structure will receive one-sixteenth the workload it had before. And so on.
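A sketch of that arrangement (the table T, its sequence-populated OBJECT_ID column, and the index name T_PK are illustrative):

SQL> create table t
       (object_id number,
        data      varchar2(30))
     partition by hash(object_id) partitions 16;
SQL> create unique index t_pk on t(object_id) local;
SQL> alter table t add constraint t_pk
       primary key(object_id) using index t_pk;

Each insert now hashes to one of 16 table partitions and one of 16 local index partitions, spreading out the right-hand-side hot spots.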

That is, you can use partitioning in a highly concurrent environment to reduce contention, much like we used a reverse key index in Chapter 11 to reduce the buffer busy waits. However, you must be aware that the very process of partitioning out the data consumes more CPU itself than not having partitioning.
That is, it takes more CPU to figure out where to put the data than it would if the data had but one place to go.

So, as with everything, before applying partitioning to a system to increase performance, make sure you understand what that system needs. If your system is currently CPU bound, but that CPU usage is not due to contention and latch waits, introducing partitioning could make the problem worse, not better!

Partitioning and Performance, Revisited (Part 1)

Many times, I hear people say, “I’m very disappointed in partitioning. We partitioned our largest table and it went much slower. So much for partitioning being a performance increasing feature!” Partitioning can do one of the following three things to overall query performance:

• Make your queries go faster
• Not impact the performance of your queries at all
• Make your queries go much slower and use many times the resources of the nonpartitioned implementation

In a data warehouse, with an understanding of the questions being asked of the data, the first bullet point is very much achievable. Partitioning can positively impact queries that frequently full scan large database tables by eliminating large sections of data from consideration.

Suppose you have a table with one billion rows in it. There is a timestamp attribute. Your query is going to retrieve one year’s worth of data from this table (and it has ten years of data). Your query uses a full table scan to retrieve this data. Had it been partitioned by this timestamp entry—say, a partition per month—then you could have full scanned one-tenth the data (assuming a uniform distribution of data over the years).

Partition elimination would have removed the other 90 percent of the data from consideration. Your query would likely run faster.
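A sketch of that scenario (BIG_T and its columns are illustrative; the INTERVAL clause creates one partition per month on demand):

SQL> create table big_t
       (ts   date,
        data varchar2(30))
     partition by range (ts)
     interval (numtoyminterval(1,'MONTH'))
       (partition p0 values less than (to_date('01-jan-2013','dd-mon-yyyy')));
SQL> select count(*)
       from big_t
      where ts >= to_date('01-jan-2021','dd-mon-yyyy')
        and ts <  to_date('01-jan-2022','dd-mon-yyyy');

With ten years of data loaded, partition elimination limits the full scan in the last query to the 12 monthly partitions holding 2021 data, roughly one-tenth of the table.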

Now, take a similar table in an OLTP system. You would never retrieve ten percent of a one billion-row table in that type of application.

Therefore, the massive increase in speed seen by the data warehouse just would not be achievable in a transactional system. You are not doing the same sort of work, and the same possible improvements are just not realistic.

Therefore, in general, in your OLTP system the first bullet point is not achievable, and you won’t be applying partitioning predominantly for increased performance. Increased availability—absolutely.

Administrative ease of use—very much so. But in an OLTP system, I say you have to work hard to make sure you achieve the second point: that you do not impact the performance of your queries at all, negatively or positively. Many times, your goal is to apply partitioning without affecting query response time.

On many occasions, I’ve seen that the implementation team will see they have a medium-sized table, say of 100 million rows. Now, 100 million sounds like an incredibly large number (and five or ten years ago, it would have been, but time changes all things).

So the team decides to partition the data. But in looking at the data, there are no logical attributes that make sense for RANGE partitioning. Likewise, LIST partitioning doesn’t make sense.

Nothing pops out of this table as being the right thing to partition by. So, the team opts for hash partitioning on the primary key, which just happens to be populated by an Oracle sequence number. It looks perfect: it is unique and easy to hash, and many queries are of the form SELECT * FROM T WHERE PRIMARY_KEY = :X.

But the problem is there are many other queries against this object that are not of that form. For illustrative purposes, assume the table in question is really the ALL_OBJECTS dictionary view, and while internally many queries would be of the form WHERE OBJECT_ID = :X, the end users frequently have these requests of the application as well:

• Show me the details of SCOTT’s EMP table (where owner = :o and object_type = :t and object_name = :n).
• Show me all of the tables SCOTT owns (where owner = :o and object_type = :t).
• Show me all of the objects SCOTT owns (where owner = :o).

In support of those queries, you have an index on (OWNER,OBJECT_TYPE,OBJECT_NAME). But you also read that local indexes are more available, and you would like your system to be more available, so you implement them. You end up re-creating your table like this, with 16 hash partitions:
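A sketch of that DDL, treating the table T as a copy of ALL_OBJECTS (names are illustrative):

SQL> create table t
     partition by hash(object_id) partitions 16
     as select * from all_objects;
SQL> create index t_idx
     on t(owner, object_type, object_name) local;
SQL> exec dbms_stats.gather_table_stats(user, 'T');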

However, when you run the OWNER-based query with autotrace on and review the output, then compare it against the same query on an identical table with no partitioning implemented, you notice some troubling performance characteristics.

You might immediately jump to the (erroneous) conclusion that partitioning causes an increase in I/O: the query takes 7 consistent gets without partitioning and 36 with it. If your system had an issue with high consistent gets (logical I/Os) before, it is worse now. If it didn’t have one before, it might well get one.

The root cause? The index partitioning scheme. An explain plan for the partitioned version of this table would show the index range scan being repeated against all 16 index partitions.

This query has to look at each and every index partition. The reason is that entries for SCOTT may well be, and probably are, in each and every index partition.

The index is logically hash partitioned by OBJECT_ID; any query that uses this index and that does not also refer to the OBJECT_ID in the predicate must consider every index partition! So, what is the solution here? You should globally partition your index. Using the previous case as the example, we could choose to hash partition the index:
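A sketch of that change, reusing the illustrative names from earlier; the index is re-created as a global index, hash partitioned by OWNER into 16 partitions:

SQL> drop index t_idx;
SQL> create index t_idx
     on t(owner, object_type, object_name)
     global partition by hash(owner) partitions 16;

Now any query supplying OWNER in the predicate hashes straight to one index partition, and the work performed returns to roughly that of the nonpartitioned table.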

Note There are considerations to be taken into account with hash partitioned indexes regarding range scans, which we’ll discuss later in this section.

Partial Indexes

Oracle allows you to create either local or global indexes on a subset of partitions in a table. You may want to do this if you’ve pre-created partitions and don’t yet have data for range partitions that map to future dates—the idea being that you’ll build the index after the partitions have been loaded (at some future date).

You set up the use of a partial index by first specifying INDEXING ON|OFF for each partition in the table. In this next example, PART_1 has indexing turned on and PART_2 has indexing turned off:

$ sqlplus eoda/foo@PDB1
SQL> CREATE TABLE p_table (a int)
     PARTITION BY RANGE (a)
     (PARTITION part_1 VALUES LESS THAN(1000) INDEXING ON,
      PARTITION part_2 VALUES LESS THAN(2000) INDEXING OFF);
Table created.

Next, a partial local index is created:
SQL> create index pi1 on p_table(a) local indexing partial;
Index created.

In this scenario, the INDEXING PARTIAL clause instructs Oracle to build usable local index partitions only for the table partitions defined with INDEXING ON. In this case, one usable index partition will be created, with index entries pointing to data in the PART_1 table partition:

SQL> select a.index_name, a.partition_name, a.status
     from user_ind_partitions a, user_indexes b
     where b.table_name = 'P_TABLE'
     and a.index_name = b.index_name;
INDEX_NAME PARTITION_NAME STATUS
PI1 PART_2 UNUSABLE
PI1 PART_1 USABLE

Next, we’ll insert some test data, generate statistics, and examine the execution plan for a query that should locate data in the PART_1 partition:
SQL> insert into p_table select rownum from dual connect by level < 2000;
1999 rows created.
SQL> exec dbms_stats.gather_table_stats(user,'P_TABLE');
PL/SQL procedure successfully completed.
SQL> explain plan for select * from p_table where a = 20;
Explained.
SQL> select * from table(dbms_xplan.display(null,null,'BASIC +PARTITION'));

0 SELECT STATEMENT
1 PARTITION RANGE SINGLE
2 INDEX RANGE SCAN

As expected, the optimizer was able to generate an execution plan utilizing the index. Next, a query is issued that selects data from the partition defined with INDEXING OFF:
SQL> explain plan for select * from p_table where a = 1500;
Explained.
SQL> select * from table(dbms_xplan.display(null,null,'BASIC +PARTITION'));

0 SELECT STATEMENT
1 PARTITION RANGE SINGLE
2 TABLE ACCESS FULL

The output shows a full table scan of PART_2 was required, as there is no usable index with entries pointing at data in PART_2. We can instruct Oracle to create index entries pointing to data in PART_2 by rebuilding the index partition associated with the PART_2 partition:
SQL> alter index pi1 rebuild partition part_2;
Index altered.

Rerunning the previous select query shows that the optimizer is now utilizing the local partitioned index pointing to the PART_2 table partition:
0 SELECT STATEMENT
1 PARTITION RANGE SINGLE
2 INDEX RANGE SCAN

In this way, partial indexes allow you to disable indexing while a table partition is being loaded (increasing the loading speed); later, you can rebuild the associated index partition to make it usable.

OLTP and Global Indexes (Part 2)

This shows the distribution of data, by location, into the individual partitions. We can now review some query plans to see what we could expect performance-wise:

SQL> variable x varchar2(30);
SQL> begin
       dbms_stats.set_table_stats( user, 'EMP', numrows => 100000, numblks => 10000 );
     end;
     /
PL/SQL procedure successfully completed.
SQL> explain plan for select empno, job, loc from emp where empno = :x;
Explained.
SQL> select * from table(dbms_xplan.display(null,null,'BASIC +PARTITION'));
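The plan comes back along these lines (a sketch in the abbreviated format used elsewhere in this chapter; exact columns vary by version):

0 SELECT STATEMENT
1 TABLE ACCESS BY GLOBAL INDEX ROWID (Pstart/Pstop: ROWID/ROWID)
2 INDEX UNIQUE SCAN EMP_PK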

The plan here shows an INDEX UNIQUE SCAN of the nonpartitioned index EMP_PK that was created in support of our primary key. Then there is a TABLE ACCESS BY GLOBAL INDEX ROWID, with a PSTART and PSTOP of ROWID/ROWID, meaning that when we get the ROWID from the index, it will tell us precisely which table partition to read to get this row. This index access will be as effective as on a nonpartitioned table and perform the same amount of I/O to do so. It is just a simple, single index unique scan followed by “get this row by rowid.” Now, let’s look at one of the other global indexes, the one on JOB:

SQL> explain plan for select empno, job, loc from emp where job = :x;
Explained.
SQL> select * from table(dbms_xplan.display);

Sure enough, we see a similar effect for the INDEX RANGE SCAN. Our indexes are used and can provide high-speed OLTP access to the underlying data. If they were partitioned, they would have to be prefixed so that index partition elimination is enforced; hence, they would be scalable as well, meaning we could partition them and observe the same behavior. In a moment, we’ll look at what would happen if we used LOCAL indexes only.

Lastly, let’s look at the area of availability. The Oracle documentation claims that globally partitioned indexes make for less available data than locally partitioned indexes. I don’t fully agree with this blanket characterization. I believe that in an OLTP system they are as highly available as a locally partitioned index. Consider the following:

SQL> alter tablespace p1 offline;
SQL> alter tablespace p2 offline;
SQL> alter tablespace p3 offline;
SQL> select empno, job, loc from emp where empno = 7782;
EMPNO JOB LOC
7782 MANAGER NEW YORK

Here, even though most of the underlying data is unavailable in the table, we can still gain access to any bit of data available via that index. As long as the EMPNO we want is in a tablespace that is available, and our GLOBAL index is available, our GLOBAL index works for us. On the other hand, if we had been using the highly available local index in the preceding case, we might have been prevented from accessing the data! This is a side effect of the fact that we partitioned on LOC but needed to query by EMPNO. We would have had to probe each local index partition and would have failed on the index partitions that were not available.

Other types of queries, however, will not (and cannot) function at this point in time:
SQL> select empno, job, loc from emp where job = 'CLERK';
ERROR at line 1:
ORA-00376: file 38 cannot be read at this time
ORA-01110: data file 38: '/opt/oracle/oradata/CDB/C217E68DF48779E1E0530101007F73B9/datafile/o1_mf_p2_jcbnhfh2_.dbf'

The CLERK data is in all of the partitions, and the fact that three of the tablespaces are offline does affect us. This is unavoidable unless we had partitioned on JOB, but then we would have had the same issues with queries that needed data by LOC. Anytime you need to access the data from many different keys, you will have this issue. Oracle will give you the data whenever it can.

Note, however, that if the query can be answered from the index, avoiding the TABLE ACCESS BY ROWID, the fact that the data is unavailable is not as meaningful:
SQL> select count(*) from emp where job = 'CLERK';
COUNT(*)
4

Since Oracle didn’t need the table in this case, the fact that most of the partitions were offline doesn’t affect this query (assuming the index isn’t in one of the offline tablespaces of course). As this type of optimization (i.e., answer the query using just the index) is common in an OLTP system, there will be many applications that are not affected by the data that is offline. All we need to do now is make the offline data available as fast as possible (restore it and recover it).

OLTP and Global Indexes (Part 1)

An OLTP system is characterized by the frequent occurrence of many small read and write transactions. In general, fast access to the row (or rows) you need is paramount. Data integrity is vital. Availability is also very important. Global indexes make sense in many cases in OLTP systems. Table data can be partitioned by only one key—one set of columns. However, you may need to access the data in many different ways. You might partition EMPLOYEE data by LOCATION in the table, but you still need fast access to EMPLOYEE data by

• DEPARTMENT: Departments are geographically dispersed. There is no relationship between a department and a location.
• EMPLOYEE_ID: While an employee ID will determine a location, you don’t want to have to search by EMPLOYEE_ID and LOCATION; hence, partition elimination cannot take place on the index partitions. Also, EMPLOYEE_ID by itself must be unique.
• JOB_TITLE: There is no relationship between JOB_TITLE and LOCATION. All JOB_TITLE values may appear in any LOCATION.

There is a need to access the EMPLOYEE data by many different keys in different places in the application, and speed is paramount. In a data warehouse, we might just use locally partitioned indexes on these keys and use parallel index range scans to collect a large amount of data fast. In these cases, we don’t necessarily need to use index partition elimination. In an OLTP system, however, we do need to use it. Parallel query is not appropriate for these systems; we need to provide the indexes appropriately. Therefore, we will need to make use of global indexes on certain fields.

The following are the goals we need to meet:
• Fast access
• Data integrity
• Availability

Global indexes can help us accomplish these goals in an OLTP system. We will probably not be doing sliding windows, auditing aside for a moment. We will not be splitting partitions (unless we have a scheduled downtime), we will not be moving data, and so on. The operations we perform in a data warehouse are not done on a live OLTP system in general.

Here is a small example that shows how we can achieve the three goals just listed with global indexes. I am going to use simple, single partition global indexes, but the results would not be different with global indexes in multiple partitions (except for the fact that availability and manageability would increase as we added index partitions). We start by creating tablespaces P1, P2, P3, and P4, then create a table that is range partitioned by location, LOC, according to our rules, which place all LOC values less than 'C' into partition P1, those less than 'D' into partition P2, and so on:

$ sqlplus eoda/foo@PDB1
SQL> create tablespace p1 datafile size 1m autoextend on next 1m; Tablespace created.
SQL> create tablespace p2 datafile size 1m autoextend on next 1m; Tablespace created.
SQL> create tablespace p3 datafile size 1m autoextend on next 1m; Tablespace created.
SQL> create tablespace p4 datafile size 1m autoextend on next 1m; Tablespace created.
SQL> create table emp
       (EMPNO NUMBER(4) NOT NULL,
        ENAME VARCHAR2(10),
        JOB VARCHAR2(9),
        MGR NUMBER(4),
        HIREDATE DATE,
        SAL NUMBER(7,2),
        COMM NUMBER(7,2),
        DEPTNO NUMBER(2) NOT NULL,
        LOC VARCHAR2(13) NOT NULL)
     partition by range(loc)
       (partition p1 values less than('C') tablespace p1,
        partition p2 values less than('D') tablespace p2,
        partition p3 values less than('N') tablespace p3,
        partition p4 values less than('Z') tablespace p4);
Table created.

We alter the table to add a constraint on the primary key column:
SQL> alter table emp add constraint emp_pk primary key(empno); Table altered.

A side effect of this is that there exists a unique index on the EMPNO column. This shows we can support and enforce data integrity, one of our goals. Finally, we create two more global indexes on DEPTNO and JOB to facilitate accessing records quickly by those attributes:

SQL> create index emp_job_idx on emp(job) GLOBAL; Index created.
SQL> create index emp_dept_idx on emp(deptno) GLOBAL; Index created.
SQL> insert into emp
     select e.*, d.loc
     from scott.emp e, scott.dept d
     where e.deptno = d.deptno;
14 rows created.

Let’s see what is in each partition:
SQL> break on pname skip 1
SQL> select 'p1' pname, empno, job, loc from emp partition(p1)
     union all
     select 'p2' pname, empno, job, loc from emp partition(p2)
     union all
     select 'p3' pname, empno, job, loc from emp partition(p3)
     union all
     select 'p4' pname, empno, job, loc from emp partition(p4);

Asynchronous Global Index Maintenance

As shown in the prior section, you can maintain global indexes while dropping or truncating partitions via the UPDATE GLOBAL INDEXES clause. However, as shown previously, such operations come at a cost in terms of time and resource consumption.

When dropping or truncating table partitions, Oracle postpones the removal of the global index entries associated with the dropped or truncated partitions. This is known as asynchronous global index maintenance. Oracle postpones the maintenance of the global index to a future time while keeping the global index usable.

The idea is that this improves the performance of dropping/truncating partitions while keeping any global indexes in a usable state. The actual cleanup of the index entries is done later (asynchronously), either by the DBA or by an automatically scheduled Oracle job. It’s not that less work is being done; rather, the cleanup of index entries is decoupled from the DROP/TRUNCATE statement.

A small example will demonstrate asynchronous global index maintenance. To set this up, we create a table, populate it with test data, and create a global index:
$ sqlplus eoda/foo@PDB1
SQL> drop table partitioned;
SQL> CREATE TABLE partitioned
       (timestamp date,
        id int)
     PARTITION BY RANGE (timestamp)
       (PARTITION fy_2020 VALUES LESS THAN (to_date('01-jan-2021','dd-mon-yyyy')),
        PARTITION fy_2021 VALUES LESS THAN (to_date('01-jan-2022','dd-mon-yyyy')));
SQL> insert into partitioned partition(fy_2020)
     select to_date('31-dec-2020','dd-mon-yyyy')-mod(rownum,364), rownum
     from dual connect by level < 100000;
99999 rows created.
SQL> insert into partitioned partition(fy_2021)
     select to_date('31-dec-2021','dd-mon-yyyy')-mod(rownum,364), rownum
     from dual connect by level < 100000;
99999 rows created.
SQL> create index partitioned_idx_global on partitioned(timestamp) GLOBAL;
Index created.

Next, we’ll run a query to retrieve the current values of redo size and db block gets statistics for the current session:
SQL> col r1 new_value r2
SQL> col b1 new_value b2
SQL> select * from
     (select b.value r1
        from v$statname a, v$mystat b
       where a.statistic# = b.statistic#
         and a.name = 'redo size'),
     (select b.value b1
        from v$statname a, v$mystat b
       where a.statistic# = b.statistic#
         and a.name = 'db block gets');
R1 B1
56928036 80829

Next, a partition is dropped with the UPDATE GLOBAL INDEXES clause specified:
SQL> alter table partitioned drop partition fy_2020 update global indexes; Table altered.

Now we’ll calculate the amount of redo generated and the number of current blocks accessed:
SQL> select * from
     (select b.value - &r2 redo_gen
        from v$statname a, v$mystat b
       where a.statistic# = b.statistic#
         and a.name = 'redo size'),
     (select b.value - &b2 db_block_gets
        from v$statname a, v$mystat b
       where a.statistic# = b.statistic#
         and a.name = 'db block gets');
old   2: (select b.value - &r2 redo_gen
new   2: (select b.value - 4816712 redo_gen
old   6: (select b.value - &b2 db_block_gets
new   6: (select b.value - 4512 db_block_gets
REDO_GEN DB_BLOCK_GETS
16864 103

Only a small amount of redo is generated, and a small number of blocks are accessed. The reason is that Oracle doesn’t immediately perform the index maintenance of removing the index entries for the dropped partition. Rather, these entries are marked as orphaned and will be cleaned up by Oracle later. The existence of orphaned entries can be verified via the following:
SQL> select index_name, orphaned_entries, status from user_indexes where table_name='PARTITIONED';
INDEX_NAME ORP STATUS
PARTITIONED_IDX_GLOBAL YES VALID

How do the orphaned entries get cleaned up? Oracle has an automatically scheduled job, PMO_DEFERRED_GIDX_MAINT_JOB, which runs in a nightly maintenance window:
SQL> select job_name from dba_scheduler_jobs where job_name like 'PMO%';
JOB_NAME
PMO_DEFERRED_GIDX_MAINT_JOB

If you don’t want to wait for that job, you can manually clean up the entries yourself:
SQL> exec dbms_part.cleanup_gidx;
PL/SQL procedure successfully completed.

Now checking for orphaned rows shows there are none:
SQL> select index_name, orphaned_entries, status from user_indexes where table_name='PARTITIONED';
INDEX_NAME ORP STATUS
PARTITIONED_IDX_GLOBAL NO VALID

In this way, you can perform operations such as dropping and truncating partitions and still leave your global indexes in a usable state without the immediate overhead of cleaning up the index entries as part of the drop/truncate operation.

Tip See Oracle Support note 1482264.1 for further details on asynchronous global index maintenance.