Sliding Windows and Indexes (Part 1)

The following example implements a classic sliding window of data. In many implementations, data is added to a warehouse over time and the oldest data is aged out. Many times, this data is range partitioned by a date attribute, so that the oldest data is stored together in a single partition, and the newly loaded data is likewise stored together in a new partition. The monthly load process involves the following:

•\ Detaching the old data: The oldest partition is either dropped or exchanged with an empty table (turning the oldest partition into a table) to permit archiving of the old data.
•\ Loading and indexing of the new data: The new data is loaded into a work table and indexed and validated.
•\ Attaching the new data: Once the new data is loaded and processed, the table it is in is exchanged with an empty partition in the partitioned table, turning this newly loaded data in a table into a partition of the larger partitioned table.

This process is repeated every month, or however often the load process is performed; it could be every day or every week. We will implement this very typical process in this section to show the impact of global partitioned indexes and demonstrate the options we have during partition operations to increase availability, allowing us to implement a sliding window of data and maintain continuous availability of data.

We’ll process yearly data in this example and have fiscal years 2020 and 2021 loaded up. The table will be partitioned by the TIMESTAMP column, and it will have two indexes created on it—one is a locally partitioned index on the ID column, and the other is a global index (nonpartitioned, in this case) on the TIMESTAMP column:

$ sqlplus eoda/foo@PDB1
SQL> drop table partitioned;
SQL> CREATE TABLE partitioned
     ( timestamp date,
       id        int
     )
     PARTITION BY RANGE (timestamp)
     ( PARTITION fy_2020 VALUES LESS THAN
         ( to_date('01-jan-2021','dd-mon-yyyy') ),
       PARTITION fy_2021 VALUES LESS THAN
         ( to_date('01-jan-2022','dd-mon-yyyy') )
     );
Table created.
SQL> insert into partitioned partition(fy_2020)
     select to_date('31-dec-2020','dd-mon-yyyy')-mod(rownum,360), rownum
       from dual connect by level <= 70000;
70000 rows created.
SQL> insert into partitioned partition(fy_2021)
     select to_date('31-dec-2021','dd-mon-yyyy')-mod(rownum,360), rownum
       from dual connect by level <= 70000;
70000 rows created.
SQL> create index partitioned_idx_local on partitioned(id) LOCAL;
Index created.
SQL> create index partitioned_idx_global on partitioned(timestamp) GLOBAL;
Index created.

This sets up our warehouse table. The data is partitioned by fiscal year, and we have the last two years’ worth of data online. This table has two indexes: one is LOCAL and the other is GLOBAL. Now it’s the end of the year and we would like to do the following:

\ 1.\ Remove the oldest fiscal year data: We do not want to lose this data forever; we just want to age it out and archive it.
\ 2.\ Add the newest fiscal year data: It will take a while to load it, transform it, index it, and so on. We would like to do this work without impacting the availability of the current data, if at all possible.

The first step is to set up an empty table for fiscal year 2020 that looks just like the partitioned table. We’ll use this table to exchange with the FY_2020 partition in the partitioned table, turning that partition into a table and, in turn, leaving the partition empty. The net effect is that, after the exchange, the oldest data will in effect have been removed from the partitioned table:
SQL> create table fy_2020 ( timestamp date, id int );
Table created.
SQL> create index fy_2020_idx on fy_2020(id);
Index created.

We’ll do the same to the new data to be loaded. We’ll create and load a table that structurally looks like the existing partitioned table (but that is not itself partitioned):

SQL> create table fy_2022 ( timestamp date, id int );
Table created.
SQL> insert into fy_2022
     select to_date('31-dec-2022','dd-mon-yyyy')-mod(rownum,360), rownum
       from dual connect by level <= 70000;
70000 rows created.
SQL> create index fy_2022_idx on fy_2022(id) nologging;
Index created.

Sliding Windows and Indexes (Part 2)

We’ll turn the current full partition into an empty partition and create a full table with the FY_2020 data in it. We’ve also completed all of the work necessary to have the FY_2022 data ready to go: verifying the data, transforming it, whatever complex tasks we need to undertake to get it ready. Now we’re ready to update the live data using an exchange partition:

SQL> alter table partitioned
     exchange partition fy_2020
     with table fy_2020
     including indexes
     without validation;
Table altered.
SQL> alter table partitioned drop partition fy_2020;
Table altered.

This is all we need to do to age the old data out. We turned the partition into a full table and the empty table into a partition. This was a simple data dictionary update. No large amount of I/O took place—it just happened. We can now export that FY_2020 table (perhaps using a transportable tablespace) out of our database for archival purposes. We could reattach it quickly if we ever needed to.
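If the FY_2020 data lives in its own tablespace, the archive step can be as simple as making that tablespace read-only and exporting its metadata with Data Pump. A minimal sketch, assuming a tablespace named FY_2020_DATA and a directory object named DPUMP_DIR (both names are made up for illustration):

SQL> alter tablespace fy_2020_data read only;
Tablespace altered.

$ expdp eoda/foo@PDB1 directory=dpump_dir dumpfile=fy_2020.dmp \
    transport_tablespaces=fy_2020_data

The datafiles plus the dump file can then be copied to the archive destination and plugged back in with impdp if the data is ever needed again.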

Next, we want to slide in the new data:
SQL> alter table partitioned
     add partition fy_2022
     values less than ( to_date('01-jan-2023','dd-mon-yyyy') );
Table altered.
SQL> alter table partitioned
     exchange partition fy_2022
     with table fy_2022
     including indexes
     without validation;
Table altered.

Again, this was instantaneous; it was accomplished via simple data dictionary updates—the WITHOUT VALIDATION clause allowed us to accomplish that. When you use that clause, the database will trust that the data you are placing into that partition is, in fact, valid for that partition. Adding the empty partition took very little time to process.
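For contrast, had we omitted the clause (or specified the default WITH VALIDATION), Oracle would have scanned every row of the incoming table to verify it belongs in the partition before performing the swap, as sketched here:

SQL> alter table partitioned
     exchange partition fy_2022
     with table fy_2022
     including indexes
     with validation;

On a large load, that validation pass is exactly the kind of I/O the exchange technique is designed to avoid, which is why we verify the staging data as part of the load process instead.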

Then, we exchange the newly created empty partition with the full table, turning the full table’s data into a partition, and that operation is performed quickly as well. The new data is online. Looking at our indexes, however, we’ll find the following:
SQL> select index_name, status from user_indexes;
INDEX_NAME                     STATUS
------------------------------ --------
PARTITIONED_IDX_LOCAL          N/A
PARTITIONED_IDX_GLOBAL         UNUSABLE
FY_2020_IDX                    VALID
FY_2022_IDX                    VALID

The global index is, of course, unusable after this operation. Since each index partition can point to any table partition, and we just took away a partition and added a partition, that index is now invalid: it has entries that point into the partition we dropped and no entries that point into the partition we just added. Any query that tried to make use of this index would fail outright or, if unusable indexes are being skipped, would run without the benefit of the index and suffer accordingly:

SQL> select /*+ index( partitioned PARTITIONED_IDX_GLOBAL ) */ count(*)
       from partitioned
      where timestamp between to_date( '01-mar-2022', 'dd-mon-yyyy' )
                          and to_date( '31-mar-2022', 'dd-mon-yyyy' );
select /*+ index( partitioned PARTITIONED_IDX_GLOBAL ) */ count(*)
ERROR at line 1:
ORA-01502: index 'EODA.PARTITIONED_IDX_GLOBAL' or partition of such index is
in unusable state

SQL> explain plan for
     select count(*)
       from partitioned
      where timestamp between to_date( '01-mar-2022', 'dd-mon-yyyy' )
                          and to_date( '31-mar-2022', 'dd-mon-yyyy' );

So, our choices after performing this partition operation with global indexes are

•\ Ensure that SKIP_UNUSABLE_INDEXES=TRUE (the default is TRUE), and then Oracle will not use this unusable index. But then we lose the performance the index was giving us.
•\ Rebuild this index to make it usable again (a one-statement example follows this list).
•\ Use the UPDATE GLOBAL INDEXES clause when performing partition maintenance (the focus of the next section in this chapter).
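The rebuild in the second option is a single statement. A minimal sketch (the ONLINE keyword is optional and, where your edition and version support it, allows most DML against the table to continue during the rebuild):

SQL> alter index partitioned_idx_global rebuild online;
Index altered.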

The sliding window process, which so far has resulted in virtually no downtime, will now take a very long time to complete while we rebuild the global index. Queries that relied on this index will be negatively affected during that time: either they will not run at all or they will run without the benefit of the index. All of the data must be scanned and the entire index reconstructed from the table data. If the table is many hundreds of gigabytes in size, this will take considerable resources.

“Live” Global Index Maintenance

Oracle has the ability to maintain the global indexes during the partition operation using the UPDATE GLOBAL INDEXES clause. This means that as you drop a partition, split a partition, or perform whatever operation you need on a partition, Oracle will perform the necessary modifications to the global index to keep it up to date.

Since most partition operations will cause this global index invalidation to occur, this feature can be a boon to systems that need to provide continual access to the data. You trade the raw speed of the partition operation, with its associated window of unavailability immediately afterward while you rebuild indexes, for a slower partition operation coupled with 100 percent data availability.

In short, if you have a data warehouse that cannot have downtime, but must support these common data warehouse techniques of sliding data in and out, then this feature is for you—but you must understand the implications.

Revisiting our previous example, if our partition operations had used the UPDATE GLOBAL INDEXES clause when relevant (in this example, it would not be needed on the ADD PARTITION statement since the newly added partition would not have any rows in it), we would have discovered the indexes to be perfectly valid and usable both during and after the operation:

SQL> alter table partitioned
     exchange partition fy_2020
     with table fy_2020
     including indexes
     without validation
     UPDATE GLOBAL INDEXES;
Table altered.

Note in the following output that the N/A status for the PARTITIONED_IDX_LOCAL index simply means status is tracked on that index’s partitions, not on the index itself. It doesn’t make sense to say a locally partitioned index as a whole is valid or not; it is just a container that logically holds the index partitions themselves:

SQL> select index_name, status from user_indexes;
INDEX_NAME                     STATUS
------------------------------ --------
PARTITIONED_IDX_LOCAL          N/A
PARTITIONED_IDX_GLOBAL         VALID
FY_2020_IDX                    VALID
FY_2022_IDX                    VALID

But there is a trade-off: we are performing the logical equivalent of DELETE and INSERT operations on the global index structures. When we drop a partition, we have to delete all of the global index entries that might be pointing to that partition.

When we did the exchange of a table with a partition, we had to delete all of the global index entries pointing to the original data and then insert all of the new ones that we just slid in there. So the amount of work performed by the ALTER commands was significantly increased.

You should expect the approach without index maintenance to consume fewer database resources and therefore complete faster, but to incur a measurable period of downtime.

The second approach, involving maintaining the indexes, will consume more resources and perhaps take longer overall, but will not incur downtime. As far as the end users are concerned, their ability to work never ceased. They might have been processing a bit slower (since we were competing with them for resources), but they were still processing, and they never stopped.

The index rebuild approach will almost certainly run faster, considering both the elapsed time and the CPU time. This fact has caused many a DBA to pause and say, “Hey, I don’t want to use UPDATE GLOBAL INDEXES—it’s slower.”

That is too simplistic of a view, however. What you need to remember is that while the operations overall took longer, processing on your system was not necessarily interrupted. Sure, you as the DBA might be looking at your screen for a longer period of time, but the really important work that takes place on your system was still taking place.

You need to see if this trade-off makes sense for you. If you have an eight-hour maintenance window overnight in which to load new data, then by all means, use the rebuild approach if that makes sense. However, if you have a mandate to be available continuously, then the ability to maintain the global indexes will be crucial.

One more thing to consider is the redo generated by each approach. You will find that the UPDATE GLOBAL INDEXES generates considerably more redo (due to the index maintenance), and you should expect that to only go up as you add more and more global indexes to the table.
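If you want to quantify this on your own system, one simple approach is to snapshot your session’s redo size statistic immediately before and after each variation of the partition operation and compare the deltas. A sketch of the measuring query:

SQL> select b.name, a.value
       from v$mystat a, v$statname b
      where a.statistic# = b.statistic#
        and b.name = 'redo size';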

The redo generated by the UPDATE GLOBAL INDEXES processing is unavoidable and cannot be turned off via NOLOGGING, since the maintenance of the global indexes is not a complete rebuild of their structure but more of an incremental maintenance.

Additionally, since you are maintaining the live index structure, you must generate undo for that—in the event the partition operation fails, you must be prepared to put the index back the way it was. And remember, undo is protected by redo itself, so some of the redo you see generated is from the index updates and some is from the rollback. Add another global index or two and you would reasonably expect these numbers to increase.

So, UPDATE GLOBAL INDEXES is an option that allows you to trade off availability for resource consumption. If you need to provide continuous availability, it’s the option for you. But you have to understand the ramifications and size other components of your system appropriately. Specifically, many data warehouses have been crafted over time to use bulk direct path operations, bypassing undo generation and, when permitted, redo generation as well. Using UPDATE GLOBAL INDEXES cannot bypass either of those two elements. You need to examine the rules you use to size your redo and undo needs before using this feature, so you can assure yourself it can work on your system.

Hash Partition Using Powers of Two

I mentioned earlier that the number of partitions should be a power of two. This is easily observed to be true. To demonstrate, we’ll set up a stored procedure to automate the creation of a hash partitioned table with N partitions (N will be a parameter). This procedure will construct a dynamic query to retrieve the counts of rows by partition and then display the counts and a simple histogram of the counts by partition. Lastly, it will open this query and let us see the results. This procedure starts with the hash table creation. We will use a table named T:

SQL> create or replace procedure hash_proc
     ( p_nhash  in  number,
       p_cursor out sys_refcursor )
     authid current_user
     as
         l_text     long;
         l_template long :=
             'select $POS$ oc, ''p$POS$'' pname, count(*) cnt ' ||
             'from t partition ( $PNAME$ ) union all ';
         table_or_view_does_not_exist exception;
         pragma exception_init( table_or_view_does_not_exist, -942 );
     begin
         begin
             execute immediate 'drop table t';
         exception
             when table_or_view_does_not_exist then null;
         end;
         execute immediate '
         CREATE TABLE t ( id )
         partition by hash(id)
         partitions ' || p_nhash || '
         as
         select rownum
           from all_objects';

Next, we will dynamically construct a query to retrieve the count of rows by partition. It does this using the template query defined earlier. For each partition, we’ll gather the count using the partition-extended table name and union all of the counts together:
         for x in ( select partition_name pname, partition_position pos
                      from user_tab_partitions
                     where table_name = 'T'
                     order by partition_position )
         loop
             l_text := l_text ||
                       replace( replace( l_template,
                                '$POS$',   x.pos ),
                                '$PNAME$', x.pname );
         end loop;

Now, we’ll take that query and select out the partition position (PNAME) and the count of rows in that partition (CNT). Using RPAD, we’ll construct a rather rudimentary but effective histogram:

         open p_cursor for
            'select pname, cnt,
                    substr( rpad(''*'',30*round( cnt/max(cnt)over(),2),''*''),1,30) hg
               from (' || substr( l_text, 1, length(l_text)-11 ) || ')
              order by oc';
     end;
     /

If we run this with an input of 4, for four hash partitions, we would expect to see output similar to the following:
SQL> variable x refcursor
SQL> set autoprint on
SQL> exec hash_proc( 4, :x );
PL/SQL procedure successfully completed.

The simple histogram depicted shows a nice, even distribution of data over each of the four partitions. Each has close to the same number of rows in it. However, if we simply go from four to five hash partitions, we’ll see the following:
SQL> exec hash_proc( 5, :x );
PL/SQL procedure successfully completed.
PN CNT HG

This histogram points out that the first and last partitions have just half as many rows as the interior partitions. The data is not very evenly distributed at all. We’ll see the trend continue for six and seven hash partitions:
SQL> exec hash_proc( 6, :x );
PL/SQL procedure successfully completed.
PN CNT HG

SQL> exec hash_proc( 7, :x );
PL/SQL procedure successfully completed.
PN CNT HG

As soon as we get back to a number of hash partitions that is a power of two, we achieve the goal of even distribution once again:
SQL> exec hash_proc( 8, :x );
PL/SQL procedure successfully completed.
PN CNT HG

If we continue this experiment up to 16 partitions, we would see the same effects for the 9th through the 15th partitions—a skewing of the data to the interior partitions, away from the edges—and then upon hitting the 16th partition, you would see a flattening out again. The same would be true again up to 32 partitions, and then 64, and so on. This example just points out the importance of using a power of two as the number of hash partitions.
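As an aside, you don’t need the dynamic SQL procedure to check a distribution after the fact; once statistics are gathered, the row counts are available straight from the data dictionary, as in this sketch:

SQL> exec dbms_stats.gather_table_stats( user, 'T' );
SQL> select partition_name, num_rows
       from user_tab_partitions
      where table_name = 'T'
      order by partition_position;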

Global Indexes

Global indexes are partitioned using a scheme that is different from that used in the underlying table. The table might be partitioned by a TIMESTAMP column into ten partitions, and a global index on that table could be partitioned into five partitions by the REGION column. Unlike local indexes, there is only one class of global index, and that is a prefixed global index. There is no support for a global index whose index key does not begin with the partitioning key for that index. That implies that whatever attribute(s) you use to partition the index will be on the leading edge of the index key itself.

Building on our previous example, here is a quick example of the use of a global index. It shows that a global partitioned index can be used to enforce uniqueness for a primary key, so you can have partitioned indexes that enforce uniqueness, but do not include the partition key of the table. The following example creates a table partitioned by TIMESTAMP that has an index partitioned by ID:
$ sqlplus eoda/foo@PDB1
SQL> drop table partitioned;
SQL> CREATE TABLE partitioned
     ( timestamp date,
       id        int
     )
     PARTITION BY RANGE (timestamp)
     ( PARTITION part_1 VALUES LESS THAN
         ( to_date('01-jan-2021','dd-mon-yyyy') ),
       PARTITION part_2 VALUES LESS THAN
         ( to_date('01-jan-2022','dd-mon-yyyy') )
     );
Table created.
SQL> create index partitioned_index
     on partitioned(id)
     GLOBAL
     partition by range(id)
     ( partition part_1 values less than(1000),
       partition part_2 values less than (MAXVALUE)
     );
Index created.

Note the use of MAXVALUE in this index. MAXVALUE can be used in any range partitioned table as well as in the index. It represents an infinite upper bound on the range. In our examples so far, we’ve used hard upper bounds on the ranges (values less than ). However, a global index has a requirement that the highest partition (the last partition) must have a partition bound whose value is MAXVALUE. This ensures that all rows in the underlying table can be placed in the index.
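You can see this requirement enforced by simply leaving the MAXVALUE partition out (the index name here is made up for illustration):

SQL> create index partitioned_index3
     on partitioned(id)
     GLOBAL
     partition by range(id)
     ( partition part_1 values less than(1000) );

This statement fails with ORA-14021 (MAXVALUE must be specified for all columns), since rows with ID values of 1000 or more would have nowhere to go in the index.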

Now, completing this example, we’ll add our primary key to the table:
SQL> alter table partitioned
     add constraint partitioned_pk
     primary key(id);
Table altered.

It is not evident from this code that Oracle is using the index we created to enforce the primary key (it is to me because I know that Oracle is using it), so we can prove it by simply trying to drop that index:
SQL> drop index partitioned_index;
drop index partitioned_index
ERROR at line 1:
ORA-02429: cannot drop index used for enforcement of unique/primary key

To show that Oracle will not allow us to create a nonprefixed global index, we only need try the following:
SQL> create index partitioned_index2
     on partitioned(timestamp,id)
     GLOBAL
     partition by range(id)
     ( partition part_1 values less than(1000),
       partition part_2 values less than (MAXVALUE)
     );
partition by range(id)
ERROR at line 4:
ORA-14038: GLOBAL partitioned index must be prefixed

The error message is pretty clear. The global index must be prefixed. So, when would you use a global index? We’ll take a look at two system types, data warehouse and OLTP, and see when they might apply.

Data Warehousing and Global Indexes

Many data warehouses implement a sliding window approach to managing data—that is, drop the oldest partition of a table and add a new partition for the newly loaded data. In the following sections, we’ll take a look at what is meant by a sliding window of data and the potential impact of a global index on it.

Composite Partitioning

Lastly, we’ll look at some examples of composite partitioning, which is a mixture of range, hash, and/or list. Table 13-1 lists the various ways you can mix and match types of partitioning. In other words, the table shows there are nine different combinations of composite partitioning currently available.

Table 13-1.  Oracle Database Supported Composite Partitioning Schemes by Version

It is interesting to note that when you use composite partitioning, there will be no partition segments, only subpartition segments. The partitions themselves do not have segments (much like a partitioned table does not have a segment of its own); the data is physically stored in subpartition segments, and the partition becomes a logical container that points to the actual subpartitions.

In our example, we’ll look at a range-hash composite partitioning. Here, we are using a different set of columns for the range partition from those used for the hash partition. This is not mandatory; we could use the same set of columns for both:
$ sqlplus eoda/foo@PDB1
SQL> CREATE TABLE composite_example
     ( range_key_column date,
       hash_key_column  int,
       data             varchar2(20)
     )
     PARTITION BY RANGE (range_key_column)
     subpartition by hash(hash_key_column) subpartitions 2
     (
     PARTITION part_1
         VALUES LESS THAN(to_date('01/01/2021','dd/mm/yyyy'))
         ( subpartition part_1_sub_1,
           subpartition part_1_sub_2
         ),
     PARTITION part_2
         VALUES LESS THAN(to_date('01/01/2022','dd/mm/yyyy'))
         ( subpartition part_2_sub_1,
           subpartition part_2_sub_2
         )
     );
Table created.
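We can confirm the earlier point about segments by querying USER_SEGMENTS for this table, as sketched here (with deferred segment creation, the default in many configurations, you may need to insert a row first before any segments appear):

SQL> select segment_name, partition_name, segment_type
       from user_segments
      where segment_name = 'COMPOSITE_EXAMPLE';

Only TABLE SUBPARTITION segments will be listed; PART_1 and PART_2 themselves own no storage.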

In range-hash composite partitioning, Oracle will first apply the range partitioning rules to figure out which range the data falls into.
Then it will apply the hash function to decide into which physical subpartition the data should finally be placed. This process is described in Figure 13-4.

Figure 13-4.  Range-hash composite partition example

So, composite partitioning gives you the ability to break your data up by range and, when a given range is considered too large or further partition elimination could be useful, to break it up further by hash or list. It is interesting to note that each range partition need not have the same number of subpartitions; for example, suppose you were range partitioning on a date column in support of data purging (to remove all old data rapidly and easily).

Suppose that in the year 2020 and before, you had equal amounts of data for odd code numbers in the CODE_KEY_COLUMN and for even code numbers. But after that, you knew the number of records associated with the odd code numbers would be more than double that of the even ones, and you wanted to have more subpartitions for the odd code values.

You can achieve that rather easily just by defining more subpartitions:
SQL> CREATE TABLE composite_range_list_example
     ( range_key_column date,
       code_key_column  int,
       data             varchar2(20)
     )
     PARTITION BY RANGE (range_key_column)
     subpartition by list(code_key_column)
     (
     PARTITION part_1
         VALUES LESS THAN(to_date('01/01/2021','dd/mm/yyyy'))
         ( subpartition part_1_sub_1 values( 1, 3, 5, 7 ),
           subpartition part_1_sub_2 values( 2, 4, 6, 8 )
         ),
     PARTITION part_2
         VALUES LESS THAN(to_date('01/01/2022','dd/mm/yyyy'))
         ( subpartition part_2_sub_1 values ( 1, 3 ),
           subpartition part_2_sub_2 values ( 5, 7 ),
           subpartition part_2_sub_3 values ( 2, 4, 6, 8 )
         )
     );
Table created.

Here, you end up with five subpartitions altogether: two for partition PART_1 and three for partition PART_2.
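A quick dictionary query verifies the layout, as sketched here:

SQL> select partition_name, subpartition_name
       from user_tab_subpartitions
      where table_name = 'COMPOSITE_RANGE_LIST_EXAMPLE'
      order by partition_name, subpartition_name;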

Local Indexes and Unique Constraints

To enforce uniqueness—and that includes a UNIQUE constraint or PRIMARY KEY constraints—your partitioning key must be included in the constraint itself if you want to use a local index to enforce the constraint.
This is the largest limitation of a local index, in my opinion. Oracle enforces uniqueness only within an index partition—never across partitions. What this implies, for example, is that you cannot range partition on a TIMESTAMP field and have a primary key on the ID that is enforced using a locally partitioned index. Oracle will instead utilize a global index to enforce uniqueness.
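The most direct way to see this rule is to try it. Assuming the TIMESTAMP range partitioned PARTITIONED table we are about to create (without its primary key), an attempt at a unique LOCAL index on ID alone is rejected outright, as in this sketch:

SQL> create unique index partitioned_uk on partitioned(id) LOCAL;

This raises ORA-14039 (partitioning columns must form a subset of key columns of a UNIQUE index).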

In the next example, we will create a range partitioned table that is partitioned by a column named TIMESTAMP but has a primary key on the ID column. We can do that by executing the following CREATE TABLE statement in a schema that owns no other objects, so we can easily see exactly what objects are created by looking at every segment this user owns:
SQL> CREATE TABLE partitioned
     ( timestamp date,
       id        int,
       constraint partitioned_pk primary key(id)
     )
     PARTITION BY RANGE (timestamp)
     ( PARTITION part_1 VALUES LESS THAN
         ( to_date('01/01/2021','dd/mm/yyyy') ),
       PARTITION part_2 VALUES LESS THAN
         ( to_date('01/01/2022','dd/mm/yyyy') )
     );
Table created.

And inserting some data so that we get segments created:
SQL> insert into partitioned values(to_date('01/01/2020','dd/mm/yyyy'),1);
1 row created.
SQL> insert into partitioned values(to_date('01/01/2021','dd/mm/yyyy'),2);
1 row created.

Assuming we run this in a schema with no other objects created, we’ll see the following:
SQL> select segment_name, partition_name, segment_type from user_segments;
SEGMENT_NAME    PARTITION_NAME  SEGMENT_TYPE
--------------- --------------- ------------------
PARTITIONED     PART_1          TABLE PARTITION
PARTITIONED     PART_2          TABLE PARTITION
PARTITIONED_PK                  INDEX

The PARTITIONED_PK index is not even partitioned, let alone locally partitioned, and as we’ll see, it cannot be locally partitioned. Even if we try to trick Oracle by realizing that a primary key can be enforced by a nonunique index as well as a unique index, we’ll find that this approach will not work either:
SQL> drop table partitioned;
SQL> CREATE TABLE partitioned
     ( timestamp date,
       id        int
     )
     PARTITION BY RANGE (timestamp)
     ( PARTITION part_1 VALUES LESS THAN
         ( to_date('01-jan-2021','dd-mon-yyyy') ),
       PARTITION part_2 VALUES LESS THAN
         ( to_date('01-jan-2022','dd-mon-yyyy') )
     );
Table created.
SQL> create index partitioned_idx on partitioned(id) local;
Index created.

And inserting some data so that we get segments created:
SQL> insert into partitioned values(to_date('01/01/2020','dd/mm/yyyy'),1);
1 row created.
SQL> insert into partitioned values(to_date('01/01/2021','dd/mm/yyyy'),2);
1 row created.
SQL> select segment_name, partition_name, segment_type from user_segments;
SEGMENT_NAME    PARTITION_NAME  SEGMENT_TYPE
--------------- --------------- ------------------
PARTITIONED     PART_1          TABLE PARTITION
PARTITIONED     PART_2          TABLE PARTITION
PARTITIONED_IDX PART_1          INDEX PARTITION
PARTITIONED_IDX PART_2          INDEX PARTITION
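The failure surfaces when we try to add the primary key constraint itself. A sketch of that attempt (the statement is reconstructed from the discussion that follows):

SQL> alter table partitioned add constraint partitioned_pk primary key(id);
alter table partitioned add constraint partitioned_pk primary key(id)
ERROR at line 1:
ORA-01408: such column list already indexed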

Here, Oracle attempted to create a global index on ID but found that it could not, since an index on that column already exists. The preceding statements would have worked if the index we created had not been partitioned, as Oracle would have used that index to enforce the constraint.

The reasons why uniqueness cannot be enforced, unless the partition key is part of the constraint, are twofold. First, if Oracle allowed this, it would void most of the advantages of partitions. Availability and scalability would be lost, as each and every partition would always have to be available and scanned to do any inserts and updates.
The more partitions you had, the less available the data would be. The more partitions you had, the more index partitions you would have to scan, and the less scalable partitions would become. Instead of providing availability and scalability, doing this would actually decrease both.

Additionally, Oracle would have to effectively serialize inserts and updates to this table at the transaction level. This is because if we add ID=1 to PART_1, Oracle would have to somehow prevent anyone else from adding ID=1 to PART_2. The only way to do this would be to prevent others from modifying index partition PART_2, since there isn’t anything to really lock in that partition.

In an OLTP system, unique constraints must be system enforced (i.e., enforced by Oracle) to ensure the integrity of data. This implies that the logical model of your application will have an impact on the physical design. Uniqueness constraints will either drive the underlying table partitioning scheme, driving the choice of the partition keys, or point you toward the use of global indexes instead. We’ll take a look at global indexes in more depth next.

Partitioning Indexes

Indexes, like tables, may be partitioned. There are two possible methods to partition indexes:

•\ Equipartition the index with the table: This is also known as a local index. For every table partition, there will be an index partition that indexes just that table partition. All of the entries in a given index partition point to a single table partition, and all of the rows in a single table partition are represented in a single index partition.

•\ Partition the index by range or hash: This is also known as a globally partitioned index. Here, the index is partitioned by range, or optionally by hash, and a single index partition may point to any (and all) table partitions.

Figure 13-5 demonstrates the difference between a local and a global index.


Figure 13-5.  Local and global index partitions

In the case of a globally partitioned index, note that the number of index partitions may be different from the number of table partitions.

Since global indexes may be partitioned by range or hash only, you must use local indexes if you wish to have a list or composite partitioned index. The local index will be partitioned using the same scheme as the underlying table.
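In DDL terms, the difference is one keyword plus an optional partitioning clause. A minimal sketch, assuming a table T that is range partitioned on a date column TS and also has an ID column:

SQL> create index t_local_idx on t(id) LOCAL;
SQL> create index t_global_idx on t(ts) GLOBAL
     partition by range(ts)
     ( partition gp_1 values less than (to_date('01-jan-2022','dd-mon-yyyy')),
       partition gp_2 values less than (MAXVALUE)
     );

The LOCAL index inherits T’s partitioning scheme automatically; the GLOBAL index must spell out its own scheme, ending in a MAXVALUE partition.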

Local Indexes vs. Global Indexes

In my experience, most partition implementations in data warehouse systems use local indexes. In an OLTP system, global indexes are much more common, and we’ll see why shortly. It has to do with the need to perform partition elimination on the index structures to maintain the same query response times after partitioning as before partitioning them.

Local indexes have certain properties that make them the best choice for most data warehouse implementations. They support a more available environment (less downtime), since problems will be isolated to one range or hash of data. On the other hand, since it can point to many table partitions, a global index may become a point of failure, rendering all partitions inaccessible to certain queries.

Local indexes are more flexible when it comes to partition maintenance operations. If the DBA decides to move a table partition, only the associated local index partition needs to be rebuilt or maintained. With a global index, all index partitions must be rebuilt or maintained in real time.

The same is true with sliding window implementations, where old data is aged out of the partition and new data is aged in. No local indexes will be in need of a rebuild, but all global indexes will be either rebuilt or maintained during the partition operation. In some cases, Oracle can take advantage of the fact that the index is locally partitioned with the table and will develop optimized query plans based on that. With global indexes, there is no such relationship between the index and table partitions.

Local indexes also facilitate a partition point-in-time recovery operation. If a single partition needs to be recovered to an earlier point in time than the rest of the table for some reason, all locally partitioned indexes can be recovered to that same point in time. All global indexes would need to be rebuilt on this object. This does not mean “avoid global indexes”—in fact, they are vitally important for performance reasons, as you’ll learn shortly—you just need to be aware of the implications of using them.

Row Movement

You might wonder what would happen if the value of the column used to determine the partition is modified in any of the preceding partitioning schemes. There are two cases to consider:
•\ The modification would not cause a different partition to be used; the row would still belong in this partition. This is supported in all cases.
•\ The modification would cause the row to move across partitions. This is supported if row movement is enabled for the table; otherwise, an error will be raised.

We can observe these behaviors easily. In the previous example in the “Range Partitioning” section, we inserted a pair of rows into PART_1 of the RANGE_EXAMPLE table:
$ sqlplus eoda/foo@PDB1
SQL> CREATE TABLE range_example
     ( range_key_column date,
       data             varchar2(20)
     )
     PARTITION BY RANGE (range_key_column)
     ( PARTITION part_1 VALUES LESS THAN
         (to_date('01/01/2021','dd/mm/yyyy')),
       PARTITION part_2 VALUES LESS THAN
         (to_date('01/01/2022','dd/mm/yyyy'))
     );
Table created.

SQL> insert into range_example
     ( range_key_column, data )
     values
     ( to_date( '15-dec-2020 00:00:00', 'dd-mon-yyyy hh24:mi:ss' ),
       'application data...' );
1 row created.

SQL> insert into range_example
     ( range_key_column, data )
     values
     ( to_date( '01-jan-2021 00:00:00', 'dd-mon-yyyy hh24:mi:ss' )-1/24/60/60,
       'application data...' );
1 row created.

SQL> select * from range_example partition(part_1);
RANGE_KEY DATA
--------- --------------------
15-DEC-20 application data...
31-DEC-20 application data...

We take one of the rows and update the value in its RANGE_KEY_COLUMN such that it can remain in PART_1:
SQL> update range_example
        set range_key_column = trunc(range_key_column)
      where range_key_column =
            to_date( '31-dec-2020 23:59:59', 'dd-mon-yyyy hh24:mi:ss' );
1 row updated.

As expected, this succeeds: the row remains in partition PART_1. Next, we update the RANGE_KEY_COLUMN to a value that would cause it to belong in PART_2:

SQL> update range_example
        set range_key_column = to_date('01-jan-2021','dd-mon-yyyy')
      where range_key_column = to_date('31-dec-2020','dd-mon-yyyy');
update range_example
ERROR at line 1:
ORA-14402: updating partition key column would cause a partition change

This immediately raises an error since we did not explicitly enable row movement. We can enable row movement on this table to allow the row to move from partition to partition.
You should be aware of a subtle side effect of doing this, however, namely, that the ROWID of a row will change as the result of the update:
SQL> select rowid
       from range_example
      where range_key_column = to_date('31-dec-2020','dd-mon-yyyy');
SQL> alter table range_example enable row movement;
Table altered.
SQL> update range_example
        set range_key_column = to_date('01-jan-2021','dd-mon-yyyy')
      where range_key_column = to_date('31-dec-2020','dd-mon-yyyy');
1 row updated.

As long as you understand that the ROWID of the row will change on this update, enabling row movement will allow you to update partition keys.
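Re-running the ROWID query against the row’s new partition key value, as sketched here, would return a different ROWID than the one we selected before the update:

SQL> select rowid
       from range_example
      where range_key_column = to_date('01-jan-2021','dd-mon-yyyy');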

Note There are other cases where a ROWID can change as a result of an update. It can happen as a result of an update to the primary key of an IOT. The universal ROWID will change for that row, too. The FLASHBACK TABLE command may also change the ROWID of rows, as might the ALTER TABLE SHRINK command.

You need to understand that, internally, row movement is done as if you had, in fact, deleted the row and reinserted it. It will update every single index on this table and delete the old entry and insert a new one. It will do the physical work of a DELETE plus an INSERT.

However, it is considered an update by Oracle even though it physically deletes and inserts the row—therefore, it won’t cause INSERT and DELETE triggers to fire, just the UPDATE triggers. Additionally, child tables that might otherwise prevent a DELETE due to a foreign key constraint won’t block this update.
You do have to be prepared, however, for the extra work that will be performed; it is much more expensive than a normal UPDATE. Therefore, it would be a bad design decision to construct a system whereby the partition key was modified frequently and that modification would cause a partition movement.

Interval Partitioning (Part 1)

Interval partitioning is very similar to range partitioning described previously—in fact, it starts with a range partitioned table but adds a rule (the interval) to the definition so the database knows how to add partitions in the future.

The goal of interval partitioning is to create new partitions for data if, and only if, data exists for a given partition, and to do so only when that data arrives in the database. In other words, it removes the need to pre-create partitions; the data itself creates the partition as it is inserted.

To use interval partitioning, you start with a range partitioned table without a MAXVALUE partition and specify an interval to add to the upper bound (the highest value) of that partitioned table to create each new range.

You need to have a table that is range partitioned on a single column that permits adding a NUMBER or INTERVAL type to it (e.g., a table partitioned by a VARCHAR2 field cannot be interval partitioned; there is nothing you can add to a VARCHAR2).

You can use interval partitioning with any suitable existing range partitioned table; that is, you can ALTER an existing range partitioned table to be interval partitioned, or you can create one with the CREATE TABLE command.
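Converting an existing range partitioned table is a one-statement change, as in this sketch (assuming the table’s existing high bounds are safe to add the interval to):

SQL> alter table range_example
     set interval ( numtoyminterval(1,'year') );
Table altered.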

For example, suppose you had a range partitioned table that said “anything strictly less than 01-JAN-2021” (data in the year 2020 and before) goes into partition P1—and that was it.

So it had one partition for all data in the year 2020 and before. If you attempted to insert data for the year 2022 into the table, the insert would fail as demonstrated previously in the section on range partitioning.

With interval partitioning, you can create a table and specify both a range (strictly less than 01-JAN-2021) and an interval, say one month in duration, and the database would create monthly partitions (each capable of holding exactly one month’s worth of data) as the data arrived.

The database would not pre-create all possible partitions because that would not be practical. But, as each row arrived, the database would see whether the partition for the month in question existed. The database would create the partition if needed. Here is an example of the syntax:

$ sqlplus eoda/foo@PDB1
SQL> create table audit_trail
  2  ( ts   timestamp,
  3    data varchar2(30)
  4  )
  5  partition by range(ts)
  6  interval (numtoyminterval(1,'month'))
  7  store in (users, example )
  8  (partition p0 values less than
  9  (to_date('01-01-1900','dd-mm-yyyy'))
 10  );
Table created.

Note You might have a question in your mind, especially if you just finished reading the previous chapter on datatypes. You can see we are partitioning by a TIMESTAMP, and we are adding an INTERVAL of one month to it. In Chapter 12, we saw how adding an INTERVAL of one month to a TIMESTAMP that fell on January 31 would raise an error, since there is no February 31. Will the same issue happen with interval partitioning? The answer is yes; if you attempt to use a date such as '29-01-1990' (any day of the month after 28 would suffice), you will receive an error “ORA-14767: Cannot specify this interval with existing high bounds”. The database will not permit you to use a boundary value that is not safe to add the interval to.

On lines 8 and 9, you see the range partitioning scheme for this table; it starts with a single empty partition that would contain any data prior to 01-JAN-1900. Presumably, since the table holds an audit trail, this partition will remain small and empty forever. It is a mandatory partition and is referred to as the transitional partition.

All data that is strictly less than this current high value partition will be range partitioned, using traditional range partitioning. Only data that is created above the transitional partition high value will use interval partitioning. If we query the data dictionary, we can see what has been created so far:

SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
       from user_tab_partitions a, user_part_tables b
      where a.table_name = 'AUDIT_TRAIL'
        and a.table_name = b.table_name
      order by a.partition_position;

So far, we have just the single partition, and it is not an INTERVAL partition, as shown by the empty INTERVAL column. Rather, it is just a regular RANGE partition right now; it will hold anything strictly less than 01-JAN-1900.

Looking at the CREATE TABLE statement again, we can see the new interval partitioning–specific information on lines 6 and 7:

interval (numtoyminterval(1,'month'))
store in (users, example )

On line 6, we have the actual interval specification of NUMTOYMINTERVAL(1,'MONTH'). Our goal was to store monthly partitions—a new partition for each month’s worth of data—a very common goal. By using a date that is safe to add a month to (refer to Chapter 12 for why adding a month to a timestamp can be error-prone in some cases)—the first of the month—we can have the database create monthly partitions on the fly, as data arrives, for us.

On line 7, we have specifics: store in (users,example). This allows us to tell the database where to create these new partitions—what tablespaces to use. As the database figures out what partitions it wants to create, it uses this list to decide what tablespace to create each partition in. This allows the DBA to control the maximum desired tablespace size: they might not want a single 500GB tablespace, but they would be comfortable with ten 50GB tablespaces. In that case, they would set up ten tablespaces and allow the database to use all ten to create partitions. Let’s insert a row of data now and see what happens:

SQL> insert into audit_trail (ts,data) values
     ( to_timestamp('27-feb-2020','dd-mon-yyyy'), 'xx' );
1 row created.
SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
       from user_tab_partitions a, user_part_tables b
      where a.table_name = 'AUDIT_TRAIL'
        and a.table_name = b.table_name
      order by a.partition_position;

If you recall from the Range Partitioning section, you would expect that INSERT to fail. However, since we are using interval partitioning, it succeeds and, in fact, creates a new partition SYS_P1623. The HIGH_VALUE for this partition is 01-MAR-2020 which, if we were using range partitioning, would imply anything strictly less than 01-MAR-2020 and greater than or equal to 01-JAN-1900 would go into this partition, but since we have an interval, the rules are different. When the interval is set, the range for this partition is anything greater than or equal to the HIGH_VALUE-INTERVAL and strictly less than the HIGH_VALUE. So, this partition would have the range of

SQL> select TIMESTAMP' 2020-03-01 00:00:00'-NUMTOYMINTERVAL(1,'MONTH')
            greater_than_eq_to,
            TIMESTAMP' 2020-03-01 00:00:00' strictly_less_than
       from dual;