Summary

Partitioning is extremely useful in scaling up large database objects. This scaling is visible from the perspectives of performance, availability, and administration, and all three are extremely important to different people.

The DBA is concerned with administrative scaling. The owners of the system are concerned with availability, because downtime is lost money, and anything that reduces downtime—or reduces the impact of downtime—boosts the payback for a system.

The end users of the system are concerned with performance scaling. No one likes to use a slow system, after all.

We also looked at the fact that in an OLTP system, partitioning may not increase performance, especially if applied improperly. Partitioning can increase the performance of certain classes of queries, but those queries generally do not occur in an OLTP system.

This point is important to understand, as many people associate partitioning with “free performance increase.” This does not mean that partitions should not be used in OLTP systems—they do provide many other salient benefits in this environment—just don’t expect a massive increase in throughput.

Expect reduced downtime. Expect the same good performance (partitioning will not slow you down when applied appropriately). Expect easier manageability, which may itself lead to increased performance, because DBAs will perform some maintenance operations more frequently, simply because they can.

We investigated the various table partitioning schemes offered by Oracle—range, hash, list, interval, reference, interval reference, virtual column, and composite—and talked about when they are most appropriately used.

We spent the bulk of our time looking at partitioned indexes and examining the differences between prefixed and nonprefixed and local and global indexes. We investigated partition operations in data warehouses combined with global indexes, and the trade-off between resource consumption and availability.

We also looked at the ease of maintenance features such as the ability to perform maintenance operations on multiple partitions at a time, cascade truncate, and cascade exchange. Oracle continues to update and improve partitioning with each new release.

Over time, I see this feature becoming more relevant to a broader audience as the size and scale of database applications grow.

The Internet and its database-hungry nature along with legislation requiring longer retention of audit data are leading to more and more extremely large collections of data, and partitioning is a natural tool to help manage that problem.

Cascade Truncate

Oracle allows you to truncate parent/child tables in tandem as a single atomic DDL statement. While the truncate cascade is taking place, any queries issued against the parent/child table combination are always presented with a read-consistent view of the data, meaning that the data in the parent/child tables will either be seen as both tables populated or both tables truncated.

The truncate cascade functionality is initiated with a TRUNCATE … CASCADE statement on the parent table. For the cascade truncate to take place, any child tables must be defined with the foreign key relational constraint of ON DELETE CASCADE. What does cascade truncate have to do with partitioning? In a reference partitioned table, you can truncate a parent table partition and have it cascade to the child table partition in one transaction.

Let’s look at an example of this, applying the TRUNCATE … CASCADE functionality to reference partitioned tables. The parent ORDERS table is created here, and the ORDER_LINE_ITEMS table is created with ON DELETE CASCADE applied to the foreign key constraint:

$ sqlplus eoda/foo@PDB1
SQL> create table orders
     ( order#     number primary key,
       order_date date,
       data       varchar2(30) )
     PARTITION BY RANGE (order_date)
     ( PARTITION part_2020 VALUES LESS THAN (to_date('01-01-2021','dd-mm-yyyy')),
       PARTITION part_2021 VALUES LESS THAN (to_date('01-01-2022','dd-mm-yyyy')) );
Table created.

SQL> insert into orders values ( 1, to_date('01-jun-2020','dd-mon-yyyy'), 'xyz' );
1 row created.

SQL> insert into orders values ( 2, to_date('01-jun-2021','dd-mon-yyyy'), 'xyz' );
1 row created.

And now we’ll create the ORDER_LINE_ITEMS table, ensuring we include the ON DELETE CASCADE clause:
SQL> create table order_line_items
     ( order# number,
       line#  number,
       data   varchar2(30),
       constraint c1_pk primary key(order#,line#),
       constraint c1_fk_p foreign key(order#)
         references orders on delete cascade )
     partition by reference(c1_fk_p);
Table created.

SQL> insert into order_line_items values ( 1, 1, 'yyy' );
1 row created.

SQL> insert into order_line_items values ( 2, 1, 'yyy' );
1 row created.

Now we can issue a TRUNCATE … CASCADE that truncates both the parent table partition and the child table partition as a single transaction:
SQL> alter table orders truncate partition PART_2020 cascade;
Table truncated.

In other words, the TRUNCATE … CASCADE functionality prevents applications from seeing the child table truncated before the parent table is truncated.
You can also truncate all partitions in the parent and child tables via
SQL> truncate table orders cascade;
Table truncated.

Again, just to be clear, the ability to cascade truncate parent/child tables is not exclusively a partitioning feature. This feature also applies to nonpartitioned parent/child tables. This allows you to use one DDL statement to initiate truncate operations and also ensures the database application is always presented with a consistent view of parent/child partitions.
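To make this concrete, here is a minimal sketch on ordinary (nonpartitioned) tables; the DEPT and EMP tables are hypothetical and exist only for this illustration:

SQL> create table dept
     ( deptno number primary key,
       dname  varchar2(30) );
SQL> create table emp
     ( empno  number primary key,
       deptno number,
       constraint emp_fk_dept foreign key (deptno)
         references dept on delete cascade );
-- one atomic DDL statement truncates parent and child together
SQL> truncate table dept cascade;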

Cascade Exchange

Oracle allows you to exchange the combination of parent/child reference partitioned tables in one atomic DDL statement. A small example will demonstrate this. First, a reference partitioned parent and child table is created to set this up:
$ sqlplus eoda/foo@PDB1
SQL> create table orders
     ( order#     number primary key,
       order_date date,
       data       varchar2(30) )
     PARTITION BY RANGE (order_date)
     ( PARTITION part_2020 VALUES LESS THAN (to_date('01-01-2021','dd-mm-yyyy')),
       PARTITION part_2021 VALUES LESS THAN (to_date('01-01-2022','dd-mm-yyyy')) );
SQL> insert into orders values (1, to_date('01-jun-2020','dd-mon-yyyy'), 'xyz');
SQL> insert into orders values (2, to_date('01-jun-2021','dd-mon-yyyy'), 'xyz');
SQL> create table order_line_items
     ( order# number,
       line#  number,
       data   varchar2(30),
       constraint c1_pk primary key(order#,line#),
       constraint c1_fk_p foreign key(order#)
         references orders on delete cascade )
     partition by reference(c1_fk_p);
SQL> insert into order_line_items values ( 1, 1, 'yyy' );
SQL> insert into order_line_items values ( 2, 1, 'yyy' );

Next, an empty partition is added to the reference partitioned table:
SQL> alter table orders add partition part_2022
     values less than (to_date('01-01-2023','dd-mm-yyyy'));

Next, a parent and a child table are created and loaded with data. These are the tables that will be exchanged with the empty partitions in the reference partitioned table. The child table mirrors ORDER_LINE_ITEMS and must carry its own ON DELETE CASCADE foreign key for the cascade exchange to work:
SQL> create table part_2022
     ( order#     number primary key,
       order_date date,
       data       varchar2(30) );
SQL> insert into part_2022 values (3, to_date('01-jun-2022','dd-mon-yyyy'), 'xyz');
SQL> create table part_2022_child
     ( order# number,
       line#  number,
       data   varchar2(30),
       constraint c2_pk primary key(order#,line#),
       constraint c2_fk_p foreign key(order#)
         references part_2022 on delete cascade );
SQL> insert into part_2022_child values (3, 1, 'zzz');

Now we can exchange the prior two tables in one transaction into the reference partitioned tables. Notice the CASCADE option is specified:
SQL> alter table orders
     exchange partition part_2022
     with table part_2022
     without validation
     CASCADE
     update global indexes;

That’s it. With one DDL statement, we simultaneously exchanged two tables related by a foreign key constraint into a reference partitioned table. Anybody accessing the database will see the parent and child table partitions added seamlessly as one unit of work.

Partial Indexes

Oracle allows you to create either local or global indexes on a subset of partitions in a table. You may want to do this if you’ve pre-created partitions and don’t yet have data for range partitions that map to future dates—the idea being that you’ll build the index after the partitions have been loaded (at some future date).

You set up the use of a partial index by first specifying INDEXING ON|OFF for each partition in the table. In this next example, PART_1 has indexing turned on and PART_2 has indexing turned off:

$ sqlplus eoda/foo@PDB1
SQL> CREATE TABLE p_table (a int)
     PARTITION BY RANGE (a)
     ( PARTITION part_1 VALUES LESS THAN (1000) INDEXING ON,
       PARTITION part_2 VALUES LESS THAN (2000) INDEXING OFF );
Table created.

Next, a partial local index is created:
SQL> create index pi1 on p_table(a) local indexing partial;
Index created.

In this scenario, the INDEXING PARTIAL clause instructs Oracle to only build and make usable local index partitions that point to partitions in the table that were defined with INDEXING ON. In this case, one usable index partition will be created with index entries pointing to data in the PART_1 table partition:

SQL> select a.index_name, a.partition_name, a.status
     from user_ind_partitions a, user_indexes b
     where b.table_name = 'P_TABLE'
     and a.index_name = b.index_name;

INDEX_NAME   PARTITION_NAME   STATUS
------------ ---------------- --------
PI1          PART_2           UNUSABLE
PI1          PART_1           USABLE

Next, we’ll insert some test data, generate statistics, set autotrace on, and run a query that should locate data in the PART_1 partition:
SQL> insert into p_table select rownum from dual connect by level < 2000;
1999 rows created.
SQL> exec dbms_stats.gather_table_stats(user,'P_TABLE');
PL/SQL procedure successfully completed.
SQL> explain plan for select * from p_table where a = 20;
Explained.
SQL> select * from table(dbms_xplan.display(null,null,'BASIC +PARTITION'));

0 SELECT STATEMENT
1 PARTITION RANGE SINGLE
2 INDEX RANGE SCAN

As expected, the optimizer was able to generate an execution plan utilizing the index. Next, a query is issued that selects data from the partition defined with INDEXING OFF:
SQL> explain plan for select * from p_table where a = 1500;
Explained.
SQL> select * from table(dbms_xplan.display(null,null,'BASIC +PARTITION'));

0 SELECT STATEMENT
1 PARTITION RANGE SINGLE
2 TABLE ACCESS FULL

The output shows a full table scan of PART_2 was required, as there is no usable index with entries pointing at data in PART_2. We can instruct Oracle to create index entries pointing to data in PART_2 by rebuilding the index partition associated with the PART_2 partition:
SQL> alter index pi1 rebuild partition part_2;
Index altered.

Rerunning the previous select query shows that the optimizer is now utilizing the local partitioned index pointing to the PART_2 table partition:
0 SELECT STATEMENT
1 PARTITION RANGE SINGLE
2 INDEX RANGE SCAN

In this way, partial indexes allow you to disable the index while the table partition is being loaded (increasing the loading speed), and then later you can rebuild the partial index to make it available.
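As a hedged sketch of that load-then-index workflow (reusing P_TABLE and PI1 from above; exact behavior of INDEXING ON can vary by release, so verify against your version—the explicit rebuild is harmless either way):

SQL> alter table p_table modify partition part_2 indexing off;  -- exclude from partial index during load
SQL> insert /*+ append */ into p_table
     select 1000+rownum from dual connect by level <= 500;      -- direct path load into PART_2's range
SQL> commit;
SQL> alter table p_table modify partition part_2 indexing on;   -- include the partition again
SQL> alter index pi1 rebuild partition part_2;                  -- make the index partition usable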

Sliding Windows and Indexes

The following example implements a classic sliding window of data. In many implementations, data is added to a warehouse over time and the oldest data is aged out. Many times, this data is range partitioned by a date attribute, so that the oldest data is stored together in a single partition, and the newly loaded data is likewise stored together in a new partition. The monthly load process involves the following:

• Detaching the old data: The oldest partition is either dropped or exchanged with an empty table (turning the oldest partition into a table) to permit archiving of the old data.
• Loading and indexing of the new data: The new data is loaded into a work table and indexed and validated.
• Attaching the new data: Once the new data is loaded and processed, the table it is in is exchanged with an empty partition in the partitioned table, turning this newly loaded data in a table into a partition of the larger partitioned table.

This process is repeated every month, or however often the load process is performed; it could be every day or every week. We will implement this very typical process in this section to show the impact of global partitioned indexes and demonstrate the options we have during partition operations to increase availability, allowing us to implement a sliding window of data and maintain continuous availability of data.

We’ll process yearly data in this example and have fiscal years 2020 and 2021 loaded up. The table will be partitioned by the TIMESTAMP column, and it will have two indexes created on it—one is a locally partitioned index on the ID column, and the other is a global index (nonpartitioned, in this case) on the TIMESTAMP column:

$ sqlplus eoda/foo@PDB1
SQL> drop table partitioned;
SQL> CREATE TABLE partitioned
     ( timestamp date,
       id        int )
     PARTITION BY RANGE (timestamp)
     ( PARTITION fy_2020 VALUES LESS THAN
         ( to_date('01-jan-2021','dd-mon-yyyy') ),
       PARTITION fy_2021 VALUES LESS THAN
         ( to_date('01-jan-2022','dd-mon-yyyy') ) );
Table created.
SQL> insert into partitioned partition(fy_2020)
     select to_date('31-dec-2020','dd-mon-yyyy')-mod(rownum,360), rownum
     from dual connect by level <= 70000;
70000 rows created.
SQL> insert into partitioned partition(fy_2021)
     select to_date('31-dec-2021','dd-mon-yyyy')-mod(rownum,360), rownum
     from dual connect by level <= 70000;
70000 rows created.
SQL> create index partitioned_idx_local on partitioned(id) LOCAL;
Index created.
SQL> create index partitioned_idx_global on partitioned(timestamp) GLOBAL;
Index created.

This sets up our warehouse table. The data is partitioned by fiscal year, and we have the last two years’ worth of data online. This table has two indexes: one is LOCAL and the other is GLOBAL. Now it’s the end of the year and we would like to do the following:

1. Remove the oldest fiscal year data: We do not want to lose this data forever; we just want to age it out and archive it.
2. Add the newest fiscal year data: It will take a while to load it, transform it, index it, and so on. We would like to do this work without impacting the availability of the current data, if at all possible.

The first step is to set up an empty table for fiscal year 2020 that looks just like the partitioned table. We’ll use this table to exchange with the FY_2020 partition in the partitioned table, turning that partition into a table and in turn emptying out the partition in the partitioned table. The net effect is that the oldest data in the partitioned table will have been in effect removed after the exchange:
SQL> create table fy_2020 ( timestamp date, id int );
Table created.
SQL> create index fy_2020_idx on fy_2020(id);
Index created.

We’ll do the same to the new data to be loaded. We’ll create and load a table that structurally looks like the existing partitioned table (but that is not itself partitioned):

SQL> create table fy_2022 ( timestamp date, id int );
Table created.
SQL> insert into fy_2022
     select to_date('31-dec-2022','dd-mon-yyyy')-mod(rownum,360), rownum
     from dual connect by level <= 70000;
70000 rows created.
SQL> create index fy_2022_idx on fy_2022(id) nologging;
Index created.


We’ll turn the current full partition into an empty partition and create a full table with the FY_2020 data in it. Also, we’ve completed all of the work necessary to have the FY_2022 data ready to go. This would have involved verifying the data, transforming it— whatever complex tasks we need to undertake to get it ready. Now we’re ready to update the live data using an exchange partition:

SQL> alter table partitioned
     exchange partition fy_2020
     with table fy_2020
     including indexes
     without validation;
Table altered.
SQL> alter table partitioned drop partition fy_2020;
Table altered.

This is all we need to do to age the old data out. We turned the partition into a full table and the empty table into a partition. This was a simple data dictionary update. No large amount of I/O took place—it just happened. We can now export that FY_2020 table (perhaps using a transportable tablespace) out of our database for archival purposes. We could reattach it quickly if we ever needed to.
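For instance, a hedged sketch of archiving the detached table with Data Pump export; the DP_DIR directory object is an assumption you would create and grant access to beforehand:

$ expdp eoda/foo@PDB1 directory=DP_DIR dumpfile=fy_2020.dmp tables=FY_2020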

Next, we want to slide in the new data:
SQL> alter table partitioned add partition fy_2022
     values less than ( to_date('01-jan-2023','dd-mon-yyyy') );
Table altered.
SQL> alter table partitioned
     exchange partition fy_2022
     with table fy_2022
     including indexes
     without validation;
Table altered.

Again, this was instantaneous; it was accomplished via simple data dictionary updates—the WITHOUT VALIDATION clause allowed us to accomplish that. When you use that clause, the database will trust that the data you are placing into that partition is, in fact, valid for that partition. Adding the empty partition took very little time to process.

Then, we exchange the newly created empty partition with the full table, and the full table with the empty partition, and that operation is performed quickly as well. The new data is online. Looking at our indexes, however, we’ll find the following:
SQL> select index_name, status from user_indexes;

INDEX_NAME               STATUS
------------------------ --------
PARTITIONED_IDX_LOCAL    N/A
PARTITIONED_IDX_GLOBAL   UNUSABLE
FY_2020_IDX              VALID
FY_2022_IDX              VALID

The global index is, of course, unusable after this operation. Since each index partition can point to any table partition, and we just took away a partition and added a partition, that index is invalid. It has entries that point into the partition we dropped. It has no entries that point into the partition we just added. Any query that would make use of this index would fail and not execute, or if we skip unusable indexes, the query’s performance would be negatively impacted by not being able to use the index:

SQL> select /*+ index( partitioned PARTITIONED_IDX_GLOBAL ) */ count(*)
     from partitioned
     where timestamp between to_date( '01-mar-2022', 'dd-mon-yyyy' )
     and to_date( '31-mar-2022', 'dd-mon-yyyy' );
select /*+ index( partitioned PARTITIONED_IDX_GLOBAL ) */ count(*)
*
ERROR at line 1:
ORA-01502: index 'EODA.PARTITIONED_IDX_GLOBAL' or partition of such index is in unusable state

SQL> explain plan for select count(*)
     from partitioned
     where timestamp between to_date( '01-mar-2022', 'dd-mon-yyyy' )
     and to_date( '31-mar-2022', 'dd-mon-yyyy' );

So, our choices after performing this partition operation with global indexes are
• Ensure that SKIP_UNUSABLE_INDEXES=TRUE (the default is TRUE), and then Oracle will not use this unusable index. But then we lose the performance the index was giving us.
• Rebuild this index to make it usable again.
• Use the UPDATE GLOBAL INDEXES clause when performing partition maintenance (the focus of the next section in this chapter).

The sliding window process, which so far has resulted in virtually no downtime, will now take a very long time to complete while we rebuild the global index. Runtime query performance of queries that relied on these indexes will be negatively affected during this time—either they will not run at all or they will run without the benefit of the index. All of the data must be scanned and the entire index reconstructed from the table data. If the table is many hundreds of gigabytes in size, this will take considerable resources.
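For reference, a minimal sketch of that rebuild; the PARALLEL and NOLOGGING options are optional tuning choices, not requirements:

SQL> alter index partitioned_idx_global rebuild parallel 4 nologging;
Index altered.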

“Live” Global Index Maintenance

Oracle has the ability to maintain the global indexes during the partition operation using the UPDATE GLOBAL INDEXES clause. This means that as you drop a partition, split a partition, perform whatever operation necessary on a partition, Oracle will perform the necessary modifications to the global index to keep it up to date.

Since most partition operations will cause this global index invalidation to occur, this feature can be a boon to systems that need to provide continual access to the data. You trade the raw speed of the partition operation, with its associated window of unavailability immediately afterward while you rebuild indexes, for a slower overall partition operation coupled with 100 percent data availability.

In short, if you have a data warehouse that cannot have downtime, but must support these common data warehouse techniques of sliding data in and out, then this feature is for you—but you must understand the implications.

Revisiting our previous example, if our partition operations had used the UPDATE GLOBAL INDEXES clause when relevant (in this example, it would not be needed on the ADD PARTITION statement since the newly added partition would not have any rows in it), we would have discovered the indexes to be perfectly valid and usable both during and after the operation:

SQL> alter table partitioned
     exchange partition fy_2020
     with table fy_2020
     including indexes
     without validation
     UPDATE GLOBAL INDEXES;
Table altered.

Note in the following output, the N/A status observed for the PARTITIONED_IDX_LOCAL index simply means the statuses are associated with the index partitions associated with that index, not the index itself. It doesn’t make sense to say the locally partitioned index is valid or not; it is just a container that logically holds the index partitions themselves:

SQL> select index_name, status from user_indexes;

INDEX_NAME               STATUS
------------------------ --------
PARTITIONED_IDX_LOCAL    N/A
PARTITIONED_IDX_GLOBAL   VALID
FY_2020_IDX              VALID
FY_2022_IDX              VALID

But there is a trade-off: we are performing the logical equivalent of DELETE and INSERT operations on the global index structures. When we drop a partition, we have to delete all of the global index entries that might be pointing to that partition.

When we did the exchange of a table with a partition, we had to delete all of the global index entries pointing to the original data and then insert all of the new ones that we just slid in there. So the amount of work performed by the ALTER commands was significantly increased.

When weighing global index maintenance, you should expect the approach without index maintenance to consume fewer database resources and therefore perform faster, but to incur a measurable period of downtime.

The second approach, involving maintaining the indexes, will consume more resources and perhaps take longer overall, but will not incur downtime. As far as the end users are concerned, their ability to work never ceased. They might have been processing a bit slower (since we were competing with them for resources), but they were still processing, and they never stopped.

The index rebuild approach will almost certainly run faster, considering both the elapsed time and the CPU time. This fact has caused many a DBA to pause and say, “Hey, I don’t want to use UPDATE GLOBAL INDEXES—it’s slower.”

That is too simplistic of a view, however. What you need to remember is that while the operations overall took longer, processing on your system was not necessarily interrupted. Sure, you as the DBA might be looking at your screen for a longer period of time, but the really important work that takes place on your system was still taking place.

You need to see if this trade-off makes sense for you. If you have an eight-hour maintenance window overnight in which to load new data, then by all means, use the rebuild approach if that makes sense. However, if you have a mandate to be available continuously, then the ability to maintain the global indexes will be crucial.

One more thing to consider is the redo generated by each approach. You will find that the UPDATE GLOBAL INDEXES generates considerably more redo (due to the index maintenance), and you should expect that to only go up as you add more and more global indexes to the table.
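One hedged way to measure this yourself is to snapshot your session’s ‘redo size’ statistic immediately before and after each approach and compare the difference:

SQL> select b.name, a.value
     from v$mystat a, v$statname b
     where a.statistic# = b.statistic#
     and b.name = 'redo size';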

The redo generated by the UPDATE GLOBAL INDEXES processing is unavoidable and cannot be turned off via NOLOGGING, since the maintenance of the global indexes is not a complete rebuild of their structure but more of an incremental maintenance.

Additionally, since you are maintaining the live index structure, you must generate undo for that—in the event the partition operation fails, you must be prepared to put the index back the way it was. And remember, undo is protected by redo itself, so some of the redo you see generated is from the index updates and some is from the rollback. Add another global index or two and you would reasonably expect these numbers to increase.

So, UPDATE GLOBAL INDEXES is an option that allows you to trade increased resource consumption for increased availability. If you need to provide continuous availability, it’s the option for you. But you have to understand the ramifications and size other components of your system appropriately. Specifically, many data warehouses have been crafted over time to use bulk direct path operations, bypassing undo generation and, when permitted, redo generation as well. Using UPDATE GLOBAL INDEXES cannot bypass either of those two elements. You need to examine the rules you use to size your redo and undo needs before using this feature, so you can assure yourself it can work on your system.

Composite Partitioning

Lastly, we’ll look at some examples of composite partitioning, which is a mixture of range, hash, and/or list. Table 13-1 lists the various ways you can mix and match types of partitioning. In other words, the table shows there are nine different combinations of composite partitioning currently available.

Table 13-1.  Oracle Database Supported Composite Partitioning Schemes by Version

It is interesting to note that when you use composite partitioning, there will be no partition segments, only subpartition segments. The partitions themselves do not have segments (much like a partitioned table itself does not have a segment); the data is physically stored in subpartition segments, and the partition becomes a logical container that points to the actual subpartitions.

In our example, we’ll look at a range-hash composite partitioning. Here, we are using a different set of columns for the range partition from those used for the hash partition. This is not mandatory; we could use the same set of columns for both:
$ sqlplus eoda/foo@PDB1
SQL> CREATE TABLE composite_example
     ( range_key_column date,
       hash_key_column  int,
       data             varchar2(20) )
     PARTITION BY RANGE (range_key_column)
     subpartition by hash(hash_key_column) subpartitions 2
     ( PARTITION part_1
         VALUES LESS THAN (to_date('01/01/2021','dd/mm/yyyy'))
         ( subpartition part_1_sub_1,
           subpartition part_1_sub_2 ),
       PARTITION part_2
         VALUES LESS THAN (to_date('01/01/2022','dd/mm/yyyy'))
         ( subpartition part_2_sub_1,
           subpartition part_2_sub_2 ) );
Table created.
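As a hedged verification of the “no partition segments” point made earlier, you can query USER_SEGMENTS (with deferred segment creation enabled, you may need to insert a row first before the segments appear):

SQL> select segment_name, partition_name, segment_type
     from user_segments
     where segment_name = 'COMPOSITE_EXAMPLE';

Only rows of SEGMENT_TYPE 'TABLE SUBPARTITION' (one per subpartition) should appear; there are no 'TABLE PARTITION' segments.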

In range-hash composite partitioning, Oracle will first apply the range partitioning rules to figure out which range the data falls into. Then it will apply the hash function to decide into which physical subpartition the data should finally be placed. This process is described in Figure 13-4.

Figure 13-4.  Range-hash composite partition example

So, composite partitioning gives you the ability to break your data up by range and, when a given range is considered too large or further partition elimination could be useful, to break it up further by hash or list. It is interesting to note that each range partition need not have the same number of subpartitions; for example, suppose you were range partitioning on a date column in support of data purging (to remove all old data rapidly and easily).

In the year 2020 and before, you had equal amounts of data for odd code numbers in the CODE_KEY_COLUMN and for even code numbers. But after that, you knew the number of records associated with the odd code numbers would be more than double, and you wanted to have more subpartitions for the odd code values.

You can achieve that rather easily just by defining more subpartitions:
SQL> CREATE TABLE composite_range_list_example
     ( range_key_column date,
       code_key_column  int,
       data             varchar2(20) )
     PARTITION BY RANGE (range_key_column)
     subpartition by list(code_key_column)
     ( PARTITION part_1
         VALUES LESS THAN (to_date('01/01/2021','dd/mm/yyyy'))
         ( subpartition part_1_sub_1 values ( 1, 3, 5, 7 ),
           subpartition part_1_sub_2 values ( 2, 4, 6, 8 ) ),
       PARTITION part_2
         VALUES LESS THAN (to_date('01/01/2022','dd/mm/yyyy'))
         ( subpartition part_2_sub_1 values ( 1, 3 ),
           subpartition part_2_sub_2 values ( 5, 7 ),
           subpartition part_2_sub_3 values ( 2, 4, 6, 8 ) ) );
Table created.

Here, you end up with five subpartitions altogether: two for partition PART_1 and three for partition PART_2.
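A quick check against the data dictionary confirms the layout (PART_1 should report two subpartitions and PART_2 three):

SQL> select partition_name, count(*) subpart_count
     from user_tab_subpartitions
     where table_name = 'COMPOSITE_RANGE_LIST_EXAMPLE'
     group by partition_name
     order by partition_name;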

Interval Partitioning

Interval partitioning is very similar to range partitioning described previously—in fact, it starts with a range partitioned table but adds a rule (the interval) to the definition so the database knows how to add partitions in the future.

The goal of interval partitioning is to create new partitions for data if, and only if, data for a given partition exists, and only when that data arrives in the database. In other words, it removes the need to pre-create partitions for data, allowing the data itself to create its partition as it is inserted.

To use interval partitioning, you start with a range partitioned table that has no MAXVALUE partition and specify an interval to add to the upper bound (the highest value) of that partitioned table to create a new range.

You need to have a table that is range partitioned on a single column that permits adding a NUMBER or INTERVAL type to it (e.g., a table partitioned by a VARCHAR2 field cannot be interval partitioned; there is nothing you can add to a VARCHAR2).

You can use interval partitioning with any suitable existing range partitioned table; that is, you can ALTER an existing range partitioned table to be interval partitioned, or you can create one with the CREATE TABLE command.
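As a hedged sketch of the ALTER route (the table name here is hypothetical), the SET INTERVAL clause performs the conversion; SET INTERVAL () would turn interval partitioning back off:

SQL> alter table existing_range_table
     set interval (numtoyminterval(1,'month'));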

For example, suppose you had a range partitioned table that said “anything strictly less than 01-JAN-2021” (data in the year 2020 and before) goes into partition P1—and that was it.

So it had one partition for all data in the year 2020 and before. If you attempted to insert data for the year 2022 into the table, the insert would fail as demonstrated previously in the section on range partitioning.

With interval partitioning, you can create a table and specify both a range (strictly less than 01-JAN-2021) and an interval—say one month in duration—and the database would create monthly partitions (a partition capable of holding exactly one month’s worth of data) as the data arrived.

The database would not pre-create all possible partitions because that would not be practical. But, as each row arrived, the database would see whether the partition for the month in question existed. The database would create the partition if needed. Here is an example of the syntax:

$ sqlplus eoda/foo@PDB1
SQL> create table audit_trail
  2  ( ts    timestamp,
  3    data  varchar2(30)
  4  )
  5  partition by range(ts)
  6  interval (numtoyminterval(1,'month'))
  7  store in (users, example )
  8  (partition p0 values less than
  9   (to_date('01-01-1900','dd-mm-yyyy'))
 10  );
Table created.

Note You might have a question in your mind, especially if you just finished reading the previous chapter on datatypes. You can see we are partitioning by a TIMESTAMP, and we are adding an INTERVAL of one month to it. In Chapter 12, we saw how adding an INTERVAL of one month to a TIMESTAMP that fell on January 31 would raise an error, since there is no February 31. Will the same issue happen with interval partitioning? The answer is yes; if you attempt to use a date such as '29-01-1990' (any day of the month after 28 would suffice), you will receive an error “ORA-14767: Cannot specify this interval with existing high bounds”. The database will not permit you to use a boundary value that is not safe to add the interval to.

On lines 8 and 9, you see the range partitioning scheme for this table; it starts with a single empty partition that would contain any data prior to 01-JAN-1900. Presumably, since the table holds an audit trail, this partition will remain small and empty forever. It is a mandatory partition and is referred to as the transitional partition.

All data that is strictly less than this current high value partition will be range partitioned, using traditional range partitioning. Only data that is created above the transitional partition high value will use interval partitioning. If we query the data dictionary, we can see what has been created so far:

SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
     from user_tab_partitions a, user_part_tables b
     where a.table_name = 'AUDIT_TRAIL'
     and a.table_name = b.table_name
     order by a.partition_position;

So far, we have just the single partition, and it is not an INTERVAL partition, as shown by the empty INTERVAL column. Rather, it is just a regular RANGE partition right now; it will hold anything strictly less than 01-JAN-1900.

Looking at the CREATE TABLE statement again, we can see the new interval partitioning–specific information on lines 6 and 7:

interval (numtoyminterval(1,'month'))
store in (users, example )

On line 6, we have the actual interval specification of NUMTOYMINTERVAL(1,'MONTH'). Our goal was to store monthly partitions—a new partition for each month’s worth of data—a very common goal. By using a date that is safe to add a month to (refer to Chapter 12 for why adding a month to a timestamp can be error-prone in some cases)—the first of the month—we can have the database create monthly partitions on the fly, as data arrives, for us.

On line 7, we have specifics: store in (users,example). This allows us to tell the database where to create these new partitions—what tablespaces to use. As the database figures out what partitions it wants to create, it uses this list to decide what tablespace to create each partition in. This allows the DBA to control the maximum desired tablespace size: they might not want a single 500GB tablespace, but they would be comfortable with ten 50GB tablespaces. In that case, they would set up ten tablespaces and allow the database to use all ten to create partitions. Let’s insert a row of data now and see what happens:

SQL> insert into audit_trail (ts,data)
     values ( to_timestamp('27-feb-2020','dd-mon-yyyy'), 'xx' );
1 row created.
SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
     from user_tab_partitions a, user_part_tables b
     where a.table_name = 'AUDIT_TRAIL'
     and a.table_name = b.table_name
     order by a.partition_position;

If you recall from the Range Partitioning section, you would expect that INSERT to fail. However, since we are using interval partitioning, it succeeds and, in fact, creates a new partition SYS_P1623. The HIGH_VALUE for this partition is 01-MAR-2020 which, if we were using range partitioning, would imply anything strictly less than 01-MAR-2020 and greater than or equal to 01-JAN-1900 would go into this partition, but since we have an interval, the rules are different. When the interval is set, the range for this partition is anything greater than or equal to the HIGH_VALUE-INTERVAL and strictly less than the HIGH_VALUE. So, this partition would have the range of

SQL> select TIMESTAMP' 2020-03-01 00:00:00'-NUMTOYMINTERVAL(1,'MONTH') greater_than_eq_to,
            TIMESTAMP' 2020-03-01 00:00:00' strictly_less_than
     from dual;


That is—all of the data for the month of February 2020. If we insert another row in some other month, as follows, we can see that another partition, SYS_P1624, is added that contains all of the data for the month of June 2020:

SQL> insert into audit_trail (ts,data)
     values ( to_date('25-jun-2020','dd-mon-yyyy'), 'xx' );
1 row created.
SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
     from user_tab_partitions a, user_part_tables b
     where a.table_name = 'AUDIT_TRAIL'
     and a.table_name = b.table_name
     order by a.partition_position;

You might be looking at this output and asking why everything is in the USERS tablespace. We clearly asked for the data to be spread out over the USERS tablespace and the EXAMPLE tablespace, so why is everything in a single tablespace?

It has to do with the fact that when the database is figuring out what partition the data goes into, it is also computing which tablespace it would go into. Since each of our partitions is an even number of months away from each other and we are using just two tablespaces, we end up using the same tablespace over and over.

If we only loaded “every other month” into this table, we would end up using only a single tablespace. We can see that the EXAMPLE tablespace can be used by adding some row that is an “odd” number of months away from our existing data:

SQL> insert into audit_trail (ts,data)
     values ( to_date('15-mar-2020','dd-mon-yyyy'), 'xx' );
1 row created.
SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
     from user_tab_partitions a, user_part_tables b
     where a.table_name = 'AUDIT_TRAIL'
     and a.table_name = b.table_name
     order by a.partition_position;

Now we have used the EXAMPLE tablespace. This new partition was slid in between the two existing partitions and will contain all of our March 2020 data.

You might be asking, “What happens if I roll back at this point?” If we were to roll back, it should be obvious that the AUDIT_TRAIL rows we just inserted would go away:
SQL> select * from audit_trail;
SQL> rollback;
Rollback complete.
SQL> select * from audit_trail;
no rows selected

But what isn’t clear immediately is what would happen to the partitions we added: do they stay, or will they go away as well? A quick query will verify that they stay:

SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
     from user_tab_partitions a, user_part_tables b
     where a.table_name = 'AUDIT_TRAIL'
     and a.table_name = b.table_name
     order by a.partition_position;

PARTITION_ TABLESPACE HIGH_VALUE                    INTERVAL

As soon as they are created, they are committed and visible. These partitions are created using a recursive transaction, a transaction executed separate and distinct from any transaction you might already be performing.

When we went to insert the row and the database discovered that the partition we needed did not exist, the database immediately started a new transaction, updated the data dictionary to reflect the new partition’s existence, and committed its work.

It must do this, or there would be severe contention (serialization) on many inserts as other transactions would have to wait for us to commit to be able to see this new partition. Therefore, this DDL is done outside of your existing transaction, and the partitions will persist.

You might have noticed that the database names the partition for us; SYS_P1625 is the name of the newest partition. The names are neither sortable nor very meaningful in the sense most people would be used to.

They show the order in which the partitions were added to the table (although you cannot rely on that always being true; it is subject to change) but not much else. Normally, in a range partitioned table, the DBA would have named the partition using some naming scheme and in most cases would have made the partition names sortable.

For example, the February data would be in a partition named PART_2020_02 (using a format of PART_yyyy_mm), March would be in PART_2020_03, and so on. With interval partitioning, you have no control over the partition names as they are created, but you can easily rename them afterward if you like.
For example, we could query out the HIGH_VALUE string and using dynamic SQL convert that into nicely formatted, meaningful names.

We can do this because we understand how we’d like the names formatted; the database does not. For example:

SQL> declare
       l_str varchar2(4000);
     begin
       for x in ( select a.partition_name, a.tablespace_name, a.high_value
                  from user_tab_partitions a
                  where a.table_name = 'AUDIT_TRAIL'
                  and a.interval = 'YES'
                  and a.partition_name like 'SYS\_P%' escape '\' )
       loop
         execute immediate
           'select to_char( ' || x.high_value ||
           '-numtodsinterval(1,''second''), ''"PART_"yyyy_mm'' ) from dual'
           into l_str;
         execute immediate
           'alter table audit_trail rename partition "' ||
           x.partition_name || '" to "' || l_str || '"';
       end loop;
     end;
     /
PL/SQL procedure successfully completed.

So, what we’ve done is take the HIGH_VALUE and subtract one second from it. We know that the HIGH_VALUE represents the strictly less than value, so one second before its value would be a value in the range. Once we have that, we applied the format "PART_"yyyy_mm to the resulting TIMESTAMP and get a string such as PART_2020_03 for March 2020. We use that string in a rename command, and now our data dictionary looks like this:
SQL> select a.partition_name, a.tablespace_name, a.high_value,
            decode( a.interval, 'YES', b.interval ) interval
     from user_tab_partitions a, user_part_tables b
     where a.table_name = 'AUDIT_TRAIL'
     and a.table_name = b.table_name
     order by a.partition_position;

We would just run that script every now and then to rename any newly added partitions to keep the nice naming convention in place. Bear in mind, to avoid any SQL injection issues (we are using string concatenation, not bind variables; we cannot use bind variables in DDL), we would want to keep this script as an anonymous block or as an invoker’s rights routine if we decide to make it a stored procedure. That will prevent others from running SQL in our schema as if they were us, which could be a disaster.
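A hedged sketch of that invoker’s rights variant follows; the procedure name is hypothetical, and AUTHID CURRENT_USER makes it run with the caller’s privileges and against the caller’s schema:

SQL> create or replace procedure rename_audit_parts
       authid current_user
     as
       l_str varchar2(4000);
     begin
       for x in ( select partition_name, high_value
                  from user_tab_partitions
                  where table_name = 'AUDIT_TRAIL'
                  and interval = 'YES'
                  and partition_name like 'SYS\_P%' escape '\' )
       loop
         -- derive PART_yyyy_mm from the partition's HIGH_VALUE
         execute immediate
           'select to_char( ' || x.high_value ||
           '-numtodsinterval(1,''second''), ''"PART_"yyyy_mm'' ) from dual'
           into l_str;
         execute immediate
           'alter table audit_trail rename partition "' ||
           x.partition_name || '" to "' || l_str || '"';
       end loop;
     end;
     /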

Virtual Column Partitioning

Virtual column partitioning allows you to partition based on a SQL expression. This type of partitioning is useful when a table column is overloaded with multiple business values and you want to partition on a portion of that column. For example, let’s say you have a RESERVATION_CODE column in a table:
$ sqlplus eoda/foo@PDB1
SQL> create table res(reservation_code varchar2(30));
Table created.

And the first character in the RESERVATION_CODE column defines a region from which the reservation originated. For the purposes of this example, let’s say a first character of an A or C maps to the NE region, values of B map to the SW region, and values of D map to the NW region.

In this scenario, we know that the first character represents the region, and we want to be able to list partition by region. With the data as it is, it’s not practical to list partition by the RESERVATION_CODE column, whereas virtual partitioning allows us to apply a SQL function to the column and list partition by the first character. Here’s what the table definition looks like with virtual column partitioning:
SQL> drop table res;
SQL> create table res
     ( reservation_code varchar2(30),
       region as
       ( decode(substr(reservation_code,1,1),
                'A','NE',
                'C','NE',
                'B','SW',
                'D','NW') ) )
     partition by list (region)
     ( partition NE values('NE'),
       partition SW values('SW'),
       partition NW values('NW') );
Table created.

We can view the partition information via this query:
SQL> select a.table_name, a.partition_name, a.high_value
     from user_tab_partitions a, user_part_tables b
     where a.table_name = 'RES'
     and a.table_name = b.table_name
     order by a.table_name;

TABLE_NAME   PARTITION_   HIGH_VALUE
------------ ------------ ------------
RES          NE           'NE'
RES          NW           'NW'
RES          SW           'SW'

Next, some random test data is inserted into the table (your random results will be different from this example):
SQL> insert into res (reservation_code)
     select chr(64+(round(dbms_random.value(1,4)))) || level
     from dual connect by level < 10;
9 rows created.

Now let’s see how the data was partitioned:
SQL> select ‘NE’, reservation_code, region from res partition(NE) union all
select ‘SW’, reservation_code, region from res partition(SW) union all
select ‘NW’, reservation_code, region from res partition(NW);

'N RESERVATION_CODE RE

NE C3 NE
NE C5 NE
NE A6 NE
NE C8 NE
SW B2 SW
SW B7 SW
SW B9 SW
NW D1 NW
NW D4 NW

In this way, virtual column partitioning is often appropriate when there is a business requirement to partition on portions of data in a column or combinations of data from different columns (especially when there might not be an obvious way to list or range partition). The expression behind a virtual column can be a complex calculation, return a subset of a column string, combine column values, and so on.
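For instance, a hedged sketch (the SALES table and its partition names are hypothetical) of partitioning on a virtual column that combines two columns:

SQL> create table sales
     ( country varchar2(2),
       channel varchar2(10),
       amount  number,
       sales_region as ( country || '_' || channel ) )
     partition by list (sales_region)
     ( partition p_us_web values ('US_WEB'),
       partition p_other  values (default) );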