January 2024 – How Hash Partitioning Works at Oracle Database

This shows the distribution of data, by location, into the individual partitions. We can now review some query plans to see what we could expect performance-wise:

SQL> variable x varchar2(30);
SQL> begin
       dbms_stats.set_table_stats( user, 'EMP', numrows=>100000, numblks => 10000 );
     end;
     /
PL/SQL procedure successfully completed.
SQL> explain plan for select empno, job, loc from emp where empno = :x;
Explained.
SQL> select * from table(dbms_xplan.display(null,null,'BASIC +PARTITION'));

The plan here shows an INDEX UNIQUE SCAN of the nonpartitioned index EMP_PK that was created in support of our primary key. Then there is a TABLE ACCESS BY GLOBAL INDEX ROWID, with a PSTART and PSTOP of ROWID/ROWID, meaning that when we get the ROWID from the index, it will tell us precisely which table partition to read to get this row. This index access will be as effective as on a nonpartitioned table and will perform the same amount of I/O. It is just a simple, single index unique scan followed by “get this row by rowid.” Now, let’s look at one of the other global indexes, the one on JOB:

SQL> explain plan for select empno, job, loc from emp where job = :x;
Explained.
SQL> select * from table(dbms_xplan.display);

Sure enough, we see a similar effect for the INDEX RANGE SCAN. Our indexes are used and can provide high-speed OLTP access to the underlying data. If they were partitioned, they would have to be prefixed and enforce index partition elimination; hence, they are scalable as well, meaning we can partition them and observe the same behavior. In a moment, we’ll look at what would happen if we used LOCAL indexes only.
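To make the "we can partition them" point concrete, here is a minimal sketch of rebuilding the JOB index as a prefixed, range-partitioned global index. The index partition names (JOB_P1, JOB_P2) and the split point 'M' are illustrative assumptions, not part of the original example. The sketch also assumes that all four tablespaces are online and that the bind variable :x declared earlier is still in scope.

```sql
-- Replace the single-partition global index with a two-partition one.
-- JOB_P1, JOB_P2, and the boundary 'M' are hypothetical choices.
drop index emp_job_idx;

create index emp_job_idx on emp(job)
  global partition by range(job)
  ( partition job_p1 values less than ('M'),
    partition job_p2 values less than (maxvalue)  -- the last partition must be MAXVALUE
  );

-- Same query as before; the index is prefixed on JOB, so the
-- predicate can be used to eliminate index partitions.
explain plan for select empno, job, loc from emp where job = :x;
select * from table(dbms_xplan.display(null, null, 'BASIC +PARTITION'));

-- Put things back the way the rest of the example expects them.
drop index emp_job_idx;
create index emp_job_idx on emp(job) GLOBAL;
```

With an equality predicate on JOB, we would expect the plan to visit a single index partition, with PSTART and PSTOP reported as KEY because the bind value is not known until runtime. The exact plan shape may vary by release and statistics.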

Lastly, let’s look at the area of availability. The Oracle documentation claims that globally partitioned indexes make for less available data than locally partitioned indexes. I don’t fully agree with this blanket characterization. I believe that in an OLTP system they are as highly available as a locally partitioned index. Consider the following:

SQL> alter tablespace p1 offline;
SQL> alter tablespace p2 offline;
SQL> alter tablespace p3 offline;
SQL> select empno, job, loc from emp where empno = 7782;
EMPNO JOB LOC
7782 MANAGER NEW YORK

Here, even though most of the underlying data is unavailable in the table, we can still gain access to any bit of data available via that index. As long as the EMPNO we want is in a tablespace that is available, and our GLOBAL index is available, our GLOBAL index works for us. On the other hand, if we had been using the highly available local index in the preceding case, we might have been prevented from accessing the data! This is a side effect of the fact that we partitioned on LOC but needed to query by EMPNO. We would have had to probe each local index partition and would have failed on the index partitions that were not available.
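One way to see why that lookup still succeeds is to check which tablespaces are online, where the indexes live, and which partition holds the row. The following is a sketch only. It assumes the account can see P1 through P4 in USER_TABLESPACES (a DBA account could use DBA_TABLESPACES instead) and that the indexes were created without a TABLESPACE clause, so they sit in the user's default tablespace.

```sql
-- Which of the four demo tablespaces are currently usable?
select tablespace_name, status
  from user_tablespaces
 where tablespace_name in ('P1', 'P2', 'P3', 'P4');

-- Where do the indexes on EMP live, and are they usable?
select index_name, tablespace_name, status
  from user_indexes
 where table_name = 'EMP';

-- Map one row to the table partition that stores it. The data object
-- number embedded in the ROWID identifies the partition segment.
select e.empno, o.subobject_name as partition_name
  from emp e, user_objects o
 where e.empno = 7782
   and o.object_name = 'EMP'
   and o.object_type = 'TABLE PARTITION'
   and o.data_object_id = dbms_rowid.rowid_object(e.rowid);
```

If the partition reported by the last query maps to a tablespace that is still online, the global index can hand us the ROWID and the row can be fetched, regardless of the state of the other partitions.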

Other types of queries, however, will not (and cannot) function at this point in time:
SQL> select empno, job, loc from emp where job = 'CLERK';
ERROR at line 1:
ORA-00376: file 38 cannot be read at this time
ORA-01110: data file 38: '/opt/oracle/oradata/CDB/C217E68DF48779E1E0530101007F73B9/datafile/o1_mf_p2_jcbnhfh2_.dbf'

The CLERK data is in all of the partitions, and the fact that three of the tablespaces are offline does affect us. This is unavoidable unless we had partitioned on JOB, but then we would have had the same issues with queries that needed data by LOC. Anytime you need to access the data from many different keys, you will have this issue. Oracle will give you the data whenever it can.

Note, however, that if the query can be answered from the index, avoiding the TABLE ACCESS BY ROWID, the fact that the data is unavailable is not as meaningful:
SQL> select count(*) from emp where job = 'CLERK';
COUNT(*)

Since Oracle didn’t need the table in this case, the fact that most of the partitions were offline doesn’t affect this query (assuming the index isn’t in one of the offline tablespaces of course). As this type of optimization (i.e., answer the query using just the index) is common in an OLTP system, there will be many applications that are not affected by the data that is offline. All we need to do now is make the offline data available as fast as possible (restore it and recover it).
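In this demonstration the tablespaces were taken offline cleanly, so getting the data back is just a matter of bringing them online again. A sketch, assuming no media failure has actually occurred (after a real failure, the restore and recovery would have to come first):

```sql
-- Bring the three offline tablespaces back.
alter tablespace p1 online;
alter tablespace p2 online;
alter tablespace p3 online;

-- Confirm their status (assumes they are visible in USER_TABLESPACES).
select tablespace_name, status
  from user_tablespaces
 where tablespace_name in ('P1', 'P2', 'P3', 'P4');

-- The query that failed with ORA-00376 should now be able to reach
-- every partition it needs.
select empno, job, loc from emp where job = 'CLERK';
```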

An OLTP system is characterized by the frequent occurrence of many small read and write transactions. In general, fast access to the row (or rows) you need is paramount. Data integrity is vital. Availability is also very important. Global indexes make sense in many cases in OLTP systems. Table data can be partitioned by only one key—one set of columns. However, you may need to access the data in many different ways. You might partition EMPLOYEE data by LOCATION in the table, but you still need fast access to EMPLOYEE data by

• DEPARTMENT: Departments are geographically dispersed. There is no relationship between a department and a location.
• EMPLOYEE_ID: While an employee ID will determine a location, you don’t want to have to search by EMPLOYEE_ID and LOCATION; hence, partition elimination cannot take place on the index partitions. Also, EMPLOYEE_ID by itself must be unique.
• JOB_TITLE: There is no relationship between JOB_TITLE and LOCATION. All JOB_TITLE values may appear in any LOCATION.

There is a need to access the EMPLOYEE data by many different keys in different places in the application, and speed is paramount. In a data warehouse, we might just use locally partitioned indexes on these keys and use parallel index range scans to collect a large amount of data fast. In these cases, we don’t necessarily need to use index partition elimination. In an OLTP system, however, we do need to use it. Parallel query is not appropriate for these systems; we need to provide the indexes appropriately. Therefore, we will need to make use of global indexes on certain fields.
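To see what giving up index partition elimination looks like, the JOB index from the EMP example can be swapped for a local one and the plan compared. This is a sketch under assumptions: all four tablespaces are online, the bind variable :x is declared, and the global index EMP_JOB_IDX exists and has to be dropped first because two indexes cannot share the same column list.

```sql
-- Swap the global index on JOB for a local (equipartitioned) one.
drop index emp_job_idx;
create index emp_job_idx on emp(job) LOCAL;

-- The table is partitioned by LOC, so a predicate on JOB alone
-- gives the optimizer nothing to eliminate partitions with.
explain plan for select empno, job, loc from emp where job = :x;
select * from table(dbms_xplan.display(null, null, 'BASIC +PARTITION'));

-- Restore the global index used by the rest of the example.
drop index emp_job_idx;
create index emp_job_idx on emp(job) GLOBAL;
```

We would expect the plan to include a PARTITION RANGE ALL step with a PSTART of 1 and a PSTOP of 4, meaning every index partition is probed for each lookup. That is tolerable for a large parallel scan in a warehouse, but it multiplies the work of every small OLTP lookup.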

The following are the goals we need to meet:
• Fast access
• Data integrity
• Availability

Global indexes can help us accomplish these goals in an OLTP system. We will probably not be doing sliding windows, auditing aside for a moment. We will not be splitting partitions (unless we have a scheduled downtime), we will not be moving data, and so on. The operations we perform in a data warehouse are not done on a live OLTP system in general.

Here is a small example that shows how we can achieve the three goals just listed with global indexes. I am going to use simple, single partition global indexes, but the results would not be different with global indexes in multiple partitions (except for the fact that availability and manageability would increase as we added index partitions). We start by creating tablespaces P1, P2, P3, and P4, then create a table that is range partitioned by location, LOC, according to our rules, which place all LOC values less than ‘C’ into partition P1, those less than ‘D’ into partition P2, and so on:

$ sqlplus eoda/foo@PDB1
SQL> create tablespace p1 datafile size 1m autoextend on next 1m;
Tablespace created.
SQL> create tablespace p2 datafile size 1m autoextend on next 1m;
Tablespace created.
SQL> create tablespace p3 datafile size 1m autoextend on next 1m;
Tablespace created.
SQL> create tablespace p4 datafile size 1m autoextend on next 1m;
Tablespace created.
SQL> create table emp
     (EMPNO NUMBER(4) NOT NULL,
      ENAME VARCHAR2(10),
      JOB VARCHAR2(9),
      MGR NUMBER(4),
      HIREDATE DATE,
      SAL NUMBER(7,2),
      COMM NUMBER(7,2),
      DEPTNO NUMBER(2) NOT NULL,
      LOC VARCHAR2(13) NOT NULL
     )
     partition by range(loc)
     (
      partition p1 values less than('C') tablespace p1,
      partition p2 values less than('D') tablespace p2,
      partition p3 values less than('N') tablespace p3,
      partition p4 values less than('Z') tablespace p4
     );
Table created.

We alter the table to add a constraint on the primary key column:
SQL> alter table emp add constraint emp_pk primary key(empno);
Table altered.

A side effect of this is that there exists a unique index on the EMPNO column. This shows we can support and enforce data integrity, one of our goals. Finally, we create two more global indexes on DEPTNO and JOB to facilitate accessing records quickly by those attributes:

SQL> create index emp_job_idx on emp(job) GLOBAL;
Index created.
SQL> create index emp_dept_idx on emp(deptno) GLOBAL;
Index created.
SQL> insert into emp
     select e.*, d.loc
       from scott.emp e, scott.dept d
      where e.deptno = d.deptno;
14 rows created.
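With rows loaded, the data integrity goal can be checked directly. The sketch below tries to reuse an existing EMPNO with a LOC value that would land in a different partition. The values 10 and 'BOSTON' are arbitrary illustrative choices, and the exact error text may differ by release.

```sql
-- 7782 was loaded from SCOTT.EMP above. 'BOSTON' sorts below 'C', so
-- this row would go to partition P1, not the partition that already
-- holds EMPNO 7782.
insert into emp (empno, deptno, loc) values (7782, 10, 'BOSTON');
-- Expected to fail with ORA-00001 (unique constraint EMP_PK violated),
-- because the single global unique index covers all partitions.

-- Nothing should have been inserted, but roll back to be safe.
rollback;
```

The point of the check is that uniqueness of EMPNO is enforced across the whole table, even though EMPNO is not part of the partition key.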

Let’s see what is in each partition:
SQL> break on pname skip 1
SQL> select 'p1' pname, empno, job, loc from emp partition(p1)
     union all
     select 'p2' pname, empno, job, loc from emp partition(p2)
     union all
     select 'p3' pname, empno, job, loc from emp partition(p3)
     union all
     select 'p4' pname, empno, job, loc from emp partition(p4);
