I mentioned earlier that the number of partitions should be a power of two. This is easily observed to be true. To demonstrate, we’ll set up a stored procedure to automate the creation of a hash partitioned table with N partitions (N will be a parameter). The procedure will then construct a dynamic query to retrieve the count of rows in each partition, along with a simple histogram of those counts. Lastly, it will open this query and let us see the results. The procedure starts by creating the hash partitioned table. We will use a table named T:
SQL> create or replace
procedure hash_proc( p_nhash  in  number,
                     p_cursor out sys_refcursor )
authid current_user
as
    l_text     long;
    l_template long :=
        'select $POS$ oc, ''p$POS$'' pname, count(*) cnt ' ||
        'from t partition ( $PNAME$ ) union all ';
    table_or_view_does_not_exist exception;
    pragma exception_init( table_or_view_does_not_exist, -942 );
begin
    begin
        execute immediate 'drop table t';
    exception when table_or_view_does_not_exist
        then null;
    end;

    execute immediate '
    CREATE TABLE t ( id )
    partition by hash(id)
    partitions ' || p_nhash || '
    as
    select rownum
      from all_objects';
Next, we will dynamically construct a query to retrieve the count of rows by partition. It does this using the template query defined earlier. For each partition, we’ll gather the count using the partition-extended table name and union all of the counts together:
    for x in ( select partition_name pname,
                      PARTITION_POSITION pos
                 from user_tab_partitions
                where table_name = 'T'
                order by partition_position )
    loop
        l_text := l_text ||
                  replace(
                  replace(l_template,
                          '$POS$', x.pos),
                          '$PNAME$', x.pname );
    end loop;
Now, we’ll take that query and select out the partition label (PNAME, the letter p followed by the partition position) and the count of rows in that partition (CNT). Using RPAD, we’ll construct a rather rudimentary but effective histogram:
    open p_cursor for
       'select pname, cnt,
          substr( rpad(''*'',30*round( cnt/max(cnt)over(),2),''*''),1,30) hg
          from (' ||
          substr( l_text, 1, length(l_text)-11 )
                  || ') order by oc';
end;
/
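The histogram expression can be tried in isolation to see how the bar lengths are derived. The following is a minimal sketch, not part of the procedure: the COUNTS subquery and its three values are hypothetical stand-ins for the per-partition counts. Each bar is scaled against the largest count, so the largest partition gets 30 asterisks and the others get proportionally fewer.

```sql
-- Hypothetical counts standing in for the per-partition query results.
with counts as (
  select 'p1' pname,  500 cnt from dual union all
  select 'p2' pname, 1000 cnt from dual union all
  select 'p3' pname,  800 cnt from dual
)
select pname, cnt,
       -- cnt/max(cnt) over () is this row's share of the largest count;
       -- multiplied by 30 it becomes the bar length passed to RPAD.
       substr( rpad( '*', 30*round( cnt/max(cnt)over(), 2 ), '*' ), 1, 30 ) hg
  from counts
 order by pname;
```

With these values, p1 should get a bar of 15 asterisks, p2 a bar of 30, and p3 a bar of 24.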
If we run hash_proc with an input of 4, for four hash partitions, we would expect to see output similar to the following:
SQL> variable x refcursor
SQL> set autoprint on
SQL> exec hash_proc( 4, :x );
PL/SQL procedure successfully completed.
The simple histogram depicted shows a nice, even distribution of data over each of the four partitions. Each has close to the same number of rows in it. However, if we simply go from four to five hash partitions, we’ll see the following:
SQL> exec hash_proc( 5, :x );
PL/SQL procedure successfully completed.
PN CNT HG
This histogram points out that the first and last partitions have just half as many rows as the interior partitions. The data is not very evenly distributed at all. We’ll see the trend continue for six and seven hash partitions:
SQL> exec hash_proc( 6, :x );
PL/SQL procedure successfully completed.
PN CNT HG
SQL> exec hash_proc( 7, :x );
PL/SQL procedure successfully completed.
PN CNT HG
As soon as we get back to a number of hash partitions that is a power of two, we achieve the goal of even distribution once again:
SQL> exec hash_proc( 8, :x );
PL/SQL procedure successfully completed.
PN CNT HG
If we continue this experiment up to 16 partitions, we would see the same effects for 9 through 15 partitions (a skewing of the data to the interior partitions, away from the edges), and then, upon reaching 16 partitions, we would see a flattening out again. The same would be true again up to 32 partitions, and then 64, and so on. This example just points out the importance of using a power of two as the number of hash partitions.
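One way to build intuition for the skew is a small model. This is an assumption chosen because it reproduces the pattern described here, not a statement of Oracle's internal algorithm. Suppose rows hash evenly into NB buckets, where NB is the smallest power of two greater than or equal to N, and any bucket numbered N or higher folds back onto the bucket NB/2 positions below it. The sketch counts how many buckets each partition position receives. The names PARAMS, BUCKETS, MAPPED, N, and NB are all hypothetical.

```sql
-- Assumed model only: buckets beyond the last partition fold back by nb/2.
with params as (
  select 5 n,    -- number of hash partitions to model
         8 nb    -- smallest power of two >= n (set by hand)
    from dual
),
buckets as (
  -- one row per bucket, numbered 0 .. nb-1
  select level - 1 b, n, nb
    from params
 connect by level <= nb
),
mapped as (
  -- buckets 0 .. n-1 map directly; the rest fold back by nb/2
  select case when b < n then b else b - nb/2 end + 1 pos
    from buckets
)
select pos partition_position, count(*) buckets_received
  from mapped
 group by pos
 order by pos;
```

Under this model, with N = 5, positions 1 and 5 receive one bucket each while positions 2 through 4 receive two, which lines up with the first and last partitions holding half as many rows as the interior ones. Setting N = 8 and NB = 8 gives every position exactly one bucket.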