Technology Craving's

My technology evolution through daily learnings

Indexes and ‘IS NULL’

Came across this blog today via Twitter and got curious, as I wasn’t really aware about it (maybe it’s time to revisit the documentation). So, the next best thing to do was to kick start the XE database on my laptop and verify how it works.

As is the case with any Oracle performance scenario, this one too had it’s complications, but at the end I was happy to understand the situations in which it works and when it doesn’t.

STEP 1 : Create a sample table with enough records to test

create table idxtest as 
SELECT object_name,
  CASE MOD( rownum, 2 )
    WHEN 0
    THEN rownum
    ELSE NULL
  END AS Num
FROM all_objects;

--
-- Equal number of NULL & NOT NULL Values
--
SELECT NVL2( num, 9, - 9 ) NumCols, COUNT( * )
FROM idxtest
GROUP BY NVL2( num, 9, - 9 );

   NUMCOLS   COUNT(*)
---------- ----------
        -9       9914 
         9       9914

No indexes have been created yet. So we first create an index on the NUM column of IDXTEST and verify that if we were to execute a query with a where clause using “IS NOT NULL” condition then it indeed does a full table scan.

STEP 2 : Verify FTS on table

create table idxtest as 
--
--Create the INDEX
--
CREATE INDEX id1 ON idxtest
  ( num
  );

--
--Gather the stats on the table
--
exec SYS.dbms_stats.gather_table_stats(ownname=>'DEPTRACK',
                                       tabname =>'IDXTEST', 
                                       cascade=> TRUE);

--
--Verify Stats 
--
SELECT table_name, blocks,
  num_rows
FROM user_tables
WHERE table_name LIKE '%IDXTEST'
ORDER BY 1;

TABLE_NAME                         BLOCKS   NUM_ROWS
------------------------------ ---------- ----------
IDXTEST                                79      19828 

SELECT table_name, index_name,
  clustering_factor
FROM user_indexes
WHERE table_name LIKE '%IDXTEST'
ORDER BY 1;

TABLE_NAME                     INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ------------------------------ -----------------
IDXTEST                        ID1                                           72 

--
--Check the plan
--
SQL> set line 500
SQL> set pagesize 0
SQL> set autotrace traceonly explain
SQL> select * from idxtest where num is null;

Execution Plan
----------------------------------------------------------
Plan hash value: 3672553500

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |  9914 |   212K|    23   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| IDXTEST |  9914 |   212K|    23   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NUM" IS NULL)

The above code sample shows how the index has ignores the NULL values and that FTS is getting used inspite of an index being present.
Now we modify the index to include the NULL’s and see if it changes the plan

STEP 3 : Modify INDEX – ID1 to include NULL’s

DROP INDEX id1;

CREATE INDEX id1 ON idxtest
  ( num, 1
  );

exec SYS.dbms_stats.gather_table_stats(ownname=>'DEPTRACK',
                                       tabname =>'IDXTEST', 
                                       cascade=> TRUE);

SELECT table_name, index_name,
  clustering_factor
FROM user_indexes
WHERE table_name LIKE '%IDXTEST'
ORDER BY 1;

TABLE_NAME                     INDEX_NAME                     CLUSTERING_FACTOR
------------------------------ ------------------------------ -----------------
IDXTEST                        ID1                                          144 

SELECT table_name, blocks,
  num_rows
FROM user_tables
WHERE table_name LIKE '%IDXTEST'
ORDER BY 1;

TABLE_NAME                         BLOCKS   NUM_ROWS
------------------------------ ---------- ----------
IDXTEST                                79      19828

SQL> comment on table idxtest is '';

Comment created.

SQL> select * from idxtest where num is null;

Execution Plan
----------------------------------------------------------
Plan hash value: 3672553500

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |  9914 |   212K|    23   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| IDXTEST |  9914 |   212K|    23   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NUM" IS NULL)

So, nothing has changed. Why does this happen?
First let’s verify if the Optimizer even considered looking at the index to choose an optimal plan. We will use a 10053 trace for the same

SQL> comment on table idxtest is '';

Comment created.

SQL> ALTER SESSION SET EVENTS '10053 trace name context forever';

Session altered.

SQL> select * from idxtest where num is null;

....
SQL> ALTER SESSION SET EVENTS '10053 trace name context off';

Session altered.

--
--Trace File output (Only relevant section extracted
--

****************
QUERY BLOCK TEXT
****************
select * from idxtest where num is null
---------------------
QUERY BLOCK SIGNATURE
---------------------
signature (optimizer): qb_name=SEL$1 nbfros=1 flg=0
  fro(0): flg=0 objn=24444 hint_alias="IDXTEST"@"SEL$1"

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 2017 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM: 10 milliseconds (default is 10)
  MBRC: -1 blocks (default is 8)

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: IDXTEST  Alias: IDXTEST
    #Rows: 19828  #Blks:  79  AvgRowLen:  25.00  ChainCnt:  0.00
Index Stats::
  Index: ID1  Col#: 2 3
    LVLS: 1  #LB: 47  #DK: 9915  LB/K: 1.00  DB/K: 1.00  CLUF: 144.00
Access path analysis for IDXTEST
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for IDXTEST[IDXTEST] 
  Column (#2): NUM(
    AvgLen: 3 NDV: 9914 Nulls: 9914 Density: 0.000101 Min: 2 Max: 19828
  Table: IDXTEST  Alias: IDXTEST
    Card: Original: 19828.000000  Rounded: 9914  Computed: 9914.00  Non Adjusted: 9914.00
  Access Path: TableScan
    Cost:  23.18  Resp: 23.18  Degree: 0
      Cost_io: 23.00  Cost_cpu: 4329914
      Resp_io: 23.00  Resp_cpu: 4329914
  Access Path: index (RangeScan)
    Index: ID1
    resc_io: 97.00  resc_cpu: 4358960
    ix_sel: 0.500000  ix_sel_with_filters: 0.500000 
    Cost: 97.18  Resp: 97.18  Degree: 1
  Best:: AccessPath: TableScan
         Cost: 23.18  Degree: 1  Resp: 23.18  Card: 9914.00  Bytes: 0

***************************************

As we can see the Optimizer did evaluate both options but chose the FTS to be the best.
So, when exactly is this index helpful.
Let us change the data density for NULL’s and reduce it to 20% from the existing 50%.

DROP TABLE idxtest;

create table idxtest as 
SELECT object_name,
  CASE MOD( rownum, 5 )
    WHEN 0
    THEN NULL
    ELSE rownum
  END AS Num
FROM all_objects;

SELECT NVL2( num, 9, - 9 ) NumCols, COUNT( * )
FROM idxtest
GROUP BY NVL2( num, 9, - 9 );

   NUMCOLS   COUNT(*)
---------- ----------
        -9       3966 
         9      15866

CREATE INDEX id1 ON idxtest
  ( num, 1
  );

exec SYS.dbms_stats.gather_table_stats(ownname=>'DEPTRACK',
                                       tabname =>'IDXTEST', 
                                       cascade=> TRUE);

SQL> select * from idxtest where num is null;

Execution Plan
----------------------------------------------------------
Plan hash value: 3672553500

-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |  3966 | 91218 |    25   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| IDXTEST |  3966 | 91218 |    25   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("NUM" IS NULL)

Still FTS. We reduce further and make the NULL’s 5% of the overall numbers.

DROP TABLE idxtest;

create table idxtest as 
SELECT object_name,
  CASE MOD( rownum, 20 )
    WHEN 0
    THEN NULL
    ELSE rownum
  END AS Num
FROM all_objects;

SELECT NVL2( num, 9, - 9 ) NumCols, COUNT( * )
FROM idxtest
GROUP BY NVL2( num, 9, - 9 );

   NUMCOLS   COUNT(*)
---------- ----------
        -9        991 
         9      18841

CREATE INDEX id1 ON idxtest
  ( num, 1
  );

exec SYS.dbms_stats.gather_table_stats(ownname=>'DEPTRACK',
                                       tabname =>'IDXTEST', 
                                       cascade=> TRUE);

SQL> select * from idxtest where num is null;

Execution Plan
----------------------------------------------------------
Plan hash value: 2409506981

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   991 | 23784 |    12   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| IDXTEST |   991 | 23784 |    12   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | ID1     |   991 |       |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("NUM" IS NULL)

****************
QUERY BLOCK TEXT
****************
select * from idxtest where num is null
---------------------
QUERY BLOCK SIGNATURE
---------------------
signature (optimizer): qb_name=SEL$1 nbfros=1 flg=0
  fro(0): flg=0 objn=24504 hint_alias="IDXTEST"@"SEL$1"

-----------------------------
SYSTEM STATISTICS INFORMATION
-----------------------------
  Using NOWORKLOAD Stats
  CPUSPEEDNW: 2017 millions instructions/sec (default is 100)
  IOTFRSPEED: 4096 bytes per millisecond (default is 4096)
  IOSEEKTIM: 10 milliseconds (default is 10)
  MBRC: -1 blocks (default is 8)

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: IDXTEST  Alias: IDXTEST
    #Rows: 19832  #Blks:  86  AvgRowLen:  26.00  ChainCnt:  0.00
Index Stats::
  Index: ID1  Col#: 2 3
    LVLS: 1  #LB: 51  #DK: 18842  LB/K: 1.00  DB/K: 1.00  CLUF: 156.00
Access path analysis for IDXTEST
***************************************
SINGLE TABLE ACCESS PATH 
  Single Table Cardinality Estimation for IDXTEST[IDXTEST] 
  Column (#2): NUM(
    AvgLen: 5 NDV: 18964 Nulls: 991 Density: 0.000053 Min: 1 Max: 19832
  Table: IDXTEST  Alias: IDXTEST
    Card: Original: 19832.000000  Rounded: 991  Computed: 991.00  Non Adjusted: 991.00
  Access Path: TableScan
    Cost:  25.18  Resp: 25.18  Degree: 0
      Cost_io: 25.00  Cost_cpu: 4380524
      Resp_io: 25.00  Resp_cpu: 4380524
  Access Path: index (RangeScan)
    Index: ID1
    resc_io: 12.00  resc_cpu: 452127
    ix_sel: 0.049970  ix_sel_with_filters: 0.049970 
    Cost: 12.02  Resp: 12.02  Degree: 1
  Best:: AccessPath: IndexRange
  Index: ID1
         Cost: 12.02  Degree: 1  Resp: 12.02  Card: 991.00  Bytes: 0

***************************************

There you go, the index has been chosen by the optimizer and used.

CONCLUSION

DO not blindly change your indexes to include the NULL values. This will depend on how the NULL’s are distributed in column values and what kind of queries are executed on them. Always good to do a quick test on your application dataset before jumping to any conclusions. These same tests may vary if the data volumes are changed.

SQL Queries – Stats, Plans and Functions

In Progress

Julian Dontcheff's Database Blog

The good DBA is one who learns from his mistakes, the best DBA is one who learns from other DBA's mistakes

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Merv Adrian's IT Market Strategy

IT Industry trends: technology, products, competition and the research industry.

flashdba

Oracle databases, storage and the high-performance world of flash memory

Kevin Closson's Blog: Platforms, Databases and Storage

Platform, Database and Storage Topics

The Oracle Instructor

Explain, Exemplify, Empower

Coskan's Approach to Oracle

What I learned about Oracle

Iordan Iotzov's DBA blog

The opinions here may not necessarily reflect that of my employers - both past and present. The comments left by the reviewers are theirs alone and may not reflect my opinion. None of the advice is warranted to be free of errors and omissions. Please use at your own risk and after thorough testing in your environment.

Oracle Tips and Tricks -- David Fitzjarrell

Oracle Tips, Tricks and Information

Oracle Scratchpad

Just another Oracle weblog

Technology Craving's

My technology evolution through daily learnings

Follow

Get every new post delivered to your Inbox.

Join 229 other followers