This question was recently asked by a colleague:
"I have an index on a column, but when I run a DISTINCT query on that column, Oracle still performs a full table scan. Why?”
At first glance, this feels counter-intuitive. If an index exists, Oracle should use it, right?
The answer is simple, but understanding why Oracle behaves this way is important for every DBA.
In this post, we’ll walk through a small test case and see exactly when and why Oracle uses (or ignores) an index for a DISTINCT query.
Test Setup
Let’s create a sample table with some data and an index on the column we’ll use in the DISTINCT query.
SQL> create table some_data as select trunc(dbms_random.value(1,100)) val, a.* from dba_objects a;
Table created.
SQL> @desc some_data
Name Null? Type
----------------------------------------- -------- ----------------------------
VAL NUMBER
OWNER VARCHAR2(128)
OBJECT_NAME VARCHAR2(128)
SUBOBJECT_NAME VARCHAR2(128)
OBJECT_ID NUMBER
DATA_OBJECT_ID NUMBER
OBJECT_TYPE VARCHAR2(23)
CREATED DATE
LAST_DDL_TIME DATE
TIMESTAMP VARCHAR2(19)
STATUS VARCHAR2(7)
TEMPORARY VARCHAR2(1)
GENERATED VARCHAR2(1)
SECONDARY VARCHAR2(1)
NAMESPACE NUMBER
EDITION_NAME VARCHAR2(128)
SHARING VARCHAR2(18)
EDITIONABLE VARCHAR2(1)
ORACLE_MAINTAINED VARCHAR2(1)
APPLICATION VARCHAR2(1)
DEFAULT_COLLATION VARCHAR2(100)
DUPLICATED VARCHAR2(1)
SHARDED VARCHAR2(1)
CREATED_APPID NUMBER
CREATED_VSNID NUMBER
MODIFIED_APPID NUMBER
MODIFIED_VSNID NUMBER
SQL> create index some_data_val_indx on some_data(val);
Index created.
SQL>
SQL>
SQL> exec dbms_stats.gather_table_stats(null, 'SOME_DATA', cascade=>true);
PL/SQL procedure successfully completed.
SQL> set autot trace
SQL> select distinct val from some_data;
99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 443606636
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 297 | 632 (1)| 00:00:01 |
| 1 | HASH UNIQUE | | 99 | 297 | 632 (1)| 00:00:01 |
| 2 | TABLE ACCESS FULL| SOME_DATA | 116K| 341K| 628 (1)| 00:00:01 |
-------------------------------------------------------------
Statistics
----------------------------------------------------------
36 recursive calls
0 db block gets
2340 consistent gets
0 physical reads
0 redo size
2207 bytes sent via SQL*Net to client
463 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
99 rows processed
Even though an index exists on VAL, Oracle chooses a full table scan.
What If We Force the Index Using a Hint?
SQL> select /*+ index(some_data) */ distinct val from some_data;
99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 443606636
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 297 | 632 (1)| 00:00:01 |
| 1 | HASH UNIQUE | | 99 | 297 | 632 (1)| 00:00:01 |
| 2 | TABLE ACCESS FULL| SOME_DATA | 116K| 341K| 628 (1)| 00:00:01 |
-------------------------------------------------------------
Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
-------------------------------------------------------------
2 - SEL$1 / SOME_DATA@SEL$1
U - index(some_data)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2312 consistent gets
0 physical reads
0 redo size
2207 bytes sent via SQL*Net to client
486 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
99 rows processed
Result
• Same execution plan
• Index hint is ignored
• Full table scan is still chosen
Oracle clearly doesn’t want to use the index here.
Why Is Oracle Ignoring the Index?
The reason is straightforward:
A B-Tree index does NOT store NULL values.
Since VAL is nullable, Oracle cannot guarantee that all distinct values can be derived by scanning only the index. To be 100% correct, it must scan the table to account for possible NULLs.
That’s why:
• Full table scan is chosen
• Index hints are ignored
What If the Column Is NOT NULL?
Let’s change the column definition.
SQL> alter table some_data modify(val not null);
Table altered.
SQL> select distinct val from some_data;
99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 297 | 69 (8)| 00:00:01 |
| 1 | HASH UNIQUE | | 99 | 297 | 69 (8)| 00:00:01 |
| 2 | INDEX FAST FULL SCAN| SOME_DATA_VAL_INDX | 116K| 341K| 65 (2)| 00:00:01 |
-------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
236 consistent gets
0 physical reads
0 redo size
2207 bytes sent via SQL*Net to client
463 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
99 rows processed
Oracle now uses an INDEX FAST FULL SCAN, resulting in:
• Lower cost
• Fewer consistent gets
• Multiblock reads
• No table access
This is a significant improvement.
What If the Column Cannot Be Changed to NOT NULL?
In real systems, modifying column definitions is not always possible.
In that case, we can help Oracle by explicitly excluding NULLs.
SQL> alter table some_data modify (val null);
Table altered.
SQL> select /*+ index_ffs(some_data) */ distinct val from some_data where val is not null;
99 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1329759908
-------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 99 | 297 | 69 (8)| 00:00:01 |
| 1 | HASH UNIQUE | | 99 | 297 | 69 (8)| 00:00:01 |
|* 2 | INDEX FAST FULL SCAN| SOME_DATA_VAL_INDX | 116K| 341K| 65 (2)| 00:00:01 |
-------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("VAL" IS NOT NULL)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
236 consistent gets
0 physical reads
0 redo size
2207 bytes sent via SQL*Net to client
512 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
99 rows processed
Oracle now safely uses the index because:
• NULL values are explicitly filtered
• Index contains all required rows
Key Takeaways
1. B-Tree indexes do not store NULL values
2. DISTINCT on a nullable column often results in a full table scan
3. Oracle may ignore index hints if correctness cannot be guaranteed
4. DISTINCT on a NOT NULL column can use an index efficiently
5. Adding WHERE column IS NOT NULL can enable index usage
6. INDEX FAST FULL SCAN is especially efficient when the index covers all required columns
I hope this article helped you. Your suggestions/feedback are most welcome.
Keep learning... Have a great day!!!
Thank you,
Amit Pawar
Email: amitpawar.dba@gmail.com
WhatsApp No: +91-8454841011
No comments:
Post a Comment