Dion Cho – Oracle Performance Storyteller

We are natural born scientists

Interesting case of the restriction of global hints

with one comment

One of my colleagues sent me the following test case, in which he couldn't control the join order with hints.

1. Create objects – table t1, t2 and t3

SQL> create table t1(c1 number, c2 number);

Table created.

SQL> create table t2(c1 number, c2 number);

Table created.

SQL> create table t3(c1 number, c2 number);

Table created.

2. Now let's set the join order to T1->T2->T3 using the global hint convention. But Oracle does not behave as expected.

SQL> explain plan for
  2  select * from
  3  (
  4  select
  5  	     /*+ leading(v.t1 v.t2 t3) */
  6  	     v.c1 as v_c1,
  7  	     v.c2 as v_c2,
  8  	     t3.c2 as t3_c2
  9  from
 10  	     (select
 11  		     t1.c1,
 12  		     t2.c2
 13  	     from
 14  		     t1, t2
 15  	     where
 16  		     t1.c1 = t2.c1) v,
 17  	     t3
 18  where
 19  	     v.c1 = t3.c1
 20  ) x
 21  ;

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |    65 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN            |      |     1 |    65 |     7  (15)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN|      |     1 |    52 |     4   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | T2   |     1 |    26 |     2   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |      |     1 |    26 |     2   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL | T3   |     1 |    26 |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS FULL   | T1   |     1 |    13 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

It really seems that the optimizer does not understand a global hint when it mixes the global and non-global conventions (t3 here). Yes, you could place the hints inside the inline view instead, but what if you want to control the join order with the global hint convention alone?
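For completeness, the inline-view alternative mentioned above could be sketched like this. This variant is not part of the original test case, and since the inline view may be merged by the optimizer, the hint placement is not guaranteed to survive transformation:

```sql
-- Sketch only: the LEADING hint for t1/t2 moves into the view itself,
-- leaving the outer query to hint only the join order between v and t3.
explain plan for
select /*+ leading(v t3) */
       v.c1 as v_c1,
       v.c2 as v_c2,
       t3.c2 as t3_c2
from
       (select /*+ leading(t1 t2) */
               t1.c1,
               t2.c2
        from   t1, t2
        where  t1.c1 = t2.c1) v,
       t3
where  v.c1 = t3.c1;
```

This works only as long as the optimizer honors the hints across view merging, which is exactly the fragility that motivates the global hint conventions discussed next.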

3. The cure he tried was Oracle's internal global hint convention. You can see what this internal convention looks like by using the 'ADVANCED' format option of the DBMS_XPLAN.DISPLAY function.

select * from table(dbms_xplan.display(null, null, 'advanced'));
...

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$5C160134
   3 - SEL$5C160134 / T1@SEL$3
   4 - SEL$5C160134 / T2@SEL$3
   5 - SEL$5C160134 / T3@SEL$2

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      USE_HASH(@"SEL$5C160134" "T3"@"SEL$2")
      USE_HASH(@"SEL$5C160134" "T2"@"SEL$3")
      LEADING(@"SEL$5C160134" "T1"@"SEL$3" "T2"@"SEL$3" "T3"@"SEL$2")
      FULL(@"SEL$5C160134" "T3"@"SEL$2")
      FULL(@"SEL$5C160134" "T2"@"SEL$3")
      FULL(@"SEL$5C160134" "T1"@"SEL$3")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      MERGE(@"SEL$3")
      OUTLINE(@"SEL$335DD26A")
      OUTLINE(@"SEL$1")
      MERGE(@"SEL$335DD26A")
      OUTLINE_LEAF(@"SEL$5C160134")
      ALL_ROWS
      DB_VERSION('11.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

4. With this ugly global hint convention, the join order is now fully controllable.

SQL> explain plan for
  2  select * from
  3  (
  4  select
  5  	     /*+ LEADING(@"SEL$5C160134" "T1"@"SEL$3" "T2"@"SEL$3" "T3"@"SEL$2" ) */
  6  	     v.c1 as v_c1,
  7  	     v.c2 as v_c2,
  8  	     t3.c2 as t3_c2
  9  from
 10  	     (select
 11  		     t1.c1,
 12  		     t2.c2
 13  	     from
 14  		     t1, t2
 15  	     where
 16  		     t1.c1 = t2.c1) v,
 17  	     t3
 18  where
 19  	     v.c1 = t3.c1
 20  ) x
 21  ;

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    65 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN          |      |     1 |    65 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN         |      |     1 |    39 |     5  (20)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL | T3   |     1 |    26 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

5. A better approach is the QB_NAME hint, which provides better readability and flexibility.

SQL> explain plan for
  2  select * from
  3  (
  4  select
  5  	     /*+ leading(t1@inline t2@inline t3) */
  6  	     v.c1 as v_c1,
  7  	     v.c2 as v_c2,
  8  	     t3.c2 as t3_c2
  9  from
 10  	     (select /*+ qb_name(inline) */
 11  		     t1.c1,
 12  		     t2.c2
 13  	     from
 14  		     t1, t2
 15  	     where
 16  		     t1.c1 = t2.c1) v,
 17  	     t3
 18  where
 19  	     v.c1 = t3.c1
 20  ) x
 21  ;

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     1 |    65 |     7  (15)| 00:00:01 |
|*  1 |  HASH JOIN          |      |     1 |    65 |     7  (15)| 00:00:01 |
|*  2 |   HASH JOIN         |      |     1 |    39 |     5  (20)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   |     1 |    13 |     2   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL | T3   |     1 |    26 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Let me give my thanks to my colleague, who sent this interesting test case.

PS) Note that the QB_NAME hint also has some restrictions; in such cases, Oracle's internal global hint convention would be a good alternative.

Written by Dion Cho

December 17, 2010 at 5:21 am

Posted in Uncategorized


Interpreting Parallel MERGE Statement

with one comment

This post is inspired by one of my colleagues, who had a little difficulty interpreting a parallel MERGE execution plan.

This is the step-by-step explanation I gave him.

1. Create tables.

SQL> create table t1
  2  as
  3  select
  4  	      level as c1,
  5  	      level as c2,
  6  	      rpad('x',100) as c3
  7  from
  8  	      dual
  9  connect by level <= 10000
 10  ;

Table created.

SQL> create table t2
  2  as
  3  select
  4  	      level as c1,
  5  	      level as c2,
  6  	      rpad('x', 100) as c3
  7  from
  8  	      dual
  9  connect by level <= 10000
 10  ;

Table created.

2. Would this MERGE statement run in parallel?

SQL> explain plan for
  2  merge /*+ parallel */ into t1
  3   using (select c1, c2 from t2) t2
  4   on (t1.c1 = t2.c1)
  5  when matched then
  6  	      update set t1.c2 = t1.c2
  7  when not matched then
  8  	      insert(c1, c2) values(t2.c1, t2.c2)
  9  ;

Explained.

-------------------------------------------------------
| Id  | Operation                  | Name     | Rows  |
-------------------------------------------------------
|   0 | MERGE STATEMENT            |          |  9356 |
|   1 |  MERGE                     | T1       |       |
|   2 |   PX COORDINATOR           |          |       |
|   3 |    PX SEND QC (RANDOM)     | :TQ10001 |  9356 |
|   4 |     VIEW                   |          |       |
|*  5 |      HASH JOIN OUTER       |          |  9356 |
|   6 |       PX BLOCK ITERATOR    |          |  9356 |
|   7 |        TABLE ACCESS FULL   | T2       |  9356 |
|   8 |       BUFFER SORT          |          |       |
|   9 |        PX RECEIVE          |          | 11234 |
|  10 |         PX SEND BROADCAST  | :TQ10000 | 11234 |
|  11 |          PX BLOCK ITERATOR |          | 11234 |
|  12 |           TABLE ACCESS FULL| T1       | 11234 |
-------------------------------------------------------

The answer is NO. As you can see from operation #2, the MERGE part itself runs serially. It is the SELECT part that runs in parallel.

3. The reason might be that parallel DML is not enabled. Let's see what happens when we enable parallel DML.

SQL> alter session enable parallel dml;

Session altered.

SQL> explain plan for
  2  merge /*+ parallel */ into t1
  3   using (select c1, c2 from t2) t2
  4   on (t1.c1 = t2.c1)
  5  when matched then
  6  	      update set t1.c2 = t1.c2
  7  when not matched then
  8  	      insert(c1, c2) values(t2.c1, t2.c2)
  9  ;

-------------------------------------------------------
| Id  | Operation                  | Name     | Rows  |
-------------------------------------------------------
|   0 | MERGE STATEMENT            |          |  9356 |
|   1 |  MERGE                     | T1       |       |
|   2 |   PX COORDINATOR           |          |       |
|   3 |    PX SEND QC (RANDOM)     | :TQ10001 |  9356 |
|   4 |     VIEW                   |          |       |
|*  5 |      HASH JOIN OUTER       |          |  9356 |
|   6 |       PX BLOCK ITERATOR    |          |  9356 |
|   7 |        TABLE ACCESS FULL   | T2       |  9356 |
|   8 |       BUFFER SORT          |          |       |
|   9 |        PX RECEIVE          |          | 11234 |
|  10 |         PX SEND BROADCAST  | :TQ10000 | 11234 |
|  11 |          PX BLOCK ITERATOR |          | 11234 |
|  12 |           TABLE ACCESS FULL| T1       | 11234 |
-------------------------------------------------------

Still, the MERGE part runs serially.

4. Maybe a parallel MERGE needs a more specific hint, which is quite natural because the MERGE statement involves two tables. Let me make the PARALLEL hint more specific.

SQL> explain plan for
  2  merge /*+ parallel(t1) */ into t1
  3   using (select c1, c2 from t2) t2
  4   on (t1.c1 = t2.c1)
  5  when matched then
  6  	      update set t1.c2 = t1.c2
  7  when not matched then
  8  	      insert(c1, c2) values(t2.c1, t2.c2)
  9  ;

----------------------------------------------------
| Id  | Operation                       | Name     |
----------------------------------------------------
|   0 | MERGE STATEMENT                 |          |
|   1 |  PX COORDINATOR                 |          |
|   2 |   PX SEND QC (RANDOM)           | :TQ10003 |
|   3 |    MERGE                        | T1       |
|   4 |     PX RECEIVE                  |          |
|   5 |      PX SEND HYBRID (ROWID PKEY)| :TQ10002 |
|   6 |       VIEW                      |          |
|*  7 |        HASH JOIN OUTER BUFFERED |          |
|   8 |         BUFFER SORT             |          |
|   9 |          PX RECEIVE             |          |
|  10 |           PX SEND HASH          | :TQ10000 |
|  11 |            TABLE ACCESS FULL    | T2       |
|  12 |         PX RECEIVE              |          |
|  13 |          PX SEND HASH           | :TQ10001 |
|  14 |           PX BLOCK ITERATOR     |          |
|  15 |            TABLE ACCESS FULL    | T1       |
----------------------------------------------------

Now, the MERGE part really runs in parallel!

5. One more thing you should be careful about. Let me create an index on the target table t1 and see what happens on the plan.

SQL> create index t1_n1 on t1(c1);

Index created.

SQL> explain plan for
  2  merge /*+ parallel(t1) */ into t1
  3   using (select c1, c2 from t2) t2
  4   on (t1.c1 = t2.c1)
  5  when matched then
  6  	      update set t1.c2 = t1.c2
  7  when not matched then
  8  	      insert(c1, c2) values(t2.c1, t2.c2)
  9  ;

---------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  |
---------------------------------------------------------------
|   0 | MERGE STATEMENT                    |          |  9356 |
|   1 |  PX COORDINATOR                    |          |       |
|   2 |   PX SEND QC (RANDOM)              | :TQ10004 |  9356 |
|   3 |    INDEX MAINTENANCE               | T1       |       |
|   4 |     PX RECEIVE                     |          |  9356 |
|   5 |      PX SEND RANGE                 | :TQ10003 |  9356 |
|   6 |       MERGE                        | T1       |       |
|   7 |        PX RECEIVE                  |          |  9356 |
|   8 |         PX SEND HYBRID (ROWID PKEY)| :TQ10002 |  9356 |
|   9 |          VIEW                      |          |       |
|* 10 |           HASH JOIN OUTER BUFFERED |          |  9356 |
|  11 |            BUFFER SORT             |          |       |
|  12 |             PX RECEIVE             |          |  9356 |
|  13 |              PX SEND HASH          | :TQ10000 |  9356 |
|  14 |               TABLE ACCESS FULL    | T2       |  9356 |
|  15 |            PX RECEIVE              |          | 11234 |
|  16 |             PX SEND HASH           | :TQ10001 | 11234 |
|  17 |              PX BLOCK ITERATOR     |          | 11234 |
|  18 |               TABLE ACCESS FULL    | T1       | 11234 |
---------------------------------------------------------------

Now we have a new operation called INDEX MAINTENANCE, and we need one more table queue (:TQ10004) to parallelize the index maintenance itself. But why do we have index maintenance here?

  • Parallel MERGE is composed of parallel update and parallel insert.
  • Parallel insert is direct path insert.
  • Indexes need to be maintained after direct path insert.

Oracle needs to maintain the indexes after the direct path insert completes. This is controlled by the _idl_conventional_index_maintenance hidden parameter, which you'd probably never have a reason to modify.

SQL> select * from table(tpack.param('_idl_conventional_index_maintenance'));

NAME                                VALUE
----------------------------------- ----------------------------------------
Name #1                             _idl_conventional_index_maintenance
  Value                             TRUE
  Is Default                        TRUE
  Sess Modifiable                   false
  Sys Modifiable                    false
  Description                       enable conventional index maintenance fo
                                    r insert direct load
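If TPack is not available, a rough equivalent (connected as SYS, since the x$ fixed tables are not exposed to normal users) would be the classic hidden-parameter query against the fixed tables:

```sql
-- Sketch: read a hidden parameter directly from the fixed tables (as SYS).
select i.ksppinm  as name,
       v.ksppstvl as value,
       v.ksppstdf as is_default
from   x$ksppi  i,
       x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm = '_idl_conventional_index_maintenance';
```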

The INDEX MAINTENANCE operation does not disappear even when the index is made unusable, but I believe that Oracle would skip the maintenance of unusable indexes at runtime.

SQL> alter index t1_n1 unusable;

Index altered.

SQL> explain plan for
  2  merge /*+ parallel(t1) */ into t1
  3   using (select c1, c2 from t2) t2
  4   on (t1.c1 = t2.c1)
  5  when matched then
  6  	      update set t1.c2 = t1.c2
  7  when not matched then
  8  	      insert(c1, c2) values(t2.c1, t2.c2)
  9  ;

---------------------------------------------------------------
| Id  | Operation                          | Name     | Rows  |
---------------------------------------------------------------
|   0 | MERGE STATEMENT                    |          |  9356 |
|   1 |  PX COORDINATOR                    |          |       |
|   2 |   PX SEND QC (RANDOM)              | :TQ10004 |  9356 |
|   3 |    INDEX MAINTENANCE               | T1       |       |
|   4 |     PX RECEIVE                     |          |  9356 |
|   5 |      PX SEND RANGE                 | :TQ10003 |  9356 |
|   6 |       MERGE                        | T1       |       |
|   7 |        PX RECEIVE                  |          |  9356 |
|   8 |         PX SEND HYBRID (ROWID PKEY)| :TQ10002 |  9356 |
|   9 |          VIEW                      |          |       |
|* 10 |           HASH JOIN OUTER BUFFERED |          |  9356 |
|  11 |            BUFFER SORT             |          |       |
|  12 |             PX RECEIVE             |          |  9356 |
|  13 |              PX SEND HASH          | :TQ10000 |  9356 |
|  14 |               TABLE ACCESS FULL    | T2       |  9356 |
|  15 |            PX RECEIVE              |          | 11234 |
|  16 |             PX SEND HASH           | :TQ10001 | 11234 |
|  17 |              PX BLOCK ITERATOR     |          | 11234 |
|  18 |               TABLE ACCESS FULL    | T1       | 11234 |
---------------------------------------------------------------

I hope this helped him understand how to interpret the execution plan of parallel DML. :)

Written by Dion Cho

December 10, 2010 at 4:19 am

Posted in Parallel Execution

Making self deadlock on the library cache pin

with 4 comments

It's very easy to make your session deadlock itself on a library cache pin. Let me prove it.

Step 1. Make a dummy procedure called TEST_PROC1.

TPACK@ukja1120> select * from v$version where rownum = 1;

BANNER
-----------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

create or replace procedure test_proc1 
is
begin
	null;
end;
/

Step 2. Execute a simple PL/SQL block which executes TEST_PROC1 and then compiles it.

TPACK@ukja1120> begin
  2     test_proc1;
  3
  4     execute immediate 'alter procedure test_proc1 compile';
  5
  6  end;
  7  /

...
(Hang)

Very simple, isn’t it?

ASH clearly shows that the session is waiting for the library cache pin to be released, but the pin is being held by the session itself.

select * 
from (
	select
		h.session_id as sid,
		to_char(h.sample_time,'mi:ss') as sample_time,
		h.sql_id,
		(select sql_text from v$sqlarea a where a.sql_id = h.sql_id) as sql_text,
		event,
		blocking_session as blocker
	from
		v$active_session_history h
	where
		h.session_id = &sid
	order by h.sample_time desc			
) where rownum <= 20	
;					

 SID SAMPL SQL_ID        SQL_TEXT             EVENT         BLOCKER
---- ----- ------------- -------------------- ---------- ----------
 136 49:10                                    library ca        136
                                              che pin

 136 49:09                                    library ca        136
                                              che pin

 136 49:08                                    library ca        136
                                              che pin

 136 49:07                                    library ca        136
                                              che pin

 136 49:06                                    library ca        136
                                              che pin

 136 49:05                                    library ca        136
                                              che pin

 136 49:04                                    library ca        136
                                              che pin

 136 49:03                                    library ca        136
                                              che pin

 136 49:02                                    library ca        136
                                              che pin

 136 49:01                                    library ca        136
                                              che pin

 136 49:00                                    library ca        136
                                              che pin

 136 48:59                                    library ca        136
                                              che pin

 ...

TPack‘s Wait Detail Report gives more detailed information on the deadlocked library cache pin contention.

TPACK@ukja1120> select * from table(tpack.session_detail(136,'wait_detail'))

NAME                           VALUE
------------------------------ --------------------
SID                            136
Serial#                        2797
SPID                           5148
Program                        sqlplus.exe
Process                        5404:672
Module                         SQL*Plus
SQL ID                         9pbva4bn2m25b
Child No                       0
SQL Text                       alter procedure test
                               _proc1 compile

Status                         ACTIVE
Blocking Instance              1
Blocking Session               136
SQL Exec Start                 2010/09/15 13:45:34
Event                          library cache pin
Seq#                           130
P1(P1raw)                      384372376(0000000016
                               E90E98)

P2(P2raw)                      384372376(0000000016
                               DAB608)

P3(P3raw)                      384372376(00014F8500
                               010003)

Seconds in wait                40
State                          WAITING
Wait Event                     library cache pin
Holder SID                     136
Namespace                      TABLE/PROCEDURE
Object                         TEST_PROC1
Holding Mode                   2(S)

So, this is not a normal situation, is it? What would be the reason for the library cache pin not being released even after the execution completes?

Written by Dion Cho

September 16, 2010 at 2:06 am

Posted in Troubleshooting

Batching NLJ optimization and ordering

with 5 comments

You might be aware of the batching nested loops join (batching NLJ) optimization introduced in Oracle 11g. For those who are not aware of this new feature, let me show you two different execution plans (Oracle 10g vs. 11g) of the same SQL.

-- Oracle 10g
------------------------------------------------
| Id  | Operation                      | Name  |
------------------------------------------------
|   0 | SELECT STATEMENT               |       |
|*  1 |  COUNT STOPKEY                 |       |
|   2 |   TABLE ACCESS BY INDEX ROWID  | T2    |  <-- Here
|   3 |    NESTED LOOPS                |       |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1    |
|*  5 |      INDEX RANGE SCAN          | T1_N1 |
|*  6 |     INDEX RANGE SCAN           | T2_N1 |
------------------------------------------------

-- Oracle 11g
------------------------------------------------
| Id  | Operation                      | Name  |
------------------------------------------------
|   0 | SELECT STATEMENT               |       |
|*  1 |  COUNT STOPKEY                 |       |
|   2 |   NESTED LOOPS                 |       |
|   3 |    NESTED LOOPS                |       |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1    |
|*  5 |      INDEX RANGE SCAN          | T1_N1 |
|*  6 |     INDEX RANGE SCAN           | T2_N1 |
|   7 |    TABLE ACCESS BY INDEX ROWID | T2    |   <-- And here
------------------------------------------------

You might note that Oracle 10g and 11g place the table lookup operation on table T2 in very different positions. This is a symptom of the nested loops join batching optimization in Oracle 11g. With batching NLJ, you might see decreased logical reads and improved performance.

So far, so good. But a couple of days ago, I hit a somewhat strange ordering problem with the batching NLJ. The following is a reproducible test case. Just note that the query uses the index (t1_n1) to implement pagination without the overhead of an ORDER BY.

create table t1
as
select 1 as c1, mod(level, 4) as c2, level as c3, level as c4, rpad('x',1000) as dummy
from dual
connect by level <= 1000;

create table t2
as
select 1001-level as c1, level as c2, rpad('x',1000) as dummy
from dual
connect by level <= 100;

create index t1_n1 on t1(c1, c2, c3);
create index t2_n1 on t2(c1);

exec dbms_stats.gather_table_stats(user, 't1');
exec dbms_stats.gather_table_stats(user, 't2');

explain plan for
select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
	rownum as rnum,
	t2.c1,
	t1.c4,
	t2.c2
from t1, t2
where 
	t1.c3 = t2.c1
	and t1.c1 = 1
	and t1.c2 = 0
	and rownum <= 20
;

select * from table(dbms_xplan.display);

-- Read from the disk
alter system flush buffer_cache;

select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

-- Read from the buffer cache
select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

-- Disable exceptions for buffer cache misses
alter session set "_nlj_batching_misses_enabled" = 0;

-- Read from the disk
alter system flush buffer_cache;

select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

-- Read from the buffer cache
select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

To save you time, let me show you the results of the above test case, with some comments appended.

-- Case1 : batching NLJ enabled
-- when the query reads from the disk
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        980        980         21  <-- Why 980 here?
        18        968        968         33
        19        972        972         29
        20        976        976         25


-- when the query reads from the buffer cache
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        968        968         33
        18        972        972         29
        19        976        976         25
        20        980        980         21


-- Case 2: batching NLJ disabled
-- when the query reads from the disk
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        968        968         33
        18        972        972         29
        19        976        976         25
        20        980        980         21

-- when the query reads from the buffer cache
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        968        968         33
        18        972        972         29
        19        976        976         25
        20        980        980         21

The simple explanation is:
"The new nested loops join optimization code does not guarantee that rows will be returned in the order of the outer table, especially when it reads the data from disk."

This can be a limitation when you want ordered rows from an index and an NLJ in a pagination query, but I don't believe it should be classified as a bug. The only way to guarantee the order of the returned rows is to always use an ORDER BY clause.

But if you still want to rely on the INDEX rather than ORDER BY in the above query (which was a natural choice in Oracle 10g), the workarounds are as follows.

  • Set parameter – _nlj_batching_misses_enabled = 0;
  • Set parameter – _nlj_batching_enabled = 0;
  • Append hint – NO_NLJ_BATCHING(t2)
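As a sketch of the third workaround, the NO_NLJ_BATCHING hint could be added to the test query like this (an untested variant of the case above, which should restore the 10g-style row order by disabling the batching optimization for the join to t2):

```sql
-- Sketch: disable batching NLJ for t2 only, via the NO_NLJ_BATCHING hint.
select * from (
    select /*+ leading(t1 t2) use_nl(t2) no_nlj_batching(t2)
               index_asc(t1) index_asc(t2) */
           rownum as rnum,
           t2.c1,
           t1.c4,
           t2.c2
    from   t1, t2
    where  t1.c3 = t2.c1
    and    t1.c1 = 1
    and    t1.c2 = 0
    and    rownum <= 20
) where rnum >= 15;
```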

It seems that I need to do further research on the batching NLJ – its exact mechanism and its pitfalls.

Written by Dion Cho

August 16, 2010 at 6:33 am

Posted in Troubleshooting

_gby_hash_aggregation_enabled bug

with 2 comments

One of the customers complained that Oracle refuses to use the HASH GROUP BY operation under any circumstances.

It's a 10gR2 (10.2.0.1) database.

TPACK@ukja1021> select * from v$version where rownum = 1;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

See how Oracle simply refuses to use the HASH GROUP BY operation, even for the simplest GROUP BY?

create table t1
as
select level as c1
from dual
connect by level <= 10;


explain plan for 
select count(*)
from tpack.t1
group by c1;

-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |
|   1 |  SORT GROUP BY     |      |    10 |
|   2 |   TABLE ACCESS FULL| T1   |    10 |
-------------------------------------------

You might think that the "_gby_hash_aggregation_enabled" parameter must be false, and that guess is clearly right.

TPACK@ukja1021> col value format a10
TPACK@ukja1021> @para gby_hash
TPACK@ukja1021> set echo off
old   9: and i.ksppinm like '%&1%'
new   9: and i.ksppinm like '%gby_hash%'

NAME                           VALUE                IS_DEFAUL SES_MODIFI
------------------------------ -------------------- --------- ----------
SYS_MODIFI
----------
DESCRIPTION
-------------------------------------------------------------------------

_gby_hash_aggregation_enabled  FALSE                FALSE     true
immediate
enable group-by and aggregation using hash scheme

But even with the parameter enabled and the USE_HASH_AGGREGATION hint applied, Oracle still sticks to the SORT GROUP BY operation.

alter session set "_gby_hash_aggregation_enabled" = true;

explain plan for 
select /*+ use_hash_aggregation */ count(*)
from t1
group by c1;

-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |
|   1 |  SORT GROUP BY     |      |    10 |
|   2 |   TABLE ACCESS FULL| T1   |    10 |
-------------------------------------------

Okay, enough. This is not a normal situation. The last resort is MOS. The following bug seems to match this case perfectly.

Bug 8631742: ALTER SESSION SET ... NOT HAVING EFFECT IN 10.2.0.4
...
RELEASE NOTES:
]] Setting _gby_hash_aggregation_enabled at the session level did not
]] always take effect
REDISCOVERY INFORMATION:
If you change the use of hash aggregation at the session level, but this does
not affect the choice of aggregation method in subsequebtly parsed SQL,
you are probably hitting this bug.

One thing to note is that you might see this bug only when you specify the parameter in the parameter file, which would be a rather rare case.

*._gby_hash_aggregation_enabled=FALSE

I know that the earlier implementations of the HASH GROUP BY operation had some serious bugs, which made many sites simply turn the feature off permanently. The problem is that you then can't use the operation even with the USE_HASH_AGGREGATION hint, which you probably wouldn't have expected. :)

FYI, this bug was patched in 10.2.0.5.

Written by Dion Cho

July 6, 2010 at 5:52 am

Rapid PGA size increase

with 8 comments

A couple of days ago, one of our customers experienced a rapid PGA increase, and the instance was bounced by PMON. We suggested dumping the PGA heap while the problem was reoccurring. The problem is that the PGA increase was too rapid (up to 3.6G in less than 2 minutes), so it was quite hard to get the PGA dump manually.

Fortunately, in situation like this, Oracle provides a way of getting automatic PGA heap dump.

1. First, let me set the 10261 diagnostic event to limit the size of the PGA heap. For instance, the following command limits the PGA heap to 100000KB.

alter system set events '10261 trace name context forever, level 100000';

2. With the 10261 event set, the process fails with ORA-600 [723] when the size of the PGA heap reaches 100000KB. As you may already know, the 10261 event is often used as a band-aid for the ORA-4030 error.

-- make big pga
declare
 type varchar2_array is table of varchar2(32767) index by pls_integer;
 vc  varchar2_array;
 v  varchar2(32767);
begin
 for idx in 1 .. 10000 loop
 v := rpad('x',32767,'x');
 vc(idx) := v;
 end loop;
end;
/

ERROR at line 1:
ORA-00600: internal error code, arguments: [723], [41000], [pga heap], [], [],
[], [], []

3. Now, let me set the 600 diagnostic event to get a heap dump when the process hits the ORA-600 error. With the combination of the 10261 and 600 events, the process records the PGA heap dump automatically when the PGA heap size reaches the limit.

alter system set events '600 trace name heapdump level 0x20000001';

4. When the PGA heap size reaches the limit (100,000 KB in this case) again, I get a complete PGA heap dump, including all the recursive subheaps, by virtue of dump level 0x20000001.

DDE: Problem Key 'ORA 600 [723]' was flood controlled (0x2) (incident: 44800)
ORA-00600: internal error code, arguments: [723], [41000], [pga heap], [], [], [], [], []
****** ERROR: PGA size limit exceeded: 102450812 > 102400000 *****
******************************************************
HEAP DUMP heap name="pga heap"  desc=11AFB098
 extent sz=0x206c alt=92 het=32767 rec=0 flg=2 opc=1
 parent=00000000 owner=00000000 nex=00000000 xsz=0xfff8 heap=00000000
 fl2=0x60, nex=00000000
EXTENT 0 addr=39150008
  Chunk 39150010 sz=    24528    free      "               "
  Chunk 39155fe0 sz=    40992    freeable  "koh-kghu call  "  ds=0D4D9A60
EXTENT 1 addr=39140008
  Chunk 39140010 sz=    24528    free      "               "
  Chunk 39145fe0 sz=    40992    freeable  "koh-kghu call  "  ds=0D4D9A60
...

5. The last step is to analyze the heap dump. For instance, I have my own library to analyze the heap dump.

select * from table(tpack.heap_file_report('C:\oracle\diag\rdbms\ukja1106\ukja1106\trace\ukja1106_ora_3640.trc'));

TYPE     HEAP_NAME        ITEM             ITEM_COUNT  ITEM_SIZE  HEAP_SIZE      RATIO
-------- ---------------- ---------------- ---------- ---------- ---------- ----------
HEAP     pga heap                                   0      97.14      97.14        100
HEAP     top call heap                              0        .18        .18        100
HEAP     top uga heap                               0        .31        .31        100
CHUNK    pga heap         free                   1554       36.2       97.1       37.3
CHUNK    pga heap         recreate                  9          0       97.1          0
CHUNK    pga heap         perm                     14          0       97.1          0
CHUNK    pga heap         freeable               1597       60.7       97.1       62.5
CHUNK    top uga heap     recreate                  1          0         .3       19.9
CHUNK    top uga heap     free                      5          0         .3          0
CHUNK    top uga heap     freeable                  4         .2         .3       79.9
CHUNK    top call heap    free                      3         .1         .1       65.5
CHUNK    top call heap    recreate                  2          0         .1          1
CHUNK    top call heap    freeable                  1          0         .1       33.3
CHUNK    top call heap    perm                      1          0         .1          0
OBJECT   pga heap         kews sqlstat st           1          0       97.1          0
OBJECT   pga heap         pesom.c:Proces            3          0       97.1          0
...
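If the Tpack library is not available, a similar (if much cruder) summary can be produced by scanning the trace file for Chunk lines and aggregating the sz= values by chunk type. The following is only a sketch (the TRACE_DIR directory object and the file name are hypothetical, and the REGEXP_SUBSTR subexpression parameter requires 11g):

```sql
-- Rough sketch: aggregate chunk sizes by type ("free", "freeable", ...)
-- from a heap dump trace file. TRACE_DIR is a hypothetical directory
-- object pointing at the trace directory.
declare
  f     utl_file.file_type;
  line  varchar2(4000);
  type num_tab is table of number index by varchar2(30);
  sums  num_tab;
  typ   varchar2(30);
  sz    number;
begin
  f := utl_file.fopen('TRACE_DIR', 'ukja1106_ora_3640.trc', 'r');
  loop
    begin
      utl_file.get_line(f, line);
    exception
      when no_data_found then exit;  -- end of file
    end;
    -- e.g. '  Chunk 39155fe0 sz=    40992    freeable  "koh-kghu call  "'
    if regexp_like(line, '^ *Chunk .*sz=') then
      sz  := to_number(regexp_substr(line, 'sz=\s*(\d+)', 1, 1, null, 1));
      typ := regexp_substr(line, 'sz=\s*\d+\s+(\S+)', 1, 1, null, 1);
      if not sums.exists(typ) then
        sums(typ) := 0;
      end if;
      sums(typ) := sums(typ) + sz;
    end if;
  end loop;
  utl_file.fclose(f);
  typ := sums.first;
  while typ is not null loop
    dbms_output.put_line(rpad(typ, 12) || round(sums(typ)/1024/1024, 1) || ' MB');
    typ := sums.next(typ);
  end loop;
end;
/
```

This only totals top-level chunk lines; a real analyzer (like the report above) also walks the recursive subheaps and breaks the chunks down by allocation comment.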

6. It is also possible to write a script that monitors the V$SESSTAT view (the session pga memory statistic) and executes the heap dump when the value hits a specific threshold. For instance, I have my own library which is used as follows.

col report_id new_value report_id

select tpack_server.create_report('Heap Dump') as report_id from dual;

exec tpack_server.add_parameter('&report_id', 'dump_level', '0x20000001');
exec tpack_server.add_parameter('&report_id', 'get_whole_contents', 0);

exec tpack_server.add_condition('&report_id', 'STAT', 'session pga memory', '>100000000', 'SUM');

exec tpack_server.register_report('&report_id');

-- start server
exec tpack_server.start_server;

When the PGA heap size hits the limit(100000000B), it would execute the predefined procedure which executes the heap dump.

Fri Jun 11 06:19:10 GMT+00:00 2010 : Session 142 got! sum=659645392, name = session pga memory
...
Fri Jun 11 06:27:50 GMT+00:00 2010 : executing report 1:142:1973827792 for session 142
Fri Jun 11 06:27:55 GMT+00:00 2010 : executing report = begin tpack.heap_dump(  dump_level=>'0x20000001', get_whole_contents=>0,  session_id => 142); end;
...
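Without the Tpack server, a minimal polling loop can approximate this monitoring idea. This is only a sketch: it finds a session whose session pga memory statistic exceeds the threshold and reports it; the actual dump of the offending process would still have to be taken separately, e.g. with ORADEBUG:

```sql
-- Minimal sketch: poll V$SESSTAT for a session whose PGA exceeds a
-- threshold (100,000,000 bytes here) and report it. Taking the actual
-- heap dump of the offending process is left to ORADEBUG, e.g.:
--   oradebug setospid <spid>
--   oradebug dump heapdump 536870913   -- = 0x20000001
declare
  v_sid number;
begin
  loop
    begin
      select s.sid into v_sid
      from   v$sesstat s, v$statname n
      where  s.statistic# = n.statistic#
      and    n.name  = 'session pga memory'
      and    s.value > 100000000
      and    rownum  = 1;
      exit;  -- found an offender
    exception
      when no_data_found then
        dbms_lock.sleep(5);  -- poll every 5 seconds
    end;
  end loop;
  dbms_output.put_line('session ' || v_sid || ' exceeded the PGA threshold');
end;
/
```

Note that DBMS_LOCK.SLEEP requires EXECUTE privilege on DBMS_LOCK, and a production version would loop continuously rather than exit on the first hit.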

Also note that the combination of the 10261 and 600 events is just a temporary measure; the most important thing is to analyze the heap dump very carefully so that the rapid PGA increase never happens again!

Written by Dion Cho

June 14, 2010 at 2:10 am

Posted in Troubleshooting


Interesting combination of RAC and serial direct path read

with 7 comments

For those who are not aware, Oracle 11g introduced the serial direct path read, which enables efficient direct path reads for serial table scans.

Disabling Direct path read for the serial full table scan-11g

A couple of days ago, a weird performance difference between the 2 nodes of an 11g RAC was reported to me.

The query has the same execution plan on both nodes.

---------------------------------------------------------
| Id  | Operation                       | Name         |
---------------------------------------------------------
|   0 | SELECT STATEMENT                |              |
|   1 |  NESTED LOOPS                   |              |
|*  2 |   TABLE ACCESS FULL             |              |
|*  3 |   VIEW                          |              |
|   4 |    UNION ALL PUSHED PREDICATE   |              |
|*  5 |     FILTER                      |              |
|   6 |      TABLE ACCESS BY INDEX ROWID|              |
|*  7 |       INDEX RANGE SCAN          |              |
|*  8 |     FILTER                      |              |
|   9 |      TABLE ACCESS BY INDEX ROWID|              |
|* 10 |       INDEX RANGE SCAN          |              |
---------------------------------------------------------

But on node A it took 12 seconds to complete, while it took only 2 seconds on node B.

-- node A
call  count    cpu  elapsed    disk    query  current    rows 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
Parse    1    0.06    0.06      0      0      0      0 
Execute    1    0.00    0.00      0      0      0      0 
Fetch    2    1.31    10.48    90421    90437      0      2 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
total    4    1.37    10.55    90421    90437      0      2 
Elapsed times include waiting on following events: 
Event waited on                  Times  Max. Wait Total Waited 
----------------------------------------  Waited ---------- ------------ 
library cache lock                    1    0.00      0.00 
library cache pin                    1    0.00      0.00 
SQL*Net message to client              2    0.00      0.00 
enq: KO - fast object checkpoint          2    0.00      0.00 
reliable message                    1    0.00      0.00 
direct path read                  1579    0.00      0.01 
SQL*Net message from client              2    23.81      23.81

-- node B
call  count    cpu  elapsed    disk    query  current    rows 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
Parse    1    0.03    0.03      0      0      0      0 
Execute    1    0.00    0.00      0      0      0      0 
Fetch    2    2.01    2.01      0    90450      0      2 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
total    4    2.04    2.05      0    90450      0      2 

Elapsed times include waiting on following events: 
Event waited on                  Times  Max. Wait Total Waited 
----------------------------------------  Waited ---------- ------------ 
SQL*Net message to client              2    0.00      0.00 
gc cr block 2-way                    1    0.00      0.00 
SQL*Net message from client              2    23.53      23.53 
library cache lock                    1    0.00      0.00 
library cache pin                    1    0.00      0.00

This performance difference was consistently reproducible: it always took 12 seconds on node A and 2 seconds on node B.

The clue was the direct path read wait event, which means that direct path reads were being performed.

  • Because this is not a parallel execution, there is a strong chance that this is the serial direct path read.
  • A nocache LOB can't be the reason here. If a nocache LOB were the issue, both nodes would show the same performance characteristics.
  • But we should be careful when interpreting the wait count and wait time of the direct path read event. We can't rely on these values due to the mechanism of the direct path read.
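Rather than trusting the wait statistics, one way to confirm which node is actually doing serial direct reads is to compare the direct read statistics on both nodes before and after running the query (statistic names as in 11g):

```sql
-- Compare on both nodes before and after running the query; a large
-- delta in 'physical reads direct' on node A (with no parallel query
-- and no LOB access involved) points at the serial direct path read.
select name, value
from   v$sysstat
where  name in ('physical reads direct',
                'table scans (direct read)');
```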

We suggested the 10949 diagnostic event to the customer as a workaround, and the problem was gone. (I know that this event is undocumented and unsupported, but…)
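For reference, a typical way to set the workaround event (level 1, at the system level) looks like the following; since the event is undocumented, on a production system you should of course check with Oracle Support first:

```sql
-- Disable serial direct path reads for full table scans in 11g
-- (undocumented event; use only after checking with Oracle Support).
alter system set events '10949 trace name context forever, level 1';
```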

It seems that node A and node B had different environments (different cache sizes and/or a different number of already-cached blocks of the table, …), and as a result Oracle chose different ways of executing the FTS: direct path vs. conventional.

Can this be classified as another dark side (side effect) of automation by Oracle? I'm not sure, but an unexpected difference between multiple nodes is always likely to embarrass the DBAs.

Written by Dion Cho

June 9, 2010 at 4:31 am

Posted in I/O

