Interesting case of the restriction of global hints
One of my colleagues sent following test case, of which he couldn’t control the join order with hints.
1. Create objects – table t1, t2 and t3SQL> create table t1(c1 number, c2 number); Table created. SQL> create table t2(c1 number, c2 number); Table created. SQL> create table t3(c1 number, c2 number); Table created.
2. Now Let’s set the join order as T1->T2->T3, using global hints convention. But it seems that Oracle does not work as expected.
SQL> explain plan for 2 select * from 3 ( 4 select 5 /*+ leading(v.t1 v.t2 t3) */ 6 v.c1 as v_c1, 7 v.c2 as v_c2, 8 t3.c2 as t3_c2 9 from 10 (select 11 t1.c1, 12 t2.c2 13 from 14 t1, t2 15 where 16 t1.c1 = t2.c1) v, 17 t3 18 where 19 v.c1 = t3.c1 20 ) x 21 ; ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 65 | 7 (15)| 00:00:01 | |* 1 | HASH JOIN | | 1 | 65 | 7 (15)| 00:00:01 | | 2 | MERGE JOIN CARTESIAN| | 1 | 52 | 4 (0)| 00:00:01 | | 3 | TABLE ACCESS FULL | T2 | 1 | 26 | 2 (0)| 00:00:01 | | 4 | BUFFER SORT | | 1 | 26 | 2 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL | T3 | 1 | 26 | 2 (0)| 00:00:01 | | 6 | TABLE ACCESS FULL | T1 | 1 | 13 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------
It really seems that the optimizer does not understand the global hints when it is used with non global hint convention(t3 here). Yes, you can use hints inside the inline view, but what if you want to control the join order only with global hint convention?
3. The cure he tried was using Oracle’s internal global hint convention. You can get how Oracle’s internal hint convention looks using ‘ADVANCED’ format option with DBMS_XPLAN.DISPLAY function.
select * from table(dbms_xplan.display(null, null, 'advanced'));
...
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$5C160134
3 - SEL$5C160134 / T1@SEL$3
4 - SEL$5C160134 / T2@SEL$3
5 - SEL$5C160134 / T3@SEL$2
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
USE_HASH(@"SEL$5C160134" "T3"@"SEL$2")
USE_HASH(@"SEL$5C160134" "T2"@"SEL$3")
LEADING(@"SEL$5C160134" "T1"@"SEL$3" "T2"@"SEL$3" "T3"@"SEL$2")
FULL(@"SEL$5C160134" "T3"@"SEL$2")
FULL(@"SEL$5C160134" "T2"@"SEL$3")
FULL(@"SEL$5C160134" "T1"@"SEL$3")
OUTLINE(@"SEL$3")
OUTLINE(@"SEL$2")
MERGE(@"SEL$3")
OUTLINE(@"SEL$335DD26A")
OUTLINE(@"SEL$1")
MERGE(@"SEL$335DD26A")
OUTLINE_LEAF(@"SEL$5C160134")
ALL_ROWS
DB_VERSION('11.2.0.1')
OPTIMIZER_FEATURES_ENABLE('11.2.0.1')
IGNORE_OPTIM_EMBEDDED_HINTS
END_OUTLINE_DATA
*/
4. With this ugly global hint convention, the join order is now fully controllable.
SQL> explain plan for 2 select * from 3 ( 4 select 5 /*+ LEADING(@"SEL$5C160134" "T1"@"SEL$3" "T2"@"SEL$3" "T3"@"SEL$2" ) */ 6 v.c1 as v_c1, 7 v.c2 as v_c2, 8 t3.c2 as t3_c2 9 from 10 (select 11 t1.c1, 12 t2.c2 13 from 14 t1, t2 15 where 16 t1.c1 = t2.c1) v, 17 t3 18 where 19 v.c1 = t3.c1 20 ) x 21 ; ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 65 | 7 (15)| 00:00:01 | |* 1 | HASH JOIN | | 1 | 65 | 7 (15)| 00:00:01 | |* 2 | HASH JOIN | | 1 | 39 | 5 (20)| 00:00:01 | | 3 | TABLE ACCESS FULL| T1 | 1 | 13 | 2 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL| T2 | 1 | 26 | 2 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL | T3 | 1 | 26 | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------
5. The better approach would be QB_NAME hint, which provides better readability and flexibility.
SQL> explain plan for 2 select * from 3 ( 4 select 5 /*+ leading(t1@inline t2@inline t3) */ 6 v.c1 as v_c1, 7 v.c2 as v_c2, 8 t3.c2 as t3_c2 9 from 10 (select /*+ qb_name(inline) */ 11 t1.c1, 12 t2.c2 13 from 14 t1, t2 15 where 16 t1.c1 = t2.c1) v, 17 t3 18 where 19 v.c1 = t3.c1 20 ) x 21 ; ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 65 | 7 (15)| 00:00:01 | |* 1 | HASH JOIN | | 1 | 65 | 7 (15)| 00:00:01 | |* 2 | HASH JOIN | | 1 | 39 | 5 (20)| 00:00:01 | | 3 | TABLE ACCESS FULL| T1 | 1 | 13 | 2 (0)| 00:00:01 | | 4 | TABLE ACCESS FULL| T2 | 1 | 26 | 2 (0)| 00:00:01 | | 5 | TABLE ACCESS FULL | T3 | 1 | 26 | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------
Let me give my thanks to my colleague, who sent this interesting test case.
PS) Note that QB_NAME hint has also some restrictions, in that case, Oracle’s internal global hint convention would be a good alternative.Interpreting Parallel MERGE Statement
This post is inspired by one of my colleagues, who had small difficulty while interpreting parallel MERGE execution plan.
This is the step-by-step explanation I gave him. 1. Create tables.
SQL> create table t1
2 as
3 select
4 level as c1,
5 level as c2,
6 rpad('x',100) as c3
7 from
8 dual
9 connect by level <= 10000
10 ;
Table created.
SQL> create table t2
2 as
3 select
4 level as c1,
5 level as c2,
6 rpad('x', 100) as c3
7 from
8 dual
9 connect by level <= 10000
10 ;
Table created.
2. Would this MERGE statement run in parallel?
SQL> explain plan for 2 merge /*+ parallel */ into t1 3 using (select c1, c2 from t2) t2 4 on (t1.c1 = t2.c1) 5 when matched then 6 update set t1.c2 = t1.c2 7 when not matched then 8 insert(c1, c2) values(t2.c1, t2.c2) 9 ; Explained. ------------------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------------------- | 0 | MERGE STATEMENT | | 9356 | | 1 | MERGE | T1 | | | 2 | PX COORDINATOR | | | | 3 | PX SEND QC (RANDOM) | :TQ10001 | 9356 | | 4 | VIEW | | | |* 5 | HASH JOIN OUTER | | 9356 | | 6 | PX BLOCK ITERATOR | | 9356 | | 7 | TABLE ACCESS FULL | T2 | 9356 | | 8 | BUFFER SORT | | | | 9 | PX RECEIVE | | 11234 | | 10 | PX SEND BROADCAST | :TQ10000 | 11234 | | 11 | PX BLOCK ITERATOR | | 11234 | | 12 | TABLE ACCESS FULL| T1 | 11234 | -------------------------------------------------------
The answer is NO. As you notice from the operation #2, the MERGE part itself runs in serial. It is SELECT part that runs in parallel.
3. The reason would be parallel dml not enabled. Let’s see what happens if we have parallel dml enabled.SQL> alter session enable parallel dml; Session altered. SQL> explain plan for 2 merge /*+ parallel */ into t1 3 using (select c1, c2 from t2) t2 4 on (t1.c1 = t2.c1) 5 when matched then 6 update set t1.c2 = t1.c2 7 when not matched then 8 insert(c1, c2) values(t2.c1, t2.c2) 9 ; ------------------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------------------- | 0 | MERGE STATEMENT | | 9356 | | 1 | MERGE | T1 | | | 2 | PX COORDINATOR | | | | 3 | PX SEND QC (RANDOM) | :TQ10001 | 9356 | | 4 | VIEW | | | |* 5 | HASH JOIN OUTER | | 9356 | | 6 | PX BLOCK ITERATOR | | 9356 | | 7 | TABLE ACCESS FULL | T2 | 9356 | | 8 | BUFFER SORT | | | | 9 | PX RECEIVE | | 11234 | | 10 | PX SEND BROADCAST | :TQ10000 | 11234 | | 11 | PX BLOCK ITERATOR | | 11234 | | 12 | TABLE ACCESS FULL| T1 | 11234 | -------------------------------------------------------
Still, the MERGE part runs serially.
4. Maybe parallel merge needs more specific HINT, which is quite natural because the MERGE statement has two tables involved. Let me make the PARALLE hint more specific.SQL> explain plan for 2 merge /*+ parallel(t1) */ into t1 3 using (select c1, c2 from t2) t2 4 on (t1.c1 = t2.c1) 5 when matched then 6 update set t1.c2 = t1.c2 7 when not matched then 8 insert(c1, c2) values(t2.c1, t2.c2) 9 ; ---------------------------------------------------- | Id | Operation | Name | ---------------------------------------------------- | 0 | MERGE STATEMENT | | | 1 | PX COORDINATOR | | | 2 | PX SEND QC (RANDOM) | :TQ10003 | | 3 | MERGE | T1 | | 4 | PX RECEIVE | | | 5 | PX SEND HYBRID (ROWID PKEY)| :TQ10002 | | 6 | VIEW | | |* 7 | HASH JOIN OUTER BUFFERED | | | 8 | BUFFER SORT | | | 9 | PX RECEIVE | | | 10 | PX SEND HASH | :TQ10000 | | 11 | TABLE ACCESS FULL | T2 | | 12 | PX RECEIVE | | | 13 | PX SEND HASH | :TQ10001 | | 14 | PX BLOCK ITERATOR | | | 15 | TABLE ACCESS FULL | T1 | ----------------------------------------------------
Now, the MERGE part really runs in parallel!
5. One more thing you should be careful about. Let me create an index on the target table t1 and see what happens on the plan.SQL> create index t1_n1 on t1(c1); Index created. SQL> explain plan for 2 merge /*+ parallel(t1) */ into t1 3 using (select c1, c2 from t2) t2 4 on (t1.c1 = t2.c1) 5 when matched then 6 update set t1.c2 = t1.c2 7 when not matched then 8 insert(c1, c2) values(t2.c1, t2.c2) 9 ; --------------------------------------------------------------- | Id | Operation | Name | Rows | --------------------------------------------------------------- | 0 | MERGE STATEMENT | | 9356 | | 1 | PX COORDINATOR | | | | 2 | PX SEND QC (RANDOM) | :TQ10004 | 9356 | | 3 | INDEX MAINTENANCE | T1 | | | 4 | PX RECEIVE | | 9356 | | 5 | PX SEND RANGE | :TQ10003 | 9356 | | 6 | MERGE | T1 | | | 7 | PX RECEIVE | | 9356 | | 8 | PX SEND HYBRID (ROWID PKEY)| :TQ10002 | 9356 | | 9 | VIEW | | | |* 10 | HASH JOIN OUTER BUFFERED | | 9356 | | 11 | BUFFER SORT | | | | 12 | PX RECEIVE | | 9356 | | 13 | PX SEND HASH | :TQ10000 | 9356 | | 14 | TABLE ACCESS FULL | T2 | 9356 | | 15 | PX RECEIVE | | 11234 | | 16 | PX SEND HASH | :TQ10001 | 11234 | | 17 | PX BLOCK ITERATOR | | 11234 | | 18 | TABLE ACCESS FULL | T1 | 11234 | ---------------------------------------------------------------
Now we have a new operation called INDEX MAINTENANCE and need one more table queue(TQ10004) to parallelize index maintenance itself. But why do we have index maintenance here?
- Parallel MERGE is composed of parallel update and parallel insert.
- Parallel insert is direct path insert.
- Indexes need to be maintained after direct path insert.
Oracle needs to maintain the indexes after direct path insert completes. This is controlled by _idl_conventional_index_maintenance hidden parameter, which you’d never have a chance to modify.
SQL> select * from table(tpack.param('_idl_conventional_index_maintenance'));
NAME VALUE
----------------------------------- ----------------------------------------
Name #1 _idl_conventional_index_maintenance
Value TRUE
Is Default TRUE
Sess Modifiable false
Sys Modifiable false
Description enable conventional index maintenance fo
r insert direct load
The INDEX MAINTENANCE operation does not disappear even when the index is disabled, but I believe that Oracle would skip the maintenance for the disabled indexes in runtime.
SQL> alter index t1_n1 unusable; Index altered. SQL> explain plan for 2 merge /*+ parallel(t1) */ into t1 3 using (select c1, c2 from t2) t2 4 on (t1.c1 = t2.c1) 5 when matched then 6 update set t1.c2 = t1.c2 7 when not matched then 8 insert(c1, c2) values(t2.c1, t2.c2) 9 ; --------------------------------------------------------------- | Id | Operation | Name | Rows | --------------------------------------------------------------- | 0 | MERGE STATEMENT | | 9356 | | 1 | PX COORDINATOR | | | | 2 | PX SEND QC (RANDOM) | :TQ10004 | 9356 | | 3 | INDEX MAINTENANCE | T1 | | | 4 | PX RECEIVE | | 9356 | | 5 | PX SEND RANGE | :TQ10003 | 9356 | | 6 | MERGE | T1 | | | 7 | PX RECEIVE | | 9356 | | 8 | PX SEND HYBRID (ROWID PKEY)| :TQ10002 | 9356 | | 9 | VIEW | | | |* 10 | HASH JOIN OUTER BUFFERED | | 9356 | | 11 | BUFFER SORT | | | | 12 | PX RECEIVE | | 9356 | | 13 | PX SEND HASH | :TQ10000 | 9356 | | 14 | TABLE ACCESS FULL | T2 | 9356 | | 15 | PX RECEIVE | | 11234 | | 16 | PX SEND HASH | :TQ10001 | 11234 | | 17 | PX BLOCK ITERATOR | | 11234 | | 18 | TABLE ACCESS FULL | T1 | 11234 | ---------------------------------------------------------------
I hope this helped him to understand how to interpret the execution plan of parallel DML . :)
Making self deadlock on the library cache pin
It’s very easy to make your session deadlocked on the library cache pin. Let me prove it.
Step 1. Make a dummy procedure called TEST_PROC1.
TPACK@ukja1120> select * from v$version where rownum = 1; BANNER ----------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production create or replace procedure test_proc1 is begin null; end; /
Step 2. Execute simple PL/SQL which executes TEST_PROC1 and compiles it.
TPACK@ukja1120> begin 2 test_proc1; 3 4 execute immediate 'alter procedure test_proc1 compile'; 5 6 end; 7 / ... (Hang)
Very simple, isn’t it?
ASH cleary shows that the session is waiting for the library cache pin to be released, but it’s being acquried by the session itself.
select *
from (
select
h.session_id as sid,
to_char(h.sample_time,'mi:ss') as sample_time,
h.sql_id,
(select sql_text from v$sqlarea a where a.sql_id = h.sql_id) as sql_text,
event,
blocking_session as blocker
from
v$active_session_history h
where
h.session_id = &sid
order by h.sample_time desc
) where rownum <= 20
;
SID SAMPL SQL_ID SQL_TEXT EVENT BLOCKER
---- ----- ------------- -------------------- ---------- ----------
136 49:10 library ca 136
che pin
136 49:09 library ca 136
che pin
136 49:08 library ca 136
che pin
136 49:07 library ca 136
che pin
136 49:06 library ca 136
che pin
136 49:05 library ca 136
che pin
136 49:04 library ca 136
che pin
136 49:03 library ca 136
che pin
136 49:02 library ca 136
che pin
136 49:01 library ca 136
che pin
136 49:00 library ca 136
che pin
136 48:59 library ca 136
che pin
...
TPack‘s Wait Detail Report gives more detailed information on the deadlocked library cache pin contention.
TPACK@ukja1120> select * from table(tpack.session_detail(136,'wait_detail'))
NAME VALUE
------------------------------ --------------------
SID 136
Serial# 2797
SPID 5148
Program sqlplus.exe
Process 5404:672
Module SQL*Plus
SQL ID 9pbva4bn2m25b
Child No 0
SQL Text alter procedure test
_proc1 compile
Status ACTIVE
Blocking Instance 1
Blocking Session 136
SQL Exec Start 2010/09/15 13:45:34
Event library cache pin
Seq# 130
P1(P1raw) 384372376(0000000016
E90E98)
P2(P2raw) 384372376(0000000016
DAB608)
P3(P3raw) 384372376(00014F8500
010003)
Seconds in wait 40
State WAITING
Wait Event library cache pin
Holder SID 136
Namespace TABLE/PROCEDURE
Object TEST_PROC1
Holding Mode 2(S)
So, this is not a normal situation, isn’t it? What would be the reason of the library cache pin not being released even after the execution?
Batching NLJ optimization and ordering
You might be aware of the batching nested loops join(batching NLJ) optimization introduced at Oracle 11g. For who are not aware of this new feature, let me show you two different execution plans(Oracle 10g vs. 11g) of the same SQL.
-- Oracle 10g ------------------------------------------------ | Id | Operation | Name | ------------------------------------------------ | 0 | SELECT STATEMENT | | |* 1 | COUNT STOPKEY | | | 2 | TABLE ACCESS BY INDEX ROWID | T2 | <-- Here | 3 | NESTED LOOPS | | | 4 | TABLE ACCESS BY INDEX ROWID| T1 | |* 5 | INDEX RANGE SCAN | T1_N1 | |* 6 | INDEX RANGE SCAN | T2_N1 | ------------------------------------------------ -- Oracle 11g ------------------------------------------------ | Id | Operation | Name | ------------------------------------------------ | 0 | SELECT STATEMENT | | |* 1 | COUNT STOPKEY | | | 2 | NESTED LOOPS | | | 3 | NESTED LOOPS | | | 4 | TABLE ACCESS BY INDEX ROWID| T1 | |* 5 | INDEX RANGE SCAN | T1_N1 | |* 6 | INDEX RANGE SCAN | T2_N1 | | 7 | TABLE ACCESS BY INDEX ROWID | T2 | <-- And here ------------------------------------------------
You might note that Oracle 10g and 11g have very different location of the table lookup operation on the table T2. This is a symptom of the nested loops join optimization of Oracle 11g. By using batching NLJ, you might see a decreased logical reads and improved performance.
So far, so good, but a couple of days ago, I hit a somewhat strange ordering problem with the batching NLJ. Following is a replayable test case. Just note that the query is using index(t1_n1) to implement the pagination without the overhead of ORDER BY.
create table t1
as
select 1 as c1, mod(level, 4) as c2, level as c3, level as c4, rpad('x',1000) as dummy
from dual
connect by level <= 1000;
create table t2
as
select 1001-level as c1, level as c2, rpad('x',1000) as dummy
from dual
connect by level <= 100;
create index t1_n1 on t1(c1, c2, c3);
create index t2_n1 on t2(c1);
exec dbms_stats.gather_table_stats(user, 't1');
exec dbms_stats.gather_table_stats(user, 't2');
explain plan for
select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
rownum as rnum,
t2.c1,
t1.c4,
t2.c2
from t1, t2
where
t1.c3 = t2.c1
and t1.c1 = 1
and t1.c2 = 0
and rownum <= 20
;
select * from table(dbms_xplan.display);
-- Read from the disk
alter system flush buffer_cache;
select * from (
select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
rownum as rnum,
t2.c1,
t1.c4,
t2.c2
from t1, t2
where
t1.c3 = t2.c1
and t1.c1 = 1
and t1.c2 = 0
and rownum <= 20
) where rnum >= 15
;
-- Read from the buffer cache
select * from (
select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
rownum as rnum,
t2.c1,
t1.c4,
t2.c2
from t1, t2
where
t1.c3 = t2.c1
and t1.c1 = 1
and t1.c2 = 0
and rownum <= 20
) where rnum >= 15
;
-- Disable exceptions for buffer cache misses
alter session set "_nlj_batching_misses_enabled" = 0;
-- Read from the disk
alter system flush buffer_cache;
select * from (
select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
rownum as rnum,
t2.c1,
t1.c4,
t2.c2
from t1, t2
where
t1.c3 = t2.c1
and t1.c1 = 1
and t1.c2 = 0
and rownum <= 20
) where rnum >= 15
;
-- Read from the buffer cache
select * from (
select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
rownum as rnum,
t2.c1,
t1.c4,
t2.c2
from t1, t2
where
t1.c3 = t2.c1
and t1.c1 = 1
and t1.c2 = 0
and rownum <= 20
) where rnum >= 15
;
To save your time, let me show you the result of the above test case with some comments appended.
-- Case1 : batching NLJ enabled
-- when the query reads from the disk
RNUM C1 C4 C2
---------- ---------- ---------- ----------
15 960 960 41
16 964 964 37
17 980 980 21 <-- Why 980 here?
18 968 968 33
19 972 972 29
20 976 976 25
-- when the query reads from the buffer cache
RNUM C1 C4 C2
---------- ---------- ---------- ----------
15 960 960 41
16 964 964 37
17 968 968 33
18 972 972 29
19 976 976 25
20 980 980 21
-- Case 2: batching NLJ disabled
-- when the query reads from the disk
RNUM C1 C4 C2
---------- ---------- ---------- ----------
15 960 960 41
16 964 964 37
17 968 968 33
18 972 972 29
19 976 976 25
20 980 980 21
-- when the query reads from the buffer cache
RNUM C1 C4 C2
---------- ---------- ---------- ----------
15 960 960 41
16 964 964 37
17 968 968 33
18 972 972 29
19 976 976 25
20 980 980 21
The simple explanation is :
“The new optimization code of the nested loops join does not guarantee that the rows would be returned as an order of the outer table, especially when it reads the data from the disk.”
This could be a limitation when you want to get the ordered rows using index and NLJ in the pagination query, but I don’t believe that this could be classified as a bug. The only way to ensure the order of the returning rows is always to use ORDER BY clause.
But if you still want to use INDEX not ORDER BY in the above query(which was a natural selection in Oracle 10g), the workarounds is one of the followings.
- Set parameter – _nlj_batching_misses_enabled = 0;
- Set parameter – _nlj_batching_enabled = 0;
- Append hint – NO_NLJ_BATCHING(t2)
It seems that I need to do further research on the batching NLJ – it’s exact mechanism and the pitfalls.
_gby_hash_aggregation_enabled bug
One of the customers complained that Oracle refuses to use the HASH GROUP BY operation under any circumstances.
It’s 10gR2(10.2.0.1) database.
TPACK@ukja1021> select * from v$version where rownum = 1; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
See that Oracle simply refuses to use HASH GROUP BY operation even with the simplest group by?
create table t1 as select level as c1 from dual connect by level <= 10; explain plan for select count(*) from tpack.t1 group by c1; ------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------- | 0 | SELECT STATEMENT | | 10 | | 1 | SORT GROUP BY | | 10 | | 2 | TABLE ACCESS FULL| T1 | 10 | -------------------------------------------
You might think that the “_gby_hash_aggregation_enabled” would have been false which is clearly right.
TPACK@ukja1021> col value format a10 TPACK@ukja1021> @para gby_hash TPACK@ukja1021> set echo off old 9: and i.ksppinm like '%&1%' new 9: and i.ksppinm like '%gby_hash%' NAME VALUE IS_DEFAUL SES_MODIFI ------------------------------ -------------------- --------- ---------- SYS_MODIFI ---------- DESCRIPTION ------------------------------------------------------------------------- _gby_hash_aggregation_enabled FALSE FALSE true immediate enable group-by and aggregation using hash scheme
But even with the parameter enabled and USE_HASH_AGGREGATION hint applied, Oracle still persists to the SORT GROUP BY operation.
alter session set "_gby_hash_aggregation_enabled" = true; explain plan for select /*+ use_hash_aggregation */ count(*) from t1 group by c1; ------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------- | 0 | SELECT STATEMENT | | 10 | | 1 | SORT GROUP BY | | 10 | | 2 | TABLE ACCESS FULL| T1 | 10 | -------------------------------------------
Okay, enough. This is not a normal situation. The last resort is MOS. Following bug seems to perfectly match this case.
Bug 8631742: ALTER SESSION SET ... NOT HAVING EFFECT IN 10.2.0.4 ... RELEASE NOTES: ]] Setting _gby_hash_aggregation_enabled at the session level did not ]] always take effect REDISCOVERY INFORMATION: If you change the use of hash aggregation at the session level, but this does not affect the choice of aggregation method in subsequebtly parsed SQL, you are probably hitting this bug.
One thing to note is that you might see this bug only when you specify the parameter in the parameter file which would be a a bit rare case.
*._gby_hash_aggregation_enabled=FALSE
I know that the earlier implementation of the HASH GROUP BY operation has some serious bugs which made many sites simply turn off it permanently. The problem is that you can’t use that operation even with USE_HASH_AGGREGATION hint which you wouldn’t have expected. :)
FIY, this bug was patched at 10.2.0.5.
Rapid PGA size increase
A couple of days ago, one of our customers experienced a rapid PGA increase and the system got rebounced by PMON. We suggested dumping the PGA heap while the problem reoccurs. But the problem is that the PGA increase is too rapid(upto 3.6G in less than 2 min), so it is a bit hard to get the PGA dump manually.
Fortunately, in situation like this, Oracle provides a way of getting automatic PGA heap dump. 1. First, let me set the 10261 diagnostic event to limit the size of the PGA heap. For instance, following command will limit the size of the PGA heap to 100000KB.alter system set events '10261 trace name context forever, level 100000';
2. With the 10261 event set, the process would fail with ORA-600 [723] when the size of the PGA heap reaches 100000KB. As you already know, the 10261 event is used as a band-aid for the ORA-4030 error.
-- make big pga
declare
type varchar2_array is table of varchar2(32767) index by pls_integer;
vc varchar2_array;
v varchar2(32767);
begin
for idx in 1 .. 10000 loop
v := rpad('x',32767,'x');
vc(idx) := v;
end loop;
end;
/
ERROR at line 1:
ORA-00600: internal error code, arguments: [723], [41000], [pga heap], [], [],
[], [], []
3. Now, let me set the 600 diagnostic event to get the heap dump when the process hits ORA-600 error. With the combination of 10261 and 600 event, the process would record the PGA heap dump automatically when the PGA heap size reaches the limit.
alter system set events '600 trace name heapdump level 0x20000001';
4. When the PGA heap size reaches the limit(100000KB in this case) again, I would have the complete PGA heap dump including all the recursive subheaps by virtue of the dump level 0×20000001.
DDE: Problem Key 'ORA 600 [723]' was flood controlled (0x2) (incident: 44800) ORA-00600: internal error code, arguments: [723], [41000], [pga heap], [], [], [], [], [] ****** ERROR: PGA size limit exceeded: 102450812 > 102400000 ***** ****************************************************** HEAP DUMP heap name="pga heap" desc=11AFB098 extent sz=0x206c alt=92 het=32767 rec=0 flg=2 opc=1 parent=00000000 owner=00000000 nex=00000000 xsz=0xfff8 heap=00000000 fl2=0x60, nex=00000000 EXTENT 0 addr=39150008 Chunk 39150010 sz= 24528 free " " Chunk 39155fe0 sz= 40992 freeable "koh-kghu call " ds=0D4D9A60 EXTENT 1 addr=39140008 Chunk 39140010 sz= 24528 free " " Chunk 39145fe0 sz= 40992 freeable "koh-kghu call " ds=0D4D9A60 ...
5. The last step is to analyze the heap dump. For instance, I have my own library to analyze the heap dump.
select * from table(tpack.heap_file_report('C:\oracle\diag\rdbms\ukja1106\ukja1106\trace\ukja1106_ora_3640.trc'));
TYPE HEAP_NAME ITEM ITEM_COUNT ITEM_SIZE HEAP_SIZE RATIO
-------- ---------------- ---------------- ---------- ---------- ---------- ----------
HEAP pga heap 0 97.14 97.14 100
HEAP top call heap 0 .18 .18 100
HEAP top uga heap 0 .31 .31 100
CHUNK pga heap free 1554 36.2 97.1 37.3
CHUNK pga heap recreate 9 0 97.1 0
CHUNK pga heap perm 14 0 97.1 0
CHUNK pga heap freeable 1597 60.7 97.1 62.5
CHUNK top uga heap recreate 1 0 .3 19.9
CHUNK top uga heap free 5 0 .3 0
CHUNK top uga heap freeable 4 .2 .3 79.9
CHUNK top call heap free 3 .1 .1 65.5
CHUNK top call heap recreate 2 0 .1 1
CHUNK top call heap freeable 1 0 .1 33.3
CHUNK top call heap perm 1 0 .1 0
OBJECT pga heap kews sqlstat st 1 0 97.1 0
OBJECT pga heap pesom.c:Proces 3 0 97.1 0
...
6. It is also possible to write the script to monitor the V$SESSTAT view(session pga memory) and execute the heap dump when the value hits some specific threshold. For instance, I have my own library which is used like following.
col report_id new_value report_id
select tpack_server.create_report('Heap Dump') as report_id from dual;
exec tpack_server.add_parameter('&report_id', 'dump_level', '0x20000001');
exec tpack_server.add_parameter('&report_id', 'get_whole_contents', 0);
exec tpack_server.add_condition('&report_id', 'STAT', 'session pga memory', '>100000000', 'SUM');
exec tpack_server.register_report('&report_id');
-- start server
exec tpack_server.start_server;
When the PGA heap size hits the limit(100000000B), it would execute the predefined procedure which executes the heap dump.
Fri Jun 11 06:19:10 GMT+00:00 2010 : Session 142 got! sum=659645392, name = session pga memory ... Fri Jun 11 06:27:50 GMT+00:00 2010 : executing report 1:142:1973827792 for session 142 Fri Jun 11 06:27:55 GMT+00:00 2010 : executing report = begin tpack.heap_dump( dump_level=>'0x20000001', get_whole_contents=>0, session_id => 142); end; ...
Also note that the combination of 10261 and 600 event is just a temporary solution and the most important thing is to analyze the heap dump very carefully to make the rapid PGA increase never happen again!
Interesting combination of RAC and serial direct path read
For who are not aware, Oracle 11g has introduced a serial direct path read which enables the efficient direct path read for the serial table scan.
Disabling Direct path read for the serial full table scan-11g A couple of days ago, I was reported a weird performance bias between 2 nodes in 11g RAC. The query has the same execution plan on both nodes.--------------------------------------------------------- | Id | Operation | Name | --------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | NESTED LOOPS | | |* 2 | TABLE ACCESS FULL | | |* 3 | VIEW | | | 4 | UNION ALL PUSHED PREDICATE | | |* 5 | FILTER | | | 6 | TABLE ACCESS BY INDEX ROWID| | |* 7 | INDEX RANGE SCAN | | |* 8 | FILTER | | | 9 | TABLE ACCESS BY INDEX ROWID| | |* 10 | INDEX RANGE SCAN | | ---------------------------------------------------------
But on node A, it takes 12 sec to complete while takes only 2 sec on node B.
-- node A all count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.06 0.06 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 2 1.31 10.48 90421 90437 0 2 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 4 1.37 10.55 90421 90437 0 2 Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ library cache lock 1 0.00 0.00 library cache pin 1 0.00 0.00 SQL*Net message to client 2 0.00 0.00 enq: KO - fast object checkpoint 2 0.00 0.00 reliable message 1 0.00 0.00 direct path read 1579 0.00 0.01 SQL*Net message from client 2 23.81 23.81 -- node B call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.03 0.03 0 0 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 2 2.01 2.01 0 90450 0 2 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 4 2.04 2.05 0 90450 0 2 Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 2 0.00 0.00 gc cr block 2-way 1 0.00 0.00 SQL*Net message from client 2 23.53 23.53 library cache lock 1 0.00 0.00 library cache pin 1 0.00 0.00
This performance bias was always reproduced. It took always 12 sec on node A and 2 sec on node B.
The hint was direct path read wait event which means that the direct path read was being performed.- Because this is not a parallel execution, we have a strong chance of the serial direct path read.
- Nocache LOB can’t be a reason here. If the nocache LOB is the issue, both nodes should have the same performance feature.
- But we should be careful when interpreting the wait count and wait time of direct path read event. We can’t rely on these values due to the mechanism of the direct path read.
The customer was suggested of the 10949 diagnostics event as a workaround and the problem was gone(I know that this event is undocumented and unsupported, but… )
It seems that node A and node B had different environments(different cache size and/or different size of preload blocks …) and as a result, Oracle chose the different ways of executing FTS – direct vs. conventional. Can this be classified as another dark side(side effect) of the automization by Oracle? I’m not sure but the unexpected bias between the multiple nodes is always likely to embarrass the DBAs.

