Dion Cho – Oracle Performance Storyteller

We are natural born scientists

Archive for the ‘Troubleshooting’ Category

Making self deadlock on the library cache pin

with 4 comments

It’s very easy to make your session deadlocked on the library cache pin. Let me prove it.

Step 1. Make a dummy procedure called TEST_PROC1.

TPACK@ukja1120> select * from v$version where rownum = 1;

BANNER
-----------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production

create or replace procedure test_proc1 
is
begin
	null;
end;
/

Step 2. Execute simple PL/SQL which executes TEST_PROC1 and compiles it.

TPACK@ukja1120> begin
  2     test_proc1;
  3
  4     execute immediate 'alter procedure test_proc1 compile';
  5
  6  end;
  7  /

...
(Hang)

Very simple, isn’t it?

ASH cleary shows that the session is waiting for the library cache pin to be released, but it’s being acquried by the session itself.

select * 
from (
	select
		h.session_id as sid,
		to_char(h.sample_time,'mi:ss') as sample_time,
		h.sql_id,
		(select sql_text from v$sqlarea a where a.sql_id = h.sql_id) as sql_text,
		event,
		blocking_session as blocker
	from
		v$active_session_history h
	where
		h.session_id = &sid
	order by h.sample_time desc			
) where rownum <= 20	
;					

 SID SAMPL SQL_ID        SQL_TEXT             EVENT         BLOCKER
---- ----- ------------- -------------------- ---------- ----------
 136 49:10                                    library ca        136
                                              che pin

 136 49:09                                    library ca        136
                                              che pin

 136 49:08                                    library ca        136
                                              che pin

 136 49:07                                    library ca        136
                                              che pin

 136 49:06                                    library ca        136
                                              che pin

 136 49:05                                    library ca        136
                                              che pin

 136 49:04                                    library ca        136
                                              che pin

 136 49:03                                    library ca        136
                                              che pin

 136 49:02                                    library ca        136
                                              che pin

 136 49:01                                    library ca        136
                                              che pin

 136 49:00                                    library ca        136
                                              che pin

 136 48:59                                    library ca        136
                                              che pin

 ...

TPack‘s Wait Detail Report gives more detailed information on the deadlocked library cache pin contention.

TPACK@ukja1120> select * from table(tpack.session_detail(136,'wait_detail'))

NAME                           VALUE
------------------------------ --------------------
SID                            136
Serial#                        2797
SPID                           5148
Program                        sqlplus.exe
Process                        5404:672
Module                         SQL*Plus
SQL ID                         9pbva4bn2m25b
Child No                       0
SQL Text                       alter procedure test
                               _proc1 compile

Status                         ACTIVE
Blocking Instance              1
Blocking Session               136
SQL Exec Start                 2010/09/15 13:45:34
Event                          library cache pin
Seq#                           130
P1(P1raw)                      384372376(0000000016
                               E90E98)

P2(P2raw)                      384372376(0000000016
                               DAB608)

P3(P3raw)                      384372376(00014F8500
                               010003)

Seconds in wait                40
State                          WAITING
Wait Event                     library cache pin
Holder SID                     136
Namespace                      TABLE/PROCEDURE
Object                         TEST_PROC1
Holding Mode                   2(S)

So, this is not a normal situation, isn’t it? What would be the reason of the library cache pin not being released even after the execution?

Written by Dion Cho

September 16, 2010 at 2:06 am

Posted in Troubleshooting

Batching NLJ optimization and ordering

with 5 comments

You might be aware of the batching nested loops join(batching NLJ) optimization introduced at Oracle 11g. For who are not aware of this new feature, let me show you two different execution plans(Oracle 10g vs. 11g) of the same SQL.

-- Oracle 10g
------------------------------------------------
| Id  | Operation                      | Name  |
------------------------------------------------
|   0 | SELECT STATEMENT               |       |
|*  1 |  COUNT STOPKEY                 |       |
|   2 |   TABLE ACCESS BY INDEX ROWID  | T2    |  <-- Here
|   3 |    NESTED LOOPS                |       |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1    |
|*  5 |      INDEX RANGE SCAN          | T1_N1 |
|*  6 |     INDEX RANGE SCAN           | T2_N1 |
------------------------------------------------

-- Oracle 11g
------------------------------------------------
| Id  | Operation                      | Name  |
------------------------------------------------
|   0 | SELECT STATEMENT               |       |
|*  1 |  COUNT STOPKEY                 |       |
|   2 |   NESTED LOOPS                 |       |
|   3 |    NESTED LOOPS                |       |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1    |
|*  5 |      INDEX RANGE SCAN          | T1_N1 |
|*  6 |     INDEX RANGE SCAN           | T2_N1 |
|   7 |    TABLE ACCESS BY INDEX ROWID | T2    |   <-- And here
------------------------------------------------

You might note that Oracle 10g and 11g have very different location of the table lookup operation on the table T2. This is a symptom of the nested loops join optimization of Oracle 11g. By using batching NLJ, you might see a decreased logical reads and improved performance.

So far, so good, but a couple of days ago, I hit a somewhat strange ordering problem with the batching NLJ. Following is a replayable test case. Just note that the query is using index(t1_n1) to implement the pagination without the overhead of ORDER BY.

create table t1
as
select 1 as c1, mod(level, 4) as c2, level as c3, level as c4, rpad('x',1000) as dummy
from dual
connect by level <= 1000;

create table t2
as
select 1001-level as c1, level as c2, rpad('x',1000) as dummy
from dual
connect by level <= 100;

create index t1_n1 on t1(c1, c2, c3);
create index t2_n1 on t2(c1);

exec dbms_stats.gather_table_stats(user, 't1');
exec dbms_stats.gather_table_stats(user, 't2');

explain plan for
select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
	rownum as rnum,
	t2.c1,
	t1.c4,
	t2.c2
from t1, t2
where 
	t1.c3 = t2.c1
	and t1.c1 = 1
	and t1.c2 = 0
	and rownum <= 20
;

select * from table(dbms_xplan.display);

-- Read from the disk
alter system flush buffer_cache;

select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

-- Read from the buffer cache
select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

-- Disable exceptions for buffer cache misses
alter session set "_nlj_batching_misses_enabled" = 0;

-- Read from the disk
alter system flush buffer_cache;

select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

-- Read from the buffer cache
select * from (
	select /*+ leading(t1 t2) use_nl(t2) index_asc(t1) index_asc(t2) */
		rownum as rnum,
		t2.c1,
		t1.c4,
		t2.c2
	from t1, t2
	where 
		t1.c3 = t2.c1
		and t1.c1 = 1
		and t1.c2 = 0
		and rownum <= 20
) where rnum >= 15
;

To save your time, let me show you the result of the above test case with some comments appended.

-- Case1 : batching NLJ enabled
-- when the query reads from the disk
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        980        980         21  <-- Why 980 here?
        18        968        968         33
        19        972        972         29
        20        976        976         25


-- when the query reads from the buffer cache
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        968        968         33
        18        972        972         29
        19        976        976         25
        20        980        980         21


-- Case 2: batching NLJ disabled
-- when the query reads from the disk
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        968        968         33
        18        972        972         29
        19        976        976         25
        20        980        980         21

-- when the query reads from the buffer cache
      RNUM         C1         C4         C2
---------- ---------- ---------- ----------
        15        960        960         41
        16        964        964         37
        17        968        968         33
        18        972        972         29
        19        976        976         25
        20        980        980         21

The simple explanation is :
“The new optimization code of the nested loops join does not guarantee that the rows would be returned as an order of the outer table, especially when it reads the data from the disk.”

This could be a limitation when you want to get the ordered rows using index and NLJ in the pagination query, but I don’t believe that this could be classified as a bug. The only way to ensure the order of the returning rows is always to use ORDER BY clause.

But if you still want to use INDEX not ORDER BY in the above query(which was a natural selection in Oracle 10g), the workarounds is one of the followings.

  • Set parameter – _nlj_batching_misses_enabled = 0;
  • Set parameter – _nlj_batching_enabled = 0;
  • Append hint – NO_NLJ_BATCHING(t2)

It seems that I need to do further research on the batching NLJ – it’s exact mechanism and the pitfalls.

Written by Dion Cho

August 16, 2010 at 6:33 am

Posted in Troubleshooting

_gby_hash_aggregation_enabled bug

with 2 comments

One of the customers complained that Oracle refuses to use the HASH GROUP BY operation under any circumstances.

It’s 10gR2(10.2.0.1) database.

TPACK@ukja1021> select * from v$version where rownum = 1;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod

See that Oracle simply refuses to use HASH GROUP BY operation even with the simplest group by?

create table t1
as
select level as c1
from dual
connect by level <= 10;


explain plan for 
select count(*)
from tpack.t1
group by c1;

-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |
|   1 |  SORT GROUP BY     |      |    10 |
|   2 |   TABLE ACCESS FULL| T1   |    10 |
-------------------------------------------

You might think that the “_gby_hash_aggregation_enabled” would have been false which is clearly right.

TPACK@ukja1021> col value format a10
TPACK@ukja1021> @para gby_hash
TPACK@ukja1021> set echo off
old   9: and i.ksppinm like '%&1%'
new   9: and i.ksppinm like '%gby_hash%'

NAME                           VALUE                IS_DEFAUL SES_MODIFI
------------------------------ -------------------- --------- ----------
SYS_MODIFI
----------
DESCRIPTION
-------------------------------------------------------------------------

_gby_hash_aggregation_enabled  FALSE                FALSE     true
immediate
enable group-by and aggregation using hash scheme

But even with the parameter enabled and USE_HASH_AGGREGATION hint applied, Oracle still persists to the SORT GROUP BY operation.

alter session set "_gby_hash_aggregation_enabled" = true;

explain plan for 
select /*+ use_hash_aggregation */ count(*)
from t1
group by c1;

-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |
|   1 |  SORT GROUP BY     |      |    10 |
|   2 |   TABLE ACCESS FULL| T1   |    10 |
-------------------------------------------

Okay, enough. This is not a normal situation. The last resort is MOS. Following bug seems to perfectly match this case.

Bug 8631742: ALTER SESSION SET ... NOT HAVING EFFECT IN 10.2.0.4
...
RELEASE NOTES:
]] Setting _gby_hash_aggregation_enabled at the session level did not
]] always take effect
REDISCOVERY INFORMATION:
If you change the use of hash aggregation at the session level, but this does
not affect the choice of aggregation method in subsequebtly parsed SQL,
you are probably hitting this bug.

One thing to note is that you might see this bug only when you specify the parameter in the parameter file which would be a a bit rare case.

*._gby_hash_aggregation_enabled=FALSE

I know that the earlier implementation of the HASH GROUP BY operation has some serious bugs which made many sites simply turn off it permanently. The problem is that you can’t use that operation even with USE_HASH_AGGREGATION hint which you wouldn’t have expected. :)

FIY, this bug was patched at 10.2.0.5.

Written by Dion Cho

July 6, 2010 at 5:52 am

Rapid PGA size increase

with 8 comments

A couple of days ago, one of our customers experienced a rapid PGA increase and the system got rebounced by PMON. We suggested dumping the PGA heap while the problem reoccurs. But the problem is that the PGA increase is too rapid(upto 3.6G in less than 2 min), so it is a bit hard to get the PGA dump manually.

Fortunately, in situation like this, Oracle provides a way of getting automatic PGA heap dump.

1. First, let me set the 10261 diagnostic event to limit the size of the PGA heap. For instance, following command will limit the size of the PGA heap to 100000KB.

alter system set events '10261 trace name context forever, level 100000';

2. With the 10261 event set, the process would fail with ORA-600 [723] when the size of the PGA heap reaches 100000KB. As you already know, the 10261 event is used as a band-aid for the ORA-4030 error.

-- make big pga
declare
 type varchar2_array is table of varchar2(32767) index by pls_integer;
 vc  varchar2_array;
 v  varchar2(32767);
begin
 for idx in 1 .. 10000 loop
 v := rpad('x',32767,'x');
 vc(idx) := v;
 end loop;
end;
/

ERROR at line 1:
ORA-00600: internal error code, arguments: [723], [41000], [pga heap], [], [],
[], [], []

3. Now, let me set the 600 diagnostic event to get the heap dump when the process hits ORA-600 error. With the combination of 10261 and 600 event, the process would record the PGA heap dump automatically when the PGA heap size reaches the limit.

alter system set events '600 trace name heapdump level 0x20000001';

4. When the PGA heap size reaches the limit(100000KB in this case) again, I would have the complete PGA heap dump including all the recursive subheaps by virtue of the dump level 0x20000001.

DDE: Problem Key 'ORA 600 [723]' was flood controlled (0x2) (incident: 44800)
ORA-00600: internal error code, arguments: [723], [41000], [pga heap], [], [], [], [], []
****** ERROR: PGA size limit exceeded: 102450812 > 102400000 *****
******************************************************
HEAP DUMP heap name="pga heap"  desc=11AFB098
 extent sz=0x206c alt=92 het=32767 rec=0 flg=2 opc=1
 parent=00000000 owner=00000000 nex=00000000 xsz=0xfff8 heap=00000000
 fl2=0x60, nex=00000000
EXTENT 0 addr=39150008
  Chunk 39150010 sz=    24528    free      "               "
  Chunk 39155fe0 sz=    40992    freeable  "koh-kghu call  "  ds=0D4D9A60
EXTENT 1 addr=39140008
  Chunk 39140010 sz=    24528    free      "               "
  Chunk 39145fe0 sz=    40992    freeable  "koh-kghu call  "  ds=0D4D9A60
...

5. The last step is to analyze the heap dump. For instance, I have my own library to analyze the heap dump.

select * from table(tpack.heap_file_report('C:\oracle\diag\rdbms\ukja1106\ukja1106\trace\ukja1106_ora_3640.trc'));

TYPE     HEAP_NAME        ITEM             ITEM_COUNT  ITEM_SIZE  HEAP_SIZE      RATIO
-------- ---------------- ---------------- ---------- ---------- ---------- ----------
HEAP     pga heap                                   0      97.14      97.14        100
HEAP     top call heap                              0        .18        .18        100
HEAP     top uga heap                               0        .31        .31        100
CHUNK    pga heap         free                   1554       36.2       97.1       37.3
CHUNK    pga heap         recreate                  9          0       97.1          0
CHUNK    pga heap         perm                     14          0       97.1          0
CHUNK    pga heap         freeable               1597       60.7       97.1       62.5
CHUNK    top uga heap     recreate                  1          0         .3       19.9
CHUNK    top uga heap     free                      5          0         .3          0
CHUNK    top uga heap     freeable                  4         .2         .3       79.9
CHUNK    top call heap    free                      3         .1         .1       65.5
CHUNK    top call heap    recreate                  2          0         .1          1
CHUNK    top call heap    freeable                  1          0         .1       33.3
CHUNK    top call heap    perm                      1          0         .1          0
OBJECT   pga heap         kews sqlstat st           1          0       97.1          0
OBJECT   pga heap         pesom.c:Proces            3          0       97.1          0
...

6. It is also possible to write the script to monitor the V$SESSTAT view(session pga memory) and execute the heap dump when the value hits some specific threshold. For instance, I have my own library which is used like following.

col report_id new_value report_id

select tpack_server.create_report('Heap Dump') as report_id from dual;

exec tpack_server.add_parameter('&report_id', 'dump_level', '0x20000001');
exec tpack_server.add_parameter('&report_id', 'get_whole_contents', 0);

exec tpack_server.add_condition('&report_id', 'STAT', 'session pga memory', '>100000000', 'SUM');

exec tpack_server.register_report('&report_id');

-- start server
exec tpack_server.start_server;

When the PGA heap size hits the limit(100000000B), it would execute the predefined procedure which executes the heap dump.

Fri Jun 11 06:19:10 GMT+00:00 2010 : Session 142 got! sum=659645392, name = session pga memory
...
Fri Jun 11 06:27:50 GMT+00:00 2010 : executing report 1:142:1973827792 for session 142
Fri Jun 11 06:27:55 GMT+00:00 2010 : executing report = begin tpack.heap_dump(  dump_level=>'0x20000001', get_whole_contents=>0,  session_id => 142); end;
...

Also note that the combination of 10261 and 600 event is just a temporary solution and the most important thing is to analyze the heap dump very carefully to make the rapid PGA increase never happen again!

Written by Dion Cho

June 14, 2010 at 2:10 am

Posted in Troubleshooting

Tagged with , , ,

Interesting case of ORA-01792 error.

with one comment

One of my customers had a very long but not complex SQL statement generated dynamically and hit the following ORA error.

ORA-01792: maximum number of columns in a table or view is 1000

You have very little chance to meet this error in the real life. The definition of the error code is like following.

01792
 "maximum number of columns in a table or view is 1000"
// *Cause: An attempt was made to create a table or view with more than 1000
//         columns, or to add more columns to a table or view which pushes
//         it over the maximum allowable limit of 1000. Note that unused
//         columns in the table are counted toward the 1000 column limit.
// *Action: If the error is a result of a CREATE command, then reduce the
//         number of columns in the command and resubmit. If the error is
//         a result of an ALTER TABLE command, then there are two options:
//         1) If the table contained unused columns, remove them by executing
//            ALTER TABLE DROP UNUSED COLUMNS before adding new columns;
//         2) Reduce the number of columns in the command and resubmit.

But the customer did not have any table whose column count is over 1000. Besides that the SQL statement was just a plain query not any type of DDL.

A close look on the definition would tell you that a view also has a 1000 limitation. So I requested the source SQL statement and the statement has following pattern.

select * from (
   select b, 
       decode(a, 1, 1) as c1,
       decode(a, 2, 2) as c2,
       ...
       decode(a, 1000, 1000) as c1000
   from t1
) 
  • The SQL statement has declared unnecessary inline view which seems to be generated automatically.
  • The inline view has declared just over 1000 dynamic columns using decode.

So, it seems that Oracle cannot declare columns more than 1000 inside a inline view. Let me prove it with following simple test case.

ORA-01792 error with the inline view containing 1001 columns.

UKJA@ukja1021> declare
  2  	     v_cursor	     sys_refcursor;
  3  	     v_sql		     varchar2(32767);
  4  begin
  5  	     v_sql := 'select * from (select ';
  6  	     for idx in 1 .. 1000 loop
  7  		     v_sql := v_sql || rpad('1',10, '1')  || ' as c'|| idx || ', ';
  8  	     end loop;
  9  	     v_sql := v_sql || '1  from dual)';
 10  
 11  	     open v_cursor for v_sql;
 12  	     close v_cursor;
 13  end;
 14  /
declare
*
ERROR at line 1:
ORA-01792: maximum number of columns in a table or view is 1000
ORA-06512: at line 11

Without inline view, no ORA-01792 error.

Elapsed: 00:00:00.06
UKJA@ukja1021> 
UKJA@ukja1021> declare
  2  	     v_cursor	     sys_refcursor;
  3  	     v_sql		     varchar2(32767);
  4  begin
  5  	     v_sql := 'select ';
  6  	     for idx in 1 .. 1000 loop
  7  		     v_sql := v_sql || rpad('1',10, '1')  || ' as c'|| idx || ', ';
  8  	     end loop;
  9  	     v_sql := v_sql || '1  from dual';
 10  
 11  	     open v_cursor for v_sql;
 12  	     close v_cursor;
 13  end;
 14  /

PL/SQL procedure successfully completed.

One thing you should keep in mind is that Oracle limits the 1000 columns while it is checking syntax. Removing view with view merging would have no effect on this 1000 limitation.

This customer had the luck to be free from ORA-01792 error simply by removing unnecessary inline view. But under more complex situation, the workaround would have not been that easy.

Written by Dion Cho

September 10, 2009 at 4:25 am

Posted in Troubleshooting

Tagged with

Playing with ORA-4030 error

with 3 comments

ORA-4030 has such a long history that for now, even junior DBAs should have a good knowledge on it. But the reality is not like my expectation. That’s why I’m posting this article – to help them play with ORA-4030 and have an insight on it!

Look at this definition.

04030
 "out of process memory when trying to allocate %s bytes (%s,%s)"
// *Cause:  Operating system process private memory has been exhausted
// *Action:

It clearly says that this is OS process private memory problem! But I believe that this statement is 50% of truth.

As far as I know, there are 3 common reasons for this error.

  1. You have under-configured OS memory limitation.
  2. You have memory leak bug.
  3. You are allocating too many objects.

In the Unix system, the low limit of the process configuration can cause ORA-4030 error. The quick cure is to increase the limit – the most prefered value is UNLIMITED.

prompt> ulimit -a

      core file size                   (blocks, -c)    0
      data seg size                  (kbytes, -d)    unlimited
      file size                              (blocks, -f)    unlimited
      max locked memory     (kbytes, -l)    unlimited
      max memory size        (kbytes, -m)    unlimited
      open files                                        (-n)    1024
      pipe size                     (512 bytes, -p)    8
      stack size                         (kbytes, -s)    unlimited
      cpu time                         (seconds, -t)    unlimited
      max user processes                    (-u)    7168
      virtual memory               (kbytes, -v)    unlimited

In the Windows 32 system, the limit of the private memory is around 1.5G. You cannot grow your process memory over than it regardless of your physical RAM size. For this reason, it’s somtimes recommended to use the Windows 64 system.

If the problem persists even with the higher OS configuration, you have 2 other reasons. You’re hitting Oracle bug(memory leak) and/or allocating too many objects.

The most intuitive and natural way for further investigation is the PGA heap dump. As far as I know, this is the easiest way!

Before introuducing how to use PGA heap dump, you should understand exactly when you hit ORA-4030 error. Let me show you simple examples.

1. I have following minimum PGA size configuration.

alter system set "_pga_max_size" = 15000000;
alter system set pga_aggregate_target=50m;


2.I build very big PL/SQL block so that it exceeds the max PGA size. Would I hit ORA-4030 error?

Note: I had a serious misinterpretation on the following second test case and please kindly ignore this specific test case.

-- case1. this does not cause 4030
set serveroutput on
spool temp.sql

begin
	dbms_output.put_line('declare');
	for idx in 1 .. 8000 loop
		dbms_output.put_line(' v' || idx || ' varchar2(4000) := rpad(''x'',4000);');
	end loop;
	dbms_output.put_line('begin');
	dbms_output.put_line('null;');
	dbms_output.put_line('end;');
	dbms_output.put_line('/');
end;
/

spool off
@temp

v91 varchar2(4000) := rpad('x',4000);
               *
ERROR at line 92:
ORA-06550: line 1753, column 30:
PLS-00123: program too large (Diana nodes)

NAME                                      VALUE
------------------------------ ----------------
session pga memory                    2,022,996
session pga memory max                2,022,996


The answer seems NO and Oracle limits the program size below the maximum PGA size. Good. You would never write such an ugly codes anyway!

3.I build 1.2G byte of LOB local variable. Would I hit ORA-4030?

declare 
	v1		clob;
begin
	for idx in 1 .. 1200000 loop
		v1 := v1 || rpad('x', 1000, 'x');
	end loop;
end;
/

NAME                                                VALUE
---------------------------------------- ----------------
session pga memory                              2,547,284
session pga memory max                          5,627,476

PL/SQL procedure successfully completed.

Absolutely NO! Oracle gracefully handles the LOB data – well designed. And see the maximum memory usage. It never exceeds some specific size.

We can expect the same behavior with the sort operation. Sorting 2G of data would cause performance problem with 15m of PGA max size, but you never hit ORA-4030 error.

4. Then when do I hit ORA-4030? See following trick.

create or replace procedure proc_rec(depth number)
is
	v1 		varchar2(1000) := rpad('x',1000);
	v2 		varchar2(1000) := rpad('x',1000);
	v3 		varchar2(1000) := rpad('x',1000);
	v4 		varchar2(1000) := rpad('x',1000);
	v5 		varchar2(1000) := rpad('x',1000);
	v6 		varchar2(1000) := rpad('x',1000);
	v7 		varchar2(1000) := rpad('x',1000);
	v8 		varchar2(1000) := rpad('x',1000);
	v9 		varchar2(1000) := rpad('x',1000);
	v10 		varchar2(1000) := rpad('x',1000);
begin
	if depth > 0 then
		proc_rec(depth - 1);
	end if;
end;
/

UKJA@ukja102> exec proc_rec(100);

NAME                                      VALUE
------------------------------ ----------------
session pga memory                    2,940,500
session pga memory max                2,940,500

Elapsed: 00:00:00.00

UKJA@ukja102> exec proc_rec(10000);

NAME                                      VALUE
------------------------------ ----------------
session pga memory                  111,140,436
session pga memory max              124,575,316

UKJA@ukja102> exec proc_rec(20000);

NAME                                      VALUE
------------------------------ ----------------
session pga memory                  220,323,412
session pga memory max              248,307,284

Do you see that memory usage goes far beyond the PGA max size? Many complex Oracle applications have a complex structure of function calls. If this structure meets unexpected Oracle bug or logic hole, you have a strong chance to hit ORA-4030 error.

5. PL/SQL collections also go beyond your configuration.

create or replace procedure proc_array(len number)
is
	type vtable is table of varchar2(1000);
	vt	 vtable := vtable();
begin
	for idx in 1 .. len loop
		vt.extend;
		vt(idx) := rpad('x',1000,'x');
	end loop;
end;
/
UKJA@ukja102> exec proc_array(10000);

NAME                                      VALUE
------------------------------ ----------------
session pga memory                  221,896,276
session pga memory max              248,307,284

UKJA@ukja102> exec proc_array(10000);

NAME                                      VALUE
------------------------------ ----------------
session pga memory                  220,192,340
session pga memory max              364,961,364

UKJA@ukja102> exec proc_array(1200000);

ERROR at line 1:
ORA-04030: out of process memory when trying to allocate 16396 bytes (koh-kghu
call ,pl/sql vc2)

NAME                                      VALUE
------------------------------ ----------------
session pga memory                  219,668,052
session pga memory max            1,405,476,436

The memory usage hit the OS boundary(1.4G here on my Windows 32) – This is why ORA-4030 error is defined as “OS private process memory problem”. But the actual problem lies with the agressive memory allocation by my application.

6. OK, the play is over. Now what can I do? I would prefer PGA heap dump! I can dump PGA heap(including subheaps) automatically using following diagnostic event syntax.

-- This time 
alter session set events 
	'4030 trace name heapdump level 0x20000001, lifetime 1';

Heapdump on level 0x20000001 is described here.

The next step is summarizing the raw trace file and making some readible reports.

UKJA@ukja102> @heap_analyze ukja10_ora_5728.trc
 

HEAP_NAME                   HSZ
-------------------- ----------
pga heap                1,362.7
koh-kghu call           1,022.8
top uga heap                 .2
session heap                 .2
top call heap                .1
PLS non-lib hp               .0
qmtmInit                     .0
Alloc environm               .0
KSFQ heap                    .0
Alloc server h               .0
koh-kghu sessi               .0
callheap                     .0
 
12 rows selected.
 
Elapsed: 00:00:00.09
 
HEAP_NAME            CHUNK_TYPE           CNT         SZ        HSZ HRATIO
-------------------- --------------- -------- ---------- ---------- ------
Alloc environm       freeable               3         .0         .0   80.9
Alloc environm       recreate               1         .0         .0   14.6
Alloc environm       perm                   2         .0         .0    4.5
Alloc server h       free                   6         .0         .0   94.3
Alloc server h       perm                   2         .0         .0    3.1
Alloc server h       freeable               2         .0         .0    2.6
KSFQ heap            perm                   2         .0         .0  100.0
PLS non-lib hp       freeable              12         .0         .0   75.1
PLS non-lib hp       free                   6         .0         .0   15.9
PLS non-lib hp       perm                   2         .0         .0    9.0
callheap             free                   6         .0         .0   77.1
callheap             perm                   2         .0         .0   22.9
koh-kghu call        freeable          65,415    1,022.8    1,022.8  100.0
koh-kghu sessi       freeable               4         .0         .0  100.0
pga heap             freeable          65,453    1,024.1    1,362.7   75.2
pga heap             free              43,616      338.4    1,362.7   24.8
pga heap             recreate               6         .0    1,362.7     .0
pga heap             perm                  28         .2    1,362.7     .0
qmtmInit             freeable              12         .0         .0   69.1
qmtmInit             free                   8         .0         .0   30.9
session heap         perm                   2         .1         .2   36.8
session heap         freeable             333         .1         .2   33.6
session heap         free                  14         .0         .2   23.0
session heap         recreate               8         .0         .2    6.6
top call heap        free                   2         .1         .1   93.5
top call heap        perm                   2         .0         .1     .1
top call heap        freeable               1         .0         .1    3.1
top call heap        recreate               2         .0         .1    3.3
top uga heap         recreate               1         .1         .2   33.3
top uga heap         free                   6         .1         .2   33.4
top uga heap         freeable               1         .1         .2   33.3
 
31 rows selected.
 
Elapsed: 00:00:00.15
 
HEAP_NAME            OBJ_TYPE                  CNT         SZ        HSZ HRATIO
-------------------- -------------------- -------- ---------- ---------- ------
Alloc environm                                   1         .0         .0   17.2
Alloc environm       Alloc server h              3         .0         .0   78.4
Alloc environm       perm                        2         .0         .0    4.5
Alloc server h                                   8         .0         .0   96.9
Alloc server h       perm                        2         .0         .0    3.1
KSFQ heap            perm                        2         .0         .0  100.0
PLS non-lib hp       PL/SQL STACK                2         .0         .0   69.3
PLS non-lib hp       PLSQL Stack des             2         .0         .0     .2
PLS non-lib hp       perm                        2         .0         .0    9.0
PLS non-lib hp       pl_lut_alloc                1         .0         .0     .4
PLS non-lib hp       peihstdep                   5         .0         .0     .7
PLS non-lib hp       PEIDEF                      1         .0         .0    4.2
PLS non-lib hp                                   6         .0         .0   15.9
PLS non-lib hp       pl_iot_alloc                1         .0         .0     .4
callheap                                         6         .0         .0   77.1
callheap             perm                        2         .0         .0   22.9
koh-kghu call        pl/sql vc2             65,414    1,022.8    1,022.8  100.0  <-- This is it!
koh-kghu call        pmucalm coll                1         .0    1,022.8     .0
koh-kghu sessi       pl/sql vc2                  1         .0         .0   39.9
koh-kghu sessi       pliost struct               3         .0         .0   60.1
pga heap             KJZT context                1         .0    1,362.7     .0
pga heap             external name               1         .0    1,362.7     .0
pga heap             KFIO PGA struct             1         .0    1,362.7     .0
pga heap             KSFQ heap descr             1         .0    1,362.7     .0
pga heap             PLS cca hp desc             1         .0    1,362.7     .0
pga heap             KFK PGA                     1         .0    1,362.7     .0
pga heap             kews sqlstat st             1         .0    1,362.7     .0
pga heap             koh-kghu call h             2         .0    1,362.7     .0
pga heap             kpuinit env han             1         .0    1,362.7     .0
pga heap             joxp heap                   2         .0    1,362.7     .0
pga heap             kjztprq struct              1         .0    1,362.7     .0
pga heap             kopolal dvoid               5         .0    1,362.7     .0
pga heap             KSFQ heap                   1         .0    1,362.7     .0
pga heap             Alloc environm              2         .0    1,362.7     .0
pga heap             ldm context                13         .0    1,362.7     .0
pga heap             qmtmInit                    4         .0    1,362.7     .0
pga heap             kgh stack                   1         .0    1,362.7     .0
pga heap             PLS non-lib hp              3         .0    1,362.7     .0
pga heap             Fixed Uga                   1         .0    1,362.7     .0
pga heap             perm                       28         .2    1,362.7     .0
pga heap             kzsna:login nam             1         .0    1,362.7     .0
pga heap             koh-kghu call          65,415    1,024.1    1,362.7   75.2
pga heap                                    43,616      338.4    1,362.7   24.8
qmtmInit             qmushtCreate                3         .0         .0   44.9
qmtmInit                                         8         .0         .0   30.9
qmtmInit             qmtmltAlloc                 6         .0         .0   23.0
qmtmInit             qmtmltCreate                3         .0         .0    1.2
session heap         perm                        2         .1         .2   36.8
session heap                                    14         .0         .2   23.0
session heap         koklug hxctx in             1         .0         .2     .0
session heap         koklug hlctx in             1         .0         .2     .0
session heap         koddcal dvoid               1         .0         .2     .0
session heap         system trigger              1         .0         .2     .0
session heap         kxsFrame4kPage              5         .0         .2   12.8
session heap         koh-kghu sessio             7         .0         .2    4.9
session heap         koh-kghu sessi              6         .0         .2    4.6
session heap         kxsc: kkspsc0              12         .0         .2    3.6
session heap         kgsc ht segs              266         .0         .2    3.2
session heap         PLS non-lib hp              2         .0         .2    2.6
session heap         kzctxhugi1                  1         .0         .2    2.6
session heap         kpuinit env han             1         .0         .2    1.0
session heap         kgiob                       6         .0         .2     .7
session heap         kokl lob id has             1         .0         .2     .6
session heap         kxs-heap-p                  1         .0         .2     .6
session heap         kodpai image                1         .0         .2     .6
session heap         kxs-krole                   7         .0         .2     .4
session heap         session languag             1         .0         .2     .3
session heap         Session NCHAR l             1         .0         .2     .3
session heap         PLS cca hp desc             2         .0         .2     .2
session heap         kokl transactio             1         .0         .2     .2
session heap         kokahin kgglk               1         .0         .2     .1
session heap         kqlpWrntoStr:st             1         .0         .2     .1
session heap         kwqidwh memory              2         .0         .2     .1
session heap         kwqaalag                    2         .0         .2     .1
session heap         kgiobdtb                    1         .0         .2     .1
session heap         kwqb context me             2         .0         .2     .1
session heap         kwqica hash tab             2         .0         .2     .1
session heap         kwqmahal                    2         .0         .2     .1
session heap         kodmcon kodmc               1         .0         .2     .0
session heap         kzsrcrdi                    1         .0         .2     .0
session heap         ksulu : ksulueo             1         .0         .2     .0
top call heap        perm                        2         .0         .1     .1
top call heap        callheap                    3         .0         .1    6.4
top call heap                                    2         .1         .1   93.5
top uga heap         session heap                2         .1         .2   66.6
top uga heap                                     6         .1         .2   33.4
 
86 rows selected.
 
Elapsed: 00:00:00.15
 
HEAP_NAME            SUBHEAP                   CNT         SZ        HSZ HRATIO
-------------------- -------------------- -------- ---------- ---------- ------
Alloc environm       ds=04F665B8                 2         .0         .0   63.7
Alloc environm                                   4         .0         .0   36.3
Alloc server h                                  10         .0         .0  100.0
KSFQ heap                                        2         .0         .0  100.0
PLS non-lib hp                                  20         .0         .0  100.0
callheap                                         8         .0         .0  100.0
koh-kghu call                               65,415    1,022.8    1,022.8  100.0
koh-kghu sessi                                   4         .0         .0  100.0
pga heap             ds=05003858            65,414    1,024.1    1,362.7   75.2
pga heap                                    43,683      338.6    1,362.7   24.8
pga heap             ds=04F67B34                 1         .0    1,362.7     .0
pga heap             ds=04F8D9CC                 2         .0    1,362.7     .0
pga heap             ds=04FE3470                 3         .0    1,362.7     .0
qmtmInit                                        20         .0         .0  100.0
session heap                                   356         .2         .2   98.7
session heap         ds=07F7EAAC                 1         .0         .2    1.3
top call heap        ds=083B8E20                 1         .0         .1    3.1
top call heap                                    6         .1         .1   96.9
top uga heap                                     7         .1         .2   66.7
top uga heap         ds=05007600                 1         .1         .2   33.3
 
20 rows selected.
 
Elapsed: 00:00:00.12

(heap_analyze.sql is here)
It’s a long list, but do you see the culprit? – koh-kghu call pl/sql vc2

KOH means Kernel Object Heap and KGHU means Kernel Generic Service for Heap Management(U might be UGA or User object). So this means that this process is allocating too many objects of PL/SQL Varchar2.
(Seel metalink note 175982.1 for further Oracle naming conventions)

I should admit that the real-life ORA-4030 troubleshooting is far more difficult than my simple test case. You have no access to the source code and the object names are decryptic.

But being on the good start point is very important and I believe that above simple demonstration would help you to start troubeshooting ORA-4030 in more scientific way.

Written by Dion Cho

July 27, 2009 at 6:18 am

Posted in Troubleshooting

Tagged with

Object name from file# and block#

with 13 comments

Get object name from file# and block#

Above simple request has met with a big disappointment on the performance. For instance, let’s assume that I have following wait events.

WAIT #6: nam='db file scattered read' ela= 438472 file#=6 block#=2641 blocks=8
WAIT #6: nam='db file scattered read' ela= 1039 file#=6 block#=833 blocks=8 obj#=90054 tim=878243950382
WAIT #6: nam='db file scattered read' ela= 835 file#=10 block#=22961 blocks=8 obj#=90054 tim=878243957168
WAIT #6: nam='db file scattered read' ela= 815 file#=11 block#=7409 blocks=8 obj#=90054 tim=878243966696
...



How can I get the object(segment) name from the p1(file#) and p2(block#)? The general and natural solution is querying DBA_EXTENTS view, which turns out to be very under-performing.

UKJA@ukja102> ed which_obj

/*
define __FILE = &1
define __BLOCK = &2

select segment_name
from dba_extents
where file_id = &__FILE
			and &__BLOCK between block_id and block_id + blocks - 1
			and rownum = 1
;

set echo on

*/

UKJA@ukja102> @which_obj 6 2641

SEGMENT_NAME
--------------------
T1_N1

Elapsed: 00:02:43.84

Statistics
----------------------------------------------------------
       4676  recursive calls
          2  db block gets
    4077424  consistent gets
       6492  physical reads
          0  redo size
        418  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          5  sorts (memory)
          0  sorts (disk)
          1  rows processed



Bang!

So people started to implement alternatives like 1) making summarized table for DBA_EXTENTS view, or 2) querying X$BH view very quickly, or 3) doing block dump and find the object id from it.

The 3rd alternative seems promising to me, so I made an automized script to get object information from file# and block# like following.

UKJA@ukja102> ed which_obj2

/*
define __FILE = &1
define __BLOCK = &2

alter system dump datafile &__FILE block &__BLOCK;

set serveroutput on

declare
	v_dba		varchar2(100);
	v_type	varchar2(100);
	v_obj_id		number;
	v_obj_name	varchar2(100);
begin
	for r in (select column_value as t from table(get_trace_file1)) loop
		if regexp_like(r.t, 'buffer tsn:') then
			dbms_output.put_line('------------------------------------------------');
			v_dba := regexp_substr(r.t, '[[:digit:]]+/[[:digit:]]+');
			dbms_output.put_line(rpad('dba = ',20)|| v_dba);
		end if;
		
		if regexp_like(r.t, 'type: 0x([[:xdigit:]]+)=([[:print:]]+)') then
			v_type := substr(regexp_substr(r.t, '=[[:print:]]+'), 2);
			dbms_output.put_line(rpad('type = ',20)|| v_type);
		end if;
		
		if regexp_like(r.t, 'seg/obj:') then 
			v_obj_id := to_dec(substr(regexp_substr(r.t, 
							'seg/obj: 0x[[:xdigit:]]+'), 12));
			select object_name into v_obj_name from all_objects 
				where data_object_id = v_obj_id;
			dbms_output.put_line(rpad('object_id = ',20)|| v_obj_id);
			dbms_output.put_line(rpad('object_name = ',20)|| v_obj_name);
		end if;
		
		if regexp_like(r.t, 'Objd: [[:digit:]]+') then
			v_obj_id := substr(regexp_substr(r.t, 'Objd: [[:digit:]]+'), 7);
			select object_name into v_obj_name from all_objects 
				where data_object_id = v_obj_id;
			dbms_output.put_line(rpad('object_id = ',20)|| v_obj_id);
			dbms_output.put_line(rpad('object_name = ',20)|| v_obj_name);
		end if;
		
	end loop;
	
	dbms_output.put_line('------------------------------------------------');
	
end;
/

*/

UKJA@ukja102> @which_obj2 6 2641
old   1: alter system dump datafile &__FILE block &__BLOCK
new   1: alter system dump datafile 6 block 2641

System altered.

Elapsed: 00:00:00.01
------------------------------------------------
dba =               6/2641
type =              FIRST LEVEL BITMAP BLOCK
object_id =         90055
object_name =       T1_N1
------------------------------------------------
PL/SQL procedure successfully completed.

Elapsed: 00:00:00.04



2 minutes of under-performing job has been replaced with 0.5 second of beautiful result.

Written by Dion Cho

July 6, 2009 at 3:52 am

Posted in Misc., Troubleshooting

Tagged with ,

Follow

Get every new post delivered to your Inbox.

Join 59 other followers