Troubleshooting PGA leak with PGA heap dump
Jonathan Lewis shared interesting and practical experience on PGA memory leak here. Here he mentions on the memory leaks by forall batch insert bug.
This is actually a very interesting case where troubleshooting using PGA heap dump seems to be one of the feasible ways to analyze the problem(as Tanel responded to Jonathan’s request). I would just like to show and discuss how I would investigate this problem and find the solution using PGA heap dump.
This is the simple simulation made by Jonathan Lewis which makes memory leak on PGA.
define m_string_length = 20 drop table t1 purge; create table t1(v1 varchar2( &m_string_length )); create or replace procedure proc1 ( i_rowcount in number default 1000000, i_bulk_pause in number default 0, i_forall_pause in number default 0, i_free_pause in number default 0 ) as type w_type is table of varchar2( &m_string_length ); w_list w_type := w_type(); w_free w_type := w_type(); begin for i in 1..i_rowcount loop w_list.extend; w_list(i) := rpad('x', &m_string_length ); end loop; dbms_lock.sleep(i_bulk_pause); forall i in 1..w_list.count insert into t1 values(w_list(i)); dbms_lock.sleep(i_forall_pause); commit; w_list := w_free; dbms_session.free_unused_user_memory; dbms_lock.sleep(i_free_pause); end; /
Let me monitor the session memory usage while making multiple calls on this procedure.
UKJA@ukja102> exec proc1; PL/SQL procedure successfully completed. SYS@ukja10> select 2 name, value 3 from 4 v$sesstat ss, 5 v$statname sn 6 where 7 sn.name like '%ga memory%' 8 and ss.statistic# = sn.statistic# 9 and ss.sid = 149 10 ; NAME VALUE ------------------------------ ---------------- session uga memory 498,942,840 session uga memory max 500,053,908 session pga memory 599,907,924 session pga memory max 675,602,004 UKJA@ukja102> exec proc1; PL/SQL procedure successfully completed. NAME VALUE ------------------------------ ---------------- session uga memory 695,429,288 session uga memory max 695,429,288 session pga memory 904,191,572 session pga memory max 904,257,108 UKJA@ukja102> exec proc1; PL/SQL procedure successfully completed. NAME VALUE ------------------------------ ---------------- session uga memory 840,565,088 session uga memory max 840,565,088 session pga memory 1,077,861,972 session pga memory max 1,077,861,972
Very serious memory leak! Isn’t it? 1) Oracle does not release memory even after the call has completed and 2) Oracle does not 100% reuse the memory.
The problem is that it’s almost impossible to identify on which objects we’re losing memory. As far as I know, Oracle does not expose the PGA memory structure by v$ views nor x$views. (Jonathan Lewis pointed me to the X$KSMPP view)
So what else can we do? PGA heap dump is the only thing left to us.
oradebug setospid 6760 oradebug dump heapdump 1
The heap dump file is quite easy to interpret, but it is sometimes too long to interpret with our eyes. So I built heap_analyze.sql to retrieve summarized report from raw dump file. Here is the result for the PGA heap dump.
UKJA@ukja102> @heap_analyze heap_dump_1.trc UKJA@ukja102> set echo off TSIZE ---------- 598488868 Elapsed: 00:00:00.57 ATYPE CSIZE HEAP_SIZE RATIO ------------------------ ---------- ---------- ------ freeable 498650144 598488868 83.32 perm 181928 598488868 .03 recreate 87180 598488868 .01 free 99569616 598488868 16.64 Elapsed: 00:00:00.75 CTYPE CSIZE HEAP_SIZE RATIO ------------------------------ ---------- ---------- ------ Alloc environm 4144 598488868 .00 Fixed Uga 20572 598488868 .00 KFIO PGA struct 72 598488868 .00 KFK PGA 260 598488868 .00 KJZT context 60 598488868 .00 KSFQ heap 3928 598488868 .00 KSFQ heap descr 92 598488868 .00 PLS cca hp desc 212 598488868 .00 PLS non-lib hp 18560 598488868 .00 callheap 2144 598488868 .00 external name 24 598488868 .00 joxp heap 2000 598488868 .00 kews sqlstat st 1292 598488868 .00 kgh stack 17012 598488868 .00 kjztprq struct 2068 598488868 .00 koh-kghu call h 1328 598488868 .00 kopolal dvoid 2524 598488868 .00 kpuinit env han 1584 598488868 .00 kzsna:login nam 24 598488868 .00 ldm context 12712 598488868 .00 perm 181928 598488868 .03 qmtmInit 13980 598488868 .00 session heap 498632732 598488868 83.32 free 99569616 598488868 16.64 24 rows selected. Elapsed: 00:00:00.65 DS CSIZE HEAP_SIZE RATIO -------------------- ---------- ---------- ------ 083BD9CC 10320 598488868 .00 08563470 12436 598488868 .00 085A7600 498567256 598488868 83.30 free 99898856 598488868 16.69 Elapsed: 00:00:00.68
Hm… nothing special except the fact that the subheap(085A7600) occupies 83.3% of the PGA heap. So I do subheap dump for that heap address.
oradebug dump heapdump_addr 1 0xa067600 UKJA@ukja102> @heap_analyze heap_subdump_1.trc UKJA@ukja102> set echo off TSIZE ---------- 498561776 Elapsed: 00:00:00.40 ATYPE CSIZE HEAP_SIZE RATIO ------------------------ ---------- ---------- ------ freeable 488817960 498561776 98.05 perm 54896 498561776 .01 recreate 14852 498561776 .00 free 9674068 498561776 1.94 Elapsed: 00:00:00.56 CTYPE CSIZE HEAP_SIZE RATIO ------------------------------ ---------- ---------- ------ PLS cca hp desc 400 498561776 .00 PLS non-lib hp 488740608 498561776 98.03 Session NCHAR l 552 498561776 .00 kgict 40 498561776 .00 kgicttab 44 498561776 .00 kgicu 92 498561776 .00 kgiob 1928 498561776 .00 kgiobdtb 192 498561776 .00 kgsc ht segs 5720 498561776 .00 koddcal dvoid 24 498561776 .00 kodmcon kodmc 64 498561776 .00 kodpai image 1036 498561776 .00 koh-kghu sessi 15888 498561776 .00 koh-kghu sessio 14252 498561776 .00 kokahin kgglk 140 498561776 .00 kokl lob id has 1036 498561776 .00 kokl transactio 268 498561776 .00 koklug hlctx in 24 498561776 .00 koklug hxctx in 24 498561776 .00 kpuinit env han 1584 498561776 .00 kqlpWrntoStr:st 112 498561776 .00 ksulu : ksulueo 40 498561776 .00 kwqaalag 92 498561776 .00 kwqb context me 92 498561776 .00 kwqica hash tab 92 498561776 .00 kwqidwh memory 92 498561776 .00 kwqmahal 92 498561776 .00 kxs-heap-d 1036 498561776 .00 kxs-heap-p 4148 498561776 .00 kxs-krole 780 498561776 .00 kxsFrame4kPage 28840 498561776 .01 kxsc: kkspbd0 968 498561776 .00 kxsc: kkspsc0 7756 498561776 .00 kzctxhugi1 4108 498561776 .00 kzsrcrdi 60 498561776 .00 perm 54896 498561776 .01 session languag 552 498561776 .00 system trigger 36 498561776 .00 free 9674068 498561776 1.94 39 rows selected. Elapsed: 00:00:00.51 DS CSIZE HEAP_SIZE RATIO -------------------- ---------- ---------- ------ 08B3608C 2076 498561776 .00 08B3EAAC 488738512 498561776 98.03 08B578F4 2080 498561776 .00 08B5A114 2080 498561776 .00 0FD400B0 2080 498561776 .00 free 9814948 498561776 1.97 6 rows selected. Elapsed: 00:00:00.54
Well, again, another subheap(08B3EAAC) does occupy 98% of the space.
oradebug dump heapdump_addr 1 0x0A08EAAC UKJA@ukja102> @heap_analyze heap_subdump_2.trc UKJA@ukja102> set echo off TSIZE ---------- 488662552 Elapsed: 00:00:02.50 ATYPE CSIZE HEAP_SIZE RATIO ------------------------ ---------- ---------- ------ freeable 4750540 488662552 .97 perm 144 488662552 .00 free 483911868 488662552 99.03 Elapsed: 00:00:03.37 CTYPE CSIZE HEAP_SIZE RATIO ------------------------------ ---------- ---------- ------ DPAGE 4750096 488662552 .97 peihstdep 260 488662552 .00 perm 144 488662552 .00 pl_iot_alloc 92 488662552 .00 pl_lut_alloc 92 488662552 .00 free 483911868 488662552 99.03 6 rows selected. Elapsed: 00:00:03.20 no rows selected Elapsed: 00:00:02.42
Okay, now I see the problem. The deepest subheap occupies 466M(488662552) but, the 99% is FREE! Oracle does not release the free space even after the call has completed and even denies to reuse it 100%.
This certainly looks like a bug and I search metalink using these keywords.
And this links me to the bug 5866410 which is described as the free chunks problem with FORALL insertion.
Bulk insert in PLSQL can consume a large amount of PGA memory which can lead to ORA-4030 errors.
A heapdump will show lot of free memory in the free lists which is not used but instead fresh allocations are made.
Well, above process might seem natural and easy for who read this post, but following things make the process somewhat difficult to follow
- The heap dump file is sometimes quite long and hard to analyze without proper analysis tool.
- Heap has hierarchical structure as shown above. It sometimes is composed of multiple subheaps and subheap itself could be composed of multiple subheapds.
- The object names are sometimes hard to decode.
But I believe that above test case proves the usefulness of the PGA heap dump analysis and it would turn out to be very successful under certain circumstances.
Dion,
Nice presentation.
The problem is that it’s almost impossible to identify on which objects we’re losing memory. As far as I know, Oracle does not expose the PGA memory structure by v$ views nor x$views.
If you want to play with the pga heap from SQL, you might want to take a look at x$ksmpp.
Jonathan Lewis
June 9, 2009 at 6:43 am
A couple of other items you might want to look at:
As far as I can tell, x$ksmup is the session heap; x$ksmsp is the sga heap (and x$ksmspr is the “shared pool reserved).
x$ksmhp exposes the subheaps of the sga heap, but you can only address it by ksmchds, and x$ksmmem is the complete memory array. But x$ksmhp and x$ksmmem are only for the sga heap – I haven’t found their equivalents for the pga and session heaps.
Jonathan Lewis
June 9, 2009 at 7:20 am
@Jonathan.
Thank you for the infos.
I would look into those views for further investigation.
Dion Cho
June 9, 2009 at 7:23 am
The script heap_analyse must be refined. Spacing in trace file is platform dependent. On my Linux x86_64 the following line “select to_number(substr(column_value, 21, 9)) as csize,” return empty string and the to_number make it error. After investigation the heap size value in trace file was at 30 not 21.
I suggest using regexp_replace to extract values
” select to_number(regexp_replace(column_value,’.*sz=[[:space:]]+([[:digit:]]+)[[:space:]].*’,’\1′)) as csize”
tested it and it worked.
B. Polarski
June 9, 2009 at 12:16 pm
humm… and the rest must also be transformed:
select
to_number(regexp_replace(column_value,’.*sz=[[:space:]]+([[:digit:]]+)[[:space:]].*’,’\1′)) csize,
rtrim(regexp_replace(column_value,’.*sz=[[:space:]]+[[:digit:]]+[[:space:]]+([[:alnum:]]+)[[:space:]].*’,’\1′)) atype,
rtrim(regexp_replace(column_value,’.*[“]+(.*)[“].*’,’\1′)) ctype,
rtrim(regexp_replace(column_value,’.*[“]+.*[“](.*)’,’\1′)) ds
from table(get_trace_file2(‘&1’))
B. Polarski
June 9, 2009 at 12:53 pm
@Polarski
Thanks for the work.
That’s why I left the comment “this works only for local database”. It was just a dirty sample and you made it better!
Dion Cho
June 10, 2009 at 2:41 am
[…] 1-How to deal with memory leaks by using memory dumps. (PGA leak bug) Jonathan Lewis – PGA Leaks Dion Cho – Troubleshooting PGA leak with PGA heap dump […]
Blogroll Report 06/06/09 – 12/06/09 « Coskan’s Approach to Oracle
June 12, 2009 at 7:33 pm
Hello Don,
With impdp we get
ORA-04030: out of process memory when trying to allocate 2080 bytes (kxs-heap-c,kghsseg: qcstxsInit)
As far as we know no real memory shortage.
Could impdp use array bulk inserts and hit the same bug?
Regards Hans-Peter
Regards Hans-Peter
Hans-Peter Sloot
August 20, 2009 at 9:48 am
Hans.
1. Have you checked your OS configuration?
2. And tried analyzing PGA heap dump?
3. And searched metalink for that?
Dion Cho
August 21, 2009 at 11:52 pm
Hi Dion,
Sorry I should have added.
The ulimit where checked again and again.
The impdp shows various ORA-04030. Until now we have generated at least 4 different.
I have no direct access to the instance.
The dba’s managed to get a heap dump for (sort subheap,sort key) generated while impdp is creating indexes. The heapdump shows that the sort subheap has only 6k free whereas 64k is requested. The total pga is 358M with 6M free in various subheaps.
Metalink is not of much help.
Oracle Support isn’t either alas.
The just tell us to exclude indexes + statistics + procedure etc.
Regards Hans-Peter
Hans-Peter Sloot
August 24, 2009 at 8:32 am
@Hans.
Thansk for the adding.
Without proper data(including PGA heap dump + corresponding subheap dump) when 4030 took place, there would be not much thing to tell for me.
You might like to post the question on the OTN forum and/or freelist.
Regards.
Dion Cho
August 24, 2009 at 9:33 am
Dion,
This is really good stuff, and help me immensely this morning while troubleshooting what looked like a leak in a sessions PGA. Thanks for taking the time to post it.
Thanks,
Steve
Steve Howard
March 8, 2010 at 3:01 pm