Dion Cho – Oracle Performance Storyteller

We are natural born scientists

Archive for the ‘Misc.’ Category

Underperformance of V$SQL_OPTIMIZER_ENV

with 8 comments

A couple of days ago, I was trying to search the non-default optimizer environments(aka. parameters) of a specific SQL query which has multiple child cursors. But it soon turned out that it’s almost impossible to search this simple view just because of the underperformance.

Followings are what I mean.

1. Create simple objects and gather statistics with histogram.

TPACK@ukja1106> select * from v$version;

BANNER
------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production

TPACK@ukja1106> create table t1(c1 int, c2 int);

Table created.

TPACK@ukja1106> 
TPACK@ukja1106> insert into t1 select level, mod(level,5) from dual
  2  connect by level <= 10000;

10000 rows created.

TPACK@ukja1106> 
TPACK@ukja1106> create index t1_n1 on t1(c1);

Index created.

TPACK@ukja1106> 
TPACK@ukja1106> exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for columns size 5 c1 size 5 c2');

PL/SQL procedure successfully completed.

2. Now I create 200 child cursors for a simple query using the notorious combination of SIMILAR cursor sharing and histogram.

TPACK@ukja1106> -- session #1
TPACK@ukja1106> alter session set cursor_sharing = similar;

Session altered.

TPACK@ukja1106> 
TPACK@ukja1106> declare
  2    v_cnt   number;
  3  begin
  4    for idx in 1 .. 200 loop
  5      execute immediate 'select count(*) from t1 where c1 = ' || idx || ' and c2 = ' || idx
  6        into v_cnt;
  7    end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Now I have 200 child cursors.

TPACK@ukja1106> -- version count
TPACK@ukja1106> select version_count from v$sqlarea where sql_id = '&sql_id';

VERSION_COUNT
-------------
          200

3. But the problem is that simple search on V$SQL_OPTIMIZER_ENV is extremely slow. Following simple lookup takes 17.3 sec to be accomplished.

TPACK@ukja1106> -- get sql optimizer env
TPACK@ukja1106> select count(*) from v$sql_optimizer_env
  2  where sql_id = '&sql_id' and child_number = &child_number;

  COUNT(*)
----------
        37

Elapsed: 00:00:17.29

The bigger problem is that this simple query eats up 100% of CPU and makes the whole system hang with the library cache related contention as the count of the child cursors increases. Restarting the instance was the only way to continue my test.

4. The query follows the effective index range scan on the fixed index. No more tuning available.

TPACK@ukja1106> explain plan for
  2  select count(*) from v$sql_optimizer_env
  3  where sql_id = '&sql_id' and child_number = &child_number;

-----------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                    |     1 |    47 |     0   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE          |                    |     1 |    47 |            |          |
|*  2 |   FIXED TABLE FIXED INDEX| X$KQLFSQCE (ind:2) |     1 |    47 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("KQLFSQCE_SQLID"='ftg19cu1jf8st' AND "KQLFSQCE_CHNO"=0 AND
              "INST_ID"=USERENV('INSTANCE') AND BITAND("KQLFSQCE_FLAGS",8)=0 AND
              (BITAND("KQLFSQCE_FLAGS",4)=0 OR BITAND("KQLFSQCE_FLAGS",2)=0))

5. Querying the time model and latch activity tells nothing meaningful. It would have been an aggressive mutex activity which made my laptop hang as this is 11g enviroment.

TIME       DB time                                     17572455
TIME       sql execute elapsed time                    17562899
TIME       DB CPU                                      17500000

LATCH      session allocation                             31344
LATCH      DML lock allocation                            21187
LATCH      row cache objects                              12711
LATCH      shared pool                                     9956
LATCH      cache buffers chains                            8095

I’ve profiled the callstack and got nothing meaningful. It just shows the API call for the X$VIEW lookup.

STACK_TRACE                                                                         HIT_CNT    HIT_PCT
-------------------------------------------------------------------------------- ---------- ----------
00000000                                                                                 10        100
 ->7C80B710
  ->_OracleThreadStart@4()
   ->_opimai()
    ->_opimai_real()
     ->_sou2o()
      ->_opidrv()
       ->_opiodr()
        ->_opiino()
         ->_opitsk()
          ->_ttcpip()
           ->_opiodr()
            ->_kpoal8()
             ->_opifch2()
              ->_qergsFetch()
               ->_qerfxFetch()
                ->_kqlfsqce()
                 ->_kglic()
                  ->_kglic0()
                   ->_kgligi()
                    ->_kqlfsqcet()  <-- X$VIEW
                     ->_kghalf()
                      ->00000000
                       ->7C80B710
                        ->_OracleOradebugThreadStart@4()
                         ->_ssthreadsrgruncallback()
                          ->_ksdxcb()
                           ->_ksdxfdmp()
                            ->_ksedst_tracecb()
                             ->_ksedst1()
                              ->_skdstdst()

This problem was reproducible only when the query has many(over 200?) child cursors. And the V$[SYS|SES]_OPTIMIZER_ENV view did not have such problem.

Metalink didn’t give any bug related to V$SQL_OPTIMIZER_ENV. So I suspect that this is not a bug, but the lookup code for the view has some under-optimized part on which I have no control.

– Appended : 2010/5/20
This behavior is likely to be dependent on versions. I would share the result of more comprehensive tests on this in case something meaningful found.

– Appended : 2010/5/28
It looks like OS problem. More specifically, this unpderperformance took only in Oracle on Windows. Didn’t have a chance to dig it in more detail.

Written by Dion Cho

May 20, 2010 at 2:21 am

Posted in Misc.

Tagged with

Follow

Get every new post delivered to your Inbox.

Join 58 other followers