Dion Cho – Oracle Performance Storyteller

We are natural born scientists

Underperformance of V$SQL_OPTIMIZER_ENV

with 8 comments

A couple of days ago, I was trying to search the non-default optimizer environments(aka. parameters) of a specific SQL query which has multiple child cursors. But it soon turned out that it’s almost impossible to search this simple view just because of the underperformance.

Followings are what I mean.

1. Create simple objects and gather statistics with histogram.

TPACK@ukja1106> select * from v$version;

BANNER
------------------------------------------------------------------------

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - Production

TPACK@ukja1106> create table t1(c1 int, c2 int);

Table created.

TPACK@ukja1106> 
TPACK@ukja1106> insert into t1 select level, mod(level,5) from dual
  2  connect by level <= 10000;

10000 rows created.

TPACK@ukja1106> 
TPACK@ukja1106> create index t1_n1 on t1(c1);

Index created.

TPACK@ukja1106> 
TPACK@ukja1106> exec dbms_stats.gather_table_stats(user, 't1', method_opt=>'for columns size 5 c1 size 5 c2');

PL/SQL procedure successfully completed.

2. Now I create 200 child cursors for a simple query using the notorious combination of SIMILAR cursor sharing and histogram.

TPACK@ukja1106> -- session #1
TPACK@ukja1106> alter session set cursor_sharing = similar;

Session altered.

TPACK@ukja1106> 
TPACK@ukja1106> declare
  2    v_cnt   number;
  3  begin
  4    for idx in 1 .. 200 loop
  5      execute immediate 'select count(*) from t1 where c1 = ' || idx || ' and c2 = ' || idx
  6        into v_cnt;
  7    end loop;
  8  end;
  9  /

PL/SQL procedure successfully completed.

Now I have 200 child cursors.

TPACK@ukja1106> -- version count
TPACK@ukja1106> select version_count from v$sqlarea where sql_id = '&sql_id';

VERSION_COUNT
-------------
          200

3. But the problem is that simple search on V$SQL_OPTIMIZER_ENV is extremely slow. Following simple lookup takes 17.3 sec to be accomplished.

TPACK@ukja1106> -- get sql optimizer env
TPACK@ukja1106> select count(*) from v$sql_optimizer_env
  2  where sql_id = '&sql_id' and child_number = &child_number;

  COUNT(*)
----------
        37

Elapsed: 00:00:17.29

The bigger problem is that this simple query eats up 100% of CPU and makes the whole system hang with the library cache related contention as the count of the child cursors increases. Restarting the instance was the only way to continue my test.

4. The query follows the effective index range scan on the fixed index. No more tuning available.

TPACK@ukja1106> explain plan for
  2  select count(*) from v$sql_optimizer_env
  3  where sql_id = '&sql_id' and child_number = &child_number;

-----------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                    |     1 |    47 |     0   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE          |                    |     1 |    47 |            |          |
|*  2 |   FIXED TABLE FIXED INDEX| X$KQLFSQCE (ind:2) |     1 |    47 |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("KQLFSQCE_SQLID"='ftg19cu1jf8st' AND "KQLFSQCE_CHNO"=0 AND
              "INST_ID"=USERENV('INSTANCE') AND BITAND("KQLFSQCE_FLAGS",8)=0 AND
              (BITAND("KQLFSQCE_FLAGS",4)=0 OR BITAND("KQLFSQCE_FLAGS",2)=0))

5. Querying the time model and latch activity tells nothing meaningful. It would have been an aggressive mutex activity which made my laptop hang as this is 11g enviroment.

TIME       DB time                                     17572455
TIME       sql execute elapsed time                    17562899
TIME       DB CPU                                      17500000

LATCH      session allocation                             31344
LATCH      DML lock allocation                            21187
LATCH      row cache objects                              12711
LATCH      shared pool                                     9956
LATCH      cache buffers chains                            8095

I’ve profiled the callstack and got nothing meaningful. It just shows the API call for the X$VIEW lookup.

STACK_TRACE                                                                         HIT_CNT    HIT_PCT
-------------------------------------------------------------------------------- ---------- ----------
00000000                                                                                 10        100
 ->7C80B710
  ->_OracleThreadStart@4()
   ->_opimai()
    ->_opimai_real()
     ->_sou2o()
      ->_opidrv()
       ->_opiodr()
        ->_opiino()
         ->_opitsk()
          ->_ttcpip()
           ->_opiodr()
            ->_kpoal8()
             ->_opifch2()
              ->_qergsFetch()
               ->_qerfxFetch()
                ->_kqlfsqce()
                 ->_kglic()
                  ->_kglic0()
                   ->_kgligi()
                    ->_kqlfsqcet()  <-- X$VIEW
                     ->_kghalf()
                      ->00000000
                       ->7C80B710
                        ->_OracleOradebugThreadStart@4()
                         ->_ssthreadsrgruncallback()
                          ->_ksdxcb()
                           ->_ksdxfdmp()
                            ->_ksedst_tracecb()
                             ->_ksedst1()
                              ->_skdstdst()

This problem was reproducible only when the query has many(over 200?) child cursors. And the V$[SYS|SES]_OPTIMIZER_ENV view did not have such problem.

Metalink didn’t give any bug related to V$SQL_OPTIMIZER_ENV. So I suspect that this is not a bug, but the lookup code for the view has some under-optimized part on which I have no control.

– Appended : 2010/5/20
This behavior is likely to be dependent on versions. I would share the result of more comprehensive tests on this in case something meaningful found.

– Appended : 2010/5/28
It looks like OS problem. More specifically, this unpderperformance took only in Oracle on Windows. Didn’t have a chance to dig it in more detail.

About these ads

Written by Dion Cho

May 20, 2010 at 2:21 am

Posted in Misc.

Tagged with

8 Responses

Subscribe to comments with RSS.

  1. Dion,

    1) base R1 release? That is not what one should use for test purposes
    2) it’s a good thing to provide a testing script – so that anyone could run it any time they want with a simple copy-paste in a different test-bed
    3) I wasn’t able to reproduce your results using 10.2.0.5 on Linux x86

    Timur Akhmadeev

    May 20, 2010 at 9:02 am

    • Hi, Timur.

      It was 11gR1(11.1.0.6) database, but in my local databases, 10gR2(10.2.0.1) showed similar phenomenon.

      I would do more comprehensive test on the different versions and share the result. It is very likely that different versions would show different results.

      Dion Cho

      May 20, 2010 at 9:30 am

  2. I am curious to see the execution plan (DBMS_XPLAN with ALLSTATS LAST) for the query:

    select count(*) from v$sql_optimizer_env  
    where sql_id = '&sql_id' and child_number = &child_number;
    

    Have you collected dictionary statistics? If not, that might explain the long query execution time.

    11.1.0.6 is known to generate excessive numbers of child cursors due to the effects of adaptive cursor sharing (this was mostly fixed in 11.1.0.7). I have sample code that will generate in excess of 2,000 child cursors (multiple child cursors will be displayed in V$SQL with the same CHILD_NUMBER) when 10,000 bind variable combinations are provided for a single SQL statement.

    Charles Hooper

    May 20, 2010 at 11:31 am

  3. I too could not reproduce your result. Also on 11.1.0.6. How many objects in your test database ( count(*) from dba_objects ) ? Mine has 18858. Did you gather statistics on fixed objects?

    Wolfgang Breitling

    May 21, 2010 at 4:58 pm

  4. We can’t reproduce on 11gR2 and 10gR2. BTW, Don, will you decide to publish your books in english ?

    Piqouskerberos

    May 27, 2010 at 10:00 am

    • Hi, Piqouskerberos.

      This problem seems to be reproducible only on Windows platform. Not sure why but one possibility would be some under performant implementation for the mutex(or latch) on Windows.

      By the way, I would really like to write books in English, but not in the near future. :)

      Dion Cho

      May 28, 2010 at 4:08 am

  5. Dion,

    Just out of idle curiosity, do you see the same performance issue if you query by hash_value (which is the other “indexed” column in that structure ?

    Jonathan Lewis

    May 29, 2010 at 7:19 pm

    • Jonathan,

      Querying by hash_value shows the same performance problem. I’ve been thinking of tracing system call(OS kernel API call) but on Windows system, I don’t have a good idea of tracing system call. I’m on the research on that. :)

      Dion Cho

      May 31, 2010 at 12:21 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 58 other followers

%d bloggers like this: