Dion Cho – Oracle Performance Storyteller

We are natural born scientists

Archive for the ‘I/O’ Category

Interesting combination of RAC and serial direct path read

with 7 comments

For those who are not aware: Oracle 11g introduced serial direct path read, which enables efficient direct path reads for serial table scans.

Disabling direct path read for the serial full table scan – 11g

A couple of days ago, a weird performance bias between the two nodes of an 11g RAC was reported to me.

The query has the same execution plan on both nodes.

---------------------------------------------------------
| Id  | Operation                       | Name         | 
---------------------------------------------------------
|   0 | SELECT STATEMENT                |              | 
|   1 |  NESTED LOOPS                   |              | 
|*  2 |   TABLE ACCESS FULL             | 			   | 
|*  3 |   VIEW                          |              | 
|   4 |    UNION ALL PUSHED PREDICATE   |              | 
|*  5 |     FILTER                      |              | 
|   6 |      TABLE ACCESS BY INDEX ROWID| 			   | 
|*  7 |       INDEX RANGE SCAN          | 			   | 
|*  8 |     FILTER                      |              | 
|   9 |      TABLE ACCESS BY INDEX ROWID| 			   | 
|* 10 |       INDEX RANGE SCAN          | 			   | 
---------------------------------------------------------

But on node A it took 12 seconds to complete, while it took only 2 seconds on node B.

-- node A
call  count    cpu  elapsed    disk    query  current    rows 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
Parse    1    0.06    0.06      0      0      0      0 
Execute    1    0.00    0.00      0      0      0      0 
Fetch    2    1.31    10.48    90421    90437      0      2 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
total    4    1.37    10.55    90421    90437      0      2 
Elapsed times include waiting on following events: 
Event waited on                  Times  Max. Wait Total Waited 
----------------------------------------  Waited ---------- ------------ 
library cache lock                    1    0.00      0.00 
library cache pin                    1    0.00      0.00 
SQL*Net message to client              2    0.00      0.00 
enq: KO - fast object checkpoint          2    0.00      0.00 
reliable message                    1    0.00      0.00 
direct path read                  1579    0.00      0.01 
SQL*Net message from client              2    23.81      23.81

-- node B
call  count    cpu  elapsed    disk    query  current    rows 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
Parse    1    0.03    0.03      0      0      0      0 
Execute    1    0.00    0.00      0      0      0      0 
Fetch    2    2.01    2.01      0    90450      0      2 
------- ------ -------- ---------- ---------- ---------- ---------- ---------- 
total    4    2.04    2.05      0    90450      0      2 

Elapsed times include waiting on following events: 
Event waited on                  Times  Max. Wait Total Waited 
----------------------------------------  Waited ---------- ------------ 
SQL*Net message to client              2    0.00      0.00 
gc cr block 2-way                    1    0.00      0.00 
SQL*Net message from client              2    23.53      23.53 
library cache lock                    1    0.00      0.00 
library cache pin                    1    0.00      0.00

This performance bias was consistently reproducible: it always took 12 seconds on node A and 2 seconds on node B.

The hint was the direct path read wait event, which means that direct path reads were being performed.

  • Because this is not a parallel execution, there is a strong chance that this is the serial direct path read.
  • A nocache LOB can’t be the reason here. If a nocache LOB were the issue, both nodes should show the same performance characteristics.
  • But we should be careful when interpreting the wait count and wait time of the direct path read event. We can’t rely on these values due to the mechanism of the direct path read.

I suggested the 10949 diagnostic event to the customer as a workaround, and the problem was gone (I know that this event is undocumented and unsupported, but…).
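For reference, a minimal sketch of applying the workaround at the session level, using the same event syntax that appears in the post linked above (setting it system-wide or in the spfile is also possible, but is not shown in the original):

-- Sketch: disable the automatic serial direct path read decision for this session,
-- so the full table scan goes through the buffer cache again.
alter session set events '10949 trace name context forever, level 1';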

It seems that node A and node B had different environments (different cache sizes and/or different numbers of preloaded blocks, …) and, as a result, Oracle chose different ways of executing the FTS – direct vs. conventional.

Can this be classified as another dark side (side effect) of automation by Oracle? I’m not sure, but an unexpected bias between multiple nodes is always likely to embarrass the DBAs.


Written by Dion Cho

June 9, 2010 at 4:31 am

Posted in I/O


Serial Direct Path Read and Block Cleanout(11g)

with 2 comments

Chris Antognini posted an article about the serial direct path read and repeated block cleanout problem here. I’d just like to add my own interpretation of this phenomenon.

In summary,

  1. Oracle needs to clean out the dirty blocks to determine their status, and rolls them back if necessary.
  2. But this does not mean that Oracle modifies the dirty blocks. It just checks their status.
  3. So it does not generate redo and does not physically clean out the dirty blocks.

Following is the result of a block dump, which helps you understand exactly when Oracle physically modifies the dirty block.
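A dump like the ones below can be produced with the following command (a sketch – the file and block numbers are placeholders for the block holding the test row, and the output goes to the session trace file):

-- Placeholders: substitute the actual file# and block# of the block being examined.
alter system dump datafile &file_no block &block_no;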

-- before update
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0007.00a.000036ed  0x00c0032b.1929.08  C---    0  scn 0x07df.2427c615
0x02   0x0000.000.00000000  0x00000000.0000.00  ----    0  fsc 0x0000.00000000

-- after update
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0007.00a.000036ed  0x00c0032b.1929.08  C---    0  scn 0x07df.2427c615
0x02   0x0006.001.0000480a  0x00c010b1.17f3.02  ----    1  fsc 0x0000.00000000 (cleanout is delayed)

-- after direct path read
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0007.00a.000036ed  0x00c0032b.1929.08  C---    0  scn 0x07df.2427c615
0x02   0x0006.001.0000480a  0x00c010b1.17f3.02  ----    1  fsc 0x0000.00000000 (wasn't cleaned out)

-- after conventional path read
 Itl           Xid                  Uba         Flag  Lck        Scn/Fsc
0x01   0x0007.00a.000036ed  0x00c0032b.1929.08  C---    0  scn 0x07df.2427c615
0x02   0x0006.001.0000480a  0x00c010b1.17f3.02  C---    0  scn 0x07df.2427ffae (was cleaned out)

The statistics give another insight.
(The block count is 6,010 in my test case.)

-- after direct path read
consistent gets from cache                      6,025
consistent gets - examination                   6,011
immediate (CR) block cleanout applications      6,010
cleanout - number of ktugct calls               6,010
cleanouts only - consistent read gets           6,010
commit txn count during cleanout                6,010
redo size                                       4,112

-- after conventional read
redo size                                     438,084
session logical reads                          12,103
consistent gets                                12,076
consistent gets from cache                     12,076
redo entries                                    6,025
consistent gets - examination                   6,016
cleanout - number of ktugct calls               6,016
cleanouts only - consistent read gets           6,010
immediate (CR) block cleanout applications      6,010
commit txn count during cleanout                6,010
consistent gets from cache (fastpath)           5,668

From the statistics we can tell that Oracle internally cleans out the dirty blocks. But the serial direct path read does not load the current version of the block into the buffer cache, so it cannot modify the current block. It just reads the on-disk version of the blocks from the data file and refers to the undo to clean out the dirty blocks.

I think it is unfair to call this a problem, but under certain situations it might cause problems such as an ORA-01555 error or repeated reads of the undo.

Footnote1: The traditional term cleanout meant a modification of the dirty block to me, but I realized that I need more sophisticated terminology.

Footnote2: I believe that the parallel direct path read has the same design and behavior with regard to the cleanout mechanism.

Written by Dion Cho

July 29, 2009 at 4:15 am

Posted in I/O, Misc.


Disabling direct path read for the serial full table scan – 11g

with 11 comments

It’s good news – Oracle 11g has implemented direct path read for the serial full table scan!

But I don’t like this kind of one-way optimization. What I want is to enable/disable it whenever I want it – putting Oracle under my control.

So I did a couple of hours of research and found out how to disable it. Oracle 11g has introduced the 10949 event to control this.

UKJA@ukja116> @oerr 10949
10949                                                                           
 "Disable autotune direct path read for full table scan"                        
// *Cause:                                                                      
// *Action:  Disable autotune direct path read for serial full table scan.      
//                                                                              

Here is a quick and dirty example of how to use this event.

1. Create a table big enough to exceed the threshold, which seems to be 5 * _small_table_threshold.

UKJA@ukja116> col value new_value sth
UKJA@ukja116> @para small_table
old   9: and i.ksppinm like '%&1%'
new   9: and i.ksppinm like '%small_table%'

NAME                           VALUE                IS_DEFAUL SES_MODIFI        
------------------------------ -------------------- --------- ----------        
SYS_MODIFI                                                                      
----------                                                                      
DESCRIPTION                                                                     
--------------------------------------------------------------------------------
_small_table_threshold         637                  TRUE      true              
deferred                                                                        
threshold level of table size for direct reads                                  
                                                                                
Elapsed: 00:00:00.04
UKJA@ukja116> 
UKJA@ukja116> create table t1(c1 number, c2 char(2000), c3 char(2000), c4 char(2000));

Table created.

Elapsed: 00:00:00.21
UKJA@ukja116> insert into t1
  2  select level, 'x', 'x', 'x'
  3  from dual connect by level <= 10 + 5*&sth;
old   3: from dual connect by level <= 10 + 5*&sth
new   3: from dual connect by level <= 10 + 5*637

3195 rows created.

Elapsed: 00:00:01.14
UKJA@ukja116> commit;

Commit complete.

Elapsed: 00:00:00.00
UKJA@ukja116> @gather t1
UKJA@ukja116> exec dbms_stats.gather_table_stats(user, '&1', no_invalidate=>false);

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.68

2. Now I compare the wait events (direct path read vs. db file scattered read) of two queries – one with 10949 disabled (the default) and the other with 10949 enabled.

-- case#1
alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
select count(*) from t1;

-- case#2
alter system flush buffer_cache;
alter session set events '10046 trace name context forever, level 8';
alter session set events '10949 trace name context forever, level 1';
select count(*) from t1;

3. Do you see that the one with 10949 enabled behaves just like the previous version? The famous db file scattered read event!

SQL ID : 5bc0v4my7dvr5
select count(*) 
from
 t1

-- Case #1
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file sequential read                         3        0.02          0.04
  db file scattered read                          1        0.02          0.02
  direct path read                              231        0.29          1.67
  SQL*Net message from client                     2        0.03          0.03


-- Case #2
Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       2        0.00          0.00
  db file scattered read                        213        0.26          1.77
  SQL*Net message from client                     2        0.00          0.00

Don’t forget the 10949 event when you want to disable this great feature for any reason!

Footnote1: What you should keep in mind is that this is not a CBO feature but a runtime execution engine feature. It always shows the same execution plan – TABLE ACCESS FULL. When the execution engine runs this operation, it decides how to read the table (direct path read or conventional path read) considering various factors.

Footnote2: There is the “_serial_direct_read” parameter, which forces the full scan on a big table to be a direct read – the opposite of the 10949 event.
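A minimal sketch of using it (the double quotes are required because it is a hidden parameter; treat this as a sketch rather than a recommendation, and note that the accepted values may vary by version):

-- Force serial full scans in this session to use direct path reads.
alter session set "_serial_direct_read" = true;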

Written by Dion Cho

July 21, 2009 at 3:38 am

Strong doubt on small table(_small_table_threshold)

with 5 comments

It is well known that a table is called “small” when its size is smaller than 2% of the buffer cache. That size is calculated at startup into the _small_table_threshold parameter. Some links about this:

  1. http://jonathanlewis.wordpress.com/2007/01/29/small-partitions/
  2. http://www.ixora.com.au/q+a/cache.htm

For instance, in my local database, whose db_cache_size is 200m, _db_block_buffers is 24950. Hence _small_table_threshold should be 24950 * 2% = 499.

SYS@ukja10> @para _db_cache_size

NAME                           VALUE
------------------------------ --------------------
__db_cache_size                209715200

SYS@ukja10> @para _db_block_buffers

NAME                           VALUE                
------------------------------ --------------------
_db_block_buffers              24950

SYS@ukja10> @para small_table_threshold

NAME                           VALUE                
------------------------------ --------------------
_small_table_threshold         499
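The @para script itself is not listed in the post; judging from its output (and the "i.ksppinm like '%&1%'" predicate echoed in the previous post), a rough equivalent run as SYS against the fixed tables would be:

-- List (hidden) parameters whose name matches the first argument, e.g. @para small_table.
select i.ksppinm  as name,
       v.ksppstvl as value,
       v.ksppstdf as is_default,
       i.ksppdesc as description
from   x$ksppi i, x$ksppcv v
where  i.indx = v.indx
and    i.ksppinm like '%&1%';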





Why is this 2% threshold so important? Because it affects the efficiency of the LRU algorithm of the buffer cache.

  • A table smaller than 2% of the buffer cache is loaded to the mid-point of the LRU list when read into the buffer cache.
  • A table bigger than 2% of the buffer cache is loaded to the cold end of the LRU list when read into the buffer cache.

This simple rule enables Oracle to prevent the buffer cache from being flooded by a full scan on a big table. But the negative effect is that a full table scan on a big table is always physical (once the buffer cache has filled up), no matter how often the table is read.

Okay, so far so good.

A couple of days ago, I was asked exactly the same question by my customer.

What is the threshold of being small table?

I was just about to say that the threshold is 2% of the buffer cache. But wait… I had never proved it myself. So I tried to put together a simple test case to demonstrate the 2% rule.

But, to my surprise, the famous 2% rule never worked! Even more surprising, in my local 10gR2 database the threshold seems to be around 25% of the buffer cache.

Here goes the test case.

UKJA@ukja102> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

-- When the db_cache_size = 100m (without ASMM)
__db_cache_size                104857600
_db_block_buffers              12475
_small_table_threshold         249

-- When the db_cache_size = 200m (without ASMM)
__db_cache_size                209715200
_db_block_buffers              24950
_small_table_threshold         499


-- a table a bit bigger than 25% of db_cache_size
UKJA@ukja102> create table t1(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;

Table created.

UKJA@ukja102> insert /*+ append */ into t1
  2  select 'x', 'x', 'x'
  3  from dual
  4  connect by level <= 1*3400    -- when db_cache_size = 100m
  -- connect by level <= 2*3400    -- when db_cache_size = 200m 
  5  ;

3400 rows created.

-- a table a bit smaller than 25% of db_cache_size
UKJA@ukja102> create table t2(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;

Table created.

UKJA@ukja102> insert /*+ append */ into t2
  2  select 'x', 'x', 'x'
  3  from dual
  4  connect by level <= 1*3100    -- when db_cache_size = 100m 
--  connect by level <= 2*3100    -- when db_cache_size = 200m 
  5  ;

3100 rows created.

-- big table (a bit bigger than db_cache_size)
UKJA@ukja102> create table big_table(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;

Table created.

UKJA@ukja102> insert /*+ append */ into big_table
  2  select 'x', 'x', 'x'
  3  from dual
  4  connect by level <= 1*13000    -- when db_cache_size = 100m
-- connect by level <= 2*13000    -- when db_cache_size = 200m
  5  ;

13000 rows created.


-- Read table big_table after flushing buffer cache.
-- This would fill out the buffer cache.

UKJA@ukja102> alter system flush buffer_cache;

System altered.

UKJA@ukja102> -- read big_table
UKJA@ukja102> select count(*) from big_table;

  COUNT(*)                                                                      
----------                                                                      
     13000         


-- Read table t1 (a bit bigger than 25% of db_cache_size). 
-- See that physical reads are almost equal to logical reads?
-- This means that table t1 is always loaded to the cold end of the LRU list.

UKJA@ukja102> select count(*) from t1;

  COUNT(*)                                                                      
----------                                                                      
      3400                                                                      

UKJA@ukja102> select count(*) from t1;

  COUNT(*)                                                                      
----------                                                                      
      3400                                                                      

UKJA@ukja102> set autot on
UKJA@ukja102> select count(*) from t1;

  COUNT(*)                                                                      
----------                                                                      
      3400                                                                      

Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
       3407  consistent gets                                                    
       3376  physical reads                                                     
          0  redo size                                                          
        411  bytes sent via SQL*Net to client                                   
        400  bytes received via SQL*Net from client                             
          2  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
          1  rows processed                                                     

UKJA@ukja102> set autot off


-- Read table t2 (a bit smaller than 25% of db_cache_size). 
-- See that the physical reads are almost gone?
-- This means that table t2 is loaded to the mid-point of the LRU list.

UKJA@ukja102> select count(*) from t2;

  COUNT(*)                                                                      
----------                                                                      
      3100                                                                      

UKJA@ukja102> select count(*) from t2;

  COUNT(*)                                                                      
----------                                                                      
      3100        

UKJA@ukja102> set autot on
UKJA@ukja102> select count(*) from t2;

  COUNT(*)                                                                      
----------                                                                      
      3100                                                                      

Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
       3106  consistent gets                                                    
         32  physical reads                                                     
          0  redo size                                                          
        411  bytes sent via SQL*Net to client                                   
        400  bytes received via SQL*Net from client                             
          2  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
          1  rows processed                                                     

UKJA@ukja102> set autot off





The small table threshold is just around 25% of db_cache_size with both the 100m and the 200m cache sizes!

This result gave me strong doubts about the 2% rule.

  • Oracle would have an unknown internal threshold. This threshold would vary by version and/or db_cache_size. The memory management algorithm would be another factor.
  • The 2% rule is poorly documented and has never been fully verified.

Well, from now on, I will not trust the 2% stuff. But I strongly hope someone has a better answer and a better test case than mine.




Written by Dion Cho

April 22, 2009 at 5:37 am

Posted in I/O


Simple and stupid test on consistent read

with 6 comments

One of the most beautiful and important features of the Oracle database is the consistent read. Yes, that is what makes Oracle, Oracle.

But I always see many misconceptions and much confusion about the consistent read among engineers.

So I made a very simple and stupid demonstration to show how Oracle’s consistent read works.

1. Create objects.

-- Create simple table whose one row occupies one block
UKJA@ukja102> create table 
    t1(c1 int, c2 char(2000), c3 char(2000), c4 char(2000));

Table created.

-- Insert one row
UKJA@ukja102> insert into t1 values(1, 'x', 'x', 'x');

1 row created.

UKJA@ukja102> commit;

Commit complete.




2. Get the block address and browse the X$BH view for that block (bh.sql is here).
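The bh.sql script is only linked in the original post; a rough sketch of an equivalent query (the decode of the STATE column and the join to DBA_OBJECTS are my assumptions, inferred from the output below) might look like this:

-- Show every buffer copy (current and CR) of a given file#/block# in the cache.
select bh.dbarfil, bh.dbablk, bh.class,
       decode(bh.state, 0, 'free', 1, 'xcur', 2, 'scur', 3, 'cr',
                        4, 'read', 5, 'mrec', 6, 'irec', to_char(bh.state)) as state,
       bh.cr_scn_bas, bh.cr_scn_wrp,
       bh.cr_uba_fil, bh.cr_uba_blk, bh.cr_uba_seq,
       o.object_name
from   x$bh bh, dba_objects o
where  bh.obj     = o.data_object_id
and    bh.dbarfil = &1
and    bh.dbablk  = &2;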

UKJA@ukja102> col f# new_value file_no
UKJA@ukja102> col b# new_value blk_no
UKJA@ukja102>
UKJA@ukja102> select dbms_rowid.rowid_relative_fno(rowid) as f#,
  2        dbms_rowid.rowid_block_number(rowid) as b#
  3  from t1
  4  ;

        F#         B#
---------- ----------
        10      35470

-- We have one CURRENT block (xcur). There is always exactly one CURRENT block!
UKJA@ukja102> @bh &file_no &blk_no
DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1




3. Now, let’s see how consecutive DML statements generate CR blocks.

UKJA@ukja102> -- update and monitor cr block
UKJA@ukja102> -- 1st update
UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> @bh &file_no &blk_no
DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1

        10      35470          1 cr          599680015       2015          0
         0          0 T1


UKJA@ukja102> -- 2nd update
UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> @bh &file_no &blk_no
old  14:   dbarfil = &1 and
new  14:   dbarfil = 10 and
old  15:   dbablk = &2
new  15:   dbablk = 35470

   DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1

        10      35470          1 cr          599680017       2015          0
         0          0 T1

        10      35470          1 cr          599680015       2015          0
         0          0 T1


UKJA@ukja102> -- 3rd update
UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> @bh &file_no &blk_no
   DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1

        10      35470          1 cr          599680019       2015          0
         0          0 T1

        10      35470          1 cr          599680017       2015          0
         0          0 T1

        10      35470          1 cr          599680015       2015          0
         0          0 T1


UKJA@ukja102> -- 4th update
UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> @bh &file_no &blk_no
   DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1

        10      35470          1 cr          599680021       2015          0
         0          0 T1

        10      35470          1 cr          599680019       2015          0
         0          0 T1

        10      35470          1 cr          599680017       2015          0
         0          0 T1

        10      35470          1 cr          599680015       2015          0
         0          0 T1

UKJA@ukja102> -- 5th update
UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> @bh &file_no &blk_no
   DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1

        10      35470          1 cr          599680023       2015          0
         0          0 T1

        10      35470          1 cr          599680021       2015          0
         0          0 T1

        10      35470          1 cr          599680019       2015          0
         0          0 T1

        10      35470          1 cr          599680017       2015          0
         0          0 T1

        10      35470          1 cr          599680015       2015          0
         0          0 T1


6 rows selected.

UKJA@ukja102> -- 6th update
UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> @bh &file_no &blk_no
   DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1

        10      35470          1 cr          599680025       2015          0
         0          0 T1

        10      35470          1 cr          599680023       2015          0
         0          0 T1

        10      35470          1 cr          599680021       2015          0
         0          0 T1

        10      35470          1 cr          599680019       2015          0
         0          0 T1

        10      35470          1 cr          599680017       2015          0
         0          0 T1


6 rows selected.

UKJA@ukja102> commit;

Commit complete.



Oracle seems to generate CR blocks in a circular way. The maximum number of CR blocks per CURRENT block is controlled by the hidden parameter shown below. Its default value is 6, which seems to mean 5 CR blocks + 1 CURRENT block.

UKJA@ukja102> @para _db_block_max_cr_dba
NAME                           VALUE                SES_MODIFI SYS_MODIFI
------------------------------ -------------------- ---------- ----------
DESCRIPTION
--------------------------------------------------------------------------------
_db_block_max_cr_dba           6                    false      false
Maximum Allowed Number of CR buffers per dba



Oracle needs to strike a balance here. More CR blocks would enable logical reads without rollback, but would waste the buffer cache.


4. We need only 1 logical read when no rollback is required.

UKJA@ukja102> col rid new_value v_rid
UKJA@ukja102> select rowid as rid from t1;

RID
------------------
AAAUa4AAKAAAIqOAAA

UKJA@ukja102> select * from t1 where rowid = '&v_rid';

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          1  consistent gets
          0  physical reads
          0  redo size
       6591  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
          
UKJA@ukja102> @bh &file_no &blk_no
   DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 xcur                0          0          0
         0          0 T1





5. But we need 2 logical reads when a rollback is required. This is the overhead of Oracle’s consistent read implementation.

UKJA@ukja102> -- Session #1: make DML statements on the block.
-- We need at least 6 consecutive DMLs to make a full rotation of the CR blocks.
-- This forces Oracle to read an undo block to build the CR block for the select query of session #2.

UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> update t1 set c2 = 'y', c3 = 'y', c4 = 'y' where c1 = 1;

1 row updated.

UKJA@ukja102> commit;

Commit complete.

UKJA@ukja102> -- Session #2: this query should be started before the updates of session #1
-- and should not end until the updates complete.

UKJA@ukja102> select * from t1 where rowid = '&v_rid';

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
       6591  bytes sent via SQL*Net to client
        400  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed


UKJA@ukja102> -- Which undo block is being read?
-- Note that Oracle reads undo block (file#=2, block#=2966) to roll the block back successfully.

UKJA@ukja102> @bh &file_no &blk_no
   DBARFIL     DBABLK      CLASS STATE      CR_SCN_BAS CR_SCN_WRP CR_UBA_FIL
---------- ---------- ---------- ---------- ---------- ---------- ----------
CR_UBA_BLK CR_UBA_SEQ OBJECT_NAME
---------- ---------- --------------------
        10      35470          1 cr          599680154       2015          2
      2966       6468 T1

        10      35470          1 xcur                0          0          0
         0          0 T1

        10      35470          1 cr          599680163       2015          0
         0          0 T1

        10      35470          1 cr          599680161       2015          0
         0          0 T1

        10      35470          1 cr          599680159       2015          0
         0          0 T1

        10      35470          1 cr          599680157       2015          0
         0          0 T1


6 rows selected.



The overhead of rollback in the consistent read is negligible most of the time, but under specific situations we can get an unbelievable number of logical reads on undo blocks and/or an ORA-01555 error.



6. The 10200 event (consistent read buffer status) shows other interesting (but not that useful) information. When rollback is triggered by a consistent read, Oracle seems to print all the related CR block information.
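For reference, the trace below can be produced by setting the event in the session that performs the consistent read (a sketch; the level value here is an assumption, and the output goes to the session trace file):

alter session set events '10200 trace name context forever, level 1';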

-- without rollback
Consistent read started for block 7 : 02808a8e
  env: (scn: 0x07df.23be65f7  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000 0sch: scn: 0x0000.00000000)
Consistent read finished for block 7 : 2808a8e
Consistent read finished for block 7 : 2808a8e

-- with rollback
Consistent read started for block 7 : 02808a8e
  env: (scn: 0x07df.23be659f  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  statement num=0  parent xid: xid: 0x0000.000.00000000  scn: 0x0000.00000000 0sch: scn: 0x0000.00000000)
CR exa ret 9 on:  03C44148  scn: 0xffff.ffffffff  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  scn: 0xffff.ffffffff  sfl: 0
CR exa ret 9 on:  10BE8D20  scn: 0x07df.23be65ab  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  scn: 0x07df.23be65ab  sfl: 0
CR exa ret 9 on:  107EFA70  scn: 0x07df.23be65a9  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  scn: 0x07df.23be65a9  sfl: 0
CR exa ret 9 on:  107EE3C0  scn: 0x07df.23be65a7  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  scn: 0x07df.23be65a7  sfl: 0
CR exa ret 9 on:  107F3D20  scn: 0x07df.23be65a5  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  scn: 0x07df.23be65a5  sfl: 0
CR exa ret 9 on:  107F8810  scn: 0x07df.23be65a3  xid: 0x0000.000.00000000  uba: 0x00000000.0000.00  scn: 0x07df.23be65a3  sfl: 0
Consistent read finished for block 7 : 2808a8e
Consistent read finished for block 7 : 2808a8e





Well, I meant this to be a simple test case (which it actually is), but the result turned out to be too lengthy a post. :(

As mentioned above, I just wanted to draw a simplified picture of how Oracle’s consistent read works. This basic demonstration should help you understand the pros and cons of the consistent read in Oracle.

Written by Dion Cho

April 18, 2009 at 3:20 pm

Buffer Pinning

with 23 comments

One of my colleagues asked me about the meaning of the buffer is pinned count statistic and buffer pinning.

The basic concept of buffer pinning:

In the case of consecutive accesses to the same buffer within the same fetch call, Oracle pins the buffer so that it can revisit the buffer without acquiring the cache buffers chains latch and searching the cache buffers chain. The pin lasts for the duration of the fetch call, so the overhead is reduced without introducing pin contention.

Basic but hard to understand. :(

So I made a very simple test case to demonstrate what buffer pinning is and how to interpret it. Here it goes.

UKJA@ukja102> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

UKJA@ukja102> create table t1(c1 int, c2 int, c3 char(100));

Table created.

UKJA@ukja102>
UKJA@ukja102> insert into t1
  2  select level, dbms_random.random, 'x'
  3  from dual connect by level <= 10000;

10000 rows created.

UKJA@ukja102>
UKJA@ukja102> create index t1_n1 on t1(c1);

Index created.

UKJA@ukja102> create index t1_n2 on t1(c2);

Index created.

UKJA@ukja102> exec dbms_stats.gather_table_stats(user, 't1', no_invalidate=>false);

PL/SQL procedure successfully completed.





Note that indexes T1_N1 and T1_N2 have almost the same data distribution, but very different clustering factors.

UKJA@ukja102> select index_name, clustering_factor
  2  from user_indexes where table_name = 'T1';

INDEX_NAME           CLUSTERING_FACTOR                                          
-------------------- -----------------                                          
T1_N1                              164                                          
T1_N2                             9933                                          





The comparison using SQL trace shows that index T1_N1 (good clustering factor of 164) needs only 186 logical reads, while index T1_N2 (bad clustering factor of 9933) needs 9,960 logical reads for the same volume of data!

select /*+ index(t1 t1(c1)) */ count(c3) 
from
 t1 where c1 is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.11         69        186          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.11         69        186          0           1


select /*+ index(t1 t1(c2)) */ count(c3) 
from
 t1 where c2 is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.07       0.05          0      9960          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.07       0.05          0       9960          0           1




Big difference! Where does this number come from?

A session statistics diff makes me realize that this number comes from the difference in the buffer is pinned count statistic.
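The session-level diffs below can be captured with a simple before/after snapshot of V$MYSTAT; a rough sketch (the helper table STAT_SNAP is hypothetical and not part of the original post):

-- Before the test query: snapshot the session statistics.
create table stat_snap as
select sn.name, ms.value
from   v$mystat ms, v$statname sn
where  ms.statistic# = sn.statistic#;

-- ... run the test query here, then report the deltas:
select sn.name, ms.value - ss.value as diff
from   v$mystat ms, v$statname sn, stat_snap ss
where  ms.statistic# = sn.statistic#
and    sn.name       = ss.name
and    ms.value      > ss.value
order  by diff desc;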

-- index T1_N1(good clustering factor)
NAME                                             DIFF
---------------------------------------- ------------
buffer is pinned count                         19,835
table fetch by rowid                           10,000
session logical reads                             219
consistent gets from cache                        199
consistent gets                                   199
buffer is not pinned count                        165

-- index T1_N2(bad clustering factor)
NAME                                             DIFF
---------------------------------------- ------------
buffer is pinned count                         10,066
table fetch by rowid                           10,000
session logical reads                           9,998
consistent gets from cache                      9,975
consistent gets                                 9,975
no work - consistent read gets                  9,958





A good clustering factor lets us visit the same table block consecutively. The result is frequent buffer pinning and fewer logical reads.

The same effect shows up with the cache buffers chains latch.

-- index T1_N1(good clustering factor)
LATCH_NAME                         D_GETS
------------------------------ ----------
cache buffers chains                  492

-- index T1_N2(bad clustering factor)
LATCH_NAME                         D_GETS
------------------------------ ----------
cache buffers chains                20108




The main use of buffer pinning is in table access by rowid via an index and in nested loop joins.

One thing to be cautious about: we do not get a 100x performance gain from reducing logical reads to 1/100. Buffer pinning lets us read the buffer without the overhead of the latch, but we still need to read the buffer!

As far as I know, we have no direct control over buffer pinning. It’s an intrinsic mechanism of Oracle. But you can still make use of it for index scans and nested loop joins. What matters is how the corresponding data is sorted.

Written by Dion Cho

April 8, 2009 at 12:56 am

Loading index to keep buffer pool

with one comment

The keep buffer pool is designed to keep objects that are frequently accessed through multi-block I/O.

But I’ve seen several cases where people want to load an index (not a table) into the keep buffer pool – to alleviate the random-access I/O on the index.

Some people are skeptical about this. When the index is accessed through single-block I/O, the visited blocks are loaded to the mid-point of the LRU list. When more sessions visit them, they get promoted to the hot area. Yes, theoretically, there would be no reason to keep a frequently accessed index – a frequently accessed index would survive anyway!

But in real life, I’ve seen many people who want to guarantee that the index never gets flushed out of the SGA (but who can guarantee that?). And in one special case, I had a customer who wanted to preload the entire index into the keep buffer pool just after the database starts up.

The best way is to read the index with a fast full scan from an after-startup trigger, like this:

create index t1_n1 on t1(c1) storage (buffer_pool keep);
...

create or replace trigger trg_startup after startup on database
begin
  -- The fast full scan touches every index block, loading it into the keep pool.
  for idx in (select /*+ index_ffs(t1 t1_n1) */ count(c1) from t1 where c1 is not null) loop
    null;
  end loop;
end;
/
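Once the trigger has fired, a rough way to verify both the pool assignment and how much of the index actually sits in the cache (a sketch, not part of the original post) is:

-- Which buffer pool is the index segment assigned to?
select segment_name, buffer_pool
from   user_segments
where  segment_name = 'T1_N1';

-- How many of its blocks are currently cached?
select count(*) as cached_blocks
from   v$bh b, user_objects o
where  o.object_name = 'T1_N1'
and    b.objd        = o.data_object_id;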

Written by Dion Cho

February 10, 2009 at 9:02 am

Posted in I/O, Index
