Dion Cho – Oracle Performance Storyteller

We are natural born scientists

Strong doubt on small table (_small_table_threshold)

with 5 comments

It is well known that a table is called “small” when its size is smaller than 2% of the buffer cache. That threshold is calculated at startup and stored in the _small_table_threshold parameter. Some links about this:

  1. http://jonathanlewis.wordpress.com/2007/01/29/small-partitions/
  2. http://www.ixora.com.au/q+a/cache.htm

For instance, in my local database whose db_cache_size is 200m, _db_block_buffers is 24950. Hence _small_table_threshold should be 24950 * 2% = 499.

SYS@ukja10> @para _db_cache_size

NAME                           VALUE
------------------------------ --------------------
__db_cache_size                209715200

SYS@ukja10> @para _db_block_buffers

NAME                           VALUE                
------------------------------ --------------------
_db_block_buffers              24950

SYS@ukja10> @para small_table_threshold

NAME                           VALUE                
------------------------------ --------------------
_small_table_threshold         499
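The presumed formula can be sketched in a few lines. This is an assumption inferred from the observed values, not a documented formula:

```python
# Presumed derivation of _small_table_threshold: 2% of _db_block_buffers,
# truncated to an integer. An assumption inferred from the observed values
# above, not a documented formula.
def small_table_threshold(db_block_buffers: int) -> int:
    return int(db_block_buffers * 0.02)

print(small_table_threshold(24950))  # 499, matching the parameter above
print(small_table_threshold(12475))  # 249, the value for a 100m cache
```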





Why is this 2% threshold so important? Because it affects the efficiency of the LRU algorithm of the buffer cache.

  • A table smaller than 2% of the buffer cache is placed at the midpoint of the LRU list when loaded into the buffer cache.
  • A table larger than 2% of the buffer cache is placed at the cold end of the LRU list when loaded into the buffer cache.

This simple rule prevents the buffer cache from being flooded by a full scan on a big table. The negative effect is that a full table scan on a big table is always physical (once the buffer cache has filled up), no matter how often the table is read.
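The mid-point/cold-end placement described above can be sketched as a toy model. This is a deliberate simplification for illustration, not Oracle's actual algorithm (the real buffer cache uses touch counts and hot/cold sub-lists, not a plain list):

```python
# Toy model of the mid-point insertion rule. A simplification for
# illustration only: the real buffer cache uses touch counts and hot/cold
# sub-lists, not a plain Python list.
class ToyLRU:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffers = []  # index 0 = hot end, last index = cold end

    def scan(self, blocks, small_table):
        for b in blocks:
            if b in self.buffers:
                continue
            if len(self.buffers) >= self.capacity:
                self.buffers.pop()  # evict from the cold end
            if small_table:
                # small table: insert at the mid-point of the list
                self.buffers.insert(len(self.buffers) // 2, b)
            else:
                # big table: insert at the cold end, first in line for eviction
                self.buffers.append(b)


cache = ToyLRU(capacity=8)
cache.scan([f"hot{i}" for i in range(8)], small_table=True)  # warm the cache
cache.scan(["big1", "big2"], small_table=False)              # big-table scan
# The big-table scan recycles its own buffer at the cold end instead of
# pushing the warm blocks out of the cache.
print(cache.buffers)
```

Note how the big-table scan evicts its own previous block rather than the warm blocks; that is exactly why a repeated full scan of a big table stays physical.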

Okay, so far so good.

A couple of days ago, I was asked exactly this question by a customer.

What is the threshold for being a small table?

I was just about to say that the threshold is 2% of the buffer cache. But wait… I had never proved it myself. So I made a simple test case to demonstrate the 2% rule.

But, to my surprise, the famous 2% rule never worked! Even more surprising, in my local 10gR2 database the threshold seems to be around 25% of the buffer cache.

Here goes the test case.

UKJA@ukja102> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

-- When the db_cache_size = 100m (without ASMM)
__db_cache_size                104857600
_db_block_buffers              12475
_small_table_threshold         249

-- When the db_cache_size = 200m (without ASMM)
__db_cache_size                209715200
_db_block_buffers              24950
_small_table_threshold         499


-- a slightly big table (a bit bigger than 25% of db_cache_size)
UKJA@ukja102> create table t1(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;

Table created.

UKJA@ukja102> insert /*+ append */ into t1
  2  select 'x', 'x', 'x'
  3  from dual
  4  connect by level <= 1*3400    -- when db_cache_size = 100m
  -- connect by level <= 2*3400    -- when db_cache_size = 200m 
  5  ;

3400 rows created.

-- a slightly small table (a bit smaller than 25% of db_cache_size)
UKJA@ukja102> create table t2(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;

Table created.

UKJA@ukja102> insert /*+ append */ into t2
  2  select 'x', 'x', 'x'
  3  from dual
  4  connect by level <= 1*3100    -- when db_cache_size = 100m 
--  connect by level <= 2*3100    -- when db_cache_size = 200m 
  5  ;

3100 rows created.

-- big table (a bit bigger than db_cache_size)
UKJA@ukja102> create table big_table(c1 char(2000), c2 char(2000), c3 char(2000)) nologging;

Table created.

UKJA@ukja102> insert /*+ append */ into big_table
  2  select 'x', 'x', 'x'
  3  from dual
  4  connect by level <= 1*13000    -- when db_cache_size = 100m
-- connect by level <= 2*13000    -- when db_cache_size = 200m
  5  ;

13000 rows created.


-- Read table big_table after flushing the buffer cache.
-- This will fill up the buffer cache.

UKJA@ukja102> alter system flush buffer_cache;

System altered.

UKJA@ukja102> -- read big_table
UKJA@ukja102> select count(*) from big_table;

  COUNT(*)                                                                      
----------                                                                      
     13000         


-- Read table t1 (a bit bigger than 25% of db_cache_size).
-- See that physical reads are almost equal to logical reads?
-- This means that table t1 is always placed at the cold end of the LRU list.

UKJA@ukja102> select count(*) from t1;

  COUNT(*)                                                                      
----------                                                                      
      3400                                                                      

UKJA@ukja102> select count(*) from t1;

  COUNT(*)                                                                      
----------                                                                      
      3400                                                                      

UKJA@ukja102> set autot on
UKJA@ukja102> select count(*) from t1;

  COUNT(*)                                                                      
----------                                                                      
      3400                                                                      

Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
       3407  consistent gets                                                    
       3376  physical reads                                                     
          0  redo size                                                          
        411  bytes sent via SQL*Net to client                                   
        400  bytes received via SQL*Net from client                             
          2  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
          1  rows processed                                                     

UKJA@ukja102> set autot off


-- Read table t2 (a bit smaller than 25% of db_cache_size).
-- See that the physical reads are almost gone?
-- This means that table t2 is always placed at the midpoint of the LRU list.

UKJA@ukja102> select count(*) from t2;

  COUNT(*)                                                                      
----------                                                                      
      3100                                                                      

UKJA@ukja102> select count(*) from t2;

  COUNT(*)                                                                      
----------                                                                      
      3100        

UKJA@ukja102> set autot on
UKJA@ukja102> select count(*) from t2;

  COUNT(*)                                                                      
----------                                                                      
      3100                                                                      

Statistics
----------------------------------------------------------                      
          0  recursive calls                                                    
          0  db block gets                                                      
       3106  consistent gets                                                    
         32  physical reads                                                     
          0  redo size                                                          
        411  bytes sent via SQL*Net to client                                   
        400  bytes received via SQL*Net from client                             
          2  SQL*Net roundtrips to/from client                                  
          0  sorts (memory)                                                     
          0  sorts (disk)                                                       
          1  rows processed                                                     

UKJA@ukja102> set autot off





The small table threshold is around 25% of db_cache_size with both the 100m and 200m cache sizes!

This result gave me strong doubt about the 2% rule.

  • Oracle may have an unknown internal threshold, which may vary by version and/or db_cache_size. The memory management algorithm may be another factor.
  • The 2% rule is poorly documented and has never been fully verified.
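For what it's worth, the rough arithmetic behind the ~25% observation, assuming roughly one row per 8K block for these three-char(2000)-column tables (the consistent-gets figures in the autotrace output support this assumption):

```python
# Rough block-count arithmetic for the 100m case. Assumption: each row holds
# three char(2000) columns, so roughly one row fits per 8K block (supported
# by the consistent-gets figures in the autotrace output above).
db_block_buffers = 12475  # buffer count for the 100m cache
t1_blocks = 3400          # "a bit bigger" table, ~1 block per row
t2_blocks = 3100          # "a bit smaller" table

print(f"t1: {t1_blocks / db_block_buffers:.1%} of the cache")  # ~27.3%
print(f"t2: {t2_blocks / db_block_buffers:.1%} of the cache")  # ~24.8%
```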

Well, from now on, I will not trust the 2% rule. But I strongly hope someone has a better answer and test case than mine.





Written by Dion Cho

April 22, 2009 at 5:37 am

Posted in I/O


5 Responses


  1. Dion,

    There are a lot of little tests around “_small_table_threshold” that I still have to do. But one point to consider is that if the buffer cache is completely free when you start testing then the tablescan is allowed to fill (possibly all) the free blocks and therefore ignore the small table threshold. After all, the main purpose of the threshold is to make sure that a large tablescan doesn’t push other (more useful) blocks out of the buffer cache – so if there’s nothing in the buffer cache there’s nothing to lose by keeping the table blocks.

    Other thoughts (that I keep meaning to check):
    recycle and keep caches, non-standard block sizes, automatic memory management: if the threshold is 2% is this 2% of the total memory allocated to cache, or is it 2% of the 4KB cache size if you’re reading a 4KB block table, and 2% of the recycle cache if you’re reading a table allocated to recycle; and does the threshold change dynamically as the cache size changes dynamically.

    Moreover, at some point in the recent past this parameter started to affect the behaviour of parallel table scans (and parallel index fast full scans) and direct reads. Again, I haven’t checked all the details, but a parallel read may not go direct if the table is less than the small table threshold … and a serial read may go direct if the table is larger than the threshold etc…

    Jonathan Lewis

    April 22, 2009 at 10:15 am

  2. @Jonathan.

    Thanks for visiting my blog.

    1. Yes, loading a big table into a free buffer cache is no problem. I took this into consideration in my test case.

    2. The _small_table_threshold value itself is calculated as (2% * _db_block_buffers), which means it is 2% of the standard-size default pool buffers.

    3. The description of _small_table_threshold is “threshold level of table size for direct reads”. Literally, it means that Oracle would not do direct reads for tables whose size is smaller than this value. I will verify this and share it.

    Dion Cho

    April 22, 2009 at 12:40 pm

  3. Dion,

    My questions were rhetorical – so I wasn’t expecting you to answer any of them, but your comment in para. 2 doesn’t match some tests I did a couple of years ago. I have the following note in one of my test scripts for short tables:


    rem 10.2.0.3
    rem May be 2% of buffer "count" - not size.
    rem _small_table_threshold was 379 when 8K and 2K caches
    rem were 32M - giving 3992 and 14960 buffers respectively.
    rem 0.02 * (3992 + 14960) = 379.

    In passing, the definition of the parameter in 8.1.7.4 is: “level of table size for forget-bit enabled during scan”. I can’t remember what it was before that, and I don’t remember exactly when it changed to “level of table size for direct reads”, but I think it was some time in 9.2.

    Regards
    Jonathan Lewis

    Jonathan Lewis

    April 22, 2009 at 2:43 pm

  4. @Jonathan.

    My assumption that _small_table_threshold is (2% * _db_block_buffers) was just based on my simple test case. Your note shows that the calculation of _small_table_threshold is more complex than that.

    But I still doubt that _small_table_threshold is used to control how full-scanned blocks are loaded onto the LRU list.

    As you pointed out, Oracle might have changed the usage of this parameter to control direct reads.

    I’ve made a simple test case to verify this.

    UKJA@ukja102> col value new_value sth
    UKJA@ukja102> @para _small_table_threshold
    NAME                           VALUE
    ------------------------------ --------------------
    _small_table_threshold         499                  
    
    UKJA@ukja102> -- Table t1 is a bit bigger than _small_table_threshold
    UKJA@ukja102> create table t1(c1 char(2000), c2 char(2000), c3 char(2000));
    
    Table created.
    
    UKJA@ukja102> insert into t1 select 'x', 'x', 'x' from dual
      2  connect by level <= &sth + 50
      3  ;
    
    549 rows created.
    
    UKJA@ukja102> -- Table t2 is a bit smaller than _small_table_threshold
    UKJA@ukja102> create table t2(c1 char(2000), c2 char(2000), c3 char(2000));
    
    Table created.
    
    UKJA@ukja102> insert into t2 select 'x', 'x', 'x' from dual
      2  connect by level <= &sth - 50
      3  ;
    
    449 rows created.
    

    The 10046 trace clearly shows that
    - a table whose size is bigger than _small_table_threshold is read by direct read
    - but a table whose size is smaller than _small_table_threshold is read by conventional read

    select /*+ parallel(t1) */ count(*) 
    from
     t1
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        2      0.00       0.00          0          0          0           0
    Execute      2      0.01       0.58        264        314          0           0
    Fetch        0      0.00       0.00          0          0          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4      0.01       0.58        264        314          0           0
    
    Misses in library cache during parse: 0
    Optimizer mode: ALL_ROWS
    Parsing user id: 61     (recursive depth: 1)
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
          0  SORT AGGREGATE (cr=0 pr=0 pw=0 time=0 us)
          0   PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
          0    PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
          1     SORT AGGREGATE (cr=176 pr=148 pw=0 time=250604 us)
        148      PX BLOCK ITERATOR (cr=176 pr=148 pw=0 time=19765 us)
        148       TABLE ACCESS FULL T1 (cr=176 pr=148 pw=0 time=241743 us)
    
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      PX Deq: Execution Msg                          27        0.02          0.02
      direct path read                               31        0.10          0.52
    
    
    select /*+ parallel(t2) */ count(*) 
    from
     t2
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute      1      0.00       0.06          0        149          0           0
    Fetch        0      0.00       0.00          0          0          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        2      0.00       0.06          0        149          0           0
    
    Misses in library cache during parse: 0
    Optimizer mode: ALL_ROWS
    Parsing user id: 61     (recursive depth: 1)
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
          0  SORT AGGREGATE (cr=0 pr=0 pw=0 time=0 us)
          0   PX COORDINATOR  (cr=0 pr=0 pw=0 time=0 us)
          0    PX SEND QC (RANDOM) :TQ10000 (cr=0 pr=0 pw=0 time=0 us)
          1     SORT AGGREGATE (cr=149 pr=0 pw=0 time=42395 us)
         97      PX BLOCK ITERATOR (cr=149 pr=0 pw=0 time=27679 us)
         97       TABLE ACCESS FULL T2 (cr=149 pr=0 pw=0 time=27424 us)
    
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      PX Deq: Execution Msg                          13        0.02          0.03
      PX qref latch                                  17        0.00          0.00
    

    Dion Cho

    April 23, 2009 at 1:49 am

  5. I hit the following posts

    http://oraclue.com/2009/07/17/direct-path-reads-and-serial-table-scans-in-11g/

    http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html

    These describe how Oracle 11g extends the usage of _small_table_threshold to enable non-parallel direct path reads on big tables.

    Refer to the following post.

    http://dioncho.wordpress.com/2009/07/21/disabling-direct-path-read-for-the-serial-full-table-scan-11g/

    Dion Cho

    July 20, 2009 at 3:41 am

