Strong doubt on small table(_small_table_threshold)
It is well known that the table is called “small” when the size is smaller than 2% of the buffer cache. That size is calculated on startup into _small_table_threshold parameter. Some links about this:
- http://jonathanlewis.wordpress.com/2007/01/29/small-partitions/
- http://www.ixora.com.au/q+a/cache.htm
For instance, in my local database whose db_cache_size is 200m, the _db_block_buffers is 24950. Hence _small_table_threshold should be 24950*2% = 499.
SYS@ukja10> @para _db_cache_size NAME VALUE ------------------------------ -------------------- __db_cache_size 209715200 SYS@ukja10> @para _db_block_buffers NAME VALUE ------------------------------ -------------------- _db_block_buffers 24950 SYS@ukja10> @para small_table_threshold NAME VALUE ------------------------------ -------------------- _small_table_threshold 499
Why is this 2% threshold so important? Because it affects the efficiency of LRU algorithm of the buffer cache.
- Table < (2%*buffer_cache) is located into the middle point of LRU list when loaded into the buffer cache.
- Table > (2%*buffer_cache) is located into the cold end of the LRU list when loaded into the buffer cache.
This simple rule enables Oracle to prevent the buffer cache from being flooded by the full scan on big table. But the negative effect is that the full table scan on the big table is always physical(when it is read after the buffer cache got full). No matter how often the table is read.
Okay, so far so good.
A couple of days ago, I was asked the exactly same question from my customer.
What is the threshold of being small table?
I was just about to say that the threshold is 2% of the buffer cache. But wait… I’ve never proved it myself. So I tried to make a simple test case to demonstrate the 2% rule.
But, to my surprise, the famous 2% rule never worked! The more surprising thing is that in my local 10gR2 database, the threshold seems to be around 25% of the buffer cache.
Here goes the test case.
UKJA@ukja102> select * from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod PL/SQL Release 10.2.0.1.0 - Production CORE 10.2.0.1.0 Production TNS for 32-bit Windows: Version 10.2.0.1.0 - Production NLSRTL Version 10.2.0.1.0 - Production -- When the db_cache_size = 100m (without ASMM) __db_cache_size 104857600 _db_block_buffers 12475 _small_table_threshold 249 -- When the db_cache_size = 200m (without ASMM) __db_cache_size 209715200 _db_block_buffers 24950 _small_table_threshold 499 -- a bit big table (a bit bigger than 25% of db_cache_size) UKJA@ukja102> create table t1(c1 char(2000), c2 char(2000), c3 char(2000)) nologging; Table created. UKJA@ukja102> insert /*+ append */ into t1 2 select 'x', 'x', 'x' 3 from dual 4 connect by level <= 1*3400 -- when db_cache_size = 100m -- connect by level <= 2*3400 -- when db_cache_size = 200m 5 ; 3400 rows created. -- a bit small table(a bit smaller than 25% of db_cache_size) UKJA@ukja102> create table t2(c1 char(2000), c2 char(2000), c3 char(2000)) nologging; Table created. UKJA@ukja102> insert /*+ append */ into t2 2 select 'x', 'x', 'x' 3 from dual 4 connect by level <= 1*3100 -- when db_cache_size = 100m -- connect by level <= 2*3100 -- when db_cache_size = 200m 5 ; 3100 rows created. -- big table(which a bit bigger than db_cache_size) UKJA@ukja102> create table big_table(c1 char(2000), c2 char(2000), c3 char(2000)) nologging; Table created. UKJA@ukja102> insert /*+ append */ into big_table 2 select 'x', 'x', 'x' 3 from dual 4 connect by level <= 1*13000 -- when db_cache_size = 100m -- connect by level <= 2*13000 -- when db_cache_size = 200m 5 ; 13000 rows created. -- Read table big_table after flushing buffer cache. -- This would fill out the buffer cache. UKJA@ukja102> alter system flush buffer_cache; System altered. UKJA@ukja102> -- read big_table UKJA@ukja102> select count(*) from big_table; COUNT(*) ---------- 13000 -- Read table t1(a bit bigger than 25% of db_cache_size). -- See that physical reads is equal to logical reads? -- This means that table t1 is always located to the cold end of LRU list. UKJA@ukja102> select count(*) from t1; COUNT(*) ---------- 3400 UKJA@ukja102> select count(*) from t1; COUNT(*) ---------- 3400 UKJA@ukja102> set autot on UKJA@ukja102> select count(*) from t1; COUNT(*) ---------- 3400 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3407 consistent gets 3376 physical reads 0 redo size 411 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed UKJA@ukja102> set autot off -- Read table t2(a bit smaller than 25% of db_cache_size). -- See that physical is almost gone? -- This means that table t1 is always located to the mid point of the LRU list. UKJA@ukja102> select count(*) from t2; COUNT(*) ---------- 3100 UKJA@ukja102> select count(*) from t2; COUNT(*) ---------- 3100 UKJA@ukja102> set autot on UKJA@ukja102> select count(*) from t2; COUNT(*) ---------- 3100 Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 3106 consistent gets 32 physical reads 0 redo size 411 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed UKJA@ukja102> set autot off
Small table threshold is just around 25% of db_cache_size with both of 100m and 200m size!
This result gave me a strong doubt on 2% rule.
- Oracle would have unknown internal threshold. This threshold would vary by version and/or db_cache_size. Memory management algorithm would be another factor.
- 2% rule is poorly documented and it has never been fully verified.
Well, from now on, I would not trust the 2% stuff. But I strongly hope someone would have better answer and test case than mine.
Dion,
There are a lot of little tests of the around “_small_table_threshold” that I still have to do. But one point to consider is that if the buffer cache is completely free when you start testing then the the tablescan is allowed to fill (possibly all) the free blocks and therefore ignore the small table threshold. After all, the main purpose of the threshold is to make sure that a large tablescan doesn’t push other (more useful) blocks out of the buffer cache – so if there’s nothing in the buffer cache there’s nothing to lose by keeping the table blocks.
Other thoughts (that I keep meaning to check):
recycle and keep caches, non-standard block sizes, automatic memory management: if the threshold is 2% is this 2% of the total memory allocated to cache, or is it 2% of the 4KB cache size if you’re reading a 4KB block table, and 2% of the recycle cache if you’re reading a table allocated to recycle; and does the threshold change dynamically as the cache size changes dynamically.
Moreover, at some point in the recent past this parameter started to affect the behaviour of parallel table scans (and parallel index fast full scans) and direct reads. Again, I haven’t checked all the details, but a parallel read may not go direct if the table is less than the small table threshold … and a serial read may go direct if the table is larger than the threshold etc…
Jonathan Lewis
April 22, 2009 at 10:15 am
@Jonathan.
Thanks for visiting my blog.
1. Yes, loading big table in the free buffer cache is no problem. I put this into consideration in my test case.
2. _small_table_threshold value itself is calculated as (2% * _db_block_buffers), which means that this value is 2% of the standard size default pool buffers.
3. The description of _small_table_threshold is “threshold level of table size for direct reads”. Literally, it means that Oracle would not do direct read for tables whose size is smaller than this value. I would verifty this and share it.
Dion Cho
April 22, 2009 at 12:40 pm
Dion,
My questions were rhetorical – so I wasn’t expecting you to answer any of them, but but your comment in para. 2 doesn’t match some tests I did a couple of years ago. I have the following note in one of my test scripts for short tables:
rem 10.2.0.3
rem May be 2% of buffer "count" - not size.
rem _small_table_threshold was 379 when 8K and 2K caches
rem were 32M - giving 3992 and 14960 buffers respectively.
rem 0.02 * (3992 + 14960) = 379.
In passing, the definition of the paramter in 8.1.7.4 is: “level of table size for forget-bit enabled during scan”: I can’t remember what it was before that, and I don’t remember exactly when it changed to the “level of table size for direct reads” but I think it was some time in 9.2
Regards
Jonathan Lewis
Jonathan Lewis
April 22, 2009 at 2:43 pm
@Jonathan.
My assumption of _small_table_threshold being (2% * _db_block_buffers) was just based on my simple test case. Your test case implies the complexity of calculating _small_table_threshold.
But still I doubt the fact that _small_table_threshold is used to control the way of full scaned blocks being loaded into the LRU list.
As you pointed out, Oracle might have changed the usage of this parameter to control direct read.
I’ve made simple test case to verify this.
10046 trace cleary shows that
– Table whose size is bigger than _small_table_threshold is read by direct read
– But table whose size is smaller than _small_table_threshold is read by conventional read
Dion Cho
April 23, 2009 at 1:49 am
I hit following posts
http://oraclue.com/2009/07/17/direct-path-reads-and-serial-table-scans-in-11g/
http://afatkulin.blogspot.com/2009/01/11g-adaptive-direct-path-reads-what-is.html
, which describe how Oracle 11g expands the usage of _small_table_threshold to enable non-parallel direct path read on the big table.
Refer to following post.
https://dioncho.wordpress.com/2009/07/21/disabling-direct-path-read-for-the-serial-full-table-scan-11g/
Dion Cho
July 20, 2009 at 3:41 am