Dion Cho – Oracle Performance Storyteller

We are natural born scientists

Buffer Pinning

with 23 comments

One of my colleagues asked me about the meaning of buffer is pinned count and buffer pinning.

The baisc concept of buffer pinning.

In case of consecutive access on the same buffer within the same fetch call, Oracle pins the buffer so as to visit the buffer without acquiring the cache buffers chains latch and searching the cache buffers chains. The duration of pin is fetch-call scope, so we have a decreased overhead without the pin contention.

Basic but hard to understand. :(

So I made a very simple test case to demonstrate what is buffer pinning and how we interpret it. Here it goes.

UKJA@ukja102> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for 32-bit Windows: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

UKJA@ukja102> create table t1(c1 int, c2 int, c3 char(100));

Table created.

UKJA@ukja102>
UKJA@ukja102> insert into t1
  2  select level, dbms_random.random, 'x'
  3  from dual connect by level <= 10000;

10000 rows created.

UKJA@ukja102>
UKJA@ukja102> create index t1_n1 on t1(c1);

Index created.

UKJA@ukja102> create index t1_n2 on t1(c2);

Index created.

UKJA@ukja102> exec dbms_stats.gather_table_stats(user, 't1', no_invalidate=>false);

PL/SQL procedure successfully completed.





Note that index T1_N1 and T1_N2 have almost same data distributions, but clustering factors of big difference.

UKJA@ukja102> select index_name, clustering_factor
  2  from user_indexes where table_name = 'T1';

INDEX_NAME           CLUSTERING_FACTOR                                          
-------------------- -----------------                                          
T1_N1                              164                                          
T1_N2                             9933                                          





The comparison using SQL*Trace shows that the index T1_N1(good clustering factor of 164) has only 186 logical reads but the index T1_N2(bad clustering factor of 9933) has 9,960 logical reads for the same volume of data!

select /*+ index(t1 t1(c1)) */ count(c3) 
from
 t1 where c1 is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.00       0.11         69        186          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.00       0.11         69        186          0           1


select /*+ index(t1 t1(c2)) */ count(c3) 
from
 t1 where c2 is not null

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.07       0.05          0      9960          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      0.07       0.05          0       9960          0           1




Big difference! Where does this number come from?

Session diff makes me realize that this number comes from the diffference of buffer is pinned count

-- index T1_N1(good clustering factor)
NAME                                             DIFF
---------------------------------------- ------------
buffer is pinned count                         19,835
table fetch by rowid                           10,000
session logical reads                             219
consistent gets from cache                        199
consistent gets                                   199
buffer is not pinned count                        165

-- index T1_N2(bad clustering factor)
NAME                                             DIFF
---------------------------------------- ------------
buffer is pinned count                         10,066
table fetch by rowid                           10,000
session logical reads                           9,998
consistent gets from cache                      9,975
consistent gets                                 9,975
no work - consistent read gets                  9,958





Good clustering factor enables us to have a consecutive visit on the same table block. The result is frequent buffer pinning and decreased logical reads.

Same effect with cache buffers chains latch.

-- index T1_N1(good clustering factor)
LATCH_NAME                         D_GETS
------------------------------ ----------
cache buffers chains                  492

-- index T1_N2(bad clustering factor)
LATCH_NAME                         D_GETS
------------------------------ ----------
cache buffers chains                20108




The main usage of buffer pinning is table access by rowid via index and nested loop join.

One thing to be cautious about. We do not have 100 times of performance gain with 1/100 of reduced logical reads. Buffer pinning enables us to read the buffer without the overhead of latch, but we still need to read the buffer!

As far as I know, we have no direct control on buffer pinning. It’s intrinsic mechanism of Oracle. But you can still make the use of it for index scan and nested loop join. It matters how the corresponing datas are sorted.

About these ads

Written by Dion Cho

April 8, 2009 at 12:56 am

23 Responses

Subscribe to comments with RSS.

  1. Why is “buffer is pinned count” higher in the “good clustering” case? Oracle reads (lio) less buffers and the rows to be extracted are the same.

    Best regards,
    Roberto

    Roberto

    April 8, 2009 at 1:50 pm

  2. @Roberto: it’s easier to understand if you read this as “buffer is already pinned”:
    – you pin the buffer on first access
    – second access (for “good” case) is to same block so no need to pin again (and this counter gets incremented).
    – When your index scan finally requires you to fetch a row from another block (and/or to move on to the next index block?), the data block is finally unpinned;
    – on reading the next data block, the logical read count is incremented.

    HTH

    Nigel Thomas

    April 8, 2009 at 4:19 pm

  3. @Nigel
    Thanks for the answer.

    One thing to add.
    In this test case, we have buffer pinning in the table block and index block.

    It is dependent on the access pattern where we have buffer pinning.

    Dion Cho

    April 8, 2009 at 9:48 pm

  4. Strange, huh ?

    UKJA@ukja102> insert into t1
    2 select level, dbms_random.random, ‘x’ — Note the randomness!
    3 from dual connect by level create index t1_n1 on t1(c1);

    Index created.

    Rudi Kristanto

    April 9, 2009 at 3:47 am

  5. @Rudi.

    Thanks for pointing out the typo.

    It turned out to be the mixture of copy-and-paste error and wordpress’s inability to handle < character.

    Dion Cho

    April 9, 2009 at 4:00 am

  6. Nigel,
    if I understand correctly, you mean:
    In “good clustering factor” case:
    get buffer -> pin buffer -> get row -> pin buffer -> get row -> pin buffer -> … -> unpin buffer -> get another buffer -> …

    In “bad clustering factor” case:
    get buffer -> pin buffer -> get row -> unpin buffer -> get another buffer -> pin buffer -> get row -> unpin buffer -> …

    That is one pin for row.

    Omitting index block gets (about 20-30), in good and bad cases pin count should be 10000 (in bad case, they really are).

    Why are they 20000 in “good clustering factor” case?

    Roberto

    Roberto

    April 9, 2009 at 8:52 am

  7. Roberto

    In “best” clustering case, you get “buffer is (already) pinned” incremented for all rows (except the first in each block, if I understand right).

    In “worst” clustering case (where no successive index entries point to the same block), the buffer is pinned once per row; it is *never* “already pinned”.

    HTH

    Nigel Thomas

    April 9, 2009 at 12:55 pm

  8. I don’t understand.

    In “best” clustering case:
    “you get “buffer is (already) pinned” incremented for all rows”
    IMPLIES
    “buffer is pinned count” about “number of rows” about 10000

    In “worst” clustering case:
    “the buffer is pinned once per row”
    IMPLIES
    “buffer is pinned count” about “number of rows” about 10000

    It doesn’matter ALREADY:
    “buffer is pinned count” = pinned + already pinned

    Why are they 20000 in “best clustering factor” case?

    Roberto

    April 9, 2009 at 1:49 pm

  9. [...] your primary key.” In the comments, Yas links to a post illustrating the culprit here—buffer pinning, by Don [...]

  10. I have always a doubt about buffer pinning.

    If there are 3 concurrent users running the same query can they read the same buffer block at same time?

    lscheng

    May 1, 2009 at 7:51 pm

  11. @Lscheng

    Read operation acquires buffer pin in shared mode. So concurrent reads do not block each other.

    Read operation(shared mode) and update operation(exclusive mode) can block each other.

    But the blocking is normally not a problem. The duration of buffer pinning in read operation is

    “fetch on the same block”, so the blocking time is very short and it would not make a noticeable contention.

    Cocurrent read on the same block can be blocked by “cache buffers chains latch(CBCL)”. The corresponding wait event is “latch: cache buffers chains”.

    CBCL is sharedable for read operation, but at the very moment of acquiring buffer pin in shared mode to read the buffer Oracle needs to acquire CBCL in exclusive mode. That’s why we are blocked even with the cocurrent read.

    Dion Cho

    May 1, 2009 at 11:40 pm

  12. Hi Dion

    My understanding is that walking a cache buffer chain looking for a buffer handle is latched by the corresponding latch and that is why I wonder if a block can be read at sametime.

    So from your answer it looks like if concurrent processes trying to read the same block are readers can walk the buffer chain at the same time?

    A bit confused because first you have said reads do not block each other but at the end you said we are blocked even with concurrent read. (Hey that is why I always had this doubt :-) )

    Thanks for your time

    lscheng

    May 2, 2009 at 11:55 am

  13. @Lscheng

    Yes, it’s always confusing at first.

    CBCL *was* exclusive for read operation at the ancient versions(8 maybe). But in recent versions, as for the read operation, CBCL is acquired in shared mode. This was a big improvement.

    This means that (as you exactly pointed out) multiple sessions can walk through the cache buffers chains in the same time with CBCL held in the shared mode.

    But this does not mean that we are never blocked in the read operation. It is just walking through the chains which is shared.

    Just after the session finds the buffer what it is supposed to read, it needs to acquire buffer pin. The exact process is as following.

    1. Acquire CBCL in the shared mode.
    2. Walk through the chain and find the buffer to read.
    3. Release CBCL.
    4. Acquire CBCL in the exclusive mode.
    5. Acquire buffer pin for the buffer – shared mode for SELECT, exclusive mode for DML.
    6. Release CBCL.
    7. Read the buffer.
    8. Acquire CBCL in the exclusive mode.
    9. Release buffer pin.
    10. Release CBCL.
    11. Logical Reads done!

    Buffer pin itself is protected by CBCL. This is why we still suffer from the CBCL contention even for the concurrent reads.

    The contention got alleviated by the improvement(CBCL in shared mode), but we still see heavy CBCL contention in the highly concurrent systems.

    This is why Oracle 11g introduced the concept of result cache.

    PS) With buffer pinning activated, we don’t need to acquire latch and walk through the chain. As the buffer is pinned(buffer pin is not released) and the buffer is guaranteed not to be aged out, we are safe to read the buffer directly. This is the merit of buffer pinning.

    Dion Cho

    May 2, 2009 at 12:16 pm

  14. Hi Dion

    Dont understand very well step 5 and 7, arent they the same? Shouldnt the step be 1, 2, 3, 4, 7, 8, 9, 10,11, 12?

    So at the end of the day concurrent reads could block each other due to exclusive CBCL (as I thought) no?

    Thanks for your time

    lscheng

    May 2, 2009 at 10:49 pm

  15. Sprry I mean step order should be

    1, 2, 3, 4, 7, 8, 10,11, 12

    lscheng

    May 2, 2009 at 10:53 pm

  16. @Lscheng

    The step got modified as following.

    1. Acquire CBCL in the shared mode.
    2. Walk through the chain and find the buffer to read.
    3. Release CBCL.
    4. Acquire CBCL in the exclusive mode.
    5. Acquire buffer pin for the buffer - shared mode for SELECT, exclusive mode for DML.
    6. Release CBCL.
    7. Read the buffer.
    8. Acquire CBCL in the exclusive mode.
    9. Release buffer pin.
    10. Release CBCL.
    11. Logical Reads done!
    

    The point here is that buffer pin itself is a shared memory object which should be protected by latch. CBCL is used to protect buffer pin.

    So to acquire buffer pin, 1) acquire CBCL in exclusive mode 2) acquire buffer pin 3) release CBCL.

    And to release buffer pin, 1) acquire CBCL in exclusive mode 2) release buffer pin 3) release CBCL.

    And this is why concurrent reads are still blocked by CBCL contention. At the short period during buffer read, we still need to acquire CBCL in exclusive mode.

    Dion Cho

    May 2, 2009 at 11:24 pm

  17. Aha, it’s very clear now

    Thank you for the time and patience :-)

    lscheng

    May 3, 2009 at 8:50 am

  18. Please Dion, Nigel didn’t respond me. Could you answer to my doubts?
    Thanks,
    Roberto

    Roberto

    May 4, 2009 at 8:38 am

  19. @Roberto.

    It’s all about simple mathematics. In the above example, the size of both indexes is as following.

    TABLE_NAME                    : T1                    
    INDEX_NAME                    : T1_N1                 
    BLEVEL                        : 1                     
    LEAF_BLOCKS                   : 21                    
    DISTINCT_KEYS                 : 10000                 
    CLUSTERING_FACTOR             : 164                   
    SAMPLE_SIZE                   : 10000                 
    GLOBAL_STATS                  : YES                   
    LAST_ANAL                     : 2009/05/04 20:52:22   
    -----------------                                     
    TABLE_NAME                    : T1                    
    INDEX_NAME                    : T1_N2                 
    BLEVEL                        : 1                     
    LEAF_BLOCKS                   : 26                    
    DISTINCT_KEYS                 : 10000                 
    CLUSTERING_FACTOR             : 9934                  
    SAMPLE_SIZE                   : 10000                 
    GLOBAL_STATS                  : YES                   
    LAST_ANAL                     : 2009/05/04 20:52:22   
    -----------------                                     
    

    I’ll do very simple mathematics using these values(with some computational errors)

    1. For index T1_N1 of good clustering factor

    1) Leaf block# = 21
    2) So, keys per leaf block = 10000/21 = 476
    3) Table block# = 164
    4) So, keys per table block = 10000/164 = 61
    5) Buffer is pinned by the 2nd visit on the same block.
    Next read on the same block is incremented as buffer is pinned count
    6) So, buffer is pinned count = (478-1)*21 + (61-1)*164 = 19815
    
    -- index T1_N1(good clustering factor)
    NAME                                             DIFF
    ---------------------------------------- ------------
    buffer is pinned count                         19,835
    
    

    2. For index T1_N2 of bad clustering factor

    1) Leaf block# = 26
    2) So, keys per leaf block = 10000/26 = 384.6
    3) Table block# = 164
    4) So, keys per table block = 10000/164 = 61
    5) Buffer is pinned by the 2nd visit on the same block.
    Next read on the same block is incremented as buffer is pinned count
    6) So, buffer is pinned count = (384.6-1)*26 +  = 
    9973 + small_value_for_table
    
    
    -- index T1_N2(bad clustering factor)
    NAME                                             DIFF
    ---------------------------------------- ------------
    buffer is pinned count                         10,066
    

    Dion Cho

    May 4, 2009 at 1:04 pm

  20. Hi,

    A little correction. In step 3 above (in your reply to LSCheng) the CBC latch is not released and retaken in exclusive mode, but upgraded to exclusive mode from shared. This is what the “shared hash latch upgrades” statistics show in v$sesstat.

    Also, pinning a buffer additionally means estabilishing pointers between session’s db buffer handles and pinned buffer headers. Oracle caches the db buffer handles in SGA (separate handles for each process) and doesn’t close these (nor unpins buffers) immediately after block visit. So, Oracle caches db handles for a call and closes them when a call finishes or the number of already cached handles exceeds a threshold (controlled by _db_handles_cached parameter, 5 by default)

    So, the “buffer is already pinned count” statistic

    Tanel Poder

    May 5, 2009 at 1:00 pm

  21. …shows how many times we tried to access a buffer but discovered that such buffer already had an open handle cached pointing to it (thus the buffer was also pinned)

    Tanel Poder

    May 5, 2009 at 1:01 pm

  22. @Tanel

    I was about to add comment on my comments about latch acquisition. As test case cleary shows, the latch acquisition count is almost 2 times of logical reads. This means that Oracle does not release and reacquire the CBCL, but upgrades it. Your explanation clearly proves my assumption.

    Really thanks for the awesome explanation!

    Dion Cho

    May 5, 2009 at 1:12 pm

  23. Thanks Dion, I missed that “buffer is pinned count” is incremented only after the SECOND visit, and then, in worst clustering case, for each key in the index block, I (ideally) pin the table blocks only one time and thus “buffer is pinned count” is not incremented. Nigel tried to explain me, but I was deaf. The crude numbers opened my eyes! (and my ears)

    Roberto

    May 7, 2009 at 9:11 am


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 61 other followers

%d bloggers like this: