Sydney Oracle Meetup Message Board › Exadata: 'direct path read', 'object checkpoint' & consistency reads
| Yury Velikanov | |
|
|
Last Friday we had a very good presentation & discussion on the Exadata solution.
As always, big thanks to Alex & Martin for organizing and to David & Tim for presenting. The discussion triggered a question: how is read consistency ensured during Smart Scan offload processing? The problem is that during full scan offload processing at the storage level, some of the table's blocks could be changed by other sessions and written to disk before the Smart Scan reads them. To ensure read consistency for such a block, Oracle should either prevent anybody from modifying the table during the scan (we know that readers don't block writers in the Oracle world, so this is unlikely) or do some post-processing work on the blocks returned.

Alex suggested that this is probably handled in the same way as for direct path reads: Oracle issues an object checkpoint to ensure read consistency. This is something I couldn't understand, so I spent some time researching the topic. Reading through some posts available on the internet, it seems that direct path reads work as follows:

--- Step A. Issue "object checkpoint"
Just before starting direct path processing, the Oracle session issues an "object checkpoint" to protect itself from reading an out-of-date copy of a data block whose SCN is lower than the full scan's SCN. Let me briefly explain the reason for the "object checkpoint". Imagine that the full scan's (the SQL's) SCN is 1050. If Oracle doesn't issue an "object checkpoint" at the beginning of query processing, there might be a dirty block in memory (SGA) with SCN 1040 and a copy of the same block on disk with SCN 1030 (the block was modified shortly before we started the full scan, and DBWR hasn't written it to disk yet). As the direct read doesn't look in the SGA but reads blocks directly from disk, it would read the on-disk block and wrongly consider its data consistent (block's SCN 1030 < SQL's SCN 1050), missing the change made at SCN 1040.

--- Step B. Read data blocks from the disk directly
This step is obvious. The session issues read requests to read the data from disk directly into the PGA.

--- Step C. Reconstruct a consistent version of data blocks if necessary
The session reads each block's SCN and compares it with the SQL's SCN. If the block's SCN is higher than the SQL's SCN, it looks into the block's ITL (Interested Transaction List) and reads UNDO data from either the SGA or disk to reconstruct the previous version of the block.

Yury

References: http://www.freelists.... http://oracledoug.com... http://groups.google.... |
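To make steps A to C above easier to follow, here is a minimal toy model in Python. It is only a sketch of the logic as described in the post, not Oracle's actual implementation: the names `Block`, `ToyDatabase`, `modify`, `object_checkpoint` and `direct_path_read` are all hypothetical, every change is assumed to be committed, and the ITL lookup is reduced to a simple list of older block images.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    scn: int    # SCN of the last change applied to this block image
    value: str  # stand-in for the row data in the block


@dataclass
class ToyDatabase:
    # All three structures are simplifications invented for this sketch.
    disk: dict = field(default_factory=dict)          # block id -> Block on disk
    buffer_cache: dict = field(default_factory=dict)  # block id -> dirty Block in the SGA
    undo: dict = field(default_factory=dict)          # block id -> older Blocks, newest first

    def modify(self, block_id, new_value, scn):
        """Change a block in memory only and keep the previous image as undo."""
        current = self.buffer_cache.get(block_id, self.disk[block_id])
        self.undo.setdefault(block_id, []).insert(0, current)
        self.buffer_cache[block_id] = Block(scn, new_value)

    def object_checkpoint(self):
        """Step A: write every dirty block of the object to disk."""
        self.disk.update(self.buffer_cache)
        self.buffer_cache.clear()

    def direct_path_read(self, block_id, query_scn):
        """Steps B and C: read from disk only, roll back if the block is too new."""
        block = self.disk[block_id]  # Step B: the buffer cache is bypassed
        if block.scn <= query_scn:
            return block
        # Step C: walk the undo images until one is old enough for the query.
        for older in self.undo.get(block_id, []):
            if older.scn <= query_scn:
                return older
        raise LookupError("not enough undo to rebuild a consistent image")


# The scenario from the post: disk copy at SCN 1030, dirty copy at SCN 1040.
db = ToyDatabase(disk={1: Block(1030, "old")})
db.modify(1, "new", scn=1040)
stale = db.direct_path_read(1, query_scn=1050)    # no checkpoint: misses SCN 1040

db.object_checkpoint()
fresh = db.direct_path_read(1, query_scn=1050)    # after step A: sees SCN 1040

db.modify(1, "newer", scn=1060)                   # a later change reaches the disk
db.object_checkpoint()
rebuilt = db.direct_path_read(1, query_scn=1050)  # step C: rolled back to SCN 1040

print(stale.value, fresh.value, rebuilt.value)    # prints: old new new
```

The first read shows the problem that step A is meant to prevent: the on-disk SCN (1030) is below the query SCN (1050), so the block looks consistent although the change at SCN 1040 is missing. The last read shows step C: the on-disk block is newer than the query, so an older image is taken from undo.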
| Yury Velikanov | |
|
|
Question: How are consistent reads ensured in the Exadata case?
It seems clear how consistent reads are ensured in the case of direct path reads: the consistent image of the blocks has to be built on the DB server side (where we have access to UNDO information). However, if Exadata returns only the relevant rows and columns to the DB server, the question remains open: how is the consistent image of the data built? My guess is:
- Either Oracle returns the whole block from the Exadata cell to the DB server for every block having at least one row that satisfies the condition. In that case I don't see how Exadata could eliminate unnecessary columns from what the storage cell returns.
- Or Oracle sends the block's headers + relevant rows & columns in a format that makes it possible to reconstruct a consistent image of the data on the DB server side.
Any input is welcome, Yury |
| Evgeny Platonov | |
|
Yury,
Thanks for the research! It makes things clearer. But are you sure that an Exadata cell returns only particular rows and columns rather than whole blocks? Evgeny |
|
| Evgeny Platonov | |
|
By the way, if every SQL statement initiates an "object checkpoint" on Exadata, that would be a very heavy load on the system, given the number of SQL statements that Exadata can process.
|
|
| Yury Velikanov | |
|
|
Yury, This is what the white papers suggest, e.g.: http://www.oracle.com... I am sure that David will have more to comment on that. |
| Yury Velikanov | |
|
|
"By the way, if every SQL statement initiates an "object checkpoint" on Exadata, that would be a very heavy load on the system, given the number of SQL statements that Exadata can process."
This is true. But on the other hand, one of the characteristics of a DWH is fewer SQL statements, but larger resource consumption per statement. At least this is one logical explanation of how Exadata can ensure data consistency. I would be glad to hear explanations from others. :) Evgeny - thank you for participating. Yura |
| David Centellas | |
|
|
Protection Against Data Corruption
Exadata Cell is compliant with the Oracle Hardware Assisted Resilient Data (HARD) initiative, a joint initiative between Oracle and hardware vendors to prevent data corruptions from being written out to disks. Data corruptions, while rare, can have a catastrophic effect on a database, and therefore on a business. Exadata Cell takes data protection to the next level by protecting business data, not just the physical bits. The key approach to detecting and preventing corrupted data is block checking where the storage subsystem validates the Oracle block contents. Oracle Database validates and adds protection information to the database blocks, while Exadata Cell detects corruptions introduced into the I/O path between the database and storage. It stops corrupted data from being written to disk, and validates data when reading the disk. This eliminates a large class of failures that the database industry has previously been unable to prevent. Exadata Cell implements all the HARD checks, and because of its tight integration with Oracle Database, additional checks are implemented that are specific to Exadata Cell. Unlike other implementations of HARD checking, HARD checks with Exadata Cell operate completely transparently. No parameters need to be set at the database or storage tier. The HARD checks transparently handle all cases, including ASM disk rebalance operations and disk failures. Hope this helps |