
                                     

Data consistency

Point-in-time consistency is also relevant to computer disk subsystems.

Specifically, operating systems and file systems are designed with the expectation that the computer system they are running on could lose power, crash, fail, or otherwise cease operating at any time. When properly designed, they ensure that data will not be unrecoverably corrupted if the power is lost. Operating systems and file systems do this by ensuring that data is written to a hard disk in a certain order, and they rely on that order to detect and recover from unexpected shutdowns.

On the other hand, rigorously writing data to disk in the order that maximizes data integrity also impacts performance. A process of write caching is used to consolidate and re-sequence write operations such that they can be done faster by minimizing the time spent moving disk heads.

Data consistency concerns arise when write caching changes the sequence in which writes are carried out, because an unexpected shutdown can then violate the operating system's expectation that all writes will be committed sequentially.

For example, in order to save a typical document or picture file, an operating system might write the following records to a disk in the following order:

  1. Journal entry saying file XYZ is about to be saved into sector 123.
  2. The actual contents of the file XYZ are written into sector 123.
  3. Sector 123 is now flagged as occupied in the record of free/used space.
  4. Journal entry noting the file is completely saved, its name is XYZ, and it is located in sector 123.

The operating system relies on the assumption that if item #1 is present, saying the file is about to be saved, but item #4, confirming success, is missing, then the save operation was unsuccessful. It should therefore undo any incomplete steps already taken to save the file, e.g. marking sector 123 free since it was never properly filled, and removing any record of XYZ from the file directory. This relies on the items being committed to disk in sequential order.
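The recovery rule described here can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `Disk` class, its fields, and the `"begin"`/`"commit"` journal kinds are hypothetical names chosen for illustration, not the layout of any real file system.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """Toy model of on-disk state; every name here is hypothetical."""
    journal: list = field(default_factory=list)    # (kind, filename, sector) tuples
    sectors: dict = field(default_factory=dict)    # sector number -> contents
    used: set = field(default_factory=set)         # sectors flagged as occupied
    directory: dict = field(default_factory=dict)  # filename -> sector number


def recover(disk):
    """Undo every save that has a 'begin' entry but no matching 'commit' entry."""
    begun = {(name, sector) for kind, name, sector in disk.journal if kind == "begin"}
    committed = {(name, sector) for kind, name, sector in disk.journal if kind == "commit"}
    rolled_back = []
    for name, sector in sorted(begun - committed):
        disk.sectors.pop(sector, None)   # discard partially written contents
        disk.used.discard(sector)        # mark the sector free again
        disk.directory.pop(name, None)   # remove any record of the file
        rolled_back.append(name)
    return rolled_back


# Simulated crash: the "about to save" entry and the contents reached the
# disk, but the free-space flag and the confirming entry did not.
disk = Disk()
disk.journal.append(("begin", "XYZ", 123))
disk.sectors[123] = b"partial contents"
rolled_back = recover(disk)
```

After `recover` runs, sector 123 is free again and no trace of XYZ remains, which is only safe because the writes are assumed to have reached the disk in the order they were issued.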

Suppose a caching algorithm determines it would be fastest to write these items to disk in the order 4-3-1-2 and starts doing so, but the power is cut after item 4 is written and before items 3, 1 and 2, so those writes never occur. When the computer is turned back on, the file system would then show that it contains a file named XYZ located in sector 123, but this sector does not actually contain the file.

Further, the file system's free space map will not contain any entry showing that sector 123 is occupied, so it will likely assign that sector to the next file to be saved, believing it is available. The file system will then have two files both unexpectedly claiming the same sector, a condition known as a cross-linked file. As a result, a write to one of the files will overwrite part of the other file, invisibly damaging it.

A disk caching subsystem that ensures point-in-time consistency guarantees that in the event of an unexpected shutdown, the four elements would be written in one of only five possible ways: completely (1-2-3-4), partially (1, 1-2, or 1-2-3), or not at all.
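Put differently, the set of writes that survive a shutdown must always be a prefix of the requested sequence. The following minimal sketch enumerates the permitted states and checks a given outcome against them; the function name is a hypothetical helper, and it assumes each write has a unique identifier.

```python
def is_point_in_time_consistent(requested, persisted):
    """Return True if the persisted writes form a prefix of the requested order.

    Assumes every write identifier in `requested` is unique.
    """
    persisted = set(persisted)
    prefix = set(requested[:len(persisted)])
    return prefix == persisted


requested = [1, 2, 3, 4]

# Every allowed on-disk state: nothing, 1, 1-2, 1-2-3, 1-2-3-4.
allowed = [requested[:k] for k in range(len(requested) + 1)]

# A shutdown that left only item 4 on disk is not one of them.
only_item_4 = is_point_in_time_consistent(requested, [4])
```

For four writes, `allowed` has five entries, matching the five possibilities listed above, while the state in which only item 4 is on disk is rejected.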

High-end hardware disk controllers of the type found in servers include a small battery back-up unit on their cache memory so that they may offer the performance gains of write caching while mitigating the risk of unintended shutdowns. The battery back-up unit keeps the memory powered even during a shutdown so that when the computer is powered back up, the controller can quickly complete any writes it has previously committed to. With such a controller, the operating system may request four writes 1-2-3-4 in that order, but the controller may decide the quickest way to write them is 4-3-1-2. The controller essentially lies to the operating system and reports that the writes have been completed in order. This lie improves performance at the risk of data corruption if power is lost; the battery backup hedges against that risk by giving the controller a way to silently fix any damage that could occur as a result.

If the power gets shut off after element 4 has been written, the battery-backed memory contains the record of commitment for the other three items and ensures that they are written ("flushed") to the disk at the next available opportunity.
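As a rough illustration of this behavior, the sketch below simulates a controller whose pending writes survive a power cut. The class and method names are hypothetical, and the model ignores everything a real controller must handle (timing, partial sector writes, battery exhaustion).

```python
class BatteryBackedCache:
    """Toy controller: acknowledges writes at once, remembers them until flushed."""

    def __init__(self):
        self.pending = {}  # write id -> data; assumed to survive power loss
        self.disk = {}     # write id -> data that actually reached the disk

    def write(self, write_id, data):
        self.pending[write_id] = data
        return "ack"       # reported as complete before it reaches the disk

    def flush(self, order, power_fails_after=None):
        """Commit pending writes in `order`; optionally stop early to model a power cut."""
        for count, write_id in enumerate(order, start=1):
            self.disk[write_id] = self.pending.pop(write_id)
            if power_fails_after is not None and count >= power_fails_after:
                return     # power lost; the rest stays in battery-backed memory

    def power_on(self):
        """Replay whatever the battery preserved."""
        self.flush(list(self.pending))


cache = BatteryBackedCache()
for i in (1, 2, 3, 4):
    cache.write(i, f"record {i}")

cache.flush([4, 3, 1, 2], power_fails_after=1)  # only item 4 reaches the disk
state_after_crash = sorted(cache.disk)

cache.power_on()                                # items 1, 2 and 3 are flushed
state_after_recovery = sorted(cache.disk)
```

Immediately after the simulated power cut only item 4 is on disk, which is an inconsistent state; once `power_on` replays the preserved writes, all four items are present before the operating system inspects the disk.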

                                     

1. Transaction consistency

In the realm of distributed database systems, consistency refers to the property of many ACID databases of ensuring that the results of a database transaction are visible to all nodes simultaneously. That is, once the transaction has been committed, all parties attempting to access the database can see the results of that transaction simultaneously.

A good example of the importance of transaction consistency is a database that handles the transfer of money. Suppose a money transfer requires two operations: writing a debit in one place, and a credit in another. If the system crashes or shuts down when one operation has completed but the other has not, and there is nothing in place to correct this, the system can be said to lack transaction consistency. With a money transfer, it is desirable that either the entire transaction completes, or none of it completes. Both of these scenarios keep the balance in check.

Transaction consistency ensures just that: the system is programmed to detect incomplete transactions when powered on, and to undo (or "roll back") the portion of any incomplete transactions that are found.
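The money-transfer case can be sketched with Python's standard `sqlite3` module. The `accounts` table, the account names, and the `fail_midway` flag are assumptions made for illustration, and the failure is simulated with an exception inside the process rather than a real power loss.

```python
import sqlite3


def transfer(conn, src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one transaction; roll back if anything fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between debit and credit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except RuntimeError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

failed = transfer(conn, "alice", "bob", 30, fail_midway=True)
after_failure = balances(conn)   # the debit was rolled back

succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)   # both the debit and the credit were applied
```

In both outcomes the total across the two accounts stays at 150: either the whole transfer happens or none of it does.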

                                     

2. Application consistency

Application consistency is similar to transaction consistency, but applied on a grander scale. Instead of having the scope of a single transaction, data must be consistent within the confines of many different transaction streams from one or more applications. An application may be made up of many different types of data, various types of files, and data feeds from other applications. Application consistency is the state in which all related files and databases are synchronized, representing the true status of the application.

                                     