With Skippy, database application code can run against an arbitrarily old snapshot, and iterate over ranges of snapshots, as efficiently as it can access recent snapshots, for all databases.
Decreasing disk costs make it possible to take frequent snapshots of past storage system states and retain them for a long duration. Long-lived snapshots are important because, if the past is any predictor of the future, a longer-term prediction needs a longer-lived past. Existing snapshot approaches, however, offer no satisfactory solution to long-lived snapshots. Split snapshots are a promising approach because they do not disrupt the current-state storage system in either the short or the long run. An unsolved problem has been how to maintain an efficient access method for long-lived split snapshots without imposing undesirable overhead on the storage system.
Existing techniques for accessing versioned past data in databases and file systems rely on a “no-overwrite” update approach. In this approach, the past state remains in place and the new state is copied, so the mappings for the past state take over the mappings of the current state all at once, rather than gradually. Although in-memory techniques exist for split snapshot systems to accelerate the construction scan, these techniques support only short-lived snapshots. Thus, an access method is needed for split snapshot systems that also supports long-lived snapshots.
Skippy is a new approach that inexpensively indexes long-lived snapshots in parallel with snapshot creation. An embodiment of Skippy uses append-only index data structures to optimize writes while simultaneously providing low-latency snapshot lookup. Performance evaluations of Skippy indicate that this new approach is effective and efficient: it provides close-to-optimal access to long-lived snapshots while incurring minimal impact on the current-state storage system. Skippy is a key component of the split snapshot system, which allows a storage system to run unmodified applications over snapshots in addition to the current state.
Skippy makes it possible to create mappings for high-frequency snapshots without disrupting access to the current state, and to look up those mappings efficiently when application code runs against long-lived snapshots.
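The idea of an append-only mapping index with forward-scan lookup can be illustrated with a minimal sketch. This is not the Skippy structure itself, whose details are not given in this passage. It assumes copy-on-write semantics in which the pre-state of a page is archived the first time the page is overwritten after a snapshot is declared, so that the first mapping found for a page at or after a snapshot's start position belongs to that snapshot. The names MapLog, declare_snapshot, append_mapping, and lookup_table are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class MapLog:
    """Hypothetical append-only log of (page_id, archive_location) mappings."""

    entries: list = field(default_factory=list)         # appended, never rewritten
    snapshot_start: dict = field(default_factory=dict)  # snapshot id -> log offset

    def declare_snapshot(self, snap_id):
        # Record where this snapshot's mappings begin in the log.
        self.snapshot_start[snap_id] = len(self.entries)

    def append_mapping(self, page_id, location):
        # Creation cost is a single sequential append.
        self.entries.append((page_id, location))

    def lookup_table(self, snap_id):
        # Scan forward from the snapshot's start and keep only the first
        # mapping seen for each page (assumption: later mappings for the
        # same page belong to later snapshots).
        table = {}
        for page_id, location in self.entries[self.snapshot_start[snap_id]:]:
            if page_id not in table:
                table[page_id] = location
        return table


log = MapLog()
log.declare_snapshot(1)
log.append_mapping("P1", 100)   # P1 overwritten after snapshot 1
log.declare_snapshot(2)
log.append_mapping("P1", 101)   # P1 overwritten again after snapshot 2
log.append_mapping("P2", 102)   # P2 overwritten for the first time

print(log.lookup_table(1))
print(log.lookup_table(2))
```

In this sketch, a page with no entry in the returned table is assumed to be shared with the current state. The forward scan grows with the distance between the snapshot and the end of the log, which is the cost that an access method for long-lived snapshots has to keep low; the sketch does not model how Skippy achieves that.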
The Skippy Mapper access method provides both low-cost snapshot mapping creation and low-cost snapshot mapping lookup for long-lived snapshots, which earlier access methods could not support because they were disruptive in the long run. The method provides consistency by constraining the order of disk writes for snapshot mappings while allowing a flexible order for snapshot blocks; this flexibility enables more efficient snapshot creation. Because the Skippy Mapper protocol supports a low-level interface that is common in storage systems, the invention does not depend on a specific storage system architecture and is applicable to different types of storage systems, including databases, file systems, and content-addressable stores.
By contrast, current methods rely on a specific storage system architecture and are not applicable to other types of storage systems. Skippy is applicable across different storage systems and databases.