Challenges of Data Recovery From Modern Storage Systems and SSDs
By David Logue and Robert Bloomquist, Kroll Ontrack
This is a Press Release edited by StorageNewsletter.com on June 20, 2012 at 2:43 pm
The Challenges of Data Recovery From Modern Storage Systems
Authors:
- David Logue, Esq. is the lead remote data recovery engineer for Kroll Ontrack. He assists customers around the world with the recovery of data from failed or damaged computer systems while ensuring efficiency and quality at every stage of the data recovery process.
- Robert Bloomquist is a principal data recovery engineer for Kroll Ontrack. With over 15 years of experience, he performs expert data recovery on enterprise class storage systems, databases and virtual systems.
Source: (In)Secure – Help Net Security
Date: June 2012
Today, every organization is managing vast amounts of structured and unstructured data across traditional databases and within both cloud and virtual environments. This makes data recovery all the more difficult in the event of a data loss. With data residing in multiple locations, high availability means it is constantly and automatically moving between storage layers, leaving companies dangerously unaware of where their data is at any given moment.
While vendors are trying to make storage easier for end users, they are actually generating more complex recovery scenarios in the event of a data loss.
This article examines the various challenges associated with data recovery from a variety of the most popular storage systems today and will specifically outline what customers, data recovery providers, and storage vendors need to be aware of in the event of a data loss.
Taming the cloud
Cloud computing has been a buzzword in the last few years, and many organizations have quickly and easily handed over control of valuable data to cloud providers. Unfortunately, many are still trying to understand the challenges that go along with storing data in the cloud and recovering lost data in the event of a disaster.
One challenge with cloud recovery is that although individual company data is located in silos, multiple customers are still accessing the same physical storage space. When a failure or data loss occurs with a specific customer, cloud vendors will sometimes not allow access to the environment in an effort to protect other customers. In addition, many cloud hosting companies will use their own proprietary storage or virtual machine format, which also leads to recovery challenges. Often, the vendor will not share any details about the storage configuration in order to protect their intellectual property.
Researching and developing the right solution is a challenge when the vendor does not want to share any specifics about the storage configuration. In an increasingly common scenario, cloud customers relinquish control of their data to the cloud provider and do not realize the importance of setting up a service level agreement (SLA) with that provider. Customers may not discover the limitations of their SLA until a business disruption occurs. After a data loss, they may regret not having required more resiliency in their cloud contracts.
The best safeguard when adopting cloud technology is for the cloud service provider to partner with a reputable, full-service data recovery company, which minimizes downtime caused by data storage failures. This way, customers are more aware of where and how their data is stored, and how it will be recovered if a loss occurs.
The virtualization headache
Virtualization is nothing new. However, as more and more companies rely on virtualization systems to support critical elements across their infrastructures, they face scenarios such as hardware failure, deleted virtual machines or virtual disks, file system corruption, and file level corruption. It is critical to address some of the lesser-known challenges associated with recovering data from virtualized environments.
The real challenge with data recovery in a virtual environment is that a single piece of physical hardware hosts multiple virtual machines. Therefore, the failure of one physical machine can result in the failure of many virtual ones, making the impact of data loss far greater. In addition, finding the correct pieces of data and bringing them back together is difficult, as data is fragmented across the storage platform and constantly moving behind the scenes. Add to that thin or sparsely provisioned files, and you have the makings of a true data recovery puzzle.
Virtualized environments contain much larger pools of data, and the key to not over-taxing storage is to balance load capacity. The user has the opportunity to place a large amount of data in a single storage environment. The challenge becomes recovering data at this scale, and having the tools in place to both recover the data completely and get it back to the customer in a timely manner.
In addition to the quantity of data, the amount of fragmentation also impacts the success of the recovery, with less fragmentation leading to a higher success rate. Large volumes of data make it very difficult to find the individual fragments of specific virtual disks to reassemble damaged or deleted virtual machines.
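The reassembly problem described above can be sketched in a few lines of Python. This is a minimal illustration only, not the method any recovery tool actually uses: the `Extent` record, the `reassemble` function, and the tiny 4-byte block size are all hypothetical, and a real recovery would first have to discover the extents rather than receive them as input.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example stays readable


@dataclass(frozen=True)
class Extent:
    """Hypothetical record: a run of virtual-disk blocks located on the datastore."""
    virtual_block: int    # first block number inside the virtual disk
    physical_offset: int  # byte offset of the run on the underlying storage
    block_count: int      # number of contiguous blocks in the run


def reassemble(storage: bytes, extents: list[Extent],
               total_blocks: int) -> tuple[bytes, list[int]]:
    """Rebuild a virtual disk image and report which blocks were never found."""
    image = bytearray(total_blocks * BLOCK_SIZE)  # blocks not found stay zero-filled
    found = [False] * total_blocks
    for ext in extents:
        for i in range(ext.block_count):
            src = ext.physical_offset + i * BLOCK_SIZE
            dst = (ext.virtual_block + i) * BLOCK_SIZE
            image[dst:dst + BLOCK_SIZE] = storage[src:src + BLOCK_SIZE]
            found[ext.virtual_block + i] = True
    missing = [n for n, ok in enumerate(found) if not ok]
    return bytes(image), missing


# The storage holds this disk's blocks out of order, mixed with another VM's data.
storage = b"CCCC" + b"xxxx" + b"AAAA" + b"BBBB"
extents = [
    Extent(virtual_block=0, physical_offset=8, block_count=2),  # AAAA, BBBB
    Extent(virtual_block=2, physical_offset=0, block_count=1),  # CCCC
]
image, missing = reassemble(storage, extents, total_blocks=4)
```

In this toy case the fourth block is reported as missing and left zero-filled, which is also how an unallocated region of a thin-provisioned disk would look. Telling those two cases apart is part of what makes heavily fragmented volumes hard to recover.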
Understanding the new storage landscape – SSDs
Flash-based SSDs have made their mark on the storage market, touting benefits such as high speed (low read latency, fast random access, and short start-up times), low power consumption, light weight, silent operation, and high resistance to shock and vibration. SSD storage capacities are increasing, and the drives can speed up server applications by as much as three times.
As adoption rates increase due to these attributes, the cost per gigabyte is falling rapidly. From data centers to personal devices, the amount and value of the data contained on SSDs is increasing – making data loss potentially catastrophic for the businesses or customers involved.
Many believe SSDs are immune to data loss because, unlike traditional HDDs, they have no moving parts. While SSDs are less likely to fail from being dropped, data loss can still occur under a variety of circumstances, and the drives have unique characteristics that make recovery inherently complex.
In the most extreme cases, data recovery from SSDs can be very time-consuming due to the need to research the algorithms used to originally store the data. With SSDs, the location of the data changes every time it is rewritten, making recovery far more complex.
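To illustrate why rewritten data moves, here is a deliberately simplified Python model of the logical-to-physical mapping that an SSD controller maintains. It is a sketch under stated assumptions: the class name `ToyFlashTranslationLayer` and its methods are hypothetical, and real controllers use proprietary, far more elaborate algorithms, which is exactly what recovery engineers have to research.

```python
class ToyFlashTranslationLayer:
    """Toy model: every write of a logical block lands on a fresh physical page."""

    def __init__(self, physical_pages):
        self.pages = [None] * physical_pages  # contents of each physical page
        self.mapping = {}                     # logical block -> current physical page
        self.next_free = 0                    # next unwritten physical page

    def write(self, logical_block, data):
        """Store data for a logical block and return the physical page used."""
        if self.next_free >= len(self.pages):
            # A real drive would garbage-collect stale pages at this point.
            raise RuntimeError("no free pages left in this toy model")
        page = self.next_free
        self.next_free += 1
        self.pages[page] = data
        self.mapping[logical_block] = page    # the older copy is left behind as stale data
        return page

    def read(self, logical_block):
        """Return the most recently written data for a logical block."""
        return self.pages[self.mapping[logical_block]]


ftl = ToyFlashTranslationLayer(physical_pages=8)
first_page = ftl.write(5, b"version 1")
second_page = ftl.write(5, b"version 2")  # same logical block, different physical page
```

After the second write, logical block 5 points at a new physical page while the old contents still sit on the first one. Without the mapping table, a raw dump of the flash chips shows both copies with nothing to indicate which is current.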
SSDs can also employ other unique complex functions such as advanced ECC, garbage collection, data striping RAID-like techniques, compression, encryption, bad block mapping, read/write caching, and read disturb management methods.
Additionally, since SSDs generally have a finite number of writes before they become unstable, wear leveling contributes to the extremely arduous task of reassembling the data. This process can take anywhere from a few days to several weeks, depending on the complexity of the drive.
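Wear leveling itself can be pictured with a very small Python sketch. The policy below, which always picks the least-erased free block, is an assumption chosen for clarity; the function name `pick_block` is hypothetical, and actual drives use vendor-specific schemes.

```python
def pick_block(erase_counts, free_blocks):
    """Return the free block with the lowest erase count (ties go to the lowest index)."""
    if not free_blocks:
        raise ValueError("no free blocks available")
    return min(free_blocks, key=lambda block: (erase_counts[block], block))


# Simulate eight rewrites on a four-block device where every block is always free.
erase_counts = [0, 0, 0, 0]
placements = []
for _ in range(8):
    block = pick_block(erase_counts, free_blocks={0, 1, 2, 3})
    erase_counts[block] += 1  # count the erase that precedes reuse of the block
    placements.append(block)
```

Even under this trivial policy, eight consecutive writes are spread evenly over all four blocks instead of landing in one place. On a real drive the same effect scatters a single file across the flash, which is part of why reassembly takes so long.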
SSDs are still in the earlier stages of their technology life cycle compared to HDDs. Therefore, they vary greatly from manufacturer to manufacturer and between drive families from the same manufacturer. They often differ within the same drive family and can even vary drastically within the same model of SSD!
The variations are primarily due to changes, enhancements, and firmware updates manufacturers make to improve drive operation and to meet customer requirements; however, this adds to the difficulty and complexity of recovering data.
Finally, it’s important for customers and manufacturers of enterprise servers or client-based systems that integrate SSDs to understand the complexities and challenges of recovering data from them. Just like HDDs, SSDs are not immune to data loss, and as the technology matures, standards and data recovery tools will continue to emerge and mature alongside it. Developing standards and data recovery solutions for SSDs gives storage integrators and customers the confidence to adopt SSD technology more widely.
What’s in your database?
Whether an organization is facing physical hardware damage, internal database corruption, or basic data deletion, recovery from a database is not as straightforward as some might assume. Each database is complex and unique, featuring its own internal structure different from others, with different versions and upgrades constantly released.
Data recovery vendors must keep up with these varied formats and upgrades in order to successfully recover customer data. In addition, corrupt, missing, or deleted data can create a series of recovery issues reliant on an exhaustive analysis of the complex internal structure of the database. When a storage device is not operational, or a file system structure needs repair, many companies simply assume a recovery is impossible.
Recovery is not impossible, but it does require locating raw database fragments and using them to rebuild the database files. In all of these scenarios, the recovery method must allow customers to access their data easily once it has been recovered.
Another challenge is the distinction between physical data and logical data. When a hard drive fails, many data recovery providers try to recover data only at the physical level and ignore the logical level.
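As a rough illustration of what locating raw database fragments can involve, the Python sketch below scans a raw byte image for a known file signature. SQLite is used purely as an example because its 16-byte header string is publicly documented; the article does not name a specific database, and the function name `find_sqlite_headers` and the synthetic test image are hypothetical. Finding a header is only the first step, and rebuilding a usable file requires much deeper analysis of the internal structure.

```python
import struct

# Documented signature at the start of an SQLite 3 database file.
SQLITE_MAGIC = b"SQLite format 3\x00"


def find_sqlite_headers(raw):
    """Return a list of (offset, page_size) for each header signature found in raw."""
    hits = []
    pos = raw.find(SQLITE_MAGIC)
    while pos != -1:
        if pos + 18 <= len(raw):
            # The page size is a 2-byte big-endian integer at header offset 16;
            # the stored value 1 stands for 65536.
            (page_size,) = struct.unpack_from(">H", raw, pos + 16)
            if page_size == 1:
                page_size = 65536
            hits.append((pos, page_size))
        pos = raw.find(SQLITE_MAGIC, pos + 1)
    return hits


# Synthetic "disk image": filler bytes, one header with 4096-byte pages, more filler.
raw_image = b"\x00" * 100 + SQLITE_MAGIC + struct.pack(">H", 4096) + b"\xff" * 50
hits = find_sqlite_headers(raw_image)
```

Each hit gives a candidate starting point and a page size, which a recovery process could then use to look for further pages belonging to the same file.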
There are also many different and proprietary file systems and dynamic RAID configurations, which require various types of solutions to recover data. Providers need to be able to pull critical data out of the virtual level that is useful for their customers, in addition to the physical data from the various file systems and configurations, to ensure a quality and complete recovery.
Conclusion
The technology landscape will only continue to become more complex as hardware and software are constantly updated, and offer faster capabilities and larger storage capacity every day.
As vendors try to gain a competitive advantage over one another, they create new updates to their file systems that require additional research and development. In turn, individuals and organizations are becoming increasingly reliant on these ever-changing data storage technologies, with their critical data being housed in various types of environments.
When this data, whether structured or unstructured, resides in multiple kinds of environments, recovery becomes all the more difficult when a data loss occurs. Whether the data is housed in a traditional database, on SSDs, or in a virtual or cloud environment, each presents its own set of challenges when trying to recover lost data. It is imperative for businesses and customers to remain in constant communication with their storage providers to determine where their data is at any given time.
They also need to be aware of recovery processes and procedures in the event of a data loss to minimize any impact to business continuity and salvage business critical information.