Index Engines Bringing Out Unstructured Data Profiling Engine

Data centers have been growing at unprecedented rates and creating an uphill battle for IT departments looking to reclaim storage budgets, regain capacity and gain insight into the data that is being stored.

Index Engines, Inc., an enterprise data management and archiving company, is looking to provide a cost- and time-effective solution to the big storage crisis with its latest release, the Catalyst Data Profiling Engine.

The engine processes all forms of unstructured files and document types, creating a searchable index of what exists, where it is located, who owns it, when it was last accessed and what key terms are in it. Summary reports give instant insight into enterprise storage and the data assets it holds.

Through this process, mystery data can be managed and classified, including content that has outlived its business value or that is owned by ex-employees and now sits abandoned on the network.

"Organizations have little awareness of the volume, composition, risk and business value of their unstructured data," Alan Dayley wrote in the March 2013 Gartner report Innovation Insight: File Analysis Innovation Delivers an Understanding of Unstructured Dark Data. "File analysis tools enable IT to create a visualization of unstructured data that can be presented to others in the organization so that they can make decisions based on the data."

Data profiling relies on an enterprise index of metadata drawn from user files and email databases, such as last modified or accessed time, number of duplicates, size, owner, location and file type. Using summary reports combined with filters, users can view content on specific servers or locations and see charts of top owners by capacity, age of data, files by type and more.
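As a rough illustration of the kind of summary report described above, the following Python sketch totals capacity per owner from a list of metadata records, with an optional server filter. It is not Index Engines code: the FileRecord fields, the function name and the sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileRecord:
    # Hypothetical metadata record; field names are illustrative only.
    path: str
    owner: str
    size_bytes: int
    file_type: str
    server: str


def top_owners_by_capacity(records, server=None, limit=3):
    """Sum file sizes per owner, optionally restricted to one server."""
    totals = defaultdict(int)
    for rec in records:
        if server is not None and rec.server != server:
            continue
        totals[rec.owner] += rec.size_bytes
    # Largest consumers first; ties broken by owner name for stable output.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Made-up sample index entries.
records = [
    FileRecord("/share/a.docx", "alice", 4_000, "docx", "fs01"),
    FileRecord("/share/b.pst", "bob", 9_000, "pst", "fs01"),
    FileRecord("/share/c.xlsx", "alice", 6_000, "xlsx", "fs02"),
]

report = top_owners_by_capacity(records)
fs01_report = top_owners_by_capacity(records, server="fs01")
```

Because the report reads only metadata, it never has to open the underlying files, which is what keeps this style of profiling cheap.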

Optionally, data profiling can look beyond metadata and go deep within documents and email to find content that matches keyword searches, including confidential information. This supports compliance assurance audits for sensitive content misplaced behind the firewall, whether in PSTs or on the wrong server.

Once the data is located, it can be remediated, archived or even moved to a different storage platform. Organizations are finding that capacity can be reclaimed by purging data that has no business or legal value, including ex-employees' files, duplicates, and abandoned content that has not been accessed in more than seven years. Besides reclaiming storage capacity and reducing the annual storage budget, data profiling supports proactive compliance, security and risk management.
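The three purge categories named above (ex-employees' files, duplicates and long-idle content) can be expressed as a simple filter over index records. The sketch below is an assumption-laden illustration, not the product's logic: the record keys, the "signature" field used to spot duplicates and the 365-day year are all invented for the example.

```python
from datetime import datetime, timedelta


def purge_candidates(records, former_employees, now, max_idle_years=7):
    """Return (path, reason) pairs for files worth reviewing for purging."""
    cutoff = now - timedelta(days=365 * max_idle_years)
    seen_signatures = set()
    flagged = []
    for rec in records:
        if rec["owner"] in former_employees:
            flagged.append((rec["path"], "ex-employee"))
        elif rec["signature"] in seen_signatures:
            # Same content signature as an earlier file: a redundant copy.
            flagged.append((rec["path"], "duplicate"))
        elif rec["last_accessed"] < cutoff:
            flagged.append((rec["path"], "abandoned"))
        seen_signatures.add(rec["signature"])
    return flagged


# Made-up sample records.
now = datetime(2013, 6, 1)
records = [
    {"path": "/a", "owner": "alice", "signature": "s1", "last_accessed": datetime(2013, 1, 1)},
    {"path": "/b", "owner": "bob", "signature": "s1", "last_accessed": datetime(2013, 2, 1)},
    {"path": "/c", "owner": "carol", "signature": "s2", "last_accessed": datetime(2004, 1, 1)},
    {"path": "/d", "owner": "dave", "signature": "s3", "last_accessed": datetime(2012, 1, 1)},
]
candidates = purge_candidates(records, former_employees={"dave"}, now=now)
```

The function only flags files with a reason attached; the decision to delete, archive or move them stays with legal and compliance reviewers.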

"On a very granular level, you can search for Social Security and credit card numbers," Index Engines VP Jim McGann said. "But the biggest use case is likely going to be showing legal and compliance what information exists and getting the ball rolling on managing data and putting an information governance or data retention policy in place."
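A search for Social Security and credit card numbers of the kind McGann describes is commonly built on pattern matching. The Python sketch below shows one generic approach, with a Luhn checksum to discard digit runs that cannot be card numbers. The patterns, function names and sample text are assumptions for illustration and say nothing about how the Catalyst engine itself detects such content.

```python
import re

# U.S. Social Security number in the common 3-2-4 dashed layout.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, value) pairs for SSN-like and card-like strings."""
    hits = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append(("card", digits))
    return hits


sample = "Employee SSN 123-45-6789, card 4111 1111 1111 1111, ticket 1234 5678 9012 3456."
hits = find_sensitive(sample)
```

In the sample, the 16-digit ticket number fails the checksum and is ignored, while the widely used test card number 4111 1111 1111 1111 passes.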

The Catalyst Data Profiling Engine is designed for large enterprise environments, allowing organizations to uncover and analyze unstructured and mystery data. The index it creates has only a 1% footprint, resulting in extreme scalability.

From there, the indexing engine, version 5.0, allows action to take place on the data.

Features include:

Deletion with Validation – Manage the defensible deletion of unstructured data using validation to ensure the content has not changed since it was profiled. Validation checks the modified date or optionally the signature of the document.
Defensible Audit Logs – As disposition of the data is performed, including deletion, logs will be maintained that detail the date and disposition of the document, including the user that executed the disposition.
Expanded Duplicate Reports – Summary reports include duplicates by file type, owner, age, location and more. These reports allow for deeper profiling of redundant content.
Report Scheduling and History – Stored reports can be scheduled to run on a periodic basis and the results can be stored in order to access a historical perspective of the data environment. This allows a view into the data center including the incremental change of the content based on historical reports.
Increased Capacity – This version breaks the 1PB barrier and now supports metadata profiling of up to 1PB of unstructured data using a single engine. This scale and efficiency are unmatched in the market and allow enterprise data profiles to be achieved.
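The first two features in the list, deletion with validation and defensible audit logs, can be pictured with a short Python sketch. It compares a file's current modified time, and optionally a content signature, against the values captured at profiling time, deletes only on a match and records every outcome. The function names, the choice of SHA-256 as the signature and the log fields are assumptions, not details of version 5.0.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def file_signature(path):
    """SHA-256 of the file contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def delete_if_unchanged(path, profiled_mtime, profiled_signature, user, audit_log):
    """Delete path only if it still matches its profile; log the outcome."""
    unchanged = os.path.getmtime(path) == profiled_mtime
    if unchanged and profiled_signature is not None:
        unchanged = file_signature(path) == profiled_signature
    if unchanged:
        os.remove(path)
    audit_log.append({
        "path": path,
        "disposition": "deleted" if unchanged else "skipped: changed since profiling",
        "user": user,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return unchanged


# Demo: profile two files, edit one of them, then attempt both deletions.
log = []
with tempfile.TemporaryDirectory() as tmp:
    stale = os.path.join(tmp, "stale.txt")
    edited = os.path.join(tmp, "edited.txt")
    for p in (stale, edited):
        with open(p, "w") as fh:
            fh.write("original")
    profile = {p: (os.path.getmtime(p), file_signature(p)) for p in (stale, edited)}
    with open(edited, "w") as fh:
        fh.write("changed after profiling")
    results = [delete_if_unchanged(p, *profile[p], user="admin", audit_log=log)
               for p in (stale, edited)]
    stale_exists = os.path.exists(stale)
    edited_exists = os.path.exists(edited)
```

In the demo the untouched file is removed and the edited one is kept, with both decisions logged against the user who ran them. The signature check matters because a modified time alone can miss an edit made within the filesystem's timestamp resolution.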

"With a few clicks of a mouse you can find data on your network servers that have not been accessed in five, 10 years, who it belongs to and where it lives," Jim McGann said. "From there it can be moved to cheaper storage, archived for compliance or purged from the system."

Data profiling starts at $1,000/TB and is deployable through VMware and hardware.

Further reading: Achieving effective Information Governance through Data Profiling, by Jim McGann, Index Engines (registration needed).