Start-Up Profile: Datos IO, Just Emerging and Getting $15 Million in Financial Funding

Designed distributed versioning platform built for recovery of scale-out databases
By Jean Jacques Maleval on 2015.09.17

Datos IO, Inc.

San Jose, CA

Date founded
June 2014

Financial funding

  • Oversubscribed seed funding: $2.75 million from True Ventures and angel investors
  • Oversubscribed Series A funding: $12.5 million from Lightspeed Venture Partners and True Ventures

Founders and main executives

Tarun Thakur, CEO, most recently worked at Data Domain (EMC), where he led and delivered multiple new products. Prior to EMC, he was at Veritas (Symantec) for clustered NAS appliance products, responsible for in-bound and out-bound activities. Before that, he was at the IBM Almaden Research Center with various distributed systems initiatives. He started his career as an assembly language programmer at Seagate, where he developed advanced storage architecture products.


Prasenjit Sarkar, CTO, was formerly manager, big data systems, at IBM from 2009, and had worked at IBM since 1998.




Advisors include BJ Jenkins, CEO, Barracuda Networks; Matt Pfeil, chief customer officer, DataStax; Debashis Saha, VP, eBay Cloud Services; and Dr. Remzi Arpaci-Dusseau, professor of computer science, University of Wisconsin-Madison

Number of employees
The team includes five members from IBM Almaden Research Center, five PhDs, and senior technical architects from Google, Netflix, Data Domain (EMC), CommVault, NetApp and Oracle.

Datos IO has developed a distributed versioning platform built for recovery of next-gen applications and scale-out databases, ensuring consistent versions across all scale-out databases and providing enterprises with a single state of truth for their distributed applications.

This platform can be deployed on scale-out non-relational databases such as Cassandra, MongoDB, HBase, Google BigTable, and Amazon DynamoDB.

It allows enterprises to embrace these applications by:

  • delivering a distributed versioning platform for scale-out databases
  • empowering application architects and DevOps
  • allowing customers to extract value from their data and metadata via data management services (runnable applications and more)

Products description

Datos IO designed its enterprise-grade versioning platform on four foundations:

  1. Enable any-point-in-time versions of scale-out databases, with versions kept in native formats and stored in a highly space-efficient form.
  2. Empower the new owners of the infrastructure stack (the application admins and DevOps teams) as well as existing IT operations staff.
  3. Reduce the operational pain of recovery management by allowing live access to and live replay of versions of a database, and search of versioned data and metadata.
  4. Support multi-sourced applications, which are deployed on a multitude of databases for state and data organization, by providing a single state of truth for them: one version across all their source databases.
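To make the first foundation more concrete, here is a minimal Python sketch of what "any point-in-time version" lookup can look like. It is an illustration only, not Datos IO's implementation: the `VersionLog` class, its method names, and the in-memory dictionary snapshots are all hypothetical, and it assumes versions are recorded in increasing timestamp order.

```python
import bisect
from typing import Dict, List, Optional


class VersionLog:
    """Hypothetical in-memory log of database versions, ordered by timestamp."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []
        self._snapshots: List[Dict[str, str]] = []

    def record(self, ts: int, snapshot: Dict[str, str]) -> None:
        """Append a version; timestamps must be strictly increasing."""
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("versions must be recorded in time order")
        self._timestamps.append(ts)
        self._snapshots.append(dict(snapshot))  # copy so later edits do not leak in

    def as_of(self, ts: int) -> Optional[Dict[str, str]]:
        """Return the newest version recorded at or before ts, if any."""
        idx = bisect.bisect_right(self._timestamps, ts)
        return self._snapshots[idx - 1] if idx else None


log = VersionLog()
log.record(100, {"user:1": "alice"})
log.record(200, {"user:1": "alice", "user:2": "bob"})
restored = log.as_of(150)  # the version recorded at timestamp 100
```

A real platform would store versions far more compactly than full copies; the sketch only shows the lookup rule "newest version at or before the requested time".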

Highlights of the platform include:

  • Cluster-consistent versioning: Scale-out databases need enterprise-grade recovery capabilities to avoid the risk of data loss. Datos IO cluster-consistent versions are in native formats and designed for any point in time.
  • Semantic deduplication: Designed specifically for scale-out, eventually consistent databases. It recognizes semantic equivalents of data values across the nodes of a scale-out database.
  • Orchestrated repair-free recovery: Users can restore without manual administration steps. No copying, no scripts, and most of all, geared for application admins and DevOps. Datos IO versions are in native format and database-consistent, so no repair is required on restore.
  • Scale-out recovery platform: Big Data and cloud databases have scale-out architectures because they are built to handle extreme throughput requirements; the data recovery platform needs to scale horizontally with the data stores. The software platform scales horizontally as you grow your scale-out databases.
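The semantic deduplication idea above can be sketched in a few lines of Python. This is not Datos IO's algorithm, which the company has not described here; it assumes a hypothetical model in which every node holds cells of the form (row key, column, value, write timestamp), and it treats two cells as semantically equivalent when all four fields match, regardless of which node they came from or how that node laid them out on disk.

```python
from typing import Dict, Iterable, List, NamedTuple


class Cell(NamedTuple):
    """One column value as stored on a single node (hypothetical model)."""
    row_key: str
    column: str
    value: str
    write_ts: int  # writer-supplied timestamp


def semantic_dedupe(nodes: Dict[str, Iterable[Cell]]) -> List[Cell]:
    """Keep one copy of each logical cell, however many replicas hold it."""
    seen = set()
    unique: List[Cell] = []
    for node_name in sorted(nodes):  # fixed order keeps the result deterministic
        for cell in nodes[node_name]:
            if cell not in seen:
                seen.add(cell)
                unique.append(cell)
    return unique


replicas = {
    "node-a": [Cell("user:1", "email", "a@example.com", 10)],
    "node-b": [Cell("user:1", "email", "a@example.com", 10),
               Cell("user:2", "email", "b@example.com", 12)],
    "node-c": [Cell("user:1", "email", "old@example.com", 5)],
}
unique = semantic_dedupe(replicas)
```

In this example four stored cells collapse to three: the copy of `user:1` replicated on `node-a` and `node-b` is kept once, while the older value on `node-c` survives as a distinct version.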

Release date
The Early Access Program is now open, and the product will be available in 2016.

There is no cost to join the Early Access Program, but pricing is not yet available.

The initial release supports Cassandra and MongoDB; other scale-out databases such as HBase, Google BigTable, and Amazon DynamoDB will probably follow.

DataStax, a company delivering Apache Cassandra in a database platform

Number of customers
None yet, but early adopters include eBay Cloud Services, Barracuda Networks, and Threat Stack.

Analytics, IoT, digital advertising, security analytics

Target market
Enterprise software (cloud-native, on-premise), big data

Traditional backup and recovery software vendors such as Veritas, EMC (Legato), and IBM (Tivoli)