
Exclusive Interview With Mohit Aron, CEO and Founder, Cohesity

In hyperconverged secondary storage platform
By Philippe Nicolas on 2017.05.03





Who is Mohit Aron?

He is:

He was:

  • Founder and CTO, Nutanix
  • Architect, Aster Data Systems, Inc.
  • Staff engineer, Google
  • Principal software engineer, Zambeel Inc.
  • Before that summer intern at IBM Research and DEC Western Research Lab

Storage Newsletter: Could you summarize the genesis and background of the company? Year founded? Number of employees?
Mohit Aron: After my three years at Nutanix, hyperconvergence was becoming a runaway trend in the industry, but there was one problem: hyperconvergence only applied to primary storage, even though the bulk of enterprise data (80%) resides in secondary storage. We define secondary storage to encompass not just data protection but the massive amounts of storage tied to your non-mission-critical apps, including test and development, file shares, object storage, and analytics. It was a problem I wanted to fix, and after some thought, I came up with the idea for Cohesity, which was incorporated in the summer of 2013.

I am very proud to say the company is doing very well, fueled by an exceptional group of over 150 people. Based on our rapid progress in just four quarters of selling, we recently took our employees and their families on a trip to Hawaii to thank them for being so involved, motivated and committed to Cohesity.

You founded Cohesity. Who are the other people behind the company? What are the products, service or storage technologies already developed by the team before Cohesity?
The team consists of engineering, marketing and sales staff that hail from innovators in data center infrastructure, including Google, Nutanix, VMware, Pure Storage, Nimble, NetApp, Riverbed and EMC. Much of our innovation is in distributed systems design and implementation, all of which is built at the software layer that can run either on-premises or in the cloud.

How is the company capitalized? VC money, others? And are you looking for additional VC round?
Cohesity launched in 2013 and has raised $160 million to date in three funding rounds. Our most recent round, a Series C of over $90 million, closed in March 2017 and was led by GV (Google Ventures) and Sequoia Capital. Cisco and Hewlett-Packard Enterprise both took significant positions in this round as well. The following venture firms also participated: Accel Partners, ARTIS Ventures, Battery Ventures, Danhua Capital, Foundation Capital, Qualcomm Ventures, Trinity Ventures and Wing Venture Capital. We are currently well-capitalised to reach a sustainable positive cash flow status at a faster pace than other storage startups.

What's the company vision?
We envision a phased journey that starts by unifying secondary storage on an infinitely scalable platform that stores and protects all customer data. As customers scale out the Cohesity platform for data protection, they can start using that previously passive data for DevOps. In the third phase, as that data is consolidated, they can analyze it to detect key trends effectively, get immediate alerts about security issues and, most importantly, they are able to see the whole picture, which is the only way to make the right business decisions.

What are the challenges you wish to solve?
Our mission is to enable companies of all sizes to bring order to their secondary storage chaos. We take aim at the inefficient, fractured landscape of secondary storage for backup, test & dev, file shares, by applying a distributed, web-scale architecture to converge these silos of data. Our solution consolidates these secondary storage workflows into an intelligent data platform so customers can store and protect data seamlessly, use it efficiently, and draw intelligence from it instantly.

What are the target market segments?
Our top markets are healthcare, financial services, high-tech and public sector.

What are the typical use cases?
The most common initial use case for our clients is simplifying and consolidating their data protection environment. They have often invested in multiple backup software providers with associated media servers and master servers, a multitude of backup target storage appliances, and separate cloud gateways. Cohesity can simplify this type of complex backup environment by turning it into an efficient, scalable and intuitive policy-driven data protection solution. Beyond data protection, customers can then use their inactive backup data for provisioning test/dev environments, running analytic workloads, and consolidating file shares and object storage.

We hear a lot of things about software-defined storage. What is your definition of SDS? Do you consider Cohesity as a SDS player?
SDS places the emphasis on storage-related services rather than storage hardware. As is the case with software-defined networking, the goal of SDS is to provide administrators with flexible management capabilities through policies and programming. Without the constraints of a physical system, a storage resource can be used more efficiently and its administration can be simplified.

Cohesity is a true SDS system that enables policy-based provisioning and management of data storage, independent of the underlying physical infrastructure. This approach gives architects the opportunity to build multicloud data centre storage services with far greater efficiency and flexibility at much lower costs.

Could you elaborate on your products?
Secondary data consumes about 80% of enterprise storage capacity and imposes a huge burden on enterprise IT budgets. It is traditionally stored on a complicated patchwork of point appliances for backups, files, objects, test/dev copies, and analytics data. This infrastructure is fragmented, complex to manage, and inefficient.

We solve these challenges by providing the only hyperconverged platform designed to eliminate secondary storage silos. It does so by combining secondary data and the associated management functions in one unified solution, including backups, files, objects, test/dev copies, and analytics.

Some recent product developments include our native integration with Pure Storage, which combines Cohesity's hyperconverged secondary storage platform with Pure's all-flash arrays. Additionally, we released Cohesity DataPlatform Virtual Edition, which extends its data protection capabilities from core data centers to remote sites and branch offices.

DataPlatform VE eliminates the need to provision any new hardware in the ROBOs and can be remotely installed and administered without requiring on-site support, which is often in short supply at remote offices.

Finally, we recently released Cohesity 4.0, our fourth-generation platform, which includes new workloads, native NAS protection, and erasure coding. The new offerings, Cohesity DataPlatform 4.0 and Cohesity DataProtect 4.0, expand the company's capabilities far beyond backup, to bring together even more data formats and infrastructures including S3-based object storage and NAS, onto its hyperconverged platform.

Are you running bare-metal or also within Hypervisors? If yes, which one?
Today, we've chosen to run our distributed system software directly on hyperconverged bare metal nodes for maximum performance and efficiency. The architecture can support either hypervisors or containers in the future.

You currently leverage VMware Data Protection API, any other plans? What about physical servers?
Yes, we use VMware VADP to directly manage backups for virtual environments. About a year ago we introduced the ability to back up physical servers (Linux, Windows), and just recently we announced direct backup for Pure Storage FlashArrays, as well as NAS storage. From our inception, we've always provided the capability to back up any other storage system using backup applications, such as Veeam, CommVault, NetBackup, IBM Spectrum TSM, etc.

Could we consider your product as a gateway as well? Or a super data management orchestrator?
We directly manage the placement of data blocks, both on-premises and in the cloud, and use a metadata store to efficiently retrieve data. We are not a gateway, but rather a data fabric that seamlessly incorporates edge deployments (remote offices or IoT locations), on-premises data centers, and remote clouds. As a software-defined infrastructure, we provide both a data plane and a separate control plane to manage applications and policies associated with the data.

What about Geo topology?
Being software-defined, we support a broad topology, including on-premises with our standard edition, at the edge with our Virtual Edition, and in the cloud with our Cloud Edition. All three products are based on the same fundamental DataPlatform software, which is our distributed file system.

In addition, to protect against disasters, we feature cluster replication across both on-premises datacenters and into public clouds running Cohesity DataPlatform.

Storage back-end is fully agnostic for you, right? So what kind of interfaces do you need to support a back-end storage device?
Cohesity aims to provide a single platform for secondary storage which is able to address the requirements of file services, backup systems, object storage, archive and analytics, while also tightly integrating with major cloud vendors.

The Cohesity DataPlatform is built upon a distributed web-scale file system that combines infinite scalability with architectural simplicity and consolidates multiple data storage workloads onto a single platform.

DataPlatform was built from the ground up to be a fully distributed, strongly consistent and versatile storage system. A key part of the file system is our unique snapshotting technology called SnapTree.

It allows for frequent and near instant snapshots, while keeping all the logical images fully hydrated. This supports the most stringent RPO/RTO goals imaginable. Another key aspect is a true global de-duplication capability that ensures that the same dedupe block is not written twice across the cluster and removes all unnecessary and redundant data copies. With Cohesity global deduplication and compression, our customers typically experience >10x data reduction.
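The global deduplication idea described above (a given block is stored only once, however many files reference it) can be illustrated with a minimal sketch. This is not Cohesity's implementation: the `DedupStore` class, the fixed 4-byte block size and the SHA-256 fingerprints are all assumptions chosen for illustration, and a real cluster-wide system would also need variable-length chunking, distribution across nodes and compression.

```python
import hashlib


class DedupStore:
    """Toy content-addressed block store (hypothetical, for illustration only)."""

    def __init__(self, block_size=4):
        self.block_size = block_size  # assumed fixed block size, in bytes
        self.blocks = {}              # fingerprint -> block bytes, stored once
        self.files = {}               # file name -> ordered list of fingerprints

    def write(self, name, data):
        refs = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            # Keep the block only if this fingerprint has not been seen before.
            self.blocks.setdefault(fp, block)
            refs.append(fp)
        self.files[name] = refs

    def read(self, name):
        # Rebuild the file by following its fingerprint list.
        return b"".join(self.blocks[fp] for fp in self.files[name])

    def logical_bytes(self):
        # Size of all files as the user sees them.
        return sum(len(self.blocks[fp]) for refs in self.files.values() for fp in refs)

    def physical_bytes(self):
        # Size actually stored after deduplication.
        return sum(len(block) for block in self.blocks.values())


store = DedupStore(block_size=4)
store.write("a.bin", b"AAAABBBBAAAA")
store.write("b.bin", b"BBBBCCCC")
print(store.logical_bytes(), store.physical_bytes())
```

In this sketch the two files share the `AAAA` and `BBBB` blocks, so fewer bytes are stored than were written, while each file can still be read back in full.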

The full power of the underlying Cohesity file system is exposed through a set of interfaces that together constitute the services layer, which lets the file system serve numerous different storage workflows. The services layer supports storage protocols such as NFS, SMB and S3. It supports data protection workflows such as backup and restore. It enables replication between different clusters to support disaster recovery and data availability. It has built-in search and a MapReduce framework to support instant search and ad hoc file content analytics.

What are the interfaces exposed by your product? Any plan to add others?
For data ingest and local connectivity, we support Ethernet-based protocols including: NFS, SMB (CIFS), S3. We natively connect to Amazon, Azure and Google via their storage protocols. We plan to include additional protocols in the future, as we expand our range of data center use cases.

What is the minimal configuration?
The Cohesity solution provides a pay-as-you-grow model and scales to meet the end user's capacity and performance needs. The minimum configuration is three nodes and the maximum number of nodes is effectively limitless. There is no software limitation on how many nodes you can add to the platform.

What kind of patents do you have and around which topics?
We have multiple patents today for our clustering technology and file system. We were recently granted a significant new patent for our distributed snapshot and cloning capability, known as SnapTree.

What about competition?
Our competitors consist of legacy storage systems that are used today for secondary storage workloads, such as backup, file shares, test/dev and analytics.

What is your link with Open Source?
We do employ popular open-source products and libraries for commonplace services (e.g. Linux, OpenSSL), and use them as building blocks.

What makes Cohesity unique?
Cohesity is the only product on the market that consolidates and simplifies the diverse set of secondary storage workloads both on-premises and in the cloud. We substantially reduce the cost of acquiring and managing storage assets, and we help generate potential new revenue sources by putting your data to additional productive use (e.g. analytics, DevOps, search).

What is your business model?
We have a global sales force that focuses on mid-size to large enterprise accounts, and we fulfill exclusively through a worldwide network of channel partners. We also have resale and co-selling relationships with large enterprise vendors such as Cisco and Hewlett-Packard Enterprise.

How do you sell? Direct, channel, OEM? What are the partnerships you develop?
We sell through the channel and we have access to a network of expert, knowledgeable integrators and resellers. We created a partner program that is advantageous for resellers, distributors and Cohesity alike, and we pride ourselves on offering the highest margins in the industry. We are currently actively recruiting channel partners in EMEA.

How many customers do you have? Could you name some of them?
We have experienced unprecedented growth coming out of our launch just over a year ago, and we are excited to reach new levels as more and more companies recognize how data protection and other secondary workloads are ripe for transformation within enterprise IT. Our customer list now numbers 100+ and we are working towards doubling this number in the coming quarters. Businesses with over 1,000 employees or more than $500 million in revenue now account for over 70% of our customer base. In addition, our earliest customers are already purchasing their second or third expansion of the platform, demonstrating its strong ROI, as well as an appreciation for the cost-effective pay-as-you-grow storage capabilities. To name a few: Quarles & Brady LLP, Diesel, Munson Healthcare.

What is the total capacity you operate? How many sites?
We protect over 250PB of primary storage today for our enterprise clients. Our software is installed in hundreds of sites.

Your revenue?
While we don't disclose our revenue numbers, we can say that our customer base and revenues have roughly doubled sequentially for each of the past three quarters. 

How is the product priced and licensed?
We sell pre-configured systems scaled by the number of nodes a customer wishes to deploy. We also provide a software-only option if the customer chooses to deploy on Cisco UCS or HPE ProLiant hardware. Shortly, you will be able to order our software for deployment in public cloud services via their marketplaces. Finally, we will offer a subscription model for clients that prefer to purchase software on a term basis.

What about the international presence?
We operate in both North America and Europe today. We will be expanding into AsiaPac in the coming year.

What are the priorities for 2017?
We have two priorities for 2017. The first is to assist enterprise accounts worldwide to consolidate and simplify their secondary storage silos, and expand our data centre footprint. The second is to invest further in product engineering to build out the richness of our secondary storage use cases and workflows. For example, we just announced object storage and enhanced file services capabilities this month, with our fourth-generation platform, Cohesity 4.0.