Exclusive Interview With Avinash Lakshman, CEO and Founder, Hedvig
Start-up just raised new VC round with HPE as investor and technical advisor
By Philippe Nicolas on 2017.03.20
Who is Avinash Lakshman?
- CEO and co-founder, Hedvig, Inc.
- Software engineer, Facebook
- Software design engineer, Amazon Inc.
StorageNewsletter: You founded Hedvig. Who are the other people behind the company? What are the products, services or storage technologies already developed by the team before Hedvig?
Avinash Lakshman: Hedvig was profoundly inspired by the work I had done in distributed systems. Before founding Hedvig in 2012, I was one of the initial co-inventors of Amazon Dynamo, which was the genesis of the entire NoSQL movement today. I was there for three years. Then I became one of Facebook's early employees and that's where I invented and built Apache Cassandra. It was the second NoSQL system I built.
I took my experience in developing and operating two of the world’s largest distributed systems and started Hedvig to bring a similar 'hyperscale mentality' to enterprise data storage.
As for the rest of my team, we come from a mix of distributed systems and traditional storage backgrounds. I firmly believe what we’re doing here at Hedvig is the marriage of these two disciplines.
Year founded? Number of employees today?
Before founding Hedvig in 2012, I noticed that unlike the surge in dramatic changes that had been happening in virtualization (I'm including containers and Docker as a form of virtualization) and cloud, there had been no fundamental innovation in storage for at least a decade. Everything that was done was incremental - even the advent of scale-out architectures, all-flash, and hyperconvergence. I felt my experience in distributed systems could fundamentally solve the problem differently and disrupt a very big space. That was the origin of Hedvig.
So after working for three years, we finally launched the company in 2015 with the Hedvig Distributed Storage Platform, a software-defined approach to storage. Today, after growing to 50 employees, the original goal remains: build a programmable, software-defined storage platform for the modern datacenter that spans any storage tier, any workload, and any cloud.
How is the company capitalized? Are you looking for an additional VC round?
Hedvig launched from stealth in March 2015. At that time we had raised $12.5 million in seed and Series A funding rounds, led by Atlantic Bridge Capital, True Ventures and Redpoint Ventures. Just a few months later, in June 2015, prompted by heightened interest from the VC community, Hedvig announced an $18 million series B round to support the company's product expansion and address the needs of its growing customer base. Most recently, we also announced a $21.5 million series C round, with all previous investors participating and a variety of new ones, including HPE and EDBI, the dedicated investment arm of the Singapore Economic Development Board, joining as well. The new funds will be used to expand into AsiaPac, develop end-to-end enterprise solutions, and continue building out a world-class team.
What are the reasons for the HPE investment?
The primary motivation is joint customers. We've seen organic demand from large enterprises to have Hedvig work closely with HPE on ProLiant, Apollo, and Moonshot servers. From here sprang the opportunity to have HPE invest in the series C round.
How and where do you see this collaboration going? Joint solutions? Reseller or OEM agreement?
We're engaged with HPE on several different levels. We're part of the Hewlett Packard Pathfinder program, which works on both partnership and investment opportunities in startups like Hedvig. We'll engage with the Pathfinder team on marketing and communications. We're also in the HPE Complete program. This is where we will work on joint solutions and broader go-to-market. We're in the process of developing combined HPE hardware with Hedvig software SKUs. These will be part of the HPE price list and sold and supported by HPE sales and channel partners. We're also engaging with relevant HPE teams for specific accounts where we see immediate opportunity. We have such opportunities in both North America and Europe today, and we expect more as we continue to work through the HPE Complete process. This is not an OEM relationship, but rather an opportunity to pre-integrate our hardware and software to create end-to-end solutions for enterprises and service providers. We think the combination of HPE hardware with Hedvig software is ideal for companies looking to build private and hybrid cloud solutions. Hedvig's vision for the Universal Data Plane very much aligns with HPE's vision for hybrid IT. There will be a lot of areas we can explore in future phases of the relationship.
What will be the exact role of Milan Shetti, named as technical advisor, given HPE's own storage plans?
Milan will help Hedvig engage at all levels of HPE. As with all Hedvig technical advisors, Milan will advise Hedvig along key product and strategy elements. We're excited to tap into his experience working with large customers throughout the world. Milan will be instrumental in helping us pursue a clear opportunity we see in larger storage environments where customers prefer software-defined storage and storage solutions for hybrid cloud environments. Given the scale and nature of these deployments, we don't see overlap with existing 3PAR, SimpliVity, or Nimble solutions.
What is the company vision?
From its inception, our mission at Hedvig has been to help companies become more responsive to the data demands of today's digital businesses. We believe a modern approach to storage is needed - one that re-architects the way ever-growing volumes of data are stored in software-defined environments. We believe that a distributed systems process is the best way to scale storage in any enterprise environment that includes multiple clouds, tiers, and workloads. This is the reason we started the company and it also underpins our concept of the Universal Data Plane (UDP).
What are the challenges you want to solve?
Today's CIOs need a hybrid strategy that balances digital services across both private and public cloud infrastructure. Advances in containers and orchestration make application portability among clouds a reality. But what about data portability? Companies struggle with how to store, manage, and protect the growing volume of data that digital businesses generate. Traditional storage infrastructures, with their rigid, costly solutions, are a bottleneck. Hedvig provides the optimum cloud architecture for storage that spans tiers, workloads, and clouds.
What are the target market segments? Is it dedicated to primary or secondary storage?
We are targeting mid-to-large enterprises that are undergoing cloud transformation initiatives. Typically this means they have a petabyte or more of storage growing at 30% or more per year. We aren't focused on any one vertical, but we find financial services, service providers, retail, and healthcare markets are most commonly looking for our product. Although our most common deployment is primary storage in a private or hybrid cloud, usage of our Distributed Storage Platform as secondary storage and backup is increasing. Often, we notice that our customers will first deploy us as secondary storage or in DevOps environments, and then extend the deployment to other types of storage once they become comfortable with how the technology works.
What are the typical use cases?
Given the flexibility of the Distributed Storage Platform, we have several traditional and modern workload use cases. However, we see two common patterns emerging. The first is to use the platform as part of a private or hybrid cloud initiative. The goal is to create self-service infrastructure that developers or business users can self-provision. The use cases are often a mix of VMware, Microsoft, OpenStack, and Docker. In fact, part of the reason customers choose Hedvig is that they are often mixing and matching all of these workloads in a single cloud. The second pattern is to use us as a backup or archive target. In this case, we’re providing a scale-out, multi-site solution where you can just point your backup or archive software at us and away you go. More conservative, risk-averse companies will start here before moving to a cloud use case. Usually that's a year-long customer journey.
We hear a lot of things about software-defined storage. What is your definition of SDS? Do you consider Hedvig an SDS player?
Hedvig's definition of SDS is a storage technology that is installed and managed as software on commodity hardware rather than purchased and deployed as a distinct hardware storage array. To be truly software-defined, we believe the solution must have three core capabilities:
- No hardware dependencies. Hardware matters greatly in the world of SDS. CPU, memory, and flash all dictate overall performance. But there should be no hard-coupling of the software to the underlying hardware. A product needs to be flexible to mix and match different hardware profiles based on your workloads.
- Application-level granularity. We believe the power of SDS is to give each application its own distinct storage policies. Most storage platforms are quite feature-rich, but those features are either universally on or universally off. That doesn't work in software-defined architectures. We enable all features to be provisioned on a per-application basis, tailoring your storage to the application needs and not the other way around.
- API-driven and programmable. The biggest change in software-defined anything is that it plugs into a broader orchestration fabric. To do so, all elements need to be programmable - be it control or data plane. This means a full set of RESTful APIs and hooks into all the common tools like VMware, Docker, OpenStack, Kubernetes, Mesos, and more.
Could you elaborate on your technologies?
The Distributed Storage Platform enables organizations to deploy storage for any compute and any scale, provides advanced data services, and allows for hybrid multi-site replication. Our platform's components are, first, the Hedvig Storage Proxy, which is lightweight-access software that presents block and file storage to hosts, accelerates read performance with flash caching, and drives efficiency with client-side deduplication. Second, the Hedvig Storage Service, which is a powerful, patented distributed systems engine that transforms commodity server nodes into an elastic, flexible enterprise-class storage system. Third, the platform includes APIs, which are RESTful APIs that provide programmable access to every feature of the Distributed Storage Platform for integration with custom applications and service catalogs.
Does your solution leverage things you have built in the past? Could you give us some details? Or does it just draw on lessons from the past?
The Distributed Storage Platform borrows from many of the lessons I learned building Dynamo and Cassandra. We intentionally chose not to reuse a lot of the code and instead built it from the ground up. Building storage is dramatically different than building a database. But the concepts of scalability, self-healing, and multi-site are similar. So we took the best practices, but not the code. My intention was to create a disruptive product that challenged the status quo of storage technology, which, prior to Hedvig, had been broken.
Are you running on bare metal or also within hypervisors?
The answer is both. Hedvig was architected to run as a bare-metal solution running directly atop commodity servers or instances of cloud compute. However, in a hyperconverged deployment you would run Hedvig as a VM atop the hypervisor. About 80% of our deployments are in a bare-metal environment where we are running on a dedicated server. We call this hyperscale. The remaining 20% are hyperconverged deployments.
On each disk in each node, do you use a file system such as ext4 or xfs or do you consider disk partition as raw device?
Yes, internal to the Hedvig platform is a file system. However, that is abstracted from the user. As far as the user is concerned, they create a volume (Virtual Disk) and assign it to be block (iSCSI), file (NFS), or object (S3, Swift). How we lay the data to disk is proprietary to the platform and takes into account a sophisticated data and metadata construct.
With such a product, how do you protect data?
Today, data protection is done via tunable replication. What we mean by that is that each Virtual Disk can have its own replication factor (RF), ranging from one to six. We also apply one of three DR policies: agnostic (data is spread throughout the cluster as best it can be); rack-aware (data is spread across distinct racks in a single datacenter); and datacenter-aware (data is spread across distinct sites, be it a datacenter or public cloud). This combination of RF and DR policy enables companies to protect data across multiple sites. We have specific optimizations that ensure latencies don't impact the application. We're also getting requests from customers to deliver erasure coding (EC) for object storage volumes, so that's an item on the roadmap. The power of Hedvig is that for each volume you can decide RF or EC, enabling an optimal policy per the needs of the application.
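The combination of replication factor and DR policy described in this answer can be illustrated with a small Python sketch. It is not Hedvig's placement algorithm, which the interview describes as proprietary. The Node class, the place_replicas function and the round-robin strategy are assumptions made purely for illustration; only the RF range (one to six) and the three policy names come from the interview.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical storage node with its failure domains."""
    name: str
    rack: str
    site: str


def place_replicas(nodes, rf, policy="agnostic"):
    """Pick rf nodes for one volume's replicas under a DR policy.

    Illustrative only: a real system would also weigh load, capacity
    and latency, none of which is modeled here.
    """
    if not 1 <= rf <= 6:
        raise ValueError("replication factor must be between 1 and 6")
    if rf > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")

    if policy == "agnostic":
        # No failure-domain constraint: simply take the first rf nodes.
        return list(nodes[:rf])
    if policy == "rack-aware":
        domain = lambda n: n.rack
    elif policy == "datacenter-aware":
        domain = lambda n: n.site
    else:
        raise ValueError(f"unknown policy: {policy}")

    # Group nodes by failure domain, preserving input order.
    groups = {}
    for node in nodes:
        groups.setdefault(domain(node), []).append(node)

    # Round-robin across domains so replicas land in as many
    # distinct racks (or sites) as the cluster allows.
    queues = list(groups.values())
    chosen = []
    while len(chosen) < rf:
        for queue in queues:
            if queue and len(chosen) < rf:
                chosen.append(queue.pop(0))
    return chosen


if __name__ == "__main__":
    cluster = [
        Node("n1", "r1", "dc1"), Node("n2", "r1", "dc1"), Node("n3", "r2", "dc1"),
        Node("n4", "r3", "dc2"), Node("n5", "r3", "dc2"), Node("n6", "r4", "dc2"),
    ]
    for policy in ("agnostic", "rack-aware", "datacenter-aware"):
        print(policy, [n.name for n in place_replicas(cluster, 3, policy)])
```

In this sketch each volume would call place_replicas with its own RF and policy, which mirrors the per-Virtual-Disk tuning described in the answer.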
What about geo cluster?
Hedvig currently supports geo-clustering. We can have any number of nodes spread across any number of sites, whether that site is a private datacenter or public cloud. We have customers that have production clusters spanning multiple countries. Several of them are pushing clusters that will span three continents. It's a very flexible system in this regard. In these scenarios, replication is used (erasure coding would be too cost-prohibitive from a performance perspective) and we have algorithms and data reduction techniques to minimize the impact of WAN latencies.
What is the minimum number of nodes a user should buy to start with?
We recommend a minimum of three nodes for the Hedvig SDS platform, although you could technically start with as few as two. Typical customer deployments start around 50TB. From there, the cluster can grow to an almost unlimited number of nodes, although we don't recommend scaling beyond 1,000 nodes for operational purposes. In a cluster of this size the failure rates of components would be so high that it doesn't warrant the operational 'savings' of keeping it as a single cluster. Two 600-node clusters could be more operationally efficient than a 1,200-node cluster, for example. However, 1,000 nodes could easily be an exabyte-scale cluster depending on the node size.
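The "exabyte-scale depending on the node size" remark is simple arithmetic, sketched below in Python. The function name and the 1 PB-per-node figure are assumptions chosen for illustration, not numbers from the interview. The sketch uses decimal units (1 PB = 1,000 TB, 1 EB = 1,000 PB) and ignores deduplication and compression.

```python
def usable_capacity_pb(nodes, node_raw_tb, replication_factor):
    """Rough usable capacity in PB for a replicated cluster.

    Hypothetical helper: raw capacity is nodes x per-node capacity,
    and every byte is stored replication_factor times.
    """
    raw_pb = nodes * node_raw_tb / 1000  # TB -> PB, decimal units
    return raw_pb / replication_factor


if __name__ == "__main__":
    # Assumed node size: 1,000 TB (1 PB) of raw capacity per node.
    print(usable_capacity_pb(1000, 1000, 1))  # raw capacity of 1,000 nodes, in PB
    print(usable_capacity_pb(1000, 1000, 3))  # same cluster with RF = 3
```

Under these assumptions, 1,000 nodes reach one exabyte of raw capacity only if each node holds about a petabyte, and the usable figure shrinks in proportion to the replication factor chosen per volume.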
What are the interfaces exposed by your product? Any plan to add others?
Today, we expose iSCSI, NFS, S3, and Swift interfaces. We also provide SMB via integration into Microsoft SOFS. You can also program both control and data plane interfaces directly via our RESTful APIs. And finally, we support the Docker Volume plugin, OpenStack Cinder driver, and VMware VAAI. Moving forward, we have plans to support SMB natively, VMware VVOLs, and we're looking to see if there is customer demand for OpenStack Manila.
How many patents? Around which topics?
The company has several dozen patents in storage and recovery technology as well as patents that ensure high availability of data. Most focus on the metadata architecture, how to provide per-volume granularity for policy provisioning, and how to apply enterprise-grade storage features across a distributed fabric.
What about competition?
Competition tends to fall into three buckets. By far the most common competitor is legacy or traditional storage. We're not competing head-to-head, but the customer is making a choice between buying more of the incumbent or switching to a modern, software-defined architecture like ours.
The second bucket is hyperscale software-defined storage providers. Here we most commonly see the open source project, Ceph. And finally, we do see hyperconverged competitors. However, we don't explicitly focus on this market. We allow customers to deploy Hedvig in a hyperconverged mode, but it's usually for a specific workload and the customer will still deploy hyperscale. Thus, we're not competing for small- to medium-sized environments where the customer was going to go all-hyperconverged.
What is your link with Open Source?
We contribute upstream to several open source projects. We are also developing our Cinder and Docker Volume plugin, both of which are available in open source communities. Finally, we also make contributions to several other projects based on various components inside the platform, which is very common in today's Linux-based infrastructure software. And to answer a question everyone always asks, no, the product is not based on Cassandra. That was my previous life and I've built a new distributed system from the ground up to be the backbone of Hedvig.
How does your offering differentiate?
There are three things that set Hedvig apart.
- Our two-tier, hyperscale architecture enables us to support any number of compute environments. Specifically, this means we support VMware, Microsoft, Xen, and KVM hypervisors, Docker containers, bare metal Windows and Linux, and cloud platforms like OpenStack. Additionally, we support any public cloud and we've demonstrated support for the big three - AWS, Azure, and Google. Finally, we also have demonstrated orchestration integration with Docker, OpenStack, Mesos, and Kubernetes.
- We provide per-volume storage policies. Most storage platforms, even modern ones, provide universal capabilities. That means, for example, that deduplication or compression are either on or off for all applications. We enable each application to have its own unique storage policy.
- Our built-in, multi-site replication. This provides native cross-cloud replication and is the backbone of our ultra-high availability architecture and our Universal Data Plane.
What is your business model?
The Distributed Storage Platform is available with subscription and perpetual licenses based on capacity. We provide different license types depending on the use cases and capabilities the customer needs. Upon request, we can also do a subscription pricing model.
How do you sell?
We usually go to market through our channel partners, the route 80% of our customers choose, but we do go direct for those who require it. We plan to aggressively expand our channel strategy in 2017 given the demand we see globally for SDS technology. Today we do business in North America and EMEA, but expect to see a lot more from us in the AsiaPac region moving forward. This is all under the Hedvig CloudScale Partner Program that we launched in March of 2016. This program also encompasses our growing list of technical partnerships.
How many customers do you have? Could you name some of them?
Today we have just under 50 customers and are growing rapidly. As for specific customers, they range from large financial institutions like BNP Paribas CIB, service providers like DGC, utility and energy companies like LKAB, and professional services firms like Mazzetti.
What is the total capacity you operate? How many sites? How many nodes in a cluster? How many clusters?
We're seeing a big uptick in the amount of capacity our customers are buying. Average capacity per customer grew to 750TB, or 167% year-over-year, in 2016. Multi-petabyte deals tripled in 2016. Large customer clusters are around 50 nodes and span at least four data centers.
How is the product priced and licensed?
The software is available on a subscription basis starting at less than $0.01/GB/month list price. Since it is a software-only solution, we work with customers and hardware partners to provide a complete storage solution running on x86, ARM, or public cloud platforms.
What about the international presence?
The majority of our business today (about 60%) has been in EMEA. We see adoption in Europe in particular has been more aggressive. However, North America will overtake EMEA in 2017 as our largest region. Finally, we saw a ton of inbound requests from Australia, Singapore, Thailand, China, Japan, and South Korea. A big part of our investment from EDBI is to help us capitalize on the AsiaPac region.
What are the priorities for 2017? And what is the future for Hedvig?
Our overarching priority remains the same: become the number one software-defined storage provider. We believe our vision for the UDP and our recent commercial traction put us on that path.
Given our latest series C funding, we have three additional priorities for 2017:
- Grow our presence in AsiaPac, which is one of the reasons we're working with EDBI to establish a base of operations in Singapore
- Develop more end-to-end solutions for backup, Docker, and other cloud environments
- Continue to hire a world-class engineering, sales, support, and channel team
Hedvig Raises $21.5 Million Series C
Total at $52 million; HPE new investor
2017.03.02 | Press Release | [with our comments]
Start-Up Profile: Hedvig
New one in software-defined storage
by Jean-Jacques Maleval | 2015.03.30 | News