Exclusive Interview With Lance Smith, CEO and President, Primary Data
Start-up promoting data orchestration whatever the storage flavor
By Philippe Nicolas on 2017.02.20
Who is Lance Smith?
- CEO and President, Primary Data
- SVP and GM, IO memory solutions division, SanDisk (following Fusion-io acquisition)
- President and COO, Fusion-io
- VP and GM, SPS BU, RMI Corporation, formerly Raza Microelectronics
- VP, business development, Raza Foundries, Inc.
- Director, technical marketing, NexGen Microsystems, Inc.
- Director, microprocessor technical marketing, AMD
- Director, technical marketing, Chips and Technologies
Storage Newsletter: Primary Data is a growing company. Could you summarize the genesis and background of the company? Year founded? Number of employees today?
Lance Smith: Primary Data was founded in 2013 by David Flynn and Rick White, who are our CTO and CMO, respectively. The idea was born out of our experience at our last company, Fusion-io. As we brought server-side flash to the market, flash created a new memory tier, but inadvertently, it also added a new silo of storage at the same time that the cloud was gaining adoption. With existing SAN and NAS storage already in the datacenter, the two new types of cloud and flash storage created more diversity in the storage landscape, as well as more complexity for the IT teams managing all these different resources.
Initially, the founding idea for data virtualization was applied to an all-flash scale-out NAS. However, after I came on board as CEO in late 2014, we broadened our view of how we could leverage data virtualization to develop a software-defined, storage agnostic solution with data orchestration across different types of storage.
Today we have about 80 employees who work at our Los Altos headquarters, Salt Lake City office, and around the world.
Flynn founded Primary Data based on Tonian Systems. Who are the other people behind the company? What are the products or storage technologies they already developed?
Flynn and White co-founded Primary Data, as they co-founded Fusion-io, which brought flash to the server for the first time with the ioDrive product line. I first met them at Fusion-io, where I was the COO and president. At Primary Data, we have engineering developers and leaders that range from open-source maintainers, architects and engineers from Fusion-io, Novell, Veritas, NetApp, EMC, VMware, Intel and others.
How is the company capitalized? And are you looking for an additional VC round?
Primary Data has raised about $60 million in venture capital to date. Accel and Battery Ventures are our lead investors, along with strategic investors and boutique firms. I’m a firm believer in working with investors who can share expertise with us along with capital, so we’ll expand our funding as it makes strategic sense for our business.
What's the company vision?
It's to get the right data to the right place at the right time, across different storage types, and without application disruption. We do this with our DataSphere enterprise metadata engine. DataSphere seamlessly gives application owners the ability to define the storage attributes (performance, cost and protection) required for their applications, while simultaneously allowing IT managers to maximize the use of their existing resources or add new storage types, to ensure business needs are met in real-time. Whether they are using or want to use cloud, flash or networked shared storage, or a combination of all of these, we help enterprises orchestrate petabytes of data automatically and non-disruptively.
DataSphere is a software-defined platform that gives storage awareness to applications and application awareness to storage for the very first time. This helps customers save significantly by aligning the real-time needs of an application’s data with an available storage type that best fits the business’ cost, performance and protection demands. Big savings also come from moving cold data to the cloud according to the insights that come from DataSphere’s intelligence and Smart Objectives. The result is that customers are able to cut over-provisioning costs by up to 50%, and those savings easily run into the millions when you are managing petabytes of data.
Customers are generally adopting DataSphere to integrate with the cloud, scale out NAS, and simplify the complexity of data lifecycle management, data migration, and virtualization. Our ability to orchestrate data to the right storage for the job greatly simplifies workflows for our customers across all of these needs.
What are the challenges you wish to solve?
Innovative companies have plenty of storage and overspend significantly on over-provisioning, but there is still no way to make sure the right resource is serving the right data at the right time. Our DataSphere metadata engine fixes this for enterprises. It separates the data path from the control path for scalability and reliability and provides intelligent alignment between your data and the resources in your infrastructure. This overcomes performance bottlenecks without the need for new hardware, meets the ongoing needs of applications throughout their service life without costly over-provisioning, breaks vendor lock-in, enables the use of newer storage technologies without migration downtime and costs, leverages low-cost cloud storage without manual intervention, and helps IT automate data management.
As a metadata engine for enterprises, DataSphere solves the complexity and cost of managing petabytes of data across cloud, flash and shared storage. It enables customers to integrate different types of storage into a single scale-out data space, or multiple data spaces, if that better serves the customer or the demands of the business. The global data space enables IT to achieve visibility into all storage resources and workloads connected to that DataSphere system; but more importantly, it gives us the ability to move data without application interruption. It also shows which applications are generating those workloads, and their historical resource usage. DataSphere offers policy templates, and customers can use these historical insights to tune their policies. To maintain policies, DataSphere non-disruptively moves data to the ideal storage as data and business needs evolve.
When you can finally ensure that the right data is on your most appropriate storage resource, you can save significantly by moving cold data to other storage systems or the cloud and placing hot data on the most performant storage. This is helping customers save substantially by only purchasing the type of storage their data really needs, while putting their existing resources to work handling the workloads they were designed to serve.
What are the target market segments?
DataSphere makes the most sense for large enterprises managing petabytes of data. We are presently working with customers in the media and entertainment, oil and gas, service providers, electronic design automation and financial industries.
What kind of storage product is it?
DataSphere is a software-defined metadata engine for enterprises. It’s a platform, not a traditional storage system. We don’t sell hardware. We are storage- and vendor-agnostic. DataSphere can be implemented virtually or as software installed on a server. It orchestrates data automatically across storage added to the global data space to maintain the objectives that IT sets for each application. We do this through data virtualization, using abstraction to connect different storage across file, block and object through a global data space.
We hear a lot of things about software-defined storage. First, do you consider DataSphere to be an SDS offering? And second, what is your definition of SDS?
SDS is a word that unfortunately carries a lot of baggage and many different meanings with it. Like Gartner and other leading analysts in the industry, we consider something software-defined to be a product that is truly hardware agnostic. We don’t see many of them on the market, but yes, we do feel that DataSphere is truly a software-defined platform. It’s an engine that accelerates your existing storage resources and integrates your business with the cloud. We unite other vendors’ hardware and cloud storage with our software, and I don’t know how much more software-defined that can get.
What are the advantages of your data virtualization approach?
Virtualizing data as we have with DataSphere enables an unprecedented level of automation and simplicity at a time when storage is getting incredibly complex. Virtualization makes it possible to unite different types of storage - some analysts have called it 'the Holy Grail.' Given how much data we’re creating these days, and how integral data is to business, a solution like DataSphere is critical to helping enterprises align the right data to the right resources, automatically.
As DataSphere abstracts the logical view of data from the physical storage through virtualization, it becomes possible to create a global data space and unite different storage systems. Through this global data space, DataSphere makes it easy for organizations to see all their resources, know how they are being used, move data on demand or by objectives and scale performance or capacity when they are needed. The savings DataSphere makes possible by reducing over-provisioning alone are helping companies meet growing data demands without breaking budgets.
How do you leverage pNFS? Does it bring parallel data access capability?
Indeed, our ability to leverage parallel data access for the client is quite important. First, it removes one of the primary performance bottlenecks in scale-out architectures that has limited which applications scale-out storage can support, and second, pNFS provides us with a well-defined and mature protocol that we leverage in our DataSphere platform.
Primary Data has also made a number of Parallel NFS (pNFS) contributions to the NFS 4.2 standard, which have been integrated into the Linux kernel by our Principal System Architect and Linux NFS client maintainer, Trond Myklebust. We announced those contributions in December 2016.
These enhancements give Linux clients and application servers native support for DataSphere’s data virtualization and orchestration capabilities. Enhancements to the Parallel Network File System (pNFS) Flex File layout enable clients to provide statistics on how data is being used, and the performance provided by the storage resources serving the data.
DataSphere makes this information intuitively visible to customers in its UI and reports. Customers can use this information to create service level objectives in DataSphere to automatically match evolving data needs with different storage resources as business needs change.
How do you support Windows platforms?
DataSphere features SMB support that enables it to work with Windows Server 2008/Windows 7 and later.
In fact, Primary Data develops a product based on solid technologies. Could you elaborate on a few of them?
We appreciate your insight on how we’re working to simplify storage complexity with a unique, software-based approach. To be frank, it’s a hard problem to solve. There have been many disruptive hardware technologies to come to market in recent years, and now everyone is looking to the cloud for savings and agility. DataSphere helps enterprises easily put all of these investments to work and enable them with intelligence that spans across storage.
DataSphere is a completely new approach to data management. As a platform, it integrates many different elements, including the ability to virtualize data and create a global data space across any type of storage, visibility into all storage resources and application data access, an Objective engine that enables customers to manage data according to its performance and protection requirements, data portals that enable Windows, VMware, and Linux applications to connect to the global data space, and our pNFS contributions that make it easy for customers to adopt it all.
It sounds like a lot of complexity under the hood and it is - we have an incredible engineering team led by Flynn. The complexity inside DataSphere is key to making it all simple for customers on the integration and operation side of our software, as automation is an important element of DataSphere. Manual migrations are time consuming and costly, and we can make them obsolete by moving data to the right storage automatically, without application disruption.
What is your plan for other data access protocols and interfaces?
We will build out support for protocols and interfaces as our customers request them. As a technology vendor, the best way to grow your business is to ensure that your engineering resources align with your customers’ needs as quickly as possible.
What about file services technologies coming from the HPC world? I mean Lustre or Spectrum Scale, aka GPFS.
I discussed this question with Flynn, as HPC has long been a passion for him. Here are his thoughts: "Lustre and GPFS require proprietary, non-NFS clients, while we use the Linux NFS 4.2 client that is most commonly in use for high performance computing environments. The open source nature of NFS makes it much easier for enterprises to adopt, and the abilities of pNFS make it a very compelling architecture for what we see ahead."
How do you manage back-end object storage?
DataSphere integrates object storage into the global data space through data virtualization. Customers input the features of the resource, such as IOPS, capacity, latency, etc., and then Smart Objectives can automatically move data to the object store as needed. DataSphere writes data to the stores as files. This allows customers to see and manage the data as they always have, and to use object/cloud storage without modifying their application. It’s all seamless and very simple for the customer.
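The placement idea described above can be illustrated with a minimal sketch. This is not DataSphere code or its API: the class names, fields, sample figures and the "cheapest resource that meets the requirements" rule are all hypothetical, chosen only to show how declared resource attributes (IOPS, latency, capacity, cost) could be matched against an objective.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StorageResource:
    """Attributes a customer might declare for a resource (hypothetical)."""
    name: str
    iops: int            # sustained I/O operations per second
    latency_ms: float    # typical access latency in milliseconds
    free_tb: float       # remaining capacity in terabytes
    cost_per_tb: float   # relative cost per terabyte


@dataclass(frozen=True)
class Objective:
    """Requirements for a data set (hypothetical stand-in for an objective)."""
    min_iops: int
    max_latency_ms: float
    size_tb: float


def place(objective: Objective,
          resources: Iterable[StorageResource]) -> Optional[StorageResource]:
    """Return the cheapest resource that satisfies the objective, or None."""
    candidates = [
        r for r in resources
        if r.iops >= objective.min_iops
        and r.latency_ms <= objective.max_latency_ms
        and r.free_tb >= objective.size_tb
    ]
    # Assumed policy: among resources that qualify, prefer the lowest cost.
    return min(candidates, key=lambda r: r.cost_per_tb, default=None)


# Illustrative figures only; they do not describe any real system.
RESOURCES = [
    StorageResource("flash", iops=500_000, latency_ms=0.2, free_tb=50, cost_per_tb=400),
    StorageResource("nas", iops=20_000, latency_ms=5.0, free_tb=500, cost_per_tb=100),
    StorageResource("object", iops=500, latency_ms=80.0, free_tb=10_000, cost_per_tb=10),
]

hot = Objective(min_iops=100_000, max_latency_ms=1.0, size_tb=5)
cold = Objective(min_iops=100, max_latency_ms=200.0, size_tb=200)

hot_target = place(hot, RESOURCES)
cold_target = place(cold, RESOURCES)
```

Under these assumed figures, the hot data set qualifies only for the flash resource, while the cold data set lands on the object store because it is the cheapest resource that still meets its looser requirements.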
What about competition?
Our biggest competition is the status quo. People don’t realize that it doesn't have to be so hard to manage all these different storage platforms and automatically place the right data on the right resource. To our knowledge, there are no other vendors that can orchestrate data in the same way we can with our DataSphere software. Other systems either need you to use their storage appliance, or sit in the data path and thus impact performance. Interestingly, we’ve seen other older vendors attempt to adapt their messaging to line up with ours, but as a metadata engine, DataSphere is completely unique across its out-of-band, agnostic architecture, intelligence, and abilities.
What is your link with open source?
Primary Data has made a number of contributions to open source standards, including our NFS 4.2 contributions we noted above and announced in December 2016. At Fusion-io, customers had to download new drivers when we made software updates, and this was a pain. We’ve learned from that lesson and are committed to making open source contributions that simplify adoption for our customers at Primary Data. With our NFS 4.2 contributions, customers running on the latest builds of Linux are ready to support data virtualization natively, without any driver downloads.
What makes Primary Data unique?
Primary Data is the only company providing an enterprise metadata engine to orchestrate data outside the data path, across any vendor’s existing or new file, block and object storage, transparently to applications. DataSphere is software, not storage, and its ability to unite diverse storage resources, give insights into data, and move data automatically to the right storage for the job - without impacting performance - makes us completely unique in the industry.
What is your distribution model?
DataSphere is sold to our customers via the channel and through our direct sales force. It is available as a software subscription with no limits on the amount of data managed through the global data space. As we’re just beginning our sales expansion, we work closely with our early adopter customers to help them through each step of evaluation to implementation. This year, we will also be building out our channel relationships.
How many customers do you have?
We’re working with about a dozen customers right now. Our NDAs don’t allow me to name them, but these are household names in verticals like media and entertainment, oil and gas, and service providers.
What is the total capacity you operate?
What we can say is that we fit well with enterprises that manage at least 1PB of storage and have at least 500TB of data.
How is the product priced and licensed?
DataSphere is licensed as an annual subscription, with no limits on the amount of storage capacity or number of data objects managed. Customers can add a second DataSphere for high availability. Clients, applications and servers are attached to DataSphere through our DSX Extended Services data portal. Each DSX portal attached to DataSphere is available as an annual subscription license, and customers can add features to DSX according to their unique business needs and scale using the environment of their choice.
What about your international presence?
We have a number of employees working around the world, and are proud to be working with international customers even at this early stage in our business. What DataSphere can do is incredibly compelling, and the innovators we’re working with aren’t limited by geography. As the accounts grow, so will we, and our business operations will expand to support our customer base.
What are the priorities for 2017?
I can tell you that we are focused on continuing to develop DataSphere to meet the needs of our customers in 2017 and grow our business. We believe that we have a solution to the storage industry’s biggest challenges across complexity, performance and costs, and we’re excited to be working hard every day to help our customers simplify storage through automated data orchestration.