Top Ten 2015 Trends by HDS CTO Hu Yoshida

1/ business defined IT, 2/ converged platforms, 3/ management automation

Hu Yoshida, VP and CTO, Hitachi Data Systems Corp., posts in his blog series on the top trends for 2015.

1. Business Defined IT
Mobility, cloud, social, and big data are four macro trends that will drive sustainable business growth and will require stronger IT and business collaboration. Businesses need mobility for their people to be more productive in a competitive world. Businesses need the agility and elasticity of cloud for on-demand services. Businesses need social to understand the sentiments of their customer base and connect with them on a more direct level. Businesses need big data for predictive analysis to guide their decisions and validate their assumptions. IT must respond to these requirements and become an architect and broker of business services rather than an administrator of data center infrastructure. In 2015, not only will CIOs become more astute about the business drivers behind these trends, they will also invest more of their budgets in business-driven initiatives, and businesses will control more of the IT spend.

In order for IT to focus more on business outcomes rather than infrastructure, they will need to embrace technologies that converge, orchestrate, automate, and integrate their infrastructure. Instead of a 'do-it-yourself' approach to building an application platform (connecting server, storage, and network; orchestrating all the element managers; and integrating them with the OS and application stack), they should be able to consume infrastructure the way they buy a car, without having to build it from scratch, starting with the wheels and the engine.

Business models for vendors and their channels are also being disrupted, and they are struggling to realign their portfolios. Amazon cloud revenue was $2.6 billion in 2013 and displaced over $13 billion of IT spend that would have gone to traditional infrastructure vendors and their channels. Business defined IT will require infrastructure vendors to provide more than just the infrastructure. They will need to provide orchestration, automation, and integration around their products to enable CIOs to focus on business outcomes. It is not enough for infrastructure vendors to build a bigger, faster, cheaper box. They must also provide the management and reporting software, as well as services, that support business defined IT. Since most IT organizations have not hired staff since 2008, they are overloaded with work and do not have the bandwidth to implement new software and processes. Vendors and their channel partners will need to provide services to offload some of this work and enable IT to transition to this new paradigm. Vendors and their channels must be willing to share the risk and work as partners with IT through this transformation.

2. New Capabilities Accelerate Adoption of Converged and Hyper-Converged Platforms
A key enabler for Business Defined IT is converged solutions that are integrated with the systems and the application stack. Converged solutions are the fastest-growing segment in the HDS portfolio. Businesses no longer need to wait weeks or months for IT to spin up an Oracle RAC or SAP HANA platform. The integration of HDS orchestration software with VMware and Hyper-V provides a software-defined data center for private and public cloud. IT will begin to focus more on best-of-breed converged platforms, which can be customized, orchestrated, and optimized through a single pane of glass. New capabilities such as lower-cost entry points, deeper integration of infrastructure and software, and the ability to nest hypervisors will accelerate the adoption of converged and hyper-converged solutions.

The entry point for converged solutions will be lowered through solutions that are priced and packaged for the SMB and remote office/back office market, with the flexibility to integrate into enterprise configurations. Hyper-converged solutions, which are composed of scale-out nodes of commodity servers with internal storage, will help lower costs even further for scale-out applications like web servers and MapReduce analytics.

New capabilities like VVOLs from VMware will provide increased communication between the infrastructure and the application through APIs and client/providers. This will enable infrastructure vendors to publish their value-add capabilities to vSphere for consumption, while giving the infrastructure awareness of logical entities like VMs. This interface lets vSphere use these value-add capabilities when it defines a virtual volume, while giving the infrastructure the visibility to allocate resources based on VMs rather than infrastructure constructs like storage LUNs.
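
As a rough illustration of that flow, the sketch below models an array-side provider that publishes its capabilities upward and records allocations per VM rather than per LUN. The class and method names are hypothetical and do not mirror the actual VASA/VVOL interfaces.

```python
# Conceptual sketch of capability publishing and VM-aware allocation.
# Names are hypothetical; this is not the real VASA/VVOL API.

from dataclasses import dataclass, field


@dataclass
class StorageProvider:
    """Array-side provider that publishes capabilities and allocates per-VM volumes."""
    name: str
    capabilities: dict = field(default_factory=dict)   # e.g. {"snapshots": True, "tier": "flash"}
    allocations: dict = field(default_factory=dict)    # vm_id -> list of virtual volumes

    def publish_capabilities(self) -> dict:
        # The hypervisor queries this instead of being handed a list of LUNs.
        return dict(self.capabilities)

    def create_virtual_volume(self, vm_id: str, size_gb: int, policy: dict) -> dict:
        # The allocation is recorded against the VM, not a LUN, so the array
        # can report and optimize per virtual machine.
        vvol = {"vm": vm_id, "size_gb": size_gb, "policy": policy}
        self.allocations.setdefault(vm_id, []).append(vvol)
        return vvol


if __name__ == "__main__":
    array = StorageProvider("array-01", {"snapshots": True, "tier": "flash"})
    print(array.publish_capabilities())
    print(array.create_virtual_volume("vm-42", 100, {"tier": "flash", "replication": "sync"}))
```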

Another new capability for unified compute platforms will be the ability to run nested hypervisors on the new Xeon E5 v3 processors. This will enable different hypervisors to run on the same physical server for flexible cloud services, development and test, and migration of legacy applications. Hitachi's blade server will be able to use this new E5 v3 capability with its unique LPAR architecture for added protection and load balancing without degradation in performance. LPARs enable multiple applications to run on the same processor in their own partitions with dedicated resources and with no data leakage or escalation of management privileges. Failures in one partition will not impact applications running in other partitions. Hitachi x86 LPARs, which have been used to run multiple bare-metal applications, such as SAP HANA and Oracle RAC, in the same server to reduce hardware and software licensing costs, will now be able to run a mix of bare-metal applications and hypervisors for further cost reductions.

3. Management Automation
2015 will see greater investment in management automation tools. Provisioning of applications and orchestration of workloads will be done from templates. Management automation will begin to include exception monitoring, alerting, root cause analysis, and automated remediation. Orchestration will include the movement of workloads between private and public clouds to align the appropriate infrastructure based on cost, performance, locality, and governance. Management automation will be facilitated by a converged infrastructure with an orchestration layer that eliminates the need to link and launch different element managers.

IT vendors have worked with applications and have built up best practices. Instead of delivering their products with white papers on best practices, they should be able to deliver these best practices in the form of templates that can be used for automation. While it is easier for infrastructure vendors to do this if they own a converged solution, including server, storage, and network, APIs must be available in the element managers for customers who still need to integrate with legacy infrastructure, OS, or applications.
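
As a rough illustration of the idea, here is a hypothetical best-practice template expressed as data, plus a small function that walks it to produce the provisioning steps an orchestrator would drive through element-manager APIs. The template fields and values are invented for the example.

```python
# Sketch of template-driven provisioning: a vendor best practice captured as data
# rather than a white paper. Template fields and the provision() steps are illustrative.

ORACLE_RAC_TEMPLATE = {
    "name": "oracle-rac-2node",
    "compute": {"nodes": 2, "cpus": 16, "memory_gb": 256},
    "storage": {"pool": "tier1-flash", "volumes_gb": [512, 512, 128]},
    "network": {"vlans": ["public", "interconnect"]},
}


def provision(template: dict) -> list:
    """Walk the template and emit the ordered calls an orchestrator would make
    against the element managers' APIs (printed here instead of executed)."""
    steps = []
    for i in range(template["compute"]["nodes"]):
        steps.append(f"create server node {i} ({template['compute']['cpus']} vCPU)")
    for size in template["storage"]["volumes_gb"]:
        steps.append(f"carve {size} GB volume from pool {template['storage']['pool']}")
    for vlan in template["network"]["vlans"]:
        steps.append(f"attach VLAN '{vlan}' to all nodes")
    return steps


if __name__ == "__main__":
    for step in provision(ORACLE_RAC_TEMPLATE):
        print(step)
```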

4. Software Defined
Software defined everything will be heavily hyped in 2015, and many vendors will sell their products under this banner. The concept of software defined is a key step toward simplifying and automating the IT infrastructure. Start-ups will jump into the market with software purported to be enterprise-class on low-cost commodity hardware. While software can enhance the technology in the hardware, the results will be limited by the hardware infrastructure beneath it. Software with commodity hardware can be good enough in some cases, but there will still be a need for intelligent enterprise hardware in the software-defined world.

Software defined will require communication between the software and the infrastructure through APIs or client/providers, which are open and RESTful. A good example of this is VVOL from VMware. VVOLs provide VM-level granularity to IT administrators through an abstraction layer in the form of a storage API between the control plane and storage systems. This provides increased scalability and unprecedented automation in policy-based storage management. Using vSphere APIs for Storage Awareness (VASA), Hitachi is creating providers for its storage and converged solution products to publish their unique capabilities to vSphere for consumption. VVOLs can also use these storage providers to set policies and push information down to the external storage, making the storage VM-aware and able to negotiate with individual VMs, rather than dealing with a set of VMs in a LUN or file share as in the traditional approach. VVOLs minimize the NAS vs. SAN debate thanks to their VM-centric management perspective, and they enable VMs to leverage the capabilities of enterprise storage systems. At the same time, VMware provides VSAN for internal storage, which is good enough for smaller configurations of EVO:RAIL and EVO:RACK that don't need the functions of enterprise storage systems.
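
A minimal sketch of policy-based placement, assuming invented capability names: a VM storage policy is matched against the capabilities each provider publishes, in the spirit of the VASA flow described above rather than the actual API.

```python
# Sketch of policy-based placement: match a VM's storage policy against the
# capabilities each provider publishes. Names and fields are illustrative only.

providers = {
    "enterprise-array": {"replication": "sync", "snapshots": True, "tier": "flash"},
    "internal-vsan":    {"replication": "none", "snapshots": True, "tier": "hybrid"},
}


def compliant_providers(policy: dict, published: dict) -> list:
    """Return the providers whose published capabilities satisfy every policy rule."""
    return [
        name for name, caps in published.items()
        if all(caps.get(key) == value for key, value in policy.items())
    ]


if __name__ == "__main__":
    vm_policy = {"replication": "sync", "tier": "flash"}
    print(compliant_providers(vm_policy, providers))   # ['enterprise-array']
```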

5. Global Virtualization Adds New Dimension to Storage Virtualization
Storage virtualization has been vertically oriented up to now: a storage virtualization engine virtualizes other storage systems that are attached behind it. Global virtualization will extend virtualization horizontally across multiple storage systems, which can be separated by campus or metro distances for block, and by global distances for file and content storage systems, creating virtual storage machines that present one pool of virtual storage resources spanning multiple physical storage systems. Separate storage systems are connected through software so that a virtual storage image can be read or written through either storage system. For block systems, Hitachi Data Systems provides the Storage Virtualization OS (SVOS), which acts like a storage hypervisor, providing synchronization across G1000 block storage systems so that applications continue to run across system failures with zero RTO and zero RPO. SVOS also provides configuration flexibility across campus and metro distances and enables non-disruptive migration for technology refresh. HNAS provides this with a global namespace for files, and HCP provides this with Global Access Topology for object stores.

The advantage of having multiple storage systems actively supporting the same logical image of data is that the application can continue to run if one storage system should fail. Up to now, only one storage system could be 'active' in servicing an application, and protection of the data was provided by replication to another storage system in 'passive' standby mode. If the active storage system failed, the application could be restarted on the replica of the data in the passive storage system. Restarting the application requires an outage, and even if the replication was done synchronously, so that the data in the passive storage system exactly mirrored the data in the active system at the time of the failure, the application recovery must check for transaction consistency. Logs would have to be checked to see which transactions had completed and which had to be restarted before processing could resume. This process stretches the application's recovery point and recovery time objectives. In a configuration where both storage systems are active (active/active), the application maintains transaction consistency during the failure of one of the storage systems. This will be particularly important for core applications that need to reduce their recovery time and recovery point objectives.

These virtual storage machines should be able to provide all the enterprise capabilities of previous enterprise storage virtualization systems, including virtualization of external storage, dynamic provisioning, replication of consistency groups, high performance, and high scalability. The magic will be in the storage virtualization OS that resides in these virtual storage machines. Think of it as a hypervisor for virtual storage machines, similar to the hypervisors for virtual server machines like VMware or Hyper-V. Virtual storage machines that support active/active processing across physically separate storage systems can provide the advantages of HA with zero recovery time and zero recovery point, configuration flexibility across campus or metro distances, and non-disruptive migration across technology refreshes.
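
The behavioral difference can be sketched abstractly: every write is mirrored synchronously to both systems, so reads simply continue through the survivor when one system fails, with no restart or log replay. This is a conceptual model only, not how SVOS is implemented.

```python
# Conceptual sketch of an active/active virtual volume spanning two storage systems.
# Both systems acknowledge every write, so either can serve reads after a failure
# without an application restart. Entirely illustrative.

class StorageSystem:
    def __init__(self, name):
        self.name = name
        self.blocks = {}        # logical block address -> data
        self.failed = False

    def write(self, lba, data):
        if self.failed:
            raise IOError(f"{self.name} is down")
        self.blocks[lba] = data

    def read(self, lba):
        if self.failed:
            raise IOError(f"{self.name} is down")
        return self.blocks[lba]


class ActiveActiveVolume:
    def __init__(self, a, b):
        self.systems = [a, b]

    def write(self, lba, data):
        # Synchronous mirror: the write completes only after every healthy
        # system has stored it, keeping both copies identical.
        for system in self.systems:
            if not system.failed:
                system.write(lba, data)

    def read(self, lba):
        # Any surviving system can service the read.
        for system in self.systems:
            if not system.failed:
                return system.read(lba)
        raise IOError("no surviving storage system")


if __name__ == "__main__":
    vol = ActiveActiveVolume(StorageSystem("site-A"), StorageSystem("site-B"))
    vol.write(0, b"record-1")
    vol.systems[0].failed = True    # site A fails
    print(vol.read(0))              # the application keeps running from site B
```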

6. Greater Focus on Data Recovery and Management of Data Protection
Surveys show that data protection continues to be the highest concern for data center managers. The amount of backup data continues to explode, driving up recovery time and recovery point objectives. Much of the focus up to now has been on backup: converting to disk-based backup and deduplication to reduce the cost of backup storage. In 2015, more of the focus will be on driving down recovery time and recovery point objectives while continuing to reduce the cost of data protection.

Some techniques that will be used to reduce recovery times (a thin copy-on-write snapshot is sketched after this list):
- aggressive use of active archives to reduce the working set required for recovery;
- increasing use of snaps and clones enabled by thin copy technologies;
- object stores for unstructured data that only needs to be replicated, not backed up;
- file sync and share from a central repository;
- edge ingesters that replicate content to a central repository for immediate recovery;
- faster VTLs that can support multi-streaming and multiplexing to reduce recovery time;
- active/active storage controllers, where virtual volumes span separate storage systems so that applications can continue to run without the need to recover when one storage system fails.
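
A minimal copy-on-write sketch shows why thin copies are cheap: a snapshot shares the source's blocks and stores only the original of each block that is later overwritten. The structures below are illustrative.

```python
# Minimal copy-on-write snapshot sketch: a snapshot shares the source's blocks
# and only diverges when the source is overwritten. Structures are illustrative.

class ThinSnapshotVolume:
    def __init__(self, blocks):
        self.blocks = blocks        # live data: block number -> data
        self.snapshots = []         # each snapshot stores only preserved originals

    def take_snapshot(self):
        self.snapshots.append({})   # nothing is copied at creation time
        return len(self.snapshots) - 1

    def write(self, block, data):
        # Preserve the original block for any snapshot that has not captured it yet.
        for snap in self.snapshots:
            if block not in snap:
                snap[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, snap_id, block):
        snap = self.snapshots[snap_id]
        # A block untouched since the snapshot is still read from the live volume.
        return snap[block] if block in snap else self.blocks.get(block)


if __name__ == "__main__":
    vol = ThinSnapshotVolume({1: "v1", 2: "v2"})
    s = vol.take_snapshot()
    vol.write(1, "v1-updated")
    print(vol.read_snapshot(s, 1), vol.blocks[1])   # v1 v1-updated
```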

The cost of data protection for primary data has been exploding, not only due to data growth, but also due to an increasing number of test and development, data protection, and replication copies. A database may have 50 to 60 copies administered by different users for different purposes. Many copies become orphaned, with no owner or purpose except to consume storage resources, and when a recovery is required, it is not clear which snap or replica should be used. IT administrators will require tools to discover, track, and manage copies, clones, snaps, and replicas of data stores across their environment, in order to reduce the waste associated with all these copies and streamline the recovery process.
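
A sketch of the kind of copy-tracking tool described, assuming an invented catalog format: every snap, clone, or replica is recorded with an owner and purpose, and copies lacking both past a threshold age are flagged for reclamation.

```python
# Sketch of copy data management: record every snap/clone/replica with an owner
# and purpose, then report orphans. The catalog and its fields are illustrative.

from datetime import datetime, timedelta

copy_catalog = [
    {"id": "snap-001", "source": "prod-db", "owner": "dba-team", "purpose": "backup",
     "created": datetime(2014, 12, 1)},
    {"id": "clone-07", "source": "prod-db", "owner": None, "purpose": None,
     "created": datetime(2014, 6, 15)},
]


def find_orphans(catalog, older_than_days=90):
    """Copies with no owner or purpose that have aged past the threshold."""
    cutoff = datetime.now() - timedelta(days=older_than_days)
    return [c for c in catalog
            if (c["owner"] is None or c["purpose"] is None) and c["created"] < cutoff]


if __name__ == "__main__":
    for orphan in find_orphans(copy_catalog):
        print("reclaim candidate:", orphan["id"], "from", orphan["source"])
```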

7. Increasing Intelligence in Enterprise Flash Modules
Due to the limitations of flash technology (writing to formatted blocks, block formatting, recovery of invalidated pages, write amplification, wear leveling, and the management of spares), an enterprise flash storage module requires a considerable amount of processing power to maintain performance, extend durability, and increase flash capacity. In 2015 we will see the displacement of SSDs that were designed for the commodity PC market by enterprise flash modules enhanced with the processing power required to address enterprise storage requirements for performance, durability, and capacity. New flash technologies like TLC and 3D NAND will be introduced to increase the capacity of flash drives, but they will further increase the demand for intelligence in the flash module to manage the additional complexity and lower durability of these technologies. Hitachi currently uses a quad-core processor in its flash module device (FMD), which supports 3.2TB with higher sustained performance and durability than off-the-shelf SSDs. There is an opportunity to utilize the processing power in the FMD to provide further capacity and workload optimization for certain applications.
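
The arithmetic behind that intelligence can be illustrated simply: write amplification measures how much extra NAND traffic garbage collection generates, and wear leveling steers writes toward the least-worn blocks. The numbers below are made up for the example.

```python
# Illustrative arithmetic behind flash-module intelligence: write amplification
# and a trivial wear-leveling choice. Values are invented for the example.

def write_amplification(host_bytes_written, nand_bytes_written):
    """WAF = bytes physically written to NAND / bytes the host asked to write.
    Garbage collection of invalidated pages pushes this above 1.0."""
    return nand_bytes_written / host_bytes_written


def pick_block_for_write(erase_counts):
    """Naive wear leveling: direct the next write to the least-worn free block."""
    return min(erase_counts, key=erase_counts.get)


if __name__ == "__main__":
    print(write_amplification(100, 320))                        # 3.2x overhead
    print(pick_block_for_write({"blk-a": 1500, "blk-b": 900}))  # blk-b
```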

8. Big Data and Internet of Things
IDC has predicted that big data will grow at a 27% CAGR to $32.4 billion through 2017, about six times the growth rate of the overall information and communication technology market. Other analysts, like Wikibon, are even more bullish, predicting revenues of $53.4 billion by 2017 as more businesses begin to realize real benefits from big data analytics. 2015 will continue to see solid growth in big data analytics tools like SAP HANA and Hadoop, which can deliver results in a matter of minutes or hours as opposed to days. Preconfigured converged and hyper-converged platforms will speed the implementation of big data applications.

While the big data of today is mostly business data with the addition of social sentiment, the big data of tomorrow will be more about the Internet of Things (IoT), with machine-to-machine communications, which will have a bigger impact on our lives. The Internet of Things will help us solve problems in carbon footprint, transportation, energy, smart cities, public safety, and life sciences through information technology. The new world of IoT will create an explosion of new information that can be used to create a better world. Batch analytics will give way to streaming analytics for real-time analysis of sensor data, and more intelligence will be incorporated in edge ingesters. Applications built around the Internet of Things will be introduced by companies that have expertise in sensor analysis and in verticals like surveillance and healthcare. In 2015, IT companies will be partnering with social infrastructure companies to realize the potential of an IoT world.
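
As a minimal illustration of streaming versus batch analysis, the sketch below keeps a sliding window over sensor readings and decides on each arrival rather than in a nightly job. The sensor values and threshold are invented.

```python
# Sketch of streaming (rather than batch) analysis of sensor data: a sliding
# window aggregates readings as they arrive. Values and threshold are made up.

from collections import deque


class SlidingWindowMonitor:
    def __init__(self, window_size, alert_threshold):
        self.window = deque(maxlen=window_size)   # only the most recent readings are kept
        self.alert_threshold = alert_threshold

    def ingest(self, reading):
        self.window.append(reading)
        average = sum(self.window) / len(self.window)
        # A decision is made on each arrival, instead of waiting for a batch job.
        if average > self.alert_threshold:
            return f"ALERT: rolling average {average:.1f}"
        return None


if __name__ == "__main__":
    monitor = SlidingWindowMonitor(window_size=5, alert_threshold=80.0)
    for temp in [70, 75, 82, 88, 91, 95]:
        alert = monitor.ingest(temp)
        if alert:
            print(alert)
```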

HDS has already started down this path by partnering with other divisions of Hitachi. For instance, HDS is partnering with Clarion, a member of the Hitachi Group and an in-vehicle information solution provider. The two companies have announced an R&D partnership for the deployment of new data-driven solutions in the next generation of Clarion in-vehicle connectivity products. This collaboration will give drivers, insurance companies, and manufacturers usable insights that will lead to improved auto performance and safety, increasing value across the growing market for connected cars.

9. Data Lake for Big Data Analytics
While there will continue to be high demand for scale-up enterprise storage and compute systems, the growth of unstructured data and its value for big data analysis will require new types of distributed, scale-out storage and compute systems. Pentaho CTO James Dixon is credited with coining the term 'data lake': "If you think of a data mart as a store of bottled water – cleansed and packaged and structured for easy consumption – the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples." These data lake systems will hold massive amounts of data and be accessible through file and web interfaces. Data protection for data lakes will consist of replicas and will not require backup, since the data is not updated. Erasure coding will be used to protect large data sets and enable fast recovery. Open source software will be used to reduce licensing costs, and compute systems will be optimized for MapReduce analytics. Automated tiering will be employed for performance and long-term retention requirements. Cold storage, storage that does not require power for long-term retention, will be introduced in the form of tape or optical media.
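
Erasure coding can be illustrated in its simplest form: a single XOR parity fragment allows any one lost data fragment to be rebuilt. Production data lakes use stronger codes such as Reed-Solomon, with multiple parity fragments spread across nodes; this sketch only conveys the principle.

```python
# Simplest-case illustration of erasure coding: one XOR parity fragment lets any
# single lost data fragment be rebuilt from the survivors.

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments):
    parity = fragments[0]
    for frag in fragments[1:]:
        parity = xor_bytes(parity, frag)
    return parity


def rebuild(surviving, parity):
    """Recover the single missing fragment from the surviving fragments plus parity."""
    missing = parity
    for frag in surviving:
        missing = xor_bytes(missing, frag)
    return missing


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]          # equal-sized fragments
    parity = make_parity(data)
    print(rebuild([data[0], data[2]], parity))  # b'BBBB' is reconstructed
```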

10. Hybrid Cloud Gains Traction
The adoption of hybrid clouds, the combination of private and public cloud, is gaining momentum. Analysts like Tech Pro Research suggest that 70% of organizations are either using or evaluating hybrid clouds. With growing competition among public cloud providers, the lower cost of WAN bandwidth, and the ability to control movement to the public cloud with metadata that is retained within the firewalls of a private cloud, hybrid cloud becomes a cost-effective platform for running enterprise workloads. Much of the data created for big data and IoT will not be frequently accessed and is suitable for low-cost storage in a public cloud. The data can be sent to a public cloud using RESTful protocols while the active metadata remains in a private cloud, behind the firewall. Object storage systems like the Hitachi Content Platform (HCP) enable the automatic tiering of data into a public cloud while maintaining the encryption and metadata control within a private cloud.

It is very inexpensive to store data in a public cloud as long as you do not access it, but while the data sits in the public cloud you still want to have control. Here are some considerations for using a hybrid cloud. You want to retain the metadata under your control so that you can search the data content and only retrieve data objects from the public cloud when a match is found. The metadata should be extensible so that changes in the use of, or linkages to, the data can be appended to it. You also want to encrypt the data before you send it to the public cloud, since you don't know where it physically resides at any time, and if it is moved within the cloud you want to ensure that your data is not left in the clear on the prior device. Data that is moved to the cloud should also be hashed so that when you retrieve it, you can compare the hash to ensure nothing has changed and you can prove immutability. You should also have the flexibility to migrate data in the background, between public clouds, to meet your business needs.
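
A sketch of those considerations, with a hypothetical tiering function: the object is hashed and encrypted before it leaves the site, while the searchable metadata, the key, and the hash stay behind the firewall. The "encryption" here is a trivial placeholder so the example stays self-contained; a real deployment would use a proper cipher.

```python
# Sketch of hybrid-cloud tiering: hash and encrypt an object before sending it out,
# keep the searchable metadata (plus key and hash) on premises. tier_to_public_cloud()
# and the metadata fields are illustrative; the encryption is a placeholder, not real.

import hashlib

local_metadata_index = {}   # stays behind the firewall


def placeholder_encrypt(data, key):
    # Stand-in for a real cipher purely so the example is self-contained.
    return bytes(b ^ key for b in data)


def tier_to_public_cloud(object_id, data, tags, key=0x5A):
    digest = hashlib.sha256(data).hexdigest()       # proves immutability on retrieval
    ciphertext = placeholder_encrypt(data, key)     # never leaves the site in the clear
    local_metadata_index[object_id] = {"tags": tags, "sha256": digest, "key": key}
    return ciphertext                               # this is what a REST PUT would send


def verify_on_retrieval(object_id, ciphertext):
    meta = local_metadata_index[object_id]
    data = placeholder_encrypt(ciphertext, meta["key"])   # XOR placeholder is its own inverse
    return hashlib.sha256(data).hexdigest() == meta["sha256"]


if __name__ == "__main__":
    blob = tier_to_public_cloud("scan-001", b"sensor archive", {"type": "IoT", "year": 2014})
    print(verify_on_retrieval("scan-001", blob))    # True -> nothing changed in the cloud
```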
