Microsoft Drowns Amazon with Azure Data Lake

Exabyte-scale data store optimized for analytical workloads

Microsoft Azure, formerly Windows Azure, was released in February 2010 as a cloud computing platform for building, deploying and managing applications and services through a global network of Microsoft-managed data centers.

It grows every year with new services and enhancements such as HDInsight, Data Factory, Machine Learning and Revolution R, which Microsoft acquired earlier this year.

These services were released not only to face tough competition from Amazon and Google, but also to meet strong demand from enterprises that need to run applications from the cloud, manage databases or analyze big data without big investments.

On April 29, Microsoft announced that it will release three new services later this year: Azure SQL Database elastic databases, Azure SQL Data Warehouse and Azure Data Lake. Here is a brief description of each.

azure f1

Azure SQL Database elastic databases
The elastic database, which may be the first of its kind, lets a business work with several databases in different locations as if they were a single database. It also allows centralized queries across all databases instead of separate queries on each one for the same analysis, which makes analysis and management of all the data faster and better at low cost. It supports building SaaS applications, offers full-text search, and uses a column-oriented structure for better and faster analysis.

Below is a chart of the pricing model, where eDTU stands for elastic database throughput unit. As you can see, 100 databases can be pooled together at $4.5 per eDTU per month, with elastic databases of up to 250GB each at $2.5 per database per month.

azure f2
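To make the pricing concrete, here is a rough sketch of a monthly estimate. It assumes the per-eDTU and per-database charges quoted above simply add up, which the article does not confirm. The function name and the 200-eDTU pool size are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Prices quoted in the article (USD per month).
EDTU_PRICE = Decimal("4.5")  # per eDTU in the pool
DB_PRICE = Decimal("2.5")    # per elastic database in the pool


def monthly_pool_cost(edtus: int, databases: int) -> Decimal:
    """Estimate the monthly cost of one elastic pool.

    Assumption: the eDTU charge and the per-database charge are additive.
    """
    if edtus < 0 or databases < 0:
        raise ValueError("edtus and databases must be non-negative")
    return EDTU_PRICE * edtus + DB_PRICE * databases


# Hypothetical pool: 200 eDTUs shared by 100 databases.
estimate = monthly_pool_cost(edtus=200, databases=100)
print(f"Estimated monthly cost: ${estimate}")
```

Under that assumption, the eDTU charge would dominate the bill for this pool ($900 of the total, against $250 for the databases themselves).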

Azure SQL Data Warehouse
This is a cloud data warehouse with a parallel processing architecture that can dynamically grow, shrink and pause compute independently of the storage capacity or the size of the data being analyzed. In the chart above, Microsoft compares SQL Data Warehouse with Amazon’s Redshift and, as you can see, it appears to come out ahead. Is it really better?

The fact is that, compared with similar products, Redshift has always been faster at analyzing big data because of its scalable, customizable clusters and nodes and its column-oriented architecture, the same one Microsoft is adopting today. The drawbacks of Redshift are the time spent configuring clusters and nodes for the best balance of speed and price, the time spent structuring the data before it can be analyzed, and limited support for some standard SQL features.

Azure SQL Data Warehouse is not fully available today, so a real comparison cannot be made.

azure f3

Azure Data Lake
It is a global, exabyte-scale data store optimized for analytical workloads: a single place to store every type of data in its native format, with no fixed limits on account size or file size and high throughput to increase analytic performance. It has the following features:

  • HDFS for the cloud: compatible with the Hadoop Distributed File System (HDFS) and integrated with Azure HDInsight.
  • High capacity: no fixed limits on account size or file size.
  • Optimized for massive throughput.
  • High frequency, low latency, real-time analytics: handles high volumes of small writes at low latency, which makes it suited to near real-time scenarios such as website analytics, Internet of Things (IoT), analytics from sensors, and others.
  • Store data in its native format without prior transformation: you can store relational and non-relational data without transformation or schema definition, so all of your data can be kept and analyzed in its native format.
  • Durable and highly available: protects your data by keeping three copies of it.
  • Rich management and security features: ability to monitor performance, receive alerts, and audit usage.
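
As an illustration of storing data in its native format and applying a schema only when it is read, here is a minimal sketch. It is not the Azure Data Lake API: the `lake` dictionary, the `store` and `read_records` helpers, and the file paths are all hypothetical stand-ins that mimic the idea in memory.

```python
import csv
import io
import json

# Hypothetical in-memory stand-in for a data lake: path -> raw bytes.
lake = {}


def store(path, raw):
    """Keep the bytes exactly as received, with no schema or transformation."""
    lake[path] = raw


def read_records(path):
    """Interpret the raw bytes only at read time, based on the file extension."""
    text = lake[path].decode("utf-8")
    if path.endswith(".json"):
        # One JSON object per line (non-relational data).
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if path.endswith(".csv"):
        # Header row plus data rows (relational-style data).
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"no reader for {path}")


# Two differently shaped datasets land in the same store untouched.
store("sensors/readings.json",
      b'{"device": "a1", "temp": 21.5}\n{"device": "b2", "temp": 19.0}\n')
store("sales/orders.csv", b"order_id,amount\n1,9.99\n2,24.50\n")

# The structure is decided by the analysis, not by the store.
temps = [r["temp"] for r in read_records("sensors/readings.json")]
amounts = [float(r["amount"]) for r in read_records("sales/orders.csv")]
print(temps, amounts)
```

The point of the sketch is that writes stay cheap and format-agnostic, while each analysis decides how to interpret the bytes it reads.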

Microsoft’s objective with these new products is to let companies work with any kind of data from a single place and in its native format. They provide full SQL support and allow the use of tools developers already know, including Hadoop, one of the most widely used tools for analyzing data, as well as Revolution R and tools from Hortonworks and Cloudera. Users can also take advantage of other Azure services such as Azure Cloud, Azure Machine Learning and Azure Data Factory without having to modify the structure of the data for each service.

These new products are available only in private preview right now, and will be launched next June. Google and Amazon both have interesting cloud and big data suites that also work with Hadoop and other tools, but Microsoft’s new tools are a big step forward and will certainly bring many benefits to their users.

Today Azure seems to be the most complete suite. How will Amazon and Google respond to this challenge?
