TPC-H Benchmark on Memory1 for Apache Spark SQL Workloads by Diablo

Diablo Technologies, Inc. announced TPC-H benchmark data showcasing the performance benefits of Memory1 for Apache Spark SQL workloads.

By increasing the cluster memory size with Memory1, the company was able to improve data processing times by as much as 289% while lowering the overall TCO by as much as 51%.

The published data demonstrates that increasing application memory using Memory1 results in each server achieving three times the performance at approximately half the overall cost.

Apache Spark is an open-source platform that enables high-speed processing of large and complex datasets. Spark SQL, the Apache Spark module for structured data processing, allows SQL queries to be run on Spark data. Its large in-memory requirement makes it an ideal application for Memory1.

The TPC-H test suite was selected to measure the performance of Memory1 for Spark SQL workloads.

Benchmarks were run on the following configurations, each chosen to illustrate a different expected result:

5-node DRAM cluster vs. 5-node Memory1 cluster: illustrates the performance increase at equal cluster size
7-node DRAM cluster vs. 2-node Memory1 cluster: illustrates the consolidation achievable at similar performance

All DRAM nodes used SSDs and all Memory1 nodes used DMX RAM Disk for Spark local storage.
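
"Spark local storage" is where Spark writes shuffle and spill files. The release does not publish the Spark configuration, so the sketch below is an assumption: it builds spark-submit --conf arguments that point spark.local.dir at each node type's local disks. The mount paths are hypothetical. If the cluster ran Spark on YARN, as an HDP deployment typically does, the node manager's local directories (yarn.nodemanager.local-dirs) take precedence over spark.local.dir, so the equivalent setting would live there instead.

```python
from typing import List

# Hypothetical mount points: the release does not publish the actual paths.
LOCAL_DIRS = {
    "dram": ["/mnt/ssd0/spark", "/mnt/ssd1/spark"],  # two SSDs per DRAM node
    "memory1": ["/mnt/dmx-ramdisk/spark"],           # 600GB DMX RAM Disk per Memory1 node
}


def local_dir_args(node_type: str) -> List[str]:
    """Return spark-submit arguments that point spark.local.dir at a node type's disks."""
    # spark.local.dir takes a comma-separated list; Spark spreads shuffle and
    # spill files across all listed directories.
    return ["--conf", "spark.local.dir=" + ",".join(LOCAL_DIRS[node_type])]


print(local_dir_args("dram"))
# ['--conf', 'spark.local.dir=/mnt/ssd0/spark,/mnt/ssd1/spark']
```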

The results on Memory1 demonstrate that users can achieve more work per server, and process increasingly large datasets in less time, than with DRAM-only servers. Customers can improve performance, get more work done with existing resources and, in some instances, realize a lower TCO.

Behind the tests
The TPC-H test suite consists of 22 queries that retrieve data from various subsets of the eight source tables under varying query parameters to obtain an aggregate indication of performance. For each configuration, the full 22-query set was run seven times.

The results were then reviewed and normalized across all seven runs as follows:

Validation that all queries ran successfully
Individual query completion time (breakdown by query type and performance)
Aggregate query completion time (cumulative completion time for the full query set)
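
The release does not say exactly how the seven runs were normalized. The sketch below assumes a simple arithmetic mean: it first checks that every query completed in every run, then averages each query's completion time and the total time for the full query set across runs. The input layout (one dictionary per run mapping query number to minutes) and the function name are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical input: one dict per run, mapping query number (1..22) to
# completion time in minutes, or None if the query failed.
Run = Dict[int, Optional[float]]


def summarize(runs: List[Run], num_queries: int = 22) -> Tuple[Dict[int, float], float]:
    queries = range(1, num_queries + 1)
    # 1. Validate that every query completed in every run.
    for i, run in enumerate(runs, start=1):
        failed = [q for q in queries if run.get(q) is None]
        if failed:
            raise ValueError(f"run {i}: queries {failed} did not complete")
    # 2. Individual query completion time, averaged across runs.
    per_query = {q: mean(run[q] for run in runs) for q in queries}
    # 3. Aggregate completion time of the full query set, averaged across runs.
    aggregate = mean(sum(run[q] for q in queries) for run in runs)
    return per_query, aggregate


# Toy example with 3 queries and 2 runs (not benchmark data).
per_query, total = summarize([{1: 1.0, 2: 2.0, 3: 3.0}, {1: 3.0, 2: 2.0, 3: 1.0}], num_queries=3)
print(per_query, total)  # {1: 2.0, 2: 2.0, 3: 2.0} 6.0
```

A median would be a reasonable alternative if individual runs were noisy; the mean is used here only because it is the simplest reading of "normalized".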

The servers were configured as follows:

Component	Hardware
Server	Supermicro SYS-1028U-TR4T+
CPU	Intel Xeon E5-2690 v4 @ 2.60GHz, 2 sockets, 35MB cache, 28 physical cores total (56 logical cores with HT)
DRAM node	256GB DRAM
Memory1 node	2TB Memory1 + 256GB DRAM
Memory1 version	Firmware 2.1.2.25, hardware A8+
Interconnect	10GbE
Spark local storage	DRAM nodes: 2 x Samsung EVO SSDs (1.4TB); Memory1 nodes: 600GB DMX RAM Disk

Software	Version
Java	1.8.0_101
Scala	2.11.8
DMX	2.1.3
HDP	2.7.3.2.5.0.0-1245
OS	CentOS Linux release 7.2.1511 (Core)
Kernel	Linux 3.18.3 + DMX.2.1.2.29 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
Benchmark tools	https://github.com/ssavvides/tpch-spark

To compare a Memory1 cluster with a DRAM-only cluster, a common 1TB CSV dataset was used. The source data was converted to Parquet format, with one Parquet file generated for each of the eight source tables, and this Parquet data was used for all tests to ensure uniformity.
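
The CSV-to-Parquet step can be sketched with PySpark as below. This is an assumption, not the published tooling: the file layout, the function names, and the pipe delimiter (the format TPC-H dbgen emits; pass "," for comma-separated files) are hypothetical. The spark argument is a pyspark.sql.SparkSession; pyspark is not imported so the path-planning helper runs without it. Columns come out with Spark's default names and string types, so a real pipeline would pass an explicit schema per table.

```python
import os
from typing import Dict, Tuple

# The eight TPC-H source tables.
TPCH_TABLES = ["customer", "lineitem", "nation", "orders",
               "part", "partsupp", "region", "supplier"]


def conversion_plan(csv_root: str, parquet_root: str) -> Dict[str, Tuple[str, str]]:
    """Map each table to (CSV source path, Parquet destination path)."""
    # Hypothetical layout: <csv_root>/<table>.csv -> <parquet_root>/<table>.parquet
    return {t: (os.path.join(csv_root, f"{t}.csv"),
                os.path.join(parquet_root, f"{t}.parquet"))
            for t in TPCH_TABLES}


def convert_all(spark, csv_root: str, parquet_root: str, delimiter: str = "|") -> None:
    """Rewrite every table as Parquet using a PySpark SparkSession `spark`."""
    for src, dst in conversion_plan(csv_root, parquet_root).values():
        df = spark.read.option("delimiter", delimiter).csv(src)
        # coalesce(1) makes Spark write a single part file into each table's directory.
        df.coalesce(1).write.mode("overwrite").parquet(dst)
```

coalesce(1) matches the "one Parquet file per table" description but funnels each table, including the large lineitem table, through a single task; dropping it trades a multi-file output for a faster, parallel write.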

The tests were built on the tpch-spark test repository.

Using Spark v2.0.1, the servers were first configured to use only the installed 256GB of DRAM per server to process the dataset. Next, the cluster was set up to run the tests on the same dataset with 2TB of Memory1 per server.

Results

5 x Memory1 nodes with DMX RAM Disk vs. 5 x DRAM nodes with SSD
In the five-node comparison, the Memory1 cluster completed the seven runs faster than the DRAM cluster, averaging 48.41 minutes per full query set vs. 144.41 minutes for DRAM. In this configuration, Memory1 delivers a 289% work-per-node advantage.
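
Because both clusters have five nodes, per-node throughput is proportional to the inverse of the run time, so the speedup follows directly from the two published averages. The arithmetic below is an illustration using only those two numbers; it gives about 2.98, in line with the "three times the performance" claim. The release does not state the exact formula behind its percentage figures.

```python
dram_avg_min = 144.41     # 5 DRAM nodes: average minutes per full 22-query run
memory1_avg_min = 48.41   # 5 Memory1 nodes: average minutes per full 22-query run

# Equal node counts, so per-node throughput is proportional to 1 / run time.
speedup = dram_avg_min / memory1_avg_min         # ≈ 2.98
time_saved = 1 - memory1_avg_min / dram_avg_min  # ≈ 0.665, about two-thirds less time

print(f"speedup {speedup:.2f}x, time saved {time_saved:.1%}")
# speedup 2.98x, time saved 66.5%
```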

2 x Memory1 nodes with DMX RAM Disk vs. 7 x DRAM nodes with SSD
The performance results show that the two-node Memory1 cluster outperformed the seven-node DRAM cluster, completing the TPC-H query set in an average of 90.47 minutes vs. the DRAM average of 100.09 minutes. That is 11% faster than DRAM with five fewer compute nodes. In this configuration, Memory1 delivers 389% more work per node at a 51% lower TCO than DRAM.
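
Here the per-node figure depends on both run time and node count. One way to derive it, sketched below under the assumption that "work per node" means benchmark runs completed per node-minute, gives a ratio of about 3.87. The release does not state its exact formula, so this illustrates the arithmetic rather than reconstructing Diablo's calculation; the TCO figure cannot be checked this way because no cost data is published.

```python
def per_node_ratio(base_min, base_nodes, new_min, new_nodes):
    """Per-node throughput of the new cluster relative to the baseline.

    Each run costs minutes * nodes node-minutes, so per-node throughput is
    proportional to 1 / (minutes * nodes).
    """
    return (base_min * base_nodes) / (new_min * new_nodes)


dram_min, dram_nodes = 100.09, 7
m1_min, m1_nodes = 90.47, 2

faster = dram_min / m1_min - 1                                  # ≈ 0.106, about 11%
ratio = per_node_ratio(dram_min, dram_nodes, m1_min, m1_nodes)  # ≈ 3.87

print(f"{faster:.1%} faster, {ratio:.2f}x work per node")
# 10.6% faster, 3.87x work per node
```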


“With dataset sizes increasing daily, the need for larger memory footprints while maintaining affordability has become a business imperative,” said Maher Amer, CTO, Diablo. “With Memory1, customers can achieve more work per server and greatly reduce the time needed to process increasingly larger datasets than servers with DRAM-alone. Memory1 not only improves Spark SQL performance but it also lowers the TCO, and that is a very real and tangible business advantage.”

The Memory1 solution delivers high-capacity flash-as-memory DIMMs and intelligent memory-management software to enable more work per server. It scales up memory resources, delivering up to 40TB of application memory in a single rack. More efficient, resource-dense servers mean improved real-time analytics, faster business decisions, and more transactions completed in less time. The net result gives users the flexibility to address evolving business needs and technologies at a lower TCO. Memory1 solutions are currently available from Inspur and Supermicro.
