Thursday 30 June 2016

Performance debate - 'Sour grapes' DataCore chairman fires back

Fresh from the DataCoreLabs blog:
 
Based on the questions raised in recent press articles, it seems some have missed a major factor behind DataCore's world record storage performance. Contrary to what some may think, it wasn't just the in-memory cache that made the biggest difference to the result. The principal innovation behind it is DataCore's new parallel I/O architecture. I think our Chairman and Technologist, Ziya Aral, says it well in this excerpt from a recent article in The Register by Chris Mellor, "The SPC-1 benchmark is cobblers, thunders Oracle veep":

The press release that sparked the debate is located here: 
DataCore Parallel Server Rockets Past All Competitors, Setting the New World Record for Storage Performance

Measured Results are Faster than the Previous Top Two Leaders Combined, yet Cost Only a Fraction of Their Price in Head-to-head Comparisons Validated by the Storage Performance Council; See Chart Below:
[Chart: top three SPC-1 results]

Comments from the original article:

The DataCore SPC-1-topping benchmark has attracted attention, with some saying that it is artificial (read cache-centric) and unrealistic as the benchmark is not applicable to today's workloads.

Oracle SVP Chuck Hollis told The Register: "The way [DataCore] can get such amazing IOPS on a SPC-1 is that they're using an enormous amount of server cache."
...In his view: "The trick is to size the capacity of the benchmark so everything fits in memory. The SPC-1 rules allow this, as long as the data is recoverable after a power outage. Unfortunately, the SPC-1 hasn't been updated in a long, long time. So, all congrats to DataCore (or whoever) who is able to figure out how to fit an appropriately sized SPC-1 workload into cache."

But, in his opinion, "we're not really talking about a storage benchmark any more, we're really talking about a memory benchmark. Whether that is relevant or not I'll leave to others to debate."

DataCore's response ... Sour grapes
Ziya Aral, DataCore's chairman, has a different view, which we present at length as we reckon it is important to understand his, as well as DataCore's, point of view.
"Mr. Hollis' comments are odd coming from a company which has spent so much effort on in-memory databases. Unfortunately, they fall into the category of 'sour grapes'."
“The SPC-1 does not specify the size of the database which may be run and this makes the discussion around 'enormous cache', etc. moot,” continued Aral. “The benchmark has always been able to fit inside the cache of the storage server at any given point, simply by making the database small enough. Several all-cache systems have been benchmarked over the years, going back over a decade and reaching almost to the present day.”

"Conversely, 'large caches' have been an attribute of most recent SPC-1 submissions. I think Huawei used 4TB of DRAM cache and Hitachi used 2TB. TB caches have become typical as DRAM densities have evolved. In some cases, this has been supplemented by 'fast flash', also serving in a caching role."

Aral continued:
In none of the examples above were vendors able to produce results similar to DataCore's, either in absolute or relative terms. If Mr. Hollis were right, it should be possible for any number of vendors to duplicate DataCore's results. Moreover, given the competitive significance of SPC-1, it should not have taken DataCore to implement such an obvious strategy. We welcome such an attempt by other vendors.

“So too with 'tuning tricks,'” he went on. “One advantage of the SPC-1 is that it has been run for so long by so many vendors and with so much intensity that very few such 'tricks' remain undiscovered. There is no secret to DataCore's results and no reason to try to guess how they came about. DRAM is very important, but it is not the magnitude of the memory array so much as the bandwidth to it.”

Symmetric multi-processing
Aral also says SMP is a crucial aspect of DataCore's technology concerning memory array bandwidth, explaining this at length:

As multi-core CPUs have evolved through several iterations, their architecture has been simplified to yield a NUMA per socket, a private DRAM array per NUMA and inter-NUMA links fast enough to approach uniform access shared memory for many applications. At the same time, bandwidth to the DRAMs has grown dramatically, from the current four channels to DRAM, to six in the next iteration.

The above has made Symmetrical Multi-Processing, or SMP, practical again. SMP was always the most general and, in most ways, the most efficient of the various parallel processing techniques to be employed. It was ultimately defeated nearly 20 years ago by the application of Moore's Law: it became impossible to iterate SMP generations as quickly as uniprocessors were advancing.

DataCore is the first recent practitioner of the Science/Art to put SMP to work... in our case with Parallel I/O. In DataCore's world record SPC-1 run, we use two small systems but no less than 72 cores organized as 144 usable logical CPUs. The DRAM serves as a large speed matching buffer and shared memory pool, most important because it brings a large number of those CPUs to ground. The numbers are impressive but I assure Mr. Hollis that there is a long way to go.
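Aral does not describe how Parallel I/O is implemented, so what follows is only a generic Python sketch of the idea his description points at. Independent I/O requests are fanned out across a pool of workers sized to the logical CPU count, all sharing one in-memory buffer. (72 cores presenting 144 logical CPUs implies two hardware threads per core.) The block size, the in-memory "volume" and the handle_request/serve helpers are hypothetical, not DataCore code. In CPython the GIL serialises the buffer copies, so this shows the structure rather than a real speedup. A production storage stack would do this in native code.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4096   # hypothetical block size in bytes
NUM_BLOCKS = 1024   # hypothetical volume size in blocks

# Hypothetical shared in-memory "volume", standing in for the DRAM buffer pool.
volume = bytearray(BLOCK_SIZE * NUM_BLOCKS)


def handle_request(req):
    """Service one (op, block, data) request against the shared volume."""
    op, block, data = req
    start = block * BLOCK_SIZE
    if op == "write":
        # Exact-size writes keep the bytearray from being resized by slice assignment.
        if len(data) != BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        volume[start:start + BLOCK_SIZE] = data
        return (op, block, len(data))
    if op == "read":
        return (op, block, bytes(volume[start:start + BLOCK_SIZE]))
    raise ValueError(f"unknown op {op!r}")


def serve(requests, workers=None):
    """Dispatch requests across a worker pool, by default one per logical CPU."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs requests concurrently but returns results in request order.
        return list(pool.map(handle_request, requests))


# Usage: each request touches a distinct block, so this toy needs no locking.
writes = [("write", b, bytes([b % 256]) * BLOCK_SIZE) for b in range(64)]
serve(writes)
reads = serve([("read", b, None) for b in range(64)])
print(reads[5][2][:4])  # b'\x05\x05\x05\x05'
```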

DataCore likes SPC-1. It generates a reasonable workload and simulates the virtual machine environments so common today. But Mr. Hollis would be mistaken in believing that the DataCore approach is confined to this segment. The next big focus of our work will be on analytics, which lies at the other end of this workload spectrum. We expect to achieve a similar result in an entirely dissimilar environment.
The irony in Mr. Hollis' comments is that Oracle was an early pioneer and practitioner of SMP programming and made important contributions in that area.

...
DRAM usage
DataCore's Eric Wendel, Director for Technical Ecosystem Development, added this fascinating fact: "We actually only used 1.25TB (per server node) for the DRAM (2.5TB total for both nodes) to get 5.1 million IOPS, while Huawei used 4.0TB [in total] to get 3 million IOPS."

Although 1.536TB of memory was fitted to each server, only 1.25TB was actually configured for DataCore's Parallel Server (see the full disclosure report). This means DataCore used 2.5TB of DRAM in total for 5 million IOPS, compared to Huawei's 4TB for 3 million IOPS...
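Wendel's comparison can be reduced to IOPS delivered per terabyte of DRAM. The short Python sketch below uses the quoted memory figures: 1.25TB per node across two nodes for DataCore, and 4TB in total for Huawei. It also uses the SPC-1 IOPS reported for each system. It is plain arithmetic on those numbers, not an official SPC metric.

```python
def iops_per_tb(iops: float, dram_tb: float) -> float:
    """SPC-1 IOPS delivered per terabyte of DRAM (simple ratio, not an SPC metric)."""
    return iops / dram_tb


# DataCore: 1.25 TB configured per node, two nodes (Eric Wendel's figures)
datacore = iops_per_tb(5_120_098.98, 1.25 * 2)
# Huawei OceanStor 18800 V3: 4.0 TB of DRAM cache in total, as quoted
huawei = iops_per_tb(3_010_007.37, 4.0)

print(f"DataCore: {datacore:,.0f} SPC-1 IOPS per TB")  # 2,048,040
print(f"Huawei:   {huawei:,.0f} SPC-1 IOPS per TB")    # 752,502
print(f"Ratio:    {datacore / huawei:.2f}x")           # 2.72x
```

On these inputs DataCore delivers roughly 2.7 times as many SPC-1 IOPS per terabyte of DRAM as the Huawei system.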

Monday 27 June 2016

DataCore takes World Record for Performance and the SPC-1 crown

Storage vendor DataCore has established a record for the SPC-1 benchmark, blowing the doors off the previous top performers despite its use of commodity hardware.
A pair of Lenovo X3650 M5 servers running the DataCore Parallel Server software has achieved 5,120,098.98 SPC-1 IOPS.
The previous top performers on this benchmark were the Huawei OceanStor 18800 V3 (3,010,007.37 SPC-1 IOPS) and the Hitachi VSP G1000 (2,004,941.89 SPC-1 IOPS).
While those two systems cost in excess of US$2 million, the DataCore-based system cost just over US$506,000.
Two other DataCore systems are in the SPC-1 top 10: a single node configuration of DataCore Parallel Server (1,150,090 SPC-1 IOPS for US$137,000) and the DataCore SANsymphony HA-FC (1,201,961 SPC-1 IOPS for US$115,000).
No other vendor in the top 10 comes close to matching DataCore's average response time under full load. The three systems managed 0.28, 0.10 and 0.22ms respectively. The only others with sub-millisecond response times were the Huawei (0.92ms) and Hitachi (0.96ms) systems mentioned above.
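SPC-1 price-performance is total price divided by SPC-1 IOPS. The Python sketch below applies that ratio to the figures quoted above. Two DataCore prices are given only in rounded form ("just over US$506,000" and "US$137,000"), so those results are approximate. The SANsymphony HA-FC price uses the exact US$115,142.76 quoted further down. The Huawei and Hitachi prices are given only as "in excess of US$2 million", so their figures are lower bounds. The sketch also checks the claim that the record beats the previous top two results combined.

```python
# (SPC-1 IOPS, total price in US$) as quoted in the text.
results = {
    "DataCore Parallel Server (2 nodes)": (5_120_098.98, 506_000.00),  # rounded price
    "DataCore Parallel Server (1 node)": (1_150_090.00, 137_000.00),   # rounded price
    "DataCore SANsymphony HA-FC": (1_201_961.00, 115_142.76),
}


def price_per_iops(iops: float, price_usd: float) -> float:
    """Price-performance: total price divided by SPC-1 IOPS."""
    return price_usd / iops


for name, (iops, price) in results.items():
    print(f"{name:36s} ${price_per_iops(iops, price):.4f} per SPC-1 IOPS")

# Only "in excess of US$2 million" is stated, so US$2M yields a lower bound.
competitors = {
    "Huawei OceanStor 18800 V3": 3_010_007.37,
    "Hitachi VSP G1000": 2_004_941.89,
}
for name, iops in competitors.items():
    print(f"{name:36s} >= ${price_per_iops(iops, 2_000_000):.4f} per SPC-1 IOPS")

# "Faster than the previous top two leaders combined"
combined = sum(competitors.values())
top = results["DataCore Parallel Server (2 nodes)"][0]
print(f"Top two combined: {combined:,.2f}; record exceeds it by {top - combined:,.2f}")
```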
DataCore's high performance comes from taking full advantage of the parallelism available in modern multi-core CPUs, explained vice president of APAC sales Jamie Humphrey.
"We're redefining not only how storage works, but the economies inside the data centre," he told iTWire.
What other vendors deliver in 48 or 72U of rack space, a DataCore-based system can provide in 14U, he said.
DataCore's approach makes high performance storage available to midmarket organisations as well as large enterprises, ANZ regional sales director Marco Marinelli told iTWire. Furthermore, the company offers a "highly mature product" currently on version 10.
Not every customer needs 5.1 million IOPS, but most would like the reduced latency that comes from being able to fully utilise Fibre Channel's performance. Humphrey gave the example of a mid-sized organisation that just wants faster database access. With conventional systems it would need to over-engineer the storage to get the required response time, but DataCore provides "a very adaptive architecture" that can accommodate various workloads.
Customers need the flexibility to buy what they need, not what they're told they can buy, he said.
And where implementing software-defined storage is usually seen as a "rip and replace" project, that's not the case with DataCore, which can be used to augment an existing environment, bringing together various point solutions in a way their vendors cannot manage.
DataCore has hardware alliances with server, networking and storage vendors, said Humphrey, and publishes reference architectures for assembling the various products.

DataCore Sets Record-Breaking Hyper-Converged Performance With Multi-node Highly Available Server SAN

http://www.storagenewsletter.com/rubriques/software/datacore-up-scales-record-breaking-hyper-converged-performance-with-multi-node-highly-available-server-san/

The new record-breaking performance results for a hyper-converged solution demonstrate the effectiveness of Parallel I/O technology in harnessing the untapped power of multi-core processors and disrupting the status quo of the storage industry.


The Results Speak for Themselves
Using the industry's most recognized storage benchmark for driving enterprise database workloads, the Storage Performance Council's SPC-1, the company took on the classic high-end external storage arrays with a fully redundant, dual-node FC Server SAN solution, running its SANsymphony software-defined storage services platform on a pair of off-the-shelf Intel-based servers.


"With our first SPC-1 Price-Performance record [1], we set out to prove what could be accomplished with Parallel I/O in a single server, combining the benchmark's database workload and our storage stack in a single, atomic, hyper-converged system," said Ziya Aral, chairman, DataCore. "Now just a few months later, we are showcasing our progress and the effectiveness of multi-node scaling."

The new results put DataCore SANsymphony at number five on the SPC-1 Top Ten List for Performance[4], ranking only behind million-dollar mega-arrays including Huawei, Hitachi, HP XP7, and Kaminario, as well as DataCore's own Parallel Server hyper-converged configuration. The total price for the DataCore hyper-converged high-availability solution was $115,142.76, including three years of support. 

"We see the Server SAN architecture at the intersection of the hyperscale, convergence and flash trends. Storage intelligence has been moving back adjacent to compute, and Server SANs should be deployed as a best practice to enable low latency, high bandwidth and high availability in enterprise applications," said David Floyer, Chief Technology Officer of Wikibon. "The move to Server SAN architectures (aka hyper-converged infrastructure) has simplified operations by creating repeatable rack level deployments. DataCore with Parallel I/O software is demonstrating why these powerful multicore rack servers are becoming the basis for driving new levels of system performance and price-performance, and is a foundation for next generation system architecture." 

Aral adds, "With these new benchmark results, we up-scaled the configuration to two nodes, connected by Fibre Channel fabric, and reconfigured it for full high-availability with mirrored everything (mass storage, cache and software). Our objective was not only to set the performance record for hyper-converged systems, but to establish the corners of our performance envelope for the purposes of sizing and configuring what is otherwise an extremely flexible Software-Defined Storage scheme." 

Unlike competitive solutions, DataCore's mirroring comes standard with the capability to support local and stretched/metro clusters with automatic failover and failback protection across active-active synchronized copies of data located in geographically separate locations.  

Size Matters: Compact Server SAN for Lowest Total Cost of Ownership
In terms of lowering the total cost of ownership, it is also important to look at environmental and space considerations. Competitive storage solutions take up multiple 42U racks and many square feet of floor space, whereas the DataCore configuration occupies a mere fraction (12U) of one rack. 


Unlike traditional storage arrays, the DataCore nodes had the combined responsibility for running the database workloads and handling their I/O demands in the same servers - a much more challenging scenario.  DataCore's compact Server SAN solution collapses the infrastructure needed and significantly reduces the networking and administrative complexity and cost. It can also be non-disruptively upgraded at any time with more powerful servers and storage technology available in the open market. 

SANsymphony software availability and benchmark report details
DataCore's latest software release has improved the performance of the SANsymphony and Hyper-converged Virtual SAN software solutions by up to 50%. The new software used as the basis for the SPC-1 benchmarking is available in June at no charge to current customers under support contracts.

The SPC-1 performance testing is designed to demonstrate a system's performance capabilities for business-critical enterprise workloads typically found in database and transaction processing environments. The audited configuration that was tested and priced includes SANsymphony Parallel I/O software on two standard Intel-based servers.

For complete configuration, pricing and performance details, see the SPC-1 Full Disclosure report.

Wednesday 1 June 2016

The Register: DataCore Parallel IO speed tweak may rewrite benchmark plus more VVol goodness


DataCore has accelerated its Parallel IO technology's performance by 50 per cent with a v10 PSP5 software release, when run in its SANsymphony and Hyper-converged Virtual SAN software products.
...Previous PSP5 software has given DataCore impressive SPC-1 benchmark results.
V10 PSP5 sees the maximum cache rise from 1TB to 8TB per node. We understand bigger RAM caches accelerate applications such as larger databases, where the amount of data being actively referenced typically exceeds 1TB. Both reads and writes to/from SSDs and disk benefit from larger caches.
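A toy least-recently-used (LRU) cache simulation in Python shows why a bigger cache helps once the actively referenced data outgrows it. The workload is entirely hypothetical: 20,000 accesses over 1,000 blocks, with 70% of accesses aimed at a hot set of 100 blocks. LRU is a stand-in eviction policy, since the article does not say which policy SANsymphony uses. Cache capacity is stepped up eightfold, loosely mirroring the 1TB-to-8TB change.

```python
import random
from collections import OrderedDict


def lru_hits(trace, capacity):
    """Count hits for an LRU cache holding `capacity` blocks over an access trace."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)        # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used block
    return hits


# Hypothetical skewed workload: 70% of accesses go to a hot set of 100 blocks.
rng = random.Random(42)
working_set = 1000
trace = [rng.randrange(100) if rng.random() < 0.7 else rng.randrange(working_set)
         for _ in range(20_000)]

for capacity in (125, 250, 500, 1000):
    ratio = lru_hits(trace, capacity) / len(trace)
    print(f"cache={capacity:5d} blocks  hit ratio={ratio:.1%}")
```

Because LRU is a stack algorithm, the hit count for a given trace can only stay the same or rise as capacity grows. Once the cache holds the whole working set, only first-touch (compulsory) misses remain.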

DataCore has also added support for QLogic 32 Gbit/s Fibre Channel host bus adapters (HBAs), allowing more concurrent requests to be serviced over the same physical channel.
Server administrators can now create policies and self-provision storage to suit their needs using VMware vSphere Virtual Volumes (VVols) and Microsoft System Center Virtual Machine Manager (VMM).

DataCore says it's the only certified software vendor for VVols, and its capabilities allow VVols to work universally across all types of storage (disk subsystems, flash/SSD arrays, DAS, etc).

PSP5 includes enhanced support for VVols storage policy-based management (SPBM) using multi-tiered storage pools. Virtual disk templates can be tailored to establish different classes of service (storage profiles) that the vSphere administrators can choose from when creating virtual machines (VMs) or adding disks to those VMs.

The product also has richer performance-monitoring, charting and capacity-planning tools, plus finer-grained control over QoS and administrative access privileges at the virtual disk level.

SANsymphony adds support for lower cost, bulk storage for cold and archival data, such as the public cloud. In all, it supports up to 15 storage tiers.

Reg comment

El Reg expects DataCore and hardware partner Lenovo to come out with new SPC-1 benchmark-busting results using the PSP5 v10 software. The string of SPC-1 results by DataCore with its parallelization IO technology, combined with DDN's IME technology, must surely be causing all server SAN and hyper-converged vendors to investigate the technology as a way of getting a sure-fire IO performance boost.

How else can they compete with DataCore's price/performance?

...The PSP5 v10 enhancements will be generally available in June. Current DataCore customers can upgrade at no charge under their existing software update service and support contracts.