EMR + Windjammer Spark Accelerator
Industry leading SQL query throughput per vCPU
= Transparent Acceleration and Reduction in Slaves
Why EMR Spark Acceleration?
EMR Spark is used at massive scale
Spark’s JVM-based execution is very CPU intensive, causing server sprawl, performance instability, and management challenges
Spark does not fully exploit the high bandwidth of today’s cloud storage systems, causing long query run times
Standard Spark fault tolerance requires persisting data at shuffle boundaries, motivating complex and expensive shuffle services
Windjammer EMR Spark Accelerator
More efficient use of expensive CPU resources: Native execution on an MPP (massively parallel processing) clustered dataflow architecture eliminates JVM bottlenecks
Fully exploits S3 cloud storage bandwidth: Aggressive parallel, asynchronous prefetch of analytics data sets
Eliminates need for complex shuffle service: Checkpoint-based fault tolerance uses reliable, high-bandwidth S3 cloud storage, so no special shuffle service is needed, while queries remain fully fault tolerant, including across spot instance and cluster interruptions
Transparent, 100% compatible
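To illustrate what "parallel, asynchronous prefetch" means in general terms, here is a minimal sketch. It is not Windjammer's implementation, which the slides do not describe. The names `prefetching_reader`, `fake_fetch`, and the `depth` parameter are hypothetical, and `fake_fetch` stands in for a real object-store read. The idea is to keep several reads in flight so the consumer rarely waits on storage latency.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def prefetching_reader(
    keys: Iterable[str],
    fetch: Callable[[str], bytes],
    depth: int = 4,
) -> Iterator[bytes]:
    """Yield fetch(key) for each key, in order, with up to `depth` reads in flight."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque()
        for key in keys:
            # Start the read in the background before the consumer asks for it.
            pending.append(pool.submit(fetch, key))
            if len(pending) >= depth:
                # Hand back the oldest read; later reads keep running meanwhile.
                yield pending.popleft().result()
        # Drain whatever is still in flight once all keys are submitted.
        while pending:
            yield pending.popleft().result()


def fake_fetch(key: str) -> bytes:
    # Hypothetical stand-in for an object-store GET request.
    return key.encode("utf-8")


chunks = list(prefetching_reader(["part-0", "part-1", "part-2"], fake_fetch, depth=2))
```

Results come back in submission order, so downstream processing sees the same sequence it would with a plain sequential read.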
Use Much Less CPU
CPU Seconds/Query (Lower is better)
Fully Exploit S3 Storage Bandwidth
Gbps/slave (Higher is better)
Large Reduction in Run Times and EC2 Nodes
Cut TCO to 1/3
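As a rough sanity check, the 1/3 figure lines up with the low end of the 3x-5x throughput per vCPU range quoted in this deck, under the simplifying assumption that cost scales linearly with node count at a fixed workload. The sketch below works through that arithmetic. The 30-node baseline and the function names are hypothetical, and real TCO includes costs beyond compute.

```python
import math


def nodes_needed(baseline_nodes: int, speedup_per_vcpu: float) -> int:
    """Nodes required for the same workload if each vCPU does `speedup_per_vcpu` times more work."""
    return math.ceil(baseline_nodes / speedup_per_vcpu)


def relative_cost(baseline_nodes: int, speedup_per_vcpu: float) -> float:
    """Compute cost as a fraction of the baseline, assuming cost is proportional to node count."""
    return nodes_needed(baseline_nodes, speedup_per_vcpu) / baseline_nodes


# Hypothetical 30-node baseline cluster.
low_end = relative_cost(30, 3.0)   # 10 of 30 nodes remain
high_end = relative_cost(30, 5.0)  # 6 of 30 nodes remain
```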
If you are interested in...
Running Spark jobs faster with significant cost savings
3x-5x increase in query throughput/vCPU to eliminate server sprawl
Making full use of S3 cloud storage bandwidth to speed up your Spark jobs
Eliminating the cost and hassle of a shuffle service
Achieving all this with no modifications to your existing Spark jobs
Doing all this with a simple bootstrap extension in EMR cluster creation
You will be interested in deploying EMR with Windjammer’s Spark Accelerator
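For readers unfamiliar with EMR bootstrap actions, the sketch below shows how one is attached at cluster creation, using the request shape accepted by boto3's `run_job_flow`. The script path `s3://example-bucket/install-accelerator.sh`, the cluster name, the instance types, and the release label are placeholder assumptions. The slides do not give the actual installer location or its arguments.

```python
def build_cluster_request(bootstrap_script_s3_path: str, worker_count: int) -> dict:
    """Build an EMR cluster request that runs one bootstrap script on every node."""
    return {
        "Name": "spark-accelerated-cluster",  # placeholder name
        "ReleaseLabel": "emr-6.15.0",         # assumed release; use the one you run
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "workers", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": worker_count},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # The bootstrap action runs on each node before Spark starts.
        "BootstrapActions": [
            {"Name": "install-accelerator",
             "ScriptBootstrapAction": {"Path": bootstrap_script_s3_path, "Args": []}},
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def submit(request: dict) -> str:
    """Create the cluster; needs boto3 and AWS credentials, so it is not called here."""
    import boto3
    return boto3.client("emr").run_job_flow(**request)["JobFlowId"]


# Hypothetical script location, for illustration only.
request = build_cluster_request("s3://example-bucket/install-accelerator.sh", worker_count=4)
```

Because the change sits in the cluster definition, existing Spark job submissions stay as they are.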
We look forward to working with you!