
AWS Award: Finalist, 2021 AWS Startup Architecture of the Year Competition

Updated: Mar 3, 2023

EMR + Windjammer Spark Accelerator

Industry-leading SQL query throughput per vCPU

= Transparent Acceleration and Reduction in Slaves

Why EMR Spark Acceleration?

  • Massive scale of use of EMR Spark

  • Spark’s JVM-based execution is very CPU-intensive, causing server sprawl, performance instability, and management challenges

  • Spark does not fully exploit the high bandwidth of today’s cloud storage systems, leading to long query run times

  • Standard Spark fault tolerance requires persisting data at shuffle boundaries, motivating complex and expensive shuffle services

Windjammer EMR Spark Accelerator

  • More efficient use of expensive CPU resources: Native execution and an MPP (massively parallel processing) clustered dataflow architecture eliminate JVM bottlenecks

  • Fully exploits S3 cloud storage bandwidth: Aggressive parallel, asynchronous prefetch of analytics data sets

  • Eliminates need for complex shuffle service: Checkpoint-based fault tolerance uses reliable, high-bandwidth S3 cloud storage, so no special shuffle service is needed, while providing full query fault tolerance, including across spot instance and cluster interruptions

  • Transparent, 100% compatible
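
To make "parallel, asynchronous prefetch" concrete, here is a minimal Python sketch of the general technique. It is not Windjammer's implementation, which is native and not shown on this page. The names `prefetch` and `fake_fetch` and the `depth` parameter are hypothetical, and `fake_fetch` only simulates an object-store read with a short sleep.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator


def prefetch(keys: Iterable[str],
             fetch: Callable[[str], bytes],
             depth: int = 8) -> Iterator[bytes]:
    """Yield fetch(key) for each key, in order, keeping up to `depth` reads in flight."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque()
        for key in keys:
            # Start the read now; the consumer is still busy with earlier chunks.
            pending.append(pool.submit(fetch, key))
            if len(pending) >= depth:
                # Hand over the oldest chunk while newer reads keep running.
                yield pending.popleft().result()
        # Drain the reads that are still outstanding.
        while pending:
            yield pending.popleft().result()


def fake_fetch(key: str) -> bytes:
    """Hypothetical stand-in for an object-store GET (assumption: about 10 ms latency)."""
    time.sleep(0.01)
    return key.encode()


chunks = list(prefetch([f"part-{i}" for i in range(20)], fake_fetch, depth=4))
```

Because several reads overlap, the consumer waits on storage latency roughly once per `depth` chunks instead of once per chunk, which is the basic reason prefetching helps a query make use of available storage bandwidth.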

Use Much Less CPU

CPU Seconds/Query (Lower is better)

Fully Exploit S3 Storage Bandwidth

Gbps/slave (Higher is better)

Large Reduction in Run Times and EC2 Nodes

Cut TCO to 1/3

If you are interested in...

  • Running Spark jobs faster with significant cost savings

  • 3x-5x increase in query throughput/vCPU to eliminate server sprawl

  • Making full use of S3 cloud storage bandwidth to speed up your Spark jobs

  • Eliminating the cost and hassle of a shuffle service

  • Achieving all this with no modifications to your existing Spark jobs

  • Doing all this with a simple bootstrap extension in EMR cluster creation

You will be interested in deploying EMR with Windjammer’s Spark Accelerator
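
As a rough illustration of what "a simple bootstrap extension in EMR cluster creation" can look like, the sketch below builds a cluster request containing one bootstrap action. The script path, cluster name, instance types, and release label are all assumptions chosen for illustration. The actual Windjammer bootstrap script and its arguments are not given on this page.

```python
def emr_cluster_request(bootstrap_script: str, instance_count: int = 3) -> dict:
    """Build keyword arguments in the shape of boto3's EMR `run_job_flow` call."""
    return {
        "Name": "spark-accelerated-cluster",      # hypothetical cluster name
        "ReleaseLabel": "emr-6.10.0",             # assumed EMR release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",    # assumed instance types
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": instance_count,
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # The bootstrap action runs on every node before applications start.
        "BootstrapActions": [
            {
                "Name": "install-accelerator",
                "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical S3 location; substitute the script supplied by the vendor.
request = emr_cluster_request("s3://example-bucket/bootstrap/install-accelerator.sh")

# With boto3 installed and AWS credentials configured, the cluster could be created with:
#   import boto3
#   boto3.client("emr").run_job_flow(**request)
```

Existing Spark jobs are submitted to the resulting cluster unchanged. Only the cluster definition gains the extra bootstrap action.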

We look forward to working with you!


