Elastic MapReduce (EMR)

Amazon Elastic MapReduce (EMR) is a cloud-native big data platform that simplifies running big data frameworks, such as Apache Hadoop, Apache Spark, and Presto, on AWS. EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.

  • Managed Hadoop framework
  • Also supports Apache Spark, Presto, and other big data frameworks
  • Most commonly used for log analysis, data transformation, and machine learning
  • A step is a programmatic task submitted to a cluster to perform a specific action, such as installing software or running a script
  • A cluster is a collection of Amazon EC2 instances that run the Hadoop framework and other big data tools
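To make the cluster and step concepts concrete, here is a minimal sketch that builds the request body for a small cluster with one Spark step. The dictionary keys follow the parameters of boto3's EMR `run_job_flow` call. The helper names, bucket path, instance types, release label, and role names are assumptions chosen for illustration; adjust them to your account.

```python
def build_spark_step(name, script_uri):
    """Describe one step: a unit of work the cluster runs after it starts."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step invoke a command such as spark-submit
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


def build_cluster_request(name, steps, core_count=2):
    """Describe a cluster: a group of EC2 instances plus the steps to run on it."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",          # assumed release label
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_count},
            ],
            # Terminate the cluster once all steps have finished
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": steps,
        "JobFlowRole": "EMR_EC2_DefaultRole",  # assumed default role names
        "ServiceRole": "EMR_DefaultRole",
    }


# Hypothetical script location, used only for illustration
request = build_cluster_request(
    "log-analysis",
    [build_spark_step("parse-logs", "s3://example-bucket/scripts/parse_logs.py")],
)
```

With AWS credentials configured, this dictionary could be submitted with `boto3.client("emr").run_job_flow(**request)`. Building the request separately keeps the sketch runnable without an AWS account.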