Elastic MapReduce (EMR)
Amazon EMR (formerly Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks such as Apache Hadoop, Apache Spark, and Presto on AWS. It provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances.
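Hadoop processes data with the MapReduce model: a map phase emits key/value pairs, a shuffle groups the pairs by key, and a reduce phase aggregates each group. The snippet below is a minimal single-process Python sketch of that model, counting HTTP status codes in a few hypothetical log lines. On EMR, Hadoop runs these same phases in parallel across the cluster's EC2 instances. The log format and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) are illustrative assumptions, not part of any EMR or Hadoop API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Emit (status_code, 1) per line; assumes (hypothetically) that the
    # status code is the last whitespace-separated field of each log line.
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[-1], 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all values by key, as Hadoop does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Aggregate each key's values into a single count.
    return {key: sum(values) for key, values in groups.items()}


logs = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /api 200",
]
counts = reduce_phase(shuffle_phase(map_phase(logs)))
print(counts)  # {'200': 2, '404': 1}
```

In a real Hadoop job the map and reduce functions run on many nodes at once, and the framework performs the shuffle over the network.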
- Managed Hadoop framework
- Also supports Apache Spark, Presto, and other big data frameworks
- Common uses include log analysis, data transformation (ETL), and machine learning
- A step is a unit of work submitted to a cluster, such as a Hadoop job, a Spark application, or a script; installing software when a cluster launches is normally handled by bootstrap actions rather than steps
- A cluster is a collection of Amazon EC2 instances that run the Hadoop framework and other big data tools
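Steps are usually submitted programmatically. The sketch below builds one step definition in the dictionary shape that boto3's EMR client accepts in the `Steps` list of `add_job_flow_steps`, using `command-runner.jar` to run `spark-submit`. It only constructs and validates the dictionary and does not contact AWS. The step name, the S3 script path, and the helper `build_spark_step` are hypothetical, and the cluster ID in the commented-out call is a placeholder.

```python
from typing import Any

# ActionOnFailure values accepted by the EMR AddJobFlowSteps API.
ACTIONS_ON_FAILURE = {"TERMINATE_JOB_FLOW", "TERMINATE_CLUSTER", "CANCEL_AND_WAIT", "CONTINUE"}


def build_spark_step(name: str, script_uri: str,
                     action_on_failure: str = "CONTINUE") -> dict[str, Any]:
    """Build one entry for the Steps list of boto3's EMR add_job_flow_steps (hypothetical helper)."""
    if action_on_failure not in ACTIONS_ON_FAILURE:
        raise ValueError(f"unsupported ActionOnFailure: {action_on_failure}")
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            # command-runner.jar executes the given command (here spark-submit) on the cluster.
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }


# Hypothetical step name and S3 path, for illustration only.
step = build_spark_step("nightly-etl", "s3://example-bucket/scripts/etl.py")
print(step["HadoopJarStep"]["Args"])

# Submitting the step needs AWS credentials and a running cluster (not executed here):
# import boto3
# emr = boto3.client("emr")
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Validating `ActionOnFailure` locally catches typos before the request reaches AWS. `CONTINUE` lets later steps run even if this one fails.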