I've been working on a more efficient way of deploying EMR (Elastic MapReduce) clusters for "Big Data" processing using applications that come as part of the Hadoop suite. Originally, I was just using a parameterised Jenkins job with a lengthy AWS CLI command, but that became harder to maintain the more functionality I added to it. I won't share it with you, as it was 100+ lines long. I then came across AWS CloudFormation, which makes the deployments easy to build and maintain.
AWS CloudFormation enables you to create and manage AWS resources using Infrastructure as Code (I've attached the link below to the AWS CloudFormation Product & Service page for more information). While investigating how I could script EMR in CloudFormation, I noticed there were not many resources available online for building a template that suited a tailor-made EMR cluster. I tried tools such as CloudForm and the CloudFormation template designer to build the template, but had no luck. In the end, I took the most basic EMR template available on the AWS Knowledge Base and built on top of it. Feel free to use it; I'll also keep it updated as I add to it.
---
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template to spin up EMR clusters V3 (Version 5 of EMR only)
Parameters:
  clusterName:
    Description: Name of the cluster
    Type: String
  taskInstanceCount:
    Description: Number of task instances
    Type: String
    AllowedValues:
    - '1'
    - '2'
    - '3'
    - '4'
    - '5'
    - '6'
    - '7'
    ConstraintDescription: Up to 7 nodes only
  emrVersion:
    Description: Version of EMR
    Type: String
    AllowedPattern: 'emr-5\.[0-9]+\.[0-9]+'
    ConstraintDescription: 'Must be EMR Version 5 (e.g. emr-5.3.0)'
  masterInstanceType:
    Description: Instance type of Master Node
    Type: String
  coreInstanceType:
    Description: Instance type of Core Node
    Type: String
  taskInstanceType:
    Description: Instance type of Task Node
    Type: String
  environmentType:
    Description: What environment do you want the cluster to be in
    Type: String
  s3BucketBasePath:
    Description: Bucket to log EMR actions to
    Type: String
  taskBidPrice:
    Description: Bid price for Task nodes
    Type: String
  terminationProtected:
    Description: Is the cluster to have termination protection enabled
    Type: String
    AllowedValues:
    - 'true'
    - 'false'
    ConstraintDescription: Boolean
  awsRegion:
    Description: awsRegion
    Default: eu-west-1
    AllowedValues:
    - eu-west-1
    - eu-central-1
    Type: String
Conditions:
  isLive:
    Fn::Equals:
    - Ref: environmentType
    - live
Resources:
  EMRClusterV5:
    Type: AWS::EMR::Cluster
    Properties:
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType:
            Ref: masterInstanceType
          Market: ON_DEMAND
          Name: Master instance group - 1
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType:
            Ref: coreInstanceType
          Market: ON_DEMAND
          Name: Core instance group - 2
        TerminationProtected:
          Ref: terminationProtected
        Ec2SubnetId: ENTER SUBNET HERE
        Ec2KeyName: ENTER NAME OF SSH KEY HERE
        EmrManagedMasterSecurityGroup: ENTER SECURITY GROUP HERE
        EmrManagedSlaveSecurityGroup: ENTER SECURITY GROUP HERE
        ServiceAccessSecurityGroup: ENTER SECURITY GROUP HERE
      BootstrapActions:
      - Name: NAME OF BOOTSTRAP
        ScriptBootstrapAction:
          Path: S3 LOCATION OF SHELL SCRIPT
      Configurations:
      - Classification: hadoop-log4j
        ConfigurationProperties:
          hadoop.log.maxfilesize: 256MB
          hadoop.log.maxbackupindex: '3'
          hadoop.security.log.maxfilesize: 256MB
          hadoop.security.log.maxbackupindex: '3'
          hdfs.audit.log.maxfilesize: 256MB
          hdfs.audit.log.maxbackupindex: '3'
          mapred.audit.log.maxfilesize: 256MB
          mapred.audit.log.maxbackupindex: '3'
          hadoop.mapreduce.jobsummary.log.maxfilesize: 256MB
          hadoop.mapreduce.jobsummary.log.maxbackupindex: '3'
      - Classification: hbase-log4j
        ConfigurationProperties:
          hbase.log.maxbackupindex: '3'
          hbase.log.maxfilesize: 10MB
          hbase.security.log.maxbackupindex: '3'
          hbase.security.log.maxfilesize: 10MB
      - Classification: yarn-site
        ConfigurationProperties:
          yarn.log-aggregation.retain-seconds: '43200'
      Applications:
      - Name: Hadoop
      - Name: Hive
      - Name: Pig
      - Name: Hue
      - Name: HCatalog
      - Name: Sqoop
      - Name: Ganglia
      - Name: Spark
      - Name: Oozie
      - Name: Tez
      Name:
        Ref: clusterName
      JobFlowRole: ENTER EMR ROLE HERE
      ServiceRole: ENTER EMR ROLE HERE
      ReleaseLabel:
        Ref: emrVersion
      LogUri:
        Fn::Join:
        - ''
        - - s3n://
          - Ref: s3BucketBasePath
          - "/logs/"
      VisibleToAllUsers: true
      Tags:
      - Key: Name
        Value:
          Fn::Join:
          - ''
          - - emr-instance-
            - Ref: AWS::StackName
            - ''
      - Key: Environment
        Value:
          Ref: environmentType
      - Key: Stack ID
        Value:
          Ref: AWS::StackName
  EMRTaskNodes:
    Type: AWS::EMR::InstanceGroupConfig
    Properties:
      InstanceCount:
        Ref: taskInstanceCount
      InstanceType:
        Ref: taskInstanceType
      BidPrice:
        Ref: taskBidPrice
      Market: SPOT
      InstanceRole: TASK
      Name: Task instance group - 3
      JobFlowId:
        Ref: EMRClusterV5
To deploy the stack, you would use the following command:
aws cloudformation create-stack --stack-name [STACK NAME] \
--template-url [LOCATION OF TEMPLATE] --parameters \
ParameterKey=clusterName,ParameterValue=$stackName \
ParameterKey=taskInstanceCount,ParameterValue=$taskNodeCount \
ParameterKey=coreInstanceType,ParameterValue=$coreNodeInstanceType \
ParameterKey=taskInstanceType,ParameterValue=$taskNodeInstanceType \
ParameterKey=emrVersion,ParameterValue=$emrVersion \
ParameterKey=environmentType,ParameterValue=$environmentType \
ParameterKey=masterInstanceType,ParameterValue=$masterNodeInstanceType \
ParameterKey=s3BucketBasePath,ParameterValue=$s3BucketBasePath \
ParameterKey=terminationProtected,ParameterValue=$terminationProtected \
ParameterKey=taskBidPrice,ParameterValue=$bidPrice --region $awsRegion
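Because CloudFormation only rejects bad parameter values after the call has been made, it can help to check the shell variables locally first. The function below is a minimal sketch rather than part of the original scripts: the name validate_inputs is made up, it only covers the three parameters the template constrains, and the EMR version check is a loose glob rather than the template's exact pattern.

```shell
# Hypothetical pre-flight check for the variables used by create-stack.
# It loosely mirrors the template's constraints so mistakes surface
# before any AWS call is made.
validate_inputs() {
  # taskInstanceCount: the template allows the values 1 to 7 only.
  case "$taskNodeCount" in
    [1-7]) ;;
    *) echo "taskNodeCount must be between 1 and 7" >&2; return 1 ;;
  esac

  # emrVersion: rough glob check for an emr-5.x.y release label.
  case "$emrVersion" in
    emr-5.[0-9]*.[0-9]*) ;;
    *) echo "emrVersion must look like emr-5.3.0" >&2; return 1 ;;
  esac

  # terminationProtected: the template allows 'true' or 'false' only.
  case "$terminationProtected" in
    true|false) ;;
    *) echo "terminationProtected must be true or false" >&2; return 1 ;;
  esac
}
```

Calling `validate_inputs || exit 1` just before the create-stack command would stop the job early when one of these values is wrong or unset.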
To update the stack (e.g. the number of task nodes):
aws cloudformation update-stack --stack-name [STACK NAME] \
--use-previous-template --parameters \
ParameterKey=clusterName,UsePreviousValue=true \
ParameterKey=taskInstanceCount,ParameterValue=$taskNodeCount \
ParameterKey=coreInstanceType,UsePreviousValue=true \
ParameterKey=taskInstanceType,UsePreviousValue=true \
ParameterKey=emrVersion,UsePreviousValue=true \
ParameterKey=environmentType,UsePreviousValue=true \
ParameterKey=masterInstanceType,UsePreviousValue=true \
ParameterKey=s3BucketBasePath,UsePreviousValue=true \
ParameterKey=terminationProtected,UsePreviousValue=true \
ParameterKey=taskBidPrice,UsePreviousValue=true --region $awsRegion
The "--use-previous-template" switch and the "UsePreviousValue=true" setting on each parameter ensure nothing else changes.
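Listing every unchanged parameter by hand gets repetitive, so a small helper could generate the list instead. This is only a sketch under a few assumptions: build_update_params and ALL_PARAMS are made-up names, the parameter list is copied from the commands above, and values are assumed to contain no spaces.

```shell
# Hypothetical helper: print one ParameterKey entry per template parameter.
# Parameters passed as name=value get a new value; all others keep the
# value already stored on the stack.
ALL_PARAMS="clusterName taskInstanceCount coreInstanceType taskInstanceType emrVersion environmentType masterInstanceType s3BucketBasePath terminationProtected taskBidPrice"

build_update_params() {
  for name in $ALL_PARAMS; do
    found=""
    for override in "$@"; do
      # Split "name=value" at the first '=' and compare the name part.
      if [ "${override%%=*}" = "$name" ]; then
        value="${override#*=}"
        found="yes"
      fi
    done
    if [ -n "$found" ]; then
      printf 'ParameterKey=%s,ParameterValue=%s\n' "$name" "$value"
    else
      printf 'ParameterKey=%s,UsePreviousValue=true\n' "$name"
    fi
  done
}

# Example use (left unquoted on purpose so each entry becomes one argument):
# aws cloudformation update-stack --stack-name "$stackName" \
#   --use-previous-template --region "$awsRegion" \
#   --parameters $(build_update_params taskInstanceCount=3)
```

With this, changing the task node count only needs the one override on the command line, and the other nine entries are filled in as UsePreviousValue=true.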
Finally, to delete the stack:
aws cloudformation update-stack --stack-name [STACK_NAME] \
--use-previous-template --parameters \
ParameterKey=clusterName,UsePreviousValue=true \
ParameterKey=taskInstanceCount,UsePreviousValue=true \
ParameterKey=coreInstanceType,UsePreviousValue=true \
ParameterKey=taskInstanceType,UsePreviousValue=true \
ParameterKey=emrVersion,UsePreviousValue=true \
ParameterKey=environmentType,UsePreviousValue=true \
ParameterKey=masterInstanceType,UsePreviousValue=true \
ParameterKey=s3BucketBasePath,UsePreviousValue=true \
ParameterKey=terminationProtected,ParameterValue=false \
ParameterKey=taskBidPrice,UsePreviousValue=true --region $awsRegion
sleep 20
aws cloudformation delete-stack --stack-name [STACK_NAME] --region [REGION]
The first command updates the stack, setting the termination protection value to 'false'. Once that has completed, the stack is deleted.
In conclusion, we've replaced a script of 100+ lines of code with commands that average 14 lines each (if you want to include line continuation).
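A fixed sleep may not be long enough for the update to finish. One alternative is to use the AWS CLI waiters, which poll until the stack reaches the expected state. The wrapper below is a sketch, not the original script: delete_protected_stack is a made-up name, and it assumes the stack currently has termination protection enabled (update-stack reports an error when there is nothing to change).

```shell
# Hypothetical wrapper: disable termination protection, wait for the
# update to finish, then delete the stack and wait for the deletion.
delete_protected_stack() {
  stack_name="$1"
  region="$2"

  # Step 1: turn termination protection off, leaving everything else alone.
  aws cloudformation update-stack --stack-name "$stack_name" \
    --use-previous-template --region "$region" --parameters \
    ParameterKey=clusterName,UsePreviousValue=true \
    ParameterKey=taskInstanceCount,UsePreviousValue=true \
    ParameterKey=coreInstanceType,UsePreviousValue=true \
    ParameterKey=taskInstanceType,UsePreviousValue=true \
    ParameterKey=emrVersion,UsePreviousValue=true \
    ParameterKey=environmentType,UsePreviousValue=true \
    ParameterKey=masterInstanceType,UsePreviousValue=true \
    ParameterKey=s3BucketBasePath,UsePreviousValue=true \
    ParameterKey=terminationProtected,ParameterValue=false \
    ParameterKey=taskBidPrice,UsePreviousValue=true || return 1

  # Step 2: block until the update has completed (non-zero exit on failure).
  aws cloudformation wait stack-update-complete \
    --stack-name "$stack_name" --region "$region" || return 1

  # Step 3: delete the stack, then wait for the deletion to complete.
  aws cloudformation delete-stack \
    --stack-name "$stack_name" --region "$region" || return 1
  aws cloudformation wait stack-delete-complete \
    --stack-name "$stack_name" --region "$region"
}

# Example use:
# delete_protected_stack "$stackName" "$awsRegion"
```

Each step returns early on failure, so the delete is never attempted if the update did not go through.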
Link to AWS CloudFormation: https://aws.amazon.com/cloudformation/
Thanks for posting this, it has been very useful. A couple of issues I encountered:
1. I received the error: "Encountered unsupported property Configuration"
I had to delete the configuration sections from your template.
2. The parameter "s3BucketBasePath" must have a Type or it will be rejected.
3. I changed the "Name" property to be "Ref": "clusterName"
Hey Tim
No problem, I've updated this a lot since I posted it and will update the post when I get a second.
1. I guess this depends on whether the S3 path was correct or not, or on your custom config (this section of the template was tailored to my requirements).
2. This must have been a typo on my end, apologies.
3. I did the same some time ago.
Thanks for the feedback, much appreciated.
Mike.
Tim
As promised, I've updated the template.
Cheers
Mike.
Invalid bootstrap action path, must be a location in Amazon S3 or a local path starting with 'file:'.
Regarding all those parameters in the update-stack call, do you really need to specify them if they aren't changing? Based on what it says here, it uses what's already in your template:
http://docs.aws.amazon.com/AWSCloudFormation/latest/APIReference/API_Parameter.html
"If you don't specify a key and value for a particular parameter, AWS CloudFormation uses the default value that is specified in your template."
But then, why do they have "UsePreviousValue"? So maybe you have to. Seems excessive.
I want the master node instance ID in the Outputs section. Which attribute should I use to get that?