Cloud, DevOps, Platform Engineering and generally making assembly lines for developers
Friday, 14 October 2016
Wednesday, 21 September 2016
A well-populated AWS CloudFormation template for building an EMR Cluster
22/02/17: Update to the template including support for EMR 5 and task node functionality for processing (task nodes optional)
I've been working on a more efficient way of deploying EMR (Elastic MapReduce) clusters for "Big Data" processing using applications that come as part of the Hadoop suite. Originally, I was just using a parameterised Jenkins job with a lengthy AWS CLI command, but that became harder to maintain the more functionality I added to it. I won't share it here, as it was 100+ lines long. I then came across AWS CloudFormation, which makes the deployments easy to build and maintain.
AWS CloudFormation enables you to create and manage AWS resources using Infrastructure as Code (I've attached the link below to the AWS CloudFormation Product & Service page for more information). While investigating how I could script EMR in CloudFormation, I noticed there were not many resources available online for building a template suited to a tailor-made EMR cluster. I tried tools such as CloudForm and the CloudFormation template designer to build the template, but had no luck. In the end, I took the most basic EMR template available on the AWS Knowledge base and built on top of it. Feel free to use it; I'll also keep it updated as I add to it.
---
AWSTemplateFormatVersion: '2010-09-09'
Description: Cloudformation Template to spin up EMR clusters V3 (Version 5 of EMR only)
Parameters:
  clusterName:
    Description: Name of the cluster
    Type: String
  taskInstanceCount:
    Description: Number of task instances
    Type: String
    AllowedValues:
    - '1'
    - '2'
    - '3'
    - '4'
    - '5'
    - '6'
    - '7'
    ConstraintDescription: Up to 7 nodes only
  emrVersion:
    Description: Version of EMR
    Type: String
    AllowedPattern: emr-5\.[0-9]+\.[0-9]+
    ConstraintDescription: 'Must be EMR Version 5 (e.g. emr-5.3.0)'
  masterInstanceType:
    Description: Instance type of Master Node
    Type: String
  coreInstanceType:
    Description: Instance type of Core Node
    Type: String
  taskInstanceType:
    Description: Instance type of Task Node
    Type: String
  environmentType:
    Description: What environment do you want the cluster to be in
    Type: String
  s3BucketBasePath:
    Description: Bucket to log EMR actions to
    Type: String
  taskBidPrice:
    Description: Bid price for Task nodes
    Type: String
  terminationProtected:
    Description: Is the cluster to have termination protection enabled
    Type: String
    AllowedValues:
    - 'true'
    - 'false'
    ConstraintDescription: Boolean
  awsRegion:
    Description: awsRegion
    Default: eu-west-1
    AllowedValues:
    - eu-west-1
    - eu-central-1
    Type: String
Conditions:
  isLive:
    Fn::Equals:
    - Ref: environmentType
    - live
Resources:
  EMRClusterV5:
    Type: AWS::EMR::Cluster
    Properties:
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType:
            Ref: masterInstanceType
          Market: ON_DEMAND
          Name: Master instance group - 1
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType:
            Ref: coreInstanceType
          Market: ON_DEMAND
          Name: Core instance group - 2
        TerminationProtected:
          Ref: terminationProtected
        Ec2SubnetId: ENTER SUBNET HERE
        Ec2KeyName: ENTER NAME OF SSH KEY HERE
        EmrManagedMasterSecurityGroup: ENTER SECURITY GROUP HERE
        EmrManagedSlaveSecurityGroup: ENTER SECURITY GROUP HERE
        ServiceAccessSecurityGroup: ENTER SECURITY GROUP HERE
      BootstrapActions:
      - Name: NAME OF BOOTSTRAP
        ScriptBootstrapAction:
          Path: S3 LOCATION OF SHELL SCRIPT
      Configurations:
      - Classification: hadoop-log4j
        ConfigurationProperties:
          hadoop.log.maxfilesize: 256MB
          hadoop.log.maxbackupindex: '3'
          hadoop.security.log.maxfilesize: 256MB
          hadoop.security.log.maxbackupindex: '3'
          hdfs.audit.log.maxfilesize: 256MB
          hdfs.audit.log.maxbackupindex: '3'
          mapred.audit.log.maxfilesize: 256MB
          mapred.audit.log.maxbackupindex: '3'
          hadoop.mapreduce.jobsummary.log.maxfilesize: 256MB
          hadoop.mapreduce.jobsummary.log.maxbackupindex: '3'
      - Classification: hbase-log4j
        ConfigurationProperties:
          hbase.log.maxbackupindex: '3'
          hbase.log.maxfilesize: 10MB
          hbase.security.log.maxbackupindex: '3'
          hbase.security.log.maxfilesize: 10MB
      - Classification: yarn-site
        ConfigurationProperties:
          yarn.log-aggregation.retain-seconds: '43200'
      Applications:
      - Name: Hadoop
      - Name: Hive
      - Name: Pig
      - Name: Hue
      - Name: HCatalog
      - Name: Sqoop
      - Name: Ganglia
      - Name: Spark
      - Name: Oozie
      - Name: Tez
      Name:
        Ref: clusterName
      JobFlowRole: ENTER EMR ROLE HERE
      ServiceRole: ENTER EMR ROLE HERE
      ReleaseLabel:
        Ref: emrVersion
      LogUri:
        Fn::Join:
        - ''
        - - s3n://
          - Ref: s3BucketBasePath
          - "/logs/"
      VisibleToAllUsers: true
      Tags:
      - Key: Name
        Value:
          Fn::Join:
          - ''
          - - emr-instance-
            - Ref: AWS::StackName
            - ''
      - Key: Environment
        Value:
          Ref: environmentType
      - Key: Stack ID
        Value:
          Ref: AWS::StackName
  EMRTaskNodes:
    Type: AWS::EMR::InstanceGroupConfig
    Properties:
      InstanceCount:
        Ref: taskInstanceCount
      InstanceType:
        Ref: taskInstanceType
      BidPrice:
        Ref: taskBidPrice
      Market: SPOT
      InstanceRole: TASK
      Name: Task instance group - 3
      JobFlowId:
        Ref: EMRClusterV5
To deploy the stack, use the following command:
aws cloudformation create-stack --stack-name [STACK NAME] \
--template-url [LOCATION OF TEMPLATE] --parameters \
ParameterKey=clusterName,ParameterValue=$stackName \
ParameterKey=taskInstanceCount,ParameterValue=$taskNodeCount \
ParameterKey=coreInstanceType,ParameterValue=$coreNodeInstanceType \
ParameterKey=taskInstanceType,ParameterValue=$taskNodeInstanceType \
ParameterKey=emrVersion,ParameterValue=$emrVersion \
ParameterKey=environmentType,ParameterValue=$environmentType \
ParameterKey=masterInstanceType,ParameterValue=$masterNodeInstanceType \
ParameterKey=s3BucketBasePath,ParameterValue=$s3BucketBasePath \
ParameterKey=terminationProtected,ParameterValue=$terminationProtected \
ParameterKey=taskBidPrice,ParameterValue=$bidPrice --region $awsRegion
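Because a failed create-stack call only reports a bad parameter after the request has been sent, it can be handy to check the two constrained values locally first. The following is a minimal sketch, not part of the original Jenkins job: the function name check_emr_params is made up for illustration, and it assumes release labels of the form emr-5.x.y and a task node count of 1 to 7. CloudFormation still enforces the template's own constraints regardless.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (illustrative only): rejects values that the
# template's emrVersion and taskInstanceCount constraints are meant to refuse.
# The function body runs in a subshell "( ... )", so its variables stay local
# and "exit 1" only sets the function's return status.
check_emr_params() (
  emr_version="$1"
  task_count="$2"

  # Assumed release label shape: emr-5.<minor>.<patch>, e.g. emr-5.3.0
  if ! printf '%s\n' "$emr_version" | grep -Eq '^emr-5\.[0-9]+\.[0-9]+$'; then
    echo "emrVersion must look like emr-5.3.0, got: $emr_version" >&2
    exit 1
  fi

  # The template allows between 1 and 7 task nodes
  case "$task_count" in
    [1-7]) ;;
    *)
      echo "taskInstanceCount must be between 1 and 7, got: $task_count" >&2
      exit 1
      ;;
  esac
)

# Example: only call create-stack when the check passes
#   check_emr_params "$emrVersion" "$taskNodeCount" && aws cloudformation create-stack ...
```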
To update the stack (e.g. the number of task nodes):
aws cloudformation update-stack --stack-name [STACK NAME] \
--use-previous-template --parameters \
ParameterKey=clusterName,UsePreviousValue=true \
ParameterKey=taskInstanceCount,ParameterValue=$taskNodeCount \
ParameterKey=coreInstanceType,UsePreviousValue=true \
ParameterKey=taskInstanceType,UsePreviousValue=true \
ParameterKey=emrVersion,UsePreviousValue=true \
ParameterKey=environmentType,UsePreviousValue=true \
ParameterKey=masterInstanceType,UsePreviousValue=true \
ParameterKey=s3BucketBasePath,UsePreviousValue=true \
ParameterKey=terminationProtected,UsePreviousValue=true \
ParameterKey=taskBidPrice,UsePreviousValue=true --region $awsRegion
The "--use-previous-template" switch and the "UsePreviousValue=true" parameter setting ensure nothing else changes.
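Typing out every parameter just to change one of them gets repetitive, so the list can be generated instead. This is a rough sketch under a few assumptions: build_update_params and TEMPLATE_PARAMS are names invented here, the parameter names are copied from the template above, and awsRegion is left out (as in the commands above) so it takes its template default.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustrative only): prints one ParameterKey=... argument
# per template parameter. Parameters passed as key=value get a new value;
# everything else is emitted with UsePreviousValue=true.

# Parameter names as used in the create-stack and update-stack commands.
TEMPLATE_PARAMS="clusterName taskInstanceCount coreInstanceType taskInstanceType emrVersion environmentType masterInstanceType s3BucketBasePath terminationProtected taskBidPrice"

# The body runs in a subshell "( ... )" so the loop variables stay local.
build_update_params() (
  for name in $TEMPLATE_PARAMS; do
    found=""
    for override in "$@"; do
      # Split "key=value" at the first "=" and compare the key
      if [ "${override%%=*}" = "$name" ]; then
        value="${override#*=}"
        found="yes"
      fi
    done
    if [ -n "$found" ]; then
      printf 'ParameterKey=%s,ParameterValue=%s\n' "$name" "$value"
    else
      printf 'ParameterKey=%s,UsePreviousValue=true\n' "$name"
    fi
  done
)

# Example (the unquoted $(...) is split into separate arguments on purpose,
# so values must not contain spaces):
#   aws cloudformation update-stack --stack-name [STACK NAME] \
#     --use-previous-template \
#     --parameters $(build_update_params taskInstanceCount=4) \
#     --region $awsRegion
```

Passing terminationProtected=false instead produces the parameter list needed before a stack can be deleted.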
Finally, to delete the stack:
aws cloudformation update-stack --stack-name [STACK_NAME] \
--use-previous-template --parameters \
ParameterKey=clusterName,UsePreviousValue=true \
ParameterKey=taskInstanceCount,ParameterValue=$taskNodeCount \
ParameterKey=coreInstanceType,UsePreviousValue=true \
ParameterKey=taskInstanceType,UsePreviousValue=true \
ParameterKey=emrVersion,UsePreviousValue=true \
ParameterKey=environmentType,UsePreviousValue=true \
ParameterKey=masterInstanceType,UsePreviousValue=true \
ParameterKey=s3BucketBasePath,UsePreviousValue=true \
ParameterKey=terminationProtected,ParameterValue=false \
ParameterKey=taskBidPrice,UsePreviousValue=true --region $awsRegion
aws cloudformation wait stack-update-complete --stack-name [STACK_NAME] --region [REGION]
aws cloudformation delete-stack --stack-name [STACK_NAME] --region [REGION]
The first section of the command updates the stack by changing the termination protection value to 'false'. Once that has completed, the stack is then deleted.
In conclusion, we've changed a script which consists of 100+ lines of code to commands which average 14 lines (if you want to include line continuation).
Link to AWS CloudFormation: https://aws.amazon.com/cloudformation/
Tuesday, 14 June 2016
Wordpress HTTPS site setup behind an SSL terminating Loadbalancer
I've recently been working on WordPress multi-site functionality for some websites and, more recently, on configuring WordPress sites over HTTPS. There are a lot of tutorials out there on how to set it up, but many of them assume you are not using any kind of load balancing technology or SSL termination. After hours of troubleshooting why my new multi-site was not displaying any data at all over HTTPS (after changing the site URLs and adding the relevant entries in .htaccess as instructed), I found that you need to modify wp-config.php to turn on HTTPS if the X-Forwarded-Proto header passed from the load balancer is set to 'https'. This is the snippet of code you need to have in your wp-config.php:
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') $_SERVER['HTTPS'] = 'on';
This needs to go directly above the "/* That's all, stop editing! Happy blogging. */" line.
This will only work for load balancers that pass the X-Forwarded-Proto header. In this case, I am using Amazon's ELB (Elastic Load Balancer).
Additionally, ensure your .htaccess file contains the relevant rewrite rules for your site domain, e.g:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.domain.com/$1 [R,L]
</IfModule>
If you wish to be selective on what domain you want to rewrite - add the additional conditional statement line:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domain\.com
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.domain.com/$1 [R,L]
</IfModule>