Tuesday, 18 April 2023

Platform Engineering: Developer Experience

Intro

The role of the Platform Engineer has evolved through the ages. I've gone through all the names: Sys Admin, Infrastructure Engineer, DevOps Engineer, Cloud Engineer, Platform Engineer, but it all comes back to the same aim: helping to deliver a product to market as quickly, securely and easily as possible.

As Platform Engineers, we build toolchains and workflows that empower engineers to iterate effectively, and we provide a platform. We enable self-service capabilities to improve developer velocity and therefore speed up time to market, which is the end goal. A typical frontend developer just wants an environment that comes fully stacked with everything they need to deploy their code, without necessarily caring where it runs. It's in our best interest to make that want self-service.

Here are a few topics I'd like to share that have improved Developer Experience, both generally speaking and in a recent engagement I was a part of.

Do Kanban... If you can...

If you can, try to implement a Kanban framework for your team. A team of platform engineers can be a very reactive team: quite often requests come in ad hoc and we find ourselves prioritising requests and requirements in real time. If we used the Scrum framework, we would constantly break the sprints. The product informs the infrastructure; the infrastructure does not build the product. The move to Kanban really helped us in a recent engagement:

  • We had 4 columns: backlog, ready, in progress and done
  • We also had an expedite lane for high priority tickets (with an agreed expedite policy)
  • We tracked telemetry such as lead time (backlog to done) and cycle time (in progress to done) to keep the PMs happy, alongside t-shirt sizing our user stories
  • Rituals included replenishment, retrospective and a general catch-up, and our cycles were typically 2 weeks

CRUD* is not a bad word...

If done properly, you can grant developers access to what they need in a lower environment. Instilling trust is crucial to building a good developer experience.

In a recent engagement I was on, developers had access to manipulate services in their respective Kubernetes namespaces (deleting pods, rollout-restarting deployments, viewing logs and port-forwarding to test API functionality). When they needed to revert, it was all Infrastructure as Code, so running a build put everything back.
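As a minimal sketch, that kind of scoped access could be expressed with Kubernetes RBAC along these lines (the namespace, group name and exact verbs here are illustrative assumptions, not the actual setup):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: developer-access
  namespace: team-a                # hypothetical team namespace
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch", "delete"]   # deleting pods
- apiGroups: [""]
  resources: ["pods/log"]
  verbs: ["get", "list"]                      # viewing logs
- apiGroups: [""]
  resources: ["pods/portforward"]
  verbs: ["create"]                           # port-forwarding
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "patch"]    # patch covers rollout restarts
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-access
  namespace: team-a
subjects:
- kind: Group
  name: team-a-developers          # hypothetical developer group
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: developer-access
  apiGroup: rbac.authorization.k8s.io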

We leveraged Istio, and one of the developers needed to inject faults into one of their service's APIs to test failure and see how the application would react (would it drop the event, or place it back on the messaging queue until the service was available?). We made it possible for the developers to set up fault injection themselves by CRUDing an Istio VirtualService alongside the main one, aborting requests that matched a correlationId header so the application itself could stay active while dropping connections based on the header value.
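As a rough sketch, the fault-injection rule looked something like the following (the service name, namespace and header value are hypothetical; note that Istio expects header keys in lowercase):

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payments-fault-injection
  namespace: team-a                  # hypothetical namespace
spec:
  hosts:
  - payments                         # hypothetical service
  http:
  - match:
    - headers:
        correlationid:               # Istio matches lowercase header keys
          exact: "fault-test-123"    # hypothetical correlation id
    fault:
      abort:
        percentage:
          value: 100                 # abort every matching request
        httpStatus: 503
    route:
    - destination:
        host: payments
  - route:                           # everything else flows as normal
    - destination:
        host: payments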

* Create Read Update Delete

Harmonise Cycles and communicate cycle goals…

Harmonise cycles and communicate cycle goals. Regardless of what framework you use as part of your ways of working, attempt to plan your cycles and the developer sprint cycles in parallel. That way there is no misalignment of priorities and there is visibility of work on both sides. 

I was in a recent engagement where the infrastructure squad's priority was decommissioning old infrastructure while the engineering squads needed new infrastructure provisioned. This resulted in a clash of priorities, and we had to reset and realign each team's expectations, which slowed down work. You could go further and have representatives at some of each other's rituals, but typically alignment of goals and priorities adds enough clarity.

Write yourself out of the role...

Write yourself out of the role to focus on other awesome stuff. PaaS where possible. Treat the platform as a product: version control it. You can't build a platform and decide it's finished; don't forget maintenance, enhancements and features. If developers need a repository with a set of pipelines, build validations etc., modularise it and provide it as a service. Seek developer input and understand how developers work. Don't build the product and hope for the best.

In a recent engagement, the developers were using Azure DevOps to store, build and deploy their code. We managed to leverage the ADO provider in Terraform and wrapped everything into a module: the developer only needed to specify the name of the repository they wished to build, and the module built out everything that was needed, including the repository, pipelines and build validations.
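A minimal sketch of that pattern using the microsoft/azuredevops provider might look like this (the project reference, yml path and naming here are assumptions, not the actual module):

terraform {
  required_providers {
    azuredevops = {
      source = "microsoft/azuredevops"
    }
  }
}

variable "project_id" {
  description = "Azure DevOps project to create the repository in"
  type        = string
}

variable "repository_name" {
  description = "Name of the repository the developer wants built out"
  type        = string
}

# The repository itself
resource "azuredevops_git_repository" "this" {
  project_id = var.project_id
  name       = var.repository_name

  initialization {
    init_type = "Clean"
  }
}

# A YAML pipeline pointing at the repository
resource "azuredevops_build_definition" "ci" {
  project_id = var.project_id
  name       = "${var.repository_name}-ci"

  repository {
    repo_type   = "TfsGit"
    repo_id     = azuredevops_git_repository.this.id
    branch_name = azuredevops_git_repository.this.default_branch
    yml_path    = "azure-pipelines.yml" # assumed pipeline path
  }
}

# Build validation so PRs must pass the pipeline before merging
resource "azuredevops_branch_policy_build_validation" "pr" {
  project_id = var.project_id
  enabled    = true
  blocking   = true

  settings {
    display_name        = "PR build validation"
    build_definition_id = azuredevops_build_definition.ci.id

    scope {
      repository_id  = azuredevops_git_repository.this.id
      repository_ref = azuredevops_git_repository.this.default_branch
      match_type     = "Exact"
    }
  }
}

A consuming team would then only need something like:

module "payments_repo" {
  source          = "./modules/ado-repository" # hypothetical module path
  project_id      = var.project_id
  repository_name = "payments-api"             # hypothetical repository
}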

Have documentation in place, such as a runbook or platform guide, for the developers to use. We built one out recently to cover actions such as creating pipelines, adding tools to build agents, and creating Kafka credentials and topics; an example entry is sketched below.
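A runbook entry would typically be a short, copy-pasteable command. For example, a topic-creation entry might look like this (the broker address, topic name and settings are hypothetical):

kafka-topics.sh --create \
  --bootstrap-server kafka.internal:9092 \
  --topic payments.events \
  --partitions 3 \
  --replication-factor 3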

When building out user stories, always have an acceptance criterion of documenting where appropriate so nothing gets left out. Having this in place enabled us to focus on other areas of improvement and optimisation of the infrastructure stack (reducing costs, optimising for performance, fun spikes etc.).

In Summary... 

  • PaaS where possible. Treat the platform as a product
  • Seek input and harmonise cycles and communicate
  • Remember that the platform is the golden path to developer productivity. As I said at the start, we build toolchains and workflows that empower engineers to iterate effectively and provide a platform; the developer experience is built around the toolchains and workflows developers use.

AWS Kinesis Flink Application not supported in Terraform

UPDATE (18/04/23): There is now a Terraform resource for an Apache Flink Kinesis application (aws_kinesisanalyticsv2_application); see the Terraform AWS provider documentation for details.
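For reference, a minimal sketch of that resource, mirroring the create-application call in the original workaround below (the role, bucket and jar references are the same assumptions used below):

resource "aws_kinesisanalyticsv2_application" "main" {
  name                   = local.name
  runtime_environment    = "FLINK-1_8"
  service_execution_role = aws_iam_role.main.arn

  application_configuration {
    application_code_configuration {
      code_content {
        s3_content_location {
          bucket_arn = aws_s3_bucket.main.arn
          file_key   = "event-aggregator.jar"
        }
      }
      code_content_type = "ZIPFILE"
    }
  }
}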

The original post follows. Until the PR below was addressed:

https://github.com/terraform-providers/terraform-provider-aws/pull/11652

I've been using null_resource for the Apache Flink Kinesis application. It's not great, but it's a workaround. See the example below:

resource "null_resource" "kinesis_application" {
  provisioner "local-exec" {
    when    = create
    command = "aws kinesisanalyticsv2 create-application --cli-input-json '{\"ApplicationName\":\"${local.name}\",\"RuntimeEnvironment\":\"FLINK-1_8\",\"ServiceExecutionRole\":\"${aws_iam_role.main.arn}\",\"ApplicationConfiguration\":{\"FlinkApplicationConfiguration\":{\"MonitoringConfiguration\":{\"ConfigurationType\":\"CUSTOM\",\"MetricsLevel\":\"APPLICATION\",\"LogLevel\":\"WARN\"},\"ParallelismConfiguration\":{\"ConfigurationType\":\"CUSTOM\",\"Parallelism\":1,\"ParallelismPerKPU\":1,\"AutoScalingEnabled\":true}},\"EnvironmentProperties\":{\"PropertyGroups\":[{\"PropertyGroupId\":\"KdaConfigProperties\",\"PropertyMap\":{\"aws.kinesis.input.stream\":\"${local.project}-event_aggregator-kda-input-${local.environment}\",\"aws.kinesis.output.stream\":\"${local.project}-event_aggregator-kda-output-${local.environment}\",\"aws.region\":\"us-east-1\",\"AggregationEnabled\":\"false\",\"flink.stream.initpos\":\"TRIM_HORIZON\"}}]},\"ApplicationCodeConfiguration\":{\"CodeContent\":{\"S3ContentLocation\":{\"BucketARN\":\"${aws_s3_bucket.main.arn}\",\"FileKey\":\"event-aggregator.jar\"}},\"CodeContentType\":\"ZIPFILE\"},\"ApplicationSnapshotConfiguration\":{\"SnapshotsEnabled\":true}},\"CloudWatchLoggingOptions\":[{\"LogStreamARN\":\"${aws_cloudwatch_log_stream.main.arn}\"}]}' --profile ${var.aws_profile}"
  }

  provisioner "local-exec" {
    when    = destroy
    command = <<EOT
export TIMESTAMP=$(aws kinesisanalyticsv2 describe-application --application-name ${local.name} --profile ${var.aws_profile} --query 'ApplicationDetail.CreateTimestamp');
aws kinesisanalyticsv2 delete-application --create-timestamp $TIMESTAMP --application-name ${local.name} --profile ${var.aws_profile}
EOT
  }
}


Wednesday, 21 September 2016

A well-populated AWS CloudFormation template for building an EMR Cluster

22/02/17: Updated the template to include support for EMR 5 and optional task nodes for processing.

I've been working on a more efficient way of deploying EMR (Elastic MapReduce) clusters for "Big Data" processing using applications that come as part of the Hadoop suite. Originally, I was just using a parameterised Jenkins job with a lengthy AWS CLI command, but that became difficult to maintain the more functionality I added to it. I won't share it with you, as it was 100+ lines long. I came across AWS CloudFormation, which makes the deployments easy to build and maintain.

AWS CloudFormation enables you to create and manage AWS resources using Infrastructure as Code (I've attached the link to the AWS CloudFormation product page below for more information). While investigating how I could script EMR in CloudFormation, I noticed there were not many resources available online for building a template suited to a tailor-made EMR cluster. I tried tools such as CloudForm and the CloudFormation template designer to build the template, but had no luck. In the end, I took the most basic EMR template available in the AWS knowledge base and built on top of it. Feel free to use it; I'll also keep it updated as I add to it.

---
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFormation template to spin up EMR clusters V3 (EMR version 5 only)
Parameters:
  clusterName:
    Description: Name of the cluster
    Type: String
  taskInstanceCount:
    Description: Number of task instances
    Type: String
    AllowedValues:
    - '1'
    - '2'
    - '3'
    - '4'
    - '5'
    - '6'
    - '7'
    ConstraintDescription: Up to 7 nodes only
  emrVersion:
    Description: Version of EMR
    Type: String
    AllowedPattern: emr-5.[0-9].[0-9]
    ConstraintDescription: 'Must be EMR version 5 (e.g.: emr-5.3.0)'
  masterInstanceType:
    Description: Instance type of Master Node
    Type: String
  coreInstanceType:
    Description: Instance type of Core Node
    Type: String
  taskInstanceType:
    Description: Instance type of Task Node
    Type: String
  environmentType:
    Description: Environment the cluster will run in (e.g. live)
    Type: String
  s3BucketBasePath:
    Description: Bucket to log EMR actions to
    Type: String
  taskBidPrice:
    Description: Bid price for Task nodes
    Type: String
  terminationProtected:
    Description: Is the cluster to have termination protection enabled
    Type: String
    AllowedValues:
    - 'true'
    - 'false'
    ConstraintDescription: Boolean
  awsRegion:
    Description: awsRegion
    Default: eu-west-1
    AllowedValues:
    - eu-west-1
    - eu-central-1
    Type: String
Conditions:
  isLive:
    Fn::Equals:
    - Ref: environmentType
    - live
Resources:
  EMRClusterV5:
    Type: AWS::EMR::Cluster
    Properties:
      Instances:
        MasterInstanceGroup:
          InstanceCount: 1
          InstanceType:
            Ref: masterInstanceType
          Market: ON_DEMAND
          Name: Master instance group - 1
        CoreInstanceGroup:
          InstanceCount: 1
          InstanceType:
            Ref: coreInstanceType
          Market: ON_DEMAND
          Name: Core instance group - 2
        TerminationProtected:
          Ref: terminationProtected
        Ec2SubnetId: ENTER SUBNET HERE
        Ec2KeyName: ENTER NAME OF SSH KEY HERE
        EmrManagedMasterSecurityGroup: ENTER SECURITY GROUP HERE
        EmrManagedSlaveSecurityGroup: ENTER SECURITY GROUP HERE
        ServiceAccessSecurityGroup: ENTER SECURITY GROUP HERE
      BootstrapActions:
      - Name: NAME OF BOOTSTRAP
        ScriptBootstrapAction:
          Path: S3 LOCATION OF SHELL SCRIPT
      Configurations:
      - Classification: hadoop-log4j
        ConfigurationProperties:
          hadoop.log.maxfilesize: 256MB
          hadoop.log.maxbackupindex: '3'
          hadoop.security.log.maxfilesize: 256MB
          hadoop.security.log.maxbackupindex: '3'
          hdfs.audit.log.maxfilesize: 256MB
          hdfs.audit.log.maxbackupindex: '3'
          mapred.audit.log.maxfilesize: 256MB
          mapred.audit.log.maxbackupindex: '3'
          hadoop.mapreduce.jobsummary.log.maxfilesize: 256MB
          hadoop.mapreduce.jobsummary.log.maxbackupindex: '3'
      - Classification: hbase-log4j
        ConfigurationProperties:
          hbase.log.maxbackupindex: '3'
          hbase.log.maxfilesize: 10MB
          hbase.security.log.maxbackupindex: '3'
          hbase.security.log.maxfilesize: 10MB
      - Classification: yarn-site
        ConfigurationProperties:
          yarn.log-aggregation.retain-seconds: '43200'
      Applications:
      - Name: Hadoop
      - Name: Hive
      - Name: Pig
      - Name: Hue
      - Name: HCatalog
      - Name: Sqoop
      - Name: Ganglia
      - Name: Spark
      - Name: Oozie
      - Name: Tez
      Name:
        Ref: clusterName
      JobFlowRole: ENTER EMR ROLE HERE
      ServiceRole: ENTER EMR ROLE HERE
      ReleaseLabel:
        Ref: emrVersion
      LogUri:
        Fn::Join:
        - ''
        - - s3n://
          - Ref: s3BucketBasePath
          - "/logs/"
      VisibleToAllUsers: true
      Tags:
      - Key: Name
        Value:
          Fn::Join:
          - ''
          - - emr-instance-
            - Ref: AWS::StackName
            - ''
      - Key: Environment
        Value:
          Ref: environmentType
      - Key: Stack ID
        Value:
          Ref: AWS::StackName
  EMRTaskNodes:
    Type: AWS::EMR::InstanceGroupConfig
    Properties:
      InstanceCount:
        Ref: taskInstanceCount
      InstanceType:
        Ref: taskInstanceType
      BidPrice:
        Ref: taskBidPrice
      Market: SPOT
      InstanceRole: TASK
      Name: Task instance group - 3
      JobFlowId:
        Ref: EMRClusterV5


To deploy the stack, use the following command:

aws cloudformation create-stack --stack-name [STACK NAME] \
--template-url [LOCATION OF TEMPLATE] --parameters \
ParameterKey=clusterName,ParameterValue=$stackName \
ParameterKey=taskInstanceCount,ParameterValue=$taskNodeCount \
ParameterKey=coreInstanceType,ParameterValue=$coreNodeInstanceType \
ParameterKey=taskInstanceType,ParameterValue=$taskNodeInstanceType \
ParameterKey=emrVersion,ParameterValue=$emrVersion \
ParameterKey=environmentType,ParameterValue=$environmentType \
ParameterKey=masterInstanceType,ParameterValue=$masterNodeInstanceType \
ParameterKey=s3BucketBasePath,ParameterValue=$s3BucketBasePath \
ParameterKey=terminationProtected,ParameterValue=$terminationProtected \
ParameterKey=taskBidPrice,ParameterValue=$bidPrice --region $awsRegion

To update the stack (e.g. the number of task nodes):

aws cloudformation update-stack --stack-name [STACK NAME] \
--use-previous-template --parameters \
ParameterKey=clusterName,UsePreviousValue=true \
ParameterKey=taskInstanceCount,ParameterValue=$taskNodeCount \
ParameterKey=coreInstanceType,UsePreviousValue=true \
ParameterKey=taskInstanceType,UsePreviousValue=true \
ParameterKey=emrVersion,UsePreviousValue=true \
ParameterKey=environmentType,UsePreviousValue=true \
ParameterKey=masterInstanceType,UsePreviousValue=true \
ParameterKey=s3BucketBasePath,UsePreviousValue=true \
ParameterKey=terminationProtected,UsePreviousValue=true \
ParameterKey=taskBidPrice,UsePreviousValue=true --region $awsRegion

The "--use-previous-template" switch and "UsePreviousValue" resource ensure nothing else changes.

Finally, to delete the stack:

aws cloudformation update-stack --stack-name [STACK_NAME] \
--use-previous-template --parameters \
ParameterKey=clusterName,UsePreviousValue=true \
ParameterKey=taskInstanceCount,UsePreviousValue=true \
ParameterKey=coreInstanceType,UsePreviousValue=true \
ParameterKey=taskInstanceType,UsePreviousValue=true \
ParameterKey=emrVersion,UsePreviousValue=true \
ParameterKey=environmentType,UsePreviousValue=true \
ParameterKey=masterInstanceType,UsePreviousValue=true \
ParameterKey=s3BucketBasePath,UsePreviousValue=true \
ParameterKey=terminationProtected,ParameterValue=false \
ParameterKey=taskBidPrice,UsePreviousValue=true --region $awsRegion
sleep 20
aws cloudformation delete-stack --stack-name [STACK_NAME] --region [REGION]

The first section of the command updates the stack by changing the termination protection value to 'false'. Once that has completed, the stack is then deleted.

In conclusion, we've replaced a script consisting of 100+ lines of code with commands averaging 14 lines (if you include line continuations).

Link to AWS CloudFormation: https://aws.amazon.com/cloudformation/

Tuesday, 14 June 2016

WordPress HTTPS site setup behind an SSL-terminating load balancer

I've recently been carrying out work on WordPress multi-site functionality for some websites, and more recently, configuring WordPress sites over HTTPS. There are a lot of tutorials out there on how to set it up, but many of them assume you are not using any kind of load balancing technology or SSL termination. After hours of troubleshooting why my new multi-site was not displaying any data at all over HTTPS (after changing the site URLs and adding the relevant entries to .htaccess as instructed), I found that you need to modify wp-config.php to turn on HTTPS if the X-Forwarded-Proto header passed from the load balancer contains 'https'. This is the snippet of code you need in your wp-config.php:

// Tell WordPress the original request was HTTPS when SSL terminates at the load balancer
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https') {
    $_SERVER['HTTPS'] = 'on';
}

This needs to go directly above the "/* That's all, stop editing! Happy blogging. */" line.

This will only work for load balancers that pass the X-Forwarded-Proto header; in this case, I am using Amazon's ELB (Elastic Load Balancer).

Additionally, ensure your .htaccess file contains the relevant rewrite rules for your site domain, e.g:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{SERVER_PORT} 80 
RewriteRule ^(.*)$ https://www.domain.com/$1 [R,L]
</IfModule>

If you wish to be selective about which domain you want to rewrite, add an additional condition line:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.domain\.com
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://www.domain.com/$1 [R,L]
</IfModule>

Thursday, 20 December 2012

A clean and simple way of getting rid of ransomware

I discovered that ransomware had infected my PC the other evening whilst I was surfing the web. Once infected, it carried out the following:
  • Completely locked me out of the system, including the keyboard and mouse (apart from the keypad)
  • A banner that took up the whole of my monitor appeared, with a "policeman" giving me a stop signal.
  • Content included: "This computer has been locked due to known criminal activity including the following: Child Pornography, Gambling, Copyright Theft etc.". In addition, there was a section where the malware integrates with your local webcam, quoting "We are watching you" with the webcam output embedded in the banner.
  • Also - it resolves your IP address, ISP and location using your internet connection to make it look more realistic.
  • Finally - it stated that to unlock the computer, you should dial a number and pay 100 GBP; a keycode would then be provided to unlock the computer.
I have read online about how to get rid of this malware, and the usual suggestion is to run anti-virus and anti-malware software. However, that does not get rid of it instantly. I came across the issue recently and performed the following quick, clean and simple procedure:

  • Disconnect from the internet - this will stop the banner from starting
  • Perform a hard reboot of your PC and boot into Windows Safe Mode
  • Go to Start > Run > and type 'msconfig'
  • Navigate to the Startup tab and scroll down to near the bottom
  • Untick any unfamiliar startup entries that are in another language or do not fit in with your other startup entries, then click Apply, OK.
  • Find the location of the startup program and remove it (if you attempt to do this on a normal boot of Windows you will get an error).
  • Reboot your system
As a precaution, it may be worth running your antivirus and antimalware scans just in case you overlook other potentially threatening files.

Friday, 4 May 2012

Migrating Cloned Solaris Zones from one global host to another

This post illustrates a simple and efficient way of migrating a cloned zone as opposed to a sparse root zone. With a sparse root zone you simply snapshot the dataset and use zfs send and receive via ssh to transfer it over; however, that will not work with a cloned zone, so this is one possible way of migrating.

Firstly, take a record of the zone configuration so that you can configure it the same on the destination server.

# zonecfg -z testzone
zonecfg:testzone> info

Halt and detach the zone:

# zoneadm -z testzone halt
# zoneadm -z testzone detach

The most crucial file in this procedure is the detached XML file (SUNWdetached.xml), which can be found within the zone's directory; make sure this is generated before archiving and zipping. Change into the directory where all your zonepaths are stored and perform the following commands to archive and zip the zone directory:

e.g.
# cd /pool/zones

# tar -cvf testzone.tar testzone/*
# gzip testzone.tar

The next step is to send the zipped and archived file to the destination global host via scp:

# scp testzone.tar.gz root@[IPADDROFHOST]:/path/to/zones

Once the file has been sent over, log in to the server and change to the zone directory. Unzip and untar the file using the following commands:

# gunzip testzone.tar.gz
# tar -xvf testzone.tar

Next we want to rename the current zone directory, create the new ZFS filesystem and move the contents of the testzone directory into the new filesystem's mountpoint. Perform the following commands:

e.g.
# mv testzone testzone2
# zfs create pool/zones/testzone
# mv testzone2/* testzone

Make sure you change into the testzone directory and unzip the detached XML file, as it will be compressed; if this is not done, you will not be able to attach the zone.

# cd testzone
# gunzip SUNWdetached.xml.gz

The zone directory must not be readable or writable by group or other, so change the permissions to 700:

# chmod 700 testzone

Now it's time to create the zone and attach it:

# zonecfg -z testzone
testzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:testzone> create -a /path/to/zone
zonecfg:testzone> info

At this point, make sure all the configuration in terms of networking, package inheritance and resources is correct and matches the original zone.

zonecfg:testzone> commit
zonecfg:testzone> exit
# zoneadm -z testzone attach -u

If that does not work, use the -F switch to force the attachment.

# zoneadm -z testzone boot
# zlogin testzone

Have a look around the Service Management Facility and check that the common configuration is correct, and you are good to go.
