Merikanto

一簫一劍平生意,負盡狂名十五年

AWS - 10 Blue / Green Deployment and CI / CD

In today’s post, I’m going to cover blue / green deployment on AWS and share some notes about building CI / CD pipelines on AWS. This post is my summary of the AWS whitepapers.



Blue / Green Deployment

Blue / Green: Release apps by shifting traffic between two identical environments that run different versions of the app

  • Near zero downtime
  • Easy rollback (revert traffic back to the still-operating blue)
  • Isolation between blue & green application

🧡 Goal of B / G: Achieve immutable infrastructure (no changes are made to the application after it is deployed; a new version is redeployed instead)


B / G with AWS services

1) Route 53

  • DNS, classic approach
  • Direct traffic by updating DNS records, set shorter TTL

2) ELB

  • Health check against EC2 resources, increase fault tolerance

3) Auto Scaling

  • Enable B/G: Attach different versions of the launch configuration to AS group
  • Add ELB: balance traffic across EC2 instances running in AS groups
  • Standby state / termination policies: quick rollback

4) Beanstalk

  • EB supports AS & ELB for B/G
  • Run multiple versions by swapping environment URLs

5) OpsWorks

  • Based on Chef
  • Simplifies cloning entire stack

6) CloudFormation (CF)

  • Describe the AWS resources you need in JSON / YAML templates
  • Provision B/G, switch traffic via Route 53 / ELB
  • Infrastructure as code: Version control & CI

7) CloudWatch (CW)

  • Collect & track metrics
  • Collect & monitor logs
  • Set alarms
  • System-wide visibility into resource utilization, application performance & operational health

B / G Technique

1) Update DNS routing with Route 53

  • DNS routing through record updates (Aliases)
  • Express endpoint into the environment as a DNS name / IP
  • Can do a weighted distribution (gradual shift with Route 53), define traffic percentage (canary analysis)
  • Rollback by updating DNS record, to shift traffic back to blue (TTL, how long clients cache query results)
  • Applies to:
    • Public / Elastic IP, or expose IP / DNS endpoint
    • EC2 Instances / ECS clusters behind ELB, or in AS groups with ELB as frontend
    • EB web tiers
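
The weighted shift described above can be sketched as a helper that builds the change batch for two weighted records. This is a minimal sketch under stated assumptions: the function name, the record names and the 0 to 100 weight scale are my own choices for illustration, and it uses plain CNAME records instead of the alias records mentioned above. The resulting dictionary has the shape that boto3's `change_resource_record_sets` call accepts as `ChangeBatch`.

```python
from typing import Any, Dict


def weighted_change_batch(
    record_name: str,
    blue_target: str,
    green_target: str,
    green_percent: int,
    ttl: int = 60,
) -> Dict[str, Any]:
    """Build a Route 53 ChangeBatch that splits traffic between blue and green."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")

    def record(set_id: str, target: str, weight: int) -> Dict[str, Any]:
        # Route 53 routes traffic in proportion to weight / sum of weights.
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": set_id,
                "Weight": weight,
                "TTL": ttl,  # short TTL so clients pick up a rollback quickly
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"Shift {green_percent}% of traffic to green",
        "Changes": [
            record("blue", blue_target, 100 - green_percent),
            record("green", green_target, green_percent),
        ],
    }


# Hypothetical names: send 10% of traffic to green as a canary.
batch = weighted_change_batch(
    "app.example.com.", "blue-elb.example.com", "green-elb.example.com", green_percent=10
)
for change in batch["Changes"]:
    rrset = change["ResourceRecordSet"]
    print(rrset["SetIdentifier"], rrset["Weight"])
```

Rolling back is the same call with `green_percent=0`, which is why the rollback time is bounded by the TTL and not by a redeploy.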

2) Swap AS groups behind ELB

  • ELB: health check (New instances auto added to the LB pool, if they pass health check)

  • AS: replace unhealthy instances

  • Health check occurs at configurable intervals

  • Deploy: Attach green group to LB, put blue in Standby state
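
The deploy step above can be sketched as follows. This is a rough sketch, not a production script: `swap_to_green` and `DryRunClient` are hypothetical names, and the `autoscaling` argument is assumed to behave like `boto3.client("autoscaling")`. It assumes the load balancer uses target groups (for a Classic Load Balancer the attach call would be `attach_load_balancers`), and it does not wait for green to pass health checks before moving blue to Standby, which a real rollout should do.

```python
from typing import Any, Dict, List


def swap_to_green(autoscaling: Any, blue_group: str, green_group: str,
                  target_group_arns: List[str]) -> List[str]:
    """Attach the green group to the load balancer, then put blue in Standby.

    Returns the IDs of the blue instances that were moved to Standby.
    """
    # 1. Attach green; its instances join the pool once they pass health checks.
    autoscaling.attach_load_balancer_target_groups(
        AutoScalingGroupName=green_group,
        TargetGroupARNs=target_group_arns,
    )

    # 2. Find the blue instances that are currently in service.
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[blue_group]
    )
    blue_ids = [
        instance["InstanceId"]
        for group in response["AutoScalingGroups"]
        for instance in group["Instances"]
        if instance["LifecycleState"] == "InService"
    ]

    # 3. Standby keeps the instances running but out of rotation,
    #    so a rollback only has to move them back into service.
    if blue_ids:
        autoscaling.enter_standby(
            InstanceIds=blue_ids,
            AutoScalingGroupName=blue_group,
            ShouldDecrementDesiredCapacity=True,
        )
    return blue_ids


class DryRunClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self, blue_instance_ids: List[str]) -> None:
        self.blue_instance_ids = blue_instance_ids
        self.calls: List[str] = []

    def attach_load_balancer_target_groups(self, **kwargs: Any) -> None:
        self.calls.append("attach:" + kwargs["AutoScalingGroupName"])

    def describe_auto_scaling_groups(self, **kwargs: Any) -> Dict[str, Any]:
        instances = [
            {"InstanceId": i, "LifecycleState": "InService"}
            for i in self.blue_instance_ids
        ]
        return {"AutoScalingGroups": [{"Instances": instances}]}

    def enter_standby(self, **kwargs: Any) -> None:
        self.calls.append("standby:" + kwargs["AutoScalingGroupName"])


client = DryRunClient(["i-blue-1", "i-blue-2"])
moved = swap_to_green(client, "blue-asg", "green-asg", ["arn:example"])
print(client.calls)
print(moved)
```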


3) Update AS group Launch configurations

  • A launch configuration: AMI ID (Amazon Machine Image), instance type, key pair, security groups, etc.

  • An AS group is associated with only one launch config at a time, and a launch config can’t be modified after you create it

  • Change launch config: Replace existing config with a new one

  • Default termination policy: Remove instances with oldest launch config

  • Deploy

    • Update AS group with new launch config
    • Scale the AS group to twice its original size (× 2)
    • Shrink the AS group back to its original size (instances with the old config are removed)
  • Instances with standby state: quick rollback

    • Update AS group with old launch config
    • Do the steps above in reverse
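
To see why scaling to double and shrinking back replaces the whole fleet, here is a small simulation of the deploy steps above. It is a simplified model, not the real Auto Scaling behavior: the `Instance` class and `deploy` function are hypothetical, and the termination policy is reduced to "instances with the old launch config go first" (the actual default policy also weighs factors such as Availability Zone balance).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    instance_id: str
    launch_config: str


def deploy(instances: List[Instance], new_config: str) -> List[Instance]:
    """Simulate: switch launch config, scale to 2x, shrink back to the original size."""
    original_size = len(instances)

    # Steps 1 and 2: the group now launches with new_config, and doubling
    # the size adds one new-config instance per existing instance.
    doubled = instances + [
        Instance(f"i-new-{n}", new_config) for n in range(original_size)
    ]

    # Step 3: shrink back. Simplified policy: instances whose launch config
    # is not the current one are terminated first (False sorts before True).
    by_termination_order = sorted(
        doubled, key=lambda inst: inst.launch_config == new_config
    )
    to_terminate = len(doubled) - original_size
    return by_termination_order[to_terminate:]


fleet = [Instance("i-old-0", "lc-v1"), Instance("i-old-1", "lc-v1")]
after = deploy(fleet, "lc-v2")
print([inst.launch_config for inst in after])
```

Calling `deploy(after, "lc-v1")` models the rollback: the same steps run again with the old launch config.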

4) Swap Environment of EB application

  • In-place update on existing instances (downtime during update)
  • Immutable deployment using new instance sets
    • Swap Environment URLs from Actions
    • EB performs a DNS switch
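
The URL swap can also be scripted. The sketch below is an assumption-laden example: `swap_when_green_ready` and `FakeEB` are hypothetical names, and the `eb` argument is assumed to behave like `boto3.client("elasticbeanstalk")`. It only requests the swap when the green environment reports `Ready` status and `Green` health.

```python
from typing import Any, Dict, List


def swap_when_green_ready(eb: Any, blue_env: str, green_env: str) -> bool:
    """Swap environment URLs only if green is Ready and healthy.

    Returns True if the swap was requested, False otherwise.
    """
    response = eb.describe_environments(EnvironmentNames=[green_env])
    environments = response["Environments"]
    if not environments:
        return False
    green = environments[0]
    if green.get("Status") != "Ready" or green.get("Health") != "Green":
        return False
    # Elastic Beanstalk exchanges the CNAMEs of the two environments.
    eb.swap_environment_cnames(
        SourceEnvironmentName=blue_env,
        DestinationEnvironmentName=green_env,
    )
    return True


class FakeEB:
    """Hypothetical stand-in used to dry-run the sketch without AWS access."""

    def __init__(self, status: str, health: str) -> None:
        self.status = status
        self.health = health
        self.swaps: List[Dict[str, str]] = []

    def describe_environments(self, **kwargs: Any) -> Dict[str, Any]:
        return {"Environments": [{"Status": self.status, "Health": self.health}]}

    def swap_environment_cnames(self, **kwargs: Any) -> None:
        self.swaps.append(kwargs)


print(swap_when_green_ready(FakeEB("Ready", "Green"), "app-blue", "app-green"))
print(swap_when_green_ready(FakeEB("Updating", "Grey"), "app-blue", "app-green"))
```

Because the switch happens at the DNS level, clients that cached the old record keep reaching blue until the TTL expires, so blue should stay up for a while after the swap.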

5) Clone a Stack in OpsWorks & Update DNS

  • Stacks: logical grouping of AWS resources, one or more layers

  • Deploy: Update DNS records to point to green (stack’s LB)


Other

Best Practice for Data Sync & Schema Change

  • Decoupling schema change from code change
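    • Additive change (e.g. a new column / table): change the schema first, then deploy the code
    • Deletive change (e.g. a dropped column / table): deploy the code first, change the schema last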
  • Need to consider state (DB contains much state, but comparatively little logic & structure)
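
The ordering rule for decoupled schema changes can be written down as a tiny lookup. This is only an illustration of the rule, with hypothetical names (`rollout_order`, the step strings); it does not run any migration.

```python
from typing import List


def rollout_order(change_kind: str) -> List[str]:
    """Return the order of steps for a schema change decoupled from code."""
    if change_kind == "additive":
        # Old code ignores the new column/table, so the schema can go first.
        return ["apply schema change", "deploy code change"]
    if change_kind == "deletive":
        # Old code still reads the column/table, so the code must go first.
        return ["deploy code change", "apply schema change"]
    raise ValueError(f"unknown change kind: {change_kind!r}")


print(rollout_order("additive"))
print(rollout_order("deletive"))
```

In both orders the database stays compatible with whichever code version (blue or green) is serving traffic at that moment.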

When NOT to use B / G

  • B / G would introduce additional points of failure
  • The schema change is too complex to decouple from the code change, causing data sync problems

CI / CD on AWS

1. CodeStar

Rapidly orchestrate an end-to-end software release workflow (pipeline)


2. Tests

  • Unit tests should make up the bulk of testing strategy (70%)

  • Staging Phase (Full environments are created to mirror real production environment)

    • Integration test (interface between components)
    • Component test (message passing between components)
    • System test (end-to-end)
    • Performance test (load / stress / spike tests)
    • Compliance test
    • User Acceptance Test (UAT, e2e business flow)
  • Production: Canary test


3. Build the Pipeline

  • CI / CD stages: Source, build, staging, production
  • buildspec.yml
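
The stage order can be pictured as a loop that stops at the first failing stage, so a broken build never reaches staging or production. This is a conceptual sketch only; `run_pipeline` and the check functions are hypothetical and have nothing to do with the actual CodePipeline API.

```python
from typing import Callable, Dict, List, Tuple

STAGES = ["source", "build", "staging", "production"]


def run_pipeline(checks: Dict[str, Callable[[], bool]]) -> Tuple[List[str], str]:
    """Run stages in order and stop at the first failure.

    Returns (completed stages, name of the failed stage or "" if all passed).
    A stage with no check counts as failed.
    """
    completed: List[str] = []
    for stage in STAGES:
        check = checks.get(stage)
        if check is None or not check():
            return completed, stage
        completed.append(stage)
    return completed, ""


result = run_pipeline({
    "source": lambda: True,
    "build": lambda: True,
    "staging": lambda: False,  # e.g. an integration test failed
    "production": lambda: True,
})
print(result)
```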

4. Deployment methods

Except for deploy in place, the other four methods are all near zero downtime

1) Deploy in place

  • All at once
  • Downtime during updates
  • Deploy: existing instances
  • Rollback: Redeploy

2) Rolling

  • Single batch out of service
  • Deploy: existing instances
  • Rollback: Redeploy
  • Variation: Canary release
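
A rolling deployment takes one batch out of service at a time, which can be sketched as below. The helper names and the instance IDs are hypothetical; the point is only to show how batch size determines the capacity left in service during each step.

```python
from typing import List


def rolling_batches(instance_ids: List[str], batch_size: int) -> List[List[str]]:
    """Split the fleet into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [
        instance_ids[i:i + batch_size]
        for i in range(0, len(instance_ids), batch_size)
    ]


def in_service_during_rollout(instance_ids: List[str], batch_size: int) -> List[int]:
    """Number of instances still serving traffic while each batch is updated."""
    total = len(instance_ids)
    return [total - len(batch) for batch in rolling_batches(instance_ids, batch_size)]


fleet = ["i-1", "i-2", "i-3", "i-4", "i-5"]
print(rolling_batches(fleet, 2))
print(in_service_during_rollout(fleet, 2))
```

A canary release is the same idea with a very small first batch that is observed before the remaining batches proceed.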

3) Rolling with additional batch

  • Beanstalk ONLY
  • Deploy: new & existing instances
  • Rollback: Redeploy

4) Immutable

  • Deploy: new instances
  • Rollback: Redeploy

5) Blue / Green

  • Deploy: new instances
  • Rollback: Switch back to old environment

Best Practices

  • Infrastructure as code, pipeline as code
  • No long-running feature branches
  • Build unit tests toward 100% coverage; unit tests should make up about 70% of overall testing
  • Role-based security