Merikanto

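一簫一劍平生意,負盡狂名十五年 (A flute and a sword were my lifelong ambition; for fifteen years I have failed to live up to my wild reputation.)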

AWS - 01 Microservices

This is the first post in the AWS series. It collects notes on running microservices applications on AWS. The details can be found in the AWS whitepapers.



Microservices on AWS (Distributed)

Infrastructure as code

CloudFormation (CF): describe the whole infrastructure as code and keep it under version control (fast rollback)
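As a rough illustration of infrastructure as code, the sketch below builds a minimal CloudFormation template as a Python dictionary and serializes it to JSON so it can be committed to version control. The logical ID `OrdersTable` and the key name `order_id` are made up for this example.

```python
import json


def build_template() -> dict:
    """Return a minimal CloudFormation template with a single DynamoDB table."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative stack: one DynamoDB table",
        "Resources": {
            # "OrdersTable" is a hypothetical logical ID.
            "OrdersTable": {
                "Type": "AWS::DynamoDB::Table",
                "Properties": {
                    "BillingMode": "PAY_PER_REQUEST",
                    "AttributeDefinitions": [
                        {"AttributeName": "order_id", "AttributeType": "S"}
                    ],
                    "KeySchema": [
                        {"AttributeName": "order_id", "KeyType": "HASH"}
                    ],
                },
            }
        },
    }


# Sorted keys keep the serialized template stable, so diffs between commits stay small.
template_json = json.dumps(build_template(), indent=2, sort_keys=True)
```

Rolling back then amounts to redeploying an earlier committed version of the same file.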


Microservices structure

ELB (ALB) – ECS + Auto Scaling (AS) – RDS / DynamoDB


ECS

  • Create task definition in JSON
  • Container placement strategies & constraints
    • Task placement constraint: Rule considered during task placement, based on attributes (key-value pairs)
  • Use ECR as the registry that stores container images
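The points above can be sketched as a task definition built in Python and dumped to JSON. The family name, account ID, region, and image name are placeholders; the `memberOf` constraint uses the attribute-based expression syntax mentioned in the list.

```python
import json

# All names, the account ID, and the image URI below are placeholders.
task_definition = {
    "family": "orders-service",
    "containerDefinitions": [
        {
            "name": "orders",
            # Image pulled from an ECR repository.
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/orders:latest",
            "memory": 256,
            "essential": True,
            # hostPort 0 asks for a dynamically assigned host port (bridge mode).
            "portMappings": [{"containerPort": 8080, "hostPort": 0}],
        }
    ],
    # Placement constraint: only place the task on instances whose
    # instance-type attribute matches t2.*
    "placementConstraints": [
        {"type": "memberOf", "expression": "attribute:ecs.instance-type =~ t2.*"}
    ],
}

task_definition_json = json.dumps(task_definition, indent=2)
```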

Data store

  • ElastiCache (Memcached is multi-threaded, Redis is single-threaded)
  • DAX: in-memory cache in front of DynamoDB, serves eventually consistent reads
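To see why a cache in front of a data store returns eventually consistent data, here is a minimal read-through cache with a TTL. It is a stand-in for the idea only, not for how ElastiCache or DAX are implemented; the class `TTLCache`, the dict used as a "database", and the fake clock are all assumptions made for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Read-through cache: on a miss or an expired entry, load from the backing store."""

    def __init__(self, load: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit; the value may lag behind the store
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value


# A plain dict stands in for the database; the clock is faked for determinism.
store = {"user:1": "Ada"}
fake_now = [0.0]
cache = TTLCache(load=store.__getitem__, ttl_seconds=5, clock=lambda: fake_now[0])

first = cache.get("user:1")      # miss: loaded from the store
store["user:1"] = "Grace"        # the store changes behind the cache
stale = cache.get("user:1")      # hit: still the old value
fake_now[0] = 6.0                # the cached entry has expired
refreshed = cache.get("user:1")  # reloaded from the store
```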

Reduce operational complexity

  • Throttle requests to protect backend

  • CloudFront Point of Presence (PoP) & Regional Edge Cache: minimize latency

    API Gateway (GW) first checks whether the GET request can be served from a cache: the edge location, the Regional Edge Cache, or the GW response cache.

    After the backend processes the request, API call metrics are logged in CloudWatch (CW).

  • SAM is natively supported by CF: use CF to configure serverless apps; SAM reduces the amount of YAML you need to write
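Request throttling, the first point in the list above, is commonly done with a token bucket (the algorithm API Gateway's documentation describes for its own throttling). The sketch below is a generic, single-process version; the class name, parameters, and fake clock are assumptions for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `burst` requests at once, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate_per_second
        self._capacity = float(burst)
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill for the elapsed time, never beyond the bucket capacity.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # the caller would answer with HTTP 429


fake_now = [0.0]
bucket = TokenBucket(rate_per_second=1, burst=2, clock=lambda: fake_now[0])
burst_results = [bucket.allow() for _ in range(3)]  # third request is rejected
fake_now[0] = 1.0                                   # one second later: one new token
after_refill = [bucket.allow(), bucket.allow()]
```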


Distributed system components

I. Service discovery

  • Best: key-value store (e.g. Eureka, Consul)
    • AWS: use DynamoDB to propagate status changes (key-value)
    • Does not have DNS caching issues
    • Works well with client-side LB (Netflix Ribbon), eliminates bottlenecks & simplifies management
  • Client-side service discovery
  • ALB-based
  • DNS-based
  • Using ECS Event Stream
  • Using configuration management tools (OpsWorks / Chef / Ansible)
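A minimal sketch of the key-value approach combined with client-side load balancing: instances send heartbeats to a registry, entries expire after a TTL, and the client picks an address round-robin from whatever is currently alive. The in-memory `ServiceRegistry` is a stand-in for a store such as DynamoDB or Consul; every name and address here is hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class ServiceRegistry:
    """Key-value registry: (service, address) -> expiry time."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: Dict[Tuple[str, str], float] = {}

    def heartbeat(self, service: str, address: str) -> None:
        self._expiry[(service, address)] = self._clock() + self._ttl

    def lookup(self, service: str) -> List[str]:
        now = self._clock()
        return sorted(addr for (svc, addr), expires in self._expiry.items()
                      if svc == service and expires > now)


class RoundRobinClient:
    """Client-side load balancer: resolves on every call, so there is no stale DNS cache."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self._registry = registry
        self._service = service
        self._calls = 0

    def next_address(self) -> str:
        addresses = self._registry.lookup(self._service)
        if not addresses:
            raise LookupError(f"no live instance of {self._service}")
        address = addresses[self._calls % len(addresses)]
        self._calls += 1
        return address


fake_now = [0.0]
registry = ServiceRegistry(ttl_seconds=30, clock=lambda: fake_now[0])
registry.heartbeat("orders", "10.0.0.1:8080")
registry.heartbeat("orders", "10.0.0.2:8080")
client = RoundRobinClient(registry, "orders")
picks = [client.next_address() for _ in range(3)]

fake_now[0] = 20.0
registry.heartbeat("orders", "10.0.0.2:8080")  # only the second instance is still alive
fake_now[0] = 40.0                             # the first instance's entry has expired
survivor = client.next_address()
```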

II. Distributed data management

1. Event sourcing

  • Represent & persist every application change as an event record
  • Data is stored as a stream of events
  • Examples: DB transaction logging, version control systems
  • Pros
    • State can be determined & reconstructed for any point in time
    • Produce persistent audit trail (easy for debugging)
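The idea can be sketched with an append-only event list and a fold over it: the current state is never stored, only derived. The account example, the event kinds, and the class names are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Event:
    sequence: int
    kind: str    # "deposited" or "withdrawn"
    amount: int


class EventStore:
    """Append-only log; events are never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    def append(self, kind: str, amount: int) -> Event:
        event = Event(len(self._events) + 1, kind, amount)
        self._events.append(event)
        return event

    def up_to(self, sequence: int) -> List[Event]:
        return [e for e in self._events if e.sequence <= sequence]

    def all(self) -> List[Event]:
        return list(self._events)


def balance(events: Iterable[Event]) -> int:
    """Rebuild the state by replaying the events in order."""
    total = 0
    for event in events:
        if event.kind == "deposited":
            total += event.amount
        elif event.kind == "withdrawn":
            total -= event.amount
        else:
            raise ValueError(f"unknown event kind: {event.kind}")
    return total


log = EventStore()
log.append("deposited", 100)
log.append("withdrawn", 30)
log.append("deposited", 5)

current = balance(log.all())         # state now
after_two = balance(log.up_to(2))    # state as of the second event
```

The log itself doubles as the audit trail: every change that led to `current` is still there.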

2. Event sourcing & microservices

  • Decouple: publish / subscribe pattern

  • Feeds the same event data into different data models for separate microservices

  • Decouple read from write: CQRS (Command Query Responsibility Segregation)

  • Kinesis Streams as the central event store (capture application changes as events, and persist on S3)

    Publish event by writing message to Kinesis Streams. All microservices read the message copy, filter based on relevancy, and forward to Lambda / Kinesis Firehose for further processing.
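A toy version of this fan-out: a Python list stands in for the Kinesis stream, every consumer keeps its own read position, reads all records, and only handles the event types it cares about. Each consumer builds its own read model from the same events, which is the CQRS side of the pattern. All names and event types are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# Stand-in for the central event stream (write side).
stream: List[dict] = []


def publish(event_type: str, payload: dict) -> None:
    stream.append({"type": event_type, "payload": payload})


class StreamConsumer:
    """Reads every record from its own position and filters by relevance."""

    def __init__(self, relevant_types: Iterable[str],
                 handler: Callable[[dict], None]) -> None:
        self._relevant_types = set(relevant_types)
        self._handler = handler
        self._position = 0

    def poll(self, records: List[dict]) -> int:
        new_records = records[self._position:]
        self._position = len(records)
        handled = 0
        for record in new_records:
            if record["type"] in self._relevant_types:
                self._handler(record)
                handled += 1
        return handled


# Two separate read models fed from the same events.
orders_per_customer: Dict[str, int] = defaultdict(int)
shipped_orders: Set[str] = set()


def count_order(record: dict) -> None:
    orders_per_customer[record["payload"]["customer"]] += 1


def mark_shipped(record: dict) -> None:
    shipped_orders.add(record["payload"]["order_id"])


billing = StreamConsumer({"OrderPlaced"}, count_order)
shipping = StreamConsumer({"OrderShipped"}, mark_shipped)

publish("OrderPlaced", {"order_id": "o-1", "customer": "alice"})
publish("OrderPlaced", {"order_id": "o-2", "customer": "alice"})
publish("OrderShipped", {"order_id": "o-1"})

billing_handled = billing.poll(stream)
shipping_handled = shipping.poll(stream)
```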


3. No containers

The key to building resilient, self-healing systems is to allow failures to be contained, reified as messages, sent to other components (that act as supervisors), and managed from a safe context outside the failed component.

Event sourcing: Here, being message-driven is the enabler. The idea is to decouple the management of failures from the call chain, freeing the client from the responsibility of handling the failures of the server. No container or orchestration tooling will help you to integrate this.
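A very small sketch of that idea, under the assumption of a single process and an in-memory queue: the worker does not let the exception travel up the call chain; it turns the failure into a message, and a separate supervisor decides what to do with it. The function and class names are hypothetical.

```python
import queue
from typing import Callable, Dict, Optional


def run_worker(name: str, task: Callable[[], int],
               mailbox: "queue.Queue[dict]") -> Optional[int]:
    """Run the task; on failure, send a message instead of raising to the caller."""
    try:
        return task()
    except Exception as exc:
        mailbox.put({"worker": name, "error": repr(exc)})
        return None


class Supervisor:
    """Handles failure messages outside the failed component."""

    def __init__(self, max_restarts: int) -> None:
        self._max_restarts = max_restarts
        self._failures: Dict[str, int] = {}

    def decide(self, failure: dict) -> str:
        worker = failure["worker"]
        count = self._failures.get(worker, 0) + 1
        self._failures[worker] = count
        return "restart" if count <= self._max_restarts else "stop"


def flaky() -> int:
    raise RuntimeError("backend unavailable")


mailbox: "queue.Queue[dict]" = queue.Queue()
supervisor = Supervisor(max_restarts=2)
decisions = []
for _ in range(3):
    result = run_worker("payments", flaky, mailbox)
    decisions.append(supervisor.decide(mailbox.get_nowait()))
```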


III. Async communication

  • REST can be sync / async, REST relies on:
    • Stateless communication
    • Uniform interfaces
    • Standard methods (e.g. HTTP GET, POST, etc.)
  • Message passing
    • If async, does not need service discovery
    • Exchange message via a queue (SQS / SNS):
      • Subscribe an SQS queue to an SNS topic
      • Publish a message to the topic, and SNS sends a message to the subscribed SQS queue
      • Message (JSON) contains: subject, message, metadata
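The SNS-to-SQS fan-out can be simulated in memory to show the shape of what a consumer receives. The envelope fields used here (`Type`, `MessageId`, `TopicArn`, `Subject`, `Message`, `MessageAttributes`) follow the JSON that SNS delivers to a queue when raw delivery is off; the `Topic` class, the ARN, and the payload are made up, and signature fields are left out.

```python
import json
import uuid
from collections import deque
from typing import Deque, List, Optional, Tuple


class Topic:
    """In-memory stand-in for an SNS topic with subscribed SQS queues."""

    def __init__(self, arn: str) -> None:
        self.arn = arn
        self._queues: List[Deque[str]] = []

    def subscribe(self, sqs_queue: Deque[str]) -> None:
        self._queues.append(sqs_queue)

    def publish(self, subject: str, message: str,
                attributes: Optional[dict] = None) -> str:
        envelope = {
            "Type": "Notification",
            "MessageId": str(uuid.uuid4()),
            "TopicArn": self.arn,
            "Subject": subject,
            "Message": message,  # always a string; JSON payloads are nested
            "MessageAttributes": attributes or {},
        }
        body = json.dumps(envelope)
        for sqs_queue in self._queues:  # every subscribed queue gets a copy
            sqs_queue.append(body)
        return envelope["MessageId"]


def unwrap(body: str) -> Tuple[str, dict]:
    """Return (subject, payload) from a queue message body."""
    envelope = json.loads(body)
    return envelope["Subject"], json.loads(envelope["Message"])


topic = Topic("arn:aws:sns:us-east-1:123456789012:orders")  # placeholder ARN
billing_queue: Deque[str] = deque()
shipping_queue: Deque[str] = deque()
topic.subscribe(billing_queue)
topic.subscribe(shipping_queue)

topic.publish("OrderPlaced", json.dumps({"order_id": "o-1"}))
subject, payload = unwrap(billing_queue.popleft())
```

The producer only knows the topic, which is why no service discovery is needed between producer and consumers.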

Orchestration & state management

  • Step functions (state machines): coordinate components of distributed applications & microservices
  • SF supports orchestration of Lambda functions (sequential & parallel)
  • Amazon States Language
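For a feel of the Amazon States Language, the sketch below builds a state machine with one sequential step followed by two parallel branches, then checks that every referenced state exists. The state names and Lambda ARNs are placeholders, and `missing_targets` is a small helper written for this example, not part of any AWS tooling.

```python
import json
from typing import List

LAMBDA = "arn:aws:lambda:us-east-1:123456789012:function:"  # placeholder prefix

state_machine = {
    "Comment": "Validate an order, then charge and reserve stock in parallel",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {
            "Type": "Task",
            "Resource": LAMBDA + "validate-order",
            "Next": "Fulfil",
        },
        "Fulfil": {
            "Type": "Parallel",
            "Branches": [
                {
                    "StartAt": "ChargeCard",
                    "States": {
                        "ChargeCard": {"Type": "Task",
                                       "Resource": LAMBDA + "charge-card",
                                       "End": True}
                    },
                },
                {
                    "StartAt": "ReserveStock",
                    "States": {
                        "ReserveStock": {"Type": "Task",
                                         "Resource": LAMBDA + "reserve-stock",
                                         "End": True}
                    },
                },
            ],
            "End": True,
        },
    },
}


def missing_targets(machine: dict) -> List[str]:
    """List state names that are referenced (StartAt / Next) but not defined."""
    states = machine["States"]
    missing = []
    if machine["StartAt"] not in states:
        missing.append(machine["StartAt"])
    for state in states.values():
        target = state.get("Next")
        if target is not None and target not in states:
            missing.append(target)
        for branch in state.get("Branches", []):
            missing.extend(missing_targets(branch))
    return missing


definition_json = json.dumps(state_machine, indent=2)
problems = missing_targets(state_machine)
```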

IV. Distributed Monitoring, Tracing & Auditing

1. Distributed monitoring - CW

  • Centralize logs
    • Primary destination: S3 / CW Logs
    • Application running on EC2: a daemon ships logs to CW Logs
    • Lambda natively ships logs to CW Logs
    • ECS supports the awslogs log driver, which centralizes container logs in CW Logs
  • Search & analyze logs: ES & Kibana, Athena (query logs from S3)

2. Distributed tracing - X-Ray

X-Ray: end-to-end view of requests

  • Use correlation IDs: unique identifiers attached to all requests & messages related to a specific event chain
  • Trace ID is added to HTTP requests in a specific tracing header (X-Amzn-Trace-Id)
  • Works with EC2, ECS, Lambda, Elastic Beanstalk (EB)
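The tracing header is a semicolon-separated list of key=value fields, and the root trace ID has the form `1-<8 hex digits of epoch seconds>-<24 hex digits>`. A minimal sketch of generating, building, and parsing it by hand follows; in practice the X-Ray SDK does this for you, and the helper names here are invented.

```python
import os
import time
from typing import Dict, Optional


def new_trace_id(now: Optional[float] = None) -> str:
    """Version 1 trace ID: epoch seconds in hex plus 96 random bits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_header(root: str, parent: Optional[str] = None,
                 sampled: Optional[bool] = None) -> str:
    parts = [f"Root={root}"]
    if parent is not None:
        parts.append(f"Parent={parent}")
    if sampled is not None:
        parts.append(f"Sampled={1 if sampled else 0}")
    return ";".join(parts)


def parse_header(value: str) -> Dict[str, str]:
    fields = {}
    for part in value.split(";"):
        key, separator, field_value = part.strip().partition("=")
        if separator:
            fields[key] = field_value
    return fields


trace_id = new_trace_id()
header_value = build_header(trace_id, parent="53995c3f42cd8ad8", sampled=True)
# A downstream service would forward the same Root with its own Parent segment ID.
incoming = parse_header(header_value)
```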

3. Log analysis

  • EC2 / ECS / Lambda – CW Logs – ES & Kibana
    • Configure CW Logs to stream log entries to ES in near real time, via a CW Logs subscription
    • Send SNS notice, emails, JIRA tickets
  • EC2 / ECS / Lambda – CW Logs – Kinesis Firehose – Redshift – QuickSight
    • QuickSight can only query from data services (e.g. Redshift)
    • 🧡 CW as centralized store for log data
    • Stream log entries to Firehose (deliver real-time streaming data to S3 / ES / Redshift)
  • CW Logs – Firehose – S3 – DynamoDB – QuickSight
  • CW Logs – Lambda – S3 – DynamoDB – QuickSight
    • CW Logs: Centralize logs
    • S3: Store logs
    • QS: Last step

4. Auditing - CT

  • Track changes in microservices and deliver them to CW Logs / S3

  • Allows multiple trails for the same account

  • Aggregate in a single S3 bucket

    Pros: New files can trigger SNS / start Lambda to parse the log file, data auto archived to Glacier via lifecycle policies.

  • Store in CW Logs

    Pros: Trail data is generated in real time, reroute to ES for search & visualization.
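The lifecycle-based archiving mentioned for the S3 option can be written down as a lifecycle configuration document. The helper below only builds the structure (the same shape S3 lifecycle APIs accept); the rule ID, the prefix, and the 90-day threshold are arbitrary choices for the example.

```python
import json


def archive_rule(prefix: str, days: int) -> dict:
    """Lifecycle configuration that moves objects under `prefix` to Glacier after `days`."""
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }


# "AWSLogs/" is the prefix CloudTrail writes under; 90 days is an arbitrary choice.
lifecycle = archive_rule("AWSLogs/", days=90)
lifecycle_json = json.dumps(lifecycle, indent=2)
```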


5. Events & real-time actions

  • CW Events delivers a near real-time stream of system events that describe changes in AWS resources
  • CT + S3 + CW Events: Generate events for all changing API calls across all AWS services
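Rules in CW Events select events with a JSON pattern: each listed field must be present, and its value must be one of the listed alternatives. The matcher below is a deliberately simplified re-implementation of that idea (exact-value matching only, none of the advanced operators), and the sample pattern and event are illustrative.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified matching: every pattern key must exist and match one listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding sub-document.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Pattern for changing S3 bucket-policy API calls recorded by CloudTrail.
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutBucketPolicy", "DeleteBucketPolicy"],
    },
}

policy_change = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "s3.amazonaws.com", "eventName": "PutBucketPolicy"},
}
read_only_call = {
    "source": "aws.s3",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "s3.amazonaws.com", "eventName": "GetObject"},
}

triggers = matches(pattern, policy_change)
ignored = matches(pattern, read_only_call)
```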

6. Resource Inventory & change management

  • AWS Config
    • Provide AWS resource inventory, config history, and config change notifications
    • Create rules that auto check the config of AWS resources recorded by AWS Config
  • SNS
    • Send email to specific groups
    • Add a message to an SQS queue (the message is picked up from the queue, and the compliant state is restored by changing the GW configuration)

Containerized Microservices

Layer caching: Docker rebuilds only the layer that changed and the layers after it; earlier layers are reused from the cache.


  • K8S Pods = ECS Tasks = Container sets (collaborate using links / volumes)
  • The scheduler maintains the desired count of tasks / container sets
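What "maintaining the desired count" means can be sketched as one step of a reconciliation loop: compare the desired count with the observed task states and compute what to start and what to stop. This is a toy model, not the ECS or Kubernetes scheduler; the function name and status strings are assumptions.

```python
from typing import Dict, List, Union


def reconcile(desired_count: int,
              task_status: Dict[str, str]) -> Dict[str, Union[int, List[str]]]:
    """Return how many tasks to start and which task IDs to stop."""
    running = sorted(t for t, status in task_status.items() if status == "RUNNING")
    failed = sorted(t for t, status in task_status.items() if status != "RUNNING")
    surplus = running[desired_count:]  # running tasks beyond the desired count
    return {
        "start": max(0, desired_count - len(running)),
        "stop": failed + surplus,
    }


# One task died: replace it and start one more to reach three.
scale_up = reconcile(3, {"task-a": "RUNNING", "task-b": "STOPPED"})
# Desired count lowered to one: stop the surplus task.
scale_down = reconcile(1, {"task-a": "RUNNING", "task-b": "RUNNING"})
```

Running this step repeatedly is what makes the system self-healing: a failed task simply shows up as a difference to correct.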

Treating software as always-improving products instead of projects.


Smart endpoints & dumb pipes

  • Sync: Request / Response

  • Async: Publish / Subscribe

    • Event-based architecture
  • Endpoints that produce & consume messages are smart, but the pipes between endpoints are dumb


Infrastructure automation

  • Infrastructure as code (easy rollbacks, instantiated from description)
  • Deploy in phases: blue / green, canary (Lambda)
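A canary rollout boils down to sending a small, fixed share of traffic to the new version. The sketch below does the split by hashing a request ID, so the same request always lands on the same version; this is a generic illustration, not how Lambda aliases route traffic internally, and the names and the 10% weight are arbitrary.

```python
import hashlib


def choose_version(request_id: str, canary_weight: float) -> str:
    """Map the request ID to [0, 1) and send the lowest `canary_weight` share to the canary."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "canary" if bucket < canary_weight else "stable"


request_ids = [f"req-{n}" for n in range(10_000)]
canary_share = sum(
    choose_version(request_id, 0.10) == "canary" for request_id in request_ids
) / len(request_ids)
```

Raising the weight step by step towards 1.0 completes the rollout; setting it back to 0.0 is the rollback.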

Design for failures

  • Self-healing infrastructure (automation)
  • Treat container instances as immutable servers