This is the first post in the AWS series. We will walk through some notes and takeaways about running microservices applications on AWS. All details can be found in the AWS whitepapers.
Microservices on AWS (Distributed)
Infrastructure as code
CF (CloudFormation): describe the whole infrastructure as code and version-control it (fast rollbacks)
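A minimal CloudFormation sketch of the idea (resource names and properties are illustrative, not from the whitepaper): the queue and table below live in a template you can diff, review, and roll back like any other code.

```yaml
# Illustrative CloudFormation fragment: an SQS queue and a DynamoDB table.
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  OrderQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: order-queue
  OrderTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: orderId
          AttributeType: S
      KeySchema:
        - AttributeName: orderId
          KeyType: HASH
```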
Microservices structure
ELB (ALB) – ECS + AS – RDS / DynamoDB
ECS
- Create task definition in JSON
- Container placement strategies & constraints
- Task placement constraint: Rule considered during task placement, based on attributes (key-value pairs)
- Use ECR to register container
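The points above can be sketched as a minimal JSON task definition (the ECR image URL and values are placeholders); note how the placement constraint expression matches on an instance attribute as a key-value pair:

```json
{
  "family": "orders-service",
  "containerDefinitions": [
    {
      "name": "orders",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/orders:latest",
      "memory": 256,
      "portMappings": [{ "containerPort": 8080 }]
    }
  ],
  "placementConstraints": [
    { "type": "memberOf", "expression": "attribute:ecs.instance-type == t3.medium" }
  ]
}
```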
Data store
- ElastiCache (Memcached is multi-threaded, Redis is single-threaded)
- DAX: caching, eventually consistent data
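The caching pattern behind ElastiCache / DAX is essentially cache-aside: check the cache first, fall back to the data store, then populate the cache. A minimal local sketch, with a dict standing in for the cache and another for the table (names are illustrative):

```python
# Cache-aside sketch: dicts stand in for ElastiCache/DAX and the backing table.
cache = {}
table = {"user-1": {"name": "Alice"}}

def get_user(user_id):
    # 1. Check the cache first (fast path).
    if user_id in cache:
        return cache[user_id]
    # 2. Cache miss: read from the backing store (DynamoDB/RDS in practice).
    item = table.get(user_id)
    # 3. Populate the cache so subsequent reads are served from memory.
    if item is not None:
        cache[user_id] = item
    return item
```

With DAX the second and third steps are handled for you, which is why the cached reads are eventually consistent.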
Reduce operational complexity
Throttle requests to protect backend
CloudFront Point of Presence (PoP) & Regional Edge Cache: minimize latency
API Gateway first checks whether the GET request is already cached at the edge location / Regional Edge Cache / Gateway response cache.
After the backend processes the request, API call metrics are logged in CW.
SAM is natively supported by CF; use CF to configure serverless apps. SAM reduces the amount of YAML you need to write.
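A SAM sketch of that point (handler, path, and code location are illustrative): the single `AWS::Serverless::Function` resource below is expanded by the `Transform` line into the plain CloudFormation resources (function, role, API) you would otherwise write by hand.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  HelloFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.12
      Handler: app.handler
      CodeUri: ./src
      Events:
        Api:
          Type: Api
          Properties:
            Path: /hello
            Method: get
```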
Distributed system components
I. Service discovery
- Best: key-value store (e.g. Eureka, Consul)
- AWS: use DynamoDB to propagate status changes (key-value)
- Does not have DNS caching issues
- Works well with client-side LB (Netflix Ribbon): eliminates bottlenecks & simplifies management
- Client-side service discovery
- ALB-based
- DNS-based
- Using ECS Event Stream
- Using configuration management tools (OpsWorks / Chef / Ansible)
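A local sketch of the key-value approach above: a dict stands in for the DynamoDB registry table, instances write their endpoints on status changes, and the client picks an instance itself (client-side load balancing), so there is no DNS record to go stale. All names are illustrative.

```python
import random

# A dict stands in for the DynamoDB registry table (key = service name).
registry = {}

def register(service, endpoint):
    # A service instance writes its endpoint on startup / status change.
    registry.setdefault(service, set()).add(endpoint)

def deregister(service, endpoint):
    # Instances remove themselves (or a health checker does) on shutdown.
    registry.get(service, set()).discard(endpoint)

def resolve(service):
    # Client-side load balancing: the caller picks an instance itself,
    # avoiding both a central LB bottleneck and DNS caching issues.
    endpoints = registry.get(service)
    if not endpoints:
        raise LookupError(f"no healthy instances of {service}")
    return random.choice(sorted(endpoints))
```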
II. Distributed data management
1. Event sourcing
- Represent & persist every application change as an event record
- Data is stored as a stream of events
- Examples: DB transaction logging, version control systems
- Pros
- State can be determined & reconstructed at any point in time
- Produces a persistent audit trail (eases debugging)
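Both pros fit in a few lines: persist every change as an event, and current (or historical) state is just a replay of the stream. A minimal sketch, with a list standing in for the event store and illustrative event shapes:

```python
# Event sourcing sketch: a list stands in for the event store.
events = []

def record(event_type, amount):
    # Every application change is persisted as an immutable event record,
    # which doubles as the audit trail.
    events.append({"type": event_type, "amount": amount})

def balance(up_to=None):
    # State is reconstructed by replaying events; replaying only a prefix
    # gives the state "at any point in time".
    state = 0
    for event in events[:up_to]:
        state += event["amount"] if event["type"] == "deposit" else -event["amount"]
    return state
```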
2. Event sourcing & microservices
Decouple: publish / subscribe pattern
Feeds the same event data into different data models for separate microservices
Decouple read from write: CQRS (Command Query Responsibility Segregation)
Kinesis Streams as the central event store (capture application changes as events, and persist on S3)
Publish an event by writing a message to Kinesis Streams. Each microservice reads its own copy of the message, filters it for relevance, and forwards it to Lambda / Kinesis Firehose for further processing.
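A local sketch of that fan-out: a list stands in for the Kinesis stream, and each microservice filters the shared event stream into its own query-side model (the CQRS read side). Event types and names are illustrative.

```python
# A list stands in for Kinesis Streams as the central event store.
stream = []

def publish(event):
    # Writing to the stream is the "command" (write) side.
    stream.append(event)

def build_read_model(relevant_types):
    # Each microservice reads its own copy of the stream, keeps only the
    # events relevant to it, and projects them into a query-side model.
    return [event for event in stream if event["type"] in relevant_types]
```

The same event data thus feeds different data models for separate microservices, with reads fully decoupled from writes.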
3. No containers
The key to building resilient, self-healing systems is to allow failures to be contained, reified as messages, sent to other components (that act as supervisors), and managed from a safe context outside the failed component.
Event sourcing: here, being message-driven is the enabler. The idea is to decouple the management of failures from the call chain, freeing the client from the responsibility of handling server failures. No container or orchestration tooling will integrate this for you.
III. Async communication
- REST can be sync / async; it relies on:
- Stateless communication
- Uniform interfaces
- Standard methods (e.g. HTTP GET, POST, etc.)
- Message passing
- If async, does not need service discovery
- Exchange message via a queue (SQS / SNS):
- Subscribe an SQS queue to an SNS topic
- Publish a message to the topic, and SNS sends a message to the subscribed SQS queue
- Message (JSON) contains: subject, message, metadata
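A local sketch of the SNS → SQS pattern above: a topic fans a published message out to every subscribed queue, and each delivered message is a JSON envelope carrying subject, body, and metadata. Lists and dicts stand in for SNS and SQS; the field names mirror the SNS envelope but are simplified.

```python
import json

# A dict stands in for SNS: topic name -> subscribed queues (each a list).
subscriptions = {}

def subscribe(topic, queue):
    # Subscribe an SQS queue to an SNS topic.
    subscriptions.setdefault(topic, []).append(queue)

def publish(topic, subject, message):
    # SNS delivers a copy of the JSON envelope to every subscribed queue.
    envelope = json.dumps({
        "Subject": subject,
        "Message": message,
        "Metadata": {"Topic": topic},
    })
    for queue in subscriptions.get(topic, []):
        queue.append(envelope)
```

Because consumers pull from their queues whenever they are ready, the publisher never needs to discover or address them directly.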
Orchestration & state management
- Step functions (state machines): coordinate components of distributed applications & microservices
- SF supports orchestration of Lambda functions (sequential & parallel)
- Amazon States Language
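A minimal Amazon States Language sketch of the point above: one Lambda task runs first, then a Parallel state fans out to two more (the ARNs are placeholders).

```json
{
  "StartAt": "Validate",
  "States": {
    "Validate": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate",
      "Next": "FanOut"
    },
    "FanOut": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Bill",
          "States": {
            "Bill": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:bill",
              "End": true
            }
          }
        },
        {
          "StartAt": "Notify",
          "States": {
            "Notify": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify",
              "End": true
            }
          }
        }
      ],
      "End": true
    }
  }
}
```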
IV. Distributed Monitoring, Tracing & Auditing
1. Distributed monitoring - CW
- Centralize logs
- Primary destination: S3 / CW Logs
- Application running on EC2: Daemon ship logs to CW Logs
- Lambda natively ship logs to CW Logs
- ECS supports the `awslogs` log driver, centralizing container logs to CW Logs
- Search & analyze logs: ES & Kibana, Athena (query logs from S3)
2. Distributed tracing - X-Ray
X-Ray: end-to-end view of requests
- Use correlation IDs: unique identifiers attached to all requests & messages related to a specific event chain
- Trace ID is added to HTTP requests in a specific tracing header (`X-Amzn-Trace-Id`)
- Works with EC2, ECS, Lambda, EB
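A sketch of correlation-ID propagation: each hop reuses the incoming trace header if present (minting one only at the edge of the system), so every log line in one request chain shares an ID. The header name matches X-Ray's; the "services" here are plain functions for illustration.

```python
import uuid

TRACE_HEADER = "X-Amzn-Trace-Id"
logs = []  # stands in for CW Logs

def handle(headers):
    # Reuse the incoming trace ID, or mint one at the edge of the system.
    trace_id = headers.get(TRACE_HEADER) or f"Root=1-{uuid.uuid4().hex}"
    logs.append((trace_id, "frontend: received request"))
    call_backend({TRACE_HEADER: trace_id})
    return trace_id

def call_backend(headers):
    # The downstream service logs with the same correlation ID,
    # so the whole event chain can be stitched together later.
    logs.append((headers[TRACE_HEADER], "backend: processed request"))
```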
3. Log analysis
- EC2 / ECS / Lambda – CW Logs – ES & Kibana
- Config CW to stream log entries to ES in near real time, via CW subscription
- Send SNS notifications, emails, JIRA tickets
- EC2 / ECS / Lambda – CW Logs – Kinesis Firehose – Redshift – QuickSight
- QuickSight can only query from data services (e.g. Redshift)
- 🧡 CW as centralized store for log data
- Stream log entries to Firehose (deliver real-time streaming data to S3 / ES / Redshift)
- CW Logs – Firehose – S3 – DynamoDB – QuickSight
- CW Logs – Lambda – S3 – DynamoDB – QuickSight
- CW Logs: Centralize logs
- S3: Store logs
- QS: Last step
4. Auditing - CT
Tracks changes in microservices; delivers log files to CW Logs / S3
Allow multiple trails for the same account
Aggregate in a single S3 bucket
Pros: new files can trigger SNS notifications / start a Lambda function to parse the log file; data is auto-archived to Glacier via lifecycle policies.
Store in CW Logs
Pros: trail data is generated in real time and can be rerouted to ES for search & visualization.
5. Events & real-time actions
- CW Events deliver near real-time stream of system events that describe changes in AWS resources
- CT + S3 + CW Events: Generate events for all changing API calls across all AWS services
6. Resource Inventory & change management
- AWS Config
- Provide AWS resource inventory, config history, and config change notifications
- Create rules that auto check the config of AWS resources recorded by AWS Config
- SNS
- Send email to specific groups
- Add a message to an SQS queue (the message is picked up by a consumer, which restores the compliant state)
Containerized Microservices
Layer caching: Docker only rebuilds the layers that changed.
- K8S Pods = ECS Tasks = Container sets (collaborate using links / volumes)
- Scheduler maintains the desired count of tasks / container sets
Treating software as always-improving products instead of projects.
Smart endpoints & dumb pipes
Sync: Request / Response
Async: Publish / Subscribe
- Event-based architecture
Endpoints that produce & consume messages are smart, but the pipes between endpoints are dumb
Infrastructure automation
- Infrastructure as code (easy rollbacks, instantiated from description)
- Deploy in phases: blue / green, canary (Lambda)
Design for failures
- Self-healing infrastructure (automation)
- Treat container instances as immutable servers