This is the first post in the AWS series. We will walk through some notes and takeaways about running microservices applications on AWS. All details can be found in the AWS whitepapers.
Microservices on AWS (Distributed)
Infrastructure as code
CF (CloudFormation): describe the whole infrastructure as code and version-control it (fast rollbacks)
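A minimal CloudFormation sketch of the idea (resource names and properties are illustrative, not from the whitepaper): the queue and table below live in a template you can diff, review, and roll back like any other code.

```yaml
# Illustrative CloudFormation fragment: an SQS queue and a DynamoDB table.
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  OrderQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: order-queue
  OrderTable:
    Type: AWS::DynamoDB::Table
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: orderId
          AttributeType: S
      KeySchema:
        - AttributeName: orderId
          KeyType: HASH
```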
Microservices structure
ELB (ALB) – ECS + AS – RDS / DynamoDB
ECS
- Create task definition in JSON
- Container placement strategies & constraints
- Task placement constraint: Rule considered during task placement, based on attributes (key-value pairs)
- Use ECR to register container
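The points above can be sketched as a minimal JSON task definition (the ECR image URL and values are placeholders); note how the placement constraint expression matches on an instance attribute as a key-value pair:

```json
{
  "family": "orders-service",
  "containerDefinitions": [
    {
      "name": "orders",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/orders:latest",
      "memory": 256,
      "portMappings": [{ "containerPort": 8080 }]
    }
  ],
  "placementConstraints": [
    { "type": "memberOf", "expression": "attribute:ecs.instance-type == t3.medium" }
  ]
}
```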
Data store
- ElastiCache (Memcached is multi-threaded, Redis is single-threaded)
- DAX: caching, eventually consistent data
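The caching pattern behind ElastiCache / DAX is essentially cache-aside: check the cache first, fall back to the data store, then populate the cache. A minimal local sketch, with a dict standing in for the cache and another for the table (names are illustrative):

```python
# Cache-aside sketch: dicts stand in for ElastiCache/DAX and the backing table.
cache = {}
table = {"user-1": {"name": "Alice"}}

def get_user(user_id):
    # 1. Check the cache first (fast path).
    if user_id in cache:
        return cache[user_id]
    # 2. Cache miss: read from the backing store (DynamoDB/RDS in practice).
    item = table.get(user_id)
    # 3. Populate the cache so subsequent reads are served from memory.
    if item is not None:
        cache[user_id] = item
    return item
```

With DAX the second and third steps are handled for you, which is why the cached reads are eventually consistent.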
Reduce operational complexity
Throttle requests to protect backend
CloudFront Point of Presence (PoP) & Regional Edge Cache: minimize latency
API Gateway first checks whether the GET request is already cached at the edge location / Regional Edge Cache / Gateway response cache.
After the backend processes the request, API call metrics are logged in CW.
SAM is natively supported by CF; use CF to configure serverless apps. SAM reduces the amount of YAML you need to write.
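A SAM sketch of that point (handler, path, and code location are illustrative): the single `AWS::Serverless::Function` resource below is expanded by the `Transform` line into the plain CloudFormation resources (function, role, API) you would otherwise write by hand.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  HelloFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.12
      Handler: app.handler
      CodeUri: ./src
      Events:
        Api:
          Type: Api
          Properties:
            Path: /hello
            Method: get
```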
Distributed system components
I. Service discovery
- Best: key-value store (e.g. Eureka, Consul)
- AWS: use DynamoDB to propagate status changes (key-value)
- Does not have DNS caching issues
- Works well with client-side LB (Netflix Ribbon): eliminates bottlenecks & simplifies management
- Client-side service discovery
- ALB-based
- DNS-based
- Using ECS Event Stream
- Using configuration management tools (OpsWorks / Chef / Ansible)
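A local sketch of the key-value approach above: a dict stands in for the DynamoDB registry table, instances write their endpoints on status changes, and the client picks an instance itself (client-side load balancing), so there is no DNS record to go stale. All names are illustrative.

```python
import random

# A dict stands in for the DynamoDB registry table (key = service name).
registry = {}

def register(service, endpoint):
    # A service instance writes its endpoint on startup / status change.
    registry.setdefault(service, set()).add(endpoint)

def deregister(service, endpoint):
    # Instances remove themselves (or a health checker does) on shutdown.
    registry.get(service, set()).discard(endpoint)

def resolve(service):
    # Client-side load balancing: the caller picks an instance itself,
    # avoiding both a central LB bottleneck and DNS caching issues.
    endpoints = registry.get(service)
    if not endpoints:
        raise LookupError(f"no healthy instances of {service}")
    return random.choice(sorted(endpoints))
```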
II. Distributed data management
1. Event sourcing
- Represent & persist every application change as an event record
- Data is stored as a stream of events
- Examples: DB transaction logging, version control systems
- Pros
- State can be determined & reconstructed at any point in time
- Produces a persistent audit trail (eases debugging)
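Both pros fit in a few lines: persist every change as an event, and current (or historical) state is just a replay of the stream. A minimal sketch, with a list standing in for the event store and illustrative event shapes:

```python
# Event sourcing sketch: a list stands in for the event store.
events = []

def record(event_type, amount):
    # Every application change is persisted as an immutable event record,
    # which doubles as the audit trail.
    events.append({"type": event_type, "amount": amount})

def balance(up_to=None):
    # State is reconstructed by replaying events; replaying only a prefix
    # gives the state "at any point in time".
    state = 0
    for event in events[:up_to]:
        state += event["amount"] if event["type"] == "deposit" else -event["amount"]
    return state
```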
2. Event sourcing & microservices
Decouple: publish / subscribe pattern
Feeds the same event data into different data models for separate microservices
Decouple read from write: CQRS (Command Query Responsibility Segregation)
Kinesis Streams as the central event store (capture application changes as events, and persist on S3)
Publish an event by writing a message to Kinesis Streams. Each microservice reads its own copy of the message, filters it for relevance, and forwards it to Lambda / Kinesis Firehose for further processing.
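A local sketch of that fan-out: a list stands in for the Kinesis stream, and each microservice filters the shared event stream into its own query-side model (the CQRS read side). Event types and names are illustrative.

```python
# A list stands in for Kinesis Streams as the central event store.
stream = []

def publish(event):
    # Writing to the stream is the "command" (write) side.
    stream.append(event)

def build_read_model(relevant_types):
    # Each microservice reads its own copy of the stream, keeps only the
    # events relevant to it, and projects them into a query-side model.
    return [event for event in stream if event["type"] in relevant_types]
```

The same event data thus feeds different data models for separate microservices, with reads fully decoupled from writes.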
3. No containers
The key to building resilient, self-healing systems is to allow failures to be contained, reified as messages, sent to other components (that act as supervisors), and managed from a safe context outside the failed component.
Event sourcing: here, being message-driven is the enabler. The idea is to decouple the management of failures from the call chain, freeing the client from the responsibility of handling server failures. No container or orchestration tooling will integrate this for you.
III. Async communication
- REST can be sync / async; it relies on:
- Stateless communication
- Uniform interfaces
- Standard methods (e.g. HTTP GET, POST, etc.)
- Message passing
- If async, does not need service discovery
- Exchange message via a queue (SQS / SNS):
- Subscribe an SQS queue to an SNS topic
- Publish a message to the topic, and SNS sends a message to the subscribed SQS queue
- Message (JSON) contains: subject, message, metadata
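A local sketch of the SNS → SQS pattern above: a topic fans a published message out to every subscribed queue, and each delivered message is a JSON envelope carrying subject, body, and metadata. Lists and dicts stand in for SNS and SQS; the field names mirror the SNS envelope but are simplified.

```python
import json

# A dict stands in for SNS: topic name -> subscribed queues (each a list).
subscriptions = {}

def subscribe(topic, queue):
    # Subscribe an SQS queue to an SNS topic.
    subscriptions.setdefault(topic, []).append(queue)

def publish(topic, subject, message):
    # SNS delivers a copy of the JSON envelope to every subscribed queue.
    envelope = json.dumps({
        "Subject": subject,
        "Message": message,
        "Metadata": {"Topic": topic},
    })
    for queue in subscriptions.get(topic, []):
        queue.append(envelope)
```

Because consumers pull from their queues whenever they are ready, the publisher never needs to discover or address them directly.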
Orchestration & state management
- Step functions (state machines): coordinate components of distributed applications & microservices
- SF supports orchestration of Lambda functions (sequential & parallel)
- Amazon States Language
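A minimal Amazon States Language sketch of the point above: one Lambda task runs first, then a Parallel state fans out to two more (the ARNs are placeholders).

```json
{
  "StartAt": "Validate",
  "States": {
    "Validate": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:validate",
      "Next": "FanOut"
    },
    "FanOut": {
      "Type": "Parallel",
      "Branches": [
        {
          "StartAt": "Bill",
          "States": {
            "Bill": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:bill",
              "End": true
            }
          }
        },
        {
          "StartAt": "Notify",
          "States": {
            "Notify": {
              "Type": "Task",
              "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify",
              "End": true
            }
          }
        }
      ],
      "End": true
    }
  }
}
```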
IV. Distributed Monitoring, Tracing & Auditing
1. Distributed monitoring - CW
- Centralize logs
- Primary destination: S3 / CW Logs
- Application running on EC2: Daemon ship logs to CW Logs
- Lambda natively ship logs to CW Logs
- ECS supports the `awslogs` log driver, centralizing container logs to CW Logs
- Search & analyze logs: ES & Kibana, Athena (query logs from S3)
2. Distributed tracing - X-Ray
X-Ray: end-to-end view of requests
- Use correlation IDs: unique identifiers attached to all requests & messages related to a specific event chain
- Trace ID is added to HTTP requests in a specific tracing header (`X-Amzn-Trace-Id`)
- Works with EC2, ECS, Lambda, EB
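A sketch of correlation-ID propagation: each hop reuses the incoming trace header if present (minting one only at the edge of the system), so every log line in one request chain shares an ID. The header name matches X-Ray's; the "services" here are plain functions for illustration.

```python
import uuid

TRACE_HEADER = "X-Amzn-Trace-Id"
logs = []  # stands in for CW Logs

def handle(headers):
    # Reuse the incoming trace ID, or mint one at the edge of the system.
    trace_id = headers.get(TRACE_HEADER) or f"Root=1-{uuid.uuid4().hex}"
    logs.append((trace_id, "frontend: received request"))
    call_backend({TRACE_HEADER: trace_id})
    return trace_id

def call_backend(headers):
    # The downstream service logs with the same correlation ID,
    # so the whole event chain can be stitched together later.
    logs.append((headers[TRACE_HEADER], "backend: processed request"))
```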
3. Log analysis
- EC2 / ECS / Lambda – CW Logs – ES & Kibana
- Config CW to stream log entries to ES in near real time, via CW subscription
- Send SNS notifications, emails, JIRA tickets
- EC2 / ECS / Lambda – CW Logs – Kinesis Firehose – Redshift – QuickSight
- QuickSight can only query from data services (e.g. Redshift)
- 🧡 CW as centralized store for log data
- Stream log entries to Firehose (deliver real-time streaming data to S3 / ES / Redshift)
- CW Logs – Firehose – S3 – DynamoDB – QuickSight
- CW Logs – Lambda – S3 – DynamoDB – QuickSight
- CW Logs: Centralize logs
- S3: Store logs
- QS: Last step
4. Auditing - CT
Tracks changes in microservices; delivers log files to CW Logs / S3
Allow multiple trails for the same account
Aggregate in a single S3 bucket
Pros: new files can trigger SNS notifications / start a Lambda function to parse the log file; data is auto-archived to Glacier via lifecycle policies.
Store in CW Logs
Pros: trail data is generated in real time and can be rerouted to ES for search & visualization.
5. Events & real-time actions
- CW Events deliver near real-time stream of system events that describe changes in AWS resources
- CT + S3 + CW Events: Generate events for all changing API calls across all AWS services
6. Resource Inventory & change management
- AWS Config
- Provide AWS resource inventory, config history, and config change notifications
- Create rules that auto check the config of AWS resources recorded by AWS Config
- SNS
- Send email to specific groups
- Add a message to an SQS queue (the message is picked up by a consumer, which restores the compliant state)
Containerized Microservices
Layer caching: Docker only rebuilds the layers that changed.
- K8S Pods = ECS Tasks = Container sets (collaborate using links / volumes)
- Scheduler maintains the desired count of tasks / container sets
Treating software as always-improving products instead of projects.
Smart endpoints & dumb pipes
Sync: Request / Response
Async: Publish / Subscribe
- Event-based architecture
Endpoints that produce & consume messages are smart, but the pipes between endpoints are dumb
Infrastructure automation
- Infrastructure as code (easy rollbacks, instantiated from description)
- Deploy in phases: blue / green, canary (Lambda)
Design for failures
- Self-healing infrastructure (automation)
- Treat container instances as immutable servers