In the eighth post of the AWS series, we’re going to talk about three monitoring services today:
- CloudWatch (Metrics & Logs)
- X-Ray (Traces)
- CloudTrail (Audit)
General
Observability:
- CloudWatch (Metrics & Logs)
- X-Ray (Traces)
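CloudTrail only records API calls made in your AWS account (who called what, and when). X-Ray traces requests as they travel through your application and the AWS services behind it.
If errors occur intermittently, it's better to collect and aggregate the results at regular intervals and then send the aggregated data to CloudWatch, rather than publishing every single occurrence.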
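The aggregation idea above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the metric name `IntermittentErrors`, the `MyApp` namespace and the sample data are all made up for the example. The output is shaped like the `StatisticValues` form that CloudWatch's PutMetricData accepts, but nothing is actually sent.

```python
from collections import defaultdict
from datetime import datetime, timezone


def aggregate_errors(samples, period_seconds=60):
    """Group (epoch_seconds, error_count) samples into fixed periods
    and summarize each period as one statistic set."""
    buckets = defaultdict(list)
    for timestamp, error_count in samples:
        bucket_start = int(timestamp) - int(timestamp) % period_seconds
        buckets[bucket_start].append(error_count)

    metric_data = []
    for bucket_start in sorted(buckets):
        values = buckets[bucket_start]
        metric_data.append({
            "MetricName": "IntermittentErrors",  # hypothetical metric name
            "Timestamp": datetime.fromtimestamp(bucket_start, tz=timezone.utc),
            "Unit": "Count",
            "StatisticValues": {
                "SampleCount": len(values),
                "Sum": sum(values),
                "Minimum": min(values),
                "Maximum": max(values),
            },
        })
    return metric_data


# Five raw samples collapse into two one-minute data points.
samples = [(0, 1), (10, 0), (59, 3), (60, 2), (119, 0)]
metric_data = aggregate_errors(samples)

# Assuming boto3 is installed and credentials are configured, the list could be sent with:
# boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=metric_data)
```

Sending one statistic set per period keeps the number of API calls small while still preserving count, sum, minimum and maximum for each interval.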
X-Ray:
- Trace and analyze user requests as they travel through API Gateway APIs to the underlying services
- API Gateway supports AWS X-Ray tracing for all API Gateway endpoint types (regional, edge-optimized, and private)
- X-Ray gives you an end-to-end view of an entire request, so you can analyze latencies in your APIs and their backend services
You can use an X-Ray service map to view the latency of an entire request, and latency of the downstream services integrated with X-Ray. And you can configure sampling rules to tell X-Ray which requests to record, at what sampling rates, according to criteria that you specify.
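To make the sampling idea concrete, here is a simplified local model of a sampling rule: a per-second reservoir of guaranteed samples plus a fixed rate for everything beyond it. The defaults mirror X-Ray's default rule (the first request each second, plus 5% of additional requests). The class name `SamplingRule` and its fields are hypothetical and are not part of the X-Ray SDK.

```python
import random


class SamplingRule:
    """Simplified model: a per-second reservoir plus a fixed sampling rate."""

    def __init__(self, reservoir_size=1, fixed_rate=0.05, rng=None):
        self.reservoir_size = reservoir_size  # guaranteed samples per second
        self.fixed_rate = fixed_rate          # probability for extra requests
        self.rng = rng or random.Random()
        self._current_second = None
        self._used = 0

    def should_sample(self, now_seconds):
        second = int(now_seconds)
        if second != self._current_second:
            # A new second started: refill the reservoir.
            self._current_second = second
            self._used = 0
        if self._used < self.reservoir_size:
            self._used += 1
            return True
        # Reservoir exhausted: fall back to the fixed rate.
        return self.rng.random() < self.fixed_rate


# With a fixed rate of 0, only the first request in each second is recorded.
rule = SamplingRule(reservoir_size=1, fixed_rate=0.0)
decisions = [rule.should_sample(t) for t in (0.0, 0.1, 0.2, 1.0)]
```

The reservoir guarantees that low-traffic services still produce traces, while the fixed rate keeps the cost bounded under heavy load.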
Differences:
CloudTrail is primarily used for API logging of all of your AWS resources
CloudWatch is a monitoring and management service. It does not have the capability to trace and analyze user requests as they travel through APIs
VPC flow logs enable you to capture information about the IP traffic going to and from network interfaces in your entire VPC
Although VPC flow logs can capture some details about incoming user requests, AWS X-Ray is the better choice: request tracing lets you debug and analyze microservices applications and find the root cause of issues and performance bottlenecks.
CloudWatch (Metrics & Logs)
In essence, CW is a metric repository
- Monitoring tool for your AWS resources and applications
- CW metrics are not shared across regions
- Display metrics & create alarms that watch the metrics and send notifications or automatically make changes to the resources you are monitoring, when a threshold is breached
Concepts
- Namespaces: Container for CW metrics
- Metrics: ordered time-series data
- Cannot be deleted, but auto expire after 15 months
- Each metric data point is marked with a timestamp
- Custom metrics: publish your own application metrics (detailed monitoring, by contrast, means EC2 metrics at 1-minute instead of 5-minute intervals)
- EC2 metrics: CW does not collect memory utilization and disk space usage metrics automatically. You need to install the CloudWatch Agent on your instances first to retrieve these metrics
- Dimension: Name-value pair that is part of the identity of a metric
- Statistics: metric data aggregation
CW Events
- Deliver near real-time stream of system events that describe changes in AWS resources
- Events: change in the AWS environment
- Targets: process events
- Rules: Match incoming events and route them to targets for processing
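The events, rules and targets relationship can be sketched as a small matcher. This is a deliberately simplified model of event patterns (exact-value lists and nested objects only); the real service supports richer matching. The target names `notify-ops` and `audit-s3` are hypothetical.

```python
def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event.

    Simplified: a pattern value is either a list of allowed values
    or a nested dict that is matched recursively.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event, rules):
    """Return the targets of all rules whose pattern matches the event."""
    return [target for pattern, target in rules if matches(pattern, event)]


rules = [
    ({"source": ["aws.ec2"], "detail": {"state": ["stopped", "terminated"]}}, "notify-ops"),
    ({"source": ["aws.s3"]}, "audit-s3"),
]

event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

targets = route(event, rules)
```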
CW Logs
- Monitor logs from EC2 instances in real-time
- Monitor events logged by CloudTrail
- By default, logs are kept indefinitely and never expire
- CloudWatch Logs Insights: interactively search and analyze your log data in CloudWatch Logs using queries
CW Agent
- Collect more logs and system-level metrics from EC2 instances and your on-premises servers
- Needs to be installed first
Security
- IAM users / roles
- Dashboard permissions, IAM identity-based policies, service-linked roles
X-Ray (Performance Monitoring)
- X-Ray analyzes and debugs apps, such as those built using a microservices architecture. With X-Ray, you can identify performance bottlenecks, edge case errors, and other hard to detect issues
- X-Ray daemon buffers segments in a queue, and uploads them to X-Ray in batches
- Listens for UDP traffic (port 2000)
- Gathers raw segment data
- Relays to X-Ray API
X-Ray SDK does not send data directly to X-Ray!
To avoid calling the service every time your application serves a request, the SDK sends the trace data to a daemon, which collects segments for multiple requests and uploads them in batches.
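As a rough illustration of what the SDK does on your behalf, the sketch below builds a minimal segment document and encodes it the way the daemon expects over UDP: a small JSON header line followed by the segment JSON. The service name `checkout-service` and the helper names are assumptions for the example. In practice you would use the X-Ray SDK rather than hand-rolling this.

```python
import json
import os
import socket
import time

DAEMON_ADDRESS = ("127.0.0.1", 2000)  # the daemon's default UDP listener


def new_trace_id(now=None):
    """Trace ID format: version, 8 hex digits of epoch time, 24 random hex digits."""
    epoch = int(time.time() if now is None else now)
    return "1-%08x-%s" % (epoch, os.urandom(12).hex())


def build_segment(name, start_time, end_time, trace_id=None):
    """Build a minimal segment document with only the required fields."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),  # 16 hex digits
        "trace_id": trace_id or new_trace_id(start_time),
        "start_time": start_time,
        "end_time": end_time,
    }


def encode_for_daemon(segment):
    """Prefix the segment JSON with the daemon's header line."""
    header = json.dumps({"format": "json", "version": 1})
    return (header + "\n" + json.dumps(segment)).encode("utf-8")


def send_to_daemon(segment, address=DAEMON_ADDRESS):
    """Fire-and-forget UDP send; the daemon batches and uploads segments."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_for_daemon(segment), address)


segment = build_segment("checkout-service", 1700000000.0, 1700000000.25)
payload = encode_for_daemon(segment)
# send_to_daemon(segment)  # only useful when a daemon is listening locally
```

Because UDP is connectionless, the application never blocks waiting for X-Ray; the daemon absorbs the cost of talking to the service.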
To properly instrument your application hosted in an EC2 instance, you have to install the X-Ray daemon by using a user data script. This will install and run the daemon automatically when you launch the instance.
To use the daemon on Amazon EC2, create a new instance profile role or add the AWSXRayDaemonWriteAccess managed policy to an existing one. This will grant the daemon permission to upload trace data to X-Ray.
Amazon Inspector: Automated security assessment service that helps improve the security and compliance of applications deployed on AWS
Concepts
Segment: Provides the name of the compute resources running your application logic, details about the request sent by your application, and details about the work done
X-Ray uses the data that your application sends to generate a service graph (JSON document). Each AWS resource that sends data to X-Ray appears as a service in the graph
Trace collects all the segments generated by a single request
The request is typically an HTTP GET or POST request that travels through a load balancer, hits your application code, and generates downstream calls to other AWS services or external web APIs
Use filter expression for advanced tracing
Groups are a collection of traces that are defined by a filter expression (identified by name or ARN)
🧡 Annotations are simple key-value pairs that are indexed for use with filter expressions. Use annotations to record data that you want to use to group traces
- A segment can contain multiple annotations
- System-defined annotations include data added to the segment by AWS services, whereas user-defined annotations are metadata added to a segment by a developer
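Here is a small sketch of how indexed annotations make grouping possible. The segments are hand-written dictionaries using the `annotations` field of a segment document; the key `customer_tier` and its values are hypothetical. In the console, a comparable selection would be written as a filter expression along the lines of `annotation.customer_tier = "premium"`.

```python
def group_traces_by_annotation(segments, key):
    """Map each annotation value to the set of trace IDs that carry it."""
    groups = {}
    for segment in segments:
        value = segment.get("annotations", {}).get(key)
        if value is not None:
            groups.setdefault(value, set()).add(segment["trace_id"])
    return groups


segments = [
    {"trace_id": "t-1", "name": "api", "annotations": {"customer_tier": "premium"}},
    {"trace_id": "t-2", "name": "api", "annotations": {"customer_tier": "free"}},
    {"trace_id": "t-3", "name": "api", "annotations": {"customer_tier": "premium"}},
    {"trace_id": "t-4", "name": "api"},  # no annotations: cannot be grouped by key
]

groups = group_traces_by_annotation(segments, "customer_tier")
```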
Features
- X-Ray can be used with Lambda, EC2, ECS, Beanstalk (integrate the X-Ray SDK in the application, and run the X-Ray daemon)
- Provide end-to-end, cross-service, application-centric view of requests flowing through your application, by aggregating the data gathered from individual services of the application into a single unit called trace
- X-Ray SDK captures metadata for requests made to RDS & DynamoDB, and SQS & SNS
- Set trace sampling rate: X-Ray continually traces requests made to the application, and stores a sampling of the requests for analysis
- X-Ray creates a map of services used by your application with trace data
Flow
- X-Ray receives data from services as segments
- X-Ray then groups segments that have a common request into traces
- X-Ray processes the traces to generate a service graph that provides a visual representation of your application.
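The three steps above can be sketched as a toy pipeline. It is a simplification under stated assumptions: each segment here points directly at its parent segment via `parent_id`, and the service names are invented. The real service derives the graph from richer segment and subsegment data.

```python
from collections import defaultdict


def group_into_traces(segments):
    """Step 2: collect all segments that share a trace ID."""
    traces = defaultdict(list)
    for segment in segments:
        traces[segment["trace_id"]].append(segment)
    return dict(traces)


def build_service_graph(traces):
    """Step 3: derive caller -> callee edges from parent/child links."""
    edges = set()
    for trace_segments in traces.values():
        by_id = {s["id"]: s for s in trace_segments}
        for s in trace_segments:
            parent = by_id.get(s.get("parent_id"))
            if parent is not None:
                edges.add((parent["name"], s["name"]))
    return sorted(edges)


# Step 1: segments as they might arrive from individual services.
segments = [
    {"trace_id": "t-1", "id": "a", "name": "api-gateway"},
    {"trace_id": "t-1", "id": "b", "parent_id": "a", "name": "orders-service"},
    {"trace_id": "t-1", "id": "c", "parent_id": "b", "name": "DynamoDB"},
    {"trace_id": "t-2", "id": "d", "name": "api-gateway"},
    {"trace_id": "t-2", "id": "e", "parent_id": "d", "name": "orders-service"},
]

traces = group_into_traces(segments)
graph = build_service_graph(traces)
```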
Types of X-Ray integration
- Active instrumentation: Samples and instruments incoming requests
- Passive instrumentation: Instrument requests that have been sampled by another service
- Request tracing: Adds a tracing header to all incoming requests and propagates it downstream
- Tooling: Runs the X-Ray daemon to receive segments from the X-Ray SDK
X-Ray integration with AWS services
Lambda
- Active and passive instrumentation of incoming requests on all runtimes
- Lambda adds two nodes to your service map, one for the AWS Lambda service, and one for the function
API Gateway
- Active and passive instrumentation.
- API Gateway uses sampling rules to determine which requests to record, and adds a node for the gateway stage to your service map
ELB
- Request tracing on ALBs
- ALB adds the trace ID to the request header before sending it to a target group
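For illustration, the tracing header that gets propagated is `X-Amzn-Trace-Id`, a semicolon-separated list of `key=value` fields. The sketch below parses such a header and sets a new parent before forwarding it downstream. The helper names are hypothetical, and the header value is a sample in the documented format, not real data.

```python
def parse_trace_header(value):
    """Split 'Root=...;Parent=...;Sampled=1' into a dict of fields."""
    fields = {}
    for part in value.split(";"):
        if "=" in part:
            key, field_value = part.split("=", 1)
            fields[key.strip()] = field_value.strip()
    return fields


def format_trace_header(fields):
    """Serialize the fields back into header form, keeping their order."""
    return ";".join(f"{key}={value}" for key, value in fields.items())


def with_parent(value, parent_id):
    """Return a header for a downstream call that names this hop as the parent."""
    fields = parse_trace_header(value)
    fields["Parent"] = parent_id
    return format_trace_header(fields)


incoming = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
fields = parse_trace_header(incoming)
outgoing = with_parent(incoming, "0123456789abcdef")
```

Keeping `Root` unchanged across hops is what lets X-Ray stitch segments from different services into a single trace.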
Beanstalk
- Tooling
EC2
- Use a user data script to install the X-Ray daemon
CloudTrail (Log Management)
CT: logs (CloudTrail can deliver its events to CloudWatch Logs)
CW: metrics
View events in Event History (actions taken by user / role / services)
CT Trails:
- One region
- all regions (default)
- Organization trail
- By default, CloudTrail event log files are encrypted using S3 server-side encryption. You can also encrypt log files with KMS
- Use SNS to get notified when log files are delivered; enable log file validation to detect tampering
- CT publishes log files about every 5 minutes
Events
Management events
- Logged (default)
- Provide insight into control plane (management) operations
Data events
- Not logged (default)
- Data plane ops
- High-volume activities
Monitoring
CW Logs to monitor log data
CT does not capture error logs inside an EC2 instance; you need CW Logs for this.
CT events that are sent to CW Logs can trigger alarms according to the metric filters you define
CT log file integrity validation: Determine whether a log file was modified / deleted after CT delivers it
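The core idea behind integrity validation can be sketched with plain hashing: record a SHA-256 hash for each delivered log file, then later recompute and compare. This is only a conceptual model with invented file names. The real feature uses signed digest files, and in practice you would run the `aws cloudtrail validate-logs` CLI command rather than write this yourself.

```python
import hashlib


def sha256_hex(data):
    """Hex-encoded SHA-256 hash of a log file's bytes."""
    return hashlib.sha256(data).hexdigest()


def check_integrity(recorded_hashes, current_files):
    """Compare current file contents against the hashes recorded at delivery.

    Returns (modified, deleted) as sorted lists of file names.
    """
    modified, deleted = [], []
    for name, recorded in recorded_hashes.items():
        if name not in current_files:
            deleted.append(name)
        elif sha256_hex(current_files[name]) != recorded:
            modified.append(name)
    return sorted(modified), sorted(deleted)


# Hashes recorded when the (hypothetical) log files were delivered.
delivered = {
    "log-0001.json.gz": b"original contents 1",
    "log-0002.json.gz": b"original contents 2",
    "log-0003.json.gz": b"original contents 3",
}
recorded_hashes = {name: sha256_hex(data) for name, data in delivered.items()}

# Later: one file was tampered with and one was removed.
current_files = {
    "log-0001.json.gz": b"original contents 1",
    "log-0002.json.gz": b"tampered contents",
}

modified, deleted = check_integrity(recorded_hashes, current_files)
```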