In the seventh post of the AWS series, we’re going to talk about 3 message-delivering related services:
- Kinesis
- SQS (Simple Queue Service)
- SNS (Simple Notification Service)
Kinesis
Easily collect, process, and analyze real-time, streaming data
Kinesis Data Stream
- A massively scalable, highly durable data ingestion and processing service optimized for streaming data. You can configure hundreds of thousands of data producers to continuously put data into a Kinesis data stream
- Data producer
- An application that typically emits data records as they are generated to a Kinesis data stream
- Data producers assign partition keys to records
- Partition keys ultimately determine which shard ingests the data record for a data stream
- Data consumer
- A distributed Kinesis application or AWS service retrieving data from all shards in a stream as it is generated. Most data consumers are retrieving the most recent data in a shard, enabling real-time analytics or handling of data
- Data stream
- logical grouping of shards
Shard
Base throughput unit of a Kinesis data stream
Append-only log and a unit of streaming capability. A shard contains an ordered sequence of records ordered by arrival time
Add or remove shards from your stream dynamically as your data throughput changes
Increase capacity: Resharding, e.g.
UpdateShardCount
, or split every shard in the streamEnhanced fan-out: for each registered data consumer
Kinesis agent
- Pre-built Java application that offers an easy way to collect and send data to Kinesis data stream
Monitoring
- Monitor shard-level metrics in Kinesis Data Streams (CW, Kinesis Agent)
- CT: log API calls
Security
- Kinesis Data Streams can automatically encrypt sensitive data as a producer enters it into a stream (KMS)
- IAM: managing access control
- Interface VPC endpoint to keep traffic between your VPC and Kinesis Data Streams from leaving the Amazon network
Kinesis Data Firehose
- The easiest way to load streaming data into data stores and analytics tools
- It is a fully managed service that automatically scales to match the throughput of your data
- Batch, compress & encrypt data before loading
Features
- Load streaming data into S3, Redshift, ES, Splunk; Enable real-time analytics
- Firehose can convert the format of incoming data from JSON to ORC formats before storing the data in S3
- Configure Firehose to prepare your streaming data before it is loaded to data stores
- Pre-built Lambda blueprints for converting common data sources (system logs) to JSON / CSV
Buffer size & buffer interval
Firehose buffers incoming streaming data to a certain size or for a certain period of time, before delivering it to destinations. Buffer Size is in MBs and Buffer Interval is in seconds
Firehose Stream sources
- Kinesis data stream
- Kinesis Agent
- Firehose API using AWS SDK
- CW logs, CW events, IoT
Data delivery frequency
- S3 / ES: S3 / ES buffer size & buffer interval
- Redshift: how fast Redshift finish
COPY
command - Splunk: Firehose buffers incoming data before delivering it to Splunk. The buffer size is 5 MB, and the buffer interval is 60 seconds
Monitoring
- CW monitor Firehose metrics
- Kinesis Agent publishes custom CW metrics, and helps assess whether the agent is healthy, submitting data into Kinesis Data Firehose as specified, and consuming the appropriate amount of CPU and memory resources on the data producer
- CT: log API calls
Security
- Auto encrypt data after it is uploaded to the destination
- Resource access with IAM
Kinesis Data Analytics
Analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time.
接收 Firehose / Data Stream 传过来的 data,并进行处理
Features
- Serverless, takes care of everything required to continuously run your application
- Scales applications to keep up with any volume of data in the incoming data stream
- Delivers sub-second processing latencies, so you can generate real-time alerts, dashboards, and actionable insights
- Supports standard SQL for query
Running application storage is used for saving application state using checkpoints
It is also accessible to your application code to use as temporary disk for caching data or any other purpose
SQS & SNS
SNS (Push)
- Makes it easy to set up, operate, and send notifications from the cloud
- Publish-subscribe (pub-sub) messaging paradigm, event-driven
- Notifications delivered to clients using push (clients are passive receiver), rather than periodically check or poll (clients need action)
Event-driven computing
A model, in which subscriber services automatically perform work in response to events triggered by publisher services
Automate workflows while decoupling the services that collectively and independently work to fulfill these workflows
AWS event sources: EC2, S3, RDS
AWS event destinations: Lambda, SQS
Features
- Message filtering: create filter policy
- Message fanout occurs when a message is sent to a topic and then replicated and pushed to multiple endpoints. Fanout provides asynchronous event notifications, which in turn allows for parallel processing
- Durable storage for all received messages, customized TTL values
- Send alerts
- Application & system alerts
- SNS mobile notification (Fanout, async, SMS & SMTP)
- Push email & texts
Publisher & Subscriber
- Publishers communicate asynchronously with subscribers by producing and sending a message to a topic, which is a logical access point and communication channel
- Subscribers consume or receive the message or notification over one of the supported protocols when they are subscribed to the topic
- Publishers create topics to send messages, while subscribers subscribe to topics to receive messages
- SNS supports SQS standard queues, but does not support forwarding messages to SQS FIFO queues
Topics
- Instead of including a specific destination address in each message, a publisher sends a message to a topic
- Each topic has a unique name that identifies the SNS endpoint for publishers to post messages and subscribers to register for notifications
- A topic can support subscriptions and notification deliveries over multiple transports
- SNS delivers messages to the subscriber in the order they were published into the topic
SNS also logs the the delivery status of notification messages sent to topics with the following SNS endpoints:
- Lambda
- SQS
- HTTP
- Application
Monitor & Security
- CW to monitor SNS topics
- CT for logging SNS API calls
- X-Ray for messages passing through SNS, easier to trace and analyze messages as they travel through to the downstream services
- SNS provides server-side encryption to encrypt topics
- SNS supports VPC Endpoints via AWS PrivateLink. You can use VPC Endpoints to privately publish messages to SNS topics, from a VPC, without traversing the public internet
SQS (Poll)
- A hosted queue that lets you integrate and decouple distributed software systems and components
- Standard & FIFO queues
- Poll-based. SNS use push-based
- Access SQS using VPC endpoints via AWS PrivateLink, without using public IPs, and without needing to traverse the public internet
Benefits
- Server-side encryption
- Redundant infrastructure for high-concurrent access & HA for producing & consuming messages
- SQS locks your messages during processing, so that multiple producers can send and multiple consumers can receive messages at the same time
Standard queue
- All regions
- Unlimited throughput
- At least once delivery
- Best effort ordering
FIFO queue
- Some regions
- High throughput
- Exactly once processing, no duplication
- Preserve the order which the messages are received
- Support message groups (allow multiple ordered message groups within a single queue)
Polling
- Short polling (default): returns immediately, even if the message queue being polled is empty
- Long polling:
- Reduce the cost by eliminating number of empty responses and false empty responses
- Does not return a response until a message arrives in the message queue, or the long poll times out
Visibility timeout
- Prevent other consumers from processing a message redundantly
- Set a period of time SQS prevents other consumers from receiving and processing the message
Dead Letter Queue (DLQ)
- Other queues can target DLQ for messages that can’t be processed successfully
Delay Queue
- Postpone the delivery of new messages to a queue for a number of seconds
Best Practice
Extend the message’s visibility timeout to the maximum time it takes to process and delete the message
Using the appropriate polling mode (short / long)
Configure DLQ to capture problematic messages
To avoid inconsistent message processing by standard queues, avoid setting max receives number to 1 when configure DLQ
Reduce cost by batching message actions
Monitoring
- CW for monitor, CT for logging API calls
- Automate notifications from AWS Services to SQS using CW Events