Merikanto

一簫一劍平生意,負盡狂名十五年

AWS - 07 Kinesis, SQS & SNS

In the seventh post of the AWS series, we’re going to talk about 3 message-delivering related services:

  • Kinesis
  • SQS (Simple Queue Service)
  • SNS (Simple Notification Service)


Kinesis

Easily collect, process, and analyze real-time, streaming data


Kinesis Data Stream

  • A massively scalable, highly durable data ingestion and processing service optimized for streaming data. You can configure hundreds of thousands of data producers to continuously put data into a Kinesis data stream
  • Data producer
    • An application that typically emits data records as they are generated to a Kinesis data stream
    • Data producers assign partition keys to records
    • Partition keys ultimately determine which shard ingests the data record for a data stream
  • Data consumer
    • A distributed Kinesis application or AWS service retrieving data from all shards in a stream as it is generated. Most data consumers are retrieving the most recent data in a shard, enabling real-time analytics or handling of data
  • Data stream
    • logical grouping of shards

Shard

  • Base throughput unit of a Kinesis data stream

  • Append-only log and a unit of streaming capability. A shard contains an ordered sequence of records ordered by arrival time

  • Add or remove shards from your stream dynamically as your data throughput changes

    Increase capacity: Resharding, e.g. UpdateShardCount, or split every shard in the stream

  • Enhanced fan-out: for each registered data consumer


Kinesis agent

  • Pre-built Java application that offers an easy way to collect and send data to Kinesis data stream

Monitoring

  • Monitor shard-level metrics in Kinesis Data Streams (CW, Kinesis Agent)
  • CT: log API calls

Security

  • Kinesis Data Streams can automatically encrypt sensitive data as a producer enters it into a stream (KMS)
  • IAM: managing access control
  • Interface VPC endpoint to keep traffic between your VPC and Kinesis Data Streams from leaving the Amazon network

Kinesis Data Firehose

  • The easiest way to load streaming data into data stores and analytics tools
  • It is a fully managed service that automatically scales to match the throughput of your data
  • Batch, compress & encrypt data before loading

Features

  • Load streaming data into S3, Redshift, ES, Splunk; Enable real-time analytics
  • Firehose can convert the format of incoming data from JSON to ORC formats before storing the data in S3
  • Configure Firehose to prepare your streaming data before it is loaded to data stores
  • Pre-built Lambda blueprints for converting common data sources (system logs) to JSON / CSV

Buffer size & buffer interval

Firehose buffers incoming streaming data to a certain size or for a certain period of time, before delivering it to destinations. Buffer Size is in MBs and Buffer Interval is in seconds


Firehose Stream sources

  • Kinesis data stream
  • Kinesis Agent
  • Firehose API using AWS SDK
  • CW logs, CW events, IoT

Data delivery frequency

  • S3 / ES: S3 / ES buffer size & buffer interval
  • Redshift: how fast Redshift finish COPY command
  • Splunk: Firehose buffers incoming data before delivering it to Splunk. The buffer size is 5 MB, and the buffer interval is 60 seconds

Monitoring

  • CW monitor Firehose metrics
  • Kinesis Agent publishes custom CW metrics, and helps assess whether the agent is healthy, submitting data into Kinesis Data Firehose as specified, and consuming the appropriate amount of CPU and memory resources on the data producer
  • CT: log API calls

Security

  • Auto encrypt data after it is uploaded to the destination
  • Resource access with IAM

Kinesis Data Analytics

Analyze streaming data, gain actionable insights, and respond to your business and customer needs in real time.

接收 Firehose / Data Stream 传过来的 data,并进行处理


Features

  • Serverless, takes care of everything required to continuously run your application
  • Scales applications to keep up with any volume of data in the incoming data stream
  • Delivers sub-second processing latencies, so you can generate real-time alerts, dashboards, and actionable insights
  • Supports standard SQL for query

Running application storage is used for saving application state using checkpoints

It is also accessible to your application code to use as temporary disk for caching data or any other purpose



SQS & SNS


SNS (Push)

  • Makes it easy to set up, operate, and send notifications from the cloud
  • Publish-subscribe (pub-sub) messaging paradigm, event-driven
  • Notifications delivered to clients using push (clients are passive receiver), rather than periodically check or poll (clients need action)

Event-driven computing

  • A model, in which subscriber services automatically perform work in response to events triggered by publisher services

  • Automate workflows while decoupling the services that collectively and independently work to fulfill these workflows

  • AWS event sources: EC2, S3, RDS

  • AWS event destinations: Lambda, SQS


Features

  • Message filtering: create filter policy
  • Message fanout occurs when a message is sent to a topic and then replicated and pushed to multiple endpoints. Fanout provides asynchronous event notifications, which in turn allows for parallel processing
  • Durable storage for all received messages, customized TTL values
  • Send alerts
    • Application & system alerts
    • SNS mobile notification (Fanout, async, SMS & SMTP)
    • Push email & texts

Publisher & Subscriber

  • Publishers communicate asynchronously with subscribers by producing and sending a message to a topic, which is a logical access point and communication channel
  • Subscribers consume or receive the message or notification over one of the supported protocols when they are subscribed to the topic
  • Publishers create topics to send messages, while subscribers subscribe to topics to receive messages
  • SNS supports SQS standard queues, but does not support forwarding messages to SQS FIFO queues

Topics

  • Instead of including a specific destination address in each message, a publisher sends a message to a topic
  • Each topic has a unique name that identifies the SNS endpoint for publishers to post messages and subscribers to register for notifications
  • A topic can support subscriptions and notification deliveries over multiple transports
  • SNS delivers messages to the subscriber in the order they were published into the topic

SNS also logs the the delivery status of notification messages sent to topics with the following SNS endpoints:

  • Lambda
  • SQS
  • HTTP
  • Application

Monitor & Security

  • CW to monitor SNS topics
  • CT for logging SNS API calls
  • X-Ray for messages passing through SNS, easier to trace and analyze messages as they travel through to the downstream services
  • SNS provides server-side encryption to encrypt topics
  • SNS supports VPC Endpoints via AWS PrivateLink. You can use VPC Endpoints to privately publish messages to SNS topics, from a VPC, without traversing the public internet

SQS (Poll)

  • A hosted queue that lets you integrate and decouple distributed software systems and components
  • Standard & FIFO queues
  • Poll-based. SNS use push-based
  • Access SQS using VPC endpoints via AWS PrivateLink, without using public IPs, and without needing to traverse the public internet

Benefits

  • Server-side encryption
  • Redundant infrastructure for high-concurrent access & HA for producing & consuming messages
  • SQS locks your messages during processing, so that multiple producers can send and multiple consumers can receive messages at the same time

Standard queue

  • All regions
  • Unlimited throughput
  • At least once delivery
  • Best effort ordering

FIFO queue

  • Some regions
  • High throughput
  • Exactly once processing, no duplication
  • Preserve the order which the messages are received
  • Support message groups (allow multiple ordered message groups within a single queue)

Polling

  • Short polling (default): returns immediately, even if the message queue being polled is empty
  • Long polling:
    • Reduce the cost by eliminating number of empty responses and false empty responses
    • Does not return a response until a message arrives in the message queue, or the long poll times out

Visibility timeout

  • Prevent other consumers from processing a message redundantly
  • Set a period of time SQS prevents other consumers from receiving and processing the message

Dead Letter Queue (DLQ)

  • Other queues can target DLQ for messages that can’t be processed successfully

Delay Queue

  • Postpone the delivery of new messages to a queue for a number of seconds

Best Practice

  • Extend the message’s visibility timeout to the maximum time it takes to process and delete the message

  • Using the appropriate polling mode (short / long)

  • Configure DLQ to capture problematic messages

    To avoid inconsistent message processing by standard queues, avoid setting max receives number to 1 when configure DLQ

  • Reduce cost by batching message actions


Monitoring

  • CW for monitor, CT for logging API calls
  • Automate notifications from AWS Services to SQS using CW Events