Merikanto

"A flute and a sword: my life's ambition; fifteen years that spent all my wild fame." (一簫一劍平生意,負盡狂名十五年)

AWS - 04 RDS & DynamoDB

In the fourth post of the AWS series, we'll cover two services:

  • Relational Database Service (RDS)
  • DynamoDB (NoSQL)


RDS

  • Basic building block of RDS: DB instance (isolated DB environment in the cloud)
  • Each DB instance runs a DB engine
  • An RDS Tag is a name-value pair
  • Multi-AZ deployment for HA

Sync data replication in RDS

RDS DB instance running as Multi-AZ deployment

  • Auto provision & maintain synchronous standby replica in a different AZ

  • Updates to your DB instance are synchronously replicated to the standby in the other AZ, keeping both in sync and protecting your latest database updates against DB instance failure
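As a sketch of what enabling this looks like (all identifiers below are hypothetical), Multi-AZ is a single flag on the create call; the dict would be passed to boto3's `rds.create_db_instance(**params)`:

```python
def multi_az_params(instance_id: str, password: str) -> dict:
    """Request parameters for a Multi-AZ MySQL instance (identifiers hypothetical).

    Pass the result to boto3: boto3.client("rds").create_db_instance(**params)
    """
    return {
        "DBInstanceIdentifier": instance_id,
        "Engine": "mysql",
        "DBInstanceClass": "db.m5.large",
        "AllocatedStorage": 100,
        "MasterUsername": "admin",
        "MasterUserPassword": password,
        # With MultiAZ=True, RDS auto-provisions and maintains a synchronous
        # standby replica in a different Availability Zone.
        "MultiAZ": True,
    }
```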


Async data replication

Use Read Replicas

  • All read replicas are active / accessible
  • No backups configured by default
  • Can be manually promoted to a standalone DB instance
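A minimal parameter sketch for creating a read replica with boto3's `rds.create_db_instance_read_replica` (identifiers hypothetical); promotion is a separate, manual call:

```python
def read_replica_params(source_id: str, replica_id: str) -> dict:
    """Parameters for rds.create_db_instance_read_replica.

    The replica is kept up to date via asynchronous replication.
    """
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }

# Promotion to a standalone instance is a separate, manual step:
#   boto3.client("rds").promote_read_replica(DBInstanceIdentifier=replica_id)
```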

DB instance

Class types

  • Standard
  • Memory Optimized
  • Burstable Performance

Security

  • Groups

    • DB
    • VPC
    • EC2
  • Assign an individual IAM user to each person who manages RDS resources

  • Use IAM groups, and rotate credentials regularly

  • Use security groups to control what IP addresses / EC2 instances can connect to your DB instance
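For the last point, a sketch of a security-group rule that lets only the app tier's security group reach a MySQL instance on port 3306 (group IDs hypothetical); the dict would feed boto3's `ec2.authorize_security_group_ingress`:

```python
def db_ingress_params(db_sg_id: str, app_sg_id: str) -> dict:
    """Parameters for ec2.authorize_security_group_ingress: allow the app
    tier's security group (not raw IPs) to reach MySQL on the DB's group."""
    return {
        "GroupId": db_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 3306,   # MySQL port
            "ToPort": 3306,
            # Referencing another security group instead of a CIDR keeps the
            # rule valid as app instances come and go.
            "UserIdGroupPairs": [{"GroupId": app_sg_id}],
        }],
    }
```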


Monitoring

  • CW (CloudWatch)

  • RDS Events

  • DB logs: Audit / Error / General / Slow query log (troubleshoot queries that take a long time)

  • Enhanced Monitoring for DB instances. Metrics:

    • IOPS (I/O operations per second)

    • Latency

    • Throughput

    • Queue Depth (number of I/O requests in the queue waiting to be serviced)

      CW gathers metrics about CPU utilization from the hypervisor for a DB instance

      Enhanced Monitoring gathers its metrics from an agent on the instance

  • CloudTrail captures all API calls to RDS as events
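As an illustration of pulling one of these metrics, a parameter sketch for boto3's `cloudwatch.get_metric_statistics`, fetching average ReadLatency for a (hypothetical) DB instance over the last hour:

```python
from datetime import datetime, timedelta, timezone

def rds_latency_query(instance_id: str) -> dict:
    """Parameters for cloudwatch.get_metric_statistics: average ReadLatency
    for one DB instance over the last hour, in 5-minute buckets."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "ReadLatency",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": now - timedelta(hours=1),
        "EndTime": now,
        "Period": 300,              # seconds per datapoint
        "Statistics": ["Average"],
    }
```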



DynamoDB

Concepts

DAX (DynamoDB Accelerator) is DynamoDB's caching solution (it caches reads).

Delivers microsecond response times for accessing eventually consistent data


1. Overview

  • Encryption at rest, on-demand back-up, and point-in-time recovery
  • Data stored in partitions, backed by SSDs, and auto replicated in multiple AZs

2. Components

  • Tables: no schema, 256 per Region

  • Items: collection of attributes

    • Primary key to uniquely identify each item

      One PK: partition key

      Composite PK: partition key + sort key

    • Secondary index to make queries faster

      Global: different partition & sort key

      Local: same partition key, different sort key

  • Attributes: fundamental data element
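The key concepts above can be sketched as a table definition (all table, attribute, and index names hypothetical) for boto3's `dynamodb.create_table(**definition)`:

```python
def orders_table_definition() -> dict:
    """Table definition for dynamodb.create_table: composite primary key
    (CustomerId, OrderId) plus a GSI that regroups items by Status."""
    return {
        "TableName": "Orders",
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderId", "AttributeType": "S"},
            {"AttributeName": "Status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "OrderId", "KeyType": "RANGE"},     # sort key
        ],
        # A global secondary index uses a different partition key entirely.
        "GlobalSecondaryIndexes": [{
            "IndexName": "StatusIndex",
            "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }],
        "BillingMode": "PAY_PER_REQUEST",
    }
```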


3. DynamoDB Streams

  • Capture data modification events in DynamoDB tables

  • Each event is represented by a stream record, captured when an item is created, updated, or deleted

    • Stream records (24h) are organized into groups (shards). Each shard acts as a container for multiple stream records
  • DynamoDB Stream & Lambda:

    Trigger: Code that executes automatically when an event of interest appears in a stream

  • Use:

    • Data replication across regions
    • Materialized view of tables
    • Data analysis with Kinesis
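A minimal sketch of such a trigger, assuming the standard DynamoDB-stream event shape that Lambda receives; it just tallies the three data-modification event types:

```python
def handler(event, context=None):
    """Lambda trigger for a DynamoDB stream: count INSERT / MODIFY / REMOVE
    records in one invocation's batch."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name in counts:
            counts[name] += 1
        # record["dynamodb"] holds the stream record itself: the item's keys,
        # plus old/new images depending on the stream view type.
    return counts

# handler({"Records": [{"eventName": "INSERT"}]})
#   -> {"INSERT": 1, "MODIFY": 0, "REMOVE": 0}
```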

4. Data type

  • Scalar
  • Document
  • Set

5. Read & Write

  • Strongly consistent read: return with most up-to-date data (no stale data)

  • When create table / index, first provision throughput capacity:

    • WCU (write capacity unit): one write per second, for an item up to 1 KB
    • RCU (read capacity unit): one strongly consistent read per second, for an item up to 4 KB (an eventually consistent read costs half)
  • Throttling: prevents your application from consuming too many capacity units. DynamoDB can throttle read or write requests that exceed the throughput settings for a table, and can also throttle read requests that exceed the provisioned throughput for an index

    When a request is throttled, the HTTP return code is 400 Bad Request (ProvisionedThroughputExceededException)

  • DynamoDB Auto Scaling is enabled by default
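The capacity-unit arithmetic can be made concrete; a small sketch of the standard sizing rules (round the item size up to the unit size, eventually consistent reads cost half):

```python
import math

def rcu_needed(item_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    """RCUs to provision: one RCU = one strongly consistent read/sec of an
    item up to 4 KB; an eventually consistent read costs half."""
    units_per_read = math.ceil(item_kb / 4)
    total = units_per_read * reads_per_sec
    return total if strongly_consistent else math.ceil(total / 2)

def wcu_needed(item_kb: float, writes_per_sec: int) -> int:
    """WCUs to provision: one WCU = one write/sec of an item up to 1 KB."""
    return math.ceil(item_kb) * writes_per_sec

# 10 strongly consistent reads/sec of 6 KB items -> ceil(6/4) * 10 = 20 RCU
# 5 writes/sec of 2.5 KB items -> ceil(2.5) * 5 = 15 WCU
```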


6. Items

  • Use UpdateItem to create an atomic counter: a numeric attribute that is incremented unconditionally, without interfering with other write requests
  • Conditional writes for create/update/delete (a conditional write succeeds only if the item attributes meet one or more expected conditions)
  • Conditional writes can be idempotent if the conditional check is on the same attribute that is being updated
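Sketches of both patterns as `update_item` / `put_item` parameters (table and attribute names hypothetical):

```python
def atomic_increment_params(table: str, key: dict) -> dict:
    """Parameters for dynamodb.update_item implementing an atomic counter:
    ADD increments unconditionally, without interfering with other writes."""
    return {
        "TableName": table,
        "Key": key,
        "UpdateExpression": "ADD ViewCount :one",
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }

def conditional_put_params(table: str, item: dict) -> dict:
    """Parameters for dynamodb.put_item that succeed only if no item with
    the same key already exists (an idempotent create)."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(CustomerId)",
    }
```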

7. Other Properties

  • Projection expression: retrieve only the attributes you need (a string that identifies the attributes you want)

  • Condition expression: determines whether a create / update / delete is actually performed (only if the condition evaluates to true)

  • TTL: items are automatically deleted after they expire

  • Filter expression: Refine query results (only return filtered results, others are discarded)

  • Query results are paginated

  • Batch operations: Wrappers for multiple read or write requests.

    Batch operations are primarily used when you want to retrieve or submit multiple items in DynamoDB through a single API call, which reduces the number of network round trips from your application to DynamoDB
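Pagination is the part of this that callers most often get wrong; a sketch of the loop, with `run_query` standing in for `dynamodb.query` (a `ProjectionExpression` or `FilterExpression` would simply ride along in `base_params`):

```python
def paged_query(run_query, base_params: dict):
    """DynamoDB pagination sketch: keep issuing the query, feeding each
    page's LastEvaluatedKey back as ExclusiveStartKey, until no key is
    returned. `run_query` stands in for dynamodb.query."""
    params = dict(base_params)
    while True:
        page = run_query(**params)
        yield from page.get("Items", [])
        last = page.get("LastEvaluatedKey")
        if not last:
            break  # no more pages
        params["ExclusiveStartKey"] = last
```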


8. Scans

  • Reads every item in a table or a secondary index (return all results by default)
  • By default, a Scan operation performs eventually consistent reads and processes data sequentially
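Both defaults can be overridden; a parameter sketch for one worker of a parallel Scan (`Segment` / `TotalSegments` split the table across workers, and `ConsistentRead=True` forces strong consistency):

```python
def parallel_scan_params(table: str, segment: int, total: int) -> dict:
    """Parameters for one worker of a parallel dynamodb.scan: this worker
    reads only its own segment of the table, with strong consistency."""
    return {
        "TableName": table,
        "Segment": segment,        # which slice this worker scans
        "TotalSegments": total,    # how many workers in total
        "ConsistentRead": True,    # override the eventually consistent default
    }
```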

9. On-demand backup & restore

  • Use IAM to restrict DynamoDB backup and restore actions for some resources
  • All backup and restore actions are captured & recorded in CloudTrail
  • Restore backups to a new table

10. Transactions

  • Simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables

  • Transactions provide atomicity, consistency, isolation, and durability (ACID) in DynamoDB, help to maintain data correctness

  • You can group multiple Put, Update, Delete, and ConditionCheck actions. You can then submit the actions as a single TransactWriteItems operation that either succeeds or fails as a unit

  • You can group and submit multiple Get actions as a single TransactGetItems operation
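A classic sketch of an all-or-nothing unit (account schema hypothetical): debit one item and credit another via `transact_write_items`, with a condition check that fails the whole transaction if funds are insufficient:

```python
def transfer_transaction(table: str, from_id: str, to_id: str, amount: int) -> dict:
    """Parameters for dynamodb.transact_write_items: both updates succeed or
    neither does."""
    amt = {":amt": {"N": str(amount)}}
    return {
        "TransactItems": [
            {"Update": {
                "TableName": table,
                "Key": {"AccountId": {"S": from_id}},
                "UpdateExpression": "SET Balance = Balance - :amt",
                # If this check fails, the entire transaction is rolled back.
                "ConditionExpression": "Balance >= :amt",
                "ExpressionAttributeValues": amt,
            }},
            {"Update": {
                "TableName": table,
                "Key": {"AccountId": {"S": to_id}},
                "UpdateExpression": "SET Balance = Balance + :amt",
                "ExpressionAttributeValues": amt,
            }},
        ]
    }
```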


11. Global Tables

  • To ensure eventual consistency, DynamoDB global tables use a "last writer wins" reconciliation between concurrent updates, where DynamoDB makes a best effort to determine the last writer

12. Security

  • Encrypt data using KMS-managed keys
  • Permission policy (identity based)
    • Attach a permissions policy to a user or a group in your account
    • Attach a permissions policy to a role (grant cross-account permissions)

13. Monitoring

  • CW alarms: Watch a single metric over a time period that you specify, and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods
  • CW events: Match events and route them to one or more target functions or streams to make changes, capture state information, and take corrective action.
  • CW logs: Monitor, store, and access log files from CT (CloudTrail)
  • CT log monitoring: Share log files between accounts; monitor CT logs in real time by sending them to CW Logs

14. Best practices

  • Maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table
  • Understand access patterns:
    • Data size
    • Data shape
    • Data velocity
  • DynamoDB applies adaptive capacity in real time in response to changing application traffic patterns (maintain performance)

15. Pricing

  • Charge for: DAX, RCU, WCU, Reserved capacity, etc.

16. Partition key

The partition key of a table’s primary key determines the logical partitions in which a table’s data is stored. This in turn affects the underlying physical partitions. Provisioned I/O capacity for the table is divided evenly among these physical partitions. Therefore a partition key design that doesn’t distribute I/O requests evenly can create “hot” partitions that result in throttling and use your provisioned I/O capacity inefficiently.

The optimal usage of a table’s provisioned throughput depends not only on the workload patterns of individual items, but also on the partition key design. One example for this is the use of partition keys with high-cardinality attributes, which have a large number of distinct values for each item.


Note that the more distinct partition key values the workload accesses, the more those requests are spread across the partitioned space.
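One common fix for a hot partition key is write sharding: appending a suffix derived from another attribute, so one hot logical key spreads over several physical partitions. A sketch (the shard count is a tunable assumption):

```python
import hashlib

def sharded_partition_key(base_key: str, item_id: str, shards: int = 10) -> str:
    """Write-sharding sketch: derive a deterministic suffix from another
    attribute (here, an item id) so writes to one hot logical key spread
    across `shards` partition key values."""
    suffix = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % shards
    return f"{base_key}#{suffix}"
```

Reads must then fan out over all suffixes and merge the results, so this trades read-side complexity for even write distribution.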


History

Relational databases were a great way to reduce storage cost (relational data reduces redundancy) in the 70s and 80s, when storage devices were very expensive.

But a relational DB increases CPU cost, because of the complex queries (joins) it executes to present a denormalized view of data that your application consumes.

Today the most expensive resource in the data center is the CPU, not storage. So why would we use a technology (the relational DB) that optimizes the least expensive resource in the data center?!


So here comes NoSQL (Denormalized data model)

How to model data correctly in NoSQL

OLTP: Online Transaction Processing. (repeatable, consistent, simple)

OLAP: Online Analytical Processing


Overview

Wide-column key-value store (supports the document attribute type)


Table: catalog (contains many items)

  • Mandatory partition key (uniquely identify) - Think of partition as folder / bucket
    • Distributes items across the key space
    • Choose a partition key with a large number of distinct values (to fully distribute items)
    • Space: make sure access is evenly spread over the key space
  • Optional sort key (orders the item within that folder)

LSI & GSI (support secondary access patterns)

  • Local: resort the data in the partitions (must use the same partition key, so only resorting)
  • Global: regroup the data (regroup the data by other attributes in the entire table)

Elasticity with AS (Auto Scaling)


NoSQL Data Modeling (Access Patterns)

Select partition key:

  • Large number of distinct values
  • e.g. Customer ID

Select sort key:

  • Model 1:N & N:N relations
  • e.g. Orders & Order Items

With NoSQL:

  • Need to first understand every access pattern, what the application is doing
  • Model based on access patterns
  • Nature of application: OLAP / OLTP / DSS
  • NoSQL is efficient, but not flexible: Data modeling is tightly coupled with the access pattern of a specific application

DynamoDB Stream + Lambda

  • Stream is the change log for the DynamoDB table
  • Once data is in the stream, can invoke a lambda function
  • Lambda, 2 IAM roles:
    • Invocation role: define what it can see / read from the stream
    • Execution role: define what it can do

Composite Keys

  • Most people use NoSQL as a plain key-value store, but that's not the most efficient way to use a NoSQL DB
  • Because we often want to store hierarchical data in the table
  • Sort conditions are applied before the read; filter conditions are applied after the read
  • Create composite sort keys for faster queries on a small number of items
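A sketch of querying such hierarchical data (table and attribute names hypothetical): with a composite sort key like `COUNTRY#STATE#CITY`, `begins_with` narrows the hierarchy before the read:

```python
def hierarchical_query_params(table: str, customer_id: str, prefix: str) -> dict:
    """Parameters for dynamodb.query against a composite sort key: the
    begins_with key condition filters the hierarchy before the read, so
    only the matching items are consumed from the table."""
    return {
        "TableName": table,
        "KeyConditionExpression":
            "CustomerId = :pk AND begins_with(SortKey, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": customer_id},
            ":prefix": {"S": prefix},  # e.g. "US#CA" matches every CA city
        },
    }
```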