Merikanto

"A flute and a sword: my life's ambition; fifteen years that spent all my wild fame." (一簫一劍平生意,負盡狂名十五年)

AWS - 04 RDS & DynamoDB

In the fourth post of the AWS series, we'll cover two services:

  • Relational Database Service (RDS)
  • DynamoDB (NoSQL)


RDS

  • Basic building block of RDS: DB instance (isolated DB environment in the cloud)
  • Each DB instance runs a DB engine
  • An RDS Tag is a name-value pair
  • Multi-AZ deployment for HA

Sync data replication in RDS

RDS DB instance running as Multi-AZ deployment

  • Auto provision & maintain synchronous standby replica in a different AZ

  • Updates to your DB instance are synchronously replicated to the standby in the other AZ, keeping both in sync and protecting your latest database updates against DB instance failure
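As a sketch of what enabling this looks like (all identifiers below are hypothetical), Multi-AZ is a single flag on the create call; the dict would be passed to boto3's `rds.create_db_instance(**params)`:

```python
def multi_az_params(instance_id: str, password: str) -> dict:
    """Request parameters for a Multi-AZ MySQL instance (identifiers hypothetical).

    Pass the result to boto3: boto3.client("rds").create_db_instance(**params)
    """
    return {
        "DBInstanceIdentifier": instance_id,
        "Engine": "mysql",
        "DBInstanceClass": "db.m5.large",
        "AllocatedStorage": 100,
        "MasterUsername": "admin",
        "MasterUserPassword": password,
        # With MultiAZ=True, RDS auto-provisions and maintains a synchronous
        # standby replica in a different Availability Zone.
        "MultiAZ": True,
    }
```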


Async data replication

Use Read Replicas

  • All read replicas are active / accessible
  • No backups configured by default
  • Can be manually promoted to a standalone DB instance
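A minimal parameter sketch for creating a read replica with boto3's `rds.create_db_instance_read_replica` (identifiers hypothetical); promotion is a separate, manual call:

```python
def read_replica_params(source_id: str, replica_id: str) -> dict:
    """Parameters for rds.create_db_instance_read_replica.

    The replica is kept up to date via asynchronous replication.
    """
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }

# Promotion to a standalone instance is a separate, manual step:
#   boto3.client("rds").promote_read_replica(DBInstanceIdentifier=replica_id)
```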

DB instance

Class types

  • Standard
  • Memory Optimized
  • Burstable Performance

Security

  • Groups

    • DB
    • VPC
    • EC2
  • Assign an individual IAM user to each person who manages RDS resources

  • Use IAM groups, and rotate credentials regularly

  • Use security groups to control what IP addresses / EC2 instances can connect to your DB instance
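For the last point, a sketch of a security-group rule that lets only the app tier's security group reach a MySQL instance on port 3306 (group IDs hypothetical); the dict would feed boto3's `ec2.authorize_security_group_ingress`:

```python
def db_ingress_params(db_sg_id: str, app_sg_id: str) -> dict:
    """Parameters for ec2.authorize_security_group_ingress: allow the app
    tier's security group (not raw IPs) to reach MySQL on the DB's group."""
    return {
        "GroupId": db_sg_id,
        "IpPermissions": [{
            "IpProtocol": "tcp",
            "FromPort": 3306,   # MySQL port
            "ToPort": 3306,
            # Referencing another security group instead of a CIDR keeps the
            # rule valid as app instances come and go.
            "UserIdGroupPairs": [{"GroupId": app_sg_id}],
        }],
    }
```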


Monitoring

  • CW (CloudWatch)

  • RDS Events

  • DB logs: Audit / Error / General / Slow query log (troubleshoot queries that take a long time)

  • Enhanced Monitoring for DB instances. Metrics:

    • IOPS (I/O operations per second)

    • Latency

    • Throughput

    • Queue Depth (number of I/O requests in the queue waiting to be serviced)

      CW gathers metrics about CPU utilization from the hypervisor for a DB instance

      Enhanced Monitoring gathers its metrics from an agent on the instance

  • CloudTrail captures all API calls to RDS as events
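As an illustration of pulling one of these metrics, a parameter sketch for boto3's `cloudwatch.get_metric_statistics`, fetching average ReadLatency for a (hypothetical) DB instance over the last hour:

```python
from datetime import datetime, timedelta, timezone

def rds_latency_query(instance_id: str) -> dict:
    """Parameters for cloudwatch.get_metric_statistics: average ReadLatency
    for one DB instance over the last hour, in 5-minute buckets."""
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/RDS",
        "MetricName": "ReadLatency",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": instance_id}],
        "StartTime": now - timedelta(hours=1),
        "EndTime": now,
        "Period": 300,              # seconds per datapoint
        "Statistics": ["Average"],
    }
```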



DynamoDB

Concepts

DAX (DynamoDB Accelerator) is DynamoDB's caching solution (it caches reads).

Delivers microsecond response times for accessing eventually consistent data


1. Overview

  • Encryption at rest, on-demand back-up, and point-in-time recovery
  • Data stored in partitions, backed by SSDs, and auto replicated in multiple AZs

2. Components

  • Tables: no schema, 256 per Region

  • Items: collection of attributes

    • Primary key to uniquely identify each item

      One PK: partition key

      Composite PK: partition key + sort key

    • Secondary index to make queries faster

      Global: different partition & sort key

      Local: same partition key, different sort key

  • Attributes: fundamental data element
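The key concepts above can be sketched as a table definition (all table, attribute, and index names hypothetical) for boto3's `dynamodb.create_table(**definition)`:

```python
def orders_table_definition() -> dict:
    """Table definition for dynamodb.create_table: composite primary key
    (CustomerId, OrderId) plus a GSI that regroups items by Status."""
    return {
        "TableName": "Orders",
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "S"},
            {"AttributeName": "OrderId", "AttributeType": "S"},
            {"AttributeName": "Status", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},   # partition key
            {"AttributeName": "OrderId", "KeyType": "RANGE"},     # sort key
        ],
        # A global secondary index uses a different partition key entirely.
        "GlobalSecondaryIndexes": [{
            "IndexName": "StatusIndex",
            "KeySchema": [{"AttributeName": "Status", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }],
        "BillingMode": "PAY_PER_REQUEST",
    }
```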


3. DynamoDB Streams

  • Capture data modification events in DynamoDB tables

  • Each event is represented by a stream record, captured when an item is created, updated, or deleted

    • Stream records (24h) are organized into groups (shards). Each shard acts as a container for multiple stream records
  • DynamoDB Stream & Lambda:

    Trigger: Code that executes automatically when an event of interest appears in a stream

  • Use:

    • Data replication across regions
    • Materialized view of tables
    • Data analysis with Kinesis
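A minimal sketch of such a trigger, assuming the standard DynamoDB-stream event shape that Lambda receives; it just tallies the three data-modification event types:

```python
def handler(event, context=None):
    """Lambda trigger for a DynamoDB stream: count INSERT / MODIFY / REMOVE
    records in one invocation's batch."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record.get("eventName")
        if name in counts:
            counts[name] += 1
        # record["dynamodb"] holds the stream record itself: the item's keys,
        # plus old/new images depending on the stream view type.
    return counts

# handler({"Records": [{"eventName": "INSERT"}]})
#   -> {"INSERT": 1, "MODIFY": 0, "REMOVE": 0}
```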

4. Data type

  • Scalar
  • Document
  • Set

5. Read & Write

  • Strongly consistent read: return with most up-to-date data (no stale data)

  • When create table / index, first provision throughput capacity:

    • WCU (write capacity unit): one write per second, for an item up to 1 KB
    • RCU (read capacity unit): one strongly consistent read per second, for an item up to 4 KB (an eventually consistent read costs half)
  • Throttling: prevents your application from consuming too many capacity units. DynamoDB can throttle read or write requests that exceed the throughput settings for a table, and can also throttle read requests that exceed the provisioned throughput for an index

    When a request is throttled, the HTTP return code is 400 Bad Request (ProvisionedThroughputExceededException)

  • DynamoDB Auto Scaling is enabled by default
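The capacity-unit arithmetic can be made concrete; a small sketch of the standard sizing rules (round the item size up to the unit size, eventually consistent reads cost half):

```python
import math

def rcu_needed(item_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    """RCUs to provision: one RCU = one strongly consistent read/sec of an
    item up to 4 KB; an eventually consistent read costs half."""
    units_per_read = math.ceil(item_kb / 4)
    total = units_per_read * reads_per_sec
    return total if strongly_consistent else math.ceil(total / 2)

def wcu_needed(item_kb: float, writes_per_sec: int) -> int:
    """WCUs to provision: one WCU = one write/sec of an item up to 1 KB."""
    return math.ceil(item_kb) * writes_per_sec

# 10 strongly consistent reads/sec of 6 KB items -> ceil(6/4) * 10 = 20 RCU
# 5 writes/sec of 2.5 KB items -> ceil(2.5) * 5 = 15 WCU
```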


6. Items

  • Use UpdateItem to create an atomic counter: a numeric attribute that is incremented unconditionally, without interfering with other write requests
  • Conditional writes for create/update/delete (a conditional write succeeds only if the item attributes meet one or more expected conditions)
  • Conditional writes can be idempotent if the conditional check is on the same attribute that is being updated
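Sketches of both patterns as `update_item` / `put_item` parameters (table and attribute names hypothetical):

```python
def atomic_increment_params(table: str, key: dict) -> dict:
    """Parameters for dynamodb.update_item implementing an atomic counter:
    ADD increments unconditionally, without interfering with other writes."""
    return {
        "TableName": table,
        "Key": key,
        "UpdateExpression": "ADD ViewCount :one",
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }

def conditional_put_params(table: str, item: dict) -> dict:
    """Parameters for dynamodb.put_item that succeed only if no item with
    the same key already exists (an idempotent create)."""
    return {
        "TableName": table,
        "Item": item,
        "ConditionExpression": "attribute_not_exists(CustomerId)",
    }
```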

7. Other Properties

  • Projection expression: retrieve only the attributes you need (a string that identifies the attributes you want)

  • Condition expression: determines whether a create / update / delete is actually performed (only if the condition evaluates to true)

  • TTL: items are automatically deleted after they expire

  • Filter expression: Refine query results (only return filtered results, others are discarded)

  • Query results are paginated

  • Batch operations: Wrappers for multiple read or write requests.

    Batch operations are primarily used when you want to retrieve or submit multiple items in DynamoDB through a single API call, which reduces the number of network round trips from your application to DynamoDB
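Pagination is the part of this that callers most often get wrong; a sketch of the loop, with `run_query` standing in for `dynamodb.query` (a `ProjectionExpression` or `FilterExpression` would simply ride along in `base_params`):

```python
def paged_query(run_query, base_params: dict):
    """DynamoDB pagination sketch: keep issuing the query, feeding each
    page's LastEvaluatedKey back as ExclusiveStartKey, until no key is
    returned. `run_query` stands in for dynamodb.query."""
    params = dict(base_params)
    while True:
        page = run_query(**params)
        yield from page.get("Items", [])
        last = page.get("LastEvaluatedKey")
        if not last:
            break  # no more pages
        params["ExclusiveStartKey"] = last
```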


8. Scans

  • Reads every item in a table or a secondary index (return all results by default)
  • By default, a Scan operation performs eventually consistent reads and processes data sequentially
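Both defaults can be overridden; a parameter sketch for one worker of a parallel Scan (`Segment` / `TotalSegments` split the table across workers, and `ConsistentRead=True` forces strong consistency):

```python
def parallel_scan_params(table: str, segment: int, total: int) -> dict:
    """Parameters for one worker of a parallel dynamodb.scan: this worker
    reads only its own segment of the table, with strong consistency."""
    return {
        "TableName": table,
        "Segment": segment,        # which slice this worker scans
        "TotalSegments": total,    # how many workers in total
        "ConsistentRead": True,    # override the eventually consistent default
    }
```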

9. On-demand backup & restore

  • Use IAM to restrict DynamoDB backup and restore actions for some resources
  • All backup and restore actions are captured & recorded in CloudTrail
  • Restore backups to a new table

10. Transactions

  • Simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables

  • Transactions provide atomicity, consistency, isolation, and durability (ACID) in DynamoDB, help to maintain data correctness

  • You can group multiple Put, Update, Delete, and ConditionCheck actions. You can then submit the actions as a single TransactWriteItems operation that either succeeds or fails as a unit

  • You can group and submit multiple Get actions as a single TransactGetItems operation
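A classic sketch of an all-or-nothing unit (account schema hypothetical): debit one item and credit another via `transact_write_items`, with a condition check that fails the whole transaction if funds are insufficient:

```python
def transfer_transaction(table: str, from_id: str, to_id: str, amount: int) -> dict:
    """Parameters for dynamodb.transact_write_items: both updates succeed or
    neither does."""
    amt = {":amt": {"N": str(amount)}}
    return {
        "TransactItems": [
            {"Update": {
                "TableName": table,
                "Key": {"AccountId": {"S": from_id}},
                "UpdateExpression": "SET Balance = Balance - :amt",
                # If this check fails, the entire transaction is rolled back.
                "ConditionExpression": "Balance >= :amt",
                "ExpressionAttributeValues": amt,
            }},
            {"Update": {
                "TableName": table,
                "Key": {"AccountId": {"S": to_id}},
                "UpdateExpression": "SET Balance = Balance + :amt",
                "ExpressionAttributeValues": amt,
            }},
        ]
    }
```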


11. Global Tables

  • To ensure eventual consistency, DynamoDB global tables use a "last writer wins" reconciliation between concurrent updates, where DynamoDB makes a best effort to determine the last writer

12. Security

  • Encrypt data using KMS-managed keys
  • Permission policy (identity based)
    • Attach a permissions policy to a user or a group in your account
    • Attach a permissions policy to a role (grant cross-account permissions)

13. Monitoring

  • CW alarms: Watch a single metric over a time period that you specify, and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods
  • CW events: Match events and route them to one or more target functions or streams to make changes, capture state information, and take corrective action.
  • CW logs: Monitor, store, and access log files from CT (CloudTrail)
  • CT log monitoring: Share log files between accounts; monitor CT logs in real time by sending them to CW Logs

14. Best practices

  • Maintain as few tables as possible in a DynamoDB application. Most well designed applications require only one table
  • Understand access patterns:
    • Data size
    • Data shape
    • Data velocity
  • DynamoDB applies adaptive capacity in real time in response to changing application traffic patterns (maintain performance)

15. Pricing

  • Charge for: DAX, RCU, WCU, Reserved capacity, etc.

16. Partition key

The partition key of a table’s primary key determines the logical partitions in which a table’s data is stored. This in turn affects the underlying physical partitions. Provisioned I/O capacity for the table is divided evenly among these physical partitions. Therefore a partition key design that doesn’t distribute I/O requests evenly can create “hot” partitions that result in throttling and use your provisioned I/O capacity inefficiently.

The optimal usage of a table’s provisioned throughput depends not only on the workload patterns of individual items, but also on the partition key design. One example for this is the use of partition keys with high-cardinality attributes, which have a large number of distinct values for each item.


Note that the more distinct partition key values the workload accesses, the more those requests are spread across the partitioned space.
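One common fix for a hot partition key is write sharding: appending a suffix derived from another attribute, so one hot logical key spreads over several physical partitions. A sketch (the shard count is a tunable assumption):

```python
import hashlib

def sharded_partition_key(base_key: str, item_id: str, shards: int = 10) -> str:
    """Write-sharding sketch: derive a deterministic suffix from another
    attribute (here, an item id) so writes to one hot logical key spread
    across `shards` partition key values."""
    suffix = int(hashlib.sha256(item_id.encode()).hexdigest(), 16) % shards
    return f"{base_key}#{suffix}"
```

Reads must then fan out over all suffixes and merge the results, so this trades read-side complexity for even write distribution.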


History

Relational databases were a great way to reduce storage cost (relational data reduces redundancy) in the 70s and 80s, when storage devices were very expensive.

But a relational DB increases CPU cost, because of the complex queries (joins) it executes to present a denormalized view of data that your application consumes.

Today the most expensive resource in the data center is the CPU, not storage. So why would we use a technology (the relational DB) that optimizes the least expensive resource in the data center?!


So here comes NoSQL (Denormalized data model)

How to model data correctly in NoSQL

OLTP: Online Transaction Processing. (repeatable, consistent, simple)

OLAP: Online Analytical Processing


Overview

Wide-column key-value store (supports the document attribute type)


Table: catalog (contains many items)

  • Mandatory partition key (uniquely identify) - Think of partition as folder / bucket
    • Distributes items across the key space
    • Choose a partition key with a large number of distinct values (to fully distribute items)
    • Space: make sure access is evenly spread over the key space
  • Optional sort key (orders the item within that folder)

LSI & GSI (support secondary access patterns)

  • Local: resort the data in the partitions (must use the same partition key, so only resorting)
  • Global: regroup the data (regroup the data by other attributes in the entire table)

Elasticity with AS (Auto Scaling)


NoSQL Data Modeling (Access Patterns)

Select partition key:

  • Large number of distinct values
  • e.g. Customer ID

Select sort key:

  • Model 1:N & N:N relations
  • e.g. Orders & Order Items

With NoSQL:

  • Need to first understand every access pattern, what the application is doing
  • Model based on access patterns
  • Nature of application: OLAP / OLTP / DSS
  • NoSQL is efficient, but not flexible: Data modeling is tightly coupled with the access pattern of a specific application

DynamoDB Stream + Lambda

  • Stream is the change log for the DynamoDB table
  • Once data is in the stream, can invoke a lambda function
  • Lambda, 2 IAM roles:
    • Invocation role: define what it can see / read from the stream
    • Execution role: define what it can do

Composite Keys

  • Most people use NoSQL as a plain key-value store, but that's not the most efficient way to use a NoSQL DB
  • Because we often want to store hierarchical data in the table
  • Sort conditions are applied before the read; filter conditions are applied after the read
  • Create composite sort keys for faster queries on a small number of items
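A sketch of querying such hierarchical data (table and attribute names hypothetical): with a composite sort key like `COUNTRY#STATE#CITY`, `begins_with` narrows the hierarchy before the read:

```python
def hierarchical_query_params(table: str, customer_id: str, prefix: str) -> dict:
    """Parameters for dynamodb.query against a composite sort key: the
    begins_with key condition filters the hierarchy before the read, so
    only the matching items are consumed from the table."""
    return {
        "TableName": table,
        "KeyConditionExpression":
            "CustomerId = :pk AND begins_with(SortKey, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": customer_id},
            ":prefix": {"S": prefix},  # e.g. "US#CA" matches every CA city
        },
    }
```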