In the fourth post of the AWS series, we're going to talk about two services:
- Relational Database Service (RDS)
- DynamoDB (NoSQL)
RDS
- Basic building block of RDS: DB instance (isolated DB environment in the cloud)
- Each DB instance runs a DB engine
- An RDS Tag is a name-value pair
- Multi-AZ deployment for HA
Sync data replication in RDS
Applies to an RDS DB instance running as a Multi-AZ deployment
RDS automatically provisions and maintains a synchronous standby replica in a different AZ
Updates to your DB Instance are synchronously replicated across AZs to the standby in order to keep both in sync and protect your latest database updates against DB instance failure
Async data replication
Use Read Replicas
- All read replicas are active and accessible for reads
- Automated backups are not configured on a replica by default
- A replica can be manually promoted to a standalone DB instance
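A minimal sketch of how the two replication modes map onto boto3 request parameters. It only builds the keyword arguments and does not call AWS. The identifiers, engine, instance class, and storage size are hypothetical placeholders, not recommendations.

```python
def multi_az_instance_params(identifier: str) -> dict:
    """Keyword arguments for rds.create_db_instance with a synchronous standby."""
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": "mysql",                 # assumed engine
        "DBInstanceClass": "db.t3.micro",  # assumed burstable class
        "AllocatedStorage": 20,            # GiB, assumed size
        "MasterUsername": "admin",         # placeholder user name
        "ManageMasterUserPassword": True,  # let RDS manage the password
        "MultiAZ": True,                   # standby replica in another AZ
    }


def read_replica_params(source_identifier: str, replica_identifier: str) -> dict:
    """Keyword arguments for rds.create_db_instance_read_replica (async copy)."""
    return {
        "DBInstanceIdentifier": replica_identifier,
        "SourceDBInstanceIdentifier": source_identifier,
    }


# With credentials configured, the calls would look like:
#   rds = boto3.client("rds")
#   rds.create_db_instance(**multi_az_instance_params("orders-db"))
#   rds.create_db_instance_read_replica(
#       **read_replica_params("orders-db", "orders-db-replica-1"))
#   rds.promote_read_replica(DBInstanceIdentifier="orders-db-replica-1")
```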
DB instance
Class types
- Standard
- Memory Optimized
- Burstable Performance
Security
Groups
- DB
- VPC
- EC2
Assign an individual IAM account to each person who manages RDS resources
Use IAM groups to grant permissions, and rotate credentials regularly
Use security groups to control what IP addresses / EC2 instances can connect to your DB instance
Monitoring
CloudWatch (CW)
RDS Events
DB logs: Audit / Error / General / Slow query log (troubleshoot queries that take a long time)
Enhanced Monitoring for DB instances. Metrics:
IOPS (I/O operations per second)
Latency
Throughput
Queue Depth (number of I/O requests in the queue waiting to be serviced)
CW gathers metrics about CPU utilization from the hypervisor for a DB instance
Enhanced Monitoring gathers its metrics from an agent on the instance
CloudTrail captures all API calls to RDS as events
DynamoDB
Concepts
DAX (DynamoDB Accelerator) is DynamoDB's in-memory caching solution (it caches reads).
Delivers microsecond response times for accessing eventually consistent data
1. Overview
- Encryption at rest, on-demand backup, and point-in-time recovery
- Data stored in partitions, backed by SSDs, and auto replicated in multiple AZs
2. Components
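Tables: no fixed schema beyond the primary key; a default per-Region table quota applies (256 when these notes were written; check the current service quotas)
Items: collection of attributes
Primary key to uniquely identify each item
Simple PK: partition key only
Composite PK: partition key + sort key
Secondary index to make queries on non-key attributes faster
Global (GSI): partition key and sort key can both differ from the table's
Local (LSI): same partition key as the table, different sort key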
Attributes: fundamental data element
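To make the key and index rules above concrete, here is a sketch of the parameters one could pass to boto3's `create_table`. It only builds the dictionary and does not call AWS. The table name, attribute names, and index names are hypothetical, and on-demand billing is assumed to keep the example short.

```python
def orders_table_params() -> dict:
    """Keyword arguments for dynamodb.create_table (hypothetical Orders table)."""
    return {
        "TableName": "Orders",
        "BillingMode": "PAY_PER_REQUEST",  # assumed: no provisioned throughput
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "customer_id", "AttributeType": "S"},
            {"AttributeName": "order_id", "AttributeType": "S"},
            {"AttributeName": "order_date", "AttributeType": "S"},
            {"AttributeName": "product_id", "AttributeType": "S"},
        ],
        # Composite primary key: partition key (HASH) + sort key (RANGE).
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_id", "KeyType": "RANGE"},
        ],
        # Local index: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [{
            "IndexName": "by_date",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # Global index: its own partition key and sort key.
        "GlobalSecondaryIndexes": [{
            "IndexName": "by_product",
            "KeySchema": [
                {"AttributeName": "product_id", "KeyType": "HASH"},
                {"AttributeName": "order_date", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
    }
```

Local secondary indexes have to be declared when the table is created, which is why they appear in the same request as the table's own key schema.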
3. DynamoDB Streams
Capture data modification events in DynamoDB tables
Each event is represented by a stream record, written whenever an item is created, updated, or deleted
- Stream records (retained for 24 hours) are organized into groups (shards). Each shard acts as a container for multiple stream records
DynamoDB Stream & Lambda:
Trigger: Code that executes automatically when an event of interest appears in a stream
Use:
- Data replication across regions
- Materialized view of tables
- Data analysis with Kinesis
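As an illustration of the replication use case, here is a minimal sketch of a Lambda handler that turns stream records into put and delete operations. It assumes the stream's view type includes new images, and it only returns a count; applying the operations to a target table is left out. The helper name `to_replication_ops` is hypothetical.

```python
def to_replication_ops(event: dict) -> list:
    """Translate DynamoDB stream records into ("put", item) / ("delete", key) pairs."""
    ops = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is present only if the stream view type includes it.
            ops.append(("put", change["NewImage"]))
        elif record["eventName"] == "REMOVE":
            ops.append(("delete", change["Keys"]))
    return ops


def handler(event, context=None):
    """Lambda entry point: a real function would write ops to the replica table."""
    ops = to_replication_ops(event)
    return {"applied": len(ops)}
```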
4. Data type
- Scalar
- Document
- Set
5. Read & Write
Strongly consistent read: return with most up-to-date data (no stale data)
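When you create a table or index in provisioned mode, you first provision throughput capacity:
- WCU (write capacity unit): one write per second for an item up to 1 KB
- RCU (read capacity unit): one strongly consistent read per second (or two eventually consistent reads) for an item up to 4 KB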
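The unit sizes above turn capacity planning into simple arithmetic. A small sketch, assuming item sizes are rounded up to the next 1 KB for writes and the next 4 KB for reads, and that an eventually consistent read costs half a strongly consistent one. The function names are illustrative only.

```python
import math


def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """WCUs needed: each write consumes 1 unit per 1 KB (rounded up)."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second


def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed: each strongly consistent read consumes 1 unit per 4 KB (rounded up)."""
    units = math.ceil(item_size_bytes / 4096) * reads_per_second
    # Eventually consistent reads cost half as much.
    return units if strongly_consistent else math.ceil(units / 2)
```

For example, a 6 KB item read 10 times per second needs 20 RCUs with strong consistency and 10 RCUs with eventual consistency, while writing it 10 times per second needs 60 WCUs.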
Throttling: prevents your application from consuming too many capacity units. DynamoDB can throttle read or write requests that exceed the throughput settings for a table, and can also throttle read requests that exceed the throughput settings for an index
When a request is throttled, DynamoDB returns HTTP 400 (Bad Request) with a ProvisionedThroughputExceededException
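Throttled requests are normally retried with exponential backoff. The AWS SDKs already do this for you, so the sketch below is only meant to show the idea. `ThrottledError` is a local stand-in for the throttling exception, and the delay values are assumptions.

```python
import random
import time


class ThrottledError(Exception):
    """Local stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=2.0, sleep=time.sleep):
    """Retry a throttled operation with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter is injectable so the retry logic can be exercised without actually waiting.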
DynamoDB Auto Scaling is enabled by default for provisioned tables created in the console
6. Items
- Use UpdateItem to implement an atomic counter: a numeric attribute that is incremented unconditionally, without interfering with other write requests
- Conditional writes for create, update, and delete (a conditional write succeeds only if the item attributes meet one or more expected conditions)
- Conditional writes can be idempotent if the conditional check is on the same attribute that is being updated
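The difference between the two write styles shows up on retries. The following is an in-memory simulation, not the DynamoDB API: a plain dict stands in for an item, and `ConditionFailed` is a local stand-in for ConditionalCheckFailedException.

```python
class ConditionFailed(Exception):
    """Local stand-in for ConditionalCheckFailedException."""


def increment(item: dict, attribute: str, by: int = 1) -> int:
    """Atomic-counter style update: always applied, so a retry counts twice."""
    item[attribute] = item.get(attribute, 0) + by
    return item[attribute]


def set_if_equals(item: dict, attribute: str, expected, new_value) -> None:
    """Conditional write: apply only if the attribute still holds `expected`."""
    if item.get(attribute) != expected:
        raise ConditionFailed(
            f"{attribute} is {item.get(attribute)!r}, expected {expected!r}")
    item[attribute] = new_value
```

Retrying `set_if_equals(item, "price", 10, 12)` after it has already succeeded fails the condition and leaves the price at 12, which is what makes the write idempotent. Retrying `increment` simply adds again, so counters are best kept for values where a small overcount is acceptable.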
7. Other Properties
Projection expression: return only some attributes of an item (a string that identifies the attributes you want)
Condition expression: determines which items should be modified by a create, update, or delete
TTL: items are automatically deleted after they expire
Filter expression: Refine query results (only return filtered results, others are discarded)
Query results are paginated
Batch operations: Wrappers for multiple read or write requests.
Batch operations are primarily used when you want to retrieve or submit multiple items in DynamoDB through a single API call, which reduces the number of network round trips from your application to DynamoDB
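Because query results are paginated, a caller has to keep requesting pages until no `LastEvaluatedKey` comes back. A minimal sketch of that loop follows. `query_page` is any callable that behaves like the client's `query` method, and `fake_query` is a hypothetical stub that serves five items two at a time so the loop can run without AWS.

```python
def query_all(query_page, **kwargs) -> list:
    """Collect items from every page of a paginated Query or Scan."""
    items = []
    while True:
        page = query_page(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no more pages
        # Resume the next request where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def fake_query(ExclusiveStartKey=None, **_):
    """Hypothetical stand-in for client.query: pages of two items each."""
    data = [{"id": i} for i in range(5)]
    start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["id"] + 1
    page = data[start:start + 2]
    response = {"Items": page}
    if start + 2 < len(data):
        response["LastEvaluatedKey"] = page[-1]
    return response
```

boto3 also ships paginators that hide this loop; the explicit version is shown only to make the mechanism visible.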
8. Scans
- Reads every item in a table or a secondary index (returns all attributes of every item by default)
- By default, a Scan operation performs eventually consistent reads and processes data sequentially
9. On-demand backup & restore
- Use IAM to restrict DynamoDB backup and restore actions for some resources
- All backup and restore actions are captured & recorded in CloudTrail
- Restore backups to a new table
10. Transactions
Simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables
Transactions provide atomicity, consistency, isolation, and durability (ACID) in DynamoDB, which helps to maintain data correctness
You can group multiple Put, Update, Delete, and ConditionCheck actions and submit them as a single TransactWriteItems operation that either succeeds or fails as a unit.
You can group and submit multiple Get actions as a single TransactGetItems operation.
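The all-or-nothing behaviour can be illustrated with a small in-memory simulation. This is not the TransactWriteItems request format: the action dictionaries, the `transact_write` function, and `TransactionCanceled` (a stand-in for TransactionCanceledException) are all hypothetical. It also assumes each item is touched at most once per transaction, which matches DynamoDB's own restriction.

```python
import copy


class TransactionCanceled(Exception):
    """Local stand-in for TransactionCanceledException."""


def transact_write(tables: dict, actions: list) -> None:
    """Apply every action or none. `tables` maps table name -> {key: item}."""
    staged = copy.deepcopy(tables)  # work on a copy so failures leave no trace
    for action in actions:
        table = staged[action["table"]]
        key = action["key"]
        condition = action.get("condition")
        if condition is not None and not condition(table.get(key)):
            raise TransactionCanceled(f"condition failed for {key!r}")
        if action["type"] == "put":
            table[key] = action["item"]
        elif action["type"] == "delete":
            table.pop(key, None)
        # A "check" action only evaluates its condition.
    # Every action passed: publish the staged state in one step.
    tables.clear()
    tables.update(staged)
```

If any condition fails, the exception is raised before the staged copy is published, so the original tables are left untouched.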
11. Global Tables
- To ensure eventual consistency, DynamoDB global tables use a "last writer wins" reconciliation between concurrent updates, where DynamoDB makes a best effort to determine the last writer
12. Security
- Encrypt data at rest using AWS KMS managed keys
- Permission policy (identity based)
- Attach a permissions policy to a user or a group in your account
- Attach a permissions policy to a role (grant cross-account permissions)
13. Monitoring
- CW alarms: Watch a single metric over a time period that you specify, and perform one or more actions based on the value of the metric relative to a given threshold over a number of time periods
- CW events: Match events and route them to one or more target functions or streams to make changes, capture state information, and take corrective action.
- CW logs: Monitor, store, and access log files from CloudTrail (CT) and other sources
- CT log monitoring: Share log files between accounts, and monitor CT logs in real time by sending them to CW Logs
14. Best practices
- Maintain as few tables as possible in a DynamoDB application. Most well-designed applications require only one table
- Understand access patterns:
- Data size
- Data shape
- Data velocity
- DynamoDB applies adaptive capacity in real time in response to changing application traffic patterns (maintain performance)
15. Pricing
- Charge for: DAX, RCU, WCU, Reserved capacity, etc.
16. Partition key
The partition key of a table’s primary key determines the logical partitions in which a table’s data is stored. This in turn affects the underlying physical partitions. Provisioned I/O capacity for the table is divided evenly among these physical partitions. Therefore a partition key design that doesn’t distribute I/O requests evenly can create “hot” partitions that result in throttling and use your provisioned I/O capacity inefficiently.
The optimal usage of a table's provisioned throughput depends not only on the workload patterns of individual items, but also on the partition key design. One example of this is using a high-cardinality attribute as the partition key, that is, an attribute with a large number of distinct values across items.
Note that the more distinct partition key values the workload accesses, the more those requests are spread across the partitioned space.
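The effect of key cardinality can be sketched with a toy hash partitioner. DynamoDB's internal hash function and partition count are not public, so MD5 and four partitions are assumptions chosen purely for illustration.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key value to a partition number (illustrative hash only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def load_per_partition(keys, partitions: int = 4) -> Counter:
    """Count how many requests land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


# High cardinality: 10,000 distinct customer IDs spread over all partitions.
spread = load_per_partition(f"customer-{i}" for i in range(10_000))

# Low cardinality: every request uses the same key value, so one partition
# absorbs all the traffic and becomes "hot".
hot = load_per_partition(["ACTIVE"] * 10_000)
```

Here `spread` ends up with entries for all four partitions, while `hot` has a single entry carrying all 10,000 requests.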
History
Relational DBs were a great way to reduce storage cost (normalized relational data reduces redundancy) in the 70s and 80s, when storage devices were very expensive.
But a relational DB increases CPU costs, because of the complex queries (joins) it executes to present a denormalized view of the data that your application consumes.
Now the most expensive resource in the data center is the CPU, not storage. So why would we want to use a technology (relational DB) that optimizes for the least expensive resource in the data center?!
So here comes NoSQL (denormalized data model)
How to model data correctly in NoSQL
OLTP: Online Transaction Processing. (repeatable, consistent, simple)
OLAP: Online Analytical Processing
Overview
Wide column key-value store (support document attribute type)
Table: catalog (contains many items)
- Mandatory partition key (uniquely identify) - Think of partition as folder / bucket
- Distributes items across the key space (i.e. across partitions)
- Choose partition key that has a large number of distinct values (to fully distribute out)
- Space: Make sure access is evenly spread over the key space
- Optional sort key (orders the item within that folder)
LSI & GSI (support secondary access patterns)
- Local: re-sort the data within the partitions (must use the same partition key, so it only re-sorts)
- Global: regroup the data (regroup the data by other attributes across the entire table)
Elasticity with Auto Scaling (AS)
NoSQL Data Modeling (Access Patterns)
Select partition key:
- Large number of distinct values
- e.g. Customer ID
Select sort key:
- Model 1:N & N:N relations
- e.g. Orders & Order Items
With NoSQL:
- Need to first understand every access pattern, what the application is doing
- Model based on access patterns
- Nature of application: OLAP / OLTP / DSS
- NoSQL is efficient, but not flexible: Data modeling is tightly coupled with the access pattern of a specific application
DynamoDB Stream + Lambda
- Stream is the change log for the DynamoDB table
- Once data is in the stream, it can invoke a Lambda function
- Lambda uses 2 IAM roles:
- Invocation role: define what it can see / read from the stream
- Execution role: define what it can do
Composite Keys
- Most people use NoSQL as a key-value store, but that’s not the most efficient way to use NoSQL DB
- Because we want to store hierarchical data in the table
- Sort key conditions are applied before the read; filter conditions are applied after the read
- Create composite sort keys, for faster queries on a small number of items
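A composite sort key can be sketched as a delimited string whose prefix encodes the hierarchy. The example below is an in-memory illustration, not the DynamoDB API: the `#` delimiter, the `pk` / `sk` attribute names, and the sample data are assumptions, and `query` mimics a Query with a begins_with condition on the sort key.

```python
def sort_key(*parts: str) -> str:
    """Join hierarchy levels into one sort key, e.g. ORDER#A1#ITEM#1."""
    return "#".join(parts)


def query(items: list, partition_key: str, sort_key_prefix: str) -> list:
    """Mimic a Query with begins_with: select by key, return in sort key order."""
    matches = (i for i in items
               if i["pk"] == partition_key and i["sk"].startswith(sort_key_prefix))
    return sorted(matches, key=lambda i: i["sk"])


# One customer's orders and order items share a partition.
table = [
    {"pk": "CUSTOMER#42", "sk": sort_key("ORDER", "A1")},
    {"pk": "CUSTOMER#42", "sk": sort_key("ORDER", "A1", "ITEM", "1")},
    {"pk": "CUSTOMER#42", "sk": sort_key("ORDER", "A1", "ITEM", "2")},
    {"pk": "CUSTOMER#42", "sk": sort_key("ORDER", "B7")},
    {"pk": "CUSTOMER#42", "sk": sort_key("ORDER", "B7", "ITEM", "1")},
    {"pk": "CUSTOMER#99", "sk": sort_key("ORDER", "A1")},
]

# Only the items of order A1 for customer 42.
a1_items = query(table, "CUSTOMER#42", sort_key("ORDER", "A1", "ITEM") + "#")
```

Ending the prefix with the delimiter avoids accidentally matching a sibling such as `ORDER#A10`. In the real service the key condition narrows what is read, whereas a filter expression would still read (and bill for) the whole partition before discarding items.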