AWS - 03 S3 & ECS

In the third post of the AWS series, we’re going to talk about two services:

  • Simple Storage Service (S3)
  • Elastic Container Service (ECS)


S3

  • Object: data & key & metadata (system & user-defined)
  • Key: unique identifier for an object within a bucket

  • Bucket names must be globally unique and DNS-compliant (dots are allowed but not recommended, since they break virtual-hosted-style HTTPS)
  • After a bucket is created, its name and Region cannot be changed
  • Up to 100 buckets per account (soft limit)

  • Host static sites with S3 (configure the bucket for static website hosting; see the sketch after this list)
  • Read-after-write consistency for PUTs of new objects
  • Eventual consistency for:
    • read-after-write, if you issued a GET or HEAD for the key before the object existed
    • overwrite PUTs and DELETEs
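As a quick illustration of the static-hosting bullet above, here is a minimal boto3 sketch; the bucket name and document keys are placeholders, and the bucket would still need a public-read bucket policy for the site to be reachable.

```python
import boto3

s3 = boto3.client("s3")

# Enable static website hosting on an existing bucket.
# Bucket name and document keys below are placeholders.
s3.put_bucket_website(
    Bucket="example-notes-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```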

Storage class

  • Standard: general-purpose
  • Standard_IA: long-lived, less frequently accessed data
  • Reduced Redundancy (RRS): legacy class that now costs more than Standard (not recommended)

Glacier

  • Objects archived to Glacier are visible and accessible only through S3
  • Glacier Deep Archive: lowest-cost class, for archives that are rarely retrieved
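One common way objects end up in Glacier is through a lifecycle rule. A minimal boto3 sketch, assuming a hypothetical bucket and a logs/ prefix:

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule: archive objects under logs/ to Glacier after 90 days,
# then to Glacier Deep Archive after 365 days (bucket name and prefix are placeholders).
s3.put_bucket_lifecycle_configuration(
    Bucket="example-notes-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```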

S3 Select

  • Pull out only the data you need from an object with simple SQL expressions (improves performance)
  • Supported formats: CSV / JSON
  • S3 Select usage shows up in CloudWatch request metrics
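A minimal S3 Select sketch with boto3; the bucket, key, and column names are made up, and the CSV is assumed to have a header row:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to scan the object server-side and return only the matching rows.
resp = s3.select_object_content(
    Bucket="example-notes-bucket",          # placeholder bucket
    Key="data/sales.csv",                   # placeholder key
    ExpressionType="SQL",
    Expression="SELECT s.product, s.amount FROM S3Object s "
               "WHERE CAST(s.amount AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; Records events carry the selected bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```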

S3 Transfer Acceleration: uses edge locations to speed up long-distance transfers to and from a bucket
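A sketch of enabling acceleration and then uploading through the accelerate endpoint, assuming boto3 and placeholder bucket / file names:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Turn on Transfer Acceleration for the bucket (placeholder name).
s3.put_bucket_accelerate_configuration(
    Bucket="example-notes-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# A client that routes requests through the accelerate (edge) endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "example-notes-bucket", "uploads/big-file.bin")
```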


Security

  • Resource-based access control

    • Bucket policies (centralize access control)
    • ACLs
  • Server-side encryption:

    • SSE-S3 (keys managed by S3)
    • SSE-KMS (keys managed in AWS KMS)
    • SSE-C (customer-provided keys)
  • Client-side encryption:

    • KMS-managed customer master key
    • Client-side master key
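For the server-side options, the encryption mode is chosen per request (or via a bucket default). A minimal sketch with placeholder bucket, keys, and KMS alias:

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: keys fully managed by S3
s3.put_object(
    Bucket="example-notes-bucket",
    Key="sse-s3.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",
)

# SSE-KMS: encrypt with a customer master key kept in KMS (placeholder alias)
s3.put_object(
    Bucket="example-notes-bucket",
    Key="sse-kms.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-key",
)
```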

S3 Bucket policy for VPC endpoints

  • Amazon S3 does not run inside your VPC. Without a VPC endpoint, requests from your instances reach S3 through an internet gateway or NAT gateway and travel over the public internet.

  • For security, instead of security groups and network ACLs, you use bucket policies and S3 ACLs to manage access to your S3 bucket and objects.

  • If you want your S3 traffic to stay entirely within the Amazon network, you have to employ VPC endpoints.


  • A VPC endpoint is what you use to privately connect your VPC to supported AWS services such as S3. A gateway endpoint adds an entry to your VPC’s route tables so that traffic between your AWS resources and your S3 bucket passes through the endpoint instead of the public internet. A VPC endpoint is a regional resource: create it in the same Region as the VPC you want to use.
  • VPC endpoints are best used when you have compliance requirements or sensitive information stored in S3 that should not leave the Amazon network
  • A VPC endpoint is also a better option than VPN / NAT for private network connections in AWS, since it is easier to set up and gives you more network bandwidth

  • VPC endpoint access policies & S3 bucket policies can be combined to refine access control
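A sketch of that combination: a bucket policy that denies any request not arriving through a specific gateway endpoint. The bucket name and endpoint ID are placeholders.

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny all S3 actions on the bucket unless the request came in
# through the expected gateway VPC endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccessOnlyViaVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-notes-bucket",
                "arn:aws:s3:::example-notes-bucket/*",
            ],
            "Condition": {
                "StringNotEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}
            },
        }
    ],
}
s3.put_bucket_policy(Bucket="example-notes-bucket", Policy=json.dumps(policy))
```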


ECS

  • Manages Docker containers on a cluster; a Regional service

  • After a cluster is up and running, you can define task definitions and services that specify which Docker container images to run across your clusters.


Components

Containers are created from images (read-only templates); images are built from a Dockerfile and stored in a registry


A task definition (JSON) specifies various parameters for your application. It describes 1 - 10 containers that together form the application

  • IAM task role

  • Container definitions (image, CPU, memory)

  • Volumes (share data between containers, and can even persist data on the container instance after the containers are no longer running)

  • Launch types (EC2 / Fargate)

    • Fargate is the serverless option: no infrastructure to manage (the EC2 launch type requires you to maintain the underlying instances)
    • Fargate pricing is based on task size (vCPU & memory)
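To make the pieces above concrete, here is a hedged boto3 sketch of a task definition with two containers sharing a data volume; the family, role ARN, and images are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# Two containers in one task definition sharing a named volume.
ecs.register_task_definition(
    family="example-app",
    taskRoleArn="arn:aws:iam::123456789012:role/example-task-role",  # IAM task role
    containerDefinitions=[
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-app:latest",
            "cpu": 256,
            "memory": 512,
            "essential": True,
            "mountPoints": [{"sourceVolume": "shared-data", "containerPath": "/data"}],
        },
        {
            "name": "sidecar",
            "image": "busybox",
            "cpu": 128,
            "memory": 128,
            "essential": False,
            "mountPoints": [{"sourceVolume": "shared-data", "containerPath": "/data"}],
        },
    ],
    volumes=[{"name": "shared-data"}],  # mounted by both containers
)
```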

Task & Scheduling

  • A task is the instantiation of a task definition within a cluster

  • After creating a task definition, you specify the number of tasks to run on your cluster

    • Each task that uses the Fargate launch type has its own isolation boundary, and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task
  • Task scheduler: places tasks within the cluster; standalone tasks can also run on a cron-like schedule (CloudWatch Events rules)

    • Replica: maintains the desired number of tasks, spreading them across AZs
    • Daemon: deploys exactly one task on each active container instance; no need to specify a desired number of tasks or a task placement strategy
  • When you register a new revision of a task definition and update the service, the ECS scheduler automatically starts new containers using the updated image and stops containers running the previous version
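A hedged sketch of the two service scheduling strategies; cluster, service, and task definition names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# REPLICA: keep a desired number of tasks running, spread across AZs.
ecs.create_service(
    cluster="example-cluster",
    serviceName="web-replica",
    taskDefinition="example-app",
    desiredCount=3,
    schedulingStrategy="REPLICA",
)

# DAEMON: exactly one task per active container instance;
# no desired count or placement strategy is given.
ecs.create_service(
    cluster="example-cluster",
    serviceName="log-agent",
    taskDefinition="example-agent",
    schedulingStrategy="DAEMON",
)
```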


Clusters

  • Tasks in ECS are placed in a cluster, a logical grouping of resources
  • Clusters can contain tasks that use both the EC2 and Fargate launch types
  • Before deleting a cluster, you must first delete its services and deregister the container instances inside it
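A sketch of that teardown order with boto3 (the cluster name is a placeholder):

```python
import boto3

ecs = boto3.client("ecs")
cluster = "example-cluster"

# 1. delete the services in the cluster (force skips scaling them to zero first)
for service_arn in ecs.list_services(cluster=cluster)["serviceArns"]:
    ecs.delete_service(cluster=cluster, service=service_arn, force=True)

# 2. deregister the container instances (EC2 launch type)
for instance_arn in ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]:
    ecs.deregister_container_instance(
        cluster=cluster, containerInstance=instance_arn, force=True
    )

# 3. only then can the cluster itself be deleted
ecs.delete_cluster(cluster=cluster)
```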

Services

  • Runs & maintains a specified number of instances of a task definition simultaneously in a cluster
  • Two deployment strategies
    • Rolling update (tasks are replaced in place; brief downtime is possible)
    • Blue / Green (no downtime; requires an Application or Network Load Balancer and is driven by CodeDeploy)
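For the blue/green case, the deployment controller is set to CodeDeploy and the service must sit behind an ALB or NLB target group. A hedged sketch with placeholder names and ARNs (a CodeDeploy application and deployment group are still created separately):

```python
import boto3

ecs = boto3.client("ecs")

# deploymentController "ECS" gives the default rolling update;
# "CODE_DEPLOY" hands traffic shifting to CodeDeploy for blue/green.
ecs.create_service(
    cluster="example-cluster",
    serviceName="web-bluegreen",
    taskDefinition="example-app",
    desiredCount=2,
    deploymentController={"type": "CODE_DEPLOY"},
    loadBalancers=[
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
                              "targetgroup/example-blue/0123456789abcdef",
            "containerName": "app",
            "containerPort": 80,
        }
    ],
)
```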

Container agent

  • Runs on each container instance within an ECS cluster
  • You only install and manage it with the EC2 launch type (Fargate handles the agent for you)

Fargate (Serverless Container)

  • Use Fargate with ECS to run containers without having to manage servers or clusters of EC2 instances
  • Fargate only supports images on DockerHub / ECR

  • A serverless compute engine for containers that works with ECS & EKS; A managed service for container cluster management
  • No manual provisioning, patching, cluster capacity management, or any infrastructure management

Task definition for Fargate launch type

  • Network mode is awsvpc, which provides each task with its own ENI (elastic network interface)

  • Specify CPU & memory at task level

  • awslogs log driver: sends container logs to CloudWatch Logs

    Other log drivers: Splunk, FireLens, Fluentd

  • Task storage is ephemeral: when the task stops, the storage is deleted (think of how a Docker container’s writable layer works)

  • Put multiple containers in the same task definition if they share resources, data volumes, or a common lifecycle
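Putting the Fargate-specific pieces together, a hedged register_task_definition sketch (family, ARNs, image, and log group are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="example-web-fargate",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",     # each task gets its own ENI
    cpu="256",                # CPU & memory are declared at the task level
    memory="512",
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-web:latest",
            "essential": True,
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "logConfiguration": {
                "logDriver": "awslogs",  # ship container logs to CloudWatch Logs
                "options": {
                    "awslogs-group": "/ecs/example-web",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "web",
                },
            },
        }
    ],
)
```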


Task definition for EC2 launch type

  • Data volumes: Docker volumes & bind mounts
  • Private registries are only supported with the EC2 launch type

  • If you have a service with running tasks and want to update their platform version, you can update your service, specify a new platform version, and choose Force new deployment. Your tasks are redeployed with the latest platform version
  • If your service is scaled up without updating the platform version, those tasks receive the platform version that was specified on the service’s current deployment
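A minimal sketch of the "Force new deployment" path described above, assuming boto3 and a placeholder Fargate service:

```python
import boto3

ecs = boto3.client("ecs")

# Redeploy the service's tasks on the latest Fargate platform version.
ecs.update_service(
    cluster="example-cluster",
    service="example-fargate-service",
    platformVersion="LATEST",
    forceNewDeployment=True,
)
```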

Task Placement Strategies

A task placement strategy is an algorithm for selecting instances for task placement or tasks for termination.


1. Binpack

  • Place tasks based on the least available amount of CPU or memory
  • Cost efficient

2. Random

  • Randomly (when task placement / termination doesn’t matter)

3. Spread

  • Place tasks evenly based on the specified value
  • Accepted values are attribute key-value pairs, instanceId, or host
  • Achieve high availability by making sure that multiple copies of a task are scheduled across multiple instances
  • Fargate default: spread across multiple AZs
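A hedged sketch of combining these strategies when running tasks on the EC2 launch type (cluster and task definition names are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

# Spread for availability first, then binpack on memory for cost efficiency.
ecs.run_task(
    cluster="example-cluster",
    taskDefinition="example-app",
    count=4,
    placementStrategy=[
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
        {"type": "binpack", "field": "memory"},
    ],
)

# Random placement, when it does not matter where the task lands.
ecs.run_task(
    cluster="example-cluster",
    taskDefinition="example-batch",
    placementStrategy=[{"type": "random"}],
)
```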

Monitoring & Other

  • Send log information to CloudWatch Logs
  • With CloudWatch Alarms, monitor a single metric over a specified time period (see the sketch below)
  • Share log files between accounts, and monitor CloudTrail log files in real time by sending them to CloudWatch Logs
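As an example of the single-metric alarm mentioned above, a hedged sketch that alarms on a service's average CPU (cluster, service, and SNS topic are placeholders):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the service's average CPU stays above 80% for two 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="ecs-web-high-cpu",
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "example-cluster"},
        {"Name": "ServiceName", "Value": "example-service"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:example-alerts"],
)
```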

  • Tagging: each ECS resource is assigned a unique ID & ARN (Amazon Resource Name); you can also tag resources with self-defined key-value pairs to identify them