AWS - 03 S3 & ECS

In the third post of the AWS series, we’re going to talk about two services:

  • Simple Storage Service (S3)
  • Elastic Container Service (ECS)


S3

  • Object: data & key & metadata (system & user-defined)
  • Key: unique identifier for an object within a bucket

  • Bucket names must be globally unique and DNS-compliant (dots are allowed but not recommended, since they break virtual-hosted-style HTTPS)
  • After a bucket is created, its name and Region cannot be changed
  • Up to 100 buckets per account (soft limit)

  • Host static sites with S3 (configure the bucket for static website hosting; see the sketch after this list)
  • Read-after-write consistency for PUTs of new objects
  • Eventual consistency for:
    • read-after-write, if you issued a GET or HEAD for the key before the object existed
    • overwrite PUTs and DELETEs
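As a quick illustration of the static-hosting bullet above, here is a minimal boto3 sketch; the bucket name and document keys are placeholders, and the bucket would still need a public-read bucket policy for the site to be reachable.

```python
import boto3

s3 = boto3.client("s3")

# Enable static website hosting on an existing bucket.
# Bucket name and document keys below are placeholders.
s3.put_bucket_website(
    Bucket="example-notes-bucket",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)
```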

Storage class

  • Standard: general-purpose
  • Standard_IA: long-lived, less frequently accessed data
  • Reduced Redundancy (RRS): legacy class that now costs more than Standard (not recommended)

Glacier

  • Objects archived to Glacier are visible and accessible only through S3
  • Glacier Deep Archive: lowest-cost class, for archives that are rarely retrieved
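One common way objects end up in Glacier is through a lifecycle rule. A minimal boto3 sketch, assuming a hypothetical bucket and a logs/ prefix:

```python
import boto3

s3 = boto3.client("s3")

# Lifecycle rule: archive objects under logs/ to Glacier after 90 days,
# then to Glacier Deep Archive after 365 days (bucket name and prefix are placeholders).
s3.put_bucket_lifecycle_configuration(
    Bucket="example-notes-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-logs",
                "Status": "Enabled",
                "Filter": {"Prefix": "logs/"},
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER"},
                    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    },
)
```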

S3 Select

  • Pull out only the data you need from an object with simple SQL expressions (improves performance)
  • Supported formats: CSV / JSON
  • S3 Select usage shows up in CloudWatch request metrics
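A minimal S3 Select sketch with boto3; the bucket, key, and column names are made up, and the CSV is assumed to have a header row:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to scan the object server-side and return only the matching rows.
resp = s3.select_object_content(
    Bucket="example-notes-bucket",          # placeholder bucket
    Key="data/sales.csv",                   # placeholder key
    ExpressionType="SQL",
    Expression="SELECT s.product, s.amount FROM S3Object s "
               "WHERE CAST(s.amount AS FLOAT) > 100",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response payload is an event stream; Records events carry the selected bytes.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"))
```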

S3 Transfer Acceleration: uses edge locations to speed up long-distance transfers to and from a bucket
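A sketch of enabling acceleration and then uploading through the accelerate endpoint, assuming boto3 and placeholder bucket / file names:

```python
import boto3
from botocore.config import Config

s3 = boto3.client("s3")

# Turn on Transfer Acceleration for the bucket (placeholder name).
s3.put_bucket_accelerate_configuration(
    Bucket="example-notes-bucket",
    AccelerateConfiguration={"Status": "Enabled"},
)

# A client that routes requests through the accelerate (edge) endpoint.
s3_accel = boto3.client("s3", config=Config(s3={"use_accelerate_endpoint": True}))
s3_accel.upload_file("big-file.bin", "example-notes-bucket", "uploads/big-file.bin")
```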


Security

  • Resource-based access control

    • Bucket policies (centralize access control)
    • ACLs
  • Server-side encryption:

    • SSE-S3 (keys managed by S3)
    • SSE-KMS (keys managed in AWS KMS)
    • SSE-C (customer-provided keys)
  • Client-side encryption:

    • KMS-managed customer master key
    • Client-side master key
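For the server-side options, the encryption mode is chosen per request (or via a bucket default). A minimal sketch with placeholder bucket, keys, and KMS alias:

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: keys fully managed by S3
s3.put_object(
    Bucket="example-notes-bucket",
    Key="sse-s3.txt",
    Body=b"hello",
    ServerSideEncryption="AES256",
)

# SSE-KMS: encrypt with a customer master key kept in KMS (placeholder alias)
s3.put_object(
    Bucket="example-notes-bucket",
    Key="sse-kms.txt",
    Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-key",
)
```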

S3 Bucket policy for VPC endpoints

  • Amazon S3 does not run inside your VPC. Without a VPC endpoint, requests from your instances reach S3 through an internet gateway or NAT gateway and travel over the public internet.

  • For security, instead of security groups and network ACLs, you use bucket policies and S3 ACLs to manage access to your S3 bucket and objects.

  • If you want your S3 traffic to stay entirely within the Amazon network, you have to employ VPC endpoints.


  • A VPC endpoint is what you use to privately connect your VPC to supported AWS services such as S3. A gateway endpoint adds an entry to your VPC’s route tables so that traffic between your AWS resources and your S3 bucket passes through the endpoint instead of the public internet. A VPC endpoint is a regional resource: create it in the same Region as the VPC you want to use.
  • VPC endpoints are best used when you have compliance requirements or sensitive information stored in S3 that should not leave the Amazon network
  • A VPC endpoint is also a better option than VPN / NAT for private network connections in AWS, since it is easier to set up and gives you more network bandwidth

  • VPC endpoint access policies & S3 bucket policies can be combined to refine access control
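A sketch of that combination: a bucket policy that denies any request not arriving through a specific gateway endpoint. The bucket name and endpoint ID are placeholders.

```python
import json
import boto3

s3 = boto3.client("s3")

# Deny all S3 actions on the bucket unless the request came in
# through the expected gateway VPC endpoint.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccessOnlyViaVpcEndpoint",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::example-notes-bucket",
                "arn:aws:s3:::example-notes-bucket/*",
            ],
            "Condition": {
                "StringNotEquals": {"aws:SourceVpce": "vpce-0123456789abcdef0"}
            },
        }
    ],
}
s3.put_bucket_policy(Bucket="example-notes-bucket", Policy=json.dumps(policy))
```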


ECS

  • Manages Docker containers on a cluster; a Regional service

  • After a cluster is up and running, you can define task definitions and services that specify which Docker container images to run across your clusters.


Components

Containers are created from images (read-only templates); images are built from a Dockerfile and stored in a registry


A task definition (JSON) specifies various parameters for your application. It describes 1 - 10 containers that together form the application

  • IAM task role

  • Container definitions (image, CPU, memory)

  • Volumes (share data between containers, and can even persist data on the container instance after the containers are no longer running)

  • Launch types (EC2 / Fargate)

    • Fargate is the serverless option: no infrastructure to manage (the EC2 launch type requires you to maintain the underlying instances)
    • Fargate pricing is based on task size (vCPU & memory)
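To make the pieces above concrete, here is a hedged boto3 sketch of a task definition with two containers sharing a data volume; the family, role ARN, and images are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# Two containers in one task definition sharing a named volume.
ecs.register_task_definition(
    family="example-app",
    taskRoleArn="arn:aws:iam::123456789012:role/example-task-role",  # IAM task role
    containerDefinitions=[
        {
            "name": "app",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-app:latest",
            "cpu": 256,
            "memory": 512,
            "essential": True,
            "mountPoints": [{"sourceVolume": "shared-data", "containerPath": "/data"}],
        },
        {
            "name": "sidecar",
            "image": "busybox",
            "cpu": 128,
            "memory": 128,
            "essential": False,
            "mountPoints": [{"sourceVolume": "shared-data", "containerPath": "/data"}],
        },
    ],
    volumes=[{"name": "shared-data"}],  # mounted by both containers
)
```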

Task & Scheduling

  • A task is the instantiation of a task definition within a cluster

  • After creating a task definition, you specify the number of tasks to run on your cluster

    • Each task that uses the Fargate launch type has its own isolation boundary, and does not share the underlying kernel, CPU resources, memory resources, or elastic network interface with another task
  • Task scheduler: places tasks within the cluster; standalone tasks can also run on a cron-like schedule (CloudWatch Events rules)

    • Replica: maintains the desired number of tasks, spreading them across AZs
    • Daemon: deploys exactly one task on each active container instance; no need to specify a desired number of tasks or a task placement strategy
  • When you register a new revision of a task definition and update the service, the ECS scheduler automatically starts new containers using the updated image and stops containers running the previous version
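A hedged sketch of the two service scheduling strategies; cluster, service, and task definition names are placeholders.

```python
import boto3

ecs = boto3.client("ecs")

# REPLICA: keep a desired number of tasks running, spread across AZs.
ecs.create_service(
    cluster="example-cluster",
    serviceName="web-replica",
    taskDefinition="example-app",
    desiredCount=3,
    schedulingStrategy="REPLICA",
)

# DAEMON: exactly one task per active container instance;
# no desired count or placement strategy is given.
ecs.create_service(
    cluster="example-cluster",
    serviceName="log-agent",
    taskDefinition="example-agent",
    schedulingStrategy="DAEMON",
)
```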


Clusters

  • Tasks in ECS are placed in a cluster, a logical grouping of resources
  • Clusters can contain tasks that use both the EC2 and Fargate launch types
  • Before deleting a cluster, you must first delete its services and deregister the container instances inside it
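A sketch of that teardown order with boto3 (the cluster name is a placeholder):

```python
import boto3

ecs = boto3.client("ecs")
cluster = "example-cluster"

# 1. delete the services in the cluster (force skips scaling them to zero first)
for service_arn in ecs.list_services(cluster=cluster)["serviceArns"]:
    ecs.delete_service(cluster=cluster, service=service_arn, force=True)

# 2. deregister the container instances (EC2 launch type)
for instance_arn in ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]:
    ecs.deregister_container_instance(
        cluster=cluster, containerInstance=instance_arn, force=True
    )

# 3. only then can the cluster itself be deleted
ecs.delete_cluster(cluster=cluster)
```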

Services

  • Runs & maintains a specified number of instances of a task definition simultaneously in a cluster
  • Two deployment strategies
    • Rolling update (tasks are replaced in place; brief downtime is possible)
    • Blue / Green (no downtime; requires an Application or Network Load Balancer and is driven by CodeDeploy)
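For the blue/green case, the deployment controller is set to CodeDeploy and the service must sit behind an ALB or NLB target group. A hedged sketch with placeholder names and ARNs (a CodeDeploy application and deployment group are still created separately):

```python
import boto3

ecs = boto3.client("ecs")

# deploymentController "ECS" gives the default rolling update;
# "CODE_DEPLOY" hands traffic shifting to CodeDeploy for blue/green.
ecs.create_service(
    cluster="example-cluster",
    serviceName="web-bluegreen",
    taskDefinition="example-app",
    desiredCount=2,
    deploymentController={"type": "CODE_DEPLOY"},
    loadBalancers=[
        {
            "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
                              "targetgroup/example-blue/0123456789abcdef",
            "containerName": "app",
            "containerPort": 80,
        }
    ],
)
```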

Container agent

  • Runs on each container instance within an ECS cluster
  • You only install and manage it with the EC2 launch type (Fargate handles the agent for you)

Fargate (Serverless Container)

  • Use Fargate with ECS to run containers without having to manage servers or clusters of EC2 instances
  • Fargate only supports images on DockerHub / ECR

  • A serverless compute engine for containers that works with ECS & EKS; A managed service for container cluster management
  • No manual provisioning, patching, cluster capacity management, or any infrastructure management

Task definition for Fargate launch type

  • Network mode is awsvpc, which provides each task with its own ENI (elastic network interface)

  • Specify CPU & memory at task level

  • awslogs log driver: sends container logs to CloudWatch Logs

    Other log drivers: Splunk, FireLens, Fluentd

  • Task storage is ephemeral: when the task stops, the storage is deleted (think of how a Docker container’s writable layer works)

  • Put multiple containers in the same task definition if they share resources, data volumes, or a common lifecycle
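Putting the Fargate-specific pieces together, a hedged register_task_definition sketch (family, ARNs, image, and log group are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="example-web-fargate",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",     # each task gets its own ENI
    cpu="256",                # CPU & memory are declared at the task level
    memory="512",
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example-web:latest",
            "essential": True,
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            "logConfiguration": {
                "logDriver": "awslogs",  # ship container logs to CloudWatch Logs
                "options": {
                    "awslogs-group": "/ecs/example-web",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "web",
                },
            },
        }
    ],
)
```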


Task definition for EC2 launch type

  • Data volumes: Docker volumes & bind mounts
  • Private registries are only supported with the EC2 launch type

  • If you have a service with running tasks and want to update their platform version, you can update your service, specify a new platform version, and choose Force new deployment. Your tasks are redeployed with the latest platform version
  • If your service is scaled up without updating the platform version, those tasks receive the platform version that was specified on the service’s current deployment
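A minimal sketch of the "Force new deployment" path described above, assuming boto3 and a placeholder Fargate service:

```python
import boto3

ecs = boto3.client("ecs")

# Redeploy the service's tasks on the latest Fargate platform version.
ecs.update_service(
    cluster="example-cluster",
    service="example-fargate-service",
    platformVersion="LATEST",
    forceNewDeployment=True,
)
```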

Task Placement Strategies

A task placement strategy is an algorithm for selecting instances for task placement or tasks for termination.


1. Binpack

  • Place tasks based on the least available amount of CPU or memory
  • Cost efficient

2. Random

  • Randomly (when task placement / termination doesn’t matter)

3. Spread

  • Place tasks evenly based on the specified value
  • Accepted values are attribute key-value pairs, instanceId, or host
  • Achieve high availability by making sure that multiple copies of a task are scheduled across multiple instances
  • Fargate default: spread across multiple AZs
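A hedged sketch of combining these strategies when running tasks on the EC2 launch type (cluster and task definition names are placeholders):

```python
import boto3

ecs = boto3.client("ecs")

# Spread for availability first, then binpack on memory for cost efficiency.
ecs.run_task(
    cluster="example-cluster",
    taskDefinition="example-app",
    count=4,
    placementStrategy=[
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
        {"type": "binpack", "field": "memory"},
    ],
)

# Random placement, when it does not matter where the task lands.
ecs.run_task(
    cluster="example-cluster",
    taskDefinition="example-batch",
    placementStrategy=[{"type": "random"}],
)
```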

Monitoring & Other

  • Send log information to CloudWatch Logs
  • With CloudWatch Alarms, monitor a single metric over a specified time period (see the sketch below)
  • Share log files between accounts, and monitor CloudTrail log files in real time by sending them to CloudWatch Logs
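As an example of the single-metric alarm mentioned above, a hedged sketch that alarms on a service's average CPU (cluster, service, and SNS topic are placeholders):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the service's average CPU stays above 80% for two 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="ecs-web-high-cpu",
    Namespace="AWS/ECS",
    MetricName="CPUUtilization",
    Dimensions=[
        {"Name": "ClusterName", "Value": "example-cluster"},
        {"Name": "ServiceName", "Value": "example-service"},
    ],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:example-alerts"],
)
```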

  • Tagging: each ECS resource is assigned a unique ID & ARN (Amazon Resource Name); you can also tag resources with self-defined key-value pairs to identify them