Merikanto

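One flute, one sword, a lifelong ambition; for fifteen years I have failed to live up to my reputation for wildness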

AWS - 05 Serverless on AWS

In this post, we’re going to talk about running serverless applications on AWS. Below are some of the points from AWS whitepapers.



Optimize Economics with Serverless

Event-driven computing  ─  FaaS  ─  Serverless FaaS (Lambda)

Function as the unit of deployment & execution.


Serverless

  • Functional & microservice approach, where business logic is triggered only when required
  • Async, event-based
  • Lambda: Receive events / client invocation, then instantiate & run the code
  • Scale automatically, built-in fault tolerance

Serverless use cases

  • Web / mobile backends
  • Media & log processing (compute-heavy workloads)
  • Automation (functions can be attached to alarms & monitors)
  • Near real-time streaming data processing
    • Big data: parallelize with a serverless approach
    • Low latency: moving serverless event handling to the internet edge (Lambda@Edge)

SAM

  • Open specification / blueprint. Application modeling framework
  • Orchestration & state management (serverless is stateless)
  • Manage all the steps in the SDLC
  • Deploy SAM apps with CF (SAM is specification, CF is implementation)
  • Edge locations: Key to low-latency serverless computing
  • Role-based & access-based permissions, and API-based authentication & access control

Lambda

  • Lambda@Edge is available in all edge locations
  • Auto retries for async & ordered events
  • DLQ: capture events that weren’t processed successfully
  • End user: Cognito

Serverless components

  • Developer tools
    • CF for deployment
    • X-Ray for diagnostics (cross-service request tracing & performance analysis)
    • CW & CW Logs for monitoring
  • Orchestration
    • Step Functions (create long-running workflows, state machines)
    • CW Events (respond to events)
  • Streaming data:
    • Kinesis Streams: near real-time analytics engine
    • Kinesis Firehose: With Lambda
  • Compute: Lambda & Lambda@Edge (cloud logic layer)
  • API Proxy: API Gateway (HTTP endpoints)
  • Database: DynamoDB
  • Storage: S3 (changes to an object can automatically trigger a Lambda function)

Construct serverless application

  • S3: static content
  • Lambda & API Gateway: Dynamic API requests
  • DynamoDB: store session & user state
  • Cognito: end-user registration, authentication (user pool), and access control to resources (identity pool)
  • SAM: describe elements of the app
  • CodeStar: CI / CD pipeline

Data processing (Lambda itself is stateless)

  • Lambda & Kinesis
  • Lambda & S3: trigger computation in response to object creation / update events
  • Step Functions: stateful long-running workflows



Serverless & Lambda


Lambda

  • FaaS: Build reactive, event-driven system
  • Multiple, simultaneous events: Run more copies of the function in parallel
  • Lambda executes in a container (sandbox) that isolates it from other functions
  • Lambda also exposes a RESTful API (the Invoke API), through which a function can be invoked directly

Lambda function

  • Code
  • Configuration
  • Event sources (detect events & invoke function, e.g. API Gateway, SNS)

Run the code package

  • Download from S3 bucket
  • Install in the Lambda runtime environment (based on Amazon Linux AMI)
  • Invoke as needed

🧡 Handler

  • A specific method in your code. Specify the handler when creating the Lambda function
  • Handler can call other methods & functions within the files & classes you’ve uploaded
  • Event object
  • Context object: allows function code to interact with the Lambda execution environment
    • AWS RequestId
    • Remaining time
    • Logging: Stream log statements to CW logs
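A minimal sketch of a Python handler that touches each of the items above. The `FakeContext` class is a hypothetical stand-in so the code can run locally; in Lambda the real context object is supplied by the runtime.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handler(event, context):
    """Entry point named in the function configuration (e.g. "app.handler")."""
    # Log records emitted inside Lambda are streamed to CloudWatch Logs.
    logger.info("request_id=%s", context.aws_request_id)

    # Milliseconds left before the configured timeout is reached.
    remaining_ms = context.get_remaining_time_in_millis()

    return {
        "request_id": context.aws_request_id,
        "remaining_ms": remaining_ms,
        "echo": event,
    }


class FakeContext:
    """Hypothetical stand-in for the Lambda context object (local runs only)."""

    aws_request_id = "local-test-id"

    def get_remaining_time_in_millis(self):
        return 3000
```

Calling `handler({"name": "demo"}, FakeContext())` locally exercises the same code path that Lambda would invoke.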

Lambda: Statelessness & Reuse

  • Warm container: already active, invoked before (faster code execution)
  • Cold start: create & invoke for the first time (slower)
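One way to observe the difference, sketched under the assumption that module-level code runs once per container: anything defined outside the handler survives between warm invocations.

```python
import time

# Module-level code runs once per container, i.e. during a cold start.
CONTAINER_STARTED_AT = time.time()
invocation_count = 0


def handler(event, context):
    global invocation_count
    invocation_count += 1
    return {
        # True only for the first invocation served by this container.
        "cold_start": invocation_count == 1,
        "invocations_in_this_container": invocation_count,
        "container_age_s": time.time() - CONTAINER_STARTED_AT,
    }
```

The counter is only a diagnostic. Because containers come and go, state kept this way should be treated as a cache, never as durable storage.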

Event Sources

Invocation patterns

  • Push Model (passive user)
  • Pull Model (active user)
    • Polls data source, batching new records together in a single function invocation

Lambda functions can be invoked synchronously or asynchronously. The mode is selected with the InvocationType parameter, which has 3 possible values:

  • RequestResponse: Sync
  • Event: Async
  • DryRun: Validate parameters & permissions without actually executing the function
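A sketch of choosing the invocation type through the Invoke API. The function takes the client as an argument: in real use that would be `boto3.client("lambda")`, while `FakeLambdaClient` is a hypothetical stand-in that lets the example run without AWS access. The function name `demo-fn` is also made up.

```python
import json


def invoke_function(lambda_client, function_name, payload,
                    invocation_type="RequestResponse"):
    """Invoke a function; lambda_client should behave like boto3.client("lambda")."""
    if invocation_type not in ("RequestResponse", "Event", "DryRun"):
        raise ValueError(f"unsupported InvocationType: {invocation_type}")

    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType=invocation_type,
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Status codes: 200 for RequestResponse, 202 for Event, 204 for DryRun.
    return response["StatusCode"]


class FakeLambdaClient:
    """Hypothetical stand-in that mimics only the status code of invoke()."""

    _codes = {"RequestResponse": 200, "Event": 202, "DryRun": 204}

    def invoke(self, FunctionName, InvocationType, Payload):
        return {"StatusCode": self._codes[InvocationType]}
```

With `Event`, the call returns as soon as Lambda has queued the event, so the caller never sees the function's result.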

Push model event source (Trigger Lambda)

  • S3 (Async)
  • API Gateway (Async / sync)
    • Sync: API as Lambda proxy
    • Async: API as AWS service proxy (return immediately with empty response)
  • SNS (Async): Automated response to CW alarms
  • CF (Sync)
  • CW Events (Async): AWS services publish resource state changes to CW Events (for event-driven ops automation)

Pull model event source (Lambda polls them; all invocations are sync)

  • DynamoDB (Sync)
    • Workflows triggered as changes occur in a DynamoDB table
    • Replicate DynamoDB table to another Region
  • Kinesis Streams (Sync): Real-time data processing
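With a pull-model source, the handler receives a batch of records in one invocation. The sketch below decodes a Kinesis batch, assuming the producer wrote JSON payloads. `make_sample_event` is a hypothetical helper that builds a minimal event for local runs.

```python
import base64
import json


def decode_kinesis_records(event):
    """Decode every record in the batch (record data is base64-encoded)."""
    payloads = []
    for record in event["Records"]:
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the producer put JSON documents on the stream.
        payloads.append(json.loads(raw))
    return payloads


def handler(event, context):
    payloads = decode_kinesis_records(event)
    return {"batch_size": len(payloads)}


def make_sample_event(payloads):
    """Hypothetical helper: build a minimal Kinesis-shaped event."""
    return {
        "Records": [
            {"kinesis": {"data": base64.b64encode(json.dumps(p).encode()).decode()}}
            for p in payloads
        ]
    }
```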

Lambda Config

Aliases (Versioning)

To version the Lambda functions: Aliases (Pointer to a specific Lambda version)

  • live / prod / active
  • blue / green
  • debug

Environment Variables (Config)

  • Use env var with Lambda: Separate code & config
  • Lambda enables user to dynamically pass data to function code
  • Key-value pairs, encrypted at rest
  • For sensitive values: encrypt with KMS before creating the function, and store the ciphertext as the variable value
  • Use cases
    • Log setting (INFO, DEBUG, etc)
    • Dependency & database connection credentials
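The log-setting use case can be sketched as follows. The variable name `LOG_LEVEL` is an assumption; any name set in the function configuration would work the same way.

```python
import logging
import os


def configure_logging():
    """Set the root logger level from the LOG_LEVEL environment variable."""
    level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, level_name, None)
    if not isinstance(level, int):
        # Unknown or malformed value: fall back to a safe default.
        level = logging.INFO
    logger = logging.getLogger()
    logger.setLevel(level)
    return logger
```

Changing the variable in the function configuration switches verbosity without touching the code package.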

IAM role

  • Policies can be associated with IAM roles
  • Assign IAM execution role to Lambda functions
  • Source code is decoupled from the security aspect, does not need any credential check / rotation

Function permissions

  • Pull model event sources ONLY
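    • Make sure the execution role permits the actions Lambda performs on the source (e.g. reading from the stream)
    • AWS provides managed IAM policies for each of the pull-based event sources (e.g. AWSLambdaKinesisExecutionRole, AWSLambdaDynamoDBExecutionRole)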

Outbound Network Connectivity

  • Default: functions run in a VPC managed by Lambda, with no private connectivity to resources in your own VPC

  • VPC: Communicate via ENI (Elastic Network Interface), connect to private resources

    • ENIs can be assigned security groups

    • Route traffic based on the route tables of ENIs’ subnets

      If you choose VPC, you need to manage:

      • Subnets, ensure multi-AZ
      • Allocate IP addresses to each subnet
      • VPC network design
      • Cold start time increases if an invocation requires a new ENI to be created just in time


DLQ (Dead Letter Queue)

  • SNS topic / SQS queue
  • Destination for all failed invocation events
  • Use a DLQ if you need every Lambda invocation to complete eventually, even if execution is delayed

Timeout:

  • Time limit for a single invocation of a Lambda function (maximum 900 s, raised from the earlier 300 s limit)
  • Sometimes need to fail fast
  • Should not rely on background / async processes for critical activities
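A sketch of failing fast against the timeout: the loop checks the remaining time on the context object and stops before Lambda kills the invocation. `SAFETY_MARGIN_MS` and `FakeContext` are hypothetical names used only for this illustration.

```python
SAFETY_MARGIN_MS = 500  # hypothetical buffer kept free for cleanup


def process_items(items, context, work=lambda item: item):
    """Process items until the remaining time drops below the safety margin."""
    done = []
    for item in items:
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            break  # stop early instead of being cut off mid-item
        done.append(work(item))
    remaining = items[len(done):]
    return done, remaining


class FakeContext:
    """Hypothetical context whose time budget shrinks on every check."""

    def __init__(self, remaining_ms, cost_per_check_ms):
        self.remaining_ms = remaining_ms
        self.cost_per_check_ms = cost_per_check_ms

    def get_remaining_time_in_millis(self):
        current = self.remaining_ms
        self.remaining_ms -= self.cost_per_check_ms
        return current
```

The unprocessed remainder can then be handed to a queue or a follow-up invocation rather than being lost.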

Architecture Best Practices


Security

General

  • One IAM role per function (1 : 1 relations, decouple the IAM role)

  • Use temporary AWS credentials (SDK, manage retrieval & rotation)

    For cross-account cases, grant execution role to AssumeRole API within STS (Security Token Service)

  • Store user session data in DynamoDB / ElastiCache, to reduce latency

  • Secrets should only ever exist in memory, and should never be logged or written to disk

  • VPC security: Lambda-specific subnets, NACL, route tables


Persisting secrets

  • Lambda env var with encryption helpers
    • Pro: Directly to runtime (no latency)
    • Con: Coupled to function version
  • AWS Systems Manager Parameter Store (formerly EC2 Systems Manager)
    • Pro: Decoupled from function version
    • Con: Add latency (for retrieval)

API auth

  • API Gateway as Lambda’s event source: You are responsible for authenticating & authorizing your API clients
  • SigV4 authentication
  • Lambda Authorizer

Deployment access control

  • UpdateFunctionCode API call: code deployment
  • UpdateAlias API call: code release
  • Eliminate direct user access to the above APIs for any function (use automation)

Reliability

  • HA: Subnets have adequate IPs to support many concurrent functions
  • Fault tolerance: Multi-region coordinates failover across all tiers of app stack
  • Recovery: For async, use DLQ (store during outage, process after recovery)

Performance Efficiency

If the use case can be handled asynchronously, the function's execution time is much less of a concern for the caller

  • Use the Event InvocationType, or a pull-based model

  • Allow application logic to proceed, while Lambda processes the event separately

  • Optimize Lambda execution time

    • Resources allocation in the function configuration
    • Language runtime
    • The code you write (warm container reuse, minimize initial cost of cold start)
  • Choose optimal memory size (RAM impacts CPU time & network bandwidth)

    Monitor memory usage in CW Logs

    Use X-Ray to trace full lifecycle of application request, through each of its component parts.


Operational Excellence


General

  • Use Lambda env var to create log level var
  • Enable investigation with logging, use X-Ray to profile applications
  • Create Lambda aliases that represent operational activities such as integration testing, performance testing, debugging, etc

Metrics

  • Create alarm thresholds (high & low) for each Lambda function, on all provided metrics through CW
  • Create a custom metric, and publish it by calling the required API (e.g. CloudWatch) directly from the Lambda function
  • Capture metric with Lambda function code, and log it using provided logging mechanisms in Lambda
    • Then create CW Log metric filter on the function stream, to extract the metric, and make it available in CW
    • Create another Lambda as a subscription filter on the CW Log stream, to push filtered log statements to another metrics solution
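The log-based approach can be sketched as emitting one JSON line per data point. The field names `metric_name` and `metric_value` are assumptions; a CW Logs metric filter with a JSON pattern along the lines of `{ $.metric_name = "OrdersProcessed" }` could then extract `$.metric_value`.

```python
import json


def log_metric(name, value, unit="Count"):
    """Write one metric data point as a single JSON log line."""
    line = json.dumps({"metric_name": name, "metric_value": value, "unit": unit})
    # Inside Lambda, anything printed to stdout lands in CloudWatch Logs.
    print(line)
    return line


def handler(event, context):
    orders = event.get("orders", [])
    log_metric("OrdersProcessed", len(orders))
    return {"processed": len(orders)}
```

Compared with calling the metrics API directly, this keeps the function free of an extra network call on the request path.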

Deployment

  • Steps:
    • Upload new function code
    • Publish the new version
    • Update the alias
  • Parallel version invocations
  • Deployment schedule (do not choose peak time)
  • Rollback
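The three steps and the rollback can be sketched with the Lambda API calls of the same names. The client is passed in and is assumed to behave like `boto3.client("lambda")`. `FakeLambdaClient`, the function name, and the `live` alias are hypothetical and only make the example runnable offline.

```python
def deploy(lambda_client, function_name, zip_bytes, alias="live"):
    """Upload code, publish a version, then point the alias at it."""
    lambda_client.update_function_code(FunctionName=function_name, ZipFile=zip_bytes)
    # Against real AWS you would likely wait here until the code update has
    # finished (boto3 has a "function_updated" waiter) before publishing.
    version = lambda_client.publish_version(FunctionName=function_name)["Version"]
    previous = lambda_client.get_alias(FunctionName=function_name, Name=alias)["FunctionVersion"]
    lambda_client.update_alias(FunctionName=function_name, Name=alias, FunctionVersion=version)
    return {"released": version, "rollback_to": previous}


def rollback(lambda_client, function_name, version, alias="live"):
    """Rolling back is just moving the alias to the previous version."""
    lambda_client.update_alias(FunctionName=function_name, Name=alias, FunctionVersion=version)


class FakeLambdaClient:
    """Hypothetical in-memory stand-in for the four calls used above."""

    def __init__(self):
        self.latest_version = 1
        self.aliases = {"live": "1"}

    def update_function_code(self, FunctionName, ZipFile):
        return {"FunctionName": FunctionName}

    def publish_version(self, FunctionName):
        self.latest_version += 1
        return {"Version": str(self.latest_version)}

    def get_alias(self, FunctionName, Name):
        return {"Name": Name, "FunctionVersion": self.aliases[Name]}

    def update_alias(self, FunctionName, Name, FunctionVersion):
        self.aliases[Name] = FunctionVersion
        return {"Name": Name, "FunctionVersion": FunctionVersion}
```

Because callers reference the alias rather than a version number, both release and rollback are a single pointer update.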

Cost Optimization

  • Right sizing (too little memory can cost more, because execution takes longer)
  • Distributed & async architecture (Each decoupled architecture component takes less compute time to conduct the work)
  • Many Lambda event sources fit well with distributed systems
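The right-sizing point is easier to see with arithmetic. Lambda compute is billed in GB-seconds (memory multiplied by duration), so a larger memory setting that shortens execution enough can be cheaper. The price below is an illustrative assumption, and the sketch ignores the per-request fee and billing granularity.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # illustrative assumption, check current pricing


def invocation_cost(memory_mb, duration_ms, price_per_gb_second=PRICE_PER_GB_SECOND):
    """Compute cost of one invocation: GB-seconds times the unit price."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * price_per_gb_second


# Hypothetical measurements of the same function at two memory settings.
small = invocation_cost(memory_mb=128, duration_ms=1000)  # 0.125 GB-seconds
large = invocation_cost(memory_mb=512, duration_ms=200)   # 0.1 GB-seconds
```

In this made-up measurement the 512 MB configuration uses fewer GB-seconds than the 128 MB one, so it is both faster and cheaper.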

Development Best Practices


1. Infrastructure as code

  • CF requires large amount of JSON / yaml, so we use SAM (open specification abstraction layer on top of CF)
  • Use SAM & CF together

2. Local testing

  • SAM Local to test serverless functions & apps locally (use Docker)

3. Coding

  • Put business logic outside the Handler

    Lambda starts execution at the handler function. The handler then passes its parameters (event & context) to another function, which parses them into new variables / objects that are contextualized to your app.

  • Warm containers: Caching / Keep-alive / Reuse

    Scope variables so that they & their contents can be reused on subsequent invocations.

  • Control dependencies

  • Fail fast

    • Short timeout for external dependencies & Lambda overall timeout
  • Handling exceptions (for async)

    • Some exceptions go to the DLQ for reprocessing
    • Some are just logged
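A sketch that combines two of the practices above: the handler stays thin and delegates to plain functions, and exceptions are split into "log and drop" versus "let it propagate". The order fields and `unit_price` are invented for the example. For async invocations, an exception that escapes the handler is retried by Lambda and can then land in the DLQ if one is configured.

```python
import logging

logger = logging.getLogger()


def parse_order(event):
    """Turn the raw event into an app-specific object (no Lambda types here)."""
    return {"order_id": str(event["order_id"]), "quantity": int(event["quantity"])}


def total_price(order, unit_price=5):
    """Pure business logic: easy to unit test without Lambda."""
    if order["quantity"] <= 0:
        raise ValueError("quantity must be positive")
    return order["quantity"] * unit_price


def handler(event, context):
    try:
        order = parse_order(event)
        return {"order_id": order["order_id"], "total": total_price(order)}
    except (KeyError, ValueError) as exc:
        # Malformed input will never succeed on retry: log it and move on.
        logger.warning("rejected event: %r", exc)
        return {"status": "rejected"}
    # Any other exception propagates, so Lambda can retry / dead-letter it.
```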

4. Code Management

  • Code repository organization (1 : 1)
    • Make sure each Lambda function is independently versioned & committed
  • Release branches
    • Correlate Lambda function deployment with incremental commits on a release branch

5. Testing

1) Unit Test

  • Scope all unit tests down to a single code path, within a single logical function
  • Focus mostly on the business logic outside the handler function
  • Unit test the ability to parse mock objects for the event sources
  • Local test automation with SAM Local
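A sketch of a unit test that parses a mock event, here an SNS notification carrying a CW alarm. The mock keeps only the fields the code reads, and the alarm name is made up. `run_tests` is a small hypothetical helper for running the suite from code.

```python
import json
import unittest


def extract_alarm_name(event):
    """Business logic under test: read the alarm name from an SNS event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["AlarmName"]


# Mock event: trimmed to the fields extract_alarm_name() actually uses.
MOCK_SNS_EVENT = {
    "Records": [
        {"Sns": {"Message": json.dumps({"AlarmName": "HighErrorRate"})}}
    ]
}


class ExtractAlarmNameTest(unittest.TestCase):
    def test_reads_alarm_name_from_mock_event(self):
        self.assertEqual(extract_alarm_name(MOCK_SNS_EVENT), "HighErrorRate")

    def test_event_without_records_raises(self):
        with self.assertRaises(KeyError):
            extract_alarm_name({})


def run_tests():
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExtractAlarmNameTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test covers a single code path and needs neither AWS credentials nor the Lambda runtime.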

2) Integration test

  • Integration test: test integration of the code to its dependencies in an env that mimics the live env
  • Create lower lifecycle version of the Lambda function

6. Continuous Delivery

  • CodeCommit: hosted private Git repos
  • CodePipeline: Declarative steps in the pipeline
  • CodeBuild: Build the code, run unit tests, and create code package
  • SAM: Integrate with CodeBuild, push code package to S3, and push new package to Lambda via CF
  • CodeStar: = Commit + Pipeline + Build. A CD toolchain, manage all aspects of the SDLC