In this post, we’re going to talk about running serverless applications on AWS. Below are some of the points from AWS whitepapers.
Optimize Economics with Serverless
Event-driven computing ─ FaaS ─ Serverless FaaS (Lambda)
Function as the unit of deployment & execution.
Serverless
- Functional & microservice approach, where business logic is triggered only when required
- Async, event-based
- Lambda: Receive events / client invocation, then instantiate & run the code
- Scale automatically, built-in fault tolerance
Serverless use cases
- Web / mobile backends
- Media & log processing (compute-heavy workloads)
- Automation (functions can be attached to alarms & monitors)
- Near real-time streaming data processes
- Big data: parallelize with a serverless approach
- Low latency: moving serverless event handling to the internet edge (Lambda@Edge)
SAM
- Open specification / blueprint. Application modeling framework
- Orchestration & state management (serverless is stateless)
- Manage all the steps in the SDLC
- Deploy SAM apps with CF (SAM is specification, CF is implementation)
- Edge locations: Key to low-latency serverless computing
- Role-based & access-based permissions, and API-based authentication & access control
Lambda
- Lambda@Edge is available in all edge locations
- Auto retries for async & ordered events
- DLQ: capture events that weren’t processed successfully
- End user: Cognito
Serverless components
- Developer tools
- CF for deployment
- X-Ray for diagnostics (cross-service request tracing & performance analysis)
- CW & CW Logs for monitoring
- Orchestration
- Step Functions (create long-running workflows, state machines)
- CW Events (respond to events)
- Streaming data:
- Kinesis Streams: near real-time analytics engine
- Kinesis Firehose: With Lambda
- Compute: Lambda & Lambda@Edge (cloud logic layer)
- API Proxy: API Gateway (HTTP endpoints)
- Database: DynamoDB
- Storage: S3 (changes to an object can automatically trigger a Lambda function)
Construct serverless application
- S3: static content
- Lambda & API Gateway: Dynamic API requests
- DynamoDB: store session & user state
- Cognito: end-user registration, authentication (user pool), and access control to resources (identity pool)
- SAM: describe elements of the app
- CodeStar: CI / CD pipeline
Data processing (Lambda itself is stateless)
- Lambda & Kinesis
- Lambda & S3: trigger computation in response to object creation / event updates
- Step Functions: stateful long-running workflows
Serverless & Lambda
Lambda
- FaaS: Build reactive, event-driven system
- Multiple, simultaneous events: Run more copies of the function in parallel
- Lambda executes in a container (sandbox) that isolates it from other functions
- Lambda also provides a RESTful API, which can directly invoke a Lambda function
Lambda function
- Code
- Configuration
- Event sources (detect events & invoke function, e.g. API Gateway, SNS)
Run the code package
- Download from S3 bucket
- Install in the Lambda runtime environment (based on Amazon Linux AMI)
- Invoke as needed
🧡 Handler
- A specific method in your code. Specify the handler when you create the Lambda function
- Handler can call other methods & functions within the files & classes you’ve uploaded
- Event object
- Context object: allows function code to interact with the Lambda execution environment
- AWS RequestId
- Remaining time
- Logging: Stream log statements to CW logs
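A minimal sketch of a Python handler that uses the event and context objects described above. `aws_request_id` and `get_remaining_time_in_millis()` are part of the Python runtime's context object; `FakeContext` and the `name` event field are hypothetical, added only so the handler can be called locally.

```python
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handler(event, context):
    """Entry point that Lambda calls with the event and context objects."""
    # The AWS request ID identifies this invocation in CloudWatch Logs.
    logger.info("request_id=%s", context.aws_request_id)
    # Milliseconds left before Lambda enforces the configured timeout.
    remaining_ms = context.get_remaining_time_in_millis()
    return {
        "request_id": context.aws_request_id,
        "remaining_ms": remaining_ms,
        "name": event.get("name", "world"),  # "name" is a hypothetical field
    }


class FakeContext:
    """Hypothetical stand-in for the real context object, for local runs."""

    aws_request_id = "local-1234"

    def get_remaining_time_in_millis(self):
        return 3000
```

Calling `handler({"name": "SAM"}, FakeContext())` locally exercises the same code path Lambda would run, minus the real execution environment.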
Lambda: Statelessness & Reuse
- Warm container: already active, invoked before (faster code execution)
- Cold start: create & invoke for the first time (slower)
Event Sources
Invocation patterns
- Push Model (passive user)
- Pull Model (active user)
- Polls data source, batching new records together in a single function invocation
Lambda functions can be executed sync or async. Choose with the InvocationType parameter, which has 3 possible values:
- RequestResponse: Sync
- Event: Async
- DryRun: Test, not actually executing
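As a rough illustration of the three InvocationType values, the sketch below wraps the boto3 `invoke` call. The `client` argument is assumed to be `boto3.client("lambda")` (boto3 is not imported here so the block stays self-contained), and the function name you pass in is whatever you deployed.

```python
import json


def invoke(client, function_name, payload, invocation_type="RequestResponse"):
    """Invoke a function; client is expected to be boto3.client("lambda")."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType=invocation_type,  # RequestResponse | Event | DryRun
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # Only a sync call carries the function's result in the response payload.
    if invocation_type == "RequestResponse":
        return response["StatusCode"], json.loads(response["Payload"].read())
    return response["StatusCode"], None
```

With a real client, a sync call returns status 200 plus the function result, an `Event` call returns 202 once the event is queued, and `DryRun` returns 204 after checking permissions and parameters only.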
Push model event source (Trigger Lambda)
- S3 (Async)
- API Gateway (Async / sync)
- Sync: API as Lambda proxy
- Async: API as AWS service proxy (return immediately with empty response)
- SNS (Async): Automated response to CW alarms
- CF (Sync)
- CW Events (Async): AWS services publish resource state changes to CW Events (for event-driven ops automation)
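For the sync API Gateway case (API as Lambda proxy), a handler could look roughly like this. The `statusCode` / `headers` / `body` response shape is what the proxy integration expects; the `name` query parameter and the greeting are made up for illustration.

```python
import json


def handler(event, context):
    """Handle an API Gateway Lambda proxy request and return a proxy response."""
    # queryStringParameters is null (None) when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")  # "name" is a hypothetical parameter
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello {name}"}),  # body must be a string
    }
```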
Pull model event source (Lambda polls them; all are sync)
- DynamoDB (Sync)
- Workflows triggered as changes occur in a DynamoDB table
- Replicate DynamoDB table to another Region
- Kinesis Streams (Sync): Real-time data processing
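In the pull model, Lambda hands the function a batch of records per invocation. A sketch of a Kinesis batch handler, assuming each record's data is a JSON document with a hypothetical `amount` field:

```python
import base64
import json


def handler(event, context):
    """Process one batch of Kinesis records delivered by the Lambda poller."""
    total = 0
    for record in event["Records"]:
        # Kinesis record payloads arrive base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        total += payload.get("amount", 0)  # "amount" is a hypothetical field
    return {"records": len(event["Records"]), "total": total}
```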
Lambda Config
Aliases (Versioning)
To version the Lambda functions: Aliases (Pointer to a specific Lambda version)
- live / prod / active
- blue / green
- debug
Environment Variables (Config)
- Use env var with Lambda: Separate code & config
- Lambda enables user to dynamically pass data to function code
- Key-value pairs, encrypted at rest
- Encrypt with KMS before creating the function, store ciphertext as variable value
- Use cases
- Log setting (INFO, DEBUG, etc)
- Dependency & database connection credentials
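The log-setting use case might look like the sketch below. `LOG_LEVEL` is an assumed variable name (nothing in Lambda reserves it), and unknown values fall back to INFO.

```python
import logging
import os


def resolve_log_level(env=os.environ):
    """Map the LOG_LEVEL variable (hypothetical name) to a logging level."""
    name = env.get("LOG_LEVEL", "INFO").upper()
    level = getattr(logging, name, None)
    # Fall back to INFO when the variable holds something unexpected.
    return level if isinstance(level, int) else logging.INFO


logger = logging.getLogger("app")
logger.setLevel(resolve_log_level())


def handler(event, context):
    logger.debug("full event: %s", event)  # only emitted when LOG_LEVEL=DEBUG
    logger.info("processing event with %d keys", len(event))
    return len(event)
```

Changing the variable in the function configuration switches verbosity without touching the code package.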
IAM role
- Policies can be associated with IAM roles
- Assign IAM execution role to Lambda functions
- Source code is decoupled from the security aspect, does not need any credential check / rotation
Function permissions
- Pull model event sources ONLY
- Make sure actions are permitted
- AWS provides a set of IAM roles associated with each of the pull-based event sources
Outbound Network Connectivity
Default: VPC managed by Lambda, not private connection
VPC: Communicate via ENI (Elastic Network Interface), connect to private resources
ENIs can be assigned security groups
Route traffic based on the route tables of ENIs’ subnets
If you choose VPC, you need to manage:
- Subnets, ensure multi-AZ
- Allocate IP addresses to each subnet
- VPC network design
- Cold start time increases if an invocation requires a new ENI to be created just in time
DLQ (Dead Letter Queue)
- SNS topic / SQS queue
- Destination for all failed invocation events
- Use a DLQ if you need every Lambda invocation to complete eventually, even if execution is delayed
Timeout:
- Time limit for a single invocation of a Lambda function (max 300 s in the whitepaper; AWS has since raised the limit to 900 s)
- Sometimes need to fail fast
- Should not rely on background / async processes for critical activities
Architecture Best Practices
Security
General
One IAM role per function (1 : 1 relations, decouple the IAM role)
Use temporary AWS credentials (SDK, manage retrieval & rotation)
For cross-account cases, grant the execution role access to the AssumeRole API within STS (Security Token Service)
Store user session data in DynamoDB / ElastiCache, to reduce latency
Secrets should always only exist in memory, and never logged / written to disk
VPC security: Lambda-specific subnets, NACL, route tables
Persisting secrets
- Lambda env var with encryption helpers
- Pro: Directly to runtime (no latency)
- Con: Coupled to function version
- EC2 Systems Manager Parameter Store
- Pro: Decoupled from function version
- Con: Add latency (for retrieval)
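One way to soften the Parameter Store latency is to fetch each secret once per container and keep it in memory only. A sketch, assuming `ssm_client` is `boto3.client("ssm")` and the parameter name is your own:

```python
# Module-level cache: lives in memory for as long as the container stays warm.
_cache = {}


def get_secret(ssm_client, name):
    """Fetch a SecureString parameter once per container and reuse it."""
    if name not in _cache:
        # WithDecryption=True asks Parameter Store to decrypt via KMS.
        response = ssm_client.get_parameter(Name=name, WithDecryption=True)
        _cache[name] = response["Parameter"]["Value"]
    return _cache[name]
```

Only the first invocation in a container pays the retrieval cost. The trade-off is that a rotated secret is not picked up until the container is recycled or the cache is cleared.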
API auth
- API Gateway as Lambda’s event source: You are responsible for authenticating & authorizing your API clients
- SigV4 authentication
- Lambda Authorizer
Deployment access control
- UpdateFunctionCode API call: code deployment
- UpdateAlias API call: code release
- Eliminate direct user access to the above APIs for any function (use automation)
Reliability
- HA: Subnets have adequate IPs to support many concurrent functions
- Fault tolerance: Multi-region coordinates failover across all tiers of app stack
- Recovery: For async, use DLQ (store during outage, process after recovery)
Performance Efficiency
If the use case can be achieved async, function latency is much less of a concern
Use the Event InvocationType, or a pull-based model
Allow application logic to proceed, while Lambda processes the event separately
Optimize Lambda execution time
- Resources allocation in the function configuration
- Language runtime
- The code you write (warm container reuse, minimize initial cost of cold start)
Choose optimal memory size (RAM impacts CPU time & network bandwidth)
Monitor memory usage in CW Logs
Use X-Ray to trace full lifecycle of application request, through each of its component parts.
Operational Excellence
General
- Use Lambda env var to create log level var
- Enable investigation with logging, use X-Ray to profile applications
- Create Lambda aliases that represent operational activities such as integration testing, performance testing, debugging, etc
Metrics
- Create alarm thresholds (high & low) for each Lambda function, on all provided metrics through CW
- Create a custom metric by calling the required API directly from the Lambda function
- Capture metric with Lambda function code, and log it using provided logging mechanisms in Lambda
- Then create CW Log metric filter on the function stream, to extract the metric, and make it available in CW
- Create another Lambda as a subscription filter on the CW Log stream to push filtered log statements to another metrics solution
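The "log it, then filter it" approach could be sketched like this. The metric name and JSON field names are made up. A CW Logs metric filter with a JSON pattern along the lines of `{ $.metric = "orders_processed" }` and metric value `$.value` would then be one way to extract it.

```python
import json


def emit_metric(name, value, unit="Count"):
    """Write one metric as a single JSON log line; stdout goes to CW Logs."""
    line = json.dumps({"metric": name, "value": value, "unit": unit})
    print(line)
    return line


def handler(event, context):
    orders = event.get("orders", [])  # "orders" is a hypothetical field
    emit_metric("orders_processed", len(orders))
    return len(orders)
```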
Deployment
- Steps:
- Upload new function code
- Publish the new version
- Update the alias
- Parallel version invocations
- Deployment schedule (do not choose peak time)
- Rollback
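The deployment steps and rollback above map onto a handful of Lambda API calls. A sketch, assuming `client` is `boto3.client("lambda")`, the alias already exists, and `live` is just an example alias name:

```python
def deploy(client, function_name, zip_bytes, alias="live"):
    """Upload code, publish a version, and point the alias at it."""
    # 1. Upload new function code (updates $LATEST only).
    client.update_function_code(FunctionName=function_name, ZipFile=zip_bytes)
    # Wait until the code update has finished before publishing.
    client.get_waiter("function_updated").wait(FunctionName=function_name)
    # 2. Publish an immutable version from $LATEST.
    version = client.publish_version(FunctionName=function_name)["Version"]
    # Remember where the alias pointed, so a rollback is possible.
    current = client.get_alias(FunctionName=function_name, Name=alias)
    previous = current["FunctionVersion"]
    # 3. Release: move the alias to the new version.
    client.update_alias(
        FunctionName=function_name, Name=alias, FunctionVersion=version
    )
    return previous, version


def rollback(client, function_name, previous_version, alias="live"):
    """Point the alias back at the version it referenced before the release."""
    client.update_alias(
        FunctionName=function_name, Name=alias, FunctionVersion=previous_version
    )
```

Because callers invoke the alias rather than a version number, both release and rollback are a single pointer update. In practice this would run from a pipeline rather than from a user's machine.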
Cost Optimization
- Right sizing (choosing the smallest memory size might cost more, due to longer execution time)
- Distributed & async architecture (Each decoupled architecture component takes less compute time to conduct the work)
- Many Lambda event sources fit well with distributed systems
Development Best Practices
1. Infrastructure as code
- CF requires large amount of JSON / yaml, so we use SAM (open specification abstraction layer on top of CF)
- Use SAM & CF together
2. Local testing
- SAM Local to test serverless functions & apps locally (use Docker)
3. Coding
Put business logic outside the Handler
Lambda starts execution at the handler function, which then passes the parameters (event & context) to another function that parses them into new vars / objects contextualized to your app.
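A small sketch of that split: the handler only translates the event, and a plain function does the work. The order / tax example and its field names are invented for illustration.

```python
def calculate_total(items, tax_rate):
    """Pure business logic: no Lambda types, easy to unit test."""
    subtotal = sum(item["price"] * item["quantity"] for item in items)
    return round(subtotal * (1 + tax_rate), 2)


def handler(event, context):
    """Thin adapter: translate the event into app-level arguments."""
    items = event["items"]  # hypothetical event fields
    tax_rate = float(event.get("tax_rate", 0.0))
    return {"total": calculate_total(items, tax_rate)}
```

`calculate_total` can be imported and tested without constructing any event or context objects.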
Warm containers: Caching / Keep-alive / Reuse
Scoping vars in a way that they & their contents can be reused on subsequent invocations.
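Roughly, this means creating expensive objects at module scope (or lazily, once) instead of inside the handler. In the sketch below the "connection" is a placeholder dict standing in for a real client.

```python
import time

# Module scope runs once per container (cold start) and survives warm invocations.
_INIT_TIME = time.time()
_connection = None


def get_connection():
    """Create the (hypothetical) expensive client lazily, then reuse it."""
    global _connection
    if _connection is None:
        _connection = {"created_at": time.time()}  # placeholder for a real client
    return _connection


def handler(event, context):
    conn = get_connection()
    return {"container_age_s": time.time() - _INIT_TIME, "conn_id": id(conn)}
```

Two invocations served by the same warm container see the same connection object. A cold start builds a fresh one, so the code should never assume the cache is populated.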
Control dependencies
Fail fast
- Short timeout for external dependencies & Lambda overall timeout
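One possible way to derive a short dependency timeout from the time the invocation has left. The 500 ms reserve and 2 s cap are arbitrary example values, not recommendations from the whitepaper.

```python
def dependency_timeout_s(context, reserve_ms=500, cap_s=2.0):
    """Seconds an outbound call may take, capped, leaving reserve_ms to clean up."""
    budget_ms = context.get_remaining_time_in_millis() - reserve_ms
    if budget_ms <= 0:
        # Fail fast instead of starting a call that cannot finish in time.
        raise TimeoutError("not enough time left to call the dependency")
    return min(cap_s, budget_ms / 1000.0)
```

The result can be passed as the `timeout` argument of whatever client you use, for example `urllib.request.urlopen(url, timeout=dependency_timeout_s(context))`.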
Handling exceptions (for async)
- Some exception goes to DLQ for reprocessing
- Some just logged
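A sketch of that distinction for an async function: errors that can never succeed are logged and swallowed, while retryable ones are left uncaught so the invocation fails and, after Lambda's retries, the event lands in the configured DLQ. `RetryableError` and the event fields are hypothetical.

```python
import logging

logger = logging.getLogger("app")


class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying, e.g. a dependency outage."""


def process(event):
    if "order_id" not in event:
        raise ValueError("missing order_id")
    if event.get("dependency_down"):
        raise RetryableError("downstream unavailable")
    return event["order_id"]


def handler(event, context):
    try:
        return process(event)
    except ValueError as exc:
        # Bad input never succeeds on retry: log it and finish normally.
        logger.error("dropping malformed event: %s", exc)
        return None
    # RetryableError is deliberately not caught, so the invocation fails.
```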
4. Code Management
- Code repository organization (1 : 1)
- Make sure Lambda function is independently versioned & committed to
- Release branches
- Correlate Lambda function deployment with incremental commits on a release branch
5. Testing
1) Unit Test
- Scope all unit tests down to a single code path, within a single logical function
- Focus mostly on the business logic outside the handler function
- Unit test the ability to parse mock objects for the event sources
- Local test automation with SAM Local
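A unit test along these lines might look like the sketch below, using the standard `unittest` module. The mock event keeps only the S3 notification fields the logic reads; real events carry more.

```python
import unittest


def extract_keys(event):
    """Business logic under test: list (bucket, key) pairs from an S3 event."""
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event["Records"]
    ]


def handler(event, context):
    return {"objects": extract_keys(event)}


def mock_s3_event(bucket, key):
    """Minimal mock of an S3 notification event."""
    record = {
        "eventSource": "aws:s3",
        "s3": {"bucket": {"name": bucket}, "object": {"key": key}},
    }
    return {"Records": [record]}


class ExtractKeysTest(unittest.TestCase):
    def test_single_record(self):
        event = mock_s3_event("b", "k.txt")
        self.assertEqual(extract_keys(event), [("b", "k.txt")])

    def test_empty_batch(self):
        self.assertEqual(extract_keys({"Records": []}), [])
```

Saved as a module, the tests run with `python -m unittest`. Each test covers one code path of one logical function, without needing Lambda at all.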
2) Integration test
- Integration test: test integration of the code to its dependencies in an env that mimics the live env
- Create lower lifecycle version of the Lambda function
6. Continuous Delivery
- CodeCommit: hosted private Git repos
- CodePipeline: Declarative steps in the pipeline
- CodeBuild: Build the code, run unit tests, and create code package
- SAM: Integrate with CodeBuild, push code package to S3, and push new package to Lambda via CF
- CodeStar: = Commit + Pipeline + Build. A CD toolchain, manage all aspects of the SDLC