From YAML to TypeScript: Developer's View on Cloud Automation

The rise of managed cloud services, cloud-native, and serverless applications brings both new possibilities and challenges. More and more practices from the software development process, such as version control, code review, continuous integration, and automated testing, are being applied to cloud infrastructure provisioning.

Most existing tools suggest defining infrastructure in text-based markup formats. In this article, I'm making a case for using real programming languages like TypeScript instead. Such a change makes even more software development practices applicable to the infrastructure realm.

Sample Application

It's easier to make a case given a specific example. For this article, I define a URL Shortener application, a basic clone of tinyurl.com or bit.ly. There is an administrative page where one can define short aliases for long URLs:

URL Shortener sample app

Now, whenever somebody goes to the base URL of the application + an existing alias, they get redirected to the full URL.

This app is simple to describe but involves enough moving parts to be representative of some real-world issues. As a bonus, there are many existing implementations on the web to compare with.

Serverless URL Shortener

I'm a big proponent of serverless architecture: the style of building cloud applications as a combination of serverless functions and managed cloud services. They are fast to develop, effortless to run, and cost pennies unless the application gets lots of users. However, even serverless applications require infrastructure provisioning: databases, queues, and other sources of events and destinations of data.

My examples are going to use Amazon AWS, but this could be Microsoft Azure or Google Cloud Platform too.

So, the gist is to store URLs with short names as key-value pairs in Amazon DynamoDB and use AWS Lambdas to run the application code. Here is the initial sketch:

URL Shortener with AWS Lambda and DynamoDB

The Lambda on the top receives an event when somebody decides to add a new URL. It extracts the name and the URL from the request and saves them as an item in the DynamoDB table.

The Lambda at the bottom is called when a user navigates to a short URL. The code reads the full URL based on the requested path and returns a 301 response with the corresponding location.

Here is the implementation of the Open URL Lambda in JavaScript:

const aws = require('aws-sdk');
const table = new aws.DynamoDB.DocumentClient();

exports.handler = async (event) => {
  const name = event.path.substring(1);

  const params = { TableName: "urls", Key: { "name": name } };
  const value = await table.get(params).promise();

  const url = value && value.Item && value.Item.url;
  return url 
    ? { statusCode: 301, body: "", headers: { "Location": url } }
    : { statusCode: 404, body: name + " not found" };
};

That's 11 lines of code. I'll skip the implementation of the Add URL function because it's very similar; a rough sketch follows below. Counting a third function to list the existing URLs for the UI, we might have 30-40 lines of JavaScript in total.
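
For illustration only, here is a minimal sketch of what the Add URL handler might look like; the request body shape and the error handling are my assumptions, not the original code:

// A sketch of the Add URL Lambda: it assumes an API Gateway proxy event with
// a JSON body like { "name": "cats", "url": "https://example.com/cats" }.
import * as aws from "aws-sdk";
const table = new aws.DynamoDB.DocumentClient();

export const handler = async (event: { body: string }) => {
  const { name, url } = JSON.parse(event.body || "{}");
  if (!name || !url) {
    return { statusCode: 400, body: "Both name and url are required" };
  }

  // Save the alias-to-URL pair as an item in the same "urls" table.
  await table.put({ TableName: "urls", Item: { name, url } }).promise();
  return { statusCode: 200, body: JSON.stringify({ name, url }) };
};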

So, how do we deploy the application?

Well, before we do that, we should realize that the above picture was an over-simplification.

  • AWS Lambda can't handle HTTP requests directly, so we need to add AWS API Gateway in front of it.
  • We also need to serve some static files for the UI, which we'll put into an AWS S3 bucket and front with the same API Gateway.

Here is the updated diagram:

API Gateway, Lambda, DynamoDB, and S3

This is a viable design, but the details are even more complicated:

  • API Gateway is a complex beast which needs Stages, Deployments, and REST Endpoints to be appropriately configured.
  • Permissions and policies need to be defined so that API Gateway can call the Lambdas and the Lambdas can access DynamoDB.
  • Static Files should go to S3 Bucket Objects.

So, the actual setup involves a couple of dozen objects to be configured in AWS:

Cloud resources to be provisioned

How do we approach this task?

Options to Provision the Infrastructure

There are many options to provision a cloud application, each one has its trade-offs. Let's quickly go through the list of possibilities to understand the landscape.

AWS Web Console

AWS, like any other cloud, has a web user interface to configure its resources:

AWS Web Console

That's a decent place to start—good for experimenting, figuring out the available options, and following the tutorials; in short, for exploration.

However, it isn't particularly well suited for long-lived, ever-changing applications developed in teams. A manually clicked deployment is hard to reproduce exactly, which quickly becomes a maintainability issue.

AWS Command Line Interface

The AWS Command Line Interface (CLI) is a unified tool to manage all AWS services from a command prompt. You write calls like these:

aws apigateway create-rest-api --name 'My First API' --description 'This is my first API'

aws apigateway create-stage --rest-api-id 1234123412 --stage-name 'dev' --description 'Development stage' --deployment-id a1b2c3

The initial experience might not be as smooth as clicking buttons in the browser, but the huge benefit is that you can reuse commands that you once wrote. You can build scripts by combining many commands into cohesive scenarios. So, your colleague can benefit from the same script that you created. You can provision multiple environments by parameterizing the scripts.

Frankly speaking, I've never done that for several reasons:

  • CLI scripts feel too imperative to me. I have to describe "how" to do things, not "what" I want to get in the end.
  • There seems to be no good story for updating existing resources. Do I write small delta scripts for each change? Do I have to keep them forever and run the full suite every time I need a new environment?
  • If a failure occurs mid-way through the script, I need to manually repair everything to a consistent state. This gets messy real quick, and I have no desire to exercise this process, especially in production.

To overcome such limitations, the notion of the Desired State Configuration (DSC) was invented. Under this paradigm, one describes the desired layout of the infrastructure, and then the tooling takes care of either provisioning it from scratch or applying the required changes to an existing environment.

Which tools provide a DSC model for AWS? There are legions of them.

AWS CloudFormation

AWS CloudFormation is the first-party tool for Desired State Configuration management from Amazon. CloudFormation templates use YAML to describe all the infrastructure resources of AWS.

Here is a snippet from a private URL shortener example kindly provided on the AWS blog:

Resources:
  S3BucketForURLs:
    Type: "AWS::S3::Bucket"
    DeletionPolicy: Delete
    Properties:
      BucketName: !If [ "CreateNewBucket", !Ref "AWS::NoValue", !Ref S3BucketName ]
      WebsiteConfiguration:
        IndexDocument: "index.html"
      LifecycleConfiguration:
        Rules:
          -
            Id: DisposeShortUrls
            ExpirationInDays: !Ref URLExpiration
            Prefix: "u"
            Status: Enabled

This is just a very short fragment: the complete example consists of 317 lines of YAML. That's an order of magnitude more than the actual JavaScript code in our application!

CloudFormation is a powerful tool, but it takes quite a bit of learning to master. Moreover, it's specific to AWS: you won't be able to transfer the skill to other cloud providers.

Wouldn't it be great if there was a universal DSC format? Meet Terraform.

Terraform

HashiCorp Terraform is an open source tool to define infrastructure in declarative configuration files. It has a pluggable architecture, so the tool supports all major clouds and even hybrid scenarios.

The custom text-based Terraform .tf format is used to define the configurations. The templating language is quite powerful, and once you learn it, you can use it for different cloud providers.

Here is a snippet from AWS Lambda Short URL Generator example:

resource "aws_api_gateway_rest_api" "short_urls_api_gateway" {
  name        = "Short URLs API"
  description = "API for managing short URLs."
}
resource "aws_api_gateway_usage_plan" "short_urls_admin_api_key_usage_plan" {
  name         = "Short URLs admin API key usage plan"
  description  = "Usage plan for the admin API key for Short URLS."
  api_stages {
    api_id = "${aws_api_gateway_rest_api.short_urls_api_gateway.id}"
    stage  = "${aws_api_gateway_deployment.short_url_api_deployment.stage_name}"
  }
}

This time, the complete example is around 450 lines of textual templates. Are there ways to reduce the size of the infrastructure definition?

Yes, by raising the level of abstraction. It's possible with Terraform's modules, or by using other, more specialized tools.

Serverless Framework and SAM

The Serverless Framework is an infrastructure management tool focused on serverless applications. It works across cloud providers (AWS support is the strongest though) and only exposes features related to building applications with cloud functions.

The benefit is that it's much more concise. Once again, the tool uses YAML to define the templates; here is a snippet from the Serverless URL Shortener example:

functions:
  store:
    handler: api/store.handle
    events:
      - http:
          path: /
          method: post
          cors: true

The domain-specific language yields a shorter definition: this example has 45 lines of YAML + 123 lines of JavaScript functions.

However, the conciseness has a flip side: as soon as you veer outside of the fairly "thin" golden path—the cloud functions and an incomplete list of event sources—you have to fall back to more generic tools like CloudFormation. As soon as your landscape includes lower-level infrastructure work or some container-based components, you're stuck using multiple config languages and tools again.

Amazon's AWS Serverless Application Model (SAM) looks very similar to the Serverless Framework but is tailored specifically to AWS.

Is that the end game? I don't think so.

Desired Properties of Infrastructure Definition Tool

So what have we learned while going through the existing landscape? The perfect infrastructure tool should:

  • Provide reproducible results of deployments
  • Be scriptable, i.e., require no human intervention after the definition is complete
  • Define desired state rather than exact steps to achieve it
  • Support multiple cloud providers and hybrid scenarios
  • Be universal in the sense of using the same tool to define any type of resource
  • Be succinct and concise to stay readable and manageable
  • Use YAML-based format

Nah, I crossed out the last item. YAML seems to be the most popular language among this class of tools (and I haven't even touched Kubernetes yet!), but I'm not convinced it works well for me. YAML has many flaws, and I just don't want to use it.

Have you noticed that I haven't mentioned Infrastructure as code a single time yet? Well, here we go (from Wikipedia):

Infrastructure as code (IaC) is the process of managing and provisioning computer data centers through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.

Shouldn't it be called "Infrastructure as definition files", or "Infrastructure as YAML"?

As a software developer, what I really want is "Infrastructure as actual code, you know, the program thing". I want to use the same language that I already know. I want to stay in the same editor. I want to get IntelliSense auto-completion when I type. I want to see compilation errors when what I typed is not syntactically correct. I want to reuse the developer skills that I already have. I want to come up with abstractions to generalize my code and create reusable components. I want to leverage the open-source community, which would create much better components than I ever could. I want to combine the code and the infrastructure in one code project.

If you are with me on that, keep reading. You get all of that with Pulumi.

Pulumi

Pulumi is a tool to build cloud-based software using real programming languages. It supports all major cloud providers, plus Kubernetes.

Pulumi programming model supports Go and Python too, but I'm going to use TypeScript for the rest of the article.

While prototyping a URL shortener, I explain the fundamental way of working and illustrate the benefits and some trade-offs. If you want to follow along, install Pulumi.

How Pulumi Works

Let's start defining our URL shortener application in TypeScript. I installed @pulumi/pulumi and @pulumi/aws NPM modules so that I can start the program. The first resource to create is a DynamoDB table:

import * as aws from "@pulumi/aws";

// A DynamoDB table with a single primary key
let counterTable = new aws.dynamodb.Table("urls", {
    name: "urls",
    attributes: [
        { name: "name", type: "S" },
    ],
    hashKey: "name",
    readCapacity: 1,
    writeCapacity: 1
});

I use pulumi CLI to run this program to provision the actual resource in AWS:

> pulumi up

Previewing update (urlshortener):

     Type                   Name             Plan
 +   pulumi:pulumi:Stack    urlshortener     create
 +    aws:dynamodb:Table    urls             create

Resources:
    + 2 to create

Do you want to perform this update? yes
Updating (urlshortener):

     Type                   Name             Status
 +   pulumi:pulumi:Stack    urlshortener     created
 +    aws:dynamodb:Table    urls             created

Resources:
    + 2 created

The CLI first shows the preview of the changes to be made, and when I confirm, it creates the resource. It also creates a stack—a container for all the resources of the application.

This code might look like an imperative command to create a DynamoDB table, but it actually isn't. If I go ahead and change readCapacity to 2 and then re-run pulumi up, it produces a different outcome:

> pulumi up

Previewing update (urlshortener):

     Type                   Name             Plan
     pulumi:pulumi:Stack    urlshortener        
 ~   aws:dynamodb:Table     urls             update  [diff: ~readCapacity]

Resources:
    ~ 1 to update
    1 unchanged

It detects the exact change that I made and suggests an update. The following picture illustrates how Pulumi works:

How Pulumi works

index.ts in the red square is my program. Pulumi's language host understands TypeScript and translates the code into commands for the internal engine. As a result, the engine builds a tree of resources-to-be-provisioned: the desired state of the infrastructure.

The end state of the last deployment is persisted in storage (either the pulumi.com backend or a file on disk). The engine then compares the current state of the system with the desired state of the program and calculates the delta in terms of create-update-delete commands for the cloud provider.

Help Of Types

Now I can proceed to the code that defines a Lambda function:

import * as pulumi from "@pulumi/pulumi"; // needed for the pulumi.asset classes used below

// Create a Role giving our Lambda access.
let policy: aws.iam.PolicyDocument = { /* Redacted for brevity */ };
let role = new aws.iam.Role("lambda-role", {
    assumeRolePolicy: JSON.stringify(policy),
});
let fullAccess = new aws.iam.RolePolicyAttachment("lambda-access", {
    role: role,
    policyArn: aws.iam.AWSLambdaFullAccess,
});

// Create a Lambda function, using code from the `./app` folder.
let lambda = new aws.lambda.Function("lambda-get", {
    runtime: aws.lambda.NodeJS8d10Runtime,
    code: new pulumi.asset.AssetArchive({
        ".": new pulumi.asset.FileArchive("./app"),
    }),
    timeout: 300,
    handler: "read.handler",
    role: role.arn,
    environment: { 
        variables: {
            "COUNTER_TABLE": counterTable.name
        }
    },
}, { dependsOn: [fullAccess] });

You can see that the complexity kicked in and the code size is growing. However, now I start to gain real benefits from using a typed programming language:

  • I'm using objects in the definitions of other objects' parameters. If I misspell a name, I don't get a runtime failure but an immediate error message from the editor.
  • If I don't know which options I need to provide, I can go to the type definition and look it up (or use IntelliSense).
  • If I forget to specify a mandatory option, I get a clear error.
  • If the type of the input parameter doesn't match the type of the object I'm passing, I get an error again.
  • I can use language features like JSON.stringify right inside my program. In fact, I can reference and use any NPM module.

You can see the code for the API Gateway here. It looks too verbose, doesn't it? Moreover, I'm only halfway through, with just one Lambda function defined.

Reusable Components

We can do better than that. Here is the improved definition of the same Lambda function:

import { Lambda } from "./lambda";

const func = new Lambda("lambda-get", {
    path: "./app",
    file: "read",
    environment: { 
       "COUNTER_TABLE": counterTable.name
    },
});

Now, isn't that beautiful? Only the essential options remain, while all the machinery is gone. Well, it's not completely gone; it's been hidden behind an abstraction.

I defined a custom component called Lambda:

import * as pulumi from "@pulumi/pulumi";
import * as aws from "@pulumi/aws";

export interface LambdaOptions {
    readonly path: string;
    readonly file: string;

    readonly environment?:  pulumi.Input<{
        [key: string]: pulumi.Input<string>;
    }>;    
}

export class Lambda extends pulumi.ComponentResource {
    public readonly lambda: aws.lambda.Function;

    constructor(name: string,
        options: LambdaOptions,
        opts?: pulumi.ResourceOptions) {

        super("my:Lambda", name, opts);

        const role = //... Role as defined in the last snippet
        const fullAccess = //... RolePolicyAttachment as defined in the last snippet

        this.lambda = new aws.lambda.Function(`${name}-func`, {
            runtime: aws.lambda.NodeJS8d10Runtime,
            code: new pulumi.asset.AssetArchive({
                ".": new pulumi.asset.FileArchive(options.path),
            }),
            timeout: 300,
            handler: `${options.file}.handler`,
            role: role.arn,
            environment: {
                variables: options.environment
            }
        }, { dependsOn: [fullAccess], parent: this });
    }
}

The interface LambdaOptions defines options that are important for my abstraction. The class Lambda derives from pulumi.ComponentResource and creates all the child resources in its constructor.

A nice effect is that one can see the structure in pulumi preview:

Previewing update (fosdem-component-urlshortener):

     Type                                Name                  Plan
 +   pulumi:pulumi:Stack                 urlshortener          create
 +     my:Lambda                         lambda-get            create
 +       aws:iam:Role                    lambda-get-role       create
 +       aws:iam:RolePolicyAttachment    lambda-get-access     create
 +       aws:lambda:Function             lambda-get-func       create
 +     aws:dynamodb:Table                urls                  create

The Endpoint component simplifies the definition of the API Gateway (see the source):

const api = new Endpoint("urlapi", {
    path: "/{proxy+}",
    lambda: func.lambda
});

The component hides the complexity from its clients, provided the abstraction was chosen well. The component class can be reused in multiple places, in several projects, across teams, and so on.

Standard Component Library

In fact, the Pulumi team came up with lots of high-level components that build abstractions on top of raw resources. The components from the @pulumi/cloud-aws package are particularly useful for serverless applications.

Here is the full URL shortener application with DynamoDB table, Lambdas, API Gateway, and S3-based static files:

import * as aws from "@pulumi/cloud-aws";

// Create a table `urls`, with `name` as primary key.
let urlTable = new aws.Table("urls", "name");

// Create a web server.
let endpoint = new aws.API("urlshortener");

// Serve all files in the www directory to the root.
endpoint.static("/", "www");

// GET /url/{name} redirects to the target URL based on a short-name.
endpoint.get("/url/{name}", async (req, res) => {
    let name = req.params["name"];
    let value = await urlTable.get({name});
    let url = value && value.url;

    // If we found an entry, 301 redirect to it; else, 404.
    if (url) {
        res.setHeader("Location", url);
        res.status(301);
        res.end("");
    }
    else {
        res.status(404);
        res.end("");
    }
});

// POST /url registers a new URL with a given short-name.
endpoint.post("/url", async (req, res) => {
    let url = req.query["url"];
    let name = req.query["name"];
    await urlTable.insert({ name, url });
    res.json({ shortenedURLName: name });
});

export let endpointUrl = endpoint.publish().url;

The coolest thing here is that the actual implementation code of AWS Lambdas is intertwined with the definition of resources. The code looks very similar to an Express application. AWS Lambdas are defined as TypeScript lambdas. All strongly typed and compile-time checked.

It's worth noting that at the moment such high-level components only exist in TypeScript. One could create their custom components in Python or Go, but there is no standard library available. Pulumi folks are actively trying to figure out a way to bridge this gap.

Avoiding Vendor Lock-in?

If you look closely at the previous code block, you notice that only one line is AWS-specific: the import statement. The rest is just naming.

We can get rid of that one too: just change the import to import * as cloud from "@pulumi/cloud"; and replace aws. with cloud. everywhere. Then, we'd have to go to the stack configuration file and specify the cloud provider there:

config:
  cloud:provider: aws

That's enough to make the application work again!
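
For illustration, the provider-neutral version of the first few lines would look roughly like this (a sketch, assuming the @pulumi/cloud package exposes the same Table and API shapes that @pulumi/cloud-aws implements):

import * as cloud from "@pulumi/cloud";

// The same resources as above, declared against the provider-neutral package.
let urlTable = new cloud.Table("urls", "name");
let endpoint = new cloud.API("urlshortener");
endpoint.static("/", "www");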

Vendor lock-in seems to be a big concern for many people when it comes to cloud architectures that rely heavily on managed cloud services, including serverless applications. While I don't necessarily share those concerns and am not sure whether generic abstractions are the right way to go, the Pulumi Cloud library can be one direction to explore.

The following picture illustrates the choice of the level of abstraction that Pulumi provides:

Pulumi abstraction layers

Working on top of the cloud provider's API and the internal resource providers, you can choose to work with raw resources for maximum flexibility, or opt in to higher-level abstractions. Mixing and matching both in the same program is possible too.

Infrastructure as Real Code

Designing applications for the modern cloud means utilizing multiple cloud services which have to be configured to play nicely together. The Infrastructure as Code approach is almost a requirement to keep the management of such applications reliable in a team setting and over an extended period.

Application code and supporting infrastructure become more and more blended, so it's natural that software developers take the responsibility to define both. The next logical step is to use the same set of languages, tooling, and practices for both software and infrastructure.

Pulumi exposes cloud resources as APIs in several popular general-purpose programming languages. Developers can directly transfer their skills and experience to define, build, compose, and deploy modern cloud-native and serverless applications more efficiently than ever.

Serverless at Scale: Serving StackOverflow-like Traffic

Originally published at Binaris Blog

Serverless compute is a very productive and quick way to get an application up and running. A developer writes a piece of code that solves a particular task and uploads it to the cloud. The provider handles code deployment and the ops burden of managing all the required infrastructure, so that the Function is always available, secure and performant.

Performance is a feature, and the ability to run the same application for 10 users or 10 million users is very appealing. Unfortunately, FaaS is not magical, so scalability limits do exist. That's why I spend time testing the existing FaaS services to highlight the cases where performance might not be perfect. For some background, you can read my previous articles: From 0 to 1000 Instances: How Serverless Providers Scale Queue Processing for queue-based workloads and Serverless: Cold Start War for exploring cold start latencies.

Today, I want to dig into the scalability of serverless HTTP-based functions. HTTP-based functions are a popular use case that most developers can relate to, and they are also heavily impacted by the ability to scale. When your app goes viral on social networks, scores the Hacker News front page, or gets featured on TV, the last thing you want is slow responses and timeouts.

I implemented a simple HTTP-triggered function and deployed it across the Big-3 cloud providers—Amazon, Microsoft, and Google. Next, I ran a load test issuing hundreds of requests per second to each function. In this article, I present the design and the results of these experiments.

DISCLAIMER: Performance testing is hard. I might be missing some crucial factors and parameters that influence the outcome. My interpretation might be wrong. The results might change over time. If you happen to know a way to improve my tests, please let me know, and I will re-run them and re-publish the results.

StackOverflow on FaaS

Every developer knows StackOverflow and uses it pretty much every day. I set the goal of serving traffic comparable to what StackOverflow sees, solely from a serverless function.

StackOverflow is an excellent target for this experiment, not least because its usage statistics are publicly available.

StackOverflow runs on .NET Core, SQL Server, Redis, Elastic, and so on. Obviously, my goal is not to replicate the whole site; I just want to serve comparable traffic to the outside world.

Here are some important metrics for my experiment:

  • StackOverflow served 66 million pages per day, which is 760 pages/sec on average.
  • We’ll make the assumption that the vast majority of those pageviews are question pages, so I will ignore everything else.
  • We know they serve the whole page as one server-rendered HTML, so we’ll do something comparable.
  • Each page should be ~100 KB in size before compression.

With this in mind, I came up with the following experiment design:

  • Create a HTML template for the whole question page with question and answer markup replaced by placeholders.
  • Download the data of about 1,000 questions and their respective answers from the data explorer.
  • Save the HTML templates and JSON data in blob storage of each cloud provider.
  • Implement a serverless function that retrieves the question data, populates the template, and returns the HTML in response.

Serving StackOverflow Traffic from a Serverless Function

The HTML template is loaded at the first request and then cached in memory. The question/answers data file is loaded from the blob storage for every request. Template population is accomplished with string concatenation.
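
Here is a minimal sketch of what such a function might look like on AWS; the bucket name, key layout, event shape, and placeholder names are my assumptions, not the actual test code:

import * as aws from "aws-sdk";

const s3 = new aws.S3();
let template = ""; // cached across invocations of the same instance

export const handler = async (event: { pathParameters: { id: string } }) => {
  // Load the page template once and keep it in memory.
  if (!template) {
    const t = await s3.getObject({ Bucket: "so-pages", Key: "template.html" }).promise();
    template = (t.Body as Buffer).toString("utf8");
  }

  // Fetch the question data from blob storage on every request.
  const blob = await s3
    .getObject({ Bucket: "so-pages", Key: `questions/${event.pathParameters.id}.json` })
    .promise();
  const question = JSON.parse((blob.Body as Buffer).toString("utf8"));

  // Populate the template with plain string replacement/concatenation.
  const html = template
    .replace("{{title}}", question.title)
    .replace("{{body}}", question.body)
    .replace("{{answers}}", question.answers.map((a: any) => a.body).join("\n"));

  return { statusCode: 200, headers: { "Content-Type": "text/html" }, body: html };
};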

In my view, this setup is a simple but fair approximation of StackOverflow’s front-end. In addition, it is somewhat representative of many real-world web applications.

Metrics Setup

I analyzed the scalability of the following cloud services:

  • AWS Lambda triggered via Amazon API Gateway (docs)
  • Azure Function with an HTTP trigger (docs)
  • Google Cloud HTTP Function (docs)

All functions were implemented in JavaScript (Node.js) and were running on the latest GA runtime.

Since built-in monitoring tools, such as CloudWatch, would only report the function execution duration, which does not include other potential delays in the HTTP pipeline, I instead measured end-to-end latency from the client perspective. This means that the latency of the network and HTTP gateway (e.g., API Gateway in the case of AWS) were included in the total duration.

Requests were sent from multiple VMs outside the target cloud provider's region but in geographical proximity. Network latency was present in the metrics, but I estimated it to be 20-30 milliseconds at most.
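
For reference, measuring the end-to-end latency from the client side can be as simple as timing a plain HTTPS request; this is a sketch, not the exact harness used in the tests:

import * as https from "https";

// Measure the wall-clock duration of a single request, including network
// and HTTP gateway overhead, as seen by the client.
function timeRequest(url: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const start = process.hrtime.bigint();
    https
      .get(url, (res) => {
        res.on("data", () => {}); // drain the response body
        res.on("end", () =>
          resolve(Number(process.hrtime.bigint() - start) / 1e6)); // milliseconds
      })
      .on("error", reject);
  });
}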

Measuring Response Time of a Serverless Function

Blob storage services of all cloud providers have enough throughput to serve one blob per HTTP request. However, the latencies differ among the clouds, so I included blob fetch duration measurements in the performance baseline.

Each measurement was then saved to persistent storage and analyzed afterward.

The charts below show percentile values. For instance, the 95th percentile (written as P95) value of 100ms means that 95% of the requests were faster than 100ms while 5% were slower than that. P50 is the median.
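
For reference, a percentile over a set of latency samples can be computed in a few lines; this is a simple nearest-rank sketch, not the exact method of any particular monitoring tool:

// Nearest-rank percentile: percentile(samples, 95) returns the value below
// which roughly 95% of the samples fall.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [70, 72, 68, 75, 200, 71, 69, 73, 1200, 74];
console.log(percentile(latencies, 50)); // the median
console.log(percentile(latencies, 95)); // the long tail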

Load Pattern

I wanted to test the ability of serverless functions to scale up rapidly in response to the growth in request rate, so I came up with a dynamic load scenario.

The experiments started with a baseline at 10% of the target load. The goal of the baseline was to make sure that the app was overall healthy, and to evaluate the basic latency and the impact of blob storage on it.

At some point (around minute 0 of the charts), the load began to grow and reached 1000 RPS within 8 minutes. After the peak, the cooldown period started, and the load steadily decreased to zero in 8 more minutes.
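
One way to express this profile in code (a sketch; the actual load generator may shape the ramp differently):

// Target request rate at a given minute of the test: 8 minutes of linear
// ramp-up to the peak, then 8 minutes of linear ramp-down back to zero.
function targetRps(minute: number, peak = 1000): number {
  if (minute <= 0) return 0.1 * peak;                        // baseline: 10% of the peak
  if (minute <= 8) return peak * (0.1 + 0.9 * minute / 8);   // ramp-up
  if (minute <= 16) return peak * (1 - (minute - 8) / 8);    // cooldown
  return 0;
}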

Request Distribution during the Load Test

Even though the growth period on the left and the decline period on the right represented the same number of requests, the hypothesis was that the first half might be more challenging because of the need to provision new resources rapidly.

In total, 600,000 requests were served within 17 minutes, with total outbound traffic of 70 GB.

Finally, we’ve made it through all the mechanics, and now it's time to present the actual results.

AWS Lambda

AWS Lambda was our first target for the experiment. I provisioned Lambda instances with 512 MB of memory, which is a mid-range value. I expected larger instances to be slightly faster and smaller instances to be a bit slower, but the load is not very CPU-demanding, so the overall results should be comparable across the spectrum.

During the low-load baseline, the median response time was about 70 ms with a minimum of 50 ms. The median latency of the blob fetch from the S3 bucket was 50 ms.

Here is the P50-P95 latency chart during the load test:

AWS Lambda Response Time Distribution (P50-P95)

The percentiles were very consistent and flat. The median response time was still around 70 ms with no variance observed. P90 and P95 were quite stable too.

Only the 99th percentile displayed the difference between the ramp-up period on the left and the cooldown period on the right:

AWS Lambda Response Time Distribution (P99)

AWS Lambda scales by creating multiple instances of the same function that handle requests in parallel. Each Lambda instance handles a single request at any given time, which is why the scale is measured in "concurrent executions." When the current request is done being processed, the same instance can be reused for a subsequent request.

An instance identifier can be retrieved from /proc/self/cgroup inside a Lambda, so I recorded this value for each execution (a sketch of this trick follows the chart below). The following chart shows the number of instances throughout the experiment:

AWS Lambda Concurrent Executions

There were about 80 concurrent executions at peak. That's quite a few, but still, almost an order of magnitude fewer instances compared to my queue processing experiment. It felt that AWS was capable of scaling even further.
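
For reference, this is roughly how the instance identifier was collected for each execution; a sketch, assuming the Node.js runtime and relying on the observed (not documented) behavior that the cgroup content is stable per sandbox:

import * as fs from "fs";
import * as crypto from "crypto";

// Derive a rough per-instance identifier: the content of /proc/self/cgroup
// stays the same for the lifetime of a Lambda sandbox, so hashing it
// distinguishes one instance from another.
function instanceId(): string {
  const cgroup = fs.readFileSync("/proc/self/cgroup", "utf8");
  return crypto.createHash("md5").update(cgroup).digest("hex");
}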

P99.9 shows the slowness of the least lucky 0.1% of requests. Most probably, it contains lots of cold starts:

AWS Lambda Response Time Distribution (P99.9)

Still, even those requests were mostly served within 2-3 seconds.

Google Cloud Functions

Now, let's look at the results of Google Cloud Functions. Once again, I provisioned instances with 512 MB of memory (same as for Lambda).

During the low-load baseline, the median response time was about 150 ms with a minimum of 100 ms. Almost all of that time was spent fetching blobs from Cloud Storage: its median latency was over 130 ms! I haven't spent too much time investigating the reason, but I assume that Google Cloud Storage has higher latency for small files than S3. Zach Bjornson published a comparison of storage latencies; although it's three years old, the conclusion was that "GCS averaged more than three times higher latency" compared to Azure and AWS.

That's an important observation because the Function execution times were twice as long as those recorded on AWS. Keeping this difference in mind, here is the P50-P95 latency chart during the GCP load test:

Google Cloud Function Response Time Distribution (P50-P95)

The median value was stable and flat at 150-180 ms. P90 and P95 had some spikes during the first 3 minutes. Google passed the test, but the lower percentiles were not perfect.

The 99th percentile was relatively solid though. It was higher on the left, but it stayed within 1 second most of the time:

Google Cloud Function Response Time Distribution (P99)

The scaling model of Google Cloud Functions appears to be very similar to that of AWS Lambda. This means that twice the average execution duration requires twice as many concurrent executions and, therefore, twice as many instances to be provisioned:

Google Cloud Function Concurrent Executions

Indeed, there were about 160 concurrent executions at peak. GCP had to work twice as hard because of the storage latency, which might explain some of the additional variation in response times.

Besides, Google seems to manage the instance lifecycle differently. It provisioned a larger batch of instances during the first two minutes, which was in line with my previous findings. It also kept instances around for longer when the traffic went down (or, at least, reused the existing instances more evenly).

For completeness, here are the P99.9 values:

Google Cloud Function Response Time Distribution (P99.9)

They fluctuated between 1 and 5 seconds on the left and were incredibly stable on the right.

Azure

Experiments with Azure Functions were run on the Consumption Plan—the dynamically scaled and billed-per-execution runtime. The Consumption Plan doesn't have a setting for allocated memory or any other instance-size parameters.

During the low-load baseline, the median response time was about 95 ms with a minimum of 45 ms, which is close to AWS and considerably faster than GCP. This time, JSON file retrieval was not the main contributor to the end-to-end latency: The median response time of Azure Blob Storage was an amazing 8 ms.

However, it turns out that the scaling model of Azure Functions doesn't work well for my experiment. Very high latencies were observed during the load test on Azure:

Azure Function (Node.js) Response Time Distribution (P50-P95)

The concurrency model of Azure Functions is different from its AWS and GCP counterparts. A Function App instance is closer to a VM than to a single-task container. It runs multiple concurrent executions in parallel. A central coordinator called the Scale Controller monitors metrics from the existing instances and determines how many instances to provision on top. An instance identifier can be retrieved from the environment variables of a function, so I recorded this value for each execution.
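
On Azure, that identifier is available to the function as an environment variable; the exact variable name below is my assumption, not taken from the article:

// The identifier of the Function App worker running this execution,
// typically exposed as WEBSITE_INSTANCE_ID on the Consumption Plan.
const workerInstanceId = process.env["WEBSITE_INSTANCE_ID"];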

The multiple-requests-per-instance model didn't help in terms of the total number of instances required to process the traffic:

Azure Function (Node.js) Instances

At peak, 90 instances were required, which is almost the same as the number of concurrent executions of AWS Lambda. Given the I/O-bound nature of my function, this was surprising to me.

Puzzled by the moderate results, I decided to run the same application as a .NET Azure Function and compare the performance. The same function ported to C# got much faster:

Azure Function (.NET) Response Time Distribution (P50-P95)

P50 was extremely good: it stayed below 50 ms (leveraging the blazingly fast Blob Storage) for the whole period, except for one data point when it was 140 ms. P90 and P95 were stable except for three data points.

The chart of instance growth was very different from the JavaScript one too:

Azure Function (.NET) Instances

Basically, it spiked to 20 instances at the third minute, and that was enough for the rest of the test. I concluded that the .NET worker is more efficient than the Node.js worker, at least for my scenario.

If I compare the percentile charts with the instance charts, it looks as if the latency spikes happen at the times when new instances get provisioned. For some reason, performance suffers during scale-out. It's not just cold starts on the new instances: P90 and even P50 are affected. It might be a good topic for a separate investigation.

Conclusion

During the experiment, sample StackOverflow pages were built and served from AWS Lambda, Google Cloud Functions, and Azure Functions at the rate of up to 1000 pageviews per second. Each Function call served a single pageview and was a combination of I/O workload (reading blob storage) and CPU usage (for parsing JSON and rendering HTML).

All cloud providers were able to scale up and serve the traffic. However, the latency distributions were quite different.

AWS Lambda was solid: Median response time was always below 100 ms, 95th percentile was below 200 ms, and 99th percentile exceeded 500 ms just once.

Google Cloud Storage seemed to have the highest latency out of the three cloud providers. Google Cloud Functions had a bit of a slowdown during the first two minutes of the scale-out but otherwise were quite stable and responsive.

Azure Functions had difficulties during the scale-out period, and the response time went up to several seconds. The .NET worker appeared to be more performant than the Node.js one, but both of them showed undesirable spikes when new instances were provisioned.

Here is my practical advice to take home:

  • Function-as-a-Service is a great model to build applications that can work for low-usage scenarios, high-load applications, and even spiky workloads.
  • Scalability limits do exist, so if you anticipate high growth in the application's usage, run a simple load test to see how it behaves.
  • Always test in combination with your non-serverless dependencies. I selected cloud blob storage, which is scalable by definition, and yet its behavior still influenced my results. If you use a database or a third-party service, it's quite likely that they will hit a scalability limit much earlier than the serverless compute.

A Fairy Tale of F# and Durable Functions

The post is a part of F# Advent Calendar 2018. It's Christmas time!

This summer I was hired by the office of Santa Claus. Santa is not just a fairy tale character on his own—he leads a large organization that supplies gifts and happiness to millions of children around the globe. Like any large organization, Santa's office employs an impressive number of IT systems.

As part of its IT modernization effort, North Pole HQ restructured the whole supply chain of Christmas gifts. Many legacy components were moved from a self-managed data center at the North Pole—although the cooling is quite cheap there—to Azure cloud. Azure was an easy sell since Santa's techy elves use Office 365, SharePoint and the .NET development stack.

One of the goals of the redesign was to leverage managed cloud services and serverless architecture wherever possible. Santa has no spare elves to keep reinventing IT wheels.

Wish Fulfillment Service

My assignment was to redesign the Wish Fulfillment service. The service receives wish lists from clients (they call children "clients"):

Christmas Wish List

Christmas Card with a Wish List © my son Tim

Luckily, the list is already parsed by some other service, and also contains the metadata about the kid's background (age, gender, and so on) and preferences.

For each item in the list, our service calls the Matching service, which uses machine learning, Azure Cognitive services, and a bit of magic to determine the actual products (they call gifts "products") that best fit the client's expressed desire and profile. For instance, my son's wish for "LEGO Draak" matches to "LEGO NINJAGO Masters of Spinjitzu Firstbourne Red Dragon". You get the point.

There might be several matches for each desired item, and each result has an estimate of how likely it is to fulfill the original request and make the child happy.

All the matching products are combined and sent over to the Gift Picking service. Gift Picking selects one of the options based on its price, demand, confidence level, and the Naughty-or-Nice score of the client.

The last step of the workflow is to Reserve the selected gift in the warehouse and shipping system called "Santa's Archive of Products", also referred to as SAP.

Here is the whole flow in one picture:

Gift Fulfillment Workflow

How should we implement this service?

Original Design

The Wish Fulfillment service should run in the cloud and integrate with other services. It should be able to process millions of requests in December and stay very cheap to run during the rest of the year. We decided to leverage serverless architecture with Azure Functions on the Consumption Plan. Serverless Functions are:

  • Fully Managed: the cloud provider provisions resources, scales them based on the load, takes care of uptime and reliability;

  • Event-Driven: for each serverless Function you have to define a specific trigger—the event type which causes it to run, be it an HTTP endpoint or a queue message;

  • Charged per Execution: it costs nothing to run the application if there is no usage, and the cost of busy applications is proportional to the actual resource utilization.

Here is the diagram of the original design:

Workflow Design with Azure Functions and Storage Queues

We used Azure Storage Queues to keep the whole flow asynchronous and more resilient to failures and load fluctuation.

This design would mostly work, but we found a couple of problems with it:

  • The Functions were manually wired together via storage queues and the corresponding bindings. The workflow was spread over the infrastructure definition and thus was hard to grasp.

  • We had to pass all items of each wish list into a single invocation of Matching Function, otherwise combining the matching results from multiple queue messages would be tricky.

  • Although not in scope for the initial release, there were plans to add manual elf intervention for poorly matched items. This feature would require a change in the flow design: it's not trivial to fit long-running processes into the pipeline.

To improve on these points, we decided to try Durable Functions—a library that brings workflow orchestration to Azure Functions. It introduces several tools to define stateful, potentially long-running operations, and handles a lot of the mechanics of reliable communication and state management behind the scenes.

If you want to know more about what Durable Functions are and why they might be a good idea, I invite you to read my article Making Sense of Azure Durable Functions (20 minutes read).

For the rest of this post, I will walk you through the implementation of the Wish Fulfillment workflow with Azure Durable Functions.

Domain Model

A good design starts with a decent domain model. Luckily, the project was built with F#—the language with the richest domain modeling capabilities in the .NET ecosystem.

Types

Our service is invoked with a wish list as the input parameter, so let's start with the type WishList:

type WishList = 
{
    Kid: Customer
    Wishes: string list
}

It contains information about the author of the list and recognized "order" items. Customer is a custom type; for now, it's not important what's in it.

For each wish we want to produce a list of possible matches:

type Match = 
{
    Product: Product
    Confidence: Probability
}

The product is a specific gift option from Santa's catalog, and the confidence is a number from 0.0 to 1.0 of how strong the match is.

The end goal of our service is to produce a Reservation:

type Reservation = 
{
    Kid: Customer
    Product: Product
}

It represents the exact product selection for the specific kid.

Functions

The Wish Fulfillment service needs to perform three actions, which can be modeled with three strongly-typed asynchronous functions.

Note: I use lowercase "function" for F# functions and capitalize "Function" for Azure Functions throughout the article to minimize confusion.

The first action finds matches for each wish:

// string -> Async<Match list>
let findMatchingGift (wish: string) = async {
    // Call a custom machine learning model
    // The real implementation uses the Customer profile to adjust decisions by age, etc.
    // but we'll keep the model simple for now.
}

The first line of all my function snippets shows the function type. In this case, it's a mapping from the text of the child's wish (string) to a list of matches (Match list).

The second action takes the combined list of all matches of all wishes and picks one. Its real implementation is Santa's secret sauce, but my model just picks the one with the highest confidence level:

// Match list -> Product
let pickGift (candidates: Match list) =
    candidates
    |> List.sortByDescending (fun x -> x.Confidence)
    |> List.head
    |> (fun x -> x.Product)

Given the picked gift, the reservation is merely { Kid = wishlist.Kid; Product = gift }, not worthy of a separate action.

The third action registers a reservation in the SAP system:

// Reservation -> Async<unit>
let reserve (reservation: Reservation) = async {
    // Call Santa's Archive of Products
}

Workflow

The fulfillment service combines the three actions into one workflow:

// WishList -> Async<Reservation>
let workflow (wishlist: WishList) = async {

    // 1. Find matches for each wish 
    let! matches = 
        wishlist.Wishes
        |> List.map findMatchingGift
        |> Async.Parallel

    // 2. Pick one product from the combined list of matches
    let gift = pickGift (List.concat matches)

    // 3. Register and return the reservation
    let reservation = { Kid = wishlist.Kid; Product = gift }
    do! reserve reservation
    return reservation
}

The workflow implementation is a nice and concise summary of the actual domain flow.

Note that the Matching service is called multiple times in parallel, and then the results are easily combined by virtue of the Async.Parallel F# function.

So how do we translate the domain model to the actual implementation on top of serverless Durable Functions?

Classic Durable Functions API

C# was the first target language for Durable Functions; JavaScript is now fully supported too.

F# wasn't initially declared as officially supported, but since F# runs on top of the same .NET runtime as C#, it has always worked. I have a blog post about Azure Durable Functions in F# and have added F# samples to the official repository.

Here are two examples from that old F# code of mine (they have nothing to do with our gift fulfillment domain):

// 1. Simple sequencing of activities
let Run([<OrchestrationTrigger>] context: DurableOrchestrationContext) = task {
  let! hello1 = context.CallActivityAsync<string>("E1_SayHello", "Tokyo")
  let! hello2 = context.CallActivityAsync<string>("E1_SayHello", "Seattle")
  let! hello3 = context.CallActivityAsync<string>("E1_SayHello", "London")
  return [hello1; hello2; hello3]
}     

// 2. Parallel calls snippet
let tasks = Array.map (fun f -> context.CallActivityAsync<int64>("E2_CopyFileToBlob", f)) files
let! results = Task.WhenAll tasks

This code works and does its job, but doesn't look like idiomatic F# code:

  • No strong typing: Activity Functions are called by name and with types manually specified
  • Functions are not curried, so partial application is hard
  • The need to pass the context object around for any Durable operation

Although not shown here, the other samples read input parameters, handle errors, and enforce timeouts—all look too C#-y.

Better Durable Functions

Instead of following the sub-optimal route, we implemented the service with a more F#-idiomatic API. I'll show the code first, and then I'll explain its foundation.

The implementation consists of three parts:

  • Activity Functions—one per action from the domain model
  • Orchestrator Function defines the workflow
  • Azure Functions bindings to instruct how to run the application in the cloud

Activity Functions

Each Activity Function defines one step of the workflow: Matching, Picking, and Reserving. We simply reference the F# functions of those actions in one-line definitions:

let findMatchingGiftActivity = Activity.defineAsync "FindMatchingGift" findMatchingGift
let pickGiftActivity = Activity.define "PickGift" pickGift
let reserveActivity = Activity.defineAsync "Reserve" reserve

Each activity is defined by a name and a function.

Orchestrator

The Orchestrator calls Activity Functions to produce the desired outcome of the service. The code uses a custom computation expression:

let workflow wishlist = orchestrator {
    let! matches = 
        wishlist.Wishes
        |> List.map (Activity.call findMatchingGiftActivity)
        |> Activity.all

    let! gift = Activity.call pickGiftActivity (List.concat matches)

    let reservation = { Kid = wishlist.Kid; Product = gift }
    do! Activity.call reserveActivity reservation
    return reservation
}

Notice how closely it matches the workflow definition from our domain model:

Async function vs. Durable Orchestrator

The only differences are:

  • orchestrator computation expression is used instead of async because multi-threading is not allowed in Orchestrators
  • Activity.call replaces direct invocations of functions
  • Activity.all substitutes Async.Parallel

Hosting layer

An Azure Function trigger needs to be defined to host any piece of code as a cloud Function. This can be done manually in function.json, or via trigger generation from .NET attributes. In my case I added the following four definitions:

[<FunctionName("FindMatchingGift")>]
let FindMatchingGift([<ActivityTrigger>] wish) = 
    Activity.run findMatchingGiftActivity wish

[<FunctionName("PickGift")>]
let PickGift([<ActivityTrigger>] matches) = 
    Activity.run pickGiftActivity matches

[<FunctionName("Reserve")>]
let Reserve([<ActivityTrigger>] wish) = 
    Activity.run reserveActivity wish

[<FunctionName("WishlistFulfillment")>]
let Workflow ([<OrchestrationTrigger>] context: DurableOrchestrationContext) =
    Orchestrator.run (workflow, context)

The definitions are very mechanical and, again, strongly typed (apart from Functions' names).

Ship It!

These are all the bits required to get our Durable Wish Fulfillment service up and running. From this point on, we can leverage all the existing tooling of Azure Functions.

There is a learning curve in the process of adopting the serverless architecture. However, a small project like ours is a great way to do the learning. It sets Santa's IT department on the road to success, and children will get better gifts more reliably!

DurableFunctions.FSharp

The above code was implemented with the library DurableFunctions.FSharp. I created this library as a thin F#-friendly wrapper around Durable Functions.

Frankly speaking, the whole purpose of this article is to introduce the library and make you curious enough to give it a try. DurableFunctions.FSharp has several pieces in the toolbox:

  • OrchestratorBuilder and orchestrator computation expression which encapsulates proper usage of Task-based API of DurableOrchestrationContext

  • Activity generic type to define activities as first-class values

  • Activity module with helper functions to call activities

  • Adapters for Azure Functions definition for Async and Orchestrator

  • The API of the original Durable Extension is still available, so you can fall back to it if needed

In my opinion, F# is a great language for developing serverless Functions. The simplicity of working with functions, immutability by default, a strong type system, and a focus on data pipelines are all useful in the world of event-driven cloud applications.

Azure Durable Functions brings higher-level abstractions to compose workflows out of simple building blocks. The goal of DurableFunctions.FSharp is to make such composition natural and enjoyable for F# developers.

Getting Started is as easy as creating a new .NET Core project and referencing a NuGet package.

I'd love to get as much feedback as possible! Leave comments below, create issues on the GitHub repository, or open a PR. This would be super awesome!

Happy coding, and Merry Christmas!

Acknowledgments

Many thanks to Katy Shimizu, Devon Burriss, Dave Lowe, Chris Gillum for reviewing the draft of this article and their valuable contributions and suggestions.

Making Sense of Azure Durable Functions

Stateful Workflows on top of Stateless Serverless Cloud Functions—this is the essence of the Azure Durable Functions library. That's a lot of fancy words in one sentence, and they might be hard for the majority of readers to understand.

Please join me on the journey where I'll try to explain how those buzzwords fit together. I will do this in 3 steps:

  • Describe the context of modern cloud applications relying on serverless architecture;
  • Identify the limitations of basic approaches to composing applications out of the simple building blocks;
  • Explain the solutions that Durable Functions offer for those problems.

Microservices

Traditionally, server-side applications were built in a style which is now referred to as Monolith. If multiple people and teams were developing parts of the same application, they mostly contributed to the same code base. If the code base were structured well, it would have some distinct modules or components, and a single team would typically own each module:

Monolith

Multiple components of a monolithic application

Usually, the modules would be packaged together at build time and then deployed as a single unit, so a lot of communication between modules would stay inside the OS process.

Although the modules could stay loosely coupled over time, the coupling almost always occurred on the level of the data store because all teams would use a single centralized database.

This model works great for small- to medium-size applications, but it turns out that teams start getting in each other's way as the application grows since synchronization of contributions takes more and more effort.

As a complex but viable alternative, the industry came up with a revised service-oriented approach commonly called Microservices. The teams split the big application into "vertical slices" structured around the distinct business capabilities:

Microservices

Multiple components of a microservice-based application

Each team then owns a whole vertical—from public communication contracts, or even UIs, down to the data storage. Explicitly shared databases are strongly discouraged. Services talk to each other via documented and versioned public contracts.

If the borders for the split were selected well—and that's the trickiest part—the contracts stay stable over time and remain thin enough to avoid too much chattiness. This gives each team enough autonomy to innovate at their best pace and to make independent technical decisions.

One of the drawbacks of microservices is the change in deployment model. The services are now deployed to separate servers connected via a network:

Distributed Systems

Challenges of communication between distributed components

Networks are fundamentally unreliable: they work just fine most of the time, but when they fail, they fail in all kinds of unpredictable and least desirable manners. There are books written on the topic of distributed systems architecture. TL;DR: it's hard.

Many new adopters of microservices tend to ignore such complications. REST over HTTP(S) is the dominant style of connecting microservices. Like any other synchronous communication protocol, it makes the system brittle.

Consider what happens when one service becomes temporarily unhealthy: maybe its database goes offline, or it's struggling to keep up with the request load, or a new version of the service is being deployed. All the requests to the problematic service start failing—or worse—become very slow. The dependent service waits for the response and thus blocks all incoming requests of its own. The error propagates upstream very quickly, causing cascading failures all over the place:

Cascading Failures

Error in one component causes cascading failures

The application is down. Everybody screams and starts a blame war.

Event-Driven Applications

While cascading failures of HTTP communication can be mitigated with patterns like a circuit breaker and graceful degradation, a better solution is to switch to the asynchronous style of communication as the default. Some kind of persistent queueing service is used as an intermediary.

The style of application architecture based on sending events between services is known as Event-Driven. When a service does something useful, it publishes an event—a record of a fact that happened in its business domain. Another service listens to the published events and executes its own duty in response to those facts:

Event-Driven Application

Communication in event-driven applications

The service that produces events might not know about the consumers. New event subscribers can be introduced over time. This works better in theory than in practice, but the services tend to get coupled less.

More importantly, if one service is down, other services don't catch fire immediately. The upstream services keep publishing events, which build up in the queue but can be stored safely for hours or days. The downstream service might not be doing anything useful for this particular flow, but it can stay healthy otherwise.

However, another potential issue comes hand-in-hand with loose coupling: low cohesion. As Martin Fowler notes in his essay What do you mean by "Event-Driven":

It's very easy to make nicely decoupled systems with event notification, without realizing that you're losing sight of the larger-scale flow.

Given many components that publish and subscribe to a large number of event types, it's easy to stop seeing the forest for the trees. Combinations of events usually constitute workflows that unfold over time. A workflow is more than the sum of its parts, and an understanding of the high-level flow is paramount to controlling the system's behavior.

Hold this thought for a minute; we'll get back to it later. Now it's time to talk cloud.

Cloud

The birth of the public cloud changed the way we architect applications. It made many things much more straightforward: provisioning new resources in minutes instead of months, scaling elastically based on demand, and achieving resiliency and disaster recovery at the global scale.

It made other things more complicated. Here is the picture of the global Azure network:

Azure Network

Azure locations with network connections

There are good reasons to deploy applications to more than one geographical location: among others, to reduce network latency by staying close to the customer, and to achieve resilience through geographical redundancy. Public Cloud is the ultimate distributed system. As you remember, distributed systems are hard.

There's more to it. Each cloud provider has dozens and dozens of managed services, which is both a curse and a blessing. Specialized services are great at providing off-the-shelf solutions to common complex problems. On the flip side, each service has distinct properties regarding consistency, resiliency, and fault tolerance.

In my opinion, at this point developers have to embrace the public cloud and apply the distributed system design on top of it. If you agree, there is an excellent way to approach it.

Serverless

The slightly provocative term serverless is used to describe cloud services that do not require provisioning of VMs, instances, workers, or any other fixed capacity to run custom applications on top of them. Resources are allocated dynamically and transparently, and the cost is based on their actual consumption, rather than on pre-purchased capacity.

Serverless is more about operational and economical properties of the system than about the technology per se. Servers do exist, but they are someone else's concern. You don't manage the uptime of serverless applications: the cloud provider does.

On top of that, you pay for what you use, similar to the consumption of other commodity resources like electricity. Instead of buying a generator to power up your house, you just purchase energy from the power company. You lose some control (e.g., no way to select the voltage), but this is fine in most cases. The great benefit is no need to buy and maintain the hardware.

Serverless compute does the same: it supplies standard services on a pay-per-use basis.

If we talk more specifically about Function-as-a-Service offerings like Azure Functions, they provide a standard model to run small pieces of code in the cloud. You zip up the code or binaries and send it to Azure; Microsoft takes care of all the hardware and software required to run it. The infrastructure automatically scales up or down based on demand, and you pay per request, CPU time and memory that the application consumed. No usage—no bill.

However, there's always a "but". FaaS services come with an opinionated development model that applications have to follow:

  • Event-Driven: for each serverless function you have to define a specific trigger—the event type which causes it to run, be it an HTTP endpoint or a queue message;

  • Short-Lived: functions can only run up to several minutes, and preferably for a few seconds or less;

  • Stateless: as you don't control where and when function instances are provisioned or deprovisioned, there is no way to store data within the process between requests reliably; external storage has to be utilized.

Frankly speaking, the majority of existing applications don't really fit this model. If you are lucky enough to work on a new application (or a new module of one), you are in better shape.

Many serverless applications are designed to look somewhat similar to this example from the Serverless360 blog:

Serviceful Serverless Application

Sample application utilizing "serviceful" serverless architecture

There are 9 managed Azure services working together in this app. Most of them have a unique purpose, but they are all glued together with Azure Functions. An image is uploaded to Blob Storage, an Azure Function calls the Vision API to recognize the license plate and sends the result to Event Grid, another Azure Function writes that event to Cosmos DB, and so on.

This style of cloud applications is sometimes referred to as Serviceful to emphasize the heavy usage of managed services "glued" together by serverless functions.

Creating a comparable application without any managed services would be a much harder task, even more so if the application has to run at scale. Moreover, there's no way to keep the pay-as-you-go pricing model in a self-hosted world.

The application pictured above is still pretty straightforward. The processes in enterprise applications are often much more sophisticated.

Remember the quote from Martin Fowler about losing sight of the larger-scale flow. That was true for microservices, but it's even more true for the "nanoservices" of cloud functions.

I want to dive deeper and give you several examples of related problems.

Challenges of Serverless Composition

For the rest of the article, I'll define an imaginary business application for booking trips to software conferences. In order to go to a conference, I need to buy tickets to the conference itself, purchase the flights, and book a room at a hotel.

In this scenario, it makes sense to create three Azure Functions, each one responsible for one step of the booking process. As we prefer message passing, each Function emits an event which the next function can listen for:

Conference Booking Application

Conference booking application

This approach works; however, problems do exist.

Flexible Sequencing

As we need to execute the whole booking process in sequence, the Azure Functions are wired one after another by configuring the output of one function to match with the event source of the downstream function.

In the picture above, the functions' sequence is hard-coded. If we were to swap the order of booking the flights and reserving the hotel, that would require a code change—at least of the input/output wiring definitions, but probably also of the functions' parameter types.

In this case, are the functions really decoupled?

Error Handling

What happens if the Book Flight function becomes unhealthy, perhaps due to the outage of the third-party flight-booking service? Well, that's why we use asynchronous messaging: after the function execution fails, the message returns to the queue and is picked up again by another execution.

However, such retries happen almost immediately for most event sources. This might not be what we want: an exponential back-off policy could be a smarter idea. At this point, the retry logic becomes stateful: the next attempt should "know" the history of previous attempts to make a decision about retry timing.

There are more advanced error-handling patterns too. If execution failures are not intermittent, we may decide to cancel the whole process and run compensating actions against the already completed steps.

An example of this is a fallback action: if the flight is not possible (e.g., no routes for this origin-destination combination), the flow could choose to book a train instead:

Fallback On Error

Fallback after 3 consecutive failures

This scenario is not trivial to implement with stateless functions. We could wait until a message goes to the dead-letter queue and then route it from there, but this is brittle and not expressive enough.

Parallel Actions

Sometimes the business process doesn't have to be sequential. In our reservation scenario, there might be no difference whether we book a flight before a hotel or vice versa. It could be desirable to run those actions in parallel.

Parallel execution of actions is easy with the pub-sub capabilities of an event bus: both functions should subscribe to the same event and act on it independently.

The problem comes when we need to reconcile the outcomes of parallel actions, e.g., calculate the final price for expense reporting purposes:

Fan-out / Fan-in

Fan-out / fan-in pattern

There is no way to implement the Report Expenses block as a single Azure Function: functions can't be triggered by two events, let alone correlate two related events.

The solution would probably involve two functions, one per event, and shared storage between them to pass information about the first completed booking to the one that completes last. All this wiring has to be implemented in custom code. The complexity grows if more than two functions need to run in parallel.

Also, don't forget the edge cases. What if one of the functions fails? How do you make sure there is no race condition when writing to and reading from the shared storage?

Missing Orchestrator

All these examples give us a hint that we need an additional tool to organize low-level single-purpose independent functions into high-level workflows.

Such a tool can be called an Orchestrator because its sole mission is to delegate work to stateless actions while maintaining the big picture and history of the flow.

Azure Durable Functions aims to provide such a tool.

Introducing Azure Durable Functions

Azure Functions

Azure Functions is the serverless compute service from Microsoft. Functions are event-driven: each function defines a trigger—the exact definition of the event source, for instance, the name of a storage queue.

Azure Functions can be programmed in several languages. A basic Function with a Storage Queue trigger implemented in C# would look like this:

[FunctionName("MyFirstFunction")]
public static void QueueTrigger(
    [QueueTrigger("myqueue-items")] string myQueueItem, 
    ILogger log)
{
    log.LogInformation($"C# function processed: {myQueueItem}");
}

The FunctionName attribute exposes the C# static method as an Azure Function named MyFirstFunction. The QueueTrigger attribute defines the name of the storage queue to listen to. The function body logs the information about the incoming message.

Durable Functions

Durable Functions is a library that brings workflow orchestration abstractions to Azure Functions. It introduces a number of idioms and tools to define stateful, potentially long-running operations, and manages a lot of mechanics of reliable communication and state management behind the scenes.

The library records the history of all actions in Azure Storage services, enabling durability and resilience to failures.

Durable Functions is open source, Microsoft accepts external contributions, and the community is quite active.

Currently, you can write Durable Functions in 3 programming languages: C#, F#, and JavaScript (Node.js). All my examples are going to be in C#. For JavaScript, check this quickstart and these samples. For F#, see the samples, the F#-specific library, and my article A Fairy Tale of F# and Durable Functions.

Workflow-building functionality is achieved by introducing two additional types of functions, each with its own trigger type: Activity Functions and Orchestrator Functions.

Activity Functions

Activity Functions are simple stateless single-purpose building blocks that do just one task and have no awareness of the bigger workflow. A new trigger type, ActivityTrigger, was introduced to expose functions as workflow steps, as I explain below.

Here is a simple Activity Function implemented in C#:

[FunctionName("BookConference")]
public static ConfTicket BookConference([ActivityTrigger] string conference)
{
    var ticket = BookingService.Book(conference);
    return new ConfTicket { Code = ticket };
}

It has the usual FunctionName attribute to expose the C# static method as an Azure Function named BookConference. The name is important because it is used to invoke the activity from orchestrators.

The ActivityTrigger attribute defines the trigger type and points to the input parameter conference which the activity expects to get for each invocation.

The function can return a result of any serializable type; my sample function returns a simple property bag called ConfTicket.

Activity Functions can do pretty much anything: call other services, load and save data from/to databases, and use any .NET libraries.

Orchestrator Functions

The Orchestrator Function is a unique concept introduced by Durable Functions. Its sole purpose is to manage the flow of execution and data among several activity functions.

Its most basic form chains multiple independent activities into a single sequential workflow.

Let's start with an example which books a conference ticket, a flight itinerary, and a hotel room one-by-one:

Sequential Workflow

3 steps of a workflow executed in sequence

The implementation of this workflow is defined by another C# Azure Function, this time with OrchestrationTrigger:

[FunctionName("SequentialWorkflow")]
public static async Task Sequential([OrchestrationTrigger] DurableOrchestrationContext context)
{
    var conference = await context.CallActivityAsync<ConfTicket>("BookConference", "ServerlessDays");
    var flight = await context.CallActivityAsync<FlightTickets>("BookFlight", conference.Dates);
    await context.CallActivityAsync("BookHotel", flight.Dates);
}

Again, attributes are used to describe the function for the Azure runtime.

The only input parameter has type DurableOrchestrationContext. This context is the tool that enables the orchestration operations.

In particular, the CallActivityAsync method is used three times to invoke three activities one after the other. The method body looks very typical for any C# code working with a Task-based API. However, the behavior is entirely different. Let's have a look at the implementation details.

Behind the Scenes

Let's walk through the lifecycle of one execution of the sequential workflow above.

When the orchestrator starts running, the first CallActivityAsync invocation is made to book the conference ticket. What actually happens here is that a queue message is sent from the orchestrator to the activity function.

The corresponding activity function gets triggered by the queue message. It does its job (books the ticket) and returns the result. The activity function serializes the result and sends it as a queue message back to the orchestrator:

Durable Functions: Message Passing

Messaging between the orchestrator and the activity

When the message arrives, the orchestrator gets triggered again and can proceed to the second activity. The cycle repeats—a message gets sent to Book Flight activity, it gets triggered, does its job, and sends a message back to the orchestrator. The same message flow happens for the third call.

Stop-resume behavior

As discussed earlier, message passing is intended to decouple the sender and receiver in time. For every message in the scenario above, no immediate response is expected.

On the C# level, when the await operator is executed, the code doesn't block the execution of the whole orchestrator. Instead, it just quits: the orchestrator stops being active and its current step completes.

Whenever a return message arrives from an activity, the orchestrator code restarts. It always starts with the first line. Yes, this means that the same line is executed multiple times: up to the number of messages to the orchestrator.

However, the orchestrator stores the history of its past executions in Azure Storage, so the effect of the second pass over the first line is different: instead of sending a message to the activity, the orchestrator already knows the result of that call, so await returns this result immediately and assigns it to the conference variable.
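
To make the replay idea more tangible, here is a toy, self-contained model of that mechanism. This is not the actual Durable Task Framework code, just an illustration: the "orchestrator" body re-runs from the first line on every trigger, calls whose results are already recorded in the history return immediately, and the first call that is not yet in the history "suspends" the run.

using System;
using System.Collections.Generic;

public static class ReplayDemo
{
    private class SuspendException : Exception { }

    // Append-only history of completed activity calls for one orchestration instance
    private static readonly List<(string Name, string Result)> History = new List<(string, string)>();
    private static int position;

    private static string CallActivity(string name, Func<string> work)
    {
        if (position < History.Count)
        {
            // Replay: the result of this call is already known, don't run the activity again
            return History[position++].Result;
        }

        // First time this line is reached: "send a message" to the activity,
        // record the (simulated) completion, and suspend the orchestrator
        History.Add((name, work()));
        throw new SuspendException();
    }

    private static void Orchestrator()
    {
        position = 0; // every replay starts with the first line
        var conference = CallActivity("BookConference", () => "TICKET-42");
        var flight     = CallActivity("BookFlight",     () => "FLIGHT-7");
        CallActivity("BookHotel", () => "ROOM-1408");
        Console.WriteLine($"Completed: {conference}, {flight}");
    }

    public static void Main()
    {
        // Each iteration stands for one incoming message waking the orchestrator up
        for (var run = 1; ; run++)
        {
            try { Orchestrator(); break; }
            catch (SuspendException)
            {
                Console.WriteLine($"Run {run}: suspended after scheduling activity #{History.Count}");
            }
        }
    }
}

Running this prints three "suspended" lines followed by the completed result, which mirrors how the real orchestrator re-executes the same code once per incoming message.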

Because of these "replays", the orchestrator's implementation has to be deterministic: don't use DateTime.Now, random numbers, or multi-threaded operations; more details here.
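
The context offers replay-safe alternatives for some of these needs. For example, DurableOrchestrationContext.CurrentUtcDateTime is recorded in the orchestration history, so it returns the same value every time the same line is replayed. A minimal sketch (the ScheduleReminder activity is hypothetical):

[FunctionName("DeterministicWorkflow")]
public static async Task Deterministic([OrchestrationTrigger] DurableOrchestrationContext context)
{
    // Wrong: DateTime.UtcNow would return a different value on every replay
    // var deadline = DateTime.UtcNow.AddHours(24);

    // Replay-safe: the value is taken from the orchestration history
    var deadline = context.CurrentUtcDateTime.AddHours(24);

    await context.CallActivityAsync("ScheduleReminder", deadline);
}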

Event Sourcing

Azure Functions are stateless, while workflows require a state to keep track of their progress. Every time a new action towards the workflow's execution happens, the framework automatically records an event in table storage.

Whenever an orchestrator restarts the execution because a new message arrives from its activity, it loads the complete history of this particular execution from storage. Durable Context uses this history to make decisions whether to call the activity or return the previously stored result.

The pattern of storing the complete history of state changes as an append-only event store is known as Event Sourcing. Event store provides several benefits:

  • Durability—if a host running an orchestration fails, the history is retained in persistent storage and is loaded by the new host where the orchestration restarts;
  • Scalability—append-only writes are fast and easy to spread over multiple storage servers;
  • Observability—no history is ever lost, so it's straightforward to inspect and analyze even after the workflow is complete.

Here is an illustration of the notable events that get recorded during our sequential workflow:

Durable Functions: Event Sourcing

Log of events in the course of orchestrator progression

Billing

Azure Functions on the serverless consumption-based plan are billed per execution + per duration of execution.

The stop-replay behavior of durable orchestrators causes the single workflow "instance" to execute the same orchestrator function multiple times. This also means paying for several short executions.

However, the total bill usually ends up being much lower compared to the potential cost of blocking synchronous calls to activities. The price of 5 executions of 100 ms each is significantly lower than the cost of 1 execution of 30 seconds.
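
For a rough back-of-the-envelope comparison, assume both variants are billed at the minimum 128 MB memory size (consumption billing multiplies execution duration by memory consumption, plus a small per-execution fee):

5 executions * 0.1 s * 0.125 GB = 0.0625 GB-s
1 execution * 30 s * 0.125 GB = 3.75 GB-s

That's a 60x difference in the duration component, while the four extra per-execution charges are negligible in comparison.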

By the way, the first million executions per month are at no charge, so many scenarios incur no cost at all from Azure Functions service.

Another cost component to keep in mind is Azure Storage. Queues and Tables that are used behind the scenes are charged to the end customer. In my experience, this charge remains close to zero for low- to medium-load applications.

Beware of unintentional eternal loops or indefinite recursive fan-outs in your orchestrators. Those can get expensive if left unchecked.

Error-handling and retries

What happens when an error occurs somewhere in the middle of the workflow? For instance, a third-party flight booking service might not be able to process the request:

Error Handling

One activity is unhealthy

This situation is expected by Durable Functions. Instead of silently failing, the activity function sends a message containing the information about the error back to the orchestrator.

The orchestrator deserializes the error details and, at the time of replay, throws a .NET exception from the corresponding call. The developer is free to put a try .. catch block around the call and handle the exception:

[FunctionName("SequentialWorkflow")]
public static async Task Sequential([OrchestrationTrigger] DurableOrchestrationContext context)
{
    var conf = await context.CallActivityAsync<ConfTicket>("BookConference", "ServerlessDays");
    FlightTickets flight;
    try
    {
        var itinerary = MakeItinerary(/* ... */);
        flight = await context.CallActivityAsync<FlightTickets>("BookFlight", itinerary);
    }
    catch (FunctionFailedException)
    {
        // The fallback: book an alternative itinerary instead
        var alternativeItinerary = MakeAnotherItinerary(/* ... */);
        flight = await context.CallActivityAsync<FlightTickets>("BookFlight", alternativeItinerary);
    }
    await context.CallActivityAsync("BookHotel", flight.Dates);
}

The code above falls back to a "backup plan" of booking another itinerary. Another typical pattern would be to run a compensating activity to cancel the effects of any previous actions (un-book the conference in our case) and leave the system in a clean state.
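
A minimal sketch of that compensation pattern, reusing the activities from this article (CancelConference is a hypothetical counterpart to BookConference), could look like this:

[FunctionName("CompensatingWorkflow")]
public static async Task Compensating([OrchestrationTrigger] DurableOrchestrationContext context)
{
    var conf = await context.CallActivityAsync<ConfTicket>("BookConference", "ServerlessDays");
    try
    {
        await context.CallActivityAsync("BookFlight", MakeItinerary(/* ... */));
    }
    catch (FunctionFailedException)
    {
        // Undo the already completed step to leave the system in a clean state
        await context.CallActivityAsync("CancelConference", conf.Code);
        throw; // fail the whole orchestration
    }
}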

Quite often, the error might be transient, so it might make sense to retry the failed operation after a pause. It's such a common scenario that Durable Functions provides a dedicated API:

var options = new RetryOptions(
    firstRetryInterval: TimeSpan.FromMinutes(1),                    
    maxNumberOfAttempts: 5);
options.BackoffCoefficient = 2.0;

await context.CallActivityWithRetryAsync("BookFlight", options, itinerary);

The above code instructs the library to

  • Retry up to 5 times
  • Wait for 1 minute before the first retry
  • Increase delays before every subsequent retry by a factor of 2 (1 min, 2 min, 4 min, etc.)

The significant point is that, once again, the orchestrator does not block while awaiting retries. After a failed call, a message is scheduled for the moment in the future to re-run the orchestrator and retry the call.

Sub-orchestrators

Business processes may consist of numerous steps. To keep the code of orchestrators manageable, Durable Functions allows nested orchestrators. A "parent" orchestrator can call out to child orchestrators via the context.CallSubOrchestratorAsync method:

[FunctionName("CombinedOrchestrator")]
public static async Task CombinedOrchestrator([OrchestrationTrigger] DurableOrchestrationContext context)
{
    await context.CallSubOrchestratorAsync("BookTrip", serverlessDaysAmsterdam);
    await context.CallSubOrchestratorAsync("BookTrip", serverlessDaysHamburg);
}

The code above books two conferences, one after the other.

Fan-out / Fan-in

What if we want to run multiple activities in parallel?

For instance, in the example above, we could wish to book two conferences, but the booking order might not matter. Still, when both bookings are completed, we want to combine the results to produce an expense report for the finance department:

Parallel Calls

Parallel calls followed by a final step

In this scenario, the BookTrip orchestrator accepts an input parameter with the name of the conference and returns the expense information. ReportExpenses needs to receive both expenses combined.

This goal can be easily achieved by scheduling two tasks (i.e., sending two messages) without awaiting them separately. We use the familiar Task.WhenAll method to await both and combine the results:

[FunctionName("ParallelWorkflow")]
public static async Task Parallel([OrchestrationTrigger] DurableOrchestrationContext context)
{
    // The generic overload is needed so that the sub-orchestrators' results
    // (the expense information, modeled here as a hypothetical Expenses type) flow back
    var amsterdam = context.CallSubOrchestratorAsync<Expenses>("BookTrip", serverlessDaysAmsterdam);
    var hamburg   = context.CallSubOrchestratorAsync<Expenses>("BookTrip", serverlessDaysHamburg);

    var expenses = await Task.WhenAll(amsterdam, hamburg);

    await context.CallActivityAsync("ReportExpenses", expenses);
}

Remember that awaiting the WhenAll method doesn't synchronously block the orchestrator. It quits the first time and then restarts twice, on the reply messages received from the sub-orchestrations. The first restart quits again, and only the second restart makes it past the await.

Task.WhenAll returns an array of results (one result per each input task), which is then passed to the reporting activity.

Another example of parallelization could be a workflow sending e-mails to hundreds of recipients. Such fan-out wouldn't be hard with normal queue-triggered functions: simply send hundreds of messages. However, combining the results, if required for the next step of the workflow, is quite challenging.

It's straightforward with a durable orchestrator:

var emailSendingTasks =
    recipients
    .Select(to => context.CallActivityAsync<bool>("SendEmail", to))
    .ToArray();

var results = await Task.WhenAll(emailSendingTasks);

if (results.All(r => r)) { /* ... */ }

Making hundreds of roundtrips to activities and back could cause numerous replays of the orchestrator. As an optimization, if multiple activity functions complete around the same time, the orchestrator may internally process several messages as a batch and restart the orchestrator function only once per batch.

Other Concepts

There are many more patterns enabled by Durable Functions. Here is a quick list to give you some perspective:

  • Waiting for the first completed task in a collection (rather than all of them) using the Task.WhenAny method. Useful for scenarios like timeouts or competing actions.
  • Pausing the workflow for a given period or until a deadline.
  • Waiting for external events, e.g., bringing human interaction into the workflow.
  • Running recurring workflows, when the flow repeats until a certain condition is met.

Further explanation and code samples are in the docs.
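
To give a taste of how the first three combine, here is a sketch of a human-interaction step with a timeout, following the pattern from the docs. The event name "Approval", the 72-hour deadline, and the ProcessApproval and Escalate activities are made up for this example:

[FunctionName("ApprovalWorkflow")]
public static async Task Approval([OrchestrationTrigger] DurableOrchestrationContext context)
{
    using (var cts = new CancellationTokenSource())
    {
        // A durable timer: replay-safe, fires 72 hours after this point in the workflow
        var deadline = context.CurrentUtcDateTime.AddHours(72);
        var timeout = context.CreateTimer(deadline, cts.Token);

        // Completes when an external "Approval" event is raised for this orchestration instance
        var approval = context.WaitForExternalEvent<bool>("Approval");

        // Wait for whichever happens first: the human responds or the deadline passes
        var winner = await Task.WhenAny(approval, timeout);
        if (winner == approval)
        {
            cts.Cancel(); // the timer is no longer needed
            await context.CallActivityAsync("ProcessApproval", approval.Result);
        }
        else
        {
            await context.CallActivityAsync("Escalate", null);
        }
    }
}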

Conclusion

I firmly believe that serverless applications utilizing a broad range of managed cloud services are highly beneficial to many companies, due to both rapid development process and the properly aligned billing model.

Serverless tech is still young; more high-level architectural patterns need to emerge to enable expressive and composable implementations of large business systems.

Azure Durable Functions suggests some of the possible answers. It combines the clarity and readability of sequential RPC-style code with the power and resilience of event-driven architecture.

The documentation for Durable Functions is excellent, with plenty of examples and how-to guides. Learn it, try it for your real-life scenarios, and let me know your opinion—I'm excited about the serverless future!

Acknowledgments

Many thanks to Katy Shimizu, Chris Gillum, Eric Fleming, KJ Jones, William Liebenberg, Andrea Tosato for reviewing the draft of this article and their valuable contributions and suggestions. The community around Azure Functions and Durable Functions is superb!

From 0 to 1000 Instances: How Serverless Providers Scale Queue Processing

Originally published at Binaris Blog

Whenever I see a "Getting Started with Function-as-a-Service" tutorial, it usually shows off a synchronous HTTP-triggered scenario. In my projects, though, I use a lot of asynchronous functions triggered by a queue or an event stream.

Quite often, the number of messages passing through a queue isn't uniform over time. I might drop batches of work into the queue now and then. My app may get piles of queue items arriving from upstream systems that were down or under maintenance for an extended period. The system might see rush-hour peaks every day or only a few busy days per month.

This is where serverless tech shines: You pay per execution, and then the promise is that the provider takes care of scaling up or down for you. Today, I want to put this scalability under test.

The goal of this article is to explore queue-triggered serverless functions and hopefully distill some practical advice regarding asynchronous functions for real projects. I will be evaluating the problem:

  • Across Big-3 cloud providers (Amazon, Microsoft, Google)
  • For different types of workloads
  • For different performance tiers

Let's see how I did that and what the outcome was.

DISCLAIMER. Performance testing is hard. I might be missing some crucial factors and parameters that influence the outcome. My interpretation might be wrong. The results might change over time. If you happen to know a way to improve my tests, please let me know, and I will re-run them and re-publish the results.

Methodology

In this article I analyze the execution results of the following cloud services:

  • AWS Lambda triggered via SQS queues
  • Azure Function triggered via Storage queues
  • Google Cloud Function triggered via Cloud Pub/Sub

All functions are implemented in JavaScript and run on the generally available (GA) runtimes.

At the beginning of each test, I threw 100,000 messages into a queue that was previously idle. Enqueuing never took longer than one minute (I sent the messages from multiple clients in parallel).

I disabled any batch processing, so each message was consumed by a separate function invocation.

I then analyzed the logs (AWS CloudWatch, Azure Application Insights, and GCP Stackdriver Logging) to generate charts of execution distribution over time.

How Scaling Actually Works

To understand the experiment better, let's look at a very simplistic but still useful model of how cloud providers scale serverless applications.

All providers handle the increased load by scaling out, i.e., by creating multiple instances of the same application that execute the chunks of work in parallel.

In theory, a cloud provider could spin up an instance for each message in the queue as soon as the messages arrive. The backlog processing time would then stay very close to zero.

In practice, allocating instances is not cheap. The Cloud provider has to boot up the function runtime, hit a cold start, and waste expensive resources on a job that potentially will take just a few milliseconds.

So the providers are trying to find a sweet spot between handling the work as soon as possible and using resources efficiently. The outcomes differ, which is the point of my article.

AWS

AWS Lambda defines scale out with a notion of Concurrent Executions. Each instance of your AWS Lambda is handling a single execution at any given time. In our case, it's processing a single SQS message.

It's helpful to think of a function instance as a container working on a single task. If execution pauses or waits for an external I/O operation, the instance is on hold.

The model of concurrent executions is universal to all trigger types supported by Lambdas. An instance doesn't work with event sources directly; it just receives an event to work on.

There is a central element in the system, let's call it "Orchestrator". The Orchestrator is the component talking to an SQS queue and getting the messages from it. It's then the job of the Orchestrator and related infrastructure to provision the required number of instances for working on concurrent executions:

Model of AWS Lambda Scale-Out

Model of AWS Lambda Scale-Out

As to scaling behavior, here is what the official AWS docs say:

AWS Lambda automatically scales up ... until the number of concurrent function executions reaches 1000 ... Amazon Simple Queue Service supports an initial burst of 5 concurrent function invocations and increases concurrency by 60 concurrent invocations per minute.

GCP

The model of Google Cloud Functions is very similar to what AWS does. It runs a single simultaneous execution per instance and routes the messages centrally.

I wasn't able to find any scaling specifics except the definition of Function Quotas.

Azure

Experiments with Azure Functions were run on the Consumption Plan—the dynamically scaled and billed-per-execution runtime. The concurrency model of Azure Functions is different from its AWS and GCP counterparts.

A Function App instance is closer to a VM than to a single-task container. It runs multiple concurrent executions in parallel. Equally importantly, it pulls messages from the queue on its own instead of having them pushed to it by a central Orchestrator.

There is still a central coordinator called Scale Controller, but its role is a bit more subtle. It connects to the same data source (the queue) and needs to determine how many instances to provision based on the metrics from that queue:

Model of Azure Function Scale-Out

Model of Azure Function Scale-Out

This model has pros and cons. If one execution is idle, waiting for some I/O operation such as an HTTP request to finish, the instance might become busy processing other messages, thus being more efficient. Running multiple executions is useful in terms of shared resource utilization, e.g., keeping database connection pools and reusing HTTP connections.

On the flip side, the Scale Controller now needs to be smarter: to know not only the queue backlog but also how instances are doing and at what pace they are processing the messages. It's probably achievable based on queue telemetry though.

Let's start applying this knowledge in practical experiments.

Pause-the-World Workload

My first serverless function aims to simulate I/O-bound workloads without using external dependencies, to keep the experiment clean. Therefore, the implementation is extremely straightforward: pause for 500 ms and return.

It could be loading data from a scalable third-party API. It could be running a database query. Instead, it just runs setTimeout.

I sent 100k messages to queues of all three cloud providers and observed the result.

AWS

AWS Lambda allows multiple instance sizes to be provisioned. Since the workload is neither CPU- nor memory-intensive, I was using the smallest memory allocation of 128 MB.

Here comes the first chart of many, so let's learn to read it. The horizontal axis shows time in minutes since all the messages were sent to the queue.

The line going from top-left to bottom-right shows the decreasing queue backlog. Accordingly, the left vertical axis denotes the number of items still-to-be-handled.

The bars show the number of concurrent executions crunching the messages at a given time. Every execution logs the instance ID so that I could derive the instance count from the logs. The right vertical axis shows the instance number.

AWS Lambda processing 100k SQS messages with "Pause" handler

AWS Lambda processing 100k SQS messages with "Pause" handler

It took AWS Lambda 5.5 minutes to process the whole batch of 100k messages. For comparison, the same batch processed sequentially would take about 14 hours.

Notice how linear the growth of instance count is. If I apply the official scaling formula:

Instance Count = 5 + Minutes * 60 = 5 + 5.5 * 60  = 335

We get a very close result! Promises kept.

GCP

Same function, same chart, same instance size of 128 MB of RAM—but this time for Google Cloud Functions:

Google Cloud Function processing 100k Pub/Sub messages with "Pause" handler

Google Cloud Function processing 100k Pub/Sub messages with "Pause" handler

Coincidentally, the total number of instances in the end was very close to AWS's. The scaling pattern looks entirely different though: Within the very first minute, there was a burst of scaling up to roughly 300 instances, and then the growth became very modest.

Thanks to this initial jump, GCP managed to finish processing almost one minute earlier than AWS.

Azure

Azure Function doesn't have a configuration for allocated memory or any other instance size parameters.

The shape of the chart for Azure Functions is very similar, but the instance number growth is significantly different:

Azure Function processing 100k queue messages with "Pause" handler

Azure Function processing 100k queue messages with "Pause" handler

The total processing time was a bit faster than AWS and somewhat slower than GCP. Azure Function instances process several messages in parallel, so it takes far fewer of them to do the same amount of work.

Instance number growth seems far more linear than bursty.

What we learned

Based on this simple test, it's hard to say if one cloud provider handles scale-out better than the others.

It looks like all serverless platforms under stress are making decisions at the resolution of 5-15 seconds, so the backlog processing delays are likely to be measured in minutes. It sounds quite far from the theoretical "close to zero" target but is most likely good enough for the majority of applications.

Crunching Numbers

That was an easy job though. Let's give cloud providers a hard time by executing CPU-heavy workloads and see if they survive!

This time, each message handler calculates a Bcrypt hash with a cost of 10. One such calculation takes about 200 ms on my laptop.

AWS

Once again, I sent 100k messages to an SQS queue and recorded the processing speed and instance count.

Since the workload is CPU-bound, and AWS allocates CPU shares proportionally to the allocated memory, the instance size might have a significant influence on the result.

I started with the smallest memory allocation of 128 MB:

AWS Lambda (128 MB) processing 100k SQS messages with "Bcrypt" handler

AWS Lambda (128 MB) processing 100k SQS messages with "Bcrypt" handler

This time it took almost 10 minutes to complete the experiment.

The scaling shape is pretty much the same as last time, still correctly described by the formula 60 * Minutes + 5. However, because AWS allocates a small fraction of a full CPU to each 128 MB execution, one message takes around 1,700 ms to complete. Thus, the total work increased approximately by a factor of 3 (47 hours if done sequentially).

At the peak, 612 concurrent executions were running, nearly double the amount in our initial experiment. So, the total processing time increased only by the factor of 2—up to 10 minutes.

Let's see if larger Lambda instances would improve the outcome. Here is the chart for 512 MB of allocated memory:

AWS Lambda (512 MB) processing 100k SQS messages with "Bcrypt" handler

AWS Lambda (512 MB) processing 100k SQS messages with "Bcrypt" handler

And yes it does. The average execution duration is down to 400 ms: 4 times less, as expected. The scaling shape still holds, so the entire batch was done in less than four minutes.

GCP

I executed the same experiment on Google Cloud Functions. I started with 128 MB, and it looks impressive:

Google Cloud Function (128 MB) processing 100k Pub/Sub messages with "Bcrypt" handler

Google Cloud Function (128 MB) processing 100k Pub/Sub messages with "Bcrypt" handler

The average execution duration is very close to Amazon's: 1,600 ms. However, GCP scaled more aggressively—to a staggering 1,169 parallel executions! Scaling also has a different shape: It's not linear but grows in steep jumps. As a result, it took less than six minutes on the lowest CPU profile—very close to AWS's time on a 4x more powerful CPU.

What will GCP achieve on a faster CPU? Let's provision 512 MB. It must absolutely crush the test. Umm, wait, look at that:

Google Cloud Function (512 MB) processing 100k Pub/Sub messages with "Bcrypt" handler

Google Cloud Function (512 MB) processing 100k Pub/Sub messages with "Bcrypt" handler

It actually... got slower. Yes, the average execution time is 4x lower: 400 ms, but the scaling got much less aggressive too, which canceled the speedup.

I confirmed it with the largest instance size of 2,048 MB:

Google Cloud Function (2 GB) processing 100k Pub/Sub messages with "Bcrypt" handler

Google Cloud Function (2 GB) processing 100k Pub/Sub messages with "Bcrypt" handler

CPU is fast: 160 ms average execution time, but the total time to process 100k messages went up to eight minutes. Beyond the initial spike at the first minute, it failed to scale up any further and stayed at about 110 concurrent executions.

It seems that GCP is not that keen to scale out larger instances. It's probably easier to find many small instances available in the pool than a similar number of giant ones.

Azure

A single invocation takes about 400 ms to complete on Azure Function. Here is the burndown chart:

Azure Function processing 100k queue messages with "Bcrypt" handler

Azure Function processing 100k queue messages with "Bcrypt" handler

Azure spent 21 minutes processing the whole backlog. The scaling was linear, similar to AWS, but with a much slower pace of instance count growth: about 2.5 * Minutes.

As a reminder, each instance could process multiple queue messages in parallel, but each such execution would be competing for the same CPU resource, which doesn't help for the purely CPU-bound workload.

Practical Considerations

Time for some conclusions and pieces of advice to apply in real serverless applications.

Serverless is great for async data processing

If you are already using cloud services, such as managed queues and topics, serverless functions are the easiest way to consume them.

Moreover, the scalability is there too. When was the last time you ran 1,200 copies of your application?

Serverless is not infinitely scalable

There are limits. Your functions won't scale perfectly to accommodate your spike—a provider-specific algorithm will determine the scaling pattern.

If you have large spikes in queue workloads, which is quite likely for medium- to high-load scenarios, you can and should expect delays up to several minutes before the backlog is fully digested.

All cloud providers have quotas and limits that define an upper boundary of scalability.

Cloud providers have different implementations

AWS Lambda seems to have a very consistent and well-documented linear scale growth for SQS-triggered Lambda functions. It will happily scale to 1,000 instances, or whatever other limit you hit first.

Google Cloud Functions has the most aggressive scale-out strategy for the smallest instance sizes. It can be a cost-efficient and scalable way to run your queue-based workloads. Larger instances seem to scale in a more limited way, so further investigation is required if you need them.

Azure Functions share instances for multiple concurrent executions, which works better for I/O-bound workloads than for CPU-bound ones. Depending on the exact scenario that you have, it might help to play with instance-level settings.

Don't forget batching

For the tests, I was handling queue messages one by one. In practice, it helps if you can batch several messages together and execute a single action for all of them in one go.

If the destination for your data supports batched operations, the throughput will usually increase immensely. Processing 100,000 Events Per Second on Azure Functions is an excellent case to prove the point.

You might get too much scale

A month ago, Troy Hunt published a great post, Breaking Azure Functions with Too Many Connections. His scenario looks very familiar: He uses queue-triggered Azure Functions to notify subscribers about data breaches. One day, he dropped 126 million items into the queue, and Azure scaled out, which overloaded Mozilla's servers and caused them to start timing out.

Another consideration is that non-serverless dependencies limit the scalability of your serverless application. If you call a legacy HTTP endpoint, a SQL database, or a third-party web service—be sure to test how they react when your serverless function scales out to hundreds of concurrent executions.

Stay tuned for more serverless performance goodness!

I'm Mikhail Shilkov, a software developer and architect, a Microsoft Azure MVP, and a Russian expat living in the Netherlands. I am passionate about cloud technologies, functional programming, and the intersection of the two.
