AWS Lambda Warmer as Pulumi Component

Out of curiosity, I'm currently investigating cold starts of Function-as-a-Service platforms of major cloud providers. Basically, if a function is not called for several minutes, the cloud instance behind it might be recycled, and then the next request will take longer because a new instance will need to be provisioned.

Recently, Jeremy Daly posted a nice article about the proper way to keep AWS Lambda instances "warm" to (mostly) prevent cold starts with minimal overhead. Chris Munns endorsed the article, so we know it's the right way.

The number of steps involved is quite significant:

  • Define a CloudWatch event which would fire every 5 minutes
  • Bind this event as another trigger for your Lambda
  • Inside the Lambda, detect whether the current invocation was triggered by our CloudWatch event
  • If so, short-circuit the execution and return immediately; otherwise, run the normal workload
  • (Bonus point) If you want to keep multiple instances alive, have the function call itself N times in parallel, which requires an extra permission to do so.
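
As a rough sketch of the detection and short-circuit steps: a CloudWatch scheduled event arrives with its source set to "aws.events" and the ARN of the triggering rule in resources, so a handler can check for that. The names IncomingEvent, isWarmupEvent and handle are made up for this illustration.

```typescript
// Minimal shape of the fields we look at; real events carry more data.
interface IncomingEvent {
    source?: string;
    resources?: string[];
    [key: string]: unknown;
}

// Hypothetical helper: true when the invocation comes from the warming rule.
// Assumes a CloudWatch scheduled event, which sets source to "aws.events"
// and lists the ARN of the triggering rule in `resources`.
function isWarmupEvent(event: IncomingEvent, ruleName: string): boolean {
    if (event.source !== "aws.events" || !event.resources) {
        return false;
    }
    return event.resources.some(arn => arn.indexOf(ruleName) >= 0);
}

// Hypothetical handler: short-circuit on warm-up, otherwise do the real work.
function handle(event: IncomingEvent, ruleName: string): string {
    if (isWarmupEvent(event, ruleName)) {
        return "warmed";
    }
    return "real work done";
}
```

The rule name has to be known to the function code, which is exactly the kind of coupling between code and infrastructure discussed below.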

Pursuing Reusability

To simplify this for his readers, Jeremy was kind enough to

  • Create an NPM package which you can install and then call from a function-to-be-warmed
  • Provide SAM and Serverless Framework templates to automate the CloudWatch integration

Those are still two distinct steps: writing the code (JS + NPM) and provisioning the cloud resources (YAML + CLI). There are some drawbacks to that:

  • You need to change two parts, which don't look like each other
  • They have to work in sync, e.g. the CloudWatch event must provide the right payload for the handler
  • There's still some boilerplate for every new Lambda

Pulumi Components

Pulumi takes a different approach. You can blend the application code and infrastructure management code into one cohesive cloud application.

Related resources can be combined together into reusable components, which hide repetitive stuff behind code abstractions.

One way to define an AWS Lambda with TypeScript in Pulumi is the following:

const handler = (event: any, context: any, callback: (error: any, result: any) => void) => {
    const response = {
        statusCode: 200,
        body: "Cheers, how are things?"
      };

    callback(null, response);
};

const lambda = new aws.serverless.Function("my-function", { /* options */ }, handler);

The processing code handler is just passed to infrastructure code as a parameter.

So, if I wanted to make a reusable API for an "always warm" function, what would it look like?

From the client code perspective, I just want to be able to do the same thing:

const lambda = new mylibrary.WarmLambda("my-warm-function", { /* options */ }, handler);

CloudWatch? Event subscription? Short-circuiting? They are implementation details!

Warm Lambda

Here is how to implement such a component. The declaration starts with a TypeScript class:

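export class WarmLambda extends pulumi.ComponentResource {
    public lambda: aws.lambda.Function;
    // Assigned in the constructor when the CloudWatch schedule is wired to the Lambda
    public subscription: serverless.cloudwatch.CloudwatchEventSubscription;

    // Implementation goes here...
}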

We expose the raw Lambda Function object, so that it could be used for further bindings and retrieving outputs.

The constructor accepts the same parameters as aws.serverless.Function provided by Pulumi:

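constructor(name: string,
        options: aws.serverless.FunctionOptions,
        handler: aws.serverless.Handler,
        opts?: pulumi.ResourceOptions) {

    // Register the component itself; "samples:WarmLambda" is its type token
    super("samples:WarmLambda", name, {}, opts);

    // Subresources are created here...
}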

We start resource provisioning by creating the CloudWatch rule to be triggered every 5 minutes:

const eventRule = new aws.cloudwatch.EventRule(`${name}-warming-rule`, 
    { scheduleExpression: "rate(5 minutes)" },
    { parent: this, ...opts }
);

Then comes the cool trick. We substitute the user-provided handler with our own "outer" handler. This handler closes over eventRule, so it can use the rule to identify the warm-up event coming from CloudWatch. If such an event is identified, the handler short-circuits and invokes the callback right away. Otherwise, it passes the event over to the original handler:

const outerHandler = (event: any, context: aws.serverless.Context, callback: (error: any, result: any) => void) =>
{
    if (event.resources && event.resources[0] && event.resources[0].includes(eventRule.name.get())) {
        console.log('Warming...');
        callback(null, "warmed!");
    } else {
        console.log('Running the real handler...');
        handler(event, context, callback);
    }
};

That's a great example of synergy enabled by doing both application code and application infrastructure in a single program. I'm free to mix and match objects from both worlds.

It's time to bind both eventRule and outerHandler to a new serverless function:

const func = new aws.serverless.Function(
    `${name}-warmed`, 
    options, 
    outerHandler, 
    { parent: this, ...opts });
this.lambda = func.lambda;            

Finally, I create an event subscription from the CloudWatch schedule to the Lambda:

this.subscription = new serverless.cloudwatch.CloudwatchEventSubscription(
    `${name}-warming-subscription`, 
    eventRule,
    this.lambda,
    { },
    { parent: this, ...opts });

And that's all we need for now! See the full code here.

Here is the output of the pulumi update command for my sample "warm" lambda application:

     Type                                                      Name                            Plan
 +   pulumi:pulumi:Stack                                       WarmLambda-WarmLambda-dev       create
 +    samples:WarmLambda                                       i-am-warm                       create
 +      aws-serverless:cloudwatch:CloudwatchEventSubscription  i-am-warm-warming-subscription  create
 +        aws:lambda:Permission                                i-am-warm-warming-subscription  create
 +        aws:cloudwatch:EventTarget                           i-am-warm-warming-subscription  create
 +      aws:cloudwatch:EventRule                               i-am-warm-warming-rule          create
 +      aws:serverless:Function                                i-am-warm-warmed                create
 +         aws:lambda:Function                                 i-am-warm-warmed                create

That's 7 Pulumi resources, 4 of which are actual AWS cloud resources, all provisioned by a single new WarmLambda() line.

Multi-Instance Warming

Jeremy's library supports warming several instances of Lambda by issuing parallel self-calls.

Reproducing the same with a Pulumi component should be fairly straightforward:

  • Add an extra constructor option to accept the number of instances to keep warm
  • Add a permission to call Lambda from itself
  • Fire N calls when the warming event is triggered
  • Short-circuit those calls in each instance
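
A rough sketch of how the handler side of these steps might be structured. The payload shape ({ warmer, fanOut }) and the names WarmupPayload, WarmupPlan and planWarmup are invented for this illustration and are not taken from Jeremy's library. The actual self-invocation (for example through the AWS SDK) is left out; the function only decides what should happen. It also assumes that the instance handling the scheduled event counts as one of the N, so it plans N - 1 extra calls.

```typescript
// Hypothetical payload marking an invocation as a warm-up call.
interface WarmupPayload {
    warmer: true;
    // true on the scheduled event, false on the self-calls it fans out to
    fanOut: boolean;
}

type WarmupPlan =
    | { kind: "run-handler" }                         // normal request
    | { kind: "short-circuit" }                       // secondary warm-up call
    | { kind: "fan-out"; payloads: WarmupPayload[] }; // primary warm-up call

// Decide what to do with an incoming event, given how many instances to keep warm.
function planWarmup(event: { warmer?: boolean; fanOut?: boolean }, instances: number): WarmupPlan {
    if (!event.warmer) {
        return { kind: "run-handler" };
    }
    if (!event.fanOut || instances <= 1) {
        return { kind: "short-circuit" };
    }
    // Assumption: the current instance is already warm, so N - 1 extra calls are enough.
    const payloads: WarmupPayload[] = [];
    for (let i = 0; i < instances - 1; i++) {
        payloads.push({ warmer: true, fanOut: false });
    }
    return { kind: "fan-out", payloads };
}
```

For a "fan-out" plan, the handler would send all payloads to itself in parallel and return once they complete; each of those calls then hits the "short-circuit" branch.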

Note that only the first item would be visible to the client code. That's the power of componentization and code reuse.

I didn't need multi-instance warming, so I'll leave the implementation as an exercise for the reader.

Conclusion

Obligatory note: most probably, you don't need to add warming to your AWS Lambdas.

But whatever advanced scenario you might have, it's likely easier to express it as a general-purpose reusable component than as a set of guidelines or templates.

Happy hacking!

Getting Started with AWS Lambda in Pulumi

For a small research project of mine, I needed to create HTTP-triggered AWS Lambdas in all supported programming languages.

I'm not a power AWS user, so I get easily confused about the configuration of things like IAM roles or API Gateway. Moreover, I wanted my environment to be reproducible, so clicking through the AWS Console manually wasn't a good option.

I decided it was a good job for Pulumi. They pay a lot of attention to serverless and especially AWS Lambda, and I love the power of configuration as code.

I created a Pulumi program which provisions Lambdas running on JavaScript, .NET, Python, Java and Go. The Pulumi program itself is written in JavaScript.

I'm describing the resulting code below in case folks need to do the same thing. The code itself is on my GitHub.

JavaScript

Probably, the vast majority of Pulumi + AWS Lambda users will be using JavaScript as the programming language for their serverless functions.

No wonder that this scenario is the easiest to start with. There is a high-level package @pulumi/cloud-aws which hides all the AWS machinery from a developer.

The simplest function will consist of just several lines:

const cloud = require("@pulumi/cloud-aws");

const api = new cloud.API("aws-hellolambda-js");
api.get("/js", (req, res) => {
    res.status(200).json("Hi from Javascript lambda");
});

exports.endpointJs = api.publish().url;

Configure your Pulumi stack, run pulumi update and a Lambda is up, running and accessible via HTTP.

.NET Core

.NET is my default development environment and AWS Lambda supports .NET Core as an execution runtime.

The Pulumi program is still JavaScript, so it can't mix C# code in. Thus, the setup looks like this:

  • There is a .NET Core 2.0 application written in C# and utilizing Amazon.Lambda.* NuGet packages
  • I build and publish this application with the dotnet CLI
  • Pulumi then utilizes the published binaries to create deployment artifacts

The C# function looks like this:

public class Functions
{
    public async Task<APIGatewayProxyResponse> GetAsync(APIGatewayProxyRequest request, ILambdaContext context)
    {
        return new APIGatewayProxyResponse
        {
            StatusCode = (int)HttpStatusCode.OK,
            Body = "\"Hi from C# Lambda\"",
            Headers = new Dictionary<string, string> { { "Content-Type", "application/json" } }
        };
    }
}

For non-JavaScript lambdas I use the @pulumi/aws package. It is lower-level than @pulumi/cloud-aws, so I had to set up IAM first:

const aws = require("@pulumi/aws");

const policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Principal": {
                "Service": "lambda.amazonaws.com",
            },
            "Effect": "Allow",
            "Sid": "",
        },
    ],
};
const role = new aws.iam.Role("precompiled-lambda-role", {
    assumeRolePolicy: JSON.stringify(policy),
});

And then I did a raw definition of AWS Lambda:

const pulumi = require("@pulumi/pulumi");

const csharpLambda = new aws.lambda.Function("aws-hellolambda-csharp", {
    runtime: aws.lambda.DotnetCore2d0Runtime,
    code: new pulumi.asset.AssetArchive({
        ".": new pulumi.asset.FileArchive("./csharp/bin/Debug/netcoreapp2.0/publish"),
    }),
    timeout: 5,
    handler: "app::app.Functions::GetAsync",
    role: role.arn
});

Note the path to the publish folder, which should match the path created by dotnet publish, and the handler name, which should match the C# class and method.

Finally, I used @pulumi/aws-serverless to define an API Gateway endpoint for the lambda:

const serverless = require("@pulumi/aws-serverless");

const precompiledApi = new serverless.apigateway.API("aws-hellolambda-precompiledapi", {
    routes: [
        { method: "GET", path: "/csharp", handler: csharpLambda },
    ],
});

That's definitely more ceremony compared to the JavaScript version. But hey, it's code, so if you find yourself repeating the same code, go ahead and make a higher-level component out of it, encapsulating the repetitive logic.

Python

Pulumi supports Python as a scripting language, but I'm sticking to JavaScript for a uniform experience.

In this case, the flow is similar to .NET but simpler: no compilation step is required. Just define a handler.py:

def handler(event, context): 
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': '"Hi from Python lambda"'
    }

and package it into a zip archive in the AWS Lambda definition:

const pythonLambda = new aws.lambda.Function("aws-hellolambda-python", {
    runtime: aws.lambda.Python3d6Runtime,
    code: new pulumi.asset.AssetArchive({
        ".": new pulumi.asset.FileArchive("./python"),
    }),
    timeout: 5,
    handler: "handler.handler",
    role: role.arn
});

I'm reusing the role definition from above. The API definition will also be the same as for .NET.

Go

Golang is a compiled language, so the approach is similar to .NET: write code, build, reference the built artifact from Pulumi.

My Go function looks like this:

func Handler(request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {

 return events.APIGatewayProxyResponse{
  Body:       "\"Hi from Golang lambda\"",
  StatusCode: 200,
 }, nil

}

Because I'm on Windows but AWS Lambda runs on Linux, I had to use the build-lambda-zip tool to make the package compatible. Here is the PowerShell build script:

$env:GOOS = "linux"
$env:GOARCH = "amd64"
go build -o main main.go
~\Go\bin\build-lambda-zip.exe -o main.zip main

and the Pulumi function definition:

const golangLambda = new aws.lambda.Function("aws-hellolambda-golang", {
    runtime: aws.lambda.Go1dxRuntime,
    code: new pulumi.asset.FileArchive("./go/main.zip"),
    timeout: 5,
    handler: "main",
    role: role.arn
});

Java

The Java class implements an interface from the AWS SDK:

public class Hello implements RequestStreamHandler {

    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {

        JSONObject responseJson = new JSONObject();

        responseJson.put("isBase64Encoded", false);
        responseJson.put("statusCode", "200");
        responseJson.put("body", "\"Hi from Java lambda\"");  

        OutputStreamWriter writer = new OutputStreamWriter(outputStream, "UTF-8");
        writer.write(responseJson.toJSONString());  
        writer.close();
    }
}

I compiled this code with Maven (mvn package), which produced a jar file. AWS Lambda accepts a jar directly, but Pulumi's FileArchive unfortunately crashed when trying to read it.

As a workaround, I had to define a zip archive with the jar placed inside a lib folder:

const javaLambda = new aws.lambda.Function("aws-coldstart-java", {
    code: new pulumi.asset.AssetArchive({
        "lib/lambda-java-example-1.0-SNAPSHOT.jar": new pulumi.asset.FileAsset("./java/target/lambda-java-example-1.0-SNAPSHOT.jar"),
    }),
    runtime: aws.lambda.Java8Runtime,
    timeout: 5,
    handler: "example.Hello",
    role: role.arn
});

Conclusion

The complete code for 5 lambda functions in 5 different programming languages can be found in my GitHub repository.

Running pulumi update provisions 25 AWS resources in about a minute, so I can start playing with my test lambdas in no time.

And the best part: when I don't need them anymore, I run pulumi destroy and my AWS Console is clean again!

Happy serverless moments!

Monads explained in C# (again)

I love functional programming for the simplicity that it brings.

But at the same time, I realize that learning functional programming is a challenging process. FP comes with a baggage of unfamiliar vocabulary that can be daunting for somebody coming from an object-oriented language like C#.

Functional Programming Word Cloud

Some of the functional lingo

"Monad" is probably the most infamous term from the list above. Monads have a reputation of being something very abstract and very confusing.

The Fallacy of Monad Tutorials

Numerous attempts have been made to explain monads in simple terms, and monad tutorials have become a genre of their own. And yet, time and time again, they fail to enlighten their readers.

The shortest explanation of monads looks like this:

A Monad is just a monoid in the category of endofunctors

It's both mathematically correct and totally useless to anybody learning functional programming. To understand this statement, one has to know the terms "monoid", "category" and "endofunctors" and be able to mentally compose them into something meaningful.

The same problem is apparent in most monad tutorials. They assume some pre-existing knowledge in the heads of their readers, and if that assumption fails, the tutorial doesn't click.

Focusing too much on mechanics of monads instead of explaining why they are important is another common problem.

Douglas Crockford grasped this fallacy very well:

The monadic curse is that once someone learns what monads are and how to use them, they lose the ability to explain them to other people

The problem here is likely the following. Every person who understands monads had their own path to this knowledge. It didn't come all at once; instead, there was a series of steps, each giving an insight, until the final step made the puzzle complete.

But they don't remember the whole path anymore. They go online and blog about that very last step as the key to understanding, joining the club of flawed explanations.

There is an actual academic paper from Tomas Petricek that studies monad tutorials.

I've read that paper and a dozen monad tutorials online. And of course, now I have come up with my own.

I'm probably doomed to fail too, at least for some readers. Yet, I know that many people found the previous version of this article useful.

I based my explanation on examples from C# - the object-oriented language familiar to .NET developers.

Story of Composition

The basic building block of every functional program is a function. In typed languages, each function is just a mapping from the type of its input parameter to the type of its output. Such a type can be annotated as func: TypeA -> TypeB.

C# is an object-oriented language, so we use methods to declare functions. There are two ways to define a method comparable to the func function above. I can use a static method:

static class Mapper 
{
    static ClassB func(ClassA a) { ... }
}

... or an instance method:

class ClassA 
{
    // Instance method
    ClassB func() { ... }
}

The static form looks closer to the function annotation, but both ways are actually equivalent for the purpose of our discussion. I will use instance methods in my examples; however, all of them could be written as static extension methods too.

How do we compose more complex workflows, programs and applications out of such simple building blocks? A lot of patterns both in OOP and FP worlds revolve around this question. And monads are one of the answers.

My sample code is going to be about conferences and speakers. The method implementations aren't really important, just watch the types carefully. There are 4 classes (types) and 3 methods (functions):

class Speaker 
{
    Talk NextTalk() { ... }
}

class Talk 
{
    Conference GetConference() { ... }
}

class Conference 
{
    City GetCity() { ... }
}

class City { ... }

These methods are currently very easy to compose into a workflow:

static City NextTalkCity(Speaker speaker) 
{
    Talk talk = speaker.NextTalk();
    Conference conf = talk.GetConference();
    City city = conf.GetCity();
    return city;
}

Because the return type of the previous step always matches the input type of the next step, we can write it even shorter:

static City NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()
        .GetConference()
        .GetCity();
}

This code looks quite readable. It's concise and it flows from top to bottom, from left to right, similar to how we are used to reading text. There is not much noise either.

That's not what real codebases look like though, because there are multiple complications along the happy composition path. Let's look at some of them.

NULLs

Any class instance in C# can be null. In the example above I might get runtime errors if one of the methods ever returns null.

Typed functional programming always tries to be explicit about types, so I'll re-write the signatures of my methods to annotate the return types as nullables:

class Speaker 
{
    Nullable<Talk> NextTalk() { ... }
}

class Talk 
{
    Nullable<Conference> GetConference() { ... }
}

class Conference 
{
    Nullable<City> GetCity() { ... }
}

class City { ... }

This is actually invalid syntax in the current C# version, because Nullable<T> and its short form T? are not applicable to reference types. This might change in C# 8 though, so bear with me.

Now, when composing our workflow, we need to take care of null results:

static Nullable<City> NextTalkCity(Speaker speaker) 
{
    Nullable<Talk> talk = speaker.NextTalk();
    if (talk == null) return null;

    Nullable<Conference> conf = talk.GetConference();
    if (conf == null) return null;

    Nullable<City> city = conf.GetCity();
    return city;
}

It's still the same method, but it has more noise now. Even though I used short-circuit returns and one-liners, it still became harder to read.

To fight that problem, smart language designers came up with the Null Propagation Operator:

static Nullable<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        ?.NextTalk()
        ?.GetConference()
        ?.GetCity();
}

Now we are almost back to our original workflow code: it's clean and concise, we just got 3 extra ? symbols around.
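
The same operator exists in TypeScript, so the short-circuiting behavior can be tried out in a few lines. This is only a sketch with made-up sample data: the classes mirror the C# ones above, and a missing value is modeled as undefined rather than null.

```typescript
class City {
    constructor(public name: string) {}
}

class Conference {
    constructor(private city?: City) {}
    getCity(): City | undefined { return this.city; }
}

class Talk {
    constructor(private conference?: Conference) {}
    getConference(): Conference | undefined { return this.conference; }
}

class Speaker {
    constructor(private talk?: Talk) {}
    nextTalk(): Talk | undefined { return this.talk; }
}

// Each ?. stops the chain and yields undefined as soon as a step has no value.
function nextTalkCity(speaker?: Speaker): City | undefined {
    return speaker
        ?.nextTalk()
        ?.getConference()
        ?.getCity();
}

const busy = new Speaker(new Talk(new Conference(new City("Amsterdam"))));
const idle = new Speaker(); // no talk planned

console.log(nextTalkCity(busy)?.name); // Amsterdam
console.log(nextTalkCity(idle)?.name); // undefined
```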

Let's take another leap.

Collections

Quite often a function returns a collection of items, not just a single item. To some extent, that's a generalization of the null case: with Nullable<T> we might get 0 or 1 results back, while with a collection we can get 0 to any n results.

Our sample API could look like this:

class Speaker 
{
    List<Talk> GetTalks() { ... }
}

class Talk 
{
    List<Conference> GetConferences() { ... }
}

class Conference 
{
    List<City> GetCities() { ... }
}

I used List<T>, but it could be any collection class or the plain IEnumerable<T> interface.

How would we combine the methods into one workflow? A traditional version would look like this:

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    var result = new List<City>();

    foreach (Talk talk in speaker.GetTalks())
        foreach (Conference conf in talk.GetConferences())
            foreach (City city in conf.GetCities())
                result.Add(city);

    return result;
}

It reads ok-ish still. But the combination of nested loops and mutation with some conditionals sprinkled on them can get unreadable pretty soon. The exact workflow might be lost in the mechanics.

As an alternative, C# language designers invented LINQ extension methods. We can write code like this:

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    return 
        speaker
        .GetTalks()
        .SelectMany(talk => talk.GetConferences())
        .SelectMany(conf => conf.GetCities())
        .ToList();
}
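
To see what each SelectMany step does to the data, here is a small TypeScript sketch. The selectMany helper is a hand-written stand-in for the LINQ method (it is not a built-in), and the talks, conferences and cities are made-up sample data.

```typescript
// Hypothetical stand-in for LINQ's SelectMany on plain arrays:
// map every item to an array and concatenate the results.
function selectMany<T, U>(items: T[], selector: (item: T) => U[]): U[] {
    const result: U[] = [];
    for (const item of items) {
        for (const sub of selector(item)) {
            result.push(sub);
        }
    }
    return result;
}

// Made-up sample data: talks -> conferences -> cities.
const conferencesByTalk: { [talk: string]: string[] } = {
    "Monads in C#": ["FunctionalConf", "DotNetDays"],
    "Warm Lambdas": ["CloudSummit"],
};
const citiesByConference: { [conference: string]: string[] } = {
    "FunctionalConf": ["Oslo", "London"],
    "DotNetDays": ["Prague"],
    "CloudSummit": ["Amsterdam"],
};

const talks = ["Monads in C#", "Warm Lambdas"];
const conferences = selectMany(talks, talk => conferencesByTalk[talk] || []);
const cities = selectMany(conferences, conf => citiesByConference[conf] || []);

console.log(cities); // [ 'Oslo', 'London', 'Prague', 'Amsterdam' ]
```

Each step turns one item into many and flattens the result, so the final list is flat no matter how many steps are chained.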

Let me do one further trick and format the same code in an unusual way:

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    return 
        speaker
        .GetTalks()           .SelectMany(x => x
        .GetConferences()    ).SelectMany(x => x
        .GetCities()         ).ToList();
}

Now you can see the same original code on the left, combined with just a bit of technical repeatable clutter on the right. Hold on, I'll show you where I'm going.

Let's discuss another possible complication.

Asynchronous Calls

What if our methods need to access some remote database or service to produce the results? This should be shown in the type signature, and C# has Task<T> for that:

class Speaker 
{
    Task<Talk> NextTalk() { ... }
}

class Talk 
{
    Task<Conference> GetConference() { ... }
}

class Conference 
{
    Task<City> GetCity() { ... }
}

This change breaks our nice workflow composition again.

We'll get back to async-await later, but the original way to combine Task-based methods was to use the ContinueWith and Unwrap APIs:

static Task<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()
        .ContinueWith(talk => talk.Result.GetConference())
        .Unwrap()
        .ContinueWith(conf => conf.Result.GetCity())
        .Unwrap();
}

Hard to read, but let me apply my formatting trick again:

static Task<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()         .ContinueWith(x => x.Result
        .GetConference()   ).Unwrap().ContinueWith(x => x.Result
        .GetCity()         ).Unwrap();
}

You can see that, once again, it's our nice readable workflow on the left + some mechanical repeatable junction code on the right.

Pattern

Can you see a pattern yet?

I'll repeat the Nullable-, List- and Task-based workflows again:

static Nullable<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker               ?
        .NextTalk()           ?
        .GetConference()      ?
        .GetCity();
}

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    return 
        speaker
        .GetTalks()            .SelectMany(x => x
        .GetConferences()     ).SelectMany(x => x
        .GetCities()          ).ToList();
}

static Task<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()            .ContinueWith(x => x.Result
        .GetConference()      ).Unwrap().ContinueWith(x => x.Result
        .GetCity()            ).Unwrap();
}

In all 3 cases there was a complication which prevented us from sequencing method calls fluently. In all 3 cases we found the gluing code to get back to fluent composition.

Let's try to generalize this approach. Given some generic container type WorkflowThatReturns<T>, we have a method to combine an instance of such workflow with a function which accepts the result of that workflow and returns another workflow back:

class WorkflowThatReturns<T> 
{
    WorkflowThatReturns<U> AddStep<U>(Func<T, WorkflowThatReturns<U>> step);
}

In case this is hard to grasp, have a look at the picture of what is going on:

Monad Bind Internals

  1. An instance of type T sits in a generic container.

  2. We call AddStep with a function, which maps T to U sitting inside yet another container.

  3. We get an instance of U but inside two containers.

  4. Two containers are automatically unwrapped into a single container to get back to the original shape.

  5. Now we are ready to add another step!
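
The steps above can be sketched in TypeScript with the simplest possible container, one that always holds exactly one value. The names Box, map, unwrap and addStep are chosen for this illustration only.

```typescript
// A deliberately trivial container: it holds exactly one value.
class Box<T> {
    constructor(public readonly value: T) {}

    // Steps 2 and 3: apply a function to the contents. If the function
    // itself returns a Box, the result is a Box inside a Box.
    map<U>(f: (value: T) => U): Box<U> {
        return new Box(f(this.value));
    }

    // Step 4: collapse two nested containers into one.
    static unwrap<U>(nested: Box<Box<U>>): Box<U> {
        return nested.value;
    }

    // AddStep is map followed by unwrap, so the shape stays "one container".
    addStep<U>(step: (value: T) => Box<U>): Box<U> {
        return Box.unwrap(this.map(step));
    }
}

// Step 5: because the shape is preserved, steps can be chained freely.
const result = new Box(20)
    .addStep(n => new Box(n + 1))
    .addStep(n => new Box("value: " + (n * 2)));

console.log(result.value); // value: 42
```

For this container the unwrapping is trivial; the interesting containers (nullable values, collections, tasks) differ precisely in what happens inside map and unwrap.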

In the following code, NextTalk returns the first instance inside the container:

WorkflowThatReturns<City> Workflow(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()         
        .AddStep(x => x.GetConference())
        .AddStep(x => x.GetCity()); 
}

Subsequently, AddStep is called two times to transfer to Conference and then City inside the same container:

Monad Bind Chaining

Finally, Monads

The name of this pattern is Monad.

In C# terms, a Monad is a generic class with two operations: constructor and bind.

class Monad<T> {
    Monad(T instance);
    Monad<U> Bind<U>(Func<T, Monad<U>> f);
}

The constructor is used to put an object into a container; Bind is used to replace one contained object with another contained object.

It's important that Bind's argument returns Monad<U> and not just U. We can think of Bind as a combination of Map and Unwrap, as defined by the following signatures:

class Monad<T> {
    Monad(T instance);
    Monad<U> Map<U>(Func<T, U> f);
    static Monad<U> Unwrap<U>(Monad<Monad<U>> nested);
}

Even though I spent quite some time with examples, I expect you to be slightly confused at this point. That's ok.

Keep going and let's have a look at several sample implementations of the Monad pattern.

Maybe (Option)

My first motivating example used Nullable<T> and the ?. operator. The full pattern, a container holding either 0 or 1 instance of some type, is called Maybe (maybe it has a value, maybe not).

Maybe is another approach to representing the absence of a value, an alternative to the concept of null.

The functional-first language F# typically doesn't allow null for its types. Instead, F# has a maybe implementation built into the language: it's called the option type.

Here is a sample implementation in C#:

public class Maybe<T> where T : class
{
    private readonly T value;

    public Maybe(T someValue)
    {
        if (someValue == null)
            throw new ArgumentNullException(nameof(someValue));
        this.value = someValue;
    }

    private Maybe()
    {
    }

    public Maybe<U> Bind<U>(Func<T, Maybe<U>> func) where U : class
    {
        return value != null ? func(value) : Maybe<U>.None();
    }

    public static Maybe<T> None() => new Maybe<T>();
}

When null is not allowed, any API contract gets more explicit: either you return type T and it's always going to be filled, or you return Maybe<T>. The client will see that Maybe type is used, so it will be forced to handle the case of absent value.

Given an imaginary repository contract (which does something with customers and orders):

public interface IMaybeAwareRepository
{
    Maybe<Customer> GetCustomer(int id);
    Maybe<Address> GetAddress(int id);
    Maybe<Order> GetOrder(int id);
}

The client can be written with Bind method composition, without branching, in fluent style:

Maybe<Shipper> shipperOfLastOrderOnCurrentAddress =
    repo.GetCustomer(customerId)
        .Bind(c => c.Address)
        .Bind(a => repo.GetAddress(a.Id))
        .Bind(a => a.LastOrder)
        .Bind(lo => repo.GetOrder(lo.Id))
        .Bind(o => o.Shipper);
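
To watch the short-circuiting happen, here is a TypeScript rendition of the same idea. It is a sketch rather than a port of the class above: an absent value is stored as undefined, getOrElse is an extra helper added for inspecting results, and the lookup tables are made-up data standing in for the repository.

```typescript
class Maybe<T> {
    private constructor(private readonly value: T | undefined) {}

    static some<T>(value: T): Maybe<T> { return new Maybe<T>(value); }
    static none<T>(): Maybe<T> { return new Maybe<T>(undefined); }

    // Run the next step only if there is a value; otherwise stay empty.
    bind<U>(func: (value: T) => Maybe<U>): Maybe<U> {
        return this.value !== undefined ? func(this.value) : Maybe.none<U>();
    }

    // Helper for this sketch: leave the container with a fallback value.
    getOrElse(fallback: T): T {
        return this.value !== undefined ? this.value : fallback;
    }
}

// Made-up lookup tables standing in for the repository.
const addressOfCustomer: { [id: number]: string } = { 1: "Main St 5", 2: "Elm St 7" };
const lastOrderAtAddress: { [address: string]: string } = { "Main St 5": "order-42" };

function getAddress(customerId: number): Maybe<string> {
    const address = addressOfCustomer[customerId];
    return address !== undefined ? Maybe.some(address) : Maybe.none<string>();
}

function getLastOrder(address: string): Maybe<string> {
    const order = lastOrderAtAddress[address];
    return order !== undefined ? Maybe.some(order) : Maybe.none<string>();
}

function lastOrderOf(customerId: number): string {
    return getAddress(customerId)
        .bind(address => getLastOrder(address))
        .getOrElse("no order");
}

console.log(lastOrderOf(1)); // order-42
console.log(lastOrderOf(2)); // no order (address is known, but it has no orders)
console.log(lastOrderOf(3)); // no order (unknown customer)
```

The calling code contains no branching at all; the decision to skip the remaining steps lives entirely inside bind.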

As we saw above, this syntax looks very much like a LINQ query with a bunch of SelectMany statements. One of the common implementations of Maybe implements the IEnumerable interface to enable a more C#-idiomatic binding composition. Actually:

Enumerable + SelectMany is a Monad

IEnumerable is an interface for enumerable containers.

Enumerable containers can be created - thus the constructor monadic operation.

The Bind operation is defined by the standard LINQ extension method; here is its signature:

public static IEnumerable<U> SelectMany<T, U>(
    this IEnumerable<T> first, 
    Func<T, IEnumerable<U>> selector)

Direct implementation is quite straightforward:

static class Enumerable 
{
    public static IEnumerable<U> SelectMany<T, U>(
        this IEnumerable<T> values, 
        Func<T, IEnumerable<U>> func) 
    { 
        // Flatten: yield every sub-item produced for every item
        foreach (var item in values)
            foreach (var subItem in func(item))
                yield return subItem;
    }
}

And here is an example of composition:

IEnumerable<Shipper> shippers =
    customers
        .SelectMany(c => c.Addresses)
        .SelectMany(a => a.Orders)
        .SelectMany(o => o.Shippers);

The query has no idea about how the collections are stored (encapsulated in containers). We use functions T -> IEnumerable<U> to produce new enumerables (Bind operation).

Task (Future)

In C#, the Task<T> type is used to denote an asynchronous computation which will eventually return an instance of T. Similar concepts in other languages go by the names Promise and Future.

While the typical usage of Task in C# is different from the Monad pattern we discussed, I can still come up with a Future class with the familiar structure:

public class Future<T>
{
    private readonly Task<T> instance;

    public Future(T instance)
    {
        this.instance = Task.FromResult(instance);
    }

    private Future(Task<T> instance)
    {
        this.instance = instance;
    }

    public Future<U> Bind<U>(Func<T, Future<U>> func)
    {
        var a = this.instance.ContinueWith(t => func(t.Result).instance).Unwrap();
        return new Future<U>(a);
    }

    public void OnComplete(Action<T> action)
    {
        this.instance.ContinueWith(t => action(t.Result));
    }
}

Effectively, it's just a wrapper around the Task which doesn't add too much value, but it's a useful illustration because now we can do:

repository
    .LoadSpeaker()
    .Bind(speaker => speaker.NextTalk())
    .Bind(talk => talk.GetConference())
    .Bind(conference => conference.GetCity())
    .OnComplete(city => reservations.BookFlight(city));

We are back to the familiar structure. Time for some more complications.

Non-Sequential Workflows

Up until now, all the composed workflows had a very linear, sequential structure: the output of the previous step was always the input for the next step. That piece of data could be discarded after its first use because it was never needed in later steps:

Linear Workflow

Quite often though, this might not be the case. A workflow step might need data from two or more previous steps combined.

In the example above, BookFlight method might actually need both Speaker and City objects:

Non Linear Workflow

In this case, we would have to use a closure to keep the speaker object around until we get the city too:

repository
    .LoadSpeaker()
    .OnComplete(speaker =>
        speaker
            .NextTalk()
            .Bind(talk => talk.GetConference())
            .Bind(conference => conference.GetCity())
            .OnComplete(city => reservations.BookFlight(speaker, city))
        );

Obviously, this gets ugly very soon.

To solve this structural problem, the C# language got its async-await feature, which has since been adopted by more languages, including JavaScript.

If we move back to using Task instead of our custom Future, we are able to write:

var speaker = await repository.LoadSpeaker();
var talk = await speaker.NextTalk();
var conference = await talk.GetConference();
var city = await conference.GetCity();
await reservations.BookFlight(speaker, city);

Even though we lost the fluent syntax, at least the block has just one level, which makes it easier to navigate.
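Since async-await exists in other languages too, here is a rough TypeScript sketch of the same non-linear workflow. All types, stub objects and the `bookFlight` helper are hypothetical stand-ins invented for this example.

```typescript
// Hypothetical domain types; the names mirror the C# example but are invented here.
interface City { name: string; }
interface Conference { getCity(): Promise<City>; }
interface Talk { getConference(): Promise<Conference>; }
interface Speaker { name: string; nextTalk(): Promise<Talk>; }

// Stub data so the sketch runs without any external services.
const berlin: City = { name: "Berlin" };
const conference: Conference = { getCity: async () => berlin };
const talk: Talk = { getConference: async () => conference };
const speaker: Speaker = { name: "Ann", nextTalk: async () => talk };

// Records bookings in memory instead of calling a real reservation system.
const bookings: string[] = [];
async function bookFlight(s: Speaker, c: City): Promise<void> {
    bookings.push(`${s.name} -> ${c.name}`);
}

async function sendReservation(load: () => Promise<Speaker>): Promise<void> {
    const s = await load();
    const t = await s.nextTalk();
    const conf = await t.getConference();
    const city = await conf.getCity();
    // Both `s` and `city` are still in scope here, so no nested closures are needed.
    await bookFlight(s, city);
}

const done = sendReservation(async () => speaker);
```

Every intermediate value stays available as a local variable, which is exactly what the nested-closure version had to emulate by hand.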

Monads in Functional Languages

So far we learned that

  • Monad is a workflow composition pattern
  • This pattern is used in functional programming
  • Special syntax helps simplify the usage

It should come as no surprise that functional languages support monads at the syntactic level.

F# is a functional-first language running on the .NET framework. F# had its own way of doing workflows comparable to async-await before C# got it. In F#, the above code would look like this:

let sendReservation () = async {
    let! speaker = repository.LoadSpeaker()
    let! talk = speaker.nextTalk()
    let! conf = talk.getConference()
    let! city = conf.getCity()
    do! bookFlight(speaker, city)
}

Apart from syntax (! instead of await), the major difference from C# is that async is just one possible monad type to be used this way. There are many other monads in the F# standard library (they are called Computation Expressions).

The best part is that any developer can create their own monads, and then use all the power of language features.

Say, we want a hand-made Maybe computation expression in F#:

let nextTalkCity (speaker: Speaker) = maybe {
    let! talk = speaker.nextTalk()
    let! conf = talk.getConference()
    let! city = conf.getCity(talk)
    return city
}

To make this code runnable, we need to define Maybe computation expression builder:

type MaybeBuilder() =

    member this.Bind(x, f) = 
        match x with
        | None -> None
        | Some a -> f a

    member this.Return(x) = 
        Some x

let maybe = new MaybeBuilder()

I won't explain the details of what happens here, but you can see that the code is quite trivial. Note the presence of Bind operation (and Return operation being the monad constructor).

The feature is widely used by third-party F# libraries. Here is an actor definition in Akka.NET F# API:

let loop () = actor {
    let! message = mailbox.Receive()
    match message with
    | Greet(name) -> printfn "Hello %s" name
    | Hi -> printfn "Hello from F#!"
    return! loop ()
}

Monad Laws

There are three laws that the constructor and Bind need to adhere to, so that they produce a proper monad.

A typical monad tutorial will make a lot of emphasis on the laws, but I find them less important to explain to a beginner. Nonetheless, here they are for the sake of completeness.

Left Identity law says that Monad constructor is a neutral operation: you can safely run it before Bind, and it won't change the result of the function call:

// Given
T value;
Func<T, Monad<U>> f;

// Then (== means both parts are equivalent)
new Monad<T>(value).Bind(f) == f(value) 

Right Identity law says that given a monadic value, wrapping its contained data into another monad of same type and then Binding it, doesn't change the original value:

// Given
Monad<T> monadicValue;

// Then (== means both parts are equivalent)
monadicValue.Bind(x => new Monad<T>(x)) == monadicValue

Associativity law means that the way in which Bind operations are grouped (chained one after another or nested inside each other) does not matter:

// Given
Monad<T> m;
Func<T, Monad<U>> f;
Func<U, Monad<V>> g;

// Then (== means both parts are equivalent)
m.Bind(f).Bind(g) == m.Bind(a => f(a).Bind(g))

The laws may look complicated, but in fact they are very natural expectations that any developer has when working with monads, so don't spend too much mental effort on memorizing them.
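To make the laws a bit more tangible, here is a small TypeScript sketch that checks all three for one concrete monad. The `Maybe` type and the helpers `some`, `none`, `bindMaybe`, `half` and `label` are assumptions made up for this example, and the check only covers a single sample value, so it illustrates the laws rather than proves them.

```typescript
// A tiny Maybe type; all names here are chosen just for this sketch.
type Maybe<T> = { kind: "some"; value: T } | { kind: "none" };

// The monad constructor.
function some<T>(value: T): Maybe<T> {
    return { kind: "some", value };
}

function none<T>(): Maybe<T> {
    return { kind: "none" };
}

// Bind: run f on the contained value, or propagate the absence of a value.
function bindMaybe<T, U>(m: Maybe<T>, f: (value: T) => Maybe<U>): Maybe<U> {
    return m.kind === "some" ? f(m.value) : none<U>();
}

// Structural equality is good enough for this sketch.
function same<T>(a: Maybe<T>, b: Maybe<T>): boolean {
    return JSON.stringify(a) === JSON.stringify(b);
}

// Two sample functions of the shape T -> Maybe<U>.
function half(n: number): Maybe<number> {
    return n % 2 === 0 ? some(n / 2) : none<number>();
}

function label(n: number): Maybe<string> {
    return some(`#${n}`);
}

const m = some(8);

const leftIdentity = same(bindMaybe(some(8), half), half(8));
const rightIdentity = same(bindMaybe(m, x => some(x)), m);
const associativity = same(
    bindMaybe(bindMaybe(m, half), label),
    bindMaybe(m, a => bindMaybe(half(a), label)),
);
```

All three flags evaluate to true for this value, which matches the intuition that the constructor adds nothing of its own and that regrouping Bind calls keeps the result intact.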

Conclusion

You should not be afraid of the "M-word" just because you are a C# programmer.

C# does not have a notion of monads as predefined language constructs, but that doesn't mean we can't borrow some ideas from the functional world. Having said that, it's also true that C# is lacking some powerful ways to combine and generalize monads that are available in functional programming languages.

Go learn some more Functional Programming!

Programmable Cloud: Provisioning Azure App Service with Pulumi

Modern Cloud providers offer a wide variety of services of different types and levels. A modern cloud application would leverage multiple services in order to be efficient in terms of developer experience, price, operations etc.

For instance, a very simple Web Application deployed to Azure PaaS services could use

  • App Service - to host the application
  • App Service Plan - to define the instance size, price, scaling and other hosting parameters
  • Azure SQL Database - to store relational data
  • Application Insights - to collect telemetry and logs
  • Storage Account - to store the binaries and leverage Run-from-Zip feature

Provisioning such environment becomes a task on its own:

  • How do we create the initial setup?
  • How do we make changes?
  • What if we need multiple environments?
  • How do we apply settings?
  • How do we recycle resources which aren't needed anymore?

Well, there are several options.

Manually in Azure Portal

We all start doing this in Azure Portal. The user interface is great for discovering new services and features, and it's a quick way to make a single change.

Azure Portal

Creating an App Service in Azure Portal

Clicking buttons manually doesn't scale though. After the initial setup is complete, maintaining the environment over time poses significant challenges:

  • Every change requires going back to the portal, finding the right resource and doing the right change
  • People make mistakes, so if you have multiple environments, they are likely to be different in subtle ways
  • Naming gets messy over time
  • There is no easily accessible history of environment changes
  • Cleaning up is hard: usually some leftovers will remain unnoticed
  • Skills are required from everybody involved in provisioning

So, how do we streamline this process?

Azure PowerShell, CLI and Management SDKs

Azure comes with a powerful set of tools to manage resources with code.

You can use PowerShell, CLI scripts or custom code like C# to do with code whatever is possible to do via portal.

var webApp = azure.WebApps.Define(appName)
    .WithRegion(Region.WestEurope)
    .WithNewResourceGroup(rgName)
    .WithNewFreeAppServicePlan()
    .Create();

Fluent C# code creating an App Service

However, those commands are usually expressed in imperative style of CRUD operations. You can run the commands once, but it's hard to modify existing resources from an arbitrary state to the desired end state.

Azure Resource Manager Templates

All services in Azure are managed by Azure Resource Manager (ARM). ARM has a special JSON-based format for templates.

Once a template is defined, it's relatively straightforward to deploy it to an Azure environment. So, if resources are defined in JSON, they will be created automatically via PowerShell or CLI commands.

It is also possible to deploy templates in incremental mode, in which the tool compares the existing environment with the desired configuration and deploys only the difference.

Templates can be parametrized, which enables multi-environment deployments.

There's a problem with templates though: they are JSON files. They get very large very fast, they are hard to reuse, and it's easy to make a typo.

ARM Template

A fragment of auto-generated ARM Template for App Service, note the line numbers

Terraform is another templating tool to provision cloud resources, but it uses its own configuration language (HCL) instead of JSON. I don't have much experience with it, but the problems seem to be very similar.

Can we combine the power of SDKs and the power of template-based desired state configuration tools?

Pulumi

One potential solution has just arrived. A startup called Pulumi just went out of private beta to open source.

Pulumi

Pulumi wants to be much more than a better version of ARM templates, aiming to become the tool to build cloud-first distributed systems. But for today I'll focus on the lower-level task of resource provisioning.

With Pulumi, cloud infrastructure is defined in code using full-blown general-purpose programming languages.

The workflow goes like this:

  • Define a Stack, which is a container for a group of related resources
  • Write a program in one of the supported languages (I'll use TypeScript) which references pulumi libraries and constructs all the resources as objects
  • Establish connection with your Azure account
  • Call pulumi CLI to create, update or destroy Azure resources based on the program
  • Pulumi will first show the preview of changes, and then apply them as requested

Pulumi Program

I'm using TypeScript to define my Azure resources in Pulumi. So, the program is a normal Node.js application with index.ts file, package references in package.json and one extra file Pulumi.yaml to define the program:

name: azure-appservice
runtime: nodejs

Our index.ts is as simple as a bunch of import statements followed by creating TypeScript objects per desired resource. The simplest program can look like this:

import * as pulumi from "@pulumi/pulumi";
import * as azure from "@pulumi/azure";

const resourceGroup = new azure.core.ResourceGroup("myrg", {
    location: "West Europe"
});

When executed by pulumi update command, this program will create a new Resource Group in your Azure subscription.

Chaining Resources

When multiple resources are created, the properties of one resource will depend on properties of the others. E.g. I've defined the Resource Group above, and now I want to create an App Service Plan under this Group:

const resourceGroupArgs = {
    resourceGroupName: resourceGroup.name,
    location: resourceGroup.location
};

const appServicePlan = new azure.appservice.Plan("myplan", {
    ...resourceGroupArgs,

    kind: "App",

    sku: {
        tier: "Basic",
        size: "B1",
    },
});

I've assigned resourceGroupName and location of App Service Plan to values from the Resource Group. It looks like a simple assignment of strings but in fact it's more complicated.

Property resourceGroup.name has the type of pulumi.Output<string>. Constructor argument resourceGroupName of Plan has the type of pulumi.Input<string>.

We assigned "myrg" value to Resource Group name, but during the actual deployment it will change. Pulumi will append a unique identifier to the name, so the actually provisioned group will be named e.g. "myrg65fb103e".

This value will materialize inside Output type only at deployment time, and then it will get propagated to Input by Pulumi.

There is also a nice way to return the end values of Outputs from a Pulumi program. Let's say we define an App Service:

const app = new azure.appservice.AppService("mywebsite", {
    ...resourceGroupArgs,

    appServicePlanId: appServicePlan.id
});

First, notice how we used TypeScript spread operator to reuse properties from resourceGroupArgs.

Second, Output-Input assignment got used again to propagate App Service Plan ID.

Lastly, we can now export App Service host name from our program, e.g. for the user to be able to go to the web site immediately after deployment:

exports.hostname = app.defaultSiteHostname;

Output can also be transformed with apply function. Here is the code to format output URL:

exports.endpoint = app.defaultSiteHostname.apply(n => `https://${n}`);

Running pulumi update from CLI will then print the endpoint for us:

---outputs:---
endpoint: "https://mywebsiteb76260b5.azurewebsites.net"

Multiple outputs can be combined with pulumi.all, e.g. given SQL Server and Database resources, we could make a connection string from their names:

const connectionString = 
    pulumi.all([sqlServer.name, database.name]).apply(([server, db]) => 
        `Server=tcp:${server}.database.windows.net;initial catalog=${db};user ID=${username};password=${pwd};Min Pool Size=0;Max Pool Size=30;Persist Security Info=true;`);
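To build some intuition for how apply and all behave, it may help to think of an Output as a value that only materializes later. The sketch below models that idea with a toy class on top of a Promise. `ToyOutput` and the sample names are invented for this illustration; this is not how Pulumi implements Output, which also tracks resource dependencies and more.

```typescript
// ToyOutput is a made-up stand-in for illustration; it is NOT Pulumi's implementation.
class ToyOutput<T> {
    private readonly promise: Promise<T>;

    constructor(promise: Promise<T>) {
        this.promise = promise;
    }

    // Describe a transformation of the eventual value without waiting for it.
    apply<U>(f: (value: T) => U): ToyOutput<U> {
        return new ToyOutput(this.promise.then(f));
    }

    // Combine two eventual values into one eventual pair.
    static all<A, B>(a: ToyOutput<A>, b: ToyOutput<B>): ToyOutput<[A, B]> {
        return new ToyOutput(
            a.promise.then(x => b.promise.then(y => [x, y] as [A, B])),
        );
    }

    // In this toy model, only the deployment engine would read the final value.
    value(): Promise<T> {
        return this.promise;
    }
}

// Pretend the engine assigns the real resource names some time later.
const serverName = new ToyOutput(Promise.resolve("mysql65fb103e"));
const dbName = new ToyOutput(Promise.resolve("mydb"));

const connection = ToyOutput.all(serverName, dbName).apply(
    ([server, db]) => `Server=tcp:${server}.database.windows.net;initial catalog=${db};`,
);
```

The program never sees the final strings directly; it only describes how to compute one eventual value from others, which is the same shape as the pulumi.all call above.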

Using the Power of NPM

Since our program is just a TypeScript application, we are free to use any 3rd party package which exists out there in NPM.

For instance, we can install Azure Storage SDK. Just

npm install azure-storage

and then we can write a function to produce SAS token for a Blob in Azure Storage:

import * as azurestorage from "azure-storage";

// Given an Azure blob, create a SAS URL that can read it.
export function signedBlobReadUrl(
    blob: azure.storage.Blob | azure.storage.ZipBlob,
    account: azure.storage.Account,
    container: azure.storage.Container,
): pulumi.Output<string> {
    const signatureExpiration = new Date(2100, 1);

    return pulumi.all([
        account.primaryConnectionString,
        container.name,
        blob.name,
    ]).apply(([connectionString, containerName, blobName]) => {
        let blobService = new azurestorage.BlobService(connectionString);
        let signature = blobService.generateSharedAccessSignature(
            containerName,
            blobName,
            {
                AccessPolicy: {
                    Expiry: signatureExpiration,
                    Permissions: azurestorage.BlobUtilities.SharedAccessPermissions.READ,
                },
            }
        );

        return blobService.getUrl(containerName, blobName, signature);
    });
}

I took this function from Azure Functions example, and it will probably move to Pulumi libraries at some point, but until then you are free to leverage the package ecosystem.

Deploying Application Files

So far we provisioned Azure App Service, but we can also deploy the application files as part of the same workflow.

The code below is using Run from Zip feature of App Service:

  1. Define Storage Account and Container

     const storageAccount = new azure.storage.Account("mystorage", {
         ...resourceGroupArgs,
    
         accountKind: "StorageV2",
         accountTier: "Standard",
         accountReplicationType: "LRS",
     });
    
     const storageContainer = new azure.storage.Container("mycontainer", {
         resourceGroupName: resourceGroup.name,
         storageAccountName: storageAccount.name,
         containerAccessType: "private",
     });
  2. Create a folder with application files, e.g. wwwroot. It may contain some test HTML, ASP.NET application, or anything supported by App Service.

  3. Produce a zip file from that folder in Pulumi program:

     const blob = new azure.storage.ZipBlob("myzip", {
         resourceGroupName: resourceGroup.name,
         storageAccountName: storageAccount.name,
         storageContainerName: storageContainer.name,
         type: "block",
    
         content: new pulumi.asset.FileArchive("wwwroot")
     });
  4. Produce a SAS URL for the blob and assign it to the App Service Run-from-Zip setting:

     const codeBlobUrl = signedBlobReadUrl(blob, storageAccount, storageContainer);
    
     const app = new azure.appservice.AppService("mywebsite", {
         ...resourceGroupArgs,
    
         appServicePlanId: appServicePlan.id,
    
         appSettings: {
             "WEBSITE_RUN_FROM_ZIP": codeBlobUrl
         }
     });

Run the program, and your Application will start as soon as pulumi update is complete.

Determinism

Pulumi programs should strive to be deterministic. That means you should avoid using things like current date/time or random numbers.

The reason is incremental updates. Every time you run pulumi update, it will execute the program from scratch. If your resources depend on random values, they will not match the existing resources, and thus a false delta will be detected and deployed.

In the SAS generation example above we used a fixed date in the future instead of doing today + 1 year kind of calculation.
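The difference between the two approaches can be shown with a short sketch. The helper names `fixedExpiry` and `relativeExpiry` are hypothetical, and the two sample dates stand in for two separate runs of the program.

```typescript
// Deterministic: yields the same Date on every run of the program.
// Note that months are zero-based, so this is 1 February 2100 (local time).
function fixedExpiry(): Date {
    return new Date(2100, 1);
}

// Depends on the time of the run: when called with the current time,
// the result shifts on every run, so anything derived from it looks changed.
function relativeExpiry(now: Date): Date {
    const result = new Date(now.getTime());
    result.setFullYear(result.getFullYear() + 1);
    return result;
}

// Two sample dates standing in for two runs on consecutive days.
const firstRun = relativeExpiry(new Date(2018, 5, 1));
const secondRun = relativeExpiry(new Date(2018, 5, 2));

const fixedIsStable = fixedExpiry().getTime() === fixedExpiry().getTime();
const relativeIsStable = firstRun.getTime() === secondRun.getTime();
```

Here `fixedIsStable` is true while `relativeIsStable` is false, so a SAS URL built from the relative date would differ between runs and trigger a redeployment each time.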

Should Pulumi provide some workaround for this?

Conclusion

My code was kindly merged to Pulumi examples, go there for the complete runnable program that provisions App Service with Azure SQL Database and Application Insights.

I really see high potential in Cloud-as-Code approach suggested by Pulumi. Today we just scratched the surface of the possibilities. We were working with cloud services on raw level: provisioning specific services with given parameters.

Pulumi's vision includes providing higher-level components to blur the line between infrastructure and code, and to enable everybody to create such components on their own.

Exciting future ahead!

Cold Starts Beyond First Request in Azure Functions

In my previous article I've explored the topic of Cold Starts in Azure Functions. Particularly, I've measured the cold start delays per language and runtime version.

I received some follow-up questions that I'd like to explore in today's post:

  • Can we avoid cold starts except the very first one by keeping the instance warm?
  • Given one warm instance, if two requests come at the same time, will one request hit a cold start because existing instance is busy with the other?
  • In general, does a cold start happen at scale-out when a new extra instance is provisioned?

Again, we are only talking Consumption Plan here.

Theory

Azure Functions are running on instances provided by Azure App Service. Each instance is able to process several requests concurrently, which is different from AWS Lambda.

Thus, the following could be true:

  • If we issue at least 1 request every 20 minutes, the first instance should stay warm for a long time
  • Simultaneous requests don't cause cold start unless the existing instance gets too busy
  • When runtime decides to scale out and spin up a new instance, it could do so in the background, still forwarding incoming requests to the existing warm instance(s). Once the new instance is ready, it could be added to the pool without causing cold starts
  • If so, cold starts are mitigated beyond the very first execution

Let's put this theory to the test!

Keeping Always Warm

I've tested a Function App which consists of two Functions:

  • HTTP Function under test
  • Timer Function which runs every 10 minutes and does nothing but log one line of text

I then measured the cold start statistics similar to all the tests from my previous article.

For 2 days I issued infrequent requests to the same app; most of them would normally lead to a cold start. Interestingly, even though I was regularly firing the timer, Azure switched the instance serving my application 2 times during the test period:

Infrequent Requests to Azure Functions with "Keep It Warm" Timer

I can see that most responses are fast, so timer "warmer" definitely helps.

The first request(s) to a new instance are slower than subsequent ones. Still, they are faster than normal full cold start time, so it could be related to HTTP stack loading.

Anyway, keeping Functions warm seems a viable strategy.

Parallel Requests

What happens when there is a warm instance, but it's already busy with processing another request? Will the parallel request be delayed, or will it be processed by the same warm instance?

I tested with a very lightweight function, which nevertheless takes some time to complete:

public static async Task<HttpResponseMessage> Delay500([HttpTrigger] HttpRequestMessage req)
{
    await Task.Delay(500);
    return req.CreateResponse(HttpStatusCode.OK, "Done");
}

I believe it's an OK approximation for an IO-bound function.

The test client then issued 2 to 10 parallel requests to this function and measured the response time for all requests.
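The test client itself is not shown in this post. As a rough idea of what such a client does, here is a minimal TypeScript sketch that fires a batch of requests at once and records how long each one takes. The `send` callback is a placeholder for whatever issues a single HTTP request, and `fakeSend` only simulates a response with a timer.

```typescript
// `send` is a placeholder for whatever issues one HTTP request to the function.
type Send = () => Promise<void>;

// Fire `count` requests at once and resolve with each request's duration in ms.
function measureBatch(send: Send, count: number): Promise<number[]> {
    const timings: Promise<number>[] = [];
    for (let i = 0; i < count; i++) {
        const started = Date.now();
        timings.push(send().then(() => Date.now() - started));
    }
    return Promise.all(timings);
}

// Stand-in for a real request: completes after roughly 50 ms.
const fakeSend: Send = () =>
    new Promise<void>(resolve => setTimeout(() => resolve(), 50));

const batch = measureBatch(fakeSend, 5);
```

A real client would replace `fakeSend` with an actual HTTP call and repeat `measureBatch` with a pause between batches of growing size.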

It's not the easiest chart to understand in full, but note the following:

  • Each group of bars represents requests sent at the same time. Then there is a pause of about 20 seconds before the next group of requests gets sent

  • The bars are colored by the instance which processed that request: same instance - same color

Azure Functions Response Time to Batches of Simultaneous Requests

Here are some observations from this experiment:

  • Out of 64 requests, there were 11 cold starts

  • Same instance can process multiple simultaneous requests, e.g. one instance processed 7 out of 10 requests in the last batch

  • Nonetheless, Azure is eager to spin up new instances for multiple requests. In total 12 instances were created, which is even more than the maximum number of requests in any single batch

  • Some of those instances were actually never reused (gray-ish bars in batches x2 and x3, brown bar in x10)

  • The first request to each new instance pays the full cold start price. Runtime doesn't provision them in background while reusing existing instances for received requests

  • If an instance handles more than one request at a time, response time invariably suffers, even though the function is super lightweight (Task.Delay)

Conclusion

Getting back to the experiment goals, there are several things that we learned.

For low-traffic apps with sporadic requests it makes sense to set up a "warmer" timer function firing every 10 minutes or so to prevent the only instance from being recycled.

However, scale-out cold starts are real and I don't see any way to prevent them from happening.

When multiple requests come in at the same time, we might expect some of them to hit a new instance and get slowed down. The exact algorithm of instance reuse is not entirely clear.

Same instance is capable of processing multiple requests in parallel, so there are possibilities for optimization in terms of routing to warm instances during the provisioning of cold ones.

If such optimizations happen, I'll be glad to re-run my tests and report any noticeable improvements.

Stay tuned for more serverless perf goodness!

I'm Mikhail Shilkov, a software developer. I enjoy F#, C#, JavaScript and SQL development, reasoning about distributed systems, data processing pipelines, cloud and web apps. I blog about my experience on this website.
