Getting Started with AWS Lambda in Pulumi

For a small research project of mine, I needed to create HTTP-triggered AWS Lambdas in all supported programming languages.

I'm not a power AWS user, so I get easily confused by the configuration of things like IAM roles or API Gateway. Moreover, I wanted my environment to be reproducible, so clicking through the AWS Console manually wasn't a good option.

I decided it was a good job for Pulumi. They pay a lot of attention to serverless and especially AWS Lambda, and I love the power of configuration as code.

I created a Pulumi program which provisions Lambdas running on JavaScript, .NET, Python, Java and Go. The Pulumi program itself is written in JavaScript.

I'm describing the resulting code below in case folks need to do the same thing. The code itself is on my GitHub.

JavaScript

Probably, the vast majority of Pulumi + AWS Lambda users will be using JavaScript as the programming language for their serverless functions.

No wonder that this scenario is the easiest to start with. There is a high-level package @pulumi/cloud-aws which hides all the AWS machinery from a developer.

The simplest function will consist of just several lines:

const cloud = require("@pulumi/cloud-aws");

const api = new cloud.API("aws-hellolambda-js");
api.get("/js", (req, res) => {
    res.status(200).json("Hi from Javascript lambda");
});

exports.endpointJs = api.publish().url;

Configure your Pulumi stack, run pulumi update and a Lambda is up, running and accessible via HTTP.

.NET Core

.NET is my default development environment, and AWS Lambda supports .NET Core as an execution runtime.

The Pulumi program is still JavaScript, so it can't mix C# code in. Thus, the setup looks like this:

  • There is a .NET Core 2.0 application written in C# and utilizing Amazon.Lambda.* NuGet packages
  • I build and publish this application with dotnet CLI
  • Pulumi then utilizes the published binaries to create deployment artifacts

The C# function looks like this:

public class Functions
{
    public async Task<APIGatewayProxyResponse> GetAsync(APIGatewayProxyRequest request, ILambdaContext context)
    {
        return new APIGatewayProxyResponse
        {
            StatusCode = (int)HttpStatusCode.OK,
            Body = "\"Hi from C# Lambda\"",
            Headers = new Dictionary<string, string> { { "Content-Type", "application/json" } }
        };
    }
}

For non-JavaScript lambdas I use the @pulumi/aws package. It is lower-level than @pulumi/cloud-aws, so I had to set up IAM first:

const aws = require("@pulumi/aws");

const policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "sts:AssumeRole",
            "Principal": {
                "Service": "lambda.amazonaws.com",
            },
            "Effect": "Allow",
            "Sid": "",
        },
    ],
};
const role = new aws.iam.Role("precompiled-lambda-role", {
    assumeRolePolicy: JSON.stringify(policy),
});
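If more than one role is ever needed, the trust policy can come from a small helper. The following is a minimal sketch in plain JavaScript; the name assumeRolePolicyFor is hypothetical (it is not part of any Pulumi package), and it only builds the JSON string that is passed as assumeRolePolicy above.

```javascript
// Hypothetical helper: builds an IAM trust policy that lets the given
// AWS service principal assume a role. It returns a JSON string, the same
// shape that the assumeRolePolicy argument above receives.
function assumeRolePolicyFor(servicePrincipal) {
    return JSON.stringify({
        Version: "2012-10-17",
        Statement: [
            {
                Action: "sts:AssumeRole",
                Principal: { Service: servicePrincipal },
                Effect: "Allow",
                Sid: "",
            },
        ],
    });
}

const lambdaTrustPolicy = assumeRolePolicyFor("lambda.amazonaws.com");
console.log(lambdaTrustPolicy);
```

The result could then be used as assumeRolePolicy: assumeRolePolicyFor("lambda.amazonaws.com") in the role definition.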

And then I did a raw definition of AWS Lambda:

const pulumi = require("@pulumi/pulumi");

const csharpLambda = new aws.lambda.Function("aws-hellolambda-csharp", {
    runtime: aws.lambda.DotnetCore2d0Runtime,
    code: new pulumi.asset.AssetArchive({
        ".": new pulumi.asset.FileArchive("./csharp/bin/Debug/netcoreapp2.0/publish"),
    }),
    timeout: 5,
    handler: "app::app.Functions::GetAsync",
    role: role.arn
});

Note the path to the publish folder, which should match the path created by dotnet publish, and the handler name, which should match the C# class and method.

Finally, I used @pulumi/aws-serverless to define an API Gateway endpoint for the lambda:

const serverless = require("@pulumi/aws-serverless");

const precompiledApi = new serverless.apigateway.API("aws-hellolambda-precompiledapi", {
    routes: [
        { method: "GET", path: "/csharp", handler: csharpLambda },
    ],
});

That's definitely more ceremony compared to the JavaScript version. But hey, it's code, so if you find yourself repeating the same code, go ahead and make a higher-order component out of it, encapsulating the repetitive logic.
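As a rough idea of what such a component could look like, here is a sketch in plain JavaScript. Both helper names (precompiledLambdaArgs and getRoutes) are hypothetical, and the values passed in are placeholders standing in for the real role ARN, runtime constants and archives.

```javascript
// Hypothetical helper: fills in the settings that every precompiled Lambda
// in this post shares (timeout and role), so that each definition only
// lists what actually differs.
function precompiledLambdaArgs(roleArn, { runtime, code, handler }) {
    return { runtime, code, handler, timeout: 5, role: roleArn };
}

// Hypothetical helper: builds the routes array for the API definition
// from a map of URL path -> Lambda.
function getRoutes(lambdasByPath) {
    return Object.entries(lambdasByPath).map(([path, handler]) => ({
        method: "GET",
        path,
        handler,
    }));
}

// Usage with placeholder values instead of real Pulumi objects.
const args = precompiledLambdaArgs("role-arn-placeholder", {
    runtime: "runtime-placeholder",
    code: "archive-placeholder",
    handler: "app::app.Functions::GetAsync",
});
const routes = getRoutes({ "/csharp": "csharpLambda-placeholder" });
console.log(args, routes);
```

In the Pulumi program, the first helper's result would be passed as the second argument of new aws.lambda.Function, and the second helper's result as the routes property of the API.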

Python

Pulumi supports Python as a scripting language, but I'm sticking to JavaScript for a uniform experience.

In this case, the flow is similar to .NET but simpler: no compilation step is required. Just define a handler.py:

def handler(event, context): 
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': '"Hi from Python lambda"'
    }

and package it into a zip in the AWS Lambda definition:

const pythonLambda = new aws.lambda.Function("aws-hellolambda-python", {
    runtime: aws.lambda.Python3d6Runtime,
    code: new pulumi.asset.AssetArchive({
        ".": new pulumi.asset.FileArchive("./python"),
    }),
    timeout: 5,
    handler: "handler.handler",
    role: role.arn
});

I'm reusing the role definition from above. The API definition will also be the same as for .NET.

Go

Golang is a compiled language, so the approach is similar to .NET: write code, build, reference the built artifact from Pulumi.

My Go function looks like this:

func Handler(request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
    return events.APIGatewayProxyResponse{
        Body:       "\"Hi from Golang lambda\"",
        StatusCode: 200,
    }, nil
}

Because I'm on Windows but AWS Lambda runs on Linux, I had to use the build-lambda-zip tool to make the package compatible. Here is the PowerShell build script:

$env:GOOS = "linux"
$env:GOARCH = "amd64"
go build -o main main.go
~\Go\bin\build-lambda-zip.exe -o main.zip main

and the Pulumi function definition:

const golangLambda = new aws.lambda.Function("aws-hellolambda-golang", {
    runtime: aws.lambda.Go1dxRuntime,
    code: new pulumi.asset.FileArchive("./go/main.zip"),
    timeout: 5,
    handler: "main",
    role: role.arn
});

Java

The Java class implements an interface from the AWS SDK:

public class Hello implements RequestStreamHandler {

    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {

        JSONObject responseJson = new JSONObject();

        responseJson.put("isBase64Encoded", false);
        responseJson.put("statusCode", "200");
        responseJson.put("body", "\"Hi from Java lambda\"");  

        OutputStreamWriter writer = new OutputStreamWriter(outputStream, "UTF-8");
        writer.write(responseJson.toJSONString());  
        writer.close();
    }
}

I compiled this code with Maven (mvn package), which produced a jar file. AWS Lambda accepts a jar directly, but Pulumi's FileArchive unfortunately crashed when trying to read it.

As a workaround, I had to define a zip file with the jar placed inside a lib folder:

const javaLambda = new aws.lambda.Function("aws-coldstart-java", {
    code: new pulumi.asset.AssetArchive({
        "lib/lambda-java-example-1.0-SNAPSHOT.jar": new pulumi.asset.FileAsset("./java/target/lambda-java-example-1.0-SNAPSHOT.jar"),
    }),
    runtime: aws.lambda.Java8Runtime,
    timeout: 5,
    handler: "example.Hello",
    role: role.arn
});

Conclusion

The complete code for 5 lambda functions in 5 different programming languages can be found in my GitHub repository.

Running pulumi update provisions 25 AWS resources in about a minute, so I can start playing with my test lambdas in no time.

And the best part: when I don't need them anymore, I run pulumi destroy and my AWS Console is clean again!

Happy serverless moments!

Monads explained in C# (again)

I love functional programming for the simplicity that it brings.

But at the same time, I realize that learning functional programming is a challenging process. FP comes with a baggage of unfamiliar vocabulary that can be daunting for somebody coming from an object-oriented language like C#.

Functional Programming Word Cloud

Some of the functional lingo

"Monad" is probably the most infamous term from the list above. Monads have a reputation of being something very abstract and very confusing.

The Fallacy of Monad Tutorials

Numerous attempts have been made to explain monads in simple definitions, and monad tutorials have become a genre of their own. And yet, time and time again, they fail to enlighten the readers.

The shortest explanation of monads looks like this:

A Monad is just a monoid in the category of endofunctors

It's both mathematically correct and totally useless to anybody learning functional programming. To understand this statement, one has to know the terms "monoid", "category" and "endofunctors" and be able to mentally compose them into something meaningful.

The same problem is apparent in most monad tutorials. They assume some pre-existing knowledge in the heads of their readers, and if that assumption fails, the tutorial doesn't click.

Focusing too much on mechanics of monads instead of explaining why they are important is another common problem.

Douglas Crockford grasped this fallacy very well:

The monadic curse is that once someone learns what monads are and how to use them, they lose the ability to explain them to other people

The problem here is likely the following. Every person who understands monads had their own path to this knowledge. It didn't come all at once; instead, there was a series of steps, each giving an insight, until the final step made the puzzle complete.

But they don't remember the whole path anymore. They go online and blog about that very last step as the key to understanding, joining the club of flawed explanations.

There is an actual academic paper from Tomas Petricek that studies monad tutorials.

I've read that paper and a dozen monad tutorials online. And of course, now I've come up with my own.

I'm probably doomed to fail too, at least for some readers. Yet, I know that many people found the previous version of this article useful.

I based my explanation on examples from C# - the object-oriented language familiar to .NET developers.

Story of Composition

The basic element of each functional program is the function. In typed languages, each function is just a mapping between the type of its input parameter and the type of its output. Such a function can be annotated as func: TypeA -> TypeB.

C# is an object-oriented language, so we use methods to declare functions. There are two ways to define a method comparable to the func function above. I can use a static method:

static class Mapper 
{
    static ClassB func(ClassA a) { ... }
}

... or an instance method:

class ClassA 
{
    // Instance method
    ClassB func() { ... }
}

The static form looks closer to the function annotation, but both ways are actually equivalent for the purpose of our discussion. I will use instance methods in my examples; however, all of them could be written as static extension methods too.

How do we compose more complex workflows, programs and applications out of such simple building blocks? A lot of patterns both in OOP and FP worlds revolve around this question. And monads are one of the answers.

My sample code is going to be about conferences and speakers. The method implementations aren't really important, just watch the types carefully. There are 4 classes (types) and 3 methods (functions):

class Speaker 
{
    Talk NextTalk() { ... }
}

class Talk 
{
    Conference GetConference() { ... }
}

class Conference 
{
    City GetCity() { ... }
}

class City { ... }

These methods are currently very easy to compose into a workflow:

static City NextTalkCity(Speaker speaker) 
{
    Talk talk = speaker.NextTalk();
    Conference conf = talk.GetConference();
    City city = conf.GetCity();
    return city;
}

Because the return type of the previous step always matches the input type of the next step, we can write it even shorter:

static City NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()
        .GetConference()
        .GetCity();
}

This code looks quite readable. It's concise and it flows from top to bottom, from left to right, similar to how we are used to reading text. There is not much noise either.
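The same happy path can also be expressed as plain function composition. Below is a runnable JavaScript sketch of that idea; the sample objects and the pipe helper are made up for illustration and are not part of the C# example.

```javascript
// Made-up sample data: each object exposes the next step as a method,
// mirroring Speaker -> Talk -> Conference -> City.
const city = { name: "Amsterdam" };
const conference = { getCity: () => city };
const talk = { getConference: () => conference };
const speaker = { nextTalk: () => talk };

// pipe composes one-argument functions left to right: the output of each
// step becomes the input of the next one.
const pipe = (...steps) => (input) =>
    steps.reduce((value, step) => step(value), input);

const nextTalkCity = pipe(
    (s) => s.nextTalk(),
    (t) => t.getConference(),
    (c) => c.getCity()
);

console.log(nextTalkCity(speaker).name); // prints "Amsterdam"
```

Composition works here only because every step returns exactly the type that the next step expects.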

That's not what real codebases look like though, because there are multiple complications along the happy composition path. Let's look at some of them.

NULLs

Any class instance in C# can be null. In the example above, I might get runtime errors if one of the methods ever returns null.

Typed functional programming always tries to be explicit about types, so I'll re-write the signatures of my methods to annotate the return types as nullables:

class Speaker 
{
    Nullable<Talk> NextTalk() { ... }
}

class Talk 
{
    Nullable<Conference> GetConference() { ... }
}

class Conference 
{
    Nullable<City> GetCity() { ... }
}

class City { ... }

This is actually invalid syntax in the current C# version, because Nullable<T> and its short form T? are not applicable to reference types. This might change in C# 8 though, so bear with me.

Now, when composing our workflow, we need to take care of null results:

static Nullable<City> NextTalkCity(Speaker speaker) 
{
    Nullable<Talk> talk = speaker.NextTalk();
    if (talk == null) return null;

    Nullable<Conference> conf = talk.GetConference();
    if (conf == null) return null;

    Nullable<City> city = conf.GetCity();
    return city;
}

It's still the same method, but it got more noise now. Even though I used short-circuit returns and one-liners, it still got harder to read.

To fight that problem, smart language designers came up with the Null Propagation Operator:

static Nullable<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        ?.NextTalk()
        ?.GetConference()
        ?.GetCity();
}

Now we are almost back to our original workflow code: it's clean and concise, we just got 3 extra ? symbols around.
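JavaScript has the same operator, so the short-circuiting behaviour is easy to try out. A small sketch with made-up speaker objects; note that in JavaScript a short-circuited chain evaluates to undefined rather than null.

```javascript
// Made-up data: one speaker has a next talk, the other one does not.
const city = { name: "Amsterdam" };
const busySpeaker = {
    nextTalk: () => ({ getConference: () => ({ getCity: () => city }) }),
};
const idleSpeaker = { nextTalk: () => null };

// Each ?. stops the chain as soon as the value on its left is null or
// undefined, so the later calls are never attempted.
const nextTalkCity = (speaker) =>
    speaker
        ?.nextTalk()
        ?.getConference()
        ?.getCity();

console.log(nextTalkCity(busySpeaker)); // the city object
console.log(nextTalkCity(idleSpeaker)); // undefined
console.log(nextTalkCity(null));        // undefined
```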

Let's take another leap.

Collections

Quite often a function returns a collection of items, not just a single item. To some extent, that's a generalization of the null case: with Nullable<T> we might get 0 or 1 results back, while with a collection we can get anywhere from 0 to n results.

Our sample API could look like this:

class Speaker 
{
    List<Talk> GetTalks() { ... }
}

class Talk 
{
    List<Conference> GetConferences() { ... }
}

class Conference 
{
    List<City> GetCities() { ... }
}

I used List<T> but it could be any class or plain IEnumerable<T> interface.

How would we combine the methods into one workflow? The traditional version would look like this:

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    var result = new List<City>();

    foreach (Talk talk in speaker.GetTalks())
        foreach (Conference conf in talk.GetConferences())
            foreach (City city in conf.GetCities())
                result.Add(city);

    return result;
}

It reads ok-ish still. But the combination of nested loops and mutation with some conditionals sprinkled on them can get unreadable pretty soon. The exact workflow might be lost in the mechanics.

As an alternative, C# language designers invented LINQ extension methods. We can write code like this:

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    return 
        speaker
        .GetTalks()
        .SelectMany(talk => talk.GetConferences())
        .SelectMany(conf => conf.GetCities())
        .ToList();
}

Let me do one further trick and format the same code in an unusual way:

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    return 
        speaker
        .GetTalks()           .SelectMany(x => x
        .GetConferences()    ).SelectMany(x => x
        .GetCities()         ).ToList();
}

Now you can see the same original code on the left, combined with just a bit of technical repeatable clutter on the right. Hold on, I'll show you where I'm going.
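For comparison, JavaScript arrays offer flatMap, which plays the role of SelectMany. A runnable sketch with made-up data (the city names are placeholders):

```javascript
// Made-up data: a speaker with two talks; the second talk has no conferences.
const speaker = {
    getTalks: () => [
        {
            getConferences: () => [
                { getCities: () => ["Amsterdam", "Berlin"] },
                { getCities: () => ["Oslo"] },
            ],
        },
        { getConferences: () => [] },
    ],
};

// flatMap maps every item to an array and flattens the result by one level,
// which replaces the nested loops and the mutable result list.
const allCitiesToVisit = (s) =>
    s
        .getTalks()
        .flatMap((talk) => talk.getConferences())
        .flatMap((conf) => conf.getCities());

console.log(allCitiesToVisit(speaker)); // [ 'Amsterdam', 'Berlin', 'Oslo' ]
```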

Let's discuss another possible complication.

Asynchronous Calls

What if our methods need to access some remote database or service to produce the results? This should be shown in the type signature, and C# has Task<T> for that:

class Speaker 
{
    Task<Talk> NextTalk() { ... }
}

class Talk 
{
    Task<Conference> GetConference() { ... }
}

class Conference 
{
    Task<City> GetCity() { ... }
}

This change breaks our nice workflow composition again.

We'll get back to async-await later, but the original way to combine Task-based methods was to use ContinueWith and Unwrap API:

static Task<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()
        .ContinueWith(talk => talk.Result.GetConference())
        .Unwrap()
        .ContinueWith(conf => conf.Result.GetCity())
        .Unwrap();
}

Hard to read, but let me apply my formatting trick again:

static Task<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()         .ContinueWith(x => x.Result
        .GetConference()   ).Unwrap().ContinueWith(x => x.Result
        .GetCity()         ).Unwrap();
}

You can see that, once again, it's our nice readable workflow on the left + some mechanical repeatable junction code on the right.
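JavaScript promises show the same shape with less ceremony, because then flattens a nested promise on its own. A sketch with made-up objects, where every step returns a Promise as a stand-in for Task<T>:

```javascript
// Made-up async API: every step returns a Promise.
const city = { name: "Amsterdam" };
const conference = { getCity: () => Promise.resolve(city) };
const talk = { getConference: () => Promise.resolve(conference) };
const speaker = { nextTalk: () => Promise.resolve(talk) };

// then() plays the role of ContinueWith + Unwrap: when the callback returns
// a Promise, the result is flattened instead of becoming a nested Promise.
const nextTalkCity = (s) =>
    s
        .nextTalk()
        .then((t) => t.getConference())
        .then((c) => c.getCity());

nextTalkCity(speaker).then((c) => console.log(c.name)); // prints "Amsterdam"
```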

Pattern

Can you see a pattern yet?

I'll repeat the Nullable-, List- and Task-based workflows again:

static Nullable<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker               ?
        .NextTalk()           ?
        .GetConference()      ?
        .GetCity();
}

static List<City> AllCitiesToVisit(Speaker speaker) 
{
    return 
        speaker
        .GetTalks()            .SelectMany(x => x
        .GetConferences()     ).SelectMany(x => x
        .GetCities()          ).ToList();
}

static Task<City> NextTalkCity(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()            .ContinueWith(x => x.Result
        .GetConference()      ).Unwrap().ContinueWith(x => x.Result
        .GetCity()            ).Unwrap();
}

In all 3 cases there was a complication which prevented us from sequencing method calls fluently. In all 3 cases we found the gluing code to get back to fluent composition.

Let's try to generalize this approach. Given some generic container type WorkflowThatReturns<T>, we have a method to combine an instance of such workflow with a function which accepts the result of that workflow and returns another workflow back:

class WorkflowThatReturns<T> 
{
    WorkflowThatReturns<U> AddStep<U>(Func<T, WorkflowThatReturns<U>> step);
}

In case this is hard to grasp, have a look at the picture of what is going on:

Monad Bind Internals

  1. An instance of type T sits in a generic container.

  2. We call AddStep with a function, which maps T to U sitting inside yet another container.

  3. We get an instance of U but inside two containers.

  4. Two containers are automatically unwrapped into a single container to get back to the original shape.

  5. Now we are ready to add another step!

In the following code, NextTalk returns the first instance inside the container:

WorkflowThatReturns<City> Workflow(Speaker speaker) 
{
    return 
        speaker
        .NextTalk()         
        .AddStep(x => x.GetConference())
        .AddStep(x => x.GetCity()); 
}

Subsequently, AddStep is called two times to transfer to Conference and then City inside the same container:

Monad Bind Chaining

Finally, Monads

The name of this pattern is Monad.

In C# terms, a Monad is a generic class with two operations: constructor and bind.

class Monad<T> {
    Monad(T instance);
    Monad<U> Bind<U>(Func<T, Monad<U>> f);
}

The constructor is used to put an object into a container; Bind is used to replace one contained object with another contained object.

It's important that Bind's argument returns Monad<U> and not just U. We can think of Bind as a combination of Map and Unwrap, as defined by the following signature:

class Monad<T> {
    Monad(T instance);
    Monad<U> Map<U>(Func<T, U> f);
    static Monad<U> Unwrap<U>(Monad<Monad<U>> nested);
}
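To make the Map + Unwrap relationship concrete, here is a runnable JavaScript sketch. Box is a hypothetical container that does nothing but hold a value; it exists only to show how the three operations fit together.

```javascript
// Box is a hypothetical, do-nothing container used only to show the shapes.
class Box {
    constructor(value) {
        this.value = value;
    }

    // Map: apply f to the content and wrap the result again.
    map(f) {
        return new Box(f(this.value));
    }

    // Unwrap: collapse a Box inside a Box into a single Box.
    static unwrap(nested) {
        return nested.value;
    }

    // Bind: Map with a Box-returning function, then Unwrap the double wrapping.
    bind(f) {
        return Box.unwrap(this.map(f));
    }
}

const half = (n) => new Box(n / 2);

const nested = new Box(8).map(half); // a Box containing a Box containing 4
const flat = new Box(8).bind(half);  // a Box containing 4
console.log(nested.value instanceof Box, flat.value); // prints "true 4"
```

Map alone leaves two layers of container when the function itself returns a container, which is exactly why Bind needs the extra Unwrap step.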

Even though I spent quite some time with examples, I expect you to be slightly confused at this point. That's ok.

Keep going and let's have a look at several sample implementations of Monad pattern.

Maybe (Option)

My first motivational example was with Nullable<T> and the ?. operator. The full pattern, a container holding either 0 or 1 instance of some type, is called Maybe (it maybe has a value, or maybe not).

Maybe is another approach to dealing with the absence of a value, an alternative to the concept of null.

The functional-first language F# typically doesn't allow null for its types. Instead, F# has a Maybe implementation built into the language: it's called the option type.

Here is a sample implementation in C#:

public class Maybe<T> where T : class
{
    private readonly T value;

    public Maybe(T someValue)
    {
        if (someValue == null)
            throw new ArgumentNullException(nameof(someValue));
        this.value = someValue;
    }

    private Maybe()
    {
    }

    public Maybe<U> Bind<U>(Func<T, Maybe<U>> func) where U : class
    {
        return value != null ? func(value) : Maybe<U>.None();
    }

    public static Maybe<T> None() => new Maybe<T>();
}

When null is not allowed, any API contract gets more explicit: either you return type T and it's always going to be filled, or you return Maybe<T>. The client will see that Maybe type is used, so it will be forced to handle the case of absent value.
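The short-circuiting behaviour of Bind is easy to observe in a runnable form. The following JavaScript sketch mirrors the C# class above; the hasValue property and the lookup data are my additions for the demo, not part of the original class.

```javascript
class Maybe {
    constructor(value) {
        if (value === null || value === undefined) {
            throw new TypeError("Use Maybe.none() for an absent value");
        }
        this.value = value;
    }

    // The empty case: an instance created without running the constructor.
    static none() {
        return Object.create(Maybe.prototype);
    }

    // Demo-only helper (not in the C# version) to inspect the result.
    get hasValue() {
        return this.value !== undefined;
    }

    // Run the next step only when there is a value to pass to it.
    bind(f) {
        return this.hasValue ? f(this.value) : Maybe.none();
    }
}

// Made-up lookups: customer 1 has an address, customer 2 does not.
const addresses = { 1: "Main Street 1" };
const getAddress = (id) =>
    id in addresses ? new Maybe(addresses[id]) : Maybe.none();
const toUpper = (s) => new Maybe(s.toUpperCase());

const found = new Maybe(1).bind(getAddress).bind(toUpper);
const missing = new Maybe(2).bind(getAddress).bind(toUpper);
console.log(found.hasValue, found.value); // prints "true MAIN STREET 1"
console.log(missing.hasValue);            // prints "false"
```

For customer 2, toUpper is never called: the empty Maybe flows through the rest of the chain untouched.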

Given an imaginary repository contract (which does something with customers and orders):

public interface IMaybeAwareRepository
{
    Maybe<Customer> GetCustomer(int id);
    Maybe<Address> GetAddress(int id);
    Maybe<Order> GetOrder(int id);
}

The client can be written with Bind method composition, without branching, in fluent style:

Maybe<Shipper> shipperOfLastOrderOnCurrentAddress =
    repo.GetCustomer(customerId)
        .Bind(c => c.Address)
        .Bind(a => repo.GetAddress(a.Id))
        .Bind(a => a.LastOrder)
        .Bind(lo => repo.GetOrder(lo.Id))
        .Bind(o => o.Shipper);

As we saw above, this syntax looks very much like a LINQ query with a bunch of SelectMany statements. One of the common implementations of Maybe implements IEnumerable interface to enable a more C#-idiomatic binding composition. Actually:

Enumerable + SelectMany is a Monad

IEnumerable is an interface for enumerable containers.

Enumerable containers can be created - thus the constructor monadic operation.

The Bind operation is defined by the standard LINQ extension method, here is its signature:

public static IEnumerable<U> SelectMany<T, U>(
    this IEnumerable<T> first, 
    Func<T, IEnumerable<U>> selector)

Direct implementation is quite straightforward:

static class Enumerable 
{
    public static IEnumerable<U> SelectMany<T, U>(
        this IEnumerable<T> values, 
        Func<T, IEnumerable<U>> func) 
    { 
        foreach (var item in values)
            foreach (var subItem in func(item))
                yield return subItem;
    }
}

And here is an example of composition:

IEnumerable<Shipper> shippers =
    customers
        .SelectMany(c => c.Addresses)
        .SelectMany(a => a.Orders)
        .SelectMany(o => o.Shippers);

The query has no idea about how the collections are stored (encapsulated in containers). We use functions T -> IEnumerable<U> to produce new enumerables (Bind operation).
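The same Bind can be written for any JavaScript iterable with a generator function. This is a sketch of the idea rather than a port of LINQ; the customer data is made up.

```javascript
// selectMany as a generator: lazily yields every item of every inner
// sequence, like nested foreach loops with yield return.
function* selectMany(values, func) {
    for (const item of values) {
        for (const subItem of func(item)) {
            yield subItem;
        }
    }
}

// Made-up data: customers with addresses, addresses with orders.
const customers = [
    { addresses: [{ orders: ["o1", "o2"] }, { orders: [] }] },
    { addresses: [{ orders: ["o3"] }] },
];

const addresses = selectMany(customers, (c) => c.addresses);
const orders = selectMany(addresses, (a) => a.orders);
console.log([...orders]); // [ 'o1', 'o2', 'o3' ]
```

Nothing is enumerated until the spread at the end asks for the items, which matches the deferred execution of the C# iterator.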

Task (Future)

In C#, the Task<T> type is used to denote an asynchronous computation which will eventually return an instance of T. Other names for similar concepts in other languages are Promise and Future.

While the typical usage of Task in C# is different from the Monad pattern we discussed, I can still come up with a Future class with the familiar structure:

public class Future<T>
{
    private readonly Task<T> instance;

    public Future(T instance)
    {
        this.instance = Task.FromResult(instance);
    }

    private Future(Task<T> instance)
    {
        this.instance = instance;
    }

    public Future<U> Bind<U>(Func<T, Future<U>> func)
    {
        var a = this.instance.ContinueWith(t => func(t.Result).instance).Unwrap();
        return new Future<U>(a);
    }

    public void OnComplete(Action<T> action)
    {
        this.instance.ContinueWith(t => action(t.Result));
    }
}

Effectively, it's just a wrapper around the Task which doesn't add too much value, but it's a useful illustration because now we can do:

repository
    .LoadSpeaker()
    .Bind(speaker => speaker.NextTalk())
    .Bind(talk => talk.GetConference())
    .Bind(conference => conference.GetCity())
    .OnComplete(city => reservations.BookFlight(city));

We are back to the familiar structure. Time for some more complications.

Non-Sequential Workflows

Up until now, all the composed workflows had a very linear, sequential structure: the output of a previous step was always the input for the next step. That piece of data could be discarded after the first use because it was never needed for later steps:

Linear Workflow

Quite often though, this might not be the case. A workflow step might need data from two or more previous steps combined.

In the example above, BookFlight method might actually need both Speaker and City objects:

Non Linear Workflow

In this case, we would have to use a closure to keep the speaker object around until we get the city too:

repository
    .LoadSpeaker()
    .OnComplete(speaker =>
        speaker
            .NextTalk()
            .Bind(talk => talk.GetConference())
            .Bind(conference => conference.GetCity())
            .OnComplete(city => reservations.BookFlight(speaker, city))
        );

Obviously, this gets ugly very soon.

To solve this structural problem, the C# language got its async-await feature, which is now being reused in more languages, including JavaScript.

If we move back to using Task instead of our custom Future, we are able to write:

var speaker = await repository.LoadSpeaker();
var talk = await speaker.NextTalk();
var conference = await talk.GetConference();
var city = await conference.GetCity();
await reservations.BookFlight(speaker, city);

Even though we lost the fluent syntax, at least the block has just one level, which makes it easier to navigate.

Monads in Functional Languages

So far we learned that

  • Monad is a workflow composition pattern
  • This pattern is used in functional programming
  • Special syntax helps simplify the usage

It should come as no surprise that functional languages support monads at the syntactic level.

F# is a functional-first language running on the .NET framework. F# had its own way of doing workflows comparable to async-await before C# got it. In F#, the above code would look like this:

let sendReservation () = async {
    let! speaker = repository.LoadSpeaker()
    let! talk = speaker.nextTalk()
    let! conf = talk.getConference()
    let! city = conf.getCity()
    do! bookFlight(speaker, city)
}

Apart from syntax (! instead of await), the major difference from C# is that async is just one possible monad type to be used this way. There are many other monads in the F# standard library (they are called Computation Expressions).

The best part is that any developer can create their own monads, and then use all the power of language features.

Say we want a hand-made Maybe computation expression in F#:

let nextTalkCity (speaker: Speaker) = maybe {
    let! talk = speaker.nextTalk()
    let! conf = talk.getConference()
    let! city = conf.getCity()
    return city
}

To make this code runnable, we need to define the Maybe computation expression builder:

type MaybeBuilder() =

    member this.Bind(x, f) = 
        match x with
        | None -> None
        | Some a -> f a

    member this.Return(x) = 
        Some x

let maybe = new MaybeBuilder()

I won't explain the details of what happens here, but you can see that the code is quite trivial. Note the presence of Bind operation (and Return operation being the monad constructor).

The feature is widely used by third-party F# libraries. Here is an actor definition in Akka.NET F# API:

let loop () = actor {
    let! message = mailbox.Receive()
    match message with
    | Greet(name) -> printfn "Hello %s" name
    | Hi -> printfn "Hello from F#!"
    return! loop ()
}

Monad Laws

There are three laws that the constructor and Bind need to adhere to, so that they produce a proper monad.

A typical monad tutorial will put a lot of emphasis on the laws, but I find them less important to explain to a beginner. Nonetheless, here they are for the sake of completeness.

The Left Identity law says that the Monad constructor is a neutral operation: you can safely run it before Bind, and it won't change the result of the function call:

// Given
T value;
Func<T, Monad<U>> f;

// Then (== means both parts are equivalent)
new Monad<T>(value).Bind(f) == f(value) 

The Right Identity law says that, given a monadic value, wrapping its contained data into another monad of the same type and then Binding it doesn't change the original value:

// Given
Monad<T> monadicValue;

// Then (== means both parts are equivalent)
monadicValue.Bind(x => new Monad<T>(x)) == monadicValue

The Associativity law means that the way Bind operations are nested does not matter: binding f and then g gives the same result as binding a single function that applies f and then binds g:

// Given
Monad<T> m;
Func<T, Monad<U>> f;
Func<U, Monad<V>> g;

// Then (== means both parts are equivalent)
m.Bind(f).Bind(g) == m.Bind(a => f(a).Bind(g))

The laws may look complicated, but in fact they are very natural expectations that any developer has when working with monads, so don't spend too much mental effort on memorizing them.
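The laws can be spot-checked on a concrete monad. The JavaScript sketch below uses arrays, with flatMap as Bind and a one-element array as the constructor; the functions f and g and the sample values are arbitrary, and passing for a few values is of course an illustration, not a proof.

```javascript
// Arrays as the monad: unit wraps a value, bind is flatMap.
const unit = (x) => [x];
const bind = (m, f) => m.flatMap(f);

// Structural equality is enough for flat arrays of numbers.
const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// Arbitrary sample functions and value.
const f = (x) => [x, x + 1];
const g = (x) => [x * 10];
const m = [1, 2, 3];

const leftIdentity = same(bind(unit(5), f), f(5));
const rightIdentity = same(bind(m, unit), m);
const associativity = same(
    bind(bind(m, f), g),
    bind(m, (a) => bind(f(a), g))
);

console.log(leftIdentity, rightIdentity, associativity); // prints "true true true"
```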

Conclusion

You should not be afraid of the "M-word" just because you are a C# programmer.

C# does not have a notion of monads as predefined language constructs, but that doesn't mean we can't borrow some ideas from the functional world. Having said that, it's also true that C# is lacking some powerful ways to combine and generalize monads that are available in functional programming languages.

Go learn some more Functional Programming!

Programmable Cloud: Provisioning Azure App Service with Pulumi

Modern Cloud providers offer a wide variety of services of different types and levels. A modern cloud application would leverage multiple services in order to be efficient in terms of developer experience, price, operations etc.

For instance, a very simple Web Application deployed to Azure PaaS services could use

  • App Service - to host the application
  • App Service Plan - to define the instance size, price, scaling and other hosting parameters
  • Azure SQL Database - to store relational data
  • Application Insights - to collect telemetry and logs
  • Storage Account - to store the binaries and leverage Run-as-Zip feature

Provisioning such an environment becomes a task of its own:

  • How do we create the initial setup?
  • How do we make changes?
  • What if we need multiple environments?
  • How do we apply settings?
  • How do we recycle resources which aren't needed anymore?

Well, there are several options.

Manually in Azure Portal

We all start by doing this in Azure Portal. The user interface is great for discovering new services and features, and it's a quick way to make a single change.

Azure Portal

Creating an App Service in Azure Portal

Clicking buttons manually doesn't scale though. After the initial setup is complete, maintaining the environment over time poses significant challenges:

  • Every change requires going back to the portal, finding the right resource and making the right change
  • People make mistakes, so if you have multiple environments, they are likely to be different in subtle ways
  • Naming gets messy over time
  • There is no easily accessible history of environment changes
  • Cleaning up is hard: usually some leftovers will remain unnoticed
  • Skills are required from everybody involved in provisioning

So, how do we streamline this process?

Azure PowerShell, CLI and Management SDKs

Azure comes with a powerful set of tools to manage resources with code.

You can use PowerShell, CLI scripts, or custom code in a language like C# to do anything in code that is possible via the portal.

var webApp = azure.WebApps.Define(appName)
    .WithRegion(Region.WestEurope)
    .WithNewResourceGroup(rgName)
    .WithNewFreeAppServicePlan()
    .Create();

Fluent C# code creating an App Service

However, those commands are usually expressed in an imperative style of CRUD operations. You can run the commands once, but it's hard to bring existing resources from an arbitrary state to the desired end state.

Azure Resource Manager Templates

All services in Azure are managed by Azure Resource Manager (ARM). ARM has a special JSON-based format for templates.

Once a template is defined, deploying it to an Azure environment is relatively straightforward: the resources described in JSON are created automatically via PowerShell or CLI commands.

It is also possible to deploy templates in incremental mode, in which the tool compares the existing environment with the desired configuration and deploys only the difference.
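To make the idea of "deploying the difference" more concrete, here is a minimal TypeScript sketch of comparing an existing environment with a desired configuration. It is a conceptual model only and not how ARM is implemented; the `Resource` shape and the `planIncremental` function are hypothetical names made up for this illustration. In this sketch, resources that exist but are missing from the desired list are simply left alone.

```typescript
// Conceptual model only: these names and shapes are hypothetical
// and are not part of any Azure SDK or of ARM itself.
interface Resource {
    name: string;
    properties: Record<string, string>;
}

interface Plan {
    create: string[];    // desired, but not present in the environment
    update: string[];    // present, but with different properties
    unchanged: string[]; // present with identical properties
}

function sameProperties(a: Record<string, string>, b: Record<string, string>): boolean {
    const keysA = Object.keys(a);
    const keysB = Object.keys(b);
    return keysA.length === keysB.length && keysA.every(k => a[k] === b[k]);
}

// Compare the existing environment with the desired configuration.
function planIncremental(existing: Resource[], desired: Resource[]): Plan {
    const current = new Map<string, Resource>();
    for (const r of existing) current.set(r.name, r);

    const plan: Plan = { create: [], update: [], unchanged: [] };
    for (const want of desired) {
        const have = current.get(want.name);
        if (have === undefined) plan.create.push(want.name);
        else if (sameProperties(have.properties, want.properties)) plan.unchanged.push(want.name);
        else plan.update.push(want.name);
    }
    return plan;
}

const plan = planIncremental(
    [{ name: "myplan", properties: { size: "B1" } }],
    [
        { name: "myplan", properties: { size: "S1" } },
        { name: "mywebsite", properties: { plan: "myplan" } },
    ],
);
// plan.create is ["mywebsite"], plan.update is ["myplan"], plan.unchanged is []
```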

Templates can be parametrized, which enables multi-environment deployments.

There's a problem with templates though: they are JSON files. They get very large very fast, they are hard to reuse, and it's easy to make a typo.

ARM Template

A fragment of auto-generated ARM Template for App Service, note the line numbers

Terraform is another tool to provision cloud resources declaratively, but it uses its own configuration language (HCL) instead of JSON. I don't have much experience with it, but the problems seem to be very similar.

Can we combine the power of SDKs and the power of declarative desired state configuration tools?

Pulumi

One potential solution has just arrived. A startup called Pulumi just went out of private beta to open source.

Pulumi

Pulumi wants to be much more than a better version of ARM templates, aiming to become the tool to build cloud-first distributed systems. But for today I'll focus on the lower-level task of resource provisioning.

With Pulumi, cloud infrastructure is defined in code using full-blown general-purpose programming languages.

The workflow goes like this:

  • Define a Stack, which is a container for a group of related resources
  • Write a program in one of the supported languages (I'll use TypeScript) which references Pulumi libraries and constructs all the resources as objects
  • Establish connection with your Azure account
  • Call pulumi CLI to create, update or destroy Azure resources based on the program
  • Pulumi will first show the preview of changes, and then apply them as requested

Pulumi Program

I'm using TypeScript to define my Azure resources in Pulumi. So, the program is a normal Node.js application with an index.ts file, package references in package.json and one extra file, Pulumi.yaml, to define the program:

name: azure-appservice
runtime: nodejs

Our index.ts is as simple as a bunch of import statements followed by creating one TypeScript object per desired resource. The simplest program can look like this:

import * as pulumi from "@pulumi/pulumi";
import * as azure from "@pulumi/azure";

const resourceGroup = new azure.core.ResourceGroup("myrg", {
    location: "West Europe"
});

When executed by pulumi update command, this program will create a new Resource Group in your Azure subscription.

Chaining Resources

When multiple resources are created, the properties of one resource will depend on properties of the others. E.g. I've defined the Resource Group above, and now I want to create an App Service Plan under this Group:

const resourceGroupArgs = {
    resourceGroupName: resourceGroup.name,
    location: resourceGroup.location
};

const appServicePlan = new azure.appservice.Plan("myplan", {
    ...resourceGroupArgs,

    kind: "App",

    sku: {
        tier: "Basic",
        size: "B1",
    },
});

I've assigned resourceGroupName and location of the App Service Plan to values from the Resource Group. It looks like a simple assignment of strings, but in fact it's more complicated.

Property resourceGroup.name has the type of pulumi.Output<string>. Constructor argument resourceGroupName of Plan has the type of pulumi.Input<string>.

We assigned "myrg" value to Resource Group name, but during the actual deployment it will change. Pulumi will append a unique identifier to the name, so the actually provisioned group will be named e.g. "myrg65fb103e".

This value will materialize inside Output type only at deployment time, and then it will get propagated to Input by Pulumi.
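One way to picture this is a value wrapped in a promise that only the deployment engine gets to await. The sketch below is a toy model and not Pulumi's actual implementation; `ToyOutput`, `ToyInput` and `resolveInput` are hypothetical names, and the resolved name is simply the example value from above.

```typescript
// Toy model of a deployment-time value. This is NOT how Pulumi implements
// Output; ToyOutput is a hypothetical class used only to illustrate the idea.
class ToyOutput<T> {
    private readonly value: Promise<T>;

    constructor(value: Promise<T>) {
        this.value = value;
    }

    // Transform the eventual value without waiting for it.
    apply<U>(fn: (t: T) => U): ToyOutput<U> {
        return new ToyOutput(this.value.then(fn));
    }

    // Only the "engine" awaits the real value, at deployment time.
    resolve(): Promise<T> {
        return this.value;
    }
}

// An input accepts either a plain value or a value that is not known yet.
type ToyInput<T> = T | ToyOutput<T>;

function resolveInput(input: ToyInput<string>): Promise<string> {
    return input instanceof ToyOutput ? input.resolve() : Promise.resolve(input);
}

// The physical name is only known once the resource has been "created".
const groupName = new ToyOutput(Promise.resolve("myrg65fb103e"));

// Passing the output to another resource's input looks like a plain assignment.
const planArgs: { resourceGroupName: ToyInput<string> } = {
    resourceGroupName: groupName,
};

// The engine resolves the input when it actually provisions the resource.
const resolvedName: Promise<string> = resolveInput(planArgs.resourceGroupName);
```

The point of the model is that user code never sees the raw string: it can only pass the wrapper along or describe a transformation of it.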

There is also a nice way to return the end values of Outputs from a Pulumi program. Let's say we define an App Service:

const app = new azure.appservice.AppService("mywebsite", {
    ...resourceGroupArgs,

    appServicePlanId: appServicePlan.id
});

First, notice how we used the TypeScript spread operator to reuse properties from resourceGroupArgs.

Second, Output-Input assignment got used again to propagate App Service Plan ID.

Lastly, we can now export App Service host name from our program, e.g. for the user to be able to go to the web site immediately after deployment:

exports.hostname = app.defaultSiteHostname;

An Output can also be transformed with the apply function. Here is the code to format the output URL:

exports.endpoint = app.defaultSiteHostname.apply(n => `https://${n}`);

Running pulumi update from CLI will then print the endpoint for us:

---outputs:---
endpoint: "https://mywebsiteb76260b5.azurewebsites.net"

Multiple outputs can be combined with pulumi.all, e.g. given SQL Server and Database, we could make a connection string:

// username and pwd are assumed to be defined earlier in the program
const connectionString = 
    pulumi.all([sqlServer.name, database.name]).apply(([server, db]) => 
        `Server=tcp:${server}.database.windows.net;initial catalog=${db};user ID=${username};password=${pwd};Min Pool Size=0;Max Pool Size=30;Persist Security Info=true;`);
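The formatting logic inside apply is plain string manipulation, so it can be pulled out into a small pure function and checked without touching Azure. A minimal sketch, where `buildConnectionString` is a hypothetical helper name and the sample values are placeholders:

```typescript
// Hypothetical helper: the same template as above, extracted into a pure function.
function buildConnectionString(server: string, db: string, username: string, pwd: string): string {
    return `Server=tcp:${server}.database.windows.net;initial catalog=${db};` +
        `user ID=${username};password=${pwd};` +
        `Min Pool Size=0;Max Pool Size=30;Persist Security Info=true;`;
}

// Placeholder values, for illustration only.
const sample = buildConnectionString("mysql1234", "mydb", "admin", "secret");
```

Inside the Pulumi program the helper would be called from the apply callback, receiving the already resolved server and database names.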

Using the Power of NPM

Since our program is just a TypeScript application, we are free to use any third-party package available on NPM.

For instance, we can install the Azure Storage SDK. Just

npm install azure-storage

and then we can write a function to produce a SAS token for a Blob in Azure Storage:

import * as azurestorage from "azure-storage";

// Given an Azure blob, create a SAS URL that can read it.
export function signedBlobReadUrl(
    blob: azure.storage.Blob | azure.storage.ZipBlob,
    account: azure.storage.Account,
    container: azure.storage.Container,
): pulumi.Output<string> {
    const signatureExpiration = new Date(2100, 1);

    return pulumi.all([
        account.primaryConnectionString,
        container.name,
        blob.name,
    ]).apply(([connectionString, containerName, blobName]) => {
        let blobService = new azurestorage.BlobService(connectionString);
        let signature = blobService.generateSharedAccessSignature(
            containerName,
            blobName,
            {
                AccessPolicy: {
                    Expiry: signatureExpiration,
                    Permissions: azurestorage.BlobUtilities.SharedAccessPermissions.READ,
                },
            }
        );

        return blobService.getUrl(containerName, blobName, signature);
    });
}

I took this function from the Azure Functions example, and it will probably move to Pulumi libraries at some point, but until then you are free to leverage the package ecosystem.

Deploying Application Files

So far we have provisioned an Azure App Service, but we can also deploy the application files as part of the same workflow.

The code below uses the Run from Zip feature of App Service:

  1. Define Storage Account and Container

     const storageAccount = new azure.storage.Account("mystorage", {
         ...resourceGroupArgs,
    
         accountKind: "StorageV2",
         accountTier: "Standard",
         accountReplicationType: "LRS",
     });
    
     const storageContainer = new azure.storage.Container("mycontainer", {
         resourceGroupName: resourceGroup.name,
         storageAccountName: storageAccount.name,
         containerAccessType: "private",
     });

  2. Create a folder with application files, e.g. wwwroot. It may contain some test HTML, an ASP.NET application, or anything else supported by App Service.

  3. Produce a zip file from that folder in the Pulumi program:

     const blob = new azure.storage.ZipBlob("myzip", {
         resourceGroupName: resourceGroup.name,
         storageAccountName: storageAccount.name,
         storageContainerName: storageContainer.name,
         type: "block",
    
         content: new pulumi.asset.FileArchive("wwwroot")
     });

  4. Produce a SAS Blob URL and assign it to the App Service Run from Zip setting:

     const codeBlobUrl = signedBlobReadUrl(blob, storageAccount, storageContainer);
    
     const app = new azure.appservice.AppService("mywebsite", {
         ...resourceGroupArgs,
    
         appServicePlanId: appServicePlan.id,
    
         appSettings: {
             "WEBSITE_RUN_FROM_ZIP": codeBlobUrl
         }
     });

Run the program, and your application will start as soon as pulumi update is complete.

Determinism

Pulumi programs should strive to be deterministic. That means you should avoid using things like current date/time or random numbers.

The reason is incremental updates. Every time you run pulumi update, it will execute the program from scratch. If your resources depend on random values, they will not match the existing resources, and thus a false delta will be detected and deployed.

In the SAS generation example above we used a fixed date in the future instead of doing today + 1 year kind of calculation.
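The difference between the two approaches can be sketched in a few lines. The function names `fixedExpiry` and `rollingExpiry` are hypothetical, and the sketch uses UTC dates (unlike the local-time `new Date(2100, 1)` above) so that the result does not depend on the machine's time zone either.

```typescript
// Deterministic: every run of the program produces the same expiry date.
function fixedExpiry(): Date {
    return new Date(Date.UTC(2100, 0, 1));
}

// Non-deterministic: the result depends on when the program runs,
// so a later update would produce a new value and a new signature.
function rollingExpiry(now: Date): Date {
    return new Date(Date.UTC(now.getUTCFullYear() + 1, now.getUTCMonth(), now.getUTCDate()));
}

const firstRun = fixedExpiry().toISOString();
const secondRun = fixedExpiry().toISOString();
// firstRun === secondRun, so no difference is detected between updates.

const dayOne = rollingExpiry(new Date(Date.UTC(2018, 5, 4))).toISOString();
const dayTwo = rollingExpiry(new Date(Date.UTC(2018, 5, 5))).toISOString();
// dayOne !== dayTwo: the dependent resource would look changed on the next day's update.
```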

Should Pulumi provide some workaround for this?

Conclusion

My code was kindly merged into the Pulumi examples; go there for the complete runnable program that provisions App Service with Azure SQL Database and Application Insights.

I really see high potential in the Cloud-as-Code approach suggested by Pulumi. Today we just scratched the surface of the possibilities. We were working with cloud services at a raw level: provisioning specific services with given parameters.

Pulumi's vision includes providing higher-level components to blur the line between infrastructure and code, and to enable everybody to create such components on their own.

Exciting future ahead!

Cold Starts Beyond First Request in Azure Functions

In my previous article I've explored the topic of Cold Starts in Azure Functions. Particularly, I've measured the cold start delays per language and runtime version.

I received some follow-up questions that I'd like to explore in today's post:

  • Can we avoid cold starts except the very first one by keeping the instance warm?
  • Given one warm instance, if two requests come at the same time, will one request hit a cold start because existing instance is busy with the other?
  • In general, does a cold start happen at scale-out when a new extra instance is provisioned?

Again, we are only talking Consumption Plan here.

Theory

Azure Functions run on instances provided by Azure App Service. Each instance is able to process several requests concurrently, unlike AWS Lambda, where an instance handles one request at a time.

Thus, the following could be true:

  • If we issue at least 1 request every 20 minutes, the first instance should stay warm for a long time
  • Simultaneous requests don't cause cold start unless the existing instance gets too busy
  • When runtime decides to scale out and spin up a new instance, it could do so in the background, still forwarding incoming requests to the existing warm instance(s). Once the new instance is ready, it could be added to the pool without causing cold starts
  • If so, cold starts are mitigated beyond the very first execution

Let's put this theory to the test!

Keeping Always Warm

I've tested a Function App which consists of two Functions:

  • HTTP Function under test
  • Timer Function which runs every 10 minutes and does nothing but log one line of text
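The post doesn't show the code of the timer function. A warmer along those lines might look like the TypeScript sketch below. The `Context` and `Timer` interfaces are trimmed-down stand-ins declared locally so the sketch is self-contained, the `warmer` name is hypothetical, and the schedule is assumed to be configured in the timer binding with a six-field cron expression such as `0 */10 * * * *`.

```typescript
// Trimmed-down stand-ins for the objects the Functions host passes in.
// They are declared locally here; a real project would use the host's own types.
interface Context {
    log(message: string): void;
    done(): void;
}

interface Timer {
    isPastDue: boolean;
}

// Timer-triggered "warmer": does nothing but write one line to the log.
// The schedule itself (every 10 minutes) is assumed to live in the timer
// binding configuration, e.g. the cron expression "0 */10 * * * *".
function warmer(context: Context, timer: Timer): void {
    context.log(timer.isPastDue ? "Warmer is running late" : "Keeping the instance warm");
    context.done();
}
```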

I then measured the cold start statistics similar to all the tests from my previous article.

For 2 days I issued infrequent requests to the same app, most of which would normally lead to a cold start. Interestingly, even though I was regularly firing the timer, Azure switched instances to serve my application 2 times during the test period:

Infrequent Requests to Azure Functions with "Keep It Warm" Timer

I can see that most responses are fast, so the timer "warmer" definitely helps.

The first request(s) to a new instance are slower than subsequent ones. Still, they are faster than the normal full cold start time, so it could be related to HTTP stack loading.

Anyway, keeping Functions warm seems to be a viable strategy.

Parallel Requests

What happens when there is a warm instance, but it's already busy with processing another request? Will the parallel request be delayed, or will it be processed by the same warm instance?

I tested with a very lightweight function, which nevertheless takes some time to complete:

public static async Task<HttpResponseMessage> Delay500([HttpTrigger] HttpRequestMessage req)
{
    await Task.Delay(500);
    return req.CreateResponse(HttpStatusCode.OK, "Done");
}

I believe it's an OK approximation for an IO-bound function.

The test client then issued 2 to 10 parallel requests to this function and measured the response time for all requests.
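The test client itself isn't shown in the post. A minimal TypeScript sketch of the measurement might look like the following; `measureBatch` and `runExperiment` are hypothetical names, the `send` function is injected so that any HTTP client (or a fake) can be plugged in, and the exact batch sizes are left as a parameter because the post doesn't list them.

```typescript
// Sends `count` requests at the same time and returns each response time in ms.
// `send` is injected so the sketch works with any HTTP client (or a fake one).
async function measureBatch(count: number, send: () => Promise<unknown>): Promise<number[]> {
    const requests: Promise<number>[] = [];
    for (let i = 0; i < count; i++) {
        const started = Date.now();
        requests.push(send().then(() => Date.now() - started));
    }
    return Promise.all(requests);
}

// Runs the given batch sizes one after another and collects all timings.
async function runExperiment(sizes: number[], send: () => Promise<unknown>): Promise<number[][]> {
    const results: number[][] = [];
    for (const size of sizes) {
        results.push(await measureBatch(size, send));
        // A real run would also pause between batches before sending the next one.
    }
    return results;
}
```

Each inner array holds the response times of one batch, which maps naturally to one group of bars on a chart.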

It's not the easiest chart to understand in full, but note the following:

  • Each group of bars represents requests sent at the same time. Then there is a pause of about 20 seconds before the next group of requests is sent

  • The bars are colored by the instance which processed that request: same instance - same color

Azure Functions Response Time to Batches of Simultaneous Requests

Here are some observations from this experiment:

  • Out of 64 requests, there were 11 cold starts

  • Same instance can process multiple simultaneous requests, e.g. one instance processed 7 out of 10 requests in the last batch

  • Nonetheless, Azure is eager to spin up new instances for multiple requests. In total 12 instances were created, which is even more than the maximum number of requests in any single batch

  • Some of those instances were actually never reused (gray-ish bars in batches x2 and x3, brown bar in x10)

  • The first request to each new instance pays the full cold start price. The runtime doesn't provision new instances in the background while reusing existing instances for incoming requests

  • If an instance handles more than one request at a time, response time invariably suffers, even though the function is super lightweight (Task.Delay)

Conclusion

Getting back to the experiment goals, there are several things that we learned.

For low-traffic apps with sporadic requests it makes sense to set up a "warmer" timer function firing every 10 minutes or so to prevent the only instance from being recycled.

However, scale-out cold starts are real and I don't see any way to prevent them from happening.

When multiple requests come in at the same time, we might expect some of them to hit a new instance and get slowed down. The exact algorithm of instance reuse is not entirely clear.

The same instance is capable of processing multiple requests in parallel, so there are possibilities for optimization in terms of routing to warm instances during the provisioning of cold ones.

If such optimizations happen, I'll be glad to re-run my tests and report any noticeable improvements.

Stay tuned for more serverless perf goodness!

Azure Functions: Cold Starts in Numbers

Auto-provisioning and auto-scalability are the killer features of Function-as-a-Service cloud offerings, and Azure Functions in particular.

One drawback of such dynamic provisioning is a phenomenon called "Cold Start". Basically, applications that haven't been used for a while take longer to start up and to handle the first request.

The problem is nicely described in Understanding Serverless Cold Start, so I won't repeat it here. I'll just copy a picture from that article:

Cold Start

Based on the 4 actions which happen during a cold start, we may guess that the following factors might affect the cold start duration:

  • Language / execution runtime
  • Azure Functions runtime version
  • Application size including dependencies

I ran several sample functions and tried to analyze the impact of these factors on cold start time.

Methodology

All tests were run against HTTP Functions, because that's where cold start matters the most.

All the functions were just returning "Hello, World" taking the "World" value from the query string. Some functions were also loading extra dependencies, see below.

I did not rely on the execution time reported by Azure. Instead, I measured end-to-end duration from the client's perspective. All calls were made from within the same Azure region, so network latency should have minimal impact:

Test Setup

When Does Cold Start Happen?

Obviously, cold start happens when the very first request comes in. After that request is processed, the instance is kept alive in case subsequent requests arrive. But for how long?

The following chart gives the answer. It shows values of normalized request durations across different languages and runtime versions (Y axis) depending on the time since the previous request in minutes (X axis):

Cold Start Threshold

Clearly, an idle instance lives for 20 minutes and then gets recycled. All requests after the 20-minute threshold hit another cold start.

How Do Languages Compare?

I'll start with version 1 of Functions runtime, which is the production-ready GA version as of today.

I've written Hello World HTTP function in all GA languages: C#, F# and Javascript, and I added Python for comparison. C#/F# were executed both in the form of script, and as a precompiled .NET assembly.

The following chart gives some intuition about the cold start duration per language. The languages are ordered based on mean response time, from lowest to highest. About 68% of request durations are inside the vertical bar (1-sigma interval) and 95% are inside the vertical line (2-sigma):
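For reference, the sigma intervals can be computed from raw durations as in the TypeScript sketch below. The `summarize` function is a hypothetical helper, the sample values are made up, and the sketch uses the population standard deviation; the charts may have been produced differently. The 68% and 95% figures hold for data that is roughly normally distributed.

```typescript
interface Summary {
    mean: number;
    stdDev: number;
    oneSigma: [number, number]; // corresponds to the vertical bar
    twoSigma: [number, number]; // corresponds to the vertical line
}

// Mean and population standard deviation of response times (in seconds).
function summarize(durations: number[]): Summary {
    const mean = durations.reduce((sum, d) => sum + d, 0) / durations.length;
    const variance =
        durations.reduce((sum, d) => sum + (d - mean) * (d - mean), 0) / durations.length;
    const stdDev = Math.sqrt(variance);
    return {
        mean,
        stdDev,
        oneSigma: [mean - stdDev, mean + stdDev],
        twoSigma: [mean - 2 * stdDev, mean + 2 * stdDev],
    };
}

// Made-up sample durations, for illustration only.
const summary = summarize([2, 4, 4, 4, 5, 5, 7, 9]);
// summary.mean is 5 and summary.stdDev is 2,
// so oneSigma is [3, 7] and twoSigma is [1, 9].
```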

Cold Start V1 per Language

Somewhat surprisingly, precompiled .NET is exactly on par with Javascript. Javascript "Hello World" is really lightweight, so I expected it to win, but I was wrong.

C# Script is slower but somewhat comparable. F# Script presented a really negative surprise though: it's much slower. It's even slower than experimental Python support where no performance optimization would be expected at all!

Functions Runtime: V1 vs V2

Version 2 of Functions runtime is currently in preview and not suitable for production load. That probably means they haven't done too much performance optimization, especially from cold start standpoint.

Can we see this on the chart? We sure can:

Cold Start V1 vs V2

V2 is massively slower. The fastest cold starts are around 6 seconds, but the slowest can reach 40-50 seconds.

Javascript is again on-par with precompiled .NET.

Java is noticeably slower, even though the deployment package is just 33kB, so I assume I didn't bloat it.

Does Size Matter?

OK, enough of Hello World. A real-life function might be heavier, mainly because it would depend on other third-party libraries.

To simulate such a scenario, I've measured cold starts for a .NET function with references to Entity Framework, Automapper, Polly and Serilog.

For Javascript I did the same, but referenced Bluebird, lodash and AWS SDK.

Here are the results:

Cold Start Dependencies

As expected, the dependencies slow the loading down. You should keep your Functions lean, otherwise you will pay in seconds for every cold start.

An important note for Javascript developers: the above numbers are for Functions deployed after running the Funcpack preprocessor. The package contained a single js file with the Webpack-ed dependency tree. Without that, the mean cold start time of the same function is 20 seconds!

Conclusions

Here are some lessons learned from all the experiments above:

  • Be prepared for cold starts of 1-3 seconds even for the smallest Functions
  • Stay on V1 of the runtime until V2 goes GA unless you don't care about perf
  • .NET precompiled and Javascript Functions have roughly the same cold start time
  • Minimize the number of dependencies, only bring what's needed

Do you see anything weird or unexpected in my results? Do you need me to dig deeper on other aspects? Please leave a comment below or ping me on twitter, and let's sort it all out.

There is a follow-up post available: Cold Starts Beyond First Request in Azure Functions

I'm Mikhail Shilkov, a software developer and architect, a Microsoft Azure MVP, and a Russian expat living in the Netherlands. I am passionate about cloud technologies, functional programming and the intersection of the two.
