Azure Function Triggered by Azure Event Grid

Update: I missed the elephant in the room. There is actually a specialized trigger for Event Grid binding. In the portal, just select Experimental in the Scenario drop-down while creating the function. In precompiled functions, reference the Microsoft.Azure.WebJobs.Extensions.EventGrid NuGet package.
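
For reference, here is a rough sketch of what a precompiled function with that specialized trigger might look like. This is based on the early preview of the extension, so the exact parameter types may differ; in this sketch the event arrives as a plain JObject:

[FunctionName("EventGridTriggered")]
public static void Run([EventGridTrigger] JObject eventGridEvent, TraceWriter log)
{
    // The whole Event Grid event (id, subject, data, etc.) arrives as JSON
    log.Info(eventGridEvent.ToString());
}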

The rest of the article describes my original approach to triggering an Azure Function from Azure Event Grid with a generic Webhook trigger.

Here are the steps to follow:

Create a Function with Webhook Trigger

I'm not aware of a specialized trigger type for Event Grid, so I decided to use the Generic Webhook trigger (which is essentially an HTTP trigger).

I used the Azure Portal to generate a function, so here is the function.json that I got:

{
  "bindings": [
    {
      "type": "httpTrigger",
      "direction": "in",
      "webHookType": "genericJson",
      "name": "req"
    },
    {
      "type": "http",
      "direction": "out",
      "name": "res"
    }
  ],
  "disabled": false
}

For precompiled functions, just decorate the request parameter with HttpTriggerAttribute restricted to the POST method:

public static Task<HttpResponseMessage> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req)

Parse the Payload

Events from Event Grid will arrive in a specific predefined JSON format. Here is an example of events to expect:

[{
  "id": "0001",
  "eventType": "MyHelloWorld",
  "subject": "Hello World!",
  "eventTime": "2017-10-05T08:53:07",
  "data": {
    "hello": "world"
  },
  "topic": "/SUBSCRIPTIONS/GUID/RESOURCEGROUPS/NAME/PROVIDERS/MICROSOFT.EVENTGRID/TOPICS/MY-EVENTGRID-TOPIC1"
}]

To parse that data more easily, I defined a C# class to deserialize the JSON into:

public class GridEvent
{
    public string Id { get; set; }
    public string EventType { get; set; }
    public string Subject { get; set; }
    public DateTime EventTime { get; set; }
    public Dictionary<string, string> Data { get; set; }
    public string Topic { get; set; }
}

Now the function can read the events (note that they are sent in arrays) from the body of the POST request:

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    string jsonContent = await req.Content.ReadAsStringAsync();
    var events = JsonConvert.DeserializeObject<GridEvent[]>(jsonContent);

    // do something with events

    return req.CreateResponse(HttpStatusCode.OK);
}

Validate the Endpoint

To prevent events from being delivered to endpoints that the subscriber doesn't own, Event Grid requires each subscriber to validate itself. For this purpose, Event Grid sends events of the special type SubscriptionValidation.

The validation request contains a code, which we need to echo back in a 200 OK HTTP response.

Here is a small piece of code to do just that:

if (req.Headers.GetValues("Aeg-Event-Type").FirstOrDefault() == "SubscriptionValidation")
{
    var code = events[0].Data["validationCode"];
    return req.CreateResponse(HttpStatusCode.OK,
        new { validationResponse = code });
}
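
Putting the pieces together, the complete function body looks roughly like this (the logging inside the loop is just a placeholder for your own processing):

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    string jsonContent = await req.Content.ReadAsStringAsync();
    var events = JsonConvert.DeserializeObject<GridEvent[]>(jsonContent);

    // Respond to the subscription validation handshake
    if (req.Headers.GetValues("Aeg-Event-Type").FirstOrDefault() == "SubscriptionValidation")
    {
        var code = events[0].Data["validationCode"];
        return req.CreateResponse(HttpStatusCode.OK, new { validationResponse = code });
    }

    // Process the actual events
    foreach (var gridEvent in events)
    {
        log.Info($"Got {gridEvent.EventType} event: {gridEvent.Subject}");
    }

    return req.CreateResponse(HttpStatusCode.OK);
}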

The function is ready!

Create a Custom Event Grid Topic

To test it out, go to the portal and create a custom Event Grid topic. Then click the Add Event Subscription button, give it a name, and copy-paste the function URL (including the key) into the Subscriber Endpoint field:

Azure Function URL

Event Grid Subscription

Creating a subscription will immediately trigger a validation request to your function, so you should see one invocation in the logs.

Send Custom Events

Now, go to your favorite HTTP client (curl, Postman, etc) and send a sample event to check how the whole setup works:

POST /api/events HTTP/1.1
Host: <your-eventgrid-topic>.westus2-1.eventgrid.azure.net
aeg-sas-key: <key>
Content-Type: application/json

[{
  "id": "001",
  "eventType": "MyHelloWorld",
  "subject": "Hello World!",
  "eventTime": "2017-10-05T08:53:07",
  "data": {
    "hello": "world"
  }
}]

Obviously, adjust the endpoint and key based on the data from the portal.

You should get a 200-OK back and then see your event in Azure Function invocation logs.

Have fun!

Wanted: Effectively-Once Processing in Azure

This experimental post is a question. The question is too broad for StackOverflow, so I'm posting it here. Please engage in the comments section, or forward the link to subject experts.

TL;DR: Are there any known patterns / tools / frameworks to provide scalable, stateful, effectively-once, end-to-end processing of messages, to be hosted in Azure, preferably on PaaS-level of service?

Motivational Example

Let's say we are making a TODO app. There is a constant flow of requests to create TODO items in the system. Each request contains just two fields: a title and the ID of the project that the TODO should belong to. Here is the definition:

type TodoRequest = {
  ProjectId: int
  Title: string
}

Now, we want to process each request and assign the resulting TODO an identifier, which should be an auto-incremented integer. Numbering is unique per project, so each TODO is identified by its own combination of ProjectId and Id:

type Todo = {
  ProjectId: int
  Id: int
  Title: string
}

Now, instead of relying on database sequences, I want to describe this transformation as a function. The function has the type (TodoRequest, int) -> (Todo, int), i.e. it transforms a tuple of a request and the current per-project state (the last generated ID) into a tuple of a TODO and the updated state:

let create (request: TodoRequest, state: int) =
  let nextId = state + 1
  let todo = {
    ProjectId = request.ProjectId
    Id = nextId
    Title = request.Title
  }
  todo, nextId

This is an extremely simple function, and I can use it to great success to process local, non-durable data.

But if I need to make a reliable distributed application out of it, I need to take care of lots of things:

  1. No request should be lost. I need to persist all the requests into a durable storage in case of processor crash.

  2. Similarly, I need to persist TODO's too. Presumably, some downstream logic will use the persisted data later on in TODO's lifecycle.

  3. The state (the counter) must be durable too. In case of crash of processing function, I want to be able to restart processing after recovery.

  4. Processing of the requests should be sequential per project ID. Otherwise I might get a clash of ID's in case two requests belonging to the same project are processed concurrently.

  5. I still want requests to different projects to be processed in parallel, to make sure the system scales up with the growth of project count.

  6. There must be no holes or duplicates in TODO numbering per project, even in the face of system failures. In the worst case, I agree to tolerate a duplicated entry in the output log, but it must be exactly the same entry (i.e. two entries with the same project ID, ID and title).

  7. The system should tolerate a permanent failure of any single hardware dependency and automatically fail-over within reasonable time.

It's not feasible to meet all of those requirements without relying on some battle-tested distributed services or frameworks.

Which options do I know of?

Transactions

Traditionally, this kind of requirement was solved with transactions in something like SQL Server. If I store requests, TODO's and the current ID per project in the same relational database, I can make each processing step a single atomic transaction.
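
Here is a rough sketch of what a single processing step could look like in that world (the schema, table and column names are made up for illustration; request is the incoming TodoRequest):

// A single atomic processing step: bump the counter and insert the TODO together
using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var tx = connection.BeginTransaction())
    {
        // 1. Bump the per-project counter and read the new value
        //    (assumes a Counters row already exists for this project)
        var counterCmd = new SqlCommand(
            "UPDATE Counters SET LastId = LastId + 1 OUTPUT INSERTED.LastId WHERE ProjectId = @projectId",
            connection, tx);
        counterCmd.Parameters.AddWithValue("@projectId", request.ProjectId);
        var nextId = (int)counterCmd.ExecuteScalar();

        // 2. Insert the TODO with the generated ID
        var todoCmd = new SqlCommand(
            "INSERT INTO Todos (ProjectId, Id, Title) VALUES (@projectId, @id, @title)",
            connection, tx);
        todoCmd.Parameters.AddWithValue("@projectId", request.ProjectId);
        todoCmd.Parameters.AddWithValue("@id", nextId);
        todoCmd.Parameters.AddWithValue("@title", request.Title);
        todoCmd.ExecuteNonQuery();

        // Counter update and TODO insert succeed or fail together
        tx.Commit();
    }
}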

This addresses all the concerns, as long as we can stay inside a single database. That's probably a viable option for the TODO app, but less so if I translate my toy example to real applications like IoT data processing.

Can we do the same for distributed systems at scale?

Azure Event Hubs

Since I touched on the IoT space, the logical choice would be to store our entries in Azure Event Hubs. That works for many of the criteria, but I don't see any available approach to make such processing consistent in the face of failures.

When processing is done, we need to store three pieces of data: the generated TODO event, the current processing offset and the current ID. The event goes to another event hub, the processing offset is stored in Blob Storage, and the ID can be saved to something like Table Storage.

But there's no way to store those three pieces atomically. Whichever order we choose, we are bound to get anomalies in some specific failure modes.

Azure Functions

Azure Functions don't solve those problems. But I want to mention this Function-as-a-Service offering because they provide an ideal programming model for my use case.

I need to take just one step from my domain function to Azure Function: to define bindings for e.g. Event Hubs and Table Storage.
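
For illustration, here is a hedged sketch of what such a function could look like. The hub names, table name and CounterEntity type are my own assumptions (CounterEntity would be a TableEntity holding the last generated ID per project), and I'm using C# counterparts of the TodoRequest and Todo types. Note that the storage operations below are not atomic, which is exactly the weakness discussed next:

public static async Task Run(
    [EventHubTrigger("todo-requests", Connection = "EventHubConnection")] TodoRequest request,
    [Table("todoCounters")] CloudTable counters,
    [EventHub("todos", Connection = "EventHubConnection")] IAsyncCollector<Todo> todos)
{
    // Read the current per-project counter (the Event Hub offset checkpoint
    // lives separately in Blob Storage, managed by the runtime)
    var retrieve = TableOperation.Retrieve<CounterEntity>("todo", request.ProjectId.ToString());
    var counter = (CounterEntity)(await counters.ExecuteAsync(retrieve)).Result
        ?? new CounterEntity { PartitionKey = "todo", RowKey = request.ProjectId.ToString(), LastId = 0 };

    counter.LastId += 1;

    // Two separate, non-transactional writes: output event + counter update
    await todos.AddAsync(new Todo { ProjectId = request.ProjectId, Id = counter.LastId, Title = request.Title });
    await counters.ExecuteAsync(TableOperation.InsertOrReplace(counter));
}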

However, the reliability guarantees will stay poor: I get neither sequential processing per Event Hub partition key nor atomic state commits.

Azure Service Fabric

Service Fabric sounds like a good candidate service for reliable processing. Unfortunately, I don't have enough experience with it to judge.

Please leave a comment if you do.

JVM World

There are products in the JVM world which claim to solve my problem perfectly.

Apache Kafka was the inspiration for Event Hubs' log-based messaging. The recent Kafka release provides effectively-once processing semantics as long as the data stays inside Kafka. Kafka does that with atomic publishing to multiple topics, and state storage based on compacted topics.

Apache Flink has similar guarantees for its stream processing APIs.

Great, but how do I get such awesomeness in .NET code, and without installing expensive ZooKeeper-managed clusters?

Call for Feedback

Do you know a solution, product or service?

Have you developed effectively-once processing on .NET / Azure stack?

Are you in touch with somebody who works on such framework?

Please leave a comment, or ping me on Twitter.

Azure Functions: Are They Really Infinitely Scalable and Elastic?

Automatic elastic scaling is a built-in feature of the serverless computing paradigm. You don't have to provision servers anymore; you just write code, and it gets provisioned onto as many servers as needed based on the actual load. That's the theory.

In particular, Azure Functions can be hosted on the Consumption plan:

The Consumption plan automatically allocates compute power when your code is running, scales out as necessary to handle load, and then scales down when code is not running.

In this post I will run a simple stress test to get a feel of how such automatic allocation works in practice and what kind of characteristics we can rely on.

Setup

Here are the parameters that I chose for today's test:

  • Azure Function written in C# and hosted on Consumption plan
  • Triggered by Azure Storage Queue binding
  • Workload is strictly CPU-bound, no I/O is executed

Specifically, each queue item represents one password that I need to hash. Each function call performs 12-round Bcrypt hashing. Bcrypt is a deliberately slow algorithm recommended for password hashing, because it makes brute-force attacks on stolen hashes really hard and costly.

My function is based on the Bcrypt.Net implementation, and it's extremely simple:

public static void Run([QueueTrigger("bcrypt-password")] string password)
{
    BCrypt.Net.BCrypt.HashPassword(password, 12);
}

It turns out that a single execution of this function takes approximately 1 second on a Consumption plan instance and consumes 100% CPU during that second.

Now, the challenge is simple. I send 100,000 passwords to the queue and see how long it takes to hash them, and also how the autoscaling behaves. I will run it twice, with a different pace of sending messages to the queue each time; a minimal sketch of the producer follows below.
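
Here is that producer sketch, assuming the WindowsAzure.Storage SDK and a connection string in an app setting called StorageConnection (the actual tool also varied the sending pace, which is omitted here):

var account = CloudStorageAccount.Parse(Environment.GetEnvironmentVariable("StorageConnection"));
var queue = account.CreateCloudQueueClient().GetQueueReference("bcrypt-password");
await queue.CreateIfNotExistsAsync();

for (var i = 0; i < 100000; i++)
{
    // Any string works as a "password"; the function just hashes it
    await queue.AddMessageAsync(new CloudQueueMessage("password-" + i));

    // ~100,000 messages spread over 2 hours => roughly 72 ms between messages
    await Task.Delay(TimeSpan.FromMilliseconds(72));
}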

That sounds like a perfect job for a Function App on Consumption plan:

  • Needs to scale based on load
  • CPU intensive - easy to see how busy each server is
  • Queue-based - easy to see the incoming vs outgoing rate

Let's see how it went.

Experiment 1: Steady Load

In my first run, I was sending messages at constant rate. 100,000 messages were sent within 2 hours, without spikes or drops in the pace.

Sounds like an easy job for autoscaling facilities. But here is the actual chart of data processing:

Function App Scaling

The horizontal axis is time in minutes since the first message came in.

The orange line shows the queue backlog - the number of messages sitting in the queue at a given moment.

The blue area represents the number of instances (virtual servers) allocated to the function by the Azure runtime (see the scale on the right side).

We can divide the whole process into 3 logical segments, approximately 40 minutes each:

Lagging behind. The runtime starts with 0 instances and immediately switches to 1 when the first message comes in. However, it's reluctant to add any more servers for the next 20 (!) minutes. The scaling heuristic is probably based on the past history of this queue/function, and it wasn't busy at all during the preceding hours.

After 20 minutes, the runtime starts adding more instances: it goes up to 2, then jumps to 4, then reaches 5 at minute 40. The CPU is constantly at 100% and the queue backlog grows linearly.

Rapid scale-up. After minute 40, it looks like the runtime realizes that it needs more power. Much more power! The growth speeds up quickly, and by minute 54 the backlog stops growing, even though messages are still coming in. There are now 21 instances working, which is enough to finally match and beat the rate of incoming messages.

The runtime doesn't stop growing though. CPUs are still at 100%, and the backlog is still very high, so the scaling goes up and up. The number of instances reaches an astonishing 55, at which point all the backlog is processed and there are no messages left in the queue.

Searching for balance. When the queue is almost empty and CPU drops below 100% for the first time, the runtime decides to scale down. It does that quickly and aggressively, switching from 55 to 21 instances in just 2 minutes.

From there it keeps slowly reducing the number of instances until the backlog starts growing again. The runtime allows the backlog to grow a bit, but then figures out a balanced number of servers (17) to keep the backlog flat at around 2,000 messages.

It stays at 17 until the producer stops sending new messages. The backlog then goes to 0, and the number of instances gradually drops to 0 within 10 minutes.

The second chart from the same experiment looks very similar, but it shows different metrics:

Function App Delay

The gray line is the delay in minutes since the currently processed message got enqueued (message "age", in-queue latency). The blue line is the total processing rate, measured in messages per minute.

Due to the perfect scalability and stability of my function, both charts are almost exactly the same. I've put it here so that you can see that the slowest message spent more than 40 minutes sitting inside the queue.

Experiment 2: Spiky Load

For the second run, I tried to emulate a spiky load profile. I sent my 100,000 messages over 6 hours at a lower pace than during the first run, but from time to time the producer switched to fast mode and sent a bigger batch of messages within just a few minutes. Here is the actual chart of the incoming message rate:

Spiky Load

It's easy to imagine a service with a usage pattern like that, where spikes of events happen from time to time or during rush hours.

This is how the Function App managed to process the messages:

Spiky Load Processing Result

The green line still shows the number of incoming messages per minute. The blue line denotes how many messages were actually processed in that minute. And the orange bars are the queue backlog - the number of messages pending.

Here are several observations:

  • Obviously, the processing latency is far from real time. There is constantly a significant backlog in the queue, and the processing delay reaches 20 minutes at peak.

  • It took the runtime 2 hours to clear the backlog for the first time. Even without any spikes during the first hour, the autoscaling algorithm needs time to get up to speed.

  • The Function App runtime is able to scale up quite fast (look at the reaction to the fourth spike), but it's not really willing to do that most of the time.

  • The growth of the backlog after minute 280 is caused purely by a wrong decision of the runtime. While the load was completely steady, the runtime decided to shut down most workers after 20 minutes of an empty backlog, and could not recover for the next hour.

Conclusions

I tried to get a feel for the ability of Azure Functions to scale on demand and adapt to the workload. The function under test was purely CPU-bound, and for that case I can draw two main conclusions:

  • Function Apps are able to scale to a high number of instances running at the same time, and to eventually process large parallel jobs (at least up to 55 instances).

  • Significant processing delays are to be expected for heavy loads. The Function App runtime has quite some inertia, and the resulting processing latency can easily go up to tens of minutes.

If you know how these results can be improved, or why they are less than optimal, please leave a comment or contact me directly.

I look forward to conducting more tests in the future!

Authoring a Custom Binding for Azure Functions

In my previous post I described how I used the Durable Functions extension in an Azure Function App. Durable Functions use several binding types that are not part of the standard suite: OrchestrationClient, OrchestrationTrigger, ActivityTrigger. These custom bindings are installed by copying the corresponding assemblies to a special Extensions folder.

Although the Bring-Your-Own-Binding (BYOB) feature hasn't been released yet, I decided to follow the path of Durable Functions and create my own custom binding.

Configuration Binding

I've picked a really simple use case for my first experiments with custom bindings: reading configuration values.

Azure Functions store their configuration values in App Settings (local runtime uses local.settings.json file for that).

That means, when you need a configuration value inside your C# code, you normally do

string setting = ConfigurationManager.AppSettings["MySetting"];

Alternatively, Environment.GetEnvironmentVariable() method can be used.

When I needed to collect service bus subscription metrics, I wrote this kind of bulky code:

var resourceToScale = ConfigurationManager.AppSettings["ResourceToScale"];

var connectionString = ConfigurationManager.AppSettings["ServiceBusConnection"];
var topic = ConfigurationManager.AppSettings["Topic"];
var subscription = ConfigurationManager.AppSettings["Subscription"];

The code is no rocket science, but it's tedious to write, so instead I came up with the idea of defining functions like this:

public static void MyFunction(
    [TimerTrigger("0 */1 * * * *")] TimerInfo timer,
    [Configuration(Key = "ResourceToScale")] string resource,
    [Configuration] ServiceBusSubscriptionConfig config)

Note the two usages of the Configuration attribute. The first one specifies a configuration key and binds its value to a string parameter. The other one binds multiple configuration values to a POCO parameter. I defined the config class as

public class ServiceBusSubscriptionConfig
{
    public ServiceBusSubscriptionConfig(string serviceBusConnection, string topic, string subscription)
    {
        ServiceBusConnection = serviceBusConnection;
        Topic = topic;
        Subscription = subscription;
    }

    public string ServiceBusConnection { get; }
    public string Topic { get; }
    public string Subscription { get; }
}

The immutable class is a bit verbose, but I still prefer it over a get-set container in this scenario.

The binding behavior is convention-based in this case: the binding engine should load configuration values based on the names of class properties.

Motivation

So, why do I need such binding?

As I said, it's a simple use case to play with BYOB feature, and overall, understand the internals of Function Apps a bit better.

But apart from that, I removed 4 lines of garbage from the function body (at the cost of two extra parameters). Less noise means more readable code, especially when I put this code on a webpage.

As a bonus, the testability of the function immediately increased. It's so much easier for a test to just pass the configuration as an input parameter, instead of fine-tuning configuration files inside test projects or hiding ConfigurationManager usage behind a mockable facade.
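
A hypothetical test could then look something like this (the Functions class name and the xunit-style attribute are my own assumptions; the timer payload isn't needed for the logic under test, so null is fine there):

[Fact]
public void MyFunction_uses_injected_configuration()
{
    var config = new ServiceBusSubscriptionConfig("Endpoint=sb://test/", "my-topic", "my-subscription");

    // No ConfigurationManager, no settings files - just plain parameters
    Functions.MyFunction(null, "my-resource-to-scale", config);

    // ... assert on the observable outcome of the function
}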

This approach seems to be a strength of Azure Functions code in general: it's often possible to reduce imperative I/O-related code to attribute-decorated function parameters.

Implementing a Custom Binding

The actual implementation process of a custom non-trigger binding is quite simple:

Create a class library with the word "Extension" in its name. Import the Microsoft.Azure.WebJobs and Microsoft.Azure.WebJobs.Extensions NuGet packages (at the time of writing I used version 2.1.0-beta1).

Define a class for binding attribute:

[AttributeUsage(AttributeTargets.Parameter)]
[Binding]
public class ConfigurationAttribute : Attribute
{
    [AutoResolve]
    public string Key { get; set; }
}

The attribute is marked as Binding and the Key property is marked as resolvable from function.json.

Implement IExtensionConfigProvider which will tell the function runtime how to use your binding correctly.

The interface has just one method to implement:

public class ConfigurationExtensionConfigProvider : IExtensionConfigProvider
{
    public void Initialize(ExtensionConfigContext context)
    {
        // ... see below
    }
}

The first step of the implementation is to define a rule for our new ConfigurationAttribute and tell this rule how to get a string value out of any attribute instance:

var rule = context.AddBindingRule<ConfigurationAttribute>();
rule.BindToInput<string>(a => ConfigurationManager.AppSettings[a.Key]);

That's really all that needs to happen to bind string parameters.

To make our binding work with any POCO, we need a more elaborate construct:

rule.BindToInput<Env>(_ => new Env());
var cm = context.Config.GetService<IConverterManager>();
cm.AddConverter<Env, OpenType, ConfigurationAttribute>(typeof(PocoConverter<>));

I instruct the rule to bind to my custom class Env, and then I say that this class Env is convertible to any type (denoted by the special OpenType type argument) with a generic converter called PocoConverter.

The Env class is a bit of a dummy (it exists only because I need some class to bind to):

private class Env
{
    public string GetValue(string key) => ConfigurationManager.AppSettings[key];
}

And PocoConverter is a piece of reflection that loops through the property names and reads configuration values for them. It then calls a constructor that matches the property count:

private class PocoConverter<T> : IConverter<Env, T>
{
    public T Convert(Env env)
    {
        var values = typeof(T)
            .GetProperties()
            .Select(p => p.Name)
            .Select(env.GetValue)
            .Cast<object>()
            .ToArray();

        var constructor = typeof(T).GetConstructor(values.Select(v => v.GetType()).ToArray());
        if (constructor == null)
        {
            throw new Exception("We tried to bind to your C# class, but it looks like there's no constructor which accepts all property values");
        }

        return (T)constructor.Invoke(values);
    }
}

This piece of code is not particularly robust, but it is good enough to illustrate the concept.

And that's it, the binding is ready! You can find the complete example in my github repo.

Deploying Custom Bindings

Since the BYOB feature is in early preview, there is no tooling for automated deployment yet, and we need to do everything manually. But the process is not too sophisticated:

  1. Create a folder for custom bindings, e.g. D:\BindingExtensions.

  2. Set the AzureWebJobs_ExtensionsPath parameter in your app settings to that folder's path. For local development, add a line to local.settings.json:

     "AzureWebJobs_ExtensionsPath": "D:\\BindingExtensions",
    
  3. Create a subfolder for your extension, e.g. D:\BindingExtensions\ConfigurationExtension.

  4. Copy the contents of bin\Debug\ of your extension's class library to that folder.

  5. Reference your extension library from your Function App.

You are good to go! Decorate your function parameters with the new attribute.

Run the function app locally to try it out. In the console output you should be able to see something like

Loaded custom extension: ConfigurationExtensionConfigProvider from 
'D:\BindingExtensions\ConfigurationExtension\MyExtensions.dll'

You will be able to debug your extension if needed.

Use the following links to find out more about custom bindings, see more examples and walkthroughs, and get fresh updates:

Have a good binding!

Custom Autoscaling with Durable Functions

In my previous post, Custom Autoscaling of Azure App Service with a Function App, I created a Function App which watches a Service Bus subscription backlog and adjusts the scale of an App Service based on the observed load.

It works fine, but there are two minor issues that I would like to address in this article:

  • The Scaling Logic function from that workflow needs to preserve state between calls. I used Table Storage bindings for that, which proved to be a bit verbose and low-level: I had to manage entity conversion and JSON serialization myself;

  • There is no feedback from the Scaler function (which executes the change) back to the Scaling Logic function. Thus, if a scaling operation is slow or fails, there is no easy way to notify the logic about it.

Let's see how these issues can be solved with Azure Durable Functions.

Meet Durable Functions

Microsoft has recently announced the preview of Durable Functions:

Durable Functions is an Azure Functions extension for building long-running, stateful function orchestrations in code using C# in a serverless environment.

The library is built on top of Durable Task Framework and introduces several patterns for Function coordination and stateful processing. Please go read the documentation, it's great and has some very useful examples.

I decided to give Durable Functions a try for my autoscaling workflow. Feel free to refer to the first part to understand my goals and the previous implementation.

Architecture

The flow of metric collection, scaling logic and scaling action stays the same. The state and cross-function communication aspects are now delegated to Durable Functions, so the diagram becomes somewhat simpler:

Autoscaling Architecture

The blue sign on Scaling Logic function denotes its statefulness.

Let's walk through the functions implementation to see how the workflow plays out.

This time I'll start with the Scaler function and then move from right to left to make the explanation clearer.

Scaler

The Scaler function applies scaling decisions to the Azure resource, an App Service Plan in my case. I've extracted the App Service related code into a helper to keep the function minimal and clean. You can see the full code in my github repo.

The Scaler function is triggered by Durable Functions' ActivityTrigger. That basically means it's ready to be called from other functions. Here is the code:

[FunctionName(nameof(Scaler))]
public static int Scaler([ActivityTrigger] DurableActivityContext context)
{
    var action = context.GetInput<ScaleAction>();

    var newCapacity = ScalingHelper.ChangeAppServiceInstanceCount(
        action.ResourceName,
        action.Type == ScaleActionType.Down ? -1 : +1);

    return newCapacity;
}

To receive an input value, I use the context.GetInput() method. I believe the team is working on support for custom classes (ScaleAction in my case) directly as function parameters.

The function then executes the scale change and returns back the new capacity of App Service Plan. Note that this is new: we were not able to return values in the previous implementation.

Scaling Logic

Scaling Logic uses the Stateful Actor pattern. One instance of such an actor is created for each scalable resource (I only use one now). Here is the implementation (again, simplified for readability):

[FunctionName(nameof(ScalingLogic))]
public static async Task<ScalingState> ScalingLogic(
    [OrchestrationTrigger] DurableOrchestrationContext context, 
    TraceWriter log)
{
    var state = context.GetInput<ScalingState>();

    var metric = await context.WaitForExternalEvent<Metric>(nameof(Metric));

    UpdateHistory(state.History, metric.Value);
    ScaleAction action = CalculateScalingAction(state);

    if (action != null)
    {
        var result = await context.CallFunctionAsync<int>(nameof(Scaler), action);
        log.Info($"Scaling logic: Scaled to {result} instances.");
        state.LastScalingActionTime = context.CurrentUtcDateTime;
    }

    context.ContinueAsNew(state);
    return state;
}

Here is how it works:

  • Function is bound to OrchestrationTrigger, yet another trigger type from Durable Functions;

  • It loads durable state from the received context;

  • It then waits for an external event called Metric (to be sent by Collector function, see the next section);

  • When an event is received, the function updates its state and calculates if a scaling action is warranted;

  • If yes, it calls the Scaler function and sends it the scale action. It expects an integer result, denoting the new number of instances;

  • It then calls ContinueAsNew method to start a new iteration of the actor loop, providing the updated state.

One important note: the orchestrated function has to be deterministic. That means, for example, that DateTime.Now is not allowed to be used. I use context.CurrentUtcDateTime instead for time-related calculations.
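
For example, a check that throttles scaling actions could be written like this (ScalingThresholdInMinutes is a hypothetical constant, not part of the original sample):

// Deterministic: context.CurrentUtcDateTime is replayed consistently,
// while DateTime.UtcNow would change on every replay
var timeSinceLastAction = context.CurrentUtcDateTime - state.LastScalingActionTime;
var canScaleAgain = timeSinceLastAction > TimeSpan.FromMinutes(ScalingThresholdInMinutes);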

The implementation of this function solves both problems that I mentioned in the introduction. We do not manage state storage and serialization manually, and we now have the ability to get feedback from the Scaler function.

Metrics Collector

I've extracted Service Bus related code to a helper to keep the code sample minimal and clean. You can see the full code in my github repo.

Here is the remaining implementation of Metric Collector:

[FunctionName(nameof(MetricCollector))]
public static async Task MetricCollector(
    [TimerTrigger("0 */1 * * * *")] TimerInfo myTimer,
    [OrchestrationClient] DurableOrchestrationClient client,
    TraceWriter log)
{
    var resource = Environment.GetEnvironmentVariable("ResourceToScale");

    var status = await client.GetStatusAsync(resource);
    if (status == null)
    {
        await client.StartNewAsync(nameof(ScalingLogic), resource, new ScalingState());
    }
    else
    {
        var metric = ServiceBusHelper.GetSubscriptionMetric(resource);
        log.Info($"Collector: Current metric value is {metric.Value.Value} at {DateTime.Now}");
        await client.RaiseEventAsync(resource, nameof(Metric), metric);
    }
}

It's still a timer-triggered "normal" (non-durable) function, but now it also has an additional binding to OrchestrationClient. This client is used to communicate metric data to the Scaling Logic.

With the current implementation, Metric Collector also has a second responsibility: actor instance management. At every iteration, it queries for the current status of the corresponding actor. If that status is null, the Collector creates a new instance with an initial empty state.

To my taste, this aspect is a bit unfortunate, but it seems to be required with the current implementation of the Durable Functions framework. See my related question on github.

Conclusions

I adjusted the initial flow of autoscaling functions to use the Durable Functions library. It made the state management more straightforward, and also allowed direct communication between two functions in a strongly-typed request-response manner.

The resulting code is relatively clear and resembles the typical structure of async-await code that C# developers are used to.

There are some downsides that I found about Durable Functions too:

  • This is a very early preview, so there are some implementation issues. A couple of times I managed to put my functions into a state where they were stuck and no calls could be made anymore. The only way I could get out of that was to clear some blobs in the Storage Account;

  • The actor instance management story feels raw. The function which needs to send events to actors has to manage their lifecycle and instance IDs. I would need to add some more checks to make the code production-ready, e.g. to restart actors if they end up in a faulty state;

  • There are some concurrency issues in function-to-function communication to be resolved;

  • Some discipline is required to keep Durable Functions side-effect-free and deterministic. The multiple executions caused by awaits and replays are counter-intuitive (at least for novice devs), and thus error-prone.

Having said that, I believe Durable Functions can be a very useful abstraction to simplify some of the more advanced scenarios and workflows. I look forward to further iterations of the library, and I will keep trying it out for more scenarios.
