Azure Functions as a Facade for Azure Monitoring

Azure Functions are the Function-as-a-Service offering of the Microsoft Azure cloud. Basically, an Azure Function is a piece of code which gets executed by Azure every time an event of some kind happens. The environment manages deployment, event triggers and scaling for you. This approach is often referred to as Serverless.

In this post I will describe one use case for Azure Functions: we implemented a number of functions as a proxy layer between our operations/monitoring tool and Azure metric APIs.


Automated monitoring and alerting are crucial in order to ensure 24x7 smooth operations of our business-critical applications. We host applications both on-premise and in Azure cloud, and we use a single set of tools for monitoring across this hybrid environment.

In particular, we use PRTG Network Monitor to collect all kinds of metrics about the health of our systems and produce both real-time alerts and historic trends.

A unit of monitoring in PRTG is called a "sensor". Each sensor polls a specific data source to retrieve the current value of a metric. The data source can be a performance counter, a JSON value in an HTTP response, a SQL query result and so on.

The problem is that there is no PRTG sensor for Azure metrics out of the box. It might be possible to implement a sensor with custom code, e.g. in PowerShell, but it would be problematic in two ways (at least):

  1. The custom code sensors are cumbersome to develop and maintain.
  2. We would have to store sensitive information like Azure API keys and connection strings in PRTG.

Solution Overview

To overcome these problems we introduced an intermediate layer, as shown on the following picture:

PRTG to HTTP to Azure

We use the PRTG HTTP XML/REST sensor type. This sensor polls a given HTTP endpoint, parses the response as JSON and finds a predefined field. This field is then used as the sensor value. It takes 30 seconds to set up such a sensor in PRTG.

The HTTP endpoint is hosted inside Azure. It provides a facade for metric data access. All the sensitive information needed to access Azure metrics API is stored inside Azure configuration itself. The implementation knows which Azure API to use to get a specific metric, and it hides those complications from the client code.

Azure Functions

We chose Azure Functions as the technology to implement and host such HTTP facade.

The functions are very easy to create or modify. They are deployed independently from any other code, so we can update them at any cadence. And no need to provision any kind of servers anywhere - Azure will run the code for us.

Here is how the whole setup works:

Retrieval of data from Azure to PRTG

  1. Every X minutes (configured per sensor), PRTG makes an HTTP request to a predefined URL. The request includes an Access Key as a query parameter (the key is stored in the sensor URL configuration). Each access key enables access to just one endpoint and is easily revocable.

  2. For each Metric type there is an Azure Function listening for HTTP requests from PRTG. Azure authorizes requests that contain valid access keys.

  3. Based on query parameters of the request, Azure Function retrieves a proper metric value from Azure management API. Depending on the metric type, this is accomplished with Azure .NET SDK or by sending a raw HTTP request to Azure REST API.

  4. Azure Function parses the response from Azure API and converts it to just the value which is requested by PRTG.

  5. The function returns a simple JSON object as HTTP response body. PRTG parses JSON, extracts the numeric value, and saves it into the sensor history.
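The last step can be sketched from the consumer's side. This is only an illustration of reading one named field out of a small JSON body, not how PRTG is implemented; the sample body, the `ReadSensorValue` helper and the use of `System.Text.Json` are assumptions made for the example.

```csharp
using System;
using System.Text.Json;

// Hypothetical response body, shaped like the output of the queue-size function.
string body = "{\"messageCount\": 42, \"dlq\": 3}";

// Reads one numeric field by name, similar to a sensor that is configured
// with the name of the JSON field it should track.
static long ReadSensorValue(string json, string field)
{
    using var doc = JsonDocument.Parse(json);
    return doc.RootElement.GetProperty(field).GetInt64();
}

Console.WriteLine(ReadSensorValue(body, "messageCount")); // 42
Console.WriteLine(ReadSensorValue(body, "dlq"));          // 3
```

Keeping the response flat, with one field per metric, is what lets a single function serve several sensors.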

At the time of writing, we have 13 sensors served by 5 Azure Functions:

Map of PRTG sensors to Functions to Azure services

I describe several functions below.

Service Bus Queue Size

The easiest function to implement is the one which gets the number of messages in the backlog of a given Azure Service Bus queue. The function.json file configures input and output HTTP bindings, including two parameters to derive from the URL: account (namespace) and queue name:

{
  "bindings": [
    {
      "authLevel": "function",
      "name": "req",
      "type": "httpTrigger",
      "direction": "in",
      "route": "Queue/{account}/{name}"
    },
    {
      "name": "$return",
      "type": "http",
      "direction": "out"
    }
  ],
  "disabled": false
}

The C# implementation uses the standard Service Bus API and a connection string from the App Service configuration to retrieve the required data. It then returns an anonymous object, which the Function App runtime converts to JSON.

#r "Microsoft.ServiceBus"

using System.Net;
using Microsoft.ServiceBus;

public static object Run(HttpRequestMessage req, string account, string name)
{
    var connectionString = Environment.GetEnvironmentVariable("sb-" + account);
    var nsmgr = NamespaceManager.CreateFromConnectionString(connectionString);
    var queue = nsmgr.GetQueue(name);
    return new 
    {
        messageCount = queue.MessageCountDetails.ActiveMessageCount,
        dlq = queue.MessageCountDetails.DeadLetterMessageCount
    };
}

And that is all the code required to start monitoring the queues!

Service Bus Queue Statistics

In addition to the queue backlog and dead letter queue size, we wanted to see some queue statistics like the number of incoming and outgoing messages per period of time. The corresponding API exists, but it's not that straightforward, so I described the whole approach in a separate post: Azure Service Bus Entity Metrics .NET APIs.

In my Azure Function I'm using the NuGet package that I mentioned in the post. This is accomplished by adding a project.json file:

{
  "frameworks": {
    "net46": {
      "dependencies": {
        "MikhailIo.ServiceBusEntityMetrics": "0.1.2"
      }
    }
  }
}

The function.json file is similar to the previous one, but with one added parameter called metric. I won't repeat the whole file here.

The Function implementation loads a certificate from the store, calls metric API and returns the last metric value available:

using System.Linq;
using System.Security.Cryptography.X509Certificates;
using MikhailIo.ServiceBusEntityMetrics;

public static DataPoint Run(HttpRequestMessage req, string account, string name, string metric)
{
    var subscription = Environment.GetEnvironmentVariable("SubscriptionID");
    var thumbprint = Environment.GetEnvironmentVariable("WEBSITE_LOAD_CERTIFICATES");

    // The store must be opened before its certificates can be searched
    X509Store certStore = new X509Store(StoreName.My, StoreLocation.CurrentUser);
    certStore.Open(OpenFlags.ReadOnly);

    X509Certificate2Collection certCollection = certStore.Certificates.Find(
        X509FindType.FindByThumbprint,
        thumbprint,
        false);

    var client = new QueueStatistics(certCollection[0], subscription, account, name);
    var metrics = client.GetMetricSince(metric, DateTime.UtcNow.AddMinutes(-30));
    return metrics.LastOrDefault();
}

Don't forget to set the WEBSITE_LOAD_CERTIFICATES setting to your certificate thumbprint, otherwise the Function App won't load it.

Web App Instance Count

We are using Azure Web Jobs to run background data processing, e.g. for all queue message handlers. The jobs are hosted in Web Apps, and have auto-scaling enabled. When the load on the system grows, Azure spins up additional instances to increase the overall throughput.

So, the next metric to be monitored is the number of Web App instances running.

There is a REST endpoint to retrieve this information, but this time authentication and authorization are implemented with Active Directory. I created a helper class to wrap the authentication logic:

public static class RestClient
{
    public static async Task<T> Query<T>(string url)
    {
        var token = await GetAuthorizationHeader();
        var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);

        var response = await client.GetAsync(url);
        var content = await response.Content.ReadAsStringAsync();
        return JsonConvert.DeserializeObject<T>(content);
    }

    private static async Task<string> GetAuthorizationHeader()
    {
        var activeDirectoryID = Environment.GetEnvironmentVariable("ActiveDirectoryID");
        var applicationID = Environment.GetEnvironmentVariable("ActiveDirectoryApplicationID");
        var secret = Environment.GetEnvironmentVariable("ActiveDirectorySecret");

        var context = new AuthenticationContext($"{activeDirectoryID}");
        var credential = new ClientCredential(applicationID, secret);
        AuthenticationResult result = 
            await context.AcquireTokenAsync("", credential);
        return result.AccessToken;
    }
}

The function then uses this REST client to query the Web App management API, converts the JSON to strongly typed C# objects and returns the number of instances in the HTTP response:

public class Instance
{
    public string id { get; set; }
    public string name { get; set; }
}

public class Response
{
    public Instance[] value { get; set; }
}

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req)
{
    var subscription = Environment.GetEnvironmentVariable("SubscriptionID");
    var resourceGroup = Environment.GetEnvironmentVariable("ResourceGroup");
    var appService = Environment.GetEnvironmentVariable("AppService");

    var url = $"{subscription}/resourceGroups/{resourceGroup}" +
    var response = await RestClient.Query<Response>(url);

    return req.CreateResponse(HttpStatusCode.OK, new
    {
        instanceCount = response.value.Length
    });
}

Users Online

The last example I want to share is related to Application Insights data. We inject a small tracking snippet into our front-end pages, and Application Insights then tracks all the page views and other user activity.

We use the number of users currently online as another metric for the monitoring solution. The Application Insights API is currently in preview, but at least it is nicely documented. Be sure to check out the API Explorer too.

The following sample function returns the number of users online:

public class UsersCount
{
    public long unique { get; set; }
}

public class Value
{
    public UsersCount UsersCount { get; set; }
}

public class Response
{
    public Value value { get; set; }
}

public static async Task<HttpResponseMessage> Run(HttpRequestMessage req)
{
    var appID = Environment.GetEnvironmentVariable("ApplicationInsightsID");
    var key = Environment.GetEnvironmentVariable("ApplicationInsightsKey");

    var client = new HttpClient();
    client.DefaultRequestHeaders.Add("x-api-key", key);
    var url = $"{appID}/metrics/users/count";

    var response = await client.GetAsync(url);
    var content = await response.Content.ReadAsStringAsync();
    var r = JsonConvert.DeserializeObject<Response>(content);

    return req.CreateResponse(HttpStatusCode.OK, new
    {
        usersCount = r.value.UsersCount.unique
    });
}


It seems that monitoring metrics retrieval is an ideal scenario to start using Azure Functions. The Functions are very easy to create and modify, they abstract away the details of hosting Web API endpoints, and at the same time give you the full power of C# (or F#) and Azure.

And because we only call those functions about once per minute, they are free to run!
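A rough back-of-the-envelope check of that claim might look like the sketch below. The polling interval and the free grant of 1,000,000 executions per month on the Consumption plan are assumptions rather than figures from this post, and the calculation ignores the separate allowance for execution time.

```csharp
using System;

// Assumptions, not figures from the post: each sensor polls once per minute,
// and the plan includes 1,000,000 free executions per month.
const int sensors = 13;
const int callsPerSensorPerMinute = 1;
const long freeExecutionsPerMonth = 1_000_000;

// Use a 31-day month as the worst case.
long callsPerMonth = (long)sensors * callsPerSensorPerMinute * 60 * 24 * 31;

Console.WriteLine(callsPerMonth);                          // 580320
Console.WriteLine(callsPerMonth < freeExecutionsPerMonth); // True
```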

Azure Service Bus Entity Metrics .NET APIs

Azure Service Bus is a key component of many background processing applications hosted in Azure, so it definitely requires monitoring and alerting. My goal for our monitoring solution was to provide an API to retrieve the following parameters for each Service Bus queue/topic in our application:

  • Message count (backlog)
  • Dead letter queue count
  • Number of incoming messages per time period
  • Number of processed messages per time period

The first two are easily retrieved from the QueueDescription object (see MSDN):

var nsmgr = NamespaceManager.CreateFromConnectionString(connectionString);
var queue = nsmgr.GetQueue(name);
var backlog = queue.MessageCountDetails.ActiveMessageCount;
var dlq = queue.MessageCountDetails.DeadLetterMessageCount;

The other two metrics are not readily available from the .NET SDK though. There are some extra metrics described in Service Bus Entity Metrics REST APIs, but the docs are really brief and vague, and they lack any examples.

So the rest of this post will be a walkthrough of how to consume those REST APIs from your .NET code.

Management Certificate

The API authenticates the caller by its client certificate. This authentication approach seems to be deprecated for Azure services, but for this particular API it's still the way to go.

First, you need to obtain a certificate itself, which means:

  • It's installed in certificate store on the machine where API call is made
  • You have a .cer file for it

If you are calling API from your workstation, you may just Create a new self-signed certificate.

I am calling API from Azure Function App, so I reused the certificate that we already uploaded to Azure for SSL support.

Once you have the certificate, you have to Upload it as a management certificate to the "Classic" Azure portal. Yes, management certificates are not supported by the new portal. If you don't have access to the old portal, ask your system administrator to grant it.

Finally, here is a code sample to load the certificate in C# code:

X509Store store = new X509Store("My", StoreLocation.CurrentUser);
store.Open(OpenFlags.ReadOnly);
var cert = store.Certificates.Find(
    X509FindType.FindBySubjectName,
    "<certificate name of yours>", 
    false)[0];

Request Headers

Here is a helper class which adds the specified certificate to each request and sets the appropriate headers too:

internal class AzureManagementClient : WebClient
{
    private readonly X509Certificate2 certificate;

    public AzureManagementClient(X509Certificate2 certificate)
    {
        this.certificate = certificate;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = (HttpWebRequest)base.GetWebRequest(address);

        // Authenticate every request with the management certificate
        request.ClientCertificates.Add(this.certificate);
        request.Headers.Add("x-ms-version: 2013-10-01");
        request.Accept = "application/json";

        return request;
    }
}

This code is mostly copied from the very useful post of Brian Starr, so thank you Brian.

Getting the List of Metrics

To get the list of available metrics you will need 3 string parameters:

  • Azure subscription ID
  • Service Bus namespace
  • Queue name

The following picture shows all of them on Azure Portal screen:

Service Bus Parameters

Now, format the following request URL and query it using our Azure client:

var client = new AzureManagementClient(cert);
var url = $"{subscriptionId}" +
          $"/services/servicebus/namespaces/{serviceBusNamespace}" +
var result = client.DownloadString(url);

If you did everything correctly, you will get the list of supported metrics in JSON. Congratulations, that's a major accomplishment :)

And here is a quick way to convert JSON to C# array:

public class Metric
{
    public string Name { get; set; }
    public string Unit { get; set; }
    public string PrimaryAggregation { get; set; }
    public string DisplayName { get; set; }
}

var metrics = JsonConvert.DeserializeObject<Metric[]>(result);

Getting the Metric Values

Now, to get the metric values themselves, you will need some extra parameters:

  • Metric name (take the value of the Name property of the Metric class above)
  • Rollup period, or aggregation period: 5 minutes, 1 hour, 1 day, or 1 week; use the matching Pxxx code from the API documentation
  • Start date/time (UTC) of the data period to query

Here is the sample code:

var time = DateTime.UtcNow.AddHours(-1).ToString("s");

var client = new AzureManagementClient(cert);
var url = $"{subscriptionId}" +
          $"/services/servicebus/namespaces/{serviceBusNamespace}" +
          $"/queues/{queueName}/Metrics/{metric}" +

var result = client.DownloadString(url);

I am using the incoming metric to get the number of enqueued messages per period and the outgoing metric to get the number of dequeued messages.

The strongly typed version is simple:

public class DataPoint
{
    public string Timestamp { get; set; }
    public long Total { get; set; }
}

var data = JsonConvert.DeserializeObject<DataPoint[]>(result);
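The rollup codes and the start timestamp are the two request parameters that are easiest to get wrong. The sketch below assumes that the Pxxx codes are ISO 8601 durations, which is what `XmlConvert.ToString(TimeSpan)` produces; the sample date is made up for the illustration.

```csharp
using System;
using System.Globalization;
using System.Xml;

// ISO 8601 duration codes for the four rollup periods listed above
// (assuming the API expects this notation).
string[] rollups =
{
    XmlConvert.ToString(TimeSpan.FromMinutes(5)), // PT5M
    XmlConvert.ToString(TimeSpan.FromHours(1)),   // PT1H
    XmlConvert.ToString(TimeSpan.FromDays(1)),    // P1D
    XmlConvert.ToString(TimeSpan.FromDays(7)),    // P7D
};

// The "s" format specifier yields a sortable timestamp without a zone suffix.
var sample = new DateTime(2017, 1, 15, 9, 30, 0, DateTimeKind.Utc);
string time = sample.ToString("s", CultureInfo.InvariantCulture);

Console.WriteLine(string.Join(", ", rollups)); // PT5M, PT1H, P1D, P7D
Console.WriteLine(time);                       // 2017-01-15T09:30:00
```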

Working Example

I've authored a small library which wraps the HTTP request into strongly typed .NET classes. You can see it in my GitHub repository or grab it from NuGet.

Coding Puzzle in F#: Find the Number of Islands

Here's a programming puzzle. Given a 2D matrix of 0's and 1's, find the number of islands. A group of connected 1's forms an island. For example, the matrix below contains 5 islands.

Input : mat = {{1, 1, 0, 0, 0},
               {0, 1, 0, 0, 1},
               {1, 0, 0, 1, 1},
               {0, 0, 0, 0, 0},
               {1, 0, 1, 0, 1}}
Output : 5

A typical solution to this problem will be implemented in C++, Java or C# and will involve a loop to iterate through the matrix, and another loop or recursion to traverse islands. The traversal progress will be tracked in an auxiliary mutable array, denoting the visited nodes. An example of such a solution (and the definition of the problem above) can be found here.
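For comparison, a minimal sketch of that imperative style in C# could look as follows. It is not the linked solution: the `CountIslands` name and the explicit stack instead of recursion are choices made for this illustration, and cells count as connected when they touch horizontally, vertically or diagonally.

```csharp
using System;
using System.Collections.Generic;

int[,] mat =
{
    { 1, 1, 0, 0, 0 },
    { 0, 1, 0, 0, 1 },
    { 1, 0, 0, 1, 1 },
    { 0, 0, 0, 0, 0 },
    { 1, 0, 1, 0, 1 },
};

// Counts groups of 1's connected horizontally, vertically or diagonally.
static int CountIslands(int[,] grid)
{
    int rows = grid.GetLength(0), cols = grid.GetLength(1);
    var visited = new bool[rows, cols]; // the auxiliary mutable array
    int islands = 0;

    for (int r = 0; r < rows; r++)
    for (int c = 0; c < cols; c++)
    {
        if (grid[r, c] != 1 || visited[r, c]) continue;

        // A new island starts here: flood it with an iterative depth-first search.
        islands++;
        var stack = new Stack<(int Row, int Col)>();
        stack.Push((r, c));
        visited[r, c] = true;

        while (stack.Count > 0)
        {
            var (cr, cc) = stack.Pop();
            for (int dr = -1; dr <= 1; dr++)
            for (int dc = -1; dc <= 1; dc++)
            {
                int nr = cr + dr, nc = cc + dc;
                if (nr < 0 || nc < 0 || nr >= rows || nc >= cols) continue;
                if (grid[nr, nc] != 1 || visited[nr, nc]) continue;
                visited[nr, nc] = true;
                stack.Push((nr, nc));
            }
        }
    }

    return islands;
}

Console.WriteLine(CountIslands(mat)); // 5
```

Note how the matrix bounds checks, the visited bookkeeping and the traversal are all interleaved in one function.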

I want to give an example of a solution in F#, with generic immutable data structures and pure functions.

Graph Traversal

First of all, this puzzle is a variation of the standard problem: Counting number of connected components in a graph.

Connected Graph Components

I will start my implementation with a graph traversal implementation, and then we will apply it to the 2D matrix at hand.

The graph is defined by the following type:

type Graph<'a> = {
  Nodes: seq<'a>
  Neighbours: 'a -> seq<'a>
}

It is a record type with two fields: a sequence of all nodes, and a function to get neighbour nodes for a given node. The type of the node is generic: I'll use numbers for our example, but Graph type doesn't care much.

The traversal plan is the following:

  1. Go through the sequence of graph nodes.

  2. Keep two accumulator data structures: the list of disjoint sub-graphs (sets of nodes connected to each other) and the set of visited nodes. Both are empty at the beginning.

  3. If the current node is not in the visited set, recursively traverse all neighbours to find the current connected component.

  4. The connected component traversal is a depth-first search: each node is added to both the current set and the total visited set.

Let's start the implementation from inside out. The following recursive function adds a node to the accumulated sets and calls itself for non-visited neighbours:

let rec visitNode accumulator visited node =
  let newAccumulator = Set.add node accumulator
  let newVisited = Set.add node visited

  graph.Neighbours node
  |> Seq.filter (fun n -> Set.contains n newVisited |> not)
  |> Seq.fold (fun (acc, vis) n -> visitNode acc vis n) (newAccumulator, newVisited)

The type of this function is Set<'a> -> Set<'a> -> 'a -> Set<'a> * Set<'a>.

Step 3 is implemented with visitComponent function:

let visitComponent (sets, visited) node =
  if Set.contains node visited 
  then sets, visited
  else
    let newIsland, newVisited = visitNode Set.empty visited node
    newIsland :: sets, newVisited

Now, the graph traversal is just a fold of graph nodes with visitComponent function.

module Graph =
  let findConnectedComponents graph = 
    graph.Nodes
    |> Seq.fold visitComponent ([], Set.empty)
    |> fst

This is the only public function of our graph API, available for the client applications. The visitNode and visitComponent are defined as local functions underneath (and they close over the graph value).

2D Matrix

Now, let's forget about the graphs for a second and model the 2D matrix of integers. The type definition is simple, it's just an alias for the array:

type Matrix2D = int[,]

Now, we need to be able to traverse the matrix, i.e. iterate through all elements and find the neighbours of each element.

The implementation below is mostly busy validating the boundaries of the array. The neighbours of a cell are up to 8 cells around it, diagonal elements included.

module Matrix2D =
  let allCells (mx: Matrix2D) = seq {
    for x in [0 .. Array2D.length1 mx - 1] do
      for y in [0 .. Array2D.length2 mx - 1] -> x, y
  }

  let neighbours (mx: Matrix2D) (x,y) =
    Seq.allPairs [x-1 .. x+1] [y-1 .. y+1]
    |> Seq.filter (fun (i, j) -> i >= 0 && j >= 0 
                              && i < Array2D.length1 mx 
                              && j < Array2D.length2 mx)
    |> Seq.filter (fun (i, j) -> i <> x || j <> y)

Putting It All Together

Now we are all set to solve the puzzle. Here is our input array:

let mat = array2D
            [| [|1; 1; 0; 0; 0|];
               [|0; 1; 0; 0; 1|];
               [|1; 0; 0; 1; 1|];
               [|0; 0; 0; 0; 0|];
               [|1; 0; 1; 0; 1|]
            |]

We need a function to define if a given cell is a piece of an island:

let isNode (x, y) = mat.[x, y] = 1

And here is the essence of the solution: our graph definition. Both Nodes and Neighbours are matrix cells filtered to contain 1's.

let graph = {
  Nodes = Matrix2D.allCells mat |> Seq.filter isNode
  Neighbours = Matrix2D.neighbours mat >> Seq.filter isNode
}

The result is calculated with a one-liner:

graph |> Graph.findConnectedComponents |> List.length


The implementation above represents my attempt to solve in a functional way the puzzle which is normally solved in imperative style. I took a step back and tried to model the underlying concepts with separate data structures. The types and functions might be reused for similar problems in the same domain space.

While not rocket science, the Connected Islands puzzle is a good exercise and provides a nice example of functional concepts, which I'm planning to use while discussing FP and F#.

The full code can be found in my GitHub.

Event Sourcing: Optimizing NEventStore SQL read performance

In my previous post about Event Store read complexity I described how the growth of reads from the event database might be quadratic with respect to the number of events per aggregate.

At a higher level, the conclusion was that an event-sourced database should be optimized for reads rather than writes, which is not always obvious from the definition of the "append-only store".
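To get a feeling for the quadratic growth, here is a small sketch. The access pattern is an assumption made for the illustration and is not taken from the previous post: before appending event number k to a stream, the application reads the k - 1 events that are already stored in it.

```csharp
using System;

// Assumed access pattern: appending event number k is preceded by a read
// of the k - 1 events that the stream already contains.
static long TotalEventsRead(int eventsInStream)
{
    long total = 0;
    for (int k = 1; k <= eventsInStream; k++)
        total += k - 1;
    return total; // equals n * (n - 1) / 2
}

Console.WriteLine(TotalEventsRead(100));  // 4950
Console.WriteLine(TotalEventsRead(1000)); // 499500
```

Under this assumption a stream that grows ten times longer causes roughly a hundred times more event reads, while the number of writes grows only tenfold.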


In this post I want to look at NEventStore on top of Azure SQL Database, which is the combination we currently use for event sourcing in our Azure-based web application.

NEventStore library provides a C# abstraction over event store with multiple providers for several database backends. We use the Persistence.SQL provider. When you initialize it with a connection string to an empty database, the provider will go on and create two tables with schema, indexes etc. The most important table is Commits and it gets the following schema:

CREATE TABLE dbo.Commits
(
  BucketId          varchar(40),
  StreamId          char(40),
  StreamRevision    int,
  Items             tinyint,
  CommitId          uniqueidentifier,
  CommitSequence    int,
  CheckpointNumber  bigint IDENTITY(1, 1),
  Payload           varbinary(max),
  CommitStamp       datetime2
)

ALTER TABLE dbo.Commits 
ADD PRIMARY KEY CLUSTERED (CheckpointNumber)

I removed several columns, most indexes and constraints to make the script more readable.

The primary key is based upon CheckpointNumber - an IDENTITY column, which means the new events (commits) are appended to the end of the clustered index. Clearly, this is good for INSERT performance.

There are a number of secondary non-clustered indexes that are optimized for the rich API of the NEventStore library, e.g. dispatching events to observers, searching for streams, time-based queries etc.

Our Use Case

It turns out that we don't need the extended API provided by NEventStore. Effectively, we only need two operations to be supported:

  • Add a new event to a stream
  • Read all events of a stream

Our experience of running production-like workloads showed that the read operation performance suffers a lot when the size of a stream grows. Here is a sample query plan for the read query with the default schema:

Query Plan with default primary key

SQL Server uses a non-clustered index to find all events of the given stream, and then does key lookups, which might get very expensive for large streams with hundreds or thousands of events.

Tuning for Reads

After seeing this, I decided to rethink the primary index of the Commits table. Here is what I came up with:

ALTER TABLE dbo.Commits 
ADD PRIMARY KEY CLUSTERED (BucketId, StreamId, CommitSequence)

Now, all the commits of one stream are physically located together in the clustered index.

The change makes INSERTs less efficient. It's not a simple append to the end of the clustered index anymore.

But at this price, the reads just got much faster. Here is the plan for the same query over the new schema:

Query Plan with the new primary key

Simple, beautiful and fast!

Our Results

The results look great for us. We are able to run our 50 GB Commits table on a 100-DTU SQL Database instance, with typical load of 10 to 25 percent. The reads are still taking the biggest chunk of the load, with writes being far behind.

Your mileage may vary, so be sure to test your NEventStore schema against your workload.

Further Improvements

Here are some further steps that we might want to take to make Commits table even faster:

  • The table comes with 5 non-clustered indexes. One of them became our clustered index. Two indexes are unique, so they might be useful for duplicate prevention (e.g. in concurrency scenarios). The remaining two are non-unique, so they can probably be safely deleted unless we start using other queries that they are intended for.

  • There are several columns which are not used in our implementation: StreamIdOriginal, Dispatched and Headers to name a few. We could replace the table with a view of the same name, and always return defaults for those columns in any SELECT, ignoring the values in any INSERT.

But I expect these changes to have only a moderate impact on performance compared to the primary key change discussed above.

My Praise of Advent of Code 2016

During the last days of December I was satisfying my inner need for solving puzzles and tricky tasks by going through the Advent of Code 2016 challenge.

The idea is simple: every day from December 1st to 25th, the site publishes a new brain teaser. They are all aligned into one story: the Bad Easter Bunny has stolen all the Christmas gifts from Santa, and now you are the hero who should break into the Bunny's headquarters and save the gifts for the kids.

Having said that, each challenge is independent from the others, so you can solve them in arbitrary order if you want.

Advent Of Code Levels Advent Calendar in dark ASCII

A puzzle consists of a description and an input data set associated with it. The solution is typically represented as a number or a short string, so it can be easily typed into the textbox. However, to get this solution you need to implement a program: computing it manually is not feasible.

I started a bit late and got just the first 11 puzzles solved. Each puzzle is doable in one sitting, usually half-an-hour to a couple hours of work, which is very nice.

Some problems are purely about the correctness of your solution. The most engaging tasks were also computationally intensive, such that a straightforward solution took too much time to run to completion. You need to find a shortcut to make it faster, which is always fun.

Problem Solved! You collect stars for providing the correct answers

Apart from the general joy and satisfaction that one gets from solving programming challenges like these, I also consider it a good opportunity to try a new programming language or paradigm.

As I said, the tasks are relatively small, so you can feel the sense of accomplishment quite often, even being not very familiar with the programming language of choice.

There are many other people solving the same puzzles and also sharing their solutions online. You can go and find other implementations of a task that you just solved, and compare them to your approach. That's a great way to learn from other people, broaden your view and expose yourself to new tricks, data structures and APIs.

I picked F# as my programming language for Advent of Code 2016. I chose to restrict myself to immutable data structures and pure functions. And it played out really nicely: I am quite happy with the speed of development, readability and performance of the code.

Day 8 solved Solution to one of the puzzles

You can find my code for the first 11 puzzles in my GitHub account. Full sets of F# solutions are available from Mark Heath and Yan Cui.

I included one of the solutions into The Taste of F# talk that I did at a user group earlier this month.

Next year I'll pick another language and will start on December 1st. I invite you to join me in solving Advent of Code 2017.

Kudos to Eric Wastl for creating and maintaining the Advent of Code web site.

I'm Mikhail Shilkov, a software developer. I enjoy F#, C#, JavaScript and SQL development, reasoning about distributed systems, data processing pipelines, cloud and web apps. I blog about my experience on this website.
