Load Testing Azure SQL Database by Copying Traffic from Production SQL Server

Azure SQL Database is a managed service that provides low-maintenance SQL Server instances in the cloud. You don't have to run and update VMs, take backups or set up failover clusters. Microsoft handles the administration for you; you just pay an hourly fee.

So, let's say you decide this value proposition is a good reason to migrate away from your existing self-hosted SQL Server database running in production and replace it with Azure SQL Database.

You do the functional testing and eventually everything works like a charm. The next set of questions is about the database performance level:

  • Which tier / how many DTUs should I provision?
  • How much will it cost?
  • Will it be able to handle my current production load?

DTUs

Even if you collect all the specs of the hardware behind your existing SQL Server, you can't directly use that knowledge to choose the right Azure SQL Database size.

The sizes are measured in Database Transaction Units (DTUs). These are abstract units of measure which don't necessarily mean much on their own. Within a given tier (Standard / Premium), doubling the DTU amount will double the max throughput.

That doesn't really help to plan for workload migrations.

There are some ways to estimate the DTU requirements by measuring metrics like CPU and IOPS on your existing server. Have a look at DTU Calculator: it consists of a data collector and an online converter from metric values to DTUs.

While useful as a first approximation, I'm reluctant to pick an Azure SQL Database size solely based on such estimates.

My answer to the problem is: Measure It!

Synthetic Tests

Go get a backup of your existing production database and Export / Import it into Azure SQL Database. Pick the size based on your gut feel, run a load test, evaluate the results, adjust the size, repeat.

If you know your workload really well, you can create a synthetic test:

  • Create a script or scenario which resembles the real production load
  • Run it for a given period of time
  • Measure the DTUs consumed

Unfortunately, I've yet to see a non-trivial database where I could manually create such a script and be reasonably sure that it reflects reality. Most of the time the load is consumer-driven, changes over time and heavily depends on exact query parameter values.

Which brings me to the need to replay the actual production workload on Azure SQL Database.

Trace and Replay

SQL Server comes with a marvelous suite of tools refined over the years of its existence. It includes tools to capture and replay queries, so I started with those.

SQL Server Profiler has a trace template called TSQL_Replay:

This template records information required to replay the trace. Use this template to perform iterative tuning, such as benchmark testing.

This sounded like what I needed, so I ran the profiler with this template to save a short trace.

Afterwards, it is possible to use the same SQL Server Profiler to replay the trace against another target database. So the process looks like this:

Replaying Traffic with SQL Server Profiler

Unfortunately, this didn't go very well:

  • Azure SQL Database is not supported by the tooling. The replay kind of runs, but it throws lots of errors: reads from non-existent system tables, attempts to switch between databases, and so on

  • Related to the previous item or not, the replay ran terribly slowly, and it seemed to slow down further over time

  • The trace file itself was huge: because the template tries to record pretty much everything, tracing production for 5 minutes produced 10 GB of XML

  • Replay was not real-time: you first record, then you replay. This might not be a big issue for many databases, but some of our queries have a time parameter, and the results would change if I replayed the trace an hour later

Just to give you a rough idea, our production database under study handles about 1,000 RPC calls per second (mostly stored procedures).

Custom Trace & Replay

Since the off-the-shelf solution didn't work for me, I decided to come up with my own custom tool chain. Here is the idea:

Replaying Traffic with a Custom Trace Server, Event Hubs and Azure Functions

There are two custom steps that I implemented:

  1. Run a console app which would host a custom trace server. The trace server receives SQL commands and sends them to Azure Event Hubs in batches

  2. Create an Azure Function application triggered by the Event Hub. Each function call gets one SQL command to execute and runs it against Azure SQL database that we are trying to load-test

This setup worked remarkably well for me: I got the real-time replay of SQL commands from production SQL Server to Azure SQL Database.

The rest of the article describes my setup so that you could reproduce it for your workload.

Azure SQL Database

Ideally, you want your copy of the database to be as fresh as possible, so that the query plans and results match.

Some ideas to accomplish this are given in SQL Server database migration to SQL Database in the cloud.
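One straightforward option is to export a BACPAC from the production server and import it into Azure SQL Database with the SqlPackage tool. A rough sketch, where the server, database and file names are placeholders:

SqlPackage.exe /Action:Export /SourceServerName:MyProdServer /SourceDatabaseName:MyProdDb /TargetFile:MyProdDb.bacpac
SqlPackage.exe /Action:Import /SourceFile:MyProdDb.bacpac /TargetServerName:myserver.database.windows.net /TargetDatabaseName:MyLoadTestDb /TargetUser:loadtest /TargetPassword:<password>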

The Premium RS tier is great for testing, because it is much cheaper than the Premium tier while providing the same level of performance.

Event Hubs

I used Azure Event Hubs as the messaging middleware between the Trace Server and the Replay Function App.

I started with Azure Storage Queues, but the trace server wasn't able to send messages fast enough, mostly due to the lack of batching.

Event Hubs are also a natural match for my choice of Azure Functions: Functions have a built-in Event Hubs trigger with dynamic scaling out of the box.

So, I just created a new Event Hub via the portal, with 32 partitions allocated.
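If you prefer scripting over the portal, the hub can also be created with the Azure CLI. A rough sketch, where the resource group and namespace names are placeholders (the hub name sqltrace matches the trigger binding used later):

az eventhubs namespace create --resource-group load-test-rg --name sqltrace-ns --sku Standard
az eventhubs eventhub create --resource-group load-test-rg --namespace-name sqltrace-ns --name sqltrace --partition-count 32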

Trace Definition File

In order to run a custom Trace Server, you still need a trace definition file. The built-in TSQL_Replay template mentioned above could work, but it subscribes to way too many events and columns.

Instead, I produced my own trace template with a minimal selection. To do that, open SQL Server Profiler, navigate to File -> Templates -> New Template, give it a name, and then on the Events Selection tab exclude everything except exactly the commands that you want to replay.

We use stored procedures for pretty much everything, so my selection looked just like this:

SQL Profiler Template

For the first few runs, I advise you to restrict the trace even further. Click the Column Filters button, select TextData there and set a Like filter to a single stored procedure, e.g. matching the pattern %spProductList%.

This way you can debug the whole replay chain without immediately overloading any part of it with a huge stream of commands.

Once done, save the .tdf file to disk. An example of such a trace definition file can be found in my github.

Trace Server

My trace server is a simple C# console application.

Create a new console app and reference the NuGet package Microsoft.SqlServer.SqlManagementObjects. Mine is version 140.17218.0 (the latest as of today).

Unfortunately, this NuGet package is not fully self-contained. In order to run a profiling session, you have to install the SQL Server Profiler tool on the machine where you want to run the trace server.

Chances are that you already have it there, but be sure to update it to a matching version: mine works with 17.4 / 14.0.17213.0 but refused to work with older versions.

Now we can implement our trace server as a console application. The main method looks like this:

static void Main(string[] args) // args: <db server name> <db name> <trace file>
{
    // 1. Run trace server
    var connectionInfo = new SqlConnectionInfo(args[0])
    {
        DatabaseName = args[1],
        UseIntegratedSecurity = true
    };
    var trace = new TraceServer();
    trace.InitializeAsReader(connectionInfo, args[2]);

    // 2. Continuously read traces and send them to event hubs
    var tokenSource = new CancellationTokenSource();
    var readerTask = Task.Factory.StartNew(() => ReadTrace(trace, tokenSource.Token), tokenSource.Token);
    var senderTask = Task.Factory.StartNew(() => SendToEventHubs(tokenSource.Token), tokenSource.Token);

    // 3. Stop the trace
    Console.WriteLine("Press any key to stop...");
    Console.ReadKey();
    tokenSource.Cancel();
    Task.WaitAll(readerTask, senderTask);
}

The first block initializes the SQL connection using command-line arguments and integrated security, and then starts the Trace Server.

Because of the large volume, I made the trace reader and the event sender work on separate threads. They talk to each other via a concurrent queue:

private static readonly ConcurrentQueue<string> eventQueue = new ConcurrentQueue<string>();

Finally, when the operator presses any key, cancellation is requested and the reader and sender shut down.

The Trace Reader task is a loop crunching through trace data and sending the SQL statements (with some exclusions) to the concurrent in-memory queue:

private static void ReadTrace(TraceServer trace, CancellationToken token)
{ 
    while (trace.Read() && !token.IsCancellationRequested)
    {
        var eventClass = trace["EventClass"].ToString();
        if (string.Compare(eventClass, "RPC:Completed") == 0)
        {
            var textData = trace["TextData"].ToString();
            if (!textData.Contains("sp_reset_connection")
                && !textData.Contains("sp_trace")
                && !textData.Contains("sqlagent"))
            {
                eventQueue.Enqueue(textData);
            }
        }
    }

    trace.Stop();
    trace.Close();
}

The Event Sender dequeues SQL commands from the in-memory queue and collects them into batches of events. As soon as a batch fills up, it gets dispatched to the Event Hub:

private static void SendToEventHubs(CancellationToken token)
{
    var client = EventHubClient.CreateFromConnectionString(EventHubsConnectionString);
    var batch = client.CreateBatch();
    while (!token.IsCancellationRequested)
    {
        if (!eventQueue.TryDequeue(out string sql))
        {
            Thread.Sleep(10);
            continue;
        }

        var eventData = new EventData(Encoding.UTF8.GetBytes(sql));
        if (!batch.TryAdd(eventData) && batch.Count > 0)
        {
            client.SendAsync(batch.ToEnumerable())
                .ContinueWith(OnAsyncMethodFailed, token, TaskContinuationOptions.OnlyOnFaulted, TaskScheduler.Default);
            batch = client.CreateBatch();
            batch.TryAdd(eventData);
        }
    }
}

If your trace doesn't produce that many messages, you will probably want to send the batches out periodically even before they fill up, just to keep the process closer to real time.
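For example, a time-based flush can be added to the sender loop. This is only a sketch of that idea, reusing the same client, queue and batch handling as above:

private static void SendToEventHubs(CancellationToken token)
{
    var client = EventHubClient.CreateFromConnectionString(EventHubsConnectionString);
    var batch = client.CreateBatch();
    var sinceLastSend = Stopwatch.StartNew(); // requires System.Diagnostics

    // Send the current batch (fire-and-forget) and start a new one
    void Flush()
    {
        client.SendAsync(batch.ToEnumerable())
            .ContinueWith(OnAsyncMethodFailed, token, TaskContinuationOptions.OnlyOnFaulted, TaskScheduler.Default);
        batch = client.CreateBatch();
        sinceLastSend.Restart();
    }

    while (!token.IsCancellationRequested)
    {
        // Flush a partially filled batch if nothing was sent for over a second
        if (batch.Count > 0 && sinceLastSend.ElapsedMilliseconds > 1000)
        {
            Flush();
        }

        if (!eventQueue.TryDequeue(out string sql))
        {
            Thread.Sleep(10);
            continue;
        }

        var eventData = new EventData(Encoding.UTF8.GetBytes(sql));
        if (!batch.TryAdd(eventData) && batch.Count > 0)
        {
            // Batch is full: send it and put the current event into a fresh batch
            Flush();
            batch.TryAdd(eventData);
        }
    }
}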

Note that the sender does not await the SendAsync call. Instead, we only subscribe to failures via the OnAsyncMethodFailed callback, which prints them to the console:

private static void OnAsyncMethodFailed(Task task)
{
    Console.WriteLine(task.Exception?.ToString() ?? "null error");
}

And that concludes the implementation of the Trace Server. SQL commands now flow to the Event Hub, to be picked up by the Trace Replay.

Trace Replay Function App

To replay those traces against the target Azure SQL Database, I could have made another console application hosting an EventProcessorHost to receive and process SQL commands.

However, under high load a single machine might not be able to keep up with executing all those commands in real time.

Instead, I decided to distribute such Replay App over multiple machines. To deploy a DDoS network, if you will :)

And I don't have to build, find, configure and synchronize all those servers myself, since we are living in the world of serverless.

Azure Functions are the perfect tool for this job. Once you start the trace server, the Function App will begin scaling out based on the number of events in the Event Hub, and will keep adding instances until it catches up with the workload.

But as long as you don't run the trace server, it won't consume any servers and won't cost you a dime.

Here is the implementation of Trace Replay Azure Function:

public static class Replay
{
    [FunctionName("Replay")]
    public static void Run(
        [EventHubTrigger("sqltrace", Connection = "EventHubsConn")] string sql,
        TraceWriter log)
    {
        var commandName = sql
            .Split(null)
            .SkipWhile(r => r != "exec" && r != "sp_executesql")
            .FirstOrDefault(r => !r.Contains("exec")) ?? "<empty>";

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        try
        {
            using (var sqlConnection = new SqlConnection(AzureSqlConnectionString))
            using (var cmd = new SqlCommand())
            {
                sqlConnection.Open();

                cmd.CommandText = sql;
                cmd.CommandType = CommandType.Text;

                cmd.Connection = sqlConnection;

                int count = 0;
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        count++;
                    }
                }

                log.Info($"Processed {commandName} in {stopwatch.ElapsedMilliseconds} ms with {count} rows");
            }
        }
        catch (Exception ex)
        {
            log.Error($"Error in {commandName} in {stopwatch.ElapsedMilliseconds} {ex.Message}");
            throw;
        }
    }
}

It's super simple: the function gets a SQL statement, executes it with the SqlCommand class and logs the result with timing and the returned row count. And that's everything required to start bombarding my Azure SQL Database.
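To run the Function App, the Event Hub connection referenced by Connection = "EventHubsConn" and the target database connection string need to be configured as app settings. A sketch of local.settings.json for local runs; the name of the SQL connection setting is my assumption, use whatever name your code actually reads:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "...storage connection string...",
    "EventHubsConn": "...event hubs connection string...",
    "AzureSqlConnectionString": "...azure sql database connection string..."
  }
}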

Evaluating Results

The purpose of this whole exercise was to evaluate whether a provisioned DTU level is enough to withstand a load comparable to the existing production.

So, after I ran the test, I could browse through the DTU usage chart in Azure portal to get overall usage statistics.
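If you prefer raw numbers over charts, resource consumption can also be queried from inside the database via the sys.dm_db_resource_stats view, which keeps roughly the last hour of data at 15-second granularity. A quick sketch of pulling the aggregates from C#; the connection string is assumed to be the same one used by the Replay function:

using (var connection = new SqlConnection(AzureSqlConnectionString))
using (var cmd = new SqlCommand(
    @"SELECT AVG(avg_cpu_percent) AS avg_cpu, MAX(avg_cpu_percent) AS max_cpu,
             AVG(avg_data_io_percent) AS avg_data_io, AVG(avg_log_write_percent) AS avg_log_write
      FROM sys.dm_db_resource_stats
      WHERE end_time > DATEADD(MINUTE, -30, GETUTCDATE())", connection))
{
    connection.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Averages over the last 30 minutes of the load test
            Console.WriteLine($"CPU avg/max: {reader["avg_cpu"]}/{reader["max_cpu"]}, " +
                              $"Data IO avg: {reader["avg_data_io"]}, Log write avg: {reader["avg_log_write"]}");
        }
    }
}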

I've also spent quite some time analyzing the usage breakdown as reported by sp_BlitzCache from the First Responder Kit. Please note that it's not officially supported for Azure SQL Database, but it seems to work reasonably well.

Be sure to re-run your experiments multiple times, at different days and time intervals.

The full code sample can be found in my github.

I hope Azure SQL Database will perform to your expectations and within your budget. But hope is not a good strategy, so go ahead and try it out!

Happy DDoS-ing!

Tic-Tac-Toe with F#, Azure Functions, HATEOAS and Property-Based Testing

This post describes a toy application that I've built with F# and Azure Functions in about 1 day of work. It shows a simple end-to-end implementation with some useful techniques applied, and can be used as a reference point for anyone interested in one of the topics mentioned in the title.

The requirements for my application are quite simple:

  • Implement the game of Tic-Tac-Toe for a human player to play against the computer
  • The field is 3x3; the first player to get three in a row wins
  • After the game, the score is calculated based on the number of moves combined with the duration of the game
  • The history of players' scores is persisted and presented as the leaderboard

Below I go through the code step by step. Feel free to jump to the part which interests you the most: Domain Modelling, Azure Functions, HATEOAS, Property-Based Testing.

The game is online, so you can play it here.

The full source code can be found in my github.

Modeling the Game with Types

I start with a domain model. The model is composed of immutable F# types (records and discriminated unions) and pure functions.

We have two players, so we need a type for them:

type Player = X | O

In addition, there is a useful function to return the other player based on the given one. Simple pattern-matching will do:

module Player =
  let other = function | X -> O | O -> X

The domain code is the most important part of the application, so I want it to be covered by unit tests. Of course, the above function doesn't really warrant testing, but it's a nice and simple way to try out Property-Based Testing. That is, instead of defining specific tests, we define properties which hold for any valid input.

For the other function, I came up with two properties:

  • Other player is not equal to original player
  • Other player of other player is the player itself

Here is the code with Expecto and FsCheck:

testProperty "Other player is not equal to player" <| fun x ->
  Expect.notEqual x (Player.other x) "Other should be different from original"

testProperty "Other player of other player is the player itself" <| fun x ->
  Expect.equal x (Player.other (Player.other x)) "Other other should be equal to original"

Let's move on to modelling the game. I decided to define a union type to be used for horizontal and vertical positions of the cells:

type Position = One | Two | Three

I could use plain integers instead, but I don't want to worry about validating ranges all the time.

My first record type models the move, or action done by a player: it has X and Y positions of the chosen cell, plus the player information:

type Move = {
  X: Position
  Y: Position
  By: Player
}

The following type RunningGame has just two properties, but its shape defines the design of the whole application:

type RunningGame = {
  MovesDone: Move list
  PossibleMoves: Move list
}

This type models any state of the game which is not finished yet.

MovesDone represents the ordered log of all moves, so we have the complete history of actions at any time. Event Sourcing in the small.

Equally importantly, there is a list of all possible moves at this point of the game. I could get away without this property: in the end, it can always be derived from the history of done moves and 3x3 size of the field.

However, having the list of possible moves simplifies the design of all the decision maker (client) code:

  • Clients don't have to search for remaining cells based on move log
  • Validation of a move received from clients gets trivial: just check that it's in the list of possible moves
  • Bot implementation gets easy: it just needs to pick one of the valid moves. The most trivial bot is a one-liner: it picks a random move from the collection
  • Tests take advantage of this in a similar way, see Game Tests below
  • We build a nice bridge into HATEOAS-style API, where links provided in the response correspond to possible moves, see REST API below

Now, we can model a game which is already finished:

type GameOutcome = Won of Player | Tie

type FinishedGame = {
  MovesDone: Move list
  Outcome: GameOutcome
}

Each finished game has a list of moves and the outcome: either one player won, or there was a tie.

Any state of a game can be described as a union of the two types above:

type GameState = 
  | Finished of FinishedGame
  | InProgress of RunningGame

Modeling Game Flow

Now, when all the types are in place, we can model the game flow. The flow is a sequence of transitions between game states, implemented with pure functions.

First, each game starts in the same state: an empty field, with X to move. Here is the value which represents this initial state:

module Game =
  let initialState = 
    let positions = [One; Two; Three]
    let cells = seq { 
      for x in positions do
         for y in positions do
            yield { X = x; Y = y; By = X }
      }
    { MovesDone = []; PossibleMoves = List.ofSeq cells }

After each move is made, we need a function to evaluate the move outcome: whether the current game is finished or still in progress. I defined a function evaluate for that:

let private evaluate (history: Move list): GameOutcome option = ...

I don't show the full body here, since it's quite boring in evaluating rows, columns and diagonals for three-in-a-row. See the full code if you want to.

The following function is even more important: that's the main domain function called makeMove. Its type is RunningGame -> Move -> GameState which perfectly communicates its intent: given a running game and a move, it returns the game state after the move. Note that

  • You can't pass a finished game as an argument, because making a move on finished game doesn't make sense
  • The result can be a finished game

Here is the function implementation:

let makeMove (game: RunningGame) (move: Move): GameState =
  let movesDone = move :: game.MovesDone
  match evaluate movesDone with
  | Some result -> Finished { MovesDone = movesDone; Outcome = result }
  | None ->
    let possibleMoves = 
      List.except [move] game.PossibleMoves
      |> List.map (fun m -> { m with By = Player.other m.By })
    InProgress { MovesDone = movesDone; PossibleMoves = possibleMoves }

It works like this:

  • Prepend the new move to moves done
  • Evaluate the game result of these combined moves
  • If the result is known, return a Finished game with calculated outcome
  • If the result is not clear yet, return an InProgress game whose possible moves are the same as before, minus the move just made, and reassigned to the other player

Tic-Tac-Toe is a two-player game, so I defined another function which runs a turn of 2 moves by 2 players, given the decision-making functions of both players (so it's a higher-order function):

let makeRound player1 player2 gameState =
  let newGameState = player1 gameState |> makeMove gameState
  match newGameState with
  | Finished _ -> newGameState
  | InProgress p -> player2 p |> makeMove p

Looks almost like monadic bind operation...

Property-Based Testing

I've already shown two simplistic properties for other function.

It's a bit more challenging to come up with invariant properties when it comes to testing the game itself. After some brainstorming, I've made the following list:

  • The game is never finished after 4 moves
  • The game is always finished after 9 moves
  • X and O have to make moves in turn
  • Player wins by filling one column
  • Player wins by filling one row
  • Player wins by filling diagonal
  • Evaluate a known tie

Each property-based test should accept some input from the testing framework. It should then evaluate the test against this input and assert the invariants. If the property holds for any possible valid input, the test is green.

I decided to structure my property tests in the following way:

  • Each test accepts a list of non-negative integers
  • Each integer is interpreted as an index of a possible move to select at turn i. That means that the test receives a sequence which uniquely identifies the moves to be made
  • We can restrict this sequence to a scenario under test, e.g. make it less than 4 moves, or exactly 9 moves, or pick moves from a limited subset of all possible moves
  • We apply the moves to calculate the end result
  • We assert that the result confirms the property under test

Now, it's the responsibility of the property-based testing framework to generate all kinds of input lists to try to break our property. If it succeeds, it will print the exact input which causes the test to fail.

Here is how one such test is implemented.

A helper function plays a sequence of indexes as moves:

let playSequence moves = 
  let playOne s i =
    match s with
    | InProgress p -> Game.makeMove p (p.PossibleMoves.[i % p.PossibleMoves.Length])
    | _ -> s
  List.fold playOne (InProgress Game.initialState) moves

Then the property "The game is always finished after 9 moves" is simply:

testProp "The game is always finished after 9 moves" <| fun (Gen.ListOf9 xs) ->
  let result = playSequence xs
  Expect.isTrue (Game.isFinished result) "Game should be finished"

Note the restriction Gen.ListOf9 xs that we put on the input sequence. It's a generator that I defined, so that the list always contains exactly 9 elements.

Other property tests follow a similar pattern, you can see them here.

REST API

Now that I'm done with the Game domain model, I want to define the API that our HTTP service will expose to clients. I designed the API in a REST style.

The main resource is the /game resource. To start a new game, a client sends a POST request:

POST /game
Content-Type: application/json

{ "name": "Mikhail" }

And the response body is JSON describing the newly created game:

{
    "id": "5d7b2261",
    "busyCells": [],
    "links": [
        {
            "rel": "x1y1",
            "href": "/game/5d7b2261/move/0"
        },
        {
            "rel": "x1y2",
            "href": "/game/5d7b2261/move/1"
        },
        // ... 7 more links follow
    ]
}

The response contains a game ID and the list of occupied cells, empty for now, because no moves have been made.

More importantly, it contains a list of links, each one of which has a rel field representing a cell, and a link. The client should POST to this link if it wants to make a move on the corresponding cell:

POST /game/5d7b2261/move/1

And the response is:

{
    "id": "5d7b2261",
    "result": null,
    "busyCells": [
        {
            "name": "x2y3",
            "value": "O"
        },
        {
            "name": "x1y2",
            "value": "X"
        }
    ],
    "links": [
        {
            "rel": "x1y1",
            "href": "/game/5d7b2261/move/0"
        },
        {
            "rel": "x1y3",
            "href": "/game/5d7b2261/move/1"
        },
        // ... 5 more links follow
    ]
}

It has the same structure as before, but now two cells are occupied: one X and one O. The list of links now has only 7 links, based on the count of free cells.

The client keeps navigating the links until it gets non-empty result field:

{
    "id": "5d7b2261",
    "result": "You Win!",
    "busyCells": [
        {
            "name": "x3y1",
            "value": "X"
        },
        {
            "name": "x3y2",
            "value": "O"
        },
        // ... more cells here
    ],
    "links": [],
    "score": 401
}

This denotes the end of the game. There are no more links to navigate, so the client knows it has to stop playing.

This API is designed in HATEOAS-style (Hypermedia as the Engine of Application State). The clients only need to know the initial URL, while all the other URLs are received from the previous responses. It resembles the way a human navigates websites.

Azure Functions

I implemented the above API with Azure Functions. I used .NET Standard based v2 runtime with precompiled F# functions.

The initial POST /game request is handled by Start function:

type GameRequest = { Name: string }

type Cell = { Name: string; Value: string }
type Link = { Rel:  string; Href:  string }

type GameDTO = {
  Id: string
  Result: string
  BusyCells: Cell list
  Links: Link list
  Score: int
}

[<FunctionName("Start")>]
let start([<HttpTrigger(AuthorizationLevel.Anonymous, "POST", Route = "game")>] req: GameRequest,
          [<Table("TicTacToe")>] store: ICollector<GameEntity>) =
  let gameid = Guid.NewGuid().ToString()
  let state = InProgress Game.initialState
  let serializedState = JsonConvert.SerializeObject state
  store.Add(GameEntity(PartitionKey = "default", RowKey = gameid, Name = req.Name, State = serializedState))
  ObjectResult(Api.serialize gameid state 0)

The outline of this function:

  • It's triggered by HTTP POST request, as configured for req parameter
  • The request body is parsed to GameRequest type containing player name
  • It generates a new game ID
  • It creates initial game state of empty field
  • It serializes the state and saves it to Table Storage with store output binding
  • It returns HTTP body with serialized game response of type GameDTO

The second Function Play handles the moves:

[<FunctionName("Play")>]
let play([<HttpTrigger(AuthorizationLevel.Anonymous, "POST", Route = "game/{gameid}/move/{index}")>] 
         req: HttpRequest, gameid: string, index: int,
         [<Table("TicTacToe", "default", "{gameid}")>] entity: GameEntity) =
  let state = JsonConvert.DeserializeObject<GameState> entity.State
  match state with
  | Finished _ -> BadRequestResult() :> IActionResult
  | InProgress p when index < 0 || index >= p.PossibleMoves.Length -> BadRequestResult() :> IActionResult
  | InProgress p -> 
    let result = Game.makeRound (fun _ -> p.PossibleMoves.[index]) Bot.pickMove p
    entity.State <- JsonConvert.SerializeObject result
    entity.Score <- Scoring.calculateScore (DateTime.UtcNow - entity.StartedAt).TotalMilliseconds result
    ObjectResult(Api.serialize gameid result entity.Score) :> IActionResult

The outline is very similar:

  • It's triggered by a POST request with a URL template containing game ID and move index
  • It has an in/out Table Storage binding which reads the serialized state saved after previous game and move requests
  • It validates the state: if the game is already finished, or if the move index is not in the valid range, Bad Request HTTP status is returned
  • If the move is valid, it runs the round, including bot play
  • It also calculates the score, which is going to be non-zero only for finished games
  • The game state and the score are saved to Table Storage entity (that's the only mutation in the whole application)

Bot

The Azure Function above uses a Bot.pickMove function which I haven't described yet.

This function has the type RunningGame -> Move, exactly what is expected by the makeRound function. Its goal is to pick the O move for any given game-in-progress.

Obviously, 3x3 Tic-Tac-Toe is a very simple game and it's quite easy to make a perfectly playing bot. This wasn't the goal though: it's more fun for a human to win.

So, actually, the only property test that I ended up implementing is the following:

testProp "Bot is able to play O at any possible position" <| fun (Gen.ListOfNonNegative xs) ->
  let human i p _ = p.PossibleMoves.[i % p.PossibleMoves.Length]
  let round s i =
    match s with
    | InProgress p -> Game.makeRound (human i p) Bot.pickMove p
    | _ -> s
  List.fold round (InProgress Game.initialState) xs |> ignore

It makes sure that for any possible sequence of human moves, the bot is actually able to make a move of its own. The bot just shouldn't crash :)

My very first implementation of the bot was just picking a random move. Such bot is fine, but it's too boring to play against.

So, my current bot implementation has 3 rules:

  • If there is a move that immediately wins the game, do that move
  • If possible, don't pick a move which leads to immediate loss after the next human move
  • Otherwise, pick a random move

I implemented the bot using the approach described in my Functional Fold as Decision Tree Pattern post:

let pickMove (game: RunningGame) = 
  [winNow O; notLoseNow; pickRandom]
  |> Seq.ofList
  |> Seq.choose (fun x -> x game)
  |> Seq.head

So, there is a prioritized list of decision functions. The first one to return Some decision is promoted to the final decision.

And here are those functions:

let winNow player (game: RunningGame) =
  let isWin = function | Finished { Outcome = Won x } when x = player -> true | _ -> false
  game.PossibleMoves
  |> List.tryFind (fun move -> Game.makeMove game move |> isWin)

let notLoseNow (game: RunningGame) =
  let canLose = function 
    | InProgress p -> match winNow X p with | Some _ -> true | None -> false
    | _ -> false
  let notLosingMoves =
    game.PossibleMoves
    |> List.filter (fun move -> Game.makeMove game move |> canLose |> not)
  if List.isEmpty notLosingMoves && notLosingMoves.Length < game.PossibleMoves.Length then None
  else Some (notLosingMoves.[random.Next notLosingMoves.Length])

let pickRandom (game: RunningGame) = 
  Some (game.PossibleMoves.[random.Next game.PossibleMoves.Length])

Such a rule-based setup is easy to extend, and also easy to test when the need arises.

Score Calculation

The last twist to the application is the scoring system. If a human player wins or gets a tie, they are assigned a numeric score, which can then be compared to the historic leaderboard of all players.

The score is calculated based on two principles: the fewer moves you make and the faster you play, the higher the score. Move count is more important than timing.

These principles are nicely expressed as property tests:

testProp "The score of faster game is not lower than slower game" 
  <| fun (Gen.Positive duration1) (Gen.Positive duration2) game ->
  let (slower, faster) = maxmin id duration1 duration2
  let scoreFaster = Scoring.calculateScore faster game
  let scoreSlower = Scoring.calculateScore slower game
  Expect.isGreaterThanOrEqual scoreFaster scoreSlower "Bigger duration has lower score (or same)"

testProp "The score of won game in less moves is greater than game with more moves" 
  <| fun (Gen.Positive duration1) (Gen.Positive duration2) game1 game2 ->
  let (slower, faster) = maxmin id duration1 duration2
  let (moreMoves, lessMoves) = maxmin List.length game1 game2
  let score1 = Scoring.calculateScore slower (Finished { Outcome = Won X; MovesDone = lessMoves })
  let score2 = Scoring.calculateScore faster (Finished { Outcome = Won X; MovesDone = moreMoves })
  if moreMoves.Length = lessMoves.Length then
    Expect.isGreaterThanOrEqual score1 score2 "Bigger duration has lower score (or same)"
  else
    Expect.isGreaterThan score1 score2 "More moves have lower score"

Note that tests are parameterized for durations and game states. We don't have to come up with specific scenarios: the framework should take care of those.

One of the possible implementations for scoring is:

let calculateScore duration (state: GameState) =
  let durationScore = (100.0 * (1.0 - duration / (duration + 10000.0))) |> int
  match state with
  | Finished { Outcome = Won X; MovesDone = ms } -> (11 - ms.Length) * 100 + durationScore
  | Finished { Outcome = Tie } -> durationScore
  | _ -> 0

Now, the leaderboard piece. You've already seen the bits of this functionality in Azure Functions: they store the game state into Azure Table Storage.

There is another Azure Function which handles GET requests to /leaderboard resource. It loads all the past games from Table Storage, and then passes them to leaderboard calculation function below:

let calculateLeaderboard top ns =
  ns
  |> Seq.filter (fun entity -> snd entity > 0 && not (String.IsNullOrEmpty (fst entity)))
  |> Seq.sortByDescending snd
  |> Seq.truncate top
  |> Seq.mapi (fun index entity -> { Index = index + 1; Name = fst entity; Score = snd entity })

Wrapping Up

Ok, the application is simple, but the blog post ended up being quite long. Thank you if you made it this far.

I touched on several important concepts and tools, which can be useful on their own or in combination.

Please leave a comment if this kind of article is useful, and let me know which part you found most inspiring or boring.

The full source code can be found in my github.

Happy coding!

Azure Functions Get More Scalable and Elastic

Back in August this year, I posted Azure Functions: Are They Really Infinitely Scalable and Elastic? with two experiments about Azure Function App auto scaling. I ran a simple CPU-bound function based on Bcrypt hashing, and measured how well Azure ran my Function under load.

The results were rather pessimistic. Functions were scaling out to many instances, but there were significant delays in doing so, and processing fell behind by up to 40 minutes.

The Azure Functions team notified me that they had rolled out an updated version of the service, which should significantly improve my results.

So I ran the exact same tests again; the new results are shown below.

TL;DR. Scaling responsiveness improved significantly. The maximum delay went down from 40 to 6 minutes. There is still room for improvement: sub-minute latency is not yet achievable for similar scenarios.

Setup

See the Function code and the description of the two experiments in my previous article.

Experiment 1: Steady Load

In the "Steady Load" scenario, 100,000 messages were sent to the queue at a constant pace, evenly spread over 2 hours.

Here are the old metrics of Queue Backlog and Instance Count over time:

Function App Scaling (Old)

Old charts are shown on a gray background

You can see a huge delay of almost one hour before the function caught up with the rate of incoming messages, and another half hour before the backlog was cleared.

The new results on the same chart after the runtime update:

Function App Scaling (New)

This looks much better. The maximum backlog is 7 times lower; there's almost no initial delay before the auto scaling kicks in; and overall instance allocation is much more stable.


One more chart is from the same experiment, but it shows slightly different metrics. The old results of Delay (Age) in seconds and Processing Rate in messages per minute:

Function App Delay

The new chart after the runtime update:

Function App Delay

Again, much less delay overall, and processing rate more-or-less stabilizes after the first 15 minutes.

Experiment 2: Spiky Load

The second experiment spanned 5 hours. The messages were sent mostly at a low-ish fixed rate, except for 5 periods of sudden spikes. The green line on the charts below shows these spikes very well.

During the first run 4 months ago, the Functions runtime had trouble keeping up even between those bursts of messages.

Here is the chart of the old spiky load processing:

Spiky Load Processing (Old)

You can see that the backlog after each spike goes down really slowly. The blue processing-rate line hardly ever matches the green line, which reveals the struggle to adapt.

The new results of the same chart after the runtime update are quite different:

Spiky Load Processing (New)

Notice how the backlog stays empty and the blue processing rate matches the incoming rate exactly at all times except right after the traffic bursts. The queue grows during each spike, but the processing rate immediately accelerates too, and the crisis is over within 15 minutes.

Conclusions

The Azure Functions team is clearly working on improvements. While the results in August were puzzling or even embarrassing, the December benchmark makes much more sense.

It looks like Azure Functions are now suitable for CPU-intensive data processing scenarios with fluctuating load, with a target maximum delay of several minutes.

Obviously, the results are not perfect just yet. Here's what still can be done better:

  • Scale faster initially. In the first experiment, the biggest delay appeared right after the start, when the backlog was growing linearly for 10 minutes. "0 to 100" might not be a very realistic scenario, but probably that's how many folks will test Functions against their workloads.

  • Do not scale down so fast after the backlog goes to 0. Every time the queue backlog goes to 0, the runtime kills most of the instances almost immediately. During my runs, this caused the queue to grow again without a good reason from the user's perspective.

  • Do not allow the backlog to grow without message spikes. Related to the previous item, but with a slightly different focus. When the load is stable, I would expect the runtime to keep my queue as close to empty as possible. I guess Azure tries to minimize the resources it consumes behind the scenes, but this should be balanced in favor of the user experience.

  • Make scaling algorithms more open. It's a black box right now. I would love to see some documentation, if not code, to be published about what exactly to expect from Consumption Plan auto scaling.

I'll be running more scaling experiments with other types of workloads in the near future, so... more benchmarks are coming.

Happy scaling!

Precompiled Azure Functions in F#

This post kicks off the F# Advent Calendar in English 2017. Please follow the calendar for all the great posts to come.

Azure Functions is a "serverless" cloud offering from Microsoft. It allows you to run your custom code in response to events in the cloud. Functions are very easy to start with, and you only pay per execution - with a free allowance sufficient for any proof of concept, hobby project or even low-usage production load. And when you need more, Azure will scale your project up automatically.

F# is one of the officially supported languages for Azure Functions. Originally, F# support started with F# Script files (authored directly in Azure portal or copied from local editor), so you can find many articles online to get started, e.g. Creating an Azure Function in F# from the ground up and Part 2 by Mathias Brandewinder.

However, I find the script-based model a bit limited. In today's article I will focus on creating Azure Functions as precompiled .NET libraries. Along the way, I'll use cross-platform tools like .NET Core and VS Code, and I'll show how to integrate Functions with popular tools like Suave and Paket.

Create a Project

You can follow this walkthrough on Windows or Mac, just make sure that you have .NET Core 2 and Node.js 8.x with npm installed. My editor of choice is Visual Studio Code with Ionide plugin.

I'll show you how to create a new F# Function App from scratch. If you want to jump to runnable project, you can get it from my github.

We start with creating a new F# library project for .NET Standard 2. Run in your command line:

dotnet new classlib --language F# --name HelloFunctions

This command creates a folder with two files: HelloFunctions.fsproj project file and Library.fs source code file.

Now, add a reference to Azure Functions NuGet package:

dotnet add package Microsoft.NET.Sdk.Functions

Define a Function

Open Library.fs code file and change it to the following code:

namespace HelloFunctions

open System
open Microsoft.Azure.WebJobs
open Microsoft.Azure.WebJobs.Host

module Say =
  let private daysUntil (d: DateTime) =
    (d - DateTime.Now).TotalDays |> int

  let hello (timer: TimerInfo, log: TraceWriter) =
    let christmas = new DateTime(2017, 12, 25)

    daysUntil christmas
    |> sprintf "%d days until Christmas"
    |> log.Info

We defined a function hello which will be triggered by the Functions runtime on a timer. Every time the function is called, we log how many days we still have to wait before Christmas 2017.

To convert this simple F# function into an Azure Function, create a folder called Hello (or choose any other name) next to the project file and add a function.json file in there:

{
  "bindings": [
    {
      "name": "timer",
      "type": "timerTrigger",
      "schedule": "0 * * * * *"
    }
  ],
  "scriptFile": "../bin/HelloFunctions.dll",
  "entryPoint": "HelloFunctions.Say.hello"
}

We defined that:

  • Our function is triggered by timer
  • It runs every minute at 0 seconds
  • The entry point is our hello function in the compiled assembly

Prepare Local Runtime

A couple more configuration files are needed to run the Function App locally. host.json defines hosting parameters; an empty file will do for now:

{
}

Most triggers need to connect to a Storage Account. For example, the timer trigger uses it to hold leases which determine which running instance will actually execute the action every minute. Copy a connection string to your Storage Account (the local Storage emulator is fine too) and put it into the local.settings.json file:

{
  "IsEncrypted": false,
  "Values": {
    "AzureWebJobsStorage": "...your connection string..."
  }
}

Note that this file is only used for local development and is not published to Azure by default.

Finally, we need to modify fsproj file to make the build tool copy those files into bin folder. Add the following section in there:

<ItemGroup>
  <Content Include="Hello\function.json">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
  <Content Include="host.json">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
  <Content Include="local.settings.json">
    <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
  </Content>
</ItemGroup>

Run App Locally

The first step is to build and publish our Function App with dotnet commands:

dotnet build
dotnet publish

The first line produces the dll file and the second line copies it and all of its dependencies to publish folder.

The nice thing about Azure Functions is that you can easily run them locally on a development machine. Execute the following command to install the runtime and all the required libraries:

npm install -g azure-functions-core-tools@core

This adds the func CLI to your system, which is the tool to use for all Functions-related operations.

Navigate to the bin\Debug\netstandard2.0\publish folder and run func start from there. You should see that your app is now running, and your timer function is scheduled for execution:

Function App Start

Once the next minute comes, the timer will trigger and you will see messages in the log:

Timer Trigger Working

Integrate into VS Code

You are free to use full Visual Studio or any editor to develop Function Apps in F#. I've been mostly using VS Code for this purpose, and I believe it's quite popular among F# community.

If you use VS Code, be sure to set up tasks that you can use from within the editor. I usually have at least 3 tasks: "build" (dotnet build), "publish" (dotnet publish) and "run" (func start --script-root bin\\debug\\netstandard2.0\\publish), with shortcuts configured for all of them.

You can find an example of tasks.json file here.
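For reference, a stripped-down tasks.json along these lines might look like the sketch below (labels and paths are just examples):

{
  "version": "2.0.0",
  "tasks": [
    { "label": "build", "type": "shell", "command": "dotnet build" },
    { "label": "publish", "type": "shell", "command": "dotnet publish" },
    { "label": "run", "type": "shell", "command": "func start --script-root bin\\debug\\netstandard2.0\\publish" }
  ]
}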

Also, check out Azure Functions Extension.

Deploy to Azure

You can deploy the exact same application binaries to Azure. Start by creating an empty Function App in the portal, or via Azure CLI (func CLI does not support that).

Then run the following command to deploy your precompiled function to this app:

func azure functionapp publish <FunctionAppName>

At the first run, it will verify your Azure credentials.

In real-life production scenarios your workflow is probably going to be similar to this:

  • Change Function App code
  • Run it locally to test the change
  • Push the code changes to the source control repository
  • Have your CI/CD pipeline build it, run the tests and then push the binaries to Azure Functions environment

HTTP Trigger

Timer-triggered functions are useful, but that's just one limited use case. Several other event types can trigger Azure Functions, and for all of them you can create precompiled functions and run them locally.

The most ubiquitous trigger for any serverless app is probably HTTP. So, for the rest of the article I will focus on several approaches to implementing HTTP functions. Nonetheless, the same technique can be applied to other triggers too.

F# code for the simplest HTTP Function can look like this:

namespace PrecompiledApp

open Microsoft.AspNetCore.Mvc
open Microsoft.AspNetCore.Http
open Microsoft.Azure.WebJobs.Host

module PrecompiledHttp =

  let run(req: HttpRequest, log: TraceWriter) =
    log.Info("F# HTTP trigger function processed a request.")
    ContentResult(Content = "HO HO HO Merry Christmas", ContentType = "text/html")

You can find a full example of HTTP Function App here.

This code is using ASP.NET Core classes for request and response. It's still just an F# function, so we need to bind it to a trigger in function.json:

{
  "bindings": [
    {
      "type": "httpTrigger",
      "methods": ["get"],
      "authLevel": "anonymous",
      "name": "req",
      "route": "hellosanta"
    }
  ],
  "scriptFile": "../bin/PrecompiledApp.dll",
  "entryPoint": "PrecompiledApp.PrecompiledHttp.run"
}

If you run the app, the function will be hosted at localhost

HTTP Trigger Working

And a request to http://localhost:7071/api/hellosanta will be answered with our "HO HO HO" message.

This function is of "Hello World" level, but the fact that it's inside a normal F# library gives you lots of power.

Let's look at some examples of how to use it.

Suave Function

What can we do to enhance the developer experience? We can use our favourite F# libraries.

Suave is one of the most popular libraries for implementing Web APIs. And we can use it in Azure Functions too!

Let's first make a small tweak to the HTTP trigger definition in function.json:

"bindings": [
  {
    "type": "httpTrigger",
    "methods": ["get"],
    "authLevel": "anonymous",
    "name": "req",
    "route": "{*anything}"
  }
],

Binding now defines a wildcard route to redirect all requests to this function. That's because we want Suave to take care of routing for us.

The definition of such routing will look familiar to all Suave users:

module App =
  open Suave
  open Suave.Successful
  open Suave.Operators
  open Suave.Filters

  let app = 
    GET >=> choose
      [ path "/api/what" >=> OK "Every time we love, every time we give, it's Christmas."
        path "/api/when" >=> OK "Christmas isn't a season. It's a feeling."
        path "/api/how" >=> OK "For it is in giving that we receive." ]

Azure Function is just a one-liner wiring Suave app into the pipeline:

module Http =
  open Suave.Azure.Functions.Context

  let run req =
    req |> runWebPart App.app  |> Async.StartAsTask

The heavy lifting is done by runWebPart function, which is a utility function defined in the same application. You can see the full code of this wiring in my repo.

Run the application and request the URL http://localhost:7071/api/what to see the function in action.

This example is very simple, but you can do lots of powerful stuff with Suave! Most probably, you shouldn't go overboard and try to fit a whole multi-resource REST API into a single Azure Function. But it might still make sense to keep related HTTP calls together, and Suave can help keep that cleaner.

Managing Dependencies with Paket

Once your Function App becomes bigger and you start using multiple F# projects, it makes sense to switch to the Paket package manager.

It is totally possible to use Paket with Azure Functions. There isn't much specific to Azure Functions, really. Here is an example of a paket.dependencies file

source https://www.nuget.org/api/v2

framework: >= netstandard2.0
nuget FSharp.Core
nuget Microsoft.NET.Sdk.Functions
nuget Microsoft.AspNetCore.Mvc.Core

that I used in an example which demonstrates the Paket + Functions combination.
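Each project then lists its packages in a paket.references file placed next to the fsproj; a minimal sketch for this app would be:

FSharp.Core
Microsoft.NET.Sdk.Functions
Microsoft.AspNetCore.Mvc.Core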

Attribute-Based Functions

Up until now, we were writing function.json files manually for each function. This is not too tedious, but it is error-prone. Microsoft offers an alternative programming model where these files are auto-generated by the Functions SDK.

This programming model is based on attributes, which are similar to WebJobs SDK attributes. With this approach, there's no function.json file in the project. Instead, the function declaration is decorated with attributes:

[<FunctionName("AttributeBased")>]
let run([<HttpTrigger>] req: HttpRequest, log: TraceWriter)

The same development flow still works. Once you run dotnet build, a new function.json file will be generated and placed into bin folder. Functions runtime will be able to use it to run the function as usual.

Note that the generated file looks a bit different from the manual equivalent:

  1. It manifests itself with

     "generatedBy": "Microsoft.NET.Sdk.Functions.Generator-1.0.6",
     "configurationSource": "attributes",
    
  2. In case you use input and output bindings, you won't be able to see them in the generated file. Only trigger will be visible in json. Don't worry, input and output bindings will still work.

You can find an example of HTTP function with attributes here.

There are pros and cons to this model. Obviously, not having to write JSON files manually is beneficial. Some people find the binding attributes really ugly though, especially when you have 3 or 4 bindings and each has multiple parameters.

My preference is to use attributes, but not to mix attribute decoration with real code: keep the Function's body to a simple one-liner and delegate the call to a properly defined F# function with the actual domain logic.

Wrapping Up

Lots of F# users value the language for how quickly one can become productive with it, thanks to its concise syntax, powerful libraries and tools like FSI.

In my opinion, Azure Functions fit nicely into that picture. It takes just a few minutes to run your first Function App on a developer machine, and then seamlessly transfer it to the cloud.

I've prepared a github repository where you can find more Examples of Azure Functions implemented in F#.

Merry Serverless Functional Christmas!

Azure F#unctions Talk at FSharping Meetup in Prague

On November 8th 2017 I gave a talk about developing Azure Functions in F# at FSharping meetup in Prague.

I really enjoyed giving this talk: the audience was great and asked awesome questions. One more proof that the F# community is so welcoming and energizing!

All the demos of that session can be found in my github repository.

The slides were only a small portion of my talk, but you can see them below anyway.

Link to full-screen HTML slides: Azure F#unctions

Slides on SlideShare:

Thanks for attending my talk! Feel free to post any feedback in the comments.
