Azure Functions: Cold Starts in Numbers

Auto-provisioning and auto-scaling are the killer features of Function-as-a-Service cloud offerings, and of Azure Functions in particular.

One drawback of such dynamic provisioning is a phenomenon called "Cold Start". Basically, applications that haven't been used for a while take longer to start up and to handle the first request.

The problem is nicely described in Understanding Serverless Cold Start, so I won't repeat it here. I'll just copy a picture from that article:

Cold Start

Based on the 4 actions which happen during a cold start, we can guess that the following factors might affect the cold start duration:

  • Language / execution runtime
  • Azure Functions runtime version
  • Application size including dependencies

I ran several sample functions and tried to analyze the impact of these factors on cold start time.

Methodology

All tests were run against HTTP Functions, because that's where cold start matters the most.

All the functions simply returned "Hello, World", taking the "World" value from the query string. Some functions also loaded extra dependencies; see below.
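
For reference, here is a minimal sketch of such a function in F# (v1 script flavour). The exact code differed slightly per language and hosting model, so treat it as an illustration only:

open System.Net
open System.Net.Http

let Run (req: HttpRequestMessage) =
  // read the "name" query parameter, falling back to "World"
  let name =
    req.GetQueryNameValuePairs()
    |> Seq.tryFind (fun kv -> kv.Key = "name")
    |> Option.map (fun kv -> kv.Value)
    |> Option.defaultValue "World"
  req.CreateResponse(HttpStatusCode.OK, sprintf "Hello, %s" name)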

I did not rely on the execution time reported by Azure. Instead, I measured the end-to-end duration from the client's perspective. All calls were made from within the same Azure region, so network latency should have minimal impact:

Test Setup
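
Conceptually, the client-side measurement is just a stopwatch around an HTTP call. Here is a rough F# sketch of that idea (scheduling, retries and logging are omitted):

open System.Diagnostics
open System.Net.Http

let measureOnce (client: HttpClient) (url: string) = async {
  let watch = Stopwatch.StartNew()
  let! _ = client.GetStringAsync url |> Async.AwaitTask
  watch.Stop()
  // end-to-end duration as seen by the caller, including any cold start
  return watch.ElapsedMilliseconds
}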

When Does Cold Start Happen?

Obviously, cold start happens when the very first request comes in. After that request is processed, the instance is kept alive in case subsequent requests arrive. But for how long?

The following chart gives the answer. It shows values of normalized request durations across different languages and runtime versions (Y axis) depending on the time since the previous request in minutes (X axis):

Cold Start Threshold

Clearly, an idle instance lives for 20 minutes and then gets recycled. All requests arriving after that 20-minute threshold hit another cold start.

How Do Languages Compare?

I'll start with version 1 of Functions runtime, which is the production-ready GA version as of today.

I've written a Hello World HTTP function in each of the GA languages: C#, F# and Javascript, and I added Python for comparison. C# and F# were executed both as scripts and as a precompiled .NET assembly.

The following chart gives some intuition about the cold start duration per language. The languages are ordered by mean response time, from lowest to highest. 65% of request durations fall inside the vertical bar (1-sigma interval) and 95% fall inside the vertical line (2-sigma):

Cold Start V1 per Language

Somewhat surprisingly, precompiled .NET is exactly on par with Javascript. Javascript "Hello World" is really lightweight, so I expected it to win, but I was wrong.

C# Script is slower but still somewhat comparable. F# Script was a really unpleasant surprise though: it's much slower. It's even slower than the experimental Python support, where no performance optimization would be expected at all!

Functions Runtime: V1 vs V2

Version 2 of the Functions runtime is currently in preview and not suitable for production load. That probably means not much performance optimization has been done yet, especially from a cold start standpoint.

Can we see this on the chart? We sure can:

Cold Start V1 vs V2

V2 is massively slower. The fastest cold starts are around 6 seconds, but the slowest go up to 40-50 seconds.

Javascript is again on par with precompiled .NET.

Java is noticeably slower, even though the deployment package is just 33kB, so I assume I didn't bloat it.

Does Size Matter?

OK, enough of Hello World. A real-life function is likely to be heavier, mainly because it depends on third-party libraries.

To simulate such a scenario, I measured cold starts for a .NET function with references to Entity Framework, Automapper, Polly and Serilog.

For Javascript I did the same, but referenced Bluebird, lodash and the AWS SDK.

Here are the results:

Cold Start Dependencies

As expected, the dependencies slow loading down. Keep your Functions lean, otherwise you will pay in seconds for every cold start.

An important note for Javascript developers: the above numbers are for Functions deployed after running the Funcpack preprocessor. The package contained a single js file with the Webpack-ed dependency tree. Without that, the mean cold start time of the same function is 20 seconds!

Conclusions

Here are some lessons learned from all the experiments above:

  • Be prepared for 1-3 second cold starts even for the smallest Functions
  • Stay on V1 of the runtime until V2 goes GA, unless you don't care about perf
  • Precompiled .NET and Javascript Functions have roughly the same cold start time
  • Minimize the number of dependencies, and only bring what's needed

Do you see anything weird or unexpected in my results? Do you need me to dig deeper on other aspects? Please leave a comment below or ping me on twitter, and let's sort it all out.

There is a follow-up post available: Cold Starts Beyond First Request in Azure Functions

Awesome F# Exchange 2018

I'm writing this post on the train to London Stansted, on my way back from the F# Exchange 2018 conference.

F# Exchange is a yearly conference taking place in London, and the 2018 edition was the first one for me personally. I also had the honour to speak there about creating Azure Functions with F#.

Impression

F# is still a relatively niche language, so the conference is not overcrowded, but that gives it a special feeling of a family gathering. There were 162 participants this year, and I have the impression that every one of them is extremely friendly, enthusiastic and just plain awesome.

The conference itself had 2 tracks of 45-minute talks and 60-minute keynotes. Most talks were of high quality, with topics ranging from compiler internals to fun applications like music generation, car racing and map drawing.

Both Don Syme, the creator of F#, and Philip Carter, the F# program manager, were there and gave keynotes, but they were careful not to draw too much attention to Microsoft and to let the community speak loudly.

Corridor Track

But the talks were just a part of the story. For me, the conference started the evening before the first day at the speakers' drinks party, and only finished at 1 a.m. after the second day (the pubs in London are lovely).

I spoke to so many great people, I learnt a lot, and had fun too. I've never seen so many F# folks at the same place, and I guess there must be something about F# which attracts the right kind of people to it.

And of course it's so much fun to meet face-to-face all those twitter, slack, github and Channel 9 personas and to see that they are actually real people :)

My Talk

The talk I gave was called "Azure F#unctions". It was not a hard-core F# talk, but people seemed to be genuinely interested in the topic.

A decent number of attendees are already familiar with Azure Functions, and many either run them in production or plan to do so.

The reference version conflict problem is very well known and raises a lot of questions and concerns. It even leads to workarounds like transpiling F# Functions to Javascript with Fable. Yikes.

Durable Functions seem to be sparking a lot of initial interest. I'll definitely be spending more time playing with them, and maybe making the F# story smoother.

Functions were mentioned in Philip's keynote as one of the important application areas for F#, which is cool. We should spend some extra effort to make the documentation and onboarding story as smooth as possible.

Call to Action

Skills Matter is the company behind the conference. Carla, Nicole and others did a great job preparing the event; everything was smooth, informal and fun.

The videos are already online at Skillscasts (requires free signup).

F# Exchange 2019 super early bird tickets are on sale now until Monday April 9, so go get one and join F# Exchange in London next year!

I'm already missing you all.

Azure Durable Functions in F#

Azure Functions are designed for stateless, fast-to-execute, simple actions. Typically, they are triggered by an HTTP call or a queue message, then they read something from the storage or database and return the result to the caller or send it to another queue. All within several seconds at most.

However, there exists a preview of Durable Functions, an extension that lets you write stateful functions for long-running workflows. Here is a picture of one possible workflow from the docs:

Fan-out Fan-in Workflow

Such workflows might take an arbitrarily long time to complete. Instead of blocking and waiting for that whole period, Durable Functions use a combination of Storage Queues and Tables to do all the work asynchronously.

The code still reads like one continuous flow because it's programmed as a single orchestrator function. So, it's easier for a human to reason about the functionality without the complexities of low-level communication.

I won't describe Durable Functions any further; just go read the documentation, it's nice and clean.

Language Support

As of February 2018, Durable Functions are still in preview. That also means that language support is limited:

Currently C# is the only supported language for Durable Functions. This includes orchestrator functions and activity functions. In the future, we will add support for all languages that Azure Functions supports.

I was a bit disappointed that F# is not an option. But actually, since Durable Functions support the precompiled .NET assembly model, pretty much anything doable in C# can be done in F# too.

The goal of this post is to show that you can write Durable Functions in F#. I used a precompiled .NET Standard 2.0 F# Function App running on the 2.0 preview runtime.

Orchestration Functions

The stateful workflows are Azure Functions with a special OrchestrationTrigger. Since they are asynchronous, C# code is always based on Task and async-await. Here is a simple example of an orchestrator in C#:

public static async Task<List<string>> Run([OrchestrationTrigger] DurableOrchestrationContext context)
{
    var outputs = new List<string>();

    outputs.Add(await context.CallActivityAsync<string>("E1_SayHello", "Tokyo"));
    outputs.Add(await context.CallActivityAsync<string>("E1_SayHello", "Seattle"));
    outputs.Add(await context.CallActivityAsync<string>("E1_SayHello", "London"));

    // returns ["Hello Tokyo!", "Hello Seattle!", "Hello London!"]
    return outputs;
}

F# has its own preferred way of writing asynchronous code, based on the async computation expression. A direct translation could look something like this:

let Run([<OrchestrationTrigger>] context: DurableOrchestrationContext) = async {
  let! hello1 = context.CallActivityAsync<string>("E1_SayHello", "Tokyo")   |> Async.AwaitTask
  let! hello2 = context.CallActivityAsync<string>("E1_SayHello", "Seattle") |> Async.AwaitTask
  let! hello3 = context.CallActivityAsync<string>("E1_SayHello", "London")  |> Async.AwaitTask
  return [hello1; hello2; hello3]
} |> Async.StartAsTask   

That would work for a normal HTTP trigger, but it blows up for the Orchestrator trigger because multi-threading operations are not allowed:

Orchestrator code must never initiate any async operation except by using the DurableOrchestrationContext API. The Durable Task Framework executes orchestrator code on a single thread and cannot interact with any other threads that could be scheduled by other async APIs.

To solve this issue, we need to keep working with Task directly. That is not very handy with the standard F# library, so I pulled in an extra NuGet package, TaskBuilder.fs, which provides a task computation expression.

The above function now looks very simple:

let Run([<OrchestrationTrigger>] context: DurableOrchestrationContext) = task {
  let! hello1 = context.CallActivityAsync<string>("E1_SayHello", "Tokyo")
  let! hello2 = context.CallActivityAsync<string>("E1_SayHello", "Seattle")
  let! hello3 = context.CallActivityAsync<string>("E1_SayHello", "London")
  return [hello1; hello2; hello3]
}       

And the best part is that it works just fine.

The SayHello function is based on the Activity trigger, and no special effort is required to implement it in F#:

[<FunctionName("E1_SayHello")>]
let SayHello([<ActivityTrigger>] name) =
  sprintf "Hello %s!" name

More Examples

The Durable Functions repository comes with a set of 4 samples implemented in C#. I took all of those samples and ported them over to F#.

You've already seen the first Hello Sequence sample above: the orchestrator calls the activity function 3 times and combines the results. As simple as it looks, the function will actually run 3 times for each execution, saving its state before each subsequent call.

The second Backup Site Content sample uses this persistence mechanism to run a potentially slow workflow of copying all files from a given directory to a backup location. It shows how multiple activities can be executed in parallel:

let tasks = Array.map (fun f -> backupContext.CallActivityAsync<int64>("E2_CopyFileToBlob", f)) files
let! results = Task.WhenAll tasks
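
For context, here is a hedged sketch of how the whole fan-out / fan-in orchestrator might be wired up with the task computation expression (the activity names come from the original C# sample; see the repository for the actual port):

open System.Threading.Tasks

[<FunctionName("E2_BackupSiteContent")>]
let BackupSiteContent([<OrchestrationTrigger>] backupContext: DurableOrchestrationContext) = task {
  let rootDirectory = backupContext.GetInput<string>()
  let! files = backupContext.CallActivityAsync<string[]>("E2_GetFileList", rootDirectory)
  // fan out: schedule one copy activity per file, then fan in with Task.WhenAll
  let tasks = Array.map (fun f -> backupContext.CallActivityAsync<int64>("E2_CopyFileToBlob", f)) files
  let! results = Task.WhenAll tasks
  // total number of bytes copied by all activities
  return Array.sum results
}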

The third Counter example demonstrates a potentially infinite actor-like workflow, where state can exist and evolve for an indefinite period of time. The key API calls are made on the orchestration context:

let counterState = counterContext.GetInput<int>()
let! command = counterContext.WaitForExternalEvent<string>("operation")
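
As a rough sketch (my guess at the shape; the actual F# port is in the repository), the counter orchestrator waits for an event, updates its state and then restarts itself via ContinueAsNew:

[<FunctionName("E3_Counter")>]
let Counter([<OrchestrationTrigger>] counterContext: DurableOrchestrationContext) = task {
  let counterState = counterContext.GetInput<int>()
  let! command = counterContext.WaitForExternalEvent<string>("operation")
  // hypothetical command names, for illustration only
  let newState =
    match command with
    | "incr" -> counterState + 1
    | "decr" -> counterState - 1
    | _ -> counterState
  // keep the actor alive: restart the orchestration with the updated state
  if command <> "end" then counterContext.ContinueAsNew(box newState)
  return newState
}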

The final, more elaborate Phone Verification workflow has several twists, like an output binding for an activity (ICollector is required instead of C#'s out parameter), third-party integration (Twilio to send SMS messages), a recursive sub-function to loop through several attempts, and context-based timers for a reliable timeout implementation.

So, if you happen to be an F# fan, you can still give Durable Functions a try. Be sure to leave your feedback, so that the library can get even better before going GA.

Load Testing Azure SQL Database by Copying Traffic from Production SQL Server

Azure SQL Database is a managed service that provides low-maintenance SQL Server instances in the cloud. You don't have to run and update VMs, or even take backups and set up failover clusters. Microsoft does the administration for you; you just pay an hourly fee.

So, let's say you decide this value proposition is a good reason to migrate away from your existing self-hosted SQL Server database running in production and replace it with Azure SQL Database.

You do the functional testing and eventually everything works like a charm. The next set of questions relates to the database performance level:

  • Which tier / how many DTUs should I provision?
  • How much will it cost?
  • Will it be able to handle my current production load?

DTUs

Even if you collect all the specs of the hardware behind your existing SQL Server, you can't directly use that knowledge to choose the right Azure SQL Database size.

The sizes are measured in Database Transaction Units (DTUs). These are abstract units of measure which don't necessarily mean much on their own. Within a given tier (Standard / Premium), doubling the DTU amount will double the max throughput.

That doesn't really help to plan for workload migrations.

There are some ways to estimate the DTU requirements by measuring metrics like CPU and IOPS on your existing server. Have a look at DTU Calculator: it consists of a data collector and an online converter from metric values to DTUs.

While useful as a first approximation, I'm reluctant to provision Azure SQL Database size solely based on such estimates.

My answer to the problem is: Measure It!

Synthetic Tests

Go get a backup of your existing production database and Export / Import it into Azure SQL Database. Pick the size based on your gut feel, run a load test, evaluate the results, adjust the size, repeat.

If you know your workload really well, you can create a synthetic test:

  • Create a script or scenario which resembles the real production load
  • Run it for a given period of time
  • Measure the DTUs consumed

Unfortunately, I have yet to see a non-trivial database where I could manually create such a script and be reasonably sure that it reflects reality. Most of the time the load is consumer-driven, changes over time and heavily depends on exact query parameter values.

Which brings me to the need to replay the actual production workload against Azure SQL Database.

Trace and Replay

SQL Server comes with a marvelous suite of tools refined over the years of its existence. It includes tools to capture and replay queries, so I started with those.

SQL Server Profiler has a trace template called TSQL_Replay:

This template records information required to replay the trace. Use this template to perform iterative tuning, such as benchmark testing.

This sounded like what I needed, so I ran the profiler with this template to save a short trace.

Afterwards, it is possible to use the same SQL Server Profiler to replay the trace against another target database. So the process looks like this:

Replaying Traffic with SQL Server Profiler

Unfortunately, this didn't go very well:

  • Azure SQL Database is not supported by the tooling. The replay kind of runs, but it throws lots of errors like reading from non-existent system tables, trying to switch between databases and so on

  • Possibly related to the previous item: the replay went terribly slowly, and it seemed to slow down exponentially over time

  • The trace file itself was huge. Because the template tries to record pretty much everything, tracing 5 minutes of production produced 10 GB of XML

  • Replay was not real-time: you first record, then you replay. This might not be a big issue for many databases, but some of our queries have a time parameter, and the results would change if I replayed the trace an hour later

Just to give you a rough idea, our production database-under-study is handling about 1000 RPC calls per second (mostly stored procedures).

Custom Trace & Replay

Since the off-the-shelf solution didn't work for me, I decided to come up with my own custom tool chain. Here is the idea:

Replaying Traffic with Event Hubs and Azure Functions

There are two custom steps that I implemented:

  1. Run a console app which hosts a custom trace server. The trace server receives SQL commands and sends them to Azure Event Hubs in batches

  2. Create an Azure Function application triggered by the Event Hub. Each function call gets one SQL command to execute and runs it against the Azure SQL database that we are trying to load-test

This setup worked remarkably well for me: I got the real-time replay of SQL commands from production SQL Server to Azure SQL Database.

The rest of the article describes my setup so that you can reproduce it for your workload.

Azure SQL Database

Ideally, you want your copy of the database to be as fresh as possible, so that the query plans and results match.

Some ideas to accomplish this are given in SQL Server database migration to SQL Database in the cloud.

The Premium RS tier is great for testing, because it is much cheaper than the Premium tier while providing the same level of performance.

Event Hubs

I used Azure Event Hubs as the messaging middleware between the Trace Server and the Replay Function App.

I started with Azure Storage Queues, but the server wasn't able to send messages fast enough, mostly due to the lack of batching.

Event Hubs are a natural match for my choice of Azure Functions: Functions have a built-in Event Hub trigger with dynamic scaling out of the box.

So, I just created a new Event Hub via the portal, with 32 partitions allocated.

Trace Definition File

In order to run a custom Trace Server, you still need a trace definition file. The built-in template TSQL_Replay mentioned above could work, but it subscribes to way too many events and columns.

Instead, I produced my own trace template with minimal selection. To do that, open SQL Server Profiler, then navigate to File -> Templates -> New Template, give it a name and then on Events Selection tab exclude everything except exactly the commands that you want to replay.

We use stored procedures for pretty much everything, so my selection looked just like this:

SQL Profiler Template

For the first few runs, I advise you to restrict the trace even further. Click the Column Filters button, select TextData there and set a Like filter to a single stored procedure, e.g. matching the pattern %spProductList%.

This way you can debug your whole replay chain without immediately overloading any part of it with a huge stream of commands.

Once done, save the tdf file to disk. An example of such a trace definition file can be found in my github.

Trace Server

My trace server is a simple C# console application.

Create a new console app and reference the NuGet package Microsoft.SqlServer.SqlManagementObjects. Mine is version 140.17218.0 (the latest as of today).

Unfortunately, this NuGet package is not fully self-contained. In order to run a profiling session, you have to install the SQL Server Profiler tool on the machine where you want to run the trace server.

Chances are that you already have it there, but be sure to update to the matching version: mine works with 17.4 / 14.0.17213.0 but refused to work with older versions.

Now we can implement our trace server as a console application. The main method looks like this:

static void Main(string[] args) // args: <db server name> <db name> <trace file>
{
    // 1. Run trace server
    var connectionInfo = new SqlConnectionInfo(args[0])
    {
        DatabaseName = args[1],
        UseIntegratedSecurity = true
    };
    var trace = new TraceServer();
    trace.InitializeAsReader(connectionInfo, args[2]);

    // 2. Continuously read traces and send them to event hubs
    var tokenSource = new CancellationTokenSource();
    var readerTask = Task.Factory.StartNew(() => ReadTrace(trace, tokenSource.Token), tokenSource.Token);
    var senderTask = Task.Factory.StartNew(() => SendToEventHubs(tokenSource.Token), tokenSource.Token);

    // 3. Stop the trace
    Console.WriteLine("Press any key to stop...");
    Console.ReadKey();
    tokenSource.Cancel();
    Task.WaitAll(readerTask, senderTask);
}

The first block initializes the SQL connection using command-line arguments and integrated security, and then starts the Trace Server.

Because of the large volume, I made the trace reader and the event sender work on separate threads. They talk to each other via a concurrent queue:

private static readonly ConcurrentQueue<string> eventQueue = new ConcurrentQueue<string>();

Finally, when the operator presses any key, cancellation is requested and the reader and sender shut down.

The Trace Reader task is a loop crunching through trace data and sending the SQL statements (with some exclusions) to the concurrent in-memory queue:

private static void ReadTrace(TraceServer trace, CancellationToken token)
{ 
    while (trace.Read() && !token.IsCancellationRequested)
    {
        var eventClass = trace["EventClass"].ToString();
        if (string.Compare(eventClass, "RPC:Completed") == 0)
        {
            var textData = trace["TextData"].ToString();
            if (!textData.Contains("sp_reset_connection")
                && !textData.Contains("sp_trace")
                && !textData.Contains("sqlagent"))
            {
                eventQueue.Enqueue(textData);
            }
        }
    }

    trace.Stop();
    trace.Close();
}

The Event Sender dequeues SQL commands from the in-memory queue and collects them into batches of events. As soon as a batch fills up, it gets dispatched to the Event Hub:

private static void SendToEventHubs(CancellationToken token)
{
    var client = EventHubClient.CreateFromConnectionString(EventHubsConnectionString);
    var batch = client.CreateBatch();
    while (!token.IsCancellationRequested)
    {
        if (!eventQueue.TryDequeue(out string sql))
        {
            Thread.Sleep(10);
            continue;
        }

        var eventData = new EventData(Encoding.UTF8.GetBytes(sql));
        if (!batch.TryAdd(eventData) && batch.Count > 0)
        {
            client.SendAsync(batch.ToEnumerable())
                .ContinueWith(OnAsyncMethodFailed, token, TaskContinuationOptions.OnlyOnFaulted, TaskScheduler.Default);
            batch = client.CreateBatch();
            batch.TryAdd(eventData);
        }
    }
}

If your trace doesn't produce that many messages, you will probably want to send the batches out periodically even before they fill up, just to keep the process closer to real time.

Note that the sender does not await the SendAsync call. Instead, we only subscribe to failures via the OnAsyncMethodFailed callback, which prints them to the console:

private static void OnAsyncMethodFailed(Task task)
{
    Console.WriteLine(task.Exception?.ToString() ?? "null error");
}

And that concludes the implementation of the Trace Server. SQL commands now flow to the Event Hub, to be picked up by the Trace Replay.

Trace Replay Function App

To replay those traces against the target Azure SQL Database, I could make another console application hosting an EventProcessorHost to receive and process SQL commands.

However, under high load a single machine might not be able to keep up with executing all those commands in real time.

Instead, I decided to distribute the Replay App over multiple machines. To deploy a DDoS network, if you will :)

And I don't have to build, find, configure and synchronize all those servers myself, since we are living in the world of serverless.

Azure Functions are the perfect tool for this job. Once you start the trace server, the Function App will start scaling out based on the number of events in the Event Hub, and will keep expanding until it catches up with the workload.

But as long as you don't run the trace server, it won't consume any servers and won't cost you a dime.

Here is the implementation of Trace Replay Azure Function:

public static class Replay
{
    [FunctionName("Replay")]
    public static void Run(
        [EventHubTrigger("sqltrace", Connection = "EventHubsConn")] string sql,
        TraceWriter log)
    {
        // extract the procedure name that follows "exec" / "sp_executesql", used only for logging
        var commandName = sql
            .Split(null)
            .SkipWhile(r => r != "exec" && r != "sp_executesql")
            .FirstOrDefault(r => !r.Contains("exec")) ?? "<empty>";

        var stopwatch = new Stopwatch();
        stopwatch.Start();

        try
        {
            using (var sqlConnection = new SqlConnection(AzureSqlConnectionString))
            using (var cmd = new SqlCommand())
            {
                sqlConnection.Open();

                cmd.CommandText = sql;
                cmd.CommandType = CommandType.Text;

                cmd.Connection = sqlConnection;

                int count = 0;
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        count++;
                    }
                }

                log.Info($"Processed {commandName} in {stopwatch.ElapsedMilliseconds} ms with {count} rows");
            }
        }
        catch (Exception ex)
        {
            log.Error($"Error in {commandName} in {stopwatch.ElapsedMilliseconds} {ex.Message}");
            throw;
        }
    }
}

It's super simple: the function gets a SQL statement, executes it with the SqlCommand class, and logs the result with timing and the returned row count. And that's everything required to start bombarding my Azure SQL Database.

Evaluating Results

The purpose of this whole exercise was to evaluate whether a provisioned DTU level is enough to withstand a load comparable to the existing production.

So, after I ran the test, I could browse the DTU usage chart in the Azure portal to get overall usage statistics.

I also spent quite some time analyzing the usage breakdown as reported by sp_BlitzCache from the First Responder Kit. Please note that it's not officially supported for Azure SQL Database, but it seems to work reasonably well.

Be sure to re-run your experiments multiple times, at different days and time intervals.

The full code sample can be found in my github.

I hope Azure SQL Database will perform to your expectations and within your budget. But hope is not a good strategy, so go ahead and try it out!

Happy DDoS-ing!

Tic-Tac-Toe with F#, Azure Functions, HATEOAS and Property-Based Testing

This post describes a toy application that I built with F# and Azure Functions in about one day of work. It shows a simple end-to-end implementation with some useful techniques applied, and can be used as a reference point for anyone interested in one of the topics mentioned in the title.

The requirements for my application are quite simple:

  • Implement the game of Tic-Tac-Toe for a human player to play against the computer
  • The field is 3x3; a player who gets three in a row wins
  • After the game, the score is calculated based on the number of moves combined with the duration of the game
  • The history of players' scores is persisted and presented as the leaderboard

Below I go through the code step by step. Feel free to jump to the part which interests you the most: Domain Modelling, Azure Functions, HATEOAS, Property-Based Testing.

The game is online, so you can play it here.

The full source code can be found in my github.

Modeling the Game with Types

I start with a domain model. The model is composed of immutable F# types (records and discriminated unions) and pure functions.

We have two players, so we need a type for them:

type Player = X | O

In addition, there is a useful function to return the other player based on the given one. Simple pattern-matching will do:

module Player =
  let other = function | X -> O | O -> X

The domain code is the most important part of the application, so I want it to be covered by unit tests. Of course, the above function doesn't really warrant testing, but it's a nice and simple way to try out Property-Based Testing. That is, instead of defining specific tests, we define properties which hold for any valid input.

For other function, I came up with two properties:

  • Other player is not equal to original player
  • Other player of other player is the player itself

Here is the code with Expecto and FsCheck:

testProperty "Other player is not equal to player" <| fun x ->
  Expect.notEqual x (Player.other x) "Other should be different from original"

testProperty "Other player of other player is the player itself" <| fun x ->
  Expect.equal x (Player.other (Player.other x)) "Other other should be equal to original"

Let's move on to modelling the game. I decided to define a union type to be used for horizontal and vertical positions of the cells:

type Position = One | Two | Three

I could use normal integers instead, but I don't want to worry about validating the ranges all the time.

My first record type models a move, i.e. an action done by a player: it has the X and Y positions of the chosen cell, plus the player who made it:

type Move = {
  X: Position
  Y: Position
  By: Player
}

The following type RunningGame has just two properties, but its shape defines the design of the whole application:

type RunningGame = {
  MovesDone: Move list
  PossibleMoves: Move list
}

This type models any state of the game which is not finished yet.

MovesDone represents the ordered log of all moves, so we have the complete history of actions at any time. Event Sourcing in the small.

Equally importantly, there is a list of all possible moves at this point in the game. I could get away without this property: after all, it can always be derived from the history of moves done and the 3x3 size of the field.

However, having the list of possible moves simplifies the design of all the decision-making (client) code:

  • Clients don't have to search for remaining cells based on move log
  • Validation of a move received from clients gets trivial: just check that it's in the list of possible moves
  • Bot implementation gets easy: it just needs to pick one of the valid moves. The most trivial bot is a one-liner: it picks a random move from the collection
  • Tests take advantage of this in a similar way, see Game Tests below
  • We build a nice bridge into HATEOAS-style API, where links provided in the response correspond to possible moves, see REST API below

Now, we can model a game which is already finished:

type GameOutcome = Won of Player | Tie

type FinishedGame = {
  MovesDone: Move list
  Outcome: GameOutcome
}

Each finished game has a list of moves and the outcome: either one player won, or there was a tie.

Any state of a game can be described as a union of the previous two types:

type GameState = 
  | Finished of FinishedGame
  | InProgress of RunningGame

Modeling Game Flow

Now that all the types are in place, we can model the game flow. The flow is a sequence of transitions between game states, implemented with pure functions.

First, each game starts in the same state: an empty field, with X to move. Here is the value which represents this initial state:

module Game =
  let initialState = 
    let positions = [One; Two; Three]
    let cells = seq { 
      for x in positions do
         for y in positions do
            yield { X = x; Y = y; By = X }
      }
    { MovesDone = []; PossibleMoves = List.ofSeq cells }

After each move is made, we need a function to evaluate the outcome: whether the current game is finished or still in progress. I defined a function evaluate for that:

let private evaluate (history: Move list): GameOutcome option = ...

I don't show the full body here, since it's quite boring: it just checks rows, columns and diagonals for three-in-a-row. See the full code if you want to.
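
Still, to make the shape concrete, here is a minimal sketch of that idea (the actual implementation in the repository may differ in details):

let private evaluate (history: Move list): GameOutcome option =
  let lines =
    [ for p in [One; Two; Three] do
        yield [ (p, One); (p, Two); (p, Three) ]    // rows
        yield [ (One, p); (Two, p); (Three, p) ] ]  // columns
    @ [ [ (One, One); (Two, Two); (Three, Three) ]  // diagonals
        [ (One, Three); (Two, Two); (Three, One) ] ]
  let cellsOf player =
    history |> List.filter (fun m -> m.By = player) |> List.map (fun m -> (m.X, m.Y))
  let wins player =
    let cells = cellsOf player
    lines |> List.exists (List.forall (fun c -> List.contains c cells))
  if wins X then Some (Won X)
  elif wins O then Some (Won O)
  elif List.length history = 9 then Some Tie
  else None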

The following function is even more important: that's the main domain function called makeMove. Its type is RunningGame -> Move -> GameState which perfectly communicates its intent: given a running game and a move, it returns the game state after the move. Note that

  • You can't pass a finished game as an argument, because making a move on finished game doesn't make sense
  • The result can be a finished game

Here is the function implementation:

let makeMove (game: RunningGame) (move: Move): GameState =
  let movesDone = move :: game.MovesDone
  match evaluate movesDone with
  | Some result -> Finished { MovesDone = movesDone; Outcome = result }
  | None ->
    let possibleMoves = 
      List.except [move] game.PossibleMoves
      |> List.map (fun m -> { m with By = Player.other m.By })
    InProgress { MovesDone = movesDone; PossibleMoves = possibleMoves }

It works like this:

  • Prepend the new move to moves done
  • Evaluate the game result of these combined moves
  • If the result is known, return a Finished game with calculated outcome
  • If the result is not clear yet, return an InProgress game with the same possible moves as before, but excluding the move just made and with the remaining moves reassigned to the other player

Tic-Tac-Toe is a two-player game, so I defined another function which runs a turn of 2 moves by the 2 players, given the decision-making functions of both players (so it's a higher-order function):

let makeRound player1 player2 gameState =
  let newGameState = player1 gameState |> makeMove gameState
  match newGameState with
  | Finished _ -> newGameState
  | InProgress p -> player2 p |> makeMove p  

Looks almost like a monadic bind operation...

Property-Based Testing

I've already shown two simplistic properties for other function.

It's a bit more challenging to come up with invariant properties when it comes to testing the game itself. After some brainstorming, I've made the following list:

  • The game is never finished after 4 moves
  • The game is always finished after 9 moves
  • X and O have to take turns making moves
  • Player wins by filling one column
  • Player wins by filling one row
  • Player wins by filling a diagonal
  • Evaluate a known tie

Each property-based test should accept some input from the testing framework. It should then evaluate the test against this input and assert the invariants. If the property holds for any possible valid input, the test is green.

I decided to structure my property tests in the following way:

  • Each test accepts a list of non-negative integers
  • Each integer is interpreted as an index of a possible move to select at turn i. That means that the test receives a sequence which uniquely identifies the moves to be made
  • We can restrict this sequence to a scenario under test, e.g. make it less than 4 moves, or exactly 9 moves, or pick moves from a limited subset of all possible moves
  • We apply the moves to calculate the end result
  • We assert that the result confirms the property under test

Now, it's the responsibility of the property-based testing framework to generate all kinds of input lists to try to break our property. If it succeeds, it will print the exact input which causes the test to fail.

Here is how one such test is implemented.

A helper function plays a sequence of indexes as moves:

let playSequence moves = 
  let playOne s i =
    match s with
    | InProgress p -> Game.makeMove p (p.PossibleMoves.[i % p.PossibleMoves.Length])
    | _ -> s
  List.fold playOne (InProgress Game.initialState) moves

Then the property "The game is always finished after 9 moves" is simply:

testProp "The game is always finished after 9 moves" <| fun (Gen.ListOf9 xs) ->
  let result = playSequence xs
  Expect.isTrue (Game.isFinished result) "Game should be finished"

Note the restriction Gen.ListOf9 xs that we put on the input sequence. It's a generator that I defined, so that the list always contains exactly 9 elements.
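
The generator itself lives in the test project; here is a rough FsCheck-based sketch of how it could be defined (my approximation, not necessarily the exact repo code):

open FsCheck

// Wrapper type matched by the (Gen.ListOf9 xs) pattern; in the repo it sits inside a Gen module
type ListOf9 = ListOf9 of int list

type CustomArbitraries =
  static member ListOf9() =
    Gen.choose (0, 8)          // any non-negative index works: playSequence wraps it with modulo
    |> Gen.listOfLength 9      // always exactly 9 moves
    |> Gen.map ListOf9
    |> Arb.fromGen

The class with custom arbitraries is then registered with the test framework (e.g. via Expecto's FsCheckConfig), so that testProp can pick it up.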

Other property tests follow a similar pattern; you can see them here.

REST API

Now that I'm done with the Game domain model, I want to define the API that our HTTP service will expose to clients. I designed the API in a REST style.

The main resource is the /game resource. To start a new game, a client has to send a POST request:

POST /game
Content-Type: application/json

{ "name": "Mikhail" }

And the response body is JSON describing the newly created game:

{
    "id": "5d7b2261",
    "busyCells": [],
    "links": [
        {
            "rel": "x1y1",
            "href": "/game/5d7b2261/move/0"
        },
        {
            "rel": "x1y2",
            "href": "/game/5d7b2261/move/1"
        },
        // ... 7 more links follow
    ]
}

The response contains a game ID and the list of occupied cells, empty for now, because no moves have been made.

More importantly, it contains a list of links, each of which has a rel field representing a cell, and an href. The client should POST to that link if it wants to make a move on the corresponding cell:

POST /game/5d7b2261/move/1

And the response is:

{
    "id": "5d7b2261",
    "result": null,
    "busyCells": [
        {
            "name": "x2y3",
            "value": "O"
        },
        {
            "name": "x1y2",
            "value": "X"
        }
    ],
    "links": [
        {
            "rel": "x1y1",
            "href": "/game/5d7b2261/move/0"
        },
        {
            "rel": "x1y3",
            "href": "/game/5d7b2261/move/1"
        },
        // ... 5 more links follow
    ]
}

It has the same structure as before, but now two cells are occupied: one X and one O. The list of links now has only 7 entries, one per remaining free cell.

The client keeps navigating the links until it gets a non-empty result field:

{
    "id": "5d7b2261",
    "result": "You Win!",
    "busyCells": [
        {
            "name": "x3y1",
            "value": "X"
        },
        {
            "name": "x3y2",
            "value": "O"
        },
        // ... more cells here
    ],
    "links": [],
    "score": 401
}

This denotes the end of the game. There are no more links to navigate, so the client knows it has to stop playing.

This API is designed in HATEOAS style (Hypermedia as the Engine of Application State). Clients only need to know the initial URL, while all the other URLs are received from previous responses. It resembles the way a human navigates websites.
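
To illustrate, here is a hedged sketch of such a client in F#: it knows only the entry URL, picks a random link from every response and follows it until no links are left (the types mirror the JSON shown above; error handling is omitted):

open System
open System.Net.Http
open System.Text
open Newtonsoft.Json

type Link = { Rel: string; Href: string }
type GameResponse = { Id: string; Result: string; Links: Link list }

let playRandomGame (baseUrl: string) = async {
  use client = new HttpClient()
  let post (path: string) (body: string) = async {
    let content = new StringContent(body, Encoding.UTF8, "application/json")
    let! response = client.PostAsync(baseUrl + path, content) |> Async.AwaitTask
    let! text = response.Content.ReadAsStringAsync() |> Async.AwaitTask
    return JsonConvert.DeserializeObject<GameResponse> text
  }
  let random = Random()
  let rec loop (game: GameResponse) = async {
    if List.isEmpty game.Links then return game.Result
    else
      // a smarter client would pick the link based on its rel (the cell name)
      let link = game.Links.[random.Next game.Links.Length]
      let! next = post link.Href ""
      return! loop next
  }
  let! first = post "/game" """{ "name": "Mikhail" }"""
  return! loop first
}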

Azure Functions

I implemented the above API with Azure Functions. I used the .NET Standard based v2 runtime with precompiled F# functions.

The initial POST /game request is handled by the Start function:

type GameRequest = { Name: string }

type Cell = { Name: string; Value: string }
type Link = { Rel:  string; Href:  string }

type GameDTO = {
  Id: string
  Result: string
  BusyCells: Cell list
  Links: Link list
  Score: int
}

[<FunctionName("Start")>]
let start([<HttpTrigger(AuthorizationLevel.Anonymous, "POST", Route = "game")>] req: GameRequest,
          [<Table("TicTacToe")>] store: ICollector<GameEntity>) =
  let gameid = Guid.NewGuid().ToString()
  let state = InProgress Game.initialState
  let serializedState = JsonConvert.SerializeObject state
  store.Add(GameEntity(PartitionKey = "default", RowKey = gameid, Name = req.Name, State = serializedState))
  ObjectResult(Api.serialize gameid state 0)

The outline of this function:

  • It's triggered by an HTTP POST request, as configured on the req parameter
  • The request body is parsed into the GameRequest type containing the player name
  • It generates a new game ID
  • It creates the initial game state of an empty field
  • It serializes the state and saves it to Table Storage via the store output binding
  • It returns an HTTP body with the serialized game response of type GameDTO

The second function, Play, handles the moves:

[<FunctionName("Play")>]
let play([<HttpTrigger(AuthorizationLevel.Anonymous, "POST", Route = "game/{gameid}/move/{index}")>] 
         req: HttpRequest, gameid: string, index: int,
         [<Table("TicTacToe", "default", "{gameid}")>] entity: GameEntity) =
  let state = JsonConvert.DeserializeObject<GameState> entity.State
  match state with
  | Finished _ -> BadRequestResult() :> IActionResult
  | InProgress p when index < 0 || index >= p.PossibleMoves.Length -> BadRequestResult() :> IActionResult
  | InProgress p -> 
    let result = Game.makeRound (fun _ -> p.PossibleMoves.[index]) Bot.pickMove p
    entity.State <- JsonConvert.SerializeObject result
    entity.Score <- Scoring.calculateScore (DateTime.UtcNow - entity.StartedAt).TotalMilliseconds result
    ObjectResult(Api.serialize gameid result entity.Score) :> IActionResult

The outline is very similar:

  • It's triggered by a POST request with a URL template containing game ID and move index
  • It has an in/out Table Storage binding which reads the serialized state saved after previous game and move requests
  • It validates the state: if the game is already finished, or if the move index is not in the valid range, Bad Request HTTP status is returned
  • If the move is valid, it runs the round, including bot play
  • It also calculates the score, which is going to be non-zero only for finished games
  • The game state and the score are saved to Table Storage entity (that's the only mutation in the whole application)

Bot

The Azure Function above used the Bot.pickMove function, which I haven't described yet.

This function has the type RunningGame -> Move, exactly what the makeRound game function expects. Its goal is to pick the O move for any given game in progress.

Obviously, 3x3 Tic-Tac-Toe is a very simple game and it's quite easy to make a perfectly playing bot. This wasn't the goal though: it's more fun for a human to win.

So, actually, the only property test that I ended up implementing is the following:

testProp "Bot is able to play O at any possible position" <| fun (Gen.ListOfNonNegative xs) ->
  let human i p _ = p.PossibleMoves.[i % p.PossibleMoves.Length]
  let round s i =
    match s with
    | InProgress p -> Game.makeRound (human i p) Bot.pickMove p
    | _ -> s
  List.fold round (InProgress Game.initialState) xs |> ignore     

It makes sure that for any possible sequence of human moves, the bot is actually able to make a move of its own. The bot just shouldn't crash :)

My very first implementation of the bot was just picking a random move. Such a bot works, but it's too boring to play against.

So, my current bot implementation has 3 rules:

  • If there is a move that immediately wins the game, make that move
  • If possible, don't pick a move which leads to an immediate loss after the next human move
  • Otherwise, pick a random move

I implemented the bot using the approach described in my Functional Fold as Decision Tree Pattern post:

let pickMove (game: RunningGame) = 
  [winNow O; notLoseNow; pickRandom]
  |> Seq.ofList
  |> Seq.choose (fun x -> x game)
  |> Seq.head

So, there is a prioritized list of decision functions. The first one that returns Some decision is promoted to the final decision.

And here are those functions:

let winNow player (game: RunningGame) =
  let isWin = function | Finished { Outcome = Won x } when x = player -> true | _ -> false
  game.PossibleMoves
  |> List.tryFind (fun move -> Game.makeMove game move |> isWin)

let notLoseNow (game: RunningGame) =
  let canLose = function 
    | InProgress p -> match winNow X p with | Some _ -> true | None -> false
    | _ -> false
  let notLosingMoves =
    game.PossibleMoves
    |> List.filter (fun move -> Game.makeMove game move |> canLose |> not)
  if List.isEmpty notLosingMoves && notLosingMoves.Length < game.PossibleMoves.Length then None
  else Some (notLosingMoves.[random.Next notLosingMoves.Length])

let pickRandom (game: RunningGame) = 
  Some (game.PossibleMoves.[random.Next game.PossibleMoves.Length])

Such a rule-based setup is easy to extend, and also to test when the need arises.

Score Calculation

The last twist to the application is the scoring system. If a human player wins or ties, they are assigned a numeric score, which can then be compared to the historic leaderboard of all players.

The score is calculated based on two principles: the fewer moves you make and the faster you play, the higher the score. Move count is more important than timing.

These principles are nicely expressed as property tests:

testProp "The score of faster game is not lower than slower game" 
  <| fun (Gen.Positive duration1) (Gen.Positive duration2) game ->
  let (slower, faster) = maxmin id duration1 duration2
  let scoreFaster = Scoring.calculateScore faster game
  let scoreSlower = Scoring.calculateScore slower game
  Expect.isGreaterThanOrEqual scoreFaster scoreSlower "Bigger duration has lower score (or same)"

testProp "The score of won game in less moves is greater than game with more moves" 
  <| fun (Gen.Positive duration1) (Gen.Positive duration2) game1 game2 ->
  let (slower, faster) = maxmin id duration1 duration2
  let (moreMoves, lessMoves) = maxmin List.length game1 game2
  let score1 = Scoring.calculateScore slower (Finished { Outcome = Won X; MovesDone = lessMoves })
  let score2 = Scoring.calculateScore faster (Finished { Outcome = Won X; MovesDone = moreMoves })
  if moreMoves.Length = lessMoves.Length then
    Expect.isGreaterThanOrEqual score1 score2 "Bigger duration has lower score (or same)"
  else
    Expect.isGreaterThan score1 score2 "More moves have lower score"

Note that the tests are parameterized over durations and game states. We don't have to come up with specific scenarios: the framework takes care of those.

One of the possible implementations for scoring is:

let calculateScore duration (state: GameState) =
  let durationScore = (100.0 * (1.0 - duration / (duration + 10000.0))) |> int
  match state with
  | Finished { Outcome = Won X; MovesDone = ms } -> (11 - ms.Length) * 100 + durationScore
  | Finished { Outcome = Tie } -> durationScore
  | _ -> 0
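
For example, winning in 7 total moves after 5 seconds of play gives a duration score of 100 * (1 - 5000 / 15000) ≈ 66, and thus a total score of (11 - 7) * 100 + 66 = 466.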

Now, the leaderboard piece. You've already seen the bits of this functionality in Azure Functions: they store the game state into Azure Table Storage.

There is another Azure Function which handles GET requests to the /leaderboard resource. It loads all the past games from Table Storage, and then passes them to the leaderboard calculation function below:

let calculateLeaderboard top ns =
  ns
  |> Seq.filter (fun entity -> snd entity > 0 && not (String.IsNullOrEmpty (fst entity)))
  |> Seq.sortByDescending snd
  |> Seq.truncate top
  |> Seq.mapi (fun index entity -> { Index = index + 1; Name = fst entity; Score = snd entity })

Wrapping Up

OK, the application is simple, but the blog post ended up being quite long. Thank you if you made it this far.

I touched on several important concepts and tools, which can be useful separately or in combination.

Please leave a comment if this kind of article is useful to you, and let me know which part you found most inspiring or boring.

The full source code can be found in my github.

Happy coding!
