T-SQL MERGE statement is underrated
How many times did you write a SQL to save a row without knowing whether the same primary key already exists or not? You just get an object in your data access layer and you want to save all fields into the database.
But there is no SAVE statement in SQL, so effectively you need to come up with your implementation of “INSERT or UPDATE” command.
Example
Let’s take a concrete example. You have a person object with just 3 fields, here is the type definition:
public class Person
{
public Guid Id { get; set; }
public string Name { get; set; }
public string Email { get; set; }
}
Persons are entities, so we chose a GUID for the primary key column. We always generate
the unique identifier at client side and just want to save the Person
.
Typical Solutions
The typical T-SQL developer’s toolbox contains SELECT
, INSERT
,
UPDATE
and DELETE
statements. DELETE
is of no use here, but the combination of the other
three can be employed to complete the task. The most straightforward option is
IF NOT EXISTS(SELECT 1 FROM Person WHERE Id = @Id)
INSERT Person (Id, Name, Email) VALUES (@Id, @Name, @Email)
ELSE
UPDATE Person SET Name = @Name AND Email = @Email WHERE Id = @Id
It’s 4 lines of code instead of one, but it works. Being more fancy, we can reduce the code to 3 lines of code:
UPDATE Person SET Name = @Name AND Email = @Email WHERE Id = @Id
IF @@ROWCOUNT = 0 THEN
INSERT Person (Id, Name, Email) VALUES (@Id, @Name, @Email)
It should also perform faster if you update more often then insert.
Solved? Not completely…
The problem is that sometimes it doesn’t
work. By default, SELECT
doesn’t lock the table, so in race condition scenario there
may be another thread which would insert another row with same Id between the execution of
two statements (or delete the existing row for that matter). Ouch.
If you think that’s just a theoretical problem… Well, it might be for our tiny shiny Person
table, but it will happen for the tables of decent sizes with complex update patterns.
Transactions
What do we do when the execution of two statements can cause race conditions with
unpredictable results? We use transactions! So, start a transaction before the statement,
then lock the table in SELECT
and commit after all is done. It works, but quite some downsides
again:
- Lots of boilerplate code
- Easy to make a mistake (Which lock do we need?
updlock
?holdlock
?tablockx
?) - You might get into a deadlock, so need to handle it gracefully
MERGE
Starting with SQL Server 2008, Microsoft introduced the MERGE statement. Generally, it’s quite powerful and can be used to save all the different rows of a source table into a target table. But we can also use it for our simple task of saving a person.
MERGE
is just one statement, so it’s atomic and consistent. It performs very well.
But the syntax is… oh my god, it’s horrible. Your eyes might bleed:
MERGE Person AS target
USING (SELECT @Id, @Name, @Email) AS source (Id, Name, Email)
ON (target.Id = source.Id)
WHEN MATCHED THEN
UPDATE SET Name = source.Name, Email = source.Email
WHEN NOT MATCHED THEN
INSERT (Id, Name, Email) VALUES (source.Id, source.Name, source.Email)
Yes, we repeat the name of each column 6 times. And we say source
7 times. And you can imagine
how the MERGE
of a table with 50 columns would look like. And how painful it is to add a new column
to an existing statement written 2 years ago.
By the way, the deadlocks are still possible with MERGE
statement, so you need to handle them
properly.
So the developers, even the ones who know about the MERGE
, usually choose to use the good old CRUD
combination. But when isn’t the syntax a problem?
Generate It!
More and more developers shift from writing the stored procedures to using ORMs. With full-blown ORMs you don’t need to care about particular SQL statements, but you get a bunch of other problems related to Object-relational impedance mismatch.
One possible approach is to use a mini-ORM, for instance Dapper. You do your work in your favourite general-purpose language, but stay “close to the metal”, or rather to SQL engine statements.
Here is how I invoke a MERGE
statement for a Person object (given a connection from the pool):
var person = new Person(...);
DapperAdapter.Merge(connection, person);
Voila! The implementation of generic Merge
method takes care of the syntax complications.
Write once, use everywhere:
public void Merge<TEntity>(IDbConnection dbConnection, TEntity entity) where TEntity : class
{
var props = entity.GetType().GetProperties().Select(p => p.Name).ToList();
var names = string.Join(", ", props);
var values = string.Join(", ", props.Select(n => "@" + n));
var updates = string.Join(", ", props.Select(n => $"{n} = @{n}"));
dbConnection.Execute(
$@"MERGE {entity.GetType().Name} as target
USING (VALUES({values}))
AS SOURCE ({names})
ON target.Id = @Id
WHEN matched THEN
UPDATE SET {updates}
WHEN not matched THEN
INSERT({names}) VALUES({values});",
entity);
}
Of course, it will only work if you use the convention of naming the Person
properties
after the database table. In most cases, there will be a domain class Person
and a property
bag class PersonRow
, so you’ll have to do the mapping between them. But that might be
easier than writing T-SQL code.
Conclusion
Don’t let the bulky syntax scare you away from the MERGE
T-SQL statement. Extend your
toolbox, and use the tools wisely.