Always Encrypted: Deterministic vs. Randomized

Introduction

When it comes to securing a SQL Server instance, there are several things we need to think of. One of these is protecting our data in transit and/or at rest by using encryption.

Encryption can happen at several layers. For example, we would use BitLocker to encrypt at the drive level, NTFS encryption at the folder level and Transparent Data Encryption (TDE) at the database file level.

At the same time we can encrypt columns and scalar values using T-SQL via a certificate (EncryptByCert), asymmetric key (EncryptByAsymKey), symmetric key (EncryptByKey) or a passphrase (EncryptByPassPhrase).

Now how does Always Encrypted fit in here and how does it work?

Always Encrypted is a feature designed to protect sensitive data stored in Azure SQL Database or SQL Server databases from access by database administrators. It leverages client-side encryption: a database driver inside the application transparently encrypts data before sending it to the database. It therefore provides a separation between those who own the data and those who manage the data but should have no access to it.

This feature is available in SQL Server starting with SQL Server 2016 (13.x) and in all service tiers of Azure SQL Database.

This is in contrast to the T-SQL functions mentioned earlier, where encryption and decryption happen inside the database engine. With Always Encrypted, the keys used on the client side are never revealed to the database engine.

This leads us to the two types of keys involved: Column Encryption Keys (CEKs) and Column Master Keys (CMKs). The CEKs are used to actually encrypt the data, and a CMK is required to protect the CEKs themselves. This is necessary because the CEKs are stored in the database and therefore need special protection. CMKs are usually held in Azure Key Vault, the Windows Certificate Store or a hardware security module.

[Figure: Always Encrypted workflow]

It's important to note that the database only contains metadata about the type and location of CMKs, and the encrypted values of CEKs. This means that plaintext keys are never exposed to the database system, ensuring that data protected using Always Encrypted is safe even if the database system gets compromised.
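To make the key hierarchy a bit more tangible, here is a minimal Python sketch of the idea. It is a toy model, not the real implementation: toy_wrap is a made-up helper that stands in for proper key wrapping, and the key store name and key path are hypothetical placeholders. The only point is to show which pieces end up in the database and which stay on the client.

```python
import hashlib
import hmac
import secrets


def toy_wrap(wrapping_key: bytes, data: bytes) -> bytes:
    """XOR up to 32 bytes of data with an HMAC-derived pad.

    Toy stand-in for real key wrapping; do not use this for actual secrets.
    """
    pad = hmac.new(wrapping_key, b"wrap", hashlib.sha256).digest()
    return bytes(d ^ p for d, p in zip(data, pad))


# Client side: the CMK lives in a key store the database cannot read.
cmk = secrets.token_bytes(32)
# The CEK is the key that actually encrypts the column values.
cek = secrets.token_bytes(32)

# What the database stores: CMK metadata and the encrypted CEK only.
database_metadata = {
    "cmk_key_store": "EXAMPLE_KEY_STORE",          # hypothetical provider name
    "cmk_key_path": "example/path/to/master-key",  # hypothetical location
    "encrypted_cek": toy_wrap(cmk, cek),
}

# A client that can reach the CMK unwraps the CEK (XOR is its own inverse).
recovered_cek = toy_wrap(cmk, database_metadata["encrypted_cek"])
print(recovered_cek == cek)

# The stored value is not the plaintext CEK.
print(database_metadata["encrypted_cek"] != cek)
```

Someone who only has the contents of database_metadata holds neither the CMK nor the plaintext CEK, which is the property the paragraph above describes.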

Okay, now that we have covered the basics, let's move on to the two encryption types provided, deterministic and randomized, and compare them to each other.

Deterministic Encryption

In one respect, deterministic encryption is similar to the way hashing algorithms work: it always generates the same encrypted value for any given plaintext value. Unlike a hash, though, the encrypted value can be decrypted again by anyone who holds the key.

For example, if we encrypted a boolean column holding only true values, the encrypted payload would be the same for every row.

Input    Output
true     0x1234ffff
true     0x1234ffff
false    0xffff0000
false    0xffff0000

This method allows grouping, filtering by equality and joining tables based on encrypted values. The downside is that an unauthorized user could deduce plaintext values by examining patterns in the encrypted column, especially when the set of possible values is small.
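The trade-off described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: toy_keystream and encrypt_deterministic are made-up helpers built on HMAC-SHA256 from the standard library, not the AES-based algorithm Always Encrypted actually uses. The part that mirrors the real idea is that the initialization vector (IV) is derived from the plaintext, so equal inputs produce equal outputs.

```python
import hashlib
import hmac
import secrets
from collections import Counter


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Expand key and IV into `length` pseudo-random bytes (toy, not AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = iv + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_deterministic(key: bytes, plaintext: bytes) -> bytes:
    # The IV is derived from the plaintext, so equal inputs give equal outputs.
    iv = hmac.new(key, b"iv:" + plaintext, hashlib.sha256).digest()[:16]
    stream = toy_keystream(key, iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))


key = secrets.token_bytes(32)
values = (b"true", b"true", b"false", b"true")
column = [encrypt_deterministic(key, v) for v in values]

# Equality search works on ciphertext: encrypt the search term and compare.
needle = encrypt_deterministic(key, b"true")
matches = [i for i, c in enumerate(column) if c == needle]
print(matches)  # [0, 1, 3]

# The flip side: anyone who can read the column sees the value distribution.
print(sorted(Counter(column).values()))  # [1, 3]
```

Without knowing the key, an observer still learns that one value occurs three times and another once. For a boolean column that is often enough to guess which ciphertext means true.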

Use the deterministic encryption method when...

  • values are mostly unique
  • column values need to be used as search or grouping parameters
  • column requires an index

Randomized Encryption

Randomized encryption generates a different encrypted value for the same plaintext each time. If we encrypted a boolean column holding only true values, we would get a different encrypted payload for each of them.

Input    Output
true     0xffffabcd
true     0x1234dddd
false    0xa1a1b2b2
false    0xccdd1234

This encryption method is more secure than deterministic encryption, but it prevents equality searches, grouping, indexing and joining on encrypted columns.
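Here is a minimal Python sketch of why that is. Again, this is a toy model and not the real algorithm: toy_keystream, encrypt_randomized and decrypt are made-up helpers built on HMAC-SHA256 from the standard library. What it illustrates is the effect of a fresh random initialization vector (IV) for every encryption.

```python
import hashlib
import hmac
import secrets


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Expand key and IV into `length` pseudo-random bytes (toy, not AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = iv + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_randomized(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random IV per call makes every ciphertext different.
    iv = secrets.token_bytes(16)
    stream = toy_keystream(key, iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # The IV travels with the ciphertext, so the key holder can still decrypt.
    iv, body = ciphertext[:16], ciphertext[16:]
    stream = toy_keystream(key, iv, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)
first = encrypt_randomized(key, b"true")
second = encrypt_randomized(key, b"true")

# Same plaintext, different ciphertexts (barring a random IV collision).
print(first == second)

# Both still decrypt to the original value for someone who holds the key.
print(decrypt(key, first) == decrypt(key, second) == b"true")
```

Because two encryptions of the same value no longer match, comparing ciphertexts tells us nothing, which is exactly why equality searches, grouping and joins stop working.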

Use the randomized encryption method when...

  • values form a pattern (e.g. enums, booleans, ...)
  • there is no need to group or join tables based on this data
  • there is no need for an index