Constant Time String Comparison in C

Comparing strings in C is typically handled with `strncmp`. This is fine in most cases but if you need to compare sensitive information, such as a message digest, it’s a really bad choice. `strncmp` is susceptible to timing attacks because it will stop comparing once the first difference is encountered.

The overall design of constant time comparison is pretty well known. The OR XOR combo has been reviewed and vetted by crypto researchers. At least I hope it has… It’s used by a number of large security focused projects so… Well, my focus is on how to implement it specifically with strings in C. Most of the C versions I’ve seen, assume bytes and take the comparison length as an argument. Since I care about strings I want the length calculation handled securely by the function itself.

Here’s the code, so lets go into detail.

str_iseq.c

#include <stdbool.h>
#include <stdlib.h>

bool str_iseq(const char *s1, const char *s2)
{
    int    m = 0;
    size_t i = 0;
    size_t j = 0;
    size_t k = 0;

    if (s1 == NULL || s2 == NULL)
        return false;

    while (1) {
        m |= s1[i]^s2[j];

        if (s1[i] == '\0')
            break;
        i++;

        if (s2[j] != '\0')
            j++;
        if (s2[j] == '\0')
            k++;
    }

    return m == 0;
}

One of the big things I want to avoid is leaking the length of the known string. I’m treating `s1` as the string provided by the user/attacker/outside and `s2` and the internal, known string that’s being checked for a match.

You might think the first thing we should do is check if the lengths match but this is a bad idea. This will leak the length because an attacker can start with 1 character and keep adding characters until the comparison starts. Not to mention, we’d need to get the lengths using `strlen` and that will leak the length because `strlen` is a O(n) operation. Instead we’ll make the while loop doing the comparison length aware.

I have a NULL input check which does leak length but you really shouldn’t be trying to compare NULL in the first place… Additionally I could check if the two pointers are the same but this is a constant time comparison function after all. So we’ll still compare them.

if (s1[i] == '\0')
    break;

We’re always going to go through every character in `s1` no matter the length. Other than the strings not matching, the only other thing an attacker should be able to learn is the length of `s1`. But that’s not a problem because they passed in `s1` to begin with. You’d expect them to know the length of their input.

This check and break happens after the comparison to ensure the trailing NULL is part of the comparison. We absolutely need to compare the NULL terminator in order to prevent a sub string match. Even if the `s1` is shorter than `s2` a match won’t be generated because the NULL in `s1` won’t match the character in `s2`.

if (s2[j] != '\0')
    j++;
if (s2[j] == '\0')
    k++;

This is the key to not overrun `s2` if it’s shorter than `s1`. As long as we haven’t hit the end of `s2` we increment `j` to move forward in the comparison. If we’ve hit the end of `s2` we switch to incrementing `k`. We’ll keep comparing the NULL terminator in `s2` to the remaining characters in `s1`. The only reason for `k` is to keep operations balanced. What I mean is each iteration of the loop we’re incrementing `j` to move forward in `s2`. If we stop incrementing `j` the timing of the function changes. `k` ensures there is always an increment operation in every iteration of the loop.

There are two `if` statements instead of an `if else` for a very good reason. Modern processors, as well
as compilers, will often optimize for the true branch. Again, this leads to unbalanced operations and a difference int timing. By making it two separate checks we’re, hopefully, preventing it from being optimized.

m |= s1[i]^s2[j];

This is the “industry” standard comparison algorithm I eluded to earlier and not something I came up with. It works by XORing the two characters together (obviously). If they’re the same it will produce 0. If they’re different it will produce something non-zero. That’s saved using an OR. What ends up happening is `m` starts as 0 and if every character matches, ORing 0 with 0 will keep it as 0. As soon as there isn’t a match `m` become non-zero. Using an OR means `m` can never go back to zero once there isn’t a match.

test.c

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char *a = strdup("ABC");
    char *b = strdup("ABC");

    printf("%s\n", str_iseq(a, b)?"true":"false");
    printf("%s\n", str_iseq("ABC", "AB")?"true":"false");
    printf("%s\n", str_iseq("AB", "ABC")?"true":"false");

    free(a);
    free(b);
    return 0;
}

This very simple test app has two allocated strings, `a` and `b` to be 100% sure we’re comparing two different strings. When compiled any constants will be put into the executable once and referenced where needed. Which means that if we set `a` and `b` to a constant “ABC” then they would point to literally the same data. This isn’t a huge problem because `str_iseq` always does a comparison even if the pointers match but if a pointer check were added we might get into a situation where the comparison is skipped during testing. So, allocated strings are used to be 100% sure two different strings with the same characters are compared.

The last two checks are to demonstrate that substrings won’t cause a match. It’s not the most exhaustive set of tests but it’s a good enough demonstration.