Special Treatment for Literal Zero
TL;DR:
It is possible in C++ to make foo(0)
and foo(1)
call different functions.
We used that to design an API where x < 0
calls an optimized check without branching.
Trigger warning: this post contains mild forms of C++ gore.
A Tale of 0 and 1
I recently stumbled upon the little fact that, just because x < 1 compiles, you cannot assume that x < 0 would as well (godbolt).
struct X { };
bool operator<(X, long);
bool operator<(X, void*);
X x;
x < 1; // fine
x < 0; // ambiguous overload
Let’s start simple.
0
and 1
are integer literals and without any suffix their type is good ol’ int
.
(Lack of a suffix alone is no guarantee of that though; a literal too large for an int gets a wider type.)
An int can be implicitly converted to a long, which is why x < 1 happily calls the first overload.
So far so good.
Back in the day (technical term for the period of time before C++11), there was no nullptr.
People either used 0
or NULL
(a fancy macro).
The reason this worked is a special rule: the literal 0 can be converted to any pointer type and yields the null pointer.
x < 0 is thus ambiguous because both overloads are viable and equally good (each is one standard conversion away).
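To see that special rule in isolation (the variable names here are purely illustrative):
int* p = 0;     // fine: the literal 0 is a null pointer constant
void* q = 0;    // fine as well, for any pointer type
// int* r = 1;  // error: 1 is not a null pointer constant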
An Idea Forms
This curious little gem led me to rethink a recent API design problem we faced.
We were writing a custom 256-bit integer type, basically a BigInteger but with a fixed representation using four uint64_t values internally.
This was performance-sensitive code and we wanted everything to be maximally efficient.
Our i256
type uses two’s complement and the check x < 0
can be done by simply testing if the highest bit is set.
However, x < 1 or, more generally, x < y is a bit more expensive because you basically have to test x - y < 0.
In our initial design we opted for an is_below_zero(i256)
function that performs the optimized test and operator<(i256, i256)
that does the generic test.
Unfortunately, people tend to write x < 0
and I can’t blame them because it’s more readable and more concise than is_below_zero(x)
.
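To make the trade-off concrete, here is a rough sketch of the two checks. The limb layout and the names (limbs, is_below_zero) are assumptions for illustration, not our actual implementation:
#include <cstdint>

struct i256
{
    // Assumed little-endian limb order: limbs[3] holds the most significant 64 bits.
    std::uint64_t limbs[4];
};

// Optimized check: in two's complement, x < 0 iff the sign bit of the top limb is set.
inline bool is_below_zero(i256 const& x)
{
    return (x.limbs[3] >> 63) != 0;
}

// Generic check: flipping the sign bit of the top limb maps two's-complement order
// onto unsigned order, after which we compare limbs from most to least significant.
// Noticeably more work than a single bit test.
inline bool operator<(i256 const& a, i256 const& b)
{
    std::uint64_t const sign = std::uint64_t(1) << 63;
    if ((a.limbs[3] ^ sign) != (b.limbs[3] ^ sign))
        return (a.limbs[3] ^ sign) < (b.limbs[3] ^ sign);
    for (int i = 2; i >= 0; --i)
        if (a.limbs[i] != b.limbs[i])
            return a.limbs[i] < b.limbs[i];
    return false;
}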
Looping back to our initial observation:
If it is possible to generate a compile error for x < 0
but not for x < 1
then maybe it is also possible to redirect x < 0
to a different function.
What if we could automatically (and statically) use is_below_zero(x) whenever operator< is called with a literal 0, and otherwise fall back to the generic code without using a runtime branch?
The Initial Design
Usually, whenever we want to check if something would compile without actually introducing a compile error, we reach for SFINAE.
The idea would be to disable operator<(long)
for 0
.
In our case, this doesn’t work because SFINAE deals in types, and at the type level 0 and 1 are indistinguishable (both are just int).
However, we don’t actually need to disable operator<(long); we just need to break the ambiguity.
If there were a way to make operator<(long) a worse candidate than operator<(void*) (for 0), that would meet our requirements.
Lo and behold, there is a way!
User-defined conversions are famously “weaker” than standard conversions.
struct not_literal_zero { not_literal_zero(int); };
using literal_zero = void*;
struct X { };
bool operator<(X, not_literal_zero); // (1)
bool operator<(X, literal_zero); // (2)
X x;
x < 1; // calls (1)
x < 0; // calls (2)
1
cannot be converted to void*
but has an implicit conversion to not_literal_zero
.
0 can be converted to both, but the conversion to void* is a standard conversion while the conversion to not_literal_zero is a user-defined one, making (2) the unambiguous call target (godbolt).
Banishing Footguns
Are we done yet?
Well… yes, and no.
We achieved our initial goal of making x < 0
call our optimized version while x < 1
and x < y
call the generic version.
APIs should not be needlessly surprising, but we’re good law-abiding citizens:
Both versions are semantically equivalent for x < 0
.
The call is statically resolved, so there is no performance overhead in any case and a clear win for the common x < 0 test.
However, we introduced a few unnecessary footguns:
x < nullptr compiles
x < &x compiles
x < "y tho?" almost compiles
The third case would compile if we used void const* instead of void*, something we might be tempted to do for the sake of const-correctness.
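A standalone sketch of why that const variant is a trap (illustrative only, not our actual API):
struct X { };
bool operator<(X, void const*);  // the "const-correct" temptation

void demo()
{
    X x;
    x < "y tho?";  // compiles: the string literal decays to char const*,
                   // which then converts to void const* via a standard conversion
}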
Let’s fix the first case.
This is actually pretty simple because nullptr
has a special type std::nullptr_t
.
We cannot add an overload with std::nullptr_t
as that would make x < 0
ambiguous again.
However, we can disable it with SFINAE:
#include <type_traits>
template<class T,
    class = std::enable_if_t<std::is_same_v<T, std::nullptr_t>>>
bool operator<(X, T) = delete;
For x < 0
, T
will be int
and this overload is disabled via std::enable_if
.
For x < nullptr
, T
is deduced to be std::nullptr_t
and the overload is valid.
It requires no additional conversion and thus matches better than our void*
overload.
The compile error is not ideal (use of deleted function 'operator<'
) and could be improved with a static_assert
if desired.
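If we wanted that nicer message, one possible sketch (assuming the declarations from above) replaces = delete with a body whose assertion only fires on instantiation:
template<class T,
    class = std::enable_if_t<std::is_same_v<T, std::nullptr_t>>>
bool operator<(X, T)
{
    // The condition depends on T, so it is only checked when this overload is
    // actually instantiated -- which, thanks to the constraint above, can only
    // happen for x < nullptr.
    static_assert(!std::is_same_v<T, std::nullptr_t>,
                  "comparing against nullptr makes no sense; compare against 0 instead");
    return false;
}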
The final issue is that x < &x
compiles.
The reason is that many pointer types can be converted to void*
, which is a pretty general type.
While we cannot fix all corner cases, we can make it incredibly hard to accidentally trigger it.
First, we change void*
into a pointer-to-data-member, let’s say X* X::*
(a pointer to a member of X
that is itself a pointer to X
).
The literal 0 can still initialize this pointer-to-member, but the only other ways to produce such a value are &X::m (where m is an actual data member of X of type X*) or explicit casts.
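A sketch of that intermediate step, with a hypothetical member m added only to show the remaining loophole:
struct not_literal_zero { not_literal_zero(int); };

struct X { X* m; };              // m exists only so that &X::m can be written below

using literal_zero = X* X::*;    // pointer to a data member of X of type X*

bool operator<(X, not_literal_zero);  // (1) generic
bool operator<(X, literal_zero);      // (2) optimized

void demo()
{
    X x{};
    x < 1;      // calls (1)
    x < 0;      // still calls (2): a literal 0 initializes any pointer-to-member
    // x < &x;  // no longer compiles: X* does not convert to X* X::*
    x < &X::m;  // the one remaining (and rather unlikely) accidental match for (2)
}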
If you really don’t want to take any chances, you can also reach for a Voldemort Type:
auto hidden_ptr_to_member()
{
    // H is local to this function, so there is no way to name it from the outside.
    struct H { };
    H* H::* p = nullptr;
    return p;
}
using literal_zero = decltype(hidden_ptr_to_member());
bool operator<(X, literal_zero);
Outside of hidden_ptr_to_member
there is no syntax to name the inner struct H
, thus making it a type-that-cannot-be-named.
The only way to actually refer to it is by using decltype(...)
.
Apart from the inability to name it, it is just a pointer-to-data-member and thus suitable for our x < 0
overload.
For a final touch and maybe as an appeal to our inner humanity, we move hidden_ptr_to_member
to namespace impl
.
impl::
and detail::
namespaces are basically C++ bro code for “I have to expose this because of C++ rules but if you actively use anything inside it, you’re on your own”.
Summary
And with that, we have our final version:
#include <type_traits>

namespace impl
{
    auto hidden_ptr_to_member()
    {
        struct H { };
        H* H::* p = nullptr;
        return p;
    }
}

struct not_literal_zero { not_literal_zero(int); };
using literal_zero = decltype(impl::hidden_ptr_to_member());

struct X { };

bool operator<(X, not_literal_zero); // (1)
bool operator<(X, literal_zero);     // (2)

template<class forbid_nullptr,
    class = std::enable_if_t<std::is_same_v<forbid_nullptr, std::nullptr_t>>>
bool operator<(X, forbid_nullptr) = delete; // (3)

X x;
x < 1;       // calls (1)
x < 0;       // calls (2)
x < nullptr; // calls (3) which is deleted
x < &x;      // no matching call
x < 0
and x < 1
call different functions and we can safely optimize the x < 0
case.
With creative use of some slightly more arcane C++ rules we were able to remove a few pitfalls (like the optimized overload being callable with nullptr or various other pointer types).
I probably have to say that I consider this thing more in the category of “fun with C++” than “production-ready code”, so please use your own judgement before adopting it. A possible middle ground could be
[[deprecated("use is_below_zero(X) instead")]]
bool operator<(X, literal_zero);
(Title image from unsplash)