Nubbing lists in C++
It’s been a while since I last used C++ for anything serious, but once a C++ guy, you’re always a C++ guy, right? So, I decided to see how modern C++ fares in a seemingly simple task: eliminating duplicate list elements.
That sounds trivial, so why bother with a whole blog post? Well, the catch is we’re gonna do this at compile-time. Moreover, lists will be represented as tuples, and the elements might have different types.
Hopefully, during this little exercise, we’ll also learn (or at least reinforce) a pattern or two of modern metaprogramming.
So, we want a function that, given a
constexpr std::tuple<int, double, int, double, float> { 1, 2.0, 1, 3.0, 2.0 }
returns a
constexpr std::tuple<int, double, double, float> { 1, 2.0, 3.0, 2.0 }
This differs from eliminating duplicate types from a std::tuple
,
a well-known problem with well-known solutions.
Our case is more complicated: we also need to look at the value level,
which is hard at compile-time.
Disclaimer: I pass everything by value, mainly for brevity and to avoid overstretching the attention span. And, anyway, all this stuff happens at compile-time, so why bother?
First, we’ll need the following helper to compare elements of perhaps different types:
template<typename T1, typename T2>
consteval bool JMEq(T1 v1, T2 v2)
{
if constexpr (!std::is_same_v<T1, T2>)
return false;
else
return v1 == v2;
}
constexpr operator==
and all that stuff.
First attempt
Next, let’s throw up something simple just to get it to compile
(here, Nub
is named after the corresponding function in Haskell Prelude):
consteval auto Nub()
{
return std::tuple {};
}
consteval auto Nub(auto item, auto... rest)
{
if constexpr ((JMEq(item, rest) || ...))
return Nub(rest...);
else
return std::tuple_cat(std::tuple { item }, Nub(rest...));
}
template<typename... Args>
consteval auto Nub(std::tuple<Args...> tuple)
{
return std::apply([](auto... args) consteval { return Nub(args...); }, tuple);
}
The first Nub
is the base case,
the second Nub
is the recursive case,
and the third Nub
unpacks the tuple and passes its arguments to either of the first two.
It’s just a tad easier to manipulate variadic packs here than tuples.
This should work, right? Except it doesn’t. If we try to run it (as much as one runs a compile-time thing):
int main()
{
constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
constexpr std::tuple<int, double, double, float> expected { 1, 2.0, 3.0, 2.0 };
constexpr auto actual = Nub(input);
}
we’ll get 398 lines of errors with clang 16 and 776 lines with gcc 13.1. All those hundreds of lines are basically saying that:
error: constexpr if condition is not a constant expression
if constexpr ((JMEq(item, rest) || ...))
^~~~~~~~~~~~~~~~
[.. snip ..]
note: function parameter 'item' with unknown value cannot be used in a constant expression
if constexpr ((JMEq(item, rest) || ...))
^
Alrighty, this is interesting.
Or, rather, kinda silly:
item
is a parameter to a consteval
function,
and the only way to invoke a consteval
function is by definition at compile time.
Hence, if you think about it, all parameters to a consteval
function must be known at compile-time.
I’m unsure whether this is intentional or if both clang and gcc get the standard wrong, but the bottom line is that this approach doesn’t work in practice.
Tuples as NTTPs
Can this be fixed easily?
Well, if we cannot pass constexpr
things as normal parameters of even consteval
functions,
we might try passing them as template parameters. Non-type template parameters, to be precise:
template<std::tuple tuple>
consteval auto Nub()
{
// TODO
}
int main()
{
// ...
constexpr auto actual = Nub<input>();
}
Where does it get us?
Well, clang is unhappy:
error: no matching function for call to 'Nub'
constexpr auto actual = Nub<input>();
^~~~~~~~~~
note: candidate template ignored: invalid explicitly-specified argument for template parameter 'tuple'
consteval auto Nub()
^
Cool, the argument is invalid, but could you, dear clang, tell me why and how to make it valid?
Luckily, there’s also gcc that is a bit more helpful here:
error: no matching function for call to 'Nub<input>()'
44 | constexpr auto actual = Nub<input>();
| ~~~~~~~~~~^~
note: candidate: 'template<tuple<...auto...> tuple> consteval auto Nub()'
35 | consteval auto Nub()
| ^~~
note: template argument deduction/substitution failed:
error: 'std::tuple<int, double, int, double, float>' is not a valid type for a template non-type parameter because it is not structural
44 | constexpr auto actual = Nub<input>();
| ~~~~~~~~~~^~
[.. snip ..]
Right, because it is not structural, whatever that means. Curiously, it is only diagnosed at function invocation time, but anyway, here we are: tuples won’t work.
How could we fix it? Well, let’s…
Roll our own tuples
The problem with std::tuple
is that it does too much for it to reasonably be structural.
gcc is helpful enough to tell us why:
the last error message continues with
note: 'std::tuple<int, double, int, double, float>::<anonymous>' has a non-structural type
746 | class tuple : public _Tuple_impl<0, _Elements...>
| ^~~~~
[.. snip ..]
note: base class 'std::_Head_base<4, float, false>' is not public
489 | struct _Tuple_impl<_Idx, _Head>
| ^~~~~~~~~~~~~~~~~~~~~~~~
So if we write a simple tuple, we should be golden, right?
So here’s our ST
(for Simple or Static Tuple) class.
The gist of it is what you might expect:
template<typename... Args>
struct ST;
template<>
struct ST<> {};
template<typename Head, typename... Rest>
struct ST<Head, Rest...> : ST<Rest...>
{
Head elem;
constexpr ST (Head head, Rest... rest)
: ST<Rest...> { rest... }
, elem { head }
{
}
};
The base case is the empty tuple with no members,
and the recursive case keeps a member of the Head
of the type list,
and delegates the Rest
to the recursive instantiation by deriving from it.
For convenience, we might also add a helper to the recursive case to access the part of the tuple with the rest of the elements. Thanks to the inheritance relation, it is trivial:
constexpr ST<Rest...> rest() const { return *this; }
We’ll also eventually need an analog of std::apply
that passes all elements from the tuple to a passed callable.
The idea is to recursively build up the list of the arguments
and apply them in the base case.
Thus, the recursive case looks like:
constexpr auto apply(auto f, auto... args) const { return rest().apply(f, args..., elem); }
where args
is the (possibly empty) pack of arguments
already collected from the part of the tuple before the current element,
and apply
appends the current element to that pack.
The base case merely applies the pack:
constexpr auto apply(auto f, auto... args) const { return std::invoke(f, args...); }
Overall, the class looks like this.
template<typename... Args>
struct ST;
template<>
struct ST<>
{
constexpr auto apply(auto f, auto... args) const { return std::invoke(f, args...); }
};
template<typename Head, typename... Rest>
struct ST<Head, Rest...> : ST<Rest...>
{
Head elem;
constexpr ST (Head head, Rest... rest)
: ST<Rest...> { rest... }
, elem { head }
{
}
constexpr ST<Rest...> rest() const { return *this; }
constexpr auto apply(auto f, auto... args) const { return rest().apply(f, args..., elem); }
};
The Nub
function has to be changed
to account for the argument being an NTTP instead of a standard function argument,
but the idea is the same:
template<ST tup>
constexpr auto Nub()
{
return std::tuple {};
}
template<ST tup>
requires requires { tup.rest(); }
constexpr auto Nub()
{
if constexpr (tup.rest().apply([] (auto... args) { return (JMEq(tup.elem, args) || ...); }))
return Nub<tup.rest()>();
else
return std::tuple_cat(std::tuple { tup.elem }, Nub<tup.rest()>());
}
As before, the first definition is the base case, and the second one is the recursive case, but it is worth expanding on a couple of points here.
Firstly, note that both definitions lack the template parameters for ST
,
relying on CTAD and C++20 enabling, ugh, class non-type template parameters
to derive the appropriate ...
in ST<...>
.
Without this, we’d have to specify the type list in Nub
explicitly at Nub
’s call site,
which, as we’ll later see, would effectively prohibit this implementation.
Then, both definitions have the same template arguments and
the same (zero, to be precise) standard arguments.
How does the compiler disambiguate?
The second definition is constrained by requiring the rest()
function in the passed tup
,
which only exists in non-empty tuples — precisely when we want to recur.
The first definition is the base case, and it’s unconstrained.
In concepts-speak, the second definition is more specific than the first one,
so the compiler chooses it whenever possible.
Given all that, the only thing remaining is the conversion from std::tuple
to ST
,
which is straightforward:
constexpr auto structuralize(auto tuple)
{
return std::apply([]<typename... Args>(Args... args) { return ST<Args...>(args...); }, tuple);
}
And… it works! The following compiles:
int main()
{
constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
constexpr std::tuple<double, int, double, float> expected { 2.0, 1, 3.0, 2.0 };
constexpr auto actual = Nub<structuralize(input)>();
static_assert(expected == actual);
}
There are two problems with it, though:
- This removes all but the last occurrences of each duplicated element, while we’d like to keep the first one.
structuralize
is an implementation detail, but it can’t be abstracted away. Indeed,std::tuple
can’t be passed as a template argument (as we’ve seen), and it can’t be passed as a standard argument either since it losesconstexpr
ness (as we’ve seen as well).
The first point is easy to fix: just reverse the tuple twice, before and after nubbing it. The following helper, which uses an immediately invoked lambda with C++20’s explicit generic lambda template arguments, will do:
template<typename Tuple>
constexpr auto Reverse(Tuple tuple)
{
constexpr auto size = std::tuple_size_v<Tuple>;
return [&]<size_t... Ix>(std::index_sequence<Ix...>)
{
return std::tuple { std::get<size - Ix - 1>(tuple)... };
} (std::make_index_sequence<size> {});
}
The second one is harder. Looks like we’re stuck with exposing this ugly implementation detail.
Or are we?
NTTP in C++20
Turns out, C++20 supports classes as NTTPs, thanks to a proposal (which was co-authored by that Boost.Hana guy, by the way — he definitely knows his stuff). In particular, C++20 supports lambdas as NTTPs since they effectively desugar to a class.
Luckily, a lambda can be a valid NTTP even if it’s returning a non-structural type!
That is, instead of passing an std::tuple
,
we pass a lambda returning an std::tuple
,
so the following is legit:
template<auto F>
constexpr auto NubTuple()
{
constexpr auto rtuple = Reverse(F());
return Reverse(Nub<structuralize(rtuple)>());
}
int main()
{
constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
constexpr std::tuple<double, int, double, float> expected { 1, 2.0, 3.0, 2.0 };
constexpr auto actual = NubTuple<[=] { return input; }>();
static_assert(expected == actual);
}
(in the meanwhile, we’re also doing the double-reversal thing).
And, indeed, gcc accepts it.
clang, on the other hand, is unhappy, complaining about
invalid explicitly-specified argument for template parameter 'F'
again.
The reason is that clang 16
does not fully implement the proposal above.
Welp.
…in a portable way
But all is not lost! Even though clang doesn’t like a class instance as an NTTP, it’s happy with a reference to it. So, the following works with both gcc and clang:
template<const auto& F>
constexpr auto NubTuple()
{
constexpr auto rtuple = Reverse(F());
return Reverse(Nub<structuralize(rtuple)>());
}
int main()
{
constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
constexpr std::tuple<double, int, double, float> expected { 1, 2.0, 3.0, 2.0 };
constexpr static auto lam = [=] { return input; };
constexpr auto actual = NubTuple<lam>();
static_assert(expected == actual);
}
Note the extra static
s in the definition of lam
,
which is required by both gcc and clang.
Requiring lam
to be static
makes sense:
the compiler needs to know its address at, well, compile-time,
so the address must not depend on the particular invocation of
the surrounding function
(which is main
in this case which can have only one call stack,
but let’s pretend it’s not).
The only way to achieve that is to mark the variable static
.
Curiously, if we capture input
by reference instead,
gcc is still happy, while clang requires input
to also be static
.
On the one hand, similar reasoning applies to input
:
if it’s captured by reference, its address is effectively taken,
and since it’s used to initialize a constexpr
thing,
it also needs to be known at compile-time.
So, in this case, the behaviour of gcc is somewhat puzzling for me.
I would’ve said, «dunno optimizations,» but I’m compiling with -O0
.
Alas, I don’t have a good answer here.
…but is it wrong?
One might object to marking things static
here, though, for two reasons:
constexpr static
implies the value is the same at every invocation of the function.static
might cause the compiler to materialize the thing and pollute the resulting executable, which we don’t want.
The first point doesn’t seem to be a problem:
since the thing is constexpr
,
it might only depend on compile-time parameters,
which can’t seem to depend on the particular invocation
of a given instantiation of the function.
Thus, different instantiations have different static
instances.
Hence, static
has no effect here!
The second point is better verified experimentally.
For example, if we try a straightforward function returning a single bool
—
the result of a runtime equality check of the expected and actual constexpr
tuples,
it gets compiled to a single
mov eax, 1
ret
like this.
#include <functional>
#include <tuple>
template<typename... Args>
struct ST;
template<>
struct ST<>
{
constexpr auto apply(auto f, auto... args) const { return std::invoke(f, args...); }
};
template<typename Head, typename... Rest>
struct ST<Head, Rest...> : ST<Rest...>
{
Head elem;
constexpr ST (Head head, Rest... rest)
: ST<Rest...> { rest... }
, elem { head }
{
}
constexpr ST<Rest...> rest() const { return *this; }
constexpr auto apply(auto f, auto... args) const { return rest().apply(f, args..., elem); }
};
template<typename T1, typename T2>
constexpr bool JMEq(const T1& v1, const T2& v2)
{
if constexpr (!std::is_same_v<T1, T2>)
return false;
else
return v1 == v2;
}
template<ST tup>
constexpr auto Nub()
{
return std::tuple {};
}
template<ST tup>
requires requires { tup.rest(); }
constexpr auto Nub()
{
if constexpr (tup.rest().apply([] (auto... args) { return (JMEq(tup.elem, args) || ...); }))
return Nub<tup.rest()>();
else
return std::tuple_cat(std::tuple { tup.elem }, Nub<tup.rest()>());
}
constexpr auto structuralize(auto tuple)
{
return std::apply([]<typename... Args>(Args... args) { return ST<Args...>(args...); }, tuple);
}
template<typename Tuple>
constexpr auto Reverse(Tuple tuple)
{
constexpr auto size = std::tuple_size_v<Tuple>;
return [&]<size_t... Ix>(std::index_sequence<Ix...>)
{
return std::tuple { std::get<size - Ix - 1>(tuple)... };
} (std::make_index_sequence<size> {});
}
template<const auto& F>
constexpr auto NubTuple()
{
constexpr auto rtuple = Reverse(F());
return Reverse(Nub<structuralize(rtuple)>());
}
Avoiding ST
What we have right now is working fine,
but it’s just too much code.
In particular, this ST
class is a liability:
a lot of thought and effort went into std::tuple
,
and while ST
seems to work for this simple case,
I’m not sure it will work (or at least fail gracefully) in all corner cases.
Let’s try to get rid of it.
We’ve already learned the trick of passing a [reference to a] lambda
if the desired type is non-structural.
Can we apply it to our first attempt?
Sure, we can, but the result will look somewhat ugly,
requiring using NTTP [references to] lambdas
everywhere, up to JMEq
.
Instead, we can try inlining the whole algorithm in one function. How do we do that? Let’s build it line by line.
We start with
template<const auto& F>
constexpr auto Nub()
{
constexpr auto tup = F();
constexpr auto indices = std::make_index_sequence<std::tuple_size_v<decltype(tup)>> {};
So far, it makes sense, as we’ve just got the tuple we need to work on and also created the list of indices into that tuple.
Next, for each element
in tup
,
we scan the previous ones and create an empty tuple
if there is a duplicate
or a singleton tuple with the current element
otherwise.
Then we return the concatenation of all those tuples.
So we’d like to write something like:
return std::tuple_cat (
// block of code expanded for each index `Ix` in `indices`
constexpr auto element = std::get<Ix>(tup);
if constexpr (/* element is duplicate, think about this later */)
return std::tuple {};
else
return std::tuple { element };
);
This “for each” part sounds like an opportunity for fold expressions, but,
unfortunately, there’re a bunch of statements in the above,
and fold expressions don’t work with statements.
How can we fix this?
Of course,
wrap it into an immediately invoked lambda to turn it into an expression!
We’ll also need to wrap it in yet another lambda
to extract the Ix
variadic pack from the indices
:
return [&]<std::size_t... Ix>(std::index_sequence<Ix...>)
{
return std::tuple_cat ([&]
{
constexpr auto element = std::get<Ix>(tup);
if constexpr (/* ... */)
return std::tuple {};
else
return std::tuple { element };
} ()...);
} (indices);
Let’s now fill in the last blank: the if
condition.
So, again, we need to check whether the element
is equal to any subsequent element.
Indexing other elements is straightforward:
the Ix
pack can just be reused.
Checking whether the current index in Ix
requires
saving the current index
in the outer expansion, resulting in
constexpr auto index = Ix;
constexpr auto element = std::get<index>(tup);
if constexpr (((JMEq(element, std::get<Ix>(tup)) && Ix < index) || ...))
return std::tuple {};
else
return std::tuple { element };
Indeed, without a separate index
, we’d have to write Ix < Ix
,
which doesn’t make sense.
like this.
#include <functional>
#include <tuple>
template<typename T1, typename T2>
consteval bool JMEq(const T1& v1, const T2& v2)
{
if constexpr (!std::is_same_v<T1, T2>)
return false;
else
return v1 == v2;
}
template<const auto& F>
constexpr auto Nub()
{
constexpr auto tup = F();
constexpr auto indices = std::make_index_sequence<std::tuple_size_v<decltype(tup)>> {};
return [&]<std::size_t... Ix>(std::index_sequence<Ix...>)
{
return std::tuple_cat ([&]
{
constexpr auto index = Ix;
constexpr auto element = std::get<index>(tup);
if constexpr (((JMEq(element, std::get<Ix>(tup)) && Ix < index) || ...))
return std::tuple {};
else
return std::tuple { element };
} ()...);
} (indices);
}
Benchmarks
Even though I said run time doesn’t matter, let’s look at each version’s compile times, shall we?
As a benchmark, I create a tuple ofN = 3k
elements,
where first k
elements have all different types,
then the following k
elements each have the same type
as the corresponding element in the first group but different value,
and then there is a copy of the first k
elements,
using this code.
template<size_t>
struct Test
{
int field;
constexpr auto operator<=>(const Test& other) const = default;
};
template<size_t N>
constexpr auto generateTuple()
{
return [&]<std::size_t... Ix>(std::index_sequence<Ix...>)
{
return std::tuple { Test<Ix> { Ix }..., Test<Ix> { Ix + 1 }..., Test<Ix> { Ix }... };
} (std::make_index_sequence<N> {});
}
#ifndef SIZE
#define SIZE 10
#endif
int main()
{
constexpr auto actual = Nub<generateTuple<SIZE>>();
static_assert(std::tuple_size_v<decltype(actual)> == SIZE * 2);
}
Running Compiling with different values of -DSIZE
(three times for each and taking the minimum) results in the following:
Fitting these data points with an exponential curve gives roughly O(n2.4)
for the inline version both for gcc and clang (albeit with different constant and scaling factors).
The ST
version is approximately O(n2.5) on clang and O(n3.7) on gcc.
Looks like we have a clear winner!
Conclusion
So we’ve finally managed to remove duplicates at compile-time
from a list represented as an std::tuple
.
We’ve started with a naive approach that doesn’t work
because even consteval
function parameters aren’t constexpr
.
We’ve tried passing the tuple as a non-type template parameter,
but, alas, std::tuple
is not structural, so that didn’t work either.
It forced us to write our own structural tuple,
eventually leading to an insight into how we could treat the standard ones.
Ultimately, the solution with just the standard tuples and fold expressions, without any function/type-level recursion, turned out to be both the shortest and the most efficient (at compile-time, that is). The modern metaprogramming advice is to avoid recursion and to use fold expressions instead whenever possible, and the results show that’s indeed the case.
And, going back a decade or so… Oh, the irony! I remember boasting about how incredible clang is, giving faster compilation times, more readable and informative error messages, and, most importantly, better standard support, all while gcc paled in comparison. Sure, gcc supported more target platforms and generated better code, but I only cared about the x86 family for the former, and I was ready to trade in the latter for all the clang’s strong points.
That’s no longer the case; it even seems like the tables have turned. It looks like gcc supports standards just a bit better, and sometimes it gives a little more informative error messages, all while clang generally produces faster code and supports targets like CUDA or wasm.
Market competition for the win, right?