Nubbing lists in C++

July 23, 2023 // c++

It’s been a while since I last used C++ for anything serious, but once a C++ guy, you’re always a C++ guy, right? So, I decided to see how modern C++ fares in a seemingly simple task: eliminating duplicate list elements.

That sounds trivial, so why bother with a whole blog post? Well, the catch is we’re gonna do this at compile-time. Moreover, lists will be represented as tuples, and the elements might have different types.

Hopefully, during this little exercise, we’ll also learn (or at least reinforce) a pattern or two of modern metaprogramming.

So, we want a function that, given a

constexpr std::tuple<int, double, int, double, float> { 1, 2.0, 1, 3.0, 2.0 }

returns a

constexpr std::tuple<int, double, double, float> { 1, 2.0, 3.0, 2.0 }

This differs from eliminating duplicate types from a std::tuple, a well-known problem with well-known solutions. Our case is more complicated: we also need to look at the value level, which is hard at compile-time.

Disclaimer: I pass everything by value, mainly for brevity and to avoid overstretching the attention span. And, anyway, all this stuff happens at compile-time, so why bother?

First, we’ll need the following helper to compare elements of perhaps different types:

template<typename T1, typename T2>
consteval bool JMEq(T1 v1, T2 v2)
{
	if constexpr (!std::is_same_v<T1, T2>)
		return false;
	else
		return v1 == v2;
}

which also assumes constexpr operator== and all that stuff.

First attempt

Next, let’s throw up something simple just to get it to compile (here, Nub is named after the corresponding function in Haskell Prelude):

consteval auto Nub()
{
	return std::tuple {};
}

consteval auto Nub(auto item, auto... rest)
{
	if constexpr ((JMEq(item, rest) || ...))
		return Nub(rest...);
	else
		return std::tuple_cat(std::tuple { item }, Nub(rest...));
}

template<typename... Args>
consteval auto Nub(std::tuple<Args...> tuple)
{
	return std::apply([](auto... args) consteval { return Nub(args...); }, tuple);
}

The first Nub is the base case, the second Nub is the recursive case, and the third Nub unpacks the tuple and passes its arguments to either of the first two. It’s just a tad easier to manipulate variadic packs here than tuples.

This should work, right? Except it doesn’t. If we try to run it (as much as one runs a compile-time thing):

int main()
{
	constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
	constexpr std::tuple<int, double, double, float> expected { 1, 2.0, 3.0, 2.0 };
	constexpr auto actual = Nub(input);
}

we’ll get 398 lines of errors with clang 16 and 776 lines with gcc 13.1. All those hundreds of lines are basically saying that:

error: constexpr if condition is not a constant expression
	if constexpr ((JMEq(item, rest) || ...))
				   ^~~~~~~~~~~~~~~~
[.. snip ..]
note: function parameter 'item' with unknown value cannot be used in a constant expression
	if constexpr ((JMEq(item, rest) || ...))
						^

Alrighty, this is interesting. Or, rather, kinda silly: item is a parameter to a consteval function, and the only way to invoke a consteval function is by definition at compile time. Hence, if you think about it, all parameters to a consteval function must be known at compile-time.

I’m unsure whether this is intentional or if both clang and gcc get the standard wrong, but the bottom line is that this approach doesn’t work in practice.

Tuples as NTTPs

Can this be fixed easily? Well, if we cannot pass constexpr things as normal parameters of even consteval functions, we might try passing them as template parameters. Non-type template parameters, to be precise:

template<std::tuple tuple>
consteval auto Nub()
{
	// TODO
}

int main()
{
	// ...
	constexpr auto actual = Nub<input>();
}

Where does it get us?

Well, clang is unhappy:

error: no matching function for call to 'Nub'
		constexpr auto actual = Nub<input>();
								^~~~~~~~~~
note: candidate template ignored: invalid explicitly-specified argument for template parameter 'tuple'
consteval auto Nub()
			   ^

Cool, the argument is invalid, but could you, dear clang, tell me why and how to make it valid?

Luckily, there’s also gcc that is a bit more helpful here:

error: no matching function for call to 'Nub<input>()'
   44 |		 constexpr auto actual = Nub<input>();
	  |								 ~~~~~~~~~~^~
note: candidate: 'template<tuple<...auto...> tuple> consteval auto Nub()'
   35 | consteval auto Nub()
	  |				^~~
note:   template argument deduction/substitution failed:
error: 'std::tuple<int, double, int, double, float>' is not a valid type for a template non-type parameter because it is not structural
   44 |		 constexpr auto actual = Nub<input>();
	  |								 ~~~~~~~~~~^~
[.. snip ..]

Right, because it is not structural, whatever that means. Curiously, it is only diagnosed at function invocation time, but anyway, here we are: tuples won’t work.

How could we fix it? Well, let’s…

Roll our own tuples

The problem with std::tuple is that it does too much for it to reasonably be structural. gcc is helpful enough to tell us why: the last error message continues with

note: 'std::tuple<int, double, int, double, float>::<anonymous>' has a non-structural type
  746 |	 class tuple : public _Tuple_impl<0, _Elements...>
	  |		   ^~~~~
[.. snip ..]
note: base class 'std::_Head_base<4, float, false>' is not public
  489 |	 struct _Tuple_impl<_Idx, _Head>
	  |			^~~~~~~~~~~~~~~~~~~~~~~~

So if we write a simple tuple, we should be golden, right?

So here’s our ST (for Simple or Static Tuple) class. The gist of it is what you might expect:

template<typename... Args>
struct ST;

template<>
struct ST<> {};

template<typename Head, typename... Rest>
struct ST<Head, Rest...> : ST<Rest...>
{
	Head elem;

	constexpr ST (Head head, Rest... rest)
	: ST<Rest...> { rest... }
	, elem { head }
	{
	}
};

The base case is the empty tuple with no members, and the recursive case keeps a member of the Head of the type list, and delegates the Rest to the recursive instantiation by deriving from it.

For convenience, we might also add a helper to the recursive case to access the part of the tuple with the rest of the elements. Thanks to the inheritance relation, it is trivial:

	constexpr ST<Rest...> rest() const { return *this; }

We’ll also eventually need an analog of std::apply that passes all elements from the tuple to a passed callable. The idea is to recursively build up the list of the arguments and apply them in the base case. Thus, the recursive case looks like:

	constexpr auto apply(auto f, auto... args) const { return rest().apply(f, args..., elem); }

where args is the (possibly empty) pack of arguments already collected from the part of the tuple before the current element, and apply appends the current element to that pack.

The base case merely applies the pack:

	constexpr auto apply(auto f, auto... args) const { return std::invoke(f, args...); }

Overall, the class looks like this.

template<typename... Args>
struct ST;

template<>
struct ST<>
{
	constexpr auto apply(auto f, auto... args) const { return std::invoke(f, args...); }
};

template<typename Head, typename... Rest>
struct ST<Head, Rest...> : ST<Rest...>
{
	Head elem;

	constexpr ST (Head head, Rest... rest)
	: ST<Rest...> { rest... }
	, elem { head }
	{
	}

	constexpr ST<Rest...> rest() const { return *this; }

	constexpr auto apply(auto f, auto... args) const { return rest().apply(f, args..., elem); }
};

The Nub function has to be changed to account for the argument being an NTTP instead of a standard function argument, but the idea is the same:

template<ST tup>
constexpr auto Nub()
{
	return std::tuple {};
}

template<ST tup>
	requires requires { tup.rest(); }
constexpr auto Nub()
{
	if constexpr (tup.rest().apply([] (auto... args) { return (JMEq(tup.elem, args) || ...); }))
		return Nub<tup.rest()>();
	else
		return std::tuple_cat(std::tuple { tup.elem }, Nub<tup.rest()>());
}

As before, the first definition is the base case, and the second one is the recursive case, but it is worth expanding on a couple of points here.

Firstly, note that both definitions lack the template parameters for ST, relying on CTAD and C++20 enabling, ugh, class non-type template parameters to derive the appropriate ... in ST<...>. Without this, we’d have to specify the type list in Nub explicitly at Nub’s call site, which, as we’ll later see, would effectively prohibit this implementation.

Then, both definitions have the same template arguments and the same (zero, to be precise) standard arguments. How does the compiler disambiguate? The second definition is constrained by requiring the rest() function in the passed tup, which only exists in non-empty tuples — precisely when we want to recur. The first definition is the base case, and it’s unconstrained. In concepts-speak, the second definition is more specific than the first one, so the compiler chooses it whenever possible.

Given all that, the only thing remaining is the conversion from std::tuple to ST, which is straightforward:

constexpr auto structuralize(auto tuple)
{
	return std::apply([]<typename... Args>(Args... args) { return ST<Args...>(args...); }, tuple);
}

And… it works! The following compiles:

int main()
{
	constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
	constexpr std::tuple<double, int, double, float> expected { 2.0, 1, 3.0, 2.0 };
	constexpr auto actual = Nub<structuralize(input)>();
	static_assert(expected == actual);
}

There are two problems with it, though:

This removes all but the last occurrences of each duplicated element, while we’d like to keep the first one.
structuralize is an implementation detail, but it can’t be abstracted away. Indeed, std::tuple can’t be passed as a template argument (as we’ve seen), and it can’t be passed as a standard argument either since it loses constexprness (as we’ve seen as well).

The first point is easy to fix: just reverse the tuple twice, before and after nubbing it. The following helper, which uses an immediately invoked lambda with C++20’s explicit generic lambda template arguments, will do:

template<typename Tuple>
constexpr auto Reverse(Tuple tuple)
{
	constexpr auto size = std::tuple_size_v<Tuple>;
	return [&]<size_t... Ix>(std::index_sequence<Ix...>)
	{
		return std::tuple { std::get<size - Ix - 1>(tuple)... };
	} (std::make_index_sequence<size> {});
}

The second one is harder. Looks like we’re stuck with exposing this ugly implementation detail.

Or are we?

NTTP in C++20

Turns out, C++20 supports classes as NTTPs, thanks to a proposal (which was co-authored by that Boost.Hana guy, by the way — he definitely knows his stuff). In particular, C++20 supports lambdas as NTTPs since they effectively desugar to a class.

Luckily, a lambda can be a valid NTTP even if it’s returning a non-structural type! That is, instead of passing an std::tuple, we pass a lambda returning an std::tuple, so the following is legit:

template<auto F>
constexpr auto NubTuple()
{
	constexpr auto rtuple = Reverse(F());
	return Reverse(Nub<structuralize(rtuple)>());
}

int main()
{
	constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
	constexpr std::tuple<double, int, double, float> expected { 1, 2.0, 3.0, 2.0 };
	constexpr auto actual = NubTuple<[=] { return input; }>();
	static_assert(expected == actual);
}

(in the meanwhile, we’re also doing the double-reversal thing).

And, indeed, gcc accepts it. clang, on the other hand, is unhappy, complaining about invalid explicitly-specified argument for template parameter 'F' again. The reason is that clang 16 does not fully implement the proposal above. Welp.

…in a portable way

But all is not lost! Even though clang doesn’t like a class instance as an NTTP, it’s happy with a reference to it. So, the following works with both gcc and clang:

template<const auto& F>
constexpr auto NubTuple()
{
	constexpr auto rtuple = Reverse(F());
	return Reverse(Nub<structuralize(rtuple)>());
}

int main()
{
	constexpr std::tuple<int, double, int, double, float> input { 1, 2.0, 1, 3.0, 2.0 };
	constexpr std::tuple<double, int, double, float> expected { 1, 2.0, 3.0, 2.0 };
	constexpr static auto lam = [=] { return input; };
	constexpr auto actual = NubTuple<lam>();
	static_assert(expected == actual);
}

Note the extra statics in the definition of lam, which is required by both gcc and clang.

Requiring lam to be static makes sense: the compiler needs to know its address at, well, compile-time, so the address must not depend on the particular invocation of the surrounding function (which is main in this case which can have only one call stack, but let’s pretend it’s not). The only way to achieve that is to mark the variable static.

Curiously, if we capture input by reference instead, gcc is still happy, while clang requires input to also be static. On the one hand, similar reasoning applies to input: if it’s captured by reference, its address is effectively taken, and since it’s used to initialize a constexpr thing, it also needs to be known at compile-time. So, in this case, the behaviour of gcc is somewhat puzzling for me. I would’ve said, «dunno optimizations,» but I’m compiling with -O0. Alas, I don’t have a good answer here.

…but is it wrong?

One might object to marking things static here, though, for two reasons:

constexpr static implies the value is the same at every invocation of the function.
static might cause the compiler to materialize the thing and pollute the resulting executable, which we don’t want.

The first point doesn’t seem to be a problem: since the thing is constexpr, it might only depend on compile-time parameters, which can’t seem to depend on the particular invocation of a given instantiation of the function. Thus, different instantiations have different static instances. Hence, static has no effect here!

The second point is better verified experimentally. For example, if we try a straightforward function returning a single bool — the result of a runtime equality check of the expected and actual constexpr tuples, it gets compiled to a single

		mov	 eax, 1
		ret

And, for the record, the overall solution looks

like this.

#include <functional>
#include <tuple>

template<typename... Args>
struct ST;

template<>
struct ST<>
{
	constexpr auto apply(auto f, auto... args) const { return std::invoke(f, args...); }
};

template<typename Head, typename... Rest>
struct ST<Head, Rest...> : ST<Rest...>
{
	Head elem;

	constexpr ST (Head head, Rest... rest)
	: ST<Rest...> { rest... }
	, elem { head }
	{
	}

	constexpr ST<Rest...> rest() const { return *this; }

	constexpr auto apply(auto f, auto... args) const { return rest().apply(f, args..., elem); }
};

template<typename T1, typename T2>
constexpr bool JMEq(const T1& v1, const T2& v2)
{
	if constexpr (!std::is_same_v<T1, T2>)
		return false;
	else
		return v1 == v2;
}

template<ST tup>
constexpr auto Nub()
{
	return std::tuple {};
}

template<ST tup>
	requires requires { tup.rest(); }
constexpr auto Nub()
{
	if constexpr (tup.rest().apply([] (auto... args) { return (JMEq(tup.elem, args) || ...); }))
		return Nub<tup.rest()>();
	else
		return std::tuple_cat(std::tuple { tup.elem }, Nub<tup.rest()>());
}

constexpr auto structuralize(auto tuple)
{
	return std::apply([]<typename... Args>(Args... args) { return ST<Args...>(args...); }, tuple);
}
template<typename Tuple>
constexpr auto Reverse(Tuple tuple)
{
	constexpr auto size = std::tuple_size_v<Tuple>;
	return [&]<size_t... Ix>(std::index_sequence<Ix...>)
	{
		return std::tuple { std::get<size - Ix - 1>(tuple)... };
	} (std::make_index_sequence<size> {});
}


template<const auto& F>
constexpr auto NubTuple()
{
	constexpr auto rtuple = Reverse(F());
	return Reverse(Nub<structuralize(rtuple)>());
}

Avoiding `ST`

What we have right now is working fine, but it’s just too much code. In particular, this ST class is a liability: a lot of thought and effort went into std::tuple, and while ST seems to work for this simple case, I’m not sure it will work (or at least fail gracefully) in all corner cases. Let’s try to get rid of it.

We’ve already learned the trick of passing a [reference to a] lambda if the desired type is non-structural. Can we apply it to our first attempt? Sure, we can, but the result will look somewhat ugly, requiring using NTTP [references to] lambdas everywhere, up to JMEq.

Instead, we can try inlining the whole algorithm in one function. How do we do that? Let’s build it line by line.

We start with

template<const auto& F>
constexpr auto Nub()
{
	constexpr auto tup = F();
	constexpr auto indices = std::make_index_sequence<std::tuple_size_v<decltype(tup)>> {};

So far, it makes sense, as we’ve just got the tuple we need to work on and also created the list of indices into that tuple.

Next, for each element in tup, we scan the previous ones and create an empty tuple if there is a duplicate or a singleton tuple with the current element otherwise. Then we return the concatenation of all those tuples. So we’d like to write something like:

	return std::tuple_cat (
		// block of code expanded for each index `Ix` in `indices`
		constexpr auto element = std::get<Ix>(tup);

		if constexpr (/* element is duplicate, think about this later */)
			return std::tuple {};
		else
			return std::tuple { element };
		);

This “for each” part sounds like an opportunity for fold expressions, but, unfortunately, there’re a bunch of statements in the above, and fold expressions don’t work with statements. How can we fix this? Of course, wrap it into an immediately invoked lambda to turn it into an expression! We’ll also need to wrap it in yet another lambda to extract the Ix variadic pack from the indices:

	return [&]<std::size_t... Ix>(std::index_sequence<Ix...>)
	{
		return std::tuple_cat ([&]
			{
				constexpr auto element = std::get<Ix>(tup);

				if constexpr (/* ... */)
					return std::tuple {};
				else
					return std::tuple { element };
			} ()...);
	} (indices);

Let’s now fill in the last blank: the if condition. So, again, we need to check whether the element is equal to any subsequent element. Indexing other elements is straightforward: the Ix pack can just be reused. Checking whether the current index in Ix requires saving the current index in the outer expansion, resulting in

				constexpr auto index = Ix;
				constexpr auto element = std::get<index>(tup);

				if constexpr (((JMEq(element, std::get<Ix>(tup)) && Ix < index) || ...))
					return std::tuple {};
				else
					return std::tuple { element };

Indeed, without a separate index, we’d have to write Ix < Ix, which doesn’t make sense.

Overall, the code looks

like this.

#include <functional>
#include <tuple>

template<typename T1, typename T2>
consteval bool JMEq(const T1& v1, const T2& v2)
{
	if constexpr (!std::is_same_v<T1, T2>)
		return false;
	else
		return v1 == v2;
}

template<const auto& F>
constexpr auto Nub()
{
	constexpr auto tup = F();
	constexpr auto indices = std::make_index_sequence<std::tuple_size_v<decltype(tup)>> {};

	return [&]<std::size_t... Ix>(std::index_sequence<Ix...>)
	{
		return std::tuple_cat ([&]
			{
				constexpr auto index = Ix;
				constexpr auto element = std::get<index>(tup);

				if constexpr (((JMEq(element, std::get<Ix>(tup)) && Ix < index) || ...))
					return std::tuple {};
				else
					return std::tuple { element };
			} ()...);
	} (indices);
}

Benchmarks

Even though I said run time doesn’t matter, let’s look at each version’s compile times, shall we?

As a benchmark, I create a tuple of N = 3k elements, where first k elements have all different types, then the following k elements each have the same type as the corresponding element in the first group but different value, and then there is a copy of the first k elements,

using this code.

template<size_t>
struct Test
{
	int field;
	constexpr auto operator<=>(const Test& other) const = default;
};

template<size_t N>
constexpr auto generateTuple()
{
	return [&]<std::size_t... Ix>(std::index_sequence<Ix...>)
	{
		return std::tuple { Test<Ix> { Ix }..., Test<Ix> { Ix + 1 }..., Test<Ix> { Ix }... };
	} (std::make_index_sequence<N> {});
}

#ifndef SIZE
#define SIZE 10
#endif

int main()
{
	constexpr auto actual = Nub<generateTuple<SIZE>>();
	static_assert(std::tuple_size_v<decltype(actual)> == SIZE * 2);
}

~~Running~~ Compiling with different values of -DSIZE (three times for each and taking the minimum) results in the following:

Fitting these data points with an exponential curve gives roughly O(n^2.4) for the inline version both for gcc and clang (albeit with different constant and scaling factors). The ST version is approximately O(n^2.5) on clang and O(n^3.7) on gcc.

Looks like we have a clear winner!

Conclusion

So we’ve finally managed to remove duplicates at compile-time from a list represented as an std::tuple. We’ve started with a naive approach that doesn’t work because even consteval function parameters aren’t constexpr. We’ve tried passing the tuple as a non-type template parameter, but, alas, std::tuple is not structural, so that didn’t work either. It forced us to write our own structural tuple, eventually leading to an insight into how we could treat the standard ones.

Ultimately, the solution with just the standard tuples and fold expressions, without any function/type-level recursion, turned out to be both the shortest and the most efficient (at compile-time, that is). The modern metaprogramming advice is to avoid recursion and to use fold expressions instead whenever possible, and the results show that’s indeed the case.

And, going back a decade or so… Oh, the irony! I remember boasting about how incredible clang is, giving faster compilation times, more readable and informative error messages, and, most importantly, better standard support, all while gcc paled in comparison. Sure, gcc supported more target platforms and generated better code, but I only cared about the x86 family for the former, and I was ready to trade in the latter for all the clang’s strong points.

That’s no longer the case; it even seems like the tables have turned. It looks like gcc supports standards just a bit better, and sometimes it gives a little more informative error messages, all while clang generally produces faster code and supports targets like CUDA or wasm.

Market competition for the win, right?