HashSets with zero boilerplate
If you've ever read a Rust tutorial you've probably seen how easy it is to make vectors of things with the `vec![]` macro. But why don't sets have one?
We're gonna start this blog easy with something tiny that bugged me for a long time: making sets.
If you've ever read a Rust tutorial you've probably seen how easy it is to make vectors of things. In fact, most vectors we create seem to be created with the little `vec![]` macro.
In fact, you can do the same thing with fixed-size arrays too: write `[1,2,3]` and boom, you get a `[i32; 3]` (integer literals default to `i32` unless something else pins the type). To be honest I've written so many of these that I'd expect most collections to be as easy to create.
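For reference, a minimal example of both forms side by side. The function name and the values are placeholders made up for illustration:

```rust
/// Builds a growable vector and a fixed-size array from the same literals.
fn inline_collections() -> (Vec<i32>, [i32; 3]) {
    // `vec![]` allocates a Vec on the heap.
    let numbers = vec![1, 2, 3];
    // A plain array literal has its length baked into the type.
    let fixed = [1, 2, 3];
    (numbers, fixed)
}
```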
But other Rust collections can be harder to create than just writing 'em out like a nice little `vec![]`.
Take for example the `HashSet`. Usually, you see a `HashSet` initialized as mutable, and we `insert` things into it over time. For small sets, your initial attempt may be to create an empty set and call `insert` once per element.
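A minimal sketch of that first attempt (the function name and the values are placeholders):

```rust
use std::collections::HashSet;

/// Fills a small set one `insert` call at a time.
fn build_small_set() -> HashSet<u32> {
    let mut set = HashSet::new();
    set.insert(1);
    set.insert(2);
    set.insert(3);
    set
}
```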
For a larger set of elements we already know, you could iterate over a slice of the items and insert them one at a time.
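A sketch of the loop-based version, again with made-up names. The final rebinding is one way to drop the `mut` once the set is filled:

```rust
use std::collections::HashSet;

/// Copies every item of a slice into a fresh set.
fn build_from_slice(items: &[u32]) -> HashSet<u32> {
    let mut set = HashSet::new();
    for item in items {
        set.insert(*item);
    }
    // Rebind without `mut` so the set can't be changed by accident later.
    let set = set;
    set
}
```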
But this makes it harder to understand what you're trying to achieve, because it's full of how you are doing it: how you are filling the set, how you are making it immutable, etc.
Enter the From<[T; N]> trait!
Turns out that `HashSet` implements a super useful trait: `From<[T; N]>`. The `From<[T; N]>` trait helps users of your types create values from fixed-size arrays, which is a lot like using the `vec![]` macro!
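A minimal sketch of what that looks like, assuming a reasonably recent toolchain (this impl landed in Rust 1.56, as far as I'm aware). The function name is a placeholder:

```rust
use std::collections::HashSet;

/// Builds an immutable set straight from an array literal.
fn small_set() -> HashSet<u32> {
    HashSet::from([1, 2, 3, 4])
}
```

No `mut`, no loop, and the elements are right there where you can see them.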
Now I love this one because it's super clear and much shorter, and I'd generally recommend using it.
But to be honest with you, it's not always obvious what you can and can't `from` from. Hah, that reads weird. What I mean is that you can't always immediately tell what thing you can use the `from` function on – can you do `from(vec)`? Can you do `from(iter)`? Only one way to know: ask the compiler.
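For those two cases, as far as I know the standard `HashSet` has no `From<Vec<T>>` impl and no iterator-based `From` impl, so the usual route is `collect()` (or `from_iter`). A small sketch with made-up function names; note that the return type is what tells the compiler which kind of set to build:

```rust
use std::collections::HashSet;

/// Builds a set from a vector by consuming it as an iterator.
fn set_from_vec(items: Vec<u32>) -> HashSet<u32> {
    items.into_iter().collect()
}

/// Builds a set from any iterator of `u32` values.
fn set_from_iter(items: impl Iterator<Item = u32>) -> HashSet<u32> {
    items.collect()
}
```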
The set!{1,2,3} macro
So for this reason, and for those times I need a little more control over how I'm building sets all over my application (or library), I like to introduce a small macro: the `set!` macro.
I'd normally stay away from this macro approach and just repeat the code so it's obvious what's happening, but given that different Set implementations can be built differently, it has helped me in the past.
Here's the macro:
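A sketch of that macro, pieced together from the walkthrough in the next section: two branches, `HashSet::default()` for the empty call, `HashSet::from` over an array otherwise, and the full path to `HashSet` so it survives copy-pasting. The exact layout is an assumption:

```rust
macro_rules! set {
    // No arguments: an empty set of whatever type the context asks for.
    () => {
        std::collections::HashSet::default()
    };
    // One or more comma-separated expressions, with an optional trailing comma.
    ($($x:expr),+ $(,)?) => {
        std::collections::HashSet::from([$($x),+])
    };
}
```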
And that brings our tiny example all the way down to this:
let my_set = set!{1,2,3,4};
This is also very useful for tests and other situations where you just want to create a set and minimize visual noise.
Neat huh? But how does this really work? That macro has some funky symbols!
How does this macro work?
This is a declarative macro. In Rust we use `macro_rules! <name> { <branches> }` to define new macros, and this one is defined in 2 branches.
Each branch is pattern-matched against the place where we use the macro. So for example if you called `set!{}` then the macro will be matched against the first branch, because the macro has no parameters. The matching goes from the top down, one branch at a time.
💡 If you've ever worked with Erlang or Elixir, this way of branching will feel right at home.
However, if we call `set!{1,2,3}` then the macro has some parameters. It will fail to match the first branch and then match with the second one.
The first branch is trivial. If our call matches it, it replaces the whole macro call with `HashSet::default()`. Note how now this macro becomes the only place that really knows about the kind of set we are building.
The second branch is less trivial, and it's in fact rather advanced. It reads:
`$(` – starts a repetition pattern,
`$x:expr` – of expressions, that we will call `$x`
`),` – ends repetition pattern, and expects items to be separated by commas
`+` – declares that we expect 1 or more items
`$(,)?` – and finishes off with an optional trailing comma
So this would make sure we match any non-empty sequence of expressions that are separated by commas. Kind of like a regular expression.
In the body of this branch, we find another pattern similar to the capturing one, which goes:
`[` – start the array
`$(` – start repetition
`$x` – use the corresponding expression in the same order as it was captured
`),+` – close the repetition, putting a `,` between every element
`]` – close the array
And the rest is just a function call to `HashSet::from`.
In short, this means that if we match a non-empty sequence of expressions that are separated by commas, we will replace it with a call to `HashSet::from` with a new fixed-size array, containing all of the expressions we matched.
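To make the expansion concrete, here is a small sketch that compares a macro call against the same set written by hand. The macro is repeated (in the two-branch shape described above) so the snippet compiles on its own, and the function names are made up:

```rust
use std::collections::HashSet;

macro_rules! set {
    () => { std::collections::HashSet::default() };
    ($($x:expr),+ $(,)?) => { std::collections::HashSet::from([$($x),+]) };
}

/// What `set!{1, 2, 3}` should expand to, written out by hand.
fn by_hand() -> HashSet<i32> {
    HashSet::from([1, 2, 3])
}

/// Matches the second branch; the trailing comma is accepted by `$(,)?`.
fn via_macro() -> HashSet<i32> {
    let s = set!{1, 2, 3,};
    s
}

/// Matches the first branch; the return type supplies the element type.
fn empty() -> HashSet<i32> {
    let s = set!{};
    s
}
```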
But seriously, why would you use the macro?
In the past, I've had to switch up implementations of `HashSet` because it was too slow for what I was building. This normally touched code that processed other collections, so it wasn't necessarily fixed-size inputs to the set.
However, it also meant that the few fixed-size sets I had lying around that were either for tests or for production paths in the code, all of a sudden were incompatible, and I had to track down compilation errors to make them fit.
This little macro helped keep tests consistent, not rely too much on iterators as inputs everywhere, and overall made it easier to swap from `HashSet` to `FxHashSet` to `DashSet`.
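Here is a rough sketch of what such a swap can look like using only the standard library. `AppSet` is a hypothetical alias standing in for whichever set type you settle on, and a std `HashSet` with an explicit hasher plays the role of the third-party sets. To my knowledge `From<[T; N]>` is only implemented for the default hasher, so this version of the macro goes through `FromIterator` instead. Call sites don't change:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::BuildHasherDefault;

// Hypothetical alias: change this one line to switch set implementations.
type AppSet<T> = HashSet<T, BuildHasherDefault<DefaultHasher>>;

macro_rules! set {
    () => { AppSet::default() };
    ($($x:expr),+ $(,)?) => {
        // Arrays are iterable by value, so this works for any `FromIterator` set.
        <AppSet<_> as std::iter::FromIterator<_>>::from_iter([$($x),+])
    };
}

/// A call site that never needs to know which set type is behind `set!`.
fn allowed_ports() -> AppSet<u16> {
    let ports = set!{80, 443, 8080};
    ports
}

/// The empty branch works the same way.
fn no_ports() -> AppSet<u16> {
    let ports = set!{};
    ports
}
```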
Not everyone's cup of tea, and in the future I'd like to try an approach that is more iterator-based, but I figured I'd share this until I do.
Conclusion
To summarize:
`vec![]` is super handy, and available everywhere, but Sets don't have an equivalent that is as well known.
`HashSet` implements `From<[T; N]>`, which means we can use inline fixed-size arrays to create them: `HashSet::from([1,2,3,4])`
If you want more flexibility and the same convenience as `vec![]` you can use a little macro like `set!{}` above.
Just remember to use your macros sparingly as they can make your compiler errors confusing, and the overall developer experience worse when they are very complex.
If you've got a better way of writing your sets, I'd love to hear it – feel free to post at me here: @leostera
Now go make some Sets!
Thanks to @diogomafra_ for early feedback on this post and suggesting we also include a macro branch for iterators. We didn't include it because `HashSet::from_iter(x.into_iter())` tends to require type annotations on the receiving end, which makes the macro much less useful.
Thanks to @anothergalvez for reminding me to keep things accessible by adding text versions of the code snippets. I've since replaced the images with text and set up Prism.js to get some highlighting but this is still a work in progress.
Thanks to /u/h2co3 for pointing out a few things on the Rust forums, including using the full-path to the HashSet struct inside the macro to prevent errors when copy-pasting it in your code.