A quick guide to GADTs and why you ain't gonna need them
Ever wanted to use a GADT but did not know if you really needed them? You probably don't. And here's why.
I've been seeing some more posts about how to use and when to use GADTs. GADTs (Generalized Abstract Data Types, I pronounce them "gats" like in "cats" but with a "g") are an extension to regular ADTs that is usually reserved for very specific scenarios, but its not always clear which those are and why.
So I figured I'd give this a shot, and write a super small primer on them, by example. We're gonna write a library that didn't need to use GADTs, and along the way we're gonna feel the pains that come with GADTs, and what specific things they are amazing for.
Buckle on to your camels! 🐫
GADTs by Example
Alright usually you'd write your variant as:
type role =
| Guest of guest
| User of user
and use it as User(user)
or Guest(guest)
. You can think of these constructors as little functions that go from some arguments to a value of type role
. So really Guest
"has type" guest -> role
. And User
has type user -> role
.
A GADT will let you change (a little) the return type of these constructors. So let's see what our role
example looks like with GADT syntax.
type role =
| Guest: guest -> role
| User: user -> role
Neat, right? I really like this syntax.
But in this case, our role
type can't be segmented or parametrized. I mean you can just have role
, it's not like an option
or a list
where you can have an int option
or bool list
and those are more specific types of option
or list
. This means we can really only return role
in any of our constructors, so you probably don't need GADTs.
Let's look at an example that does not need GADTs but uses them anyway.
A Validation Library
We'll make a validation type that we can use to mark things as validated throughout our app:
type _ validation =
| Valid: 'thing -> 'thing validation
| Invalid: 'thing * string -> 'thing validation
| Pending: 'thing -> 'thing validation
Note how using our constructors returns different types. If you use Valid("hello")
you get a string validation
, if you use Invalid(2113, "not cool number")
you get an int validation
, and so on.
If we try to make some helper functions they may look like this:
let make : 'thing -> 'thing validation =
fun x -> Pending x
let get : 'x validation -> 'x =
function
| Pending x -> x
| Valid x -> x
| Invalid (x, _) -> x
let errors : 'x validation -> string option =
function
| Invalid (_, err) -> Some err
| _ -> None
let is_valid : 'x validation -> bool =
function
| Valid _ -> true
| _ -> false
You can imagine how we'll have to check every time we want to open up a validation, to see if its pending, valid, or invalid. This means other devs will have to remember to check if the validation passed before using the value.
I don't know about you but I'm 100% certain I will forget to do that at least once.
GADTs can help us here to reduce the space of all the possible types that our variant constructors create. Right now, we can essentially create any <type> validation
by just passing the right value to the constructors. But we could change that, so that you can statically check a value has passed, is pending, or has failed validation. Our new GADT could look like this:
(* we'll make some abstract types to use for differentiating
the validation states *)
type pending
type failed
type valid
(* our new validation GADT *)
type _ validation =
| Pending: 'thing -> pending validation
| Valid: 'thing -> valid validation
| Failed: 'thing * string -> failed validation
This means that our functions can have restricted type signatures and smaller implementations. The compiler now knows that there is only one possible type that matches the Valid
constructor, so we can't consider the others.
let make x = Pending x
let get (Valid x) = x
let errors (Fail (_, err)) = err
Great! The downside is that this code doesn't actually type-check. You can't pattern-match and get a value out of Valid x
because the compiler "forgot" what type that value had. Let me show you what I mean. This function:
let get (Valid x) = x
Fails to type with this error:
File "lib/gadt.ml", line 18, characters 20-21:
18 | let get (Valid x) = x
^
Error: This expression has type $Valid_'thing
but an expression was expected of type 'a
The type constructor $Valid_'thing would escape its scope
And the type of x in the error is $Valid_'thing
, which is a weird type. The compiler yells that it would escape its scope. That's how OCaml tells us "Hey I know there should be a type here but I...erhm...don't know anymore what that type *actually* was. So, yeah, can't let you use it. Sorry".
So how does one get this value out?
Turns out that while the compiler won't let this type escape, if you put many things together inside the same constructor, it will remember if the type was the same across all of them. For example, the compiler is completely happy here:
(* we introduce a function in our Valid arguments *)
type 'validity validation =
| Valid: 'thing * ('thing -> bool) -> valid validation
| ...
let get_that_bool (Valid (x, fn)) = fn x
because it understands that once you pack together a 'thing
and a 'thing -> bool
, then it's the same 'thing
. So this will work for any type. And that's both quite powerful and also super non-obvious at first. Like, what is this useful for?
GADTs can help us hide type information, but still be able to use it later. In the last issue of Practical OCaml we created a route
type for our web router that uses this pattern to hide the types that different route-handler functions use as inputs, and it lets us put together into a single list, a bunch of routes that have type-safe parameters/body parsing. Pretty cool.
Anyways, back to our question, to get our Valid value out, we will need to either:
let the value escape
or turn it into a type that is known outside the GADT, like include a
'thing -> string
function so we can always just call that function to turn our'thing
into a string and return a string
We want to preserve the type of our value, so we're gonna do the first. Letting our value escape means actually putting 'thing
in the return type of our constructors. Like this:
type _ validation =
| Valid: 'thing -> (valid * 'thing) validation
| ...
That's actually all we need. Now the compiler can infer the signature of our get
function is (valid * 'thing) validation -> 'thing
, and it lets us take that 'thing
out. BAM. Done.
And yet our validation solution still doesn't help us actually validate anything. We don't have a function that goes from pending to valid, or from pending to invalid. Since our Pending
variant doesn't know what a 'thing
is, it also doesn't know how to validate it. We'll start there:
(* we added a `fn` to our Pending constructor *)
type _ validation =
| Pending: 'thing * ('thing -> bool) -> pending validation
| ...
let validate (Pending (x, fn)) =
if fn x
then Valid x
else Invalid (x, "invalid value!")
We run the compiler, and see the same issue as before: Valid x
would have type $Pending_'thing
, because our Pending variant doesn't really expose it's internal 'thing
type... yadda yadda...we can fix that too by letting 'thing
escape:
type _ validation =
| Pending: 'thing * ('thing -> bool) -> (pending * 'thing) validation
| ...
Aaaand now we run into another issue. Oof. Valid x
and Invalid (x, "invalid value!")
have different types 🙈 – this is a very common "dead end".
For now, we are going to wrap 'em up in a result
and it will push the problem to future you and me:
let validate (Pending (x, fn)) =
if fn x
then Ok (Valid x)
else Error (Invalid (x, "invalid value!"))
So now we can use our Extremely Type-Safe Validation Lib:
let _ =
let user_value = make 2113 (fun x -> x > 0) in
match validate user_value with
| Ok value ->
let age = get value in
print_int age;
| Error err ->
let err = errors err in
print_string err;
Hopefully implementing this has shown some of the reasons why GADTs while powerful are rather painful to work with.
For completeness's sake here's the full code:
type valid
type invalid
type pending
type _ validation =
| Pending : 'thing * ('thing -> bool) -> (pending * 'thing) validation
| Valid : 'thing -> (valid * 'thing) validation
| Invalid : 'thing * string -> invalid validation
let make x fn = Pending (x, fn)
let get (Valid x) = x
let errors (Invalid (_, err)) = err
let validate (Pending (x, fn)) =
if fn x then Ok (Valid x) else Error (Invalid (x, "invalid value!"))
Conclusion: You Don't Need GADTs
Truth is that unless you are Jane Street and need to optimize the hell out of your compact arrays, or are writing a toy λ-calculus interpreter, you're probably better off without them.
GADTs can be super useful if you need to:
hide type information
restrict the kind of types that can be instantiated
have more control over the relation between the input and return type of a function
But GADTs are not only hard to pronounce, they also come with a host of problems. The ones we've seen, and some more that we haven't that need solutions with wizardly names like locally abstract types or polymorphic recursion. Learning about this is fun and great, but sometimes can get in the way of shipping without substantially improving the quality of your product or developer experience.
So stick to simpler types until you run into one of those 3 problems and I promise you you'll be a productive, happy camelid 🚀
Have you implemented typed state machines in some other ways? Have anything to add or challenge? I'd love to hear it! Join the x.com thread.
Happy Cameling! 🐫
Thanks to @patricoferris for pointing out that if you do implement the above pattern, but move outside the current module (for ex. have a submodule for your valid
, invalid
, pending
types), you may run into some undecidability problems that make pattern-matching non-exhaustive. Oof, many words. The gist is, if you see a "This pattern is non-exhaustive" error, try adding a private constructor to your type-tags:
type valid = private | Valid_tag
type invalid = private | Invalid_tag
type pending = private | Pending_tag
For a more detailed answer, check out this OCaml forum post.