Can i haz? Part 2: scrapping the boilerplate, and fun with types

October 12, 2019 // haskell, types

Last time we briefly covered the Has pattern which allows expressing more precisely in types what does a function need from the application environment. We also wrote a few instances for our Has-like classes for a toy environment type.

The problem is that we had to write those classes. As the number of fields in our environment type grows, the number of classes we have to write grows linearly even in the best case. This much manual labour is surely not the Haskell way!

The problem with `Has`

Last post ended with this type and some instances:

data AppConfig = AppConfig
  { dbConfig :: DbConfig
  , webServerConfig :: WebServerConfig
  , cronConfig :: CronConfig
  }

instance HasDbConfig AppConfig where
  getDbConfig = dbConfig
instance HasWebServerConfig AppConfig where
  getWebServerConfig = webServerCOnfig
instance HasCronConfig AppConfig where
  getCronConfig = cronConfig

Let’s think what other instances we might want to write.

The configs themselves are obviously good candidates for (trivially) satisfying the corresponding classes:

instance HasDbConfig DbConfig where
  getDbConfig = id
instance HasWebServerConfig WebServerConfig where
  getWebServerConfig = id
instance HasCronConfig CronConfig where
  getCronConfig = id

These instances allow us to, for example, write separate tests (or utilities like a service tool for our DB) that don’t require the whole of AppConfig.

This is already getting a bit boring, but hold on. Some integration tests might also involve a pair of modules, and we still don’t want to pull the whole application configuration into all of the modules, so we end up writing a few instances for tuples:

instance HasDbConfig (DbConfig, b) where
  getDbConfig = fst
instance HasDbConfig (a, DbConfig) where
  getDbConfig = snd

instance HasWebServerConfig (WebServerConfig, b) where
  getWebServerConfig = fst
instance HasWebServerConfig (a, WebServerConfig) where
  getWebServerConfig = snd

instance HasCronConfig (CronConfig, b) where
  getCronConfig = fst
instance HasCronConfig (a, CronConfig) where
  getCronConfig = snd

Ugh. Let’s just hope we will never need to test three modules at once so we won’t need to write nine dull instances for 3-tuples.

Anyway, if you’re anything like me, this amount of boilerplate will make you seriously uncomfortable and eager to spend a few hours looking for ways to delegate this to the compiler instead a couple of minutes of writing the necessary instances.

Generalizing the `Has` class

Firstly, note that we have different classes for different environments. This might hinder making an universal solution, so let’s make the environment just another parameter:

class Has part record where
  extract :: record -> part

We say that Has part record is satisfied if part can be extracted out of record. This way, our good old HasDbConfig becomes Has DbConfig, and similarly for other classes we had previously. This is almost a pure syntactic change, so, for instance, one of our functions from the earlier post that had the type

doSmthWithDbAndCron :: (MonadReader r m, HasDbConfig r, HasCronConfig r) => ...

now has the following signature:

doSmthWithDbAndCron :: (MonadReader r m, Has DbConfig r, Has CronConfig r) => ...

The only change is a couple of spaces inserted in the right places.

We haven’t lost much of type inference either: the type checker is still perfectly happy to figure out the return value of extract most of the times based on the context we will be using it.

Now that we don’t care about the specific environment type, let’s consider what records satisfy the Has part record constraint for some fixed part. Looks like this problem admits a nice inductive structure:

Each record Has itself: Has part part is trivially satisfied via extract = id.
If record is a product of subrecords rec1 and rec2, then Has part record is satisfied iff either Has part rec1 or Has part rec2 is satisfied.
If record is a sum of rec1 and rec2, then Has part record is satisfied if both Has part rec1 and Has part rec2 is satisfied (although the utility of this case is not obvious, but let’s spell it out for completeness).

Looks like we have an algorithm for authomatically deriving if Has part record is implemented for a given pair of part and record!

Luckily, this sort of problems is precisely what GHC’s Generics are designed to solve. Long story short, each type is isomoprhic either to a sum of some (typically smaller) types, or a product of some types, or a “base type”.

First stab

Let’s take the typical route of writing a Generic implementation of our Has class via a helper GHas class that deals with all the types arising in Generics:

class GHas part grecord where
  gextract :: grecord p -> part

Here, grecord is the Generic representation of our record type.

The instances follow the inductive structure outlined above:

instance GHas record (K1 i record) where
  gextract (K1 x) = x

instance GHas part record => GHas part (M1 i t record) where
  gextract (M1 x) = gextract x

instance GHas part l => GHas part (l :*: r) where
  gextract (l :*: _) = gextract l

instance GHas part r => GHas part (l :*: r) where
  gextract (_ :*: r) = gextract r

K1 corresponds to the base case.
M1 is some Generics-specific stuff containing extra metadata that we don’t care about and just go through it.
The first l :*: r clause corresponds to the case where the “left” component of the product type has part ((a, b) has part when a has part).
Similarly, the second l :*: r clause corresponds to the case where the “right” component of the product type has part ((a, b) has part when b has part).

Note that we only support product types. My impression is that sum types are used way less often in this context, so let’s avoid them to simplify the solution.

It’s also worth noting that every n-ary product type (a1, ..., an) can be thought as a composition of n − 1 pairs (a1, (a2, (a3, (..., an)))), so I’ll be (somewhat liberally) associating product types with pairs.

Having GHas class in place, we enable a couple of extensions like DefaultSignatures and write the default implementation for Has:

class Has part record where
  extract :: record -> part

  default extract :: Generic record => record -> part
  extract = gextract . from

That’s it.

Or not?

The catch

If we actually try to compile this code, it won’t typecheck even without any attempts to use this default implementation. The problem is that there are overlapping instances. Worse than that, those instances are actually the same (in some sense) due to the way Haskell instances resolution works. Namely, let’s say we have this instance:

instance context => Foo barPattern bazPattern where
  ...

(by the way, that thing after the => is called the instance head.)

It feels natural to read this as follows:

Assume we need to resolve Foo bar baz. If we have the context satisfied, then this instance could be chosen if bar and baz match barPattern and bazPattern respectively.

But this is wrong! It actually goes the other way around:

Assume we need to resolve Foo bar baz. If we can match a given bar and baz against the barPattern and bazPattern in the instance head, then we use this instance and require the context to be satisfied.

Now it’s obvious what the problem is. Let’s look again at some of the instances instances we wrote:

instance GHas part l => GHas part (l :*: r) where
  gextract (l :*: _) = gextract l

instance GHas part r => GHas part (l :*: r) where
  gextract (_ :*: r) = gextract r

These two instances have the same instance heads, so of course they overlap! Moreover, none is more specific than the other under any circumstances.

There’s no way we can refine the instance heads to reflect what we need. That is, unless we add some extra parameters to GHas.

Expressive types to the rescue!

The solution is to precompute the “path” to the data member in question and use it to guide instance matching.

Since we don’t support sum types, a path is literally a sequence of left or right turns in product types (choosing the first or second component respectively) followed by an X once we’ve found the data member of the required type. Let’s write this down:

data Path = L Path | R Path | Here deriving (Show)

As an example

Consider the types

data DbConfig = DbConfig
  { dbAddress :: DbAddress
  , dbUsername :: Username
  , dbPassword :: Password
  }

data AppConfig = AppConfig
  { dbConfig :: DbConfig
  , webServerConfig :: WebServerConfig
  , cronConfig :: CronConfig
  }

What would be some example paths from AppConfig?

To DbConfig ⟶ L Here.
To WebServerConfig ⟶ R (L Here).
To CronConfig ⟶ R (R Here).
To DbAddress ⟶ L (L Here).

What could be the result of searching for a data member with the given type? Two options are obvious: it can be either successful or not. But there’s actually a third possibility lurking somewhere: there might be more than one instance of a value with the same type. Presumably the best course of action in such ambiguous case would be to fail and refuse to return anything at all, since any such choice will be inherently arbitrary.

As a motivation, consider our running example of a web service configuration. If somebody requests for (Host, Port) pair, shall it be the database server one or the web server one? Better safe than sorry.

Anyway, let’s express this in code:

data MaybePath = NotFound | Conflict | Found Path deriving (Show)

The difference in behaviour that requires to discriminate between the two failure options is that if we hit NotFound in one of the branches of a product type, then it doesn’t stop us from succeeding in some other branch. Contrary to that, once we hit Conflict, we know we’ve failed globally.

Given this representation, how do we treat product types (which, as we agreed, are basically pairs)? We perform the search recursively on both components and then combine the search results. Let’s write the function for combining them.

Since we’re talking about guiding instance resolution, we’re ultimately in the compile-time type-level computations land (even though the types in question are actually terms lifted via DataKinds). Thus we need a type-level function, which is expressed as a type family:

type family Combine p1 p2 where
  Combine ('Found path) 'NotFound = 'Found ('L path)
  Combine 'NotFound ('Found path) = 'Found ('R path)
  Combine 'NotFound 'NotFound = 'NotFound
  Combine _ _ = 'Conflict

This function is a straightforward representation of a few cases:

If one subsearch is successful and the other is NotFound, then we just need to take the corresponding left or right turn.
If both subsearches result in NotFound, then the whole result is, of course, also NotFound.
In any other case we know we have a Conflict.

Now let’s write a type-level function that takes a part and a Generics representation of some type and performs the search:

type family Search part (grecord :: k -> *) :: MaybePath where
  Search part (K1 _ part) = 'Found 'Here
  Search part (K1 _ other) = 'NotFound
  Search part (M1 _ _ x) = Search part x
  Search part (l :*: r) = Combine (Search part l) (Search part r)
  Search _ _ = 'NotFound

Note that it’s very similar to what we tried to express earlier with the GHas class! This is to be expected, since we’re effectively reproducing the algorithm that we tried to spell out via type classes.

Speaking of GHas, all we have to do now is to add an extra parameter denoting the path that we’ve computed via the Search function and that will be used to select the right instance:

class GHas (path :: Path) part grecord where
  gextract :: Proxy path -> grecord p -> part

We use an extra parameter for gextract to help the type checker deduce what path should be in each gextract invocation.

Writing the instances is straightforward:

instance GHas 'Here rec (K1 i rec) where
  gextract _ (K1 x) = x

instance GHas path part record => GHas path part (M1 i t record) where
  gextract proxy (M1 x) = gextract proxy x

instance GHas path part l => GHas ('L path) part (l :*: r) where
  gextract _ (l :*: _) = gextract (Proxy :: Proxy path) l

instance GHas path part r => GHas ('R path) part (l :*: r) where
  gextract _ (_ :*: r) = gextract (Proxy :: Proxy path) r

Indeed, we just select the right instance based on the path we computed earlier.

How do we write our default implementation for the extract :: record -> part function? We have a few constraints:

record should be Generic for all this stuff to work at all, so we get Generic record.
part should be found in record. More concretely, Search should be able to find part in the Generic representation of record (and the latter is denoted as Rep record). This is a bit more funny to write down, resulting in Search part (Rep record) ~ 'Found path. Indeed, if Search returns Conflict or NotFound, then there is no such path.
We should be able to use GHas with part, record and path from the previous step, giving GHas path part (Rep record).

We’ll encounter the latter two constraints a couple more times, so it’s worth refactoring them out:

type SuccessfulSearch part record path = (Search part (Rep record) ~ 'Found path, GHas path part (Rep record))

Given this alias, we arrive at

class Has part record where
  extract :: record -> part

  default extract :: forall path. (Generic record, SuccessfulSearch part record path) => record -> part
  extract = gextract (Proxy :: Proxy path) . from

And now we’re done!

Using the generic `Has`

To see how it all works, firstly, let’s write down a few common instances for tuples:

instance SuccessfulSearch a (a0, a1) path => Has a (a0, a1)
instance SuccessfulSearch a (a0, a1, a2) path => Has a (a0, a1, a2)
instance SuccessfulSearch a (a0, a1, a2, a3) path => Has a (a0, a1, a2, a3)

Here the SuccessfulSearch a (a0, ..., an) path constraint is responsible for ensuring that a occurs exactly once among a0, ..., an.

Then let’s say we have our good old

data AppConfig = AppConfig
  { dbConfig :: DbConfig
  , webServerConfig :: WebServerConfig
  , cronConfig :: CronConfig
  }

and we want to derive Has DbConfig, Has WebServerConfig and Has CronConfig. It’s sufficient to enable DeriveGeneric and DeriveAnyClass extensions and append a deriving-clause:

data AppConfig = AppConfig
  { dbConfig :: DbConfig
  , webServerConfig :: WebServerConfig
  , cronConfig :: CronConfig
  } deriving (Generic, Has DbConfig, Has WebServerConfig, Has CronConfig)

We were lucky (or wise) to choose the correct argument order for Has so that the part comes first, as this allows us to rely on DeriveAnyClass to minimize our syntactic burden.

Safety first

What happens if we don’t have a certain type?

data AppConfig = AppConfig
  { dbConfig :: DbConfig
  , webServerConfig :: WebServerConfig
  } deriving (Generic, Has CronConfig)

Nope, we catch the error as early as at the type definition site, as expected:

Spec.hs:35:24: error:
    • Couldn't match type ‘'NotFound’
                     with ‘'Found path0’
        arising from the 'deriving' clause of a data type declaration
    • When deriving the instance for (Has CronConfig AppConfig)
   |
35 |   } deriving (Generic, Has CronConfig)
   |                        ^^^^^^^^^^^^^^

Not the most friendly error message, but it’s still quite possible to understand what the problem is: something something NotFound something something CronConfig.

What happens if we have several different fields with the same type?

data AppConfig = AppConfig
  { prodDbConfig :: DbConfig
  , qaDbConfig :: DbConfig
  , webServerConfig :: WebServerConfig
  , cronConfig :: CronConfig
  } deriving (Generic, Has DbConfig)

Nope, as expected:

Spec.hs:37:24: error:
    • Couldn't match type ‘'Conflict’
                     with ‘'Found path0’
        arising from the 'deriving' clause of a data type declaration
    • When deriving the instance for (Has DbConfig AppConfig)
   |
37 |   } deriving (Generic, Has DbConfig)
   |                        ^^^^^^^^^^^^

Again, not the most obvious message, but gets the job done.

Looks like we’re really good to go.

To sum it up

So here’s the trick in question in a more distilled form.

Let’s say we have a certain type class and we want to be able to automatically derive instances of that class according to some recursive rules. Then we can avoid ambiguities (such as overlapping instances) during instance resolution, if expressing those rules at all, as follows:

Encode the recursive rules in an inductive data type T.
Use a type-level function (in form of a type family) to precompute the value v of that type T (or, in Haskell-speak, a type v of the kind T — so much for dependent types) describing the exact sequence of the recursive steps to be taken.
Use that v as an extra argument to the Generic-based type class to guide the specific instances, which will now match on v.

And that’s basically it for the pattern!

Stay tuned for another post (or two) about some nice and beautiful extensions of this approach.