Can i haz? Part 2: scrapping the boilerplate, and fun with types
Last time we briefly covered the Has pattern
which allows expressing more precisely in types what does a function need from the application environment.
We also wrote a few instances for our Has-like classes for a toy environment type.
The problem is that we had to write those classes. As the number of fields in our environment type grows, the number of classes we have to write grows linearly even in the best case. This much manual labour is surely not the Haskell way!
The problem with Has
Last post ended with this type and some instances:
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
}
instance HasDbConfig AppConfig where
getDbConfig = dbConfig
instance HasWebServerConfig AppConfig where
getWebServerConfig = webServerCOnfig
instance HasCronConfig AppConfig where
getCronConfig = cronConfigLet’s think what other instances we might want to write.
The configs themselves are obviously good candidates for (trivially) satisfying the corresponding classes:
instance HasDbConfig DbConfig where
getDbConfig = id
instance HasWebServerConfig WebServerConfig where
getWebServerConfig = id
instance HasCronConfig CronConfig where
getCronConfig = idThese instances allow us to, for example, write separate tests (or utilities like a service tool for our DB)
that don’t require the whole of AppConfig.
This is already getting a bit boring, but hold on. Some integration tests might also involve a pair of modules, and we still don’t want to pull the whole application configuration into all of the modules, so we end up writing a few instances for tuples:
instance HasDbConfig (DbConfig, b) where
getDbConfig = fst
instance HasDbConfig (a, DbConfig) where
getDbConfig = snd
instance HasWebServerConfig (WebServerConfig, b) where
getWebServerConfig = fst
instance HasWebServerConfig (a, WebServerConfig) where
getWebServerConfig = snd
instance HasCronConfig (CronConfig, b) where
getCronConfig = fst
instance HasCronConfig (a, CronConfig) where
getCronConfig = sndUgh. Let’s just hope we will never need to test three modules at once so we won’t need to write nine dull instances for 3-tuples.
Anyway, if you’re anything like me, this amount of boilerplate will make you seriously uncomfortable and eager to spend a few hours looking for ways to delegate this to the compiler instead a couple of minutes of writing the necessary instances.
Generalizing the Has class
Firstly, note that we have different classes for different environments. This might hinder making an universal solution, so let’s make the environment just another parameter:
class Has part record where
extract :: record -> partWe say that Has part record is satisfied if part can be extracted out of record.
This way, our good old HasDbConfig becomes Has DbConfig,
and similarly for other classes we had previously.
This is almost a pure syntactic change, so, for instance,
one of our functions from the earlier post that had the type
doSmthWithDbAndCron :: (MonadReader r m, HasDbConfig r, HasCronConfig r) => ...now has the following signature:
doSmthWithDbAndCron :: (MonadReader r m, Has DbConfig r, Has CronConfig r) => ...The only change is a couple of spaces inserted in the right places.
We haven’t lost much of type inference either:
the type checker is still perfectly happy to figure out the return value
of extract most of the times based on the context we will be using it.
Now that we don’t care about the specific environment type,
let’s consider what records satisfy the Has part record constraint
for some fixed part. Looks like this problem admits a nice inductive structure:
- Each record
Hasitself:Has part partis trivially satisfied viaextract = id. - If
recordis a product of subrecordsrec1andrec2, thenHas part recordis satisfied iff eitherHas part rec1orHas part rec2is satisfied. - If
recordis a sum ofrec1andrec2, thenHas part recordis satisfied if bothHas part rec1andHas part rec2is satisfied (although the utility of this case is not obvious, but let’s spell it out for completeness).
Looks like we have an algorithm for authomatically deriving if Has part record is implemented
for a given pair of part and record!
Luckily, this sort of problems is precisely what GHC’s Generics are designed to solve. Long story short, each type is isomoprhic either to a sum of some (typically smaller) types, or a product of some types, or a “base type”.
First stab
Let’s take the typical route of writing a Generic implementation of our Has class
via a helper GHas class that deals with all the types arising in Generics:
class GHas part grecord where
gextract :: grecord p -> partHere, grecord is the Generic representation of our record type.
The instances follow the inductive structure outlined above:
instance GHas record (K1 i record) where
gextract (K1 x) = x
instance GHas part record => GHas part (M1 i t record) where
gextract (M1 x) = gextract x
instance GHas part l => GHas part (l :*: r) where
gextract (l :*: _) = gextract l
instance GHas part r => GHas part (l :*: r) where
gextract (_ :*: r) = gextract rK1corresponds to the base case.M1is someGenerics-specific stuff containing extra metadata that we don’t care about and just go through it.- The first
l :*: rclause corresponds to the case where the “left” component of the product type haspart((a, b)haspartwhenahaspart). - Similarly, the second
l :*: rclause corresponds to the case where the “right” component of the product type haspart((a, b)haspartwhenbhaspart).
Note that we only support product types. My impression is that sum types are used way less often in this context, so let’s avoid them to simplify the solution.
It’s also worth noting that every n-ary product type (a1, ..., an)
can be thought as a composition of n − 1 pairs (a1, (a2, (a3, (..., an)))),
so I’ll be (somewhat liberally) associating product types with pairs.
Having GHas class in place,
we enable a couple of extensions like DefaultSignatures
and write the default implementation for Has:
class Has part record where
extract :: record -> part
default extract :: Generic record => record -> part
extract = gextract . fromThat’s it.
Or not?
The catch
If we actually try to compile this code, it won’t typecheck even without any attempts to use this default implementation. The problem is that there are overlapping instances. Worse than that, those instances are actually the same (in some sense) due to the way Haskell instances resolution works. Namely, let’s say we have this instance:
instance context => Foo barPattern bazPattern where
...(by the way, that thing after the => is called the instance head.)
It feels natural to read this as follows:
Assume we need to resolve
Foo bar baz. If we have thecontextsatisfied, then this instance could be chosen ifbarandbazmatchbarPatternandbazPatternrespectively.
But this is wrong! It actually goes the other way around:
Assume we need to resolve
Foo bar baz. If we can match a givenbarandbazagainst thebarPatternandbazPatternin the instance head, then we use this instance and require thecontextto be satisfied.
Now it’s obvious what the problem is. Let’s look again at some of the instances instances we wrote:
instance GHas part l => GHas part (l :*: r) where
gextract (l :*: _) = gextract l
instance GHas part r => GHas part (l :*: r) where
gextract (_ :*: r) = gextract rThese two instances have the same instance heads, so of course they overlap! Moreover, none is more specific than the other under any circumstances.
There’s no way we can refine the instance heads to reflect what we need.
That is, unless we add some extra parameters to GHas.
Expressive types to the rescue!
The solution is to precompute the “path” to the data member in question and use it to guide instance matching.
Since we don’t support sum types, a path is literally a sequence of left or right turns in product types (choosing the first or second component respectively) followed by an X once we’ve found the data member of the required type. Let’s write this down:
data Path = L Path | R Path | Here deriving (Show)As an example
Consider the types
data DbConfig = DbConfig
{ dbAddress :: DbAddress
, dbUsername :: Username
, dbPassword :: Password
}
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
}What would be some example paths from AppConfig?
- To
DbConfig⟶L Here. - To
WebServerConfig⟶R (L Here). - To
CronConfig⟶R (R Here). - To
DbAddress⟶L (L Here).
What could be the result of searching for a data member with the given type? Two options are obvious: it can be either successful or not. But there’s actually a third possibility lurking somewhere: there might be more than one instance of a value with the same type. Presumably the best course of action in such ambiguous case would be to fail and refuse to return anything at all, since any such choice will be inherently arbitrary.
As a motivation, consider our running example of a web service configuration.
If somebody requests for (Host, Port) pair, shall it be the database server one
or the web server one? Better safe than sorry.
Anyway, let’s express this in code:
data MaybePath = NotFound | Conflict | Found Path deriving (Show)The difference in behaviour that requires to discriminate between the two failure options
is that if we hit NotFound in one of the branches of a product type,
then it doesn’t stop us from succeeding in some other branch.
Contrary to that, once we hit Conflict, we know we’ve failed globally.
Given this representation, how do we treat product types (which, as we agreed, are basically pairs)? We perform the search recursively on both components and then combine the search results. Let’s write the function for combining them.
Since we’re talking about guiding instance resolution,
we’re ultimately in the compile-time type-level computations land
(even though the types in question are actually terms lifted via DataKinds).
Thus we need a type-level function, which is expressed as a type family:
type family Combine p1 p2 where
Combine ('Found path) 'NotFound = 'Found ('L path)
Combine 'NotFound ('Found path) = 'Found ('R path)
Combine 'NotFound 'NotFound = 'NotFound
Combine _ _ = 'ConflictThis function is a straightforward representation of a few cases:
- If one subsearch is successful and the other is
NotFound, then we just need to take the corresponding left or right turn. - If both subsearches result in
NotFound, then the whole result is, of course, alsoNotFound. - In any other case we know we have a
Conflict.
Now let’s write a type-level function that
takes a part and a Generics representation of some type and performs the search:
type family Search part (grecord :: k -> *) :: MaybePath where
Search part (K1 _ part) = 'Found 'Here
Search part (K1 _ other) = 'NotFound
Search part (M1 _ _ x) = Search part x
Search part (l :*: r) = Combine (Search part l) (Search part r)
Search _ _ = 'NotFoundNote that it’s very similar to what we tried to express earlier with the GHas class!
This is to be expected, since we’re effectively reproducing the algorithm
that we tried to spell out via type classes.
Speaking of GHas, all we have to do now is to add an extra parameter denoting the path
that we’ve computed via the Search function
and that will be used to select the right instance:
class GHas (path :: Path) part grecord where
gextract :: Proxy path -> grecord p -> partWe use an extra parameter for gextract to help the type checker
deduce what path should be in each gextract invocation.
Writing the instances is straightforward:
instance GHas 'Here rec (K1 i rec) where
gextract _ (K1 x) = x
instance GHas path part record => GHas path part (M1 i t record) where
gextract proxy (M1 x) = gextract proxy x
instance GHas path part l => GHas ('L path) part (l :*: r) where
gextract _ (l :*: _) = gextract (Proxy :: Proxy path) l
instance GHas path part r => GHas ('R path) part (l :*: r) where
gextract _ (_ :*: r) = gextract (Proxy :: Proxy path) rIndeed, we just select the right instance based on the path we computed earlier.
How do we write our default implementation for the extract :: record -> part function?
We have a few constraints:
recordshould beGenericfor all this stuff to work at all, so we getGeneric record.partshould be found inrecord. More concretely,Searchshould be able to findpartin theGenericrepresentation ofrecord(and the latter is denoted asRep record). This is a bit more funny to write down, resulting inSearch part (Rep record) ~ 'Found path. Indeed, ifSearchreturnsConflictorNotFound, then there is no suchpath.- We should be able to use
GHaswithpart,recordandpathfrom the previous step, givingGHas path part (Rep record).
We’ll encounter the latter two constraints a couple more times, so it’s worth refactoring them out:
type SuccessfulSearch part record path = (Search part (Rep record) ~ 'Found path, GHas path part (Rep record))Given this alias, we arrive at
class Has part record where
extract :: record -> part
default extract :: forall path. (Generic record, SuccessfulSearch part record path) => record -> part
extract = gextract (Proxy :: Proxy path) . fromAnd now we’re done!
Using the generic Has
To see how it all works, firstly, let’s write down a few common instances for tuples:
instance SuccessfulSearch a (a0, a1) path => Has a (a0, a1)
instance SuccessfulSearch a (a0, a1, a2) path => Has a (a0, a1, a2)
instance SuccessfulSearch a (a0, a1, a2, a3) path => Has a (a0, a1, a2, a3)Here the SuccessfulSearch a (a0, ..., an) path constraint is responsible for
ensuring that a occurs exactly once among a0, ..., an.
Then let’s say we have our good old
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
}and we want to derive Has DbConfig, Has WebServerConfig
and Has CronConfig.
It’s sufficient to enable DeriveGeneric and DeriveAnyClass extensions
and append a deriving-clause:
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
} deriving (Generic, Has DbConfig, Has WebServerConfig, Has CronConfig)We were lucky (or wise) to choose the correct argument order for Has
so that the part comes first,
as this allows us to rely on DeriveAnyClass to minimize our syntactic burden.
Safety first
What happens if we don’t have a certain type?
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
} deriving (Generic, Has CronConfig)Nope, we catch the error as early as at the type definition site, as expected:
Spec.hs:35:24: error:
• Couldn't match type ‘'NotFound’
with ‘'Found path0’
arising from the 'deriving' clause of a data type declaration
• When deriving the instance for (Has CronConfig AppConfig)
|
35 | } deriving (Generic, Has CronConfig)
| ^^^^^^^^^^^^^^
Not the most friendly error message,
but it’s still quite possible to understand what the problem is:
something something NotFound something something CronConfig.
What happens if we have several different fields with the same type?
data AppConfig = AppConfig
{ prodDbConfig :: DbConfig
, qaDbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
} deriving (Generic, Has DbConfig)Nope, as expected:
Spec.hs:37:24: error:
• Couldn't match type ‘'Conflict’
with ‘'Found path0’
arising from the 'deriving' clause of a data type declaration
• When deriving the instance for (Has DbConfig AppConfig)
|
37 | } deriving (Generic, Has DbConfig)
| ^^^^^^^^^^^^
Again, not the most obvious message, but gets the job done.
Looks like we’re really good to go.
To sum it up
So here’s the trick in question in a more distilled form.
Let’s say we have a certain type class and we want to be able to automatically derive instances of that class according to some recursive rules. Then we can avoid ambiguities (such as overlapping instances) during instance resolution, if expressing those rules at all, as follows:
- Encode the recursive rules in an inductive data type
T. - Use a type-level function (in form of a type family) to precompute
the value
vof that typeT(or, in Haskell-speak, a typevof the kindT— so much for dependent types) describing the exact sequence of the recursive steps to be taken. - Use that
vas an extra argument to theGeneric-based type class to guide the specific instances, which will now match onv.
And that’s basically it for the pattern!
Stay tuned for another post (or two) about some nice and beautiful extensions of this approach.