Can i haz? Part 2: scrapping the boilerplate, and fun with types
Last time we briefly covered the Has
pattern
which allows expressing more precisely in types what does a function need from the application environment.
We also wrote a few instances for our Has
-like classes for a toy environment type.
The problem is that we had to write those classes. As the number of fields in our environment type grows, the number of classes we have to write grows linearly even in the best case. This much manual labour is surely not the Haskell way!
The problem with Has
Last post ended with this type and some instances:
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
}
instance HasDbConfig AppConfig where
getDbConfig = dbConfig
instance HasWebServerConfig AppConfig where
getWebServerConfig = webServerCOnfig
instance HasCronConfig AppConfig where
getCronConfig = cronConfig
Let’s think what other instances we might want to write.
The configs themselves are obviously good candidates for (trivially) satisfying the corresponding classes:
instance HasDbConfig DbConfig where
getDbConfig = id
instance HasWebServerConfig WebServerConfig where
getWebServerConfig = id
instance HasCronConfig CronConfig where
getCronConfig = id
These instances allow us to, for example, write separate tests (or utilities like a service tool for our DB)
that don’t require the whole of AppConfig
.
This is already getting a bit boring, but hold on. Some integration tests might also involve a pair of modules, and we still don’t want to pull the whole application configuration into all of the modules, so we end up writing a few instances for tuples:
instance HasDbConfig (DbConfig, b) where
getDbConfig = fst
instance HasDbConfig (a, DbConfig) where
getDbConfig = snd
instance HasWebServerConfig (WebServerConfig, b) where
getWebServerConfig = fst
instance HasWebServerConfig (a, WebServerConfig) where
getWebServerConfig = snd
instance HasCronConfig (CronConfig, b) where
getCronConfig = fst
instance HasCronConfig (a, CronConfig) where
getCronConfig = snd
Ugh. Let’s just hope we will never need to test three modules at once so we won’t need to write nine dull instances for 3-tuples.
Anyway, if you’re anything like me, this amount of boilerplate will make you seriously uncomfortable and eager to spend a few hours looking for ways to delegate this to the compiler instead a couple of minutes of writing the necessary instances.
Generalizing the Has
class
Firstly, note that we have different classes for different environments. This might hinder making an universal solution, so let’s make the environment just another parameter:
class Has part record where
extract :: record -> part
We say that Has part record
is satisfied if part
can be extracted out of record
.
This way, our good old HasDbConfig
becomes Has DbConfig
,
and similarly for other classes we had previously.
This is almost a pure syntactic change, so, for instance,
one of our functions from the earlier post that had the type
doSmthWithDbAndCron :: (MonadReader r m, HasDbConfig r, HasCronConfig r) => ...
now has the following signature:
doSmthWithDbAndCron :: (MonadReader r m, Has DbConfig r, Has CronConfig r) => ...
The only change is a couple of spaces inserted in the right places.
We haven’t lost much of type inference either:
the type checker is still perfectly happy to figure out the return value
of extract
most of the times based on the context we will be using it.
Now that we don’t care about the specific environment type,
let’s consider what record
s satisfy the Has part record
constraint
for some fixed part
. Looks like this problem admits a nice inductive structure:
- Each record
Has
itself:Has part part
is trivially satisfied viaextract = id
. - If
record
is a product of subrecordsrec1
andrec2
, thenHas part record
is satisfied iff eitherHas part rec1
orHas part rec2
is satisfied. - If
record
is a sum ofrec1
andrec2
, thenHas part record
is satisfied if bothHas part rec1
andHas part rec2
is satisfied (although the utility of this case is not obvious, but let’s spell it out for completeness).
Looks like we have an algorithm for authomatically deriving if Has part record
is implemented
for a given pair of part
and record
!
Luckily, this sort of problems is precisely what GHC’s Generics are designed to solve. Long story short, each type is isomoprhic either to a sum of some (typically smaller) types, or a product of some types, or a “base type”.
First stab
Let’s take the typical route of writing a Generic
implementation of our Has
class
via a helper GHas
class that deals with all the types arising in Generics
:
class GHas part grecord where
gextract :: grecord p -> part
Here, grecord
is the Generic
representation of our record type.
The instances follow the inductive structure outlined above:
instance GHas record (K1 i record) where
gextract (K1 x) = x
instance GHas part record => GHas part (M1 i t record) where
gextract (M1 x) = gextract x
instance GHas part l => GHas part (l :*: r) where
gextract (l :*: _) = gextract l
instance GHas part r => GHas part (l :*: r) where
gextract (_ :*: r) = gextract r
K1
corresponds to the base case.M1
is someGenerics
-specific stuff containing extra metadata that we don’t care about and just go through it.- The first
l :*: r
clause corresponds to the case where the “left” component of the product type haspart
((a, b)
haspart
whena
haspart
). - Similarly, the second
l :*: r
clause corresponds to the case where the “right” component of the product type haspart
((a, b)
haspart
whenb
haspart
).
Note that we only support product types. My impression is that sum types are used way less often in this context, so let’s avoid them to simplify the solution.
It’s also worth noting that every n-ary product type (a1, ..., an)
can be thought as a composition of n − 1 pairs (a1, (a2, (a3, (..., an))))
,
so I’ll be (somewhat liberally) associating product types with pairs.
Having GHas
class in place,
we enable a couple of extensions like DefaultSignatures
and write the default implementation for Has
:
class Has part record where
extract :: record -> part
default extract :: Generic record => record -> part
extract = gextract . from
That’s it.
Or not?
The catch
If we actually try to compile this code, it won’t typecheck even without any attempts to use this default implementation. The problem is that there are overlapping instances. Worse than that, those instances are actually the same (in some sense) due to the way Haskell instances resolution works. Namely, let’s say we have this instance:
instance context => Foo barPattern bazPattern where
...
(by the way, that thing after the =>
is called the instance head.)
It feels natural to read this as follows:
Assume we need to resolve
Foo bar baz
. If we have thecontext
satisfied, then this instance could be chosen ifbar
andbaz
matchbarPattern
andbazPattern
respectively.
But this is wrong! It actually goes the other way around:
Assume we need to resolve
Foo bar baz
. If we can match a givenbar
andbaz
against thebarPattern
andbazPattern
in the instance head, then we use this instance and require thecontext
to be satisfied.
Now it’s obvious what the problem is. Let’s look again at some of the instances instances we wrote:
instance GHas part l => GHas part (l :*: r) where
gextract (l :*: _) = gextract l
instance GHas part r => GHas part (l :*: r) where
gextract (_ :*: r) = gextract r
These two instances have the same instance heads, so of course they overlap! Moreover, none is more specific than the other under any circumstances.
There’s no way we can refine the instance heads to reflect what we need.
That is, unless we add some extra parameters to GHas
.
Expressive types to the rescue!
The solution is to precompute the “path” to the data member in question and use it to guide instance matching.
Since we don’t support sum types, a path is literally a sequence of left or right turns in product types (choosing the first or second component respectively) followed by an X once we’ve found the data member of the required type. Let’s write this down:
data Path = L Path | R Path | Here deriving (Show)
As an example
Consider the types
data DbConfig = DbConfig
{ dbAddress :: DbAddress
, dbUsername :: Username
, dbPassword :: Password
}
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
}
What would be some example paths from AppConfig
?
- To
DbConfig
⟶L Here
. - To
WebServerConfig
⟶R (L Here)
. - To
CronConfig
⟶R (R Here)
. - To
DbAddress
⟶L (L Here)
.
What could be the result of searching for a data member with the given type? Two options are obvious: it can be either successful or not. But there’s actually a third possibility lurking somewhere: there might be more than one instance of a value with the same type. Presumably the best course of action in such ambiguous case would be to fail and refuse to return anything at all, since any such choice will be inherently arbitrary.
As a motivation, consider our running example of a web service configuration.
If somebody requests for (Host, Port)
pair, shall it be the database server one
or the web server one? Better safe than sorry.
Anyway, let’s express this in code:
data MaybePath = NotFound | Conflict | Found Path deriving (Show)
The difference in behaviour that requires to discriminate between the two failure options
is that if we hit NotFound
in one of the branches of a product type,
then it doesn’t stop us from succeeding in some other branch.
Contrary to that, once we hit Conflict
, we know we’ve failed globally.
Given this representation, how do we treat product types (which, as we agreed, are basically pairs)? We perform the search recursively on both components and then combine the search results. Let’s write the function for combining them.
Since we’re talking about guiding instance resolution,
we’re ultimately in the compile-time type-level computations land
(even though the types in question are actually terms lifted via DataKinds
).
Thus we need a type-level function, which is expressed as a type family:
type family Combine p1 p2 where
Combine ('Found path) 'NotFound = 'Found ('L path)
Combine 'NotFound ('Found path) = 'Found ('R path)
Combine 'NotFound 'NotFound = 'NotFound
Combine _ _ = 'Conflict
This function is a straightforward representation of a few cases:
- If one subsearch is successful and the other is
NotFound
, then we just need to take the corresponding left or right turn. - If both subsearches result in
NotFound
, then the whole result is, of course, alsoNotFound
. - In any other case we know we have a
Conflict
.
Now let’s write a type-level function that
takes a part
and a Generics
representation of some type and performs the search:
type family Search part (grecord :: k -> *) :: MaybePath where
Search part (K1 _ part) = 'Found 'Here
Search part (K1 _ other) = 'NotFound
Search part (M1 _ _ x) = Search part x
Search part (l :*: r) = Combine (Search part l) (Search part r)
Search _ _ = 'NotFound
Note that it’s very similar to what we tried to express earlier with the GHas
class!
This is to be expected, since we’re effectively reproducing the algorithm
that we tried to spell out via type classes.
Speaking of GHas
, all we have to do now is to add an extra parameter denoting the path
that we’ve computed via the Search
function
and that will be used to select the right instance:
class GHas (path :: Path) part grecord where
gextract :: Proxy path -> grecord p -> part
We use an extra parameter for gextract
to help the type checker
deduce what path
should be in each gextract
invocation.
Writing the instances is straightforward:
instance GHas 'Here rec (K1 i rec) where
gextract _ (K1 x) = x
instance GHas path part record => GHas path part (M1 i t record) where
gextract proxy (M1 x) = gextract proxy x
instance GHas path part l => GHas ('L path) part (l :*: r) where
gextract _ (l :*: _) = gextract (Proxy :: Proxy path) l
instance GHas path part r => GHas ('R path) part (l :*: r) where
gextract _ (_ :*: r) = gextract (Proxy :: Proxy path) r
Indeed, we just select the right instance based on the path
we computed earlier.
How do we write our default
implementation for the extract :: record -> part
function?
We have a few constraints:
record
should beGeneric
for all this stuff to work at all, so we getGeneric record
.part
should be found inrecord
. More concretely,Search
should be able to findpart
in theGeneric
representation ofrecord
(and the latter is denoted asRep record
). This is a bit more funny to write down, resulting inSearch part (Rep record) ~ 'Found path
. Indeed, ifSearch
returnsConflict
orNotFound
, then there is no suchpath
.- We should be able to use
GHas
withpart
,record
andpath
from the previous step, givingGHas path part (Rep record)
.
We’ll encounter the latter two constraints a couple more times, so it’s worth refactoring them out:
type SuccessfulSearch part record path = (Search part (Rep record) ~ 'Found path, GHas path part (Rep record))
Given this alias, we arrive at
class Has part record where
extract :: record -> part
default extract :: forall path. (Generic record, SuccessfulSearch part record path) => record -> part
extract = gextract (Proxy :: Proxy path) . from
And now we’re done!
Using the generic Has
To see how it all works, firstly, let’s write down a few common instances for tuples:
instance SuccessfulSearch a (a0, a1) path => Has a (a0, a1)
instance SuccessfulSearch a (a0, a1, a2) path => Has a (a0, a1, a2)
instance SuccessfulSearch a (a0, a1, a2, a3) path => Has a (a0, a1, a2, a3)
Here the SuccessfulSearch a (a0, ..., an) path
constraint is responsible for
ensuring that a
occurs exactly once among a0, ..., an
.
Then let’s say we have our good old
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
}
and we want to derive Has DbConfig
, Has WebServerConfig
and Has CronConfig
.
It’s sufficient to enable DeriveGeneric
and DeriveAnyClass
extensions
and append a deriving
-clause:
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
} deriving (Generic, Has DbConfig, Has WebServerConfig, Has CronConfig)
We were lucky (or wise) to choose the correct argument order for Has
so that the part
comes first,
as this allows us to rely on DeriveAnyClass
to minimize our syntactic burden.
Safety first
What happens if we don’t have a certain type?
data AppConfig = AppConfig
{ dbConfig :: DbConfig
, webServerConfig :: WebServerConfig
} deriving (Generic, Has CronConfig)
Nope, we catch the error as early as at the type definition site, as expected:
Spec.hs:35:24: error:
• Couldn't match type ‘'NotFound’
with ‘'Found path0’
arising from the 'deriving' clause of a data type declaration
• When deriving the instance for (Has CronConfig AppConfig)
|
35 | } deriving (Generic, Has CronConfig)
| ^^^^^^^^^^^^^^
Not the most friendly error message,
but it’s still quite possible to understand what the problem is:
something something NotFound
something something CronConfig
.
What happens if we have several different fields with the same type?
data AppConfig = AppConfig
{ prodDbConfig :: DbConfig
, qaDbConfig :: DbConfig
, webServerConfig :: WebServerConfig
, cronConfig :: CronConfig
} deriving (Generic, Has DbConfig)
Nope, as expected:
Spec.hs:37:24: error:
• Couldn't match type ‘'Conflict’
with ‘'Found path0’
arising from the 'deriving' clause of a data type declaration
• When deriving the instance for (Has DbConfig AppConfig)
|
37 | } deriving (Generic, Has DbConfig)
| ^^^^^^^^^^^^
Again, not the most obvious message, but gets the job done.
Looks like we’re really good to go.
To sum it up
So here’s the trick in question in a more distilled form.
Let’s say we have a certain type class and we want to be able to automatically derive instances of that class according to some recursive rules. Then we can avoid ambiguities (such as overlapping instances) during instance resolution, if expressing those rules at all, as follows:
- Encode the recursive rules in an inductive data type
T
. - Use a type-level function (in form of a type family) to precompute
the value
v
of that typeT
(or, in Haskell-speak, a typev
of the kindT
— so much for dependent types) describing the exact sequence of the recursive steps to be taken. - Use that
v
as an extra argument to theGeneric
-based type class to guide the specific instances, which will now match onv
.
And that’s basically it for the pattern!
Stay tuned for another post (or two) about some nice and beautiful extensions of this approach.