Cactus

Typed embedding of STLC into Haskell

5 February 2015 (programming haskell language correctness)

Someone posted to the Haskell subreddit this blogpost of Lennart where he goes step-by-step through implementing an evaluator and type checker for CoC. I don't know why this post from 2007 showed up on Reddit this week, but it's a very good post, pedagogically speaking. Go and read it.

In this post, I'd like to elaborate on the simply-typed lambda calculus part of his blogpost. His typechecker defines the following types for representing STLC types, terms, and environments:

data Type = Base
          | Arrow Type Type
          deriving (Eq, Show)

type Sym = String

data Expr = Var Sym
          | App Expr Expr
          | Lam Sym Type Expr
          deriving (Eq, Show)

The signature of the typechecker presented in his post is as follows:

type ErrorMsg = String
type TC a = Either ErrorMsg a
newtype Env = Env [(Sym, Type)] deriving (Show)

tCheck :: Env -> Expr -> TC Type

My approach is to instead create a representation of terms of STLC in such a way that only well-scoped, well-typed terms can be represented. So let's turn on a couple of heavy-weight language extensions from GHC 7.8 (we'll see how each of them is used), and define a typed representation of STLC terms:

{-# LANGUAGE GADTs, StandaloneDeriving #-}
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TemplateHaskell #-} -- sigh...

import Data.Singletons.Prelude
import Data.Singletons.TH
import Data.Type.Equality

-- | A (typed) variable is an index into a context of types
data TVar (ts :: [Type]) (a :: Type) where
    Here :: TVar (t ': ts) t
    There :: TVar ts a -> TVar (t ': ts) a
deriving instance Show (TVar ctx a)

-- | Typed representation of STLC: well-scoped and well-typed by construction
data TTerm (ctx :: [Type]) (a :: Type) where
    TConst :: TTerm ctx Base
    TVar :: TVar ctx a -> TTerm ctx a
    TLam :: TTerm (a ': ctx) b -> TTerm ctx (Arrow a b)
    TApp :: TTerm ctx (Arrow a b) -> TTerm ctx a -> TTerm ctx b
deriving instance Show (TTerm ctx a)

The idea is to represent the context of a term as a list of types of variables in scope, and index into that list, de Bruijn-style, to refer to variables. This indexing operation maintains the necessary connection between the pointer and the type that it points to. Note the type of the TLam constructor, where we extend the context at the front for the inductive step.

To give a taste of how convenient it is to work with this representation programmatically, here's a total evaluator:

-- | Interpretation (semantics) of our types
type family Interp (t :: Type) where
    Interp Base = ()
    Interp (Arrow t1 t2) = Interp t1 -> Interp t2

-- | An environment gives the value of all variables in scope in a given context
data Env (ts :: [Type]) where
    Nil :: Env '[]
    Cons :: Interp t -> Env ts -> Env (t ': ts)

lookupVar :: TVar ts a -> Env ts -> Interp a
lookupVar Here      (Cons x _)  = x
lookupVar (There v) (Cons _ xs) = lookupVar v xs

-- | Evaluate a term of STLC. This function is total!
eval :: Env ctx -> TTerm ctx a -> Interp a
eval env TConst = ()
eval env (TVar v) = lookupVar v env
eval env (TLam lam) = \x -> eval (Cons x env) lam
eval env (TApp f e) = eval env f $ eval env e

Of course, the problem is that this representation is not at all convenient for other purposes. For starters, it is certainly not how we would expect human beings to type in their programs.

My version of the typechecker is such that instead of giving the type of a term (when it is well-typed), it instead transforms the loose representation (Term) into the tight one (TTerm). A Term is well-scoped and well-typed (under some binders) iff there is a TTerm corresponding to it. Let's use singletons to store type information in existential positions:

$(genSingletons [''Type])
$(singDecideInstance ''Type)

-- | Existential version of 'TTerm'
data SomeTerm (ctx :: [Type]) where
    TheTerm :: Sing a -> TTerm ctx a -> SomeTerm ctx

-- | Existential version of 'TVar'
data SomeVar (ctx :: [Type]) where
    TheVar :: Sing a -> TVar ctx a -> SomeVar ctx

-- | A typed binder of variable names
data Binder (ctx :: [Type]) where
    BNil :: Binder '[]
    BCons :: Sym -> Sing t -> Binder ts -> Binder (t ': ts)

Armed with these definitions, we can finally define the type inferer. I would argue that it is no more complicated than Lennart's version. In fact, it has the exact same shape, with value-level equality tests replaced with Data.Type.Equality-based checks.

-- | Type inference for STLC
infer :: Binder ctx -> Term -> Maybe (SomeTerm ctx)
infer bs (Var v) = do
    TheVar t v' <- inferVar bs v
    return $ TheTerm t $ TVar v'
infer bs (App f e) = do
    TheTerm (SArrow t0 t) f' <- infer bs f
    TheTerm t0' e' <- infer bs e
    Refl <- testEquality t0 t0'
    return $ TheTerm t $ TApp f' e'
infer bs (Lam v ty e) = case toSing ty of
    SomeSing t0 -> do
        TheTerm t e' <- infer (BCons v t0 bs) e
        return $ TheTerm (SArrow t0 t) $ TLam e'

inferVar :: Binder ctx -> Sym -> Maybe (SomeVar ctx)
inferVar (BCons u t bs) v
  | v == u = return $ TheVar t Here
  | otherwise = do
      TheVar t' v' <- inferVar bs u
      return $ TheVar t' $ There v'
inferVar _ _ = Nothing

Note that pattern matching on Refl in the App case brings in scope type equalities that are crucial to making infer well-typed.

Of course, because of the existential nature of SomeVar, we should provide a typechecker as well which is a much more convenient interface to work with:

-- | Typechecker for STLC
check :: forall ctx a. (SingI a) => Binder ctx -> Term -> Maybe (TTerm ctx a)
check bs e = do
    TheTerm t' e' <- infer bs e
    Refl <- testEquality t t'
    return e'
  where
    t = singByProxy (Proxy :: Proxy a)

-- | Typechecker for closed terms of STLC
check_ :: (SingI a) => Term -> Maybe (TTerm '[] a)
check_ = check BNil

(The SingI a constraint is an unfortunate implementation detail; the kind of a is Type, which is closed, so GHC should be able to know there is always going to be a SingI a instance).

To review, we've written a typed embedding of STLC into Haskell, with a total evaluator and a typechecker, in about 110 lines of code.

If we were doing this in something more like Agda, one possible improvement would be to define a function untype :: TTerm ctx a -> Term and use that to give check basically a type of Binder ctx -> (e :: Term) -> Either ((e' :: TTerm ctx a) -> untype e' == e -> Void) (TTerm ctx a), i.e. to give a proof in the non-well-typed case as well.

Full code as a gist on Github


« Initial version of my Commodore PET 
All posts
 Arrow's place in the Applicative/Monad hierarchy »