Simply typed LF

In the previous section, we discussed representing object-language syntax in LF. As we saw, to prove that an LF representation is correct, you need to be able to reason by induction on the canonical forms of LF. To do so, you will need a basic understanding of the LF type theory. This understanding will pay off later on, as we will also use induction on canonical forms to prove metatheorems about encoded languages.

In this section, we present simply-typed LF in more detail.

Syntax

The syntax of simply-typed LF is as follows:

\begin{align*} A & ::= a \mid A_1 \rightarrow A_2 \\ R & ::= c \mid x \mid R \, M \\ M & ::= R \, | \, \lambda x.M \end{align*}

The types $A$ consist of base types $a$ , which are declared in the signature (e.g., nat), and function types $A_1 \rightarrow A_2$ . The terms of LF consist of variables $x$ , constants $c$ declared in the signature (e.g., z and s), application $R\,M$ , and lambda-abstractions $\lambda x.M$ , which in STELF’s concrete syntax are written [x] M. This much should be familiar to you if you’ve seen the simply-typed lambda-calculus before. Likely more unfamiliar is that the grammar is stratified into $R$ and $M$ . We discuss the reason for this stratification now.

Canonical forms

The above syntax describes what we call TODO (as long as they are well-typed, more on this below). Note what is not a canonical form: there is no syntactic way to apply a lambda-abstraction $\lambda x . M$ to an argument. Based on your past experience with programming languages, it may seem strange to define LF so that only canonical forms exist: we are not allowed to write down any programs that have any computation left to do. However, this restriction makes sense if you think about our methodology for representing object languages in LF. In the previous section, we represented natural numbers with the following LF signature:

%sort nat
%term z nat
%term s %-> nat %-> nat

For this representation to be adequate, the only LF terms of type nat must be z, s z, s (s z), and so on. Non-canonical LF terms interfere with this encoding. For example, the LF term $(\lambda x. \mathtt{s}\,x)\,\mathtt{z}$ would have type nat were it allowed, and that is not the representation of any informal natural number. This would contradict the third part of the adequacy argument in the previous section. Restricting LF to canonical forms avoids these counterexamples.

We won’t present all the typing rules for canonical forms here, but you can see Harper and Licata’s paper Mechanizing Metatheory in a Logical Framework for all the details. The only detail we need to note here is that terms $R$ are only canonical at base type $a$ , not at function type. For example, the constant s is not a canonical form of type %-> nat %-> nat. The term $\lambda x. \mathtt{s}\,x$ , however, is a canonical form.

If you have encountered beta-reduction and eta-expansion before in your study of programming languages, it may help your intuition to know that the canonical forms of LF coincide with the beta-normal, eta-long terms of the lambda calculus. What we were saying above is that the syntax of canonical forms forces them to be beta-normal, and that the typing rules for canonical forms ensure that they are eta-long. In logic, canonical forms correspond to normal and neutral natural deduction proofs and cut-free sequent calculus proofs.

Hereditary Substitution

Substitution $[M'/x]M$ of one canonical form into another does not necessarily produce a canonical result. For example:

\langle\lambda y.y/x\rangle(x \; z) = (\lambda y.y) \; z

Even though both $y.y$ and $x \; z$ are canonical forms, the result of the substitution is beta-reducible.

However, it is possible to define a notion of hereditary substitution, which directly computes the canonical result of an ordinary substitution. When ordinary substitution would return a non-canonical form, hereditary substitution continues to reduce by substituting the argument into the body of the function. In the above example, the hereditary substitution

\langle\lambda y.y/x\rangle(x \; z) = \langle z/y \rangle y = z

Whenever we use the notation $\langle M'/x \rangle M$ for LF in this guide, we mean hereditary substitution.

Induction on canonical forms

The above syntax constitutes an inductive definition of the canonical forms of LF. Consequently, we can reason by induction on canonical forms using structural induction over this syntax. When a type adequately represents some informal object-language syntax, induction on the canonical forms of the type corresponds to the structural induction principle for the informal syntax. For example, the induction principle for the type nat defined in the previous section corresponds to the standard induction principle for the natural numbers. As we discussed in the previous section, induction on canonical forms is used to prove adequacy. Additionally, as we discuss below, induction on canonical forms is used to prove metatheorems about deductive systems.

Why a lambda calculus?

At this point, you may be wondering what we gain by using a lambda-calculus to represent other deductive systems—to represent the natural numbers, all we needed was constants and application; we never even used a term of the form $\lambda x.M$ . At a high level, the answer is that using a meta-language with variable binding permits clean representations of object languages with variable binding. However, we defer a real answer to this question until we talk about representing the syntax and judgements of a language with binding.