Posts Tagged ‘Categorical logic’

Generalized elements, functions, and categories

(Part 6 of a series.)

Set theory centers around a two-place judgment \(x\in X\)—that \(x\) is an element of the set \(X\). Type theories, in contrast, are built on a three-place judgment \(\Gamma\vdash t:B\)—that under the assumptions \(\Gamma\), \(t\) is a term in the type \(B\). As we will see, the latter notion generalizes the former, and suggests a new way to use sets.

Generalized elements are functions

In a set-theoretic model, \([\![\cdot\vdash b:B ]\!]\) is an element of the set \([\![B ]\!]\). \([\![x:A\vdash t:B ]\!]\) is a function \([\![A ]\!]\to[\![B ]\!]\), which sends any element of \([\![A ]\!]\) to an element of \([\![B ]\!]\). Indeed, the substitution

says that an element \(s\) of \([\![A ]\!]\) coupled with such a function \([\![A ]\!]\to[\![B ]\!]\) produces an element \([s/x]t\) of \([\![B ]\!]\).

Notice that, in set theory, \(b\) is an element while \(t\) is a function; in type theory, \(b\) is an element under no assumptions, while \(t\) is an element under the assumption \(x:A\). So we can think of a function \([\![A ]\!]\to[\![B ]\!]\) as an element of \([\![B ]\!]\) pending an element of \([\![A ]\!]\), and an element of \([\![B ]\!]\) as an element of \([\![B ]\!]\) pending nothing (technically, pending an element of \(1\)).

Mathematically, these functions are called generalized elements: for any sets \(X,Y\), an \(X\)-element of \(Y\) is a function \(X\to Y\). Then \(t\) is an \([\![A ]\!]\)-element of \([\![B ]\!]\), and \(b\) is a \(1\)-element of \([\![B ]\!]\). (We also call \(1\)-elements global elements, in the sense that they are elements apropos of nothing.)

The idea of generalized elements may seem frivolous, but let us make a few observations. First, they correspond closely to type-theoretic judgments. In a set-theoretic model, \([\![\Gamma\vdash t:B ]\!]\) is a \([\![\Gamma]\!]\)-element of \([\![B ]\!]\). Conversely, \(f\) is an \(A\)-element of \(B\) is basically a three-place judgment which says that, under the assumptions \(A\), \(t\) is an element of \(B\). Think of it like “\(A\vdash f:B\)” (but mind the scare quotes).

Secondly, generalized elements are defined only in terms of functions between sets, not in terms of sets’ elements themselves. As we will see in the next few posts, this concept is enough to formulate a surprising amount of set-theoretic reasoning.

Two simple examples: given a global element of \(A\), and a function \(A\to B\), we can obtain a global element of \(B\) by composing these functions.

A function \(f:A\to B\) is injective if two global elements \(a,a'\) are equal (as functions) whenever their postcompositions by \(f\) are equal (as functions).

Lastly, if we define set-theoretic notions purely in terms of functions, as we have started to do above, we can port these definitions directly to more complicated structures by replacing set with foo, and function with functions between foos. (As we will soon see, these ported definitions will even be correct!)

Categories

This is exactly the modus operandi of category theory. A category is essentially a choice of foo and functions between foos satisfying some very basic conditions that allow these kinds of definitions to make sense. The foos of a category are called its objects, and the functions are called its morphisms.

In category theory, we say things like, “In a category where such and such is true, the baz of two objects is defined to be an object which has some morphisms that do such and such.” After a few more definitions, the upshot is that, in any category which has bazzes, every morphism out of a baz happens to be qux.

It can definitely be abstruse at times—some call it abstract nonsense—but the point is that category theory studies patterns which recur in different fields of mathematics, and provides a precise language for relating different sorts of mathematical structures (like type theories and sets!).

On to the definition. A category is a collection of objects \(A,B,C\) and morphisms \(f,g\) which go from an object to an object; \(f:A\to B\) means that \(f\) goes from \(A\) to \(B\). We say \(A\) is the domain of \(f\), and \(B\) the codomain. (It is useful to think of \(f:A\to B\) both as a “function” from \(A\) to \(B\) and as an \(A\)-element of \(B\), depending on the situation.)

Furthermore, if we have morphisms \(f:A\to B\) and \(g:B\to C\), we can compose them to obtain a morphism \(gf:A\to C\); composition is associative (\((hg)f=h(gf)\)), and each object \(A\) has an identity morphism \(\textbf{id}_A\) (\(f\textbf{id}_A=f=\textbf{id}_B f\)).

There is a category \(\textbf{Set}\) whose objects are sets, morphisms \(A\to B\) are functions from \(A\) to \(B\), composition is function composition, and identities are identity functions.

\(\textbf{Grp}\) is a category whose objects are groups, and whose morphisms are group homomorphisms; \(\textbf{Top}\) is a category whose objects are topological spaces, and whose morphisms are continuous functions. In both cases, composition and identities are the ordinary notions, and satisfy the necessary equations. There are no non-homomorphism (resp., discontinuous) functions in \(\textbf{Grp}\) (resp., \(\textbf{Top}\)); we’ll see next time that this is the reason our definitions make sense, so to speak, in these categories.

Here’s one final example of a rather different flavor: recall that a partial order, or poset, is a set equipped with a relation \(\preceq\) which is

  1. reflexive: \(x\preceq x\);

  2. antisymmetric: if \(x\preceq y\) and \(y\preceq x\), then \(x=y\); and

  3. transitive: if \(x\preceq y\) and \(y\preceq z\), then \(x\preceq z\).

Examples include the natural or real numbers with \(\leq\). The subsets of any set form a poset with \(\subseteq\). So does divisibility over the natural numbers: say \(a|b\) if \(a\) divides \(b\). Clearly \(a|a\); if \(a|b\) and \(b|a\), then \(a=b\); and if \(a|b\) and \(b|c\), then \(a|c\).

Any poset gives rise to a category whose objects are the elements of the poset, and for any two objects \(A,B\), there is exactly one morphism \(A\to B\) if \(A\preceq B\), and none otherwise.

Why is this a category? By reflexivity, \(A\preceq A\), so there is exactly one morphism \(A\to A\), which must be \(\textbf{id}_A\). Given morphisms \(A\to B\) and \(B\to C\), the transitivity of \(\preceq\) guarantees a unique morphism \(A\to C\), which we define to be the result of composing the morphisms. Associativity and identity follow trivially from the fact that there is at most one morphism between any two objects.

Algebraic type theories

(Part 5 of a series.)

Although we have not yet finished discussing algebraic theories, we will continue our discussion in the more general framework of algebraic type theories, which allow us to simultaneously define multiple theories which may or may not refer to each other. (Algebraic type theories are also known as many-sorted algebraic theories, for reasons which will be apparent momentarily.)

As one of many motivations, consider group actions, which arise from a group \(G\) and a set \(X\), along with a function \(\cdot:G\times X\to X\) which is compatible with the group in the sense that \(\mathord{\sf{\text{e}}}\cdot x = x\) and \((g\circ h)\cdot x = g\cdot (h\cdot x)\). (We say that \(g\cdot x\) is obtained from the action of \(g\) on \(x\).)

For example, a permutation group has as elements permutations (of a set of size \(n\)), where \(\circ\) is composition of permutations. The action of a permutation group on an appropriately–sized ordered set is obtained by applying the permutation.

Although we can separately define groups and sets as algebraic theories, we cannot define group actions, because \(\cdot\) refers to both theories at once. To support multiple sorts of terms, we annotate each term with a type. This affects the right-hand side of each judgment, as well as each variable in the context. So our theory of groups would be written:

We read \(\Gamma\vdash e:G\) as saying that, in any context \(\Gamma\), \(e\) is a term of type \(G\), and \(x:G\vdash x\circ e=x:G\) as saying that, given any term \(x\) of type \(G\), then \(x\circ e\) and \(x\) are equal terms of type \(G\).

In the same theory, we can also define a type corresponding to some set, like:

for the set with two elements. Then a group action on \(\mathord{\sf{\text{bool}}}\) consists of a \(\cdot\) constant which takes a term of type \(G\) and a term of type \(\mathord{\sf{\text{bool}}}\):

Notice that we can no longer describe the arity of function symbols numerically, since the types of the arguments are now relevant. We instead use arities of the form \(A_1,\cdots,A_n\to B\); we say \(\cdot\) has arity \(G,\mathord{\sf{\text{bool}}}\to\mathord{\sf{\text{bool}}}\).

In an algebraic type theory, the hypothesis rule must now maintain the type of the variable:

and the substitution rule must require that the types of the substituted terms match the types of the variables.

Of course, if they did not match, then we would not be able to obtain a derivation of the substituted term from a derivation of the original term in the manner previously described. (As before, weakening can be derived from substitution.)

Equalities are now annotated with the type of the two terms; terms can only be equated when their types are equal. Thus, the basic axioms are now:

A set-theoretic model of an algebraic type theory is an interpretation function \([\![- ]\!]\) which

  1. sends each type \(A\) to a set \([\![A ]\!]\), and

  2. each judgment \(x_1:A_1,\cdots,x_n:A_n\vdash t:B\) to a function \([\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\to [\![B ]\!]\), such that

  3. for each judgment \(\Gamma\vdash s=t:A\), it is the case that \([\![\Gamma\vdash s:A ]\!]=[\![\Gamma\vdash t:A ]\!]\).

We extend the interpretation function to contexts by \([\![x_1:A_1,\cdots,x_n:A_n ]\!]= [\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\).

As with algebraic theories, we also require that the hypothesis rule is modeled by a projection out of the context tuple, \((x_1,\cdots,x_n)\mapsto x_i\), and and that substitution of terms modeled by \(f_1,\cdots,f_n\) into a term modeled by \(g\) is modeled by \(\hat x\mapsto g(f_1(\hat x),\cdots,f_n(\hat x))\). It is worth verifying that these functions have the correct sets as domain and codomain.

Lastly, as with algebraic theories, it is the case that, for any choice of \([\![- ]\!]\) on types only, the behavior of \([\![- ]\!]\) on judgments is determined precisely by a choice of \([\![f ]\!]:[\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\to[\![B ]\!]\) for each function symbol \(f\) of arity \(A_1,\cdots,A_n\to B\). Again, it is worth verifying that the previous proof goes through.

Set-theoretic models of algebraic theories: II

(Part 4 of a series.)

We will henceforth treat theories primarily as collections of rules, rather than signatures equipped with axioms. This viewpoint allows us to treat extensions of algebraic theories as merely comprising additional rules; it also affords a uniform treatment of all the rules.

A set-theoretic model of an algebraic theory is a set \(X\) and an interpretation function \([\![- ]\!]\) which sends every judgment \(x_1,\cdots,x_n\vdash t\) to a function \([\![x_1,\cdots,x_n\vdash t ]\!]:X^n\to X\).

In particular, for each rule

it must be the case that, if a function \([\![\Delta\vdash s ]\!]\) exists, then we can produce a function \([\![\Gamma\vdash t ]\!]\) in some fashion generic to the choice of \(\Delta,\Gamma,s,t\). And for each rule

it must be the case that \([\![\Gamma\vdash s ]\!]=[\![\Gamma\vdash t ]\!]\).

Structural rules

Recall that \(x_1,\cdots,x_n\vdash t\) means that, so long as \(x_1,\cdots,x_n\) stand for terms, \(t\) is a term. Then \([\![x_1,\cdots,x_n\vdash t ]\!]\) is a function which takes an \(n\)-tuple of terms \((x_1,\cdots,x_n)\) and returns a term \(t\) which may contain them. In particular, the hypothesis rule

is modeled by the function \((x_1,\cdots,x_n)\mapsto x_i\) which projects out the \(i\)th element of the tuple. (We call this projection function \(\pi^n_i\).)

The substitution rule:

tells us that, from \(n\) functions \(f_1,\cdots,f_n:X^m\to X\) (given by \([\![\Gamma\vdash s_i ]\!]\)) , and a function \(g:X^n\to X\) (given by \([\![x_1,\cdots,x_n\vdash t ]\!]\)), we can get a function \(X^m\to X\). We model this by the function \(\hat x\mapsto g(f_1(\hat x),\cdots,f_n(\hat x))\) which takes an \(m\)-tuple, forms an \(n\)-tuple by applying each \(f_i\) to that tuple, and passes all the results to \(g\).

A side note to cognoscenti: this actually enforces alpha-equivalence (the irrelevance of variable names) in the model. For example, by substitution,

shows that \([\![y\vdash [y/x]t ]\!]\) must be modeled by \(y\mapsto [\![x\vdash t ]\!](y)\), which is simply \([\![x\vdash t ]\!]\). The same argument applies with any number of variables in the context.

Signature rules

How do we model the signature rules? Consider \(\circ\). In particular, we have the instance

which gives rise to a function \([\![a,b\vdash a\circ b ]\!]\). Once we have chosen this function—which we will abbreviate \([\![\circ ]\!]\)—we have in fact fixed the choice of \([\![\Gamma\vdash t\circ s ]\!]\) for each \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), and vice versa.

The converse is obvious. If we have chosen how to obtain \([\![\Gamma\vdash t\circ s ]\!]\) for each \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), then we simply plug in \([\![a,b\vdash a ]\!]\) and \([\![a,b\vdash b ]\!]\), which must be the projections \(\pi^2_1\) and \(\pi^2_2\), and obtain \([\![a,b\vdash a\circ b ]\!]=[\![\circ ]\!]\).

Given \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), where \(\Gamma\) has length \(n\), \([\![\Gamma\vdash t\circ s ]\!]:X^n\to X\) must be the function \(\hat x\mapsto [\![\circ ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\Gamma\vdash s ]\!](\hat x))\). To see this, by substitution,

where \([\![\Gamma\vdash [t,s/a,b](a\circ b) ]\!]= [\![\Gamma\vdash t\circ s ]\!]\) because \(\Gamma\vdash [t,s/a,b](a\circ b)=t\circ s\). But \([\![\Gamma\vdash [t,s/a,b](a\circ b) ]\!]\) must be \(\hat x\mapsto[\![a,b\vdash a\circ b ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\Gamma\vdash s ]\!](\hat x))\), which is what we asserted above.

Thus, modeling the \(\circ\) rule amounts to choosing a function \([\![\circ ]\!]:X\times X\to X\). Similarly, modeling \({}^{-1}\) requires a function \([\![{}^{-1} ]\!]:X\to X\), and \(\mathord{\sf{\text{e}}}\) an element \([\![\mathord{\sf{\text{e}}}]\!]\in X\).

This last one requires some explanation. As before, \([\![\cdot\vdash\mathord{\sf{\text{e}}}]\!]:X^0\to X\) is a function \(()\mapsto [\![\mathord{\sf{\text{e}}}]\!]()\) which takes an empty tuple of terms. But since there is only one empty tuple, \([\![\mathord{\sf{\text{e}}}]\!]\) always takes on the same value, so a choice of \([\![\mathord{\sf{\text{e}}}]\!]\) is simply a choice of an element of \(X\), and we say \([\![\mathord{\sf{\text{e}}}]\!]\in X\). (We call the set containing only the empty tuple \(1\), because it has one element; this notation draws an analogy with numerical exponentiation, where \(x^0=1\).)

Finally, the weakening rule

is derived from substitution, and so \([\![x_1,\cdots,x_n,x_{n+1}\vdash t ]\!]\) is

Axioms

Because an interpretation \([\![- ]\!]\) must send equal judgments to equal functions, we must verify that the constructions above satisfy the axioms required of all algebraic theories.

Reflexivity, symmetry, and transitivity are satisfied simply because ordinary mathematical equality enjoys these properties. For example, if \(\Gamma\vdash t=s\), then the model has equal functions \([\![\Gamma\vdash t ]\!]=[\![\Gamma\vdash s ]\!]\); the symmetry axiom implies \(\Gamma\vdash s=t\), and so demands simply that \([\![\Gamma\vdash s ]\!]=[\![\Gamma\vdash t ]\!]\) also.

Congruence is satisfied because equality of mathematical functions is extensional (i.e., two functions are equal when they send equal arguments to equal results). Consider the example that \(\Gamma\vdash t=t'\) should imply \(\Gamma\vdash t\circ\mathord{\sf{\text{e}}}=t'\circ\mathord{\sf{\text{e}}}\). We have seen that \([\![\Gamma\vdash t\circ\mathord{\sf{\text{e}}}]\!]=\hat x\mapsto [\![\circ ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\mathord{\sf{\text{e}}}]\!])\) and similarly for \(t'\), but if \([\![\Gamma\vdash t ]\!]=[\![\Gamma\vdash t' ]\!]\), then these functions must be equal as well.

To see the general case, recall the congruence rule. Let \(f_i=[\![\Gamma\vdash s_i ]\!]\), \(g_i=[\![\Gamma\vdash u_i ]\!]\), and \(h=[\![x_1,\cdots,x_n\vdash t ]\!]\).

The two functions in the conclusion are, by the interpretation of substitution, \(\hat x\mapsto h(f_1(\hat x),\cdots,f_n(\hat x))\) and \(\hat x\mapsto h(g_1(\hat x),\cdots,g_n(\hat x))\). The hypotheses imply that \(f_i=g_i\) for each \(i\), so these functions are equal as well.

To summarize, a set-theoretic model of a theory is a set \(X\) and interpretation function \([\![- ]\!]\) from judgments to functions \(X^n\to X\), where the hypothesis and subsitution rules are interpreted in a particular fashion, and which obeys the remaining rules in a particular way.

In fact, this definition is sufficiently rigid that models are determined by a choice of interpretation for each function symbol, such that the axioms hold true (i.e., \([\![\circ ]\!](x,[\![\circ ]\!](y,z))= [\![\circ ]\!]([\![\circ ]\!](x,y),z)\)). This justifies our earlier statement that groups are in bijection with models of the algebraic theory of groups—a mathematical group is precisely a set equipped with such a choice of \([\![\circ ]\!],[\![{}^{-1} ]\!],[\![\mathord{\sf{\text{e}}}]\!]\) satisfying the axioms.

Set-theoretic models of algebraic theories: I

(Part 3 of a series.)

We have seen how to present a mathematical structure as an algebraic theory, but have been vague about precisely which structures are denoted by any particular collection of rules.

Last time, we wrote down the group laws as an algebraic theory (with signature \(\mathord{\sf{\text{e}}}\), \({}^{-1}\), \(\circ\), and axioms for identity, inverses, and associativity) and said that we had defined groups.

Our signature mentions only one base term, \(\mathord{\sf{\text{e}}}\), so in some sense we have defined only the one-element group. On the other hand, we could get more mileage out of our theory by using it to describe all groups, in the sense that any group will be compatible with the given signature and axioms—although it may have additional rules (terms and equations) not enforced by the base theory itself.

As a more damning example, consider the natural numbers, which consist of

  1. the zero number \(\mathord{\sf{\text{z}}}\), and

  2. a successor function \(\mathord{\sf{\text{s}}}\) which sends a number to itself plus one.

We might say the natural numbers are an algebraic theory whose signature has an arity zero constant \(\mathord{\sf{\text{z}}}\) and an arity one constant \(\mathord{\sf{\text{s}}}\):

It is easy to see this theory must contain the terms \(\mathord{\sf{\text{s}}}(\mathord{\sf{\text{s}}}(\cdots\mathord{\sf{\text{s}}}(\mathord{\sf{\text{z}}})))\). We add no axioms, because

by reflexivity, and we would like

when \(n\neq m\), since different numbers are always nonequal.

If we allow a group to have elements besides \(\mathord{\sf{\text{e}}}\), then the natural numbers, defined this way, might have elements besides the \(\mathord{\sf{\text{s}}}^n(\mathord{\sf{\text{z}}})\). Worse yet, they might have additional equalities—the axiom \(n\vdash n=\mathord{\sf{\text{z}}}\) is compatible with the theory, and causes all numbers to equal zero!

To better understand these issues, we introduce the notion of a set-theoretic model of an algebraic theory. Intuitively, a theory is modeled by a set \(M\) if we can map terms in the theory to elements of \(M\) in such a way that equal terms (in the sense of the theory’s equality) are sent to the same element of \(M\). This will eventually let us make precise ideas like:

  1. a set has a group structure if and only if it is a set-theoretic model of the algebraic theory of groups;

  2. the one-element group is the smallest possible group (for some precise notion of ‘smallest’); and

  3. the natural numbers are exactly the smallest model of the algebraic theory of natural numbers.

Next time, we will examine each rule in a theory—corresponding to signatures, structural rules, and axioms—and mirror each set-theoretically.

But the core idea is simple enough: to model a signature in a set \(M\) is to interpret each constant \(c\) of arity \(a\) as a genuine function \([\![ c ]\!]:M^a\to M\), i.e., which takes an \(a\)-tuple of elements of \(M\) to a single element. So if a set \(G\) models groups, then it is equipped with, in part, a multiplication function \([\![\circ ]\!]:G\times G\to G\) and an inverse function \([\![{}^{-1} ]\!]:G\to G\). In other words, it is a group in the ordinary sense.

Algebraic theories

(Part 2 of a series.)

This post is quite long, since it introduces the judgmental machinery of type theory. Future posts will be shorter and less familiar to any type theorists in the audience.

We start by exploring the notion of an algebraic theory, a collection of terms and equations between these terms. To define an algebraic theory, we start by writing down a signature of constants and their arities.

For example, a group is a structure which has

  1. a distinguished identity term \(\mathord{\sf{\text{e}}}\),

  2. a multiplication operation \(\circ\) which sends any two terms \(x,y\) to a single term \(x\circ y\), and

  3. an inverse operation \({}^{-1}\) which sends any term \(x\) to a term \(x^{-1}\).

(These operations must satisfy certain equations, which we’ll get to later.)

We say its signature has three constants: \(\mathord{\sf{\text{e}}}\) has arity zero, as it requires no arguments; \({}^{-1}\) has arity one, as it takes one argument; and \(\circ\) has arity two. We obtain a term by applying a constant of arity \(a\) to exactly \(a\) terms. Because they send terms to terms, constants are usually called function symbols.

Let us write the judgment \(\Gamma\vdash t\) whenever

  1. \(\Gamma\) is a (possibly empty) set of variables, and

  2. \(t\) is a valid term in the theory, given the above rules and that each variable in \(\Gamma\) is also assumed to be a term in the theory.

Thus, \(\cdot\vdash\mathord{\sf{\text{e}}}\) means that, under no assumptions, \(\mathord{\sf{\text{e}}}\) is a term. \(x\vdash x^{-1}\) means that, assuming \(x\) is a variable standing in for some term, then \(x^{-1}\) is a term.

Of course, it is safe to make extraneous assumptions—it is the case that \(\mathord{\sf{\text{e}}}\) is also a term when \(x\), \(y\), and \(z\) stand in for terms (i.e., \(x,y,z\vdash\mathord{\sf{\text{e}}}\)). But it isn’t the case that \(\cdot\vdash x^{-1}\), since the variable \(x\) is not intrinsically a term. (Do not confuse the constant term \(\mathord{\sf{\text{e}}}\) with variables like \(x,y,z\).)

We can summarize the signature of a group with the following rules. Read these rules as stating that whenever all the judgments above the line are true, so is the judgment below the line. As before, \(\Gamma\) stands for any set of variables, and \(t,s\) stand for any term.

For example, the second rule says that, if \(t\) is a term under the assumptions in \(\Gamma\), then \(t^{-1}\) is also a term under the assumptions in \(\Gamma\).

To show \(\cdot\vdash\mathord{\sf{\text{e}}}^{-1}\circ\mathord{\sf{\text{e}}}\) (under no assumptions, \(\mathord{\sf{\text{e}}}^{-1}\circ\mathord{\sf{\text{e}}}\) is a term), we would first conclude \(\cdot\vdash \mathord{\sf{\text{e}}}\) from the first rule, which by the second rule implies \(\cdot\vdash\mathord{\sf{\text{e}}}^{-1}\). If we apply the third rule to this and to \(\cdot\vdash\mathord{\sf{\text{e}}}\) (from the first rule), we obtain \(\cdot\vdash \mathord{\sf{\text{e}}}^{-1}\circ\mathord{\sf{\text{e}}}\). We typically chain these derivations together in a tree-like structure, like so:

Structural rules

If we add one more rule to the system, we will be able to derive (from the rules) that \(\Gamma\vdash t\) whenever \(t\) is a term under the assumption that the variables in \(\Gamma\) represent terms. For example, using only the three rules above, we can’t conclude that \(x\vdash x\), even though \(x\) is a term under the assumption that \(x\) is a term. The required rule is:

Called the hypothesis rule, this lets us conclude \(x\vdash x\), or, for example, \(x\vdash x^{-1}\) (from \(x\vdash x\) and the second rule).

For any \(t\) such that \(x\vdash t\), it is also the case that \(\cdot\vdash t'\) where \(t'\) replaces all occurrences of the variable \(x\) in \(t\) with the term \(\mathord{\sf{\text{e}}}\). For example, \(x\vdash x^{-1}\), and so \(\cdot\vdash\mathord{\sf{\text{e}}}^{-1}\). In general, we can turn a derivation of \(x\vdash t\) into a derivation of \(\cdot\vdash t'\) by replacing each instance of the hypothesis rule with the \(\mathord{\sf{\text{e}}}\) rule, and modifying the term appropriately:

We can do better—for any \(x\vdash t\), then \(\Gamma\vdash t'\) where \(t'\) replaces each occurrence of \(x\) in \(t\) by a term \(s\) for which \(\Gamma\vdash s\). Here, we produce the derivation of \(\Gamma\vdash t'\) by replacing each instance of the hypothesis rule with a derivation of \(\Gamma\vdash s\). For example, given \(x\vdash x^{-1}\circ x\) and \(a,b\vdash a\circ b\):

We say that \(t'\) is obtained via the (single-variable) substitution of \(s\) for \(x\) in \(t\), and use \([s/x]t\) to denote this term. We can prove that the following rule holds for all algebraic theories, by performing the derivation-rewriting described above.

Substitution generalizes to the simultaneous substitution of multiple variables. If \(x_1,\cdots,x_n\vdash t\), and \(\Gamma\vdash s_1\), …, \(\Gamma\vdash s_n\), then we can derive \(\Gamma\vdash [s_1,\cdots,s_n/x_1,\cdots,x_n]t\), where this term means that each \(x_i\) in \(t\) is replaced by \(s_i\). Again, in all algebraic theories, the following rule is provable:

As a side note, the hypothesis rule ensures certain variables must be in \(\Gamma\) for a particular term, but no rule requires any variables to not be in \(\Gamma\). So, as we said before, it is safe to add extraneous assumptions.

Here, \(y\) is a variable not already in \(\Gamma\), as we require the variables to be distinct. This is called the weakening rule, and is also true of all algebraic theories—in fact, it follows directly from the substitution rule above. Indeed, if \(\Gamma=x_1,\cdots,x_n\), then we can apply the hypothesis rule \(n\) times to get \(\Gamma,y\vdash x_i\). Combining these and \(\Gamma\vdash t\), the substitution rule gives us

where \([x_1,\cdots,x_n/x_1,\cdots,x_n]t=t\) because this substitution replaces each variable with itself; thus we conclude \(\Gamma,y\vdash t\).

Axioms

We finally have enough machinery to finish defining algebraic theories. After fixing a signature, we list axioms equating some of the terms. By default, we include reflexivity, symmetry, and transitivity, which guarantee that syntactically identical terms are equal.

It is worth noting that substitutions are not part of terms themselves, but rather send terms to terms. Thus, for example, we need not throw in any additional equality rules to ensure \(x\vdash [x/y]y=x\), since \([x/y]y\) denotes the term \(x\).

We also include an axiom which states that equality is a congruence, that is, that two terms are equal when their corresponding subterms are pairwise equal.

This allows us to conclude, for example, that if \(\Gamma\vdash t=t'\), then \(\Gamma\vdash t\circ\mathord{\sf{\text{e}}}= t'\circ\mathord{\sf{\text{e}}}\). If \(\Gamma=x_1,\cdots,x_n\), then

One axiom of groups says \(\Gamma\vdash t\circ\mathord{\sf{\text{e}}}= t\) when \(\Gamma\vdash t\): assuming \(t\) is a term under \(\Gamma\), then \(t\circ\mathord{\sf{\text{e}}}\) is equal to \(t\) under \(\Gamma\). (We maintain the \(\Gamma\) because, in general, \(t\) is not a term except under \(\Gamma\).)

The group axioms state that \(\mathord{\sf{\text{e}}}\) is an identity on both sides,

that \({}^{-1}\) produces inverses,

and that \(\circ\) is associative:

Summary

I realize I’ve introduced a huge amount of notation and terminology during this post, but it’s hard to get started without the framework of judgments we’ve developed. Let’s summarize:

An algebraic theory is a signature (a set of constants with arities) and a set of axioms equating some of the terms generated by the signature. We use the judgmental notation \(\Gamma\vdash t\), meaning that \(t\) is a term, assuming that each variable in the set \(\Gamma\) represents a term.

Each algebraic theory includes rules corresponding to its signature. All algebraic theories additionally satisfy the hypothesis rule (below, left), and have a definable notion of substitution satisfying the substitution rule (below, center). The substitution rule implies the weakening rule (below, right).

We also have a judgment \(\Gamma\vdash t=s\), meaning that \(\Gamma\vdash t\), \(\Gamma\vdash s\), and that \(t\) and \(s\) are equal terms. Each theory comes with its own axioms. In all theories, equality is additionally reflexive, symmetric, transitive, and a congruence.

Thanks to Nico Feltman for some helpful feedback on this post, and commenter gasche for noting my omission of the congruence rule.