I learned this fact from Steve Awodey, who noticed it in 2009. (It appears as though it may have been independently discovered by several people, including at least Thierry Coquand. Does anyone else have more information?) The HoTT Book proves these facts using J in Lemmas 3.11.8 and 2.3.1, respectively.
I attach a short note proving not only that (1) + (2) implies J, but that definitional computation rules for (1) and (2) allow one to derive the definitional computation rule for J. My note is several months old, but this result has come up several times recently, so I thought I’d post it online:
Homotopical Patch Theory
Carlo Angiuli, Ed Morehouse, Dan Licata, Robert Harper

Homotopy type theory is an extension of Martin-Löf type theory, based on a correspondence with homotopy theory and higher category theory. In homotopy type theory, the propositional equality type becomes proof-relevant, and corresponds to paths in a space. This allows for a new class of datatypes, called higher inductive types, which are specified by constructors not only for points but also for paths. In this paper, we consider a programming application of higher inductive types. Version control systems such as Darcs are based on the notion of patches—syntactic representations of edits to a repository. We show how patch theory can be developed in homotopy type theory. Our formulation separates formal theories of patches from their interpretation as edits to repositories. A patch theory is presented as a higher inductive type. Models of a patch theory are given by maps out of that type, which, being functors, automatically preserve the structure of patches. Several standard tools of homotopy theory come into play, demonstrating the use of these methods in a practical programming context.
A video of my talk is available on YouTube.
The news of the day is that we at the Univalent Foundations project are pleased to announce the first version of our book, Homotopy Type Theory: Univalent Foundations of Mathematics. (Available as a PDF, or in print from Lulu. See also the announcements on the HoTT blog, the n-Category Café, and Andrej Bauer’s blog.)
I can’t do much justice to homotopy type theory (HoTT) in a short blog post. The book’s introduction nicely summarizes HoTT’s origins, motivations, and basics. There have also been a number of recent survey articles, as well as introductory blog posts by Bob Harper, Dan Licata, and the prolific Mike Shulman, among others.
I know many mathematicians and computer scientists who are not category theorists, homotopy theorists, dependent type theorists, or logicians, but who would still like to understand what it is that HoTT is all about. And while the book’s introduction is for everybody, I think it could use some extra context. Consider this the introduction to the introduction, as well as (in the last section) the introduction to the extra appendix we forgot to write.
The title is sufficiently imposing that I’ll focus on explaining each of its components. Here we go!
The foundations of mathematics are a touchy subject, and the book’s boldest claim is that HoTT might one day take its place as a new, so-called univalent foundation of mathematics. (More on the U-word later.)
Our modern understanding of mathematical foundations dates back to the “foundational crisis” of the early 20th century, spurred in part by the discovery of Russell’s paradox—that it would be contradictory to construct a set whose members are all sets \(S\) such that \(S\not\in S\). (Is this set a member of itself?)
Since we want mathematics to be consistent (non-contradictory), we must somehow rule out this construction, which was legal in the set theory of the time. On the other hand, since mathematics is about ideas, not painstaking manipulation of logical formulas, we would like to be careful without drowning in a sea of rigor.
There is much story to tell here, and the question of foundations is more complex than I have made it out to be. But in the end, the path forward was set by a group of French mathematicians who called themselves Nicolas Bourbaki, publishing a set of volumes unifying mathematics under the banner of modern set theory (using a variant of today’s gold standard, Zermelo–Fraenkel set theory with the axiom of choice). These volumes successfully encoded much of mathematics in ZFC, while doing so in a rigorous but manageable expository style.
The Bourbakian bargain of mathematics is our adoption of a standard style of mathematical prose which allows all our work to be, in principle, elaborated into ZFC; but of course, buying into this system means buying into ZFC as a satisfactory foundation. ZFC has successfully steered mathematics away from the frightful crises of yore, so why do we claim HoTT might deserve to take its place as a foundation of mathematics?
That’s a hard question. In large part, we wrote the book to help bolster that claim, and to develop and demonstrate a new style of mathematical prose in which all our work can be, in principle, elaborated into HoTT. I think this aspect of the project was a huge success—the book is able to go from explaining the very basics of type theory (in Chapter 1) to computing higher homotopy groups of spheres (\(\pi_n(S^n)\simeq\mathbb{Z}\), in Chapter 8) in 250 pages, a dazzling feat.
Certainly there are tradeoffs. ZFC and HoTT are fundamentally quite different, so the reasoning we employ will be somewhat unusual to a post-Bourbakian mathematician. And while we are able to define real numbers (Chapter 11), analysis in HoTT would be rather different from standard analysis, as HoTT is based on a constructive logic.
As I’ve previously discussed, category theory and type theory provide robust, representation-independent notions of abstraction which are advantageous for doing mathematics.
In type theory, the proposition that two objects \(a,a'\) of type \(A\) are equal is itself a type, \(a=_A a'\). In the past, as one might expect, we said that two objects are equal exactly when they are completely identical (up to computation; in this sense, \(1+1\) and \(2\) are completely identical).
Homotopy type theory arises from the observation that it is consistent to add more equations. (Of course, we can’t equate terms willy-nilly: \(0=1\) is inconsistent, since we can prove that \(0\neq 1\).) These do not affect the behavior of equality; for example, it’s still the case that functions take equal arguments to equal results—if \(f:A\to B\) and \(a=_A a'\), then \(f(a)=_B f(a')\).
Moreover, because \(a=_A a'\) is itself a type, it may contain two proofs of equality \(p,q\), and these proofs might themselves be equal or not, giving rise to a type of equalities between equalities, \(p=_{(a=_A a')}q\). These generalized equalities behave like paths in the sense of homotopy theory; just as homotopy theory studies the structure of paths in interesting topological spaces, HoTT analyzes the equality types of higher inductive types corresponding to these spaces.
The other cornerstone of HoTT is the univalence axiom, which I will explain by way of analogy.
Consider a large C program with many different data structures. These data structures are coded in separate files, and expose APIs which are not dependent on specific implementation details. These tasteful choices were made by a wise programmer to simplify reasoning about the program, and to allow each data structure to be easily used in other programs, or even swapped with another implementation of the same API.
Emboldened, this programmer claims that their two implementations of dictionaries can be silently swapped in any program at all! Of course, this is not the case—in C, one can bypass any API and read data directly out of a pointer, detecting even the slightest difference in representation. Certainly, doing so is (generally) bad practice, and is easily noticed in otherwise safe code, but it is nevertheless quite possible.
While good mathematics, like good programming, is about developing and reasoning about useful abstractions, set theory, like C, fails to prevent users from breaking these abstractions. It is, by and large, irrelevant whether pairs \((a,b)\) are defined to be \(\{\{a\},\{a,b\}\}\) or \(\{\{0,a\},\{1,b\}\}\), but since everything is a set, one can distinguish the two by asking whether \(\{a\}\in (a,b)\).
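This abstraction leak is easy to demonstrate concretely. Here is an illustrative Python sketch (not from the book) modeling both pair encodings with frozensets; the function names `kuratowski` and `hausdorff` are my own labels for the two encodings mentioned above:

```python
# Two set-theoretic encodings of the ordered pair (a, b), modeled with
# Python frozensets. Both satisfy the pairing "API", but a membership
# test can tell them apart -- the abstraction leaks.

def kuratowski(a, b):
    # (a, b) := {{a}, {a, b}}
    return frozenset({frozenset({a}), frozenset({a, b})})

def hausdorff(a, b):
    # (a, b) := {{0, a}, {1, b}}
    return frozenset({frozenset({0, a}), frozenset({1, b})})

a, b = "x", "y"
p1, p2 = kuratowski(a, b), hausdorff(a, b)

# The "illegal" question: is {a} a member of the pair?
print(frozenset({a}) in p1)  # True  -- only for the Kuratowski encoding
print(frozenset({a}) in p2)  # False -- so the encodings are distinguishable
```

Any client asking this question will behave differently under the two encodings, even though both implement pairing perfectly well.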
This may all seem rather pedantic, but my point is that in type theory, unlike set theory or C, such a guarantee is possible, and its codification is the univalence axiom. Univalence says that the type of equalities between two types is the type of isomorphisms between them. (Technically it’s the type of equivalences, a notion of isomorphism compatible with path structure.)
Because everything in type theory respects equality, if two types \(A,B\) are isomorphic, then any structure on \(A\) exists also on \(B\): any function \(A\to A\) gives rise to a function \(B\to B\). If \(A\) is a group, so is \(B\). In fact, any theorem about \(A\) must be true of \(B\). In this sense, univalence is an incredibly powerful extensionality principle not possible in ZFC, and is one of HoTT’s biggest selling points. It’s more than extensionality, too—we make heavy use of the fact that different isomorphisms give rise to different proofs of equality.
As the title suggests, our book is, first and foremost, about mathematics in HoTT: how mathematical prose might look in a HoTT world, what we’ve proven in HoTT, and why HoTT is advantageous. But we have neglected many topics of interest to the type theorist, partly to minimize our alienation of mathematicians, and partly because we don’t yet understand the type-theoretic ramifications of HoTT very well. (I guess we’ll just have to write another book, huh?)
To close off this post, I’d like to sketch the problems that we type theorists are grappling with, and why they are relevant even to mathematics in HoTT. (If you are already familiar with dependent type theory, sections A.2–A.3 of the appendix are a natural deduction presentation of the current incarnation of HoTT. If you are not familiar with dependent type theory but want to be, Chapter 1 is an excellent introduction.)
Set theory consists of two kinds of objects: sets, and propositions about those sets. A proposition is either true or false, and classical logic specifies ways to show which. For example, to show \(A\land B\), it suffices to prove both \(A\) and \(B\).
Type theory, in contrast, has only one kind of object: types. Propositions correspond to mathematical objects—to show a proposition is true, one must construct an element of that object; to show it is false, one must show that it would be contradictory for such an element to exist. To prove \(A\land B\) is to construct an element of the product type \(A\times B\), e.g., to give a pair \((a,b)\) where \(a:A\) and \(b:B\). Given a proof of \(A\land B\), we can prove both \(A\) and \(B\) by projecting out the two proofs which make up our original proof—\(\pi_1(a,b) = a\) and \(\pi_2(a,b) = b\).
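For programmers, a minimal sketch of this idea in Python (names are illustrative, not from the book): a proof of \(A\land B\) is literally a pair, and the projections recover each conjunct.

```python
# Propositions as types, in miniature: a proof of A ∧ B is a pair of a
# proof of A and a proof of B; the projections are the elimination rules.

def pair(a, b):   # introduction rule for A × B
    return (a, b)

def proj1(p):     # elimination: π₁(a, b) = a
    return p[0]

def proj2(p):     # elimination: π₂(a, b) = b
    return p[1]

proof_of_conj = pair("proof of A", "proof of B")
print(proj1(proof_of_conj))  # 'proof of A'
print(proj2(proof_of_conj))  # 'proof of B'
```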
This identification of propositions with mathematical objects results in a strikingly different logic. There are as many different proofs of \(A\) as there are elements of \(A\), and these proofs are themselves mathematical objects about which we can directly reason.
Moreover, because type theory is a constructive logic, these proofs are also terminating algorithms which specify an element of the proven proposition. (Any computer scientists still reading? Good.) So when we say that there exists a number satisfying some condition, we can point to a specific numeral along with a proof of the proposition at that number. (In classical logic, we might instead prove that such a number exists by showing that it’s impossible for no number to satisfy the condition. This proof certainly does not produce a specific numeral.)
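To see the witness-producing character of constructive proofs in action, here is a hedged Python sketch: the predicate and the helper `exists_witness` are my own illustrative choices, but the shape—a specific numeral paired with evidence—is the point.

```python
# A constructive proof of "there exists n with P(n)" is a pair of a
# specific numeral and evidence for P at that numeral. Here P(n) is the
# (illustrative) property of being a perfect square greater than 50.

def exists_witness(predicate, bound=1000):
    """Search for a witness; return (n, evidence), constructive-style."""
    for n in range(bound):
        evidence = predicate(n)
        if evidence is not None:
            return n, evidence
    raise ValueError("no witness below bound")

def is_big_square(n):
    # Evidence that n is a perfect square > 50: its square root.
    r = int(n ** 0.5)
    return r if (r * r == n and n > 50) else None

n, root = exists_witness(is_big_square)
print(n, root)  # 64 8 -- the proof hands us a concrete number
```

A classical proof by contradiction would establish the same existential without ever surfacing the number 64.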
This computational nature of type theory relies on the fact that each type’s elements all have the same shape, after computation. I said that pairing is one way to construct elements of \(A\times B\), but it’s actually the only way, in some sense. Sure, we could also construct such an element by applying a function \(\mathbb{N}\to (A\times B)\) to \(0\). This is a function application, not a pair, but it is guaranteed to yield a literal pair after computation.
This property is called canonicity, and is usually stated: “Every element of \(\mathbb{N}\) computes to a numeral.” Although it only refers to \(\mathbb{N}\), this theorem takes care of all types’ shapes at once, for the following reason.
If we had an element \(\mathsf{stuck}:A\times B\) which didn’t compute to a pair, then its projection \(\pi_1(\mathsf{stuck})\) would be an element of \(A\) which didn’t have the usual shape of such elements (since \(\pi_1\) can only compute on a literal pair). So \(A\) and \(B\) also have elements of the wrong shape. If we define a function \(f:A\to\mathbb{N}\) which examines the form of its argument, it will not be able to compute on \(\pi_1(\mathsf{stuck})\), and so we will have an element \(f(\pi_1(\mathsf{stuck}))\) of \(\mathbb{N}\) which cannot compute to a numeral.
If we are concerned only with provability, not computation, then it’s alright to add any consistent fact as a stuck term as above. We can even postulate that we have a function which decides whether a Turing machine halts, although doing so causes every type to lose its computational properties.
Right now, we’re augmenting type theory with univalence by simply adding it as a postulated term. We know it’s consistent for metatheoretic reasons, so HoTT is perfectly sound as a foundation with all the benefits I described above. But it can’t be considered a programming language yet. I and others are hoping to solve this problem by figuring out how univalence computes, but we don’t have an answer yet.
Okay, but say you don’t care about programming. Indeed, although we have ideas for how HoTT can be useful to programmers, we don’t yet have “killer apps” like we do for math. Unfortunately for you, computation is still important for doing math in HoTT!
Whenever we appeal to the univalence axiom in constructing some element, that element does not reduce properly. The more we do with that element, the more things get stuck around the univalence term, and whenever we want to prove a property of that element, we’re forced to reason about a very large term.
We can prove lemmas about how univalence acts on certain things; for example, if we use univalence with an equivalence between \(A\) and \(B\) to transport an element of \(A\) to an element of \(B\), this amounts to applying the \(A\to B\) direction of the supplied equivalence. But these two quantities are only equal in the sense of being homotopic, so to prove some property of the univalence-containing term, we must prove that property for the simpler term, and explicitly wrap it with this lemma.
As the occurrences of univalence get buried further into a term, we need to apply more and more narrow lemmas, which is time-consuming, verbose, and difficult to understand later. Worse yet, these large irreducible terms and lemmas take up lots of memory in our proof assistants, making their verification slower! Failure of canonicity is bad for both humans and computers.
I hope this has helped contextualize the HoTT project. Take a look at our book, and please feel free to leave any questions or comments here, and to open a GitHub issue if you find any bugs in the book.
If you’d like to follow along, here’s my expanded explanation of the original proof:
Any positive integer \(n\) can be squared, and squaring has the property that \(n^2 \geq n\). In fact, it is easy to see that \(1\) is the unique \(n\) for which \(n^2 = n\); in all other cases, \(n^2 > n\). Let us assume for a moment that there is a largest positive integer \(N\). Well, it certainly cannot be a number besides \(1\), because if so, \(N^2\) is larger than \(N\), a contradiction. Therefore, the only possibility is that \(N=1\).
However, there is another possibility—that there simply is no largest positive integer. The fact that squaring can embiggen all numbers besides \(1\) is irrelevant, because there is in fact also a way to obtain a number larger than \(1\). (If there were some largest positive integer, then the previous facts about squaring would imply that no other technique could obtain a number larger than \(1\).)
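The arithmetic lemma itself is easy to spot-check by brute force—though of course this verifies only the lemma, not the bogus existence assumption, which is exactly where the joke lives:

```python
# Finite spot-check of the lemma: n² ≥ n for every positive integer n,
# with equality exactly when n = 1 (squaring "embiggens" everything else).
for n in range(1, 1000):
    assert n * n >= n
    assert (n * n == n) == (n == 1)
print("squaring embiggens every positive integer besides 1")
```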
And here’s my standalone Agda formalization. (It checks for me in Agda 2.3.2, without the standard library.) The interesting definitions are the last four, in particular, the key lemma that squaring embiggens numbers besides \(1\), and the theorem statement itself (one-is-maximal).
Steiner gave five proofs of the isoperimetric theorem. Lovely as they are, he left one point open to attack: all proofs assume the existence of a solution (his strategy is always to take a figure that is not a circle and show that its area can be improved). This did not go unpunished. The analyst vultures can smell an existence assumption from miles away. […] Perron at least jokes about it:
Theorem. Among all curves of a given length, the circle encloses the greatest area.
Proof. For any curve that is not a circle, there is a method (given by Steiner) by which one finds a curve that encloses greater area. Therefore the circle has the greatest area.

Theorem. Among all positive integers, the integer 1 is the largest.
Proof. For any integer that is not 1, there is a method (to take the square) by which one finds a larger positive integer. Therefore 1 is the largest integer.

— Viktor Blåsjö, The evolution of the isoperimetric problem (2005), quoting Oskar Perron, Zur Existenzfrage eines Maximums oder Minimums (1913)
Therefore, numbers are not objects at all, because in giving the properties (that is, necessary and sufficient) of numbers you merely characterize an abstract structure—and the distinction lies in the fact that the “elements” of the structure have no properties other than those relating them to other “elements” of the same structure. […] To be the number 3 is no more and no less than to be preceded by 2, 1, and possibly 0, and to be followed by 4, 5, and so forth. And to be the number 4 is no more and no less than to be preceded by 3, 2, 1, and possibly 0, and to be followed by…

Any object can play the role of 3; that is, any object can be the third element in some progression. […]
Arithmetic is therefore the science that elaborates the abstract structure that all progressions have in common merely in virtue of being progressions. It is not a science concerned with particular objects—the numbers. The search for which independently identifiable particular objects the numbers really are (sets? Julius Caesars?) is a misguided one.
— Paul Benacerraf, What numbers could not be (1965)
Set theory centers around a two-place judgment \(x\in X\)—that \(x\) is an element of the set \(X\). Type theories, in contrast, are built on a three-place judgment \(\Gamma\vdash t:B\)—that under the assumptions \(\Gamma\), \(t\) is a term in the type \(B\). As we will see, the latter notion generalizes the former, and suggests a new way to use sets.
In a set-theoretic model, \([\![\cdot\vdash b:B ]\!]\) is an element of the set \([\![B ]\!]\). \([\![x:A\vdash t:B ]\!]\) is a function \([\![A ]\!]\to[\![B ]\!]\), which sends any element of \([\![A ]\!]\) to an element of \([\![B ]\!]\). Indeed, the substitution
\[\frac{\cdot\vdash s:A\qquad x:A\vdash t:B}{\cdot\vdash [s/x]t:B}\]
says that an element \(s\) of \([\![A ]\!]\) coupled with such a function \([\![A ]\!]\to[\![B ]\!]\) produces an element \([s/x]t\) of \([\![B ]\!]\).
Notice that, in set theory, \(b\) is an element while \(t\) is a function; in type theory, \(b\) is an element under no assumptions, while \(t\) is an element under the assumption \(x:A\). So we can think of a function \([\![A ]\!]\to[\![B ]\!]\) as an element of \([\![B ]\!]\) pending an element of \([\![A ]\!]\), and an element of \([\![B ]\!]\) as an element of \([\![B ]\!]\) pending nothing (technically, pending an element of \(1\)).
Mathematically, these functions are called generalized elements: for any sets \(X,Y\), an \(X\)-element of \(Y\) is a function \(X\to Y\). Then \(t\) is an \([\![A ]\!]\)-element of \([\![B ]\!]\), and \(b\) is a \(1\)-element of \([\![B ]\!]\). (We also call \(1\)-elements global elements, in the sense that they are elements apropos of nothing.)
The idea of generalized elements may seem frivolous, but let us make a few observations. First, they correspond closely to type-theoretic judgments. In a set-theoretic model, \([\![\Gamma\vdash t:B ]\!]\) is a \([\![\Gamma]\!]\)-element of \([\![B ]\!]\). Conversely, “\(f\) is an \(A\)-element of \(B\)” is basically a three-place judgment which says that, under the assumptions \(A\), \(f\) is an element of \(B\). Think of it like “\(A\vdash f:B\)” (but mind the scare quotes).
Secondly, generalized elements are defined only in terms of functions between sets, not in terms of sets’ elements themselves. As we will see in the next few posts, this concept is enough to formulate a surprising amount of set-theoretic reasoning.
Two simple examples: given a global element of \(A\), and a function \(A\to B\), we can obtain a global element of \(B\) by composing these functions.
A function \(f:A\to B\) is injective if two global elements \(a,a'\) are equal (as functions) whenever their postcompositions by \(f\) are equal (as functions).
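Over finite sets, this element-free definition of injectivity is checkable by machine. A sketch (the encoding of sets as Python sets and functions as dicts is my own):

```python
# Element-free injectivity: f is injective iff for all global elements
# a, a' (functions from a one-element set into A), equality of the
# composites f ∘ a = f ∘ a' implies a = a'.

def global_elements(A):
    # each element x of A corresponds to the constant function * ↦ x
    return [{"*": x} for x in A]

def injective_via_global_elements(f, A):
    for a in global_elements(A):
        for a2 in global_elements(A):
            # f ∘ a and f ∘ a' are equal as functions iff they agree at *
            if f[a["*"]] == f[a2["*"]] and a != a2:
                return False
    return True

A = {1, 2, 3}
f_inj = {1: "a", 2: "b", 3: "c"}   # injective
f_not = {1: "a", 2: "a", 3: "c"}   # collapses 1 and 2
print(injective_via_global_elements(f_inj, A))  # True
print(injective_via_global_elements(f_not, A))  # False
```

Notice the definition never inspects the elements of \(A\) directly—only functions into and out of it.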
Lastly, if we define set-theoretic notions purely in terms of functions, as we have started to do above, we can port these definitions directly to more complicated structures by replacing set with foo, and function with functions between foos. (As we will soon see, these ported definitions will even be correct!)
This is exactly the modus operandi of category theory. A category is essentially a choice of foo and functions between foos satisfying some very basic conditions that allow these kinds of definitions to make sense. The foos of a category are called its objects, and the functions are called its morphisms.
In category theory, we say things like, “In a category where such and such is true, the baz of two objects is defined to be an object which has some morphisms that do such and such.” After a few more definitions, the upshot is that, in any category which has bazzes, every morphism out of a baz happens to be qux.
It can definitely be abstruse at times—some call it abstract nonsense—but the point is that category theory studies patterns which recur in different fields of mathematics, and provides a precise language for relating different sorts of mathematical structures (like type theories and sets!).
On to the definition. A category is a collection of objects \(A,B,C\) and morphisms \(f,g\) which go from an object to an object; \(f:A\to B\) means that \(f\) goes from \(A\) to \(B\). We say \(A\) is the domain of \(f\), and \(B\) the codomain. (It is useful to think of \(f:A\to B\) both as a “function” from \(A\) to \(B\) and as an \(A\)-element of \(B\), depending on the situation.)
Furthermore, if we have morphisms \(f:A\to B\) and \(g:B\to C\), we can compose them to obtain a morphism \(gf:A\to C\); composition is associative (\((hg)f=h(gf)\)), and each object \(A\) has an identity morphism \(\textbf{id}_A\) (\(f\textbf{id}_A=f=\textbf{id}_B f\)).
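The definition fits in a few lines of executable code. Here is an illustrative sketch of \(\textbf{Set}\) restricted to finite sets, with morphisms as Python dicts, spot-checking the category laws (the encoding is mine, not a standard library):

```python
# Morphisms of (finite) Set as dicts; composition and identities, plus a
# spot-check of associativity and the identity laws.

def compose(g, f):
    # (g ∘ f)(x) = g(f(x)); defined when cod(f) matches dom(g)
    return {x: g[f[x]] for x in f}

def identity(A):
    return {x: x for x in A}

A, B, C, D = {1, 2}, {"a", "b"}, {True, False}, {0}
f = {1: "a", 2: "b"}          # f : A -> B
g = {"a": True, "b": False}   # g : B -> C
h = {True: 0, False: 0}       # h : C -> D

# associativity: (h ∘ g) ∘ f = h ∘ (g ∘ f)
assert compose(compose(h, g), f) == compose(h, compose(g, f))
# identity laws: f ∘ id_A = f = id_B ∘ f
assert compose(f, identity(A)) == f == compose(identity(B), f)
print("category laws hold for this example")
```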
There is a category \(\textbf{Set}\) whose objects are sets, morphisms \(A\to B\) are functions from \(A\) to \(B\), composition is function composition, and identities are identity functions.
\(\textbf{Grp}\) is a category whose objects are groups, and whose morphisms are group homomorphisms; \(\textbf{Top}\) is a category whose objects are topological spaces, and whose morphisms are continuous functions. In both cases, composition and identities are the ordinary notions, and satisfy the necessary equations. There are no non-homomorphism (resp., discontinuous) functions in \(\textbf{Grp}\) (resp., \(\textbf{Top}\)); we’ll see next time that this is the reason our definitions make sense, so to speak, in these categories.
Here’s one final example of a rather different flavor: recall that a partial order, or poset, is a set equipped with a relation \(\preceq\) which is
reflexive: \(x\preceq x\);
antisymmetric: if \(x\preceq y\) and \(y\preceq x\), then \(x=y\); and
transitive: if \(x\preceq y\) and \(y\preceq z\), then \(x\preceq z\).
Examples include the natural or real numbers with \(\leq\). The subsets of any set form a poset with \(\subseteq\). So does divisibility over the natural numbers: say \(a|b\) if \(a\) divides \(b\). Clearly \(a|a\); if \(a|b\) and \(b|a\), then \(a=b\); and if \(a|b\) and \(b|c\), then \(a|c\).
Any poset gives rise to a category whose objects are the elements of the poset, and for any two objects \(A,B\), there is exactly one morphism \(A\to B\) if \(A\preceq B\), and none otherwise.
Why is this a category? By reflexivity, \(A\preceq A\), so there is exactly one morphism \(A\to A\), which must be \(\textbf{id}_A\). Given morphisms \(A\to B\) and \(B\to C\), the transitivity of \(\preceq\) guarantees a unique morphism \(A\to C\), which we define to be the result of composing the morphisms. Associativity and identity follow trivially from the fact that there is at most one morphism between any two objects.
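To make the divisibility example concrete, here is a small check (my own illustration) that divisibility on a finite range satisfies exactly the properties the category needs:

```python
# The divisibility poset on {1,...,12} viewed as a category: there is at
# most one morphism a -> b, present exactly when a divides b.

N = range(1, 13)

def hom(a, b):
    # the unique morphism a -> b exists iff a | b
    return b % a == 0

# identities: reflexivity gives a morphism a -> a
assert all(hom(a, a) for a in N)
# composition: transitivity gives a -> c from a -> b and b -> c
assert all(hom(a, c) for a in N for b in N for c in N
           if hom(a, b) and hom(b, c))
# antisymmetry (what makes this a poset, not merely a preorder)
assert all(a == b for a in N for b in N if hom(a, b) and hom(b, a))
print("divisibility on 1..12 forms a poset, hence a category")
```

Associativity and the identity laws need no checking here: with at most one morphism between any two objects, any two morphisms with the same domain and codomain are equal.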
Although we have not yet finished discussing algebraic theories, we will continue our discussion in the more general framework of algebraic type theories, which allow us to simultaneously define multiple theories which may or may not refer to each other. (Algebraic type theories are also known as many-sorted algebraic theories, for reasons which will be apparent momentarily.)
As one of many motivations, consider group actions, which arise from a group \(G\) and a set \(X\), along with a function \(\cdot:G\times X\to X\) which is compatible with the group in the sense that \(\mathord{\sf{\text{e}}}\cdot x = x\) and \((g\circ h)\cdot x = g\cdot (h\cdot x)\). (We say that \(g\cdot x\) is obtained from the action of \(g\) on \(x\).)
For example, a permutation group has as elements permutations (of a set of size \(n\)), where \(\circ\) is composition of permutations. The action of a permutation group on an appropriately-sized ordered set is obtained by applying the permutation.
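This example is runnable. A sketch (my own encoding: permutations of \(\{0,1,2\}\) as tuples, acting on ordered triples) verifying both action laws:

```python
# Permutations of size 3 as tuples g, with g[i] = g(i); composition is the
# group operation, and the action moves the element at position i to
# position g(i). We verify both group-action laws over all of S_3.

import itertools

def compose_perm(g, h):
    # standard composition: (g ∘ h)(i) = g(h(i))
    return tuple(g[h[i]] for i in range(len(g)))

def invert(g):
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def act(g, xs):
    # left action: the element at position i moves to position g(i),
    # so position j of the result holds xs[g⁻¹(j)]
    ginv = invert(g)
    return tuple(xs[ginv[j]] for j in range(len(g)))

e = (0, 1, 2)   # the identity permutation
xs = ("a", "b", "c")

for g in itertools.permutations(range(3)):
    assert act(e, xs) == xs                                       # e · x = x
    for h in itertools.permutations(range(3)):
        assert act(compose_perm(g, h), xs) == act(g, act(h, xs))  # (g ∘ h) · x = g · (h · x)
print("both group-action laws hold for S_3 acting on (a, b, c)")
```

The inverse in `act` is what makes this a left action: without it, the two sides of the compatibility law come out in the wrong order.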
Although we can separately define groups and sets as algebraic theories, we cannot define group actions, because \(\cdot\) refers to both theories at once. To support multiple sorts of terms, we annotate each term with a type. This affects the right-hand side of each judgment, as well as each variable in the context. So our theory of groups would be written:
\[\frac{}{\Gamma\vdash \mathord{\sf{\text{e}}}:G}\qquad \frac{\Gamma\vdash t:G\qquad \Gamma\vdash s:G}{\Gamma\vdash t\circ s:G}\qquad \frac{\Gamma\vdash t:G}{\Gamma\vdash t^{-1}:G}\]
\[x:G,y:G,z:G\vdash (x\circ y)\circ z=x\circ (y\circ z):G\]
\[x:G\vdash x\circ\mathord{\sf{\text{e}}}=x:G\qquad x:G\vdash \mathord{\sf{\text{e}}}\circ x=x:G\]
\[x:G\vdash x\circ x^{-1}=\mathord{\sf{\text{e}}}:G\qquad x:G\vdash x^{-1}\circ x=\mathord{\sf{\text{e}}}:G\]
We read \(\Gamma\vdash e:G\) as saying that, in any context \(\Gamma\), \(e\) is a term of type \(G\), and \(x:G\vdash x\circ e=x:G\) as saying that, given any term \(x\) of type \(G\), then \(x\circ e\) and \(x\) are equal terms of type \(G\).
In the same theory, we can also define a type corresponding to some set, like:
\[\frac{}{\Gamma\vdash \mathord{\sf{\text{true}}}:\mathord{\sf{\text{bool}}}}\qquad \frac{}{\Gamma\vdash \mathord{\sf{\text{false}}}:\mathord{\sf{\text{bool}}}}\]
for the set with two elements. Then a group action on \(\mathord{\sf{\text{bool}}}\) consists of a \(\cdot\) constant which takes a term of type \(G\) and a term of type \(\mathord{\sf{\text{bool}}}\):
\[\frac{\Gamma\vdash t:G\qquad \Gamma\vdash s:\mathord{\sf{\text{bool}}}}{\Gamma\vdash t\cdot s:\mathord{\sf{\text{bool}}}}\]
Notice that we can no longer describe the arity of function symbols numerically, since the types of the arguments are now relevant. We instead use arities of the form \(A_1,\cdots,A_n\to B\); we say \(\cdot\) has arity \(G,\mathord{\sf{\text{bool}}}\to\mathord{\sf{\text{bool}}}\).
In an algebraic type theory, the hypothesis rule must now maintain the type of the variable:
\[\frac{}{x_1:A_1,\cdots,x_n:A_n\vdash x_i:A_i}\]
and the substitution rule must require that the types of the substituted terms match the types of the variables.
Of course, if they did not match, then we would not be able to obtain a derivation of the substituted term from a derivation of the original term in the manner previously described. (As before, weakening can be derived from substitution.)
Equalities are now annotated with the type of the two terms; terms can only be equated when their types are equal. Thus, the basic axioms are now:
\[\frac{\Gamma\vdash t:A}{\Gamma\vdash t=t:A}\qquad \frac{\Gamma\vdash s=t:A}{\Gamma\vdash t=s:A}\qquad \frac{\Gamma\vdash s=t:A\qquad \Gamma\vdash t=u:A}{\Gamma\vdash s=u:A}\]
A set-theoretic model of an algebraic type theory is an interpretation function \([\![- ]\!]\) which
sends each type \(A\) to a set \([\![A ]\!]\), and
each judgment \(x_1:A_1,\cdots,x_n:A_n\vdash t:B\) to a function \([\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\to [\![B ]\!]\), such that
for each judgment \(\Gamma\vdash s=t:A\), it is the case that \([\![\Gamma\vdash s:A ]\!]=[\![\Gamma\vdash t:A ]\!]\).
We extend the interpretation function to contexts by \([\![x_1:A_1,\cdots,x_n:A_n ]\!]= [\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\).
As with algebraic theories, we also require that the hypothesis rule is modeled by a projection out of the context tuple, \((x_1,\cdots,x_n)\mapsto x_i\), and that substitution of terms modeled by \(f_1,\cdots,f_n\) into a term modeled by \(g\) is modeled by \(\hat x\mapsto g(f_1(\hat x),\cdots,f_n(\hat x))\). It is worth verifying that these functions have the correct sets as domain and codomain.
Lastly, as with algebraic theories, it is the case that, for any choice of \([\![- ]\!]\) on types only, the behavior of \([\![- ]\!]\) on judgments is determined precisely by a choice of \([\![f ]\!]:[\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\to[\![B ]\!]\) for each function symbol \(f\) of arity \(A_1,\cdots,A_n\to B\). Again, it is worth verifying that the previous proof goes through.
We will henceforth treat theories primarily as collections of rules, rather than signatures equipped with axioms. This viewpoint allows us to treat extensions of algebraic theories as merely comprising additional rules; it also affords a uniform treatment of all the rules.
A set-theoretic model of an algebraic theory is a set \(X\) and an interpretation function \([\![- ]\!]\) which sends every judgment \(x_1,\cdots,x_n\vdash t\) to a function \([\![x_1,\cdots,x_n\vdash t ]\!]:X^n\to X\).
In particular, for each rule
\[\frac{\Delta\vdash s}{\Gamma\vdash t}\]
it must be the case that, if a function \([\![\Delta\vdash s ]\!]\) exists, then we can produce a function \([\![\Gamma\vdash t ]\!]\) in some fashion generic to the choice of \(\Delta,\Gamma,s,t\). And for each rule
\[\Gamma\vdash s=t\]
it must be the case that \([\![\Gamma\vdash s ]\!]=[\![\Gamma\vdash t ]\!]\).
Recall that \(x_1,\cdots,x_n\vdash t\) means that, so long as \(x_1,\cdots,x_n\) stand for terms, \(t\) is a term. Then \([\![x_1,\cdots,x_n\vdash t ]\!]\) is a function which takes an \(n\)-tuple of terms \((x_1,\cdots,x_n)\) and returns a term \(t\) which may contain them. In particular, the hypothesis rule
\[x_1,\cdots,x_n\vdash x_i\]
is modeled by the function \((x_1,\cdots,x_n)\mapsto x_i\) which projects out the \(i\)th element of the tuple. (We call this projection function \(\pi^n_i\).)
The substitution rule:
\[\frac{\Gamma\vdash s_1\quad\cdots\quad\Gamma\vdash s_n\qquad x_1,\cdots,x_n\vdash t}{\Gamma\vdash [s_1,\cdots,s_n/x_1,\cdots,x_n]t}\]
tells us that, from \(n\) functions \(f_1,\cdots,f_n:X^m\to X\) (given by \([\![\Gamma\vdash s_i ]\!]\)), and a function \(g:X^n\to X\) (given by \([\![x_1,\cdots,x_n\vdash t ]\!]\)), we can get a function \(X^m\to X\). We model this by the function \(\hat x\mapsto g(f_1(\hat x),\cdots,f_n(\hat x))\) which takes an \(m\)-tuple, forms an \(n\)-tuple by applying each \(f_i\) to that tuple, and passes all the results to \(g\).
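This clause of the model is itself a one-liner. A sketch (with \(X\) taken to be the integers, and the particular \(g,f_1,f_2\) as illustrative stand-ins of my own choosing):

```python
# The substitution clause of the model: given f_1,...,f_n modeling the
# substituted terms and g modeling the original term, the substituted
# term is modeled by x̂ ↦ g(f_1(x̂),...,f_n(x̂)).

def substitute(g, *fs):
    return lambda *xhat: g(*(f(*xhat) for f in fs))

# Example over X = int: model t = x1 ∘ x2 by addition, and substitute the
# terms modeled by (a ↦ a·a) and (a ↦ a+1) for x1 and x2.
g  = lambda x1, x2: x1 + x2
f1 = lambda a: a * a
f2 = lambda a: a + 1

model_of_substituted_term = substitute(g, f1, f2)
print(model_of_substituted_term(3))  # 3*3 + (3+1) = 13
```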
A side note to cognoscenti: this actually enforces alpha-equivalence (the irrelevance of variable names) in the model. For example, by substitution,
\[\frac{y\vdash y\qquad x\vdash t}{y\vdash [y/x]t}\]
shows that \([\![y\vdash [y/x]t ]\!]\) must be modeled by \(y\mapsto [\![x\vdash t ]\!](y)\), which is simply \([\![x\vdash t ]\!]\). The same argument applies with any number of variables in the context.
How do we model the signature rules? Consider \(\circ\). In particular, we have the instance
\[a,b\vdash a\circ b\]
which gives rise to a function \([\![a,b\vdash a\circ b ]\!]\). Once we have chosen this function—which we will abbreviate \([\![\circ ]\!]\)—we have in fact fixed the choice of \([\![\Gamma\vdash t\circ s ]\!]\) for each \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), and vice versa.
The converse is obvious. If we have chosen how to obtain \([\![\Gamma\vdash t\circ s ]\!]\) for each \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), then we simply plug in \([\![a,b\vdash a ]\!]\) and \([\![a,b\vdash b ]\!]\), which must be the projections \(\pi^2_1\) and \(\pi^2_2\), and obtain \([\![a,b\vdash a\circ b ]\!]=[\![\circ ]\!]\).
Given \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), where \(\Gamma\) has length \(n\), \([\![\Gamma\vdash t\circ s ]\!]:X^n\to X\) must be the function \(\hat x\mapsto [\![\circ ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\Gamma\vdash s ]\!](\hat x))\). To see this, apply substitution to \(a,b\vdash a\circ b\) with \(\Gamma\vdash t\) and \(\Gamma\vdash s\). On the one hand, \([\![\Gamma\vdash [t,s/a,b](a\circ b) ]\!]= [\![\Gamma\vdash t\circ s ]\!]\) because \(\Gamma\vdash [t,s/a,b](a\circ b)=t\circ s\). On the other, \([\![\Gamma\vdash [t,s/a,b](a\circ b) ]\!]\) must be \(\hat x\mapsto[\![a,b\vdash a\circ b ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\Gamma\vdash s ]\!](\hat x))\), which is what we asserted above.
Thus, modeling the \(\circ\) rule amounts to choosing a function \([\![\circ ]\!]:X\times X\to X\). Similarly, modeling \({}^{-1}\) requires a function \([\![{}^{-1} ]\!]:X\to X\), and \(\mathord{\sf{\text{e}}}\) an element \([\![\mathord{\sf{\text{e}}}]\!]\in X\).
This last one requires some explanation. As before, \([\![\cdot\vdash\mathord{\sf{\text{e}}}]\!]:X^0\to X\) is a function which takes the empty tuple \(()\) as its argument. But since there is only one empty tuple, \([\![\cdot\vdash\mathord{\sf{\text{e}}}]\!]\) always takes on the same value, so a choice of this function is simply a choice of an element of \(X\), and we say \([\![\mathord{\sf{\text{e}}}]\!]\in X\). (We call the set containing only the empty tuple \(1\), because it has one element; this notation draws an analogy with numerical exponentiation, where \(x^0=1\).)
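To see these choices in action, here is a Python sketch of my own (not from the post) of one model of the group signature, taking \(X\) to be the integers, \([\![\circ]\!]\) addition, \([\![{}^{-1}]\!]\) negation, and \([\![\mathord{\sf{e}}]\!]\) zero; a term in a context then evaluates to a function \(X^n\to X\) exactly as described above:

```python
# One model of the group signature on X = int:
comp = lambda x, y: x + y   # [[o]]   : X x X -> X
inv  = lambda x: -x         # [[^-1]] : X -> X
e    = 0                    # [[e]] in X  (a function X^0 -> X, i.e. an element)

# [[a, b |- (a o b)^-1 o e]] as a function X^2 -> X, built by
# composing the chosen interpretations:
interp = lambda a, b: comp(inv(comp(a, b)), e)

assert interp(3, 4) == -7   # -(3 + 4) + 0
```

Any other choice of set and functions satisfying the group axioms would serve equally well; the signature alone only dictates the arities.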
Finally, the weakening rule, which passes from \(x_1,\cdots,x_n\vdash t\) to \(x_1,\cdots,x_n,x_{n+1}\vdash t\), is derived from substitution, and so \([\![x_1,\cdots,x_n,x_{n+1}\vdash t ]\!]\) is \((x_1,\cdots,x_n,x_{n+1})\mapsto [\![x_1,\cdots,x_n\vdash t ]\!](x_1,\cdots,x_n)\), a function which simply ignores its last argument.
Because an interpretation \([\![- ]\!]\) must send terms which are judged equal to equal functions, we must verify that the constructions above satisfy the equality rules required of all algebraic theories.
Reflexivity, symmetry, and transitivity are satisfied simply because ordinary mathematical equality enjoys these properties. For example, if \(\Gamma\vdash t=s\), then the model has equal functions \([\![\Gamma\vdash t ]\!]=[\![\Gamma\vdash s ]\!]\); the symmetry axiom implies \(\Gamma\vdash s=t\), and so demands simply that \([\![\Gamma\vdash s ]\!]=[\![\Gamma\vdash t ]\!]\) also.
Congruence is satisfied because equality of mathematical functions is extensional (i.e., two functions are equal when they send equal arguments to equal results). Consider the example that \(\Gamma\vdash t=t'\) should imply \(\Gamma\vdash t\circ\mathord{\sf{\text{e}}}=t'\circ\mathord{\sf{\text{e}}}\). We have seen that \([\![\Gamma\vdash t\circ\mathord{\sf{\text{e}}}]\!]=\hat x\mapsto [\![\circ ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\mathord{\sf{\text{e}}}]\!])\) and similarly for \(t'\), but if \([\![\Gamma\vdash t ]\!]=[\![\Gamma\vdash t' ]\!]\), then these functions must be equal as well.
To see the general case, recall the congruence rule: from \(\Gamma\vdash s_i=u_i\) for each \(i\), it concludes \(\Gamma\vdash [s_1,\cdots,s_n/x_1,\cdots,x_n]t=[u_1,\cdots,u_n/x_1,\cdots,x_n]t\). Let \(f_i=[\![\Gamma\vdash s_i ]\!]\), \(g_i=[\![\Gamma\vdash u_i ]\!]\), and \(h=[\![x_1,\cdots,x_n\vdash t ]\!]\).
The two functions in the conclusion are, by the interpretation of substitution, \(\hat x\mapsto h(f_1(\hat x),\cdots,f_n(\hat x))\) and \(\hat x\mapsto h(g_1(\hat x),\cdots,g_n(\hat x))\). The hypotheses imply that \(f_i=g_i\) for each \(i\), so these functions are equal as well.
To summarize, a set-theoretic model of a theory is a set \(X\) and an interpretation function \([\![- ]\!]\) from judgments to functions \(X^n\to X\), where the hypothesis and substitution rules are interpreted in a particular fashion, and which obeys the remaining rules in a particular way.
In fact, this definition is sufficiently rigid that models are determined by a choice of interpretation for each function symbol, such that the axioms hold true (e.g., \([\![\circ ]\!](x,[\![\circ ]\!](y,z))= [\![\circ ]\!]([\![\circ ]\!](x,y),z)\) for associativity). This justifies our earlier statement that groups are in bijection with models of the algebraic theory of groups—a mathematical group is precisely a set equipped with such a choice of \([\![\circ ]\!],[\![{}^{-1} ]\!],[\![\mathord{\sf{\text{e}}}]\!]\) satisfying the axioms.
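On a finite set, one can check such a choice against the axioms by brute force. The following Python sketch (my own illustration) verifies that addition modulo 5 on \(\mathbb{Z}/5\), with negation and \(0\), satisfies associativity, identity, and inverses pointwise:

```python
# Candidate model on X = Z/5: [[o]] is addition mod 5,
# [[^-1]] is negation mod 5, [[e]] is 0.
X = range(5)
comp = lambda x, y: (x + y) % 5
inv  = lambda x: (-x) % 5
e    = 0

# Each axiom equates two terms, so the model must make the two
# interpreting functions equal on every tuple of arguments:
assert all(comp(x, comp(y, z)) == comp(comp(x, y), z)
           for x in X for y in X for z in X)          # associativity
assert all(comp(x, e) == x == comp(e, x) for x in X)  # identity
assert all(comp(x, inv(x)) == e for x in X)           # inverses
```

If any assertion failed, the chosen functions would still model the signature, but not the theory.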
We have seen how to present a mathematical structure as an algebraic theory, but have been vague about precisely which structures are denoted by any particular collection of rules.
Last time, we wrote down the group laws as an algebraic theory (with signature \(\mathord{\sf{\text{e}}}\), \({}^{-1}\), \(\circ\), and axioms for identity, inverses, and associativity) and said that we had defined groups.
Our signature mentions only one base term, \(\mathord{\sf{\text{e}}}\), so in some sense we have defined only the one-element group. On the other hand, we could get more mileage out of our theory by using it to describe all groups, in the sense that any group will be compatible with the given signature and axioms—although it may have additional rules (terms and equations) not enforced by the base theory itself.
As a more damning example, consider the natural numbers, which consist of
the zero number \(\mathord{\sf{\text{z}}}\), and
a successor function \(\mathord{\sf{\text{s}}}\) which sends a number to itself plus one.
We might say the natural numbers are an algebraic theory whose signature has an arity-zero constant \(\mathord{\sf{\text{z}}}\) and an arity-one constant \(\mathord{\sf{\text{s}}}\).
It is easy to see that this theory must contain the terms \(\mathord{\sf{\text{s}}}(\mathord{\sf{\text{s}}}(\cdots\mathord{\sf{\text{s}}}(\mathord{\sf{\text{z}}})))\). We add no axioms, because \(\mathord{\sf{\text{s}}}^n(\mathord{\sf{\text{z}}})=\mathord{\sf{\text{s}}}^n(\mathord{\sf{\text{z}}})\) already holds by reflexivity, and we would like \(\mathord{\sf{\text{s}}}^n(\mathord{\sf{\text{z}}})\neq\mathord{\sf{\text{s}}}^m(\mathord{\sf{\text{z}}})\) when \(n\neq m\), since different numbers are always nonequal.
If we allow a group to have elements besides \(\mathord{\sf{\text{e}}}\), then the natural numbers, defined this way, might have elements besides the \(\mathord{\sf{\text{s}}}^n(\mathord{\sf{\text{z}}})\). Worse yet, they might have additional equalities—the axiom \(n\vdash n=\mathord{\sf{\text{z}}}\) is compatible with the theory, and causes all numbers to equal zero!
To better understand these issues, we introduce the notion of a set-theoretic model of an algebraic theory. Intuitively, a theory is modeled by a set \(M\) if we can map terms in the theory to elements of \(M\) in such a way that equal terms (in the sense of the theory’s equality) are sent to the same element of \(M\). This will eventually let us make precise ideas like:
a set has a group structure if and only if it is a set-theoretic model of the algebraic theory of groups;
the one-element group is the smallest possible group (for some precise notion of ‘smallest’); and
the natural numbers are exactly the smallest model of the algebraic theory of natural numbers.
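The last point can already be previewed concretely. In the Python sketch below (my own, not from the post), both interpretations satisfy the axiom-free theory of \(\mathord{\sf{z}}\) and \(\mathord{\sf{s}}\), but only the first keeps the terms \(\mathord{\sf{s}}^n(\mathord{\sf{z}})\) distinct:

```python
# Intended interpretation: the natural numbers, with z = 0 and
# s = successor.
z1, s1 = 0, lambda n: n + 1

# Degenerate interpretation: the one-element set {0}, with s the
# identity. Nothing in the theory rules it out, but it collapses
# every term to z.
z2, s2 = 0, lambda n: n

def iterate(s, z, n):
    """Interpret the term s(s(...s(z))) with n occurrences of s."""
    for _ in range(n):
        z = s(z)
    return z

assert iterate(s1, z1, 3) != iterate(s1, z1, 5)  # distinct, as desired
assert iterate(s2, z2, 3) == iterate(s2, z2, 5)  # collapsed to a point
```

Making "smallest" precise, so that the first interpretation is singled out, is exactly what the coming posts are after.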
Next time, we will examine each rule in a theory—corresponding to signatures, structural rules, and axioms—and mirror each set-theoretically.
But the core idea is simple enough: to model a signature in a set \(M\) is to interpret each constant \(c\) of arity \(a\) as a genuine function \([\![ c ]\!]:M^a\to M\), i.e., a function which takes an \(a\)-tuple of elements of \(M\) to a single element. So if a set \(G\) models groups, then it is equipped with, in part, a multiplication function \([\![\circ ]\!]:G\times G\to G\) and an inverse function \([\![{}^{-1} ]\!]:G\to G\). In other words, it is a group in the ordinary sense.
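As a minimal sketch of this idea (the representation is my own choosing, not standard), a signature can be a map from constants to arities, and a model of it a map from constants to functions of matching arity:

```python
import inspect

# The group signature: each constant with its arity.
signature = {"e": 0, "inv": 1, "comp": 2}

# A model on M = int: each constant c of arity a becomes a
# function M^a -> M.
model = {
    "e":    lambda: 0,            # arity 0: M^0 -> M, i.e. an element
    "inv":  lambda x: -x,         # arity 1: M -> M
    "comp": lambda x, y: x + y,   # arity 2: M x M -> M
}

# Check that each interpretation accepts the right number of arguments.
for c, arity in signature.items():
    assert len(inspect.signature(model[c]).parameters) == arity

# Closed terms then evaluate to elements of M:
assert model["comp"](model["inv"](2), model["e"]()) == -2
```

Whether such a choice also satisfies the group axioms is a separate question, taken up next time.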