“The abstract structure that all progressions have in common”

Hello! I’ve been terribly caught up with research lately, but the next several posts on categorical logic are in the pipeline, and will surface sooner or later. Until then, here’s a nice quote about representation independence in mathematics.

Therefore, numbers are not objects at all, because in giving the properties (that is, necessary and sufficient) of numbers you merely characterize an abstract structure—and the distinction lies in the fact that the “elements” of the structure have no properties other than those relating them to other “elements” of the same structure. [...] To be the number 3 is no more and no less than to be preceded by 2, 1, and possibly 0, and to be followed by 4, 5, and so forth. And to be the number 4 is no more and no less than to be preceded by 3, 2, 1, and possibly 0, and to be followed by…Any object can play the role of 3; that is, any object can be the third element in some progression. [...]

Arithmetic is therefore the science that elaborates the abstract structure that all progressions have in common merely in virtue of being progressions. It is not a science concerned with particular objects—the numbers. The search for which independently identifiable particular objects the numbers really are (sets? Julius Caesars?) is a misguided one.

— Paul BenacerrafWhat numbers could not be (1965)

Generalized elements, functions, and categories

(Part 6 of a series.)

Set theory centers around a two-place judgment \(x\in X\)—that \(x\) is an element of the set \(X\). Type theories, in contrast, are built on a three-place judgment \(\Gamma\vdash t:B\)—that under the assumptions \(\Gamma\), \(t\) is a term in the type \(B\). As we will see, the latter notion generalizes the former, and suggests a new way to use sets.

Generalized elements are functions

In a set-theoretic model, \([\![\cdot\vdash b:B ]\!]\) is an element of the set \([\![B ]\!]\). \([\![x:A\vdash t:B ]\!]\) is a function \([\![A ]\!]\to[\![B ]\!]\), which sends any element of \([\![A ]\!]\) to an element of \([\![B ]\!]\). Indeed, the substitution

says that an element \(s\) of \([\![A ]\!]\) coupled with such a function \([\![A ]\!]\to[\![B ]\!]\) produces an element \([s/x]t\) of \([\![B ]\!]\).

Notice that, in set theory, \(b\) is an element while \(t\) is a function; in type theory, \(b\) is an element under no assumptions, while \(t\) is an element under the assumption \(x:A\). So we can think of a function \([\![A ]\!]\to[\![B ]\!]\) as an element of \([\![B ]\!]\) pending an element of \([\![A ]\!]\), and an element of \([\![B ]\!]\) as an element of \([\![B ]\!]\) pending nothing (technically, pending an element of \(1\)).

Mathematically, these functions are called generalized elements: for any sets \(X,Y\), an \(X\)-element of \(Y\) is a function \(X\to Y\). Then \(t\) is an \([\![A ]\!]\)-element of \([\![B ]\!]\), and \(b\) is a \(1\)-element of \([\![B ]\!]\). (We also call \(1\)-elements global elements, in the sense that they are elements apropos of nothing.)

The idea of generalized elements may seem frivolous, but let us make a few observations. First, they correspond closely to type-theoretic judgments. In a set-theoretic model, \([\![\Gamma\vdash t:B ]\!]\) is a \([\![\Gamma]\!]\)-element of \([\![B ]\!]\). Conversely, \(f\) is an \(A\)-element of \(B\) is basically a three-place judgment which says that, under the assumptions \(A\), \(t\) is an element of \(B\). Think of it like “\(A\vdash f:B\)” (but mind the scare quotes).

Secondly, generalized elements are defined only in terms of functions between sets, not in terms of sets’ elements themselves. As we will see in the next few posts, this concept is enough to formulate a surprising amount of set-theoretic reasoning.

Two simple examples: given a global element of \(A\), and a function \(A\to B\), we can obtain a global element of \(B\) by composing these functions.

A function \(f:A\to B\) is injective if two global elements \(a,a'\) are equal (as functions) whenever their postcompositions by \(f\) are equal (as functions).

Lastly, if we define set-theoretic notions purely in terms of functions, as we have started to do above, we can port these definitions directly to more complicated structures by replacing set with foo, and function with functions between foos. (As we will soon see, these ported definitions will even be correct!)

Categories

This is exactly the modus operandi of category theory. A category is essentially a choice of foo and functions between foos satisfying some very basic conditions that allow these kinds of definitions to make sense. The foos of a category are called its objects, and the functions are called its morphisms.

In category theory, we say things like, “In a category where such and such is true, the baz of two objects is defined to be an object which has some morphisms that do such and such.” After a few more definitions, the upshot is that, in any category which has bazzes, every morphism out of a baz happens to be qux.

It can definitely be abstruse at times—some call it abstract nonsense—but the point is that category theory studies patterns which recur in different fields of mathematics, and provides a precise language for relating different sorts of mathematical structures (like type theories and sets!).

On to the definition. A category is a collection of objects \(A,B,C\) and morphisms \(f,g\) which go from an object to an object; \(f:A\to B\) means that \(f\) goes from \(A\) to \(B\). We say \(A\) is the domain of \(f\), and \(B\) the codomain. (It is useful to think of \(f:A\to B\) both as a “function” from \(A\) to \(B\) and as an \(A\)-element of \(B\), depending on the situation.)

Furthermore, if we have morphisms \(f:A\to B\) and \(g:B\to C\), we can compose them to obtain a morphism \(gf:A\to C\); composition is associative (\((hg)f=h(gf)\)), and each object \(A\) has an identity morphism \(\textbf{id}_A\) (\(f\textbf{id}_A=f=\textbf{id}_B f\)).

There is a category \(\textbf{Set}\) whose objects are sets, morphisms \(A\to B\) are functions from \(A\) to \(B\), composition is function composition, and identities are identity functions.

\(\textbf{Grp}\) is a category whose objects are groups, and whose morphisms are group homomorphisms; \(\textbf{Top}\) is a category whose objects are topological spaces, and whose morphisms are continuous functions. In both cases, composition and identities are the ordinary notions, and satisfy the necessary equations. There are no non-homomorphism (resp., discontinuous) functions in \(\textbf{Grp}\) (resp., \(\textbf{Top}\)); we’ll see next time that this is the reason our definitions make sense, so to speak, in these categories.

Here’s one final example of a rather different flavor: recall that a partial order, or poset, is a set equipped with a relation \(\preceq\) which is

  1. reflexive: \(x\preceq x\);

  2. antisymmetric: if \(x\preceq y\) and \(y\preceq x\), then \(x=y\); and

  3. transitive: if \(x\preceq y\) and \(y\preceq z\), then \(x\preceq z\).

Examples include the natural or real numbers with \(\leq\). The subsets of any set form a poset with \(\subseteq\). So does divisibility over the natural numbers: say \(a|b\) if \(a\) divides \(b\). Clearly \(a|a\); if \(a|b\) and \(b|a\), then \(a=b\); and if \(a|b\) and \(b|c\), then \(a|c\).

Any poset gives rise to a category whose objects are the elements of the poset, and for any two objects \(A,B\), there is exactly one morphism \(A\to B\) if \(A\preceq B\), and none otherwise.

Why is this a category? By reflexivity, \(A\preceq A\), so there is exactly one morphism \(A\to A\), which must be \(\textbf{id}_A\). Given morphisms \(A\to B\) and \(B\to C\), the transitivity of \(\preceq\) guarantees a unique morphism \(A\to C\), which we define to be the result of composing the morphisms. Associativity and identity follow trivially from the fact that there is at most one morphism between any two objects.

Algebraic type theories

(Part 5 of a series.)

Although we have not yet finished discussing algebraic theories, we will continue our discussion in the more general framework of algebraic type theories, which allow us to simultaneously define multiple theories which may or may not refer to each other. (Algebraic type theories are also known as many-sorted algebraic theories, for reasons which will be apparent momentarily.)

As one of many motivations, consider group actions, which arise from a group \(G\) and a set \(X\), along with a function \(\cdot:G\times X\to X\) which is compatible with the group in the sense that \(\mathord{\sf{\text{e}}}\cdot x = x\) and \((g\circ h)\cdot x = g\cdot (h\cdot x)\). (We say that \(g\cdot x\) is obtained from the action of \(g\) on \(x\).)

For example, a permutation group has as elements permutations (of a set of size \(n\)), where \(\circ\) is composition of permutations. The action of a permutation group on an appropriately–sized ordered set is obtained by applying the permutation.

Although we can separately define groups and sets as algebraic theories, we cannot define group actions, because \(\cdot\) refers to both theories at once. To support multiple sorts of terms, we annotate each term with a type. This affects the right-hand side of each judgment, as well as each variable in the context. So our theory of groups would be written:

We read \(\Gamma\vdash e:G\) as saying that, in any context \(\Gamma\), \(e\) is a term of type \(G\), and \(x:G\vdash x\circ e=x:G\) as saying that, given any term \(x\) of type \(G\), then \(x\circ e\) and \(x\) are equal terms of type \(G\).

In the same theory, we can also define a type corresponding to some set, like:

for the set with two elements. Then a group action on \(\mathord{\sf{\text{bool}}}\) consists of a \(\cdot\) constant which takes a term of type \(G\) and a term of type \(\mathord{\sf{\text{bool}}}\):

Notice that we can no longer describe the arity of function symbols numerically, since the types of the arguments are now relevant. We instead use arities of the form \(A_1,\cdots,A_n\to B\); we say \(\cdot\) has arity \(G,\mathord{\sf{\text{bool}}}\to\mathord{\sf{\text{bool}}}\).

In an algebraic type theory, the hypothesis rule must now maintain the type of the variable:

and the substitution rule must require that the types of the substituted terms match the types of the variables.

Of course, if they did not match, then we would not be able to obtain a derivation of the substituted term from a derivation of the original term in the manner previously described. (As before, weakening can be derived from substitution.)

Equalities are now annotated with the type of the two terms; terms can only be equated when their types are equal. Thus, the basic axioms are now:

A set-theoretic model of an algebraic type theory is an interpretation function \([\![- ]\!]\) which

  1. sends each type \(A\) to a set \([\![A ]\!]\), and

  2. each judgment \(x_1:A_1,\cdots,x_n:A_n\vdash t:B\) to a function \([\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\to [\![B ]\!]\), such that

  3. for each judgment \(\Gamma\vdash s=t:A\), it is the case that \([\![\Gamma\vdash s:A ]\!]=[\![\Gamma\vdash t:A ]\!]\).

We extend the interpretation function to contexts by \([\![x_1:A_1,\cdots,x_n:A_n ]\!]= [\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\).

As with algebraic theories, we also require that the hypothesis rule is modeled by a projection out of the context tuple, \((x_1,\cdots,x_n)\mapsto x_i\), and and that substitution of terms modeled by \(f_1,\cdots,f_n\) into a term modeled by \(g\) is modeled by \(\hat x\mapsto g(f_1(\hat x),\cdots,f_n(\hat x))\). It is worth verifying that these functions have the correct sets as domain and codomain.

Lastly, as with algebraic theories, it is the case that, for any choice of \([\![- ]\!]\) on types only, the behavior of \([\![- ]\!]\) on judgments is determined precisely by a choice of \([\![f ]\!]:[\![A_1 ]\!]\times\cdots\times[\![A_n ]\!]\to[\![B ]\!]\) for each function symbol \(f\) of arity \(A_1,\cdots,A_n\to B\). Again, it is worth verifying that the previous proof goes through.

Set-theoretic models of algebraic theories: II

(Part 4 of a series.)

We will henceforth treat theories primarily as collections of rules, rather than signatures equipped with axioms. This viewpoint allows us to treat extensions of algebraic theories as merely comprising additional rules; it also affords a uniform treatment of all the rules.

A set-theoretic model of an algebraic theory is a set \(X\) and an interpretation function \([\![- ]\!]\) which sends every judgment \(x_1,\cdots,x_n\vdash t\) to a function \([\![x_1,\cdots,x_n\vdash t ]\!]:X^n\to X\).

In particular, for each rule

it must be the case that, if a function \([\![\Delta\vdash s ]\!]\) exists, then we can produce a function \([\![\Gamma\vdash t ]\!]\) in some fashion generic to the choice of \(\Delta,\Gamma,s,t\). And for each rule

it must be the case that \([\![\Gamma\vdash s ]\!]=[\![\Gamma\vdash t ]\!]\).

Structural rules

Recall that \(x_1,\cdots,x_n\vdash t\) means that, so long as \(x_1,\cdots,x_n\) stand for terms, \(t\) is a term. Then \([\![x_1,\cdots,x_n\vdash t ]\!]\) is a function which takes an \(n\)-tuple of terms \((x_1,\cdots,x_n)\) and returns a term \(t\) which may contain them. In particular, the hypothesis rule

is modeled by the function \((x_1,\cdots,x_n)\mapsto x_i\) which projects out the \(i\)th element of the tuple. (We call this projection function \(\pi^n_i\).)

The substitution rule:

tells us that, from \(n\) functions \(f_1,\cdots,f_n:X^m\to X\) (given by \([\![\Gamma\vdash s_i ]\!]\)) , and a function \(g:X^n\to X\) (given by \([\![x_1,\cdots,x_n\vdash t ]\!]\)), we can get a function \(X^m\to X\). We model this by the function \(\hat x\mapsto g(f_1(\hat x),\cdots,f_n(\hat x))\) which takes an \(m\)-tuple, forms an \(n\)-tuple by applying each \(f_i\) to that tuple, and passes all the results to \(g\).

A side note to cognoscenti: this actually enforces alpha-equivalence (the irrelevance of variable names) in the model. For example, by substitution,

shows that \([\![y\vdash [y/x]t ]\!]\) must be modeled by \(y\mapsto [\![x\vdash t ]\!](y)\), which is simply \([\![x\vdash t ]\!]\). The same argument applies with any number of variables in the context.

Signature rules

How do we model the signature rules? Consider \(\circ\). In particular, we have the instance

which gives rise to a function \([\![a,b\vdash a\circ b ]\!]\). Once we have chosen this function—which we will abbreviate \([\![\circ ]\!]\)—we have in fact fixed the choice of \([\![\Gamma\vdash t\circ s ]\!]\) for each \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), and vice versa.

The converse is obvious. If we have chosen how to obtain \([\![\Gamma\vdash t\circ s ]\!]\) for each \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), then we simply plug in \([\![a,b\vdash a ]\!]\) and \([\![a,b\vdash b ]\!]\), which must be the projections \(\pi^2_1\) and \(\pi^2_2\), and obtain \([\![a,b\vdash a\circ b ]\!]=[\![\circ ]\!]\).

Given \([\![\Gamma\vdash t ]\!]\) and \([\![\Gamma\vdash s ]\!]\), where \(\Gamma\) has length \(n\), \([\![\Gamma\vdash t\circ s ]\!]:X^n\to X\) must be the function \(\hat x\mapsto [\![\circ ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\Gamma\vdash s ]\!](\hat x))\). To see this, by substitution,

where \([\![\Gamma\vdash [t,s/a,b](a\circ b) ]\!]= [\![\Gamma\vdash t\circ s ]\!]\) because \(\Gamma\vdash [t,s/a,b](a\circ b)=t\circ s\). But \([\![\Gamma\vdash [t,s/a,b](a\circ b) ]\!]\) must be \(\hat x\mapsto[\![a,b\vdash a\circ b ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\Gamma\vdash s ]\!](\hat x))\), which is what we asserted above.

Thus, modeling the \(\circ\) rule amounts to choosing a function \([\![\circ ]\!]:X\times X\to X\). Similarly, modeling \({}^{-1}\) requires a function \([\![{}^{-1} ]\!]:X\to X\), and \(\mathord{\sf{\text{e}}}\) an element \([\![\mathord{\sf{\text{e}}}]\!]\in X\).

This last one requires some explanation. As before, \([\![\cdot\vdash\mathord{\sf{\text{e}}}]\!]:X^0\to X\) is a function \(()\mapsto [\![\mathord{\sf{\text{e}}}]\!]()\) which takes an empty tuple of terms. But since there is only one empty tuple, \([\![\mathord{\sf{\text{e}}}]\!]\) always takes on the same value, so a choice of \([\![\mathord{\sf{\text{e}}}]\!]\) is simply a choice of an element of \(X\), and we say \([\![\mathord{\sf{\text{e}}}]\!]\in X\). (We call the set containing only the empty tuple \(1\), because it has one element; this notation draws an analogy with numerical exponentiation, where \(x^0=1\).)

Finally, the weakening rule

is derived from substitution, and so \([\![x_1,\cdots,x_n,x_{n+1}\vdash t ]\!]\) is

Axioms

Because an interpretation \([\![- ]\!]\) must send equal judgments to equal functions, we must verify that the constructions above satisfy the axioms required of all algebraic theories.

Reflexivity, symmetry, and transitivity are satisfied simply because ordinary mathematical equality enjoys these properties. For example, if \(\Gamma\vdash t=s\), then the model has equal functions \([\![\Gamma\vdash t ]\!]=[\![\Gamma\vdash s ]\!]\); the symmetry axiom implies \(\Gamma\vdash s=t\), and so demands simply that \([\![\Gamma\vdash s ]\!]=[\![\Gamma\vdash t ]\!]\) also.

Congruence is satisfied because equality of mathematical functions is extensional (i.e., two functions are equal when they send equal arguments to equal results). Consider the example that \(\Gamma\vdash t=t'\) should imply \(\Gamma\vdash t\circ\mathord{\sf{\text{e}}}=t'\circ\mathord{\sf{\text{e}}}\). We have seen that \([\![\Gamma\vdash t\circ\mathord{\sf{\text{e}}}]\!]=\hat x\mapsto [\![\circ ]\!]([\![\Gamma\vdash t ]\!](\hat x),[\![\mathord{\sf{\text{e}}}]\!])\) and similarly for \(t'\), but if \([\![\Gamma\vdash t ]\!]=[\![\Gamma\vdash t' ]\!]\), then these functions must be equal as well.

To see the general case, recall the congruence rule. Let \(f_i=[\![\Gamma\vdash s_i ]\!]\), \(g_i=[\![\Gamma\vdash u_i ]\!]\), and \(h=[\![x_1,\cdots,x_n\vdash t ]\!]\).

The two functions in the conclusion are, by the interpretation of substitution, \(\hat x\mapsto h(f_1(\hat x),\cdots,f_n(\hat x))\) and \(\hat x\mapsto h(g_1(\hat x),\cdots,g_n(\hat x))\). The hypotheses imply that \(f_i=g_i\) for each \(i\), so these functions are equal as well.

To summarize, a set-theoretic model of a theory is a set \(X\) and interpretation function \([\![- ]\!]\) from judgments to functions \(X^n\to X\), where the hypothesis and subsitution rules are interpreted in a particular fashion, and which obeys the remaining rules in a particular way.

In fact, this definition is sufficiently rigid that models are determined by a choice of interpretation for each function symbol, such that the axioms hold true (i.e., \([\![\circ ]\!](x,[\![\circ ]\!](y,z))= [\![\circ ]\!]([\![\circ ]\!](x,y),z)\)). This justifies our earlier statement that groups are in bijection with models of the algebraic theory of groups—a mathematical group is precisely a set equipped with such a choice of \([\![\circ ]\!],[\![{}^{-1} ]\!],[\![\mathord{\sf{\text{e}}}]\!]\) satisfying the axioms.

Set-theoretic models of algebraic theories: I

(Part 3 of a series.)

We have seen how to present a mathematical structure as an algebraic theory, but have been vague about precisely which structures are denoted by any particular collection of rules.

Last time, we wrote down the group laws as an algebraic theory (with signature \(\mathord{\sf{\text{e}}}\), \({}^{-1}\), \(\circ\), and axioms for identity, inverses, and associativity) and said that we had defined groups.

Our signature mentions only one base term, \(\mathord{\sf{\text{e}}}\), so in some sense we have defined only the one-element group. On the other hand, we could get more mileage out of our theory by using it to describe all groups, in the sense that any group will be compatible with the given signature and axioms—although it may have additional rules (terms and equations) not enforced by the base theory itself.

As a more damning example, consider the natural numbers, which consist of

  1. the zero number \(\mathord{\sf{\text{z}}}\), and

  2. a successor function \(\mathord{\sf{\text{s}}}\) which sends a number to itself plus one.

We might say the natural numbers are an algebraic theory whose signature has an arity zero constant \(\mathord{\sf{\text{z}}}\) and an arity one constant \(\mathord{\sf{\text{s}}}\):

It is easy to see this theory must contain the terms \(\mathord{\sf{\text{s}}}(\mathord{\sf{\text{s}}}(\cdots\mathord{\sf{\text{s}}}(\mathord{\sf{\text{z}}})))\). We add no axioms, because

by reflexivity, and we would like

when \(n\neq m\), since different numbers are always nonequal.

If we allow a group to have elements besides \(\mathord{\sf{\text{e}}}\), then the natural numbers, defined this way, might have elements besides the \(\mathord{\sf{\text{s}}}^n(\mathord{\sf{\text{z}}})\). Worse yet, they might have additional equalities—the axiom \(n\vdash n=\mathord{\sf{\text{z}}}\) is compatible with the theory, and causes all numbers to equal zero!

To better understand these issues, we introduce the notion of a set-theoretic model of an algebraic theory. Intuitively, a theory is modeled by a set \(M\) if we can map terms in the theory to elements of \(M\) in such a way that equal terms (in the sense of the theory’s equality) are sent to the same element of \(M\). This will eventually let us make precise ideas like:

  1. a set has a group structure if and only if it is a set-theoretic model of the algebraic theory of groups;

  2. the one-element group is the smallest possible group (for some precise notion of ‘smallest’); and

  3. the natural numbers are exactly the smallest model of the algebraic theory of natural numbers.

Next time, we will examine each rule in a theory—corresponding to signatures, structural rules, and axioms—and mirror each set-theoretically.

But the core idea is simple enough: to model a signature in a set \(M\) is to interpret each constant \(c\) of arity \(a\) as a genuine function \([\![ c ]\!]:M^a\to M\), i.e., which takes an \(a\)-tuple of elements of \(M\) to a single element. So if a set \(G\) models groups, then it is equipped with, in part, a multiplication function \([\![\circ ]\!]:G\times G\to G\) and an inverse function \([\![{}^{-1} ]\!]:G\to G\). In other words, it is a group in the ordinary sense.