Approximability of Deep Computations

Samson Alva, Department of Economics, The University of Texas at San Antonio, San Antonio, TX 78249, U.S.A. [email protected]

Eduardo Dueñez, Department of Mathematics, The University of Texas at San Antonio, San Antonio, TX 78249, U.S.A. [email protected]

José Iovino, Department of Mathematics, The University of Texas at San Antonio, San Antonio, TX 78249, U.S.A. [email protected]

Claire Walton, Department of Electrical and Computer Engineering and Department of Mathematics, The University of Texas at San Antonio, San Antonio, TX 78249, U.S.A. [email protected]

(Date: May 18, 2025)

Abstract.

We introduce a structural framework for computations involving floating-point operations. Informed by real-valued logic, we introduce deep computations (ultracomputations) and deep iterates, formalizing the ideas of “asymptotic limit” of computations and compositional iterates, respectively.

As an application of this framework, we prove the existence of deep equilibria, which hitherto have been found only empirically (yielding remarkable memory savings in deep learning). Our proof of existence of deep equilibria is based on the concept of idempotent ultrafilter from combinatorics and inspired by the notion of indiscernibility from model theory.

We study and characterize deep computations (and hence deep equilibria) that are bona fide computable, i.e., uniformly approximable by a priori given computable primitive real-valued functions. Informed by model theory of real-valued structures, as well as $C_{p}$ -theory from topology, we use a classical result of Grothendieck to characterize computability of deep computations in terms of continuous extendibility.

Our framework does not impose a priori uniform/global bounds on real-valued quantities; therefore, our structures yield non-compact type spaces. Such type spaces require a more nuanced topological treatment than the compact ones arising in model theory of $[0,1]$-valued structures.

Key words and phrases:

Deep computations, ultracomputations, deep equilibrium models, idempotent ultrafilters

2000 Mathematics Subject Classification:

68T27, 68T07, 03C98, 05D10, 54D80

1. Introduction

In this paper, we introduce a general notion of computation which, we contend, captures the essence of digital computations involving floating-point arithmetic. As computing power expands, so does the need for foundational frameworks to understand systems and applications, particularly their asymptotic properties as scale and scope grow potentially indefinitely; such frameworks are needed across multiple areas of computational science and engineering. A prominent example is that of neural networks, which pass computations through an increasing number of compositional layers. Deep learning systems are typically based on networks with a large number of layers (i.e., increasingly deep networks), which are correspondingly expensive to compute; it is of enormous importance to approximate the output of such deep networks more efficiently, or even simply to understand whether such approximation is possible in principle.

Notable recent frameworks working to leverage asymptotic properties of deep networks include:

•

Neural Ordinary Differential Equations (Neural ODEs) [CRBD18], [HLA⁺21], which introduce ODEs as mechanisms to capture residual deep networks asymptotically (as the number of layers goes to infinity), and use numerical or analytical ([HLA⁺22]) ODE solutions as “shortcuts” to approximate the output of such deep networks without the need for many iterations; and
•

Deep Equilibrium Networks (DEQs) [BKK19], [BKK20], [APL⁺22], which model deep networks arising by iterating the same (“parameter-tied”) layer transition indefinitely (in cases when such a network reaches an asymptotic equilibrium), and then use fixed-point numerical solvers to shortcut the implied deep computation.

Related problems have also arisen in numerical optimization, where discrete iterative optimization algorithms—whose asymptotic properties as the step size goes to zero are of perennial interest and importance—are now modeled asymptotically using dynamical systems [SDJS22], and in control theory, where properties of parameter-dependent asymptotic computations are of central importance in key applications [CRBD18, HL⁺19, LJ23].

Such approaches to asymptotic (‘deep’) computations are fundamentally asking whether a complex composition of function applications may be realized through a smaller computation. From the perspective of this paper: “Can the result of some large (conceptually infinite) sequence of function compositions be approximated, effectively and finitarily, from accepted computational primitives (‘atomic predicates’) and standard floating-point operations?” Without precise context, the questions of whether the asymptotic limit of a computation exists, and when it can be feasibly approximated, are ill-defined; in this paper, we propose a framework and basic tools that we hope will be useful in addressing such questions, among others, concerning the notion and nature of deep computations.

The remainder of this introductory section provides an informal overview of our approach and main results.

Our computations $\gamma$ are transformations accepting as input a state $v$ and returning an output state $w=\gamma(v)$. We posit that states $v$ be uniquely characterized in terms of a collection of real-valued quantities $P(v)$ (as $P$ varies over a fixed collection $\mathcal{P}$ of “primitive predicates”). We call each real value $P(v)$ a feature of $v$; it is appropriate to think of such a feature as the “$P$-th coordinate” of $v$. Indeed, a particular instance of our framework is when states are vectors $v\in\mathbb{R}^{n}$ and there is one predicate $P_{i}(v)=v_{i}$ for each of the coordinates $v_{i}$ ($1\leq i\leq n$) of $v$. Each predicate $P$ is “atomic” in the sense that it captures a primitive feature of states $v$, i.e., features of any given state are regarded as computable ab initio. De facto, a computation $\gamma$ effectively maps the collection $(P(v))_{P\in\mathcal{P}}$ of predicate values of an arbitrary input state $v$ to another such collection $(P(\gamma(v)))_{P\in\mathcal{P}}$ uniquely characterizing the output state $\gamma(v)$.

Understood in such generality, transformations $v\mapsto\gamma(v)$ can hardly be called computations in any reasonable sense. Indeed, the identification of a state with the collection of real values of its primitive predicates in any sensible paradigm for floating-point computations transforming states implies that each real-valued feature $Q(w)$ of the output state $w=\gamma(v)$ ought to depend continuously on features $P(v)$ of the input state; such continuity assumption is implied by the tenet that floating-point computations are intrinsically approximate, never quite exact. In general, one cannot expect finite-precision calculations to be able to approximate real-valued features (of the output) that vary discontinuously with respect to the real-valued quantities that one uses to encode the input state!

For simplicity, we assume the inputs $v$ and outputs $w=\gamma(v)$ of computations $\gamma$ belong to the same state space $L$ which, when endowed with the collection $\mathcal{P}$ of real-valued predicates $P:L\to\mathbb{R}$ , becomes a Computation States Structure (CSS) $\underline{L}=\langle L,\mathcal{P}\rangle$ .

Our results are deeper (and, hopefully, more illuminating) when the CSS is endowed with an infinite predicate collection $\mathcal{P}$. Because of the importance of this special case, and for ease of exposition, throughout the remainder of this introductory section, we assume:

$\mathcal{P}$ is countable.

The case of $\mathcal{P}$ countable is quite relevant in applications for reasons we now explain. Let the predicate collection be $\mathcal{P}=(P_{n})_{n\in\mathbb{N}}$ . Every state $v\in L$ is characterized (at least from the purely structural perspective we adopt) by the real sequence $(P_{n}(v))_{n\in\mathbb{N}}$ of its predicate values, called the type $\operatorname{tp}(v)$ of $v$ ; thus, $L$ is effectively identified with the set of such types, i.e., with a subset $L\subseteq\mathbb{R}^{\mathbb{N}}$ of the set $\mathbb{R}^{\mathbb{N}}$ of all real sequences $(r_{n})_{n\in\mathbb{N}}$ ; one may think of $r_{n}=P_{n}(v)$ as the “ $n$ -th entry” of a state $v$ . We shall require that computations $\gamma:L\to L$ have “features” (i.e., entries) $P_{n}{\circ}\gamma:v\mapsto P_{n}(\gamma(v))$ varying continuously with (the type of) $v$ for each $n\in\mathbb{N}$ ; explicitly, every such computation feature $P_{n}{\circ}\gamma(v)$ is required to vary continuously with respect to any feature $P_{m}(v)$ of $v$ .

State types effectively encode, e.g., intermediate stages of neural networks. (For this reason, we also refer to $L$ as the “layer state space”; here, the identification of a state $v\in L$ with its type $\operatorname{tp}(v)$ is implicit.)¹ A neural network of depth $m$ may be regarded as a computation $\gamma=\gamma_{m}\ldots\gamma_{2}\gamma_{1}$ composing $m$-many single-layer transitions $\gamma_{i}$ that are computations of a very specific kind. Roughly speaking, after fixing a (suitable) nonlinear activation function $\tau:\mathbb{R}\to\mathbb{R}$ once and for all, each feature of each transition $\gamma_{i}$ is obtained by applying $\tau$ to some linear combination of input features (the coefficients chosen in forming such linear combinations are the parameters of the transition $\gamma_{i}$).

¹ Neural networks whose layer transitions depend not only on the current state but possibly on earlier ones may be formalized as multi-argument transformations. An appropriate setting is that of $n$-ary CCSs, which are informally introduced in Remark 3.4(2). A different view of layer transitions that allows them to depend on additional parameters is the notion of Parametrized Family of Computations discussed in Appendix A.1.4. For simplicity, computations $\gamma$ generally represent (single-argument) maps $L\to L$ throughout this paper.
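The depth-$m$ composition of single-layer transitions described above can be sketched as follows. This is a minimal illustration assuming finitely many features (states truncated to vectors in $\mathbb{R}^{3}$) and the activation $\tau=\tanh$; the helper names `layer` and `compose` are hypothetical and not taken from the paper.

```python
import numpy as np

def layer(weights):
    """Hypothetical single-layer transition gamma_i: every output feature is tanh
    applied to a linear combination of the input features (weights = parameters)."""
    W = np.asarray(weights, dtype=float)
    return lambda v: np.tanh(W @ v)

def compose(*transitions):
    """Given gamma_1, ..., gamma_m (in that order), return the depth-m computation
    gamma = gamma_m ... gamma_2 gamma_1, which applies gamma_1 first."""
    def gamma(v):
        for t in transitions:
            v = t(v)
        return v
    return gamma

rng = np.random.default_rng(0)
gammas = [layer(rng.normal(size=(3, 3))) for _ in range(4)]  # depth m = 4
network = compose(*gammas)
v = np.array([0.5, -1.0, 2.0])
w = network(v)  # every feature of w lies in [-1, 1] because tanh is bounded
```

Each output feature of `network` is a continuous function of finitely many input features, which is the kind of computation the framework has in mind.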

At any given stage (layer), only finitely many real-valued features of the computation are meaningful in an effective sense; however, the study of arbitrarily deep networks, and of deep layers/deep equilibria of such networks, requires keeping track of a number of features that is not bounded beforehand. The natural setting for treating a finite, but possibly unboundedly growing, number of features is a countable predicate collection $(P_{n})$.

We formalize the deep layers and equilibria mentioned above as deep computations (or ultracomputations), which capture a precise notion of asymptotic limit of computations. Such deep computations are obtained as pointwise limits of sequences $(\gamma_{n})_{n\in\mathbb{N}}$ of given computations, where “pointwise” means for each individual feature. (More generally, deep computations may arise as arbitrary pointwise ultralimits of computations.) Deep computations are not necessarily realizable as layer transition maps $L\to L$ , but typically only as “transforms” $f:L\to\mathfrak{L}$ into the space $\mathfrak{L}$ of state types (namely $\mathfrak{L}=\overline{L}\subseteq\mathbb{R}^{\mathcal{P}}$ is the topological closure of $L\subseteq\mathbb{R}^{\mathcal{P}}$ ).
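In the simplest situation, such a pointwise limit is a deep equilibrium in the sense of DEQs. The sketch below makes assumptions beyond anything the text requires: there are finitely many features, and the parameter-tied layer $\gamma(v)=\tanh(Wv+x)$ has a weight matrix of spectral norm $1/2$, so $\gamma$ is a contraction. The iterates $\gamma^{n}(v)$ then converge feature by feature to the unique fixed point of $\gamma$. The names `tied_layer` and `deep_iterate` are hypothetical. In the paper's general setting no contraction property is assumed, and deep computations arise as (ultra)limits that need not be transitions $L\to L$.

```python
import numpy as np

def tied_layer(W, x):
    """Parameter-tied transition gamma(v) = tanh(W v + x); x is a fixed input injection."""
    return lambda v: np.tanh(W @ v + x)

def deep_iterate(gamma, v, tol=1e-12, max_iter=10_000):
    """Approximate lim_n gamma^n(v) feature by feature, stopping once two successive
    iterates agree to within tol in every feature."""
    for _ in range(max_iter):
        w = gamma(v)
        if np.max(np.abs(w - v)) < tol:
            return w
        v = w
    raise RuntimeError("no convergence within max_iter iterations")

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 4))
W *= 0.5 / np.linalg.norm(W, 2)  # spectral norm 1/2; tanh is 1-Lipschitz, so gamma contracts
x = rng.normal(size=4)
gamma = tied_layer(W, x)
v_star = deep_iterate(gamma, np.zeros(4))
residual = np.max(np.abs(gamma(v_star) - v_star))  # ~0: v_star is (numerically) an equilibrium
```

Starting from a different state yields (numerically) the same limit, as expected for a contraction.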

A tenet of our approach is that a transform $f:L\to\mathfrak{L}$ is to be considered “effectively computable” if, for each $P_{n}\in\mathcal{P}$ , the output feature $\xi_{n}\coloneqq P_{n}{\circ}f:L\to\mathbb{R}$ is a definable predicate in the following sense:

Given any fixed $\varepsilon>0$ (otherwise arbitrary), the output feature $\xi_{n}(v)$ is $\varepsilon$ -approximated by a continuous function $\varphi$ of finitely many input features $P_{m}(v)$ , on any region $K\subseteq L$ wherein every input feature $P_{i}(v)$ remains bounded in magnitude.

(The approximating function $\varphi$ above is allowed to depend on $\varepsilon$ and $K$ .)

In other words, as long as input features remain bounded and one is willing to accept an error of magnitude not exceeding $\varepsilon$ , output features are continuous functions of finitely many input features.
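As a toy instance of this notion (an illustration, not an example from the paper), consider the output feature $\xi(v)=\sum_{m\geq 1}2^{-m}\tanh(P_{m}(v))$, which depends on infinitely many input features. Truncating after the first $N$ terms gives a continuous function $\varphi$ of finitely many features with error at most $\sum_{m>N}2^{-m}=2^{-N}$. For this particular $\xi$ the approximation is uniform on all of $L$; in general, $\varphi$ may also depend on the bounded region $K$. The sketch represents a state by the hypothetical callable `state(m)` returning $P_{m}(v)$.

```python
import math

def xi_truncated(state, eps):
    """eps-approximation of the illustrative output feature
        xi(v) = sum_{m>=1} 2**(-m) * tanh(P_m(v))
    that uses only the input features P_1(v), ..., P_N(v)."""
    N = max(1, math.ceil(math.log2(1.0 / eps)))
    # The discarded tail is at most sum_{m>N} 2**(-m) = 2**(-N) <= eps.
    approx = sum(2.0 ** (-m) * math.tanh(state(m)) for m in range(1, N + 1))
    return approx, N

state = lambda m: (-1.0) ** m * m  # a state with P_m(v) = (-1)^m * m (unbounded in m)
approx, N = xi_truncated(state, eps=1e-6)
```

Here $N=20$ input features suffice for an error below $10^{-6}$, regardless of how large the individual features are.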

Under a suitable Extendibility Hypothesis (as formulated in §4.2) on computations, we introduce the notion of ultracomputation not merely as a transform $L\to\mathfrak{L}$, but rather as a map $\mathfrak{L}\to\mathfrak{L}$, called a “transition-in-type (t-t)”; such a t-t $\mathfrak{f}$ is considered effectively computable if each feature $\xi_{n}\coloneqq P_{n}\circ\mathfrak{f}$ is definable. The setting of transitions-in-type implies a shift in perspective that is essential to the study of deep layers in our setting.²

² In the special case when $L=\mathfrak{L}$ is closed in $\mathbb{R}^{\mathcal{P}}$, transitions-in-type are the same as transforms, namely maps $L\to L$. However (particularly when the predicate collection $\mathcal{P}$ is infinite), such a situation is rather special; it amounts to a “saturation” property of the CSS $\underline{L}$, and we do not assume it a priori.

In a computational paradigm based on floating-point arithmetic, definable transforms (or t-ts) are as effectively computable as one could hope for: after all, algorithmic implementations of such arithmetic impose a priori bounds on inputs, and the specific algorithm depends on such bounds in a manner paralleling the dependence of the approximation $\varphi$ to $\xi_{n}$ on the bounded region $K$ and on the admissible error magnitude $\varepsilon$. In this paper, we are not concerned with explicit algorithmic implementations of definable ultracomputations; our results pertain to effective computability understood as the ability to carry out such computations in principle, i.e., the existence of an algorithm (which we otherwise do not provide).

(We stress that such a notion of effective computability is relative: The distinguished predicates $P(\cdot)$ are considered computable a priori —i.e., are computational primitives.)

We show (Theorem 6.2.3) that:

(1)

The approximations $\varphi$ to a definable $\xi:L\to\mathbb{R}$ may be taken to be polynomials of the input features (i.e., a definable $\xi$ is polynomially definable);
(2)

$\xi:L\to\mathbb{R}$ is definable iff it extends to a continuous function $\tilde{\xi}:\mathfrak{L}\to\mathbb{R}$; and
(3)

Definable transforms $f:L\to\mathfrak{L}$ are precisely those that extend to continuous t-ts $\mathfrak{f}:\mathfrak{L}\to\mathfrak{L}$ (this is the property of extendibility mentioned above).

Extendable $f$ are continuous on $L$, but extendibility is a strictly stronger property whenever $\mathfrak{L}\supsetneq L$.³

³ General criteria for a transform $f:L\to\mathfrak{L}$ to take values in $L\subseteq\mathfrak{L}$ (or, similarly, for a t-t $\mathfrak{f}:\mathfrak{L}\to\mathfrak{L}$ to restrict to a transition $L\to L$) are delicate; their study exceeds the scope of the present manuscript.

Under suitable assumptions, we prove the existence of deep iterates and equilibria (understood as transitions-in-type); see Proposition 4.7 and Theorem 5.3: deep layers and deep equilibria-in-type of neural networks exist under such assumptions.

Our results in §6 characterize definability of ultracomputations. A particular case of Theorem 6.4 is as follows.

Assume that $\mathcal{P}$ is countable and each predicate $P\in\mathcal{P}$ is bounded on $L$ . Fix a set $\Delta$ of extendable computations $\gamma:L\to L$ . Every ultracomputation of $\Delta$ is definable if and only if, for every predicate $P\in\mathcal{P}$ and any sequences $(\gamma_{m})_{m\in\mathbb{N}}\subseteq\Delta$ and $(v_{n})_{n\in\mathbb{N}}\subseteq L$ , the following Limit Exchange identity:

\lim_{m}\lim_{n}P(\gamma_{m}(v_{n}))=\lim_{n}\lim_{m}P(\gamma_{m}(v_{n}))

holds whenever the iterated limits on the left- and right-hand side both exist. Moreover, in such case, each ultracomputation of $\Delta$ is the (pointwise) limit of some sequence $(\gamma_{n})\subseteq\Delta$ . The limit is attained uniformly over any set $K\subseteq L$ of states that is feature-wise bounded (i.e., $K$ is included in a “shard” in the sense of §4.1.2).
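For a concrete failure of the Limit Exchange identity (an illustrative choice, not taken from the paper), take $L=[0,1]$ with the single bounded predicate $P(v)=v$, the computations $\gamma_{m}(v)=v^{m}$ (extendable, since $L$ is closed) and the states $v_{n}=1-1/n$. Then $\lim_{m}\lim_{n}P(\gamma_{m}(v_{n}))=1$, whereas $\lim_{n}\lim_{m}P(\gamma_{m}(v_{n}))=0$. Correspondingly, the pointwise limit of $(\gamma_{m})$ is the indicator function of $\{1\}$, which is discontinuous and hence not definable. The sketch below merely estimates each inner limit by substituting a large index; floating-point evaluation illustrates the discrepancy but does not prove it.

```python
def feature(m, n):
    """P(gamma_m(v_n)) for gamma_m(v) = v**m and v_n = 1 - 1/n (illustrative choice)."""
    return (1.0 - 1.0 / n) ** m

BIG = 10**6  # stands in for the inner limit index tending to infinity

# lim_m lim_n: take the inner limit over n first (n = BIG), then let m grow.
inner_n_first = [feature(m, BIG) for m in (1, 10, 100)]
# lim_n lim_m: take the inner limit over m first (m = BIG), then let n grow.
inner_m_first = [feature(BIG, n) for n in (2, 10, 100)]

def pointwise_limit(v):
    """lim_m v**m on [0, 1]: the discontinuous indicator of {1}."""
    return 1.0 if v == 1.0 else 0.0
```

The first list stays near 1 and the second is numerically 0, matching the two iterated limits computed above.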

Going further, we define smooth (ultra)computations $\mathfrak{f}$ as those having output features $P(\mathfrak{f}(v))$ varying smoothly (i.e., differentiably) with the input features $Q(v)$. Although the study of smoothness properties of definable ultracomputations is beyond the scope of the current paper, such smoothness is implicit in applications such as training of Neural ODEs [CRBD18] and equilibrium analysis of DEQs [BKK19]. In Appendix A, we informally introduce smooth transitions-in-type. In §A.1.2 and §A.1.4, we outline the connections of our results, respectively, with effective computability of equilibria of DEQs, and with the training of Neural ODEs (in practice done using optimal control, for instance).

The paper is organized as follows. Section §2 is a self-contained abridged summary, for countable $\mathcal{P}$ only, of the more general results in Sections §§4–6, which apply to more general structures. Section §3.4 introduces the general notions (and examples) of Computation States Structure (CSS) and Compositional Computation Structure (CCS) (the latter being essentially a CSS $\underline{L}$ expanded with a semigroup $\underline{\Gamma}$ of extendable computations $L\to L$ ). We also introduce the topological spaces of types—both of states and of transitions. In Section §5, we prove (under suitable hypotheses) the existence of deep computations and of deep equilibria (the main results of §§4–5 are Proposition 4.7 and Theorem 5.3). In Section §6 we prove the aforementioned characterization of definable ultracomputations (the main result being Theorem 6.4).

We are grateful to Frank Tall for his constant guidance through the world of $C_{p}$ -theory.

Readers with experience in model theory will realize that the ideas presented here are strongly influenced by the work of C. C. Chang and H. J. Keisler on continuous model theory and model theory of real-valued structures [CK66, Kei23] as well as the work of J.-L. Krivine in Banach space theory [Kri76, KM81]. We owe a great debt of gratitude to these giants for allowing us to stand on their shoulders.

2. Computations and ultracomputations with countably many features

This section expands on the outline in Section §1, summarizing the results of subsequent sections §3.4–§6 in the special setting of computations (and ultracomputations obtained therefrom) involving states $v$ characterized by (at most) countably many real-valued “observable features” $P(v)$ ; readers interested in the general framework (when states are possibly characterized by uncountably many features) should skip forward to §3.4. Proofs in this section are omitted if they are presented in subsequent sections.

2.1. Definitions

Fix a set $\mathcal{P}=(P_{n})_{n\in\mathbb{N}}$ of countably many distinct distinguished predicate symbols $P_{n}$ . Effectively, $n$ and $P_{n}$ are interchangeable: one may think of the number $n$ as a label for the symbol $P_{n}$ , but also $P_{n}$ may be regarded as a purely syntactic label for the number $n$ —the usefulness of the syntax $P_{n}$ is its later use to denote a bona fide function $P_{n}(\cdot)$ (the symbol “ $n$ ” is an extremely poor choice of name for a function!). Let $\mathbb{R}^{\mathcal{P}}=\prod_{Q\in\mathcal{P}}\mathbb{R}$ ( $=\mathbb{R}^{\mathbb{N}}$ ) be the space of all functions $\mathcal{P}\to\mathbb{R}$ , each regarded as a real tuple $v=(v_{n})_{n\in\mathbb{N}}$ ; the space $\mathbb{R}^{\mathcal{P}}$ is endowed with the product topology, i.e., the topology of entry-wise convergence of such tuples. Each $n\in\mathbb{N}$ names a coordinate (projection) map $\pi_{n}(\cdot):\mathbb{R}^{\mathcal{P}}\to\mathbb{R}:v\mapsto v_{n}$ . The real quantity $\pi_{n}(v)=v_{n}$ is called the $n$ -th feature of $v$ .

Fix an arbitrary nonempty subset $L\subseteq\mathbb{R}^{\mathcal{P}}$, which we shall call the state space; its elements are called states. (We may also call these the layer state space and layer states to capture the neural-network intuition explained in the introduction.) The (real-valued) predicate on $L$ with symbol $P_{n}$ is the map $P_{n}(\cdot)=\pi_{n}{\restriction}L:L\to\mathbb{R}$ obtained by restricting $\pi_{n}$ to $L$; the $P_{n}$-feature of $v\in L$ is $P_{n}(v)=v_{n}$. The pair $\underline{L}=\langle L,(P(\cdot))_{P\in\mathcal{P}}\rangle$ is called a computation states structure (CSS); it will henceforth be denoted simply $\underline{L}=\langle L,\mathcal{P}\rangle$ by an abuse of notation whereby we identify each symbol $P_{n}\in\mathcal{P}$ with the corresponding predicate $P_{n}(\cdot):L\to\mathbb{R}$. (Such abuse of notation will be quite frequent throughout.)

The topological closure $\mathfrak{L}\coloneqq\overline{L}\subseteq\mathbb{R}^{\mathcal{P}}$ is called the space of (layer) state types of $L$ ; its elements are called state types. Elements $v\in L$ are called realized states to distinguish them from state types $\mathbf{v}\in\mathfrak{L}\setminus L$ , called unrealized (when such exist).

Each symbol $P_{n}\in\mathcal{P}$ still gives a continuous predicate (real-valued function) $\mathfrak{L}\to\mathbb{R}$ by restriction of the projection $\pi_{n}$ ; it is the unique extension of $P_{n}(\cdot):L\to\mathbb{R}$ to $\mathfrak{L}$ , and will still be denoted $P_{n}(\cdot)$ (or even just $P_{n}$ ) by an abuse of notation.

A sizer is a family $r_{\bullet}=(r_{P})_{P\in\mathcal{P}}\in[0,\infty)^{\mathcal{P}}$ of nonnegative reals indexed by predicates $P\in\mathcal{P}$. Such a sizer names a compact subset $\mathbb{R}[r_{\bullet}]\coloneqq\prod_{P\in\mathcal{P}}[-r_{P},r_{P}]\subseteq\mathbb{R}^{\mathcal{P}}$. Given a sizer $r_{\bullet}$, the $r_{\bullet}$-shard of $L$ (resp., of $\mathfrak{L}$) is $L[r_{\bullet}]\coloneqq L\cap\mathbb{R}[r_{\bullet}]$ (resp., the closure $\mathfrak{L}[r_{\bullet}]\coloneqq\overline{L[r_{\bullet}]}\subseteq\mathbb{R}^{\mathcal{P}}$). All type-shards $\mathfrak{L}[r_{\bullet}]$ are compact (being closed in $\mathbb{R}[r_{\bullet}]$). Clearly, $\mathfrak{L}[r_{\bullet}]\subseteq\mathfrak{L}\cap\mathbb{R}[r_{\bullet}]$ (equality need not hold).

Proposition 2.1.

Let $\langle L,\mathcal{P}\rangle$ be a CSS with countable predicate collection $\mathcal{P}$ .

(1)

the space $\mathfrak{L}$ of state types is metrizable;
(2)

every state type is the limit of a sequence of realized states;
(3)

every type $\mathbf{v}\in\mathfrak{L}$ is shard-supported in the sense that $\mathbf{v}\in\mathfrak{L}[r_{\bullet}]$ for some sizer $r_{\bullet}$ ; thus, $\mathfrak{L}=\bigcup_{r_{\bullet}}\mathfrak{L}[r_{\bullet}]$ (where $r_{\bullet}$ varies over all sizers);
(4)

a real function on $\mathfrak{L}$ is continuous if its restrictions to arbitrary type-shards $\mathfrak{L}[r_{\bullet}]\subseteq\mathfrak{L}$ are continuous.⁴

⁴ We thank F. Tall for pointing out that $\mathfrak{L}$ is a $k_{\mathbb{R}}$-space for $\mathcal{P}$ countable. Indeed, property (4) is a strengthening of the $k_{\mathbb{R}}$ property of $\mathfrak{L}$ inasmuch as type-shards are compact (however, an arbitrary compact $K\subseteq\mathfrak{L}$ need not be included in any type-shard).

The proof uses metrizability in an essential way, hinting at the technical difficulties arising (from Sections §3.4 on) when $\mathcal{P}$ is possibly uncountable.

Proof.

The key observation is that any compact $K\subseteq\mathfrak{L}$ that is the closure of a set of realized states is included in some type-shard $\mathfrak{L}[r_{\bullet}]$, itself compact. The real line $\mathbb{R}$ is topologized by the bounded metric $\operatorname{d}(x,y)\coloneqq\rho(\left|y-x\right|)$, where $\rho(t)\coloneqq t/(1+t)<1$. Since $\mathcal{P}$ is countable, the space $\mathbb{R}^{\mathcal{P}}$ is metrizable, say by $\delta(\mathbf{u},\mathbf{v})\coloneqq\sum_{n}2^{-n}\operatorname{d}(P_{n}(\mathbf{u}),P_{n}(\mathbf{v}))<2$; therefore, its subspace $\mathfrak{L}$ is metrizable, proving (1). By density of $L$ in $\mathfrak{L}$ and (1), every type $\mathbf{v}\in\mathfrak{L}$ is the limit of a sequence $v_{\bullet}=(v_{n})\subseteq L$; hence, $K=v_{\bullet}\cup\{\mathbf{v}\}$ ($=\overline{v_{\bullet}}$) is compact. The image $P(K)\subseteq\mathbb{R}$ is bounded for each $P\in\mathcal{P}$, hence $P(K)\subseteq[-r,r]$ for some $r=r_{P}>0$; with $r_{\bullet}\coloneqq(r_{P})_{P\in\mathcal{P}}$, we have $v_{\bullet}\subseteq L[r_{\bullet}]$, so $\mathfrak{L}[r_{\bullet}]\supseteq K=\overline{v_{\bullet}}\ni\mathbf{v}$. Assertions (2) and (3) follow.

The compactness argument above is adapted to show that any convergent sequence $\mathbf{u}_{\bullet}=(\mathbf{u}_{n})_{n\in\mathbb{N}}\subseteq\mathfrak{L}$ is included in some type-shard. Indeed, for each $n\in\mathbb{N}$, some sequence $v^{(n)}_{\bullet}\coloneqq(v^{(n)}_{k})_{k\in\mathbb{N}}\subseteq L$ satisfies $\lim_{k}v^{(n)}_{k}=\mathbf{u}_{n}$; without loss of generality (upon replacing $v^{(n)}_{\bullet}$ by a sufficiently deep tail thereof if necessary), we may impose the following accelerated convergence requirement: $\sup_{k\in\mathbb{N}}\delta(v^{(n)}_{k},\mathbf{u}_{n})\to 0$ as $n\to\infty$ (the sequences $v^{(n)}_{\bullet}$ converge to their limits $\mathbf{u}_{n}$ “increasingly faster” as $n$ grows). Let $\mathbf{w}\coloneqq\lim_{n}\mathbf{u}_{n}$. Then the set $K\coloneqq\{\mathbf{w},\mathbf{u}_{n},v^{(n)}_{k}:k,n\in\mathbb{N}\}\subseteq\mathfrak{L}$ is compact: Given an open cover $\mathcal{G}$ of $K$, we have $G\ni\mathbf{w}$ for some $G\in\mathcal{G}$. By accelerated convergence, for all sufficiently large $n\in\mathbb{N}$, say, for $n\geq N$, we have $G\supseteq\{\mathbf{u}_{n}\}\cup v^{(n)}_{\bullet}$. For each $n<N$ there is also $G_{n}\in\mathcal{G}$ with $G_{n}\ni\mathbf{u}_{n}$, hence $v^{(n)}_{k}\in G_{n}$ for all but finitely many $k\in\mathbb{N}$; therefore, $\{G,G_{n}:n<N\}\subseteq\mathcal{G}$ covers all but finitely many points of $K$, hence $\mathcal{G}$ has a finite subcover, so $K$ is compact. Since each image $P(K)\subseteq\mathbb{R}$ is (compact, hence) bounded, we deduce that $\mathfrak{L}[r_{\bullet}]\supseteq K=\overline{\{v^{(n)}_{k}:n,k\in\mathbb{N}\}}\supseteq\mathbf{u}_{\bullet}$ for some sizer $r_{\bullet}$ (as before).

Let now $\varphi:\mathfrak{L}\to\mathbb{R}$ be discontinuous, say at $\mathbf{v}\in\mathfrak{L}$; then, $\mathbf{v}$ is the limit of some sequence $\mathbf{u}_{\bullet}=(\mathbf{u}_{n})_{n\in\mathbb{N}}$ in some type-shard $\mathfrak{L}[r_{\bullet}]$ (by the preceding paragraph), but such that $\varphi(\mathbf{u}_{\bullet})\coloneqq(\varphi(\mathbf{u}_{n}))\not\to\varphi(\mathbf{v})$. We have $\mathbf{v}\in\mathfrak{L}[r_{\bullet}]$ (shards being closed in $\mathfrak{L}$), so the restriction of $\varphi$ to $\mathfrak{L}[r_{\bullet}]$ is discontinuous, proving (4). ∎
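The metric $\delta$ used in the proof above can be evaluated to any prescribed accuracy from finitely many features. Since $\rho<1$, truncating the sum at $N$ terms (indexing from $n=0$, consistent with the bound $\delta<2$) leaves a tail below $2^{1-N}$. The sketch represents states as hypothetical callables `u(n)` returning $P_{n}(\mathbf{u})$. It also illustrates why $\delta$ metrizes entrywise convergence rather than uniform convergence: the states $u_{k}$ with $P_{n}(u_{k})=1$ for $n\geq k$ (and $0$ otherwise) satisfy $\delta(u_{k},0)=2^{-k}\to 0$, even though $\sup_{n}\left|P_{n}(u_{k})\right|=1$ for every $k$.

```python
def rho(t):
    """Bounded reparametrization rho(t) = t / (1 + t), which is < 1 for t >= 0."""
    return t / (1.0 + t)

def delta_truncated(u, v, N):
    """Partial sum over n < N of delta(u, v) = sum_{n>=0} 2**(-n) * rho(|P_n(u) - P_n(v)|).
    Since rho < 1, the omitted tail is below sum_{n>=N} 2**(-n) = 2**(1 - N)."""
    return sum(2.0 ** (-n) * rho(abs(u(n) - v(n))) for n in range(N))

zero = lambda n: 0.0
one = lambda n: 1.0
u10 = lambda n: 1.0 if n >= 10 else 0.0  # the state u_k above, with k = 10
```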

A (syntactic) formula is a purely formal real polynomial $\varphi(P_{1},\dots,P_{k})$ in predicate symbols $P_{1},\dots,P_{k}\in\mathcal{P}$ (treated as pairwise commuting indeterminates). Since each $P_{i}$ names a map $P_{i}(\cdot):\mathbb{R}^{\mathcal{P}}\to\mathbb{R}$ , such a formula $\varphi$ itself names a polynomial function (or just polynomial) $\varphi(\cdot):\mathbb{R}^{\mathcal{P}}\to\mathbb{R}$ called the interpretation of $\varphi$ which, in practice, we shall identify with the syntactic formula $\varphi$ . (Different formulas may yield the same polynomial function, but this is not an issue in practice.) By restriction of its interpretation on $\mathbb{R}^{\mathcal{P}}$ , a formula also gives polynomials on $\mathfrak{L}$ and on $L$ ; moreover, by density of $L$ in $\mathfrak{L}$ , polynomials on $\mathfrak{L}$ and $L$ are in natural bijection, so we shall not distinguish between them.

A definable predicate is any real map $\varphi:L\to\mathbb{R}$ whose restriction to an arbitrary shard $L[r_{\bullet}]$ is uniformly approximable by polynomials. Using the same definition of definable predicate $\boldsymbol{\xi}$ on the type space $\mathfrak{L}$ (i.e., the restriction of $\boldsymbol{\xi}$ to an arbitrary (compact) type-shard is uniformly approximable by polynomials), we see that definable predicates on $L$ and $\mathfrak{L}$ are also identified. (By Proposition 2.1(4), a definable $\varphi$ on $L$ extends continuously to each type-shard, and therefore to a continuous $\tilde{\varphi}:\mathfrak{L}\to\mathbb{R}$.)
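Definability asks only for uniform polynomial approximation shard by shard, and the approximating polynomial may depend on the shard. The following hypothetical illustration uses a single feature. It fits least-squares polynomials in the Chebyshev basis (via NumPy's `numpy.polynomial.Chebyshev.fit`) to $\tanh(P_{1}(v))$ on the shard where $\left|P_{1}(v)\right|\leq r$, and reports the largest error observed on a fine grid. This estimates the uniform error but does not certify it.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def sup_error(r, deg, samples=2001):
    """Fit a degree-`deg` polynomial to tanh on [-r, r] (the shard |P_1| <= r)
    and return the largest absolute error observed on a finer grid."""
    x = np.linspace(-r, r, samples)
    p = Chebyshev.fit(x, np.tanh(x), deg)  # least-squares fit, Chebyshev basis on [-r, r]
    grid = np.linspace(-r, r, 10 * samples)
    return float(np.max(np.abs(p(grid) - np.tanh(grid))))

for r in (1.0, 5.0):
    print(r, [sup_error(r, d) for d in (5, 15, 31)])
```

For a fixed degree, the error is larger on the wider shard, so the polynomial achieving a given accuracy depends on the sizer, as the definition allows.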

A map $L\to L$ (resp., $L\to\mathfrak{L}$ , $\mathfrak{L}\to\mathfrak{L}$ ) is called a transition (resp., a transform, a transition-in-type (t-t)). By the inclusion $\mathfrak{L}\supseteq L$ , every transition is a transform. A transform $f$ is extendable if it extends to a continuous t-t on $\mathfrak{L}\supseteq L$ . A transform or t-t is definable if each of its features is definable.

For sizers $r_{\bullet},s_{\bullet}$, a transform (resp., a t-t) is called $s_{\bullet}$-confined on $r_{\bullet}$ if it restricts to a map $L[r_{\bullet}]\to\mathfrak{L}[s_{\bullet}]$ (resp., $\mathfrak{L}[r_{\bullet}]\to\mathfrak{L}[s_{\bullet}]$). A set of transforms or t-ts is $s_{\bullet}$-confined on $r_{\bullet}$ if each of its members is.

Any collection $s_{\bullet}^{[\cdot]}=(s_{\bullet}^{[r_{\bullet}]})_{r_{\bullet}}$ of sizers (itself indexed by sizers) is called a confiner. A transform or t-t is $s_{\bullet}^{[\cdot]}$ -confined if it is $s_{\bullet}^{[r_{\bullet}]}$ -confined on $r_{\bullet}$ for all sizers $r_{\bullet}$ . Let $\mathcal{T},\mathfrak{T}$ be the sets of all confined transforms and t-ts, respectively; the set of $s_{\bullet}^{[\cdot]}$ -confined transforms (resp., t-ts) is denoted $\mathcal{T}[s_{\bullet}^{[\cdot]}]\subseteq\mathcal{T}$ (resp., $\mathfrak{T}[s_{\bullet}^{[\cdot]}]\subseteq\mathfrak{T}$ ). Any subcollection of $\mathcal{T}[s_{\bullet}^{[\cdot]}]$ or $\mathfrak{T}[s_{\bullet}^{[\cdot]}]$ for some confiner $s_{\bullet}^{[\cdot]}$ is called uniformly confined (or confined by $s_{\bullet}^{[\cdot]}$ ).

A collection $R$ of sizers is called exhaustive if $\mathfrak{L}=\bigcup_{r_{\bullet}\in R}\mathfrak{L}[r_{\bullet}]$ . A transform or t-t (resp., a set of transforms or t-ts) is called $R$ -confined if it is (resp., if all its members are) $r_{\bullet}$ -confined on $r_{\bullet}$ , for all $r_{\bullet}\in R$ . The set of $R$ -confined t-ts is denoted $\mathfrak{T}[R]$ . (By exhaustiveness, $R$ -confined transforms and t-ts are confined in the above sense.)

Any continuous t-t $\mathfrak{f}$ maps each shard $\mathfrak{L}[r_{\bullet}]$ into some (compact subset of some) shard $\mathfrak{L}[s_{\bullet}]$ , so such $\mathfrak{f}$ is necessarily $s_{\bullet}^{[\cdot]}$ -confined for some $s_{\bullet}^{[\cdot]}$ .
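Here is an illustrative example, not taken from the paper. Take $L=\mathbb{R}^{\mathcal{P}}$ and a transition $\gamma:L\to L$ whose features are $P_{n}(\gamma(v))=\tanh(3P_{n}(v)-2P_{n+1}(v))$. Each output feature is bounded by $1$, so $\gamma$ maps every shard $L[r_{\bullet}]$ into $L[\mathbf{1}_{\bullet}]\subseteq\mathfrak{L}[\mathbf{1}_{\bullet}]$, where $\mathbf{1}_{\bullet}$ is the constant sizer $1$. In other words, $\gamma$ is confined by the constant confiner $s_{\bullet}^{[r_{\bullet}]}=\mathbf{1}_{\bullet}$. The sketch checks shard membership on the first $N$ features only, since infinitely many conditions cannot be verified finitely. The helper `in_shard` and the coefficients $3,-2$ are hypothetical.

```python
import math

def in_shard(state, sizer, N):
    """Check |P_n(v)| <= r_n for n < N: a necessary (not sufficient) test of membership in R[r]."""
    return all(abs(state(n)) <= sizer(n) for n in range(N))

def gamma(state):
    """Hypothetical transition: P_n(gamma(v)) = tanh(3 P_n(v) - 2 P_{n+1}(v)),
    so each output feature depends continuously on finitely many input features."""
    return lambda n: math.tanh(3.0 * state(n) - 2.0 * state(n + 1))

one = lambda n: 1.0                        # the constant sizer 1
r = lambda n: 10.0 * (n + 1)               # a larger sizer
v = lambda n: 10.0 * (n + 1) * (-1) ** n   # a state in the r-shard (as far as checked)
```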

Theorem 2.2.

A transform is extendable iff it is definable, in which case it is necessarily confined.

Proof.

Let $f:L\to\mathfrak{L}$ be extended by a continuous $\mathfrak{f}:\mathfrak{L}\rightarrow\mathfrak{L}$. For fixed $Q\in\mathcal{P}$ and any sizer $r_{\bullet}$, the feature $Q{\circ}\mathfrak{f}$ is continuous on the compactum $\mathfrak{L}[r_{\bullet}]$, hence uniformly approximable thereon by polynomials in predicates $P_{m}(\cdot)$, by the Stone-Weierstrass Theorem (such predicates are continuous and separate points of $\mathfrak{L}$); thus, $\mathfrak{f}$ is definable, and so is $f=\mathfrak{f}{\restriction}L$ a fortiori.

Conversely, if $f$ is definable, for fixed $Q\in\mathcal{P}$, each restriction $Q{\circ}f{\restriction}L[r_{\bullet}]$ of its $Q$-feature to an arbitrary state-shard $L[r_{\bullet}]$ is a uniform limit of polynomials $\varphi$ in predicates. Each such $\varphi$ is (the restriction to $L[r_{\bullet}]$ of) a polynomial $\boldsymbol{\varphi}$ on the compact type-shard $\mathfrak{L}[r_{\bullet}]=\overline{L[r_{\bullet}]}$. Some sequence $(\boldsymbol{\varphi}_{i})_{i\in\mathbb{N}}$ of such polynomials converges uniformly on $\mathfrak{L}[r_{\bullet}]$ to a real function $\mathbf{f}_{Q}^{[r_{\bullet}]}$ on $\mathfrak{L}[r_{\bullet}]$ extending $Q{\circ}f{\restriction}L[r_{\bullet}]$ continuously. Since $\mathbf{f}_{Q}^{[r_{\bullet}]}$ is continuous on the compactum $\mathfrak{L}[r_{\bullet}]$, it is bounded in magnitude thereon, say by $s=s_{Q}^{[r_{\bullet}]}\in[0,\infty)$. Letting $Q$ vary, we obtain an $s_{\bullet}^{[r_{\bullet}]}$-confined map $\mathfrak{f}^{[r_{\bullet}]}:\mathfrak{L}[r_{\bullet}]\to\mathfrak{L}[s_{\bullet}^{[r_{\bullet}]}]:\mathbf{v}\mapsto(\mathbf{f}^{[r_{\bullet}]}_{Q}(\mathbf{v}))_{Q\in\mathcal{P}}$. Clearly, some (unique) $s_{\bullet}^{[\cdot]}$-confined $\mathbf{f}:\mathfrak{L}\to\mathfrak{L}$ extends all such $\mathfrak{f}^{[r_{\bullet}]}$ (recall that $\mathfrak{L}=\bigcup_{r_{\bullet}}\mathfrak{L}[r_{\bullet}]$ by Proposition 2.1(3)); such $\mathbf{f}$ is continuous since each entry $\mathbf{f}_{Q}$ is, by Proposition 2.1(4). ∎

Theorem 2.2 formalizes the (perhaps surprising) fact that non-extendable transitions are not obtainable from explicit constructions involving the predicates $P(\cdot)$ . We remind the reader that the topology on $L$ is the coarsest one for which all predicates are continuous. However, even a continuous transition $f$ , if non-extendable, is “uncomputable” in the sense that its coordinates (“features”) $P{\circ}f$ cannot be well approximated by continuous functions (e.g., polynomials) of predicates $Q(\cdot)$ : no such approximation can be uniform on $L$ ; in fact, it cannot even be uniform on every shard $L[r_{\bullet}]$ .

For that reason, extendibility is a critical hypothesis in our main results.

2.2. Deep computations and deep equilibria

Let $s_{\bullet}^{[\cdot]}$ be a confiner. Recall that $\mathcal{T}[s_{\bullet}^{[\cdot]}]$ and $\mathfrak{T}[s_{\bullet}^{[\cdot]}]$ are the sets of all $s_{\bullet}^{[\cdot]}$ -confined transforms and t-ts, respectively (see §2.1 for the definitions).

An extendable transition will be called a computation.

A compositional computation structure (CCS) with countably many predicates

\mathcal{C}=\langle\underline{L},\underline{\Gamma},\operatorname{ev}\rangle

consists of:

•

a CSS $\underline{L}=\langle L,\mathcal{P}\rangle$ whose predicate collection $\mathcal{P}$ is countable;
•

a semigroup $\underline{\Gamma}=\langle\Gamma,\circ\rangle$ , whose elements $\gamma\in\Gamma$ are called computations of $\mathcal{C}$ ;
•

a continuous semigroup action $\operatorname{ev}:\Gamma\times L\to L$ of $\underline{\Gamma}$ on $\underline{L}$ .

Each computation $\gamma\in\Gamma$ gives a transition

\gamma(\cdot):L\to L,\qquad v\mapsto\gamma(v)\coloneqq\operatorname{ev}(\gamma,v).

Under this identification, $\Gamma$ is a semigroup (under composition) of maps $L\to L$ .

The CCS $\mathcal{C}$ above will often be denoted simply $\mathcal{C}=\langle\underline{L},\underline{\Gamma}\rangle$ without explicitly naming the evaluation action $\operatorname{ev}$ , which, however, is always an implicit operation of $\mathcal{C}$ . [Footnote 5: At any rate, the “functional application notation” $\gamma(x)$ for $\operatorname{ev}(\gamma,x)$ makes it essentially redundant to have a name for the action.]

CCSs are required to satisfy the following axiom. [Footnote 6: When $\mathcal{P}$ is uncountable, the Extendibility Axiom takes a different form: see §4.3.1.]

Extendibility Axiom. The transition $\gamma(\cdot)$ of any computation $\gamma\in\Gamma$ is extendable.

By extendibility (and Theorem 2.2), any computation is necessarily confined, so its transition may be regarded as a (confined) element $\gamma(\cdot)\in\mathcal{T}$ .

2.2.1. Deep computations and ultracomputations

A deep computation (DC) of a set $\Delta\subseteq\Gamma$ of computations is any confined transform $f\in\mathcal{T}$ that is an accumulation point of (the set of transitions of) computations in $\Delta$ , in the topology of pointwise convergence. Equivalently, a DC is any pointwise ultralimit $f\coloneqq\operatorname{\mathcal{U}lim}_{i}\gamma_{i}(\cdot)$ of any family $(\gamma_{i})_{i\in I}\subseteq\Delta$ for any ultrafilter $\mathcal{U}$ on the (otherwise arbitrary) index set $I$ , as long as each pointwise limit exists and the resulting map $f:L\to\mathfrak{L}$ is confined.

An ultracomputation (ucomp) of $\Delta$ is any (confined) accumulation point in $\mathfrak{T}$ of the set $\tilde{\Delta}\subseteq\mathfrak{T}$ of transitions-in-type $\tilde{\gamma}\in\mathfrak{T}$ extending the transitions $\gamma(\cdot)$ of computations $\gamma\in\Delta$ .

In general,

•

a DC need not map $L$ into $L$ ; a fortiori, a ucomp need not restrict to a map $L\to L$ ;
•

a DC need not have a unique extension to a ucomp. [Footnote 7: Clearly, any DC admits some extension to a ucomp, but such an extension need not be continuous, nor, for that matter, constructible in any explicit sense.]

2.2.2. Deep iterates and deep equilibria

Deep iterates

The (topological product) space $\mathfrak{L}^{\mathfrak{L}}=\prod_{\mathbf{v}\in\mathfrak{L}}\mathfrak{L}$ of all t-ts $\mathfrak{f}:\mathfrak{L}\to\mathfrak{L}$ is a semigroup under composition. [Footnote 8: Composition $(\mathfrak{f},\mathfrak{g})\mapsto\mathfrak{f}\circ\mathfrak{g}$ is continuous in the left argument $\mathfrak{f}$ , but generally not in the right argument $\mathfrak{g}$ .] One sees that the subset $\mathfrak{T}\subseteq\mathfrak{L}^{\mathfrak{L}}$ is a sub-semigroup, although its confined parts $\mathfrak{T}[s_{\bullet}^{[\cdot]}]$ are typically not closed under composition.

A deep iterate (DI) of a computation $\gamma$ is any ultracomputation $\tilde{\gamma}^{(\mathcal{U})}\in\mathfrak{T}$ arising as an ultralimit $\tilde{\gamma}^{(\mathcal{U})}\coloneqq\operatorname{\mathcal{U}lim}_{n}\tilde{\gamma}^{(n)}$ of the iterates $\tilde{\gamma}^{(n)}\coloneqq\tilde{\gamma}\circ\dots\circ\tilde{\gamma}$ ( $n$ -fold) of the t-t $\tilde{\gamma}$ of $\gamma$ . Note that the notion of deep iterate is strictly “in-type”, i.e., a deep iterate is a transition-in-type, not a transition. Being themselves confined by definition, deep iterates may be composed with any confined t-t.

Proposition 2.3 (Cf., Propositions 4.5, 4.6, and 4.7).

Fix any confiner $s_{\bullet}^{[\cdot]}$ , any exhaustive collection $R$ , and any set $\Delta\subseteq\Gamma$ :

(1)

$\mathfrak{T}[s_{\bullet}^{[\cdot]}]$ is compact;
(2)

$\mathfrak{T}[R]$ is a compact sub-semigroup of $\mathfrak{T}$ ;
(3)

Ultracomputations obtained from computations in $\Delta$ form a closed sub-semigroup of $\mathfrak{T}$ ;
(4)

For any $s_{\bullet}^{[\cdot]}$ -confined indexed family $(\gamma_{i})_{i\in I}$ , and any ultrafilter $\mathcal{U}$ on $I$ , the deep computation $\gamma_{\mathcal{U}}\coloneqq\operatorname{\mathcal{U}lim}_{i}\gamma_{i}\in\mathcal{T}[s_{\bullet}^{[\cdot]}]$ and the ultracomputation $\tilde{\gamma}_{\mathcal{U}}\coloneqq\operatorname{\mathcal{U}lim}_{i}\tilde{\gamma}_{i}\in\mathfrak{T}[s_{\bullet}^{[\cdot]}]$ exist;

(5)

If $\gamma$ is $R$ -confined, then $\gamma$ has deep iterates of the form $\tilde{\gamma}^{(\mathcal{U})}=\operatorname{\mathcal{U}lim}_{n}\tilde{\gamma}^{(n)}$ for arbitrary nonprincipal $\mathcal{U}\in\beta\mathbb{N}$ .

Deep equilibria

A deep equilibrium (DE) of a computation $\gamma$ is an idempotent deep iterate $\mathfrak{g}=\tilde{\gamma}^{(\mathcal{U})}$ , i.e., one such that $\mathfrak{g}\circ\mathfrak{g}=\mathfrak{g}$ .

Theorem 2.4 (Cf., Theorem 5.3).

Let $R$ be any exhaustive collection. If $\gamma$ is an $R$ -confined computation, then $\gamma$ has deep equilibria. In fact, one such DE is obtained as the ultralimit $\tilde{\gamma}^{(\mathcal{I})}=\operatorname{\mathcal{I}lim}_{n}\tilde{\gamma}^{(n)}$ along an arbitrary idempotent ultrafilter $\mathcal{I}$ on $\mathbb{N}$ .
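As a toy illustration of Theorem 2.4 (an assumption chosen for simplicity, not one of the paper's examples), consider a hypothetical CCS with $L=[-1,1]$ , a single predicate $P(x)=x$ , and the computation $\gamma(x)=-x$ . The iterates $\gamma^{(n)}$ alternate between the identity ( $n$ even) and negation ( $n$ odd), so a deep iterate $\tilde{\gamma}^{(\mathcal{U})}$ is the identity when the even numbers form a $\mathcal{U}$ -large set, and negation otherwise. Every idempotent ultrafilter on $\mathbb{N}$ contains $m\mathbb{N}$ for each $m\geq1$ (a standard fact), so here the deep equilibrium of Theorem 2.4 is the identity, whereas a nonprincipal ultrafilter containing the odd numbers yields a deep iterate that is not idempotent. Ultralimits cannot be computed directly; the following sketch merely evaluates one representative iterate from each class and checks idempotence on sample states. The helper names `iterate` and `is_idempotent` are hypothetical.

```python
from typing import Callable, Sequence

Map = Callable[[float], float]

def iterate(f: Map, n: int) -> Map:
    """Return the n-fold composite f o ... o f (n >= 1)."""
    def g(x: float) -> float:
        for _ in range(n):
            x = f(x)
        return x
    return g

def is_idempotent(g: Map, samples: Sequence[float], tol: float = 1e-12) -> bool:
    """Check g(g(x)) == g(x), up to tol, on the given sample states."""
    return all(abs(g(g(x)) - g(x)) <= tol for x in samples)

# Hypothetical computation on L = [-1, 1] with the single predicate P(x) = x.
def gamma(x: float) -> float:
    return -x

samples = [i / 10 for i in range(-10, 11)]

# gamma^(n) is the identity for even n and negation for odd n, so each of
# these stands in for the deep iterate along an ultrafilter containing the
# even (resp. odd) numbers.
even_rep = iterate(gamma, 100)
odd_rep = iterate(gamma, 101)

print(is_idempotent(even_rep, samples))  # True: idempotent, a deep equilibrium
print(is_idempotent(odd_rep, samples))   # False: a deep iterate, not idempotent
```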

2.3. Definability Criteria

Ultracomputations, deep iterates and deep equilibria are typically not definable, i.e., not effectively computable —even when $\mathcal{P}$ consists of a single predicate $P$ , let alone countably many! (Cf., Example 3.5.1 et seqq.)

Theorem 2.2 implies very strong restrictions on the ability to realize deep computations in any explicit fashion. One may ask for criteria ensuring that deep computations (or deep iterates, or deep equilibria) are effectively computable—i.e., definable.

Theorem 2.5 (Cf., Theorem 6.4).

Let $s_{\bullet}^{[\cdot]}$ be a confiner, and let $\Delta$ be any collection of $s_{\bullet}^{[\cdot]}$ -confined computations on a CCS with countable predicate collection $\mathcal{P}$ . Then, the properties below are equivalent:

(DD)

Deep Definability: All deep computations of $\Delta$ are definable (hence extend to continuous ultracomputations).

(LE)

Limit Exchange: For all predicates $P\in\mathcal{P}$ , all sizers $r_{\bullet}$ , and all sequences $v_{\bullet}\subseteq L[r_{\bullet}]$ and $\gamma_{\bullet}\subseteq\Delta$ , the Limit Exchange identity:

(2.1)

\lim_{m}\lim_{n}P{\circ}\gamma_{m}(v_{n})=\lim_{n}\lim_{m}P{\circ}\gamma_{m}(v_{n}),

holds whenever the iterated limits on the left- and right-hand sides both exist. Moreover, in such a case, each ultracomputation (hence, each DC) of $\Delta$ is the (pointwise) limit of some sequence $(\gamma_{n})\subseteq\Delta$ . The limit is attained uniformly on type-shards (a fortiori, uniformly on state shards).
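A hypothetical one-predicate example shows how (LE) can fail. Take $L=[0,1]$ with $P(x)=x$ , the computations $\gamma_{m}(x)=x^{m}$ (continuous maps $L\to L$ ), and the states $v_{n}=1-1/n$ . Then $\lim_{n}P{\circ}\gamma_{m}(v_{n})=1$ for every $m$ , while $\lim_{m}P{\circ}\gamma_{m}(v_{n})=0$ for every $n$ , so both sides of (2.1) exist but differ. Consistently with Theorem 2.5, the pointwise limit of $(\gamma_{m})$ , which is the only deep computation of this family, is the indicator of $\{1\}$ ; it is discontinuous, hence not definable. The sketch below approximates each iterated limit numerically by letting the inner index run far ahead of the outer one; it is an illustration, not a proof, and the names `gamma`, `v` and `deep_limit` are illustrative.

```python
# Hypothetical family on L = [0, 1] with the single predicate P(x) = x.
def gamma(m: int, x: float) -> float:
    """The m-th computation gamma_m(x) = x**m, a continuous map [0, 1] -> [0, 1]."""
    return x ** m

def v(n: int) -> float:
    """The n-th state v_n = 1 - 1/n, which converges to 1."""
    return 1.0 - 1.0 / n

# Approximate each iterated limit by letting the inner index run far ahead
# of the outer one (finite indices stand in for the limits).
lhs = gamma(10**3, v(10**12))   # lim_m lim_n: inner n huge, outer m large
rhs = gamma(10**9, v(10**3))    # lim_n lim_m: inner m huge, outer n large

print(lhs)  # close to 1
print(rhs)  # 0.0 (the true value is positive but underflows in floating point)

def deep_limit(x: float) -> float:
    """Pointwise limit of gamma_m as m -> infinity: the indicator of {1}."""
    return 1.0 if x == 1.0 else 0.0
```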

3. Structures for Real-Valued Computations

In this section, we introduce the notions of computation states structure (CSS) and compositional computation structure (CCS), which lie at the foundation of our approach to real-valued computing. Although the definitions of CSS and CCS in §3.2 and §3.4 are fairly straightforward, the abstraction entailed by these notions warrants a preliminary informal discussion to demystify some of the formalism.

3.1. Computations, states, observable features and predicates: A meteorological allegory

Consider physical quantities (such as temperature and barometric pressure) that are real-valued, and each of which may be observed at any given point. For definiteness, consider points on or above the surface of the earth, regarded as an idealized sphere. A state $v$ captures the properties of a specific such point at a specific moment in time. In such an idealization, each physical quantity at any $v$ is called a feature of $v$ (or observable feature, for emphasis). Each such feature must be given a name (e.g., temperature, pressure, latitude, longitude, height, etc.); these names are essential, for otherwise the real value of a feature of $v$ is devoid of context. We use the term observable to refer to the name given to any such property that may be observed; in a formal treatment, we use (purely syntactic) symbols (e.g., “T” for temperature, “p” for pressure, “lat” for latitude, “long” for longitude, “h” for height, etc.) as observables. An observable feature of $v$ is the value at $v$ of the observable; e.g., $v$ may have features $\mathtt{lat}(v)=+29.42$ (the lat-feature, i.e., latitude, of $v$ is $29.42^{\circ}$ N), $\mathtt{long}(v)=-98.49$ ( $v$ has long-feature, i.e., longitude, $98.49^{\circ}$ W), $\mathtt{h}(v)=229$ ( $v$ is at $229$ m height), $\mathtt{T}(v)=33.5$ (the temperature at $v$ is $33.5^{\circ}$ ), etc.

We fix a symbol for each observable; such symbols P, Q, … (not necessarily finitely many, or even countably many, for that matter) will be called predicate symbols. The set of predicate symbols (i.e., of symbols for observables under consideration) will be denoted $\mathcal{P}$ . We shall denote the set of all possible states $v$ by $L$ . In the present discussion, $L$ might be taken to consist of points on the surface of our idealized spherical earth; it is perhaps more fitting to allow states $v\in L$ to refer to spatial points, each at a specific moment in time. Note that time $\mathtt{t}$ is not an observable if one takes $L$ simply as the set of points on the sphere, but $\mathtt{t}$ is a valid observable on the set $L$ of states $v$ simultaneously encoding both location and time (in addition to other observables: temperature, pressure, etc.).

Any real-valued function on $L$ is called a predicate. Each symbol $\mathtt{P}\in\mathcal{P}$ , at any state $v\in L$ , has an associated real value $P(v)$ (the switch to italic $P$ from typewriter-style $\mathtt{P}$ is a reminder that the symbol P has been “interpreted” to yield the actual value $P(v)$ of the P-feature of $v$ ). Thus, the symbol $\mathtt{P}$ entails a predicate

P=P(\cdot):L\to\mathbb{R},\qquad v\mapsto P(v).

(The notation $P(\cdot)$ is meant to emphasize the passage from the symbol P to its interpretation.)

Now that the distinction between observables P and the predicates $P(\cdot)$ interpreting them is clear, we shall henceforth use italic $P,Q,\dots$ simultaneously as formal (predicate) symbols denoting observables, and to denote the corresponding predicates; in cases of potential confusion, we use the preferred notation $P(\cdot),Q(\cdot),\dots$ for predicates. (Whenever $\mathcal{P}$ is used as an index set, its members are regarded as symbols, never as predicates.)

Taking $L$ together with the predicate $P(\cdot)$ interpreting each observable $P\in\mathcal{P}$ thereon, we obtain a pair $\underline{L}\coloneqq\langle L,(P(\cdot))_{P\in\mathcal{P}}\rangle$ , called a Computation States Structure (CSS) in §3.2 below. (In $\underline{L}$ , the collection of predicates $P(\cdot):L\to\mathbb{R}$ is a family indexed by symbols P.) By an abuse of notation, we may denote such a structure in the form $\underline{L}=\langle L,\mathcal{P}\rangle$ , wherein the collection $(P(\cdot):P\in\mathcal{P})$ of predicates is implicitly identified with the indexing set $\mathcal{P}$ .

In the allegory, such features include the quantities $\mathtt{lat}(v)$ , $\mathtt{long}(v)$ and $\mathtt{h}(v)$ , which are coordinates in the usual sense, as well as other features $\mathtt{T}(v)$ , $\mathtt{p}(v)$ and time $\mathtt{t}(v)$ , which are not; nevertheless, this suggests regarding the collection of all features $(P(v))_{P\in\mathcal{P}}$ of states as coordinatizing states $v\in L$ . Each $P$ -feature $P(v)$ is the “ $P$ -th coordinate” of $v$ in an abstract sense; the collection $\operatorname{tp}(v)\coloneqq(P(v))_{P\in\mathcal{P}}$ is called the type of $v$ . Any state $v$ is uniquely characterized by its type. A critical feature of our approach is to endow the state space $L$ with the topology of “pointwise convergence”, i.e., a filter on (or: a sequence or net of) states converges to a state $v\in L$ iff the filter (or sequence, or net) of real-valued $P$ -features converges to $P(v)$ , for each $P\in\mathcal{P}$ .

For the remainder of this subsection, we assume that the state space $L$ is compact. In our allegory, wherein height (and time) are observables allowed to take arbitrarily large values, compactness fails. On the other hand, if we were to restrict height and time to bounded intervals (e.g., $0\leq\mathtt{h}(v),\mathtt{t}(v)\leq C$ for any fixed $C>0$ ), the respective state space would be compact.

To a first approximation, a computation is a map $\gamma:L\to L$ transforming any given input state $v$ to some output state $\gamma(v)$ . (For simplicity, we use the same space $L$ of input and output states.) In our allegory, one may “visualize” computations as moving $v$ to another point $\gamma(v)$ , possibly at a different moment in time. Maps $\gamma:L\to L$ should be considered “computable” in any reasonably explicit sense (say, by algorithms relying on floating-point arithmetic) only if output features $Q(\gamma(v))$ vary continuously with input features $P(v)$ , i.e., only when $\gamma$ is a continuous map $L\to L$ in the topology of pointwise convergence of individual observable features. Such a requirement is consistent with the physics implied by our allegory. We always require computations to be continuous. [Footnote 9: Computations on a noncompact state space $L$ are required to be extendable in the sense of §4.2.1, a technical requirement significantly stronger than continuity.]

For illustration purposes, consider the “advance-time-by-1” computation $\alpha$ taking the state $v$ of any point at any time $\mathtt{t}(v)=t$ to the unique state $w=\alpha(v)$ of the same point at time $\mathtt{t}(w)=t+1$ . The features $T(\alpha(v))$ , $p(\alpha(v))$ of the output state give the temperature and pressure at the future moment $t+1$ in terms of the state at the present time $t$ . Meteorologists would be ecstatic to learn features at time $t+1$ from those at time $t$ !

When the state space $L$ is compact, continuous computations $\gamma:L\to L$ are effectively computable in a rather strong sense: they are polynomially definable. This means that, for any fixed (but otherwise arbitrary) precision, every output feature $Q(\gamma(v))$ is given, up to an error not exceeding that precision, by a polynomial in some input features $P(v)$ . Meteorologists would be even happier to possess polynomial expressions for the features of the computation $\alpha$ , i.e., for future features in terms of present ones! On the other hand (with apologies to meteorologists), our methods offer no insight into the specific polynomial approximating any output feature; at any rate, such features would only be polynomially approximable on bounded intervals.
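In the simplest compact case, $L=[0,1]$ with a single predicate $P(x)=x$ , polynomial definability of a continuous feature reduces to the Weierstrass approximation theorem. The sketch below uses Bernstein polynomials, one classical constructive proof of that theorem, swapped in purely for illustration; as stated above, the paper's methods do not single out any particular approximating polynomial. The feature $|x-1/2|$ and the helper names `bernstein` and `sup_error` are hypothetical, and the supremum is only estimated on a finite grid.

```python
from math import comb
from typing import Callable

Feature = Callable[[float], float]

def bernstein(f: Feature, n: int) -> Feature:
    """Degree-n Bernstein polynomial of f on [0, 1], a polynomial in P(x) = x."""
    weights = [f(k / n) * comb(n, k) for k in range(n + 1)]

    def p(x: float) -> float:
        return sum(w * x**k * (1 - x) ** (n - k) for k, w in enumerate(weights))

    return p

def sup_error(f: Feature, g: Feature, grid_size: int = 1001) -> float:
    """Estimate sup |f - g| on [0, 1] by sampling an evenly spaced grid."""
    xs = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(f(x) - g(x)) for x in xs)

# A hypothetical continuous output feature that is not itself a polynomial.
def feature(x: float) -> float:
    return abs(x - 0.5)

for n in (10, 50, 200):
    # The estimated uniform error shrinks as the degree n grows.
    print(n, sup_error(feature, bernstein(feature, n)))
```

Bernstein polynomials converge slowly (here roughly like $n^{-1/2}$ near the kink at $x=1/2$ ); they are chosen for transparency rather than efficiency.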

As a by-product of choosing a common state space $L$ for both computation inputs and outputs, computations are necessarily composable, i.e., any given computations naturally generate a semigroup of computations. This gives rise to the notion of compositional computation structure (CCS), which is of the form

\mathcal{C}=\langle\underline{L},\underline{\Gamma},\operatorname{ev}\rangle,

where $\underline{L}$ is a CSS, and $\underline{\Gamma}=\langle\Gamma,\circ\rangle$ is any semigroup under an (associative) composition operation $\circ$ , with elements $\gamma\in\Gamma$ representing computations on $L$ via an evaluation map $\operatorname{ev}:\Gamma\times L\to L:(\gamma,v)\mapsto\operatorname{ev}(\gamma,v)$ ( $=\gamma(v)$ , if $\Gamma$ is already a set of maps $\gamma:L\to L$ ). Layer state transitions $\gamma(\cdot):v\mapsto\operatorname{ev}(\gamma,v)$ are assumed continuous on $L$ (when $L$ is noncompact, we require them to be extendable in the sense of §4.2.1). CCSs are the natural structures to study compositions $\gamma_{n}\circ\gamma_{n-1}\circ\dots\circ\gamma_{2}\circ\gamma_{1}$ of $n$ -many computations leading, as $n\to\infty$ , to “deep computation states”, as well as “deep iterates” asymptotically approximated by $n$ -fold iterates $\gamma^{(n)}=\gamma\circ\gamma\circ\dots\circ\gamma$ of a fixed computation $\gamma$ .

With suitable changes in definitions, our results apply to non-compact CSSs/CCSs.

3.2. Computation States Structures

Fix an arbitrary nonempty set $\mathcal{P}$ whose members $P,Q,\dots$ will be called predicate symbols.

A Computation States Structure (CSS) with predicates $\mathcal{P}$ is of the form

\underline{L}=\langle L,(P(\cdot))_{P\in\mathcal{P}}\rangle,

where

•

$L$ is a nonempty set, called the sort (or space) of layer states;
•

For each symbol $P\in\mathcal{P}$ , the $P$ -predicate of $\underline{L}$ is a real function $P(\cdot):L\to\mathbb{R}$ . [Footnote 10: In the setting of real-valued structures, any real function is called a predicate.]

By an abuse of notation, we typically identify a symbol $P\in\mathcal{P}$ with the predicate $P(\cdot)$ ; this entails a further abuse whereby we identify the predicate collection $(P(\cdot):P\in\mathcal{P})$ with $\mathcal{P}$ itself; thereby, the CSS above takes the form $\underline{L}=\langle L,\mathcal{P}\rangle$ .

3.2.1. Types of states

In a CSS $\underline{L}=\langle L,\mathcal{P}\rangle$ , the type of a state $v\in L$ is the indexed family $\operatorname{tp}(v)\coloneqq(P(v):P\in\mathcal{P})$ of its predicate values. Such a type is said to be realized by $v$ ; it is a “vector” $\mathfrak{v}=(\mathfrak{v}_{P})_{P\in\mathcal{P}}$ with real entries $\mathfrak{v}_{P}=P(v)$ indexed by predicates $P$ . Thus, such state types $\mathfrak{v}$ are elements of the product (vector space) $\mathbb{R}^{\mathcal{P}}=\prod_{P\in\mathcal{P}}\mathbb{R}$ , which will always be regarded as the topological product of copies of the real line $\mathbb{R}$ (endowed with its usual topology), one such line for each $P\in\mathcal{P}$ . The topological subspace of realized types will be denoted $\operatorname{tp}(L)\coloneqq\{\operatorname{tp}(v):v\in L\}$ . (On the other hand, the linear operations on the vector space $\mathbb{R}^{\mathcal{P}}$ will play no direct role outside of informal discussions and the Appendix.)

Ultrafilters on an infinite (“index”) set $I$ will be denoted $\mathcal{U},\mathcal{V},\dots$ ; they are tacitly assumed to be nonprincipal. Given an ultrafilter $\mathcal{U}$ on $I$ , we say that an indexed family $\mathfrak{v}^{(\bullet)}\coloneqq(\mathfrak{v}^{(i)})_{i\in I}$ of elements $\mathfrak{v}^{(i)}=(\mathfrak{v}^{(i)}_{P})_{P\in\mathcal{P}}\in\mathbb{R}^{\mathcal{P}}$ converges to $\mathfrak{u}=(\mathfrak{u}_{P})_{P\in\mathcal{P}}\in\mathbb{R}^{\mathcal{P}}$ along $\mathcal{U}$ , or that $\mathfrak{u}$ is the $\mathcal{U}$ -ultralimit of $\mathfrak{v}^{(\bullet)}$ , if $\operatorname{\mathcal{U}lim}_{i}\mathfrak{v}_{P}^{(i)}=\mathfrak{u}_{P}$ for each $P\in\mathcal{P}$ (i.e., when $\mathfrak{u}$ is the $\mathcal{U}$ -ultralimit of $(\mathfrak{v}^{(i)})$ in the topology of pointwise convergence, not necessarily uniformly as $P$ varies). [Footnote 11: When it exists, the $\mathcal{U}$ -ultralimit $s=\operatorname{\mathcal{U}lim}_{i}r_{i}$ of $(r_{i})_{i\in I}\subseteq\mathbb{R}$ is uniquely characterized by the following property: for every $\varepsilon>0$ , the set $\{i\in I:\left|r_{i}-s\right|<\varepsilon\}$ belongs to $\mathcal{U}$ (i.e., is a “ $\mathcal{U}$ -large” set). Not all ultralimits $\operatorname{\mathcal{U}lim}_{i}P(u^{(i)})$ need exist, since $\mathbb{R}$ is not compact.] The (necessarily unique) ultralimit $\mathfrak{u}$ is denoted $\operatorname{\mathcal{U}lim}_{i}\mathfrak{v}^{(i)}$ .

Elements $\mathfrak{u}\in\mathbb{R}^{\mathcal{P}}$ arising as entry-wise ultralimits of realized types $\operatorname{tp}(v)$ in the above fashion (with $I$ and $\mathcal{U}$ allowed to vary) are called types of (layer) states, or ultrastates. Any realized state type is an ultrastate, but the converse fails in general. The set $\mathfrak{L}$ of ultrastates is a closed subset $\mathfrak{L}=\overline{\operatorname{tp}(L)}$ (the bar denoting topological closure) of $\mathbb{R}^{\mathcal{P}}$ , called the (layer) state type space, and henceforth endowed with the subspace topology. Since ultrastates need not be realized, the inclusion $\operatorname{tp}(L)\subseteq\mathfrak{L}$ is generally proper.

We shall adopt the convenient alternate notation $P(\mathfrak{v})$ for the “ $P$ -th entry” $\mathfrak{v}_{P}$ of a type $\mathfrak{v}\in\mathfrak{L}$ , which treats $\mathfrak{v}$ as if it were realized (i.e., as though $\mathfrak{v}$ were a state in $L$ ).

3.2.2. Topology on the layer state space

We adopt a structural perspective wherein states are to be distinguished only through predicate values; thus, a state $v\in L$ is implicitly identified with its type $\operatorname{tp}(v)\in\mathbb{R}^{\mathcal{P}}$ . We topologize $L$ with (the “pullback” of) the product topology under such identification. A slightly more concrete description of this topology is as follows: For each predicate $P\in\mathcal{P}$ , endow $L$ with the pseudometric $\operatorname{d}_{P}(v,w)\coloneqq\left|P(w)-P(v)\right|$ , and topologize $L$ by the collection $(\operatorname{d}_{P})_{P\in\mathcal{P}}$ of all such pseudometrics. This topology “by type” is the only one we shall introduce on $L$ (except in certain examples meant to compare this topology to others).

CSSs are assumed to satisfy the following:

•

Reduction Axiom for Computation States Structures. States $v,w\in L$ are equal if (and only if) their types $\operatorname{tp}(v),\operatorname{tp}(w)$ are equal.

The Reduction Axiom above is equivalent to the requirement that distinct states be topologically distinguishable; since $\mathbb{R}^{\mathcal{P}}$ is Hausdorff, reduction amounts to requiring that $L$ itself be a Hausdorff topological space (any two states have disjoint neighborhoods).

Even if not imposed a priori on a CSS $\underline{L}$ , the Reduction Axiom is always satisfied if one replaces $L$ by its quotient $\tilde{L}\coloneqq L/{\operatorname{tp}}$ upon identifying equal-in-type states, and each predicate $P(\cdot)$ on $L$ by the naturally induced predicate $\tilde{P}(\cdot)$ on $\tilde{L}$ . From a structural viewpoint, $\langle L,(P(\cdot))_{P\in\mathcal{P}}\rangle$ and $\langle\tilde{L},(\tilde{P}(\cdot))_{P\in\mathcal{P}}\rangle$ are identical (isomorphic, in the sense of Keisler’s General Real-Valued Structures [Kei23]).
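The type map, the pseudometrics $\operatorname{d}_{P}$ , and the passage to the quotient $L/{\operatorname{tp}}$ can be sketched for a CSS with finitely many predicates (the framework itself allows infinitely many). The following is a minimal illustration under that assumption; the toy states, which carry an unobservable tag, and the predicate names are hypothetical. Two states differing only in the tag are at $\operatorname{d}_{P}$ -distance $0$ for every $P$ , so they are topologically indistinguishable and are identified in the reduced structure.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

State = Tuple[float, str]              # hypothetical state: (observable x, hidden tag)
Predicate = Callable[[State], float]
StateType = Tuple[float, ...]

def tp(v: State, predicates: Dict[str, Predicate]) -> StateType:
    """Type of v: its predicate values, listed in order of predicate symbol."""
    return tuple(predicates[name](v) for name in sorted(predicates))

def d(name: str, predicates: Dict[str, Predicate], v: State, w: State) -> float:
    """Pseudometric d_P(v, w) = |P(w) - P(v)| for the predicate symbol `name`."""
    P = predicates[name]
    return abs(P(w) - P(v))

def reduce_states(states: List[State],
                  predicates: Dict[str, Predicate]) -> Dict[StateType, List[State]]:
    """Quotient L -> L/tp: group together the states that have equal types."""
    classes: Dict[StateType, List[State]] = defaultdict(list)
    for v in states:
        classes[tp(v, predicates)].append(v)
    return dict(classes)

states: List[State] = [(0.0, "a"), (0.0, "b"), (1.0, "a")]
predicates: Dict[str, Predicate] = {"P": lambda s: s[0], "Q": lambda s: s[0] ** 2}

print(d("P", predicates, states[0], states[1]))   # 0.0: distinct states, equal P-values
print(len(reduce_states(states, predicates)))     # 2 equivalence classes remain
```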

Remark 3.1.

By Proposition 2.1, if $\mathcal{P}$ is countable, then the topology on $L$ is metrizable. Even when $\mathcal{P}$ is countable, however, our purposes are better suited by thinking of $L$ as endowed with the topology (and corresponding uniformity [Eng89, §8.1]) explicitly given by the full predicate collection, rather than by an implied “master” metric which, in an abridged manner, induces the same topology.

3.3. Tychonoff and Realcompact spaces

3.3.1. Tychonoff spaces

Recall that a topological space $X$ is Tychonoff if it is $T_{3\text{½}}$ , i.e., a completely regular Hausdorff space; explicitly: (i) points are closed, and (ii) given any point $x\in X$ and closed $C\not\ni x$ , there exists a continuous function $f:X\to\mathbb{R}$ such that $f(x)=0$ and $f{\restriction}C=1$ .

Remark 3.2.

A reduced CSS $\underline{L}=\langle L,\mathcal{P}\rangle$ is ultimately just a Tychonoff space endowed with a distinguished family $\mathcal{P}$ of real functions $P(\cdot)$ (distinguished predicates) such that the topology on $L$ is initial with respect to the collection $\mathcal{P}$ (i.e., the topology of $L$ is generated by the inverse images of open intervals of $\mathbb{R}$ under the functions $P(\cdot)$ ). From another perspective, any CSS $\underline{L}=\langle L,\mathcal{P}\rangle$ is isomorphic to a “sub-CSS” of the CSS $\underline{\mathbb{R}^{\mathcal{P}}}=\langle\mathbb{R}^{\mathcal{P}},(P)_{P\in\mathcal{P}}\rangle$ (whose predicates are the coordinate projections) via the type map $\operatorname{tp}:L\to\mathbb{R}^{\mathcal{P}}$ , which is a homeomorphic predicate-preserving embedding; therefore, such product CSSs $\underline{\mathbb{R}^{\mathcal{P}}}$ are universal.

The distinguished predicates $P(\cdot)$ are regarded as being “computable on $L$ ” ab initio; they may also be seen as monomials generating a polynomial algebra of continuous real functions on $L$ ; the uniform closure of this algebra is the algebra $\mathbf{D}$ of “definable predicates” on $L$ (which are, by necessity, continuous real functions on $L$ ). In general, however, $\mathbf{D}$ is a proper subalgebra of the full algebra $\mathrm{C}(L)\supseteq\mathbf{D}$ of continuous real functions on $L$ . Any function $\varphi\in\mathrm{C}(L)\setminus\mathbf{D}$ is non-definable over $\mathcal{P}$ ; it is appropriate to think of such $\varphi$ as “transcendental” over $\mathcal{P}$ , not merely in an algebraic sense but in a stronger topological one: not only does such a non-definable $\varphi$ fail to be a polynomial in the monomials $P(\cdot)\in\mathcal{P}$ ; in fact, it cannot even be approximated uniformly on $L$ by such polynomials.

3.3.2. Realcompact spaces

A topological space is called realcompact if it is Tychonoff and it embeds homeomorphically as a closed subspace of the topological product $\mathbb{R}^{I}=\prod_{i\in I}\mathbb{R}$ for some index set $I$ [Eng89, §3.11]. (There is a multitude of equivalent definitions of realcompactness. For a thorough treatment of realcompact spaces, refer to Weir’s monograph [Wei75].)

A CSS $\underline{L}=\langle L,\mathcal{P}\rangle$ is realcompact iff the type map $L\to\mathbb{R}^{\mathcal{P}}:v\mapsto\operatorname{tp}(v)\coloneqq(P(v))_{P\in\mathcal{P}}$ has closed image $\operatorname{tp}(L)\coloneqq\{\operatorname{tp}(v):v\in L\}$ in $\mathbb{R}^{\mathcal{P}}$ , i.e., iff $\operatorname{tp}(L)=\mathfrak{L}$ is the full space of state types of $L$ (all state types are realized). Any compact (Hausdorff) space $X$ is necessarily Tychonoff and in fact realcompact: taking $\mathcal{P}$ to be any set of continuous functions $P:X\to\mathbb{R}$ separating points of $X$ , the type map $\operatorname{tp}:X\to\mathbb{R}^{\mathcal{P}}$ is injective and has compact, hence closed, image; it is therefore a homeomorphic embedding. In particular, any compact CSS $\underline{L}$ is realcompact.

3.3.3. Realcompactness of type spaces

Any type space $\mathfrak{L}\subseteq\mathbb{R}^{\mathcal{P}}$ is a closed subspace, hence realcompact. Identifying the layer space $L$ with its embedded image $\operatorname{tp}(L)\subseteq\mathfrak{L}$ , it is suggestive to regard the realcompact type space $\mathfrak{L}$ as a canonical realcompact extension of $\operatorname{tp}(L)\cong L$ . Such a viewpoint is quite appropriate for our purposes, so we discuss in what precise sense this realcompact extension is canonical.

More generally, consider any Tychonoff space $X$ whose topology is initial with respect to a collection $\Phi$ of real functions $\varphi:X\to\mathbb{R}$ (i.e., inverse images of open subsets of $\mathbb{R}$ under such functions generate the topology of $X$ ). Each point $x\in X$ has a $\Phi$ -type $\Phi\text{-}\!\operatorname{tp}(x)=(\varphi(x))_{\varphi\in\Phi}\in\mathbb{R}^{\Phi}$ , and $X$ embeds (via the map $\Phi\text{-}\!\operatorname{tp}$ ) as a subspace $\Phi\text{-}\!\operatorname{tp}(X)\subseteq\mathbb{R}^{\Phi}$ whose closure $\mathfrak{X}=\overline{\Phi\text{-}\!\operatorname{tp}(X)}\subseteq\mathbb{R}^{\Phi}$ (the $\Phi$ -type space of $X$ ) is realcompact.

The type space $\mathfrak{X}=\mathfrak{X}_{\Phi}$ depends on $\Phi$ . A key observation is that each of the functions $\varphi\in\Phi$ extends continuously to $\mathfrak{X}$ (as the “ $\varphi$ -th coordinate” map $\hat{\varphi}:(\mathfrak{x}_{\psi})_{\psi\in\Phi}\mapsto\mathfrak{x}_{\varphi}$ ). However, other real functions on $X$ , even continuous ones, need not extend continuously to $\mathfrak{X}$ . Thus, the $\Phi$ -type space $\mathfrak{X}\supseteq\Phi\text{-}\!\operatorname{tp}(X)\cong X$ possesses the universal property that every $\varphi\in\Phi$ admits a unique continuous extension to $\mathfrak{X}$ ; it is characterized by such universal property up to homeomorphism. [Footnote 12: In fact, any real function $\xi$ on $X$ that is uniformly approximable by polynomials in the functions $\varphi\in\Phi$ is (necessarily continuous, and) extends continuously to a real function $\hat{\xi}$ on $\mathfrak{X}$ (a uniform limit of polynomials in the corresponding functions $\hat{\varphi}$ on $\mathfrak{X}$ ), so $\mathfrak{X}$ possesses the extension property for functions in the uniform closure of the real algebra generated by the functions $\varphi\in\Phi$ .]

Remark 3.3.

Let $C\coloneqq\mathrm{C}(X)$ be the set of all continuous real-valued functions on a Tychonoff space $X$ , and let $\hat{X}\subseteq\mathbb{R}^{C}$ be the corresponding type space, called the realcompactification of $X$ . Every continuous function $\varphi$ on $X$ extends to a continuous function $\hat{\varphi}$ on the realcompactification $\hat{X}$ . In fact, for any $\Phi\subseteq C=\mathrm{C}(X)$ , the $\Phi$ -type space $\mathfrak{X}\subseteq\mathbb{R}^{\Phi}$ is a quotient (not a subspace!) of $\hat{X}\subseteq\mathbb{R}^{C}$ in a natural manner: indeed, $\mathfrak{X}\subseteq\mathbb{R}^{\Phi}$ is the image of $\hat{X}\subseteq\mathbb{R}^{C}$ under the natural projection map $\mathbb{R}^{C}\to\mathbb{R}^{\Phi}:(\mathfrak{x}_{\varphi}:\varphi\in C)\mapsto(\mathfrak{x}_{\varphi}:\varphi\in\Phi)$ . Thus, given a fixed set $\Phi$ of continuous real functions on $X$ , it is appropriate to think of the $\Phi$ -type space $\mathfrak{X}$ as a “ $\Phi$ -relative realcompactification” of $X$ , since $\mathfrak{X}\supseteq X$ possesses the universal extension property only for functions $\varphi\in\Phi$ , rather than for all $\varphi\in\mathrm{C}(X)$ (the latter corresponding to the “absolute” realcompactification $\hat{X}$ of $X$ ). [Footnote 13: The notation $\upsilon X$ (“upsilon- $X$ ”) is standard for the realcompactification (denoted $\hat{X}$ above) of a Tychonoff space $X$ .]

3.3.4. Realcompact CSSs

We single out the (sub)class of CSSs possessing the following property:

•

Realcompactness Property. Every type $\mathfrak{v}\in\mathfrak{L}$ is realized, i.e., of the form $\operatorname{tp}(v)$ for some $v\in L$ .

Thus, realcompactness is the requirement that $\operatorname{tp}:L\to\mathfrak{L}$ be a surjection onto the type space $\mathfrak{L}$ , whence $\operatorname{tp}$ is a homeomorphism $L\cong\mathfrak{L}$ (by the Reduction Axiom). Rephrasing, realcompactness states that whenever $(u^{(i)})_{i\in I}\subseteq L$ and an ultrafilter $\mathcal{U}$ on $I$ are such that the ultralimit $\mathfrak{v}\coloneqq\operatorname{\mathcal{U}lim}_{i}\operatorname{tp}(u^{(i)})$ exists, some $v\in L$ satisfies $\operatorname{tp}(v)=\mathfrak{v}$ .

It is appropriate to regard realcompactness as capturing a certain notion of “completeness” or “saturation” of the space $L$. Particularly when $\mathcal{P}$ is infinite, realcompactness is a rather strong requirement on CSSs, so we do not impose it as an axiom; instead, we rely primarily on the realcompactness and universal properties of the type space $\mathfrak{L}\supseteq L$. [Footnote 14: When $\mathcal{P}$ is finite, realcompactness is a rather mild requirement: it is equivalent to the completeness of $\langle L,\delta\rangle$, where $\delta$ is the metric introduced in the proof of Proposition 2.1.]

3.4. Compositional Computation Structures

A Compositional Computation Structure (CCS)

\mathcal{C}=\langle\underline{L},\underline{\Gamma},\operatorname{ev}\rangle

for a given set $\mathcal{P}$ of predicate symbols consists of

•

a CSS $\underline{L}=\langle L,(P(\cdot))_{P\in\mathcal{P}}\rangle$ with predicate symbol set $\mathcal{P}$ and, for each $P\in\mathcal{P}$ , a real predicate $P(\cdot):L\to\mathbb{R}$ ;
•

a semigroup $\underline{\Gamma}=(\Gamma,\circ)$ , the computations sort (the—associative—semigroup operation $\circ:\Gamma\times\Gamma\to\Gamma$ is denoted simply $(\gamma,\delta)\mapsto\gamma\delta$ when convenient);
•

a map $\operatorname{ev}:\Gamma\times L\to L:(\gamma,v)\mapsto\operatorname{ev}(\gamma,v)$ (“evaluation”) giving an action of $\Gamma$ on $L$ (i.e., $\operatorname{ev}(\gamma\delta,v)=\operatorname{ev}(\gamma,\operatorname{ev}(\delta,v))$ for $\gamma,\delta\in\Gamma$ and $v\in L$).
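The action law can be illustrated with a toy CCS in Python. This is a minimal sketch under assumptions of our own, not a construction from the text: states are floats in $[0,1]$, $\Gamma$ is the additive semigroup of positive integers (so the semigroup operation is abstract rather than literal composition, in the spirit of Remarks 3.4(1)), and $\operatorname{ev}(n,v)$ applies a fixed continuous map $n$ times. The names `f`, `op`, `ev` and `action_law` are hypothetical.

```python
State = float


def f(v: State) -> State:
    """A fixed continuous self-map of [0, 1] (illustrative choice)."""
    return 4.0 * v * (1.0 - v)


def op(m: int, n: int) -> int:
    """Abstract semigroup operation on Gamma = {1, 2, 3, ...}: here, addition."""
    return m + n


def ev(n: int, v: State) -> State:
    """Evaluation action: apply f to v exactly n times."""
    for _ in range(n):
        v = f(v)
    return v


def action_law(m: int, n: int, v: State) -> bool:
    """Check ev(m*n, v) == ev(m, ev(n, v)), where * is the semigroup operation op."""
    return ev(op(m, n), v) == ev(m, ev(n, v))
```

For instance, `action_law(3, 5, 0.2)` returns `True`; exact equality (rather than a tolerance) is appropriate here because both sides perform the same floating-point operations in the same order.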

Remarks 3.4.

(1)

In principle, the semigroup operation ‘$\circ$’ of $\underline{\Gamma}$ and the evaluation action ‘$\operatorname{ev}$’ are abstract (i.e., not literally composition and application of functions). However, one may always regard $\underline{\Gamma}$ as a semigroup of maps $\gamma(\cdot):v\mapsto\operatorname{ev}(\gamma,v)$, with the operation ‘$\circ$’ interpreted as composition, i.e., regard $\Gamma$ as a sub-semigroup of the semigroup $L^{L}$ of all maps $L\to L$ under bona fide functional composition; nothing of structural relevance is lost thereby. The structural viewpoint abstracts away inessential aspects of any concrete such realization of $\Gamma$. In practice, it is convenient to identify $\gamma\in\Gamma$ with $\gamma(\cdot)\in L^{L}$.

(2)

In applications, a more general notion of CCS with $n$-ary computations is useful. By this we mean that each computation $\gamma\in\Gamma$ may have an arity $n=n_{\gamma}\in\mathbb{N}$ such that $\gamma(\cdot)$ is an ($n$-argument) map $L^{n}\to L$. (It is appropriate to regard evaluation on such $n$-ary $\gamma$ as a map $\operatorname{ev}_{n}:\Gamma_{\!n}\times L^{n}\to L$, where $\Gamma_{\!n}\subseteq\Gamma$ is the set of $n$-ary elements $\gamma$; thus, $\gamma\in\Gamma_{\!n}$ gives an $n$-argument map $\operatorname{ev}(\gamma;\cdot):L^{n}\to L$.) CCSs with $n$-ary computations augment the semigroup operation ${\circ}:\Gamma_{\!1}\times\Gamma_{\!1}\to\Gamma_{\!1}$ with a richer set of operations realizing arity-appropriate compositions. Explicitly, given $\gamma_{1},\gamma_{2},\dots,\gamma_{m}\in\Gamma_{\!n}$ and $\theta\in\Gamma_{\!m}$, there exists an element $\eta\coloneqq\theta{\circ}(\gamma_{1},\dots,\gamma_{m})\in\Gamma_{\!n}$ satisfying

\operatorname{ev}(\eta;\bar{v})=\operatorname{ev}\bigl(\theta;\operatorname{ev}(\gamma_{1};\bar{v}),\operatorname{ev}(\gamma_{2};\bar{v}),\dots,\operatorname{ev}(\gamma_{m};\bar{v})\bigr)\qquad\text{for all $\bar{v}\in L^{n}$,}

i.e., the above identity holds for a suitable “generalized composition” operation $\circ$, or rather for one such operation ${\circ}_{m}^{n}:\Gamma_{\!m}\times(\Gamma_{\!n})^{m}\to\Gamma_{\!n}$ for each $m,n\in\mathbb{N}$; moreover, the sort of computations $\underline{\Gamma}=(\Gamma,\circ_{m}^{n}:m,n\in\mathbb{N})$ is endowed with all such compositions.

In order to simplify the exposition, CCSs with $n$ -ary computations as in the preceding remark will be used only in informal discussions and examples.

3.4.1. Reduction and Continuity Axioms

Every CCS $\mathcal{C}=\langle\underline{L},\underline{\Gamma}\rangle$ will be assumed to satisfy the following axioms:

•
Reduction Axioms for Compositional Computation Structures.
(1)

States $v,w\in L$ having equal types $\operatorname{tp}(v)=\operatorname{tp}(w)$ are equal (i.e., the underlying CSS $\underline{L}$ is reduced);
(2)

Transformations $\gamma,\delta\in\Gamma$ inducing equal maps $\gamma(\cdot)=\delta(\cdot):L\to L$ are equal.

As a temporary (weaker) placeholder for the Extendibility Axiom (see §4.3.1) eventually imposed on CCSs, we presently impose the natural:

•

Continuity Axiom: The action of $\underline{\Gamma}$ on $L$ is by maps continuous in the topology of $L$ (i.e., is a topological action on the CSS $\underline{L}$ ).

Explicitly, for each computation $\gamma\in\Gamma$ and $P\in\mathcal{P}$ , the real-valued “ $P$ -feature” $P{\circ}\gamma:v\mapsto P(\gamma(v))$ of $\gamma(\cdot)$ is continuous on $L$ .

Reduction Axiom (2) says that $\Gamma$ is bijectively identified with its image $\Gamma(\cdot)\subseteq L^{L}$ of maps (“state transitions”) $\gamma(\cdot)$ ($\gamma\in\Gamma$). This identification induces a natural topology on $\Gamma$, obtained (as a pullback) from the topology of pointwise convergence on the maps $\gamma(\cdot)\in\Gamma(\cdot)\subseteq L^{L}$ associated to computations $\gamma\in\Gamma$; the Reduction Axioms imply that this topology is Hausdorff.

As long as the Continuity Axiom holds, the Reduction Axioms are innocuous requirements on a CCS $\mathcal{C}$, because one can always pass from $\mathcal{C}$ to a reduced CCS $\tilde{\mathcal{C}}$ (i.e., one satisfying the Reduction Axioms) as follows. First, replace $\Gamma$ by its quotient $\tilde{\Gamma}\coloneqq\Gamma/(\operatorname{tp}{\circ}\operatorname{ev})$, obtained by identifying computations $\gamma,\delta\in\Gamma$ such that $\operatorname{tp}(\operatorname{ev}(\gamma,v))=\operatorname{tp}(\operatorname{ev}(\delta,v))$ for all $v\in L$; second, pass from the underlying CSS $\underline{L}$ to its quotient-by-type $\underline{\tilde{L}}$ if necessary. The evaluation $\operatorname{ev}$ of $\mathcal{C}$ induces a well-defined natural action $\widetilde{\operatorname{ev}}:\tilde{\Gamma}\times\tilde{L}\to\tilde{L}$. By the Continuity Axiom, the passage from $\mathcal{C}$ to $\tilde{\mathcal{C}}=\langle\underline{\tilde{L}},\underline{\tilde{\Gamma}},\widetilde{\operatorname{ev}}\rangle$ preserves all structural properties of states and computations, as well as the Continuity Axiom itself. [Footnote 15: The passage to $\tilde{\mathcal{C}}$ also preserves the Extendibility Axiom (§4.3.1).]
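The two-step reduction can be sketched for a finite toy CCS. We assume, for illustration only, integer states $-3,\dots,3$ observed through the single predicate $v\mapsto\left|v\right|$, and computations given as Python functions; all helper names are hypothetical. In this finite setting the induced action on classes is well defined exactly when each computation sends states of equal type to states of equal type, which the sketch also checks.

```python
from collections import defaultdict

STATES = list(range(-3, 4))
PREDS = [abs]  # the only observable feature of a state


def tp(v):
    """Type of a state: the tuple of its predicate values."""
    return tuple(P(v) for P in PREDS)


def quotient_states(states):
    """Quotient-by-type: group states sharing the same type."""
    classes = defaultdict(list)
    for v in states:
        classes[tp(v)].append(v)
    return dict(classes)


def quotient_computations(comps, states):
    """Identify gamma, delta when tp(gamma(v)) == tp(delta(v)) for all v."""
    classes = defaultdict(list)
    for name, gamma in comps.items():
        classes[tuple(tp(gamma(v)) for v in states)].append(name)
    return list(classes.values())


def respects_types(gamma, states):
    """True if tp(v) == tp(w) implies tp(gamma(v)) == tp(gamma(w))."""
    image_type = {}
    for v in states:
        if image_type.setdefault(tp(v), tp(gamma(v))) != tp(gamma(v)):
            return False
    return True


COMPS = {"id": lambda v: v, "neg": lambda v: -v, "abs": abs, "zero": lambda v: 0}
```

With these choices, `quotient_computations(COMPS, STATES)` merges `id`, `neg` and `abs` into one class (they are indistinguishable through $\left|\cdot\right|$), while `zero` stays separate.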

Remark 3.5.

The Continuity Axiom ensures that computations act continuously on $L$ . In general, however, the action $\gamma(\cdot)$ of a computation $\gamma\in\Gamma$ on the state space $L$ need not admit a continuous extension to a transformation $\mathfrak{L}\to\mathfrak{L}$ . This distinction is quite important; it speaks to the weakness of the Continuity Axiom, and suggests a strengthening called the Extendibility Axiom, which is a key assumption in our main results.

3.5. Examples of CSSs and CCSs

3.5.1. The unit interval

Consider a CSS with state space $L=[0,1]$ (the unit interval) endowed with the single identity predicate $P_{\operatorname{id}}:[0,1]\to\mathbb{R}:v\mapsto v$ . This CSS $\langle[0,1],P_{\operatorname{id}}\rangle$ is realcompact.

Let $\underline{\Gamma}=(\Gamma,\circ)$ be any semigroup (under composition) of continuous functions $\gamma:[0,1]\to[0,1]$ , acting on $[0,1]$ by functional application: $\operatorname{ev}(\gamma,v)\coloneqq\gamma(v)$ ; this yields a realcompact CCS $\langle([0,1],P_{\operatorname{id}}),\underline{\Gamma}\rangle$ . An interesting such CCS has semigroup $\Gamma=\{\gamma^{n}:n\in\mathbb{N}\}$ consisting of iterates of the chaotic map $\gamma:v\mapsto 4v(1-v)$ .
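The iterates of $\gamma:v\mapsto 4v(1-v)$ can be checked numerically against a classical identity (conjugacy to angle doubling): since $4\sin^{2}x\cos^{2}x=\sin^{2}(2x)$, one has $\gamma^{n}(\sin^{2}(\pi\theta))=\sin^{2}(2^{n}\pi\theta)$. The sketch below (function names are ours) compares floating-point iterates with this closed form; the factor $2^{n}$ is one way to see the chaotic behaviour.

```python
import math


def gamma(v: float) -> float:
    """The logistic map v -> 4 v (1 - v), a continuous self-map of [0, 1]."""
    return 4.0 * v * (1.0 - v)


def iterate(v: float, n: int) -> float:
    """Apply the computation gamma^n to the state v."""
    for _ in range(n):
        v = gamma(v)
    return v


def closed_form(theta: float, n: int) -> float:
    """gamma^n(sin^2(pi * theta)) = sin^2(2^n * pi * theta)."""
    return math.sin(2 ** n * math.pi * theta) ** 2


theta = 0.1
v0 = math.sin(math.pi * theta) ** 2
```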

Replacing $[0,1]$ with the open interval $(0,1)$ , one obtains a non-realcompact CSS $\langle(0,1),P_{\operatorname{id}}\rangle$ having (realcompact) type space $\overline{(0,1)}=[0,1]\subseteq\mathbb{R}^{1}=\mathbb{R}$ . (By contrast—cf., Remark 3.3—the realcompactification $\widehat{(0,1)}\supseteq(0,1)$ is a much larger topological extension not homeomorphic to a subset of $\mathbb{R}$ .)

3.5.2. $\mathbb{R}^{d}$

Given $d\in\mathbb{N}$ , we obtain a CSS $\underline{\mathbb{R}^{d}}=\langle\mathbb{R}^{d},(P_{i})_{i=1}^{d}\rangle$ on the $d$ -dimensional real space $L=\mathbb{R}^{d}$ endowed with coordinate functions $P_{i}(v)\coloneqq v_{i}$ ( $1\leq i\leq d$ ) as distinguished predicates. The corresponding type space is $\operatorname{tp}(\mathbb{R}^{d})=\mathbb{R}^{d}$ ; the type topology coincides with the usual one, so $\underline{\mathbb{R}^{d}}$ is realcompact.

There is ample flexibility in expanding the collection of predicates, yielding formally distinct CSSs $\underline{L}$ with layer states sort $L=\mathbb{R}^{d}$. For any real $q\geq 1$, one may (for instance) expand the predicate collection $\mathcal{P}$ with the “$q$-norm” predicate $\left\|\cdot\right\|_{q}$ defined by

\left\|v\right\|_{q}\coloneqq\sqrt[q]{\left|v_{1}\right|^{q}+\dots+\left|v_{d}\right|^{q}}.

One may also expand the predicate collection with, say, the supremum norm

\left\|v\right\|_{\sup}=\left\|v\right\|_{\infty}\coloneqq\sup_{1\leq i\leq d}\left|v_{i}\right|\quad\bigl(=\max(\left|v_{1}\right|,\dots,\left|v_{d}\right|)\bigr).

Since $d$ is finite, the predicates $\left\|\cdot\right\|_{q}$ ($1\leq q\leq\infty$) above are continuous with respect to the topology of $\mathbb{R}^{d}$. In fact, any continuous function $\varphi:\mathbb{R}^{d}\to\mathbb{R}$ may be added to the predicate collection of $\underline{\mathbb{R}^{d}}$, yielding an essentially equivalent CSS, because any such $\varphi$ is a definable predicate in the sense of §6.1 below; therefore, such expanded CSSs are still realcompact with layer states sort $\mathbb{R}^{d}$. [Footnote 16: On the other hand, the addition of new predicates $\varphi:\mathbb{R}^{d}\to\mathbb{R}$ that are discontinuous with respect to the usual topology of $\mathbb{R}^{d}$ expands $\underline{\mathbb{R}^{d}}$ to a CSS that is no longer realcompact.]
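As a quick numerical sanity check of these predicates (a sketch; the function names are ours), the following computes $\left\|v\right\|_{q}$ and $\left\|v\right\|_{\infty}$ on $\mathbb{R}^{d}$. It can be used to observe the standard bounds $\left\|v\right\|_{\infty}\leq\left\|v\right\|_{q}\leq d^{1/q}\left\|v\right\|_{\infty}$, which imply $\left\|v\right\|_{q}\to\left\|v\right\|_{\infty}$ as $q\to\infty$.

```python
from typing import Sequence


def q_norm(v: Sequence[float], q: float) -> float:
    """The q-norm (|v_1|^q + ... + |v_d|^q)^(1/q), for real q >= 1."""
    if q < 1:
        raise ValueError("q must be >= 1")
    return sum(abs(x) ** q for x in v) ** (1.0 / q)


def sup_norm(v: Sequence[float]) -> float:
    """The supremum norm max(|v_1|, ..., |v_d|)."""
    return max(abs(x) for x in v)


v = [3.0, -4.0]  # d = 2
```

For this $v$, `q_norm(v, 1)` is $7$, `q_norm(v, 2)` is $5$, and `q_norm(v, 200)` is already within floating-point noise of `sup_norm(v)`, which is $4$.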

Expanding $\underline{\mathbb{R}^{d}}$ with:

•

$\underline{\Gamma}=(\Gamma,\circ)$ any semigroup of continuous functions on $\mathbb{R}^{d}$; and
•

the evaluation action of $\Gamma$ on $L$ by functional application $\operatorname{ev}(\gamma,v)\coloneqq\gamma(v)$ as in 3.5.1 above,

one obtains a CCS $\langle\mathbb{R}^{d},(P_{i})_{i=1}^{d},\underline{\Gamma}\rangle$. A natural such expansion is by the semigroup $\Gamma$ of linear operators on $\mathbb{R}^{d}$.
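For the linear example, one may take a computation to be a $d\times d$ matrix, the semigroup operation to be the matrix product, and $\operatorname{ev}$ to be matrix-vector multiplication; the action law $\operatorname{ev}(\gamma\delta,v)=\operatorname{ev}(\gamma,\operatorname{ev}(\delta,v))$ is then associativity of matrix multiplication. The plain-Python sketch below (helper names are ours; integer entries keep the arithmetic exact) checks this and reads off a $P_{i}$-feature $P_{i}\circ\gamma$, which is a linear functional.

```python
from typing import List

Matrix = List[List[int]]
Vector = List[int]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Semigroup operation: the matrix product a*b (apply b first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def ev(a: Matrix, v: Vector) -> Vector:
    """Evaluation action: the matrix-vector product a v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]


def feature(a: Matrix, i: int, v: Vector) -> int:
    """The P_i-feature (P_i o gamma)(v): the i-th coordinate of gamma(v)."""
    return ev(a, v)[i]


A = [[1, 2], [0, 1]]
B = [[0, -1], [1, 0]]
v = [3, 5]
```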

3.5.3. $\mathbb{R}^{\omega}$ and $c_{00}$

Let the CSS $\underline{\mathbb{R}^{\omega}}$ have state space $L=\mathbb{R}^{\omega}\coloneqq\prod_{i\in\mathbb{N}}\mathbb{R}$ consisting of all real sequences $v=(v_{i})_{i\in\mathbb{N}}\subseteq\mathbb{R}$, endowed with the collection $\mathcal{P}$ of predicates $P_{i}:v\mapsto v_{i}$ ($i\in\mathbb{N}$). This CSS $\underline{\mathbb{R}^{\omega}}$ is realcompact.

The subspace $c_{00}$ consisting of sequences $v$ having at most finitely many entries $v_{i}\neq 0$ is a non-realcompact sub-CSS of $\mathbb{R}^{\omega}$ .

A natural expansion of $\underline{\mathbb{R}^{\omega}}$ to a CCS is by the semigroup $\underline{\Gamma}$ of (continuous) linear operators thereon. Each such linear computation $\gamma\in\Gamma$ is effectively a collection $(\gamma_{i})_{i\in\mathbb{N}}$ of real functionals $\gamma_{i}\coloneqq P_{i}{\circ}\gamma:\mathbb{R}^{\omega}\to\mathbb{R}$, each of the form $\gamma_{i}:v\mapsto\sum_{j\in\mathbb{N}}r^{(i)}_{j}v_{j}$ for some scalar collection $r_{\bullet}^{(i)}=(r^{(i)}_{j})_{j\in\mathbb{N}}\in c_{00}$. Thus, every entry $w_{i}=P_{i}(\gamma(v))$ of $w=\gamma(v)$ is exactly given as an effectively finite linear combination of entries of $v$, i.e., of finitely many real-valued features $P_{j}(v)$ of the input $v$. (Reciprocally, linear functionals on $c_{00}$ are in correspondence with elements of $\mathbb{R}^{\omega}$.)
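The finiteness of each row can be made concrete. In the sketch below (our own representation choices; `apply`, `compose` and `diff` are hypothetical names), a sequence is represented lazily as a function $i\mapsto v_{i}$ and a linear computation by its finitely supported rows $r^{(i)}_{\bullet}\in c_{00}$, so each output entry reads only finitely many input entries. Composing two such operators again yields finitely supported rows.

```python
from typing import Callable, Dict

Seq = Callable[[int], float]               # a real sequence v, given lazily as i -> v_i
Rows = Callable[[int], Dict[int, float]]   # row i of gamma: finitely supported {j: r_j^(i)}


def apply(gamma: Rows, v: Seq) -> Seq:
    """w = gamma(v); each entry w_i reads only finitely many entries of v."""
    return lambda i: sum(r * v(j) for j, r in gamma(i).items())


def compose(gamma: Rows, delta: Rows) -> Rows:
    """Rows of gamma*delta (apply delta first): row_i = sum_j r_j^(i) * (row j of delta)."""
    def rows(i: int) -> Dict[int, float]:
        out: Dict[int, float] = {}
        for j, r in gamma(i).items():
            for k, s in delta(j).items():
                out[k] = out.get(k, 0.0) + r * s
        return out
    return rows


diff: Rows = lambda i: {i + 1: 1.0, i: -1.0}   # forward difference w_i = v_{i+1} - v_i
sq: Seq = lambda i: i * i                      # the sequence v_i = i^2
```

For example, the forward difference of $i^{2}$ is $2i+1$, and applying it twice gives the constant sequence $2$.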

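Many natural real functions on $c_{00}$ (or on suitable subspaces thereof) are discontinuous in the topology of entry-wise convergence; expanding $c_{00}$ with any such function as a distinguished predicate yields a CSS whose type topology is strictly finer than that of entry-wise convergence (so the identity map of $c_{00}$ is not a homeomorphism between the two CSSs).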

Remark 3.6.

Note that in the CSSs 3.5.1–3.5.3 above (but not in 3.5.4 below), a state $v$ is exactly the same as its type $\operatorname{tp}(v)$ .

3.5.4. $\ell_{q}$

For any extended real $q\in[1,\infty]$ , consider the layer states space

L=\ell_{q}=\{v\in\mathbb{R}^{\omega}:\left\|v\right\|_{q}<\infty\}.

For $q<\infty$, the space $\ell_{q}$ is the $\left\|\cdot\right\|_{q}$-metric completion of the subspace $c_{00}\subseteq\mathbb{R}^{\omega}$; at any rate, $\ell_{\infty}$ is $\left\|\cdot\right\|_{\infty}$-complete as well. [Footnote 18: The $\left\|\cdot\right\|_{\infty}$-metric completion of $c_{00}$ is the separable space $c_{0}=\{v\in\mathbb{R}^{\omega}:\lim_{i\to\infty}v_{i}=0\}\subsetneq\ell_{\infty}$; the space $\ell_{\infty}$ is not separable.] A natural collection of predicates is $\mathcal{P}=(N_{q},P_{i})_{i\in\mathbb{N}}$, where $P_{i}:v\mapsto v_{i}$ is the $i$-th coordinate as in 3.5.3 above, and $N_{q}$ names the norm $N_{q}(\cdot):v\mapsto\left\|v\right\|_{q}$. Since the predicate collection $\mathcal{P}$ is countable, it is easy to show that the type topology and the usual $\left\|\cdot\right\|_{q}$-norm topology on the layer space $\ell_{q}$ coincide; [Footnote 19: Cf. the proof of Proposition 4.1 below.] however, $\ell_{q}$ is non-realcompact. It is easy to see that its type space is $\mathfrak{L}_{q}=\{(v,r)\in\ell_{q}\times\mathbb{R}:r\geq\left\|v\right\|_{q}\}$. (The set of realized types is $\operatorname{tp}(\ell_{q})=\{(v,r)\in\ell_{q}\times\mathbb{R}:r=\left\|v\right\|_{q}\}\subseteq\mathfrak{L}_{q}$, i.e., the set of types for which the “correct” norm $\left\|v\right\|_{q}$ agrees with the interpretation value $N_{q}(v,r)=r$ of the symbol $N_{q}$.)
Fixing $q<\infty$, the function $\left\|\cdot\right\|_{\infty}:\ell_{q}\to\mathbb{R}:v\mapsto\sup_{n}\left|P_{n}(v)\right|$ is $1$-Lipschitz, hence continuous on $\ell_{q}$; however, the corresponding function $\operatorname{tp}(\ell_{q})\to\mathbb{R}:\operatorname{tp}(v)\mapsto\left\|v\right\|_{\infty}$ does not extend continuously to $\mathfrak{L}_{q}$. [Footnote 20: By the Stone-Weierstrass Theorem, every continuous function on the compact Hausdorff space $\mathfrak{L}_{q}[1]\coloneqq\{(v,r)\in\mathfrak{L}_{q}:r\leq 1\}$ is uniformly approximable by algebraic combinations (of finitely many at a time) of the predicates $P_{i}$ and $N_{q}$; however, an elementary argument shows that $\operatorname{tp}(v)\mapsto\left\|v\right\|_{\infty}$ admits no such uniform approximations on $\mathfrak{L}_{q}[1]$.]
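The failure of continuous extension can be seen on two concrete sequences in $\ell_{2}$. This is a sketch with our own names, storing finitely supported vectors as Python lists. The unit vectors $e_{n}$ and the averages $u_{n}\coloneqq n^{-1/2}(e_{1}+\dots+e_{n})$ both have norm $1$ and coordinates tending to $0$, so $\operatorname{tp}(e_{n})$ and $\operatorname{tp}(u_{n})$ converge to the same unrealized type $(0,1)\in\mathfrak{L}_{2}[1]$; yet $\left\|e_{n}\right\|_{\infty}=1$ while $\left\|u_{n}\right\|_{\infty}=n^{-1/2}\to 0$.

```python
import math
from typing import List, Tuple


def e(n: int) -> List[float]:
    """Unit vector e_n (1-indexed), stored as its first n coordinates."""
    return [0.0] * (n - 1) + [1.0]


def u(n: int) -> List[float]:
    """u_n = n^(-1/2) (e_1 + ... + e_n)."""
    return [n ** -0.5] * n


def norm2(v: List[float]) -> float:
    """The l_2 norm, i.e. the value of the predicate N_2."""
    return math.sqrt(sum(x * x for x in v))


def sup_norm(v: List[float]) -> float:
    """The sup norm, which is continuous on l_2 but not extendable to types."""
    return max(abs(x) for x in v)


def truncated_type(v: List[float], k: int) -> Tuple[Tuple[float, ...], float]:
    """A finite window on tp(v): P_1(v), ..., P_k(v) (zero-padded) and N_2(v)."""
    coords = tuple(v[:k]) + (0.0,) * max(0, k - len(v))
    return coords, norm2(v)
```

For large $n$, every finite window of $\operatorname{tp}(e_{n})$ and of $\operatorname{tp}(u_{n})$ is close to $(0,\dots,0;1)$, while `sup_norm` takes values near $1$ and near $0$ respectively.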

A natural expansion of $\underline{\ell_{q}}$ to a CCS is by its semigroup $\underline{\Gamma}$ of bounded (i.e., $\left\|\cdot\right\|_{q}$-continuous) linear operators. Such operators are continuous on $\underline{\ell_{q}}$; however, many of them are discontinuous when regarded as functions on the reduct CSS of $\underline{\ell_{q}}$ in which the additional predicate $N_{q}$ is removed, i.e., when $\ell_{q}$ is topologized as a sub-CSS of $\mathbb{R}^{\omega}$ rather than of $\mathbb{R}^{\omega}\times\mathbb{R}$ as above.

Remark 3.7.

The metric $\operatorname{d}_{q}:(v,w)\mapsto\left\|w-v\right\|_{q}$ on $\ell_{q}$ (or on $\mathbb{R}^{d}$ with the $q$-norm, for $d$ finite) is not definable in terms of the norm predicate $\left\|\cdot\right\|_{q}$ unless $\ell_{q}$ is expanded to a CCS with, say, the binary computation of subtraction $(v,w)\mapsto v-w$. This remark, although not meant to detract from the preceding discussion, does serve to highlight the usefulness of CCSs with $n$-ary layer transformations (cf. Remarks 3.4).

4. Deep Computations

Throughout this section, $\mathcal{C}=\langle\underline{L},\underline{\Gamma}\rangle$ will be a fixed CCS.

4.1. Shards in state- and type-spaces

4.1.1. Sizers and shards in type spaces

A sizer is any collection $r_{\bullet}=(r_{P})_{P\in\mathcal{P}}\in[0,\infty)^{\mathcal{P}}$ of nonnegative reals. The number $r_{P}$ is called an a priori bound for $P$ .

For a sizer $r_{\bullet}$ , we introduce the topological product space

\mathbb{R}{[r_{\bullet}]}\coloneqq\prod_{P\in\mathcal{P}}[-r_{P},r_{P}];

it is a compact subspace of the product space $\mathbb{R}^{\mathcal{P}}$ ; moreover, $\mathbb{R}^{\mathcal{P}}=\bigcup_{r_{\bullet}}\mathbb{R}{[r_{\bullet}]}$ (with $r_{\bullet}$ varying over sizers). A subset $S\subseteq\mathbb{R}^{\mathcal{P}}$ is called entry-wise bounded if $S\subseteq\mathbb{R}{[r_{\bullet}]}$ for some sizer $r_{\bullet}$ . Clearly, relatively compact subsets of $\mathbb{R}^{\mathcal{P}}$ are precisely entry-wise bounded subsets.

4.1.2. Shards

For a sizer $r_{\bullet}$ , the $r_{\bullet}$ -shard of $L$ is

L[r_{\bullet}]\coloneqq\{v\in L:\operatorname{tp}(v)\in\mathbb{R}{[r_{\bullet}]}\}=\{v\in L:\text{$\left|P(v)\right|\leq r_{P}$ for all $P\in\mathcal{P}$}\}.

Clearly, an arbitrary intersection of shards is a shard, and any finite union of shards is included in some shard.
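For a finite predicate collection, shard membership and the two facts just stated reduce to coordinatewise comparisons. In the sketch below (our names; a state is represented only through its predicate values), `meet` gives the sizer of the intersection of two shards, namely $L[r_{\bullet}]\cap L[s_{\bullet}]=L[\min(r_{\bullet},s_{\bullet})]$, and `join` gives a sizer whose shard contains their union.

```python
from typing import Dict

Sizer = Dict[str, float]


def in_shard(values: Dict[str, float], r: Sizer) -> bool:
    """v lies in L[r] iff |P(v)| <= r_P for every predicate P."""
    return all(abs(values[P]) <= r[P] for P in r)


def meet(r: Sizer, s: Sizer) -> Sizer:
    """Sizer of the intersection of the shards L[r] and L[s]."""
    return {P: min(r[P], s[P]) for P in r}


def join(r: Sizer, s: Sizer) -> Sizer:
    """A sizer whose shard contains the union of L[r] and L[s]."""
    return {P: max(r[P], s[P]) for P in r}


v = {"x": 1.0, "y": -2.0}
w = {"x": 1.5, "y": 0.0}
r = {"x": 1.0, "y": 3.0}
s = {"x": 2.0, "y": 2.0}
```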

Let the type $r_{\bullet}$-shard $\mathfrak{L}[r_{\bullet}]\subseteq\mathbb{R}{[r_{\bullet}]}$ be the topological closure of the set $\operatorname{tp}(L[r_{\bullet}])\coloneqq\{\operatorname{tp}(v):v\in L[r_{\bullet}]\}$ of types realized in (i.e., by elements of) $L[r_{\bullet}]$. Evidently, $\mathfrak{L}[r_{\bullet}]\subseteq\mathfrak{L}\cap\mathbb{R}{[r_{\bullet}]}$; however, the inclusion is typically strict, because a type $\mathfrak{u}\in\mathfrak{L}\cap\mathbb{R}{[r_{\bullet}]}$ need not be an accumulation point of types realized in $L[r_{\bullet}]$ itself, and thus need not belong to $\mathfrak{L}[r_{\bullet}]$. [Footnote 21: In general, a type $\mathfrak{u}\in\mathfrak{L}\cap\mathbb{R}{[r_{\bullet}]}$ need not even be an accumulation point of realized types in any fixed shard $L[s_{\bullet}]$!] We introduce the space $\mathfrak{L}_{\mathrm{Sh}}\coloneqq\bigcup_{r_{\bullet}}\mathfrak{L}[r_{\bullet}]$ of shard-supported types; it consists of the types $\mathfrak{v}$ that are limits of types realized in some shard $L[r_{\bullet}]$. Clearly $\mathfrak{L}_{\mathrm{Sh}}\subseteq\mathfrak{L}$, but by the preceding discussion the inclusion is proper in general ($\mathfrak{L}_{\mathrm{Sh}}$ need not be closed in $\mathfrak{L}$). The space $\mathfrak{L}_{\mathrm{Sh}}$ will be of central importance in what follows.

A collection $R$ of sizers $r_{\bullet}$ is exhaustive if, for any sizer $s_{\bullet}$ there exists $r_{\bullet}\in R$ such that $r_{P}\geq s_{P}$ for all $P\in\mathcal{P}$ .

From its definition, it is clear that $\mathfrak{L}_{\mathrm{Sh}}$ is the union of type-shards $\mathfrak{L}[r_{\bullet}]$ as $r_{\bullet}$ varies over any exhaustive $R$ .

Recall that a Hausdorff space $X$ is

•

a k-space if closed subsets $Y\subseteq X$ are precisely those whose intersection $Y\cap K$ with an arbitrary compact subset $K\subseteq X$ is closed [Eng89, 3.3.18ff];
•

a k_R-space if an arbitrary real function $\varphi:X\to\mathbb{R}$ is continuous as soon as its restrictions $\varphi{\restriction}K$ to compacta $K\subseteq X$ are continuous.

Evidently, any k-space is a k_R-space.

Proposition 4.1.

Let $\underline{L}=\langle L,\mathcal{P}\rangle$ be a CSS whose distinguished predicate collection $\mathcal{P}=(P_{i})_{i\in\mathbb{N}}$ is countable.

(1)

$\mathfrak{L}_{\mathrm{Sh}}=\mathfrak{L}$ (i.e., all types are shard-supported).
(2)

$\mathfrak{L}_{\mathrm{Sh}}$ is a k-space. More precisely, closed subspaces $S\subseteq\mathfrak{L}_{\mathrm{Sh}}$ are (precisely) those whose intersection $S\cap\mathfrak{L}[r_{\bullet}]$ with an arbitrary type-shard $\mathfrak{L}[r_{\bullet}]$ is closed.

A fortiori, the result holds when the predicate collection is finite.

We thank F. Tall for bringing to our attention the fact that realcompact spaces embeddable in $\mathbb{R}^{\omega}$ are k-spaces.

Proof.

For $i<\omega$, introduce the pseudometric $\operatorname{d}_{i}(\mathfrak{u},\mathfrak{v})\coloneqq\left|P_{i}(\mathfrak{u})-P_{i}(\mathfrak{v})\right|$ on $\mathfrak{L}$, and let $\delta_{i}\coloneqq\operatorname{d}_{i}/(1+\operatorname{d}_{i})$ be the usual $[0,1]$-valued pseudometric corresponding to $\operatorname{d}_{i}$. The space $\mathfrak{L}$ (a closed subspace of $\mathbb{R}^{\mathcal{P}}\cong\mathbb{R}^{\omega}$) is completely metrizable by $\operatorname{d}(\mathfrak{u},\mathfrak{v})\coloneqq\sum_{i\in\mathbb{N}}2^{-i}\delta_{i}(\mathfrak{u},\mathfrak{v})$, and the subset $\operatorname{tp}(L)\subseteq\mathfrak{L}$ of realized types is dense.

(1)

Given $\mathfrak{u}\in\mathfrak{L}$, by density there is a sequence $(v_{n})_{n\in\mathbb{N}}\subseteq L$ such that $\lim_{n\to\infty}\operatorname{tp}(v_{n})=\mathfrak{u}$ in the $\operatorname{d}$-metric sense. The set $A\coloneqq\{\operatorname{tp}(v_{n}):n\in\mathbb{N}\}\cup\{\mathfrak{u}\}\subseteq\mathfrak{L}$ is compact (any open cover $\mathcal{O}$ of $A$ contains an open $U\ni\mathfrak{u}$; since $\operatorname{tp}(v_{n})\to\mathfrak{u}$, all but finitely many elements of $A$ are contained in $U$, so $\mathcal{O}$ has a finite subcover). By compactness of $A$, the image $P(A)\coloneqq\{P(\mathfrak{v}):\mathfrak{v}\in A\}$ is compact in $\mathbb{R}$ for each $P\in\mathcal{P}$, hence bounded, say $\left|P(\mathfrak{v})\right|\leq r_{P}$ for some $r_{P}\geq 0$ and all $\mathfrak{v}\in A$. Then every $v_{n}$ lies in $L[r_{\bullet}]$ and $\mathfrak{u}$ is the limit of their types, so $\mathfrak{L}[r_{\bullet}]\supseteq A\ni\mathfrak{u}$. Hence $\mathfrak{L}\subseteq\mathfrak{L}_{\mathrm{Sh}}$ ($\subseteq\mathfrak{L}$, in any case), i.e., $\mathfrak{L}_{\mathrm{Sh}}=\mathfrak{L}$.
(2)

If $S\subseteq\mathfrak{L}$ ($=\mathfrak{L}_{\mathrm{Sh}}$) is not closed, let $\mathfrak{u}\in\overline{S}\setminus S$. Since $\mathfrak{L}$ is metrizable, there is a sequence $(\mathfrak{v}_{n})_{n\in\mathbb{N}}\subseteq S$ with $\mathfrak{u}=\lim_{n\to\infty}\mathfrak{v}_{n}$. Since $\operatorname{tp}(L)$ is dense in $\mathfrak{L}\supseteq S$, for each $n,k\in\mathbb{N}$ there exists some $v^{(n)}_{k}\in L$ such that $\operatorname{d}(\operatorname{tp}(v^{(n)}_{k}),\mathfrak{v}_{n})<1/(n+k)$. One sees that $A=\{\operatorname{tp}(v^{(n)}_{k}):n,k\in\mathbb{N}\}\cup\{\mathfrak{v}_{n}:n\in\mathbb{N}\}\cup\{\mathfrak{u}\}$ is compact. [Footnote 22: Clearly, $A^{(n)}\coloneqq\{\mathfrak{v}_{n}\}\cup\{\operatorname{tp}(v^{(n)}_{k}):k\in\mathbb{N}\}$ is compact for each $n\in\mathbb{N}$ (as in (1) above). Given any open cover $\mathcal{O}$ of $A$, some $U\in\mathcal{O}$ satisfies $U\ni\mathfrak{u}$. Since every point of $A^{(n)}$ lies within $\operatorname{d}$-distance $1/n+\operatorname{d}(\mathfrak{v}_{n},\mathfrak{u})$ of $\mathfrak{u}$, there is a set $N\subseteq\mathbb{N}$ containing all but finitely many $n$ such that $U\supseteq A^{(n)}$ for all $n\in N$. The set $B\coloneqq\bigcup_{n\in\mathbb{N}\setminus N}A^{(n)}$ is a finite union of compacta, hence itself compact. Thus, finitely many opens of $\mathcal{O}$ cover $B$, and these together with $U$ cover $A$.] Therefore, arguing as in (1), $A\subseteq\mathfrak{L}[r_{\bullet}]$ for some sizer $r_{\bullet}$, and thus $\{v^{(n)}_{k}:n,k\in\mathbb{N}\}\subseteq L[r_{\bullet}]$; hence $\mathfrak{u}=\lim_{n\to\infty}\operatorname{tp}(v^{(n)}_{n})\in\overline{\operatorname{tp}(L[r_{\bullet}])}=\mathfrak{L}[r_{\bullet}]$. Since each $\mathfrak{v}_{n}$ belongs to $S\cap\mathfrak{L}[r_{\bullet}]$, we get $\mathfrak{u}\in\overline{S\cap\mathfrak{L}[r_{\bullet}]}\setminus S$, so $S\cap\mathfrak{L}[r_{\bullet}]$ is not closed. The converse is trivial, since type-shards $\mathfrak{L}[r_{\bullet}]$ are closed (and the intersection of closed sets is closed). [Footnote 23: We recall the following fact, closely related to (2): every sequential Hausdorff (therefore, every metric) space is a k-space [Eng89, Theorem 3.3.20].] ∎
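The metric $\operatorname{d}=\sum_{i}2^{-i}\delta_{i}$ used in the proof above can be approximated by truncating the series. Assuming the index starts at $i=0$ (an assumption of this sketch), and since $\delta_{i}\leq 1$, dropping the terms $i\geq N$ changes the value by at most $2^{1-N}$. In the sketch (our names; a point of $\mathbb{R}^{\omega}$ is given as a function $i\mapsto P_{i}$), the points $x_{n}:i\mapsto i/n$ tend to $0$ in $\operatorname{d}$ even though each $x_{n}$ has unbounded coordinates, illustrating that $\operatorname{d}$-convergence is coordinatewise convergence.

```python
from typing import Callable

Coords = Callable[[int], float]  # i -> P_i(u), the coordinates of a type


def delta(a: float, b: float) -> float:
    """The bounded pseudometric d_i / (1 + d_i) for one coordinate."""
    d = abs(a - b)
    return d / (1.0 + d)


def metric(u: Coords, v: Coords, n_terms: int = 60) -> float:
    """Truncation of d(u, v) = sum_{i>=0} 2^-i delta_i(u, v); error <= 2^(1 - n_terms)."""
    return sum(2.0 ** -i * delta(u(i), v(i)) for i in range(n_terms))


zero: Coords = lambda i: 0.0


def x(n: int) -> Coords:
    """x_n with coordinates i/n: unbounded in i, yet d(x_n, 0) -> 0."""
    return lambda i: i / n
```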

Remarks 4.2.

(1)

For $\mathcal{P}$ at most countable, Proposition 4.1 implies that closed subsets $S\subseteq\mathfrak{L}_{\mathrm{Sh}}$ are those whose intersections $S\cap\mathfrak{L}[r_{\bullet}]$ are closed for all sizers $r_{\bullet}$ in any exhaustive collection $R$ .
(2)

A compact subset $K\subseteq\mathfrak{L}_{\mathrm{Sh}}$ need not be included as a subset of any type shard $\mathfrak{L}[r_{\bullet}]$ (even if $\mathcal{P}$ is countable).

4.2. Transitions-in-type. Extendibility.

Any—not necessarily continuous—function $f:L\to L$ (resp., $f:L\to\mathfrak{L}_{\mathrm{Sh}}$ ) will be called a (layer) transition (resp., an ultra-transition (u-t)). We also introduce the notion of transition-in-type (t-t) to mean any function $\mathfrak{f}:\mathfrak{L}_{\mathrm{Sh}}\to\mathfrak{L}_{\mathrm{Sh}}$ .

A transition $f$ (resp., a u-t $f$; a t-t $\mathfrak{f}$) is shard-to-shard (Sh2Sh) if, for every sizer $r_{\bullet}$, there is a sizer $s_{\bullet}$ such that $f$ restricts to a map $L[r_{\bullet}]\to L[s_{\bullet}]$ (resp., $f$ restricts to $L[r_{\bullet}]\to\mathfrak{L}[s_{\bullet}]$; $\mathfrak{f}$ restricts to $\mathfrak{L}[r_{\bullet}]\to\mathfrak{L}[s_{\bullet}]$).

The transition space (resp., ultra-transition space) of $\underline{L}$ is $T\coloneqq L^{L}$ (resp., $\mathcal{T}\coloneqq\mathfrak{L}_{\mathrm{Sh}}^{L}$); note that $T\subseteq\mathcal{T}$ in a natural fashion (upon identifying $L$ with the subset $\operatorname{tp}(L)\subseteq\mathfrak{L}_{\mathrm{Sh}}$). These spaces generally include (ultra)transitions that are not shard-to-shard.

We regard $\mathcal{T}$ as the topological product $\prod_{v\in L}\mathfrak{L}_{\mathrm{Sh}}$ ; equivalently, via the inclusion $\mathfrak{L}_{\mathrm{Sh}}\subseteq\mathbb{R}^{\mathcal{P}}$ , the space $\mathcal{T}$ is topologized as a subspace of the product $\mathbb{R}^{\mathcal{P}\times L}$ : This is the topology of pointwise convergence of the real functions $v\mapsto P(f(v))$ for fixed $P\in\mathcal{P}$ . The space $T\subseteq\mathcal{T}$ inherits the subspace topology (of pointwise convergence).

4.2.1. Extendable layer transitions

The Continuity Axiom ensures that every computation $\gamma(\cdot):L\to L$ is continuous; however, it need not extend to a continuous map $\mathfrak{L}_{\mathrm{Sh}}\to\mathfrak{L}_{\mathrm{Sh}}$, which renders such realized computations rather poor foundational blocks for our subsequent treatment of deep computations. To remedy this deficiency, we will axiomatically require that realized computations be extendable, as suggested in Remark 3.5. Such an extendibility requirement is rather strong; moreover, its consequences are strongest in regard to the restrictions of t-ts to compacta, rather than the t-ts themselves. In this light, it is natural to require only that the restriction of each computation to a shard $L[r_{\bullet}]$ extend continuously to the compact type-shard $\mathfrak{L}[r_{\bullet}]$, which motivates the following definition.

A transition-in-type $\mathfrak{f}:\mathfrak{L}_{\mathrm{Sh}}\to\mathfrak{L}_{\mathrm{Sh}}$ is called shard-continuous (or Sh-continuous) if its restriction to each type-shard $\mathfrak{L}[r_{\bullet}]$ is a continuous map $\mathfrak{L}[r_{\bullet}]\to\mathfrak{L}[s_{\bullet}]$ into some type-shard $\mathfrak{L}[s_{\bullet}]$ (in particular, a Sh-continuous t-t is shard-to-shard). A transition $f$ (more generally, a u-t $f$) is called Sh-extendable if, for every sizer $r_{\bullet}$, its restriction $f{\restriction}L[r_{\bullet}]$ extends to a continuous function $\mathfrak{L}[r_{\bullet}]\to\mathfrak{L}[s_{\bullet}]$ into some type-shard. (It suffices to impose this condition for sizers $r_{\bullet}$ in a given exhaustive collection $R$.)

Remark 4.3.

A continuous shard-to-shard t-t is Sh-continuous, but the converse fails in general.

4.2.2. Spaces of transitions-in-type

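Both $T$ and the space $\mathfrak{T}\coloneqq\mathfrak{L}_{\mathrm{Sh}}^{\mathfrak{L}_{\mathrm{Sh}}}$ of transitions-in-type (topologized, like $\mathcal{T}$, by pointwise convergence) are semigroups under the binary operation $(f,g)\mapsto f\circ g$ of composition; this operation is continuous in the left argument $f$, but not, in general, in the right argument $g$ (it is continuous in $g$ whenever $f$ is continuous).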

The subsets $T_{\mathrm{Sh}}\subseteq T$ (resp., $\mathfrak{T}_{\mathrm{Sh}}\subseteq\mathfrak{T}$) of transitions (resp., t-ts) that are shard-to-shard are sub-semigroups; however, $T_{\mathrm{Sh}},\mathfrak{T}_{\mathrm{Sh}}$ are typically not closed subspaces.

Recall that a set $R$ of sizers is exhaustive if, for any sizer $s_{\bullet}$, there exists $r_{\bullet}\in R$ such that $r_{P}\geq s_{P}$ for all $P\in\mathcal{P}$ (cf. Remarks 4.2). In particular, $\mathfrak{L}_{\mathrm{Sh}}=\bigcup_{r_{\bullet}\in R}\mathfrak{L}[r_{\bullet}]$ in such case. [Footnote 24: Provided $\mathcal{P}$ is countable, by Proposition 4.1, we have $\mathfrak{L}=\bigcup_{r_{\bullet}\in R}\mathfrak{L}[r_{\bullet}]$ for $R$ exhaustive.]

Given a sizer $r_{\bullet}$ , we say that $\mathfrak{f}\in\mathfrak{T}$ is $r_{\bullet}$ -preserving if it restricts to a map $\mathfrak{L}[r_{\bullet}]\to\mathfrak{L}[r_{\bullet}]$ . The set of $r_{\bullet}$ -preserving transitions $\mathfrak{f}\in\mathfrak{T}$ is denoted $\mathfrak{T}[r_{\bullet}]$ . A collection $F\subseteq\mathfrak{T}[r_{\bullet}]$ is called $r_{\bullet}$ -preserving.

Given an exhaustive collection $R$ of sizers, we say that $R$ confines $\mathfrak{f}\in\mathfrak{T}$ , or $\mathfrak{f}$ is $R$ -confined, if $\mathfrak{f}$ is $r_{\bullet}$ -preserving for each $r_{\bullet}\in R$ . The collection of all $R$ -confined t-ts is denoted $\mathfrak{T}[R]$ ; it is a closed sub-semigroup of $\mathfrak{T}$ . One sees that $\mathfrak{T}[R]\subseteq\mathfrak{T}_{\mathrm{Sh}}$ , i.e., $R$ -confined t-ts are necessarily shard-to-shard. Moreover, $\mathfrak{T}[R]$ is compact as shown in Proposition 4.5 below.
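To make $r_{\bullet}$-preservation concrete, take (as an illustration of our own) $L=\mathbb{R}^{2}$ with its two coordinate predicates, a realcompact CSS whose type-shards are the boxes $[-r_{1},r_{1}]\times[-r_{2},r_{2}]$, and the exhaustive collection $R$ of constant sizers $(n,n)$, $n\in\mathbb{N}$. The sketch below (hypothetical names) tests preservation of a square on a finite grid, which can refute but never prove $r_{\bullet}$-preservation: the map $(x,y)\mapsto(y/2,-x)$ preserves every square $[-n,n]^{2}$, hence is $R$-confined, whereas a rotation by $45^{\circ}$ sends the corner $(n,n)$ outside the square.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def box_grid(r: float, steps: int = 20) -> List[Point]:
    """Sample points of the square [-r, r]^2, including its corners."""
    ticks = [-r + 2 * r * k / steps for k in range(steps + 1)]
    return [(a, b) for a in ticks for b in ticks]


def preserves_box(f: Callable[[Point], Point], r: float, tol: float = 1e-12) -> bool:
    """Grid test of f([-r, r]^2) subset of [-r, r]^2: False is conclusive, True is evidence."""
    return all(abs(p) <= r + tol and abs(q) <= r + tol for p, q in map(f, box_grid(r)))


contract = lambda pt: (pt[1] / 2, -pt[0])
c = 1 / math.sqrt(2)
rotate45 = lambda pt: (c * (pt[0] - pt[1]), c * (pt[0] + pt[1]))
```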

The notions above have formally identical analogues for transitions, i.e., with $T,L,L[r_{\bullet}]$ in place of $\mathfrak{T},\mathfrak{L}_{\mathrm{Sh}},\mathfrak{L}[r_{\bullet}]$ .

A family $F\subseteq\mathfrak{T}$ is:

•

confined by an exhaustive sizer collection $R$ (or: $R$ -confined) if $F\subseteq\mathfrak{T}[R]$ ;
•

pointwise bounded at $\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}$ if there is a sizer $s_{\bullet}$ (a pointwise bound for $F$ at $\mathfrak{v}$ ) such that $\mathfrak{f}(\mathfrak{v})\in\mathfrak{L}[s_{\bullet}]$ for all $\mathfrak{f}\in F$ ;
•

pointwise bounded on $S\subseteq\mathfrak{L}_{\mathrm{Sh}}$ , if it is pointwise bounded at every $\mathfrak{v}\in S$ ;
•

pointwise bounded, if it is pointwise bounded on $\mathfrak{L}_{\mathrm{Sh}}$ .

Remarks 4.4.

(1)

Given $R$ exhaustive, we have $\mathfrak{T}[R]=\mathfrak{T}[s_{\bullet}^{(\cdot)}]$, where $s_{\bullet}^{(\cdot)}=(s_{\bullet}^{(\mathfrak{v})})_{\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}}$ is the sizer collection defined by

(4.1)

s^{(\mathfrak{v})}_{P}\coloneqq\inf\{r_{P}:\text{$r_{\bullet}\in R$ and $\mathfrak{v}\in\mathfrak{L}[r_{\bullet}]$}\}\qquad\text{for each $P\in\mathcal{P}$}.

In particular, every confined family $F\subseteq\mathfrak{T}$ is pointwise bounded.

(2)

$F\subseteq\mathfrak{T}$ is pointwise bounded on $S\subseteq\mathfrak{L}_{\mathrm{Sh}}$ iff there is a collection $s_{\bullet}^{(\cdot)}\coloneqq(s_{\bullet}^{(\mathfrak{v})})_{\mathfrak{v}\in S}$ of pointwise bounds at each point $\mathfrak{v}\in S$ . The corresponding set of t-ts is denoted $\mathfrak{T}[s_{\bullet}^{(\cdot)}]$ ; thus

(4.2)

\mathfrak{T}[s_{\bullet}^{(\cdot)}]\coloneqq\{\mathfrak{f}\in\mathfrak{T}:\text{$\mathfrak{f}(\mathfrak{v})\in\mathfrak{L}[s_{\bullet}^{(\mathfrak{v})}]$ for all $\mathfrak{v}\in S$}\}\quad\left(=\prod_{\mathfrak{v}\in S}\mathfrak{L}[s_{\bullet}^{(\mathfrak{v})}]\times\prod_{\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}\setminus S}\mathfrak{L}_{\mathrm{Sh}}\right).

The notions of $r_{\bullet}$-preserving, $R$-confined, and pointwise bounded (shard-to-shard) layer transitions (and collections of such transitions), and the definitions of $T[R]$ and $T[s_{\bullet}^{(\cdot)}]$, are obtained from those for transitions-in-type mutatis mutandis (simply replacing $\mathfrak{L}_{\mathrm{Sh}}$ by $L$, and $\mathfrak{T}$ by $T$).

In particular, one obtains the pointwise-bounded transition spaces $T[s_{\bullet}^{(\cdot)}],T[r_{\bullet}],T[R]\subseteq T$.

Proposition 4.5.

For any collection $s_{\bullet}^{(\cdot)}=(s_{\bullet}^{(\mathfrak{v})})_{\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}}$ of sizers at all points $\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}$, the space $\mathfrak{T}[s_{\bullet}^{(\cdot)}]$ of $s_{\bullet}^{(\cdot)}$-pointwise bounded transitions-in-type is compact. In particular, $\mathfrak{T}[R]$ is compact for any exhaustive sizer collection $R$.

Proof.

The collection $s_{\bullet}^{(\cdot)}$ specifies pointwise bounds at all points $\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}$, so (4.2) holds with $S=\mathfrak{L}_{\mathrm{Sh}}$ and the product space there consists of compact factors $\mathfrak{L}[s_{\bullet}^{(\mathfrak{v})}]$ only. Hence $\mathfrak{T}[s_{\bullet}^{(\cdot)}]$ is compact, by Tychonoff's Theorem.

If $R$ is exhaustive, we have $\mathfrak{T}[R]=\mathfrak{T}[s_{\bullet}^{(\cdot)}]$ for $s_{\bullet}^{(\mathfrak{v})}$ given by (4.1) for all $\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}$, so $\mathfrak{T}[R]$ is compact. ∎

4.3. Computations and ultracomputations (deep computations)

4.3.1. The Extendibility Axiom

The transition associated to a layer transformation $\gamma\in\Gamma$ is the map $\operatorname{ev}(\gamma,\cdot):v\mapsto\operatorname{ev}(\gamma,v)$, also denoted $\gamma(\cdot)$.

If such a transition $\gamma(\cdot)$ is Sh-extendable, we call it the computation by $\gamma$ (or the computation realized by $\gamma$, for emphasis). By an abuse of notation, we will still denote the extension $\mathfrak{L}_{\mathrm{Sh}}\to\mathfrak{L}_{\mathrm{Sh}}$ by $\gamma(\cdot)\in\mathfrak{T}$.

For the remainder of this paper, we assume that CCSs satisfy the following axiom:

•

Extendibility Axiom. Each layer transformation $\gamma\in\Gamma$ induces a Sh-extendable computation $\gamma(\cdot)$ .

The Extendibility Axiom gives a natural (injective) map $\Gamma\to\mathfrak{T}:\gamma\mapsto\gamma(\cdot)$ . The semigroup $\Gamma$ is topologized via (the pullback of) this map, i.e., by the topology of pointwise convergence; it is the subspace topology obtained upon identifying $\Gamma$ with the set $\Gamma(\cdot)\coloneqq\{\gamma(\cdot):\gamma\in\Gamma\}\subseteq\mathfrak{T}$ , called the space of realized computations.

It follows from the Reduction Axiom that the above topology on $\Gamma$ is Hausdorff.

4.3.2. Realized vs. deep computations

The space $\mathfrak{D}$ of ultracomputations is the topological closure $\overline{\Gamma(\cdot)}\subseteq\mathfrak{T}$ . A transition $\mathfrak{f}\in\mathfrak{D}$ will be called a deep computation, ultracomputation, or ucomp for short. Although any computation is a deep computation in its own right, the adjective “deep” implies that $\mathfrak{f}$ may be an unrealized computation, i.e., not of the form $\gamma(\cdot)$ . Deep computations are typically (Sh-)discontinuous layer transitions. Even if Sh-continuous, an ultracomputation may be unrealized.

Every deep computation is of the form $\mathfrak{f}_{\mathcal{U}}\coloneqq\operatorname{\mathcal{U}lim}_{i}\gamma_{i}:v\mapsto\mathfrak{f}_{\mathcal{U}}(v)\coloneqq\operatorname{\mathcal{U}lim}_{i}\gamma_{i}(v)$ for some indexed family $\gamma_{\bullet}\coloneqq(\gamma_{i})_{i\in I}\subseteq\Gamma$ and some ultrafilter $\mathcal{U}$ on $I$. (Without loss of generality, one may always take $\mathcal{U}$ to be an ultrafilter on $\Gamma$ itself.)²⁵

²⁵ For arbitrary $\mathcal{U}$ on (say) $\Gamma$, the ultracomputation $\mathfrak{f}_{\mathcal{U}}$ need not be defined: $(\gamma(v))$ might not $\mathcal{U}$-converge for certain $v\in L$.

For any sizer collection $s_{\bullet}^{(\cdot)}$, let $\mathfrak{D}[s_{\bullet}^{(\cdot)}]\coloneqq\mathfrak{T}[s_{\bullet}^{(\cdot)}]\cap\mathfrak{D}$ be the set of ultracomputations with pointwise bounds $s_{\bullet}^{(\cdot)}$. Since $\mathfrak{D}\subseteq\mathfrak{T}$ is closed by definition, and $\mathfrak{T}[s_{\bullet}^{(\cdot)}]$ is closed (being a product of closed factors, by (4.2)), the space $\mathfrak{D}[s_{\bullet}^{(\cdot)}]$ is also closed in $\mathfrak{T}$. For any fixed sizer $r_{\bullet}$ and exhaustive $R$, we see likewise that $\mathfrak{D}[r_{\bullet}]\coloneqq\mathfrak{T}[r_{\bullet}]\cap\mathfrak{D}$ and $\mathfrak{D}[R]\coloneqq\mathfrak{T}[R]\cap\mathfrak{D}$ (the sets of $r_{\bullet}$-preserving and $R$-confined ultracomputations, respectively) are also closed.

By an abuse of nomenclature, we say that an element $\gamma\in\Gamma$ admits pointwise bounds $s_{\bullet}^{(\cdot)}$ (resp., is $r_{\bullet}$-preserving, is $R$-confined) if its transition $\gamma(\cdot)$ does (resp., is). We denote by $\Gamma[s_{\bullet}^{(\cdot)}]$, $\Gamma[r_{\bullet}]$, and $\Gamma[R]$, respectively, the sets of transformations $\gamma\in\Gamma$ with associated transitions in $\mathfrak{T}[s_{\bullet}^{(\cdot)}]$, $\mathfrak{T}[r_{\bullet}]$, and $\mathfrak{T}[R]$. The respective uniform notions as $\gamma$ varies in some subset $\Delta\subseteq\Gamma$ become: $\Delta$ admits uniform pointwise bounds $s_{\bullet}^{(\cdot)}$, is $r_{\bullet}$-preserving, and is $R$-confined, respectively.

An ultracomputation $\mathfrak{f}$ with $\mathfrak{f}(L)\subseteq L$ (i.e., whose restriction $\mathfrak{f}{\restriction}L$ is a map $L\to L$) is called quasi-realized; upon identifying such $\mathfrak{f}$ with its restriction to $L$, these constitute the set $\mathcal{D}=\mathfrak{D}\cap T$: the space of quasi-realized ultracomputations. Let $\mathcal{D}[s_{\bullet}^{(\cdot)}]\coloneqq\mathfrak{D}[s_{\bullet}^{(\cdot)}]\cap T$, $\mathcal{D}[r_{\bullet}]\coloneqq\mathfrak{D}[r_{\bullet}]\cap T$, and $\mathcal{D}[R]\coloneqq\mathfrak{D}[R]\cap T$.

Proposition 4.6.

For any sizer $r_{\bullet}$ and exhaustive collection $R$ :

(1)

each of the sets $\Gamma[r_{\bullet}]$ , $\Gamma[R]$ is a sub-semigroup of $\Gamma$ , and is a closed subset of $\Gamma$ ;
(2)

$\mathfrak{D}[r_{\bullet}]$ , $\mathfrak{D}$ are closed sub-semigroups of $\mathfrak{T}$ ;
(3)

$\mathfrak{D}[R]$ is a compact sub-semigroup of $\mathfrak{T}$ ;
(4)

$\mathcal{D}[r_{\bullet}]$ , $\mathcal{D}[R]$ , $\mathcal{D}$ are closed sub-semigroups of $T$ .

The ultracomputation space $\mathfrak{D}$ is akin to the concept of “enveloping semigroup” (of $\Gamma(\cdot)\subseteq\mathfrak{T}$). However, only the confined sub-semigroups $\mathfrak{D}[R]$ are compact (the full space $\mathfrak{D}$ is typically noncompact).

Proof.

The set $\beta\Gamma$ of ultrafilters on $\Gamma$ is itself a semigroup under a natural (“convolution”) operation $(\mathcal{U},\mathcal{V})\mapsto\mathcal{U}{*}\mathcal{V}$ [HS10]. When $\Gamma$ is identified with the transition semigroup $\Gamma(\cdot)$, convolution possesses (and is essentially characterized by) the following property: if two transitions are of the form $\mathfrak{f}_{\mathcal{U}}:\mathfrak{v}\mapsto\lim_{\gamma,\mathcal{U}}\gamma(\mathfrak{v})$ and $\mathfrak{f}_{\mathcal{V}}:\mathfrak{v}\mapsto\lim_{\gamma,\mathcal{V}}\gamma(\mathfrak{v})$, then $\mathfrak{f}_{\mathcal{U}}\circ\mathfrak{f}_{\mathcal{V}}=\mathfrak{f}_{\mathcal{U}{*}\mathcal{V}}$. It follows that $\mathfrak{D}\subseteq\mathfrak{T}$ is a sub-semigroup. As the intersection of a compact subset (by Proposition 4.5) with a closed subset of $\mathfrak{T}$, the set $\mathfrak{D}[R]=\mathfrak{T}[R]\cap\mathfrak{D}$ is compact. The remaining topological statements are all trivial and left to the reader. ∎

Proposition 4.7.

The ultracomputation $\mathfrak{f}_{\mathcal{U}}\coloneqq\operatorname{\mathcal{U}lim}_{\gamma}\gamma(\cdot)\in\mathfrak{T}[R]$ exists for any exhaustive $R$ and any ultrafilter $\mathcal{U}$ on $\Gamma[R]$.

Proof.

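This is an immediate corollary of Proposition 4.6: the compact set $\mathfrak{D}[R]$ contains $\gamma(\cdot)$ for every $\gamma\in\Gamma[R]$, so every ultrafilter $\mathcal{U}$ on $\Gamma[R]$ converges in $\mathfrak{D}[R]\subseteq\mathfrak{T}[R]$. ∎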

5. Deep Iterations and Deep Equilibria

Throughout this section, fix a CCS $\mathcal{C}=\langle\underline{L},\underline{\Gamma}\rangle$ . For convenience, we assume some element $\operatorname{id}\in\Gamma$ (“identity”) satisfies the equality $\operatorname{id}(v)=v$ for all $v\in L$ .

We recall the Extendibility Axiom: each layer transformation $\gamma\in\Gamma$ extends to a Sh-continuous transition $\gamma(\cdot)\in\mathfrak{T}_{\mathrm{Sh}}$.

5.1. Layered and iterative computations

Let $\gamma_{\bullet}=(\gamma_{n})_{n\in\mathbb{N}}\subseteq\Gamma$ be any sequence of computations (i.e., any element of the product space $\Gamma^{\omega}\coloneqq\prod_{n\in\mathbb{N}}\Gamma$ ). We regard $\gamma_{\bullet}$ as a sequence of “computation steps” to be successively applied (see the definition of Layered Computation below). The computation $\gamma_{n}$ will be called the $n$ -th atomic step, or the transition at layer $n$ (to layer $n+1$ ).

Layered Computations (LCs)

Given a sequence $\gamma_{\bullet}\in\Gamma^{\omega}$ of computation steps, the sequence $\gamma_{\bullet}^{(\circ)}=(\gamma_{\bullet}^{(n)})_{n\in\mathbb{N}}\in\Gamma^{\omega}$ defined recursively by

\begin{aligned}\gamma_{\bullet}^{(0)}&\coloneqq\operatorname{id},\\ \gamma_{\bullet}^{(n+1)}&\coloneqq\gamma_{n}\gamma_{\bullet}^{(n)}\qquad\text{for all $n\in\mathbb{N}$,}\end{aligned}

(i.e., $\gamma_{\bullet}^{(n)}\coloneqq\gamma_{n-1}\gamma_{n-2}\dots\gamma_{1}\gamma_{0}$) is called the layered computation with atomic steps $\gamma_{\bullet}$ (or LC $\gamma_{\bullet}$, for short).²⁶ The term $\gamma_{\bullet}^{(n)}$ is called the $n$-th composite computation step of LC $\gamma_{\bullet}$. A layered computation may also be called recursive, for obvious reasons. The set of layered computations LC $\gamma_{\bullet}$ obtained as $\gamma_{\bullet}\in\Gamma^{\omega}$ varies is denoted $\Gamma^{(\circ)}$.

²⁶ Thus, “LC $\gamma_{\bullet}$” denotes $\gamma_{\bullet}^{(\circ)}$, simply adding context to indicate the layer transitions $\gamma_{\bullet}$ yielding $\gamma_{\bullet}^{(\circ)}$.

For a sizer $r_{\bullet}$, let $\Gamma^{(\circ)}_{[r_{\bullet}]}\coloneqq\Gamma^{(\circ)}\cap(\Gamma[r_{\bullet}])^{\omega}$ be the set of $r_{\bullet}$-preserving LCs. (Note that it is the composites $\gamma_{\bullet}^{(n)}$, but not necessarily the atomic steps $\gamma_{n}$, that are required to preserve the shard $L[r_{\bullet}]$.) For an exhaustive sizer family $R$, let $\Gamma^{(\circ)}_{[R]}\coloneqq\Gamma^{(\circ)}\cap(\Gamma[R])^{\omega}$ be the set of $R$-confined LCs (or LCs confined by $R$).

The LC $\gamma_{\bullet}$ -evolution of a state $v\in L$ is the sequence

\gamma_{\bullet}^{(\circ)}(v)\coloneqq(\gamma_{\bullet}^{(n)}(v))_{n\in\mathbb{N}}=(v,\gamma_{0}(v),\gamma_{1}\gamma_{0}(v),\gamma_{2}\gamma_{1}\gamma_{0}(v),\dots).

The term “evolution” means “$\gamma_{\bullet}$-evolution” henceforth, whenever $\gamma_{\bullet}$ is given by context. The state of $v$ at stage $n$ of the evolution is $\operatorname{ev}(\gamma_{\bullet}^{(n)},v)$.
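The recursion defining the composites $\gamma_{\bullet}^{(n)}$ and the evolution of a state can be sketched in Python as follows. This is a minimal illustration only, assuming states are arbitrary Python values and layer transformations are plain functions; the helper names `composites` and `evolution` are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # type of layer states v (hypothetical; any Python value works)


def composites(steps: Sequence[Callable[[S], S]]) -> List[Callable[[S], S]]:
    """[gamma^(0), ..., gamma^(n)] with gamma^(0) = id and gamma^(k+1) = gamma_k o gamma^(k)."""
    result: List[Callable[[S], S]] = [lambda v: v]  # gamma^(0) = id
    for step in steps:
        prev = result[-1]
        # Default arguments bind the current step and composite (avoids late binding in closures).
        result.append(lambda v, f=step, g=prev: f(g(v)))
    return result


def evolution(steps: Sequence[Callable[[S], S]], v: S) -> List[S]:
    """(v, gamma_0(v), gamma_1 gamma_0(v), ...): the state at each stage, one step at a time."""
    states = [v]
    for step in steps:
        states.append(step(states[-1]))  # stage k+1 = gamma_k(stage k)
    return states


# Usage: three atomic steps gamma_0, gamma_1, gamma_2 acting on integers.
steps = [lambda v: v + 1, lambda v: 2 * v, lambda v: v - 3]
print(evolution(steps, 5))                    # [5, 6, 12, 9]
print([c(5) for c in composites(steps)])      # [5, 6, 12, 9]
```

An IC $\gamma$ (defined below) is the special case in which the list of steps repeats a single function, e.g. `[gamma] * n`.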

Iterative computations (ICs)

Any fixed $\gamma\in\Gamma$ yields a constant sequence $\gamma_{\bullet}=(\gamma)_{n\in\mathbb{N}}$. The corresponding LC has composite steps given by the sequence $(\gamma^{n})_{n\in\mathbb{N}}$ of compositional powers (iterates) of $\gamma$; we will call such an LC an iterative computation (or just iteration) by $\gamma$, and denote it by IC $\gamma$. It is appropriate to think of iterative computations as evolving by “tying parameters” in the sense that all atomic steps are always the same $\gamma$ (i.e., the “tied parameter” is $\gamma$ itself). Note that IC $\gamma$ is $R$-confined (or $r_{\bullet}$-preserving) if and only if the fixed atomic step $\gamma$ is so.

5.2. Deep layers, deep iterates, and equilibria

5.2.1. Deep layers

A deep layer of LC $\gamma_{\bullet}$ is any deep computation that is an accumulation point of the sequence of composites $(\gamma_{\bullet}^{(n)}(\cdot))_{n\in\mathbb{N}}\subseteq\mathfrak{T}$. Any such deep layer is of the form $\gamma_{\bullet}^{(\mathcal{U})}:v\mapsto\operatorname{\mathcal{U}lim}_{n}\gamma_{\bullet}^{(n)}(v)$, obtained as a (pointwise) $\mathcal{U}$-ultralimit via a nonprincipal ultrafilter $\mathcal{U}$ on $\mathbb{N}$. (We use the notation $\gamma_{\bullet}^{(\mathcal{U})}$ for such an ultracomputation (in-type) when the dependence on $\gamma_{\bullet}$ and $\mathcal{U}$ is to be made explicit.) For a confined LC $\gamma_{\bullet}$, deep layers exist for arbitrary $\mathcal{U}$, by Proposition 4.7. If LC $\gamma_{\bullet}$ is not confined, the composites $\gamma_{\bullet}^{(n)}$ may diverge.

5.2.2. Deep iterates

A deep iterate of $\gamma\in\Gamma$ is a deep layer for IC $\gamma$ .

The deep layer that is obtained via an ultrafilter $\mathcal{U}$ on $\mathbb{N}$ is denoted $\gamma^{(\mathcal{U})}$ ; it need not exist in general, but does if $\gamma$ is confined (by Proposition 4.7). Every deep iterate is a deep computation.

Remark 5.1.

In the nomenclature of [BKK19], a deep layer of LC $\gamma_{\bullet}$ is an “implicit layer”. They consider primarily compositions of layer transitions (i.e., LCs in our sense) with “tied parameter” $\gamma$ (the same layer transition at each stage), i.e., ICs in our sense. From our perspective, implicit layers are given each by some nonprincipal ultrafilter $\mathcal{U}$ on $\mathbb{N}$ , i.e., are of the form $\gamma_{\bullet}^{(\mathcal{U})}$ .

5.2.3. Deep equilibria

A deep equilibrium (layer) of IC $\gamma$ is an idempotent deep iterate $\mathfrak{i}=\gamma^{(\mathcal{U})}\in\mathfrak{D}$ ($\subseteq\mathfrak{T}$), i.e., a deep iterate $\mathfrak{i}$ such that $\mathfrak{i}(\mathfrak{i}(\mathfrak{v}))=\mathfrak{i}(\mathfrak{v})$ for all $\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}$ (hence the nomenclature “equilibrium”). It will also be called a (deep) iterative equilibrium of $\gamma$.

Remark 5.2.

Although any iterative equilibrium $\mathfrak{i}$ of IC $\gamma$ satisfies $\mathfrak{i}\circ\mathfrak{i}=\mathfrak{i}$, one generally has $\gamma\circ\mathfrak{i}\neq\mathfrak{i}\neq\mathfrak{i}\circ\gamma$. The “equilibrium” property is self-referential, rather than a direct relation to the original computation $\gamma$. Let us call a deep iterate $\gamma^{*}$ of IC $\gamma$ “$\gamma$-fixed” if $\gamma\circ\gamma^{*}=\gamma^{*}=\gamma^{*}\circ\gamma$. Such $\gamma$-fixed deep iterates need not exist even under the strong hypothesis (ensuring that deep iterates exist at all) that IC $\gamma$ is confined. On the other hand, if a deep iterate $\gamma^{*}=\gamma^{(\mathcal{U})}$ of IC $\gamma$ happens to satisfy $\gamma\circ\gamma^{*}=\gamma^{*}$, then certainly $\gamma^{*}$ is a deep equilibrium in our sense: indeed, $\gamma^{n}\circ\gamma^{*}=\gamma^{*}$ for all $n$, and passing to the $\mathcal{U}$-limit in the left argument (in which composition is continuous) yields $\gamma^{*}\circ\gamma^{*}=\gamma^{*}$.

Theorem 5.3 (Existence of Deep Iterative Equilibria).

Let $\gamma\in\Gamma$ be confined. Then there exists at least one iterative equilibrium $\mathfrak{i}$ for IC $\gamma$ .

Theorem 5.3 is essentially a particular case of the classical Ellis-Numakura Lemma; the proof below is standard (as in [Fur81]).

One cannot generally hope that deep iterative equilibria exist without some boundedness assumption (such as confinement). Moreover, $\mathfrak{i}{\restriction}L$ need not take values in $L$ , so it is not even composable with itself a priori! This highlights the need to consider transitions in type rather than as maps $L\to L$ on the layer state space $L$ .

Proof.

Let $R$ confine IC $\gamma$. Let $G\subseteq\mathfrak{T}[R]$ be the topological closure of the semigroup $\{\gamma^{n}(\cdot):n\geq 1\}\subseteq T$ of transitions by iterates of $\gamma$ (excluding the trivial iterate $\gamma^{0}=\operatorname{id}$). By Proposition 4.6 (applied to the CCS obtained from $\mathcal{C}$ by taking as computations semigroup the semigroup $\langle\gamma\rangle$ generated by $\gamma$), $G$ is a compact Hausdorff semigroup under composition $(\mathfrak{f},\mathfrak{g})\mapsto\mathfrak{f}\circ\mathfrak{g}$, which is continuous in the left argument $\mathfrak{f}$ (for fixed $\mathfrak{g}$). Elementary algebraic and topological considerations (in particular, the compactness of $G$) and Zorn's Lemma imply that $G$ has some minimal closed (nonempty) sub-semigroup $H$ (i.e., $H$ has no proper closed sub-semigroups). Fix $\mathfrak{i}\in H$. The set $H\circ\mathfrak{i}$ is compact, hence closed (being the image of the compact set $H$ under the continuous map $\mathfrak{g}\mapsto\mathfrak{g}\circ\mathfrak{i}$); moreover, $(H\circ\mathfrak{i})\circ(H\circ\mathfrak{i})\subseteq H\circ\mathfrak{i}$ (because $\mathfrak{i}\in H$ and $H$ is a semigroup), so $H\circ\mathfrak{i}\subseteq H$ is a closed sub-semigroup, hence $H\circ\mathfrak{i}=H$ by minimality of $H$. Therefore, $\mathfrak{f}\circ\mathfrak{i}=\mathfrak{i}$ for some $\mathfrak{f}\in H$. Let $H^{\prime}\coloneqq\{\mathfrak{g}\in H:\mathfrak{g}\circ\mathfrak{i}=\mathfrak{i}\}\ni\mathfrak{f}$. Then $H^{\prime}$ is clearly a nonempty sub-semigroup of $H$, and also closed (as the inverse image of the closed singleton $\{\mathfrak{i}\}$ under the continuous map $\mathfrak{g}\mapsto\mathfrak{g}\circ\mathfrak{i}$, again). By minimality, $H^{\prime}=H\ni\mathfrak{i}$, so $\mathfrak{i}\circ\mathfrak{i}=\mathfrak{i}$. ∎

5.3. Examples and discussion of deep iterates and deep equilibria

Example 5.4.

Let $L$ be a finite set of, say, $m\geq 1$ distinct elements. The choice of predicates is inessential in this context: we may simply take $L=[m]\coloneqq\{1,\dots,m\}$, which is finite and therefore realcompact. Let $f:[m]\to[m]$ be any function, and let $\underline{\Gamma}=\langle f\rangle\coloneqq\{f^{n}:n\in\mathbb{N}\}$ (a semigroup under composition) act on $[m]$ by functional application $\operatorname{ev}:(g,i)\mapsto g(i)$. Let $P_{\operatorname{id}}:L\to\mathbb{R}:i\mapsto i$ (the identity function) be the sole predicate on $[m]$. In this way, we obtain a (realcompact) CCS $\mathcal{C}=(([m],P_{\operatorname{id}}),\underline{\Gamma})$. Since $[m]$ is finite, there is $n\geq 1$ such that $S\coloneqq f^{n}([m])=f^{n+1}([m])=f(S)$ (thus, $S\neq\emptyset$); in particular, $f$ restricts to a bijection $S\to S$. By relabeling points of $L=[m]$ if necessary, we may assume $S=[k]$ ($1\leq k\leq m$). Thus, $g\coloneqq f{\restriction}[k]$ is a permutation of $[k]$. Let $K$ be the order of $g$ (thus, $1\leq K\leq k!$). Let $N$ be any integer such that $N\geq n$ and $K$ divides $N$. Since $f^{N+jK}=f^{N}$ for all $j\in\mathbb{N}$, the map $f^{*}\coloneqq f^{N}$ is an accumulation point of the iterates of $f$, i.e., a deep iterate; moreover, $f^{*}$ is a deep iterative equilibrium of $f$: indeed, for $i\in[m]$,

\begin{split}f^{*}(f^{*}(i))&=f^{N}(f^{N-n}(f^{n}(i)))=g^{N}(g^{N-n}(f^{n}(i)))\\ &\qquad\text{(since $f^{n}([m])\subseteq[k]$ and $g=f\restriction[k]$)}\\ &=g^{N-n}(g^{N}(f^{n}(i)))=g^{N-n}(f^{n}(i))\\ &\qquad\text{(since $K$ divides $N$ and $g^{K}$ is the identity)}\\ &=f^{N-n}(f^{n}(i))=f^{N}(i)=f^{*}(i).\end{split}

It is easy to show that $f^{*}$ is the unique iterative equilibrium of $f$ in such case.
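The construction in Example 5.4 is effective and can be checked mechanically. The following Python sketch (hypothetical helper names; states relabeled $0,\dots,m-1$ instead of $1,\dots,m$) finds the stabilization index $n$, the order $K$ of $g$, and $f^{*}=f^{N}$ with $N$ the least multiple of $K$ that is at least $n$.

```python
from typing import List, Tuple


def compose(f: List[int], g: List[int]) -> List[int]:
    """(f o g)(i) = f(g(i)); a map on {0, ..., m-1} is stored as the list of its values."""
    return [f[g[i]] for i in range(len(g))]


def deep_equilibrium(f: List[int]) -> Tuple[int, int, List[int]]:
    """Return (n, K, f^N) as in Example 5.4, with N the least multiple of K such that N >= n."""
    # Step 1: the images f^n([m]) decrease; stop at the first n >= 1 where they stabilize.
    power, n = list(f), 1                      # power = f^n
    while set(compose(f, power)) != set(power):
        power, n = compose(f, power), n + 1
    S = sorted(set(power))                     # S = f^n([m]); f permutes S
    # Step 2: K = order of the permutation g = f restricted to S.
    g_pow, K = {s: f[s] for s in S}, 1         # g_pow = g^K
    while any(g_pow[s] != s for s in S):
        g_pow, K = {s: f[g_pow[s]] for s in S}, K + 1
    # Step 3: f* = f^N with N = K * ceil(n / K).
    N = K * -(-n // K)
    f_star = list(range(len(f)))               # f^0 = identity
    for _ in range(N):
        f_star = compose(f, f_star)
    return n, K, f_star


n, K, f_star = deep_equilibrium([1, 2, 0, 0, 3])
print(n, K, f_star)                            # 2 3 [0, 1, 2, 2, 1]
print(compose(f_star, f_star) == f_star)       # True
```

Here $f$ is a 3-cycle on $\{0,1,2\}$ with the tail $4\to 3\to 0$; the tail is absorbed after $n=2$ steps, and $f^{3}$ is the idempotent retraction onto the cycle.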

Example 5.5.

Consider CCSs of the form $\mathcal{C}=\langle([0,1],\{P_{\operatorname{id}}\}),\underline{\Gamma}\rangle$ as in 3.5.1, where $\gamma:[0,1]\to[0,1]$ is a continuous map, $\Gamma=\langle\gamma\rangle$ the semigroup of iterates of $\gamma$ under composition, acting by functional application on $[0,1]$ . Already in this one-dimensional compact setting, there is a variety of possible behaviors of deep iterates and equilibria of IC $\gamma$ .

If $\Gamma(\cdot)$ is an equicontinuous family of functions on $[0,1]$, the Arzelà-Ascoli Theorem implies that there exists a (sub)sequence $(\gamma^{n_{k}})_{k\in\mathbb{N}}$ of iterates converging uniformly to a continuous limit $\bar{\gamma}:[0,1]\to[0,1]$, which is therefore a continuous deep iterate of $\gamma$. In general, however, even if some deep iterates $\bar{\gamma}$ are continuous, some deep equilibria may be discontinuous. Typically (and necessarily so when $\gamma$ is a chaotic function, e.g., the logistic map $\gamma(v)=4v(1-v)$), the semigroup $\Gamma(\cdot)$ is not an equicontinuous collection of functions, and deep equilibria (as well as deep iterates) are necessarily discontinuous. Moreover, in contrast with the equicontinuous case, where continuous deep iterates are achieved as sequential limits, deep iterates $\mathfrak{f}$ of a chaotic IC $\gamma$ cannot be obtained as sequential limits $\lim_{k}\gamma^{n_{k}}$, but generally only as ultralimits.
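The failure of equicontinuity for the logistic map can be made concrete using the identity $\gamma(\sin^{2}(\pi t))=\sin^{2}(2\pi t)$: the points $y_{n}=\sin^{2}(\pi/2^{n+1})$ tend to $0$, yet $\gamma^{n}(y_{n})=\sin^{2}(\pi/2)=1$ while $\gamma^{n}(0)=0$. The Python sketch below (our illustration; the function names are hypothetical) checks this numerically in floating point.

```python
import math


def logistic(v: float) -> float:
    """One step of the logistic map gamma(v) = 4 v (1 - v) on [0, 1]."""
    return 4.0 * v * (1.0 - v)


def iterate(v: float, n: int) -> float:
    """gamma^n(v), computed in floating point."""
    for _ in range(n):
        v = logistic(v)
    return v


# y_n = sin^2(pi / 2^(n+1)) -> 0, but gamma^n(y_n) = sin^2(pi / 2) = 1, whereas gamma^n(0) = 0.
for n in (5, 10, 20):
    y = math.sin(math.pi / 2 ** (n + 1)) ** 2
    print(n, y, iterate(y, n) - iterate(0.0, n))
```

No single $\delta>0$ controls $\left|\gamma^{n}(y)-\gamma^{n}(0)\right|$ uniformly in $n$, which is exactly the failure of equicontinuity at $0$.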

Example 5.6 (Deep iterates and equilibria of Newton’s Method).

Fix a polynomial $p$ with (real or) complex coefficients, say of degree $d\geq 2$. Consider the CCS

\bigl\langle(\hat{\mathbb{C}},\{U,V,W\}),\langle\gamma\rangle\bigr\rangle,

where

•

$\hat{\mathbb{C}}=\mathbb{C}\cup\{\infty\}$ is the Riemann sphere, which we identify with the unit sphere $S^{2}=\{(u,v,w)\in\mathbb{R}^{3}:u^{2}+v^{2}+w^{2}=1\}$ via, e.g., the stereographic projection $(u,v,w)\mapsto z=(u+iv)/(1-w)$ (and $(0,0,1)\mapsto\infty$ );
•

$\hat{\mathbb{C}}\to S^{2}:z\mapsto(U,V,W)$ is the inverse of the stereographic projection, regarded as a triple of predicates $U,V,W:\hat{\mathbb{C}}\to[-1,1]$; and

•

\gamma(z)\coloneqq\begin{cases}z-\frac{p(z)}{p^{\prime}(z)}&\text{($p^{\prime}(z)\neq 0$)}\\ z&\text{($p^{\prime}(z)=0=p(z)$)}\\ \infty&\text{($p^{\prime}(z)=0\neq p(z)$, or $z=\infty$)}\end{cases}

is the transition carrying out one step of Newton's method to find the roots of $p(z)$, regarded as a rational map acting on $\hat{\mathbb{C}}$ (thus, meromorphic, and hence continuous as a map $\gamma:\hat{\mathbb{C}}\to\hat{\mathbb{C}}$).²⁷

²⁷ Since $\deg p\geq 2$, it is straightforward to verify that $z\mapsto p(z)/p^{\prime}(z)$ extends continuously to $\hat{\mathbb{C}}$, taking the value $\infty$ when either $p^{\prime}(z)=0\neq p(z)$ or $z=\infty$, and the value $0$ when $p^{\prime}(z)=0=p(z)$.

Since $\hat{\mathbb{C}}$ is compact and $z\mapsto(U(z),V(z),W(z))$ is a homeomorphic embedding, $\hat{\mathbb{C}}$ is realcompact. In fact, $\hat{\mathbb{C}}$ is equal to the shard $\hat{\mathbb{C}}[1,1,1]=\bigl\{z:\max\bigl(\left|U(z)\right|,\left|V(z)\right|,\left|W(z)\right|\bigr)\leq 1\bigr\}$; thus, $\gamma$ is automatically confined (by $R$ consisting of the single sizer $r_{\bullet}=(r_{U},r_{V},r_{W})=(1,1,1)$).

Let $\gamma^{*}$ be any deep iterate of $\gamma$. At any point $z\in\hat{\mathbb{C}}$ for which Newton's method converges to a root $w$ of $p$ (in particular, at any $z$ sufficiently close to a simple root $w$), we have $\gamma^{*}(z)=w$ ($=\gamma(w)$, since $p(w)=0$). We also have $\gamma^{*}(\infty)=\infty=\gamma(\infty)$; however, $\infty$ is a repeller²⁸ (this follows from the easy calculation that $\gamma(z)/z\to 1-d^{-1}$ as $\left|z\right|\to+\infty$), so one would expect points $z\in\mathbb{C}$ with $\gamma^{*}(z)=\infty$ to be quite scant. In general, however, $w\coloneqq\gamma^{*}(z)$ need not be a root of $p$, although any such $w\in\hat{\mathbb{C}}$ is necessarily a topologically recurrent point of $\hat{\mathbb{C}}$ under $\gamma$. At any rate, if $p$ has at least two distinct roots, any deep equilibrium (or deep iterate) $\gamma^{*}$ of $\gamma$ is discontinuous.

²⁸ Perhaps surprisingly, it is possible for the fixed repeller $\infty$ to be an accumulation point of orbits $(\gamma^{n}(z))_{n\in\mathbb{N}}\subseteq\mathbb{C}$. This is the case, e.g., for the polynomial $p(z)=z^{3}-1$.

Many examples of polynomials for which Newton's method converges for a very large set of inputs are known. The most one can hope for is that the method converges to a root for all inputs except those in a (say) closed subset $F\subseteq\hat{\mathbb{C}}$ of “bad” inputs (in particular, $\infty\in F$) which, in the best of cases, is nowhere dense; such is the case, e.g., for $p(z)=z^{3}-1$, where $F$ is perhaps the best-known example of a Newton fractal. All deep iterates and equilibria $\gamma^{*}$ have the same value $\lim_{n\to\infty}\gamma^{n}(z)$ at (convergent) inputs $z\in\mathbb{C}\setminus F$, and the common restriction of all such $\gamma^{*}$ to $\mathbb{C}\setminus F$ is continuous. However, deep iterates and equilibria are typically discontinuous at inputs $z\in F$, and their values there may differ. Intuitively, deep iterates $\gamma^{*}$, $\gamma^{**}$ giving distinct values $\gamma^{*}(z)\neq\gamma^{**}(z)$ are merely picking different subsequential limits of the divergent sequence $(\gamma^{n}(z))$.
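A minimal Python sketch of the Newton transition $\gamma$ for $p(z)=z^{3}-1$, following the case split in the definition of $\gamma$ above. The point $\infty$ is encoded as `None`, and all names are hypothetical. For inputs in a basin of attraction, a long iterate $\gamma^{n}(z)$ approximates the common value $\gamma^{*}(z)$ of all deep iterates; at inputs in $F$ the orbit need not converge, and a finite iterate says nothing about which subsequential limit a given deep iterate picks.

```python
import cmath
from typing import Optional

Point = Optional[complex]  # a point of the Riemann sphere; None encodes infinity (hypothetical encoding)


def p(z: complex) -> complex:
    return z ** 3 - 1


def dp(z: complex) -> complex:
    return 3 * z ** 2


def newton_step(z: Point) -> Point:
    """One step of the Newton transition gamma for p(z) = z^3 - 1, with the case split of Example 5.6."""
    if z is None:                        # gamma(infinity) = infinity
        return None
    if dp(z) == 0:                       # critical point of p
        return z if p(z) == 0 else None  # fixed if also a root; otherwise sent to infinity
    return z - p(z) / dp(z)


def newton_iterate(z: Point, n: int) -> Point:
    """gamma^n(z); on a basin of attraction this approximates the value gamma*(z) of every deep iterate."""
    for _ in range(n):
        z = newton_step(z)
    return z


ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def nearest_root(z: Point) -> Optional[int]:
    """Index of the root of p closest to z (None for the point at infinity)."""
    if z is None:
        return None
    return min(range(3), key=lambda k: abs(z - ROOTS[k]))


for z0 in (2 + 0j, -1 + 1j, -1 - 1j, 0j):
    print(z0, nearest_root(newton_iterate(z0, 60)))
```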

Example 5.7.

The definitions of deep layer state and deep iterative equilibrium above are motivated by the notion of “Deep Equilibrium (DE)” in [BKK19]. However, iterative computations in [BKK19] allow “feeding” the initial state $v$ as an argument at each iteration of a (“parameter-tied”, i.e., fixed) layer transformation. Capturing deep iterative equilibria in this sense requires generalizing the notion of CCS. One way to capture the deep equilibria of Bai et al. is to allow $\mathcal{C}$ to be a CCS with $n$-ary (in fact, just binary) layer transformations as in Remark 3.4(2). Indeed, fix a binary $\gamma\in\Gamma$, which induces a two-argument layer transition $\gamma(\cdot,\cdot):L\times L\to L$. Consider the map $\delta:L\times L\to L\times L$ given by $\delta(v,w)\coloneqq(v,\gamma(v,w))$ (the first entry of $\delta(v,w)$ is simply a pass-through of the first argument, while the second entry applies the computation $\gamma$). Then the second (nontrivial) entry $w_{n}\eqqcolon f_{n}(v)$ of the iterates $\delta^{n}(v,v)=(v,w_{n})$, for $n\in\mathbb{N}$, represents the evolution of the computation $\gamma$ that passes through, at each step, the original argument $v$ as the first of two inputs.
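The pass-through construction $\delta(v,w)=(v,\gamma(v,w))$ can be sketched in Python as follows. The binary transformation `toy_gamma` is a hypothetical stand-in (a map that contracts in its second argument, chosen only so that the iterates converge), not a layer taken from [BKK19]. With it, $\delta^{n}(v,w)$ approaches $(v,w^{*}(v))$, where $w^{*}(v)$ is the fixed point of $\gamma(v,\cdot)$, and applying the limiting map to that pair a second time leaves it unchanged.

```python
import math
from typing import Callable, Tuple

BinaryLayer = Callable[[float, float], float]


def delta(gamma: BinaryLayer, v: float, w: float) -> Tuple[float, float]:
    """delta(v, w) = (v, gamma(v, w)): pass v through unchanged and update w."""
    return v, gamma(v, w)


def delta_power(gamma: BinaryLayer, v: float, w: float, n: int) -> Tuple[float, float]:
    """delta^n(v, w); the evolution in the text starts from (v, v), so w_n = delta_power(gamma, v, v, n)[1]."""
    for _ in range(n):
        v, w = delta(gamma, v, w)
    return v, w


def toy_gamma(v: float, w: float) -> float:
    # Hypothetical binary layer: |d/dw tanh(0.5 w + v)| <= 0.5, so gamma(v, .) is a contraction.
    return math.tanh(0.5 * w + v)


v = 0.3
_, w_star = delta_power(toy_gamma, v, v, 200)   # approximates the limit of w_n = f_n(v)
print(w_star, toy_gamma(v, w_star) - w_star)    # residual of the fixed-point equation
```

The contraction assumption is only a convenience for this sketch; the text's existence argument relies on confinement and ultralimits, not on convergence of the sequence.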

If $L$ is realcompact and $\gamma$ is $R$-confined (i.e., restricts to a map $L[r_{\bullet}]\times L[r_{\bullet}]\to L[r_{\bullet}]$ for all $r_{\bullet}\in R$), the proof of Theorem 5.3 adapts mutatis mutandis to computations in CCSs with $n$-ary transitions. One thus shows the existence of deep equilibria, i.e., of idempotent maps $\mathfrak{i}:L\to L$ arising as ultralimits of the sequence of iterates $(f_{n})_{n\in\mathbb{N}}$ of evolution by $\gamma$. (Without a realcompactness assumption, one needs suitable hypotheses on $\gamma$ akin to Sh-extendibility.)

As an alternative to the use of CCSs with $n$-ary computations, in Appendix A.1.4 we introduce the notion of Parametrized Family of Computations (PFC) to capture computations with feed-through in our framework. The ability to compute deep equilibria in an effective sense, as in Bai et al., presupposes that such equilibria are definable not merely in a continuous, but in a differentiable sense (allowing the use of generic solver, or fixed-point, algorithms, which typically rely on derivative information, e.g., Newton's method and its refinements); we explain how such considerations of differentiability may be handled in CCSs with finitely many predicates (considerations of differentiability when infinitely many observables are involved entail delicate analysis beyond the scope of this paper).

Remarks 5.8.

(1)

The results in Section 6 below say nothing about effectively computable features of (shard-)discontinuous deep iterates or equilibria such as those arising from Newton’s method iterations in Example 5.6 above. In an upcoming article, we extend the present results to discontinuous ultracomputations that are nevertheless de facto effectively computable in a localized sense.
(2)

Even in situations where, say, a deep iterate does not quite exist, an ultracomputation may have “meaningful deep features” in a sense that we now explain. Consider any CCS $\mathcal{C}=(L,\langle f\rangle,\circ,\mathcal{P})$ (not necessarily realcompact), where $f:L\to L$ is any given (continuous) computation. For a fixed $Q\in\mathcal{P}$, say that $f$ has uniformly $Q$-bounded iterates on $v\in L$ if there exists $s=s^{(v)}>0$ such that $\left|Q(f^{n}(v))\right|\leq s$ for all $n\in\mathbb{N}$. (Note that this hypothesis imposes no bounds at all on other entries $P\circ f^{n}(v)$ for $Q\neq P\in\mathcal{P}$.) If $f$ has uniformly $Q$-bounded iterates on every $v\in L$ and $\mathcal{U}$ is any nonprincipal ultrafilter on $\mathbb{N}$, the compactness of the intervals $[-s^{(v)},s^{(v)}]$ implies that $\operatorname{\mathcal{U}lim}_{n}Q(f^{n}(v))$ exists for all $v\in L$. In principle, however, the iterates $f^{n}$ need not $\mathcal{U}$-converge in $T$ (i.e., pointwise on $L$) even if $L$ is realcompact, since (for fixed $v$) the sequence $(f^{n}(v))_{n\in\mathbb{N}}$ may not be entry-wise bounded (only bounded “in the $Q$-th entry”, so to speak).

The study of aspects of deep equilibria introduced in Remarks 5.8 is quite delicate, and exceeds the scope of the present paper.

6. Explicit computability

Throughout this section, we fix a CCS $\mathcal{C}=\langle\underline{L},\underline{\Gamma}\rangle$. We assume $\Gamma$ has an identity element $\operatorname{id}$ acting as the identity map on $L$. We recall the Extendibility Axiom: each layer transformation $\gamma\in\Gamma$ extends to a Sh-continuous transition $\gamma(\cdot)\in\mathfrak{T}_{\mathrm{Sh}}$.

We shall implicitly identify a predicate symbol $P$ with the real-valued function $P(\cdot):L\to\mathbb{R}$ interpreting it in $\mathcal{C}$ , and also implicitly extend $P(\cdot)$ to a (unique continuous) function $\mathfrak{L}_{\mathrm{Sh}}\to\mathbb{R}$ .

A real function $\varphi:\mathfrak{L}_{\mathrm{Sh}}\to\mathbb{R}$ will be called shard-bounded (sh-bdd) (resp., Sh-continuous) if its restriction to each shard $\mathfrak{L}[r_{\bullet}]$ is bounded (resp., continuous). (A Sh-continuous such function is necessarily sh-bdd, since type-shards are compact.)

6.1. Polynomials in predicates and definability. Features of layer transitions.

•

Any predicate $P$ will also be called a monomial.²⁹

²⁹ In real-valued logic, the monomials above are called “atomic”.
•
A polynomial is any function $L\to\mathbb{R}$ obtained by combining real constants $r\in\mathbb{R}$ and monomials using any (recursive) combination of the following operations, called connectives:
  - Addition: $(\varphi,\psi)\mapsto\varphi+\psi$ (where $\varphi+\psi:v\mapsto\varphi(v)+\psi(v)$);
  - Multiplication: $(\varphi,\psi)\mapsto\varphi\psi$ (where $\varphi\psi:v\mapsto\varphi(v)\psi(v)$).
The monomials appearing in an expression of some polynomial may be called its atoms.³⁰

³⁰ A polynomial $\varphi$ need not have a unique expression in terms of monomials, so it is more accurate to say that $\varphi$ has an expression involving certain specific monomials.
•

A definable predicate is a function $\varphi:L\to\mathbb{R}$ whose restriction to an arbitrary shard $L[r_{\bullet}]$ is uniformly approximable by polynomials;³¹ thus, $\varphi$ is definable iff for every $\varepsilon>0$ and sizer $r_{\bullet}$ there exists a polynomial $\psi=\psi_{[r_{\bullet}]}^{(\varepsilon)}$ such that $\left|\varphi(v)-\psi(v)\right|<\varepsilon$ for all $v\in L[r_{\bullet}]$. The family $(\psi_{[r_{\bullet}]}^{(\varepsilon)}:\text{$\varepsilon>0$, $r_{\bullet}$ sizer})$ is called a definition scheme for $\varphi$.

³¹ The notion of “definable predicate” above is less restrictive than the (most) standard one in real-valued logic, wherein approximability is required to hold uniformly over the entire set (“universe”) $L$.

We only require definable predicates to be uniformly approximable on shards, not uniformly on the full state space $L$.
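As a concrete instance (our illustration, not from the text), take a single predicate $P$ and $\varphi=\exp\circ P$. No single polynomial approximates $\exp$ uniformly on all of $\mathbb{R}$, but on the shard where $\left|P\right|\leq r_{P}$ a Taylor polynomial in $P$ of sufficiently high degree does, by the Lagrange remainder bound $e^{r}r^{n+1}/(n+1)!$ on $[-r,r]$. The Python sketch below, with hypothetical helper names, produces such a definition scheme $\psi^{(\varepsilon)}_{[r_{\bullet}]}$ as a list of coefficients.

```python
import math
from typing import List


def exp_definition_scheme(r: float, eps: float) -> List[float]:
    """Coefficients c_0, ..., c_n of psi(x) = sum_k c_k x^k with |exp(x) - psi(x)| < eps for |x| <= r."""
    n = 0
    # Lagrange remainder for the degree-n Taylor polynomial on [-r, r]: at most e^r r^(n+1) / (n+1)!.
    while math.exp(r) * r ** (n + 1) / math.factorial(n + 1) >= eps:
        n += 1
    return [1.0 / math.factorial(k) for k in range(n + 1)]


def eval_poly(coeffs: List[float], x: float) -> float:
    """Horner evaluation of sum_k coeffs[k] * x**k."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


# psi^(eps)_[r] for the predicate exp(P) on shards |P| <= r, at a fixed tolerance.
for r in (1.0, 5.0, 20.0):
    print(r, len(exp_definition_scheme(r, 1e-6)) - 1)  # degree needed on the shard
```

The degree grows with the shard size $r_{P}$, which is why approximability is required only shard by shard.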

Remarks 6.1.

(1)

Definable predicates $\varphi$ formalize a notion of “explicit computability” of $\varphi$, in a certain local and approximate sense. Namely, given (i) any “approximation error” $\varepsilon>0$, and (ii) some a priori knowledge of the argument $v$ (i.e., knowing that $v$ belongs to a specific shard $L[r_{\bullet}]$; this is the sense of “locality” of the computation), one may regard the $\varepsilon$-uniformly approximating formula $\psi^{(\varepsilon)}_{[r_{\bullet}]}$ to $\varphi$ on $L[r_{\bullet}]$ as an explicit algorithm that computes $\varphi(v)$ up to an error not exceeding $\varepsilon$. Numerical algorithms relying on floating-point operations are typically definable in the above sense: on the one hand, one must ensure that the calculation is stable under rounding errors (of the order of the machine epsilon); on the other, such rounding errors on inputs may lead to arbitrarily large output errors unless the magnitude of the inputs is bounded a priori (i.e., unless the inputs belong to a given shard).
(2)

By the definition of the topologies on $L$ and $\mathfrak{L}_{\mathrm{Sh}}$, every monomial $P$ is continuous and bounded by $r_{P}$ on any shard $L[r_{\bullet}]$, and extends continuously to $\mathfrak{L}_{\mathrm{Sh}}$ (as the $P$-coordinate function). Since connectives are obtained by pointwise application of continuous real-valued functions of real arguments (addition and multiplication), every polynomial on $L$ is also continuous, and extends to a continuous bounded function on type-shards $\mathfrak{L}[r_{\bullet}]$. Definable predicates, on the other hand, need not be continuous on $L$, although their restrictions to shards $L[r_{\bullet}]$ are necessarily continuous and bounded (being uniform limits on $L[r_{\bullet}]$ of polynomials, which are continuous and bounded there).
(3)

Let $r_{\bullet}$ be an arbitrary sizer. The restriction of a monomial $P$ to the type-shard $\mathfrak{L}[r_{\bullet}]$ admits the a priori bound $C=r_{P}$, so that $P{\restriction}\mathfrak{L}[r_{\bullet}]$ takes values in $[-C,C]=[-r_{P},r_{P}]$.³² By recursion on the application of connectives leading to an arbitrary polynomial $\varphi$ from monomials, a priori bounds $C=C^{\varphi}_{r_{\bullet}}\in[0,\infty)$ such that $\varphi{\restriction}\mathfrak{L}[r_{\bullet}]$ takes values in $[-C,C]$ are easily found. (Recursively apply the rules $C_{r_{\bullet}}^{\varphi+\psi}\coloneqq C_{r_{\bullet}}^{\varphi}+C_{r_{\bullet}}^{\psi}$ and $C_{r_{\bullet}}^{\varphi\psi}\coloneqq C_{r_{\bullet}}^{\varphi}\cdot C_{r_{\bullet}}^{\psi}$.)

³² A constant $r$ also admits the trivial bound $C=\left|r\right|$.
(4)

By definition of the topology on $L$ and the Reduction Axioms, the collection of (continuous) predicates $P(\cdot):\mathfrak{L}_{\mathrm{Sh}}\to\mathbb{R}$ (extended to the type space $\mathfrak{L}_{\mathrm{Sh}}$) separates points of $\mathfrak{L}_{\mathrm{Sh}}$ (a fortiori, points of any shard $L[r_{\bullet}]$). Since each type-shard $\mathfrak{L}[r_{\bullet}]$ is compact, the Stone-Weierstrass Theorem implies that any Sh-extendable $\varphi:L\to\mathbb{R}$ is necessarily definable. (In particular, the restriction to $L$ of any continuous $\varphi:\mathfrak{L}_{\mathrm{Sh}}\to\mathbb{R}$ is definable.) Clearly, the condition may be relaxed to requiring that $\varphi$ have continuous restrictions to type-shards $\mathfrak{L}[r_{\bullet}]$ for $r_{\bullet}$ in some exhaustive $R$. By contrast, continuous predicates $L\to\mathbb{R}$ need not be definable.
(5)

In general, a function $\varphi:L\to\mathbb{R}$ whose restrictions to shards are continuous need not be continuous on $L$ (not even under the additional assumption that $L$ be realcompact). For $\mathcal{P}$ (at most) countable, however, Sh-continuous real functions on the type space $\mathfrak{L}_{\mathrm{Sh}}$ are continuous (since $\mathfrak{L}_{\mathrm{Sh}}=\mathfrak{L}$ is a k-space in such case, by Proposition 4.1).
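The recursion in item (3) for the a priori bounds $C^{\varphi}_{r_{\bullet}}$ is mechanical, and can be sketched in Python over a small expression tree. The encoding of polynomials (the classes `Pred`, `Const`, `Add`, `Mul`), the helper names `a_priori_bound` and `evaluate`, and the representation of a sizer as a dictionary from predicate names to the values $r_{P}$ are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical encoding of polynomials: predicate symbols (monomials), real
# constants, and the two connectives (addition and multiplication).
@dataclass(frozen=True)
class Pred:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Pred, Const, Add, Mul]


def a_priori_bound(phi: Expr, sizer: Dict[str, float]) -> float:
    """Return C with |phi| <= C on the type-shard of `sizer` (sizer[P] plays the role of r_P)."""
    if isinstance(phi, Pred):
        return sizer[phi.name]                      # monomial P: C = r_P
    if isinstance(phi, Const):
        return abs(phi.value)                       # constant r: C = |r|
    if isinstance(phi, Add):                        # C^{phi+psi} = C^phi + C^psi
        return a_priori_bound(phi.left, sizer) + a_priori_bound(phi.right, sizer)
    if isinstance(phi, Mul):                        # C^{phi psi} = C^phi * C^psi
        return a_priori_bound(phi.left, sizer) * a_priori_bound(phi.right, sizer)
    raise TypeError(f"unknown expression node: {phi!r}")


def evaluate(phi: Expr, values: Dict[str, float]) -> float:
    """Evaluate phi at a state described by the values of its predicates."""
    if isinstance(phi, Pred):
        return values[phi.name]
    if isinstance(phi, Const):
        return phi.value
    if isinstance(phi, Add):
        return evaluate(phi.left, values) + evaluate(phi.right, values)
    if isinstance(phi, Mul):
        return evaluate(phi.left, values) * evaluate(phi.right, values)
    raise TypeError(f"unknown expression node: {phi!r}")
```

For instance, for $\varphi=PQ-2$ with $r_{P}=3$ and $r_{Q}=0.5$, the rules give $C^{\varphi}_{r_{\bullet}}=3\cdot 0.5+2=3.5$. The bound is attained here, but in general the recursion only yields an upper bound, not the supremum.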

6.1.1. Definable features

(Remarks 5.8 provide relevant context for this subsection.)

Given $P\in\mathcal{P}$ , the $P$ -feature of a transition-in-type $\mathfrak{f}\in\mathfrak{T}$ is the real-valued function

P\circ\mathfrak{f}\colon L\to\mathbb{R},\qquad v\mapsto P(\mathfrak{f}(v)).

(One may call such a feature “atomic” or “monomial”.)

Individual features of a transition-in-type $\mathfrak{f}\in\mathfrak{T}$ may be definable or non-definable. A transition-in-type is definable if all of its features are definable.

In the setting of Section 5.2.3, one may ask under what circumstances a specific feature of a deep computation $\mathfrak{f}\in\mathfrak{D}$ is effectively computable.

By Remark 6.1(4), Sh-continuous features of transitions are definable.
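Remark 6.1(4) rests on the Stone-Weierstrass Theorem, which is not constructive. In the simplest hypothetical situation of a single real predicate whose shard is the interval $[-r,r]$, Bernstein polynomials give one explicit choice of approximating polynomials $\psi^{(\varepsilon)}_{[r_{\bullet}]}$ (they furnish a classical constructive proof of the one-variable Weierstrass theorem). The following sketch, with hypothetical helper names `bernstein_approx` and `sup_error`, builds them and estimates the uniform error on a grid; it is an illustration in one variable only, not a general procedure.

```python
import math


def bernstein_approx(phi, r, n):
    """Degree-n Bernstein polynomial of phi, rescaled from [0, 1] to the shard [-r, r]."""
    nodes = [phi(-r + 2 * r * k / n) for k in range(n + 1)]    # samples of phi at n + 1 nodes

    def psi(v):
        t = (v + r) / (2 * r)                                   # map [-r, r] onto [0, 1]
        return sum(c * math.comb(n, k) * t**k * (1 - t) ** (n - k)
                   for k, c in enumerate(nodes))

    return psi


def sup_error(phi, psi, r, grid=201):
    """Estimate the uniform error sup |phi - psi| on [-r, r] from an evenly spaced grid."""
    points = [-r + 2 * r * i / (grid - 1) for i in range(grid)]
    return max(abs(phi(v) - psi(v)) for v in points)
```

For the continuous but non-smooth feature $\varphi(v)=|v|$ on $[-1,1]$, the error of the degree-$n$ Bernstein polynomial decays roughly like $n^{-1/2}$, so reaching a small $\varepsilon$ may require a high degree; faster schemes exist, but this one is the simplest to state.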

6.2. Definability of ultracomputations-in-type

Nonprincipal ultrafilters $\mathcal{U}$ on infinite sets cannot be described explicitly. Thus, as a first step towards grasping ultracomputations, it is natural to consider ultralimits $\gamma_{\bullet}^{\mathcal{U}}$ of pointwise-bounded sequences $\gamma_{\bullet}=(\gamma_{n})_{n\in\mathbb{N}}\subseteq\Gamma$ indexed by the countably infinite set $\mathbb{N}$. Ultracomputations $\gamma_{\bullet}^{\mathcal{U}}$ obtained in this form (as $\gamma_{\bullet}$ and $\mathcal{U}$ vary) are accumulation points of arbitrary countable sets of realized computations.

Ultralimits obtained from countable subsets of $\Gamma(\cdot)$ , although less general than those obtained from arbitrary subsets, may still be quite complex. Given a countable set $\gamma_{\bullet}=(\gamma_{n}(\cdot))_{n<\omega}\subseteq\Gamma$ of pointwise-bounded computations, it is natural to consider sequential limits of $\gamma_{\bullet}$ , i.e., ultracomputations arising as pointwise limits of subsequences of $\gamma_{\bullet}$ , namely ultracomputations $\tilde{\gamma_{\bullet}}$ of the form

\mathfrak{v}\mapsto\tilde{\gamma_{\bullet}}(\mathfrak{v})\coloneqq\lim_{k\to\infty}\gamma_{n_{k}}(\mathfrak{v})

for subsequences (otherwise arbitrary) $\tilde{\gamma_{\bullet}}=(\gamma_{n_{k}})_{k\in\mathbb{N}}$ of $\gamma_{\bullet}$ .

By Proposition 4.5, pointwise-boundedness of $\gamma_{\bullet}$ implies that all ultracomputations $\gamma_{\bullet}^{\mathcal{U}}$ exist for arbitrary $\mathcal{U}$ on the index set $I$ of any family $\gamma_{\bullet}=(\gamma_{i})_{i\in I}$ —regardless of the cardinality of $\mathcal{P}$ or $L$ . Ultracomputations $\gamma_{\bullet}^{\mathcal{U}}$ realizable from sequences $\gamma_{\bullet}=(\gamma_{n})_{n\in\mathbb{N}}$ are quite special; those realizable as sequential limits $\tilde{\gamma_{\bullet}}$ , even more so.

If $\gamma_{\bullet}$ is a pointwise-bounded sequence and $\mathcal{P}$ is at most countable, then at every fixed $\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}$, the ultralimit $\gamma_{\bullet}^{\mathcal{U}}(\mathfrak{v})$ is realized as a sequential limit (by a standard diagonalization argument); however, the realizing subsequence $(\gamma_{n_{k}})$ will typically depend on $\mathfrak{v}$ and cannot be chosen uniformly over $\mathfrak{v}\in\mathfrak{L}_{\mathrm{Sh}}$. When $\mathcal{P}$ is uncountable, sequentially realizing an ultralimit of $\gamma_{\bullet}$, even at a single point $\mathfrak{v}$, may be infeasible.

The results in this concluding section relate (i) continuity on shards of ultracomputations, (ii) the ability to obtain such ultracomputations as accumulation points of countable sets of computations, or as sequential limits of computations, (iii) the definability of such ultracomputations, and (iv) a limit-exchange criterion (originally due to Grothendieck).

6.2.1. Relative compacta of continuous layer transitions

For any topological space $X$, let $\mathrm{C}_{\mathrm{p}}(X)\subseteq\mathbb{R}^{X}$ be the set of all continuous real functions on $X$, endowed with the relative (subspace) topology of the product $\mathbb{R}^{X}$, i.e., the topology of pointwise convergence at each $x\in X$. More generally, given two spaces $X,Y$, the space $\mathrm{C}_{\mathrm{p}}(X;Y)$ is the subspace of the product $Y^{X}=\prod_{x\in X}Y$ consisting of continuous maps $X\to Y$. (“$\mathrm{C}_{\mathrm{p}}$” means “pointwise topology on continuous functions”.)

Note that $\mathrm{C}_{\mathrm{p}}(X),\mathrm{C}_{\mathrm{p}}(X;Y)$ are generally not closed subspaces of $\mathbb{R}^{X},Y^{X}$ .

A Hausdorff topological space $Z$ is countably compact if every infinite (equivalently, every infinite countable) subset $B\subseteq Z$ has a limit point $z\in Z$ . A subset $A\subseteq Z$ is relatively countably compact (or countably compact in $Z$ ) if every infinite (equivalently, every infinite countable) subset $B\subseteq A$ has a limit point $z\in Z$ .

(One may take the properties above as the definition of (relatively) countably compact for arbitrary, not necessarily Hausdorff spaces $Z$ . However, the Hausdorff assumption implies desirable additional properties, e.g., [Eng89, Theorems 3.10.2, 3.10.3, etc.]. In our applications, $Z$ is always a subspace of the layer state space $L$ of a CCS, or of the type space $\mathfrak{L}$ , and hence Hausdorff.)

A topological space $Y$ is angelic if (i) every relatively countably compact subset $A\subseteq Y$ is relatively compact, and (ii) the closure $\overline{A}\subseteq Y$ of any such (relatively compact) $A$ consists precisely of limits of sequences in $A$. (A topological space in which the conclusion of (ii) holds for every subset $A$ is called Fréchet-Urysohn.)

6.2.2. A topological result of Grothendieck

Theorem 6.2.

Let

•

$X$ be a countably compact topological space;
•

$Y$ , any Tychonoff space, having the property that its relatively countably compact subsets are relatively compact (which necessarily holds in case $Y$ is realcompact);
•

$X_{0}\subseteq X$ any dense subset.

Then:

(1)

$\mathrm{C}_{\mathrm{p}}(X;Y)$ is angelic.

(2)

Assume that $Y$ is explicitly embedded as a subspace $Y\subseteq\mathbb{R}^{\mathcal{P}}$ for some index set $\mathcal{P}$ . A set $F\subseteq\mathrm{C}_{\mathrm{p}}(X;Y)$ of continuous maps $X\to Y$ is relatively compact if and only if

(a)

$F$ is pointwise bounded, i.e., $\{P(f(x)):f\in F\}$ is bounded for each $P\in\mathcal{P}$ and $x\in X$ (here, we use the notation $P(f(x))$ for the “$P$-th coordinate” $f_{P}(x)$ of any $f\in(\mathbb{R}^{\mathcal{P}})^{X}$), and

(b)

for all sequences $(f_{m})_{m\in\mathbb{N}}\subseteq F$ , $(x_{n})_{n\in\mathbb{N}}\subseteq X_{0}$ , any $P\in\mathcal{P}$ and ultrafilters $\mathcal{U},\mathcal{V}$ on $\mathbb{N}$ , the following equality (called the limit-exchange property) holds between iterated ultralimits:

(6.1)

\operatorname{\mathcal{U}lim}_{m}\operatorname{\mathcal{V}lim}_{n}P(f_{m}(x_{n}))=\operatorname{\mathcal{V}lim}_{n}\operatorname{\mathcal{U}lim}_{m}P(f_{m}(x_{n})),

where both iterated ultralimits exist.

(3)

Even if all hypotheses on $X,Y$ pertaining to compactness are omitted (i.e., $Y$ is Tychonoff and $X$ arbitrary), the limit-exchange condition (b) alone implies that every accumulation point of $F$ in $Y^{X}$ is continuous (i.e., the closure $\overline{F}$ of $F$ in $Y^{X}$ is contained in $\mathrm{C}_{\mathrm{p}}(X;Y)$).

For a contemporary exposition of Grothendieck’s theorem and its consequences, we refer the reader to the paper on angelic spaces and the double limit relation by König and Kuhn [KK87].

Proof.

Theorem 6.2 aggregates several results in Grothendieck’s “Critères de compacité” [Gro52, Théorèmes 1 & 2, Remarque 2, Corollaire 2]. Here we merely offer some remarks on translating Grothendieck’s French terms and decades-old nomenclature into their contemporary English equivalents. Spaces $\mathrm{C}_{\mathrm{s}}(X;Y)$ (where “s” refers to the “simple” topology, i.e., that of pointwise convergence) are now denoted $\mathrm{C}_{\mathrm{p}}(X;Y)$ (or just $\mathrm{C}_{\mathrm{p}}(X)$ when $Y=\mathbb{R}$). “(Relativement) semi-compact” (resp., “relativement compact”) refers to (relatively) countably compact (resp., relatively compact) sets. Functions take values in $Y$, which we take to be a Tychonoff space (“complètement régulier”, i.e., completely regular and Hausdorff in the standard contemporary sense) endowed with an embedding into a product $\mathbb{R}^{\mathcal{P}}$; hence $Y$ is a uniform Hausdorff space (“espace uniforme séparé”) [Eng89, §1.5, §3.10, §8.1]. ∎

Remarks 6.3.

(1)

Condition (a) above implies that both iterated ultralimits in equation (6.1) in (b) exist. Nevertheless, (b) explicitly requires that the limits exist, not merely that they are equal whenever they exist.
(2)

The hypotheses on $Y$ are satisfied if $Y$ is realcompact, in which case the embedding $Y\subseteq\mathbb{R}^{\mathcal{P}}$ is as a closed subspace of the product; moreover, any $Y$ embedded as a closed subspace of any such product of lines satisfies all hypotheses (including those in part (2) of the theorem).
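A standard example (not taken from the discussion above) shows how condition (b) of Theorem 6.2 detects a discontinuous accumulation point. Take $X=X_{0}=[0,1]$, $Y=\mathbb{R}$ (so $\mathcal{P}$ is a singleton and $P$ is the identity), the pointwise-bounded family $F=\{f_{m}(x)=x^{m}:m\in\mathbb{N}\}$, and $x_{n}=1-\frac{1}{n+1}$. For nonprincipal ultrafilters $\mathcal{U},\mathcal{V}$ on $\mathbb{N}$, each inner ultralimit is an ordinary limit:

```latex
\operatorname{\mathcal{U}lim}_{m}\operatorname{\mathcal{V}lim}_{n} f_{m}(x_{n})
  = \operatorname{\mathcal{U}lim}_{m} 1 = 1
  \neq 0 = \operatorname{\mathcal{V}lim}_{n} 0
  = \operatorname{\mathcal{V}lim}_{n}\operatorname{\mathcal{U}lim}_{m} f_{m}(x_{n}).
```

So the limit-exchange property (6.1) fails. Accordingly, the pointwise accumulation point of $F$, namely the indicator function of $\{1\}$, is not continuous, and $F$ is not relatively compact in $\mathrm{C}_{\mathrm{p}}([0,1])$.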

6.2.3. The Fundamental Theorem of Definability

Theorem 6.4.

Let $\mathcal{C}=\langle\underline{L},\underline{\Gamma}\rangle$ be a CCS. Let $R$ be an exhaustive sizer collection, and let $\Delta\subseteq\Gamma[R]$ be any $R$ -confined set (of Sh-extendable computations, by assumption). Then, the properties below are equivalent:

Extendable Ultracomputations (uExt)

Every ultracomputation over $\Delta$ is Sh-extendable.

Limit Exchange (LE)

For all sizers $r_{\bullet}$, all sequences $v_{\bullet}\subseteq L[r_{\bullet}]$ and $\gamma_{\bullet}\subseteq\Delta$, and all ultrafilters $\mathcal{U},\mathcal{V}$ on $\mathbb{N}$, the iterated ultralimits $\operatorname{\mathcal{U}lim}_{m}\operatorname{\mathcal{V}lim}_{n}\gamma_{m}(v_{n})$ and $\operatorname{\mathcal{V}lim}_{n}\operatorname{\mathcal{U}lim}_{m}\gamma_{m}(v_{n})$ both exist and are equal:

(6.2)

\operatorname{\mathcal{U}lim}_{m}\operatorname{\mathcal{V}lim}_{n}\gamma_{m}(v_{n})=\operatorname{\mathcal{V}lim}_{n}\operatorname{\mathcal{U}lim}_{m}\gamma_{m}(v_{n}).

Uniform Approximation (UA)

Every ultracomputation $\mathfrak{f}$ over $\Delta$ is definable without parameters: For any sizer $r_{\bullet}$ , any $\varepsilon>0$ , and all $P\in\mathcal{P}$ , there exists a polynomial $\psi=\psi_{r_{\bullet},\varepsilon,P}$ (without parameters) such that

(6.3)

\left|\psi(v)-P(\mathfrak{f}(v))\right|<\varepsilon\qquad\text{for all $v\in L[r_{\bullet}]$.}

Moreover:

(1)

In case any (hence all) of the above conditions hold for $\Delta$, the restriction of any ultracomputation $\mathfrak{f}$ over $\Delta$ to any type-shard $\mathfrak{L}[r_{\bullet}]$ is the limit $\mathfrak{f}{\restriction}\mathfrak{L}[r_{\bullet}]=\lim_{n}\gamma_{n}(\cdot){\restriction}\mathfrak{L}[r_{\bullet}]$ of some sequence $\gamma_{\bullet}\subseteq\Delta$.
(2)

For arbitrary $\Delta\subseteq\Gamma$ (i.e., $\Delta$ not a priori included in $\Gamma[R]$ for some exhaustive $R$), the Limit Exchange condition alone implies that all ultracomputations over $\Delta$ are Sh-extendable. (The explicit LE hypothesis that both iterated ultralimits in (6.2) exist is essential when $R$ and the implied pointwise bounds on $\Delta$ are not given a priori.)

Proof of Theorem 6.4.

Because of the hypothesis $\Delta\subseteq\Gamma[R]$ , it is quite clear that one may specialize all uses of sizers $r_{\bullet}$ and universal properties of sizers to involve sizers $r_{\bullet}\in R$ only.

In Grothendieck’s Theorem 6.2, let $Y=\mathfrak{L}\subseteq\mathbb{R}^{\mathcal{P}}$ (realcompact) and, for a momentarily fixed $r_{\bullet}\in R$, let $X=\mathfrak{L}[r_{\bullet}]$ (compact, hence countably compact), and $Z\coloneqq\mathrm{C}_{\mathrm{p}}(\mathfrak{L}[r_{\bullet}];\mathfrak{L})$. Denote by $\Delta[r_{\bullet}]\subseteq Z$ the set of functions $\gamma_{[r_{\bullet}]}\coloneqq\gamma(\cdot){\restriction}\mathfrak{L}[r_{\bullet}]$ as $\gamma\in\Delta$ varies. By Theorem 6.2, the condition that all ultracomputations over $\Delta$ are continuous on $\mathfrak{L}[r_{\bullet}]$ is equivalent to the relative compactness of $\Delta[r_{\bullet}]\subseteq Z$.

Since $Z$ is angelic (Theorem 6.2(1)), assertion (1) follows.

The pointwise boundedness condition 2(a) in Theorem 6.2 is satisfied since $\Delta[r_{\bullet}]$ is pointwise bounded (as $\Delta$ is uniformly confined by assumption); therefore, relative compactness of $\Delta[r_{\bullet}]$ is, in turn, characterized by the Limit Exchange condition (equivalent to 2(b)), so LE is equivalent to the preceding three conditions. Moreover, assertion (2) follows from Theorem 6.2(3).

Any feature $P\circ\mathfrak{f}:L\to\mathbb{R}$ of any transition-in-type $\mathfrak{f}:L\to\mathfrak{L}$ that is uniformly approximable on some shard $L[r_{\bullet}]$ by polynomials $\psi$ (each of which has a unique extension to a continuous real function on $\mathfrak{L}$, bounded on $\mathfrak{L}[r_{\bullet}]$) must necessarily extend continuously to $\mathfrak{L}[r_{\bullet}]$. Letting $P\in\mathcal{P}$ and $r_{\bullet}$ vary, we see that a definable ultracomputation is necessarily Sh-continuous: UA implies uExt. Conversely, by the Stone-Weierstrass Theorem, every continuous real function $\mathfrak{L}[r_{\bullet}]\to\mathbb{R}$ is uniformly approximable by polynomials in predicates $P\in\mathcal{P}$ (because these predicates separate points of $\mathfrak{L}[r_{\bullet}]$), i.e., by polynomials without parameters. Therefore, any Sh-continuous ultracomputation is definable without parameters: uExt implies UA. ∎

Remarks 6.5.

(1)

The extendibility condition (uExt) in Theorem 6.4 may be regarded as auxiliary in proving the equivalence LE $\Leftrightarrow$ UA. The implication UA $\Rightarrow$ LE is not difficult to prove directly. First, UA $\Rightarrow$ uExt by the straightforward argument in the proof above. Then uExt $\Rightarrow$ LE follows easily: uExt implies that every ultracomputation $\mathfrak{f}\coloneqq\operatorname{\mathcal{U}lim}_{n}\gamma_{n}(\cdot)$ is continuous on any compact $\mathfrak{L}[r_{\bullet}]$, and LE simply states the continuity of $\mathfrak{f}$ at ultralimit points of the form $\mathfrak{v}\coloneqq\operatorname{\mathcal{V}lim}_{n}\operatorname{tp}(v_{n})\in\mathfrak{L}[r_{\bullet}]$ for arbitrary state sequences $(v_{n})\subseteq L[r_{\bullet}]$.

By contrast, the implication LE $\Rightarrow$ UA may be seen as a significantly deeper consequence of Grothendieck’s Theorem: A natural limit-exchange condition implies that layer transformations-in-type are explicitly computable!
(2)

One could take a probabilistic approach to the uniqueness and computability of equilibria, inspired by ideas from deep learning and by Examples 5.5 and 5.6 in Section 5.3. For simplicity, assume that $L$ is realcompact (so $L=\mathfrak{L}_{\mathrm{Sh}}=\mathfrak{L}$). The uniqueness and continuity of deep iterates $\gamma^{*}$ at a state $v\in L$ may be tested empirically by taking finitely many independent random points $(v_{i})_{i<k}$ in a small neighborhood of $v$ and computing $w_{i}=\gamma^{n_{i}}(v_{i})$ for some large, also random, integers $(n_{i})_{i<k}$. Depending on whether the points $(w_{i})_{i<k}$ are near each other or not, one may infer (in a statistical sense) whether $\gamma^{*}$ is continuous at $v$, with a probability of correctness that increases as $k$ grows. At points of continuity $v$ (as determined with high probability by taking $k$ sufficiently large), any of the computed points $w_{i}$ may be regarded as an approximation to the exact and unique value $\gamma^{*}(v)$. This approach hints at a relativized notion of computability based on almost-everywhere (or at least local) continuity rather than everywhere continuity, which we intend to revisit in a sequel paper.
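The empirical test in Remark 6.5(2) can be sketched in a few lines of Python. Everything here is a hypothetical toy: $L=\mathbb{R}$, the layer transition $\gamma(v)=\tanh(3v)$ (stable fixed points $\pm c$ with $c\approx 0.995$ and an unstable fixed point $0$), so that $\gamma^{*}(v)=c\operatorname{sgn}(v)$ is continuous away from $0$ and discontinuous at $0$. The helper `continuity_probe` and its default radius and iteration ranges are illustrative choices, and a single run is one sample, not a statistical guarantee.

```python
import numpy as np


def gamma(v):
    """Hypothetical layer transition on L = R: stable fixed points +/-c (c ~ 0.995), unstable 0."""
    return np.tanh(3.0 * v)


def continuity_probe(gamma, v, radius=1e-3, k=20, n_min=200, n_max=400, seed=0):
    """Empirical test of continuity of the deep iterate gamma^* at v.

    Draws k random states v_i within `radius` of v and random iteration counts
    n_i in [n_min, n_max), computes w_i = gamma^{n_i}(v_i), and returns the
    spread max_i w_i - min_i w_i together with the array of w_i.
    """
    rng = np.random.default_rng(seed)
    starts = v + rng.uniform(-radius, radius, size=k)
    counts = rng.integers(n_min, n_max, size=k)
    w = np.empty(k)
    for i, (x, n_i) in enumerate(zip(starts, counts)):
        for _ in range(n_i):
            x = gamma(x)
        w[i] = x
    return w.max() - w.min(), w
```

At $v=0.5$ the computed $w_{i}$ agree to machine precision, and any of them approximates $\gamma^{*}(0.5)=c$; at $v=0$ the random starting points fall on both sides of $0$ and the spread is close to $2c$, signalling the discontinuity of $\gamma^{*}$ there.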

Appendix A Smooth Ultracomputations and Effectively Computable Equilibria in Deep Neural Networks

Extending the framework of the main body of the paper, one may introduce smooth (ultra)computations as those whose output features vary smoothly (i.e., differentiably) with the input features. Considerations of differentiability, particularly in infinite dimension, are very delicate and exceed the scope of this paper (after all, our notions of extendibility and definability only capture continuity properties). Since differentiability is an essential assumption in current approaches to effective/implicit computability of deep neural networks, this appendix gives a brief and informal outline of extensions of our framework beyond the present topological context that capture differentiability.

Throughout this appendix, we fix a realcompact CCS $\mathcal{C}$ whose layer states space $\underline{L}$ is a differentiable (smooth) manifold of finite dimension $n$, and whose predicates $P\in\mathcal{P}$ are all differentiable on $L$.

In particular, we assume that the embedding $L\subseteq\mathbb{R}^{\mathcal{P}}$ is as a closed subspace (in the product topology).

A.1. Deep equilibria of neural networks à la Bai-Kolter-Koltun

A.1.1. Unique Deep Equilibria

An empirical observation in the context of Neural Network Deep Equilibrium Models [BKK19] is that, in situations where a deep iterate $\gamma^{\mathcal{U}}=\operatorname{\mathcal{U}lim}_{n}\gamma^{n}$ of some computation $\gamma$ (assumed confined, for simplicity) exists, it is often independent of the ultrafilter $\mathcal{U}$. (Implicitly, both [CRBD18] and [BKK19] work in a setting where the states space $L=\mathfrak{L}$ is realcompact, so there is no distinction between transforms and transitions-in-type.) In such a case, all deep iterates $\gamma^{\mathcal{U}}$ are one and the same transition $\gamma^{*}:L\to L$, a “deep state” of the NN obtained by iteration of $\gamma$. Therefore, the sequence of iterates $\gamma^{n}$ converges pointwise to the t-t $\gamma^{*}$ as an ordinary limit (rather than only as an ultralimit). We say that such $\gamma$ has the Unique Deep Equilibrium (UDE) Property. Smoothness properties of $\gamma$ are required for important applications, as described below.
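A toy instance of the UDE Property can be sketched in Python. We assume (hypothetically) a weight-tied layer on states $v=(x,z)\in\mathbb{R}^{3}\times\mathbb{R}^{4}$ that carries the input $x$ along unchanged and updates $z\mapsto\tanh(Wz+Ux+b)$, with random weights and $W$ rescaled to spectral norm $0.5$; since $\tanh$ is $1$-Lipschitz, the $z$-update is then a contraction and the iterates $\gamma^{n}(v)$ converge to a unique limit $\gamma^{*}(v)$. The names `gamma` and `deep_state` and all dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d_x, d_z = 3, 4
# Hypothetical weight-tied layer on states v = (x, z): x is carried along unchanged and
# z is updated by z -> tanh(W z + U x + b).  Scaling W to spectral norm 0.5 makes the
# z-update a contraction, so gamma^n(v) converges to a unique limit gamma^*(v).
W = rng.standard_normal((d_z, d_z))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.standard_normal((d_z, d_x))
b = rng.standard_normal(d_z)


def gamma(v):
    x, z = v[:d_x], v[d_x:]
    return np.concatenate([x, np.tanh(W @ z + U @ x + b)])


def deep_state(v, tol=1e-12, max_iter=10_000):
    """Approximate gamma^*(v) = lim_n gamma^n(v) by direct iteration; also return the step count."""
    for n in range(1, max_iter + 1):
        v_next = gamma(v)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next, n
        v = v_next
    raise RuntimeError("iterates did not settle; the UDE property may fail for this gamma")
```

Starting from $(x,0)$ or $(x,\mathbf{1})$ leads to the same deep state, which is fixed by $\gamma$ and keeps the input $x$; thus $\gamma^{*}$ takes values in $\mathrm{Fix}(\gamma)$, in line with the observations of Bai et al. discussed next.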

A.1.2. Fixed-point algorithms as “Black Boxes”

Bai et al. note (empirically) that, in NNs obtained by iterating a common “weight-tied” layer transition $\gamma$, the deep state $\gamma^{*}$ takes any input state $v\in L$ to a state $v^{*}=\gamma^{*}(v)$ that is fixed by (the t-t implied by) the original $\gamma$, i.e., $\gamma(v^{*})=v^{*}$; in other words, $\gamma^{*}$ takes values in the set $\mathrm{Fix}(\gamma)=\{v\in L:\gamma(v)=v\}$, so $\gamma^{*}$ is a deep equilibrium (DEQ) in a very strong sense. Empirical findings also suggest that, given $\gamma\in\Gamma$, the DEQ state $\gamma^{*}:L\to\mathrm{Fix}(\gamma)\subseteq L$ may be well approximated by some generic “black-box” fixed-point algorithm $\mathtt{FindFix}$. Such an algorithm should take as inputs the transformation $\gamma$ and an initial state-in-type $v$, and return the fixed point $v^{*}=\gamma^{*}(v)=\mathtt{FindFix}(\gamma;v)$.

Like any algorithm based on floating-point arithmetic, what such an algorithm $\mathtt{FindFix}$ does in practice, given an acceptable error $\varepsilon>0$ and finitely many output features $Q_{1},\dots,Q_{k}\in\mathcal{P}$ specified in advance, is to return a suitable $k$ -tuple $\mathtt{FindFix}_{k}(\gamma;v;\varepsilon)=(r_{1},\dots,r_{k})$ of real numbers such that, for $1\leq i\leq k$ , $\left|Q_{i}(\gamma^{*}(v))-r_{i}\right|<\varepsilon$ . Under our current assumption that $L$ is smooth of dimension $n$ , all features $P\in\mathcal{P}$ of the output are (heuristically speaking, and perhaps only locally) implicitly defined in terms of some $n$ -many input features $Q_{i}$ , $1\leq i\leq n$ . Moreover, a generic such algorithm $\mathtt{FindFix}$ typically assumes that the given map $\gamma$ is not merely continuous but smooth (or at least sufficiently differentiable), and relies on gradient-based methods.

In principle, evaluating (or at least approximating) the map $v\mapsto\gamma^{*}(v)$ by means of a “black-box” $\mathtt{FindFix}$ results in computational cost comparable to, or even lower than, that of the direct method of computing successive iterates $\gamma^{n}(v)$ until a limit is (very nearly) reached. Memory savings in training DE networks (cf. Section A.1.4 below) are also key to their success. From a theoretical perspective, the innovation lies in effectively bringing deep networks (at least when obtainable as iterative deep equilibria) on a par with classical networks, thereby enriching the class of directly and efficiently computable functions.
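A minimal sketch of a $\mathtt{FindFix}_{k}$ in Python, assuming the same kind of hypothetical weight-tied layer $z\mapsto\tanh(Wz+Ux+b)$ as a toy example: a few direct iterations bring $z$ near the fixed point, after which Newton's method is applied to $g(z)=\tanh(Wz+Ux+b)-z$, whose Jacobian is $\operatorname{diag}(1-\tanh^{2})\,W-I$. Newton's method is one gradient-based choice among many (Bai et al. use quasi-Newton solvers); the names `find_fix_k` and `direct_fix`, the warm-start length and the tolerances are assumptions. The first $k$ coordinates of $z$ stand in for the output features $Q_{1},\dots,Q_{k}$, and the direct method is included as a baseline.

```python
import numpy as np

rng = np.random.default_rng(2)
d_x, d_z = 3, 4
# Hypothetical weight-tied layer z -> tanh(W z + U x + b); ||W||_2 = 0.5 makes it a contraction.
W = rng.standard_normal((d_z, d_z))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.standard_normal((d_z, d_x))
b = rng.standard_normal(d_z)


def layer(x, z):
    return np.tanh(W @ z + U @ x + b)


def find_fix_k(x, z0, k, eps=1e-10, warm_start=10, max_newton=50):
    """Hypothetical FindFix_k: direct warm-start iterations, then Newton on g(z) = layer(x, z) - z.

    Returns the first k coordinates of the fixed point (standing in for k output
    features) and the total number of iterations (direct steps plus Newton steps).
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(warm_start):                       # bring z close to the fixed point
        z = layer(x, z)
    for n in range(1, max_newton + 1):
        y = layer(x, z)
        J = (1.0 - y**2)[:, None] * W - np.eye(d_z)   # Jacobian of g: diag(tanh') W - I
        step = np.linalg.solve(J, -(y - z))           # Newton step solves J step = -g(z)
        z = z + step
        if np.max(np.abs(step)) < eps:
            return z[:k], warm_start + n
    raise RuntimeError("Newton iteration did not converge")


def direct_fix(x, z0, eps=1e-10, max_iter=100_000):
    """Baseline: iterate the layer until successive iterates differ by less than eps."""
    z = np.asarray(z0, dtype=float)
    for n in range(1, max_iter + 1):
        z_next = layer(x, z)
        if np.max(np.abs(z_next - z)) < eps:
            return z_next, n
        z = z_next
    raise RuntimeError("direct iteration did not converge")
```

Both routines return the same features up to the tolerance. The number of direct iterations grows as the contraction constant of the layer approaches $1$, whereas the Newton phase typically needs only a handful of steps, each of which costs a linear solve; this trade-off is one reading of the comparison made in the text.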

A.1.3. Parametrized Families of Computations

Fix a CCS $\underline{\mathcal{C}}=\langle\underline{L},\underline{\Gamma}\rangle$ with underlying CSS $\underline{L}=\langle L,\mathcal{P}\rangle$ as layer states space, as well as a second CSS $\underline{X}=\langle X,\mathcal{Q}\rangle$ , called the space of computation parameters, and a map $F:X\to\Gamma:x\mapsto F_{x}$ , which we regard as a parametrization of (some) computations by elements (parameters) $x\in X$ . We make the same assumptions about $\underline{X}$ as about $\underline{L}$ above (namely, $\underline{X}$ is a finite-dimensional differentiable manifold embedded as a closed subspace of $\mathbb{R}^{\mathcal{Q}}$ ). We call the structure $\underline{F}=\langle F,\underline{L},\underline{X},\underline{\Gamma}\rangle$ a Parametrized Family of Computations (PFC) (all of which are confined). It is appropriate to think of the parameter $x\in X$ as the “weights” of the computation $F_{x}$ .

We assume that $\Gamma$ has only confined transitions. It is quite natural to assume that $F$ is (i) continuous (as a map $X\to\mathfrak{T}$ ), and (ii) confined, i.e., restricts to maps $\mathfrak{X}[r_{\bullet}]\to\mathfrak{T}[s_{\bullet}^{[\cdot]}]$ .

A UDE hypothesis for $\underline{F}$ implies a map $X\to\mathfrak{T}:x\mapsto F^{*}_{x}$ which may also be regarded as a map

F^{*}\colon X\times L\to L,\qquad (x,v)\mapsto F^{*}(x;v).

A.1.4. Training deep networks

Training the deep neural network $F^{*}_{x}$ translates to finding weights $x$ such that $F^{*}_{x}$ satisfies a given condition, which we presently take to mean minimizing a specified real-valued loss function $\ell:\mathfrak{T}\to[0,\infty)$. (At least intuitively, if not necessarily literally, the value $\ell(\mathfrak{g})\geq 0$ captures how far a transition $\mathfrak{g}\in\mathfrak{T}$ is from an optimal/idealized $G\in\mathfrak{T}$.) Regarding $F^{*}_{x}$ for fixed $x$ as implicitly defined by either a fixed-point condition or an ODE, as above, the enormous memory cost of back-propagation through layers (not least because back-propagation would involve an unbounded number of ordinary layers to begin with!) is replaced by that of minimizing the function $\tilde{\ell}\coloneqq\ell\circ F^{*}:X\to[0,\infty)$. Note that $\tilde{\ell}$ is merely a new real-valued predicate on the parameters CSS $\underline{X}$. Assuming that $\tilde{\ell}$ is shard-continuous, it is definable, and hence depends de facto on only finitely many features $Q_{1},\dots,Q_{n}\in\mathcal{Q}$ of its input $x\in X$ (up to an arbitrarily small admissible error $\varepsilon>0$). Assuming $\tilde{\ell}$ is smooth as well, the deep network may be trained using standard/“black-box” gradient-based procedures to find a minimizer $x\in X$ for $\tilde{\ell}$. However, we note that it is essential for $\tilde{\ell}$ to be definable in order to allow even the possibility that some algorithm involving floating-point arithmetic and finitely many real quantities at a time succeeds in finding the minimizer.
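One common way to obtain gradients of $\tilde{\ell}=\ell\circ F^{*}$ without back-propagating through iterations is to differentiate the fixed-point condition itself (implicit function theorem). The following Python sketch is a hypothetical toy, not the exact procedure of [BKK19]: the parameter $x$ is a bias vector $b$, the deep state $z^{*}(b)$ is the fixed point of $z\mapsto\tanh(Wz+b)$ with $\|W\|_{2}=0.5$, and $\ell$ is a quadratic distance to a random target. Differentiating $z^{*}=\tanh(Wz^{*}+b)$ gives $(I-DW)\,dz^{*}=D\,db$ with $D=\operatorname{diag}(1-(z^{*})^{2})$, so the gradient needs only $z^{*}$ and one linear solve; it is checked against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
# Hypothetical deep-equilibrium layer z -> tanh(W z + b) with parameters ("weights") b;
# ||W||_2 = 0.5 makes it a contraction, so the deep state z*(b) is well defined.
W = rng.standard_normal((d, d))
W *= 0.5 / np.linalg.norm(W, 2)
target = rng.uniform(-0.5, 0.5, size=d)


def deep_state(b, tol=1e-13, max_iter=10_000):
    """z*(b): fixed point of z -> tanh(W z + b), by direct iteration from z = 0."""
    z = np.zeros(d)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + b)
        if np.max(np.abs(z_next - z)) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def loss(b):
    """Composite loss l~(b) = l(z*(b)) with a hypothetical quadratic l."""
    z = deep_state(b)
    return 0.5 * np.sum((z - target) ** 2)


def implicit_grad(b):
    """Gradient of loss at b via the implicit function theorem at the fixed point.

    (I - D W) dz* = D db with D = diag(1 - z*^2), hence
    grad = D (I - D W)^{-T} (z* - target): one linear solve, no stored iterates.
    """
    z = deep_state(b)
    D = 1.0 - z**2
    A = np.eye(d) - D[:, None] * W
    y = np.linalg.solve(A.T, z - target)
    return D * y
```

Memory use is independent of the number of fixed-point iterations, since only $z^{*}$ is kept; a gradient-based optimizer can then update $b$ using `implicit_grad`.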

A.2. Neural ODEs à la Chen-Rubanova-Bettencourt-Duvenaud

In another setting that is technically different from, but conceptually closely related to, the one in §A.1, Chen et al. [CRBD18] also model deep states of residual networks (“Neural ODEs”) using differential equation techniques. The intuition behind Neural ODEs is the following. Consider a layered computation with a sequence of atomic steps $\gamma_{\bullet}=(\gamma_{0},\gamma_{1},\dots)$, all of which are “residually” very small (in the sense that the input and output features of any atomic step $\gamma_{i}$ differ very little). Successive $n$-composites $\gamma_{\bullet}^{(n)}=\gamma_{n-1}\cdots\gamma_{1}\gamma_{0}$ then change very little with $n$. Now let $\gamma_{\bullet}$ vary in such a way that the atomic steps residually vanish (i.e., each $\gamma_{i}$ is vanishingly close to $\operatorname{id}$), and let $n$ grow without bound. When a limit exists, Chen et al. model it as a family $(\gamma^{(t)})_{t\geq 0}$ (indexed by a real “time” variable $t\geq 0$) of transitions $\gamma^{(t)}$, which we assume to be (confined) elements of $\mathfrak{T}$. (The real variable $t$ captures an appropriate asymptotic rescaling of the “discrete time” $n$.) In this manner, each value $t=t_{0}$ captures a specific notion of deep state (as an asymptotic limit of deep composites of residually small layered transitions), realized as a confined transition.

One may hope that such transitions $\gamma^{(t)}$ vary differentiably with $t$; this suggests modeling the entire family $(\gamma^{(t)})_{t\geq 0}$ of deep computational states by the differential equation it satisfies. (In this manner, for each fixed $t=t_{0}\geq 0$, one obtains a deep network $\gamma^{(t_{0})}$ in some sense.)

Thus, “Neural ODEs” arise from differential equations of the form

(A.1)

\dot{v}=\mathbf{s}(v;t),

where $\mathbf{s}:L\times[0,+\infty)\to\mathrm{T}L\subseteq\mathbb{R}^{\mathcal{P}}$ is a (time-dependent) section of the tangent bundle $\mathrm{T}L$ of the layer state space $L$ (i.e., $\mathbf{s}(v;t)\in\mathrm{T}_{v}L$ for all $v\in L$ and $t\geq 0$, where $\mathrm{T}_{v}L$ is the tangent space of $L$ at $v$). Interpreting the ODE (A.1) hinges on the smooth manifold structure assumed of the state space $L$. (The notions of differentiable structure and tangent space on an arbitrary layer space $L$ are neither well nor uniquely defined when $L$ is not finite-dimensional; their formalization would require much stronger assumptions on $L$, as well as the formalism of Banach spaces for tangent spaces $\mathrm{T}_{v}L$.) Chen et al. illustrate empirically the feasibility and effectiveness of modeling deep equilibria by Neural ODEs.

Let us denote the time-$t$ evolution under (A.1) by the (hopefully suggestive) notation $v\mapsto e^{t\mathbf{s}}(v)$; i.e., $e^{t\mathbf{s}}$ is the deep equilibrium of the Neural ODE (A.1) at time $t$, taking an initial state $v$ to the value at time $t$ of the solution of (A.1) starting at $v$ (this is “$\gamma^{(t)}$” in the earlier informal discussion). (When the section $\mathbf{s}=\mathbf{s}(v)$ depends only on the state $v$, not on the time $t$, the ODE (A.1) is autonomous. A time evolution $e^{t\mathbf{s}}$ of such an autonomous Neural ODE is analogous to a “parameter-tied” deep equilibrium after Bai et al.) Effective computation of $e^{t\mathbf{s}}$ relies on a generic “black-box” ODE solver algorithm $\mathtt{ODEsolve}$. Such an algorithm should take as inputs the section $\mathbf{s}$, an initial state-in-type $v$ and a time $t\geq 0$, and return the output $e^{t\mathbf{s}}(v)=\mathtt{ODEsolve}(\mathbf{s};v,t)$. (More realistically, such an $\mathtt{ODEsolve}$ would presumably return approximate values for any finitely many specified features of $e^{t\mathbf{s}}(v)$; refer to the discussion of $\mathtt{FindFix}_{k}$ above.)
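The residually small composites from the informal discussion are, in the simplest reading, explicit Euler steps for (A.1): with step $h=t/n$, the $i$-th layer is $\gamma_{i}(v)=v+h\,\mathbf{s}(v;ih)$, and for well-behaved $\mathbf{s}$ the composite $\gamma_{\bullet}^{(n)}(v)$ approaches $e^{t\mathbf{s}}(v)$ as $n\to\infty$. The Python sketch below checks this on the hypothetical one-dimensional field $\mathbf{s}(v;t)=-v$, whose time-$t$ evolution is $v\mapsto e^{-t}v$; the function names are illustrative, and a practical $\mathtt{ODEsolve}$ would use a higher-order adaptive method rather than Euler steps.

```python
import math


def s(v, t):
    """Hypothetical vector field on L = R: s(v; t) = -v (time-independent, so (A.1) is autonomous)."""
    return -v


def residual_composite(v, t, n):
    """Compose n residual layers gamma_i(v) = v + h * s(v; i h) with h = t / n (explicit Euler)."""
    h = t / n
    for i in range(n):
        v = v + h * s(v, i * h)
    return v


def exact_flow(v, t):
    """Time-t evolution e^{t s}(v) = e^{-t} v for this particular field."""
    return math.exp(-t) * v
```

Here $\gamma_{\bullet}^{(n)}(v)=(1-t/n)^{n}v\to e^{-t}v$, with error of order $1/n$, so each layer is residually small exactly when $n$ is large.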

Modeling deep computations by Neural ODEs and realizing them by means of an ODE solver effectively brings them on a computational par with classical neural networks. The key insight of Chen et al. (which predates the work of Bai et al.) is that such Neural ODEs may be trained using the “adjoint sensitivity” method of Pontryagin instead of (extremely memory-intensive) back-propagation through layers, which, at any rate, have been essentially abstracted away. The adjoint sensitivity method may be implemented using $\mathtt{ODEsolve}$ itself, so the training is both memory- and computation-efficient. Formalizing their method to train Neural ODEs in the spirit of §A.1.4 above requires parametrizing sections $\mathbf{s}$ by a second CSS $\underline{X}$; we omit the details.
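For a scalar toy problem the adjoint sensitivity method can be written out completely. Assume (hypothetically) $L=\mathbb{R}$, the parametrized field $\dot v=f(v;\theta)=-\theta v$, and the loss $\ell=\tfrac12 v(T)^{2}$. The adjoint $a(t)=\partial\ell/\partial v(t)$ satisfies $\dot a=-a\,\partial f/\partial v$, and $d\ell/d\theta=\int_{0}^{T}a\,\partial f/\partial\theta\,dt$. The Python sketch integrates the augmented state $(v,a,G)$ backward from $T$ with a fixed-step classical Runge-Kutta solver standing in for $\mathtt{ODEsolve}$, so no intermediate states are stored; the closed form $d\ell/d\theta=-Tv_{0}^{2}e^{-2\theta T}$ serves as a check. All function names are illustrative.

```python
import math


def aug_dynamics(y, theta):
    """Right-hand side for (v, a, G) with the hypothetical field f(v; theta) = -theta * v."""
    v, a, G = y
    df_dv = -theta
    df_dtheta = -v
    return (-theta * v,       # v' = f(v; theta)
            -a * df_dv,       # a' = -a * df/dv  (adjoint equation)
            -a * df_dtheta)   # G' = -a * df/dtheta with G(T) = 0, so G(0) = d(loss)/d(theta)


def rk4_step(y, h, theta):
    """One classical Runge-Kutta step of size h (h < 0 integrates backward in time)."""
    def shifted(k, c):
        return tuple(yi + c * ki for yi, ki in zip(y, k))
    k1 = aug_dynamics(y, theta)
    k2 = aug_dynamics(shifted(k1, h / 2), theta)
    k3 = aug_dynamics(shifted(k2, h / 2), theta)
    k4 = aug_dynamics(shifted(k3, h), theta)
    return tuple(yi + h / 6 * (p + 2 * q + 2 * r + w)
                 for yi, p, q, r, w in zip(y, k1, k2, k3, k4))


def loss_and_grad(v0, theta, T=1.0, steps=1000):
    h = T / steps
    y = (v0, 0.0, 0.0)               # forward pass: a and G stay 0; only v(T) is kept
    for _ in range(steps):
        y = rk4_step(y, h, theta)
    vT = y[0]
    loss = 0.5 * vT**2
    y = (vT, vT, 0.0)                # a(T) = d(loss)/dv(T) = v(T), G(T) = 0
    for _ in range(steps):           # backward pass: recompute v alongside a and G
        y = rk4_step(y, -h, theta)
    return loss, y[2]


def exact_loss_and_grad(v0, theta, T=1.0):
    """Closed form for this toy problem."""
    e = math.exp(-2 * theta * T)
    return 0.5 * v0**2 * e, -T * v0**2 * e
```

The backward pass recomputes $v(t)$ from $v(T)$ instead of storing the forward trajectory, which is the source of the memory savings mentioned above; for stiff or chaotic fields this reconstruction can be inaccurate, a known caveat of the method.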

References

[APL⁺22] Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, J Zico Kolter, and Roger B Grosse. Path independent equilibrium models can better exploit test-time computation. Advances in Neural Information Processing Systems, 35:7796–7809, 2022.
[BKK19] Shaojie Bai, J Zico Kolter, and Vladlen Koltun. Deep equilibrium models. Advances in neural information processing systems, 32, 2019.
[BKK20] Shaojie Bai, Vladlen Koltun, and J. Zico Kolter. Multiscale deep equilibrium models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 5238–5250. Curran Associates, Inc., 2020.
[CK66] Chen-chung Chang and H. Jerome Keisler. Continuous model theory. Annals of Mathematics Studies, No. 58. Princeton Univ. Press, Princeton, N.J., 1966.
[CRBD18] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in neural information processing systems, 31, 2018.
[Eng89] Ryszard Engelking. General topology, volume 6 of Sigma Series in Pure Mathematics. Heldermann Verlag, Berlin, second edition, 1989. Translated from the Polish by the author.
[Fur81] H. Furstenberg. Recurrence in ergodic theory and combinatorial number theory. Princeton University Press, Princeton, NJ, 1981. M. B. Porter Lectures.
[Gro52] A. Grothendieck. Critères de compacité dans les espaces fonctionnels généraux. Amer. J. Math., 74:168–186, 1952.
[HL⁺19] Jiequn Han, Qianxiao Li, et al. A mean-field optimal control formulation of deep learning. Research in the Mathematical Sciences, 6(1):1–41, 2019.
[HLA⁺21] Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus, and Radu Grosu. Liquid time-constant networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 7657–7666, 2021.
[HLA⁺22] Ramin Hasani, Mathias Lechner, Alexander Amini, Lucas Liebenwein, Aaron Ray, Max Tschaikowski, Gerald Teschl, and Daniela Rus. Closed-form continuous-time neural networks. Nature Machine Intelligence, 4(11):992–1003, November 2022.
[HS10] Neil Hindman and Dona Strauss. Algebra in the space of ultrafilters and Ramsey theory. In Ultrafilters across mathematics, volume 530 of Contemp. Math., pages 121–145. Amer. Math. Soc., Providence, RI, 2010.
[Kei23] H. Jerome Keisler. Model theory for real-valued structures. In José Iovino, editor, Beyond First Order Model Theory, Volume II. CRC Press, Boca Raton, FL, 2023.
[KK87] Heinz König and Norbert Kuhn. Angelic spaces and the double limit relation. J. London Math. Soc. (2), 35(3):454–470, 1987.
[KM81] Jean-Louis Krivine and Bernard Maurey. Espaces de Banach stables. Israel J. Math., 39(4):273–295, 1981.
[Kri76] J.-L. Krivine. Sous-espaces de dimension finie des espaces de Banach réticulés. Ann. of Math. (2), 104(1):1–29, 1976.
[LJ23] Tianyi Lin and Michael I. Jordan. Monotone inclusions, acceleration, and closed-loop control. Mathematics of Operations Research, 48(4):2353–2382, 2023.
[SDJS22] Bin Shi, Simon S. Du, Michael I. Jordan, and Weijie J. Su. Understanding the acceleration phenomenon via high-resolution differential equations. Mathematical Programming, pages 1–70, 2022.
[Wei75] Maurice D. Weir. Hewitt-Nachbin spaces, volume 17 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam-Oxford; American Elsevier Publishing Co., Inc., New York, 1975. Notas de Matemática, No. 57. [Mathematical Notes].

Approximability of Deep Computations

Abstract.

Key words and phrases:

2000 Mathematics Subject Classification:

1. Introduction

2. Computations and ultracomputations with countably many features

2.1. Definitions

Proposition 2.1.

Proof.

Theorem 2.2.

Proof.

2.2. Deep computations and deep equilibria

2.2.1. Deep computations and ultracomputations

2.2.2. Deep iterates and deep equilibria

Deep iterates

Proposition 2.3 (Cf., Propositions 4.5, 4.6, and 4.7).

Deep equilibria

Theorem 2.4 (Cf., Theorem 5.3).

2.3. Definability Criteria

Theorem 2.5 (Cf., Theorem 6.4).

3. Structures for Real-Valued Computations

3.1. Computations, states, observable features and predicates: A meteorological allegory

3.2. Computation States Structures

3.2.1. Types of states

3.2.2. Topology on the layer state space

Remark 3.1.

3.3. Tychonoff and Realcompact spaces

3.3.1. Tychonoff spaces

Remark 3.2.

3.3.2. Realcompact spaces

3.3.3. Realcompactness of type spaces

Remark 3.3.

3.3.4. Realcompact CSSs

3.4. Compositional Computation Structures

Remarks 3.4.

3.4.1. Reduction and Continuity Axioms

Remark 3.5.

3.5. Examples of CSSs and CCSs

3.5.1. The unit interval

3.5.2. $\mathbb{R}^{d}$

3.5.3. $\mathbb{R}^{\omega}$ and $c_{00}$

Remark 3.6.

3.5.4. $\ell_{q}$

Remark 3.7.

4. Deep Computations

4.1. Shards in state- and type-spaces

4.1.1. Sizers and shards in type spaces

4.1.2. Shards

Proposition 4.1.

Proof.

Remarks 4.2.

4.2. Transitions-in-type. Extendibility.

4.2.1. Extendable layer transitions

Remark 4.3.

4.2.2. Spaces of transitions-in-type

Remarks 4.4.

Proposition 4.5.

Proof.

4.3. Computations and ultracomputations (deep computations)

4.3.1. The Extendibility Axiom

4.3.2. Realized vs. deep computations

Proposition 4.6.

Proof.

Proposition 4.7.

Proof.

5. Deep Iterations and Deep Equilibria

5.1. Layered and iterative computations

Layered Computations (LCs)

Iterative computations (ICs)

5.2. Deep layers, deep iterates, and equilibria

5.2.1. Deep layers

5.2.2. Deep iterates

Remark 5.1.

5.2.3. Deep equilibria

Remark 5.2.

Theorem 5.3 (Existence of Deep Iterative Equilibria).

Proof.

5.3. Examples and discussion of deep iterates and deep equilibria

Example 5.4.

Example 5.5.