Approximability of Deep Computations
Abstract.
We introduce a structural framework for computations involving floating-point operations. Informed by real-valued logic, we introduce deep computations (ultracomputations) and deep iterates, formalizing the ideas of “asymptotic limit” of computations and compositional iterates, respectively.
As an application of this framework, we prove the existence of deep equilibria, which hitherto have been found only empirically (yielding remarkable memory savings in deep learning). Our proof of existence of deep equilibria is based on the concept of idempotent ultrafilter from combinatorics and inspired by the notion of indiscernibility from model theory.
We study and characterize deep computations (and hence deep equilibria) that are bona fide computable, i.e., uniformly approximable by a priori given computable primitive real-valued functions. Informed by model theory of real-valued structures, as well as -theory from topology, we use a classical result of Grothendieck to characterize computability of deep computations in terms of continuous extendibility.
Our framework does not impose a priori uniform/global bounds on real-valued quantities; therefore, our structures yield non-compact type spaces. Such type spaces require a more nuanced topological treatment than compact ones arising in model theory of -valued structures.
Key words and phrases:
Deep computations, ultracomputations, deep equilibrium models, idempotent ultrafilters
2000 Mathematics Subject Classification:
68T27, 68T07, 03C98, 05D10, 54D80
1. Introduction
In this paper, we introduce a general notion of computation which, we contend, captures the essence of digital computations involving floating-point arithmetic. As computing power expands, so does the need for foundational frameworks to understand systems and applications, particularly their asymptotic properties as scale and scope grow potentially indefinitely; such frameworks are needed across multiple areas of computational science and engineering. A prominent example is that of neural networks, which pass computations through an increasing number of compositional layers. Deep learning systems are typically based on networks with a large number of layers (i.e., increasingly deep networks), which are correspondingly expensive to compute; it is of enormous importance to approximate the output of such deep networks more efficiently, or even simply to understand whether such approximation is possible in principle.
Notable recent frameworks working to leverage asymptotic properties of deep networks include:
• Neural Ordinary Differential Equations (Neural ODEs) [CRBD18], [HLA+21], which introduce ODEs as mechanisms to capture residual deep networks asymptotically (as the number of layers goes to infinity), and use numerical or analytical ([HLA+22]) ODE solutions as “shortcuts” to approximate the output of such deep networks without the need for multifold iterations; and
• Deep Equilibrium Networks (DEQs) [BKK19], [BKK20], [APL+22], which model deep networks arising by iterating the same (“parameter-tied”) layer transition indefinitely (in cases when such a network reaches an asymptotic equilibrium), then use numerical fixed-point solvers to shortcut the deep computation implied.
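The equilibrium idea behind the second item can be illustrated with a minimal sketch. It assumes a hypothetical parameter-tied layer of the form z -> tanh(W z + U x + b) with small made-up weights, and it uses plain fixed-point iteration as a stand-in for the numerical solvers mentioned above; none of the names or values below come from the cited works.

```python
import math

def layer(z, x, W, U, b):
    """One parameter-tied layer: z -> tanh(W z + U x + b), computed entrywise."""
    n = len(z)
    return [
        math.tanh(sum(W[i][j] * z[j] for j in range(n))
                  + sum(U[i][k] * x[k] for k in range(len(x)))
                  + b[i])
        for i in range(n)
    ]

def solve_equilibrium(x, W, U, b, tol=1e-10, max_iter=10_000):
    """Iterate the tied layer from z = 0 until successive states differ by < tol."""
    z = [0.0] * len(b)
    for k in range(max_iter):
        z_next = layer(z, x, W, U, b)
        if max(abs(a - c) for a, c in zip(z_next, z)) < tol:
            return z_next, k + 1
        z = z_next
    raise RuntimeError("no equilibrium reached within max_iter iterations")

# Hypothetical weights: absolute row sums of W are below 1, so the layer is a
# contraction in the sup norm and the iteration converges.
W = [[0.2, -0.1], [0.05, 0.3]]
U = [[1.0], [0.5]]
b = [0.0, -0.2]
z_star, steps = solve_equilibrium([0.7], W, U, b)
```

Here `z_star` approximates a state left (nearly) unchanged by one more application of the layer, which is the sense of "equilibrium" used informally above.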
Related problems have also arisen in numerical optimization, where discrete iterative optimization algorithms—whose asymptotic properties as the step size goes to zero are of perennial interest and importance—are now modeled asymptotically using dynamical systems [SDJS22], and in control theory, where properties of parameter-dependent asymptotic computations are of central importance in key applications [CRBD18, HL+19, LJ23].
Such approaches to asymptotic (‘deep’) computations are fundamentally asking whether a complex composition of function applications may be realized through a smaller computation. From the perspective of this paper: “Can the result of some large (conceptually infinite) sequence of function compositions be approximated, effectively and finitarily, from accepted computational primitives (‘atomic predicates’) and standard floating-point operations?” Without precise context, the questions of whether the asymptotic limit of a computation exists, and when it can be feasibly approximated are ill-defined; in this paper, we propose a framework and basic tools that we hope will be useful in addressing such questions, among others, concerning the notion and nature of deep computations.
The remainder of this introductory section provides an informal overview of our approach and main results.
Our computations are transformations accepting as input a state and returning an output state . We posit that states be uniquely characterized in terms of a collection of real-valued quantities (as varies over a fixed collection of “primitive predicates”). We call each real value a feature of ; it is appropriate to think of such feature as the “-th coordinate” of ; indeed, a particular instance of our framework is when states are vectors and there is one predicate for each of the coordinates () of . Each predicate is “atomic” in the sense that it captures a primitive feature of states , i.e., features of any given state are regarded as computable ab initio. De facto, a computation effectively maps the collection of predicate values of an arbitrary input state to another such collection uniquely characterizing the output state .
Understood in such generality, transformations can hardly be called computations in any reasonable sense. Indeed, the identification of a state with the collection of real values of its primitive predicates in any sensible paradigm for floating-point computations transforming states implies that each real-valued feature of the output state ought to depend continuously on features of the input state; such continuity assumption is implied by the tenet that floating-point computations are intrinsically approximate, never quite exact. In general, one cannot expect finite-precision calculations to be able to approximate real-valued features (of the output) that vary discontinuously with respect to the real-valued quantities that one uses to encode the input state!
For simplicity, we assume the inputs and outputs of computations belong to the same state space which, when endowed with the collection of real-valued predicates , becomes a Computation States Structure (CSS) .
Our results are deeper (and, hopefully, most illuminating) when the CSS is endowed with an infinite predicate collection . Because of the importance of such special case, and for ease of exposition, throughout the remainder of this introductory section, we assume:
is countable.
The case of countable is quite relevant in applications for reasons we now explain. Let the predicate collection be . Every state is characterized (at least from the purely structural perspective we adopt) by the real sequence of its predicate values, called the type of ; thus, is effectively identified with the set of such types, i.e., with a subset of the set of all real sequences ; one may think of as the “-th entry” of a state . We shall require that computations have “features” (i.e., entries) varying continuously with (the type of) for each ; explicitly, every such computation feature is required to vary continuously with respect to any feature of .
State types effectively encode, e.g., intermediate stages of neural networks. (For this reason, we also refer to as the “layer state space”; here, the identification of a state with its type is implicit.) A neural network of depth may be regarded as a computation composing -many single-layer transitions that are computations of a very specific kind. Roughly speaking, once one fixes a (suitable) nonlinear activation function once and for all, each feature of each transition is obtained by applying to some linear combination of input features (the chosen coefficients in forming such linear combinations are the parameters of the transition ).
[Footnote 1: Neural networks whose layer transitions depend not only on the current state but possibly on earlier ones may be formalized as multi-argument transformations. An appropriate setting is that of -ary CCSs, which are informally introduced in Remark 3.4(2). A different view of layer transitions that allows them to depend on additional parameters is the notion of Parametrized Family of Computations discussed in Appendix A.1.4. For simplicity, computations generally represent (single-argument) maps throughout this paper.]
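As a rough illustration of the preceding description, the sketch below builds single-layer transitions from weight matrices and composes them into a network of depth 3. The activation (tanh), the weights, and the helper names `make_transition` and `compose` are assumptions made for the example only.

```python
import math
from functools import reduce

def make_transition(weights, activation=math.tanh):
    """Return a single-layer transition: output feature i is
    activation(sum_j weights[i][j] * input[j])."""
    def transition(state):
        return [activation(sum(w * s for w, s in zip(row, state))) for row in weights]
    return transition

def compose(transitions):
    """Compose transitions left to right: the first in the list is applied first."""
    return lambda state: reduce(lambda s, t: t(s), transitions, state)

# Hypothetical parameters for a depth-3 network acting on 2 features.
layers = [make_transition(Wk) for Wk in (
    [[0.5, -0.3], [0.1, 0.8]],
    [[1.0, 0.0], [0.2, -0.4]],
    [[-0.6, 0.9], [0.3, 0.3]],
)]
network = compose(layers)
output = network([1.0, -2.0])
```

Each entry of `output` depends continuously on the input features, since it is built from linear combinations and a continuous activation.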
At any given stage (layer), only finitely many real-valued features of the computation are meaningful in an effective sense; however, the study of arbitrarily deep networks, and of deep layers/deep equilibria of such networks requires keeping track of numerous features the number of which is not necessarily bounded beforehand. The natural setting to treat a finite number of features possibly growing without bound is with a countable predicate collection .
We formalize the deep layers and equilibria mentioned above as deep computations (or ultracomputations), which capture a precise notion of asymptotic limit of computations. Such deep computations are obtained as pointwise limits of sequences of given computations, where “pointwise” means for each individual feature. (More generally, deep computations may arise as arbitrary pointwise ultralimits of computations.) Deep computations are not necessarily realizable as layer transition maps , but typically only as “transforms” into the space of state types (namely is the topological closure of ).
A tenet of our approach is that a transform is to be considered “effectively computable” if, for each , the output feature is a definable predicate in the following sense:
Given any fixed (otherwise arbitrary), the output feature is -approximated by a continuous function of finitely many input features , on any region wherein every input feature remains bounded in magnitude.
(The approximating function above is allowed to depend on and .)
In other words, as long as input features remain bounded and one is willing to accept an error of magnitude not exceeding , output features are continuous functions of finitely many input features.
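A toy instance may help fix ideas. Assume (purely for illustration) the output feature F(x) = sum over n >= 0 of x_n / 2^(n+1), and a region on which every input feature satisfies |x_n| <= bound. The tail of the series beyond the first N features is at most bound * 2^(-N), so finitely many input features suffice for any prescribed error, and the number needed depends on both the error and the bound.

```python
import math

def features_needed(bound, eps):
    """Smallest N with bound * 2**(-N) <= eps: the tail of the series beyond
    the first N features is then at most eps on the region |x_n| <= bound."""
    return max(0, math.ceil(math.log2(bound / eps)))

def approx_feature(state, bound, eps):
    """eps-approximation of F(x) = sum_{n>=0} x_n / 2**(n+1) using finitely
    many input features; `state` maps an index n to the real feature x_n."""
    n_terms = features_needed(bound, eps)
    return sum(state(n) / 2 ** (n + 1) for n in range(n_terms))

# Hypothetical state with features x_n = 3 * cos(n), bounded in magnitude by 3.
state = lambda n: 3.0 * math.cos(n)
coarse = approx_feature(state, bound=3.0, eps=1e-2)
fine = approx_feature(state, bound=3.0, eps=1e-9)
```

The uniform bound used here is a special case: in general each input feature may carry its own bound.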
Under a suitable Extendibility Hypothesis (as formulated in §4.2) on computations, we introduce the notion of ultracomputation not merely as a transform , but rather as a map , called a “transition-in-type (t-t)”; such a t-t is considered effectively computable if each feature is definable. The setting of transitions-in-type implies a shift in perspective that is essential to the study of deep layers in our setting.
[Footnote 2: In the special case when is closed in , transitions-in-type are the same as transforms, namely maps . However (particularly when the predicate collection is infinite) such situation is rather special (it amounts to a “saturation” property of the CSS ), and we do not assume it a priori.]
In a computational paradigm based on floating-point arithmetic, definable transforms (or t-ts) are as effectively computable as one could hope for: after all, algorithmic implementations of such arithmetic impose a priori bounds on inputs; the specific algorithm depends on such bounds in a manner paralleling the dependence of approximating on the bounds and the admissible error magnitude . In this paper, we are not concerned with explicit algorithmic implementations of definable ultracomputations; our results pertain to effective computability understood as the ability to carry out such computations in principle, i.e., the existence of an algorithm (which we otherwise do not provide).
(We stress that such a notion of effective computability is relative: The distinguished predicates are considered computable a priori —i.e., are computational primitives.)
We show (Theorem 6.2.3) that:
(1) The approximations to a definable may be taken to be polynomials of the input features (i.e., a definable is polynomially definable);
(2) is definable iff it extends to a function ; and
(3) Definable transforms are precisely those that extend to continuous t-ts (this is the property of extendibility mentioned above).
Extendable are continuous on , but extendibility is a strictly stronger property whenever .
[Footnote 3: General criteria for a transform to take values in (or, similarly, for a t-t to restrict to a transition ) are delicate; their study exceeds the scope of the present manuscript.]
Under suitable assumptions, we prove the existence of deep iterates and equilibria (understood as transitions-in-type); see Proposition 4.7 and Theorem 5.3: deep layers and deep equilibria-in-type of neural networks exist under such assumptions.
Our results in §6 characterize definability of ultracomputations. A particular case of Theorem 6.4 is as follows.
Assume that is countable and each predicate is bounded on . Fix a set of extendable computations . Every ultracomputation of is definable if and only if, for every predicate and any sequences and , the following Limit Exchange identity:
holds whenever the iterated limits on the left- and right-hand side both exist. Moreover, in such case, each ultracomputation of is the (pointwise) limit of some sequence . The limit is attained uniformly over any set of states that is feature-wise bounded (i.e., is included in a “shard” in the sense of §4.1.2).
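For orientation, the Limit Exchange identity is a double-limit condition of Grothendieck type. In notation assumed here for illustration only (a predicate $P$, a sequence of computations $(\gamma_n)$ from the fixed set, and a sequence of states $(x_m)$; these symbols are placeholders and need not match the notation used elsewhere in the paper), such a condition reads:

```latex
\lim_{m\to\infty}\,\lim_{n\to\infty} P\bigl(\gamma_n(x_m)\bigr)
  \;=\;
\lim_{n\to\infty}\,\lim_{m\to\infty} P\bigl(\gamma_n(x_m)\bigr).
```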
Going farther, we define smooth (ultra)computations as those having output features varying smoothly (i.e., differentiably) with the input features . Although the study of smoothness properties of definable ultracomputations is beyond the scope of the current paper, such smoothness is implicit in applications such as training of Neural ODEs [CRBD18] and equilibrium analysis of DEQs [BKK19]. In Appendix A, we informally introduce smooth transitions-in-type. In sections §A.1.2 and §A.1.4, we outline the connections of our results, respectively, with effective computability of equilibria of DEQs, and with the training of Neural ODEs (in practice done using optimal control, for instance).
The paper is organized as follows. Section §2 is a self-contained abridged summary, for countable only, of the more general results in Sections §§4–6, which apply to more general structures. Section §3.4 introduces the general notions (and examples) of Computation States Structure (CSS) and Compositional Computation Structure (CCS) (the latter being essentially a CSS expanded with a semigroup of extendable computations ). We also introduce the topological spaces of types—both of states and of transitions. In Section §5, we prove (under suitable hypotheses) the existence of deep computations and of deep equilibria (the main results of §§4–5 are Proposition 4.7 and Theorem 5.3). In Section §6 we prove the aforementioned characterization of definable ultracomputations (the main result being Theorem 6.4).
We are grateful to Frank Tall for his constant guidance through the world of -theory.
Readers with experience in model theory will realize that the ideas presented here are strongly influenced by the work of C. C. Chang and H. J. Keisler on continuous model theory and model theory of real-valued structures [CK66, Kei23] as well as the work of J.-L. Krivine in Banach space theory [Kri76, KM81]. We owe a great debt of gratitude to these giants for allowing us to stand on their shoulders.
2. Computations and ultracomputations with countably many features
This section expands on the outline in Section §1, summarizing the results of subsequent sections §3.4–§6 in the special setting of computations (and ultracomputations obtained therefrom) involving states characterized by (at most) countably many real-valued “observable features” ; readers interested in the general framework (when states are possibly characterized by uncountably many features) should skip forward to §3.4. Proofs in this section are omitted if they are presented in subsequent sections.
2.1. Definitions
Fix a set of countably many distinct distinguished predicate symbols . Effectively, and are interchangeable: one may think of the number as a label for the symbol , but also may be regarded as a purely syntactic label for the number —the usefulness of the syntax is its later use to denote a bona fide function (the symbol “” is an extremely poor choice of name for a function!). Let () be the space of all functions , each regarded as a real tuple ; the space is endowed with the product topology, i.e., the topology of entry-wise convergence of such tuples. Each names a coordinate (projection) map . The real quantity is called the -th feature of .
Fix an arbitrary nonempty subset , which we shall call the state space; its elements are called states. (We may also call these the layer state space and layer states to capture the neural-network intuition explained in the introduction.) The (real-valued) predicate on with symbol is the map obtained by restricting to ; the -feature of is . The pair is called a computation states structure (CSS); it will henceforth be denoted simply by an abuse of notation whereby we identify each symbol with the corresponding predicate . (Such abuse of notation will be quite frequent throughout.)
The topological closure is called the space of (layer) state types of ; its elements are called state types. Elements are called realized states to distinguish them from state types , called unrealized (when such exist).
Each symbol still gives a continuous predicate (real-valued function) by restriction of the projection ; it is the unique extension of to , and will still be denoted (or even just ) by an abuse of notation.
A sizer is a family of nonnegative reals indexed by predicates . Such a sizer names a compact subset . Given a sizer , the -shard of (resp., of ) is (resp., the closure ). All type-shards are compact (being closed in ). Clearly, (equality need not hold).
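The notions of sizer and shard can be sketched as follows, under the assumption that states and sizers are both represented as functions from an index n to a real number. Only a finite prefix of features can be inspected by a program, so the check below is a necessary condition for shard membership rather than a complete test; all names are hypothetical.

```python
def in_shard(state, sizer, n_features):
    """Check |x_n| <= r_n on the first n_features coordinates.

    `state` and `sizer` map an index n to x_n and r_n respectively; only a
    finite prefix can be inspected, so this is a necessary condition only."""
    return all(abs(state(n)) <= sizer(n) for n in range(n_features))

def own_sizer(state):
    """The sizer r_n = |x_n|: every state lies in the shard it names."""
    return lambda n: abs(state(n))

state = lambda n: (-1) ** n * (n + 1)          # hypothetical unbounded features
constant_sizer = lambda n: 10.0                # r_n = 10 for every n
growing_sizer = lambda n: 2.0 * (n + 1)        # r_n = 2(n + 1)
```

The example state leaves every shard named by a constant sizer once enough features are inspected, yet it lies in the shard named by `growing_sizer`: sizers bound each feature separately and need not be uniformly bounded.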
Proposition 2.1.
Let be a CSS with countable predicate collection .
(1) the space of state types is metrizable;
(2) every state type is the limit of a sequence of realized states;
(3) every type is shard-supported in the sense that for some sizer ; thus, (where varies over all sizers);
(4) a real function on is continuous if its restrictions to arbitrary type-shards are continuous.
[Footnote 4: We thank F. Tall for pointing out that is a -space for countable. Indeed, property (4) is a strengthening of the property of inasmuch as type-shards are compact (however, an arbitrary compact need not be included in any type-shard).]
The proof uses metrizability in an essential way, hinting at the technical difficulties arising (from Sections §3.4 on) when is possibly uncountable.
Proof.
One sees that any compact is included in some type-shard —itself compact. The real line is topologized by the bounded metric , where . Since is countable, the space is metrizable, say by ; therefore, its subspace is metrizable, proving (1). By density of in and (1), every type is the limit of a sequence ; hence, () is compact. The image is bounded for each , hence for some , and evidently (where ). Assertions (2) and (3) follow.
The compactness argument above is adapted to show that any convergent sequence is included in some type-shard. Indeed, for each , some sequence satisfies ; without loss of generality (upon replacing by a sufficiently deep tail thereof if necessary), we may impose the following accelerated convergence requirement: as (the sequences converge to their limits “increasingly faster” as grows). Let . Then, the set is compact: Given an open cover of , we have for some . By accelerated convergence, for all sufficiently large , say, for , we have . For each there is also with , hence for all but finitely many ; therefore, covers all but finitely many points of , hence has a finite subcover, so is compact. Since each image is (compact, hence) bounded, we deduce that for some sizer (as before).
Let now be discontinuous, say at ; then, is the limit of some sequence in some type-shard (by the preceding paragraph), but such that . We have (shards being closed in ), so the restriction of to is discontinuous, proving (4). ∎
A (syntactic) formula is a purely formal real polynomial in predicate symbols (treated as pairwise commuting indeterminates). Since each names a map , such a formula itself names a polynomial function (or just polynomial) called the interpretation of which, in practice, we shall identify with the syntactic formula . (Different formulas may yield the same polynomial function, but this is not an issue in practice.) By restriction of its interpretation on , a formula also gives polynomials on and on ; moreover, by density of in , polynomials on and are in natural bijection, so we shall not distinguish between them.
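The distinction between a formula and its interpretation can be made concrete with a small sketch. The encoding is an assumption chosen for the example: a formula is a dictionary from monomials to real coefficients, a monomial is a tuple of (predicate index, exponent) pairs, and a state is a function from a predicate index to the corresponding feature.

```python
from math import prod

# A formula is a dict: monomial -> real coefficient, where a monomial is a
# tuple of (predicate index, exponent) pairs sorted by index.
def interpret(formula, state):
    """Evaluate the polynomial named by `formula` at a state, where `state`
    maps a predicate index n to the feature x_n."""
    return sum(
        coeff * prod(state(n) ** e for n, e in monomial)
        for monomial, coeff in formula.items()
    )

# Hypothetical formula 2*P0*P1**2 - 3*P2 + 1 (the empty monomial is the constant).
formula = {((0, 1), (1, 2)): 2.0, ((2, 1),): -3.0, (): 1.0}
state = lambda n: [1.5, -2.0, 0.5][n]
value = interpret(formula, state)
```

The dictionary is purely syntactic data; `interpret` is what turns it into a real-valued function of states, and only finitely many features enter any one formula.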
A definable predicate is any real map whose restriction to an arbitrary shard is uniformly approximable by polynomials. Using the same definition of definable predicate on the type space (i.e., the restriction of to an arbitrary compact type-shard is uniformly approximable by polynomials), we see that definable predicates on and are also identified. (By Proposition 2.1(4), a definable on extends continuously to each type-shard, and therefore to a continuous .)
A map (resp., , ) is called a transition (resp., a transform, a transition-in-type (t-t)). By the inclusion , every transition is a transform. A transform is extendable if it extends to a continuous t-t on . A transform or t-t is definable if each of its features is definable.
For sizers , a transform (resp., a t-t) is called -confined on if it restricts to a map (resp., ). A set of transforms or t-ts is -confined by if each of its members is.
Any collection of sizers (itself indexed by sizers) is called a confiner. A transform or t-t is -confined if it is -confined on for all sizers . Let be the sets of all confined transforms and t-ts, respectively; the set of -confined transforms (resp., t-ts) is denoted (resp., ). Any subcollection of or for some confiner is called uniformly confined (or confined by ).
A collection of sizers is called exhaustive if . A transform or t-t (resp., a set of transforms or t-ts) is called -confined if it is (resp., if all its members are) -confined on , for all . The set of -confined t-ts is denoted . (By exhaustiveness, -confined transforms and t-ts are confined in the above sense.)
Any continuous t-t maps each shard into some (compact subset of some) shard , so such is necessarily -confined for some .
Theorem 2.2.
A transform is extendable iff it is definable, in which case it is necessarily confined.
Proof.
Let be extended by a continuous . For fixed , the feature is continuous on the compactum , hence uniformly approximable thereon by polynomials in predicates , by the Stone-Weierstrass Theorem (such predicates are continuous and separate points of ); thus, is definable, and so is a fortiori.
Conversely, if is definable, for fixed , each restriction of its -feature to an arbitrary state-shard is a uniform limit of polynomials in predicates. Each such is (the restriction to of) a polynomial on the compact type-shard . Some sequence of such polynomials converges uniformly on to a real on extending continuously. Since is continuous on the compactum , it is bounded in magnitude thereon, say by . Letting vary, we obtain an -confined map . Clearly, some (unique) -confined extends all such ; such is continuous since each entry is, by Proposition 2.1(4). ∎
Theorem 2.2 formalizes the (perhaps surprising) fact that non-extendable transitions are not obtainable from explicit constructions involving the predicates . We remind the reader that the topology on is the coarsest one for which all predicates are continuous. However, even a continuous transition , if non-extendable, is “uncomputable” in the sense that its coordinates (“features”) cannot be well approximated by continuous functions (e.g., polynomials) of predicates . Any sense of approximation cannot be uniform; in fact, it cannot even be uniform on arbitrary shards .
For that reason, extendibility is a critical hypothesis in our main results.
2.2. Deep computations and deep equilibria
Let be a confiner. Recall that , are the sets of all transforms and t-ts, respectively, that are -confined (see §2.1 for the definitions).
An extendable transition will be called a computation.
A compositional computation structure (CCS) with countably many predicates
consists of:
• a CSS whose predicate collection is countable;
• a semigroup , whose elements are called computations of ;
• a continuous semigroup action of on .
Each computation gives a transition
Under this identification, is a semigroup (under composition) of maps .
The CCS above will often be denoted simply without explicitly naming the evaluation action which, however, is always an implicit operation of .
[Footnote 5: At any rate, the “functional application notation” for makes it essentially redundant to have a name for the action.]
CCSs are required to satisfy the following axiom.
Extendibility Axiom. The transition of any computation is extendable.
[Footnote 6: When is uncountable, the Extendibility Axiom takes a different form: see §4.3.1.]
By extendibility, any computation is necessarily confined, so it may be regarded as a (confined) element .
2.2.1. Deep computations and ultracomputations
A deep computation (DC) of a set of computations is any confined transform that is an accumulation point of (the set of transitions of) computations in , in the topology of pointwise convergence. Equivalently, a DC is any pointwise ultralimit of any family for any ultrafilter on the (otherwise arbitrary) index set , as long as each pointwise limit exists and the resulting map is confined.
An ultracomputation (ucomp) of is any (confined) accumulation point in of the set of transitions-in-type extending computations .
In general,
• a DC need not be a map (let alone need a ucomp restrict to such a map);
• a DC need not have a unique extension to a ucomp.
[Footnote 7: Clearly, any DC admits some extension to a ucomp, but such extension need not be continuous, nor, for that matter, be constructible in any explicit sense.]
2.2.2. Deep iterates and deep equilibria
Deep iterates
The (topological product) space of all t-ts is a semigroup under composition. In general, the set of -confined t-ts is not closed under composition. One sees that the subset is a sub-semigroup (although its confined parts are typically not closed under composition).
[Footnote 8: Composition is continuous in the left argument , but generally not in the right argument .]
A deep iterate (DI) of a computation is any ultracomputation arising as ultralimit of iterates (-fold) of the t-t of . Note that the notion of deep iterate is strictly “in-type”, i.e., it is a transition-in-type—not a transition. Being themselves confined by definition, deep iterates may be composed with any confined t-t.
Proposition 2.3 (Cf., Propositions 4.5, 4.6, and 4.7).
Fix any confiner , any exhaustive collection , and any set :
(1) is compact;
(2) is a compact sub-semigroup of ;
(3) Ultracomputations obtained from computations in form a closed sub-semigroup of ;
(4) For any -confined indexed family , and any ultrafilter on , the deep computation and the ultracomputation exist;
(5) If is -confined, then has deep iterates of the form for arbitrary nonprincipal .
Deep equilibria
A deep equilibrium of a computation is an idempotent deep iterate , i.e., one such that .
Theorem 2.4 (Cf., Theorem 5.3).
Let be any exhaustive collection. If is an -confined computation, then has deep equilibria. In fact, one such DE is obtained as the ultralimit from an arbitrary idempotent ultrafilter on .
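A very simple case illustrates idempotency. Assume a hypothetical computation on two real features that keeps the first and halves the second. Its iterates converge pointwise, so every ultralimit along a nonprincipal ultrafilter coincides with the ordinary limit, namely the map sending (x, y) to (x, 0); applying that map twice is the same as applying it once. The sketch below is only this toy case and does not involve ultrafilters.

```python
def gamma(state):
    """Hypothetical computation on two features: keep x, halve y."""
    x, y = state
    return (x, y / 2.0)

def iterate(f, n):
    """Return the n-fold compositional iterate of f."""
    def f_n(state):
        for _ in range(n):
            state = f(state)
        return state
    return f_n

def limit_of_iterates(state):
    """Pointwise limit of the iterates of gamma, computed in closed form."""
    x, _ = state
    return (x, 0.0)

s = (3.0, 8.0)
deep = iterate(gamma, 60)(s)   # a high iterate, close to the limit
```

In general the iterates need not converge, which is where idempotent ultrafilters enter: they select a limit along which idempotency still holds.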
2.3. Definability Criteria
Ultracomputations, deep iterates and deep equilibria are typically not definable, i.e., not effectively computable —even when consists of a single predicate , let alone countably many! (Cf., Example 3.5.1 et seqq.)
Theorem 2.2 implies very strong restrictions on the ability to realize deep computations in any explicit fashion. One may ask for criteria ensuring that deep computations (or deep iterates, or deep equilibria) are effectively computable—i.e., definable.
Theorem 2.5 (Cf., Theorem 6.4).
Let be a confiner, and let be any collection of -confined computations on a CCS with countable predicate collection . Then, the properties below are equivalent:
(DD) Deep Definability: All deep computations of are definable (hence extend to continuous ultracomputations).
(LE) Limit Exchange: For all predicates , all sizers , and all sequences and , the Limit Exchange identity:
(2.1) holds whenever the iterated limits on the left- and right-hand side both exist. Moreover, in such case, each ultracomputation (hence, each DC) of is the (pointwise) limit of some sequence . The limit is attained uniformly on type-shards (a fortiori, uniformly on state shards).
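A standard toy case shows how an exchange of iterated limits can fail. Assume (for illustration only) the state space [0, 1] with a single predicate, the identity, and the computations given by the iterates of x -> x*x. Along a hypothetical sequence of states x_m = 1 - 1/m, taking the limit in the number of iterations first yields 0, whereas taking the limit in m first yields 1. The pointwise limit of the iterates is discontinuous at 1, consistent with the failure of definability. Large finite indices stand in for limits in the numerical check below.

```python
def iterate_square(x, n):
    """n-fold iterate of the computation x -> x*x on [0, 1]."""
    for _ in range(n):
        x = x * x
    return x

def x_m(m):
    """Hypothetical sequence of states approaching 1 from below."""
    return 1.0 - 1.0 / m

LARGE = 10 ** 12  # stand-in for "index tends to infinity"

# Inner limit over n first (m fixed): iterates of any x < 1 tend to 0.
n_first = [iterate_square(x_m(m), 60) for m in (10, 100, 1000)]
# Inner limit over m first (n fixed): each iterate is continuous and fixes 1.
m_first = [iterate_square(x_m(LARGE), n) for n in (1, 2, 3)]
```

Both iterated limits exist here and they differ, so this family does not satisfy a limit-exchange condition of the kind stated in (LE).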
3. Structures for Real-Valued Computations
In this section, we introduce the notions of computation states structure (CSS) and compositional computation structure (CCS), which lie at the foundation of our approach to real-valued computing. Although the definitions of CSS and CCS in §3.2 and §3.4 are fairly straightforward, the abstraction entailed by these notions warrants a preliminary informal discussion to demystify some of the formalism.
3.1. Computations, states, observable features and predicates: A meteorological allegory
Consider physical quantities (such as temperature and barometric pressure) that are real-valued, and each of which may be observed at any given point. For definiteness, consider points on or above the surface of the earth, regarded as an idealized sphere. A state captures the properties of a specific such point at a specific moment in time. In such idealization, each physical quantity at any is called a feature of (or observable feature for emphasis). Each such feature must be given a name (e.g., temperature, pressure, latitude, longitude, height, etc.); these names are essential, for otherwise the real value of a feature of is devoid of context. We use the term observable to refer to the name given to any such property that may be observed; in a formal treatment, we use (purely syntactic) symbols (e.g., “T” for temperature, “p” for pressure, “lat” for latitude, “long” for longitude, “h” for height, etc.) as observables. An observable feature of is the value at of the observable; e.g., may have features (the lat-feature, i.e., latitude, of is N), ( has long-feature, i.e., longitude, W), ( is at m height), (the temperature at is ), etc.
We fix a symbol for each observable; such symbols P, Q, … (not necessarily finitely many, or even countably many for that matter) will be called predicate symbols. The set of predicate symbols (i.e., of symbols for observables under consideration) will be denoted . We shall denote the set of all possible states by . In the present discussion, might be taken to consist of points on the surface of our idealized spherical earth; it is perhaps more fitting to allow states to refer to spatial points each at a specific moment in time. Note that time is not an observable if one takes simply as the set of points on the sphere, but is a valid observable on the set of states simultaneously encoding both location and time (in addition to other observables: temperature, pressure, etc.).
Any real-valued function on is called a predicate. Each symbol , at any state , has an associated real value (the switch to italic from typewriter-style is a reminder that the symbol P has been “interpreted” to yield the actual value of the P-feature of ). Thus, the symbol entails a predicate
(The notation is meant to emphasize the passage from the symbol P to its interpretation.)
Now that the distinction between observables P and the predicates interpreting them is clear, we shall henceforth use italic simultaneously as formal (predicate) symbols denoting observables, and to denote the corresponding predicates; in cases of potential confusion, we use the preferred notation for predicates. (Whenever is used as an index set, its members are regarded as symbols, never as predicates.)
Taking together with the predicate interpreting each observable thereon, we obtain a pair called a Computation States Structure (CSS) in 3.2.1 below. (In , the collection of predicates is a family indexed by symbols P.) By an abuse of notation, we may denote such structure in the form wherein the collection of predicates is implicitly identified with the indexing set .
In the allegory, such features include the quantities lat, long and h, which are coordinates in the usual sense, as well as other features T, p and time t, which are not; however, this suggests regarding the collection of all features of states as coordinatizing states . Each -feature is the “-th coordinate” of in an abstract sense; the collection is called the type of . Any state is uniquely characterized by its type. A critical feature of our approach is to endow the state space with the topology of “pointwise convergence”, i.e., a filter on (or: a sequence or net of) states converges to a state iff the filter (or sequence, or net) of real-valued -features converges to , for each .
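The idea that a state is coordinatized by its features can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the observables (`lat`, `long`, `h`, `T`), the sample values, and the helper names `type_of` and `converges_pointwise` are hypothetical, and the convergence check inspects a single late term with a tolerance, which is merely a numerical stand-in for an actual limit.

```python
from typing import Callable, Dict, List

State = Dict[str, float]                 # hypothetical encoding of a state
Predicate = Callable[[State], float]

# Hypothetical observables; each symbol is interpreted as a real-valued predicate.
PREDICATES: Dict[str, Predicate] = {
    "lat": lambda s: s["lat"],
    "long": lambda s: s["long"],
    "h": lambda s: s["h"],
    "T": lambda s: s["T"],
}

def type_of(state: State) -> Dict[str, float]:
    """Return the type of a state: the family of its predicate values."""
    return {name: pred(state) for name, pred in PREDICATES.items()}

def converges_pointwise(states: List[State], limit: State, tol: float = 1e-6) -> bool:
    """Crude check: the last state is within tol of the limit, feature by feature."""
    last, target = type_of(states[-1]), type_of(limit)
    return all(abs(last[p] - target[p]) <= tol for p in PREDICATES)

limit = {"lat": 40.0, "long": -75.0, "h": 100.0, "T": 15.0}
# A sequence of states whose features approach those of `limit`, one feature at a time.
seq = [{k: v + 1.0 / (n + 1) ** 3 for k, v in limit.items()} for n in range(200)]
```

Here `converges_pointwise(seq, limit)` compares the two states only through their types, in keeping with the view that a state is characterized by its features.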
For the remainder of this subsection, we assume that the state space is compact. In our allegory wherein height (and time) are observables allowed to take arbitrarily large values, compactness fails. On the other hand, if we were to restrict the height and time intervals to be bounded (e.g., for any fixed ), the respective state space would be compact.
To a first approximation, a computation is a map transforming any given input state to some output state . (For simplicity, we use the same space of input and output states.) In our allegory, one may “visualize” computations as moving to another point , possibly at a different moment in time. Maps should be considered “computable” in any reasonably explicit sense (say, by algorithms relying on floating-point arithmetic) only if output features vary continuously with input features , i.e., only when is a continuous map in the topology of pointwise convergence of individual observable features. Such requirement is consistent with the physics implied by our allegory. We always require computations to be continuous. [Footnote 9: Computations on a noncompact state space are required to be extendable in the sense of §4.2.1, a technical requirement significantly stronger than continuity.]
For illustration purposes, consider the “advance-time-by-1” computation taking any state of some point at some time t to the unique state of the same point at time t. Features of the computation give the temperature and pressure at a future moment in time from the state at present time . Meteorologists would be ecstatic to learn features at time from those at time !
When the state space is compact, continuous computations are effectively computable in a rather strong sense: they are polynomially definable. This means that, up to any small fixed (but otherwise arbitrary) degree of precision, every output feature is given (up to an error not exceeding the precision) by a polynomial on some input features . Meteorologists would be even happier to possess polynomial expressions for features of the computation , i.e., of future features from the present ones! On the other hand (with apologies to meteorologists), our methods offer no insight on the specific polynomial approximating any output feature; at any rate, such features would only be polynomially approximable on a bounded interval
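For a single feature on a bounded interval, one classical way to see such polynomial approximability concretely is through Bernstein polynomials. The sketch below is not the method of this paper (which, as noted, gives no insight into the specific polynomial); it assumes one real input feature ranging over the unit interval, and the names `bernstein`, `sup_error`, and `feature` are hypothetical.

```python
from math import comb

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1]."""
    coeffs = [f(k / n) for k in range(n + 1)]
    def poly(x):
        return sum(c * comb(n, k) * x**k * (1 - x)**(n - k)
                   for k, c in enumerate(coeffs))
    return poly

def sup_error(f, g, samples=1001):
    """Maximum of |f - g| over an evenly spaced grid on [0, 1]."""
    return max(abs(f(i / (samples - 1)) - g(i / (samples - 1)))
               for i in range(samples))

feature = lambda x: abs(x - 0.5)   # continuous, but not differentiable at 0.5
errors = {n: sup_error(feature, bernstein(feature, n)) for n in (10, 50, 200)}
```

The grid error shrinks as the degree grows, which is the uniform approximation phenomenon the paragraph describes; the grid maximum only estimates the true supremum.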
As a by-product of choosing a common state space both for computation inputs and outputs, computations are necessarily composable, i.e., any given computations naturally generate a semigroup of computations. This gives rise to the notion of compositional computation structure (CCS), which is one of the form
where is a CSS, and is any semigroup under an (associative) composition operation , with elements representing computations on via an evaluation map (, if is already a set of maps ). Layer state transitions are assumed continuous on (when is noncompact, we require them to be extendable in the sense of §4.2.1). CCSs are the natural structures to study compositions of -many computations leading, as , to “deep computation states”, as well as “deep iterates” asymptotically approximated by -fold iterates of a fixed computation .
With suitable changes in definitions, our results apply to non-compact CSSs/CCSs.
3.2. Computation States Structures
Fix an arbitrary nonempty set whose members will be called predicate symbols.
A Computation States Structure (CSS) with predicates is of the form
where
-
•
is a nonempty set, called the sort (or space) of layer states;
-
•
For each symbol , the -predicate of is a real function . [Footnote 10: In the setting of real-valued structures, any real function is called a predicate.]
By an abuse of notation, we typically identify a symbol with the predicate ; this entails a further abuse whereby we identify the predicate collection with itself; thereby, the CSS above takes the form .
3.2.1. Types of states
In a CSS , the type of a state is the indexed family of its predicate values. Such type is called realized by ; it is a “vector” with real entries indexed by predicates . Thus, such state types are elements of the product (vector space) , which will always be regarded as topological product of copies of the real line (endowed with its usual topology), one such line for each . The topological subspace of realized types will be denoted . (On the other hand, the linear operations on the vector space will not play a direct role outside of informal discussions—and in the Appendix.)
Ultrafilters on an infinite (“index”) set will be denoted ; we consider nonprincipal ultrafilters tacitly. Given an ultrafilter on , we say that an indexed family of elements converges to , or that is the -ultralimit of with respect to if for each (i.e., when is the -ultralimit of in the pointwise convergence topology, not necessarily uniformly as varies). [Footnote 11: When it exists, the -ultralimit of is uniquely characterized by the following property: for every , the set belongs to (i.e., is a “-large” set).] Not all ultralimits need exist since is not compact. The (necessarily unique) ultralimit is denoted .
Elements arising as entry-wise ultralimits of realized types in the above fashion (with and allowed to vary) are called types of (layer) states, or ultrastates. Any realized state type is an ultrastate, but the converse fails in general. The set of ultrastates is a closed subset (the bar denoting topological closure) of , called the (layer) state type space, and henceforth endowed with the subspace topology. Since ultrastates need not be realized, the inclusion is generally proper.
We shall adopt the convenient alternate notation for the “-th entry” of a type , which treats as if it were realized (i.e., as though were a state in ).
3.2.2. Topology on the layer state space
We adopt a structural perspective wherein states are to be distinguished only through predicate values; thus, a state is implicitly identified with its type . We topologize with (the “pullback” of) the product topology under such identification. A slightly more concrete description of this topology is as follows: For each predicate , endow with the pseudometric , and topologize by the collection of all such pseudometrics. This topology “by type” is the only one we shall introduce on (except in certain examples meant to compare this topology to others).
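A small Python sketch may help fix the idea of topologizing by one pseudometric per predicate. It assumes the pseudometric attached to a predicate is the absolute difference of predicate values; the predicates `parity` and `value` on integer-valued states are hypothetical.

```python
def pseudometric(P):
    """Pseudometric induced by a predicate P (assumed form: |P(x) - P(y)|)."""
    return lambda x, y: abs(P(x) - P(y))

# Hypothetical predicates on integer-valued states.
parity = lambda n: float(n % 2)
value = lambda n: float(n)

d_parity = pseudometric(parity)
d_value = pseudometric(value)

# Distinct states may be at distance 0 for one predicate yet separated by another.
same_parity = d_parity(1, 3)
different_value = d_value(1, 3)
```

A single pseudometric can fail to separate distinct states; it is the whole collection that decides which states are distinguishable.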
CSSs are assumed to satisfy the following:
-
•
Reduction Axiom for Computation States Structures. States are equal only if their types are equal.
The Reduction Axiom above is equivalent to the requirement that distinct states be topologically distinguishable; since is Hausdorff, reduction amounts to requiring that itself be a Hausdorff topological space (any two states have disjoint neighborhoods).
Even if not imposed a priori on a CSS , the Reduction Axiom is always satisfied if one replaces by its quotient upon identifying equal-in-type states, and each predicate on by the naturally induced predicate on . From a structural viewpoint, and are identical (isomorphic, in the sense of Keisler’s General Real-Valued Structures [Kei23]).
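The passage to the quotient by type can be sketched for a finite set of states as follows. The helper `reduce_by_type` and the example predicates (parity and sign on integers) are hypothetical, chosen only to show states being identified when their types agree.

```python
from collections import defaultdict

def reduce_by_type(states, predicates):
    """Group states by type; return one representative per class, and the classes."""
    classes = defaultdict(list)
    for s in states:
        key = tuple(p(s) for p in predicates)   # the type of s
        classes[key].append(s)
    return {key: members[0] for key, members in classes.items()}, dict(classes)

# Hypothetical example: integers observed only through parity and sign.
predicates = [lambda n: float(n % 2), lambda n: float((n > 0) - (n < 0))]
states = [-3, -1, 2, 4, 5]
representatives, classes = reduce_by_type(states, predicates)
```

Each predicate is well defined on the classes, since it is constant on each class by construction.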
Remark 3.1.
By Proposition 2.1, if is countable, then the topology on is metrizable. Even when is countable, however, our purposes are better suited by thinking of as endowed with the topology (and corresponding uniformity [Eng89, §8.1]) explicitly given by the full predicate collection, rather than by an implied “master” metric which, in an abridged manner, induces the same topology.
3.3. Tychonoff and Realcompact spaces
3.3.1. Tychonoff spaces
Recall that a topological space is Tychonoff if it is , i.e., a completely regular Hausdorff space; explicitly: (i) points are closed, and (ii) given any point and closed set there exists a continuous function such that and .
Remark 3.2.
A reduced CSS is ultimately just a Tychonoff space endowed with a distinguished family of real functions (distinguished predicates), and such that the topology on is initial by the collection (i.e., the topology of is generated by the inverse images of open intervals of under functions ). From another perspective, any CSS is isomorphic to a “sub-CSS” of a CSS via the type map , which is a homeomorphic predicate-preserving embedding; therefore, such product CSSs are universal.
The distinguished predicates are regarded as being “computable on ” ab initio; they also may be seen as monomials generating some polynomial algebra of continuous real functions on ; the uniform closure of the set of such monomials is the algebra of “definable predicates” on (which are, by necessity, continuous real functions on ). In general, however, is a proper subalgebra of the full algebra of continuous real functions on . Any function is non-definable over ; it is appropriate to think of such as “transcendental” over —not merely in an algebraic sense, but in a stronger topological one: not only does such a non-definable fail to be a polynomial on monomials ; in fact, it is not even possible to approximate uniformly on by such polynomials.
3.3.2. Realcompact spaces
A topological space is called realcompact if it is Tychonoff and it embeds homeomorphically as a closed subspace of the topological product for some index set [Eng89, §3.11]. (There is a multitude of equivalent definitions of realcompactness. For a thorough treatment of realcompact spaces, refer to Weir’s monograph [Wei75].)
A CSS is realcompact iff the type map has closed image in , i.e., if is the full space of state types of (all state types are realized). Any compact (Hausdorff) CSS is necessarily Tychonoff and in fact realcompact: Taking to be any set of continuous functions separating points of , the type map is injective and has compact, hence closed, image; it is therefore a homeomorphic embedding.
3.3.3. Realcompactness of type spaces
Any type space is a closed subspace, hence realcompact. Identifying the layer space with its embedded image , it is suggestive to regard the realcompact type space as a canonical realcompact extension of . Such viewpoint is quite appropriate for our purposes, so we discuss in what precise sense this realcompact extension is canonical.
More generally, consider any Tychonoff space whose topology is initial with respect to a collection of real functions (i.e., inverse images of opens of by such functions generate the topology of ). Each point has a -type , and embeds (via the map ) as a subspace whose closure (the -type space of ) is realcompact.
The type space depends on . A key observation is that each of the functions extends to continuously (as the “-th coordinate” map ). However, other real functions on , even if continuous, need not extend to continuously. Thus, the -type space possesses the universal property that every admits a unique continuous extension to ; it is characterized by such universal property up to homeomorphism. [Footnote 12: In fact, any real function on that is uniformly approximable by polynomials in the functions is (necessarily continuous, and) extends continuously to a real function on (a uniform limit of polynomials in the corresponding functions on ), so possesses the extension property for functions in the uniform closure of the real algebra generated by functions .]
Remark 3.3.
Let be the set of all continuous real-valued functions on a Tychonoff space , and let be the corresponding type space, called the realcompactification of . Every continuous function on extends to a continuous function on the realcompactification . In fact, for any , the -type space is a quotient (not a subspace!) of in a natural manner: indeed, is the image of under the natural projection map . Thus, given a fixed set of continuous real functions on , it is appropriate to think of the -type space as a “-relative realcompactification” of , since possesses the universal extension property only for functions , rather than for all , which corresponds to the (“absolute”) realcompactification of . [Footnote 13: The notation (“upsilon-”) is standard for the realcompactification (denoted above) of a Tychonoff space .]
3.3.4. Realcompact CSSs
We single out the (sub)class of CSSs possessing the:
-
•
Realcompactness Property. Every type is realized, i.e., of the form for some .
Thus, realcompactness is the requirement that be a surjection onto the type space , whence is a homeomorphism (by the Reduction Axiom). Rephrasing, realcompactness states that whenever and on are such that the ultralimit exists, then some satisfies .
It is appropriate to regard realcompactness as capturing a certain notion of “completeness” or “saturation” of the space . Particularly when is infinite, realcompactness is a rather strong requirement on CCSs, so we do not impose it as an axiom; instead, we rely primarily on the realcompactness and universal properties of the type space . [Footnote 14: When is finite, realcompactness is a rather mild requirement: it is seen to be equivalent to the completeness of , where is the metric introduced in the proof of Proposition 2.1.]
3.4. Compositional Computation Structures
A Compositional Computation Structure (CCS)
for a given set of predicate symbols consists of
-
•
a CSS with predicate symbol set and, for each , a real predicate ;
-
•
a semigroup , the computations sort (the—associative—semigroup operation is denoted simply when convenient);
-
•
a map (“evaluation”) giving an action of on (i.e., for and ).
Remarks 3.4.
-
(1)
In principle, the semigroup operation ‘’ of and evaluation action ‘’ are abstract (i.e., not literally composition and application of functions). However, one may always regard as a semigroup (under the operation ‘’ interpreted as composition) of maps ; —i.e., regard as a sub-semigroup of the semigroup of all maps , under bona fide functional composition: Nothing of structural relevance is lost thus. The structural viewpoint abstracts inessential aspects of a concrete such realization of . In practice, it is convenient to identify with .
-
(2)
In applications, a more general notion of CCS with -ary computations is useful. By this we mean that computations may each have an arity such that is an (-argument) map . (It is appropriate to regard evaluation on -ary such as a map , where is the set of -ary elements ; thus, gives an -argument map .) CCS with -ary computations augment the semi-group operation with a richer set of operations realizing arity-appropriate compositions. Explicitly, given and , there exists an element satisfying
i.e., the above identity holds for a suitable “generalized composition” operation (or, rather, for one such operation for each ); moreover, the sort of computations is endowed with all such compositions.
In order to simplify the exposition, CCSs with -ary computations as in the preceding remark will be used only in informal discussions and examples.
3.4.1. Reduction and Continuity Axioms
Every CCS will be assumed to satisfy the following axioms:
-
•
Reduction Axioms for Compositional Computation Structures.
-
(1)
States are equal only if their types are equal (i.e., the underlying CSS is reduced);
-
(2)
Transformations are equal only if the maps are equal.
As a temporary (weaker) placeholder for the Extendibility Axiom (see §4.3.1) eventually imposed on CCSs, we presently impose the natural:
-
•
Continuity Axiom: The action of on is by maps continuous in the topology of (i.e., is a topological action on the CSS ).
Explicitly, for each computation and , the real-valued “-feature” of is continuous on .
Reduction Axiom (2) says that is bijectively identified with its image of maps (“state transitions”) (). This identification implies has a natural topology on , obtained (as pullback) from the topology of pointwise convergence on the maps associated to computations ; the Reduction Axiom implies that this topology is also Hausdorff.
As long as the Continuity Axiom holds, the Reduction Axioms are innocuous requirements on a CCS , because one can always pass from to a reduced CCS (i.e., one satisfying the Reduction Axioms) as follows. First, replace by its quotient upon identifying computations such that for all ; second, pass from the underlying CSS to its quotient-by-type if necessary. The evaluation of induces a well-defined natural action . By the Continuity Axiom, the passage from to preserves all structural properties of states and computations, as well as the Continuity Axiom itself. [Footnote 15: The passage to also preserves the Extendibility Axiom of §4.3.1.]
Remark 3.5.
The Continuity Axiom ensures that computations act continuously on . In general, however, the action of a computation on the state space need not admit a continuous extension to a transformation . This distinction is quite important; it speaks to the weakness of the Continuity Axiom, and suggests a strengthening called the Extendibility Axiom, which is a key assumption in our main results.
3.5. Examples of CSSs and CCSs
3.5.1. The unit interval
Consider a CSS with state space (the unit interval) endowed with the single identity predicate . This CSS is realcompact.
Let be any semigroup (under composition) of continuous functions , acting on by functional application: ; this yields a realcompact CCS . An interesting such CCS has semigroup consisting of iterates of the chaotic map .
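A semigroup of iterates acting by functional application can be sketched as below. The specific map is an assumption: the logistic map `4x(1 - x)` is used as a stand-in for a chaotic continuous self-map of the unit interval and is not taken from the text; `compose` and `iterate` are hypothetical helper names.

```python
from functools import reduce

def compose(f, g):
    """Functional composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))

def iterate(f, n):
    """The n-fold iterate of f, for n >= 1."""
    return reduce(compose, [f] * n)

# Assumed stand-in for a chaotic self-map of [0, 1] (not taken from the text).
logistic = lambda x: 4.0 * x * (1.0 - x)

x0 = 0.3
orbit = [iterate(logistic, n)(x0) for n in range(1, 6)]
```

Because composition is associative, composing the 2-fold and 3-fold iterates gives the 5-fold iterate, which is the semigroup law at work.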
Replacing with the open interval , one obtains a non-realcompact CSS having (realcompact) type space . (By contrast—cf., Remark 3.3—the realcompactification is a much larger topological extension not homeomorphic to a subset of .)
3.5.2.
Given , we obtain a CSS on the -dimensional real space endowed with coordinate functions () as distinguished predicates. The corresponding type space is ; the type topology coincides with the usual one, so is realcompact.
There is ample flexibility in expanding the collection of predicates yielding formally distinct CSSs with layer states sort . For any real , one may (for instance) expand the predicate collection with the “-norm” predicate defined by
One may also expand the predicate collection with, say, the supremum norm
Since is finite, the predicates above are continuous with respect to the topology of . In fact, any continuous function may be added to the predicate collection of yielding an essentially equivalent CSS, because any such is a definable predicate in the sense of §6.1 below; therefore, such expanded CSSs are still realcompact with layer states sort . [Footnote 16: On the other hand, the addition of new predicates that are discontinuous with respect to the usual topology of expands to a CSS that is no longer realcompact.]
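As a minimal sketch of expanding the coordinate predicates with norm predicates, the following Python fragment computes the type of a point of the plane under coordinates together with a 2-norm and a supremum-norm predicate. The predicate names (`x1`, `x2`, `norm_2`, `norm_sup`) are hypothetical labels, and the formulas are the standard ones for these norms.

```python
def p_norm(x, p):
    """The p-norm of a finite real vector, for real p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def sup_norm(x):
    """The supremum norm: the largest absolute entry."""
    return max(abs(t) for t in x)

x = (3.0, -4.0)
# Type of x under the coordinates, expanded by two norm predicates.
expanded_type = {
    "x1": x[0],
    "x2": x[1],
    "norm_2": p_norm(x, 2),
    "norm_sup": sup_norm(x),
}
```

The added entries are continuous functions of the coordinates, so they refine the bookkeeping of a state without changing which sequences of states converge.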
Expanding with:
-
•
any semigroup of continuous functions on ; and
-
•
the evaluation action of on by functional application as in 3.5.1 above,
one obtains a CCS . A natural such expansion is by the semigroup of linear operators on .
3.5.3. and
Let the CSS have states space consisting of all real sequences , endowed with the collection of predicates (). Such CSS is realcompact.
The subspace consisting of sequences having at most finitely many entries is a non-realcompact sub-CSS of .
A natural expansion of to a CCS is by the semigroup of linear operators thereon. Each linear such computation is effectively a collection of real functionals , each of the form , for some scalar collection . Thus, every entry of is exactly given as an effectively finite linear combination of entries of , i.e., of finitely many real-valued features of the input . (Reciprocally, linear functionals on are in correspondence with elements of .)
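The description of each output entry as a finite linear combination of input entries suggests a simple encoding, sketched here in Python. The representation (a dictionary of finitely supported rows) and the names `RealSeq`, `RowFinite`, and `apply` are hypothetical; the sketch only models operators whose rows have finite support, as the paragraph describes.

```python
from typing import Callable, Dict

RealSeq = Callable[[int], float]           # a real sequence n -> x_n
RowFinite = Dict[int, Dict[int, float]]    # output index -> finitely many (input index, scalar)

def apply(rows: RowFinite, x: RealSeq) -> RealSeq:
    """Apply a row-finite operator: each output entry is a finite linear combination."""
    def y(m: int) -> float:
        return sum(c * x(n) for n, c in rows.get(m, {}).items())
    return y

# Hypothetical operator: y_0 = x_0 + x_1, y_1 = 2 * x_5, all other entries 0.
rows = {0: {0: 1.0, 1: 1.0}, 1: {5: 2.0}}
x = lambda n: 1.0 / (n + 1)
y = apply(rows, x)
```

Each output feature reads only finitely many input features, so it is computed exactly rather than approximated.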
Many natural real functions on (or on suitable subspaces thereof) are discontinuous (in the topology of entry-wise convergence); expanding with any such function as distinguished predicate leads to (non-homeomorphic) CSSs.
3.5.4.
For any extended real , consider the layer states space
For , such space is the -metric completion of the subspace ; at any rate, is -complete as well. [Footnote 18: The -metric completion of is the separable space ; the space is not separable.] A natural collection of predicates is , where is the -th coordinate as in 3.5.3 above, and names the norm . Since the predicate collection is countable, it is easy to show that the type topology and the usual -norm topology on the layer space coincide [Footnote 19: Cf., the proof of Proposition 4.1 below.]; however, is non-realcompact. It is easy to see that its type space is . (The set of realized types is , for which the “correct” norm agrees with the interpretation value of the symbol .) Fixing , the function is -Lipschitz, hence continuous on ; however, the corresponding function does not extend continuously to . [Footnote 20: By the Stone-Weierstrass Theorem, every continuous function on the compact Hausdorff space is uniformly approximable by algebraic combinations (finitely many at a time) of predicates , and ; however, an elementary argument shows that admits no such uniform approximations on .]
A natural expansion of to a CCS is by its semigroup of bounded (i.e., -continuous) linear operators. Such operators are continuous on ; however, they are discontinuous when regarded as functions on the reduct CSS of wherein the additional predicate is removed, i.e., when is topologized as sub-CSS of rather than of as above.
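The role of the norm predicate can be illustrated with the standard unit vectors: every fixed coordinate of the n-th unit vector is eventually 0, while its norm is always 1. The sketch below takes p = 2 and truncates vectors to finitely many entries; both choices, and the helper names, are assumptions made for illustration.

```python
def unit_vector(n, length):
    """First `length` entries of the n-th standard unit vector."""
    return [1.0 if k == n else 0.0 for k in range(length)]

def norm_p(x, p):
    """The p-norm of a finitely supported sequence, for real p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

p, window = 2, 5
coords, norms = {}, {}
for n in (0, 10, 100):
    e = unit_vector(n, max(n + 1, window))
    coords[n] = e[:window]        # the first few coordinate features
    norms[n] = norm_p(e, p)       # the norm feature
```

Along this sequence the coordinate features tend to 0 while the norm feature stays at 1, so the limiting family of predicate values does not come from any single sequence whose norm is computed from its coordinates.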
Remark 3.7.
The metric on (or on for finite) is not definable in terms of the norm predicate unless is expanded to a CCS with, say, the binary predicate of subtraction . This remark, although not meant to detract from the preceding discussion, does serve to highlight the usefulness of CCSs with -ary layer transformations (cf., Remark 3.4).
4. Deep Computations
Throughout this section, will be a fixed CCS.
4.1. Shards in state- and type-spaces
4.1.1. Sizers and shards in type spaces
A sizer is any collection of nonnegative reals. The number is called an a priori bound for .
For a sizer , we introduce the topological product space
it is a compact subspace of the product space ; moreover, (with varying over sizers). A subset is called entry-wise bounded if for some sizer . Clearly, relatively compact subsets of are precisely entry-wise bounded subsets.
4.1.2. Shards
For a sizer , the -shard of is
Clearly, an arbitrary intersection of shards is a shard, and any finite union of shards is included in some shard.
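The closure properties of shards can be sketched in Python under the assumption that membership in a shard means every predicate value is bounded in absolute value by the corresponding entry of the sizer. The names `in_shard`, `meet`, and `join`, and the predicate symbols `h` and `T`, are hypothetical.

```python
def in_shard(state_type, sizer):
    """True if |P(x)| <= r_P for every predicate symbol P (assumed definition)."""
    return all(abs(state_type[p]) <= sizer[p] for p in sizer)

def meet(r, s):
    """Entry-wise minimum: its shard is the intersection of the two shards."""
    return {p: min(r[p], s[p]) for p in r}

def join(r, s):
    """Entry-wise maximum: its shard contains the union of the two shards."""
    return {p: max(r[p], s[p]) for p in r}

r = {"h": 10.0, "T": 50.0}
s = {"h": 100.0, "T": 30.0}
x = {"h": 8.0, "T": 40.0}     # lies in the r-shard only
y = {"h": 90.0, "T": 20.0}    # lies in the s-shard only
```

The union of two shards need not be a shard, but the entry-wise maximum of the sizers always gives a shard containing both.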
Let the type -shard be the topological closure of the set of types realized in (i.e., by elements of) . Evidently, ; however, the inclusion is typically strict because a type need not be an accumulation point of types realized in itself, thus need not belong to . [Footnote 21: In general, a type need not even be an accumulation point of realized types in any fixed shard !] We introduce the space of shard-supported types; it is the set of types of arbitrary shards . By the preceding discussion, we have , but the inclusion is proper in general ( need not be closed in ). The space will be of central importance in what follows.
A collection of sizers is exhaustive if, for any sizer there exists such that for all .
From its definition, it is clear that is the union of type-shards as varies over any exhaustive .
Recall that a Hausdorff space is
-
•
a k-space if closed subsets are precisely those whose intersection with an arbitrary compact subset is closed [Eng89, 3.3.18ff];
-
•
a kR-space if an arbitrary real function is continuous as soon as its restrictions to compacta are continuous.
Evidently, any k-space is a kR-space.
Proposition 4.1.
Let be a CSS whose distinguished predicate collection is countable.
-
(1)
(i.e., all types are shard-supported).
-
(2)
is a k-space. More precisely, closed subspaces are (precisely) those whose intersection with an arbitrary type-shard is closed.
A fortiori, the result holds when the predicate collection is finite.
We thank F. Tall for bringing to our attention the fact that realcompact spaces embeddable in are k-spaces.
Proof.
For , introduce the pseudometric on , and let be the usual -valued pseudometric corresponding to . The space is completely metrizable by , and the subset of realized types is dense.
-
(1)
Given , there is a sequence such that in the -metric sense. The set is compact (any open cover of contains an open ; since , all but finitely many elements of are contained in , so has a finite subcover). By compactness of , the image is compact in for each , hence bounded, say for some and all . Clearly, , so (, in any case); hence, .
-
(2)
If () is not closed, let . As in (1) above, construct with . Since , for each there exists some such that . One sees that is compact. [Footnote 22: Clearly, is compact for each (as in (1) above). Given any open cover of , some satisfies . For a set containing all but finitely many , we have . The set is a finite union of compacta, hence itself compact. Thus, finitely many opens of cover , which together with cover .] Therefore, for some sizer , and thus , hence . Since and we see that is not closed. The converse is trivial, since type-shards are closed (and the intersection of closed sets is closed). [Footnote 23: We recall the following fact, closely related to (2): Every sequential Hausdorff (therefore, every metric) space is a k-space [Eng89, Theorem 3.3.20].] ∎
Remarks 4.2.
-
(1)
For at most countable, Proposition 4.1 implies that closed subsets are those whose intersections are closed for all sizers in any exhaustive collection .
-
(2)
A compact subset need not be included as a subset of any type shard (even if is countable).
4.2. Transitions-in-type. Extendibility.
Any—not necessarily continuous—function (resp., ) will be called a (layer) transition (resp., an ultra-transition (u-t)). We also introduce the notion of transition-in-type (t-t) to mean any function .
A transition (resp., an u-t ; a t-t ) is shard-to-shard (Sh2Sh) if, for every sizer there is a sizer such that restricts to a map (resp., restricts to ; restricts to ).
The transition space (resp., ultra-transition space) of is (resp., ); note that in a natural fashion (upon identifying with the subset ). These spaces generally include (ultra)transitions that are not shard-to-shard.
We regard as the topological product ; equivalently, via the inclusion , the space is topologized as a subspace of the product : This is the topology of pointwise convergence of the real functions for fixed . The space inherits the subspace topology (of pointwise convergence).
4.2.1. Extendable layer transitions
The Continuity Axiom ensures that every computation is continuous; however, it need not extend to a continuous map , which renders such realized computations rather poor foundational blocks for our subsequent treatment of deep computations. To remedy such deficiency, we will axiomatically require that realized computations be extendable as suggested in Remark 3.5. Such extendibility requirement is rather strong; moreover, its consequences are strongest in regard to the restrictions to compacta of t-ts, rather than the t-t themselves. In this light, it is natural to require computations to only be extendable to compact shards when restricted to shards , which motivates the following definition.
A transition-in-type is called shard-continuous (or Sh-continuous) if its restriction to each shard is a continuous map into some type-shard (in particular, a Sh-continuous t-t is shard-to-shard). A transition (more generally, an u-t ) is called Sh-extendable if, for every sizer , its restriction extends to a continuous function into some type-shard. (It suffices to impose this condition for sizers in a given exhaustive collection .)
Remark 4.3.
A continuous shard-to-shard t-t is Sh-continuous, but the converse fails in general.
4.2.2. Spaces of transitions-in-type
Both and are semigroups under the binary operation of composition; this operation is continuous in the left argument , but not in the right argument .
The subsets (resp., ) of transitions (resp., t-ts) that are shard-to-shard are sub-semigroups; however, they are typically not closed subspaces.
Recall that a set of sizers is exhaustive if, for any sizer , there exists such that for all (cf., Remark 4.2). In particular, in such case. [Footnote 24: Provided is countable, by Proposition 4.1, we have for exhaustive.]
Given a sizer , we say that is -preserving if it restricts to a map . The set of -preserving transitions is denoted . A collection is called -preserving.
Given an exhaustive collection of sizers, we say that confines , or is -confined, if is -preserving for each . The collection of all -confined t-ts is denoted ; it is a closed sub-semigroup of . One sees that , i.e., -confined t-ts are necessarily shard-to-shard. Moreover, is compact as shown in Proposition 4.5 below.
The notions above have formally identical analogues for transitions, i.e., with in place of .
A family is:
-
•
confined by an exhaustive sizer collection (or: -confined) if ;
-
•
pointwise bounded at if there is a sizer (a pointwise bound for at ) such that for all ;
-
•
pointwise bounded on , if it is pointwise bounded at every ;
-
•
pointwise bounded, if it is pointwise bounded on .
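For a finite family of transitions, a pointwise bound at a given state can be computed directly, as in the following sketch. The setting is hypothetical: states are points of the plane with coordinate predicates `x1` and `x2`, and the family consists of two maps and their composite. For an infinite family the maximum below becomes a supremum, which may be infinite.

```python
def pointwise_bound(family, predicates, x):
    """Smallest sizer r with |P(f(x))| <= r_P for every f in a finite family."""
    return {name: max(abs(P(f(x))) for f in family)
            for name, P in predicates.items()}

# Hypothetical setting: points of the plane with coordinate predicates.
predicates = {"x1": lambda v: v[0], "x2": lambda v: v[1]}
halve = lambda v: (v[0] / 2, v[1] / 2)
swap = lambda v: (v[1], v[0])
family = [halve, swap, lambda v: halve(swap(v))]

bound = pointwise_bound(family, predicates, (4.0, -6.0))
```

The resulting sizer depends on the chosen state; assembling one such sizer per state is what a collection of pointwise bounds amounts to in this finite setting.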
Remarks 4.4.
-
(1)
Given exhaustive, we have , where is the sizer collection defined by
(4.1) In particular, every confined family is pointwise bounded.
-
(2)
is pointwise bounded on iff there is a collection of pointwise bounds at each point . The corresponding set of t-ts is denoted ; thus
(4.2)
The notions of -preserving, -confined, and pointwise bounded (shard-to-shard) layer transitions (and collections of such transitions), and the definition of , , are obtained from those for transitions-in-type mutatis mutandis (simply replacing for , and for ).
Mutatis mutandis, one may define pw-bdd transition spaces .
Proposition 4.5.
For any collection of sizers at all points , the space of -pointwise bounded transitions-in-type is compact. In particular, is compact for any exhaustive sizer collection .
4.3. Computations and ultracomputations (deep computations)
4.3.1. The Extendibility Axiom
The transition associated to —also denoted —of a layer transformation is the map .
If such a transition is Sh-extendable, we call it the computation by , or realized by for emphasis. (By an abuse of notation, we will denote the extension still by .)
For the remainder of this paper, we assume that CCSs satisfy the following
-
•
Extendibility Axiom. Each layer transformation induces a Sh-extendable computation .
The Extendibility Axiom gives a natural (injective) map . The semigroup is topologized via (the pullback of) this map, i.e., by the topology of pointwise convergence; it is the subspace topology obtained upon identifying with the set , called the space of realized computations.
It follows from the Reduction Axiom that the above topology on is Hausdorff.
4.3.2. Realized vs. deep computations
The space of ultracomputations is the topological closure . A transition will be called a deep computation, ultracomputation, or ucomp for short. Although any computation is a deep computation in its own right, the adjective “deep” implies that may be an unrealized computation, i.e., not of the form . Deep computations are typically (Sh-)discontinuous layer transitions. Even if Sh-continuous, an ultracomputation may be unrealized.
Every deep computation is of the form for some indexed family and some ultrafilter on . (Without loss of generality, one may always take as an ultrafilter on itself.) [Footnote 25: For arbitrary on (say) , the ultracomputation need not be defined: might not -converge for certain .]
For any sizer collection , let be the set of ultracomputations with pointwise bounds . Since is closed by definition, the space is also closed in . For any fixed sizer and exhaustive , we see that and (the sets of ultratypes -preserving and -confined, respectively) are also closed.
By an abuse of nomenclature, we say that an element admits pointwise bounds (resp., is -preserving, is -confined) if its transition type does (resp., is). We denote by , , and , respectively, the sets of transformations with associated transitions in , , and . The respective uniform notions as varies in some subset become: admits uniform pointwise bounds , is -preserving, and is -confined, respectively.
An ultracomputation with values in is called quasi-realized; these constitute the set : the space of quasi-realized ultracomputations. Let , , and .
Proposition 4.6.
For any sizer and exhaustive collection :
-
(1)
each of the sets , is a sub-semigroup of , and is a closed subset of ;
-
(2)
, are closed sub-semigroups of ;
-
(3)
is a compact sub-semigroup of ;
-
(4)
, , are closed sub-semigroups of .
The ultracomputation space is akin to the concept of “enveloping group” (of ). However, only the confined sub-semigroups are compact (the full space is typically noncompact).
Proof.
The set of ultrafilters on is itself a semigroup under a natural (“convolution”) operation [HS10]. This operation of convolution possesses (and is essentially characterized by) the following property when is identified with the transitions semigroup : If two transitions are of the form , , then . It follows that is a sub-semigroup. As the intersection of a compact set (by Proposition 4.5) with a closed subset of , we see that is compact. The remaining topological statements are all trivial and left to the reader. ∎
Proposition 4.7.
The ultracomputation exists for any exhaustive and any ultrafilter on .
Proof.
This is an immediate corollary of Proposition 4.6. ∎
5. Deep Iterations and Deep Equilibria
Throughout this section, fix a CCS . For convenience, we assume some element (“identity”) satisfies the equality for all .
We reiterate the Extendibility Axiom that each layer transition extends to a Sh-continuous transition .
5.1. Layered and iterative computations
Let be any sequence of computations (i.e., any element of the product space ). We regard as a sequence of “computation steps” to be successively applied (see the definition of Layered Computation below). The computation will be called the -th atomic step, or the transition at layer (to layer ).
Layered Computations (LCs)
Given a sequence of computation steps, the sequence defined recursively by
(i.e., ) is called the layered computation with atomic steps (or LC, for short). [Footnote 26: Thus, LC denotes , simply adding context to indicate the layer transitions yielding .] The term is called the -composite computation step of LC. A layered computation may also be called recursive, for obvious reasons. The set of layered computations LC obtained as varies is denoted .
For a sizer , let be the set of -preserving LCs. (Note that it is the products —but not necessarily the atomic steps —that are required to preserve the layer .) For an exhaustive sizer family , let be the set of -confined LCs (or LCs confined by ).
The LC-evolution of a state is the sequence
The term “evolution” means “-evolution” henceforth, whenever is given by context. The state at stage of under evolution is .
Iterative computations (ICs)
Any fixed yields a constant sequence . The corresponding LC has composite steps given by the sequence of compositional powers (iterates) of ; we will call such LC an iterative computation (or just iteration) by , and denote it by IC. It is appropriate to think of iterative computations as evolving by “tying parameters” in the sense that all atomic steps are always the same (i.e., the “tied parameter” is itself). Note that IC is -confined (or -preserving) if and only if the fixed atomic step is so.
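The recursive definitions of layered and iterative computations can be illustrated by a minimal sketch. It assumes that states are plain floats, that composite steps are formed by applying each new atomic step after the previous composite, and that the first composite is the first atomic step itself; the names `composites` and `evolution` are hypothetical and serve only this illustration.

```python
from itertools import accumulate, islice, repeat
from typing import Callable, Iterable, Iterator, List

State = float                      # assumption: a state is a single real number
Step = Callable[[State], State]    # an atomic computation step


def composites(steps: Iterable[Step]) -> Iterator[Step]:
    """Yield composite steps F_1, F_2, ... where F_1 = f_1 and
    F_{n+1} applies f_{n+1} after F_n (assumed composition order)."""
    def compose(prev: Step, nxt: Step) -> Step:
        return lambda x: nxt(prev(x))
    return accumulate(steps, compose)


def evolution(steps: Iterable[Step], x0: State, n: int) -> List[State]:
    """Return the first n stages F_1(x0), ..., F_n(x0) of the evolution of x0."""
    return [F(x0) for F in islice(composites(steps), n)]


# A layered computation with two distinct atomic steps.
print(evolution([lambda x: x + 1, lambda x: 2 * x], 3.0, 2))  # [4.0, 8.0]

# An iterative computation: the same ("tied") atomic step at every layer.
f = lambda x: x / 2 + 1
print(evolution(repeat(f), 0.0, 3))  # [1.0, 1.5, 1.75]
```

Passing `repeat(f)` as the step sequence is what turns a layered computation into an iterative one in this sketch.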
5.2. Deep layers, deep iterates, and equilibria
5.2.1. Deep layers
A deep layer of LC is any deep computation that is an accumulation point of the sequence of composites . Any such deep layer is of the form obtained as (pointwise) -ultralimit via a nonprincipal ultrafilter on . (We use the notation for such ultracomputation (in-type) when the dependence on and is to be made explicit.) For a confined such LC, deep limits exist for arbitrary , by Proposition 4.7. If LC is not confined, the computations may diverge.
5.2.2. Deep iterates
A deep iterate of is a deep layer for IC.
The deep layer that is obtained via an ultrafilter on is denoted ; it need not exist in general, but does if is confined (by Proposition 4.7). Every deep iterate is a deep computation.
Remark 5.1.
In the nomenclature of [BKK19], a deep layer of LC is an “implicit layer”. They consider primarily compositions of layer transitions (i.e., LCs in our sense) with “tied parameter” (the same layer transition at each stage), i.e., ICs in our sense. From our perspective, implicit layers are given each by some nonprincipal ultrafilter on , i.e., are of the form .
5.2.3. Deep equilibria
A deep equilibrium (layer) of IC is an idempotent deep iterate (), i.e., a deep iterate such that for all (hence the nomenclature “equilibrium”). It will also be called a (deep) iterative equilibrium of .
Remark 5.2.
Although any iterative equilibrium of IC satisfies , one generally has . The “equilibrium” property is self-referential, rather than in direct relation to the original computation . Let us call a deep iterate of IC “-fixed” if . Such -fixed deep iterates need not exist even under the strong hypothesis (ensuring that deep iterates exist at all) that IC is confined. On the other hand, if perchance a deep iterate of IC satisfies , then certainly is a deep equilibrium in our sense.
Theorem 5.3 (Existence of Deep Iterative Equilibria).
Let be confined. Then there exists at least one iterative equilibrium for IC.
Theorem 5.3 is essentially a particular case of the classical Ellis-Numakura Lemma; the proof below is standard (as in [Fur81]).
One cannot generally hope that deep iterative equilibria exist without some boundedness assumption (such as confinement). Moreover, need not take values in , so it is not even composable with itself a priori! This highlights the need to consider transitions in type rather than as maps on the layer state space .
Proof.
Let confine IC. Let be the topological closure of the semigroup of transitions by iterates of (excluding the trivial iterate ). By Proposition 4.6 (in the CCS obtained from with computations semigroup generated by ), is a compact Hausdorff topological semigroup under composition , which is continuous in the left argument (for fixed ). Elementary algebraic and topological considerations (in particular, the compactness of ), and Zorn’s Lemma, imply that has some minimal closed (nonempty) sub-semigroup (i.e., has no proper closed sub-semigroups). Fix . The set is closed (since is continuous and is closed); moreover, , so is a closed sub-semigroup, hence by minimality of . Therefore, for some . Let . Thus, is clearly a nonempty sub-semigroup of , and also closed (as the inverse image of the closed singleton under the continuous map , again). By minimality, , so . ∎
5.3. Examples and discussion of deep iterates and deep equilibria
Example 5.4.
Let be a finite set of, say, distinct elements. The choice of predicates is inessential in this context: we may simply take : it is finite and therefore realcompact. Let be any function, and (as a semigroup under composition) act on by functional application . Let (the identity function) be the sole predicate on . In this way, we obtain a (realcompact) CCS . Since is finite, there is such that (thus, ); in particular, restricts to a bijection of ; by relabeling points of if necessary, we may as well assume (). Thus, is a permutation of . Let be the order of (thus, ). Let be any integer such that and divides . Then is a deep iterative equilibrium of : indeed, for ,
It is easy to show that is the unique iterative equilibrium of in such case.
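The finite case of Example 5.4 can be checked by direct computation. The sketch below represents a self-map of a finite set as a Python dict and searches for the least iterate that is idempotent under composition; the function name `idempotent_iterate` and the particular five-point map are hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, Tuple

FiniteMap = Dict[Hashable, Hashable]


def idempotent_iterate(f: FiniteMap) -> Tuple[int, FiniteMap]:
    """Return (n, f^n) for the least n >= 1 such that f^n composed with
    itself equals f^n. The loop terminates because the iterates of a
    self-map of a finite set form a finite semigroup."""
    n, g = 1, dict(f)
    while any(g[g[x]] != g[x] for x in f):
        n += 1
        g = {x: f[g[x]] for x in f}   # g becomes the next iterate of f
    return n, g


# Point 0 is transient, points 1, 2, 3 form a 3-cycle, point 4 is fixed.
f = {0: 1, 1: 2, 2: 3, 3: 1, 4: 4}
n, e = idempotent_iterate(f)
print(n, e)  # 3 {0: 3, 1: 1, 2: 2, 3: 3, 4: 4}
```

In this instance the image stabilizes after one step and the induced permutation has order 3, so the exponent found agrees with the condition in the example that the exponent be at least the stabilization index and divisible by the order.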
Example 5.5.
Consider CCSs of the form as in 3.5.1, where is a continuous map, the semigroup of iterates of under composition, acting by functional application on . Already in this one-dimensional compact setting, there is a variety of possible behaviors of deep iterates and equilibria of IC.
If is an equicontinuous family of functions on , the Arzelà-Ascoli Theorem implies that there exists a (sub)sequence of iterates converging uniformly to a continuous limit , which is therefore a continuous deep iterate of . In general, however, even if some deep iterates are continuous, some deep equilibria may be discontinuous. Typically (and necessarily so when is a chaotic function, e.g., the logistic map ), the semigroup is not an equicontinuous collection of functions, and deep equilibria (as well as deep iterates) are necessarily discontinuous. Moreover (in contrast with the equicontinuous case, which possesses continuous deep iterates achieved sequentially), deep iterates of a chaotic IC cannot be obtained as sequential limits , but generally only as ultralimits.
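The contrast in Example 5.5 can be observed numerically at a single point. The sketch below measures how much the tail of an orbit still oscillates. It assumes the unit interval as state space, takes the logistic map with parameter 4 as the chaotic example (the exact formula intended in the text is not recoverable here), and uses a hypothetical contraction for the equicontinuous case. A finite floating-point experiment of this kind only suggests, and does not prove, non-convergence.

```python
from typing import Callable, List


def orbit(f: Callable[[float], float], x: float, n: int) -> List[float]:
    """Return [f(x), f^2(x), ..., f^n(x)]."""
    out = []
    for _ in range(n):
        x = f(x)
        out.append(x)
    return out


def tail_spread(f: Callable[[float], float], x: float,
                start: int = 100, stop: int = 300) -> float:
    """Oscillation (max minus min) of the iterates f^k(x) for start < k <= stop."""
    tail = orbit(f, x, stop)[start:]
    return max(tail) - min(tail)


def contraction(x: float) -> float:
    # Hypothetical equicontinuous example: maps [0, 1] into itself, |f'| <= 1/2.
    return 0.5 * x * (1.0 - x)


def logistic(x: float) -> float:
    # Assumed chaotic example: logistic map with parameter 4.
    return 4.0 * x * (1.0 - x)


print(tail_spread(contraction, 0.2))  # essentially zero: the iterates settle
print(tail_spread(logistic, 0.2))     # large: the iterates keep oscillating
```

For the contraction, the sequence of iterates converges and every deep iterate agrees with the ordinary limit; for the logistic map, distinct ultrafilters may pick distinct accumulation points of the oscillating orbit.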
Example 5.6 (Deep iterates and equilibria of Newton’s Method).
Fix a polynomial with (real or) complex coefficients—say, of degree . Consider the CCS
where
-
•
is the Riemann sphere, which we identify with the unit sphere via, e.g., the stereographic projection (and );
-
•
is the inverse of the stereographic projection, regarded as a triple of predicates ; and
-
•
is the transition carrying out one step of Newton’s method to find the roots of , regarded as a rational map acting on (thus, meromorphic, and hence continuous as a map ). [Footnote 27: Since , it is straightforward to verify that extends continuously to by when either or .]
Since is compact and is a homeomorphic embedding, in fact is equal to the shard (in particular, is realcompact); thus, is automatically confined (by consisting of the single sizer ).
Let be any deep iterate of . At any point for which the Newton method converges to a root of (in particular, at any sufficiently close to a simple such root ), we have (, since ). We also have ; however, is a repeller (this follows from the easy calculation that as ), so one would expect points with to be quite scant. [Footnote 28: Perhaps surprisingly, it is possible for the fixed repeller to be an accumulation point of orbits . This is the case, e.g., for the polynomial .] In general, however, is not a root of , although any such is necessarily a topologically recurrent point of under . At any rate, if has at least two distinct roots, any deep equilibrium (or deep iterate) of is discontinuous.
Many examples of polynomials for which Newton’s method converges for a very large set of inputs are known. The most one can hope for is that the method converges to a root for all inputs except those in a (say) closed subset of “bad” inputs (in particular, ) which, in the best of cases, is nowhere dense; such is the case, e.g., for , where is perhaps the best-known example of a Newton fractal. All deep iterates and equilibria have the same value at (convergent) inputs , and the common restriction of all such to is continuous. However, deep iterates and equilibria are typically discontinuous at inputs , where their values also differ. Intuitively, deep iterates , giving distinct values, are merely picking different subsequential limits of the divergent sequence .
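The behaviour described in Example 5.6 can be explored with a short experiment. The sketch below assumes the cubic polynomial z^3 - 1 (a common choice for Newton fractals, used here as an illustrative assumption), works with finite nonzero complex numbers only, and ignores the point at infinity; the helper names are hypothetical.

```python
import cmath
from typing import List

# The three roots of the assumed polynomial p(z) = z**3 - 1.
ROOTS: List[complex] = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]


def newton_step(z: complex) -> complex:
    """One Newton step for p(z) = z**3 - 1 (valid for finite z != 0)."""
    return z - (z**3 - 1) / (3 * z**2)


def iterate(z: complex, n: int = 60) -> complex:
    """Apply the Newton step n times."""
    for _ in range(n):
        z = newton_step(z)
    return z


def nearest_root(z: complex) -> int:
    """Index k of the root closest to z."""
    return min(range(3), key=lambda k: abs(z - ROOTS[k]))


# Three starting points, each chosen close to (or on the real axis toward) a root.
starts = [2 + 0j, -0.5 + 0.9j, -0.5 - 0.9j]
labels = [nearest_root(iterate(z)) for z in starts]
print(labels)
```

At inputs where the method converges, every deep iterate takes the value of the limiting root, so it is locally constant there. Since more than one root is attained and the sphere is connected, such a map cannot be continuous everywhere, in line with the closing remark of the example.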
Example 5.7.
The definitions of deep layer state and deep iterative equilibrium above are motivated by the notions of “Deep Equilibrium (DE)” in [BKK19]. However, iterative computations in [BKK19] allow “feeding” the initial state as an argument at each iteration by a (“parameter-tied”, i.e., fixed) layer transformation. Capturing deep iterative equilibria in this sense requires generalizing the notion of CCS. One way to capture the deep equilibria of Bai et al. is allowing to be a CCS with -ary (in fact, just binary) layer transformations as in Remark 3.4(2). Indeed, fix a binary , which induces a two-argument layer transition . Consider the map given by (the first entry of is simply a pass-through of the first argument, while the second entry applies the computation ). Then the second (nontrivial) entry of the iterates for represents the evolution of the computation passing through, at each step, the original argument as the first of two inputs.
If is realcompact and is -confined (i.e., restricts to a map for all ), the proof of Theorem 5.3 is adapted mutatis mutandis to computations in CCSs with -ary transitions. One shows thus the existence of deep equilibria, i.e., of idempotent maps arising as ultralimits of the iterates sequence of evolution by . (Without a realcompactness assumption, one needs suitable hypotheses on akin to Sh-extendibility.)
As an alternative to the use of CCSs with -ary computations, in Appendix A.1.3, we introduce the notion of Parametrized Family of Computations (PFC) to capture computations with feed-through in our framework. The ability to compute deep equilibria in an effective sense, as in Bai et al., presupposes that such equilibria are definable not merely in a continuous, but in a differentiable sense (allowing the use of generic solver, or fixed-point, algorithms, which typically rely on gradient-based methods, e.g., Newton’s algorithm and refinements); we explain how such considerations of differentiability may be handled in CSS with finitely many predicates (considerations of differentiability when infinitely many observables are involved entail delicate analysis beyond the scope of this paper).
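The pass-through construction of Example 5.7 can be sketched as follows. The binary layer `g` below is a hypothetical scalar stand-in (a tanh layer with a fixed weight of 0.5, chosen so that it contracts in its second argument); it is not the layer of Bai et al. The map `G` keeps the original input in its first entry and applies the binary computation in its second.

```python
import math
from typing import Tuple


def g(x: float, z: float) -> float:
    """Hypothetical weight-tied binary layer: x is the injected input,
    z is the current hidden state. The weight 0.5 is an assumption."""
    return math.tanh(0.5 * z + x)


def G(pair: Tuple[float, float]) -> Tuple[float, float]:
    """First entry passes x through unchanged; second entry applies g."""
    x, z = pair
    return (x, g(x, z))


def iterate_G(x: float, z0: float = 0.0, n: int = 200) -> Tuple[float, float]:
    """Apply G n times starting from (x, z0)."""
    pair = (x, z0)
    for _ in range(n):
        pair = G(pair)
    return pair


x, z_star = iterate_G(0.3)
print(x, z_star, abs(z_star - g(x, z_star)))  # residual of the fixed-point equation
```

Because this particular `g` is a contraction in `z`, the second entry converges to a state fixed by `g(x, .)`; for general layers no such convergence is guaranteed, which is where ultralimits enter.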
Remarks 5.8.
-
(1)
The results in Section 6 below say nothing about effectively computable features of (shard-)discontinuous deep iterates or equilibria such as those arising from Newton’s method iterations in Example 5.6 above. In an upcoming article, we extend the present results to discontinuous ultracomputations that are nevertheless de facto effectively computable in a localized sense.
-
(2)
Even in situations where, say, a deep iterate does not quite exist, an ultracomputation may have “meaningful deep features” in a sense that we now explain. Consider any CCS (not necessarily realcompact), where is any given (continuous) computation. For a fixed , say that has uniformly -bounded iterates on if there exists such that for all . (Note that this hypothesis does not—at all—impose bounds on other entries for .) If is any nonprincipal ultrafilter on , the iterate boundedness hypothesis and the compactness of intervals imply that exists for all . In principle, however, the iterates need not -converge in (i.e., pointwise on ) even if is realcompact, since (for fixed ) the sequence may not be entry-wise bounded (only bounded “in -th entry”, so to speak).
The study of aspects of deep equilibria introduced in Remarks 5.8 is quite delicate, and exceeds the scope of the present paper.
6. Explicit computability
Throughout this section, we fix a CCS . We assume has an identity element acting as the identity map on . We reiterate the Extendibility Axiom that each layer transition extends to a Sh-continuous transition .
We shall implicitly identify a predicate symbol with the real-valued function interpreting it in , and also implicitly extend to a (unique continuous) function .
A real function will be called shard-bounded (sh-bdd) (resp., Sh-continuous) if its restriction to each shard is bounded (resp., continuous). (A Sh-continuous such function is necessarily sh-bdd.)
6.1. Polynomials in predicates and definability. Features of layer transitions.
-
•
Any predicate will also be called a monomial. [Footnote 29: In real-valued logic, the monomials above are called “atomic”.]
-
•
A polynomial is any function obtained by combining real constants and monomials using any (recursive) combination of the following operations, called connectives:
-
–
Addition: (where );
-
–
Multiplication: (where ).
The monomials appearing in an expression of some polynomial may be called its atoms. [Footnote 30: A polynomial need not have a unique expression in terms of monomials, so it is more accurate to say that has an expression involving certain specific monomials.]
-
•
A definable predicate is a function whose restriction to an arbitrary shard is uniformly approximable by polynomials; thus, is definable iff for every and sizer there exists a polynomial such that for all . The family is called a definition scheme for . [Footnote 31: The notion of “definable predicate” above is less restrictive than the (most) standard one in real-valued logic, wherein approximability is required to hold uniformly over the entire set (“universe”) .]
We only require definable predicates to be uniformly approximable on shards—not uniformly on the full state space .
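A toy instance may help fix ideas about definition schemes. The sketch below assumes the real line as state space with the identity as the sole predicate, intervals [-r, r] as shards, and the exponential function as the predicate to be defined. Taylor polynomials serve as the approximating polynomials, and the supremum is estimated on a finite grid; all of these are illustrative assumptions rather than part of the formal framework.

```python
import math
from typing import Callable


def taylor_exp(degree: int) -> Callable[[float], float]:
    """Degree-n Taylor polynomial of exp at 0, evaluated by Horner's rule."""
    coeffs = [1.0 / math.factorial(k) for k in range(degree + 1)]

    def p(x: float) -> float:
        acc = 0.0
        for c in reversed(coeffs):
            acc = acc * x + c
        return acc

    return p


def sup_error(p: Callable[[float], float], r: float, samples: int = 2001) -> float:
    """Grid estimate of the sup of |exp(x) - p(x)| on the shard [-r, r]."""
    grid = [-r + 2.0 * r * i / (samples - 1) for i in range(samples)]
    return max(abs(math.exp(x) - p(x)) for x in grid)


def definition_scheme(eps: float, r: float, max_degree: int = 200) -> int:
    """Least Taylor degree whose estimated error on [-r, r] is at most eps."""
    for n in range(max_degree + 1):
        if sup_error(taylor_exp(n), r) <= eps:
            return n
    raise ValueError("no polynomial of degree <= max_degree suffices")


print(definition_scheme(1e-6, 1.0), definition_scheme(1e-6, 5.0))
```

The required degree grows with the size of the shard, and no single polynomial approximates the exponential uniformly on the whole line. This is the reason approximability is demanded shard by shard.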
Remarks 6.1.
-
(1)
Definable predicates formalize a notion of “explicit computability” of , in a certain local and approximate sense. Namely, given (i) any “approximation error” , and (ii) some a priori knowledge of the argument (i.e., knowing that belongs to a specific shard —this is the sense of “locality” of the computation), one may regard the -uniformly approximating formula to on as an explicit algorithm that (modulo an approximation error not exceeding ) computes . Numerical algorithms relying on floating-point operations are typically definable in the above sense: On the one hand, one must ensure that the calculation is stable under rounding errors (of the order of the machine’s ); on the other, such rounding errors on inputs potentially may lead to arbitrarily large output error unless the magnitude of inputs is bounded (i.e., unless the inputs belong to a given shard) a priori.
-
(2)
By the definition of the topologies on and , every monomial is continuous and bounded by on any shard , and extends continuously to (as the -coordinate function). Since connectives are obtained by pointwise application of continuous real-valued functions of real arguments (addition and multiplication), every polynomial on is also continuous, and extends to a continuous bounded function on type-shards . Definable predicates, on the other hand, need not be continuous on , although their restrictions to shards necessarily are continuous and bounded (being uniform limits of polynomials on , which is compact).
-
(3)
Let be an arbitrary sizer. The restriction of a monomial to the type-shard admits the a priori bound , so that takes values in . [Footnote 32: A constant also admits the trivial bound .] By recursion on the application of connectives leading to an arbitrary polynomial from monomials, a priori bounds such that takes values in are easily found. (Recursively apply the rules: , and .)
-
(4)
By definition of the topology on and the Reduction Axioms, the collection of (continuous) predicates (extended to the type space ) separates points of (a fortiori, points of any shard ). By the Stone-Weierstrass Theorem, any Sh-extendable , is necessarily definable. (In particular, any continuous is definable in such case.) Clearly, the condition may be relaxed to requiring that have continuous restrictions to type-shards for in some exhaustive . By contrast, continuous predicates need not be definable.
-
(5)
In general, a function whose restrictions to shards are continuous need not be continuous on (not even under the additional assumption that be realcompact). For (at most) countable, however, Sh-continuous real functions on the type space are continuous (since is a k-space in such case, by Proposition 4.1).
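The recursive computation of a priori bounds mentioned in Remarks 6.1(3) can be sketched on expression trees. The rules used below (bounds add under addition and multiply under multiplication) are the standard triangle-inequality estimates and are assumed here, since the exact rules are not legible in the text; the class names and the dict-based representation of sizers and states are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class Atom:
    name: str  # name of a predicate (monomial)


@dataclass(frozen=True)
class Add:
    left: "Poly"
    right: "Poly"


@dataclass(frozen=True)
class Mul:
    left: "Poly"
    right: "Poly"


Poly = Union[Const, Atom, Add, Mul]


def bound(p: Poly, sizer: Dict[str, float]) -> float:
    """A priori bound for |p| on the shard where |P| <= sizer[P] for each atom P."""
    if isinstance(p, Const):
        return abs(p.value)
    if isinstance(p, Atom):
        return sizer[p.name]
    if isinstance(p, Add):
        return bound(p.left, sizer) + bound(p.right, sizer)
    if isinstance(p, Mul):
        return bound(p.left, sizer) * bound(p.right, sizer)
    raise TypeError(f"not a polynomial: {p!r}")


def evaluate(p: Poly, state: Dict[str, float]) -> float:
    """Value of p at a state given by its predicate values."""
    if isinstance(p, Const):
        return p.value
    if isinstance(p, Atom):
        return state[p.name]
    if isinstance(p, Add):
        return evaluate(p.left, state) + evaluate(p.right, state)
    if isinstance(p, Mul):
        return evaluate(p.left, state) * evaluate(p.right, state)
    raise TypeError(f"not a polynomial: {p!r}")


# p = 2 * P1 + P1 * P2 on the shard |P1| <= 3, |P2| <= 5.
p = Add(Mul(Const(2.0), Atom("P1")), Mul(Atom("P1"), Atom("P2")))
print(bound(p, {"P1": 3.0, "P2": 5.0}))          # 21.0
print(evaluate(p, {"P1": -3.0, "P2": 5.0}))      # -21.0, so the bound is attained
```

The bound depends on the chosen expression of the polynomial, not only on the function it represents, which matches the caveat that a polynomial need not have a unique expression in terms of monomials.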
6.1.1. Definable features
(Remarks 5.8 provide relevant context for this subsection.)
Given , the -feature of a transition-in-type is the real-valued function
(One may call such a feature “atomic” or “monomial”.)
Individual features of a transition-in-type may be definable or non-definable. A transition-in-type is definable if its features are definable.
In the setting of Section 5.2.3, one may ask under what circumstances a specific feature of a deep computation is effectively computable.
Sh-continuous features of transitions are definable, by Remark 6.1(4).
6.2. Definability of ultracomputations-in-type
Nonprincipal ultrafilters on infinite sets are ineffably inexplicit. Thus, as a first step towards grasping ultracomputations, it is natural to consider ultralimits of pointwise-bounded sequences indexed by the infinite countable set . Ultracomputations obtained in this form (as and vary) are accumulation points of arbitrary countable sets of realized computations.
Ultralimits obtained from countable subsets of , although less general than those obtained from arbitrary subsets, may still be quite complex. Given a countable set of pointwise-bounded computations, it is natural to consider sequential limits of , i.e., ultracomputations arising as pointwise limits of subsequences of , namely ultracomputations of the form
for subsequences (otherwise arbitrary) of .
By Proposition 4.5, pointwise-boundedness of implies that all ultracomputations exist for arbitrary on the index set of any family —regardless of the cardinality of or . Ultracomputations realizable from sequences are quite special; those realizable as sequential limits , even more so.
If is a pointwise-bounded sequence, and is at most countable, then at every fixed , the ultralimit is realized as a sequential limit (by a standard diagonalization argument); however, the realizing subsequence will typically depend on and cannot be chosen uniformly over . When is uncountable, sequentially realizing an ultralimit of —even at a single point —may be unfeasible.
The results in this concluding section relate (i) continuity on shards of ultracomputations, (ii) the ability to obtain such ultracomputations as accumulation points of countable sets of computations, or as sequential limits of computations, (iii) the definability of such ultracomputations, and (iv) a limit-exchange criterion (originally due to Grothendieck).
6.2.1. Relative compacta of continuous layer transitions
For any topological space , let be the set of all continuous real functions on , endowed with the relative (subspace) topology of the product , i.e., the topology of point-wise convergence at each . More generally, given two spaces , the space is the subspace of the product consisting of continuous maps . (“” means “pointwise topology on continuous functions”.)
Note that are generally not closed subspaces of .
A Hausdorff topological space is countably compact if every infinite (equivalently, every infinite countable) subset has a limit point . A subset is relatively countably compact (or countably compact in ) if every infinite (equivalently, every infinite countable) subset has a limit point .
(One may take the properties above as the definition of (relatively) countably compact for arbitrary, not necessarily Hausdorff spaces . However, the Hausdorff assumption implies desirable additional properties, e.g., [Eng89, Theorems 3.10.2, 3.10.3, etc.]. In our applications, is always a subspace of the layer state space of a CCS, or of the type space , and hence Hausdorff.)
A topological space is angelic if (i) every relatively countably compact subset is relatively compact, and (ii) the closure of any such (relatively compact) consists precisely of limits of sequences in . [Footnote 33: A topological space possessing property (ii) above is called Fréchet-Urysohn.]
6.2.2. A topological result of Grothendieck
Theorem 6.2.
Let
-
•
be a countably compact topological space;
-
•
, any Tychonoff space, having the property that its relatively countably compact subsets are relatively compact (which necessarily holds in case is realcompact);
-
•
any dense subset.
Then:
-
(1)
is angelic.
-
(2)
Assume that is explicitly embedded as a subspace for some index set . A set of continuous maps is relatively compact if and only if
-
(a)
is pointwise bounded (i.e., is bounded for each and ), and [Footnote 34: Here, we use the notation for the “-th coordinate” of any .]
-
(b)
for all sequences , , any and ultrafilters on , the following equality (called the limit-exchange property) holds between iterated ultralimits:
(6.1) which both exist.
-
(3)
Even if all hypotheses on pertaining to compactness are omitted (i.e., is Tychonoff and arbitrary), the limit-exchange condition (b) alone implies that every accumulation point of is continuous (i.e., the closure ).
For a contemporary exposition of Grothendieck’s theorem and its consequences, we refer the reader to the paper on angelic spaces and the double limit relation by König and Kuhn [KK87].
Proof.
Theorem 6.2 aggregates several results in Grothendieck’s “Critères de compacité” [Gro52, Théorèmes 1 & 2, Remarque 2, Corollaire 2]. Presently, we merely offer some remarks on translating the French terms and decades-old nomenclature into their contemporary equivalents in English. Spaces (where “s” refers to the “simple” topology, i.e., of pointwise convergence) are now denoted (or just , when ). “(Relativement) semi-compact” (resp., “relativement compact”) refers to (relatively) countably compact (resp., relatively compact) sets. Functions take values in , which we take to be a Tychonoff space (“complètement régulier”, i.e., completely regular and Hausdorff in the standard contemporary sense) endowed with an embedding into a product , hence is a uniform Hausdorff space (“espace uniforme séparé”) [Eng89, Sections §1.5, §3.10, §8.1]. ∎
Remarks 6.3.
-
(1)
Condition (a) above implies that both iterated ultralimits in equation (6.1) in (b) exist. However, (b) explicitly asserts the requirement that the limits exist, not merely that they are equal when they exist.
-
(2)
The hypotheses on are satisfied if is realcompact, in which case the embedding is as a closed subspace of the product; moreover, any embedded as a closed subspace of any such product of lines satisfies all hypotheses (including those in part (2) of the theorem).
6.2.3. The Fundamental Theorem of Definability
Theorem 6.4.
Let be a CCS. Let be an exhaustive sizer collection, and let be any -confined set (of Sh-extendable computations, by assumption). Then, the properties below are equivalent:
Extendable Ultracomputations (uExt)
Every ultracomputation over is Sh-extendable.
Limit Exchange (LE)
For all sizers , all sequences and , and ultrafilters on , the iterated ultralimits and both exist and are equal:
(6.2) |
Uniform Approximation (UA)
Every ultracomputation over is definable without parameters: For any sizer , any , and all , there exists a polynomial (without parameters) such that
(6.3) |
Moreover:
-
(1)
In case any (hence all) of the above conditions hold for , the restriction of any ultracomputation over to any type-shard is the limit obtained from a sequence .
-
(2)
For arbitrary (i.e., not a priori included in for some exhaustive ), the Limit Exchange condition alone implies that all ultracomputations over are Sh-extendable. [Footnote 35: The explicit LE hypothesis that both iterated ultralimits in (6.2) exist is essential when and the implied pointwise bounds on are not given a priori.]
Proof of Theorem 6.4.
Because of the hypothesis , it is quite clear that one may specialize all uses of sizers and universal properties of sizers to involve sizers only.
In Grothendieck’s Theorem 6.2, let (realcompact) and, for a momentarily fixed , let (compact, hence countably compact), and . Denote by the set of functions as varies. By Theorem 6.2, the condition that all ultracomputations over are continuous on is equivalent to the relative compactness of .
Since is angelic (Theorem 6.2(1)), assertion (1) follows.
The pointwise boundedness condition 2(a) in Theorem 6.2 is satisfied since is pointwise bounded (as is uniformly confined by assumption); therefore, relative compactness of is, in turn, characterized by the Limit Exchange condition (equivalent to 2(b)), so LE is equivalent to the preceding three conditions. Moreover, assertion (2) follows from Theorem 6.2(3).
Any feature of any transition-in-type , if uniformly approximable on some shard by polynomials —any of which has a unique extension to a continuous real function on , bounded on —must necessarily extend continuously to . Letting and vary, we see that a definable ultracomputation is necessarily Sh-continuous: UA implies uExt. Reciprocally, by the Stone-Weierstrass Theorem, every continuous real function is uniformly approximable by polynomials in predicates (because these predicates separate points of ), i.e., by polynomials without parameters. Therefore, any Sh-continuous ultracomputation is definable without parameters: uExt implies UA. ∎
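A classical one-dimensional instance shows what a failure of the Limit Exchange condition looks like. The sketch below assumes the unit interval as state space, the power maps x -> x^n as the family of computations, and the sequence of states 1 - 1/m; the iterated limits are approximated by taking one index much larger than the other. For convergent sequences such as these, ultralimits along nonprincipal ultrafilters coincide with ordinary limits.

```python
def f(n: int, x: float) -> float:
    """The n-th computation in the (assumed) family: x -> x**n on [0, 1]."""
    return x ** n


def state(m: int) -> float:
    """The m-th state in the (assumed) sequence, converging to 1."""
    return 1.0 - 1.0 / m


# Inner limit over states first: for fixed n, f(n, state(m)) -> f(n, 1) = 1.
states_first = f(10, state(10**9))

# Inner limit over computations first: for fixed m, f(n, state(m)) -> 0.
computations_first = f(1000, state(10))

print(states_first, computations_first)  # close to 1 and close to 0, respectively
```

The two iterated limits are 1 and 0, so the limit-exchange equality fails. Correspondingly, the pointwise limit of the family (0 on [0, 1) and 1 at the endpoint) is a discontinuous accumulation point, as the equivalence between LE and uExt predicts.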
Remarks 6.5.
-
(1)
The extendibility condition (uExt) in Theorem 6.4 may be regarded as auxiliary in proving the equivalence LE ⇔ UA. The implication UA ⇒ LE is not difficult to prove directly: On the one hand, UA ⇒ uExt by the straightforward argument in the proof above. Afterward, uExt ⇒ LE follows easily: uExt implies that every ultracomputation is continuous on any compact , and LE simply states the continuity of at ultralimit points of the form for arbitrary state sequences .
By contrast, the implication LE ⇒ UA may be seen as a significantly deeper consequence of Grothendieck’s Theorem: A natural limit-exchange condition implies that layer transformations-in-type are explicitly computable!
-
(2)
One could take a probabilistic approach to the uniqueness and computability of equilibria inspired by ideas from deep learning and the Examples 5.5 and 5.6 in Section 5.3. For simplicity, assume that is realcompact (so ). The uniqueness and continuity of deep iterates at a state may be tested empirically by taking finitely many independent random points in a small neighborhood of and computing for some large and also random integers . To the extent that the points are (or are not) near each other, one may infer (in a statistical sense) whether is (or is not) continuous at with increasingly larger probability as grows. At points of continuity (as determined with high probability taking sufficiently large), any of the computed points may be regarded as an approximation to the exact and unique value . This approach hints at a relativized notion of computability based on almost-everywhere (or at least local) continuity rather than everywhere continuity, which we intend to revisit in a sequel paper.
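The empirical test outlined in Remarks 6.5(2) can be sketched for scalar maps. The function name `sample_spread`, the neighborhood radius, the range of iteration counts, and the two test maps (cosine as a well-behaved case, the logistic map with parameter 4 as a chaotic one) are all assumptions chosen for illustration; a fixed seed keeps the experiment reproducible.

```python
import math
import random
from typing import Callable


def sample_spread(f: Callable[[float], float], x: float, radius: float = 1e-6,
                  samples: int = 20, n_min: int = 200, n_max: int = 400,
                  seed: int = 0) -> float:
    """Draw random points near x, iterate each a random (large) number of
    times, and return the spread (max minus min) of the resulting values."""
    rng = random.Random(seed)
    values = []
    for _ in range(samples):
        y = x + rng.uniform(-radius, radius)
        for _ in range(rng.randint(n_min, n_max)):
            y = f(y)
        values.append(y)
    return max(values) - min(values)


def logistic(x: float) -> float:
    return 4.0 * x * (1.0 - x)


print(sample_spread(math.cos, 0.2))  # tiny spread: consistent with continuity
print(sample_spread(logistic, 0.2))  # large spread: evidence against continuity
```

A small spread is only statistical evidence of continuity and uniqueness of the deep iterate at the given state, while a large spread suggests that different ultrafilters would yield different values there.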
Appendix A Smooth Ultracomputations and Effectively Computable Equilibria in Deep Neural Networks
Extending the framework of the main body of the paper, one may introduce smooth (ultra)computations as those having output features varying smoothly (i.e., differentiably) with the input features. Considerations of differentiability—particularly in infinite dimension—are very delicate and exceed the scope of this current paper (after all, our notions of extendibility and definability only capture continuity properties). Since differentiability is an essential assumption in current approaches to effective/implicit computability of deep neural networks, this appendix is a brief and informal outline on extensions to our framework beyond the present topological context so as to capture differentiability.
Throughout this appendix, we fix a realcompact CCS whose layer states space is a differentiable (smooth) manifold of finite dimension , and all predicates are differentiable on .
In particular, we assume that the embedding is as a closed subspace (in the product topology).
A.1. Deep equilibria of neural networks à la Bai-Kolter-Koltun
A.1.1. Unique Deep Equilibria
An empirical observation in the context of Neural Network Deep Equilibrium Models [BKK19] is that, in situations where a deep iterate of some computation (assumed confined, for simplicity) exists, it is often independent of the ultrafilter . [Footnote 36: Implicitly, both [CRBD18] and [BKK19] work in a setting where the states space is realcompact, so there is no distinction between transforms and transitions-in-type.] In such case, all deep iterates are one and the same transition , a “deep state” of the NN obtained by iteration of . Therefore, the sequence of iterates converges pointwise to the t-t as an ordinary limit (rather than only as an ultralimit). We say that such has the Unique Deep Equilibrium (UDE) Property. Smoothness properties of are required for important applications, as described below.
A.1.2. Fixed-point algorithms as “Black Boxes”
Bai et al. note (empirically) that, for NNs obtained by iterating a common “weight-tied” layer transition , the deep state takes any input state to another that is fixed by (the t-t implied by) the original , i.e., ; in other words, takes values in the set Fix, so is a deep equilibrium (DEQ) in a very strong sense. Empirical findings also suggest that, given , the DEQ state may be well approximated by some generic “black-box” fixed point algorithm . Such an algorithm should take as inputs the transformation and initial state-in-type , and return the fixed point .
Like any algorithm based on floating-point arithmetic, what such an algorithm does in practice, given an acceptable error and finitely many output features specified in advance, is to return a suitable -tuple of real numbers such that, for , . Under our current assumption that is smooth of dimension , all features of the output are (heuristically speaking, and perhaps only locally) implicitly defined in terms of some -many input features , . Moreover, a generic such algorithm typically assumes that the given map is not merely continuous but smooth (or at least sufficiently differentiable), and relies on gradient-based methods.
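To make the input/output behavior of such a “black box” concrete, here is a minimal sketch. It assumes a state space of finite dimension represented by a list of floats, and it uses plain successive iteration as the solver, which is merely the simplest choice and is not claimed to be the solver used in [BKK19]. The function `fixed_point_features`, the toy map `layer`, and all parameter names are hypothetical.

```python
import math


def fixed_point_features(T, x0, eps, feature_indices, max_iter=10_000):
    """Approximate a fixed point of T starting from x0 and return only the
    requested output features, as a tuple of floats.

    Iteration stops once two successive states differ by less than eps in
    every coordinate."""
    x = list(x0)
    for _ in range(max_iter):
        x_next = T(x)
        residual = max(abs(a - b) for a, b in zip(x_next, x))
        x = x_next
        if residual < eps:
            return tuple(x[i] for i in feature_indices)
    raise RuntimeError("no convergence within max_iter iterations")


def layer(x):
    """Hypothetical weight-tied layer on R^2; a contraction with constant 1/2."""
    return [0.5 * math.tanh(x[1]) + 0.3, 0.5 * math.tanh(x[0]) - 0.2]


# Request both features of the equilibrium, up to an error of about 1e-10.
state = fixed_point_features(layer, [0.0, 0.0], 1e-10, [0, 1])
print(state)
```

Because `layer` is a contraction with constant 1/2, the stopping rule guarantees that each returned feature is within `eps` of the corresponding feature of the exact fixed point; for maps that are not contractions no such guarantee is implied by this sketch.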
In principle, evaluating (or at least approximating) the map by means of a “black box” results in comparable computational complexity, or even savings, over the direct method of computing successive iterates until a limit is (very nearly) reached. Memory savings in training DE networks (cf. Section A.1.4 below) are also key to their success. From a theoretical perspective, the innovation lies in effectively bringing deep networks (at least, when obtainable as iterative deep equilibria) on par with classical networks, thereby enriching the class of directly and efficiently computable functions.
A.1.3. Parametrized Families of Computations
Fix a CCS with underlying CSS as layer states space, as well as a second CSS , called the space of computation parameters, and a map , which we regard as a parametrization of (some) computations by elements (parameters) . We make the same assumptions about as about above (namely, is a finite-dimensional differentiable manifold embedded as a closed subspace of ). We call the structure a Parametrized Family of Computations (PFC) (all of which are confined). It is appropriate to think of the parameter as the “weights” of the computation .
We assume that has only confined transitions. It is quite natural to assume that is (i) continuous (as a map ), and (ii) confined, i.e., restricts to maps .
A UDE hypothesis for implies a map which may also be regarded as a map
A.1.4. Training deep networks
Training the deep neural network translates to finding weights such that satisfies a given condition, which we presently take to mean minimizing a given/specified real-valued loss function . (At least intuitively, if not necessarily literally, the value captures how far a transition is from an optimal/idealized .) Regarding for fixed as implicitly defined by either a fixed-point condition or ODE as above, the enormous memory cost of back-propagation through layers [Footnote 37: Not least because back-propagation would involve an unbounded number of ordinary layers to begin with!] is replaced by that of minimizing the function . Note that is merely a new real-valued predicate on the parameters CSS . Assuming that is shard-continuous, it is definable, hence depends de facto on only finitely many features of its input (up to an arbitrarily small admissible error ). Assuming is smooth as well, the deep network may be trained using standard/“black-box” gradient-based procedures to find a minimizer for . However, we note that it is essential for to be definable in order to allow even the possibility that some algorithm involving floating-point arithmetic and finitely many real quantities at a time succeeds in finding the minimizer.
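The following toy computation illustrates how a gradient of the loss may be obtained from the equilibrium alone, without storing intermediate iterates. It is a sketch under assumptions of our own choosing: a scalar state, a hypothetical layer x ↦ tanh(w·x + u) with a single weight w, a squared-error loss, and the implicit function theorem applied to the fixed-point condition. None of these specifics are taken from the paper or from [BKK19].

```python
import math


def solve_fixed_point(w, u, tol=1e-13, max_iter=10_000):
    """Successive iteration for x = tanh(w*x + u); a contraction when |w| < 1."""
    x = 0.0
    for _ in range(max_iter):
        x_next = math.tanh(w * x + u)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


def loss_and_grad(w, u, target):
    """Squared-error loss of the equilibrium and its derivative in w.

    Differentiating x = tanh(w*x + u) implicitly gives
        dx/dw = s*x / (1 - s*w),  with  s = 1 - x**2,
    so only the equilibrium x itself is needed, not the iterates."""
    x = solve_fixed_point(w, u)
    s = 1.0 - x * x  # derivative of tanh at the equilibrium, since tanh(...) = x
    dx_dw = s * x / (1.0 - s * w)
    return (x - target) ** 2, 2.0 * (x - target) * dx_dw


u = 0.3                                 # fixed input feature
target = solve_fixed_point(0.5, u)      # equilibrium produced by the weight 0.5
w = 0.0                                 # initial guess for the weight
for _ in range(500):                    # plain gradient descent, step size 1
    _, grad = loss_and_grad(w, u, target)
    w -= grad

print(w)
```

In this example gradient descent recovers a weight close to 0.5. The memory used by each gradient evaluation does not grow with the number of iterations performed by the solver, which is the point of the implicit approach.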
A.2. Neural ODEs à la Chen-Rubanova-Bettencourt-Duvenaud
In another setting that is technically different but conceptually closely related to the one in §A.1, Chen et al. [CRBD18] also model deep states of residual networks (“Neural ODEs”) using differential equation techniques. The intuition behind Neural ODEs is the following: Consider a layered computation with atomic steps sequence such that all such steps are “residually” very small (in the sense that the input and output features of any atomic step differ very little). Successive -composites change very little with ; as one varies in such a way that the atomic steps residually vanish (i.e., is vanishingly close to ) and allows to grow without bound, when a limit exists, Chen et al. model it as a family (indexed by a real “time” variable ) of transitions , which we assume to be (confined) elements of . (The real variable captures an appropriate asymptotic rescaling of the “discrete time” .) In this manner, each value captures a specific notion of deep state (as an asymptotic limit of deep composites of residually small layered transitions), realized as a confined transition.
One may hope that such transitions vary differentiably with ; this suggests modeling the entire family of deep computational states by the implied differential equation. (In this manner, for each fixed , one obtains a deep network in some sense.)
Thus, “Neural ODEs” arise from differential equations of the form
(A.1) |
where is a section of the tangent space of the layer state space (i.e., for all and , where is the tangent space of at ). Interpreting the ODE (A.1) hinges on the smooth manifold structure assumed of the state space . [Footnote 38: The notions of differentiable structure and tangent space on an arbitrary layer space are neither well nor uniquely defined when is not finite-dimensional; their formalization would require much stronger assumptions on , as well as the formalism of Banach spaces for tangent spaces .] Chen et al. illustrate empirically the feasibility and effectiveness of modeling deep equilibria by Neural ODEs. Let us denote the time- evolution by (A.1) using the (hopefully suggestive) notation , i.e., is the deep equilibrium of the Neural ODE solving (A.1) (i.e., “” in the earlier informal discussion). [Footnote 39: When the section depends only on the state (not on time ), the ODE (A.1) is autonomous. A time evolution of such an autonomous Neural ODE is analogous to a “parameter-tied” deep equilibrium after Bai et al.] Effective computation of relies on a generic “black-box” ODE solver algorithm . Such an algorithm should take as inputs the section , initial state-in-type and time , and return the output . (More realistically, such presumably would return approximate values for any finitely many specified features of ; refer to the discussion of above.)
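The relation between residually small layers and an ODE solver may be illustrated numerically. The sketch below assumes a state space R^d represented by a list of floats; `ode_solve` is a hypothetical fixed-step Runge–Kutta (RK4) stand-in for the “black-box” solver, `residual_composite` composes n residual layers of the form x ↦ x + (T/n)·F(x, t) (that is, forward Euler steps), and `decay` is a toy vector field chosen only because its flow is known in closed form.

```python
import math


def ode_solve(F, x0, T, steps=100):
    """Fixed-step classical RK4 integration of dx/dt = F(x, t) from 0 to T."""
    h = T / steps
    x, t = list(x0), 0.0
    for _ in range(steps):
        k1 = F(x, t)
        k2 = F([xi + 0.5 * h * k for xi, k in zip(x, k1)], t + 0.5 * h)
        k3 = F([xi + 0.5 * h * k for xi, k in zip(x, k2)], t + 0.5 * h)
        k4 = F([xi + h * k for xi, k in zip(x, k3)], t + h)
        x = [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t += h
    return x


def residual_composite(F, x0, T, n):
    """Composite of n residual layers x -> x + (T/n) * F(x, t)."""
    h = T / n
    x, t = list(x0), 0.0
    for _ in range(n):
        x = [xi + h * fi for xi, fi in zip(x, F(x, t))]
        t += h
    return x


def decay(x, t):
    """Toy autonomous vector field F(x, t) = -x, whose flow is x0 * exp(-t)."""
    return [-xi for xi in x]


exact = math.exp(-1.0)
solver_error = abs(ode_solve(decay, [1.0], 1.0)[0] - exact)
layer_errors = [abs(residual_composite(decay, [1.0], 1.0, n)[0] - exact)
                for n in (10, 100, 1000)]
print(solver_error, layer_errors)
```

As the number of residual layers grows and each layer becomes residually smaller, the composite approaches the time-1 evolution of the ODE, which the solver approximates directly with far fewer evaluations of the vector field.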
Modeling deep computations by Neural ODEs and realizing them by means of an ODE solver effectively brings them on computational par with classical neural networks. The key insight of Chen et al. (which predates the work of Bai et al.) is that training such Neural ODEs may be done using the “adjoint sensitivity” method of Pontryagin instead of doing (extremely memory-intensive) back-propagation through layers, which at any rate have been essentially abstracted away. The adjoint sensitivity method may be implemented using itself, so the training is both memory- and computation-efficient. Formalizing their method to train Neural ODEs in the spirit of §A.1.4 above requires parametrizing sections by a second CSS ; we omit the details.
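For orientation, the adjoint sensitivity computation can be carried out by hand in a scalar toy case. The sketch below assumes the hypothetical ODE dx/dt = −θx with a single parameter θ and the loss ½·x(T)²; the solver `rk4` and all names are ours, and the example is chosen only because the exact gradient, −T·x₀²·exp(−2θT), is available for comparison. The adjoint a(t) satisfies da/dt = −a·∂f/∂x, and the gradient accumulates −a·∂f/∂θ while the solver runs backward from T to 0.

```python
import math


def rk4(F, y, t0, t1, steps):
    """Fixed-step RK4 for dy/dt = F(y, t) from t0 to t1 (t1 < t0 is allowed)."""
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = F(y, t)
        k2 = F([yi + 0.5 * h * k for yi, k in zip(y, k1)], t + 0.5 * h)
        k3 = F([yi + 0.5 * h * k for yi, k in zip(y, k2)], t + 0.5 * h)
        k4 = F([yi + h * k for yi, k in zip(y, k3)], t + h)
        y = [yi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def adjoint_gradient(theta, x0, T, steps=200):
    """Derivative in theta of 0.5 * x(T)**2, where dx/dt = -theta * x."""
    # Forward pass: only the final state is kept.
    x_T = rk4(lambda y, t: [-theta * y[0]], [x0], 0.0, T, steps)[0]

    def augmented(y, t):
        x, a, g = y
        dx = -theta * x       # the original dynamics f(x, theta)
        da = -a * (-theta)    # da/dt = -a * df/dx
        dg = -a * (-x)        # dg/dt = -a * df/dtheta
        return [dx, da, dg]

    # Backward pass from T to 0, with a(T) = dLoss/dx(T) = x(T) and g(T) = 0.
    _, _, grad = rk4(augmented, [x_T, x_T, 0.0], T, 0.0, steps)
    return grad


theta, x0, T = 0.7, 2.0, 1.5
grad = adjoint_gradient(theta, x0, T)
exact = -T * x0 ** 2 * math.exp(-2.0 * theta * T)
print(grad, exact)
```

The backward pass reconstructs the state trajectory by integrating the original dynamics in reverse, so no intermediate states from the forward pass need to be stored; in this toy case the computed gradient agrees with the closed-form value up to the accuracy of the solver.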
References
- [APL+22] Cem Anil, Ashwini Pokle, Kaiqu Liang, Johannes Treutlein, Yuhuai Wu, Shaojie Bai, J Zico Kolter, and Roger B Grosse. Path independent equilibrium models can better exploit test-time computation. Advances in Neural Information Processing Systems, 35:7796–7809, 2022.
- [BKK19] Shaojie Bai, J Zico Kolter, and Vladlen Koltun. Deep equilibrium models. Advances in Neural Information Processing Systems, 32, 2019.
- [BKK20] Shaojie Bai, Vladlen Koltun, and J. Zico Kolter. Multiscale deep equilibrium models. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 5238–5250. Curran Associates, Inc., 2020.
- [CK66] Chen-chung Chang and H. Jerome Keisler. Continuous model theory. Annals of Mathematics Studies, No. 58. Princeton Univ. Press, Princeton, N.J., 1966.
- [CRBD18] Ricky TQ Chen, Yulia Rubanova, Jesse Bettencourt, and David K Duvenaud. Neural ordinary differential equations. Advances in Neural Information Processing Systems, 31, 2018.
- [Eng89] Ryszard Engelking. General topology, volume 6 of Sigma Series in Pure Mathematics. Heldermann Verlag, Berlin, second edition, 1989. Translated from the Polish by the author.
- [Fur81] H. Furstenberg. Recurrence in ergodic theory and combinatorial number theory. Princeton University Press, Princeton, NJ, 1981. M. B. Porter Lectures.
- [Gro52] A. Grothendieck. Critères de compacité dans les espaces fonctionnels généraux. Amer. J. Math., 74:168–186, 1952.
- [HL+19] Jiequn Han, Qianxiao Li, et al. A mean-field optimal control formulation of deep learning. Research in the Mathematical Sciences, 6(1):1–41, 2019.
- [HLA+21] Ramin Hasani, Mathias Lechner, Alexander Amini, Daniela Rus, and Radu Grosu. Liquid time-constant networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 7657–7666, 2021.
- [HLA+22] Ramin Hasani, Mathias Lechner, Alexander Amini, Lucas Liebenwein, Aaron Ray, Max Tschaikowski, Gerald Teschl, and Daniela Rus. Closed-form continuous-time neural networks. Nature Machine Intelligence, 4(11):992–1003, November 2022.
- [HS10] Neil Hindman and Dona Strauss. Algebra in the space of ultrafilters and Ramsey theory. In Ultrafilters across mathematics, volume 530 of Contemp. Math., pages 121–145. Amer. Math. Soc., Providence, RI, 2010.
- [Kei23] H. Jerome Keisler. Model theory for real-valued structures. In José Iovino, editor, Beyond First Order Model Theory, Volume II. CRC Press, Boca Raton, FL, 2023.
- [KK87] Heinz König and Norbert Kuhn. Angelic spaces and the double limit relation. J. London Math. Soc. (2), 35(3):454–470, 1987.
- [KM81] Jean-Louis Krivine and Bernard Maurey. Espaces de Banach stables. Israel J. Math., 39(4):273–295, 1981.
- [Kri76] J.-L. Krivine. Sous-espaces de dimension finie des espaces de Banach réticulés. Ann. of Math. (2), 104(1):1–29, 1976.
- [LJ23] Tianyi Lin and Michael I Jordan. Monotone inclusions, acceleration, and closed-loop control. Mathematics of Operations Research, 48(4):2353–2382, 2023.
- [SDJS22] Bin Shi, Simon S Du, Michael I Jordan, and Weijie J Su. Understanding the acceleration phenomenon via high-resolution differential equations. Mathematical Programming, pages 1–70, 2022.
- [Wei75] Maurice D. Weir. Hewitt-Nachbin spaces, volume No. 17 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam-Oxford; American Elsevier Publishing Co., Inc., New York, 1975. Notas de Matemática, No. 57. [Mathematical Notes].