Institute of Mathematics and Computer Science
Wroclaw University of Technology
Wybrzeze Wyspiańskiego 27
50-370 Wroclaw, Poland
The law of series
Institute colloquium lecture notes
January 17, 2006
What is the "law of series"? the law of seriality
In the common-sense understanding it is not exactly a law, but rather an observation. A series is noted when a random event considered extremely rare happens several (at least two) times within a relatively short period. The name "law" suggests that such series are observed often enough to indicate an unexplained physical force or statistical rule provoking them.
The Austrian biologist Dr. Paul Kammerer (1880-1926) was the first scientist to study the law of series (the law of seriality, in some translations). His book Das Gesetz der Serie contains many examples drawn from his own life and the lives of those close to him.
In the literature, examples of series are mixed with examples of other kinds of "unbelievable" coincidences. Their list is long and fascinating, but quoting them would drive us away from our subject. Pioneering theories about coincidences (including series) were postulated, besides Kammerer, by the Swiss professor of philosophy Carl Gustav Jung and the Austrian Nobel prize winner in physics Wolfgang Pauli (1900-1958). They believed that there exist undiscovered physical "attracting" forces driving objects that are alike, or share common features, closer together in time and space (the so-called theory of synchronicity).
Spontaneous series in the independent process
In opposition to the theory of synchronicity stands the belief that series, coincidences and the like appear exclusively by pure chance, and that there is no mysterious or unexplained force behind them. The American mathematician Dr. Warren Weaver (1894-1978) argued that at every instant, all around the world, reality combines so many different names, numbers, events, etc., that there is nothing unusual if some combinations regarded as "series" or "unbelievable coincidences" occur somewhere from time to time. Every such coincidence has nonzero probability, which implies not only that it can occur, but that it must, if sufficiently many trials are performed. Our problem is that we ignore all those sequences of events which do not possess the attribute of being unusual, so that we grossly underestimate the enormous number of "failures" accompanying every single "successful" coincidence.
With regard to series of repetitions of identical or similar events, the argumentation of Weaver and other statisticians refers to the effect of spontaneous clustering. For an event A, to repeat in time by "pure chance" means to follow a trajectory of a Poisson process (with independent increments). Such a process is characterized by a single parameter λ, called the intensity, equal to the average number of signals (occurrences of A) per unit of time. In a typical realization of a Poisson process the distribution of signals along the time axis is far from uniform: it reveals a natural tendency to form clusters. According to Weaver, it is precisely these natural clusters that are observed and interpreted as "series". We will call such a purely random distribution stochastically unbiased and the resulting clustering spontaneous.
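The spontaneous clustering of a Poisson process is easy to see in a simulation. The following minimal Python sketch (the horizon, seed and unit window length are arbitrary illustration choices, not part of the lecture) generates a unit-intensity Poisson process and counts signals in consecutive windows: although the mean is one signal per window, over a third of the windows are empty while a noticeable fraction contain three or more signals.

```python
import random, math

random.seed(1)

# Simulate a unit-intensity Poisson process: the gaps between consecutive
# signals are i.i.d. exponential with mean 1.
T = 100_000                       # total observation time
t, signals = 0.0, []
while t < T:
    t += random.expovariate(1.0)  # exponential gap, intensity 1
    signals.append(t)

# Count signals in consecutive windows of length 1.
counts = [0] * T
for s in signals:
    if s < T:
        counts[int(s)] += 1

mean = sum(counts) / T
var = sum((c - mean) ** 2 for c in counts) / T
empty = sum(c == 0 for c in counts) / T
crowded = sum(c >= 3 for c in counts) / T

print(f"mean per window   : {mean:.3f}")      # ~1
print(f"variance/mean     : {var/mean:.3f}")  # ~1 for a Poisson process
print(f"empty windows     : {empty:.3f}")     # ~e^-1 ~ 0.368
print(f"windows with >= 3 : {crowded:.3f}")   # ~0.08: clusters by pure chance
```

The "clusters" are simply the windows with three or more signals, appearing next to many empty windows; nothing beyond independence is needed to produce them.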
[Figure: comparison between spontaneous clustering (center), attracting, and repelling.]
In order to make the meaning of seriality precise, we define attracting as a deviation of a signal process from the Poisson process toward clustering stronger than spontaneous. Similarly, repelling is defined as clustering weaker than spontaneous, i.e., a more uniform distribution of signals in time. The controversy between Pauli's and Weaver's positions is precisely about whether the occurrences of various events in nature reveal attracting or not. There is no doubt that the postulates of Pauli and Jung concern attracting in this sense. As for Kammerer, apparently less familiar with probability theory, it seems that in most of his experiments he merely "discovered" spontaneous clustering.
In many real processes attracting is perfectly understandable as a result of strong physical dependence. Various events reveal increased frequency of occurrence during so-called periods of propitious conditions, which in turn follow a slowly changing process. For example, volcanic eruptions appear in series during periods of increased tectonic activity. Another good example is a series of illnesses during the outbreak of a contagious disease. Repelling can be illustrated by the process of visits of a particular animal at a watering-place: for obvious reasons these will be distributed in time more uniformly than the signals in a Poisson process.
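Attracting driven by "propitious periods" can be mimicked in a simulation. In the hypothetical sketch below, the intensity of signals alternates between high and low epochs (all rates, the epoch length and the seed are arbitrary choices) while averaging 1; the counts per window come out visibly overdispersed compared with the Poisson value variance/mean = 1, i.e., the clustering is stronger than spontaneous.

```python
import random

random.seed(2)

# Intensity alternates between "propitious" epochs (rate 1.9) and quiet
# epochs (rate 0.1), each lasting 50 time units; the average rate is 1.
# Simulated exactly by thinning a rate-1.9 Poisson stream.
T, EPOCH, HI, LO = 100_000, 50, 1.9, 0.1

counts = [0] * T
t = 0.0
while t < T:
    t += random.expovariate(HI)               # candidate signal, rate HI
    if t >= T:
        break
    rate = HI if (int(t) // EPOCH) % 2 == 0 else LO
    if random.random() < rate / HI:           # keep with probability rate/HI
        counts[int(t)] += 1

mean = sum(counts) / T
var = sum((c - mean) ** 2 for c in counts) / T

print(f"mean per window: {mean:.3f}")       # ~1, same as the Poisson process
print(f"variance/mean  : {var/mean:.3f}")   # noticeably above 1: overdispersion
```

The thinning step is what makes the simulation exact: candidates are produced at the dominating rate and accepted with probability proportional to the current intensity.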
The dispute on the law of series clearly concerns only events for which there are no obvious attracting mechanisms, and which we expect to appear completely independently, governed by pure chance. However, perfect independence is only theoretical; in practice it is always only approximate. A priori it may happen that some tiny dependencies, undetectable by our logic or measurements, generate attracting in the long run in a process expected to be stochastically unbiased. Any reasonable interpretation of the theory of synchronicity (at least with regard to the law of series) should postulate that such marginal dependencies can generate only attracting. Is it indeed so, and why?
Our voice in the debate
Recently, jointly with Yves Lacroix, we have obtained a result in ergodic theory which enables us to join the debate just about here. Equipped with a very strong weapon - a mathematical theorem - we vote for Pauli and Jung. Roughly speaking, we have proved that, with regard to "elementary" events (basic sets of very small probability),
any deviation from independence may generate only attracting.
So, in the universe there exists a natural advantage of attracting over repelling. One does not need to understand the mechanisms behind the dependencies in a particular process to be sure that its only possible effect on certain small events is attracting. And of course there is nothing mysterious or unexplained about it any longer.
Before we pass to the detailed formulation of our theorem, we must make the definitions of attracting (and repelling) more precise. Fortunately, as a deviation of a signal process from the Poisson process, this property does not require examining multi-dimensional distributions. A simple inequality for the distribution function of a single variable - the waiting time for the first signal - suffices.
First of all, note that attracting should not depend on the average frequency λ of the signals. Since our process will be compared with the Poisson process of exactly the same intensity, we may assume that λ = 1. This corresponds to changing the time unit so that it equals the inverse of λ (we call this normalization). In a Poisson process, the waiting time for the first signal (as well as the gap between two consecutive signals) has an exponential distribution. For unit intensity the distribution function is given by the formula
F_P(t) = 1 - e^(-t)    (t ≥ 0).
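This formula can be checked empirically: in a simulated unit-intensity Poisson process, the waiting time for the first signal measured from a moment chosen at random follows exactly this distribution function. A small Python sketch (the horizon, seed and number of probe moments are arbitrary choices):

```python
import random, math, bisect

random.seed(3)

# Simulate a unit-intensity Poisson process on [0, T + 100].
T = 200_000
signals, t = [], 0.0
while t < T + 100:
    t += random.expovariate(1.0)
    signals.append(t)

# From a moment chosen uniformly at random, measure the waiting time
# for the first signal; its distribution function should be 1 - e^(-t).
waits = []
for _ in range(100_000):
    s = random.uniform(0, T)
    i = bisect.bisect_left(signals, s)   # first signal at or after s
    waits.append(signals[i] - s)

for t0 in (0.5, 1.0, 2.0):
    emp = sum(w <= t0 for w in waits) / len(waits)
    print(f"F({t0}) empirical: {emp:.3f}   1 - e^(-t): {1 - math.exp(-t0):.3f}")
```

That the probe moment can be chosen anywhere without changing the answer is the memorylessness of the exponential law, special to the Poisson process.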
Attracting at scale t means F(t) < F_P(t), where F denotes the distribution function of the (normalized) waiting time for the first signal; repelling at scale t means F(t) > F_P(t). In view of these definitions, for a process to be stochastically unbiased it need not be Poisson: the identity of distribution functions F(t) = 1 - e^(-t) is enough. To be exact, one also needs to assume ergodicity, so that it makes sense to speak about typical properties of the realizations.
We shall now explain why attracting is defined this way. The ergodic theorem ensures that in a typical realization of any stationary (and ergodic) signal process, over a sufficiently long run, the ratio between the number of signals and the elapsed time approximately equals λ (i.e., 1). Thus, in a time interval of length t selected at random, the expected number of signals equals λt, i.e., t. The value F(t) is the probability that such an interval contains at least one signal. The ratio t/F(t) therefore represents the conditional expectation of the number of signals in those time intervals of length t which contain at least one signal.

If F(t) < 1 - e^(-t), this conditional expectation is larger than in the Poisson process. In other words, if we observe the process for time t, there are two possibilities: either we detect nothing, or, once the first signal occurs, we can expect a larger total number of signals than if we were dealing with the Poisson process. The first signal "attracts" further repetitions, contributing to an increased clustering effect. Repelling is the converse: the first signal lowers the expected number of repetitions in the observation period, contributing to decreased clustering and a more uniform distribution of signals in time.

If a given process reveals attracting at one time scale and repelling at another, the tendency to cluster is not clear-cut and depends on the time perspective applied. However, if there is only attracting (without repelling), then at every time scale we shall see increased clustering. This is the essence of the phenomenon of the law of series.
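To see the criterion F(t) < 1 - e^(-t) at work, one can measure F empirically for a process whose intensity alternates between high and low epochs - a crude model of propitious periods (all rates, lengths and seeds below are arbitrary illustration choices): F(t) stays below the Poisson value at every tested scale, and accordingly t/F(t) exceeds the Poisson conditional expectation.

```python
import random, math, bisect

random.seed(4)

# Intensity alternates between 1.9 and 0.1 over 50-unit epochs (average 1),
# simulated exactly by thinning a rate-1.9 Poisson stream.
T, EPOCH, HI, LO = 200_000, 50, 1.9, 0.1
signals, t = [], 0.0
while t < T:
    t += random.expovariate(HI)
    rate = HI if (int(t) // EPOCH) % 2 == 0 else LO
    if random.random() < rate / HI:
        signals.append(t)

def F(t0, probes=50_000):
    """Empirical probability of seeing at least one signal within time t0."""
    hits = 0
    for _ in range(probes):
        s = random.uniform(0, signals[-1] - 100)
        i = bisect.bisect_left(signals, s)
        hits += signals[i] - s <= t0
    return hits / probes

for t0 in (0.5, 1.0, 2.0):
    emp, poi = F(t0), 1 - math.exp(-t0)
    # F(t) < 1 - e^(-t): attracting, so t/F(t) exceeds the Poisson value
    print(f"t={t0}: F(t)={emp:.3f} vs {poi:.3f}; t/F(t)={t0/emp:.2f} vs {t0/poi:.2f}")
```

Intuitively, a randomly placed observation window often lands in a quiet epoch and sees nothing; but once a first signal is seen, the window most likely sits in a propitious epoch, and more signals than the Poisson average can be expected.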
Rigorous formulation of the theorem
Our theorem concerns the occurrences of small (in probability) random events in stationary stochastic processes with discrete time. Not arbitrary small events, but only "basic sets", i.e., cylinders over long blocks with respect to a finite partition P of the state space. The process can be pictured as hitting a keyboard in a more or less random fashion, generating a sequence of letters (a text), while the event whose repetitions we register is the appearance of a particular long finite string (say, a certain sentence). We make only two unavoidable assumptions on the process. The first one is the aforementioned ergodicity, i.e., the requirement that all realizations of the process (from a set of probability 1) have the same probabilistic properties. It is automatically satisfied if we observe a single realization, chosen at random, of any stationary process, so this assumption is purely technical. The second assumption is that with respect to the partition P the process is not deterministic, i.e., that the future of a realization cannot be completely determined from its past. It is absolutely obvious that we are interested only in such processes. Besides, in deterministic processes there is no chance for any kind of law of series (consider for instance a periodic process, where even spontaneous clustering does not exist). Summarizing, our assumptions are natural, necessary, and do not restrict generality.
In other words:
I. If ε represents the accuracy with which we can compare two distribution functions, and if in our process (with respect to a fixed partition P) we choose at random a sufficiently long block, then with probability very close to 1 the occurrences of this block along the time axis will either be statistically unbiased or will reveal attracting. Repelling is almost impossible to observe.
II. In every non-deterministic process we can find a partition whose long blocks not only almost never repel, but in fact almost all of them strongly attract.
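A toy experiment in the spirit of parts I and II can be run on an i.i.d. binary "keyboard". In the sketch below (the sequence length, the block length 10, the two sample blocks and the seed are all arbitrary choices), the occurrences of a block with no self-overlap look nearly stochastically unbiased, while the heavily self-overlapping all-zeros block reveals clear attracting: its F(1) falls well below the Poisson value 1 - e^(-1) ≈ 0.632, because its occurrences come in bursts inside long runs of zeros.

```python
import random, math, bisect

random.seed(5)

# An i.i.d. sequence of 0s and 1s; we watch two blocks of length 10,
# each of probability 2^-10.
N = 4_000_000
text = bin(random.getrandbits(N))[2:].zfill(N)

def waiting_F(block, t0=1.0, probes=20_000):
    """Empirical F(t0) of the normalized waiting time for `block`."""
    p = 2.0 ** (-len(block))          # intensity: p occurrences per symbol
    pos, i = [], text.find(block)
    while i != -1:                    # collect overlapping occurrences
        pos.append(i)
        i = text.find(block, i + 1)
    hits = 0
    for _ in range(probes):
        s = random.randrange(0, pos[-1] - int(2 * t0 / p))
        j = bisect.bisect_left(pos, s)
        hits += (pos[j] - s) * p <= t0   # normalized wait at most t0?
    return hits / probes

F_plain = waiting_F("1000000000")   # no self-overlap: nearly unbiased
F_zeros = waiting_F("0000000000")   # self-overlapping: attracting
print(f"F(1) for 1000000000: {F_plain:.3f}  (Poisson: {1 - math.exp(-1):.3f})")
print(f"F(1) for 0000000000: {F_zeros:.3f}  (well below: the block attracts)")
```

The all-zeros block is of course a specially chosen, atypical cylinder; part I says that for a typical long block the first behavior is what one should expect.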
A comment on the theorem
Firstly, the theorem concerns exclusively small sets, i.e., rare events. It does not apply to observations of single numbers in roulette, of umbrellas carried by passing pedestrians (one of Kammerer's favorite experimental subjects), or of visits of an animal at a watering-place. A more adequate example is the event of breaking the bank at roulette.
Secondly, note that part I says nothing about the universality of attracting. It only claims that repelling decays as the length n of the block tends to infinity. This part alone does not exclude that all processes are asymptotically unbiased (attracting could decay in the same way as repelling). It is part II which indicates the asymmetry. The situation resembles that of the second law of thermodynamics: theoretically the entropy cannot drop (but is allowed to grow). Yet constancy of the entropy is, for a physical system, an unattainable perfect state, so in practice the entropy always grows. Likewise, we cannot theoretically eliminate stochastically unbiased processes (the Poisson process exists). But no process in reality fits this perfect independent model precisely. Even the independence of coin flips is only theoretical, and in practice approximate. One may expect that asymptotically unbiased processes are exceptional, and hence that attracting rules in the majority of processes. This hypothesis, however, is yet to be studied. In any case, part II gives at least this much: in every non-deterministic process, even an independent one, attracting - even very strong attracting - can be observed for an appropriately chosen partition. In this sense, attracting is a universal property.
A certain "disadvantage" of the theorem is, that it applies only to blocks, i.e., to cylinder sets, and to repetitions of exactly the same block. At this moment it is difficult to say, which of the events in reality can be modeled as long blocks. Their structure suggests that these events should have the form of sequences of many "crude" events in a particular order. The freedom to choose a generating partition gives perhaps enough flexibility to such structure, to capture many types of events. As far as repetitions of similar but not identical events are concerned, many can be modeled as identical by applying a coding which identifies some attributes by similarity. But we should stop here in order not to run into pure speculations.
The most important feature of our theorem is that it points to a natural mechanism in stochastic processes favoring attracting over repelling, which puts the law of series in a completely new perspective. This is probably only the beginning of a possible research program of further specifications and generalizations.
Main idea of the proof
The proof of part I of the theorem consists of one technical trick and two major observations. The trick is to consider the repetitions of a concatenation BA whose left part B is much longer than the right part A. For a moment we can think of A as the last letter of the considered block. We then observe the process of repetitions of the block B and the process of symbols directly following these repetitions. In the figure below, a realization of such an "induced" process is the sequence ...A_{-1} A_0 A_1 A_2...
The main and most difficult lemma of the proof says that
The proof of the lemma relies heavily on advanced techniques of ergodic theory, mainly entropy theory. The second key observation is much easier. Assume for simplicity that the independencies in (1) are strict. Then it is not hard to prove that
But with such a distribution of the B's, and with the assumed independence, the occurrences of BA are the same as in an independent process with discrete time (with unit equal to the period of the B's). The distribution of the waiting time for BA is then geometric. Because A is in fact also a fairly long block, its probability p is very small, and the geometric distribution function with small parameter p nearly coincides with the exponential distribution function 1 - e^(-t). These facts combined prove that even the maximal possible repelling in the distribution of BA is still nearly stochastically unbiased, which ends the proof of part I.
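The closeness of the geometric and exponential distribution functions invoked here is elementary to verify numerically. In normalized time (unit equal to the mean waiting time, i.e., 1/p trials), the geometric distribution function with parameter p approaches 1 - e^(-t) as p shrinks; the sketch below measures the maximal gap on (0, 5] (the grid of t-values and the sample values of p are arbitrary choices).

```python
import math

# Geometric waiting time with success probability p per trial, expressed
# in normalized time: n = t/p trials fit into normalized time t.
def geometric_cdf(t, p):
    n = int(t / p)                    # whole trials completed by time t
    return 1.0 - (1.0 - p) ** n

for p in (0.1, 0.01, 0.001):
    gap = max(abs(geometric_cdf(k / 100, p) - (1 - math.exp(-k / 100)))
              for k in range(1, 501))
    print(f"p = {p:5}: max gap on (0, 5] = {gap:.4f}")
```

The gap shrinks roughly linearly in p, which is why a very small probability p of the block A forces the waiting-time law to be nearly exponential, i.e., nearly unbiased.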
Part II requires a tedious construction of a specific subshift. We used a little help from an expert on Bernoulli shifts, Dan Rudolph.