Institute of Mathematics and Computer Science
Wroclaw University of Technology
Wybrzeze Wyspiańskiego 27
50-370 Wroclaw, Poland
The law of series
Institute colloquium lecture notes
January 17, 2006
What is the "law of series"? the law of seriality
In the common-sense understanding it is not exactly a law, but rather an observation. A series is noted when a random event considered extremely rare happens several (at least two) times within a relatively short period. The name "law" suggests that such series are observed often enough to indicate an unexplained physical force or statistical rule provoking them.
The Austrian biologist Dr. Paul Kammerer (1880-1926) was the first scientist to study the law of series (the law of seriality, in some translations). His book Das Gesetz der Serie contains many examples drawn from his own life and the lives of those close to him.
In the literature, examples of series are mixed with examples of other kinds of "unbelievable" coincidences. Their list is long and fascinating, but quoting them would drive us away from our subject. Pioneering theories about coincidences (including series) were postulated, besides Kammerer, by the Swiss professor of philosophy Carl Gustav Jung and the Austrian Nobel prize winner in physics Wolfgang Pauli (1900-1958). They believed that there exist undiscovered physical "attracting" forces driving objects that are alike, or share common features, closer together in time and space (the so-called theory of synchronicity).
Spontaneous series in the independent process
In opposition to the theory of synchronicity stands the belief that series, coincidences and the like appear exclusively by pure chance, and that there is no mysterious or unexplained force behind them. The American mathematician Dr. Warren Weaver (1894-1978) argued that at every instant, all around the world, reality combines so many different names, numbers, events, etc., that there is nothing unusual if some combinations regarded as "series" or "unbelievable coincidences" occur somewhere from time to time. Every such coincidence has nonzero probability, which implies not only that it can occur, but that it must, if sufficiently many trials are performed. Our problem is that we ignore all those sequences of events which do not possess the attribute of being unusual, so that we grossly underestimate the enormous number of "failures" accompanying every single "successful" coincidence.
With regard to series of repetitions of identical or similar events, the argumentation of Weaver and other statisticians refers to the effect of spontaneous clustering. For an event A, to repeat in time by "pure chance" means to follow a trajectory of a Poisson process (with independent increments). Such a process is characterized by a single parameter λ, called the intensity, equal to the average number of signals (occurrences of A) per unit of time. In a typical realization of a Poisson process the distribution of signals along the time axis is far from uniform: it reveals a natural tendency to form clusters. According to Weaver, it is precisely these natural clusters that are observed and interpreted as "series". We will call such a purely random distribution stochastically unbiased and the resulting clustering spontaneous.
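The spontaneous clustering of a Poisson process is easy to see in a simulation. The following minimal Python sketch (the horizon, seed and unit window length are arbitrary illustration choices, not part of the lecture) generates a unit-intensity Poisson process and counts signals in consecutive windows: although the mean is one signal per window, over a third of the windows are empty while a noticeable fraction contain three or more signals.

```python
import random, math

random.seed(1)

# Simulate a unit-intensity Poisson process: the gaps between consecutive
# signals are i.i.d. exponential with mean 1.
T = 100_000                       # total observation time
t, signals = 0.0, []
while t < T:
    t += random.expovariate(1.0)  # exponential gap, intensity 1
    signals.append(t)

# Count signals in consecutive windows of length 1.
counts = [0] * T
for s in signals:
    if s < T:
        counts[int(s)] += 1

mean = sum(counts) / T
var = sum((c - mean) ** 2 for c in counts) / T
empty = sum(c == 0 for c in counts) / T
crowded = sum(c >= 3 for c in counts) / T

print(f"mean per window   : {mean:.3f}")      # ~1
print(f"variance/mean     : {var/mean:.3f}")  # ~1 for a Poisson process
print(f"empty windows     : {empty:.3f}")     # ~e^-1 ~ 0.368
print(f"windows with >= 3 : {crowded:.3f}")   # ~0.08: clusters by pure chance
```

The "clusters" are simply the windows with three or more signals, appearing next to many empty windows; nothing beyond independence is needed to produce them.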
[Figure: comparison between spontaneous clustering (center), attracting, and repelling.]
In order to make the meaning of seriality precise, we define attracting as a deviation of a signal process from the Poisson process toward clustering stronger than spontaneous. Similarly, repelling is defined as clustering weaker than spontaneous, i.e., a more uniform distribution of signals in time. The controversy between Pauli's and Weaver's positions is precisely about whether the occurrences of various events in nature reveal attracting or not. There is no doubt that the postulates of Pauli and Jung concern attracting in this sense. As for Kammerer, apparently less familiar with probability theory, it seems that in most of his experiments he merely "discovered" spontaneous clustering.
In many real processes attracting is perfectly understandable as a result of strong physical dependence. Various events reveal increased frequency of occurrence during so-called periods of propitious conditions, which in turn follow a slowly changing process. For example, volcanic eruptions appear in series during periods of increased tectonic activity. Another good example is a series of illnesses during the outbreak of a contagious disease. Repelling can be illustrated by the process of visits of a particular animal at a watering-place: for obvious reasons these will be distributed in time more uniformly than the signals in a Poisson process.
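Attracting driven by "propitious periods" can be mimicked in a simulation. In the hypothetical sketch below, the intensity of signals alternates between high and low epochs (all rates, the epoch length and the seed are arbitrary choices) while averaging 1; the counts per window come out visibly overdispersed compared with the Poisson value variance/mean = 1, i.e., the clustering is stronger than spontaneous.

```python
import random

random.seed(2)

# Intensity alternates between "propitious" epochs (rate 1.9) and quiet
# epochs (rate 0.1), each lasting 50 time units; the average rate is 1.
# Simulated exactly by thinning a rate-1.9 Poisson stream.
T, EPOCH, HI, LO = 100_000, 50, 1.9, 0.1

counts = [0] * T
t = 0.0
while t < T:
    t += random.expovariate(HI)               # candidate signal, rate HI
    if t >= T:
        break
    rate = HI if (int(t) // EPOCH) % 2 == 0 else LO
    if random.random() < rate / HI:           # keep with probability rate/HI
        counts[int(t)] += 1

mean = sum(counts) / T
var = sum((c - mean) ** 2 for c in counts) / T

print(f"mean per window: {mean:.3f}")       # ~1, same as the Poisson process
print(f"variance/mean  : {var/mean:.3f}")   # noticeably above 1: overdispersion
```

The thinning step is what makes the simulation exact: candidates are produced at the dominating rate and accepted with probability proportional to the current intensity.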
The dispute on the law of series clearly concerns only events for which there are no obvious attracting mechanisms, and which we expect to appear completely independently, governed by pure chance. However, perfect independence is only theoretical; in practice it is always only approximate. A priori it may happen that some tiny dependencies, undetectable by our logic or measurements, generate attracting in the long run in a process expected to be stochastically unbiased. Any reasonable interpretation of the theory of synchronicity (at least with regard to the law of series) should postulate that such marginal dependencies can generate only attracting. Is it indeed so, and why?
Our voice in the debate
Recently, jointly with Yves Lacroix, we have obtained a result in ergodic theory which enables us to join the debate just about here. Equipped with a very strong weapon - a mathematical theorem - we vote for Pauli and Jung. Roughly speaking, we have proved that, with regard to "elementary" events (basic sets of very small probability),
any deviation from independence may generate only attracting.
So, in the universe there exists a natural advantage of attracting over repelling. One does not need to understand the mechanisms behind the dependencies in a particular process to be sure that its only possible effect on certain small events is attracting. And of course there is nothing mysterious or unexplained about it any longer.
Before we pass to the detailed formulation of our theorem, we must make the definitions of attracting (and repelling) more precise. Fortunately, as a deviation of a signal process from the Poisson process, this property does not require examining multi-dimensional distributions. A simple inequality for the distribution function of a single variable - the waiting time for the first signal - suffices.
First of all, note that attracting should not depend on the average frequency λ of the signals. Since our process will be compared with the Poisson process of exactly the same intensity, we may assume that λ = 1. This corresponds to changing the time unit so that it equals the inverse of λ (we call this normalization). In a Poisson process, the waiting time for the first signal (as well as the gap between two consecutive signals) has an exponential distribution. For unit intensity the distribution function is given by the formula
F_P(t) = 1 - e^(-t)    (t ≥ 0).
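This formula can be checked empirically: in a simulated unit-intensity Poisson process, the waiting time for the first signal measured from a moment chosen at random follows exactly this distribution function. A small Python sketch (the horizon, seed and number of probe moments are arbitrary choices):

```python
import random, math, bisect

random.seed(3)

# Simulate a unit-intensity Poisson process on [0, T + 100].
T = 200_000
signals, t = [], 0.0
while t < T + 100:
    t += random.expovariate(1.0)
    signals.append(t)

# From a moment chosen uniformly at random, measure the waiting time
# for the first signal; its distribution function should be 1 - e^(-t).
waits = []
for _ in range(100_000):
    s = random.uniform(0, T)
    i = bisect.bisect_left(signals, s)   # first signal at or after s
    waits.append(signals[i] - s)

for t0 in (0.5, 1.0, 2.0):
    emp = sum(w <= t0 for w in waits) / len(waits)
    print(f"F({t0}) empirical: {emp:.3f}   1 - e^(-t): {1 - math.exp(-t0):.3f}")
```

That the probe moment can be chosen anywhere without changing the answer is the memorylessness of the exponential law, special to the Poisson process.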
Attracting at scale t means F(t) < F_P(t), where F denotes the distribution function of the (normalized) waiting time for the first signal; repelling at scale t means F(t) > F_P(t). In view of these definitions, for a process to be stochastically unbiased it need not be Poisson: the identity of distribution functions F(t) = 1 - e^(-t) is enough. To be exact, one also needs to assume ergodicity, so that it makes sense to speak about typical properties of the realizations.
We shall now explain why attracting is defined this way. The ergodic theorem ensures that in a typical realization of any stationary (and ergodic) signal process, over a sufficiently long run, the ratio between the number of signals and the elapsed time approximately equals λ (i.e., 1). Thus, in a time interval of length t selected at random, the expected number of signals equals λt, i.e., t. The value F(t) is the probability that such an interval contains at least one signal. The ratio t/F(t) therefore represents the conditional expectation of the number of signals in those time intervals of length t which contain at least one signal.

If F(t) < 1 - e^(-t), this conditional expectation is larger than in the Poisson process. In other words, if we observe the process for time t, there are two possibilities: either we detect nothing, or, once the first signal occurs, we can expect a larger total number of signals than if we were dealing with the Poisson process. The first signal "attracts" further repetitions, contributing to an increased clustering effect. Repelling is the converse: the first signal lowers the expected number of repetitions in the observation period, contributing to decreased clustering and a more uniform distribution of signals in time.

If a given process reveals attracting at one time scale and repelling at another, the tendency to cluster is not clear-cut and depends on the time perspective applied. However, if there is only attracting (without repelling), then at every time scale we shall see increased clustering. This is the essence of the phenomenon of the law of series.
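To see the criterion F(t) < 1 - e^(-t) at work, one can measure F empirically for a process whose intensity alternates between high and low epochs - a crude model of propitious periods (all rates, lengths and seeds below are arbitrary illustration choices): F(t) stays below the Poisson value at every tested scale, and accordingly t/F(t) exceeds the Poisson conditional expectation.

```python
import random, math, bisect

random.seed(4)

# Intensity alternates between 1.9 and 0.1 over 50-unit epochs (average 1),
# simulated exactly by thinning a rate-1.9 Poisson stream.
T, EPOCH, HI, LO = 200_000, 50, 1.9, 0.1
signals, t = [], 0.0
while t < T:
    t += random.expovariate(HI)
    rate = HI if (int(t) // EPOCH) % 2 == 0 else LO
    if random.random() < rate / HI:
        signals.append(t)

def F(t0, probes=50_000):
    """Empirical probability of seeing at least one signal within time t0."""
    hits = 0
    for _ in range(probes):
        s = random.uniform(0, signals[-1] - 100)
        i = bisect.bisect_left(signals, s)
        hits += signals[i] - s <= t0
    return hits / probes

for t0 in (0.5, 1.0, 2.0):
    emp, poi = F(t0), 1 - math.exp(-t0)
    # F(t) < 1 - e^(-t): attracting, so t/F(t) exceeds the Poisson value
    print(f"t={t0}: F(t)={emp:.3f} vs {poi:.3f}; t/F(t)={t0/emp:.2f} vs {t0/poi:.2f}")
```

Intuitively, a randomly placed observation window often lands in a quiet epoch and sees nothing; but once a first signal is seen, the window most likely sits in a propitious epoch, and more signals than the Poisson average can be expected.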
Rigorous formulation of the theorem
Our theorem concerns the occurrences of small (in probability) random events in stationary stochastic processes with discrete time. Not arbitrary small events, but only "basic sets", i.e., cylinders over long blocks with respect to a finite partition P of the state space. The process can be pictured as hitting a keyboard in a more or less random fashion, generating a sequence of letters (a text), while the event whose repetitions we register is the appearance of a particular long finite string (say, a certain sentence). We make only two unavoidable assumptions on the process. The first one is the aforementioned ergodicity, i.e., the requirement that all realizations of the process (from a set of probability 1) have the same probabilistic properties. It is automatically satisfied if we observe a single realization, chosen at random, of any stationary process, so this assumption is purely technical. The second assumption is that with respect to the partition P the process is not deterministic, i.e., that the future of a realization cannot be completely determined from its past. It is absolutely obvious that we are interested only in such processes. Besides, in deterministic processes there is no chance for any kind of law of series (consider for instance a periodic process, where even spontaneous clustering does not exist). Summarizing, our assumptions are natural, necessary, and do not restrict generality.
In other words:
I. If ε represents the accuracy with which we can compare two distribution functions, and if in our process (with respect to a fixed partition P) we choose at random a sufficiently long block, then with probability very close to 1 the occurrences of this block along the time axis will either be statistically unbiased or will reveal attracting. Repelling is almost impossible to observe.
II. In every non-deterministic process we can find a partition whose long blocks not only almost never repel, but in fact almost all of them strongly attract.
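A toy experiment in the spirit of parts I and II can be run on an i.i.d. binary "keyboard". In the sketch below (the sequence length, the block length 10, the two sample blocks and the seed are all arbitrary choices), the occurrences of a block with no self-overlap look nearly stochastically unbiased, while the heavily self-overlapping all-zeros block reveals clear attracting: its F(1) falls well below the Poisson value 1 - e^(-1) ≈ 0.632, because its occurrences come in bursts inside long runs of zeros.

```python
import random, math, bisect

random.seed(5)

# An i.i.d. sequence of 0s and 1s; we watch two blocks of length 10,
# each of probability 2^-10.
N = 4_000_000
text = bin(random.getrandbits(N))[2:].zfill(N)

def waiting_F(block, t0=1.0, probes=20_000):
    """Empirical F(t0) of the normalized waiting time for `block`."""
    p = 2.0 ** (-len(block))          # intensity: p occurrences per symbol
    pos, i = [], text.find(block)
    while i != -1:                    # collect overlapping occurrences
        pos.append(i)
        i = text.find(block, i + 1)
    hits = 0
    for _ in range(probes):
        s = random.randrange(0, pos[-1] - int(2 * t0 / p))
        j = bisect.bisect_left(pos, s)
        hits += (pos[j] - s) * p <= t0   # normalized wait at most t0?
    return hits / probes

F_plain = waiting_F("1000000000")   # no self-overlap: nearly unbiased
F_zeros = waiting_F("0000000000")   # self-overlapping: attracting
print(f"F(1) for 1000000000: {F_plain:.3f}  (Poisson: {1 - math.exp(-1):.3f})")
print(f"F(1) for 0000000000: {F_zeros:.3f}  (well below: the block attracts)")
```

The all-zeros block is of course a specially chosen, atypical cylinder; part I says that for a typical long block the first behavior is what one should expect.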
A comment on the theorem
Firstly, the theorem concerns exclusively small sets, i.e., rare events. It does not apply to observations of single numbers in roulette, of umbrellas carried by passing pedestrians (one of Kammerer's favorite experimental subjects), or of visits of an animal at a watering-place. A more adequate example is the event of breaking the bank at roulette.
Secondly, note that part I says nothing about the universality of attracting. It only claims that repelling decays as the length n of the block tends to infinity. This part alone does not exclude that all processes are asymptotically unbiased (attracting could decay in the same way as repelling). It is part II which indicates the asymmetry. The situation resembles that of the second law of thermodynamics: theoretically the entropy cannot drop (but is allowed to grow). Yet constancy of the entropy is, for a physical system, an unattainable perfect state, so in practice the entropy always grows. Likewise, we cannot theoretically eliminate stochastically unbiased processes (the Poisson process exists). But no process in reality fits this perfect independent model precisely. Even the independence of coin flips is only theoretical, and in practice approximate. One may expect that asymptotically unbiased processes are exceptional, and hence that attracting rules in the majority of processes. This hypothesis, however, is yet to be studied. In any case, part II gives at least this much: in every non-deterministic process, even an independent one, attracting - even very strong attracting - can be observed for an appropriately chosen partition. In this sense, attracting is a universal property.
A certain "disadvantage" of the theorem is, that it applies only to blocks, i.e., to cylinder sets, and to repetitions of exactly the same block. At this moment it is difficult to say, which of the events in reality can be modeled as long blocks. Their structure suggests that these events should have the form of sequences of many "crude" events in a particular order. The freedom to choose a generating partition gives perhaps enough flexibility to such structure, to capture many types of events. As far as repetitions of similar but not identical events are concerned, many can be modeled as identical by applying a coding which identifies some attributes by similarity. But we should stop here in order not to run into pure speculations.
The most important feature of our theorem is that it points to a natural mechanism in stochastic processes favoring attracting over repelling, which puts the law of series in a completely new perspective. This is probably only the beginning of a possible research program of further specifications and generalizations.
Main idea of the proof
The proof of part I of the theorem consists of one technical trick and two major observations. The trick is to consider the repetitions of a concatenation BA whose left part B is much longer than the right part A. For a moment we can think of A as the last letter of the considered block. We then observe the process of repetitions of the block B and the process of symbols directly following these repetitions. In the figure below, a realization of such an "induced" process is the sequence ...A_{-1} A_0 A_1 A_2...
The main and most difficult lemma of the proof says that
The proof of the lemma relies heavily on advanced techniques of ergodic theory, mainly entropy theory. The second key observation is much easier. Assume for simplicity that the independencies in (1) are strict. Then it is not hard to prove that
But with such a distribution of the B's, and with the assumed independence, the occurrences of BA are the same as in an independent process with discrete time (with unit equal to the period of the B's). The distribution of the waiting time for BA is then geometric. Because A is in fact also a fairly long block, its probability p is very small, and the geometric distribution function with small parameter p nearly coincides with the exponential distribution function 1 - e^(-t). These facts combined prove that even the maximal possible repelling in the distribution of BA is still nearly stochastically unbiased, which ends the proof of part I.
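The closeness of the geometric and exponential distribution functions invoked here is elementary to verify numerically. In normalized time (unit equal to the mean waiting time, i.e., 1/p trials), the geometric distribution function with parameter p approaches 1 - e^(-t) as p shrinks; the sketch below measures the maximal gap on (0, 5] (the grid of t-values and the sample values of p are arbitrary choices).

```python
import math

# Geometric waiting time with success probability p per trial, expressed
# in normalized time: n = t/p trials fit into normalized time t.
def geometric_cdf(t, p):
    n = int(t / p)                    # whole trials completed by time t
    return 1.0 - (1.0 - p) ** n

for p in (0.1, 0.01, 0.001):
    gap = max(abs(geometric_cdf(k / 100, p) - (1 - math.exp(-k / 100)))
              for k in range(1, 501))
    print(f"p = {p:5}: max gap on (0, 5] = {gap:.4f}")
```

The gap shrinks roughly linearly in p, which is why a very small probability p of the block A forces the waiting-time law to be nearly exponential, i.e., nearly unbiased.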
Part II requires a tedious construction of a specific subshift. We used a little help from an expert on Bernoulli shifts, Dan Rudolph.