Chapter 3
From signals to noise: random processes
Every signal so far was deterministic: rerun the experiment and you get the identical trace. Noise means the opposite — rerun it and you get a different trace. The honest mathematical object is therefore not one trace but the whole collection of traces you could have gotten, each with its probability. This chapter builds that object, gives it statistics (mean, variance, correlation), and explains the small miracle — ergodicity — that lets a lab measure any of this from a single record.
One experiment, many movies
Point a photodiode at a "constant" laser and record the voltage for four seconds. Now do it again. The two records disagree in every detail, yet they are obviously siblings: same typical size, same typical wiggliness, same character. What is reproducible is not the trace but the statistics of traces.
A random process $x(t)$ is exactly this: a rule that assigns probabilities to whole time-records. Each run of the experiment draws one complete record — one realization — from that rule, the way each roll of a die draws one face. The set of all possible realizations, weighted by their probabilities, is called the ensemble. Think of it as a stack of movies of the same experiment, one per parallel universe. Any statement beginning "the noise has…" (a mean, a variance, a spectrum) is a statement about the ensemble, not about any single movie.
Left: 12 realizations of the same process; one is highlighted, the gray ones are the runs you didn't get. Right: the ensemble mean and the mean ± σ band estimated from 200 realizations. Switch the process type: for white noise the band is flat; for the random walk it fans out like $\sqrt{t}$.
Averaging vertically through the stack of movies — across realizations, at a fixed time $t$ — is called an ensemble average, written $\langle \cdot \rangle$. The right-hand panel above is computed exactly that way: at each $t$, average $x(t)$ over 200 realizations. Note that this is a completely different operation from averaging horizontally along one trace; keeping the two straight is half of this chapter.
Ensemble statistics: mean, variance, and memory
The first two statistics are the ones you would guess. At each instant $t$ the ensemble has a mean and a variance,
$$ \mu(t) = \langle x(t) \rangle, \qquad \sigma^2(t) = \big\langle \big[x(t) - \mu(t)\big]^2 \big\rangle . $$
But a noise trace is not just a sequence of independent random numbers — neighbouring values usually know about each other. The statistic that captures this is the autocorrelation function,
$$ R_x(t_1, t_2) = \langle x(t_1)\, x(t_2) \rangle , $$
the ensemble average of the product of the signal at two different times. For the processes we will care about most, the statistics do not depend on absolute time (more on that in a moment), and $R_x$ depends only on the time separation $\tau = t_2 - t_1$:
$$ R_x(\tau) = \langle x(t)\, x(t+\tau) \rangle . $$
Read it as a question: if the noise is high now, is it still high a time $\tau$ later? If yes, the product $x(t)x(t+\tau)$ tends to be positive and $R_x(\tau)$ is large; if by time $\tau$ the noise has forgotten where it was, positive and negative products cancel and $R_x(\tau)\to 0$. The autocorrelation is the memory of the noise, and $R_x(0) = \sigma^2$ (for a zero-mean process) is just the variance.
The cleanest laboratory of memory needs one piece of hardware vocabulary first. A low-pass filter is anything sluggish — anything whose output cannot follow a fast-changing input and instead keeps relaxing toward it: a capacitor charging through a resistor, a thermometer catching up with a stirred bath, a heavy stage settling after a push. The simplest kind (a single-pole filter) relaxes exponentially, and its time constant $\tau_c$ is its sluggishness in seconds: poke it once and the response dies away as $e^{-t/\tau_c}$ — after a few $\tau_c$ the poke is forgotten. (Chapter 4 will translate "sluggish" into frequency language; here the time picture is all we need.)
Now feed noise into something sluggish. Every random kick lingers in the output for about $\tau_c$ before fading, so the output at nearby times is built from the same recent kicks — the filter's sluggishness has become the noise's memory. The result is called an Ornstein–Uhlenbeck process. Discretely, with sample spacing $\Delta t$,
$$ y_{i+1} = a\, y_i + \sqrt{1-a^2}\; w_i, \qquad a = e^{-\Delta t/\tau_c}, $$
where $w_i$ are independent unit Gaussians; the $\sqrt{1-a^2}$ factor keeps the variance pinned at 1 whatever $\tau_c$ is. Its autocorrelation is a pure exponential, $R_y(\tau) = e^{-|\tau|/\tau_c}$: the process remembers itself for about one filter time constant and no longer.
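Both claims follow from the recursion in two lines. The sketch below assumes only that each $w_i$ has zero mean and unit variance and is independent of everything the process did before step $i$. Squaring the recursion and averaging, the cross term $\langle y_i w_i\rangle$ vanishes:
$$ \langle y_{i+1}^2\rangle = a^2\,\langle y_i^2\rangle + (1-a^2)\,\langle w_i^2\rangle = a^2\,\langle y_i^2\rangle + (1-a^2), $$
so $\langle y^2\rangle = 1$ is the fixed point for any $a$. Multiplying the recursion by an earlier sample $y_i$ and averaging, the fresh kick again drops out, and for a lag of $k \ge 1$ steps
$$ \langle y_i\, y_{i+k}\rangle = a\,\langle y_i\, y_{i+k-1}\rangle = \dots = a^{k}\,\langle y_i^2\rangle = e^{-k\Delta t/\tau_c}, $$
which is $R_y(\tau) = e^{-|\tau|/\tau_c}$ sampled at $\tau = k\,\Delta t$.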
The circuit is the demo: a white noise source (it could be the resistor's own thermal noise — chapter 5) drives a resistor–capacitor low-pass, and the capacitor voltage $y(t)$ is an Ornstein–Uhlenbeck process with $\tau_c = RC$. Dial $R$ and $C$: large $RC$ looks smooth and sluggish, small $RC$ jagged. Right: the autocorrelation estimated from a 40 s record against the theoretical $e^{-\tau/RC}$. The estimate gets noisy for long $\tau_c$ — the record then holds fewer independent chunks.
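The comparison in the right-hand panel can be reproduced with a short script. This is a minimal sketch, not the page's own code: the seeded generator `mulberry32`, the Box-Muller helper `randn`, the function names `makeOU` and `autocorr`, and the values of `dt` and `tauC` are all assumptions chosen for illustration.

```javascript
// Seeded uniform generator (mulberry32) so the run is repeatable. Illustrative helper.
function mulberry32(seed) {
  let s = seed | 0;
  return () => {
    s = (s + 0x6D2B79F5) | 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Unit Gaussian via Box-Muller; 1 - rand() lies in (0, 1], so log() is safe.
function randn(rand) {
  const u = 1 - rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// One realization of the discrete OU recursion from the text.
function makeOU(N, dt, tauC, rand) {
  const a = Math.exp(-dt / tauC);
  const b = Math.sqrt(1 - a * a);
  const y = new Float64Array(N);
  y[0] = randn(rand); // start in steady state (unit variance)
  for (let i = 1; i < N; i++) y[i] = a * y[i - 1] + b * randn(rand);
  return y;
}

// Time-average estimate of R(k * dt) along a single zero-mean record.
function autocorr(y, maxLag) {
  const R = new Float64Array(maxLag + 1);
  for (let k = 0; k <= maxLag; k++) {
    let acc = 0;
    for (let i = 0; i + k < y.length; i++) acc += y[i] * y[i + k];
    R[k] = acc / (y.length - k);
  }
  return R;
}

const dt = 1e-3, tauC = 0.02, N = 40000; // a 40 s record; values are illustrative
const y = makeOU(N, dt, tauC, mulberry32(1));
const R = autocorr(y, 60);
for (const k of [0, 10, 20, 40, 60]) {
  const theory = Math.exp(-(k * dt) / tauC);
  console.log(`tau = ${(k * dt).toFixed(3)} s  measured ${(R[k] / R[0]).toFixed(3)}  theory ${theory.toFixed(3)}`);
}
```

Raising `tauC` toward the record length should make the measured column scatter visibly around the theory, because the record then contains fewer independent stretches of about $\tau_c$ each.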
Notice the trade you can see by eye: short memory ↔ wide bandwidth, long memory ↔ narrow bandwidth. A filter that passes only slow frequencies produces noise whose value persists; noise that forgets instantly must contain arbitrarily fast components. The exact statement, that $R_x(\tau)$ and the power spectral density are a Fourier pair, is the Wiener–Khinchin theorem, and it is the whole subject of the next chapter. (The readout above shows the filter's 3 dB bandwidth $1/(2\pi RC)$ so you can watch the reciprocity happen.) And the same mathematics arrives with other hardware: a micron-sized bead in an optical tweezer is an OU process in position, with trap stiffness divided by viscous drag playing the role of $1/RC$, and biophysicists calibrate their traps by fitting exactly this exponential memory.
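To put numbers on the reciprocity, here is a small worked example. The component values and the function name `rcMemoryAndBandwidth` are assumptions for illustration only.

```javascript
// Memory time and 3 dB bandwidth of a single-pole RC low-pass.
function rcMemoryAndBandwidth(R, C) {
  const tauC = R * C;                    // memory of the output noise, in seconds
  const f3dB = 1 / (2 * Math.PI * tauC); // 3 dB bandwidth, in hertz
  return { tauC, f3dB };
}

// Illustrative values: 10 kilohm and 1 microfarad.
const slow = rcMemoryAndBandwidth(10e3, 1e-6);
// Ten times smaller capacitor: ten times shorter memory, ten times wider band.
const fast = rcMemoryAndBandwidth(10e3, 0.1e-6);

for (const f of [slow, fast]) {
  console.log(`tau_c = ${(f.tauC * 1e3).toFixed(2)} ms, f_3dB = ${f.f3dB.toFixed(1)} Hz`);
}
```

Whatever values are dialled in, the product of memory time and bandwidth stays at $1/2\pi$: shortening one necessarily widens the other.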
Stationarity: does the process age?
A process is stationary if its statistics are unchanged by a shift of the time origin: the ensemble looks the same whether you start recording now or an hour from now. In practice one usually demands only wide-sense stationarity: the mean $\mu(t)$ is constant and $R_x(t_1,t_2)$ depends only on $\tau = t_2-t_1$. That is the condition under which writing $R_x(\tau)$, as we did above, is legal.
White noise is stationary — every sample is a fresh, identically distributed draw. The random walk is the canonical non-stationary process. Write it as a sum of independent increments: after time $t$ it has accumulated $t/\Delta t$ independent steps of variance $\sigma^2 \Delta t$ each. Independent variances add, so
$$ \mathrm{Var}\big[x(t)\big] \;=\; \frac{t}{\Delta t}\cdot \sigma^2 \Delta t \;=\; \sigma^2 t . $$
The variance grows linearly with time — the ensemble spreads out forever, and the typical excursion grows as $\sigma\sqrt{t}$. No time shift can map this ensemble onto itself: a walk observed from $t=100$ s is statistically wider than one observed from $t=0$.
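The linear growth can be checked by brute force over an ensemble of walks. A minimal sketch follows; `mulberry32`, `randn`, `walkVariance` and the chosen step size are illustrative assumptions rather than the page's own code.

```javascript
// Seeded uniform generator (mulberry32) so the run is repeatable. Illustrative helper.
function mulberry32(seed) {
  let s = seed | 0;
  return () => {
    s = (s + 0x6D2B79F5) | 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Unit Gaussian via Box-Muller; 1 - rand() lies in (0, 1], so log() is safe.
function randn(rand) {
  const u = 1 - rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Ensemble variance of M random walks at each of N time steps.
function walkVariance(M, N, dt, sigma, rand) {
  const sq = new Float64Array(N + 1);  // running sum of x^2 at each time index
  const step = sigma * Math.sqrt(dt);  // each increment has variance sigma^2 * dt
  for (let m = 0; m < M; m++) {
    let x = 0;                         // every walk starts at zero
    for (let i = 1; i <= N; i++) {
      x += step * randn(rand);
      sq[i] += x * x;
    }
  }
  return sq.map(s => s / M);           // the mean is zero, so <x^2> is the variance
}

const dtWalk = 0.01, sigmaWalk = 1;    // illustrative: sigma = 1 per sqrt(s)
const varT = walkVariance(400, 1000, dtWalk, sigmaWalk, mulberry32(2));
for (const i of [100, 500, 1000]) {
  const t = i * dtWalk;
  console.log(`t = ${t.toFixed(1)} s  measured ${varT[i].toFixed(2)}  theory ${(sigmaWalk ** 2 * t).toFixed(2)}`);
}
```

With 400 walks the measured variance should scatter around the straight line by several percent; that scatter is itself sampling noise of the ensemble estimate and shrinks as the number of walks grows.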
Left: 100 random walks ($\sigma = 1$ per $\sqrt{\text{s}}$) with the $\pm\sigma\sqrt{t}$ envelope — the fan. Right: the ensemble variance measured across 400 walks at each time, against the straight line $\sigma^2 t$.
A subtlety worth loving: the random-phase sine
Here is a process that recalibrates your intuition about what "random" means. Take a perfect tone from chapter 1 and randomize only its phase:
$$ x(t) = A \cos(2\pi\nu_0 t + \varphi), \qquad \varphi \sim \text{uniform on } [0, 2\pi) , $$
with $\varphi$ drawn once per realization. Every single movie in the ensemble is a flawless, deterministic-looking cosine — nothing jagged anywhere. Yet as an ensemble this is a perfectly respectable stationary random process: averaging over $\varphi$,
$$ \langle x(t) \rangle = 0, \qquad R_x(\tau) = \frac{A^2}{2} \cos(2\pi\nu_0 \tau) , $$
independent of $t$ (exercise 3.2 does the two-line integral). Select "sine, random phase" in the first demo and the ensemble panels show the consequences: clean cosines slid against each other, mean zero, and a ±σ band flat at $A/\sqrt{2}$, because $\sigma^2 = R_x(0) = A^2/2$. The cosine autocorrelation itself deserves to be measured, exactly as we measured the RC filter's above:
Left: a few realizations — identical cosines, random starting phases. Right: $R(\tau)/R(0)$ measured along one record, with the theory $\cos(2\pi\nu_0\tau)$ overlaid. Compare the RC demo above: there the memory died as $e^{-\tau/RC}$; here it oscillates undiminished forever — knowing the value now predicts the value arbitrarily far into the future.
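The ensemble statements for the random-phase sine can also be checked numerically by drawing many phases. This is a sketch under assumed values: the amplitude `A`, the frequency `nu0`, the helper `mulberry32` and the function `ensembleStats` are illustrative names, not part of the page's code.

```javascript
// Seeded uniform generator (mulberry32) so the run is repeatable. Illustrative helper.
function mulberry32(seed) {
  let s = seed | 0;
  return () => {
    s = (s + 0x6D2B79F5) | 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const A = 2, nu0 = 5; // illustrative amplitude and frequency (Hz)

// Ensemble averages <x(t)> and <x(t) x(t + tau)> over M random phases.
function ensembleStats(t, tau, M, rand) {
  let sumX = 0, sumXX = 0;
  for (let m = 0; m < M; m++) {
    const phi = 2 * Math.PI * rand(); // one phase per realization
    const x1 = A * Math.cos(2 * Math.PI * nu0 * t + phi);
    const x2 = A * Math.cos(2 * Math.PI * nu0 * (t + tau) + phi);
    sumX += x1;
    sumXX += x1 * x2;
  }
  return { mean: sumX / M, R: sumXX / M };
}

const tauSine = 0.03;
const theoryR = (A * A / 2) * Math.cos(2 * Math.PI * nu0 * tauSine);
// Two different absolute times: stationarity says the answers should agree.
const atT0 = ensembleStats(0.0, tauSine, 20000, mulberry32(3));
const atT1 = ensembleStats(0.37, tauSine, 20000, mulberry32(4));
console.log(`theory R = ${theoryR.toFixed(3)}`);
console.log(`t = 0.00 s: mean ${atT0.mean.toFixed(3)}, R ${atT0.R.toFixed(3)}`);
console.log(`t = 0.37 s: mean ${atT1.mean.toFixed(3)}, R ${atT1.R.toFixed(3)}`);
```

Both times should give a mean near zero and the same $R_x(\tau)$ to within sampling error, even though every individual realization is a perfectly smooth cosine.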
The moral: noise is not the same as "jagged". Randomness lives in the ensemble — in what you couldn't predict before the run — not in the roughness of any one trace. And set the two autocorrelation demos side by side, because together they are the seed of the deepest fact in the course: exponential forgetting (the RC filter) will turn out to mean a smeared-out spectrum, while this never-fading oscillation is what an infinitely sharp spectral line at $\nu_0$ looks like in the time domain. The dial connecting the two is the next chapter.
Ergodicity: why one trace is enough
Everything above was defined over the ensemble — but nobody owns 200 parallel universes. Your data logger records one realization, and from it you compute time averages: the mean of the samples, the average of $x(t)x(t+\tau)$ along the record, and so on. A process is ergodic if these time averages, taken over one sufficiently long record, converge to the ensemble averages:
$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T x(t)\, x(t+\tau)\, dt \;=\; \langle x(t)\, x(t+\tau)\rangle \;=\; R_x(\tau) . $$
Intuitively, an ergodic process wanders through all of its typical behaviours given enough time, so one long movie samples the ensemble as well as many short ones. Well-behaved stationary processes — filtered thermal noise, shot noise, the OU process above — are ergodic. And this is the silent assumption of essentially every instrument you own: a spectrum analyzer averaging successive sweeps, a lock-in's output filter, a frequency counter's gate — all replace ensemble averages by time averages and hope the process cooperates. (Only stationary processes can be ergodic: a time average produces one number, so it cannot track a time-dependent $\mu(t)$.)
Watch the difference happen. Below, eight parallel universes run the same experiment; the right panel tracks each universe's running time average $\frac{1}{T}\int_0^T x\,dt$ as the record grows. For the ergodic process all eight funnels collapse onto the ensemble mean — any single movie would have told you the truth. Switch to the non-ergodic process (the same noise riding on a frozen, once-per-universe offset): every time average still converges beautifully… each to its own private answer. Convergence is not the issue — convergence to the ensemble answer is.
Left: eight realizations (one highlighted). Right: each realization's running time average vs averaging time, with the ensemble mean dashed. Press play to grow the records and watch the averages settle; then switch process type. For the frozen-offset process, waiting longer only makes each universe more confident of a different number.
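The contrast between the two process types can be sketched in a few lines. The version below assumes the simplest possible noise (white, unit variance) and a unit-Gaussian frozen offset; `mulberry32`, `randn` and `timeAverages` are hypothetical helper names, not the demo's own code.

```javascript
// Seeded uniform generator (mulberry32) so the run is repeatable. Illustrative helper.
function mulberry32(seed) {
  let s = seed | 0;
  return () => {
    s = (s + 0x6D2B79F5) | 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Unit Gaussian via Box-Muller; 1 - rand() lies in (0, 1], so log() is safe.
function randn(rand) {
  const u = 1 - rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Time average over N samples in each universe.
// frozenOffset = true adds a constant drawn once per universe.
function timeAverages(universes, N, frozenOffset, rand) {
  const out = [];
  for (let u = 0; u < universes; u++) {
    const c = frozenOffset ? randn(rand) : 0; // fixed for the whole record
    let acc = 0;
    for (let i = 0; i < N; i++) acc += c + randn(rand);
    out.push(acc / N);
  }
  return out;
}

const ergodic = timeAverages(8, 100000, false, mulberry32(5));
const frozen = timeAverages(8, 100000, true, mulberry32(6));
const spread = a => Math.max(...a) - Math.min(...a);
console.log("ergodic spread across universes:", spread(ergodic).toFixed(4));
console.log("frozen-offset spread across universes:", spread(frozen).toFixed(4));
```

Increasing the record length should shrink the first spread toward zero while leaving the second essentially unchanged, which is the behaviour described for the two funnels above.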
Drift versus noise
One more distinction the lab forces on you. A deterministic trend — the thermal ramp as your box warms up, mechanical creep, a battery discharging — is not noise at all: rerun the experiment and the same ramp comes back. It should be modelled and subtracted, not averaged over. Slow non-stationary noise — random-walk-like wandering — looks superficially similar in any one record but is genuinely random: rerun and it wanders differently.
The uncomfortable truth is that in a finite record you often cannot tell them apart. A single realization of very-low-frequency noise (the "flicker", $1/f$-type noise of chapter 5) is nearly indistinguishable from a smooth drift: both look like "it slowly went up". The discriminating tools are statistical and need either many records or clever statistics on one: the slope of the power spectral density at low frequency (chapter 5) and the Allan deviation's behaviour at long averaging times (chapter 7).
The same thing in code
The two loops below generate every plot on this page. The first makes one realization of the OU (low-pass-filtered) process; the second builds ensemble statistics by brute force — exactly the definition of $\langle\cdot\rangle$, no cleverness required.
// one realization of an Ornstein–Uhlenbeck process, unit variance
// (dt, tauC, N, M, rand and the Noise helpers are defined elsewhere on the page)
const a = Math.exp(-dt / tauC), b = Math.sqrt(1 - a * a);
const y = new Float64Array(N);
y[0] = Noise.randn(rand); // start in steady state
for (let i = 1; i < N; i++) y[i] = a * y[i - 1] + b * Noise.randn(rand);

// ensemble mean and sigma at each time, from M fresh realizations
const sum = new Float64Array(N), sq = new Float64Array(N);
for (let m = 0; m < M; m++) {
  const x = makeRealization(rand); // a new movie each pass
  for (let i = 0; i < N; i++) { sum[i] += x[i]; sq[i] += x[i] * x[i]; }
}
const mean = sum.map(s => s / M);
// clamp at zero: round-off can push s / M - mean^2 slightly negative
const sigma = sq.map((s, i) => Math.sqrt(Math.max(0, s / M - mean[i] * mean[i])));
Exercises
A walker takes $N$ independent steps, each $\pm s$ with equal probability. Show that the variance of the final position is $N s^2$, and explain why the fan in the demo has a $\sqrt{N}$ (equivalently $\sqrt{t}$) envelope rather than a linear one.
Solution
The position is $x_N = \sum_{k=1}^{N} \epsilon_k$ with $\epsilon_k = \pm s$, $\langle \epsilon_k \rangle = 0$ and $\langle \epsilon_k^2 \rangle = s^2$. Then $\mathrm{Var}[x_N] = \sum_k \langle\epsilon_k^2\rangle + \sum_{j\neq k}\langle \epsilon_j\rangle\langle\epsilon_k\rangle = N s^2$: independence kills the cross terms, so variances add, not amplitudes. The typical excursion is the standard deviation $s\sqrt{N}$ — the fan is the square root of a linearly growing variance. With steps of variance $\sigma^2\Delta t$ every $\Delta t$, this is $\mathrm{Var}[x(t)] = \sigma^2 t$ and envelope $\sigma\sqrt{t}$.
For $x(t) = A\cos(2\pi\nu_0 t + \varphi)$ with $\varphi$ uniform on $[0, 2\pi)$, compute $R_x(\tau) = \langle x(t)\,x(t+\tau) \rangle$ by doing the integral over $\varphi$.
Solution
Use $\cos\alpha\cos\beta = \tfrac12[\cos(\alpha-\beta) + \cos(\alpha+\beta)]$ with $\alpha = 2\pi\nu_0 t + \varphi$, $\beta = 2\pi\nu_0 (t+\tau) + \varphi$, so that $\alpha - \beta = -2\pi\nu_0\tau$ and, cosine being even, $$ x(t)x(t+\tau) = \frac{A^2}{2}\Big[\cos(2\pi\nu_0\tau) + \cos\!\big(2\pi\nu_0(2t+\tau) + 2\varphi\big)\Big]. $$ Averaging over $\varphi$: the first term has no $\varphi$ and survives; the second is a cosine whose argument sweeps two full periods as $\varphi$ runs over $[0, 2\pi)$, so $\frac{1}{2\pi}\int_0^{2\pi}\cos(\cdot + 2\varphi)\,d\varphi = 0$. Hence $R_x(\tau) = \tfrac{A^2}{2}\cos(2\pi\nu_0\tau)$, independent of $t$: the process is wide-sense stationary, and $R_x(0) = A^2/2$ is the familiar mean-square value of a sine.
Let $x(t) = c$ for all $t$, with $c \sim \mathcal{N}(0,1)$ drawn once per realization. Is this process stationary? Is it ergodic?
Solution
Stationary: yes. Every realization is constant, so shifting time changes nothing: $\mu(t) = \langle c \rangle = 0$ and $R_x(t_1,t_2) = \langle c^2 \rangle = 1$ for all times — statistics fully shift-invariant. Ergodic: no. The time average of one realization is $\frac{1}{T}\int_0^T c\,dt = c$, a random number that does not converge to the ensemble mean $0$ no matter how long $T$ is. One movie explores exactly one point of the ensemble.
A scope in averaging mode acquires 64 triggered traces and averages them point by point. Is that a time average or an ensemble average? Under what conditions does it agree with a time average along one long record?
Solution
It is an ensemble average: each trigger starts a fresh realization of the noise, and the scope averages across the 64 realizations at fixed time-after-trigger. That is precisely why it works — the trigger-synchronous signal is the same in every realization and survives, while zero-mean noise averages down as $1/\sqrt{64} = 1/8$ in amplitude. It agrees with a time average along one record when the noise is stationary and ergodic and nothing distinguishes one trigger from the next (no trigger-correlated pickup, no drift between acquisitions). If the setup drifts between triggers — non-stationarity — the 64 movies come from different ensembles, and the average blurs the signal instead of cleaning it.
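The $1/\sqrt{64}$ claim is easy to check numerically. The sketch below assumes a made-up repetitive signal (three cycles of a sine per trace) and unit-variance white noise; `mulberry32`, `randn` and `averagedResidualRms` are illustrative names.

```javascript
// Seeded uniform generator (mulberry32) so the run is repeatable. Illustrative helper.
function mulberry32(seed) {
  let s = seed | 0;
  return () => {
    s = (s + 0x6D2B79F5) | 0;
    let t = Math.imul(s ^ (s >>> 15), 1 | s);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Unit Gaussian via Box-Muller; 1 - rand() lies in (0, 1], so log() is safe.
function randn(rand) {
  const u = 1 - rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Average `traces` triggered records point by point and return the RMS of
// what is left after subtracting the known trigger-synchronous signal.
function averagedResidualRms(traces, N, noiseSigma, rand) {
  const signal = i => Math.sin((2 * Math.PI * 3 * i) / N); // identical in every trace
  const avg = new Float64Array(N);
  for (let m = 0; m < traces; m++) {
    for (let i = 0; i < N; i++) {
      avg[i] += (signal(i) + noiseSigma * randn(rand)) / traces;
    }
  }
  let sq = 0;
  for (let i = 0; i < N; i++) sq += (avg[i] - signal(i)) ** 2;
  return Math.sqrt(sq / N);
}

const single = averagedResidualRms(1, 2000, 1, mulberry32(7));
const averaged = averagedResidualRms(64, 2000, 1, mulberry32(8));
console.log(`residual RMS, 1 trace: ${single.toFixed(3)}`);
console.log(`residual RMS, 64 traces: ${averaged.toFixed(3)} (expected near 1/8)`);
```

The signal term passes through the average untouched because it is the same in every trace; only the part that differs between realizations is reduced.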
Your laser is free-running (no lock), and its frequency random-walks with $\sigma = 100$ kHz$/\sqrt{\text{s}}$, i.e. $\mathrm{Var} = \sigma^2 t$. (a) Estimate how long you typically have before it has wandered by 100 MHz. (b) A famous subtlety: the walk reaches any level eventually (with probability 1), yet the mean time to first reach a level is infinite. How can both be true, and what does that mean for your answer to (a)?
Solution
(a) Set $\sigma\sqrt{t} \sim 100$ MHz: $t \sim (10^8/10^5)^2 = 10^6$ s, about 12 days for a typical excursion of that size (in practice drift, not the walk, gets you first). (b) The distribution of first-passage times is extremely heavy-tailed: most realizations cross early, but the rare ones that happen to start off in the wrong direction can take arbitrarily long, and that long tail makes the integral for the mean diverge. So "typical time" (the median, roughly the estimate in (a)) is meaningful; "average time" is not. Whenever a quantity has a heavy-tailed distribution, quote a quantile, not a mean. It is the same lesson as quoting flicker noise per decade rather than as one RMS number.
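For readers who want the numbers behind (b), here is a sketch under idealized assumptions: a continuous, drift-free walk started at zero, and a single level $a$ on one side. The reflection principle then gives the distribution of the first-passage time $T_a$,
$$ P(T_a \le t) = 2\,P\big(x(t) \ge a\big) = \operatorname{erfc}\!\left(\frac{a}{\sigma\sqrt{2t}}\right), $$
which tends to 1 as $t \to \infty$: the level is reached with certainty. Differentiating gives the density
$$ p(t) = \frac{a}{\sigma\sqrt{2\pi t^{3}}}\; e^{-a^{2}/(2\sigma^{2} t)} \;\propto\; t^{-3/2} \quad (t \to \infty), $$
so $\int^{T} t\,p(t)\,dt$ grows like $\sqrt{T}$ and the mean is infinite. The median, by contrast, is finite: setting the erfc equal to $1/2$ gives $t_{\text{med}} \approx 2.2\,a^2/\sigma^2$, or roughly $2 \times 10^{6}$ s for the laser above, within a factor of about two of the estimate in (a).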