Chapter 3

From signals to noise: random processes

Every signal so far was deterministic: rerun the experiment and you get the identical trace. Noise means the opposite — rerun it and you get a different trace. The honest mathematical object is therefore not one trace but the whole collection of traces you could have gotten, each with its probability. This chapter builds that object, gives it statistics (mean, variance, correlation), and explains the small miracle — ergodicity — that lets a lab measure any of this from a single record.

One experiment, many movies

Point a photodiode at a "constant" laser and record the voltage for four seconds. Now do it again. The two records disagree in every detail, yet they are obviously siblings: same typical size, same typical wiggliness, same character. What is reproducible is not the trace but the statistics of traces.

A random process $x(t)$ is exactly this: a rule that assigns probabilities to whole time-records. Each run of the experiment draws one complete record — one realization — from that rule, the way each roll of a die draws one face. The set of all possible realizations, weighted by their probabilities, is called the ensemble. Think of it as a stack of movies of the same experiment, one per parallel universe. Any statement beginning "the noise has…" (a mean, a variance, a spectrum) is a statement about the ensemble, not about any single movie.

The ensemble: twelve parallel universes, one experiment

Left: 12 realizations of the same process; one is highlighted, the gray ones are the runs you didn't get. Right: the ensemble mean and the mean ± σ band estimated from 200 realizations. Switch the process type: for white noise the band is flat; for the random walk it fans out like $\sqrt{t}$.

Averaging vertically through the stack of movies — across realizations, at a fixed time $t$ — is called an ensemble average, written $\langle \cdot \rangle$. The right-hand panel above is computed exactly that way: at each $t$, average $x(t)$ over 200 realizations. Note that this is a completely different operation from averaging horizontally along one trace; keeping the two straight is half of this chapter.

Ensemble statistics: mean, variance, and memory

The first two statistics are the ones you would guess. At each instant $t$ the ensemble has a mean and a variance,

$$ \mu(t) = \langle x(t) \rangle, \qquad \sigma^2(t) = \big\langle \big[x(t) - \mu(t)\big]^2 \big\rangle . $$

But a noise trace is not just a sequence of independent random numbers — neighbouring values usually know about each other. The statistic that captures this is the autocorrelation function,

$$ R_x(t_1, t_2) = \langle x(t_1)\, x(t_2) \rangle , $$

the ensemble average of the product of the signal at two different times. For the processes we will care about most, the statistics do not depend on absolute time (more on that in a moment), and $R_x$ depends only on the time separation $\tau = t_2 - t_1$:

$$ R_x(\tau) = \langle x(t)\, x(t+\tau) \rangle . $$

Read it as a question: if the noise is high now, is it still high a time $\tau$ later? If yes, the product $x(t)x(t+\tau)$ tends to be positive and $R_x(\tau)$ is large; if by time $\tau$ the noise has forgotten where it was, positive and negative products cancel and $R_x(\tau)\to 0$. The autocorrelation is the memory of the noise, and $R_x(0) = \sigma^2$ (for a zero-mean process) is just the variance.

The cleanest laboratory of memory needs one piece of hardware vocabulary first. A low-pass filter is anything sluggish — anything whose output cannot follow a fast-changing input and instead keeps relaxing toward it: a capacitor charging through a resistor, a thermometer catching up with a stirred bath, a heavy stage settling after a push. The simplest kind (a single-pole filter) relaxes exponentially, and its time constant $\tau_c$ is its sluggishness in seconds: poke it once and the response dies away as $e^{-t/\tau_c}$ — after a few $\tau_c$ the poke is forgotten. (Chapter 4 will translate "sluggish" into frequency language; here the time picture is all we need.)

Now feed noise into something sluggish. Every random kick lingers in the output for about $\tau_c$ before fading, so the output at nearby times is built from the same recent kicks — the filter's sluggishness has become the noise's memory. The result is called an Ornstein–Uhlenbeck process. Discretely, with sample spacing $\Delta t$,

$$ y_{i+1} = a\, y_i + \sqrt{1-a^2}\; w_i, \qquad a = e^{-\Delta t/\tau_c}, $$

where $w_i$ are independent unit Gaussians; the $\sqrt{1-a^2}$ factor keeps the variance pinned at 1 whatever $\tau_c$ is. Its autocorrelation is a pure exponential, $R_y(\tau) = e^{-|\tau|/\tau_c}$: the process remembers itself for about one filter time constant and no longer.
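
Both claims are easy to check by brute force. The sketch below is a minimal illustration rather than the page's own code: it draws many realizations and estimates $\langle y_i\, y_{i+k} \rangle$ straight from the definition, at a fixed time index. The helpers `mulberry32`, `randn`, `ouRealization` and `ensembleAutocorr`, and every parameter value, are assumptions made for the example.

```javascript
// Small seeded uniform generator (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
// Box-Muller transform: two uniforms in, one unit Gaussian out.
function randn(rand) {
  const u = 1 - rand();                      // in (0, 1], so log(u) is finite
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// One realization of the discrete OU recursion from the text.
function ouRealization(N, dt, tauC, rand) {
  const a = Math.exp(-dt / tauC), b = Math.sqrt(1 - a * a);
  const y = new Float64Array(N);
  y[0] = randn(rand);                        // start in steady state (variance 1)
  for (let i = 1; i < N; i++) y[i] = a * y[i - 1] + b * randn(rand);
  return y;
}

// Ensemble estimate of R(k*dt) = <y[i0] * y[i0 + k]> at a fixed time index i0.
function ensembleAutocorr(M, i0, lags, dt, tauC, rand) {
  const acc = new Float64Array(lags.length);
  const N = i0 + Math.max(...lags) + 1;
  for (let m = 0; m < M; m++) {
    const y = ouRealization(N, dt, tauC, rand);
    lags.forEach((k, j) => { acc[j] += y[i0] * y[i0 + k]; });
  }
  return Array.from(acc, s => s / M);
}

const dt = 0.01, tauC = 0.2;
const lags = [0, 10, 20, 40];                // tau = 0, 0.1, 0.2, 0.4 s
const R = ensembleAutocorr(20000, 50, lags, dt, tauC, mulberry32(1));
lags.forEach((k, j) => {
  const theory = Math.exp(-k * dt / tauC);
  console.log(`tau=${(k * dt).toFixed(2)} s  estimate=${R[j].toFixed(3)}  theory=${theory.toFixed(3)}`);
});
```

With 20000 realizations the statistical scatter of each estimate is roughly 0.01, so agreement with $e^{-k\Delta t/\tau_c}$ to about two decimals is what one should expect, no better.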

Memory, made of hardware: noise through an RC filter

The circuit is the demo: a white noise source (it could be the resistor's own thermal noise — chapter 5) drives a resistor–capacitor low-pass, and the capacitor voltage $y(t)$ is an Ornstein–Uhlenbeck process with $\tau_c = RC$. Dial $R$ and $C$: large $RC$ looks smooth and sluggish, small $RC$ jagged. Right: the autocorrelation estimated from a 40 s record against the theoretical $e^{-\tau/RC}$. The estimate gets noisy for long $\tau_c$ — the record then holds fewer independent chunks.

Notice the trade you can see by eye: short memory ↔ wide bandwidth, long memory ↔ narrow bandwidth. A filter that passes only slow frequencies produces noise whose value persists; noise that forgets instantly must contain arbitrarily fast components. The exact statement, that $R_x(\tau)$ and the power spectral density are a Fourier pair, is the Wiener–Khinchin theorem, and it is the whole subject of the next chapter. (The readout above shows the filter's 3 dB bandwidth $1/(2\pi RC)$ so you can watch the reciprocity happen.) And the same mathematics arrives with other hardware: a micron-sized bead in an optical tweezer is an OU process in position, with the trap stiffness divided by the bead's viscous drag coefficient playing the role of $1/RC$, and biophysicists calibrate their traps by fitting exactly this exponential memory.

Stationarity: does the process age?

A process is stationary if its statistics are unchanged by a shift of the time origin: the ensemble looks the same whether you start recording now or an hour from now. In practice one usually demands only wide-sense stationarity: the mean $\mu(t)$ is constant and $R_x(t_1,t_2)$ depends only on $\tau = t_2-t_1$. That is the condition under which writing $R_x(\tau)$, as we did above, is legal.

White noise is stationary — every sample is a fresh, identically distributed draw. The random walk is the canonical non-stationary process. Write it as a sum of independent increments: after time $t$ it has accumulated $t/\Delta t$ independent steps of variance $\sigma^2 \Delta t$ each. Independent variances add, so

$$ \mathrm{Var}\big[x(t)\big] \;=\; \frac{t}{\Delta t}\cdot \sigma^2 \Delta t \;=\; \sigma^2 t . $$

The variance grows linearly with time: the ensemble spreads out forever, and the typical excursion grows as $\sigma\sqrt{t}$. No time shift can map this ensemble onto itself: the spread of the walks at $t=100$ s is ten times their spread at $t=1$ s.
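
The linear growth can be checked by building the ensemble directly. What follows is a minimal sketch under assumed parameters ($\sigma = 1$ per $\sqrt{\text{s}}$, $\Delta t = 10$ ms, 4000 walks); the helper names `mulberry32`, `randn` and `walkEnsembleVariance` are illustrative, not the page's own.

```javascript
// Small seeded uniform generator (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
// Box-Muller transform: two uniforms in, one unit Gaussian out.
function randn(rand) {
  const u = 1 - rand();                      // in (0, 1], so log(u) is finite
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Ensemble of M random walks; returns the variance across walks at each time step.
function walkEnsembleVariance(M, N, dt, sigma, rand) {
  const sum = new Float64Array(N + 1), sq = new Float64Array(N + 1);
  const kick = sigma * Math.sqrt(dt);        // step std, so each step has variance sigma^2 * dt
  for (let m = 0; m < M; m++) {
    let x = 0;                               // every walk starts at x(0) = 0
    for (let i = 1; i <= N; i++) {
      x += kick * randn(rand);
      sum[i] += x;
      sq[i] += x * x;
    }
  }
  return Array.from(sum, (s, i) => sq[i] / M - (s / M) ** 2);
}

const dt = 0.01, sigma = 1, N = 400;         // 4 s of walk
const variance = walkEnsembleVariance(4000, N, dt, sigma, mulberry32(2));
for (const i of [100, 200, 400]) {
  const theory = sigma * sigma * i * dt;
  console.log(`t=${(i * dt).toFixed(1)} s  Var=${variance[i].toFixed(3)}  sigma^2 t=${theory.toFixed(3)}`);
}
```

A variance estimated from 4000 walks carries a relative scatter of a few percent, so the printed values should track $\sigma^2 t$ closely but not exactly.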

The random-walk fan: variance grows like $t$

Left: 100 random walks ($\sigma = 1$ per $\sqrt{\text{s}}$) with the $\pm\sigma\sqrt{t}$ envelope — the fan. Right: the ensemble variance measured across 400 walks at each time, against the straight line $\sigma^2 t$.

Why you should care in the lab

Most textbook noise machinery — power spectral densities, Wiener–Khinchin, "the noise floor is X" — assumes stationarity. Random-walk-like noise (chapter 5 will call it $1/f^2$) and drifting setups violate it, and naively applying stationary tools to them gives answers that depend on how long you happened to measure. Chapter 7's Allan deviation exists precisely because clocks misbehave this way.

Why all the square roots?

Noise formulas are full of square roots — the $\sqrt{t}$ fan above, $/\sqrt{\mathrm{Hz}}$ on every datasheet, the $1/\sqrt{T}$ of averaging — and they all have one source. Look again at the derivation above: over any interval $\Delta t$, the walk's increment is Gaussian with variance $\propto \Delta t$ — it is the variance (the square) that is proportional to time, so the typical size of anything random picks up a square root when you take it back out of variance-land: kick size $\propto\sqrt{\Delta t}$. This is not a convention but a corner we are backed into: if increments instead had variance $\propto (\Delta t)^\alpha$, then chopping a fixed time $T$ into $N$ steps would give total variance $N\,(T/N)^\alpha = N^{1-\alpha}\,T^{\alpha}$, which as $N\to\infty$ dies to zero for $\alpha>1$ and blows up for $\alpha<1$. Only $\alpha=1$ survives the continuum limit. The resulting idealized process is called the Wiener process $W(t)$ — the object that drives all of stochastic calculus. So the answer to the title's question fits in one sentence: independent noises add in variance, not in amplitude. Everything in the noise business doubles and halves in the square, and every practical quantity — kick sizes, fan widths, datasheet sensitivities, averaging gains — is that square's square root.
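
The continuum-limit argument is pure arithmetic, so it can be tabulated. A small sketch, with the function name `totalVariance` and the values of $T$, $N$ and $\alpha$ chosen only for illustration:

```javascript
// Total variance after chopping a fixed time T into N independent steps,
// each step having variance (T/N)^alpha.
function totalVariance(T, N, alpha) {
  return N * Math.pow(T / N, alpha);
}

const T = 4;
for (const alpha of [0.5, 1, 2]) {
  const row = [10, 1000, 100000].map(N => totalVariance(T, N, alpha).toExponential(2));
  console.log(`alpha=${alpha}:  ${row.join("   ")}`);
}
```

Reading across each row as $N$ grows: the $\alpha = 0.5$ row climbs without bound, the $\alpha = 2$ row shrinks toward zero, and only the $\alpha = 1$ row stays put at $T$.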

A subtlety worth loving: the random-phase sine

Here is a process that recalibrates your intuition about what "random" means. Take a perfect tone from chapter 1 and randomize only its phase:

$$ x(t) = A \cos(2\pi\nu_0 t + \varphi), \qquad \varphi \sim \text{uniform on } [0, 2\pi) , $$

with $\varphi$ drawn once per realization. Every single movie in the ensemble is a flawless, deterministic-looking cosine — nothing jagged anywhere. Yet as an ensemble this is a perfectly respectable stationary random process: averaging over $\varphi$,

$$ \langle x(t) \rangle = 0, \qquad R_x(\tau) = \frac{A^2}{2} \cos(2\pi\nu_0 \tau) , $$

independent of $t$ (exercise 3.2 does the two-line integral). Select "sine, random phase" in the first demo and the ensemble panels show the first two claims: clean cosines slid against each other, mean zero, ±σ band flat at $A/\sqrt{2}$. The third claim — that cosine autocorrelation — deserves to be measured, exactly as we measured the RC filter's above:

The autocorrelation of a random-phase sine: memory that never fades

Left: a few realizations — identical cosines, random starting phases. Right: $R(\tau)/R(0)$ measured along one record, with the theory $\cos(2\pi\nu_0\tau)$ overlaid. Compare the RC demo above: there the memory died as $e^{-\tau/RC}$; here it oscillates undiminished forever — knowing the value now predicts the value arbitrarily far into the future.
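
The ensemble side of these claims can be checked without any random numbers at all. In the sketch below an evenly spaced grid of phases stands in for the uniform random draw of $\varphi$ (an assumption of the example, as are the name `phaseAverages` and the values of $A$ and $\nu_0$); averaging over the grid gives the ensemble mean and autocorrelation at several absolute times $t$.

```javascript
// Ensemble averages for x(t) = A cos(2 pi nu0 t + phi), phi uniform on [0, 2 pi).
// The average over phi is taken on an evenly spaced grid of K phases.
function phaseAverages(A, nu0, t, tau, K = 360) {
  let mean = 0, corr = 0;
  for (let k = 0; k < K; k++) {
    const phi = (2 * Math.PI * k) / K;
    const x1 = A * Math.cos(2 * Math.PI * nu0 * t + phi);
    const x2 = A * Math.cos(2 * Math.PI * nu0 * (t + tau) + phi);
    mean += x1;
    corr += x1 * x2;
  }
  return { mean: mean / K, R: corr / K };
}

const A = 2, nu0 = 5;                        // illustrative amplitude and frequency (Hz)
for (const t of [0, 0.013, 0.37]) {          // three different absolute times
  for (const tau of [0, 0.05, 0.1]) {
    const { mean, R } = phaseAverages(A, nu0, t, tau);
    const theory = (A * A / 2) * Math.cos(2 * Math.PI * nu0 * tau);
    console.log(`t=${t}  tau=${tau}  mean=${mean.toFixed(6)}  R=${R.toFixed(6)}  theory=${theory.toFixed(6)}`);
  }
}
```

The mean comes out zero and $R$ matches $\tfrac{A^2}{2}\cos(2\pi\nu_0\tau)$ at every $t$, which is the stationarity claim in numerical form.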

The moral: noise is not the same as "jagged". Randomness lives in the ensemble — in what you couldn't predict before the run — not in the roughness of any one trace. And set the two autocorrelation demos side by side, because together they are the seed of the deepest fact in the course: exponential forgetting (the RC filter) will turn out to mean a smeared-out spectrum, while this never-fading oscillation is what an infinitely sharp spectral line at $\nu_0$ looks like in the time domain. The dial connecting the two is the next chapter.

Ergodicity: why one trace is enough

Everything above was defined over the ensemble — but nobody owns 200 parallel universes. Your data logger records one realization, and from it you compute time averages: the mean of the samples, the average of $x(t)x(t+\tau)$ along the record, and so on. A process is ergodic if these time averages, taken over one sufficiently long record, converge to the ensemble averages:

$$ \lim_{T\to\infty} \frac{1}{T} \int_0^T x(t)\, x(t+\tau)\, dt \;=\; \langle x(t)\, x(t+\tau)\rangle \;=\; R_x(\tau) . $$

Intuitively, an ergodic process wanders through all of its typical behaviours given enough time, so one long movie samples the ensemble as well as many short ones. Well-behaved stationary processes — filtered thermal noise, shot noise, the OU process above — are ergodic. And this is the silent assumption of essentially every instrument you own: a spectrum analyzer averaging successive sweeps, a lock-in's output filter, a frequency counter's gate — all replace ensemble averages by time averages and hope the process cooperates. (Only stationary processes can be ergodic: a time average produces one number, so it cannot track a time-dependent $\mu(t)$.)
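
The limit above can be tried on a simulated record. The following sketch, under assumed parameters, generates one long OU realization and estimates $R_y(\tau)$ by averaging $y_i\,y_{i+k}$ along that single record. The helper names are illustrative, and the record is deliberately much longer than the 40 s used in the RC demo so that the estimate is tight.

```javascript
// Small seeded uniform generator (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
// Box-Muller transform: two uniforms in, one unit Gaussian out.
function randn(rand) {
  const u = 1 - rand();                      // in (0, 1], so log(u) is finite
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Time-average estimate of R(k*dt) along ONE record: the mean of y[i] * y[i+k] over i.
function timeAutocorr(y, k) {
  const n = y.length - k;
  let s = 0;
  for (let i = 0; i < n; i++) s += y[i] * y[i + k];
  return s / n;
}

const dt = 0.01, tauC = 0.2, N = 400000;     // a 4000 s record, i.e. 20000 time constants
const rand = mulberry32(3);
const a = Math.exp(-dt / tauC), b = Math.sqrt(1 - a * a);
const y = new Float64Array(N);
y[0] = randn(rand);                          // start in steady state
for (let i = 1; i < N; i++) y[i] = a * y[i - 1] + b * randn(rand);

for (const k of [0, 10, 20, 40]) {
  const theory = Math.exp(-k * dt / tauC);
  console.log(`tau=${(k * dt).toFixed(2)} s  time average=${timeAutocorr(y, k).toFixed(3)}  ensemble theory=${theory.toFixed(3)}`);
}
```

One record, no ensemble, and the numbers still land on $e^{-\tau/\tau_c}$: that is ergodicity doing its job. Shorten the record to a few hundred time constants and the scatter grows accordingly.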

A stationary process that is not ergodic

Let each realization be a constant, $x(t) = c$, with $c$ drawn once from a zero-mean Gaussian of variance $\sigma_c^2$. The ensemble statistics are time-independent ($\mu = 0$, $R_x(\tau) = \sigma_c^2$ for all $\tau$), so the process is stationary. But the time average of any single record is just its own $c$, which is nonzero with probability 1: no amount of watching one movie tells you the ensemble mean, because each movie is stuck in one "state" forever. Real-world versions of this: unit-to-unit offsets, a resistor's fixed mismatch, any frozen-in disorder. Averaging longer does not help; only measuring more devices does.

Watch the difference happen. Below, eight parallel universes run the same experiment; the right panel tracks each universe's running time average $\frac{1}{T}\int_0^T x\,dt$ as the record grows. For the ergodic process all eight funnels collapse onto the ensemble mean — any single movie would have told you the truth. Switch to the non-ergodic process (the same noise riding on a frozen, once-per-universe offset): every time average still converges beautifully… each to its own private answer. Convergence is not the issue — convergence to the ensemble answer is.

Ergodic vs non-ergodic: does one movie tell you the truth?

Left: eight realizations (one highlighted). Right: each realization's running time average vs averaging time, with the ensemble mean dashed. Press play to grow the records and watch the averages settle; then switch process type. For the frozen-offset process, waiting longer only makes each universe more confident of a different number.
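
The same comparison fits in a few lines. This is a sketch under assumptions, not the demo's source: eight records of unit-variance OU noise, once with no offset and once riding on a unit-Gaussian offset drawn once per record. The helper names and the parameter values are made up for the example.

```javascript
// Small seeded uniform generator (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
// Box-Muller transform: two uniforms in, one unit Gaussian out.
function randn(rand) {
  const u = 1 - rand();                      // in (0, 1], so log(u) is finite
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Time average of one record: unit-variance OU noise riding on a constant offset.
function recordTimeAverage(N, dt, tauC, offset, rand) {
  const a = Math.exp(-dt / tauC), b = Math.sqrt(1 - a * a);
  let y = randn(rand), sum = 0;
  for (let i = 0; i < N; i++) {
    sum += y + offset;
    y = a * y + b * randn(rand);
  }
  return sum / N;
}

const dt = 0.01, tauC = 0.2, N = 200000;     // 2000 s per record
const rand = mulberry32(4);
const ergodic = [], frozen = [], offsets = [];
for (let u = 0; u < 8; u++) {                // eight "universes"
  const c = randn(rand);                     // frozen offset, drawn once per universe
  offsets.push(c);
  ergodic.push(recordTimeAverage(N, dt, tauC, 0, rand));
  frozen.push(recordTimeAverage(N, dt, tauC, c, rand));
}
const fmt = arr => arr.map(v => v.toFixed(3)).join("  ");
console.log("ergodic, time averages:      ", fmt(ergodic));
console.log("frozen offset, time averages:", fmt(frozen));
console.log("frozen offset, the offsets c:", fmt(offsets));
```

The first row clusters around the ensemble mean of zero. The second row converges just as well, but onto the third row: each record reports its own offset.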

Drift versus noise

One more distinction the lab forces on you. A deterministic trend — the thermal ramp as your box warms up, mechanical creep, a battery discharging — is not noise at all: rerun the experiment and the same ramp comes back. It should be modelled and subtracted, not averaged over. Slow non-stationary noise — random-walk-like wandering — looks superficially similar in any one record but is genuinely random: rerun and it wanders differently.

The uncomfortable truth is that in a finite record you often cannot tell them apart. A single realization of very-low-frequency noise (the "flicker", $1/f$-type noise of chapter 5) is nearly indistinguishable from a smooth drift: both look like "it slowly went up". The discriminating tools are statistical and need either many records or clever statistics on one: the slope of the power spectral density at low frequency (chapter 5) and the Allan deviation's behaviour at long averaging times (chapter 7).

Lab note: is my laser drifting or is it flicker?

You log your laser's frequency for an hour and it moved 200 kHz, slowly and smoothly. Temperature drift? Or $1/f$ frequency noise that would average away… eventually? A single hour cannot say. What works: correlate against the suspected cause (log the room temperature too — a deterministic drift has a physical driver you can find); repeat the run and see whether the "drift" reproduces in sign and shape; and compute the Allan deviation — a linear drift makes $\sigma_y(\tau)$ rise as $\tau$, flicker floors flat, random walk rises as $\sqrt{\tau}$. Until you have done one of these, resist writing either "drift" or "noise" in the logbook; write "slow".
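
Chapter 7 treats the Allan deviation properly; as a preview, the drift signature alone can be seen with a few lines. The sketch assumes the standard non-overlapping definition, $\sigma_y^2(\tau) = \tfrac12 \langle (\bar y_{k+1} - \bar y_k)^2 \rangle$ with $\bar y_k$ the average over the $k$-th bin of length $\tau$, and applies it to a noiseless linear ramp. The function name `allanDeviation` and the drift rate are hypothetical.

```javascript
// Non-overlapping Allan deviation of frequency samples y (spacing dt)
// at averaging time tau = m * dt: sqrt( 0.5 * mean( (ybar[k+1] - ybar[k])^2 ) ).
function allanDeviation(y, m) {
  const K = Math.floor(y.length / m);        // number of complete bins of m samples
  const ybar = new Float64Array(K);
  for (let k = 0; k < K; k++) {
    let s = 0;
    for (let i = 0; i < m; i++) s += y[k * m + i];
    ybar[k] = s / m;
  }
  let acc = 0;
  for (let k = 0; k + 1 < K; k++) acc += (ybar[k + 1] - ybar[k]) ** 2;
  return Math.sqrt(acc / (2 * (K - 1)));
}

// Deterministic linear drift y(t) = D * t, sampled every dt (illustrative values).
const dt = 1, D = 1e-3, N = 4096;
const drift = Float64Array.from({ length: N }, (_, i) => D * i * dt);
for (const m of [1, 4, 16, 64]) {
  const tau = m * dt;
  console.log(`tau=${tau} s  sigma_y=${allanDeviation(drift, m).toExponential(3)}  D*tau/sqrt(2)=${(D * tau / Math.SQRT2).toExponential(3)}`);
}
```

For a pure ramp every adjacent pair of bin averages differs by exactly $D\tau$, so $\sigma_y(\tau) = D\tau/\sqrt{2}$: the "rises as $\tau$" signature. The flicker and random-walk cases need simulated noise and are left for chapter 7.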

The same thing in code

The two loops below generate every plot on this page. The first makes one realization of the OU (low-pass-filtered) process; the second builds ensemble statistics by brute force — exactly the definition of $\langle\cdot\rangle$, no cleverness required.

// one realization of an Ornstein–Uhlenbeck process, unit variance
const a = Math.exp(-dt / tauC), b = Math.sqrt(1 - a * a);
y[0] = Noise.randn(rand);                       // start in steady state
for (let i = 1; i < N; i++) y[i] = a * y[i - 1] + b * Noise.randn(rand);

// ensemble mean and sigma at each time, from M fresh realizations
const sum = new Float64Array(N), sq = new Float64Array(N);
for (let m = 0; m < M; m++) {
  const x = makeRealization(rand);              // a new movie each pass
  for (let i = 0; i < N; i++) { sum[i] += x[i]; sq[i] += x[i] * x[i]; }
}
const mean  = sum.map(s => s / M);
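// clamp at 0: rounding can push a tiny variance slightly negative, and sqrt would return NaN
const sigma = sq.map((s, i) => Math.sqrt(Math.max(0, s / M - mean[i] * mean[i])));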

Simulating noise: the kick scales as √Δt

The commonest bug in home-made noise simulations follows directly from the √Δt rule. If you integrate $\dot x = -x/\tau_c + \xi(t)$ with an Euler step, the deterministic part advances by $\Delta t$ but the noise kick must be $\sigma\sqrt{\Delta t}\cdot\texttt{randn()}$ — square root, not linear. Scale the kick by $\Delta t$ and your simulated noise silently vanishes as you refine the timestep (and halving $\Delta t$ halves the diffusion instead of preserving it). The $\sqrt{1-a^2}$ in the OU one-liner above is the exact-discretization version of the same square root.
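
The bug is easy to reproduce. The sketch below (illustrative names and parameters throughout) Euler-integrates the same equation twice per timestep, once with the kick scaled as $\sqrt{\Delta t}$ and once, wrongly, as $\Delta t$, and measures the steady-state variance. For reference, the continuum OU process driven this way has stationary variance $\sigma^2\tau_c/2$.

```javascript
// Small seeded uniform generator (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
// Box-Muller transform: two uniforms in, one unit Gaussian out.
function randn(rand) {
  const u = 1 - rand();                      // in (0, 1], so log(u) is finite
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// Euler integration of dx/dt = -x/tauC + xi(t). The deterministic part advances by dt;
// kick(dt) is the standard deviation of the random increment added at each step.
function stationaryVariance(dt, tauC, T, kick, rand) {
  const N = Math.round(T / dt), k = kick(dt);
  let x = 0, sq = 0, n = 0;
  for (let i = 0; i < N; i++) {
    x += (-x / tauC) * dt + k * randn(rand);
    if (i * dt > 5 * tauC) { sq += x * x; n++; }   // skip the initial transient
  }
  return sq / n;
}

const sigma = 1, tauC = 0.2, T = 2000;
const right = dt => sigma * Math.sqrt(dt);   // correct: kick scales as sqrt(dt)
const wrong = dt => sigma * dt;              // the bug: kick scales as dt
const result = {};
for (const dt of [0.01, 0.005]) {
  result[dt] = {
    right: stationaryVariance(dt, tauC, T, right, mulberry32(5)),
    wrong: stationaryVariance(dt, tauC, T, wrong, mulberry32(5)),
  };
  console.log(`dt=${dt}  sqrt kick: Var=${result[dt].right.toFixed(4)}   linear kick: Var=${result[dt].wrong.toExponential(2)}`);
}
console.log(`continuum value sigma^2 tauC / 2 = ${(sigma * sigma * tauC / 2).toFixed(4)}`);
```

With the square-root kick the variance sits near $\sigma^2\tau_c/2$ for both timesteps. With the linear kick it is far too small and roughly halves when $\Delta t$ halves, which is the silent vanishing described above.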

Exercises

Exercise 3.1 — variance of a random walk

A walker takes $N$ independent steps, each $\pm s$ with equal probability. Show that the variance of the final position is $N s^2$, and explain why the fan in the demo has a $\sqrt{N}$ (equivalently $\sqrt{t}$) envelope rather than a linear one.

Solution

The position is $x_N = \sum_{k=1}^{N} \epsilon_k$ with $\epsilon_k = \pm s$, $\langle \epsilon_k \rangle = 0$ and $\langle \epsilon_k^2 \rangle = s^2$. Then $\mathrm{Var}[x_N] = \sum_k \langle\epsilon_k^2\rangle + \sum_{j\neq k}\langle \epsilon_j\rangle\langle\epsilon_k\rangle = N s^2$: independence kills the cross terms, so variances add, not amplitudes. The typical excursion is the standard deviation $s\sqrt{N}$ — the fan is the square root of a linearly growing variance. With steps of variance $\sigma^2\Delta t$ every $\Delta t$, this is $\mathrm{Var}[x(t)] = \sigma^2 t$ and envelope $\sigma\sqrt{t}$.
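
For small $N$ the result can be confirmed exactly, with no sampling at all, by enumerating every one of the $2^N$ equally likely step sequences. A minimal sketch; the function name `exactWalkVariance` and the step size are illustrative.

```javascript
// Exact variance of the end point of an N-step walk with steps +s or -s,
// found by enumerating all 2^N equally likely step sequences.
function exactWalkVariance(N, s) {
  const paths = 1 << N;                      // fine for small N (here N <= 16)
  let sum = 0, sq = 0;
  for (let p = 0; p < paths; p++) {
    let x = 0;
    // bit k of p decides the direction of step k
    for (let k = 0; k < N; k++) x += ((p >> k) & 1) ? s : -s;
    sum += x;
    sq += x * x;
  }
  const mean = sum / paths;
  return sq / paths - mean * mean;
}

const s = 0.5;
for (const N of [4, 9, 16]) {
  const v = exactWalkVariance(N, s);
  console.log(`N=${N}  Var=${v}  N s^2=${N * s * s}  std=${Math.sqrt(v)}  s sqrt(N)=${s * Math.sqrt(N)}`);
}
```

Quadrupling $N$ from 4 to 16 quadruples the variance but only doubles the standard deviation, which is the $\sqrt{N}$ envelope of the fan.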

Exercise 3.2 — autocorrelation of the random-phase sine

For $x(t) = A\cos(2\pi\nu_0 t + \varphi)$ with $\varphi$ uniform on $[0, 2\pi)$, compute $R_x(\tau) = \langle x(t)\,x(t+\tau) \rangle$ by doing the integral over $\varphi$.

Solution

Use $\cos\alpha\cos\beta = \tfrac12[\cos(\alpha-\beta) + \cos(\alpha+\beta)]$ with $\alpha = 2\pi\nu_0 t + \varphi$, $\beta = 2\pi\nu_0 (t+\tau) + \varphi$: $$ x(t)x(t+\tau) = \frac{A^2}{2}\Big[\cos(2\pi\nu_0\tau) + \cos\!\big(2\pi\nu_0(2t+\tau) + 2\varphi\big)\Big]. $$ Averaging over $\varphi$: the first term has no $\varphi$ and survives; the second is a cosine averaged over a full period of $2\varphi$, $\frac{1}{2\pi}\int_0^{2\pi}\cos(\cdot + 2\varphi)\,d\varphi = 0$. Hence $R_x(\tau) = \tfrac{A^2}{2}\cos(2\pi\nu_0\tau)$, independent of $t$ — the process is wide-sense stationary, and $R_x(0) = A^2/2$ is the familiar mean-square value of a sine.

Exercise 3.3 — the frozen constant

Let $x(t) = c$ for all $t$, with $c \sim \mathcal{N}(0,1)$ drawn once per realization. Is this process stationary? Is it ergodic?

Solution

Stationary: yes. Every realization is constant, so shifting time changes nothing: $\mu(t) = \langle c \rangle = 0$ and $R_x(t_1,t_2) = \langle c^2 \rangle = 1$ for all times — statistics fully shift-invariant. Ergodic: no. The time average of one realization is $\frac{1}{T}\int_0^T c\,dt = c$, a random number that does not converge to the ensemble mean $0$ no matter how long $T$ is. One movie explores exactly one point of the ensemble.

Exercise 3.4 — your oscilloscope's "average" mode

A scope in averaging mode acquires 64 triggered traces and averages them point by point. Is that a time average or an ensemble average? Under what conditions does it agree with a time average along one long record?

Solution

It is an ensemble average: each trigger starts a fresh realization of the noise, and the scope averages across the 64 realizations at fixed time-after-trigger. That is precisely why it works — the trigger-synchronous signal is the same in every realization and survives, while zero-mean noise averages down as $1/\sqrt{64} = 1/8$ in amplitude. It agrees with a time average along one record when the noise is stationary and ergodic and nothing distinguishes one trigger from the next (no trigger-correlated pickup, no drift between acquisitions). If the setup drifts between triggers — non-stationarity — the 64 movies come from different ensembles, and the average blurs the signal instead of cleaning it.
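
The factor of 8 can be reproduced with a toy scope. The sketch assumes the simplest case: a fixed rectangular pulse after every trigger, plus unit-variance white noise that is independent from trigger to trigger. All names and numbers are illustrative.

```javascript
// Small seeded uniform generator (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}
// Box-Muller transform: two uniforms in, one unit Gaussian out.
function randn(rand) {
  const u = 1 - rand();                      // in (0, 1], so log(u) is finite
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * rand());
}

// One triggered trace: the same pulse every time, plus fresh unit-variance white noise.
function acquireTrace(N, rand) {
  const x = new Float64Array(N);
  for (let i = 0; i < N; i++) {
    const signal = (i >= 40 && i < 60) ? 1 : 0;
    x[i] = signal + randn(rand);
  }
  return x;
}

// Point-by-point average of M triggered traces, as in the scope's averaging mode.
function scopeAverage(M, N, rand) {
  const avg = new Float64Array(N);
  for (let m = 0; m < M; m++) {
    const x = acquireTrace(N, rand);
    for (let i = 0; i < N; i++) avg[i] += x[i] / M;
  }
  return avg;
}

// RMS of the residual noise on the baseline (samples outside the pulse).
function baselineRms(x) {
  let sq = 0, n = 0;
  for (let i = 0; i < x.length; i++) {
    if (i < 40 || i >= 60) { sq += x[i] * x[i]; n++; }
  }
  return Math.sqrt(sq / n);
}

const rand = mulberry32(6), N = 2000;
const single = baselineRms(acquireTrace(N, rand));
const averagedTrace = scopeAverage(64, N, rand);
const averaged = baselineRms(averagedTrace);
const ratio = averaged / single;
console.log(`single-trace rms=${single.toFixed(3)}  64-average rms=${averaged.toFixed(3)}  ratio=${ratio.toFixed(3)}  1/8=${(1 / 8).toFixed(3)}`);
```

The pulse survives the averaging at full height because it is identical in every trace; only the part that differs between realizations shrinks.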

Exercise 3.5 — how long until it wanders off?

Your laser is free-running (no lock), and its frequency random-walks with $\sigma = 100$ kHz$/\sqrt{\text{s}}$, i.e. $\mathrm{Var} = \sigma^2 t$. (a) Estimate how long you typically have before it has wandered by 100 MHz. (b) A famous subtlety: the walk reaches any level eventually (with probability 1), yet the mean time to first reach a level is infinite. How can both be true, and what does that mean for your answer to (a)?

Solution

(a) Set $\sigma\sqrt{t} \sim 100$ MHz: $t \sim (10^8/10^5)^2 = 10^6$ s, about 12 days for a typical excursion of that size (in practice drift, not the walk, gets you first). (b) The distribution of first-passage times is extremely heavy-tailed: most realizations cross early, but the rare ones that happen to start off in the wrong direction can take arbitrarily long, and that tail is heavy enough to make the mean diverge. So "typical time" (the median, within a small factor of the estimate in (a)) is meaningful; "average time" is not. Whenever a quantity has a heavy-tailed distribution, quote a quantile, not a mean: the same lesson as quoting flicker noise per decade rather than as one RMS number.
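
Both parts can be put into numbers. Part (a) is one line of arithmetic. For part (b), the sketch below uses a unit-step walk and a level of 10 steps as a scaled-down stand-in for the laser (an assumption of the example), and it has to cap each walker at `maxSteps`, since some walkers would otherwise run for an unbounded time. All names are illustrative.

```javascript
// Small seeded uniform generator (mulberry32) so runs are repeatable.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// (a) Typical time for a walk with Var = sigma^2 t to wander by delta.
const sigma = 100e3;                         // Hz per sqrt(s)
const delta = 100e6;                         // Hz
const tTypical = (delta / sigma) ** 2;       // seconds
console.log(`t ~ ${tTypical.toExponential(1)} s = ${(tTypical / 86400).toFixed(1)} days`);

// (b) First-passage times of a unit-step walk to `level`, each walker capped at maxSteps.
function firstPassageTimes(walkers, level, maxSteps, rand) {
  const times = [];
  for (let w = 0; w < walkers; w++) {
    let x = 0, n = 0;
    while (x < level && n < maxSteps) {
      x += (rand() < 0.5) ? 1 : -1;
      n++;
    }
    times.push(n);                           // n equals maxSteps when the cap stopped the walker
  }
  return times.sort((p, q) => p - q);
}

const maxSteps = 100000;
const times = firstPassageTimes(400, 10, maxSteps, mulberry32(7));
const median = times[Math.floor(times.length / 2)];
const mean = times.reduce((s, t) => s + t, 0) / times.length;
const stopped = times.filter(t => t === maxSteps).length;
console.log(`median=${median} steps  capped mean=${mean.toFixed(0)} steps  walkers at the cap=${stopped}`);
```

The median stays at a few times the naive estimate (level squared, here 100 steps), while the mean is dominated by the handful of walkers sitting at the cap. Raise `maxSteps` and the median barely moves but the mean keeps growing, which is what a divergent mean looks like in a finite simulation.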