Chapter 4
The power spectral density
Stare at a noise time-trace for as long as you like: it will tell you almost nothing. The question that has an answer is "how much noise power lives at which frequencies?" — and the power spectral density is that answer. Every noise number you will ever quote — V/√Hz on a datasheet, dBc/Hz on a phase-noise plot, pT/√Hz for a magnetometer — is this one object in different clothes. This is the keystone chapter of the course.
From the FFT to the PSD
Chapter 3 left us with random processes: signals $x(t)$ whose individual wiggles are unpredictable but whose statistics are stable. The natural instinct of anyone with an oscilloscope is to record a stretch of the signal for a time $T$ and take its Fourier transform,
$$ X_T(f) \;=\; \int_0^{T} x(t)\, e^{-i 2\pi f t}\, dt . $$
Look at what this integral is: at each instant it takes the sample $x(t)$, points it in the direction $e^{-i2\pi f t}$ (chapter 1's phasor, run backwards at the analysis frequency), and adds it to a running sum. $X_T(f)$ is the endpoint of a walk in the complex plane. That one picture contains the whole definition of the PSD. For a deterministic tone analyzed at its own frequency, every step points the same way — the walk is a straight ray, and $|X_T(f)|$ grows like $T$. Analyzed off-frequency, the steps' directions rotate at the detuning and the walk just circles — no growth: that is why a tone contributes only at its frequency. For noise, the steps have random signs — the walk is exactly chapter 3's random walk, so $|X_T(f)|$ grows only like $\sqrt{T}$ and $|X_T(f)|^2$ like $T$, not $T^2$. Watch all three:
Top: the record $x(t)$ itself (the part consumed so far in color). Bottom left: the running sum $X_t(f)$ as the record grows (press play). Bottom right: $|X_t(f)|$ against record length on log–log axes, with the two possible slopes dashed. A tone on resonance walks straight ($\propto t$); detuned, it goes in circles (bounded); noise random-walks ($\propto \sqrt{t}$, in expectation — single records wobble around it). "New sample" rerolls the noise, or the tone's starting phase, and regrows the record. Since noise power $|X_t|^2$ grows like $t$, dividing by $t$ is exactly what settles it — that division is the next equation.
$|X_T(f)|^2$ itself therefore never settles down as you record longer — but $|X_T(f)|^2 / T$ does (in expectation). So we define the power spectral density:
$$ S_x(f) \;=\; \lim_{T\to\infty} \frac{2\,\big\langle |X_T(f)|^2 \big\rangle}{T}, \qquad f \ge 0 . $$
The angle brackets are an ensemble average (over repeated records), and the factor of 2 folds the mathematically symmetric negative-frequency half of the spectrum onto the physical, positive-frequency half. The payoff of this exact normalization is one beautiful identity — Parseval's theorem in its most useful form:
$$ \int_0^{\infty} S_x(f)\, df \;=\; \operatorname{Var}(x) \;=\; x_{\mathrm{rms}}^2 \qquad \text{(zero-mean $x$)} . $$
The area under the PSD is the mean-square value of the signal. Everything in this chapter, and much of the rest of the course, is an application of that sentence. Watch the identity assemble itself: sweep the cutoff below and admit the signal's frequencies one decade at a time.
Drag the cutoff $f^{*}$. Top: the piece of the signal built from only the frequencies below $f^{*}$ (full signal in grey) — it visibly grows into the whole thing. Left: the PSD with $\int_0^{f^*} S\,df$ shaded. Right: that running area versus $f^{*}$, climbing to the dashed line — which is $\mathrm{Var}(x)$ computed in the time domain, knowing nothing about spectra. The two readouts agree at every cutoff, and at the top of the sweep the identity is complete.
One more connection, so tidy it deserves its own paragraph. The autocorrelation $R_x(\tau) = \langle x(t)\,x(t+\tau)\rangle$ from chapter 3 and the PSD are a Fourier-transform pair (the Wiener–Khinchin theorem):
$$ S_x(f) \;=\; 2\!\int_{-\infty}^{\infty} R_x(\tau)\, e^{-i 2\pi f \tau}\, d\tau \qquad (f > 0,\ \text{one-sided}). $$
The two descriptions of "memory" are the same object: a delta-function correlation (no memory at all) transforms to a flat, white spectrum, while a slowly decaying correlation (long memory) transforms to a spectrum concentrated at low frequency. Setting $\tau = 0$ recovers the area rule: $R_x(0) = \operatorname{Var}(x) = \int_0^\infty S_x\, df$.
Chapter 3's three autocorrelation stories now line up as one dial. Both panels below are measured from the same record; the theorem is the claim that they always agree with each other:
Pick a process; the top strip shows the record itself, and below it the two transforms of the same data: measured autocorrelation (left) and measured PSD (right), theory dashed on both. White noise: a delta at $\tau=0$ ↔ a flat spectrum. Filtered noise: drag $\tau_c$ and watch the exponential widen while the Lorentzian narrows — the reciprocity of chapter 3, now exact. Sine in noise: the never-fading cosine ↔ a sharp line standing on the noise's flat floor.
Units: it is a density
If $x$ is a voltage in volts, then $|X_T|^2/T$ carries units of $\mathrm{V^2\,s} = \mathrm{V^2/Hz}$. So the PSD of a voltage is measured in V²/Hz — volts squared per hertz. The "per hertz" matters in exactly the way "per cubic metre" matters for mass density: the PSD is not a power, it is a power density along the frequency axis. The power in a band is the area over that band, $$ P_{[f_1,f_2]} \;=\; \int_{f_1}^{f_2} S_x(f)\, df , $$ just as mass is density times volume. A PSD value at a single frequency means nothing by itself until you multiply by a bandwidth.
Datasheets usually quote the square root of the PSD, the amplitude spectral density (ASD), $\sqrt{S_x(f)}$ in $\mathrm{V/\sqrt{Hz}}$ — those famous "nV/√Hz" numbers on every op-amp datasheet. Same information, amplitude units instead of power units. To get an rms from an ASD you multiply by the square root of your bandwidth (for a flat spectrum): $x_{\mathrm{rms}} = \sqrt{S_x}\cdot\sqrt{\Delta f}$.
| signal unit | PSD unit | ASD unit | typical lab example |
|---|---|---|---|
| voltage, V | V²/Hz | V/√Hz | amplifier input noise, 3 nV/√Hz |
| displacement, m | m²/Hz | m/√Hz | LIGO strain arm, 10⁻¹⁹ m/√Hz |
| magnetic field, T | T²/Hz | T/√Hz | magnetometer floor, 1 pT/√Hz |
| frequency, Hz | Hz²/Hz | Hz/√Hz | laser frequency noise (chapter 6) |
| phase, rad | rad²/Hz | rad/√Hz | phase noise $S_\varphi$, dBc/Hz (chapter 6) |
The area under the PSD is the RMS you measure
Time to make the central claim physical. Below is a noise voltage — white noise gently rolled off by a one-pole low-pass (corner $f_c$), sampled at rate $f_s$ for a duration $T$; all three are sliders, so you can also watch what the acquisition itself does (a higher $f_s$ extends the spectrum's reach toward its Nyquist edge $f_s/2$; a longer $T$ smooths the estimate). Follow it with an ideal band-pass from $f_1$ to $f_2$ (in the demo: a brick-wall filter in the FFT domain) and read the output on a true-rms voltmeter. The claim is that the voltmeter reading is exactly the square root of the shaded area under the PSD:
$$ x_{\mathrm{rms}}^{\,\text{in band}} \;=\; \sqrt{\int_{f_1}^{f_2} S_x(f)\, df\,} . $$
Drag the band edges. Watch the shaded area, the filtered trace, and the two readouts — one predicted from the area, one measured from the filtered samples. They track each other everywhere: wide band, narrow band, on the flat part, down the roll-off. This is the whole meaning of a PSD.
The circuit above is the experiment: noisy signal → band-pass ($f_1$ to $f_2$, set by the sliders) → true-rms voltmeter, whose display reads live. The claim: the voltmeter must always show $\sqrt{\text{shaded area}}$ — compare it against the predicted readout as you drag. The two knobs separate the two ideas: fix the bandwidth $\Delta f$ and slide the centre — on the plateau the voltmeter barely moves (same $S$, same $\Delta f$), then it falls as the band rides down the roll-off, tracking $\sqrt{S \cdot \Delta f}$ with the local level $S$. Straddle the corner and only the full integral gets it right. (Edges clamp at the measurable range.) Left: the PSD with the band shaded. Right: the filtered trace (full trace in grey) with the predicted ±rms dashed. Bottom: the distribution of the filtered samples in real millivolts, against a Gaussian of width $\sqrt{\text{area}}$ — no fitted parameters, so it matches only if the area really is the variance.
Two things worth noticing while you play. First, an equal-looking band on the log axis (say 10–100 Hz vs 100–1000 Hz) does not carry equal power: the area rule is about linear hertz, and a decade higher up contains ten times more hertz. Second, when you narrow the band the filtered trace does not just shrink — it becomes slower and more sinusoidal, ringing near the band's centre frequency. A narrow slice of noise looks like a wobbly sine wave: that observation becomes important when we discuss what a spectrum analyzer actually measures.
Estimating a PSD honestly
Now the uncomfortable part. The definition of $S_x$ contains an ensemble average $\langle\cdot\rangle$ that your single lab record does not have. If you just take one FFT of one record and square it — the raw periodogram — you get an estimate whose expectation is the true PSD but whose fluctuation is enormous: each frequency bin is the sum of the squares of two Gaussian numbers (the real and imaginary parts of $X_T$), i.e. a chi-squared variable with 2 degrees of freedom. Its standard deviation equals its mean. Every bin of a raw periodogram is uncertain by about 100% — and recording longer does not help, because a longer record just gives you more bins, each one still 100% scattered.
The fix is to average. Welch's method: chop the record into $K$ segments, window each segment (more on that below), take the periodogram of each, and average the $K$ periodograms bin by bin. The fluctuation shrinks like $1/\sqrt{K}$; the price is frequency resolution, since each segment is shorter — $\Delta f = f_s/n_{\mathrm{seg}}$. That trade (smoothness vs resolution) is the fundamental bargain of spectrum estimation; there is no free lunch, only a well-chosen $K$.
Top: the record actually used, chopped into the $K$ segments being averaged (only the first $K \times 0.512$ s of the full record is consumed — more averaging spends more data). Bottom: the estimate against the known true PSD (dashed). $K=1$ is the raw periodogram: 100% scatter, no matter how long the record. Untick "log frequency axis" to answer a classic puzzle — see the callout below.
Lines vs noise floors: the spectrum-analyzer trap
A PSD plot in the lab almost always shows two kinds of feature at once: sharp lines (a 50/60 Hz pickup, a carrier, a calibration tone) sticking out of a smooth noise floor. These two objects have different units, and confusing them is the classic spectrum-analyzer mistake.
A pure sine of amplitude $A$ carries power $A^2/2$ concentrated at one frequency — in the $T\to\infty$ limit its PSD is a delta function, $\frac{A^2}{2}\,\delta(f - f_0)$. In a real measurement with resolution $\Delta f$, all of that power lands in essentially one bin, so the height you read in V²/Hz is about $\frac{A^2/2}{\mathrm{ENBW}}$, where ENBW is the bin's effective width from the window callout above ($1.5\,\Delta f$ for our Hann window) — the height depends on your resolution, not on the signal! Halve $\Delta f$ and the spike doubles in height while its area stays fixed at $A^2/2$. The noise floor does the opposite: it is a genuine density, so its level in V²/Hz stays put while its per-bin power shrinks with the bin width. Try it:
The same data, plotted both ways. Left: the PSD in V²/Hz — slide the segment length (finer resolution → narrower bins) and the spike at 250 Hz grows taller and narrower, its area pinned at $A^2/2 = 0.125\ \mathrm{V^2}$, while the noise floor never moves. Right: what a swept spectrum analyzer shows — power per resolution bandwidth — where everything is reversed: the line's height is pinned at $A^2/2$ and the floor rides up and down with the RBW.
Now the trap. A swept spectrum analyzer displays power per resolution bandwidth, not power per hertz. The unit on its screen is dBm per RBW — dBm is the RF world's absolute power unit, decibels relative to 1 mW ($0\ \mathrm{dBm} = 1$ mW, $-30\ \mathrm{dBm} = 1\ \mu$W), and dBm/Hz is a power density in the same currency. On that display everything is reversed: the line height stays fixed as you change RBW (the line's power all fits inside any RBW), while the noise floor moves — up 10 dB for every 10× wider RBW, because a wider filter collects more noise power. Same physics, the opposite visual behaviour, purely because of what is being plotted.
The essential code
The band-pass filter behind the central demo is worth seeing: a brick-wall filter is three steps in the FFT domain. Zeroing a bin $k$ requires also zeroing its negative-frequency mirror $N-k$, or the inverse transform stops being real.
// brick-wall band-pass: FFT, zero everything outside [f1, f2], inverse FFT
const N = x.length; // power of 2, sample rate fs
const re = Float64Array.from(x);
const im = new Float64Array(N);
FFT.fft(re, im);
for (let k = 0; k <= N / 2; k++) {
const f = k * fs / N;
if (f < f1 || f > f2) {
re[k] = im[k] = 0;
if (k > 0 && k < N / 2) re[N - k] = im[N - k] = 0; // mirror bin
}
}
FFT.ifft(re, im); // re[] is now the filtered time trace
// check: Noise.rms(re)² equals the PSD area between f1 and f2
You now own the tool. The obvious next question is what shapes $S(f)$ actually takes when you point it at real hardware — and it turns out that almost everything in a lab lands on a handful of straight lines on a log–log plot. Those straight lines, and the very different personalities behind them, are the next chapter.
Exercises
An amplifier datasheet quotes input voltage noise of 3 nV/√Hz, flat out to 100 kHz. Your measurement has a 10 kHz bandwidth. What rms voltage noise do you expect at the input?
Solution
For a flat spectrum, rms = ASD × √(bandwidth): $3\,\mathrm{nV/\sqrt{Hz}} \times \sqrt{10^4\,\mathrm{Hz}} = 3\,\mathrm{nV} \times 100 = 300\,\mathrm{nV} = 0.3\ \mu\mathrm{V}$ rms. Notice the units work only because the ASD is a density: the √Hz in the denominator eats the √Hz of the bandwidth.
A theory paper reports a (two-sided) displacement noise spectrum $S^{\mathrm{2s}}_x(f) = 2\times10^{-34}\ \mathrm{m^2/Hz}$, flat, defined for $-\infty < f < \infty$. What is the one-sided PSD, and what ASD would you quote on a plot?
Solution
The one-sided PSD folds negative frequencies onto positive ones: $S_x = 2\,S^{\mathrm{2s}}_x = 4\times10^{-34}\ \mathrm{m^2/Hz}$. The ASD is its square root, $\sqrt{S_x} = 2\times10^{-17}\ \mathrm{m/\sqrt{Hz}}$. Note the ASD changes only by $\sqrt2$ — which is exactly why this mistake survives casual sanity checks.
A spectrum analyzer with RBW = 100 kHz shows a noise floor of −90 dBm. What is the noise floor in dBm/Hz? (Ignore the small ENBW correction.)
Solution
Divide by the bandwidth, i.e. subtract $10\log_{10}(10^5) = 50$ dB: $-90 - 50 = -140\ \mathrm{dBm/Hz}$. (Marker-noise mode also applies a small correction, typically a couple of dB, for the RBW filter's ENBW and the analyzer's detector statistics.)
A 1 V rms sine is measured with an FFT analyzer, first with $\Delta f = 1$ Hz bins, then with $\Delta f = 0.01$ Hz bins (assume ideal rectangular bins and a bin-centred tone). What peak PSD height do you read in each case? What stays the same?
Solution
The sine's total power is $(1\ \mathrm{V\,rms})^2 = 1\ \mathrm{V^2}$, all in one bin. Height = power / bin width: $1\ \mathrm{V^2}/1\ \mathrm{Hz} = 1\ \mathrm{V^2/Hz}$ in the first case, $1/0.01 = 100\ \mathrm{V^2/Hz}$ in the second. The area is 1 V² both times — that is the number that describes the signal. (A real window spreads the power over ~ENBW instead of one bin, scaling both heights by the same factor.)
You double the length of a noise record and recompute the raw periodogram, hoping it will look smoother. It doesn't. Why not, and what should you do instead?
Solution
Doubling $T$ halves the bin width $\Delta f = 1/T$, so you get twice as many bins — but each bin is still a chi-squared variable with 2 degrees of freedom (one complex FFT value squared), i.e. still ~100% uncertain. Longer records buy resolution, never smoothness. Smoothness comes only from averaging: split the record into $K$ segments and Welch-average ($\sigma \propto 1/\sqrt K$), or equivalently average adjacent bins. The second demo on this page is exactly this experiment.