Chapter 4

The power spectral density

Stare at a noise time-trace for as long as you like: it will tell you almost nothing. The question that has an answer is "how much noise power lives at which frequencies?" — and the power spectral density is that answer. Every noise number you will ever quote — V/√Hz on a datasheet, dBc/Hz on a phase-noise plot, pT/√Hz for a magnetometer — is this one object in different clothes. This is the keystone chapter of the course.

From the FFT to the PSD

Chapter 3 left us with random processes: signals $x(t)$ whose individual wiggles are unpredictable but whose statistics are stable. The natural instinct of anyone with an oscilloscope is to record a stretch of the signal for a time $T$ and take its Fourier transform,

$$ X_T(f) \;=\; \int_0^{T} x(t)\, e^{-i 2\pi f t}\, dt . $$

Look at what this integral is: at each instant it takes the sample $x(t)$, points it in the direction $e^{-i2\pi f t}$ (chapter 1's phasor, run backwards at the analysis frequency), and adds it to a running sum. $X_T(f)$ is the endpoint of a walk in the complex plane. That one picture contains the whole definition of the PSD. For a deterministic tone analyzed at its own frequency, every step points the same way — the walk is a straight ray, and $|X_T(f)|$ grows like $T$. Analyzed off-frequency, the steps' directions rotate at the detuning and the walk just circles — no growth: that is why a tone contributes only at its frequency. For noise, the steps have random signs — the walk is exactly chapter 3's random walk, so $|X_T(f)|$ grows only like $\sqrt{T}$ and $|X_T(f)|^2$ like $T$, not $T^2$. Watch all three:

$X_T(f)$ is a walk in the complex plane

Top: the record $x(t)$ itself (the part consumed so far in color). Bottom left: the running sum $X_t(f)$ as the record grows (press play). Bottom right: $|X_t(f)|$ against record length on log–log axes, with the two possible slopes dashed. A tone on resonance walks straight ($\propto t$); detuned, it goes in circles (bounded); noise random-walks ($\propto \sqrt{t}$, in expectation — single records wobble around it). "New sample" rerolls the noise, or the tone's starting phase, and regrows the record. Since noise power $|X_t|^2$ grows like $t$, dividing by $t$ is exactly what settles it — that division is the next equation.

$|X_T(f)|^2$ itself therefore never settles down as you record longer — but $|X_T(f)|^2 / T$ does (in expectation). So we define the power spectral density:

$$ S_x(f) \;=\; \lim_{T\to\infty} \frac{2\,\big\langle |X_T(f)|^2 \big\rangle}{T}, \qquad f \ge 0 . $$

The angle brackets are an ensemble average (over repeated records), and the factor of 2 folds the mathematically symmetric negative-frequency half of the spectrum onto the physical, positive-frequency half. The payoff of this exact normalization is one beautiful identity — Parseval's theorem in its most useful form:

$$ \int_0^{\infty} S_x(f)\, df \;=\; \operatorname{Var}(x) \;=\; x_{\mathrm{rms}}^2 \qquad \text{(zero-mean $x$)} . $$

The area under the PSD is the mean-square value of the signal. Everything in this chapter, and much of the rest of the course, is an application of that sentence. Watch the identity assemble itself: sweep the cutoff below and admit the signal's frequencies one decade at a time.

Parseval, live: the area accumulates into the variance

Drag the cutoff $f^{*}$. Top: the piece of the signal built from only the frequencies below $f^{*}$ (full signal in grey) — it visibly grows into the whole thing. Left: the PSD with $\int_0^{f^*} S\,df$ shaded. Right: that running area versus $f^{*}$, climbing to the dashed line — which is $\mathrm{Var}(x)$ computed in the time domain, knowing nothing about spectra. The two readouts agree at every cutoff, and at the top of the sweep the identity is complete.

Energy signals vs power signals — why the 1/T is really there
There is a cleaner way to say why the definition divides by $T$. A pulse — anything that starts and ends — has finite total energy $\int x^2\,dt$, and Fourier analysis hands you an energy spectral density $|X(f)|^2$ with no limit needed. Stationary noise never starts or ends: its energy is infinite, and $|X_T(f)|^2$ duly diverges as you record longer. What stays finite is the power, the time-average of $x^2$ — so the honest spectral object is power per unit frequency, and the $1/T$ (plus the ensemble average) is exactly the "per unit time" that turns a divergent energy into a finite power. One habit to build: whenever you meet a new spectral quantity, ask first whether it is an energy density (pulses, single-shot signals) or a power density (stationary noise, everything else in this course) — a surprising number of unit confusions are just these two mixed up.
Conventions, stated loudly
Throughout this site we use one-sided PSDs (defined for $f \ge 0$, including the factor 2 above) and ordinary frequency in hertz (never rad/s). With these choices, $\int_0^\infty S_x\,df = x_{\mathrm{rms}}^2$, full stop. Be warned that two-sided PSDs also exist: symmetric in $\pm f$, exactly half our value at each $|f|$, integrating to the same variance over the whole line. Papers routinely fail to say which one they mean, and this is the single most common unit ambiguity in the noise literature — a silent factor of 2 (or $\sqrt 2$ in amplitude units). When you read "$S(f) = \dots$", your first question is always: one-sided or two-sided? In Hz or rad/s?

One more connection, so tidy it deserves its own paragraph. The autocorrelation $R_x(\tau) = \langle x(t)\,x(t+\tau)\rangle$ from chapter 3 and the PSD are a Fourier-transform pair (the Wiener–Khinchin theorem):

$$ S_x(f) \;=\; 2\!\int_{-\infty}^{\infty} R_x(\tau)\, e^{-i 2\pi f \tau}\, d\tau \qquad (f > 0,\ \text{one-sided}). $$

The two descriptions of "memory" are the same object: a delta-function correlation (no memory at all) transforms to a flat, white spectrum, while a slowly decaying correlation (long memory) transforms to a spectrum concentrated at low frequency. Setting $\tau = 0$ recovers the area rule: $R_x(0) = \operatorname{Var}(x) = \int_0^\infty S_x\, df$.

Chapter 3's three autocorrelation stories now line up as one dial. Both panels below are measured from the same record; the theorem is the claim that they always agree with each other:

Wiener–Khinchin, live: the same memory, in τ and in f

Pick a process; the top strip shows the record itself, and below it the two transforms of the same data: measured autocorrelation (left) and measured PSD (right), theory dashed on both. White noise: a delta at $\tau=0$ ↔ a flat spectrum. Filtered noise: drag $\tau_c$ and watch the exponential widen while the Lorentzian narrows — the reciprocity of chapter 3, now exact. Sine in noise: the never-fading cosine ↔ a sharp line standing on the noise's flat floor.

Units: it is a density

If $x$ is a voltage in volts, then $|X_T|^2/T$ carries units of $\mathrm{V^2\,s} = \mathrm{V^2/Hz}$. So the PSD of a voltage is measured in V²/Hz — volts squared per hertz. The "per hertz" matters in exactly the way "per cubic metre" matters for mass density: the PSD is not a power, it is a power density along the frequency axis. The power in a band is the area over that band, $$ P_{[f_1,f_2]} \;=\; \int_{f_1}^{f_2} S_x(f)\, df , $$ just as mass is density times volume. A PSD value at a single frequency means nothing by itself until you multiply by a bandwidth.

Datasheets usually quote the square root of the PSD, the amplitude spectral density (ASD), $\sqrt{S_x(f)}$ in $\mathrm{V/\sqrt{Hz}}$ — those famous "nV/√Hz" numbers on every op-amp datasheet. Same information, amplitude units instead of power units. To get an rms from an ASD you multiply by the square root of your bandwidth (for a flat spectrum): $x_{\mathrm{rms}} = \sqrt{S_x}\cdot\sqrt{\Delta f}$.

signal unitPSD unitASD unittypical lab example
voltage, VV²/HzV/√Hzamplifier input noise, 3 nV/√Hz
displacement, mm²/Hzm/√HzLIGO strain arm, 10⁻¹⁹ m/√Hz
magnetic field, TT²/HzT/√Hzmagnetometer floor, 1 pT/√Hz
frequency, HzHz²/HzHz/√Hzlaser frequency noise (chapter 6)
phase, radrad²/Hzrad/√Hzphase noise $S_\varphi$, dBc/Hz (chapter 6)

The area under the PSD is the RMS you measure

Time to make the central claim physical. Below is a noise voltage — white noise gently rolled off by a one-pole low-pass (corner $f_c$), sampled at rate $f_s$ for a duration $T$; all three are sliders, so you can also watch what the acquisition itself does (a higher $f_s$ extends the spectrum's reach toward its Nyquist edge $f_s/2$; a longer $T$ smooths the estimate). Follow it with an ideal band-pass from $f_1$ to $f_2$ (in the demo: a brick-wall filter in the FFT domain) and read the output on a true-rms voltmeter. The claim is that the voltmeter reading is exactly the square root of the shaded area under the PSD:

$$ x_{\mathrm{rms}}^{\,\text{in band}} \;=\; \sqrt{\int_{f_1}^{f_2} S_x(f)\, df\,} . $$

Drag the band edges. Watch the shaded area, the filtered trace, and the two readouts — one predicted from the area, one measured from the filtered samples. They track each other everywhere: wide band, narrow band, on the flat part, down the roll-off. This is the whole meaning of a PSD.

Band power: the shaded area is the rms² of the filtered signal

The circuit above is the experiment: noisy signal → band-pass ($f_1$ to $f_2$, set by the sliders) → true-rms voltmeter, whose display reads live. The claim: the voltmeter must always show $\sqrt{\text{shaded area}}$ — compare it against the predicted readout as you drag. The two knobs separate the two ideas: fix the bandwidth $\Delta f$ and slide the centre — on the plateau the voltmeter barely moves (same $S$, same $\Delta f$), then it falls as the band rides down the roll-off, tracking $\sqrt{S \cdot \Delta f}$ with the local level $S$. Straddle the corner and only the full integral gets it right. (Edges clamp at the measurable range.) Left: the PSD with the band shaded. Right: the filtered trace (full trace in grey) with the predicted ±rms dashed. Bottom: the distribution of the filtered samples in real millivolts, against a Gaussian of width $\sqrt{\text{area}}$ — no fitted parameters, so it matches only if the area really is the variance.

~ noise, LP 250 Hz band-pass 10 – 100 Hz true-rms voltmeter 0.0 mV

Two things worth noticing while you play. First, an equal-looking band on the log axis (say 10–100 Hz vs 100–1000 Hz) does not carry equal power: the area rule is about linear hertz, and a decade higher up contains ten times more hertz. Second, when you narrow the band the filtered trace does not just shrink — it becomes slower and more sinusoidal, ringing near the band's centre frequency. A narrow slice of noise looks like a wobbly sine wave: that observation becomes important when we discuss what a spectrum analyzer actually measures.

Why do T and f_s nudge the predicted rms?
Sharp eyes will notice that changing the duration or the sample rate moves the predicted rms by a little, even though neither appears in the area rule. That is because the prediction integrates the measured PSD — deliberately, so that both readouts come from the same data — and a measured PSD is a statistical estimate: the band's area scatters from record to record by roughly $1/\sqrt{\Delta f \cdot T}$ (independent samples ≈ bandwidth × duration). Push $T$ up and watch the prediction settle; change $f_s$ and you have effectively drawn a fresh record (and re-gridded the bins), so the area re-rolls within that scatter. The true area never moved. This wobble is the next section's subject, met in the wild before its theory.

Estimating a PSD honestly

Now the uncomfortable part. The definition of $S_x$ contains an ensemble average $\langle\cdot\rangle$ that your single lab record does not have. If you just take one FFT of one record and square it — the raw periodogram — you get an estimate whose expectation is the true PSD but whose fluctuation is enormous: each frequency bin is the sum of the squares of two Gaussian numbers (the real and imaginary parts of $X_T$), i.e. a chi-squared variable with 2 degrees of freedom. Its standard deviation equals its mean. Every bin of a raw periodogram is uncertain by about 100% — and recording longer does not help, because a longer record just gives you more bins, each one still 100% scattered.

The fix is to average. Welch's method: chop the record into $K$ segments, window each segment (more on that below), take the periodogram of each, and average the $K$ periodograms bin by bin. The fluctuation shrinks like $1/\sqrt{K}$; the price is frequency resolution, since each segment is shorter — $\Delta f = f_s/n_{\mathrm{seg}}$. That trade (smoothness vs resolution) is the fundamental bargain of spectrum estimation; there is no free lunch, only a well-chosen $K$.

Averaging periodograms: the fuzz collapses as $1/\sqrt{K}$

Top: the record actually used, chopped into the $K$ segments being averaged (only the first $K \times 0.512$ s of the full record is consumed — more averaging spends more data). Bottom: the estimate against the known true PSD (dashed). $K=1$ is the raw periodogram: 100% scatter, no matter how long the record. Untick "log frequency axis" to answer a classic puzzle — see the callout below.

Why does the fuzz look worse at high frequency?
It doesn't — every bin of the estimate has exactly the same fractional scatter ($1/\sqrt{K}$; the bins are statistically identical). The growing fuzz toward the right is an artifact of the logarithmic frequency axis: the FFT's bins are spaced linearly, so a log axis packs ten times more bins into each centimetre for every decade you move right. Near 10 Hz one pixel column shows a single bin's polite wiggle; near 1 kHz it shows the full spread of hundreds of independent draws — you are seeing the extremes of a crowd, not noisier statistics. Flip the demo to a linear axis and the fuzz becomes uniform across the band. (Log-log PSD plots from real analyzers often don't show this because they average neighbouring bins into log-spaced display points — which also quietly improves the estimate at high $f$.)
Why the window?
Chopping a record into segments creates artificial sharp edges, and sharp edges have broad spectra: a strong narrow line would "leak" a skirt of spurious power across the whole spectrum, burying weak features. A window (we use the Hann window, a raised cosine that tapers each segment smoothly to zero) suppresses this leakage dramatically, at the cost of slightly widening every spectral feature. The widening is quantified by the window's equivalent noise bandwidth (ENBW): each frequency bin behaves like a little band-pass filter collecting noise from around its centre, and the ENBW is the width of the ideal brick-wall filter that would collect the same noise power — the bin's effective width, in hertz, for all power bookkeeping. With no window each bin is $\Delta f = 1/T$ wide; the Hann window's gentler skirts make it collect a little more, $\mathrm{ENBW} = 1.5\,\Delta f$. That number is exactly what the next section needs; the DFT notes in the course page's reading list table ENBW for all the common windows.

Lines vs noise floors: the spectrum-analyzer trap

A PSD plot in the lab almost always shows two kinds of feature at once: sharp lines (a 50/60 Hz pickup, a carrier, a calibration tone) sticking out of a smooth noise floor. These two objects have different units, and confusing them is the classic spectrum-analyzer mistake.

A pure sine of amplitude $A$ carries power $A^2/2$ concentrated at one frequency — in the $T\to\infty$ limit its PSD is a delta function, $\frac{A^2}{2}\,\delta(f - f_0)$. In a real measurement with resolution $\Delta f$, all of that power lands in essentially one bin, so the height you read in V²/Hz is about $\frac{A^2/2}{\mathrm{ENBW}}$, where ENBW is the bin's effective width from the window callout above ($1.5\,\Delta f$ for our Hann window) — the height depends on your resolution, not on the signal! Halve $\Delta f$ and the spike doubles in height while its area stays fixed at $A^2/2$. The noise floor does the opposite: it is a genuine density, so its level in V²/Hz stays put while its per-bin power shrinks with the bin width. Try it:

A sine plus white noise: line height moves with resolution, floor does not

The same data, plotted both ways. Left: the PSD in V²/Hz — slide the segment length (finer resolution → narrower bins) and the spike at 250 Hz grows taller and narrower, its area pinned at $A^2/2 = 0.125\ \mathrm{V^2}$, while the noise floor never moves. Right: what a swept spectrum analyzer shows — power per resolution bandwidth — where everything is reversed: the line's height is pinned at $A^2/2$ and the floor rides up and down with the RBW.

Now the trap. A swept spectrum analyzer displays power per resolution bandwidth, not power per hertz. The unit on its screen is dBm per RBW — dBm is the RF world's absolute power unit, decibels relative to 1 mW ($0\ \mathrm{dBm} = 1$ mW, $-30\ \mathrm{dBm} = 1\ \mu$W), and dBm/Hz is a power density in the same currency. On that display everything is reversed: the line height stays fixed as you change RBW (the line's power all fits inside any RBW), while the noise floor moves — up 10 dB for every 10× wider RBW, because a wider filter collects more noise power. Same physics, the opposite visual behaviour, purely because of what is being plotted.

Lab note: quoting numbers off an analyzer
To quote a noise floor from a swept analyzer, you must normalize by the RBW: use the marker-noise mode (which reads directly in dBm/Hz, correcting for the filter's ENBW) or subtract $10\log_{10}(\mathrm{RBW}/\mathrm{Hz})$ yourself. To quote a line, use its power (dBm, or its area in V²) — never its height in density units, which is an artifact of your resolution setting. If a colleague's "noise floor" changes when they change RBW, they are quoting per-RBW numbers; if their "carrier level" changes, they are quoting a density. Both are fixable in one line of arithmetic — once you notice.

The essential code

The band-pass filter behind the central demo is worth seeing: a brick-wall filter is three steps in the FFT domain. Zeroing a bin $k$ requires also zeroing its negative-frequency mirror $N-k$, or the inverse transform stops being real.

// brick-wall band-pass: FFT, zero everything outside [f1, f2], inverse FFT
const N = x.length;                          // power of 2, sample rate fs
const re = Float64Array.from(x);
const im = new Float64Array(N);
FFT.fft(re, im);
for (let k = 0; k <= N / 2; k++) {
  const f = k * fs / N;
  if (f < f1 || f > f2) {
    re[k] = im[k] = 0;
    if (k > 0 && k < N / 2) re[N - k] = im[N - k] = 0;   // mirror bin
  }
}
FFT.ifft(re, im);        // re[] is now the filtered time trace
// check: Noise.rms(re)² equals the PSD area between f1 and f2

You now own the tool. The obvious next question is what shapes $S(f)$ actually takes when you point it at real hardware — and it turns out that almost everything in a lab lands on a handful of straight lines on a log–log plot. Those straight lines, and the very different personalities behind them, are the next chapter.

Exercises

Exercise 4.1 — reading a datasheet

An amplifier datasheet quotes input voltage noise of 3 nV/√Hz, flat out to 100 kHz. Your measurement has a 10 kHz bandwidth. What rms voltage noise do you expect at the input?

Solution

For a flat spectrum, rms = ASD × √(bandwidth): $3\,\mathrm{nV/\sqrt{Hz}} \times \sqrt{10^4\,\mathrm{Hz}} = 3\,\mathrm{nV} \times 100 = 300\,\mathrm{nV} = 0.3\ \mu\mathrm{V}$ rms. Notice the units work only because the ASD is a density: the √Hz in the denominator eats the √Hz of the bandwidth.

Exercise 4.2 — two-sided to one-sided

A theory paper reports a (two-sided) displacement noise spectrum $S^{\mathrm{2s}}_x(f) = 2\times10^{-34}\ \mathrm{m^2/Hz}$, flat, defined for $-\infty < f < \infty$. What is the one-sided PSD, and what ASD would you quote on a plot?

Solution

The one-sided PSD folds negative frequencies onto positive ones: $S_x = 2\,S^{\mathrm{2s}}_x = 4\times10^{-34}\ \mathrm{m^2/Hz}$. The ASD is its square root, $\sqrt{S_x} = 2\times10^{-17}\ \mathrm{m/\sqrt{Hz}}$. Note the ASD changes only by $\sqrt2$ — which is exactly why this mistake survives casual sanity checks.

Exercise 4.3 — normalizing by the RBW

A spectrum analyzer with RBW = 100 kHz shows a noise floor of −90 dBm. What is the noise floor in dBm/Hz? (Ignore the small ENBW correction.)

Solution

Divide by the bandwidth, i.e. subtract $10\log_{10}(10^5) = 50$ dB: $-90 - 50 = -140\ \mathrm{dBm/Hz}$. (Marker-noise mode also applies a small correction, typically a couple of dB, for the RBW filter's ENBW and the analyzer's detector statistics.)

Exercise 4.4 — the incredible growing spike

A 1 V rms sine is measured with an FFT analyzer, first with $\Delta f = 1$ Hz bins, then with $\Delta f = 0.01$ Hz bins (assume ideal rectangular bins and a bin-centred tone). What peak PSD height do you read in each case? What stays the same?

Solution

The sine's total power is $(1\ \mathrm{V\,rms})^2 = 1\ \mathrm{V^2}$, all in one bin. Height = power / bin width: $1\ \mathrm{V^2}/1\ \mathrm{Hz} = 1\ \mathrm{V^2/Hz}$ in the first case, $1/0.01 = 100\ \mathrm{V^2/Hz}$ in the second. The area is 1 V² both times — that is the number that describes the signal. (A real window spreads the power over ~ENBW instead of one bin, scaling both heights by the same factor.)

Exercise 4.5 — why longer records don't smooth

You double the length of a noise record and recompute the raw periodogram, hoping it will look smoother. It doesn't. Why not, and what should you do instead?

Solution

Doubling $T$ halves the bin width $\Delta f = 1/T$, so you get twice as many bins — but each bin is still a chi-squared variable with 2 degrees of freedom (one complex FFT value squared), i.e. still ~100% uncertain. Longer records buy resolution, never smoothness. Smoothness comes only from averaging: split the record into $K$ segments and Welch-average ($\sigma \propto 1/\sqrt K$), or equivalently average adjacent bins. The second demo on this page is exactly this experiment.