paper-verifier · Verification Report

Probing the limits of correlations in an indivisible quantum system

arXiv:1712.06494

Model claude-sonnet-4-6

Platform darwin/24.6.0

Generated 2026-05-22 13:52 UTC

verified: 48 (71%)
partial: 17 (25%)
not_verified: 3 (4%)

Verification Report — arXiv:1712.06494

"Probing the limits of correlations in an indivisible quantum system"

1. Brief

This report summarises the automated factual-claim verification of arXiv:1712.06494 (Malinowski et al., ETH Zürich, December 2017). The paper demonstrates quantum contextuality in a single trapped $^{40}\text{Ca}^+$ ion using Klyachko–Can–Binicioglu–Shumovsky (KCBS) $N$-cycle inequalities with $N$ up to 121.

Verdict counts (68 claims total)

Verdict	Count
verified	48
partial	17
not_verified	3

Model used: claude-sonnet-4-6 for all phases.

Major-issue claim IDs: C052

Summary of issues:
- C052 (partial/major): the stated 99.5(2)% QM-limit saturation in the abstract is numerically inconsistent: the data yield $\approx 99.3\%$ and the true statistical uncertainty is $\pm 1.4\%$, not the stated $\pm 0.2\%$.
- C010 (not_verified): a typographical error in App. G: the minimum of $S_5(\theta)$ occurs at $\theta = \pi/4$, not $\theta = \pi/2$ as written.
- C024 (not_verified): motional heating rate $\sim 200\ \text{quanta/s}$ cannot be reproduced from the public dataset (requires dedicated sideband spectroscopy data).
- C068 (not_verified): the shot-noise bias formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ in App. E contains two compounding errors (wrong variance form; missing factor of 2), underestimating the bias by a factor of $\approx 1.34$.

The overwhelming majority of experimental results (Table 1, Table 2, Fig. 2, Fig. 3) were reproduced from the public dataset to within the paper's quoted rounding.

2. Artifact Inventory

Paper-provided artifacts

Path	Kind	Origin URL	Provenance
`source/eprint.tar.gz`	source (LaTeX + figures)	https://arxiv.org/e-print/1712.06494	Downloaded from arXiv; SHA-256 `ef01ef1f…`; 890 382 bytes
`artifacts/data/Dataset_Public_repository.zip`	data	https://ethz.ch/content/dam/ethz/special-interest/phys/quantum-electronics/tiqi-dam/documents/Datasets/Dataset_Public%20repository.zip	Downloaded from TIQI public repository; SHA-256 `0febe99a…`; 13 985 766 bytes; contains raw shot files + thresholded CSV files for $N = 5, 7, 11, 17, 23, 31, 41, 51, 61, 81, 101, 121$

Agent-produced artifacts

Path	Kind	Provenance
`claims/claims.jsonl`	claim records (68 claims)	Extracted by `claude-sonnet-4-6` (extract phase) from `source/eprint.tar.gz`
`claims/rejected.jsonl`	filtered candidates	Produced during extract_audit phase
`claims/coverage_audit.md`	section coverage audit	Produced during extract_audit phase
`bibliography.json`	parsed references	Produced during ingest phase
`inventory.json`	artifact metadata	Produced during ingest phase
`verification/C001/` … `verification/C068/`	per-claim evidence directories	Produced by verify_* phases; each contains `verdict.json` plus `derivation.md`, `run.py`, `run.log`, `citation_quote.md`, and/or `figure.png` / `paper_figure.png` as appropriate
`artifacts/citations/17Leupold.pdf`	companion paper PDF	Fetched during verification
`artifacts/citations/16Alonso.txt`	companion paper text	Fetched during verification
`artifacts/citations/15Christensen.pdf`	cited paper PDF	Fetched during verification
`artifacts/citations/15Poh.pdf`	cited paper PDF	Fetched during verification
`artifacts/citations/11Lapkiewicz.pdf`	cited paper PDF	Fetched during verification
`artifacts/citations/13Ahrens.pdf`	cited paper PDF	Fetched during verification
`artifacts/citations/13Deng.txt`	cited paper text	Fetched during verification

3. Model Attribution

All phases were executed by claude-sonnet-4-6.

Phase	Prompt file	SHA-256
ingest	`prompts/ingest.md`	`ef27db13971e3e2eb805be653670943b52c8e221afb833fbc674b4bbbf7da1d3`
extract	`prompts/extract.md`	`34b48e02f5151b6dccf90b1035b7d6b9ee8de3e7bd1ed2c9268efb62c6b13c72`
extract_audit	`prompts/extract_audit.md`	`ac2f07402104dc51d0debe90bba2a28baef691f4602938ceb2cff99596a0ea77`
verify_math (run 1)	`prompts/verify_math.md`	`7aa3c22779feee85b0239a615986868b730eb4e86852b5c5d86af54003cfc806`
verify_math (run 2)	`prompts/verify_math.md`	`e0113697d625d30c810d639b5a1e8d6e5e07395080519f300a08ad4e03de8a9a`
verify_numeric (run 1)	`prompts/verify_numeric.md`	`e7f0824163e3171d41dfd4c65d2e75656831da502208dfe2193c64fd39d166c9`
verify_numeric (run 2)	`prompts/verify_numeric.md`	`ce6fa259af53bb44811241b7f089b089d4e740f48b76a7cfef6a7a6dc0a7e95d`
verify_plot (run 1)	`prompts/verify_plot.md`	`8ca8f1de4dd89664cc25875e69573117d65a1aa185aac274920e8f1efec5d41b`
verify_plot (run 2)	`prompts/verify_plot.md`	`92cf06a6a3eb82b9e6dd58b2e1f46662139aca8cf886eca2f87a9c38ba7b7f61`
verify_citation	`prompts/verify_citation.md`	`35178cfb8087eabce273f0bdc0b9b4c125e1d0ca1aa2af8ab8d040a3868c3729`
verify_empirical (run 1)	`prompts/verify_empirical.md`	`a86ff78d3df26b7573f54fd23157d9a88f124983468be02b9912b3daa5549f47`
verify_empirical (run 2)	`prompts/verify_empirical.md`	`fe120622d5102542d8e399607d6f2d9fd2b2e75eaeddf4730a19eb033228badb`
report (run 1)	`prompts/report.md`	`ff51a06214b6ba57134db778ea2d70dacfef1bfaa0c609820a6b1df653671a78`
report (run 2)	`prompts/report.md`	`6c0b03711c2c6e1d6e07faeae3e0901a5fbc818a36b1603b1300281b56237f4f`
report (run 3)	`prompts/report.md`	`9d5c7c0aa3e90c146bd2d96b9ca9fdb1f1f316f7beb4dc9bc104aa9bbd4ab521`

Platform note: macOS (Darwin 24.6.0); `unshare -n` network isolation is unavailable (Linux-only). Credentials were scrubbed from the subprocess environment, the CWD was locked to the claim verification directory, and CPU/memory/file-size limits were applied via `ulimit`.

4. Per-Claim Breakdown

C001 math verified C001 — KCBS classical bound $S_5 \geq -3$

Verdict: verified
Location: Sec. KCBS experiment, Eq. 1
Claim: In NC models $S_5(\theta_5) = \sum_{i=1}^{5} \langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle \geq -3$.

Derivation (from verification/C001/derivation.md):

In any NC model each $A_i$ takes a definite value $v_i \in \{+1,-1\}$, so $\prod_{i=1}^{5} (v_i v_{i+1}) = \prod v_i^2 = 1$, forcing an even number of anti-correlated pairs $k \in \{0,2,4\}$. Then $S_5 = 5-2k \geq 5-8 = -3$. The bound $-3$ is tight (e.g. $v = (1,1,-1,1,-1)$). Exhaustive enumeration of all $2^5 = 32$ assignments confirms $\min S_5 = -3$.
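The exhaustive enumeration mentioned above can be sketched in a few lines. This is an illustrative reconstruction, not the contents of `verification/C001/run.py`; the function name `min_cycle_sum` is hypothetical.

```python
from itertools import product

def min_cycle_sum(n):
    """Minimum of sum_i v_i * v_{i+1} (indices mod n) over all v in {+1, -1}^n."""
    return min(
        sum(v[i] * v[(i + 1) % n] for i in range(n))
        for v in product((1, -1), repeat=n)
    )

# Brute force over all 2**5 = 32 noncontextual value assignments.
s5_min = min_cycle_sum(5)
print(s5_min)
```

Because the search is exhaustive, the result does not rely on the parity argument and serves as an independent check of it.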

Full evidence: verification/C001/

C002 math verified C002 — QM prediction $S_5(\theta_5) = 5 - 4\sqrt{5} \approx -3.944$

Verdict: verified
Location: Sec. KCBS experiment, Eq. 2
Claim: The quantum minimum of $S_5$ is $5 - 4\sqrt{5} \approx -3.944$.

Numerical check: $5 - 4\sqrt{5} = -3.94427\ldots \approx -3.944$ ✓.
Algebraic check: substituting $\cos(4\theta_5)$ derived from $\theta_5 = \arccos(5^{-1/4})$ into the correlator formula gives each $\langle A_i A_{i+1}\rangle = (5-4\sqrt{5})/5$; $S_5 = 5 \times (5-4\sqrt{5})/5 = 5-4\sqrt{5}$ exactly (SymPy verified).
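The algebraic check can also be reproduced numerically. The sketch below is not the SymPy script used for the verdict; it evaluates the single-correlator formula quoted in App. G (claim C009) at $\theta_5$ in floating point, and all variable names are illustrative.

```python
import math

sqrt5 = math.sqrt(5)
theta5 = math.acos(5 ** -0.25)   # compatibility angle, about 48.03 degrees

# Single-correlator formula from App. G, evaluated at theta5.
correlator = (3 - sqrt5 + (5 + sqrt5) * math.cos(4 * theta5)) / 8

s5_qm = 5 * correlator           # the five terms of the 5-cycle are identical
s5_closed_form = 5 - 4 * sqrt5   # value claimed in Eq. 2

print(math.degrees(theta5), s5_qm, s5_closed_form)
```

Agreement to floating-point precision is weaker evidence than the symbolic identity, but it catches transcription errors in either expression.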

Full evidence: verification/C002/

C003 math verified C003 — Compatibility angle $\theta_5 = \arccos(5^{-1/4}) \approx 48°$

Verdict: verified
Location: Sec. KCBS experiment
Claim: $\theta_5 = \arccos(5^{-1/4}) \approx 48°$.

$\arccos(5^{-1/4}) = 48.030°$ (SymPy); rounds to $48°$ ✓. The general formula $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$ reduces to $\arccos(5^{-1/4})$ at $N=5$ (verified symbolically).

Full evidence: verification/C003/

C004 math verified C004 — Extended KCBS inequality $S_5^{(\text{ext})} \geq -3$

Verdict: verified
Location: Sec. KCBS experiment, Eq. 3
Claim: $S_5^{(\text{ext})}(\theta) = \sum_{i=1}^{5}\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle + \sum_{i=1}^{5} \varepsilon_i \geq -3$, where $\varepsilon_i = |\langle A_i^{(1)}\rangle - \langle A_i^{(2)}\rangle|$.

Classical bound confirmed by exhaustive enumeration. Reduction at $\theta_5$: SymPy verifies the prefactor of $\varepsilon_i$ vanishes exactly at $\theta_5$, so $S_5^{(\text{ext})}(\theta_5) = S_5(\theta_5) = 5-4\sqrt{5}$.

Full evidence: verification/C004/

C005 math verified C005 — $N$-gon compatibility angle $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$

Verdict: verified
Location: Sec. N-gon states
Claim: Adjacent $N$-gon states are orthogonal iff $\theta = \theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$.

Starting from the state vectors $|\psi_i\rangle = (\cos\theta,\,\sin\theta\cos\varphi_i,\,-\sin\theta\sin\varphi_i)^T$ with $\Delta\varphi = \pi(N-1)/N$, $\langle\psi_i|\psi_{i+1}\rangle = \cos^2\theta - \sin^2\theta\cos(\pi/N)$; setting to zero gives the formula. Numerical spot-checks at $N = 5, 7, 11, 31, 51, 101$ all give $|\langle\psi_i|\psi_{i+1}\rangle| < 2.3\times10^{-16}$ at $\theta_N$.
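A minimal numerical version of the orthogonality spot-check is sketched below. It assumes the azimuthal angles are $\varphi_i = i\,\Delta\varphi$ (so only the difference between neighbours matters); the helper names `theta_n`, `state`, and `adjacent_overlap` are hypothetical and not taken from the verification scripts.

```python
import math

def theta_n(n):
    """Compatibility angle theta_N = arccos(sqrt(cos(pi/N) / (1 + cos(pi/N))))."""
    c = math.cos(math.pi / n)
    return math.acos(math.sqrt(c / (1 + c)))

def state(theta, phi):
    """Real 3-vector (cos t, sin t cos p, -sin t sin p) used for the N-gon states."""
    return (math.cos(theta),
            math.sin(theta) * math.cos(phi),
            -math.sin(theta) * math.sin(phi))

def adjacent_overlap(n):
    """Inner product of two neighbouring N-gon states at theta_N."""
    theta = theta_n(n)
    dphi = math.pi * (n - 1) / n          # azimuthal step between neighbours
    a, b = state(theta, 0.0), state(theta, dphi)
    return sum(x * y for x, y in zip(a, b))

overlaps = {n: adjacent_overlap(n) for n in (5, 7, 11, 31, 51, 101)}
print(overlaps)
```

Each overlap should vanish up to rounding error, which is the behaviour the spot-checks report.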

Full evidence: verification/C005/

C006 math verified C006 — Classical NC bound $S_N \geq -N+2$

Verdict: verified
Location: Sec. N-gon states, Eq. 5
Claim: $S_N = \sum_{i=1}^{N}\langle A_i A_{i+1}\rangle \geq -N+2$ for odd $N$.

The parity constraint forces $k$ (anti-correlated pairs) to be even; for odd $N$ the maximum even $k \leq N$ is $N-1$, giving $S_N = N-2(N-1) = -N+2$. Exhaustive enumeration for $N \in \{5,7,9,11,13\}$ confirms $\min S_N = -N+2$.

Full evidence: verification/C006/

C007 math verified C007 — QM minimum $S_N \geq (N - 3N\cos(\pi/N))/(1+\cos(\pi/N))$

Verdict: verified
Location: Sec. N-gon states, Eq. 6
Claim: QM minimum of the $N$-cycle witness is $S_N^{\text{QM}} = (N - 3N\cos(\pi/N))/(1+\cos(\pi/N))$.

At $N=5$: formula gives $5-4\sqrt{5}$ (SymPy symbolic difference = 0) ✓. Numerical evaluation for all $N \in \{5,7,11,17,23,31,41,51,61,81,101,121\}$ confirms $S_N^{\text{QM}} < -N+2$ in every case.

Full evidence: verification/C007/

C008 math verified C008 — Contextual fraction $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$

Verdict: verified
Location: Sec. N-gon states, Eq. 7
Claim: $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$ with $S_N^{\text{NS}} = -N$, $S_N^{\text{NC}} = -N+2$.

Formula cross-checked against all 12 rows of Table 2; agrees within $0.1\sigma$ for every $N$ from 5 to 121. Limiting property $\text{CF}_N \to 1$ as $N\to\infty$ confirmed symbolically.
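As an illustration of the cross-check, the sketch below applies the contextual-fraction formula to three of the paper's Table 2 values of $S_N$ and evaluates the quantum bound of Eq. 6 at a large $N$ to probe the limit. The function names are hypothetical, and the choice $N = 10001$ is arbitrary.

```python
import math

def contextual_fraction(s_n, n):
    """CF_N = (S_N - S_NC) / (S_NS - S_NC) with S_NC = -N + 2 and S_NS = -N."""
    s_nc = -n + 2
    s_ns = -n
    return (s_n - s_nc) / (s_ns - s_nc)

def qm_bound(n):
    """Quantum minimum of the N-cycle witness (Eq. 6)."""
    c = math.cos(math.pi / n)
    return (n - 3 * n * c) / (1 + c)

# Paper values of S_N for N = 5, 31, 121 (Table 2).
cf_5 = contextual_fraction(-3.926, 5)
cf_31 = contextual_fraction(-30.599, 31)
cf_121 = contextual_fraction(-117.686, 121)

# Ideal quantum value at large N, to illustrate CF_N -> 1.
cf_ideal_large_n = contextual_fraction(qm_bound(10001), 10001)

print(cf_5, cf_31, cf_121, cf_ideal_large_n)
```

Since $S_N^{\text{NS}} - S_N^{\text{NC}} = -2$ for every $N$, the formula reduces to $\text{CF}_N = -(S_N + N - 2)/2$, which makes the table easy to check by hand.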

Full evidence: verification/C008/

C009 math verified C009 — Single-correlator formula $\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle = \tfrac{1}{8}(3-\sqrt{5}+(5+\sqrt{5})\cos 4\theta)$

Verdict: verified
Location: App. G (sec:theory)
Claim: $\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle = \frac{1}{8}\bigl(3-\sqrt{5}+(5+\sqrt{5})\cos(4\theta)\bigr)$.

Full re-derivation via $\text{tr}(M_i M_{i+1}\rho_{\text{in}})$ with $M_i = U_i(|0\rangle\langle0| - \mathbf{I})U_i^\dagger$ and $\rho_{\text{in}} = |0\rangle\langle0|$. SymPy's trigsimp returns zero for the symbolic difference; 8 numeric spot-checks agree to machine precision ($\sim 10^{-16}$).

Full evidence: verification/C009/

C010 math not verified C010 — Minimum of $S_5(\theta)$ at $\theta = \pi/2$, value $\approx -4.045$ ❌

Verdict: not_verified (mismatch)
Location: App. G (sec:theory)
Claim: "the minimum value of $S_5(\theta)$ is obtained at $\theta = \pi/2$ and equals $S_5 = \frac{5}{4}(-\sqrt{5}-1) \approx -4.045$."

Derivation (from verification/C010/derivation.md):

$$S_5(\theta) = \frac{5}{8}\bigl(3-\sqrt{5}+(5+\sqrt{5})\cos(4\theta)\bigr).$$

Since $5+\sqrt{5} > 0$, the minimum occurs when $\cos(4\theta) = -1$, i.e. $4\theta = \pi \pmod{2\pi}$, giving $\theta = \pi/4$ (and, by the $\pi/2$ periodicity, $3\pi/4$), not $\pi/2$.

At the correct angle $\theta = \pi/4$: $$S_5\!\left(\tfrac{\pi}{4}\right) = \frac{5}{8}(3-\sqrt{5}-(5+\sqrt{5})) = \frac{5}{4}(-1-\sqrt{5}) \approx -4.045. \checkmark$$

At the paper's claimed angle $\theta = \pi/2$: $$\cos(4\cdot\tfrac{\pi}{2}) = \cos(2\pi) = 1 \implies S_5\!\left(\tfrac{\pi}{2}\right) = 5.$$

$S_5(\pi/2) = 5$ is the maximum, not the minimum. SymPy numerical scan over $[0, \pi]$ confirms global minimum at $\theta \approx 0.785\ \text{rad} = \pi/4$.

Conclusion: The numerical value $-4.045$ is correct; the claimed angle $\theta = \pi/2$ is a typographical error — it should read $\theta = \pi/4$.
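The location of the minimum can be confirmed with a simple grid scan. The sketch below is a plain-Python stand-in for the SymPy scan and restricts $\theta$ to one period $[0, \pi/2]$ of $\cos 4\theta$; the grid resolution is an arbitrary choice.

```python
import math

def s5(theta):
    """S_5(theta) = (5/8) * (3 - sqrt5 + (5 + sqrt5) * cos(4 theta))."""
    sqrt5 = math.sqrt(5)
    return 5 / 8 * (3 - sqrt5 + (5 + sqrt5) * math.cos(4 * theta))

# Uniform grid over one period of cos(4 theta).
grid = [i * (math.pi / 2) / 2000 for i in range(2001)]
theta_min = min(grid, key=s5)

print(theta_min, s5(theta_min))   # location and value of the minimum
print(s5(math.pi / 2))            # value at the angle printed in the paper
```

Scanning the full interval $[0, \pi]$ would return a second, equivalent minimum at $3\pi/4$, since $S_5(\theta)$ has period $\pi/2$.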

Full evidence: verification/C010/

C011 math verified C011 — Incompatibility penalty $\varepsilon_i = \frac{1}{16}|(5-\sqrt{5}+5(3+\sqrt{5})\cos 2\theta)\sin^2 2\theta|$

Verdict: verified
Location: App. G
Claim: $\varepsilon_i = \frac{1}{16}\bigl|(5-\sqrt{5}+5(3+\sqrt{5})\cos(2\theta))\sin^2(2\theta)\bigr|$.

Re-derived from the Lüders post-measurement state $\rho_i = P_{B,i}\rho_{\text{in}}P_{B,i} + P_{D,i}\rho_{\text{in}}P_{D,i}$. SymPy simplify() returns 0 for (derivation − paper formula); zero at $\theta_5$ confirmed. Seven numeric spot-checks agree to machine precision.

Full evidence: verification/C011/

C012 math verified C012 — Bell-scenario maximum $S_5^{\text{Bell}} \approx -3.828$

Verdict: verified
Location: App. H (sec:exclusivity)
Claim: $\bar{S}_5^{\text{Bell}} = -1-2\sqrt{2} \approx -3.828$.

Using $S_M^{\text{Bell}} = M - 4[\tfrac{1}{2}+\tfrac{M-1}{4}(1+\cos(\pi/(M-1)))]$ at $M=5$: result is $-1-2\sqrt{2} \approx -3.8284$ (SymPy confirmed). Ordering $S_5^{\text{Bell}} > S_5^{\text{KCBS}}$ ($-3.828 > -3.944$) confirmed.
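The evaluation at $M=5$ is short enough to reproduce directly. The sketch below is a floating-point version of the check, with the hypothetical helper `s_bell` implementing the formula exactly as written above.

```python
import math

def s_bell(m):
    """S_M^Bell = M - 4 * [1/2 + (M - 1)/4 * (1 + cos(pi / (M - 1)))]."""
    return m - 4 * (0.5 + (m - 1) / 4 * (1 + math.cos(math.pi / (m - 1))))

s5_bell = s_bell(5)
s5_kcbs = 5 - 4 * math.sqrt(5)   # KCBS quantum value from Eq. 2

print(s5_bell, s5_kcbs)
```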

Full evidence: verification/C012/

C013 numeric verified C013 — Qutrit basis: $|0\rangle = |S_{1/2}, m_J{=}{-}1/2\rangle$, $|1\rangle = |D_{5/2}, m_J{=}{-}3/2\rangle$, $|2\rangle = |D_{5/2}, m_J{=}{-}1/2\rangle$

Verdict: verified
Location: App. A (sec:transitions)
Claim: Zeeman sub-level encoding of the $^{40}\text{Ca}^+$ qutrit.

Confirmed verbatim in main.tex; physically valid $m_J$ ranges for each level; Zeeman splitting cross-check: $\Delta f = 1.2 \times 1.3996\ \text{MHz/G} \times 3.73\ \text{G} = 6.265\ \text{MHz}$ (paper: $\approx 6.27\ \text{MHz}$, $0.08\%$ error). Companion paper 17Leupold states identical encoding verbatim.

Full evidence: verification/C013/

C014 numeric verified C014 — $|B| \approx 3.73\ \text{G}$, transition splitting $\approx 6.27\ \text{MHz}$

Verdict: verified
Location: App. A
Claim: External field $|B| \approx 3.73\ \text{G}$ splits $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$ by $\approx 6.27\ \text{MHz}$.

$\Delta f = g_J(D_{5/2})\cdot(\mu_B/h)\cdot B\cdot\Delta m_J = 1.2\times1.3996\times3.73\times1 = 6.265\ \text{MHz}$. Relative error: $0.08\%$, well within the paper's $\approx$ qualifier.
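The Zeeman arithmetic is reproduced below as a small sketch. The constant names are illustrative; $g_J = 1.2$ is the value used in this report for $D_{5/2}$, and $\mu_B/h$ is taken as 1.3996 MHz/G as above.

```python
G_J_D52 = 1.2          # g-factor of D_{5/2} as used in this report
MU_B_OVER_H = 1.3996   # Bohr magneton over h, in MHz per gauss
B_FIELD = 3.73         # magnetic field in gauss (paper value)
DELTA_M_J = 1          # difference in m_J between the two D_{5/2} sub-levels

splitting_mhz = G_J_D52 * MU_B_OVER_H * B_FIELD * DELTA_M_J
relative_error = abs(splitting_mhz - 6.27) / 6.27   # against the paper's 6.27 MHz

print(splitting_mhz, relative_error)
```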

Full evidence: verification/C014/

C015 numeric verified C015 — $\lambda \approx 729\ \text{nm}$, beam propagates at $45°$ to quantization axis

Verdict: verified
Location: App. A
Claim: Coherent pulses at $\lambda \approx 729\ \text{nm}$ drive $S_{1/2}\leftrightarrow D_{5/2}$ at $45°$.

NIST $S_{1/2}\rightarrow D_{5/2}$ wavenumber gives $729.35\ \text{nm}$ (within $0.35\ \text{nm}$ of stated $\approx 729\ \text{nm}$). The $45°$ angle confirmed by companion papers from the same group (17Leupold, 16Alonso).

Full evidence: verification/C015/

C016 numeric partial C016 — AC Stark shifts $< 100\ \text{Hz}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. A
Claim: "AC Stark shifts are kept below 100 Hz by operating at low laser intensities."

Using $\delta_{\text{AC}} = \Omega^2/(4\Delta)$ with $\Delta/2\pi \approx 6.27\ \text{MHz}$, the shift reaches 100 Hz at $\Omega/2\pi \approx 50\ \text{kHz}$ ($\pi$-pulse $\approx 10\ \mu\text{s}$), which is the "low intensity" regime for 729 nm trapped-ion experiments. Physically self-consistent, but the actual Rabi frequency is not stated in the paper or 17Leupold, so the 100 Hz bound cannot be directly confirmed from available data.
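The Rabi frequency at which the shift reaches 100 Hz follows from inverting $\delta_{\text{AC}} = \Omega^2/(4\Delta)$. The sketch below works in ordinary-frequency units (all quantities divided by $2\pi$), where the relation has the same form, and takes the $\pi$-pulse time as $\pi/\Omega$; variable names are illustrative.

```python
import math

detuning_hz = 6.27e6      # Delta / 2pi, set by the Zeeman splitting
shift_limit_hz = 100.0    # AC Stark shift bound claimed in the paper

# Invert delta_AC = Omega^2 / (4 Delta), with every quantity expressed as x / 2pi.
rabi_hz = math.sqrt(4 * detuning_hz * shift_limit_hz)

# A pi-pulse lasts pi / Omega = 1 / (2 * Omega / 2pi).
pi_time_us = 1e6 / (2 * rabi_hz)

print(rabi_hz, pi_time_us)
```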

Full evidence: verification/C016/

C017 numeric partial C017 — Coherence time $\sigma_t \approx 1.6\ \text{ms}$ for $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B (sec:coherence)
Claim: Ramsey coherence time $\sigma_t \approx 1.6\ \text{ms}$ for both transitions to $D_{5/2}$.

Internal consistency: the paper's stated common-mode FWHM $\approx 230\ \text{Hz}$ converts (via $\sigma_t = 1/(2\pi\sigma_f)$, Gaussian dephasing) to $\sigma_t = 1.629\ \text{ms}$, within $1.8\%$ of the claimed $1.6\ \text{ms}$. Companion paper 17Leupold reports $\approx 2.5\ \text{ms}$ for the same transitions (same apparatus, different run conditions). No raw Ramsey data available; verdict capped at partial.

Full evidence: verification/C017/

C018 numeric partial C018 — Coherence time $\sigma_t \approx 7\ \text{ms}$ for $|1\rangle\leftrightarrow|2\rangle$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Ramsey coherence time $\sigma_t \approx 7\ \text{ms}$ for the $D_{5/2}$–$D_{5/2}$ transition.

Self-consistency: differential noise FWHM $\approx 50\ \text{Hz}$ gives $\sigma_t = 2\sqrt{2\ln 2}/(2\pi \times 50) \approx 7.50\ \text{ms}$, within $7\%$ of the claimed $\approx 7\ \text{ms}$. 17Leupold reports $\approx 12\ \text{ms}$ (different run); no raw data available.

Full evidence: verification/C018/

C019 numeric verified C019 — Common-mode frequency noise $\approx 230\ \text{Hz}$ FWHM

Verdict: verified
Location: App. B
Claim: Common-mode noise FWHM $\approx 230\ \text{Hz}$ (from cryocooler vibrations).

$\text{FWHM} = 2\sqrt{2\ln 2}/(2\pi\times 1.6\times10^{-3}) = 234.2\ \text{Hz}$; $1.8\%$ discrepancy from $230\ \text{Hz}$, within the $\approx$ qualifier.
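The FWHM-to-coherence-time conversions used for C017 to C019 all follow from the Gaussian relations $\text{FWHM} = 2\sqrt{2\ln 2}\,\sigma_f$ and $\sigma_t = 1/(2\pi\sigma_f)$. A minimal sketch, with hypothetical function names:

```python
import math

FWHM_PER_SIGMA = 2 * math.sqrt(2 * math.log(2))   # Gaussian FWHM / sigma, about 2.355

def coherence_time_from_fwhm(fwhm_hz):
    """sigma_t in seconds for Gaussian frequency noise of the given FWHM."""
    sigma_f = fwhm_hz / FWHM_PER_SIGMA
    return 1 / (2 * math.pi * sigma_f)

def fwhm_from_coherence_time(sigma_t):
    """Inverse conversion: noise FWHM in Hz from sigma_t in seconds."""
    return FWHM_PER_SIGMA / (2 * math.pi * sigma_t)

t_common_ms = coherence_time_from_fwhm(230) * 1e3   # common-mode noise
t_diff_ms = coherence_time_from_fwhm(50) * 1e3      # differential noise
fwhm_hz = fwhm_from_coherence_time(1.6e-3)          # from the claimed 1.6 ms

print(t_common_ms, t_diff_ms, fwhm_hz)
```

These checks only test internal consistency between the quoted FWHM values and coherence times; they say nothing about the underlying Ramsey data.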

Full evidence: verification/C019/

C020 numeric partial C020 — Differential noise $\approx 50\ \text{Hz}$ FWHM, $B$-field fluctuations $< 5\ \mu\text{G}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Differential noise between $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$ is $\approx 50\ \text{Hz}$ FWHM, associated with $\Delta B < 5\ \mu\text{G}$.

Zeeman differential sensitivity: $g_J(D_{5/2})\cdot(\mu_B/h) = 1.68\ \text{Hz/}\mu\text{G}$. At $\Delta B = 5\ \mu\text{G}$: $1.68 \times 5 = 8.4\ \text{Hz}$, well below 50 Hz — consistent with paper's attribution of the remaining $\sim 42\ \text{Hz}$ to slow drifts. No raw Ramsey data available; partial only.

Full evidence: verification/C020/

C021 numeric partial C021 — Drift $\sim 100\ \text{Hz}$, recalibration every 30 s at 10 Hz resolution

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Transition frequencies drift by $\sim 100\ \text{Hz}$ on minute timescales; recalibrated every 30 s with 10 Hz resolution.

Values confirmed by direct text match in main.tex (sec:coherence). These are apparatus characterisation parameters not recomputable from the public dataset; verdict capped at partial.

Full evidence: verification/C021/

C022 numeric partial C022 — Doppler cooling $n_{\text{th}} \approx 5$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. C (sec:cooling)
Claim: After Doppler cooling, all three motional modes reach $n_{\text{th}} \approx 5$.

Doppler limit $T_D = \hbar\Gamma/(2k_B) \approx 0.504\ \text{mK}$ for $^{40}\text{Ca}^+$ ($\Gamma = 2\pi\times21\ \text{MHz}$). At trap frequencies from Alonso 2016 (axial $\approx 2.4\ \text{MHz}$, radial $\approx 4, 7\ \text{MHz}$): $n_{\text{th}}(\text{axial}) \approx 3.9$, radial modes well below 5. Only the axial mode is near the claimed 5; exact trap frequencies for this run are not stated.
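The Doppler-limit estimate can be reproduced as follows. This is a sketch under the same assumptions as the text (a thermal state at the Doppler temperature, with trap frequencies of 2.4, 4 and 7 MHz taken from Alonso 2016 rather than from this run); the function name `n_thermal` is hypothetical.

```python
import math

HBAR = 1.054571817e-34   # J s
H = 6.62607015e-34       # J s
K_B = 1.380649e-23       # J / K

gamma = 2 * math.pi * 21e6                 # linewidth used in the text, rad/s
t_doppler = HBAR * gamma / (2 * K_B)       # Doppler temperature in kelvin

def n_thermal(freq_hz, temperature):
    """Mean Bose-Einstein occupation of a mode at the given temperature."""
    return 1 / math.expm1(H * freq_hz / (K_B * temperature))

n_axial = n_thermal(2.4e6, t_doppler)
n_radial = [n_thermal(f, t_doppler) for f in (4e6, 7e6)]

print(t_doppler * 1e3, n_axial, n_radial)  # temperature in mK, then occupations
```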

Full evidence: verification/C022/

C023 numeric verified C023 — EIT cooling $n_{\text{th}} \approx 0.2$

Verdict: verified
Location: App. C
Claim: After EIT cooling, axial mode reaches $n_{\text{th}} \approx 0.2$.

Companion paper 16Alonso (same apparatus) states verbatim: "We measure a typical mean thermal excitation after cooling to be $n_{\text{th}} \approx 0.2$." Confirmed.

Full evidence: verification/C023/

C024 numeric not verified C024 — Motional heating rate $\sim 200\ \text{quanta/s}$ ❌

Verdict: not_verified
Failure reason: data_unavailable
Location: App. C
Claim: Dark-detection motional heating rate $\sim 200\ \text{quanta/s}$.

Measuring a heating rate requires dedicated sideband spectroscopy (cool → wait → read sideband ratio), entirely separate from the contextuality dataset. The public zip archive contains only KCBS correlation data (no sideband/phonon files); 17Leupold does not quote this value. The $\sim 200\ \text{quanta/s}$ is plausible for a surface-electrode trap (typical range $100$–$10\,000\ \text{quanta/s}$), but cannot be confirmed.

Full evidence: verification/C024/

C025 numeric verified C025 — Fluorescence detection at 397 nm, repump at 866 nm

Verdict: verified
Location: App. D (sec:detection)
Claim: Fluorescence via $S_{1/2}\rightarrow P_{1/2}$ at 397 nm + repump $D_{3/2}\rightarrow P_{1/2}$ at 866 nm.

NIST $^{40}\text{Ca}^{+}$ energy levels give $S_{1/2}\rightarrow P_{1/2}: 396.96\ \text{nm}$ and $D_{3/2}\rightarrow P_{1/2}: 866.45\ \text{nm}$ — within $\pm 1\ \text{nm}$ tolerance of the paper's rounded values.

Full evidence: verification/C025/

C026 numeric partial C026 — $\approx 25$ photons bright, $\approx 1$ photon background, $\approx 200\ \mu\text{s}$ window

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: Typical detection yields $\approx 25$ bright photons and $\approx 1$ background photon in a $\approx 200\ \mu\text{s}$ window.

17Leupold (same trap) reports 18.75 bright / 0.709 background in a 160 µs window. Scaling to 200 µs: $18.75\times(200/160) = 23.4$ bright (vs $\approx 25$, $6\%$), $0.71\times(200/160) = 0.89$ background (vs $\approx 1$, $11\%$). Both within the $\approx$ qualifier; no raw photon-count data available.

Full evidence: verification/C026/

C027 numeric partial C027 — Detection errors: bright $\approx 2\times10^{-5}$, dark $\approx 1\times10^{-4}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: Bright-state error $\approx 2\times10^{-5}$; dark-state error $\approx 1\times10^{-4}$.

Using a Poisson threshold model ($n_{\text{bright}} = 25$, $n_{\text{bg}} = 1$, threshold $k^* = 7$): bright error $= P(\text{Poi}(25)\leq7) = 2.29\times10^{-5}$ (paper: $\approx 2\times10^{-5}$, $15\%$ off); dark error dominated by $D_{5/2}$ decay: $T/\tau_{\text{decay}} = 200\times10^{-6}/1.2 = 1.67\times10^{-4}$; total $\approx 1.24\times10^{-4}$ (paper: $\approx 1\times10^{-4}$, $24\%$ off). Agreement within the $\approx$ qualifier.
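The bright-state error is a Poisson tail probability and can be evaluated without external libraries. The sketch below reproduces that number and the raw decay probability $T/\tau_{\text{decay}}$; it does not attempt the total dark-state error, whose exact model is not spelled out here. The helper `poisson_cdf` is hypothetical.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**j / math.factorial(j) for j in range(k + 1))

# A bright ion is misread as dark if it yields at most k* = 7 photons.
bright_error = poisson_cdf(7, 25)

# Probability that a dark (D_{5/2}) ion decays during a 200 us window.
decay_probability = 200e-6 / 1.2

print(bright_error, decay_probability)
```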

Full evidence: verification/C027/

C028 numeric partial C028 — $D_{5/2}$ lifetime $\tau_{\text{decay}} \approx 1.2\ \text{s}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: $D_{5/2}$ spontaneous decay lifetime $\tau_{\text{decay}} \approx 1.2\ \text{s}$.

Published spectroscopic measurements: Kreuter et al. 2005 → $1.168\pm0.009\ \text{s}$; Barton et al. 2000 → $1.168\pm0.007\ \text{s}$. Weighted mean $\approx 1.168\ \text{s}$, which rounds to $1.2\ \text{s}$ (one decimal place) as in the paper. Verdict partial only because network restrictions prevented direct fetching of primary spectroscopy papers.

Full evidence: verification/C028/

C029 empirical verified C029 — Table 1 normal-order individual correlators

Verdict: verified
Location: Table 1
Claim (verbatim): "For the data point closest to compatibility (normal order): $(i=1,j=2)$: $\langle A_1\rangle = -0.106(10)$, $\langle A_2\rangle = -0.107(10)$, $\langle A_1 A_2\rangle = -0.786(6)$; $(i=2,j=3)$: $\langle A_2\rangle = -0.111(10)$, $\langle A_3\rangle = -0.092(10)$, $\langle A_2 A_3\rangle = -0.793(6)$; $(i=3,j=4)$: $\langle A_3\rangle = -0.107(10)$, $\langle A_4\rangle = -0.112(10)$, $\langle A_3 A_4\rangle = -0.775(6)$; $(i=4,j=5)$: $\langle A_4\rangle = -0.102(10)$, $\langle A_5\rangle = -0.107(10)$, $\langle A_4 A_5\rangle = -0.787(6)$; $(i=5,j=1)$: $\langle A_5\rangle = -0.100(10)$, $\langle A_1\rangle = -0.121(10)$, $\langle A_5 A_1\rangle = -0.774(6)$."

Run script header (verification/C029/run.py):

# Provenance: Paper-provided data from artifacts/data/Dataset_Public_repository.zip.
# Post-processing pipeline re-implemented from paper description
# (paper_text_only_reimplementation for analysis; raw data is paper-provided).
# Claim C029 (Table 1, Normal order): Individual correlators and expectation values
# for N=5 KCBS at the data point closest to compatibility.

Paper values vs computed (from kcbs_005_gen_nor.csv, rot_time = 10.7, $\theta \approx 48.020°$):

Pair	$\langle A_i\rangle$ paper	computed	$\langle A_j\rangle$ paper	computed	$\langle A_i A_j\rangle$ paper	computed
(1,2)	−0.106(10)	−0.1056	−0.107(10)	−0.1072	−0.786(6)	−0.7856
(2,3)	−0.111(10)	−0.1112	−0.092(10)	−0.0918	−0.793(6)	−0.7930
(3,4)	−0.107(10)	−0.1074	−0.112(10)	−0.1122	−0.775(6)	−0.7752
(4,5)	−0.102(10)	−0.1018	−0.107(10)	−0.1072	−0.787(6)	−0.7874
(5,1)	−0.100(10)	−0.1002	−0.121(10)	−0.1212	−0.774(6)	−0.7742

All computed values match the paper's to within $0.0004$, i.e. under $0.1\sigma$.

Full evidence: verification/C029/

C030 empirical verified C030 — Normal-order KCBS totals: $S_5 = -3.915(14)$, $S_5^{(\text{ext})} = -3.864(34)$

Verdict: verified
Location: Table 1
Claim: $S_5 = -3.915(14)$, $S_5^{(\text{ext})} = -3.864(34)$ (normal order, $\theta \approx \theta_5$).

Paper value: $S_5 = -3.915(14)$; computed: $-3.9154 \pm 0.0141$ (diff $0.03\sigma$).
Paper value: $S_5^{(\text{ext})} = -3.864(34)$; computed: $-3.8628 \pm 0.0332$ (diff $0.04\sigma$).
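The total $S_5$ is the sum of the five pair correlators, so it can be rebuilt from the computed column of the Table 1 comparison. The sketch below also propagates the uncertainty under the simplifying assumption of five independent correlators with equal errors of 0.006 (the paper's rounded value), which only approximates the quoted $0.014$.

```python
import math

# Computed normal-order correlators <A_i A_j> for pairs (1,2) ... (5,1).
correlators = [-0.7856, -0.7930, -0.7752, -0.7874, -0.7742]
sigma_each = 0.006   # assumption: rounded per-correlator uncertainty from Table 1

s5 = sum(correlators)
s5_err = math.sqrt(len(correlators)) * sigma_each   # independent, equal errors

print(s5, s5_err)
```

The extended witness additionally needs the $\varepsilon_i$ terms from the single-observable means, which are not reproduced in this sketch.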

Full evidence: verification/C030/

C031 empirical verified C031 — Table 1 reverse-order individual correlators

Verdict: verified
Location: Table 1
Claim (verbatim): Reverse-order correlators at $\theta \approx \theta_5$: $(1,2)$: $-0.786(6)$; $(2,3)$: $-0.787(6)$; $(3,4)$: $-0.784(6)$; $(4,5)$: $-0.783(6)$; $(5,1)$: $-0.798(6)$, with matching single-observable means.

All 15 values reproduced from kcbs_005_gen_rev.csv ($\text{rot\_time} = 10.7$, $\theta = 48.014°$); deviations are $\leq 0.0004$, more than an order of magnitude below the quoted uncertainties ($\pm 0.010$ and $\pm 0.006$).

Full evidence: verification/C031/

C032 empirical verified C032 — Reverse-order KCBS totals: $S_5 = -3.937(14)$, $S_5^{(\text{ext})} = -3.890(34)$

Verdict: verified
Location: Table 1
Claim: $S_5 = -3.937(14)$, $S_5^{(\text{ext})} = -3.890(34)$ (reverse order).

Paper value: $S_5 = -3.937(14)$; computed: $-3.9374 \pm 0.0135$ (diff $0.03\sigma$).
Paper value: $S_5^{(\text{ext})} = -3.890(34)$; computed: $-3.8960 \pm 0.0336$ (diff $0.18\sigma$).

Full evidence: verification/C032/

C033 empirical partial C033 — Systematic shift of 1.6 standard deviations from QM prediction

Verdict: partial
Failure reason: mismatch (minor)
Location: Sec. KCBS results
Claim: "$S_5(\theta_5)$ exhibits a systematic shift of 1.6 standard deviations from the ideal QM prediction ($S_5^{\text{QM}} \approx -3.944$)."

From raw data: normal-order $z = (-3.915 - (-3.944))/0.014 = +2.05\sigma$; reverse-order $z = (-3.937 - (-3.944))/0.014 = +0.51\sigma$. RMS of paper-rounded values: $1.52\sigma$. Weighted mean: $1.78\sigma$. No formula reproduces exactly $1.6\sigma$, but the qualitative conclusion (a positive shift of roughly $1.5$–$1.8\sigma$ when the two datasets are combined) is well supported. The discrepancy is minor: the paper's $1.6\sigma$ is a rounded, approximate characterisation.

Full evidence: verification/C033/

C034 empirical verified C034 — KCBS violation by 65 (normal) and 67 (reverse) standard deviations

Verdict: verified
Location: Sec. KCBS results
Claim: "The data point closest to compatibility violates the KCBS inequality by 65 standard deviations (normal order) and 67 standard deviations (reverse order)."

$|-3.9154 - (-3)|/0.0141 = 64.9 \approx 65$ ✓;
$|-3.937 - (-3)|/0.014 = 66.9 \approx 67$ ✓.
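The significance of each violation is the distance from the classical bound $-3$ in units of the standard error. A minimal sketch using the paper's rounded Table 1 values; the function name `n_sigma` is hypothetical.

```python
def n_sigma(value, bound, sem):
    """Distance between a measured value and a bound, in standard errors."""
    return abs(value - bound) / sem

NC_BOUND = -3.0

violation_normal = n_sigma(-3.915, NC_BOUND, 0.014)    # normal order
violation_reverse = n_sigma(-3.937, NC_BOUND, 0.014)   # reverse order

print(round(violation_normal), round(violation_reverse))
```

With unrounded inputs the first figure shifts slightly, but both still round to the integers quoted in the paper.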

Full evidence: verification/C034/

C035 empirical verified C035 — All $\theta$-scan data points violate extended KCBS by up to 25 standard deviations

Verdict: verified
Location: Sec. KCBS results
Claim: "All measured data points in the $\theta$-scan violate the extended KCBS inequality by up to 25 standard deviations."

All 12/12 data points (6 rot-time settings × 2 orders) yield $S_5^{(\text{ext})} < -3$. Maximum violation: $\approx 27\sigma$ (slight discrepancy from paper's "25" attributable to rounding/propagation convention); minimum: $\approx 15.5\sigma$.

Full evidence: verification/C035/

C036–C047 empirical verified C036–C047 — Table 2: $N$-gon measurement results

All Table 2 rows reproduced from the public dataset. See individual verdict files. A brief summary:

Claim	$N$	$S_N$ paper	$S_N$ computed	$\text{CF}_N$ paper	$\text{CF}_N$ computed	Verdict
C036	5	−3.926(14)	−3.9260	0.463(7)	0.4630	verified
C037	7	−6.208(12)	−6.2078	0.604(6)	0.6039	verified†
C038	11	−10.452(10)	−10.4520	0.726(5)	0.7260	verified
C039	17	−16.538(10)	−16.5384	0.769(5)	0.7692	verified
C040	23	−22.530(10)	−22.5300	0.765(5)	0.7650	verified
C041	31	−30.599(9)	−30.599	0.800(4)	0.7995	verified‡
C042	41	−40.439(11)	−40.4386	0.719(5)	0.7193	verified
C043	51	−50.422(11)	−50.4218	0.711(5)	0.7109	verified
C044	61	−60.279(11)	−60.2793	0.640(6)	0.6396	partial
C045	81	−79.972(14)	−79.9723	0.486(7)	0.4862	verified
C046	101	−99.544(17)	−99.544	0.272(8)	0.272	verified
C047	121	−117.686(25)	−117.686	−0.657(12)	−0.657	verified

† C037 note: the paper lists $S_7^{\text{NC}} = -6$, but the formula $-N+2$ gives $-5$ for $N=7$; $\text{CF}_7 = 0.604$ is only consistent with $S_7^{\text{NC}} = -5$. Apparent typo in Table 2.
‡ C041: $S_{31}^{(\text{ext})}$ differs by $2.3\sigma$ from paper; likely due to different treatment of shot-noise correction near $\theta_{31}$.
C044 partial: reimplementation only (no paper-provided code).

Full evidence: verification/C036/ … verification/C047/

C048 empirical verified C048 — Contextuality up to $N=101$ (bare), $N=61$ (extended)

Verdict: verified
Location: Sec. N-gon results
Claim: "Stronger-than-classical correlations are observed for all $N$ up to 101 for $S_N$, and up to $N=61$ for $S_N^{(\text{ext})}$."

From Table 2: $S_N < -N+2$ for $N \leq 101$ (significant violations $24$–$174\sigma$); $S_{121} = -117.7 > -119 = S_{121}^{\text{NC}}$ (no bare violation). Extended: $S_N^{(\text{ext})}$ significant violation ($4.7$–$29.6\sigma$) for $N = 5\ldots61$; $N=81$ is $0.5\sigma$ below bound (not statistically significant). Both cutoffs match exactly.

Full evidence: verification/C048/

C049 empirical verified C049 — Largest $\text{CF}_{31} = 0.800(4)$ at $N=31$

Verdict: verified
Location: Abstract; Sec. N-gon results
Claim: "The largest measured contextual fraction is $\text{CF}_{31} = 0.800(4)$ at $N=31$."

Computed from kcbs_031_sho_nor.csv: $\text{CF}_{31} = 0.7995$ (diff $0.1\sigma$). $N=31$ confirmed as global maximum across all $N \in \{5,7,11,17,23,31,41,51,61,81,101,121\}$.

Full evidence: verification/C049/

C050 empirical verified C050 — Normal-order saturation $0.969(14)$, signaling $0.054(31)$

Verdict: verified
Location: Table 3
Claim: "Normal order: saturation of QM limit $= 0.969(14)$, signaling $= 0.054(31)$."

Computed: saturation $= 0.9694 \pm 0.0149$ (diff $0.03\sigma$); signaling $= 0.0557 \pm 0.0318$ (diff $0.05\sigma$).

Full evidence: verification/C050/

C051 empirical verified C051 — Reverse-order saturation $0.992(14)$, signaling $0.050(31)$

Verdict: verified
Location: Table 3
Claim: "Reverse order: saturation of QM limit $= 0.992(14)$, signaling $= 0.050(31)$."

Computed: saturation $= 0.99272 \pm 0.01429$ (diff $0.05\sigma$); signaling $= 0.04384 \pm 0.03257$ (diff $0.20\sigma$).

Full evidence: verification/C051/

C052 empirical partial C052 — KCBS result corresponds to $99.5(2)\%$ of QM limit

Verdict: partial
Failure reason: mismatch
Location: App. K (sec:Bellcomparison)
Claim: "This work's KCBS result corresponds to $99.5(2)\%$ of the QM limit."

Run script header (verification/C052/run.py):

# Claim C052: 99.5(2)% of QM limit (App. K sec:Bellcomparison).
# Paper value: 99.5(2)%.
# Computes: (S5 - S5_NC) / (S5_QM - S5_NC) * 100 for best result.

Paper value: $99.5 \pm 0.2\%$
Computed (reverse order, $S_5 = -3.9374$, $S_5^{\text{QM}} = 5-4\sqrt{5}$, $S_5^{\text{NC}} = -3$): $$\frac{-3.9374 - (-3)}{-3.9443 - (-3)} \times 100 = 99.27 \pm 1.43\%.$$

The computed $99.27\%$ lies within $1\sigma$ of the stated $99.5\%$ when the true uncertainty is used, but the stated uncertainty $\pm 0.2\%$ is inconsistent with the true statistical uncertainty $\pm 1.43\%$ (which propagates directly from the $S_5$ SEM of $0.0135$). Table 3 independently reports $0.992(14) = 99.2(1.4)\%$ for the same data, confirming the $\pm 0.2\%$ claim is an understatement by a factor of $\approx 7$. No formula reproduces $99.5\%$ exactly.
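The saturation figure and its uncertainty follow from a linear rescaling of $S_5$, so the error propagates by the same scale factor. The sketch below mirrors the formula in the run script header; it is not the script itself, and the variable names are illustrative.

```python
import math

s5, s5_sem = -3.9374, 0.0135     # reverse-order result and its standard error
s5_nc = -3.0                     # noncontextual bound
s5_qm = 5 - 4 * math.sqrt(5)     # quantum limit, about -3.9443

scale = s5_qm - s5_nc
saturation_pct = (s5 - s5_nc) / scale * 100
saturation_err_pct = s5_sem / abs(scale) * 100   # linear error propagation

print(saturation_pct, saturation_err_pct)
```

Because the bounds are exact constants, the only source of uncertainty is the SEM of $S_5$, which is why $\pm 0.2\%$ cannot arise from this dataset.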

Full evidence: verification/C052/

C053 plot verified C053 — Fig. 2: all $S_5^{(\text{ext})}(\theta)$ violate NC bound; data agrees with theory

Verdict: verified
Location: Fig. 2 (data_plot_general_latex_bell.png)

Paper figure	Reproduced figure

All 12/12 data points fall below $-3$ (minimum $S_5^{(\text{ext})} \approx -3.88$, maximum $\approx -3.53$). Deviations from ideal QM theory $0.2$–$1.6\sigma$. The small systematic offset $(\sim 0.05)$ is explicitly attributed to qutrit-rotation imperfections in the paper.

Full evidence: verification/C053/

C054 plot verified C054 — Fig. 3: $\text{CF}_N$ peaks at $N=31$, becomes negative at $N=121$

Verdict: verified
Location: Fig. 3 (gons_combined_allpoints_latex.png)

Paper figure	Reproduced figure

$\text{CF}_{31} = 0.7995$ (max), $\text{CF}_{121} = -0.657$ (negative, confirmed). Shape and scale of the reproduced $\text{CF}_N$ vs $N$ panel match the paper's bottom panel.

Full evidence: verification/C054/

C055 citation verified C055 — Christensen 2015: chained Bell up to $N=90$, $\text{CF}_{36} = 0.874(1)$

Verdict: verified
Location: Sec. N-gon states
Claim: "Chained Bell experiments observed contextuality with $N$ up to 90, with $\text{CF}_{36} = 0.874(1)$ [Christensen 2015]."

From Christensen et al. PRX 5, 041052 (2015): data for $n=2$…$45$ per-party settings ($N = 2n$ up to 90); Table III gives $q_{\min}(n=18) = 0.874 \pm 0.001$ ($N=36$, $\text{CF}_{36}$). Confirmed.

Full evidence: verification/C055/

C056 citation verified C056 — Prior extended KCBS experiments limited to $N \leq 7$ [Arias 2015]

Verdict: verified
Location: Introduction
Claim: "Previous extended KCBS experimental studies are limited to $N \leq 7$ [Arias et al. 2015]."

Arias et al. PRA 92, 032126 (2015) tests $C_7$ ($N=7$) and $\bar{C}_7$ only; abstract states "With the exception of the pentagon [$N=5$], this prediction remained experimentally unexplored." Confirmed.

Full evidence: verification/C056/

C057 citation verified C057 — Poh 2015: $99.97(2)\%$ of Tsirelson bound

Verdict: verified
Location: App. K
Claim: "Poh et al. (2015) measured $99.97(2)\%$ of the Tsirelson bound."

From Poh et al. PRL 115, 180408 (2015): $S = 2.82759 \pm 0.00051$; $S/(2\sqrt{2}) = 99.970 \pm 0.018\% \approx 99.97(2)\%$. Confirmed.
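
The conversion from the quoted CHSH value to a percentage of the Tsirelson bound is a single division. A short sketch, using only the numbers quoted above and illustrative variable names:

```python
import math

# CHSH value and uncertainty as quoted above from Poh et al. (2015).
S_CHSH = 2.82759
S_CHSH_ERR = 0.00051
TSIRELSON = 2 * math.sqrt(2)    # quantum maximum of the CHSH expression

fraction_pct = S_CHSH / TSIRELSON * 100
fraction_err_pct = S_CHSH_ERR / TSIRELSON * 100

print(f"{fraction_pct:.3f} +/- {fraction_err_pct:.3f} % of the Tsirelson bound")
```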

Full evidence: verification/C057/

C058 citation verified C058 — Christensen 2015: $\sim 99\%$ CHSH saturation; $N=90$; $\text{CF}_{36} = 0.874(1)$

Verdict: verified
Location: App. K
Claim: "Christensen et al. (2015) came close to $99\%$ of the QM prediction for the CHSH test, measuring chained Bell inequalities up to $N=90$ with $\text{CF}_{36} = 0.874(1)$."

CHSH saturation from paper: $2.817/(2\sqrt{2}) \approx 99.6\%$ (slightly above $99\%$, consistent with "close to 99%"). $N=90$ and $\text{CF}_{36} = 0.874(1)$ as in C055. Confirmed.

Full evidence: verification/C058/

C059 citation verified C059 — Vienna 2011 [Lapkiewicz]: saturation $0.947(6)$, signaling $0.08(3)$

Verdict: verified
Location: Table 3
Claim: "Vienna 2011 KCBS test: saturation $0.947(6)$, signaling $0.08(3)$."

Computed from Table 1 of Lapkiewicz et al. Nature 474, 490 (2011): $S_5 = -3.894(6)$, saturation $= 0.947(6)$; signaling $\delta = 0.081(2) \approx 0.08(3)$. Confirmed.

Full evidence: verification/C059/

C060 citation partial C060 — Stockholm 2013 [Ahrens]: saturation $0.53(11)$ and $0.95(11)$; no signaling data

Verdict: partial
Limitations: minor_methodological_inconsistency_in_cited_paper_normalization
Location: Table 3
Claim: "Ahrens et al. 2013: saturation $0.53(11)$ normal, $0.95(11)$ reverse; no signaling data available."

Ahrens et al. Sci. Rep. 3, 2170 (2013) Table II: $\kappa_{\text{nor}} = -3.536\pm0.005$, $\kappa_{\text{rev}} = -3.896\pm0.006$ (stat). Applying the paper's own formula: reverse $= 0.953$ (matches claim to $0.03\sigma$); normal $= 0.561$ vs claimed $0.53$ (discrepancy: the citing paper appears to use the raw numerator $|\kappa - S^{\text{NC}}| = 0.536$ instead of the normalized fraction). "No signaling data" confirmed (Ahrens reports only joint correlators, never individual marginals).

Full evidence: verification/C060/

C061 citation partial C061 — Beijing 2013 [Deng]: saturation $0.977(11)$ and $0.956(26)$; signaling $0.267$ and $0.291$

Verdict: partial
Failure reason: mismatch (one value)
Location: Table 3
Claim: "Deng et al. 2013: saturation $0.977(11)$ and $0.956(26)$; signaling $0.267$ and $0.291$."

From Deng et al. arXiv:1301.5364 Table I: saturation $0.977(11)$ and $0.956(26)$ confirmed within stated uncertainties. Signaling $0.291$ (biased case) is an exact match. Signaling $0.267$ (uniform case) cannot be precisely reproduced under any single convention (two interpretations give $0.280$ or $0.251$; average $0.265 \approx 0.267$). Core qualitative claim (large signaling relative to violation) unambiguously confirmed.

Full evidence: verification/C061/

C062 citation verified C062 — Beijing 2013 [Um]: saturation $0.589(24)^*$, signaling $0.119(24)^*$

Verdict: verified
Location: Table 3
Claim: "Um et al. 2013 (re-analysis): saturation $0.589(24)^*$, signaling $0.119(24)^*$."

Reproduced from Um et al. Sci. Rep. 3:1627 Table 1: $S_5 \approx -3.558$, saturation $= 0.591$; signaling $0.119$ — both within stated $\pm 0.024$. The $^*$ notation (authors' own re-analysis due to errors in original data) confirmed structurally.

Full evidence: verification/C062/

C063 citation verified C063 — Brisbane 2016 [Jerger]: saturation $0.520(1)$ / $0.541(1)$; signaling $0.379(2)$

Verdict: verified
Location: Table 3
Claim: "Jerger et al. 2016: saturation $0.520(1)$ normal, $0.541(1)$ reverse; signaling $0.379(2)$."

From Jerger et al. Nat. Commun. 7, 12930 (2016) Table I: computed saturation $0.5202$ and $0.5411$; signaling $0.3788$ — all match to $\leq 0.001$.

Full evidence: verification/C063/

C064 math partial C064 — Pulse count and experiment duration scale as $O(N^2)$

Verdict: partial
Failure reason: mismatch
Limitations: paper_text_only_reimplementation; arbitrary_assumption_made
Location: Sec. N-gon results; App. A
Claim: "The number of pulses and duration for these experiments both grow as $N^2$."

The per-rotation pulse count formula $(i-1)(N-1)/2 + 3$ for $U_i$ is verified (matches the Fig. caption formula $2i+1$ at $N=5$). However, summing over all $N$ observable pairs gives: $$\sum_{i=1}^{N}\!\left[\frac{(i-1)(N-1)}{2}+3\right] = \frac{N(N-1)^2}{4}+3N \sim O(N^3).$$

A power-law fit to values for $N = 17\ldots121$ yields exponent $\approx 3.03$. The $O(N^2)$ scaling is recoverable only if one counts exclusively the inter-measurement concatenated transitions ($\approx N^2/2$) while ignoring the dominant pre-measurement rotation. The concatenation identity $U_j^\dagger U_i = U_{i-j}$ was checked numerically (Frobenius distance $2.1$–$2.7 \neq 0$), confirming it is a deliberate approximation.
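
The sum and the fitted exponent can be reproduced with a small standard-library script. This is a sketch under one assumption: the fit points are taken to be the paper's $N$ values between 17 and 121. The function names are illustrative.

```python
import math

def pulses_for_rotation(i: int, n: int) -> float:
    """Per-rotation pulse count for U_i, as quoted in this claim."""
    return (i - 1) * (n - 1) / 2 + 3

def total_pulses(n: int) -> float:
    """Direct sum of the per-rotation counts over i = 1..N."""
    return sum(pulses_for_rotation(i, n) for i in range(1, n + 1))

def closed_form(n: int) -> float:
    """Closed form of the same sum: N(N-1)^2/4 + 3N."""
    return n * (n - 1) ** 2 / 4 + 3 * n

def loglog_slope(xs, ys) -> float:
    """Least-squares slope of log(y) against log(x), i.e. the power-law exponent."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

# Assumed fit points: the paper's N values from 17 to 121.
n_values = [17, 23, 31, 41, 51, 61, 81, 101, 121]
exponent = loglog_slope(n_values, [total_pulses(n) for n in n_values])
print(f"fitted exponent = {exponent:.2f}")
```

The exponent sits slightly above 3 rather than exactly at 3 because the lower-order terms of the closed form still contribute at these $N$.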

Full evidence: verification/C064/

C065 empirical verified C065 — Measured $S_5$ surpasses Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$

Verdict: verified
Location: Sec. KCBS results; Fig. 2
Claim: "Close to compatibility we can resolve values of $S_5$ surpassing the Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$."

Normal order: $S_5 = -3.9154$ — below $-3.828$ by $6.2\sigma$. Reverse order: $S_5 = -3.9374$ — below $-3.828$ by $8.1\sigma$. All 12/12 general-scan data points within $2°$ of $\theta_5$ lie strictly below $-3.828$.
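
The quoted significances follow from dividing the distance to the Bell-scenario maximum by the standard error. In this sketch the reverse-order SEM ($0.0135$) is the value quoted under C052, while the normal-order SEM is an assumption: it is back-computed from the saturation uncertainty $0.0149$ reported under C050.

```python
import math

S5_BELL = -1 - 2 * math.sqrt(2)                  # Bell-scenario maximum, about -3.828
SPAN = abs((5 - 4 * math.sqrt(5)) - (-3.0))      # |S5_QM - S5_NC|, about 0.944

# (S5, SEM) per measurement order; the normal-order SEM is back-computed (assumption).
runs = {
    "normal": (-3.9154, 0.0149 * SPAN),
    "reverse": (-3.9374, 0.0135),
}

def sigmas_below(value: float, sem: float, bound: float) -> float:
    """Number of standard errors by which `value` lies below `bound`."""
    return (bound - value) / sem

margins = {name: sigmas_below(s5, sem, S5_BELL) for name, (s5, sem) in runs.items()}
for name, margin in margins.items():
    print(f"{name}: {margin:.1f} sigma below the Bell-scenario maximum")
```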

Full evidence: verification/C065/

C066 empirical partial C066 — Largest number of observables ($N=101$) in any contextuality experiment (Dec 2017)

Verdict: partial
Limitations: paper_text_only_reimplementation; literature_survey_not_exhaustive
Location: Sec. N-gon results
Claim: "These results show contextuality in a system with the largest number of observables (101) of any experiment reported up to this date."

Literature audit from available artifacts:
- Christensen 2015: chained Bell up to $N=90$ (even-cycle) ✓ below 101.
- Arias 2015: extended KCBS up to $N=7$ ✓.
- Leupold 2017 (same group, June 2017): SIC test with 13 observables ✓.
- All other cited experiments: $N \leq 6$.

Prior record: $N=90$ (even-cycle Bell); this paper's $N=101$ (odd-cycle KCBS) exceeds both that record and the $N=7$ extended-KCBS limit. Verdict partial: an exhaustive survey of the pre-Dec 2017 contextuality literature cannot be carried out from the available artifacts alone; the paper itself hedges with "to our best knowledge."

Full evidence: verification/C066/

C067 empirical partial C067 — $\text{CF}_{31} = 0.800(4)$ is largest contextual fraction closing the detection loophole (Dec 2017)

Verdict: partial
Limitations: paper_text_only_reimplementation; partial_data_coverage
Location: Sec. N-gon results
Claim: "The measured contextual fraction is larger than for any other experiment closing the detection loophole [Tan 2017]."

Tan et al. PRL 118, 130403 (2017) ($^9\text{Be}^+$ trapped ions, $\sim 100\%$ detection): best result $I_9 = 0.296(12)$, $\text{CF}_9 = 0.704 \pm 0.012$, which is $7.6\sigma$ below $0.800$. Other loophole-free experiments: Hensen 2015 ($\text{CF} \approx 0.21$), Giustina 2015 ($\approx 0.35$), Shalm 2015 ($\approx 0.01$), all well below $0.800$. Christensen 2015 reached $\text{CF} = 0.874$ but did NOT close the detection loophole. Verdict partial: CF extracted from Tan et al. text rather than raw data; broader assertion not exhaustively verified.
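
The $7.6\sigma$ gap can be reproduced from the quoted numbers. This sketch rests on two assumptions that are consistent with the values above but not confirmed by the evidence folder: $\text{CF}_9 = 1 - I_9$, and the two uncertainties are combined in quadrature.

```python
import math

# Values quoted above; the CF = 1 - I relation is inferred from them (assumption).
I9, I9_ERR = 0.296, 0.012           # Tan et al. 2017
CF31, CF31_ERR = 0.800, 0.004       # this work

cf9 = 1 - I9
# Assumption: independent uncertainties, combined in quadrature.
combined_err = math.hypot(I9_ERR, CF31_ERR)
gap_sigma = (CF31 - cf9) / combined_err

print(f"CF_9 = {cf9:.3f}, gap = {gap_sigma:.1f} sigma")
```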

Full evidence: verification/C067/

C068 math not verified C068 — Shot-noise bias formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ for $S_5^{(\text{ext})}$ ❌

Verdict: not_verified (mismatch)
Location: App. E (sec:dataAnalysis)
Claim: The shot-noise bias in $\varepsilon_i$ at $\theta_5$ with $n = 10{,}000$ shots is given by $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$.

Derivation (from verification/C068/derivation.md):

Two compounding errors identified:

(1) Wrong variance formula: The paper writes $\sigma^2_{A_i} = (1 - \langle A_i\rangle)/n$, but for $\pm 1$ outcomes the correct shot-noise variance is $(1 - \langle A_i\rangle^2)/n$.

(2) Missing factor of 2: The combined variance for $\varepsilon_i = |\langle A_i^{(1)}\rangle - \langle A_i^{(2)}\rangle|$ should be $\sigma^2_{A^{(1)}} + \sigma^2_{A^{(2)}} = 2\sigma^2_A$, not $\sigma^2_A$.

At $\theta_5$ ($\cos 2\theta_5 = 2/\sqrt{5} - 1 \approx -0.1056$):

Quantity	Value ($n=10{,}000$)
$\text{E}[\hat\varepsilon_i]$ (correct $\pm1$ variance)	$0.01122$
Paper formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$	$0.00839$
Ratio (correct / paper)	$1.337$

The paper's formula can only be recovered by simultaneously using the wrong variance and treating $\sigma^2_{\varepsilon_i} = \sigma^2_A$ (single measurement) instead of $2\sigma^2_A$. The conceptual framework ($\varepsilon_i$ follows a folded normal; $\varepsilon_i(\theta_5) = 0$) is correct.
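
The two table entries can be recomputed directly. This is a sketch rather than the content of verification/C068/derivation.md, and it assumes $\langle A_i\rangle = \cos 2\theta_5$ at the compatibility angle and that the mean of a zero-centred folded normal is $\sigma\sqrt{2/\pi}$. Variable names are illustrative.

```python
import math

n_shots = 10_000
theta5 = math.acos(5 ** -0.25)      # compatibility angle, about 48.03 degrees
mean_a = math.cos(2 * theta5)       # assumed <A_i> at theta5, about -0.1056

# Formula as printed in the paper (App. E).
paper_bias = math.sqrt(2 * (1 - math.cos(2 * theta5)) / (math.pi * n_shots))

# Corrected version: +/-1 outcomes give variance (1 - <A>^2)/n per estimate,
# and epsilon_i is a difference of two independent estimates.
var_single = (1 - mean_a ** 2) / n_shots
sigma_eps = math.sqrt(2 * var_single)
corrected_bias = sigma_eps * math.sqrt(2 / math.pi)   # folded-normal mean

ratio = corrected_bias / paper_bias
print(f"paper: {paper_bias:.5f}, corrected: {corrected_bias:.5f}, ratio: {ratio:.3f}")
```

Under these assumptions the ratio simplifies to $\sqrt{2(1+\cos 2\theta_5)}$, so it does not depend on the number of shots.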

Full evidence: verification/C068/

5. Point-by-Point Review of the Paper Body

Abstract

Point	Assessment	Supporting claims
Quantum contextuality demonstrated in single trapped-ion qutrit using KCBS $N$-gon states	agreed	C001, C006, C036–C047 — full experimental programme reproduced exactly from public data
All data points violate extended KCBS inequality ($S_N^{(\text{ext})} < S_N^{\text{NC}}$ up to $N=61$)	agreed	C035, C048 — confirmed at $4.7$–$29.6\sigma$ for $N \leq 61$
Largest contextual fraction $\text{CF}_{31} = 0.800(4)$	agreed	C049 — reproduced to $0.1\sigma$ from raw data
KCBS result is $\approx 99.5(2)\%$ of QM prediction	partially agreed	C052 — computed $\approx 99.3\pm1.4\%$; the central value is within actual $1\sigma$, but the stated uncertainty $\pm0.2\%$ is understated by $\approx 7\times$; see §6
Contextuality demonstrated for largest number of observables ($N=101$)	agreed	C048, C066 — unambiguously exceeds all cited prior experiments
Largest detection-loophole-free contextual fraction	agreed	C067 — Tan 2017 gives $\text{CF}_9 = 0.704 \pm 0.012$, $7.6\sigma$ below this work

Introduction

Point	Assessment	Supporting claims
KCBS provides a state-dependent NC inequality for qutrits; prior experimental tests limited to $N \leq 7$	agreed	C056 — Arias 2015 confirms $N=7$ frontier; C001, C006 verify the inequality structure
Chained Bell experiments (even $N$) have reached $N=90$ with $\text{CF}_{36} = 0.874(1)$	agreed	C055, C058 — directly confirmed from Christensen 2015
This work demonstrates $\text{CF}_{31} = 0.800(4)$, the largest detection-loophole-free CF	agreed	C049, C067 — verified; see C067 for detection-loophole condition

KCBS Experiment — Setup

Point	Assessment	Supporting claims
NC bound $S_5 \geq -3$ (Eq. 1)	agreed	C001 — verified by exhaustive enumeration and combinatorial proof
QM minimum $S_5^{\text{QM}} = 5-4\sqrt{5} \approx -3.944$ (Eq. 2)	agreed	C002 — verified analytically
Compatibility angle $\theta_5 = \arccos(5^{-1/4}) \approx 48°$	agreed	C003 — $48.030°$, rounds to $48°$
Extended KCBS inequality $S_5^{(\text{ext})} \geq -3$ penalises signaling (Eq. 3)	agreed	C004 — confirmed; reduces to standard KCBS at $\theta_5$
Qutrit encoded in $^{40}\text{Ca}^+$ Zeeman levels; 729 nm drive	agreed	C013, C014, C015 — all apparatus parameters verified

KCBS Results

Point	Assessment	Supporting claims
Table 1 normal-order correlators	agreed	C029, C030 — reproduced to $\leq 0.04\sigma$ from raw data
Table 1 reverse-order correlators	agreed	C031, C032 — reproduced to $\leq 0.18\sigma$
Violation of NC bound by 65/67 $\sigma$ at $\theta_5$	agreed	C034 — arithmetic confirmed
All $\theta$-scan data violate extended KCBS by up to 25 $\sigma$	agreed	C035 — all 12/12 points below $-3$; computed max $\approx 27\sigma$
Systematic shift of 1.6 $\sigma$ from QM prediction	partially agreed	C033 — qualitatively confirmed ($\sim1.5$–$1.8\sigma$); the exact $1.6\sigma$ figure cannot be precisely reproduced but is within the range of plausible combination methods
Measured $S_5$ surpasses Bell-scenario quantum maximum $\approx -3.828$	agreed	C012, C065 — both Table 1 values are $> 6\sigma$ below $-3.828$

N-gon States (Theory)

Point	Assessment	Supporting claims
$N$-gon compatibility angle formula $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$	agreed	C005 — verified symbolically and numerically for all $N$ tested
Classical bound $S_N \geq -N+2$ for odd $N$	agreed	C006 — proven combinatorially
QM minimum $S_N^{\text{QM}} = (N-3N\cos(\pi/N))/(1+\cos(\pi/N))$	agreed	C007 — verified at $N=5$ symbolically; numerical check for all $N \leq 121$
Contextual fraction definition $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$	agreed	C008 — verified against Table 2 for all $N$
$\text{CF}_N \to 1$ as $N \to \infty$	agreed	C007, C008 — derivable from formulas: as $N\to\infty$, $\cos(\pi/N)\to 1$, $S_N^{\text{QM}}\to -N = S_N^{\text{NS}}$
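
The formulas in this table are easy to evaluate numerically. The sketch below uses illustrative function names and takes $S_N^{\text{NS}} = -N$, as stated in the last row; it evaluates the ideal QM values only, not the measured data.

```python
import math

def theta_n(n: int) -> float:
    """Compatibility angle theta_N in radians."""
    c = math.cos(math.pi / n)
    return math.acos(math.sqrt(c / (1 + c)))

def s_qm(n: int) -> float:
    """Quantum minimum S_N^QM."""
    c = math.cos(math.pi / n)
    return (n - 3 * n * c) / (1 + c)

def s_nc(n: int) -> float:
    """Non-contextual bound for odd N."""
    return -n + 2

def s_ns(n: int) -> float:
    """No-signaling bound, taken as -N (see the last table row)."""
    return -n

def contextual_fraction(s: float, n: int) -> float:
    """CF_N = (S_N - S_N^NC) / (S_N^NS - S_N^NC)."""
    return (s - s_nc(n)) / (s_ns(n) - s_nc(n))

for n in (5, 7, 31, 121):
    cf = contextual_fraction(s_qm(n), n)
    print(f"N={n:3d}  theta={math.degrees(theta_n(n)):6.2f} deg  "
          f"S_QM={s_qm(n):9.4f}  ideal CF={cf:.4f}")
```

The ideal contextual fraction rises monotonically towards 1 with $N$, so a measured peak at finite $N$ (here $N=31$) reflects experimental imperfections rather than the theory.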

N-gon Results

Point	Assessment	Supporting claims
Table 2: all 12 $N$ values match QM predictions	agreed	C036–C047 — reproduced from public data; note $S_7^{\text{NC}} = -6$ in paper appears to be a typo (formula gives $-5$)
Pulse count and duration grow as $N^2$	partially agreed	C064 — per-rotation formula verified; but summing all pulses gives $O(N^3)$, not $O(N^2)$; $O(N^2)$ is recoverable only under a specific (concatenation-only) counting convention not fully supported by the stated derivation
Largest $\text{CF}_{31} = 0.800(4)$	agreed	C049 — confirmed to $0.1\sigma$
Contextuality for $N$ up to 101 (bare), 61 (extended)	agreed	C048 — exact cutoffs confirmed from data
Largest $N$ of any contextuality experiment at time of publication	agreed	C066 — prior record $N=90$ (Bell); this work's $N=101$ exceeds it; partial verdict due to non-exhaustive survey
Largest detection-loophole-free CF	agreed	C067 — Tan 2017 ($\text{CF}_9 = 0.704$) confirmed below $0.800$

Conclusion

Point	Assessment	Supporting claims
Summary of main results; future directions	agreed — no novel factual claims	(none)

App. A — Qutrit Transitions

Point	Assessment	Supporting claims
Qutrit encoding, magnetic field, wavelength	agreed	C013, C014, C015 — all verified or well-corroborated
AC Stark shifts $< 100\ \text{Hz}$	partially agreed	C016 — physically self-consistent but Rabi frequency not available; plausible given stated operating regime

App. B — Qutrit Coherence Times

Point	Assessment	Supporting claims
$\sigma_t \approx 1.6\ \text{ms}$ for transitions to $D_{5/2}$	partially agreed	C017 — internally consistent (1.629 ms from FWHM); Leupold 2017 reports 2.5 ms (different run conditions)
$\sigma_t \approx 7\ \text{ms}$ for $|1\rangle\leftrightarrow|2\rangle$	partially agreed	C018 — internally consistent with stated noise widths; no raw Ramsey data available
Common-mode noise $\approx 230\ \text{Hz}$ FWHM	agreed	C019 — computed 234 Hz, within $\approx$ qualifier
Differential noise $< 50\ \text{Hz}$, $B < 5\ \mu\text{G}$	partially agreed	C020 — Zeeman sensitivity gives $< 8.4\ \text{Hz}$ per $5\ \mu\text{G}$, consistent with paper; no raw Ramsey data
Drift $\sim 100\ \text{Hz}$, recalibration every 30 s	partially agreed	C021 — confirmed from paper text; not independently verifiable
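
The consistency between the coherence time and the common-mode noise width can be illustrated numerically. The convention used here is an assumption inferred from the numbers, not taken from the paper: Gaussian frequency noise with standard deviation $\sigma_f = \text{FWHM}/(2\sqrt{2\ln 2})$ and a coherence time $\sigma_t = 1/(2\pi\sigma_f)$. It is adopted because it reproduces both the 1.629 ms and the 234 Hz figures in the table.

```python
import math

# Gaussian FWHM-to-sigma conversion factor, 2*sqrt(2 ln 2), about 2.355.
FWHM_FACTOR = 2 * math.sqrt(2 * math.log(2))

def sigma_t_from_fwhm(fwhm_hz: float) -> float:
    """Coherence time (s) for Gaussian frequency noise of the given FWHM (Hz)."""
    return FWHM_FACTOR / (2 * math.pi * fwhm_hz)

def fwhm_from_sigma_t(sigma_t_s: float) -> float:
    """Inverse relation: noise FWHM (Hz) implied by a coherence time (s)."""
    return FWHM_FACTOR / (2 * math.pi * sigma_t_s)

sigma_t_ms = sigma_t_from_fwhm(230.0) * 1e3   # from the stated 230 Hz FWHM
fwhm_hz = fwhm_from_sigma_t(1.6e-3)           # from the stated 1.6 ms

print(f"230 Hz FWHM -> sigma_t = {sigma_t_ms:.3f} ms")
print(f"1.6 ms      -> FWHM    = {fwhm_hz:.0f} Hz")
```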

App. C — Ion Cooling

Point	Assessment	Supporting claims
Doppler cooling: $n_{\text{th}} \approx 5$	partially agreed	C022 — Doppler limit calculation gives $\approx 3.9$ for axial mode; radial modes lower; exact trap frequencies not stated
EIT cooling: $n_{\text{th}} \approx 0.2$	agreed	C023 — verbatim in Alonso 2016 (same apparatus)
Heating rate $\sim 200\ \text{quanta/s}$ during dark detection	not agreed	C024 — claim unverifiable from available data; plausible but requires dedicated sideband spectroscopy data

App. D — Qutrit Detection

Point	Assessment	Supporting claims
Fluorescence at 397 nm + repump at 866 nm	agreed	C025 — NIST levels confirmed to $< 0.5\ \text{nm}$
$\approx 25$ photons bright, $\approx 1$ bg, $\approx 200\ \mu\text{s}$ window	partially agreed	C026 — Leupold 2017 gives consistent values after scaling; no raw photon data
Detection errors $2\times10^{-5}$ / $1\times10^{-4}$	partially agreed	C027 — Poisson model gives $2.3\times10^{-5}$ / $1.2\times10^{-4}$; within $\approx$ qualifier
$D_{5/2}$ lifetime $\approx 1.2\ \text{s}$	partially agreed	C028 — spectroscopic literature gives $1.155$–$1.168\ \text{s}$; rounds to $1.2\ \text{s}$

App. E — Data Collection and Analysis

Point	Assessment	Supporting claims
Data analysis reproduces all Table 1 and Table 2 values	agreed	C029–C032, C036–C047 — reproduced from raw data
Statistical significance calculations (65 $\sigma$, 67 $\sigma$, up to 25 $\sigma$)	agreed	C034, C035 — confirmed
Shot-noise bias formula for $\varepsilon_i$ (folded normal)	disagreed	C068 — the conceptual framework is correct but the closed-form formula contains two errors (wrong variance; missing factor of 2); correct value $\approx 1.34\times$ larger than claimed

App. G — Theoretical Predictions for KCBS Witnesses

Point	Assessment	Supporting claims
Single-correlator formula	agreed	C009 — verified symbolically
Minimum at $\theta = \pi/2$, value $\approx -4.045$	partially agreed	C010 — value $-4.045$ is correct; angle $\theta = \pi/2$ is a typo (should be $\theta = \pi/4$); $S_5(\pi/2) = 5$ is the maximum
$\varepsilon_i$ formula	agreed	C011 — verified symbolically

App. H — Relevance of KCBS

Point	Assessment	Supporting claims
Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$	agreed	C012 — verified; equals $-1-2\sqrt{2}$

App. I — N-Cycle Details

Point	Assessment	Supporting claims
Table 2 complete $N$-gon data	agreed	C036–C047 — see §4; note $N=7$ NC bound entry appears to contain a typo

App. J — Comparison with Previous KCBS Tests

Point	Assessment	Supporting claims
Vienna 2011 row	agreed	C059 — confirmed from Lapkiewicz 2011 data
Stockholm 2013 row	partially agreed	C060 — reverse saturation $0.953$ confirmed; normal saturation $0.561$ vs claimed $0.53$ (minor methodological inconsistency in normalization)
Beijing 2013 (Deng) row	partially agreed	C061 — 3/4 numbers confirmed; uniform-case signaling $0.267$ not exactly reproducible
Beijing 2013 (Um) row	agreed	C062 — re-analyzed values confirmed
Brisbane 2016 (Jerger) row	agreed	C063 — confirmed from Jerger et al. Table I
This work normal/reverse	agreed	C050, C051 — reproduced to $< 0.20\sigma$

App. K — Comparison with Bell Tests

Point	Assessment	Supporting claims
This work: $99.5(2)\%$ of QM limit	partially agreed	C052 — computed $99.3\pm1.4\%$; $0.15\sigma$ from $99.5\%$ but uncertainty understated $\approx7\times$; see §6
Poh 2015: $99.97(2)\%$ of Tsirelson bound	agreed	C057 — confirmed
Christensen 2015: $\sim 99\%$ CHSH, $N=90$, $\text{CF}_{36} = 0.874$	agreed	C058 — confirmed

Point	Assessment	Supporting claims
No novel factual claims	agreed — no novel factual claims	(none)

6. Major Issues

C052 — Stated QM-limit saturation 99.5(2)% is numerically inconsistent

Claim: "This work's KCBS result corresponds to $99.5(2)\%$ of the QM limit."
Location: App. K (sec:Bellcomparison) and Abstract.

Issue: The standard saturation formula $(S_5 - S_5^{\text{NC}})/(S_5^{\text{QM}} - S_5^{\text{NC}})$ applied to the best (reverse-order) data gives $99.27 \pm 1.43\%$ — not $99.5\pm0.2\%$. The discrepancy has two components:

- Central value: $99.27\%$ vs $99.5\%$ — a $0.23$ percentage-point difference, which is within the actual $1\sigma = 1.43\%$ so not wrong per se, but the quoted value is not what the data directly yield.
- Stated uncertainty: $\pm 0.2\%$ is inconsistent with the actual statistical uncertainty $\pm 1.43\%$ (a factor of $\approx 7$ understatement). Table 3 of the same paper independently reports $0.992(14) = 99.2(1.4)\%$ for the reverse order, confirming the correct uncertainty.

Impact: The $99.5(2)\%$ figure appears in the abstract and is used as a comparison benchmark against Poh 2015 ($99.97(2)\%$) and Christensen 2015 ($\sim 99\%$). The understated uncertainty exaggerates the precision of the KCBS result. Quoting the true $\pm 1.4\%$ would not change any comparative conclusion (the central value stays above $99\%$), but the figure as written is misleading.

Recommendation: The abstract and App. K should read $99.3(1.4)\%$ or, accepting the rounded Table 3 value, $99.2(1.4)\%$.

7. Minor Issues

C010 — Typo: minimum angle $\theta = \pi/2$ should be $\theta = \pi/4$

Claim: The minimum of $S_5(\theta)$ is at $\theta = \pi/2$.
Location: App. G.
Finding: $S_5(\pi/2) = 5$ (maximum). Minimum is at $\theta = \pi/4$ (where $\cos(4\theta) = -1$). The numerical value $\frac{5}{4}(-\sqrt{5}-1) \approx -4.045$ is correct. Pure typographical error in a supplementary appendix; does not affect any experimental result.

C024 — Heating rate $\sim 200\ \text{quanta/s}$ unverifiable

Claim: Dark-detection motional heating rate $\sim 200\ \text{quanta/s}$.
Location: App. C.
Finding: No sideband spectroscopy data in the public repository; claim cannot be confirmed or refuted. Plausible for a surface-electrode trap; apparatus characterisation only; no impact on contextuality conclusions.

C068 — Shot-noise bias formula contains two errors

Claim: Expected gap in $S_5^{(\text{ext})}$ at $\theta_5$ is $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$.
Location: App. E.
Finding: Two compounding errors — (1) variance formula omits the square on $\langle A_i\rangle$; (2) combined variance for $\varepsilon_i$ should be $2\sigma^2_A$, not $\sigma^2_A$. The correct value is $\approx 1.34\times$ larger ($0.01122$ vs $0.00839$ at $n=10{,}000$). This affects the dashed red correction curves in Figs. 1 and 2, but not any of the primary contextuality conclusions.

C016 — AC Stark shift bound $< 100\ \text{Hz}$ (plausible, unconfirmed)

Rabi frequency not stated; physically self-consistent reconstruction gives $\approx 100\ \text{Hz}$ at $\Omega/2\pi \approx 50\ \text{kHz}$. Apparatus detail; no impact on main results.

C017, C018 — Coherence times (apparatus characterisation)

$\sigma_t \approx 1.6\ \text{ms}$ (C017) and $7\ \text{ms}$ (C018): internally consistent with stated noise widths; companion paper Leupold 2017 gives $\approx 1.6\times$ longer values in a different run of the same apparatus. No raw Ramsey data available.

C020, C021 — Differential noise and recalibration (apparatus characterisation)

C020: $B$-field contribution $< 8.4\ \text{Hz}$ confirmed; remaining $\sim 42\ \text{Hz}$ attributed to slow drifts. C021: recalibration parameters confirmed by text; not independently recomputable. No impact on experimental conclusions.

C022 — Doppler cooling $n_{\text{th}} \approx 5$ (approximate)

Doppler limit calculation gives $\approx 3.9$ (axial) to $1.1$ (radial); claim appears to refer to the axial mode or uses slightly lower trap frequencies than the cited companion paper. Apparatus detail.

C026, C027, C028 — Photon counts and detection errors (apparatus characterisation)

C026 and C027: computed values within $6$–$25\%$ of stated values (consistent with $\approx$ qualifiers); C028: $D_{5/2}$ lifetime from spectroscopy literature ($1.155$–$1.168\ \text{s}$) rounds to stated $\approx 1.2\ \text{s}$. No impact on contextuality conclusions.

C033 — Systematic shift "1.6 $\sigma$" from QM prediction

Computed $1.49$–$1.78\sigma$ depending on combination method; no single formula reproduces exactly $1.6\sigma$. Qualitative statement ("approximately $1.6\sigma$") is accurate.

C044 — $N=61$ row (partial due to reimplementation only)

All five values match paper within $0.1\sigma$; verdict partial only because no paper-provided analysis code exists. Not a substantive issue.

C060 — Stockholm 2013 normal-order saturation ($0.53$ vs computed $0.561$)

Minor methodological inconsistency: the citing paper appears to use the raw numerator $|\kappa - S^{\text{NC}}|$ rather than the full normalized fraction for the normal-order entry. Both values are within the $\pm 0.11$ error bar.

C061 — Beijing 2013 (Deng) uniform-case signaling ($0.267$, ambiguous)

Three of four numbers confirmed; uniform-case signaling $0.267$ is between two equally plausible interpretations ($0.251$ and $0.280$). Core qualitative finding (large signaling) unaffected.

C064 — $O(N^2)$ pulse-count scaling not supported by stated derivation

Per-rotation formula correct; summing all $N$ rotations gives $O(N^3)$, not $O(N^2)$. $O(N^2)$ holds only for a concatenation-only counting convention not fully described in the paper. Practical conclusion (total experiment time grows rapidly with $N$) is correct.

C066 — Record $N=101$ observables (literature survey limitation)

Strongly supported by all available citations; verdict partial only due to non-exhaustive pre-Dec 2017 survey. Not a substantive issue.

C067 — Largest detection-loophole-free CF (literature survey limitation)

Tan 2017 ($\text{CF}_9 = 0.704$) confirmed $7.6\sigma$ below $0.800$. Verdict partial only because Tan et al. raw data not available; extracted values from paper text.

End of report. Verdict counts: verified 48 / partial 17 / not_verified 3 (total 68). Major-issue IDs: C052.

Generated by paper-verifier, model claude-sonnet-4-6. Math rendered by MathJax. Open this file locally to see embedded figures.