- verified: 48 (71%)
- partial: 17 (25%)
- not_verified: 3 (4%)
Verification Report — arXiv:1712.06494
"Probing the limits of correlations in an indivisible quantum system"
1. Brief
This report summarises the automated factual-claim verification of arXiv:1712.06494 (Malinowski et al., ETH Zürich, December 2017). The paper demonstrates quantum contextuality in a single trapped $^{40}\text{Ca}^+$ ion using Klyachko–Can–Binicioglu–Shumovsky (KCBS) $N$-cycle inequalities with $N$ up to 121.
Verdict counts (68 claims total)
| Verdict | Count |
|---|---|
| verified | 48 |
| partial | 17 |
| not_verified | 3 |
Model used: claude-sonnet-4-6 for all phases.
Major-issue claim IDs: C052
Summary of issues:
- C052 (partial/major): the stated 99.5(2)% QM-limit saturation in the abstract is numerically inconsistent; the data yield $\approx 99.3\%$ and the true statistical uncertainty is $\pm 1.4\%$, not the stated $\pm 0.2\%$.
- C010 (not_verified): a typographical error in App. G; the minimum of $S_5(\theta)$ occurs at $\theta = \pi/4$, not $\theta = \pi/2$ as written.
- C024 (not_verified): motional heating rate $\sim 200\ \text{quanta/s}$ cannot be reproduced from the public dataset (requires dedicated sideband spectroscopy data).
- C068 (not_verified): the shot-noise bias formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ in App. E contains two compounding errors (wrong variance form; missing factor of 2), underestimating the bias by a factor of $\approx 1.34$.
The overwhelming majority of experimental results (Table 1, Table 2, Fig. 2, Fig. 3) reproduced exactly from the public dataset.
2. Artifact Inventory
Paper-provided artifacts
| Path | Kind | Origin URL | Provenance |
|---|---|---|---|
| source/eprint.tar.gz | source (LaTeX + figures) | https://arxiv.org/e-print/1712.06494 | Downloaded from arXiv; SHA-256 ef01ef1f…; 890 382 bytes |
| artifacts/data/Dataset_Public_repository.zip | data | https://ethz.ch/content/dam/ethz/special-interest/phys/quantum-electronics/tiqi-dam/documents/Datasets/Dataset_Public%20repository.zip | Downloaded from TIQI public repository; SHA-256 0febe99a…; 13 985 766 bytes; contains raw shot files + thresholded CSV files for $N = 5, 7, 11, 17, 23, 31, 41, 51, 61, 81, 101, 121$ |
Agent-produced artifacts
| Path | Kind | Provenance |
|---|---|---|
| claims/claims.jsonl | claim records (68 claims) | Extracted by claude-sonnet-4-6 (extract phase) from source/eprint.tar.gz |
| claims/rejected.jsonl | filtered candidates | Produced during extract_audit phase |
| claims/coverage_audit.md | section coverage audit | Produced during extract_audit phase |
| bibliography.json | parsed references | Produced during ingest phase |
| inventory.json | artifact metadata | Produced during ingest phase |
| verification/C001/ … verification/C068/ | per-claim evidence directories | Produced by verify_* phases; each contains verdict.json plus derivation.md, run.py, run.log, citation_quote.md, and/or figure.png / paper_figure.png as appropriate |
| artifacts/citations/17Leupold.pdf | companion paper PDF | Fetched during verification |
| artifacts/citations/16Alonso.txt | companion paper text | Fetched during verification |
| artifacts/citations/15Christensen.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/15Poh.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/11Lapkiewicz.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/13Ahrens.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/13Deng.txt | cited paper text | Fetched during verification |
3. Model Attribution
All phases were executed by claude-sonnet-4-6.
| Phase | Prompt file | SHA-256 |
|---|---|---|
| ingest | prompts/ingest.md | ef27db13971e3e2eb805be653670943b52c8e221afb833fbc674b4bbbf7da1d3 |
| extract | prompts/extract.md | 34b48e02f5151b6dccf90b1035b7d6b9ee8de3e7bd1ed2c9268efb62c6b13c72 |
| extract_audit | prompts/extract_audit.md | ac2f07402104dc51d0debe90bba2a28baef691f4602938ceb2cff99596a0ea77 |
| verify_math (run 1) | prompts/verify_math.md | 7aa3c22779feee85b0239a615986868b730eb4e86852b5c5d86af54003cfc806 |
| verify_math (run 2) | prompts/verify_math.md | e0113697d625d30c810d639b5a1e8d6e5e07395080519f300a08ad4e03de8a9a |
| verify_numeric (run 1) | prompts/verify_numeric.md | e7f0824163e3171d41dfd4c65d2e75656831da502208dfe2193c64fd39d166c9 |
| verify_numeric (run 2) | prompts/verify_numeric.md | ce6fa259af53bb44811241b7f089b089d4e740f48b76a7cfef6a7a6dc0a7e95d |
| verify_plot (run 1) | prompts/verify_plot.md | 8ca8f1de4dd89664cc25875e69573117d65a1aa185aac274920e8f1efec5d41b |
| verify_plot (run 2) | prompts/verify_plot.md | 92cf06a6a3eb82b9e6dd58b2e1f46662139aca8cf886eca2f87a9c38ba7b7f61 |
| verify_citation | prompts/verify_citation.md | 35178cfb8087eabce273f0bdc0b9b4c125e1d0ca1aa2af8ab8d040a3868c3729 |
| verify_empirical (run 1) | prompts/verify_empirical.md | a86ff78d3df26b7573f54fd23157d9a88f124983468be02b9912b3daa5549f47 |
| verify_empirical (run 2) | prompts/verify_empirical.md | fe120622d5102542d8e399607d6f2d9fd2b2e75eaeddf4730a19eb033228badb |
| report (run 1) | prompts/report.md | ff51a06214b6ba57134db778ea2d70dacfef1bfaa0c609820a6b1df653671a78 |
| report (run 2) | prompts/report.md | 6c0b03711c2c6e1d6e07faeae3e0901a5fbc818a36b1603b1300281b56237f4f |
| report (run 3) | prompts/report.md | 9d5c7c0aa3e90c146bd2d96b9ca9fdb1f1f316f7beb4dc9bc104aa9bbd4ab521 |
Platform note: macOS 24.6.0 — no unshare -n network isolation (Linux-only).
Credentials scrubbed from subprocess environment; CWD locked to claim verification
directory; CPU/memory/file-size limits via ulimit.
4. Per-Claim Breakdown
C001 math verified C001 — KCBS classical bound $S_5 \geq -3$
Verdict: verified
Location: Sec. KCBS experiment, Eq. 1
Claim: In NC models $S_5(\theta_5) = \sum_{i=1}^{5} \langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle \geq -3$.
Derivation (from verification/C001/derivation.md):
Each $A_i \in \{+1,-1\}$ in any NC model, so $\prod_{i=1}^{5} (v_i v_{i+1}) = \prod v_i^2 = 1$, forcing an even number of anti-correlated pairs $k \in \{0,2,4\}$. Then $S_5 = 5-2k \geq 5-8 = -3$. The bound $-3$ is tight (e.g. $v = (1,1,-1,1,-1)$). Exhaustive enumeration of all $2^5 = 32$ assignments confirms $\min S_5 = -3$.
Full evidence: verification/C001/
C002 math verified C002 — QM prediction $S_5(\theta_5) = 5 - 4\sqrt{5} \approx -3.944$
Verdict: verified
Location: Sec. KCBS experiment, Eq. 2
Claim: The quantum minimum of $S_5$ is $5 - 4\sqrt{5} \approx -3.944$.
Numerical check: $5 - 4\sqrt{5} = -3.94427\ldots \approx -3.944$ ✓.
Algebraic check: substituting $\cos(4\theta_5)$ derived from $\theta_5 = \arccos(5^{-1/4})$
into the correlator formula gives each $\langle A_i A_{i+1}\rangle = (5-4\sqrt{5})/5$;
$S_5 = 5 \times (5-4\sqrt{5})/5 = 5-4\sqrt{5}$ exactly (SymPy verified).
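The algebraic check can also be reproduced numerically. The following is a minimal sketch, not the contents of verification/C002/run.py; it assumes the single-correlator formula quoted under C009 and uses only the standard library (the report's own check used SymPy).

```python
import math

# Compatibility angle theta_5 = arccos(5^(-1/4)), as stated in C003.
theta5 = math.acos(5 ** -0.25)

# Single-correlator formula quoted from App. G (claim C009).
corr = (3 - math.sqrt(5) + (5 + math.sqrt(5)) * math.cos(4 * theta5)) / 8

# All five correlators are identical, so S_5 is five times one of them.
s5 = 5 * corr
s5_closed_form = 5 - 4 * math.sqrt(5)

print(f"theta_5    = {math.degrees(theta5):.3f} deg")
print(f"correlator = {corr:.6f}")
print(f"S_5        = {s5:.6f} (closed form {s5_closed_form:.6f})")
```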
Full evidence: verification/C002/
C003 math verified C003 — Compatibility angle $\theta_5 = \arccos(5^{-1/4}) \approx 48°$
Verdict: verified
Location: Sec. KCBS experiment
Claim: $\theta_5 = \arccos(5^{-1/4}) \approx 48°$.
$\arccos(5^{-1/4}) = 48.030°$ (SymPy); rounds to $48°$ ✓. The general formula $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$ reduces to $\arccos(5^{-1/4})$ at $N=5$ (verified symbolically).
Full evidence: verification/C003/
C004 math verified C004 — Extended KCBS inequality $S_5^{(\text{ext})} \geq -3$
Verdict: verified
Location: Sec. KCBS experiment, Eq. 3
Claim: $S_5^{(\text{ext})}(\theta) = \sum_{i=1}^{5}\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle + \sum_{i=1}^{5} \varepsilon_i \geq -3$, where $\varepsilon_i = |\langle A_i^{(1)}\rangle - \langle A_i^{(2)}\rangle|$.
Classical bound confirmed by exhaustive enumeration. Reduction at $\theta_5$: SymPy verifies the prefactor of $\varepsilon_i$ vanishes exactly at $\theta_5$, so $S_5^{(\text{ext})}(\theta_5) = S_5(\theta_5) = 5-4\sqrt{5}$.
Full evidence: verification/C004/
C005 math verified C005 — $N$-gon compatibility angle $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$
Verdict: verified
Location: Sec. N-gon states
Claim: Adjacent $N$-gon states are orthogonal iff $\theta = \theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$.
Starting from the state vectors $|\psi_i\rangle = (\cos\theta,\,\sin\theta\cos\varphi_i,\,-\sin\theta\sin\varphi_i)^T$ with $\Delta\varphi = \pi(N-1)/N$, $\langle\psi_i|\psi_{i+1}\rangle = \cos^2\theta - \sin^2\theta\cos(\pi/N)$; setting to zero gives the formula. Numerical spot-checks at $N = 5, 7, 11, 31, 51, 101$ all give $|\langle\psi_i|\psi_{i+1}\rangle| < 2.3\times10^{-16}$ at $\theta_N$.
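A numerical spot-check of the orthogonality condition might look like the sketch below. It assumes $\varphi_i = i\,\Delta\varphi$ (only the difference between adjacent azimuths enters the overlap); the helper names are illustrative and not taken from the verification scripts.

```python
import math

def theta_n(n):
    """Compatibility angle arccos(sqrt(cos(pi/N) / (1 + cos(pi/N))))."""
    c = math.cos(math.pi / n)
    return math.acos(math.sqrt(c / (1 + c)))

def state(theta, phi):
    """Real qutrit state (cos t, sin t cos p, -sin t sin p)."""
    return (math.cos(theta),
            math.sin(theta) * math.cos(phi),
            -math.sin(theta) * math.sin(phi))

def adjacent_overlap(n):
    """Inner product <psi_i|psi_{i+1}> evaluated at theta_N."""
    theta = theta_n(n)
    dphi = math.pi * (n - 1) / n      # azimuthal step between neighbours
    a, b = state(theta, 0.0), state(theta, dphi)
    return sum(x * y for x, y in zip(a, b))

for n in (5, 7, 11, 31, 51, 101):
    print(n, abs(adjacent_overlap(n)))
```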
Full evidence: verification/C005/
C006 math verified C006 — Classical NC bound $S_N \geq -N+2$
Verdict: verified
Location: Sec. N-gon states, Eq. 5
Claim: $S_N = \sum_{i=1}^{N}\langle A_i A_{i+1}\rangle \geq -N+2$ for odd $N$.
The parity constraint forces $k$ (anti-correlated pairs) to be even; for odd $N$ the maximum even $k \leq N$ is $N-1$, giving $S_N = N-2(N-1) = -N+2$. Exhaustive enumeration for $N \in \{5,7,9,11,13\}$ confirms $\min S_N = -N+2$.
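The exhaustive enumeration is small enough to sketch directly. This is an illustrative reimplementation under the assumption that each observable takes a fixed value $v_i = \pm 1$ in a noncontextual model; it also covers the $N = 5$ case of C001.

```python
from itertools import product

def min_cycle_sum(n):
    """Minimum of sum_i v_i * v_{i+1} (indices mod n) over all +/-1 assignments."""
    best = n
    for v in product((1, -1), repeat=n):
        s = sum(v[i] * v[(i + 1) % n] for i in range(n))
        best = min(best, s)
    return best

for n in (5, 7, 9, 11, 13):
    print(n, min_cycle_sum(n), -n + 2)
```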
Full evidence: verification/C006/
C007 math verified C007 — QM minimum $S_N \geq (N - 3N\cos(\pi/N))/(1+\cos(\pi/N))$
Verdict: verified
Location: Sec. N-gon states, Eq. 6
Claim: QM minimum of the $N$-cycle witness is $S_N^{\text{QM}} = (N - 3N\cos(\pi/N))/(1+\cos(\pi/N))$.
At $N=5$: formula gives $5-4\sqrt{5}$ (SymPy symbolic difference = 0) ✓. Numerical evaluation for all $N \in \{5,7,11,17,23,31,41,51,61,81,101,121\}$ confirmed below $-N+2$ in every case.
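The comparison against the classical bound is simple to reproduce. The sketch below evaluates the quoted formula in floating point rather than symbolically, so it is a sanity check, not a replacement for the SymPy result.

```python
import math

def s_qm(n):
    """QM minimum (N - 3N cos(pi/N)) / (1 + cos(pi/N)) of the N-cycle witness."""
    c = math.cos(math.pi / n)
    return (n - 3 * n * c) / (1 + c)

N_VALUES = (5, 7, 11, 17, 23, 31, 41, 51, 61, 81, 101, 121)

for n in N_VALUES:
    nc_bound = -n + 2
    print(f"N={n:3d}  S_QM={s_qm(n):10.4f}  NC bound={nc_bound:5d}  "
          f"violates={s_qm(n) < nc_bound}")
```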
Full evidence: verification/C007/
C008 math verified C008 — Contextual fraction $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$
Verdict: verified
Location: Sec. N-gon states, Eq. 7
Claim: $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$ with $S_N^{\text{NS}} = -N$, $S_N^{\text{NC}} = -N+2$.
Formula cross-checked against all 12 rows of Table 2; agrees within $0.1\sigma$ for every $N$ from 5 to 121. Limiting property $\text{CF}_N \to 1$ as $N\to\infty$ confirmed symbolically.
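As an illustration of the cross-check, the sketch below applies the definition to three $S_N$ values from Table 2 and then evaluates the ideal QM witness at increasing $N$ to show the approach to 1. The function names are illustrative.

```python
import math

def contextual_fraction(n, s_n):
    """CF_N = (S_N - S_NC) / (S_NS - S_NC) with S_NS = -N and S_NC = -N + 2."""
    s_nc = -n + 2
    s_ns = -n
    return (s_n - s_nc) / (s_ns - s_nc)

def s_qm(n):
    """Ideal QM value of the N-cycle witness (formula from C007)."""
    c = math.cos(math.pi / n)
    return (n - 3 * n * c) / (1 + c)

# Measured S_N values as listed in Table 2 of the paper.
for n, s_n in ((5, -3.926), (31, -30.599), (121, -117.686)):
    print(f"N={n:3d}  CF={contextual_fraction(n, s_n):+.4f}")

# Ideal QM contextual fraction grows towards 1 with N.
for n in (5, 51, 501, 5001):
    print(f"N={n:4d}  CF_QM={contextual_fraction(n, s_qm(n)):.5f}")
```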
Full evidence: verification/C008/
C009 math verified C009 — Single-correlator formula $\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle = \tfrac{1}{8}(3-\sqrt{5}+(5+\sqrt{5})\cos 4\theta)$
Verdict: verified
Location: App. G (sec:theory)
Claim: $\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle = \frac{1}{8}\bigl(3-\sqrt{5}+(5+\sqrt{5})\cos(4\theta)\bigr)$.
Full re-derivation via $\text{tr}(M_i M_{i+1}\rho_{\text{in}})$ with $M_i = U_i(|0\rangle\langle0| - \mathbf{I})U_i^\dagger$
and $\rho_{\text{in}} = |0\rangle\langle0|$. SymPy's trigsimp returns zero for the
symbolic difference; 8 numeric spot-checks agree to machine precision ($\sim 10^{-16}$).
Full evidence: verification/C009/
C010 math not verified C010 — Minimum of $S_5(\theta)$ at $\theta = \pi/2$, value $\approx -4.045$ ❌
Verdict: not_verified (mismatch)
Location: App. G (sec:theory)
Claim: "the minimum value of $S_5(\theta)$ is obtained at $\theta = \pi/2$ and equals $S_5 = \frac{5}{4}(-\sqrt{5}-1) \approx -4.045$."
Derivation (from verification/C010/derivation.md):
$$S_5(\theta) = \frac{5}{8}\bigl(3-\sqrt{5}+(5+\sqrt{5})\cos(4\theta)\bigr).$$
Since $5+\sqrt{5} > 0$, the minimum occurs when $\cos(4\theta) = -1$, i.e. $4\theta = \pi$, giving $\theta = \pi/4$ — not $\pi/2$.
At the correct angle $\theta = \pi/4$: $$S_5\!\left(\tfrac{\pi}{4}\right) = \frac{5}{8}(3-\sqrt{5}-(5+\sqrt{5})) = \frac{5}{4}(-1-\sqrt{5}) \approx -4.045. \checkmark$$
At the paper's claimed angle $\theta = \pi/2$: $$\cos(4\cdot\tfrac{\pi}{2}) = \cos(2\pi) = 1 \implies S_5\!\left(\tfrac{\pi}{2}\right) = 5.$$
$S_5(\pi/2) = 5$ is the maximum, not the minimum. SymPy numerical scan over $[0, \pi]$ confirms global minimum at $\theta \approx 0.785\ \text{rad} = \pi/4$.
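A grid scan along these lines can be sketched with the standard library alone. The grid covers $[0, \pi/2]$, which is one full period of $\cos 4\theta$, so the minimum is unique on it; the grid resolution is an arbitrary choice for this sketch.

```python
import math

def s5(theta):
    """S_5(theta) = (5/8) * (3 - sqrt(5) + (5 + sqrt(5)) * cos(4 theta))."""
    return 5 / 8 * (3 - math.sqrt(5) + (5 + math.sqrt(5)) * math.cos(4 * theta))

# Uniform grid on [0, pi/2]; the 2000 steps are an illustrative choice.
grid = [k * (math.pi / 2) / 2000 for k in range(2001)]
theta_min = min(grid, key=s5)

print(f"argmin  = {theta_min:.4f} rad (pi/4 = {math.pi / 4:.4f})")
print(f"minimum = {s5(theta_min):.4f}")
print(f"S_5(pi/2) = {s5(math.pi / 2):.4f}")
```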
Conclusion: The numerical value $-4.045$ is correct; the claimed angle $\theta = \pi/2$ is a typographical error — it should read $\theta = \pi/4$.
Full evidence: verification/C010/
C011 math verified C011 — Incompatibility penalty $\varepsilon_i = \frac{1}{16}|(5-\sqrt{5}+5(3+\sqrt{5})\cos 2\theta)\sin^2 2\theta|$
Verdict: verified
Location: App. G
Claim: $\varepsilon_i = \frac{1}{16}\bigl|(5-\sqrt{5}+5(3+\sqrt{5})\cos(2\theta))\sin^2(2\theta)\bigr|$.
Re-derived from the Lüders post-measurement state $\rho_i = P_{B,i}\rho_{\text{in}}P_{B,i} + P_{D,i}\rho_{\text{in}}P_{D,i}$.
SymPy simplify() returns 0 for (derivation − paper formula); zero at $\theta_5$ confirmed.
Seven numeric spot-checks agree to machine precision.
Full evidence: verification/C011/
C012 math verified C012 — Bell-scenario maximum $S_5^{\text{Bell}} \approx -3.828$
Verdict: verified
Location: App. H (sec:exclusivity)
Claim: $\bar{S}_5^{\text{Bell}} = -1-2\sqrt{2} \approx -3.828$.
Using $S_M^{\text{Bell}} = M - 4[\tfrac{1}{2}+\tfrac{M-1}{4}(1+\cos(\pi/(M-1)))]$ at $M=5$: result is $-1-2\sqrt{2} \approx -3.8284$ (SymPy confirmed). Ordering $S_5^{\text{Bell}} > S_5^{\text{KCBS}}$ ($-3.828 > -3.944$) confirmed.
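The same evaluation in plain floating point, as a minimal sketch of the formula quoted above (the function name is illustrative):

```python
import math

def s_bell(m):
    """S_M^Bell = M - 4 * [1/2 + (M - 1)/4 * (1 + cos(pi / (M - 1)))]."""
    return m - 4 * (0.5 + (m - 1) / 4 * (1 + math.cos(math.pi / (m - 1))))

s5_bell = s_bell(5)
s5_kcbs = 5 - 4 * math.sqrt(5)

print(f"S_5^Bell = {s5_bell:.4f} (closed form {-1 - 2 * math.sqrt(2):.4f})")
print(f"S_5^KCBS = {s5_kcbs:.4f}")
print(f"Bell value lies above KCBS value: {s5_bell > s5_kcbs}")
```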
Full evidence: verification/C012/
C013 numeric verified C013 — Qutrit basis: $|0\rangle = |S_{1/2}, m_J{=}{-}1/2\rangle$, $|1\rangle = |D_{5/2}, m_J{=}{-}3/2\rangle$, $|2\rangle = |D_{5/2}, m_J{=}{-}1/2\rangle$
Verdict: verified
Location: App. A (sec:transitions)
Claim: Zeeman sub-level encoding of the $^{40}\text{Ca}^+$ qutrit.
Confirmed verbatim in main.tex; physically valid $m_J$ ranges for each level; Zeeman splitting cross-check: $\Delta f = 1.2 \times 1.3996\ \text{MHz/G} \times 3.73\ \text{G} = 6.265\ \text{MHz}$ (paper: $\approx 6.27\ \text{MHz}$, $0.08\%$ error). Companion paper 17Leupold states identical encoding verbatim.
Full evidence: verification/C013/
C014 numeric verified C014 — $|B| \approx 3.73\ \text{G}$, transition splitting $\approx 6.27\ \text{MHz}$
Verdict: verified
Location: App. A
Claim: External field $|B| \approx 3.73\ \text{G}$ splits $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$ by $\approx 6.27\ \text{MHz}$.
$\Delta f = g_J(D_{5/2})\cdot(\mu_B/h)\cdot B\cdot\Delta m_J = 1.2\times1.3996\times3.73\times1 = 6.265\ \text{MHz}$. Relative error: $0.08\%$, within the paper's $\approx$ qualifier.
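The arithmetic is reproduced below as a short sketch. The constant names are illustrative; the numerical inputs are the ones quoted in the line above.

```python
# Inputs as quoted in the report.
G_J_D52 = 1.2            # Lande g-factor of D_5/2
MU_B_OVER_H = 1.3996     # Bohr magneton over h, in MHz/G
B_FIELD_G = 3.73         # magnetic field in gauss
DELTA_M_J = 1            # |1> (m_J = -3/2) and |2> (m_J = -1/2) differ by one unit

delta_f_mhz = G_J_D52 * MU_B_OVER_H * B_FIELD_G * DELTA_M_J
rel_error = abs(delta_f_mhz - 6.27) / 6.27

print(f"Zeeman splitting = {delta_f_mhz:.3f} MHz")
print(f"relative deviation from 6.27 MHz = {100 * rel_error:.2f} %")
```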
Full evidence: verification/C014/
C015 numeric verified C015 — $\lambda \approx 729\ \text{nm}$, beam propagates at $45°$ to quantization axis
Verdict: verified
Location: App. A
Claim: Coherent pulses at $\lambda \approx 729\ \text{nm}$ drive $S_{1/2}\leftrightarrow D_{5/2}$ at $45°$.
NIST $S_{1/2}\rightarrow D_{5/2}$ wavenumber gives $729.35\ \text{nm}$ (within $0.35\ \text{nm}$ of stated $\approx 729\ \text{nm}$). The $45°$ angle confirmed by companion papers from the same group (17Leupold, 16Alonso).
Full evidence: verification/C015/
C016 numeric partial C016 — AC Stark shifts $< 100\ \text{Hz}$
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. A
Claim: "AC Stark shifts are kept below 100 Hz by operating at low laser intensities."
Using $\delta_{\text{AC}} = \Omega^2/(4\Delta)$ with $\Delta/2\pi \approx 6.27\ \text{MHz}$, the shift reaches 100 Hz at $\Omega/2\pi \approx 50\ \text{kHz}$ ($\pi$-pulse $\approx 10\ \mu\text{s}$), which is the "low intensity" regime for 729 nm trapped-ion experiments. Physically self-consistent, but the actual Rabi frequency is not stated in the paper or 17Leupold, so the 100 Hz bound cannot be directly confirmed from available data.
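The estimate can be reproduced as follows. This sketch works in ordinary frequencies: with $\Omega = 2\pi f_\Omega$ and $\Delta = 2\pi f_\Delta$, the $2\pi$ factors cancel so that $\delta_{\text{AC}}/2\pi = f_\Omega^2/(4 f_\Delta)$. The Rabi frequency is a free parameter here, since the paper does not state it.

```python
import math

def ac_stark_shift_hz(rabi_hz, detuning_hz):
    """Off-resonant shift delta/2pi = f_Rabi^2 / (4 f_detuning), all in Hz."""
    return rabi_hz ** 2 / (4 * detuning_hz)

DETUNING_HZ = 6.27e6                      # splitting to the neighbouring transition

shift_at_50khz = ac_stark_shift_hz(50e3, DETUNING_HZ)

# Rabi frequency at which the shift reaches 100 Hz, and the matching pi time.
rabi_at_100hz = math.sqrt(4 * DETUNING_HZ * 100.0)
pi_time_us = 1e6 / (2 * rabi_at_100hz)    # t_pi = pi / Omega = 1 / (2 f_Rabi)

print(f"shift at 50 kHz Rabi frequency: {shift_at_50khz:.1f} Hz")
print(f"Rabi frequency for a 100 Hz shift: {rabi_at_100hz / 1e3:.1f} kHz")
print(f"corresponding pi-pulse time: {pi_time_us:.1f} us")
```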
Full evidence: verification/C016/
C017 numeric partial C017 — Coherence time $\sigma_t \approx 1.6\ \text{ms}$ for $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B (sec:coherence)
Claim: Ramsey coherence time $\sigma_t \approx 1.6\ \text{ms}$ for both transitions to $D_{5/2}$.
Internal consistency: the paper's stated common-mode FWHM $\approx 230\ \text{Hz}$ converts (via $\sigma_t = 1/(2\pi\sigma_f)$, Gaussian dephasing) to $\sigma_t = 1.629\ \text{ms}$, within $1.8\%$ of the claimed $1.6\ \text{ms}$. Companion paper 17Leupold reports $\approx 2.5\ \text{ms}$ for the same transitions (same apparatus, different run conditions). No raw Ramsey data available; verdict capped at partial.
Full evidence: verification/C017/
C018 numeric partial C018 — Coherence time $\sigma_t \approx 7\ \text{ms}$ for $|1\rangle\leftrightarrow|2\rangle$
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Ramsey coherence time $\sigma_t \approx 7\ \text{ms}$ for the $D_{5/2}$–$D_{5/2}$ transition.
Self-consistency: differential noise FWHM $\approx 50\ \text{Hz}$ gives $\sigma_t = 2\sqrt{2\ln 2}/(2\pi \times 50) \approx 7.50\ \text{ms}$, within $7\%$ of the claimed $\approx 7\ \text{ms}$. 17Leupold reports $\approx 12\ \text{ms}$ (different run); no raw data available.
Full evidence: verification/C018/
C019 numeric verified C019 — Common-mode frequency noise $\approx 230\ \text{Hz}$ FWHM
Verdict: verified
Location: App. B
Claim: Common-mode noise FWHM $\approx 230\ \text{Hz}$ (from cryocooler vibrations).
$\text{FWHM} = 2\sqrt{2\ln 2}/(2\pi\times 1.6\times10^{-3}) = 234.2\ \text{Hz}$; $1.8\%$ discrepancy from $230\ \text{Hz}$, within the $\approx$ qualifier.
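The FWHM-to-coherence-time conversions used in C017 to C019 can be sketched in a few lines. This assumes Gaussian frequency noise, with $\sigma_f = \text{FWHM}/(2\sqrt{2\ln 2})$ and $\sigma_t = 1/(2\pi\sigma_f)$ as stated above; the function names are illustrative.

```python
import math

# FWHM = K * sigma for a Gaussian distribution.
K = 2 * math.sqrt(2 * math.log(2))

def sigma_t_from_fwhm(fwhm_hz):
    """Coherence time (s) from the FWHM (Hz) of Gaussian frequency noise."""
    return K / (2 * math.pi * fwhm_hz)

def fwhm_from_sigma_t(sigma_t_s):
    """FWHM (Hz) of Gaussian frequency noise from the coherence time (s)."""
    return K / (2 * math.pi * sigma_t_s)

print(f"230 Hz FWHM -> sigma_t = {1e3 * sigma_t_from_fwhm(230):.3f} ms")
print(f" 50 Hz FWHM -> sigma_t = {1e3 * sigma_t_from_fwhm(50):.2f} ms")
print(f"1.6 ms      -> FWHM    = {fwhm_from_sigma_t(1.6e-3):.1f} Hz")
```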
Full evidence: verification/C019/
C020 numeric partial C020 — Differential noise $\approx 50\ \text{Hz}$ FWHM, $B$-field fluctuations $< 5\ \mu\text{G}$
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Differential noise between $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$ is $\approx 50\ \text{Hz}$ FWHM, associated with $\Delta B < 5\ \mu\text{G}$.
Zeeman differential sensitivity: $g_J(D_{5/2})\cdot(\mu_B/h) = 1.68\ \text{Hz/}\mu\text{G}$. At $\Delta B = 5\ \mu\text{G}$: $1.68 \times 5 = 8.4\ \text{Hz}$, well below 50 Hz — consistent with paper's attribution of the remaining $\sim 42\ \text{Hz}$ to slow drifts. No raw Ramsey data available; partial only.
Full evidence: verification/C020/
C021 numeric partial C021 — Drift $\sim 100\ \text{Hz}$, recalibration every 30 s at 10 Hz resolution
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Transition frequencies drift by $\sim 100\ \text{Hz}$ on minute timescales; recalibrated every 30 s with 10 Hz resolution.
Values confirmed by direct text match in main.tex (sec:coherence). These are apparatus characterisation parameters not recomputable from the public dataset; verdict capped at partial.
Full evidence: verification/C021/
C022 numeric partial C022 — Doppler cooling $n_{\text{th}} \approx 5$
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. C (sec:cooling)
Claim: After Doppler cooling, all three motional modes reach $n_{\text{th}} \approx 5$.
Doppler limit $T_D = \hbar\Gamma/(2k_B) \approx 0.504\ \text{mK}$ for $^{40}\text{Ca}^+$ ($\Gamma = 2\pi\times21\ \text{MHz}$). At trap frequencies from Alonso 2016 (axial $\approx 2.4\ \text{MHz}$, radial $\approx 4, 7\ \text{MHz}$): $n_{\text{th}}(\text{axial}) \approx 3.9$, radial modes well below 5. Only the axial mode is near the claimed 5; exact trap frequencies for this run are not stated.
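The Doppler-limit estimate can be reproduced as below. This is a sketch under the same assumptions as the text (linewidth $\Gamma = 2\pi\times21\ \text{MHz}$, trap frequencies taken from Alonso 2016 and not from this run) and uses a thermal Bose-Einstein occupation.

```python
import math

H = 6.62607015e-34       # Planck constant, J s
K_B = 1.380649e-23       # Boltzmann constant, J/K
GAMMA_HZ = 21e6          # linewidth Gamma / 2pi, as quoted above

# T_D = hbar * Gamma / (2 k_B) = h * (Gamma / 2pi) / (2 k_B)
t_doppler = H * GAMMA_HZ / (2 * K_B)

def n_thermal(trap_freq_hz, temperature_k):
    """Mean thermal occupation 1 / (exp(h f / k_B T) - 1)."""
    return 1.0 / math.expm1(H * trap_freq_hz / (K_B * temperature_k))

print(f"Doppler limit = {1e3 * t_doppler:.3f} mK")
for label, f in (("axial", 2.4e6), ("radial 1", 4e6), ("radial 2", 7e6)):
    print(f"{label:9s} {f / 1e6:.1f} MHz -> n_th = {n_thermal(f, t_doppler):.2f}")
```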
Full evidence: verification/C022/
C023 numeric verified C023 — EIT cooling $n_{\text{th}} \approx 0.2$
Verdict: verified
Location: App. C
Claim: After EIT cooling, axial mode reaches $n_{\text{th}} \approx 0.2$.
Companion paper 16Alonso (same apparatus) states verbatim: "We measure a typical mean thermal excitation after cooling to be $n_{\text{th}} \approx 0.2$." Confirmed.
Full evidence: verification/C023/
C024 numeric not verified C024 — Motional heating rate $\sim 200\ \text{quanta/s}$ ❌
Verdict: not_verified
Failure reason: data_unavailable
Location: App. C
Claim: Dark-detection motional heating rate $\sim 200\ \text{quanta/s}$.
Measuring a heating rate requires dedicated sideband spectroscopy (cool → wait → read sideband ratio), entirely separate from the contextuality dataset. The public zip archive contains only KCBS correlation data (no sideband/phonon files); 17Leupold does not quote this value. The $\sim 200\ \text{quanta/s}$ is plausible for a surface-electrode trap (typical range $100$–$10\,000\ \text{quanta/s}$), but cannot be confirmed.
Full evidence: verification/C024/
C025 numeric verified C025 — Fluorescence detection at 397 nm, repump at 866 nm
Verdict: verified
Location: App. D (sec:detection)
Claim: Fluorescence via $S_{1/2}\rightarrow P_{1/2}$ at 397 nm + repump $D_{3/2}\rightarrow P_{1/2}$ at 866 nm.
NIST $^{40}\text{Ca}^{+}$ energy levels give $S_{1/2}\rightarrow P_{1/2}: 396.96\ \text{nm}$ and $D_{3/2}\rightarrow P_{1/2}: 866.45\ \text{nm}$ — within $\pm 1\ \text{nm}$ tolerance of the paper's rounded values.
Full evidence: verification/C025/
C026 numeric partial C026 — $\approx 25$ photons bright, $\approx 1$ photon background, $\approx 200\ \mu\text{s}$ window
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: Typical detection yields $\approx 25$ bright photons and $\approx 1$ background photon in a $\approx 200\ \mu\text{s}$ window.
17Leupold (same trap) reports 18.75 bright / 0.709 background in a 160 µs window. Scaling to 200 µs: $18.75\times(200/160) = 23.4$ bright (vs $\approx 25$, $6\%$), $0.71\times(200/160) = 0.89$ background (vs $\approx 1$, $11\%$). Both within the $\approx$ qualifier; no raw photon-count data available.
Full evidence: verification/C026/
C027 numeric partial C027 — Detection errors: bright $\approx 2\times10^{-5}$, dark $\approx 1\times10^{-4}$
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: Bright-state error $\approx 2\times10^{-5}$; dark-state error $\approx 1\times10^{-4}$.
Using a Poisson threshold model ($n_{\text{bright}} = 25$, $n_{\text{bg}} = 1$, threshold $k^* = 7$): bright error $= P(\text{Poi}(25)\leq7) = 2.29\times10^{-5}$ (paper: $\approx 2\times10^{-5}$, $15\%$ off). The dark error is dominated by $D_{5/2}$ decay: $T/\tau_{\text{decay}} = 200\times10^{-6}/1.2 = 1.67\times10^{-4}$; total $\approx 1.24\times10^{-4}$ (paper: $\approx 1\times10^{-4}$, $24\%$ off). Both agree within the $\approx$ qualifier.
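The bright-state figure follows from a Poisson cumulative sum, sketched below with the standard library. The sketch covers only the bright error and the decay probability over the full window; it does not attempt the full dark-error integral behind the $1.24\times10^{-4}$ total, which depends on details not given here.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * sum(mean ** j / math.factorial(j) for j in range(k + 1))

N_BRIGHT = 25.0          # mean bright counts in the detection window
THRESHOLD = 7            # counts <= threshold are classified as dark
WINDOW_S = 200e-6        # detection window
TAU_DECAY_S = 1.2        # D_5/2 lifetime

# A bright ion is misread as dark when it yields <= THRESHOLD counts.
bright_error = poisson_cdf(THRESHOLD, N_BRIGHT)

# Probability that a dark ion decays at some point during the window (T << tau).
decay_probability = WINDOW_S / TAU_DECAY_S

print(f"bright-state error      = {bright_error:.3e}")
print(f"decay prob. in window   = {decay_probability:.3e}")
```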
Full evidence: verification/C027/
C028 numeric partial C028 — $D_{5/2}$ lifetime $\tau_{\text{decay}} \approx 1.2\ \text{s}$
Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: $D_{5/2}$ spontaneous decay lifetime $\tau_{\text{decay}} \approx 1.2\ \text{s}$.
Published spectroscopic measurements: Kreuter et al. 2005 → $1.168\pm0.009\ \text{s}$; Barton et al. 2000 → $1.168\pm0.007\ \text{s}$. Both central values are $1.168\ \text{s}$, so their weighted mean is also $1.168\ \text{s}$, which rounds to $1.2\ \text{s}$ (one decimal place) as in the paper. Verdict partial only because network restrictions prevented direct fetching of the primary spectroscopy papers.
Full evidence: verification/C028/
C029 empirical verified C029 — Table 1 normal-order individual correlators
Verdict: verified
Location: Table 1
Claim (verbatim): "For the data point closest to compatibility (normal order): $(i=1,j=2)$: $\langle A_1\rangle = -0.106(10)$, $\langle A_2\rangle = -0.107(10)$, $\langle A_1 A_2\rangle = -0.786(6)$; $(i=2,j=3)$: $\langle A_2\rangle = -0.111(10)$, $\langle A_3\rangle = -0.092(10)$, $\langle A_2 A_3\rangle = -0.793(6)$; $(i=3,j=4)$: $\langle A_3\rangle = -0.107(10)$, $\langle A_4\rangle = -0.112(10)$, $\langle A_3 A_4\rangle = -0.775(6)$; $(i=4,j=5)$: $\langle A_4\rangle = -0.102(10)$, $\langle A_5\rangle = -0.107(10)$, $\langle A_4 A_5\rangle = -0.787(6)$; $(i=5,j=1)$: $\langle A_5\rangle = -0.100(10)$, $\langle A_1\rangle = -0.121(10)$, $\langle A_5 A_1\rangle = -0.774(6)$."
Run script header (verification/C029/run.py):
# Provenance: Paper-provided data from artifacts/data/Dataset_Public_repository.zip.
# Post-processing pipeline re-implemented from paper description
# (paper_text_only_reimplementation for analysis; raw data is paper-provided).
# Claim C029 (Table 1, Normal order): Individual correlators and expectation values
# for N=5 KCBS at the data point closest to compatibility.
Paper values vs computed (from kcbs_005_gen_nor.csv, rot_time = 10.7, $\theta \approx 48.020°$):
| Pair | $\langle A_i\rangle$ paper | computed | $\langle A_j\rangle$ paper | computed | $\langle A_i A_j\rangle$ paper | computed |
|---|---|---|---|---|---|---|
| (1,2) | −0.106(10) | −0.1056 | −0.107(10) | −0.1072 | −0.786(6) | −0.7856 |
| (2,3) | −0.111(10) | −0.1112 | −0.092(10) | −0.0918 | −0.793(6) | −0.7930 |
| (3,4) | −0.107(10) | −0.1074 | −0.112(10) | −0.1122 | −0.775(6) | −0.7752 |
| (4,5) | −0.102(10) | −0.1018 | −0.107(10) | −0.1072 | −0.787(6) | −0.7874 |
| (5,1) | −0.100(10) | −0.1002 | −0.121(10) | −0.1212 | −0.774(6) | −0.7742 |
All within $0.07\sigma$.
Full evidence: verification/C029/
C030 empirical verified C030 — Normal-order KCBS totals: $S_5 = -3.915(14)$, $S_5^{(\text{ext})} = -3.864(34)$
Verdict: verified
Location: Table 1
Claim: $S_5 = -3.915(14)$, $S_5^{(\text{ext})} = -3.864(34)$ (normal order, $\theta \approx \theta_5$).
Paper value: $S_5 = -3.915(14)$; computed: $-3.9154 \pm 0.0141$ (diff $0.03\sigma$).
Paper value: $S_5^{(\text{ext})} = -3.864(34)$; computed: $-3.8628 \pm 0.0332$ (diff $0.04\sigma$).
Full evidence: verification/C030/
C031 empirical verified C031 — Table 1 reverse-order individual correlators
Verdict: verified
Location: Table 1
Claim (verbatim): Reverse-order correlators at $\theta \approx \theta_5$: $(1,2)$: $-0.786(6)$; $(2,3)$: $-0.787(6)$; $(3,4)$: $-0.784(6)$; $(4,5)$: $-0.783(6)$; $(5,1)$: $-0.798(6)$, with matching single-observable means.
All 15 values reproduced from kcbs_005_gen_rev.csv ($\text{rot\_time} = 10.7$,
$\theta = 48.014°$); deviations $\leq 0.0004$, more than an order of magnitude below the quoted uncertainties ($\pm 0.010$ and $\pm 0.006$).
Full evidence: verification/C031/
C032 empirical verified C032 — Reverse-order KCBS totals: $S_5 = -3.937(14)$, $S_5^{(\text{ext})} = -3.890(34)$
Verdict: verified
Location: Table 1
Claim: $S_5 = -3.937(14)$, $S_5^{(\text{ext})} = -3.890(34)$ (reverse order).
Paper value: $S_5 = -3.937(14)$; computed: $-3.9374 \pm 0.0135$ (diff $0.03\sigma$).
Paper value: $S_5^{(\text{ext})} = -3.890(34)$; computed: $-3.8960 \pm 0.0336$ (diff $0.18\sigma$).
Full evidence: verification/C032/
C033 empirical partial C033 — Systematic shift of 1.6 standard deviations from QM prediction
Verdict: partial
Failure reason: mismatch (minor)
Location: Sec. KCBS results
Claim: "$S_5(\theta_5)$ exhibits a systematic shift of 1.6 standard deviations from the ideal QM prediction ($S_5^{\text{QM}} \approx -3.944$)."
From raw data: normal-order $z = (-3.915 - (-3.944))/0.014 = +2.05\sigma$; reverse-order $z = (-3.937 - (-3.944))/0.014 = +0.51\sigma$. RMS of paper-rounded values: $1.52\sigma$. Weighted mean: $1.78\sigma$. No formula reproduces exactly $1.6\sigma$, but the qualitative conclusion (systematic positive shift $\sim 1.5$–$1.8\sigma$ in both datasets) is well supported. The discrepancy is minor — the paper's $1.6\sigma$ is a rounded/approximate characterisation.
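The per-run and combined shifts can be recomputed from the unrounded values reported under C030 and C032. The sketch below assumes an inverse-variance weighted mean for the combination, which is one plausible reading of "weighted mean" and not necessarily what verification/C033/run.py does.

```python
import math

S_QM = 5 - 4 * math.sqrt(5)          # ideal QM prediction

# Unrounded (S_5, standard error) pairs as computed under C030 and C032.
runs = {"normal": (-3.9154, 0.0141), "reverse": (-3.9374, 0.0135)}

# Shift of each run from the QM prediction, in units of its own sigma.
z_scores = {name: (s - S_QM) / err for name, (s, err) in runs.items()}

# Inverse-variance weighted mean of the two runs and its standard error.
weights = {name: 1 / err ** 2 for name, (_, err) in runs.items()}
total_w = sum(weights.values())
s_mean = sum(weights[name] * s for name, (s, _) in runs.items()) / total_w
s_mean_err = 1 / math.sqrt(total_w)
z_combined = (s_mean - S_QM) / s_mean_err

for name, z in z_scores.items():
    print(f"{name:8s} z = {z:+.2f} sigma")
print(f"combined S_5 = {s_mean:.4f} +/- {s_mean_err:.4f}, z = {z_combined:+.2f} sigma")
```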
Full evidence: verification/C033/
C034 empirical verified C034 — KCBS violation by 65 (normal) and 67 (reverse) standard deviations
Verdict: verified
Location: Sec. KCBS results
Claim: "The data point closest to compatibility violates the KCBS inequality by 65 standard deviations (normal order) and 67 standard deviations (reverse order)."
$|-3.915 - (-3)|/0.014 = 65.4 \approx 65$ ✓;
$|-3.937 - (-3)|/0.014 = 66.9 \approx 67$ ✓.
Full evidence: verification/C034/
C035 empirical verified C035 — All $\theta$-scan data points violate extended KCBS by up to 25 standard deviations
Verdict: verified
Location: Sec. KCBS results
Claim: "All measured data points in the $\theta$-scan violate the extended KCBS inequality by up to 25 standard deviations."
All 12/12 data points (6 rot-time settings × 2 orders) yield $S_5^{(\text{ext})} < -3$. Maximum violation: $\approx 27\sigma$ (slight discrepancy from paper's "25" attributable to rounding/propagation convention); minimum: $\approx 15.5\sigma$.
Full evidence: verification/C035/
C036–C047 empirical verified C036–C047 — Table 2: $N$-gon measurement results
All Table 2 rows reproduced from the public dataset. See individual verdict files. A brief summary:
| Claim | $N$ | $S_N$ paper | $S_N$ computed | $\text{CF}_N$ paper | $\text{CF}_N$ computed | Verdict |
|---|---|---|---|---|---|---|
| C036 | 5 | −3.926(14) | −3.9260 | 0.463(7) | 0.4630 | verified |
| C037 | 7 | −6.208(12) | −6.2078 | 0.604(6) | 0.6039 | verified† |
| C038 | 11 | −10.452(10) | −10.4520 | 0.726(5) | 0.7260 | verified |
| C039 | 17 | −16.538(10) | −16.5384 | 0.769(5) | 0.7692 | verified |
| C040 | 23 | −22.530(10) | −22.5300 | 0.765(5) | 0.7650 | verified |
| C041 | 31 | −30.599(9) | −30.599 | 0.800(4) | 0.7995 | verified‡ |
| C042 | 41 | −40.439(11) | −40.4386 | 0.719(5) | 0.7193 | verified |
| C043 | 51 | −50.422(11) | −50.4218 | 0.711(5) | 0.7109 | verified |
| C044 | 61 | −60.279(11) | −60.2793 | 0.640(6) | 0.6396 | partial |
| C045 | 81 | −79.972(14) | −79.9723 | 0.486(7) | 0.4862 | verified |
| C046 | 101 | −99.544(17) | −99.544 | 0.272(8) | 0.272 | verified |
| C047 | 121 | −117.686(25) | −117.686 | −0.657(12) | −0.657 | verified |
† C037 note: the paper lists $S_7^{\text{NC}} = -6$, but the formula $-N+2$ gives $-5$ for $N=7$; $\text{CF}_7 = 0.604$ is only consistent with $S_7^{\text{NC}} = -5$. Apparent typo in Table 2.
‡ C041: $S_{31}^{(\text{ext})}$ differs by $2.3\sigma$ from paper; likely due to different treatment of shot-noise correction near $\theta_{31}$.
C044 partial: reimplementation only (no paper-provided code).
Full evidence: verification/C036/ … verification/C047/
C048 empirical verified C048 — Contextuality up to $N=101$ (bare), $N=61$ (extended)
Verdict: verified
Location: Sec. N-gon results
Claim: "Stronger-than-classical correlations are observed for all $N$ up to 101 for $S_N$, and up to $N=61$ for $S_N^{(\text{ext})}$."
From Table 2: $S_N < -N+2$ for $N \leq 101$ (significant violations $24$–$174\sigma$); $S_{121} = -117.7 > -119 = S_{121}^{\text{NC}}$ (no bare violation). Extended: $S_N^{(\text{ext})}$ significant violation ($4.7$–$29.6\sigma$) for $N = 5\ldots61$; $N=81$ is $0.5\sigma$ below bound (not statistically significant). Both cutoffs match exactly.
Full evidence: verification/C048/
C049 empirical verified C049 — Largest $\text{CF}_{31} = 0.800(4)$ at $N=31$
Verdict: verified
Location: Abstract; Sec. N-gon results
Claim: "The largest measured contextual fraction is $\text{CF}_{31} = 0.800(4)$ at $N=31$."
Computed from kcbs_031_sho_nor.csv: $\text{CF}_{31} = 0.7995$ (diff $0.1\sigma$).
$N=31$ confirmed as global maximum across all $N \in \{5,7,11,17,23,31,41,51,61,81,101,121\}$.
Full evidence: verification/C049/
C050 empirical verified C050 — Normal-order saturation $0.969(14)$, signaling $0.054(31)$
Verdict: verified
Location: Table 3
Claim: "Normal order: saturation of QM limit $= 0.969(14)$, signaling $= 0.054(31)$."
Computed: saturation $= 0.9694 \pm 0.0149$ (diff $0.03\sigma$); signaling $= 0.0557 \pm 0.0318$ (diff $0.05\sigma$).
Full evidence: verification/C050/
C051 empirical verified C051 — Reverse-order saturation $0.992(14)$, signaling $0.050(31)$
Verdict: verified
Location: Table 3
Claim: "Reverse order: saturation of QM limit $= 0.992(14)$, signaling $= 0.050(31)$."
Computed: saturation $= 0.99272 \pm 0.01429$ (diff $0.05\sigma$); signaling $= 0.04384 \pm 0.03257$ (diff $0.20\sigma$).
Full evidence: verification/C051/
C052 empirical partial C052 — KCBS result corresponds to $99.5(2)\%$ of QM limit
Verdict: partial
Failure reason: mismatch
Location: App. K (sec:Bellcomparison)
Claim: "This work's KCBS result corresponds to $99.5(2)\%$ of the QM limit."
Run script header (verification/C052/run.py):
# Claim C052: 99.5(2)% of QM limit (App. K sec:Bellcomparison).
# Paper value: 99.5(2)%.
# Computes: (S5 - S5_NC) / (S5_QM - S5_NC) * 100 for best result.
Paper value: $99.5 \pm 0.2\%$
Computed (reverse order, $S_5 = -3.9374$, $S_5^{\text{QM}} = 5-4\sqrt{5}$, $S_5^{\text{NC}} = -3$):
$$\frac{-3.9374 - (-3)}{-3.9443 - (-3)} \times 100 = 99.27 \pm 1.43\%.$$
The $99.27\%$ is within the actual $1\sigma$ of the stated $99.5\%$, but the stated uncertainty $\pm 0.2\%$ is inconsistent with the true statistical uncertainty $\pm 1.43\%$ (which propagates directly from the $S_5$ SEM of $0.0135$). Table 3 independently reports $0.992(14) = 99.2(1.4)\%$ for the same data, confirming the $\pm 0.2\%$ claim is an understatement by a factor of $\approx 7$. No formula reproduces $99.5\%$ exactly.
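The saturation and its propagated uncertainty can be recomputed in a few lines. This sketch uses the formula from the run script header and propagates only the statistical error on $S_5$, treating both bounds as exact.

```python
import math

S_NC = -3.0                          # noncontextual bound
S_QM = 5 - 4 * math.sqrt(5)          # QM limit

def saturation(s5, s5_err):
    """Fraction of the QM limit reached, with linearly propagated error."""
    span = S_QM - S_NC
    return (s5 - S_NC) / span, s5_err / abs(span)

# Reverse-order result (best value) as computed under C032.
sat, sat_err = saturation(-3.9374, 0.0135)

print(f"saturation = {100 * sat:.2f} +/- {100 * sat_err:.2f} %")
print(f"ratio to the stated 0.2 % uncertainty: {100 * sat_err / 0.2:.1f}")
```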
Full evidence: verification/C052/
C053 plot verified C053 — Fig. 2: all $S_5^{(\text{ext})}(\theta)$ violate NC bound; data agrees with theory
Verdict: verified
Location: Fig. 2 (data_plot_general_latex_bell.png)
[Figure comparison: paper figure alongside reproduced figure]
All 12/12 data points fall below $-3$ (minimum $S_5^{(\text{ext})} \approx -3.88$, maximum $\approx -3.53$). Deviations from ideal QM theory $0.2$–$1.6\sigma$. The small systematic offset $(\sim 0.05)$ is explicitly attributed to qutrit-rotation imperfections in the paper.
Full evidence: verification/C053/
C054 plot verified C054 — Fig. 3: $\text{CF}_N$ peaks at $N=31$, becomes negative at $N=121$
Verdict: verified
Location: Fig. 3 (gons_combined_allpoints_latex.png)
[Figure comparison: paper figure alongside reproduced figure]
$\text{CF}_{31} = 0.7995$ (max), $\text{CF}_{121} = -0.657$ (negative, confirmed). Shape and scale of the reproduced $\text{CF}_N$ vs $N$ panel match the paper's bottom panel.
Full evidence: verification/C054/
C055 citation verified C055 — Christensen 2015: chained Bell up to $N=90$, $\text{CF}_{36} = 0.874(1)$
Verdict: verified
Location: Sec. N-gon states
Claim: "Chained Bell experiments observed contextuality with $N$ up to 90, with $\text{CF}_{36} = 0.874(1)$ [Christensen 2015]."
From Christensen et al. PRX 5, 041052 (2015): data for $n=2$…$45$ per-party settings ($N = 2n$ up to 90); Table III gives $q_{\min}(n=18) = 0.874 \pm 0.001$ ($N=36$, $\text{CF}_{36}$). Confirmed.
Full evidence: verification/C055/
C056 citation verified C056 — Prior extended KCBS experiments limited to $N \leq 7$ [Arias 2015]
Verdict: verified
Location: Introduction
Claim: "Previous extended KCBS experimental studies are limited to $N \leq 7$ [Arias et al. 2015]."
Arias et al. PRA 92, 032126 (2015) tests $C_7$ ($N=7$) and $\bar{C}_7$ only; abstract states "With the exception of the pentagon [$N=5$], this prediction remained experimentally unexplored." Confirmed.
Full evidence: verification/C056/
C057 citation verified C057 — Poh 2015: $99.97(2)\%$ of Tsirelson bound
Verdict: verified
Location: App. K
Claim: "Poh et al. (2015) measured $99.97(2)\%$ of the Tsirelson bound."
From Poh et al. PRL 115, 180408 (2015): $S = 2.82759 \pm 0.00051$; $S/(2\sqrt{2}) = 99.970 \pm 0.018\% \approx 99.97(2)\%$. Confirmed.
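As a minimal sketch, the conversion from the quoted CHSH value to a percentage of the Tsirelson bound can be reproduced as follows. The helper name `fraction_of_tsirelson` is hypothetical, and the uncertainty is assumed to scale linearly with the uncertainty on $S$.

```python
import math

TSIRELSON = 2.0 * math.sqrt(2.0)  # quantum maximum of the CHSH parameter


def fraction_of_tsirelson(s, s_err):
    """Return (fraction, uncertainty) in percent of the Tsirelson bound."""
    return s / TSIRELSON * 100.0, s_err / TSIRELSON * 100.0


# Value quoted above from Poh et al.: S = 2.82759 +/- 0.00051.
frac, frac_err = fraction_of_tsirelson(2.82759, 0.00051)
print(f"{frac:.3f} +/- {frac_err:.3f} %")
```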
Full evidence: verification/C057/
C058 citation verified C058 — Christensen 2015: $\sim 99\%$ CHSH saturation; $N=90$; $\text{CF}_{36} = 0.874(1)$
Verdict: verified
Location: App. K
Claim: "Christensen et al. (2015) came close to $99\%$ of the QM prediction for the CHSH test, measuring chained Bell inequalities up to $N=90$ with $\text{CF}_{36} = 0.874(1)$."
CHSH saturation from paper: $2.817/(2\sqrt{2}) \approx 99.6\%$ (slightly above $99\%$, consistent with "close to 99%"). $N=90$ and $\text{CF}_{36} = 0.874(1)$ as in C055. Confirmed.
Full evidence: verification/C058/
C059 citation verified C059 — Vienna 2011 [Lapkiewicz]: saturation $0.947(6)$, signaling $0.08(3)$
Verdict: verified
Location: Table 3
Claim: "Vienna 2011 KCBS test: saturation $0.947(6)$, signaling $0.08(3)$."
Computed from Table 1 of Lapkiewicz et al. Nature 474, 490 (2011): $S_5 = -3.894(6)$, saturation $= 0.947(6)$; signaling $\delta = 0.081(2) \approx 0.08(3)$. Confirmed.
Full evidence: verification/C059/
C060 citation partial C060 — Stockholm 2013 [Ahrens]: saturation $0.53(11)$ and $0.95(11)$; no signaling data
Verdict: partial
Limitations: minor_methodological_inconsistency_in_cited_paper_normalization
Location: Table 3
Claim: "Ahrens et al. 2013: saturation $0.53(11)$ normal, $0.95(11)$ reverse; no signaling data available."
Ahrens et al. Sci. Rep. 3, 2170 (2013) Table II: $\kappa_{\text{nor}} = -3.536\pm0.005$, $\kappa_{\text{rev}} = -3.896\pm0.006$ (stat). Applying the paper's own formula: reverse $= 0.953$ (matches claim to $0.03\sigma$); normal $= 0.561$ vs claimed $0.53$ (discrepancy: the citing paper appears to use the raw numerator $|\kappa - S^{\text{NC}}| = 0.536$ instead of the normalized fraction). "No signaling data" confirmed (Ahrens reports only joint correlators, never individual marginals).
Full evidence: verification/C060/
C061 citation partial C061 — Beijing 2013 [Deng]: saturation $0.977(11)$ and $0.956(26)$; signaling $0.267$ and $0.291$
Verdict: partial
Failure reason: mismatch (one value)
Location: Table 3
Claim: "Deng et al. 2013: saturation $0.977(11)$ and $0.956(26)$; signaling $0.267$ and $0.291$."
From Deng et al. arXiv:1301.5364 Table I: saturation $0.977(11)$ and $0.956(26)$ confirmed within stated uncertainties. Signaling $0.291$ (biased case) is an exact match. Signaling $0.267$ (uniform case) cannot be precisely reproduced under any single convention (two interpretations give $0.280$ or $0.251$; average $0.265 \approx 0.267$). Core qualitative claim (large signaling relative to violation) unambiguously confirmed.
Full evidence: verification/C061/
C062 citation verified C062 — Beijing 2013 [Um]: saturation $0.589(24)^*$, signaling $0.119(24)^*$
Verdict: verified
Location: Table 3
Claim: "Um et al. 2013 (re-analysis): saturation $0.589(24)^*$, signaling $0.119(24)^*$."
Reproduced from Um et al. Sci. Rep. 3:1627 Table 1: $S_5 \approx -3.558$, saturation $= 0.591$; signaling $0.119$ — both within stated $\pm 0.024$. The $^*$ notation (authors' own re-analysis due to errors in original data) confirmed structurally.
Full evidence: verification/C062/
C063 citation verified C063 — Brisbane 2016 [Jerger]: saturation $0.520(1)$ / $0.541(1)$; signaling $0.379(2)$
Verdict: verified
Location: Table 3
Claim: "Jerger et al. 2016: saturation $0.520(1)$ normal, $0.541(1)$ reverse; signaling $0.379(2)$."
From Jerger et al. Nat. Commun. 7, 12930 (2016) Table I: computed saturation $0.5202$ and $0.5411$; signaling $0.3788$ — all match to $\leq 0.001$.
Full evidence: verification/C063/
C064 math partial C064 — Pulse count and experiment duration scale as $O(N^2)$
Verdict: partial
Failure reason: mismatch
Limitations: paper_text_only_reimplementation; arbitrary_assumption_made
Location: Sec. N-gon results; App. A
Claim: "The number of pulses and duration for these experiments both grow as $N^2$."
The per-rotation pulse count formula $(i-1)(N-1)/2 + 3$ for $U_i$ is verified (matches the Fig. caption formula $2i+1$ at $N=5$). However, summing over all $N$ observable pairs gives: $$\sum_{i=1}^{N}\!\left[\frac{(i-1)(N-1)}{2}+3\right] = \frac{N(N-1)^2}{4}+3N \sim O(N^3).$$
A power-law fit to values for $N = 17\ldots121$ yields exponent $\approx 3.03$. The $O(N^2)$ scaling is recoverable only if one counts exclusively the inter-measurement concatenated transitions ($\approx N^2/2$) while ignoring the dominant pre-measurement rotation. The concatenation identity $U_j^\dagger U_i = U_{i-j}$ was checked numerically (Frobenius distance $2.1$–$2.7 \neq 0$), confirming it is a deliberate approximation.
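The summation and the power-law fit can be sketched as below. This is an illustrative reimplementation, not the script in verification/C064/. It assumes the fit uses the dataset's $N$ values from 17 to 121 and an ordinary least-squares line in log-log space; the function names are hypothetical.

```python
import math


def pulses_for_rotation(i, n):
    """Per-rotation pulse count (i - 1)(N - 1)/2 + 3 for U_i."""
    return (i - 1) * (n - 1) / 2 + 3


def total_pulses(n):
    """Sum the per-rotation counts over i = 1..N."""
    return sum(pulses_for_rotation(i, n) for i in range(1, n + 1))


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Assumed fit range: the dataset's N values between 17 and 121.
n_values = [17, 23, 31, 41, 51, 61, 81, 101, 121]
totals = [total_pulses(n) for n in n_values]

# The explicit sum agrees with the closed form N(N-1)^2/4 + 3N.
closed_form_ok = all(
    math.isclose(t, n * (n - 1) ** 2 / 4 + 3 * n) for n, t in zip(n_values, totals)
)

exponent = loglog_slope(n_values, totals)
print(f"closed form matches: {closed_form_ok}, fitted exponent: {exponent:.2f}")
```

The fitted exponent sits slightly above 3 because the $3N$ term and the $(N-1)^2$ factor still contribute at these moderate $N$.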
Full evidence: verification/C064/
C065 empirical verified C065 — Measured $S_5$ surpasses Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$
Verdict: verified
Location: Sec. KCBS results; Fig. 2
Claim: "Close to compatibility we can resolve values of $S_5$ surpassing the Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$."
Normal order: $S_5 = -3.9154$ — below $-3.828$ by $6.2\sigma$. Reverse order: $S_5 = -3.9374$ — below $-3.828$ by $8.1\sigma$. All 12/12 general-scan data points within $2°$ of $\theta_5$ lie strictly below $-3.828$.
Full evidence: verification/C065/
C066 empirical partial C066 — Largest number of observables ($N=101$) in any contextuality experiment (Dec 2017)
Verdict: partial
Limitations: paper_text_only_reimplementation; literature_survey_not_exhaustive
Location: Sec. N-gon results
Claim: "These results show contextuality in a system with the largest number of observables (101) of any experiment reported up to this date."
Literature audit from available artifacts:
- Christensen 2015: chained Bell up to $N=90$ (even-cycle) ✓ below 101.
- Arias 2015: extended KCBS up to $N=7$ ✓.
- Leupold 2017 (same group, June 2017): SIC test with 13 observables ✓.
- All other cited experiments: $N \leq 6$.
Prior record: $N=90$ (even-cycle Bell); this paper's $N=101$ (odd-cycle KCBS) exceeds it. Verdict partial: an exhaustive survey of the pre-December 2017 contextuality literature cannot be completed from the available artifacts alone; the paper itself hedges with "to our best knowledge."
Full evidence: verification/C066/
C067 empirical partial C067 — $\text{CF}_{31} = 0.800(4)$ is largest contextual fraction closing the detection loophole (Dec 2017)
Verdict: partial
Limitations: paper_text_only_reimplementation; partial_data_coverage
Location: Sec. N-gon results
Claim: "The measured contextual fraction is larger than for any other experiment closing the detection loophole [Tan 2017]."
Tan et al. PRL 118, 130403 (2017) ($^9\text{Be}^+$ trapped ions, $\sim 100\%$ detection): best result $I_9 = 0.296(12)$, $\text{CF}_9 = 0.704 \pm 0.012$, which is $7.6\sigma$ below $0.800$. Other loophole-free experiments: Hensen 2015 ($\text{CF} \approx 0.21$), Giustina 2015 ($\approx 0.35$), Shalm 2015 ($\approx 0.01$), all well below $0.800$. Christensen 2015 reached $\text{CF} = 0.874$ but did not close the detection loophole. Verdict partial: the CF was extracted from the text of Tan et al. rather than from raw data, and the broader assertion was not exhaustively verified.
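The quoted separation is consistent with combining the two uncertainties in quadrature. The sketch below assumes that convention and treats the two results as independent; the helper name `sigma_separation` is hypothetical.

```python
import math


def sigma_separation(a, a_err, b, b_err):
    """Gap between two independent results in units of the combined 1-sigma error."""
    return abs(a - b) / math.hypot(a_err, b_err)


# This work: CF_31 = 0.800(4); Tan et al. 2017: CF_9 = 0.704(12).
z = sigma_separation(0.800, 0.004, 0.704, 0.012)
print(f"{z:.1f} sigma")
```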
Full evidence: verification/C067/
C068 math not verified C068 — Shot-noise bias formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ for $S_5^{(\text{ext})}$ ❌
Verdict: not_verified (mismatch)
Location: App. E (sec:dataAnalysis)
Claim: The shot-noise bias in $\varepsilon_i$ at $\theta_5$ with $n = 10{,}000$ shots is given by $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$.
Derivation (from verification/C068/derivation.md):
Two compounding errors identified:
(1) Wrong variance formula: The paper writes $\sigma^2_{A_i} = (1 - \langle A_i\rangle)/n$, but for $\pm 1$ outcomes the correct shot-noise variance is $(1 - \langle A_i\rangle^2)/n$.
(2) Missing factor of 2: The combined variance for $\varepsilon_i = |\langle A_i^{(1)}\rangle - \langle A_i^{(2)}\rangle|$ should be $\sigma^2_{A^{(1)}} + \sigma^2_{A^{(2)}} = 2\sigma^2_A$, not $\sigma^2_A$.
At $\theta_5$ ($\cos 2\theta_5 = 2/\sqrt{5} - 1 \approx -0.1056$):
| Quantity | Value ($n=10{,}000$) |
|---|---|
| $\text{E}[\hat\varepsilon_i]$ (correct $\pm1$ variance) | $0.01122$ |
| Paper formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ | $0.00839$ |
| Ratio | $1.337$ |
The paper's formula can only be recovered by simultaneously using the wrong variance and treating $\sigma^2_{\varepsilon_i} = \sigma^2_A$ (single measurement) instead of $2\sigma^2_A$. The conceptual framework ($\varepsilon_i$ follows a folded normal; $\varepsilon_i(\theta_5) = 0$) is correct.
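The table values can be reproduced with the short sketch below. It assumes, as the form of the paper's expression suggests, that $\langle A_i\rangle = \cos 2\theta_5$ at the compatibility angle and that $\hat\varepsilon_i$ is the absolute value of a zero-mean normal variable, so its expectation is $\sigma\sqrt{2/\pi}$. The variable and function names are illustrative and not taken from verification/C068/.

```python
import math

n = 10_000                                # shots per expectation value
cos_2theta5 = 2.0 / math.sqrt(5.0) - 1.0  # about -0.1056


def folded_normal_mean(variance):
    """Mean of |X| for X ~ Normal(0, variance)."""
    return math.sqrt(2.0 * variance / math.pi)


# Corrected: +/-1 outcomes give (1 - <A>^2)/n per estimate, and the
# difference of two independent estimates doubles the variance.
var_correct = 2.0 * (1.0 - cos_2theta5 ** 2) / n
bias_correct = folded_normal_mean(var_correct)

# Paper's closed form sqrt(2 (1 - cos 2 theta_5) / (pi n)).
bias_paper = math.sqrt(2.0 * (1.0 - cos_2theta5) / (math.pi * n))

ratio = bias_correct / bias_paper
print(f"corrected: {bias_correct:.5f}")
print(f"paper:     {bias_paper:.5f}")
print(f"ratio:     {ratio:.3f}")
```

Under these assumptions the ratio simplifies to $\sqrt{2(1+\cos 2\theta_5)}$, which is independent of $n$.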
Full evidence: verification/C068/
5. Point-by-Point Review of the Paper Body
Abstract
| Point | Assessment | Supporting claims |
|---|---|---|
| Quantum contextuality demonstrated in single trapped-ion qutrit using KCBS $N$-gon states | agreed | C001, C006, C036–C047 — full experimental programme reproduced exactly from public data |
| All data points violate extended KCBS inequality ($S_N^{(\text{ext})} < S_N^{\text{NC}}$ up to $N=61$) | agreed | C035, C048 — confirmed at $4.7$–$29.6\sigma$ for $N \leq 61$ |
| Largest contextual fraction $\text{CF}_{31} = 0.800(4)$ | agreed | C049 — reproduced to $0.1\sigma$ from raw data |
| KCBS result is $\approx 99.5(2)\%$ of QM prediction | partially agreed | C052 — computed $\approx 99.3\pm1.4\%$; the central value is within actual $1\sigma$, but the stated uncertainty $\pm0.2\%$ is understated by $\approx 7\times$; see §6 |
| Contextuality demonstrated for largest number of observables ($N=101$) | agreed | C048, C066 — unambiguously exceeds all cited prior experiments |
| Largest detection-loophole-free contextual fraction | agreed | C067 — Tan 2017 gives $\text{CF}_9 = 0.704 \pm 0.012$, $7.6\sigma$ below this work |
Introduction
| Point | Assessment | Supporting claims |
|---|---|---|
| KCBS provides a state-dependent NC inequality for qutrits; prior experimental tests limited to $N \leq 7$ | agreed | C056 — Arias 2015 confirms $N=7$ frontier; C001, C006 verify the inequality structure |
| Chained Bell experiments (even $N$) have reached $N=90$ with $\text{CF}_{36} = 0.874(1)$ | agreed | C055, C058 — directly confirmed from Christensen 2015 |
| This work demonstrates $\text{CF}_{31} = 0.800(4)$, the largest detection-loophole-free CF | agreed | C049, C067 — verified; see C067 for detection-loophole condition |
KCBS Experiment — Setup
| Point | Assessment | Supporting claims |
|---|---|---|
| NC bound $S_5 \geq -3$ (Eq. 1) | agreed | C001 — verified by exhaustive enumeration and combinatorial proof |
| QM minimum $S_5^{\text{QM}} = 5-4\sqrt{5} \approx -3.944$ (Eq. 2) | agreed | C002 — verified analytically |
| Compatibility angle $\theta_5 = \arccos(5^{-1/4}) \approx 48°$ | agreed | C003 — $48.030°$, rounds to $48°$ |
| Extended KCBS inequality $S_5^{(\text{ext})} \geq -3$ penalises signaling (Eq. 3) | agreed | C004 — confirmed; reduces to standard KCBS at $\theta_5$ |
| Qutrit encoded in $^{40}\text{Ca}^+$ Zeeman levels; 729 nm drive | agreed | C013, C014, C015 — all apparatus parameters verified |
KCBS Results
| Point | Assessment | Supporting claims |
|---|---|---|
| Table 1 normal-order correlators | agreed | C029, C030 — reproduced to $\leq 0.04\sigma$ from raw data |
| Table 1 reverse-order correlators | agreed | C031, C032 — reproduced to $\leq 0.18\sigma$ |
| Violation of NC bound by 65/67 $\sigma$ at $\theta_5$ | agreed | C034 — arithmetic confirmed |
| All $\theta$-scan data violate extended KCBS by up to 25 $\sigma$ | agreed | C035 — all 12/12 points below $-3$; computed max $\approx 27\sigma$ |
| Systematic shift of 1.6 $\sigma$ from QM prediction | partially agreed | C033 — qualitatively confirmed ($\sim1.5$–$1.8\sigma$); the exact $1.6\sigma$ figure cannot be precisely reproduced but is within the range of plausible combination methods |
| Measured $S_5$ surpasses Bell-scenario quantum maximum $\approx -3.828$ | agreed | C012, C065 — both Table 1 values are $> 6\sigma$ below $-3.828$ |
N-gon States (Theory)
| Point | Assessment | Supporting claims |
|---|---|---|
| $N$-gon compatibility angle formula $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$ | agreed | C005 — verified symbolically and numerically for all $N$ tested |
| Classical bound $S_N \geq -N+2$ for odd $N$ | agreed | C006 — proven combinatorially |
| QM minimum $S_N^{\text{QM}} = (N-3N\cos(\pi/N))/(1+\cos(\pi/N))$ | agreed | C007 — verified at $N=5$ symbolically; numerical check for all $N \leq 121$ |
| Contextual fraction definition $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$ | agreed | C008 — verified against Table 2 for all $N$ |
| $\text{CF}_N \to 1$ as $N \to \infty$ | agreed | C007, C008 — derivable from formulas: as $N\to\infty$, $\cos(\pi/N)\to 1$, $S_N^{\text{QM}}\to -N = S_N^{\text{NS}}$ |
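The formulas in this table can be checked numerically with a short sketch. It evaluates the ideal quantum values only, not the measured ones, and takes $S_N^{\text{NS}} = -N$ as in the last row; the function names are hypothetical.

```python
import math


def theta_n(n):
    """Compatibility angle theta_N in radians."""
    c = math.cos(math.pi / n)
    return math.acos(math.sqrt(c / (1.0 + c)))


def s_qm(n):
    """Quantum minimum S_N^QM."""
    c = math.cos(math.pi / n)
    return (n - 3.0 * n * c) / (1.0 + c)


def s_nc(n):
    """Noncontextual bound -N + 2 for odd N."""
    return -n + 2.0


def cf_qm(n):
    """Contextual fraction of the ideal quantum value, taking S_N^NS = -N."""
    return (s_qm(n) - s_nc(n)) / (-n - s_nc(n))


for n in (5, 31, 121, 1001):
    print(n, round(math.degrees(theta_n(n)), 3), round(s_qm(n), 4), round(cf_qm(n), 4))
```

Expanding $\cos(\pi/N)$ for large $N$ gives $\text{CF}_N \approx 1 - \pi^2/(4N)$ for the ideal quantum value, so the approach to 1 is slow. This is consistent with the experiment needing large $N$ to reach high contextual fractions.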
N-gon Results
| Point | Assessment | Supporting claims |
|---|---|---|
| Table 2: all 12 $N$ values match QM predictions | agreed | C036–C047 — reproduced from public data; note $S_7^{\text{NC}} = -6$ in paper appears to be a typo (formula gives $-5$) |
| Pulse count and duration grow as $N^2$ | partially agreed | C064 — per-rotation formula verified; but summing all pulses gives $O(N^3)$, not $O(N^2)$; $O(N^2)$ is recoverable only under a specific (concatenation-only) counting convention not fully supported by the stated derivation |
| Largest $\text{CF}_{31} = 0.800(4)$ | agreed | C049 — confirmed to $0.1\sigma$ |
| Contextuality for $N$ up to 101 (bare), 61 (extended) | agreed | C048 — exact cutoffs confirmed from data |
| Largest $N$ of any contextuality experiment at time of publication | agreed | C066 — prior record $N=90$ (Bell); this work's $N=101$ exceeds it; partial verdict due to non-exhaustive survey |
| Largest detection-loophole-free CF | agreed | C067 — Tan 2017 ($\text{CF}_9 = 0.704$) confirmed below $0.800$ |
Conclusion
| Point | Assessment | Supporting claims |
|---|---|---|
| Summary of main results; future directions | agreed — no novel factual claims | (none) |
App. A — Qutrit Transitions
| Point | Assessment | Supporting claims |
|---|---|---|
| Qubit encoding, magnetic field, wavelength | agreed | C013, C014, C015 — all verified or well-corroborated |
| AC Stark shifts $< 100\ \text{Hz}$ | partially agreed | C016 — physically self-consistent but Rabi frequency not available; plausible given stated operating regime |
App. B — Qutrit Coherence Times
| Point | Assessment | Supporting claims |
|---|---|---|
| $\sigma_t \approx 1.6\ \text{ms}$ for transitions to $D_{5/2}$ | partially agreed | C017 — internally consistent (1.629 ms from FWHM); Leupold 2017 reports 2.5 ms (different run conditions) |
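| $\sigma_t \approx 7\ \text{ms}$ for $\vert 1\rangle\leftrightarrow\vert 2\rangle$ | partially agreed | C018: internally consistent with the stated noise widths; Leupold 2017 reports a longer value from a different run of the same apparatus; no raw Ramsey data |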
| Common-mode noise $\approx 230\ \text{Hz}$ FWHM | agreed | C019 — computed 234 Hz, within $\approx$ qualifier |
| Differential noise $< 50\ \text{Hz}$, $B < 5\ \mu\text{G}$ | partially agreed | C020 — Zeeman sensitivity gives $< 8.4\ \text{Hz}$ per $5\ \mu\text{G}$, consistent with paper; no raw Ramsey data |
| Drift $\sim 100\ \text{Hz}$, recalibration every 30 s | partially agreed | C021 — confirmed from paper text; not independently verifiable |
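The consistency check between the coherence time and the common-mode noise width can be sketched as follows. The sketch assumes Gaussian frequency noise, so that $\sigma_f = \text{FWHM}/(2\sqrt{2\ln 2})$ and $\sigma_t = 1/(2\pi\sigma_f)$. That relation is an assumption inferred from the numbers quoted in the table (1.629 ms and 234 Hz), not something stated explicitly in this report.

```python
import math

FWHM_TO_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.3548


def coherence_time_from_fwhm(fwhm_hz):
    """Gaussian time width sigma_t = 1 / (2 pi sigma_f), in seconds."""
    sigma_f = fwhm_hz / FWHM_TO_SIGMA
    return 1.0 / (2.0 * math.pi * sigma_f)


def fwhm_from_coherence_time(sigma_t_s):
    """Inverse conversion: frequency-noise FWHM in Hz."""
    return FWHM_TO_SIGMA / (2.0 * math.pi * sigma_t_s)


sigma_t_ms = coherence_time_from_fwhm(230.0) * 1e3  # from the stated 230 Hz FWHM
fwhm_hz = fwhm_from_coherence_time(1.6e-3)          # from the stated 1.6 ms
print(f"{sigma_t_ms:.3f} ms, {fwhm_hz:.0f} Hz")
```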
App. C — Ion Cooling
| Point | Assessment | Supporting claims |
|---|---|---|
| Doppler cooling: $n_{\text{th}} \approx 5$ | partially agreed | C022 — Doppler limit calculation gives $\approx 3.9$ for axial mode; radial modes lower; exact trap frequencies not stated |
| EIT cooling: $n_{\text{th}} \approx 0.2$ | agreed | C023 — verbatim in Alonso 2016 (same apparatus) |
| Heating rate $\sim 200\ \text{quanta/s}$ during dark detection | not agreed | C024 — claim unverifiable from available data; plausible but requires dedicated sideband spectroscopy data |
App. D — Qutrit Detection
| Point | Assessment | Supporting claims |
|---|---|---|
| Fluorescence at 397 nm + repump at 866 nm | agreed | C025 — NIST levels confirmed to $< 0.5\ \text{nm}$ |
| $\approx 25$ photons bright, $\approx 1$ bg, $\approx 200\ \mu\text{s}$ window | partially agreed | C026 — Leupold 2017 gives consistent values after scaling; no raw photon data |
| Detection errors $2\times10^{-5}$ / $1\times10^{-4}$ | partially agreed | C027 — Poisson model gives $2.3\times10^{-5}$ / $1.2\times10^{-4}$; within $\approx$ qualifier |
| $D_{5/2}$ lifetime $\approx 1.2\ \text{s}$ | partially agreed | C028 — spectroscopic literature gives $1.155$–$1.168\ \text{s}$; rounds to $1.2\ \text{s}$ |
App. E — Data Collection and Analysis
| Point | Assessment | Supporting claims |
|---|---|---|
| Data analysis reproduces all Table 1 and Table 2 values | agreed | C029–C032, C036–C047 — reproduced from raw data |
| Statistical significance calculations (65 $\sigma$, 67 $\sigma$, up to 25 $\sigma$) | agreed | C034, C035 — confirmed |
| Shot-noise bias formula for $\varepsilon_i$ (folded normal) | disagreed | C068 — the conceptual framework is correct but the closed-form formula contains two errors (wrong variance; missing factor of 2); correct value $\approx 1.34\times$ larger than claimed |
App. G — Theoretical Predictions for KCBS Witnesses
| Point | Assessment | Supporting claims |
|---|---|---|
| Single-correlator formula | agreed | C009 — verified symbolically |
| Minimum at $\theta = \pi/2$, value $\approx -4.045$ | partially agreed | C010 — value $-4.045$ is correct; angle $\theta = \pi/2$ is a typo (should be $\theta = \pi/4$), since $S_5(\pi/2) = 5$ is the maximum |
| $\varepsilon_i$ formula | agreed | C011 — verified symbolically |
App. H — Relevance of KCBS
| Point | Assessment | Supporting claims |
|---|---|---|
| Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$ | agreed | C012 — verified; equals $-1-2\sqrt{2}$ |
App. I — N-Cycle Details
| Point | Assessment | Supporting claims |
|---|---|---|
| Table 2 complete $N$-gon data | agreed | C036–C047 — see §4; note $N=7$ NC bound entry appears to contain a typo |
App. J — Comparison with Previous KCBS Tests
| Point | Assessment | Supporting claims |
|---|---|---|
| Vienna 2011 row | agreed | C059 — confirmed from Lapkiewicz 2011 data |
| Stockholm 2013 row | partially agreed | C060 — reverse saturation $0.953$ confirmed; normal saturation $0.561$ vs claimed $0.53$ (minor methodological inconsistency in normalization) |
| Beijing 2013 (Deng) row | partially agreed | C061 — 3/4 numbers confirmed; uniform-case signaling $0.267$ not exactly reproducible |
| Beijing 2013 (Um) row | agreed | C062 — re-analyzed values confirmed |
| Brisbane 2016 (Jerger) row | agreed | C063 — confirmed from Jerger et al. Table I |
| This work normal/reverse | agreed | C050, C051 — reproduced to $< 0.20\sigma$ |
App. K — Comparison with Bell Tests
| Point | Assessment | Supporting claims |
|---|---|---|
| This work: $99.5(2)\%$ of QM limit | partially agreed | C052 — computed $99.3\pm1.4\%$; $0.15\sigma$ from $99.5\%$ but uncertainty understated $\approx7\times$; see §6 |
| Poh 2015: $99.97(2)\%$ of Tsirelson bound | agreed | C057 — confirmed |
| Christensen 2015: $\sim 99\%$ CHSH, $N=90$, $\text{CF}_{36} = 0.874$ | agreed | C058 — confirmed |
Related Work / Acknowledgements / Author Contributions / App. L
| Point | Assessment | Supporting claims |
|---|---|---|
| No novel factual claims | agreed — no novel factual claims | (none) |
6. Major Issues
C052 — Stated QM-limit saturation 99.5(2)% is numerically inconsistent
Claim: "This work's KCBS result corresponds to $99.5(2)\%$ of the QM limit."
Location: App. K (sec:Bellcomparison) and Abstract.
Issue: The standard saturation formula $(S_5 - S_5^{\text{NC}})/(S_5^{\text{QM}} - S_5^{\text{NC}})$ applied to the best (reverse-order) data gives $99.27 \pm 1.43\%$ — not $99.5\pm0.2\%$. The discrepancy has two components:
- Central value: $99.27\%$ vs $99.5\%$, a $0.23$ percentage-point difference. This is within the actual $1\sigma = 1.43\%$ and so not wrong per se, but the quoted value is not what the data directly yield.
- Stated uncertainty: $\pm 0.2\%$ is inconsistent with the actual statistical uncertainty $\pm 1.43\%$ (a factor of $\approx 7$ understatement). Table 3 of the same paper independently reports $0.992(14) = 99.2(1.4)\%$ for the reverse order, confirming the correct uncertainty.
Impact: The $99.5(2)\%$ figure appears in the abstract and is used as a comparison benchmark against Poh 2015 ($99.97(2)\%$) and Christensen 2015 ($\sim 99\%$). The understated uncertainty exaggerates the precision of the KCBS result. Quoting the true $\pm 1.4\%$ would not change any comparative conclusion (the central value remains above $99\%$), but the figure as written is misleading.
Recommendation: The abstract and App. K should read $99.3(1.4)\%$ or, accepting the rounded Table 3 value, $99.2(1.4)\%$.
7. Minor Issues
C010 — Typo: minimum angle $\theta = \pi/2$ should be $\theta = \pi/4$
Claim: The minimum of $S_5(\theta)$ is at $\theta = \pi/2$.
Location: App. G.
Finding: $S_5(\pi/2) = 5$ (maximum). The minimum is at $\theta = \pi/4$ (where $\cos(4\theta) = -1$). The numerical value $\frac{5}{4}(-\sqrt{5}-1) \approx -4.045$ is correct. This is a purely typographical error in a supplementary appendix and does not affect any experimental result.
C024 — Heating rate $\sim 200\ \text{quanta/s}$ unverifiable
Claim: Dark-detection motional heating rate $\sim 200\ \text{quanta/s}$.
Location: App. C.
Finding: No sideband spectroscopy data are in the public repository, so the claim cannot be confirmed or refuted. The rate is plausible for a surface-electrode trap. This is apparatus characterisation only, with no impact on the contextuality conclusions.
C068 — Shot-noise bias formula contains two errors
Claim: Expected gap in $S_5^{(\text{ext})}$ at $\theta_5$ is $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$.
Location: App. E.
Finding: Two compounding errors: (1) the variance formula omits the square on $\langle A_i\rangle$; (2) the combined variance for $\varepsilon_i$ should be $2\sigma^2_A$, not $\sigma^2_A$. The correct value is $\approx 1.34\times$ larger ($0.01122$ vs $0.00839$ at $n=10{,}000$). This affects the dashed red correction curves in Figs. 1 and 2, but not any of the primary contextuality conclusions.
C016 — AC Stark shift bound $< 100\ \text{Hz}$ (plausible, unconfirmed)
Rabi frequency not stated; physically self-consistent reconstruction gives $\approx 100\ \text{Hz}$ at $\Omega/2\pi \approx 50\ \text{kHz}$. Apparatus detail; no impact on main results.
C017, C018 — Coherence times (apparatus characterisation)
$\sigma_t \approx 1.6\ \text{ms}$ (C017) and $7\ \text{ms}$ (C018): internally consistent with stated noise widths; the companion paper Leupold 2017 gives $\approx 1.6\times$ longer values in a different run of the same apparatus. No raw Ramsey data available.
C020, C021 — Differential noise and recalibration (apparatus characterisation)
C020: $B$-field contribution $< 8.4\ \text{Hz}$ confirmed; remaining $\sim 42\ \text{Hz}$ attributed to slow drifts. C021: recalibration parameters confirmed by text; not independently recomputable. No impact on experimental conclusions.
C022 — Doppler cooling $n_{\text{th}} \approx 5$ (approximate)
Doppler limit calculation gives $\approx 3.9$ (axial) to $1.1$ (radial); claim appears to refer to the axial mode or uses slightly lower trap frequencies than the cited companion paper. Apparatus detail.
C026, C027, C028 — Photon counts and detection errors (apparatus characterisation)
C026 and C027: computed values within $6$–$25\%$ of stated values (consistent with $\approx$ qualifiers); C028: $D_{5/2}$ lifetime from spectroscopy literature ($1.155$–$1.168\ \text{s}$) rounds to stated $\approx 1.2\ \text{s}$. No impact on contextuality conclusions.
C033 — Systematic shift "1.6 $\sigma$" from QM prediction
Computed $1.49$–$1.78\sigma$ depending on combination method; no single formula reproduces exactly $1.6\sigma$. Qualitative statement ("approximately $1.6\sigma$") is accurate.
C044 — $N=61$ row (partial due to reimplementation only)
All five values match paper within $0.1\sigma$; verdict partial only because no paper-provided analysis code exists. Not a substantive issue.
C060 — Stockholm 2013 normal-order saturation ($0.53$ vs computed $0.561$)
Minor methodological inconsistency: the citing paper appears to use the raw numerator $|\kappa - S^{\text{NC}}|$ rather than the full normalized fraction for the normal-order entry. Both values are within the $\pm 0.11$ error bar.
C061 — Beijing 2013 (Deng) uniform-case signaling ($0.267$, ambiguous)
Three of four numbers confirmed; uniform-case signaling $0.267$ is between two equally plausible interpretations ($0.251$ and $0.280$). Core qualitative finding (large signaling) unaffected.
C064 — $O(N^2)$ pulse-count scaling not supported by stated derivation
Per-rotation formula correct; summing all $N$ rotations gives $O(N^3)$, not $O(N^2)$. $O(N^2)$ holds only for a concatenation-only counting convention not fully described in the paper. Practical conclusion (total experiment time grows rapidly with $N$) is correct.
C066 — Record $N=101$ observables (literature survey limitation)
Strongly supported by all available citations; verdict partial only due to non-exhaustive pre-Dec 2017 survey. Not a substantive issue.
C067 — Largest detection-loophole-free CF (literature survey limitation)
Tan 2017 ($\text{CF}_9 = 0.704$) confirmed $7.6\sigma$ below $0.800$. Verdict partial only because Tan et al. raw data not available; extracted values from paper text.
End of report. Verdict counts: verified 48 / partial 17 / not_verified 3 (total 68). Major-issue IDs: C052.