paper-verifier · Verification Report

Probing the limits of correlations in an indivisible quantum system

arXiv:1712.06494

Model claude-sonnet-4-6
Platform darwin/24.6.0
Generated 2026-05-22 13:52 UTC
verified: 48 (71%) · partial: 17 (25%) · not_verified: 3 (4%) · 68 claims total
major issues: 1 · minor issues: 20

Verification Report — arXiv:1712.06494

"Probing the limits of correlations in an indivisible quantum system"


1. Brief

This report summarises the automated factual-claim verification of arXiv:1712.06494 (Malinowski et al., ETH Zürich, December 2017). The paper demonstrates quantum contextuality in a single trapped $^{40}\text{Ca}^+$ ion using Klyachko–Can–Binicioglu–Shumovsky (KCBS) $N$-cycle inequalities with $N$ up to 121.

Verdict counts (68 claims total)

| Verdict | Count |
|---|---|
| verified | 48 |
| partial | 17 |
| not_verified | 3 |

Model used: claude-sonnet-4-6 for all phases.

Major-issue claim IDs: C052

Summary of issues:

- C052 (partial/major): the stated 99.5(2)% QM-limit saturation in the abstract is numerically inconsistent: the data yield $\approx 99.3\%$ and the true statistical uncertainty is $\pm 1.4\%$, not the stated $\pm 0.2\%$.
- C010 (not_verified): a typographical error in App. G: the minimum of $S_5(\theta)$ occurs at $\theta = \pi/4$, not $\theta = \pi/2$ as written.
- C024 (not_verified): motional heating rate $\sim 200\ \text{quanta/s}$ cannot be reproduced from the public dataset (requires dedicated sideband spectroscopy data).
- C068 (not_verified): the shot-noise bias formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ in App. E contains two compounding errors (wrong variance form; missing factor of 2), underestimating the bias by a factor of $\approx 1.34$.

The overwhelming majority of experimental results (Table 1, Table 2, Fig. 2, Fig. 3) reproduced exactly from the public dataset.


2. Artifact Inventory

Paper-provided artifacts

| Path | Kind | Origin URL | Provenance |
|---|---|---|---|
| source/eprint.tar.gz | source (LaTeX + figures) | https://arxiv.org/e-print/1712.06494 | Downloaded from arXiv; SHA-256 ef01ef1f…; 890 382 bytes |
| artifacts/data/Dataset_Public_repository.zip | data | https://ethz.ch/content/dam/ethz/special-interest/phys/quantum-electronics/tiqi-dam/documents/Datasets/Dataset_Public%20repository.zip | Downloaded from TIQI public repository; SHA-256 0febe99a…; 13 985 766 bytes; contains raw shot files + thresholded CSV files for $N = 5, 7, 11, 17, 23, 31, 41, 51, 61, 81, 101, 121$ |

Agent-produced artifacts

| Path | Kind | Provenance |
|---|---|---|
| claims/claims.jsonl | claim records (68 claims) | Extracted by claude-sonnet-4-6 (extract phase) from source/eprint.tar.gz |
| claims/rejected.jsonl | filtered candidates | Produced during extract_audit phase |
| claims/coverage_audit.md | section coverage audit | Produced during extract_audit phase |
| bibliography.json | parsed references | Produced during ingest phase |
| inventory.json | artifact metadata | Produced during ingest phase |
| verification/C001/ through verification/C068/ | per-claim evidence directories | Produced by verify_* phases; each contains verdict.json plus derivation.md, run.py, run.log, citation_quote.md, and/or figure.png / paper_figure.png as appropriate |
| artifacts/citations/17Leupold.pdf | companion paper PDF | Fetched during verification |
| artifacts/citations/16Alonso.txt | companion paper text | Fetched during verification |
| artifacts/citations/15Christensen.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/15Poh.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/11Lapkiewicz.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/13Ahrens.pdf | cited paper PDF | Fetched during verification |
| artifacts/citations/13Deng.txt | cited paper text | Fetched during verification |

3. Model Attribution

All phases were executed by claude-sonnet-4-6.

| Phase | Prompt file | SHA-256 |
|---|---|---|
| ingest | prompts/ingest.md | ef27db13971e3e2eb805be653670943b52c8e221afb833fbc674b4bbbf7da1d3 |
| extract | prompts/extract.md | 34b48e02f5151b6dccf90b1035b7d6b9ee8de3e7bd1ed2c9268efb62c6b13c72 |
| extract_audit | prompts/extract_audit.md | ac2f07402104dc51d0debe90bba2a28baef691f4602938ceb2cff99596a0ea77 |
| verify_math (run 1) | prompts/verify_math.md | 7aa3c22779feee85b0239a615986868b730eb4e86852b5c5d86af54003cfc806 |
| verify_math (run 2) | prompts/verify_math.md | e0113697d625d30c810d639b5a1e8d6e5e07395080519f300a08ad4e03de8a9a |
| verify_numeric (run 1) | prompts/verify_numeric.md | e7f0824163e3171d41dfd4c65d2e75656831da502208dfe2193c64fd39d166c9 |
| verify_numeric (run 2) | prompts/verify_numeric.md | ce6fa259af53bb44811241b7f089b089d4e740f48b76a7cfef6a7a6dc0a7e95d |
| verify_plot (run 1) | prompts/verify_plot.md | 8ca8f1de4dd89664cc25875e69573117d65a1aa185aac274920e8f1efec5d41b |
| verify_plot (run 2) | prompts/verify_plot.md | 92cf06a6a3eb82b9e6dd58b2e1f46662139aca8cf886eca2f87a9c38ba7b7f61 |
| verify_citation | prompts/verify_citation.md | 35178cfb8087eabce273f0bdc0b9b4c125e1d0ca1aa2af8ab8d040a3868c3729 |
| verify_empirical (run 1) | prompts/verify_empirical.md | a86ff78d3df26b7573f54fd23157d9a88f124983468be02b9912b3daa5549f47 |
| verify_empirical (run 2) | prompts/verify_empirical.md | fe120622d5102542d8e399607d6f2d9fd2b2e75eaeddf4730a19eb033228badb |
| report (run 1) | prompts/report.md | ff51a06214b6ba57134db778ea2d70dacfef1bfaa0c609820a6b1df653671a78 |
| report (run 2) | prompts/report.md | 6c0b03711c2c6e1d6e07faeae3e0901a5fbc818a36b1603b1300281b56237f4f |
| report (run 3) | prompts/report.md | 9d5c7c0aa3e90c146bd2d96b9ca9fdb1f1f316f7beb4dc9bc104aa9bbd4ab521 |

Platform note: macOS (Darwin 24.6.0), so no unshare -n network isolation (Linux-only). Credentials were scrubbed from the subprocess environment; the CWD was locked to the claim verification directory; CPU, memory, and file-size limits were applied via ulimit.


4. Per-Claim Breakdown


C001 math verified C001 — KCBS classical bound $S_5 \geq -3$

Verdict: verified
Location: Sec. KCBS experiment, Eq. 1
Claim: In NC models $S_5(\theta_5) = \sum_{i=1}^{5} \langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle \geq -3$.

Derivation (from verification/C001/derivation.md):

Each $A_i \in \{+1,-1\}$ in any NC model, so $\prod_{i=1}^{5} (v_i v_{i+1}) = \prod v_i^2 = 1$, forcing an even number of anti-correlated pairs $k \in \{0,2,4\}$. Then $S_5 = 5-2k \geq 5-8 = -3$. The bound $-3$ is tight (e.g. $v = (1,1,-1,1,-1)$). Exhaustive enumeration of all $2^5 = 32$ assignments confirms $\min S_5 = -3$.
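The parity argument can be reproduced with a brute-force enumeration. The following is a minimal sketch, not the contents of verification/C001/run.py; the helper names `cycle_sum` and `anti_correlated_pairs` are illustrative.

```python
from itertools import product

def cycle_sum(v):
    """Sum of v_i * v_{i+1} around the cycle (indices taken modulo len(v))."""
    n = len(v)
    return sum(v[i] * v[(i + 1) % n] for i in range(n))

def anti_correlated_pairs(v):
    """Number k of adjacent pairs with opposite signs."""
    n = len(v)
    return sum(1 for i in range(n) if v[i] != v[(i + 1) % n])

# All 2^5 = 32 noncontextual assignments of +/-1 to five observables.
assignments = list(product((1, -1), repeat=5))

s5_min = min(cycle_sum(v) for v in assignments)
k_values = sorted({anti_correlated_pairs(v) for v in assignments})
s5_example = cycle_sum((1, 1, -1, 1, -1))

print(s5_min, k_values, s5_example)
```

Changing `repeat=5` to another odd cycle length gives the corresponding check for larger $N$, at a cost of $2^N$ assignments.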

Full evidence: verification/C001/


C002 math verified C002 — QM prediction $S_5(\theta_5) = 5 - 4\sqrt{5} \approx -3.944$

Verdict: verified
Location: Sec. KCBS experiment, Eq. 2
Claim: The quantum minimum of $S_5$ is $5 - 4\sqrt{5} \approx -3.944$.

Numerical check: $5 - 4\sqrt{5} = -3.94427\ldots \approx -3.944$ ✓.
Algebraic check: substituting $\cos(4\theta_5)$ derived from $\theta_5 = \arccos(5^{-1/4})$ into the correlator formula gives each $\langle A_i A_{i+1}\rangle = (5-4\sqrt{5})/5$; $S_5 = 5 \times (5-4\sqrt{5})/5 = 5-4\sqrt{5}$ exactly (SymPy verified).
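A floating-point version of the same check is sketched below. It assumes the single-correlator formula quoted under C009 and uses only the standard library, so it illustrates the SymPy result rather than reproducing that run.

```python
import math

SQRT5 = math.sqrt(5)

def correlator(theta):
    """Single correlator <A_i A_{i+1}> as a function of theta (formula from App. G)."""
    return (3 - SQRT5 + (5 + SQRT5) * math.cos(4 * theta)) / 8

theta5 = math.acos(5 ** -0.25)   # compatibility angle for N = 5
s5 = 5 * correlator(theta5)      # five identical correlators
target = 5 - 4 * SQRT5           # claimed quantum minimum

print(math.degrees(theta5), s5, target)
```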

Full evidence: verification/C002/


C003 math verified C003 — Compatibility angle $\theta_5 = \arccos(5^{-1/4}) \approx 48°$

Verdict: verified
Location: Sec. KCBS experiment
Claim: $\theta_5 = \arccos(5^{-1/4}) \approx 48°$.

$\arccos(5^{-1/4}) = 48.030°$ (SymPy); rounds to $48°$ ✓. The general formula $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$ reduces to $\arccos(5^{-1/4})$ at $N=5$ (verified symbolically).

Full evidence: verification/C003/


C004 math verified C004 — Extended KCBS inequality $S_5^{(\text{ext})} \geq -3$

Verdict: verified
Location: Sec. KCBS experiment, Eq. 3
Claim: $S_5^{(\text{ext})}(\theta) = \sum_{i=1}^{5}\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle + \sum_{i=1}^{5} \varepsilon_i \geq -3$, where $\varepsilon_i = |\langle A_i^{(1)}\rangle - \langle A_i^{(2)}\rangle|$.

Classical bound confirmed by exhaustive enumeration. Reduction at $\theta_5$: SymPy verifies the prefactor of $\varepsilon_i$ vanishes exactly at $\theta_5$, so $S_5^{(\text{ext})}(\theta_5) = S_5(\theta_5) = 5-4\sqrt{5}$.

Full evidence: verification/C004/


C005 math verified C005 — $N$-gon compatibility angle $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$

Verdict: verified
Location: Sec. N-gon states
Claim: Adjacent $N$-gon states are orthogonal iff $\theta = \theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$.

Starting from the state vectors $|\psi_i\rangle = (\cos\theta,\,\sin\theta\cos\varphi_i,\,-\sin\theta\sin\varphi_i)^T$ with $\Delta\varphi = \pi(N-1)/N$, $\langle\psi_i|\psi_{i+1}\rangle = \cos^2\theta - \sin^2\theta\cos(\pi/N)$; setting to zero gives the formula. Numerical spot-checks at $N = 5, 7, 11, 31, 51, 101$ all give $|\langle\psi_i|\psi_{i+1}\rangle| < 2.3\times10^{-16}$ at $\theta_N$.
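The spot-check can be sketched as follows. The azimuthal convention $\varphi_i = i\,\Delta\varphi$ is an assumption made here for illustration; only the difference $\Delta\varphi$ between neighbours enters the overlap.

```python
import math

def theta_n(n):
    """Compatibility angle theta_N = arccos sqrt(cos(pi/N) / (1 + cos(pi/N)))."""
    c = math.cos(math.pi / n)
    return math.acos(math.sqrt(c / (1 + c)))

def state(theta, phi):
    """Real qutrit state (cos t, sin t cos p, -sin t sin p)."""
    return (math.cos(theta),
            math.sin(theta) * math.cos(phi),
            -math.sin(theta) * math.sin(phi))

def adjacent_overlap(n):
    """Inner product of two neighbouring N-gon states at theta_N."""
    theta = theta_n(n)
    dphi = math.pi * (n - 1) / n
    a = state(theta, 0.0)    # assumed phi_i = 0
    b = state(theta, dphi)   # assumed phi_{i+1} = dphi
    return sum(x * y for x, y in zip(a, b))

overlaps = {n: adjacent_overlap(n) for n in (5, 7, 11, 31, 51, 101)}
print(overlaps)
```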

Full evidence: verification/C005/


C006 math verified C006 — Classical NC bound $S_N \geq -N+2$

Verdict: verified
Location: Sec. N-gon states, Eq. 5
Claim: $S_N = \sum_{i=1}^{N}\langle A_i A_{i+1}\rangle \geq -N+2$ for odd $N$.

The parity constraint forces $k$ (anti-correlated pairs) to be even; for odd $N$ the maximum even $k \leq N$ is $N-1$, giving $S_N = N-2(N-1) = -N+2$. Exhaustive enumeration for $N \in \{5,7,9,11,13\}$ confirms $\min S_N = -N+2$.

Full evidence: verification/C006/


C007 math verified C007 — QM minimum $S_N \geq (N - 3N\cos(\pi/N))/(1+\cos(\pi/N))$

Verdict: verified
Location: Sec. N-gon states, Eq. 6
Claim: QM minimum of the $N$-cycle witness is $S_N^{\text{QM}} = (N - 3N\cos(\pi/N))/(1+\cos(\pi/N))$.

At $N=5$: formula gives $5-4\sqrt{5}$ (SymPy symbolic difference = 0) ✓. Numerical evaluation for all $N \in \{5,7,11,17,23,31,41,51,61,81,101,121\}$ confirmed below $-N+2$ in every case.
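A short numerical sketch of this comparison, using only the two formulas quoted above (Eq. 5 and Eq. 6); the function names are illustrative.

```python
import math

def s_qm(n):
    """Quantum minimum of the N-cycle witness, Eq. 6."""
    c = math.cos(math.pi / n)
    return (n - 3 * n * c) / (1 + c)

def s_nc(n):
    """Noncontextual bound for odd N, Eq. 5."""
    return -n + 2

ns = (5, 7, 11, 17, 23, 31, 41, 51, 61, 81, 101, 121)
gaps = {n: s_qm(n) - s_nc(n) for n in ns}   # negative gap = quantum violation

for n in ns:
    print(n, round(s_qm(n), 4), s_nc(n), round(gaps[n], 4))
```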

Full evidence: verification/C007/


C008 math verified C008 — Contextual fraction $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$

Verdict: verified
Location: Sec. N-gon states, Eq. 7
Claim: $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$ with $S_N^{\text{NS}} = -N$, $S_N^{\text{NC}} = -N+2$.

Formula cross-checked against all 12 rows of Table 2; agrees within $0.1\sigma$ for every $N$ from 5 to 121. Limiting property $\text{CF}_N \to 1$ as $N\to\infty$ confirmed symbolically.
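The cross-check amounts to the arithmetic below, shown for three rows of Table 2 (the $S_N$ values are the paper values listed under C036–C047). This is a sketch of the formula only, not the verification script.

```python
def contextual_fraction(s_n, n):
    """CF_N = (S_N - S_NC) / (S_NS - S_NC) with S_NS = -N and S_NC = -N + 2."""
    s_nc = -n + 2
    s_ns = -n
    return (s_n - s_nc) / (s_ns - s_nc)

# Paper values of S_N from Table 2 for N = 5, 31, 121.
table2_s = {5: -3.926, 31: -30.599, 121: -117.686}
cf = {n: contextual_fraction(s, n) for n, s in table2_s.items()}

print(cf)
```

A negative value, as for $N=121$, means the measured $S_N$ lies above the noncontextual bound.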

Full evidence: verification/C008/


C009 math verified C009 — Single-correlator formula $\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle = \tfrac{1}{8}(3-\sqrt{5}+(5+\sqrt{5})\cos 4\theta)$

Verdict: verified
Location: App. G (sec:theory)
Claim: $\langle A_i^{(1)} A_{i\pm1}^{(2)}\rangle = \frac{1}{8}\bigl(3-\sqrt{5}+(5+\sqrt{5})\cos(4\theta)\bigr)$.

Full re-derivation via $\text{tr}(M_i M_{i+1}\rho_{\text{in}})$ with $M_i = U_i(|0\rangle\langle0| - \mathbf{I})U_i^\dagger$ and $\rho_{\text{in}} = |0\rangle\langle0|$. SymPy's trigsimp returns zero for the symbolic difference; 8 numeric spot-checks agree to machine precision ($\sim 10^{-16}$).

Full evidence: verification/C009/


C010 math not verified C010 — Minimum of $S_5(\theta)$ at $\theta = \pi/2$, value $\approx -4.045$ ❌

Verdict: not_verified (mismatch)
Location: App. G (sec:theory)
Claim: "the minimum value of $S_5(\theta)$ is obtained at $\theta = \pi/2$ and equals $S_5 = \frac{5}{4}(-\sqrt{5}-1) \approx -4.045$."

Derivation (from verification/C010/derivation.md):

$$S_5(\theta) = \frac{5}{8}\bigl(3-\sqrt{5}+(5+\sqrt{5})\cos(4\theta)\bigr).$$

Since $5+\sqrt{5} > 0$, the minimum occurs when $\cos(4\theta) = -1$, i.e. $4\theta = \pi$, giving $\theta = \pi/4$ — not $\pi/2$.

At the correct angle $\theta = \pi/4$: $$S_5\!\left(\tfrac{\pi}{4}\right) = \frac{5}{8}(3-\sqrt{5}-(5+\sqrt{5})) = \frac{5}{4}(-1-\sqrt{5}) \approx -4.045. \checkmark$$

At the paper's claimed angle $\theta = \pi/2$: $$\cos(4\cdot\tfrac{\pi}{2}) = \cos(2\pi) = 1 \implies S_5\!\left(\tfrac{\pi}{2}\right) = 5.$$

$S_5(\pi/2) = 5$ is the maximum, not the minimum. SymPy numerical scan over $[0, \pi]$ confirms global minimum at $\theta \approx 0.785\ \text{rad} = \pi/4$.
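The location of the minimum can be confirmed with a plain grid scan, sketched here with the standard library instead of SymPy. The scan is restricted to $[0, \pi/2]$, an assumption made so that the minimiser is unique; $S_5(\theta)$ has period $\pi/2$, so nothing is lost.

```python
import math

SQRT5 = math.sqrt(5)

def s5(theta):
    """S_5(theta) = (5/8) * (3 - sqrt5 + (5 + sqrt5) * cos(4 theta))."""
    return 5 / 8 * (3 - SQRT5 + (5 + SQRT5) * math.cos(4 * theta))

# Uniform grid over one period [0, pi/2].
steps = 2000
grid = [k * (math.pi / 2) / steps for k in range(steps + 1)]
theta_min = min(grid, key=s5)

print(theta_min, s5(theta_min))   # location and value of the minimum
print(s5(math.pi / 2))            # value at the angle stated in the paper
```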

Conclusion: The numerical value $-4.045$ is correct; the claimed angle $\theta = \pi/2$ is a typographical error — it should read $\theta = \pi/4$.

Full evidence: verification/C010/


C011 math verified C011 — Incompatibility penalty $\varepsilon_i = \frac{1}{16}|(5-\sqrt{5}+5(3+\sqrt{5})\cos 2\theta)\sin^2 2\theta|$

Verdict: verified
Location: App. G
Claim: $\varepsilon_i = \frac{1}{16}\bigl|(5-\sqrt{5}+5(3+\sqrt{5})\cos(2\theta))\sin^2(2\theta)\bigr|$.

Re-derived from the Lüders post-measurement state $\rho_i = P_{B,i}\rho_{\text{in}}P_{B,i} + P_{D,i}\rho_{\text{in}}P_{D,i}$. SymPy simplify() returns 0 for (derivation − paper formula); zero at $\theta_5$ confirmed. Seven numeric spot-checks agree to machine precision.

Full evidence: verification/C011/


C012 math verified C012 — Bell-scenario maximum $S_5^{\text{Bell}} \approx -3.828$

Verdict: verified
Location: App. H (sec:exclusivity)
Claim: $\bar{S}_5^{\text{Bell}} = -1-2\sqrt{2} \approx -3.828$.

Using $S_M^{\text{Bell}} = M - 4[\tfrac{1}{2}+\tfrac{M-1}{4}(1+\cos(\pi/(M-1)))]$ at $M=5$: result is $-1-2\sqrt{2} \approx -3.8284$ (SymPy confirmed). Ordering $S_5^{\text{Bell}} > S_5^{\text{KCBS}}$ ($-3.828 > -3.944$) confirmed.
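Evaluating the quoted expression numerically is a one-liner; the sketch below assumes exactly the form of $S_M^{\text{Bell}}$ written above and compares it with the KCBS value from C002.

```python
import math

def s_bell(m):
    """S_M^Bell = M - 4 * [1/2 + (M - 1)/4 * (1 + cos(pi / (M - 1)))]."""
    return m - 4 * (0.5 + (m - 1) / 4 * (1 + math.cos(math.pi / (m - 1))))

s5_bell = s_bell(5)
s5_kcbs = 5 - 4 * math.sqrt(5)

print(s5_bell, -1 - 2 * math.sqrt(2), s5_kcbs)
```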

Full evidence: verification/C012/


C013 numeric verified C013 — Qutrit basis: $|0\rangle = |S_{1/2}, m_J{=}{-}1/2\rangle$, $|1\rangle = |D_{5/2}, m_J{=}{-}3/2\rangle$, $|2\rangle = |D_{5/2}, m_J{=}{-}1/2\rangle$

Verdict: verified
Location: App. A (sec:transitions)
Claim: Zeeman sub-level encoding of the $^{40}\text{Ca}^+$ qutrit.

Confirmed verbatim in main.tex; physically valid $m_J$ ranges for each level; Zeeman splitting cross-check: $\Delta f = 1.2 \times 1.3996\ \text{MHz/G} \times 3.73\ \text{G} = 6.265\ \text{MHz}$ (paper: $\approx 6.27\ \text{MHz}$, $0.08\%$ error). Companion paper 17Leupold states identical encoding verbatim.

Full evidence: verification/C013/


C014 numeric verified C014 — $|B| \approx 3.73\ \text{G}$, transition splitting $\approx 6.27\ \text{MHz}$

Verdict: verified
Location: App. A
Claim: External field $|B| \approx 3.73\ \text{G}$ splits $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$ by $\approx 6.27\ \text{MHz}$.

$\Delta f = g_J(D_{5/2})\cdot(\mu_B/h)\cdot B\cdot\Delta m_J = 1.2\times1.3996\ \text{MHz/G}\times3.73\ \text{G}\times1 = 6.265\ \text{MHz}$. Relative error: $0.08\%$, within the paper's $\approx$ qualifier.
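The same arithmetic as a small script, using the constants quoted in this entry ($g_J = 1.2$, $\mu_B/h = 1.3996\ \text{MHz/G}$); the variable names are illustrative.

```python
G_J_D52 = 1.2            # Lande g-factor of D_{5/2}, as used in this report
MU_B_OVER_H = 1.3996     # Bohr magneton over h, in MHz per gauss
B_FIELD_G = 3.73         # magnetic field from the paper, in gauss
DELTA_M_J = 1            # m_J difference between |1> and |2>

delta_f_mhz = G_J_D52 * MU_B_OVER_H * B_FIELD_G * DELTA_M_J
rel_error = abs(delta_f_mhz - 6.27) / 6.27   # against the paper's ~6.27 MHz

print(delta_f_mhz, rel_error)
```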

Full evidence: verification/C014/


C015 numeric verified C015 — $\lambda \approx 729\ \text{nm}$, beam propagates at $45°$ to quantization axis

Verdict: verified
Location: App. A
Claim: Coherent pulses at $\lambda \approx 729\ \text{nm}$ drive $S_{1/2}\leftrightarrow D_{5/2}$ at $45°$.

NIST $S_{1/2}\rightarrow D_{5/2}$ wavenumber gives $729.35\ \text{nm}$ (within $0.35\ \text{nm}$ of stated $\approx 729\ \text{nm}$). The $45°$ angle confirmed by companion papers from the same group (17Leupold, 16Alonso).

Full evidence: verification/C015/


C016 numeric partial C016 — AC Stark shifts $< 100\ \text{Hz}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. A
Claim: "AC Stark shifts are kept below 100 Hz by operating at low laser intensities."

Using $\delta_{\text{AC}} = \Omega^2/(4\Delta)$ with $\Delta/2\pi \approx 6.27\ \text{MHz}$, the shift reaches 100 Hz at $\Omega/2\pi \approx 50\ \text{kHz}$ ($\pi$-pulse $\approx 10\ \mu\text{s}$), which is the "low intensity" regime for 729 nm trapped-ion experiments. Physically self-consistent, but the actual Rabi frequency is not stated in the paper or 17Leupold, so the 100 Hz bound cannot be directly confirmed from available data.
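Inverting the shift formula gives the Rabi frequency at which the 100 Hz bound is reached. The sketch below works in cyclic frequencies (all quantities divided by $2\pi$), where $\delta = \Omega^2/(4\Delta)$ keeps the same form, and takes the $\pi$-pulse duration as $1/(2\,\Omega/2\pi)$.

```python
import math

def rabi_for_shift(shift_hz, detuning_hz):
    """Rabi frequency (Hz) giving an AC Stark shift delta = Omega^2 / (4 Delta)."""
    return math.sqrt(4 * detuning_hz * shift_hz)

detuning_hz = 6.27e6   # splitting between the two carrier transitions
rabi_hz = rabi_for_shift(100.0, detuning_hz)
t_pi_s = 1 / (2 * rabi_hz)   # pi-pulse duration: Omega * t = pi

print(rabi_hz, t_pi_s)
```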

Full evidence: verification/C016/


C017 numeric partial C017 — Coherence time $\sigma_t \approx 1.6\ \text{ms}$ for $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B (sec:coherence)
Claim: Ramsey coherence time $\sigma_t \approx 1.6\ \text{ms}$ for both transitions to $D_{5/2}$.

Internal consistency: the paper's stated common-mode FWHM $\approx 230\ \text{Hz}$ converts (via $\sigma_t = 1/(2\pi\sigma_f)$, Gaussian dephasing) to $\sigma_t = 1.629\ \text{ms}$, within $1.8\%$ of the claimed $1.6\ \text{ms}$. Companion paper 17Leupold reports $\approx 2.5\ \text{ms}$ for the same transitions (same apparatus, different run conditions). No raw Ramsey data available; verdict capped at partial.

Full evidence: verification/C017/


C018 numeric partial C018 — Coherence time $\sigma_t \approx 7\ \text{ms}$ for $|1\rangle\leftrightarrow|2\rangle$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Ramsey coherence time $\sigma_t \approx 7\ \text{ms}$ for the $D_{5/2}$–$D_{5/2}$ transition.

Self-consistency: differential noise FWHM $\approx 50\ \text{Hz}$ gives $\sigma_t = 2\sqrt{2\ln 2}/(2\pi \times 50) \approx 7.50\ \text{ms}$, within $7\%$ of the claimed $\approx 7\ \text{ms}$. 17Leupold reports $\approx 12\ \text{ms}$ (different run); no raw data available.

Full evidence: verification/C018/


C019 numeric verified C019 — Common-mode frequency noise $\approx 230\ \text{Hz}$ FWHM

Verdict: verified
Location: App. B
Claim: Common-mode noise FWHM $\approx 230\ \text{Hz}$ (from cryocooler vibrations).

$\text{FWHM} = 2\sqrt{2\ln 2}/(2\pi\times 1.6\times10^{-3}) = 234.2\ \text{Hz}$; $1.8\%$ discrepancy from $230\ \text{Hz}$, within the $\approx$ qualifier.
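The conversions used in C017–C019 all follow from Gaussian dephasing, with $\sigma_f = \text{FWHM}/(2\sqrt{2\ln 2})$ and $\sigma_t = 1/(2\pi\sigma_f)$. A minimal sketch of both directions, with illustrative function names:

```python
import math

FWHM_PER_SIGMA = 2 * math.sqrt(2 * math.log(2))   # about 2.355

def sigma_t_from_fwhm(fwhm_hz):
    """Coherence time (s) from a Gaussian frequency-noise FWHM (Hz)."""
    sigma_f = fwhm_hz / FWHM_PER_SIGMA
    return 1 / (2 * math.pi * sigma_f)

def fwhm_from_sigma_t(sigma_t_s):
    """Gaussian frequency-noise FWHM (Hz) from a coherence time (s)."""
    return FWHM_PER_SIGMA / (2 * math.pi * sigma_t_s)

t_common_ms = sigma_t_from_fwhm(230.0) * 1e3   # common-mode noise, C017
t_diff_ms = sigma_t_from_fwhm(50.0) * 1e3      # differential noise, C018
fwhm_hz = fwhm_from_sigma_t(1.6e-3)            # inverse direction, C019

print(t_common_ms, t_diff_ms, fwhm_hz)
```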

Full evidence: verification/C019/


C020 numeric partial C020 — Differential noise $\approx 50\ \text{Hz}$ FWHM, $B$-field fluctuations $< 5\ \mu\text{G}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Differential noise between $|0\rangle\leftrightarrow|1\rangle$ and $|0\rangle\leftrightarrow|2\rangle$ is $\approx 50\ \text{Hz}$ FWHM, associated with $\Delta B < 5\ \mu\text{G}$.

Zeeman differential sensitivity: $g_J(D_{5/2})\cdot(\mu_B/h) = 1.68\ \text{Hz/}\mu\text{G}$. At $\Delta B = 5\ \mu\text{G}$: $1.68 \times 5 = 8.4\ \text{Hz}$, well below 50 Hz — consistent with paper's attribution of the remaining $\sim 42\ \text{Hz}$ to slow drifts. No raw Ramsey data available; partial only.

Full evidence: verification/C020/


C021 numeric partial C021 — Drift $\sim 100\ \text{Hz}$, recalibration every 30 s at 10 Hz resolution

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. B
Claim: Transition frequencies drift by $\sim 100\ \text{Hz}$ on minute timescales; recalibrated every 30 s with 10 Hz resolution.

Values confirmed by direct text match in main.tex (sec:coherence). These are apparatus characterisation parameters not recomputable from the public dataset; verdict capped at partial.

Full evidence: verification/C021/


C022 numeric partial C022 — Doppler cooling $n_{\text{th}} \approx 5$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. C (sec:cooling)
Claim: After Doppler cooling, all three motional modes reach $n_{\text{th}} \approx 5$.

Doppler limit $T_D = \hbar\Gamma/(2k_B) \approx 0.504\ \text{mK}$ for $^{40}\text{Ca}^+$ ($\Gamma = 2\pi\times21\ \text{MHz}$). At trap frequencies from Alonso 2016 (axial $\approx 2.4\ \text{MHz}$, radial $\approx 4, 7\ \text{MHz}$): $n_{\text{th}}(\text{axial}) \approx 3.9$, radial modes well below 5. Only the axial mode is near the claimed 5; exact trap frequencies for this run are not stated.
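The estimate can be sketched as below. The mean occupation is computed from the Bose-Einstein distribution at the Doppler temperature; this is an assumption about how the report obtained its numbers, chosen because it reproduces the quoted axial value. Trap frequencies are those quoted from Alonso 2016.

```python
import math

HBAR = 1.054571817e-34   # J s
K_B = 1.380649e-23       # J / K

gamma = 2 * math.pi * 21e6                # linewidth used in this report, rad/s
t_doppler = HBAR * gamma / (2 * K_B)      # Doppler limit, K

def n_thermal(freq_hz, temperature_k):
    """Mean phonon number of a mode at the given temperature (Bose-Einstein)."""
    x = HBAR * 2 * math.pi * freq_hz / (K_B * temperature_k)
    return 1 / math.expm1(x)

n_axial = n_thermal(2.4e6, t_doppler)
n_radial = [n_thermal(f, t_doppler) for f in (4e6, 7e6)]

print(t_doppler * 1e3, n_axial, n_radial)   # temperature in mK, occupations
```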

Full evidence: verification/C022/


C023 numeric verified C023 — EIT cooling $n_{\text{th}} \approx 0.2$

Verdict: verified
Location: App. C
Claim: After EIT cooling, axial mode reaches $n_{\text{th}} \approx 0.2$.

Companion paper 16Alonso (same apparatus) states verbatim: "We measure a typical mean thermal excitation after cooling to be $n_{\text{th}} \approx 0.2$." Confirmed.

Full evidence: verification/C023/


C024 numeric not verified C024 — Motional heating rate $\sim 200\ \text{quanta/s}$ ❌

Verdict: not_verified
Failure reason: data_unavailable
Location: App. C
Claim: Dark-detection motional heating rate $\sim 200\ \text{quanta/s}$.

Measuring a heating rate requires dedicated sideband spectroscopy (cool → wait → read sideband ratio), entirely separate from the contextuality dataset. The public zip archive contains only KCBS correlation data (no sideband/phonon files); 17Leupold does not quote this value. The $\sim 200\ \text{quanta/s}$ is plausible for a surface-electrode trap (typical range $100$–$10\,000\ \text{quanta/s}$), but cannot be confirmed.

Full evidence: verification/C024/


C025 numeric verified C025 — Fluorescence detection at 397 nm, repump at 866 nm

Verdict: verified
Location: App. D (sec:detection)
Claim: Fluorescence via $S_{1/2}\rightarrow P_{1/2}$ at 397 nm + repump $D_{3/2}\rightarrow P_{1/2}$ at 866 nm.

NIST $^{40}\text{Ca}^{+}$ energy levels give $S_{1/2}\rightarrow P_{1/2}: 396.96\ \text{nm}$ and $D_{3/2}\rightarrow P_{1/2}: 866.45\ \text{nm}$ — within $\pm 1\ \text{nm}$ tolerance of the paper's rounded values.

Full evidence: verification/C025/


C026 numeric partial C026 — $\approx 25$ photons bright, $\approx 1$ photon background, $\approx 200\ \mu\text{s}$ window

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: Typical detection yields $\approx 25$ bright photons and $\approx 1$ background photon in a $\approx 200\ \mu\text{s}$ window.

17Leupold (same trap) reports 18.75 bright / 0.709 background in a 160 µs window. Scaling to 200 µs: $18.75\times(200/160) = 23.4$ bright (vs $\approx 25$, $6\%$), $0.71\times(200/160) = 0.89$ background (vs $\approx 1$, $11\%$). Both within the $\approx$ qualifier; no raw photon-count data available.

Full evidence: verification/C026/


C027 numeric partial C027 — Detection errors: bright $\approx 2\times10^{-5}$, dark $\approx 1\times10^{-4}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: Bright-state error $\approx 2\times10^{-5}$; dark-state error $\approx 1\times10^{-4}$.

Using a Poisson threshold model ($n_{\text{bright}} = 25$, $n_{\text{bg}} = 1$, threshold $k^* = 7$): bright error $= P(\text{Poi}(25)\leq7) = 2.29\times10^{-5}$ (paper: $\approx 2\times10^{-5}$, $15\%$ off). The dark error is dominated by $D_{5/2}$ decay during the detection window: $T/\tau_{\text{decay}} = 200\times10^{-6}/1.2 = 1.67\times10^{-4}$; total $\approx 1.24\times10^{-4}$ (paper: $\approx 1\times10^{-4}$, $24\%$ off). Both agree within the $\approx$ qualifier.
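The bright-state figure is a Poisson tail probability and can be recomputed directly. The sketch below covers only that term and the raw decay fraction $T/\tau_{\text{decay}}$; how the decay fraction is combined into the total dark error is not spelled out in this entry, so it is not modelled here.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * sum(mean ** j / math.factorial(j) for j in range(k + 1))

# Bright ion misread as dark: at most 7 counts when 25 are expected.
bright_error = poisson_cdf(7, 25.0)

# Fraction of D_{5/2} population decaying within the detection window.
decay_fraction = 200e-6 / 1.2

print(bright_error, decay_fraction)
```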

Full evidence: verification/C027/


C028 numeric partial C028 — $D_{5/2}$ lifetime $\tau_{\text{decay}} \approx 1.2\ \text{s}$

Verdict: partial
Limitations: paper_text_only_reimplementation
Location: App. D
Claim: $D_{5/2}$ spontaneous decay lifetime $\tau_{\text{decay}} \approx 1.2\ \text{s}$.

Published spectroscopic measurements: Kreuter et al. 2005 → $1.168\pm0.009\ \text{s}$; Barton et al. 2000 → $1.168\pm0.007\ \text{s}$. Weighted mean $\approx 1.168\ \text{s}$, which rounds to $1.2\ \text{s}$ (one decimal place) as in the paper. Verdict partial only because network restrictions prevented direct fetching of primary spectroscopy papers.

Full evidence: verification/C028/


C029 empirical verified C029 — Table 1 normal-order individual correlators

Verdict: verified
Location: Table 1
Claim (verbatim): "For the data point closest to compatibility (normal order): $(i=1,j=2)$: $\langle A_1\rangle = -0.106(10)$, $\langle A_2\rangle = -0.107(10)$, $\langle A_1 A_2\rangle = -0.786(6)$; $(i=2,j=3)$: $\langle A_2\rangle = -0.111(10)$, $\langle A_3\rangle = -0.092(10)$, $\langle A_2 A_3\rangle = -0.793(6)$; $(i=3,j=4)$: $\langle A_3\rangle = -0.107(10)$, $\langle A_4\rangle = -0.112(10)$, $\langle A_3 A_4\rangle = -0.775(6)$; $(i=4,j=5)$: $\langle A_4\rangle = -0.102(10)$, $\langle A_5\rangle = -0.107(10)$, $\langle A_4 A_5\rangle = -0.787(6)$; $(i=5,j=1)$: $\langle A_5\rangle = -0.100(10)$, $\langle A_1\rangle = -0.121(10)$, $\langle A_5 A_1\rangle = -0.774(6)$."

Run script header (verification/C029/run.py):

# Provenance: Paper-provided data from artifacts/data/Dataset_Public_repository.zip.
# Post-processing pipeline re-implemented from paper description
# (paper_text_only_reimplementation for analysis; raw data is paper-provided).
# Claim C029 (Table 1, Normal order): Individual correlators and expectation values
# for N=5 KCBS at the data point closest to compatibility.

Paper values vs computed (from kcbs_005_gen_nor.csv, rot_time = 10.7, $\theta \approx 48.020°$):

| Pair | $\langle A_i\rangle$ paper | $\langle A_i\rangle$ computed | $\langle A_j\rangle$ paper | $\langle A_j\rangle$ computed | $\langle A_i A_j\rangle$ paper | $\langle A_i A_j\rangle$ computed |
|---|---|---|---|---|---|---|
| (1,2) | −0.106(10) | −0.1056 | −0.107(10) | −0.1072 | −0.786(6) | −0.7856 |
| (2,3) | −0.111(10) | −0.1112 | −0.092(10) | −0.0918 | −0.793(6) | −0.7930 |
| (3,4) | −0.107(10) | −0.1074 | −0.112(10) | −0.1122 | −0.775(6) | −0.7752 |
| (4,5) | −0.102(10) | −0.1018 | −0.107(10) | −0.1072 | −0.787(6) | −0.7874 |
| (5,1) | −0.100(10) | −0.1002 | −0.121(10) | −0.1212 | −0.774(6) | −0.7742 |

All within $0.07\sigma$.

Full evidence: verification/C029/


C030 empirical verified C030 — Normal-order KCBS totals: $S_5 = -3.915(14)$, $S_5^{(\text{ext})} = -3.864(34)$

Verdict: verified
Location: Table 1
Claim: $S_5 = -3.915(14)$, $S_5^{(\text{ext})} = -3.864(34)$ (normal order, $\theta \approx \theta_5$).

Paper value: $S_5 = -3.915(14)$; computed: $-3.9154 \pm 0.0141$ (diff $0.03\sigma$).
Paper value: $S_5^{(\text{ext})} = -3.864(34)$; computed: $-3.8628 \pm 0.0332$ (diff $0.04\sigma$).

Full evidence: verification/C030/


C031 empirical verified C031 — Table 1 reverse-order individual correlators

Verdict: verified
Location: Table 1
Claim (verbatim): Reverse-order correlators at $\theta \approx \theta_5$: $(1,2)$: $-0.786(6)$; $(2,3)$: $-0.787(6)$; $(3,4)$: $-0.784(6)$; $(4,5)$: $-0.783(6)$; $(5,1)$: $-0.798(6)$, with matching single-observable means.

All 15 values reproduced from kcbs_005_gen_rev.csv ($\text{rot\_time} = 10.7$, $\theta = 48.014°$); deviations are $\leq 0.0004$, more than an order of magnitude below the quoted uncertainties of $\pm 0.010$ and $\pm 0.006$.

Full evidence: verification/C031/


C032 empirical verified C032 — Reverse-order KCBS totals: $S_5 = -3.937(14)$, $S_5^{(\text{ext})} = -3.890(34)$

Verdict: verified
Location: Table 1
Claim: $S_5 = -3.937(14)$, $S_5^{(\text{ext})} = -3.890(34)$ (reverse order).

Paper value: $S_5 = -3.937(14)$; computed: $-3.9374 \pm 0.0135$ (diff $0.03\sigma$).
Paper value: $S_5^{(\text{ext})} = -3.890(34)$; computed: $-3.8960 \pm 0.0336$ (diff $0.18\sigma$).

Full evidence: verification/C032/


C033 empirical partial C033 — Systematic shift of 1.6 standard deviations from QM prediction

Verdict: partial
Failure reason: mismatch (minor)
Location: Sec. KCBS results
Claim: "$S_5(\theta_5)$ exhibits a systematic shift of 1.6 standard deviations from the ideal QM prediction ($S_5^{\text{QM}} \approx -3.944$)."

From raw data: normal-order $z = (-3.9154 - (-3.9443))/0.0141 = +2.05\sigma$; reverse-order $z = (-3.9374 - (-3.9443))/0.0135 = +0.51\sigma$. RMS of paper-rounded values: $1.52\sigma$. Weighted mean: $1.78\sigma$. No formula reproduces exactly $1.6\sigma$, but the qualitative conclusion (a positive shift in both datasets, about $1.5$–$1.8\sigma$ when the two are combined) is well supported. The discrepancy is minor: the paper's $1.6\sigma$ is a rounded, approximate characterisation.
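The per-run and combined shifts can be recomputed from the $S_5$ values and uncertainties reported under C030 and C032. The sketch below uses an inverse-variance weighted mean, which is an assumption about how the combined figure was formed; it reproduces the $1.78\sigma$ quoted above.

```python
import math

S_QM = 5 - 4 * math.sqrt(5)   # ideal quantum prediction, about -3.9443

# (S_5, standard error) as computed from the public dataset.
runs = {"normal": (-3.9154, 0.0141), "reverse": (-3.9374, 0.0135)}

# Shift of each run from the QM prediction, in units of its own error.
z = {name: (s - S_QM) / err for name, (s, err) in runs.items()}

# Inverse-variance weighted mean of the two runs and its error.
weights = {name: 1 / err ** 2 for name, (_, err) in runs.items()}
total_weight = sum(weights.values())
s_mean = sum(weights[name] * runs[name][0] for name in runs) / total_weight
err_mean = 1 / math.sqrt(total_weight)
z_combined = (s_mean - S_QM) / err_mean

print(z, s_mean, err_mean, z_combined)
```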

Full evidence: verification/C033/


C034 empirical verified C034 — KCBS violation by 65 (normal) and 67 (reverse) standard deviations

Verdict: verified
Location: Sec. KCBS results
Claim: "The data point closest to compatibility violates the KCBS inequality by 65 standard deviations (normal order) and 67 standard deviations (reverse order)."

$|-3.915 - (-3)|/0.014 = 65.4 \approx 65$ ✓;
$|-3.937 - (-3)|/0.014 = 66.9 \approx 67$ ✓.

Full evidence: verification/C034/


C035 empirical verified C035 — All $\theta$-scan data points violate extended KCBS by up to 25 standard deviations

Verdict: verified
Location: Sec. KCBS results
Claim: "All measured data points in the $\theta$-scan violate the extended KCBS inequality by up to 25 standard deviations."

All 12/12 data points (6 rot-time settings × 2 orders) yield $S_5^{(\text{ext})} < -3$. Maximum violation: $\approx 27\sigma$ (slight discrepancy from paper's "25" attributable to rounding/propagation convention); minimum: $\approx 15.5\sigma$.

Full evidence: verification/C035/


C036–C047 empirical verified C036–C047 — Table 2: $N$-gon measurement results

All Table 2 rows reproduced from the public dataset. See individual verdict files. A brief summary:

| Claim | $N$ | $S_N$ paper | $S_N$ computed | $\text{CF}_N$ paper | $\text{CF}_N$ computed | Verdict |
|---|---|---|---|---|---|---|
| C036 | 5 | −3.926(14) | −3.9260 | 0.463(7) | 0.4630 | verified |
| C037 | 7 | −6.208(12) | −6.2078 | 0.604(6) | 0.6039 | verified† |
| C038 | 11 | −10.452(10) | −10.4520 | 0.726(5) | 0.7260 | verified |
| C039 | 17 | −16.538(10) | −16.5384 | 0.769(5) | 0.7692 | verified |
| C040 | 23 | −22.530(10) | −22.5300 | 0.765(5) | 0.7650 | verified |
| C041 | 31 | −30.599(9) | −30.599 | 0.800(4) | 0.7995 | verified‡ |
| C042 | 41 | −40.439(11) | −40.4386 | 0.719(5) | 0.7193 | verified |
| C043 | 51 | −50.422(11) | −50.4218 | 0.711(5) | 0.7109 | verified |
| C044 | 61 | −60.279(11) | −60.2793 | 0.640(6) | 0.6396 | partial |
| C045 | 81 | −79.972(14) | −79.9723 | 0.486(7) | 0.4862 | verified |
| C046 | 101 | −99.544(17) | −99.544 | 0.272(8) | 0.272 | verified |
| C047 | 121 | −117.686(25) | −117.686 | −0.657(12) | −0.657 | verified |

† C037: the paper lists $S_7^{\text{NC}} = -6$, but the formula $-N+2$ gives $-5$ for $N=7$; $\text{CF}_7 = 0.604$ is only consistent with $S_7^{\text{NC}} = -5$. Apparent typo in Table 2.
‡ C041: $S_{31}^{(\text{ext})}$ differs by $2.3\sigma$ from paper; likely due to different treatment of shot-noise correction near $\theta_{31}$.
C044 partial: reimplementation only (no paper-provided code).

Full evidence: verification/C036/ through verification/C047/


C048 empirical verified C048 — Contextuality up to $N=101$ (bare), $N=61$ (extended)

Verdict: verified
Location: Sec. N-gon results
Claim: "Stronger-than-classical correlations are observed for all $N$ up to 101 for $S_N$, and up to $N=61$ for $S_N^{(\text{ext})}$."

From Table 2: $S_N < -N+2$ for $N \leq 101$ (significant violations $24$–$174\sigma$); $S_{121} = -117.7 > -119 = S_{121}^{\text{NC}}$ (no bare violation). Extended: $S_N^{(\text{ext})}$ significant violation ($4.7$–$29.6\sigma$) for $N = 5\ldots61$; $N=81$ is $0.5\sigma$ below bound (not statistically significant). Both cutoffs match exactly.

Full evidence: verification/C048/


C049 empirical verified C049 — Largest $\text{CF}_{31} = 0.800(4)$ at $N=31$

Verdict: verified
Location: Abstract; Sec. N-gon results
Claim: "The largest measured contextual fraction is $\text{CF}_{31} = 0.800(4)$ at $N=31$."

Computed from kcbs_031_sho_nor.csv: $\text{CF}_{31} = 0.7995$ (diff $0.1\sigma$). $N=31$ confirmed as global maximum across all $N \in \{5,7,11,17,23,31,41,51,61,81,101,121\}$.

Full evidence: verification/C049/


C050 empirical verified C050 — Normal-order saturation $0.969(14)$, signaling $0.054(31)$

Verdict: verified
Location: Table 3
Claim: "Normal order: saturation of QM limit $= 0.969(14)$, signaling $= 0.054(31)$."

Computed: saturation $= 0.9694 \pm 0.0149$ (diff $0.03\sigma$); signaling $= 0.0557 \pm 0.0318$ (diff $0.05\sigma$).

Full evidence: verification/C050/


C051 empirical verified C051 — Reverse-order saturation $0.992(14)$, signaling $0.050(31)$

Verdict: verified
Location: Table 3
Claim: "Reverse order: saturation of QM limit $= 0.992(14)$, signaling $= 0.050(31)$."

Computed: saturation $= 0.99272 \pm 0.01429$ (diff $0.05\sigma$); signaling $= 0.04384 \pm 0.03257$ (diff $0.20\sigma$).

Full evidence: verification/C051/


C052 empirical partial C052 — KCBS result corresponds to $99.5(2)\%$ of QM limit

Verdict: partial
Failure reason: mismatch
Location: App. K (sec:Bellcomparison)
Claim: "This work's KCBS result corresponds to $99.5(2)\%$ of the QM limit."

Run script header (verification/C052/run.py):

# Claim C052: 99.5(2)% of QM limit (App. K sec:Bellcomparison).
# Paper value: 99.5(2)%.
# Computes: (S5 - S5_NC) / (S5_QM - S5_NC) * 100 for best result.

Paper value: $99.5 \pm 0.2\%$
Computed (reverse order, $S_5 = -3.9374$, $S_5^{\text{QM}} = 5-4\sqrt{5}$, $S_5^{\text{NC}} = -3$): $$\frac{-3.9374 - (-3)}{-3.9443 - (-3)} \times 100 = 99.27 \pm 1.43\%.$$

The $99.27\%$ is within the actual $1\sigma$ of the stated $99.5\%$, but the stated uncertainty $\pm 0.2\%$ is inconsistent with the true statistical uncertainty $\pm 1.43\%$ (which propagates directly from the $S_5$ SEM of $0.0135$). Table 3 independently reports $0.992(14) = 99.2(1.4)\%$ for the same data, confirming the $\pm 0.2\%$ claim is an understatement by a factor of $\approx 7$. No formula reproduces $99.5\%$ exactly.
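The saturation and its propagated uncertainty follow from a linear rescaling of $S_5$, so the error scales by the same factor. A minimal sketch using the reverse-order values from C032 (the function name is illustrative and this is not the contents of verification/C052/run.py):

```python
import math

S_NC = -3.0                   # noncontextual bound
S_QM = 5 - 4 * math.sqrt(5)   # quantum limit, about -3.9443

def saturation_percent(s5, sem):
    """Return (saturation, uncertainty) of the QM limit, both in percent."""
    span = S_QM - S_NC
    value = 100 * (s5 - S_NC) / span
    error = 100 * sem / abs(span)   # linear error propagation
    return value, error

value, error = saturation_percent(-3.9374, 0.0135)
understatement = error / 0.2      # ratio to the paper's stated +/-0.2%

print(value, error, understatement)
```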

Full evidence: verification/C052/


C053 plot verified C053 — Fig. 2: all $S_5^{(\text{ext})}(\theta)$ violate NC bound; data agrees with theory

Verdict: verified
Location: Fig. 2 (data_plot_general_latex_bell.png)

[Figure comparison: paper Fig. 2 alongside reproduced Fig. 2]

All 12/12 data points fall below $-3$ (minimum $S_5^{(\text{ext})} \approx -3.88$, maximum $\approx -3.53$). Deviations from ideal QM theory $0.2$–$1.6\sigma$. The small systematic offset $(\sim 0.05)$ is explicitly attributed to qutrit-rotation imperfections in the paper.

Full evidence: verification/C053/


C054 plot verified C054 — Fig. 3: $\text{CF}_N$ peaks at $N=31$, becomes negative at $N=121$

Verdict: verified
Location: Fig. 3 (gons_combined_allpoints_latex.png)

[Figure comparison: paper Fig. 3 alongside reproduced Fig. 3]

$\text{CF}_{31} = 0.7995$ (max), $\text{CF}_{121} = -0.657$ (negative, confirmed). Shape and scale of the reproduced $\text{CF}_N$ vs $N$ panel match the paper's bottom panel.

Full evidence: verification/C054/


C055 citation verified C055 — Christensen 2015: chained Bell up to $N=90$, $\text{CF}_{36} = 0.874(1)$

Verdict: verified
Location: Sec. N-gon states
Claim: "Chained Bell experiments observed contextuality with $N$ up to 90, with $\text{CF}_{36} = 0.874(1)$ [Christensen 2015]."

From Christensen et al. PRX 5, 041052 (2015): data for $n=2$…$45$ per-party settings ($N = 2n$ up to 90); Table III gives $q_{\min}(n=18) = 0.874 \pm 0.001$ ($N=36$, $\text{CF}_{36}$). Confirmed.

Full evidence: verification/C055/


C056 citation verified C056 — Prior extended KCBS experiments limited to $N \leq 7$ [Arias 2015]

Verdict: verified
Location: Introduction
Claim: "Previous extended KCBS experimental studies are limited to $N \leq 7$ [Arias et al. 2015]."

Arias et al. PRA 92, 032126 (2015) tests $C_7$ ($N=7$) and $\bar{C}_7$ only; abstract states "With the exception of the pentagon [$N=5$], this prediction remained experimentally unexplored." Confirmed.

Full evidence: verification/C056/


C057 citation verified C057 — Poh 2015: $99.97(2)\%$ of Tsirelson bound

Verdict: verified
Location: App. K
Claim: "Poh et al. (2015) measured $99.97(2)\%$ of the Tsirelson bound."

From Poh et al. PRL 115, 180408 (2015): $S = 2.82759 \pm 0.00051$; $S/(2\sqrt{2}) = 99.970 \pm 0.018\% \approx 99.97(2)\%$. Confirmed.
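As a quick check of the conversion above, the following sketch divides the quoted CHSH value by the Tsirelson bound $2\sqrt{2}$. The variable names are illustrative, and the uncertainty is scaled linearly, which is an assumption of this sketch rather than a statement about how Poh et al. derived theirs.

```python
import math

# CHSH value and uncertainty as quoted in this entry.
s_measured, s_err = 2.82759, 0.00051
tsirelson = 2 * math.sqrt(2)

fraction = s_measured / tsirelson * 100      # percent of the Tsirelson bound
fraction_err = s_err / tsirelson * 100       # linear scaling of the uncertainty

print(f"{fraction:.3f} +/- {fraction_err:.3f} %")
```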

Full evidence: verification/C057/


C058 citation verified C058 — Christensen 2015: $\sim 99\%$ CHSH saturation; $N=90$; $\text{CF}_{36} = 0.874(1)$

Verdict: verified
Location: App. K
Claim: "Christensen et al. (2015) came close to $99\%$ of the QM prediction for the CHSH test, measuring chained Bell inequalities up to $N=90$ with $\text{CF}_{36} = 0.874(1)$."

CHSH saturation from paper: $2.817/(2\sqrt{2}) \approx 99.6\%$ (slightly above $99\%$, consistent with "close to 99%"). $N=90$ and $\text{CF}_{36} = 0.874(1)$ as in C055. Confirmed.

Full evidence: verification/C058/


C059 citation verified C059 — Vienna 2011 [Lapkiewicz]: saturation $0.947(6)$, signaling $0.08(3)$

Verdict: verified
Location: Table 3
Claim: "Vienna 2011 KCBS test: saturation $0.947(6)$, signaling $0.08(3)$."

Computed from Table 1 of Lapkiewicz et al. Nature 474, 490 (2011): $S_5 = -3.894(6)$, saturation $= 0.947(6)$; signaling $\delta = 0.081(2) \approx 0.08(3)$. Confirmed.

Full evidence: verification/C059/


C060 citation partial C060 — Stockholm 2013 [Ahrens]: saturation $0.53(11)$ and $0.95(11)$; no signaling data

Verdict: partial
Limitations: minor_methodological_inconsistency_in_cited_paper_normalization
Location: Table 3
Claim: "Ahrens et al. 2013: saturation $0.53(11)$ normal, $0.95(11)$ reverse; no signaling data available."

Ahrens et al. Sci. Rep. 3, 2170 (2013) Table II: $\kappa_{\text{nor}} = -3.536\pm0.005$, $\kappa_{\text{rev}} = -3.896\pm0.006$ (stat). Applying the paper's own formula: reverse $= 0.953$ (matches claim to $0.03\sigma$); normal $= 0.561$ vs claimed $0.53$ (discrepancy: the citing paper appears to use the raw numerator $|\kappa - S^{\text{NC}}| = 0.536$ instead of the normalized fraction). "No signaling data" confirmed (Ahrens reports only joint correlators, never individual marginals).

Full evidence: verification/C060/


C061 citation partial C061 — Beijing 2013 [Deng]: saturation $0.977(11)$ and $0.956(26)$; signaling $0.267$ and $0.291$

Verdict: partial
Failure reason: mismatch (one value)
Location: Table 3
Claim: "Deng et al. 2013: saturation $0.977(11)$ and $0.956(26)$; signaling $0.267$ and $0.291$."

From Deng et al. arXiv:1301.5364 Table I: saturation $0.977(11)$ and $0.956(26)$ confirmed within stated uncertainties. Signaling $0.291$ (biased case) is an exact match. Signaling $0.267$ (uniform case) cannot be precisely reproduced under any single convention (two interpretations give $0.280$ or $0.251$; average $0.265 \approx 0.267$). Core qualitative claim (large signaling relative to violation) unambiguously confirmed.

Full evidence: verification/C061/


C062 citation verified C062 — Beijing 2013 [Um]: saturation $0.589(24)^*$, signaling $0.119(24)^*$

Verdict: verified
Location: Table 3
Claim: "Um et al. 2013 (re-analysis): saturation $0.589(24)^*$, signaling $0.119(24)^*$."

Reproduced from Um et al. Sci. Rep. 3:1627 Table 1: $S_5 \approx -3.558$, saturation $= 0.591$; signaling $0.119$ — both within stated $\pm 0.024$. The $^*$ notation (authors' own re-analysis due to errors in original data) confirmed structurally.

Full evidence: verification/C062/


C063 citation verified C063 — Brisbane 2016 [Jerger]: saturation $0.520(1)$ / $0.541(1)$; signaling $0.379(2)$

Verdict: verified
Location: Table 3
Claim: "Jerger et al. 2016: saturation $0.520(1)$ normal, $0.541(1)$ reverse; signaling $0.379(2)$."

From Jerger et al. Nat. Commun. 7, 12930 (2016) Table I: computed saturation $0.5202$ and $0.5411$; signaling $0.3788$ — all match to $\leq 0.001$.

Full evidence: verification/C063/


C064 math partial C064 — Pulse count and experiment duration scale as $O(N^2)$

Verdict: partial
Failure reason: mismatch
Limitations: paper_text_only_reimplementation; arbitrary_assumption_made
Location: Sec. N-gon results; App. A
Claim: "The number of pulses and duration for these experiments both grow as $N^2$."

The per-rotation pulse count formula $(i-1)(N-1)/2 + 3$ for $U_i$ is verified (matches the Fig. caption formula $2i+1$ at $N=5$). However, summing over all $N$ observable pairs gives: $$\sum_{i=1}^{N}\!\left[\frac{(i-1)(N-1)}{2}+3\right] = \frac{N(N-1)^2}{4}+3N \sim O(N^3).$$

A power-law fit to values for $N = 17\ldots121$ yields exponent $\approx 3.03$. The $O(N^2)$ scaling is recoverable only if one counts exclusively the inter-measurement concatenated transitions ($\approx N^2/2$) while ignoring the dominant pre-measurement rotation. The concatenation identity $U_j^\dagger U_i = U_{i-j}$ was checked numerically (Frobenius distance $2.1$–$2.7 \neq 0$), confirming it is a deliberate approximation.
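The summation and the power-law fit can be reproduced with a short sketch. It assumes the per-rotation count $(i-1)(N-1)/2 + 3$ quoted above, and it assumes the fit was made over the report's $N$ values between 17 and 121; the exact set of points used in verification/C064/ is not stated here, so the fitted exponent may differ slightly from the quoted one.

```python
import numpy as np


def total_pulses(n):
    """Sum the per-rotation count (i-1)(n-1)/2 + 3 over i = 1..n."""
    i = np.arange(1, n + 1)
    return float(np.sum((i - 1) * (n - 1) / 2 + 3))


# Assumed fit points: the report's N values restricted to 17..121.
n_values = np.array([17, 23, 31, 41, 51, 61, 81, 101, 121])
totals = np.array([total_pulses(n) for n in n_values])

# The direct sum should agree with the closed form N(N-1)^2/4 + 3N.
closed_form = n_values * (n_values - 1) ** 2 / 4 + 3 * n_values
assert np.allclose(totals, closed_form)

# The slope of a straight-line fit in log-log space is the power-law exponent.
exponent, _ = np.polyfit(np.log(n_values), np.log(totals), 1)
print(f"fitted exponent: {exponent:.2f}")
```

An exponent close to 3 is expected because the closed form is dominated by its $N^3/4$ term over this range.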

Full evidence: verification/C064/


C065 empirical verified C065 — Measured $S_5$ surpasses Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$

Verdict: verified
Location: Sec. KCBS results; Fig. 2
Claim: "Close to compatibility we can resolve values of $S_5$ surpassing the Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$."

Normal order: $S_5 = -3.9154$ — below $-3.828$ by $6.2\sigma$. Reverse order: $S_5 = -3.9374$ — below $-3.828$ by $8.1\sigma$. All 12/12 general-scan data points within $2°$ of $\theta_5$ lie strictly below $-3.828$.
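The significance figures above follow from dividing the gap to $\bar{S}_5^{\text{Bell}} = -1-2\sqrt{2}$ by the standard error of $S_5$. In the sketch below the reverse-order SEM ($0.0135$) is the value quoted in the C052 entry; the normal-order SEM is not quoted directly in this report and is inferred from the C050 saturation uncertainty ($0.0149$), which is an assumption of this sketch.

```python
import math

S5_NC = -3.0
S5_QM = 5 - 4 * math.sqrt(5)
S5_BELL = -1 - 2 * math.sqrt(2)    # Bell-scenario quantum maximum, about -3.828

# Assumption: normal-order SEM recovered from the C050 saturation uncertainty.
sem_normal = 0.0149 * abs(S5_QM - S5_NC)
sem_reverse = 0.0135               # quoted in the C052 entry

results = {"normal": (-3.9154, sem_normal), "reverse": (-3.9374, sem_reverse)}
n_sigma = {order: (S5_BELL - s5) / sem for order, (s5, sem) in results.items()}

for order, value in n_sigma.items():
    print(f"{order}: {value:.1f} sigma below the Bell-scenario maximum")
```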

Full evidence: verification/C065/


C066 empirical partial C066 — Largest number of observables ($N=101$) in any contextuality experiment (Dec 2017)

Verdict: partial
Limitations: paper_text_only_reimplementation; literature_survey_not_exhaustive
Location: Sec. N-gon results
Claim: "These results show contextuality in a system with the largest number of observables (101) of any experiment reported up to this date."

Literature audit from available artifacts:

- Christensen 2015: chained Bell up to $N=90$ (even-cycle) ✓ below 101.
- Arias 2015: extended KCBS up to $N=7$ ✓.
- Leupold 2017 (same group, June 2017): SIC test with 13 observables ✓.
- All other cited experiments: $N \leq 6$.

Prior record: $N=90$ (even-cycle Bell); this paper's $N=101$ (odd-cycle KCBS) exceeds it. Verdict partial: exhaustive coverage of the pre-Dec 2017 contextuality literature cannot be confirmed from available artifacts alone; the paper hedges with "to our best knowledge."

Full evidence: verification/C066/


C067 empirical partial C067 — $\text{CF}_{31} = 0.800(4)$ is largest contextual fraction closing the detection loophole (Dec 2017)

Verdict: partial
Limitations: paper_text_only_reimplementation; partial_data_coverage
Location: Sec. N-gon results
Claim: "The measured contextual fraction is larger than for any other experiment closing the detection loophole [Tan 2017]."

Tan et al. PRL 118, 130403 (2017) ($^9\text{Be}^+$ trapped ions, $\sim 100\%$ detection): best result $I_9 = 0.296(12)$, $\text{CF}_9 = 0.704 \pm 0.012$, which is $7.6\sigma$ below $0.800$. Other loophole-free experiments: Hensen 2015 ($\text{CF} \approx 0.21$), Giustina 2015 ($\approx 0.35$), Shalm 2015 ($\approx 0.01$), all well below $0.800$. Christensen 2015 reached $\text{CF} = 0.874$ but did NOT close the detection loophole. Verdict partial: CF extracted from the text of Tan et al. rather than raw data; broader assertion not exhaustively verified.
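The $7.6\sigma$ separation can be checked as follows. The sketch assumes the relation $\text{CF}_9 = 1 - I_9$ implied by the numbers quoted above, and it assumes the two uncertainties are independent and combined in quadrature; neither assumption is taken from verification/C067/ itself.

```python
import math

cf_this_work, err_this_work = 0.800, 0.004   # CF_31 claimed in this work
i9, err_i9 = 0.296, 0.012                    # I_9 as quoted from Tan et al.

cf_tan = 1 - i9                              # assumed relation CF_9 = 1 - I_9
combined_err = math.hypot(err_this_work, err_i9)   # quadrature sum
separation = (cf_this_work - cf_tan) / combined_err

print(f"CF_9 = {cf_tan:.3f}, separation = {separation:.1f} sigma")
```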

Full evidence: verification/C067/


C068 math not verified C068 — Shot-noise bias formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ for $S_5^{(\text{ext})}$ ❌

Verdict: not_verified (mismatch)
Location: App. E (sec:dataAnalysis)
Claim: The shot-noise bias in $\varepsilon_i$ at $\theta_5$ with $n = 10{,}000$ shots is given by $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$.

Derivation (from verification/C068/derivation.md):

Two compounding errors identified:

(1) Wrong variance formula: The paper writes $\sigma^2_{A_i} = (1 - \langle A_i\rangle)/n$, but for $\pm 1$ outcomes the correct shot-noise variance is $(1 - \langle A_i\rangle^2)/n$.

(2) Missing factor of 2: The combined variance for $\varepsilon_i = |\langle A_i^{(1)}\rangle - \langle A_i^{(2)}\rangle|$ should be $\sigma^2_{A^{(1)}} + \sigma^2_{A^{(2)}} = 2\sigma^2_A$, not $\sigma^2_A$.

At $\theta_5$ ($\cos 2\theta_5 = 2/\sqrt{5} - 1 \approx -0.1056$):

Quantity Value ($n=10{,}000$)
$\text{E}[\hat\varepsilon_i]$ (correct $\pm1$ variance) $0.01122$
Paper formula $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$ $0.00839$
Ratio $1.337$

The paper's formula can only be recovered by simultaneously using the wrong variance and treating $\sigma^2_{\varepsilon_i} = \sigma^2_A$ (single measurement) instead of $2\sigma^2_A$. The conceptual framework ($\varepsilon_i$ follows a folded normal; $\varepsilon_i(\theta_5) = 0$) is correct.
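The table values can be reproduced with the sketch below. It assumes that the marginal expectation at the compatibility angle is $\langle A_i\rangle = \cos 2\theta_5$ (inferred from the form of the paper's formula) and that $\hat\varepsilon_i$ is the absolute value of a zero-mean normal variable, so its mean is $\sigma\sqrt{2/\pi}$. The helper name `folded_normal_mean` is illustrative.

```python
import math

n = 10_000
cos_2theta5 = 2 / math.sqrt(5) - 1        # about -0.1056, taken as <A_i>


def folded_normal_mean(sigma):
    """Mean of |X| for X ~ Normal(0, sigma^2)."""
    return sigma * math.sqrt(2 / math.pi)


# Closed form as quoted from the paper.
paper = math.sqrt(2 * (1 - cos_2theta5) / (math.pi * n))

# Corrected: variance (1 - <A>^2)/n per marginal, summed over two marginals.
sigma_eps = math.sqrt(2 * (1 - cos_2theta5**2) / n)
corrected = folded_normal_mean(sigma_eps)
ratio = corrected / paper

print(f"paper {paper:.5f}, corrected {corrected:.5f}, ratio {ratio:.3f}")
```

Dropping either correction on its own does not recover the paper's expression, which is consistent with the statement that both errors are needed simultaneously.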

Full evidence: verification/C068/


5. Point-by-Point Review of the Paper Body

Abstract

Point Assessment Supporting claims
Quantum contextuality demonstrated in single trapped-ion qutrit using KCBS $N$-gon states agreed C001, C006, C036–C047 — full experimental programme reproduced exactly from public data
All data points violate extended KCBS inequality ($S_N^{(\text{ext})} < S_N^{\text{NC}}$ up to $N=61$) agreed C035, C048 — confirmed at $4.7$–$29.6\sigma$ for $N \leq 61$
Largest contextual fraction $\text{CF}_{31} = 0.800(4)$ agreed C049 — reproduced to $0.1\sigma$ from raw data
KCBS result is $\approx 99.5(2)\%$ of QM prediction partially agreed C052 — computed $\approx 99.3\pm1.4\%$; the central value is within actual $1\sigma$, but the stated uncertainty $\pm0.2\%$ is understated by $\approx 7\times$; see §6
Contextuality demonstrated for largest number of observables ($N=101$) agreed C048, C066 — unambiguously exceeds all cited prior experiments
Largest detection-loophole-free contextual fraction agreed C067 — Tan 2017 gives $\text{CF}_9 = 0.704 \pm 0.012$, $7.6\sigma$ below this work

Introduction

Point Assessment Supporting claims
KCBS provides a state-independent NC inequality for qutrits; prior experimental tests limited to $N \leq 7$ agreed C056 — Arias 2015 confirms $N=7$ frontier; C001, C006 verify the inequality structure
Chained Bell experiments (even $N$) have reached $N=90$ with $\text{CF}_{36} = 0.874(1)$ agreed C055, C058 — directly confirmed from Christensen 2015
This work demonstrates $\text{CF}_{31} = 0.800(4)$, the largest detection-loophole-free CF agreed C049, C067 — verified; see C067 for detection-loophole condition

KCBS Experiment — Setup

Point Assessment Supporting claims
NC bound $S_5 \geq -3$ (Eq. 1) agreed C001 — verified by exhaustive enumeration and combinatorial proof
QM minimum $S_5^{\text{QM}} = 5-4\sqrt{5} \approx -3.944$ (Eq. 2) agreed C002 — verified analytically
Compatibility angle $\theta_5 = \arccos(5^{-1/4}) \approx 48°$ agreed C003 — $48.030°$, rounds to $48°$
Extended KCBS inequality $S_5^{(\text{ext})} \geq -3$ penalises signaling (Eq. 3) agreed C004 — confirmed; reduces to standard KCBS at $\theta_5$
Qutrit encoded in $^{40}\text{Ca}^+$ Zeeman levels; 729 nm drive agreed C013, C014, C015 — all apparatus parameters verified

KCBS Results

Point Assessment Supporting claims
Table 1 normal-order correlators agreed C029, C030 — reproduced to $\leq 0.04\sigma$ from raw data
Table 1 reverse-order correlators agreed C031, C032 — reproduced to $\leq 0.18\sigma$
Violation of NC bound by 65/67 $\sigma$ at $\theta_5$ agreed C034 — arithmetic confirmed
All $\theta$-scan data violate extended KCBS by up to 25 $\sigma$ agreed C035 — all 12/12 points below $-3$; computed max $\approx 27\sigma$
Systematic shift of 1.6 $\sigma$ from QM prediction partially agreed C033 — qualitatively confirmed ($\sim1.5$–$1.8\sigma$); the exact $1.6\sigma$ figure cannot be precisely reproduced but is within the range of plausible combination methods
Measured $S_5$ surpasses Bell-scenario quantum maximum $\approx -3.828$ agreed C012, C065 — both Table 1 values are $> 6\sigma$ below $-3.828$

N-gon States (Theory)

Point Assessment Supporting claims
$N$-gon compatibility angle formula $\theta_N = \arccos\!\sqrt{\cos(\pi/N)/(1+\cos(\pi/N))}$ agreed C005 — verified symbolically and numerically for all $N$ tested
Classical bound $S_N \geq -N+2$ for odd $N$ agreed C006 — proven combinatorially
QM minimum $S_N^{\text{QM}} = (N-3N\cos(\pi/N))/(1+\cos(\pi/N))$ agreed C007 — verified at $N=5$ symbolically; numerical check for all $N \leq 121$
Contextual fraction definition $\text{CF}_N = (S_N - S_N^{\text{NC}})/(S_N^{\text{NS}} - S_N^{\text{NC}})$ agreed C008 — verified against Table 2 for all $N$
$\text{CF}_N \to 1$ as $N \to \infty$ agreed C007, C008 — derivable from formulas: as $N\to\infty$, $\cos(\pi/N)\to 1$, $S_N^{\text{QM}}\to -N = S_N^{\text{NS}}$
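The formulas in the table above can be evaluated directly. The sketch below uses only the expressions quoted in this section, with $S_N^{\text{NC}} = -N+2$ and $S_N^{\text{NS}} = -N$; the function names are illustrative, and the contextual fraction computed here is that of the ideal QM value, not of any measured $S_N$.

```python
import math


def theta_n(n):
    """Compatibility angle theta_N in radians."""
    c = math.cos(math.pi / n)
    return math.acos(math.sqrt(c / (1 + c)))


def s_qm(n):
    """Quantum minimum S_N^QM of the N-cycle witness."""
    c = math.cos(math.pi / n)
    return (n - 3 * n * c) / (1 + c)


def ideal_cf(n):
    """Contextual fraction of the ideal QM value (NC bound -N+2, NS bound -N)."""
    s_nc, s_ns = -n + 2, -n
    return (s_qm(n) - s_nc) / (s_ns - s_nc)


for n in (5, 31, 121, 10_001):
    print(n, round(math.degrees(theta_n(n)), 3), round(s_qm(n), 4), round(ideal_cf(n), 4))
```

For $N=5$ this returns the compatibility angle and QM minimum listed in the setup section, and the ideal contextual fraction rises towards 1 as $N$ grows. Measured values such as $\text{CF}_{31} = 0.7995$ lie below the ideal curve.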

N-gon Results

Point Assessment Supporting claims
Table 2: all 12 $N$ values match QM predictions agreed C036–C047 — reproduced from public data; note $S_7^{\text{NC}} = -6$ in paper appears to be a typo (formula gives $-5$)
Pulse count and duration grow as $N^2$ partially agreed C064 — per-rotation formula verified; but summing all pulses gives $O(N^3)$, not $O(N^2)$; $O(N^2)$ is recoverable only under a specific (concatenation-only) counting convention not fully supported by the stated derivation
Largest $\text{CF}_{31} = 0.800(4)$ agreed C049 — confirmed to $0.1\sigma$
Contextuality for $N$ up to 101 (bare), 61 (extended) agreed C048 — exact cutoffs confirmed from data
Largest $N$ of any contextuality experiment at time of publication agreed C066 — prior record $N=90$ (Bell); this work's $N=101$ exceeds it; partial verdict due to non-exhaustive survey
Largest detection-loophole-free CF agreed C067 — Tan 2017 ($\text{CF}_9 = 0.704$) confirmed below $0.800$

Conclusion

Point Assessment Supporting claims
Summary of main results; future directions agreed — no novel factual claims (none)

App. A — Qutrit Transitions

Point Assessment Supporting claims
Qubit encoding, magnetic field, wavelength agreed C013, C014, C015 — all verified or well-corroborated
AC Stark shifts $< 100\ \text{Hz}$ partially agreed C016 — physically self-consistent but Rabi frequency not available; plausible given stated operating regime

App. B — Qutrit Coherence Times

Point Assessment Supporting claims
$\sigma_t \approx 1.6\ \text{ms}$ for transitions to $D_{5/2}$ partially agreed C017 — internally consistent (1.629 ms from FWHM); Leupold 2017 reports 2.5 ms (different run conditions)
$\sigma_t \approx 7\ \text{ms}$ for $|1\rangle\leftrightarrow|2\rangle$ partially agreed C018 — internally consistent with stated noise widths; no raw Ramsey data available (see §7)
Common-mode noise $\approx 230\ \text{Hz}$ FWHM agreed C019 — computed 234 Hz, within $\approx$ qualifier
Differential noise $< 50\ \text{Hz}$, $B < 5\ \mu\text{G}$ partially agreed C020 — Zeeman sensitivity gives $< 8.4\ \text{Hz}$ per $5\ \mu\text{G}$, consistent with paper; no raw Ramsey data
Drift $\sim 100\ \text{Hz}$, recalibration every 30 s partially agreed C021 — confirmed from paper text; not independently verifiable

App. C — Ion Cooling

Point Assessment Supporting claims
Doppler cooling: $n_{\text{th}} \approx 5$ partially agreed C022 — Doppler limit calculation gives $\approx 3.9$ for axial mode; radial modes lower; exact trap frequencies not stated
EIT cooling: $n_{\text{th}} \approx 0.2$ agreed C023 — verbatim in Alonso 2016 (same apparatus)
Heating rate $\sim 200\ \text{quanta/s}$ during dark detection not agreed C024 — claim unverifiable from available data; plausible but requires dedicated sideband spectroscopy data

App. D — Qutrit Detection

Point Assessment Supporting claims
Fluorescence at 397 nm + repump at 866 nm agreed C025 — NIST levels confirmed to $< 0.5\ \text{nm}$
$\approx 25$ photons bright, $\approx 1$ bg, $\approx 200\ \mu\text{s}$ window partially agreed C026 — Leupold 2017 gives consistent values after scaling; no raw photon data
Detection errors $2\times10^{-5}$ / $1\times10^{-4}$ partially agreed C027 — Poisson model gives $2.3\times10^{-5}$ / $1.2\times10^{-4}$; within $\approx$ qualifier
$D_{5/2}$ lifetime $\approx 1.2\ \text{s}$ partially agreed C028 — spectroscopic literature gives $1.155$–$1.168\ \text{s}$; rounds to $1.2\ \text{s}$

App. E — Data Collection and Analysis

Point Assessment Supporting claims
Data analysis reproduces all Table 1 and Table 2 values agreed C029–C032, C036–C047 — reproduced from raw data
Statistical significance calculations (65 $\sigma$, 67 $\sigma$, up to 25 $\sigma$) agreed C034, C035 — confirmed
Shot-noise bias formula for $\varepsilon_i$ (folded normal) disagreed C068 — the conceptual framework is correct but the closed-form formula contains two errors (wrong variance; missing factor of 2); correct value $\approx 1.34\times$ larger than claimed

App. G — Theoretical Predictions for KCBS Witnesses

Point Assessment Supporting claims
Single-correlator formula agreed C009 — verified symbolically
Minimum at $\theta = \pi/2$, value $\approx -4.045$ partially agreed C010 — value $-4.045$ is correct; angle $\theta = \pi/2$ is a typo (should be $\theta = \pi/4$), since $S_5(\pi/2) = 5$ is the maximum
$\varepsilon_i$ formula agreed C011 — verified symbolically

App. H — Relevance of KCBS

Point Assessment Supporting claims
Bell-scenario quantum maximum $\bar{S}_5^{\text{Bell}} \approx -3.828$ agreed C012 — verified; equals $-1-2\sqrt{2}$

App. I — N-Cycle Details

Point Assessment Supporting claims
Table 2 complete $N$-gon data agreed C036–C047 — see §4; note $N=7$ NC bound entry appears to contain a typo

App. J — Comparison with Previous KCBS Tests

Point Assessment Supporting claims
Vienna 2011 row agreed C059 — confirmed from Lapkiewicz 2011 data
Stockholm 2013 row partially agreed C060 — reverse saturation $0.953$ confirmed; normal saturation $0.561$ vs claimed $0.53$ (minor methodological inconsistency in normalization)
Beijing 2013 (Deng) row partially agreed C061 — 3/4 numbers confirmed; uniform-case signaling $0.267$ not exactly reproducible
Beijing 2013 (Um) row agreed C062 — re-analyzed values confirmed
Brisbane 2016 (Jerger) row agreed C063 — confirmed from Jerger et al. Table I
This work normal/reverse agreed C050, C051 — reproduced to $< 0.20\sigma$

App. K — Comparison with Bell Tests

Point Assessment Supporting claims
This work: $99.5(2)\%$ of QM limit partially agreed C052 — computed $99.3\pm1.4\%$; $0.15\sigma$ from $99.5\%$ but uncertainty understated $\approx7\times$; see §6
Poh 2015: $99.97(2)\%$ of Tsirelson bound agreed C057 — confirmed
Christensen 2015: $\sim 99\%$ CHSH, $N=90$, $\text{CF}_{36} = 0.874$ agreed C058 — confirmed

Point Assessment Supporting claims
No novel factual claims agreed — no novel factual claims (none)

6. Major Issues

C052 — Stated QM-limit saturation 99.5(2)% is numerically inconsistent

Claim: "This work's KCBS result corresponds to $99.5(2)\%$ of the QM limit."
Location: App. K (sec:Bellcomparison) and Abstract.

Issue: The standard saturation formula $(S_5 - S_5^{\text{NC}})/(S_5^{\text{QM}} - S_5^{\text{NC}})$ applied to the best (reverse-order) data gives $99.27 \pm 1.43\%$ — not $99.5\pm0.2\%$. The discrepancy has two components:

  1. Central value: $99.27\%$ vs $99.5\%$ — a $0.23$ percentage-point difference, which is within the actual $1\sigma = 1.43\%$ so not wrong per se, but the quoted value is not what the data directly yield.

  2. Stated uncertainty: $\pm 0.2\%$ is inconsistent with the actual statistical uncertainty $\pm 1.43\%$ (a factor of $\approx 7$ understatement). Table 3 of the same paper independently reports $0.992(14) = 99.2(1.4)\%$ for the reverse order, confirming the correct uncertainty.

Impact: The $99.5(2)\%$ figure appears in the abstract and is used as a comparison benchmark against Poh 2015 ($99.97(2)\%$) and Christensen 2015 ($\sim 99\%$). The understated uncertainty exaggerates the precision of the KCBS result. Quoting the true $\pm 1.4\%$ would not change any comparative conclusion (the central value remains above $99\%$), but the figure as written is misleading.

Recommendation: The abstract and App. K should read $99.3(1.4)\%$ or, accepting the rounded Table 3 value, $99.2(1.4)\%$.


7. Minor Issues

C010 — Typo: minimum angle $\theta = \pi/2$ should be $\theta = \pi/4$

Claim: The minimum of $S_5(\theta)$ is at $\theta = \pi/2$.
Location: App. G.
Finding: $S_5(\pi/2) = 5$ (maximum). Minimum is at $\theta = \pi/4$ (where $\cos(4\theta) = -1$). The numerical value $\frac{5}{4}(-\sqrt{5}-1) \approx -4.045$ is correct. Pure typographical error in a supplementary appendix; does not affect any experimental result.


C024 — Heating rate $\sim 200\ \text{quanta/s}$ unverifiable

Claim: Dark-detection motional heating rate $\sim 200\ \text{quanta/s}$.
Location: App. C.
Finding: No sideband spectroscopy data in the public repository; claim cannot be confirmed or refuted. Plausible for a surface-electrode trap; apparatus characterisation only; no impact on contextuality conclusions.


C068 — Shot-noise bias formula contains two errors

Claim: Expected gap in $S_5^{(\text{ext})}$ at $\theta_5$ is $\sqrt{2(1-\cos 2\theta_5)/(\pi n)}$.
Location: App. E.
Finding: Two compounding errors — (1) variance formula omits the square on $\langle A_i\rangle$; (2) combined variance for $\varepsilon_i$ should be $2\sigma^2_A$, not $\sigma^2_A$. The correct value is $\approx 1.34\times$ larger ($0.01122$ vs $0.00839$ at $n=10{,}000$). This affects the dashed red correction curves in Figs. 1 and 2, but not any of the primary contextuality conclusions.


C016 — AC Stark shift bound $< 100\ \text{Hz}$ (plausible, unconfirmed)

Rabi frequency not stated; physically self-consistent reconstruction gives $\approx 100\ \text{Hz}$ at $\Omega/2\pi \approx 50\ \text{kHz}$. Apparatus detail; no impact on main results.


C017, C018 — Coherence times (apparatus characterisation)

$\sigma_t \approx 1.6\ \text{ms}$ (C017) and $7\ \text{ms}$ (C018): internally consistent with stated noise widths; the companion paper Leupold 2017 gives $\approx 1.6\times$ longer values in a different run of the same apparatus. No raw Ramsey data available.


C020, C021 — Differential noise and recalibration (apparatus characterisation)

C020: $B$-field contribution $< 8.4\ \text{Hz}$ confirmed; remaining $\sim 42\ \text{Hz}$ attributed to slow drifts. C021: recalibration parameters confirmed by text; not independently recomputable. No impact on experimental conclusions.


C022 — Doppler cooling $n_{\text{th}} \approx 5$ (approximate)

Doppler limit calculation gives $\approx 3.9$ (axial) to $1.1$ (radial); claim appears to refer to the axial mode or uses slightly lower trap frequencies than the cited companion paper. Apparatus detail.


C026, C027, C028 — Photon counts and detection errors (apparatus characterisation)

C026 and C027: computed values within $6$–$25\%$ of stated values (consistent with $\approx$ qualifiers); C028: $D_{5/2}$ lifetime from spectroscopy literature ($1.155$–$1.168\ \text{s}$) rounds to stated $\approx 1.2\ \text{s}$. No impact on contextuality conclusions.


C033 — Systematic shift "1.6 $\sigma$" from QM prediction

Computed $1.49$–$1.78\sigma$ depending on combination method; no single formula reproduces exactly $1.6\sigma$. Qualitative statement ("approximately $1.6\sigma$") is accurate.


C044 — $N=61$ row (partial due to reimplementation only)

All five values match paper within $0.1\sigma$; verdict partial only because no paper-provided analysis code exists. Not a substantive issue.


C060 — Stockholm 2013 normal-order saturation ($0.53$ vs computed $0.561$)

Minor methodological inconsistency: the citing paper appears to use the raw numerator $|\kappa - S^{\text{NC}}|$ rather than the full normalized fraction for the normal-order entry. Both values are within the $\pm 0.11$ error bar.


C061 — Beijing 2013 (Deng) uniform-case signaling ($0.267$, ambiguous)

Three of four numbers confirmed; uniform-case signaling $0.267$ is between two equally plausible interpretations ($0.251$ and $0.280$). Core qualitative finding (large signaling) unaffected.


C064 — $O(N^2)$ pulse-count scaling not supported by stated derivation

Per-rotation formula correct; summing all $N$ rotations gives $O(N^3)$, not $O(N^2)$. $O(N^2)$ holds only for a concatenation-only counting convention not fully described in the paper. Practical conclusion (total experiment time grows rapidly with $N$) is correct.


C066 — Record $N=101$ observables (literature survey limitation)

Strongly supported by all available citations; verdict partial only due to non-exhaustive pre-Dec 2017 survey. Not a substantive issue.


C067 — Largest detection-loophole-free CF (literature survey limitation)

Tan 2017 ($\text{CF}_9 = 0.704$) confirmed $7.6\sigma$ below $0.800$. Verdict partial only because the Tan et al. raw data are not available; values were extracted from the paper text.


End of report. Verdict counts: verified 48 / partial 17 / not_verified 3 (total 68). Major-issue IDs: C052.