Reading the quantum

AI for peer review, ft. Claude Fable

2026-06-08T00:00:00+00:00

The bull case for peer review is that, when done right, it is an extremely powerful tool for vetting and improving scientific results. The bear case is that peer review is rarely done right; instead, it is a low-throughput, mixed-quality process that critically relies on highly skilled, unpaid labor. Wherever you sit on this, I think we agree that some aspects of peer review are badly broken. Furthermore, as AI-assisted science brings about a flood of new paper submissions to the legacy scientific journals, the already clogged peer review pipeline will likely burst.

The question is then: can AI tools fix peer review? To get a feel for the state of play, I ran a little experiment using frontier AI models to peer-review some of my own old papers.

What does a peer reviewer actually do?

The term “review” in “peer review” is actually a bundle of two distinct roles:

Impact assessment. Peer reviewers form an opinion on the overall quality of the paper, and its likely impact on the field against the background literature. Based on this, they recommend to the editor whether the paper should be published in a given journal.
Technical validation. Peer reviewers verify the analyses, poke holes in the methods, and question assumptions that went into the derivations. Based on this, they recommend to the authors how the paper can be fixed or strengthened.

In my experience, impact judgement is rarely the bottleneck - few things make us happier than to share our opinions with others. It is also the part I wouldn’t want to outsource to AI right now.

Technical validation, on the other hand, currently leaves a lot to be desired. In some platonic ideal world, an ultra-diligent peer reviewer would go through the paper claim by claim and independently verify each one. However, given how much time that would take - especially for the 99.9% of papers whose authors don’t make verification particularly easy - reviewers usually resort to “skimming”, spot-checking, and digging only into the claims that “smell funny”. As a result, many errors are never spotted or fixed, which is particularly problematic given the public perception that peer-reviewed publications are truly “verified” or “true”.
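To put a rough number on this, assume (purely for illustration) that a reviewer spot-checks claims uniformly at random. The chance of hitting at least one faulty claim then follows a hypergeometric count. The claim and error counts below are made-up inputs, not data from any real review.

```python
from math import comb

def prob_catch_at_least_one(n_claims, n_errors, n_checked):
    """Probability that checking n_checked of n_claims claims, chosen uniformly at random,
    hits at least one of the n_errors faulty claims (hypergeometric model)."""
    # Probability that every checked claim happens to be error-free
    p_miss_all = comb(n_claims - n_errors, n_checked) / comb(n_claims, n_checked)
    return 1 - p_miss_all

# Illustrative numbers only: 60 claims, 2 of them wrong, reviewer checks 10
p = prob_catch_at_least_one(n_claims=60, n_errors=2, n_checked=10)
print(f"{p:.2f}")  # 0.31
```

Under these assumptions, the reviewer misses both errors more than two times out of three.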

AI as ultra-diligent technical reviewer

Whatever you think of AI models, they are diligent and - when orchestrated appropriately - will keep working on the most tedious of tasks until they’re completed or you run out of tokens. Armed with this insight, I created a pipeline that emulates my platonic ideal of an ultra-diligent technical peer reviewer. The project - available on GitHub - contains the following stages:

Ingest: agent downloads the paper, all associated supplementary information, attached code & datasets, as well as the citations. These are the complete artefacts associated with the paper.
Extract: agent reads the paper and extracts all individual claims. Claims can be analytical (“Taking the rotating wave approximation on Eq. 1 we arrive at Eq. 2”), numerical, empirical (“we measure coherence time of 2 ms”), citation-based (“experiments to date struggled to reduce error rate below 1e-3 per gate”), or plot-based.
Verify: agents read individual claims and attempt to verify them through any means necessary - reading background literature, doing independent derivations and numerics, re-plotting data etc. At the end, each claim gets a verification report, together with a label of “verified”, “partial”, or “not verified”. Verification can be run multiple times through different agents, allowing agents to correct each other’s work.
Report: a report (markdown and html) is compiled summarising the verification attempts.
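One way the claim records and scorecard could be represented is sketched below. All names (Claim, Verification, scorecard) are hypothetical. So is the rule that the most recent verification pass wins. The actual project on GitHub may be structured quite differently.

```python
from collections import Counter
from dataclasses import dataclass, field

KINDS = ("analytical", "numerical", "empirical", "citation", "plot")
LABELS = ("verified", "partial", "not verified")

@dataclass
class Verification:
    agent: str   # which agent/model produced this pass (hypothetical field)
    label: str   # one of LABELS
    report: str  # free-text verification report

    def __post_init__(self):
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")

@dataclass
class Claim:
    claim_id: int
    kind: str  # one of KINDS
    text: str
    passes: list = field(default_factory=list)  # Verification objects, oldest first

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown claim kind: {self.kind!r}")

    def final_label(self):
        # Hypothetical rule: the latest pass overrules earlier ones; no pass means "not verified"
        return self.passes[-1].label if self.passes else "not verified"

def scorecard(claims):
    """Count final labels per claim kind, keyed by (kind, label)."""
    return Counter((c.kind, c.final_label()) for c in claims)

claims = [
    Claim(1, "analytical", "Taking the rotating wave approximation on Eq. 1 we arrive at Eq. 2",
          [Verification("agent-a", "partial", "first pass"),
           Verification("agent-b", "verified", "second pass")]),
    Claim(2, "empirical", "We measure a coherence time of 2 ms"),
]
print(scorecard(claims))
```

Keeping every pass, rather than overwriting the label, preserves the audit trail when agents correct each other.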

I have run this pipeline on a couple of my own papers. Here are the results.

Paper 1, ft. Claude Sonnet 4.6

For the first experiment, I picked my first first-author paper, Probing the limits of correlations in an indivisible quantum system (arXiv:1712.06494). This work was a good intro point for AI since:

It was pretty straightforward mathematically
It came with a dataset, so the agent could independently re-run the data analysis

For this test, I used a cheap model, claude-sonnet-4.6. The first version of the review pipeline produced the following report. Let’s take a look.

First, the scorecard. Not bad!

Let’s take a look at what the agent found. On analytics, it basically re-derived every mathematical claim in sympy, which is amazing.

Through this process, it found two errors in my analysis. The first was a typo (pi/2 instead of pi/4), and the AI did a very good job of making it very obvious that I was wrong.

The second was an actual error in the formula I used for calculating error bars. Better still, it showed me that I had made two errors that mostly canceled out, so that the final result underestimated the uncertainty by 30%. This is exactly what I was hoping for - because AI has the patience to go through this line by line, it can verify individual mathematical statements that none of my peer reviewers or collaborators ever bothered to look at.
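For readers who haven’t used sympy this way, here is a minimal sketch of re-deriving an analytical claim. The claim is a textbook stand-in, not one of the claims from the paper: the leading-order AC Stark shift Ω²/(4δ) of level |0⟩ in an off-resonantly driven two-level system.

```python
import sympy as sp

Omega, delta = sp.symbols("Omega delta", positive=True)

# Two-level Hamiltonian H/hbar in the basis (|0>, |1>), drive detuned by delta (stand-in example)
H = sp.Matrix([[0, Omega / 2], [Omega / 2, -delta]])

# Pick the eigenvalue that tends to 0 (the energy of |0>) as the drive is switched off
eigs = list(H.eigenvals().keys())
E0 = next(e for e in eigs if sp.simplify(e.subs(Omega, 0)) == 0)

# Expand to leading order in Omega and compare with the claimed shift Omega^2 / (4 delta)
leading = sp.series(E0, Omega, 0, 4).removeO()
claim = Omega**2 / (4 * delta)
print(sp.simplify(leading - claim))  # 0
```

The pattern generalises: encode the claim symbolically, then ask sympy whether its difference from an independent derivation simplifies to zero.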

While most numerical values in the paper could not be verified directly, AI did something neat: it cross-checked my numbers against a different paper on the same apparatus. It also checked for internal consistency, e.g. between the ion temperature and the Doppler limit, or between the qubit frequency noise and the coherence time.
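A consistency check of the temperature-versus-Doppler-limit kind can be sketched in a few lines. For a cooling transition of natural linewidth Γ, the Doppler limit is T_D = ħΓ/(2k_B), and the thermal occupation of a motional mode follows from the Bose-Einstein distribution. The 20 MHz linewidth and 1 MHz trap frequency are illustrative assumptions, not values from the paper.

```python
import math

HBAR = 1.054571817e-34  # J s (CODATA 2018)
K_B = 1.380649e-23      # J/K (exact SI value)

def doppler_limit_kelvin(linewidth_hz):
    """Doppler temperature T_D = hbar * Gamma / (2 k_B), with Gamma = 2*pi*linewidth_hz."""
    gamma = 2 * math.pi * linewidth_hz
    return HBAR * gamma / (2 * K_B)

def mean_phonon_number(trap_freq_hz, temperature_k):
    """Thermal (Bose-Einstein) occupation of a mode at angular frequency 2*pi*trap_freq_hz."""
    x = HBAR * 2 * math.pi * trap_freq_hz / (K_B * temperature_k)
    return 1 / math.expm1(x)

# Illustrative assumptions: 20 MHz cooling-transition linewidth, 1 MHz trap frequency
t_d = doppler_limit_kelvin(20e6)
n_d = mean_phonon_number(1e6, t_d)
print(f"T_D = {t_d * 1e3:.2f} mK, n_bar at T_D = {n_d:.1f}")
```

With these numbers, T_D is about 0.48 mK and n̄ about 9.5, close to the familiar Γ/(2ω) estimate. A reported temperature well below T_D, with no sub-Doppler cooling step, would be worth flagging.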

When it comes to plots, AI was able to independently rerun the whole pipeline of “raw data to plot” and verify the values of all data points explicitly quoted in the paper, as well as achieve a visual match between original and reproduced figures. This was my happiest moment. To be clear - while the raw data has been available online for the past 9 years, I’m pretty sure this is the first time anyone attempted to rerun this analysis.

For citations, the agents verified that I did not misquote anyone.

The empirical claims were independently confirmed with re-computed values, which surfaced a few minor errors.

Having gone through many rounds of peer review of this paper, I can 100% confirm this was by far the best technical peer review this paper had ever received. While human peer reviewers were strong on the overall value and impact, as well as the philosophical aspects of these tests, their technical verification was way less impactful than 30 minutes of claude-sonnet-4.6.

Paper 2, ft. Claude Opus 4.8 + GPT 5.5 xhigh

For the second experiment, I picked the final paper from my PhD, “Generation of a maximally entangled state using collective optical pumping” (arXiv:2107.10374). Because that work contained some slightly more difficult mathematics and simulations, I went with the more powerful Claude Opus 4.8 + GPT 5.5 xhigh. Following earlier successful experiments with coding, I ran two passes of review, alternating between the models. You can explore the report here.

First, the scorecard.

As before, the reviewer broke the paper into atomic claims. As you can see, I updated the pipeline so I can now click on each claim and see the full derivation, which is pretty neat.

Some claims were verified analytically and numerically, but even pure numerical verification is fairly useful. For example, claim 24 says that my entangling protocol amplifies local errors but suppresses global errors - something I recall establishing analytically. The AI, on the other hand, verified this by running simulations with all possible global and local errors while scanning error probability p, and then extracting the gradient of the error with respect to p.
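The gradient trick is easy to illustrate. The sketch scans the physical error probability p, fits the final-state error linearly at small p, and reads off the slope. A slope above 1 suggests the protocol amplifies that error source; a slope below 1 suggests it suppresses it. The two error models are hypothetical toys standing in for the real simulations, not the protocol from the paper.

```python
import numpy as np

def error_slope(final_error, p_values):
    """Slope of final-state error vs physical error probability p, from a linear fit."""
    errors = np.array([final_error(p) for p in p_values])
    slope, _intercept = np.polyfit(p_values, errors, 1)
    return slope

# Hypothetical toy models, NOT the protocol from the paper:
def toy_local(p):
    # three independent faults, any of which spoils the state
    return 1 - (1 - p) ** 3

def toy_global(p):
    # an error source that only enters at second order in p
    return p ** 2

p_scan = np.linspace(1e-4, 1e-3, 10)
s_local = error_slope(toy_local, p_scan)    # close to 3: amplified
s_global = error_slope(toy_global, p_scan)  # close to 0: suppressed
print(s_local, s_global)
```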

The question that interested me the most was whether AI would be able to reproduce my simulations of how the error in the final entangled state depends on physical error sources. This is the plot from the paper, comparing my protocol (in blue) with standard 2Q gates (green):

and here is the AI result (rounds 1-2 of claim 59):

As we can see, AI successfully reproduced the bottom figures, but we have a clear mismatch for the top two figures (the effect of qubit and motional frequency errors). I asked AI to keep working to find the discrepancy.

Within 5 minutes, it found a bug in its analysis of motional errors, and got a better match with one of my figures. Digging in, I found that I had once made the exact same error that AI just did - except I immediately understood that something was suspicious, and worked hard to rectify it.

The qubit frequency figure, on the other hand, resisted reproduction. I reviewed the code and immediately suspected that it was not using a large enough motional Hilbert space. Indeed, after I pointed this out, GPT 5.5 performed a convergence analysis to work out how many Fock states it needed, and was able to successfully reproduce the plots.
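A Fock-space convergence check of this kind can be sketched as follows: keep doubling the cutoff until the observable stops changing. The observable is a hypothetical stand-in, not the simulation from the paper. It is the mean phonon number of a displaced motional ground state D(α)|0⟩, which should equal |α|².

```python
import numpy as np

def annihilation(n_fock):
    """Annihilation operator a truncated to the lowest n_fock Fock states."""
    return np.diag(np.sqrt(np.arange(1, n_fock)), k=1)

def mean_phonons_after_displacement(alpha, n_fock):
    """<n> of D(alpha)|0> computed in a truncated Fock space (alpha real)."""
    a = annihilation(n_fock)
    # D(alpha) = exp(alpha (a^dag - a)) = exp(-i h) with h = i alpha (a^dag - a) Hermitian
    h = 1j * alpha * (a.T - a)
    evals, evecs = np.linalg.eigh(h)
    displacement = evecs @ np.diag(np.exp(-1j * evals)) @ evecs.conj().T
    psi = displacement[:, 0]  # D(alpha) acting on the vacuum |0>
    return float(np.real(psi.conj() @ (a.T @ a) @ psi))

def converged_cutoff(alpha, tol=1e-6, n_start=5, n_max=320):
    """Double the Fock cutoff until <n> changes by less than tol; return (cutoff, <n>)."""
    n_fock = n_start
    prev = mean_phonons_after_displacement(alpha, n_fock)
    while 2 * n_fock <= n_max:
        n_fock *= 2
        cur = mean_phonons_after_displacement(alpha, n_fock)
        if abs(cur - prev) < tol:
            return n_fock, cur
        prev = cur
    raise RuntimeError("Fock-space cutoff did not converge below n_max")

cutoff, n_mean = converged_cutoff(alpha=2.0)
print(cutoff, n_mean)  # n_mean should be close to alpha**2 = 4
```

Doubling, rather than adding one level at a time, makes it unlikely that two nearly equal but still unconverged values are mistaken for convergence.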

All in all, on this simulation, I score Opus 4.8 and GPT 5.5 as PhD students still getting the hang of things. The analysis they did definitely feels above Master’s level, but a competent, advanced grad student would have likely identified and rectified these errors autonomously.

Notwithstanding, once again, the review as a whole was spectacular, and would have been super helpful to have at the time of writing. Even though AI made some errors, it is really my fault for not including enough details and code in the paper to make it easily reproducible by anyone. Had I been writing the paper today, I would have used this workflow to make sure nobody has to guess how I came to the conclusions I did.

Paper 3, ft. Claude Fable

As I was getting ready to publish this post, Claude Fable became available, and I couldn’t resist the chance to give it a shot. And boy, was I blown away.

First of all, Fable basically made my whole pipeline obsolete. The reason I made the pipeline in the first place is that I found simply asking AI “break this paper into claims and verify each one” produced very poor results, with AI forgetting or bullshitting many verifications. The review pipeline, though cumbersome, enforced structure and accountability, making sure no claim is left behind.

Fable, on the other hand, seems to just follow instructions. Thus, after a single prompt to review arXiv:2606.07736 (a random paper I found that day), I got a full paper reproduction codebase in Python, together with a report that contains:

1) A numerical and/or sympy verification of every numerical and analytical claim

2) Full reproduction of every single figure

3) A detailed list of errata in equations which, based on a brief review, I think is 100% correct

Oh, and remember how Opus and GPT-5.5 struggled to debug the two plots in Paper 2? Fable managed to reproduce all the results in a single shot.

I’ll be exploring this more in the coming weeks, but for now, it looks to me like Claude Fable is:

Extremely diligent at long-running tasks, reducing the need for strict drift guardrails from previous models
Extremely competent at theoretical physics, and capable of spotting BS and subtle issues in the analysis

Summary

Clear take-aways from the experiment:

The AI tools are fantastic at technical review of physics papers
Older models need to be encouraged to stay on track; newer models are more capable of completing a full review from a single prompt
AI can independently reproduce medium-complexity results, roughly at the level of a PhD student, even when the paper does not provide good guidance on what to do
AI’s diligence and low cost enable extremely thorough review of minutia that human reviewers usually don’t bother with
Every paper can be improved through feedback from frontier AI models

I hope that the academic publishing industry will embrace this technology with open arms. Peer review is broken and slow, and AI tools give us a chance to improve quality and throughput. By letting AI focus on the question of “is this paper technically correct”, humans will be better able to focus on the most important question of all: “is this paper worth publishing”.

Can AI answer open questions in physics?

2026-06-05T00:00:00+00:00

The joke is that every time that a new AI model comes out, someone throws it at the Collatz conjecture just in case. It’s actually probably not the worst idea, but “nails high-school physics but cannot yet create the theory of everything” is not the most useful description of the level of physics proficiency of AI models. What I need to know is: as of mid-2026, how good is frontier AI at frontier physics exactly?

According to the published literature, the answer is “it’s getting there”. For example, CritPt compiled a benchmark of 50 research-level physics questions. At the time of their evaluation in November 2025, the top model scored a mere 12.6%. However, in the months since, the top score went up to over 30%, with a total of 10 models scoring better than November’s winner!

Likewise, consider TPBench, a benchmark for theoretical physics problems. In the preprint published in February 2025, the models pretty much nailed high-school physics, but scored up to ~50% on graduate-level physics and ~15% on research-level physics:

Rerunning the same benchmarks in 2026, the authors found that GPT-5.5 (xhigh) scored ~85% on graduate-level questions and a whopping 55% on research-level questions. That’s pretty astonishing, especially given that GPT-5.5 (xhigh) is not even the most powerful physics model out there.

Still, the questions in those benchmarks are generally artificial: they are all concrete, solvable, and someone out there knows the solution. One can also worry that even if the author keeps the answer to themselves, the answer may one way or another “leak” on the internet, e.g. as part of a published research paper.

Thus, a few weeks ago, I decided to run an experiment in which I threw frontier AI models at “real” open physics questions. Here is what I found.

What are the open questions in physics?

First clarification - by “open physics questions”, I don’t mean the ones you hear about on National Geographic, like “what’s the origin of the matter-antimatter asymmetry”!

While we’re all used to hearing about the “big” open questions in physics, the truth of the matter is that the field is full of “small” open questions. In fact, every day physicists around the world publish hundreds of papers and preprints, many of which contain “open questions”. This can be anything from a puzzling experimental observation that eludes explanation, through a lemma that seems reasonable but evades proof, all the way to the long-term proposals for follow-up work.

What’s nice about these open questions is that they’re truly open. Unlike TPBench, answering any of these questions would genuinely advance human knowledge - even if only by a tiny amount and on a very niche topic. Furthermore, unlike existing benchmarks, the question difficulty - or even whether a solution exists - is unknown a priori.

At the same time, I hypothesise that many of them are actually not that hard - a typical “open question” in a physics paper has received a relatively small amount of attention from a very small number of humans. This makes them a promising target for AI input.

Scraping open questions

For the experiment, I started by scraping one week’s worth of preprints from quant-ph on arXiv, finding 275 papers with over 200 candidate open questions, based on grepping for phrases such as “open”, “unresolved”, and “future work”.
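A naive version of this grep step is sketched below. It uses only the three phrases named above; the real scrape evidently matched more phrasings. The sample text is invented. It shows both failure modes that made a down-selection step necessary: a false positive (“open boundary conditions”) and, with this short trigger list, a miss (“remains to be demonstrated”).

```python
import re

# Only the trigger phrases named in the post; a real scrape would need a longer list
TRIGGERS = ("open", "unresolved", "future work")
TRIGGER_RE = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in TRIGGERS) + r")\b",
                        re.IGNORECASE)

def candidate_open_questions(text):
    """Naive sentence split, then keep sentences containing any trigger phrase."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if TRIGGER_RE.search(s)]

sample = (
    "We demonstrate a new trap design. "
    "We use open boundary conditions throughout. "
    "Operation at few-kelvin temperatures remains to be demonstrated. "
    "Future work will explore strong-coupling corrections. "
    "Whether the bound is tight remains an open question."
)
hits = candidate_open_questions(sample)
for h in hits:
    print("-", h)
```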

The first observation was that - unsurprisingly - the vast majority of my “open problems” were (in my opinion) not suitable for AI just yet. Typical reasons were that:

They’re about experimental / hardware validation. Example from my dataset: “operation at few-kelvin temperatures remains to be demonstrated”.
They’re vague outlines of future roadmap. Examples: “Future work will explore strong-coupling corrections beyond the Born–Markov approximation, non-Gaussian TLS noise, and machine-learning-based real-time estimation”, “future work on covert quantum communication under realistic uncertainty, including adaptive, correlated, deployment-oriented, and distributionally robust extensions”
They’re very broad. Example: “The description of the non-equilibrium dynamics of isolated quantum many-body systems […] is a fundamental open question.”
They’re heavy on practical engineering. Examples: a quantum simulation paper leaves “A full gate-count analysis, circuit synthesis […] and quantum-hardware execution” for future work / another paper asks for “a full end-to-end fault-tolerant distributed-quantum-computing compiler.”
They’re too big. Example: a paper on high-dimensional layer codes asks about “determining the finite-temperature self-correction properties of these 4D and 5D layer codes.”
They’re about broad protocol design. Examples: a QKD paper asks about “proving unconditional security for such non-Gaussian-measurement CV-QKD protocol”; a paper on distributed molecular-node quantum computing asks to “design modular architectures and entanglement-generation protocols that achieve fault tolerance with feasible inter-module entanglement rates and hardware-resource requirements.”

I addressed this with a lightweight but aggressive (agentic) down-selection pipeline, aimed at identifying the questions most suitable for AI assistance. Some personal judgement went into this, but overall, questions received bonus points when they were:

Yes/no mathematical claims. Example: “must every entangled two-qubit state admitting two pure steered states have density-matrix rank 2?”
Questions about formulas or numbers. Example: a paper establishes a threshold formula numerically, and asks for an analytic proof.
Optimization / characterization questions. Example: a paper asked for “analytically characterizing the maximal possible violation of the realignment criterion over PPT entangled states.”
Result extension questions. Example: a paper proved a result about single-qubit quantum channels, and asked for extension to multi-qubit channels.

Following this down-selection, 48 questions across 33 papers were extracted and packaged into stand-alone units, and the AI got to work on trying to crack them one by one.

The AI solver

My first attempt used an automated two-agent loop: claude opus 4.8 as the “solver”, and gpt 5.5-xhigh as the “reviewer”. This, however, resulted in a very low success rate. I then switched to gpt 5.5 Pro as the “solver”, with a much better success rate. However, as I didn’t have access to that model through an API - only through the web UI - the rest of the experiment was rather manual and less systematic.

Here is what followed:

GPT 5.5 proceeded to work on 48 problems across 33 papers
It claimed to successfully solve 42/48 of them, across 30 papers
I sent 30 emails to authors of these papers, asking for results verification
I received a response to 16/30 of these emails

The verification results were:

25% (4/16): Bad. AI incorrectly claimed a proof, or misframed the problem
56% (9/16): Mixed. AI provided a partial or promising proof, but likely incomplete
19% (3/16): Good. Authors confirmed that AI effectively solved the problem.
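For reference, the percentages follow directly from the raw counts of author replies:

```python
# Outcome counts among the 16 author replies
outcomes = {"bad": 4, "mixed": 9, "good": 3}
responses = sum(outcomes.values())
for label, count in outcomes.items():
    print(f"{label}: {count}/{responses} = {100 * count / responses:.0f}%")
```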

Those are the headline numbers, but they hide the really interesting nuance. So let’s dig into examples of each, what they tell us about physics, and how the experiment could be improved next time.

The bad

One example miss was the problem from Quantum randomness beyond projective measurements. Quote: “a simple question remains open: how much intrinsic randomness can be generated by a given extremal measurement?”. AI claimed to have solved it by reformulating it as a pure-state minimax optimization and doing the variational characterisation. Two problems:

The paper already did all of this, using a slightly different notation. So in essence, all that AI did was to re-formulate or re-derive some of the results in the paper.
A human reader would have understood that the real open problem was finding an explicit solution to the optimisation problem, not just formulating and characterising it.

Another misunderstood open question came from The Marginal Problem for Density Operators. In the introduction, the authors made a historical remark that “Leifer & Poulin (2008) left open whether the entropic quantum conditional-independence relation has the full graphoid property”. AI interpreted this question as “does the quantum conditional-independence relation have the full graphoid property for arbitrary density operators”, and was able to quickly find a counterexample.

However, the sentence before started with “In the strictly positive finite-dimensional setting considered here…”. This should have made the AI realise that the actual question was about whether this property holds for strictly positive density operators. This also would have been obvious had the AI read the actual citation of Leifer & Poulin (2008).

The mixed

The largest category, with many different types of partial solutions.

By far the most common author objection is that AI claimed a proof but did not provide enough detail to verify the maths. That was mainly my bad: I have since learned to always prompt the model for detailed step-by-step solutions, with much better results. Still, for this experiment, this led to some inconclusive results, with authors unable to fully verify the claims.

Other than this, the most common situation was along the lines of:

AI: “I have solved the open problem subject to assumptions ABC, same as the paper”
Author: “I think it mainly works, but step 10 doesn’t quite follow, step 20 is not fully justified, and assumption C is subtly different from ours”
AI: “I agree - in fact, I can only prove steps 10 and 20 under additional assumptions DEF”

Thus, in a nutshell, AI provided a partial solution to the open problem, but it would take additional work or human judgement to understand (1) whether the problem is interesting once the additional assumptions are imposed and (2) whether AI really made good partial progress towards the big question, or just followed a dead end.

Selection of other interesting mixed results

In the discussion about a possible solution to an open question from Long-range nonstabilizerness of topologically encoded states from mutual information, the author expressed doubt about whether AI can actually prove the claim, and whether the reduction strategy used in that proof is indeed valid. In a more detailed response, AI admitted the proof is incomplete. However, it was able to show that the reduction strategy is indeed valid. I’ll call it a partial win.

In “Proof of the absence of local conserved quantities in the Holstein model”, authors said “An important direction for future work is to extend our method to other physically relevant fermion–boson coupled systems”. AI saw this, tried the method on a bunch of other physically relevant systems, and declared success, proving that the same trick can be extended from the Holstein model onto “all one-dimensional optical fermion-boson chains with onsite boson-assisted fermion bilinears, including multicomponent Holstein, Holstein-Hubbard, orbital/Jahn-Teller-Hubbard-type chains and flavor-dependent local electron-phonon models.”

The author reply was essentially “sure, but that’s not what we meant by extending our method”. Summarising their view: of course, the method can be directly applied to similar Hamiltonians, obtaining similar results - but that’s not necessarily interesting. The interesting physics would come from proving something new about those models using those methods - and as far as they can tell, everything the AI did was an expected, fairly direct application of the method.

I give AI partial points here. Even though the authors pushed back, I think it’s possible that what is a trivial application to the authors could have been unexpected to a less expert reader. But I think it really points to the challenge of defining what it means to “solve” a problem in physics.

Another “fuzzy” example came from Statistical Interpretation of the Procedures Measurement of Physical Quantities, which left open a question of whether a certain measurement sequence is possible. AI was able to correctly operationalise the question and prove it’s a logical/mathematical possibility. However, the author pointed out that the truly interesting question is whether the measurement admits an actual experimental realisation. Again - actually crisply formulating the open question is often half the work!

A fun final example of a “mixed” result was from Symmetric dilations of Pauli channels and semigroups, which solved a single-qubit case, and concluded with “Future work may extend our results to the construction of physical dilations of multi-qubit Pauli channel”. AI attacked this problem head on, and was able to derive some results, albeit under more stringent assumptions. The author commented that he hadn’t actually intended to ever tackle this specific problem, but now that AI took the first steps, he might actually give it a go :)

The good

Some clear winners:

The paper One pure steered state implies Einstein-Podolsky-Rosen steering asked whether every entangled two-qubit state admitting exactly two pure steered states must have rank 2. AI confirmed it with a kernel-dimension proof, and authors confirmed the AI assessment.

The paper An Exponential Sample-Complexity Advantage for Coherent Quantum Inference proposed a randomized-purification-then-cloning channel, and left open whether the channel is optimal when producing more outputs than inputs. AI produced a counterexample showing the proposed channel is not optimal in a small case, because the loss traces out purifying registers and a system-only cloner can do better. Authors confirmed the solution, although they noted the win is small, because it’s unclear whether the conjecture can be repaired in light of this counterexample, or whether the counterexample is genuinely important.

Finally, the paper “The relative entropy of magic and its nonadditivity” contained one open question, and one edge case in the proof. AI was able to address them both, and the authors will include the results in the upcoming revision.

What I learned

At the start of this experiment, I set myself a goal of genuinely answering one open physics question. The resulting hit rate of ~20% has been very impressive, especially considering that the AI is single-shotting the problem with no expert guidance. Thus, I say the experiment is a win for the AI physicist.

But as far as I’m concerned, the experiment is just beginning.

Prompt and pipeline improvements

Despite the moderate success of v0.1 above, I think one can do significantly better with only slightly better prompting and workflows. Based on the shortcomings of the first attempt, I drafted the following prompt for attempt v0.2:

Do not claim an open question is solved until you have:

Quoted the exact open-question text and interpreted it in context.

Scanned the rest of the paper for where the apparent question is already answered, formalized, specialized, or partially resolved. If the paper already gives a variational principle, SDP, optimization, theorem, reduction, example, or supplement-level answer, do not present that as the solution. Identify what remains unresolved after the paper’s own answer.

Distinguished explicit solution, constructive method, implicit reformulation, numerical evidence, and heuristic. A reformulation or finite-but-intractable expression is not a solution unless the authors explicitly asked only for that.

Checked whether you are solving the intended question rather than a weaker, special, easier, or already-known version.

Written every hard proof step explicitly, with no “it follows” gaps. For each nontrivial step, provide either a derivation, a cited theorem with hypotheses checked, or mark it as unproven.

Provided reproducible code, inputs, outputs, and independently checkable certificates for computational or enumeration-based claims.

Tested the proposed solution against examples, boundary cases, low-dimensional cases, and known cases mentioned in the paper.

Checked whether the result is already known, obvious to experts, or a routine extension of existing arguments. State novelty separately from correctness.

Stated all modeling choices and verified that they match the paper’s definitions. If multiple resource notions or interpretations exist, say which one is being solved.

Separated correctness, completeness, tractability, and significance. A correct lemma, benchmark, or promising construction is not automatically a solved open problem.

Run a final self-audit: for every claim an author might challenge, either give a step-by-step derivation or downgrade the claim to conjectural, partial, or incomplete.

Use “solved” only if all checks pass. Otherwise label the result as partial, proposed, plausible, or off-target, and state the exact remaining gap.
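The final labelling rule of this prompt is simple enough to sketch as a gate in an automated loop. The check names paraphrase the prompt and are hypothetical. So is the mapping of failures to fallback labels: here, answering a different question means “off-target”, and anything else means “partial”.

```python
# Hypothetical check names, paraphrasing the v0.2 prompt
CHECKS = (
    "question_quoted_in_context",
    "not_already_answered_in_paper",
    "explicit_solution_not_reformulation",
    "intended_question_not_weaker_version",
    "every_hard_step_written_out",
    "computations_reproducible",
    "tested_on_examples_and_edge_cases",
    "novelty_assessed",
    "modeling_choices_match_paper",
    "significance_assessed_separately",
    "final_self_audit_done",
)

def label_attempt(passed_checks):
    """Return (label, failed_checks); 'solved' only if every check passed."""
    failed = [c for c in CHECKS if c not in passed_checks]
    if not failed:
        return "solved", failed
    # Hypothetical mapping: solving a different question is off-target, anything else partial
    if "intended_question_not_weaker_version" in failed:
        return "off-target", failed
    return "partial", failed

print(label_attempt(set(CHECKS) - {"computations_reproducible"}))
```

Returning the list of failed checks alongside the label gives the solver a concrete statement of the remaining gap.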

I also need to set this process up as a fully autonomous multi-agent loop. The manual intervention I had to make to circumvent the lack of API for gpt pro was manageable for the proof of concept, but too annoying in the long run.

One approach I briefly considered but rejected was to make the maths watertight through auto-formalisation in Lean. I still think this is overkill, but the continuous stream of progress in this area sometimes gives me pause. Perhaps in the not-too-distant future, this experiment can be done in a way where all mathematical statements are “guaranteed true” and require no manual verification, leaving us to focus on physics only?

The nuances of open questions in physics

Turns out that formulating and evaluating open questions in physics is a subtle art. While a small minority of questions are concrete, crisp and “AI-ready”, most of the time, open questions tend to be extremely fuzzy and vague. Examples include:

“Is this method practical?”
“What’s the physical interpretation of this equation?”
“Can theoretical advantage of this protocol turn into practical gains?”

Furthermore, a lot of the time, it is really unclear or even unknowable ahead of time what exactly constitutes a solution. For example, when we say “it is an open question whether method A can be applied to system X”, what do we mean exactly? Do we mean that it is mathematically applicable? Or that it can be done in practice? Or that applying it produces valuable insights? Or something else entirely?

Likewise, it is often unclear what level of mathematical description is appropriate for which question. In some cases, implicit results suffice; in others, exact analytical solutions are necessary. In some domains, establishing a result numerically is sufficient for all intents and purposes, while in others, rigorous mathematical proof is necessary. What’s needed is highly contextual, and also involves trial-and-error.

Note I’m not saying this as a criticism. Part of the beauty and the art of physics is that it’s not quite exact, formal, and concrete. As Einstein said, “if we knew what we were doing, it wouldn’t be called science”. The fuzziness of physics is often a feature, not a bug.

Finally, for better or worse, most physicists don’t put very much effort into formulating the open questions - probably because they don’t expect anyone to actually pick up the challenge. Thus, “open questions” are often effectively “notes to self” - or “notes to the editor” as it’s bad form to finish a paper without an outlook. As long as that persists, we can expect some miscommunication with AI agents, who love to take us at our word!

Let’s follow this up?

I think this style of AI benchmarking is really unique and insightful, and is ready for a larger-scale test. Because physicists seem to be very open to engaging and reviewing AI work in surprising detail, a more careful study should be able to extract much more information about real-world applicability and limitations of AI models - not to mention more statistics.

Also: there is potentially only a short window of time that this experiment can actually be run. This is because today, most physicists don’t use AI in this way - and those that do do not use the frontier maths models like gpt-5.5 pro (trust me, I asked). So if we want to understand exactly how much value frontier models bring to the table, we have to ask now, while the table is mostly filled with human physicists!

If you’re interested in collaborating on this, or if you’d like to sponsor this token-hungry work, let me know. Should be fun!

AI eats deep tech, part 1: mad physics skillz

2026-05-20T00:00:00+00:00

By now, we have all seen how AI software engineers, AI chatbots, and AI copywriters change the game. In this blog series, I wanted to go broader and extrapolate from my personal and professional experience in quantum computing to how AI will impact deep tech R&D. This post is part 1, which is all about how crazy good AI can be at doing everyday physics.

Ok, I say it’s a blog series, but knowing myself… who knows when, if ever, I’ll get round to writing a follow-up. But for now, enjoy this raw footage of AI agents:

Totally nailing maths questions that stumped me during my PhD
Doing Monte Carlo simulations of blackbody radiation on cryogenic cows
Reproducing the results of my messy paper from 2017
Nailing the quantum computing state-of-the-art literature research

Section A: AC Stark shifts

Question. Hey AI, I have the following problem for you. It is well known that if I resonantly drive a transition within a multi-level system, the frequency of that transition is shifted by the AC Stark/Zeeman shift of spectator levels. I know very well how to prove that a 2-level system driven off resonance experiences a frequency shift of $\approx \Omega^2/(4\delta)$. However, whenever I tried to derive this for a 3-level system - where one of the transitions is driven on resonance, and one of the transitions is driven off resonance - I failed. Can you help me out?

Mathematically, the problem boils down to finding the eigenvalues of the Hamiltonian
\[H/\hbar = \begin{pmatrix} 0 & \Omega_1/2 & \Omega_2/2 \\ \Omega_1/2 & -\delta & 0 \\ \Omega_2/2 & 0 & \Delta-\delta \end{pmatrix},\]
in the basis $(\lvert 0\rangle, \lvert 1\rangle, \lvert e\rangle)$. This represents a qubit drive $\lvert 0\rangle \leftrightarrow \lvert 1\rangle$ with Rabi frequency $\Omega_1$. The same drive also couples $\lvert 0\rangle \leftrightarrow \lvert e\rangle$, but that transition is detuned by $\Delta$ and has Rabi frequency $\Omega_2$. Again, I expect to see that the eigenvalue corresponding to the $\lvert 0\rangle \leftrightarrow \lvert 1\rangle$ dressed state experiences a shift of $\approx \Omega_2^2/(4\Delta)$ compared to a two-level system.

AI answer. The goal is to confirm that turning on a far-detuned drive $\lvert 0\rangle \to \lvert e\rangle$ with Rabi frequency $\Omega_2$ shifts the qubit ($\lvert 0\rangle \leftrightarrow \lvert 1\rangle$) dressed-state energies by an amount of order $\Omega_2^2/(4\Delta)$, where $\Delta$ is the large detuning of the second drive. Derive the result carefully (the user reports their analytics didn’t close) and then verify numerically by diagonalizing the full $3\times 3$ Hamiltonian.

1. Hamiltonian and conventions

In the basis $\{\lvert 0\rangle, \lvert 1\rangle, \lvert e\rangle\}$ and with $\hbar = 1$,

\[H = \begin{pmatrix} 0 & \Omega_1/2 & \Omega_2/2 \\ \Omega_1/2 & -\delta & 0 \\ \Omega_2/2 & 0 & \Delta - \delta \end{pmatrix}.\]

This is the standard rotating-frame Hamiltonian for two simultaneous drives sharing the ground state $\lvert 0\rangle$. The $\lvert 0\rangle \leftrightarrow \lvert 1\rangle$ drive (qubit) is nearly resonant, with small detuning $\delta$ and Rabi frequency $\Omega_1$. The $\lvert 0\rangle \leftrightarrow \lvert e\rangle$ drive is far off-resonant, with large detuning $\Delta$ and Rabi frequency $\Omega_2$.

A useful sign-convention check: with no couplings ($\Omega_1=\Omega_2=0$) the diagonal energies in this frame are $E_{\lvert 0\rangle}=0$, $E_{\lvert 1\rangle}=-\delta$, $E_{\lvert e\rangle}=\Delta-\delta$. A positive $\Delta$ means $\lvert e\rangle$ sits well above $\lvert 0\rangle$ in the rotating frame, so the $\lvert 0\rangle$-light shift from $\lvert e\rangle$ should be negative (level repulsion from above).

Throughout we take $\Delta \gg \Omega_1, \Omega_2, \lvert\delta\rvert$ and assume $\Delta - \delta \approx \Delta$ wherever convenient (we keep $\Delta - \delta$ where it matters).

2. Bare 2-level dressed states ($\Omega_2 = 0$)

With $\Omega_2 = 0$, the third level decouples and the $\{\lvert 0\rangle, \lvert 1\rangle\}$ block

\[H_2^{(0)} = \begin{pmatrix} 0 & \Omega_1/2 \\ \Omega_1/2 & -\delta \end{pmatrix}\]

has the textbook dressed-state eigenvalues

\[E_\pm^{(0)} = -\frac{\delta}{2} \pm \frac{1}{2}\sqrt{\delta^2 + \Omega_1^2}\]

(generalized Rabi frequency $\Omega_R = \sqrt{\delta^2+\Omega_1^2}$). On resonance ($\delta=0$) these become $E_\pm^{(0)} = \pm \Omega_1/2$.

3. Adiabatic elimination of $\lvert e\rangle$

There are two equivalent ways to derive the effective Hamiltonian for the qubit manifold; both expose the sign and position of the shift cleanly.

3.1 Second-order Rayleigh–Schrödinger perturbation theory

Split $H = H_0 + V$ with

\[H_0 = \mathrm{diag}(0, -\delta, \Delta-\delta), \qquad V = \begin{pmatrix} 0 & \Omega_1/2 & \Omega_2/2 \\ \Omega_1/2 & 0 & 0 \\ \Omega_2/2 & 0 & 0 \end{pmatrix}.\]

Standard second-order perturbation theory for level $\lvert n\rangle$ gives $\Delta E_n^{(2)} = \sum_{m\ne n} \lvert V_{nm}\rvert^2/(E_n^{(0)} - E_m^{(0)})$.

We do not treat $\Omega_1$ as perturbative; we keep the $\Omega_1/2$ coupling exactly. The perturbative parameter is the small ratio $\Omega_2/\Delta$, so we only eliminate matrix elements that couple to $\lvert e\rangle$. Within the $\{\lvert 0\rangle, \lvert 1\rangle\}$ subspace, the effect of $\lvert e\rangle$ at leading (second) order is to renormalize the diagonal entries. On $\lvert 0\rangle$ (energy $0$), $\Delta E_{\lvert 0\rangle} = \lvert\Omega_2/2\rvert^2/(0 - (\Delta-\delta)) = -\Omega_2^2/(4(\Delta-\delta))$. On $\lvert 1\rangle$ (energy $-\delta$), there is no direct coupling to $\lvert e\rangle$ in $V$, so $\Delta E_{\lvert 1\rangle} = 0$ at this order.

The cross term $V_{0e}V_{e1}$ vanishes because $V_{e1}=0$. So no off-diagonal correction appears at second order in $\Omega_2$.

The punchline: the light shift sits entirely on $\lvert 0\rangle$, with magnitude

\[s \equiv \frac{\Omega_2^2}{4(\Delta-\delta)} \approx \frac{\Omega_2^2}{4\Delta},\]

and sign such that $\lvert 0\rangle$ moves toward lower energy when $\Delta > 0$ (i.e., when $\lvert e\rangle$ is above $\lvert 0\rangle$).

3.2 Same result by explicit projection / Schrieffer–Wolff

Write $\lvert\psi\rangle = c_0\lvert 0\rangle + c_1\lvert 1\rangle + c_e\lvert e\rangle$, project $H\lvert\psi\rangle = E\lvert\psi\rangle$ onto $\lvert e\rangle$, and solve adiabatically for $c_e$ assuming $\lvert\Delta - \delta - E\rvert \approx \Delta$:

\[(\Delta - \delta) c_e + \frac{\Omega_2}{2} c_0 = E c_e \;\Longrightarrow\; c_e \approx -\frac{\Omega_2/2}{\Delta - \delta} c_0.\]

Substituting back into the $\lvert 0\rangle$ row gives an effective $\lvert 0\rangle$-energy correction of $-\Omega_2^2/[4(\Delta-\delta)]$, in agreement with section 3.1.

3.3 Effective 2-level Hamiltonian

The qubit subspace is therefore governed by

\[H_\mathrm{eff} = \begin{pmatrix} -s & \Omega_1/2 \\ \Omega_1/2 & -\delta \end{pmatrix}, \qquad s = \frac{\Omega_2^2}{4(\Delta-\delta)}.\]

The Rabi coupling $\Omega_1$ is unmodified at this order (there is no $\lvert 1\rangle \leftrightarrow \lvert e\rangle$ matrix element to generate one).

4. Modified dressed-state energies

Diagonalising $H_\mathrm{eff}$:

\[E_\pm = -\frac{\delta + s}{2} \pm \frac{1}{2}\sqrt{(\delta - s)^2 + \Omega_1^2}.\]

Compare with the bare result $E_\pm^{(0)} = -\delta/2 \pm \frac{1}{2}\sqrt{\delta^2+\Omega_1^2}$. Two distinct effects appear. First, a common shift $-s/2$ on the mean of the dressed-state pair. This is the “$\Omega_2^2/(4\Delta)$ shift” the user is asking about, but divided by 2 because the $\lvert 0\rangle$-character is split between the two dressed states. Second, a modified generalised Rabi frequency $\Omega_R \to \sqrt{(\delta-s)^2 + \Omega_1^2}$, i.e. the effective qubit detuning becomes $\delta_\mathrm{eff} = \delta - s$.
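Sections 3.2 to 4 can be checked in a few lines. The sketch below is a minimal numpy version, assuming $\hbar = 1$ and illustrative parameter values chosen here (not taken from the text): it eliminates $\lvert e\rangle$ by downfolding the $3\times 3$ Hamiltonian at $E = 0$, which should reproduce $H_\mathrm{eff}$ exactly, and then compares the closed-form $E_\pm$ with direct diagonalisation of $H_\mathrm{eff}$.

```python
import numpy as np


def full_hamiltonian(omega1, omega2, delta, Delta):
    """3x3 rotating-frame Hamiltonian in the basis (|0>, |1>, |e>), with hbar = 1."""
    return np.array([
        [0.0, omega1 / 2, omega2 / 2],
        [omega1 / 2, -delta, 0.0],
        [omega2 / 2, 0.0, Delta - delta],
    ])


def downfold(H, E=0.0):
    """Eliminate |e> (index 2): H_PP + H_Pe (E - H_ee)^(-1) H_eP, evaluated at energy E."""
    H_PP, H_Pe, H_eP, H_ee = H[:2, :2], H[:2, 2:], H[2:, :2], H[2:, 2:]
    return H_PP + H_Pe @ np.linalg.inv(E * np.eye(1) - H_ee) @ H_eP


def dressed_closed_form(omega1, delta, s):
    """Section 4: E_pm = -(delta + s)/2 -/+ sqrt((delta - s)^2 + omega1^2)/2, ascending."""
    root = np.sqrt((delta - s) ** 2 + omega1 ** 2)
    return np.array([-(delta + s) / 2 - root / 2, -(delta + s) / 2 + root / 2])


# Illustrative parameters (assumed): slightly detuned qubit drive, far-detuned |e>.
omega1, omega2, delta, Delta = 1.0, 2.0, 0.1, 20.0
s = omega2 ** 2 / (4 * (Delta - delta))
H_eff = downfold(full_hamiltonian(omega1, omega2, delta, Delta))

print(H_eff)  # expected structure: [[-s, omega1/2], [omega1/2, -delta]]
print(np.linalg.eigvalsh(H_eff), dressed_closed_form(omega1, delta, s))
```

Downfolding at $E = 0$ is the same approximation as $\lvert\Delta - \delta - E\rvert \approx \Delta - \delta$ in Section 3.2; evaluating at the dressed energy instead would give the energy-dependent (Löwdin) version.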

4.1 On resonance ($\delta = 0$)

Expanding in $s/\Omega_1$:

\[E_\pm = -\frac{s}{2} \pm \frac{1}{2}\sqrt{s^2 + \Omega_1^2} \approx -\frac{s}{2} \pm \frac{\Omega_1}{2}\left(1 + \frac{s^2}{2\Omega_1^2}\right).\]

So both dressed states shift by $-s/2 = -\Omega_2^2/[8(\Delta-\delta)]$, and the splitting between them changes only at the much smaller order $s^2/\Omega_1$ (i.e., $\Omega_2^4/(\Delta^2 \Omega_1)$). The qubit transition frequency $E_+ - E_-$ is essentially immune to the light shift on resonance; what shifts is the absolute position of the doublet.

4.2 Answer to the user’s question (signs and where it lives)

The $\Omega_2^2/(4\Delta)$ shift lives on $\lvert 0\rangle$ in the bare basis. In the dressed basis it does not appear symmetrically with the full magnitude on both states. On resonance it splits 50/50, giving a common shift of $-\Omega_2^2/[8\Delta]$ to each dressed state. Off resonance, the state with more $\lvert 0\rangle$ character takes a larger share. The sign is negative when $\Delta > 0$ (the second level is above $\lvert 0\rangle$ in the rotating frame, so it pushes $\lvert 0\rangle$ down).

So the user’s intuition was right in magnitude and parametric scaling but slightly off in distribution: each dressed state moves by only half the bare-$\lvert 0\rangle$ light shift.

5. Numerical verification

A script diagonalises the full $3\times 3$ Hamiltonian and compares against (a) the bare 2-level prediction (ignores $\Omega_2$) and (b) the effective 2-level Hamiltonian after adiabatic elimination.
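The script itself is not shown in the post. A minimal numpy sketch of the same comparison, assuming $\hbar = 1$ and the resonant parameters used in Sections 5.1 and 5.4, could look like this; for each $\Omega_2$ it prints the full eigenvalues and the worst-case error of the bare and effective two-level predictions.

```python
import numpy as np


def full_hamiltonian(omega1, omega2, delta, Delta):
    """3x3 rotating-frame Hamiltonian in the basis (|0>, |1>, |e>), with hbar = 1."""
    return np.array([
        [0.0, omega1 / 2, omega2 / 2],
        [omega1 / 2, -delta, 0.0],
        [omega2 / 2, 0.0, Delta - delta],
    ])


def bare_two_level(omega1, delta):
    """Dressed energies with omega2 = 0: -delta/2 -/+ sqrt(delta^2 + omega1^2)/2."""
    root = np.sqrt(delta ** 2 + omega1 ** 2)
    return np.array([-delta / 2 - root / 2, -delta / 2 + root / 2])


def effective_two_level(omega1, omega2, delta, Delta):
    """Eigenvalues of H_eff after adiabatic elimination of |e>."""
    s = omega2 ** 2 / (4 * (Delta - delta))
    return np.linalg.eigvalsh(np.array([[-s, omega1 / 2], [omega1 / 2, -delta]]))


omega1, delta, Delta = 1.0, 0.0, 20.0  # resonant case, Delta = 20 Omega_1
print("Om2   E_-(full)  E_+(full)  E_e(full)  err(bare)  err(eff)")
for omega2 in range(7):
    # eigvalsh returns ascending eigenvalues: E_-, E_+, E_e
    E = np.linalg.eigvalsh(full_hamiltonian(omega1, omega2, delta, Delta))
    err_bare = np.abs(E[:2] - bare_two_level(omega1, delta)).max()
    err_eff = np.abs(E[:2] - effective_two_level(omega1, omega2, delta, Delta)).max()
    print(f"{omega2:3d} {E[0]:10.4f} {E[1]:10.4f} {E[2]:10.4f} {err_bare:10.2e} {err_eff:10.2e}")
```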

5.1 Dressed energies vs $\Omega_2$

For parameters $\Omega_1 = 1$, $\delta = 0$, $\Delta = 20\,\Omega_1$, the two qubit-like eigenvalues of the full 3-level Hamiltonian bend downward as $\Omega_2$ grows. The bare 2-level prediction (flat at $\pm 0.5$) ignores the light shift entirely and quickly fails. The adiabatic-elimination prediction sits essentially on top of the full result over the whole sweep. The mean of the lower two eigenvalues, minus the bare mean, matches the analytic prediction $-s/2 = -\Omega_2^2/[8(\Delta-\delta)]$ to plotting precision.

5.2 Scaling check

The log-log error plot is the cleanest way to confirm the perturbative structure. The error of the bare 2-level approximation lies exactly on the reference $\propto \Omega_2^2$ slope — this is the AC Stark shift. The error of the effective 2-level approximation tracks the $\propto \Omega_2^4$ reference. The effective Hamiltonian captures the $\Omega_2^2/\Delta$ correction at leading order in $1/\Delta$ — the exact second-order shifts (derived in Section 6) carry energy denominators $\lambda_0 - (\Delta-\delta)$ rather than $\Delta-\delta$, so there is a smaller residual already at order $\Omega_2^2$ from the neglected $\lambda_0/\Delta$ dependence. For the plotted parameters ($\Delta = 20\,\Omega_1$) that subleading piece is numerically tiny and partly cancels in the mean of the lower manifold, so the observed slope remains $\Omega_2^4$. Section 6 gives the cleaner / more rigorous statement.

5.3 Scaling with $\Delta$ at fixed $\Omega_2$

For $\Omega_1=1$, $\Omega_2 = 2$, $\delta = 0$, $\Delta \in [10, 200]$, the common shift of the dressed pair tracks $-\Omega_2^2/[8(\Delta-\delta)]$ across nearly two orders of magnitude in $\Delta$, with no free parameters. This is the $1/\Delta$ signature of an off-resonant light shift.
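The $1/\Delta$ scan can be sketched as follows, assuming the parameters quoted above and a log-spaced grid of $\Delta$ values chosen here for illustration. The common shift is the mean of the two qubit-like eigenvalues minus the bare mean $-\delta/2$, and a straight-line fit in log-log coordinates should give a slope close to $-1$.

```python
import numpy as np


def full_hamiltonian(omega1, omega2, delta, Delta):
    """3x3 rotating-frame Hamiltonian in the basis (|0>, |1>, |e>), with hbar = 1."""
    return np.array([
        [0.0, omega1 / 2, omega2 / 2],
        [omega1 / 2, -delta, 0.0],
        [omega2 / 2, 0.0, Delta - delta],
    ])


def common_shift(omega1, omega2, delta, Delta):
    """Mean of the two qubit-like eigenvalues minus the bare mean (-delta/2)."""
    E = np.linalg.eigvalsh(full_hamiltonian(omega1, omega2, delta, Delta))
    return (E[0] + E[1]) / 2 + delta / 2


omega1, omega2, delta = 1.0, 2.0, 0.0
Deltas = np.geomspace(10, 200, 12)
shifts = np.array([common_shift(omega1, omega2, delta, D) for D in Deltas])
predicted = -omega2 ** 2 / (8 * (Deltas - delta))

for D, sh, pr in zip(Deltas, shifts, predicted):
    print(f"Delta = {D:7.2f}   shift = {sh:+.5f}   -Om2^2/[8(Delta-delta)] = {pr:+.5f}")

# Slope of log|shift| against log(Delta); an off-resonant light shift should give about -1.
slope = np.polyfit(np.log(Deltas), np.log(-shifts), 1)[0]
print(f"fitted slope: {slope:.3f}")
```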

5.4 Numerical table

For the resonant case $\delta=0$, $\Omega_1=1$, $\Delta=20$:

$\Omega_2$	$E_-$ (full)	$E_+$ (full)	$E_e$ (full)	predicted per-state shift $-s/2$
0	-0.5000	0.5000	20.000	0.0000
1	-0.5061	0.4936	20.0125	-0.00625
2	-0.5250	0.4750	20.0499	-0.02500
3	-0.5577	0.4458	20.1119	-0.05625
4	-0.6064	0.4082	20.1982	-0.10000
5	-0.6735	0.3656	20.3080	-0.15625
6	-0.7617	0.3211	20.4406	-0.22500

At $\Omega_2 = 2$: the bare doublet ${-0.5, +0.5}$ becomes ${-0.525, +0.475}$, a common shift of $-0.025 = -\Omega_2^2/[8(\Delta-\delta)]$. The $\lvert e\rangle$-like state has moved up by $+\Omega_2^2/[4\Delta] = +0.05$ — the opposite shift, as required by trace conservation of $H$.

6. Exact eigenvalues and Taylor-expansion cross-check

Sections 3–4 derived the light shift by adiabatically eliminating $\lvert e\rangle$. That’s clean but perturbative in $\Omega_2/\Delta$. Here we check the result the other way round: write down the exact eigenvalues of the full $3\times 3$ Hamiltonian (as roots of a cubic), then Taylor-expand in $\Omega_2$ and confirm the leading correction reproduces — and the next-order term predicts — the adiabatic-elimination answer.

A sympy script does all of this symbolically.

6.1 Characteristic polynomial

\[P(\lambda) = \det(\lambda I - H) = \lambda^3 + \lambda^2(2\delta - \Delta) + \lambda\left[\delta^2 - \delta\Delta - \frac{\Omega_1^2 + \Omega_2^2}{4}\right] + \frac{\Omega_1^2(\Delta-\delta)}{4} - \frac{\Omega_2^2 \delta}{4}.\]

Trace check: $-\mathrm{coeff}(\lambda^2) = \Delta - 2\delta = \mathrm{Tr}\,H$. Determinant check: $P(0) = -\det H$, so $\det H = -\Omega_1^2(\Delta-\delta)/4 + \Omega_2^2\delta/4$. Direct expansion of $H$ along the first row gives the same value (sanity check: at $\Omega_1 = 2, \Omega_2 = 0, \delta = 0, \Delta = 4$, $\det H = -4$, and $-\Omega_1^2(\Delta-\delta)/4 = -4$).
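The sympy script is not reproduced here. A lighter numerical cross-check of the characteristic polynomial, assuming random parameter values drawn for illustration, uses `np.poly`, which for a square matrix returns the coefficients of $\det(\lambda I - H)$ with the highest power first:

```python
import numpy as np


def full_hamiltonian(omega1, omega2, delta, Delta):
    """3x3 rotating-frame Hamiltonian in the basis (|0>, |1>, |e>), with hbar = 1."""
    return np.array([
        [0.0, omega1 / 2, omega2 / 2],
        [omega1 / 2, -delta, 0.0],
        [omega2 / 2, 0.0, Delta - delta],
    ])


def char_poly_from_text(omega1, omega2, delta, Delta):
    """Coefficients of P(lambda) as written in Section 6.1, highest power first."""
    return np.array([
        1.0,
        2 * delta - Delta,
        delta ** 2 - delta * Delta - (omega1 ** 2 + omega2 ** 2) / 4,
        omega1 ** 2 * (Delta - delta) / 4 - omega2 ** 2 * delta / 4,
    ])


rng = np.random.default_rng(1)
worst = 0.0
for _ in range(20):
    omega1, omega2, delta, Delta = rng.uniform(0.1, 5.0, size=4)
    H = full_hamiltonian(omega1, omega2, delta, Delta)
    mismatch = np.abs(np.poly(H) - char_poly_from_text(omega1, omega2, delta, Delta)).max()
    worst = max(worst, mismatch)
print(f"largest coefficient mismatch over 20 random parameter sets: {worst:.1e}")

# Sanity value quoted in the text: Omega1 = 2, Omega2 = 0, delta = 0, Delta = 4 gives det H = -4.
print(np.linalg.det(full_hamiltonian(2.0, 0.0, 0.0, 4.0)))
```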

6.2 Factorisation at $\Omega_2 = 0$

\[P_0(\lambda) \equiv P(\lambda)\big\rvert_{\Omega_2=0} = (\lambda - (\Delta-\delta))(\lambda^2 + \delta\lambda - \Omega_1^2/4).\]

The quadratic factor is the bare 2-level block. So at $\Omega_2 = 0$ the three eigenvalues are the bare dressed states $E_\pm^{(0)} = -\delta/2 \pm \frac{1}{2}\sqrt{\delta^2 + \Omega_1^2}$ and the bare excited level $E_e^{(0)} = \Delta - \delta$. The full polynomial differs from $P_0$ by

\[P(\lambda) - P_0(\lambda) = -\frac{\Omega_2^2}{4}(\lambda + \delta),\]

which is the only way $\Omega_2$ enters. Note in particular that $P$ depends on $\Omega_2$ only through $\Omega_2^2$, so every eigenvalue is an even function of $\Omega_2$ — the Taylor series contains only $\Omega_2^{2n}$ terms.

6.3 Implicit-function theorem ⇒ perturbation series

Write $\lambda = \lambda_0 + \lambda_2 \Omega_2^2 + \lambda_4 \Omega_2^4 + \cdots$ with $\lambda_0$ one of the three bare roots, expand $P(\lambda, \Omega_2) = 0$, and match powers of $\Omega_2$. The first two orders give

\[\lambda_2 = \frac{\lambda_0 + \delta}{4 P_0'(\lambda_0)}, \qquad \lambda_4 = \frac{\lambda_2/4 - \frac{1}{2} P_0''(\lambda_0) \lambda_2^2}{P_0'(\lambda_0)}.\]

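Only $P_0'$ and $P_0''$ enter through $\mathcal{O}(\Omega_2^4)$. For a monic cubic, $P_0'''$ is a constant that first appears at $\mathcal{O}(\Omega_2^6)$, so these two terms close the system through $\mathcal{O}(\Omega_2^4)$.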
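The recursion for $\lambda_2$ and $\lambda_4$ is easy to evaluate numerically. The sketch below assumes the resonant parameters of Section 5 ($\Omega_1 = 1$, $\delta = 0$, $\Delta = 20$) and builds $P_0$ with `numpy.polynomial.Polynomial` (coefficients in increasing powers). It checks that both coefficient sums vanish, as trace conservation requires, and compares the truncated series with exact diagonalisation at a small $\Omega_2$.

```python
import numpy as np
from numpy.polynomial import Polynomial


def full_hamiltonian(omega1, omega2, delta, Delta):
    """3x3 rotating-frame Hamiltonian in the basis (|0>, |1>, |e>), with hbar = 1."""
    return np.array([
        [0.0, omega1 / 2, omega2 / 2],
        [omega1 / 2, -delta, 0.0],
        [omega2 / 2, 0.0, Delta - delta],
    ])


omega1, delta, Delta = 1.0, 0.0, 20.0  # illustrative values, as in Section 5

# P_0(lambda) = (lambda - (Delta - delta)) * (lambda^2 + delta*lambda - omega1^2/4)
P0 = Polynomial([-(Delta - delta), 1.0]) * Polynomial([-omega1 ** 2 / 4, delta, 1.0])
dP0, d2P0 = P0.deriv(1), P0.deriv(2)


def series_coefficients(lam0):
    """lambda_2 and lambda_4 from the implicit-function expansion of P(lambda) = 0."""
    lam2 = (lam0 + delta) / (4 * dP0(lam0))
    lam4 = (lam2 / 4 - 0.5 * d2P0(lam0) * lam2 ** 2) / dP0(lam0)
    return lam2, lam4


bare = np.sort(P0.roots().real)                               # E_-^(0), E_+^(0), E_e^(0)
coeffs = np.array([series_coefficients(l0) for l0 in bare])  # rows: (lambda_2, lambda_4)
print("sum of lambda_2:", coeffs[:, 0].sum(), " sum of lambda_4:", coeffs[:, 1].sum())

omega2 = 0.5
exact = np.linalg.eigvalsh(full_hamiltonian(omega1, omega2, delta, Delta))
series = bare + coeffs[:, 0] * omega2 ** 2 + coeffs[:, 1] * omega2 ** 4
print("exact:", exact, " series through O(omega2^4):", series)
```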

Evaluating at each bare root:

branch	$\lambda_0$	$\lambda_2$
$E_+$	$-\delta/2 + \Omega_R/2$	$(\delta + \Omega_R)/[8\Omega_R((\delta+\Omega_R)/2 - \Delta)]$
$E_-$	$-\delta/2 - \Omega_R/2$	$(\delta - \Omega_R)/[8\Omega_R(\Delta - (\delta-\Omega_R)/2)]$
$E_e$	$\Delta - \delta$	$\Delta/(4\Delta(\Delta-\delta) - \Omega_1^2)$

where $\Omega_R \equiv \sqrt{\delta^2 + \Omega_1^2}$.

In the large-$\Delta$ limit ($\Delta \gg \Omega_1, \lvert\delta\rvert$):

\[\lambda_2^{(\pm)} \to -\frac{1}{8\Delta}\left(1 \pm \frac{\delta}{\Omega_R}\right), \qquad \lambda_2^{(e)} \to \frac{1}{4\Delta}.\]

Multiplying by $\Omega_2^2$, the leading shifts are

\[\Delta E_\pm = -\frac{\Omega_2^2}{8\Delta}\left(1 \pm \frac{\delta}{\Omega_R}\right), \qquad \Delta E_e = +\frac{\Omega_2^2}{4\Delta}.\]

This is exactly the result of Section 4, with the bonus that the $E_e$ shift comes out of the same expansion — its $+\Omega_2^2/(4\Delta)$ is the “missing partner” that makes the trace work.

6.4 Trace conservation, exactly

The sum of all three $\lambda_2$’s must vanish (since $\mathrm{Tr}\,H$ doesn’t depend on $\Omega_2$). This is a non-trivial identity at the level of the closed-form expressions above; sympy confirms sum(lam_2) = 0 and sum(lam_4) = 0.

In other words, the $\lvert 0\rangle$-light shift $-\Omega_2^2/(4\Delta)$ is distributed among the three eigenstates by weight — half each to $E_\pm$ on resonance, and the opposing $+\Omega_2^2/(4\Delta)$ to $E_e$ — but the total trace is preserved at every order in $\Omega_2$.

This is a cleaner statement than the level-by-level Section 3 result: adiabatic elimination appears to put the shift on $\lvert 0\rangle$ only, but that is a bare-basis statement; in the dressed basis the same shift gets sliced up according to the dressed-state composition, while $E_e$ moves the opposite way to balance the books.

6.5 Equivalence with the $\lvert c_0\rvert^2$ picture

The Section 6.3 expression for $\lambda_2$ can be rewritten in a more physically transparent form. The bare dressed state $\lvert\lambda_0\rangle$ has $\lvert 0\rangle$-amplitude $c_0$, where (for the $\pm$ states) $\lvert c_0^{(\pm)}\rvert^2 = \frac{1}{2}(1 \pm \delta/\Omega_R)$.

Standard second-order perturbation theory gives the shift from coupling $\langle 0\rvert V \lvert e\rangle = \Omega_2/2$:

\[\Delta\lambda_0 = \frac{\lvert c_0\rvert^2 (\Omega_2/2)^2}{\lambda_0 - E_e^{(0)}} = \frac{\lvert c_0\rvert^2 \Omega_2^2/4}{\lambda_0 - (\Delta - \delta)}.\]

In the large-$\Delta$ limit this is $-\lvert c_0\rvert^2 \cdot \Omega_2^2 / (4\Delta)$, i.e.

\[\Delta E_\pm = -\lvert c_0^{(\pm)}\rvert^2 \cdot \frac{\Omega_2^2}{4\Delta},\]

which is the same as the result above. The factor of $1/2$ that appears on resonance is just “$\lvert 0\rangle$ is half of each dressed state”.
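The $\lvert c_0\rvert^2$ picture can be checked directly from the eigenvectors. The sketch below assumes an off-resonant qubit drive ($\delta = 0.4$, chosen for illustration so the two weights differ), extracts $\lvert\langle 0\vert E_\pm\rangle\rvert^2$ from the bare $2\times 2$ block, and compares the weighted second-order shift, with the exact denominator $\lambda_0 - (\Delta - \delta)$, against full diagonalisation at small $\Omega_2$.

```python
import numpy as np


def full_hamiltonian(omega1, omega2, delta, Delta):
    """3x3 rotating-frame Hamiltonian in the basis (|0>, |1>, |e>), with hbar = 1."""
    return np.array([
        [0.0, omega1 / 2, omega2 / 2],
        [omega1 / 2, -delta, 0.0],
        [omega2 / 2, 0.0, Delta - delta],
    ])


omega1, delta, Delta = 1.0, 0.4, 20.0  # illustrative, off-resonant qubit drive
omega_R = np.hypot(delta, omega1)

# Bare 2-level dressed states (omega2 = 0); eigh returns ascending energies E_-, E_+.
lam0, vecs = np.linalg.eigh(full_hamiltonian(omega1, 0.0, delta, Delta)[:2, :2])
c0_sq = np.abs(vecs[0, :]) ** 2                               # |<0|E_->|^2, |<0|E_+>|^2
c0_sq_formula = 0.5 * (1 + np.array([-1.0, 1.0]) * delta / omega_R)

# Second-order shift from <0|V|e> = omega2/2, weighted by the |0> content of each dressed state.
omega2 = 0.1
shift_pt = c0_sq * (omega2 / 2) ** 2 / (lam0 - (Delta - delta))
shift_exact = np.linalg.eigvalsh(full_hamiltonian(omega1, omega2, delta, Delta))[:2] - lam0
print(c0_sq, c0_sq_formula)
print(shift_pt, shift_exact)
```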

6.6 Numerical verification

For $\Omega_1 = 1$, $\delta = 0$, $\Delta = 20\,\Omega_1$: the exact eigenvalue (from numerical diagonalization of the $3\times 3$) agrees with the Taylor series truncated at $\mathcal{O}(\Omega_2^2)$ — i.e. exactly the adiabatic-elimination prediction — out to $\Omega_2 \sim \Omega_1$. Including the next-order correction computed in Section 6.3 extends agreement to $\Omega_2 = 6\,\Omega_1$ (where $\Omega_2/\Delta = 0.3$, well outside the perturbative regime).

Log-log convergence: both the 2nd-order and 4th-order truncations behave as one expects from a Taylor series that only contains even powers of $\Omega_2$. The 2nd-order residual scales as $\Omega_2^4$ (the dropped next term); the 4th-order residual scales as $\Omega_2^6$ (the term after that). Reference $\Omega_2^4$ and $\Omega_2^6$ slopes confirm both scalings cleanly over three decades in $\Omega_2$. This is the strongest possible “the perturbative calculation was right” cross-check: not only does the leading shift match the adiabatic-elimination result, but the residual scales exactly as the next dropped Taylor term must.

6.7 What this adds over Section 5

Section 5 already showed numerically that the bare 2-level error scales as $\Omega_2^2$ and the adiabatic-elimination residual as $\Omega_2^4$. Section 6 shows analytically that the adiabatic-elimination Hamiltonian is the $\mathcal{O}(\Omega_2^2)$ truncation of the exact eigenvalue, and gives a closed-form expression for the next term. That term is the leading correction to the effective Hamiltonian if one wants to do better than adiabatic elimination — useful, e.g., for evaluating Stark-shift cancellation strategies at the $\Omega_2^4/\Delta^3$ level.

7. Summary

Derivation. Adiabatic elimination of $\lvert e\rangle$ gives an effective 2-level Hamiltonian for the qubit manifold in which $\lvert 0\rangle$ receives a light shift $-\Omega_2^2/[4(\Delta-\delta)]$ and $\lvert 1\rangle$ is unchanged (at this order). The off-diagonal qubit coupling $\Omega_1$ is also unchanged at this order.
Dressed eigenvalues. $E_\pm = -(\delta+s)/2 \pm \frac{1}{2}\sqrt{(\delta-s)^2 + \Omega_1^2}$ with $s = \Omega_2^2/[4(\Delta-\delta)]$. On resonance ($\delta=0$) each dressed state acquires a common shift of $-s/2 = -\Omega_2^2/[8\Delta]$; the dressed-state splitting is essentially unchanged (correction $\sim \Omega_2^4/(\Delta^2 \Omega_1)$).
User’s expected scaling. The $\Omega_2^2/(4\Delta)$ scaling is confirmed, with two refinements: (i) the shift sits on $\lvert 0\rangle$, not symmetrically on both dressed states; (ii) in the dressed basis each state receives only half the bare-$\lvert 0\rangle$ shift on resonance, because $\lvert 0\rangle$ is split 50/50 between $\lvert\pm\rangle$.
Numerics. Full $3\times 3$ diagonalization confirms the $-\Omega_2^2/[4(\Delta-\delta)]$ shift on $\lvert 0\rangle$ to the expected $O(\Omega_2^4/\Delta^2)$ relative accuracy, and the $1/\Delta$ scaling holds over the full range tested.

Section B: Black-body radiation

Question. Hey AI, I have the following problem for you. I have a little cow-shaped object, 1 cm in size. I want to put this cow in my 4K cryostat, but I am worried about the cow heating up too much due to black body radiation through an aperture in the 4K heat shield 10 cm away, which looks directly at 300K. Draw this setup and calculate the total black body radiation on the cow as a function of aperture diameter. First, do this for a spherical object and compare numerical simulations with analytics. Then, repeat for a cow-shaped object. Assume the cow is centered with the aperture, and that the aperture looks at the cow from the side.

AI answer.

Setup

A 4 K shield with a circular aperture of diameter $D$ at $x=0$. A 1‑cm object (sphere or cow) sits at $x = L = 10$ cm. Seen from the cold side, the aperture acts as a Lambertian 300 K black body (Kirchhoff’s law: an aperture looking into 300 K is indistinguishable from a 300 K black surface in the aperture plane). All emissivities are taken as $\varepsilon = 1$. 4 K self‑emission gives $\sigma T^4 = 1.45\times 10^{-5}$ W/m² vs $459$ W/m² at 300 K, a $3\times 10^{-8}$ ratio, so it is ignored.

1. Analytic sphere

Treating the sphere ($r=0.5$ cm) as a small target seen from a point on the aperture at off‑axis radius $\rho$, the hit‑probability density per unit aperture area is

\[f(\rho) = \frac{L r^2}{(L^2 + \rho^2)^{3/2}}\]

(Lambertian $\cos\theta$ projection times the solid angle of the sphere as seen from that aperture point). Integrating $2\pi\rho\,d\rho$ from $0$ to $D/2$:

\[P(D) = 2\pi r^2 \sigma T^4 \left(1 - \frac{L}{\sqrt{L^2 + (D/2)^2}}\right).\]

Limiting behaviour: for a small aperture ($D \ll L$), $P \approx \sigma T^4 \cdot \pi(D/2)^2 \cdot r^2/L^2$ — pure aperture‑area scaling with the on-axis Lambertian view-factor $r^2/L^2$ (i.e. the point-target solid angle $\pi r^2/L^2$ divided by $\pi$, since the integrand $f(\rho)$ at $\rho=0$ is exactly $r^2/L^2$). For $r=0.5$ cm, $L=10$ cm this is $r^2/L^2 = 2.5\times 10^{-3}$. For a large aperture ($D \to \infty$), $P \to 2\pi r^2 \sigma T^4 \approx 72$ mW.

Note the factor of 2 in the asymptote. A naive “sphere in front of a black plane” estimate gives $\pi r^2 \sigma T^4$. The correct answer is $2\pi r^2 \sigma T^4$, because at infinite aperture the back hemisphere of the sphere also intercepts rays arriving from the (now arbitrarily large) aperture at glancing angles. I verified this independently with a separate surface‑integral Monte Carlo over a sphere in front of an infinite Lambertian half‑plane.

2. Sphere Monte Carlo

A naive uniform‑disc Lambertian Monte Carlo fails at large $D$ because the hit fraction falls to $\sim 10^{-5}$ and shot noise dominates. The code uses a cone‑importance sampler: stratify the aperture into log‑spaced annular rings; from each ring sample directions only within the cone that subtends the target’s bounding sphere; weight each ray by $(\cos\theta/\pi) \cdot \Omega_\mathrm{cone}$ — the Lambertian PDF divided by the uniform‑on‑cone PDF.

The Monte Carlo agrees with the analytic formula to $< 0.1\%$ across 8 decades in $D$.
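The original code is not shown; the following is a minimal numpy sketch of a cone-importance sampler of the kind described above, assuming the aperture lies in the plane $z = 0$ with the sphere centred at $(0, 0, L)$, 40 log-spaced rings, and 2000 rays per ring (all choices made here). For a sphere, every ray inside its tangent cone hits, so the hit test is a placeholder for the ray-mesh intersection a cow would need.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def sphere_power_analytic(D, L=0.10, r=0.005, T=300.0):
    """P(D) = 2 pi r^2 sigma T^4 (1 - L / sqrt(L^2 + (D/2)^2))."""
    return 2 * np.pi * r ** 2 * SIGMA * T ** 4 * (1 - L / np.sqrt(L ** 2 + (D / 2) ** 2))


def sphere_power_cone_mc(D, L=0.10, r=0.005, T=300.0, n_rings=40, n_per_ring=2000, seed=0):
    """Cone-importance Monte Carlo: aperture disc at z = 0, sphere centred at (0, 0, L)."""
    rng = np.random.default_rng(seed)
    R = D / 2
    edges = np.concatenate(([0.0], np.geomspace(R * 1e-4, R, n_rings)))  # central disc + log rings
    total = 0.0
    for rho_a, rho_b in zip(edges[:-1], edges[1:]):
        ring_area = np.pi * (rho_b ** 2 - rho_a ** 2)
        # Area-uniform radii in the ring; the integrand is azimuthally symmetric,
        # so every emission point can be placed on the +x axis.
        rho = np.sqrt(rho_a ** 2 + rng.random(n_per_ring) * (rho_b ** 2 - rho_a ** 2))
        to_c = np.stack([-rho, np.zeros_like(rho), np.full_like(rho, L)], axis=1)
        d = np.linalg.norm(to_c, axis=1)
        u = to_c / d[:, None]
        # Cone around u that just encloses the target's bounding sphere.
        cos_alpha = np.sqrt(1 - (r / d) ** 2)
        omega_cone = 2 * np.pi * (1 - cos_alpha)
        # Directions uniform on the cone: cos(psi) uniform in [cos_alpha, 1], azimuth uniform.
        cos_psi = 1 - rng.random(n_per_ring) * (1 - cos_alpha)
        sin_psi = np.sqrt(1 - cos_psi ** 2)
        phi = 2 * np.pi * rng.random(n_per_ring)
        e1 = np.stack([-u[:, 2], np.zeros_like(rho), u[:, 0]], axis=1)  # unit vector orthogonal to u
        e2 = np.cross(u, e1)
        dirs = (cos_psi[:, None] * u
                + sin_psi[:, None] * (np.cos(phi)[:, None] * e1 + np.sin(phi)[:, None] * e2))
        # Ray-sphere hit test (always true here up to rounding; a cow needs a mesh test instead).
        b = np.einsum("ij,ij->i", dirs, to_c)
        hit = (b > 0) & (b ** 2 >= d ** 2 - r ** 2)
        cos_theta = np.clip(dirs[:, 2], 0.0, None)        # Lambertian emission angle
        weights = (cos_theta / np.pi) * omega_cone * hit   # Lambertian pdf / uniform-cone pdf
        total += ring_area * weights.mean()
    return SIGMA * T ** 4 * total


for D in (1e-3, 1e-2, 0.1, 1.0):
    mc, an = sphere_power_cone_mc(D), sphere_power_analytic(D)
    print(f"D = {D:6.3f} m   MC = {mc:.4e} W   analytic = {an:.4e} W   ratio = {mc / an:.4f}")
```

Because the projected solid angle of the tangent cone is exactly $\pi (r/d)^2 \cos\theta_u$, this estimator converges to the analytic sphere formula; the remaining scatter is purely statistical.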

3. Cow

A primitive cow mesh in trimesh: body ellipsoid + head sphere + four cylindrical legs + tail cylinder, bounding box diagonal 1 cm, broadside facing the aperture.

Silhouette projected area (MC of a parallel ray bundle): $A_\perp = 1.77\times 10^{-5}$ m², vs sphere’s $\pi r^2 = 7.85\times 10^{-5}$ m². So the cow is about 22% of the sphere by frontal silhouette.
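The trimesh-based silhouette code is not shown. As a rough stand-in, the parallel-ray-bundle idea can be sketched without a mesh library by modelling the target as a union of axis-aligned ellipsoids, assuming rays travel along $x$ (broadside). The toy cow dimensions below are hypothetical and are not the mesh used above, so its output should not be compared with the quoted $A_\perp$.

```python
import numpy as np


def projected_area_mc(ellipsoids, n_rays=400_000, seed=0):
    """Silhouette area, seen along x, of a union of axis-aligned ellipsoids.

    ellipsoids: list of ((x, y, z) centre, (a, b, c) semi-axes), in metres.
    A ray travelling along x at transverse position (y, z) hits an axis-aligned
    ellipsoid iff (y, z) lies inside its projected ellipse with semi-axes (b, c).
    """
    rng = np.random.default_rng(seed)
    centres = np.array([c for c, _ in ellipsoids], dtype=float)
    axes = np.array([ax for _, ax in ellipsoids], dtype=float)
    # Bounding rectangle of all projected ellipses in the y-z plane.
    y_lo, y_hi = (centres[:, 1] - axes[:, 1]).min(), (centres[:, 1] + axes[:, 1]).max()
    z_lo, z_hi = (centres[:, 2] - axes[:, 2]).min(), (centres[:, 2] + axes[:, 2]).max()
    y = rng.uniform(y_lo, y_hi, n_rays)
    z = rng.uniform(z_lo, z_hi, n_rays)
    hit = np.zeros(n_rays, dtype=bool)
    for (_, cy, cz), (_, b, c) in zip(centres, axes):
        hit |= ((y - cy) / b) ** 2 + ((z - cz) / c) ** 2 <= 1.0
    # Hit fraction times the area of the sampled rectangle.
    return (y_hi - y_lo) * (z_hi - z_lo) * hit.mean()


# Hypothetical toy "cow": body, head and four thin ellipsoidal legs (made-up dimensions).
toy_cow = [
    ((0.0, 0.0, 0.0), (0.0015, 0.0035, 0.0018)),      # body
    ((0.0, 0.0045, 0.0012), (0.0010, 0.0012, 0.0010)),  # head
] + [((0.0, y, -0.0030), (0.0004, 0.0004, 0.0016)) for y in (-0.0025, -0.0015, 0.0015, 0.0025)]
print(f"toy cow silhouette: {projected_area_mc(toy_cow):.2e} m^2")
```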

The same cone‑importance MC is then applied to the cow mesh.

4. Qualitative shape of the curves

For small $D \lesssim 30$ mm, both curves follow the power law $P \propto D^2$ (aperture‑area scaling), with cow $\approx 0.22 \times$ sphere, set entirely by the frontal projected‑area ratio. Beyond a few cm, both bend over as the factor $1 - L/\sqrt{L^2+(D/2)^2}$ falls below its small-aperture form $(D/2)^2/(2L^2)$. For large $D \gg 2L$, that factor tends to 1 and the curves saturate. The sphere approaches 72 mW = $2\pi r^2\sigma T^4$. The cow saturates at ~12 mW, a factor ~6× below the sphere. That is somewhat more than the silhouette ratio (~4.4×), because the cow’s projected area averaged over all aperture‑to‑cow angles is smaller than its broadside value, and self‑shadowing between cow parts removes some of the “double‑side” pickup that the sphere benefits from.

5. Power thresholds (linear interpolation on the MC)

Power target	$D$ (sphere)	$D$ (cow)
1 nW	~0.033 mm†	~0.070 mm†
1 μW	1.05 mm	2.22 mm
1 mW	33.7 mm	72.8 mm
10 mW	118 mm	(asymptote ~12 mW)

† The 1 nW row sits below the simulation grid (which starts at $D = 0.10$ mm, where the sampled powers are already ~9 nW for the sphere and ~2 nW for the cow). The 0.033 / 0.070 mm values are obtained by extrapolation using the small-aperture $P \propto D^2$ scaling — exact in this regime — not from a direct MC sample.
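For the sphere column, the analytic formula of Section 1 can be inverted in closed form, which gives an independent cross-check of the interpolated MC values. A minimal sketch, assuming the same $L$, $r$ and $T$ as above; the table itself was produced by interpolating the MC, not by this inversion.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def sphere_aperture_for_power(P_target, L=0.10, r=0.005, T=300.0):
    """Aperture diameter D (m) at which the analytic sphere power reaches P_target (W)."""
    x = P_target / (2 * np.pi * r ** 2 * SIGMA * T ** 4)  # fraction of the ~72 mW saturation value
    if not 0.0 <= x < 1.0:
        raise ValueError("target power must lie below the saturation value 2*pi*r^2*sigma*T^4")
    # Solve 1 - L / sqrt(L^2 + R^2) = x for R; this form avoids cancellation at small x.
    R = L * np.sqrt(x * (2.0 - x)) / (1.0 - x)
    return 2.0 * R


for P in (1e-9, 1e-6, 1e-3, 1e-2):
    print(f"P = {P:.0e} W  ->  D = {1e3 * sphere_aperture_for_power(P):.4g} mm")
```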

6. When does the cow heat?

Depends on its thermal link $G$ to the 4 K stage. For a typical $G \approx 10^{-4}$ W/K (thin support strap):

$D$	$P_\mathrm{cow}$	$\Delta T = P/G$
1 mm	$2\times 10^{-7}$ W	2 mK
10 mm	$2\times 10^{-5}$ W	200 mK
100 mm	$1.7\times 10^{-3}$ W	17 K (cow no longer at 4 K)

For tight thermal budgets keep $D \lesssim $ a few mm; for a stiff $G \approx 10^{-2}$ W/K link, several cm is fine.
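The $\Delta T$ table is simple arithmetic. The sketch below approximates the cow as the analytic sphere result scaled by the quoted silhouette ratio ($1.77/7.85 \approx 0.22$) and treats $G$ as temperature-independent; both are simplifying assumptions, and the scaling overestimates the cow power close to saturation, where the text's MC ratio is nearer 0.17.

```python
import numpy as np

SIGMA = 5.670374419e-8               # Stefan-Boltzmann constant, W m^-2 K^-4
A_COW, A_SPHERE = 1.77e-5, 7.85e-5   # frontal silhouette areas quoted in the text (m^2)


def sphere_power(D, L=0.10, r=0.005, T=300.0):
    """Analytic absorbed power for the 1 cm sphere (Section 1)."""
    return 2 * np.pi * r ** 2 * SIGMA * T ** 4 * (1 - L / np.sqrt(L ** 2 + (D / 2) ** 2))


def cow_heating(D, G=1e-4):
    """(P_cow, Delta T = P_cow / G) with the cow approximated as a scaled sphere."""
    P = sphere_power(D) * A_COW / A_SPHERE
    return P, P / G


for G in (1e-4, 1e-2):
    for D_mm in (1, 10, 100):
        P, dT = cow_heating(D_mm * 1e-3, G)
        print(f"G = {G:.0e} W/K  D = {D_mm:3d} mm  P = {P:.1e} W  Delta T = {dT:.3g} K")
```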

7. Notes / caveats

The point-aperture Lambertian view factor from an on-axis aperture point to a sphere of radius $r$ at distance $L$ is $r^2/L^2$ (= geometric solid angle $\pi r^2/L^2$ divided by $\pi$, in the small-target limit $r \ll L$). An earlier draft of this report wrote $r^2/(L^2+r^2)$ — that is the next-order correction; for $r/L = 0.05$ the difference is only $\sim 0.25\%$, so no plotted curve is affected.
The “naive” $\pi r^2 \sigma T^4$ answer is off by 2×. The correct saturation is $2\pi r^2 \sigma T^4$, verified by an independent surface‑integral MC. (Equivalently: a sphere in front of an infinite hot half-plane has view-factor 1/2 to the plane, so the absorbed flux is $4\pi r^2 \cdot \frac{1}{2} \cdot \sigma T^4 = 2\pi r^2 \sigma T^4$.)
The plain‑Lambertian MC becomes statistically useless above $D \sim 10$ cm because the target subtends a microsteradian solid angle from most aperture points. The cone‑importance sampler reduces variance by 4–5 orders of magnitude at large $D$.
Cow size convention. The cow is rescaled so that its bounding-box diagonal is 1 cm; the comparison sphere has 1 cm diameter. If “1 cm in size” is meant as cow length, cow height, or bounding-sphere diameter, the cow’s absorbed power rescales roughly with the projected area in the small-aperture regime and with the angle-averaged projected area at large aperture. The qualitative shape of the curve and the cow-vs-sphere ratio $\sim 0.2$ are robust, but the absolute cow numbers should be read as order-of-magnitude geometry estimates.
The cow mesh itself is a stand-in (body ellipsoid + head + four cylindrical legs + tail). Not anatomically accurate, but captures the salient feature that broadside silhouette $\approx 0.2\times$ a bounding sphere of the same characteristic size.

Section C: Reproduce my old paper

Question. Hey AI, a task for you… Early on in my PhD, I wrote a paper on experimental tests of quantum contextuality. To be honest with you, this wasn’t my finest hour. One thing I recall about this work is a very messy lab data post-processing pipeline - I wonder if I made any major mistakes there! Can you download the paper and the attached code/data, turn the mathematica scripts into a python pipeline, and see if you can reproduce the results in the text? The paper is here https://arxiv.org/abs/1712.06494, and the code should be attached to the paper somehow (find it)

AI answer.

Status

Reproduced — every experimental number in the paper matches the public raw data to printed precision. Two issues flagged below; neither affects the headline experimental conclusions.

1. Downloads

arXiv:1712.06494 v2 paper PDF.
arXiv source bundle. Contains only main.tex + figures, no ancillary code or data.
Public dataset. Found via the paper’s Appendix L. Hosted on the TIQI ETHZ page; the actual download URL is https://ethz.ch/content/dam/ethz/special-interest/phys/quantum-electronics/tiqi-dam/documents/Datasets/Dataset_Public%20repository.zip (14 MB). Contains raw shot files (PMT photon counts per measurement) and thresholded CSVs of per‑correlator block averages. No Mathematica notebooks were ever made public — only the data outputs. So the original Mathematica code can’t be diffed line-for-line; only its outputs.

2. What the original pipeline did (deduced from data + Appendix E)

Threshold raw PMT counts → $A \in \{\pm 1\}$ per shot.
For each (observable_i, rotation_time_t), group 100 sequential shots into a “block” with per‑block means $\langle A_i\rangle$, $\langle A_j\rangle$, $\langle A_i A_j\rangle$ plus their within‑block variances; store ~100 such blocks (= 10 000 shots) per setting.
Map rotation time → opening angle $\theta$ via Eq. (E5).
Compute $S_N = \sum_i \langle A_i A_{i+1}\rangle$, $\varepsilon = \sum_i \lvert\langle A_i^{(1)}\rangle - \langle A_i^{(2)}\rangle\rvert$, $S_N^\mathrm{ext} = S_N + \varepsilon$.
Propagate SEMs as if $S_N$, $\varepsilon$, $\theta$ have independent inputs.
At the time closest to compatibility $\theta_N$, report $S_N$, $S_N^\mathrm{ext}$, and $\mathrm{CF}_N = (S_N - S_N^{NC})/(S_N^{NC,NS} - S_N^{NC})$.
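The witness step above can be sketched in a few lines of numpy. This is not the pipeline's actual code: the array layout (`corr` as per-block correlator means, `marg` as per-block marginals in the two contexts) and the function names are hypothetical, the SEMs here use block scatter, and the two bounds in $\mathrm{CF}_N$ are left as caller-supplied parameters. The demo data and bounds are synthetic placeholders, not values from the paper.

```python
import numpy as np


def sem_block(x):
    """Block-scatter SEM of the mean along the last axis."""
    return x.std(axis=-1, ddof=1) / np.sqrt(x.shape[-1])


def n_cycle_witness(corr, marg, s_nc, s_ncns):
    """S_N, epsilon, S_N^ext and CF_N from per-block means, propagating SEMs as independent.

    corr: (N, K) per-block means of <A_i A_{i+1}> (pair index taken mod N).
    marg: (N, 2, K) per-block means of <A_i> in its two measurement contexts.
    s_nc, s_ncns: the two bounds entering CF_N (supplied by the caller).
    """
    S = corr.mean(axis=1).sum()
    sem_S = np.sqrt((sem_block(corr) ** 2).sum())
    m = marg.mean(axis=2)                              # (N, 2) context-resolved marginals
    eps = np.abs(m[:, 0] - m[:, 1]).sum()
    sem_eps = np.sqrt((sem_block(marg) ** 2).sum())    # both contexts, all i
    S_ext, sem_ext = S + eps, np.hypot(sem_S, sem_eps)
    CF = (S - s_nc) / (s_ncns - s_nc)
    sem_CF = sem_S / abs(s_ncns - s_nc)
    return dict(S=S, sem_S=sem_S, eps=eps, sem_eps=sem_eps,
                S_ext=S_ext, sem_ext=sem_ext, CF=CF, sem_CF=sem_CF)


# Synthetic demo: N = 5 observables, K = 100 blocks; placeholder bounds.
rng = np.random.default_rng(0)
N, K = 5, 100
corr = -0.78 + 0.06 * rng.standard_normal((N, K))
marg = 0.06 * rng.standard_normal((N, 2, K))
res = n_cycle_witness(corr, marg, s_nc=-3.0, s_ncns=-5.0)
print({k: round(float(v), 4) for k, v in res.items()})
```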

3. Python pipeline

A ~250-line Python pipeline using numpy and matplotlib:

load_csv() → list of Setting dataclasses keyed by (obs_i, rot_time).
witness_per_time(settings, N, order) computes $S_N$, $\varepsilon$, $S_N^\mathrm{ext}$, $\theta$ and SEMs for every rot-time slice. order="normal" uses partner $i-1\bmod N$; order="reverse" uses $i+1\bmod N$.
Two SEM modes (sem_mode="shot" reproduces the paper; "block" uses block-scatter, see issue [B]).
Theory predictions: S_N_QM, S_5_theory, eps_5_theory.
Driver reproduces paper Table I, Table II, and Figs. 3, 4.

4. Reproduction — do the numbers match?

Table I (5-observable scan, closest to $\theta \approx 48.02°$): means agree to all 4 printed decimals.

order/seq	source	$S_5$	$S_5^\mathrm{ext}$
normal/gen	paper	−3.915(14)	−3.864(34)
normal/gen	mine	−3.9154(139)	−3.8628(332)
reverse/gen	paper	−3.937(14)	−3.890(34)
reverse/gen	mine	−3.9374(138)	−3.8960(343)

Table II ($N = 5\ldots 121$): every mean matches the paper to all printed digits; SEMs match to within 1 in the last digit. Highlights: $\mathrm{CF}_{31} = 0.800(4)$ (paper: 0.800(5)), $\mathrm{CF}_{121} = -0.657(12)$ (paper: −0.657(12)), $S_5^\mathrm{QM} = -3.944$ reproduced.

Table III (signaling, “this work” row): 0.056(33) normal, 0.044(33) reverse vs paper 0.054(31), 0.050(31). Within stated uncertainties.

Figs. 3, 4 reproduce qualitatively and quantitatively — data sits on the QM theory curve in both top panels; CF peaks around $N=31$ at ~0.8 and goes negative at $N=121$.

5. Issues / bugs / methodological concerns

[A] No Mathematica notebook published.

Only the thresholded‑data CSVs are public. So I can’t audit the original pipeline directly — I reverse-engineered it from the data layout + Appendix E. Means agreeing perfectly is strong evidence the original pipeline did what App. E claims.

[B] (Conservative diagnostic) Shot-noise SEM vs block-scatter SEM.

The paper’s quoted SEMs come from the within‑block (shot‑noise) variance column of the CSVs. The std‑dev of block means / $\sqrt{n_\mathrm{blocks}}$ is more conservative because it would also pick up slow block-to-block drift if any were present. For $N \le 31$ the two agree; for large $N$ they differ substantially:

$N$	SEM($S_N$) shot‑noise (paper)	SEM($S_N$) block-scatter	ratio
81	0.014	0.027	1.9
101	0.017	0.023	1.4
121	0.023	0.072	3.1

The large‑$N$ CSVs only have one rotation time (no $\theta$-scan), so each is a single ~hours-long run during which slow B-field drift / cryocooler micro-cycles are plausible in principle. The block-scatter SEM is therefore a useful conservative diagnostic. Whether it actually represents the true uncertainty would require access to the original time-ordering of the shots and an explicit drift/noise model — which the public dataset doesn’t include — so I can’t say definitively that the paper’s quoted SEMs are understated. What I can say: the block-scatter SEMs are larger, and a corrigendum-style audit would want to check them against shot time-ordering. $\mathrm{CF}_{121} = -0.657$ is still significantly $<0$ either way (~5σ block-scatter, ~55σ shot), and the qualitative classical-looking $N=121$ result is unchanged.
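Why block scatter picks up drift that the within-block variance misses can be illustrated with synthetic data. The sketch below assumes the block layout described above (100 shots per block, 100 blocks) and an invented sinusoidal drift of the true mean across the run; it is not derived from the paper's dataset. Without drift the two SEMs agree, and with drift the block-scatter SEM grows while the shot-noise SEM barely changes.

```python
import numpy as np


def sem_shot(block_means, block_vars, shots_per_block):
    """SEM of the grand mean built only from within-block (shot-noise) variances."""
    K = len(block_means)
    return np.sqrt(np.sum(block_vars) / shots_per_block) / K


def sem_block(block_means):
    """SEM from the scatter of the block means themselves."""
    return np.std(block_means, ddof=1) / np.sqrt(len(block_means))


def simulate(drift_amplitude, n_blocks=100, shots=100, seed=0):
    """Synthetic +/-1 outcomes whose true mean drifts sinusoidally across the run."""
    rng = np.random.default_rng(seed)
    mu = drift_amplitude * np.sin(2 * np.pi * np.arange(n_blocks) / n_blocks)
    p_plus = (1 + mu) / 2
    outcomes = np.where(rng.random((n_blocks, shots)) < p_plus[:, None], 1.0, -1.0)
    means = outcomes.mean(axis=1)
    variances = outcomes.var(axis=1, ddof=1)  # within-block (shot-noise) variance
    return sem_shot(means, variances, shots), sem_block(means)


for amp in (0.0, 0.3):
    s, b = simulate(amp)
    print(f"drift amplitude {amp}: shot SEM {s:.4f}, block-scatter SEM {b:.4f}, ratio {b / s:.2f}")
```

As the text notes, a ratio above 1 in real data is only suggestive; attributing it to drift would need the shot time-ordering.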

[C] (Cosmetic typo) Appendix G claim about location of $S_5$ minimum.

The paper says “the minimum value of $S_5(\theta)$ is obtained when $\theta = \pi/2$, where $S_5 = (5/4)(-\sqrt 5 - 1) \approx -4.045$”. Re-reading the LaTeX source (line 354), the value $(5/4)(-\sqrt 5 - 1) \approx -4.045$ is in the source with the correct sign. The angle claim is what’s wrong: Eq. (G1) gives $S_5(\theta) = (5/8)(3 - \sqrt 5 + (5+\sqrt 5)\cos(4\theta))$, which at $\theta = \pi/2$ has $\cos(4\theta) = +1$ and so $S_5(\pi/2) = +5$, the maximum not the minimum. The minimum $-4.045$ is attained when $\cos(4\theta) = -1$, i.e. $\theta = \pi/4$. So the appendix sentence should say $\theta = \pi/4$, not $\pi/2$. Only the angle is wrong; the value and closed form are right. Cosmetic appendix typo, doesn’t affect any experiment.
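A quick numerical check of [C], assuming Eq. (G1) is transcribed correctly above: evaluate $S_5$ at both candidate angles and scan one period of $\cos(4\theta)$ for the minimum.

```python
import math


def s5(theta):
    """S_5(theta) as transcribed from Eq. (G1) in the discussion above."""
    return (5 / 8) * (3 - math.sqrt(5) + (5 + math.sqrt(5)) * math.cos(4 * theta))


closed_form_min = (5 / 4) * (-math.sqrt(5) - 1)  # approx -4.045

# Brute-force scan over one period, theta in [0, pi/2] (cos(4*theta) has period pi/2).
grid = [k * (math.pi / 2) / 10_000 for k in range(10_001)]
theta_min = min(grid, key=s5)

print(f"S5(pi/2) = {s5(math.pi / 2):.3f}")
print(f"S5(pi/4) = {s5(math.pi / 4):.3f}, closed form = {closed_form_min:.3f}")
print(f"grid minimum at theta = {theta_min / math.pi:.4f} * pi")
```

The scan lands on $\theta = \pi/4$, while $\theta = \pi/2$ gives the maximum value of 5.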

Bottom line

The Mathematica pipeline appears to have been correct — all experimental numbers reproduce exactly with the procedure described in Appendix E. The remaining concerns are conservative / cosmetic: [B] the published SEMs for $N \ge 81$ are shot-noise-only and a block-scatter SEM would be more conservative (useful diagnostic but not a definitive understatement claim without time-ordering data); and [C] a single appendix-typo angle in the location of $S_5$’s minimum.

Section D: QC state of the art

Question. Hey AI, I’m preparing an overview talk about quantum computing hardware and I need your help. A few years ago, I wrote a little repository storing best 2Q gate fidelity vs year for different quantum computing modalities. However, I expect a lot of it to be out of date - and it may even be that the original code missed some important data points. Go one by one through the main QC hardware modalities - superconducting, trapped ion, Rydberg, spins - and record the best 2Q gate fidelity vs year. Make a table with citations, then make 4 plots - one per modality - where every result is annotated with first author name. Finally, make a plot which combines all 4 platforms.

AI answer. Survey of best-reported two-qubit gate fidelity per modality (superconducting, trapped ion, Rydberg/neutral-atom, spin) vs year, with citations. Current through May 2026 (treat the 2026 entries as recent-preprint / press-release landscape; this part of the table evolves week-to-week).

Dataset summary

46 data points in the master table, broken down as: superconducting (13 points, 2014–2026), trapped ion (11 points, 2003–2025), Rydberg (12 points, 2010–2025), spin (10 points, 2015–2025).

Per-modality plots use a log scale on $(1 - F)$ infidelity from $\sim 3\times 10^{-1}$ down to $\sim 8\times 10^{-5}$, with first-author annotations on every point and dashed reference lines at the 1% / 0.1% / 0.01% levels. Each per-modality plot includes a running-best step envelope.
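The running-best step envelope is simple to compute: sort by year and keep only results that beat every earlier one. Below is a minimal Python sketch; the `Result` type and the sample points are hypothetical placeholders, not entries from the survey table.

```python
from dataclasses import dataclass


@dataclass
class Result:
    year: int
    first_author: str
    fidelity: float  # best reported 2Q gate fidelity, e.g. 0.999


def running_best(results):
    """Return (year, best infidelity so far) pairs for a step envelope.

    Results are processed in chronological order; a new envelope step is
    emitted only when a result beats every earlier one.
    """
    envelope = []
    best = float("inf")
    for r in sorted(results, key=lambda r: r.year):
        infidelity = 1.0 - r.fidelity
        if infidelity < best:
            best = infidelity
            envelope.append((r.year, best))
    return envelope


# Hypothetical data points for illustration only.
results = [
    Result(2016, "Author A", 0.990),
    Result(2018, "Author B", 0.985),  # not a record, so it is skipped
    Result(2020, "Author C", 0.995),
    Result(2022, "Author D", 0.999),
]
envelope = running_best(results)
for year, infidelity in envelope:
    print(year, f"{infidelity:.1e}")
```

The resulting pairs can be passed to a step-plot routine such as matplotlib's `step(..., where="post")` on a log-scaled $(1 - F)$ axis.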

Cross-modality state of the art (May 2026)

Modality	Best 2Q infidelity	Notes
Trapped ion	$\sim 8\times 10^{-5}$ (Hughes 2025, preprint) / $3\times 10^{-4}$ (Loschnauer 2025, peer-reviewed)	Only platform with publicly demonstrated four-nines fidelity
Superconducting	$\sim 6\times 10^{-4}$ (Lin fluxonium / Marxer IQM)	Sustained over 24 d or 40 h is the new feature, not just peak
Rydberg	$\sim 3\times 10^{-3}$ (Tsai, peer-reviewed) / $\sim 4\times 10^{-3}$ (Atom Computing press)	Steepest improvement curve of the four
Spin	$\sim 2$–$3\times 10^{-3}$ (Mills/Xue 2022 best pair; Steinacker/Stuyck 2025 all-4-foundry-devices >99%)	Frontier is now reproducibility on industrial wafers, not peak

Honest caveats called out in the table and report

Benchmarks are not directly comparable. RB / IRB / GST / Bell-state tomography / XEB / cycle benchmarking / leakage-corrected vs raw all live on the same axes here. No renormalization attempted. This is a survey, not a like-for-like ranking — a more rigorous version would add columns for evidence_type, peer_review_status, SPAM_or_leakage_corrected, and running_best_status.
IonQ Oct 2025 press release of 99.99% on barium and Hughes Oct 2025 preprint share authors (Oxford Ionics = IonQ).
Atom Computing 99.6% is a press claim, not RB-published.
Rydberg numbers are usually SPAM-corrected Bell-state fidelities; Evered 2023’s 99.5% has a ~98.7% raw counterpart.
Sycamore/Willow CZ values are XEB-derived per-cycle Pauli error converted to “fidelity”.
Quantinuum H1-1 best-pair 99.914% comes from Quantinuum’s Apr 2024 blog post, not from the Moses PRX Quantum 4, 040311 paper. Production-average H2 racetrack (~99.84%) is meaningfully lower.
2026 entries (Chen, Rigetti) are recent-preprint / press-release landscape; treat as provisional.

Suggested talk framing

The combined plot makes one fact obvious: trapped ions broke 99.9% in 2014 (Ballance) and have just broken 99.99% in 2025; superconducting reached 99.9% in 2022 (Sung) and approaches 99.95% in 2025 (Lin fluxonium / Marxer IQM); Rydberg reached 99% in 2020 (Madjarov) and 99.7% in 2025 (Tsai), with Yb-171 (Peper 2024, 99.4%) and Sr clock-state (Finkelstein 2024, 99.62%) variants now close behind; spin reached 99.5% in 2022 (Noiri/Xue) and ~99% across all four foundry devices in 2025 (Steinacker/Stuyck), with the story shifting from peak fidelity to reproducibility. The platforms are not converging — the trapped-ion lead in peak 2Q fidelity has actually grown over the last 3 years.

Sources

Aha moments: new blog announcement

2025-09-09T00:00:00+00:00

Writing this blog over the last 2.5 years has been a highly rewarding experience. This is both because I genuinely love writing, and because many of you seem to enjoy it. Time for a sneak peek behind the scenes, a preview of what’s coming up, and an ask for your help!

The state of “Reading the quantum”

Most writing on the internet is sending content into the darkness. While I can see in Google Analytics how many hundreds or thousands of you opened which page, I cannot directly gauge the impact or how well a point landed. However, over the years, I have received enough heart-warming in-person feedback from friends and strangers alike to believe some of you are getting value out of this. While the “comments” section is sadly usually rather empty, hearing things like “you don’t know me, but I know you, and everyone in my team reads your website” or “Oxford Ionics? Rings a bell… isn’t that the company of that guy who writes that blog?” makes it worth it.

By far the most common feedback I get about this site is “nice posts - why do you write so rarely?”

The primary reason is that while quantum computing is both my job and my passion… I really don’t want to be thinking about it all the time. In practice, I find that if I try writing about quantum things right before going to bed (my favourite creative slot), I will struggle to fall asleep. Maybe it’s that I’m saturated with this topic at the end of a work day, or that it engages the thinking part of my brain too much… but either way, I’m more of an “8 hours of quantum per day” than a “12 hours of quantum per day” kind of guy.

Because of this, most of my posts are actually written on the road - last time I flew to the US I think I knocked out 4 posts during one flight! I then retrospectively published 2 of them, while 2 are still in the queue awaiting general polish that may never come. And that’s just not the most sustainable way of producing a lot of content…

In a nutshell: the topic I’m most opinionated about is also the topic I’m currently the least motivated to write about!

For a while now, I’ve been racking my brain to find a better theme to build a blog around - something close enough to my core expertise that I can make interesting content, but also far enough from my core expertise that it doesn’t feel like work. And I think I have finally found it!

Introducing “Aha moments”

Last week, I launched a Substack called Aha moments. My goal is simple: to distil the key insights behind the pivotal events in the history of science & technology.

My first topic is electromagnetism. What I love about this topic is that our ability to build electrical & magnetic circuits - and, in turn, basically power all of the modern economy - really comes down to a few dozen people making a few dozen really simple but really non-trivial insights over a few hundred years. In the upcoming blog posts, I want to tell the story of electromagnetism as a story of those atomic “lightbulb moments” - with technical details, but without outdated jargon or academic debates about who came first. For example, in the latest post, I dissect how exactly humans first understood that there are such things as electrical conductors and electrical insulators:

Of course, the blog will be about much more than electromagnetism. In the coming months, I hope to mix-and-match stories of key insights across physics, chemistry, biology, maths and computer engineering that make the modern world possible. I’m having a lot of fun researching this, and I’m sure that if you enjoy this blog, you will also enjoy the stories on the Substack.

My ask of you: please spread the word

I have a good amount of intrinsic motivation to make this Substack happen… but for it to succeed longer term, I obviously need you - the reader - to read it!

It doesn’t actually take much to motivate me - but for better or worse, I will need some regular dopamine kicks to keep this going. While I know that one should basically only write for themselves, as an imperfect and shallow human being, I really do want my phone to vibrate every now and again because someone liked a post, left a comment, or subscribed to the blog!

Hubris or not, I am confident enough there will be a niche but significant group of people who will enjoy these posts - if they ever get to see them. However, as I have no pre-existing network on Substack - and very little network outside of the quantum and quantum-adjacent part of the internet - my main worry is that the subscriber list will remain essentially confined to my wife, my parents, and one friend who put me up to this.

In other words: I need your help!

If you enjoy this blog - and want to see more content from me - any of the following would really help:

Go to Aha moments and click “Subscribe”
When new posts come out - check them out, give “likes” if you like them, click “restack”, leave comments
Share the Substack and/or this post on social media (you can tag me on Substack, LinkedIn or X/Twitter @quantumpod)
Tell your friends (or foes - I don’t mind!)

Finally, if there is a piece of science or technology where you always wondered “how on earth did anyone even come up with this” - send it my way and I’ll be happy to investigate.

Of course, this blog is not going away - I expect to continue writing here about all things quantum from time to time. Actually, we might be sharing some pretty sweet results from Oxford Ionics pretty soon… and I’ll make sure readers of this blog will not miss them!

Thanks, Maciej

Down with T2, long live T3!

2025-09-07T00:00:00+00:00

One of the main metrics used to describe a quantum computer is qubit coherence time, typically denoted as $T_2$. There is just one problem: $T_2$ is pretty useless at predicting the effect of qubit decoherence on computational errors! In this post, I explain why that is, and argue for adopting a new coherence measure more fit for the task: $T_{3}$.

Primer on qubit coherence

Qubit coherence is typically measured using a Ramsey sequence, in which a qubit is prepared in a superposition of 0 and 1 and subjected to free evolution of duration $t$ before it’s measured in the computational basis. The corresponding quantum circuit looks as follows:

The coherence time - typically denoted as $T_2^*$ or $T_2$, depending on your community - is the maximum value of $t$ for which this superposition “remains quantum”. More precisely, it is the time $t$ for which the probability of obtaining the correct result drops by $0.5 \times (1 - 1/e) \approx 32\%$.

It is frequently possible to extend the coherence time by manipulating the qubit during the free-evolution time. Scientists will often report the coherence time - which they might call $T_2$ or $T_{2, echo}$, depending on the community - measured while attempting to stabilise the qubit state in this fashion, either using a single “spin echo” pulse, or using a more involved “dynamical decoupling” sequence. Either way, the intuition is that $T_2$ is the time over which the “quantumness” of a quantum state can be maintained.

What’s the issue?

So far, so good, but there is a caveat. In quantum computing, it is generally necessary for all the operations to be performed with fidelity above $\approx 99.9\%$. Consequently, the longest acceptable delay must introduce an error with probability $\lesssim 0.1\%$. That is the number we need to know about our quantum computer. However, $T_2$ only tells us the length of delay that introduces an error with probability $\approx 32\%$ - waaaaaay longer than any delay we actually want to introduce.

The way most scientists approach this challenge is to measure $T_2$ and extrapolate down to the timescale of interest. You could, following the example in Qiskit’s tutorials, perform the Ramsey measurement, fit $P = A \exp(-t/T_2) + B$, and then evaluate $P(t_0)$ to estimate the error associated with a delay time $t_0$. For such an exponential decay with $A = B = 0.5$, the error is $1 - P(t) \approx t/(2T_2)$ for $t \ll T_2$, so we’d expect $\approx 0.1\%$ error for a wait time $t \approx 0.002 \times T_2$.
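As a worked example of the extrapolation, here is a minimal Python sketch that inverts a pure-exponential Ramsey model. It assumes ideal contrast with $A = B = 0.5$ and no SPAM error; the $T_2$ value is hypothetical.

```python
import math


def delay_for_error(t2, target_error):
    """Delay at which a pure-exponential Ramsey model reaches a given error.

    Model (assumption): P(t) = 0.5 * exp(-t / T2) + 0.5, so the error is
    1 - P(t) = 0.5 * (1 - exp(-t / T2)). Solving for t gives the expression below.
    """
    return -t2 * math.log(1 - 2 * target_error)


t2 = 100e-6  # hypothetical T2 of 100 microseconds
t_three_nines = delay_for_error(t2, 1e-3)
print(f"0.1% error reached after {t_three_nines * 1e6:.1f} us "
      f"= {t_three_nines / t2:.4f} x T2")
```

The answer is only as good as the assumed exponential shape, which is exactly the weakness discussed next.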

However, this method is unfortunately pretty terrible at getting more than an order-of-magnitude estimate for an error in the range of 0.1%. There are several reasons for this:

Noise spectrum. Each fit function to the Ramsey sequence assumes a certain noise spectrum, e.g. white noise in the case of the exponential fit. However, that is always an approximation, and it is very common for the noise spectrum to vary dramatically between the short timescales (relevant for 0.1% level errors) and long timescales (relevant for ≈32% level errors).
Fit uncertainty. Even eyeballing the graph above, it is clear that it would take a lot more data to measure $P(0)$ with an error much lower than 0.1% than it takes to establish $T_2$ to a reasonable accuracy.
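A back-of-the-envelope way to see the fit-uncertainty point is to count how many shots it takes to pin down an error probability to a fixed relative precision. This Python sketch assumes each shot is an independent Bernoulli trial and ignores SPAM subtraction and fitting overheads.

```python
import math


def shots_needed(error, relative_precision):
    """Shots needed to estimate a Bernoulli error probability to a given relative precision.

    Uses the binomial standard error sqrt(p * (1 - p) / N) and solves
    sqrt(p * (1 - p) / N) = relative_precision * p for N.
    """
    p = error
    return math.ceil((1 - p) / (p * relative_precision ** 2))


# A ~0.1% error vs. a ~32% error (the T2 regime), both to +/-10% relative precision.
print(shots_needed(1e-3, 0.1))
print(shots_needed(0.32, 0.1))
```

The 0.1% regime needs roughly $10^5$ shots for a single delay point, several hundred times more than a point near $T_2$.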

To summarise the issue:

In a quantum computer, all relevant delays and operations must occur at or below a timescale $t_0$ where the probability of decoherence $1-P(t_0)$ is less than approximately 0.1%
To measure $P$ vs $t$ in the high-fidelity regime, it is fairly useless to measure $P$ vs $t$ in the low-fidelity regime, as is typical for $T_2$ measurement.

What’s the solution?

I believe the time has come to adopt a new coherence metric, more fit for the quantum computing era. I propose that in addition to $T_2$ (or even instead of $T_2$!), quantum computing papers start systematically measuring and reporting what I call $T_{3}$ - the maximum delay over which the superposition fidelity remains above three nines (hence the name).

So how do you actually measure $T_{3}$? It is possible to do it using a standard Ramsey sequence with a short delay time, though it takes time, and one has to be careful about subtracting state preparation and measurement (SPAM) errors. During my PhD work, I was able to take the following data, which put $T_{3}$ somewhere around 50 µs:

A way better approach was pioneered by the group of John Martinis in 2015. Here, delay times are interleaved inside a single-qubit randomised benchmarking sequence. This serves to significantly amplify the delay errors, reducing the statistical uncertainty and hence the data acquisition time.
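The extraction step can be sketched in Python under a few assumptions: the standard interleaved-RB estimator $r = \frac{d-1}{d}\left(1 - p_\mathrm{int}/p_\mathrm{ref}\right)$ from Magesan et al. (2012), a known single-qubit offset of 0.5, and noise-free synthetic data. O’Malley et al.’s actual analysis may differ in details such as fitting the offset or handling SPAM, and all numbers here are hypothetical.

```python
import math


def fit_decay_parameter(lengths, survival, offset=0.5):
    """Estimate p in P(m) = A * p**m + offset via a log-linear least-squares fit.

    Assumes the offset is known (0.5 for a single qubit) and P(m) > offset.
    """
    xs = list(lengths)
    ys = [math.log(s - offset) for s in survival]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(slope)


def interleaved_error(p_ref, p_int, d=2):
    """Standard interleaved-RB estimate of the error of the interleaved element."""
    return (d - 1) / d * (1 - p_int / p_ref)


# Synthetic, noise-free data: a reference Clifford decay and a decay with an
# idle of fixed duration inserted after every Clifford.
lengths = [1, 50, 100, 200, 400]
p_ref_true, idle_error_true = 0.999, 2e-4
p_int_true = p_ref_true * (1 - 2 * idle_error_true)  # d = 2 depolarizing model

ref = [0.5 * p_ref_true ** m + 0.5 for m in lengths]
inter = [0.5 * p_int_true ** m + 0.5 for m in lengths]

idle_error = interleaved_error(fit_decay_parameter(lengths, ref),
                               fit_decay_parameter(lengths, inter))
print(f"estimated idle error per delay: {idle_error:.2e}")
```

Repeating this for a range of idle durations gives the idle error as a function of delay, and $T_3$ is the longest delay whose error stays below $10^{-3}$.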

This technique allowed O’Malley et al to obtain the following result, indicating $T_{3} \approx 50$ ns.

Another insight that immediately comes out of such results is that while dynamical decoupling may be a great way of extending $T_2$, it is not always a great way of extending $T_{3}$. For example, in the plot above, we see that spin echo suppresses noise at $t > 50$ ns, but introduces extra noise for $t < 40$ ns. This is not uncommon - it is really easy to unintentionally inject 0.1% of error, e.g. due to amplitude noise in the dynamical decoupling drive!

The approach from O’Malley et al was then adopted by David Lucas’s trapped ion group in Oxford, with the result presented in Sepiol et al 2019:

This graph may look overwhelming, but here is the gist. Sepiol et al use the error amplification technique to measure idling errors as low as $10^{-6}$. By sitting at the magnetic-field-insensitive point of their qubit, and without any spin echo (blue points), the authors record $T_{3} \approx 0.4$ s (that’s right: 400 ms). And by employing dynamical decoupling (red points), they can keep the memory error below 0.1% for up to $\approx 5$ s, i.e. $T_{3} \approx 5$ s - although this number comes with an understandably large error bar.

The method has since been very popular in Oxford, including at Oxford Ionics, where I work; there it allowed us to study qubit dephasing in preparation for our ultra-high fidelity two-qubit gates demo in 2024 (NB this was at the 99.99% level, so we’re even talking about $T_{4}$ now!). Quantinuum also adopts a variant of this method to measure errors during ion transport, to which qubit decoherence is a major contributor. For example, in the H2 paper they report a “transport 1Q RB” error of $2 \times 10^{-3}$ for a delay duration of $\approx 60$ ms (I estimated this from Table I). So while they don’t report it exactly, assuming the error grows roughly linearly with delay, I give them an honourable mention with $T_{3} \approx 30$ ms. Please correct me if I got that number wrong!

Despite these examples, O’Malley et al’s approach has not been widely adopted in the QC community since its proposal a decade ago. I think that’s a shame, and I hope this post helps spread the word about this technique.

So what do you think about $T_{3}$? Are you going to measure it in your system (and cite this blog post appropriately)? Do let me know in the comments.

Too academic

2025-08-03T00:00:00+00:00

When I left academia in 2021 to join Oxford Ionics, not much changed day-to-day to start with. Despite significantly different goals and structure, life in a small-scale QC startup was just not that different from life in an academic lab. Sure, there were business goals to meet, and more engineering resources to utilise - but my brain was working in very much the same way it did at the university. That feeling was not to last.

As the scale of the operation grew, I found myself gradually departing further and further from the “academic mindset”. Fast forward to August 2025 - exactly 4 years into my startup journey - and I feel I might have gotten to the other end of the spectrum, where I actually struggle quite a bit even to effectively communicate and collaborate with very academically-minded colleagues!

Now, I’m not saying this to brag. In fact, I’m not exactly thrilled about this total shift in perspective - in the end, academia is where most of my colleagues and collaborators come from, and I will have to course-correct my attitude going forward. Still, I am interested in introspecting on how my approach to “academic questions” evolved over time.

When discussing academia, we are always in danger of over-generalising based on our individual domains, cultures, and experiences. From time to time, I discuss my misgivings about the university system with a friend who works in pure mathematics. Amusingly, to most of my complaints, he responds with “Oh, in maths it’s the exact opposite of everything you said”. However, I think there is one field of knowledge which we’re all familiar with, where we all suffer from the academic mindset of its top practitioners - a mindset that persisted over centuries and across cultures.

I’m talking, of course, about philosophy.

Philosophy’s journey into obscurity

As a teenager, I was actually quite obsessed with philosophy. I devoured writing from pre-Aristotelians to post-Hegelians, studying original texts as well as interpretations and history. I attended extracurricular classes, workshops with top teachers from my home city, as well as university lectures and seminars. I even got some medals in the national philosophy olympiad, much to the delight of my high school administration¹. And then - just as I started being able to work with cutting-edge material - my interest evaporated. In the 15 years since, I probably have not read a single classic - and I have definitely not read any modern academic philosophy.

This was not a deliberate decision. I have never said “I’m done with this topic” - rather, no philosophy book has made it to the top of my to-read list. Only now do I have the hindsight to look back and understand the root of my beef - the fact that, since its early days, philosophy has evolved to be way, way waaaay too academic. What do I mean by that?

Philosophy is the subject seeking to answer questions like “what is the point of all this?” and “how can I be happy?”. These are the questions we all grapple with, in one way or another. Take a look at the NYT bestseller list right now - people all over the world are buying millions of books a year, every year, tackling those exact questions. Can you take a guess at how many of these bestsellers are written by academic philosophers?

The answer is, of course, “approximately none”. The reason is not that philosophy doesn’t care about these questions, but that - over hundreds of years - academic philosophers have moved away from answering broad questions of general interest in plain English (or rather plain Greek), to answering narrow questions that only two other people care about in technical jargon.

But… why? I think behind it all is relentless focus on novelty.

Philosophy as innovation gone too far

You see, when Western philosophy properly took off as an area of inquiry in ancient Greece, practical advice on how to live your life was its key focus. In subsequent centuries, giants of the field such as Seneca, Epicurus, or Marcus Aurelius (for whom philosophy was a side-hustle to his 9-to-5 of running the Roman Empire) published foundational thoughts on how to deal with personal and professional problems, how to balance pleasure with obligations, and how individuals should interact with the wider society. A parallel development occurred in the East, with Buddha and Confucius spreading their teachings throughout Asia.

If you read any personal improvement book published in the 21st century, there is little there that can’t be traced back to these ancient thinkers - and that’s exactly the problem. To an academic philosopher, a book on, say, how to thrive in a modern workplace, simply adapts existing ideas to a modern context - and that’s not good enough for tenure. Likewise, many philosophy classics are not widely read simply because they’re not accessibly written - either because the author did not excel at crafting compelling narratives (likely), or because they crafted their stories for a different culture in a different age. This challenge allows Ryan Holiday to make millions of dollars from publishing a book of quotes from Stoic classics - but the same book would not earn an academic philosopher even a handful of citations, let alone a positive reception from their colleagues. Thus, academic philosophers are forced to ignore the audience most interested in their work.

As a note: it is interesting how, when academic philosophers do write an occasional bestseller, it is usually in the context of recent technological or cultural developments, e.g. Nick Bostrom’s Superintelligence (concerning the rise of AI) or Toby Ord’s The Precipice (concerning existential catastrophes). This makes sense: Plato couldn’t have been too concerned about self-improving LLMs, COVID-19 bioengineering, or nuclear submarines - thus, Bostrom and Ord can write content which is simultaneously novel and generally interesting. The question is, however, can academic philosophers continue to engage public imagination on the topic of AI in the coming decades, now that many of the foundational ideas have already been articulated?

Thus, while regular people continued to seek answers to timeless questions like “how to be a good parent” or “if someone is an asshole to me, can I be an asshole to them”, academic philosophers continued down the path of “is reality objectively real”, “is my red the same color as your red” etc. So when Heidegger (one of the 20th century’s most renowned philosophers) set out to discuss what makes us humans unique, it came out like this:

Dasein is an entity which does not just occur among other entities. Rather it is ontically distinguished by the fact that, in its very Being, that Being is an issue for it. But in that case, this is a constitutive state of Dasein’s Being, and this implies that Dasein, in its Being, has a relationship towards that Being—a relationship which itself is one of Being. And this means further that there is some way in which Dasein understands itself in its Being, and that to some degree it does so explicitly. It is peculiar to this entity that with and through its Being, this Being is disclosed to it. Understanding of Being is itself a definite characteristic of Dasein’s Being. Dasein is ontically distinctive in that it is ontological.

Does anyone wonder why Heidegger’s “Being and Time” is not particularly high on the bestseller list?

In summary, 3000 years of focus on novelty has moved philosophy from a deeply applied, broadly interesting discipline to an obscure, siloed, and abstract endeavour. If you’re a “regular human” interested in what philosophers have to say about your day-to-day struggles, you’re far better off seeking the teachings of ancient teachers like Gautama Buddha than talking to the ethics professor at your alma mater. And that is a real shame, isn’t it?

Progress is not just innovation

It is clear to me that - at least in this instance - the novelty constraint is not doing academia any favours, and I think we can all relate to the problem. Novelty feels like it should be a good thing - in the end, aren’t more ideas better than fewer ideas? However, there is a big difference between hearing an idea and internalising it. If you want to become a better person, you’re usually far better off simply meditating on timeless truths than seeking new ones. “New” is not always “better” - and it can, in fact, be worse.

Obviously, philosophy today is a bit of a caricature of an academic discipline. Still, thinking about philosophy as what happens when an academic debate continues unchecked for 3000 years clarifies my own feelings about physics and quantum computing quite a bit ².

I think exploratory academic research is a beautiful thing and a powerful force of change. That’s what drew me to academia in the first place, and I remain charmed by this vision. However, as I get older, I intuitively understand better and better that “more innovation” does not always mean “more progress”.

Sometimes a field is stuck and needs fresh, wacky ideas - but other times, a field needs focus on applying existing ideas in practice, with out-of-the-box thinking acting as a distraction. And sometimes, a field is stuck not due to lack of ideas, but due to poor ability to disseminate the findings - and what it needs is amazing communicators and community builders.

Sometimes, all the low-hanging fruit has been picked, and the high-hanging fruit is really too far out of reach. And other times, the problems tackled in a field are too niche to be even worth incrementally innovating on. In those situations, the best force of progress may be to give those topics a break.

In other words: the academic approach is great… but only in moderation.

My understanding is that the official high-school rankings in Poland award a lot of points for subject olympiad medals, but that the formula rewards both depth and breadth, i.e. it looks something like “number of points = (number of medals) x (number of disciplines)”. My school was very nerdy, with students always collecting lots of medals in mathematics, physics, and computer science - but rarely in humanities. By adding a medal in philosophy, I thus added a big fat multiplier to the score, allowing the school to jump to the top of the national rankings. ↩
Even if, for some reason, you still enjoy reading papers on quantum foundations written 100 years after the birth of quantum mechanics - do you think 100 more years of innovative ideas on this topic are really going to help? ↩

Review: Google’s quantum error correction course

2025-04-17T00:00:00+00:00

Late in 2024, the team at Google Quantum AI released a free online course on quantum error correction taught by Austin Fowler. I checked it out expecting “more of the same”, but it blew my mind and I literally couldn’t put it down until it was done. Here is a belated review, and a discussion of what made it work.

Hands-on quantum error correction with Google Quantum AI. Maciej’s score: 5/5.

My background in QEC

First, let me tell you a bit about my background, since how well a course lands depends very much on the target audience. I did my PhD in experimental physics, but over the last 4 years, I have spent most of my time at the intersection of experiments and theory. At my company we call this “system architecture”, and it essentially involves working across the stack to answer wacky questions such as “If our quantum computer is to be capable of providing quantum advantage on Ising model simulations, how hot can the ion trap chip get, and which technology can deliver appropriate cooling power?” As such, I understood QEC very well compared to a median experimentalist, and I could just about talk to a QEC expert without embarrassing myself - but I was never really able to follow QEC conference talks.

In my PhD days at ETH Zurich, I taught many master’s courses on quantum information, including a good chunk about QEC. I distinctly remember that in the 1st year of my PhD, my supervisor asked me to prepare a problem set specifically about fault-tolerance. I asked him if we had any materials, and he said “Not really… but I’m sure you’ll find something in Nielsen & Chuang”. In the end, I downloaded John Preskill’s notes and problem sets on fault tolerance, and decided to reuse them for the course. The only issue was that they came without solutions… So I did my best to solve as many of them as possible, sanity-checked with my supervisor that the solutions were not clear nonsense, and left many problems out as I didn’t really know how to tackle them ;) Hopefully, that gives you a sense of where I’m coming from.

The trouble with QEC courses

Watching from the sidelines, QEC always struck me as one of the most exciting and creative sub-fields of quantum computing. The issue is, however, that very few people have the background knowledge to even keep up with what’s going on. It’s not for the lack of trying - there are plenty of graduate-level QEC courses at various universities and summer/winter schools. In my experience, however, these courses finish well before they get to the fun stuff!

This is usually simply because there are sooooo many prerequisites to go through. Even if the lecturer only wants to talk about fault-tolerant magic state preparation techniques, you really want to first:

Remind people how gates and state vectors work, such that you can…
Talk about density matrices, such that you can…
Explain CPTP maps and Kraus operators, such that you can…
Talk about noise digitization, such that you can…
Talk about parity checks, such that you can…
Talk about the X-correcting Shor code, such that you can…
Talk about the X and Z correcting Shor code, such that you can…
Talk about stabiliser codes, such that you can…
Talk about the Eastin-Knill theorem, such that you can…
Talk about error propagation, such that you can…
Talk about fault tolerance, such that you can…
Talk about T-gate teleportation, such that you can…
Talk about magic states.

It’s unlikely you can get through this list in a single course - and you haven’t even learned anything about the surface code!

Enter Google

Google’s QEC course flips this upside down. 20 minutes in, you are told that operations are matrices, errors are matrices, and errors are of X and Z type. 10 minutes later, you’ve built your first repetition code:

Notice already a major deviation between how this course approaches the repetition code and how it is usually taught. Traditionally, students start by only considering errors on computational qubits, and assume ancillas and measurements are perfect. The idea of measurement errors is only brought up when talking about fault tolerance. Here, no distinction is made between gate and measurement errors. A qubit is a qubit, an operation is an operation, that’s it. This allows the course to introduce the notion of repeated measurements and detectors early on. I found this to be a huge conceptual simplification compared to the standard treatment.
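To make the idea of detectors concrete, here is a minimal sketch in Python. It is my own toy model, not code from the course: a bit-flip repetition code where errors are placed by hand rather than sampled, and where data errors and measurement errors are treated on the same footing. A detector compares each parity check with the same check in the previous round, so it only fires where something changed. The function name `detector_events` and its arguments are illustrative.

```python
def detector_events(distance, rounds, data_flips=(), meas_flips=()):
    """Detectors that fire in a toy bit-flip repetition-code memory experiment.

    data_flips: (round, qubit) pairs; that data qubit is flipped just before the round's checks.
    meas_flips: (round, check) pairs; that check's reported outcome is wrong in that round only.
    A detector (round, check) fires when the check's outcome differs from the previous round
    (round 0 is compared against the noiseless reference outcome 0).
    """
    data = [0] * distance
    previous = [0] * (distance - 1)
    fired = set()
    for r in range(rounds):
        for fr, q in data_flips:
            if fr == r:
                data[q] ^= 1  # persistent error on a data qubit
        # Parity checks Z_i Z_{i+1}: 1 where neighbouring data qubits disagree
        outcomes = [data[i] ^ data[i + 1] for i in range(distance - 1)]
        for fr, c in meas_flips:
            if fr == r:
                outcomes[c] ^= 1  # one-off error on a measurement outcome
        fired |= {(r, c) for c in range(distance - 1) if outcomes[c] != previous[c]}
        previous = outcomes
    return fired
```

With `distance=5` and `rounds=4`, flipping data qubit 2 before round 1 fires the two neighbouring checks in round 1 only, while corrupting the outcome of check 0 in round 2 fires that check in rounds 2 and 3. Data errors show up as pairs of events in space, measurement errors as pairs in time, and a flip on an end qubit leaves a single event next to the boundary.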

5 minutes later, we are decoding our first errors through minimum-weight perfect matching:

In most courses, the concept of a decoder is not even mentioned - there is a box that computes what needs to happen in response to syndromes, end of story. But so much of modern QEC research is about finding data- and compute-efficient decoding schemes. The Google team makes it clear that there is no QEC without a decoder - and they are correct!
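As a toy illustration of what that decoder box does, the sketch below decodes a bit-flip repetition code with perfect measurements by exact minimum-weight matching: each defect (a check that fired) is paired either with another defect or with one of the two ends of the chain, and the pairing that flips the fewest data qubits wins. It brute-forces every pairing, which is only viable for a handful of defects; practical matching decoders use Edmonds’ blossom algorithm instead. All function names here are hypothetical.

```python
from functools import lru_cache


def check_outcomes(data):
    """Parity checks Z_i Z_{i+1} of a bit-flip repetition code (perfect measurements)."""
    return [data[i] ^ data[i + 1] for i in range(len(data) - 1)]


def match_defects(defects, distance):
    """Exact minimum-weight matching of defects, each paired with another defect or a boundary.

    Brute force over all pairings (exponential, fine only for a few defects).
    An edge's weight is the number of data qubits a correction along it would flip.
    """

    @lru_cache(maxsize=None)
    def best(remaining):
        if not remaining:
            return 0, ()
        first, rest = remaining[0], remaining[1:]
        # Option 1: connect `first` to the cheaper of the two code boundaries.
        left, right = first + 1, distance - 1 - first
        cost, pairs = best(rest)
        side = "L" if left <= right else "R"
        options = [(min(left, right) + cost, ((first, side),) + pairs)]
        # Option 2: pair `first` with one of the remaining defects (all larger, as they are sorted).
        for i, other in enumerate(rest):
            cost, pairs = best(rest[:i] + rest[i + 1:])
            options.append((other - first + cost, ((first, other),) + pairs))
        return min(options, key=lambda option: option[0])

    return best(tuple(sorted(defects)))[1]


def decode(data):
    """Return the data bits after applying the matching-based correction."""
    distance = len(data)
    defects = [c for c, s in enumerate(check_outcomes(data)) if s]
    flips = [0] * distance
    for first, partner in match_defects(defects, distance):
        if partner == "L":
            qubits = range(0, first + 1)            # flip everything left of check `first`
        elif partner == "R":
            qubits = range(first + 1, distance)     # flip everything right of check `first`
        else:
            qubits = range(first + 1, partner + 1)  # flip the qubits between the two checks
        for q in qubits:
            flips[q] ^= 1
    return [bit ^ flip for bit, flip in zip(data, flips)]
```

For five data qubits, `decode([0, 0, 1, 0, 0])` and `decode([1, 0, 0, 0, 1])` both return the all-zero codeword, while three flips, as in `decode([1, 1, 1, 0, 0])`, get “corrected” to `[1, 1, 1, 1, 1]`, i.e. a logical error.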

10 minutes of video later, we’ve learned all about stabilisers. This is a pretty standard treatment, but excitingly, it moves very quickly from algebraic to pictorial representation. The vibe is really that the pictorial representation is the right one, and it provides a natural segue to the surface code:

The lecture then quickly goes through examples of error propagation in the surface code, 10 minutes later building up to a pretty complete description of the surface code memory experiment. Wow!
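The algebraic side of the stabiliser story mentioned above fits in a few lines. In this minimal Python sketch (function names are my own), a Pauli string such as `"ZZI"` describes a stabiliser or an error, two strings anticommute exactly when their single-qubit factors clash on an odd number of qubits, and the syndrome records which stabilisers anticommute with the error. The example uses the 3-qubit bit-flip code.

```python
def anticommutes(p, q):
    """True if two Pauli strings such as 'XZI' and 'ZZI' anticommute.

    Single-qubit Paulis anticommute when they differ and neither is I;
    the multi-qubit operators anticommute iff that happens on an odd number of qubits.
    """
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def stabiliser_syndrome(stabilisers, error):
    """Syndrome bit = 1 for every stabiliser that anticommutes with the error."""
    return [int(anticommutes(s, error)) for s in stabilisers]
```

For stabilisers `["ZZI", "IZZ"]`, the errors `"XII"`, `"IXI"` and `"IIX"` give the distinct syndromes `[1, 0]`, `[1, 1]` and `[0, 1]`, while a phase flip `"ZII"` gives `[0, 0]` and goes unnoticed, which is why the full Shor code adds a second layer of checks.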

The course ends with a thorough tutorial on simulating logical circuits with Stim, and an advanced introduction to Crumble, an automated error-propagation tool written by Craig Gidney.

This makes me think…

Taking a step back… Between watching videos, reflecting, and completing assignments, the course took me perhaps 8 hours from start to finish (ok, I skipped some of the Crumble part). That’s probably equivalent to the workload of 1-2 lecture + problem set sessions we did at ETH. Yet after the Google course, I feel sort of capable of reading modern QEC papers and watching modern talks. That’s pretty remarkable!

Of course, this is not to say that the Google course is “better”. Rather, it is simply extremely practical and hands-on. It makes the assumption that most students are willing to accept statements like “quantum operations can be represented as matrices” and are most interested in cool ideas that make QEC work - and that the curious ones can easily look up “what these matrices mean”.

It reminds me very much of the broad distinction between “American-style” and “Soviet-style” education. Around 2010, I left my home country of Poland to study at the University of Oxford. Every few months, I’d come back and compare notes on the physics we were learning with my friends who stayed back home. The most striking distinction was the UK’s focus on practicality, vs Poland’s focus on foundations. Pretty much on day 1 in Oxford, someone hands you a booklet that says “This is how you differentiate, this is how you integrate, now let’s go solve some differential equations”. This is in stark contrast to Poland, where you spend one semester learning about limits, and the second semester defining a derivative… by the time you start solving real-world equations, you’re an old man!

(Funny aside: when I went to Oxford for the first time for an interview, I was given a limit to calculate. I said what any well-educated Polish kid would say: “Let me start by checking if the limit really exists”. I got a lot of confused laughs!)

All in all, while this course is not for everyone, I believe it is a shockingly efficient way of getting from “zero to hero” in QEC. And I hope to see more content like this out there! If quantum technologies are to truly take off as an engineering discipline, we need practical, to-the-point teaching resources that can bring you up to speed with modern techniques. I hope Google’s QEC course sets an example that many will follow.

If you also completed Google’s QEC course, I would love to know what you thought! Also, if you know any other courses of this kind, drop me a line - I would love to check them out.

The joys of AI

2025-01-12T00:00:00+00:00

My favourite application of new technologies is when they reduce the BS and let us spend more time doing what we love. In recent weeks, I’ve been feeling exactly this about the latest wave of AI tools. I want to share the story of two side-projects of mine, and how LLMs made them successful.

Making this blog work with AI

Many people use LLM tools for writing, but I am not interested in that. I really love writing, and I want to be doing more of it, not less. To me, writing is a really creative process, where most time and effort should be spent on figuring out what to write about and what angle to take. Once you know this, it’s really not much hassle to turn that into text!

Due to life constraints, I do not write posts regularly, instead working in bursts every few months ¹. One of the most annoying situations I encounter is when I sit down, having finally found the time to put some long-lingering thoughts on paper, only to find my blog’s IT screwed up, and to have my writing session turned into a debugging session. Unfortunately, a few months’ delay tends to be enough for some dependency/config to get out of date (you can say it’s the coherence time), and it just ruins the fun of occasional hobbies.

One of these issues came up right before Christmas, when the blog refused to build locally on my Mac. This time, however, I had my AI-assisted code editor to hand. I only had to copy-paste the problem to Cursor’s Composer tool, and then proceed to watch as it correctly diagnosed it (issues with pre-installed Ruby on macOS), fixed it (installed all the correct software and dependencies), and updated the code to work with recent versions of Ruby. After I verified success, it even committed and pushed the changes upstream - all I had to do was click “approve”.

Web development with AI

For a while now I’ve been dabbling in web development, becoming more familiar with CSS, React etc. However, especially for a beginner, the complete process of building and deploying web apps is very time-consuming, again incompatible with my personal life. The reason is again largely IT crap! There are a great number of resources and tutorials on how to use pre-built components and combine them into functioning web apps - but in practice, one tends to get stuck on some mysterious bug - e.g. something only appears in production but not in local deployment - that ends up sucking up more time than fun and important things like UI design.

Last week, I returned to web development, this time with the help of Cursor. To my great joy, I was able to spin up, from scratch and in 2 days, a little website I had dreamt about for a while, with AI helping me get rid of every single bug. Sometimes I had to feed it ideas about where I thought the issue was, but it always fixed it independently.

One of the larger issues I encountered during release was that the remote-deployed site failed to fetch the exercises from the SQLite database where they were stored. The AI tried to fix it in 5 different ways, each time falling short. Eventually, it decided to change tactic, essentially telling me that SQLite is a toy tool anyway, and that serious people use more serious database types, such as PostgreSQL. It then helped me migrate from a local SQLite database to a cloud-hosted PostgreSQL solution, a process which took less than 10 minutes in total.

This was pure joy - if I were on this project alone, I might have given up at this stage (I’m on holidays after all, this was supposed to be fun!). But I was not alone, thank God! I was also very amused and impressed to see AI tools using this very human technique of “I don’t know what the issue is, so let’s try a completely different approach in case it just works”.

The future looks exciting.

This reminds me, I still haven’t uploaded some of the posts I wrote on the plane to the US last August! ↩

Is quantum mechanics a bit of a scam?

2025-01-10T00:00:00+00:00

The fundamental proposition of physics is that there are some fundamental laws from which all else can be derived. You would therefore think that once you know the fundamental laws of quantum mechanics - the Schroedinger equation and the Born rule - you are ready to answer any question about any quantum system. But is that really the case?

The standard rulebook of physics

There has always been a bit of controversy out there about how deep the fundamental laws of physics really go - e.g. to what extent high-level phenomena such as properties of solids can really be derived from low-level laws such as the standard model.

Image: One of these mind-boggling graphics discussing emergence

Still, I think it’s fair to summarise the “standard rulebook of physics” as:

Specific problems can be generally placed in one or more specific domains: questions about charges are in the domain of electromagnetism, questions about water flow are in the domain of fluid dynamics etc.
Every domain has its fundamental equations, and each problem can be answered by solving those equations.

Granted, this workflow may be complicated in practice, and the equations may not be straightforward to solve – but in principle, with a powerful enough computer, this is all you need.

As an example, classical mechanics (CM) typically considers the question of: given a set of massive objects with well-defined initial positions and velocities, what are their positions and velocities at some time in the future? This question can be answered by:

Looking up the right fundamental laws (gravity, electromagnetism etc) to calculate the forces $F_i$ on the objects
Applying Newton’s law $a_i = F_i/m_i$ to calculate the accelerations
Integrating the accelerations of the objects to calculate future positions and velocities

In this workflow, we invoke 1) the fundamental laws of the relevant domains, 2) the fundamental laws of mechanics, and 3) the definition of position, velocity and acceleration. Sounds legit, right?
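The three steps map one-to-one onto code. Below is a minimal Python/NumPy sketch for a charged particle in a uniform magnetic field (the example this post returns to later), assuming only the magnetic part of the Lorentz force acts and using a standard fourth-order Runge-Kutta integrator; the function names and the units with $q = m = B = 1$ are illustrative.

```python
import numpy as np


def force(q, v, B):
    # Step 1: fundamental law of the domain (magnetic Lorentz force F = q v x B)
    return q * np.cross(v, B)


def rhs(state, q, m, B):
    # Step 2: Newton's law a = F / m, written as a first-order system d(x, v)/dt = (v, a)
    v = state[3:]
    return np.concatenate([v, force(q, v, B) / m])


def evolve(x0, v0, q, m, B, t_final, n_steps=2000):
    # Step 3: integrate the accelerations with classical fourth-order Runge-Kutta
    state = np.concatenate([x0, v0]).astype(float)
    dt = t_final / n_steps
    for _ in range(n_steps):
        k1 = rhs(state, q, m, B)
        k2 = rhs(state + 0.5 * dt * k1, q, m, B)
        k3 = rhs(state + 0.5 * dt * k2, q, m, B)
        k4 = rhs(state + dt * k3, q, m, B)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return state[:3], state[3:]
```

Starting at the origin with velocity $(1, 0, 0)$ and $B$ along $z$, the particle moves on a circle of radius $m v / (q B) = 1$, so after half a cyclotron period ($t = \pi$) it should sit at $(0, -2, 0)$ with unchanged speed.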

My beef with quantum mechanics

You would hope that the same workflow applies in quantum mechanics (QM). However, I have always found that the standard exposition of the fundamental laws of QM contains at least one loose end that really prevents us from using them in practice. Let me walk you through an example to explain what I mean.

QM typically considers the question of: given a set of quantum objects with a well-defined initial wavefunction, what is the outcome of a set of measurements at some time in the future? This question can be answered by:

Writing down the Hamiltonian $H$
Integrating the Schroedinger equation $i \hbar \partial \psi / \partial t = H \psi$ to calculate the wavefunction at a given future time
Calculating the measurement outcomes using the Born rule

In this workflow, we explicitly invoke two fundamental postulates of QM, 2) the Schroedinger equation, and 3) the Born rule ¹. But what about 1)? How exactly does one decide what Hamiltonian to write down, and what fundamental laws dictate that?
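For comparison, here is the quantum workflow as a minimal Python/NumPy sketch. Steps 2 and 3 are mechanical; step 1 is where the trouble lies, so the sketch simply assumes a hypothetical qubit Hamiltonian $H = (\omega/2)\,\sigma_x$ in units with $\hbar = 1$ and takes it as given.

```python
import numpy as np

HBAR = 1.0  # work in units where hbar = 1


def schroedinger_evolve(H, psi0, t):
    """Step 2: solve i hbar d(psi)/dt = H psi for a time-independent Hermitian H.

    Uses the eigendecomposition H = V diag(E) V^dagger, so psi(t) = V exp(-i E t / hbar) V^dagger psi0.
    """
    energies, vectors = np.linalg.eigh(H)
    phases = np.exp(-1j * energies * t / HBAR)
    return vectors @ (phases * (vectors.conj().T @ psi0))


def born_probabilities(psi):
    """Step 3: Born rule for a measurement in the computational basis."""
    return np.abs(psi) ** 2


# Step 1: the Hamiltonian is simply written down (hypothetical example: H = (omega / 2) * sigma_x).
omega = 2.0
H = 0.5 * omega * np.array([[0, 1], [1, 0]], dtype=complex)
psi0 = np.array([1, 0], dtype=complex)  # start in |0>
```

Starting in $|0\rangle$, the probability of measuring 1 is $\sin^2(\omega t/2)$, so at $t = \pi/\omega$ the qubit is found in $|1\rangle$ with certainty. Nothing in the code tells you where $H$ came from, which is precisely the point.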

The answer given to physics students is, to my taste, extremely dissatisfying. In a nutshell, it goes like this:

Suppose your quantum objects are actually classical objects
Look up the right fundamental laws to calculate their classical Hamiltonian $H'$, and express it in terms of its “generalised position” $x$ and “generalised momentum” $p$
Replace terms such as $x p$ and $p x$ with their symmetric version $(x p + p x)/2$
Put a hat over $x$ and $p$ to turn them into operators

In my view, the issue with this procedure - known as “canonical quantisation” - is that while it invokes the fundamental laws of nature, it supplements them with some steps of unclear ontological status. What exactly are steps 3 - 4 doing? Is this procedure a law of nature? And if not, why exactly is it needed to calculate the outcome of quantum dynamics? I would say that canonical quantisation is a loose end of QM.
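It is at least easy to see what step 3 does mechanically: classically $x p = p x$, but as operators they differ, and $x p$ on its own is not Hermitian, so it could not be an observable. The sketch below (Python/NumPy, my own illustration) checks this with position and momentum matrices built from a truncated harmonic-oscillator basis in units with $\hbar = m = \omega = 1$. The truncation to $n$ levels is an assumption of the sketch and spoils $[x, p] = i$ only in the last diagonal entry. This explains the bookkeeping, not the ontological status the paragraph above asks about.

```python
import numpy as np


def position_momentum(n):
    """x and p as n x n matrices from a truncated ladder operator (hbar = m = omega = 1)."""
    a = np.diag(np.sqrt(np.arange(1, n)), k=1)  # a|k> = sqrt(k)|k-1>
    x = (a + a.T) / np.sqrt(2)
    p = 1j * (a.T - a) / np.sqrt(2)
    return x, p


def is_hermitian(M):
    return np.allclose(M, M.conj().T)


x, p = position_momentum(6)
naive = x @ p                      # classical "x p" with hats put on directly
symmetrised = (x @ p + p @ x) / 2  # step 3 of canonical quantisation
commutator = x @ p - p @ x         # equals i on the diagonal, except in the truncated last level
```

Here `is_hermitian(naive)` is `False` (its anti-Hermitian part is exactly half the commutator), while `is_hermitian(symmetrised)` is `True`.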

To make this concrete, let’s say you want to calculate the quantum dynamics of a charged particle in a static magnetic field. Classically, we would calculate the force using the Lorentz force $F = e\, v \times B$, where $e$ is the particle’s charge. In QM, however, you’ll have to follow the procedure above to arrive at a Hamiltonian $H = (p-e A)^2/(2m)$, where $A$ is the classical vector potential of the magnetic field. Until you’ve followed the method of canonical quantization (or looked up the answer online), you’ll have no way of calculating the quantum dynamics - even though you supposedly know all the fundamental laws of quantum mechanics, as well as the rules of electromagnetism! Isn’t that crazy?

A careful reader may note: this was all about particles, but what happens when the wavefunction describes fields? Turns out there is a procedure for that too, and to arrive at the Hamiltonian of e.g. a photon you also have to:

Take the fundamental laws of electromagnetism (Maxwell’s equations), and
do some shady mumbo-jumbo.

That’s my gripe with QM - or at least, with the QM I learned as an undergraduate.

What exactly is going on?

I think if you twist your lecturer’s arm, they are likely to tell you: “well, of course canonical quantization is not a law of nature. It is simply a way to guess the quantum Hamiltonian from the known classical dynamics. But in reality, QM is more fundamental than CM, and so Hamiltonians are more fundamental than classical equations of motion”.

As a result, we can then say that, e.g. the fundamental law of charged particle interactions with magnetic field is really $H = (p-e A)^2/(2m)$, and that Lorentz force is basically a classical approximation of that fundamental law. In this interpretation, what the first quantization procedure provides is sort of a sanity check that the Deus Ex Machina quantum Hamiltonian is compatible with the well-known classical laws of physics.
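That sanity check can be carried out numerically. The minimal Python/NumPy sketch below treats $H = (p - eA)^2/(2m)$ as a classical Hamiltonian for a uniform field along $z$, assuming the symmetric gauge $A = \tfrac{B}{2}(-y, x)$ and units $e = m = B = 1$. It takes the derivatives of $H$ by finite differences and integrates Hamilton’s equations with fourth-order Runge-Kutta. If the Hamiltonian is compatible with the Lorentz force, a particle starting at the origin with velocity $(1, 0)$ should trace the circle $x(t) = (\sin t, \cos t - 1)$.

```python
import numpy as np

e, m, B = 1.0, 1.0, 1.0  # illustrative units


def vector_potential(r):
    # Symmetric gauge for a uniform field B along z: A = (B / 2) * (-y, x)
    x, y = r
    return 0.5 * B * np.array([-y, x])


def hamiltonian(r, p):
    kinetic = p - e * vector_potential(r)  # kinetic momentum m v = p - e A
    return kinetic @ kinetic / (2 * m)


def grad(f, z, h=1e-5):
    # Central finite differences (exact up to round-off here, since H is quadratic)
    g = np.zeros_like(z)
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = h
        g[i] = (f(z + dz) - f(z - dz)) / (2 * h)
    return g


def hamilton_rhs(state):
    # Hamilton's equations: dr/dt = dH/dp, dp/dt = -dH/dr
    r, p = state[:2], state[2:]
    dr = grad(lambda pp: hamiltonian(r, pp), p)
    dp = -grad(lambda rr: hamiltonian(rr, p), r)
    return np.concatenate([dr, dp])


def evolve_hamiltonian(r0, v0, t_final, n_steps=2000):
    # The canonical momentum is p = m v + e A(r), not m v
    state = np.concatenate([r0, m * v0 + e * vector_potential(r0)]).astype(float)
    dt = t_final / n_steps
    for _ in range(n_steps):
        k1 = hamilton_rhs(state)
        k2 = hamilton_rhs(state + 0.5 * dt * k1)
        k3 = hamilton_rhs(state + 0.5 * dt * k2)
        k4 = hamilton_rhs(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return state[:2]
```

The one subtle design point is the initial condition: $p$ is the canonical momentum, so it has to include the $eA$ term, even though $A$ happens to vanish at the origin in this gauge.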

I think this answer does make sense on paper, but it does not correspond to how people do things in practice. If you open any atomic physics textbook on a random page, it will likely say something like:

Let’s calculate the atomic energy level shift due to the interaction between this and that quantum object
Suppose those objects are actually little classical charged spinning balls
If they were, the classical Hamiltonian would say …
So let’s say the quantum Hamiltonian is the same

which is canonical quantization in disguise! What is more, we do this even when we know that the setup has no good classical analogue, e.g. for quantum particles with intrinsic spin.

What’s the lesson?

One of the reasons I gravitated towards physics in high school was that it was presented to me as a very “pure” science, where problems could be worked out from first principles through only self-evident procedures and thought experiments. Only as I got further into the discipline did I realise this is only about 95% true - physics is really clean, but there are loose ends nonetheless!

Now, I have it on good authority that these “loose ends” are not fundamental - they are not “bits of physics not yet understood”, but rather “bits of physics without a simple explanation”. Usually in QM, these loose ends are pushed down to relativistic quantum field theory (“This is not quite right, but believe me, if you did the full relativistic quantum field theory calculations, all contradictions disappear”). I haven’t asked people who know QFT how illuminated they really feel.

Still, in my view, it is helpful to follow the framing that every time you tell your student “you must trust me on this one”, you’re either talking about a fundamental law or a loose end. And just like a fundamental law in a high-level theory can actually be a derived law from a lower-level theory, hopefully a loose end in a high-level theory does not propagate down to a lower-level one.

Framed like this, I would say that undergraduate QM is, in a nutshell, composed of a few fundamental laws (including Schroedinger equation and Born rule) and one loose end (Hamiltonian quantization rules).

As always, thanks for reading, and I look forward to your comments below! In case I’m crazy and there’s something that everyone else gets about QM that I’ve missed, don’t hesitate to educate me. And if you’d like me to post about other loose ends in physics, let me know too!

Thanks to Shreyans Jain for comments and suggestions

NB some people think that the Born rule is not a fundamental law in itself, but that’s for another day. ↩

Normal computing

2024-09-23T00:00:00+00:00

While this blog is primarily about quantum, I am keenly interested in any work pushing the boundaries of computation. Thus, when the folks at Normal Computing came up with [2308.05660] Thermodynamic Linear Algebra last year, it immediately grabbed my attention. The paper presents what seemingly looks like a rather straightforward recipe for a better analog computer to solve linear algebra problems. In this post, I summarise the main result of their paper and ask: can this really work?

Thermodynamic linear algebra: how does it work?

Let me try to explain, in my own words, the gist of what I think [2308.05660] Thermodynamic Linear Algebra is all about. Starting with the basics, the story roughly goes as follows:

Physics-based computing

Suppose you’re trying to solve a classic linear algebra problem, namely a linear system of equations $A x = b$, where $A$ is a matrix and $x,b$ are vectors. Instead of solving it using established algorithms on a classical computer, you could instead build a dedicated analog device to simulate the problem.

What would this device have to look like? All you really need to do is set up some kind of a physical system to implement a potential energy $U(x) = \frac{1}{2} x^T A x - b^T x$. For symmetric $A$, its gradient is $\nabla U(x) = A x - b$, so as long as $A$ is also positive definite, the value of $x$ at which $U(x)$ is minimised is exactly the solution to your equation. So in theory, all you need to do is prepare the system, add some dissipation, wait, and then see where $x$ converges to.
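In the noiseless picture, “add dissipation and wait” amounts to the gradient flow $\dot{x} = -\nabla U(x) = b - A x$. Here is a minimal Python/NumPy sketch, assuming a symmetric positive-definite $A$ and a simple explicit Euler time step (both choices are mine, not the paper’s):

```python
import numpy as np


def relax_to_minimum(A, b, dt=0.01, n_steps=5000):
    # Overdamped "ball on a hill": dx/dt = -grad U(x) = -(A x - b), integrated with explicit Euler
    x = np.zeros(len(b))
    for _ in range(n_steps):
        x = x - dt * (A @ x - b)
    return x


A = np.array([[2.0, 0.5], [0.5, 1.0]])  # hypothetical symmetric positive-definite example
b = np.array([1.0, 1.0])
```

Here the relaxed state agrees with `np.linalg.solve(A, b)`. The Euler iteration only converges for `dt` below $2/\lambda_{\max}(A)$, and the number of steps needed grows as the smallest eigenvalue of $A$ shrinks, mirroring a slower physical relaxation.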

If that rings a bell, it is because this method is a simple example of annealing! We normally think of simulated annealing - where the potential is set up and minimized on a digital computer - or quantum annealing - where quantum effects are utilised to speed up the process. But for the problem at hand, we don’t need the quantum part at all. So we could, in theory, build a purely classical system for the task and, as long as it has the right potential energy, it can be used to find the solution. All we need is a ball and a hill, and we’ve got our annealer.

But then there’s noise…

So why do people generally use simulated annealing rather than “real” annealing? Well, annealing struggles with the same problem that all other analog computers do: it is sensitive to hardware errors. Because of unavoidable fluctuations and imperfections, the potential energy $U'(x)$ that we actually set up will be different from the function we want to create, $U(x)$. Likewise, the value of $x$ at any given point in time is unlikely to be at the “true” minimum of $U(x)$ due to transient system dynamics. Finally, readout errors mean that the solution $x'$ we read out will differ from the actual state of the system $x$. Taken together, all these considerations limit the ability to generate accurate solutions to linear algebra problems.

This is where the folks at Normal make a crucial observation: to find the minimum of $U(x)$, we don’t actually need to wait for the ball to roll down the hill and stay still. This is because, even if $x$ is a noisy variable, as long as we wait long enough to reach the thermal equilibrium, $x$ will always fluctuate around the potential minimum! This opens up an alternative way of operating the system, which embraces the noise rather than trying to minimize it. In particular, rather than eliminating noise and waiting for $x$ to settle down, we can deliberately inject noise and repeatedly sample the (constantly fluctuating) state of the system to find the average of $x$, which is the solution to our equation. Neat!

What’s even neater about this thermodynamic way of thinking is that we can now ask all kinds of questions about the distribution of $x$. For example, once we record samples of $x$ in thermal equilibrium, we can estimate not just their average, but also higher-order moments such as the two-point correlators $\langle x_i x_j \rangle$ between the components of $x$. As is shown in the paper, the resulting covariance matrix of $x$ is proportional to the matrix inverse $A^{-1}$, with the thermal energy $k_B T$ as the proportionality constant. Sweet!
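Both tricks can be sketched in a few lines. The minimal Python/NumPy sketch below simulates overdamped Langevin dynamics $dx = -\nabla U(x)\,dt + \sqrt{2/\beta}\,dW$ with the Euler-Maruyama method. Its stationary distribution is proportional to $e^{-\beta U(x)}$, a Gaussian with mean $A^{-1} b$ and covariance $\beta^{-1} A^{-1}$. To keep the statistics tight, it runs a few hundred independent copies of the system and pools their samples over time; the number of copies, step size and run length are my assumptions, not the paper’s hardware parameters.

```python
import numpy as np


def thermodynamic_estimates(A, b, beta=1.0, dt=0.05, n_steps=20_000, burn_in=2_000,
                            n_copies=200, seed=0):
    """Estimate A^{-1} b (mean) and A^{-1} (beta * covariance) from noisy equilibrium samples."""
    rng = np.random.default_rng(seed)
    d = len(b)
    x = np.zeros((n_copies, d))  # one row per simulated copy of the system
    total = np.zeros(d)
    second_moment = np.zeros((d, d))
    n_samples = 0
    for step in range(n_steps):
        drift = -(x @ A.T - b)  # -grad U(x) for each copy
        noise = rng.standard_normal((n_copies, d)) * np.sqrt(2.0 * dt / beta)
        x = x + dt * drift + noise  # Euler-Maruyama step with injected thermal noise
        if step >= burn_in:  # only sample once the copies are (approximately) in equilibrium
            total += x.sum(axis=0)
            second_moment += x.T @ x
            n_samples += n_copies
    mean = total / n_samples
    covariance = second_moment / n_samples - np.outer(mean, mean)
    return mean, beta * covariance
```

For $A = \begin{pmatrix} 2 & 0.5 \\ 0.5 & 1 \end{pmatrix}$ and $b = (1, 1)$, the two estimates should match `np.linalg.solve(A, b)` and `np.linalg.inv(A)` up to sampling error and a small bias of order `dt` from the discrete time step.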

The paper has a pretty graphic that summarises the method I just described.

But… why?

Ok, so the method seems solid. However, digital computers are already pretty good at linear algebra. Can this thermodynamic computer be any better? And if so, on what metric?

The paper argues that the thermodynamic computer can beat digital computers at solving linear algebra problems on large matrices, as long as you don’t need answers with perfect accuracy. This graph illustrates this point well:

Let’s consider the middle graph (b) first, which describes solving a problem of dimension $d = 1000$. The dashed vertical line represents the exact solution, i.e. it takes about $6.5$ ms to solve linear systems of equations of this kind on one Nvidia RTX 6000 GPU. The black line (“conjugate gradient”) is the digital approximate method, and the coloured lines are the expected runtimes for a thermodynamic algorithm. We see that, after $t \approx 1$ ms, the thermodynamic computer algorithm has already found an approximate solution with an average error of $0.1$. The thermodynamic method continues to dominate until about $3.5$ ms, at which point the existing method wins again. This demonstrates a moderate speedup if what we care about is finding solutions with about $0.1$ error. From graphs (a) and (c), we see that the speedup is even more pronounced for larger problem dimensions ($d = 5000$), while for small dimensions ($d = 100$), it is not particularly significant.

Naturally, the exact runtime of the thermodynamic algorithm will be hardware-dependent, and this is only an estimate. Still, the point is that numerical simulations with not-completely-unrealistic parameters demonstrate that this method may win over digital computers!

Why would it win? The authors present a complexity analysis of their algorithms based on some solid thermodynamics, finding that their complexity often scales favourably compared to digital methods, e.g. $O(d)$ vs $O(d^2)$ for linear systems of equations.

How can I build one of these?

So what would this thermodynamic computer actually look like? The most reasonable guess is to build it out of analog electronics. In that case, the computer would be composed of lots and lots of noise sources coupled to local RC/LC oscillators, with each oscillator coupled to every other oscillator. This coupling can be accomplished, for example, by placing a capacitor between every pair of oscillators, resulting in the following network:

In this diagram (which comes from their subsequent experimental paper), each noise source is represented as a noisy voltage source in series with a resistor $R_i$, producing an effective current noise $I_{R_i}$. Each noisy current source is then connected to a separate parallel LC oscillator to form a “noisy oscillator”, a.k.a. a “stochastic mode” or “s-mode”. Finally, s-modes $i$ and $j$ are capacitively coupled through a capacitor $C_{ij}$, and there is one such capacitor for every pair of $i$ and $j$.

The way that this computer is programmed and used in practice is that a compiler (which itself lives on a digital computer) ingests $A$ and calculates the values of $C_{ii}$ and $C_{ij}$ that implement the correct potential energy. Once that compilation is finished, the variable capacitors are set to the target values, and the system is allowed to evolve. After a precalculated evolution time, the voltages are repeatedly measured, and the result is obtained.

While it may seem that the computation flow is mainly analog, the digital compiler actually does a lot of heavy lifting. This is because calculating the correct values of $C$ takes time, and itself requires $d^2$ steps! Funnily enough, it actually turns out compilation is the dominant contribution to the runtime of the thermodynamic linear algebra in the plot above. As an aside, this is quite reminiscent of the story of quantum error correction, where the (classical) digital computer that decodes the error syndromes is frequently the bottleneck to the quantum circuit runtime. Digital computers are just hard to get rid of!

What should I make of it?

I really enjoyed this paper, as well as [2302.06584] Thermodynamic AI and the fluctuation frontier, which is an earlier vision paper from the folks at Normal. It really got me reflecting about topics I haven’t thought about in years (if ever), which is a refreshing break from the world of quantum computing…

I will share five thoughts below in no particular order. However, I will preface them by noting that 1) I’m not an expert on analog computing, 2) I haven’t spoken with any of the authors or anyone knowledgeable in the matter to hear all the things I got wrong, and 3) I haven’t gone through any of the maths in any real detail. So these are really raw thoughts - you’ve been warned! After this post is out, I will reach out to the authors to see what they make of my observations.

#1: Why now? This question puzzles me on a few fronts. The method is really simple, and analog computing is really old. If thermodynamic computing is really as powerful as advertised, surely someone in the 1980s must have thought of this algorithm, right? So is thermodynamic linear algebra a new breakthrough, or something that has already been discovered and discarded in the past?

The authors seem to have done their homework on prior art, so I’m of course willing to believe that the thermodynamic angle is genuinely new. On the other hand, some of the methods presented in the paper can arguably be “discovered” without ever thinking about thermodynamics at all. For example, can’t the linear algebra algorithm be executed on any analog annealer, without noise injection? So where exactly is the distinction between analog computing and thermodynamic computing, and between old and new?

Of course, it doesn’t really matter if thermodynamic linear algebra is a novel idea, it only matters if it’s a great idea. But I do wonder: is there any obvious reason this idea would surface now, rather than say 50 years ago? From the hardware perspective, I don’t think there is anything in the proposal that is feasible today but wasn’t in the past. Maybe the answer is that the recent advances in AI, and the resulting growth in demand for compute, simply make the economic value of hardware accelerators much more pronounced than it was back in the day?

#2: Uhm, so what about errors then? In this paper, the question of error susceptibility is explicitly left out of scope. However, throughout the publications I have read/skimmed, the folks at Normal seem to make an argument that because thermodynamic computing is noise driven, it is also more noise resilient. However, I’m not really convinced thermodynamic computing is genuinely more accurate than other analog computing modalities.

Sure, there are some types of errors - e.g. in operation timing - that this method will be resilient to. But my understanding is that the crux of a lot of analog computing is not necessarily noise as much as precision, i.e. systematic errors from component imperfections. In the architecture discussed in this paper, the matrix $A$ is encoded into the values of physical capacitors. Those will unavoidably not be quite right, leading to wrong solutions - and the total magnitude of the error will grow with problem size.
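To get a feel for how component errors propagate, here is a minimal Python/NumPy sketch under a hypothetical error model: every entry of $A$ is realised with an independent relative error of size `rel_error` (kept symmetric, like a capacitor network), and the device is assumed to return the exact solution of the matrix it actually implements. It reports the resulting relative solution error next to the standard perturbation bound $\|x' - x\| / \|x'\| \le \kappa(A)\, \|A' - A\| / \|A\|$, where $\kappa$ is the condition number.

```python
import numpy as np


def mismatch_error(A, b, rel_error, seed=0):
    """Relative solution error caused by mis-built matrix entries, plus the perturbation bound."""
    rng = np.random.default_rng(seed)
    # Hypothetical error model: independent relative errors of size ~rel_error on every entry,
    # symmetrised so the implemented matrix stays symmetric like a physical coupling network
    eps = rng.standard_normal(A.shape) * rel_error
    A_impl = A * (1.0 + (eps + eps.T) / 2)
    x_target = np.linalg.solve(A, b)
    x_impl = np.linalg.solve(A_impl, b)  # what a perfectly sampled but mis-built device would return
    actual = np.linalg.norm(x_impl - x_target) / np.linalg.norm(x_impl)
    # ||x_impl - x|| / ||x_impl|| <= cond(A) * ||A_impl - A|| / ||A||, all in the 2-norm
    bound = np.linalg.cond(A) * np.linalg.norm(A_impl - A, 2) / np.linalg.norm(A, 2)
    return actual, bound
```

Running it for growing dimension $d$ with a fixed `rel_error`, e.g. on random well-conditioned matrices `A = M @ M.T / d + np.eye(d)`, is one way to check how the error grows with problem size for a given matrix family; the sketch itself does not assume any particular scaling.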

Coming back to the comparison with annealing, it seems to me that while thermodynamic computing can indeed make annealing more efficient and less sensitive to readout noise, it suffers from the fundamental limitation that the implemented potential $U'(x)$ cannot be made identical to the target potential $U(x)$. The earlier paper from Normal argues that thermodynamic computers can get around this limitation by pre-training the system controller (which I think is equivalent to “careful calibration of systematics”), but I don’t think this argument - whether correct or not - applies in this case. All in all, I don’t yet see how thermodynamic computing improves on this fundamental limitation of analog computers, but I remain open to being proven wrong.

#3: Building this won’t be easy! While the coupled oscillator architecture is simple to draw, it strikes me as challenging to build in practice on chip. First, adjusting the coupling matrix relies on electronically variable resistors/capacitors/inductors, and integrating these on chip with decent tuning range is a mess. These components also come with serious footprint challenges - a 1 nF capacitor in a standard CMOS process takes about 1 square millimetre of area! High-speed data loading and readout also feels challenging - in the paper, they assumed one DAC and ADC per s-mode, which doesn’t sound practical or power-efficient. And as mentioned above, I think the architecture requires exquisite management and compensation of circuit non-idealities, be it systematics, parasitics, undesired couplings (e.g. through ground or mutual inductance), or component non-linearities.
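A back-of-the-envelope version of the footprint worry, as a tiny Python sketch. The capacitance density of 1 fF/µm² is an assumption chosen to reproduce the 1 nF ≈ 1 mm² figure above, and the 1 pF per coupling capacitor in the usage note is purely hypothetical; the one structural fact is that all-to-all coupling of $d$ s-modes needs $d(d-1)/2$ capacitors $C_{ij}$.

```python
def coupling_capacitor_count(d):
    # One coupling capacitor C_ij per unordered pair (i, j) of s-modes
    return d * (d - 1) // 2


def capacitor_area_mm2(capacitance_farads, density_fF_per_um2=1.0):
    # Area = C / (capacitance density); the default density is an assumption, and 1 mm^2 = 1e6 um^2
    area_um2 = capacitance_farads / (density_fF_per_um2 * 1e-15)
    return area_um2 / 1e6
```

For $d = 1000$ that is 499,500 coupling capacitors; at a hypothetical 1 pF each, `499_500 * capacitor_area_mm2(1e-12)` comes to roughly 500 mm² of capacitor area alone.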

Of course, this is not to say that this architecture doesn’t scale, only that it remains to be developed into a form that is practical and manufacturable. And really, none of this sounds too scary compared to the typical challenges of manufacturing and operating quantum computers! So from where I sit, I would say that while architecting thermodynamic computers sounds hard, it also sounds both doable and fun.

#4: How well does it need to work, actually? The paper demonstrates that the architecture allows for potential speed gains for linear algebra that is approximate (error norm around 0.1) and with dimension over $\approx 1000$. The obvious next question is: who exactly can benefit from fast approximate linear algebra with dimension over $\approx 1000$, and how much is this capability worth for them?

I don’t know the answer myself - before reading this paper, I had never asked myself questions such as “could my code benefit from a solver that is faster but makes more errors”, and I suppose neither had most people. My guess is that most potential end-users will need to be shown the advantage, rather than just given access to a thermodynamic processor. I suppose that’s the reason why Normal seems to be heavily staffed with AI people looking into probabilistic algorithms - their best case scenario is to build end-to-end solutions that natively benefit from thermodynamic speedups.

The hardware-software codesign seems crucial for a few more reasons. One is that the hardware is likely to have idiosyncrasies that make it sub-optimal for certain problems. For example, the coupling matrix is likely to be limited to a certain dynamic range, meaning it can’t contain very small numbers if it already contains very large numbers. The other one is that linear algebra is only one of many subroutines that may be running on the thermodynamic processor, and a hardware-aware algorithm design is crucial to capturing all the potential gains.
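A toy model of the dynamic-range problem, in Python/NumPy: couplings smaller than the largest one divided by a hypothetical `dynamic_range` are assumed to be unrealisable and end up as zero, while everything else is implemented perfectly. Real hardware would misbehave in messier ways; this only isolates the one effect.

```python
import numpy as np


def apply_dynamic_range(A, dynamic_range):
    """Hypothetical hardware model: entries below max|A| / dynamic_range cannot be set and become 0;
    all larger entries are implemented exactly (every other error source is ignored)."""
    threshold = np.max(np.abs(A)) / dynamic_range
    return np.where(np.abs(A) >= threshold, A, 0.0)


A = np.array([[100.0, 0.5], [0.5, 2.0]])  # illustrative matrix mixing large and small entries
b = np.array([1.0, 1.0])
A_hw = apply_dynamic_range(A, dynamic_range=100)
```

With a dynamic range of 100, the off-diagonal couplings are dropped; the large solution component barely moves, but the small one changes by roughly a third (from about 0.0075 to 0.01).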

#5: Is this a new field of science? The ideas presented in this work, as well as the earlier vision paper about thermodynamic AI, are clearly highly general and far-reaching. This is very exciting - I love bold, ambitious work of this kind. The challenges above notwithstanding, I do feel like a new field of science may be opening up, and it will be interesting to see if the enthusiasm for this thermodynamic computing architecture starts spreading beyond Normal over the coming years!

A few aspects of this work support this conclusion. The first is that these papers point out how a lot of algorithms and problems traditionally considered as separate can be unified, either because they can be solved by the same type of stochastic differential equation, or because they can be solved on the same type of analog hardware. The physicist in me strongly supports all efforts at idea unification, which is always a great starting point for developing strong intuitions about how the world works.

The second is that thermodynamic computing could tell us something about the nature of our own intelligence. Despite the tremendous advances in AI, we know that the neural networks in our brains are in many ways superior to artificial neural networks, e.g. in learning from limited data or in energy efficiency. However, we still don’t know how the brain pulls this off! Given that noise is ubiquitous in biological systems, perhaps it’s reasonable to hypothesise that it could play an active role in the brain’s computation. If so, neuromorphic thermodynamic computers could take us a step closer to understanding the mysteries of human and animal intelligence.

Finally, these early papers raise more questions than they answer! Ok, they don’t do that explicitly, but as I read them, a little voice in my head keeps saying “Oh, this is interesting, I wonder if …”. This could be just because the field of analog computing is new to me, but it’s also in general a great indicator that there’s much much more to explore.