Introducing Tsim: Fast, Non-Clifford Circuit Simulation for QEC Researchers

Technology · April 2, 2026 · 6 min read

If you work in quantum error correction, you probably already know STIM. It's fast, elegant, and has become the go-to tool for simulating Clifford circuits at scale. But as QEC research moves toward fault-tolerant protocols — magic state preparation, T-state injection, logical operations beyond Clifford — STIM hits a fundamental wall. It can't simulate non-Clifford gates.

Today, we're introducing Tsim, an open-source circuit simulator that extends the STIM ecosystem to Clifford+T circuits with GPU acceleration. It's fast, it's free, and if you already use STIM, you can start using Tsim by changing a single import line. Read the preprint on arXiv.

What Tsim Does

Tsim is a sampling simulator for quantum circuits containing Clifford gates, T gates, and general U3 rotations. Under the hood, it uses stabilizer rank decomposition via ZX calculus to efficiently represent and sample from circuits that would be intractable for a standard state vector simulator at high qubit count.

Runtime scales exponentially in the number of non-Clifford gates, but only polynomially in the number of qubits. That makes Tsim ideal for the regime that QEC researchers care about most: large circuits (80+ qubits) with a modest number of T gates. This is characteristic of fault-tolerant protocols such as magic state distillation at the physical level.
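To make that scaling concrete, here is a rough back-of-the-envelope sketch. It compares the memory a dense state vector would need with the number of stabilizer terms produced by two textbook decompositions: the naive one (2 terms per T gate) and the Bravyi-Smith-Smolin grouping (7 terms per block of 6 T gates). These counts are illustrative assumptions only; the decompositions Tsim actually finds via ZX calculus may differ, and the helper names are hypothetical.

```python
def statevector_bytes(num_qubits: int) -> int:
    """Memory for a dense state vector of complex128 amplitudes (16 bytes each)."""
    return 16 * 2**num_qubits


def naive_terms(num_t_gates: int) -> int:
    """Naive decomposition: every T gate doubles the number of stabilizer terms."""
    return 2**num_t_gates


def bss_terms(num_t_gates: int) -> int:
    """Bravyi-Smith-Smolin grouping: 7 terms per block of 6 T gates,
    with any leftover T gates decomposed naively (2 terms each)."""
    blocks, leftover = divmod(num_t_gates, 6)
    return 7**blocks * 2**leftover


# The benchmark regime from this post: 85 qubits, 10 non-Clifford gates.
print(f"dense state vector: {statevector_bytes(85):.3e} bytes")
print(f"naive stabilizer terms: {naive_terms(10)}")
print(f"BSS stabilizer terms:   {bss_terms(10)}")
```

The point of the comparison is that the state vector cost depends on the qubit count alone, while the stabilizer term count depends only on the number of non-Clifford gates.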

A few highlights:

  • 80+ qubit simulation of error-corrected circuits at the physical level
  • GPU acceleration via JAX, producing millions of shots in parallel
  • ~600 nanoseconds per shot on an NVIDIA GH200 for an 85-qubit magic state distillation circuit with 10 non-Clifford gates
  • Drop-in STIM API compatibility: same `Circuit`, `compile_detector_sampler()`, and `sample()` interface
  • Full noise model support including all Pauli error channels that STIM supports
  • Built-in visualization with circuit diagrams in timeline-SVG format
  • Written in 100% Python (~6,000 lines of code, 40% of which are unit tests)

How It Works

Tsim builds on the theoretical framework from Sutcliffe & Kissinger (arXiv:2403.06777), which uses ZX calculus to find efficient stabilizer rank decompositions of non-Clifford circuit elements.

The workflow is straightforward:

1. Define your circuit using Tsim's STIM-compatible text format. If you already have STIM circuits, they work as-is — Tsim adds support for T, U3, and other non-Clifford instructions on top of the full STIM gate set.

import tsim

# Two-qubit circuit in STIM text format, plus Tsim's non-Clifford T instruction.
circuit = tsim.Circuit(
    """
    RX 0
    R 1
    T 0
    PAULI_CHANNEL_1(0.1, 0.1, 0.2) 0 1
    H 0
    CNOT 0 1
    DEPOLARIZE2(0.01) 0 1
    M 0 1
    DETECTOR rec[-1] rec[-2]
    OBSERVABLE_INCLUDE(0) rec[-1]
    """
)

2. Compile a sampler. This step performs the ZX decomposition and JIT-compiles the simulation kernel. It may take a few seconds for complex circuits, but it only needs to happen once per circuit.

detector_sampler = circuit.compile_detector_sampler()

3. Sample. Generate as many shots as you need. This is where the GPU parallelism shines — the more shots you request, the better the throughput.

results = detector_sampler.sample(shots=100_000, append_observables=True)

That's it. Apart from the import line and the added non-Clifford instructions, this is the same code you'd write with STIM.
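As a follow-up, here is a minimal sketch of how the sampled array might be post-processed. It assumes the STIM convention that, with `append_observables=True`, each row holds the detector bits followed by the observable bits in the last columns. The helper name `split_samples` and the synthetic data are hypothetical, chosen so the snippet runs without Tsim installed.

```python
import numpy as np


def split_samples(samples, num_observables):
    """Split rows of [detectors..., observables...] into two boolean arrays."""
    samples = np.asarray(samples, dtype=bool)
    num_detectors = samples.shape[1] - num_observables
    return samples[:, :num_detectors], samples[:, num_detectors:]


# Synthetic stand-in for `results`: 4 shots, 1 detector, 1 observable,
# matching the shape of the example circuit above.
fake_results = np.array([[0, 0], [1, 1], [1, 0], [0, 0]], dtype=bool)

detectors, observables = split_samples(fake_results, num_observables=1)
detection_rate = detectors.mean(axis=0)         # fraction of shots firing each detector
observable_flip_rate = observables.mean(axis=0)  # fraction of shots flipping each observable
print(detection_rate, observable_flip_rate)
```

With real data you would pass `results` from the sampler in place of `fake_results`.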

Performance

We benchmarked Tsim on an 85-qubit magic state distillation (MSD) circuit with 10 non-Clifford gates — a representative workload for near-term QEC research.

Backend                                     Time per shot
STIM on CPU (S gates only, no T gates)      0.14 μs
Tsim on GPU (with T gates)                  0.64 μs
Tsim on CPU (with T gates)                  14.59 μs

On GPU, Tsim approaches STIM's Clifford-only performance while actually simulating the non-Clifford gates that STIM cannot handle. That comparison is inherently a bit apples-to-oranges — STIM is simulating a simpler problem — but it shows that Tsim's overhead for non-Clifford support is modest when the T-gate count is low.
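For a sense of what these per-shot times mean in practice, the short sketch below turns them into throughput, wall-clock time, and relative ratios. The numbers come from the benchmark above; the helper names are hypothetical, and compile time is not included.

```python
# Per-shot times from the benchmark, in microseconds.
TIME_PER_SHOT_US = {
    "stim_cpu_clifford_only": 0.14,
    "tsim_gpu": 0.64,
    "tsim_cpu": 14.59,
}


def shots_per_second(time_per_shot_us: float) -> float:
    """Convert a per-shot time in microseconds into shots per second."""
    return 1e6 / time_per_shot_us


def wall_time_seconds(shots: int, time_per_shot_us: float) -> float:
    """Sampling time for a given shot count, excluding compilation."""
    return shots * time_per_shot_us * 1e-6


# Tsim on GPU relative to STIM's Clifford-only run (about 4.6x slower).
gpu_vs_stim = TIME_PER_SHOT_US["tsim_gpu"] / TIME_PER_SHOT_US["stim_cpu_clifford_only"]
# Tsim on CPU relative to Tsim on GPU (about 23x slower).
cpu_vs_gpu = TIME_PER_SHOT_US["tsim_cpu"] / TIME_PER_SHOT_US["tsim_gpu"]

print(f"Tsim GPU throughput: {shots_per_second(TIME_PER_SHOT_US['tsim_gpu']):,.0f} shots/s")
print(f"1e9 shots on GPU: {wall_time_seconds(10**9, TIME_PER_SHOT_US['tsim_gpu']):.0f} s")
print(f"GPU vs STIM: {gpu_vs_stim:.1f}x, CPU vs GPU: {cpu_vs_gpu:.1f}x")
```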

For pure Clifford circuits, Tsim's detector sampling performance is comparable to STIM, while measurement sampling is slower. Tsim's sweet spot is large circuits with few non-Clifford gates and high shot counts — exactly the workload that QEC protocol development demands.

Where Tsim Fits in the Bloqade Logical Simulation Toolchain

Tsim is more than a standalone tool: it's the newest addition to Bloqade's logical simulation toolchain, QuEra's open-source pipeline for simulating fault-tolerant quantum circuits with hardware-informed noise. Understanding the full toolchain helps explain why Tsim matters and how you can get the most out of it.

[Figure: Bloqade logical simulation toolchain diagram]

The pipeline has four stages: circuit definition, noise modeling, logical simulation, and QEC decoding.

Circuit definition. Your quantum algorithm is expressed as a Squin kernel — a circuit-level program compiled for logical simulation. Squin serves as the intermediate representation throughout the toolchain, from algorithm specification through simulation and decoding.

Noise modeling. Before simulation, the toolchain injects noise into your Squin kernel to model real hardware behavior. You have three options depending on your research needs:

  • Heuristic noise applies Pauli error channels that capture six distinct error sources specific to neutral-atom operations: global single-qubit gate errors, local single-qubit gate errors, CZ gate errors, unpaired Rydberg errors from blockade interactions, mover errors, and sitter errors from atom transport. These are applied locally via the `noise.transform_circuit` wrapper, and you can configure them to match different modes of hardware operation.
  • Custom noise lets you manually encode noise directly into the Squin kernel, giving you full control for tailored logic-level development and specialized research use cases.
  • Gemini-specific noise models calibrated to QuEra's Gemini hardware are coming soon. These will let you simulate circuits under conditions that closely match what you'd see on actual Gemini devices.

The output of this stage is a compiled circuit: your Squin kernel with noise substitutions baked in, ready for simulation.

Logical simulation. The compiled circuit is passed to one of three simulation backends depending on your needs: Tsim for GPU-accelerated non-Clifford circuits, STIM for high-performance Clifford simulation, or PyQrack for GPU-accelerated state vector simulation.

QEC decoding. After simulation, the toolchain extracts observable bits from simulated syndrome measurements and feeds them to a decoder. The `sinter` interface provides integration with popular open-source decoders including Belief Propagation + Ordered Statistics Decoding, Belief Propagation + Localized Statistics Decoding, Belief Find, Minimum-Weight Parity Factor, and Tesseract. You can also install your own custom decoder if your protocol requires specialized decoding logic.
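To illustrate what the decoding stage does with simulator output, here is a toy sketch using a lookup-table decoder for a 3-bit repetition code. It is not the `sinter` interface or any of the decoders listed above; the code, error model, and function names are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import numpy as np

# Syndrome (d0, d1) -> does the decoder predict that the observable (bit 0) flipped?
# d0 = e0 XOR e1, d1 = e1 XOR e2; each syndrome maps to its most likely single error.
LOOKUP = {(0, 0): False, (1, 0): True, (1, 1): False, (0, 1): False}


def simulate(errors):
    """Turn per-bit error patterns into detector bits and the true observable flip."""
    errors = np.asarray(errors, dtype=bool)
    d0 = errors[:, 0] ^ errors[:, 1]
    d1 = errors[:, 1] ^ errors[:, 2]
    return np.stack([d0, d1], axis=1), errors[:, 0]


def decode(detectors):
    """Predict the observable flip for each shot from its syndrome."""
    return np.array([LOOKUP[(int(a), int(b))] for a, b in detectors], dtype=bool)


def logical_error_rate(detectors, observable):
    """Fraction of shots where the decoder's prediction disagrees with the truth."""
    return float(np.mean(decode(detectors) != observable))


# Independent bit flips with probability 5% per bit.
rng = np.random.default_rng(0)
errors = rng.random((100_000, 3)) < 0.05
detectors, observable = simulate(errors)
print(f"logical error rate: {logical_error_rate(detectors, observable):.4f}")
```

In the real pipeline, the detector and observable bits would come from the simulator, and the lookup table would be replaced by one of the decoders named above.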

Tsim's position in this toolchain is what makes it especially powerful for QEC research: you get a realistic noise model applied to your circuit before it reaches the simulator, and a full decoding pipeline afterward — so you can evaluate end-to-end QEC protocol performance, not just raw simulation output.

You can also use Tsim standalone with any STIM-format circuit if you want to bring your own noise model or skip the Bloqade pipeline entirely.
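If you bring your own noise model, one simple approach is to rewrite the circuit text before handing it to the simulator. The sketch below appends STIM's `DEPOLARIZE1` and `DEPOLARIZE2` channels after each gate. The helper `add_depolarizing_noise` and its gate lists are hypothetical and deliberately incomplete; it is plain string processing, not part of Tsim or Bloqade.

```python
# Gate names this toy helper recognizes (an assumption, not the full gate set).
SINGLE_QUBIT_GATES = {"H", "S", "T", "X", "Y", "Z"}
TWO_QUBIT_GATES = {"CNOT", "CX", "CZ"}


def add_depolarizing_noise(circuit_text: str, p1: float, p2: float) -> str:
    """Insert a depolarizing channel after every recognized gate instruction."""
    noisy_lines = []
    for raw_line in circuit_text.strip().splitlines():
        line = raw_line.strip()
        if not line:
            continue
        noisy_lines.append(line)
        name, _, targets = line.partition(" ")
        if name in SINGLE_QUBIT_GATES:
            noisy_lines.append(f"DEPOLARIZE1({p1}) {targets}")
        elif name in TWO_QUBIT_GATES:
            noisy_lines.append(f"DEPOLARIZE2({p2}) {targets}")
    return "\n".join(noisy_lines)


clean = """
    RX 0
    R 1
    T 0
    H 0
    CNOT 0 1
    M 0 1
"""
print(add_depolarizing_noise(clean, p1=0.001, p2=0.01))
```

The resulting text can be passed to `tsim.Circuit(...)` in the same way as the hand-written circuit shown earlier.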

Comparison to Other Simulators

Tsim isn't the only non-Clifford simulator out there, but it occupies a unique position:

  • vs. STIM: Tsim adds non-Clifford gate support with GPU acceleration. STIM remains the gold standard for pure Clifford simulation on CPU.
  • vs. Qiskit's Extended Stabilizer: Qiskit's extended stabilizer simulator uses a similar technique but is limited to 63 qubits and runs on CPU only. Tsim supports circuits with 80+ qubits and GPU parallelism.
  • vs. Soft Simulator: A recent research project using extended tableau propagation on GPU. It may outperform Tsim for certain workloads (for example, magic state cultivation) and supports atom-loss simulation. However, it is research code without STIM API compatibility or production-level documentation.

Get Started

Tsim is open source under the Apache License 2.0. You can install it today:

pip install bloqade-tsim

For GPU acceleration with CUDA:

pip install "bloqade-tsim[cuda13]"
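Since Tsim's GPU acceleration goes through JAX, a quick way to check whether a GPU is visible after installation is to ask JAX directly. This is a small sketch using `jax.default_backend()` and `jax.devices()`; the wrapper function is hypothetical and falls back gracefully if JAX is not installed.

```python
def describe_jax_backend() -> str:
    """Report which backend JAX will use, or note that JAX is missing."""
    try:
        import jax
    except ImportError:
        return "jax is not installed"
    devices = ", ".join(str(device) for device in jax.devices())
    return f"backend={jax.default_backend()} devices=[{devices}]"


print(describe_jax_backend())
```

If the reported backend is `cpu` on a machine with an NVIDIA GPU, the CUDA extras are probably not installed or not being picked up.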

If you're already using STIM, try swapping `import stim` for `import tsim` in one of your existing circuits and see what happens. We think you'll be pleasantly surprised.

We'd love to hear what you build with it. File issues, open PRs, or let us know on social media.

