
A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery

Published: June 2, 2025. Last updated: June 2, 2025.

In surface-code-based fault-tolerant quantum computing architectures, T gates are typically implemented via injected magic states. The layout and design of the architecture play a crucial role in how fast a magic state can be reliably produced and consumed for computation. The game of surface codes [1] allows us to reason about such space-time trade-offs in architecture designs without having to get into the nitty-gritty details of surface code physics. In this demo, we will see how different designs can lead to faster computations at the cost of involving more qubits and vice versa.

/_images/Hero_Game_of_Surface_Codes.png

Introduction

The game of surface codes [1] is a high-level framework for designing surface code quantum computing architectures. The game helps us understand space-time trade-offs, where designs with a higher qubit overhead allow for faster computations and vice versa. For example, a space-efficient design might allow a computation with $10^8$ T gates to run in 4 hours using 55k physical qubits, whereas an intermediate design may run the same computation in 22 minutes using 120k physical qubits, or a time-optimized design in 1 second using 1500 interconnected quantum computers with 220k physical qubits each.

One can draw a rough comparison to microchip design in classical computing, where the equivalent game would be about how to arrange the transistors of a chip to perform fast and efficient computations.

The game can be understood entirely from the rules described in the next section. However, it still helps to understand the correspondences in physical fault-tolerant quantum computing (FTQC) architectures. First of all, it is important to note that we consider surface codes that implement (Clifford + T) circuits. In particular, these circuits can be compiled to circuits that just perform Pauli product measurements. This is because all Clifford operations can be moved to the end of the circuit and merged with measurements. The remaining non-Clifford gates are realized by magic state injection and more Clifford operations, which can be merged with measurements again. Hence, we mainly care about performing measurements on qubits in arbitrary bases and efficiently distilling and injecting magic states.

We also note that the patches that represent qubits correspond to surface code qubits. Appendix A in [1] gives a detailed explanation of the surface code realizations of all operations that we are going to see. These are useful to know in order to grasp the full depth of the game, but are not essential to understanding its rules and the design principles that we cover in this demo. For further reading on these subjects, we recommend the blog posts on the surface code and quantum error correction by Arthur Pesah, our demo on the toric code, as well as the three-part series on the toric code by James Wootton.

Rules of the game

The game is played on a board of tiles, where patches correspond to logical qubits. Underlying these tiles are physical qubits that are statically arranged ($2d^2$ physical qubits per tile for code distance $d$). But we should view logical qubit patches as dynamic entities that can appear, move around, deform and disappear again. The goal of this demo will be to understand the design principles and space-time trade-offs for surface code architectures.

Data qubits are realized by patches that occupy at least one tile, but potentially multiple. They always have four distinct boundaries corresponding to X (dotted) and Z (solid) edges. This is shown in the figure below.

/_images/qubit_definition_cropped.png

Qubits are defined as patches of tiles on the board. A single qubit can occupy one tile (a) or multiple tiles (b), where dotted lines correspond to X and solid lines to Z operators. Attribution see **

Every operation in the game has an associated time cost that we measure in units of code cycles 🕒. There are some discrepancies to actual surface code cycles, but the correspondence is close enough to weigh up space-time trade-offs in architecture designs. We are not going to give an exhaustive overview of all possible operations, but focus on a few important ones and fill the remaining gaps necessary for the architecture designs in the respective sections below.

Arbitrary Pauli product measurements

At the cost of 0🕒 we can measure patches in the X and Z basis. If two patches share a border, one can measure the product of their shared edges as highlighted by the blue region in the figure below at the cost of 1🕒.

/_images/ZZ_measurement.png

Simultaneously measuring two adjacent patches corresponds to measuring the product of the operators on their neighboring edges. Here, we measure $ZZ$. Attribution see **

In particular, if the shared edge contains both Z and X edges, we can measure in the Y basis. In the following example, the upper qubit A has both operator edges $Z_A$ and $X_A$ exposed. Measuring it together with the auxiliary qubit B below, initialized in the $|0\rangle$ state, we measure $(Z_A X_A) Z_B \propto Y_A Z_B$ altogether.

/_images/Y_measurement.png

Y operators can be measured by having both X and Z edges exposed to an adjacent auxiliary qubit. The measurement corresponds to the product of all involved operators, involving $Z_A X_A \propto Y_A$. Attribution see **
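The proportionality between the product of the Z and X edge operators and Y can be checked numerically. The following is a minimal sketch in plain Python; the helper names `matmul` and `scale` are introduced here for illustration only and are not part of the demo.

```python
# Pauli matrices as nested lists (no external libraries needed).
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, a):
    """Multiply a 2x2 matrix by a scalar."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


# The product of the Z and X edge operators equals Y up to a factor of i.
ZX = matmul(Z, X)
print(ZX == scale(1j, Y))
```

The factor of $i$ is a global phase of the operator, which is why measuring both exposed edges together amounts to a measurement in the Y basis.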

If we want to measure a single-qubit patch in the Y basis in practice, we start by deforming it at the cost of 1🕒, initialize an auxiliary qubit at no cost, and perform the joint measurement as shown above (1🕒). The entire protocol costs 2🕒 and is shown below:

/_images/Y_measurement_protocol.png

The protocol for measuring a single qubit in the Y basis involves deforming the patch (Step 2, 1🕒), initializing an auxiliary qubit in $|0\rangle$ (0🕒), simultaneously measuring both patches (1🕒) and deforming the qubit back again (0🕒). Attribution see **

Auxiliary qubits play an important role as they allow measuring products of Pauli operators on different qubits, which is the most crucial operation in this framework, since everything is mapped to Pauli product measurements.

/_images/PPM.png

Measuring $Y_1 X_3 Z_4 X_5$ via a joint auxiliary qubit in 1🕒. In principle, multi-qubit measurements with many qubits come at the same cost as those with fewer qubits. However, the requirement of having an auxiliary region connecting all qubits may demand extra deformations. Attribution see **

Non-Clifford Pauli rotations

Non-Clifford Pauli rotations $e^{-i\frac{\pi}{8}P}$ for some Pauli word $P$ are realized via magic state distillation and injection. Magic state distillation blocks are a crucial part of the architecture design that we are going to cover later. For the moment, we assume that we have the means to prepare magic states $|m\rangle = |0\rangle + e^{i\frac{\pi}{4}}|1\rangle$ on special qubit tiles (distillation blocks). Magic state injection then refers to the following protocol:

/_images/magic_state_injection.png

Performing a non-Clifford $\pi/8$ rotation corresponds to performing the joint measurement of the Pauli word and $Z$ on the magic state qubit. The measurement of $P \otimes Z_m$ costs 1🕒, the subsequent $X$ measurement is free. The additional classically controlled Clifford rotations can be merged again with the measurements at the end of the circuit. Attribution see **
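To make the injection protocol concrete, here is a small state-vector sketch for the simplest case $P = Z$ on a single data qubit. It assumes the convention $e^{-i\frac{\pi}{8}Z}$ for the target rotation and the magic state $|0\rangle + e^{i\frac{\pi}{4}}|1\rangle$. The function names and the specific correction rules (a $\frac{\pi}{4}$ Z rotation when the joint measurement yields $-1$, a Pauli Z when the X measurement yields $-1$) are worked out for this toy example and are not taken from the demo.

```python
import cmath
import math


def inject(a, b, zz_outcome, x_outcome):
    """Data-qubit state after measuring Z (x) Z_m and then X_m with the given outcomes."""
    w = cmath.exp(1j * math.pi / 4)
    # Joint state (a|0> + b|1>) (x) (|0> + w|1>), keyed by (data bit, magic bit).
    amp = {(0, 0): a, (0, 1): a * w, (1, 0): b, (1, 1): b * w}
    # Outcome +1 of Z (x) Z keeps even parity, outcome -1 keeps odd parity.
    parity = 0 if zz_outcome == +1 else 1
    kept = {k: v for k, v in amp.items() if (k[0] ^ k[1]) == parity}
    # Outcome +/-1 of X_m projects the magic qubit onto |0> +/- |1>.
    out = [0j, 0j]
    for (d, m), v in kept.items():
        out[d] += v * (x_outcome if m == 1 else 1)
    norm = math.sqrt(sum(abs(c) ** 2 for c in out))
    return [c / norm for c in out]


def apply_diag(d0, d1, state):
    """Apply the diagonal gate diag(d0, d1) to a single-qubit state."""
    return [d0 * state[0], d1 * state[1]]


def corrected(a, b, zz_outcome, x_outcome):
    """Injection followed by the outcome-dependent corrections (assumed rules, see text)."""
    state = inject(a, b, zz_outcome, x_outcome)
    if zz_outcome == -1:  # Clifford correction exp(-i pi/4 Z)
        state = apply_diag(cmath.exp(-1j * math.pi / 4), cmath.exp(1j * math.pi / 4), state)
    if x_outcome == -1:  # Pauli correction Z
        state = apply_diag(1, -1, state)
    return state


def overlap(u, v):
    """Absolute value of the inner product <u|v>."""
    return abs(sum(p.conjugate() * q for p, q in zip(u, v)))


a, b = 0.6, 0.8j  # arbitrary normalized input state
target = apply_diag(cmath.exp(-1j * math.pi / 8), cmath.exp(1j * math.pi / 8), [a, b])
for zz in (+1, -1):
    for x in (+1, -1):
        print(zz, x, round(overlap(target, corrected(a, b, zz, x)), 12))
```

In every measurement branch the corrected state agrees with the rotated state up to a global phase. Since the corrections are Clifford or Pauli operations, this is consistent with the statement that they can be merged with the terminal measurements.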

Take for example the Pauli word $P = Z_1 Y_2 X_4$ on the architecture layout below. This design allows one to directly perform $e^{-i\frac{\pi}{8}P}$ as we have access to all of $X, Y, Z$ on each qubit, as well as the Z edge for the magic state qubit.

/_images/non_clifford_rotation.png

Performing $e^{-i\frac{\pi}{8}Z_1 Y_2 X_4}$ by measuring $Z_1 Y_2 X_4 \otimes Z_m$. The additional measurement of $X$ on the magic state qubit is not shown and has no additional cost. The remaining Clifford Pauli rotations are merged with the terminal measurements at the end of the circuit via compilation. Attribution see **

We are going to see in the next section that one of the biggest problems is performing Y rotations and measurements (same thing, really, in this framework).

Data blocks design

Computation happens on logical data qubits that are arranged on a so-called data block. We now have all the necessary tools to understand different designs and their space-time tradeoffs. In particular, the speed of the quantum computer is determined by how fast a magic state can be distilled and consumed by a data block. In this section we focus on how the design affects how fast a magic state can be consumed by a block and do not focus on the distillation itself (this will be handled in the next section).

Compact data blocks

The compact data block has the following form. The middle aisle is going to be used as an auxiliary qubit region.

/_images/compact_block.png

The compact data block design is efficient in space. However, only one edge is exposed to the auxiliary qubit region in the middle. Attribution see **

This design only uses $\frac{3}{2}n+3$ tiles for $n$ qubits. The biggest drawback is rather obvious: we can only access Z measurements in the auxiliary qubit region. In order to perform joint X measurements, we can perform a patch rotation at a cost of 3🕒:

/_images/patch_rotation.png

A patch rotation can be used to expose the X edge to the auxiliary qubit region. Attribution see **

The worst thing that can happen is to have two opposite qubits require an X measurement, e.g. qubits (3 and 4) or (5 and 6). If either or both occur, it takes a total of 6🕒 to rotate the patches.

An additional problem of this design is the fact that there are no tiles for qubits to expand to in order to perform Y measurements. This can be remedied by making use of the identity

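$$e^{-i\frac{\pi}{8}Y} = e^{-i\frac{\pi}{4}Z}\, e^{-i\frac{\pi}{8}X}\, e^{i\frac{\pi}{4}Z}.$$

The Clifford rotation on the right, $e^{i\frac{\pi}{4}Z}$, which is applied first, needs to be explicitly performed in this case. The second Clifford rotation ($e^{-i\frac{\pi}{4}Z}$) can be merged with the terminal measurements of the circuit. Such a rotation $e^{\pm i\frac{\pi}{4}P}$ can be performed with a joint measurement of $P \otimes Y$ (the two signs differ only by a Pauli correction $P$), similar to the magic state injection circuit: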

/_images/clifford_rotation.png

A Clifford rotation $e^{\pm i\frac{\pi}{4}P}$ is performed by measuring $P \otimes Y$. Attribution see **
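The identity that trades a Y rotation for an X rotation sandwiched between two Clifford Z rotations can be verified numerically. The sketch below assumes the sign convention $e^{-i\frac{\pi}{8}Y} = e^{-i\frac{\pi}{4}Z}\, e^{-i\frac{\pi}{8}X}\, e^{i\frac{\pi}{4}Z}$; the helpers `rot`, `matmul` and `close` are hypothetical names used only for this check.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rot(theta, pauli):
    """Return exp(-i * theta * pauli) = cos(theta) I - i sin(theta) pauli."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * I2[i][j] - 1j * s * pauli[i][j] for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    """Entry-wise comparison of two 2x2 matrices."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


lhs = rot(math.pi / 8, Y)
# The rightmost factor exp(+i pi/4 Z) acts first, the leftmost exp(-i pi/4 Z) last.
rhs = matmul(matmul(rot(math.pi / 4, Z), rot(math.pi / 8, X)), rot(-math.pi / 4, Z))
print(close(lhs, rhs))
```

Swapping the signs of the two Z rotations yields $e^{+i\frac{\pi}{8}Y}$ instead, so the order and signs of the Clifford rotations matter when tracking them through the circuit.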

In particular, we still need to be able to perform a Y measurement somewhere. In this case, we have just outsourced it to another resource qubit, which can be shared by all data qubits and for which we left space in the bottom left corner of the compact data block. For example, we can perform the rotation $e^{-i\frac{\pi}{4}Z_3 Z_5 Z_6}$ at a cost of 1🕒 in the following way:

/_images/clifford_rotation_356.png

A Clifford rotation $e^{-i\frac{\pi}{4}Z_3 Z_5 Z_6}$ is performed by measuring $Z_3 Z_5 Z_6 \otimes Y_\text{resource}$ with the additional resource qubit in the bottom left corner of the compact block. Attribution see **

The worst case here is having an even number of Y operators in the Pauli word, as it requires two distinct $\frac{\pi}{4}$ rotations, each costing 1🕒, for a total of 2🕒.

Overall, in the worst-case scenario an operation can cost 9🕒. This consists of the base cost of 1🕒 for performing the Pauli measurement, 2🕒 for having an even number of Y operators, and 6🕒 when opposite qubit patches require X measurements. The following protocol shows such a scenario by performing $e^{-i\frac{\pi}{8}Y_1 Y_3 Z_4 Y_5 Y_6}$, which is realized by $e^{-i\frac{\pi}{8}X_1 X_3 Z_4 X_5 X_6}\, e^{-i\frac{\pi}{4}Z_3 Z_5 Z_6}\, e^{i\frac{\pi}{4}Z_1}$ (ignoring again the additional two $\frac{\pi}{4}$ rotations that are merged with measurements).

/_images/compact_block_worst_case.png

Worst-case scenario in the compact block when performing $e^{-i\frac{\pi}{8}Y_1 Y_3 Z_4 Y_5 Y_6}$. Step 2 measures $Z_1$ together with $Y$ on the resource qubit in order to perform the $e^{i\frac{\pi}{4}Z_1}$ rotation at 1🕒. Step 3 performs the additional $X$ measurement on the resource qubit at 0🕒. The same applies to steps 4 and 5, which perform $e^{-i\frac{\pi}{4}Z_3 Z_5 Z_6}$ at 1🕒 overall. Steps 6 and 7 perform the patch rotations at 3🕒 each. The final measurement of $X_1 X_3 Z_4 X_5 X_6 \otimes Z_m$ at another 1🕒 in step 8 completes the computation. Attribution see **
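The cost accounting for the compact block can be summarized in a small, simplified model. This is a sketch under stated assumptions rather than a faithful scheduler: the function name, the default `opposite_pairs` layout, the rule that an odd number of Y operators needs only one $\frac{\pi}{4}$ rotation, and the rule that a single required patch rotation costs 3🕒 are all inferred from the text for illustration.

```python
def compact_block_cost(word, opposite_pairs=((1, 2), (3, 4), (5, 6))):
    """Estimate the cost (in code cycles) of a pi/8 rotation on a compact data block.

    word maps qubit indices to 'X', 'Y' or 'Z'. The pairing of opposite
    qubits is an assumed layout, not taken from the original figure.
    """
    n_y = sum(1 for p in word.values() if p == "Y")
    # Y operators are converted to X via pi/4 Z rotations at 1 cycle each:
    # assumed one rotation for an odd count, two for a nonzero even count.
    if n_y == 0:
        clifford = 0
    elif n_y % 2 == 1:
        clifford = 1
    else:
        clifford = 2
    # After the conversion, every former X or Y needs an exposed X edge.
    needs_x = {q for q, p in word.items() if p in ("X", "Y")}
    if any(a in needs_x and b in needs_x for a, b in opposite_pairs):
        rotation = 6  # opposite patches have to be rotated one after the other
    elif needs_x:
        rotation = 3  # patch rotations on non-opposing qubits
    else:
        rotation = 0
    return 1 + clifford + rotation  # base cost of 1 for the final measurement


worst_case = {1: "Y", 3: "Y", 4: "Z", 5: "Y", 6: "Y"}
print(compact_block_cost(worst_case))
```

For the worst-case word from the figure, the model reproduces the 1 + 2 + 6 = 9 code cycles quoted above, while a pure Z word costs only the base measurement.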

Intermediate data blocks

The intermediate data block design gets rid of the problem of potentially having blocking X measurements on opposite qubit patches by simply removing the second row and laying out all qubits in a linear fashion.

/_images/intermediate_block.png

Intermediate data block design. Attribution see **

As such, this architecture occupies 2n+4 tiles. One can get additional savings by having the auxiliary qubit region be flexibly the lower or upper row. This way, one can save on the extra cost of rotating patches back to their original position.

/_images/intermediate_worst_case.png

Performing a ZXZZX measurement by performing patch rotations for the appropriate X measurements and moving all qubits down into the auxiliary region to save time. Attribution see **

Overall we get a maximum of 2🕒 for the rotations. Adding the base cost of 1🕒 for the measurement and the maximum 2🕒 for the additional Clifford π/4 Z rotations as in the compact block design, we obtain a maximum cost of 5🕒.

Fast data blocks

In order to be able to access Y operations directly, we need both Z and X edges exposed to the auxiliary qubit region, demanding 2 tiles for 1 qubit. We omitted this in the rule description before as it is only relevant for the fast data block, but we can also realize 2 qubits on a single patch using 2 tiles:

/_images/2q_patch.png

Two qubits can be realized by a patch on two tiles. The patch now has 6 distinct edges, corresponding to the operators as indicated in the figure. Attribution see **

With this extra trick up our sleeve, we can construct the fast data block consisting of two-qubit patches with an all-encompassing auxiliary qubit region.

/_images/fast_block.png

Fast data block design. Attribution see **

Here, all 15 distinct Pauli operators are readily available. This is because we have $X_1$, $X_1X_2$, $Z_2$, $Z_1Z_2$ and all products thereof available. For example, we can realize $X_2$ via $X_1(X_1X_2)$ and we have $Y_1 \propto (X_1)(Z_1) = (X_1)(Z_1Z_2)(Z_2)$. With the same logic we can obtain $Y_2$ and $Z_1$. Further, we have operators like $X_1Y_2 \propto (X_1X_2)Z_2$, $Z_1X_2 = X_1(X_1X_2)Z_2(Z_1Z_2)$ and $Y_1X_2 \propto (X_1X_2)(Z_2)(Z_1Z_2)$.

The maximum time cost for performing a non-Clifford Pauli rotation therefore is just 1🕒 on the fast data block.
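That the four edge operators of a two-qubit patch generate all 15 non-trivial two-qubit Pauli operators can be checked by enumeration. The sketch below ignores phases and encodes each Pauli word as bits `(x1, z1, x2, z2)`, so that multiplying operators becomes a bitwise XOR. The encoding and the names `GENERATORS`, `label` and `reachable` are illustrative choices.

```python
from itertools import combinations

# Edge operators of the two-qubit patch, encoded as (x1, z1, x2, z2) bits.
GENERATORS = {
    "X1": (1, 0, 0, 0),
    "X1X2": (1, 0, 1, 0),
    "Z2": (0, 0, 0, 1),
    "Z1Z2": (0, 1, 0, 1),
}


def label(bits):
    """Translate (x1, z1, x2, z2) bits into a two-letter Pauli word such as 'YI'."""
    names = {(0, 0): "I", (1, 0): "X", (0, 1): "Z", (1, 1): "Y"}
    return names[bits[0:2]] + names[bits[2:4]]


def reachable():
    """Map every Pauli word obtainable as a product of generators to the generators used."""
    found = {}
    gens = list(GENERATORS.items())
    for r in range(1, len(gens) + 1):
        for subset in combinations(gens, r):
            bits = (0, 0, 0, 0)
            for _, g in subset:
                bits = tuple(u ^ v for u, v in zip(bits, g))
            found[label(bits)] = [name for name, _ in subset]
    return found


paulis = reachable()
print(len(paulis))
print(paulis["YI"])  # generators whose product is proportional to Y on qubit 1
```

Because the four generators are independent, every non-empty subset yields a different Pauli word, giving exactly $2^4 - 1 = 15$ operators.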

Distillation blocks design

So far we have only been concerned with data blocks that perform Pauli product measurements and assumed magic states to be available for consumption. These magic states need to be distilled in separate blocks, which can in principle be of the same design as data blocks. But since the blocks are used for a fixed protocol, this knowledge can be used for simplifications.

There are different approaches to performing magic state distillation. We consider the case where we can prepare a magic state with infidelity $p$. The distillation protocol is then such that this infidelity is decreased to an acceptable level. All other operations of the protocol are Clifford, so we can measure whether an error has occurred. This then determines the success probability of the protocol, which in the case below is roughly $(1-p)^n$ for an $n$-qubit protocol. We are going to go through the simplest protocol in a 15-to-1 distillation block.

15-to-1 distillation

This protocol uses 15 imperfect magic states with infidelity $p$ and outputs a single magic state with infidelity $35p^3$. The distillation circuit is shown below, with the details described in section 3.1 in [1]:

/_images/15-to-1.png

15-to-1 distillation protocol. Each $\frac{\pi}{8}$ rotation involves a magic state injection with an error-prone magic state. In total, we have 4+11 magic states, each with infidelity $p$, and output a magic state $|m\rangle$ on the fifth qubit with infidelity $35p^3$. Attribution see **

Because all operations in the protocol are Z measurements, we can use the compact data block design to perform the distillation. Another trick the author of [1] proposes is to use the auto-corrected magic state injection protocol below, which avoids the additional Clifford $\frac{\pi}{4}$ Pauli rotation (note that the $\frac{\pi}{2}$ Pauli rotation is just a sign flip that can be tracked classically).

/_images/auto-corrected-non-clifford.png

The auto-corrected magic state injection protocol avoids the additional Clifford $\frac{\pi}{4}$ Pauli rotation from above at the cost of having an additional qubit that is measured. However, note that the first two measurements commute and can be performed simultaneously. Attribution see **

Using this injection protocol to perform the non-Clifford $\frac{\pi}{8}$ rotations with the error-prone magic states, the 15-to-1 protocol on a compact data block is performed in the following way:

/_images/15-to-1-protocol.png

The 15-to-1 protocol executed on a compact data block using the auto-corrected magic state injection subroutine in each of the repeating steps. Note that both the $P \otimes Z_m$ and $Z_m \otimes Y_{|0\rangle}$ measurements are performed simultaneously. If all X measurements on qubits 1-4 in step 23 yield a +1 result, a magic state is successfully prepared on qubit 5. The probability of success is roughly $(1-p)^n$. Attribution see **

The 15-to-1 distillation protocol produces a magic state in 11🕒 on 11 tiles.

Quantum computer designs

The 15-to-1 distillation protocol is the simplest of a variety of protocols, each with different characteristics. The best choice of distillation protocol heavily depends on the error probabilities of the quantum computer in use, as well as the overall tolerance for errors we allow to still occur. For example, assume we tolerate a T infidelity of $10^{-10}$ and have $p=10^{-4}$, then the 15-to-1 protocol would suffice as it yields an infidelity of $35p^3 = 3.5 \times 10^{-11} < 10^{-10}$.
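The numbers in this example are easy to reproduce. The following sketch evaluates the output infidelity $35p^3$ and the approximate success probability $(1-p)^n$ quoted in the text; the function names and the variable `tolerated` are introduced here for illustration.

```python
def output_infidelity(p):
    """Infidelity of the distilled magic state in the 15-to-1 protocol."""
    return 35 * p**3


def success_probability(p, n=15):
    """Approximate probability that the protocol detects no error."""
    return (1 - p) ** n


p = 1e-4           # infidelity of the injected magic states
tolerated = 1e-10  # tolerated T-gate infidelity

print(output_infidelity(p))
print(output_infidelity(p) < tolerated)
print(success_probability(p))
```

With these inputs a single round of 15-to-1 distillation stays below the tolerated infidelity, and the protocol succeeds in all but roughly $15p$ of the attempts.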

Another consideration is to combine data and distillation blocks that match in their maximum time requirements. Since the 15-to-1 distillation above takes 11🕒 to produce a magic state, there is no point in using the fast or intermediate data blocks, and we can just resort to the compact one.

A minimal setup can be seen below. It consists of 100 logical qubits on 153 tiles in a compact block, as well as a 15-to-1 distillation block using another 11 tiles.

/_images/minimal-setup.png

Minimal setup with 100 logical qubits on 153 tiles and 11 extra tiles for a compact distillation block. Attribution see **

For a code distance of $d=13$ we would require $164 \cdot 2d^2 \approx 55\text{k}$ physical qubits. An example computation with $10^8$ T gates at a code cycle of $1\,\mu\text{s}$ would finish in $d \cdot 11$🕒 $\cdot\, 10^8 \approx 4$ h.

In this setup, a magic state is produced every 11🕒 and takes at most 9🕒 for consumption. The bottleneck is the magic state distillation, and overall this setup takes 11🕒 per non-Clifford gate. The most straightforward way to speed this up is by adding magic state distillation blocks. Adding just one other distillation block halves the magic state production time to 5.5🕒. Now it makes sense to use the intermediate data block design, which takes at most 5🕒 for magic state consumption:

/_images/intermediate_setup.png

Intermediate setup consisting of the intermediate data block and two 15-to-1 distillation blocks on each end. Attribution see **

In this case we require 222 tiles, so $222 \cdot 2d^2 \approx 75\text{k}$ physical qubits, and the same computation mentioned before would finish in half the time, after about 2 h.
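The resource estimates for both setups follow from the same two formulas: $2d^2$ physical qubits per tile, and a runtime of $d$ times the number of 🕒 per T gate times the number of T gates times the code cycle duration. The sketch below reproduces the quoted figures; the function names and the `setups` dictionary are illustrative, and the tile counts and per-gate costs are taken directly from the text.

```python
def physical_qubits(tiles, d):
    """Each tile holds 2 * d**2 physical qubits for code distance d."""
    return tiles * 2 * d**2


def runtime_hours(n_t_gates, cycles_per_t, d, cycle_time_s=1e-6):
    """Runtime following the formula in the text: d * cost per T gate * number of T gates."""
    return n_t_gates * cycles_per_t * d * cycle_time_s / 3600


d = 13
n_t_gates = 1e8
setups = {
    # tiles and cost per T gate as quoted in the text
    "minimal": {"tiles": 164, "cycles_per_t": 11},
    "intermediate": {"tiles": 222, "cycles_per_t": 5.5},
}

for name, s in setups.items():
    qubits = physical_qubits(s["tiles"], d)
    hours = runtime_hours(n_t_gates, s["cycles_per_t"], d)
    print(f"{name}: {qubits} physical qubits, {hours:.2f} h")
```

The cost per T gate is set by the slower of magic state production and consumption, which is why doubling the distillation blocks only pays off together with the faster intermediate data block.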

Conclusion

We’ve been introduced to a high-level description of quantum computing that allows us to reason about space-time trade-offs in FTQC architecture designs. We have seen some basic prototypes that allow computations involving $10^8$ T gates on the order of hours using 55k or 75k physical qubits. With this knowledge, we should be able to follow the more involved tricks discussed in sections 4 and 5 in [1], which we have not covered in this demo.

References

[1] Daniel Litinski, “A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery”, arXiv:1808.02892, 2018.

Attributions

**: Images from Game of Surface Codes by Daniel Litinski, CC BY 4.0

About the author


Korbinian Kottmann


Quantum simulation & open source software
