Photon-counting x-ray detectors for CT

The introduction of photon-counting detectors is expected to be the next major breakthrough in clinical x-ray computed tomography (CT). During the last decade, there has been considerable research activity in the field of photon-counting CT, in terms of both hardware development and theoretical understanding of the factors affecting image quality. In this article, we review the recent progress in this field with the intent of highlighting the relationship between detector design considerations and the resulting image quality. We discuss detector design choices such as converter material, pixel size, and readout electronics design, and then elucidate their impact on detector performance in terms of dose efficiency, spatial resolution, and energy resolution. Furthermore, we give an overview of data processing, reconstruction methods and metrics of imaging performance; outline clinical applications; and discuss potential future developments.


A brief history of photon-counting detectors
To count photons is the most intuitive approach for detecting x-rays, and if it were not for technical challenges, photon-counting detectors (PCDs) would have been standard from the beginning of radiology. Early on, gas detectors were commonplace, and Geiger-Müller devices counted individual interactions of ionizing radiation. The Nobel-prize-winning multi-wire proportional chamber (Charpak 1997) combined photon counting with spatial resolution. Its development was driven by fundamental physics research, but photon-counting gas detectors were briefly used for imaging in a Paris hospital (Dubousset et al 2007). In nuclear imaging, photon counting was used from the very beginning, in the Anger camera as well as in the first PET system. In this case, a scintillator, typically NaI or CsI, converted the incident gamma rays into visible light that was detected by photosensitive devices. For x-ray imaging, the challenges for photon counting are much greater than for nuclear imaging. The average energy of the photons is only around 70 keV, compared to 140 keV for SPECT and 511 keV for PET. Moreover, the x-ray fluence rate for a computed tomography (CT) scan can be up to 10⁹ mm⁻² s⁻¹, while for nuclear imaging it is as low as 100 mm⁻² s⁻¹, placing much more stringent demands on fast pulse processing for x-ray imaging.
The first photon-counting imaging system approved by the U.S. Food and Drug Administration was the Sectra MicroDose Mammography (Åslund et al 2007) in 2011, with around one thousand installations worldwide for breast-cancer screening and diagnosis. The first full-field photon-counting CT prototype was evaluated in the clinic in 2007 based on a CdZnTe detector, and though limited in the count rate it could accommodate, it still produced material-specific images of high quality (Benjaminov et al 2008). There are currently at least four photon-counting CT systems under evaluation. Three of these, including one mobile head CT system, utilize cadmium-based sensors (Yu et al 2016c, Si-Mohamed et al 2017a, Han-soo 2017), and one uses silicon-based sensors (da Silva et al 2019). In particular, one cadmium-based system has generated a large number of publications (Yu et al 2016a, 2016c, Symons et al 2018a, 2018b). Furthermore, a CdTe-based photon-counting system limited to breast CT (Kalender et al 2017) has been evaluated on patients. We are now at a crossroads to emerging clinical systems.

Figure 1. An image of an excised human heart with a calcified coronary artery containing iodinated gel. The leftmost image is a 67 keV virtual monoenergetic image. Two-material basis decomposition has been performed with the basis pairs: calcium (water), iodine (water), calcium (iodine), and iodine (calcium). The image set was acquired using a state-of-the-art dual-energy CT scanner. The slice thickness was 0.625 mm, the focal spot was 1.2 mm, and the x-ray tube was operated at 120 kVp and 384 mAs. The images were acquired at the Karolinska Hospital, Stockholm, Sweden. Adapted from Grönberg et al (2020) (© 2020 Springer Nature Switzerland AG. Part of Springer Nature). Adapted with permission of Springer. CC BY 4.0.

Figure 2.
An image of an excised human heart imaged with a prototype silicon-based spectral PCD. The leftmost image is a 67 keV virtual monoenergetic image. Three-material basis decomposition was performed with water, calcium, and iodine as bases. An overlay image was formed in which the regions of calcium and iodine are colored red and green, respectively. The image was acquired at 120 kVp using a 0.4 mm focal spot, and reconstructed with a 0.625 mm slice thickness. The dose was matched to that of the dual-energy image in figure 1. Adapted from Grönberg et al (2020) (© 2020 Springer Nature Switzerland AG. Part of Springer Nature). Adapted with permission of Springer. CC BY 4.0.

Section 2 describes the benefits of PCDs over energy-integrating detectors (EIDs). Section 3 outlines the different design choices that must be made when developing a PCD. Section 4 discusses how image quality is affected by the detector properties. Section 5 describes the steps needed to generate images from the measured data. Section 6 describes different metrics of imaging performance that can be used to guide the system design. Finally, in section 7, we discuss what these insights mean for the future of photon-counting CT.

Weighting of photons
In the photon-detection process and/or image formation, different weight can be given to photons of different energy, affecting both the contrast and the noise of the signal.
The contrast between two projection measurements depends on the energy of the transmitted x-ray photons. Generally, low-energy photons carry more contrast information than high-energy photons. Moreover, if the two projection measurements differ in material composition, the dependence of contrast on photon energy is even stronger, and the contrast can be increased by giving more weight to the photons that carry more contrast information. Weighting some photons more than others, however, increases the variance relative to the mean value, thus reducing the signal-to-noise ratio (SNR). There is a fundamental trade-off between the increase in contrast and the reduction of the SNR, and there is an optimal set of photon weights that maximizes the contrast-to-noise ratio (CNR) for a given imaging task (Schmidt 2009). The weighting of photons is often performed indirectly via the material-decomposition process (see section 5.2), where it can be seen as the utilization of the contrast and noise of the signal to obtain a material-thickness estimate with minimum variance.
An EID weights each detected photon by its energy, thus giving additional weight to high-energy photons, for which the contrast is lower. In addition, the non-uniform weighting of photons reduces the SNR. The reduction of the SNR² of an EID due to the weighting of the photons is commonly referred to as the Swank factor (Swank 1973), and it is given by

A_S = M₁² / (M₀ M₂),    (1)

where M_n = ∫ εⁿ S(ε) dε is the n-th moment of S(ε), ε is the signal amplitude, and S(ε) is the normalized distribution of signal amplitudes registered by the detector. The magnitude of the Swank factor depends on the distribution of signal amplitudes: the more the amplitudes are spread over different energies, the lower the Swank factor. The distribution of signal amplitudes in turn depends on the spectrum from the x-ray tube, the attenuation of the imaged object, and the response of the detector. An ideal pure PCD intrinsically weights all photons equally (one photon, one count). Counting photons therefore gives relatively more weight to low-energy photons than energy integration does, resulting in higher contrast, in particular for weakly absorbing materials. PCDs also avoid the SNR loss described by the Swank factor. In addition, spectrally resolving PCDs allow giving different weight to photons of different energy as part of the signal processing, and the weighting can, for example, be tailored to maximize the CNR.
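As a numerical illustration, the Swank factor can be evaluated directly from its moment definition; the two amplitude distributions below are illustrative toy inputs, not measured detector spectra:

```python
import numpy as np

def swank_factor(energies, s):
    """Swank factor A_S = M1^2 / (M0 * M2), where M_n is the n-th
    moment of the signal-amplitude distribution S(eps)."""
    m0 = np.sum(s)
    m1 = np.sum(energies * s)
    m2 = np.sum(energies**2 * s)
    return m1**2 / (m0 * m2)

# A monoenergetic (delta-like) amplitude distribution gives A_S = 1,
# i.e. no SNR loss from the weighting.
print(swank_factor(np.array([70.0]), np.array([1.0])))  # 1.0

# A broad distribution of signal amplitudes lowers the Swank factor.
e = np.linspace(20, 120, 101)
s = np.ones_like(e)  # flat amplitude spectrum (purely illustrative)
print(swank_factor(e, s))
```

The broader the amplitude distribution, the further the result falls below one, in line with the discussion above.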
As an illustration of the effect of photon weighting, consider the task of separating two projection measurements with a typical 120 kVp x-ray tube spectrum, for which the x-rays pass through (a) 10 cm water and (b) 9 cm water plus 1 cm of 10 mg ml⁻¹ iodine-water solution, respectively. In this example, we simulate an ideal PCD that detects all photons and registers the counts in two energy bins, s_low and s_high, separated by a threshold at 50 keV. The results are compared to a simulation of an ideal EID that integrates the energy of all transmitted photons for the same imaging task. In order to optimize the CNR for the PCD, a weighted sum of the two energy bins is formed:

s(x) = x s_low + (1 − x) s_high,    (2)

where 0 ⩽ x ⩽ 1 and x = 0.5 corresponds to the PCD without energy weighting. As figures-of-merit we evaluate the following: the contrast between the two projections, defined as C = (s_A − s_B)/s_A; the signal-to-noise ratio, defined as SNR = s_A/σ(s_A), where σ denotes the standard deviation; and finally, the contrast-to-noise ratio, defined as CNR = C × SNR = (s_A − s_B)/σ(s_A).
The contrast, SNR, and CNR for the PCD are plotted versus x in figure 3, and the performance of the EID is included for reference. The SNR of the PCD reaches its maximum at x = 0.5, i.e. equal weight to both energy bins. The contrast, on the other hand, increases monotonically as more weight is given to the low-energy bin (higher x). The CNR, which is the product of the contrast and the SNR, reaches a maximum of 1.17 for x = 0.76. The corresponding relative CNR², which is proportional to x-ray dose, is 1.37. In other words, for this imaging case it pays to sacrifice SNR in order to gain contrast, and thus achieve a superior CNR.
For the simulated ideal EID (gray lines in figure 3), on the other hand, the suboptimal energy weighting results in a relative contrast of 0.82 (compared to the PCD without energy weighting) and a relative SNR of 0.96 (in accordance with the Swank factor). In total, the suboptimal energy weighting of the EID results in a relative CNR of 0.79, or, equivalently, a relative CNR² of 0.63, which is to be compared with 1.37 for the PCD with optimal energy weights. In other words, the ideal spectral PCD with optimal energy weights is, for this particular imaging task, 2.17 times more dose efficient than the ideal EID. It can be noted that the performance of the EID is similar to that of the spectral PCD with x = 0.27.

Figure 3. An example of the projection-domain contrast, SNR, and CNR for the task of separating 10 cm water from 9 cm water plus 1 cm of 10 mg ml⁻¹ iodine-water solution. The PCD is ideal with two energy bins for which energy weighting has been performed in accordance with equation (2). The performance is compared to that of an ideal EID for the same imaging task. Each figure-of-merit has been normalized to the performance of a PCD without energy weighting (x = 0.5).
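The contrast/SNR trade-off described above can be sketched with a toy two-bin Poisson model. The mean bin counts below are made-up placeholder numbers, not the simulated spectra of this example; they merely encode that iodine mainly suppresses the low-energy bin:

```python
import numpy as np

# Hypothetical mean counts per energy bin for measurement A (water only)
# and measurement B (water plus iodine); illustrative values only.
lam_A = {"low": 1000.0, "high": 800.0}
lam_B = {"low": 850.0, "high": 760.0}

def figures_of_merit(x):
    """Contrast, SNR and CNR of the weighted sum s = x*s_low + (1-x)*s_high,
    assuming independent Poisson counts in the two bins."""
    sA = x * lam_A["low"] + (1 - x) * lam_A["high"]
    sB = x * lam_B["low"] + (1 - x) * lam_B["high"]
    sigma_A = np.sqrt(x**2 * lam_A["low"] + (1 - x) ** 2 * lam_A["high"])
    contrast = (sA - sB) / sA
    snr = sA / sigma_A
    return contrast, snr, contrast * snr

# Grid search for the CNR-maximizing weight.
xs = np.linspace(0.0, 1.0, 101)
cnrs = [figures_of_merit(x)[2] for x in xs]
x_opt = xs[int(np.argmax(cnrs))]
print(f"optimal weight x = {x_opt:.2f}, CNR gain vs x = 0.5: "
      f"{max(cnrs) / figures_of_merit(0.5)[2]:.3f}")
```

As in the text, the optimum lands above x = 0.5: giving extra weight to the contrast-carrying low-energy bin costs some SNR but yields a net CNR gain.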

Material-specific imaging
Another benefit of energy-resolving PCDs is that they allow measuring the composition of the imaged object through a process known as material decomposition. This method is based on the fact that the linear attenuation coefficient µ(E) of any material in the human body can be well approximated by a linear combination of a small number of basis functions f_j(E): µ(E) = Σ_{j=1}^{N_m} a_j f_j(E), where the coefficients a_j are referred to as basis coefficients (Alvarez and Macovski 1976). Since there are two dominant physical processes contributing to x-ray attenuation in the diagnostic CT energy range, namely photoelectric absorption and Compton scattering, N_m = 2 basis functions are commonly assumed to be sufficient for substances containing only light elements, such as human tissues. In addition to these two, one additional basis function must be included for each heavy element whose attenuation coefficient contains a K-edge discontinuity within the diagnostic energy range (Roessl and Proksa 2007, Schlomka et al 2008).
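As a minimal sketch of a two-basis decomposition, assuming illustrative (uncalibrated) basis functions — a photoelectric-like E⁻³ term and the Klein-Nishina energy dependence for Compton scattering — the basis coefficients can be recovered from noiseless attenuation samples by least squares:

```python
import numpy as np

def f_photo(E):
    """Photoelectric-like basis function (illustrative E^-3 dependence)."""
    return (E / 100.0) ** -3

def f_compton(E):
    """Klein-Nishina total cross section (dimensionless part) as the
    Compton basis function; E in keV."""
    a = E / 511.0  # photon energy in units of the electron rest energy
    return ((1 + a) / a**2 * (2 * (1 + a) / (1 + 2 * a) - np.log(1 + 2 * a) / a)
            + np.log(1 + 2 * a) / (2 * a) - (1 + 3 * a) / (1 + 2 * a) ** 2)

E = np.linspace(30, 120, 10)                 # sample energies in keV
A = np.column_stack([f_photo(E), f_compton(E)])

a_true = np.array([0.8, 1.5])                # made-up basis coefficients
mu = A @ a_true                              # noiseless attenuation samples

a_est, *_ = np.linalg.lstsq(A, mu, rcond=None)
print(a_est)  # recovers a_true in this noiseless example
```

With noisy measurements the same least-squares step becomes a statistical estimation problem, which is the subject of section 5.2.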
This low dimensionality means that it is possible to completely characterize the energy dependence of the linear attenuation coefficient at every point in the imaged volume with a small number of energy bins: at least two bins are needed for non-enhanced imaging, and three bins with a contrast agent. Compared to the dual-energy systems in clinical use today, the capability to measure the amounts of three or more basis materials is one of the benefits of PCDs, along with other advantages such as the reduction of spectral overlap, the absence of spatial mismatch between the energy-bin images, and the ability to obtain spectral information also in the peripheral parts of the image as opposed to dual-source systems, where one detector has a limited field of view due to geometric constraints.
From the energy-resolved measured data, the material composition at each point in the imaged volume can be inferred through material decomposition. This process, which can take place before, during, or after the image reconstruction, (see section 5.2) results in a set of reconstructed basis images. These basis images show the distribution within the imaged object of each of the selected basis materials, e.g. water, calcium, and iodine. Once such a set of basis images has been generated, it is straightforward to obtain distribution maps of other substances (e.g. muscle and fat) through a simple linear transformation. Another possibility is to use the basis material maps to generate images of the x-ray attenuation coefficient at each energy (i.e. virtual monoenergetic images; see Leng et al (2017)), which are free of beam-hardening artifacts (see section 5.4).
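A virtual monoenergetic image is simply a per-voxel linear combination of the basis images. In the sketch below, the basis-coefficient maps and the 67 keV attenuation values are illustrative placeholders; a real implementation would take the attenuation values from tabulated data (e.g. NIST):

```python
import numpy as np

# Hypothetical 2x2 basis-coefficient maps (volume fractions) for water
# and iodine, e.g. from a two-material decomposition.
a_water  = np.array([[1.0, 1.0], [0.9, 1.0]])
a_iodine = np.array([[0.0, 0.0], [0.1, 0.0]])

# Illustrative linear attenuation values at 67 keV (1/cm); placeholders,
# not tabulated values.
mu_water_67 = 0.19
mu_iodine_67 = 8.5

# Virtual monoenergetic image at 67 keV: per-voxel linear combination.
vmi_67 = a_water * mu_water_67 + a_iodine * mu_iodine_67
print(vmi_67)  # the iodine-containing voxel stands out
```

The same linear-transformation step yields maps of other substances (e.g. muscle and fat) by substituting their basis representations for the attenuation values.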
A further possibility is to form a linear combination of the basis images to generate an image that is optimal for maximizing detectability for a specific imaging task. It turns out that this methodology gives a detectability for an optimal linear observer that is equal to the detectability obtained from optimal weighting of the original energy-bin images, provided that photon statistics are sufficient (Alvarez 2010, Persson et al 2018b). In other words, as long as the detector is operating far enough above the fluence level that causes photon starvation, the material-decomposition process preserves the information available in the measured data. Since material decomposition also has the theoretical ability to remove beam hardening completely, displaying weighted basis images makes it possible to avoid the trade-off between beam-hardening artifacts and image detectability that is observed when generating weighted sums of the original bin images (Shikhaliev 2005).
Since a larger number of estimated basis components makes the material decomposition process more ill-conditioned, the benefit of adding each new basis material diminishes with an increasing number of basis functions used in the decomposition. At the same time, using a detector with a large number of energy bins makes it possible to choose a number of basis functions that is optimal for a given task. By using four or more energy bins, it is thus possible, in principle, to quantify several contrast agents independently.
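The growing ill-conditioning can be illustrated by comparing the condition numbers of two- and three-basis systems built from hypothetical energy dependences; adding a third basis whose shape resembles an existing one inflates the condition number sharply:

```python
import numpy as np

E = np.linspace(30, 120, 10)  # sample energies in keV

# Illustrative (not calibrated) energy dependences for three bases.
f1 = (E / 100.0) ** -3        # photoelectric-like
f2 = np.ones_like(E)          # roughly flat, Compton-like
f3 = (E / 100.0) ** -2.5      # a third basis, similar in shape to f1

A2 = np.column_stack([f1, f2])
A3 = np.column_stack([f1, f2, f3])

# Larger condition number -> more noise amplification in the decomposition.
print(np.linalg.cond(A2), np.linalg.cond(A3))
```

The condition number quantifies how measurement noise is amplified into the estimated basis coefficients, which is why each added basis material comes at a noise cost.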
Furthermore, it is still an open research question whether the two-basis approximation is exact down to the precision that photon-counting CT scanners are able to measure. Multiple studies indicate that the two-material approximation is insufficient for performing high-precision measurements of the attenuation coefficient (Williamson et al 2006, Bornefalk 2012b, Alvarez 2013). This suggests that it should be possible to separate more than two components (e.g. water, calcium and iron) from each other when imaging unenhanced human tissue. Measuring the concentration of three materials can be desirable when imaging, for example, vulnerable plaques (Wang et al 2011b) or liver iron overload (Luo et al 2015).
However, even if three-material decomposition of unenhanced tissue turns out to be feasible, it is likely that the high sensitivity to noise in three-basis decomposition will limit its application to large-area tasks (Alvarez 2013), unless prior assumptions about mass or volume preservation are included (section 5.2). The two-basis approximation is also inaccurate at sharp edges in the object, which get a unique spectral signature due to the non-linear partial volume effect (Glover and Pelc 1980), a phenomenon that can be used to obtain subpixel spatial information from spectral measurements (Persson et al 2018a).

Spatial resolution
One of the main advantages of the PCD is the improved spatial resolution compared to conventional detectors. PCDs that have been designed for full-body clinical CT generally have pixel sizes ranging from 0.225 to 0.5 mm at the detector; this is smaller than the pixel size of conventional EIDs, which is generally on the order of 1 mm (Leng et al 2016, Shefer et al 2013, da Silva et al 2019). Recent developments have, however, improved the spatial resolution of EIDs to approximately 0.5 mm pixel pitch at the detector (Yanagawa et al 2018).
The spatial resolution of EIDs has been held back by several practical limitations. To mitigate pixel crosstalk, the detectors are made up of scintillator crystals that have to be diced into pixels, and on the edges of each pixel there is a reflecting material which keeps the secondary light from escaping the pixel (Shefer et al 2013). The finite thickness of the reflectors creates dead area between the pixels, reducing the geometric efficiency. If the pixels are made smaller, the dead area takes up a larger fraction of the detector area. Semiconductor detectors, on the other hand, do not emit any secondary light and therefore require no reflectors. Instead, the detector consists of a continuous piece of semiconductor material, and these direct-conversion detectors are pixelated by charge-collecting electrodes. The pixels can therefore be made smaller without losing geometric efficiency. Also, due to the nature of the energy-integrating signal formation, there is a small contribution of integrated electronic noise for each detector channel. If the pixels are made smaller, the number of detector channels per detector area increases, as does the total amount of integrated noise. A PCD, on the other hand, can adapt the lowest energy threshold to the noise floor for each channel and avoid counting noise, even for very small pixels.
Several approaches have been suggested for improving the spatial resolution of scintillator detectors, such as using an attenuating grid to obtain a smaller pixel aperture (Flohr et al 2007). However, this comes at the expense of a substantial reduction in dose efficiency. It has also been proposed to use high-resolution flat-panel detectors for CT (Gupta et al 2006). However, there are some remaining issues that have to be addressed in order to make the technology feasible, including the characteristics of the scintillator material (e.g. speed, afterglow, and lag), object scatter corrections due to removal of the anti-scatter grid, and the dynamic range of the detector.
The benefit of the improvement in spatial resolution made possible by PCDs has been demonstrated on multiple occasions (Bartlett et al 2019, Symons et al 2018a, von Spiczak et al 2018). Figure 4 shows an example of the high-resolution performance of a conventional CT system and a silicon-based full-body PCD prototype. Details about the prototype system can be found in da Silva et al (2019). In this example, the PCD can resolve approximately twice as many line pairs per millimeter as the conventional CT system. Most conventional dual-energy CT systems have limited spatial resolution owing to aspects of the system design. For example, if the spectral separation comes from the source, it is common to use a larger focal-spot size to facilitate a sufficient tube output. Spectral PCD systems, on the other hand, will facilitate simultaneous spectral and high-resolution imaging. Figure 5 shows a comparison of the spatial resolution of a state-of-the-art dual-energy system and a silicon-based full-body PCD prototype for imaging the structures of the inner ear.
In addition to the improved spatial resolution in the transaxial and longitudinal direction caused by the smaller pixel size, PCDs also have an advantage over EIDs in terms of the angular spatial resolution. The reason for this is twofold: First, PCDs do not suffer from scintillator afterglow which can lead to lag between consecutive measurements in EIDs if the sampling rate is high. Second, the electronic noise level in EIDs tends to increase for higher sampling rate, so that increasing the angular resolution leads to a penalty in dose efficiency. PCDs, on the other hand, are able to reject the electronic noise as will be discussed in section 2.4 and can therefore avoid this dose penalty. This allows higher sampling rates, provided that enough bandwidth is available to read out the data (Sjölin and Danielsson 2017). Potentially, the faster sampling rates could also be used to detect rapid motion.
With an increased spatial resolution of the detector comes a risk of introducing aliasing artifacts in the reconstructed image. This can be avoided by maintaining sufficient radial and angular sampling rates. In the radial direction, the sampling rate is automatically increased since the sampling interval (pixel center-to-center distance) is reduced. In the angular direction, on the other hand, the sampling rate (number of views per revolution) must be increased in order to avoid visible angular blur and aliasing artifacts.
As a final remark on spatial resolution, we note that the detector pixel size and the angular sampling rate are not the only factors impacting the spatial resolution of the imaging system. In order to fully utilize the increased spatial resolution of the detector, appropriate adjustments must be made to the focal spot size, image reconstruction grid, and reconstruction algorithms. All of these aspects significantly impact the CT imaging chain.

Low-dose imaging
A further application of photon-counting imaging is low-dose imaging. There are two reasons why PCDs are able to provide superior performance for low-dose imaging tasks compared to energy-integrating detectors.
The first reason is the overall improvement in image quality achievable with PCDs, as described in previous sections. This includes CNR improvement through better utilization of the energy information in the detected x-ray beam and improved visualization of small objects. Since the detectability of a feature in the image increases with dose, these potential improvements in image quality at equal dose can be traded for a lower dose by keeping the image detectability fixed and lowering the tube current instead.

Figure 5. An example of the spatial resolution of the structures of the inner ear of a human volunteer comparing a conventional dual-energy CT system (top row) and a silicon-based full-body PCD prototype (bottom row). The images were dose-matched, and both were acquired at 120 kVp and reconstructed at 67 keV monoenergy. The dual-energy images were acquired using a 1.2 mm focal spot, and the PCD images using a 0.6 mm focal spot. The dual-energy images were acquired at the Karolinska Hospital, Stockholm, Sweden. The PCD prototype is described in da Silva et al (2019).
Second, PCDs have a particular advantage for imaging with low dose levels, namely their ability to reduce the impact of electronic noise. PCDs use a threshold to separate real counts from noise, and by setting the threshold high enough above the noise floor, the electronic noise can be rejected. In this case, the only degradation caused by electronic noise is a small broadening of the energy-response function due to random fluctuations in the pulse-height measurements. EIDs, on the other hand, measure the total x-ray energy deposited during a certain time interval, and the electronic noise will be included in the measurement as a random additive term. The magnitude of this term is relatively constant, whereas the quantum noise variance is proportional to the incident fluence rate; therefore the electronic noise can go more or less unnoticed at high doses but becomes prohibitive at low doses or for large patients (Duan et al 2013). On the other hand, the detective quantum efficiency (DQE) (see section 6.3) of a PCD is, theoretically, constant down to the zero-flux limit. The relative advantage of photon-counting CT compared to energy-integrating CT is therefore larger at low dose levels (Yu et al 2016a). PCDs may therefore allow new low-dose imaging protocols at dose levels where electronic noise is prohibitive with current state-of-the-art scanners, with potential applications in pediatric imaging and lung cancer screening (Symons et al 2016).

General functionality of a PCD
PCDs for x-ray CT are so-called direct-conversion detectors, wherein the x-ray photons are converted directly into an electric signal as opposed to first being converted to visible light. PCDs for CT generally consist of semiconductor sensors with an applied bias voltage. An interacting photon creates a cloud of charge carriers, creating a signal which is processed and registered by an application-specific integrated circuit (ASIC). Each detected photon interaction results in a count in an energy bin corresponding to the energy deposited in the interaction. Despite many years of development, PCDs have only recently approached the performance necessary for use in clinical CT scanners. The main challenges are achieving good performance at high count rate, obtaining good spectral fidelity, and manufacturing a full-field detector with sufficiently low density of imperfections at a competitive cost. In the following, we examine the different detector design considerations that need to be accounted for in order to optimize detector performance.

The detector material
There are currently two main converter material candidates: cadmium (zinc) telluride (CdTe or CZT) and silicon (Si). Both candidates have pros and cons, some of which we will mention here, and their properties are summarized in table 1. More comprehensive accounts of the properties of the detector materials and the functionality of the ASICs can be found in the literature (Ballabriga et al 2016).

Detector physics
One of the main differences between CdTe/CZT and Si detectors is the relative x-ray stopping power. CdTe has a high linear attenuation coefficient, requiring only about 1.7 mm to stop 95% of the x-rays in a 120 kVp spectrum filtered by 30 cm of water. Silicon, on the other hand, has a relatively low atomic number and requires roughly 55 mm to stop the same fraction of x-rays under the same conditions. As it is not feasible to make the semiconductor wafers much thicker than a few millimeters, the silicon wafers are mounted edge-on with respect to the incoming x-rays (Bornefalk and Danielsson 2010). This way, the effective depth of the detector is determined not by the thickness of the wafer, but by the length of the wafer, allowing the detector to have as long an absorption length as necessary. See figure 6 for an illustration of the face-on (CdTe/CZT) and edge-on (Si) geometries.
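The quoted depths follow from the exponential attenuation law, d = −ln(1 − p)/µ, for a stopped fraction p. The effective attenuation coefficients below are illustrative values chosen to reproduce the depths quoted in the text, not tabulated data:

```python
import numpy as np

def depth_for_absorption(mu_per_mm, fraction=0.95):
    """Depth d such that 1 - exp(-mu*d) = fraction (monoenergetic model)."""
    return -np.log(1.0 - fraction) / mu_per_mm

# Effective attenuation coefficients (1/mm) for a 120 kVp spectrum filtered
# by 30 cm of water; illustrative placeholders, not tabulated values.
mu_cdte_eff = 1.76
mu_si_eff = 0.054

print(depth_for_absorption(mu_cdte_eff))  # ~1.7 mm of CdTe
print(depth_for_absorption(mu_si_eff))    # ~55 mm of Si
```

The roughly 30-fold difference in required depth is what motivates the edge-on mounting of silicon wafers.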
Another difference between the detector materials is the prevalence of the different types of x-ray interactions that occur in the material. In materials containing heavy elements (high-Z materials), such as CdTe/CZT, the main interaction mechanism is photoelectric absorption; Compton and Rayleigh scattering account for only a few percent of the total absorption. Silicon, on the other hand, has a high probability of Compton interactions, which dominate over the photoelectric effect for photons with energy higher than 48 keV. An example of the distribution of primary interactions in a 50 mm Si detector and a 1.6 mm CZT detector is shown in table 2. The distribution of interactions is computed for a typical 120 kVp x-ray tube spectrum that has been filtered through 30 cm of water (Cranley et al 1997, Hubbell and Seltzer 2004). However, keep in mind that it is not straightforward to assess the signal quality based on the distribution of primary interactions in the detector material. For example, in the event of a Rayleigh scattering, there is a high likelihood that the photon will scatter at a small angle and interact again within the same detector pixel. In the same way, the charge from a photoelectric interaction can be shared between two pixels, distorting the measured energy and resulting in a double count. A Compton-scattered photon deposits part of its energy in the first interaction and continues in a new direction. If a Compton-scattered photon is detected at least once, it adds to the detection efficiency of the detector. However, the spectral information is largely lost. If the Compton-scattered photon is detected more than once, then the signal is degraded due to count multiplicity (see section 4.2).
For high-Z detector materials, there is a high probability of K-fluorescence emission, namely, a photoelectric interaction with a high-Z element (e.g. Cd and Te) results in the emission of K-shell characteristic x-rays. For CZT, approximately 70% of all photoelectric absorptions result in the emission of characteristic x-rays (Krause 1979). The energies of the emitted characteristic x-rays lie between 23 and 27 keV, and the mean free path in the detector material is approximately 120 µm. The emission and reabsorption of characteristic x-rays result in a skewing of the energy spectrum (see section 4.4), a reduction of the spatial resolution (see section 4.3), and count multiplicity, which reduces the quantum efficiency of the detector (see section 4.2).
As illustrated in this section, the detection mechanisms of PCDs are complex. In the end, what matters for the signal quality is (1) the number of photons that are registered at least once (section 4.1), (2) the count multiplicity (section 4.2), (3) the spatial distribution of the counts (section 4.3), and (4) the associated spectral response (section 4.4). To get a full picture of the imaging performance, these effects must be evaluated simultaneously (section 6.3).

Collection of charge carriers
When a photon interacts in the semiconductor detector material, a cloud of charge carriers (electron-hole pairs) is created. The number of charge carriers is proportional to the energy deposited in the interaction. The electrons and holes drift in opposite directions through the detector material under an applied high-voltage bias (typically 150-1000 V, depending on the material and the material thickness), and are collected by the pixel electrodes and the back-side contact, respectively, or vice versa (Fang et al 2018).
The signal in the electrode does not result from the collection of the charge itself, but rather from the movement of the charge in the electric field in the detector bulk (Hamel and Paquet 1996). When the charged particles move in the electric field, an electrical current is induced on the electrode. Therefore, the particles that are collected on the back side also contribute to the signal. How much the movement of particles induces a signal on the electrode depends on the strength of the so-called weighting potential (Xu et al 2011). By using very small pixels, the weighting field is strong only very close to the electrode, and only particles moving there contribute significantly to the signal. This is the so-called small-pixel effect (Barrett and Myers 2003), and it is used to minimize the signal from slowly moving holes in CdTe/CZT.
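This induced-signal picture is captured by the Shockley-Ramo theorem, ΔQ = q[φ_w(x₁) − φ_w(x₀)], where φ_w is the weighting potential. The sketch below uses the planar (large-pixel) weighting potential φ_w(x) = x/d as a toy example; the thickness value is an arbitrary placeholder:

```python
def induced_charge(q, phi_w, x0, x1):
    """Shockley-Ramo theorem: charge induced on an electrode when a carrier
    of charge q moves from position x0 to x1, given weighting potential phi_w."""
    return q * (phi_w(x1) - phi_w(x0))

d = 2.0  # mm, sensor thickness (illustrative)
phi_planar = lambda x: x / d  # planar (large-pixel) weighting potential

# A carrier drifting the full gap induces the full charge...
print(induced_charge(1.0, phi_planar, 0.0, d))      # 1.0
# ...while a carrier trapped halfway induces only half of it.
print(induced_charge(1.0, phi_planar, 0.0, d / 2))  # 0.5
```

In the small-pixel limit, φ_w is instead concentrated near the pixel electrode, so only carriers moving close to it contribute significantly, which is why the slow holes in CdTe/CZT can be made nearly invisible to the signal.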
It is possible to collect either the holes or the electrons at the electrode. For detectors with low hole mobility and a high risk of hole trapping, such as for CdTe/CZT, it is advantageous to collect the electrons. For silicon detectors, the two types of charge carriers have similar mobility and there is a low risk of trapping, and both options are therefore feasible. For CdTe/CZT, it is desirable to drift the holes as short a distance as possible due to the low mobility, and since they are collected at the back side, the back side of the sensor should preferably face the x-rays.

Sensor thickness
The sensor thickness plays a part in determining the average charge collection time and the charge cloud diameter, and therefore the amount of charge sharing. Thicker sensors also degrade spectral response due to increased probability of charge-carrier recombination during the drift through the wafer. For face-on detectors, such as CdTe/CZT, the sensor thickness determines the total absorption efficiency of the detector, and typical thicknesses range from 0.9 to 3 mm (Barber et al 2013, Taguchi and Iwanczyk 2013). For edge-on detectors, the wafer thickness determines the pixel pitch in one direction (the pixel pitch in the other direction is determined by the width of the charge-collecting electrodes; see figure 6). A typical value of sensor thickness for edge-on Si detectors is 0.5 mm (Xu et al 2013a).
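The geometric point, that the edge-on orientation decouples absorption path from wafer thickness, can be illustrated with a rough Beer-Lambert sketch. The linear attenuation coefficients below are order-of-magnitude assumptions near 70 keV, and the 2 mm and 30 mm path lengths are illustrative values, not design data:

```python
import math

# Assumed, order-of-magnitude linear attenuation coefficients at ~70 keV:
MU_CDTE = 19.0   # 1/cm (assumption for illustration)
MU_SI = 0.46     # 1/cm (assumption for illustration)

def absorption_efficiency(mu_per_cm: float, path_length_cm: float) -> float:
    """Fraction of incident photons interacting over the given path length."""
    return 1.0 - math.exp(-mu_per_cm * path_length_cm)

# Face-on CdTe: the sensor thickness (here 2 mm) is the interaction path.
face_on_cdte = absorption_efficiency(MU_CDTE, 0.2)

# Edge-on Si: the interaction path is the sensor depth (here 30 mm), not the
# 0.5 mm wafer thickness, which is what makes a low-Z material viable.
edge_on_si = absorption_efficiency(MU_SI, 3.0)

print(f"face-on CdTe, 2 mm:  {face_on_cdte:.2f}")
print(f"edge-on Si, 30 mm:   {edge_on_si:.2f}")
```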

Charge sharing
Charge sharing affects the quantum efficiency of the detector (since a photon can be lost if the charge is split, so that both pulses are below the lowest energy threshold), the spectral response, and the DQE via count multiplicity.
Factors that affect the prevalence of charge sharing include the distribution of energies deposited in the detector material (higher deposited energies create more charge sharing); the size of the charge cloud when it reaches the electrode (affected by charge mobility, thickness of material, bias voltage, and location of interaction); the pixel size (smaller pixels result in more charge sharing); the number of neighboring pixels (four for face-on, and two for edge-on); and the lowest threshold (a high lowest-energy threshold reduces the rate of double counts). Furthermore, charge sharing can be reduced by using an attenuating anti-scatter grid located over the pixel boundaries (Tkaczyk et al 2009). Many approaches for modeling the effects of crosstalk between detector pixels and energy bins have been investigated (Faby et al 2016).
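The pixel-size dependence can be sketched with a minimal Monte Carlo model: a 1-D Gaussian charge cloud at a uniformly random position in a pixel, with a double count registered whenever the charge spilled into a neighbor exceeds the lowest threshold. All parameter values (pitch, cloud width, threshold fraction) are illustrative assumptions:

```python
import math
import random

random.seed(1)

def shared_fraction(x_um, pitch_um, sigma_um):
    """Fraction of a 1-D Gaussian charge cloud centred at x that lands
    outside the pixel [0, pitch], i.e. spills into the two neighbours."""
    phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return phi((0 - x_um) / sigma_um) + (1 - phi((pitch_um - x_um) / sigma_um))

def double_count_rate(pitch_um, sigma_um, threshold_frac, n=20000):
    """Monte Carlo estimate of the probability that the spilled charge
    alone exceeds the lowest threshold (expressed as a fraction of the
    deposited energy), so that the event is counted in two pixels."""
    hits = 0
    for _ in range(n):
        x = random.uniform(0, pitch_um)
        if shared_fraction(x, pitch_um, sigma_um) > threshold_frac:
            hits += 1
    return hits / n

# Smaller pixels -> larger boundary-to-area ratio -> more double counts.
small = double_count_rate(pitch_um=150, sigma_um=20, threshold_frac=0.2)
large = double_count_rate(pitch_um=500, sigma_um=20, threshold_frac=0.2)
print(small, large)
```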

Electronic noise
The electronic noise on an ASIC channel is affected by many factors, including the capacitance connected to the ASIC channel input, the sensor leakage current, the temperature, and the properties of the ASIC analog channel (Xu et al 2013b). The electronic noise in a PCD affects the energy resolution and the quantum efficiency of the detector by defining the lowest possible threshold setting without counting noise.
Silicon sensors need to be sensitive to low-energy Compton interactions in order to obtain a high detection efficiency, and it is therefore important to be able to set the lowest-energy threshold as low as possible. This is possible only if the electronic noise in the ASIC is low. In comparison, the noise requirements for an ASIC used with a CdTe/CZT sensor are less stringent, since most of the useful events deposit an energy of 25 keV or more.

Depth segmentation
A sensor in the edge-on geometry can divide the pixels into depth segments/strata, effectively dividing the count rate by the number of segments (Liu et al 2016). This capability is important if the count-rate tolerance is an important factor in the total detector performance. In addition, the depth segmentation introduces a redundancy in the pixel channels that improves reliability.

Imperfections in the detector material
One drawback of cadmium telluride is its relatively high density of imperfections, which can act as traps or recombination centers for electrons and holes (Bolotnikov et al 2005). This is detrimental to the detector performance in two ways: First, the build-up of trapped charges in the semiconductor gives rise to an electric field that causes the charge-collection efficiency to degrade or even break down at high photon fluence rates, a phenomenon known as polarization (Siffert et al 1976, Bale and Szeles 2008). Second, the amount of charge that contributes to the measured signal becomes dependent on the interaction location, resulting in degraded energy resolution in the form of tailing in the energy spectrum (Xu et al 2011). This tailing can be reduced by making the detector pixels small relative to the wafer thickness, thereby decreasing the sensitivity of the measured signal to the hole motion, and therefore also hole trapping, through the small-pixel effect (Barrett et al 1995). At the same time, using too-small electrodes can result in loss of energy resolution, in addition to the spectral degradation caused by charge sharing, due to incomplete charge collection in the case when the surface of the material is not a perfect dielectric and there is a slight surface conductivity (Bolotnikov et al 1999).
In contrast, silicon sensors can be manufactured with a very low density of imperfections. Since this material also has a higher charge-carrier mobility compared to CdTe, the mobility-lifetime products for electrons and holes in silicon are approximately 3-4 orders of magnitude larger than those of CdTe (Fang et al 2018), meaning that most charge carriers live long enough to contribute to the signal. This leads to negligible charge tailing compared to CdTe/CZT, apart from the Compton interactions (Xu et al 2013a), and ensures that polarization is much less of a problem.

Other factors
Other important factors related to the detector material are production reliability, stability over time, and production cost. In the end, these factors may be important to the adoption of PCDs in the clinic.

Signal processing
The analog channel
The induced current on the electrode is convolved with the transfer function of the ASIC analog processing, which in general terms consists of a charge-sensitive amplifier and a shaping filter. The main purpose of the analog processing is to produce an output pulse with a pulse height that is proportional to the integral of the induced current, which in turn is proportional to the charge collected on the electrode, and to do so with a high signal-to-noise ratio.
One of the main parameters of the analog channel is the shaping time, which determines the temporal width of the ASIC filter kernel. For high count-rate applications, the shaping time should be kept as short as possible in order to avoid pulse pileup. However, the shaping time needs to be long enough to ensure that the height of the output signal is proportional to the energy. If the shaping time is too short, then the amplitude of the output signal will depend on the length of the input signal (a longer input signal resulting in a lower amplitude and vice versa), and the energy resolution is adversely impacted. A long shaping time also generally leads to a decreased relative noise level, since the input signal is low-pass filtered more heavily by the shaper. Hence there is a trade-off between pileup tolerance, noise level, and energy resolution, which has been addressed by, for example, using dual shapers (Sundberg et al 2018).
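The ballistic deficit described above can be illustrated with an idealized CR-RC shaper. The impulse response h(t) = (t/τ)e^(-t/τ), the rectangular charge-collection current pulse, and the specific time constants are all simplifying assumptions for illustration:

```python
import math

def shaper_peak(pulse_len_ns: float, tau_ns: float, dt: float = 0.2) -> float:
    """Peak response of an idealised CR-RC shaper, impulse response
    h(t) = (t/tau) * exp(-t/tau), to a rectangular unit-charge current
    pulse of the given length. A delta-like input peaks at 1/e ~ 0.368;
    slower charge collection lowers the peak (ballistic deficit)."""
    n = int(5 * (tau_ns + pulse_len_ns) / dt)
    peak = 0.0
    for k in range(n):
        t = k * dt
        # numerically integrate h over the part of the current pulse seen so far
        s = 0.0
        m = int(min(t, pulse_len_ns) / dt)
        for j in range(m):
            u = t - j * dt
            s += (u / tau_ns) * math.exp(-u / tau_ns) * dt
        peak = max(peak, s / pulse_len_ns)
    return peak

# The same 10 ns charge-collection pulse seen by two shaping times:
fast = shaper_peak(pulse_len_ns=10, tau_ns=5)    # short shaper: clear deficit
slow = shaper_peak(pulse_len_ns=10, tau_ns=40)   # long shaper: near-full height
print(fast / math.exp(-1), slow / math.exp(-1))  # fraction of the delta-pulse peak
```

The short shaper recovers only about 86% of the nominal pulse height for this input, while the long shaper recovers nearly all of it, at the cost of a wider pulse and hence more pileup.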

The digital channel
The ASIC channel on a multi-bin PCD has several pulse-height comparators (thresholds) that are used to identify the arrival of a new photon pulse and to estimate the energy of the photon interaction (see figure 7). Each comparator compares the amplitude of the output signal from the analog channel to a programmable reference voltage supplied by a digital-to-analog converter. The comparator returns a one or a zero depending on whether the signal exceeded the reference voltage or not. The photon counts are categorized into a set of counters (energy bins), generally one for each comparator.
The number of energy thresholds varies between different photon-counting ASICs (Taguchi and Iwanczyk 2013). In order to perform a two-basis material decomposition, only two energy thresholds are necessary. However, the performance of the material decomposition depends on the position of the thresholds, and the optimal selection of the thresholds is task dependent. Having more thresholds ensures that a close-to-optimal CNR can be obtained for all imaging tasks in a scan (Shikhaliev 2008, Zheng et al 2020), and that close to the minimum possible noise can be achieved in material-specific images (Alvarez 2011, Faby et al 2015). Also, more thresholds can improve the effectiveness of corrections, such as charge-sharing or pileup correction (see section 3.3.2).
The way the digital part of the ASIC analyzes the output from the comparators is commonly referred to as the counting mode, and the exact implementation differs between ASICs. The counting modes are generally categorized as either paralyzable or non-paralyzable, referring to the behavior of the channel under heavy pulse pileup (section 4.5).
For a non-paralyzable detector, when a photon pulse crosses the lowest threshold, a counter is incremented, and a dead time is initiated during which no other photons are counted. After the dead time, the channel can count again. The dead time is selected such that the signal induced by the longest input pulses is below the lowest energy threshold at the end of the dead time. With an increasing number of detected photons per second on the channel, the output from the non-paralyzable channel approaches a fixed number (the maximum number of dead times that fit within a single readout interval).
In a paralyzable detector, the dead time is extended until the signal drops below the threshold. At high flux rates, pileup can cause the signal level to exceed the threshold level for longer periods, resulting in increased dead time. Paralyzable behavior occurs, for example, if the detector is configured to count the number of times the input signal crosses each threshold, known as the threshold-crossing frequency. In presence of severe pileup, the input signal does not fall below the highest threshold, and the threshold-crossing frequency therefore drops to zero.
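The two behaviors can be reproduced with a small simulation of Poisson arrivals and a fixed dead time τ. At high rates the simulated output approaches the classical rate equations λ/(1 + λτ) (non-paralyzable) and λe^(-λτ) (paralyzable); the rate and dead-time values below are arbitrary example numbers:

```python
import math
import random

random.seed(0)

def simulate(rate, dead_time, total_time, paralyzable):
    """Count Poisson arrivals with a fixed dead time.
    Non-paralyzable: events arriving during the dead time are simply lost.
    Paralyzable: every event, counted or not, restarts the dead time."""
    t, counts, dead_until = 0.0, 0, -1.0
    while True:
        t += random.expovariate(rate)
        if t > total_time:
            break
        if t >= dead_until:
            counts += 1
            dead_until = t + dead_time
        elif paralyzable:
            dead_until = t + dead_time  # a lost event still extends the dead time
    return counts / total_time

lam, tau, T = 5.0, 0.5, 20000.0   # lambda * tau = 2.5: heavy pileup
np_rate = simulate(lam, tau, T, paralyzable=False)
p_rate = simulate(lam, tau, T, paralyzable=True)
print(np_rate, lam / (1 + lam * tau))   # non-paralyzable saturates toward 1/tau
print(p_rate, lam * math.exp(-lam * tau))  # paralyzable output collapses
```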
It is also possible to implement other counting modes, such as detection of local maxima (peak sample and hold) using the comparators, and Hsieh and Pelc (2016) have shown that more sophisticated counting modes can substantially improve the performance of the detector.

Charge summing and anti-coincidence logic
Charge sharing and the emission/reabsorption of characteristic x-rays have a significant negative impact on the imaging performance of the detector (Schlomka et al 2008, Xu et al 2011).
The corrupted signal can be partially restored by means of interpixel communication (Koenig et al 2014, Ji et al 2018). One way of correcting the signal is to use analog charge summing (e.g. the Medipix3RX ASIC) (Ballabriga et al 2007, Koenig et al 2013, Nilsson et al 2007). A circuit on the ASIC sums the charge in overlapping clusters of, for example, 2 × 2 pixels prior to comparison with the energy thresholds, and the photon count is allocated to the pixel that registered the largest collected charge. The analog charge-summing mode obtains an energy resolution corresponding to the increased pixel size, while keeping the spatial resolution defined by the native pixel size. However, the detector reduces its capability to cope with high count rates by a factor equal to the number of pixels for which the charge is summed. Digital anti-coincidence logic has been implemented in, for example, silicon strip detectors developed for mammography (Fredenberg et al 2010b). The scheme identifies double-counting events where the pulses in two neighboring pixels cross over the lowest energy threshold simultaneously, and it keeps the first detected pulse (which generally is larger) and disregards the second pulse. This method improves the noise properties and the spatial resolution by removing double counting, but does not improve the spectral imaging capability since no energy correction is made.
A spectral version of the digital anti-coincidence logic, referred to as digital charge summing, has been evaluated in simulation (Hsieh and Sjolin 2018). For the evaluated imaging cases, and for a detector with only two energy thresholds, the digital charge summing achieved roughly half the benefit (improved dose efficiency) compared to the analog charge summing. Digital charge summing has the potential to be much faster than its analog counterpart, since the digital charge summing needs only a short coincidence window to register if two events are simultaneous; otherwise, the channel operates as normal. The probability of false coincidences (i.e. two events in neighboring pixels randomly occurring within the coincidence window) is therefore low, but not negligible.
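Under a Poisson-arrival assumption, the false-coincidence probability per detected event follows directly from the exponential waiting-time distribution. The rates, window length, and neighbor count below are illustrative assumptions, not measured values:

```python
import math

def false_coincidence_prob(rate_per_pixel_cps, window_ns, n_neighbors):
    """Probability that at least one unrelated event occurs in any of the
    neighbouring pixels within the coincidence window, assuming independent
    Poisson arrivals in each neighbour."""
    return 1.0 - math.exp(-n_neighbors * rate_per_pixel_cps * window_ns * 1e-9)

# Assumed illustrative numbers: 10 ns window, four face-on neighbours.
p_moderate = false_coincidence_prob(1e6, 10, 4)   # 1 Mcps per pixel
p_high = false_coincidence_prob(1e7, 10, 4)       # 10 Mcps per pixel
print(p_moderate, p_high)
```

At moderate per-pixel rates the probability is a few percent, i.e. low but not negligible, and it grows quickly with flux, which is why a short coincidence window matters.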
Further, it has also been suggested that, instead of correcting the charge-sharing events, the coincidences can be registered in so-called coincidence counters, read out from the ASIC and handled in post-processing (Hsieh 2020).

Effects impacting image quality
The effects of x-ray scatter, characteristic fluorescence reabsorption, and charge sharing between pixels can be described by three processes: degradation of the spatial resolution, increase of noise, and degradation of the spectral information.

Quantum efficiency
The key figure of merit for the dose efficiency of an x-ray detector is the quantum efficiency, which describes how many of the incoming photons are detected. For the quantum efficiency, it does not matter if the photon is detected in a photoelectric event or a Compton-scattering event, or if it is affected by K-fluorescence or charge sharing. As long as the photon is detected at least once, it contributes to the detection efficiency.
Naturally, the thickness of the detector material impacts the quantum efficiency. However, thicker detectors can affect other important aspects of the detector performance. For face-on detectors, a thicker material affects the charge collection time, and therefore the energy response, via an increase of charge trapping/recombination and charge sharing. Edge-on detectors do not suffer from the same downside to increasing the depth of the detector, apart from the increase in detector material and the resulting more stringent demands on alignment precision, since the distance the charge travels remains the same.
The position of the lowest threshold can have a large impact on quantum efficiency by determining how many low-amplitude input pulses are detected. Any process of spectral degradation that causes some of the photon pulses to fall below the lowest threshold leads to a reduction of the DQE. For Si, this effect can be caused by charge sharing and Compton scattering. For CdTe/CZT, it can, in addition, be caused by K-fluorescence emission/reabsorption and the trapping/recombination of charge carriers. Photon-induced pulses dropping below the lowest energy threshold as a consequence of charge sharing can lead to regions of insensitivity at the boundaries between pixels (Tlustos et al 2006). The lowest energy threshold also determines the number of double counts to a large extent by inclusion or exclusion of low-energy charge sharing and K-fluorescence reabsorption. For Si, the lowest threshold is often set at around 5-15 keV in order to include many Compton interactions (Bornefalk and Danielsson 2010, Persson et al 2014), and for CdTe/CZT, the lowest energy threshold is often assumed to lie at 20-25 keV (Xu et al 2011, Shikhaliev 2009, Yu et al 2016a). In practice, it may not be optimal to reject electronic noise completely, but instead to operate the detector with the threshold at a level low enough that a small number of false counts are generated by the noise floor, as long as the extra noise is outweighed by the benefit of detecting more primary detection events.
The semiconductor wafer is not necessarily sensitive in the entire bulk of the detector. For CdTe/CZT, there can be a dead layer near the back side of the sensor which is insensitive to x-rays (Matsumoto et al 2000, Moralles et al 2007), and for Si, there is generally an inactive guard ring around the wafer protecting against leakage current.
The dead layers can have a large impact on the performance of the detector, in particular for CdTe/CZT since the detector material is highly attenuating. For example, a 20 µm dead layer on a CdTe/CZT detector reduces the quantum efficiency by approximately 9%, whereas a 200 µm guard ring on a Si detector reduces the quantum efficiency by approximately 1.5% (for a 120 kVp spectrum attenuated by 10 cm of water). A dead layer on CdTe/CZT detectors can also form as a consequence of high x-ray flux (Du et al 2002). Furthermore, there is a dead layer close to the pixels, where an interaction gives rise to charges that only move through part of the weighting field and therefore generate a reduced signal (Boucher 2013).
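The dead-layer loss is a simple Beer-Lambert calculation. In the sketch below, the effective attenuation coefficient is an assumption chosen only to reproduce the ~9% ballpark quoted above for a filtered 120 kVp beam; it is not a tabulated value:

```python
import math

def dead_layer_loss(mu_eff_per_cm: float, dead_layer_um: float) -> float:
    """Fraction of incident photons absorbed in an insensitive entrance
    layer (and hence lost), for an assumed spectrum-averaged effective
    attenuation coefficient."""
    return 1.0 - math.exp(-mu_eff_per_cm * dead_layer_um * 1e-4)

# Assumed effective mu for CdTe under a heavily filtered 120 kVp spectrum;
# ~47/cm is picked here to match the ~9% figure in the text.
loss_cdte = dead_layer_loss(47.0, 20.0)
print(f"20 um CdTe dead layer: {loss_cdte:.1%} of photons lost")
```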

Count multiplicity
As a consequence of scatter, K-fluorescence, or charge sharing, a photon can be counted more than once. Count multiplicity gives extra weight to a fraction of the photons, resulting in a reduction of the zero-frequency DQE and noise correlations between neighboring pixels (Michel et al 2006).
As a consequence of count multiplicity, the zero-frequency DQE is reduced by a factor

$$\frac{\left(\sum_n n\, r_n\right)^2}{\sum_n n^2\, r_n},\qquad (3)$$

where $r_n$ is the fraction of detected photons that are counted $n$ times. Comparing equation (3) to the Swank factor of an EID (1), we can see that the expressions are identical apart from the signal amplitude ($\varepsilon$) in equation (1) being exchanged for the multiplicity ($n$) in equation (3).
Counting photons in more than one pixel results in pixel-to-pixel correlations and a non-white noise power spectrum (NPS). The detector NPS can be obtained by taking the discrete Fourier transform (DFT) of the auto-covariance of the counts measured by the detector array (Cunningham 2000). In the special case for which a fraction $p$ of the $\lambda$ photons that are counted in a pixel are counted also in the neighboring pixels ($p/2$ to the left and $p/2$ to the right), the auto-covariance is given by

$$\operatorname{Cov}(N_i, N_{i+j}) = \begin{cases} \lambda(1+p), & j = 0,\\ \lambda p, & |j| = 1,\\ 0, & |j| \geq 2, \end{cases}\qquad (4)$$

where $N_i$ is the number of counts in the $i$th pixel. To illustrate the effect of the pixel-to-pixel correlations on the noise, consider the NPS normalized by the square of the mean number of counts in each pixel:

$$\mathrm{NPS}_{\mathrm{norm}}(u) = \frac{\mathrm{NPS}(u)}{\bar{N}^2} = \frac{(1+p) + 2p\cos(2\pi u/u_s)}{\lambda(1+p)^2},\qquad (5)$$

where $\bar{N} = \lambda(1+p)$ is the mean number of counts per pixel, $u$ is spatial frequency, and $u_s$ is the sampling frequency. Examples of NPS$_{\mathrm{norm}}$ for different values of $p$ are shown in figure 8 (the curves have been normalized by $1/\lambda$ to obtain unity NPS$_{\mathrm{norm}}$ for $p = 0$, and the spatial frequency axis is normalized by the sampling frequency). The increase of NPS$_{\mathrm{norm}}$ at the zero frequency is inversely proportional to the decrease of the zero-frequency DQE given by (3):

$$\lambda\,\mathrm{NPS}_{\mathrm{norm}}(0) = \frac{1+3p}{(1+p)^2} = \left[\frac{\left(\sum_n n\, r_n\right)^2}{\sum_n n^2\, r_n}\right]^{-1} \quad\text{for } r_1 = 1-p,\ r_2 = p.\qquad (6)$$

An important observation is that the pixel-to-pixel correlations lead to a higher number of events detected in each pixel; each pixel registers $\lambda(1+p)$ photons, as opposed to only $\lambda$. This can also be seen in that the integral of the normalized NPS decreases as $p$ increases. When the pixels are considered together, however, it becomes clear that the quantum efficiency has indeed been reduced, as indicated by the increase of the zero-frequency NPS.
This property of correlated noise must be considered when, for example, evaluating the detector's CNR performance.
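The relationship between the multiplicity distribution, the zero-frequency DQE factor, and the normalized NPS in the special double-counting case described above can be checked numerically. The formulas below follow from that case (each double count lands in one neighboring pixel); λ and p are arbitrary example values:

```python
import math

def dqe_factor(r):
    """Zero-frequency DQE reduction factor (sum_n n*r_n)^2 / (sum_n n^2*r_n)
    for a multiplicity distribution r[n] = fraction of photons counted n times."""
    m1 = sum(n * rn for n, rn in r.items())
    m2 = sum(n * n * rn for n, rn in r.items())
    return m1 * m1 / m2

def nps_norm(u_over_us, lam, p):
    """Normalised NPS for the special case in the text: a fraction p of the
    photons counted in a pixel is also counted in a neighbouring pixel."""
    nps = lam * (1 + p) + 2 * lam * p * math.cos(2 * math.pi * u_over_us)
    return nps / (lam * (1 + p)) ** 2

lam, p = 100.0, 0.3
r = {1: 1 - p, 2: p}          # multiplicity distribution for this case
d = dqe_factor(r)             # equals (1+p)^2 / (1+3p)
zero_freq = nps_norm(0.0, lam, p)
# The zero-frequency NPS rise is the inverse of the DQE drop (times 1/lambda):
print(d, zero_freq, 1.0 / (lam * d))
```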

Spatial resolution
The spatial resolution of the PCD is determined, first and foremost, by the center-to-center distance between the electrodes on the semiconductor surface. For edge-on detectors (e.g. silicon), the thickness of the semiconductor wafer determines the spatial resolution in one of the dimensions. Charge sharing, K-fluorescence reabsorption, and Compton scattering degrade the spatial resolution somewhat by causing events to be detected in positions other than where they first interacted with the detector. This effect leads to a blur of the point-spread function (PSF), and therefore a degradation of the modulation transfer function (MTF). These effects can partially be corrected using anti-coincidence logic or similar methods.
Charge sharing, unless corrected, also limits the smallest feasible pixel size indirectly, due to the increase in count multiplicity and the degradation of the energy response of the detector. If pixels that are too small are used, the detector will lose much of its spectral capabilities and some of its dose efficiency.
The spatial resolution of the detector is not constant over the range of detected energies. High-energy interactions are predominantly registered close to the center of the pixel, since charge sharing occurs at the pixel boundaries, giving them a narrower PSF, whereas low-energy interactions can be registered over the full extent of the pixel (Stierstorfer et al 2019). This implies that the detector MTF has an energy dependence, and the spatial resolution can be enhanced by giving more weight to the high-energy bins at the cost of increased noise in the reconstructed image. In the case that monoenergetic images are formed from material basis maps, the spatial resolution will therefore depend on the choice of monoenergetic energy. In order to have a full characterization of the spatial resolution of the detector, the full energy range should be evaluated.

Figure 9. An example of the benefits of a native high-resolution image acquisition, here when imaging tissue of an excised human heart (same as that shown in figure 2). Images (a) and (b) were acquired with a conventional CT system with a 0.7 mm focal spot, and reconstructed with a soft kernel and a bone kernel, respectively. Image (c) was acquired by a prototype silicon-based PCD with roughly half the native pixel size and a 0.4 mm focal spot, and reconstructed with a high-resolution filter kernel. The images were acquired at comparable dose and slice thickness (0.625 mm). The native high-resolution acquisition of the PCD has less noise in a high-resolution image, which improves the visibility of the low-contrast high-resolution structures within the tissue.

The spatial resolution and noise trade-off
A common misconception when it comes to high-resolution detectors is that having small pixels comes with a noise penalty. Indeed, the noise in each detector pixel will increase if the pixels are smaller, since fewer photons are detected in each pixel. However, high-resolution image acquisition improves the resolution-noise trade-off in the reconstructed image. That is, for the same image noise, the high-resolution system will have better spatial resolution at equal dose, and conversely, for the same spatial resolution, the high-resolution system will have lower noise. The literature predicts noise variance reductions between 14% and 83% from a 2× increase in native resolution (Baek et al 2013, Kachelrieß and Kalender 2005). The highest noise reductions are observed when reconstructing high-resolution images; for low-resolution images (e.g. for soft-tissue imaging) the benefit is reduced. In addition, the high-resolution system can use reconstruction kernels with a higher cut-off frequency than a system with lower native resolution without introducing aliasing artifacts. An example of the benefit of a native high-resolution detector is shown in figure 9, in which the two images to the left were acquired using a conventional CT system and reconstructed with a soft kernel (a) and a bone kernel (b), and the right-most image (c) was acquired using a prototype PCD, with roughly half the pixel size, and reconstructed using a high-resolution kernel. The native high-resolution acquisition of the PCD improves the spatial resolution and noise trade-off, and allows reconstructing high-resolution images with less noise and fewer aliasing artifacts.

Energy response
In contrast to properties like detection efficiency and spatial resolution, which are important for all detectors, the ability to measure the energy distribution of the incident spectrum is unique to energy-resolving detectors. The precision with which each deposited energy is measured is partly determined by the number of energy thresholds (see section 3.3.2). At the same time, the number of thresholds is not the only factor that affects the spectral imaging performance.
The energy-resolving capability of PCDs is also determined by non-ideal effects in the detector materials and the readout electronics that give a non-ideal energy response, i.e. cause the registered signal to deviate from the expected amplitude. In a hypothetical ideal PCD, each photon gives rise to an electrical signal with amplitude proportional to the incident photon energy. In CdTe-based and CZT-based detectors (figure 10(a)), K-fluorescence gives rise to two secondary clusters of peaks in the detected spectrum distinct from the photopeak: fluorescence peaks at the fluorescence energy (23 and 26 keV for Cd, and 27 and 31 keV for Te) (Thompson et al 2009), and K-escape peaks at the original energy minus the fluorescence energy (Xu et al 2011). The response of a silicon-based detector (figure 10(b)), on the other hand, exhibits a large fraction of Compton interactions at low energies and reproduces the photoelectric part of the spectrum with diminished magnitude, but it is otherwise not distorted in the way characteristic of the response of a CdTe-/CZT-based detector (Bornefalk and Danielsson 2010). In both CdTe/CZT and Si detectors, charge sharing causes spectral tailing, i.e. a background of charge-shared counts of relatively constant magnitude (figure 10) (Xu et al 2011). Spectral tailing can also be caused by incomplete charge collection (section 3.2.7).
In addition, electronic noise in the readout electronics causes further spectral degradation by adding a random error to the registered amplitude of each pulse. In practice, this has the effect of convolving the detected spectrum with a Gaussian kernel. Typical photopeak widths reported for detectors designed for high-flux x-ray imaging are 3.5-5.4 keV full width at half maximum (FWHM) for Si (Xu et al 2013a) and 5-10 keV FWHM for CdTe/CZT (Iwanczyk et al 2009, Brambilla et al 2013, Barber et al 2015). Spectral response models incorporating physical effects and electronic properties have been published for silicon (Liu et al 2015a, 2015b) and CdTe/CZT (Schlomka et al 2008, Cammin et al 2014, 2018b).
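The Gaussian-broadening model can be sketched as a discrete convolution of a binned spectrum with a kernel of the stated FWHM. The 60 keV monoenergetic input and the 5 keV FWHM below are example values:

```python
import math

def gaussian_blur_spectrum(spectrum, fwhm_kev, bin_kev=1.0):
    """Convolve a binned spectrum with a Gaussian of the given FWHM,
    modelling electronic-noise broadening of the registered energies.
    The output is rescaled to preserve the total number of counts."""
    sigma = fwhm_kev / (2 * math.sqrt(2 * math.log(2)))
    n = len(spectrum)
    out = [0.0] * n
    for i in range(n):
        for j in range(n):
            d = (i - j) * bin_kev
            out[i] += spectrum[j] * math.exp(-d * d / (2 * sigma * sigma))
    s_in, s_out = sum(spectrum), sum(out)
    return [v * s_in / s_out for v in out]

# A single 60 keV photopeak blurred by a 5 keV FWHM readout channel:
peak = [0.0] * 121
peak[60] = 1000.0
blurred = gaussian_blur_spectrum(peak, fwhm_kev=5.0)
width_bins = sum(1 for v in blurred if v > max(blurred) / 2)
print(width_bins)  # number of 1 keV bins above half maximum
```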
The effect of the spectral distortion is that it degrades the performance of the detector for tasks that have a strong energy-dependent component. While the performance for density-imaging tasks, such as detection of non-enhancing tumors, is only weakly affected by non-ideal energy resolution (Schmidt 2010), imaging tasks involving calcium and iodine are more strongly affected. Good energy-resolving capability is especially important for material quantification tasks, since these cannot be performed at all with a non-energy-resolving CT. The impact of the detector energy response on material quantification performance has been investigated by several authors (Roessl and Herrmann 2009, Wang and Pelc 2012). Apart from the detector energy response, the incident spectrum shape is also an important factor affecting spectral imaging performance, and multiple studies have investigated how performance can be optimized by varying the x-ray tube voltage (kVp) and prepatient filtering, such as with a K-edge filter (Wang and Pelc 2012, Chen et al 2015, Atak and Shikhaliev 2015). To understand how the incident spectrum shape interacts with the detector response function, note that misregistration of photon energies does not degrade spectral imaging performance if it is possible to correct for the spectral distortion and thereby recover the original spectrum. Instead, degradation of imaging performance occurs only when there is overlap between detected energies originating from different incident energies, so that it is impossible to recover the incident spectrum without ambiguity (Persson et al 2018b). Figure 11 illustrates how K-fluorescence can give rise to spectral overlap, depending on the incident energy spectrum distribution. K-fluorescence and charge sharing give rise to spectral tailing that overlaps with incident energies present in the true spectrum, which exacerbates the impact of these effects on spectral imaging performance.
On the other hand, Compton scatter events in a silicon-based detector do not overlap much with the spectrum of photoelectric events. Instead, their detrimental effect on energy-resolving imaging stems from the fact that one energy of a Compton event can correspond to a range of incident energies. The Compton events are therefore less useful than the photoelectric events for determining the incident energy distribution, but they do not confound the spectral information provided by the photoelectric events.

Figure 11. A conceptual illustration of how the effect of spectral distortion depends on the spectrum incident on the detector. A photon with the original incident energy E_in = 60 keV loses 23 keV through the escape of a K-fluorescence photon and deposits E_dep = 37 keV. (a) For a spectrum filtered through 100 mm of water, the deposited energy overlaps with the incident spectrum, and the original energy cannot be recovered after detection. (b) For a spectrum filtered through 500 mm of water, the deposited energy does not overlap with the spectrum, and the original energy can be recovered by adding 23 keV to the measured energy. Note that this is a simplified example, and that a more realistic model should also include spectral overlap due to charge sharing.

Pulse pileup
Photons that interact within a short time period 'pile' onto each other, resulting in count loss, distortion of the Poisson counting process, and degradation of the energy information. In order for a PCD to be feasible for clinical CT, the detector has to be able to cope with very high fluence rates that can exceed 3.5 × 10⁸ mm⁻² s⁻¹ at the detector (Persson et al 2016). Figure 12 shows how the photon fluence rate is distributed on the detector for an imaging protocol that is demanding in terms of high-count-rate capability. The effects of pulse pileup can be mitigated by decreasing the pixel size, but the benefit is limited since small pixels are more prone to charge sharing, which causes double counting and spectral degradation. The mathematical description of pulse processing by an electronic channel is known as a dead-time problem, and these problems have been studied extensively for several decades and have applications in many scientific fields (Müller 1973, Carloni et al 1970, Wielopolski and Gardner 1976). Still, analysis of the effects of pulse pileup in spectral PCDs, and how to account or correct for this effect, is an active topic of investigation today.

Statistical degradation
In the presence of pileup, the output count process is no longer Poisson distributed. A simple, but commonly used, model of the photon-counting process is the so-called ideal non-paralyzable (INP) model (Tenney 1984). The INP model predicts the statistics and spectral distortion under two simplifying conditions: (1) there is a fixed dead time after each photon detection during which all additional events are lost, and (2) the registered energy in the case of pileup is the sum of the energies of the associated photon interactions. The mean and variance of the output count process Y for the INP model are given by

$$\mathrm{E}[Y] = \frac{\lambda T}{1 + \lambda\tau}, \qquad \operatorname{Var}[Y] = \frac{\lambda T}{(1 + \lambda\tau)^3},$$

where $\lambda$ is the input count rate, $\tau$ is the dead time, and $T$ is the measurement time. The mean value and the variance of the output count rate for a non-paralyzable detector with non-zero pulse length, the so-called semi-nonparalyzable (SNP) model (Xu et al 2011), have been derived analytically (Grönberg et al 2018) and are given by

$$\mathrm{E}[Y] = \frac{\lambda T}{\lambda\tau + e^{-\lambda\tau_s}}, \qquad \operatorname{Var}[Y] = \frac{\lambda T\, e^{-\lambda\tau_s}\left(2 - e^{-\lambda\tau_s}\right)}{\left(\lambda\tau + e^{-\lambda\tau_s}\right)^3},$$

where $\lambda$ is the input count rate, $\tau$ is the dead time, and $\tau_s$ is the average time at the end of the dead time during which, if a photon interacts, a new dead time is initiated directly after the first dead time. The length of $\tau_s$ depends on the length and shape of the photon-induced pulses and is often a large fraction of $\tau$ (0.77τ-0.86τ in Liu et al (2016)). Examples of the mean value and the variance predicted by the INP and the SNP models with $\tau_s = 0.8\tau$ as a function of count rate are plotted in figure 13. The input and output count rates have been normalized by the characteristic count rate N₀ = 1/τ. In the ideal case, without pileup and with Poisson statistics, the output count rate is equal to the input count rate, and the variance is equal to the mean value. Pileup introduces a non-linear response to variations in the flux rate, and the statistics start to deviate from Poisson behavior. The effect of pileup on image quality is therefore not straightforward to assess, and looking at, for example, count-rate linearity tells only half the story.
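The INP mean and variance can be checked with a small Monte Carlo simulation of a non-paralyzable counter; λτ = 2 below is an arbitrary heavy-pileup operating point. The sub-Poisson variance (variance well below the mean) is the signature of dead-time losses:

```python
import random

random.seed(42)

def inp_counts(lam, tau, T):
    """Counts registered in one readout interval of length T by an ideal
    non-paralyzable counter: events during the dead time are lost."""
    t, dead_until, c = 0.0, -1.0, 0
    while True:
        t += random.expovariate(lam)
        if t > T:
            return c
        if t >= dead_until:
            c += 1
            dead_until = t + tau

lam, tau, T, n_trials = 20.0, 0.1, 50.0, 2000  # lambda * tau = 2
samples = [inp_counts(lam, tau, T) for _ in range(n_trials)]
mean = sum(samples) / n_trials
var = sum((s - mean) ** 2 for s in samples) / (n_trials - 1)
print(mean, lam * T / (1 + lam * tau))        # INP mean: lam*T/(1+lam*tau)
print(var, lam * T / (1 + lam * tau) ** 3)    # INP variance: strongly sub-Poisson
```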
One way to incorporate both the effect of non-Poisson statistics and the non-linearity of the output count rate is to consider the CNR for the task of differentiating between two similar projection measurements with a small difference δ in photon flux. A flux-dependent dose efficiency can then be defined as the ratio of the CNR² with and without pileup:

DE(λ) = CNR²_pileup(λ) / CNR²_Poisson(λ).    (9)

An example of the dose efficiency as a function of count rate is plotted in figure 14(a), and the CNR² versus the thickness of attenuating material (here water) is shown for three different maximum count rates (N₀, 2N₀, and 3N₀) in figure 14(b). As an example of a maximum input count rate, consider a case for which the maximum x-ray flux on the detector is 3.5 × 10⁸ mm⁻² s⁻¹ (Persson et al 2016), and the detector has 0.25 × 0.25 mm² pixels and a 100 ns dead time; then the maximum input count rate is 2.2N₀ (3.5 × 10⁸ mm⁻² s⁻¹ × 0.25² mm² × 100 × 10⁻⁹ s = 2.2). Even though the dose efficiency is reduced for the higher count rates, the CNR² is still high since pileup mainly affects projections that have high statistics.

Figure 14. (a) The dose efficiency for a low-contrast imaging task as a function of the input count rate (normalized with N₀), comparing the ideal non-paralyzable model and the semi-nonparalyzable model. (b) The CNR² estimated by the semi-nonparalyzable model versus the thickness of water attenuation for three different input count rates of the unattenuated beam (N_max). Even though the input count rate is high when there is little water in the beam path, and the DQE is reduced by pileup, the CNR² is still relatively high since pileup mainly affects high-flux projections, for which the SNR is high.

The effect of the statistical degradation from pileup in the final CT image is limited by several factors. First, the central rays, where the x-ray flux is generally low due to patient attenuation, contribute much more to the image than the peripheral rays due to the nature of tomographic image reconstruction. Second, the x-ray flux at the periphery of the patient is lowered by the bowtie filter of the x-ray tube. Also, pileup increases the noise for high-flux projections, for which the SNR is high (see figure 14(b)). Combined, these effects help mitigate the impact of statistical degradation from pulse pileup.
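A short sketch can make the flux dependence concrete. Under the INP model, propagating the small flux difference δ through the mean and variance expressions reduces the CNR² ratio to 1/(1 + λτ); the monoenergetic beam and the water attenuation coefficient below are illustrative assumptions, not the simulation settings behind figure 14:

```python
import math

# Sketch of the flux-dependent dose efficiency under the ideal
# non-paralyzable (INP) model, for which the CNR^2 ratio with/without
# pileup reduces to 1/(1 + lambda*tau). The attenuation model below
# (monoenergetic beam, mu_water = 0.02 / mm) is an illustrative
# assumption, not the spectrum used for figure 14.

MU_WATER = 0.02  # linear attenuation coefficient of water, 1/mm (assumed)

def dose_efficiency_inp(lam, tau):
    """CNR^2 with pileup divided by CNR^2 without pileup (INP model)."""
    return 1.0 / (1.0 + lam * tau)

def cnr2_inp(n_max, tau, water_mm, delta=0.01):
    """Relative CNR^2 for a small flux difference delta behind water_mm of water.

    n_max is the input count rate of the unattenuated beam (per pixel).
    """
    lam = n_max * math.exp(-MU_WATER * water_mm)   # attenuated input rate
    # Ideal Poisson CNR^2 is proportional to delta^2 * lam; pileup scales
    # it by the dose efficiency.
    return delta ** 2 * lam * dose_efficiency_inp(lam, tau)

tau = 100e-9
n0 = 1.0 / tau
for thickness in (0, 100, 200, 300):   # mm of water
    print(f"{thickness:3d} mm water: relative CNR^2 = {cnr2_inp(2 * n0, tau, thickness):.2f}")
```

Although the dose efficiency is lowest for the unattenuated beam, the CNR² still decreases with increasing water thickness, mirroring the behavior described in the text: pileup mainly penalizes the projections that already have high statistics.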

Spectral distortion
Pulse pileup introduces errors in the registered photon energies, resulting in a distortion of the energy spectrum. The consequence is a reduction in the spectral separation between energy bins, and thus a reduction of the basis-material-decomposition performance.
The effect of spectral degradation from pileup occurs earlier than the loss of dose efficiency (equation (9)), which can, for example, be seen by observing that the response of the individual energy bins deviates from linear behavior at much lower count rates than the total number of counts (Fredenberg et al 2010b). It has also been shown that pulse pileup affects the imaging performance more for spectral detection tasks than for density imaging tasks (Wang et al 2011a).
The spectral distortion depends on, for example, the photon-counting mode, the dead time (if applicable), the pulse shape in the ASIC, and the input spectrum. Many approaches for modeling the spectral effects of pulse pileup can be found in the literature (Taguchi et al 2010, 2011, Wang et al 2011a, Cammin et al 2014, Roessl et al 2016). Other approaches for handling the effects of pileup include empirically estimating parameters to model the non-ideal effects (Zimmerman and Schmidt 2015, Alvarez 2011). Also, methods for correcting the effects of pileup after read-out (e.g. by using neural networks) have been investigated (Touch et al 2016, Feng et al 2018). Some authors have also suggested using the so-called pileup trigger method to help mitigate spectral pileup (Kappler et al 2011). This method uses a threshold that is placed above the highest energy of the input spectrum to identify whether a photon has been subject to pileup.
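The spectral distortion itself, and the pileup-trigger idea, can be illustrated with a small Monte Carlo sketch under the two INP assumptions (fixed dead time, summed energies). The two-line input spectrum and the count rate are arbitrary illustrative choices; the trigger threshold flags events whose registered energy exceeds the highest input energy:

```python
import random

# Monte Carlo sketch of spectral distortion from pulse pileup under the
# INP assumptions: a fixed dead time after each registered event, with the
# energies of coinciding photons summed. The exponential inter-arrival
# times and the two-line input spectrum are illustrative assumptions. A
# "pileup trigger" threshold above the maximum input energy flags summed
# events (cf. Kappler et al 2011).

random.seed(1)

TAU = 100e-9              # dead time (s)
LAM = 2.0 / TAU           # input count rate: 2 N0, deep in the pileup regime
ENERGIES = (60.0, 80.0)   # two-line input spectrum (keV), assumed
TRIGGER = max(ENERGIES)   # pileup-trigger threshold just above the spectrum

registered = []
t, busy_until, pending = 0.0, -1.0, 0.0
for _ in range(200_000):
    t += random.expovariate(LAM)          # Poisson arrivals
    e = random.choice(ENERGIES)
    if t >= busy_until:
        if pending:                        # close the previous event
            registered.append(pending)
        pending = e
        busy_until = t + TAU
    else:
        pending += e                       # pileup: energies add up

flagged = sum(1 for e in registered if e > TRIGGER)
print(f"registered {len(registered)} of 200000 photons")
print(f"pileup-trigger flagged {flagged} events "
      f"({100 * flagged / len(registered):.1f}%)")
```

At this (deliberately extreme) rate only about a third of the photons are registered, and most registered events carry a summed energy above the input spectrum, which is exactly what the trigger threshold detects.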

Energy bin images
The most straightforward ways of generating reconstructed images from spectral CT data are either summing the counts in all energy bins before reconstruction, thereby generating a conventional-looking CT image, or reconstructing a CT image from the measured data in each energy bin (Yu et al 2016c). In comparison with material decomposition, reconstructing bin images has the advantage of being easy to implement since it does not require accurate models of the source and detector. On the other hand, this method has the drawback of not being able to correct for the effect of beam hardening. The bin images can be reconstructed with filtered backprojection or with an iterative algorithm similar to those used for single-energy CT (Salehjahromi et al 2018). By summing the images with carefully selected weight factors, the optimal CNR for a specific imaging task can then be attained (Schmidt 2009).
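The weighted summation can be sketched in a few lines. For independent Poisson bin measurements, the CNR-optimal weight for each bin is its contrast-to-variance ratio (cf. Schmidt 2009); the two-bin numbers below are illustrative assumptions:

```python
# Sketch of optimal linear energy-bin weighting for a low-contrast task.
# For independent Poisson bin measurements with expected background counts
# b_i and signal difference c_i, the weights w_i = c_i / b_i maximize the
# CNR of the weighted sum, giving CNR^2 = sum_i c_i^2 / b_i. The two-bin
# numbers below are illustrative assumptions.

def optimal_weights(contrast, variance):
    return [c / v for c, v in zip(contrast, variance)]

def cnr2(weights, contrast, variance):
    signal = sum(w * c for w, c in zip(weights, contrast))
    noise2 = sum(w * w * v for w, v in zip(weights, variance))
    return signal * signal / noise2

b = [8000.0, 4000.0]          # expected counts per bin (background)
c = [40.0, 60.0]              # count difference caused by the feature
w_opt = optimal_weights(c, b)
print("optimal CNR^2:", cnr2(w_opt, c, b))       # = 40^2/8000 + 60^2/4000
print("flat-weight CNR^2:", cnr2([1, 1], c, b))  # plain count summing
```

Flat weights correspond to simply summing all bins; any other weighting can only match, never exceed, the contrast-to-variance weighting for this linear observer.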
Another option is to display the images individually, which has the benefit of showing the attenuation in a narrow energy range, but leads to high noise since only a subset of the photons is used to form each image. This trade-off between precise spectral information and noise can be mitigated by applying postprocessing algorithms that act jointly on the different bin images and exploit similarities between them to reduce noise. An example is the HYPR-LR algorithm, in which low spatial frequencies from the individual bin images are combined with high spatial frequencies from a full-spectrum image, thereby generating images combining spectral information with low noise (Leng et al 2011). This type of bin-image noise reduction can also be incorporated in an iterative reconstruction algorithm, such as the spectral prior image constrained compressed sensing (SPICCS) algorithm, which uses total variation regularization to penalize structural differences between the bin images and a full-spectrum image (Yu et al 2016b).
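A one-dimensional sketch of the HYPR-LR idea follows. It combines the low-pass-filtered bin image with the high frequencies of the composite via the ratio form composite × LP(bin)/LP(composite); the synthetic signals and the box low-pass filter are illustrative assumptions, not the published implementation details:

```python
import random

# Sketch of HYPR-LR-style bin-image denoising (cf. Leng et al 2011): low
# spatial frequencies are kept from the (noisy) bin image while high
# frequencies are borrowed from the low-noise full-spectrum composite,
# via I_hypr = composite * LP(bin) / LP(composite). The 1D signals and
# the box low-pass filter are illustrative assumptions.

random.seed(0)

def lowpass(signal, half_width=5):
    """Simple moving-average low-pass filter with edge clamping."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

n = 200
truth = [10.0 + (5.0 if 80 <= i < 120 else 0.0) for i in range(n)]  # edge feature
bin_img = [t / 4 + random.gauss(0, 1.0) for t in truth]    # noisy single bin
composite = [t + random.gauss(0, 0.3) for t in truth]      # low-noise full spectrum

hypr = [c * lb / lc
        for c, lb, lc in zip(composite, lowpass(bin_img), lowpass(composite))]

err_bin = sum((b - t / 4) ** 2 for b, t in zip(bin_img, truth)) / n
err_hypr = sum((h - t / 4) ** 2 for h, t in zip(hypr, truth)) / n
print(f"MSE bin image: {err_bin:.3f}, MSE HYPR image: {err_hypr:.3f}")
```

Because both the bin image and the composite are blurred with the same kernel, their ratio stays accurate even across edges, which is why the method preserves resolution while suppressing bin-image noise.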
Other, more sophisticated regularization methods that exploit redundancy between the bin images have also been studied, including total nuclear variation (Rigie and La Rivière 2015), total generalized variation (Niu et al 2018b), and dictionary learning (Zhang et al 2017). Denoising based on non-local means filtering, where the broad-spectrum image is used to weight pixels according to similarity, was studied by Li et al (2017), and Zhang et al (2016) combined non-local-means-like regularization in the spectral dimension with SPICCS. Block-matching, where patches with similar structure in different parts of the image are identified in the broad-spectrum image and used to denoise each other, has also been demonstrated (Harrison et al 2017). The observation that most of the image pixels tend to be spanned by a small number of basis materials, whereas k-edge elements are often confined to localized regions, can be exploited for noise reduction, as shown by Gao et al (2011). There are also algorithms exploiting the low dimensionality of spectral CT images in combination with SPICCS (Zeng et al 2020) or patch-based denoising (Niu et al 2018a). Finally, a new and promising research direction is the application of convolutional neural networks for bin image denoising (Clark and Badea 2020).

Material decomposition
A standalone material-decomposition step can be performed before the reconstruction (in projection space) or after the reconstruction (in image space). Another possibility is to perform reconstruction and material decomposition simultaneously, as discussed in section 5.3. Image-space decomposition is straightforward to implement, since it can take the form of a simple linear transformation of the energy-selective reconstructed images (Symons et al 2017a). However, this does not compensate for the non-linearity of the x-ray attenuation and therefore does not remove beam-hardening artifacts completely. Iterative image-space decomposition has been proposed as a way of solving this problem (Heismann and Balda 2009).
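A per-voxel sketch of such an image-space linear transformation for two basis materials follows; the water/iodine attenuation values at the two effective energies are illustrative assumptions, not tabulated data:

```python
# Sketch of image-space two-material decomposition: each voxel's pair of
# reconstructed attenuation values (low- and high-energy bin images) is
# mapped to basis coefficients by inverting a 2x2 matrix of basis
# attenuation values. The water/iodine numbers (1/cm) at the two effective
# energies are illustrative assumptions.

# Columns: water, iodine; rows: low-energy bin, high-energy bin.
M = [[0.227, 8.0],
     [0.192, 3.0]]

def invert2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[ m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det,  m[0][0] / det]]

def decompose(mu_low, mu_high, m=M):
    """Map one voxel's two attenuation values to (water, iodine) coefficients."""
    inv = invert2(m)
    water = inv[0][0] * mu_low + inv[0][1] * mu_high
    iodine = inv[1][0] * mu_low + inv[1][1] * mu_high
    return water, iodine

# A voxel containing one unit of water and 0.01 units of iodine:
mu_low = 1.0 * M[0][0] + 0.01 * M[0][1]
mu_high = 1.0 * M[1][0] + 0.01 * M[1][1]
print(decompose(mu_low, mu_high))   # recovers approximately (1.0, 0.01)
```

Because the transformation is applied to already-reconstructed values, any beam-hardening bias in those values propagates into the basis coefficients, which is the limitation noted above.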
A more straightforward way of eliminating beam hardening, thereby obtaining a quantitatively reliable image, is by using projection-space material decomposition, where the counts in the different energy bins are mapped to line integrals of basis coefficients before reconstruction. This method is well suited for photon-counting CT with its perfect geometric alignment between energy bins, but it can be more difficult to implement for systems exhibiting mismatch between the high- and low-energy projection rays, such as dual-source CT. In statistical terminology, the mapping from counts to line integrals is called an estimator. If the number of energy bins is equal to the number of basis materials, this estimator can be a simple mapping from counts to estimated line integrals, given by a polynomial fit, for example (Alvarez and Macovski 1976). If there are more energy bins than basis functions, the choice of estimator is more complicated.
There are a number of statistical properties that are desirable for such an estimator: ideally, it should be unbiased, meaning that the output should be equal to the true projected basis coefficient on average; the output should preferably have low variance; and it should be robust in the sense that it is able to perform well under a variety of conditions without generating image artifacts. For example, streak artifacts may result if the estimated basis sinograms contain anomalous measurements.
The maximum-likelihood estimator is commonly used in photon-counting CT research (Roessl and Proksa 2007, Schlomka et al 2008). This estimator builds on the fact that the registered counts m_i in each energy bin i are Poisson distributed, and finds the set of projected basis coefficients A through which the x-ray beam is most likely to have passed, given the measured counts. This estimator has the favorable property of being asymptotically unbiased and efficient for large measured count numbers, meaning that the variance approaches its theoretical minimum given by the Cramér-Rao lower bound (CRLB) (see section 6.2) (Kay 1993).
Mathematically, the maximum-likelihood estimate is calculated by minimizing the negative log-likelihood function

L(A) = Σ_{i=1}^{N_b} [λ_i(A) − m_i ln λ_i(A)]

with an iterative algorithm. This expression builds on a model for the expected number of counts λ_i(A) in each of the N_b energy bins, as a function of the projected basis coefficients A. An example of such a model is

λ_i(A) = ∫ w_i(E) S(E) exp(−Σ_{j=1}^{N_m} A_j f_j(E)) dE,

with N_m denoting the number of basis materials, S(E) denoting the spectral density as a function of energy E, w_i(E) denoting a weight function describing the spectral sensitivity of energy bin i to different incident photon energies, and f_j(E) denoting the energy dependence of basis material j. Ehn et al (2016) proposed a modification of this forward model that can be fitted well to a limited number of calibration measurements. One drawback of the maximum-likelihood estimator is that it is calculated with an iterative algorithm, which is time-consuming. Other estimators have therefore been proposed with the aim of increasing speed. Alvarez proposed using a linearized, approximate forward model to obtain a linear estimator, which is then corrected using a look-up table to reduce bias (Alvarez 2011, 2016). As an extension of this method, different linearized forward models can be applied in different regions of the space of possible A values (Rajbhandary et al 2017).
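For the special case in which the number of energy bins equals the number of basis materials, the maximum-likelihood estimate solves λ_i(A) = m_i exactly, which the following sketch does with Newton iteration in place of a full likelihood minimizer; the discrete spectrum and basis functions are illustrative assumptions:

```python
import math

# Sketch of projection-space maximum-likelihood decomposition for the
# special case of two energy bins and two basis materials, where the ML
# estimate solves lambda_i(A) = m_i exactly. The discrete spectrum S and
# basis functions F are illustrative assumptions. Newton iteration with a
# finite-difference Jacobian stands in for the iterative minimization of
# the negative log-likelihood.

E_GRID = [40.0, 55.0, 70.0, 85.0, 100.0]           # keV
S = [[4e4, 6e4, 2e4, 0.0, 0.0],                     # photons reaching bin 1
     [0.0, 0.0, 3e4, 4e4, 2e4]]                     # photons reaching bin 2
F = [[(70.0 / e) ** 3 for e in E_GRID],             # basis 1: photoelectric-like
     [1.0] * len(E_GRID)]                           # basis 2: Compton-like (flat)

def expected_counts(a):
    return [sum(s * math.exp(-a[0] * F[0][k] - a[1] * F[1][k])
                for k, s in enumerate(spectrum))
            for spectrum in S]

def ml_decompose(m, iters=50):
    a = [0.0, 0.0]
    for _ in range(iters):
        lam = expected_counts(a)
        r = [lam[0] - m[0], lam[1] - m[1]]
        h = 1e-6                                    # finite-difference step
        J = [[(expected_counts([a[0] + h * (j == 0), a[1] + h * (j == 1)])[i]
               - lam[i]) / h for j in range(2)] for i in range(2)]
        det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
        a[0] -= ( J[1][1] * r[0] - J[0][1] * r[1]) / det
        a[1] -= (-J[1][0] * r[0] + J[0][0] * r[1]) / det
    return a

truth = [0.3, 2.0]                  # projected basis coefficients
m = expected_counts(truth)          # noise-free "measurement"
print(ml_decompose(m))              # recovers approximately [0.3, 2.0]
```

With more bins than materials the system is overdetermined, and the full negative log-likelihood above must be minimized instead of solved exactly.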
Another estimator was proposed by Lee et al (2017a, 2017b). This estimator is a three-step method which builds on approximating the set of possible transmittance functions exp(−Σ_j A_j f_j(E)) as a low-dimensional space. This allows the decomposition problem to be formulated as follows: (1) least-squares fitting of the transmittance X(E) that best agrees with the registered counts; (2) least-squares fitting of a combination of basis functions Σ_j A_j f_j(E) to −log X(E); and (3) post-correcting for bias. In recent years, neural-network estimators have also attracted attention. These are based on networks of simple processing units, called neurons, that are trained to map measured bin counts directly to the corresponding basis coefficient line integrals (Zimmerman and Schmidt 2015).
Since basis images tend to be noisier than conventional CT images, significant attention has been paid to methods of reducing basis image noise. In analogy with the SPICCS algorithm for bin image denoising, the prior-knowledge-aware iterative denoising method uses an image reconstructed from the sum of all energy bins as a prior image and penalizes structural differences between this image and basis images, after rescaling (Tao et al 2018). More advanced basis decomposition methods, such as block-matching methods (Wu et al 2019), dictionary learning (Wu et al 2020) or neural-network-based methods (Chen and Li 2019), are able to reduce noise by aggregating information from multiple pixels when forming the basis images.
Another way of reducing basis image noise is by enforcing a constraint, such as volume or mass conservation, in the basis decomposition (Liu et al 2009, Mendonça et al 2010, Ronaldson et al 2012, Malusek et al 2013, Lee et al 2014, Curtis and Roeder 2017, Ren et al 2019). Such a constraint provides an additional equation in addition to the x-ray measurements, allowing three-material decomposition with two energy bins or four-basis decomposition with three energy bins. Even if a sufficient number of energy bins is available, it may be desirable to include constraints in order to reduce noise in the decomposed basis images. A drawback of this approach is that the constraints build on prior assumptions about the material compositions that can occur in the body, and if these assumptions are in disagreement with reality, including the constraints will result in systematic errors in the resulting basis images (Bornefalk and Persson 2014, Yveborg et al 2015a). Whether the benefits of such constraints outweigh the drawbacks is therefore dependent on the specific imaging task.
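A sketch of how a volume-conservation constraint supplies the missing equation in a two-measurement, three-material setting follows; the attenuation values for water, bone, and iodine solution at the two effective energies are illustrative assumptions:

```python
# Sketch of constrained three-material decomposition: with only two energy
# measurements per voxel, a volume-conservation constraint (the volume
# fractions sum to one) supplies the third equation, giving a 3x3 linear
# system per voxel. The attenuation values (1/cm) are illustrative
# assumptions.

MATERIALS = ("water", "bone", "iodine")
A = [
    [0.227, 0.573, 1.20],   # attenuation at the low effective energy
    [0.192, 0.416, 0.60],   # attenuation at the high effective energy
    [1.0,   1.0,   1.0],    # volume conservation: fractions sum to 1
]

def solve3(a, b):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x

# Voxel that is 70% water, 25% bone, 5% iodine solution:
truth = [0.70, 0.25, 0.05]
mu_low = sum(f * a for f, a in zip(truth, A[0]))
mu_high = sum(f * a for f, a in zip(truth, A[1]))
fractions = solve3(A, [mu_low, mu_high, 1.0])
print(dict(zip(MATERIALS, (round(f, 6) for f in fractions))))
```

If the voxel actually contains a fourth material, or air, the hard constraint forces the error into the three estimated fractions, which is the systematic-bias drawback discussed above.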

Image reconstruction
The process of reconstructing photon-counting images depends on which material-decomposition algorithm is used. An ordinary single-channel reconstruction algorithm such as filtered back-projection can be used to reconstruct one image for each energy bin, allowing image-space material decomposition to be performed subsequently. Projection-space material decomposition, on the other hand, can be implemented either as a one-step or two-step inversion, depending on whether the mapping from measured counts to basis images takes place in a single processing step or whether basis sinograms are generated as an intermediate step.
In two-step inversion, projection-space material decomposition is first used to generate a set of basis sinograms, one for each basis material (Schlomka et al 2008). From these basis sinograms, a set of reconstructed basis images is then generated with a reconstruction algorithm such as filtered back-projection or an iterative method. In the case of an iterative method, there is the possibility of choosing whether the image reconstruction should treat the different bases independently (Schirra et al 2013) or as a joint optimization problem (Sawatzky et al 2014). The noise in the basis images resulting from basis decomposition is anti-correlated between basis materials, since the basis decomposition algorithm is able to measure the total attenuation more accurately than it can measure the exact amount of each basis material in the beam path. By modifying the data weighting to take this anti-correlated noise structure into account, the reconstructed image noise can be reduced. At the same time, the coupling between basis materials introduced in this way increases the complexity of the reconstruction problem and can also lead to artifacts caused by crosstalk between the basis images (Persson and Grönberg 2017).
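The anti-correlation between basis sinograms can be made concrete with a linearized sketch: propagating independent log-count noise through the inverse of an assumed basis-sensitivity matrix yields a basis covariance with a negative off-diagonal element. All numbers below are illustrative:

```python
# Sketch of why decomposed basis sinograms have anti-correlated noise.
# In a linearized two-bin model, the log measurements are y = M A with
# basis-sensitivity matrix M, so A = M^-1 y and Cov(A) = M^-1 Cov(y) M^-T.
# With independent Poisson bins, Cov(y) is diagonal, and the off-diagonal
# of Cov(A) comes out negative for typical basis functions. All numbers
# are illustrative assumptions.

M = [[5.0, 1.0],    # row: energy bin; column: basis material
     [1.5, 1.0]]    # (photoelectric-like and Compton-like sensitivities)
var_y = [1.0 / 8000.0, 1.0 / 10000.0]   # log-count variances ~ 1/counts

det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
Minv = [[ M[1][1] / det, -M[0][1] / det],
        [-M[1][0] / det,  M[0][0] / det]]

# Cov(A) = Minv * diag(var_y) * Minv^T
cov = [[sum(Minv[i][k] * var_y[k] * Minv[j][k] for k in range(2))
        for j in range(2)] for i in range(2)]

print("Cov(A) =", cov)
print("correlation =", cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5)
```

The strongly negative correlation is exactly the structure that a reconstruction algorithm can exploit through its data weighting, at the cost of coupling the basis images.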
In one-step inversion, on the other hand, the mapping from registered counts to basis images takes place in a single step, eliminating the ray-wise decomposition step. This step typically takes the form of an iterative optimization, which searches for the set of basis images that best corresponds to the measured counts in each projection line and energy bin. In this case, the reconstruction necessarily treats the basis images as a joint optimization problem. The development of algorithms that optimize image quality while satisfying computational requirements is an active research topic. Several one-step algorithms have been proposed, based on primal-dual optimization or separable quadratic surrogates (Long and Fessler 2014), which can be accelerated with momentum terms (Mechlem et al 2018). A comparison of the computational cost of five such algorithms can be found in Mory et al (2018).

Image display
The result of the image reconstruction algorithm is a set of reconstructed images, one for each basis material. These can either be displayed as is (e.g. as maps of contrast agent concentration) or combined together to form a final image. Basis material decomposition is often claimed to increase image noise, and this is true in the sense that the individual basis images (e.g. a K-edge image) exhibit higher noise levels than a conventional CT image formed from the summed counts in all energy bins. On the other hand, as pointed out in section 2.2, the basis images can be combined in a weighted sum and yield the same optimal CNR as a weighted sum of the bin images (Rajbhandary and Pelc 2016). Another option is to combine the basis images with a conventional gray-scale image, which can always be reconstructed from the raw data in addition to the basis images, to form a color-overlay image combining high resolution and/or low noise with a material distribution map (Symons et al 2017b). Thus, material decomposition does not cause any of the available information to be lost, but rather enables additional ways of presenting the image data.
One or more basis images can be used to form a virtual non-contrast image (Faby et al 2015). Another display technique currently used in dual-energy CT is to form virtual non-calcium images, which can suppress bone and thereby help in visualizing bone marrow (Kellock et al 2017). The different basis images can also be combined to form virtual monoenergetic images (Symons et al 2018b), as is common in dual-energy scanners today. By varying the display energy, the CNR can be optimized for a given imaging task. More advanced ways of forming virtual monoenergetic images include prior-knowledge-aware iterative denoising (Tao et al 2019b) and neural-network-based methods (Feng et al 2019). Approximate material maps and virtual monoenergetic images can also be generated through image-space material decomposition from reconstructed bin images (Zhou et al 2019), although this approach gives less accurate results since it does not eliminate beam hardening.
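Forming a virtual monoenergetic image from basis images amounts to evaluating the basis expansion at the chosen display energy. A per-voxel sketch follows; the tabulated basis attenuation values are illustrative assumptions, not NIST data:

```python
# Sketch of forming a virtual monoenergetic image (VMI) from basis images:
# each voxel's attenuation at a chosen display energy E is
#   mu(E) = a_water * f_water(E) + a_iodine * f_iodine(E),
# where a_* are the reconstructed basis coefficients. The tabulated basis
# attenuation values (1/cm) are illustrative assumptions.

F_WATER = {40: 0.2683, 70: 0.1929, 100: 0.1707}   # assumed values
F_IODINE = {40: 22.1, 70: 4.94, 100: 1.94}        # assumed values

def vmi_pixel(a_water, a_iodine, energy_kev):
    """Attenuation of one voxel in a VMI at the chosen display energy."""
    return a_water * F_WATER[energy_kev] + a_iodine * F_IODINE[energy_kev]

# A contrast-enhanced voxel: one unit of water, a small amount of iodine.
for e in (40, 70, 100):
    print(f"{e} keV: mu = {vmi_pixel(1.0, 0.002, e):.4f} 1/cm")
```

Lowering the display energy boosts iodine contrast (at the expense of noise), which is why sweeping the display energy lets the CNR be tuned to the imaging task.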
Another possibility for image display is to form a synthetic Hounsfield image, which is an image with CT numbers equal to those of an energy-integrating CT image for a specified set of acquisition parameters, but with beam-hardening artifacts eliminated (Bornefalk 2012a). This may facilitate the transition from energy-integrating to photon-counting CT by providing image measurements that can be compared to existing reference values.

Detector performance metrics
As described in the preceding sections, the physical effects affecting detector performance can lead to trade-offs between different desirable detector properties. For example, a smaller detector pixel size gives higher spatial resolution and less sensitivity to pileup but also leads to more crosstalk between pixels, causing double counting and degraded energy resolution. When designing a PCD, it is therefore important to devise accurate metrics of detector performance in order to evaluate the effects of design decisions on image quality. Full-scale simulation of a photon-counting CT acquisition may in many cases be too time-consuming, especially if the objective is to characterize noise properties, which requires simulating many noise realizations. For this reason, a number of mathematical metrics of detector performance have been developed.
For energy-resolving PCDs, the question of how to measure detector performance is further complicated by the need to take energy resolution into account. Since energy information can be used in different ways, it also becomes useful to distinguish between different types of tasks in order to describe the detector performance. In a detection task, the objective is to detect whether a certain feature is present or absent in the image. This feature can be a small feature, such as a bone fragment or a small blood vessel, or a large feature that is hard to distinguish from the surrounding anatomy, such as a tumor. In a material quantification or material separation task, on the other hand, the objective is to measure the composition of the imaged object. Within the framework of basis material decomposition, this is done by measuring the amount of each basis material present in every voxel of the imaged volume. This allows quantitative tissue characterization, which can be important, for example, in bone densitometry or calcium scoring. The method also facilitates generation of material-specific maps (e.g. iodine maps), which allow distinguishing iodinated contrast agents from non-enhancing anatomical structures, or virtual non-contrast images, where the contrast agent is removed and only the tissue attenuation remains. Quantification of iodine concentration is important, for example, in follow-up of cancer treatments. Separation between iodine and calcium is crucial for assessing the degree of stenosis in calcified plaques, determining whether catheterized invasive diagnostics or stenting is required (Alessio and MacDonald 2013, Boussel et al 2014). It is also important in CT perfusion imaging in general and in the assessment of stroke when selecting patients for thrombectomy to remove the obstruction to the blood supply.
The most important difference between detection and quantification tasks in terms of their requirements on detector performance is that quantification tasks are generally more dependent on energy resolution than detection tasks. This characteristic can be seen by considering a purely counting PCD with no energy-resolving capabilities. Such a detector can perform relatively well for most detection tasks just by detecting the difference in the number of transmitted photons caused by the presence of a feature. A non-energy-resolving detector, on the other hand, is not able to distinguish between different materials, but only to measure their total attenuation.

Contrast-to-noise ratio
A common way to compare the performance of two imaging systems is to evaluate the CNR, or perhaps more commonly the CNR², for a given imaging task. However, when comparing detector systems with different energy response, such as an EID and a PCD, the relative CNR² is highly task dependent. The relative performance depends not only on the choice of materials in the two projections for which the CNR is computed, but also on the density, or thickness, of those materials. As an example, consider figure 15, where we have compared the projection-domain CNR² of a Si PCD, a Si spectral PCD with optimal energy weighting, and an EID. The two projections that were used to compute the CNR² comprised 20 cm of soft tissue and 19 cm of soft tissue plus 1 cm of an iodine/water solution with varying concentration up to 10 mg ml⁻¹. The CNR² curves reach a minimum at slightly different iodine concentrations, i.e. there is an offset between the curves. Both the PCD and the EID reach zero CNR² for some concentrations of iodine, whereas the spectral PCD never reaches zero due to the optimal energy weighting (Yveborg et al 2013). The relative performance of the photon-counting systems compared to the energy-integrating system has a strong dependence on the iodine concentration (figure 15(b)). Comparing imaging systems with a different energy response through a single relative CNR² measurement is therefore likely to be misleading. In fact, altering the imaging task can change which of the detector systems obtains the highest CNR. This also applies when comparing different photon-counting systems (e.g. silicon-based and CdTe/CZT-based detectors) since they too have different energy response. A possible solution to this problem is to make sure that the selected imaging tasks are clinically relevant and to evaluate the performance for a range of imaging tasks in order to get a picture of the overall performance of the system.

Cramér-Rao lower bound
Whereas the CNR indicates how much a certain feature stands out from the background relative to the noise level, another measure is needed for assessing the performance for material-specific imaging. This role is filled by the CRLB (Kay 1993). The CRLB gives a lower bound on the variance of any unbiased estimator, or a lower bound on the covariance matrix when estimating a multivariate parameter. In the context of photon-counting CT, the CRLB gives a lower bound on the covariance matrix of estimated projected basis coefficients. Roessl and Herrmann (2009) derived the CRLB for both photon-counting and energy-integrating detectors and used this formula to study how the noise in basis images depends on detector design parameters. The noise level in the image domain can be obtained from the projection-space CRLB through back-projection (Roessl et al 2011).
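For independent Poisson bin counts, the CRLB is the inverse of the Fisher information matrix. The following sketch computes it for a two-bin, two-material toy model; the spectrum and basis functions are illustrative assumptions:

```python
import math

# Sketch of computing the Cramér-Rao lower bound for projected basis
# coefficients. For independent Poisson energy-bin counts with means
# lambda_i(A), the Fisher information matrix is
#   F_jk = sum_i (1/lambda_i) (d lambda_i/dA_j)(d lambda_i/dA_k),
# and the CRLB on Cov(A_hat) is F^-1. The spectrum and basis functions
# below are illustrative assumptions.

E_GRID = [40.0, 55.0, 70.0, 85.0, 100.0]            # keV
S = [[4e4, 6e4, 2e4, 0.0, 0.0],                      # bin 1 response
     [0.0, 0.0, 3e4, 4e4, 2e4]]                      # bin 2 response
F_BASIS = [[(70.0 / e) ** 3 for e in E_GRID],        # photoelectric-like
           [1.0] * len(E_GRID)]                      # Compton-like

def crlb(a):
    lam = [0.0, 0.0]
    dlam = [[0.0, 0.0], [0.0, 0.0]]                  # dlam[i][j] = d lambda_i / dA_j
    for i, spectrum in enumerate(S):
        for k, s in enumerate(spectrum):
            t = s * math.exp(-sum(a[j] * F_BASIS[j][k] for j in range(2)))
            lam[i] += t
            for j in range(2):
                dlam[i][j] -= t * F_BASIS[j][k]
    fisher = [[sum(dlam[i][j] * dlam[i][k] / lam[i] for i in range(2))
               for k in range(2)] for j in range(2)]
    det = fisher[0][0] * fisher[1][1] - fisher[0][1] * fisher[1][0]
    return [[ fisher[1][1] / det, -fisher[0][1] / det],
            [-fisher[1][0] / det,  fisher[0][0] / det]]

cov = crlb([0.3, 2.0])
print("CRLB covariance matrix:", cov)
print("std of basis 1 estimate >=", cov[0][0] ** 0.5)
```

The off-diagonal elements of this bound describe the same anti-correlated basis noise that a two-step reconstruction can exploit through its data weighting.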
It may seem like a lower bound on image variance is of limited value since it does not provide any information about the upper limit of the variance. In practice, however, the maximum-likelihood estimator comes close to attaining the CRLB (Roessl et al 2011). Figure 16 illustrates the CRLB for a three-basis decomposition task, for a CdTe-based and a silicon-based detector model.

Figure 15. (a) A comparison of the projection-domain CNR² for a photon-counting detector without energy resolution (PCD), a spectral photon-counting detector (spectral PCD) with optimal energy weighting, and an energy-integrating detector (EID). The projections used to compute the CNR² comprised 20 cm of soft tissue and 19 cm of soft tissue plus 1 cm of an iodine/water solution with varying concentration. (b) The ratio of the CNR² curves, showing that the relative performance of the imaging systems is highly dependent on the iodine concentration.

Figure 16. Illustration of the CRLB for realistic simulation models of CdTe-based and silicon-based detectors and an ideal detector, in a three-basis (water/bone/iodine) decomposition. The tube voltage is 120 kVp and the object consists of 200 mm of water, 5 mm of bone, and 10 mm of iodine (10 mg ml⁻¹). In each plot, the contours show the ellipse inside which the decomposed projected basis coefficients will fall with 95% probability according to the CRLB, when aggregating data from a 9 × 9 mm² detector segment. The CdTe-based detector is 1.6 mm thick and divided into 0.225 × 0.225 mm² pixels. The silicon sensor has an absorption depth of 60 mm, a pixel pitch of 0.5 × 0.5 mm², and 20 µm thick tungsten scatter blockers between the wafers. The detector models are as described by Persson et al (2020), with the difference that the CdTe-based and the silicon-based detectors here both have eight energy bins. Object scatter was taken into account assuming a one-dimensional tungsten anti-scatter grid of 25 × 0.1 mm lamellae with 1.0 mm (for Si) and 1.125 mm (for CdTe) spacing in front of the detector. The ellipses representing the silicon-based and CdTe-based detectors are more elongated compared to the ideal detector, showing that imperfect energy resolution makes it more difficult to identify the material composition.
It is important to note that the CRLB is a lower limit on the variance of any unbiased estimator. If the estimator is allowed to be biased, the variance can be lower than this limit. An extreme example would be a material-decomposition algorithm that predicts the same projected basis coefficients regardless of the measured data, which gives an estimator with zero variance and large bias. In practice, it may be desirable to use inequality constraints, such as non-negativity (Barber et al 2016, Long and Fessler 2014), and thereby reduce variance for certain basis configurations at the expense of a small amount of bias.
In contrast to the CNR, which is relevant for detection tasks, the CRLB measures the ability of the imaging system to identify the material composition of the imaged object. The CRLB therefore provides a measure of the energy-resolving capability of a PCD, and it has been applied to investigating the relationship between pixel size and energy resolution in PCDs. Faby et al used the CRLB to validate the statistical optimality of an image-based decomposition method, which was then used to study the effect of energy-bin correlations and compare the performance of photon-counting systems with other dual-energy CT techniques (Faby et al 2015, 2016). Another application of the CRLB is to estimate the covariance matrix between basis line integrals in ray-wise material decomposition in order to obtain the data weighting for the reconstruction step in a two-step material-decomposition algorithm (Schirra et al 2013, Sawatzky et al 2014).

Spectral DQE (energy-dependent spatial resolution and noise)
Characterization of conventional x-ray detectors is typically done using the noise-equivalent quanta (NEQ) and DQE. These metrics are based on linear-systems theory and are typically expressed as functions of spatial frequency u. The NEQ can be calculated from the incident number of photons q, the large-area gain G, the detector MTF, and the NPS as follows (Cunningham 2000):

NEQ(u) = [q G MTF(u)]² / NPS(u).

The NEQ of a detector is the number of noise quanta that an ideal detector, which registers every incoming photon and its exact position of incidence, would need to register in order to achieve the same detectability as the detector under consideration. This metric is dependent on the illumination conditions. For example, a detector which has an NEQ of 15 000 mm⁻² for a particular spatial frequency band when irradiated by 20 000 photons per mm² is 75% as efficient as an ideal detector, since the ideal detector could achieve the same detectability with only 15 000 incident photons. The DQE is obtained by normalizing the NEQ by the number of incident photons, and can be shown to be equal to the squared ideal-linear-observer detectability relative to an ideal detector, d′(u)²/d′_ideal(u)² (Cunningham 2000). In the above example, the DQE is equal to 15 000/20 000 = 0.75 for the given spatial frequency band.
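A small sketch of the NEQ and DQE computation follows, with an assumed Gaussian MTF and an assumed NPS model standing in for measured detector data:

```python
import math

# Sketch of computing NEQ(u) and DQE(u) from detector characteristics via
#   NEQ(u) = [q * G * MTF(u)]^2 / NPS(u),   DQE(u) = NEQ(u) / q
# (cf. Cunningham 2000). The Gaussian MTF and the NPS model below are
# illustrative assumptions, not measured data.

q = 20000.0          # incident photons per mm^2
G = 0.85             # large-area gain (mean counts per incident photon)

def mtf(u):
    return math.exp(-(u / 2.0) ** 2)          # assumed Gaussian MTF

def nps(u):
    # Assumed NPS: quantum noise shaped by the MTF plus a flat floor
    # (e.g. from electronic noise or aliasing).
    return q * G * mtf(u) ** 2 + 0.05 * q * G

def neq(u):
    return (q * G * mtf(u)) ** 2 / nps(u)

def dqe(u):
    return neq(u) / q

for u in (0.0, 0.5, 1.0, 2.0):                # cycles/mm
    print(f"u = {u:.1f} /mm: NEQ = {neq(u):9.1f} mm^-2, DQE = {dqe(u):.3f}")
```

Note that a flat noise floor makes the DQE fall off with frequency even though it is the same additive term at all frequencies, because the signal transfer drops while the floor does not.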
DQE is a useful metric of detector performance because it takes both noise and resolution into account. It is therefore a better measure of dose efficiency than CNR, which does not incorporate the effect of spatial resolution. Whereas the CNR increases by low-pass filtration of the image after acquisition, such processing leaves the DQE unchanged, reflecting the fact that low-pass filtering does not add information to the image.
PCDs with a single energy bin (i.e. non-energy-resolving detectors) can be analyzed with the traditional NEQ and DQE metrics. An early such study was made by Durst et al (2007); Acciavatti and Maidment (2010) compared the DQE of photon-counting and EIDs; and Stierstorfer et al modeled the DQE of a CdTe-based detector (Stierstorfer 2018, Stierstorfer et al 2019). A common way of simulating signal and noise transfer characteristics is to model the detection process as a cascade of simple processes, the so-called cascaded-systems analysis (Cunningham 2000). Tanguay et al (2013) developed the theory for cascaded-systems analysis of PCDs and used it to study the impact of non-ideal physical effects on detector performance (Tanguay et al 2015, Tanguay and Cunningham 2018). Furthermore, Xu et al (2014) studied a silicon-strip PCD for mammography with a cascaded model, and Ji et al (2018) used a cascaded model to study anti-coincidence logic in a CdTe-based detector.
To analyze the ability of PCDs to perform energy-resolved imaging, cascaded-systems analysis has also been applied to energy-resolving detectors by Tanguay and Cunningham (2018) and in the work of Taguchi et al (2016, 2018a, 2018b), where such a model was used to study the effect of N × N binning of pixels into macro-pixels in a CdTe-based detector. Energy-resolving silicon-strip detectors have also been studied, by Fredenberg et al (2010a) and Chen et al (2016). However, fully understanding how the resulting signal transfer and noise characteristics affect image quality requires an extension of the NEQ and DQE concepts to energy-resolving detectors. Important steps in this direction were taken by Richard and Siewerdsen (2008) in the context of dual-energy radiography.
A natural extension of the NEQ and DQE metrics to energy-resolving PCDs was proposed by Persson et al (2018b) and will be briefly outlined here. An equivalent spatial-domain description has been given by Rajbhandary et al (2019). Although it may be tempting to simply extend the NEQ to depend on incident photon energy in addition to spatial frequency, this definition does not work with broad-spectrum illumination since it fails to describe how different incident energies can be confounded during the detection process, as described in section 4.4. Instead, the NEQ needs to be defined as a function of two energy indices, so that NEQ(u, E₁, E₂) describes how much the joint presence of photons with energies E₁ and E₂ in the incident spectrum contributes to image detectability at spatial frequency u.
To compute this generalized NEQ, it is necessary to know the detector characteristics. The signal transfer is described by the energy-dependent transfer function H_k(u, E) for each energy bin k, which corresponds to the MTF but without the absolute value and without the normalization to 1 at u = 0. The noise characteristics of the sampled signal are described by the cross-spectral density W_d⁺(u), essentially a frequency-dependent covariance matrix between energy bins. From these quantities, the generalized NEQ is obtained as

NEQ(u, E₁, E₂) = q(E₁) q(E₂) ∑_{k,k′} H_k*(u, E₁) [W_d⁺(u)⁻¹]_{kk′} H_{k′}(u, E₂),

with q(E) denoting the spectral photon density and * denoting the complex conjugate. Under the assumption that the object consists of a small set B of basis materials, it is more convenient to express the generalized NEQ in matrix form through a basis transformation. The elements NEQ^B_{l₁,l₂}(u) of this NEQ matrix describe the contribution to image detectability from the addition of differential amounts of basis materials l₁ and l₂. In addition to allowing the detectability to be calculated for detection tasks, this matrix also describes the performance for quantification tasks in a set of decomposed basis images. Namely, the matrix elements of its inverse give, up to a scale factor, the CRLB for the covariance of basis images l₁ and l₂ as a function of frequency (Persson et al 2018b, equation 25). This allows calculating the squared ideal-linear-observer detectability in basis image l as

d′_l² = ∫ |ΔÃ_l(u)|² / Var[Ã_l(u)] du,    (15)

where ΔÃ_l(u) is the difference between target and background in the Fourier transform Ã_l(u) of the basis image, and Var[Ã_l(u)] is the variance of an efficient estimator for Ã_l(u). As an illustration of equation (15), the relative squared ideal-linear-observer detectability for detection of features of different sizes in an iodine basis image is shown in figure 17 for simulation models of CdTe-based and silicon-based detectors. As shown in this figure, a system with smaller pixels, such as the simulated CdTe system, has an advantage for imaging of small features.
At the same time, a system with high energy resolution, such as the simulated silicon system, has an advantage for imaging larger features in a material-selective basis image, especially for three-basis decomposition, where energy resolution is important in order to separate iodine and calcium. It is also interesting to note that the CdTe detector performance exhibits a relatively weak kVp dependence, whereas the silicon detector benefits strongly from using a low kVp, where the fraction of Compton scatter is low. These observations show that the spectral linear-systems framework provides important clues as to how to best utilize PCDs, especially in view of the ongoing trend towards using lower kVp to save patient dose.
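The generalized NEQ and its basis-transformed matrix form can be sketched numerically at a single spatial frequency. Everything below (the bin transfer values H, the cross-spectral density W, the spectrum q, and the basis functions in M) is an illustrative stand-in rather than real detector data, and the quadratic-form expression follows the definitions above:

```python
import numpy as np

# Toy evaluation of the generalized NEQ at one spatial frequency u:
# two incident energies (e.g. 50 and 80 keV) and two energy bins.
q = np.array([1000.0, 800.0])            # spectral photon density q(E_i)
H = np.array([[0.8, 0.3],                # H[k, i]: transfer of energy E_i
              [0.1, 0.6]])               # into bin k (real-valued here)
W = np.array([[900.0, 100.0],            # cross-spectral density between bins
              [100.0, 700.0]])

W_inv = np.linalg.inv(W)
# NEQ(u, E_i, E_j) = q(E_i) q(E_j) sum_{k,k'} H_k(E_i)* [W^-1]_{kk'} H_k'(E_j)
NEQ = np.array([[q[i] * q[j] * (H[:, i].conj() @ W_inv @ H[:, j]).real
                 for j in range(2)] for i in range(2)])

# Basis transformation to material space: M[i, l] = basis function of
# material l evaluated at energy E_i (illustrative values)
M = np.array([[0.22, 1.20],
              [0.18, 0.40]])
NEQ_B = M.T @ NEQ @ M                    # NEQ matrix in basis-material space

# The inverse gives, up to a scale factor, the CRLB for the
# (co)variances of the decomposed basis images
CRLB = np.linalg.inv(NEQ_B)
print(NEQ_B)
```

In a full calculation these quantities would be evaluated for every spatial frequency and the detectability obtained by integrating over u, as in equation (15).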
In analogy with equation (13), a generalized DQE matrix can also be defined by normalizing the NEQ matrix by the NEQ matrix of a detector with perfect quantum efficiency, energy resolution, and spatial resolution:

DQE^B(u) = (NEQ^B_ideal(u))^(−1/2) NEQ^B(u) (NEQ^B_ideal(u))^(−1/2).

In analogy with the conventional DQE, it can be shown that a diagonal element DQE^B_{l,l} of this matrix gives the task-specific DQE, i.e. the squared ideal-linear-observer detectability d′_l(u)²/d′_ideal(u)² relative to an ideal energy-resolving detector, for detecting basis material l. Here, the ideal-observer detectability of a multi-channel image should be interpreted as the highest detectability that can be obtained in any image formed as a weighted sum of the channel images, where the weight factors are allowed to vary with spatial frequency (Yveborg et al 2015b). The generalized NEQ matrix thus encodes the performance of an energy-resolving PCD for both detection and quantification tasks. How this framework can be applied to studying PCD performance is described by Persson et al (2020).

Figure 17. Simulated squared ideal-linear-observer detectability d′² relative to an ideal energy-resolving detector, for realistic models of CdTe-based and silicon-based detectors. The curves show the squared detectability for Gaussian features in an iodine basis image resulting from 2-basis (iodine/water) (a)-(c) and 3-basis (iodine/bone/water) (d)-(f) decompositions, with a 200 mm thick water object. Three tube voltages were used: 80 kVp (a), (d), 120 kVp (b), (e), and 140 kVp (c), (f). The detector models and anti-scatter grid employed are the same as in figure 16, and the simulation model is described in Persson et al (2020). The feature size is given as full width at half maximum (FWHM) at the isocenter plane. The smaller pixel size of the CdTe-based detector is reflected in the high detectability for small features, whereas the superior energy resolution of the silicon-based detector leads to high detectability for large features.
For comparison, the detectability is also plotted for the CdTe detector with pixels aggregated in 2 × 2 blocks, forming 0.45 × 0.45 mm² macro-pixels. For small features, the plotted detectability should be interpreted as an upper limit on the performance allowed by the detector, and not as anticipated system performance, since the detectability in a real system is also affected by the size of the x-ray source. In effect, features smaller than the smallest available focal spot size, currently 0.4 mm × 0.5 mm (Yanagawa et al 2018), will not be resolvable, and this spot size is only available at a limited tube current. The vertical lines in the figure show the resolution limits at the isocenter for focal spot sizes of 0.4-0.7 mm (IEC 60336), corresponding to typical minimum spot sizes for state-of-the-art scanners, assuming a bimodal focal spot shape (da Silva et al 2019).
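The normalization that produces the generalized DQE matrix can be sketched at one spatial frequency. The symmetric square-root normalization used here is one way to realize "normalizing the NEQ matrix by the NEQ matrix of an ideal detector" and is an assumption of this sketch, as are all numerical values:

```python
import numpy as np

def inv_sqrt(a):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(a)
    return v @ np.diag(1.0 / np.sqrt(w)) @ v.T

# Illustrative NEQ matrices in basis-material space at one frequency
NEQ_ideal = np.array([[800.0, 300.0],    # perfect detector
                      [300.0, 500.0]])
NEQ_det   = np.array([[500.0, 200.0],    # detector under study
                      [200.0, 250.0]])

S = inv_sqrt(NEQ_ideal)
DQE = S @ NEQ_det @ S                    # generalized DQE matrix
print(np.diag(DQE))                      # task-specific DQE per basis material
```

With this normalization each diagonal element lies between 0 and 1 whenever the detector's NEQ matrix is dominated by the ideal one, mirroring the behavior of the conventional scalar DQE.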

Quantitative image accuracy
CNR, CRLB, and DQE are metrics related to the image noise level. In addition to these noise metrics, the systematic error, or bias, in the average value in regions of the image is also an important performance measure, for example when evaluating the conspicuity of artifacts or the accuracy of quantitative image measurements. For the latter case, a relevant metric is the mean-squared error (MSE), which combines the bias with the noise variance as (Kay 1993)

MSE = E[(I − I_True)²] = Var[I] + (E[I] − I_True)²,

where E[·] and Var[·] denote expected value and variance, respectively, and I_True is the true value of the random image value I.
Although the standard deviation of image noise in a reconstructed image is typically several Hounsfield units, the averaging that takes place when measuring the mean CT number in a region of interest suppresses noise, so that even a small systematic error can dominate the MSE. Similarly, when making quantitative measurements in a basis image, it is more important to reduce bias if the measurement region of interest is large, whereas for small regions it may be favorable to reduce noise at the expense of bias by reducing the number of basis materials.
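The interplay between bias and variance described above can be illustrated with a small Monte Carlo sketch. All numbers (true CT number, bias, noise level) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
true_hu = 40.0     # assumed true mean CT number of the region [HU]
bias = 1.0         # assumed small systematic error [HU]
sigma = 8.0        # assumed per-pixel noise standard deviation [HU]

results = {}
for n_pixels in (1, 100, 10000):
    # The ROI mean over n_pixels has variance sigma^2 / n_pixels,
    # while the bias is unaffected by averaging.
    roi_means = rng.normal(true_hu + bias, sigma / np.sqrt(n_pixels), 200000)
    results[n_pixels] = np.mean((roi_means - true_hu) ** 2)
    print(f"{n_pixels:6d} pixels: MSE = {results[n_pixels]:.3f} "
          f"(bias^2 = {bias**2:.2f}, variance = {sigma**2 / n_pixels:.4f})")
# For large ROIs the MSE approaches bias^2: the systematic error dominates.
```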
One source of error is the log-normalization step transforming counts into attenuation line integrals. Since this operation is non-linear, it introduces an error that becomes larger as the dose decreases. Other sources of error are inaccuracies in the model used for material decomposition. In order to make quantitative material composition measurements that are limited by quantum noise rather than systematic errors, it is necessary to have a detailed model of the x-ray source, the detector response, and scattered photons from the object (Bornefalk et al 2015). Estimating the distribution of object scatter from energy-resolved measurements is a non-trivial problem since the energy spectra of scattered and primary photons overlap in general. However, energy-resolved measurements can be helpful for scatter correction and/or measurements of the scatter distribution (Sossin et al 2016, 2017).
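The dose dependence of the log-normalization error follows from Jensen's inequality (E[ln N] < ln E[N] for Poisson counts N, since ln is concave) and can be checked with a short Monte Carlo sketch; the beam intensities and line integral are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
line_integral = 2.0                  # assumed true attenuation line integral
biases = {}
for n0 in (100000, 1000, 100):       # counts in the unattenuated reference beam
    counts = rng.poisson(n0 * np.exp(-line_integral), size=1000000)
    counts = np.maximum(counts, 1)   # guard against log(0) at very low counts
    estimate = -np.log(counts / n0)  # log-normalization step
    biases[n0] = estimate.mean() - line_integral
    print(f"N0 = {n0:6d}: bias = {biases[n0]:+.5f}")
# The positive bias of the line-integral estimate grows as the dose decreases.
```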
Although more research is needed to address these challenges, measurements of K-edge contrast agents such as gold or iodine in basis images generated with prototype spectral CT systems have been shown to correlate well with the actual concentrations of these elements (Si-Mohamed et al 2017b).

Summary and outlook
After many years of technological advancements, PCDs are now on the verge of being adopted in clinical x-ray computed tomography. During the last decade in particular, there has been substantial research activity in the field of photon-counting CT, both in hardware development and in terms of improved understanding of how the resulting image quality is impacted by the detector performance. As outlined in the above sections, there is now a profound understanding of how detector design parameters (e.g. the choice of detector material and pixel size) and choices in the signal processing chain (e.g. shaping time and threshold locations) affect detector properties such as quantum efficiency, spatial and energy resolution, and pileup response. Likewise, there has been considerable progress towards understanding how these detector properties impact the resulting image quality, as outlined in section 6.
The challenge posed by pileup at high count rates, which was previously the main obstacle for the adoption of PCDs in clinical CT, has been largely overcome with the latest generation of detectors. In addition, there has been considerable progress in techniques for minimizing the detrimental effects of charge sharing, fluorescence, Compton scatter, and incomplete charge collection. Consequently, there are now several photon-counting CT prototypes available and used for imaging human volunteers. A first-generation photon-counting CT already performs on par with conventional energy-integrating CT for unenhanced density imaging at normal dose (Symons et al 2016), so there is evidently room for improvement with better detectors in the future. The adoption of PCDs for CT also enables new imaging techniques that are still in the research stage, such as combining photon-counting CT with dual-energy acquisition (Faby et al 2015, Tao et al 2019a) or with phase-contrast imaging (Epple et al 2015).
Looking ahead, continued improvements in detector technology coupled with more accurate physics modeling and data-processing methods are expected to bring the technology closer to its full potential, leading to substantial improvements in image quality in terms of contrast and spatial resolution at lower dose, to quantitatively reliable images, and to improved material discrimination capabilities. This is anticipated to improve diagnostic performance in a diversity of application areas such as cardiovascular, neurological, thoracic, oncologic, and pediatric imaging, just to name a few. We can expect the coming decade to be even more productive than the last one for the field of photon-counting CT.