Rationale and Objectives. Computer-aided detection (CAD) systems are frequently compared using free-response receiver operating characteristic (FROC) curves. Although ample statistical methods exist for comparing FROC curves, comparing the outcomes of 2 CAD systems applied in a typical clinical setting raises the additional matter of correctly determining the system operating point. This article shows how the effect of sampling error on the determination of the correct CAD operating point can be captured. By incorporating this uncertainty, a method is presented that allows estimation of the probability that a particular CAD system performs better than another on unseen data in a clinical setting.
Materials and Methods. The distribution of possible clinical outcomes from 2 artificial CAD systems with different FROC curves is examined. The sampling error is captured by the distribution of possible thresholds of the classifying machine that yield a specified sensitivity on the training sample. After a measure of superiority is introduced, the probability of one system being superior to the other can be determined.
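The idea of a threshold distribution can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration rather than the article's actual model: lesion scores and false-positive candidate scores are taken to be unit-variance Gaussians, each image is assumed to produce a fixed average number of candidates, and the function names, parameter values, and the final comparison (which system ends up with fewer false positives per image) are hypothetical stand-ins for the article's artificial CAD systems and its measure of superiority.

```python
import random
from statistics import NormalDist, mean

def sample_threshold(n_lesions, lesion_mean, target_sens, rng):
    """Threshold that yields target_sens on one simulated training sample."""
    scores = sorted(rng.gauss(lesion_mean, 1.0) for _ in range(n_lesions))
    # Lesions scoring at or above the threshold count as detected, so the
    # threshold is the score below which a (1 - target_sens) fraction falls.
    k = int(round((1.0 - target_sens) * n_lesions))
    k = min(max(k, 0), n_lesions - 1)
    return scores[k]

def clinical_operating_point(threshold, lesion_mean, candidates_per_image):
    """Population sensitivity and false positives per image at a threshold."""
    sens = 1.0 - NormalDist(lesion_mean, 1.0).cdf(threshold)
    fp_rate = candidates_per_image * (1.0 - NormalDist(0.0, 1.0).cdf(threshold))
    return sens, fp_rate

def simulate(n_lesions, lesion_mean, target_sens=0.9,
             candidates_per_image=5.0, n_rep=2000, seed=0):
    """Distribution of clinical operating points over repeated training samples."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_rep):
        t = sample_threshold(n_lesions, lesion_mean, target_sens, rng)
        points.append(clinical_operating_point(t, lesion_mean, candidates_per_image))
    return points

# Two hypothetical systems, each tuned on an independent sample of 100 lesions.
system_a = simulate(100, lesion_mean=2.5, seed=1)
system_b = simulate(100, lesion_mean=2.2, seed=2)

mean_sens_a = mean(s for s, _ in system_a)

# Stand-in comparison (an assumption, not the article's superiority measure):
# fraction of independent draws in which A has fewer false positives per image.
p_a_fewer_fp = sum(fa < fb for (_, fa), (_, fb) in zip(system_a, system_b)) / len(system_a)

print(f"mean clinical sensitivity of A: {mean_sens_a:.3f}")
print(f"P(A has fewer FPs per image than B): {p_a_fewer_fp:.3f}")
```

Because the threshold is chosen on a finite sample, both the clinical sensitivity and the false-positive rate scatter around their nominal values, and that scatter is what keeps the comparison between the two systems probabilistic rather than certain.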
Results. It is shown that for 2 typical mammography CAD systems, each trained on independent representative datasets of 100 cases, the FROC curves must be separated by 0.20 false positives per image in order to conclude that there is a 90% probability that one is better than the other in a clinical setting. Also, there is no apparent gain in increasing the size of the training set beyond 100 cases.
Discussion. CAD systems for mammography are modeled for illustrative purposes, but the method presented is applicable to any computer-aided detection system evaluated with FROC curves. The presented method is designed to construct confidence intervals around possible clinical outcomes and to assess the importance of training set size and separation between FROC curves of systems trained on different datasets.
2005. Vol. 12, no 6, 687-694 p.
CAD; performance evaluation; sampling error; confidence interval; mammography; operating point estimation