Computer-aided detection (CAD) algorithms automatically identify lung nodules on thoracic multi-slice CT (MSCT)
scans, thereby providing physicians with a computer-generated 'second opinion'. While CAD systems can achieve
high sensitivity, their limited specificity has hindered clinical acceptance. To overcome this problem, we propose a false
positive reduction (FPR) system based on image processing and machine learning to reduce the number of false positive
lung nodules identified by CAD algorithms and thereby improve system specificity.
To discriminate between true and false nodules, twenty-three 3D features were calculated from each candidate nodule's
volume of interest (VOI). A genetic algorithm (GA) and support vector machine (SVM) were then used to select an
optimal subset of features from this pool of candidate features. Using this feature subset, we trained an SVM classifier to
eliminate as many false positives as possible while retaining all the true nodules. To overcome the imbalanced nature of
typical datasets (significantly more false positives than true positives), an intelligent data selection algorithm was
designed and integrated into the machine learning framework, thus further improving the FPR rate.
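The GA-driven feature selection described above can be illustrated with a minimal sketch. The abstract does not specify the GA operators, so tournament selection, single-point crossover, bit-flip mutation, elitism, and all parameter values below are assumptions. In the proposed system the fitness of a feature subset would come from SVM performance; here a hypothetical stand-in, toy_fitness, rewards an arbitrary set of "informative" feature indices and penalizes subset size so that the example stays self-contained.

```python
import random

N_FEATURES = 23  # size of the candidate feature pool, as in the text


def ga_select(n_features, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.02, seed=0):
    """Search for a boolean feature mask that maximizes `fitness`.

    Returns (best_mask, history), where history[i] is the best fitness
    after generation i. All operators and defaults are illustrative.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    history = []
    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in pop]

        def tournament():
            # Pick two individuals at random and keep the fitter one.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        next_pop = [best[:]]  # elitism: the current best always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: toggle each feature with small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop
        best = max(pop, key=fitness)
        history.append(fitness(best))
    return best, history


# Hypothetical stand-in for an SVM-based fitness: these indices are
# arbitrary and do not correspond to features from the paper.
INFORMATIVE = {2, 5, 11, 17}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)  # reward useful features, penalize size


best_mask, history = ga_select(N_FEATURES, toy_fitness)
selected = [i for i, keep in enumerate(best_mask) if keep]
print("selected feature indices:", selected)
```

In practice, toy_fitness would be replaced by a function that trains and validates a classifier on the masked features; because of elitism, the best fitness in history never decreases from one generation to the next.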
Three independent datasets were used to train and validate the system. Using two datasets for training and the third for
validation, we achieved a 59.4% FPR rate while removing one true nodule from the validation dataset. In a second
experiment, 75% of the cases were randomly selected from each of the three datasets and the remaining cases were used
for validation. A similar FPR rate and true positive retention rate were achieved. Additional experiments showed that
GA feature selection integrated with the proposed data selection algorithm outperformed GA feature selection without it
by 5%-10% in FPR rate.
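The two quantities reported above, the FPR rate and the true positive retention rate, can be made concrete with a small sketch. It assumes each candidate has a classifier score and a ground-truth label, and that the decision threshold is set to the lowest score among true nodules so that none are discarded; this thresholding rule, the function names, and the toy scores are illustrative assumptions and are not taken from the paper.

```python
def threshold_retaining_all(scores, labels):
    """Lowest score among true nodules; keeping score >= threshold loses none."""
    return min(s for s, y in zip(scores, labels) if y)


def fpr_metrics(labels, kept):
    """labels[i]: True if candidate i is a true nodule.
    kept[i]:   True if the classifier keeps candidate i."""
    n_fp = sum(1 for y in labels if not y)
    n_tp = sum(1 for y in labels if y)
    fp_removed = sum(1 for y, k in zip(labels, kept) if not y and not k)
    tp_lost = sum(1 for y, k in zip(labels, kept) if y and not k)
    return {
        "fpr_rate": fp_removed / n_fp,            # fraction of false positives removed
        "tp_retention": (n_tp - tp_lost) / n_tp,  # fraction of true nodules kept
        "tp_lost": tp_lost,
    }


# Toy data: 2 true nodules followed by 8 false positives (made-up scores).
labels = [True, True] + [False] * 8
scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.35, 0.38, 0.5, 0.6, 0.7]

threshold = threshold_retaining_all(scores, labels)
kept = [s >= threshold for s in scores]
metrics = fpr_metrics(labels, kept)
print(metrics)
```

On this toy data, five of the eight false positives fall below the threshold, giving an FPR rate of 62.5% with both true nodules retained.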
The proposed methods can also be applied to other application areas, such as computer-aided diagnosis of lung nodules.