The detection of prostate cancer within H&E-stained tissue slides helps identify the presence, location, and extent of the disease. Machine learning approaches have been developed to accomplish this task for biopsies and radical prostatectomies. Deep learning approaches, primarily using convolutional neural networks (CNNs), have demonstrated substantial ability to identify cancer in each of these sample types. Some models trained on biopsies have even been approved for clinical application. Biopsies are small tissue samples acquired through a syringe to establish cancer diagnosis, while radical prostatectomies are large planar sections of prostate tissue acquired post-surgery. This leads to differing morphology and heterogeneity between sample types, making it unclear whether algorithms trained on biopsies are robust enough to perform well on radical prostatectomies and vice versa. Our goal was to investigate whether morphological differences between sample types affected the performance of radical prostatectomy-trained CNN models when applied to biopsies and vice versa. Radical prostatectomies (N=100) and biopsies (N=50) were acquired from the University of Pennsylvania to train (80%) and validate (20%) a CNN using the DenseNet architecture for biopsies (MB), radical prostatectomies (MR), and a combined dataset (MB+R). Model performance was compared using sensitivity, specificity, and F1 score. To isolate the effect of morphological differences on performance, we acquired all data from the same institution, used UMAP to check that no batch effects were present, applied data augmentation during training, and applied stain normalization. Additionally, the radical prostatectomy training set was optimized for transfer to biopsies by reducing heterogeneity (MR*). This reduction was based on distance from the centroid of a texture feature distribution.
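The centroid-based reduction can be illustrated with a minimal sketch. The abstract does not state which texture features, distance metric, or retention threshold were used, so the Euclidean distance, the `keep_fraction` parameter, and the function name `select_near_centroid` below are all assumptions made for illustration, not the authors' implementation.

```python
import math


def select_near_centroid(features, keep_fraction=0.5):
    """Return indices of the samples closest to the feature centroid.

    features: list of equal-length numeric vectors, one per training tile
              (stand-ins for texture features; the actual features are
              not specified in the abstract).
    keep_fraction: hypothetical parameter, the share of samples retained.
    """
    if not features:
        return []
    n_dims = len(features[0])
    # Centroid = per-dimension mean over all samples.
    centroid = [
        sum(f[d] for f in features) / len(features) for d in range(n_dims)
    ]
    # Euclidean distance to the centroid (assumed metric).
    distances = [math.dist(f, centroid) for f in features]
    # Keep at least one sample; truncate the count toward zero otherwise.
    n_keep = max(1, int(keep_fraction * len(features)))
    order = sorted(range(len(features)), key=lambda i: distances[i])
    return sorted(order[:n_keep])


# Toy example: the outlier at [10, 10] and the farther of the remaining
# points are dropped, leaving a less heterogeneous subset.
tiles = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5], [10.0, 10.0]]
print(select_near_centroid(tiles, keep_fraction=0.5))  # prints [1, 2]
```

Discarding samples far from the centroid narrows the spread of the training distribution, which is one plausible reading of "reducing heterogeneity"; other choices (for example, a Mahalanobis distance or a fixed distance cutoff) would fit the description equally well.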
Models performed well when applied to their own sample type (F1 > 0.88) but performed poorly when applied across sample types (F1 < 0.65). Reducing heterogeneity to optimize the radical prostatectomy training set resulted in MR* (F1=0.72) noticeably outperforming MR (F1=0.53) on biopsies. This indicates that differences in morphology and heterogeneity drive performance differences between cancer detection models trained on different sample types. Our results suggest that these morphological differences can be overcome with sample-type-specific models or training set optimization methods.
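For reference, the three reported metrics follow directly from the confusion-matrix counts. The sketch below assumes binary labels with cancer encoded as 1 and benign as 0; the level at which predictions were scored (tile, slide, or patient) is not stated in the abstract, and the function name `binary_metrics` is illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Compute sensitivity, specificity, and F1 for binary labels.

    Assumes 1 = cancer (positive) and 0 = benign (negative).
    Ratios with a zero denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}


# Toy labels: 3 true positives, 1 false negative, 3 true negatives,
# 1 false positive, so all three metrics equal 0.75.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(truth, predicted))
```

Because F1 ignores true negatives, it is sensitive to the cancer prevalence in each evaluation set, which is worth keeping in mind when comparing scores across sample types.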