Purpose: Hippocampus contouring for radiotherapy planning is performed on magnetic resonance (MR) image data because of the structure's poor visibility on computed tomography (CT) data. Deep learning methods for direct CT hippocampus auto-segmentation exist but use MR-based training contours. We investigate whether these can be replaced by CT-based contours without loss in segmentation performance, which would remove MR imaging not only from inference but also from training.
Approach: The hippocampus was contoured by medical experts on MR and CT data of 45 patients. Convolutional neural networks (CNNs) for hippocampus segmentation on CT were trained either on CT-based contours or on propagated MR-based contours. In both cases, their predictions were evaluated against the MR-based contours, which were considered the ground truth. Performance was measured using several metrics, including the Dice score, surface distances, and the contour Dice score. Bayesian dropout was used to estimate model uncertainty.
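For concreteness, the overlap metric and the dropout-based uncertainty estimation could be computed along the following lines. This is a minimal sketch, not the study's actual code; the PyTorch model interface and the sample count are assumptions:

```python
# Minimal sketch: Dice score on binary masks and Monte Carlo (Bayesian)
# dropout uncertainty. Assumes a PyTorch model containing dropout layers.
import numpy as np
import torch

def dice_score(pred, ref):
    """Dice overlap between two binary masks given as numpy arrays."""
    intersection = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0

def mc_dropout(model, image, n_samples=20):
    """Sample segmentations with dropout kept active at inference time."""
    model.train()  # keeps dropout layers stochastic during prediction
    with torch.no_grad():
        probs = torch.stack(
            [torch.sigmoid(model(image)) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0)  # prediction, voxel-wise uncertainty
```

The voxel-wise variance over the sampled predictions then serves as the uncertainty map.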
Results: CNNs trained on propagated MR contours (median Dice 0.67) significantly outperform both those trained on CT contours (0.59) and experts contouring manually on CT (0.59); differences between the latter two are not significant. Training on MR contours also results in lower model uncertainty than training on CT contours. All contouring methods on CT (manual or CNN) perform significantly worse than a CNN segmenting the hippocampus directly on MR (median Dice 0.76). Additional data augmentation by rigid transformations improves the quantitative results, but the difference remains significant.
Conclusions: CT-based training contours for CT hippocampus segmentation cannot replace propagated MR-based contours without significant loss in performance. However, if MR-based contours are used, the resulting segmentations outperform experts in contouring the hippocampus on CT.
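The rigid-transform augmentation mentioned in the results could, for instance, apply the same random in-plane rotation and translation to image and contour mask. The parameter ranges below are illustrative assumptions, not those of the study:

```python
# Hedged sketch of rigid data augmentation for 3D volumes (z, y, x);
# angle and shift ranges are assumptions for illustration only.
import numpy as np
from scipy.ndimage import rotate, shift

def random_rigid_augment(image, label, max_angle=10.0, max_shift=5.0, rng=None):
    """Apply the same random rotation and translation to image and label."""
    if rng is None:
        rng = np.random.default_rng()
    angle = rng.uniform(-max_angle, max_angle)            # degrees, axial plane
    offsets = rng.uniform(-max_shift, max_shift, size=3)  # voxels
    image = rotate(image, angle, axes=(1, 2), reshape=False, order=1)
    label = rotate(label, angle, axes=(1, 2), reshape=False, order=0)
    return shift(image, offsets, order=1), shift(label, offsets, order=0)
```

Nearest-neighbor interpolation (order=0) keeps the label mask binary, while linear interpolation is used for the image.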
While deep-learning-based methods for medical deformable image registration have recently shown significant advances in both speed and accuracy, methods for use in radiotherapy are still rarely proposed due to several challenges, such as low contrast and artifacts in cone-beam CT (CBCT) images or extreme deformations. The aim of image registration in radiotherapy is to align a baseline CT image with low-dose CBCT images, which allows contours to be propagated and applied doses to be tracked over time. To this end, we present a novel deep learning method for multi-modal deformable CT-CBCT registration. We train a CNN in a weakly supervised manner, optimizing an edge-based image similarity and a deformation regularizer that includes a penalty for local changes of topology and for foldings. Additionally, we measure the alignment of given segmentations to address the problem of extreme deformations. Our method receives only a CT and a CBCT image as input and uses ground-truth segmentations exclusively during training. Furthermore, it does not depend on the availability of difficult-to-obtain ground-truth deformation vector fields. We train and evaluate our method on follow-up image pairs of the pelvis and compare our results to conventional iterative registration algorithms. Our experiments show that the registration accuracy of our deep learning approach is superior to iterative registration without additional guidance by segmentations, and nearly as good as iterative structure-guided registration that requires ground-truth segmentations. Furthermore, our deep learning method runs approximately 100 times faster than the iterative methods.
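A weakly supervised loss of the kind described above could combine an edge-based similarity (here a normalized-gradient-field-style term, which is an assumption), a soft Dice term on the warped training segmentations, and a folding penalty on the Jacobian determinant of the deformation. The following PyTorch sketch is illustrative and is not the authors' implementation; the tensor layouts are assumptions:

```python
# Illustrative sketch of a weakly supervised registration loss. Assumes
# images of shape (B, 1, D, H, W) and a dense displacement field of shape
# (B, 3, D, H, W) in voxel units.
import torch
import torch.nn.functional as F

def edge_similarity(fixed, warped, eps=1e-2):
    """Edge-based similarity: compare local gradient directions."""
    def grads(x):  # forward differences, zero-padded back to input shape
        gz = x[:, :, 1:, :, :] - x[:, :, :-1, :, :]
        gy = x[:, :, :, 1:, :] - x[:, :, :, :-1, :]
        gx = x[:, :, :, :, 1:] - x[:, :, :, :, :-1]
        return [F.pad(gz, (0, 0, 0, 0, 0, 1)),
                F.pad(gy, (0, 0, 0, 1)),
                F.pad(gx, (0, 1))]
    gf, gw = grads(fixed), grads(warped)
    num = sum(a * b for a, b in zip(gf, gw))
    den = (torch.sqrt(sum(a * a for a in gf) + eps ** 2)
           * torch.sqrt(sum(b * b for b in gw) + eps ** 2))
    return 1.0 - (num / den).pow(2).mean()

def folding_penalty(disp):
    """Penalize foldings: negative Jacobian determinants of id + disp."""
    dz = disp[:, :, 1:, :-1, :-1] - disp[:, :, :-1, :-1, :-1]
    dy = disp[:, :, :-1, 1:, :-1] - disp[:, :, :-1, :-1, :-1]
    dx = disp[:, :, :-1, :-1, 1:] - disp[:, :, :-1, :-1, :-1]
    J = torch.stack([dz, dy, dx], dim=1)  # (B, 3, 3, D-1, H-1, W-1)
    J = J + torch.eye(3, device=disp.device).view(1, 3, 3, 1, 1, 1)
    det = (J[:, 0, 0] * (J[:, 1, 1] * J[:, 2, 2] - J[:, 1, 2] * J[:, 2, 1])
         - J[:, 0, 1] * (J[:, 1, 0] * J[:, 2, 2] - J[:, 1, 2] * J[:, 2, 0])
         + J[:, 0, 2] * (J[:, 1, 0] * J[:, 2, 1] - J[:, 1, 1] * J[:, 2, 0]))
    return F.relu(-det).mean()

def soft_dice_loss(warped_seg, fixed_seg, eps=1e-6):
    """Alignment of the (warped) training segmentations."""
    inter = (warped_seg * fixed_seg).sum()
    return 1.0 - 2.0 * inter / (warped_seg.sum() + fixed_seg.sum() + eps)

# Total training loss; the weights alpha and beta are hyperparameter
# assumptions:
# loss = edge_similarity(fixed, warped) \
#      + alpha * soft_dice_loss(warped_seg, fixed_seg) \
#      + beta * folding_penalty(disp)
```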
Quantitative comparison of automatic multi-organ segmentation results by means of Dice scores often does not yield a satisfactory assessment, and it is especially challenging when the reference contours themselves may be prone to errors. We developed a novel approach that analyzes regions of high mismatch between automatic and reference segmentations. We extract various metrics characterizing these mismatch clusters and compare them to other metrics derived from volume overlap and surface distance histograms by correlating them with qualitative ratings from clinical experts. We show that some novel features based on the mismatch sets or surface distance histograms performed better than the Dice score. We also show how the mismatch clusters can be used to generate visualizations that reduce the workload of visually inspecting segmentation results. The visualizations directly compare the reference to the automatic result at locations of high mismatch, in orthogonal 2D views and in 3D scenes zoomed to the relevant positions. This can make it easier to detect systematic problems of an algorithm or to compare recurrent error patterns across variants of a segmentation algorithm, such as differently parameterized or trained CNN models.
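Mismatch clusters of this kind can be obtained as connected components of the symmetric difference between the two masks. The sketch below uses scipy and illustrative cluster features (size and bounding box); it does not reproduce the paper's exact feature set:

```python
# Sketch: connected components of the mismatch between automatic and
# reference masks; feature extraction here is illustrative only.
import numpy as np
from scipy import ndimage

def mismatch_clusters(auto_seg, ref_seg, min_voxels=10):
    """Find connected regions where two binary masks disagree."""
    mismatch = np.logical_xor(auto_seg, ref_seg)
    labels, n_clusters = ndimage.label(mismatch)
    clusters = []
    for i, bbox in enumerate(ndimage.find_objects(labels), start=1):
        size = int(np.count_nonzero(labels[bbox] == i))
        if size >= min_voxels:  # skip tiny clusters unlikely to matter
            clusters.append({"label": i, "voxels": size, "bbox": bbox})
    return sorted(clusters, key=lambda c: c["voxels"], reverse=True)
```

The per-cluster bounding boxes can then drive the zoomed 2D views and 3D scenes described above.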
Adaptive radiotherapy (RT) planning requires segmentation of organs to adapt the RT treatment plan to changes in the patient's anatomy. Daily imaging is often done using cone-beam CT (CBCT) devices, which produce images of considerably lower quality than CT images due to scatter and artifacts. Involuntary patient motion during the comparatively long CBCT acquisition may cause misalignment artifacts. In the pelvis, the most severe artifacts stem from motion of air and soft-tissue boundaries in the bowel, which appears as streaking in the reconstructed images. In addition to low soft-tissue contrast, this makes segmentation of organs close to the bowel, such as the bladder and uterus, even more difficult. Deep learning (DL) methods have proven promising for difficult segmentation tasks. In this work, we investigate different artifact-driven sampling schemes that incorporate domain knowledge into DL training. However, global evaluation metrics such as the Dice score, often used in DL segmentation research, reveal little about systematic errors and offer no clear perspective on how to improve training. Using slice-wise Dice scores, we find a clear difference in performance between slices with and without detected air. Moreover, especially when applied in a curriculum training scheme, specifically sampling slices on which air has been detected may increase the robustness of deep neural networks to artifacts while maintaining performance on artifact-free slices.
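A slice-wise Dice evaluation and an air-driven sampling scheme could look as follows. The intensity threshold, the body mask, and the sampling boost are illustrative assumptions (CBCT intensities are not necessarily calibrated HU), not the paper's settings:

```python
# Sketch: per-slice Dice scores and artifact-driven slice sampling weights.
import numpy as np

def slicewise_dice(pred, ref):
    """Dice score per axial slice of two binary 3D masks."""
    scores = []
    for p, r in zip(pred, ref):
        denom = p.sum() + r.sum()
        scores.append(2.0 * np.logical_and(p, r).sum() / denom if denom else 1.0)
    return np.array(scores)

def air_sampling_weights(cbct_volume, body_mask, air_hu=-300.0, boost=3.0):
    """Upweight slices in which air is detected inside the body mask."""
    has_air = np.array(
        [np.any((s < air_hu) & m) for s, m in zip(cbct_volume, body_mask)],
        dtype=float)
    weights = 1.0 + (boost - 1.0) * has_air  # boosted weight for air slices
    return weights / weights.sum()           # normalized sampling probabilities
```

In a curriculum scheme, the boost factor could be increased (or decreased) over the course of training.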
The segmentation of organs at risk is a crucial and time-consuming step in radiotherapy planning. Good automatic methods can significantly reduce the time clinicians have to spend on this task. Because of the parotid gland's variability in shape and its low contrast to surrounding structures, its segmentation is challenging. Motivated by the recent success of deep learning, we study the use of two-dimensional (2-D), 2-D ensemble, and three-dimensional (3-D) U-Nets for segmentation. The mean Dice similarity to ground truth is ∼0.83 for all three models. A patch-based approach for class balancing appears promising for false-positive reduction. The 2-D ensemble and 3-D U-Net are applied to the test data of the 2015 MICCAI challenge on head and neck auto-segmentation. Both deep learning methods generalize well to independent data (Dice 0.865 and 0.88) and are superior to a selection of model- and atlas-based methods with respect to the Dice coefficient. Since appropriate reference annotations are essential for training but often difficult and expensive to obtain, it is important to know how many samples are needed for training. We evaluate the performance after training with different-sized training sets and observe no significant increase in the Dice coefficient for more than 250 training cases.
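One common form of patch-based class balancing, given here purely as an assumed illustration of the idea, draws a fixed fraction of training patches centered on foreground voxels:

```python
# Hedged sketch of class-balanced patch sampling; the foreground fraction
# is an assumption, and border handling (cropping/padding) is omitted.
import numpy as np

def sample_patch_center(label_volume, fg_fraction=0.5, rng=None):
    """Pick a patch center, biased toward foreground to balance classes."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < fg_fraction and label_volume.any():
        fg = np.argwhere(label_volume > 0)
        return tuple(fg[rng.integers(len(fg))])  # center on a random fg voxel
    return tuple(rng.integers(0, s) for s in label_volume.shape)
```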
The segmentation of target structures and organs at risk is a crucial and very time-consuming step in radiotherapy planning. Good automatic methods can significantly reduce the time clinicians have to spend on this task. Because of its variability in shape and its often low contrast to surrounding structures, the parotid gland is especially challenging to segment. Motivated by the recent success of deep learning, we study different deep learning approaches for parotid gland segmentation. In particular, we compare 2D, 2D ensemble, and 3D U-Net approaches and find that the 2D U-Net ensemble yields the best results, with a mean Dice score of 0.817 on our test data. The ensemble approach reduces false positives without the need for automatic region-of-interest detection. We also apply our trained 2D U-Net ensemble to segment the test data of the 2015 MICCAI head and neck auto-segmentation challenge. With a mean Dice score of 0.861, our classifier exceeds the highest mean score in the challenge. This shows that the method generalizes well to data from independent sites. Since appropriate reference annotations are essential for training but often difficult and expensive to obtain, it is important to know how many samples are needed to properly train a neural network. We evaluate classifier performance after training with differently sized training sets (50–450 cases) and find that 250 cases (without extensive data augmentation) are sufficient to obtain good results with the 2D ensemble; adding more samples does not significantly improve the Dice score of the segmentations.
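At inference time, a 2D U-Net ensemble can simply average the per-model probability maps before thresholding. The sketch below assumes PyTorch models operating on axial slices and is not the paper's implementation:

```python
# Minimal sketch of ensemble inference over a 3D volume with 2D networks.
import numpy as np
import torch

def ensemble_segment(models, volume, threshold=0.5):
    """Average slice-wise predictions of several 2D networks, then threshold."""
    probs = np.zeros(volume.shape, dtype=np.float32)
    with torch.no_grad():
        for model in models:
            model.eval()
            for z, sl in enumerate(volume):
                x = torch.from_numpy(sl[None, None].astype(np.float32))
                probs[z] += torch.sigmoid(model(x))[0, 0].numpy()
    return (probs / len(models)) >= threshold  # binary segmentation mask
```

Averaging probabilities rather than binary votes lets confident models outweigh uncertain ones, which is one plausible way such an ensemble can suppress false positives.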