Neural network-based video segmentation has proven effective for temporally coherent segmentation and motion tracking of heart substructures in echocardiography. However, prior methods confine analysis to half-heartbeat systolic clips from end-diastole (ED) to end-systole (ES), requiring these frames to be specified in the video and limiting clinical applicability. Here we introduce CLAS-FV, a fully automated framework that extends this prior work to joint semantic segmentation and motion tracking in multi-beat echocardiograms. Our framework employs a modified R2+1D ResNet stem, which efficiently encodes spatiotemporal features, and leverages sliding windows for both training and test-time augmentation to accommodate the full cardiac cycle. First, through 10-fold cross-validation on the half-beat CAMUS dataset, we show that the R2+1D-based stem outperforms the prior 3D U-Net both in Dice overlap for all substructures and in the derived clinical indices of ED and ES ventricular volumes and ejection fraction (EF). Next, we use the large clinical EchoNet-Dynamic dataset to extend our framework to full multi-beat video segmentation. We obtain mean Dice overlaps of 0.94/0.91 on the left ventricle endocardium in the ED/ES phases, and accurately infer EF (mean absolute error 5.3%) over 1269 test patients. The presented multi-heartbeat video segmentation framework promises fast and coherent segmentation and motion tracking for rich phenotypic analysis of echocardiography.
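For context, the "R2+1D" stem refers to the R(2+1)D factorization of Tran et al. (2018), which replaces each full 3D convolution with a 2D spatial convolution followed by a 1D temporal convolution at a matched parameter budget. A minimal PyTorch sketch of one such block follows; the class name and channel widths are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class R2Plus1DBlock(nn.Module):
    """Factorized spatiotemporal convolution: 2D spatial conv, then 1D temporal conv.

    Tensors are shaped (batch, channels, time, height, width). This is an
    illustrative block, not the exact CLAS-FV stem.
    """

    def __init__(self, in_ch, out_ch, mid_ch=None):
        super().__init__()
        # Intermediate width chosen so the factorized pair has roughly the
        # same parameter count as one full 3x3x3 convolution (Tran et al.).
        if mid_ch is None:
            mid_ch = (3 * 3 * 3 * in_ch * out_ch) // (3 * 3 * in_ch + 3 * out_ch)
        self.spatial = nn.Conv3d(in_ch, mid_ch, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), bias=False)
        self.bn1 = nn.BatchNorm3d(mid_ch)
        self.temporal = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.spatial(x)))      # convolve within frames
        return self.relu(self.bn2(self.temporal(x)))  # convolve across frames

# Example: a 32-frame, 112x112 grayscale echo clip.
clip = torch.randn(1, 1, 32, 112, 112)
print(R2Plus1DBlock(1, 64)(clip).shape)  # torch.Size([1, 64, 32, 112, 112])
```

Sliding-window inference over a multi-beat video would then run such a stem on overlapping fixed-length windows and fuse the overlapping predictions, for example by averaging.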
Existing deep-learning methods achieve state-of-the-art segmentation of multiple heart substructures from 2D echocardiography videos, an important step in the diagnosis and management of cardiovascular disease. However, these methods generally perform frame-level segmentation, ignoring the temporal coherence of heart motion between frames, which is a useful signal in clinical protocols. In this work, we implement temporally consistent video segmentation, which has recently been shown to improve performance on the multi-structure annotated CAMUS dataset. We show that data augmentation further improves results, consistent with prior state-of-the-art work. Our 10-fold cross-validation shows that video segmentation improves agreement with clinical indices, yielding smaller mean absolute errors for left ventricular end-diastolic volume (8.7 mL vs 9.9 mL), end-systolic volume (6.3 mL vs 6.6 mL), and ejection fraction (EF) (4.6% vs 5.3%). In segmenting key cardiac structures, video segmentation achieves mean Dice overlaps of 0.93 on the left ventricular endocardium, 0.95 on the left ventricular epicardium, and 0.88 on the left atrium. To assess clinical generalizability, we further apply the CAMUS-trained video segmentation models, without tuning, to the larger, recently published EchoNet-Dynamic clinical dataset. On the 1274 patients in its test set, we obtain an absolute EF error of 6.3% ± 5.4%, confirming the reliability of this scheme. Because the EchoNet-Dynamic videos are annotated only for the left ventricle endocardium, this effort extends generalizable, multi-structure video segmentation to a large clinical dataset at little cost.
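For reference, the Dice overlap reported here and below measures agreement between a predicted mask and the manual mask as twice their intersection divided by the sum of their areas. A minimal NumPy sketch (the function name and the empty-mask convention are ours):

```python
import numpy as np

def dice_overlap(pred, truth):
    """Dice coefficient between two binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # both masks empty: score as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```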
Segmentation of heart substructures in 2D echocardiography images is an important step in the diagnosis and management of cardiovascular disease. Given the ubiquity of echocardiography in routine cardiology practice, the time-consuming nature of manual segmentation, and the high degree of inter-observer variability, fully automatic segmentation is a goal common to both clinicians and researchers. The recent publication of the annotated CAMUS dataset will help catalyze these efforts. In this work we develop and validate against this dataset a deep fully convolutional neural network architecture for the multi-structure segmentation of echocardiography, including the left ventricular endocardium and epicardium, and the left atrium. In ten-fold cross-validation with data augmentation, we obtain mean Dice overlaps of 0.93, 0.95, and 0.89 on the three structures respectively, representing the state of the art on this dataset. We further report small biases and narrow limits of agreement between the automatic and manual segmentations in derived clinical indices, including median absolute errors for left ventricular diastolic (7.4 mL) and systolic (4.8 mL) volumes, and ejection fraction (4.1%), within previously reported inter-observer variability. These encouraging results must still be validated against large-scale independent clinical data.
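The clinical indices in these abstracts follow directly from the segmentations: ejection fraction is EF (%) = (EDV − ESV) / EDV × 100, where the end-diastolic and end-systolic LV volumes are conventionally estimated from the endocardial contours by Simpson's method of disks (CAMUS provides two apical views for the biplane variant). A hedged single-plane sketch, with illustrative names and units:

```python
import numpy as np

def lv_volume_mod(diameters_mm, long_axis_mm):
    """Single-plane Simpson's method of disks: stack N circular disks whose
    diameters are measured at evenly spaced levels along the LV long axis.
    Returns volume in mL (1 mL = 1000 mm^3)."""
    d = np.asarray(diameters_mm, dtype=float)
    disk_height = long_axis_mm / len(d)
    return float(np.sum(np.pi * (d / 2.0) ** 2 * disk_height) / 1000.0)

def ejection_fraction(edv_ml, esv_ml):
    """EF (%) from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml
```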