Combining deep learning and adaptive sparse modeling for low-dose CT reconstruction
17 October 2022
Ling Chen, Zhishen Huang, Yong Long, Saiprasad Ravishankar
Proceedings Volume 12304, 7th International Conference on Image Formation in X-Ray Computed Tomography; 123041S (2022) https://doi.org/10.1117/12.2647190
Event: Seventh International Conference on Image Formation in X-Ray Computed Tomography (ICIFXCT 2022), 2022, Baltimore, United States
Abstract
Traditional model-based image reconstruction (MBIR) methods combine forward and noise models with simple object priors. Recent deep learning methods for image reconstruction provide a successful data-driven approach to handling measurement undersampling and various types of noise. In this work, we propose a hybrid supervised-unsupervised learning framework for X-ray computed tomography (CT) image reconstruction. The proposed learning formulation leverages both sparsity or unsupervised learning-based priors and neural network reconstructors to simulate a fixed-point iteration process. Each proposed trained block consists of a deterministic MBIR solver and a neural network. The information flows in parallel through these two reconstructors and is then optimally combined, and multiple such blocks are cascaded to form a reconstruction pipeline. We demonstrate the efficacy of the learned hybrid model for low-dose CT image reconstruction with limited training data, using the NIH AAPM Mayo Clinic Low Dose CT Grand Challenge dataset for training and testing. In our experiments, we study combinations of supervised deep network reconstructors and sparse representation-based (unsupervised) learned or analytical priors. Our results demonstrate the promising performance of the proposed framework compared to recent reconstruction methods.

1.

INTRODUCTION

X-ray computed tomography (CT) is widely used in industrial and clinical applications. Reducing the X-ray dose lowers patients' radiation exposure during scans, but it also creates challenges for image reconstruction. Conventional CT image reconstruction methods include analytical methods and model-based iterative reconstruction (MBIR) methods [1]. The performance of analytical methods such as filtered back-projection (FBP) [2] degrades due to the greater influence of noise in the low X-ray dose setting. MBIR methods aim to address such performance degradation in the low-dose X-ray computed tomography (LDCT) setting. They often use penalized weighted least squares (PWLS) reconstruction formulations involving simple priors for the underlying object, such as edge-preserving (EP) regularization, which assumes the image is approximately sparse in the gradient domain. More recent dictionary learning-based methods [3] provide improved image reconstruction quality compared to nonadaptive MBIR schemes, but involve expensive computations for sparse coding. Recent PWLS methods with regularizers involving learned sparsifying transforms (PWLS-ST [4]) or a union of learned transforms (PWLS-ULTRA [5]) combine computational efficiency (cheap sparse coding in the transform domain) with the representation power of learned models (transforms).

Data-driven (deep) learning approaches have also demonstrated success for LDCT image reconstruction (see [6] for a review). FBPConvNet [7] is a convolutional neural network (CNN) scheme that refines FBP-reconstructed (corrupted) CT images to match target or ground-truth images. Another approach, WavResNet [8], learns a set of filters used to construct the encoder and decoder of a convolutional framelet denoiser that refines crude LDCT images. However, deep learning methods often require large training sets for effective learning and generalization. Methods based on sparsifying transform learning typically require small training sets and have been shown to generalize reasonably to new data [5]. Hence, Ye et al. [9] proposed a unified supervised-unsupervised learning framework (referred to here as Serial SUPER) for LDCT image reconstruction that combines supervised deep learning and unsupervised transform learning (ULTRA) regularization for robust reconstruction. The framework alternates between a neural network-based denoising step and minimizing a cost function with data-fidelity, deep network, and learned transform terms.

In this work, we propose an alternative, repeated parallel combination of deep network reconstructions and transform learning-based reconstructions (dubbed Parallel SUPER) for improved LDCT image reconstruction. We show that the adaptive transform sparsity-based image features complement the deep network's learned features in every layer when combined with appropriate weights, yielding better reconstructions than either the deep network or transform learning-based baselines alone. The proposed Parallel SUPER method also outperforms the recent Serial SUPER scheme in our experiments.

2.

PARALLEL SUPER MODEL AND THE ALGORITHM

The proposed Parallel SUPER reconstruction model is shown in Fig. 1. Each layer of Parallel SUPER comprises a neural network and a PWLS-based LDCT solver with sparsity-promoting, data-adaptive regularizers. The images in the pipeline flow in parallel through these two components in each layer and are then combined with an appropriately chosen weight. The framework cascades multiple such Parallel SUPER layers to ensure empirical convergence of the reconstruction. In this work, we use the FBPConvNet model as the supervised module and PWLS-ULTRA with a pre-learned union of transforms as the unsupervised reconstruction module. However, the specific modules deployed in the Parallel SUPER framework can be replaced with other parametric models or MBIR methods.

Figure 1:

Overall structure of the proposed Parallel SUPER framework.


2.1

Supervised Module

The supervised modules are trained sequentially. We set the training loss to the root-mean-squared error (RMSE) to enforce alignment between the refined images and the ground-truth images. In the $l$-th Parallel SUPER layer, the optimization problem for training the neural network is:

$$\hat{\theta}^{(l)} = \arg\min_{\theta} \sum_{n=1}^{N} \left\| G_{\theta}\!\left(\tilde{x}_n^{(l-1)}\right) - x_n^{\star} \right\|_2^2, \tag{1}$$

where $G_{\theta^{(l)}}(\cdot)$ denotes the neural network mapping in the $l$-th layer with parameters $\theta^{(l)}$, $\tilde{x}_n^{(l-1)}$ is the $n$-th input image from the $(l-1)$-th layer, and $x_n^{\star}$ is the corresponding regular-dose (reference) image, i.e., the training label. Note that the neural networks in different layers have different parameters.
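For concreteness, the following is a minimal PyTorch-style sketch of this layer-wise training step. The function and variable names (`train_layer`, `fbpconvnet`, `x_in`, `x_ref`) are illustrative assumptions rather than the authors' released code; the optimizer settings mirror those reported in Section 3.2.

```python
# Minimal sketch of training the supervised module of one Parallel SUPER
# layer: fit G_theta^(l) to map previous-layer outputs toward references.
import torch

def train_layer(fbpconvnet, x_in, x_ref, epochs=4, lr=1e-3):
    """Train one layer's network; x_in/x_ref are lists of CHW tensors."""
    opt = torch.optim.SGD(fbpconvnet.parameters(), lr=lr, momentum=0.99)
    loss_fn = torch.nn.MSELoss()  # squared-error surrogate for the RMSE loss
    for _ in range(epochs):
        for xi, xr in zip(x_in, x_ref):        # batch size 1, as in the paper
            opt.zero_grad()
            loss = loss_fn(fbpconvnet(xi.unsqueeze(0)), xr.unsqueeze(0))
            loss.backward()
            opt.step()
    return fbpconvnet
```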

2.2

Unsupervised Module

For the unsupervised module of each layer, we solve the following MBIR problem to reconstruct an image $x \in \mathbb{R}^{N_p}$ from the corresponding noisy sinogram data $y \in \mathbb{R}^{N_d}$:

$$\min_{x \geq 0}\ L(Ax, y) + \beta \mathsf{R}(x), \qquad L(Ax, y) = \frac{1}{2}\left\| y - Ax \right\|_{W}^{2}, \tag{2}$$

where $W = \mathrm{diag}\{w_i\} \in \mathbb{R}^{N_d \times N_d}$ is a diagonal weighting matrix whose elements $w_i$ are the estimated inverse variances of $y_i$, $A \in \mathbb{R}^{N_d \times N_p}$ is the system matrix of the CT scan, $L(Ax, y)$ is the data-fidelity term, the penalty $\mathsf{R}(x)$ is a (learning-based) regularizer, and the parameter $\beta > 0$ controls the noise and resolution trade-off.

In this work, we use the PWLS-ULTRA method to reconstruct an image $x$ from noisy sinogram data $y$ (measurements) with a union of pre-learned transforms $\{\Omega_k\}_{k=1}^{K}$. The image is reconstructed by solving the following nonconvex optimization problem:

$$\hat{x}^{(l)} = \arg\min_{x \geq 0}\ \frac{1}{2}\left\| y - Ax \right\|_{W}^{2} + \beta \min_{\{z_j\},\,\{\mathcal{C}_k\}} \sum_{k=1}^{K} \sum_{j \in \mathcal{C}_k} \left( \left\| \Omega_k P_j x - z_j \right\|_2^2 + \gamma^2 \|z_j\|_0 \right), \tag{3}$$

where $\hat{x}^{(l)}$ is the image reconstructed by the unsupervised solver in the $l$-th layer, the operator $P_j \in \mathbb{R}^{l \times N_p}$ extracts the $j$-th patch of $l$ voxels of image $x$ as $P_j x$, $z_j$ is the corresponding sparse encoding of the patch under a matched transform, and $\mathcal{C}_k$ denotes the indices of patches grouped into the $k$-th cluster with transform $\Omega_k$. Minimizing over $\{\mathcal{C}_k\}$ computes the cluster assignment of each patch. The regularizer $\mathsf{R}$ includes an encoding-error term and an $\ell_0$ sparsity penalty counting the number of nonzero entries, weighted by $\gamma^2$. The sparse encoding and clustering are computed simultaneously. We apply the alternating minimization method from [5] (with inner iterations for updating $x$) to this problem. The algorithm also uses a different (potentially better) initialization in each Parallel SUPER layer, which may benefit solving the involved nonconvex optimization problem.
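To make the inner sparse coding and clustering step concrete, here is a minimal numpy sketch for a single patch under the $\ell_0$ penalty, with hard thresholding at $\gamma$. The names (`encode_and_cluster`, `transforms`, `patch`) are illustrative assumptions, not the authors' implementation.

```python
# Joint sparse-coding / clustering step of PWLS-ULTRA for one vectorized
# patch: pick the transform (cluster) whose thresholded code has least cost.
import numpy as np

def encode_and_cluster(patch, transforms, gamma):
    """Return (best cluster index, sparse code) for one image patch."""
    best_k, best_cost, best_z = None, np.inf, None
    for k, omega in enumerate(transforms):
        coeffs = omega @ patch
        z = np.where(np.abs(coeffs) >= gamma, coeffs, 0.0)  # hard thresholding
        # encoding error + gamma^2 * number of nonzeros (the l0 penalty)
        cost = np.sum((coeffs - z) ** 2) + gamma**2 * np.count_nonzero(z)
        if cost < best_cost:
            best_k, best_cost, best_z = k, cost, z
    return best_k, best_z
```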

2.3

Parallel SUPER Model

The main idea of the Parallel SUPER framework is to combine supervised neural networks and iterative model-based reconstruction solvers in each layer. Define $F(\hat{x}^{(l-1)}, y; \Gamma)$ as an iterative MBIR solver for optimization problem (2) with initial solution $\hat{x}^{(l-1)}$, noisy sinogram data $y$, and hyperparameter setting $\Gamma$. In the $l$-th layer, the Parallel SUPER model is formulated as:

$$\hat{x}^{(l)} = \lambda\, G_{\theta^{(l)}}\!\left(\hat{x}^{(l-1)}\right) + (1 - \lambda)\, F\!\left(\hat{x}^{(l-1)}, y; \Gamma\right), \tag{P0}$$

where $\lambda$ is the nonnegative weight on the neural network output in each layer; it is selected once and fixed across all layers. Each Parallel SUPER layer can thus be viewed as a weighted average of a supervised denoised image and a low-dose image reconstructed by the unsupervised solver. Repeating multiple Parallel SUPER layers simulates a fixed-point iteration that produces the final reconstructed image.

The Parallel SUPER training algorithm based on (P0) is shown in Algorithm 1. The Parallel SUPER reconstruction algorithm is the same except that it uses the learned network weights in each layer.

[Algorithm 1: The Parallel SUPER training algorithm.]
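As a hedged end-to-end sketch of the pipeline in Algorithm 1: below, `pwls_ultra` stands in for the MBIR solver $F(\cdot, y; \Gamma)$ and `networks` holds the per-layer trained networks $G_{\theta^{(l)}}$; the solver interface and all names are assumptions for illustration.

```python
# Reconstruction pass of Parallel SUPER: each layer averages a supervised
# denoised image with a warm-started unsupervised MBIR reconstruction.
def parallel_super(x0, y, networks, pwls_ultra, lam=0.3, n_layers=10):
    """Run L layers of x^(l) = lam*G(x^(l-1)) + (1-lam)*F(x^(l-1), y)."""
    x = x0  # e.g., a PWLS-EP reconstruction (see Section 3.2)
    for G in networks[:n_layers]:
        x_net = G(x)                # supervised refinement of current image
        x_mbir = pwls_ultra(x, y)   # MBIR solve warm-started at x
        x = lam * x_net + (1.0 - lam) * x_mbir
    return x
```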

3.

EXPERIMENTS

3.1

Experiment Setup

In our experiments, we use the Mayo Clinic dataset established for “the 2016 NIH-AAPM-Mayo Clinic Low Dose CT Grand Challenge” [10]. We choose 520 images from 6 of the 10 patients in the dataset, of which 500 slices are used for training and 20 slices for validation. We randomly select 20 images from the remaining 4 patients for testing. We project the regular-dose CT images $x^{\star}$ to sinograms $y$ and add Poisson and additive Gaussian noise as follows:

$$y_i = -\log\left( \max\!\left( \frac{\mathrm{Poisson}\{ I_0\, e^{-[Ax^{\star}]_i} \} + \mathcal{N}\{0, \sigma^2\}}{I_0},\ \epsilon \right) \right),$$

where the original number of incident photons per ray is $I_0 = 10^4$, the Gaussian noise variance is $\sigma^2 = 25$, and $\epsilon$ is a small positive number used to avoid negative measurement data when taking the logarithm [11].
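A minimal numpy sketch of this simulation follows, assuming `Ax` holds the noiseless line integrals $[Ax^{\star}]_i$; the value of `eps` below is an illustrative placeholder, since the paper only states that $\epsilon$ is small and positive.

```python
# Simulate post-log low-dose CT measurements with Poisson photon noise
# and additive Gaussian electronic noise, as described in the text.
import numpy as np

def simulate_lowdose_sinogram(Ax, I0=1e4, sigma=5.0, eps=0.1, rng=None):
    """Generate post-log low-dose data y from noiseless line integrals Ax."""
    rng = rng or np.random.default_rng()
    counts = rng.poisson(I0 * np.exp(-Ax)).astype(float)  # photon noise
    counts += rng.normal(0.0, sigma, size=Ax.shape)       # sigma^2 = 25
    counts = np.maximum(counts, eps)  # clip to avoid log of non-positives
    return np.log(I0 / counts)        # post-log sinogram y, y_i ~ [Ax]_i
```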

We use the Michigan Image Reconstruction Toolbox to construct fan-beam CT geometry with 736 detectors ⨯ 1152 regularly spaced projection views, and a no-scatter mono-energetic source. The width of each detector column is 1.2858 mm, the source to detector distance is 1085.6 mm, and the source to rotation center distance is 595 mm. We reconstruct images of size 512 ⨯ 512 with the pixel size being 0.69 mm ⨯ 0.69 mm.

3.2

Parameter Settings

In the Parallel SUPER model, we use FBPConvNet as the supervised module and PWLS-ULTRA as the unsupervised module. Training the model for 10 layers takes about 10 hours on a GTX Titan GPU. We train models for different values of the parameter $\lambda$ (to then select an optimal value), namely 0.1, 0.3, 0.5, 0.7, and 0.9. When training the supervised modules, we run 4 epochs (kept small to reduce overfitting risk) of the stochastic gradient descent (SGD) optimizer for the FBPConvNet module in each Parallel SUPER layer. The training hyperparameters of FBPConvNet are set as follows: the learning rate decreases logarithmically from 0.001 to 0.0001; the batch size is 1; and the momentum parameter is 0.99. The filters in the various networks are initialized with i.i.d. random Gaussian entries with zero mean and variance 0.005. For the unsupervised module, we pre-learn a union of 5 sparsifying transforms using 12 slices of regular-dose CT images (included among the 500 training slices). We then use this pre-learned union of 5 sparsifying transforms to reconstruct images with 5 outer iterations and 5 inner iterations of PWLS-ULTRA. In both training and reconstruction with ULTRA, we set the parameters $\beta = 5 \times 10^3$ and $\gamma = 20$. The PWLS-EP reconstruction is used as the initialization $\hat{x}^{(0)}$, i.e., the input of the network in the first layer.
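For reference, the settings above can be collected into a single illustrative configuration; the dictionary keys are our own naming, while the values are those stated in the text.

```python
# Hyperparameters of the Parallel SUPER experiments, gathered for clarity.
PARALLEL_SUPER_CONFIG = {
    "n_layers": 10,
    "lambda_candidates": [0.1, 0.3, 0.5, 0.7, 0.9],   # selected: 0.3
    "supervised": {                                    # FBPConvNet module
        "epochs_per_layer": 4, "optimizer": "SGD", "batch_size": 1,
        "lr_range": (1e-3, 1e-4), "momentum": 0.99, "init_weight_var": 0.005,
    },
    "unsupervised": {                                  # PWLS-ULTRA module
        "n_transforms": 5, "outer_iters": 5, "inner_iters": 5,
        "beta": 5e3, "gamma": 20,
    },
}
```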

We compare the proposed Parallel SUPER model with an unsupervised method (PWLS-EP), the standalone supervised module (FBPConvNet), the standalone unsupervised module (PWLS-ULTRA), and the Serial SUPER model. PWLS-EP is a penalized weighted least squares reconstruction method with edge-preserving hyperbola regularization. For PWLS-EP, we set the parameters $\delta = 20$ and $\beta = 2^{15}$ and run 100 iterations to obtain convergent results. For the standalone supervised module (FBPConvNet), we run 100 epochs of training to sufficiently learn the image features with low overfitting risk. For the standalone unsupervised module (PWLS-ULTRA), we use the pre-learned union of 5 sparsifying transforms to reconstruct images, set the parameters $\beta = 10^4$ and $\gamma = 25$, and run 1000 alternations with 5 inner iterations to ensure good performance. For the Serial SUPER model, we run 4 epochs of training when learning the supervised modules (FBPConvNet), and for the unsupervised module (PWLS-ULTRA) we use the pre-learned union of 5 sparsifying transforms, set $\beta = 5 \times 10^3$, $\gamma = 20$, and $\mu = 5 \times 10^5$, and reconstruct with 20 alternations and 5 inner iterations.

3.3

Results

To compare performance quantitatively, we compute the RMSE in Hounsfield units (HU) and the structural similarity index measure (SSIM) [12] of the reconstructed images. For a reconstructed image $\hat{x}$, RMSE is defined as $\mathrm{RMSE} = \sqrt{\sum_{j=1}^{N_p} (\hat{x}_j - x_j^{\star})^2 / N_p}$, where $x_j^{\star}$ denotes the reference regular-dose image intensity at the $j$-th pixel location and $N_p$ is the number of pixels.
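The metrics can be computed as in the short sketch below, assuming images in HU stored as numpy arrays; scikit-image's `structural_similarity` is used here as a stand-in for the SSIM of [12], which is an implementation choice of ours rather than the paper's.

```python
# Evaluation metrics: RMSE in HU and SSIM between reconstruction and reference.
import numpy as np
from skimage.metrics import structural_similarity

def rmse_hu(x_hat, x_ref):
    """Root-mean-squared error over all N_p pixels, in HU."""
    return np.sqrt(np.mean((x_hat - x_ref) ** 2))

def ssim(x_hat, x_ref):
    """SSIM with the data range taken from the reference image."""
    return structural_similarity(x_hat, x_ref,
                                 data_range=x_ref.max() - x_ref.min())
```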

We train the Parallel SUPER framework with different values of the parameter $\lambda$ (0.1, 0.3, 0.5, 0.7, and 0.9) to find the best choice. Fig. 2 shows the evolution of RMSE over the layers on 20 validation slices for the different $\lambda$ values. We obtain the best RMSE with $\lambda = 0.3$.

Figure 2:

RMSE (over 20 validation slices) across layers for different choices of the parameter $\lambda$.


We have conducted experiments on 20 test slices (slices 20, 50, 100, 150, and 200 of patients L067, L143, L192, and L310) of the Mayo Clinic data. Table 1 reports the average image quality over the 20 test images for the different methods. From Table 1, we observe that Parallel SUPER significantly improves image quality compared with the standalone methods. It also achieves 1.8 HU better average RMSE than Serial SUPER, while its SSIM is comparable to that of Serial SUPER. Fig. 3 shows the reconstructions of L067 (slice 50) and L310 (slice 150) using PWLS-ULTRA, FBPConvNet, Serial SUPER (FBPConvNet + PWLS-ULTRA), and Parallel SUPER (FBPConvNet + PWLS-ULTRA), along with the references (ground truth). The Parallel SUPER scheme achieves the lowest RMSE, and the zoomed-in areas show that it reconstructs image details better.

Figure 3:

Reconstruction of slice 50 from patient L067 and reconstruction of slice 150 from patient L310 using various methods. The display window is [800, 1200] HU.


Table 1:

The mean RMSE and SSIM values for 20 test images with PWLS-EP, PWLS-ULTRA, FBPConvNet, Serial SUPER, and the proposed Parallel SUPER.

Method           RMSE (HU)   SSIM
PWLS-EP          41.4        0.673
PWLS-ULTRA       32.4        0.716
FBPConvNet       29.2        0.688
Serial SUPER     25.0        —
Parallel SUPER   23.2        —

4.

CONCLUSIONS

This paper proposed the Parallel SUPER framework, which combines supervised deep learning methods and unsupervised methods for low-dose CT reconstruction. We evaluated an instantiation with the supervised model FBPConvNet and the unsupervised model PWLS-ULTRA. The framework demonstrates better reconstruction accuracy and faster convergence compared to the individual modules as well as the recent Serial SUPER framework.

REFERENCES

[1] “Model-based reconstruction with learning: From unsupervised to supervised and beyond,” arXiv e-prints (2021).

[2] “Practical cone-beam algorithm,” Journal of the Optical Society of America A 1(6), 612–619 (1984). https://doi.org/10.1364/JOSAA.1.000612

[3] “Low-dose X-ray CT reconstruction via dictionary learning,” IEEE Trans. Med. Imag. 31(9), 1682–1697 (2012). https://doi.org/10.1109/TMI.2012.2195669

[4] “Low dose CT image reconstruction with learned sparsifying transform,” 2016 IEEE 12th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP) (2016).

[5] “PWLS-ULTRA: An efficient clustering and learning-based approach for low-dose 3D CT image reconstruction,” IEEE Trans. Med. Imag. 37(6), 1498–1510 (2018). https://doi.org/10.1109/TMI.2018.2832007

[6] “Image reconstruction: From sparsity to data-adaptive methods and machine learning,” Proceedings of the IEEE 108(1), 86–109 (2020).

[7] “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process. 26(9), 4509–4522 (2017).

[8] “Deep convolutional framelet denoising for low-dose CT via wavelet residual network,” IEEE Trans. Med. Imag. 37(6), 1358–1369 (2018). https://doi.org/10.1109/TMI.2018.2823756

[9] “Unified supervised-unsupervised (SUPER) learning for X-ray CT image reconstruction,” IEEE Trans. Med. Imag. 40(11), 2986–3001 (2021). https://doi.org/10.1109/TMI.2021.3095310

[10] “TU-FG-207A-04: Overview of the low dose CT grand challenge,” Medical Physics 43(6), 3759–3760 (2016).

[11] “SPULTRA: Low-dose CT image reconstruction with joint statistical and learned image models,” IEEE Trans. Med. Imag. 39(3), 729–741 (2020).

[12] “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004).