Theoretical linear convergence of deep unfolding network for block-sparse signal recovery
Rong Fu, Yimin Liu, Xiuhong Li
Proceedings Volume 12506, Third International Conference on Computer Science and Communication Technology (ICCSCT 2022); 125066J (28 December 2022); https://doi.org/10.1117/12.2662374
Event: International Conference on Computer Science and Communication Technology (ICCSCT 2022), 2022, Beijing, China
Abstract
In this paper, we consider the recovery of a high-dimensional block-sparse signal from a compressed set of measurements, where the nonzero coefficients of the recovered signal occur in several blocks. Adopting the idea of deep unfolding, we exploit the block-sparse structure and put forward a block-sparse reconstruction network named Ada-BlockLISTA, which performs gradient descent on every single block followed by block-wise shrinkage. Furthermore, we prove the linear convergence rate of our proposed network, which also theoretically guarantees exact recovery at a potentially higher sparsity level based on the underlying block structure. Numerical results indicate that Ada-BlockLISTA yields better signal recovery performance than existing algorithms that ignore the additional block structure in the signal model.

1. INTRODUCTION

As it is crucial to minimize the sample size required in estimation problems, compressive sensing (CS) plays an indispensable role in the field of signal processing [1-3]. Many existing problems can be formulated in terms of CS, including sparse channel estimation [4], beam pattern synthesis [5], direction-of-arrival (DOA) estimation [6, 7], and range-Doppler estimation [8]. The recovery is completed by finding a sparse representation of the time-domain signal over a dictionary matrix, which is often a row-sampled discrete Fourier transform (DFT) matrix.

Since this kind of problem is ill-posed, various methods have been proposed. A well-known ℓ1-norm regularization technique is the iterative shrinkage-thresholding algorithm (ISTA) [9], which is used in a wide range of applications [10, 11]. Since ISTA solves a convex optimization problem, it is guaranteed to converge under appropriate conditions. As an iterative solver, however, ISTA often takes many iterations to converge, which makes it difficult to apply in real-time applications. Pre-defined optimization parameters such as the step size and the regularization parameter must also be set carefully based on prior knowledge, which can be quite challenging in some cases [12].

To address this problem, Gregor and LeCun [13] proposed a recurrent neural network (RNN) to solve sparse coding problems, named learned ISTA (LISTA). Based on the iterative structure of ISTA, the authors replace the fixed parameters of ISTA with data-driven variables and unfold the ISTA iterations into an RNN structure. LISTA networks show improved convergence speed in both theoretical analyses and empirical results [14-16]. Many works have modified LISTA to increase its adaptability by embedding the dictionary in the network architecture. Reference [17] couples the learned matrices in LISTA by embedding the dictionary matrix into the network structure. Building on Reference [17], robust-ALISTA was proposed in Reference [18]; it explicitly calculates the learned matrices by solving a coherence minimization problem. Another adaptive variant of LISTA, named Adaptive-LISTA (AdaLISTA) [19], generalizes robust-ALISTA to various model scenarios. Beyond its accelerated convergence, AdaLISTA can serve different dictionaries with the same weight matrix without retraining the whole network.

However, all these LISTA-type networks are sensitive to the sparsity level of the signal. As the number of nonzero elements in the sparse signal of interest increases, LISTA and its variants suffer decreased recovery accuracy. In practical applications, the signal to be recovered may not be sufficiently sparse, as in block-sparse problems, where the nonzero elements of the unknown signal are distributed in blocks. Based on AdaLISTA, a block-sparse reconstruction network named Ada-BlockLISTA was proposed in Reference [20]; it makes good use of the natural structure of a block-sparse signal and can recover its nonzero blocks correctly where the original AdaLISTA fails, especially when there are many nonzero blocks.

In this paper, we formulate the signal model with block structure and introduce relevant iterative algorithms for solving the block-sparse recovery problem in Section 2. Exploiting the block structure of the signal model, we introduce the Ada-BlockLISTA network in Section 3 and analyze its convergence rate as well as its support recovery guarantee. In numerical simulations, we demonstrate its accelerated convergence and its significant advantage in recovery performance.

2. SYSTEM MODEL AND TRADITIONAL ALGORITHMS

In this section, we introduce the signal model with block-sparsity and review traditional algorithms such as ISTA and LISTA, which leverage the assumption of sparsity in the ground truth signal.

2.1 System model

We aim to recover an unknown, sparse signal from noisy observations taken with a known, under-determined dictionary Φ ∈ ℂ^{N×M}, where N < M. In this ill-posed problem, the noisy observation y ∈ ℂ^N can be formulated as

$$ y = \Phi x^* + w, $$

where x* is the ground truth and w is additive random noise. Next, we define block sparsity following Reference [21]. Suppose a block-sparse signal x* is divided into Q blocks of length P. If at most K blocks (K ≪ Q) of x* are nonzero, x* is called K-block-sparse. Sharing the same structure as x*, the dictionary matrix Φ is likewise divided into Q blocks. A visualization of the block-sparse model is shown in Figure 1.

Figure 1. A visualization of the block-sparse signal model, where each color in Φ and x corresponds to a different block; white corresponds to zero entries in x.

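To make the model concrete, the following NumPy sketch draws a random instance of this setup. The Gaussian dictionary, the real-valued signal, and the function name make_block_sparse_problem are illustrative assumptions, not the paper's exact experimental configuration.

```python
import numpy as np

def make_block_sparse_problem(N=256, P=16, Q=64, K=2, sigma=0.01, seed=0):
    """Draw a random dictionary Phi, a K-block-sparse ground truth x*, and a
    noisy observation y = Phi @ x* + w (real-valued Gaussian for simplicity)."""
    rng = np.random.default_rng(seed)
    M = P * Q
    Phi = rng.standard_normal((N, M)) / np.sqrt(N)   # known dictionary, N < M
    x_true = np.zeros(M)
    support = rng.choice(Q, size=K, replace=False)   # indices of the nonzero blocks
    for q in support:
        x_true[q * P:(q + 1) * P] = rng.standard_normal(P)
    y = Phi @ x_true + sigma * rng.standard_normal(N)
    return Phi, x_true, y, support
```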

2.2 Brief review of traditional algorithms

This subsection briefly reviews traditional algorithms such as ISTA and LISTA, which are designed to solve sparse recovery problems. In the CS framework, ℓ1-regularized regression is a well-known technique for harnessing prior knowledge of sparsity. Standard ISTA solves the ℓ1-regularized optimization problem by iteratively performing Lipschitz gradient descent with respect to the measurement error, followed by soft thresholding [9, 22]; a minimal sketch is given below.
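As a reference point, here is a minimal NumPy sketch of standard ISTA for the ℓ1-regularized problem; the fixed iteration budget and function names are illustrative, not a reference implementation.

```python
import numpy as np

def soft_threshold(x, theta):
    """Element-wise soft-thresholding operator (also valid for complex entries)."""
    mag = np.abs(x)
    return np.where(mag > theta, (1.0 - theta / np.maximum(mag, 1e-12)) * x, 0.0)

def ista(y, Phi, lam, n_iter=500):
    """Minimize 0.5 * ||y - Phi @ x||_2^2 + lam * ||x||_1 by proximal gradient
    descent with step size 1/L, where L is the Lipschitz constant of the gradient."""
    L = np.linalg.norm(Phi, 2) ** 2              # squared spectral norm of Phi
    x = np.zeros(Phi.shape[1], dtype=Phi.dtype)
    for _ in range(n_iter):
        x = soft_threshold(x - Phi.conj().T @ (Phi @ x - y) / L, lam / L)
    return x
```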

When the signal of interest x possesses a block-sparse structure, specialized algorithms have been designed to exploit it, such as Block-ISTA [20], block-OMP [23], block-based CoSaMP [24], and block-sparse Bayesian learning [25]. To compute an estimate that maintains a block-sparse structure, the original ℓ1-norm optimization problem becomes the following mixed-norm optimization problem:

$$ \hat{x} = \arg\min_{x}\; \frac{1}{2}\left\| y - \Phi x \right\|_2^2 + \lambda \left\| x \right\|_{2,1}, \qquad \left\| x \right\|_{2,1} = \sum_{q=1}^{Q} \left\| x_q \right\|_2, $$

where $\hat{x}$ is the estimated signal and λ is a regularization parameter controlling the strength of the block-sparsity penalty imposed by the ℓ2,1 norm.
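A minimal sketch of the corresponding block-wise solver: the proximal operator of the ℓ2,1 penalty shrinks each length-P block by its ℓ2 norm, which turns the ISTA loop into Block-ISTA. Real-valued signals and the helper names are simplifying assumptions.

```python
import numpy as np

def block_soft_threshold(x, theta, P):
    """Proximal operator of theta * ||.||_{2,1}: shrink each length-P block
    toward zero according to its l2 norm."""
    blocks = x.reshape(-1, P)
    norms = np.linalg.norm(blocks, axis=1, keepdims=True)
    scale = np.maximum(1.0 - theta / np.maximum(norms, 1e-12), 0.0)
    return (scale * blocks).reshape(-1)

def block_ista(y, Phi, lam, P, n_iter=500):
    """Block-ISTA: Lipschitz gradient descent followed by block-wise shrinkage."""
    L = np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        x = block_soft_threshold(x - Phi.T @ (Phi @ x - y) / L, lam / L, P)
    return x
```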

ISTA demonstrates considerable accuracy in recovering sparse signals but takes a long time to converge [26]. As an unfolded version of the ISTA iterations, LISTA is an RNN containing only T layers, where each layer corresponds to one ISTA iteration [13]. To build a more generalized network for various dictionary choices, AdaLISTA, a variant of LISTA, was proposed to increase the adaptability of the deep unfolding network [19]. One layer of AdaLISTA is defined as

$$ x^{(k+1)} = \eta_{\theta^{(k)}}\!\left( x^{(k)} + \gamma^{(k)}\, \Phi^H W^H \left( y - \Phi x^{(k)} \right) \right), $$

where x^{(k)} is the output signal of the k-th layer, η_θ(·) denotes element-wise soft thresholding, W ∈ ℂ^{N×N} is the learned weight shared across layers, and θ^{(k)}, γ^{(k)} are the learned soft threshold and step size of the k-th layer of AdaLISTA.
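Under the layer equation reconstructed above, one AdaLISTA layer can be sketched as follows; this is our reading of the update rule (reusing soft_threshold from the ISTA sketch), not the authors' reference implementation.

```python
def adalista_layer(x, y, Phi, W, gamma, theta):
    """One AdaLISTA layer: shared learned weight W, per-layer step size gamma
    and threshold theta, followed by element-wise soft thresholding."""
    x_tilde = x + gamma * Phi.conj().T @ W.conj().T @ (y - Phi @ x)
    return soft_threshold(x_tilde, theta)
```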

3. BLOCK-SPARSE RECOVERY NETWORK AND CONVERGENCE ANALYSIS

Motivated by ℓ2,1-minimization methods such as Block-ISTA, we exploit the block structure of the signal model and propose our Ada-BlockLISTA network, which is derived from AdaLISTA.

As Φ and x* share the same block structure, i.e., $x^* = [x_1^{*T}\; x_2^{*T}\; \cdots\; x_Q^{*T}]^T$ and Φ = [Φ1 Φ2 ⋯ ΦQ], we learn an individual weight matrix {Wq}1≤q≤Q for each block and encourage block sparsity in the nonlinear shrinkage function by applying soft thresholding to each block individually. At the k-th layer of Ada-BlockLISTA, the update rule for the q-th block of the block-sparse signal is formulated as

$$ \tilde{x}_q^{(k)} = x_q^{(k)} + \gamma^{(k)}\, \Phi_q^H W_q^H \left( y - \Phi x^{(k)} \right), \qquad x_q^{(k+1)} = \frac{\max\!\left( \| \tilde{x}_q^{(k)} \|_2 - \theta^{(k)},\, 0 \right)}{\| \tilde{x}_q^{(k)} \|_2}\, \tilde{x}_q^{(k)}, $$

where $\tilde{x}_q^{(k)}$ is the q-th block of $\tilde{x}^{(k)}$ after block-wise gradient descent, and $x_q^{(k+1)}$ is the q-th block of the output signal x^{(k+1)} after block-wise soft thresholding. In Ada-BlockLISTA, we need to learn the individual weight W_q for the q-th block, as well as the soft threshold θ^{(k)} and the step size γ^{(k)} of each layer.
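A sketch of one Ada-BlockLISTA layer under these update rules; the per-block Python loop and the argument layout (lists of Φq and Wq) are illustrative choices, and a vectorized implementation would reshape x into a Q×P array instead.

```python
import numpy as np

def ada_blocklista_layer(x, y, Phi_blocks, W_blocks, gamma, theta, P):
    """One Ada-BlockLISTA layer: block-wise gradient step with per-block
    weights W_q, followed by block-wise soft thresholding."""
    Q = len(Phi_blocks)
    residual = y - sum(Phi_blocks[q] @ x[q * P:(q + 1) * P] for q in range(Q))
    x_next = np.empty_like(x)
    for q in range(Q):
        # Block-wise gradient descent on the shared residual.
        xq = x[q * P:(q + 1) * P] + gamma * Phi_blocks[q].conj().T @ W_blocks[q].conj().T @ residual
        # Block-wise soft thresholding against the block's l2 norm.
        nrm = np.linalg.norm(xq)
        x_next[q * P:(q + 1) * P] = (max(nrm - theta, 0.0) / max(nrm, 1e-12)) * xq
    return x_next
```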

As shown above, Ada-BlockLISTA consists of two main steps.

  • First, we perform block-wise gradient descent and update the value of each block according to the current residual value.

  • Then, we use block-wise soft thresholding to force the updated signal $\tilde{x}^{(k)}$ from the previous step to be block-sparse.

Figures 2 and 3 illustrate the network structures of AdaLISTA and Ada-BlockLISTA, respectively.

Figure 2. Block diagram of one layer of the AdaLISTA architecture.


Figure 3. Block diagram of one layer of the Ada-BlockLISTA architecture.


Comparing Figures 2 and 3, we can see that Ada-BlockLISTA updates each block of x independently, while AdaLISTA uses the same weight across different blocks. Furthermore, AdaLISTA and Ada-BlockLISTA use the soft threshold parameters {θ^{(k)}}1≤k≤T in different ways. In AdaLISTA, we perform element-wise soft thresholding and set each element of x^{(k)} whose absolute value is less than θ^{(k)} to 0. In Ada-BlockLISTA, on the other hand, θ^{(k)} serves as a block-wise threshold, compared against the ℓ2 norm of each block $\tilde{x}_q^{(k)}$ rather than the absolute values of individual elements.

We now present the main theoretical results, i.e., the linear convergence rate and the support recovery guarantee of Ada-BlockLISTA. We generalize the two basic coherence definitions of Reference [21], namely the mutual coherence within a block, also referred to as sub-coherence, and the block-coherence of the dictionary, which describe its local and global properties, respectively.

We state the linear convergence rate of Ada-BlockLISTA in the following theorem (more details on the convergence analysis of Ada-BlockLISTA can be found in the arXiv version: https://arxiv.org/abs/2111.09801). The proof of Theorem 1 mimics the corresponding proof steps in References [18, 19] and can be viewed as an extension to the block-sparse case.

Theorem 1 (Convergence rate of Ada-BlockLISTA).

We first consider three quantities defined in terms of the sub-coherence and block-coherence introduced above (their exact expressions are given in the arXiv version cited above). Suppose the block sparsity of x* is sufficiently small with respect to these quantities, and the soft thresholds θ^{(k)} are set proportionally to σ, the standard deviation of the noise; then Ada-BlockLISTA converges linearly, i.e., its recovery error decreases exponentially with the number of layers.
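For concreteness, a linear (geometric) rate means a layer-wise error bound of the generic form below; the constant C and the contraction factor c depend on the coherence quantities above, with the exact expressions given in the arXiv version:

$$ \left\| x^{(k)} - x^* \right\|_2 \le C\, c^{k}, \qquad 0 < c < 1, $$

so the recovery error shrinks by at least a constant factor per layer.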

The above theorem can be interpreted in two aspects:

  • Theorem 1 shows that Ada-BlockLISTA converges at a linear rate under the stated conditions, which is faster than the sublinear rates of the original ISTA and FISTA.

  • Compared with the recovery condition of AdaLISTA [19, Theorem 1], our condition matches the block-sparse recovery condition of [25, Theorem 3], which shows that exploiting the block structure of x* guarantees recovery at a higher sparsity level.

4. NUMERICAL RESULTS

We simulate experiments with various signal dimensions and numbers of nonzero blocks. In our simulations, we set P = 16 and Q = 64, and we generate the observed signals according to Reference [27]. In Figures 4 and 5, we compare the convergence speed and the recovered 2D block-sparse signal planes of four methods (ISTA, Block-ISTA, AdaLISTA, and our Ada-BlockLISTA) when the block sparsity K is 2; a toy version of this setup is sketched below.
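Reusing the sketches above, a toy version of this experiment might look as follows; N, λ, and the iteration count are illustrative choices rather than the paper's settings.

```python
# Toy reproduction of the K = 2 experiment with P = 16 and Q = 64.
Phi, x_true, y, support = make_block_sparse_problem(N=256, P=16, Q=64, K=2)
x_hat = block_ista(y, Phi, lam=0.05, P=16, n_iter=1000)
# NMSE = ||x_hat - x*||^2 / ||x*||^2, reported in dB as in Figure 4.
nmse = np.sum((x_hat - x_true) ** 2) / np.sum(x_true ** 2)
print(f"Block-ISTA NMSE: {10 * np.log10(nmse):.1f} dB")
```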

Figure 4. NMSE of the four methods at each iteration/layer.


Figure 5. The reconstructed block-sparse signal, which has two nonzero blocks.


The results show that Block-ISTA and Ada-BlockLISTA successfully reconstruct both blocks, while ISTA and AdaLISTA fail. This demonstrates that block-sparse recovery methods can recover a larger number of blocks than their non-block counterparts, consistent with Theorem 1.

5. CONCLUSION

In this work, we considered the block-sparse signal recovery problem, in which the nonzero entries of the recovered signal occur in clusters. A block-sparse reconstruction network, Ada-BlockLISTA, exploits the block structure in the signal model. We proved that the unrolled block-sparse reconstruction network enjoys a linear convergence rate and provided a sufficient condition on the block sparsity that guarantees successful recovery of block-sparse signals. We performed extensive simulations to verify the recovery performance of the proposed network. The numerical results show that Ada-BlockLISTA yields better reconstruction than the original AdaLISTA network, especially with a large number of nonzero blocks. Through both theoretical and experimental analyses, we demonstrated that the explicit use of block-sparsity in unfolded deep networks improves block-sparse recovery performance.

REFERENCES

[1] Xu, W. and Hassibi, B., "Efficient compressive sensing with deterministic guarantees using expander graphs," 2007 IEEE Information Theory Workshop, 414-419 (2007). https://doi.org/10.1109/ITW.2007.4313110
[2] Indyk, P., "Sparse recovery using sparse random matrices," [LATIN 2010: Theoretical Informatics], 157, Springer, Berlin, Heidelberg (2010).
[3] Eldar, Y. C. and Kutyniok, G., [Compressed Sensing: Theory and Applications], Cambridge University Press (2012). https://doi.org/10.1017/CBO9780511794308
[4] Berger, C. R., Wang, Z., Huang, J. and Zhou, S., "Application of compressive sensing to sparse channel estimation," IEEE Communications Magazine, 48(11), 164-174 (2010). https://doi.org/10.1109/MCOM.2010.5621984
[5] Xiong, J. and Wang, W., "Sparse reconstruction-based beampattern synthesis for multi-carrier frequency diverse array antenna," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3395-3398 (2017).
[6] Balakrishnan, R. D. and Kwon, H. M., "A new inverse problem based approach for azimuthal DOA estimation," GLOBECOM '03 Global Telecommunications Conference, 2187-2191 (2003).
[7] Xenaki, A., Gerstoft, P. and Mosegaard, K., "Compressive beamforming," The Journal of the Acoustical Society of America, 136(1), 260 (2014). https://doi.org/10.1121/1.4883360
[8] Huang, T. and Liu, Y., "Compressed sensing for a frequency agile radar with performance guarantees," IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), 1057-1061 (2015).
[9] Beck, A. and Teboulle, M., "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM Journal on Imaging Sciences, 2(1), 183-202 (2009). https://doi.org/10.1137/080716542
[10] Combettes, P. L. and Wajs, V. R., "Signal recovery by proximal forward-backward splitting," Multiscale Modeling & Simulation, 4(4), 1168-1200 (2006). https://doi.org/10.1137/050626090
[11] Antonello, N., Stella, L., Patrinos, P. and van Waterschoot, T., "Proximal gradient algorithms: Applications in signal processing," arXiv preprint arXiv:1803.01621 (2018).
[12] Zhang, J. and Ghanem, B., "ISTA-Net: Iterative shrinkage-thresholding algorithm inspired deep network for image compressive sensing," arXiv preprint arXiv:1706.07929 (2017).
[13] Gregor, K. and LeCun, Y., "Learning fast approximations of sparse coding," International Conference on Machine Learning, 399-406 (2010).
[14] Borgerding, M., Schniter, P. and Rangan, S., "AMP-inspired deep networks for sparse linear inverse problems," IEEE Transactions on Signal Processing, 65(16), 4293-4308 (2017). https://doi.org/10.1109/TSP.2017.2708040
[15] Borgerding, M. and Schniter, P., "Onsager-corrected deep learning for sparse linear inverse problems," 2016 IEEE Global Conference on Signal and Information Processing, 227-231 (2016).
[16] Fu, R., Huang, T., Liu, Y. and Eldar, Y. C., "Compressed LISTA exploiting Toeplitz structure," 2019 IEEE Radar Conference (RadarConf), 1-6 (2019).
[17] Chen, X., Liu, J., Wang, Z. and Yin, W., "Theoretical linear convergence of unfolded ISTA and its practical weights and thresholds," Advances in Neural Information Processing Systems, Curran Associates, Inc. (2018).
[18] Liu, J., Chen, X., Wang, Z. and Yin, W., "ALISTA: Analytic weights are as good as learned weights in LISTA," International Conference on Learning Representations (2019).
[19] Aberdam, A., Golts, A. and Elad, M., "Ada-LISTA: Learned solvers adaptive to varying models," arXiv preprint arXiv:2001.08456 (2020).
[20] Fu, R., Monardo, V., Huang, T. and Liu, Y., "Deep unfolding network for block-sparse signal recovery," 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2880-2884 (2021).
[21] Eldar, Y. C., Kuppinger, P. and Bolcskei, H., "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Transactions on Signal Processing, 58(6), 3042-3054 (2010). https://doi.org/10.1109/TSP.2010.2044837
[22] Wu, K., Guo, Y., Li, Z. and Zhang, C., "Sparse coding with gated learned ISTA," International Conference on Learning Representations (2020).
[23] Eldar, Y. C., Kuppinger, P. and Bolcskei, H., "Block-sparse signals: Uncertainty relations and efficient recovery," IEEE Transactions on Signal Processing, 58(6), 3042-3054 (2010). https://doi.org/10.1109/TSP.2010.2044837
[24] Baraniuk, R. G., Cevher, V., Duarte, M. F. and Hegde, C., "Model-based compressive sensing," IEEE Transactions on Information Theory, 56(4), 1982-2001 (2010). https://doi.org/10.1109/TIT.2010.2040894
[25] Zhang, Z. and Rao, B. D., "Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning," IEEE Journal of Selected Topics in Signal Processing, 5(5), 912-926 (2011). https://doi.org/10.1109/JSTSP.2011.2159773
[26] Draganic, A., Orovic, I. and Stankovic, S., "On some common compressive sensing recovery algorithms and applications—Review paper," Facta Universitatis, Series: Electronics and Energetics, 30, 477-510 (2017).
[27] Chi, Y. and Chen, Y., "Compressive two-dimensional harmonic retrieval via atomic norm minimization," IEEE Transactions on Signal Processing, 63(4), 1030-1042 (2015). https://doi.org/10.1109/TSP.2014.2386283