Denoising and dimensionality reduction of genomic data
23 May 2005
Proceedings Volume 5841, Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III; (2005) https://doi.org/10.1117/12.609299
Event: SPIE Third International Symposium on Fluctuations and Noise, 2005, Austin, Texas, United States
Abstract
Genomics is a challenging research field for many quantitative scientists, and a wide variety of statistical techniques and machine learning algorithms have recently been proposed, inspired by cross-disciplinary work with computational and systems biologists. In genomic applications, the researcher deals with noisy, complex, high-dimensional feature spaces: expression levels are measured for a very large number of genes, but often at only a few time points, which limits the number of available samples. This imbalance between many features and few samples suggests that standard statistical inference techniques may struggle to produce solutions that generalize well, and that machine learning algorithms may struggle to avoid heavy computation. Thus, one naturally turns to two major aspects of the problem: sparsity and intrinsic dimensionality. This paper studies both aspects, using an efficient technique, Independent Component Analysis (ICA), for both denoising and dimensionality reduction. The numerical results are promising and yield high-quality gene feature selection, owing to the signal separation power of the decomposition. We investigate how the use of replicates can improve these results, and we deal with noise through a stabilization strategy that combines the estimated components and extracts the most informative biological content from them. Exploiting the inherent level of sparsity is a key issue in genetic regulatory networks, where the connectivity matrix needs to account for the real links among genes and discard many redundancies. Most experimental evidence suggests that real gene-gene connections are indeed only a subset of what is usually mapped onto either a huge gene vector or a typically dense and highly structured network.
Inferring gene network connectivity from expression levels is a challenging inverse problem that currently stimulates key research in biomedical engineering and systems biology. Several attempts have been made to describe gene networks with only a limited number of interactions, thus exploiting the inherent sparsity of these systems. This in turn suggests that a certain redundancy of links in gene networks, or equivalently the inherent sparsity structure of these systems, might allow the essential connections to be identified and the inverse problem to be both well defined and computationally tractable.
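To make the decomposition step mentioned in the abstract more concrete, the following is a minimal sketch of Independent Component Analysis applied to an expression matrix with few conditions and many genes. It is not the author's implementation: the abstract does not specify the ICA variant, so a symmetric FastICA with a tanh nonlinearity is assumed here, and the synthetic data, the matrix sizes, the number of components `k`, and the rule of selecting the 10 genes with the largest absolute loading per component are all hypothetical choices made for illustration. The replicate-based stabilization strategy is not shown.

```python
import numpy as np


def whiten(X, k):
    """Center each row of X (conditions x genes) and project onto the
    top-k principal directions, rescaled to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = np.argsort(eigvals)[::-1][:k]
    K = (eigvecs[:, top] / np.sqrt(eigvals[top])).T  # shape (k, conditions)
    return K @ Xc


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which has orthonormal rows."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W


def fast_ica(X, k, max_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with a tanh nonlinearity (an assumed variant).
    Returns an array of shape (k, genes): one loading per gene per component."""
    rng = np.random.default_rng(seed)
    Z = whiten(X, k)  # dimensionality reduction: conditions -> k
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E[g(y) z^T] - E[g'(y)] W, row by row.
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G**2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row is parallel to the previous one.
        delta = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return W @ Z


# Hypothetical synthetic data: 8 conditions, 500 genes, 3 sparse sources.
rng = np.random.default_rng(42)
n_conditions, n_genes, k = 8, 500, 3
sources = rng.laplace(size=(k, n_genes))          # heavy-tailed gene loadings
mixing = rng.standard_normal((n_conditions, k))   # condition-specific weights
X = mixing @ sources + 0.1 * rng.standard_normal((n_conditions, n_genes))

components = fast_ica(X, k)
# Illustrative selection rule: 10 genes with the largest |loading| per component.
selected = np.argsort(-np.abs(components), axis=1)[:, :10]
print(selected)
```

In this sketch the whitening step reduces the 8 measured conditions to `k` directions, discarding low-variance directions that are mostly noise in the synthetic data, and the rotation found by the fixed-point iteration then separates the retained signal into components. On real data the choice of `k` and of the selection threshold would need to be justified separately.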
© (2005) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Enrico Capobianco "Denoising and dimensionality reduction of genomic data", Proc. SPIE 5841, Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III, (23 May 2005); https://doi.org/10.1117/12.609299
KEYWORDS
Independent component analysis
Biological research
Denoising
Information technology
Interference (communication)
Principal component analysis
Deconvolution