KEYWORDS: Information science, Lutetium, Operating systems, Standards development, Visualization, Data conversion, Data storage, Binary data, Data compression, Detection and tracking algorithms
With the rapid development of genome sequencing technology, a large amount of genome data has been generated, it also brings the storage problem of this massive data. Therefore, the compression of genome data has become a research hotspot. We propose a new genome data compression algorithm called LCMRGC (low memory consumption referential genome compressor) for FASTA format sequences. The algorithm uses the suffix array data structure to support the search of matching strings, and uses the binary search method to accelerate accurate matching, so as to obtain better compression ratio. Experiment results on standard genome data show that the proposed algorithm significantly reduces the memory requirement for program operation, and is competitive in compression ratio and compression time.
Access to the requested content is limited to institutions that have purchased or subscribe to SPIE eBooks.
You are receiving this notice because your organization may not have SPIE eBooks access.*
*Shibboleth/Open Athens users─please
sign in
to access your institution's subscriptions.
To obtain this item, you may purchase the complete book in print or electronic format on
SPIE.org.
INSTITUTIONAL Select your institution to access the SPIE Digital Library.
PERSONAL Sign in with your SPIE account to access your personal subscriptions or to use specific features such as save to my library, sign up for alerts, save searches, etc.