It only takes a few minutes to assemble a complete genome

 Last Update: 2021-09-18
 Source: Internet
 Author: User

This image shows a partial map of 661,405 bacterial genomes.

Image source: Massachusetts Institute of Technology, USA, etc.

On September 14, 2021, the research was published in Cell Systems, a journal under Cell Press.

The technique represents genomic data more compactly, and is inspired by the way words, rather than letters, provide condensed building blocks for language models.

"We can quickly assemble entire genomes and metagenomes, including microbial genomes, on an ordinary laptop computer," said Bonnie Berger, a professor at the MIT Computer Science and Artificial Intelligence Laboratory and an author of the paper.

Since the Human Genome Project, great progress has been made in the field of genome assembly.

After more than 10 years of international cooperation, in 2003, the Human Genome Project completed the first complete assembly of the human genome at a cost of approximately US$2.

Although assembling a human genome no longer takes years, it still requires several days and substantial computing power.

The researchers said that third-generation sequencing technology produces large volumes of high-quality genome sequence in reads tens of thousands of base pairs long, but using such a large amount of data for genome assembly is challenging.

Current techniques involve pairwise comparisons between all possible pairs of reads. To assemble genomes more efficiently than these techniques allow, Berger and colleagues turned their attention to language models.
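
To see why comparing every read against every other read becomes costly, the short calculation below counts the unordered pairs among n reads. It is only an illustration: the function name and the read counts are hypothetical and do not come from the study.

```python
from math import comb


def pairwise_comparisons(num_reads: int) -> int:
    """Return the number of unordered read pairs in an all-versus-all comparison."""
    return comb(num_reads, 2)  # n * (n - 1) / 2


# Hypothetical read counts, chosen only to show the quadratic growth.
for n in (1_000, 1_000_000):
    print(f"{n} reads -> {pairwise_comparisons(n)} pairs")
```

Multiplying the number of reads by 1,000 multiplies the number of pairs by roughly a million, which is why approaches that avoid all-versus-all comparison are attractive.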

Starting from the concept of the de Bruijn graph (a simple and efficient data structure for genome assembly), the researchers developed a minimizer-space de Bruijn graph (mdBG), which uses short sequences of nucleotides, called minimizers, instead of individual nucleotides.

Berger said: "Our mdBG stores only a small fraction of the total nucleotides while preserving the overall genome structure, which makes it orders of magnitude more efficient than the classic de Bruijn graph."
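
The idea of building a de Bruijn graph over short nucleotide sequences rather than single nucleotides can be sketched roughly as follows. This is a toy illustration, not the authors' software: the function names, the hash function, and the parameters l, k and density are assumptions made for the example, and it ignores reverse complements and sequencing errors.

```python
import hashlib
from collections import defaultdict


def is_selected(lmer: str, density: float) -> bool:
    """Keep an l-mer when its 64-bit hash falls in the lowest `density` fraction."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") < density * 2**64


def to_minimizer_space(read: str, l: int, density: float) -> list[str]:
    """Represent a read as the ordered list of its selected l-mers."""
    lmers = (read[i:i + l] for i in range(len(read) - l + 1))
    return [m for m in lmers if is_selected(m, density)]


def build_graph(reads, l=4, k=3, density=0.25):
    """Build a de Bruijn graph whose 'letters' are selected l-mers.

    Each node is a tuple of k consecutive selected l-mers; an edge joins
    two nodes that follow each other in a read and overlap by k - 1 l-mers.
    """
    edges = defaultdict(set)
    for read in reads:
        mins = to_minimizer_space(read, l, density)
        nodes = [tuple(mins[i:i + k]) for i in range(len(mins) - k + 1)]
        for a, b in zip(nodes, nodes[1:]):
            edges[a].add(b)
    return edges


# Hypothetical reads, for illustration only.
graph = build_graph(["ACGTACGGTCAGTTACGATCGGA", "GGTCAGTTACGATCGGATTCA"])
for node, successors in graph.items():
    print(node, "->", sorted(successors))
```

Because only a fraction of the l-mers is kept, the graph has far fewer nodes than one built over every nucleotide position, which is the intuition behind the compactness described above.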

The researchers applied this method to assemble high-fidelity data (almost perfect single-molecule read accuracy) from Drosophila melanogaster, as well as human genome data provided by Pacific Biosciences.

When they evaluated the resulting genomes, they found that, compared with other genome assemblers, the mdBG-based software required only 1/33 of the time and 1/8 of the random access memory (RAM).

Next, the researchers built an index of 661,405 bacterial genomes, by far the largest index of its kind. They found that the new technique can search the entire collection for drug resistance genes in 13 minutes, whereas standard sequence alignment takes 7 hours.
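
As a quick check of what the two search times imply, the ratio can be worked out directly. The helper name below is hypothetical, and the only inputs are the 13 minutes and 7 hours quoted in the article.

```python
def speedup(baseline_minutes: float, new_minutes: float) -> float:
    """Return how many times faster the new method is than the baseline."""
    return baseline_minutes / new_minutes


# 7 hours for standard sequence alignment versus 13 minutes for the new index.
ratio = speedup(7 * 60, 13)
print(f"about {ratio:.1f} times faster")
```

On these figures the search is roughly 32 times faster than standard alignment.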

Berger said: "We knew the technique was effective, but we did not know it would scale so well on real data once the code had been further optimized."

Rayan Chikhi, a researcher at the Pasteur Institute and one of the participants in the study, said: "The new technique does not require certain usually expensive pre-processing steps, such as the error correction required by most genome assembly methods."

"We can also process sequencing data with an error rate of up to 4%," Berger added. "As the price of long-read sequencers with different error rates drops rapidly, this capability opens the door to the popularization of sequencing data analysis."

Berger pointed out that although the method currently performs best on Pacific Biosciences high-fidelity reads (whose error rate is well below 1%), it may soon be compatible with the ultra-long reads of Oxford Nanopore. Nanopore reads currently have an error rate of 5% to 12%, but this may soon reach 4%.

Berger said: "We hope to help scientists establish rapid genome testing sites, moving beyond PCR and marker arrays, which may overlook important differences between genomes." (Source: Chinese Science News, Tang Yichen)

Related paper information: https://doi.org/10.1016/j.cels.2021.08.009
