It only takes a few minutes to assemble a complete genome
This image shows a partial map of 661,405 bacterial genomes.
Image source: Massachusetts Institute of Technology, USA, etc.
On September 14, the research was published in Cell Systems, a journal under Cell Press.
The technique represents genomic data more compactly. It is inspired by the way language models use words, rather than letters, as condensed building blocks.
"We can quickly assemble entire genomes and metagenomes, including microbial genomes, on an ordinary laptop computer," said Bonnie Berger, a professor at the MIT Computer Science and Artificial Intelligence Laboratory and an author of the paper.
Since the Human Genome Project, great progress has been made in the field of genome assembly.
After more than 10 years of international cooperation, in 2003, the Human Genome Project completed the first complete assembly of the human genome at a cost of approximately US$2.
Although assembling a human genome no longer takes years, it still requires several days and substantial computing power.
The researchers said that third-generation sequencing technology produces large volumes of high-quality genome sequence in reads tens of thousands of base pairs long, but using such a large amount of data for genome assembly is challenging.
Current techniques involve pairwise comparisons of all possible pairs of reads.
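To see why all-against-all comparison becomes a bottleneck, a short sketch of the arithmetic may help. The function name `pairwise_comparisons` and the read counts are illustrative assumptions, not figures from the paper.

```python
def pairwise_comparisons(num_reads: int) -> int:
    """Number of unordered read pairs in an all-versus-all comparison."""
    return num_reads * (num_reads - 1) // 2


# Hypothetical read counts, chosen only to show the quadratic growth.
for n in (1_000, 100_000, 10_000_000):
    print(f"{n:>12,} reads -> {pairwise_comparisons(n):,} pairs")
```

The number of pairs grows with the square of the number of reads, so multiplying the reads by ten multiplies the comparisons by roughly a hundred.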
To assemble genomes more efficiently than current techniques allow, Berger and colleagues turned their attention to language models.
Starting from the concept of the de Bruijn graph (a simple and efficient data structure for genome assembly), the researchers developed a minimizer-space de Bruijn graph (mdBG), which uses short sequences of nucleotides, called minimizers, instead of individual nucleotides.
Berger said: "Our mdBG stores only a small fraction of the total nucleotides while preserving the overall genome structure, which makes it orders of magnitude more efficient than the classic de Bruijn graph."
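The idea can be illustrated with a toy sketch. It is not the authors' software: the function names, the hash-threshold rule for picking minimizers, and the parameters `l`, `density` and `k` are all assumptions made for illustration. Each read is first reduced to the ordered list of short substrings whose hash falls below a threshold, and a de Bruijn graph is then built over those substrings rather than over single nucleotides.

```python
import hashlib
import random
from typing import Dict, List, Sequence, Set, Tuple


def is_minimizer(lmer: str, density: float) -> bool:
    """Keep an l-mer if its hash lies in the lowest `density` fraction of hash space."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") < density * 2**64


def to_minimizer_space(read: str, l: int, density: float) -> List[str]:
    """Reduce a read to the ordered list of its selected l-mers."""
    lmers = (read[i:i + l] for i in range(len(read) - l + 1))
    return [m for m in lmers if is_minimizer(m, density)]


def de_bruijn_graph(token_seqs: Sequence[Sequence[str]], k: int) -> Dict[Tuple[str, ...], Set[Tuple[str, ...]]]:
    """Nodes are k consecutive tokens; edges link nodes that overlap by k-1 tokens."""
    graph: Dict[Tuple[str, ...], Set[Tuple[str, ...]]] = {}
    for tokens in token_seqs:
        kmers = [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
        for kmer in kmers:
            graph.setdefault(kmer, set())
        for a, b in zip(kmers, kmers[1:]):
            graph[a].add(b)
    return graph


# Toy data: a random "genome" and overlapping error-free reads (hypothetical).
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(5000))
reads = [genome[s:s + 1000] for s in range(0, 4001, 500)]

# Classic graph: tokens are single nucleotides.
classic = de_bruijn_graph(reads, k=21)
# Minimizer-space graph: tokens are selected 8-mers.
mini_reads = [to_minimizer_space(r, l=8, density=0.05) for r in reads]
mini = de_bruijn_graph(mini_reads, k=3)

print("classic nodes:", len(classic))
print("minimizer-space nodes:", len(mini))
```

Because the selection rule depends only on the substring itself, overlapping reads yield the same minimizers in their shared region, so the smaller graph still links the reads together.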
The researchers applied this method to high-fidelity read data (single-molecule reads with near-perfect accuracy) for Drosophila melanogaster, as well as to human genome data provided by Pacific Biosciences.
When they evaluated the resulting assemblies, they found that, compared with other genome assemblers, the mdBG-based software required only 1/33 of the time and 1/8 of the random access memory.
Next, the researchers built an index of 661,405 bacterial genomes, by far the largest index of its kind.
They found that the new technique can search for all drug resistance genes in 13 minutes, whereas standard sequence alignment takes 7 hours.
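As a quick check on the scale of that difference, the two reported times can be converted to a single ratio. The variable names are illustrative; the two durations are the ones quoted above.

```python
alignment_minutes = 7 * 60   # 7 hours for standard sequence alignment
mdbg_minutes = 13            # 13 minutes for the mdBG-based search

speedup = alignment_minutes / mdbg_minutes
print(f"Speedup: about {speedup:.0f}x")
```

That works out to a speedup of roughly 32 times.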
Berger said: "We knew the technique was effective, but we did not know that, after further optimizing the code, it would scale so well on real data."
Rayan Chikhi, a researcher at the Pasteur Institute and one of the participants in the study, said: "The new technique does not require some of the usually expensive pre-processing steps, such as the error correction required by most genome assembly methods."
"We can also process sequencing data with an error rate of up to 4%," Berger added. "As the price of long-read sequencers with different error rates drops rapidly, this capability opens the door to the popularization of sequencing data analysis."
Berger pointed out that although the method currently performs best on Pacific Biosciences high-fidelity reads (whose error rate is well below 1%), it may soon be compatible with the ultra-long reads produced by Oxford Nanopore sequencers.
Nanopore reads currently have an error rate of 5% to 12%, but this may soon fall to 4%.
Berger said: "We hope to help scientists establish rapid genome testing sites that go beyond PCR and marker arrays, which may overlook important differences between genomes."
(Source: Chinese Science News, Tang Yichen)
Related paper information: https://doi.org/10.1016/j.cels.2021.08.009