Zhang Qiangfeng's research group in the School of Life Sciences developed an artificial intelligence algorithm for online integration of single-cell sequencing data

 Last Update: 2022-10-25
 Source: Internet
 Author: User

With the development of single-cell sequencing technology, single-cell research continues to deepen: studies are growing in scale, and the systems under study are becoming more complex.
Integrating single-cell sequencing data from different sources, removing batch effects, and mining the combined data have become a basic and central step in single-cell sequencing data analysis. Integration currently faces the following problems: 1) Batch effects caused by differences in experimental samples, platforms, library construction methods, and even handling introduce non-biological noise into single-cell sequencing data and interfere with the extraction and analysis of biological differences between cells; 2) Single-cell studies keep growing in scale, and datasets with millions of cells place higher demands on the efficiency of integration algorithms; 3) The types of samples being sequenced are also increasing, and different single-cell datasets often contain highly heterogeneous cell subpopulations; 4) Finally, and most importantly, it remains an open question how to fully reuse the knowledge contained in the large body of existing data when exploring and analyzing new data.
At present, most single-cell data integration algorithms correct batch effects based on the similarity of cells across batches. This approach has several drawbacks: over-integration (especially when the datasets being integrated differ greatly in cell heterogeneity), poor scalability, and the inability to apply an existing model directly to new datasets.
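
The over-integration problem can be illustrated with a deliberately simple toy example. The sketch below is not any published integration method: it uses per-batch mean-centering as a stand-in for a correction that assumes the batches contain the same cells, and all values (one marker gene, two batches, a technical offset of 1.0) are made up for illustration. Because each batch holds a different cell type, the correction removes the biological difference together with the technical one.

```python
from statistics import mean

# Hypothetical toy data: expression of one marker gene in two batches.
# Batch 1 contains only T cells; batch 2 contains only B cells, and its
# platform adds a technical offset of +1.0 to every measurement.
batch1 = [("T", 5.0), ("T", 5.2), ("T", 4.8)]
batch2 = [("B", 1.0 + 1.0), ("B", 1.2 + 1.0), ("B", 0.8 + 1.0)]

def center(batch):
    """Naive correction: subtract the batch mean from every cell."""
    m = mean(value for _, value in batch)
    return [(label, value - m) for label, value in batch]

def type_mean(cells, label):
    """Average expression over all cells carrying the given label."""
    return mean(value for cell_label, value in cells if cell_label == label)

raw = batch1 + batch2
corrected = center(batch1) + center(batch2)

# Difference between the two cell types before and after correction.
gap_before = type_mean(raw, "T") - type_mean(raw, "B")
gap_after = type_mean(corrected, "T") - type_mean(corrected, "B")
print(f"T-B gap before: {gap_before:.2f}, after: {abs(gap_after):.2f}")
```

After centering, the two cell types can no longer be told apart on this gene, which is the kind of overcorrection that arises when batches with different cell compositions are forced to look alike.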

On October 17, 2022, the research group of Associate Professor Zhang Qiangfeng at the School of Life Sciences, Tsinghua University (also affiliated with the Advanced Innovation Center for Structural Biology and the Tsinghua-Peking University Joint Center for Life Sciences) published a paper online in Nature Communications entitled "Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space".
In this study, they developed SCALEX, an artificial intelligence algorithm built on a variational autoencoder deep learning framework that can integrate single-cell sequencing data online. SCALEX uses an asymmetric autoencoder structure consisting of a batch-independent encoder and a batch-specific decoder. Through extensive training, this structure yields a highly generalizable encoder that projects high-dimensional single-cell sequencing data into a low-dimensional cell-embedding space, removing batch effects while preserving biological differences.
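
The asymmetry described above can be sketched structurally as follows. This is a minimal, untrained forward pass under stated assumptions, not the SCALEX implementation: the layer sizes, the random weights, the batch names, and the per-batch scale and shift in the decoder are all hypothetical, the last being a stand-in for whatever batch-specific mechanism SCALEX actually uses. The point is only that the encoder never receives the batch label, while the decoder does.

```python
import math
import random

random.seed(0)
N_GENES, N_LATENT = 6, 2          # hypothetical sizes
BATCHES = ["batch_A", "batch_B"]  # hypothetical batch names

def rand_matrix(rows, cols):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

# Encoder: a single set of weights shared by every batch.
W_mu = rand_matrix(N_LATENT, N_GENES)
W_logvar = rand_matrix(N_LATENT, N_GENES)

# Decoder: shared weights plus one scale/shift pair per batch (assumption),
# so batch-specific variation is absorbed here and not in the embedding.
W_dec = rand_matrix(N_GENES, N_LATENT)
batch_params = {
    b: {"scale": [1.0] * N_GENES, "shift": [0.0] * N_GENES} for b in BATCHES
}

def encode(x):
    """Batch-independent: returns mean and log-variance of q(z | x)."""
    return matvec(W_mu, x), matvec(W_logvar, x)

def sample(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]

def decode(z, batch):
    """Batch-specific: reconstructs expression using that batch's parameters."""
    p = batch_params[batch]
    hidden = matvec(W_dec, z)
    return [s * h + t for s, h, t in zip(p["scale"], hidden, p["shift"])]

# One cell, one embedding, two batch-specific reconstructions.
x = [1.0, 0.0, 2.0, 0.0, 3.0, 1.0]
mu, logvar = encode(x)
z = sample(mu, logvar)
batch_params["batch_B"]["shift"] = [0.5] * N_GENES  # pretend batch_B has an offset
recon_a = decode(z, "batch_A")
recon_b = decode(z, "batch_B")
```

In this arrangement the same embedding `z` reconstructs differently depending on the batch passed to the decoder, so the embedding itself has no need to encode the batch. A real model would learn all of these weights by optimizing a variational autoencoder objective.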

FIGURE: SCALEX MODEL FRAMEWORK

SCALEX has four main features: 1) Compared with existing single-cell data integration methods, SCALEX has a clear advantage in integration accuracy; 2) SCALEX remains computationally efficient on datasets of millions of cells, making it suitable for integrating and analyzing ultra-high-throughput single-cell sequencing data; 3) SCALEX effectively avoids overcorrection during integration, making it suitable for highly heterogeneous and complex samples; 4) SCALEX supports the integration of single-cell RNA-seq, single-cell ATAC-seq, and other multi-omics data.
These features make SCALEX well suited to building single-cell atlases. The developers integrated single-cell datasets from multiple studies and multiple tissues to construct three large-scale single-cell atlases: a mouse atlas, a human atlas, and a COVID-19 atlas.

A particular advantage of SCALEX is its highly generalizable encoder. The encoder projects single-cell sequencing data into a batch-independent, unified low-dimensional cell-embedding space. For newly generated data, SCALEX can project the data into this same space without retraining the encoder. This type of integration is called "online integration." A major benefit of online integration is that new data can easily be compared with existing foundational data such as a single-cell atlas (which must itself have been generated by SCALEX integration), so that the foundational data can offer biological insight and guidance and directly support analytical tasks such as data annotation and the verification of biological patterns. In addition, as new data are continually added, the cellular content of the original atlas is enriched and expanded, enabling new biological discoveries.
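
The online workflow described above (project new cells with a frozen encoder, annotate them against the atlas, then add them to the atlas) can be sketched as follows. This is an illustrative sketch under stated assumptions, not SCALEX's API: the fixed linear `encode` function stands in for a trained encoder, the three-gene profiles and cell-type labels are made up, and nearest-neighbor voting is one simple way to transfer labels in an embedding space.

```python
import math
from collections import Counter

# Hypothetical frozen encoder: a fixed linear map from 3 genes to 2 dimensions.
# In practice this would be the trained network, used without retraining.
ENCODER_WEIGHTS = [[1.0, 0.0, -1.0],
                   [0.0, 1.0, -1.0]]

def encode(x):
    """Project an expression profile into the shared embedding space."""
    return [sum(w * v for w, v in zip(row, x)) for row in ENCODER_WEIGHTS]

# Reference atlas: made-up expression profiles with known labels,
# embedded once and stored as (embedding, label) pairs.
atlas_cells = [
    ([5.0, 0.0, 1.0], "T cell"),
    ([4.5, 0.5, 1.0], "T cell"),
    ([5.5, 0.2, 0.8], "T cell"),
    ([0.0, 5.0, 1.0], "B cell"),
    ([0.5, 4.5, 1.0], "B cell"),
    ([0.2, 5.5, 0.8], "B cell"),
]
atlas = [(encode(x), label) for x, label in atlas_cells]

def annotate(x_new, k=3):
    """Label a new cell by majority vote among its k nearest atlas cells."""
    z = encode(x_new)
    nearest = sorted(atlas, key=lambda item: math.dist(z, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def add_to_atlas(x_new, label):
    """Extend the atlas with a newly projected cell; the encoder is unchanged."""
    atlas.append((encode(x_new), label))

new_cell = [4.8, 0.3, 0.9]
predicted = annotate(new_cell)
add_to_atlas(new_cell, predicted)
print(predicted)
```

Because the encoder is never updated, embeddings computed earlier remain valid, so the atlas can grow incrementally as new datasets arrive.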

In summary, the researchers developed SCALEX, an artificial intelligence tool for analyzing single-cell sequencing data. It maps the gene expression profiles of cells from different batches into a batch-independent, unified low-dimensional cell-embedding space, effectively removing batch effects while preserving the inherent biological differences between cells, and thereby integrates data from different batches. SCALEX is suitable for atlas-level integration of single-cell sequencing data and will provide foundational support for ongoing research initiatives such as ultra-large-scale single-cell atlas projects across the life sciences and biomedicine.

Associate Professor Zhang Qiangfeng of the School of Life Sciences, Tsinghua University, is the corresponding author of the paper. Xiong Lei, a 2015 doctoral student at the School of Life Sciences (now graduated), and Tian Kang, a 2018 doctoral student, are co-first authors. Li Yuzhe, a 2019 doctoral student, and Ning Weixi, a 2021 doctoral student, provided important help with the data analysis. Professor Gao Xin, director of the BioMap Institute and a computational biologist at King Abdullah University of Science and Technology, took part in the collaborative research. This work was supported by the National Key Research and Development Program of China, the National Natural Science Foundation of China, the Beijing Advanced Innovation Center for Structural Biology, the Tsinghua-Peking University Joint Center for Life Sciences, the Computing Platform of Tsinghua University, the Shanghai Institute of Wisdom, and the Office of Research Management of King Abdullah University of Science and Technology.
