
    Nature: New statistical methods improve genomic analysis

    • Last Update: 2022-11-26
    • Source: Internet
    • Author: User

    A new statistical approach provides a more efficient way to reveal biologically meaningful changes in genomic data that span multiple conditions, such as cell types or tissues.

    Genome-wide studies generate vast amounts of data, from millions of individual DNA sequences, to information about where and at what levels thousands of genes are expressed, to the locations of functional elements throughout the genome. Because of the volume and complexity of the data, it is statistically challenging to compare different biological conditions or studies conducted in different laboratories.

    "When you have multiple conditions, the difficulty is how to put the data together in a way that is both statistically powerful and computationally efficient," said Qunhua Li, an associate professor of statistics at Penn State. "Existing methods are either computationally expensive or produce results that are difficult to interpret biologically. We developed a method called CLIMB that improves on existing methods, is computationally efficient, and produces biologically interpretable results. We tested it on three types of genomic data collected from hematopoietic cells (associated with blood stem cells), but the method can also be used to analyze other 'omics' data."

    The researchers describe the CLIMB (Composite Likelihood Empirical Bayes) method in a paper published online Nov. 12 in the journal Nature Communications.

    "In experiments where there is so much information, but it comes from a relatively small number of individuals, it is helpful to be able to use the information as effectively as possible," said Hillary Koch, a former graduate student at Penn State who is now a senior statistician at Moderna. "Being able to look at everything together, even using information from related experiments, is statistically advantageous. CLIMB allows us to do just that."

    The CLIMB method draws on the principles of two traditional techniques for analyzing data across multiple conditions. One technique makes a series of pairwise comparisons between conditions, but interpretation becomes increasingly difficult as additional conditions are added.
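
    To make the scaling concrete, the short sketch below counts how many pairwise comparisons a fully pairwise analysis would need as conditions are added. The function name is illustrative only; the condition counts 17 and 38 are the sizes of the ChIP-seq and DNase-seq data sets mentioned later in this article.

```python
from math import comb


def n_pairwise_comparisons(n_conditions: int) -> int:
    """Number of unordered condition pairs, i.e. D * (D - 1) / 2."""
    return comb(n_conditions, 2)


# Each pair yields its own list of results that must be reconciled by hand.
for d in (3, 17, 38):
    print(d, "conditions ->", n_pairwise_comparisons(d), "pairwise comparisons")
```

    The count grows quadratically, which is manageable to compute but produces many separate result lists to interpret jointly.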

    The other technique combines each subject's activity pattern across conditions into an "association vector" that records, for example, whether a gene is up-regulated, down-regulated, or unchanged in each of several cell types. Association vectors directly reflect condition-specific patterns and are easy to interpret. However, because there are many possible combinations even with only a few conditions, the calculations are computationally intensive. To overcome this challenge, the second approach makes simplifying assumptions about the data that are not always correct.
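
    The combinatorial growth can be illustrated with a minimal sketch. It assumes three possible states per condition (down-regulated, unchanged, up-regulated, as in the example above) and encodes them as -1, 0 and 1; that encoding and the function names are assumptions made for illustration, not taken from the paper.

```python
from itertools import product

# Assumed encoding: -1 = down-regulated, 0 = unchanged, 1 = up-regulated.
STATES = (-1, 0, 1)


def count_association_vectors(n_conditions: int) -> int:
    """Number of distinct patterns when each condition takes one of 3 states."""
    return len(STATES) ** n_conditions


def enumerate_association_vectors(n_conditions: int):
    """Explicitly list every pattern; only feasible for a handful of conditions."""
    return list(product(STATES, repeat=n_conditions))


print(enumerate_association_vectors(2))  # all 9 patterns for two conditions
for d in (3, 17, 38):
    print(d, "conditions ->", count_association_vectors(d), "possible vectors")
```

    With three states per condition the number of candidate patterns is 3 to the power of the number of conditions, so it quickly becomes impractical to consider every one.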

    "CLIMB employs aspects of both approaches," Koch said. "We end up analyzing association vectors, but first we use pairwise analyses to identify the patterns that are likely to exist. Instead of making assumptions about the data, we use the pairwise information to eliminate combinations that the data do not support. This greatly reduces the space of possible patterns that would otherwise make the calculations so intensive."

    After compiling a reduced set of possible association vectors, the method clusters together subjects that follow the same pattern across conditions. For example, the results can tell researchers which genes are collectively up-regulated in some cell types and collectively down-regulated in others.
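
    As a rough illustration of this final grouping step, the sketch below collects subjects that share the same pattern vector. It assumes each gene has already been assigned a single pattern (a hard assignment), which is a simplification of a statistical clustering; the gene labels, the -1/0/1 encoding, and the function name are placeholders rather than real data or the authors' code.

```python
from collections import defaultdict


def group_by_pattern(assignments):
    """Group subjects (e.g. genes) by their assigned pattern vector.

    assignments maps a subject name to a sequence of states, one per condition,
    where -1 = down-regulated, 0 = unchanged, 1 = up-regulated (assumed encoding).
    """
    groups = defaultdict(list)
    for subject, pattern in assignments.items():
        groups[tuple(pattern)].append(subject)
    return dict(groups)


# Placeholder assignments across three cell types (not real data).
toy_assignments = {
    "gene_A": (1, 1, -1),
    "gene_B": (0, 0, 0),
    "gene_C": (1, 1, -1),
}
for pattern, genes in group_by_pattern(toy_assignments).items():
    print(pattern, genes)
```

    Each resulting group reads directly as a statement such as "these genes are up in the first two cell types and down in the third".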

    The researchers tested their method on data from a technique called RNA-seq, which measures the amount of RNA produced by all genes expressed in a cell, to check whether certain genes help determine what type of cell hematopoietic stem cells eventually become.

    "Compared with the popular pairwise method, our results are more specific," Li said. "Our gene list is more concise and biologically relevant."

    While the traditional pairwise approach identified 6,000 to 7,000 genes of interest, the CLIMB method produced a much shorter list of only 2,000 to 3,000 genes, of which at least 1,000 were identified in both analyses.

    "Different blood cell types have different functions (some become red blood cells, others become immune cells), and we wanted to know which genes were more likely to be involved in determining each different cell type," said Ross Hardison, T. Ming Chu Professor of Biochemistry and Molecular Biology at Penn State. "The CLIMB method pulled out some important genes; some we already knew about, and others were new to us. But the difference is that these results are more specific and interpretable than those from previous analyses."

    The researchers also used CLIMB on data generated by another experimental technique, ChIP-seq, which identifies where in the genome certain proteins bind to DNA. They explored how the binding of a protein called CTCF, a transcription factor that helps establish the interactions needed for gene regulation in the nucleus, changes or stays the same across 17 cell populations derived from the same hematopoietic stem cell. The CLIMB analysis identified different classes of CTCF binding sites, some of which revealed roles for this transcription factor in all blood cells, while others showed roles in specific cell types.

    Finally, the team explored data from another experimental technique called DNase-seq, which identifies the locations of regulatory regions, to compare the accessibility of chromatin (a complex of DNA and proteins) across 38 human cell types.

    "For these three tests, we wanted to see if our results were biologically relevant, so we compared them with independent data, such as high-throughput sequencing of histone modifications and transcription factor footprints," Koch said. "In each case, our results were consistent with these other data. Next, we want to increase the computational speed of our method and increase the number of conditions it can handle. For example, chromatin accessibility data are available for more cell types, so we want to scale up CLIMB."

    In addition to Li, Koch and Hardison, the research team included Cheryl Keller, Guanjue Xiang and Belinda Giardine of Penn State, Feipeng Zhang of Xi'an Jiaotong University in China, and Yicheng Wang of the University of British Columbia in Canada. The research was supported by the National Institutes of Health, including the National Institute of General Medical Sciences, the National Human Genome Research Institute, and the National Institute of Diabetes and Digestive and Kidney Diseases.

    CLIMB: High-dimensional association detection in large scale genomic data