A new statistical approach provides a more efficient way to reveal biologically meaningful changes in genomic data that span multiple conditions, such as cell types or tissues.
Genome-wide studies generate vast amounts of data, from millions of individual DNA sequences, to information about which of thousands of genes are expressed and where, to the locations of functional elements throughout the genome. Because of the volume and complexity of these data, it is statistically challenging to compare different biological conditions, or studies conducted in different laboratories.
Qunhua Li, an associate professor of statistics at Penn State, said, "When you have multiple conditions, the difficulty is how to put the data together in a way that is both statistically powerful and computationally efficient. Existing methods are either computationally expensive or produce results that are difficult to interpret biologically. We developed a method called CLIMB that improves on existing methods, is computationally efficient, and produces biologically interpretable results. We tested it on three types of genomic data collected from hematopoietic cells (cells related to blood stem cells), but the method could also be used to analyze other 'omics' data."
The researchers describe the CLIMB (Composite Likelihood Empirical Bayes) method in a paper published online Nov. 12 in the journal Nature Communications.
"In experiments where there is so much information, but it comes from a relatively small number of individuals, it is helpful to be able to use that information as effectively as possible," said Hillary Koch, who was a graduate student at Penn State and is now a senior statistician at Moderna. "Being able to look at everything together, even drawing on information from related experiments, is statistically advantageous. CLIMB allows us to do just that."
The CLIMB method draws on the principles of two traditional techniques for analyzing data across multiple conditions. One technique makes pairwise comparisons between conditions, but the results become increasingly difficult to interpret as more conditions are added. The other technique summarizes each subject's activity pattern across conditions as an "association vector," for example recording whether a gene is up-regulated, down-regulated, or unchanged in each of several cell types. Association vectors directly reflect condition-specific patterns and are easy to interpret. However, because the number of possible combinations is large even with only a few conditions, this approach is computationally intensive.
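As a rough illustration of why association vectors become expensive, the Python sketch below enumerates every possible vector under the simplifying assumption that each condition takes one of three labels (-1 down-regulated, 0 unchanged, 1 up-regulated). The label encoding and the function name `all_association_vectors` are illustrative choices, not part of CLIMB.

```python
from itertools import product

# Illustrative label encoding (an assumption, not CLIMB's own notation):
# -1 = down-regulated, 0 = unchanged, 1 = up-regulated
LABELS = (-1, 0, 1)


def all_association_vectors(num_conditions):
    """Return every possible association vector over num_conditions conditions."""
    return list(product(LABELS, repeat=num_conditions))


# Two conditions already give 9 patterns, e.g. (1, -1): up in the first, down in the second.
print(all_association_vectors(2))

# The number of vectors is 3**K, so it grows exponentially with the number of conditions K.
for k in (2, 4, 8, 16):
    print(k, len(LABELS) ** k)
```

Under this three-label assumption, 16 conditions already give 43,046,721 possible patterns, which is why considering all of them quickly becomes impractical.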
To overcome this computational burden, the second approach relies on simplifying assumptions about the data that do not always hold.
"CLIMB employs aspects of both approaches," Koch said. "We end up analyzing association vectors, but first we use pairwise analysis to identify which patterns are likely to be present. Instead of making assumptions about the data, we use the pairwise information to eliminate combinations that the data do not support. This greatly reduces the space of possible patterns, which would otherwise make the computation intensive."
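The pruning idea Koch describes can be sketched in Python: build candidate vectors one condition at a time and discard any partial vector containing a pair of labels that a pairwise analysis did not support. This is a simplified stand-in, not CLIMB's actual procedure, which is likelihood-based; the `supported` mapping, the label encoding, and the function name `candidate_vectors` are hypothetical, and the pairwise results are assumed to be available already.

```python
from itertools import product

# Illustrative label encoding (an assumption): -1 = down, 0 = unchanged, 1 = up
LABELS = (-1, 0, 1)
# Condition pairs missing from `supported` are treated as unconstrained.
ALL_PAIRS = set(product(LABELS, repeat=2))


def candidate_vectors(num_conditions, supported):
    """Build association vectors condition by condition, pruning early.

    supported: hypothetical dict mapping a condition pair (i, j), with i < j,
    to the set of (label_i, label_j) combinations a pairwise analysis supported.
    """
    partial = [()]
    for j in range(num_conditions):
        extended = []
        for vec in partial:
            for label in LABELS:
                # Keep the new label only if it is compatible with every earlier condition i.
                if all((vec[i], label) in supported.get((i, j), ALL_PAIRS) for i in range(j)):
                    extended.append(vec + (label,))
        partial = extended  # Rejected prefixes are never extended further.
    return partial


# Hypothetical pairwise results for three conditions.
supported = {
    (0, 1): {(0, 0), (1, 1)},
    (0, 2): {(0, 0), (1, -1)},
    (1, 2): {(0, 0), (1, -1)},
}
print(candidate_vectors(3, supported))  # [(0, 0, 0), (1, 1, -1)]
```

Here only 2 of the 27 possible vectors survive. Because incompatible prefixes are dropped as soon as they appear, the full set of 3**K vectors never has to be listed.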
After compiling a reduced set of candidate association vectors, the method groups together subjects that follow the same pattern across conditions. For example, the results can tell researchers which genes are jointly up-regulated in some cell types and down-regulated in others.
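Once each subject has an association vector, collecting subjects that share a pattern is straightforward, as the Python sketch below shows. It assumes each gene's vector is already known, keeps only genes whose pattern is among the candidates, and then picks out patterns that are up in some conditions and down in others. CLIMB itself estimates these memberships statistically; the gene names, vectors, and function names here are hypothetical.

```python
from collections import defaultdict


def group_by_pattern(assignments, candidates):
    """Group subjects (e.g. genes) that share the same association vector.

    assignments: hypothetical dict of subject name -> association vector (tuple of -1/0/1).
    candidates: set of association vectors kept after pruning; other vectors are ignored.
    """
    groups = defaultdict(list)
    for subject, vector in assignments.items():
        if vector in candidates:
            groups[vector].append(subject)
    return dict(groups)


def mixed_direction(groups):
    """Keep patterns that are up-regulated (1) in some conditions and down-regulated (-1) in others."""
    return {vector: subjects for vector, subjects in groups.items() if 1 in vector and -1 in vector}


assignments = {
    "gene_a": (1, 1, -1),
    "gene_b": (0, 0, 0),
    "gene_c": (1, 1, -1),
    "gene_d": (1, -1, 0),  # not among the candidates, so it is left out
}
groups = group_by_pattern(assignments, {(0, 0, 0), (1, 1, -1)})
print(mixed_direction(groups))  # {(1, 1, -1): ['gene_a', 'gene_c']}
```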
The researchers first tested their method on data from RNA-seq, a technique that measures the amount of RNA produced by every gene expressed in a cell, to examine which genes help determine what type of cell a hematopoietic stem cell eventually becomes.
"Compared with the popular pairwise method, our results are more specific," Li said. "Our gene list is more concise and more biologically relevant."
Whereas the traditional pairwise approach identified 6,000 to 7,000 genes of interest, CLIMB returned a much smaller set of only 2,000 to 3,000 genes, at least 1,000 of which were identified by both analyses.
Ross Hardison, T. Ming Chu Professor of Biochemistry and Molecular Biology at Penn State, said, "Different blood cell types have different functions: some become red blood cells, others become immune cells. We wanted to know which genes are most likely to be involved in determining each cell type. CLIMB picked out some important genes, some of which we already knew about. But the difference is that these results are more specific and interpretable than those of previous analyses."
The researchers also applied CLIMB to data from another experimental technique, ChIP-seq, which identifies where specific proteins bind to DNA across the genome. They examined how the binding of a protein called CTCF, a transcription factor that helps establish the interactions in the nucleus needed for gene regulation, changes or stays the same across 17 cell populations that all arise from hematopoietic stem cells. CLIMB identified distinct classes of CTCF binding sites: some point to a role for this transcription factor in all blood cells, while others point to roles in specific cell types.
Finally, the team analyzed data from DNase-seq, a technique that locates regulatory regions, to compare the accessibility of chromatin (a complex of DNA and proteins) across 38 human cell types.
"For all three tests, we wanted to see whether our results were biologically relevant, so we compared them with independent data, such as high-throughput sequencing of histone modifications and transcription factor footprints," Koch said. "In each case, our results were consistent with these other methods. Next, we want to speed up our method and increase the number of conditions it can handle. For example, chromatin accessibility data are available for more cell types, so we want to scale up CLIMB."
In addition to Li, Koch and Hardison, the research team included Cheryl Keller, Guanjue Xiang and Belinda Jardine of Pennsylvania State University; Feipeng Zhang of Xi'an Jiaotong University in China; and Yicheng Wang of the University of British Columbia in Canada.
The research was supported by the National Institutes of Health, including the National Institute of General Medical Sciences, the National Human Genome Research Institute, and the National Institute of Diabetes and Digestive and Kidney Diseases.