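If you want to merge multiple GEO or TCGA datasets, batch effects are unavoidable, especially when you are looking for differentially expressed genes.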
For NGS data, or high-dimensional data in general (gene expression, RNA sequencing, methylation, brain imaging data), the sva package is a good choice. It offers three ways of dealing with artifacts:
- identifying and estimating surrogate variables for unknown sources of variation in high-throughput experiments (Leek and Storey 2007 PLoS Genetics,2008 PNAS)
- directly removing known batch effects using ComBat (Johnson et al. 2007 Biostatistics)
- removing batch effects with known control probes (Leek 2014 bioRxiv)
Merging datasets falls under the second approach: using ComBat to remove known batch effects, where the batch of each sample is specified in advance.
The removeBatchEffect function in the limma package can also remove known batch effects and is worth trying if you are interested.
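So how do we use the sva package to remove known batch effects? For GEO data, see the article "GEO 批次效应就靠一个函数搞定" (roughly, "GEO batch effects handled with a single function").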
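For TCGA data (here meaning the TCGA RNA-seq datasets), or RNA-seq transcriptome data from other sources, the ComBat_seq function is the one to consider if you want to merge datasets with sva.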
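Before any batch correction, the datasets have to be combined into a single genes-by-samples count matrix. The following is a minimal sketch of that step in base R, assuming both matrices use the same gene identifiers as row names. The object names tcga_counts and other_counts and the toy data are hypothetical; alldata and batch are the names used in the ComBat_seq call later in this post.

```r
# Hypothetical toy data: two raw count matrices (genes x samples) from different sources.
set.seed(1)
tcga_counts <- matrix(rnbinom(20, size = 10, prob = 0.1), nrow = 5,
                      dimnames = list(paste0("gene", 1:5), paste0("tcga_", 1:4)))
other_counts <- matrix(rnbinom(18, size = 10, prob = 0.1), nrow = 6,
                       dimnames = list(paste0("gene", c(2:6, 1)), paste0("other_", 1:3)))

# Keep only the genes present in both datasets, in one consistent order.
shared_genes <- intersect(rownames(tcga_counts), rownames(other_counts))
alldata <- cbind(tcga_counts[shared_genes, , drop = FALSE],
                 other_counts[shared_genes, , drop = FALSE])

# One batch label per column, following the column order of alldata.
batch <- c(rep(1, ncol(tcga_counts)), rep(2, ncol(other_counts)))
```

Subsetting by shared_genes rather than by position matters here because the two sources rarely list genes in the same order.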
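With the usual installation route, BiocManager::install("sva"), the ComBat_seq function could not be found at the time of writing (newer Bioconductor releases of sva may already include it). The sva documentation, however, does describe when and how to use the function: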
ComBat-Seq is an improved model based on the ComBat framework, which specifically targets RNA-Seq count data. It uses a negative binomial regression to model the count matrix, and estimate parameters representing the batch effects. Then it provides adjusted data by mapping the original data to an expected distribution if there were no batch effects. The adjusted data preserve the integer nature of count matrix. Like ComBat, it requires a known batch variable.
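In that case the sva package has to be installed another way, from the development repository on GitHub (the method has been published as: ComBat-Seq: batch effect adjustment for RNA-Seq count data):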
devtools::install_github("zhangyuqing/sva-devel")
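Whichever way sva was installed, it can be useful to confirm that the installed build actually exports the function before relying on it. The check below is a small sketch using only base R utilities; the variable name has_combat_seq is made up for illustration.

```r
# Check whether the installed sva exports ComBat_seq, without attaching the package.
has_combat_seq <- requireNamespace("sva", quietly = TRUE) &&
  "ComBat_seq" %in% getNamespaceExports("sva")

if (has_combat_seq) {
  message("ComBat_seq is available in sva ",
          as.character(utils::packageVersion("sva")))
} else {
  message("ComBat_seq not found in the installed sva (or sva is not installed).")
}
```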
The ComBat_seq function is simple to use and works much like ComBat. The input needs no transformation; a raw count matrix is enough (ComBat-Seq takes untransformed, raw count matrix as input).
# Batch effect adjustment
adjusted_counts <- sva::ComBat_seq(as.matrix(alldata), batch = batch, group = group)
Here batch is the batch vector (numeric labels such as 1/2 work), and group is the biological grouping (for example tumor/normal).
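In practice the two vectors usually come from a sample annotation table. The sketch below assumes a hypothetical data frame sample_info whose rows are in the same order as the columns of the count matrix; its column names (dataset, type) are invented for this example. It also cross-tabulates batch against group, since a batch that contains only one biological group generally makes batch and biology hard to separate.

```r
# Hypothetical sample sheet: one row per column of the count matrix, in the same order.
sample_info <- data.frame(
  sample  = paste0("s", 1:8),
  dataset = rep(c("TCGA", "other"), each = 4),
  type    = rep(c("normal", "tumor"), times = 4),
  stringsAsFactors = FALSE
)

# Numeric batch labels (1, 2, ...) in order of first appearance, plus group labels.
batch <- as.integer(factor(sample_info$dataset, levels = unique(sample_info$dataset)))
group <- sample_info$type

# Cross-tabulate batch against group; empty cells signal a confounded design.
design_table <- table(batch = batch, group = group)
print(design_table)
confounded <- any(design_table == 0)
```

If confounded comes out TRUE, it is worth reconsidering which datasets to merge before running any correction.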
Example from the official documentation:
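library(sva)
# Simulated raw counts: 50 genes x 8 samples
count_matrix <- matrix(rnbinom(400, size=10, prob=0.1), nrow=50, ncol=8)
# Samples 1-4 belong to batch 1, samples 5-8 to batch 2
batch <- c(rep(1, 4), rep(2, 4))
# Biological condition alternates between 0 and 1 across samples
group <- rep(c(0,1), 4)
adjusted_counts <- ComBat_seq(count_matrix, batch=batch, group=group)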
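A quick sanity check on the result can catch obvious problems. The sketch below reruns the simulated example and, only if an sva build exporting ComBat_seq is installed, checks that the adjusted matrix keeps the input dimensions and remains integer-valued, as the documentation describes. The seed and the flag names same_shape and integer_valued are additions for illustration, not part of the official example.

```r
set.seed(123)
count_matrix <- matrix(rnbinom(400, size = 10, prob = 0.1), nrow = 50, ncol = 8)
batch <- c(rep(1, 4), rep(2, 4))
group <- rep(c(0, 1), 4)

# Run the adjustment only when an sva build that exports ComBat_seq is present.
if (requireNamespace("sva", quietly = TRUE) &&
    "ComBat_seq" %in% getNamespaceExports("sva")) {
  adjusted_counts <- sva::ComBat_seq(count_matrix, batch = batch, group = group)

  # Does the output keep the input shape, and is it still integer-valued?
  same_shape <- identical(dim(adjusted_counts), dim(count_matrix))
  integer_valued <- all(adjusted_counts == round(adjusted_counts))
  print(c(same_shape = same_shape, integer_valued = integer_valued))

  # Mean library size per batch, before and after adjustment.
  print(tapply(colSums(count_matrix), batch, mean))
  print(tapply(colSums(adjusted_counts), batch, mean))
}
```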
This is only a brief introduction; for detailed usage, see the sva documentation: https://www.bioconductor.org/packages/release/bioc/vignettes/sva/inst/doc/sva.pdf
This article originally appeared at http://www.bioinfo-scrounger.com; please credit the source when reposting.