estimateDispersions {DESeq2} | R Documentation |
This function obtains dispersion estimates for negative binomial distributed data.
## S4 method for signature 'DESeqDataSet'
estimateDispersions(object, fitType = c("parametric", "local", "mean"),
  maxit = 100, quiet = FALSE)
object |
a DESeqDataSet |
fitType |
either "parametric", "local", or "mean" for the type of fitting of dispersions to the mean intensity |
maxit |
control parameter: maximum number of iterations to allow for convergence |
quiet |
whether to suppress the messages printed at each step |
Typically the function is called with the idiom:
dds <- estimateDispersions(dds)
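The call above uses the default parametric trend. As a minimal sketch of choosing a different fitType and silencing the progress messages, assuming simulated data from makeExampleDESeqDataSet (the object name dds_mean is only illustrative):

```r
library(DESeq2)

# Simulated counts, used here only for illustration
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)

# Use the mean of the gene-wise estimates as the fitted value
# instead of a parametric trend, and suppress progress messages
dds_mean <- estimateDispersions(dds, fitType = "mean", quiet = TRUE)

# There is one final dispersion value per row of the object
length(dispersions(dds_mean)) == nrow(dds_mean)
```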
The fitting proceeds in four steps. First, for each gene, an
estimate of the dispersion is found which maximizes the
Cox-Reid adjusted profile likelihood (Cox-Reid adjusted
profile likelihood maximization for estimation of dispersion
in RNA-Seq data was developed by McCarthy, et al. (2012) and
first implemented in the edgeR package in 2010). Second, a
trend line capturing the dispersion-mean relationship is fit
to the gene-wise maximum likelihood estimates. Third, a
normal prior is determined for the log dispersion estimates,
centered on the predicted value from the trended fit, with
variance equal to the difference between the observed
variance of the log dispersion estimates and the expected
sampling variance. Finally, maximum a posteriori dispersion
estimates are returned.
This final dispersion parameter is used in subsequent
tests. The final dispersion estimates can be accessed from
an object using dispersions. The fitted dispersion-mean
relationship is also used in
varianceStabilizingTransformation. All of the intermediate
values (gene-wise dispersion estimates, fitted dispersion
estimates from the trended fit, etc.) are stored in
mcols(dds), with information about these columns in
mcols(mcols(dds)).
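As a minimal sketch of inspecting these stored values, again assuming simulated data from makeExampleDESeqDataSet. The column names are printed rather than assumed, since they may differ between DESeq2 versions, and the name fit is only illustrative:

```r
library(DESeq2)

dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds, quiet = TRUE)

# Per-gene metadata columns, including the intermediate dispersion values
names(mcols(dds))

# Description of each metadata column
mcols(mcols(dds))

# The fitted dispersion-mean relationship is stored as a function of the mean
fit <- dispersionFunction(dds)
fit(c(10, 100, 1000))
```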
The log-normal prior on the dispersion parameter was proposed by Wu, et al. (2012) and is also implemented in the DSS package.
In DESeq2, the dispersion estimation procedure described above replaces the several dispersion estimation methods offered by the earlier DESeq package.
estimateDispersions checks for the case of an analysis with
as many samples as the number of coefficients to fit, and
will temporarily substitute a design formula ~ 1 for
dispersion estimation, which treats the samples as
replicates. As mentioned in the DESeq paper: "While one may
not want to draw strong conclusions from such an analysis,
it may still be useful for exploration and hypothesis
generation."
The lower-level functions called by estimateDispersions are:
estimateDispersionsGeneEst, estimateDispersionsFit, and
estimateDispersionsMAP.
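A minimal sketch of running the three lower-level functions in sequence with their default arguments, assuming simulated data from makeExampleDESeqDataSet. It is meant to illustrate the order of the steps, not to reproduce the internals of estimateDispersions exactly:

```r
library(DESeq2)

dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)

# Step 1: gene-wise dispersion estimates
dds <- estimateDispersionsGeneEst(dds)
# Step 2: trend fit of dispersion against the mean
dds <- estimateDispersionsFit(dds)
# Step 3: maximum a posteriori estimates
dds <- estimateDispersionsMAP(dds)

head(dispersions(dds))
```

Calling the steps separately can be useful when the result of one step needs to be examined before the next step is run.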
The DESeqDataSet passed as a parameter, with the dispersion
information filled in as metadata columns accessible via
mcols, and with the final dispersions accessible via
dispersions.
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 11 (2010) R106, http://dx.doi.org/10.1186/gb-2010-11-10-r106
McCarthy, DJ, Chen, Y, Smyth, GK: Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40 (2012), 4288-4297, http://dx.doi.org/10.1093/nar/gks042
Wu, H, Wang, C, Wu, Z: A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics (2012), http://dx.doi.org/10.1093/biostatistics/kxs033
dds <- makeExampleDESeqDataSet()
dds <- estimateSizeFactors(dds)
dds <- estimateDispersions(dds)
head(dispersions(dds))