DESeq {DESeq2} | R Documentation |
This function performs a default analysis through the steps:
1. estimation of size factors: estimateSizeFactors
2. estimation of dispersion: estimateDispersions
3. negative binomial GLM fitting and Wald statistics: nbinomWaldTest
For complete details on each step, see the manual pages of the
respective functions. After the DESeq function returns a DESeqDataSet
object, results tables (log2 fold changes and p-values) can be
generated using the results function. See the manual page for results
for information on independent filtering and p-value adjustment for
multiple test correction.
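The steps listed above can also be run one at a time. The following is a minimal sketch of that sequence, assuming the Wald test and default arguments; it is not meant as a full account of everything the wrapper does (for example, outlier replacement is not shown). The simulated dataset and the variable names are illustrative only.

```r
library(DESeq2)

# Simulated counts: 100 genes, two conditions (illustrative data only)
dds <- makeExampleDESeqDataSet(n = 100, betaSD = 1)

# Step 1: estimate one size factor per sample
dds <- estimateSizeFactors(dds)

# Step 2: estimate a dispersion value per gene
dds <- estimateDispersions(dds)

# Step 3: fit the negative binomial GLM and compute Wald statistics
dds <- nbinomWaldTest(dds)

# Build the results table (log2 fold changes and p-values)
res <- results(dds)
head(res)
```

Running the steps separately can be convenient when one of them needs non-default arguments, or when intermediate values such as the size factors should be inspected before fitting.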
DESeq(object, test = c("Wald", "LRT"),
  fitType = c("parametric", "local", "mean"), betaPrior,
  full = design(object), reduced, quiet = FALSE,
  minReplicatesForReplace = 7, modelMatrixType)
object | a DESeqDataSet object, see the constructor functions |
test | either "Wald" or "LRT", which will then use either Wald significance tests (defined by |
fitType | either "parametric", "local", or "mean" for the type of fitting of dispersions to the mean intensity. See |
betaPrior | whether or not to put a zero-mean normal prior on the non-intercept coefficients (Tikhonov/ridge regularization). See |
full | the full model formula, this should be the formula in |
reduced | a reduced formula to compare against, e.g. the full model with a term or terms of interest removed, only used by the likelihood ratio test |
quiet | whether to print messages at each step |
minReplicatesForReplace | the minimum number of replicates required in order to use |
modelMatrixType | either "standard" or "expanded", which describe how the model matrix, X of the GLM formula is formed. "standard" is as created by |
The differential expression analysis uses a generalized linear model of the form:
K_ij ~ NB(mu_ij, alpha_i)
mu_ij = s_j * q_ij
log2(q_ij) = x_j. * beta_i
where counts K_ij for gene i, sample j are modeled using a negative
binomial distribution with fitted mean mu_ij and a gene-specific
dispersion parameter alpha_i. The fitted mean is composed of a
sample-specific size factor s_j and a parameter q_ij proportional to
the expected true concentration of fragments for sample j. The
coefficients beta_i give the log2 fold changes for gene i for each
column of the model matrix X. The sample-specific size factors can be
replaced by gene-specific normalization factors for each sample using
normalizationFactors. For details on the fitting of the log2 fold
changes and calculation of p-values see nbinomWaldTest (or nbinomLRT
if using test="LRT").
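To make the model concrete, the sketch below simulates counts for a single gene directly from the three equations, using base R only. All numeric values (the coefficients, size factors and dispersion) are made up for illustration. The sketch assumes that alpha_i enters the variance as mu + alpha * mu^2, which corresponds to size = 1 / alpha in rnbinom.

```r
set.seed(1)

# Model matrix X: intercept plus a two-level condition, 3 samples per level
x <- cbind(intercept = 1, condition = rep(c(0, 1), each = 3))

# beta_i on the log2 scale: baseline 2^6 = 64, log2 fold change of 1
beta <- c(6, 1)

# Illustrative size factors s_j and dispersion alpha_i
s <- c(0.8, 1.0, 1.2, 0.9, 1.1, 1.0)
alpha <- 0.1

# log2(q_ij) = x_j. * beta_i
q <- as.vector(2^(x %*% beta))

# mu_ij = s_j * q_ij
mu <- s * q

# K_ij ~ NB(mu_ij, alpha_i); variance is mu + alpha * mu^2
k <- rnbinom(length(mu), mu = mu, size = 1 / alpha)

data.frame(condition = x[, "condition"], size_factor = s, mu = mu, count = k)
```

In the second condition q doubles from 64 to 128, which is what a log2 fold change of 1 means; the size factors then rescale each sample's expected count without changing that ratio.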
Experiments without replicates do not allow for estimation of the
dispersion of counts around the expected value for each group, which
is critical for differential expression analysis. If an experimental
design is supplied which does not contain the necessary degrees of
freedom for differential analysis, DESeq will provide a message to the
user and follow the strategy outlined in Anders and Huber (2010) under
the section 'Working without replicates', wherein all the samples are
considered as replicates of a single group for the estimation of
dispersion. As noted in the reference above: "Some overestimation of
the variance may be expected, which will make that approach
conservative." Furthermore, "while one may not want to draw strong
conclusions from such an analysis, it may still be useful for
exploration and hypothesis generation."
The argument minReplicatesForReplace is used to decide which samples
are eligible for automatic replacement in the case of extreme Cook's
distance. By default, DESeq will replace outliers if the Cook's
distance is large for a sample which has 7 or more replicates
(including itself). This replacement is performed by the
replaceOutliers function. This default behavior helps to prevent
filtering genes based on Cook's distance when there are many degrees
of freedom. See results for more information about filtering using
Cook's distance, and the 'Dealing with outliers' section of the
vignette. Unlike the behavior of replaceOutliers, here original counts
are kept in the matrix returned by counts, original Cook's distances
are kept in assays(dds)[["cooks"]], and the replacement counts used
for fitting are kept in assays(dds)[["replaceCounts"]], where dds is
the object returned by DESeq.
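The accessors named above can be tried on simulated data. This is a sketch under assumptions: the dataset is simulated with 14 samples so that each of the two conditions has 7 replicates, and the lookup of "replaceCounts" is guarded because that assay may not be present in every case or package version.

```r
library(DESeq2)

# 14 samples gives 7 per condition, meeting the default threshold of 7
dds <- makeExampleDESeqDataSet(n = 200, m = 14, betaSD = 1)
dds <- DESeq(dds, quiet = TRUE)

# Cook's distances: one value per gene and sample
cooks <- assays(dds)[["cooks"]]
dim(cooks)

# Original counts are still what counts() returns
orig <- counts(dds)

# Replacement counts used for fitting, if they were stored
if ("replaceCounts" %in% names(assays(dds))) {
  repl <- assays(dds)[["replaceCounts"]]
  # Number of count entries that were replaced
  print(sum(repl != orig))
}
```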
Note that if a log2 fold change prior is used (betaPrior=TRUE) then
expanded model matrices will be used in fitting. These are described
in nbinomWaldTest and in the vignette. The contrast argument of
results should be used for generating results tables.
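A short sketch of that recommendation follows. The factor name "condition" and its levels "A" and "B" come from the simulated dataset created by makeExampleDESeqDataSet, so they are specific to this illustration.

```r
library(DESeq2)

dds <- makeExampleDESeqDataSet(n = 100, betaSD = 1)

# Fit with a log2 fold change prior on the non-intercept coefficients
dds <- DESeq(dds, betaPrior = TRUE, quiet = TRUE)

# Request the comparison explicitly: factor name, numerator level, denominator level
res <- results(dds, contrast = c("condition", "B", "A"))
head(res)
```

Naming the comparison through contrast avoids having to reason about which columns the expanded model matrix contains.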
A DESeqDataSet object with results stored as metadata columns. These
results should be accessed by calling the results function. By default
this will return the log2 fold changes and p-values for the last
variable in the design formula. See results for how to access results
for other variables.
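As an illustration of the default and of selecting another variable, the sketch below adds a second, made-up variable called batch to the simulated dataset. The batch labels "X" and "Y" and the design formula are assumptions for this example only.

```r
library(DESeq2)

dds <- makeExampleDESeqDataSet(n = 100, betaSD = 1)

# Hypothetical second variable, balanced across the two conditions
dds$batch <- factor(rep(c("X", "Y"), times = ncol(dds) / 2))
design(dds) <- ~ batch + condition

dds <- DESeq(dds, quiet = TRUE)

# Coefficients available in the fitted model
resultsNames(dds)

# Default: the last variable in the design formula (condition)
resCondition <- results(dds)

# Another variable, selected with the contrast argument
resBatch <- results(dds, contrast = c("batch", "Y", "X"))
```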
Michael Love
DESeq2 reference:
Michael I Love, Wolfgang Huber, Simon Anders: Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. bioRxiv preprint (2014) http://dx.doi.org/10.1101/002832
DESeq reference:
Simon Anders, Wolfgang Huber: Differential expression analysis for sequence count data. Genome Biology 11 (2010) R106, http://dx.doi.org/10.1186/gb-2010-11-10-r106
dds <- makeExampleDESeqDataSet(betaSD=1, n=100)
dds <- DESeq(dds)
res <- results(dds)
ddsLRT <- DESeq(dds, test="LRT", reduced= ~ 1)
resLRT <- results(ddsLRT)