varianceStabilizingTransformation {DESeq2} | R Documentation |
This function calculates a variance stabilizing
transformation (VST) from the fitted dispersion-mean
relation(s) and then transforms the count data (normalized
by division by the size factors or normalization factors),
yielding a matrix of values which are now approximately
homoskedastic (having constant variance along the range of
mean values). The rlogTransformation
is less
sensitive to size factors, which can be an issue when size
factors vary widely. This transformation is useful when
checking for outliers or as input for machine learning
techniques such as clustering or linear discriminant
analysis.
varianceStabilizingTransformation(object, blind = TRUE) getVarianceStabilizedData(object)
object |
a DESeqDataSet, with |
blind |
logical, whether to blind the transformation to the experimental design. blind=TRUE should be used for comparing samples in an manner unbiased by prior information on samples, for example to perform sample QA (quality assurance). blind=FALSE should be used for transforming data for downstream analysis, where the full use of the design information should be made. If many of genes have large differences in counts due to the experimental design, it is important to set blind=FALSE for downstream analysis. |
Note that neither rlog transformation nor the VST are used
by the differential expression estimation in
DESeq
, which always occurs on the raw count
data, through generalized linear modeling which
incorporates knowledge of the variance-mean dependence. The
rlog transformation and VST are offered as separate
functionality which can be used for visualization,
clustering or other machine learning tasks. See the
transformation section of the vignette for more details.
For each sample (i.e., column of counts(dds)
), the
full variance function is calculated from the raw variance
(by scaling according to the size factor and adding the
shot noise). We recommend a blind estimation of the
variance function, i.e., one ignoring conditions. This is
performed by default, and can be modified using the 'blind'
argument.
A typical workflow is shown in Section Variance stabilizing transformation in the package vignette.
If estimateDispersions
was called with
fitType="parametric"
, a closed-form expression for
the variance stabilizing transformation is used on the
normalized count data. The expression can be found in the
file ‘vst.pdf’ which is distributed with the vignette.
If estimateDispersions
was called with
fitType="local"
, the reciprocal of the square root
of the variance of the normalized counts, as derived from
the dispersion fit, is then numerically integrated, and the
integral (approximated by a spline function) is evaluated
for each count value in the column, yielding a transformed
value.
In both cases, the transformation is scaled such that for large counts, it becomes asymptotically (for large values) equal to the logarithm to base 2.
The variance stabilizing transformation from a previous
dataset can be "frozen" and reapplied to new samples. See
the "Data quality assessment" section of the vignette for
strategies to see if new samples are sufficiently similar
to previous datasets. The "freezing" is accomplished by
saving the dispersion function accessible with
dispersionFunction
, assigning this to the
DESeqDataSet
with the new samples, and running
varianceStabilizingTransformation with 'blind' set to FALSE
(see example below). Then the dispersion function from the
previous dataset will be used to transform the new
sample(s). See estimateSizeFactors
for
details on how to "freeze" size factor estimation.
Limitations: In order to preserve normalization, the same
transformation has to be used for all samples. This results
in the variance stabilizition to be only approximate. The
more the size factors differ, the more residual dependence
of the variance on the mean you will find in the
transformed data. As shown in the vignette, you can use the
function meanSdPlot
from the package vsn to
see whether this is a problem for your data.
for varianceStabilizingTransformation
, a
SummarizedExperiment
. The matrix of
transformed values are accessed by assay(vsd)
. for
getVarianceStabilizedData
, a matrix
of the
same dimension as the count data, containing the
transformed values.
Simon Anders
dds <- makeExampleDESeqDataSet(m=6) vsd <- varianceStabilizingTransformation(dds, blind=TRUE) par(mfrow=c(1,2)) plot(rank(rowMeans(counts(dds))), genefilter::rowVars(log2(counts(dds)+1)), main="log2(x+1) transform") plot(rank(rowMeans(assay(vsd))), genefilter::rowVars(assay(vsd)), main="VST") # learn the dispersion function of a dataset design(dds) <- ~ 1 dds <- estimateSizeFactors(dds) dds <- estimateDispersions(dds) # use the previous dispersion function for a new sample ddsNew <- makeExampleDESeqDataSet(m=1) dispersionFunction(ddsNew) <- dispersionFunction(dds) vsdNew <- varianceStabilizingTransformation(ddsNew, blind=FALSE)