replaceOutliers {DESeq2} | R Documentation |
This function replaces outlier counts flagged by extreme Cook's
distances, as calculated by DESeq, nbinomWaldTest or nbinomLRT, with
values predicted by the trimmed mean over all samples (and adjusted by
size factor or normalization factor). This function replaces the counts
in the matrix returned by counts(dds) and the Cook's distances in
assays(dds)[["cooks"]]. Original counts are preserved in
assays(dds)[["originalCounts"]].
replaceOutliers(dds, trim = 0.2, cooksCutoff, minReplicates = 7, whichSamples)
replaceOutliersWithTrimmedMean(dds, trim = 0.2, cooksCutoff, minReplicates = 7, whichSamples)
dds |
a DESeqDataSet object, which has already been processed by either DESeq, nbinomWaldTest or nbinomLRT, and therefore contains a matrix of Cook's distances in assays(dds)[["cooks"]] |
trim |
the fraction (0 to 0.5) of observations to be trimmed from each end of the normalized counts for a gene before the mean is computed |
cooksCutoff |
the threshold for defining an outlier to be replaced. Defaults to the .99 quantile of the F(p, m - p) distribution, where p is the number of parameters and m is the number of samples. |
minReplicates |
the minimum number of replicate samples necessary to consider a sample eligible for replacement (including itself). Outlier counts will not be replaced if the sample is in a cell which has fewer than minReplicates replicates. |
whichSamples |
optional, a numeric or logical index to specify which samples should have outliers replaced. If missing, this is determined using minReplicates. |
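The default cooksCutoff and the minReplicates eligibility rule described above can be illustrated with base R alone. This is a minimal sketch under stated assumptions, not the package's internal code: the design (16 samples, an intercept plus one condition effect) and the variable names are hypothetical, and a cell is taken to be simply a level of a single condition factor.

```r
# Hypothetical design: m samples, p model parameters (intercept + condition)
m <- 16
p <- 2

# Default threshold as documented: the .99 quantile of F(p, m - p)
default_cutoff <- qf(0.99, df1 = p, df2 = m - p)

# Hypothetical single-factor design with unbalanced groups
condition <- factor(rep(c("A", "B"), times = c(10, 6)))

# Number of replicates in each sample's cell (the sample itself included)
cell_size <- ave(seq_along(condition), condition, FUN = length)

# A sample is eligible for replacement only if its cell is large enough
min_replicates <- 7
eligible <- cell_size >= min_replicates
```

In this sketch only the samples of group A are eligible, because group B has fewer than min_replicates samples.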
The DESeq function calculates a diagnostic measure called Cook's
distance for every gene and every sample. The results function then
sets the p-values to NA for genes which contain an outlying count, as
defined by a Cook's distance above a threshold. With many degrees of
freedom, i.e. many more samples than parameters to be estimated, it
might be undesirable to remove entire genes from the analysis just
because their data include a single count outlier. An alternate
strategy is to replace the outlier counts with the trimmed mean over
all samples, adjusted by the size factor or normalization factor for
that sample. This function performs this replacement for samples which
have at least minReplicates replicates (including that sample). For
more information on Cook's distance, please see the two sections of the
vignette: 'Dealing with count outliers' and 'Count outlier detection'.
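The replacement strategy described above can be sketched in base R on a toy matrix. This is only an illustration of the idea, assuming a small hand-made count matrix, made-up size factors and a single count flagged by hand; none of these names come from the package, and the sketch skips the Cook's distance calculation and the minReplicates check.

```r
# Toy data: 3 genes x 8 samples (all values hypothetical)
counts_mat <- matrix(
  c(10, 12, 11,  9, 10, 500, 11, 10,
    50, 55, 48, 52, 51,  49, 53, 50,
     5,  4,  6,  5,  5,   6,  4,  5),
  nrow = 3, byrow = TRUE
)
size_factors <- c(1.0, 1.2, 0.9, 1.0, 1.1, 1.0, 0.8, 1.0)

# Assume gene 1, sample 6 was flagged as an outlier
is_outlier <- matrix(FALSE, nrow = 3, ncol = 8)
is_outlier[1, 6] <- TRUE

# Normalized counts: divide each sample (column) by its size factor
norm_counts <- sweep(counts_mat, 2, size_factors, "/")

# Per-gene trimmed mean of the normalized counts
trimmed <- apply(norm_counts, 1, mean, trim = 0.2)

# Scale the trimmed mean back to each sample and truncate to integers
replacement <- outer(trimmed, size_factors, "*")
storage.mode(replacement) <- "integer"

# Replace only the flagged entries and keep the original counts
original_counts <- counts_mat
new_counts <- counts_mat
new_counts[is_outlier] <- replacement[is_outlier]
```

Only the flagged entry changes; every other count is carried over untouched, and the original matrix stays available for comparison.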
a DESeqDataSet with replaced counts in the slot returned by counts and
the original counts preserved in assays(dds)[["originalCounts"]]
dds <- makeExampleDESeqDataSet(n=100)
dds <- DESeq(dds)
ddsReplace <- replaceOutliers(dds)