Kernel density estimation is a popular tool for visualising the
distribution of data. See Chacon & Duong (2018), for example, for an
overview. When multivariate kernel density estimation is considered it
is usually in the constrained context with diagonal bandwidth matrices,
e.g. in the R packages sm and
KernSmooth. In contrast, the ks package
implements both diagonal and unconstrained data-driven bandwidth
matrices for kernel density estimation.
For our target density, we use the `dumbbell’ density, given by the normal mixture \[ \frac{4}{11} N \bigg( \begin{bmatrix}-2 \\ 2\end{bmatrix}, \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} \bigg)+ \frac{3}{11} N \bigg( \begin{bmatrix}0 \\ 0\end{bmatrix}, \begin{bmatrix}0.8 & -0.72 \\ -0.72 & 0.8\end{bmatrix} \bigg)+ \frac{4}{11} N \bigg( \begin{bmatrix}2 \\ -2\end{bmatrix}, \begin{bmatrix}1 & 0 \\ 0 & 1 \end{bmatrix} \bigg). \] This density is unimodal and we draw a sample of 2000 data points.
library(ks)
mus <- rbind(c(-2,2), c(0,0), c(2,-2))
Sigmas <- rbind(diag(2), matrix(c(0.8, -0.72, -0.72, 0.8), nrow=2), diag(2))
cwt <- 3/11
props <- c((1-cwt)/2, cwt, (1-cwt)/2)
plotmixt(mus=mus, Sigmas=Sigmas, props=props, display="filled.contour", xlim=c(-4,4), ylim=c(-4,4))set.seed(88192)
samp <- 2000
x <- rmvnorm.mixt(n=samp, mus=mus, Sigmas=Sigmas, props=props)
colnames(x) <- c("x","y")
plot(x, pch=16, cex=0.5, xlim=c(-4,4), ylim=c(-4,4))We use Hpi for unconstrained plug-in selectors and
Hpi.diag for diagonal plug-in selectors.
To compute a kernel density estimate, the command is
kde, which creates a kde class object
We use the plot method for kde objects to
display these kernel density estimates. The default is a contour plot
with the upper 25%, 50% and 75% contours of the (sample) highest density
regions. The diagonal bandwidth matrix constrains the smoothing to be
performed in directions parallel to the co-ordinate axes, so it is not
able to apply accurate levels of smoothing to the obliquely oriented
central portion. The result is a multimodal density estimate. The
unconstrained bandwidth matrix correctly produces a unimodal density
estimate.
The unconstrained SCV (Smoothed Cross Validation) selector is
Hscv and its diagonal version is Hscv.diag.
The most reasonable density estimate below is from the unconstrained SCV
selector.
Hscv1 <- Hscv(x=x)
Hscv2 <- Hscv.diag(x=x)
fhat.cv1 <- kde(x=x, H=Hscv1)
fhat.cv2 <- kde(x=x, H=Hscv2)
plot(fhat.cv1, main="SCV", display="filled.contour", xlim=c(-4,4), ylim=c(-4,4))The unconstrained bandwidth selectors will be better than their diagonal counterparts when the data have large mass oriented obliquely to the co-ordinate axes, like for the dumbbell data. The unconstrained plug-in and the SCV selectors can be viewed as generally recommended selectors.
Chacon, J. E. and Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications Chapman & Hall/CRC Press, Boca Raton.
Duong, T. 2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R Journal of Statistical Software 21 (7).
– Generated 09 May 2026.