ks-package {ks} | R Documentation |
Kernel smoothing for 1- to 6-dimensional data.
There are three main types of functions in this package:
computing kernel estimators - these function names begin with ‘k’
computing bandwidth selectors - these begin with ‘h’ (1-d) or ‘H’ (>1-d)
displaying kernel estimators - these begin with ‘plot’.
The kernel used throughout is the normal (Gaussian) kernel K. For 1-d data, the bandwidth h is the standard deviation of the normal kernel, whereas for multivariate data, the bandwidth matrix H is the variance matrix.
–For kernel density estimation, kde computes
hat(f)(x) = n^(-1) sum_i K_H (x - X_i).
The bandwidth matrix H is a matrix of smoothing parameters and its choice is crucial for the performance of kernel estimators. For display, its plot method calls plot.kde.
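The estimator above can be sketched in base R for the 1-d case, where K_h is the N(0, h^2) density (so H = h^2). This is only an illustration of the formula, not the package's implementation; kde_sketch, the simulated data and the bandwidth value are assumptions made for the example.

```r
# Minimal base-R sketch of hat(f)(x) = n^(-1) sum_i K_h(x - X_i) in 1-d.
# kde_sketch is a hypothetical helper, not a function from ks.
kde_sketch <- function(eval_points, X, h) {
  # h is the standard deviation of the normal kernel
  sapply(eval_points, function(x) mean(dnorm(x - X, mean = 0, sd = h)))
}

set.seed(1)
X <- rnorm(200)                        # simulated sample (assumption)
h <- 0.4                               # illustrative bandwidth, not a selected one
grid <- seq(-4, 4, length.out = 201)   # evaluation points
fhat <- kde_sketch(grid, X, h)
```

With ks installed, kde(x = X, h = h) is the package's own estimator; its values need not match this sketch exactly, since the package may use a binned approximation.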
–For kernel density estimators, there are several varieties of bandwidth selectors:
least squares (or unbiased) cross validation (LSCV or UCV): hlscv (1-d); Hlscv, Hlscv.diag (2- to 6-d)
smoothed cross validation (SCV): hscv (1-d); Hscv, Hscv.diag (2- to 6-d)
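To make the idea of a cross validation selector concrete, the following base-R sketch minimises the textbook 1-d LSCV criterion for a normal kernel. It is an assumption-laden illustration: lscv_sketch, the search interval and the simulated data are invented for the example, and hlscv may differ in its details.

```r
# Textbook 1-d LSCV criterion for the normal kernel:
#   LSCV(h) = n^(-2) sum_i sum_j phi_{sqrt(2) h}(X_i - X_j)
#             - 2 [n (n - 1)]^(-1) sum_{i != j} phi_h(X_i - X_j)
# lscv_sketch is a hypothetical helper, not a function from ks.
lscv_sketch <- function(h, X) {
  n <- length(X)
  d <- outer(X, X, "-")                      # all pairwise differences
  term1 <- sum(dnorm(d, sd = sqrt(2) * h)) / n^2
  off <- dnorm(d, sd = h)
  diag(off) <- 0                             # leave-one-out: drop i == j
  term2 <- 2 * sum(off) / (n * (n - 1))
  term1 - term2
}

set.seed(1)
X <- rnorm(200)                              # simulated sample (assumption)
opt <- optimize(lscv_sketch, interval = c(0.05, 2), X = X)
h_lscv <- opt$minimum                        # candidate bandwidth
```

The criterion can have several local minima, so the result depends on the search interval; this is one reason to prefer the package's selectors in practice.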
–For kernel density derivative estimation, the main function is kdde, which computes
hat(f)^(r)(x) = n^(-1) sum_i D^r K_H (x - X_i).
The bandwidth selectors are a modified subset of those for kde, i.e. Hlscv, Hns, Hpi, Hscv with deriv.order>0. Its plot method is plot.kdde, which plots each partial derivative singly.
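For r = 1 in 1-d, the derivative of the normal kernel is D K_h(u) = -(u / h^2) K_h(u), so the estimator can be sketched in base R as follows. The helper names, data and bandwidth are assumptions made for illustration; this is not the code used by kdde.

```r
# Base-R sketch of the first derivative estimator in 1-d:
#   hat(f)^(1)(x) = n^(-1) sum_i D K_h(x - X_i),  D K_h(u) = -(u / h^2) K_h(u)
# kde_sketch and kdde1_sketch are hypothetical helpers, not functions from ks.
kde_sketch <- function(x, X, h) {
  sapply(x, function(x0) mean(dnorm(x0 - X, sd = h)))
}
kdde1_sketch <- function(x, X, h) {
  sapply(x, function(x0) {
    u <- x0 - X
    mean(-u / h^2 * dnorm(u, sd = h))
  })
}

set.seed(1)
X <- rnorm(200)                       # simulated sample (assumption)
h <- 0.5                              # illustrative bandwidth
grid <- seq(-3, 3, length.out = 61)
dfhat <- kdde1_sketch(grid, X, h)     # estimated first derivative on the grid
```

A quick sanity check is to compare dfhat with a central finite difference of kde_sketch on the same grid.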
–For kernel discriminant analysis, the main function is kda, which computes density estimates for each of the groups in the training data, and the discriminant surface. Its plot method is plot.kda. The wrapper functions hkda, Hkda compute bandwidths for each group in the training data for kde, e.g. hpi, Hpi.
–For kernel functional estimation, the main function is kfe, which computes the r-th order integrated density functional
hat(psi)_r = n^(-2) sum_i sum_j D^r K_H (X_i - X_j).
The plug-in selectors are hpi.kfe (1-d), Hpi.kfe (2- to 6-d). Kernel functional estimates are usually not required to be computed directly by the user, but only within other functions in the package.
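For r = 0 in 1-d the double sum reduces to an average of normal densities over all pairwise differences. The sketch below illustrates this with base R; kfe0_sketch, the pilot bandwidth g and the simulated data are assumptions, and the package's own estimator may differ in details.

```r
# Base-R sketch of the 0-th order functional estimate in 1-d:
#   hat(psi)_0 = n^(-2) sum_i sum_j K_g(X_i - X_j)
# kfe0_sketch is a hypothetical helper, not a function from ks.
kfe0_sketch <- function(X, g) {
  d <- outer(X, X, "-")        # n x n matrix of pairwise differences
  mean(dnorm(d, sd = g))       # mean over n^2 entries = n^(-2) * double sum
}

set.seed(1)
X <- rnorm(500)                # simulated standard normal sample (assumption)
psi0 <- kfe0_sketch(X, g = 0.3)

# For a standard normal density, psi_0 = integral of f^2 = 1 / (2 sqrt(pi)),
# which gives a reference value for this simulated example.
psi0_true <- 1 / (2 * sqrt(pi))
```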
–For kernel-based 2-sample testing, the main function is kde.test, which computes the integrated L2 distance between the two density estimates as the test statistic, comprising a linear combination of 0-th order kernel functional estimates:
hat(T) = hat(psi)_0,1 + hat(psi)_0,2 - (hat(psi)_0,12 + hat(psi)_0,21),
and the corresponding p-value. The hat(psi) are zero order kernel functional estimates with the subscripts indicating that 1 = sample 1 only, 2 = sample 2 only, and 12, 21 = samples 1 and 2. The bandwidth selectors are hpi.kfe, Hpi.kfe with deriv.order=0.
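The structure of the test statistic can be illustrated in base R for 1-d data. This sketch makes two simplifying assumptions that are not taken from the package: a single common bandwidth g is used for all four terms, and the cross terms are averaged over all n_1 n_2 pairs. It computes the statistic only, not the p-value; all helper names are hypothetical.

```r
# Base-R sketch of hat(T) = psi_0,1 + psi_0,2 - (psi_0,12 + psi_0,21) in 1-d.
# psi0_pair and l2_stat_sketch are hypothetical helpers, not functions from ks.
psi0_pair <- function(A, B, g) {
  mean(dnorm(outer(A, B, "-"), sd = g))   # average of K_g over all pairs
}
l2_stat_sketch <- function(X1, X2, g) {
  psi0_pair(X1, X1, g) + psi0_pair(X2, X2, g) -
    (psi0_pair(X1, X2, g) + psi0_pair(X2, X1, g))
}

set.seed(1)
X1 <- rnorm(150)                  # sample 1
X2 <- rnorm(150)                  # sample 2, same distribution as sample 1
X3 <- rnorm(150, mean = 1.5)      # shifted sample
T_same <- l2_stat_sketch(X1, X2, g = 0.4)
T_diff <- l2_stat_sketch(X1, X3, g = 0.4)
```

With a common bandwidth the statistic equals the integrated squared difference of two kernel density estimates, so it is non-negative and grows as the samples move apart.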
–For kernel-based local 2-sample testing, the main function is kde.local.test, which computes the squared distance between the two density estimates as the test statistic
hat(U)(x) = [hat(f)_1(x) - hat(f)_2(x)]^2
and the corresponding local p-values. The bandwidth selectors are those used with kde, e.g. hpi, Hpi.
–For kernel cumulative distribution function estimation, the main function is kcde, which computes
hat(F)(x) = n^(-1) sum_i intK_H (x - X_i)
where intK is the integrated kernel. The bandwidth selectors are hpi.kcde, Hpi.kcde. Its plot method is plot.kcde. There exist analogous functions for the survival function hat(bar(F)).
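In 1-d with a normal kernel, the integrated kernel is the normal distribution function, so the estimator can be sketched in base R as below. kcde_sketch, the bandwidth and the simulated data are assumptions for illustration, and the survival estimate is taken here as 1 - hat(F).

```r
# Base-R sketch of hat(F)(x) = n^(-1) sum_i intK_h(x - X_i) in 1-d,
# where intK_h is the N(0, h^2) distribution function.
# kcde_sketch is a hypothetical helper, not a function from ks.
kcde_sketch <- function(x, X, h) {
  sapply(x, function(x0) mean(pnorm(x0 - X, mean = 0, sd = h)))
}

set.seed(1)
X <- rnorm(200)                        # simulated sample (assumption)
h <- 0.3                               # illustrative bandwidth
grid <- seq(-4, 4, length.out = 81)
Fhat <- kcde_sketch(grid, X, h)        # smooth, non-decreasing CDF estimate
Shat <- 1 - Fhat                       # corresponding survival function estimate
```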
–For kernel estimation of a ROC (receiver operating characteristic) curve to compare two samples from hat(F)_1, hat(F)_2, the main function is kroc, which computes
(hat(F)_hat(Y1)(z), hat(F)_hat(Y2)(z))
based on the cumulative distribution functions of hat(Yj) = hat(bar(F))_1(X_j), j=1,2. The bandwidth selectors are those used with kcde, e.g. hpi.kcde, Hpi.kcde for hat(F)_hat(Yj), hat(bar(F))_1. Its plot method is plot.kroc.
–For kernel estimation of a copula, the main function is kcopula, which computes
hat(C)(z) = hat(F)(hat(F)_1^(-1)(z_1),..., hat(F)_d^(-1)(z_d))
where hat(F)_j^(-1)(z_j) is the z_j-th quantile of the j-th marginal distribution hat(F)_j. The bandwidth selectors are those used with kcde for hat(F), hat(F)_j. Its plot method is plot.kcde.
–For kernel estimation of a copula density, the main function is kcopula.de, which computes
hat(c)(z) = hat(f)(z) = n^(-1) sum_i K_H (z - hat(Z)_i)
where hat(Z)_i = (hat(F)_1(X_i1), …, hat(F)_d(X_id)). The bandwidth selectors are those used with kde for hat(c) and kcde for hat(F)_j. Its plot method is plot.kde.
–Binned kernel estimation is available for d = 1, 2, 3, 4. This makes kernel estimators feasible for large samples.
–For an overview of this package with 2-d density estimation, see vignette("kde").
Tarn Duong for most of the package. M.P. Wand for the binned estimation, univariate plug-in selector and univariate density derivative estimator code. Jose E. Chacon for the unconstrained pilot functional estimation and fast implementation of derivative-based estimation code.
Bowman, A. & Azzalini, A. (1997) Applied Smoothing Techniques for Data Analysis. Oxford University Press, Oxford.
Duong, T. (2004) Bandwidth Matrices for Multivariate Kernel Density Estimation. Ph.D. Thesis, University of Western Australia.
Scott, D.W. (1992) Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, New York.
Silverman, B. (1986) Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, London.
Simonoff, J. S. (1996) Smoothing Methods in Statistics. Springer-Verlag, New York.
Wand, M.P. & Jones, M.C. (1995) Kernel Smoothing. Chapman & Hall/CRC, London.
sm, KernSmooth