| disstree {TraMineR} | R Documentation |
Analyse non-measurable objects described through a set of dissimilarities by recursively partitioning the population.
disstree(formula, data = NULL, minSize = 0.05, maxdepth = 5, R = 1000, pval = 0.01)
formula |
A formula whose left-hand side is a dissimilarity matrix and whose right-hand side lists the candidate variables used to partition the population |
data |
A data.frame in which the variables named in formula can be found |
minSize |
Minimum number of observations in a node; interpreted as a proportion of the population if less than 1. |
maxdepth |
Maximum depth of the tree |
R |
Number of permutations used to assess the significance of a partition. |
pval |
Maximum p-value for a split to be accepted, expressed as a proportion (the default 0.01 corresponds to 1 percent) |
At each step, the procedure partitions the population using the variable that explains the largest share of the pseudo variance. It then assesses the significance of the chosen variable with a permutation test.
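The selection step described above can be illustrated with a minimal base R sketch. It is not the TraMineR implementation: the helper names (diss_ss, pseudo_r2, perm_pvalue) and the toy data are assumptions made for illustration. The sketch measures the share of pseudo variance explained by a grouping as a pseudo R-squared and tests it by permuting the group labels.

```r
## Sum of squares from a dissimilarity matrix: SS = sum_{i<j} d_ij / n
## (hypothetical helper, not part of TraMineR)
diss_ss <- function(d) {
  n <- nrow(d)
  sum(d[upper.tri(d)]) / n
}

## Share of the total sum of squares explained by a grouping
pseudo_r2 <- function(d, group) {
  group <- as.factor(group)
  ss_total <- diss_ss(d)
  ss_within <- sum(sapply(levels(group), function(g) {
    idx <- which(group == g)
    diss_ss(d[idx, idx, drop = FALSE])
  }))
  1 - ss_within / ss_total
}

## Permutation p-value: how often does a random relabelling explain
## at least as much as the observed grouping?
perm_pvalue <- function(d, group, R = 1000) {
  observed <- pseudo_r2(d, group)
  permuted <- replicate(R, pseudo_r2(d, sample(group)))
  (sum(permuted >= observed) + 1) / (R + 1)
}

## Toy data (made up): two well separated groups on one dimension
set.seed(1)
x <- c(rnorm(20, mean = 0), rnorm(20, mean = 3))
d <- as.matrix(dist(x))^2            # squared Euclidean dissimilarities

## Two candidate predictors: one informative, one unrelated to x
candidates <- list(good  = rep(c("a", "b"), each = 20),
                   noise = rep(c("a", "b"), times = 20))

## Keep the candidate explaining the largest share, then test it
r2 <- sapply(candidates, function(g) pseudo_r2(d, g))
best <- names(which.max(r2))
p_best <- perm_pvalue(d, candidates[[best]], R = 199)
print(r2)
print(c(best = best, p.value = p_best))
```

With squared Euclidean dissimilarities, this pseudo R-squared coincides with the usual ANOVA R-squared, which gives a convenient sanity check. For a fixed sample size and number of groups, a pseudo F statistic is a monotone function of the pseudo R-squared, so permuting either statistic gives the same p-value.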
Returns an object of class disstree, a list with the following components:
node |
A tree object (see below) |
adjustement |
Global adjustment of the tree |
A tree (node) object is itself a list with the following components:
split |
Chosen predictor, NULL for terminal nodes |
vardis |
Node pseudo variance, see dissvar |
children |
Child node, NULL for terminal nodes |
ind |
Indices of the individuals in this node |
depth |
Depth of the node, counted from the root node |
label |
Label of this node |
R2 |
R squared of the split, NULL for terminal nodes |
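Because each node carries its own children, the tree can be walked recursively. The sketch below builds a small mock tree by hand using only the component names listed above, then recovers the terminal node of each individual. The helpers make_node and leaf_membership, the labels, and all values are hypothetical, and the sketch assumes that children is a list of node objects. On a fitted tree, disstreeleaf (used in the examples) is the function to rely on.

```r
## Hypothetical constructor for a node with the components listed above
make_node <- function(label, ind, depth, split = NULL, children = NULL,
                      vardis = NA_real_, R2 = NULL) {
  list(split = split, vardis = vardis, children = children, ind = ind,
       depth = depth, label = label, R2 = R2)
}

## Mock tree with made-up values: the root is split on "male",
## and one of its children is split again on "Grammar"
toy_tree <- make_node("root", ind = 1:6, depth = 1, split = "male", R2 = 0.2,
  children = list(
    make_node("male = no", ind = c(1, 3, 5), depth = 2),
    make_node("male = yes", ind = c(2, 4, 6), depth = 2,
              split = "Grammar", R2 = 0.1,
              children = list(
                make_node("Grammar = no", ind = c(2, 4), depth = 3),
                make_node("Grammar = yes", ind = 6, depth = 3)))))

## Walk the tree and record, for each individual, the label of its leaf
leaf_membership <- function(node, n) {
  membership <- rep(NA_character_, n)
  visit <- function(nd) {
    if (is.null(nd$children)) {
      membership[nd$ind] <<- nd$label   # terminal node: assign its label
    } else {
      for (child in nd$children) visit(child)
    }
  }
  visit(node)
  membership
}

leaves <- leaf_membership(toy_tree, n = 6)
print(table(leaves))
```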
Studer, M., G. Ritschard, A. Gabadinho and N. S. Müller (2009). Analyse de dissimilarités par arbre d'induction. Revue des Nouvelles Technologies de l'Information, EGC'2009.
Batagelj, V. (1988). Generalized Ward and related clustering problems. In H. Bock (Ed.), Classification and related methods of data analysis, pp. 67-74. North-Holland, Amsterdam.
Anderson, M. J. (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecology 26, 32-46.
Piccarreta, R. and F. C. Billari (2007). Clustering work and family trajectories by using a divisive algorithm. Journal of the Royal Statistical Society A 170(4), 1061-1078.
dissvar to compute pseudo variance using dissimilarities and for a basic introduction to concepts of pseudo variance analysis
dissassoc to test association between dissimilarity and another variable
dissreg to analyse dissimilarities in a way close to linear regression
disscenter to compute the distance of each object to the center of its group using dissimilarities
library(TraMineR)
data(mvad)
## Defining a state sequence object
mvad.seq <- seqdef(mvad[, 17:86])
## Building dissimilarities
mvad.lcs <- seqdist(mvad.seq, method="LCS")
dt <- disstree(mvad.lcs ~ male + Grammar + funemp + gcse5eq + fmpr + livboth,
  data=mvad, R = 10000)
print(dt)
## Compute quality of the tree
print(dissassoc(mvad.lcs, disstreeleaf(dt), R=1))
## Using simplified interface to generate a file for GraphViz
seqtree2dot(dt, "mvadseqtree", seqs=mvad.seq, plottype="seqdplot",
  border=NA, withlegend=FALSE)
## Generating a file for GraphViz
disstree2dot(dt, "mvadtree", imagefunc=seqdplot, imagedata=mvad.seq,
  ## Additional parameters passed to seqdplot
  withlegend=FALSE, axes=FALSE)
## Second method, using a specific function
myplotfunction <- function(individuals, seqs, ...) {
  par(font.sub=2, mar=c(3,0,6,0), mgp=c(0,0,0))
  ## Order the sequences in seqiplot by their first MDS coordinate
  mds <- cmdscale(seqdist(seqs[individuals,], method="LCS"), k=1)
  seqiplot(seqs[individuals,], sortv=mds, ...)
}
## Generating a file for GraphViz
## If imagedata is not set, the indices of the individuals are passed to imagefunc
disstree2dot(dt, "mvadtree", imagefunc=myplotfunction, title.cex=3,
  ## Additional parameters passed to myplotfunction
  seqs=mvad.seq,
  ## Additional parameters passed to seqiplot (through myplotfunction)
  withlegend=FALSE, axes=FALSE, tlim=0, space=0, ylab="", border=NA)
## To run GraphViz (dot) from R
## shell("dot -Tsvg -O mvadtree.dot")