---
title: "funcml"
output: rmarkdown::pdf_document
vignette: >
  %\VignetteIndexEntry{funcml}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(funcml)
```

`funcml` provides a machine learning framework for R. The package includes native interpretability helpers for permutation importance, partial dependence, ICE, ALE, local surrogate explanations, SHAP, interaction strength, greedy breakdown profiles, and global surrogate models. It also exposes native ensemble learners through `model = "stacking"` and `model = "superlearner"`.

The package is intentionally opinionated. It is not designed to compete feature for feature with broader frameworks such as `tidymodels`, `mlr3`, or `caret`. Instead, `funcml` focuses on a compact and explicit framework for users who want to fit, compare, tune, interpret, and evaluate models through one coherent interface, with formula syntax as the main model-specification surface.

This design comes with deliberate tradeoffs:

- preprocessing is expected to happen outside `funcml`, so the data passed to `fit()` is the exact data being modeled
- the package favors explicit inputs and direct model specification over hidden preprocessing state inside training pipelines
- cross-validation is still the default, but the package also supports holdout splits, grouped CV, and time-aware rolling evaluation through the same resampling interface

Users who need broader preprocessing orchestration or specialized resampling designs should use a more complete framework.
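Because preprocessing lives outside `funcml`, any transformation is applied to the data frame before it reaches `fit()`. A minimal sketch of this workflow, using only base R's `scale()` for the preprocessing step (the learner choice here is just for illustration):

```{r, eval = FALSE}
# Standardize the predictors with base R before modeling;
# fit() then sees exactly this transformed data frame.
scaled <- mtcars
scaled[c("wt", "hp")] <- scale(scaled[c("wt", "hp")])

fit(mpg ~ wt + hp, data = scaled, model = "ranger")
```

Keeping the transformation explicit means the same `scaled` object can be inspected, reused across learners, or passed to `evaluate()` and `tune()` without any hidden state.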
Learner coverage currently includes:

- regression and classification: `glm`, `rpart`, `glmnet`, `ranger`, `nnet`, `e1071_svm`, `randomForest`, `gbm`, `kknn`, `ctree`, `cforest`, `lightgbm`, `xgboost`, `stacking`, `superlearner`
- regression and binary classification: `earth`, `gam`, `bart`
- classification only: `C50`, `naivebayes`, `fda`, `lda`, `qda`
- binary classification only: `adaboost`
- regression only: `pls`

```{r}
fit_obj <- fit(mpg ~ wt + hp, data = mtcars, model = "ranger")

permute_obj <- interpret(fit_obj, mtcars, method = "permute", nsim = 5)
pdp_obj <- interpret(fit_obj, mtcars, method = "pdp", features = "wt")
ale_obj <- interpret(fit_obj, mtcars, method = "ale", features = "wt")
local_obj <- interpret(fit_obj, mtcars, method = "local_model",
                       newdata = mtcars[1, , drop = FALSE], k = 2)
shap_obj <- interpret(fit_obj, mtcars, method = "shap",
                      newdata = mtcars[1, , drop = FALSE], nsim = 20)
profile_obj <- interpret(fit_obj, mtcars, method = "profile",
                         newdata = mtcars[1, , drop = FALSE])
surrogate_obj <- interpret(fit_obj, mtcars, method = "surrogate")
```

```{r}
eval_obj <- evaluate(mpg ~ wt + hp, data = mtcars, model = "glm",
                     resampling = cv(5))
eval_obj
```

```{r}
tune_grid <- expand.grid(intercept = c(TRUE, FALSE))
tune_obj <- tune(mpg ~ wt + hp, data = mtcars, model = "glm",
                 grid = tune_grid, search = "random", n_evals = 1,
                 resampling = cv(v = 3, seed = 1), seed = 1)
tune_obj
```

Nested CV is available through `outer_resampling`. The inner `resampling` argument still selects the best configuration, and the outer resampling loop provides an unbiased estimate of tuned model-selection performance.
```{r}
nested_tune_obj <- tune(
  mpg ~ wt + hp,
  data = mtcars,
  model = "glm",
  grid = tune_grid,
  resampling = cv(v = 3, seed = 1),
  outer_resampling = cv(v = 4, seed = 2),
  metric = "rmse",
  seed = 1
)
nested_tune_obj
```

```{r, eval = FALSE}
interaction_obj <- interpret(fit_obj, mtcars, method = "interaction")
plot(interaction_obj)
```

```{r}
learners()
```

```{r}
plot(permute_obj)
plot(pdp_obj)
plot(ale_obj)
plot(local_obj)
plot(shap_obj, kind = "waterfall")
plot(profile_obj)
plot(surrogate_obj)
```