---
title: "Manual Symbolic Regression: Testing Hypotheses"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Manual Symbolic Regression: Testing Hypotheses}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# Introduction

In addition to automated symbolic regression, `leaf` lets users define their own candidate equations with the `"manual"` engine. This makes it possible to test specific hypotheses and build in prior knowledge directly, while still using `leaf`'s tools for parameter fitting, evaluation, and multi-view modeling.

# Installation

Before using `leafr`, make sure the Python backend is installed:

``` r
leafr::install_leafr()
```

# Load the package

``` r
library(leaf)

if (!backend_available()) {
  message("Install backend with leaf::install_leaf()")
}
```

# Define the formula and custom equations

User-defined equations are specified as character strings. They can include:

- `x1`, `x2`, ... referring to the inputs listed in the formula, by position
- variable names, matching column names in the data frame (a variable used by name in an equation must also appear in the formula)
- `u1`, `u2`, ... for group-specific parameters
- `c1`, `c2`, ... for global parameters

``` r
model_formula <- "y ~ f(log(A), T, T**2, A | Archipelago, species)"

eqs <- c(
  "T**2*(u1 + u2*log(A) + u3*T)",
  "x3*(u1 + u2*x1 + u3*x2)",       # same as above, written with positional inputs
  "exp(u1 + u2*log(T) + u3*A*x2)"  # mixes named and positional inputs; A is listed in the formula
)
```

# Define the manual search

``` r
regressor <- SymbolicRegressor$new(
  engine = "manual",
  loss = "PoissonDeviance",
  equation_list = eqs
)
```

# Load the data

``` r
train_data <- leaf_data("GMD")
head(train_data)
```

# Register equations

Even in manual mode, `search_equations()` is used to register and preprocess the equations; no search is performed.

``` r
regressor$search_equations(
  data = train_data,
  formula = model_formula
)
```
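Because the positional and named forms are meant to be interchangeable, it can help to sanity-check the candidate equations before handing them to `leaf`. The block below is a minimal base-R sketch, not `leaf`'s own parser (equations are processed by the Python backend): it sorts each equation's symbols using the naming rules listed earlier, then evaluates the first two equations for made-up input and parameter values to confirm they agree. The helpers `classify_symbols()` and `eval_eq()` and all numeric values are hypothetical. It relies on R's parser reading `**` as `^`, so the equation strings parse as R expressions.

``` r
# Candidate equations from this vignette (repeated so the block runs on its own)
eqs <- c(
  "T**2*(u1 + u2*log(A) + u3*T)",
  "x3*(u1 + u2*x1 + u3*x2)",
  "exp(u1 + u2*log(T) + u3*A*x2)"
)

# Hypothetical helper: sort the symbols of one equation string by naming convention
classify_symbols <- function(eq) {
  vars <- all.vars(parse(text = eq)[[1]])
  list(
    inputs  = grep("^x[0-9]+$", vars, value = TRUE),  # positional inputs
    group   = grep("^u[0-9]+$", vars, value = TRUE),  # group-specific parameters
    global  = grep("^c[0-9]+$", vars, value = TRUE),  # global parameters
    columns = grep("^[xuc][0-9]+$", vars, value = TRUE, invert = TRUE)  # named variables
  )
}
symbols <- lapply(eqs, classify_symbols)

# Hypothetical helper: evaluate an equation string with values supplied in a list
eval_eq <- function(eq, values) eval(parse(text = eq)[[1]], envir = values)

# Made-up values for a single observation; inside the list, `T` shadows R's TRUE
A_val <- 120
T_val <- 3.5
vals <- list(
  A = A_val, T = T_val,
  # positional inputs in formula order: log(A), T, T**2, A
  x1 = log(A_val), x2 = T_val, x3 = T_val^2, x4 = A_val,
  u1 = 0.2, u2 = 0.5, u3 = -0.1  # made-up parameter values
)

same <- isTRUE(all.equal(eval_eq(eqs[1], vals), eval_eq(eqs[2], vals)))
same
symbols[[3]]
```

For the third equation, `symbols[[3]]` lists `T` and `A` as named variables and `x2` as a positional input, which is why `A` has to be listed in the formula as well.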

# Fit parameters and inspect results

``` r
# Only one equation gets a finite loss: the Poisson deviance needs positive
# predictions, which presumably only the exp() form guarantees here
fit_results <- regressor$fit(data = train_data)

pareto_front <- regressor$evaluate(metrics = c("RMSE", "PseudoR2"))
head(pareto_front)
```
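The exact definitions of `"PoissonDeviance"`, `"RMSE"`, and `"PseudoR2"` live in `leaf`'s Python backend and are not given in this vignette. As a hedged base-R sketch, assuming the standard Poisson deviance $D = 2\sum_i \left[y_i \log(y_i/\mu_i) - (y_i - \mu_i)\right]$ (with $y \log(y/\mu)$ taken as 0 when $y = 0$), the block shows why non-positive predictions break the loss: the log term is undefined for $\mu \le 0$. The polynomial equations above can predict negative values, while the `exp()` form cannot. The pseudo-R² shown is one common deviance-based definition and may differ from `leaf`'s; `poisson_deviance()`, `pseudo_r2()`, `rmse()`, and the data are hypothetical.

``` r
# Standard Poisson deviance (assumed form; leaf may scale it differently, e.g. a mean)
poisson_deviance <- function(y, mu) {
  # log(y / mu) is undefined for non-positive means; this sketch returns Inf
  if (any(mu <= 0)) return(Inf)
  # y * log(y / mu) is taken as 0 when y == 0
  term <- ifelse(y > 0, y * log(y / mu), 0)
  2 * sum(term - (y - mu))
}

# One common deviance-based pseudo-R^2: 1 - D(model) / D(intercept-only model)
pseudo_r2 <- function(y, mu) {
  1 - poisson_deviance(y, mu) / poisson_deviance(y, rep(mean(y), length(y)))
}

# Root mean squared error on the response scale
rmse <- function(y, mu) sqrt(mean((y - mu)^2))

y <- c(2, 0, 5, 3)                    # made-up counts
mu_exp <- exp(c(0.7, -1, 1.6, 1.1))   # exp() form: always positive
mu_poly <- c(2.1, -0.4, 4.8, 3.2)     # polynomial form: can dip below zero

poisson_deviance(y, mu_exp)   # finite
poisson_deviance(y, mu_poly)  # Inf
pseudo_r2(y, mu_exp)
rmse(y, mu_exp)
```

The deviance sketch matches the unit deviances of R's `stats::poisson()` family, which is a quick way to check the formula independently of `leaf`.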