% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/lm_sql.R
\name{lm_sql}
\alias{lm_sql}
\title{SQL-Backed Linear Regression}
\usage{
lm_sql(formula, data, tol = 1e-07)
}
\arguments{
\item{formula}{A formula object (e.g., \code{price ~ x + cut}).}

\item{data}{A \code{tbl_sql} object (from \pkg{dbplyr}).}

\item{tol}{Tolerance for detecting linear dependency.}
}
\value{
An S7 object of class \code{lm_sql_result}, or a tibble with a
  \code{model} list-column if the data is grouped.
}
\description{
Fits a linear regression model using SQL aggregation on a
  remote database table. The data never leaves the database; only
  sufficient statistics (sums and cross-products) are returned to R.
}
\details{
The function computes the \eqn{X^T X} matrix and the \eqn{X^T y} vector
  entirely inside the database engine via a single SQL aggregation query,
  then solves the normal equations in R using Cholesky decomposition
  (falling back to the Moore-Penrose pseudoinverse for rank-deficient
  designs).
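
  As a rough illustration of the solve step, the base R sketch below forms
  \eqn{X^T X} and \eqn{X^T y} for a small in-memory design and solves the
  normal equations through a Cholesky factor. It is not the package's
  internal code: the built-in \code{mtcars} data and the variable names are
  assumptions chosen for the example, and the cross-products are computed in
  R here rather than by a SQL query.
  \preformatted{
# Illustrative only: base R on an in-memory data frame, not lm_sql() internals.
X <- cbind("(Intercept)" = 1, wt = mtcars$wt, hp = mtcars$hp)
y <- mtcars$mpg

# Sufficient statistics: sums of products over rows, which is what a
# database can return from SUM() aggregates.
XtX <- crossprod(X)      # X^T X, a 3 x 3 matrix
Xty <- crossprod(X, y)   # X^T y, a 3 x 1 matrix

# Solve the normal equations via the upper Cholesky factor U (XtX = U^T U).
U <- chol(XtX)
z <- backsolve(U, Xty, transpose = TRUE)  # solves U^T z = X^T y
beta <- backsolve(U, z)                   # solves U beta = z

# Compare with lm(); the two should agree up to floating-point error.
all.equal(as.vector(beta), unname(coef(lm(mpg ~ wt + hp, data = mtcars))))
}
  Only the small \eqn{p \times p} and \eqn{p \times 1} summaries are needed
  for the solve, so the cost in R does not depend on the number of rows.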

  Supported formula features:
  \itemize{
    \item Numeric and categorical (character/factor) predictors with
      automatic dummy encoding via \code{CASE WHEN}.
    \item Interaction terms (\code{*} and \code{:}), including
      numeric × categorical and categorical × categorical cross-products.
    \item Dot expansion (\code{y ~ .}) to all non-response columns.
    \item Transforms: \code{I()}, \code{log()}, and \code{sqrt()} translated
      to SQL equivalents (\code{POWER}, \code{LN}, \code{SQRT}).
    \item Date and datetime predictors automatically cast to numeric in SQL.
    \item No-intercept models (\code{y ~ 0 + x}).
  }
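
  To give a feel for how a dummy-encoded predictor can be aggregated inside
  the database, the sketch below builds an indicator column with
  \pkg{dplyr} verbs on a lazy table and sums its cross-products. This is
  only an approximation of the idea, not necessarily the query that
  \code{lm_sql()} generates; the toy table, the column names, and the use of
  an in-memory SQLite database (which requires \pkg{RSQLite}) are
  assumptions made for the example.
  \preformatted{
library(dplyr)

# Hypothetical toy table copied into an in-memory SQLite database.
tbl <- dbplyr::memdb_frame(
  price = c(10, 12, 20, 22),
  cut   = c("Fair", "Fair", "Ideal", "Ideal")
)

q <- tbl |>
  # Indicator for one level of cut; dbplyr typically renders if_else()
  # as a CASE WHEN expression.
  mutate(cut_Ideal = if_else(cut == "Ideal", 1, 0)) |>
  # Sums and cross-products needed for a model with an intercept
  # and this single dummy variable.
  summarise(
    n    = n(),
    s_d  = sum(cut_Ideal, na.rm = TRUE),
    s_y  = sum(price, na.rm = TRUE),
    s_dy = sum(cut_Ideal * price, na.rm = TRUE)
  )

show_query(q)  # inspect the generated SQL
collect(q)     # one row of sufficient statistics is returned to R
}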

  For grouped data (via \code{\link[dplyr:group_by]{dplyr::group_by()}}), a
  single \code{GROUP BY} query is executed and one model per group is
  returned in a tibble with a \code{model} list-column.

  NA handling uses listwise deletion: rows with \code{NULL} in any model
  variable are excluded via a \code{WHERE ... IS NOT NULL} clause.
}
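\examples{
\dontrun{
# A minimal usage sketch, not taken from the package sources. It assumes
# the dplyr, dbplyr, and RSQLite packages are installed; the table and its
# values are made up for illustration.
library(dplyr)

diamonds_db <- dbplyr::memdb_frame(
  price = c(326, 334, 423, 554, 2757, 2759),
  x     = c(3.95, 4.20, 4.34, 4.07, 5.83, 6.15),
  cut   = c("Ideal", "Good", "Ideal", "Good", "Ideal", "Good")
)

# Ungrouped data: the documented return value is an lm_sql_result object.
fit <- lm_sql(price ~ x + cut, data = diamonds_db)

# Grouped data: the documented return value is a tibble with one model
# per group in the model list-column.
fits <- lm_sql(price ~ x, data = group_by(diamonds_db, cut))
}
}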
