The earlier vignettes showed how to define features and run the
database pipeline with fd_run(). In practice, you usually
do not want to open RStudio and run that command manually every day. You
want a script that can be executed by the operating system.
This tutorial shows how to set up that script and schedule it:
- a standalone R script that calls fd_run();
- running that script from the command line with Rscript;
- registering the script with Windows Task Scheduler or cron.

This vignette is written as an instruction rather than as executed package code. The database connection uses imaginary PostgreSQL credentials. You should adapt the connection details, SQL query, feature definitions, and table names to your own project.
The example is not meant to be copied without changes. It assumes a specific database, a specific raw table, and a specific set of columns. Your own script will differ in those places.
This tutorial assumes that the previous vignettes have already
explained what fd_run() does. Here we focus only on
scheduling.
You need:

- an R installation (and its package library) that the scheduler's user account can use;
- the featdelta package installed in that R library;
- a DBI driver such as RPostgres or RMariaDB.

Operating-system schedulers are normally minute-based. Windows Task Scheduler and cron are good for running jobs every minute, hourly, daily, or on a business schedule. They are not meant for second-by-second demonstrations, so this tutorial uses minute-level schedules.
Create a small folder for the scheduled pipeline. For example:
featdelta_sensor_pipeline/
  scripts/
    run_featdelta.R
  logs/
On Windows, this could be:
C:/featdelta_sensor_pipeline/
On Linux or macOS, this could be:
/home/analyst/featdelta_sensor_pipeline/
The important point is that the scheduled script should use stable, absolute paths. A scheduled job may not start in the same working directory as your interactive R session.
Create this file:
C:/featdelta_sensor_pipeline/scripts/run_featdelta.R
On Linux or macOS, use the equivalent path in your project folder, for example:
/home/analyst/featdelta_sensor_pipeline/scripts/run_featdelta.R
Save the following script as run_featdelta.R.
Before looking at the script, here is the scenario it assumes:
- a PostgreSQL database named analytics;
- credentials stored in the environment variables ANALYTICS_DB_USER and ANALYTICS_DB_PASSWORD;
- a raw table raw_schema.device_readings;
- a unique key column reading_id;
- raw columns temperature, vibration, pressure, and runtime_hours;
- a feature table feature_schema.reading_features.

For this example, the features are:

- temp_above_80: whether the reading temperature is above 80;
- vibration_score: a simple interaction between vibration and runtime;
- pressure_high: whether pressure is at or above a chosen threshold;
- maintenance_flag: whether the reading should be flagged for follow-up.

In your own project, replace these with the transformations that make sense for your raw table.
library(DBI)
library(RPostgres)
library(featdelta)
# Use an absolute project directory so the script works from a scheduler.
project_dir <- "C:/featdelta_sensor_pipeline"
log_path <- file.path(project_dir, "logs", "featdelta-run.log")
log_line <- function(...) {
cat(
format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
" | ",
paste0(..., collapse = ""),  # paste0() joins the pieces without spaces
"\n",
file = log_path,
append = TRUE,
sep = ""
)
}
con <- dbConnect(
RPostgres::Postgres(),
# Replace these connection settings with your own database details.
host = "database.company.local",
port = 5432,
dbname = "analytics",
user = Sys.getenv("ANALYTICS_DB_USER"),
password = Sys.getenv("ANALYTICS_DB_PASSWORD")
)
# Note: on.exit() only takes effect inside a function, so it cannot be
# used to disconnect at the top level of a script. This script closes
# the connection explicitly at the end instead.
# Replace this query with the raw dataset you want to process.
# The query must return the key column and every raw column used below.
source_sql <- "
SELECT
reading_id,
device_id,
temperature,
vibration,
pressure,
runtime_hours
FROM raw_schema.device_readings
WHERE reading_status = 'ready'
ORDER BY reading_id
"
# Replace these definitions with your own feature logic.
defs <- fd_define(
temp_above_80 = temperature > 80,
vibration_score = vibration * runtime_hours,
pressure_high = pressure >= 38,
maintenance_flag = temp_above_80 | vibration_score > 80 | pressure_high
)
report <- fd_run(
con = con,
sql = source_sql,
defs = defs,
# Replace the key and feature table with your project-specific values.
key = "reading_id",
feat_table_name = "feature_schema.reading_features",
fail_fast = FALSE,
verbose = FALSE
)
would_insert <- if (is.null(report$upsert)) {
NA_integer_
} else {
report$upsert$counts$would_insert
}
would_update <- if (is.null(report$upsert)) {
NA_integer_
} else {
report$upsert$counts$would_update
}
log_line(
"success=", report$success,
", stage=", report$stage,
", fetched=", report$fetch$n_rows,
", inserts=", would_insert,
", updates=", would_update
)
# Close the connection explicitly (see the note above about on.exit()).
dbDisconnect(con)

if (!isTRUE(report$success)) {
  log_line("error=", report$error$message)
  stop(report$error$message)
}

This script does four things:

- connects to the database;
- defines the source query and the feature logic;
- runs fd_run() and writes a compact log line;
- stops with an error when the run failed, so the scheduler sees a failed run.

The parts you normally change are:
- project_dir, so logs are written to your project folder;
- dbConnect(), so the script reaches your database;
- source_sql, so the script selects your raw data;
- defs, so the script computes your features;
- key, so featdelta knows the unique row identifier;
- feat_table_name, so features are written to the correct database table.

For MySQL or MariaDB, the connection part would look similar, but with RMariaDB:
library(DBI)
library(RMariaDB)
con <- dbConnect(
RMariaDB::MariaDB(),
host = "database.company.local",
port = 3306,
dbname = "analytics",
user = Sys.getenv("ANALYTICS_DB_USER"),
password = Sys.getenv("ANALYTICS_DB_PASSWORD")
)

If automatic dialect detection is not sufficient in your environment, pass the dialect explicitly:
report <- fd_run(
con = con,
sql = source_sql,
defs = defs,
key = "reading_id",
feat_table_name = "feature_schema.reading_features",
dialect = "postgres",
fail_fast = FALSE,
verbose = FALSE
)

For MySQL or MariaDB, use dialect = "mysql".
The example reads credentials from the environment variables ANALYTICS_DB_USER and ANALYTICS_DB_PASSWORD, fetched with Sys.getenv() inside the script.
This is safer than writing passwords directly into the R file. The exact way to set environment variables depends on your operating system and security setup.
For local testing, you can set them temporarily inside an R session:
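Sys.setenv(
  ANALYTICS_DB_USER = "analyst",          # placeholder, not a real account
  ANALYTICS_DB_PASSWORD = "change-me"     # placeholder, not a real password
)

The values above are placeholders; substitute the credentials your database administrator gave you.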
For production, use your organization’s preferred method, such as user-level environment variables, a secrets manager, or a protected configuration file.
Before scheduling anything, run the script manually from a terminal.
On Windows PowerShell:
& "C:\Program Files\R\R-4.5.2\bin\Rscript.exe" "C:\featdelta_sensor_pipeline\scripts\run_featdelta.R"On Linux or macOS:
Then inspect the log file:
C:/featdelta_sensor_pipeline/logs/featdelta-run.log
A successful line might look like this:
2026-05-05 09:00:04 | success=TRUE, stage=complete, fetched=12, inserts=12, updates=0
If the script fails manually, fix that first. A scheduler will not make an unreliable script more reliable. The script must be able to run from a plain terminal command before you schedule it.
On Windows, you can use the taskscheduleR package to
create a Windows Task Scheduler job from R.
Install it if needed:
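install.packages("taskscheduleR")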
Then run this once from an interactive R session:
library(taskscheduleR)
taskscheduler_create(
taskname = "featdelta_sensor_pipeline",
rscript = "C:/featdelta_sensor_pipeline/scripts/run_featdelta.R",
schedule = "MINUTE",
starttime = format(Sys.time() + 70, "%H:%M"),
modifier = 1
)

This creates a task that runs the script every minute. That is a useful testing frequency because you can watch the log file and database table change shortly after creating the task.
In this demonstration, “every minute” is only a testing schedule. A
real production schedule should match the arrival pattern of your raw
data. For example, if new sensor readings are loaded every hour,
schedule the job after that hourly load. If a warehouse table is
refreshed once per night, schedule fd_run() after the
nightly refresh.
To list scheduled tasks created through
taskscheduleR:
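taskscheduler_ls()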
To remove the test task:
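taskscheduler_delete(taskname = "featdelta_sensor_pipeline")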
For production, use a less aggressive schedule, such as hourly, daily, or after the expected raw-data load has finished.
On Linux or macOS, you can use the cronR package to
create a cron job.
Install it if needed:
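install.packages("cronR")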
Then run this once from an interactive R session:
library(cronR)
cmd <- cron_rscript(
rscript = "/home/analyst/featdelta_sensor_pipeline/scripts/run_featdelta.R",
rscript_log = "/home/analyst/featdelta_sensor_pipeline/logs/featdelta-cron.log",
log_append = TRUE
)
cron_add(
command = cmd,
frequency = "minutely",
id = "featdelta_sensor_pipeline",
description = "Run featdelta sensor feature pipeline"
)

This creates a cron job that runs every minute. During testing, inspect:
/home/analyst/featdelta_sensor_pipeline/logs/featdelta-run.log
/home/analyst/featdelta_sensor_pipeline/logs/featdelta-cron.log
To list cron jobs:
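cron_ls()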
To remove the test job:
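cron_rm(id = "featdelta_sensor_pipeline")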
For production, use a schedule that matches your data refresh cycle. For example, a daily run after midnight, an hourly run during business hours, or a custom cron expression.
At minimum, monitor the log file written by the script. Each run should tell you:

- whether the run succeeded (success=);
- which pipeline stage was reached (stage=);
- how many rows were fetched by the source query;
- how many rows would be inserted or updated in the feature table.
You may also want to query the feature table directly:
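# A minimal sketch from an interactive session, assuming a connection
# object `con` created with the same settings as the script:
dbGetQuery(con, "SELECT COUNT(*) AS n_rows FROM feature_schema.reading_features")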
Or inspect the most recent keys:
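# Again assuming the same `con`; reading_id is the key column used above.
dbGetQuery(con, "
  SELECT reading_id
  FROM feature_schema.reading_features
  ORDER BY reading_id DESC
  LIMIT 10
")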
The exact monitoring setup depends on your database and scheduler, but the principle is simple: make each scheduled run leave evidence that it ran and what it changed.
Scheduled jobs often fail for environmental reasons rather than
because of fd_run() itself.
Common issues include:

- the scheduler launching a different R installation or package library that does not have featdelta or the DBI driver installed;
- environment variables that exist in your interactive session but not for the scheduler's user account;
- file permissions that block writing to the log folder;
- database permissions that differ for the account running the job.

When debugging, first run the exact Rscript command from
a terminal. Then make sure the scheduler runs under a user account that
has the same package library, environment variables, file permissions,
and database permissions.
Scheduling is not part of featdelta itself. The package
gives you the repeatable R feature pipeline. Windows Task Scheduler or
cron runs that pipeline at the chosen time.
The practical setup is:
- a standalone script, run_featdelta.R;
- a command-line runner, Rscript;
- a scheduled task created with taskscheduleR or cronR.

The scheduling examples use the public documentation for:
- taskscheduleR::taskscheduler_create(): https://rdrr.io/cran/taskscheduleR/man/taskscheduler_create.html
- cronR::cron_add() and cronR::cron_rscript(): https://search.r-project.org/CRAN/refmans/cronR/html/cron_add.html