A native R package for building, training, and deploying neural networks. Backed by the ggml C library, designed primarily for Vulkan GPU acceleration with full CPU fallback — no Python, no TensorFlow, everything runs inside your R session.
GPU-first design: when a Vulkan-capable GPU is available (NVIDIA, AMD, Intel, ARM Mali, Qualcomm Adreno), all operations run on GPU automatically. On machines without a GPU the package falls back to CPU transparently — no code changes needed.
Two complementary APIs:
| API | Style | When to use |
|---|---|---|
| Sequential / Functional | Keras-like, static graph | Production models, CRAN-standard workflow |
| Dynamic autograd (ag_*) | PyTorch-like, eager | Research, custom architectures, Transformers |
Also serves as the backend engine for llamaR (LLM inference) and sd2R (Stable Diffusion).
install.packages("ggmlR")

GPU (Vulkan) support is auto-detected at build time.
Ubuntu / Debian — to enable GPU:
sudo apt install libvulkan-dev glslc

Windows — install Rtools and optionally the Vulkan SDK for GPU support.
Force-enable or disable Vulkan GPU backend:
install.packages("ggmlR", configure.args = "--with-vulkan")
install.packages("ggmlR", configure.args = "--without-vulkan")

Enable CPU SIMD acceleration (AVX2, SSE4, etc.) for faster inference on your machine:

install.packages("ggmlR", configure.args = "--with-simd")

Options can be combined:

install.packages("ggmlR", configure.args = "--with-vulkan --with-simd")

The fastest way to get a model running — stack layers with the pipe, compile, train.
library(ggmlR)
model <- ggml_model_sequential() |>
ggml_layer_dense(128L, activation = "relu", input_shape = 784L) |>
ggml_layer_dropout(rate = 0.3) |>
ggml_layer_dense(10L, activation = "softmax")
model <- ggml_compile(model,
optimizer = "adam",
loss = "categorical_crossentropy",
metrics = "accuracy")
model <- ggml_fit(model, x_train, y_train,
epochs = 10L,
batch_size = 32L,
validation_split = 0.1,
verbose = 1L)
plot(model$history)
ggml_evaluate(model, x_test, y_test)
preds <- ggml_predict(model, x_new)

| Layer | Function |
|---|---|
| Dense | ggml_layer_dense(units, activation) |
| Conv1D | ggml_layer_conv_1d(filters, kernel_size) |
| Conv2D | ggml_layer_conv_2d(filters, kernel_size, padding) |
| MaxPooling2D | ggml_layer_max_pooling_2d(pool_size) |
| GlobalAvgPool2D | ggml_layer_global_average_pooling_2d() |
| BatchNorm | ggml_layer_batch_norm() |
| Flatten | ggml_layer_flatten() |
| Dropout | ggml_layer_dropout(rate) |
| Embedding | ggml_layer_embedding(vocab_size, dim) |
| LSTM | ggml_layer_lstm(units, return_sequences) |
| GRU | ggml_layer_gru(units, return_sequences) |
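The quick-start example compiles with loss = "categorical_crossentropy", which expects one-hot targets rather than integer labels. A minimal base-R sketch for the conversion (the helper name and label range 1..K are illustrative, not part of the ggmlR API):

```r
# Convert 1-based integer class labels to a one-hot matrix
# (illustrative helper, not a ggmlR function).
to_one_hot <- function(labels, num_classes) {
  m <- matrix(0, nrow = length(labels), ncol = num_classes)
  m[cbind(seq_along(labels), labels)] <- 1
  m
}

y_train <- to_one_hot(c(1L, 3L, 2L), num_classes = 3L)
rowSums(y_train)  # every row sums to 1
```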
model <- ggml_model_sequential() |>
ggml_layer_conv_2d(32L, kernel_size = c(3L, 3L), activation = "relu",
input_shape = c(28L, 28L, 1L)) |>
ggml_layer_max_pooling_2d(pool_size = c(2L, 2L)) |>
ggml_layer_conv_2d(64L, kernel_size = c(3L, 3L), activation = "relu") |>
ggml_layer_global_average_pooling_2d() |>
ggml_layer_dense(10L, activation = "softmax")

Wire layers into arbitrary graphs — residual connections, multi-input/output, shared weights.
inp <- ggml_input(shape = 64L)
x <- inp |> ggml_layer_dense(64L, activation = "relu")
res <- ggml_layer_add(list(inp, x)) # element-wise add
out <- res |> ggml_layer_dense(10L, activation = "softmax")
m <- ggml_model(inputs = inp, outputs = out)
m <- ggml_compile(m, optimizer = "adam", loss = "categorical_crossentropy")
m <- ggml_fit(m, x_train, y_train, epochs = 5L, batch_size = 32L)

inp <- ggml_input(shape = 30L, dtype = "int32", name = "tokens")
emb <- inp |> ggml_layer_embedding(vocab_size = 500L, dim = 32L)
# Branch A: GRU path
proj_a <- emb |>
ggml_layer_gru(32L, return_sequences = FALSE) |>
ggml_layer_dense(32L)
# Branch B: flatten + projection
proj_b <- emb |>
ggml_layer_flatten() |>
ggml_layer_dense(32L, activation = "relu") |>
ggml_layer_dense(32L)
# Residual merge
out <- ggml_layer_add(list(proj_a, proj_b)) |>
ggml_layer_dropout(rate = 0.3) |>
ggml_layer_dense(2L, activation = "softmax")
m <- ggml_model(inputs = inp, outputs = out)

Token values must be 0-based integers in [0, vocab_size - 1].
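Because R indexes from 1, token ids produced with match() or factor codes must be shifted down by one before they are valid embedding inputs. A base-R sketch (the vocabulary and words here are made up):

```r
vocab <- c("the", "cat", "sat", "on", "mat")
words <- c("the", "cat", "sat")

# match() returns 1-based positions; subtract 1 for 0-based token ids.
tokens <- match(words, vocab) - 1L
tokens  # 0 1 2 — all inside [0, length(vocab) - 1]
```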
inp1 <- ggml_input(shape = 20L, name = "timeseries")
inp2 <- ggml_input(shape = 3L, name = "metadata")
br1 <- inp1 |> ggml_layer_dense(16L, activation = "relu")
br2 <- inp2 |> ggml_layer_dense(8L, activation = "relu")
out <- ggml_layer_concatenate(list(br1, br2), axis = 0L) |>
ggml_layer_dense(2L, activation = "softmax")
m <- ggml_model(inputs = list(inp1, inp2), outputs = out)
m <- ggml_compile(m, optimizer = "adam", loss = "categorical_crossentropy")
# Pass x as a list — one matrix per input
m <- ggml_fit(m, x = list(x_ts, x_meta), y = y,
epochs = 10L, batch_size = 32L)
preds <- ggml_predict(m, list(x_ts, x_meta))

inp <- ggml_input(shape = 64L)
hidden <- inp |> ggml_layer_dense(64L, activation = "relu")
out <- hidden |> ggml_layer_dense(10L, activation = "softmax")
m <- ggml_model(inputs = inp, outputs = list(hidden, out))
preds <- ggml_predict(m, x)
# preds[[1]] — hidden activations [n × 64]
# preds[[2]] — class probabilities [n × 10]

residual_block <- function(x, filters, name) {
main <- x |> ggml_layer_conv_2d(filters, c(3L, 3L), padding = "same",
name = paste0(name, "_conv"))
shortcut <- x |> ggml_layer_conv_2d(filters, c(1L, 1L), padding = "same",
name = paste0(name, "_proj"))
ggml_layer_add(list(main, shortcut), name = paste0(name, "_add"))
}
inp <- ggml_input(shape = c(32L, 32L, 3L))
x <- inp |> ggml_layer_conv_2d(16L, c(3L, 3L), activation = "relu",
padding = "same")
x <- residual_block(x, 16L, "res1")
x <- residual_block(x, 32L, "res2")
out <- x |>
ggml_layer_global_average_pooling_2d() |>
ggml_layer_dropout(rate = 0.4) |>
ggml_layer_dense(3L, activation = "softmax")
m <- ggml_model(inputs = inp, outputs = out)

enc <- ggml_dense(32L, activation = "relu", name = "encoder")
x1 <- ggml_input(shape = 16L, name = "left")
x2 <- ggml_input(shape = 16L, name = "right")
h1 <- ggml_apply(x1, enc) # identical weights
h2 <- ggml_apply(x2, enc)
out <- ggml_layer_add(list(h1, h2)) |>
ggml_layer_dense(2L, activation = "softmax")
m <- ggml_model(inputs = list(x1, x2), outputs = out)

| Feature | Keras (Python) | ggmlR |
|---|---|---|
| Batch dimension | part of input_shape | excluded from shape |
| Merge layers | add([a, b]) | ggml_layer_add(list(a, b)) |
| Shared layers | reuse layer object | ggml_dense() + ggml_apply() |
| Multi-input data | list of arrays | list() of R matrices |
| Multi-output predict | list of numpy arrays | R list of matrices |
| Backend | TensorFlow / JAX / PyTorch | ggml (Vulkan GPU, CPU fallback) |
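As the table notes, multi-input data is a plain R list() of matrices, one per model input. A small sketch of the expected container (the matrix dimensions and rows-as-samples layout here are assumptions for illustration; match them to your model's input shapes):

```r
n <- 100                                  # samples
x_ts   <- matrix(rnorm(n * 20), n, 20)    # first input:  20 features
x_meta <- matrix(rnorm(n * 3),  n, 3)     # second input:  3 features

# One matrix per input, in the order the inputs were declared.
x <- list(x_ts, x_meta)
stopifnot(length(x) == 2L, nrow(x[[1]]) == nrow(x[[2]]))
```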
Build and train arbitrary architectures with eager execution and automatic differentiation.
library(ggmlR)
# Define parameters
W <- ag_param(matrix(rnorm(4 * 8) * 0.1, 8, 4))
b <- ag_param(matrix(0, 8, 1))
# Forward + backward
with_grad_tape({
h <- ag_add(ag_matmul(W, x_batch), b)
h <- ag_relu(h)
loss <- ag_mse_loss(h, y_batch)
})
grads <- backward(loss)
opt <- optimizer_adam(list(W = W, b = b), lr = 1e-3)
opt$step(grads)
opt$zero_grad()

model <- ag_sequential(
ag_linear(64L, 128L, activation = "relu"),
ag_batch_norm(128L),
ag_dropout(0.1),
ag_linear(128L, 10L)
)
params <- model$parameters()
opt <- optimizer_adam(params, lr = 1e-3)
sch <- lr_scheduler_cosine(opt, T_max = 50L, lr_min = 1e-5)
dl <- ag_dataloader(x_train, y_train, batch_size = 32L, shuffle = TRUE)
for (epoch in 1:50) {
for (batch in dl$epoch()) {
with_grad_tape({
out <- model$forward(batch$x)
loss <- ag_softmax_cross_entropy_loss(out, batch$y)
})
grads <- backward(loss)
clip_grad_norm(params, grads, max_norm = 1.0)
opt$step(grads)
opt$zero_grad()
}
sch$step()
}

dp_train() splits data across N replicas, averages gradients, and keeps weights in sync automatically.
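Conceptually, the replica synchronization reduces to averaging per-parameter gradients and applying the same update on every replica. A pure-R sketch of the averaging step (toy values, not the actual dp_train internals):

```r
# Two replicas' gradients for the same parameter set (toy values).
grads_r1 <- list(W = matrix(1, 2, 4), b = matrix(1, 2, 1))
grads_r2 <- list(W = matrix(3, 2, 4), b = matrix(3, 2, 1))

# Element-wise average, parameter by parameter.
avg <- Map(function(g1, g2) (g1 + g2) / 2, grads_r1, grads_r2)
avg$W[1, 1]  # 2
```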
make_model <- function() {
W <- ag_param(matrix(rnorm(4 * 2) * 0.1, 2, 4))
b <- ag_param(matrix(0, 2, 1))
list(
forward = function(x) ag_add(ag_matmul(W, x), b),
parameters = function() list(W = W, b = b)
)
}
result <- dp_train(
make_model = make_model,
data = my_dataset, # list of samples
loss_fn = function(out, tgt) ag_mse_loss(out, tgt),
forward_fn = function(model, s) model$forward(s$x),
target_fn = function(s) s$y,
n_gpu = 2L, # number of replicas
n_iter = 100L,
lr = 1e-3,
max_norm = 5.0
)
result$loss_history # numeric vector, one value per iteration
result$model         # trained replica 0

| Category | Functions |
|---|---|
| Linear | ag_matmul, ag_add, ag_sub, ag_mul, ag_scale |
| Activations | ag_relu, ag_sigmoid, ag_tanh, ag_softmax |
| Reductions | ag_sum, ag_mean (with dim, keepdim) |
| Math | ag_log, ag_exp, ag_pow, ag_clamp |
| Shape | ag_reshape, ag_transpose |
| Attention | ag_multihead_attention |
| Loss | ag_mse_loss, ag_cross_entropy_loss, ag_softmax_cross_entropy_loss |
| Layers | ag_linear, ag_batch_norm, ag_dropout, ag_embedding |
| Containers | ag_sequential |
| Optimizers | optimizer_sgd, optimizer_adam |
| Schedulers | lr_scheduler_step, lr_scheduler_cosine |
| Utilities | clip_grad_norm, ag_gradcheck, dp_train |
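The cosine scheduler used in the training loop above anneals the learning rate along a half-cosine from the initial lr down to lr_min. A pure-R sketch of the standard formula (the exact variant lr_scheduler_cosine implements may differ; see its help page):

```r
# Standard cosine annealing: lr_min + (lr0 - lr_min) * (1 + cos(pi*t/T_max)) / 2
cosine_lr <- function(t, T_max, lr0, lr_min = 0) {
  lr_min + 0.5 * (lr0 - lr_min) * (1 + cos(pi * t / T_max))
}

cosine_lr(0,  50, 1e-3, 1e-5)  # epoch 0:  the initial lr (1e-3)
cosine_lr(25, 50, 1e-3, 1e-5)  # midpoint: halfway between lr0 and lr_min
cosine_lr(50, 50, 1e-3, 1e-5)  # epoch 50: lr_min (1e-5)
```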
Load pre-trained ONNX models from PyTorch, TensorFlow, or other frameworks and run inference on Vulkan GPU or CPU. No Python or external libraries required — ggmlR includes a built-in zero-dependency protobuf parser.
library(ggmlR)
# 1. Load the model
model <- onnx_load("squeezenet.onnx")
model
#> ONNX Model: torch_jit
#> Producer: pytorch 2.0.1
#> IR version: 8 / Opset: 18
#> Nodes: 66 / Weights: 26
# 2. Check expected inputs
onnx_inputs(model)
#> $x.1
#> [1] 1 3 224 224
# 3. Prepare input data (flat numeric vector, row-major NCHW order)
img <- runif(1 * 3 * 224 * 224)
# 4. Run inference — pass a named list matching input names
result <- onnx_run(model, list(x.1 = img))
# 5. Get predictions
scores <- result[[1]]
cat("Predicted class:", which.max(scores) - 1L, "\n")

onnx_load() parses the .onnx file, builds a ggml
computation graph, and allocates tensors on the specified device.
Weights are loaded from the file via memory-mapping (zero-copy).
# Auto-detect device (Vulkan GPU if available, else CPU)
model <- onnx_load("model.onnx")
# Force CPU
model <- onnx_load("model.onnx", device = "cpu")
# Force Vulkan GPU
model <- onnx_load("model.onnx", device = "vulkan")

Some models (BERT, RoBERTa, etc.) have dynamic dimensions for batch size or sequence length. Specify fixed shapes at load time:
model <- onnx_load("bert.onnx",
input_shapes = list(
input_ids = c(1L, 128L),
attention_mask = c(1L, 128L)
))

If you forget, onnx_load() will tell you which inputs need shapes:
Error: Input 'input_ids' has dynamic shape [?x?].
Specify fixed shape via input_shapes parameter.
# Print overview
model
#> ONNX Model: torch_jit
#> Nodes: 533 / Weights: 199
#> Ops: MatMul, Add, LayerNormalization, Softmax, ...
# Detailed metadata
onnx_summary(model)
# Input names and shapes (what to pass to onnx_run)
onnx_inputs(model)
# Backend placement: GPU vs CPU split, scheduler info
onnx_device_info(model)

# Single input model
result <- onnx_run(model, list(x = my_data))
# Multiple inputs
result <- onnx_run(model, list(
input_ids = as.numeric(token_ids),
attention_mask = rep(1, 128)
))
# Result is a named list of output tensors
str(result)
#> List of 1
#> $ output: num [1:1000] 0.00123 0.00045 ...

ONNX models expect inputs in row-major order (batch, channels, height, width for images). R matrices are column-major, so you may need to transpose:
# Image classification: model expects [1, 3, 224, 224]
# If you have a 224x224x3 R array:
img_array <- array(runif(224 * 224 * 3), dim = c(224, 224, 3))
# Rearrange to NCHW: [1, 3, 224, 224] — channel first
img_chw <- aperm(img_array, c(3, 1, 2)) # [3, 224, 224]
input <- as.numeric(img_chw) # flat vector, row-major
result <- onnx_run(model, list(x = input))

For NLP models, inputs are typically 1D integer sequences:
# BERT-style: token IDs as numeric vector
tokens <- c(101, 2023, 2003, 1037, 3231, 102, rep(0, 122)) # pad to 128
result <- onnx_run(model, list(input_ids = as.numeric(tokens)))

# Classification: get top-5 predictions
scores <- result[[1]]
top5 <- order(scores, decreasing = TRUE)[1:5]
cat("Top-5 classes:", top5 - 1L, "\n") # 0-based class indices
cat("Top-5 scores:", scores[top5], "\n")
# Apply softmax if model outputs logits (not probabilities)
probs <- exp(scores) / sum(exp(scores))

Models can be run multiple times — weights are automatically reloaded between runs:
model <- onnx_load("classifier.onnx")
for (batch in data_batches) {
result <- onnx_run(model, list(x = batch))
# process result...
}

12 out of 15 ONNX Model Zoo models load successfully:
| Model | Nodes | Key ops |
|---|---|---|
| mnist-8 | 12 | Conv, Relu, MaxPool, Reshape, MatMul |
| squeezenet1.0-8 | 66 | Conv, Relu, MaxPool, Concat, GlobalAveragePool, Softmax |
| adv_inception_v3 (Opset 17/18) | 215 | Conv, BatchNorm, Relu, Concat, AveragePool |
| emotion-ferplus-8 | 52 | Conv, Relu, MaxPool, Gemm, Constant |
| bat_resnext26ts (Opset 18) | 570 | Conv, BatchNorm, SiLU, Concat, Expand, Split |
| bert (Opset 17) | 533 | MatMul, LayerNorm, GELU/Erf, Softmax, Shape, Gather, Where |
| botnet26t_256 (Opset 16) | 377 | Conv, MatMul, Softmax, Reshape, Transpose |
| MaskRCNN-12-int8 | 937 | QLinearConv, DequantizeLinear, Resize, Concat, Reshape |
| roberta-9 | 1180 | MatMul, LayerNorm, Erf, Softmax, Shape, Gather, Cast |
| sageconv (Opset 16) | 24 | MatMul, Add, Mul, Sigmoid, ReduceSum |
| super-resolution-10 | 12 | Conv, Reshape, Transpose |
| Category | Ops |
|---|---|
| Arithmetic | Add, Sub, Mul, Div, Pow, Sqrt, Exp, Log, Abs, Neg, Floor, Ceil, Clip, Erf, Equal |
| Linear | MatMul (batched), Gemm |
| Convolution | Conv (1D/2D, grouped, depthwise), ConvTranspose (1D/2D), with auto_pad (SAME_UPPER, SAME_LOWER) |
| Pooling | MaxPool, AveragePool, GlobalAveragePool, Resize/Upsample (nearest, bilinear) |
| Normalization | BatchNorm, LayerNorm, GroupNorm, RMSNorm |
| Activations | Relu, Sigmoid, Tanh, GELU, SiLU, Softmax, LeakyRelu, Elu |
| Shape | Reshape, Transpose, Concat, Flatten, Squeeze, Unsqueeze, Expand, Slice, Split, Gather, Pad, Shape, Cast, Identity, EyeLike |
| Constants | Constant (TensorProto + scalar), ConstantOfShape |
| Logic | Where, Equal |
| Reduction | ReduceMean, ReduceSum |
| Quantization | DequantizeLinear, QuantizeLinear, QLinearConv, QLinearAdd, QLinearMatMul, QLinearSigmoid, QLinearConcat |
| Pass-through | Dropout |
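The auto_pad behaviour follows the ONNX definition: the output length is ceil(input / stride), and any odd total padding goes to the end for SAME_UPPER (or to the beginning for SAME_LOWER). A pure-R sketch of that computation, per spatial dimension:

```r
# Per-dimension SAME padding, following the ONNX auto_pad definition.
same_pad <- function(in_size, kernel, stride, upper = TRUE) {
  out_size <- ceiling(in_size / stride)
  total <- max((out_size - 1) * stride + kernel - in_size, 0)
  if (upper) c(begin = floor(total / 2), end = ceiling(total / 2))
  else       c(begin = ceiling(total / 2), end = floor(total / 2))
}

same_pad(5, kernel = 3, stride = 2)                # begin 1, end 1 (total 2)
same_pad(5, kernel = 2, stride = 2, upper = FALSE) # odd total: extra at begin
```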
ggml_save_model(model, "my_model.rds")
model <- ggml_load_model("my_model.rds")

Measured on AMD Ryzen 5 5600 + AMD RX 9070, single-image inference:
| Model | CPU (ms) | GPU (ms) | Speedup | CPU FPS | GPU FPS |
|---|---|---|---|---|---|
| Inception V3 | 1747.7 | 24.3 | 71.8x | 0.6 | 41.1 |
| MNIST | 0.7 | 0.3 | 2.0x | 1500.0 | 3000.0 |
| SqueezeNet 1.0 | 128.7 | 3.0 | 42.9x | 7.8 | 333.3 |
| SuperResolution | 904.3 | 354.3 | 2.6x | 1.1 | 2.8 |
| EmotionFerPlus | 259.0 | 4.7 | 55.5x | 3.9 | 214.3 |
| Inception V3 Op18 | 1778.3 | 105.7 | 16.8x | 0.6 | 9.5 |
| BAT-ResNeXt26ts | 847.7 | 14.3 | 59.1x | 1.2 | 69.8 |
| BERT (Opset17) | 3250.3 | 99.7 | 32.6x | 0.3 | 10.0 |
| GPT-NeoX | 10.0 | 3.3 | 3.0x | 100.0 | 300.0 |
Benchmark script: inst/examples/benchmark_onnx.R
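The derived columns follow directly from the raw timings; for the SqueezeNet row, for instance:

```r
cpu_ms <- 128.7        # CPU time per image, from the table
gpu_ms <- 3.0          # GPU time per image, from the table

cpu_ms / gpu_ms        # speedup: 42.9x
1000 / gpu_ms          # GPU throughput: ~333.3 FPS
```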
ggmlR is designed GPU-first: Vulkan is auto-detected at build time and, when available, 90%+ of operations run on GPU with 5×–20× speedup over CPU. On machines without a Vulkan-capable GPU the package falls back to CPU transparently — no code changes required.
ggml_vulkan_available() # TRUE if a Vulkan GPU was detected
ggml_vulkan_status() # device name, memory, capabilities
# Dynamic autograd: switch device at runtime
ag_device("gpu") # move subsequent ops to GPU (f16 by default)
ag_device("cpu")  # fall back to CPU

Supported GPUs: NVIDIA, AMD, Intel, ARM Mali, Qualcomm Adreno.
GPU builds require libvulkan-dev + glslc (Linux) or the Vulkan SDK (Windows).

License: MIT