
Overview

{targets} is the most widely used pipeline toolkit in the R ecosystem and the de facto standard for heavy-duty reproducible workflows. The table below contrasts {targets} with {pipeflow} to help you decide which one fits your project.

| Feature | targets | pipeflow |
| --- | --- | --- |
| Paradigm | Declarative: define the full DAG upfront in a _targets.R script, then execute | Interactive: incrementally build the pipeline with pip_add() as you code |
| Execution | tar_make() runs in a fresh R process | pip_run() runs in the current R session |
| Persistent storage | ✅ Output stored to disk (_targets/objects/), survives R restarts, handles data larger than RAM | ❌ In-memory only, lost when R session ends |
| Skip up-to-date steps | ✅ Hash-based invalidation of code and data | ✅ State-based (done / outdated) |
| Metadata & provenance | tar_meta() records runtime, size, errors per target | ❌ No per-step provenance metadata |
| Dependency validation | tar_validate() for pre-flight checks (opt-in) | ✅ On pip_add(), pip_replace(), pip_remove(); fails fast on broken references |
| Modify pipeline at runtime | ❌ Must edit _targets.R and re-run | pip_remove(), pip_rename(), pip_replace(), insert with after = |
| Parameter management | ❌ No unified parameter view across targets | pip_get_params() / pip_set_params(): one call updates all steps |
| Split / map / reduce | pattern = map() / cross() built-in, tarchetypes for advanced patterns | ✅ Built-in exec = "split" / "auto" / "reduce" |
| Dynamic branching | ✅ Comprehensive via tarchetypes | ✅ Auto-mapping over partition keys (exec = "auto") |
| Views / tag filtering | tar_described_as() selects by description tags | pip_view(): filter steps by tags or index |
| Pipeline composition | | pip_bind() two pipelines, pip_add_from() copy individual steps |
| Self-modifying pipelines | | pip_run(recursive = TRUE): steps can return modified pipelines |
| Distributed computing | crew for HPC and cloud workers | |
| Cloud storage | ✅ AWS, GCS | |
| File tracking | ✅ File targets with format = "file" | |
| Step locking | | pip_lock() / pip_unlock(): protect steps from accidental modification |

In short, {targets} is the tool of choice for large-scale reproducible projects: it persists results to disk, captures provenance with tar_meta(), and scales to distributed infrastructure via crew. {pipeflow} prioritises speed and interactivity — sub-millisecond skipped-step checks, in-session pipeline modification, and low response times make it well suited as a Shiny backend or for rapid parameter exploration during analysis.
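
The two skipping strategies named in the table (hash-based versus state-based) can be illustrated with a toy base-R sketch. This is not how either package is implemented internally; `hash_of()`, `step_hash()` and `mark_outdated()` are hypothetical helpers introduced only for illustration, and the state-based part assumes a simple linear chain of steps.

```r
# Toy illustration only: neither part reflects package internals.

# Hash-based check: a step is up to date when the hash of its code and
# input matches the hash recorded at the last run.
hash_of <- function(x) {
    tmp <- tempfile()
    on.exit(unlink(tmp))
    writeBin(serialize(x, NULL), tmp)
    unname(tools::md5sum(tmp))
}

step_hash <- function(fun, input) {
    hash_of(list(code = deparse(fun), input = input))
}

f <- function(x) x + 1
recorded <- step_hash(f, 1)
hash_same_input <- identical(step_hash(f, 1), recorded)  # unchanged: skip
hash_new_input  <- identical(step_hash(f, 2), recorded)  # changed: rerun

# State-based check: each step carries a flag. Changing a step marks it and
# every later step of the (linear) chain as outdated; no hashing is needed.
states <- c(data = "done", model = "done", plot = "done")

mark_outdated <- function(states, changed) {
    idx <- match(changed, names(states))
    states[idx:length(states)] <- "outdated"
    states
}

states_after <- mark_outdated(states, "model")
```

The hash-based variant pays for serialising and hashing on every check but can detect changes across sessions, while the state-based variant only has to read a flag, which is one plausible reason skipped runs can be cheap in an in-memory design.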

Benchmarks

The benchmarks below provide a quantitative comparison on three pipeline topologies. All timings are measured with system.time() across 30 iterations per scenario.

Package versions: pipeflow 0.3.0, targets 1.12.0.

The three scenarios were chosen to isolate different aspects of pipeline overhead:

  • Walkthrough: a minimal pipeline (three steps in pipeflow, four targets in targets, which adds a file-tracking target) mimicking a typical exploratory workflow. This measures startup cost and skip-check overhead in the simplest realistic case.
  • Long linear pipeline: a chain of steps (s0 -> s1 -> ... -> sN). This isolates per-step bookkeeping cost as the pipeline grows from 16 to 128 steps.
  • Fan-out DAG: a single source feeds N parallel branches that converge on one sink. This tests how each package handles wide dependency structures.

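For targets, each iteration uses a fresh tar_dir() and tar_script() to ensure fair timing of end-to-end pipeline execution. Note that tar_make() is called with callr_function = NULL, so it runs in the current R session rather than in a fresh R process; the cost of starting a new process is therefore not part of the targets timings. For pipeflow, pipelines are built once and timed with force = TRUE to measure the cost of a full re-run.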
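
The code in the following sections calls `elapsed_time()` and `nrep`, which are not defined in the text shown here. A minimal sketch of what they might look like, assuming `nrep` is the 30 iterations mentioned above and `elapsed_time()` is a thin wrapper around system.time() (both are assumptions, not the original definitions):

```r
# Assumed value: the text states 30 iterations per scenario.
nrep <- 30

# Hypothetical helper: return the wall-clock seconds needed to evaluate
# `expr`. The argument is a promise, so it is only evaluated inside
# system.time(), which is where the timing happens.
elapsed_time <- function(expr) {
    system.time(expr)[["elapsed"]]
}

# Example: time a cheap stand-in workload nrep times and summarise.
timings <- replicate(nrep, elapsed_time(Sys.sleep(0.001)))
timing_summary <- c(
    median = median(timings),
    min    = min(timings),
    max    = max(timings)
)
```

Because replicate() re-evaluates its expression on every iteration, each element of `timings` is an independent measurement.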

Walkthrough example

A minimal pipeline based on the {targets} walkthrough: read CSV data, fit a linear model, and produce a plot. The pipeflow version has three steps; the targets version has four targets because the CSV file is tracked as its own target with format = "file". We measure both the first full run (all steps executed) and the subsequent skipped run (all steps already up to date). Full runs with targets include a tar_destroy() between iterations so each is a true cold start.

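library(ggplot2)   # ggplot(), aes(), geom_point(), geom_abline()
library(pipeflow)  # pip_*() functions
library(targets)   # tar_*() functions

# Write the example data set to a CSV file
create_data_csv <- function(data = airquality, file = "data.csv") {
    utils::write.csv(data, file)
}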

get_data <- function(file) {
    utils::read.csv(file) |> stats::na.omit()
}

fit_model <- function(data) {
    lm(Ozone ~ Temp, data) |> coefficients()
}

plot_model <- function(model, data) {
    ggplot(data) +
        geom_point(aes(x = Temp, y = Ozone)) +
        geom_abline(intercept = model[1], slope = model[2])
}

pipeflow pipeline

tar_dir({
    create_data_csv(file = "data.csv")
    p <- pip_new("walkthrough") |>
        pip_add("data",  \(file = "data.csv") get_data(file)) |>
        pip_add("model", \(data = ~data) fit_model(data)) |>
        pip_add(
            "plot",
            \(model = ~model, data = ~data) plot_model(model, data)
        )

    message("\nProof of principle full run (no skips)")
    pip_run(p)
    p_r <- replicate(nrep, elapsed_time(pip_run(p, lgr = NULL, force = TRUE)))

    message("\nProof of principle skipped run")
    pip_run(p)
    p_s <- replicate(nrep, elapsed_time(pip_run(p, lgr = NULL)))
})
# 
# Proof of principle full run (no skips)
# info [2026-06-20 19:20:51.077 UTC]: Start run of pipeflow_pip 'walkthrough'
# info [2026-06-20 19:20:51.077 UTC]: Step 1/3 data
# info [2026-06-20 19:20:51.083 UTC]: Step 2/3 model
# info [2026-06-20 19:20:51.089 UTC]: Step 3/3 plot
# info [2026-06-20 19:20:51.121 UTC]: Finished run of pipeflow_pip 'walkthrough'
# 
# Proof of principle skipped run
# info [2026-06-20 19:20:51.560 UTC]: Start run of pipeflow_pip 'walkthrough'
# info [2026-06-20 19:20:51.560 UTC]: Step 1/3 data - skipping done step
# info [2026-06-20 19:20:51.560 UTC]: Step 2/3 model - skipping done step
# info [2026-06-20 19:20:51.560 UTC]: Step 3/3 plot - skipping done step
# info [2026-06-20 19:20:51.561 UTC]: Finished run of pipeflow_pip 'walkthrough'

targets pipeline

tar_make_here <- function(reporter = "silent") {
    tar_make(callr_function = NULL, reporter = reporter)
}

tar_dir({
    create_data_csv(file = "data.csv")
    tar_script({
        list(
            tar_target(file, "data.csv", format = "file"),
            tar_target(data, get_data(file)),
            tar_target(model, fit_model(data)),
            tar_target(plot, plot_model(model, data))
        )
    }, ask = FALSE)

    message("\nProof of principle full run (no skips)")
    tar_make_here(reporter = "timestamp")
    tar_r <- replicate(nrep, {
        tar_destroy(ask = FALSE)
        elapsed_time(tar_make_here(reporter = "silent"))
    })

    message("\nProof of principle skipped run")
    tar_make_here(reporter = "timestamp")
    tar_s <- replicate(nrep, {
        elapsed_time(tar_make_here(reporter = "silent"))
    })
})
# 
# Proof of principle full run (no skips)
# 2026-06-20 21:20:51.73 dispatched target file
# 2026-06-20 21:20:51.74 completed target file [0ms, 3.87 kB]
# 2026-06-20 21:20:51.74 dispatched target data
# 2026-06-20 21:20:51.76 completed target data [0ms, 1.38 kB]
# 2026-06-20 21:20:51.79 dispatched target model
# 2026-06-20 21:20:51.79 completed target model [0ms, 111 B]
# 2026-06-20 21:20:51.79 dispatched target plot
# 2026-06-20 21:20:51.85 completed target plot [0ms, 114.10 kB]
#  2026-06-20 21:20:51.86 ended pipeline [190ms, 4 completed, 0 skipped]
# 
# Proof of principle skipped run
# 2026-06-20 21:21:21.57 skipped 1 targets
#  2026-06-20 21:21:21.57 skipped pipeline [31ms, 4 skipped]

Runtimes

Long linear pipeline

Each step depends on the output of the previous step: s0 -> s1 -> s2 -> ... -> sN. This measures the overhead of managing the pipeline structure and skipping logic as the number of steps increases. We benchmark at 16, 32, 64, and 128 steps to show how the per-step cost scales.
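
To make "per-step cost" concrete, the sketch below times a stand-in chain at the same four sizes and divides by the number of steps. `run_chain()` is a hypothetical placeholder for a pipeline run, not a pipeflow or targets function, and the inner repeat count of 200 is an arbitrary choice to get measurable timings.

```r
# Hypothetical stand-in for a linear pipeline: start at 0 and apply
# "+ 1" n times, mirroring s0 -> s1 -> ... -> sN.
run_chain <- function(n) {
    Reduce(function(acc, i) acc + 1, seq_len(n), 0)
}

sizes <- c(16, 32, 64, 128)

# Average time of one run, divided by the number of steps (n chained
# steps plus the initial step s0).
per_step <- vapply(sizes, function(n) {
    total <- system.time(for (k in 1:200) run_chain(n))[["elapsed"]] / 200
    total / (n + 1)
}, numeric(1))
names(per_step) <- paste0("n=", sizes)
```

If bookkeeping overhead is constant per step, the values in `per_step` stay roughly flat across sizes; a rising trend would indicate super-linear scaling.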

pipeflow pipeline

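create_linear_pip <- function(n) {
    pip <- pip_new("linear") |> pip_add("s0", \(init = 0) init)

    # pip_add() is called for its side effect here: the loop does not
    # reassign pip, yet the check below sees all added steps.
    for (i in seq_len(n)) {
        # x = ~ -1 is a relative reference to the immediately preceding step
        pip_add(pip, step = paste0("s", i), \(x = ~ -1) x + 1)
    }
    pip
}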

# Verify
p <- create_linear_pip(3)
pip_run(p)
# info [2026-06-20 19:21:23.552 UTC]: Start run of pipeflow_pip 'linear'
# info [2026-06-20 19:21:23.552 UTC]: Step 1/4 s0
# info [2026-06-20 19:21:23.553 UTC]: Step 2/4 s1
# info [2026-06-20 19:21:23.554 UTC]: Step 3/4 s2
# info [2026-06-20 19:21:23.556 UTC]: Step 4/4 s3
# info [2026-06-20 19:21:23.557 UTC]: Finished run of pipeflow_pip 'linear'
stopifnot(p[["s3", "out"]] == 3)

targets pipeline

create_linear_tar <- function(n) {
    init <- tar_target(s0, 0)
    rest <- lapply(
        seq_len(n),
        FUN = \(i) tar_target_raw(
            sprintf("s%d", i),
            call("+", as.symbol(sprintf("s%d", i - 1)), 1)
        )
    )
    c(list(init), rest)
}

# Verify
tar_dir({
    tar_script(create_linear_tar(3), ask = FALSE)
    tar_make_here(reporter = "timestamp")
    stopifnot(tar_read(s3) == 3)
})
# 2026-06-20 21:21:23.65 dispatched target s0
# 2026-06-20 21:21:23.66 completed target s0 [0ms, 49 B]
# 2026-06-20 21:21:23.66 dispatched target s1
# 2026-06-20 21:21:23.66 completed target s1 [0ms, 51 B]
# 2026-06-20 21:21:23.66 dispatched target s2
# 2026-06-20 21:21:23.67 completed target s2 [0ms, 50 B]
# 2026-06-20 21:21:23.67 dispatched target s3
# 2026-06-20 21:21:23.68 completed target s3 [0ms, 51 B]
#  2026-06-20 21:21:23.68 ended pipeline [61ms, 4 completed, 0 skipped]

Runtimes

DAG with branching

A source feeds N parallel branches, all converging on a single sink step. This tests scalability with fan-out structures and how each package handles wide dependency graphs — from 16 up to 128 parallel branches. Unlike the linear pipeline, all branches can potentially run independently once the source completes.

dag_source <- function() 1
dag_branch <- function(x) x + 1
dag_sink   <- function(...) sum(...)

pipeflow pipeline

make_branch_pip <- function(n) {
    pip <- pip_new("dag") |> pip_add("source", dag_source)
    # Add n branches, each depending only on the source step
    for (i in seq_len(n))
        pip_add(pip, paste0("b", i), \(x = ~source) dag_branch(x))

    # Build the sink step as text, e.g. for n = 2:
    #   pip_add(pip, 'sink', function(x1 = ~b1, x2 = ~b2) { dag_sink(x1, x2) })
    sink_args <- paste(
        sprintf("x%s = ~b%s", seq_len(n), seq_len(n)),
        collapse = ", "
    )
    sink_call <- paste(paste0("x", seq_len(n)), collapse = ", ")
    eval(parse(text = sprintf(
        "pip_add(pip, 'sink', function(%s) { dag_sink(%s) })",
        sink_args, sink_call
    )))
    invisible(pip)
}

p4 <- make_branch_pip(4)
pip_run(p4)
# info [2026-06-20 19:22:29.081 UTC]: Start run of pipeflow_pip 'dag'
# info [2026-06-20 19:22:29.081 UTC]: Step 1/6 source
# info [2026-06-20 19:22:29.082 UTC]: Step 2/6 b1
# info [2026-06-20 19:22:29.084 UTC]: Step 3/6 b2
# info [2026-06-20 19:22:29.087 UTC]: Step 4/6 b3
# info [2026-06-20 19:22:29.089 UTC]: Step 5/6 b4
# info [2026-06-20 19:22:29.090 UTC]: Step 6/6 sink
# info [2026-06-20 19:22:29.092 UTC]: Finished run of pipeflow_pip 'dag'
stopifnot(p4[["sink", "out"]] == 8)
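
The sink step above is assembled with eval(parse(text = ...)). As an alternative, the same function can be built from language objects in base R, which avoids string pasting. The helper `make_sink_fun()` below is hypothetical and not part of pipeflow; whether pip_add() treats a function constructed this way exactly like a function literal is an assumption that has not been verified here.

```r
dag_sink <- function(...) sum(...)

# Hypothetical helper (not part of pipeflow): build
#   function(x1 = ~b1, ..., xn = ~bn) dag_sink(x1, ..., xn)
# from language objects rather than from a string.
make_sink_fun <- function(n) {
    arg_names <- paste0("x", seq_len(n))
    # Each default is the unevaluated call `~b<i>`, as in a function literal.
    defaults <- lapply(
        paste0("b", seq_len(n)),
        \(b) call("~", as.symbol(b))
    )
    names(defaults) <- arg_names
    # Body: dag_sink(x1, ..., xn)
    body <- as.call(c(as.symbol("dag_sink"), lapply(arg_names, as.symbol)))
    # as.function() takes a list whose last element is the body and whose
    # preceding (named) elements are the formal arguments.
    as.function(c(defaults, list(body)))
}

sink_fun <- make_sink_fun(3)
sink_total <- sink_fun(1, 2, 3)
```

Under that assumption, the step could then be added with `pip_add(pip, "sink", make_sink_fun(n))` in place of the eval(parse()) call.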

targets pipeline

make_branch_tar <- function(n) {
    source <- tar_target(source, dag_source())
    branches <- lapply(
        seq_len(n),
        FUN = \(i) tar_target_raw(
            sprintf("b%d", i),
            call("dag_branch", as.symbol("source"))
        )
    )
    sink <- tar_target_raw(
        "sink",
        as.call(c(
            as.symbol("dag_sink"),
            lapply(paste0("b", seq_len(n)), as.symbol)
        ))
    )
    c(list(source), branches, list(sink))
}

# Verify
tar_dir({
    tar_script(make_branch_tar(4), ask = FALSE)
    tar_make_here(reporter = "timestamp")
    stopifnot(tar_read(sink) == 8)
})
# 2026-06-20 21:22:29.19 dispatched target source
# 2026-06-20 21:22:29.19 completed target source [0ms, 51 B]
# 2026-06-20 21:22:29.20 dispatched target b1
# 2026-06-20 21:22:29.21 completed target b1 [0ms, 50 B]
# 2026-06-20 21:22:29.21 dispatched target b2
# 2026-06-20 21:22:29.21 completed target b2 [0ms, 50 B]
# 2026-06-20 21:22:29.22 dispatched target b3
# 2026-06-20 21:22:29.22 completed target b3 [0ms, 50 B]
# 2026-06-20 21:22:29.22 dispatched target b4
# 2026-06-20 21:22:29.23 completed target b4 [0ms, 50 B]
# 2026-06-20 21:22:29.23 dispatched target sink
# 2026-06-20 21:22:29.24 completed target sink [0ms, 51 B]
#  2026-06-20 21:22:29.24 ended pipeline [70ms, 6 completed, 0 skipped]

Runtimes