GPU Acceleration with DecisionRulesExa.jl

DecisionRulesExa.jl is a companion package that implements the same TS-DDR algorithm as DecisionRules.jl, using ExaModels.jl instead of JuMP as the optimization backend. It targets GPU-accelerated training via MadNLP.jl with cuDSS-backed interior-point solves.

When to use DecisionRulesExa.jl

	DecisionRules.jl (JuMP)	DecisionRulesExa.jl (ExaModels)
Backend	JuMP + DiffOpt	ExaModels + MadNLP
Hardware	CPU	CPU or GPU (CUDA)
Training modes	Deterministic equivalent (DE), stage-wise, multiple shooting	Deterministic equivalent (DE)
Gradient source	DiffOpt implicit diff + duals	Envelope theorem (duals only)
Best for	Moderate NLPs, integer variables, stage-wise decomposition	Large NLPs (AC-OPF), GPU speedup, many samples per batch

Choose DecisionRulesExa.jl when the inner NLP is large enough that GPU acceleration matters (e.g., AC-OPF with hundreds of buses and thousands of variables per stage) and you want to run many training samples per gradient step on a single GPU.

Choose DecisionRules.jl when you need stage-wise or multiple-shooting decomposition, integer variable support, or DiffOpt-based solution sensitivities.

Installation

using Pkg
Pkg.add(url="https://github.com/LearningToOptimize/DecisionRulesExa.jl.git")

For GPU support, also install CUDA.jl and MadNLPGPU.jl:

Pkg.add(["CUDA", "MadNLPGPU"])

Quick start: CPU

The simplest way to get started is with the built-in linear tracking problem:

using DecisionRulesExa
using ExaModels, Flux, MadNLP, Random

Random.seed!(1)

T  = 8   # horizon
nx = 1   # state dimension

# Build a parametric deterministic-equivalent NLP on CPU
prob = build_linear_tracking_problem(
    horizon       = T,
    nx            = nx,
    backend       = nothing,       # CPU
    slack_penalty = 10.0,
    u_bounds      = (-2.0, 2.0),
)

# LSTM policy: maps [w_t ; x_{t-1}] → target x̂_t at each stage
policy = StateConditionedPolicy(nx, nx, nx, [64, 64])

# Uncertainty sampler: returns a flat vector of length T * nw (here nw = nx)
sampler() = Float32.(0.1 .* randn(T * nx))

# Train with TS-DDR policy gradient (envelope theorem)
train_tsddr(
    policy,
    Float32.([1.0]),               # initial state
    prob,
    prob.p_x0,
    prob.p_target,
    prob.p_w,
    sampler;
    num_batches         = 100,
    num_train_per_batch = 4,
    optimizer           = Flux.Adam(1f-3),
    madnlp_kwargs       = (print_level = MadNLP.ERROR, tol = 1e-6),
)
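
The "envelope theorem" gradient used by the training call above can be sketched as follows. The notation is an assumption made for illustration and is not taken from the package: $z$ collects all NLP variables (including target slacks), $\hat{x}_t$ is the policy target at stage $t$, $h_t(z)$ is the left-hand side of the stage-$t$ target constraint, and $\lambda_t$ is its multiplier. Bounds and inequality constraints are omitted for brevity, and the sign of $\lambda_t$ depends on the solver's multiplier convention.

```latex
% Assumed notation (illustrative): z = NLP variables, \hat{x}_t = policy target,
% \lambda_t = multiplier of the stage-t target constraint, \theta = policy weights.
\begin{aligned}
V(\hat{x}) &= \min_{z} \; f(z)
  \quad \text{s.t.} \quad c(z) = 0, \qquad h_t(z) - \hat{x}_t = 0, \quad t = 1, \dots, T, \\
\mathcal{L}(z, \mu, \lambda; \hat{x}) &= f(z) + \mu^\top c(z)
  + \sum_{t=1}^{T} \lambda_t^\top \bigl( h_t(z) - \hat{x}_t \bigr), \\
\frac{\partial V}{\partial \hat{x}_t}
  &= \frac{\partial \mathcal{L}}{\partial \hat{x}_t}
     \bigg|_{(z^\star, \mu^\star, \lambda^\star)} = -\lambda_t^\star, \\
\nabla_\theta V
  &= \sum_{t=1}^{T} \Bigl( \frac{\partial \hat{x}_t}{\partial \theta} \Bigr)^{\top}
     \frac{\partial V}{\partial \hat{x}_t}
   = -\sum_{t=1}^{T} \Bigl( \frac{\partial \hat{x}_t}{\partial \theta} \Bigr)^{\top}
     \lambda_t^\star .
\end{aligned}
```

Under the usual regularity conditions, the optimal multipliers of the target constraints are therefore all that is needed from the solver; the remaining factor, the derivative of the targets with respect to the policy weights, comes from ordinary backpropagation through the Flux model.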

Moving to GPU

To run the same problem on GPU, change the backend and add a GPU-native linear solver:

using CUDA, MadNLPGPU

prob_gpu = build_linear_tracking_problem(
    horizon       = T,
    nx            = nx,
    backend       = CUDABackend(),
    slack_penalty = 10.0,
    u_bounds      = (-2.0, 2.0),
)

train_tsddr(
    policy,
    Float32.([1.0]),
    prob_gpu,
    prob_gpu.p_x0,
    prob_gpu.p_target,
    prob_gpu.p_w,
    sampler;
    num_batches         = 100,
    num_train_per_batch = 4,
    optimizer           = Flux.Adam(1f-3),
    madnlp_kwargs       = (
        print_level   = MadNLP.ERROR,
        tol           = 1e-6,
        linear_solver = CUDSSSolver,
    ),
)

The policy (Flux model) stays on CPU; only the NLP solve runs on GPU. Parameter updates (ExaModels.set_parameter!) and multiplier extraction handle CPU↔GPU transfers automatically.

Custom problems

For domain-specific models (power systems, robotics, etc.), build the ExaModels NLP directly instead of using build_linear_tracking_problem. The key requirements are:

Add target constraints last so their multipliers form a contiguous slice of result.multipliers.
Parameterize the initial state (p_x0), uncertainty trajectory (p_w), and target trajectory (p_target) as ExaModels parameters.
Return a struct with fields .core, .model, .horizon, and .target_con_range.
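
The role of the "target constraints last" requirement and of .target_con_range can be illustrated without any solver. The following is a minimal sketch with made-up sizes: the constraint counts, the stage-major layout, and the dummy multipliers vector (standing in for result.multipliers) are all assumptions for illustration.

```julia
# Sketch with made-up sizes; `multipliers` stands in for `result.multipliers`.
T, nx = 4, 2                 # horizon and state dimension (illustrative)
n_dynamics = T * nx          # constraints added first (assumed count)
n_target   = T * nx          # target constraints, added last

# Because the target constraints come last, they occupy the tail of the vector.
target_con_range = (n_dynamics + 1):(n_dynamics + n_target)

# Dummy multiplier vector: zeros for dynamics, 1.0:8.0 for the target constraints.
multipliers = vcat(zeros(n_dynamics), collect(1.0:n_target))

# One contiguous view; no index gathering is needed.
λ_target = view(multipliers, target_con_range)

# Assuming a stage-major layout (all nx entries of stage 1, then stage 2, ...),
# column t of this matrix holds the multipliers of stage t.
λ_by_stage = reshape(λ_target, nx, T)
```

If other constraints were interleaved with the target constraints, recovering the same multipliers would require a gather over scattered indices instead of a single range.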

The HydroPowerModels example in DecisionRulesExa.jl demonstrates this pattern for a full AC-OPF problem with reservoir dynamics:

# In examples/HydroPowerModels/hydro_power_exa.jl
prob = build_hydro_de(
    data;
    num_stages     = 96,
    backend        = CUDABackend(),
    formulation    = :ac_polar,
    deficit_cost   = 1e5,
    target_penalty = :auto,
)

Parallel GPU solves

When training samples are independent, multiple NLP instances can be solved concurrently on the same GPU. Build a pool of independent problem copies and pass it to train_tsddr:

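num_workers = 4   # illustrative value

# Entry 1 reuses the existing problem; the rest are independent copies.
# `build_my_problem` is a placeholder for your own problem constructor.
pool = [(prob, prob.p_x0, prob.p_target, prob.p_w)]
for _ in 2:num_workers
    p = build_my_problem(backend = CUDABackend())
    push!(pool, (p, p.p_x0, p.p_target, p.p_w))
end

# `x0` is the initial state vector, as in the quick start.
train_tsddr(policy, x0, prob, prob.p_x0, prob.p_target, prob.p_w, sampler;
    problem_pool        = pool,
    num_train_per_batch = num_workers,
)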

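Each pool entry gets its own MadNLP solver instance. Samples are distributed round-robin across the pool and solved via Threads.@spawn, so start Julia with more than one thread (for example, julia --threads=auto) to let the spawned tasks run in parallel.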
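
The round-robin pattern described above can be sketched in plain Julia. This is not the package's implementation: solve_sample is a hypothetical stand-in for "set parameters, run MadNLP, read multipliers", and the pool entries are plain symbols rather than real problems.

```julia
# Hypothetical stand-in for one NLP solve on a given pool entry.
solve_sample(worker, sample) = (worker = worker, value = sum(abs2, sample))

function solve_round_robin(pool, samples)
    results = Vector{Any}(undef, length(samples))
    tasks = map(eachindex(pool)) do k
        Threads.@spawn begin
            # Worker k handles samples k, k + n, k + 2n, ... sequentially,
            # so a pool entry is never used by two tasks at once.
            for i in k:length(pool):length(samples)
                results[i] = solve_sample(pool[k], samples[i])
            end
        end
    end
    foreach(wait, tasks)   # block until every worker has finished
    return results
end

pool    = [:solver_1, :solver_2, :solver_3]
samples = [randn(8) for _ in 1:7]
results = solve_round_robin(pool, samples)
```

Running one task per pool entry, rather than one task per sample, is a design choice in this sketch: it keeps each solver instance single-threaded while still letting different instances overlap.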

Penalty annealing

DecisionRulesExa.jl supports penalty annealing through the adjust_hyperparameters callback. The target penalty coefficient $\rho$ is stored as an ExaModels parameter and can be updated at runtime:

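# `prob`, `base_penalty`, `T`, and `nx` are captured from the enclosing scope.
adjust_hyperparameters = function (iter, opt_state, num_train)
    # Piecewise-constant multiplier applied to the base penalty
    phase = iter < 100 ? 0.1 :
            iter < 200 ? 1.0 :
            iter < 300 ? 10.0 : 30.0
    ρ = base_penalty * phase
    # The parameter holds ρ/2, one entry per target component
    penalty_vals = fill(ρ / 2, T * nx)
    ExaModels.set_parameter!(prob.core, prob.p_penalty_half, penalty_vals)
    return num_train   # returned unchanged here
end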

This mirrors the penalty_schedule keyword in DecisionRules.jl's train_multistage.
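
When the schedule has more than a few phases, the nested ternary in the callback can be replaced by a small lookup. The helper below is a hypothetical sketch, not part of either package; the thresholds and multipliers reproduce the example above.

```julia
# Hypothetical helper: piecewise-constant penalty multiplier.
# `schedule` lists (first_iteration, multiplier) pairs sorted by iteration.
function penalty_multiplier(iter::Integer, schedule)
    mult = first(schedule)[2]
    for (start_iter, m) in schedule
        iter >= start_iter || break   # later phases have not started yet
        mult = m
    end
    return mult
end

# Same phases as the ternary chain: 0.1 until 99, 1.0 until 199, and so on.
schedule = [(0, 0.1), (100, 1.0), (200, 10.0), (300, 30.0)]

penalty_multiplier(150, schedule)
```

Inside the callback, phase = penalty_multiplier(iter, schedule) would then replace the ternary chain.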

Rollout evaluation

RolloutEvaluation in DecisionRules.jl evaluates policies stage-by-stage under deployment semantics. DecisionRulesExa.jl provides an analogous RolloutEvaluation that solves stage subproblems sequentially:

# Named `rollout_eval` to avoid clashing with Julia's built-in `eval`
rollout_eval = RolloutEvaluation(
    stage_problem, x0, eval_scenarios;
    horizon               = T,
    n_uncertainty         = nw,
    set_stage_parameters! = my_stage_setter!,
    realized_state        = my_realized_state,
    stride                = 25,
    policy_state          = :realized,
)

Both packages report the same metrics: operational cost excluding target-deficit penalty, and target-violation share.

Mapping between packages

DecisionRules.jl	DecisionRulesExa.jl	Notes
`train_multistage`	`train_tsddr`	Main training loop
`state_conditioned_policy`	`StateConditionedPolicy`	LSTM policy
`dense_multilayer_nn`	`MLPPolicy`	MLP policy
`state_params_in`	`p_x0`	Initial state parameter
`state_params_out`	`p_target`	Target parameter
`uncertainty_samples`	`p_w` + sampler	Uncertainty parameter
`SampleLog` / `record`	`record_loss`	Per-iteration callback
`RolloutEvaluation`	`RolloutEvaluation`	Stage-wise eval
`penalty_schedule`	`adjust_hyperparameters`	Penalty annealing
`ScoreFunctionConfig`	—	Not yet ported to ExaModels
Stage-wise decomposition	—	JuMP only
Multiple shooting	—	JuMP only

Full example: HydroPowerModels

The examples/HydroPowerModels/ directory in DecisionRulesExa.jl contains a complete AC-OPF hydrothermal scheduling example for the Bolivia test case, the same problem solved by DecisionRules.jl in the Hydropower Scheduling tutorial. It demonstrates:

Parsing PowerModels.jl network data and hydro reservoir parameters
Building a multi-stage deterministic-equivalent NLP in ExaModels (DC or AC polar OPF formulations)
L1 + L2 penalty on target slack (δ⁺/δ⁻ splitting for smooth NLP)
GPU training with parallel MadNLP solves
Warm-start caching to prevent cascade solver failures
Penalty and sample-count annealing schedules
W&B metric logging
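
The "δ⁺/δ⁻ splitting" item in the list above refers to a standard reformulation that keeps an L1 penalty smooth. One common form is sketched below; the exact expression used in the example is not given here, so the weights $\rho_1$ and $\rho_2$ and the shape of the quadratic term are assumptions.

```latex
% Assumed form (illustrative): \rho_1 and \rho_2 are L1 and L2 penalty weights.
\begin{aligned}
x_t - \hat{x}_t &= \delta_t^{+} - \delta_t^{-},
  \qquad \delta_t^{+} \ge 0, \quad \delta_t^{-} \ge 0, \\
\text{penalty}_t &= \rho_1 \, \mathbf{1}^\top \bigl( \delta_t^{+} + \delta_t^{-} \bigr)
  + \frac{\rho_2}{2} \bigl( \lVert \delta_t^{+} \rVert_2^2 + \lVert \delta_t^{-} \rVert_2^2 \bigr).
\end{aligned}
```

At an optimum, at most one of $\delta_t^{+}$ and $\delta_t^{-}$ is nonzero in each component, so their sum equals $|x_t - \hat{x}_t|$ while the objective stays differentiable in the decision variables, which is what an interior-point solver needs.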