GPU Acceleration with DecisionRulesExa.jl
DecisionRulesExa.jl is a companion package that implements the same TS-DDR algorithm, using ExaModels.jl instead of JuMP as the optimization backend. It targets GPU-accelerated training via MadNLP.jl with cuDSS-backed interior-point solves.
When to use DecisionRulesExa.jl
| | DecisionRules.jl (JuMP) | DecisionRulesExa.jl (ExaModels) |
|---|---|---|
| Backend | JuMP + DiffOpt | ExaModels + MadNLP |
| Hardware | CPU | CPU or GPU (CUDA) |
| Training modes | DE, stage-wise, multiple shooting | Deterministic equivalent |
| Gradient source | DiffOpt implicit diff + duals | Envelope theorem (duals only) |
| Best for | Moderate NLPs, integer variables, stage-wise decomposition | Large NLPs (AC-OPF), GPU speedup, many samples per batch |
Choose DecisionRulesExa.jl when the inner NLP is large enough that GPU acceleration matters (e.g., AC-OPF with hundreds of buses and thousands of variables per stage) and you want to run many training samples per gradient step on a single GPU.
Choose DecisionRules.jl when you need stage-wise or multiple-shooting decomposition, integer variable support, or DiffOpt-based solution sensitivities.
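The "Envelope theorem (duals only)" entry in the table can be illustrated on a toy problem. The sketch below is not taken from either package: it uses a hypothetical one-dimensional problem with a closed-form solution and checks that the multiplier of the target constraint matches a finite-difference derivative of the optimal value. The names `solve_toy`, `a`, and `ρ` are illustrative, and real solvers may report multipliers with the opposite sign convention.

```julia
# Toy problem (an assumption for illustration, not from either package):
#   V(t) = min over (x, δ) of  0.5*(x - a)^2 + (ρ/2)*δ^2   s.t.  x + δ = t
# With Lagrangian L = 0.5*(x - a)^2 + (ρ/2)*δ^2 + λ*(t - x - δ), the envelope
# theorem gives dV/dt = ∂L/∂t = λ, evaluated at the optimum.

# Closed-form primal-dual solution of the toy problem.
function solve_toy(t; a = 1.0, ρ = 10.0)
    λ = ρ * (t - a) / (ρ + 1)   # multiplier of the target constraint
    x = a + λ                    # stationarity in x: x - a - λ = 0
    δ = λ / ρ                    # stationarity in δ: ρ*δ - λ = 0
    value = 0.5 * (x - a)^2 + (ρ / 2) * δ^2
    return (; x, δ, λ, value)
end

t = 2.5
sol = solve_toy(t)

# Central finite difference of the optimal value with respect to the target.
h = 1e-6
fd_grad = (solve_toy(t + h).value - solve_toy(t - h).value) / (2h)

println("multiplier λ      = ", sol.λ)
println("finite difference = ", fd_grad)
```

The two printed numbers should agree to several digits, which is why a gradient with respect to the policy's targets can be read off the duals without differentiating through the solver.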
Installation
using Pkg
Pkg.add(url="https://github.com/LearningToOptimize/DecisionRulesExa.jl.git")
For GPU support, also install CUDA.jl and MadNLPGPU:
Pkg.add(["CUDA", "MadNLPGPU"])
Quick start: CPU
The simplest way to get started is with the built-in linear tracking problem:
using DecisionRulesExa
using ExaModels, Flux, MadNLP, Random
Random.seed!(1)
T = 8 # horizon
nx = 1 # state dimension
# Build a parametric deterministic-equivalent NLP on CPU
prob = build_linear_tracking_problem(
    horizon = T,
    nx = nx,
    backend = nothing, # CPU
    slack_penalty = 10.0,
    u_bounds = (-2.0, 2.0),
)
# LSTM policy: maps [w_t ; x_{t-1}] → target x̂_t at each stage
policy = StateConditionedPolicy(nx, nx, nx, [64, 64])
# Uncertainty sampler: returns a flat vector of length T * nw (here nw = nx)
sampler() = Float32.(0.1 .* randn(T * nx))
# Train with TS-DDR policy gradient (envelope theorem)
train_tsddr(
    policy,
    Float32.([1.0]), # initial state
    prob,
    prob.p_x0,
    prob.p_target,
    prob.p_w,
    sampler;
    num_batches = 100,
    num_train_per_batch = 4,
    optimizer = Flux.Adam(1f-3),
    madnlp_kwargs = (print_level = MadNLP.ERROR, tol = 1e-6),
)
Moving to GPU
To run the same problem on GPU, change the backend and add a GPU-native linear solver:
using CUDA, MadNLPGPU
prob_gpu = build_linear_tracking_problem(
    horizon = T,
    nx = nx,
    backend = CUDABackend(),
    slack_penalty = 10.0,
    u_bounds = (-2.0, 2.0),
)
train_tsddr(
    policy,
    Float32.([1.0]),
    prob_gpu,
    prob_gpu.p_x0,
    prob_gpu.p_target,
    prob_gpu.p_w,
    sampler;
    num_batches = 100,
    num_train_per_batch = 4,
    optimizer = Flux.Adam(1f-3),
    madnlp_kwargs = (
        print_level = MadNLP.ERROR,
        tol = 1e-6,
        linear_solver = CUDSSSolver,
    ),
)
The policy (Flux model) stays on CPU; only the NLP solve runs on GPU. Parameter updates (ExaModels.set_parameter!) and multiplier extraction handle CPU↔GPU transfers automatically.
Custom problems
For domain-specific models (power systems, robotics, etc.), build the ExaModels NLP directly instead of using build_linear_tracking_problem. The key requirements are:
- Add target constraints last so their multipliers form a contiguous slice of result.multipliers.
- Parameterize the initial state (p_x0), uncertainty trajectory (p_w), and target trajectory (p_target) as ExaModels parameters.
- Return a struct with fields .core, .model, .horizon, and .target_con_range.
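To make the first and third requirements concrete, here is a minimal sketch in plain Julia. `ToyProblem`, the constraint counts, and the stage-major reshape are all assumptions made for illustration; the actual problem type and multiplier layout in DecisionRulesExa.jl may differ, and the `core` and `model` fields are left as `nothing` placeholders rather than real ExaModels objects.

```julia
# Hypothetical container mirroring the four fields listed above; the real
# problem type in DecisionRulesExa.jl may differ.
struct ToyProblem
    core                      # placeholder for the ExaModels core object
    model                     # placeholder for the ExaModels model
    horizon::Int
    target_con_range::UnitRange{Int}
end

T, nx = 4, 2
n_dynamics = T * nx           # constraints added first (illustrative count)
n_target   = T * nx           # target constraints, added last

toy = ToyProblem(nothing, nothing, T, (n_dynamics + 1):(n_dynamics + n_target))

# Stand-in for result.multipliers: one entry per constraint, in creation order.
multipliers = collect(1.0:(n_dynamics + n_target))

# Because the target constraints come last, their duals are one contiguous view.
target_duals = view(multipliers, toy.target_con_range)

# Assumed stage-major layout: column t holds the nx duals of stage t.
duals_by_stage = reshape(target_duals, nx, T)
println(duals_by_stage[:, 1])
```

Keeping the target constraints in a single trailing block means the gradient signal for the policy can be extracted with one range index instead of a per-constraint lookup.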
The HydroPowerModels example in DecisionRulesExa.jl demonstrates this pattern for a full AC-OPF problem with reservoir dynamics:
# In examples/HydroPowerModels/hydro_power_exa.jl
prob = build_hydro_de(
    data;
    num_stages = 96,
    backend = CUDABackend(),
    formulation = :ac_polar,
    deficit_cost = 1e5,
    target_penalty = :auto,
)
Parallel GPU solves
When training samples are independent, multiple NLP instances can be solved concurrently on the same GPU. Build a pool of independent problem copies and pass it to train_tsddr:
pool = [(prob, prob.p_x0, prob.p_target, prob.p_w)]
for _ in 2:num_workers
    p = build_my_problem(backend = CUDABackend())
    push!(pool, (p, p.p_x0, p.p_target, p.p_w))
end
train_tsddr(policy, x0, prob, prob.p_x0, prob.p_target, prob.p_w, sampler;
    problem_pool = pool,
    num_train_per_batch = num_workers,
)
Each pool entry gets its own MadNLP solver instance. Samples are distributed round-robin across the pool and solved via Threads.@spawn.
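The round-robin distribution can be sketched in plain Julia. This is one possible scheduling scheme under stated assumptions, not the internals of train_tsddr: `mock_solve` is a hypothetical stand-in for an NLP solve, and each task owns exactly one worker so that a solver instance is never shared between concurrently running tasks.

```julia
# Stand-in for one NLP solve; a real worker would update parameters and call MadNLP.
mock_solve(worker_id::Int, sample::Vector{Float64}) =
    (worker = worker_id, cost = sum(abs2, sample))

num_workers = 3
samples = [fill(Float64(i), 2) for i in 1:7]
results = Vector{Any}(undef, length(samples))

# One task per pool entry; each task walks its own round-robin share sequentially,
# so a given solver instance is never used by two tasks at once.
tasks = map(1:num_workers) do w
    Threads.@spawn begin
        for i in w:num_workers:length(samples)
            results[i] = mock_solve(w, samples[i])
        end
    end
end
foreach(wait, tasks)

println([r.worker for r in results])
```

Each task writes only to its own indices of `results`, so no locking is needed. Julia must be started with more than one thread (for example `julia -t 4`) for the tasks to actually run in parallel.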
Penalty annealing
DecisionRulesExa.jl supports penalty annealing through the adjust_hyperparameters callback. The target penalty coefficient $\rho$ is stored as an ExaModels parameter and can be updated at runtime:
adjust_hyperparameters = function(iter, opt_state, num_train)
    phase = iter < 100 ? 0.1 :
            iter < 200 ? 1.0 :
            iter < 300 ? 10.0 : 30.0
    ρ = base_penalty * phase
    penalty_vals = fill(ρ / 2, T * nx)
    ExaModels.set_parameter!(prob.core, prob.p_penalty_half, penalty_vals)
    return num_train
end
This mirrors the penalty_schedule keyword in DecisionRules.jl's train_multistage.
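The schedule inside the callback can be checked in isolation before it is attached to a training run. The sketch below factors it into two helper functions; `penalty_phase` and `penalty_half_values` are hypothetical names introduced here, and `base_penalty = 10.0` is an arbitrary example value.

```julia
# Piecewise-constant multiplier, with the same breakpoints as the callback above.
penalty_phase(iter::Integer) =
    iter < 100 ? 0.1 :
    iter < 200 ? 1.0 :
    iter < 300 ? 10.0 : 30.0

# Vector that would be written to the ρ/2 parameter: one entry per stage and state.
function penalty_half_values(iter::Integer; base_penalty::Real, T::Integer, nx::Integer)
    ρ = base_penalty * penalty_phase(iter)
    return fill(ρ / 2, T * nx)
end

for iter in (1, 100, 250, 300)
    vals = penalty_half_values(iter; base_penalty = 10.0, T = 8, nx = 1)
    println("iter = ", iter, "  ρ/2 = ", first(vals), "  length = ", length(vals))
end
```

Separating the schedule from the ExaModels.set_parameter! call makes the breakpoints easy to test and to plot.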
Rollout evaluation
RolloutEvaluation in DecisionRules.jl evaluates policies stage-by-stage under deployment semantics. DecisionRulesExa.jl provides an analogous RolloutEvaluation that solves stage subproblems sequentially:
eval = RolloutEvaluation(
    stage_problem, x0, eval_scenarios;
    horizon = T,
    n_uncertainty = nw,
    set_stage_parameters! = my_stage_setter!,
    realized_state = my_realized_state,
    stride = 25,
    policy_state = :realized,
)
Both packages report the same metrics: operational cost excluding target-deficit penalty, and target-violation share.
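As a rough picture of what a sequential rollout does, the sketch below steps a toy scalar system through its stages, feeding each realized state into the next policy call. Everything in it is an assumption for illustration: the dynamics, `solve_stage`, `toy_policy`, and the reading of `policy_state = :realized` as "the policy sees the realized previous state" are not taken from either package.

```julia
# Toy stage model (an assumption, not the package's): x_t = x_{t-1} + u_t + w_t,
# with u_t clamped to its bounds and chosen to land as close to the target as possible.
function solve_stage(x_prev, w, target; u_bounds = (-2.0, 2.0))
    u = clamp(target - x_prev - w, u_bounds...)
    x = x_prev + u + w
    return (x = x, u = u, slack = target - x)
end

# Hypothetical policy: always aim for the origin, whatever the inputs.
toy_policy(w, x_prev) = 0.0

function rollout(policy, x0, ws)
    x_prev = x0
    states, slacks = Float64[], Float64[]
    for w in ws                         # one stage subproblem per time step
        target = policy(w, x_prev)      # policy sees the realized previous state
        stage = solve_stage(x_prev, w, target)
        push!(states, stage.x)
        push!(slacks, stage.slack)
        x_prev = stage.x                # realized state feeds the next stage
    end
    return states, slacks
end

states, slacks = rollout(toy_policy, 5.0, [0.5, -0.5, 0.0, 0.0])
println(states)
println(slacks)
```

Unlike the deterministic equivalent used in training, each stage here is solved knowing only the current uncertainty realization, which is what makes the rollout a deployment-style evaluation.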
Mapping between packages
| DecisionRules.jl | DecisionRulesExa.jl | Notes |
|---|---|---|
| train_multistage | train_tsddr | Main training loop |
| state_conditioned_policy | StateConditionedPolicy | LSTM policy |
| dense_multilayer_nn | MLPPolicy | MLP policy |
| state_params_in | p_x0 | Initial state parameter |
| state_params_out | p_target | Target parameter |
| uncertainty_samples | p_w + sampler | Uncertainty parameter |
| SampleLog / record | record_loss | Per-iteration callback |
| RolloutEvaluation | RolloutEvaluation | Stage-wise eval |
| penalty_schedule | adjust_hyperparameters | Penalty annealing |
| ScoreFunctionConfig | — | Not yet ported to ExaModels |
| Stage-wise decomposition | — | JuMP only |
| Multiple shooting | — | JuMP only |
Full example: HydroPowerModels
The examples/HydroPowerModels/ directory in DecisionRulesExa.jl contains a complete AC-OPF hydrothermal scheduling example for the Bolivia test case — the same problem solved by DecisionRules.jl in the Hydropower Scheduling tutorial. It demonstrates:
- Parsing PowerModels.jl network data and hydro reservoir parameters
- Building a multi-stage deterministic-equivalent NLP in ExaModels (DC or AC polar OPF formulations)
- L1 + L2 penalty on target slack (δ⁺/δ⁻ splitting for smooth NLP)
- GPU training with parallel MadNLP solves
- Warm-start caching to prevent cascade solver failures
- Penalty and sample-count annealing schedules
- W&B metric logging