Title: Approximate POMDP Planning Software
Description: A toolkit for Partially Observable Markov Decision Processes (POMDP). Provides bindings to C++ libraries implementing the SARSOP algorithm (Successive Approximations of the Reachable Space under Optimal Policies), described in Kurniawati et al. (2008), <doi:10.15607/RSS.2008.IV.009>. This package also provides a high-level interface for generating, solving, and simulating POMDP problems and their solutions.
Authors: Carl Boettiger [cre, aut, cph], Jeroen Ooms [aut], Milad Memarzadeh [aut], Hanna Kurniawati [ctb, cph], David Hsu [ctb, cph], Wee Sun Lee [ctb, cph], Yanzhu Du [ctb], Xan Huang [ctb], Trey Smith [ctb, cph], Tony Cassandra [ctb, cph], Lee Thomason [ctb, cph], Carl Kindman [ctb, cph], Le Trong Dao [ctb, cph], Amit Jain [ctb, cph], Rong Nan [ctb, cph], Ulrich Drepper [ctb], Free Software Foundation [cph], Tyge Lovset [ctb, cph], Yves Berquin [ctb, cph], Benjamin Grüdelbach [ctb], RSA Data Security, Inc. [cph]
Maintainer: Carl Boettiger <[email protected]>
License: GPL-2
Version: 0.6.15
Built: 2024-10-29 06:17:54 UTC
Source: https://github.com/boettiger-lab/sarsop
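To get started, load the package and check that the bundled APPL (SARSOP) binaries are available; assert_has_appl() is documented below. A minimal sketch:

library(sarsop)

# Check that the C++ pomdpsol binaries compiled and can be located and executed;
# returns TRUE when the solver is usable.
assert_has_appl()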
Read alpha vectors from a log file.
alphas_from_log(meta, log_dir = ".")
meta: a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log().
log_dir: path to the log directory.

Returns a list with a matrix of alpha vectors for each entry in the provided metadata (as returned by sarsop()).
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log <- tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10, log_dir = log)
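As a follow-up sketch (not part of the original example), the logged run above can be located and its alpha vectors reloaded with meta_from_log() and alphas_from_log(), documented below. Filtering on a discount column is an assumption about what the metadata file records; inspect the returned data frame first if unsure.

# Hypothetical follow-up: look up the run just logged, then reload its
# alpha vectors from disk rather than re-solving.
meta <- meta_from_log(data.frame(discount = discount), log_dir = log)
alphas <- alphas_from_log(meta, log_dir = log)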
Asserts that the C++ binaries for appl have been compiled successfully
assert_has_appl()
Will return TRUE if binaries are installed and can be located and executed, and FALSE otherwise.
assert_has_appl()
Derive the corresponding policy function from the alpha vectors
compute_policy(
  alpha,
  transition,
  observation,
  reward,
  state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
  a_0 = 1
)
alpha: the matrix of alpha vectors returned by sarsop().
transition: transition matrix, dimension n_s x n_s x n_a.
observation: observation matrix, dimension n_s x n_z x n_a.
reward: reward matrix, dimension n_s x n_a.
state_prior: initial belief state; optional, defaults to uniform over states.
a_0: previous action. The belief in the current state depends not only on the observation, but also on the prior belief over states and the action that was taken.
Returns a data frame providing the optimal policy (choice of action) and the corresponding value of that action for each possible belief state.
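To make the relationship between alpha vectors and the policy concrete, here is a minimal, self-contained sketch with toy numbers (an illustration added here, independent of the package): the value of a belief b under the alpha vector for an action is the dot product of b with that vector, and the policy picks the action whose vector maximizes it.

# Toy 2-state problem with one alpha vector per action (columns = actions).
alpha_toy <- cbind(c(10, 0),   # action 1
                   c(4, 6))    # action 2
b <- c(0.3, 0.7)               # belief over the two states

values <- as.vector(t(alpha_toy) %*% b)  # expected value of each action
values                                   # 3.0 and 5.4
which.max(values)                        # optimal action: 2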
m <- fisheries_matrices()
## Takes > 5s
if(assert_has_appl()){
  alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
  compute_policy(alpha, m$transition, m$observation, m$reward)
}
Read transition function from log
f_from_log(meta)
meta: a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log().

Note that this function is specific to the fisheries example problem and assumes that the sarsop call was run with logging that includes a "model" column containing either the string "ricker" (corresponding to a Ricker-type growth function) or "allen" (corresponding to an Allen-type growth function).

Returns the growth function associated with the model indicated.
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log <- tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10, log_dir = log)
Initialize the transition, observation, and reward matrices given a transition function, reward function, and state space
fisheries_matrices(
  states = 0:20,
  actions = states,
  observed_states = states,
  reward_fn = function(x, a) pmin(x, a),
  f = ricker(1, 15),
  sigma_g = 0.1,
  sigma_m = 0.1,
  noise = c("rescaled-lognormal", "lognormal", "uniform", "normal")
)
states: sequence of possible states.
actions: sequence of possible actions.
observed_states: sequence of possible observations.
reward_fn: function of x and a giving the reward for taking action a when the state is x.
f: transition (growth) function of state x and action a.
sigma_g: half-width of the uniform shock, or equivalent variance for log-normal, for the growth (process) noise.
sigma_m: half-width of the uniform shock, or equivalent variance for log-normal, for the measurement (observation) noise.
noise: distribution for the noise; one of "rescaled-lognormal", "lognormal", "uniform", or "normal" (see the default in the usage above).

Assumes log-normally distributed observation errors and process errors.

Returns a list of the transition matrix, observation matrix, and reward matrix.
m <- fisheries_matrices()
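As a brief usage note (an addition, not from the original documentation), the returned list elements can be inspected to confirm the dimensions expected by sarsop() and compute_policy(); with the defaults above (21 states, actions, and observed states), the arrays are 21 x 21 x 21 and the reward matrix is 21 x 21.

m <- fisheries_matrices()
# n_s x n_s x n_a transition array, n_s x n_z x n_a observation array,
# and n_s x n_a reward matrix:
dim(m$transition)
dim(m$observation)
dim(m$reward)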
Compare historical actions to the actions the POMDP policy would have recommended.
hindcast_pomdp(
  transition,
  observation,
  reward,
  discount,
  obs,
  action,
  state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
  alpha = NULL,
  ...
)
transition: transition matrix, dimension n_s x n_s x n_a.
observation: observation matrix, dimension n_s x n_z x n_a.
reward: reward matrix, dimension n_s x n_a.
discount: the discount factor.
obs: a given sequence of observations.
action: the corresponding sequence of actions.
state_prior: initial belief state; optional, defaults to uniform over states.
alpha: the matrix of alpha vectors returned by sarsop().
...: additional arguments.

Returns a list containing a data frame with columns for time, obs, action, and optimal action, and an array containing the posterior belief distribution at each time t.
m <- fisheries_matrices()
## Takes > 5s
if(assert_has_appl()){
  alpha <- sarsop(m$transition, m$observation, m$reward, 0.95, precision = 10)
  sim <- hindcast_pomdp(m$transition, m$observation, m$reward, 0.95,
                        obs = rnorm(21, 15, .1), action = rep(1, 21),
                        alpha = alpha)
}
Load metadata from a log file.
meta_from_log(
  parameters,
  log_dir = ".",
  metafile = paste0(log_dir, "/meta.csv")
)
parameters: a data.frame with the desired parameter values, as given in the metafile.
log_dir: path to the log directory.
metafile: path to the metafile index; assumed to be meta.csv in log_dir.

Returns a data.frame with the rows of the matching metadata.
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log <- tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10, log_dir = log)
Read model details from log file
models_from_log(meta, reward_fn = function(x, h) pmin(x, h))
meta: a data frame containing the log metadata for each set of alpha vectors desired; see meta_from_log().
reward_fn: a function f(x, a) giving the reward for taking action a when the system is in state x.

Assumes the transition can be determined by the f_from_log function, which is specific to the fisheries example.

Returns a list with an element for each row in the requested metadata data frame, each of which is itself a list of the three matrices (transition, observation, and reward) defining the POMDP problem.
# takes > 5s
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
log <- tempfile()
alpha <- sarsop(transition, observation, reward, discount, precision = 10, log_dir = log)
Wrappers for the APPL executables. The pomdpsol function solves a model file and returns the path to the output policy file.
pomdpsol(
  model,
  output = tempfile(),
  precision = 0.001,
  timeout = NULL,
  fast = FALSE,
  randomization = FALSE,
  memory = NULL,
  improvementConstant = NULL,
  timeInterval = NULL,
  stdout = tempfile(),
  stderr = tempfile(),
  spinner = TRUE
)

polgraph(
  model,
  policy,
  output = tempfile(),
  max_depth = 3,
  max_branches = 10,
  min_prob = 0.001,
  stdout = "",
  spinner = TRUE
)

pomdpsim(
  model,
  policy,
  output = tempfile(),
  steps = 100,
  simulations = 3,
  stdout = "",
  spinner = TRUE
)

pomdpeval(
  model,
  policy,
  output = tempfile(),
  steps = 100,
  simulations = 3,
  stdout = "",
  spinner = TRUE
)

pomdpconvert(model, stdout = "", spinner = TRUE)
model: file/path to the model file (a .pomdp or .pomdpx file).
output: file/path of the output policy file; this path is also returned by the function.
precision: the target precision in solution quality; the run ends when the target precision is reached. The default is 1e-3.
timeout: the time limit in seconds. If running time exceeds the specified value, pomdpsol writes out a policy and terminates. There is no time limit by default.
fast: logical, default FALSE. Use the fast (but very picky) alternate parser for .pomdp files.
randomization: logical, default FALSE. Turn on randomization for the sampling algorithm.
memory: the memory limit in MB. If memory usage exceeds the specified value, pomdpsol writes out a policy and terminates. There is no memory limit by default; set the value below physical memory to avoid swapping.
improvementConstant: the trial improvement factor in the sampling algorithm. At the default of 0.5, a trial terminates at a belief when the gap between its upper and lower bound is 0.5 of the current precision at the initial belief.
timeInterval: the time interval between two consecutive write-outs of policy files. If this is not specified, pomdpsol only writes out a policy file upon termination.
stdout: a filename where pomdp run statistics will be stored.
stderr: currently ignored.
spinner: should a spinner be shown while sarsop is running?
policy: file/path to the policy file.
max_depth: the maximum horizon of the generated policy graph.
max_branches: maximum number of branches to show in the policy graph.
min_prob: the minimum probability threshold for a branch to be shown in the policy graph.
steps: number of steps for each simulation run.
simulations: the number of simulation runs.
if(assert_has_appl()){
  model <- system.file("models", "example.pomdp", package = "sarsop")
  policy <- tempfile(fileext = ".policyx")
  pomdpsol(model, output = policy, timeout = 1)

  # Other tools
  evaluation <- pomdpeval(model, policy, stdout = FALSE)
  graph <- polgraph(model, policy, stdout = FALSE)
  simulations <- pomdpsim(model, policy, stdout = FALSE)
}
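As a follow-up (an addition, not part of the original example), the .policyx file written by pomdpsol() above can be loaded back into R with read_policyx(), documented below:

# Load the alpha vectors and associated actions from the solved policy file.
policy_list <- read_policyx(policy)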
Read a .policyx file created by SARSOP and return the alpha vectors and associated actions.
read_policyx(file = "output.policyx")
file: name of the policyx file to be read.

Returns a list: the first element, "vectors", is an n_states x n_vectors array of alpha vectors; the second element, "action", is a numeric vector of length n_vectors whose i'th element gives the action corresponding to the i'th alpha vector (column) in the vectors array.
f <- system.file("extdata", "out.policy", package = "sarsop", mustWork = TRUE)
policy <- read_policyx(f)
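A brief usage note (an addition to the docs): the two elements named in the value description can be inspected directly.

# "vectors" is the n_states x n_vectors matrix of alpha vectors;
# "action" gives the action associated with each column.
dim(policy$vectors)
policy$action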
sarsop wraps the tasks of writing the pomdpx file defining the problem, running the pomdpsol (SARSOP) algorithm in C++, and reading the resulting policy file back into R. The returned alpha vectors and alpha_action information are then transformed into a more generic, user-friendly representation as a matrix whose columns correspond to actions and rows to states. This function can thus be used at the heart of most POMDP applications.
sarsop(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
  verbose = TRUE,
  log_dir = tempdir(),
  log_data = NULL,
  cache = TRUE,
  ...
)
transition: transition matrix, dimension n_s x n_s x n_a.
observation: observation matrix, dimension n_s x n_z x n_a.
reward: reward matrix, dimension n_s x n_a.
discount: the discount factor.
state_prior: initial belief state; optional, defaults to uniform over states.
verbose: logical; should the function print a message with pomdp diagnostics (timings, final precision, end condition)?
log_dir: pomdpx and policyx files will be saved here, along with a metadata file.
log_data: a data.frame of additional columns to include in the log, such as model parameters. A unique id value for each run can be provided as one of the columns; otherwise, a globally unique id will be generated.
cache: should results from the log directory be cached? Default TRUE. Identical function calls will quickly return previously cached alpha vectors from file rather than re-running.
...: additional arguments, such as precision, passed on to the solver call (pomdpsol()).

Returns a matrix of alpha vectors. The column index indicates the action associated with each alpha vector (1:n_actions); rows index the system state, x. Actions for which no alpha vector was found are included as columns of all -Inf, since such actions are not optimal regardless of belief and thus have no corresponding alpha vectors in the alpha_action list.
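Since actions without an alpha vector are returned as columns of -Inf, here is a short sketch (an addition to the docs) of how to identify, for an alpha matrix returned by sarsop() as in the example below, which actions carry alpha vectors:

# Columns with at least one finite entry correspond to actions for which
# SARSOP returned an alpha vector.
supported_actions <- which(colSums(is.finite(alpha)) > 0)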
## Takes > 5s
## Use example code to generate matrices for the pomdp problem:
source(system.file("examples/fisheries-ex.R", package = "sarsop"))
alpha <- sarsop(transition, observation, reward, discount, precision = 10)
compute_policy(alpha, transition, observation, reward)
Simulate a POMDP given the appropriate matrices.
sim_pomdp(
  transition,
  observation,
  reward,
  discount,
  state_prior = rep(1, dim(observation)[[1]]) / dim(observation)[[1]],
  x0,
  a0 = 1,
  Tmax = 20,
  policy = NULL,
  alpha = NULL,
  reps = 1,
  ...
)
transition: transition matrix, dimension n_s x n_s x n_a.
observation: observation matrix, dimension n_s x n_z x n_a.
reward: reward matrix, dimension n_s x n_a.
discount: the discount factor.
state_prior: initial belief state; optional, defaults to uniform over states.
x0: initial state.
a0: initial action (default is action 1; this can be arbitrary if the observation process is independent of the action taken).
Tmax: duration of the simulation.
policy: simulate using a pre-computed policy (e.g. an MDP policy) instead of the POMDP alpha vectors.
alpha: the matrix of alpha vectors returned by sarsop().
reps: number of replicate simulations to compute.
...: additional arguments to mclapply.
The simulation assumes the following order of updating: for a system in state[t] at time t, an observation obs[t] of the system is made, then action[t] is chosen based on that observation and the given policy, returning the (discounted) reward[t].
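To make this update order concrete, here is a minimal sketch of a single step using the standard POMDP belief update (toy code added for illustration; not the package's internal implementation). It assumes transition, observation, and alpha objects with the dimensions described above.

# One step of the obs -> action -> reward cycle, given a current belief b,
# the previous action a_prev, and a new observation index z in 1:n_z.
# 1. Bayesian belief update:
#    b_new(s') is proportional to O(s', z, a_prev) * sum_s T(s, s', a_prev) * b(s)
belief_update <- function(b, a_prev, z, transition, observation) {
  b_pred <- t(transition[, , a_prev]) %*% b                # predict next-state distribution
  b_new  <- observation[, z, a_prev] * as.vector(b_pred)   # weight by observation likelihood
  b_new / sum(b_new)                                       # renormalize
}
# 2. Choose the action whose alpha vector maximizes the expected value:
#    a_t <- which.max(as.vector(t(alpha) %*% b_new))
# 3. Collect the discounted reward:  discount^t * reward[x_t, a_t]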
Returns a data frame with columns for time, state, obs, action, and (discounted) value.
m <- fisheries_matrices()
discount <- 0.95
## Takes > 5s
if(assert_has_appl()){
  alpha <- sarsop(m$transition, m$observation, m$reward, discount, precision = 10)
  sim <- sim_pomdp(m$transition, m$observation, m$reward, discount,
                   x0 = 5, Tmax = 20, alpha = alpha)
}
A POMDPX file specifies a POMDP problem in terms of the transition, observation, and reward matrices, the discount factor, and the initial belief.
write_pomdpx(
  P,
  O,
  R,
  gamma,
  b = rep(1/dim(O)[1], dim(O)[1]),
  file = "input.pomdpx",
  digits = 12,
  digits2 = 12,
  format = "f"
)
P: transition matrix.
O: observation matrix.
R: reward matrix.
gamma: discount factor.
b: initial belief.
file: pomdpx file to create.
digits: precision to round to before normalizing (the sarsop parser appears unable to handle more digits).
digits2: precision to which the solution is written (normalizing requires additional precision).
format: floating-point format, since the sarsop parser does not appear to understand scientific notation.
m <- fisheries_matrices()
f <- tempfile()
write_pomdpx(m$transition, m$observation, m$reward, 0.95, file = f)
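As a follow-up sketch (an addition, not part of the original example), the pomdpx file written above can be handed directly to the low-level solver wrapper pomdpsol(), documented earlier, which takes a model file path; this requires the compiled APPL binaries (see assert_has_appl()).

# Solve the hand-written pomdpx file with the APPL solver wrapper,
# then read the resulting policy back into R.
policy_file <- tempfile(fileext = ".policyx")
pomdpsol(f, output = policy_file, precision = 10)
policy <- read_policyx(policy_file)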