Abstract
We tackle the problem of finding optimal policies for Markov Decision Processes that minimize the probability of the cumulative cost exceeding a given budget. This task falls under the umbrella of Risk-Sensitive Markov Decision Processes, which optimize a non-additive, non-linear function of the cumulative cost that incorporates the user’s attitude towards risk. Current algorithms for solving this task, for every budget equal to or smaller than a user-defined budget, scale poorly when the support of the cost function is large, since they operate in an augmented state space that enumerates all possible remaining budgets. To circumvent this issue, we develop (i) an improved version of the Topological Value Iteration with Dynamic Programming algorithm (tvi-dp), and (ii) the first symbolic dynamic programming algorithm for this class of problems, called rs-spudd, which exploits conditional independence in the transition function in the augmented state space. The proposed algorithms improve efficiency by pruning irrelevant states and terminating early, without sacrificing optimality. Empirical results show that rs-spudd is able to solve problems up to 10³ times larger than tvi-dp.
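To make the augmented state space concrete, the following is a minimal sketch (not the paper's tvi-dp or rs-spudd algorithms) of value iteration over pairs of state and remaining budget, assuming a stochastic shortest path setting with strictly positive integer costs; the toy MDP, the name `min_overrun_probability`, and all numbers are illustrative assumptions.

```python
# Hypothetical toy MDP: named states, a goal, and for each state a list of
# actions given as (cost, {successor: probability}).
STATES = ["s0", "s1", "g"]
GOAL = "g"
ACTIONS = {
    "s0": [(1, {"s1": 0.8, "s0": 0.2}), (3, {"g": 1.0})],
    "s1": [(1, {"g": 0.9, "s0": 0.1})],
}

def min_overrun_probability(budget: int) -> dict:
    """V[(s, b)] = minimal probability that the cumulative cost exceeds b
    when starting from s. Since costs are strictly positive integers, every
    transition decreases the remaining budget, so we can fill the table by
    backward induction on b."""
    V = {}
    for b in range(budget + 1):
        for s in STATES:
            if s == GOAL:
                V[(s, b)] = 0.0  # goal reached within budget: no overrun
                continue
            best = 1.0  # worst case: overrunning the budget is certain
            for cost, dist in ACTIONS[s]:
                if cost > b:
                    continue  # this action alone already exceeds b (prob. 1)
                p = sum(prob * V[(succ, b - cost)]
                        for succ, prob in dist.items())
                best = min(best, p)
            V[(s, b)] = best
    return V

V = min_overrun_probability(budget=5)
print(V[("s0", 5)])  # minimal overrun probability from s0 with budget 5
```

Note that the table has |S| × (budget + 1) entries, one per possible remaining budget, which illustrates why a flat enumeration of the augmented space scales poorly when the cost function has large support.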
Type
Publication
International Journal of Approximate Reasoning