POMDPs under probabilistic semantics
Chatterjee, Krishnendu
Chmelik, Martin
We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated with every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) a quantitative constraint, which defines the set of paths where the payoff is at least a given threshold λ₁ ∈ (0,1]; and (ii) a qualitative constraint, which is the special case of the quantitative constraint with λ₁ = 1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraints are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraints we show that the problem of deciding the existence of a finite-memory controller is undecidable. We also present a prototype implementation of our EXPTIME algorithm and experimental results on several examples.
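As an illustration of the payoff and the two path constraints described in the abstract, the following minimal sketch evaluates the limit-average payoff of a single eventually periodic path, that is, a finite prefix of rewards followed by a cycle of rewards repeated forever. For such a path the long-run average equals the mean reward on the cycle, because the finite prefix has no effect in the limit. The function names and the prefix/cycle representation are assumptions made for this example only. They are not taken from the paper, and the sketch does not implement the paper's EXPTIME algorithm or say anything about controllers or almost-sure winning.

```python
from fractions import Fraction


def limit_average(prefix, cycle):
    """Limit-average payoff of the infinite path prefix . cycle^omega.

    `prefix` and `cycle` are sequences of transition rewards in [0, 1]
    (a hypothetical representation chosen for this sketch). The prefix
    is finite, so only the cycle determines the long-run average.
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    for r in list(prefix) + list(cycle):
        if not 0 <= r <= 1:
            raise ValueError("rewards must lie in [0, 1]")
    # Exact rational arithmetic avoids rounding issues at the threshold.
    return sum(Fraction(r) for r in cycle) / len(cycle)


def satisfies_quantitative(prefix, cycle, threshold):
    """Quantitative constraint: payoff is at least `threshold` in (0, 1]."""
    return limit_average(prefix, cycle) >= Fraction(threshold)


def satisfies_qualitative(prefix, cycle):
    """Qualitative constraint: the special case with threshold 1."""
    return satisfies_quantitative(prefix, cycle, 1)
```

For example, a path whose cycle alternates rewards 1 and 0 has payoff 1/2, so it meets the quantitative constraint for a threshold of 1/2 but not the qualitative one. A path that eventually collects only reward 1 meets the qualitative constraint regardless of its prefix.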
Elsevier
2015
info:eu-repo/semantics/article
doc-type:article
text
https://research-explorer.app.ist.ac.at/record/1873
Chatterjee K, Chmelik M. POMDPs under probabilistic semantics. Artificial Intelligence. 2015;221:46-72. doi:10.1016/j.artint.2014.12.009
eng
info:eu-repo/semantics/altIdentifier/doi/10.1016/j.artint.2014.12.009
info:eu-repo/semantics/altIdentifier/arxiv/1408.2058
info:eu-repo/semantics/openAccess