---
res:
bibo_abstract:
- 'We study partially observable Markov decision processes (POMDPs) with objectives
used in verification and artificial intelligence. The qualitative analysis problem,
given a POMDP and an objective, asks whether there is a strategy (policy) to ensure
that the objective is satisfied almost surely (with probability 1), resp. with
positive probability (with probability greater than 0). For POMDPs with limit-average
payoff, where a reward value in the interval [0,1] is associated to every transition,
and the payoff of an infinite path is the long-run average of the rewards, we
consider two types of path constraints: (i) a quantitative limit-average constraint
defines the set of paths where the payoff is at least a given threshold λ ∈ (0,1];
and (ii) a qualitative limit-average constraint is the special case with threshold λ = 1.
Our main results for qualitative limit-average constraint under almost-sure winning
are as follows: (i) the problem of deciding the existence of a finite-memory controller
is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory
controller is undecidable. For quantitative limit-average constraints we show
that the problem of deciding the existence of a finite-memory controller is undecidable.
We present a prototype implementation of our EXPTIME algorithm. For POMDPs with
ω-regular conditions specified as parity objectives, while the qualitative analysis
problems are known to be undecidable even for the very special case of parity objectives,
we establish decidability of the qualitative analysis problems for POMDPs with parity
objectives under finite-memory strategies, with optimal (exponential) memory bounds
and EXPTIME-completeness. Based on our theoretical algorithms we also present a practical approach,
where we design heuristics to deal with the exponential complexity, and apply
our implementation to a number of well-known POMDP examples from robotics applications.
For POMDPs with a set of target states and an integer cost associated with every
transition, we study the optimization objective that asks to minimize the expected
total cost of reaching a state in the target set, while ensuring that the target
set is reached almost surely. We show that for general integer costs approximating
the optimal cost is undecidable. For positive costs, our results are as follows:
(i) we establish matching lower and upper bounds for the optimal cost, both doubly
exponential in the size of the POMDP state space; (ii) we show that the problem of
approximating the optimal cost is decidable and present approximation algorithms
that extend existing algorithms for POMDPs with finite-horizon objectives. We
show experimentally that our approach performs well on many examples of interest. We study
more deeply the problem of almost-sure reachability, where given a set of target
states, the question is to decide whether there is a strategy to ensure that the
target set is reached almost surely. While in general the problem is EXPTIME-complete,
in many practical cases strategies with a small amount of memory suffice. Moreover,
the existing solution to the problem is explicit: it first requires constructing
an exponential belief-support MDP. We first study the
existence of observation-stationary strategies, which is NP-complete, and then
small-memory strategies. We present a symbolic algorithm based on an efficient encoding
of the problem into SAT, using a SAT solver. We report experimental results
demonstrating the scalability of our symbolic (SAT-based) approach. Decentralized
POMDPs (DEC-POMDPs) extend POMDPs to a multi-agent setting, where several agents
operate in an uncertain environment independently to achieve a joint objective.
In this work we consider Goal DEC-POMDPs, where given a set of target states,
the objective is to ensure that the target set is reached with minimal cost. We
consider the indefinite-horizon (infinite-horizon with either discounted-sum,
or undiscounted-sum, where absorbing goal states have zero cost) problem. We present
a novel method to solve the problem that extends methods for finite-horizon
DEC-POMDPs and the real-time dynamic programming approach for POMDPs. We present
experimental results on several examples, showing that our approach yields
promising results. Finally, we present a short summary of a few other results
related to verification of MDPs and POMDPs.@eng'
bibo_authorlist:
- foaf_Person:
foaf_givenName: Martin
foaf_name: Chmelik, Martin
foaf_surname: Chmelik
foaf_workInfoHomepage: http://www.librecat.org/personId=3624234E-F248-11E8-B48F-1D18A9856A87
dct_date: 2016^xs_gYear
dct_language: eng
dct_publisher: IST Austria@
dct_title: Algorithms for partially observable Markov decision processes@
...