TY - CONF
AB - Multiple-environment Markov decision processes (MEMDPs) are MDPs equipped with not one, but multiple probabilistic transition functions, which represent the various possible unknown environments. While previous research on MEMDPs focused on theoretical properties for the long-run average payoff, we study them with the discounted-sum payoff and focus on their practical advantages and applications. MEMDPs can be viewed as a special case of partially observable and mixed-observability MDPs: the state of the system is perfectly observable, but not the environment. We show that the specific structure of MEMDPs allows for more efficient algorithmic analysis, in particular for faster belief updates. We demonstrate the applicability of MEMDPs in several domains. In particular, we formalize the sequential decision-making approach to contextual recommendation systems as MEMDPs and substantially improve over the previous MDP approach.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Karkhanis, Deep
AU - Novotný, Petr
AU - Royer, Amélie
ID - 8193
SN - 2334-0835
T2 - Proceedings of the 30th International Conference on Automated Planning and Scheduling
TI - Multiple-environment Markov decision processes: Efficient analysis and applications
VL - 30
ER -
TY - JOUR
AB - We consider the problem of computing the set of initial states of a dynamical system such that there exists a control strategy to ensure that the trajectories satisfy a temporal logic specification with probability 1 (almost-surely). We focus on discrete-time, stochastic linear dynamics and specifications given as formulas of the Generalized Reactivity(1) fragment of Linear Temporal Logic over linear predicates in the states of the system. We propose a solution based on iterative abstraction-refinement, and turn-based 2-player probabilistic games. While the theoretical guarantee of our algorithm after any finite number of iterations is only a partial solution, we show that if our algorithm terminates, then the result is the set of all satisfying initial states. Moreover, for any (partial) solution our algorithm synthesizes witness control strategies to ensure almost-sure satisfaction of the temporal logic specification. While the proposed algorithm guarantees progress and soundness in every iteration, it is computationally demanding. We offer an alternative, more efficient solution for the reachability properties that decomposes the problem into a series of smaller problems of the same type. All algorithms are demonstrated on an illustrative case study.
AU - Svoreňová, Mária
AU - Kretinsky, Jan
AU - Chmelik, Martin
AU - Chatterjee, Krishnendu
AU - Černá, Ivana
AU - Belta, Călin
ID - 1407
IS - 2
JF - Nonlinear Analysis: Hybrid Systems
TI - Temporal logic control for stochastic linear systems using abstraction refinement of probabilistic games
VL - 23
ER -
TY - CONF
AB - DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and there exist solvers both for exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon (infinite-horizon with either discounted-sum or undiscounted-sum, where absorbing goal states have zero cost) problem. We present a novel method to solve the problem that extends methods for finite-horizon DEC-POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples, and show that our approach yields promising results.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
ID - 1324
T2 - Proceedings of the Twenty-Sixth International Conference on Automated Planning and Scheduling
TI - Indefinite-horizon reachability in Goal-DEC-POMDPs
VL - 2016-January
ER -
TY - CONF
AB - We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps till the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, developing on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that automatically extracts important decisions of the policy allowing us to compute succinct human readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
AU - Brázdil, Tomáš
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Gupta, Anchit
AU - Novotný, Petr
ID - 1327
T2 - Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems
TI - Stochastic shortest path with energy constraints in POMDPs
ER -
TY - THES
AB - We study partially observable Markov decision processes (POMDPs) with objectives used in verification and artificial intelligence. The qualitative analysis problem given a POMDP and an objective asks whether there is a strategy (policy) to ensure that the objective is satisfied almost surely (with probability 1), resp. with positive probability (with probability greater than 0). For POMDPs with limit-average payoff, where a reward value in the interval [0,1] is associated with every transition, and the payoff of an infinite path is the long-run average of the rewards, we consider two types of path constraints: (i) a quantitative limit-average constraint defines the set of paths where the payoff is at least a given threshold λ1 ∈ (0,1]; and (ii) a qualitative limit-average constraint, which is the special case of the quantitative constraint with λ1 = 1. Our main results for the qualitative limit-average constraint under almost-sure winning are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative limit-average constraints we show that the problem of deciding the existence of a finite-memory controller is undecidable. We present a prototype implementation of our EXPTIME algorithm. For POMDPs with ω-regular conditions specified as parity objectives, while the qualitative analysis problems are known to be undecidable even for the very special case of parity objectives, we establish decidability (with optimal complexity) of the qualitative analysis problems for POMDPs with parity objectives under finite-memory strategies. We establish optimal (exponential) memory bounds and EXPTIME-completeness of the qualitative analysis problems under finite-memory strategies for POMDPs with parity objectives. Based on our theoretical algorithms we also present a practical approach, where we design heuristics to deal with the exponential complexity, and have applied our implementation on a number of well-known POMDP examples for robotics applications. For POMDPs with a set of target states and an integer cost associated with every transition, we study the optimization objective that asks to minimize the expected total cost of reaching a state in the target set, while ensuring that the target set is reached almost surely. We show that for general integer costs approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost, both double exponential in the POMDP state space size; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms that extend existing algorithms for POMDPs with finite-horizon objectives. We show experimentally that they perform well in many examples of interest. We study more deeply the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a strategy to ensure that the target set is reached almost surely. While in general the problem is EXPTIME-complete, in many practical cases strategies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires explicitly constructing an exponential reduction to a belief-support MDP. We first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem.
We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach. Decentralized POMDPs (DEC-POMDPs) extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. In this work we consider Goal DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon (infinite-horizon with either discounted-sum or undiscounted-sum, where absorbing goal states have zero cost) problem. We present a novel method to solve the problem that extends methods for finite-horizon DEC-POMDPs and the real-time dynamic programming approach for POMDPs. We present experimental results on several examples, and show that our approach yields promising results. Finally, we present a short summary of a few other results related to the verification of MDPs and POMDPs.
AU - Chmelik, Martin
ID - 1397
TI - Algorithms for partially observable Markov decision processes
ER -
TY - JOUR
AB - We consider partially observable Markov decision processes (POMDPs) with ω-regular conditions specified as parity objectives. The class of ω-regular languages provides a robust specification language to express properties in verification, and parity objectives are canonical forms to express them. The qualitative analysis problem given a POMDP and a parity objective asks whether there is a strategy to ensure that the objective is satisfied with probability 1 (resp. positive probability). While the qualitative analysis problems are undecidable even for special cases of parity objectives, we establish decidability (with optimal complexity) for POMDPs with all parity objectives under finite-memory strategies. We establish optimal (exponential) memory bounds and EXPTIME-completeness of the qualitative analysis problems under finite-memory strategies for POMDPs with parity objectives. We also present a practical approach, where we design heuristics to deal with the exponential complexity, and have applied our implementation on a number of POMDP examples.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Tracol, Mathieu
ID - 1477
IS - 5
JF - Journal of Computer and System Sciences
TI - What is decidable about partially observable Markov decision processes with ω-regular objectives
VL - 82
ER -
TY - JOUR
AB - The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. To give a concrete example, we calculate the likelihood for a model of isolation with migration (IM), assuming two diploid samples without phase and outgroup information. We demonstrate the new inference scheme with an analysis of two individual butterfly genomes from the sister species Heliconius melpomene rosina and H. cydno.
AU - Lohse, Konrad
AU - Chmelik, Martin
AU - Martin, Simon
AU - Barton, Nicholas H
ID - 1518
IS - 2
JF - Genetics
TI - Efficient strategies for calculating blockwise likelihoods under the coalescent
VL - 202
ER -
TY - JOUR
AB - We consider partially observable Markov decision processes (POMDPs) with a set of target states and an integer cost associated with every transition. The optimization objective we study asks to minimize the expected total cost of reaching a state in the target set, while ensuring that the target set is reached almost surely (with probability 1). We show that for integer costs approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost, both double exponential in the POMDP state space size; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms developing on the existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we also present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples of interest.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Gupta, Raghav
AU - Kanodia, Ayush
ID - 1529
JF - Artificial Intelligence
TI - Optimal cost almost-sure reachability in POMDPs
VL - 234
ER -
TY - CONF
AB - POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires explicitly constructing an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Davies, Jessica
ID - 1166
T2 - Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence
TI - A symbolic SAT-based algorithm for almost-sure reachability with small strategies in POMDPs
VL - 2016
ER -
TY - GEN
AB - POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit, which first requires explicitly constructing an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm by an efficient encoding to SAT and using a SAT solver for the problem. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Davies, Jessica
ID - 5443
SN - 2664-1690
TI - A symbolic SAT-based algorithm for almost-sure reachability with small strategies in POMDPs
ER -
TY - CONF
AB - For deterministic systems, a counterexample to a property can simply be an error trace, whereas counterexamples in probabilistic systems are necessarily more complex. For instance, a set of erroneous traces with a sufficient cumulative probability mass can be used. Since these objects are too large to understand and manipulate, compact representations such as subchains have been considered. In the case of probabilistic systems with non-determinism, the situation is even more complex. While a subchain for a given strategy (or scheduler, resolving non-determinism) is a straightforward choice, we take a different approach. Instead, we focus on the strategy itself, extract the most important decisions it makes, and present a succinct representation of it.
The key tools we employ to achieve this are (1) introducing a concept of importance of a state w.r.t. the strategy, and (2) learning using decision trees. There are three main advantages of our approach. Firstly, it exploits the quantitative information on states, stressing the more important decisions. Secondly, it leads to a greater variability and degree of freedom in representing the strategies. Thirdly, the representation uses a self-explanatory data structure. In summary, our approach produces more succinct and more explainable strategies, as opposed to e.g. binary decision diagrams. Finally, our experimental results show that we can extract several rules describing the strategy even for very large systems that do not fit in memory, and based on the rules explain the erroneous behaviour.
AU - Brázdil, Tomáš
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Fellner, Andreas
AU - Kretinsky, Jan
ID - 1603
TI - Counterexample explanation by learning small strategies in Markov decision processes
VL - 9206
ER -
TY - CONF
AB - We consider the problem of computing the set of initial states of a dynamical system such that there exists a control strategy to ensure that the trajectories satisfy a temporal logic specification with probability 1 (almost-surely). We focus on discrete-time, stochastic linear dynamics and specifications given as formulas of the Generalized Reactivity(1) fragment of Linear Temporal Logic over linear predicates in the states of the system. We propose a solution based on iterative abstraction-refinement, and turn-based 2-player probabilistic games. While the theoretical guarantee of our algorithm after any finite number of iterations is only a partial solution, we show that if our algorithm terminates, then the result is the set of satisfying initial states. Moreover, for any (partial) solution our algorithm synthesizes witness control strategies to ensure almost-sure satisfaction of the temporal logic specification. We demonstrate our approach on an illustrative case study.
AU - Svoreňová, Mária
AU - Kretinsky, Jan
AU - Chmelik, Martin
AU - Chatterjee, Krishnendu
AU - Černá, Ivana
AU - Belta, Călin
ID - 1689
T2 - Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control
TI - Temporal logic control for stochastic linear systems using abstraction refinement of probabilistic games
ER -
TY - CONF
AB - We consider a case study of the problem of deploying an autonomous air vehicle in a partially observable, dynamic, indoor environment from a specification given as a linear temporal logic (LTL) formula over regions of interest. We model the motion and sensing capabilities of the vehicle as a partially observable Markov decision process (POMDP). We adapt recent results for solving POMDPs with parity objectives to generate a control policy. We also extend the existing framework with a policy minimization technique to obtain a better implementable policy, while preserving its correctness. The proposed techniques are illustrated in an experimental setup involving an autonomous quadrotor performing surveillance in a dynamic environment.
AU - Svoreňová, Mária
AU - Chmelik, Martin
AU - Leahy, Kevin
AU - Eniser, Hasan
AU - Chatterjee, Krishnendu
AU - Černá, Ivana
AU - Belta, Călin
ID - 1691
T2 - Proceedings of the 18th International Conference on Hybrid Systems: Computation and Control
TI - Temporal logic motion planning using POMDPs with parity objectives: Case study paper
ER -
TY - CONF
AB - We consider partially observable Markov decision processes (POMDPs), a standard framework in robotics applications for modeling uncertainties present in the real world, with temporal logic specifications. All temporal logic specifications in linear-time temporal logic (LTL) can be expressed as parity objectives. We study the qualitative analysis problem for POMDPs with parity objectives that asks whether there is a controller (policy) to ensure that the objective holds with probability 1 (almost-surely). While the qualitative analysis of POMDPs with parity objectives is undecidable, recent results show that when restricted to finite-memory policies the problem is EXPTIME-complete. While the problem is intractable in theory, we present a practical approach to solve the qualitative analysis problem. We designed several heuristics to deal with the exponential complexity, and have used our implementation on a number of well-known POMDP examples for robotics applications. Our results provide the first practical approach to solve the qualitative analysis of robot motion planning with LTL properties in the presence of uncertainty.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Gupta, Raghav
AU - Kanodia, Ayush
ID - 1732
TI - Qualitative analysis of POMDPs with temporal logic specifications for robotics applications
ER -
TY - CONF
AB - We consider partially observable Markov decision processes (POMDPs) with a set of target states and an integer cost associated with every transition. The optimization objective we study asks to minimize the expected total cost till the target set is reached, while ensuring that the target set is reached almost-surely (with probability 1). We show that for integer costs approximating the optimal cost is undecidable. For positive costs, our results are as follows: (i) we establish matching lower and upper bounds for the optimal cost and the bound is double exponential; (ii) we show that the problem of approximating the optimal cost is decidable and present approximation algorithms developing on the existing algorithms for POMDPs with finite-horizon objectives. While the worst-case running time of our algorithm is double exponential, we present efficient stopping criteria for the algorithm and show experimentally that it performs well in many examples.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Gupta, Raghav
AU - Kanodia, Ayush
ID - 1820
T2 - Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
TI - Optimal cost almost-sure reachability in POMDPs
VL - 5
ER -
TY - JOUR
AB - We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated with every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) a quantitative constraint defines the set of paths where the payoff is at least a given threshold λ1 ∈ (0,1]; and (ii) a qualitative constraint which is a special case of the quantitative constraint with λ1 = 1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraints are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraints we show that the problem of deciding the existence of a finite-memory controller is undecidable. We also present a prototype implementation of our EXPTIME algorithm and experimental results on several examples.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
ID - 1873
JF - Artificial Intelligence
TI - POMDPs under probabilistic semantics
VL - 221
ER -
TY - JOUR
AB - We consider Markov decision processes (MDPs) which are a standard model for probabilistic systems. We focus on qualitative properties for MDPs that can express that desired behaviors of the system arise almost-surely (with probability 1) or with positive probability. We introduce a new simulation relation to capture the refinement relation of MDPs with respect to qualitative properties, and present discrete graph algorithms with quadratic complexity to compute the simulation relation. We present an automated technique for assume-guarantee style reasoning for compositional analysis of two-player games by giving a counterexample guided abstraction-refinement approach to compute our new simulation relation. We show a tight link between two-player games and MDPs, and as a consequence the results for games are lifted to MDPs with qualitative properties. We have implemented our algorithms and show that the compositional analysis leads to significant improvements.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Daca, Przemyslaw
ID - 1501
IS - 2
JF - Formal Methods in System Design
TI - CEGAR for compositional analysis of qualitative properties in Markov decision processes
VL - 47
ER -
TY - CONF
AB - We present a general framework for applying machine-learning algorithms to the verification of Markov decision processes (MDPs). The primary goal of these techniques is to improve performance by avoiding an exhaustive exploration of the state space. Our framework focuses on probabilistic reachability, which is a core property for verification, and is illustrated through two distinct instantiations. The first assumes that full knowledge of the MDP is available, and performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP, and yields probabilistic guarantees, again in terms of both the lower and upper bounds, which provides efficient stopping criteria for the approximation. The latter is the first extension of statistical model checking for unbounded properties in MDPs. In contrast with other related techniques, our approach is not restricted to time-bounded (finite-horizon) or discounted properties, nor does it assume any particular properties of the MDP. We also show how our methods extend to LTL objectives. We present experimental results showing the performance of our framework on several examples.
AU - Brázdil, Tomáš
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Forejt, Vojtěch
AU - Kretinsky, Jan
AU - Kwiatkowska, Marta
AU - Parker, David
AU - Ujma, Mateusz
ED - Cassez, Franck
ED - Raskin, Jean-François
ID - 2027
T2 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
TI - Verification of Markov decision processes using learning algorithms
VL - 8837
ER -
TY - CONF
AB - We consider Markov decision processes (MDPs) which are a standard model for probabilistic systems. We focus on qualitative properties for MDPs that can express that desired behaviors of the system arise almost-surely (with probability 1) or with positive probability. We introduce a new simulation relation to capture the refinement relation of MDPs with respect to qualitative properties, and present discrete graph theoretic algorithms with quadratic complexity to compute the simulation relation. We present an automated technique for assume-guarantee style reasoning for compositional analysis of MDPs with qualitative properties by giving a counterexample guided abstraction-refinement approach to compute our new simulation relation. We have implemented our algorithms and show that the compositional analysis leads to significant improvements.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Daca, Przemyslaw
ID - 2063
TI - CEGAR for qualitative analysis of probabilistic systems
VL - 8559
ER -
TY - GEN
AB - We consider Markov decision processes (MDPs) which are a standard model for probabilistic systems. We focus on qualitative properties for MDPs that can express that desired behaviors of the system arise almost-surely (with probability 1) or with positive probability.
We introduce a new simulation relation to capture the refinement relation of MDPs with respect to qualitative properties, and present discrete graph theoretic algorithms with quadratic complexity to compute the simulation relation.
We present an automated technique for assume-guarantee style reasoning for compositional analysis of MDPs with qualitative properties by giving a counter-example guided abstraction-refinement approach to compute our new simulation relation. We have implemented our algorithms and show that the compositional analysis leads to significant improvements.
AU - Chatterjee, Krishnendu
AU - Daca, Przemyslaw
AU - Chmelik, Martin
ID - 5412
SN - 2664-1690
TI - CEGAR for qualitative analysis of probabilistic systems
ER -