TY - CONF
AB - The Big Match is a multi-stage two-player game. In each stage Player 1 hides one or two pebbles in his hand, and his opponent has to guess that number; Player 1 loses a point if Player 2 is correct, and otherwise he wins a point. As soon as Player 1 hides one pebble, the players cannot change their choices in any future stage. Blackwell and Ferguson (1968) give an ε-optimal strategy for Player 1 that hides, in each stage, one pebble with a probability that depends on the entire past history. Any strategy that depends just on the clock or on a finite memory is worthless. The long-standing natural open problem has been whether every strategy that depends just on the clock and a finite memory is worthless. We prove that there is such a strategy that is ε-optimal. In fact, we show that just two states of memory are sufficient.
AU - Hansen, Kristoffer Arnsfelt
AU - Ibsen-Jensen, Rasmus
AU - Neyman, Abraham
ID - 5967
SN - 9781450358293
T2 - Proceedings of the 2018 ACM Conference on Economics and Computation - EC '18
TI - The Big Match with a clock and a bit of memory
ER -

TY - JOUR
AB - In this article, we consider the termination problem of probabilistic programs with real-valued variables. The questions concerned are: qualitative ones that ask (i) whether the program terminates with probability 1 (almost-sure termination) and (ii) whether the expected termination time is finite (finite termination); and quantitative ones that ask (i) to approximate the expected termination time (expectation problem) and (ii) to compute a bound B such that the probability not to terminate after B steps decreases exponentially (concentration problem). To solve these questions, we utilize the notion of ranking supermartingales, which is a powerful approach for proving termination of probabilistic programs. In detail, we focus on algorithmic synthesis of linear ranking-supermartingales over affine probabilistic programs (Apps) with both angelic and demonic non-determinism. An important subclass of Apps is LRApp, which is defined as the class of all Apps over which a linear ranking-supermartingale exists. Our main contributions are as follows. Firstly, we show that the membership problem of LRApp (i) can be decided in polynomial time for Apps with at most demonic non-determinism, and (ii) is NP-hard and in PSPACE for Apps with angelic non-determinism. Moreover, the NP-hardness result holds already for Apps without probability and demonic non-determinism. Secondly, we show that the concentration problem over LRApp can be solved in the same complexity as the membership problem of LRApp. Finally, we show that the expectation problem over LRApp can be solved in 2EXPTIME and is PSPACE-hard even for Apps without probability and non-determinism (i.e., deterministic programs). Our experimental results demonstrate the effectiveness of our approach to answer the qualitative and quantitative questions over Apps with at most demonic non-determinism.
AU - Chatterjee, Krishnendu
AU - Fu, Hongfei
AU - Novotný, Petr
AU - Hasheminezhad, Rouzbeh
ID - 5993
IS - 2
JF - ACM Transactions on Programming Languages and Systems
SN - 0164-0925
TI - Algorithmic analysis of qualitative and quantitative termination problems for affine probabilistic programs
VL - 40
ER -

TY - CONF
AB - Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, the indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
AU - Horák, Karel
AU - Bošanský, Branislav
AU - Chatterjee, Krishnendu
ID - 25
T2 - Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
TI - Goal-HSVI: Heuristic search value iteration for goal-POMDPs
VL - 2018-July
ER -

TY - CONF
AB - Partially-observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework to model a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low-probability events with extreme payoff can significantly affect the expectation, and thus the obtained policies are not necessarily risk-averse. An alternate approach is to optimize the probability that the payoff is above a certain threshold, which allows obtaining risk-averse policies, but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation while ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.
AU - Chatterjee, Krishnendu
AU - Elgyütt, Adrian
AU - Novotný, Petr
AU - Rouillé, Owen
ID - 24
TI - Expectation optimization with probabilistic guarantees in POMDPs with discounted-sum objectives
VL - 2018
ER -

TY - CONF
AB - Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize “weakest” additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.
AU - Chatterjee, Krishnendu
AU - Chmelík, Martin
AU - Topcu, Ufuk
ID - 34
TI - Sensor synthesis for POMDPs with reachability objectives
VL - 2018
ER -