TY - CONF AB - We study graphs and two-player games in which rewards are assigned to states, and the goal of the players is to satisfy or dissatisfy a certain property of the generated outcome, given as a mean payoff property. Since the notion of mean-payoff does not reflect possible fluctuations from the mean-payoff along a run, we propose definitions and algorithms for capturing the stability of the system, and give algorithms for deciding if a given mean payoff and stability objective can be ensured in the system. AU - Brázdil, Tomáš AU - Forejt, Vojtěch AU - Kučera, Antonín AU - Novotny, Petr ID - 1325 TI - Stability in graphs and games VL - 59 ER - TY - CONF AB - DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate in an uncertain environment independently to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and there exist solvers both for exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon (infinite-horizon with either discounted-sum, or undiscounted-sum, where absorbing goal states have zero-cost) problem. We present a novel method to solve the problem that extends methods for finite-horizon DEC-POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples, and show that our approach presents promising results. AU - Chatterjee, Krishnendu AU - Chmelik, Martin ID - 1324 T2 - Proceedings of the Twenty-Sixth International Conference on International Conference on Automated Planning and Scheduling TI - Indefinite-horizon reachability in Goal-DEC-POMDPs VL - 2016-January ER - TY - CONF AB - We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. 
The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level must remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts important decisions of the policy, allowing us to compute succinct human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels. AU - Brázdil, Tomáš AU - Chatterjee, Krishnendu AU - Chmelik, Martin AU - Gupta, Anchit AU - Novotny, Petr ID - 1327 T2 - Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems TI - Stochastic shortest path with energy constraints in POMDPs ER - TY - CONF AB - Energy Markov Decision Processes (EMDPs) are finite-state Markov decision processes where each transition is assigned an integer counter update and a rational payoff. An EMDP configuration is a pair s(n), where s is a control state and n is the current counter value. The configurations are changed by performing transitions in the standard way. We consider the problem of computing a safe strategy (i.e., a strategy that keeps the counter non-negative) which maximizes the expected mean payoff. 
AU - Brázdil, Tomáš AU - Kučera, Antonín AU - Novotny, Petr ID - 1326 TI - Optimizing the expected mean payoff in Energy Markov Decision Processes VL - 9938 ER - TY - JOUR AB - Social dilemmas force players to balance between personal and collective gain. In many dilemmas, such as elected governments negotiating climate-change mitigation measures, the decisions are made not by individual players but by their representatives. However, the behaviour of representatives in social dilemmas has not been investigated experimentally. Here inspired by the negotiations for greenhouse-gas emissions reductions, we experimentally study a collective-risk social dilemma that involves representatives deciding on behalf of their fellow group members. Representatives can be re-elected or voted out after each consecutive collective-risk game. Selfish players are preferentially elected and are hence found most frequently in the "representatives" treatment. Across all treatments, we identify the selfish players as extortioners. As predicted by our mathematical model, their steadfast strategies enforce cooperation from fair players who finally compensate almost completely the deficit caused by the extortionate co-players. Everybody gains, but the extortionate representatives and their groups gain the most. AU - Milinski, Manfred AU - Hilbe, Christian AU - Semmann, Dirk AU - Sommerfeld, Ralf AU - Marotzke, Jochem ID - 1333 JF - Nature Communications TI - Humans choose representatives who enforce cooperation in social dilemmas through extortion VL - 7 ER - TY - CONF AB - In this paper we review various automata-theoretic formalisms for expressing quantitative properties. We start with finite-state Boolean automata that express the traditional regular properties. We then consider weighted ω-automata that can measure the average density of events, which finite-state Boolean automata cannot. 
However, even weighted ω-automata cannot express basic performance properties like average response time. We finally consider two formalisms of weighted ω-automata with monitors, where the monitors are either (a) counters or (b) weighted automata themselves. We present a translation result to establish that these two formalisms are equivalent. Weighted ω-automata with monitors generalize weighted ω-automata, and can express the average response time property. They present a natural, robust, and expressive framework for quantitative specifications, with important decidable properties. AU - Chatterjee, Krishnendu AU - Henzinger, Thomas A AU - Otop, Jan ID - 1335 TI - Quantitative monitor automata VL - 9837 ER - TY - CONF AB - We study repeated games with absorbing states, a type of two-player, zero-sum concurrent mean-payoff games with the prototypical example being the Big Match of Gillette (1957). These games may not allow optimal strategies but they always have ε-optimal strategies. In this paper we design ε-optimal strategies for Player 1 in these games that use only O(log log T) space. Furthermore, we construct strategies for Player 1 that use space s(T), for an arbitrarily small unbounded non-decreasing function s, and which guarantee an ε-optimal value for Player 1 in the limit superior sense. The previously known strategies use space Ω(log T) and it was known that no strategy can use constant space if it is ε-optimal even in the limit superior sense. We also give a complementary lower bound. Furthermore, we also show that no Markov strategy, even extended with finite memory, can ensure value greater than 0 in the Big Match, answering a question posed by Neyman [11]. 
AU - Hansen, Kristoffer AU - Ibsen-Jensen, Rasmus AU - Koucký, Michal ID - 1340 TI - The big match in small space VL - 9928 ER - TY - JOUR AB - We consider higher-dimensional versions of Kannan and Lipton's Orbit Problem - determining whether a target vector space V may be reached from a starting point x under repeated applications of a linear transformation A. Answering two questions posed by Kannan and Lipton in the 1980s, we show that when V has dimension one, this problem is solvable in polynomial time, and when V has dimension two or three, the problem is in NP^RP. AU - Chonev, Ventsislav K AU - Ouaknine, Joël AU - Worrell, James ID - 1380 IS - 3 JF - Journal of the ACM TI - On the complexity of the orbit problem VL - 63 ER - TY - CONF AB - The continuous evolution of a wide variety of systems, including continuous-time Markov chains and linear hybrid automata, can be described in terms of linear differential equations. In this paper we study the decision problem of whether the solution x(t) of a system of linear differential equations dx/dt = Ax reaches a target halfspace infinitely often. This recurrent reachability problem can equivalently be formulated as the following Infinite Zeros Problem: does a real-valued function f: R≥0 → R satisfying a given linear differential equation have infinitely many zeros? Our main decidability result is that if the differential equation has order at most 7, then the Infinite Zeros Problem is decidable. On the other hand, we show that a decision procedure for the Infinite Zeros Problem at order 9 (and above) would entail a major breakthrough in Diophantine Approximation, specifically an algorithm for computing the Lagrange constants of arbitrary real algebraic numbers to arbitrary precision. 
AU - Chonev, Ventsislav K AU - Ouaknine, Joël AU - Worrell, James ID - 1389 T2 - LICS '16 TI - On recurrent reachability for continuous linear dynamical systems ER - TY - JOUR AB - Brood parasites exploit their host in order to increase their own fitness. Typically, this results in an arms race between parasite trickery and host defence. Thus, it is puzzling to observe hosts that accept parasitism without any resistance. The ‘mafia’ hypothesis suggests that these hosts accept parasitism to avoid retaliation. Retaliation has been shown to evolve when the hosts condition their response to mafia parasites, who use depredation as a targeted response to rejection. However, it is unclear if acceptance would also emerge when ‘farming’ parasites are present in the population. Farming parasites use depredation to synchronize the timing with the host, destroying mature clutches to force the host to re-nest. Herein, we develop an evolutionary model to analyse the interaction between depredatory parasites and their hosts. We show that coevolutionary cycles between farmers and mafia can still induce host acceptance of brood parasites. However, this equilibrium is unstable and in the long run the dynamics of this host–parasite interaction exhibit strong oscillations: when farmers are the majority, accepters conditional to mafia (the host will reject first and only accept after retaliation by the parasite) have a higher fitness than unconditional accepters (the host always accepts parasitism). This leads to an increase in mafia parasites’ fitness and in turn induces an optimal environment for accepter hosts. AU - Chakra, Maria AU - Hilbe, Christian AU - Traulsen, Arne ID - 1426 IS - 5 JF - Royal Society Open Science TI - Coevolutionary interactions between farmers and mafia induce host acceptance of avian brood parasites VL - 3 ER -