TY - CONF
AB - We study graphs and two-player games in which rewards are assigned to states, and the goal of the players is to satisfy or dissatisfy a certain property of the generated outcome, given as a mean-payoff property. Since the notion of mean payoff does not reflect possible fluctuations from the mean payoff along a run, we propose definitions and algorithms for capturing the stability of the system, and give algorithms for deciding whether a given mean-payoff and stability objective can be ensured in the system.
AU - Brázdil, Tomáš
AU - Forejt, Vojtěch
AU - Kučera, Antonín
AU - Novotny, Petr
ID - 1325
TI - Stability in graphs and games
VL - 59
ER -

TY - CONF
AB - DEC-POMDPs extend POMDPs to a multi-agent setting, where several agents operate independently in an uncertain environment to achieve a joint objective. DEC-POMDPs have been studied with finite-horizon and infinite-horizon discounted-sum objectives, and solvers exist for both exact and approximate solutions. In this work we consider Goal-DEC-POMDPs, where, given a set of target states, the objective is to ensure that the target set is reached with minimal cost. We consider the indefinite-horizon problem (infinite-horizon with either discounted-sum or undiscounted-sum objectives, where absorbing goal states have zero cost). We present a novel method to solve the problem that extends methods for finite-horizon DEC-POMDPs and the RTDP-Bel approach for POMDPs. We present experimental results on several examples and show that our approach yields promising results.
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
ID - 1324
T2 - Proceedings of the Twenty-Sixth International Conference on Automated Planning and Scheduling
TI - Indefinite-horizon reachability in Goal-DEC-POMDPs
VL - 2016-January
ER -

TY - CONF
AB - We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy level may increase and decrease with transitions, and the hard constraint requires that the energy level remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution concerns policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure, based on machine-learning techniques, that automatically extracts the important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
AU - Brázdil, Tomáš
AU - Chatterjee, Krishnendu
AU - Chmelik, Martin
AU - Gupta, Anchit
AU - Novotny, Petr
ID - 1327
T2 - Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems
TI - Stochastic shortest path with energy constraints in POMDPs
ER -

TY - CONF
AB - Energy Markov Decision Processes (EMDPs) are finite-state Markov decision processes where each transition is assigned an integer counter update and a rational payoff. An EMDP configuration is a pair s(n), where s is a control state and n is the current counter value. Configurations are changed by performing transitions in the standard way. We consider the problem of computing a safe strategy (i.e., a strategy that keeps the counter non-negative) that maximizes the expected mean payoff.
AU - Brázdil, Tomáš
AU - Kučera, Antonín
AU - Novotny, Petr
ID - 1326
TI - Optimizing the expected mean payoff in Energy Markov Decision Processes
VL - 9938
ER -

TY - JOUR
AB - Social dilemmas force players to balance personal and collective gain. In many dilemmas, such as elected governments negotiating climate-change mitigation measures, the decisions are made not by individual players but by their representatives. However, the behaviour of representatives in social dilemmas has not been investigated experimentally. Here, inspired by the negotiations for greenhouse-gas emissions reductions, we experimentally study a collective-risk social dilemma that involves representatives deciding on behalf of their fellow group members. Representatives can be re-elected or voted out after each consecutive collective-risk game. Selfish players are preferentially elected and are hence found most frequently in the "representatives" treatment. Across all treatments, we identify the selfish players as extortioners. As predicted by our mathematical model, their steadfast strategies enforce cooperation from fair players, who almost completely compensate for the deficit caused by the extortionate co-players. Everybody gains, but the extortionate representatives and their groups gain the most.
AU - Milinski, Manfred
AU - Hilbe, Christian
AU - Semmann, Dirk
AU - Sommerfeld, Ralf
AU - Marotzke, Jochem
ID - 1333
JF - Nature Communications
TI - Humans choose representatives who enforce cooperation in social dilemmas through extortion
VL - 7
ER -