TY  - CONF
AB  - We study the problem of learning controllers for discrete-time non-linear stochastic dynamical systems with formal reach-avoid guarantees. This work presents the first method for providing formal reach-avoid guarantees, which combine and generalize stability and safety guarantees, with a tolerable probability threshold p ∈ [0,1] over the infinite time horizon. Our method leverages advances in the machine learning literature and represents formal certificates as neural networks. In particular, we learn a certificate in the form of a reach-avoid supermartingale (RASM), a novel notion that we introduce in this work. Our RASMs provide reachability and avoidance guarantees by imposing constraints on what can be viewed as a stochastic extension of level sets of Lyapunov functions for deterministic systems. Our approach solves several important problems: it can be used to learn a control policy from scratch, to verify a reach-avoid specification for a fixed control policy, or to fine-tune a pre-trained policy if it does not satisfy the reach-avoid specification. We validate our approach on three stochastic non-linear reinforcement learning tasks.
AU  - Zikelic, Dorde
AU  - Lechner, Mathias
AU  - Henzinger, Thomas A
AU  - Chatterjee, Krishnendu
ID  - 14830
IS  - 10
SN  - 2159-5399
T2  - Proceedings of the 37th AAAI Conference on Artificial Intelligence
TI  - Learning control policies for stochastic systems with reach-avoid guarantees
VL  - 37
ER  - 
TY  - CONF
AB  - A classical problem for Markov chains is determining their stationary (or steady-state) distribution. This problem has an equally classical solution based on eigenvectors and linear equation systems. However, this approach does not scale to large instances, and iterative solutions are desirable. It turns out that a naive approach, as used by current model checkers, may yield completely wrong results. We present a new approach, which utilizes recent advances in partial exploration and mean payoff computation to obtain a correct, converging approximation.
AU  - Meggendorfer, Tobias
ID  - 13139
SN  - 0302-9743
T2  - TACAS 2023: Tools and Algorithms for the Construction and Analysis of Systems
TI  - Correct approximation of stationary distributions
VL  - 13993
ER  - 
TY  - GEN
AB  - The software artefact for evaluating the implementation of the approximation of stationary distributions.
AU  - Meggendorfer, Tobias
ID  - 14990
TI  - Artefact for: Correct Approximation of Stationary Distributions
ER  - 
TY  - CONF
AB  - Reinforcement learning has shown promising results in learning neural network policies for complicated control tasks. However, the lack of formal guarantees about the behavior of such policies remains an impediment to their deployment. We propose a novel method for learning a composition of neural network policies in stochastic environments, along with a formal certificate which guarantees that a specification over the policy's behavior is satisfied with the desired probability. Unlike prior work on verifiable RL, our approach leverages the compositional nature of logical specifications provided in SpectRL, to learn over graphs of probabilistic reach-avoid specifications. The formal guarantees are provided by learning neural network policies together with reach-avoid supermartingales (RASM) for the graph’s sub-tasks and then composing them into a global policy. We also derive a tighter lower bound compared to previous work on the probability of reach-avoidance implied by a RASM, which is required to find a compositional policy with an acceptable probabilistic threshold for complex tasks with multiple edge policies. We implement a prototype of our approach and evaluate it on a Stochastic Nine Rooms environment.
AU  - Zikelic, Dorde
AU  - Lechner, Mathias
AU  - Verma, Abhinav
AU  - Chatterjee, Krishnendu
AU  - Henzinger, Thomas A
ID  - 15023
T2  - 37th Conference on Neural Information Processing Systems
TI  - Compositional policy learning in stochastic control systems with formal guarantees
ER  - 
TY  - CONF
AB  - Given a Markov chain M = (V, v_0, δ), with state space V and a starting state v_0, and a probability threshold ε, an ε-core is a subset C of states that is left with probability at most ε. More formally, C ⊆ V is an ε-core, iff ℙ[reach (V\C)] ≤ ε. Cores have been applied in a wide variety of verification problems over Markov chains, Markov decision processes, and probabilistic programs, as a means of discarding uninteresting and low-probability parts of a probabilistic system and instead being able to focus on the states that are likely to be encountered in a real-world run. In this work, we focus on the problem of computing a minimal ε-core in a Markov chain. Our contributions include both negative and positive results: (i) We show that the decision problem on the existence of an ε-core of a given size is NP-complete. This solves an open problem posed in [Jan Kretínský and Tobias Meggendorfer, 2020]. We additionally show that the problem remains NP-complete even when limited to acyclic Markov chains with bounded maximal vertex degree; (ii) We provide a polynomial time algorithm for computing a minimal ε-core on Markov chains over control-flow graphs of structured programs. A straightforward combination of our algorithm with standard branch prediction techniques allows one to apply the idea of cores to find a subset of program lines that are left with low probability and then focus any desired static analysis on this core subset.
AU  - Ahmadi, Ali
AU  - Chatterjee, Krishnendu
AU  - Goharshady, Amir Kafshdar
AU  - Meggendorfer, Tobias
AU  - Safavi Hemami, Roodabeh
AU  - Zikelic, Dorde
ID  - 12102
SN  - 1868-8969
T2  - 42nd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science
TI  - Algorithms and hardness results for computing cores of Markov chains
VL  - 250
ER  - 
TY  - CONF
AB  - Spatial games form a widely-studied class of games from biology and physics modeling the evolution of social behavior. Formally, such a game is defined by a square (d by d) payoff matrix M and an undirected graph G. Each vertex of G represents an individual that initially follows some strategy i ∈ {1,2,…,d}. In each round of the game, every individual plays the matrix game with each of its neighbors: an individual following strategy i meeting a neighbor following strategy j receives a payoff equal to the entry (i,j) of M. Then, each individual updates its strategy to its neighbors' strategy with the highest sum of payoffs, and the next round starts. The basic computational problems consist of reachability between configurations and the average frequency of a strategy. For general spatial games and graphs, these problems are in PSPACE. In this paper, we examine a restricted setting: the game is a prisoner’s dilemma, and G is a subgraph of a grid. We prove that the basic computational problems for spatial games with prisoner’s dilemma on a subgraph of a grid are PSPACE-hard.
AU  - Chatterjee, Krishnendu
AU  - Ibsen-Jensen, Rasmus
AU  - Jecker, Ismael R
AU  - Svoboda, Jakub
ID  - 12101
SN  - 1868-8969
T2  - 42nd IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science
TI  - Complexity of spatial games
VL  - 250
ER  - 
TY  - CONF
AB  - We treat the problem of risk-aware control for stochastic shortest path (SSP) on Markov decision processes (MDP). Typically, expectation is considered for SSP, which however is oblivious to the incurred risk. We present an alternative view, instead optimizing conditional value-at-risk (CVaR), an established risk measure. We treat both Markov chains as well as MDP and introduce, through novel insights, two algorithms, based on linear programming and value iteration, respectively. Both algorithms offer precise and provably correct solutions. Evaluation of our prototype implementation shows that risk-aware control is feasible on several moderately sized models.
AU  - Meggendorfer, Tobias
ID  - 12568
IS  - 9
SN  - 1577358767
T2  - Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022
TI  - Risk-aware stochastic shortest path
VL  - 36
ER  - 
TY  - JOUR
AB  - A matching is compatible to two or more labeled point sets of size n with labels {1, …, n} if its straight-line drawing on each of these point sets is crossing-free. We study the maximum number of edges in a matching compatible to two or more labeled point sets in general position in the plane. We show that for any two labeled sets of n points in convex position there exists a compatible matching with ⌊√(2n+1) − 1⌋ edges. More generally, for any ℓ labeled point sets we construct compatible matchings of size Ω(n^{1/ℓ}). As a corresponding upper bound, we use probabilistic arguments to show that for any ℓ given sets of n points there exists a labeling of each set such that the largest compatible matching has O(n^{2/(ℓ+1)}) edges. Finally, we show that Θ(log n) copies of any set of n points are necessary and sufficient for the existence of labelings of these point sets such that any compatible matching consists only of a single edge.
AU  - Aichholzer, Oswin
AU  - Arroyo Guevara, Alan M
AU  - Masárová, Zuzana
AU  - Parada, Irene
AU  - Perz, Daniel
AU  - Pilz, Alexander
AU  - Tkadlec, Josef
AU  - Vogtenhuber, Birgit
ID  - 11938
IS  - 2
JF  - Journal of Graph Algorithms and Applications
SN  - 1526-1719
TI  - On compatible matchings
VL  - 26
ER  - 
TY  - GEN
AB  - In modern sample-driven Prophet Inequality, an adversary chooses a sequence of n items with values v_1, v_2, …, v_n to be presented to a decision maker (DM). The process proceeds in two phases. In the first phase (sampling phase), some items, possibly selected at random, are revealed to the DM, but she can never accept them. In the second phase, the DM is presented with the other items in a random order and online fashion. For each item, she must make an irrevocable decision to either accept the item and stop the process or reject the item forever and proceed to the next item. The goal of the DM is to maximize the expected value as compared to a Prophet (or offline algorithm) that has access to all information. In this setting, the sampling phase has no cost and is not part of the optimization process. However, in many scenarios, the samples are obtained as part of the decision-making process. We model this aspect as a two-phase Prophet Inequality where an adversary chooses a sequence of 2n items with values v_1, v_2, …, v_{2n} and the items are randomly ordered. The Prophet Inequality problem then has two phases, played on the first n items and on the remaining items, respectively. We show that some basic algorithms achieve a ratio of at most 0.450. We present an algorithm that achieves a ratio of at least 0.495. Finally, we show that for every algorithm the ratio it can achieve is at most 0.502. Hence our algorithm is near-optimal.
AU  - Chatterjee, Krishnendu
AU  - Mohammadi, Mona
AU  - Saona Urmeneta, Raimundo J
ID  - 12677
T2  - arXiv
TI  - Repeated prophet inequality with near-optimal bounds
ER  - 
TY  - JOUR
AB  - Transforming ω-automata into parity automata is traditionally done using appearance records. We present an efficient variant of this idea, tailored to Rabin automata, and several optimizations applicable to all appearance records. We compare the methods experimentally and show that our method produces significantly smaller automata than previous approaches.
AU  - Kretinsky, Jan
AU  - Meggendorfer, Tobias
AU  - Waldmann, Clara
AU  - Weininger, Maximilian
ID  - 10602
JF  - Acta Informatica
KW  - computer networks and communications
KW  - information systems
KW  - software
SN  - 0001-5903
TI  - Index appearance record with preorders
VL  - 59
ER  -