TY - CONF AB - Reinforcement learning has received much attention for learning controllers of deterministic systems. We consider a learner-verifier framework for stochastic control systems and survey recent methods that formally guarantee a conjunction of reachability and safety properties. Given a property and a lower bound on the probability of the property being satisfied, our framework jointly learns a control policy and a formal certificate to ensure the satisfaction of the property with a desired probability threshold. Both the control policy and the formal certificate are continuous functions from states to reals, which are learned as parameterized neural networks. While in the deterministic case, the certificates are invariant and barrier functions for safety, or Lyapunov and ranking functions for liveness, in the stochastic case the certificates are supermartingales. For certificate verification, we use interval arithmetic abstract interpretation to bound the expected values of neural network functions. AU - Chatterjee, Krishnendu AU - Henzinger, Thomas A AU - Lechner, Mathias AU - Zikelic, Dorde ID - 13142 SN - 0302-9743 T2 - Tools and Algorithms for the Construction and Analysis of Systems TI - A learner-verifier framework for neural network controllers and certificates of stochastic systems VL - 13993 ER - TY - JOUR AB - Populations evolve in spatially heterogeneous environments. While a certain trait might bring a fitness advantage in some patch of the environment, a different trait might be advantageous in another patch. Here, we study the Moran birth–death process with two types of individuals in a population stretched across two patches of size N, each patch favouring one of the two types. We show that the long-term fate of such populations crucially depends on the migration rate μ between the patches. To classify the possible fates, we use the distinction between polynomial (short) and exponential (long) timescales. We show that when μ is high then one of the two types fixates on the whole population after a number of steps that is only polynomial in N. By contrast, when μ is low then each type holds majority in the patch where it is favoured for a number of steps that is at least exponential in N. Moreover, we precisely identify the threshold migration rate μ⋆ that separates those two scenarios, thereby exactly delineating the situations that support long-term coexistence of the two types. We also discuss the case of various cycle graphs and we present computer simulations that perfectly match our analytical results. AU - Svoboda, Jakub AU - Tkadlec, Josef AU - Kaveh, Kamran AU - Chatterjee, Krishnendu ID - 12787 IS - 2271 JF - Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences SN - 1364-5021 TI - Coexistence times in the Moran process with environmental heterogeneity VL - 479 ER - TY - JOUR AB - The field of indirect reciprocity investigates how social norms can foster cooperation when individuals continuously monitor and assess each other’s social interactions. By adhering to certain social norms, cooperating individuals can improve their reputation and, in turn, receive benefits from others. Eight social norms, known as the “leading eight," have been shown to effectively promote the evolution of cooperation as long as information is public and reliable. These norms categorize group members as either ’good’ or ’bad’. In this study, we examine a scenario where individuals instead assign nuanced reputation scores to each other, and only cooperate with those whose reputation exceeds a certain threshold. We find both analytically and through simulations that such quantitative assessments are error-correcting, thus facilitating cooperation in situations where information is private and unreliable. Moreover, our results identify four specific norms that are robust to such conditions, and may be relevant for helping to sustain cooperation in natural populations. AU - Schmid, Laura AU - Ekbatani, Farbod AU - Hilbe, Christian AU - Chatterjee, Krishnendu ID - 12861 JF - Nature Communications TI - Quantitative assessment can stabilize indirect reciprocity under imperfect information VL - 14 ER - TY - CONF AB - We study the problem of training and certifying adversarially robust quantized neural networks (QNNs). Quantization is a technique for making neural networks more efficient by running them using low-bit integer arithmetic and is therefore commonly adopted in industry. Recent work has shown that floating-point neural networks that have been verified to be robust can become vulnerable to adversarial attacks after quantization, and certification of the quantized representation is necessary to guarantee robustness. In this work, we present quantization-aware interval bound propagation (QA-IBP), a novel method for training robust QNNs. Inspired by advances in robust learning of non-quantized networks, our training algorithm computes the gradient of an abstract representation of the actual network. Unlike existing approaches, our method can handle the discrete semantics of QNNs. Based on QA-IBP, we also develop a complete verification procedure for verifying the adversarial robustness of QNNs, which is guaranteed to terminate and produce a correct answer. Compared to existing approaches, the key advantage of our verification procedure is that it runs entirely on GPU or other accelerator devices. We demonstrate experimentally that our approach significantly outperforms existing methods and establish the new state-of-the-art for training and certifying the robustness of QNNs. AU - Lechner, Mathias AU - Zikelic, Dorde AU - Chatterjee, Krishnendu AU - Henzinger, Thomas A AU - Rus, Daniela ID - 14242 IS - 12 SN - 9781577358800 T2 - Proceedings of the 37th AAAI Conference on Artificial Intelligence TI - Quantization-aware interval bound propagation for training certifiably robust quantized neural networks VL - 37 ER - TY - CONF AB - Two-player zero-sum "graph games" are central in logic, verification, and multi-agent systems. The game proceeds by placing a token on a vertex of a graph, and allowing the players to move it to produce an infinite path, which determines the winner or payoff of the game. Traditionally, the players alternate turns in moving the token. In "bidding games", however, the players have budgets and in each turn, an auction (bidding) determines which player moves the token. So far, bidding games have only been studied as full-information games. In this work we initiate the study of partial-information bidding games: we study bidding games in which a player's initial budget is drawn from a known probability distribution. We show that while for some bidding mechanisms and objectives, it is straightforward to adapt the results from the full-information setting to the partial-information setting, for others, the analysis is significantly more challenging, requires new techniques, and gives rise to interesting results. Specifically, we study games with "mean-payoff" objectives in combination with "poorman" bidding. We construct optimal strategies for a partially-informed player who plays against a fully-informed adversary. We show that, somewhat surprisingly, the "value" under pure strategies does not necessarily exist in such games. AU - Avni, Guy AU - Jecker, Ismael R AU - Zikelic, Dorde ID - 14243 IS - 5 SN - 9781577358800 T2 - Proceedings of the 37th AAAI Conference on Artificial Intelligence TI - Bidding graph games with partially-observable budgets VL - 37 ER -