Markov decision processes with multiple long-run average objectives

T. Brázdil, V. Brožek, K. Chatterjee, V. Forejt, A. Kučera, Logical Methods in Computer Science 10 (2014).

Download
OA 375.39 KB

Journal Article | Published | English
Author
; ; ; ;
Department
Abstract
We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with κ limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for ε-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e. randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for ε-approximation, for all ε > 0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the trade-off curve (Pareto curve) can be ε-approximated in time polynomial in the size of the MDP and 1/ε, and exponential in the number of limit-average functions, for all ε > 0. Our analysis also reveals flaws in previous work for MDPs with multiple mean-payoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results.
Publishing Year
Date Published
2014-02-14
Journal Title
Logical Methods in Computer Science
Volume
10
Issue
1
ISSN
IST-REx-ID

Cite this

Brázdil T, Brožek V, Chatterjee K, Forejt V, Kučera A. Markov decision processes with multiple long-run average objectives. Logical Methods in Computer Science. 2014;10(1). doi:10.2168/LMCS-10(1:13)2014
Brázdil, T., Brožek, V., Chatterjee, K., Forejt, V., & Kučera, A. (2014). Markov decision processes with multiple long-run average objectives. Logical Methods in Computer Science, 10(1). https://doi.org/10.2168/LMCS-10(1:13)2014
Brázdil, Tomáš, Václav Brožek, Krishnendu Chatterjee, Vojtěch Forejt, and Antonín Kučera. “Markov Decision Processes with Multiple Long-Run Average Objectives.” Logical Methods in Computer Science 10, no. 1 (2014). https://doi.org/10.2168/LMCS-10(1:13)2014.
T. Brázdil, V. Brožek, K. Chatterjee, V. Forejt, and A. Kučera, “Markov decision processes with multiple long-run average objectives,” Logical Methods in Computer Science, vol. 10, no. 1, 2014.
Brázdil T, Brožek V, Chatterjee K, Forejt V, Kučera A. 2014. Markov decision processes with multiple long-run average objectives. Logical Methods in Computer Science. 10(1).
Brázdil, Tomáš, et al. “Markov Decision Processes with Multiple Long-Run Average Objectives.” Logical Methods in Computer Science, vol. 10, no. 1, International Federation of Computational Logic, 2014, doi:10.2168/LMCS-10(1:13)2014.
All files available under the following license(s):
Creative Commons License:
CC-BYCreative Commons Attribution 4.0 International Public License (CC-BY 4.0)
Main File(s)
Access Level
OA Open Access
Last Uploaded
2018-12-12T10:07:57Z


Link(s) to Main File(s)
Access Level
OA Open Access

Export

Marked Publications

Open Data IST Research Explorer

Search this title in

Google Scholar