A controller is a device that interacts with a plant. At each time point,it reads the plant’s state and issues commands with the goal that the plant oper-ates optimally. Constructing optimal controllers is a fundamental and challengingproblem. Machine learning techniques have recently been successfully applied totrain controllers, yet they have limitations. Learned controllers are monolithic andhard to reason about. In particular, it is difficult to add features without retraining,to guarantee any level of performance, and to achieve acceptable performancewhen encountering untrained scenarios. These limitations can be addressed bydeploying quantitative run-timeshieldsthat serve as a proxy for the controller.At each time point, the shield reads the command issued by the controller andmay choose to alter it before passing it on to the plant. We show how optimalshields that interfere as little as possible while guaranteeing a desired level ofcontroller performance, can be generated systematically and automatically usingreactive synthesis. First, we abstract the plant by building a stochastic model.Second, we consider the learned controller to be a black box. Third, we mea-surecontroller performanceandshield interferenceby two quantitative run-timemeasures that are formally defined using weighted automata. Then, the problemof constructing a shield that guarantees maximal performance with minimal inter-ference is the problem of finding an optimal strategy in a stochastic2-player game“controller versus shield” played on the abstract state space of the plant with aquantitative objective obtained from combining the performance and interferencemeasures. We illustrate the effectiveness of our approach by automatically con-structing lightweight shields for learned traffic-light controllers in various roadnetworks. The shields we generate avoid liveness bugs, improve controller per-formance in untrained and changing traffic situations, and add features to learnedcontrollers, such as giving priority to emergency vehicles.
31st International Conference on Computer-Aided Verification
CAV: Computer Aided Verification
New York, NY, United States
2019-07-13 – 2019-07-18
Avni G, Bloem R, Chatterjee K, Henzinger TA, Konighofer B, Pranger S. Run-time optimization for learned controllers through quantitative games. In: 31st International Conference on Computer-Aided Verification. Vol 11561. Springer; 2019:630-649. doi:10.1007/978-3-030-25540-4_36
Avni, G., Bloem, R., Chatterjee, K., Henzinger, T. A., Konighofer, B., & Pranger, S. (2019). Run-time optimization for learned controllers through quantitative games. In 31st International Conference on Computer-Aided Verification (Vol. 11561, pp. 630–649). New York, NY, United States: Springer. https://doi.org/10.1007/978-3-030-25540-4_36
Avni, Guy, Roderick Bloem, Krishnendu Chatterjee, Thomas A Henzinger, Bettina Konighofer, and Stefan Pranger. “Run-Time Optimization for Learned Controllers through Quantitative Games.” In 31st International Conference on Computer-Aided Verification, 11561:630–49. Springer, 2019. https://doi.org/10.1007/978-3-030-25540-4_36.
G. Avni, R. Bloem, K. Chatterjee, T. A. Henzinger, B. Konighofer, and S. Pranger, “Run-time optimization for learned controllers through quantitative games,” in 31st International Conference on Computer-Aided Verification, New York, NY, United States, 2019, vol. 11561, pp. 630–649.
Avni G, Bloem R, Chatterjee K, Henzinger TA, Konighofer B, Pranger S. 2019. Run-time optimization for learned controllers through quantitative games. 31st International Conference on Computer-Aided Verification. CAV: Computer Aided Verification, LNCS, vol. 11561. 630–649.
Avni, Guy, et al. “Run-Time Optimization for Learned Controllers through Quantitative Games.” 31st International Conference on Computer-Aided Verification, vol. 11561, Springer, 2019, pp. 630–49, doi:10.1007/978-3-030-25540-4_36.
2019_CAV_Avni.pdf 659.77 KB