---
res:
bibo_abstract:
- 'Partially observable Markov decision processes (POMDPs) are the standard models
for planning under uncertainty with both finite and infinite horizons. Besides
the well-known discounted-sum objective, the indefinite-horizon objective (aka Goal-POMDPs)
is another classical objective for POMDPs. In this case, given a set of target
states and a positive cost for each transition, the optimization objective is
to minimize the expected total cost until a target state is reached. In the literature,
RTDP-Bel and heuristic search value iteration (HSVI) have been used for solving
Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees,
and HSVI may even fail to terminate its trials. We make the following contributions:
(1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they
prevent the original HSVI from converging. (2) We present a novel algorithm inspired
by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees.
(3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.@eng'
bibo_authorlist:
- foaf_Person:
foaf_givenName: Karel
foaf_name: Horák, Karel
foaf_surname: Horák
- foaf_Person:
foaf_givenName: Branislav
foaf_name: Bošanský, Branislav
foaf_surname: Bošanský
- foaf_Person:
foaf_givenName: Krishnendu
foaf_name: Chatterjee, Krishnendu
foaf_surname: Chatterjee
foaf_workInfoHomepage: http://www.librecat.org/personId=2E5DCA20-F248-11E8-B48F-1D18A9856A87
orcid: 0000-0002-4561-241X
bibo_doi: '10.24963/ijcai.2018/662'
bibo_volume: 2018-July
dct_date: 2018^xs_gYear
dct_language: eng
dct_publisher: IJCAI@
dct_title: 'Goal-HSVI: Heuristic search value iteration for goal-POMDPs@'
...