Efficient Bayesian Planning

Sammanfattning: Artificial Intelligence (AI) is a long-studied and yet very active field of research. The list of things differentiating humans from AI grows thinner but the dream of an artificial general intelligence remains elusive. Sequential Decision Making is a subfield of AI that poses a seemingly benign question ``How to act optimally in an unknown environment?''. This requires the AI agent to learn about its environment as well as plan an action sequence given its current knowledge about it. The two common problem settings are partial observability and unknown environment dynamics. Bayesian planning deals with these issues by simultaneously defining a single planning problem which considers the simultaneous effects of an action on both learning and goal search. The technique involves dealing with infinite tree data structures which are hard to store but essential for computing the optimal plan. Finally, we consider the minimax setting where the Bayesian prior is chosen by an adversary and therefore a worst case policy needs to be found. In this thesis, we present novel Bayesian planning algorithms. First, we propose DSS ( Deeper, Sparser Sampling ) for the case of unknown environment dynamics. It is a meta-algorithm derived from a simple insight about the Bayes rule, which beats the state-of-the-art across the board from discrete to continuous state settings. A theoretical analysis provides a high probability bound on its performance. Our analysis is different from previous approaches in the literature in terms of problem formulation and formal guarantees. The result also contrasts with those of previous comparable BRL algorithms, which typically provide asymptotic convergence guarantees. Suitable Bayesian models and their corresponding planners are proposed for implementing the discrete and continuous versions of DSS. We then address the issue of partial observability via our second algorithm, FMP ( Finite Memory Planner ). This uses depth-dependent partitioning of the infinite planning tree. Experimental results demonstrate comparable performance to the current state-of-the-art for both discrete and continuous settings. Finally, we propose algorithms for finding the best policy for the worst case belief in the Minimax Bayesian setting.

  KLICKA HÄR FÖR ATT SE AVHANDLINGEN I FULLTEXT. (PDF-format)