Abstract: An important aspect of most decision making problems concerns the appropriate balance between exploitation (acting optimally according to the partial knowledge acquired so far) and exploration of the environment (acting sub-optimally in order to refine the current knowledge and improve future decisions). A typical example of this so-called exploration versus exploitation dilemma is the multi-armed bandit problem, for which many strategies have been developed. Here we investigate policies based on the choice of the arm having the highest upper-confidence bound, where the bound takes into account the empirical variance of the different arms. Such an algorithm was found earlier to outperform its peers in a series of numerical experiments. The main contribution of this paper is the theoretical investigation of this algorithm; our contribution is twofold. First, we prove that with probability at least $1-\beta$, the regret after $n$ plays of a variant of the UCB algorithm (called $\beta$-UCB) is upper-bounded by a constant that scales linearly with $\log(1/\beta)$ but is independent of $n$. Second, we analyse a variant which is closer to the algorithm suggested earlier. We prove a logarithmic bound on the expected regret of this algorithm and argue that the bound scales favourably with the variance of the suboptimal arms.
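To make the kind of variance-aware index concrete, the following is a minimal Python sketch of a policy that plays the arm with the highest empirical-variance-based upper-confidence bound. The specific index form (the constants, the $\log t$ exploration term, and the helper names `ucb_v_index` and `play`) is an illustrative assumption in the spirit of such bounds, not the precise definition analysed in the paper.

```python
import math
import random


def ucb_v_index(mean, variance, count, t, c=1.0):
    """Variance-aware upper-confidence index (illustrative form; constants assumed)."""
    if count == 0:
        return float("inf")  # force each arm to be pulled at least once
    exploration = math.log(t)
    return (mean
            + math.sqrt(2.0 * variance * exploration / count)
            + 3.0 * c * exploration / count)


def play(bandit_arms, n_rounds):
    """Run a variance-based UCB policy on a list of reward-sampling callables."""
    k = len(bandit_arms)
    counts = [0] * k
    means = [0.0] * k
    m2 = [0.0] * k  # sums of squared deviations (Welford's online variance)

    for t in range(1, n_rounds + 1):
        # Choose the arm with the highest upper-confidence bound.
        variances = [m2[i] / counts[i] if counts[i] > 0 else 0.0 for i in range(k)]
        arm = max(range(k),
                  key=lambda i: ucb_v_index(means[i], variances[i], counts[i], t))

        # Observe a reward and update the empirical mean and variance of that arm.
        reward = bandit_arms[arm]()
        counts[arm] += 1
        delta = reward - means[arm]
        means[arm] += delta / counts[arm]
        m2[arm] += delta * (reward - means[arm])

    return counts, means


# Example usage: two Bernoulli arms with success probabilities 0.4 and 0.6.
if __name__ == "__main__":
    arms = [lambda p=p: 1.0 if random.random() < p else 0.0 for p in (0.4, 0.6)]
    print(play(arms, 10_000))
```

Under such a policy, arms with small empirical variance receive a tighter confidence bonus, which is the mechanism behind the claim that the regret bound scales favourably with the variance of the suboptimal arms.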