Use of variance estimation in the multi-armed bandit problem

Abstract : An important aspect of most decision making problems concerns the appropriate balance between exploitation (acting optimally according to the partial knowledge acquired so far) and exploration of the environment (acting sub-optimally in order to refine the current knowledge and improve future decisions). A typical example of this so-called exploration versus exploitation dilemma is the multi-armed bandit problem, for which many strategies have been developed. Here we investigate policies based the choice of the arm having the highest upper-confidence bound, where the bound takes into account the empirical variance of the different arms. Such an algorithm was found earlier to outperform its peers in a series of numerical experiments. The main contribution of this paper is the theoretical investigation of this algorithm. Our contribution here is twofold. First, we prove that with probability at least $1-\beta$, the regret after $n$ plays of a variant of the UCB algorithm (called $\beta$-UCB) is upper-bounded by a constant, that scales linearly with $\log(1/\beta)$, but which is independent from $n$. We also analyse a variant which is closer to the algorithm suggested earlier. We prove a logarithmic bound on the expected regret of this algorithm and argue that the bound scales favourably with the variance of the suboptimal arms.
Type de document :
Autre publication
NIPS Workshop on On-line Trading of Exploration and ExploitationWorkshop. 2006


https://hal.inria.fr/inria-00203496
Contributeur : Rémi Munos <>
Soumis le : jeudi 10 janvier 2008 - 12:11:52
Dernière modification le : mercredi 15 avril 2015 - 16:09:49

Fichier

ucbtuned.pdf
Fichiers produits par l'(les) auteur(s)

Identifiants

  • HAL Id : inria-00203496, version 1

Collections

Citation

Jean-Yves Audibert, Rémi Munos, Csaba Szepesvari. Use of variance estimation in the multi-armed bandit problem. NIPS Workshop on On-line Trading of Exploration and ExploitationWorkshop. 2006. <inria-00203496>

Exporter

Partager

Métriques

Consultation de
la notice

192

Téléchargement du document

96

  翻译: