inria-00329797, version 1
Online Optimization in X-Armed Bandits
Sébastien Bubeck 1Rémi Munos
1Gilles Stoltz
2, 3Csaba Szepesvari
4
Twenty-Second Annual Conference on Neural Information Processing Systems (2008)
Résumé : We consider a generalization of stochastic bandit problems where the set of arms, X, is allowed to be a generic topological space. We constraint the mean-payoff function with a dissimilarity function over X in a way that is more general than Lipschitz. We construct an arm selection policy whose regret improves upon previous result for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally Holder with a known exponent, then the expected regret is bounded up to a logarithmic factor by $\sqrt{n}$, i.e., the rate of the growth of the regret is independent of the dimension of the space. Moreover, we prove the minimax optimality of our algorithm for the class of mean-payoff functions we consider.
- 1 : SEQUEL (INRIA Futurs)
- INRIA – CNRS : UMR8146 – Université Lille I - Sciences et technologies – Université Lille III - Sciences humaines et sociales – Ecole Centrale de Lille
- 2 : Département de Mathématiques et Applications (DMA)
- CNRS : UMR8553 – Ecole normale supérieure de Paris - ENS Paris
- 3 : Groupement de Recherche et d'Etudes en Gestion à HEC (GREGH)
- GROUPE HEC – CNRS : UMR2959
- 4 : Department of Computing Science
- Department of Computing Science, University of Alberta
- Domaine : Mathématiques/Statistiques
Mathématiques/Optimisation et contrôle
Sciences de l'Homme et Société/Economies et finances
- inria-00329797, version 1
- http://hal.inria.fr/inria-00329797
- oai:hal.inria.fr:inria-00329797
- Contributeur : Sébastien Bubeck
- Soumis le : Lundi 13 Octobre 2008, 14:28:01
- Dernière modification le : Lundi 13 Octobre 2008, 16:16:32