The document addresses the exploration-exploitation tradeoff that reinforcement learning agents face. Rather than relying on a predefined formula, it proposes learning the exploration-exploitation strategy itself. The approach has four components: a set of training problems, a space of candidate strategies parameterized by formulas, a performance criterion, and an optimizer (an estimation of distribution algorithm) that searches for the candidate strategy performing best on the training problems. Simulation results show that the learned strategies outperform strategies derived from common formulas on test problems that match the training problems.
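
To make the four ingredients concrete (training problems, formula-parameterized strategies, a performance criterion, and an estimation of distribution algorithm), here is a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the document: the problems are two-armed Bernoulli bandits, the candidate formula is a linear index over three hand-picked features, the criterion is average expected regret, and the optimizer is a simple Gaussian estimation of distribution algorithm that refits a mean and standard deviation to the best candidates. The names `play_bandit`, `criterion`, and `learn_strategy` are hypothetical.

```python
import math
import random
import statistics


def play_bandit(theta, probs, horizon, rng):
    """Play one index-based strategy on a Bernoulli bandit; return its expected regret."""
    k = len(probs)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total reward per arm
    best = max(probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once so the statistics are defined
        else:
            def index(a):
                # Assumed formula: linear combination of three features.
                mean = sums[a] / counts[a]
                return (theta[0] * mean
                        + theta[1] * math.sqrt(math.log(t) / counts[a])
                        + theta[2] / counts[a])
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - probs[arm]
    return regret


def criterion(theta, problems, horizon, rng):
    """Performance criterion (assumed): mean regret over the training problems."""
    return statistics.fmean(play_bandit(theta, p, horizon, rng) for p in problems)


def learn_strategy(problems, horizon=50, iters=5, pop=20, elite=5, seed=0):
    """Gaussian EDA: sample parameters, keep the lowest-regret ones, refit, repeat."""
    rng = random.Random(seed)
    mean, std = [0.0] * 3, [1.0] * 3
    for _ in range(iters):
        candidates = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                      for _ in range(pop)]
        candidates.sort(key=lambda c: criterion(c, problems, horizon, rng))
        columns = list(zip(*candidates[:elite]))
        mean = [statistics.fmean(col) for col in columns]
        # Small floor keeps the search distribution from collapsing to a point.
        std = [statistics.pstdev(col) + 1e-3 for col in columns]
    return mean


if __name__ == "__main__":
    problem_rng = random.Random(1)
    training_problems = [[problem_rng.random(), problem_rng.random()]
                         for _ in range(10)]
    print(learn_strategy(training_problems))
```

The returned parameter vector defines a strategy tuned to the training distribution. Evaluating it with `criterion` on freshly drawn problems from the same distribution would mirror the comparison on matching test problems, though this toy setup makes no claim to reproduce the reported results.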