Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Diego Marinho de Oliveira

Gen-AI Search, RecSys | ex-SEEK, AI Lead, Data Scientist Manager and ML Engineer Specialist

Published Mar 21, 2017

"Abstract: We explore the use of Evolution Strategies, a class of black box optimization algorithms, as an alternative to popular RL techniques such as Q-learning and Policy Gradients. Experiments on MuJoCo and Atari show that ES is a viable solution strategy that scales extremely well with the number of CPUs available: By using hundreds to thousands of parallel workers, ES can solve 3D humanoid walking in 10 minutes and obtain competitive results on most Atari games after one hour of training time. In addition, we highlight several advantages of ES as a black box optimization technique: it is invariant to action frequency and delayed rewards, tolerant of extremely long horizons, and does not need temporal discounting or value function approximation."

Authors: Tim Salimans, Jonathan Ho, Xi Chen, Ilya Sutskever

Access full paper at http://bit.ly/2mm9Xa3
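The abstract treats ES as black box optimization: the optimizer only queries the objective's value at randomly perturbed parameters and never needs its gradient. Below is a minimal Python sketch of that idea under stated assumptions. The `fitness` function and the target vector are hypothetical stand-ins for an episode's return, and the perturbations are antithetic (mirrored) Gaussian pairs, a variance-reduction choice made here for a stable toy run. This illustrates the basic ES gradient estimate and is not the paper's distributed implementation, which adds further refinements.

```python
import random


def fitness(theta, target):
    """Hypothetical toy objective: negative squared distance to a target vector."""
    return -sum((x - t) ** 2 for x, t in zip(theta, target))


def evolution_strategies(f, theta, sigma=0.1, alpha=0.05, n_pairs=25, steps=300, seed=0):
    """Basic ES ascent using antithetic (mirrored) Gaussian perturbations.

    Each step estimates the gradient of the Gaussian-smoothed objective as
    (1 / (2 * n_pairs * sigma)) * sum_i (F(theta + sigma*eps_i) - F(theta - sigma*eps_i)) * eps_i
    and moves theta uphill by alpha times that estimate. f is treated as a black box.
    """
    rng = random.Random(seed)
    theta = list(theta)
    dim = len(theta)
    for _ in range(steps):
        grad = [0.0] * dim
        for _ in range(n_pairs):
            # Sample one noise vector and evaluate the objective on both sides of theta.
            eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            plus = f([p + sigma * e for p, e in zip(theta, eps)])
            minus = f([p - sigma * e for p, e in zip(theta, eps)])
            # Each perturbation contributes only a scalar return difference.
            weight = (plus - minus) / (2.0 * n_pairs * sigma)
            for j in range(dim):
                grad[j] += weight * eps[j]
        # Gradient ascent step on the smoothed objective.
        theta = [p + alpha * g for p, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    target = [0.5, -1.0, 2.0]
    result = evolution_strategies(lambda th: fitness(th, target), [0.0, 0.0, 0.0])
    print([round(x, 3) for x in result])
```

Because each perturbation contributes only a scalar, workers that share random seeds can regenerate each other's noise vectors and exchange just their returns; the paper relies on this kind of cheap communication to scale across many CPUs, while this sketch simply runs every evaluation serially.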


More articles by Diego Marinho de Oliveira

Deep Learning for Personalized Search and Recommender Systems

Oct 5, 2017

Nice review about Deep Learning for Search + Recommender Systems: "Abstract Deep learning has been widely successful in…

Facets: An Open Source Visualization Tool for Machine Learning Training Data

Jul 18, 2017

"Abstract Getting the best results out of a machine learning (ML) model requires that you truly understand your data…

Spark: The Definitive Guide

Jul 6, 2017

Databricks published some free chapters today about Spark. "Apache Spark has seen immense growth over the past several…

One Model To Learn Them All

Jun 27, 2017

"Abstract Deep learning yields great results across many fields, from speech recognition, image classification, to…

Do Balancing Classes Improve Classifier Performance?

May 25, 2017

Nice post by Nina Zumel "It’s a folk theorem I sometimes hear from colleagues and clients: that you must balance the…

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders

May 21, 2017

"Abstract Generative models in vision have seen rapid progress due to algorithmic improvements and the availability of…

Neural Ranking Models with Weak Supervision

May 5, 2017

Abstract Despite the impressive improvements achieved by unsupervised deep neural networks in computer vision and NLP…

Simple/Incomplete Benchmark of Machine Learning Libraries for Classification

Apr 19, 2017

The sharing for today :) All benchmarks are wrong, but some are useful "This project aims at a minimal benchmark for…

Introducing tf-seq2seq: An Open Source Sequence-to-Sequence Framework in TensorFlow

Apr 12, 2017

Summary: "In addition to machine translation, tf-seq2seq can also be applied to any other sequence-to-sequence task…

Mask R-CNN

Mar 22, 2017

"Abstract We present a conceptually simple, flexible, and general framework for object instance segmentation. Our…

Others also viewed

Rotation and disk sampling strategies without sin()

Tube Choice Depends On Its Application, But What If All Things Are Equal

For The Love of Computing: Say Thanks To DCT For Your HD Football Match

Game Theory: An Example in R

Do you remember PONG?

CUDA Toolkit 11 – The most powerful SW development platform for building GPU-accelerated apps

Quantum Mechanics Simulation using the Finite Difference Time Domain (FDTD) Method - The "F" stands for Fun ;)

Exploring Conway’s Game of Life with PyScript

Yup, that's a "Raspberry Pi with 3x4 matrix keypad, led and an electric lock mechanism" alright

For The Love of Computing - How Does RGB Become Useful? ... Well .. Make it YCbCr!
