Visual Geometry Group - University of Oxford

Max Jaderberg, Karen Simonyan, Ernesto Coto, Andrea Vedaldi, and Andrew Zisserman

Introduction

The objective of this work is text spotting: localising and recognising text in natural scene images.

Technical Details

Our end-to-end text spotting pipeline uses a combination of high recall region proposal methods, followed by a cascade of classifiers and a bounding box regressor.
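
The control flow described above (many candidate boxes, progressively filtered, then refined) can be sketched as follows. This is a minimal illustration under stated assumptions, not the released implementation: `Stage`, `spot_text`, and the scoring and refinement callables are hypothetical names, and the real proposal generators, classifiers, and regressor are described in the publications.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Stage:
    """One cascade stage (hypothetical): a scorer and the minimum score to survive."""
    score: Callable[[Box], float]
    threshold: float


def spot_text(
    proposals: Iterable[Box],
    stages: Sequence[Stage],
    refine: Callable[[Box], Box],
) -> List[Box]:
    """Filter high-recall proposals through a cascade, then refine the survivors.

    Stages run in order, so cheap stages placed first reject most false
    positives before more expensive stages are evaluated.
    """
    survivors = list(proposals)
    for stage in stages:
        survivors = [box for box in survivors if stage.score(box) >= stage.threshold]
    # `refine` stands in for the bounding box regressor (assumed interface).
    return [refine(box) for box in survivors]
```

In this sketch the proposal stage is expected to over-generate, since a missed word cannot be recovered later, while the cascade and the regressor are responsible for precision and for tightening the surviving boxes.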

Text recognition is performed by deep convolutional neural networks.

For full details see our publications.

Demo

You can try out our text spotting pipeline applied to image retrieval. We have two different demos:

A demo for searching 2.3 million high-resolution images (last updated in 2014). Some example queries include:
- Hollywood
- Boris Johnson
- Vision
- Police
- Oxford
- United

A demo for searching over 7.6 million medium-resolution images (last updated in 2018). This demo is integrated with other search modalities, so you can perform more in-depth queries. Some example queries include:
- London
- News
- Terror
- Sport
- Live
- Headlines
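
One simple way to turn spotted words into a searchable collection is an inverted index from each recognised word to the images that contain it. The sketch below is illustrative only and is not how the demos are necessarily built; `build_index`, `search`, and the image identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_index(spotted: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, Set[str]]:
    """Map each lower-cased recognised word to the set of image ids containing it.

    `spotted` yields (image_id, recognised_words) pairs, i.e. the assumed
    output of running a text spotting pipeline over every image.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for image_id, words in spotted:
        for word in words:
            index[word.lower()].add(image_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the sorted ids of images containing every word of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # An image matches only if all query words were spotted in it.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)
```

With such an index, a multi-word query like "Boris Johnson" returns only the images in which both words were recognised. A production system would also rank results, for example by recognition confidence, which this sketch omits.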

Data

Our text recognition models are trained purely on synthetic data. We have released a 9M image dataset of synthetically generated word images for training and testing word recognition.

Click here for the datasets

Models

We have released the models from our ECCV 2014 paper Deep Features for Text Spotting.

Click here for ECCV 2014 models

We have also released the models for our NIPS 2014 Deep Learning Workshop paper Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition. These models use the MatConvNet toolbox for MATLAB, which is included in the package.

Click here for NIPS DLW 2014 models

Publications

M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman

Reading Text in the Wild with Convolutional Neural Networks

International Journal of Computer Vision, 2016

M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

Workshop on Deep Learning, NIPS, 2014

M. Jaderberg, A. Vedaldi, A. Zisserman

Deep Features for Text Spotting

European Conference on Computer Vision, 2014

Acknowledgements

All data is copyright 2007-2012 BBC, and is used here solely for the purposes of the technical demo. Oxford and the BBC reserve the right to modify or withdraw any data and/or programme material provided as part of this live demo.

This work was supported by the EPSRC and ERC grant VisRec no. 228180. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research. We thank the BBC and in particular Rob Cooper for access to data and video processing resources.