Max Jaderberg, Karen Simonyan, Ernesto Coto, Andrea Vedaldi, and Andrew Zisserman


Introduction

The objective of this work is text spotting: localising and recognising text in natural scene images.

  1. Technical Details
  2. Demo
  3. Data
  4. Models
  5. Publications

Technical Details

Our end-to-end text spotting pipeline uses a combination of high-recall region proposal methods, followed by a cascade of classifiers and a bounding box regressor.
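The staged structure described above can be illustrated with a minimal sketch. It is not the released implementation: the names `Candidate`, `run_cascade` and `spot_text`, the box format, and the threshold-based filtering are all assumptions made for illustration, and the proposers, scorers and regressor are passed in as placeholder callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
Proposer = Callable[[Any], Iterable[Box]]   # image -> candidate boxes
Scorer = Callable[[Any, Box], float]        # (image, box) -> text/no-text score
Regressor = Callable[[Any, Box], Box]       # (image, box) -> refined box


@dataclass(frozen=True)
class Candidate:
    """A candidate word box and the score given by the last classifier stage."""
    box: Box
    score: float = 0.0


def run_cascade(image: Any,
                candidates: Iterable[Candidate],
                stages: Sequence[Tuple[Scorer, float]]) -> List[Candidate]:
    """Apply each (scorer, threshold) stage in order, keeping only survivors."""
    survivors = list(candidates)
    for scorer, threshold in stages:
        scored = [Candidate(c.box, scorer(image, c.box)) for c in survivors]
        survivors = [c for c in scored if c.score >= threshold]
    return survivors


def spot_text(image: Any,
              proposers: Sequence[Proposer],
              stages: Sequence[Tuple[Scorer, float]],
              regress: Regressor) -> List[Candidate]:
    """Hypothetical driver: propose, filter with a cascade, then refine boxes."""
    # 1. Pool the proposals of every method; recall matters more than precision here.
    proposals = [Candidate(box) for propose in proposers for box in propose(image)]
    # 2. Reject false positives with the classifier cascade.
    kept = run_cascade(image, proposals, stages)
    # 3. Refine the coordinates of each surviving box.
    return [Candidate(regress(image, c.box), c.score) for c in kept]
```

Pooling several proposal methods favours recall at the cost of many false positives, which is why the later stages in this sketch only remove or refine candidates and never add new ones.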


Text recognition is performed by deep convolutional neural networks.

For full details see our publications.


Demo


You can try out our text spotting pipeline applied to image retrieval. We have two different demos:


Data

Our text recognition models are trained purely on synthetic data. We have released a 9M image dataset of synthetically generated word images for training and testing word recognition.
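As a rough illustration of what generating labelled synthetic training data can look like, the sketch below draws random rendering parameters for words taken from a lexicon. It does not describe the released dataset or its generator: the `WordSampleSpec` fields, the parameter ranges and the `sample_specs` helper are all hypothetical, and the actual rendering of an image from each spec is left out.

```python
import random
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class WordSampleSpec:
    """Hypothetical description of one synthetic word image and its label."""
    word: str             # ground-truth label, known by construction
    font: str             # assumed rendering parameter
    rotation_deg: float   # assumed geometric distortion
    noise_level: float    # assumed amount of image noise


def sample_specs(lexicon: Sequence[str],
                 fonts: Sequence[str],
                 count: int,
                 seed: int = 0) -> Iterator[WordSampleSpec]:
    """Yield `count` randomly parameterised sample specs, reproducible per seed."""
    rng = random.Random(seed)  # private generator, so runs are repeatable
    for _ in range(count):
        yield WordSampleSpec(
            word=rng.choice(lexicon),
            font=rng.choice(fonts),
            rotation_deg=rng.uniform(-10.0, 10.0),
            noise_level=rng.uniform(0.0, 0.2),
        )
```

Because every sample is generated from a known word, the label comes for free, which is what makes training without manually annotated images possible.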

Click here for the datasets


Models

We have released the models from our ECCV 2014 paper Deep Features for Text Spotting.

Click here for ECCV 2014 models

We have also released the models for our NIPS 2014 Deep Learning Workshop paper Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition. They use the MatConvNet toolbox for MATLAB which is included in the package.

Click here for NIPS DLW 2014 models


Publications

Reading Text in the Wild with Convolutional Neural Networks
M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman
International Journal of Computer Vision, 2016

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
M. Jaderberg, K. Simonyan, A. Vedaldi, A. Zisserman
Workshop on Deep Learning, NIPS, 2014

Deep Features for Text Spotting
M. Jaderberg, A. Vedaldi, A. Zisserman
European Conference on Computer Vision, 2014

Acknowledgements

All data is copyright 2007-2012 BBC, and is used here solely for the purposes of the technical demo. Oxford and the BBC reserve the right to modify or withdraw any data and/or programme material provided as part of this live demo.

This work was supported by the EPSRC and ERC grant VisRec no. 228180. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research. We thank the BBC and in particular Rob Cooper for access to data and video processing resources.
