Scene Text Detection on Images using Cellular Automata

Konstantinos Zagoris and Ioannis Pratikakis

Image Processing and Multimedia Lab,
Department of Electrical and Computer Engineering,
Democritus University of Thrace, Xanthi, Greece
kzagoris@ee.duth.gr, ipratika@ee.duth.gr

Outline
 Introduction
 State of the Art
 Disadvantages
 Architecture of the proposed method
 Canny Edge Detector
 Coordinating Logic Filters (CLF)
 Proposed Cellular Automata Text Detection
Method
 Evaluation and Experimental Results

Introduction
 Textual information in images or video constitutes
a very rich source of high-level semantics for
retrieval and indexing
 It can be acquired as scene text that was
captured by a video or photo camera as part of a
scene
 Text detection on natural scenes is still a hard
task to solve
 Existing methods have a very high computational cost

State of the Art
 Existing methods fall into two categories:
region-based and texture-based
 Region-based algorithms group pixels based on
common characteristics
 Texture-based methods scan the image at
different scales using a sliding window and
classify text areas based on texture information.
 From another perspective, they can be divided into
heuristic-based and machine learning-based
methods.
 Heuristic-based algorithms segment the image
into small regions and then group them by some
constraints
 Machine learning-based methods use directly

Disadvantages
 Many parameters have to be estimated
experimentally, which condemns these methods to
data dependency and a lack of generality
 When the background is very complex, they
become computationally expensive.
 Texture-based techniques cannot satisfactorily
detect text that is larger than the sliding
window.
 Increasing the window size makes these methods
quite costly. In addition, they still use empirical
thresholds on specific features and therefore lack
adaptability.

Proposed Method
 Address the scene text detection problem by
modeling texture in a cellular automata (CA)
context
 Replace costly image processing operations with
their equivalent cellular operations
 Eliminate most limitations, such as the empirical
thresholds and heavy computational procedures

Architecture of the proposed method
[Figure: pipeline of the proposed method. Original Image -> Canny Edge Map ->
Cellular Automata (Coordinating Logic Filters: Logical OR, Logical AND,
Logical OR; then Majority State Rule) -> Edge Projection Filtering -> Final Text]

Coordinating Logic Filters (CLF)
 CLF execute coordinate logic operations among the
pixels of the image
 The CLF operations are similar to the
morphological operations and achieve similar
functionality
 Morphological dilation corresponds to the logical OR
 Morphological erosion corresponds to the logical AND
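
As a rough illustration of the correspondence above, the following Python sketch applies a logical OR and a logical AND over each pixel's neighborhood in a binary image. The 3x3 neighborhood, the zero padding outside the image, and the function names `clf_or` and `clf_and` are assumptions made for this example; the slides do not specify them.

```python
from typing import Callable, Iterable, List

Grid = List[List[int]]  # binary image: 0 = background, 1 = edge pixel


def _neighborhood(img: Grid, r: int, c: int) -> List[int]:
    """Return the 3x3 neighborhood of (r, c); pixels outside the image count as 0."""
    rows, cols = len(img), len(img[0])
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            inside = 0 <= rr < rows and 0 <= cc < cols
            values.append(img[rr][cc] if inside else 0)
    return values


def _apply(img: Grid, op: Callable[[Iterable[int]], bool]) -> Grid:
    """Apply a logic operation to the neighborhood of every pixel."""
    return [[int(op(_neighborhood(img, r, c))) for c in range(len(img[0]))]
            for r in range(len(img))]


def clf_or(img: Grid) -> Grid:
    """Logical OR over the neighborhood (behaves like binary dilation)."""
    return _apply(img, any)


def clf_and(img: Grid) -> Grid:
    """Logical AND over the neighborhood (behaves like binary erosion)."""
    return _apply(img, all)
```

Applying `clf_and` and then `clf_or` would behave like a morphological opening under the same assumptions.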

Canny Edge Detector
 Detection of the salient image edges
 Use Sobel masks
 Thresholding and non-maxima suppression (low
threshold equal to 20 and high threshold equal to
100)
 The final edge map is a binarised image with the
contour pixels set to one (white) and the
remaining pixels set to zero (black).
 This approach exploits the fact that text lines
produce strong vertical edges that are horizontally
aligned and densely packed.
 gives us the opportunity to detect normal or

Proposed Cellular Automata
 The proposed CA is considered to be a 2-D lattice
of cells where every pixel is represented by a cell.
 The CA grid width and height are defined by the
edge image width and height
 Each cell has two states, as the input image is
binary.
 Taking advantage of the CA flexibility, the
transition rules change between steps and are
applied in four consecutive steps, resulting in a
four-time-step CA evolution.
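
A minimal sketch of a two-state CA whose transition rule changes at each time step. The rule order (logical OR, logical AND, logical OR, then majority state) is read from the architecture diagram; the 3x3 Moore neighborhood, the zero-valued boundary, and all names in the code are assumptions, since the slides do not give the exact neighborhoods.

```python
from typing import Callable, List, Sequence

Grid = List[List[int]]
Rule = Callable[[Sequence[int]], int]  # maps a cell's neighborhood to its next state


def moore(grid: Grid, r: int, c: int) -> List[int]:
    """3x3 Moore neighborhood of a cell; cells outside the lattice are 0."""
    rows, cols = len(grid), len(grid[0])
    return [grid[r + dr][c + dc]
            if 0 <= r + dr < rows and 0 <= c + dc < cols else 0
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def rule_or(n: Sequence[int]) -> int:
    return int(any(n))


def rule_and(n: Sequence[int]) -> int:
    return int(all(n))


def rule_majority(n: Sequence[int]) -> int:
    # The cell takes the state held by more than half of its neighborhood.
    return int(sum(n) * 2 > len(n))


def evolve(grid: Grid, rules: Sequence[Rule]) -> Grid:
    """Apply one rule per time step, updating all cells synchronously."""
    for rule in rules:
        grid = [[rule(moore(grid, r, c)) for c in range(len(grid[0]))]
                for r in range(len(grid))]
    return grid


# Assumed four-step schedule: OR, AND, OR, majority state.
FOUR_STEPS = (rule_or, rule_and, rule_or, rule_majority)
```

Calling `evolve(edge_map, FOUR_STEPS)` on a binary edge map runs the four time steps in sequence; each step reads only the previous grid, which is what makes the update synchronous.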

4th Step - Majority State Rule

Edge Projection Filtering
 In images with high edge density, the method
produces a number of false positives
 Post-processing filtering is required in order to
remove them
 Candidate areas are filtered based on their
horizontal and vertical projections
 Areas with mean horizontal and vertical
projections below a threshold are discarded.
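
The projection test described above can be sketched as follows. Here the horizontal projection is taken as the number of edge pixels in each row of a candidate area and the vertical projection as the number in each column. That definition, the rule that an area is kept only when both means reach the threshold, the `threshold` value, and the function names are all assumptions, as the slides do not state them.

```python
from typing import List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def mean_projections(edges: Grid, box: Box) -> Tuple[float, float]:
    """Mean horizontal and vertical edge projections inside a non-empty box."""
    top, left, bottom, right = box
    region = [row[left:right] for row in edges[top:bottom]]
    horizontal = [sum(row) for row in region]        # edge pixels in each row
    vertical = [sum(col) for col in zip(*region)]    # edge pixels in each column
    return sum(horizontal) / len(horizontal), sum(vertical) / len(vertical)


def keep_area(edges: Grid, box: Box, threshold: float = 2.0) -> bool:
    """Keep a candidate area only if both mean projections reach the threshold."""
    mean_h, mean_v = mean_projections(edges, box)
    return mean_h >= threshold and mean_v >= threshold
```

A candidate area would be passed as a bounding box over the Canny edge map; areas for which `keep_area` returns `False` are treated as false positives and dropped.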

Evaluation


1. Wolf, C., Jolion, J.M.: Object count/area graphs for the evaluation of object
detection and segmentation algorithms. International Journal on Document
Analysis and Recognition 8(4), 280–296 (2006)

Experimental Results
 To showcase the advantages of the proposed
method, we test it against a machine-learning,
edge-based scene text detection system.
 In that baseline, the CLF are replaced with the
corresponding morphological operations (dilation
and opening) and the majority state rule with a
Support Vector Machine (SVM) classifier
Method                          Recall   Precision   Harmonic Mean
Proposed CA-based method        0.7942   0.7462      0.7652
Machine-learning based method   0.7134   0.5234      0.6038

Experimental Results
Mean execution time of each method over a set of
15 images on an Intel Core 2 Quad CPU Q9550
(2.83 GHz) machine.

Method                          Mean Execution Time (sec)
Proposed CA-based method        2.75
Machine-learning based method   5.96

Conclusions
 A method based on cellular automata was
presented for the detection of scene text in
natural images
 Initially, the Canny edge detector is employed to
expose the dominant edges in the image.
 Then a CA is used to compute the candidate text
areas. Its rules depend on Coordinating Logic
Filters and on the majority state rule
 A post-processing technique based on edge
projection analysis is employed for images with
high edge density in order to eliminate the
false positives.

Thank You!
