SlideShare a Scribd company logo
Advanced Algorithms – COMS31900
Pattern Matching part one
Suffix Trees
Benjamin Sach
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c
a b a
a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
m
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c
a b a
a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
m
4
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
10
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
• A naive algorithm takes O(nm) time
Exact pattern matching
T
Input A text string T (length n) and a pattern string P (length m)
P
ba b c a b a cb a
Goal: Find all the locations where P matches in T
P matches at location i iff
a b a
n
a b a
m
6
for all 0 j m we have that P[j] = T[i + j]
(our strings are zero-indexed)
• A naive algorithm takes O(nm) time
• Many O(n) time algorithms are known (for example KMP)
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
e.g. 4, 6, 10
4 6 10
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• A naive algorithm takes O(n) query time (using KMP)
e.g. 4, 6, 10
4 6 10
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• A naive algorithm takes O(n) query time (using KMP)
• We want a query time which depends only on m and occ
- occ is the number of occurences (matches)
e.g. 4, 6, 10
4 6 10
Text indexing
T
Preprocess a text string T (length n) to answer pattern matching queries. . .
ba b c a b a cb a a b a
n
After preprocessing, a query is a pattern P (length m),
P a b a
m
the output is a list of all matches in T.
• A naive algorithm takes O(n) query time (using KMP)
• We want a query time which depends only on m and occ
- occ is the number of occurences (matches)
• We also want O(n) space and fast preprocessing (prep.) time
e.g. 4, 6, 10
4 6 10
The atomic suffix tree
TT b n aaa sn
n
The atomic suffix tree
TT b n aaa sn
n
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
• Every edge is labelled with a character from T
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
• Every edge is labelled with a character from T
• No two edges leaving the same node have the same label
The atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
a
s
b n
a
sn
a
n
a
s
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
suffix tree
0
1
2
3
4
5
6
0
1
2
3
4
5
6
• The suffix tree contains every suffix of T as a root to leaf path
• Each leaf corresponds to a suffix (so there are n leaves)
• Every edge is labelled with a character from T
• No two edges leaving the same node have the same label
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a 6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
1
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
3
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
WARNING! How long does it take to find the correct child?
There could be n edges here!
In this lecture we assume the alphabet size is a constant
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
WARNING! How long does it take to find the correct child?
There could be n edges here!
In this lecture we assume the alphabet size is a constant
This may be fine in some applications
(English text or DNA for example)
We can remove the assumption via the magic of hashing
P bn a
6
Searching in an atomic suffix tree
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
How do you find a pattern?
0
1
2
3
4
5
P aa n
m
start at the root and walk down the tree
a
. . . matches occur at the leaves of the subtree
We can decide whether P matches somewhere in O(m) time
(we’ll worry about outputting the matches later)
P bn a
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a 6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
7 characters
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
7 characters 23 nodes
6
how large is the atomic suffix tree?
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
There are at most n leaves
0
1
2
3
4
5
a
that’s good right?
Unfortunately there can be lots of internal nodes
this path is pretty long. . .
7 characters 23 nodes that’s not so bad, right?
6
how large is the atomic suffix tree?
how large is the atomic suffix tree?
T a b
2
how large is the atomic suffix tree?
T a b
2
a b
b
how large is the atomic suffix tree?
T a b
2
a b
b
4 nodes
how large is the atomic suffix tree?
T a b a b baT
2 4
a b b
b b
a
b
b
a
b
b
4 nodes
9 nodes
how large is the atomic suffix tree?
T a b a b ba a b baa bT T
2 4 6
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa ba bT
T T
2 4 6
8
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa b
a b baa b
a b
aa b b
T
T T
T
2 4 6
8
10
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
36 nodes
b
b
b
b
a
b
b
b
b
b
b
b
b
b
a
b
b
b
b
b
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa b
a b baa b
a b
aa b b
T
T T
T
2 4 6
8
10
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
36 nodes
b
b
b
b
a
b
b
b
b
b
b
b
b
b
a
b
b
b
b
b
An atomic suffix tree can have
((n/2) + 1)2 nodes
how large is the atomic suffix tree?
T a b a b ba a b baa b
a b baa b
a b baa b
a b
aa b b
T
T T
T
2 4 6
8
10
a b b
b b
a
b
b
a
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
4 nodes
9 nodes 16 nodes
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
25 nodes
b
b
b
b
a
b
b
b
b
b
b
a
b
b
a
b
b
b
b
b
a
b
b
b
36 nodes
b
b
b
b
a
b
b
b
b
b
b
b
b
b
a
b
b
b
b
b
An atomic suffix tree can have
((n/2) + 1)2 nodes
this is too big!far
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Main Idea replace each non-branching path with a single edge
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
Compacted suffix trees
sn
a
s
n
a
s
a
n
a
s
TT b n aaa sn
n
s
b n
a
sn
a
n
a
s
0
1
2
3
4
5
6a
Why is the atomic suffix tree so big?
because it has long
paths like this one
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
don’t the edges take up
1 3
5
0
2 4
6
lots of space?
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
don’t the edges take up
we only store the end points
1 3
5
0
2 4
6
lots of space?
Compacted suffix trees
TT b n aaa sn
n
Main Idea replace each non-branching path with a single edge
- edges are now labelled with substrings
(instead of single characters)
a
s
nas
nas
nas
s
na
s
bananas
• There are at most n leaves
• Every internal node has
two or more children
so there are O(n) edges
don’t the edges take up
we only store the end points
we actually store (4, 6)
4 6
1 3
5
0
2 4
6
lots of space?
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring 1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Sanity Check
Does the compacted suffix tree always exist?
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Sanity Check
Does the compacted suffix tree always exist?
TT b b this doesn’t have
n leavesbb
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Sanity Check
Does the compacted suffix tree always exist?
TT b b this doesn’t have
n leavesbb
TT b b $ b$
$ b$
this has n
leaves
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Compacted suffix trees
TT b n aaa sn
n
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Step one: Add a $ (unique symbol) to T
Compacted suffix trees
a
s
nas
nas
nas
s
na
s
bananas
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
1 3
5
0
2 4
6
Step one: Add a $ (unique symbol) to T
TT b n aaa sn
n
$
Compacted suffix trees
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
Step one: Add a $ (unique symbol) to T
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
1 3
5
0
2 4
6
Compacted suffix trees
Compacted Suffix Tree of T
Uses O(n) space
• A rooted tree with n leaves
• Every internal node has
two or more children
• Every edge is labelled
with a substring
• No two edges leaving the same node have the same first character
• Each leaf is labelled with a location in T
• Any root-to-leaf path spells out the corresponding suffix
Step one: Add a $ (unique symbol) to T
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
This is normally just called1 3
5
0
2 4
6
a suffix tree
Searching in a compacted suffix tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
1 3
5
0
2 4
6
remember that an edge is
actually stored as a pair
we’re actually looking in T
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
1
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
a
na
an
na
nas$
nas$
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
na
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
2
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
4
1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
how big is this subtree?1 3
5
0
2 4
6
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
how big is this subtree?
O(occ) because it has occ leaves
1 3
5
0
2 4
6
(and each internal node has
at least two children)
Searching in a compacted suffix tree
How do you find a pattern?
P aa n
m
start at the root and walk down the tree
. . . matches occur at the leaves of the subtree
We can find all the matches in O(m + occ) time (by looking at the whole subtree)
P n a
TT b n aaa sn
n
$ a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
7$
an
nana
how big is this subtree?
O(occ) because it has occ leaves
1 3
5
0
2 4
6
(and each internal node has
at least two children)
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
we actually store this as (0, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
this is stored as (1, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
ananas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
1
ananas$
2
nanas$
ananas$
ananas$
ananas$
ananas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
ananas$ was stored as (1, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
ananas$ was stored as (1, 7)
ana is stored as (1, 3)
nas$ is stored as (4, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
nas$
1
ana
nas$
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
stored as (6, 7)
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
nanas$
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
nanas$
s$
nas$
1 3
ana
nanas$
nanas$
nanas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2
s$
nas$
1 3
ana
na
nas$
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
0
2 4
s$
nas$
1 3
ana
ana
ana
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
0
2 4
nas$
a
na
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
This takes O(n) time per suffix. . .
never actually
do it like this
you should
Naively constructing a compacted suffix tree
Insert the suffixes one at a time (longest first)
• Search for the new suffix in the partial suffix tree
(as if you were matching a pattern)
• Add a new edge and leaf for the new suffix
(this may require you to break an edge in two)
TT b n aaa sn
n
$ a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7 1 3
5
0
2 4
6
nas$
This takes O(n) time per suffix. . .
so O(n2) time in total
never actually
do it like this
you should
Suffix tree summary
TT b n aaa sn
n
$
a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
1 3
5
0
2 4
6
nas$
• The (compacted) suffix tree of a (length n) text uses O(n) space
• Finding all matches of a pattern P of length m takes O(m + occ)
where occ is the number of matches
we assumed that the alphabet contained a constant number of symbols
• Suffix trees can be built in O(n) time
but we have only seen the O(n2) time method
actually
do it like this (or build a suffix array instead)
you should
Multiple text indexing
T1 b n aaa sn
n1
T2 a p slp e
n2
How can we index multiple texts?
Multiple text indexing
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
two distinct unique symbols
How can we index multiple texts?
Multiple text indexing
TT
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
two distinct unique symbols
b n aaa sn $ a p slp e &
n
How can we index multiple texts?
Multiple text indexing
TT
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
• Build a generalised suffix tree in O(n1 + n2) space
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
• Build a generalised suffix tree in O(n1 + n2) space
• Using the linear time method (which we omitted), this takes O(n1 + n2) time
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
Multiple text indexing
6$
13
&
TT
a
s$
nas$
nas$
s$
na
s
bananas$
7$
1 3
5
0
2 4
nas$
T1 b n aaa sn
n1
T2 a p slp e
n2
$
&
b n aaa sn $ a p slp e &
n
• Build a generalised suffix tree in O(n1 + n2) space
• Using the linear time method (which we omitted), this takes O(n1 + n2) time
• Finding all matches of a pattern P of length m still takes O(m + occ) time
14
&
12
es&
les&11
8
pples&
p
10 les&
9
ples&
How can we index multiple texts?
where occ is the number of matches
The suffix array - a sneak preview
T b n aaT a sn
n
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa snsuffix
1
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c<
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n 0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
Sort the suffixes
lexicographically
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
In lexicographical ordering we sort strings based on the first symbol that differs:
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
just a fancy name for the order the strings would appear in a dictionary
In lexicographical ordering we sort strings based on the first symbol that differs:
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
• The symbols themselves must have an order
throughout we will use alphabetical order
If the symbols don’t have a natural order, we use their binary representation in memory
b a<a a b c< b c< a
(in a ‘tie’, the shorter string is smaller)
just a fancy name for the order the strings would appear in a dictionary
In lexicographical ordering we sort strings based on the first symbol that differs:
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
The suffix array - a sneak preview
T b n aaT a sn
n
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
The suffix array is much smaller than the suffix tree (in terms of constants)
10 2 3 4 5 6
The suffix array - a sneak preview
T b n aaT a sn
n
Suffix Array 1 0 625 4
Sort the suffixes
lexicographically
0 b n aaa sn
n aa1 a sn
2 n aa sn
4 a sn
5 a s
6 s
3 aa sn
3
n
The suffix array is much smaller than the suffix tree (in terms of constants)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
Suffix Tree
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
Suffix Tree
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
6
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
1 3
5
0
2 4
6
10 2 3 4 5 6
Constructing the Suffix Array from the Suffix Tree
a
s$
nas$
nas$
nas$
s$
na
s$
bananas$
1 3
5
0
2 4
6
T b n aaT a sn
n
Suffix Array
recall that we added a unique symbol $ to make sure the tree exists
- the $ is the smallest symbol in the alphabet
1 0 625 43
n
To get the Suffix array perform a depth-first search (in lexicographical order)
this takes O(n) time
1 3
5
0
2 4
6
10 2 3 4 5 6
Suffix tree summary
TT b n aaa sn
n
$
a
s$
nas$
nas$
s$
na
s$
bananas$
7$
b n aaa sn
n aaa sn
n aa sn
aa sn
a sn
a s
s
suffixes
$
$
$
$
$
$
$
0
1
2
3
4
5
6
$ 7
1 3
5
0
2 4
6
nas$
• The (compacted) suffix tree of a (length n) text uses O(n) space
• Finding all matches of a pattern P of length m takes O(m + occ)
where occ is the number of matches
we assumed that the alphabet contains a constant number of symbols
• Suffix trees can be built in O(n) time
but we have only seen the O(n2) time method
Ad

More Related Content

What's hot (20)

Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix Array
Harshit Agarwal
 
big_oh
big_ohbig_oh
big_oh
Jonathan (JT) Cho
 
Theory of computation and automata
Theory of computation and automataTheory of computation and automata
Theory of computation and automata
Prof. Dr. K. Adisesha
 
Theory of computation / Post’s Correspondence Problems (PCP)
Theory of computation / Post’s Correspondence Problems (PCP)Theory of computation / Post’s Correspondence Problems (PCP)
Theory of computation / Post’s Correspondence Problems (PCP)
Technical Advisor at Iraqi Government
 
Tries data structures
Tries data structuresTries data structures
Tries data structures
Jothi Ramasamy
 
Boyer moore algorithm
Boyer moore algorithmBoyer moore algorithm
Boyer moore algorithm
AYESHA JAVED
 
Lec17
Lec17Lec17
Lec17
Nikhil Chilwant
 
Algorithm Assignment Help
Algorithm Assignment HelpAlgorithm Assignment Help
Algorithm Assignment Help
Programming Homework Help
 
Huffman analysis
Huffman analysisHuffman analysis
Huffman analysis
Abubakar Sultan
 
Fourier series and it's examples
Fourier series and it's examplesFourier series and it's examples
Fourier series and it's examples
ZorawarJaat
 
Special Elements of a Ternary Semiring
Special Elements of a Ternary SemiringSpecial Elements of a Ternary Semiring
Special Elements of a Ternary Semiring
IJERA Editor
 
Discrete Fourier Series | Discrete Fourier Transform | Discrete Time Fourier ...
Discrete Fourier Series | Discrete Fourier Transform | Discrete Time Fourier ...Discrete Fourier Series | Discrete Fourier Transform | Discrete Time Fourier ...
Discrete Fourier Series | Discrete Fourier Transform | Discrete Time Fourier ...
Mehran University Of Engineering and Technology, Pakistan
 
Lec18
Lec18Lec18
Lec18
Nikhil Chilwant
 
theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 02
8threspecter
 
Theory of computation Lec3 dfa
Theory of computation Lec3 dfaTheory of computation Lec3 dfa
Theory of computation Lec3 dfa
Arab Open University and Cairo University
 
String in python lecture (3)
String in python lecture (3)String in python lecture (3)
String in python lecture (3)
Ali ٍSattar
 
Dynamic programing
Dynamic programingDynamic programing
Dynamic programing
AniketSingh609353
 
Basics of Mathematical Cryptography
Basics of Mathematical CryptographyBasics of Mathematical Cryptography
Basics of Mathematical Cryptography
Neha Gupta
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
smtlecture.4
smtlecture.4smtlecture.4
smtlecture.4
Roberto Bruttomesso
 
Suffix Tree and Suffix Array
Suffix Tree and Suffix ArraySuffix Tree and Suffix Array
Suffix Tree and Suffix Array
Harshit Agarwal
 
Boyer moore algorithm
Boyer moore algorithmBoyer moore algorithm
Boyer moore algorithm
AYESHA JAVED
 
Fourier series and it's examples
Fourier series and it's examplesFourier series and it's examples
Fourier series and it's examples
ZorawarJaat
 
Special Elements of a Ternary Semiring
Special Elements of a Ternary SemiringSpecial Elements of a Ternary Semiring
Special Elements of a Ternary Semiring
IJERA Editor
 
theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 02
8threspecter
 
String in python lecture (3)
String in python lecture (3)String in python lecture (3)
String in python lecture (3)
Ali ٍSattar
 
Basics of Mathematical Cryptography
Basics of Mathematical CryptographyBasics of Mathematical Cryptography
Basics of Mathematical Cryptography
Neha Gupta
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 

Similar to Pattern Matching Part One: Suffix Trees (12)

Lecture10.pdf
Lecture10.pdfLecture10.pdf
Lecture10.pdf
tmmwj1
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.
Swapan Shakhari
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
Malek Sumaiya
 
patterns_and_sequences_1.ppt
patterns_and_sequences_1.pptpatterns_and_sequences_1.ppt
patterns_and_sequences_1.ppt
RimaFebriani10
 
String Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithmString Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithm
praweenkumarsahu9
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)
Aditya pratap Singh
 
Python Strings.pptx
Python Strings.pptxPython Strings.pptx
Python Strings.pptx
adityakumawat625
 
lect26-em.ppt
lect26-em.pptlect26-em.ppt
lect26-em.ppt
SYAMDAVULURI
 
Trie Data Structure
Trie Data Structure Trie Data Structure
Trie Data Structure
Hitesh Mohapatra
 
Periodic pattern mining
Periodic pattern miningPeriodic pattern mining
Periodic pattern mining
Ashis Kumar Chanda
 
Periodic pattern mining
Periodic pattern miningPeriodic pattern mining
Periodic pattern mining
Ashis Chanda
 
String Matching Algorithms: Naive, KMP, Rabin-Karp
String Matching Algorithms: Naive, KMP, Rabin-KarpString Matching Algorithms: Naive, KMP, Rabin-Karp
String Matching Algorithms: Naive, KMP, Rabin-Karp
NAtional Institute of TEchnology Rourkela , Galgotias University
 
Lecture10.pdf
Lecture10.pdfLecture10.pdf
Lecture10.pdf
tmmwj1
 
String matching algorithms-pattern matching.
String matching algorithms-pattern matching.String matching algorithms-pattern matching.
String matching algorithms-pattern matching.
Swapan Shakhari
 
String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.String Matching Finite Automata & KMP Algorithm.
String Matching Finite Automata & KMP Algorithm.
Malek Sumaiya
 
patterns_and_sequences_1.ppt
patterns_and_sequences_1.pptpatterns_and_sequences_1.ppt
patterns_and_sequences_1.ppt
RimaFebriani10
 
String Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithmString Matching algorithm String Matching algorithm String Matching algorithm
String Matching algorithm String Matching algorithm String Matching algorithm
praweenkumarsahu9
 
String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)String Matching (Naive,Rabin-Karp,KMP)
String Matching (Naive,Rabin-Karp,KMP)
Aditya pratap Singh
 
Periodic pattern mining
Periodic pattern miningPeriodic pattern mining
Periodic pattern mining
Ashis Chanda
 
Ad

More from Benjamin Sach (20)

Approximation Algorithms Part Four: APTAS
Approximation Algorithms Part Four: APTASApproximation Algorithms Part Four: APTAS
Approximation Algorithms Part Four: APTAS
Benjamin Sach
 
Approximation Algorithms Part Three: (F)PTAS
Approximation Algorithms Part Three: (F)PTASApproximation Algorithms Part Three: (F)PTAS
Approximation Algorithms Part Three: (F)PTAS
Benjamin Sach
 
Approximation Algorithms Part Two: More Constant factor approximations
Approximation Algorithms Part Two: More Constant factor approximationsApproximation Algorithms Part Two: More Constant factor approximations
Approximation Algorithms Part Two: More Constant factor approximations
Benjamin Sach
 
Approximation Algorithms Part One: Constant factor approximations
Approximation Algorithms Part One: Constant factor approximationsApproximation Algorithms Part One: Constant factor approximations
Approximation Algorithms Part One: Constant factor approximations
Benjamin Sach
 
van Emde Boas trees
van Emde Boas treesvan Emde Boas trees
van Emde Boas trees
Benjamin Sach
 
Orthogonal Range Searching
Orthogonal Range SearchingOrthogonal Range Searching
Orthogonal Range Searching
Benjamin Sach
 
Lowest Common Ancestor
Lowest Common AncestorLowest Common Ancestor
Lowest Common Ancestor
Benjamin Sach
 
Range Minimum Queries
Range Minimum QueriesRange Minimum Queries
Range Minimum Queries
Benjamin Sach
 
Hashing Part Two: Cuckoo Hashing
Hashing Part Two: Cuckoo HashingHashing Part Two: Cuckoo Hashing
Hashing Part Two: Cuckoo Hashing
Benjamin Sach
 
Hashing Part Two: Static Perfect Hashing
Hashing Part Two: Static Perfect HashingHashing Part Two: Static Perfect Hashing
Hashing Part Two: Static Perfect Hashing
Benjamin Sach
 
Hashing Part One
Hashing Part OneHashing Part One
Hashing Part One
Benjamin Sach
 
Probability Recap
Probability RecapProbability Recap
Probability Recap
Benjamin Sach
 
Bloom Filters
Bloom FiltersBloom Filters
Bloom Filters
Benjamin Sach
 
Dynamic Programming
Dynamic ProgrammingDynamic Programming
Dynamic Programming
Benjamin Sach
 
Minimum Spanning Trees (via Disjoint Sets)
Minimum Spanning Trees (via Disjoint Sets)Minimum Spanning Trees (via Disjoint Sets)
Minimum Spanning Trees (via Disjoint Sets)
Benjamin Sach
 
Shortest Paths Part 1: Priority Queues and Dijkstra's Algorithm
Shortest Paths Part 1: Priority Queues and Dijkstra's AlgorithmShortest Paths Part 1: Priority Queues and Dijkstra's Algorithm
Shortest Paths Part 1: Priority Queues and Dijkstra's Algorithm
Benjamin Sach
 
Depth First Search and Breadth First Search
Depth First Search and Breadth First SearchDepth First Search and Breadth First Search
Depth First Search and Breadth First Search
Benjamin Sach
 
Shortest Paths Part 2: Negative Weights and All-pairs
Shortest Paths Part 2: Negative Weights and All-pairsShortest Paths Part 2: Negative Weights and All-pairs
Shortest Paths Part 2: Negative Weights and All-pairs
Benjamin Sach
 
Line Segment Intersections
Line Segment IntersectionsLine Segment Intersections
Line Segment Intersections
Benjamin Sach
 
Self-balancing Trees and Skip Lists
Self-balancing Trees and Skip ListsSelf-balancing Trees and Skip Lists
Self-balancing Trees and Skip Lists
Benjamin Sach
 
Approximation Algorithms Part Four: APTAS
Approximation Algorithms Part Four: APTASApproximation Algorithms Part Four: APTAS
Approximation Algorithms Part Four: APTAS
Benjamin Sach
 
Approximation Algorithms Part Three: (F)PTAS
Approximation Algorithms Part Three: (F)PTASApproximation Algorithms Part Three: (F)PTAS
Approximation Algorithms Part Three: (F)PTAS
Benjamin Sach
 
Approximation Algorithms Part Two: More Constant factor approximations
Approximation Algorithms Part Two: More Constant factor approximationsApproximation Algorithms Part Two: More Constant factor approximations
Approximation Algorithms Part Two: More Constant factor approximations
Benjamin Sach
 
Approximation Algorithms Part One: Constant factor approximations
Approximation Algorithms Part One: Constant factor approximationsApproximation Algorithms Part One: Constant factor approximations
Approximation Algorithms Part One: Constant factor approximations
Benjamin Sach
 
Orthogonal Range Searching
Orthogonal Range SearchingOrthogonal Range Searching
Orthogonal Range Searching
Benjamin Sach
 
Lowest Common Ancestor
Lowest Common AncestorLowest Common Ancestor
Lowest Common Ancestor
Benjamin Sach
 
Range Minimum Queries
Range Minimum QueriesRange Minimum Queries
Range Minimum Queries
Benjamin Sach
 
Hashing Part Two: Cuckoo Hashing
Hashing Part Two: Cuckoo HashingHashing Part Two: Cuckoo Hashing
Hashing Part Two: Cuckoo Hashing
Benjamin Sach
 
Hashing Part Two: Static Perfect Hashing
Hashing Part Two: Static Perfect HashingHashing Part Two: Static Perfect Hashing
Hashing Part Two: Static Perfect Hashing
Benjamin Sach
 
Minimum Spanning Trees (via Disjoint Sets)
Minimum Spanning Trees (via Disjoint Sets)Minimum Spanning Trees (via Disjoint Sets)
Minimum Spanning Trees (via Disjoint Sets)
Benjamin Sach
 
Shortest Paths Part 1: Priority Queues and Dijkstra's Algorithm
Shortest Paths Part 1: Priority Queues and Dijkstra's AlgorithmShortest Paths Part 1: Priority Queues and Dijkstra's Algorithm
Shortest Paths Part 1: Priority Queues and Dijkstra's Algorithm
Benjamin Sach
 
Depth First Search and Breadth First Search
Depth First Search and Breadth First SearchDepth First Search and Breadth First Search
Depth First Search and Breadth First Search
Benjamin Sach
 
Shortest Paths Part 2: Negative Weights and All-pairs
Shortest Paths Part 2: Negative Weights and All-pairsShortest Paths Part 2: Negative Weights and All-pairs
Shortest Paths Part 2: Negative Weights and All-pairs
Benjamin Sach
 
Line Segment Intersections
Line Segment IntersectionsLine Segment Intersections
Line Segment Intersections
Benjamin Sach
 
Self-balancing Trees and Skip Lists
Self-balancing Trees and Skip ListsSelf-balancing Trees and Skip Lists
Self-balancing Trees and Skip Lists
Benjamin Sach
 
Ad

Recently uploaded (20)

How to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 WebsiteHow to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 Website
Celine George
 
How to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo SlidesHow to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo Slides
Celine George
 
Cyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top QuestionsCyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top Questions
SONU HEETSON
 
MICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdfMICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdf
DHARMENDRA SAHU
 
INDIA QUIZ FOR SCHOOLS | THE QUIZ CLUB OF PSGCAS | AUGUST 2024
INDIA QUIZ FOR SCHOOLS | THE QUIZ CLUB OF PSGCAS | AUGUST 2024INDIA QUIZ FOR SCHOOLS | THE QUIZ CLUB OF PSGCAS | AUGUST 2024
INDIA QUIZ FOR SCHOOLS | THE QUIZ CLUB OF PSGCAS | AUGUST 2024
Quiz Club of PSG College of Arts & Science
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-17-2025 .pptx
YSPH VMOC Special Report - Measles Outbreak  Southwest US 5-17-2025  .pptxYSPH VMOC Special Report - Measles Outbreak  Southwest US 5-17-2025  .pptx
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-17-2025 .pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-14-2025 .pptx
YSPH VMOC Special Report - Measles Outbreak  Southwest US 5-14-2025  .pptxYSPH VMOC Special Report - Measles Outbreak  Southwest US 5-14-2025  .pptx
YSPH VMOC Special Report - Measles Outbreak Southwest US 5-14-2025 .pptx
Yale School of Public Health - The Virtual Medical Operations Center (VMOC)
 
Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...
parmarjuli1412
 
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFAMCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
Dr. Nasir Mustafa
 
PUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for HealthPUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for Health
JonathanHallett4
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx
mansk2
 
The role of wall art in interior designing
The role of wall art in interior designingThe role of wall art in interior designing
The role of wall art in interior designing
meghaark2110
 
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
parmarjuli1412
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit..."Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
AlionaBujoreanu
 
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptxUnit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Mayuri Chavan
 
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales moduleHow To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
Celine George
 
How to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 WebsiteHow to Configure Extra Steps During Checkout in Odoo 18 Website
How to Configure Extra Steps During Checkout in Odoo 18 Website
Celine George
 
How to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo SlidesHow to Add Button in Chatter in Odoo 18 - Odoo Slides
How to Add Button in Chatter in Odoo 18 - Odoo Slides
Celine George
 
Cyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top QuestionsCyber security COPA ITI MCQ Top Questions
Cyber security COPA ITI MCQ Top Questions
SONU HEETSON
 
MICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdfMICROBIAL GENETICS -tranformation and tranduction.pdf
MICROBIAL GENETICS -tranformation and tranduction.pdf
DHARMENDRA SAHU
 
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptxU3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
U3 ANTITUBERCULAR DRUGS Pharmacology 3.pptx
Mayuri Chavan
 
Chemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptxChemotherapy of Malignancy -Anticancer.pptx
Chemotherapy of Malignancy -Anticancer.pptx
Mayuri Chavan
 
Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...Classification of mental disorder in 5th semester bsc. nursing and also used ...
Classification of mental disorder in 5th semester bsc. nursing and also used ...
parmarjuli1412
 
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFAMCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
MCQS (EMERGENCY NURSING) DR. NASIR MUSTAFA
Dr. Nasir Mustafa
 
PUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for HealthPUBH1000 Slides - Module 12: Advocacy for Health
PUBH1000 Slides - Module 12: Advocacy for Health
JonathanHallett4
 
How to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 InventoryHow to Manage Manual Reordering Rule in Odoo 18 Inventory
How to Manage Manual Reordering Rule in Odoo 18 Inventory
Celine George
 
2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx2025 The Senior Landscape and SET plan preparations.pptx
2025 The Senior Landscape and SET plan preparations.pptx
mansk2
 
The role of wall art in interior designing
The role of wall art in interior designingThe role of wall art in interior designing
The role of wall art in interior designing
meghaark2110
 
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
Mental Health Assessment in 5th semester bsc. nursing and also used in 2nd ye...
parmarjuli1412
 
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptxANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
ANTI-VIRAL DRUGS unit 3 Pharmacology 3.pptx
Mayuri Chavan
 
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit..."Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
"Bridging Cultures Through Holiday Cards: 39 Students Celebrate Global Tradit...
AlionaBujoreanu
 
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptxUnit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Unit 5 ACUTE, SUBACUTE,CHRONIC TOXICITY.pptx
Mayuri Chavan
 
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales moduleHow To Maximize Sales Performance using Odoo 18 Diverse views in sales module
How To Maximize Sales Performance using Odoo 18 Diverse views in sales module
Celine George
 

Pattern Matching Part One: Suffix Trees

  • 1. Advanced Algorithms – COMS31900 Pattern Matching part one Suffix Trees Benjamin Sach
  • 2. Exact pattern matching T Input A text string T (length n) and a pattern string P (length m) P ba b c a b a a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n m for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 3. Exact pattern matching T Input A text string T (length n) and a pattern string P (length m) P ba b c a b a a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n m 4 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 4. Exact pattern matching T Input A text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 5. Exact pattern matching T Input A text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 10 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 6. Exact pattern matching T Input A text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed)
  • 7. Exact pattern matching T Input A text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed) • A naive algorithm takes O(nm) time
  • 8. Exact pattern matching T Input A text string T (length n) and a pattern string P (length m) P ba b c a b a cb a Goal: Find all the locations where P matches in T P matches at location i iff a b a n a b a m 6 for all 0 j m we have that P[j] = T[i + j] (our strings are zero-indexed) • A naive algorithm takes O(nm) time • Many O(n) time algorithms are known (for example KMP)
  • 9. Text indexing T Preprocess a text string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n
  • 10. Text indexing T Preprocess a text string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m
  • 11. Text indexing T Preprocess a text string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T.
  • 12. Text indexing T Preprocess a text string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. e.g. 4, 6, 10 4 6 10
  • 13. Text indexing T Preprocess a text string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. • A naive algorithm takes O(n) query time (using KMP) e.g. 4, 6, 10 4 6 10
  • 14. Text indexing T Preprocess a text string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. • A naive algorithm takes O(n) query time (using KMP) • We want a query time which depends only on m and occ - occ is the number of occurences (matches) e.g. 4, 6, 10 4 6 10
  • 15. Text indexing T Preprocess a text string T (length n) to answer pattern matching queries. . . ba b c a b a cb a a b a n After preprocessing, a query is a pattern P (length m), P a b a m the output is a list of all matches in T. • A naive algorithm takes O(n) query time (using KMP) • We want a query time which depends only on m and occ - occ is the number of occurences (matches) • We also want O(n) space and fast preprocessing (prep.) time e.g. 4, 6, 10 4 6 10
  • 16. The atomic suffix tree TT b n aaa sn n
  • 17. The atomic suffix tree TT b n aaa sn n b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes
  • 18. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 19. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 20. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 21. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 22. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 23. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 24. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 25. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 26. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree
  • 27. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6
  • 28. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path
  • 29. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path • Every edge is labelled with a character from T
  • 30. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path • Every edge is labelled with a character from T • No two edges leaving the same node have the same label
  • 31. The atomic suffix tree sn a s n a s a n a s TT b n aaa sn n a s b n a sn a n a s b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes suffix tree 0 1 2 3 4 5 6 0 1 2 3 4 5 6 • The suffix tree contains every suffix of T as a root to leaf path • Each leaf corresponds to a suffix (so there are n leaves) • Every edge is labelled with a character from T • No two edges leaving the same node have the same label
  • 32. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 a 6
  • 33. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 a 6
  • 34. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m a 6
  • 35. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 36. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 37. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 38. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 39. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a 6
  • 40. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 6
  • 41. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 6
  • 42. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 1 6
  • 43. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 3 6
  • 44. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree 6
  • 45. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 46. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 47. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 48. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 49. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 50. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 51. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree P bn a 6
  • 52. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time P bn a 6
  • 53. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) P bn a 6
  • 54. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) WARNING! How long does it take to find the correct child? There could be n edges here! In this lecture we assume the alphabet size is a constant P bn a 6
  • 55. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) WARNING! How long does it take to find the correct child? There could be n edges here! In this lecture we assume the alphabet size is a constant This may be fine in some applications (English text or DNA for example) We can remove the assumption via the magic of hashing P bn a 6
  • 56. Searching in an atomic suffix tree sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s How do you find a pattern? 0 1 2 3 4 5 P aa n m start at the root and walk down the tree a . . . matches occur at the leaves of the subtree We can decide whether P matches somewhere in O(m) time (we’ll worry about outputting the matches later) P bn a 6
  • 57. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a 6
  • 58. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? 6
  • 59. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes 6
  • 60. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 6
  • 61. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 6
  • 62. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 7 characters 6
  • 63. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 7 characters 23 nodes 6
  • 64. how large is the atomic suffix tree? sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s There are at most n leaves 0 1 2 3 4 5 a that’s good right? Unfortunately there can be lots of internal nodes this path is pretty long. . . 7 characters 23 nodes that’s not so bad, right? 6
  • 65. how large is the atomic suffix tree?
  • 66. how large is the atomic suffix tree? T a b 2
  • 67. how large is the atomic suffix tree? T a b 2 a b b
  • 68. how large is the atomic suffix tree? T a b 2 a b b 4 nodes
  • 69. how large is the atomic suffix tree? T a b a b baT 2 4 a b b b b a b b a b b 4 nodes 9 nodes
  • 70. how large is the atomic suffix tree? T a b a b ba a b baa bT T 2 4 6 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes
  • 71. how large is the atomic suffix tree? T a b a b ba a b baa b a b baa ba bT T T 2 4 6 8 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b
  • 72. how large is the atomic suffix tree? T a b a b ba a b baa b a b baa b a b baa b a b aa b b T T T T 2 4 6 8 10 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b b b a b b a b b b b b a b b b 36 nodes b b b b a b b b b b b b b b a b b b b b
  • 73. how large is the atomic suffix tree? T a b a b ba a b baa b a b baa b a b baa b a b aa b b T T T T 2 4 6 8 10 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b b b a b b a b b b b b a b b b 36 nodes b b b b a b b b b b b b b b a b b b b b An atomic suffix tree can have ((n/2) + 1)2 nodes
  • 74. how large is the atomic suffix tree? T a b a b ba a b baa b a b baa b a b baa b a b aa b b T T T T 2 4 6 8 10 a b b b b a b b a b b b b a b b a b b b b b a b b b 4 nodes 9 nodes 16 nodes b b a b b a b b b b b a b b b 25 nodes b b b b a b b b b b b a b b a b b b b b a b b b 36 nodes b b b b a b b b b b b b b b a b b b b b An atomic suffix tree can have ((n/2) + 1)2 nodes this is too big!far
  • 75. Compacted suffix trees sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big?
  • 76. Compacted suffix trees sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one
  • 77. Compacted suffix trees sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one Main Idea replace each non-branching path with a single edge
  • 78. Compacted suffix trees sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings
  • 79. Compacted suffix trees sn a s n a s a n a s TT b n aaa sn n s b n a sn a n a s 0 1 2 3 4 5 6a Why is the atomic suffix tree so big? because it has long paths like this one Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters)
  • 80. Compacted suffix trees TT b n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas 1 3 5 0 2 4 6
  • 81. Compacted suffix trees TT b n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves 1 3 5 0 2 4 6
  • 82. Compacted suffix trees TT b n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children 1 3 5 0 2 4 6
  • 83. Compacted suffix trees TT b n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges 1 3 5 0 2 4 6
  • 84. Compacted suffix trees TT b n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges don’t the edges take up 1 3 5 0 2 4 6 lots of space?
  • 85. Compacted suffix trees TT b n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges don’t the edges take up we only store the end points 1 3 5 0 2 4 6 lots of space?
  • 86. Compacted suffix trees TT b n aaa sn n Main Idea replace each non-branching path with a single edge - edges are now labelled with substrings (instead of single characters) a s nas nas nas s na s bananas • There are at most n leaves • Every internal node has two or more children so there are O(n) edges don’t the edges take up we only store the end points we actually store (4, 6) 4 6 1 3 5 0 2 4 6 lots of space?
  • 87. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas 1 3 5 0 2 4 6
  • 88. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T 1 3 5 0 2 4 6
  • 89. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves 1 3 5 0 2 4 6
  • 90. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children 1 3 5 0 2 4 6
  • 91. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring 1 3 5 0 2 4 6
  • 92. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character 1 3 5 0 2 4 6
  • 93. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T 1 3 5 0 2 4 6
  • 94. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6
  • 95. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6
  • 96. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Sanity Check Does the compacted suffix tree always exist?
  • 97. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Sanity Check Does the compacted suffix tree always exist? TT b b this doesn’t have n leavesbb
  • 98. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Sanity Check Does the compacted suffix tree always exist? TT b b this doesn’t have n leavesbb TT b b $ b$ $ b$ this has n leaves
  • 99. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6
  • 100. Compacted suffix trees TT b n aaa sn n a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Step one: Add a $ (unique symbol) to T
  • 101. Compacted suffix trees a s nas nas nas s na s bananas Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix 1 3 5 0 2 4 6 Step one: Add a $ (unique symbol) to T TT b n aaa sn n $
  • 102. Compacted suffix trees Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix Step one: Add a $ (unique symbol) to T TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ 1 3 5 0 2 4 6
  • 103. Compacted suffix trees Compacted Suffix Tree of T Uses O(n) space • A rooted tree with n leaves • Every internal node has two or more children • Every edge is labelled with a substring • No two edges leaving the same node have the same first character • Each leaf is labelled with a location in T • Any root-to-leaf path spells out the corresponding suffix Step one: Add a $ (unique symbol) to T TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ This is normally just called1 3 5 0 2 4 6 a suffix tree
  • 104. Searching in a compacted suffix tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ 1 3 5 0 2 4 6
  • 105. Searching in a compacted suffix tree How do you find a pattern? TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ 1 3 5 0 2 4 6
  • 106. Searching in a compacted suffix tree How do you find a pattern? P aa n m TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 107. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 108. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 109. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a an 1 3 5 0 2 4 6
  • 110. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an 1 3 5 0 2 4 6
  • 111. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an 1 3 5 0 2 4 6 remember that an edge is actually stored as a pair we’re actually looking in T
  • 112. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na1 3 5 0 2 4 6
  • 113. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ 1 3 5 0 2 4 6
  • 114. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ nas$ 1 3 5 0 2 4 6
  • 115. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ nas$ 1 3 5 0 2 4 6
  • 116. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree 1 TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ a na an na nas$ nas$ 1 3 5 0 2 4 6
  • 117. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 118. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 119. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an 1 3 5 0 2 4 6
  • 120. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an na 1 3 5 0 2 4 6
  • 121. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 1 3 5 0 2 4 6
  • 122. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 1 3 5 0 2 4 6
  • 123. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 1 3 5 0 2 4 6
  • 124. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 2 1 3 5 0 2 4 6
  • 125. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana 4 1 3 5 0 2 4 6
  • 126. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana how big is this subtree?1 3 5 0 2 4 6
  • 127. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana how big is this subtree? O(occ) because it has occ leaves 1 3 5 0 2 4 6 (and each internal node has at least two children)
  • 128. Searching in a compacted suffix tree How do you find a pattern? P aa n m start at the root and walk down the tree . . . matches occur at the leaves of the subtree We can find all the matches in O(m + occ) time (by looking at the whole subtree) P n a TT b n aaa sn n $ a s$ nas$ nas$ nas$ s$ na s$ bananas$ 7$ an nana how big is this subtree? O(occ) because it has occ leaves 1 3 5 0 2 4 6 (and each internal node has at least two children)
  • 129. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ never actually do it like this you should
  • 130. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 131. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 132. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 133. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 never actually do it like this you should
  • 134. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 never actually do it like this you should
  • 135. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 we actually store this as (0, 7) never actually do it like this you should
  • 136. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 never actually do it like this you should
  • 137. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 never actually do it like this you should
  • 138. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 139. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ this is stored as (1, 7) never actually do it like this you should
  • 140. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 141. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 142. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ never actually do it like this you should
  • 143. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ never actually do it like this you should
  • 144. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ never actually do it like this you should
  • 145. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ never actually do it like this you should
  • 146. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ never actually do it like this you should
  • 147. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ ananas$ never actually do it like this you should
  • 148. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ ananas$ ananas$ never actually do it like this you should
  • 149. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 1 ananas$ 2 nanas$ ananas$ ananas$ ananas$ ananas$ never actually do it like this you should
  • 150. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana never actually do it like this you should
  • 151. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana ananas$ was stored as (1, 7) never actually do it like this you should
  • 152. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana ananas$ was stored as (1, 7) ana is stored as (1, 3) nas$ is stored as (4, 7) never actually do it like this you should
  • 153. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ nas$ 1 ana nas$ ana never actually do it like this you should
  • 154. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana never actually do it like this you should
  • 155. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana stored as (6, 7) never actually do it like this you should
  • 156. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana never actually do it like this you should
  • 157. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana never actually do it like this you should
  • 158. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana nanas$ never actually do it like this you should
  • 159. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana nanas$ nanas$ never actually do it like this you should
  • 160. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 nanas$ s$ nas$ 1 3 ana nanas$ nanas$ nanas$ never actually do it like this you should
  • 161. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 s$ nas$ 1 3 ana na nas$ nas$ never actually do it like this you should
  • 162. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana nas$ never actually do it like this you should
  • 163. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana nas$ never actually do it like this you should
  • 164. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana nas$ never actually do it like this you should
  • 165. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana ana nas$ never actually do it like this you should
  • 166. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 0 2 4 s$ nas$ 1 3 ana ana ana nas$ never actually do it like this you should
  • 167. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 0 2 4 nas$ a na never actually do it like this you should
  • 168. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 nas$ never actually do it like this you should
  • 169. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 nas$ never actually do it like this you should
  • 170. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 nas$ never actually do it like this you should
  • 171. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 172. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 173. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 174. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 175. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ never actually do it like this you should
  • 176. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ This takes O(n) time per suffix. . . never actually do it like this you should
  • 177. Naively constructing a compacted suffix tree Insert the suffixes one at a time (longest first) • Search for the new suffix in the partial suffix tree (as if you were matching a pattern) • Add a new edge and leaf for the new suffix (this may require you to break an edge in two) TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ This takes O(n) time per suffix. . . so O(n2) time in total never actually do it like this you should
  • 178. Suffix tree summary TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ • The (compacted) suffix tree of a (length n) text uses O(n) space • Finding all matches of a pattern P of length m takes O(m + occ) where occ is the number of matches we assumed that the alphabet contained a constant number of symbols • Suffix trees can be built in O(n) time but we have only seen the O(n2) time method actually do it like this (or build a suffix array instead) you should
  • 179. Multiple text indexing T1 b n aaa sn n1 T2 a p slp e n2 How can we index multiple texts?
  • 180. Multiple text indexing T1 b n aaa sn n1 T2 a p slp e n2 $ & two distinct unique symbols How can we index multiple texts?
  • 181. Multiple text indexing TT T1 b n aaa sn n1 T2 a p slp e n2 $ & two distinct unique symbols b n aaa sn $ a p slp e & n How can we index multiple texts?
  • 182. Multiple text indexing TT T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n How can we index multiple texts?
  • 183. Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 1 3 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts?
  • 184. Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 1 3 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n • Build a generalised suffix tree in O(n1 + n2) space 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts?
  • 185. Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 1 3 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n • Build a generalised suffix tree in O(n1 + n2) space • Using the linear time method (which we omitted), this takes O(n1 + n2) time 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts?
  • 186. Multiple text indexing 6$ 13 & TT a s$ nas$ nas$ s$ na s bananas$ 7$ 1 3 5 0 2 4 nas$ T1 b n aaa sn n1 T2 a p slp e n2 $ & b n aaa sn $ a p slp e & n • Build a generalised suffix tree in O(n1 + n2) space • Using the linear time method (which we omitted), this takes O(n1 + n2) time • Finding all matches of a pattern P of length m still takes O(m + occ) time 14 & 12 es& les&11 8 pples& p 10 les& 9 ples& How can we index multiple texts? where occ is the number of matches
  • 187. The suffix array - a sneak preview T b n aaT a sn n
  • 188. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn
  • 189. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa snsuffix 1
  • 190. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn
  • 191. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn
  • 192. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order
  • 193. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs:
  • 194. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a
  • 195. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a
  • 196. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a
  • 197. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c<
  • 198. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c<
  • 199. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c<
  • 200. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a
  • 201. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a
  • 202. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 203. The suffix array - a sneak preview T b n aaT a sn n 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s Sort the suffixes lexicographically 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 204. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 205. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 206. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 207. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order In lexicographical ordering we sort strings based on the first symbol that differs: b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller)
  • 208. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller) just a fancy name for the order the strings would appear in a dictionary In lexicographical ordering we sort strings based on the first symbol that differs:
  • 209. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn • The symbols themselves must have an order throughout we will use alphabetical order If the symbols don’t have a natural order, we use their binary representation in memory b a<a a b c< b c< a (in a ‘tie’, the shorter string is smaller) just a fancy name for the order the strings would appear in a dictionary In lexicographical ordering we sort strings based on the first symbol that differs:
  • 210. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn
  • 211. The suffix array - a sneak preview T b n aaT a sn n Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 10 2 3 4 5 6
  • 212. The suffix array - a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n 10 2 3 4 5 6
  • 213. The suffix array - a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n 10 2 3 4 5 6
  • 214. The suffix array - a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n 10 2 3 4 5 6
  • 215. The suffix array - a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n The suffix array is much smaller than the suffix tree (in terms of constants) 10 2 3 4 5 6
  • 216. The suffix array - a sneak preview T b n aaT a sn n Suffix Array 1 0 625 4 Sort the suffixes lexicographically 0 b n aaa sn n aa1 a sn 2 n aa sn 4 a sn 5 a s 6 s 3 aa sn 3 n The suffix array is much smaller than the suffix tree (in terms of constants) 10 2 3 4 5 6
  • 217. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array Suffix Tree recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n 10 2 3 4 5 6
  • 218. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array Suffix Tree recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 219. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 220. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 221. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 10 2 3 4 5 6
  • 222. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 10 2 3 4 5 6
  • 223. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 10 2 3 4 5 6
  • 224. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 10 2 3 4 5 6
  • 225. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 226. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 227. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 228. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 10 2 3 4 5 6
  • 229. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 230. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 231. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 232. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 10 2 3 4 5 6
  • 233. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 234. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 235. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 236. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 10 2 3 4 5 6
  • 237. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 10 2 3 4 5 6
  • 238. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 10 2 3 4 5 6
  • 239. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 10 2 3 4 5 6
  • 240. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 241. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 242. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 243. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 10 2 3 4 5 6
  • 244. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 6 10 2 3 4 5 6
  • 245. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) 1 3 5 0 2 4 6 10 2 3 4 5 6
  • 246. Constructing the Suffix Array from the Suffix Tree a s$ nas$ nas$ nas$ s$ na s$ bananas$ 1 3 5 0 2 4 6 T b n aaT a sn n Suffix Array recall that we added a unique symbol $ to make sure the tree exists - the $ is the smallest symbol in the alphabet 1 0 625 43 n To get the Suffix array perform a depth-first search (in lexicographical order) this takes O(n) time 1 3 5 0 2 4 6 10 2 3 4 5 6
  • 247. Suffix tree summary TT b n aaa sn n $ a s$ nas$ nas$ s$ na s$ bananas$ 7$ b n aaa sn n aaa sn n aa sn aa sn a sn a s s suffixes $ $ $ $ $ $ $ 0 1 2 3 4 5 6 $ 7 1 3 5 0 2 4 6 nas$ • The (compacted) suffix tree of a (length n) text uses O(n) space • Finding all matches of a pattern P of length m takes O(m + occ) where occ is the number of matches we assumed that the alphabet contains a constant number of symbols • Suffix trees can be built in O(n) time but we have only seen the O(n2) time method
  翻译: