Encoding Linguistic Structures with Graph Convolutional Networks

Diego Marcheggiani
Joint work with Ivan Titov and Joost Bastings
University of Amsterdam
University of Edinburgh
@South England NLP Meetup

Structured (Linguistic) Priors
Sequa makes and repairs jet engines.
[Figure: dependency arcs (ROOT, SBJ, COORD, CONJ, OBJ, NMOD) and semantic roles (creator, creation, repairer, entity repaired) drawn over the sentence]
“I voted for Palpatine because he was
most aligned with my values,” she said.
2

Sequence to Sequence
[Sutskever et al., 2014]
[Figure: sequence-to-sequence translation of "the black cat" into "le chat noir"]
} Language is not (only) a sequence of words
} We have linguistic knowledge
Encode structured linguistic knowledge into NN using
Graph Convolutional Networks

Outline
} Semantic Role Labeling
} Graph Convolutional Networks (GCN)
} Syntactic GCN for Semantic Role Labeling (SRL)
} SRL Model
} Exploiting Semantics in Neural Machine Translation with GCNs
Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling
Diego Marcheggiani, Ivan Titov. In Proceedings of EMNLP, 2017.
Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks
Diego Marcheggiani, Joost Bastings, Ivan Titov. In Proceedings of NAACL-HLT, 2018.
6

Semantic Role Labeling
} Predicting the predicate-argument structure of a sentence
} Discover and disambiguate predicates
make.01 repair.01
} Identify arguments and label them with their semantic roles
make.01: Creator, Creation
repair.01: Repairer, Entity repaired
} Only the head of an argument is labeled
} Sequence labeling task for each predicate
} Focus on argument identification and labeling

13
Question answering
Narayanan and Harabagiu 2004
Shen and Lapata 2007
Khashabi et al. 2018
Machine translation
Wu and Fung 2009
Aziz et al. 2011
Information extraction
Surdeanu et al. 2003
Christensen et al. 2010

Related work
} SRL systems that use syntax with simple NN architectures
} [FitzGerald et al., 2015]
} [Roth and Lapata, 2016]
} Recent models ignore linguistic bias
} [Zhou and Xu, 2014]
} [He et al., 2017]
} [Marcheggiani et al., 2017]
15
Tutorial on Semantic Role
Labeling at EMNLP 2017

Motivations
} Some semantic dependencies are mirrored in the syntactic graph
} Not all of them – the syntax-semantics interface is not trivial
[Figure: "Sequa makes and repairs jet engines." with dependency arcs (ROOT, SBJ, COORD, CONJ, OBJ, NMOD) and semantic roles (creator, creation, repairer, entity repaired)]

Outline
} SRL Model
18

Graph Convolutional Networks (message passing)
[Gori et al., 2005; Scarselli et al., 2009; Kipf and Welling, 2016]
[Figure: an undirected graph; the blue node is updated from its neighbors]
$$h_i = \mathrm{ReLU}\left(W_0 h_i + \sum_{j \in \mathcal{N}(i)} W_1 h_j\right)$$
Neighborhood
Self loop
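
A minimal NumPy sketch of the node update above, assuming a graph stored as a neighbor list. The function name `gcn_layer`, the toy path graph, and the weight values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def gcn_layer(H, neighbors, W0, W1):
    """One update h_i = ReLU(W0 h_i + sum_{j in N(i)} W1 h_j) for every node i."""
    H_new = np.zeros((H.shape[0], W0.shape[0]))
    for i in range(H.shape[0]):
        total = W0 @ H[i]                  # self loop term
        for j in neighbors[i]:
            total = total + W1 @ H[j]      # neighborhood term
        H_new[i] = np.maximum(total, 0.0)  # ReLU
    return H_new

# Toy undirected path graph 0 - 1 - 2 with 2-dimensional node features.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W0 = np.eye(2)        # hypothetical self-loop weights
W1 = 0.5 * np.eye(2)  # hypothetical neighbor weights
H1 = gcn_layer(H, neighbors, W0, W1)
```

The explicit loops mirror the formula term by term; a practical implementation would typically batch the sum with an adjacency matrix.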

GCNs Pipeline
Input X = H^(0) → Hidden layer H^(1) → Hidden layer H^(2) → … → Output Z = H^(n)
X = H^(0): initial feature representation of nodes
Z = H^(n): representation informed by nodes’ neighborhood
22
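
The pipeline can be sketched as a loop over layers. This is a simplified matrix form that shares one weight matrix between the self loop and the neighbors and skips any normalization, which is an assumption made for brevity; `gcn_forward` and the toy graph are hypothetical. With identity weights, the output shows which nodes have influenced which after each layer.

```python
import numpy as np

def gcn_forward(X, A_hat, weights):
    """H(0) = X; H(k+1) = ReLU(A_hat @ H(k) @ W(k)); returns Z = H(n)."""
    H = X
    for W in weights:
        H = np.maximum(A_hat @ H @ W, 0.0)
    return H

# Path graph 0 - 1 - 2 - 3; A_hat is the adjacency matrix plus self loops.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
X = np.eye(4)         # one-hot initial features, one per node
identity = np.eye(4)  # identity weights keep the information flow visible

Z1 = gcn_forward(X, A_hat, [identity])            # one layer
Z2 = gcn_forward(X, A_hat, [identity, identity])  # two layers
```

After one layer node 0 has only seen node 1; after two layers it has also seen node 2, but still not node 3, which is three hops away.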

Extend GCNs for syntactic dependency trees

Outline
} SRL Model
24

Example
Lane disputed those estimates
NMOD
SBJ OBJ
[Marcheggiani and Titov, 2017]
25

Example
[Figure: syntactic GCN over "Lane disputed those estimates" (arcs SBJ, OBJ, NMOD). In layer 1 every word state is multiplied by W^(1)_self; messages along the arcs use W^(1)_subj, W^(1)_obj, W^(1)_nmod; messages against the arcs use W^(1)_subj', W^(1)_obj', W^(1)_nmod'; each node sums its incoming messages and applies ReLU(Σ·). Layer 2 repeats this with W^(2)_self, W^(2)_subj, W^(2)_obj, W^(2)_nmod and their primed counterparts.]
Stacking GCNs widens the
syntactic neighborhood

Syntactic GCNs
$$h_v^{(k+1)} = \mathrm{ReLU}\left(\sum_{u \in \mathcal{N}(v)} W^{(k)}_{L(u,v)} h_u^{(k)} + b^{(k)}_{L(u,v)}\right)$$
Syntactic neighborhood: N(v); the self-loop is included in N
Message: W^(k)_L(u,v) h_u^(k) + b^(k)_L(u,v)
Messages are direction and label specific

Syntactic GCNs
} Overparametrized: one matrix for each label-direction pair
} $W^{(k)}_{L(u,v)} = V^{(k)}_{dir(u,v)}$
36
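
A rough sketch of one syntactic GCN layer with the direction-only matrices V_dir(u,v), run on the "Lane disputed those estimates" example. It assumes three directions (along the arc, opposite to it, self loop) and keeps the biases label specific; the slide only states the factorization of W, so the bias handling, the function name, and all numeric values are assumptions.

```python
import numpy as np

def syntactic_gcn_layer(H, edges, V, b):
    """h_v' = ReLU(sum over u in N(v) of V[dir(u,v)] @ h_u + b[L(u,v)])."""
    acc = np.zeros((H.shape[0], V["self"].shape[0]))
    for v in range(H.shape[0]):
        acc[v] += V["self"] @ H[v] + b["self"]                # self loop
    for head, dep, label in edges:
        acc[dep] += V["along"] @ H[head] + b[label]           # head -> dependent
        acc[head] += V["opposite"] @ H[dep] + b[label + "'"]  # dependent -> head
    return np.maximum(acc, 0.0)

words = ["Lane", "disputed", "those", "estimates"]
# (head index, dependent index, dependency label)
edges = [(1, 0, "SBJ"), (1, 3, "OBJ"), (3, 2, "NMOD")]

H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]])
# Hypothetical parameters: one matrix per direction, one bias per label.
V = {"self": np.eye(2), "along": 2.0 * np.eye(2), "opposite": 3.0 * np.eye(2)}
b = {name: np.zeros(2)
     for name in ["self", "SBJ", "OBJ", "NMOD", "SBJ'", "OBJ'", "NMOD'"]}

H1 = syntactic_gcn_layer(H, edges, V, b)
```

Under this assumption the number of matrices per layer stays at three regardless of how many dependency labels the parser uses; only the bias vectors grow with the label set.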

Edge-wise Gates
} Not all edges are equally important for the final task
37

Edge-wise Gates
} We should not blindly rely on predicted syntax
38

Edge-wise Gates
} Gates decide the “importance” of each message
} Gates depend on nodes and edges
[Figure: "Lane disputed those estimates" with arcs SBJ, OBJ, NMOD; a gate g scales every message before each node applies ReLU(Σ·)]
40
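
One way to realize such a gate, sketched under assumptions: a scalar sigmoid computed from the sending node's state and edge-specific parameters, which then scales the message. The slide only says that gates depend on nodes and edges, so this exact form, and the names `edge_gate` and `gated_message`, are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_gate(h_u, gate_vector, gate_bias):
    """Scalar gate in (0, 1) from the sender state and edge-specific parameters."""
    return sigmoid(h_u @ gate_vector + gate_bias)

def gated_message(h_u, W, b, gate_vector, gate_bias):
    """Message W h_u + b, scaled by the gate for this edge."""
    return edge_gate(h_u, gate_vector, gate_bias) * (W @ h_u + b)

h_u = np.array([1.0, -1.0])
W, b = np.eye(2), np.zeros(2)  # hypothetical message parameters

open_msg = gated_message(h_u, W, b, np.zeros(2), 20.0)     # gate close to 1
half_msg = gated_message(h_u, W, b, np.zeros(2), 0.0)      # gate exactly 0.5
closed_msg = gated_message(h_u, W, b, np.zeros(2), -20.0)  # gate close to 0
```

A gate near zero effectively removes an edge, which is how the model could learn to downweight unreliable arcs in predicted syntax.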

Outline
} SRL Model
41

Our Model
} Word representation
} Bidirectional LSTM encoder
} GCN Encoder
} Local role classifier
42

Word Representation
} Pretrained word embeddings
} Word embeddings
} POS tag embeddings
} Predicate lemma embeddings
word
representation
43

BiLSTM Encoder
} Encode each word with its left and right context
} Stacked BiLSTM
[Figure: word representation → J layers of BiLSTM]
44

GCNs Encoder
} Syntactic GCNs after BiLSTM encoder
} Add syntactic information
} Skip connections
} Longer dependencies are captured
[Figure: word representation → J layers of BiLSTM → K layers of GCN over dependency arcs (nsubj, dobj, nmod)]
45

Semantic Role Classifier
[Figure: word representation → J layers of BiLSTM → K layers of GCN → classifier over the predicate representation and the candidate argument representation, predicting a role such as A1]
} Local log-linear classifier
$$p(r \mid t_i, t_p, l) \propto \exp\left(W_{l,r}\,(t_i \circ t_p)\right)$$
46
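
A small sketch of the local classifier, reading t_i ∘ t_p as the concatenation of the candidate-argument and predicate representations (an assumption about the notation) and normalizing with a softmax over roles. The role inventory, including the "NULL" entry, and the weight values are hypothetical.

```python
import numpy as np

def role_distribution(t_i, t_p, W_l):
    """p(r | t_i, t_p, l) ∝ exp(W_{l,r} (t_i ∘ t_p)); W_l holds one row per role r."""
    features = np.concatenate([t_i, t_p])  # t_i ∘ t_p, read as concatenation
    scores = W_l @ features
    scores = scores - scores.max()         # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

roles = ["A0", "A1", "NULL"]               # hypothetical role inventory
t_i = np.array([1.0, 0.0])                 # candidate argument representation
t_p = np.array([0.0, 1.0])                 # predicate representation
W_l = np.array([[0.0, 0.0, 0.0, 0.0],      # weights for one predicate lemma l
                [2.0, 0.0, 0.0, 2.0],
                [0.0, 0.0, 0.0, 0.0]])

p = role_distribution(t_i, t_p, W_l)
predicted = roles[int(np.argmax(p))]
```

The classifier is local in the sense that each candidate word is scored independently for a given predicate, with no joint inference over the whole argument structure.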

Experiments
} Data
} CoNLL-2009 dataset - English and Chinese
} F1 evaluation measure
} Model
} Hyperparameters tuned on English development set
} State-of-the-art predicate disambiguation models
47

Ablation Experiments (Dev set)
English SRL w/o predicate disambiguation (F1): BiLSTM 82.7, GCN 83.3
Chinese SRL w/o predicate disambiguation (F1): BiLSTM 75.2, GCN 77.1

English Test Set
SRL with predicate disambiguation (F1)
FitzGerald et al. (2015) (global): 87.3
Roth and Lapata (2016) (global): 87.7
Marcheggiani et al. (2017, CoNLL) (local): 87.7
Ours (Bi-LSTM + GCN) (local): 88
49

English Out of Domain
FitzGerald et al. (2015) (global): 75.2
Roth and Lapata (2016) (global): 76.1
Marcheggiani et al. (2017, CoNLL) (local): 77.7
Ours (Bi-LSTM + GCN) (local): 77.2
50

English Test Set (Ensemble)
FitzGerald et al. (2015) (ensemble): 87.7
Roth and Lapata (2016) (ensemble): 87.9
Ours (Bi-LSTM + GCN) (ensemble): 89.1
51

Chinese Test Set
Zhao et al. (2009) (global): 77.7
Björkelund et al. (2009) (global): 78.6
[system label lost in extraction] (global): 79.4
[system label lost in extraction] (local): 82.5
52

Syntactic Graph Convolutional Networks
53
} Fast and simple
} Can be seamlessly applied to other tasks

54
Graph Convolutional Encoders for Syntax-aware Machine Translation
Joost Bastings, Ivan Titov, Wilker Aziz, Diego Marcheggiani, Khalil Sima'an.
In Proceedings of EMNLP, 2017.
Improvements on
English to German and
English to Czech translations

Multi-document Question Answering
56
[De Cao et al., 2018]
• Nodes are entities and edges are coreference links
• Inference on a graph representing the document collection


Outline
} SRL Model
61

Motivations [Marcheggiani et al., 2018]
SRL helps to generalize over different surface realizations
of the same underlying “meaning”.
John gave his wonderful wife a nice present .
Giver: John; Entity given to: wife; Thing given: present
John gave a nice present to his wonderful wife .
Giver: John; Thing given: present; Entity given to: wife

Motivations
65
Lost in translation

Related work
} Semantics in statistical MT
} [Wu and Fung, 2009]
} [Liu and Gildea, 2010]
} [Aziz et al., 2011]
} …
} Syntax in neural MT
} [Sennrich and Haddow, 2016]
} [Aharoni and Goldberg, 2017]
} [Bastings et al., 2017]
} …
} Semantics in neural MT
} ???
66

Predicate-argument encoding
67
John gave his wonderful wife a nice present
Giver, Entity given to, Thing given
[Figure: two stacked Semantic GCN layers over the sentence. Messages along the predicate-argument arcs use W_A0, W_A1, W_A2; messages against the arcs use W_A0', W_A1', W_A2'; every word also has a self-loop with W_self.]

Our Model
} Standard sequence2sequence with attention
} Semantic GCN encoder on top of a bidirectional RNN
} RNN decoder
68

Our model
[Figure: a BiRNN/CNN encoder over the source words, followed by two Semantic GCN layers (W_A0, W_A1, W_A2 along the arcs, W_A0', W_A1', W_A2' against them, W_self on every word), an attention mechanism, and an RNN decoder that emits "John" after <bos>]

Experiments
} Data
} WMT'16 English-German dataset (~4.5 million sentence pairs)
} BLEU as evaluation measure
} Model
} Hyperparameters tuned on News Commentary En-De (~226K sentence pairs)
} GRU as RNN
71

Results
Full WMT 2016 English-German BLEU
BiRNN (Bastings et al., 2017): 23.3
BiRNN + Syntactic GCN: 23.9
BiRNN + Semantic GCN: 24.5 (+1.2 BLEU over BiRNN)
BiRNN + Syntactic GCN + Semantic GCN: 24.9 (+1.6 BLEU over BiRNN)
Semantics is helpful
Syntax and semantics are complementary

Analysis
SOURCE: John sold the car to Mark .
Roles: Seller, Thing sold, Buyer
BiRNN: John verkaufte das Auto nach Mark .
SEM GCN: John verkaufte das Auto an Mark .
BiRNN mistranslates “to” as “nach” (directionality)

SOURCE: The boy walking down the dusty road is drinking a beer
Roles: Walker, AM-DIR, Drinker, Liquid
BiRNN: Der Junge zu Fuß die staubige Straße ist ein Bier trinken .
SEM GCN: Der Junge , der die staubige Straße hinunter geht , trinkt ein Bier .

Analysis [Marcheggiani et al., 2018]

SOURCE: The boy sitting on a bench in the park plays chess .
Roles: Thing sitting, Location, AM-LOC, Player, Game
BiRNN: Der Junge auf einer Bank im Park spielt Schach .
SEM GCN: Der Junge sitzt auf einer Bank im Park Schach .
Both translations are wrong,
but the BiRNN’s one is grammatically correct


Conclusion
} GCNs for encoding linguistic structures into NN
} Semantics, coreference, discourse
} Fast
} Cheap
} State-of-the-art model for dependency-based SRL
} First to exploit semantics in NMT
85

Roadmap
Including structured bias into neural NLP models
Low-resource setting
Long-range dependencies
Document level
Cross-document level
Integrating external knowledge
i.e., knowledge graphs
Thanks for your attention!
