Selective encoding for abstractive sentence summarization (Kodaira Tomonori)
This document describes a selective encoding model for abstractive sentence summarization. The model uses a selective gate to filter unimportant information from the encoder states before decoding. It achieves state-of-the-art results on several datasets, outperforming sequence-to-sequence and attention-based models. The model consists of an encoder, selective gate, and decoder. It is trained end-to-end to maximize the likelihood of generating reference summaries.
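The gating step described above can be sketched as follows. This is a minimal NumPy sketch of the idea, not the authors' implementation: the weight matrices `W`, `U`, the bias `b`, and the use of the final encoder state as the sentence vector are assumptions for illustration.

```python
import numpy as np

def selective_gate(encoder_states, sentence_vec, W, U, b):
    """Filter encoder states with a per-dimension sigmoid gate.

    encoder_states: (T, d) hidden states from the encoder.
    sentence_vec:   (d,) vector summarizing the whole sentence
                    (assumed here to be, e.g., the final encoder state).
    W, U: (d, d) weight matrices; b: (d,) bias -- hypothetical parameters.
    """
    # Gate values lie in (0, 1); small values suppress unimportant information.
    gate = 1.0 / (1.0 + np.exp(-(encoder_states @ W.T + sentence_vec @ U.T + b)))
    # Element-wise filtering of the encoder states before decoding.
    return gate * encoder_states
```

Because every gate value is strictly between 0 and 1, each filtered state is a damped copy of the original, which is how the model suppresses content the decoder should ignore.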
Poster: Controlled and Balanced Dataset for Japanese Lexical Simplification (Kodaira Tomonori)
This document presents a new controlled and balanced dataset for Japanese lexical simplification. The dataset contains 2,100 sentences each with a single difficult Japanese word. Five annotators provided substitution options for each complex word and ranked them in order of simplification. This dataset is the first for Japanese lexical simplification to only allow one complex word per sentence and include particles, resulting in higher correlation with human judgment than prior datasets. It will enable better machine learning methods for Japanese lexical simplification.
Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks (Kodaira Tomonori)
EMNLP 2015 paper reading group
Tomonori Kodaira
Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks.
Emily K. Jamison and Iryna Gurevych
Paper introduction:
Presentation: Kodaira
PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification
Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, Chris Callison-Burch
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing
Aligning sentences from standard Wikipedia to simple Wikipedia (Kodaira Tomonori)
Aligning Sentences from Standard Wikipedia to Simple Wikipedia
NAACL paper reading group
William Hwang; Hannaneh Hajishirzi; Mari Ostendorf; Wei Wu
University of Washington
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Improving text simplification language modeling using unsimplified text data
1. Improving Text Simplification Language Modeling Using Unsimplified Text Data
In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pp. 1537–1546, 2013.
Presented by Kodaira Tomonori
3. Corpora used
• English Wikipedia and Simple English Wikipedia
* 60K articles from Simple English Wikipedia
* 60K matching articles extracted from English Wikipedia

              simple    normal
sentences     385K      2,540K
words         7.15M     64.7M
vocab size    78K       307K
8. • The trained language model was used to rank the substitution candidates in the SemEval 2012 dataset.
• Cohen's kappa coefficient was used to evaluate the rankings output by the system.
Language model evaluation: lexical simplification
Word: tight
Context: With the physical market as tight as it has been …
Candidates: constricted, pressurised, low, high-strung, tight
Human ranking: tight, low, constricted, pressurised, high-strung
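The ranking-and-evaluation procedure above can be sketched as follows. This is a sketch under stated assumptions: `lm_score` is a hypothetical language-model scoring function (higher is better), and agreement is computed by treating each candidate's rank position as a categorical label, which is one simple way to apply Cohen's kappa to rankings and not necessarily the paper's exact protocol.

```python
from collections import Counter

def rank_candidates(candidates, context, lm_score):
    # Substitute each candidate into the context (the "_" slot is an
    # illustrative convention) and sort by language-model score, best first.
    return sorted(candidates,
                  key=lambda w: lm_score(context.replace("_", w)),
                  reverse=True)

def cohens_kappa(rank_a, rank_b):
    """Cohen's kappa between two rankings of the same candidate set.

    Each candidate's rank position is treated as its categorical label;
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is agreement expected by chance.
    """
    pos_a = {w: i for i, w in enumerate(rank_a)}
    pos_b = {w: i for i, w in enumerate(rank_b)}
    items = list(pos_a)
    n = len(items)
    p_o = sum(pos_a[w] == pos_b[w] for w in items) / n
    counts_a = Counter(pos_a[w] for w in items)
    counts_b = Counter(pos_b[w] for w in items)
    p_e = sum(counts_a[r] * counts_b[r] for r in counts_a) / (n * n)
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)
```

For the "tight" example above, the system ranking produced by `rank_candidates` would be compared against the human ranking (tight, low, constricted, pressurised, high-strung) via `cohens_kappa`.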