Hierarchical Attention Neural Network for Event Types to Improve Event Detection
Abstract
1. Introduction
- S1: He left the company.
- S2: He left the company, and planned to go home directly.
- We propose a novel network model, HANN-ET, to alleviate the data sparsity and event instance imbalance problems of small-scale datasets without relying on external resources; it also works well on large-scale datasets.
- We employ a weighted attention aggregation mechanism, instead of an average operation, to merge the representations of all the upper-level event type modules, and we integrate syntactic information obtained by a GAT to enrich the text representation.
- We conduct experiments on the widely used small-scale ACE2005 and large-scale MAVEN datasets. The experimental results on both datasets demonstrate that our approach is effective for event detection tasks and achieves state-of-the-art performance.
2. Related Work
3. Our Model
- Word encoding: we first map a sentence to hidden embeddings via a Bi-LSTM, then apply dynamic multi-pooling to aggregate the sentence information into a sentence-level embedding. Meanwhile, we utilize a GAT over the syntactic graph to obtain a syntactic-level embedding.
- HANN-ET: we adopt Neural Module Networks to compute weighted scores for the upper-level modules of the event types; we then employ an attention mechanism to aggregate the scores from several upper-level modules and, finally, compute the weighted sum of the hidden embeddings as the upper-level event type embedding.
- Classification layer: we rely on the sentence-level, syntactic-level, and upper-level event type embeddings to estimate the probability of a specific event type for the sentence. (Hedged sketches of these components follow this list.)
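To make the encoding pipeline concrete, the following is a minimal sketch of the word encoding step. The experiments below report TensorFlow 1.13, but the sketch uses PyTorch for brevity; the single attention head, the pooling choices, and all names are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceEncoder(nn.Module):
    """Bi-LSTM encoder with dynamic multi-pooling and a one-head GAT.

    Embedding and hidden sizes follow the hyper-parameter table in
    Section 4.1.2 (100-d word embeddings, 100-d LSTM hidden size);
    everything else is illustrative.
    """

    def __init__(self, vocab_size, emb_dim=100, hidden=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        d = 2 * hidden  # Bi-LSTM output size
        self.gat_w = nn.Linear(d, d, bias=False)
        self.gat_a = nn.Linear(2 * d, 1, bias=False)

    def forward(self, token_ids, trigger_pos, adj):
        # token_ids: (B, T); trigger_pos: (B,); adj: (B, T, T) syntactic
        # arcs as 0/1, assumed to include self-loops.
        h, _ = self.lstm(self.embed(token_ids))             # (B, T, D)
        B, T, D = h.shape
        neg = torch.finfo(h.dtype).min

        # Dynamic multi-pooling: max-pool the spans left and right of
        # the candidate trigger separately, then concatenate.
        pos = torch.arange(T, device=h.device).expand(B, T)
        left = (pos <= trigger_pos.unsqueeze(1)).unsqueeze(-1)
        sent_feat = torch.cat(
            [h.masked_fill(~left, neg).max(dim=1).values,
             h.masked_fill(left, neg).max(dim=1).values], dim=-1)

        # One graph-attention layer over the syntactic arcs gives the
        # syntactic-level feature.
        z = self.gat_w(h)                                   # (B, T, D)
        pair = torch.cat([z.unsqueeze(2).expand(B, T, T, D),
                          z.unsqueeze(1).expand(B, T, T, D)], dim=-1)
        e = F.leaky_relu(self.gat_a(pair)).squeeze(-1)      # (B, T, T)
        alpha = torch.softmax(e.masked_fill(adj == 0, neg), dim=-1)
        syn = alpha @ z                                     # (B, T, D)
        syn_feat = syn.max(dim=1).values                    # pooled

        return h, sent_feat, syn_feat
```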
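A companion sketch of the HANN-ET aggregation and the classification layer, under the same caveats: the per-module query vectors, the linear merge scorer, and the default of eight upper-level types (the eight ACE top-level event types) are assumptions; the paper specifies only that NMN-style modules score the upper-level types and that attention, rather than a mean, merges them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpperLevelAggregator(nn.Module):
    """NMN-style upper-level event type modules with weighted attention
    aggregation. Each upper-level type owns a query vector that scores
    every token; a learned scorer then merges the per-module summaries
    instead of averaging them (cf. the HANN-ET-Mean ablation)."""

    def __init__(self, d=200, n_modules=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_modules, d))
        self.merge = nn.Linear(d, 1, bias=False)

    def forward(self, h):
        # h: (B, T, D) token hidden states from the encoder
        scores = torch.einsum('md,btd->bmt', self.queries, h)
        alpha = torch.softmax(scores, dim=-1)               # token attention
        summaries = torch.einsum('bmt,btd->bmd', alpha, h)  # (B, M, D)
        # weighted attention aggregation across modules (not a mean)
        w = torch.softmax(self.merge(summaries).squeeze(-1), dim=-1)
        return torch.einsum('bm,bmd->bd', w, summaries)     # type embedding

class EventTypeClassifier(nn.Module):
    """Concatenate the three features and score every event type."""

    def __init__(self, feat_dim, n_types):
        super().__init__()
        self.out = nn.Linear(feat_dim, n_types)

    def forward(self, sent_feat, syn_feat, type_feat):
        x = torch.cat([sent_feat, syn_feat, type_feat], dim=-1)
        return F.log_softmax(self.out(x), dim=-1)
```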
3.1. Word Encoding
3.1.1. Sentence Encoder
3.1.2. Sentence-Level Feature
3.1.3. Syntactic-Level Feature
3.2. HANN-ET
3.2.1. Upper-Level Modules
3.2.2. Attention Aggregation
3.3. Classification Layer
4. Experiments
4.1. Experimental Setting
4.1.1. Datasets and Evaluation Metrics
4.1.2. Hyper-Parameter Setting
4.2. Overall Performance
- (1) CRF [15], a traditional machine learning method, views the event detection task as a sequence labeling problem for trigger words; candidate events are obtained from candidate trigger words identified through dictionary matching on the split sentences.
- (2) DMCNN [17] builds a dynamic multi-pooling CNN to learn sentence features. It uses a CNN for basic feature extraction and, in the trigger classification stage, proposes dynamic multi-pooling to split the feature map into two parts according to the candidate trigger, so that the most important features of each part can be retained.
- (3) JRNN [19] employs a bidirectional RNN as the feature extractor for joint event extraction, covering both event detection and argument classification. It proposes a memory matrix that effectively captures the dependencies between argument roles and trigger subtypes.
- (4) HBTNGMA [25] fuses sentence-level and document-level information to enhance the semantic features. It first exploits a hierarchical and bias tagging network to capture event interdependency and detect multiple events in one sentence collectively; it then devises a gated multi-level attention mechanism to automatically extract and integrate contextual information.
- (5) JMEE [21] utilizes self-attention and a highway network to enhance a GCN for event detection. It employs a syntactic graph convolution network module for feature extraction by introducing shortcut arcs from syntactic structures. In the trigger classification module, a self-attention mechanism captures the associations between multiple events in a sentence.
- (6) AD-DMBERT [30] proposes an adversarial imitation model to expand the training data for the task. It creates a large event-related candidate set based on the ACE2005 dataset and then applies an adversarial training mechanism to iteratively identify informative instances from the candidate set. It selects CNN and BERT as representative encoders for the given instances.
- (7) MOGANED [22] improves the GCN by combining multi-order word representations from different GAT layers. It uses a Bi-LSTM to encode the input sentence into a sequence of vectors and proposes a multi-order graph attention network that performs graph attention convolution over multi-order syntactic graphs. It then exploits an attention mechanism to aggregate the multi-order representations of each word and predict its label.
- (8) EE-GCN [23] proposes a novel architecture that uses dependency label information, which conveys rich and useful linguistic knowledge for event detection. It designs an edge-aware node update module that aggregates syntactically connected words through specific dependency types to generate expressive word representations, and it devises a node-aware edge update module to refine the relation representations with contextual information (a simplified sketch of this update pattern follows this list).
- (9) OntoED [6] links each event instance to a specific type in a target event ontology. It builds event ontology embeddings through BERT and designs an event correlation inference mechanism to induce more event correlations from existing ones. In this way, data-rich event types can propagate correlation knowledge to data-poor ones, and new event types can establish linkages to the event ontology.
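To illustrate item (8), here is a simplified sketch of the edge-aware/node-aware update pattern. The layer sizes, the sigmoid gate, and the concatenation-based edge update are assumptions for illustration, not the published EE-GCN equations.

```python
import torch
import torch.nn as nn

class EdgeAwareLayer(nn.Module):
    """Structural sketch of EE-GCN-style updates (item (8) above).
    `dep_labels[b, i, j]` holds the dependency-relation id of arc i->j
    as a long tensor (0 = no arc)."""

    def __init__(self, hidden=100, n_labels=40, edge_dim=25):
        super().__init__()
        self.edge_emb = nn.Embedding(n_labels, edge_dim, padding_idx=0)
        self.edge_gate = nn.Linear(edge_dim, 1)  # arc weight from its label
        self.node_w = nn.Linear(hidden, hidden)
        self.edge_upd = nn.Linear(edge_dim + 2 * hidden, edge_dim)

    def forward(self, h, dep_labels):
        # h: (B, T, H); dep_labels: (B, T, T)
        e = self.edge_emb(dep_labels)                            # (B, T, T, De)
        mask = (dep_labels != 0).float()
        # edge-aware node update: neighbors weighted by their label embedding
        w = torch.sigmoid(self.edge_gate(e)).squeeze(-1) * mask  # (B, T, T)
        h = torch.relu(self.node_w(w @ h))                       # (B, T, H)
        # node-aware edge update: refine each arc with its endpoint contexts
        B, T, _, _ = e.shape
        hi = h.unsqueeze(2).expand(B, T, T, -1)
        hj = h.unsqueeze(1).expand(B, T, T, -1)
        e = torch.relu(self.edge_upd(torch.cat([e, hi, hj], dim=-1)))
        return h, e
```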
4.3. Ablation Study
4.3.1. The Validation of the Components
- (1) DMCNN and HANN-ET-CNN, where both models employ a CNN as the sentence encoder and do not contain the GAT module.
- (2) Bi-GRU (Gated Recurrent Unit) with a multi-pooling layer and HANN-ET-GRU, where both models utilize a GRU as the sentence encoder and do not contain the GAT module.
- (3) Bi-LSTM with a multi-pooling layer and HANN-ET-LSTM, where both models employ an LSTM as the sentence encoder and do not contain the GAT module.
- (4) To validate the impact of attention aggregation in the integrated model, we experiment with HANN-ET-Mean, which has the same modules as HANN-ET-LSTM but adopts a mean operation to aggregate the attention scores (a minimal sketch contrasting the two aggregation choices follows this list).
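As referenced in item (4), a minimal sketch of the two aggregation choices; shapes, names, and the single weight vector are illustrative assumptions.

```python
import torch

def mean_aggregate(summaries):
    """HANN-ET-Mean: plain average over the M upper-level module outputs."""
    return summaries.mean(dim=1)                          # (B, M, H) -> (B, H)

def attention_aggregate(summaries, merge_weight):
    """HANN-ET: learned attention weights over the module outputs.
    `merge_weight` is an (H,) parameter vector."""
    w = torch.softmax(summaries @ merge_weight, dim=-1)   # (B, M)
    return torch.einsum('bm,bmh->bh', w, summaries)       # (B, H)
```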
4.3.2. The Experiments on General and Sparse Event Types
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
1. Wang, X.; Wang, Z.; Han, X.; Jiang, W.; Han, R.; Liu, Z.; Zhou, J. MAVEN: A Massive General Domain Event Detection Dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, 16–20 November 2020; pp. 1652–1671.
2. Liao, S.; Grishman, R. Using document level cross-event inference to improve event extraction. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, 11–16 July 2010; pp. 789–797.
3. Liu, S.; Liu, K.; He, S.; Zhao, J. A probabilistic soft logic based approach to exploiting latent and global information in event classification. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2993–2999.
4. Li, Q.; Ji, H.; Huang, L. Joint event extraction via structured prediction with global features. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 4–9 August 2013; pp. 73–82.
5. Liu, S.; Chen, Y.; Liu, K.; Zhao, J.; Luo, Z.; Luo, W. Improving event detection via information sharing among related event types. In Proceedings of the Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data, Nanjing, China, 13–15 October 2017; pp. 122–134.
6. Deng, S.; Zhang, N.; Li, L.; Chen, H.; Tou, H.; Chen, M.; Huang, F.; Chen, H. OntoED: Low-resource Event Detection with Ontology Embedding. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021.
7. Han, X.; Yu, P.; Liu, Z.; Sun, M.; Li, P. Hierarchical relation extraction with coarse-to-fine grained attention. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2236–2245.
8. Wang, X.; Wang, Z.; Han, X.; Liu, Z.; Li, J.; Li, P.; Ren, X. HMEAE: Hierarchical modular event argument extraction. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 5777–5783.
9. Andreas, J.; Rohrbach, M.; Darrell, T.; Klein, D. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016.
10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Curran Associates: Red Hook, NY, USA, 2017; pp. 5998–6008.
11. Mehta, S.; Islam, M.R.; Rangwala, H.; Ramakrishnan, N. Event detection using hierarchical multi-aspect attention. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3079–3085.
12. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
13. Ming, L.; Hailiang, H. A method of extracting financial event information based on lexical-semantic model. Comput. Appl. 2018, 38, 84–90.
14. Jiangde, Y.; Xinfeng, X.; Xiaozhong, F. Chinese text event information extraction based on hidden Markov model. Microelectron. Comput. 2007, 24, 92–94.
15. Hu, B.L.; He, R.F.; Sun, H.; Wang, W.J. Chinese event type recognition based on conditional random field. Pattern Recognit. Artif. Intell. 2012, 25, 445–449.
16. Pouyanfar, S.; Sadiq, S.; Yan, Y.; Tian, H.; Tao, Y.; Reyes, M.P.; Iyengar, S.S. A survey on deep learning: Algorithms, techniques, and applications. ACM Comput. Surv. 2018, 51, 1–36.
17. Chen, Y.; Xu, L.; Liu, K.; Zeng, D.; Zhao, J. Event extraction via dynamic multi-pooling convolutional neural networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, Beijing, China, 26–31 July 2015; pp. 167–176.
18. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
19. Nguyen, T.H.; Cho, K.; Grishman, R. Joint event extraction via recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 300–309.
20. Nguyen, T.H.; Grishman, R. Graph convolutional networks with argument-aware pooling for event detection. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
21. Liu, X.; Luo, Z.; Huang, H. Jointly multiple events extraction via attention-based graph information aggregation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1247–1256.
22. Yan, H.; Jin, X.; Meng, X.; Guo, J.; Cheng, X. Event detection with multi-order graph convolution and aggregated attention. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 5766–5770.
23. Cui, S.; Yu, B.; Liu, T.; Zhang, Z.; Wang, X.; Shi, J. Edge-Enhanced Graph Convolution Networks for Event Detection with Syntactic Relation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, 16–20 November 2020; pp. 2329–2339.
24. Zhao, Y.; Jin, X.; Wang, Y.; Cheng, X. Document embedding enhanced event detection with hierarchical and supervised attention. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; pp. 414–419.
25. Chen, Y.; Yang, H.; Liu, K.; Zhao, J.; Jia, Y. Collective event detection via a hierarchical and bias tagging networks with gated multi-level attention mechanisms. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 1267–1276.
26. Ngo, N.T.; Nguyen, T.N.; Nguyen, T.H. Learning to select important context words for event detection. Adv. Knowl. Discov. Data Min. 2020, 12085, 756–768.
27. Jang, E.; Gu, S.; Poole, B. Categorical reparameterization with Gumbel-softmax. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017.
28. Deng, S.; Zhang, N.; Kang, J.; Zhang, Y.; Zhang, W.; Chen, H. Meta-learning with dynamic-memory-based prototypical network for few-shot event detection. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 151–159.
29. Lai, V.D.; Dernoncourt, F.; Nguyen, T.H. Exploiting the matching information in the support set for few shot event classification. Adv. Knowl. Discov. Data Min. 2020, 12085, 233–245.
30. Wang, X.; Han, X.; Liu, Z.; Sun, M.; Li, P. Adversarial training for weakly supervised event detection. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 998–1008.
31. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
32. Wang, Z.; Wang, X.; Han, X.; Lin, Y.; Hou, L.; Liu, Z.; Li, P.; Li, J.; Zhou, J. CLEVE: Contrastive Pre-training for Event Extraction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 6283–6297.
33. Sandhaus, E. The New York Times Annotated Corpus. Linguist. Data Consort. 2008, 6, e26752.
34. Xu, D.; Li, J.; Zhu, M.; Zhang, M.; Zhou, G. Improving AMR Parsing with Sequence-to-Sequence Pre-training. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, 16–20 November 2020; pp. 2501–2511.
35. Tong, M.; Wang, S.; Cao, Y.; Xu, B.; Li, J.; Hou, L.; Chua, T.S. Image enhanced event detection in news articles. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 9040–9047.
36. Yang, Z.; Yang, D.; Dyer, C.; He, X.; Smola, A.; Hovy, E. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 1480–1489.
37. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015.
Type | Configuration | Parameters
---|---|---
hardware | CPU | Intel(R) Core(TM) i7-10875H CPU @ 2.30 GHz
hardware | GPU | Nvidia GeForce RTX 2060
hardware | memory | 16 GB DDR4 3200 MHz
software | operating system | Windows 10
software | compilation environment | Python 3.7
software | deep learning framework | TensorFlow 1.13
Parameters | Values
---|---
word embedding dimension | 100
entity type embedding dimension | 50
POS-tagging embedding dimension | 50
position embedding dimension | 50
LSTM hidden size | 100
dimension | 900
dimension | 900
batch size | 30
learning rate | 0.001
dropout | 0.5
regularization | 1 × 10⁻⁵
Methods | ACE2005 P | ACE2005 R | ACE2005 F1 | MAVEN P | MAVEN R | MAVEN F1
---|---|---|---|---|---|---
CRF (2012) | 65.3 | 59.7 | 62.4 | 53.8 | 52.4 | 53.1
DMCNN (2015) | 75.6 | 63.6 | 69.1 | 66.3 | 55.9 | 60.6
JRNN (2016) | 66.0 | 73.9 | 69.3 | 59.2 | 64.8 | 61.9
HBTNGMA (2018) | 77.9 | 69.1 | 73.3 | 62.5 | 63.4 | 62.9
JMEE (2018) | 76.3 | 71.3 | 73.7 | 61.6 | 63.2 | 62.4
DMBERT (2019) [B] | 77.9 | 72.5 | 75.1 | 62.7 | 72.3 | 67.1
MOGANED (2019) | 79.5 | 72.3 | 75.7 | 63.4 | 64.1 | 63.8
EE-GCN (2020) | 76.7 | 78.6 | 77.6 | 62.2 | 66.9 | 64.5
OntoED (2021) [B] | 77.9 | 76.8 | 77.3 | 63.1 | 71.2 | 66.9
HANN-ET | 76.4 | 78.8 | 77.6 | 63.9 | 67.5 | 65.6
HANN-ET [B] | 78.3 | 79.6 | 78.9 | 65.7 | 72.2 | 68.8
Methods | ACE2005 P | ACE2005 R | ACE2005 F1 | MAVEN P | MAVEN R | MAVEN F1
---|---|---|---|---|---|---
CRF (2012) | 65.6 | 59.2 | 62.2 | 54.5 | 51.8 | 53.1
DMCNN (2015) | 75.8 | 65.2 | 70.1 | 65.8 | 57.2 | 61.2
JRNN (2016) | 66.5 | 74.3 | 70.2 | 59.5 | 64.9 | 62.1
HBTNGMA (2018) | 78.4 | 67.8 | 72.7 | 62.8 | 63.6 | 63.2
JMEE (2018) | 76.1 | 70.9 | 73.4 | 61.3 | 63.7 | 62.5
DMBERT (2019) [B] | 78.2 | 73.7 | 75.9 | 62.5 | 73.6 | 67.6
MOGANED (2019) | 79.8 | 72.6 | 76.0 | 63.8 | 65.2 | 64.5
EE-GCN (2020) | 76.3 | 78.6 | 77.4 | 62.6 | 67.7 | 65.1
OntoED (2021) [B] | 77.6 | 77.1 | 77.3 | 63.5 | 71.3 | 67.2
HANN-ET | 75.8 | 78.4 | 77.1 | 63.6 | 68.7 | 66.1
HANN-ET [B] | 77.9 | 79.3 | 78.6 | 66.1 | 72.6 | 69.2
Methods | ACE2005 P | ACE2005 R | ACE2005 F1 | MAVEN P | MAVEN R | MAVEN F1
---|---|---|---|---|---|---
DMCNN | 75.8 | 65.2 | 70.1 | 65.8 | 57.2 | 61.2
HANN-ET-CNN | 75.1 | 67.7 | 71.2 | 62.7 | 61.5 | 62.1
Bi-GRU-pooling | 72.4 | 73.2 | 72.8 | 61.9 | 64.7 | 63.3
HANN-ET-GRU | 73.9 | 76.2 | 75.0 | 62.6 | 66.5 | 64.5
Bi-LSTM-pooling | 72.6 | 72.9 | 72.7 | 62.5 | 63.2 | 62.8
HANN-ET-LSTM | 75.2 | 77.3 | 76.2 | 63.3 | 67.4 | 65.3
HANN-ET-Mean | 73.5 | 76.0 | 74.7 | 62.1 | 66.1 | 64.0
HANN-ET | 75.8 | 78.4 | 77.1 | 63.6 | 68.7 | 66.1
Category | Subtypes | Number
---|---|---
General event types | Attack, Transport, Die, Charge-Indict, Meet, End-Position, Transfer-Money, Elect, Injure, Transfer-Ownership, Phone-Write, Start-Position, Trial-Hearing | 4460
Sparse event types | Be-Born, Marry, Divorce, Sue, Start-Org, Merge-Org, Appeal, Pardon, End-Org, Demonstrate, Nominate, Arrest-Jail, Release-Parole, Convict, Fine, Sentence, Execute, Extradite, Acquit, Declare-Bankruptcy | 889
Methods | General P | General R | General F1 | Sparse P | Sparse R | Sparse F1
---|---|---|---|---|---|---
DMCNN (2015) | 87.5 | 80.3 | 83.7 | 89.2 | 46.2 | 60.9
JRNN (2016) | 90.8 | 81.6 | 86.0 | 89.7 | 49.6 | 63.9
JMEE (2018) | 91.7 | 82.4 | 86.8 | 90.8 | 50.3 | 64.7
MOGANED (2019) | 91.4 | 81.2 | 86.0 | 91.5 | 50.7 | 65.2
EE-GCN (2020) | 92.2 | 83.7 | 87.7 | 90.7 | 51.6 | 65.8
OntoED (2021) [B] | 93.6 | 82.9 | 87.9 | 92.3 | 52.8 | 67.2
HANN-ET | 92.5 | 83.5 | 87.8 | 90.6 | 53.2 | 67.0
HANN-ET [B] | 93.1 | 84.4 | 88.5 | 91.4 | 54.5 | 68.3