Article

Collaboration of Large and Small Models for Event Type Discovery in Completely Open Domains

Laboratory for Big Data and Decision, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(18), 2928; https://doi.org/10.3390/math12182928
Submission received: 22 August 2024 / Revised: 17 September 2024 / Accepted: 19 September 2024 / Published: 20 September 2024

Abstract

Event type discovery in open domains aims to automate the induction of event types from completely unlabeled text data. Conventionally, small models rely on clustering techniques to address this task; however, the fully unsupervised nature of these methods leads to suboptimal performance. Recently, large language models (LLMs) have demonstrated excellent contextual understanding and can provide additional task-relevant information for specific scenarios, albeit with challenges in precision and cost-effectiveness. In this paper, we use an LLM to guide the clustering of event texts and distill this process into fine-tuning tasks for training smaller pre-trained language models. This approach enables effective event type discovery even in scenarios lacking annotated data. The study unfolds in three stages: in action acquisition, LLMs extract type-relevant information from each event text, ensuring that the event representations attend to task-specific details; in clustering refinement and dual-fine-tune, LLMs refine the clustering results from both task-agnostic and task-specific perspectives, and the refinement process is cast as fine-tuning tasks under the two viewpoints to adjust the encoders; and in type generation, after clustering, LLMs generate meaningful event type labels for each cluster. Experiments show that our method outperforms current state-of-the-art approaches and excels at event type discovery even in completely open domains with no labeled data.

1. Introduction

Events, as a distinct category of information, represent the fundamental units through which humans interpret and experience the world. Typically, events are characterized by the occurrence of actions or state transitions. Although event information extraction methods have achieved considerable success, they depend heavily on human-curated real-world event ontologies and meticulously structured supervision schemas. Because of the open and dynamic nature of real-world information, obtaining such supervision is challenging. Consequently, there is an increasing need to automatically extract novel events from open information sources and to infer new event types, including those that have not previously appeared.
To address the challenges presented by open domains, several approaches have been proposed, including event type induction [1,2] and event schema induction [3,4]. These methods aim to enable systems to autonomously identify new event types and extract instances corresponding to these types. However, most of these approaches have struggled to adapt fully to completely open domains. Until recently, they often relied on models accessing prior knowledge, such as trigger word mentions [2] or partially known event types [5], to facilitate the discovery of new types within semi-open contexts.
In completely open domains, where only the event text itself is available, type discovery naturally assumes the form of a clustering task. Current mainstream methods utilize pre-trained language models to encode text into high-dimensional vector spaces, followed by clustering algorithms [6]. However, when applied to raw event text, vectors generated by pre-trained language models such as BERT [7] predominantly capture surface-level similarities, failing to encapsulate critical information pertinent to event types within the text [8]. Additionally, the event types derived from these methods often lack practical significance and are typically represented in a highly abstract manner [5].
Recently, large language models have demonstrated exceptional capabilities in information extraction [9]. In completely open domains, these models are adept at comprehensively understanding context and delivering task-specific key information according to user preferences and instructions [10,11,12], thereby facilitating event type discovery. Despite their impressive summarization and discovery abilities, the immense scale of LLMs presents significant challenges for real-world deployment [13,14], and their outputs are frequently imprecise and difficult to fine-tune.
In this paper, we introduce a novel framework that integrates both large and small models to address the task of event type discovery. The main purpose is to utilize the generative and summarization capabilities of an LLM to guide smaller models through three main stages of event type discovery, thereby enabling the smaller models to effectively identify event types even in entirely open and unannotated domains. Our approach is distinguished by its design in three key stages: First, in the action acquisition stage, it employs the summarization capabilities of LLMs to extract type-relevant information from event texts, enabling the smaller models to generate event representations focused on type information. Second, in the clustering refinement and dual-fine-tune stage, it uses an LLM to refine clustering results, with the refinement process structured as fine-tuning tasks from both task-agnostic and task-specific perspectives, thus equipping the smaller models with task-specific preferences to enhance adaptive adjustments. Finally, in the type generation stage, it extracts samples from each cluster and utilizes an LLM to summarize these samples into a set of event types with practical significance.
Our approach utilizes an LLM to inject highly relevant information about event types as task-specific perspectives, thereby offering supervisory signals for event types within entirely open domains without the necessity of any additional prior information. Concurrently, it transfers knowledge from the LLM into fine-tuning tasks by executing dual fine-tuning on the encoder from both task-specific and task-agnostic perspectives (reflecting global information about events). This strategy circumvents the need for intricate task-specific adjustments directly to the LLM.
In summary, the main contributions of this article are three-fold:
  • Our approach functions on an event corpus within completely open domains, without the need for prior knowledge or manual annotation. By using straightforward instructions, it allows LLMs to discern information pertinent to event types and reconstruct event types with substantive semantic content.
  • We utilize an LLM to optimize clustering results by integrating user intent and task perspectives into the encoder. Adjustments are made to the encoder from both task-agnostic and task-specific perspectives, thereby achieving a significant enhancement in clustering performance.
  • We assess the performance of our model using two extensively utilized datasets. The results demonstrate that, in fully open domains, our approach significantly outperforms the current state-of-the-art models.

2. Related Work

In this section, we first review the existing research on event type discovery. Subsequently, based on the employed methodologies, we provide an overview of the advancements in research on LLMs and text clustering.

2.1. Event Type Discovery

The primary objective of the event type discovery task is to uncover potential event types within text that may not have been predefined or labeled. Formally, this task bears similarity to the verb clustering task presented in QasemiZadeh et al. [15], which focuses on mapping contextually embedded verbs to semantic frames. Given the open, zero-shot nature of the task, most methods employ semi-supervised, unsupervised, or self-supervised learning techniques, aiming to accomplish the task without relying on additional prior knowledge.
Huang and Ji [1] defined a semi-supervised model, SS-VQ-VAE, which leverages a vector quantization variational autoencoder model to utilize known types. ETypeClus [2] models events using (predicate, object) structures, defining these structures as P-O pairs. It then reconstructs a clustering algorithm based on these P-O pairs. However, due to the constraints imposed by the P-O pair structure, this method is restricted to scenarios where event types are triggered by predicate verbs, which limits its generalizability. Tang et al. [16] designed an Event Schema Harvester (ESHer), which automatically generates high-quality event schemas through contextual concept generation, credibility-aware schema structuring, and graph-based schema aggregation. Although this method demonstrates good performance, it relies on predefined known event types. Recently, Li et al. [5] introduced a novel open type discovery approach grounded in transfer learning principles. This approach identifies event types by leveraging knowledge from existing types and represents these event types through abstract semantic models generated by the system.
The aforementioned methods exhibit two principal limitations in the event type discovery task: (1) They incorporate additional information to varying degrees, such as relying on ground-truth trigger word annotations or leveraging known types, which diminishes the openness of the discovery scenario. (2) The resulting event types are typically represented in abstract formats, such as keywords or predicate–object pairs, rendering them challenging to apply effectively in downstream tasks.

2.2. LLMs for Text Clustering

LLMs exhibit remarkable comprehension capabilities, enabling users to perform tasks with minimal instructions. The key advantage of this approach is that LLMs can effectively align data with task perspectives without necessitating additional auxiliary information. This feature offers substantial support for text clustering tasks that lack supervisory signals.
In recent years, a growing body of research has utilized LLMs to enhance text clustering, demonstrating significantly improved performance. Viswanathan et al. [14] utilized LLMs to provide guidance at various stages of clustering, enhancing input features, applying clustering constraints, and refining results. This facilitates a balance between cost and accuracy, leading to improved clustering outcomes. Zhang et al. [13] introduced CLUSTERLLM, a framework that employs large models to guide text clustering. The adjustment of clustering through LLMs is framed as a fine-tuning task applied to the encoder, which effectively integrates user intent and task perspectives into the encoder, thereby optimizing the clustering results. Wan et al. [17] leveraged LLMs within a two-stage framework for extensive text mining. This approach employs LLMs to automate the complete pipeline of label generation and assignment, efficiently managing the entire process from clustering to label allocation for any specified use case. Given that text clustering tasks often fail to provide interpretable labels for clusters, recent research has sought to mitigate these challenges by utilizing LLMs for topic modeling [18,19].
In our approach, we build upon the principles of the previously mentioned methods, leveraging LLMs to integrate essential information from the perspective of event types in open-domain contexts. This integration significantly improves the effectiveness of event type discovery.

3. Problem Definition

Given an unlabeled event text corpus $S = \{S_1, \ldots, S_N\}$, which is a collection of event sentences, where $S_i = [w_1, \ldots, w_n] \in S$ represents a sequence of tokens in an event sentence, our goal is to infer a set $\mathcal{T} = \{T_1, \ldots, T_K\}$ from $S$, where $T_i \in \mathcal{T}$ represents an event type and $K$ is the number of discovered event types. To achieve this, the unlabeled corpus $S$ is used as input, and the output is a clustering assignment $Y = \{y_i\}_{i=1}^{N}$, which maps each input text to a cluster index. Based on the clustering results, a meaningful label is assigned to each cluster, thus resulting in $\mathcal{T}$.

4. Methodology

The overall framework of the proposed method is depicted in Figure 1. This method employs the distinct functionalities of LLMs to facilitate the discovery of event types through three stages. We detail the working mechanisms of each stage in the following sections.

4.1. Stage I: Action Acquisition

In this stage, our primary goal is to identify information that exhibits a strong correlation with event types and to help the model capture these critical details during the clustering process. We define an "action" as the pivotal information that reflects an event type and employ an LLM to extract these actions from events. Subsequently, we develop a salience score to evaluate the candidates and select the actions most strongly relevant to the event types.

4.1.1. Action Extraction

Previous research [2] has demonstrated that the majority of event types are triggered by phrases containing predicate verbs and their objects, which are regarded as highly pertinent to identifying event types. However, this structural approach limits the information framework by requiring events to contain predicate verbs. In reality, according to conventional definitions of events, nearly all events can be described as occurrences of actions. These actions are not exclusively triggered by predicate verbs; alternative forms such as gerunds, infinitives, and prepositional phrases can also signal the occurrence of actions. In other words, such structures in event texts, once lemmatized, align with verbs denoting actions. Based on this analysis, this paper defines all elements in event texts that reflect the occurrence of actions as the actions of the event. Actions may appear in various components or forms within events, but they consistently denote the occurrence of an action, and events of similar types generally involve analogous actions.
While we have identified actions that reflect event types, we are unable to directly guide the model to extract this information solely based on the event texts. Consequently, we employ an LLM to capture the actions present within the events. The advanced comprehension and generalization abilities of the LLM facilitate a nuanced understanding of our objectives and enable the extraction of actions from the events through straightforward instructions. Specifically, we utilize GPT-4o for the task of extracting actions from each event. We provide the LLM with a prompt comprising the event text, extraction steps, and the desired output format. The LLM then generates the candidate actions as output.
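To make the interface concrete, the following sketch shows how such an extraction call could be issued with the OpenAI Python client; the prompt wording, the helper name extract_candidate_actions, and the line-per-action output format are illustrative assumptions rather than our exact prompt.

```python
# Illustrative sketch of the action-extraction call described above.
# The prompt text and output format are assumptions, not the exact prompt used.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ACTION_PROMPT = (
    "You are given one event sentence.\n"
    "Step 1: Identify every expression that denotes the occurrence of an action "
    "(verbs, gerunds, infinitives, or prepositional phrases).\n"
    "Step 2: Lemmatize each expression into a short action phrase.\n"
    "Output: one action phrase per line, nothing else.\n\n"
    "Event: {event}"
)

def extract_candidate_actions(event_text: str) -> list[str]:
    """Ask GPT-4o for the candidate actions of a single event sentence."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ACTION_PROMPT.format(event=event_text)}],
        temperature=0.0,
    )
    # One action phrase per returned line; strip bullet characters and whitespace.
    return [line.strip("-· ").strip()
            for line in response.choices[0].message.content.splitlines()
            if line.strip()]
```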

4.1.2. Action Selection

Upon acquiring multiple candidate actions for an event, we find that these actions vary in their contribution to the discovery of event types. Some actions exhibit a strong correlation with event type information, while others show weak or negligible relevance, potentially obscuring or even undermining the clarity of event type information. Consequently, our objective in this step is to select those actions from the candidate pool that are most strongly associated with the event type to enhance the clarity of event type information.
In an entirely open and unannotated context, we assess the salience of actions using two key criteria: (1) Events of the same type are typically triggered by similar actions, so significant actions should be frequently observed among the candidate actions. (2) Significant actions associated with a specific event type should be infrequent in events of other types, ensuring that such actions are not overly common throughout the event corpus. Given a set of candidate actions $A = \{A_i\}_{i=1}^{N}$, where $A_i = \{a_j\}_{j=1}^{M}$ is the set of candidate actions for the $i$-th event and $a_j = [w_i, \ldots, w_k]$ is the token list of the $j$-th candidate action in $A_i$, and following the TF-IDF concept, the salience of each candidate action $a_j$ can be calculated as follows:
$$\mathrm{Salience}(a_j) = \sum_{k=1}^{K} \left(1 + \log\big(\mathrm{freq}(w_k)\big)^{2}\right)\log\left(\frac{|S|}{\mathrm{freq}_S(w_k)}\right),$$
where $\mathrm{freq}(w_k)$ is the frequency of the word $w_k$ among the candidate actions, $|S|$ is the number of event sentences in the corpus, and $\mathrm{freq}_S(w_k)$ is the sentence frequency of the word $w_k$ in the corpus. By analyzing the salience of the candidate actions, we select the action that most effectively represents the event type information.
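As a rough illustration, the sketch below computes this salience score directly from the formula above; the function names, tokenization, and tie-breaking details are our assumptions, not the exact implementation.

```python
# A minimal sketch of the TF-IDF-style salience score defined above.
import math
from collections import Counter

def build_salience_fn(all_candidate_actions, sentences):
    """all_candidate_actions: list of token lists (every candidate in the corpus);
       sentences: list of token lists (all event sentences in the corpus)."""
    action_freq = Counter(tok for action in all_candidate_actions for tok in action)
    sent_freq = Counter()
    for sent in sentences:
        sent_freq.update(set(sent))          # sentence frequency: count each word once per sentence
    n_sentences = len(sentences)

    def salience(action_tokens):
        score = 0.0
        for w in action_tokens:
            tf = 1.0 + math.log(action_freq[w]) ** 2                   # frequent among candidate actions
            idf = math.log(n_sentences / max(sent_freq.get(w, 1), 1))  # but not ubiquitous in the corpus
            score += tf * idf
        return score

    return salience

# Usage: keep the highest-salience candidate for each event.
# salience = build_salience_fn(all_candidates, all_sentences)
# best_action = max(candidates_of_one_event, key=salience)
```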

4.2. Stage II: Clustering Refinement and Dual-Fine-Tune

In this stage, we begin by employing two identical encoders to separately encode the text and the actions, and then merge these vector representations to form the input for clustering. Following the initial clustering results, we aim to pinpoint the samples with the highest levels of uncertainty and reassign them to alternative clusters. We leverage an LLM to refine the clustering outcomes, recording the refinement process as triplets for the subsequent fine-tuning of the encoders. Specifically, we adjust the encoders from both the original event text and the action perspectives to achieve optimal encoding results from different viewpoints.

4.2.1. Entropy-Based Triplet Formation

Based on the results from Section 4.1.2, we initialize two encoders, $f_S$ and $f_A$, to encode the text and the action, respectively, resulting in the encoded vector sets $H_S = \{h_i^s = f_S(s_i)\}_{i=1}^{N}$ and $H_A = \{h_i^a = f_A(a_i)\}_{i=1}^{N}$. We then obtain the merged vector set $H = \{h_i = h_i^s + h_i^a \mid h_i^s \in H_S,\ h_i^a \in H_A\}$ by directly summing the corresponding text and action encodings, which serves as the input for clustering. We choose BERT as the encoder, and the two initial encoders share the same structure and parameters.
After obtaining the encoded set $H$, we perform clustering on $H$. Based on the clustering results, we compute the cluster center $c_k$ as the average of the embeddings assigned to cluster $k$:
$$c_k = \frac{1}{N_k} \sum_{i \in C_k} h_i,$$
where $c_k$ is the cluster center of cluster $k$, $N_k$ is the number of embeddings in cluster $k$, $C_k$ is the set of indices of embeddings in cluster $k$, and $h_i$ is an embedding assigned to cluster $k$. Our goal is to identify the instances with the highest uncertainty within the clusters, which are considered the “most ambiguous” because their membership among different clusters is the most uncertain. Following previous work [13], we identify the most ambiguous instances based on entropy and use them as anchors. First, we use the Student’s t-distribution to obtain the probability of each instance being assigned to each cluster:
$$p_{ik} = \frac{\big(1 + \lVert h_i - c_k \rVert^2 / \alpha\big)^{-\frac{\alpha+1}{2}}}{\sum_{k'} \big(1 + \lVert h_i - c_{k'} \rVert^2 / \alpha\big)^{-\frac{\alpha+1}{2}}}.$$
We set the degrees of freedom in this formula to $\alpha = 1$. This distribution allows an instance to belong to multiple clusters simultaneously, with a different probability for each cluster. According to the probability distribution $p_{ik}$, the clusters with the largest probabilities are regarded as the K-nearest-neighbor (KNN) clusters of instance $i$, where the number of nearest clusters is
$$\mathrm{KNN} = \max(\epsilon K, 2).$$
Here, $\epsilon$ is set to a small value and should be adjusted according to the size of the dataset. By definition, the entropy $E$ of a discrete random variable $X$ is calculated as
$$E(X) = -\sum_i P(x_i) \log P(x_i),$$
where $x_i$ represents the possible values of $X$, and $P(x_i)$ is the probability of $x_i$. From Formulas (2)–(4), we can obtain the information entropy of instance $i$:
$$e_i = -\sum_{k=1}^{\mathrm{KNN}} \bar{p}_{ik} \log(\bar{p}_{ik}),$$
where $\bar{p}_{ik}$ is the re-normalized probability of instance $i$ within the $\mathrm{KNN}$ nearest clusters. At this point, we have obtained the set $E = \{e_i\}_{i=1}^{Q}$ of entropy values. By sorting the entropy values in $E$, we select the samples with the highest entropy as anchor points; these anchors are the ambiguous samples in the clustering results, whose cluster membership is uncertain. After selecting the anchor instances, we choose the two clusters closest to each anchor, and then select one instance from each of these two clusters, $X_1$ and $X_2$, as the two candidate instances to compare against the anchor instance. In our experiments, we observe that setting the number of triplets $Q$ to 1024 yields a substantial improvement on an event dataset comprising approximately 5000 instances.
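The whole sampling procedure of this subsection can be summarized in a short sketch. The code below uses NumPy and scikit-learn; the K-Means call, the default value of ε, and the random selection of candidates from the two nearest clusters are assumptions consistent with, but not identical to, our implementation.

```python
# Sketch of the entropy-based anchor and candidate selection (Section 4.2.1).
import numpy as np
from sklearn.cluster import KMeans

def sample_triplets(H, n_clusters, eps=0.2, n_triplets=1024, alpha=1.0, rng=None):
    """H: (N, d) merged embeddings. Returns a list of (anchor, cand1, cand2) indices."""
    rng = rng if rng is not None else np.random.default_rng(0)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(H)
    centers = np.stack([H[labels == k].mean(axis=0) for k in range(n_clusters)])

    # Student's t soft assignment p_ik, then entropy over the KNN nearest clusters.
    d2 = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1) / 2)
    p = q / q.sum(axis=1, keepdims=True)

    knn = max(int(eps * n_clusters), 2)
    top = np.sort(p, axis=1)[:, -knn:]            # probabilities of the KNN nearest clusters
    top = top / top.sum(axis=1, keepdims=True)    # re-normalize -> p_bar
    entropy = -(top * np.log(top + 1e-12)).sum(axis=1)

    anchors = np.argsort(-entropy)[:n_triplets]   # most ambiguous instances
    triplets = []
    for a in anchors:
        k1, k2 = np.argsort(-p[a])[:2]            # two closest clusters of the anchor
        x1 = rng.choice(np.where(labels == k1)[0])
        x2 = rng.choice(np.where(labels == k2)[0])
        triplets.append((int(a), int(x1), int(x2)))
    return triplets
```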

4.2.2. Cluster Correction Based on LLM

According to Section 4.2.1, we obtain a triplet $(X_a, X_1, X_2)$, where $X_a$ is the instance in the clustering results that needs correction, and $X_1$ and $X_2$ are samples randomly selected from two different clusters, representing possible correction directions for $X_a$. We first specify the user perspective focusing on event types, then ask the LLM to decide which of $(X_1, X_2)$ better aligns with $X_a$ in terms of event type. Based on the LLM’s choice, the selected sample becomes the positive candidate $X_a^+$, and the other becomes the negative candidate $X_a^-$. We thus obtain a new triplet $(X_a, X_a^+, X_a^-)$, which reflects the preferences of the LLM from the event type perspective and captures the user’s intent and task view. This triplet assists the next step in fine-tuning the encoder, ensuring that the generated embeddings focus on event type information and thereby achieving improved clustering performance.
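A minimal sketch of this triplet query is shown below; the prompt text and the ask_llm callable are placeholders we introduce for illustration, not our exact interface.

```python
# Illustrative triplet query to the LLM (Section 4.2.2); the prompt text is an assumption.
CHOICE_PROMPT = (
    "Focus only on the TYPE of event described.\n"
    "Anchor event: {anchor}\n"
    "Candidate 1: {c1}\n"
    "Candidate 2: {c2}\n"
    "Which candidate describes the same event type as the anchor? Answer '1' or '2'."
)

def refine_triplet(ask_llm, anchor_text, cand1_text, cand2_text):
    """ask_llm: any callable that sends a prompt string and returns the model's text reply."""
    answer = ask_llm(CHOICE_PROMPT.format(anchor=anchor_text, c1=cand1_text, c2=cand2_text))
    if answer.strip().startswith("1"):
        positive, negative = cand1_text, cand2_text
    else:
        positive, negative = cand2_text, cand1_text
    return anchor_text, positive, negative   # (X_a, X_a+, X_a-)
```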

4.2.3. Fine-Tuning Embedder from Different Perspectives

To obtain clustering results that accurately reflect distinct event types, a fundamental approach is to ensure that the vector embeddings used in clustering are strongly aligned with the event types; specifically, the embeddings should be rich in information relevant to event types. Building on this principle, we propose a task-specific fine-tuning procedure for the embedding model to create an embedding space that more effectively captures event type distinctions. For each instance, which has associated positive and negative samples, we define the fine-tuning task as a self-supervised contrastive learning problem. This task aims to enhance embedding quality by maximizing the similarity between anchor and positive examples while minimizing similarity with negative examples. In line with prior research [20], for the triplet $(X_a, X_a^+, X_a^-)$, we seek to optimize the following objective:
$$\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp\big(f(X_a) \cdot f(X_a^+)/\tau\big)}{\sum_{X \in B} \exp\big(f(X_a) \cdot f(X)/\tau\big)}.$$
Here, $B$ represents all samples in a batch, including the positive $X_a^+$ corresponding to $X_a$ in the triplet and all negative examples $X$ within the batch. $\tau$ is the temperature coefficient used to adjust the distribution of similarity scores, controlling the model’s ability to distinguish between negative samples. For the fine-tuning tasks under the two perspectives, the encoder $f$ and the content of the triplets in the objective change accordingly. In this paper, for the task-specific event type perspective, the instances in the triplets are composed of actions as described in Section 4.2.1, and the encoder is $f = f_A$. In contrast, for the task-agnostic global information perspective, the instances in the triplets are composed of the original event texts, and the encoder is $f = f_S$.
The fine-tuned encoder can be used with our sampling method to identify triplets that contain more information and iteratively improve performance. We obtain new clustering assignments by running clustering algorithms on the adjusted embeddings, and then perform the next round of sampling and fine-tuning until the model converges on the task.
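For concreteness, a compact PyTorch sketch of the contrastive objective above is given below. Using τ = 0.01 follows the experimental setup; treating every negative in the batch as a negative for each anchor and computing similarity as the dot product of L2-normalized embeddings are our assumptions.

```python
# A compact PyTorch sketch of the in-batch contrastive objective above.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor_emb, pos_emb, neg_emb, tau=0.01):
    """anchor_emb, pos_emb, neg_emb: (B, d) embeddings of X_a, X_a+ and X_a-.
    Each anchor is contrasted against its own positive and every negative in the batch."""
    a = F.normalize(anchor_emb, dim=-1)
    p = F.normalize(pos_emb, dim=-1)
    n = F.normalize(neg_emb, dim=-1)
    pos_logit = (a * p).sum(dim=-1, keepdim=True) / tau   # (B, 1): sim(X_a, X_a+)
    neg_logits = (a @ n.T) / tau                           # (B, B): sim(X_a, all in-batch negatives)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    target = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, target)                 # -log softmax of the positive
```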

4.3. Stage III: Type Generation

In this stage, we leverage the final clustering outcomes to determine event types. Initially, we identify the Top K samples that are closest to the centroid within each cluster of the final clustering results. These samples are subsequently fed into the LLM, which, utilizing the semantic information from these K samples, produces a thematic summary and categorizes the information into specific event types.
Referring to Equation (2), we initially compute the centroid c k for each cluster. Based on these centroids, we compute the Euclidean distance between each sample within each cluster and its respective centroid. Using this distance metric, we select the top K samples that are closest to the centroid, thereby exemplifying the core characteristics and features of their respective clusters. Once identified, these samples are fed into the LLM alongside manually curated example types. This process aims to generate event type labels for each cluster. Subsequently, these labels undergo refinement by consolidating redundant themes and eliminating less frequent types. The manually curated example types, derived from a subset of samples within the corpus, do not feature in the final set of obtained event types. Their primary role is to guide the LLM in transforming textual information into types, thereby reflecting a semantic mapping from text to types.
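A brief sketch of this centroid-based sample selection is given below; the value of k and the downstream prompt construction are assumptions for illustration.

```python
# Sketch of the Stage III sample selection: for each cluster, collect the K
# instances whose embeddings are closest (in Euclidean distance) to the centroid.
import numpy as np

def nearest_to_centroid(H, labels, k=5):
    """H: (N, d) embeddings; labels: (N,) cluster assignments.
    Returns a dict mapping cluster id -> indices of the k samples nearest to its centroid."""
    selected = {}
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = H[idx].mean(axis=0)
        dists = np.linalg.norm(H[idx] - centroid, axis=1)
        selected[int(c)] = idx[np.argsort(dists)[:k]].tolist()
    return selected

# The selected texts, together with a few manually curated example types, would
# then be placed in a prompt asking the LLM to name each cluster's event type.
```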

5. Experiments

5.1. Datasets

We select two publicly available event datasets that both contain English event description texts from the general domain. ACE-2005 (https://catalog.ldc.upenn.edu/LDC2006T06, accessed on 1 June 2024) is a classic event dataset widely used in event research. ERE-EN [21] is another dataset, from the Entities, Relations, and Events annotation task created under the Deep Exploration and Filtering of Text (DEFT) program; we include it because it covers more recent articles. We adopt the same data processing strategy as in previous work [22]. The ACE dataset contains 3805 sentences with 33 event types, and the ERE dataset has 4581 sentences with 38 types. We test the performance of our approach on event type discovery and event mention clustering. The statistics of the processed datasets are shown in Table 1.

5.2. Implementation

We implemented all algorithms using the PyTorch [23] and Transformers [24] libraries. Whenever an LLM was required, we employed GPT-4o. The encoder bert-large-uncased was chosen as our initialization and fine-tuned specifically for our tasks. In Section 4.2.1 and Section 4.2.3, the number of clusters was set to match the number of ground-truth event types. We set the number of triplets to Q = 1024 and the temperature coefficient in the optimization objective to τ = 0.01. We initialized the fine-tuning learning rate at 2 × 10⁻⁶, with AdamW [25] serving as the optimizer. The maximum number of gradient steps was set to 1280, with a batch size of 4, and training was conducted over 20 epochs. For the baseline models, we kept their original parameter settings unchanged. All experiments in this article were conducted on a Linux server equipped with two RTX 3090 GPUs.

5.3. Evaluation Metrics

We evaluate our method from two aspects: event type discovery results and clustering results. For event type discovery, we follow the approach of [2,3] and manually inspect whether the discovered clusters can be mapped to the ground-truth event types. We assess the clustering results using several standard metrics.
We denote the ground-truth type labels as $Y^*$, the predicted cluster labels as $Y$, and the total number of event mentions as $N$.
Adjusted Rand Index (ARI) [26] measures the similarity between two clustering assignments based on the number of pairs that are consistently or inconsistently clustered. The formula is defined as follows:
$$ARI = \frac{RI - E(RI)}{\max(RI) - E(RI)},$$
$$RI = \frac{TP + TN}{\binom{N}{2}},$$
In this context, $TP$ refers to the number of element pairs placed in the same cluster in both $Y$ and $Y^*$, while $TN$ refers to the number of pairs placed in different clusters in both. Meanwhile, $E(RI)$ denotes the expected value of $RI$ under random clustering.
Normalized mutual information (NMI) quantifies the normalized mutual information between two clustering assignments. The formula is defined as follows:
$$NMI = \frac{2 \times MI(Y^*; Y)}{H(Y^*) + H(Y)},$$
where $MI$ denotes the mutual information between the two clustering assignments, and $H$ represents entropy.
Accuracy (ACC) measures clustering quality by finding the permutation that maps predicted cluster IDs to ground-truth IDs with the highest accuracy. Let $y_i$ and $y_i^*$ denote the predicted cluster label and the true label of the $i$-th instance, respectively. ACC is defined as follows:
$$ACC = \max_{\sigma \in \mathrm{Exchange}(k)} \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}\big(y_i^* = \sigma(y_i)\big),$$
where $\mathrm{Exchange}(\cdot)$ denotes the set of permutations over cluster IDs. In this paper, we use the Hungarian algorithm to find the optimal permutation, and $\mathbb{I}$ represents the indicator function.
For all of these metrics, higher values indicate better model performance.
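The three clustering metrics can be computed as in the sketch below, using scikit-learn for ARI and NMI and SciPy’s Hungarian algorithm (linear_sum_assignment) for the permutation in ACC; the assumption that labels are non-negative integers is ours.

```python
# Sketch of the clustering metrics: ACC via the Hungarian algorithm, plus NMI and ARI.
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Best one-to-one mapping from predicted cluster IDs to ground-truth IDs."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                                      # co-occurrence of predicted vs. true labels
    rows, cols = linear_sum_assignment(count.max() - count)   # maximize matched pairs
    return count[rows, cols].sum() / len(y_true)

def evaluate(y_true, y_pred):
    return {
        "ACC": clustering_accuracy(y_true, y_pred),
        "NMI": normalized_mutual_info_score(y_true, y_pred),
        "ARI": adjusted_rand_score(y_true, y_pred),
    }
```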

5.4. Baselines

We compare our method with the following approaches in a fully open zero-shot setting:
  • SS-VQ-VAE [1]: A semi-supervised zero-shot event detection model. It employs a variational autoencoder to learn discrete event features. SS-VQ-VAE is trained based on seen event types and annotations, and it can be applied to produce clusters on unseen types.
  • Triframes [27]: A graph-based clustering algorithm that constructs a k-nearest neighbor (k-NN) event mention graph and utilizes the fuzzy graph clustering method WATSET to produce clusters.
  • JCSC [3]: A joint constrained spectral clustering approach that iteratively refines clustering outcomes through the use of constraint functions, ensuring that predicates and objects with mutual dependencies form coherent clusters.
  • ETYPECLUS [2]: A method that introduces P-O (predicate–object) pairs as a novel representation of event type information and redefines the clustering algorithm based on these pairs. Furthermore, it proposes a joint embedding and clustering algorithm in latent space.
  • ESHer [16]: An event schema harvester. It generates high-quality event schemas by employing context-aware conceptualization, credibility-informed schema structuring, and graph-based schema clustering techniques.
  • TABS [5]: An open event type discovery with type abstraction. The notion of type abstraction is introduced through the development of a collaborative training framework that generalizes information from known types to previously unknown types.
We evaluate these baseline models under identical open conditions, where only raw event text is provided and access to any ground-truth information is restricted. Performance assessments are conducted for all models within this setup.

6. Results and Analysis

6.1. Main Results

Event Clustering Result. The primary results of the event clustering are presented in Table 2. We evaluate our method against baseline models on two datasets. The results demonstrate that our approach attains superior performance across all three clustering evaluation metrics when applied to a completely open event corpus. Compared to other baseline methods, our model’s advantage lies in integrating user intent and task perspectives through LLMs throughout the clustering process, which enables the model to concentrate on event-related information at each stage. In this context, ESHer performs relatively well among the baselines, since this method also employs LLMs to generate abstract event pattern graphs during its initial phase. The application of LLMs in event clustering effectively mitigates the challenges of insufficient auxiliary information and supervision signals in open domains. LLMs offer targeted support under specific task perspectives, allowing the model to directly address event types and thus outperform other methods.
Event Type Discovery. Our approach was applied to each of the two datasets to identify candidate event clusters. Following the methodology of Huang et al. [3], we manually verified whether the discovered clusters align with ground-truth event types. In ACE-2005, we successfully identified 27 out of 33 event types, while in ERE-EN, we identified 20 out of 38 event types. We present a selection of recovered event types in Table 3, including example events and their associated actions. From the results, it is evident that our method accurately identifies ground-truth event type labels, demonstrating a strong correlation between actions and event types, thereby effectively capturing event type information.

6.2. Ablation Analysis

To explore the influence of different stages on the final performance, we conducted ablation experiments primarily targeting the initial two stages. Specifically, we separately removed the action extraction and action selection steps to evaluate their impact on performance. Moreover, during the fine-tuning phase, we performed experiments with only one perspective maintained at a time—either the task-specific or the task-agnostic perspective—to assess how information from different perspectives affects performance. The results of these ablation experiments are presented in Table 4.
The results indicate that the removal of action information has a significant impact on the final performance. Action information, which reflects the task perspective and is highly correlated with event types, is crucial for capturing and utilizing key information in an open domain. Its absence substantially reduces the method’s effectiveness. Furthermore, retaining all action information may also adversely affect performance. This is because an event can involve multiple actions, each with varying degrees of relevance to the event type; including actions with weak relevance can introduce noise and degrade performance.
In the fine-tuning task, experimental results demonstrate that action-only fine-tuning yields better performance compared to text-only fine-tuning. This indicates that fine-tuning the model from a task-specific perspective enhances its ability to focus on event type information. Post-fine-tuning, the encoder is more attuned to the type information within events, though this may result in the loss of some global information that also reflects event types. Fine-tuning exclusively from a global perspective leads to a more pronounced decline in performance because the encoder attends to all semantic information in the event, thereby neglecting the local information pertinent to event types, which does not align with the objectives of our task.

6.3. Clustering Algorithms

We also evaluated the performance of different clustering algorithms by replacing them within the method, as detailed in Table 5. Specifically, we compared K-Means with three alternative algorithms: Agglomerative Clustering (AggClus), DBSCAN, and Gaussian Mixture Model (GMM). Our findings indicate that K-Means significantly outperforms the other three algorithms. The BERT-encoded text vectors generally exhibit a relatively homogeneous distribution in high-dimensional space, enabling K-Means to utilize Euclidean distance effectively for clustering. The K-Means optimization technique is well suited for handling such high-dimensional data and can exploit the semantic information inherent in the encoded vectors to produce more precise clusters. Conversely, DBSCAN is adept at identifying clusters of varying shapes and demonstrates some robustness to noise. Given that BERT vectors typically possess certain density characteristics, DBSCAN can effectively detect dense cluster regions. However, its performance is highly sensitive to parameter settings, which can lead to considerable variability in results. Agglomerative clustering, while capable of generating hierarchical structures, may suffer from computational complexity issues when applied to high-dimensional text vectors. The computational overhead and distance calculations involved in hierarchical clustering can result in less accurate outcomes compared to other algorithms, particularly with large datasets. Additionally, BERT vectors often deviate from the Gaussian distribution assumption, which adversely affects GMM performance. The distribution of text data generally does not conform to Gaussian models, especially in high-dimensional spaces, leading to the poorest performance of GMM, as it struggles to accurately capture the underlying distribution of text vectors.
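The comparison in Table 5 amounts to swapping the clustering call on the same embeddings. A scikit-learn sketch of this is shown below; the DBSCAN and GMM hyperparameters are placeholder values, not the settings used in the experiments.

```python
# Sketch of the clustering-algorithm swap evaluated in Table 5.
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

def cluster_embeddings(H, n_clusters, algorithm="kmeans"):
    """Cluster the (N, d) embedding matrix H with the chosen algorithm."""
    if algorithm == "kmeans":
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(H)
    if algorithm == "agglomerative":
        return AgglomerativeClustering(n_clusters=n_clusters).fit_predict(H)
    if algorithm == "dbscan":
        # DBSCAN ignores n_clusters; eps and min_samples are placeholder values.
        return DBSCAN(eps=0.5, min_samples=5).fit_predict(H)
    if algorithm == "gmm":
        return GaussianMixture(n_components=n_clusters).fit_predict(H)
    raise ValueError(f"unknown algorithm: {algorithm}")
```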

6.4. Qualitative Analysis

Figure 2 illustrates the results of five iterations of the proposed method. Throughout the iterative process, we sample triplets from the previously fine-tuned embedding space and proceed with further fine-tuning of the model, using the preceding checkpoint as the initialization point. The performance variations following the fine-tuning of the embedder are presented from two distinct perspectives. Our analysis reveals that performance generally improves and reaches a plateau by the fifth iteration. Specifically, for task-specific perspectives, the model exhibits faster convergence and more significant enhancements. In contrast, for task-agnostic perspectives, the convergence rate is notably slower.
We further assess the performance of the method by visualizing the clustering results as illustrated in Figure 3. This involves comparing the initial clustering outcomes with those obtained after iterative refinement on two datasets. The figure reveals that the application of our method leads to a notable enhancement in clustering quality. There is a marked improvement in the separation between different types, and the representation of event types within the clusters becomes significantly more precise.

7. Limitations

In this paper, we employ a synergistic approach that integrates both large and small models to identify event types within open-domain contexts. While this methodology offers several advantages, it also presents notable limitations that affect its effectiveness and generalizability. One significant challenge is that LLMs are inherently difficult to control and fine-tune. The variability of LLM outputs can introduce biases and propagate errors throughout the event type discovery process, affecting the consistency and reliability of the results. Additionally, LLM outputs may reflect biases present in the training data, which can skew event type discovery and limit the model’s ability to generalize across diverse domains.
Furthermore, the construction of fine-tuning tasks involves sampling anchor points, which currently lacks iterative refinement. This static sampling strategy can lead to a decrease in the relevance and informational value of anchor points as the clustering process evolves. As a result, the effectiveness of clustering may diminish over time, impacting the overall performance of the method. Addressing these limitations requires ongoing refinement of the tuning process and sampling strategies to enhance the robustness and fairness of our approach.

8. Conclusions

In this paper, we introduce a synergistic approach that integrates large and small models to address the challenge of event type discovery in open-domain contexts. This approach effectively addresses critical issues in existing methodologies, such as the lack of scenario openness and the weak alignment between clustering outcomes and task objectives. The LLM provides perspective information aligned with the task goals throughout the event type discovery process, and this guidance is distilled into fine-tuning tasks that allow the small model to leverage it. This design ensures that the intermediate information generated at each stage is highly pertinent to the event types, thereby enhancing overall performance. Empirical results indicate that our method can precisely reconstruct event types in completely open domains without relying on any prior information.

Author Contributions

Conceptualization, J.L. and P.H.; methodology, J.L. and B.G.; formal analysis and investigation, H.X. and L.L.; writing—original draft preparation, J.L.; writing—review and editing, B.G. and H.X.; supervision, B.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All of the datasets used in this paper come from Linguistic Data Consortium (LDC) website https://catalog.ldc.upenn.edu/LDC2006T06 (accessed on 1 June 2024), previous studies [2] and publicly available sources. LDC datasets are accessible online for academic use with corresponding licenses and other datasets are freely accessible online without copyright constraints.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Huang, L.; Ji, H. Semi-supervised New Event Type Induction and Event Detection. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 718–724. [Google Scholar] [CrossRef]
  2. Shen, J.; Zhang, Y.; Ji, H.; Han, J. Corpus-based Open-Domain Event Type Induction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 5427–5440. [Google Scholar] [CrossRef]
  3. Huang, L.; Cassidy, T.; Feng, X.; Ji, H.; Voss, C.R.; Han, J.; Sil, A. Liberal Event Extraction and Event Schema Induction. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 258–268. [Google Scholar] [CrossRef]
  4. He, M.; Fang, T.; Wang, W.; Song, Y. Acquiring and Modelling Abstract Commonsense Knowledge via Conceptualization. arXiv 2022. [Google Scholar] [CrossRef]
  5. Li, S.; Ji, H.; Han, J. Open Relation and Event Type Discovery with Type Abstraction. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 6864–6877. [Google Scholar] [CrossRef]
  6. Zhang, D.; Nan, F.; Wei, X.; Li, S.W.; Zhu, H.; McKeown, K.; Nallapati, R.; Arnold, A.O.; Xiang, B. Supporting Clustering with Contrastive Learning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 5419–5430. [Google Scholar] [CrossRef]
  7. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
  8. Zhao, J.; Gui, T.; Zhang, Q.; Zhou, Y. A Relation-Oriented Clustering Method for Open Relation Extraction. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 9707–9718. [Google Scholar] [CrossRef]
  9. Xu, D.; Chen, W.; Peng, W.; Zhang, C.; Xu, T.; Zhao, X.; Wu, X.; Zheng, Y.; Chen, E. Large Language Models for Generative Information Extraction: A Survey. arXiv 2023. [Google Scholar] [CrossRef]
  10. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020. [Google Scholar]
  11. Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.L.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. In Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
  12. OpenAI. GPT-4 Technical Report. arXiv 2023. [Google Scholar] [CrossRef]
  13. Zhang, Y.; Wang, Z.; Shang, J. ClusterLLM: Large Language Models as a Guide for Text Clustering. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 13903–13920. [Google Scholar] [CrossRef]
  14. Viswanathan, V.; Gashteovski, K.; Lawrence, C.; Wu, T.; Neubig, G. Large Language Models Enable Few-Shot Clustering. Trans. Assoc. Comput. Linguist. 2024, 12, 321–333. [Google Scholar] [CrossRef]
  15. QasemiZadeh, B.; Petruck, M.R.L.; Stodden, R.; Kallmeyer, L.; Candito, M. SemEval-2019 Task 2: Unsupervised Lexical Frame Induction. In Proceedings of the 13th International Workshop on Semantic Evaluation, Minneapolis, MN, USA, 6–7 June 2019; pp. 16–30. [Google Scholar] [CrossRef]
  16. Tang, J.; Lin, H.; Li, Z.; Lu, Y.; Han, X.; Sun, L. Harvesting Event Schemas from Large Language Models. arXiv 2023. [Google Scholar] [CrossRef]
  17. Wan, M.; Safavi, T.; Jauhar, S.K.; Kim, Y.; Counts, S.; Neville, J.; Suri, S.; Shah, C.; White, R.W.; Yang, L.; et al. TnT-LLM: Text Mining at Scale with Large Language Models. arXiv 2024. [Google Scholar] [CrossRef]
  18. Pham, C.; Hoyle, A.; Sun, S.; Resnik, P.; Iyyer, M. TopicGPT: A Prompt-based Topic Modeling Framework. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, 16–21 June 2024; pp. 2956–2984. [Google Scholar] [CrossRef]
  19. Wang, Z.; Shang, J.; Zhong, R. Goal-Driven Explainable Clustering via Language Descriptions. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023; pp. 10626–10649. [Google Scholar] [CrossRef]
  20. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735. [Google Scholar] [CrossRef]
  21. Song, Z.; Bies, A.; Strassel, S.; Riese, T.; Mott, J.; Ellis, J.; Wright, J.; Kulick, S.; Ryant, N.; Ma, X. From Light to Rich ERE: Annotation of Entities, Relations, and Events. In Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation, Denver, CO, USA, 4 June 2015; pp. 89–98. [Google Scholar] [CrossRef]
  22. Zhang, S.; Ji, T.; Ji, W.; Wang, X. Zero-Shot Event Detection Based on Ordered Contrastive Learning and Prompt-Based Prediction. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, USA, 10–15 July 2022; pp. 2572–2580. [Google Scholar] [CrossRef]
  23. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035. [Google Scholar]
  24. Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, 16–20 November 2020; pp. 38–45. [Google Scholar] [CrossRef]
  25. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  26. Hubert, L.; Arabie, P. Comparing partitions. J. Classif. 1985, 2, 193–218. [Google Scholar] [CrossRef]
  27. Ustalov, D.; Panchenko, A.; Kutuzov, A.; Biemann, C.; Ponzetto, S.P. Unsupervised Semantic Frame Induction using Triclustering. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Melbourne, Australia, 15–20 July 2018; pp. 55–62. [Google Scholar] [CrossRef]
Figure 1. Overview of our method: gray lines denote distinct stages, symbols represent LLM–user interactions, shapes indicate clustering types, and uncolored points show samples with indeterminate outcomes.
Figure 2. Clustering accuracy improvement over iterations, shown from two perspectives using relative accuracy (each iteration’s accuracy divided by the maximum observed accuracy).
Figure 3. t-SNE feature visualization of the embeddings of the clustering results on ACE-2005 and ERE-EN. We select 10 types from each dataset, denoted by colors.
Table 1. The statistical information of the ACE-2005 and ERE-EN datasets, where T represents the number of event types, and N represents the number of samples.

            ACE-2005           ERE-EN
            |T|     |N|        |T|     |N|
Total       33      3805       38      4581
Mean        -       115.30     -       120.55
Stdev       -       206.32     -       169.58
Table 2. Event clustering results shown as percentages. Averages and standard deviations are from 5 random experiments (ESHer was run once due to its high cost). Triframes cannot be assessed by accuracy, as ACC assumes a one-to-one correspondence between generated and ground-truth clusters.

Dataset     Model                             ACC            NMI            ARI
ACE-2005    SS-VQ-VAE (Huang and Ji [1])      21.19 ± 0.93   25.66 ± 0.72   10.24 ± 0.37
            Triframes (Ustalov et al. [27])   -              26.83 ± 0.33   13.54 ± 0.43
            JCSC (Huang et al. [3])           25.39 ± 0.66   37.50 ± 0.56   29.04 ± 0.82
            ETYPECLUS (Shen et al. [2])       39.62 ± 0.73   48.35 ± 0.41   33.29 ± 0.05
            ESHer (Tang et al. [16])          40.71 ± 0.00   48.29 ± 0.00   27.83 ± 0.00
            TABS (Li et al. [5])              42.23 ± 1.32   44.19 ± 0.47   34.41 ± 0.46
            Proposed                          53.25 ± 1.11   59.75 ± 0.65   43.53 ± 0.99
ERE-EN      SS-VQ-VAE (Huang and Ji [1])      14.38 ± 0.77   21.33 ± 0.04   13.79 ± 0.48
            Triframes (Ustalov et al. [27])   -              13.46 ± 0.11    8.31 ± 0.72
            JCSC (Huang et al. [3])           23.44 ± 0.89   29.05 ± 0.21   12.22 ± 0.39
            ETYPECLUS (Shen et al. [2])       31.56 ± 0.95   32.04 ± 0.19   18.09 ± 0.55
            ESHer (Tang et al. [16])          35.53 ± 0.00   47.20 ± 0.00   20.03 ± 0.00
            TABS (Li et al. [5])              30.73 ± 1.77   36.81 ± 0.44   21.34 ± 0.96
            Proposed                          49.62 ± 1.16   50.29 ± 0.38   34.53 ± 0.69
Table 3. Examples of event types and corresponding sentences identified by our method in the ACE-2005 and ERE-EN datasets. The first three types are from ACE-2005, and the last three are from ERE-EN. Each entry lists the discovered type (ground-truth type), the number of clustered samples (ground-truth number), the extracted actions, and example sentences.

Election (Personnel:Elect), 72 samples (131)
  Actions: win election; re-elect president
  Example sentences:
  · You become honorable by winning an election of some sort or having a high post, being an ambassador or a governor or some thing.
  · my focus is on re-electing president bush and dick cheney next year, the convention is going to be here in the city of new york.

Injury (Life:Injure), 65 samples (80)
  Actions: cause injury; suffered injury
  Example sentences:
  · Her parents said they don’t know yet what caused their daughter’s injuries.
  · it took the child three weeks to die in the hot from – hospital from the injuries it suffered at the hands of its mother and she’s free.

Nomination (Personnel:Nominate), 7 samples (10)
  Actions: candidate for the nomination; name a new head
  Example sentences:
  · Senator Christopher Dodd of Connecticut made the announcement today that he would not be the 10th candidate for the nomination.
  · and the pope will reportedly name a new head of the troubled boston archdiocese this week.

Meet (Contact:Meet), 130 samples (198)
  Actions: chair meeting; have appointment
  Example sentences:
  · Typically, the lieutenant governor would chair meetings of the council, a fractious and often rebellious panel that dates to the Colonial era.
  · I had my first doctor appointment at 10 weeks, then at 13 and today at 18.

Departure (Personnel:End-Position), 79 samples (140)
  Actions: get fired; be release
  Example sentences:
  · Now he’s gotten in trouble for it, and I hope that if his intentions were what I think they were, he gets fired.
  · Martin-Artajo, a Spaniard who worked in the bank’s London office, was released soon after his surrender and arrest.

Divorce (Life:Divorce), 13 samples (21)
  Actions: divorce; be divorced
  Example sentences:
  · The couple divorced, remarried and divorced again.
  · My perspective is coming from the fact that I’m divorced and my XH tried to milk me for everything I’m worth, even though he cheated on me.
Table 4. Ablation experiment results. “w/o” indicates removed model parts. Bold values show the best results among models.

Model                     ACE-2005                    ERE-EN
                          ACC     NMI     ARI         ACC     NMI     ARI
Ours                      53.25   59.75   43.53       49.62   50.29   34.53
w/o action extraction     39.53   49.25   35.76       38.93   42.8    21.37
w/o action selection      47.82   54.31   38.29       41.23   46.62   26.18
only action-fine-tune     48.94   55.06   40.67       45.12   48.23   30.62
only sentence-fine-tune   44.95   52.83   38.14       42.07   43.31   28.74
Table 5. The results of applying different clustering algorithms. We use five random seeds and report the average and standard deviation as the final outcomes.

Clustering Algorithm   ACE-2005                                       ERE-EN
                       ACC            NMI            ARI              ACC            NMI            ARI
AggClus                37.53 ± 0.69   46.92 ± 0.81   31.73 ± 0.72     30.22 ± 0.65   39.71 ± 0.58   22.04 ± 0.99
DBSCAN                 39.21 ± 1.05   51.15 ± 0.76   37.29 ± 0.83     31.03 ± 1.09   43.12 ± 0.79   28.52 ± 0.94
GMM                    31.49 ± 0.71   42.06 ± 0.53   29.52 ± 0.39     27.39 ± 0.94   30.26 ± 0.87   24.18 ± 0.47
K-Means (ours)         53.25 ± 1.11   59.75 ± 0.65   43.53 ± 0.98     49.62 ± 1.16   50.29 ± 0.38   34.53 ± 0.69