This section identifies the essential elements required for D2T generation and introduces the innovative DecoStrat framework, designed to harness these components and unlock the potential of language models for D2T generation. Additionally, the visual representation of the DecoStrat framework illustrates its core components and their interactions, thereby offering a comprehensive understanding of the framework’s architecture.
3.1. Neural Language Modeling for D2T Generation
D2T generation can be formulated as a sequence-to-sequence problem, where the goal is to learn a mapping from an input sequence X to a target sequence Y. Let $X = (x_1, x_2, \ldots, x_m)$ be the input sequence, and $Y = (y_1, y_2, \ldots, y_T)$ be the target sequence. The goal is to model the conditional probability $P(Y \mid X)$, which can be viewed as a next-word prediction problem, where the model predicts the next word in the target sequence given the context of the input sequence and the previous words in the target sequence.
A sequence-to-sequence model consists of encoder and decoder components. Traditionally, researchers have used Recurrent Neural Networks (RNNs) for the encoder and decoder. However, RNNs have significant limitations that make them less effective for modeling sequential data: their sequential processing is slow and inefficient, and they suffer from the vanishing gradient problem, which hinders the training of deep RNNs. In contrast, transformer models such as T5 and BART have become the current state of the art for sequence-to-sequence tasks. They use self-attention mechanisms to process input sequences in parallel and capture long-range dependencies [26,36], making them a superior choice for our purposes.
The encoder transforms the input sequence $X$ into a continuous representation $h$, and the decoder generates the target sequence $Y$ based on $h$ in an autoregressive manner, where each output token is generated based on the previous tokens in the target sequence. The encoder and decoder can be represented as functions $f$ and $g$, respectively:

$$h = f(X) = \mathrm{Concat}(h_1, h_2, \ldots, h_m), \qquad y_t = g(h, y_{<t})$$

where Concat denotes the concatenation operation, combining multiple vectors ($h_1$, $h_2$, …, $h_m$) into a single vector $h$, to form a continuous representation of the input sequence $X$.
The probability of generating each token $y_t$ in the target sequence is computed using the softmax function:

$$P(y_t \mid y_{<t}, X) = \mathrm{softmax}\big(g(h, y_{<t})\big)$$

where $y_{<t} = (y_1, \ldots, y_{t-1})$ is the sequence of previous target outputs. In the context of the decoder function $g$, the softmax function is applied to the output of $g$; let us denote this vector as $z$. The softmax function takes the output vector $z$ from the decoder and returns a probability distribution over all possible tokens in the vocabulary. This is done by computing the exponential of each element in $z$, normalizing the resulting values, and returning a vector of probabilities that sum to 1.
Then, the softmax function can be defined as

$$\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{N} e^{z_j}}$$

where $z_i$ is the $i$-th element of the vector $z$, $N$ is the number of elements in the vector $z$, and $e$ is a mathematical constant approximately equal to 2.718.
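As a minimal illustration, the softmax computation can be sketched in Python (the logit values here are toy numbers, not the output of an actual decoder):

```python
import math

def softmax(z):
    # Subtract the maximum logit before exponentiating; this standard
    # stabilization trick leaves the resulting probabilities unchanged.
    m = max(z)
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]

# Toy decoder output vector z over a 3-token vocabulary.
probs = softmax([2.0, 1.0, 0.1])
# probs is a valid distribution: every entry is positive and the entries sum to 1.
```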
The overall probability of generating the target sequence $Y$ given the input sequence $X$ is

$$P(Y \mid X) = \prod_{t=1}^{T} P(y_t \mid y_{<t}, X)$$

The training objective is to maximize the likelihood of the target sequence $Y$ given the input sequence $X$:

$$\hat{\theta} = \operatorname*{arg\,max}_{\theta} \sum_{n=1}^{N} \log P\big(Y^{(n)} \mid X^{(n)}; \theta\big)$$

where $P(Y^{(n)} \mid X^{(n)}; \theta)$ is the likelihood of the target sequence given the input sequence, $X^{(n)}$ and $Y^{(n)}$ are the input and target sequences for the $n$-th example in the training set, $N$ is the number of examples in the training set, and $\theta$ is the set of model parameters.
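To make the link between the product over time steps and the log-likelihood objective concrete, here is a small sketch (the per-token probabilities are assumed toy values, not outputs of a trained model):

```python
import math

def sequence_log_likelihood(token_probs):
    # The product of per-step probabilities P(y_t | y_<t, X) becomes a
    # sum of logs, which is what training maximizes over the dataset.
    return sum(math.log(p) for p in token_probs)

# Toy per-token probabilities for a 3-token target sequence.
ll = sequence_log_likelihood([0.9, 0.8, 0.7])
# exp(ll) recovers the overall sequence probability P(Y | X).
```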
Decoding Methods
The decoder generates the target sequence $Y$ based on the continuous representation $h$ in an autoregressive manner, relying on its prediction history to construct the output sequence $\hat{Y}$ during inference. The decoding method determines which token to choose at each time step, playing a crucial role in the quality of the generated output as it navigates the complex probability space defined by the language model [31]. We explore three distinct decoding scenarios in D2T generation: deterministic methods that produce a single output, stochastic methods that introduce randomness, and Minimum Bayes Risk (MBR) methods that select the best candidate output from the generated options.
Deterministic decoding methods: These methods involve selecting the most likely token at each time step, resulting in a single, deterministic output sequence. This is achieved by choosing the token with the highest probability from the vocabulary set $V$ at each time step $t$, as in Equation (6):

$$\hat{Y} = d\Big(\operatorname*{arg\,max}_{y_t \in V} P(y_t \mid y_{<t}, X),\ \tau\Big) \quad (6)$$

Equation (6) can represent both greedy search and beam search, which differ in their decoding method $d$ and their parameters $\tau$. In greedy search, $d$ iteratively selects the most likely token at each step, with $\tau$ including a beam size of 1 [31,37,38], whereas, in beam search, $d$ maintains a set of top-$k$ hypotheses, with $\tau$ including the beam size [31,39,40,41,42]. We can view both methods as different instantiations of the decoding method $d$ and its parameters $\tau$, used to determine the output sequence $\hat{Y}$.
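The greedy case can be sketched as follows; `toy_probs` is a hypothetical stand-in for the model's next-token distribution $P(y_t \mid y_{<t}, X)$, not a real language model:

```python
def greedy_decode(next_token_probs, vocab, max_len, eos="<eos>"):
    # Greedy search: at every step keep only the single most probable
    # token, i.e. beam search with a beam size of 1.
    y = []
    for _ in range(max_len):
        probs = next_token_probs(y)          # distribution over the vocabulary
        token = max(vocab, key=lambda w: probs[w])
        y.append(token)
        if token == eos:
            break
    return y

vocab = ["ok", "no", "<eos>"]

def toy_probs(prefix):
    # Fixed toy distribution: prefer "ok" twice, then end the sequence.
    if len(prefix) < 2:
        return {"ok": 0.7, "no": 0.2, "<eos>": 0.1}
    return {"ok": 0.1, "no": 0.1, "<eos>": 0.8}

out = greedy_decode(toy_probs, vocab, max_len=5)
# out == ["ok", "ok", "<eos>"]
```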
Stochastic decoding methods: These methods involve randomly sampling from the probability distribution $P(y_t \mid y_{<t}, X)$ at each time step $t$, resulting in a stochastic output sequence. This is achieved by introducing randomness into the output generation process, as shown in Equation (7):

$$\hat{Y} \sim d\big(P(y_t \mid y_{<t}, X),\ \tau\big) \quad (7)$$

Equation (7) indicates that we can obtain the predicted output sequence $\hat{Y}$ by sampling from the probability distribution $P(y_t \mid y_{<t}, X)$, where the decoding method $d$ and its parameters $\tau$ determine the specific sampling strategy by controlling how the sampling is carried out. Two popular stochastic decoding methods are top-k sampling, which samples from the k most probable words at each time step [23], and top-p sampling, which samples from a subset of the most probable words whose cumulative probability equals or exceeds a threshold p at each step [13]. We can represent both as instantiations of $d$ with $\tau$ containing the sampling parameters $k$ or $p$.
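A sketch of the top-p (nucleus) filtering step, using a toy distribution (the probability values are assumed for illustration):

```python
import random

def top_p_filter(probs, p):
    # Keep the smallest set of most probable tokens whose cumulative
    # probability reaches the threshold p, then renormalize.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cum = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cum += prob
        if cum >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}

dist = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
nucleus = top_p_filter(dist, p=0.7)   # keeps only "a" and "b"
token = random.choices(list(nucleus), weights=list(nucleus.values()))[0]
```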
MBR decoding methods: This method involves generating multiple candidate outputs and selecting the best one based on a utility function such as BLEURT [22,24,34]. This can be achieved by first generating a set of candidate outputs $C = \{\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_k\}$, as shown in Equation (8):

$$C = \big\{\hat{Y}_i \sim d\big(P(y_t \mid y_{<t}, X),\ \tau\big)\big\}_{i=1}^{k} \quad (8)$$

Then, the predicted output sequence $\hat{Y}$ is selected using the candidate selection strategy $\sigma$, as shown in Equation (9):

$$\hat{Y} = \sigma(C) = \operatorname*{arg\,max}_{Y' \in C} \sum_{Y'' \in C} u(Y', Y'') \quad (9)$$

where $u$ is the utility function.
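The MBR selection step of Equation (9) can be sketched as follows; a simple word-overlap ratio stands in here for a learned utility such as BLEURT:

```python
def mbr_select(candidates, utility):
    # Pick the candidate with the highest total utility measured
    # against all other candidates (Equation (9)).
    def score(c):
        return sum(utility(c, other) for other in candidates if other is not c)
    return max(candidates, key=score)

def overlap(a, b):
    # Toy utility: Jaccard overlap of word sets (a stand-in for BLEURT).
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

cands = [
    "your booking was successful",
    "booking was successful",
    "your booking succeeded",
    "no rooms left",
]
best = mbr_select(cands, overlap)
# best == "your booking was successful" (most representative of the set)
```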
We can evaluate the quality of the generated text $\hat{Y}$ for all scenarios using conventional automatic metrics such as BLEU [43], ROUGE [44], METEOR [45], CIDEr [46], or any other metric that measures the similarity or relevance of the generated text $\hat{Y}$ to the reference text $y$ [47]. Moreover, we can apply human evaluation to examine qualities of the generated text such as coherence, informativeness, relevance, and fluency [48,49].
3.2. Proposed Framework
The proposed DecoStrat framework integrates fine-tuned language models, alternative decoding methods, and candidate selection strategies through a set of interacting modules. This section presents the DecoStrat system architecture and the details of each module.
3.2.1. DecoStrat System Architecture
The DecoStrat system architecture combines fine-tuned language models with six interacting modules: Director, Generator, Manager, Ranker, Selector, and DecoDic. We designed this architecture to process input data effectively, using trained language models and alternative decoding methods to produce the output sequences. The high-level operation of the system is as follows.
The model training process builds upon the work of [30]. We initialized the parameters from pre-trained model checkpoints on the Hugging Face Hub [50], which provides a standardized framework for conducting experiments with transformer-based models on NLP tasks. The training and validation datasets were then loaded, and the training parameters were set. The experiment section provides the detailed implementation of the training procedure, including the model specifications and training parameters. During the inference stage, the Director receives input data, a language model, and a decoding method and then instructs the Generator to produce outputs. The Manager subsequently processes the generated output(s), determining the decoding method category, applying a selection strategy $\sigma$, and returning the selected candidate $\hat{Y}$. The interactions between these modules are crucial in determining the final output sequence. The DecoStrat system architecture, illustrated in Figure 2, visually depicts the main components and interactions that comprise the framework.
3.2.2. DecoDic
The DecoStrat framework comprises a crucial component, the DecoDic, a comprehensive dictionary of decoding methods that serves as a central hub for decoding methods and their corresponding parameters. The primary function of DecoDic is to validate and configure decoding methods employed by the Director and Manager modules, facilitating seamless integration and optimal performance in the D2T generation process.
As illustrated in Algorithm 1, DecoDic categorizes decoding methods into two main types: single output and multiple outputs. The single output category includes decoding methods, such as greedy search, that produce a single output for a given input sequence. The single output key in the dictionary contains a list of these methods, each represented as a nested dictionary in which the method name serves as the key and the corresponding parameters (parameter-name: parameter-value pairs) are stored as values. Conversely, the multiple outputs category encompasses techniques that generate multiple intermediate outputs, such as MBR decoding. DecoDic supports the D2T generation task by providing a standardized way of representing decoding methods and their parameters. The Director module uses the DecoDic function to validate its chosen decoding method, and the Manager module applies it to determine the category of the selected decoding method, enabling informed decisions about how to proceed with the task. This functionality ensures that the decoding methods used by the Director and Manager modules are valid and correctly configured for the D2T generation task.
Algorithm 1: Decoding methods dictionary.
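To illustrate the nested-dictionary structure described above, a hypothetical DecoDic might look like the following sketch (the method names and parameter values are illustrative assumptions, not the framework's exact configuration):

```python
# Hypothetical contents; real entries depend on the configured experiment.
DECO_DIC = {
    "single output": [
        {"greedy_search": {"num_beams": 1}},
        {"beam_search": {"num_beams": 5}},
    ],
    "multiple outputs": [
        {"mbr": {"top_p": 0.7, "top_k": 30, "k": 10}},
    ],
}

def category_of(method):
    # Used by the Director to validate a method and by the Manager to
    # decide whether one or many outputs must be processed.
    for category, methods in DECO_DIC.items():
        if any(method in entry for entry in methods):
            return category
    raise ValueError(f"Unsupported decoding method: {method}")
```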
3.2.3. Director
The Director module is one of the main components of the DecoStrat framework that oversees the generation of output sequences from the input data using a fine-tuned language model and decoding method. It coordinates the interactions between the Generator and Manager modules to produce the final output.
The Director module, shown in Algorithm 2, coordinates the generation of output sequences by validating the decoding method using the DecoDic function, generating output sequences using the Generator module, selecting the optimal candidate output sequence using the Manager module, and, finally, returning the predicted output sequence $\hat{Y}$. The Director module acts as a high-level controller, orchestrating the interactions between the Generator and Manager modules to produce the predicted output $\hat{Y}$. By directing the D2T generation process, the Director module keeps output generation controlled and efficient.
Algorithm 2: Director module.
1  module Director(X, d, θ_LM):
2      Ŷ ← empty list
3      category ← DecoDic(d)
4      if d not in category then
5          raise value error message
6      C ← Generator(X, d, θ_LM)
7      Ŷ ← Manager(C, d)
8      return Ŷ
Input: X: Input sequence. d: Decoding method. θ_LM: D2T generation language model.
Output: Ŷ: The predicted output sequence.
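The control flow of the Director can be sketched as follows; the `generate` and `manage` callables are toy stand-ins for the Generator and Manager modules, not the framework's actual implementations:

```python
def director(x, method, model, deco_dic, generator, manager):
    # Validate the decoding method against the dictionary, generate
    # candidate output(s), and delegate final selection to the Manager.
    supported = {m for methods in deco_dic.values()
                 for entry in methods for m in entry}
    if method not in supported:
        raise ValueError(f"Unsupported decoding method: {method}")
    candidates = generator(x, method, model)
    return manager(candidates, method)

# Toy configuration and module stand-ins.
deco_dic = {"single output": [{"greedy_search": {}}],
            "multiple outputs": [{"mbr": {}}]}
generate = lambda x, m, lm: [f"generated text for {x}"]
manage = lambda cands, m: cands[0]

y_hat = director("linearized MR", "greedy_search", None,
                 deco_dic, generate, manage)
```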
3.2.4. Generator
The Generator module is a crucial lower-level component of the DecoStrat framework, performing the actual D2T generation task. It takes input data, a language model, decoding methods, decoding parameters, and the number of candidate output(s) and produces a list containing the generated output sequence(s).
The Generator module, shown in Algorithm 3, produces output sequence(s) based on the provided inputs and the corresponding processing algorithm. The algorithm accommodates three distinct categories of decoding methods: deterministic, stochastic, and MBR.
Algorithm 3: Generate output sequence(s).
Deterministic decoding: When $k = 1$ and $d$ is deterministic, the algorithm generates a single output sequence by iteratively predicting the next token with the highest probability according to the language model $\theta_{LM}$.
Stochastic decoding: When $k = 1$ and $d$ is stochastic, the algorithm generates a single output sequence by sampling from the probability distribution predicted by $\theta_{LM}$.
MBR decoding: When $k > 1$ and $d$ is MBR, the algorithm produces $k$ distinct output sequences by sampling from the probability distribution predicted by $\theta_{LM}$.
The output is a list containing the generated output sequence(s): a single sequence for deterministic and stochastic decoding, or $k$ sequences for MBR decoding. Overall, the Generator module provides a flexible and efficient way to generate text from input data, supporting a range of decoding methods and parameters. A singular output from the module represents the final predicted output, whereas multiple outputs are subject to subsequent processing by other components within the DecoStrat framework.
3.2.5. Manager
The Manager module is a decision-making algorithm that considers multiple possible scenarios to select a single candidate output from a set of candidates based on the specified decoding method and selection strategy.
The Manager module, shown in Algorithm 4, checks the validity of the decoding method and selection strategy to determine the selection process. The process involves selecting a candidate using the Selector module, ranking candidates using the Ranker module, or applying a joint ranking and selection approach. The Manager module integrates the Ranker and Selector modules to provide a flexible approach to choosing the most suitable candidate. If the selection strategy is not recognized, the algorithm raises an error message. Finally, it returns the selected candidate output to the Director.
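The Manager's dispatch logic might be sketched as follows; the `rank` and `select` callables are toy stand-ins for the Ranker and Selector modules, and the strategy names are assumptions based on the "joint" strategy used later in the illustration:

```python
def manager(candidates, strategy, ranker, selector, top_n=3):
    # Route candidates through the Ranker, the Selector, or both,
    # depending on the configured selection strategy.
    if strategy == "rank":
        return ranker(candidates, top_n=1)[0]
    if strategy == "select":
        return selector(candidates)
    if strategy == "joint":
        return selector(ranker(candidates, top_n=top_n))
    raise ValueError(f"Unknown selection strategy: {strategy}")

# Toy stand-ins: rank by length (shortest first), select the first candidate.
rank = lambda cands, top_n: sorted(cands, key=len)[:top_n]
select = lambda cands: cands[0]

chosen = manager(["bbbb", "a", "cc"], "joint", rank, select)
# chosen == "a"
```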
3.2.6. Ranker
The Ranker module is a ranking algorithm that adapts the PageRank [51] algorithm to rank candidates based on their similarity scores (Algorithm 7).
3.2.7. Calculate Similarity Matrix
The Ranker module, outlined in Algorithm 5, takes in a set of candidate outputs, a threshold for convergence, a damping factor, the number of iterations, and a similarity matrix calculated by the Calculate Similarity Matrix (CSM) algorithm (Algorithm 6).
The Ranker algorithm initializes a score vector with uniform scores for each candidate and iteratively updates the score vector based on the similarity matrix and the damping factor until convergence or until it reaches a maximum number of iterations. Once the algorithm converges to a stable set of rankings, it returns the top $n$ ranked candidates. The Ranker operates in two modes: either directly selecting the best output based on its ranking when working individually, or ranking the top candidate outputs and passing the list to the Selector module when working jointly. We redesigned the algorithm to be well suited for D2T generation tasks, where the generated multiple outputs require ranking based on similarity. (In our implementation, df = 0.85, threshold = 1, epoch = 100, and the BERT base model were used.) The pairwise similarity scores themselves are computed by the Calculate Similarity Score routine (Algorithm 7).
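A PageRank-style score update over a candidate similarity matrix might look like the following sketch (the similarity values are toy numbers, and the convergence threshold of 1e-5 is an assumption, not the paper's exact setting):

```python
def rank_scores(sim, df=0.85, threshold=1e-5, max_iter=100):
    # sim[i][j] holds the similarity between candidates i and j.
    n = len(sim)
    # Column-normalize so each candidate distributes its score proportionally.
    col_sums = [sum(sim[i][j] for i in range(n)) or 1.0 for j in range(n)]
    scores = [1.0 / n] * n                    # uniform initialization
    for _ in range(max_iter):
        new = [(1 - df) / n
               + df * sum(sim[i][j] / col_sums[j] * scores[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, scores)) < threshold:
            return new                        # converged
        scores = new
    return scores

# Candidates 0 and 1 are near-paraphrases; candidate 2 is an outlier.
sim = [[0.0, 0.9, 0.1],
       [0.9, 0.0, 0.1],
       [0.1, 0.1, 0.0]]
scores = rank_scores(sim)
# Candidates 0 and 1 receive equal scores, higher than candidate 2.
```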
3.2.8. Selector
The Selector module selects an optimal output from a list of candidates based on pairwise similarity scores computed using a utility function.
The Selector module, as shown in Algorithm 8, initializes a matrix, calculates scores for each pair of candidates, and determines the sum of scores for each candidate. The algorithm then selects the output with the highest sum, indicating that it is the most relevant or representative of the overall set. The Selector can operate in two modes: either receiving candidates from the Generator and making the final selection of the best output based on utility scores when working individually, or receiving the ranked list of top $n$ candidates from the Ranker and making the final selection based on utility scores when working jointly. In the joint mode, the Ranker receives candidates from the Generator, ranks them, and passes the top $n$ candidates to the Selector. By doing so, the Ranker reduces the burden on the Selector, allowing it to focus on making the final selection from a reduced number of promising candidates. This collaborative approach enables the Ranker and Selector to work together efficiently, leveraging their strengths to produce the best possible output.
Algorithm 4: Manager module.
Algorithm 5: Rank candidates.
Algorithm 6: Calculate Similarity Matrix.
Algorithm 7: Calculate Similarity Score.
Algorithm 8: Select a candidate.
3.3. Illustration on DecoStrat Module Interactions
To demonstrate the flexibility and configurability of the DecoStrat framework, we present a scenario that showcases the seamless interaction of all its modules. This scenario highlights how the modules work together to produce the desired output. Using sample data from the MultiWOZ test dataset, we illustrate the operation of the DecoStrat framework in generating text from meaning representation (MR) input data. This example scenario requires the involvement of all modules, allowing us to observe their interactions and demonstrate the framework’s capabilities.
Given the inputs shown in Table 1, the DecoStrat modules interact to generate the output $\hat{Y}$ as follows:
Director: The Director module takes in the linearized input data $X$, decoding method $d$, and language model $\theta_{LM}$. It checks whether the decoding method is in the category of supported decoding methods using the DecoDic module, and it initializes empty lists for the selected candidate $\hat{Y}$ and the outputs $C$.
Generator: The Director module calls the Generator, passing in the input data $X$, decoding method $d$, and language model $\theta_{LM}$. The Generator module retrieves the number of candidates to be generated, $k$, and the decoding parameters $\tau$, where $\tau$ consists of top-p and top-k. It generates $k$ candidate outputs, shown in Table 2, for the given input data using the MBR decoding method with parameters top-p = 0.7 and top-k = 30.
Manager: The Director module calls the Manager, passing in the set of candidate outputs $C$. The Manager module also retrieves the decoding method $d$, selection strategy $\sigma$, the number of top-ranked candidates $n$, and utility function $u$. The Manager module initializes an empty list for the selected candidate $\hat{Y}$ and retrieves the category of decoding methods using the DecoDic function. Since we set the selection strategy $\sigma$ to a "joint" value, the Manager module integrates the Ranker and Selector modules so that they work together. The Ranker module takes the generated candidates, ranks them, and returns the top 3 candidates, shown in Table 3, to the Selector module. The Selector module selects the best candidate from the top 3 candidates, shown in Table 4, using the BLEURT utility function.
Output: The Director module returns the selected candidate as the output $\hat{Y}$.
Table 1. MultiWOZ sample data and configurable inputs to illustrate the interactions of DecoStrat modules.
DecoStrat Inputs | Values |
---|
MR | topic = general | intent = request more | topic = booking | intent = book | reference number = 85bgkwo4 | length of stay = 1 |
LM | /checkpt/multiwoz/T5base/epoch_22_step_1749/ |
d | MBR |
top-p | 0.7 |
top-k | 30 |
k | 10 |
σ | joint |
n | 3 |
u | BLEURT |
Table 2. Sample generated candidate outputs using the Generator module.
Cand-ID | Candidate Outputs |
---|
Cand-1 | Your booking was successful. The reference number is 85bgkwo4. You’ll be staying for 1 night. Is there anything else I can help you with? |
Cand-2 | Your booking for 1 night was successful. The reference number is 85bgkwo4. Do you need anything else? |
Cand-3 | I was able to book you for 1 night, your reference number is 85bgkwo4. Can I help you with anything else? |
Cand-4 | I have successfully booked your hotel for 1 night. Your reference number is 85bgkwo4. Can I help with anything else today? |
Cand-5 | I was able to book you for 1 night. Your reference number is 85bgkwo4. Can I help you with anything else? |
Cand-6 | Booking was successful for 1 night. Reference number is: 85bgkwo4. Is there anything else I can help you with? |
Cand-7 | Your booking was successful for 1 night. The reference number is 85bgkwo4. Can I help you with anything else today? |
Cand-8 | I have made that reservation for 1 night, your reference number is 85bgkwo4. Is there anything else I can help you with today? |
Cand-9 | Yes, I was able to book 1 night. The reference number is 85bgkwo4. Can I help you with anything else? |
Cand-10 | Booking was successful for 1 night. The table will be reserved for 15 min. Reference number is: 85bgkwo4. Anything else I can help with? |
Table 3. Ranked candidate outputs using the Ranker module.
RC-ID | Ranked Candidate Outputs |
---|
RC-1 | Your booking for 1 night was successful. The reference number is 85bgkwo4. Do you need anything else? [Cand-2] |
RC-2 | I was able to book you for 1 night. Your reference number is 85bgkwo4. Can I help you with anything else? [Cand-3] |
RC-3 | Your booking was successful for 1 night. The reference number is 85bgkwo4. Can I help you with anything else today? [Cand-4] |
Table 4. Selected candidate output using the Selector module.
SC-ID | Selected Candidate Output |
---|
SC | Your booking was successful for 1 night. The reference number is 85bgkwo4. Can I help you with anything else today? [Cand-4] |
In summary, the proposed framework enables the effective utilization of language models in D2T generation. It provides a flexible architecture that accommodates various decoding strategies and language models. At its core, the framework offers three key features: flexibility, separation of concerns, and abstraction. These features enable the building and customization of NLG systems using our framework. DecoStrat's flexibility allows for the integration of fine-tuned language models and the integration and customization of decoding methods and candidate selection strategies, making it adaptable to diverse applications and evolving requirements. The separation of concerns within the framework allows researchers to focus on designing diverse decoding methods without concerning themselves with the intricate details of other components. DecoStrat abstracts complex implementation details behind the PyTorch framework, transformer-based language models, and supporting utilities, making it simple and user-friendly. While DecoStrat is a promising solution, its effectiveness relies on careful consideration of factors such as task-specific model training, decoding parameter tuning, and the choice of utility function. With DecoStrat, users can seamlessly integrate and optimize different decoding strategies with language models, achieving better results in D2T generation tasks. Next, we experimentally evaluate the effectiveness of DecoStrat, considering task-specific model training, integration of the proposed decoding methods, and parameter tuning.