1. Introduction
This paper delves into the family of operator precedence languages (OPL), created by R. Floyd [1], which supports efficient bottom-up parsing [2,3]. Floyd's key insight was inspired by the structure of arithmetic expressions, where precedence is assigned to the multiplicative operators (∗ and /) over the additive operators (+ and −). Through the process of bottom-up parsing, the left side (non-terminal symbol) completely replaces the identified right-side production (body) in a context-free grammar (CFG), leaving no room for ambiguity. Operator precedence was extended to grammars whose right-hand sides consist of terminals, even when non-terminals separate them. A CFG is generally considered an operator precedence grammar (OPG) if it contains no productions with adjacent non-terminal symbols on the right-hand side.
OPLs have greatly aided the inference of context-free grammars. An example is described in [4,5,6], where a method is presented for inferring operator precedence-free grammars from bracketed samples. The method is based on constructing parse trees and on functions that produce the grammars. Precedence rules identify the brackets surrounding the terminal elements of a string in the sample, indicating the order in which the string should be evaluated.
Research on input-driven formal languages (IDL) [7,8,9] (later renamed Visibly Pushdown Languages (VPL) [10]) has concluded [11] that OP closure properties imply ID closure properties and that ID languages are a specific type of OP languages characterized by limited OP relations [12].
With the advancement of parallel computing, the local parsability properties of OPLs have been exploited to generate fast parallel parsers [13].
This paper’s main contribution is a novel approach to building precedence relationships between the terminal symbols of non-OPGs. The approach allows an operator precedence table to be generated and, with its support, bottom-up parsing of strings derived by the non-OPG.
The sets of Left and Right terminals proposed in previous works [1,11,14] to build precedence relationships between the terminals of an OPG were redefined.
A new set of terminals called Leftmost was defined to support solving precedence relationships when there are two or more adjacent non-terminals on the right side of the productions. This set systematically handles the precedence relationships in such scenarios, extending the bottom-up parsing process to non-OPGs.
An algorithm was established to build the precedence table with the support of the three sets of terminals proposed.
Finally, applying the proposed algorithms makes it possible to obtain from a non-OPG the precedence relations and the operator precedence table previously limited to OPGs. In addition, a bottom-up operator-precedence parser can then parse any string generated by the non-OPG, a technique previously applicable only to OPGs.
As demonstrated in this work, the proposed construction of the operator precedence table is not only a theoretical concept but a practical tool that applies to OPGs without restriction. At the end of the paper, the functionality is demonstrated by applying it to one OPG and two non-OPGs and carrying out the respective bottom-up parsing.
The paper is organized as follows:
Section 2 provides the necessary definitions to understand this work.
Section 3 presents the previous work carried out in the area of CFG.
Section 4 introduces our approach, detailing the problem we aim to solve and the redefinitions of previous works’ concepts used to develop our algorithms.
Section 5 presents examples of applying the proposed algorithm to one OPG and two non-OPGs, demonstrating the application of bottom-up parsing with the precedence tables to test their use.
Section 6 describes several exceptions where the rules described in the proposed algorithms do not apply. Finally,
Section 7 provides conclusions to our work.
2. Basic Definitions
2.1. Context-Free Grammars and Languages
Within Chomsky’s hierarchy [
15], context-free grammars play a significant role in programming and compilation language applications by addressing the syntactic structure of programming languages.
A context-free grammar (CFG) G generates a language L(G), commonly called a context-free language (CFL), which is composed of the strings generated by the CFG. The CFG is defined as a 4-tuple G = (T, N, S, P), where:
T represents the set of terminal symbols that create the strings within the CFL generated by the CFG.
N represents the set of non-terminal symbols or syntactic variables that determine strings leading to CFL generation.
S is the initial non-terminal symbol from which all strings defined by the CFG are generated.
P is the set of productions or rules that combine terminals and non-terminals to generate the strings. A production takes the form A ⟶ α, where A represents a non-terminal and α is a string of terminals and non-terminals. The symbol ⟶ is pronounced “produces”. Thus, the production A ⟶ α is read “A produces alpha”.
The following notation will be used: lowercase letters from the beginning of the alphabet (a, b, c, …) represent terminal symbols; uppercase letters from the beginning of the alphabet (A, B, C, …) represent non-terminal symbols; lowercase letters late in the alphabet (u, v, …, z) represent terminal strings; lowercase Greek letters (α, β, γ, …) generally represent strings of terminals and/or non-terminals; and the empty string will be represented by ε (this Greek letter is the exception to the above convention).
2.2. Operator Precedence Grammars (Basic Rule)
A CFG is considered an operator grammar if none of the productions in P have adjacent non-terminal symbols on their right side. In other words, for a production A ⟶ α in G, α does not take the form β B C γ with B, C ∈ N.
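As a concrete illustration, the basic operator-grammar rule can be checked mechanically. The sketch below encodes a CFG as plain Python data (the classic expression grammar and the dict layout are illustrative assumptions, not the paper's notation; the non-terminal T of the expression grammar is spelled "Tm" to avoid clashing with the terminal-set key) and tests for adjacent non-terminals:

```python
# A CFG encoded as plain Python data: G = (T, N, S, P).
# Terminals and non-terminals are strings; each production A -> X1..Xk
# is a (head, body) pair.
GRAMMAR = {
    "T": {"id", "+", "*", "(", ")"},               # terminal symbols
    "N": {"E", "Tm", "F"},                         # non-terminal symbols
    "S": "E",                                      # start symbol
    "P": [                                         # productions
        ("E", ["E", "+", "Tm"]), ("E", ["Tm"]),
        ("Tm", ["Tm", "*", "F"]), ("Tm", ["F"]),
        ("F", ["(", "E", ")"]), ("F", ["id"]),
    ],
}

def is_operator_grammar(g):
    """True iff no production body has two adjacent non-terminals."""
    for _, body in g["P"]:
        for x, y in zip(body, body[1:]):
            if x in g["N"] and y in g["N"]:
                return False
    return True

print(is_operator_grammar(GRAMMAR))  # True: the expression grammar is an OPG
```

A grammar with a production such as A ⟶ A B would make the same check return False, which is exactly the non-OPG case this paper addresses.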
2.3. Derivations
A string can be derived in k steps from a non-terminal A if there exists a sequence of strings of the form α₀, α₁, …, αₖ. Using the symbol ⟹, meaning one-step derivation, the sequence of strings is arranged in the form A = α₀ ⟹ α₁ ⟹ ⋯ ⟹ αₖ.
In this sequence, α₀ corresponds to A and αₖ to the derived string. Each intermediate αᵢ in the sequence has the form γBδ, where there is a production B ⟶ β such that, by substituting β for B on the right side, αᵢ₊₁ = γβδ is obtained. Derivation in one or more steps is denoted with the operator ⟹⁺, and in zero or more steps with ⟹∗. In general, every string w of terminal symbols of the language can be established as w ∈ L(G) or, expanding the derivation in one or more steps, S ⟹⁺ w. For a grammar G, a sentential form is a string α such that S ⟹∗ α, meaning that α can be obtained with zero or more derivations from S. If two grammars generate identical languages, they are considered equivalent.
2.4. Types of Derivations
According to [16], S = γ₀ ⟹ γ₁ ⟹ ⋯ ⟹ γₙ = w is a leftmost derivation of the string w when each γᵢ, 0 ≤ i < n, has the form xᵢAᵢδᵢ with xᵢ ∈ T∗, Aᵢ ∈ N, and δᵢ ∈ (N ∪ T)∗. Furthermore, Aᵢ ⟶ βᵢ belongs to P, and then, in each γᵢ, the symbol Aᵢ is substituted by βᵢ, resulting in the sentential form γᵢ₊₁ = xᵢβᵢδᵢ. Each γᵢ is called a left-sentential form.
If each γᵢ, 0 ≤ i < n, takes the form δᵢAᵢxᵢ with xᵢ ∈ T∗, Aᵢ ∈ N, and δᵢ ∈ (N ∪ T)∗, then a rightmost derivation is formed. Each γᵢ is called a right-sentential form.
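To make the rightmost-derivation mechanism concrete, the following sketch replaces the rightmost non-terminal of a sentential form step by step. The toy expression grammar (non-terminals E, Tm, F) is an assumption used only for illustration:

```python
# One step of a rightmost derivation: replace the RIGHTMOST non-terminal
# of the sentential form using the production head -> body.
PRODS = {
    "E": [["E", "+", "Tm"], ["Tm"]],
    "Tm": [["Tm", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
NONTERMS = set(PRODS)

def rightmost_step(form, head, body):
    """Substitute `body` for the rightmost non-terminal, which must be `head`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in NONTERMS:
            assert form[i] == head, "rightmost non-terminal is not `head`"
            return form[:i] + body + form[i + 1:]
    raise ValueError("no non-terminal left to expand")

# E => E+Tm => E+F => E+id => Tm+id => F+id => id+id
form = ["E"]
for head, body in [("E", ["E", "+", "Tm"]), ("Tm", ["F"]), ("F", ["id"]),
                   ("E", ["Tm"]), ("Tm", ["F"]), ("F", ["id"])]:
    form = rightmost_step(form, head, body)
print(" ".join(form))  # id + id
```

Reading the resulting sequence of sentential forms from last to first is exactly the reduction sequence a bottom-up parser performs.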
2.5. Parse Trees
According to reference [16], a derivation tree for a CFG G = (T, N, S, P) is a labeled, ordered tree in which each node receives a label corresponding to a symbol from the set N ∪ T ∪ {ε}. If a non-leaf node is labeled A and its immediate descendants are labeled X₁, X₂, …, Xₖ, then A ⟶ X₁X₂⋯Xₖ is a production in P. A labeled, ordered tree D is a derivation tree for the non-terminal A in a CFG G if:
The root of D is labeled A.
If D₁, D₂, …, Dₖ are the subtrees of the direct descendants of the root and the root of Dᵢ is labeled Xᵢ, then A ⟶ X₁X₂⋯Xₖ is in P.
Dᵢ is a derivation tree for Xᵢ if Xᵢ is in N, or Dᵢ is a single node labeled Xᵢ if Xᵢ is in T, or Dᵢ is a single node labeled ε.
Parse trees can be constructed from any derivation type.
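A parse tree can be represented directly as labeled nodes whose children mirror production bodies. The sketch below (the `Node` class and the expression grammar are illustrative assumptions) builds the tree for the string id + id and reads back its yield:

```python
# Minimal parse-tree sketch: a non-leaf labeled A with children X1..Xk
# mirrors a production A -> X1..Xk; the frontier of the tree, read left
# to right, is the derived terminal string.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

    def frontier(self):
        """Left-to-right leaf labels: the derived terminal string."""
        if not self.children:
            return [self.label]
        return [leaf for c in self.children for leaf in c.frontier()]

# E -> E + Tm; E -> Tm -> F -> id; Tm -> F -> id
tree = Node("E", [
    Node("E", [Node("Tm", [Node("F", [Node("id")])])]),
    Node("+"),
    Node("Tm", [Node("F", [Node("id")])]),
])
print(" ".join(tree.frontier()))  # id + id
```

The same tree is obtained whether the derivation used to build it was leftmost or rightmost, which is the point made above.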
2.6. Bottom-Up Parsing
Bottom-up syntax analysis [2], also known as shift-reduce parsing, attempts to build a parse tree for an input string starting at the leaves (tree bottom) and moving toward the root (tree top). This process is considered the reduction of the string w to the initial symbol S of a CFG. At each reduction step, a particular substring of the sentential form that matches the right side of a production is replaced by the non-terminal symbol on the left side of that production. If, at each step, the substring is chosen correctly, a rightmost derivation is traced out in reverse.
2.7. Handle
A handle [2,17] of a right-sentential form γ is a production A ⟶ β together with a position of γ where the string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ. Generally, if S ⟹∗ αAw ⟹ αβw, then A ⟶ β in the position following α is a handle of αβw. If the grammar is ambiguous, more than one handle may be obtained; if the grammar is unambiguous, every right-sentential form has exactly one handle. Usually, the string β itself is said to be a handle of αβw when the conditions for doing so are clear.
2.8. Implementation of a Bottom-Up Parsing
Bottom-up parsing by shift and reduce uses a stack to store grammar symbols and a buffer to hold the input string to be analyzed. The $ symbol is also used to delimit the bottom of the stack and the right end of the input buffer. The recognition model starts with the stack containing $ and the string w in the input, as shown in Table 1.
The bottom-up parser shifts zero or more input symbols onto the stack until a handle is at the top. The parser then reduces the handle to the left side of the appropriate production, obtaining the sentential form corresponding to the previous step of the rightmost derivation. When reducing, the parser recognizes that the right end of the handle is at the top of the stack; it therefore locates the handle's left end in the stack and determines the non-terminal that will replace it, according to the production whose right side matches the handle. These steps are repeated until an error is detected or until the stack contains $S and the input contains only $, at which point parsing ends and the input string is considered valid for the CFG.
3. Previous Work
According to [1,11,14], for a CFG G, the left terminal sets Left(A) and the right terminal sets Right(A) of a non-terminal A are defined as follows:
Left(A) = { a ∈ T | A ⟹⁺ aα or A ⟹⁺ Baα }
Right(A) = { a ∈ T | A ⟹⁺ αa or A ⟹⁺ αaB }
with B ∈ N and α ∈ (N ∪ T)∗.
Precedence relationships arise from binary relations, established on the parse tree, between consecutive terminals or terminals that become consecutive during a bottom-up process toward the non-terminal symbol S.
The precedence relationships are established as follows:
a ≐ b, if and only if there exists a production A ⟶ αabβ or A ⟶ αaBbβ.
a ⋖ b, if and only if there exists a production A ⟶ αaBβ and b ∈ Left(B).
a ⋗ b, if and only if there exists a production A ⟶ αBbβ and a ∈ Right(B).
According to [2], precedence relations are used to delimit a handle in a right-sentential form. That is, in the sentential form αβw, the substring β is a handle if there is a production A ⟶ β in the CFG; therefore, it can be delimited by the precedence relations, obtaining ⋖ β ⋗. The relation ≐ is used between the symbols of a handle when it has more than one consecutive terminal symbol. In a right-sentential form, at most one non-terminal (operator grammar principle) appears between terminals, so the relations can be established between consecutive terminals and the non-terminals can be eliminated from the right-sentential form, leaving only the relationships between terminals.
In short, each time a handle is obtained, it can be enclosed between the symbols ⋖ and ⋗. In addition, two consecutive terminal symbols inside a handle (possibly with a non-terminal in the middle) are related by ≐. Finally, non-terminal symbols can be removed from the right-sentential form once a handle is found and locked between the precedence relations.
Based on [2], the concepts discussed in Section 2.8, and the precedence relations, the precedence parsing Algorithm 1 is constructed as follows:
Algorithm 1 Precedence parsing algorithm |
Require: Set of precedence relations. |
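Since the body of Algorithm 1 follows the standard scheme, a hedged sketch of the classical operator-precedence parsing loop is given below. The relation table is the well-known one for the expression grammar E ⟶ E + T | T, T ⟶ T ∗ F | F, F ⟶ (E) | id (an illustrative assumption), with "<", "=", ">" standing for ⋖, ≐, ⋗:

```python
# Classical operator-precedence parsing loop (a sketch, not the paper's
# exact Algorithm 1): shift while top ⋖/≐ next symbol; on ⋗, pop the
# handle back to the last ⋖.
LT, EQ, GT = "<", "=", ">"
REL = {
    ("id", "+"): GT, ("id", "*"): GT, ("id", ")"): GT, ("id", "$"): GT,
    ("+", "id"): LT, ("+", "+"): GT, ("+", "*"): LT, ("+", "("): LT,
    ("+", ")"): GT, ("+", "$"): GT,
    ("*", "id"): LT, ("*", "+"): GT, ("*", "*"): GT, ("*", "("): LT,
    ("*", ")"): GT, ("*", "$"): GT,
    ("(", "id"): LT, ("(", "+"): LT, ("(", "*"): LT, ("(", "("): LT,
    ("(", ")"): EQ,
    (")", "+"): GT, (")", "*"): GT, (")", ")"): GT, (")", "$"): GT,
    ("$", "id"): LT, ("$", "+"): LT, ("$", "*"): LT, ("$", "("): LT,
}

def op_precedence_parse(tokens):
    """Skeleton parser: pops handles blindly (it does not match them
    against production bodies), so it over-accepts some inputs."""
    stack, buf = ["$"], list(tokens) + ["$"]
    while True:
        a, b = stack[-1], buf[0]
        if a == "$" and b == "$":
            return True                    # whole input reduced
        rel = REL.get((a, b))
        if rel in (LT, EQ):                # shift
            stack.append(buf.pop(0))
        elif rel == GT:                    # reduce: pop the handle
            while True:
                top = stack.pop()
                if REL.get((stack[-1], top)) == LT:
                    break
        else:
            return False                   # no relation: syntax error

print(op_precedence_parse("id + id * id".split()))  # True
print(op_precedence_parse(["id", "id"]))            # False
```

A full implementation would, on each reduction, also check the popped handle against the production bodies, which is what makes the parse a true inverse rightmost derivation.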
4. Approach
This work proposes a redefinition of the Left and Right terminal sets. Additionally, the definition of the new Leftmost set is presented. The new rules for determining precedence relations are demonstrated even if the CFG does not comply with the fundamental rule of operator grammars, meaning that the CFG may have two or more adjacent non-terminal symbols on the right side of its productions.
We present the algorithms for constructing the three proposed sets: Left, Leftmost, and Right. Additionally, we demonstrate the algorithm that incorporates the newly established precedence rules.
4.1. Definitions
There are the following symbols:
A string of terminals .
A string of non-terminals .
A sentential form .
The following sets are defined:
The left set, denoted Left(A), is the set of terminal symbols that can appear on the left side of any sentential form produced by a rightmost derivation from A.
The leftmost set, denoted Leftmost(A), is the set of terminal symbols that can appear at the end of the left side in any derivation from A, in the absence of other leftmost grammatical symbols (or when those symbols derive to ε).
The right set, denoted Right(A), is the set of terminal symbols that can appear at the right end of the rightmost derivations from A or that appear furthest to the right in any production of A.
According to the above, the following precedence rules are defined:
If , then .
If , then .
If , then .
If , then .
4.2. Problem Analysis
4.2.1. Absolute Operator Grammar
Given the rightmost derivation series of the form:
It is noted that
is a handle according to the previous definition in
Section 2.7.
The right-sentential form of (1) can be rewritten as follows:
From the last sentential form in (2), it follows that
is a handle according to the definition in
Section 2.7.
Because of the above, the analysis will focus on the productions of A that have at least one non-terminal B on the right side, as seen in (1). The grammar's productions with A on the left side will be traversed to find the precedence relations of the non-terminal B located on the right side.
Within the context of , having to , and is a handle, the following forms can be obtained from its definition in (1):
- (a)
, it follows that . Here and .
- (b)
, it follows that . Here and .
- (c)
, it follows that and . Here and .
For the non-terminal start symbol, S, an initial dummy production is defined and rule (c) is applied.
For a production , it follows that .
4.2.2. General Grammar with Two or More Consecutive Non-Terminal Symbols
Redefinition of Left, Right and definition of Leftmost.
Given the rightmost derivation:
It is observed that is a handle within a bottom-up parsing (inverse rightmost derivation) since ; therefore, from (3), it can be established that . In addition, can have the following forms:
- (a)
.
- (b)
.
Continuing with the rightmost derivation in (3) and taking each of the options of , we obtain that:
- (a)
since is a handle, consequently ; therefore, where and .
- (b)
. It is observed that and ; therefore, is a handle, and consequently and .
In short, ; taking and , we obtain the summary derivation , where and .
Given the rightmost derivation:
It is observed that is a handle within a bottom-up parsing (inverse rightmost derivation), since ; therefore, from (4), it can be established that . In addition, can have the following forms:
- (a)
.
- (b)
.
Continuing with the rightmost derivation in (4) and taking each of the options of , we obtain that:
- (a)
since is a handle, consequently ; therefore, where and .
- (b)
. It is noted that and ; therefore, is a handle, and consequently, and .
In short, ; taking , we obtain the summary derivation , where and .
Given the production with two non-terminal symbols on the right (body):
Rightmost derivation from
S in (5) is:
In summary:
In , the relationship can be established since the right part of the production (6) is a handle. Therefore, and .
In , it is known that , where each and C is the non-terminal, first derived from the right. It is noted that .
In , being the result of the last rightmost derivation performed and, in turn, a handle; therefore, the relationship is established, since is a handle, therefore, .
From , . Establishing , then , therefore, .
From , is established, from which we define the new Leftmost set of B as .
4.2.3. An Intuitive Way to Observe the Elements of Each Set
Vision of Left(·):
Given the CFG:
The rightmost derivation of the string creates the Left set in the following way. The terminal symbols e, d, and c are viewed from the left and recorded in the Left set. The symbol e is the first one seen from the left and is recorded; the symbol h is blocked by e, since both appear when E is substituted. Then, D is replaced by d, which is observed and recorded. When C is replaced, c is observed and recorded. Finally, when the substitution of F by f is carried out, the symbol f can no longer be observed (it is blocked by c) and cannot be registered. Consequently:
Vision of Right(·):
Given the CFG:
The rightmost derivation of the string creates the Right set in the following way. The terminal symbols b, c, and d are viewed from the right and recorded in the Right set. The symbol b is the first one observed and recorded. The symbol a is blocked by b, since both appear when A is derived. C is then replaced, and c is observed and recorded. When D is substituted by d, the symbol d is observed and recorded. Then B is derived, but e cannot be recorded, since the previous inclusion of c and d makes it invisible, blocking it from being observed from the right. Finally, when substituting F by f, the symbol f cannot be observed from the right either (blocked by c and d) and cannot be recorded. Consequently:
Vision of Leftmost(·):
Given the CFG:
The rightmost derivation of the string creates the Leftmost set in the following way, as shown in Figure 1.
It can be seen that . When B derives , we observe that , because no more terminal symbols are seen up to that point. In the following derivation, C is replaced by c; therefore, c is located further to the left of the substring , leaving .
4.2.4. Summary of Relations between Grammatical Symbols from Productions and Sets
Table 2 shows the relationships between the grammatical symbols of the productions and the sets of terminals.
4.3. Algorithms
The following algorithms were developed based on the definitions presented in Section 4.1.
Algorithm 2 receives a non-terminal A whose productions are known; for each of these, it is verified whether they correspond to the form ; if so, Algorithm 2 is executed again, sending the non-terminal B as a parameter (provided that ), and its response is included in the set .
Subsequently, it is verified whether the production corresponds to the form , and if so, the terminal a is added to the set .
Once the verification of all productions of A is completed, the set Left(A) is returned as the output of Algorithm 2.
Algorithm 2 algorithm of a non-terminal A |
function Left(A)
    for each production of A do
        if … then … // … contains …
        end if
        if … then … // a is in …
        end if
    end for
    return …
end function
|
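As a companion to Algorithm 2, here is a hedged sketch of the classical Left-set recursion for an operator grammar. The toy expression grammar and the three rules below are illustrative assumptions; the paper's redefined Left set may differ in detail:

```python
# Classical Left-set recursion. Rules implemented:
#   A -> a alpha     =>  a in Left(A)
#   A -> B a alpha   =>  a in Left(A)
#   A -> B alpha     =>  Left(B) subset of Left(A)
PRODS = {
    "E": [["E", "+", "Tm"], ["Tm"]],
    "Tm": [["Tm", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
NONTERMS = set(PRODS)

def left(A, seen=None):
    seen = seen or set()
    if A in seen:                  # guard against left recursion (E -> E + Tm)
        return set()
    seen.add(A)
    out = set()
    for body in PRODS[A]:
        if body[0] in NONTERMS:
            out |= left(body[0], seen)
            if len(body) > 1 and body[1] not in NONTERMS:
                out.add(body[1])   # terminal right after the leading non-terminal
        else:
            out.add(body[0])       # leading terminal
    return out

print(sorted(left("E")))  # ['(', '*', '+', 'id']
```

The `seen` set plays the role of the recursion guard that the text describes as the condition for re-executing the algorithm on B.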
Algorithm 3 receives a non-terminal A whose productions are known; for each of these, it is verified if they correspond to the form and ; if so, Algorithm 3 is executed again, sending the non-terminal B as a parameter (provided that ), and its response is included in the set .
Subsequently, it is verified if the production corresponds to the form and . If so, the terminal a is added to the set .
Once the verification of all productions of A is completed, the set Leftmost(A) is returned as the output of Algorithm 3.
Algorithm 3 algorithm of a non-terminal A |
function Leftmost(A)
    for each production of A do
        if … and … then …
        end if
        if … and … then …
        end if
    end for
    return …
end function
|
Algorithm 4 receives a non-terminal A whose productions are known; for each of these, it is verified if they correspond to the form and ; if so, Algorithm 4 is executed again, sending the non-terminal B as a parameter (provided that ) and its response is included in the set .
Subsequently, it is verified whether the production corresponds to the form , and if so, the terminal a is added to the set .
Once all productions of A have been verified, the set Right(A) is returned as the output of Algorithm 4.
Algorithm 4 algorithm of a non-terminal A |
|
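Analogously to the Left case, a hedged sketch of the classical Right-set recursion follows (the toy expression grammar is again an illustrative assumption; the paper's redefined Right set may differ):

```python
# Classical Right-set recursion, the mirror of Left. Rules implemented:
#   A -> alpha a     =>  a in Right(A)
#   A -> alpha a B   =>  a in Right(A)
#   A -> alpha B     =>  Right(B) subset of Right(A)
PRODS = {
    "E": [["E", "+", "Tm"], ["Tm"]],
    "Tm": [["Tm", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
NONTERMS = set(PRODS)

def right(A, seen=None):
    seen = seen or set()
    if A in seen:                  # guard against right recursion
        return set()
    seen.add(A)
    out = set()
    for body in PRODS[A]:
        if body[-1] in NONTERMS:
            out |= right(body[-1], seen)
            if len(body) > 1 and body[-2] not in NONTERMS:
                out.add(body[-2])  # terminal just before the trailing non-terminal
        else:
            out.add(body[-1])      # trailing terminal
    return out

print(sorted(right("E")))  # [')', '*', '+', 'id']
```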
Algorithm 5 receives a grammar and Algorithms 2–4. Initially, precedence relations are added between the delimiting symbol $ and the Left and Right sets of the non-terminal symbol of start S, according to rules (A.1) and (A.2), respectively.
Algorithm 5 operates in an iterative manner. It traverses all the productions of the set P, and for each iteration, it initializes two temporary variables, u and l, to null and an empty list, respectively. It then performs a series of processes for each pair of contiguous grammatical symbols X and Y in a production.
The algorithm checks if has the form ; if so, it adds the corresponding precedence relationship between a and b, following rule (B).
The algorithm checks if has the form ; if so, it assigns u the value of a and adds the corresponding precedence relations between a and terminals in , following rule (C).
The algorithm checks if has the form ; if so, it adds the corresponding precedence relations between b and the terminals in , following rule (D.1). Then, it checks if , and if so, it adds the corresponding precedence relationship between u and b, following rule (D.2), and resets u to null. Finally, it checks if list l has elements, and if so, it adds the corresponding precedence relations between b and the terminals in , following rule (D.3), and resets list l to empty.
The algorithm checks if has the form ; if so, it adds the corresponding precedence relations between terminals in and terminals in , following rule (E.1). Then, it checks if , and if so, adds the corresponding precedence relation between u and terminals in , following rule (E.2). Finally, it checks if the list l has elements and if so, adds the corresponding precedence relations between terminals in , with and terminals in , following rule (E.3), and resets the list l if necessary.
Algorithm 5 Algorithm to obtain the operator precedence table of a G grammar |
Require: …
procedure Table()
    compute …
    add … (A.1)
    add … (A.2)
    for each … in P do
        … // empty list
        for each contiguous … in … do
            if … then
                add … (B)
            end if
            if … then
                add … (C)
            end if
            if … then
                add … (D.1)
                if … then
                    add … (D.2)
                end if
                if … then
                    for … do
                        add … (D.3)
                    end for
                end if
            end if
            if … then
                add … (E.1)
                if … then
                    add … (E.2)
                end if
                if … then
                    for … do
                        add … (E.3)
                    end for
                end if
                if … then
                    add A to l
                else
                    …
                end if
            end if
        end for
    end for
end procedure
|
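For comparison, the classical Floyd table construction for an operator grammar can be sketched compactly. This is not the paper's Algorithm 5 (which adds the Leftmost-based rules needed to cover non-OPGs); it implements only the classical rules a b ⟹ a ≐ b, a B ⟹ a ⋖ Left(B), B b ⟹ Right(B) ⋗ b, a B b ⟹ a ≐ b, plus the $ delimiters, on the illustrative expression grammar:

```python
PRODS = {
    "E": [["E", "+", "Tm"], ["Tm"]],
    "Tm": [["Tm", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
NONTERMS = set(PRODS)
START = "E"

def _closure(A, first, seen):           # shared helper for Left/Right
    if A in seen:
        return set()
    seen.add(A)
    out = set()
    for body in PRODS[A]:
        edge, inner = (0, 1) if first else (-1, -2)
        if body[edge] in NONTERMS:
            out |= _closure(body[edge], first, seen)
            if len(body) > 1 and body[inner] not in NONTERMS:
                out.add(body[inner])    # terminal next to the edge non-terminal
        else:
            out.add(body[edge])         # edge terminal
    return out

def left(A):
    return _closure(A, True, set())

def right(A):
    return _closure(A, False, set())

def build_table():
    rel = {}
    def put(a, b, r):  # detect ⋖/≐/⋗ conflicts while filling the table
        assert rel.get((a, b), r) == r, f"precedence conflict at {(a, b)}"
        rel[(a, b)] = r
    for A in PRODS:
        for body in PRODS[A]:
            for i, x in enumerate(body[:-1]):
                y = body[i + 1]
                if x not in NONTERMS and y not in NONTERMS:
                    put(x, y, "=")                      # a b   =>  a ≐ b
                if x not in NONTERMS and y in NONTERMS:
                    for b in left(y):
                        put(x, b, "<")                  # a B   =>  a ⋖ Left(B)
                if x in NONTERMS and y not in NONTERMS:
                    for a in right(x):
                        put(a, y, ">")                  # B b   =>  Right(B) ⋗ b
                if (i + 2 < len(body) and y in NONTERMS
                        and x not in NONTERMS and body[i + 2] not in NONTERMS):
                    put(x, body[i + 2], "=")            # a B b =>  a ≐ b
    for b in left(START):
        put("$", b, "<")                                # $ ⋖ Left(S)
    for a in right(START):
        put(a, "$", ">")                                # Right(S) ⋗ $
    return rel

rel = build_table()
print(len(rel), rel[("+", "*")], rel[("*", "+")])  # 29 < >
```

The `put` assertion makes conflicts visible immediately, which is where a non-OPG would fail under the classical rules and where the paper's extended rules (C), (D.x), and (E.x) take over.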
4.4. Complexity Analysis
The following variables will be established for the computation of the complexity of the four proposed algorithms:
t: Cardinality of the set of terminal symbols.
p: Cardinality of the set of productions.
g: Number of grammatical symbols in the productions.
n: Cardinality of the set of non-terminal symbols.
4.4.1. Complexity of Algorithm 2 (Left)
The Left function checks in which productions the symbol A appears on the left side. In the first if-block, a recursive call of Left is observed, which implies that all non-terminal symbols of G could be traversed at the end of the execution in the search for . Therefore, all worst-case productions would be traversed, yielding a computational time of . The second if-block has a computational time of , which does not contribute to the overall computational time .
4.4.2. Complexity of Algorithm 3 (Leftmost)
The Leftmost function has a similar structure to the previous one; therefore, the computational execution time is .
4.4.3. Complexity of Algorithm 4 (Right)
The Right function has a similar structure to Left; therefore, the computational execution time is .
4.4.4. Complexity of Algorithm 5 (Table)
According to the first for-cycle, the number of times it would be executed will be p. Immediately following the previous one, the inner for-cycle is executed in the worst-case g times, assuming that the total number of grammatical symbols appear in each production. Therefore, the computational cost of the two cycles is .
The computational costs for the construction of , , and , , are already computed. In computing the complexity of Table, the length of each generated set will be considered, which in the worst case is t.
The analysis of the four if-blocks inside the second for-cycle is as follows:
In the first if-block, after checking, the executed instruction is carried out in constant time and is denoted .
In the second if-block, after checking, the first instruction is executed in a constant time , and the second is executed in a computational time ; therefore, the computational time of the if-block is .
In the third if-block, after checking, three sub-blocks are distinguished; the first corresponds to the traversal of , executed in a computational time . After checking , the second sub-block presents two instructions executed in a constant time . The third sub-block, after the checking of , presents a computational cost , determined by the for-cycle executed, in the worst case, times, corresponding to the number of non-terminal symbols in G, and the traversal it makes of with a cost . In summary, the computational cost of the third if-block is , corresponding to the worst case.
In the fourth if-block, after checking, four sub-blocks are distinguished. The first corresponds to the combinations formed from traversing and to obtain the precedence relations . Since the maximum size of each set is t, the computational cost of these combinations is . After checking , the second sub-block presents an instruction where is traversed; therefore, the computational cost is . In the third sub-block, after the checking of , it presents a for-cycle that executes at most n times the combinations formed from traversing and to obtain the precedence relations . These combinations are traversed maximally in computational time because t is the size of both sets and, consequently, the computational cost of the third sub-block is . The last sub-block will have a cost of . In summary, the computational cost of the fourth block is , corresponding to the worst case.
In summary, the computational cost of the four if-blocks within the second for-cycle of Algorithm 5 is , this being the worst case; therefore, the computational cost of the main body of Algorithm 5 formed by the two nested for-cycles and the four if-blocks is which is greater than the cost of computing any of the three sets of terminal symbols , , and , , when starting the algorithm, and corresponding to .
5. Examples
5.1. Example 1
5.1.2. Left
The Left sets are calculated as detailed in Table 3.
5.1.3. Right
The Right sets are calculated as detailed in Table 4.
5.1.4. Leftmost
The Leftmost sets are calculated as detailed in Table 5.
5.1.5. Case Studies
For Algorithm 5, the application of some precedence relations is exhibited.
For the delimiter symbol $:
- (a)
Rule (A.1).
- (b)
Rule (A.2).
For
- (a)
Rule (D.1).
- (b)
Rule (C).
For
- (a)
Rule (C).
- (b)
Rule (D.1).
Rule (D.2).
In the following cases, the right side of the production does not fulfill the required form; therefore, no rule can be applied:
- (a)
- (b)
- (c)
The remaining precedence relations are derived using the same method as the previously demonstrated cases.
5.1.6. Operator Precedence Table
At the end of the algorithm execution, operator precedence Table 6 is obtained.
5.1.7. Bottom-Up Parsing
Applying Algorithm 1 to a string obtained by rightmost derivation results in the bottom-up parsing detailed in Table 7.
5.2. Example 2
5.2.2. Left
The Left sets are calculated as detailed in Table 8.
5.2.3. Right
The Right sets are calculated as detailed in Table 9.
5.2.4. Leftmost
The Leftmost sets are calculated as detailed in Table 10.
5.2.5. Case Studies
For Algorithm 5, the application of some precedence relations is exhibited.
For the delimiter symbol $:
- (a)
Rule (A.1).
- (b)
Rule (A.2).
For
- (a)
Rule (E.1).
- (b)
Rule (D.1).
For
- (a)
Rule (B).
- (b)
Rule (C).
- (c)
Rule (D.1).
Rule (D.2).
In the following cases, the right side of the production does not fulfill the required form; therefore, no rule can be applied:
- (a)
- (b)
- (c)
- (d)
The remaining precedence relations are derived using the same method as the previously demonstrated cases.
5.2.6. Operator Precedence Table
At the end of the algorithm execution, operator precedence Table 11 is obtained.
5.2.7. Bottom-Up Parsing
Applying Algorithm 1 to a string obtained by rightmost derivation results in the bottom-up parsing detailed in Table 12.
5.3. Example 3
5.3.2. Left
The Left sets are calculated as detailed in Table 13.
5.3.3. Right
The Right sets are calculated as detailed in Table 14.
5.3.4. Leftmost
The Leftmost sets are calculated as detailed in Table 15.
5.3.5. Case Studies
For Algorithm 5, the application of some precedence relations is exhibited.
For the delimiter symbol $:
- (a)
Rule (A.1).
- (b)
Rule (A.2).
For
- (a)
Rule (E.1).
- (b)
Rule (E.1).
Rule (E.3).
For
- (a)
Rule (C).
For
- (a)
Rule (C).
For
- (a)
Rule (E.1).
- (b)
Rule (D.1).
In the following cases, the right side of the production does not fulfill the required form; therefore, no rule can be applied:
- (a)
- (b)
- (c)
- (d)
- (e)
The remaining precedence relations are derived using the same method as the previously demonstrated cases.
5.3.6. Operator Precedence Table
At the end of the algorithm execution, operator precedence Table 16 is obtained.
5.3.7. Bottom-Up Parsing
Applying Algorithm 1 to a string obtained by rightmost derivation results in the bottom-up parsing detailed in Table 17.
5.4. Discussion
The examples consider three CFGs: one OPG and two non-OPGs. The first CFG complies with the basic rule, while the other two do not, since they present productions with consecutive non-terminal symbols on the right side. Regardless of the nature of the CFG, the proposed algorithms obtain the operator precedence table. Using Algorithm 1 on a specific string for each CFG, a bottom-up parsing is performed, verifying that the strings are generated by their respective grammars. This underscores the functionality of the proposal: bottom-up parsing based on operator precedence, even when the CFG does not fulfill the basic requirement of OPGs.
6. Exceptions
If the CFG has any of the following forms, the rules outlined in the algorithms of this paper would not apply:
The sets for B in the second production are:
.
.
The precedence relations based on the sections of Algorithm 5 exhibit:
For the rule (C): and .
For the rule (D.1): .
For the rule (D.2): .
Applying rules (D.1) and (D.2) generates the conflict.
Being .
According to the established conditions, the sets for B in the second production are
.
.
The precedence relations based on the sections of Algorithm 5 exhibit:
For the rule (C): and .
For the rule (D.1): .
For the rule (D.2): .
The application of rules (D.1) and (D.2) generates the conflict.
If the does not contain a, then the conflict does not arise.
The sets for B in the second production are:
.
.
The precedence relations based on the sections of Algorithm 5 exhibit:
For the rule (C): and .
For the rule (D.1): .
For the rule (D.2): .
The application of rules (C) and (D.2) generates the conflict.
Being .
According to the established conditions, the sets for B in the second production are:
.
.
The precedence relations based on the sections of Algorithm 5 exhibit:
For the rule (C): and .
For the rule (D.1): .
For the rule (D.2): .
The application of rules (C) and (D.2) generates the conflict.
If the does not contain a, then the conflict does not arise.
When CFG productions take the following form, there will be no conflicts:
where has neither a at the end of the right side nor b at the end of the left side.
7. Conclusions
Previous work has addressed the application of OPGs to solve problems related to the parsing of various forms of languages; however, not much emphasis has been placed on the definition of the Left and Right sets nor on the generation of an algorithm to find the operator precedence table.
In this work, we have redefined the Right and Left sets previously presented by other authors and introduced a novel set called Leftmost. These sets form the basis of an algorithm that constructs the operator precedence table, and each set is computed by an algorithm that facilitates obtaining it.
The proposed algorithm for constructing the precedence table relaxes (with some exceptions) the basic definition of an OPG: it allows an operator precedence table to be built from a non-OPG CFG, paving the way for a bottom-up parsing algorithm that recognizes strings generated by that CFG.
Three example CFGs are shown: one OPG and two non-OPGs. The proposed algorithms are applied to these examples step by step until the operator precedence tables are obtained, regardless of the CFG's nature. Additionally, the bottom-up parsing algorithm is applied to a specific string for each grammar, verifying that the grammar generates it.
It is important to note that, while the algorithm and the definition of the new sets of terminal symbols are significant advancements, they do have limitations: it is not always possible to resolve every case in which two or more adjacent terminal symbols appear on the right-hand side. The exceptions identified in Section 6 must be kept in mind.
Author Contributions
Conceptualization, L.L. and J.M.; methodology, J.M.; software, L.L.; validation, L.L. and J.M.; formal analysis, L.L., E.A., and J.M.; investigation, L.L. and J.M.; resources, J.M.; data curation, E.A.; writing—original draft preparation, E.A. and J.M.; writing—review and editing, E.A. and J.M.; visualization, E.A.; supervision, J.M.; project administration, J.M.; funding acquisition, J.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding; the APC was funded by Universidad del Norte.
Data Availability Statement
Data sharing is not applicable to this article.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Floyd, R.W. Syntactic Analysis and Operator Precedence. J. ACM 1963, 10, 316–333.
- Aho, A.V.; Lam, M.S.; Sethi, R.; Ullman, J.D. Compilers: Principles, Techniques, and Tools; Pearson Education: London, UK, 2007.
- Grune, D.; Jacobs, C.J.H. Parsing Techniques (Monographs in Computer Science); Springer: Berlin/Heidelberg, Germany, 2006.
- Crespi-Reghizzi, S. An Effective Model for Grammar Inference. In Information Processing 71; Gilchrist, B., Ed.; Elsevier: North-Holland, New York, 1972; pp. 524–529.
- Crespi-Reghizzi, S. Reduction of Enumeration in Grammar Acquisition. In Proceedings of the 2nd International Joint Conference on Artificial Intelligence, San Francisco, CA, USA, 1–3 September 1971; pp. 546–552.
- Crespi-Reghizzi, S.; Melkanoff, M.A.; Lichten, L. The Use of Grammatical Inference for Designing Programming Languages. Commun. ACM 1973, 16, 83–90.
- von Braunmühl, B.; Verbeek, R. Input Driven Languages are Recognized in log n Space. In Topics in the Theory of Computation; North-Holland Mathematics Studies; Karpinski, M., van Leeuwen, J., Eds.; Elsevier: North-Holland, The Netherlands, 1985; Volume 102, pp. 1–19.
- Mehlhorn, K. Pebbling Mountain Ranges and its Application to DCFL-Recognition. In Automata, Languages and Programming; de Bakker, J., van Leeuwen, J., Eds.; Springer: Berlin/Heidelberg, Germany, 1980; pp. 422–435.
- Mandrioli, D.; Pradella, M. Generalizing Input-Driven Languages: Theoretical and Practical Benefits. Comput. Sci. Rev. 2018, 27, 61–87.
- Alur, R.; Madhusudan, P. Adding Nesting Structure to Words. J. ACM 2009, 56, 1–43.
- Crespi Reghizzi, S.; Mandrioli, D. Operator Precedence and the Visibly Pushdown Property. J. Comput. Syst. Sci. 2012, 78, 1837–1867.
- Crespi Reghizzi, S.; Pradella, M. Beyond Operator-Precedence Grammars and Languages. J. Comput. Syst. Sci. 2020, 113, 18–41.
- Barenghi, A.; Crespi Reghizzi, S.; Mandrioli, D.; Panella, F.; Pradella, M. Parallel Parsing Made Practical. Sci. Comput. Program. 2015, 112, 195–226.
- Crespi-Reghizzi, S.; Mandrioli, D.; Martin, D.F. Algebraic Properties of Operator Precedence Languages. Inf. Control 1978, 37, 115–133.
- Chomsky, N. Three Models for the Description of Language. IRE Trans. Inf. Theory 1956, 2, 113–124.
- Aho, A.V.; Ullman, J.D. The Theory of Parsing, Translation, and Compiling; Prentice-Hall: Englewood Cliffs, NJ, USA, 1973; Volume 1.
- Louden, K.C. Compiler Construction: Principles and Practice; PWS Publishing Co.: Boston, MA, USA, 1997.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).