Genetic Algorithms: A Practical Approach to Generate Textual Patterns for Requirements Authoring
Abstract
:1. Introduction
2. State of the Art
- Solution codification: define genes, individuals, and population.
- Define a fitness function to evaluate how suitable every individual is as a possible solution to the problem.
- Randomly generate an initial population.
- ITERATE
- Evaluate population of generation n using the fitness function
- Select parents to cross breed (creating an individual pool)
- Crossbreed parents. Generate generation n + 1
- Mutation of generation n + 1
- Replace generation n with generation n + 1
- UNTIL exit condition.
3. Methodology
3.1. Basic Definitions
3.1.1. Pattern, Tokens, and Slots
3.1.2. Requirement and Requirements Sets
3.1.3. Requirements Patterns Set (RPS) for a Domain
3.1.4. Pattern Matching
3.1.5. Optional Slots and Wildcard Token
3.1.6. Affinity Parameter
3.2. Algorithms Implementation
3.2.1. Solution Approach
3.2.2. Genetic Algorithm
Genetic Algorithm Definitions
Genetic Algorithm Processes
- Change of token: Change of token (allele) in one slot of the pattern
- Change of optionality characteristic of a gene: A gene slot, changes its optionality becoming optional or not (opposite of the current situation)
- Increase of size: Increase the number of slots of the individual, adding a random number of new slots.
- Decrease of size: Decrease the number of slots of the individual, extracting slots at the end of the individual.
- All Requirements have been matched,
- The maximum number of generations as specified by a specific parameter is reached,
- After a specified number of consecutive generations without a significant increase in population fitness. The number of consecutive generations and the minimum increase of fitness are GA parameters.
3.2.3. Separate-and-Conquer Algorithm
- All Requirements have been matched,
- The maximum number of iterations as specified by a specific parameter is reached,
- After a specified number of consecutive iterations without finding any pattern. This number is also a parameter.
3.2.4. Algorithms Main Parameters
Scenarios Depending on the Objective
Scenarios Depending on the Performance
3.2.5. Metrics for Measure the Quality of the Solution
4. Experiments
4.1. Problem Statement
4.2. Case Study and Data Sets
4.3. Experiment Results
4.3.1. Example
4.3.2. Summary of Results
4.3.3. Algorithm Running Time
4.3.4. Impact of Requirements Size and Affinity Parameters
5. Conclusions and Future Works
Future Works: New Scenarios with Different Parameters Values
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- McDermott, T.; DeLaurentis, D.; Beling, P.; Blackburn, M.; Bone, M. AI4SE and SE4AI: A Research Roadmap. Insight 2020, 23, 8–14. [Google Scholar] [CrossRef]
- Micouin, P. Model-Based Systems Engineering: Fundamentals and Methods; iSTE: London, UK; Wiley: Hoboken, NJ, USA, 2014. [Google Scholar]
- Alvarez-Rodríguez, J.M.; Mendieta, R.; Moreno, V.; Sánchez-Puebla, M.Á.; Llorens, J. Semantic recovery of traceability links between system artifacts. Int. J. Softw. Eng. Knowl. Eng. (IJSEKE) 2020, 30, 1–28. [Google Scholar] [CrossRef]
- Dick, J.; Hull, E.; Jackson, K. Requirements Engineering; Springer International Publishing: Cham, Switzerland, 2017; Available online: http://link.springer.com/10.1007/978-3-319-61073-3 (accessed on 4 November 2021).
- Génova, G.; Fuentes, J.M.; Morillo, J.L.; Hurtado, O.; Moreno, V. A framework to measure and improve the quality of textual requirements. Requir. Eng. 2011, 18, 25–41. [Google Scholar] [CrossRef]
- Reuse Company, “Systems Engineering Suite”. Available online: https://www.reusecompany.com/systems-engineering-suite (accessed on 10 October 2020).
- Mitchell, M. An Introduction to Genetic Algorithms; MIT Press: Cambridge, MA, USA, 1996. [Google Scholar]
- Furnkranz, J. Separate-and-Conquer Rule Learning. Artif. Intell. Rev. 1999, 13, 3–54. [Google Scholar] [CrossRef]
- Parra, E. Metodología Orientada a la Optimización Automática de la Calidad de los Requisitos. Ph.D. Thesis, Universidad Carlos III de Madrid, Madrid, Spain, 2016. Available online: https://e-archivo.uc3m.es/handle/10016/23936 (accessed on 1 August 2019).
- Ebert, C. 50 Years of Software Engineering: Progress and Perils. IEEE Softw. 2018, 35, 94–101. [Google Scholar] [CrossRef]
- Kotonya, G. Requirements Engineering: Processes and Techniques; J. Wiley: Chichester, UK; New York, NY, USA, 1998. [Google Scholar]
- Hogan, A.; Blomqvist, E.; Cochez, M.; d’Amato, C.; de Melo, G.; Gutierrez, C.; Gayo, J.E.L.; Kirrane, S.; Neumaier, S.; Polleres, A.; et al. Knowledge Graphs. 2020. Available online: https://www.morganclaypool.com/doi/10.2200/S01125ED1V01Y202109DSK022 (accessed on 9 November 2021).
- INCOSE. Systems Engineering Vision 2020. 2020. Available online: https://www.incose.org/incose-member-resources/chapters-groups/ChapterSites/blues/chapter-news/2009/02/23/se-vision-2020 (accessed on 23 November 2021).
- Mavin, A.; Wilkinson, P.; Harwood, A.; Novak, M. EARS (Easy Approach to Requirements Syntax). In Proceedings of the IEEE International Conference on Requirements Engineering, Atlanta, GA, USA, 31 August–4 September 2009; pp. 317–322. [Google Scholar] [CrossRef]
- Schmaal, R.J.L.; Balsters, H.; Valera, S. Formal Specification of a Meta Hierarchical Logical Data Model Using Object Role Modeling. In On the Move to Meaningful Internet Systems: OTM 2011 Workshops, Crete, Greece, 17–21 October 2011; Meersman, R., Dillon, T., Herrero, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 7046, pp. 370–379. Available online: http://link.springer.com/10.1007/978-3-642-25126-9_48 (accessed on 9 November 2021).
- Alvarez-Rodríguez, J.; de Pablos, P.; Vafopoulos, M.; Labra-Gayo, J. Towards a Stepwise Method for Unifying and Reconciling Corporate Names in Public Contracts Metadata: The CORFU Technique. In Proceedings of the Research Conference on Metadata and Semantic Research, Thessaloniki, Greece, 19–22 November 2013; Garoufallou, E., Greenberg, J., Eds.; Springer International Publishing: Berlin/Heidelberg, Germany, 2013; Volume 390, pp. 315–329. [Google Scholar] [CrossRef]
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2021, arXiv:2108.07258. [Google Scholar]
- Rothman, D. Transformers for Natural Language Processing: Build Innovative Deep Neural Network Architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and More; Packt Publishing: Birmingham, UK, 2021. [Google Scholar]
- Halpin, T.; Curland, M. Automated verbalization for ORM 2. In Proceedings of the OTM Confederated International Conferences on the Move to Meaningful Internet Systems, Montpellier, France, 29 October–3 November 2006; pp. 1181–1190. [Google Scholar]
- Moreno, V.; Génova, G.; Parra, E.; Fraga, A. Application of machine learning techniques to the flexible assessment and improvement of requirements quality. Softw. Qual. J. 2020, 28, 1645–1674. [Google Scholar] [CrossRef]
- Mallawaarachi, V. Introduction to Genetic Algorithms—Including Example Code. Toward Data Sci. Medium 2017. Available online: https://towardsdatascience.com/introduction-to-genetic-algorithms-including-example-code-e396e98d8bf3 (accessed on 23 July 2018).
- Khanali, M.; Ahmadzadegan, S.; Omid, M.; Nasab, F.K.; Chau, K.W. Optimizing layout of wind farm turbines using genetic algorithms in Tehran province, Iran. Int. J. Energy Environ. Eng. 2018, 9, 399–411. [Google Scholar] [CrossRef] [Green Version]
- Tonella, P.; Susi, A.; Palma, F. Interactive requirements prioritization using a genetic algorithm. Inf. Softw. Technol. 2013, 55, 173–187. [Google Scholar] [CrossRef]
- Bartoli, A.; de Lorenzo, A. Learning text patterns using separate-and-conquer genetic programming. In Proceedings of the European Conference on Genetic Programming, Copenhagen, Denmark, 8–10 April 2015; pp. 16–27. [Google Scholar] [CrossRef] [Green Version]
- Karova, M.N.; Petkova, J.; Penev, S.P. Web Application of Traveling Salesman Problem using Genetic Algorithms. In Proceedings of the Papers, Volume 2, XLII International Scientific Conference on Information, Communication and Energy Systems and Technologies ICEST, Ohrid, Macedonia, 24–27 June 2007; pp. 24–27. Available online: http://rcvt.tu-sofia.bg/ICEST2007_2_99.pdf (accessed on 27 January 2021).
- Stoltz, E. Evolution of a salesman: A complete genetic algorithm tutorial for Python. Toward Data Sci. Medium 2018. Available online: https://towardsdatascience.com/evolution-of-a-salesman-a-complete-genetic-algorithm-tutorial-for-python-6fe5d2b3ca35 (accessed on 18 February 2021).
- Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd ed.; Pearson Prentice Hall: Upper Saddle River, NJ, USA, 2009. [Google Scholar]
- Harsu, M. A Survey on Domain Engineering. Available online: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.197.7897&rep=rep1&type=pdf (accessed on 12 September 2020).
- Chen, M.X.; Lee, B.N.; Bansal, G.; Cao, Y.; Zhang, S.; Lu, J.; Tsay, J.; Wang, Y.; Dai, A.M.; Chen, Z.; et al. Gmail Smart Compose: Real-Time Assisted Writing. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar] [CrossRef] [Green Version]
- Parra, E.; Dimou, C.; Llorens, J.; Moreno, V.; Fraga, A. A methodology for the classification of quality of requirements using machine learning techniques. Inf. Softw. Technol. 2015, 67, 180–195. [Google Scholar] [CrossRef]
Token | Term | Token | Term |
---|---|---|---|
1134 | acronyms (ok) | 2228 | noun |
1103 | adjective | 1123 | number |
1151 | adverb | 1117 | opening exclamation mark |
1106 | adverbial phrase | 1115 | opening question mark |
1130 | article | 1170 | phrasal verb |
1210 | aspectual verb | 1153 | place adverb |
1118 | closing angle brackets | 1199 | possessive pronoun |
1224 | definite article | 1105 | proposed adjective |
1111 | left slash | 1152 | time adverb |
1171 | modal phrase | 1241 | unclassified adjective |
1130 | modal compulsory | 1144 | unclassified noun |
1128 | month | 1108 | verb |
Reqs Set Id | Token ID | Token Frequency | Token Count in Reqs Set | Syntactical Text |
---|---|---|---|---|
4 | 1144 | 0.23657 | 3431 | unclassified noun |
4 | 1119 | 0.15362 | 2228 | noun |
4 | 1224 | 0.09019 | 1308 | definite article |
4 | 1108 | 0.04344 | 630 | verb |
4 | 1110 | 0.04116 | 597 | symbol |
4 | 1230 | 0.03530 | 512 | verb to be |
4 | 1229 | 0.03282 | 476 | preposition to |
4 | 1012 | 0.02565 | 372 | adjective |
4 | 1213 | 0.02462 | 357 | preposition |
Parameter | Description |
---|---|
Affinity | Number of slots of the requirements that are allowed to match by a wildcard token of a token. |
Generations | Maximum number of generations for the GA algorithm. |
Maximum Pattern size | Maximum number of slots of the Patterns generated by GA. |
Maximum Reqs size | Maximum number of words of the requirements to be analyzed. The number of words of the requirement. This is the main quality metric identified in previous works [5,30]. |
Minimum Reqs to Match | Number of requirements that a pattern found with the GA shall match to be included in the final RPS. If this minimum is not reached by any pattern generated by the genetic algorithm, then the conquest iteration does not get any pattern. |
Mutation level | Probability of mutation for the entire population. A Mutation level = 0.2 will cause mutations in 20% of the members of the population in each GA generation. |
Number of conquest iterations | Maximum number iterations. It is an exit condition. |
Pool Size | Number of members of the mating pool. It must be a number less than Pattern Population members. |
Population size | Number of patterns (individuals) of the pattern population. |
Metric | Description | Formula | Good Quality Represented by |
---|---|---|---|
AvRm | Average of Reqs matched by the Patterns of RPS | High value | |
%Rm | Percentage of Requirements of RS matched by RPS | High value | |
BPRm | Number of Reqs matched by the Best Pattern of RPS | High value |
RS Id | RS Name | Number of Requirements | Content |
---|---|---|---|
3 | Hwsw_1 | 45 | Hardware and base software requirements |
41 | Funct_Gen1 | 34 | Functional requirements |
42 | Funct_SMG | 77 | Functional requirements |
43 | Funct_STS | 47 | Functional requirements |
44 | Funct_Gen2 | 101 | Functional requirements |
45 | Funct_QAR | 60 | Functional and Quality requirements |
46 | Funct_QM | 65 | Functional and Quality requirements |
47 | Funct_SMG | 75 | Functional requirements |
48 | Hwsw_2 | 23 | Hardware and base software requirements |
49 | SEC | 18 | Security requirements |
TOTAL | 545 |
Parameter | Value |
---|---|
Population size | 50 to 80 |
Pool size | 20 |
Mutation level | 25% |
Minimum Reqs to Match | 2 |
Affinity | 3 |
Conquest iterations | 50 |
Generations | 300 |
Maximum Reqs size | 12 |
Exp. Id | RS | RS Total Reqs. | Reqs. Conquest | %Rm | RPS | AvRm | BPRm |
---|---|---|---|---|---|---|---|
1 | Hwsw_1 | 45 | 41 | 91% | 9 | 4.56 | 11 |
2 | Funct_Gen1 | 34 | 25 | 74% | 7 | 3.57 | 6 |
3 | Funct_SMG | 77 | 73 | 95% | 20 | 3.65 | 15 |
4 | Funct_STS | 47 | 47 | 100% | 14 | 3.36 | 14 |
5 | Funct_Gen2 | 101 | 93 | 92% | 22 | 4.23 | 17 |
6 | Funct_QAR | 60 | 52 | 87% | 10 | 5.20 | 15 |
7 | Funct_QM | 65 | 58 | 89% | 14 | 4.14 | 9 |
8 | Funct_SMG | 75 | 54 | 72% | 12 | 4.50 | 16 |
9 | Hwsw_2 | 23 | 17 | 74% | 5 | 3.40 | 6 |
10 | SEC | 18 | 12 | 67% | 4 | 3.00 | 5 |
TOTAL | 545 | 472 | 87% | 117 | 4.03 |
RS | Reqs. Conquest | RPS | AvRm | |
---|---|---|---|---|
TOTAL GA Experiments | 545 | 472 | 117 | 4.03 |
TTOTAL GA—RPS corrected | 545 | 545 | 190 1 | 2.87 |
Previous experiment | 545 | 545 | 442 | 1.23 |
Exp. Id | RS | RS Total Reqs. | Reqs. Conquest | Duration (mins) | Average Time to Find Each Pattern (mins) |
---|---|---|---|---|---|
1 | Hwsw_1 | 45 | 41 | 20.92 | 2.32 |
2 | Funct_Gen1 | 34 | 25 | 16.04 | 2.29 |
3 | Funct_SMG | 77 | 73 | 47.23 | 2.36 |
4 | Funct_STS | 47 | 47 | 20.97 | 1.50 |
5 | Funct_Gen2 | 101 | 93 | 48.56 | 2.21 |
6 | Funct_QAR | 60 | 52 | 21.43 | 2.14 |
7 | Funct_QM | 65 | 58 | 25.02 | 1.79 |
8 | Funct_SMG | 75 | 54 | 24.67 | 2.06 |
9 | Hwsw_2 | 23 | 17 | 7.59 | 1.52 |
10 | SEC | 18 | 12 | 10.83 | 2.71 |
Affinity = 3 | Affinity = 4 | ||||
---|---|---|---|---|---|
Num. of Reqs. 1 | Maximum Req. Size 2 | Reqs Conquest | RPS | Reqs Conquest | RPS |
45 | 12 | 40 | 5 | 45 | 5 |
45 | 13 | 40 | 5 | 45 | 4 |
45 | 14 | 21 | 6 | 41 | 4 |
45 | 15 | 19 | 4 | 38 | 5 |
45 | 16 | 12 | 3 | 35 | 4 |
Scenario | Goal | Parameter’s Value |
---|---|---|
1—Introducing authoring tools. | To have a generalist RPS that allow authoring with relative high freedom in a particular domain (fewer specific requirements) | Affinity—High Minimum match—High |
2—Using authoring tools in a very restricted environment where precision is required | To have a very specific RPS that conduct authoring very narrowly (e.g., design phase) | Affinity—Low Minimum match—Low |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Poza, J.; Moreno, V.; Fraga, A.; Álvarez-Rodríguez, J.M. Genetic Algorithms: A Practical Approach to Generate Textual Patterns for Requirements Authoring. Appl. Sci. 2021, 11, 11378. https://doi.org/10.3390/app112311378
Poza J, Moreno V, Fraga A, Álvarez-Rodríguez JM. Genetic Algorithms: A Practical Approach to Generate Textual Patterns for Requirements Authoring. Applied Sciences. 2021; 11(23):11378. https://doi.org/10.3390/app112311378
Chicago/Turabian StylePoza, Jesús, Valentín Moreno, Anabel Fraga, and José María Álvarez-Rodríguez. 2021. "Genetic Algorithms: A Practical Approach to Generate Textual Patterns for Requirements Authoring" Applied Sciences 11, no. 23: 11378. https://doi.org/10.3390/app112311378
APA StylePoza, J., Moreno, V., Fraga, A., & Álvarez-Rodríguez, J. M. (2021). Genetic Algorithms: A Practical Approach to Generate Textual Patterns for Requirements Authoring. Applied Sciences, 11(23), 11378. https://doi.org/10.3390/app112311378