AI-Augmented Software Engineering

A special issue of Mathematics (ISSN 2227-7390).

Deadline for manuscript submissions: closed (30 August 2024) | Viewed by 5509

Special Issue Editor


E-Mail Website
Guest Editor
School of Computing and Mathematic Sciences, University of Leicester, Leicester, UK
Interests: artificial creativity; creative computing
Special Issues, Collections and Topics in MDPI journals

Special Issue Information

Dear Colleagues,

Significant improvements in many areas of the software development lifecycle have been made based on advancements in machine learning (ML) research for software engineering (SE). Automated code generation is notable where ML models are trained on sizable code repositories to generate code snippets, assisting developers in producing software quickly. The use of ML techniques for bug detection and prediction allows the early detection of potential software flaws. As developers choose the best libraries, frameworks, and design patterns for a project, ML-powered recommendation systems have gained popularity.

These recent developments hold the promise of accelerating software development, enhancing code quality, and encouraging innovation in the area of SE as the symbiotic relationship between machine learning and software engineering continues to develop.

Prof. Dr. Hongji Yang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue polices can be found here.

Published Papers (4 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

12 pages, 391 KiB  
Article
SCC-GPT: Source Code Classification Based on Generative Pre-Trained Transformers
by Mohammad D. Alahmadi, Moayad Alshangiti and Jumana Alsubhi
Mathematics 2024, 12(13), 2128; https://doi.org/10.3390/math12132128 - 7 Jul 2024
Viewed by 763
Abstract
Developers often rely on online resources, such as Stack Overflow (SO), to seek assistance for programming tasks. To facilitate effective search and resource discovery, manual tagging of questions and posts with the appropriate programming language is essential. However, accurate tagging is not consistently [...] Read more.
Developers often rely on online resources, such as Stack Overflow (SO), to seek assistance for programming tasks. To facilitate effective search and resource discovery, manual tagging of questions and posts with the appropriate programming language is essential. However, accurate tagging is not consistently achieved, leading to the need for the automated classification of code snippets into the correct programming language as a tag. In this study, we introduce a novel approach to automated classification of code snippets from Stack Overflow (SO) posts into programming languages using generative pre-trained transformers (GPT). Our method, which does not require additional training on labeled data or dependency on pre-existing labels, classifies 224,107 code snippets into 19 programming languages. We employ the text-davinci-003 model of ChatGPT-3.5 and postprocess its responses to accurately identify the programming language. Our empirical evaluation demonstrates that our GPT-based model (SCC-GPT) significantly outperforms existing methods, achieving a median F1-score improvement that ranges from +6% to +31%. These findings underscore the effectiveness of SCC-GPT in enhancing code snippet classification, offering a cost-effective and efficient solution for developers who rely on SO for programming assistance. Full article
(This article belongs to the Special Issue AI-Augmented Software Engineering)
Show Figures

Figure 1

37 pages, 6035 KiB  
Article
Requirement Dependency Extraction Based on Improved Stacking Ensemble Machine Learning
by Hui Guan, Hang Xu and Lie Cai
Mathematics 2024, 12(9), 1272; https://doi.org/10.3390/math12091272 - 23 Apr 2024
Cited by 2 | Viewed by 837
Abstract
To address the cost and efficiency issues of manually analysing requirement dependency in requirements engineering, a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking) is proposed. Firstly, to overcome the problem of singularity in the [...] Read more.
To address the cost and efficiency issues of manually analysing requirement dependency in requirements engineering, a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacking) is proposed. Firstly, to overcome the problem of singularity in the feature extraction process, this paper integrates part-of-speech features, TF-IDF features, and Word2Vec features during the feature selection stage. The particle swarm optimization algorithm is used to allocate weights to part-of-speech tags, which enhances the significance of crucial information in requirement texts. Secondly, to overcome the performance limitations of standalone machine learning models, an improved stacking model is proposed. The Low Correlation Algorithm and Grid Search Algorithms are utilized in P-stacking to automatically select the optimal combination of the base models, which reduces manual intervention and improves prediction performance. The experimental results show that compared with the method based on TF-IDF features, the highest F1 scores of a standalone machine learning model in the three datasets were improved by 3.89%, 10.68%, and 21.4%, respectively, after integrating part-of-speech features and Word2Vec features. Compared with the method based on a standalone machine learning model, the improved stacking ensemble machine learning model improved F1 scores by 2.29%, 5.18%, and 7.47% in the testing and evaluation of three datasets, respectively. Full article
(This article belongs to the Special Issue AI-Augmented Software Engineering)
Show Figures

Figure 1

19 pages, 4123 KiB  
Article
Optimizing OCR Performance for Programming Videos: The Role of Image Super-Resolution and Large Language Models
by Mohammad D. Alahmadi and Moayad Alshangiti
Mathematics 2024, 12(7), 1036; https://doi.org/10.3390/math12071036 - 30 Mar 2024
Cited by 1 | Viewed by 1520
Abstract
The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials, varying widely in video quality, present a challenge for accurately transcribing the embedded source code, crucial for learning and software [...] Read more.
The rapid evolution of video programming tutorials as a key educational resource has highlighted the need for effective code extraction methods. These tutorials, varying widely in video quality, present a challenge for accurately transcribing the embedded source code, crucial for learning and software development. This study investigates the impact of video quality on the performance of optical character recognition (OCR) engines and the potential of large language models (LLMs) to enhance code extraction accuracy. Our comprehensive empirical analysis utilizes a rich dataset of programming screencasts, involving manual transcription of source code and the application of both traditional OCR engines, like Tesseract and Google Vision, and advanced LLMs, including GPT-4V and Gemini. We investigate the efficacy of image super-resolution (SR) techniques, namely, enhanced deep super-resolution (EDSR) and multi-scale deep super-resolution (MDSR), in improving the quality of low-resolution video frames. The findings reveal significant improvements in OCR accuracy with the use of SR, particularly at lower resolutions such as 360p. LLMs demonstrate superior performance across all video qualities, indicating their robustness and advanced capabilities in diverse scenarios. This research contributes to the field of software engineering by offering a benchmark for code extraction from video tutorials and demonstrating the substantial impact of SR techniques and LLMs in enhancing the readability and reusability of code from these educational resources. Full article
(This article belongs to the Special Issue AI-Augmented Software Engineering)
Show Figures

Figure 1

38 pages, 1680 KiB  
Article
Commit-Level Software Change Intent Classification Using a Pre-Trained Transformer-Based Code Model
by Tjaša Heričko, Boštjan Šumak and Sašo Karakatič
Mathematics 2024, 12(7), 1012; https://doi.org/10.3390/math12071012 - 28 Mar 2024
Viewed by 1482
Abstract
Software evolution is driven by changes made during software development and maintenance. While source control systems effectively manage these changes at the commit level, the intent behind them are often inadequately documented, making understanding their rationale challenging. Existing commit intent classification approaches, largely [...] Read more.
Software evolution is driven by changes made during software development and maintenance. While source control systems effectively manage these changes at the commit level, the intent behind them are often inadequately documented, making understanding their rationale challenging. Existing commit intent classification approaches, largely reliant on commit messages, only partially capture the underlying intent, predominantly due to the messages’ inadequate content and neglect of the semantic nuances in code changes. This paper presents a novel method for extracting semantic features from commits based on modifications in the source code, where each commit is represented by one or more fine-grained conjoint code changes, e.g., file-level or hunk-level changes. To address the unstructured nature of code, the method leverages a pre-trained transformer-based code model, further trained through task-adaptive pre-training and fine-tuning on the downstream task of intent classification. This fine-tuned task-adapted pre-trained code model is then utilized to embed fine-grained conjoint changes in a commit, which are aggregated into a unified commit-level vector representation. The proposed method was evaluated using two BERT-based code models, i.e., CodeBERT and GraphCodeBERT, and various aggregation techniques on data from open-source Java software projects. The results show that the proposed method can be used to effectively extract commit embeddings as features for commit intent classification and outperform current state-of-the-art methods of code commit representation for intent categorization in terms of software maintenance activities undertaken by commits. Full article
(This article belongs to the Special Issue AI-Augmented Software Engineering)
Show Figures

Figure 1

Back to TopTop