Big Data Cogn. Comput., Volume 7, Issue 4 (December 2023) – 27 articles

Cover Story: This systematic review focuses on cognitive assessment using electroencephalography in virtual, augmented, and mixed reality environments via head-mounted displays for healthy individuals. Conducted using PRISMA, the search across electronic databases yielded 82 relevant studies. The review evaluated aspects such as cognitive load, immersion, spatial awareness, interaction with the digital environment, and attention. Analysis included participants, equipment, stimuli, frequency bands, data preprocessing, and signal analysis methods. Research findings indicate that VR, multi-electrode EEG, and statistical analysis of EEG features were widely employed. Identified areas for exploration include experimental setup, EEG signal analysis (α, β, γ sub-bands), and cognitive state evaluation.
18 pages, 919 KiB  
Article
Extraction of Significant Features by Fixed-Weight Layer of Processing Elements for the Development of an Efficient Spiking Neural Network Classifier
by Alexander Sboev, Roman Rybka, Dmitry Kunitsyn, Alexey Serenko, Vyacheslav Ilyin and Vadim Putrolaynen
Big Data Cogn. Comput. 2023, 7(4), 184; https://doi.org/10.3390/bdcc7040184 - 18 Dec 2023
Viewed by 1986
Abstract
In this paper, we demonstrate that fixed-weight layers generated from a random distribution or logistic functions can effectively extract significant features from input data, resulting in high accuracy on a variety of tasks, including Fisher’s Iris, Wisconsin Breast Cancer, and MNIST datasets. We observed that logistic functions yield high accuracy with less dispersion in results. We also assessed the precision of our approach under conditions that minimize the number of spikes generated in the network, which is practically useful for reducing energy consumption in spiking neural networks. Our findings reveal that the proposed method achieves the highest accuracy on the Fisher’s Iris and MNIST datasets when decoding with logistic regression, and surpasses the accuracy of the conventional (non-spiking) approach using only logistic regression in the case of Wisconsin Breast Cancer. We also investigated the impact of non-stochastic spike generation on accuracy.
(This article belongs to the Special Issue Computational Intelligence: Spiking Neural Networks)
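A minimal non-spiking sketch of the core idea: a fixed (untrained) random-weight layer extracts features, and a logistic-regression decoder is trained on top. The spiking dynamics and the logistic-map weight generation are omitted, and all sizes and parameters below are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)

n_hidden = 64                                           # size of the fixed layer (assumed)
W = rng.normal(0.0, 1.0, size=(X.shape[1], n_hidden))   # fixed weights, never trained

def fixed_layer(X):
    """Project inputs through the frozen random layer with a nonlinearity."""
    return np.tanh(X @ W)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(fixed_layer(X_tr), y_tr)
print("test accuracy:", clf.score(fixed_layer(X_te), y_te))
```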
12 pages, 540 KiB  
Article
An Artificial-Intelligence-Driven Spanish Poetry Classification Framework
by Shutian Deng, Gang Wang, Hongjun Wang and Fuliang Chang
Big Data Cogn. Comput. 2023, 7(4), 183; https://doi.org/10.3390/bdcc7040183 - 14 Dec 2023
Cited by 2 | Viewed by 2172
Abstract
Spain possesses a vast number of poems, most of which exhibit significantly different styles. A superficial reading of these poems may confuse readers due to their complexity, so it is of vital importance to classify the style of the poems in advance. Currently, poetry classification is mostly carried out manually, which places extremely high demands on the professional expertise of classifiers and consumes a large amount of time; moreover, the objectivity of the classification cannot be guaranteed because of the classifier’s subjectivity. To solve these problems, a Spanish poetry classification framework was designed using artificial intelligence technology, which improves the accuracy, efficiency, and objectivity of classification. First, the artificial-intelligence-driven Spanish poetry classification framework is described in detail and illustrated by a framework diagram that clearly represents each step in the process. The framework includes many algorithms and models, such as Term Frequency–Inverse Document Frequency (TF-IDF), Bagging, Support Vector Machines (SVMs), Adaptive Boosting (AdaBoost), logistic regression (LR), Gradient Boosting Decision Trees (GBDT), LightGBM (LGB), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF), and the role of each algorithm in the framework is clearly defined. Finally, experiments were performed for model selection, comparing the results of these algorithms. The Bagging model stood out for its high accuracy, and the experimental results showed that the proposed framework can help researchers carry out poetry research more efficiently, accurately, and objectively.
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
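A hedged sketch of one path through such a framework: TF-IDF features feeding a Bagging classifier, the model the abstract reports as most accurate. The toy poems and style labels below are invented placeholders, not the paper's dataset.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

poems = ["En el silencio de la noche...", "Cantan los rios del alba...",
         "Oh llama de amor viva...", "Verde que te quiero verde..."]
styles = ["mystic", "nature", "mystic", "surreal"]   # illustrative labels

# TF-IDF vectorization followed by a bagged ensemble of decision trees
model = make_pipeline(TfidfVectorizer(),
                      BaggingClassifier(n_estimators=50, random_state=0))
model.fit(poems, styles)
print(model.predict(["Arde la noche en llamas de amor..."]))
```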
22 pages, 5909 KiB  
Article
Computers’ Interpretations of Knowledge Representation Using Pre-Conceptual Schemas: An Approach Based on the BERT and Llama 2-Chat Models
by Jesus Insuasti, Felipe Roa and Carlos Mario Zapata-Jaramillo
Big Data Cogn. Comput. 2023, 7(4), 182; https://doi.org/10.3390/bdcc7040182 - 14 Dec 2023
Cited by 2 | Viewed by 3282
Abstract
Pre-conceptual schemas are a straightforward way to represent knowledge using controlled language regardless of context. Despite their benefits for humans, pre-conceptual schemas present challenges when interpreted by computers. We propose an approach to enable computers to interpret basic pre-conceptual schemas made by humans. To do so, a linguistic corpus is required to work with large language models (LLMs). The corpus was fed mainly with Master’s and doctoral theses from the digital repository of the University of Nariño to produce a training dataset for re-training the BERT model; in addition, we complemented this by explaining the sentences elicited in triads from the pre-conceptual schemas using one of the cutting-edge large language models in natural language processing, Llama 2-Chat by Meta AI. The diverse topics covered in these theses allowed us to expand the spectrum of linguistic use in the BERT model and to empower the generative capabilities of the fine-tuned Llama 2-Chat model within the proposed solution. As a result, a first version of a computational solution was built to consume the language models based on BERT and Llama 2-Chat, enabling computers to automatically interpret pre-conceptual schemas via natural language processing while adding generative capabilities. The computational solution was validated in two phases. The first phase, for detecting sentences and interacting with pre-conceptual schemas, was performed with students in the Formal Languages and Automata Theory course in the seventh semester of the systems engineering undergraduate program at the University of Nariño’s Tumaco campus; the second phase, for exploring the generative capabilities based on pre-conceptual schemas, was performed with students in the Object-Oriented Design course in the second semester of the same program. This validation yielded favorable results for implementing natural language processing with the BERT and Llama 2-Chat models, laying the groundwork for future developments related to this research topic.
(This article belongs to the Special Issue Knowledge Representation Formalisms for AI Applications)
15 pages, 1210 KiB  
Article
Text Classification Based on the Heterogeneous Graph Considering the Relationships between Documents
by Hiromu Nakajima and Minoru Sasaki
Big Data Cogn. Comput. 2023, 7(4), 181; https://doi.org/10.3390/bdcc7040181 - 13 Dec 2023
Cited by 1 | Viewed by 2118
Abstract
Text classification is the task of estimating the genre of a document based on information such as word co-occurrence and frequency of occurrence, and it has been studied through various approaches. In this study, we focus on text classification using graph-structured data. Conventional graph-based methods express relationships between words, and between words and documents, as weights between nodes, and then train a graph neural network on the resulting graph. However, these methods cannot represent relationships between documents on the graph. In this paper, we propose a graph structure that considers the relationships between documents: the cosine similarity of document vectors is set as the weight between document nodes, completing a graph that captures document-to-document relationships. The graph is then input into a graph convolutional neural network for training. The aim of this study is therefore to improve the text classification performance of conventional methods by using this graph. In evaluation experiments on five different corpora of English documents, the proposed method outperformed the conventional method by up to 1.19%, indicating that using relationships between documents is effective; it was also shown to be particularly effective in classifying long documents.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
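A sketch of the proposed graph-construction step: document-document edges weighted by the cosine similarity of document vectors (TF-IDF here as a stand-in for the paper's document representation). The GCN training that follows in the paper is not shown, and the edge threshold is an assumption.

```python
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the cat sat on the mat", "dogs and cats are pets",
        "stock markets fell sharply", "investors sold shares today"]
sim = cosine_similarity(TfidfVectorizer().fit_transform(docs))

G = nx.Graph()
G.add_nodes_from(range(len(docs)))
threshold = 0.1                          # assumed cutoff for adding an edge
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        if sim[i, j] > threshold:        # weight doc-doc edges by cosine similarity
            G.add_edge(i, j, weight=float(sim[i, j]))
print(G.edges(data=True))
```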
25 pages, 4925 KiB  
Article
Understanding the Influence of Genre-Specific Music Using Network Analysis and Machine Learning Algorithms
by Bishal Lamichhane, Aniket Kumar Singh, Suman Devkota, Uttam Dhakal, Subham Singh and Chandra Dhakal
Big Data Cogn. Comput. 2023, 7(4), 180; https://doi.org/10.3390/bdcc7040180 - 4 Dec 2023
Viewed by 2922
Abstract
This study analyzes a network of musical influence using machine learning and network analysis techniques. A directed network model represents artists as nodes and influence relations as edges. Network properties and centrality measures are analyzed to identify influential patterns, and influence within and outside a genre is quantified using in-genre and out-genre weights. Regression analysis is performed to determine the impact of musical attributes on influence. We find that speechiness, acousticness, and valence are the top features of the most influential artists. We also introduce the IRDI, an algorithm that quantifies an artist’s influence by capturing the degree of dominance among their followers, underscoring influential artists who drive the evolution of music, set trends, and inspire a new generation of artists. The independent cascade model is further employed to examine the temporal dynamics of influence propagation across the entire musical network, highlighting how initial seeds of influence can spread contagiously through the network. This multidisciplinary approach provides a nuanced understanding of musical influence that refines existing methods and sheds light on influential trends and dynamics.
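A minimal sketch of the network-analysis step: artists as nodes, influence as directed edges, with centrality measures used to surface influential artists. The edge list is invented for illustration; the paper's data and its IRDI algorithm are not reproduced here.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("Artist A", "Artist B"), ("Artist A", "Artist C"),
    ("Artist B", "Artist D"), ("Artist C", "Artist D"),
])
# Out-degree: how many artists a node influences; PageRank as a global measure.
print(sorted(G.out_degree(), key=lambda kv: kv[1], reverse=True))
print(nx.pagerank(G))
```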
15 pages, 4233 KiB  
Article
Toward Morphologic Atlasing of the Human Whole Brain at the Nanoscale
by Wieslaw L. Nowinski
Big Data Cogn. Comput. 2023, 7(4), 179; https://doi.org/10.3390/bdcc7040179 - 1 Dec 2023
Cited by 2 | Viewed by 2262
Abstract
Although no nanoscale dataset for the entire human brain has yet been acquired, nor has a nanoscale human whole-brain atlas been constructed, tremendous progress in neuroimaging and high-performance computing makes both feasible in the not-so-distant future. Constructing a human whole-brain nanoscale atlas poses several challenges, and here we address two: modeling the morphology of the brain at the nanoscale and designing the nanoscale brain atlas. A new nanoscale neuronal format is introduced to describe the data necessary and sufficient to model the entire human brain at the nanoscale, enabling calculations of the synaptome and connectome. The design of the nanoscale brain atlas covers design principles, content, architecture, navigation, functionality, and user interface. Three novel design principles are introduced to support navigation, exploration, and calculations: gross neuroanatomy-guided navigation of micro/nanoscale neuroanatomy; a movable and zoomable sampling volume of interest for navigation and exploration; and nanoscale data processing in a parallel-pipeline mode that exploits the parallelism resulting from decomposing gross neuroanatomy into structures and regions, and nano neuroanatomy into neurons and synapses, enabling the distributed construction and continual enhancement of the atlas. Numerous applications of this atlas can be contemplated, ranging from proofreading and continual multi-site extension to exploration, morphometric and network-related analyses, and knowledge discovery. To the best of my knowledge, this is the first proposed nanoscale neuronal morphology model and the first attempt to design a human whole-brain atlas at the nanoscale.
(This article belongs to the Special Issue Big Data System for Global Health)
12 pages, 1508 KiB  
Opinion
Artificial Intelligence in the Interpretation of Videofluoroscopic Swallow Studies: Implications and Advances for Speech–Language Pathologists
by Anna M. Girardi, Elizabeth A. Cardell and Stephen P. Bird
Big Data Cogn. Comput. 2023, 7(4), 178; https://doi.org/10.3390/bdcc7040178 - 28 Nov 2023
Cited by 3 | Viewed by 3562
Abstract
Radiological imaging is an essential component of a swallowing assessment. Artificial intelligence (AI), especially deep learning (DL) models, has enhanced the efficiency and efficacy with which imaging is interpreted, which has important implications for swallow diagnostics and intervention planning. However, the application of AI to the interpretation of videofluoroscopic swallow studies (VFSS) is still emerging. This review showcases the recent literature on the use of AI to interpret VFSS and highlights clinical implications for speech–language pathologists (SLPs). With a surge in AI research, there have been advances in dysphagia assessment. Several studies have demonstrated the successful implementation of DL algorithms to analyze VFSS. Notably, convolutional neural networks (CNNs), which involve training a multi-layered model to recognize specific image or video components, have been used to detect pertinent aspects of the swallowing process with high precision. DL algorithms have the potential to streamline VFSS interpretation, improve efficiency and accuracy, and enable the precise interpretation of an instrumental dysphagia evaluation, which is especially advantageous where access to skilled clinicians is limited. By enhancing the precision, speed, and depth of VFSS interpretation, SLPs can obtain a more comprehensive understanding of swallow physiology and deliver targeted, timely interventions tailored to the individual, with practical applications for both clinical practice and dysphagia research. As this research area grows and AI technologies progress, the application of DL to VFSS interpretation is clinically beneficial and has the potential to transform dysphagia assessment and management. With broader validation and interdisciplinary collaboration, AI-augmented VFSS interpretation will likely transform swallow evaluations and ultimately improve outcomes for individuals with dysphagia. Nevertheless, practitioners still need to consider the challenges and limitations of AI implementation, including the need for large training datasets, interpretability and adaptability issues, and the potential for bias.
3 pages, 180 KiB  
Editorial
Managing Cybersecurity Threats and Increasing Organizational Resilience
by Peter R. J. Trim and Yang-Im Lee
Big Data Cogn. Comput. 2023, 7(4), 177; https://doi.org/10.3390/bdcc7040177 - 22 Nov 2023
Cited by 2 | Viewed by 2284
Abstract
Cyber security is high up on the agenda of senior managers in private and public sector organizations and is likely to remain so for the foreseeable future. [...]
16 pages, 756 KiB  
Article
A New Approach to Data Analysis Using Machine Learning for Cybersecurity
by Shivashankar Hiremath, Eeshan Shetty, Allam Jaya Prakash, Suraj Prakash Sahoo, Kiran Kumar Patro, Kandala N. V. P. S. Rajesh and Paweł Pławiak
Big Data Cogn. Comput. 2023, 7(4), 176; https://doi.org/10.3390/bdcc7040176 - 21 Nov 2023
Cited by 5 | Viewed by 5899
Abstract
The internet has become an indispensable tool for organizations, permeating every facet of their operations. Virtually all companies leverage Internet services for diverse purposes, including the digital storage of data in databases and cloud platforms. Furthermore, the rising demand for software and applications has led to a widespread shift toward computer-based activities within the corporate landscape. However, this digital transformation has exposed the information technology (IT) infrastructures of these organizations to a heightened risk of cyber-attacks, endangering sensitive data. Consequently, organizations must identify and address vulnerabilities within their systems, with a primary focus on scrutinizing customer-facing websites and applications. This work tackles this pressing issue by employing data analysis tools, such as Power BI, to assess vulnerabilities within a client’s application or website. Through rigorous analysis, the data yield the insights and information necessary to formulate effective remedial measures against potential attacks. Ultimately, the central goal of this research is to demonstrate that clients can establish a secure environment, shielding their digital assets from potential attackers.
(This article belongs to the Special Issue Artificial Intelligence for Online Safety)
20 pages, 7293 KiB  
Article
Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles
by Deptii Chaudhari and Ambika Vishal Pawar
Big Data Cogn. Comput. 2023, 7(4), 175; https://doi.org/10.3390/bdcc7040175 - 15 Nov 2023
Cited by 5 | Viewed by 2652
Abstract
Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. Uncovering propaganda is challenging because it works with the systematic goal of influencing other individuals toward predetermined ends. While significant research has been reported on propaganda identification and classification in resource-rich languages such as English, much less effort has been made in resource-deprived languages like Hindi. The spread of propaganda in the Hindi news media motivated our attempt to devise an approach for the propaganda categorization of Hindi news articles, and the unavailability of the necessary language tools makes propaganda classification in Hindi all the more challenging. This study proposes the effective use of deep learning and transformer-based approaches for Hindi computational propaganda classification. To address the lack of pretrained word embeddings in Hindi, Hindi Word2vec embeddings were created using the H-Prop-News corpus for feature extraction. Subsequently, three deep learning models, i.e., a CNN (convolutional neural network), LSTM (long short-term memory), and Bi-LSTM (bidirectional long short-term memory), and four transformer-based models, i.e., multilingual BERT, Distil-BERT, Hindi-BERT, and Hindi-TPU-Electra, were experimented with. The experimental outcomes indicate that the multilingual BERT and Hindi-BERT models provide the best performance, with the highest F1 score of 84% on the test data. These results strongly support the efficacy of the proposed solution and indicate its appropriateness for propaganda classification.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
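A hedged sketch of the embedding step: training Word2vec on a Hindi news corpus to stand in for missing pretrained embeddings. The two tokenized sentences are placeholders for the H-Prop-News corpus, which is not reproduced here, and the hyperparameters are assumptions.

```python
from gensim.models import Word2Vec

corpus = [
    ["यह", "समाचार", "लेख", "है"],              # placeholder tokenized sentences
    ["प्रचार", "का", "वर्गीकरण", "कठिन", "है"],
]
model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 min_count=1, workers=2)
vector = model.wv["समाचार"]                    # embedding later fed to CNN/LSTM models
print(vector.shape)
```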
23 pages, 5698 KiB  
Article
Optimization of Cryptocurrency Algorithmic Trading Strategies Using the Decomposition Approach
by Sherin M. Omran, Wessam H. El-Behaidy and Aliaa A. A. Youssif
Big Data Cogn. Comput. 2023, 7(4), 174; https://doi.org/10.3390/bdcc7040174 - 14 Nov 2023
Viewed by 3983
Abstract
A cryptocurrency is a non-centralized form of money that facilitates financial transactions using cryptographic processes. It can be thought of as a virtual currency or a payment mechanism for sending and receiving money online, and cryptocurrencies have gained wide market acceptance and developed rapidly during the past few years. Due to the volatile nature of the crypto-market, cryptocurrency trading involves a high level of risk. In this paper, a new normalized decomposition-based multi-objective particle swarm optimization (N-MOPSO/D) algorithm is presented for cryptocurrency algorithmic trading. The aim of this algorithm is to help traders find the best Litecoin trading strategies to improve their outcomes. The proposed algorithm is used to manage the trade-offs among three objectives: the return on investment, the Sortino ratio, and the number of trades. A hybrid weight assignment mechanism is also proposed. The algorithm was compared against the trading rules with their standard parameters, against MOPSO/D using normalized weighted Tchebycheff scalarization, and against MOEA/D, and it outperformed these counterpart algorithms on benchmark and real-world problems. Results showed that the proposed algorithm is very promising and stable under different market conditions: it maintained the best returns and risk during both training and testing with a moderate number of trades.
(This article belongs to the Special Issue Applied Data Science for Social Good)
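A small numeric sketch of the normalized weighted Tchebycheff scalarization used by the comparison baseline: each multi-objective point is reduced to max_i w_i |f_i − z_i*| / (nadir_i − z_i*), with z* the ideal point. The objective values, weights, and reference points below are invented for illustration.

```python
import numpy as np

def tchebycheff(f, w, z_ideal, z_nadir):
    """Normalized weighted Tchebycheff scalarization of objective vector f."""
    norm = (f - z_ideal) / (z_nadir - z_ideal)   # scale each objective to [0, 1]
    return np.max(w * np.abs(norm))

f = np.array([0.12, 0.80, 30.0])       # e.g., return, Sortino ratio, trade count
w = np.array([0.5, 0.3, 0.2])
z_ideal = np.array([0.30, 1.50, 10.0])
z_nadir = np.array([-0.10, 0.00, 120.0])
print(tchebycheff(f, w, z_ideal, z_nadir))
```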
40 pages, 8198 KiB  
Article
The Semantic Adjacency Criterion in Time Intervals Mining
by Alexander Shknevsky, Yuval Shahar and Robert Moskovitch
Big Data Cogn. Comput. 2023, 7(4), 173; https://doi.org/10.3390/bdcc7040173 - 9 Nov 2023
Viewed by 1816
Abstract
We propose a new pruning constraint for mining frequent temporal patterns to be used as classification and prediction features: the Semantic Adjacency Criterion (SAC), which filters out temporal patterns that contain potentially semantically contradictory components by exploiting each medical domain’s knowledge. We defined three SAC versions and tested them within three medical domains (oncology, hepatitis, diabetes) and a frequent-temporal-pattern discovery framework. Previously, we had shown that using SAC enhances the repeatability of discovering the same temporal patterns, in similar proportions, in different patient groups within the same clinical domain. Here, we focused on SAC’s computational implications for pattern discovery, and for classification and prediction using the discovered patterns as features, by four different machine-learning methods: Random Forests, Naïve Bayes, SVM, and Logistic Regression. Using SAC resulted in a significant reduction, across all medical domains and classification methods, of up to 97% in the number of discovered temporal patterns and of up to 98% in the runtime of the discovery process. Nevertheless, the highly reduced set of only semantically transparent patterns, when used as features, resulted in classification and prediction models whose performance was at least as good as the models resulting from using the complete temporal-pattern set.
(This article belongs to the Special Issue Data Science in Health Care)
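An illustrative sketch of the pruning idea: drop candidate temporal patterns whose components are semantically contradictory under domain knowledge. The contradiction table and patterns are invented, and the paper's three formal SAC versions are richer than this simple pairwise check.

```python
# Invented domain knowledge: pairs of findings that should not co-occur.
CONTRADICTORY = {("anemia_severe", "hemoglobin_high"),
                 ("fever", "hypothermia")}

def passes_sac(pattern):
    """Return False if any pair of components in the pattern contradicts."""
    for i, a in enumerate(pattern):
        for b in pattern[i + 1:]:
            if (a, b) in CONTRADICTORY or (b, a) in CONTRADICTORY:
                return False
    return True

candidates = [("fever", "tachycardia"), ("fever", "hypothermia")]
print([p for p in candidates if passes_sac(p)])   # contradictory pattern pruned
```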
19 pages, 3591 KiB  
Article
Evaluation of Short-Term Rockburst Risk Severity Using Machine Learning Methods
by Aibing Jin, Prabhat Basnet and Shakil Mahtab
Big Data Cogn. Comput. 2023, 7(4), 172; https://doi.org/10.3390/bdcc7040172 - 7 Nov 2023
Viewed by 1872
Abstract
In deep engineering, rockburst hazards frequently result in injuries, fatalities, and the destruction of contiguous structures. Due to the complex nature of rockbursts, predicting the severity of rockburst damage (intensity) without the aid of computer models is challenging. Although various predictive models exist, effectively identifying the risk severity in imbalanced data remains crucial. Ensemble boosting methods are often better suited to dealing with unequally distributed classes than classical models, so this paper employs the ensemble categorical gradient boosting (CGB) method to predict short-term rockburst risk severity. After data collection, principal component analysis (PCA) was employed to avoid the redundancies caused by multi-collinearity. Afterwards, the CGB was trained on the PCA data, optimal hyper-parameters were retrieved using the grid-search technique, and performance on the test samples was evaluated using precision, recall, and F1 score metrics. The results showed that the PCA-CGB model achieved better predictions than either the single CGB model or conventional boosting methods, attaining an F1 score of 0.8952 and indicating that the proposed model is robust in predicting damage severity given an imbalanced dataset. This work provides practical guidance in risk management.
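A sketch of the PCA-plus-boosting pipeline with grid search. sklearn's GradientBoostingClassifier stands in for the paper's categorical gradient boosting; the synthetic imbalanced data and the parameter grid are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Imbalanced two-class toy data standing in for rockburst severity records.
X, y = make_classification(n_samples=300, n_features=8,
                           weights=[0.8, 0.2], random_state=0)
pipe = Pipeline([("pca", PCA(n_components=5)),
                 ("gb", GradientBoostingClassifier(random_state=0))])
grid = GridSearchCV(pipe, {"gb__n_estimators": [100, 200],
                           "gb__learning_rate": [0.05, 0.1]},
                    scoring="f1", cv=3)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 4))
```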
24 pages, 1607 KiB  
Article
Social Trend Mining: Lead or Lag
by Hossein Hassani, Nadejda Komendantova, Elena Rovenskaya and Mohammad Reza Yeganegi
Big Data Cogn. Comput. 2023, 7(4), 171; https://doi.org/10.3390/bdcc7040171 - 7 Nov 2023
Cited by 3 | Viewed by 2312
Abstract
This research underscores the profound implications of Social Intelligence Mining, notably employing open-access data and Google Search engine data for trend discernment. Utilizing advanced analytical methodologies, including wavelet coherence analysis and phase difference, hidden relationships and patterns within social data were revealed. These techniques furnish an enriched comprehension of the dynamics of social phenomena, bolstering decision-making processes. The study’s versatility extends across myriad domains, offering insights into public sentiment and the foresight for strategic approaches. The findings suggest immense potential for Social Intelligence Mining to influence strategies, foster innovation, and add value across diverse sectors.
18 pages, 1858 KiB  
Article
Arabic Toxic Tweet Classification: Leveraging the AraBERT Model
by Amr Mohamed El Koshiry, Entesar Hamed I. Eliwa, Tarek Abd El-Hafeez and Ahmed Omar
Big Data Cogn. Comput. 2023, 7(4), 170; https://doi.org/10.3390/bdcc7040170 - 26 Oct 2023
Cited by 7 | Viewed by 3112
Abstract
Social media platforms have become the primary means of communication and information sharing, facilitating interactive exchanges among users. Unfortunately, these platforms also witness the dissemination of inappropriate and toxic content, including hate speech and insults. While significant efforts have been made to classify toxic content in the English language, the same level of attention has not been given to Arabic texts. This study addresses this gap by constructing a standardized Arabic dataset specifically designed for toxic tweet classification, annotated automatically using Google’s Perspective API and the expertise of three native Arabic speakers and linguists. To evaluate the performance of different models, we conduct a series of experiments using seven models: long short-term memory (LSTM), bidirectional LSTM, a convolutional neural network, a gated recurrent unit (GRU), bidirectional GRU, multilingual bidirectional encoder representations from transformers, and AraBERT, additionally employing word embedding techniques. Our experimental findings demonstrate that the fine-tuned AraBERT model surpasses the other models, achieving an impressive accuracy of 0.9960, which outperforms similar approaches reported in the recent literature. This study represents a significant advancement in Arabic toxic tweet classification, shedding light on the importance of addressing toxicity on social media platforms while considering diverse languages and cultures.
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
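A minimal setup sketch for an AraBERT classifier: loading a checkpoint and running a single forward pass, not the full fine-tuning loop. The checkpoint name is the publicly released AraBERT v2 base model and is an assumption about what the authors used.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "aubmindlab/bert-base-arabertv2"    # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

batch = tokenizer(["هذه تغريدة للاختبار"], return_tensors="pt",
                  padding=True, truncation=True)
with torch.no_grad():
    logits = model(**batch).logits               # two-class logits (label order assumed)
print(torch.softmax(logits, dim=-1))
```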
19 pages, 1817 KiB  
Article
Assessment of Security KPIs for 5G Network Slices for Special Groups of Subscribers
by Roman Odarchenko, Maksim Iavich, Giorgi Iashvili, Solomiia Fedushko and Yuriy Syerov
Big Data Cogn. Comput. 2023, 7(4), 169; https://doi.org/10.3390/bdcc7040169 - 26 Oct 2023
Cited by 4 | Viewed by 3532
Abstract
It is clear that 5G networks have already become integral to our present. However, a significant issue is that current 5G communication systems are incapable of fully ensuring the required quality of service and the security of transmitted data, especially in government networks that operate in the context of the Internet of Things, hostilities, hybrid warfare, and cyberwarfare. The use of 5G extends to critical infrastructure operators and special users such as law enforcement, governments, and the military. Adapting modern cellular networks to meet the specific needs of these special users is not only feasible but necessary, and in doing so, these networks must meet additional stringent requirements for reliability, performance, and, most importantly, data security. This paper is dedicated to addressing the challenges associated with ensuring cybersecurity in this context. To effectively improve or ensure a sufficient level of cybersecurity, it is essential to measure the primary indicators of the effectiveness of the security system, yet there are currently no comprehensive lists of these key indicators that require priority monitoring. Therefore, this article first analyzes the existing indicators and presents a list of them that makes it possible to continuously monitor the state of the cybersecurity systems of 5G cellular networks with the aim of serving groups of special users. Based on this list of cybersecurity KPIs, the article then presents a model to identify and evaluate these indicators. To develop this model, we comprehensively analyzed potential groups of performance indicators, selected the most relevant ones, and introduced a mathematical framework for their quantitative assessment. Furthermore, we proposed enhancements to the core of the 4G/5G network that enable data collection and statistical analysis through specialized sensors and existing servers, contributing to improved cybersecurity within these networks. The approach proposed in the article thus opens up an opportunity for continuous monitoring and, accordingly, for improving the performance indicators of cybersecurity systems, which in turn makes it possible to use them for the maintenance of critical infrastructure and other users whose service imposes increased requirements on cybersecurity systems.
21 pages, 1646 KiB  
Article
Improving Clothing Product Quality and Reducing Waste Based on Consumer Review Using RoBERTa and BERTopic Language Model
by Andry Alamsyah and Nadhif Ditertian Girawan
Big Data Cogn. Comput. 2023, 7(4), 168; https://doi.org/10.3390/bdcc7040168 - 25 Oct 2023
Cited by 6 | Viewed by 3356
Abstract
The disposability of clothing has emerged as a critical concern, precipitating waste accumulation due to product quality degradation. Such consequences exert significant pressure on resources and challenge sustainability efforts. In response, this research focuses on empowering clothing companies to elevate product excellence by harnessing consumer feedback. Beyond insights, this research extends to sustainability by providing suggestions for refining product quality through improved material handling, gradually mitigating waste production and cultivating longevity, thereby decreasing the number of discarded garments. Managing a vast influx of diverse reviews necessitates sophisticated natural language processing (NLP) techniques. Our study introduces a Robustly optimized BERT Pretraining Approach (RoBERTa) model calibrated for multilabel classification and BERTopic for topic modeling. The model adeptly distills vital themes from consumer reviews, projecting concerns across various dimensions of clothing quality with high accuracy, while BERTopic facilitates immersive exploration of the harvested review topics. This research presents a thorough case for integrating machine learning to foster sustainability and waste reduction, and its contribution is notable for integrating RoBERTa and BERTopic for multilabel classification and topic modeling in the fashion industry. The results indicate that the RoBERTa model performs remarkably well, with a macro-averaged F1 score of 0.87 and a micro-averaged F1 score of 0.87; likewise, BERTopic achieves a coherence score of 0.67, meaning the model can form insightful topics.
(This article belongs to the Special Issue Sustainable Big Data Analytics and Machine Learning Technologies)
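A toy sketch of the topic-modeling half of the pipeline: BERTopic fit on a handful of review snippets. Real runs need a larger, more varied corpus; the reviews below are invented, and the RoBERTa multilabel classifier is not shown.

```python
from bertopic import BERTopic

reviews = ["the fabric tore after one wash", "color faded quickly",
           "stitching came loose at the seams", "great fit but thin material",
           "zipper broke within a week", "shrank two sizes in the dryer",
           "buttons fell off immediately", "soft cotton, holds its shape"] * 15

topic_model = BERTopic(min_topic_size=5)
topics, probs = topic_model.fit_transform(reviews)   # one topic id per review
print(topic_model.get_topic_info().head())
```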
14 pages, 2016 KiB  
Article
Identifying Probable Dementia in Undiagnosed Black and White Americans Using Machine Learning in Veterans Health Administration Electronic Health Records
by Yijun Shao, Kaitlin Todd, Andrew Shutes-David, Steven P. Millard, Karl Brown, Amy Thomas, Kathryn Chen, Katherine Wilson, Qing T. Zeng and Debby W. Tsuang
Big Data Cogn. Comput. 2023, 7(4), 167; https://doi.org/10.3390/bdcc7040167 - 19 Oct 2023
Viewed by 1838
Abstract
The application of natural language processing and machine learning (ML) to electronic health records (EHRs) may help reduce dementia underdiagnosis, but models that are not designed to reflect minority populations may instead perpetuate underdiagnosis. To improve the identification of undiagnosed dementia, particularly in Black Americans (BAs), we developed support vector machine (SVM) ML models to assign dementia risk scores based on features identified in unstructured EHR data (via latent Dirichlet allocation and stable topic extraction in n = 1 M notes) and structured EHR data. We hypothesized that separate models would show differentiation between racial groups, so the models were fit separately for BAs (n = 5 K with dementia ICD codes, n = 5 K without) and White Americans (WAs; n = 5 K with codes, n = 5 K without). To validate our method, scores were generated for separate samples of BAs (n = 10 K) and WAs (n = 10 K) without dementia codes, and the EHRs of 1.2 K of these patients were reviewed by dementia experts. All subjects were age 65+ and drawn from the VA, which meant that the samples were disproportionately male. A strong positive relationship was observed between SVM-generated risk scores and undiagnosed dementia. BAs were more likely than WAs to have undiagnosed dementia per chart review, both overall (15.3% vs. 9.5%) and among Veterans with >90th percentile cutoff scores (25.6% vs. 15.3%). With chart reviews as the reference standard and varied cutoff scores, the BA model performed slightly better than the WA model (AUC = 0.86 with negative predictive value [NPV] = 0.98, positive predictive value [PPV] = 0.26, sensitivity = 0.61, specificity = 0.92, and accuracy = 0.91 at the >90th percentile cutoff vs. AUC = 0.77 with NPV = 0.98, PPV = 0.15, sensitivity = 0.43, specificity = 0.91, and accuracy = 0.89 at the >90th). Our findings suggest that race-specific ML models can help identify BAs who may have undiagnosed dementia. Future studies should examine model generalizability in settings with more females and test whether incorporating these models into clinical settings increases the referral of undiagnosed BAs to specialists.
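A sketch of the scoring idea: an SVM's signed distance to the decision boundary used as a risk score, with a percentile cutoff flagging charts for expert review. Synthetic features replace the EHR-derived topic features, and the cutoff mirrors the >90th percentile mentioned in the abstract.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
svm = LinearSVC(dual=False).fit(X, y)

scores = svm.decision_function(X)        # signed margin distance as risk score
cutoff = np.percentile(scores, 90)       # patients above the 90th percentile
flagged = np.where(scores > cutoff)[0]
print(f"{len(flagged)} patients flagged for chart review")
```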
20 pages, 3279 KiB  
Article
HAMCap: A Weak-Supervised Hybrid Attention-Based Capsule Neural Network for Fine-Grained Climate Change Debate Analysis
by Kun Xiang and Akihiro Fujii
Big Data Cogn. Comput. 2023, 7(4), 166; https://doi.org/10.3390/bdcc7040166 - 17 Oct 2023
Viewed by 1773
Abstract
Climate change (CC) has become a central global topic within multiple branches of the social disciplines. Natural Language Processing (NLP) plays a prominent role here, having achieved remarkable results in various application scenarios. However, CC debates are ambiguous and complicated to interpret even for humans, especially at the aspect-oriented fine-grained level. Furthermore, the lack of large-scale effective labeled datasets is a perennial plight in NLP. In this work, we propose a novel weak-supervised Hybrid Attention Masking Capsule Neural Network (HAMCap) for fine-grained CC debate analysis. Specifically, we use vectors with different allocated weights instead of scalars, and we design a hybrid attention mechanism to better capture and represent information. By randomly masking with a Partial Context Mask (PCM) mechanism, we can better construct the internal relationship between aspects and entities and easily obtain a large-scale generated dataset. Considering the uniqueness of linguistics, we propose a Reinforcement Learning-based Generator-Selector mechanism to automatically update and select data that are beneficial to model training. Empirical results indicate that our proposed ensemble model outperforms baselines on downstream tasks with a maximum of 50.08% on accuracy and 49.48% on F1 scores. Finally, we draw interpretable conclusions about the climate change debate, a widespread global concern.
16 pages, 382 KiB  
Article
ZeroTrustBlock: Enhancing Security, Privacy, and Interoperability of Sensitive Data through ZeroTrust Permissioned Blockchain
by Pratik Thantharate and Anurag Thantharate
Big Data Cogn. Comput. 2023, 7(4), 165; https://doi.org/10.3390/bdcc7040165 - 17 Oct 2023
Cited by 18 | Viewed by 3325
Abstract
With the digitization of healthcare, an immense amount of sensitive medical data are generated and shared between various healthcare stakeholders. However, traditional health data management mechanisms present interoperability, security, and privacy challenges. The centralized nature of current health information systems leads to single points of failure, making the data vulnerable to cyberattacks, and patients have little control over their medical records, raising privacy concerns. Blockchain technology presents a promising solution to these challenges through its decentralized, transparent, and immutable properties. This research proposes ZeroTrustBlock, a comprehensive blockchain framework for secure and private health information exchange. The decentralized ledger enhances integrity, while permissioned access and smart contracts enable patient-centric control over medical data sharing. A hybrid on-chain and off-chain storage model balances transparency with confidentiality, and integration gateways bridge ZeroTrustBlock protocols with existing systems such as EHRs. Implemented on Hyperledger Fabric, ZeroTrustBlock demonstrates substantial security improvements over mainstream databases via cryptographic mechanisms, formal privacy-preserving protocols, and access policies enacting patient consent. Results validate the architecture’s effectiveness, achieving an average throughput of 14,200 TPS, an average latency of 480 ms for 100,000 concurrent transactions, and linear scalability up to 20 nodes. Enhancements around performance, advanced cryptography, and real-world pilots remain future work. Overall, ZeroTrustBlock provides a robust application of blockchain capabilities to transform security, privacy, interoperability, and patient agency in health data management.
(This article belongs to the Special Issue Big Data in Health Care Information Systems)
21 pages, 3059 KiB  
Article
MM-EMOR: Multi-Modal Emotion Recognition of Social Media Using Concatenated Deep Learning Networks
by Omar Adel, Karma M. Fathalla and Ahmed Abo ElFarag
Big Data Cogn. Comput. 2023, 7(4), 164; https://doi.org/10.3390/bdcc7040164 - 13 Oct 2023
Cited by 1 | Viewed by 2680
Abstract
Emotion recognition is crucial in artificial intelligence, particularly in the domain of human–computer interaction. The ability to accurately discern and interpret emotions plays a critical role in helping machines decipher users’ underlying intentions, allowing for a more streamlined interaction process that translates into an elevated user experience. The recent increase in social media usage, as well as the availability of an immense amount of unstructured data, has resulted in a significant demand for automated emotion recognition systems, and artificial intelligence (AI) techniques have emerged as a powerful solution to this pressing concern. In particular, multimodal AI-driven approaches to emotion recognition have proven beneficial in capturing the intricate interplay of diverse human expression cues that manifest across multiple modalities. The current study aims to develop an effective multimodal emotion recognition system, MM-EMOR, to improve the efficacy of emotion recognition focused on the audio and text modalities. Mel spectrogram features, Chromagram features, and the MobileNet convolutional neural network (CNN) are central to processing the audio data, while an attention-based RoBERTa model caters to the text modality. The approach was evaluated exhaustively across three different datasets. Notably, the empirical findings show that MM-EMOR outperforms competing models on the same datasets, with accuracy gains of 7% on one dataset, 8% on another, and 18% on the third.
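A sketch of the audio feature-extraction front end: Mel spectrogram and chromagram features, as named in the abstract. A synthetic sine wave stands in for real speech, and the MobileNet and RoBERTa branches are not shown.

```python
import librosa
import numpy as np

sr = 22050
y = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr * 2) / sr)   # 2 s test tone

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
print(mel.shape, chroma.shape)    # (128, frames), (12, frames)
```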
50 pages, 2531 KiB  
Systematic Review
Cognitive Assessment Based on Electroencephalography Analysis in Virtual and Augmented Reality Environments, Using Head Mounted Displays: A Systematic Review
by Foteini Gramouseni, Katerina D. Tzimourta, Pantelis Angelidis, Nikolaos Giannakeas and Markos G. Tsipouras
Big Data Cogn. Comput. 2023, 7(4), 163; https://doi.org/10.3390/bdcc7040163 - 13 Oct 2023
Cited by 2 | Viewed by 3522
Abstract
The objective of this systematic review centers on cognitive assessment based on electroencephalography (EEG) analysis in Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) environments, projected on Head-Mounted Displays (HMDs), in healthy individuals. A range of electronic databases were searched (Scopus, ScienceDirect, IEEE Xplore, and PubMed) using the PRISMA method, and 82 experimental studies were included in the final report. Specific aspects of cognitive function were evaluated, including cognitive load, immersion, spatial awareness, interaction with the digital environment, and attention. These were analyzed along several dimensions, including the number of participants, stimuli, frequency band ranges, data preprocessing, and data analysis. Based on the analysis conducted, significant findings emerged both in the experimental structure related to cognitive neuroscience and in the key parameters considered in the research. Numerous avenues and domains requiring more extensive exploration were also identified within neuroscience and cognition research in digital environments. These encompass factors in the experimental setup, such as narrow participant populations and the feasibility of using EEG equipment with a limited number of sensors to overcome the challenges posed by the time-consuming placement of a multi-electrode EEG cap. There is a clear need for deeper exploration in signal analysis, especially concerning the α, β, and γ sub-bands and their role in providing more precise insights for evaluating cognitive states. Finally, further research into augmented and mixed reality environments will enable more accurate conclusions regarding their utility in cognitive neuroscience.
12 pages, 918 KiB  
Article
Contemporary Art Authentication with Large-Scale Classification
by Todd Dobbs, Abdullah-Al-Raihan Nayeem, Isaac Cho and Zbigniew Ras
Big Data Cogn. Comput. 2023, 7(4), 162; https://doi.org/10.3390/bdcc7040162 - 9 Oct 2023
Cited by 2 | Viewed by 2938
Abstract
Art authentication is the process of identifying the artist who created a piece of artwork and is manifested through events of provenance, such as art gallery exhibitions and financial transactions. Art authentication has visual influence via the uniqueness of the artist’s style in contrast to the style of another artist; the significance of this contrast is proportional to the number of artists involved and the degree of uniqueness of an artist’s collection. This visual uniqueness of style can be captured in a mathematical model produced by a machine learning (ML) algorithm on painting images. Art authentication is not always possible, as provenance can be obscured or lost through anonymity, forgery, gifting, or theft of artwork. This paper presents an image-only art authentication attribute marker of contemporary art paintings for a very large number of artists. The experiments in this paper demonstrate that it is possible to use ML-generated models to authenticate contemporary art from 2368 artists with an accuracy of 48.97% and from 100 artists with an accuracy of 91.23%. This is the largest effort for image-only art authentication to date, with respect to both the number of artists involved and the accuracy of authentication.
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
15 pages, 1851 KiB  
Article
An Empirical Study on Core Data Asset Identification in Data Governance
by Yunpeng Chen, Ying Zhao, Wenxuan Xie, Yanbo Zhai, Xin Zhao, Jiang Zhang, Jiang Long and Fangfang Zhou
Big Data Cogn. Comput. 2023, 7(4), 161; https://doi.org/10.3390/bdcc7040161 - 7 Oct 2023
Cited by 2 | Viewed by 2603
Abstract
Data governance aims to optimize the value derived from data assets and effectively mitigate data-related risks. The rapid growth of data assets increases the risk of data breaches, and one key way to reduce this risk is to classify data assets according to their business value and criticality to the enterprise, allocating limited resources to protect core data assets. Existing methods rely on the experience of professionals and cannot identify core data assets across business scenarios. This work conducts an empirical study to address this issue. First, we utilized data lineage graphs with expert-labeled core data assets to investigate the experience of data users in core data asset identification from a scenario perspective. Then, we explored the structural features of core data assets on data lineage graphs from an abstraction perspective. Finally, an expert seminar was conducted to derive a set of universal indicators for identifying core data assets by synthesizing the results from the two perspectives. User and field studies were conducted to demonstrate the effectiveness of the indicators.
13 pages, 1831 KiB  
Article
Defining Semantically Close Words of Kazakh Language with Distributed System Apache Spark
by Dauren Ayazbayev, Andrey Bogdanchikov, Kamila Orynbekova and Iraklis Varlamis
Big Data Cogn. Comput. 2023, 7(4), 160; https://doi.org/10.3390/bdcc7040160 - 27 Sep 2023
Cited by 3 | Viewed by 1775
Abstract
This work focuses on determining semantically close words and using semantic similarity in general to improve performance in information retrieval tasks. The semantic similarity of words is an important task with many applications, from information retrieval to spell checking or even document clustering and classification. Although the methods and tools for this task are well established in languages with rich linguistic resources, some languages lack such tools. The first step in our experiment is to represent the words in a collection in vector form and then define the semantic similarity of terms using a vector similarity method. To tame the complexity of the task, which depends on the number of word (and, consequently, vector) pairs that must be compared to find the semantically closest word pairs, a distributed method that runs on Apache Spark is designed to reduce the calculation time by running comparison tasks in parallel. Three alternative implementations are proposed and tested using a list of target words, seeking the most semantically similar words from a lexicon for each of them. In a second step, we employ pre-trained multilingual sentence transformers to capture content semantics at the sentence level and a vector-based semantic index to accelerate the searches. The code is written in MapReduce, and the experiments and results show that the proposed methods can provide an interesting solution for finding similar words or texts in the Kazakh language.
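A sketch of the parallel comparison step: Spark distributes cosine-similarity computations across word-vector pairs. The three-word toy lexicon (Latin transliterations) and its vectors are invented stand-ins for the Kazakh vocabulary and its embeddings.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("word-sim").getOrCreate()
vecs = [("kitap", np.array([1.0, 0.2, 0.1])),
        ("kitapkhana", np.array([0.9, 0.3, 0.2])),
        ("mashina", np.array([0.1, 0.9, 0.8]))]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

pairs = spark.sparkContext.parallelize(vecs)
sims = (pairs.cartesian(pairs)
             .filter(lambda p: p[0][0] < p[1][0])   # each unordered pair once
             .map(lambda p: (p[0][0], p[1][0], cosine(p[0][1], p[1][1]))))
print(sims.collect())
spark.stop()
```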
13 pages, 431 KiB  
Article
A Pruning Method Based on Feature Map Similarity Score
by Jihua Cui, Zhenbang Wang, Ziheng Yang and Xin Guan
Big Data Cogn. Comput. 2023, 7(4), 159; https://doi.org/10.3390/bdcc7040159 - 26 Sep 2023
Cited by 1 | Viewed by 2103
Abstract
As the number of layers in deep learning models increases, the number of parameters and the amount of computation increase, making such models difficult to deploy on edge devices. Pruning has the potential to significantly reduce the number of parameters and computations in a deep learning model, but existing pruning methods frequently require a specific distribution of network parameters to achieve good results when measuring filter importance. We therefore propose a pruning method based on feature map similarity scores: the similarity between a filter’s output feature maps and those of other filters measures the redundancy of the corresponding filter, and this similarity score is used to measure filter importance and guide pruning. Pruning experiments on the ResNet-56 and ResNet-110 networks on the Cifar-10 dataset can compress the model by more than 70% while maintaining a higher compression ratio and accuracy than traditional methods.
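An illustrative scoring step: rank a convolution's filters by how similar their output feature maps are to the other filters' maps, so that highly similar (redundant) filters become pruning candidates. The cosine measure, the averaging over the batch, and the number of filters pruned are assumptions about the general approach, not the paper's exact formula.

```python
import torch
import torch.nn.functional as F

conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
x = torch.randn(8, 3, 32, 32)                   # a batch of calibration images
maps = conv(x)                                   # shape (8, 16, 32, 32)

flat = maps.mean(dim=0).flatten(1)               # one flattened map per filter
flat = F.normalize(flat, dim=1)
sim = flat @ flat.T                              # pairwise cosine similarity
score = (sim.sum(dim=1) - 1) / (sim.shape[0] - 1)    # mean similarity to others
prune_idx = torch.argsort(score, descending=True)[:4]  # 4 most redundant filters
print(prune_idx)
```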
31 pages, 16445 KiB  
Article
Ensemble-Based Short Text Similarity: An Easy Approach for Multilingual Datasets Using Transformers and WordNet in Real-World Scenarios
by Isabella Gagliardi and Maria Teresa Artese
Big Data Cogn. Comput. 2023, 7(4), 158; https://doi.org/10.3390/bdcc7040158 - 25 Sep 2023
Cited by 2 | Viewed by 2213
Abstract
When integrating data from different sources, problems arise from synonymy, different languages, and concepts of different granularity. This paper proposes a simple yet effective approach to evaluating the semantic similarity of short texts, especially keywords. The method can match keywords from different sources and languages by exploiting transformers and WordNet-based methods. Key features of the approach include its unsupervised pipeline, mitigation of the lack of context in keywords, scalability for large archives, support for multiple languages, and adaptability to real-world scenarios. The work aims to provide a versatile tool for different cultural heritage archives without requiring complex customization. The paper explores different approaches to identifying similarities in 1- or n-gram tags, evaluates and compares different pre-trained language models, and defines integrated methods to overcome their limitations. Tests to validate the approach were conducted using the QueryLab portal, a search engine for cultural heritage archives, to evaluate the proposed pipeline.
(This article belongs to the Special Issue Artificial Intelligence in Digital Humanities)
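A sketch of the two ensemble ingredients: a multilingual sentence-transformer similarity and a WordNet path similarity, naively averaged. The checkpoint name, the headword-level WordNet lookup, and the equal weighting are assumptions; the paper's integration method is more involved.

```python
# Requires: nltk.download("wordnet") before first use.
from nltk.corpus import wordnet as wn
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
a, b = "folk dance", "traditional ballet"

emb = model.encode([a, b], convert_to_tensor=True)
transformer_sim = float(util.cos_sim(emb[0], emb[1]))

s1, s2 = wn.synsets("dance")[0], wn.synsets("ballet")[0]
wordnet_sim = s1.path_similarity(s2) or 0.0      # headword-level fallback

print((transformer_sim + wordnet_sim) / 2)       # naive ensemble score
```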