2020 Big Data and Artificial Intelligence Conference

A special issue of Symmetry (ISSN 2073-8994). This special issue belongs to the section "Computer".

Deadline for manuscript submissions: closed (15 March 2021) | Viewed by 44173

Special Issue Editor


Dr. Evgeny Nikulchev
Guest Editor
Department of Digital Technologies of Data Processing, MIREA – Russian Technological University, 119454 Moscow, Russia
Interests: symmetry groups; Lie groups; dynamic systems modeling; experimental data processing; artificial intelligence technologies; information systems

Special Issue Information

Dear Colleagues,

The 2020 Big Data and AI Conference (https://easychair.org/cfp/bigdata2020) will continue the success of the previous Big Data (ICBDA) and Big Data and AI (BDAI) conferences. It will provide a leading forum for disseminating the latest results in Big Data research, development, and applications from business, technological, and scientific points of view.

The conference solicits high-quality original research papers on any aspect of Big Data and Artificial Intelligence, with an emphasis on the 5Vs (Volume, Velocity, Variety, Value and Veracity), including Big Data challenges in scientific and engineering, social, sensor/IoT/IoE, and multimedia (audio, video, image, etc.) systems and applications. Example topics of interest include, but are not limited to, the following:
- Big Data science and foundations;
- Big Data infrastructure;
- Big Data management;
- Big Data search and mining;
- Big Data security, privacy and trust;
- Big Data applications and machine learning.

Dr. Evgeny Nikulchev
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Symmetry is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

32 pages, 1571 KiB  
Article
Two-Stage Hybrid Data Classifiers Based on SVM and kNN Algorithms
by Liliya A. Demidova
Symmetry 2021, 13(4), 615; https://doi.org/10.3390/sym13040615 - 7 Apr 2021
Cited by 28 | Viewed by 4879
Abstract
The paper considers a solution to the problem of developing two-stage hybrid SVM-kNN classifiers with the aim of increasing data classification quality by refining the classification decisions near the class boundary defined by the SVM classifier. In the first stage, an SVM classifier with default parameter values is developed. Here, the training dataset is designed on the basis of the initial dataset. When developing the SVM classifier, a binary SVM algorithm or a one-class SVM algorithm is used. Based on the results of training the SVM classifier, two variants of the training dataset are formed for the development of the kNN classifier: a variant that uses all objects from the original training dataset located inside the strip dividing the classes, and a variant that uses only those objects from the initial training dataset that are located inside the area containing all misclassified objects from the class-dividing strip. In the second stage, the kNN classifier is developed using the above-mentioned new training dataset. The values of the parameters of the kNN classifier are determined during training so as to maximize the data classification quality. The data classification quality of the two-stage hybrid SVM-kNN classifier was assessed using various indicators on the test dataset. If the kNN classifier improves the classification quality near the class boundary defined by the SVM classifier, the two-stage hybrid SVM-kNN classifier is recommended for further use. The experimental results obtained on various datasets confirm the feasibility of using two-stage hybrid SVM-kNN classifiers for the data classification problem.
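
A minimal sketch of the two-stage idea described in the abstract (not the author's implementation): scikit-learn is assumed, the binary SVM's margin band |f(x)| < 1 stands in for the class-dividing strip, and the kNN refinement uses the first dataset variant (all training objects inside the strip).

```python
# Two-stage SVM-kNN sketch: SVM decides far from the boundary, kNN refines near it.
# Assumes X_train, y_train, X are NumPy arrays; binary labels.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

def fit_two_stage(X_train, y_train, k=5):
    # Stage 1: SVM with default parameter values.
    svm = SVC(kernel="rbf").fit(X_train, y_train)
    # Training objects that fall inside the strip dividing the classes.
    in_strip = np.abs(svm.decision_function(X_train)) < 1.0
    # Stage 2: kNN trained only on the strip objects (dataset variant 1).
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train[in_strip], y_train[in_strip])
    return svm, knn

def predict_two_stage(svm, knn, X):
    pred = svm.predict(X)
    near_boundary = np.abs(svm.decision_function(X)) < 1.0
    if near_boundary.any():
        # Refine decisions near the SVM class boundary with the kNN classifier.
        pred[near_boundary] = knn.predict(X[near_boundary])
    return pred
```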

29 pages, 697 KiB  
Article
A Survey on Knowledge Graph Embeddings for Link Prediction
by Meihong Wang, Linling Qiu and Xiaoli Wang
Symmetry 2021, 13(3), 485; https://doi.org/10.3390/sym13030485 - 16 Mar 2021
Cited by 146 | Viewed by 19608
Abstract
Knowledge graphs (KGs) have been widely used in the field of artificial intelligence, such as in information retrieval, natural language processing, recommendation systems, etc. However, the open nature of KGs often implies that they are incomplete and contain inherent defects. This creates the need to build more complete knowledge graphs to enhance their practical utilization. Link prediction is a fundamental task in knowledge graph completion that utilizes existing relations to infer new relations so as to build a more complete knowledge graph. Numerous methods have been proposed to perform the link-prediction task based on various representation techniques. Among them, KG-embedding models have significantly advanced the state of the art in the past few years. In this paper, we provide a comprehensive survey of KG-embedding models for link prediction in knowledge graphs. We first provide a theoretical analysis and comparison of existing methods proposed to date for generating KG embeddings. Then, we investigate several representative models, classified into five categories. Finally, we conduct experiments on two benchmark datasets to report comprehensive findings and provide some new insights into the strengths and weaknesses of existing models.
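
As a concrete illustration of what a KG-embedding model scores, here is a minimal sketch of the translational TransE approach, one of the representative model families such surveys cover; the embedding sizes and the ranking helper are illustrative and not taken from the paper.

```python
# TransE-style scoring sketch: a triple (h, r, t) is plausible when the
# translated head embedding h + r lies close to the tail embedding t.
import numpy as np

rng = np.random.default_rng(0)
dim, n_entities, n_relations = 50, 1000, 20   # illustrative sizes
E = rng.normal(size=(n_entities, dim))        # entity embeddings
R = rng.normal(size=(n_relations, dim))       # relation embeddings

def score(h, r, t):
    # Higher (less negative) score = more plausible link.
    return -np.linalg.norm(E[h] + R[r] - E[t])

def predict_tail(h, r, top_k=10):
    # Link prediction: rank all entities as candidate tails for (h, r, ?).
    scores = -np.linalg.norm(E[h] + R[r] - E, axis=1)
    return np.argsort(-scores)[:top_k]
```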

13 pages, 864 KiB  
Article
Research of Trajectory Optimization Approaches in Synthesized Optimal Control
by Askhat Diveev and Elizaveta Shmalko
Symmetry 2021, 13(2), 336; https://doi.org/10.3390/sym13020336 - 18 Feb 2021
Viewed by 1875
Abstract
This article presents a study devoted to the emerging method of synthesized optimal control. This is a new type of control based on changing the position of a stable equilibrium point. The object stabilization system forces the object to move towards the equilibrium point, and by changing its position over time, it is possible to bring the object to the desired terminal state with the optimal value of the quality criterion. The implementation of such control requires the construction of two control contours. The first contour ensures the stability of the control object relative to some point in the state space. Methods of symbolic regression are applied for the numerical synthesis of the stabilization system. The second contour provides optimal control of the position of the stable equilibrium point. The present paper studies various approaches to finding the optimal location of equilibrium points. A new problem statement is formulated in which a function describing the optimal location of the equilibrium points is sought at the second stage of the synthesized optimal control approach. Symbolic regression methods for solving the stated problem are discussed. In the presented numerical example, a piecewise-linear function is applied to approximate the location of the equilibrium points.
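
A rough, purely illustrative sketch of the two-contour idea (not the authors' method): a first-order stabilization contour drives the state toward an equilibrium point whose position is given by a piecewise-linear function of time. The dynamics, node locations, and parameter values below are hypothetical.

```python
# Stabilization contour x' = -a*(x - x_eq(t)) with a piecewise-linear schedule
# for the equilibrium-point position x_eq(t); integrated with an Euler step.
import numpy as np

def x_eq(t, nodes_t, nodes_x):
    # Piecewise-linear approximation of the equilibrium-point location.
    return np.interp(t, nodes_t, nodes_x)

def simulate(a=4.0, T=10.0, dt=0.01, nodes_t=(0, 3, 6, 10), nodes_x=(0, 2, 2, 5)):
    t_grid = np.arange(0.0, T, dt)
    x, traj = 0.0, []
    for t in t_grid:
        # The stabilized object is pulled toward the current equilibrium point.
        x += dt * (-a * (x - x_eq(t, nodes_t, nodes_x)))
        traj.append(x)
    return t_grid, np.array(traj)
```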

21 pages, 1058 KiB  
Article
Stochastic Diffusion Model for Analysis of Dynamics and Forecasting Events in News Feeds
by Dmitry Zhukov, Elena Andrianova and Olga Trifonova
Symmetry 2021, 13(2), 257; https://doi.org/10.3390/sym13020257 - 3 Feb 2021
Cited by 5 | Viewed by 2005
Abstract
One of the problems in forecasting events in news feeds is the development of models that can work with the semi-structured information space of text documents. This article describes a model for forecasting events in news feeds based on the stochastic dynamics of changes in the structure of non-stationary time series in news clusters (states of the information space), using a diffusion approximation. Forecasting events in a news feed is based on their text description, vectorization, and finding the cosine of the angle between the given vector and the centroids of various semantic clusters of the information space. Changes over time in the cosine of such angles between the above vector and the centroids can be represented as a point wandering on the [0, 1] segment. This segment contains a trap at the event-occurrence threshold point, which the wandering point may eventually fall into. When creating the model, we considered the probability patterns of transitions between states in the information space. On the basis of this approach, we derived a nonlinear second-order differential equation and formulated and solved the boundary value problem of forecasting news events, which allowed us to obtain the theoretical time dependence of the probability density function of the parameter distribution of the non-stationary time series that describe the evolution of the information space. The results of simulating the time dependence of the event-occurrence probability (with sets of model parameter values experimentally determined for events that have already occurred) show that the model is consistent and adequate: all the news events used for model verification occur with high probability (on the order of 80%), whereas fictitious events can only occur over an inadmissibly long time.
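
A minimal sketch of the vectorization step described above; it assumes TF-IDF vectors and k-means centroids as stand-ins for the paper's semantic clusters of the information space, so the texts, cluster count, and pipeline choices are purely illustrative.

```python
# Cosine of the angle between a document vector and cluster centroids:
# a value on [0, 1] whose evolution over time is the model's "wandering point".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

news = ["central bank raises rates", "storm hits the coast", "markets rally on rate news"]
X = TfidfVectorizer().fit_transform(news)                       # vectorize the news feed
centroids = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X).cluster_centers_

doc_vec = X[-1]                                                 # latest news item
cos_to_centroids = cosine_similarity(doc_vec, centroids)[0]     # one value per cluster, in [0, 1]
print(cos_to_centroids)
```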

22 pages, 6451 KiB  
Article
Choosing a Data Storage Format in the Apache Hadoop System Based on Experimental Evaluation Using Apache Spark
by Vladimir Belov, Andrey Tatarintsev and Evgeny Nikulchev
Symmetry 2021, 13(2), 195; https://doi.org/10.3390/sym13020195 - 26 Jan 2021
Cited by 16 | Viewed by 3712
Abstract
One of the most important tasks of any platform for big data processing is storing the data received. Different systems have different requirements for the storage formats of big data, which raises the problem of choosing the optimal data storage format for the problem at hand. This paper describes the five most popular formats for storing big data, presents an experimental evaluation of these formats, and proposes a methodology for choosing among them. The following data storage formats are considered: Avro, CSV, JSON, ORC, and Parquet. At the first stage, a comparative analysis of the main characteristics of the studied formats was carried out; at the second stage, an experimental evaluation of these formats was prepared and carried out. For the experiment, a test stand with big data processing tools was deployed. The aim of the experiment was to determine characteristics of the data storage formats, such as storage volume and processing speed for different operations, using the Apache Spark framework. In addition, within the study, an algorithm for choosing the optimal format from the presented alternatives was developed using tropical optimization methods. The result of the study is presented in the form of a technique for obtaining a vector of ratings of data storage formats for the Apache Hadoop system, based on an experimental assessment using Apache Spark.
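
The kind of measurement described above can be sketched with PySpark as follows; this is a hedged illustration rather than the authors' benchmark code: the toy DataFrame and output paths are made up, and Avro is omitted here because it requires the external spark-avro package.

```python
# Write and read the same DataFrame in several storage formats and time each operation.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("format-benchmark").getOrCreate()
df = spark.range(1_000_000).withColumnRenamed("id", "value")  # toy dataset

for fmt in ["parquet", "orc", "json", "csv"]:
    path = f"/tmp/bench_{fmt}"
    t0 = time.time()
    df.write.format(fmt).mode("overwrite").save(path)
    write_s = time.time() - t0

    t0 = time.time()
    spark.read.format(fmt).load(path).count()   # count() forces a full read
    read_s = time.time() - t0
    print(f"{fmt}: write {write_s:.2f}s, read {read_s:.2f}s")
```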

18 pages, 7871 KiB  
Article
wUUNet: Advanced Fully Convolutional Neural Network for Multiclass Fire Segmentation
by Vladimir Sergeevich Bochkov and Liliya Yurievna Kataeva
Symmetry 2021, 13(1), 98; https://doi.org/10.3390/sym13010098 - 8 Jan 2021
Cited by 23 | Viewed by 3193
Abstract
This article describes an AI-based solution to multiclass fire segmentation. The flame contours are divided into red, yellow, and orange areas. This separation is necessary to identify the hottest regions for flame suppression. Flame objects can have a wide variety of shapes (convex and non-convex). In that case, the segmentation task is more applicable than object detection, because the center of the fire is much more accurate and reliable information than the center of a bounding box and can therefore be used by robotic systems for aiming. The UNet model is used as a baseline for the initial solution because it is one of the strongest open-source convolutional neural networks for segmentation. There is no open dataset available for multiclass fire segmentation. Hence, a custom dataset of 6250 samples from 36 videos was developed and used in the current study. We compared the trained UNet models under several configurations of input data. The first comparison is between two calculation schemes: fitting the whole frame into one window and taking non-intersected areas of a sliding window over the input image. Second, we chose the best main metric for the loss function (soft Dice vs. Jaccard). We addressed the problem of detecting flame regions at the boundaries of non-intersected regions and introduced, as a solution, new combinational methods for obtaining the output signal based on weighted summation and Gaussian mixtures of half-intersected areas. In the final section, we present the UUNet-concatenative and wUUNet models, which demonstrate significant improvements in accuracy and are considered to be state-of-the-art. All models use the original UNet backbone at the encoder layers (i.e., VGG16) to demonstrate the superiority of the proposed architectures. The results can be applied to many robotic firefighting systems.
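
A loose illustration of blending predictions from half-intersected sliding windows, in the spirit of the weighted-summation idea mentioned above; the segmentation model call, window size, and Hanning weighting are hypothetical and not taken from the paper.

```python
# Accumulate weighted per-tile predictions over half-overlapping windows,
# then normalize by the accumulated weights (border handling omitted).
import numpy as np

def blend_tiles(image, model, tile=256):
    h, w = image.shape[:2]
    step = tile // 2                               # half-intersected windows
    acc = np.zeros((h, w), dtype=np.float32)       # accumulated class scores
    weight = np.zeros((h, w), dtype=np.float32)
    win = np.outer(np.hanning(tile), np.hanning(tile)).astype(np.float32)  # smooth weights
    for y in range(0, h - tile + 1, step):
        for x in range(0, w - tile + 1, step):
            pred = model(image[y:y + tile, x:x + tile])  # placeholder segmentation model, returns (tile, tile) scores
            acc[y:y + tile, x:x + tile] += pred * win
            weight[y:y + tile, x:x + tile] += win
    return acc / np.maximum(weight, 1e-6)
```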

23 pages, 1464 KiB  
Article
Classification of Negative Information on Socially Significant Topics in Mass Media
by Ravil I. Mukhamediev, Kirill Yakunin, Rustam Mussabayev, Timur Buldybayev, Yan Kuchin, Sanzhar Murzakhmetov and Marina Yelis
Symmetry 2020, 12(12), 1945; https://doi.org/10.3390/sym12121945 - 25 Nov 2020
Cited by 12 | Viewed by 7688
Abstract
Mass media not only reflect the activities of state bodies but also shape the informational context, sentiment, depth, and significance level attributed to certain state initiatives and social events. A multilateral and (to the practicable extent) quantitative assessment of media activity is important for understanding their objectivity, role, focus, and, ultimately, the quality of society's "fourth estate". The paper proposes a method for evaluating the media in several modalities (topics, evaluation criteria/properties, classes), combining topic modeling of the text corpora and multiple-criteria decision making. The evaluation is based on an analysis of the corpora as follows: the conditional probability distribution of media by topics, properties, and classes is calculated after the topic model of the corpora is formed. Several approaches are used to obtain the weights that describe how each topic relates to each evaluation criterion/property and to each class described in the paper, including manual high-level labeling, a multi-corpora approach, and an automatic approach. The proposed multi-corpora approach suggests assessing the topical asymmetry of the corpora to obtain the weights describing each topic's relationship to a certain criterion/property. These weights, combined with the topic model, can be applied to evaluate each document in the corpora according to each of the considered criteria and classes. The proposed method was applied to a corpus of 804,829 news publications from 40 Kazakhstani sources, published from 1 January 2018 to 31 December 2019, to classify negative information on socially significant topics. A BigARTM model with 200 topics was derived, and the proposed method was applied, including filling in an analytic hierarchy process (AHP) table and carrying out all of the necessary high-level labeling procedures. The experiments confirm the general possibility of evaluating the media using a topic model of the text corpora: an area under the receiver operating characteristic curve (ROC AUC) score of 0.81 was achieved in the classification task, which is comparable with the results obtained for the same task with the BERT (Bidirectional Encoder Representations from Transformers) model.
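
A tiny sketch of the scoring step described above, combining a document's topic distribution with topic-to-criterion weights; the random numbers, the number of topics, and the decision threshold are made up for illustration only.

```python
# Document-level score for a criterion/class as a weighted sum over topics.
import numpy as np

rng = np.random.default_rng(0)
n_topics = 200
theta_doc = rng.dirichlet(np.ones(n_topics))          # p(topic | document) from the topic model
w_negative = rng.uniform(0.0, 1.0, size=n_topics)     # weight of each topic for the "negative" class

negativity_score = float(theta_doc @ w_negative)      # combine topic model and weights
is_negative = negativity_score > 0.5                  # illustrative threshold
print(negativity_score, is_negative)
```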
