Data Engineering in the Internet of Things

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Internet of Things".

Deadline for manuscript submissions: closed (31 January 2024) | Viewed by 32588

Special Issue Editors


Guest Editor
Department of Engineering Science and Ocean Engineering, National Taiwan University, Taipei 10617, Taiwan
Interests: multimedia networking; data mining; machine learning; Internet of Things; computer security

Guest Editor
Department of Computer Science and Information Engineering, Ming Chuan University, Taoyuan City 333, Taiwan
Interests: networking multimedia; Internet of Things; blockchain technology

Guest Editor
Department of Industrial Engineering and Management, National Yunlin University of Science and Technology, Yunlin 64002, Taiwan
Interests: Internet of Things; mobile application design; artificial intelligence; web technology

Special Issue Information

Dear Colleagues,

Recent years have seen explosive and exciting advances in the Internet of Things (IoT), which has enjoyed tremendous success in a variety of applications such as digital health, smart cities, environmental monitoring, and predictive maintenance. Real-world applications require sensor data to be timely, reliable, and suitable for decision-making. The bounding condition in an IoT system is therefore not the deployment of sensors, but the data engineering: the management and analysis of the data coming off those sensors. With the proliferation of different forms of data in IoT applications, data engineering techniques are needed for the in-depth processing, analysis, indexing, learning, mining, searching, management, and retrieval of these data.

This Special Issue will highlight data engineering techniques applied in the design, development, and assessment of IoT systems to prepare, transform, publish, or otherwise make data available for different IoT applications. We welcome papers addressing any aspect of IoT data engineering. To share and exchange research results and experiences among today's IoT data engineering practitioners and researchers, we especially encourage submissions that present:

(1) the most recent research results in IoT data engineering;

(2) the most recent practice problems that arise in IoT data engineering;

(3) the exchange of experiences in IoT data engineering technologies; 

(4) the new issues and directions for future research and development in IoT data engineering.

Prof. Dr. Ray-I Chang
Prof. Dr. Chia-Hui Wang
Prof. Dr. Yu-Hsin Hung
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • IoT
  • data engineering
  • data warehouse and database
  • privacy and security
  • data processing
  • data analysis
  • data mining
  • data searching
  • data management
  • data retrieval

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (13 papers)


Research

17 pages, 51092 KiB  
Article
A Connector for Integrating NGSI-LD Data into Open Data Portals
by Laura Martín, Jorge Lanza, Víctor González, Juan Ramón Santana, Pablo Sotres and Luis Sánchez
Sensors 2024, 24(5), 1695; https://doi.org/10.3390/s24051695 - 6 Mar 2024
Cited by 1 | Viewed by 1036
Abstract
Nowadays, there are plenty of data sources generating massive amounts of information that, combined with novel data analytics frameworks, are meant to support optimisation in many application domains. Nonetheless, there are still shortcomings in terms of data discoverability, accessibility and interoperability. Open Data portals have emerged as a shift towards openness and discoverability. However, they do not impose any conditions on the data itself; they only stipulate how datasets have to be described. Alternatively, the NGSI-LD standard pursues harmonisation in terms of data modelling and accessibility. This paper presents a solution that bridges these two domains (i.e., Open Data portals and NGSI-LD-based data) in order to keep benefiting from the structured description of datasets offered by Open Data portals, while ensuring the interoperability provided by the NGSI-LD standard. Our solution aggregates the data into coherent datasets and generates high-quality descriptions, ensuring comprehensiveness, interoperability and accessibility. The proposed solution has been validated through a real-world implementation that exposes IoT data in NGSI-LD format through the European Data Portal (EDP). Moreover, the results from the Metadata Quality Assessment that the EDP implements show that the generated dataset descriptions achieve excellent rankings in terms of the Findability, Accessibility, Interoperability and Reusability (FAIR) data principles. Full article
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

21 pages, 2047 KiB  
Article
A Hybrid Missing Data Imputation Method for Batch Process Monitoring Dataset
by Qihong Gan, Lang Gong, Dasha Hu, Yuming Jiang and Xuefeng Ding
Sensors 2023, 23(21), 8678; https://doi.org/10.3390/s23218678 - 24 Oct 2023
Viewed by 1149
Abstract
Batch process monitoring datasets usually contain missing data, which decreases the performance of data-driven modeling for fault identification and optimal control. Many methods have been proposed to impute missing data; however, they do not fulfill the need for data quality, especially in sensor datasets with different types of missing data. We propose a hybrid missing data imputation method for batch process monitoring datasets with multi-type missing data. In this method, the missing data is first classified into five categories based on the continuous missing duration and the number of variables missing simultaneously. Then, different categories of missing data are step-by-step imputed considering their unique characteristics. A combination of three single-dimensional interpolation models is employed to impute transient isolated missing values. An iterative imputation based on a multivariate regression model is designed for imputing long-term missing variables, and a combination model based on single-dimensional interpolation and multivariate regression is proposed for imputing short-term missing variables. The Long Short-Term Memory (LSTM) model is utilized to impute both short-term and long-term missing samples. Finally, a series of experiments for different categories of missing data were conducted based on a real-world batch process monitoring dataset. The results demonstrate that the proposed method achieves higher imputation accuracy than other comparative methods. Full article
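The step-by-step strategy above can be illustrated for its simplest category, transient isolated missing values, which single-dimensional interpolation handles well. The sketch below is illustrative only: the paper combines three interpolation models, whereas this shows plain linear interpolation, and the function name is hypothetical.

```python
import numpy as np

def impute_isolated(series):
    """Fill transient isolated missing values (NaN) in one sensor
    variable by linear interpolation between observed neighbours.
    Edge gaps are clamped to the nearest observed value (np.interp)."""
    s = np.asarray(series, dtype=float)
    idx = np.arange(len(s))
    missing = np.isnan(s)
    if missing.all() or not missing.any():
        return s
    s[missing] = np.interp(idx[missing], idx[~missing], s[~missing])
    return s
```

Longer gaps and simultaneous multi-variable losses would instead go to the regression and LSTM models described in the abstract.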
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

18 pages, 5124 KiB  
Article
BERT-Based Approaches to Identifying Malicious URLs
by Ming-Yang Su and Kuan-Lin Su
Sensors 2023, 23(20), 8499; https://doi.org/10.3390/s23208499 - 16 Oct 2023
Cited by 3 | Viewed by 3709
Abstract
Malicious uniform resource locators (URLs) are prevalent in cyberattacks, particularly in phishing attempts aimed at stealing sensitive information or distributing malware. Therefore, it is of paramount importance to accurately detect malicious URLs. Prior research has explored the use of deep-learning models to identify malicious URLs by segmenting URL strings into character-level or word-level tokens, embedding them, and employing trained models to differentiate between URLs. In this study, a bidirectional encoder representations from transformers (BERT)-based model was devised to tokenize URL strings, employing its self-attention mechanism to enhance the understanding of correlations among tokens. Subsequently, a classifier was employed to determine whether a given URL was malicious. In evaluating the proposed methods, three different types of public datasets were utilized: a dataset consisting solely of URL strings from Kaggle, a dataset containing only URL features from GitHub, and a dataset including both types of data from the University of New Brunswick, namely, ISCX 2016. The proposed system achieved accuracy rates of 98.78%, 96.71%, and 99.98% on the three datasets, respectively. Additionally, experiments were conducted on two datasets from different domains, the Internet of Things (IoT) and Domain Name System over HTTPS (DoH), to demonstrate the versatility of the proposed model. Full article
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

18 pages, 5722 KiB  
Article
A Performance Improvement for Indoor Positioning Systems Using Earth’s Magnetic Field
by Sheng-Cheng Yeh, Hsien-Chieh Chiu, Chih-Yang Kao and Chia-Hui Wang
Sensors 2023, 23(16), 7108; https://doi.org/10.3390/s23167108 - 11 Aug 2023
Viewed by 1293
Abstract
Although most indoor positioning systems use radio waves, such as Wi-Fi, Bluetooth, or RFID, for applications in department stores, exhibition halls, stations, and airports, the accuracy of such technology is easily affected by human shadowing and multipath propagation delay. This study combines the earth's magnetic field strength and Wi-Fi signals to obtain highly available indoor positioning information. Wi-Fi signals are first used to identify the user's area under several kinds of environment partitioning methods. Then, signal pattern comparison is used for positioning calculations, based on the change in the earth's magnetic field strength along the east–west, north–south, and vertical directions in the indoor area. Finally, the k-nearest neighbors (KNN) method and a fingerprinting algorithm are used to calculate fine-grained indoor positioning information. The experimental results show that the average positioning error is 0.57 m with 12-area partitioning, almost a 90% improvement over one-area partitioning. This study also considers the positioning error when the device is held at different angles by hand. A rotation matrix is used to convert the magnetic sensor readings from mobile-phone coordinates into geographic coordinates; the average positioning error is decreased by 68% compared to the original coordinates in 12-area partitioning with a 30-degree pitch. In the offline procedure, when only the northern-direction data are used, reducing the database by 75%, the average positioning error is 1.38 m. If reference points are collected every 2 m, halving the database requirement, the average positioning error is 1.77 m. Full article
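The fingerprinting step, k-nearest neighbours over stored magnetic-field signatures, can be sketched generically as follows. This is not the authors' code; the function name and the choice of Euclidean distance are assumptions for illustration.

```python
import numpy as np

def knn_position(fingerprints, positions, sample, k=3):
    """Estimate a position as the average of the coordinates of the k
    reference points whose stored fingerprints (e.g., magnetic field
    strength along the E-W, N-S, and vertical axes) are closest to the
    measured sample."""
    fp = np.asarray(fingerprints, dtype=float)
    dist = np.linalg.norm(fp - np.asarray(sample, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    return np.asarray(positions, dtype=float)[nearest].mean(axis=0)
```

In the paper's pipeline, Wi-Fi area partitioning would first restrict the candidate reference points before a search like this runs.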
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

15 pages, 2046 KiB  
Article
Performance Analysis of Routable GOOSE Security Algorithm for Substation Communication through Public Internet Network
by Soohyun Shin and Hyosik Yang
Sensors 2023, 23(12), 5396; https://doi.org/10.3390/s23125396 - 7 Jun 2023
Cited by 4 | Viewed by 1492
Abstract
Traditional unidirectional power systems that produce large-scale electricity and supply it over an ultra-high-voltage power grid are changing globally to increase efficiency. Protection relays in current substations rely only on internal data from the substation where they are located to detect changes. However, to detect changes in the system more accurately, various data from several external substations, including micro-grids, are required. As such, communication technology for data acquisition has become an essential function for next-generation substations. Data aggregators that use the GOOSE protocol to collect data inside substations in real time have been developed, but data acquisition from external substations is challenging in terms of cost and security, so only internal substation data are used. This paper proposes the acquisition of data from external substations by applying security to R-GOOSE, defined in the IEC 61850 standard, over a public internet network. This paper also develops a data aggregator based on R-GOOSE, showing data acquisition results. Full article
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

12 pages, 793 KiB  
Article
LAD: Layer-Wise Adaptive Distillation for BERT Model Compression
by Ying-Jia Lin, Kuan-Yu Chen and Hung-Yu Kao
Sensors 2023, 23(3), 1483; https://doi.org/10.3390/s23031483 - 28 Jan 2023
Cited by 7 | Viewed by 3274
Abstract
Recent advances with large-scale pre-trained language models (e.g., BERT) have brought significant potential to natural language processing. However, the large model size hinders their use in IoT and edge devices. Several studies have utilized task-specific knowledge distillation to compress the pre-trained language models. However, to reduce the number of layers in a large model, a sound strategy for distilling knowledge to a student model with fewer layers than the teacher model is lacking. In this work, we present Layer-wise Adaptive Distillation (LAD), a task-specific distillation framework that can be used to reduce the model size of BERT. We design an iterative aggregation mechanism with multiple gate blocks in LAD to adaptively distill layer-wise internal knowledge from the teacher model to the student model. The proposed method enables an effective knowledge transfer process for a student model, without skipping any teacher layers. The experimental results show that both the six-layer and four-layer LAD student models outperform previous task-specific distillation approaches on GLUE tasks. Full article
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

13 pages, 2220 KiB  
Article
Smartwatch Sensors with Deep Learning to Predict the Purchase Intentions of Online Shoppers
by Ray-I Chang, Chih-Yung Tsai and Pu Chung
Sensors 2023, 23(1), 430; https://doi.org/10.3390/s23010430 - 30 Dec 2022
Cited by 3 | Viewed by 2458
Abstract
In the past decade, the scale of e-commerce has continued to grow. With the outbreak of the COVID-19 epidemic, brick-and-mortar businesses have been actively developing online channels where precision marketing has become the focus. This study proposed using the electrocardiography (ECG) recorded by wearable devices (e.g., smartwatches) to judge purchase intentions through deep learning. The method of this study included a long short-term memory (LSTM) model supplemented by collective decisions. The experiment was divided into two stages. The first stage aimed to find the regularity of the ECG and verify the research by repeated measurement of a small number of subjects. A total of 201 ECGs were collected for deep learning, and the results showed that the accuracy rate of predicting purchase intention was 75.5%. Then, incremental learning was adopted to carry out the second stage of the experiment. In addition to adding subjects, it also filtered five different frequency ranges. This study employed the data augmentation method and used 480 ECGs for training, and the final accuracy rate reached 82.1%. This study could encourage online marketers to cooperate with health management companies with cross-domain big data analysis to further improve the accuracy of precision marketing. Full article
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

22 pages, 6081 KiB  
Article
Developing an Improved Ensemble Learning Approach for Predictive Maintenance in the Textile Manufacturing Process
by Yu-Hsin Hung
Sensors 2022, 22(23), 9065; https://doi.org/10.3390/s22239065 - 22 Nov 2022
Cited by 3 | Viewed by 2477
Abstract
With the rapid development of digital transformation, paper forms are being digitalized as electronic forms (e-Forms). Existing data can be applied in predictive maintenance (PdM) to enable intelligent and automated manufacturing. This study aims to enhance the utilization of collected e-Form data through machine learning approaches and cloud computing to predict and provide maintenance actions. The ensemble learning approach (ELA) requires less computation time and has simple hardware requirements, making it suitable for processing e-Form data with specific attributes. This study proposed an improved ELA to predict the defective class of product data from a manufacturing site's work order form, along with a resource dispatching approach to route data to the corresponding emailing resource for automatic notification. This study's novelty is the integration of cloud computing and an improved ELA for PdM to assist the textile product manufacturing process. The data analytics results show that the improved ensemble learning algorithm achieves over 98% accuracy and precision for defective product prediction. The validation results of the dispatching approach show that data can be correctly transmitted in a timely manner to the corresponding resource, along with a notification being sent to users. Full article
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

18 pages, 8025 KiB  
Article
Scale-Mark-Based Gauge Reading for Gauge Sensors in Real Environments with Light and Perspective Distortions
by Chia-Hui Wang, Ke-Kai Huang, Ray-I Chang and Chien-Kang Huang
Sensors 2022, 22(19), 7490; https://doi.org/10.3390/s22197490 - 2 Oct 2022
Cited by 2 | Viewed by 2610
Abstract
Nowadays, many old analog gauges still require manual gauge reading, which is a time-consuming, expensive, and error-prone process. A cost-effective solution for automatic gauge reading has therefore become an important research topic. Traditionally, different types of gauges have their own specific methods for gauge reading. This paper presents a systematized solution called SGR (Scale-mark-based Gauge Reading) to automatically read gauge values from different types of gauges. Since most gauges have scale marks (circular or in an arc), our SGR algorithm utilizes PCA (principal component analysis) to find the primary eigenvector of each scale mark. The intersection of these eigenvectors is extracted as the gauge center to ascertain the scale marks. Then, the endpoint of the gauge pointer is found to calculate the corresponding angles to the gauge's center. Using OCR (optical character recognition), the corresponding dial values can be extracted and matched with their scale marks. Finally, the gauge reading value is obtained by linear interpolation of these angles. Our experiments use four videos recorded in real environments with light and perspective distortions. The gauges in the video are first detected by YOLOv4, and the detected regions are clipped as the input images. The obtained results show that SGR can automatically and successfully read gauge values. The average error of SGR is nearly 0.1% in a normal environment. When the environment becomes abnormal with respect to light and perspective distortions, the average error of SGR is still less than 0.5%. Full article
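The core geometric step, taking the primary PCA eigenvector of each scale mark's pixels so the eigenvectors can be intersected at the gauge centre, can be sketched as below. This is a minimal illustration with a hypothetical function name, not the SGR implementation.

```python
import numpy as np

def primary_axis(points):
    """Return (centroid, direction) of the first principal component of
    one scale mark's pixel coordinates; in an SGR-style pipeline these
    axes are then intersected to estimate the gauge centre."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    return centroid, vecs[:, np.argmax(vals)]
```

Because each elongated scale mark points roughly toward the dial centre, a least-squares intersection of these axes yields a robust centre estimate even under perspective distortion.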
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

22 pages, 8459 KiB  
Article
Cost-Effective Fitting Model for Indoor Positioning Systems Based on Bluetooth Low Energy
by Sheng-Cheng Yeh, Chia-Hui Wang, Chaur-Heh Hsieh, Yih-Shyh Chiou and Tsung-Pao Cheng
Sensors 2022, 22(16), 6007; https://doi.org/10.3390/s22166007 - 11 Aug 2022
Cited by 4 | Viewed by 2044
Abstract
Bluetooth Low Energy (BLE) is a positioning technology commonly used in indoor positioning systems (IPS), such as shopping malls or underground parking lots, because of its low power consumption and the low cost of Bluetooth devices, while maintaining high positioning accuracy. Because BLE devices are cheap, they have long been deployed in larger environments such as parking lots and shopping malls; however, a large number of devices must be configured in the environment to obtain accurate positioning results. The most accurate method of using signal strength for positioning is the signal pattern-matching method, in which the positioning result is obtained by comparison against a database, incurring time and labor costs since the amount of data is proportional to the size of the environment for BLE-IPS. A planar model that conforms to the signal strength in the environment can replace the database comparison with an equation solution, reducing these costs but diminishing positioning accuracy. In this paper, we propose to further replace the planar model with a cost-effective fitting model to both save costs and improve positioning accuracy. The experimental results demonstrate that this model can effectively reduce the average positioning error in distance by 31%. Full article
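One common way to replace a fingerprint database with a fitted equation is the log-distance path-loss model. The sketch below is a generic least-squares fit under that assumption; it is not the paper's specific fitting model, and the function name is hypothetical.

```python
import numpy as np

def fit_log_distance(distances, rssi):
    """Least-squares fit of RSSI = A - 10*n*log10(d), returning the
    reference power A (dBm at 1 m) and path-loss exponent n.  A fitted
    equation like this answers distance queries from two parameters
    instead of a large fingerprint database."""
    d = np.asarray(distances, dtype=float)
    X = np.column_stack([np.ones_like(d), -10.0 * np.log10(d)])
    (A, n), *_ = np.linalg.lstsq(X, np.asarray(rssi, dtype=float), rcond=None)
    return A, n
```

The trade-off the abstract describes is exactly this: a model with a handful of parameters is far cheaper to build and store than a site survey, at some cost in accuracy that a better-shaped fitting model can recover.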
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

19 pages, 4552 KiB  
Article
Multilayer Reversible Information Hiding with Prediction-Error Expansion and Dynamic Threshold Analysis
by I-Hui Pan, Ping-Sheng Huang, Te-Jen Chang and Hsiang-Hsiung Chen
Sensors 2022, 22(13), 4872; https://doi.org/10.3390/s22134872 - 28 Jun 2022
Cited by 1 | Viewed by 1460
Abstract
The rapid development of the internet and social media has driven a great demand for information sharing and intellectual property protection. Reversible information embedding has therefore become an important approach to information security; to be reversible, both the original cover and the embedded data must be completely restored. In this paper, a high-capacity, multilayer reversible information hiding technique for digital images is presented. First, the integer Haar wavelet transform converts the cover image from the spatial domain into the frequency domain. Furthermore, we apply dynamic threshold analysis, the parameters of the prediction model, a location map, and a multilayer embedding method to improve the quality of the stego image and restore the cover image. In comparison with current algorithms, the proposed algorithm often achieves a better trade-off between embedding capacity and image quality. Full article
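Prediction-error expansion, the building block named in the title, can be illustrated for a single value as follows. This sketch uses a fixed threshold and hypothetical function names; the paper's scheme adds dynamic thresholds, a location map, wavelet-domain embedding, and multiple layers, none of which are shown.

```python
def pee_embed(pred, pixel, bit, T=2):
    """Embed one bit by prediction-error expansion.  Small prediction
    errors are expanded to carry the bit; larger errors are shifted
    (histogram shifting) so the mapping stays invertible."""
    e = pixel - pred
    if -T <= e < T:
        e = 2 * e + bit        # expandable: error now carries the bit
    elif e >= T:
        e = e + T              # shift positive errors out of the range
    else:
        e = e - T              # shift negative errors
    return pred + e

def pee_extract(pred, stego, T=2):
    """Recover (original_pixel, bit); bit is None for shifted values."""
    e = stego - pred
    if -2 * T <= e < 2 * T:
        return pred + (e >> 1), e & 1   # floor halving undoes 2*e + bit
    if e >= 2 * T:
        return pred + e - T, None
    return pred + e + T, None
```

Reversibility is the point: extraction restores the exact original value for both the expanded and the shifted cases.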
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

20 pages, 5098 KiB  
Article
Edge Computing of Online Bounded-Error Query for Energy-Efficient IoT Sensors
by Ray-I Chang, Jui-Hua Tsai and Chia-Hui Wang
Sensors 2022, 22(13), 4799; https://doi.org/10.3390/s22134799 - 24 Jun 2022
Cited by 3 | Viewed by 1853
Abstract
Since the power of transmitting one bit of data is higher than that of computing one thousand lines of code in IoT (Internet of Things) applications, it is very important to reduce communication costs to save battery power and prolong system lifetime. In IoT sensors, the transformation of physical phenomena into data usually involves distortion (bounded-error tolerance). This introduces bounded-error data into IoT applications according to their required QoS2 (quality-of-sensor service) or QoD (quality of decision-making). In our previous work, we proposed a bounded-error data compression scheme called BESDC (Bounded-Error-pruned Sensor Data Compression) to reduce the point-to-point communication cost of WSNs (wireless sensor networks). Based on BESDC, this paper proposes an online bounded-error query (OBEQ) scheme with edge computing to handle the entire online query process. We propose a query filter scheme to reduce query commands that would otherwise cause the WSN to return unnecessary queried data. It not only satisfies the QoS2/QoD requirements, but also reduces the communication cost of requesting sensing data. Our experiments use real WSN data to demonstrate the query performance. Results show that OBEQ with a query filter can reduce up to 88% of the communication cost compared with the traditional online query process. Full article
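The idea of an edge-side query filter, answering a bounded-error query from cached data when the cached value already meets the requested tolerance instead of forwarding the query into the WSN, can be sketched as below. Class and method names are hypothetical; this is an illustration of the principle, not the OBEQ implementation.

```python
class BoundedErrorEdgeCache:
    """Edge-side query filter sketch: the sensor transmits only when its
    reading drifts outside +/- bound of the last reported value, so the
    edge node can answer any query whose tolerance is at least that
    bound from cache, with no radio round-trip into the WSN."""

    def __init__(self, bound):
        self.bound = bound
        self.last = None          # last value reported by the sensor
        self.forwarded = 0        # queries that had to reach the sensor

    def sensor_report(self, value):
        self.last = value

    def query(self, tolerance):
        if self.last is not None and tolerance >= self.bound:
            return self.last      # cached value already meets QoS2/QoD
        self.forwarded += 1       # tolerance too strict: ask the WSN
        return None
```

Every query answered from cache is one avoided transmission, which is where the reported communication-cost savings come from.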
(This article belongs to the Special Issue Data Engineering in the Internet of Things)

22 pages, 9945 KiB  
Article
Ensemble Machine Learning Model for Accurate Air Pollution Detection Using Commercial Gas Sensors
by Wei-In Lai, Yung-Yu Chen and Jia-Hong Sun
Sensors 2022, 22(12), 4393; https://doi.org/10.3390/s22124393 - 10 Jun 2022
Cited by 20 | Viewed by 3714
Abstract
This paper presents the results of developing an ensemble machine learning model that combines commercial gas sensors for accurate concentration detection. Commercial gas sensors have the advantage of low cost and have become key components of IoT devices in atmospheric condition monitoring. However, their native coarse resolution and poor selectivity limit their performance. Thus, we adopted recurrent neural network (RNN) models to extract the characteristics of time-series concentration data and improve detection accuracy. First, four types of RNN models (LSTM, GRU, Bi-LSTM, and Bi-GRU) were optimized to define the best-performing single weak models for CO, O3, and NO2 gases, respectively. Next, ensemble models that integrate multiple single weak models with a dynamic model were defined and trained. The testing results show that the ensemble models perform better than the single weak models. Further, a retraining procedure was proposed to make the ensemble model more flexible in adapting to environmental conditions. The significantly improved determination coefficients show that retraining helps the ensemble models maintain long-term stable sensing performance in an atmospheric environment. These results can serve as an essential reference for applications of IoT devices with commercial gas sensors in environmental condition monitoring. Full article
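The ensemble step, merging several weak RNN predictions into one estimate, can be illustrated generically. The inverse-error weighting below is an assumption chosen for illustration; the paper combines its weak models through a trained dynamic model, not this formula.

```python
import numpy as np

def ensemble_predict(predictions, errors):
    """Combine weak-model concentration predictions with weights
    inversely proportional to each model's validation error, so more
    accurate models contribute more to the final estimate."""
    p = np.asarray(predictions, dtype=float)
    w = 1.0 / np.asarray(errors, dtype=float)
    return float(np.sum(w * p) / np.sum(w))
```

A learned combiner can outperform a fixed rule like this, which is the motivation for the dynamic model and the retraining procedure in the abstract.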
(This article belongs to the Special Issue Data Engineering in the Internet of Things)
