1. Introduction
The length of roadways across the world is consistently increasing as the global economy grows. For example, the highway network in the US has increased by 10,000 miles every year since 1990 (from OECD statistics). However, even as the road network expanded, traffic accidents also increased significantly up until 2000. To reduce traffic accidents, governments and transportation authorities have been implementing several systematic and ongoing road safety projects, such as improving accident-prone areas and renovating dangerous roads.
Road accidents are caused by factors related to vehicles, drivers, and road infrastructure [1]. In terms of vehicles, the development of new technologies such as autonomous vehicles is expected to greatly contribute to the prevention of traffic accidents [2]. Although many studies on human factors are currently underway, practically applying these findings is expected to require substantial time and costs. Accidents caused by hazardous road environments can be prevented through regular inspections by road management agencies. However, with the continuous extension of roads, their ongoing deterioration, and the shortage of road management personnel, substantial progress is unlikely. To prevent accidents through efficient road management, it is particularly important to identify and address road hazards in real time. To this end, traffic agencies allocate substantial budgets and increase the workforce each year to maintain or improve the performance of roads. In addition, traffic agencies require comprehensive foundational data such as the classifications and sizes of road hazards in each region and the time required to process each hazard.
Collecting spatial–temporal data on road hazards is challenging, and even if such data are available, evaluating them for budget allocation is difficult due to stakeholder interests across agencies. This study proposes a process of evaluating operational efficiency in terms of maintaining roads and preventing hazards by analyzing data reported by citizen scientists. First, we collected complaints reported by drivers through a mobile application across the entire national highway network in South Korea. Because the data were logged in text format, this study applied text mining techniques to classify each complaint into one of several types of road hazard maintenance. In addition, the data included the time required to clear each type of road hazard (hereafter called the processing time). Second, based on this information, we developed an indicator to measure operational efficiency using the processing time and evaluated each regional agency using the calculated indicator for each type of maintenance.
The remainder of this paper is organized as follows: Section 2 reviews previous studies related to this research. Section 3 presents the text mining methodology. Section 4 introduces the analysis area and data. The empirical results obtained after applying the text mining technique are then presented in Section 5. Section 6 presents the discussion and conclusions.
2. Literature Review
Road agencies have an obligation to ensure the safety of the roads under their jurisdiction. Thus, monitoring when and where the maintenance of road infrastructure is needed is a routine task for road agencies and should be part of their operational processes to reduce road safety risks [3]. The first step is to collect road hazard data, which can be achieved in various ways. As information and communications technology (ICT) evolves, numerous studies seek to recognize potential road hazards through various sensors. In contrast, other studies still rely on manual data collection by volunteers or citizen scientists. This study reviews recent research in terms of data collection methods not only for the most representative road hazards, such as potholes and roadkill, but also for other minor issues.
Potholes and roadkill significantly affect road networks, leading to accidents and fatal injuries [4]. In particular, it is important to detect and repair hazardous factors in a timely manner to minimize adverse impacts on traffic. Recent studies have used artificial intelligence (AI) to monitor the conditions of road pavements. Machine learning and deep learning methods have been employed to assess the condition of road surfaces through either field trials or case studies.
Among studies using machine learning, ref. [5] developed a supervised machine learning model that uses image data to detect and classify nine types of cracks. To develop the model, they utilized a data augmentation technique and achieved an accuracy of 85%. Ref. [6] developed a cloud-assisted road condition-monitoring system that can monitor road conditions in real time and classify them with an accuracy of 88%. Ref. [7] employed a support vector machine (SVM) to provide real-time warnings for bumps and potholes; the system can also issue instructions to drivers regarding sudden acceleration and braking. For road anomalies, the classification accuracy is approximately 80%. Ref. [8] proposed a new approach for comprehensive pavement condition indicators. The authors argued that the model improves the accuracy when fewer data are available. However, it is expected that considerable effort will be required to collect relevant data in new regions.
There are similar efforts using deep learning methods. Ref. [9] regenerated image data from an RGB-D pavement surface dataset and developed deep convolutional neural networks (CNNs) for pothole detection that are capable of extracting depth information when depth data are not available. Ref. [10] collected thermal imaging data and applied a CNN approach for pothole detection. Ref. [11] proposed a method that yields reliable pothole detection results under small sample conditions. After data augmentation, they tested the CNN fusion model, and the detection accuracy improved up to 90%. Ref. [12] focused on identifying the severity and type of cracks at the same time using the MobileNet-SSD approach. This study assumed that the severity of cracks is directly related to their area. Ref. [13] modified a CNN to detect potholes in real time by removing some convolution layers and using different dilation rates. Ref. [14] focused on artificially generating pseudo images for a training dataset by combining a generative adversarial network (GAN) with Poisson blending. They improved the accuracy of pothole detection by 5% when the original image dataset was small. Ref. [15] introduced a new approach to obtain labeled training data. After training two mainstream deep learning frameworks (YOLO v2 and R-CNN), they evaluated them using a new dataset extracted via Google API. Ref. [16] developed a location-aware CNN to detect potholes. They argued that the model captures discriminative regions with potholes rather than the global context and as a result outperforms existing methods. Ref. [17] employed a crowd sensing-based deep learning approach to detect potholes. The model is capable of distinguishing potholes from destabilizations of vehicles due to speed bumps or driver behaviors. Ref. [18] employed a deep learning method to detect pothole areas and their depth using a mobile point cloud and images. Ref. [19] employed five different datasets and compared the performance of 3D scene reconstruction methods and deep learning techniques in detecting potholes. Ref. [20] employed the object detection technology of a CNN to identify five different pothole types. Ref. [21] proposed fully automated roadway safety assessment using a deep convolutional neural network. They used a street-level panorama image dataset, and the network is capable of estimating various road-level attributes.
Collisions between wildlife and vehicles pose a potential threat to both wildlife populations and road user safety. Data collection methods for wildlife–vehicle collisions (WVCs) include accident reports by the police [22]; historical data from hunters, citizen scientists, or volunteers [23,24,25,26,27,28]; and sensor-driven data collection using lidar [29] and smartphones [30]. The study by [4] is unique in that it utilized a YOLO v3 computer vision algorithm to detect two road hazards (potholes and roadkill) at the same time.
Road attributes such as traffic signs and trees may not be the direct cause of traffic accidents, but they are still crucial maintenance items to improve road safety. AI technology can also be used to evaluate the integrity and condition of road signs [31,32,33,34]. For example, ref. [34] used a deep learning method to develop an algorithm that evaluates road sign integrity and conditions. They validated their algorithm using Google images. There are also other efforts [35,36,37,38,39] that use deep learning techniques or video image processing to measure how far roadside objects (e.g., big trees, electric poles, and other roadside vegetation) are from the road boundary. See Table 1 for a summary of these previous studies.
In addition to the road hazards covered in the review of previous studies, there are numerous others that can affect driver safety. Collecting data associated with various road hazards can be challenging because the hazards are widely distributed in space and the data may not be collected in a timely manner. To this end, as noted above, research using artificial intelligence techniques has been actively conducted. However, there are many challenges in applying these techniques, including the need for a large training dataset, class imbalance, and the need to retrain for a new site [40]. More importantly, automatically detecting, identifying, and classifying all road hazards is almost impossible.
This study sought to use road hazard data to evaluate how efficiently road management agencies clear given road hazards. Because this study dealt with all road hazards observable by users, we analyzed text-based road hazard data from volunteers and citizen participants rather than in-vehicle sensor-based data.
Of course, road hazard detection using state-of-the-art technologies will continue to develop and is expected to be applied in practice someday. However, if a system to monitor citizen feedback for road hazard maintenance is in place, it has the advantage of being immediately applicable to a wide range of road networks. Another benefit is that such a system can monitor various hazards that may occur on the road. Conversely, data collection can be the biggest weakness in recognizing diverse and widely distributed road hazard factors using the latest technology.
4. Analysis Area and Data
Highways in South Korea can be broadly divided into national highways managed by local governments and expressways managed by the Korea Expressway Corporation. The expressways are similar to turnpikes in the US and are generally better maintained than national highways. However, the combined total length of the national highways is 9155 km (refer to Table 2), which is more than twice the combined 4036 km length of expressways. Regarding national highways, complaints from users can be reported by phone, but in reality, road users often cannot identify the phone number of the responsible traffic agency, and it may take over two days for the report to be transferred to the appropriate road agency. Thus, this study focused on national highways, where complaints are relatively frequent and response times are expected to be somewhat long.
Table 2 shows the regional and local agencies managing the national highways in South Korea, the local offices in each region, and the length of the roads they manage.
To overcome the limitations of the conventional reporting system for national highways, the Department of Transportation in South Korea developed the Road Inconvenience Reporting System (RIRS), which allows communication with the appropriate road agency. Using GPS technology, the system has been collecting complaint reports from any road since 28 March 2014. The RIRS provides a simple and convenient way for road users to report road hazards via a smartphone app, while also allowing road managers to receive location and image information, enabling accurate identification of issues and prompt response to them. Information collected through the RIRS app is stored as historical data, along with details such as report ID, registration time, complaint content, location, agency, processing status, and time taken to process the complaint, as shown in Table 3. This study analyzed a total of 17,738 complaints collected from the RIRS between 2014 and 2022, along with data on complaint processing times.
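To make the record layout concrete, the stored fields listed above could be represented as follows. This is a minimal sketch in Python, not the RIRS schema itself: the class name, field names, types, status labels, and the unit of processing time (hours) are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class ComplaintRecord:
    """One RIRS-style complaint; all field names here are hypothetical."""
    report_id: str
    registered_at: datetime
    content: str                       # free-text complaint written by the user
    latitude: float                    # GPS location attached to the report
    longitude: float
    agency: str                        # road agency responsible for the location
    status: str                        # e.g. "received" or "completed" (assumed labels)
    processing_hours: Optional[float]  # None while the complaint is still open


def completed_processing_hours(records: List[ComplaintRecord]) -> List[float]:
    """Return processing times for records that have a recorded processing time."""
    return [r.processing_hours for r in records if r.processing_hours is not None]
```

Keeping the processing time optional separates open complaints from processed ones, so that only processed complaints enter any later analysis of processing times.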
6. Conclusions
To prevent accidents through efficient road management, it is important to identify and address road hazards in real time. Hence, traffic agencies allocate substantial budget and personnel each year to maintain or improve the performance of the roads under their jurisdiction. However, the budget for road management is limited, and traffic agencies are distributed across regions; it is therefore essential to determine the appropriate budget size and regional allocation. This requires comprehensive foundational data, including the classification and size of road hazards in each region and the time required to process each hazard. In this study, we proposed a text mining-based methodology to acquire such foundational data for allocating road management assets efficiently. We employed text-based complaint records reported by volunteers and citizen participants through the mobile RIRS application. Using text mining, we assigned a road hazard type to each complaint record. The analysis of road hazard types and complaint records for each road management agency revealed that specific types of road hazards (i.e., “Roadkill”, “Road Hazards”, “Potholes”, and “Illegal Ads”) occurred prominently under specific agencies. After extracting the processing time from the data, we examined the operational efficiency of road management agencies by road hazard type. The results showed that the time required to process identical road hazard types can vary among agencies. These results suggest that the central authority overseeing the entire national highway network may need to distribute its budget and support by region to resolve specific road hazards. Additionally, we developed an indicator that easily evaluates the operational efficiency of each management agency by combining the processing time and complaint record counts for each type.
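The comparison of processing times across agencies can be sketched as a simple aggregation. The snippet below is an illustration only: the row layout, the function names, and in particular `relative_time_ratio` (an agency's mean processing time divided by the all-agency mean for the same hazard type) are assumptions made here, not the indicator developed in this study.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Each row is (agency, hazard_type, processing_hours); the layout is hypothetical.
Row = Tuple[str, str, float]


def summarize(rows: Iterable[Row]) -> Dict[Tuple[str, str], Tuple[int, float]]:
    """Map (agency, hazard_type) to (complaint count, mean processing hours)."""
    grouped = defaultdict(list)
    for agency, hazard_type, hours in rows:
        grouped[(agency, hazard_type)].append(hours)
    return {key: (len(vals), mean(vals)) for key, vals in grouped.items()}


def relative_time_ratio(rows: Iterable[Row], agency: str, hazard_type: str) -> float:
    """Agency mean time divided by the all-agency mean for one hazard type.

    Values above 1 mean the agency is slower than average for that type.
    This ratio is an illustrative assumption, not the paper's indicator.
    Raises statistics.StatisticsError if the agency has no matching rows.
    """
    rows = list(rows)
    all_hours = [h for _, t, h in rows if t == hazard_type]
    own_hours = [h for a, t, h in rows if a == agency and t == hazard_type]
    return mean(own_hours) / mean(all_hours)
```

Reporting the count next to the mean matters because a mean based on a handful of complaints is far less reliable than one based on hundreds.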
We expect this research to help transportation authorities with road maintenance data acquisition and budget allocation. As road and traffic environments change, road maintenance workload and frequency also change over time. Therefore, it is very important to understand the workload for each type of maintenance in order to provide sustainable and consistent road services. Moreover, monitoring the maintenance status of roads is essential for the analysis of accident risk areas. First, this study proposed a framework to quantify the amount of road hazard maintenance. With this simple method, various road maintenance workloads can be identified, and the work time required for each type of maintenance can be easily tallied. Second, because the budget for road maintenance is limited, transportation authorities need to establish an appropriate budget allocation strategy. A possible measure of effectiveness (MOE) for determining which local office should receive more budget is work efficiency. The indicator proposed in this study could serve as a proxy for work efficiency. Thus, transportation authorities can utilize the indicator to evaluate local offices in terms of work efficiency and determine priorities for resource allocation.
Generally, extracting the necessary information from text data, a typical form of unstructured data, is considerably more complicated than extracting it from structured data recorded numerically. In this study, we used Excel’s Power Query feature to extract keywords from a vast amount of text data and classify complaint types that represent the content of the complaint written by the user. Consequently, we classified 95% of over 17,000 complaints into eight road hazard types using the text mining methodology. However, there are some limitations in utilizing text mining in this study. First, manual review by researchers is necessary in some steps of extracting keywords from complaint records, which can extend the time required for keyword extraction if the historical data volume is vast. Second, approximately 5% of the complaints remained unclassified; therefore, future studies must also consider reducing the number of unclassified complaints. Finally, the techniques proposed in this study are a basic approach that works on limited text forms. Various techniques associated with text clustering, text summarization, and information extraction should be applied to obtain more sophisticated results in future studies.
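The keyword-based classification described above can be illustrated outside of Power Query. The following Python sketch is a stand-in under stated assumptions, not the authors' workflow: the keyword lists are invented English examples (the actual complaints and extracted keywords are not reproduced here), only the four hazard type names mentioned in this section are used, and matching is a plain substring search.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword lists; the study's actual keywords are not shown here.
HAZARD_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "Roadkill": ("roadkill", "dead animal", "carcass"),
    "Potholes": ("pothole", "hole in the road"),
    "Illegal Ads": ("banner", "illegal ad", "placard"),
    "Road Hazards": ("debris", "fallen object", "obstacle"),
}
UNCLASSIFIED = "Unclassified"


def classify(complaint: str) -> str:
    """Return the first hazard type whose keyword occurs in the complaint text."""
    text = complaint.lower()
    for hazard_type, keywords in HAZARD_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return hazard_type
    return UNCLASSIFIED


def classified_share(complaints: List[str]) -> float:
    """Fraction of complaints assigned to some hazard type (0.0 for empty input)."""
    if not complaints:
        return 0.0
    labeled = sum(1 for c in complaints if classify(c) != UNCLASSIFIED)
    return labeled / len(complaints)
```

Complaints that match no keyword fall into the unclassified group, which is where manual review of the keyword lists would be needed to reduce the unclassified share.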