Estimating Pavement Condition by Leveraging Crowdsourced Data

Gu, Yangsong; Khojastehpour, Mohammad; Jia, Xiaoyang; Han, Lee D.

doi:10.3390/rs16122237

¹ Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, TN 37996, USA

² Tennessee Department of Transportation, Nashville, TN 37243, USA

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(12), 2237; https://doi.org/10.3390/rs16122237

Submission received: 23 April 2024 / Revised: 16 June 2024 / Accepted: 18 June 2024 / Published: 20 June 2024

(This article belongs to the Special Issue Harnessing the Geospatial Data Revolution for Promoting Sustainable Transport Systems)

Abstract

Monitoring pavement conditions is critical to pavement management and maintenance. Traditionally, pavement distress is mainly identified via accelerometers, videos, and laser scanning. However, the geographical coverage and temporal frequency of these methods are constrained by the limited availability of equipment and labor, which can delay road maintenance. By contrast, crowdsourced data, collected through crowdsensing, can provide valuable real-time roadway information with extensive coverage. This study exploited crowdsourced Waze pothole and weather reports for pavement condition evaluation. Two surrogate measures are proposed, namely, the Pothole Report Density (PRD) and the Weather Report Density (WRD). They are compared with the Pavement Quality Index (PQI), which is calculated using laser truck data from the Tennessee Department of Transportation (TDOT). A geographically weighted random forest (GWRF) model was developed to capture the complicated relationships between the proposed measures and PQI. The results show that the PRD is highly correlated with the PQI, and that the correlation varies across routes. The PRD is also found to be the second most important factor affecting the PQI values, after pavement age. Although Waze weather reports contribute to PQI values, their impact is significantly smaller than that of pothole reports. This paper demonstrates that surrogate pavement condition measures aggregated from crowdsourced data could be integrated into the state decision-making process by establishing nuanced relationships between the surrogate performance measures and the state pavement condition indices. This approach also has the potential to enhance the granularity of pavement condition evaluation.

Keywords: pavement condition assessment; crowdsensing; spatial machine learning

1. Introduction

Pavement distress is a major concern in the transportation industry, as it can cause various issues, such as safety hazards, increased renovation costs, and reduced ride quality. Various factors, including heavy traffic loads, adverse weather conditions, and material aging, can lead to distinctive pavement distresses such as cracking, rutting, potholes, and surface deformations [1]. Severe distress has been reported to pose great risks to motorists, as it can cause drivers to lose control of their vehicles or cause vehicles to sustain damage [2]. Hence, distressed pavements require timely detection and preservation interventions.

Monitoring pavement conditions is critical to pavement management and maintenance. Historically, road hazard data have been collected through manual inspections by trained personnel, who visually assess different aspects of routes and record the nature, severity, and extent of distress. Another approach has been the use of specialized vehicles equipped with various sensors and imaging technologies. Laser trucks are commonly used by transportation agencies; however, both methods are cost-prohibitive and labor-intensive when employed frequently over large areas. Additionally, pavement distress may worsen and even pose a great threat to vehicles and safety if it is not identified or addressed in a timely manner [3].

Crowdsensing is a concept that harnesses the collective intelligence and participation of large groups of people, often successfully applied to various tasks such as data collection, problem-solving, and product development through online platforms or mobile applications [4,5,6]. The crowdsourced data are typically collected with reference to space and time. The main advantage of crowdsensing is its ability to harness the power of distributed knowledge and resources, enabling large amounts of data to be collected at a relatively low cost. However, crowdsourcing also has its limits. Data quality can be a concern, and privacy and security concerns can arise when processing sensitive or personal data [7,8,9,10].

Emerging crowdsourced data, driven by massive public engagement and reflecting participants’ perception of ride comfort, have the potential to improve the monitoring of pavement conditions at a finer granularity. This study investigates how surrogate pavement performance measures extracted from crowdsourced Waze data are associated with official pavement performance measures from the Department of Transportation. If a strong correlation exists, the surrogate performance measures can be incorporated into existing pavement monitoring systems when official pavement data are not available.

2. Literature Review

The review section is mainly centered around data sources and performance measurements as they are two core components of pavement condition evaluation. Research gaps and opportunities driven by emerging crowdsensing are pointed out at the end of the review.

2.1. Pavement Data Source

Pavement data can be obtained through different techniques. Table 1 summarizes the data sources and collection methods used for pavement condition evaluation. Accelerometers, video (image), and laser data are the three main data sources that can illustrate pavement conditions and performance. Accelerometer data are usually acquired from smartphone sensors or dedicated accelerometer sensors, which record the longitudinal, transverse, and vertical accelerations of vehicles or smartphones [11,12,13,14]. The data are lightweight, but they can represent only coarse pavement surface conditions: only the areas hit by vehicles’ wheels can be diagnosed, and pavement distress in the middle of lanes is prone to be overlooked. Pavement video or image data are collected by cameras or drones. With the advancement of image processing techniques, scholars can identify pavement cracks, ruts, and roughness from images captured by cameras [1,15,16], aerial images recorded by Unmanned Aerial Vehicles (UAVs) [17,18,19], or Google Maps imagery [20,21]. Compared with cameras mounted on vehicles, UAVs offer a broader perspective and can cover larger areas quickly, yet they cannot capture detailed characteristics of pavement surfaces. Moreover, saving and processing videos requires a large amount of storage and computation resources, which makes this approach hard to implement over wide areas. The last data source is the laser. Laser trucks employ laser scanning technology to detect surface irregularities and deterioration, and can precisely identify cracks, potholes, rutting, and texture depth [22,23,24]. However, laser data acquisition often involves manual inspection or operation and can be labor-intensive and subjective. Nowadays, the expansion of roadways and the growing need for pavement maintenance increasingly demand a more cost-effective method of pavement data acquisition.

Against this background, crowdsensing, powered by citizens’ perceptions and reports, can provide valuable information about topics of interest (e.g., pavement conditions) with extensive coverage. Several pavement distress reporting mechanisms have been designated for specific jurisdictions, such as FixMyStreet [25] and SeeClickFix [26]. They collect reports and feedback from citizens, who file a report or call the maintenance department within a jurisdiction. As a result, their descriptions of the pavement distress, especially its location, can be too vague to locate the distress. In addition, the navigation app Waze offers a platform for drivers to report potholes. According to the latest Waze statistics, there are about 151 million monthly active users worldwide and 30 million Waze users in the US [27]. Compared with other crowdsourcing tools, Waze has a considerably larger user base, which increases the likelihood of identifying incidents over large areas. A recent study found that Waze pothole reports can locate potholes earlier and more precisely than maintenance requests. Meanwhile, a large portion of pothole reports are not matched with maintenance requests; these are likely to indicate other pavement distress or potholes missing from the maintenance records [3].

2.2. Pavement Performance Measurements

Pavement condition evaluation is crucial as it not only measures pavement performance but also supports ride comfort and safety. Timely evaluation can lead to prompt maintenance and further prevent damage to vehicles. Based on the data type, different performance measurements have been developed over the years to monitor pavement conditions. For instance, as Table 1 summarizes, accelerometer data, which measure vehicle vertical fluctuations, were employed to assess the overall pavement roughness implicitly, and they were validated to be effective when compared with the International Roughness Index (IRI) [11,12,33,34]. Video collected of the pavement surface was used to extract pavement distress such as longitudinal and transverse cracking, rutting, and potholes [15,29]. Laser scanning data can identify roughness, pavement distress, and pavement texture, and they provide a more accurate diagnosis of pavement health conditions [22]. Many transportation agencies divide pavement performance measurements into two categories: pavement roughness (e.g., IRI) and pavement distress (e.g., Pavement Distress Index). The Pavement Quality Index (PQI), which integrates both roughness and distress, is used to represent the overall pavement health conditions [35,36,37,38]. Although these sophisticated techniques and data analytics can establish the pavement condition accurately, their applications are limited by the data collection effort they require. It is barely possible to implement them for large-scale and consistent monitoring of pavement conditions.

2.3. Research Gaps

Traditional pavement data collection becomes a bottleneck for consistent and dynamic pavement evaluation over extensive areas, due to its high demand for computation, storage, labor, and equipment. With the proliferation of smartphones, crowdsensing could contribute to cost-effective, real-time, and continuous monitoring of pavement health. Some scholars have started using vehicles mounted with smartphones (sensors) to realize crowdsourced pavement condition evaluation, but the effectiveness of the evaluation is largely affected by the experiment vehicles and selected routes. Waze, on the other hand, is frequently used by drivers for daily commutes and has relatively large coverage of road networks. Although Waze reports have shown prominent benefits for traffic incident detection and pothole detection [3,4], their capability in pavement condition evaluation remains unknown. This study proposes surrogate performance measures based on crowdsourced Waze reports and validates their effectiveness by connecting them with the overall Pavement Quality Index from the Pavement Management System (PMS). This work can potentially pave the way for pavement condition evaluation in the crowdsensing era.

3. Methodology

First, a new performance measurement based on crowdsourced Waze reports is proposed that accounts for the redundancy of the data. Then, a geographically weighted random forest model is established to calibrate the complicated association between the proposed surrogate measures and the PQI values from the Pavement Management System (PMS) maintained by the Tennessee Department of Transportation (TDOT).

3.1. Surrogate Pavement Quality Measures

Via the Waze app, riders can report potholes as they perceive any discomfort driving or notice any potholes on the pavement. A previous study compared potholes identified by pothole reports and official pothole repair requests, and found that a large portion of pothole reports are not matched with repair records or requests, suggesting either missing potholes or other pavement distress, e.g., cracking [3]. This is highly possible because riders’ perceptions and knowledge about the pothole may not be consistent or aligned with the traditional definition of potholes. Hence, the pothole reported by riders might generally represent a discomforting ride caused by rough surface or pavement distress. It is worthwhile to employ pothole reports to illustrate the pavement conditions. In addition, as pavement distress is likely to form and deteriorate under severe weather conditions, like flooding, ice, and snow, the weather hazard reports can also serve as an indicator of pavement conditions.

As indicated by previous studies, Waze report frequency can be affected by traffic exposure, while the relative number of reports can reflect the severity of incidents [4]. Hence, the proposed measures normalize the pothole and weather reports by one-mile segments, Annual Average Daily Traffic (AADT), and a one-year period. One mile is used as the geographic unit because approximately 80% of reports are found to be within one mile of incidents [3,4]. Furthermore, Waze labels the reliability of reports based on the user’s previous reporting accuracy. This study, following previous studies [5,10], only utilized reports whose reliability score is greater than 5. To keep the resulting values practical and to avoid issues with small numbers, we then scaled the normalized frequencies by a factor of 1000. Consequently, the Pothole Report Density (PRD) represents the adjusted frequency of pothole reports per 1000 vehicles for each segment over a year, providing a more precise measure for comparison, which can be formulated as follows:

$$PRD_{i} = \frac{NP_{i}}{1\ \text{mile} \times 1\ \text{year} \times AADT_{i}} \times 1000 \tag{1}$$

Likewise, the Weather Report Density (WRD) is formulated as follows:

$$WRD_{i} = \frac{NW_{i}}{1\ \text{mile} \times 1\ \text{year} \times AADT_{i}} \times 1000 \tag{2}$$

where $i$ denotes the segment, and $NP_{i}$ and $NW_{i}$ represent the number of pothole and weather hazard reports for segment $i$, respectively.
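As an illustration of Equations (1) and (2), the following minimal Python sketch computes both densities for a single segment. The `Segment` record, its field names, the segment label, and the report counts are hypothetical and chosen only to make the arithmetic visible; they are not values from the study.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One-mile segment with one year of filtered Waze reports (hypothetical layout)."""
    segment_id: str
    pothole_reports: int   # NP_i: pothole reports kept after the reliability filter
    weather_reports: int   # NW_i: weather hazard reports kept after the filter
    aadt: float            # AADT_i: Annual Average Daily Traffic


def report_density(n_reports: int, aadt: float,
                   miles: float = 1.0, years: float = 1.0) -> float:
    """Equations (1)-(2): reports per mile and year, per unit AADT, scaled by 1000."""
    if aadt <= 0 or miles <= 0 or years <= 0:
        raise ValueError("aadt, miles, and years must be positive")
    return n_reports / (miles * years * aadt) * 1000.0


# Made-up example values, not data from the paper.
seg = Segment("segment_A", pothole_reports=150, weather_reports=6, aadt=100_000)
prd = report_density(seg.pothole_reports, seg.aadt)
wrd = report_density(seg.weather_reports, seg.aadt)
print(f"PRD = {prd:.2f}, WRD = {wrd:.2f}")
```

Keeping the segment length and period as parameters, rather than hard-coding them, makes it straightforward to reuse the same function if a different aggregation unit is ever tried.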

3.2. Official Pavement Quality Index

As riders might report other pavement distress or uneven pavement surfaces as potholes, a comprehensive indicator is more appropriate to link to reports than dedicated indicators. Therefore, PQI is used as the ground truth; it represents both roughness and distress by integrating the Pavement Smoothness Index (PSI) and the Pavement Distress Index (PDI), as written in Equation (3). Note that PQI is scaled from 0 to 5, and the perfect value is 5 in the TDOT PMS. PDI carries the larger weight because pavement distresses indicate current pavement problems. PSI is a measure of the roughness of the road, which represents the longitudinal and transverse profile and the cross slope of the pavement surface. The TDOT [39] reports it as an exponential function of the IRI; see Equation (4). PDI, in turn, measures roadway distress, including fatigue, rutting, longitudinal and transverse cracks, and so on. It is important to note that we utilize the PQI rather than its individual components because Waze-reported potholes may not solely indicate pavement distress, but could also reflect other factors impacting ride quality, such as smoothness issues.

$$PQI = PSI^{0.3} \times PDI^{0.7} \tag{3}$$

$$PSI = 5 \times e^{-0.0055 \cdot IRI} \tag{4}$$
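Equations (3) and (4) can be evaluated directly. The sketch below is a minimal Python illustration, assuming that PSI and PDI both lie on the same 0 to 5 scale as PQI (the text states the scale only for PQI) and leaving the IRI unit unspecified, as the text does. The function names are illustrative.

```python
import math


def psi_from_iri(iri: float) -> float:
    """Equation (4): PSI = 5 * exp(-0.0055 * IRI)."""
    if iri < 0:
        raise ValueError("IRI must be non-negative")
    return 5.0 * math.exp(-0.0055 * iri)


def pqi(psi: float, pdi: float) -> float:
    """Equation (3): weighted geometric mean, exponent 0.3 on PSI and 0.7 on PDI."""
    # Assumption: both inputs are on the 0-5 scale used for PQI.
    if not (0.0 <= psi <= 5.0 and 0.0 <= pdi <= 5.0):
        raise ValueError("PSI and PDI are expected on the 0-5 scale")
    return psi ** 0.3 * pdi ** 0.7


# A perfectly smooth, distress-free pavement reaches the maximum score.
best = pqi(psi_from_iri(0.0), 5.0)
# Rougher pavement (larger IRI) lowers PSI and therefore PQI.
rougher = pqi(psi_from_iri(150.0), 5.0)
print(best, rougher)
```

Because the exponents sum to one, the index stays on the 0 to 5 scale whenever both inputs do, and the 0.7 exponent makes PQI more sensitive to distress than to roughness.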

3.3. Geographically Weighted Random Forest

To untangle the relations between proposed crowdsourced pavement measures and official pavement metrics, we employed the geographically weighted random forest (GWRF) model which integrates the random forest (RF) and spatial disaggregation rule of geographical models. Random forest is chosen for geographical regression because it works well not only in explaining complex relationships between outcome and explanatory variables but also in handling high-dimensional predictors even with a small number of samples [40]. In machine learning models, RF is known as ensemble learning which generates many classifiers and aggregates their results for both classification and regression. Initially, a specific number of trees are built by randomly resampling data with replacement. Then, trees grow independently because of bagging methods, and within each tree, the tree node is split by the best among a random subset of features. In the end, prediction results from all trees are averaged (where the weight of trees is equal) as results [41]. Cross-validation is performed to tune the number of features selected at each candidate split. Generally, an RF model can be simplified as follows:

$$Y_{i} = f(x_{i}) + \epsilon_{i} \tag{5}$$

where $Y_{i}$ is the value of the dependent variable for the $i$th sample, $f(x_{i})$ is the non-linear prediction of RF dependent on a series of features $x_{i}$, and $\epsilon_{i}$ is an error term. In geographically weighted RF, we can extend the equation by weighting the features, which is as follows:

$$Y_{i} = f(W_{ij}(d_{ij})\, x_{i}) + \epsilon_{i} \tag{6}$$

where $W_{ij}(d_{ij})$ denotes the weight of features, with $d_{ij}$ being the spatial distance between the local model $i$ and its neighbor $j$. A local model is built for each data location, considering only nearby observations defined by the kernel function. Previous studies have used two weighting schemes: fixed bandwidth and adaptive bandwidth. The fixed bandwidth applies a certain distance to query the nearest neighbors, while the adaptive bandwidth queries a certain number of nearest neighbors instead of a fixed distance. Considering the radial network shape of the study routes, the adaptive bandwidth is employed to ensure the minimum number of observations required for model inference. As for the kernel function, this study employed the bi-square kernel to compute the decaying weight for neighbors from the near end to the far end [42]. The weight matrix is a diagonal matrix, with diagonal elements being the weights.

$$W_{ij}(d_{ij}; h_{s}) = \left(1 - \frac{d_{ij}^{2}}{h_{s}^{2}}\right)^{2} \tag{7}$$

Here, $W_{ij}$ denotes the spatial weights, and $h_{s}$, with the subscript $s$ standing for spatial, corresponds to the spatial bandwidth that covers $h_{s}$ observations. Notably, in this study, $d_{ij}$ adopts the network route distance between two segments, which is more practical than Euclidean distance. The bandwidth $h_{s}$ determines the number of neighbors used for running a local model, and thus affects the model inference. Hence, to obtain the optimal estimation model, we tuned the bandwidth by minimizing the overall prediction error, which is described below.
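The adaptive bi-square weighting around Equation (7) can be sketched as follows. This is a minimal Python illustration under two assumptions that the text does not spell out: the bandwidth of each local model is taken as the distance to its k-th nearest neighbor, and weights are set to zero beyond the bandwidth, as is usual for the bi-square kernel. The function name and the toy distances are illustrative.

```python
def bisquare_adaptive_weights(distances, k):
    """Bi-square weights for one local model with an adaptive bandwidth.

    distances: distances d_ij from local model i to every candidate neighbor j
               (network route distance in the paper; any non-negative numbers here).
    k:         number of nearest neighbors the bandwidth should cover.
    Returns a list of weights aligned with `distances`.
    """
    if not 1 <= k <= len(distances):
        raise ValueError("k must be between 1 and the number of observations")
    # Assumed adaptive rule: h is the distance to the k-th nearest neighbor,
    # so h changes from one local model to the next.
    h = sorted(distances)[k - 1]
    if h == 0:
        # Degenerate case: the k nearest neighbors all coincide with location i.
        return [1.0 if d == 0 else 0.0 for d in distances]
    # Equation (7) inside the bandwidth; zero outside (assumed truncation).
    return [(1.0 - (d / h) ** 2) ** 2 if d <= h else 0.0 for d in distances]


# Toy distances in miles along a route, not data from the study.
weights = bisquare_adaptive_weights([0.0, 1.0, 2.0, 3.0, 4.0], k=3)
print(weights)
```

Under this rule the neighbor lying exactly on the bandwidth receives a weight of zero, so in practice k - 1 neighbors contribute to each local fit.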

3.4. Model Evaluation

The model performance is measured and compared by $R^{2}$, Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), which are formulated by Equations (8)–(10), respectively:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{8}$$

$$MAE = \frac{\sum_{i=1}^{n} |y_{i} - \hat{y}_{i}|}{n} \tag{9}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{n}} \tag{10}$$

where $y_{i}$ and $\hat{y}_{i}$ are the observed and predicted PQI of segment $i$, $\bar{y}$ is the mean of the observed PQI values, and $n$ is the total sample size.
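For reference, Equations (8)–(10) can be computed with a few lines of plain Python. The sketch below uses toy numbers rather than results from the study, and the function name is illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, MAE, and RMSE as defined in Equations (8)-(10)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)              # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(y - p) for y, p in zip(observed, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Toy values only: one of three predictions is off by 1.
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
print(metrics)
```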

4. Data Source

This study makes use of Waze reports and the pavement condition index from the Tennessee Department of Transportation [39]. Waze data are obtained from the TDOT through the “Waze for Cities” program with the Waze company, which is free to researchers. Waze data provide information including location (i.e., latitude and longitude), timestamp, street, report type (e.g., road hazard), and reliability of reports. In this study, the reports pertaining to pavement conditions, namely potholes and weather hazards, are exploited. Meanwhile, the TDOT evaluates the pavement condition annually for interstates. The raw data were collected by laser trucks, and they were deemed the ground truth in this study. The case study focuses on five backbone corridors in Nashville, TN, USA: I-24, I-40, I-65, I-440, and SR-155, as Figure 1 shows. The data for the entire year 2022 were collected for analysis. In total, 35,035 pothole reports and 1343 weather reports were collected for the abovementioned corridors. They were aggregated into 211 segments, with each segment being one mile in length.
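The filtering and aggregation step described above (keep reports with a reliability score greater than 5, then count them per one-mile segment) might look like the following Python sketch. The record layout, the `kind` labels, and the segment identifiers are hypothetical; they are not the actual Waze feed schema, and the sketch assumes each report has already been matched to a segment.

```python
from collections import Counter

MIN_RELIABILITY = 5  # the study keeps reports whose reliability score is greater than 5


def count_reliable_reports(reports, kind):
    """Count reports of one kind per segment, keeping only reliable ones."""
    return Counter(
        r["segment_id"]
        for r in reports
        if r["kind"] == kind and r["reliability"] > MIN_RELIABILITY
    )


# Made-up records with hypothetical field names.
reports = [
    {"segment_id": "seg_007", "kind": "pothole", "reliability": 8},
    {"segment_id": "seg_007", "kind": "pothole", "reliability": 5},  # dropped: not > 5
    {"segment_id": "seg_007", "kind": "weather", "reliability": 9},
    {"segment_id": "seg_031", "kind": "pothole", "reliability": 7},
]
pothole_counts = count_reliable_reports(reports, "pothole")
weather_counts = count_reliable_reports(reports, "weather")
print(pothole_counts, weather_counts)
```

The resulting per-segment counts play the role of the report counts that enter the PRD and WRD definitions.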

Furthermore, the built environment factors that can affect pavement surface conditions are prepared in line with previous studies. They include the following: (1) Weather conditions, such as temperature and radiation, which can cause expansion and contraction in pavement materials, leading to cracks and other types of distress. The weather data are collected from the GridMet dataset. (2) Pavement characteristics, such as age and pavement type, which are provided by the PMS. (3) Operational factors, especially large and heavy vehicles, which can cause rutting and cracking of pavement over time. These data are collected from the HPMS (Highway Performance Monitoring System).

Table 2 presents the summary statistics of the explanatory variables. The PRD has a mean of 2.16 and a standard deviation of 3.08. The relatively high standard deviation suggests a significant spatial variation in pothole report frequency, which is also indicated by Figure 1a. The west of I-40, the east of bypass SR-155, and the interchanges between I-24, I-65, and I-40 present a larger PRD than other areas, indicating worse pavement quality in these areas. The low mean and standard deviation of WRD suggest that weather reports are relatively rare but could have significant localized impacts when they do occur, as shown in Figure 1b. Further, daily weather data were collected from the GridMet database [43] and then aggregated to compute the annual mean and standard deviation for each weather indicator. The substantial mean and standard deviation observed in truck traffic (AADTT) highlight the spatial variability in heavy vehicle traffic, potentially contributing to the spatial variation in pavement damage. On average, the study pavement is 13.5 years old. The segmented routes consist of 192 bituminous (asphalt) pavements and 18 concrete pavements.

Furthermore, we performed a Pearson’s correlation test to explore the relationship between the PQI and the explanatory variables of interest. Table 3 indicates that PQI is negatively associated with PRD, and the association is statistically significant at the 95% confidence level: as PRD increases, the pavement quality tends to decrease. By contrast, WRD exhibits a weak positive correlation with PQI, though the relationship is not statistically significant according to the p-value. This suggests that, overall, there is no clear relationship between WRD and PQI. However, this does not necessarily imply the absence of localized associations. Lastly, as expected, pavement age is negatively associated with pavement condition, with a statistically significant p-value.
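The correlation coefficient behind such a test can be computed as in the short Python sketch below. It returns only Pearson's r (the significance test is omitted), and the PRD and PQI values are made up to show a negative association; they are not taken from Table 3.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up values: higher pothole report density paired with lower quality index.
toy_prd = [0.5, 1.0, 2.0, 4.0]
toy_pqi = [4.5, 4.3, 3.4, 2.6]
r = pearson_r(toy_prd, toy_pqi)
print(round(r, 3))
```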

5. Results

5.1. Bandwidth Selection

First, as the key parameter of the GWRF model, the bandwidth was optimized by iterating over candidate bandwidths and evaluating the MAE and RMSE of each. As shown in Figure 2, incorporating 60 neighbors for the local model fitting resulted in the minimum prediction error. Notably, as the bandwidth expands, the performance of the local model degrades, indicating that a local model is preferable to a global model for this application. A primary reason for this observation is that neighboring segments, which experience similar traffic and weather conditions, exert a greater impact on the target segment than do segments from more distant areas.
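The bandwidth search can be pictured as a loop over candidate neighbor counts, scoring each by prediction error. The Python sketch below illustrates only that loop. As a loudly stated simplification, the local model is a bi-square weighted mean of neighboring values, a stand-in for the local random forest, which is not implemented here. Leave-one-out scoring, selection by smallest RMSE, one-dimensional mile posts in place of network distance, and all data values are assumptions made for illustration.

```python
import math


def local_prediction(neighbors, k):
    """Bi-square weighted mean of the k nearest (distance, value) pairs.

    Stand-in for the local random forest described in the text.
    """
    nearest = sorted(neighbors)[:k]
    h = nearest[-1][0]  # adaptive bandwidth: distance to the k-th nearest neighbor
    weights = [(1.0 - (d / h) ** 2) ** 2 if 0 < h and d < h else 0.0
               for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:  # e.g. k == 1, or every neighbor tied at distance h
        return sum(y for _, y in nearest) / len(nearest)
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / total


def loo_errors(positions, values, k):
    """Leave-one-out (MAE, RMSE) when each local model uses k neighbors."""
    residuals = []
    for i, (p_i, y_i) in enumerate(zip(positions, values)):
        neighbors = [(abs(p_i - p_j), y_j)
                     for j, (p_j, y_j) in enumerate(zip(positions, values))
                     if j != i]
        residuals.append(y_i - local_prediction(neighbors, k))
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return mae, rmse


def select_bandwidth(positions, values, candidates):
    """Return (best_k, {k: (mae, rmse)}), choosing the smallest RMSE."""
    scores = {k: loo_errors(positions, values, k) for k in candidates}
    best_k = min(scores, key=lambda k: (scores[k][1], k))
    return best_k, scores


# Toy data along a single route: mile posts and made-up index values.
mileposts = [float(m) for m in range(12)]
toy_values = [4.5, 4.4, 4.2, 3.9, 3.1, 2.8, 2.9, 3.6, 4.0, 4.1, 4.3, 4.4]
best_k, scores = select_bandwidth(mileposts, toy_values, candidates=[2, 3, 5, 8])
print(best_k, scores[best_k])
```

Swapping `local_prediction` for a function that fits and queries a weighted random forest would turn this skeleton into the kind of search described in the text.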

5.2. Variable Importance

The local random forest model is implemented using the “ranger” package in R, which is also the foundation of the GWRF model. After tuning, the GWRF model is fitted with the optimized bandwidth. In addition, two parameters of the random forest are determined empirically. The mtry parameter is set to one-third of the total number of features, amounting to six, while the number of trees (ntree) is set to 500, according to previous research [40,44].

To capture the contribution of explanatory variables to the PQI values, the variable importance is calculated as the percent increase in mean squared error after permuting a variable $x_{i}$ in a local random forest model. Hence, the higher the incMSE, the more important the explanatory variable. It can be written as Equation (11):

$$incMSE(\%) = \frac{MSE_{oob}(RF(x_{i})) - MSE_{oob}(RF)}{MSE_{oob}(RF)} \tag{11}$$

where $MSE_{oob}$ is the mean squared error of out-of-bag samples, and $RF(x_{i})$ denotes the model evaluated with $x_{i}$ permuted.
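The idea behind Equation (11) can be illustrated with a generic permutation-importance routine. The Python sketch below is a simplification: it scores an arbitrary `predict` callable on a fixed evaluation set instead of using per-tree out-of-bag samples as a random forest implementation would, and it returns the ratio as written in Equation (11) (multiply by 100 for a percentage). The toy model and data are invented for illustration.

```python
import random


def mse(predict, rows, targets):
    """Mean squared error of `predict` over the given rows."""
    return sum((y - predict(r)) ** 2 for r, y in zip(rows, targets)) / len(rows)


def inc_mse(predict, rows, targets, column, rng):
    """Relative MSE increase after shuffling one feature column (Equation (11))."""
    baseline = mse(predict, rows, targets)
    if baseline == 0:
        raise ValueError("baseline MSE is zero; the ratio is undefined")
    shuffled = [r[column] for r in rows]
    rng.shuffle(shuffled)  # break the link between this feature and the target
    permuted_rows = [r[:column] + [v] + r[column + 1:]
                     for r, v in zip(rows, shuffled)]
    return (mse(predict, permuted_rows, targets) - baseline) / baseline


# Toy setup: the target depends on feature 0 only; feature 1 is irrelevant.
rows = [[float(i), float(10 - i)] for i in range(1, 9)]
targets = [2.0 * r[0] + (0.1 if i % 2 else -0.1) for i, r in enumerate(rows)]


def toy_model(row):
    """Hypothetical fitted model that uses only feature 0."""
    return 2.0 * row[0]


importance_used = inc_mse(toy_model, rows, targets, column=0, rng=random.Random(0))
importance_unused = inc_mse(toy_model, rows, targets, column=1, rng=random.Random(0))
print(importance_used, importance_unused)
```

Shuffling a feature the model ignores leaves the error unchanged, so its importance is zero, while shuffling a feature the model relies on cannot reduce the error in this toy setup.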

Figure 3 presents the results of variable importance for GWRF averaged from local sub-models. It can be found that the variables that are strongly associated with the pavement conditions are pavement AGE, PRD, and AADTT. The pavement AGE is the most influential factor, with a high importance score suggesting that the condition of the pavement deteriorates predictably over time. This is consistent with the understanding that material fatigue and exposure to elements gradually weaken pavement integrity. PRD serves as a direct indicator of surface distress, with its significant importance score indicating that areas with higher pothole reports are likely to have compromised pavement conditions. This underscores the value of crowdsourced reporting that can track overall pavement conditions. The truck volume is also a key variable influencing the pavement conditions, which can impose significant stress on pavement structures. In contrast, weather report density, along with other weather predictors, does not show a significant contribution to the PQI estimation. It is also possible that the effects of weather are more diffuse or that the model captures their influence indirectly through other variables like PRD, which could increase following weather-related damage.

5.3. Spatially Heterogeneous Association

Figure 4 presents the spatial distribution of the variable importance for the relevant predictors: AGE, PRD, AADTT, and WRD. The redder color indicates higher variable importance while the greener color suggests lower importance. Notably, the uneven distribution of segment color highlights the evident spatial association of PQI values. For example, as depicted in Figure 4a, the interchange of I-40 and I-24 is an area where pavement age is identified as having a relatively high association with PQI. It is plausible that freeway interchanges were constructed earlier in the infrastructure development timeline. Additionally, the complex nature and heavy usage of these interchanges can make them more challenging to manage and maintain effectively. The significant contribution of Pothole Report Density across nearly all areas in Figure 4b suggests that the surrogate measure derived from Waze reports is a critical indicator for assessing pavement conditions. This implies that user-generated reports of potholes are not only frequent and widespread but also closely correlated with the actual conditions of the pavement. Such data can enable transportation agencies to prioritize maintenance and repair work based on where users are most frequently reporting issues. It also underscores the potential of integrating user-generated data into traditional pavement management systems to enhance the responsiveness and accuracy of pavement quality assessments. Figure 4c illustrates the effect of truck traffic volume on pavement quality. It reveals that areas with heavy vehicle flow, particularly freeway junctions like those at I-440 and I-24, I-65 and SR-155, as well as I-440 and I-65, along with the peripheral city areas, are more susceptible to deterioration due to the frequent heavy vehicles. Although WRD generates relatively lower importance to PQI values than the abovementioned predictors, WRD can still provide valuable information on a relative scale. 
For instance, as Figure 4d shows, the variable importance of WRD at the I-440 and I-65 junction area indicates a stronger association with PQI values than in other areas, which suggests that snow, ice, and other adverse weather hazards should be promptly cleared from these roadways after users report them.

5.4. Model Performance

Figure 5 shows the spatial distribution of R² of local random forest models. The visualization indicates that most segments exhibit an R² exceeding 0.3. Notably, central urban areas are achieving R² values higher than 0.5, suggesting that the models are particularly effective at explaining the variability in pavement quality within these densely populated areas. Table 4 summarizes the comparative performance of the global random forest model and the geographically weighted random forest model. The results indicate that the GWRF model substantially surpasses the global RF model, as evidenced by considerably lower MAE and RMSE values, alongside a notably higher goodness of fit (i.e., R²). The findings indicate that by leveraging readily accessible data sources, such as crowdsourced pothole reports, information on pavement age, and truck traffic volumes, in conjunction with the GWRF model, we can achieve accurate predictions of the Pavement Quality Index.

6. Discussion

Pavement conditions are susceptible to heavy traffic loads and adverse weather, thereby exhibiting seasonal patterns. Current pavement quality data collection methods such as laser trucks and video detection could be either cost- or labor-prohibitive if frequently employed across the entire network. Hence, a more cost-effective approach becomes essential, particularly one that leverages emerging technologies like mobile data collection platforms, which can provide continuous, real-time monitoring at a significantly lower cost. In this regard, Waze collects spatiotemporal traffic incidents, along with weather and pothole information from riders. Existing studies have highlighted its significant potential for early reporting and extensive coverage of traffic crashes, disabled vehicles, congestion, and flooding. However, the potential of Waze pothole reports in evaluating pavement conditions is still not well explored.

This study presents a framework for pavement condition evaluation using crowdsourced reports from the navigation app Waze. Five backbone corridors in Nashville, Tennessee were used to illustrate the potential of crowdsourced reports. Using Waze pothole and weather reports, we established two surrogate performance measures, PRD and WRD, respectively. We compared them with the official overall pavement evaluation index (i.e., PQI) through a geographically weighted random forest model. As indicated by the variable importance of local RF models, the PRD is the second most important variable in relation to PQI, after pavement age. This finding suggests that PRD could well represent the pavement’s overall condition among all other relevant factors. Additionally, the GWRF model reveals that highway interchange areas are the places where PRD significantly correlates with the PQI, which suggests those areas should be promptly treated. In contrast, the contribution of weather reports to pavement condition evaluation is quite subtle, likely due to the infrequency of these reports.

This study also has some limitations. First, although people might report potholes whenever and wherever they experience an uncomfortable ride, some reports might refer only to actual potholes, so other pavement distress such as cracking might be overlooked. Therefore, using aggregated pothole reports to evaluate overall pavement conditions might be biased if not supported by sufficient report data. Second, although crowdsourced data have extensive spatiotemporal coverage, redundancy and reliability remain concerns. Normalizing the reports by traffic exposure and segment length mitigates these flaws but does not eliminate them, as Waze user penetration might vary over space. Hence, in the future, the impact of heterogeneous user penetration on model performance should be investigated. Third, the study used reports collected on freeways, where a large amount of traffic can ensure a good number of reports. For local streets and arterial routes, the report frequency might be an issue due to lower traffic volumes. Hence, future studies could also connect local street pothole reports with pavement conditions. Finally, this study did not consider the temporal variation of pavement conditions because of the coarse temporal granularity of the ground truth data, but it is worthwhile examining the pavement condition at a monthly level, which would be helpful for maintenance.

Crowdsourced pavement evaluation might suffer from limitations due to the nature of crowdsensing. Nonetheless, the findings of this study suggest that aggregated pothole reports can be incorporated into overall pavement condition evaluation. In particular, when pavement data are not available, the surrogate metrics could help transportation agencies prioritize maintenance. In addition, the spatially varying importance of the surrogate pavement performance measures could offer insights into localized solutions for pavement maintenance.

7. Conclusions

Pavement condition evaluation is critical to pavement management and maintenance. We created a surrogate metric for assessing pavement conditions by utilizing crowdsourced pothole reports obtained from the Waze navigation app. Recognizing that pothole reports may reflect broader pavement distress due to their impact on ride comfort, we normalized these reports by segment length and traffic volume to establish a Pothole Report Density. We then correlated this metric with the official Pavement Quality Index. Incorporating additional factors that might influence pavement quality, we applied a GWRF model to elucidate the relationship between PQI and PRD. The GWRF outperforms the global RF considerably in terms of goodness of fit (R² of 0.98 versus 0.59; Table 4), and it uncovers the spatially heterogeneous association between PQI and its factors. The average variable importance indicates that PRD is the second most important factor associated with PQI. These findings suggest that PRD is a viable indicator of overall pavement condition, as it encapsulates the frequency of distress signals that affect ride quality. The practical implications of these findings are significant, particularly considering the cost-effective nature of crowdsourced data from Waze reports, which are both freely available and extensive in coverage. This approach facilitates a wide-reaching and economically efficient evaluation of pavement conditions. Moreover, the real-time nature of reports from Waze users adds a dynamic dimension to pavement monitoring. Because users report ride discomfort as soon as they experience it, transportation agencies have the potential to increase the frequency and timeliness of pavement evaluations.
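As a rough illustration of the normalization described above, the sketch below computes a report density per segment. The exact functional form, the units (kilometres, daily vehicles), and the scaling constant are assumptions made for this example rather than the study's definition, and the names `Segment` and `pothole_report_density` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    segment_id: str
    pothole_reports: int  # Waze pothole reports matched to the segment
    length_km: float      # segment length (unit assumed for illustration)
    daily_traffic: float  # traffic exposure, vehicles per day (assumed measure)

def pothole_report_density(seg: Segment, per_vehicles: float = 1000.0) -> float:
    """Reports per km per `per_vehicles` daily vehicles.

    This normalization is an assumed form: reports are divided by segment
    length and by traffic exposure so that long or busy segments are not
    ranked as worse merely because they attract more reports.
    """
    if seg.length_km <= 0 or seg.daily_traffic <= 0:
        raise ValueError("length and traffic exposure must be positive")
    exposure = seg.length_km * seg.daily_traffic / per_vehicles
    return seg.pothole_reports / exposure

# Toy segments: same number of reports, very different exposure.
quiet = Segment("A", pothole_reports=12, length_km=2.0, daily_traffic=3000.0)
busy = Segment("B", pothole_reports=12, length_km=2.0, daily_traffic=30000.0)
prd_quiet = pothole_report_density(quiet)
prd_busy = pothole_report_density(busy)
```

With equal report counts, the quieter segment receives the higher density, which is the intended effect of normalizing by exposure.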

The crowdsourced data employed in this study, that is, Waze reports, have several advantages over existing crowdsensing tools. Although traditional techniques (e.g., video, accelerometers) are increasingly deployed in a crowdsensing manner, their inherent limitations cannot be eliminated. Crowdsensing can extend the coverage of pavement evaluation, but the cost of employing many sensors (e.g., lasers, accelerometers, and cameras), as well as the computational demand, also increases drastically. By contrast, a large number of Waze users are active on the road network every day, and they can easily report potholes through the app. The report information is shared in real time and is free to non-profit organizations. Hence, the population of Waze users can ensure good spatiotemporal coverage of road pavement. Additionally, some transportation agencies offer online reporting tools for citizens to report pavement distress. Like Waze reports, these tools are powered by citizens' perceptions and knowledge of events. However, they are typically limited by the quality of citizens' descriptions of pavement distress, especially its location. Nonetheless, future studies could combine other crowdsourced pavement information with Waze reports to increase the accuracy of pavement condition evaluation.

In situations where pavement quality data are scarce, particularly in Tennessee, where such data are primarily collected via laser trucks, Waze data, alongside other accessible data such as traffic volume and road geometry, can help estimate pavement conditions. A more promising application of this study is to increase the frequency of pavement condition evaluation, since a massive number of Waze users serve as moving sensors on the road network every day. Pavement crews can promptly respond to areas with large PRD values, thereby mitigating further surface deterioration. Compared to existing ways of measuring pavement quality, these surrogate measures allow transportation agencies and DOTs to evaluate pavement at a low cost whenever pavement data are not available.

Author Contributions

Conceptualization, Y.G. and M.K.; methodology, Y.G.; validation, Y.G., M.K. and L.D.H.; formal analysis, Y.G. and M.K.; resources, Y.G., X.J. and L.D.H.; data curation, Y.G. and X.J.; writing—review and editing, Y.G. and M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Restrictions apply to the availability of these data. Waze crowdsourced data were obtained from Waze with the permission of the TDOT. RDS data were obtained from the TDOT.

Acknowledgments

Special thanks to the TDOT and the Waze for Cities (WFC) program for providing data used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Hadjidemetriou, G.M.; Christodoulou, S.E. Vision- and entropy-based detection of distressed areas for integrated pavement condition assessment. J. Comput. Civ. Eng. 2019, 33, 04019020.
2. Li, Y.; Liu, C.; Ding, L. Impact of pavement conditions on crash severity. Accid. Anal. Prev. 2013, 59, 399–406.
3. Gu, Y.; Liu, Y.; Liu, D.; Han, L.D.; Jia, X. Spatiotemporal kernel density clustering for wide area near Real-Time pothole detection. Adv. Eng. Inform. 2024, 60, 102351.
4. Liu, Y.; Hoseinzadeh, N.; Gu, Y.; Han, L.D.; Brakewood, C.; Zhang, Z. Evaluating the coverage and spatiotemporal accuracy of crowdsourced reports over time: A case study of Waze event reports in Tennessee. Transp. Res. Rec. 2024, 2678, 468–481.
5. Amin-Naseri, M.; Chakraborty, P.; Sharma, A.; Gilbert, S.B.; Hong, M. Evaluating the reliability, coverage, and added value of crowdsourced traffic incident reports from Waze. Transp. Res. Rec. 2018, 2672, 34–43.
6. Khojastehpour, M.; Sahebi, S.; Samimi, A. Public acceptance of a crowdsourcing platform for traffic enforcement. Case Stud. Transp. Policy 2022, 10, 2012–2024.
7. Kanhere, S.S. Participatory sensing: Crowdsourcing data from mobile smartphones in urban spaces. In Distributed Computing and Internet Technology, Proceedings of the 9th International Conference, ICDCIT 2013, Bhubaneswar, India, 5–8 February 2013; Springer: Berlin/Heidelberg, Germany, 2013.
8. Chatzimilioudis, G.; Konstantinidis, A.; Laoudias, C.; Zeinalipour-Yazti, D. Crowdsourcing with smartphones. IEEE Internet Comput. 2012, 16, 36–44.
9. Hoseinzadeh, N.; Gu, Y.; Han, L.D.; Brakewood, C.; Freeze, P.B. Estimating freeway level-of-service using crowdsourced data. Informatics 2021, 8, 17.
10. Li, X.; Dadashova, B.; Yu, S.; Zhang, Z. Rethinking highway safety analysis by leveraging crowdsourced waze data. Sustainability 2020, 12, 10127.
11. Chen, K.; Lu, M.; Fan, X.; Wei, M.; Wu, J. Road condition monitoring using on-board three-axis accelerometer and GPS sensor. In Proceedings of the 2011 6th International ICST Conference on Communications and Networking in China (CHINACOM), Harbin, China, 17–19 August 2011.
12. Staniek, M. Road pavement condition diagnostics using smartphone-based data crowdsourcing in smart cities. J. Traffic Transp. Eng. (Engl. Ed.) 2021, 8, 554–567.
13. Chuang, T.-Y.; Perng, N.-H.; Han, J.-Y. Pavement performance monitoring and anomaly recognition based on crowdsourcing spatiotemporal data. Autom. Constr. 2019, 106, 102882.
14. Lima, L.C.; Amorim, V.J.P.; Pereira, I.M.; Ribeiro, F.N.; Oliveira, R.A.R. Using crowdsourcing techniques and mobile devices for asphaltic pavement quality recognition. In Proceedings of the 2016 VI Brazilian Symposium on Computing Systems Engineering (SBESC), João Pessoa, Brazil, 1–4 November 2016.
15. Chua, K.M.; Xu, L. Simple procedure for identifying pavement distresses from video images. J. Transp. Eng. 1994, 120, 412–431.
16. Qureshi, W.S.; Hassan, S.I.; McKeever, S.; Power, D.; Mulry, B.; Feighan, K.; O’Sullivan, D. An exploration of recent intelligent image analysis techniques for visual pavement surface condition assessment. Sensors 2022, 22, 9019.
17. Astor, Y.; Nabesima, Y.; Utami, R.; Sihombing, A.V.R.; Adli, M.; Firdaus, M.R. Unmanned aerial vehicle implementation for pavement condition survey. Transp. Eng. 2023, 12, 100168.
18. Zhao, Y.; Zhou, L.; Wang, X.; Wang, F.; Shi, G. Highway Crack Detection and Classification Using UAV Remote Sensing Images Based on CrackNet and CrackClassification. Appl. Sci. 2023, 13, 7269.
19. Cardenal, J.; Fernández, T.; Pérez-García, J.L.; Gómez-López, J.M. Measurement of road surface deformation using images captured from UAVs. Remote Sens. 2019, 11, 1507.
20. Han, S.; Chung, I.-H.; Jiang, Y.; Uwakweh, B. PCIer: Pavement Condition Evaluation Using Aerial Imagery and Deep Learning. Geographies 2023, 3, 132–142.
21. Jiang, Y.; Han, S.; Bai, Y. Development of a pavement evaluation tool using aerial imagery and deep learning. J. Transp. Eng. Part B Pavements 2021, 147, 04021027.
22. Salameh, R.; Tsai, Y. Adoption of 3D Laser Imaging Systems for Automated Pavement Condition Assessment in the United States: Challenges and Opportunities. Airfield Highw. Pavements 2021, 2021, 219–230.
23. Mu, X.; Li, L.; Tang, N.; Ce, L.; Jiang, X. Laser-based system for highway pavement texture measurement. In Proceedings of the 2003 IEEE International Conference on Intelligent Transportation Systems, Shanghai, China, 12–15 October 2003.
24. Hildebrand, G.; Rasmussen, S.; Andrés, R. Development of a laser-based high speed deflectograph. In Nondestructive Testing of Pavements and Backcalculation of Moduli: Third Volume; ASTM International: West Conshohocken, PA, USA, 2000.
25. FixMyStreet. 2024. Available online: https://www.fixmystreet.com/ (accessed on 15 March 2024).
26. SeeClickFix. 2024. Available online: https://crm.seeclickfix.com/ (accessed on 15 March 2024).
27. Waze. Waze Statistics and User Count. 2023. Available online: https://expandedramblings.com/index.php/waze-statistics-facts/ (accessed on 3 June 2024).
28. Yi, C.-W.; Chuang, Y.-T.; Nian, C.-S. Toward crowdsourcing-based road pavement monitoring by mobile sensing technologies. IEEE Trans. Intell. Transp. Syst. 2015, 16, 1905–1917.
29. Radopoulou, S.C.; Brilakis, I.; Doycheva, K.; Koch, C. A framework for automated pavement condition monitoring. In Proceedings of the Construction Research Congress 2016, San Juan, Puerto Rico, 31 May–2 June 2016.
30. Inzerillo, L.; Acuto, F.; Di Mino, G.; Uddin, M.Z. Super-resolution images methodology applied to UAV datasets to road pavement monitoring. Drones 2022, 6, 171.
31. Tsai, Y.-C.J.; Li, F. Critical assessment of detecting asphalt pavement cracks under different lighting and low intensity contrast conditions using emerging 3D laser technology. J. Transp. Eng. 2012, 138, 649–656.
32. Yu, X.; Salari, E. Pavement pothole detection and severity measurement using laser imaging. In Proceedings of the 2011 IEEE International Conference on Electro/Information Technology, Mankato, MN, USA, 15–17 May 2011.
33. Dennis, E.P.; Hong, Q.; Wallace, R.; Tansil, W.; Smith, M. Pavement condition monitoring with crowdsourced connected vehicle data. Transp. Res. Rec. 2014, 2460, 31–38.
34. Zhang, Z.; Zhang, H.; Xu, S.; Lv, W. Pavement roughness evaluation method based on the theoretical relationship between acceleration measured by smartphone and IRI. Int. J. Pavement Eng. 2022, 23, 3082–3098.
35. Jia, X.; Woods, M.; Gong, H.; Zhu, D.; Hu, W.; Huang, B. Evaluation of influence of pavement data on measurement of deflection on asphalt surfaced pavements utilizing traffic speed deflection device. Constr. Build. Mater. 2021, 270, 121842.
36. Jia, X.; Woods, M.; Zhu, D.; Huang, B. Incorporation of National Pavement Performance Measures Into Decision-Making Process. Transp. Res. Rec. 2023, 2677, 176–188.
37. Reza, F.; Boriboonsomsin, K.; Bazlamit, S. Development of a pavement quality index for the state of Ohio. In Proceedings of the 85th Annual Meeting of The Transportation Research Board, Washington, DC, USA, 22–26 January 2006.
38. Gong, M.; Zhang, H.; Liu, Z.; Fu, X. Study on PQI standard for comprehensive maintenance of asphalt pavement based on full-cycle. Int. J. Pavement Eng. 2022, 23, 4277–4290.
39. TDOT. Pavement Management 2022 Data; Tennessee Department of Transportation, Ed.; TDOT: Nashville, TN, USA, 2022.
40. Gu, Y.; Liu, D.; Arvin, R.; Khattak, A.J.; Han, L.D. Predicting intersection crash frequency using connected vehicle data: A framework for geographical random forest. Accid. Anal. Prev. 2023, 179, 106880.
41. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
42. Gu, Y.; Zhang, H.; Han, L.D.; Khattak, A. Modeling spatiotemporal heterogeneity in interval-censored traffic incident time to normal flow by leveraging crowdsourced data: A geographically and temporally weighted proportional hazard analysis. Accid. Anal. Prev. 2024, 195, 107406.
43. Abatzoglou, J.T. Development of gridded surface meteorological data for ecological applications and modelling. Int. J. Climatol. 2013, 33, 121–131.
44. Wu, D.; Zhang, Y.; Xiang, Q. Geographically weighted random forests for macro-level crash frequency prediction. Accid. Anal. Prev. 2024, 194, 107370.

Figure 1. Spatial distribution of surrogate measures, (a) pothole report density, and (b) weather report density.

Figure 2. MAE and RMSE of GWRF with different bandwidths.

Figure 3. Average variable importance %incMSE.

Figure 4. Spatial distribution of variable importance of (a) pavement AGE; (b) PRD; (c) AADTT; and (d) WRD.

Figure 5. R² of local random forest models.


Table 1. Representative studies about pavement condition evaluation.

Data Category	Reference	Data Acquisition	Performance Metrics	Limitations
Accelerometers	Chen, Lu [11], Staniek [12], Chuang, Perng [13], Lima, Amorim [14], Yi, Chuang [28]	Smartphone sensors	Overall condition, vehicle oscillations, roughness	Vehicle-mounted, limited to vehicles, and routing
Video (image)	Chua and Xu [15], Hadjidemetriou and Christodoulou [1], Qureshi, Hassan [16], Radopoulou, Brilakis [29], Inzerillo, Acuto [30], Astor, Nabesima [17], Zhao, Zhou [18], Cardenal, Fernández [19], Han, Chung [20], Jiang, Han [21]	Cameras, Unmanned Aerial Vehicles (UAVs)	Pavement cracking, rutting, roughness	Infrequent scanning, low spatial coverage, computationally and storage demanding
Laser	Salameh and Tsai [22], Mu, Li [23], Hildebrand, Rasmussen [24], Tsai and Li [31], Yu and Salari [32]	Laser trucks	Pavement texture, smoothness, roughness, rut, damage	Expensive, infrequent scanning, low spatial coverage, storage and computationally demanding, labor-intensive

Table 2. Descriptive statistics of variables (N = 210).

Categories	Variables	Mean	Std	Full Name
Outcome	PQI	3.85	0.52	Pavement Quality Index
Surrogate measures	PRD	2.16	3.08	Pothole Report Density
Surrogate measures	WRD	0.07	0.08	Weather Report Density
Weather	std.pr (mm)	86.26	5.85	standard deviation of daily precipitation
	mean.pr	36.48	1.91	average daily precipitation
	std.rmax (%)	125.87	5.11	standard deviation of daily maximum relative humidity
	mean.rmax	865.13	13.53	average daily maximum relative humidity
	std.rmin	137.79	3.39	standard deviation of daily minimum relative humidity
	mean.rmin	435.62	8.90	average daily minimum relative humidity
	std.tmmn (°C)	9.83	0.10	standard deviation of daily minimum near-surface air temperature
	mean.tmmn	9.08	0.28	average daily minimum near-surface air temperature
	std.tmmx	10.08	0.08	standard deviation of daily maximum near-surface air temperature
	mean.tmmx	21.60	0.26	average daily maximum near-surface air temperature
Operational factors	AADTT	15073.71	8784.60	annual average daily truck traffic
Pavement characteristics	AGE	13.53	9.72	pavement age
	thrulanes	3.00	2.05	number of through lanes
	Surface	BIT = 192	Con = 18	surface type (BIT = bituminous, Con = concrete)

Table 3. Pearson’s correlation test between PQI and explanatory variables.

Variables	Coefficient	p-Value
PQI and PRD	−0.3798	<0.0001
PQI and WRD	0.0725	0.2954
PQI and AGE	−0.2300	0.0008
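For readers who wish to reproduce this kind of test, the sketch below computes Pearson's correlation coefficient and the associated t statistic from first principles. The helper names `pearson_r` and `t_statistic` are hypothetical, and the sample size of 210 used in the example is taken from Table 2 on the assumption that the same segments underlie Table 3.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return s_xy / math.sqrt(s_xx * s_yy)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Reported coefficient for PQI and PRD, with the assumed N = 210.
t_prd = t_statistic(-0.3798, 210)
```

Under this assumption the PQI and PRD coefficient corresponds to a t statistic of about −5.9 on 208 degrees of freedom, in line with the very small p-value reported in the table.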

Table 4. Model Performance.

Performance	Global RF	GWRF
R²	0.59	0.98
MAE	0.24	0.04
RMSE	0.34	0.07
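The three performance measures can be computed as in the following sketch, which assumes their standard definitions (coefficient of determination, mean absolute error, and root mean squared error). The function name `regression_metrics` and the toy values are illustrative only and do not come from the study's data.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R², MAE, and RMSE for observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when observations are constant")
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "mae": mae, "rmse": rmse}

# Toy PQI values for three segments (not from the study).
observed = [3.0, 4.0, 5.0]
predicted = [3.0, 4.0, 4.0]
metrics = regression_metrics(observed, predicted)
```

A lower MAE and RMSE together with a higher R² indicate a closer fit, which is the pattern Table 4 reports for the GWRF relative to the global RF.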

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

