Article

Intelligent Image-Based Railway Inspection System Using Deep Learning-Based Object Detection and Weber Contrast-Based Image Comparison

Jinbeum Jang, Minwoo Shin, Sohee Lim, Jonggook Park, Joungyeon Kim and Joonki Paik
1 Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang University, Seoul 06974, Korea
2 2iSYS Co. Ltd., Uiwang 15850, Gyeonggi-do, Korea
* Author to whom correspondence should be addressed.
Sensors 2019, 19(21), 4738; https://doi.org/10.3390/s19214738
Submission received: 13 October 2019 / Revised: 26 October 2019 / Accepted: 30 October 2019 / Published: 31 October 2019
(This article belongs to the Special Issue Smart City and Smart Infrastructure)

Abstract

For sustainable operation and maintenance of urban railway infrastructure, intelligent visual inspection attracts increasing attention as an alternative to unreliable manual observation, which must be performed by humans at night while trains are not operating. Although various automatic approaches using image processing and computer vision techniques have been proposed, most of them focus only on railway tracks. In this paper, we present a novel railway inspection system that combines facility detection based on a deep convolutional neural network with a computer vision-based image comparison approach. The proposed system automatically detects wear and cracks by comparing pairs of corresponding image sets acquired at different times. We installed a line scan camera on the roof of the train; unlike an area-based camera, it quickly acquires images with a wide field of view. The proposed system consists of three main modules: (i) image reconstruction for registration of facility positions, (ii) facility detection using an improved single shot detector, and (iii) deformed region detection using image processing and computer vision techniques. Experiments demonstrate that the proposed system accurately locates facilities and detects their potential defects. The proposed system can therefore provide various advantages, such as reduced maintenance cost and accident prevention.

1. Introduction

Image processing and computer vision algorithms are applied to inspect defects in railways for safety and maintenance; such a system is called an image-based railway inspection system (IRIS). An IRIS automatically detects surface cracks or defects from RGB or gray-scale images to prevent accidents. In various industrial fields, existing inspection systems use simple image processing techniques, such as feature extraction, histogram analysis, and graph cuts, to name a few [1,2]. As in image classification and analysis, deep learning enables more accurate detection of defects in general inspection applications [3,4]. However, it is difficult to inspect structures and facilities in a railway environment because of the enormous inspection region.
To inspect a railway, it is important to automatically detect various structures and facilities, such as the screws connecting railway tracks and sleepers, and the catenary masts supporting power cables. Tracks are usually damaged by friction between their surfaces and the wheels. Electrification systems, such as overhead power lines, are also inspected periodically since they must stably supply electric power to the vehicle. However, conventional inspection has long depended on manual human observation, which is very inefficient since a human observer tires easily and loses concentration on important objects after ten to fifteen minutes. Furthermore, only a very small time slot is available for human inspection since trains usually operate all day long.
To solve those problems, various image processing and computer vision-based automatic inspection approaches were proposed [5]. Min et al. detected defects on a track surface by combining several image processing algorithms [6]: their system first finds tracks by analyzing features in the hue channel, and then detects defects using image enhancement and contour-based surface profiling. Karakose et al. also proposed a rail track detection and fault diagnosis method using Canny edge detection and morphological processing [7]. Han et al. detected insulator defects for catenary maintenance in high-speed railroads [8]. Zhang et al. detected cracks in the wall of a subway tunnel using morphological processing and a simple classification method [9]; their method used a line scan camera for high-resolution image acquisition. Attard et al. presented a tunnel maintenance system that compares two images acquired at the same spot [10]. Jang et al. proposed a deformation detection system to inspect overhead rigid lines [11]. Recently, deep learning-based railway inspection systems were proposed. James et al. performed segmentation and classification of railroad tracks using a series of deep neural networks [12] based on U-Net [13] and the dense network [14]. Gibert et al. also proposed an object detection method using a multi-task learning approach for railroad track inspection [15]. Their method consists of material and fastener detection branches that share the same parameters; the fastener branch has another shared network and two task-specific network structures.
To inspect various facilities, railroad inspection cars (RICs) were developed [16,17,18]. RICs detect various types of defects, such as damage and wear of facilities, using surface profiling, image analysis, and stability tests. Although they are efficient for maintaining railroad facilities installed across whole operating sections, their operation is still limited to times outside the transport service hours. Moreover, it is difficult to deal with repeatedly installed structures in the railway environment since most RICs inspect them using two-dimensional (2D) images acquired while the car moves forward at a constant speed.
This paper presents a novel railway facility inspection system for efficient maintenance of urban railroad infrastructure, with a special focus on overhead rigid lines in subway tunnels. The proposed system acquires images of facilities using a gray-scale line scan camera, which generates images with a wider field of view (FoV) and higher resolution than a general RGB image sensor [19]. The proposed inspection system finds defects by detecting deformed regions through the comparison of two images acquired at different times. More specifically, the system first performs coarse registration of the two images to align the positions of the same facilities, and then finds the main facilities using deep learning-based object detection. For the detection task, we use the single-shot multibox detector (SSD) [20] with modifications that improve both detection accuracy and speed. The proposed system then detects deformed regions using image processing techniques: after finely registering two facility images, we obtain the remarkable deformation regions using Weber contrast-based image subtraction.
This paper is organized as follows. Section 2 describes the image acquisition process using line scan camera, and we present the proposed inspection system in Section 3. In Section 4, the performance of the proposed system is experimentally demonstrated, and Section 5 concludes the paper.

2. Image Acquisition Using Line Scan Camera

In this paper, we inspect railroad structures and facilities installed in an underground tunnel. Railroad vehicles are driven by electric power supplied from overhead lines to the pantograph; the lines are installed repeatedly along the entire driving route, together with supporters fitted to the wall. Images acquired using a general 2D area-based camera with a narrow FoV cannot capture the overall shape, since a camera installed on the vehicle is too close to the facilities. Although video frame-based stitching methods could alternatively reconstruct the entire shape of an object, they are not suitable for real-time stitching on a fast-moving vehicle.
The proposed system acquires railway tunnel images using a line scan camera, which is widely applied in areas such as remote sensing and the manufacturing industry. The camera sensor consists of a 1 × N pixel array and generates a 2D image by collecting scan lines in temporal order of acquisition. When an object moves at a constant speed during a specific period, the camera system acquires a high-resolution image without stitching and registration processes.
Figure 1a shows a front view of the vehicle in an underground tunnel. A line scan camera is installed on top of the railroad vehicle, oriented orthogonally to the vehicle's moving direction. As shown in Figure 1b, the camera sensor scans narrow horizontal sections and stores each scanned result as a one-dimensional (1D) signal. Each line has a resolution of 2 × 4096 pixels with a pixel size of 1.2 mm². A lighting system is mounted next to the camera to avoid low-light image degradation. Since the train moves at a variable speed, the acquired image is distorted if the line scan camera captures at a constant acquisition interval. For that reason, a tachometer is installed on the train wheel, and images are acquired by controlling the line rate based on the distance traveled in a given time, as sketched below.
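As a worked example of this distance-based triggering, the required line rate is simply the vehicle speed divided by the pixel pitch along the track. The sketch below assumes the stated 1.2 mm² pixel size corresponds to a roughly 1.2 mm footprint along the track; the constant and function name are ours, not from the original implementation.

```python
PIXEL_PITCH_M = 0.0012  # assumed 1.2 mm pixel footprint along the track


def line_trigger_rate(speed_mps: float) -> float:
    """Lines per second so that each scan line advances one pixel pitch."""
    return speed_mps / PIXEL_PITCH_M


# e.g., at 40 km/h the camera must be triggered at about 9259 lines/s
print(line_trigger_rate(40 / 3.6))  # ~9259.3
```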
Figure 2 shows images of the overhead conductors of subway tunnels acquired using the line scan camera. Each image has a size of 16,834 × 2048 pixels, obtained by rotating the scan by 90° after 2-pixel binning. Since most structures and facilities are within the camera's depth of field, we can acquire high-quality images despite some degradation, such as brightness saturation and noise caused by airborne dust.

3. Facility Inspection Algorithm

3.1. Overview

The proposed system automatically inspects structural defects in the railway environment. Most facilities installed in the tunnel, such as overhead rigid conductors, supply electric power to the train through its pantograph. Conventional inspection systems detect all candidate wear and cracks using single image-based processing methods, which provide acceptable detection accuracy for small-FoV images. However, it is difficult to distinguish whether a detected region is a real defect when the image contains complicated frequency components or a complex background. Moreover, single image-based systems cannot detect defects caused by shape deformation or loss of components, since most overhead lines and supporters consist of durable metal materials, unlike tunnel walls.
To solve these problems, the proposed system inspects structures and facilities related to the overhead lines using pairs of images, as shown in Figure 3. The image sets are acquired using the line scan camera at the same spots but at different times. We assume that the reference image set $G^b = \{g_0^b(x,y), g_1^b(x,y), \ldots, g_{n-1}^b(x,y)\}$ is acquired before the target image set $G^a = \{g_0^a(x,y), g_1^a(x,y), \ldots, g_{n-1}^a(x,y)\}$, and that it contains no defects such as deformation or loss, based on human inspection. The target image set is the one to be inspected. The main objective of the proposed system is to detect deformed regions for the maintenance of overhead conductors by comparing two images acquired at different times. We exclude cracks on the tunnel wall as an inspection subject since they are easily extracted by single image-based inspection systems.
The proposed system consists of three functional steps: (i) image reconstruction using registration based on phase correlation and image composition, (ii) facility detection using deep learning-based object detection, and (iii) facility inspection using image comparison approach based on Weber contrast. In this section, we describe each step of the proposed system in the following subsections.

3.2. Image Reconstruction

Given a pair of reference and target images, the proposed system first reconstructs each image. As shown in Figure 3, the positions of corresponding facilities in the same driving section are not initially aligned because of various problems, such as speed differences and camera jitter. In addition, parts of facilities are often split across two neighboring frames during image acquisition.
The proposed system registers two images using phase correlation. More specifically, the disparity or motion vector between the two images is estimated by computing correlation in the frequency domain. This is more efficient for coarsely registering two large-scale images than spatial-domain motion estimation methods, because it requires only simple multiplications after the fast Fourier transform (FFT). The motion vector $(\Delta x, \Delta y)$ that maximizes the phase correlation is defined as

$(\Delta x, \Delta y) = \arg\max_{(x,y)} \mathcal{F}^{-1}\left\{ \dfrac{\mathcal{F}\{g_i^b(x,y)\} \cdot \mathcal{F}^*\{g_i^a(x,y)\}}{\left|\mathcal{F}\{g_i^b(x,y)\} \cdot \mathcal{F}^*\{g_i^a(x,y)\}\right|} \right\}, \quad (1)$

where $g_i^b(x,y)$ and $g_i^a(x,y)$ represent the $i$-th frames acquired without temporal synchronization, and $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the Fourier transform and its inverse, respectively. The superscript '$*$' indicates the complex conjugate and '$\cdot$' a pixel-by-pixel multiplication. In the proposed method, we translate $g_i^b(x,y)$ by the horizontal motion value $\Delta x$ to avoid deforming $g_i^a(x,y)$, in which we must inspect facilities. The positions of facilities are coarsely aligned by translating $g_i^b(x,y)$ using phase correlation, as shown in Figure 4a.
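For illustration, OpenCV's phaseCorrelate implements exactly this normalized cross-power-spectrum maximization. The minimal sketch below applies only the horizontal component of the estimated shift, as the proposed method does; the function names are ours, and the sign convention of the returned shift should be verified against the OpenCV version in use.

```python
import cv2
import numpy as np


def coarse_register(g_b, g_a):
    """Translate the reference frame g_b horizontally onto the target g_a.

    Both inputs are single-channel grayscale frames. phaseCorrelate
    evaluates Equation (1) via FFTs and returns a sub-pixel (dx, dy).
    """
    (dx, _dy), _response = cv2.phaseCorrelate(np.float32(g_b),
                                              np.float32(g_a))
    # Apply only the horizontal shift so the target image, which is the
    # one actually inspected, is never deformed.
    m = np.float32([[1, 0, dx], [0, 1, 0]])
    return cv2.warpAffine(g_b, m, (g_b.shape[1], g_b.shape[0]))
```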
Once $g_i^b(x,y)$ is translated, the left and right parts of the image are affected. When the motion value is negative, the translated version of $g_i^b(x,y)$ has an empty space on the left side, while the right-side region and its intensity values are lost, as shown in Figure 4a. To fill the empty space, the proposed system reconstructs the image by attaching parts of the neighboring frame, as shown in Figure 4a,b. We then generate the final reconstructed images $\tilde{g}_i^b(x,y)$ and $\tilde{g}_i^a(x,y)$ by attaching appropriate regions of the neighboring images onto $g_i^b(x,y)$ and $g_i^a(x,y)$, since the left-side facility of $g_i^a(x,y)$ is sometimes lost in the image acquisition process. Although some regions are duplicated, this prevents any region from being skipped during inspection. The lost region of $g_i^b(x,y)$ is used when reconstructing the neighboring frames at the previous or next inspection stage.

3.3. Facility Detection Using Convolutional Neural Network

To automatically inspect railway facilities, we must determine their positions and classify their types, since each type has a different risk-management standard for a deformed area. A simple approach is to define the absolute positions of all facilities in advance. However, this is inefficient since the facility positions must be redefined whenever the reference image set is replaced by a new one.
The proposed system detects facilities using a deep convolutional neural network (CNN). Deep learning-based object detection has developed rapidly with various network models. Although recently proposed models have complicated structures with many layers for high accuracy, their detection time tends to increase due to the enormous number of parameters to be trained. We must also consider the detection accuracy on a dataset consisting of grayscale images acquired from the line scan camera: since the images have only one channel, it is difficult to apply segmentation-based detection methods designed for color images. An advantage of the proposed system is that it does not need a complex network model, since the dataset used is simple and monotonous: facilities of all classes have similar shapes and sizes in the image set, so we do not need to design a complex detection network.
To detect elements and facilities, the proposed system uses a deep neural network model modified from the original single-shot multibox detector (SSD) [20]. Figure 5 shows the SSD network model that takes a 512 × 512 image as input. The SSD falls into the category of one-stage detectors consisting of feed-forward convolutional feature layers. Given an input image, it extracts feature maps at each layer using $c_i \times 3 \times 3 \times c_{i+1}$ convolution filters based on the VGG model [21] as the baseline network, where $c_i$ represents the number of channels of the $i$-th layer. Objects are detected using multiple anchor boxes and softmax classifiers during the convolution process. Since the VGG has a down-scaled feature pyramid structure with a few layers, the SSD achieves fast and accurate performance. However, the VGG needs a huge number of parameters for training, even with its simple and relatively shallow structure. Some network architectures derived from the SSD improve the detection accuracy at the cost of lower speed due to increased computational complexity. Fortunately, the proposed system does not need a network architecture more complex or deeper than the original SSD: facilities of the same class have similar sizes, shapes, and box ratios in all images, as shown in Figure 2. This allows them to be detected easily with a reduced number of convolution layers or feature channels, even though the images captured by the line scan camera have a single channel.
Figure 6a shows the network architecture of the improved SSD. The network starts the detection process from a grayscale input image of size 512 × 512 × 1. We used the VGG model as the baseline network, where the number of channels in each convolution layer is a quarter of the original number of feature channels, as shown in Figure 6b. Although this enables quick detection, the accuracy decreases due to the reduced number of parameters. More specifically, the accuracy for small objects drops rapidly, since the relatively shallow network loses the features of small objects. To mitigate this drawback, we introduced additional blocks derived from the deconvolutional SSD (DSSD) proposed by Fu et al. [22]. As shown in Figure 6a, the network model has an auto-encoder containing upsampling layers of the same size as their corresponding blocks. Figure 6c shows the details of the decoding block: the feature map extracted in the previous block is passed through an upsampling layer and concatenated with its corresponding block in the encoding model. In the proposed network, the number of encoding feature maps is half that of the detecting block used in the original SSD. After the encoder network, two convolution layers with rectified linear units mix the concatenated feature channels. We use the result for object detection without the prediction module used in the DSSD, to reduce the number of learning parameters, and then halve the number of channels using a 1 × 1 convolution layer to repeat the decoding process. Consequently, the proposed network keeps the one-stage, single-shot model while using deeper layers and a reduced number of parameters.
In the training process, we reduce the training images to a quarter of their size by splitting each input image with a width-to-height ratio of 8:1 into sub-images with a ratio of 1:1, and we use the same loss functions as the original version. In the detection process, the proposed system also splits the reconstructed image to improve detection accuracy. Figure 7 shows the detection strategy. When an image is split, the shape of a facility can be divided between two neighboring sub-images, so the proposed system splits the image with an overlap between neighboring sub-images. Since detections of the same facility then overlap, the proposed system combines results whose bounding boxes overlap by more than 50% by selecting their minimum and maximum coordinates (see the sketch after this paragraph). Next, the proposed system uses phase correlation again to match and align the detection results of the two images, $\tilde{g}_i^b(x,y)$ and $\tilde{g}_i^a(x,y)$: we obtain the center coordinates of the detections in $\tilde{g}_i^a(x,y)$ and find the most similar objects by selecting the position with the maximum similarity. When the detector finds the same object with different sizes in $\tilde{g}_i^b(x,y)$ and $\tilde{g}_i^a(x,y)$, we select the larger bounding box, and if an object is detected in only one of the two images, its bounding box size is used for both. Consequently, we obtain the detection results as pairs of bounding boxes detected in $\tilde{g}_i^b(x,y)$ and $\tilde{g}_i^a(x,y)$, respectively.
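The split-and-merge strategy can be sketched as follows. The tile width, overlap, and box format (x1, y1, x2, y2) are our assumptions, and the overlap is measured against the smaller box, which is one possible reading of the 50% rule in the text.

```python
import numpy as np


def split_with_overlap(image, tile_w=512, overlap=128):
    """Cut a wide reconstructed image into square, overlapping tiles."""
    h, w = image.shape[:2]
    xs = list(range(0, max(w - tile_w, 0) + 1, tile_w - overlap))
    if xs[-1] != max(w - tile_w, 0):
        xs.append(max(w - tile_w, 0))   # make sure the right edge is covered
    return [(x, image[:, x:x + tile_w]) for x in xs]


def overlap_ratio(a, b):
    """Intersection area divided by the smaller box area."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return ix * iy / smaller if smaller > 0 else 0.0


def merge_boxes(boxes, thresh=0.5):
    """Fuse detections of the same facility into their min/max extents."""
    merged = []
    for b in boxes:
        for k, m in enumerate(merged):
            if overlap_ratio(b, m) > thresh:
                merged[k] = (min(b[0], m[0]), min(b[1], m[1]),
                             max(b[2], m[2]), max(b[3], m[3]))
                break
        else:
            merged.append(tuple(b))
    return merged
```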

3.4. Facility Inspection

3.4.1. Preprocessing

The proposed system finds cracks in a facility by comparing a pair of images acquired at the same position. Although conventional methods can find thin cracks using a single image, they are not suitable for wide-area inspection. In contrast, a multiple image-based method, i.e., an image comparison approach, works well for wide-area inspection since it computes the difference between two images by simple subtraction, and the result also includes deformed areas of facilities, helping prevent potential risks. However, multiple image-based inspection has its own issues. Figure 8 shows the subtraction result of two images with a facility detected at the same position. Simple subtraction gives rise to an inaccurate result, since the operator depends directly on the intensity values of the two images. Although the proposed system performs image registration in the previous steps, the result still contains unexpected errors in the shape of the facility because of the train's jitter and inconsistent velocity when the compared images are acquired at different times.
To solve these problems, the proposed system first transforms the facility image of $\tilde{g}_i^b(x,y)$ to resemble the image of $\tilde{g}_i^a(x,y)$ using a feature matching approach. Given a pair of $j$-th facility images cropped using the bounding box, $\tilde{g}_{i,j}^b(x,y)$ without any defect and $\tilde{g}_{i,j}^a(x,y)$ with potential wear and cracks, we match the corresponding features between the two images using speeded-up robust features (SURF) [23] and the random sample consensus (RANSAC) algorithm [24]. The proposed system then transforms $\tilde{g}_{i,j}^b(x,y)$ by estimating the homography from the corresponding features of $\tilde{g}_{i,j}^b(x,y)$ and $\tilde{g}_{i,j}^a(x,y)$, as shown in Figure 9a. Figure 9d shows the subtraction result after feature matching and geometric transformation, which is better than the simple subtraction of Figure 8c.
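This step can be sketched with OpenCV as below. SURF requires the opencv-contrib build (ORB is a patent-free drop-in), and the Hessian threshold and RANSAC reprojection threshold are illustrative values, not taken from the paper.

```python
import cv2
import numpy as np


def align_by_homography(ref_patch, tgt_patch):
    """Warp the defect-free reference patch onto the inspected patch."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    kp_r, des_r = surf.detectAndCompute(ref_patch, None)
    kp_t, des_t = surf.detectAndCompute(tgt_patch, None)
    matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_r, des_t)
    src = np.float32([kp_r[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_t[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC discards mismatched pairs while estimating the homography
    h_mat, _mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return cv2.warpPerspective(ref_patch, h_mat,
                               (tgt_patch.shape[1], tgt_patch.shape[0]))
```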
Next, the proposed system performs non-rigid registration using a motion field. Unlike the homography-based approach, which transforms the image globally while keeping its rectangular shape, non-rigid registration is robust to local deformations, because each motion vector represents an oriented displacement in image space. To locally register the transformed image, we obtain the motion field using an optical flow method that estimates dense motion vectors at all pixels via expansion of a quadratic polynomial [25]. The proposed system then obtains the registered image by warping the transformed image with the motion field, as shown in Figure 9b. Figure 9e shows the subtraction result between the registered image and the potentially deformed image: the shapes of the two images are now almost identical, with fewer errors than in Figure 9d, but some errors remain due to brightness differences.
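A sketch of the non-rigid step using OpenCV's Farnebäck polynomial-expansion flow, an implementation of [25]; the flow parameters are illustrative. The flow is computed from the target to the reference, so remapping the reference along the field pulls it onto the target.

```python
import cv2
import numpy as np


def nonrigid_register(ref_warped, tgt):
    """Locally align the homography-warped reference to the target."""
    flow = cv2.calcOpticalFlowFarneback(
        tgt, ref_warped, None,
        pyr_scale=0.5, levels=3, winsize=21,
        iterations=3, poly_n=7, poly_sigma=1.5, flags=0)
    h, w = tgt.shape
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    # flow maps each target pixel to its position in the reference, so
    # sampling the reference at (grid + flow) yields a target-aligned image
    map_x = (gx + flow[..., 0]).astype(np.float32)
    map_y = (gy + flow[..., 1]).astype(np.float32)
    return cv2.remap(ref_warped, map_x, map_y, cv2.INTER_LINEAR)
```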
To match the brightness level of the warped image to that of $\tilde{g}_{i,j}^a(x,y)$, the proposed system performs histogram specification. We match the intensity values of the two images as

$\bar{g}_{i,j}^b(x,y) = \begin{cases} T_a^{-1}\left[T_b\left(\tilde{g}_{i,j}^b(x,y)\right)\right], & \text{if } \left|T_a^{-1}\left[T_b\left(\tilde{g}_{i,j}^b(x,y)\right)\right] - \tilde{g}_{i,j}^a(x,y)\right| < \left|\tilde{g}_{i,j}^b(x,y) - \tilde{g}_{i,j}^a(x,y)\right| \\ \tilde{g}_{i,j}^b(x,y), & \text{otherwise}, \end{cases} \quad (2)$

where $\bar{g}_{i,j}^b(x,y)$ represents the intensity-matched version of $\tilde{g}_{i,j}^b(x,y)$, $T_b(\cdot)$ and $T_a(\cdot)$ are the cumulative distribution functions of $\tilde{g}_{i,j}^b(x,y)$ and $\tilde{g}_{i,j}^a(x,y)$, respectively, and $|\cdot|$ indicates the absolute value. Since histogram specification is a global intensity transfer function, some regions in the reference image may become saturated. For that reason, the proposed system decreases the subtraction error by adding the condition in Equation (2): the histogram matching keeps the original intensity unless the transformed result of $\tilde{g}_{i,j}^b(x,y)$ is closer to $\tilde{g}_{i,j}^a(x,y)$ than the original value, as shown in Figure 9f.
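A minimal NumPy sketch of the conditional histogram specification in Equation (2), assuming 8-bit grayscale patches; the lookup table realizes $T_a^{-1}[T_b(\cdot)]$, and the mask keeps the original value wherever the transfer would enlarge the local difference. All names are ours.

```python
import numpy as np


def conditional_hist_spec(ref, tgt):
    """Equation (2): match ref's histogram to tgt's, pixel-conditionally."""
    edges = np.arange(257)
    cdf_b = np.cumsum(np.histogram(ref, bins=edges)[0]).astype(np.float64)
    cdf_a = np.cumsum(np.histogram(tgt, bins=edges)[0]).astype(np.float64)
    cdf_b /= cdf_b[-1]
    cdf_a /= cdf_a[-1]
    # For each gray level v, find the level whose cumulative probability
    # in tgt matches that of v in ref: the lookup table T_a^{-1}[T_b(v)]
    lut = np.searchsorted(cdf_a, cdf_b).clip(0, 255).astype(np.uint8)
    mapped = lut[ref]
    improves = (np.abs(mapped.astype(int) - tgt.astype(int))
                < np.abs(ref.astype(int) - tgt.astype(int)))
    return np.where(improves, mapped, ref)
```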

3.4.2. Detection of Candidate Defects in the Facility

When the shapes of the two images are well matched and the difference of intensity values is low, the main remaining issue is to detect deformed areas in the presence of noise. It is difficult to remove the noise directly, since some small cracks may be removed together with it. For that reason, the proposed system extracts candidate defect regions while excluding the noise as much as possible. We first obtain a weight that modifies the subtraction result between $\bar{g}_{i,j}^b(x,y)$ and $\tilde{g}_{i,j}^a(x,y)$ by reducing the common high-frequency components:

$e_{i,j}(x,y) = 1 - \left\{\alpha \cdot e_{i,j}^b(x,y) + (1-\alpha) \cdot e_{i,j}^a(x,y)\right\}, \quad (3)$

where $e_{i,j}(x,y)$ represents the weight for the edge area, $e_{i,j}^b(x,y)$ and $e_{i,j}^a(x,y)$ are the high-frequency magnitudes of $\bar{g}_{i,j}^b(x,y)$ and $\tilde{g}_{i,j}^a(x,y)$, respectively, computed using the Prewitt operator, and $\alpha$ is the weight between the two magnitudes.
Next, the proposed system obtains candidates of deformed regions. Although the subtraction result is improved by multiplying by $e_{i,j}(x,y)$, some errors remain due to noise and small registration errors, as shown in Figure 10b. To solve this problem, we use another weight based on the Weber-Fechner law, which relates a perceptible stimulus change of human vision to the initial stimulus level [26].
Assuming that the physical stimulus is equivalent to the intensity value in an image, the Weber contrast $w_{i,j}(x,y)$ is defined as

$w_{i,j}(x,y) = \dfrac{\Delta I}{I} = \dfrac{\bar{g}_{i,j}^b(x,y) - \tilde{g}_{i,j}^a(x,y)}{\bar{g}_{i,j}^b(x,y)}, \quad (4)$

where $I$ and $\Delta I$ represent the original stimulus and its change, respectively. Since the proposed system aims to detect wear and cracks by comparing the non-defective image $\bar{g}_{i,j}^b(x,y)$ and the potentially defective image $\tilde{g}_{i,j}^a(x,y)$, we can define the stimulus change relative to the initial stimulus as in Equation (4). Figure 10c shows the Weber contrast result. Despite unexpected responses in areas that are dark in both $\bar{g}_{i,j}^b(x,y)$ and $\tilde{g}_{i,j}^a(x,y)$, the deformed regions stand out from the background and the noise, because the Weber contrast captures small intensity differences and expresses them more clearly as intensity levels [27].
We finally obtain reliable defect candidates by multiplying the two weights with the subtraction result:

$d_{i,j}(x,y) = \begin{cases} 1, & \text{if } e_{i,j}(x,y) \cdot w_{i,j}(x,y) \cdot \left|\bar{g}_{i,j}^b(x,y) - \tilde{g}_{i,j}^a(x,y)\right| > T \\ 0, & \text{otherwise}, \end{cases} \quad (5)$

where $d_{i,j}(x,y)$ represents the candidate defect regions and $T$ is the threshold value. As shown in Figure 10d, the candidate detection result contains far fewer noise-induced errors than the subtraction result of Figure 9f.
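To make Equations (3)-(5) concrete, the sketch below combines the edge weight, the Weber contrast, and the threshold, with $\alpha = 0.5$ and $T = 0.12$ as used in the experiments. It is our illustrative reading, not the released code: intensities are normalized to [0, 1], the contrast is taken as an absolute value, and a small epsilon guards the division in dark regions.

```python
import numpy as np
from scipy import ndimage


def candidate_defects(ref, tgt, alpha=0.5, thresh=0.12):
    """Equations (3)-(5): edge-weighted, Weber-weighted difference map."""
    ref = ref.astype(np.float64) / 255.0
    tgt = tgt.astype(np.float64) / 255.0
    # Eq. (3): down-weight high-frequency (edge) areas common to both images
    mag_b = np.hypot(ndimage.prewitt(ref, 0), ndimage.prewitt(ref, 1))
    mag_a = np.hypot(ndimage.prewitt(tgt, 0), ndimage.prewitt(tgt, 1))
    e = 1.0 - np.clip(alpha * mag_b + (1.0 - alpha) * mag_a, 0.0, 1.0)
    # Eq. (4): Weber contrast, the intensity change relative to the
    # reference stimulus
    w = np.abs(ref - tgt) / (ref + 1e-6)
    # Eq. (5): threshold the doubly weighted absolute difference
    return (e * w * np.abs(ref - tgt) > thresh).astype(np.uint8)
```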

3.4.3. Defect Region Decision Using Morphological Processing

Given the candidate defect image, the proposed system decides the deformed area using morphological processing. We first remove tiny regions caused by noise as

$\hat{d}_{i,j}(x,y) = d_{i,j}(x,y) - d_{i,j}(x,y) \cap \left(d_{i,j}^{\eta}(x,y) \oplus s_{2p+1}\right), \quad (6)$

where $\hat{d}_{i,j}(x,y)$ represents the candidate image without the small noise regions, $s_{2p+1}$ is a structuring element of size $(2p+1) \times (2p+1)$ with $p \geq 0$, and $\oplus$ indicates the morphological dilation operator. $d_{i,j}^{\eta}(x,y)$ is the binary image containing the center points of the tiny regions, defined as

$d_{i,j}^{\eta}(x,y) = \begin{cases} 1, & \text{if } \displaystyle\sum_{u=-(p+1)}^{p+1} \sum_{v=-(p+1)}^{p+1} d_{i,j}(x+u, y+v) - \sum_{u=-p}^{p} \sum_{v=-p}^{p} d_{i,j}(x+u, y+v) = 0 \\ 0, & \text{otherwise}. \end{cases} \quad (7)$

Figure 11a shows the noise image obtained using Equation (7) and the structuring element with $p = 2$. This removes tiny regions more effectively than morphological erosion, since the operation also removes thin areas containing noise.
Consequently, the proposed system obtains the final defect regions $r_{i,j}(x,y)$ by performing a morphological closing operation:

$r_{i,j}(x,y) = \left(\hat{d}_{i,j}(x,y) \oplus s_q\right) \ominus s_q, \quad (8)$

where $q$ denotes the size of the structuring element and $\ominus$ indicates the erosion operator. Figure 11b shows the defect detection result: the proposed system detects defect regions with most of the noise in the candidate image removed, while preserving the deformed regions marked by the red circles in Figure 11c.
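A hedged sketch of this decision step: in place of the dilation-based formulation of Equations (6) and (7), it discards any connected component that fits inside the $(2p+1) \times (2p+1)$ window, which has the same effect, and then applies the closing of Equation (8). The closing size q is not specified in the paper, so the value below is an assumption.

```python
import cv2
import numpy as np


def decide_defects(candidates, p=2, q=7):
    """Remove window-sized noise blobs, then close the survivors."""
    # Equivalent of Eqs. (6)-(7): blobs no larger than (2p+1) x (2p+1)
    # are treated as noise and dropped.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(
        candidates.astype(np.uint8), connectivity=8)
    cleaned = np.zeros_like(candidates, dtype=np.uint8)
    for k in range(1, n):
        if max(stats[k, cv2.CC_STAT_WIDTH],
               stats[k, cv2.CC_STAT_HEIGHT]) > 2 * p + 1:
            cleaned[labels == k] = 1
    # Eq. (8): closing = dilation followed by erosion with s_q
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (q, q))
    return cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, kernel)
```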

4. Experimental Results

In this section, we demonstrate the performance of the proposed inspection system. We performed experiments on a 2.2 GHz CPU with 64 GB RAM and an Nvidia GeForce GTX 1080 Ti GPU. We implemented the overall system in C++ with the OpenCV library, and the detection module in Python with the TensorFlow library. For the experiments, we acquired a pair of image sets using a line scan camera in a tunnel of subway line 9 in Seoul, South Korea.
In the first experiment, we verified the performance of the improved SSD. We trained all compared networks, including the proposed network, in three stages with different learning rates: 0.01 for 200 K iterations, then 0.001 and 0.0001 for 50 K iterations each. Common settings were the loss function of the original SSD, the Adam optimizer built into TensorFlow, a batch size of 8, and a weight decay of 0.0005. We trained the proposed network using 14,000 non-defect images with a 1:1 aspect ratio to detect facilities of 15 classes including the background, as shown in Figure 12. To evaluate the proposed network, we used a database consisting of a similar number of images containing some potential defects.
We tested the performance of the proposed network by comparing it with other networks derived from the original SSD (SD+VG). Table 1 shows the layer configurations of the five compared networks.
SD+res50 uses the 50-layer residual network [28] with bottleneck structures; we trained it with the same number of feature channels as the SD+VG model. SD+VG/2 has the same structure as the original model, but all feature map sizes are halved relative to SD+VG. DSD+VG/2 and DSD+VG/4 are designed using the proposed network; the baseline of both is the same as SD+VG/2, but DSD+VG/4 has half the number of feature channels. All networks were trained under the same conditions as the proposed network.
Table 2 and Table 3 show the detection results in terms of mean average precision (mAP) and computational time per image, respectively. We set a detection score threshold of 0.7 and a bounding-box overlap threshold of 0.3 for non-maximum suppression. The performance of SD+res50 is lower than that of the networks designed on the VGG baseline. SD+VG/2 achieves a similar mAP to SD+VG with a shorter computational time, which shows that more parameters and layers do not by themselves determine network performance, and verifies that a light network model can perform well when trained on a simple dataset. The proposed network-based models, DSD+VG/2 and DSD+VG/4, achieve better mAPs and computational times than the other networks. Similar to the comparison between SD+VG and SD+VG/2, DSD+VG/4 is slightly faster than DSD+VG/2 while achieving a similar mAP. As shown in Figure 13, the proposed network accurately detects facilities.
In the second experiment, we measured the performance of the defect detection. We acquired two image sets using the line scan camera on the same section but at different times. Since the images were captured by a camera installed on a vehicle providing regular transportation service, we could not manipulate facilities for safety reasons. Instead, we simulated defects in the later image set using a commercial image painting tool, considering possible defects such as insulator cracks and fastener loosening. We inserted 337 defects into 50 images with corresponding ground truth (GT), and compared them to their non-defect counterparts using the proposed system. We set $\alpha = 0.5$ in Equation (3) for the edge weight and $T = 0.12$ in Equation (5) for the detection of candidate defect regions. For quantitative evaluation, we measured the precision, recall, and intersection over union (IoU) as
$\text{Precision} = \dfrac{TP}{TP + FP}, \quad (9)$

$\text{Recall} = \dfrac{TP}{TP + FN}, \quad (10)$

$\text{IoU} = \dfrac{TP}{TP \cup TP_{GT}}, \quad (11)$
where $TP$, $TP_{GT}$, $FP$, and $FN$ represent the areas of the true positives, the GT of the true positives, the false positives, and the false negatives, respectively. In addition, we measured a hit rate, defined as the number of detected defects over the number of simulated defects:
$\text{Hit rate} = \dfrac{N_{TP}}{N_{GT}}, \quad (12)$
where $N_{TP}$ and $N_{GT}$ represent the numbers of detected and simulated defects, respectively. $N_{TP}$ is incremented when the detected area of a simulated defect is larger than 0.1 of its area. The hit rate thus evaluates the accuracy of detecting significantly deformed regions for facility maintenance.
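For reference, a short sketch of Equations (9)-(12) over binary masks; the IoU here is the standard pixel-wise intersection over union, which is our reading of Equation (11), and the 0.1 per-defect coverage threshold follows the text.

```python
import numpy as np


def pixel_metrics(pred, gt):
    """Precision, recall, and IoU of binary defect masks, Eqs. (9)-(11)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    iou = tp / np.logical_or(pred, gt).sum()
    return tp / (tp + fp), tp / (tp + fn), iou


def hit_rate(covered_fractions, thresh=0.1):
    """Eq. (12): a simulated defect counts as detected when more than
    `thresh` of its area is covered by the detection result."""
    hits = sum(1 for f in covered_fractions if f > thresh)
    return hits / len(covered_fractions)
```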
Table 4 and Figure 14 show the quantitative and qualitative evaluation results of the proposed system. In this experiment, we obtained relatively low precision, recall, and IoU values. Since the proposed system compares a pair of railway tunnel images relying on their intensity values and a constant threshold, inaccurate results are sometimes obtained when the brightness difference between the images is low. Nevertheless, the proposed system accurately detects noticeable defect regions, as shown by the hit rate in Table 4.
Table 5 shows the computational time of the proposed system in seconds per image. We implemented most modules except the detection network in C++, and loaded the Python-based detection module into the proposed system. To improve inspection speed, we applied the OpenMP library for parallel processing in some methods. As shown in Table 5, the proposed system quickly provides inspection results for facility maintenance.

5. Conclusions

This paper presented a novel, deep learning-based railway inspection system for facility maintenance using image analysis in the urban railway field. Unlike conventional methods and systems, the proposed system finds wear and cracks by comparing a pair of images of the same location taken at different times. The line scan camera overcomes the drawbacks of area-based cameras, namely a narrow FoV and a low image acquisition speed. The proposed system inspects facilities using deep learning-based object detection: an improved single shot detector finds and classifies facilities with better performance than the original SSD on the dataset acquired in the subway tunnel environment, although all networks in the detection experiment achieved high accuracy because facilities of the same class have similar shapes and sizes. The proposed system then finds facility defects using an image comparison approach based on the absolute difference and Weber's law. Image comparison based on the difference measurement is more efficient than single image-based analysis, because the difference between a pair of images containing normal and abnormal facilities, respectively, can be found simply. The Weber contrast is also well suited to comparing a pair of images, since its numerator and denominator represent the image difference as a relative stimulus change and the intensity of the reference image as the initial stimulus, respectively.
The proposed system can provide various benefits in managing railway infrastructure. Specifically, if it is equipped in commercial vehicles providing transportation, facilities can be monitored at all times, and faults and defects can be repaired before they cause accidents. It can also reduce the cost and time of inspection. Consequently, the proposed system enables automatic inspection and maintenance of urban railway infrastructure.

Author Contributions

Software and writing–original draft, J.J.; investigation and validation, M.S. and S.L.; project administration and writing–review & editing, J.P. (Joonki Paik); data curation, J.P. (Jonggook Park) and J.K.

Funding

The work presented in this paper was supported by the Ministry of Land, Infrastructure and Transport in Korea through the Korea Railroad Technology Research project “Developed autonomous inspection and analysis system for the facilities of urban railway (tunnel)” (Project No. 19RTRP-C136506-03) committed by the Korea Agency for Infrastructure Technology Advancement (KAIA), and by an Institute for Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2014-0-00077, Development of global multi-target tracking and event prediction techniques based on real-time large-scale video analysis).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Mohan, A.; Poobal, S. Crack Detection Using Image Processing: A Critical Review and Analysis. Alex. Eng. J. 2018, 57, 787–798.
2. Li, C.J.; Zhang, Z.; Nakamura, I.; Imamura, T.; Miyake, T.; Fujiwara, H. Developing a New Automatic Vision Defect Inspection System for Curved Surfaces with Highly Specular Reflection. Int. J. Innov. Comput. Inf. Control 2012, 8, 5121–5136.
3. Tao, X.; Zhang, D.; Ma, W.; Liu, X.; Xu, D. Automatic Metallic Surface Defect Detection and Recognition with Convolutional Neural Networks. Appl. Sci. 2018, 8, 1575.
4. Essid, O.; Laga, H.; Samir, C. Automatic Detection and Classification of Manufacturing Defects in Metal Boxes Using Deep Neural Networks. PLoS ONE 2018, 13, 1–17.
5. The Railway Technical Website. Available online: http://www.railway-technical.com/infrastructure/ (accessed on 29 September 2019).
6. Min, Y.; Xiao, B.; Dang, J.; Yue, B.; Cheng, T. Real Time Detection System for Rail Surface Defects Based on Machine Vision. EURASIP J. Image Video Process. 2018, 2018, 1–13.
7. Karakose, M.; Yaman, O.; Baygin, M.; Murat, K.; Akin, E. A New Computer Vision-Based Method for Rail Track Detection and Fault Diagnosis in Railways. Int. J. Mech. Eng. Robot. Res. 2017, 6, 22–27.
8. Han, Y.; Liu, Z.; Lee, D.; Liu, W.; Chen, J.; Han, Z. Computer Vision-Based Automatic Rod-Insulator Defect Detection in High-Speed Railway Catenary System. Int. J. Adv. Robot. Syst. 2018, 15, 1–15.
9. Zhang, W.; Zhang, Z.; Qi, D.; Liu, Y. Automatic Crack Detection and Classification Method for Subway Tunnel Safety Monitoring. Sensors 2014, 14, 19307.
10. Attard, L.; Debono, C.J.; Valentino, G.; Castro, M.D. Vision-Based Change Detection for Inspection of Tunnel Liners. Autom. Constr. 2018, 91, 142–154.
11. Jang, J.; Kim, H.; Shin, M.; Park, J.; Kim, J.; Paik, J. Vision-Based Railway Inspection System Using Multiple Object Detection and Image Registration. IEEE Trans. Smart Process. Comput. 2018, 7, 440–447.
12. James, A.; Jie, W.; Xulei, Y.; Chenghao, Y.; Ngan, N.B.; Yuxin, L.; Yi, S.; Chandrasekhar, V.; Zeng, Z. TrackNet—A Deep Learning Based Fault Detection for Railway Track Inspection. In Proceedings of the 2018 International Conference on Intelligent Rail Transportation (ICIRT), Singapore, 12–14 December 2018; pp. 1–5.
13. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241.
14. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 22–25 July 2017; pp. 4700–4708.
15. Gibert, X.; Patel, V.M.; Chellappa, R. Deep Multitask Learning for Railway Track Inspection. IEEE Trans. Intell. Transp. Syst. 2017, 18, 153–164.
16. Huang, H.; Sun, Y.; Xue, Y.; Wang, F. Inspection Equipment Study for Subway Tunnel Defects by Grey-Scale Image Processing. Adv. Eng. Inform. 2017, 32, 188–201.
17. Xiong, Z.; Li, Q.; Mao, Q.; Zou, Q. A 3D Laser Profiling System for Rail Surface Defect Detection. Sensors 2017, 17, 1791.
18. Asakura, T.; Kojima, Y. Tunnel Maintenance in Japan. Tunn. Undergr. Space Technol. 2003, 18, 161–169.
19. Bodenstorfer, E.; Hasani, Y.; Fürtler, J.; Brodersen, J.; Mayer, K.J. High-Speed Line-Scan Camera with Multi-Line CMOS Color Sensor. In Proceedings of the 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 20–25 June 2012; pp. 9–14.
20. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
21. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14.
22. Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv 2017, arXiv:1701.06659.
23. Bay, H.; Tuytelaars, T.; Gool, L.V. SURF: Speeded Up Robust Features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
24. Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395.
25. Farnebäck, G. Two-Frame Motion Estimation Based on Polynomial Expansion. In Proceedings of the Scandinavian Conference on Image Analysis, Halmstad, Sweden, 29 June–2 July 2003; pp. 363–370.
26. Hecht, S. The Visual Discrimination of Intensity and the Weber-Fechner Law. J. Gen. Physiol. 1924, 7, 235–267.
27. Chen, J.; Shan, S.; Zhao, G.; Chen, X.; Gao, W.; Pietikäinen, M. A Robust Descriptor Based on Weber's Law. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 24–26 June 2008; pp. 1–7.
28. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
Figure 1. (a) Front view of a railroad vehicle in an underground tunnel and (b) image acquisition process using a line scan camera.
Figure 2. Acquired images: (a) the 5th and (b) the 20th frames.
Figure 3. A pair of images acquired at the same location but at different times. Image (a) was acquired before (b).
Figure 4. Coarse registration process of the proposed system: (a) horizontally transformed image of Figure 3a; (b) reconstructed result of (a); and (c) reconstructed result of Figure 3b.
Figure 5. Network model of the original SSD with the input size of 512 × 512 [20].
Figure 6. Network architecture of the proposed system: (a) overall network architecture; (b) baseline network using the VGG model; and (c) decoding block for object detection.
Figure 7. Split strategy of the proposed detection algorithm.
Figure 8. Subtraction result of a pair of facility images: (a,b) are the facility images to be compared, and (c) is their subtraction result. (a) was captured before (b).
Figure 9. Result of each preprocessing step: (a) transformed image of $\tilde{g}_{i,j}^b(x,y)$ using feature matching and homography; (b) warped image of (a) using optical flow-based registration; (c) transferred result of (b) using histogram specification; and (d–f) subtraction results of (a–c) by the absolute difference operation, respectively.
Figure 10. (a) Weight for reduction of the common high-frequency components; (b) multiplication result of (a) with the subtraction result; (c) Weber contrast of $\bar{g}_{i,j}^b(x,y)$ and $\tilde{g}_{i,j}^a(x,y)$; and (d) defect candidate image. Note that (b,c) are shown normalized.
Figure 11. Defect detection result: (a) noise image; (b) detected defect regions using the proposed system; and (c) a pair of images with the deformed regions.
Figure 12. Classes for facility detection used in the experiment.
Figure 13. Detection results using the proposed network: (a–d) are the results of the 1st, 16th, 20th, and 82nd frames, respectively. The color of each bounding box represents a class type, as shown in Figure 12.
Figure 14. Results of deformed region detection using the proposed system. Green, blue, and red represent true positives, false negatives, and false positives, respectively.
Table 1. Layer specification of each network model (iterations × layer, given as kernel size and number of channels).

Block       | SD+VG                          | SD+res50                                            | SD+VG/2                        | DSD+VG/2                       | DSD+VG/4
Block 1     | 2 × (3×3, 64)                  | 1 × (7×7, 64)                                       | 2 × (3×3, 32)                  | 2 × (3×3, 32)                  | 2 × (3×3, 16)
Block 2     | 2 × (3×3, 128)                 | 1 × (3×3, 128); 3 × [1×1, 32; 3×3, 32; 1×1, 128]    | 2 × (3×3, 64)                  | 2 × (3×3, 64)                  | 2 × (3×3, 32)
Block 3     | 3 × (3×3, 256)                 | 1 × (3×3, 256); 4 × [1×1, 64; 3×3, 64; 1×1, 256]    | 3 × (3×3, 128)                 | 3 × (3×3, 128)                 | 3 × (3×3, 256)
Block 4     | 3 × (3×3, 512)                 | 1 × (3×3, 512); 6 × [1×1, 128; 3×3, 128; 1×1, 512]  | 3 × (3×3, 256)                 | 3 × (3×3, 256)                 | 3 × (3×3, 128)
Block 5     | 3 × (3×3, 512)                 | 1 × (3×3, 512); 4 × [1×1, 128; 3×3, 128; 1×1, 512]  | 3 × (3×3, 256)                 | 3 × (3×3, 256)                 | 3 × (3×3, 128)
Block 6     | 1 × (3×3, 1024)                | 1 × (3×3, 1024)                                     | 1 × (3×3, 256)                 | 1 × (3×3, 256)                 | 1 × (3×3, 128)
Block 7     | 1 × (1×1, 1024)                | 1 × (1×1, 1024)                                     | 1 × (1×1, 512)                 | 1 × (1×1, 512)                 | 1 × (1×1, 256)
Block 8     | 1 × (1×1, 256); 1 × (3×3, 512) | 1 × (1×1, 256); 1 × (3×3, 512)                      | 1 × (1×1, 128); 1 × (3×3, 256) | 1 × (1×1, 128); 1 × (3×3, 256) | 1 × (1×1, 64); 1 × (3×3, 128)
Block 9-12  | 1 × (1×1, 128); 1 × (3×3, 256) | 1 × (1×1, 128); 1 × (3×3, 256)                      | 1 × (1×1, 64); 1 × (3×3, 128)  | 1 × (1×1, 64); 1 × (3×3, 128)  | 1 × (1×1, 32); 1 × (3×3, 64)
Block 13-14 | -                              | -                                                   | -                              | 2 × (3×3, 128); 1×1, 64        | 2 × (3×3, 128); 3×3, 64
Block 15    | -                              | -                                                   | -                              | 2 × (3×3, 128); 1×1, 128       | 2 × (3×3, 128); 3×3, 128
Block 16    | -                              | -                                                   | -                              | 2 × (3×3, 256); 1×1, 256       | 2 × (3×3, 256); 3×3, 256
Block 17    | -                              | -                                                   | -                              | 2 × (3×3, 512); 1×1, 128       | 2 × (3×3, 512); 3×3, 128
Block 18    | -                              | -                                                   | -                              | 2 × (3×3, 256)                 | 2 × (3×3, 256)
Table 2. Mean average precision (mAP) of facility detection (%). The best and second-best values in each column are marked in red and blue in the original article.

Model     | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12   | 13   | 14   | Total
SD+VG     | 99.8 | 99.6 | 100  | 94.8 | 89.1 | 96.2 | 96.2 | 100  | 99.8 | 99.5 | 98.7 | 100  | 82.4 | 99.9 | 96.8
SD+VG/2   | 99.8 | 99.6 | 100  | 92.6 | 86.1 | 96.5 | 94.0 | 100  | 99.5 | 99.5 | 97.9 | 100  | 86.0 | 98.9 | 96.5
SD+res50  | 99.2 | 99.8 | 99.4 | 97.2 | 84.2 | 90.0 | 72.1 | 99.2 | 94.1 | 98.8 | 98.6 | 100  | 69.6 | 98.4 | 92.9
DSD+VG/2  | 99.9 | 99.8 | 100  | 98.4 | 87.9 | 97.9 | 96.0 | 100  | 99.5 | 99.8 | 99.7 | 100  | 79.0 | 99.4 | 96.9
DSD+VG/4  | 99.9 | 99.4 | 100  | 99.7 | 91.1 | 98.0 | 91.7 | 100  | 99.4 | 99.4 | 99.4 | 100  | 79.5 | 99.6 | 96.9
Table 3. Computational time of network models (ms).

Model     | SD+VG | SD+VG/2 | SD+res50 | DSD+VG/2 | DSD+VG/4
Time (ms) | 106   | 87      | 107      | 105      | 100
Table 4. Quantitative evaluation for the defect detection module.

Metric | Precision | Recall | IoU    | Hit Rate
Value  | 0.5420    | 0.7243 | 0.4224 | 0.9007
Table 5. Computational time of the proposed system (sec/image).

Module              | Mean  | Min   | Max
Image Loading       | 1.090 | 0.120 | 1.249
Reconstruction      | 0.511 | 0.400 | 0.584
Facility Detection  | 0.450 | 0.360 | 0.535
Facility Inspection | 0.822 | 0.366 | 3.952
Total               | 3.260 | 1.877 | 6.866
