1. Introduction
Digitalization is one of the priorities in transforming industrial research and innovation [1]. In addition, the offshore industry can be significantly transformed by embracing advanced digital technologies such as the Internet of Things, Big Data, and Cloud Computing, together with sensor technologies and high-fidelity structural modeling techniques. All of these technologies come together in what is broadly described as a Digital Twin. In this paper, the development of Digital Twin technologies for large-scale offshore structures is explored.
The term “Digital Twin” has many different, and often rather broad, definitions in both academia and industry; a combined state-of-the-art and state-of-use review collects more than 400 definitions of the Digital Twin [2]. However, in this work we adopt the definition of Bolton et al. [3], that a Digital Twin is a “dynamic virtual representation of a physical object or system across its lifecycle, using real-time data to enable understanding, learning and reasoning.”
The vision for using a Digital Twin for smart, efficient, and reliable large-scale offshore structures is further elaborated in [4]. A Digital Twin can be connected to the real structure through a live feed of data streamed from embedded sensors, meteorological data, and, most importantly, structural health monitoring and inspection systems. Ideally, a Digital Twin can therefore be used to predict the development of damage, based on the current damage state recorded and updated in a structural health journal maintained for each structural component. Based on physical models, the Digital Twin can then be used to simulate the effect on the structure of different damage modes and different operational scenarios, e.g., during installation or in the event of a storm, grid loss, shutdown, etc. [4].
A Digital Twin of a large-scale structure should start at the early design stage. During the manufacturing stage, monitoring of the process parameters is essential for future damage identification. Numerous in-service failures originate from fabrication defects. Therefore, each structure needs to be digitally reconstructed and scanned for flaws before leaving the factory; accordingly, the Digital Twin is updated with information on the manufacturing history and fabrication defects. This should be carried out automatically to limit the workload of such a task.
Technologies such as autonomous robots that adapt to the surface of large-scale offshore structures, such as wind turbine blades, are now being developed to perform three-dimensional (3D) scans and to identify surface manufacturing defects before the structures leave the factory [5]. Although this type of technology seems to have good potential in terms of accuracy, the long scanning time and high costs, in addition to limited system robustness, make it difficult to apply on a large scale and to use in other lifecycle stages, such as inspection during operation. Alternative 3D reconstruction technologies, such as LiDAR (light detection and ranging) scanning and drone-based photogrammetry, already implemented in the building and construction industry [6,7,8], could also be applied in the offshore industry, not only to perform 3D digital reconstruction of large-scale structures but also to identify surface defects and damages. LiDAR scanning has proven to be a good option, in terms of acquisition speed and accuracy, for obtaining a 3D digital representation of complex objects, such as bridges, without physical contact [6]. In addition to these two advantages, drone-based photogrammetry has been shown to decrease overall reconstruction time and costs relative to the common manual and direct inspection techniques used in the building and construction industry [7,8]. In this technology, cameras are mounted on unmanned aerial vehicles (UAVs), commonly known as drones, to take high-resolution RGB aerial images from different viewpoints of the object. The images are then used to make a 3D reconstruction of the object through post-processing based on matching keypoints, using structure-from-motion (SfM) or multi-view stereopsis (MVS) algorithms [9].
Once the Digital Twin is created with the 3D digital reconstruction of the large-scale structure and information on the manufacturing defects, it must be continuously updated throughout the life cycle of the structure, including transportation, installation, operation, maintenance, and repair. Advanced and robust inspection systems and sensors must be developed, such that the environmental exposure and deterioration of structures and materials may be monitored. Through the Digital Twin, the state of the individual structures can always be assessed, giving valuable information to the operator. Based on this information, the operator can make decisions that affect the lifetime of its assets, e.g., changing the operation mode to reduce the loading on the structures. Furthermore, information about necessary repairs will be available through processing data in the Digital Twin.
Currently, inspections of large-scale offshore structures during operation, such as wind turbine structures, are often carried out manually using, e.g., lifts and rope access. This is a time-consuming, expensive, and potentially dangerous task. Therefore, the use of drones is a potential alternative for this process, as they could be used to carry out remote, faster, safer, and cheaper inspections [10].
However, drone inspections bring other challenges, such as generating large amounts of images and data that need to be processed. In addition, users have difficulty precisely and correctly locating surface defects and damages because the images are often shown out of context. Therefore, there is a need for a faster and more efficient solution to store, analyze, and report on the vast amount of data the inspections provide. Artificial intelligence (AI) and Digital Twins provide these opportunities. AI can process many more images, much faster, than a human can handle. Observations from multiple images can be grouped by AI into unique issues and, at the same time, their 3D locations can be calculated so that the issues can be automatically mapped in the Digital Twin. The result is a single issue that can be viewed from multiple images and angles through the Digital Twin. In addition, smaller defects/damages that would have been missed by the human eye can potentially be detected by AI. An example of wind turbine surface damage detection using AI-aided drone inspection is found in [10].
Depending on the application and the objective, the methodologies of creating Digital Twins can be significantly different. In the present study, we propose a novel methodology to create an operational image-based Digital Twin for large-scale offshore structures. We use drone inspection images and/or LiDAR scans to create a 3D geometric reconstruction of the structure, compare the reconstructed model to the original design to locate and quantify geometric deviation, apply AI to identify surface defects/damages, and map the identified surface defects/damages to the geometric reconstruction. The entire process can be done automatically, allowing efficient model updating and status tracking with new sets of inspection images. The proposed methodology is demonstrated on a wind turbine transition piece (TP) and can be applied to other large-scale structures where structural health monitoring and asset management are needed.
3. Drone-Based Image Acquisition and Pre-Processing
Images of a specific transition piece were collected at the factory of Bladt Industries using a DJI Zenmuse P1 45 MP camera mounted on a DJI M300 RTK drone. Real-time kinematic (RTK) positioning was used to improve the precision of the GPS information saved in the metadata of the images. The selected TP had sufficient space around it, making it easier to plan a good flight path in which the drone keeps a safe distance from both the selected TP and the surrounding TPs.
Figure 2a,b show the drone flight paths and the actual drone and transition piece, respectively. The drone flight path was planned in a local Cartesian coordinate system with its origin placed at the center of the CAD model of the tower. The number of flight points was chosen to be high enough to ensure the necessary overlap between the images in both the horizontal and vertical directions. As a rule of thumb, a minimum overlap of 60% between photos and a maximum angle difference of 15 degrees between consecutive photos were adopted [12]. The flight points were converted to a georeferenced coordinate system and uploaded to the drone.
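To illustrate how such a flight plan can be generated, the sketch below computes acquisition positions on a horizontal ring around the TP so that both rules of thumb (at least 60% overlap and at most 15 degrees between consecutive photos) are respected. The stand-off distance, camera field of view, and TP radius are placeholder values, not the parameters of the actual flight.

```python
import numpy as np

def plan_ring_waypoints(tp_radius_m, standoff_m, hfov_deg,
                        min_overlap=0.60, max_angle_step_deg=15.0):
    """Acquisition positions on a horizontal ring around a cylindrical TP.

    The angular step between consecutive photos is limited both by the
    required image overlap (>= 60%) and by the maximum allowed change in
    viewing angle (<= 15 degrees), following the rules of thumb above.
    """
    flight_radius = tp_radius_m + standoff_m
    # Approximate width of the surface patch covered by one image
    footprint = 2.0 * standoff_m * np.tan(np.radians(hfov_deg) / 2.0)
    # Largest step along the surface that still keeps the requested overlap
    max_step_arc = footprint * (1.0 - min_overlap)
    step_deg = min(np.degrees(max_step_arc / flight_radius), max_angle_step_deg)
    n_points = int(np.ceil(360.0 / step_deg))

    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    # Local Cartesian coordinates with the origin at the center of the tower
    return np.stack([flight_radius * np.cos(angles),
                     flight_radius * np.sin(angles)], axis=1)

# Illustrative numbers only, not the parameters of the actual flight
waypoints = plan_ring_waypoints(tp_radius_m=3.0, standoff_m=8.0, hfov_deg=40.0)
print(len(waypoints), "acquisition positions per ring")
```

Several such rings at different heights, converted to the georeferenced frame, would then form the complete flight path.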
Figure 2a shows the drone flight paths, where the blue dots represent the image acquisition positions.
Improved image quality was obtained by mounting the camera in a gimbal stabilizer and keeping it pointed toward the TP during the flight. The drone images were used both for paint defect/damage detection and for the 3D model reconstruction. The TP was made of steel, which meant that specular reflections from the tower could produce shiny images with patches whose color differed from the surrounding areas. The TP in the images also had very little texture; there were large areas with only slight differences in color.
These issues made the 3D reconstruction of the TP very challenging. A possible solution to this challenge could be to optically project a pattern on the tower. Image key points can be used to predict whether the drone images can successfully be used in the model reconstruction process.
Image keypoints were used in photogrammetry software, such as ContextCapture from the Bentley Institute [12], to find the “interesting” points in an image and to find the connections between images. The calculation of these keypoints for all of the images is one of the first steps in the model reconstruction process. Keypoints are spatial locations in the image that stand out; they can be an edge or a defect in a surface.
Figure 3a,b and Figure 4 show the keypoints, represented by green + signs, in one of the images of the TP. These image keypoints were calculated using the speeded-up robust features (SURF) algorithm and the scale-invariant feature transform (SIFT) algorithm, as seen in Figure 3 and Figure 4, respectively. Both of these robust feature descriptors are invariant to scale changes, blur, rotation, illumination changes, and affine transformations (see reference [13]). SIFT handles differences in image scale better than SURF, while SURF is the faster algorithm (see reference [14]). A good reconstruction is expected when the image keypoints are uniformly distributed; however, this is not the case for the example shown in Figure 3a.
It is seen that there were very few SIFT and SURF keypoints on the TP, making the reconstruction more challenging. There were many keypoints on the highly textured ground close to the tower, which meant that these areas of the scene would be well captured in the reconstructed model. Both the SIFT and SURF descriptors could find the areas with tower paint defects/damages, as seen in Figure 3b and Figure 4b. These figures show a smaller section of the tower where the SIFT and SURF keypoints, respectively, were positioned at a section with paint defects/damages. The SIFT algorithm also placed keypoints on the welding seam.
Figure 3c shows the 100 strongest-matched SURF points in two images that were captured consecutively during the drone flight.
The possibility of tracking keypoints from one image to the next is very important for a good reconstruction. The calculation of image keypoints does not take a long time. This calculation makes it possible to estimate, in the field, how well suited the images are for performing 3D reconstruction; few keypoints on the tower and in the image would hinder a good 3D reconstruction of the tower. Image keypoints, such as SIFT and SURF, can also be used in the field to estimate where the paint defects/damages are placed on the tower. Images that only show the tower, and not the surroundings, would have image keypoints mostly in the areas of the paint defects/damage. The SURF algorithm was better suited for our purposes because it finds the paint damages and not other non-essential features, such as the welding seams.
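A minimal sketch of this field check, using OpenCV, is given below. SIFT ships with recent OpenCV releases, whereas SURF requires the opencv-contrib build; the image file names are placeholders.

```python
import cv2

# Two consecutive drone images (placeholder file names)
img1 = cv2.imread("tp_image_001.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("tp_image_002.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT is included in recent OpenCV releases; SURF would require the
# opencv-contrib build (cv2.xfeatures2d.SURF_create)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)
print(f"keypoints: {len(kp1)} / {len(kp2)}")

# Brute-force matching with a ratio test, keeping the strongest matches
bf = cv2.BFMatcher(cv2.NORM_L2)
matches = bf.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
good = sorted(good, key=lambda m: m.distance)[:100]
print(f"strong matches between consecutive images: {len(good)}")

# Few keypoints on the TP itself is an early warning, already in the field,
# that the 3D reconstruction may be difficult.
```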
4. 3D Geometry Reconstruction
The 3D reconstructed model of the transition piece is the visual part of the Digital Twin. This reconstruction model was based on both images and LiDAR scans.
The images were collected during the drone flights; ContextCapture from the Bentley Institute was used to generate the 3D reconstruction models. These 3D models are textured meshes with a large number of faces and vertices.
Figure 5a shows the CAD model of the TP from Bladt Industries, and Figure 5b shows the corresponding reconstructed 3D model. This reconstruction is based on 445 images. Small holes can be observed on the cylindrical sides of the reconstructed model; they appear in areas with a low number of keypoints. A larger number of images can, in some cases, reduce the number of small holes in the reconstruction. The overlap between the CAD model and the reconstructed model is shown in Figure 6. The critical step of accurately aligning the reconstructed and CAD model meshes was carried out with the MeshLab tool [15]. The grey areas come from the CAD model, while the yellow areas come from the reconstructed model.
It is seen that the reconstructed model is, in general, a very good representation of the CAD model, with a few exceptions. The biggest difference between the two models is the position of the crane. The height of the posts on the upper platform is another difference: the posts on the tower as built are lower than specified in the CAD model.
Figure 7a shows the distance differences between these models as colors superimposed on the CAD model. The green color corresponds to small distance differences between the models, while the red and blue colors correspond to larger positive and negative differences, respectively. The histogram in Figure 7b shows the relative probability of the different distance differences, together with the corresponding colors. The x-axis of the histogram has a logarithmic scale, which makes it possible to see the very small probability values and the corresponding colors used in Figure 7a. The distance differences range from −1.1 to 3.7 m; fortunately, the probability of these extreme values is very low. The histogram has a relatively low standard deviation of 0.30 m and is centered on a mean value very close to zero. The error associated with the reconstructed model, compared to the CAD model, is therefore, in general, relatively low. The red colors on the crane, corresponding to large distance differences, are due to the different positions of the crane in the two models. The different crane position is also the reason for the second peak at high values of geometric difference in the histogram in Figure 7b.
Appendix A shows how the dimensions of objects can be measured from the model reconstruction.
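The deviation analysis described above can be reproduced in simplified form with a mesh-processing library such as trimesh, as sketched below. The meshes are assumed to be already aligned (done in this study with MeshLab), the file names are placeholders, and the signed-distance step assumes a reasonably watertight CAD mesh.

```python
import numpy as np
import trimesh

# Meshes are assumed to be already aligned in the same coordinate system
cad = trimesh.load("tp_cad.obj")
recon = trimesh.load("tp_reconstruction.obj")

# Sample points on the reconstructed surface and measure their signed
# distance to the CAD surface (trimesh convention: points inside the
# CAD mesh receive positive values)
points, _ = trimesh.sample.sample_surface(recon, 50000)
distances = trimesh.proximity.signed_distance(cad, points)

print(f"mean deviation : {distances.mean():.3f} m")
print(f"std deviation  : {distances.std():.3f} m")
print(f"range          : {distances.min():.2f} to {distances.max():.2f} m")

# Histogram of the deviations, analogous to Figure 7b
hist, edges = np.histogram(distances, bins=100)
```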
The reconstructed 3D model of the TP can also be based on LiDAR scans. These results are shown in Figure 8 and Figure 9. The reconstructed model in Figure 8 is based on six static LiDAR scans of the TP, corresponding to a point cloud of 63 million points. Shiny surfaces can cause problems not only for image-based photogrammetry reconstruction but also for LiDAR scans. Occlusion of the LiDAR beams is another problem that reduces the quality of the reconstruction. We encountered all of these obstacles during the drone inspection of the TPs. A lack of information in a certain area of the tower causes holes to appear in the reconstructed 3D model. Smaller holes can be repaired in the photogrammetry software ContextCapture from the Bentley Institute. The comparison between the CAD model and the reconstructed model based on LiDAR scans is shown in Figure 9.
Figure 9a again shows the distance differences between the CAD model and the reconstructed 3D model as colors superimposed on the CAD model. The probabilities of the distance differences are shown in Figure 9b, together with the corresponding color distribution. The distance differences range from −2.8 to 4.1 m. The histogram has a relatively low standard deviation of 0.57 m and is centered on a mean value very close to zero. Therefore, the error associated with the reconstructed model, compared to the CAD model, was, in general, relatively low. Some parts of the ladders on the platform were not reconstructed due to occlusion of the LiDAR beams. This, together with the small holes in the reconstruction, increased the width of the histogram. The quality of the model reconstruction based on RGB images is better than that of the reconstruction based on LiDAR scans, which is evident from a comparison of the histograms in Figure 7b and Figure 9b. The ladders on the platform were better captured in the image-based reconstruction, because the platform itself blocks the LiDAR beams. Mounting the LiDAR scanners on the drone could solve this problem.
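For the LiDAR-based model, the corresponding workflow starts from a point cloud rather than a mesh. A simplified sketch using Open3D is shown below; the file names, voxel size, and correspondence threshold are illustrative, and a coarse initial alignment is assumed.

```python
import numpy as np
import open3d as o3d

# Merged LiDAR point cloud and a point cloud sampled from the CAD model
scan = o3d.io.read_point_cloud("tp_lidar_merged.ply")
cad = o3d.io.read_point_cloud("tp_cad_sampled.ply")

# Downsample the 63-million-point cloud before registration
scan_ds = scan.voxel_down_sample(voxel_size=0.02)
cad_ds = cad.voxel_down_sample(voxel_size=0.02)

# Refine a coarse initial alignment with point-to-point ICP
result = o3d.pipelines.registration.registration_icp(
    scan_ds, cad_ds, 0.1, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
scan_ds.transform(result.transformation)

# Distance from each LiDAR point to the nearest CAD point (unsigned),
# analogous to the deviation map in Figure 9a
dists = np.asarray(scan_ds.compute_point_cloud_distance(cad_ds))
print(f"mean {dists.mean():.3f} m, std {dists.std():.3f} m")
```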
5. AI-Based Surface Defect/Damage Detection
The domain of defect/damage detection on TP surfaces falls under the category of either semantic segmentation or object detection. Given recent advancements in object detection with YOLO-based algorithms, which offer benefits such as high classification accuracy and real-time throughput [11], we applied object detection to the task at hand. Incoming images originated from drone footage using a high-resolution camera (~45 MP), reinforcing the requirement for highly efficient neural networks.
A YOLO-based architecture was chosen due to its high network throughput and the availability of open-source implementations in common machine-learning libraries. Given the large image sizes in this project (approx. 45 MP), a network with high throughput was a key requirement, while the availability of implementations in common machine-learning libraries allowed for easier deployment on a wider range of GPU-ready machines.
The neural network architecture used was YOLOv5 (see reference [11]), which, in turn, is based on the seminal YOLO (You Only Look Once) network introduced by Joseph Redmon [11]. In this family of architectures, input images are effectively divided into a grid of cells, where each cell outputs a likelihood, constrained to the interval [0, 1], for the presence of one or more objects (called “objectness”), values describing the corresponding bounding boxes (including height, width, and box center), and the associated class likelihoods. The loss function contains terms for each of these values, where values from cells with objectness lower than a given threshold are ignored and not penalized during loss calculation.
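As a minimal illustration of the detection step (not a reproduction of the project's training pipeline), a trained YOLOv5 model can be loaded and run on a drone image via PyTorch Hub as sketched below; the weight file name and confidence threshold are placeholders.

```python
import torch

# Load YOLOv5 from PyTorch Hub; "custom" expects a trained weight file
# (here a hypothetical checkpoint trained on the defect/damage classes)
model = torch.hub.load("ultralytics/yolov5", "custom", path="tp_defects.pt")
model.conf = 0.25  # confidence threshold, placeholder value

# Run inference on a (large) drone image; YOLOv5 rescales internally
results = model("drone_image_0001.jpg", size=1536)

# Each detection row: x1, y1, x2, y2, confidence, class index
for *box, conf, cls in results.xyxy[0].tolist():
    print(f"class {int(cls)}  conf {conf:.2f}  box {[round(v, 1) for v in box]}")
```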
All of the available labeled data amounted to ~1200 images from handheld cameras and manual drone flights, totaling ~3500 objects of interest. Of these images, ~230 were reserved as a validation set. We expanded the training set by applying augmentations, including rotation, translation, HSV value shifts, perspective shifts, horizontal/vertical flips, and more.
A widely used performance metric for object detection is mean average precision (mAP). A target performance of 0.7 mAP was defined in the early stages of this study, which corresponded to a high-performing model on open-source datasets at that time. The best-performing model reached only 0.31 mAP on a hold-out set. This stark contrast between the expected and the realized performance is attributed to subjectivity in labeling. Contrary to most publicly used datasets, where object granularity is often not an issue, the data in this work contain cases where a single object of interest may comprise many smaller objects (see Figure 10 and Figure 11) or, conversely, where many smaller objects may constitute a single larger object. This decision-making is somewhat subjective and depends on the pragmatism of the labeler. As an example, the class “Flying Rust” often comprises 100 or more small flecks of rust on the TP surface, grouped in heterogeneous clusters. Given that object granularity bore little consequence in our application field, we developed a performance metric that calculates the mean intersection over union (IoU) for each class, irrespective of the number of bounding boxes that intersect with a ground-truth bounding box (or vice versa). With this metric (which we abbreviate to mIoU), our model performed at 0.59. For comparison, a high-performing model evaluated on the COCO 2017 dataset reached 0.69 mIoU.
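One plausible implementation of such a granularity-agnostic metric is sketched below: for each class, all predicted and all ground-truth boxes are rasterized into binary masks and a single IoU is computed per class, so splitting one large region into many small boxes does not change the score. This is our reading of the mIoU described above, not necessarily the exact implementation used in the study.

```python
import numpy as np

def classwise_merged_iou(pred_boxes, gt_boxes, image_shape):
    """IoU per class, irrespective of how many boxes cover a region.

    pred_boxes, gt_boxes: dict mapping class name -> list of (x1, y1, x2, y2)
    image_shape: (height, width) of the image in pixels.
    """
    h, w = image_shape
    ious = {}
    for cls in set(pred_boxes) | set(gt_boxes):
        pred_mask = np.zeros((h, w), dtype=bool)
        gt_mask = np.zeros((h, w), dtype=bool)
        for x1, y1, x2, y2 in pred_boxes.get(cls, []):
            pred_mask[int(y1):int(y2), int(x1):int(x2)] = True
        for x1, y1, x2, y2 in gt_boxes.get(cls, []):
            gt_mask[int(y1):int(y2), int(x1):int(x2)] = True
        union = np.logical_or(pred_mask, gt_mask).sum()
        inter = np.logical_and(pred_mask, gt_mask).sum()
        ious[cls] = float(inter) / union if union > 0 else 0.0
    return ious

# Many small "flying rust" detections vs. one large ground-truth box:
# the score is unaffected by how the region is split into boxes
pred = {"flying_rust": [(10, 10, 20, 20), (22, 12, 30, 24), (32, 10, 40, 22)]}
gt = {"flying_rust": [(8, 8, 42, 26)]}
print(classwise_merged_iou(pred, gt, image_shape=(100, 100)))
```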
Training to convergence on the validation dataset took approximately 30 h on a P6000 GPU. Once completed, the network weights were stored, and the neural network could be deployed with these weights in a REST server on a GPU-enabled machine for making real-time detections over a network connection. Examples of ground-truth labels and network detections are shown in Figure 10 and Figure 11.
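Deployment as a REST service can be sketched with a lightweight web framework such as FastAPI (an illustrative choice; the study does not specify the framework or endpoint):

```python
import io

import torch
from fastapi import FastAPI, File, UploadFile
from PIL import Image

app = FastAPI()
# Hypothetical trained weights for the defect/damage classes
model = torch.hub.load("ultralytics/yolov5", "custom", path="tp_defects.pt")

@app.post("/detect")
async def detect(image: UploadFile = File(...)):
    """Run defect/damage detection on an uploaded drone image."""
    data = await image.read()
    img = Image.open(io.BytesIO(data))
    results = model(img, size=1536)
    # Bounding boxes, confidences, and class names returned as JSON
    return results.pandas().xyxy[0].to_dict(orient="records")
```

Such a service would be started with uvicorn and queried over HTTP with multipart image uploads.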
6. Defect/Damage Mapping
Using a novel approach, the paint defects/damages were mapped onto the reconstructed model or, alternatively, onto the CAD model. The reconstructed/CAD model with all the defects/damages provides an overview of the positioning of the defects/damages in 3D, making it possible to identify any systematic pattern in where the defects/damages occur. This information could be used to optimize production. The AI algorithm identified the pixels in an image that corresponded to paint defects/damages; this 2D information was then mapped to the 3D CAD or reconstructed model. The same paint defect/damage could often be seen from slightly different angles in a number of different images. Each of these pixels with paint defects/damages was mapped to the reconstructed model, thereby increasing the mapped paint defect/damage area on the tower; however, some of the mapped defect/damage points overlapped each other. Three different coordinate systems were used to perform the mapping (see Figure 12a). The world coordinate system (x, y, z) is shown in the figure, together with the three colored axes of the local camera coordinate system. Moving this coordinate system to the center of the camera sensor results in the third coordinate system.
A meshed version of the CAD or reconstructed model was placed in the georeferenced coordinate system that was used during model reconstruction. The local camera coordinate system, in which the x-, y-, and z-axes are colored blue, red, and black, respectively, is shown in Figure 12a. The camera angle, measured from the optical axis of the camera, determines the direction in which the camera is pointing; the green ray in Figure 12 shows this direction. A simple pinhole camera model provides the relation between an object in the “real world” and the corresponding image captured by the camera. The effects of the light-collecting lens in the camera were added to the pinhole camera model, and this “extended” pinhole camera model was used as Equation (1). If (x′, y′) are the pixel values of the image, then the 3D world coordinates (x, y, z) of the 3D object can be calculated from Equation (1).
Here, (cposx, cposy, cposz) is the position of the camera in world coordinates. DCT is the distance from the camera to the tower in the direction in which the camera is pointing, while fk and fl are the focal lengths of the camera expressed in pixels along the horizontal and vertical directions of the camera sensor, respectively. The two values cx and cy move the pixel values from the upper left corner to the center of the camera sensor. RotM is the rotation matrix that transforms the local camera coordinates to world coordinates. In many applications, the image pixels are calculated from the 3D world coordinates of the object; this calculation is the reverse of Equation (1). Parameters such as camera positions and rotation matrices can often be extracted from the photogrammetry software for every image used in the reconstruction. The simple pinhole camera model and the “extended” pinhole camera model, which calculate the image pixel values from the 3D world coordinates, are explained in the course notes in reference [16].
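A sketch of this pixel-to-world transformation is given below. It assumes that Equation (1) takes the standard back-projection form, i.e., the camera position plus the rotated, depth-scaled ray through the pixel; this is consistent with the symbol definitions above but is our reconstruction rather than a verbatim copy of the published equation, and the numerical values are placeholders.

```python
import numpy as np

def pixel_to_world(px, py, cpos, RotM, DCT, fk, fl, cx, cy):
    """Back-project an image pixel (px, py) to 3D world coordinates.

    cpos   : (3,) camera position in world coordinates (cposx, cposy, cposz)
    RotM   : (3, 3) rotation matrix from local camera to world coordinates
    DCT    : distance from the camera to the tower along the viewing direction
    fk, fl : focal lengths in pixels (horizontal / vertical)
    cx, cy : offsets moving pixel values to the center of the sensor
    """
    # Dimensionless pixel values relative to the sensor center
    dlx = (px - cx) / fk
    dly = (py - cy) / fl
    # Ray through the pixel in the local camera frame, scaled to depth DCT
    ray_cam = DCT * np.array([dlx, dly, 1.0])
    # Rotate into world coordinates and add the camera position
    return np.asarray(cpos) + RotM @ ray_cam

# Illustrative values only; in practice cpos, RotM, fk, fl, cx, and cy are
# extracted from the photogrammetry software for each image
cpos = np.array([10.0, -25.0, 30.0])
RotM = np.eye(3)
print(pixel_to_world(4200, 2800, cpos, RotM, DCT=12.0,
                     fk=11000.0, fl=11000.0, cx=4096.0, cy=2730.0))
```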
The effect of distortion is seen in the images from the drone: vertical structures placed in the center of the image are reproduced as vertical structures, but when these structures are placed near the edges of the images, they tend to be slightly curved, showing signs of distortion. Radial and tangential lens distortion can be added to the “extended” pinhole model of Equation (1). Small lenses in particular suffer from radial distortion, because light rays bend more near the edges of a lens than near its center. Tangential distortion occurs when the lens and the image plane are not parallel. To account for radial and tangential distortions, their contributions to the pixel values must be calculated. If the dimensionless pixel values dlx and dly used in Equation (1) are given by Equation (2), then the radial (xrdis, yrdis) and tangential (xtdis, ytdis) distortions are provided by Equation (3). Here, r is the radial distance in the dimensionless pixel coordinates (r² = dlx² + dly²), k1, k2, and k3 are the radial distortion coefficients of the lens, and p1 and p2 are the tangential distortion coefficients of the lens. These distortion parameters can normally also be extracted from the photogrammetry software. It is seen that the magnitude of the distortion correction in Equations (2) and (3) increases with the value of the radius r. The distance DCT from the camera to the tower needed in the equations above can be calculated because both the camera and the tower positions are known. The distortion of the paint damage pixels found by the AI routine is provided by Equation (3). The dimensionless pixel values dlx and dly are corrected by these values before they are used in Equations (1) and (2) to calculate the 3D world coordinates of the paint damages. The red points in Figure 12a,b near and on the tower are the world coordinates corresponding to the paint defect/damage pixels calculated using Equations (1)–(3). Not all of the points lie on the tower (see Figure 12b), because depth information for all of the pixel values in the image is needed in order to perform a precise image-pixel-to-world-coordinates transformation.
The green points in the figure are calculated as a projection of the red points onto the tower. These green points represent the mapping of the paint defect/damage pixels found in one of the drone images onto the CAD model of the transition piece.
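The distortion correction and the projection of the mapped points onto the tower surface (the green points) can be sketched as follows, here using trimesh for the closest-point query. The distortion terms follow the standard radial/tangential (Brown) model, which matches the coefficients k1, k2, k3, p1, and p2 described above but is, again, our assumed form of Equations (2) and (3); the mesh file name, coefficients, and point values are placeholders.

```python
import numpy as np
import trimesh

def undistort(dlx, dly, k1, k2, k3, p1, p2):
    """First-order correction of dimensionless pixel values for lens distortion."""
    r2 = dlx**2 + dly**2
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3
    xrdis, yrdis = dlx * radial, dly * radial
    xtdis = 2 * p1 * dlx * dly + p2 * (r2 + 2 * dlx**2)
    ytdis = p1 * (r2 + 2 * dly**2) + 2 * p2 * dlx * dly
    return dlx - (xrdis + xtdis), dly - (yrdis + ytdis)

def project_onto_tower(points_world, mesh):
    """Snap back-projected defect points (red) onto the tower surface (green)."""
    closest, distance, _ = trimesh.proximity.closest_point(mesh, points_world)
    return closest, distance

# Illustrative usage with placeholder distortion coefficients, mesh, and points
dlx_c, dly_c = undistort(0.12, -0.05, k1=-0.1, k2=0.02, k3=0.0, p1=1e-4, p2=-2e-4)
tower = trimesh.load("tp_cad.obj")
red_points = np.array([[10.2, -24.5, 31.0], [9.8, -25.1, 28.7]])
green_points, offsets = project_onto_tower(red_points, tower)
```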
Figure 13a,b show the CAD model superimposed with the paint defects/damages found in all the drone images from this particular flight. It is seen that the defects/damages are not distributed uniformly; there is a higher concentration of paint defects/damages near the bottom of the tower than in the middle and upper sections.
7. Concluding Remarks
In this study, we proposed and demonstrated a novel concept and developed a working methodology to create an operational Digital Twin of large-scale structures based on drone inspection images and LiDAR scans. The Digital Twin presented in this study is a 3D visual representation of a physical structure, e.g., the wind turbine transition piece demonstrated here, on which both drone inspection and LiDAR scans were conducted. The Digital Twin can (1) visualize and quantify the geometric deviation of the real structure from the design, (2) map the surface defects/damages, including the sizes, locations, and types detected by AI, onto the 3D geometric reconstruction, and (3) be updated whenever new sets of inspection data become available.
The Digital Twin concept presented in this study is operational and opens many opportunities for preventive maintenance and optimal asset management of large-scale structures and infrastructure. Further improvement of the Digital Twin is necessary to make it more accurate, automated, robust, and faster. The remaining challenges include improving the accuracy of 3D geometric reconstruction when using inspection images of shiny surfaces; automating and streamlining the workflow for handling the huge amounts of data; and enhancing the robustness of the Digital Twin for other applications on different large-scale structures. The proposed framework is modularized and has the flexibility to adapt and be upgraded. For example, the current study uses AI to train the damage detection network; this module could be replaced by other methods, such as wavelet and contourlet transforms, which can also detect damages from images. In addition, the damage-related information obtained from the drone-based images could be complemented with data obtained from sensors installed in the structure, such as strain gauges and accelerometers, and from other inspection techniques, such as thermography. Advanced numerical tools can be further developed to interpret these sensor signals and inspection data, and to simulate the consequences of structural and material deterioration. These challenges will be the focus of our future studies, building on the foundation provided in the current work.