1. Introduction
Hard substrates, composed of, e.g., cobbles, boulders and bedrock, provide an essential habitat for a variety of sessile and mobile marine species [1,2]. The functioning of these marine habitats is, however, threatened by both natural pressures (e.g., sediment mobility) and anthropogenic pressures (e.g., fishing, aggregate extraction and the construction of offshore windfarms) [3,4,5]. To preserve and support these habitats and the valuable ecosystem services associated with them, efficient monitoring is essential. This is especially important in areas where hard substrates are rare, e.g., in the sand-dominated North Sea.
The demarcation of hard-substrate habitats and the monitoring of their condition must consider substrate availability in addition to the epibenthic assemblages, because sessile organisms depend strongly on suitable anchor points. Various consolidated objects, both natural and artificial (e.g., stones, mussel accumulations, shipwrecks, pipelines or construction foundations), can provide shelter and substrate for a variety of sessile and mobile species (e.g., [6,7,8]). The sediment composition of the seafloor strongly influences the availability of such substrates. In environments with high sediment mobility, e.g., shallow shelf seas affected by waves and tides, hard substrates may become temporarily buried, while previously buried ones may become exposed [9,10]. At present, the spatial detection of underwater objects can only be achieved using hydroacoustic remote-sensing devices such as side-scan sonars (SSS), multibeam echo sounders (MBES) and parametric sediment echo sounders (pSES) (e.g., [11,12,13]). SSS data are usually analyzed by automated and semi-automated methods to detect objects such as shipwrecks or mines (e.g., [14,15]). Objects protruding from the seafloor can be identified in SSS data because they cause strong backscatter followed by weak backscatter (acoustic shadow) perpendicular to the track of the SSS. How clearly this pattern appears depends on the quality and resolution of the SSS data, the size and material composition of the object, and the acoustic reflectivity of the seafloor. However, even though many objects are clearly identifiable by eye, the automated counting, extraction and localization of individual objects for further applications remains difficult or even impossible. Several statistical and machine learning algorithms were developed to support the automated identification and extraction of areas and objects. These focused on differences between pixel intensities, e.g., eCognition in [16], the Viola–Jones cascade [17], wavelet analysis [18], and back-propagation and convolutional neural networks ([19,20,21] and references therein), among other approaches.
The training and application of so-called Haar-like feature detectors has produced promising results for detecting objects in images in a variety of disciplines (e.g., face detection, vehicle detection and the detection of mine-like objects on SSS mosaics) [15,22]. It is not yet known whether this method is also suitable for the spatial detection of hard substrates.
Recently, the German Federal Agency for Nature Conservation (BfN) proposed a mapping guideline to demarcate reefs in the North and Baltic Seas [23]. According to this guideline, the demarcation of reefs in the North Sea should be based on SSS data acquired with a frequency of ≥300 kHz and a resolution suitable to detect stones with a diameter of ≥30–50 cm. The guideline follows a four-step approach that builds on the observer-based tagging of stones; this tagging, however, is a very tedious and time-consuming process suitable only for very small areas.
To underpin this approach, the aims of this study were (1) to train a Haar-like classifier to detect individual objects on a large SSS mosaic, (2) to estimate the performance of the detector with respect to different sediment types, detection thresholds and numbers of grey values of the tested mosaic, and (3) to propose a spatial demarcation of reefs at a study site in the German Bight based on the mapping guideline [23].
2. Material and Methods
2.1. Study Site and SSS Data
The study site is located within the Sylt Outer Reef (SOR), approximately 40 nautical miles west of the island of Sylt in the German Bight, SE North Sea, and covers ~12 km² (Figure 1). The SOR is a massive submarine moraine ridge of glacial till and meltwater deposits that formed during the Saalian glaciation (MIS 6) [24]. Water depths in the area range from 30 to 40 m. The modern seafloor in the study area consists of a patchy distribution of fine to coarse sands, with gravels, cobbles and boulders emerging from the seafloor [13,25]. Most of these hard substrates are colonized by epifauna [8]. Since 2017, the SOR has been protected as a Special Area of Conservation (SAC) under the European Union’s Habitats Directive (92/43/EEC). It includes the Annex I habitat types ‘sandbanks’ (code 1110) and ‘reefs’ (code 1170).
SSS data were collected using a towed multi-pulse EdgeTech 4200-MP during a survey in October 2016. The SSS was operated at a frequency of 300 kHz. The ship speed was approximately 5 knots, and the range was set to 75 m to achieve an along-track resolution of at least 0.25 m. The raw SSS data were processed, including slant-range correction and speed, layback and gain normalization, using SonarWiz (Chesapeake Technology, CA, USA). The nadir zone was cut out up to 5 m on both the port and starboard sides to reduce noise in the resulting mosaic.
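The relation between ship speed, range and along-track resolution can be illustrated with a simple ping-spacing calculation. This sketch is not part of the original processing: it assumes a nominal sound speed of 1500 m/s and, for the multi-pulse case, two pulses in the water, and it ignores the horizontal beam width, which also limits along-track resolution.

```python
# Sketch: along-track ping spacing of a towed side-scan sonar.
# Assumptions (not from the survey description): sound speed 1500 m/s,
# two pulses in the water for multi-pulse operation, beam width ignored.
KNOT_TO_MS = 1852.0 / 3600.0  # 1 knot = 1852 m per hour


def along_track_spacing(speed_knots: float, range_m: float,
                        pulses_in_water: int = 1,
                        sound_speed: float = 1500.0) -> float:
    """Distance (m) travelled by the towfish between successive pings."""
    two_way_time = 2.0 * range_m / sound_speed   # s needed to hear the full range
    ping_rate = pulses_in_water / two_way_time   # pings per second
    return speed_knots * KNOT_TO_MS / ping_rate  # metres per ping


single = along_track_spacing(5.0, 75.0)                    # one pulse in the water
multi = along_track_spacing(5.0, 75.0, pulses_in_water=2)  # two pulses in the water
print(f"single pulse: {single:.3f} m, multi-pulse: {multi:.3f} m")
# prints: single pulse: 0.257 m, multi-pulse: 0.129 m
```

Under these assumptions, a single pulse at 5 knots and 75 m range gives a ping spacing slightly above 0.25 m, whereas two pulses in the water halve it, which stays well within the targeted resolution.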
To investigate the sizes of the stones within the study area, a subsample of randomly chosen stones was measured in the waterfall display of the SSS. The sizes were obtained manually with the target logger of the EdgeTech Discover software (4200-MP, version 7.00) by measuring the length of the acoustic shadow.
2.2. Training and Application of the Detector
The training of a Haar-like feature detector requires a large number (many thousands) of positive (= matching) and negative (= non-matching) images of the object in question (e.g., [27]). To obtain these, raw SSS data from multiple stony areas of the Sylt Outer Reef were replayed in waterfall mode (displayed in 256 grey values) and automatically converted into still images using the EdgeTech Discover 4200-MP software (version 7.00). The still images were imported into Matlab’s Image Labeler application (MathWorks R2018b, Natick, MA, USA; part of Matlab’s Computer Vision System Toolbox), and positive samples (i.e., backscatter patterns indicating a stone) were extracted manually. Rectangles were drawn around the stones, including some backscatter information from the surrounding background (average width = 44 ± 11 pixels, average height = 20 ± 6 pixels). These are the ‘real positive samples’. ‘Real negative samples’ were generated by cutting sub-images of 40 × 20 pixels out of SSS still images that did not contain any stones (Figure 2).
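Cutting fixed-size negative samples out of stone-free still images can be sketched as follows. This is a minimal NumPy sketch that assumes 40 × 20 pixels means width × height (matching the wide positive rectangles) and that tiles are cut without overlap; the stride actually used is not stated, and the function name cut_negative_tiles is hypothetical.

```python
import numpy as np


def cut_negative_tiles(image: np.ndarray, tile_w: int = 40,
                       tile_h: int = 20) -> list[np.ndarray]:
    """Cut non-overlapping tile_h x tile_w sub-images from a stone-free still.

    Pixels along the right and bottom borders that do not fill a complete
    tile are discarded.
    """
    n_rows = image.shape[0] // tile_h
    n_cols = image.shape[1] // tile_w
    return [image[r * tile_h:(r + 1) * tile_h, c * tile_w:(c + 1) * tile_w]
            for r in range(n_rows) for c in range(n_cols)]


# Hypothetical 8-bit still image (256 grey values) without any stones.
still = np.random.default_rng(0).integers(0, 256, size=(100, 130), dtype=np.uint8)
negatives = cut_negative_tiles(still)
print(len(negatives), negatives[0].shape)  # 15 (20, 40)
```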
To enlarge the training dataset, additional ‘artificial negative samples’ were created in Matlab as images (here 100 × 100 pixels) composed of randomly assigned grey values. Creating artificial images and including them in the training procedure is a common method to improve the performance of a detector (e.g., [15]). Accordingly, all real positive and negative samples were quadrupled by flipping and rotating the images by 180 degrees (cf. Figure 2). To further increase the number of negative samples, the pixel values of each image derived from real samples were randomized and shuffled in the horizontal and vertical directions. This pixel-randomization procedure was repeated a second time on the previously created dataset. This procedure increased the number of real negative images by a factor of 64. In total, 21,848 positive and 343,370 negative samples were available for training the detector.
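The augmentation steps can be sketched as below. The geometric quadrupling (original, two flips, 180° rotation) follows the text. The exact pixel-randomization scheme is not specified, so the three randomized variants per round (full permutation, column shuffle, row shuffle) are a hypothetical choice, made only because two such rounds combined with the quadrupling reproduce the stated factor of 64 (4 × 4 × 4). All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)


def geometric_variants(img: np.ndarray) -> list[np.ndarray]:
    """Quadruple a sample: original, left-right flip, up-down flip, 180-degree rotation."""
    return [img, np.fliplr(img), np.flipud(img), np.rot90(img, 2)]


def pixel_variants(img: np.ndarray) -> list[np.ndarray]:
    """Original plus three pixel-randomized copies (hypothetical scheme).

    1) all pixel values randomly permuted, 2) columns shuffled
    (horizontal direction), 3) rows shuffled (vertical direction).
    """
    h, w = img.shape
    permuted = rng.permutation(img.ravel()).reshape(h, w)
    cols_shuffled = img[:, rng.permutation(w)]
    rows_shuffled = img[rng.permutation(h), :]
    return [img, permuted, cols_shuffled, rows_shuffled]


def augment_negative(img: np.ndarray) -> list[np.ndarray]:
    """x4 geometric variants, then two rounds of pixel randomization (x4 each): x64."""
    out = []
    for g in geometric_variants(img):
        for first_round in pixel_variants(g):
            out.extend(pixel_variants(first_round))
    return out


def artificial_negative(size: int = 100) -> np.ndarray:
    """Artificial negative sample: random 8-bit grey values in a size x size image."""
    return rng.integers(0, 256, size=(size, size), dtype=np.uint8)


tile = rng.integers(0, 256, size=(20, 40), dtype=np.uint8)  # stand-in negative tile
samples = augment_negative(tile)
print(len(samples))  # 64
```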
The Haar-like feature detector was trained using the cascade object detector of Matlab’s Computer Vision System Toolbox with the following settings: false alarm rate: 0.1; true positive rate: 0.995; number of cascade stages: 29; object training size in pixels: height = 10, width = 15; negative samples factor: 2. The false alarm rate defines the acceptable fraction of negative samples per stage that are incorrectly classified as positive. The true positive rate is the minimum fraction of positive samples per stage that must be classified correctly.
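Because a candidate window must pass every stage of the cascade, the per-stage targets translate into approximate overall rates by raising them to the power of the number of stages. The short calculation below assumes that each of the 29 stages meets its targets exactly; since these are per-stage bounds rather than guaranteed values, the results are indicative only.

```python
# Per-stage training targets (settings listed above).
false_alarm_rate = 0.1
true_positive_rate = 0.995
n_stages = 29

# A window must pass every stage, so the overall rates are approximately
# the per-stage rates raised to the number of stages.
overall_far = false_alarm_rate ** n_stages
overall_tpr = true_positive_rate ** n_stages
print(f"overall false alarm rate ~ {overall_far:.0e}")   # ~ 1e-29
print(f"overall true positive rate ~ {overall_tpr:.3f}")  # ~ 0.865
```

The calculation shows why many stages with a high per-stage true positive rate are needed: even at 0.995 per stage, roughly 13–14% of true stones could be rejected somewhere along the cascade.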
The trained Haar-like feature detector was applied to the SSS mosaic with a minimum detector size of 10 × 15 pixels and a maximum detector size of 30 × 30 pixels. These values were determined manually, as they gave the best detection results with respect to the size of the stones in the SSS mosaic. To assess the influence of the merging threshold of the detector and of the number of grey values of the SSS mosaic on the resulting detections, the merging threshold was set to 6, 8, 10, 12 and 20, and the number of grey values of the mosaic to 32, 64, 128, 192 and 256. The merging threshold is a tunable integer that helps to reduce false detections: to pass a higher threshold, an individual object must be detected multiple times during the multiscale detection phase. The coordinates of the center point of each resulting bounding box (i.e., each detected object) were extracted for all threshold and grey-value combinations and used for the subsequent analysis in ArcGIS (Esri, Redlands, CA, USA).
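The two tested parameters can be illustrated with a short sketch: reducing an 8-bit mosaic to a given number of grey values, and merging raw multiscale detections into center points. The quantization rule (uniform binning of 0–255, rescaled back to 0–255) is an assumption, since the text does not state how the grey values were reduced. The function merge_detections is a hypothetical, simplified stand-in for the detector's merging threshold, not Matlab's implementation: overlapping boxes are grouped transitively, groups with fewer members than the threshold are dropped, and each remaining group is replaced by its mean box.

```python
from itertools import product

import numpy as np


def quantize(mosaic: np.ndarray, n_levels: int) -> np.ndarray:
    """Reduce an 8-bit mosaic to n_levels grey values (assumed uniform bins).

    Values are binned to 0 .. n_levels-1 and rescaled to 0 .. 255 so that the
    image stays in the 8-bit range.
    """
    levels = (mosaic.astype(np.int32) * n_levels) // 256
    return np.round(levels * 255.0 / (n_levels - 1)).astype(np.uint8)


def merge_detections(boxes: np.ndarray, threshold: int) -> np.ndarray:
    """Simplified merging of raw detections (hypothetical, not Matlab's algorithm).

    boxes: (n, 4) array of [x, y, width, height]. Returns an (m, 2) array with
    the center points of merged groups that have >= threshold members.
    """
    n = len(boxes)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        xi, yi, wi, hi = boxes[i]
        for j in range(i + 1, n):
            xj, yj, wj, hj = boxes[j]
            if xi < xj + wj and xj < xi + wi and yi < yj + hj and yj < yi + hi:
                parent[find(i)] = find(j)  # boxes overlap: join their groups

    groups: dict[int, list[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    centers = []
    for members in groups.values():
        if len(members) >= threshold:
            x, y, w, h = boxes[members].mean(axis=0)  # mean box of the group
            centers.append((x + w / 2.0, y + h / 2.0))
    return np.array(centers, dtype=float).reshape(-1, 2)


# The 5 x 5 grid of (merging threshold, grey values) tested in this study.
settings = list(product((6, 8, 10, 12, 20), (32, 64, 128, 192, 256)))

raw = np.array([[10, 10, 15, 10], [11, 10, 15, 10], [12, 11, 15, 10],
                [100, 100, 15, 10]], dtype=float)  # hypothetical raw boxes
print(merge_detections(raw, threshold=2))  # one center at (18.5, ~15.33)
```

The usage example shows the intended trade-off: the three overlapping boxes survive as one detection, while the isolated single box is discarded; raising the threshold above 3 would discard both.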
2.3. Evaluation of the Performance
The performance of the detector was evaluated by comparing its results (hereafter referred to as the automatic method) with manually tagged stones. The manually tagged stones were obtained by replaying the SSS files in waterfall mode and selecting obvious stones with the target logger. The evaluation considered the total number of spatially matching detections and, as described in the following, the number of matches for four different seafloor types and the size of the resulting reef area based on the method provided by the BfN [23].
To compare the total number of spatial matches, buffers were drawn around each automatically and manually detected stone, using two alternative buffer diameters (1.50 and 3.00 m). This approach accounts for the observed positional offset between point features derived from manually and automatically detected stones. The offset resulted from the different-sized bounding boxes used by the detector to identify stones of different sizes and from the non-standardized marking of stones during manual detection. Overlapping buffers (or, in the case of multiple overlaps, those with the largest overlap) were assumed to represent the same stone and counted as one match (Figure 3). Each detected stone could be counted only once.
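The buffer-based matching rule can be sketched for point features in a projected coordinate system (metres, assumed). Two circular buffers of diameter d overlap when their centers are less than d apart, and the overlap area of two equal circles of radius r whose centers are s apart is 2r²·arccos(s/2r) − (s/2)·√(4r² − s²). Accepting candidate pairs in order of decreasing overlap, with each stone used only once, is one possible implementation of the rule above; it is not necessarily how the ArcGIS workflow resolved multiple overlaps. The last lines compute the count-based quantities described in Section 2.3 (proportion of correctly identified stones among all detections, and proportion of manually tagged stones that were missed); the function names are hypothetical.

```python
import math


def lens_area(r: float, s: float) -> float:
    """Overlap area of two circles of equal radius r whose centers are s apart."""
    if s >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(s / (2 * r)) - (s / 2) * math.sqrt(4 * r * r - s * s)


def match_stones(auto_pts, manual_pts, diameter):
    """Greedy one-to-one matching of buffered points, largest overlap first."""
    r = diameter / 2.0
    candidates = []
    for i, (xa, ya) in enumerate(auto_pts):
        for j, (xm, ym) in enumerate(manual_pts):
            area = lens_area(r, math.hypot(xa - xm, ya - ym))
            if area > 0.0:                      # buffers overlap
                candidates.append((area, i, j))
    candidates.sort(reverse=True)               # largest overlap first
    used_auto, used_manual, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_auto and j not in used_manual:  # each stone counted once
            used_auto.add(i)
            used_manual.add(j)
            pairs.append((i, j))
    return pairs


auto = [(0.0, 0.0), (1.0, 0.2), (10.0, 10.0)]   # hypothetical detector centers (m)
manual = [(0.4, 0.0), (20.0, 20.0)]             # hypothetical tagged stones (m)
pairs = match_stones(auto, manual, diameter=1.50)
n_match = len(pairs)
precision = n_match / len(auto)                     # correct / all detections
miss_ratio = (len(manual) - n_match) / len(manual)  # missed / all tagged stones
```

In the example, both detections near the first tagged stone overlap its buffer, but only the closer one (larger overlap) is counted as a match.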
The influence of different backscatter intensities, which represent different seafloor types, on the performance of the detector was assessed by manually classifying and interpreting the backscatter intensities of 25 × 25 m grid cells. The four categories were (1) fine sand areas with comparatively weak and homogeneous backscatter, (2) rippled sediments, (3) stony grounds with comparatively strong backscatter, and (4) areas with a mixed occurrence of the above sediment types. Again, automatically and manually detected stones whose buffers (diameters of 1.50 and 3.00 m) overlapped were assumed to be matching stones.
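Assigning detections to the manually classified 25 × 25 m grid cells and computing a match proportion per seafloor category might look as follows. The grid origin (x0, y0), the dictionary of cell categories and the use of detector points as the denominator are assumptions for illustration; the function names are hypothetical.

```python
import math
from collections import defaultdict

CELL = 25.0  # grid cell size (m)


def cell_index(x, y, x0=0.0, y0=0.0):
    """Column/row index of the 25 x 25 m cell containing point (x, y)."""
    return int(math.floor((x - x0) / CELL)), int(math.floor((y - y0) / CELL))


def match_rate_by_category(auto_pts, matched_flags, cell_category, x0=0.0, y0=0.0):
    """Proportion of matched detections per seafloor category.

    cell_category maps (col, row) -> category label from the manual
    classification; detections in unlabelled cells are ignored.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for (x, y), matched in zip(auto_pts, matched_flags):
        category = cell_category.get(cell_index(x, y, x0, y0))
        if category is None:
            continue
        totals[category] += 1
        hits[category] += int(matched)
    return {c: hits[c] / totals[c] for c in totals}


cats = {(0, 0): "fine sand", (1, 0): "stony ground"}  # hypothetical classification
pts = [(5.0, 5.0), (10.0, 20.0), (30.0, 3.0), (40.0, 10.0)]
flags = [True, False, True, True]                      # result of buffer matching
rates = match_rate_by_category(pts, flags, cats)
print(rates)  # {'fine sand': 0.5, 'stony ground': 1.0}
```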
The reef areas were demarcated according to the mapping guideline developed by the BfN [23]. The demarcation of the geogenic reef type ‘stonefield/boulderfield North Sea’ follows a four-step approach: (1) a buffer of 75 m is drawn around every individual stone (≥ approx. 30–50 cm); (2) stones whose buffers overlap are classified as an ‘accumulation of stones and boulders’, which (3) forms a ‘geogenic reef’ if it contains ≥21 individual stones with an average distance of ≤50 m to their nearest neighbors; (4) areas that contain no stones but are surrounded by ‘geogenic reefs’ are included in this category. All analyses were performed using Esri’s ArcGIS 10.4 (Esri, Redlands, CA, USA).
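Steps (1) to (3) of the guideline can be sketched for stone positions in metres. This sketch assumes that the 75 m buffer is a radius, so that two buffers overlap when stones are less than 150 m apart, and interprets the distance criterion as the mean nearest-neighbor distance within an accumulation. Step (4), which adds enclosed stone-free areas, requires polygon operations and is omitted. The pairwise distance matrix is adequate for a sketch but scales quadratically with the number of stones; the function name demarcate_reefs is hypothetical.

```python
import numpy as np


def demarcate_reefs(stones, buffer_m=75.0, min_stones=21, max_mean_nn=50.0):
    """Guideline steps (1)-(3) for stone positions (x, y) in metres.

    Assumes buffer_m is a buffer radius, so two buffers overlap when stones
    are less than 2 * buffer_m apart. Step (4) is not implemented.
    Returns one index array per accumulation that qualifies as a reef.
    """
    pts = np.asarray(stones, dtype=float)
    n = len(pts)
    dx = pts[:, None, 0] - pts[None, :, 0]
    dy = pts[:, None, 1] - pts[None, :, 1]
    dist = np.hypot(dx, dy)                # pairwise distances (n x n)
    linked = dist < 2 * buffer_m           # overlapping buffers

    # Steps (1)-(2): connected components of overlapping buffers.
    labels = np.full(n, -1)
    n_groups = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = n_groups
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(linked[i] & (labels < 0)):
                labels[j] = n_groups
                stack.append(j)
        n_groups += 1

    # Step (3): enough stones and a small mean nearest-neighbor distance.
    reefs = []
    for g in range(n_groups):
        idx = np.flatnonzero(labels == g)
        if len(idx) < min_stones:
            continue
        sub = dist[np.ix_(idx, idx)]       # np.ix_ indexing returns a copy
        np.fill_diagonal(sub, np.inf)      # ignore each stone's distance to itself
        if sub.min(axis=1).mean() <= max_mean_nn:
            reefs.append(idx)
    return reefs


grid = [(20.0 * i, 20.0 * j) for i in range(5) for j in range(5)]  # 25 stones, 20 m apart
line = [(0.0, 2000.0 + 100.0 * k) for k in range(21)]            # 21 stones, 100 m apart
few = [(5000.0 + 10.0 * k, 5000.0) for k in range(3)]             # only 3 stones
reefs = demarcate_reefs(grid + line + few)
print([len(r) for r in reefs])  # [25]
```

The example separates the three criteria: the line of stones forms an accumulation but fails the distance criterion, and the small cluster fails the minimum of 21 stones.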
To identify the best combination of merging threshold and number of grey values (see Section 2.2), Equation (1) was developed and used as a decision support (DS):
The first part of the equation includes the size of the reef area [km²] calculated from the detector results, multiplied by a factor (range: 0–1; Figure 4) that corresponds to the proportion of this reef area shared with the reef area derived from manually tagged stones, with 1 meaning that the two reef areas completely overlap. A second factor stands for the proportion of the calculated reef area that is not shared with the manual method (i.e., that lies outside the reef area derived from manually tagged stones). The second part of the equation is based on the number of stones found by the detector. Here, one factor (range: 0–1) corresponds to the proportion of correctly identified stones relative to the total number of automatically identified stones, with 1 meaning that all detected stones are correctly identified. A further factor describes the ratio of the number of stones missed by the automatic method to the total number of manually tagged stones. Two exponents, each set to either 1 or 2, give additional weight to either the correctly assigned reef area or the number of stones. The threshold and grey-value combination that produces a DS value closest to the corresponding value retrieved from the manual method is assumed to perform best.
2.4. Statistical Analyses
A one-way analysis of variance (ANOVA) and Bartlett’s test for equal variances were performed to test for statistically significant differences in the proportion of correctly identified stones among the different sediment types. For these tests, the proportions of correctly identified stones were pooled across the different threshold and grey-value combinations. A Tukey–Kramer post-hoc test was used to identify statistically significant differences between groups. Two-sample t-tests were used to evaluate differences in the proportion of correctly identified stones between the two buffer sizes (1.50 and 3.00 m) for the different sediment types.
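The statistical workflow can be reproduced with SciPy. The sketch below uses synthetic placeholder values, not data from this study, and uses scipy.stats.tukey_hsd (SciPy 1.8 or later) as a stand-in for the Tukey–Kramer post-hoc test; the group labels follow the four seafloor categories of Section 2.3.

```python
import numpy as np
from scipy import stats

# Synthetic placeholder proportions of correctly identified stones (NOT study
# data): 25 values per seafloor type, i.e. one per threshold/grey-value setting.
rng = np.random.default_rng(1)
groups = {
    "fine sand": rng.uniform(0.3, 0.6, 25),
    "rippled": rng.uniform(0.5, 0.8, 25),
    "stony ground": rng.uniform(0.5, 0.8, 25),
    "mixed": rng.uniform(0.4, 0.7, 25),
}
samples = list(groups.values())

anova = stats.f_oneway(*samples)     # one-way ANOVA across seafloor types
bartlett = stats.bartlett(*samples)  # Bartlett's test for equal variances
tukey = stats.tukey_hsd(*samples)    # pairwise post-hoc test (SciPy >= 1.8)

# Two-sample t-test between buffer diameters for one seafloor type
# (again synthetic values).
buffer_150 = rng.uniform(0.3, 0.6, 25)
buffer_300 = rng.uniform(0.4, 0.7, 25)
ttest = stats.ttest_ind(buffer_150, buffer_300)

print(f"ANOVA p = {anova.pvalue:.3g}, Bartlett p = {bartlett.pvalue:.3g}")
print(tukey)  # table of pairwise differences, confidence intervals and p-values
print(f"t-test p = {ttest.pvalue:.3g}")
```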
4. Discussion
The detection of individual stones on SSS mosaics using Haar-like feature detectors proved to be a promising approach for stone identification and reef demarcation in benthic habitats. While other methods of detecting stones, e.g., manual tagging or the application of pSES, are either time-consuming or, owing to their small footprint, provide limited spatial coverage [13], this automatic method extracts the coordinates of potential stones on an SSS mosaic within a short time. It does, however, require the preparation of an adequate dataset for training the detector.
Moreover, the detection of stones on an SSS mosaic using Haar-like feature detectors requires optimal tuning of the training and detection procedures. Challenges arise on four levels: (1) the training of the detector, (2) the settings during detection, (3) the quality and resolution of the SSS mosaic, and (4) the general performance of the detector with respect to different sediment types.
(1) The training of a Haar-like feature detector requires an a priori specification of settings related to the size of the detector (i.e., the size of the rectangle, in pixels). The detection of relatively small objects (smaller than approx. 8 × 8 pixels, i.e., approx. 2 × 2 m on an SSS mosaic with a pixel resolution of 25 cm) requires a detector trained at the same size as or smaller than the object. This, however, increases false-positive detections, as, e.g., scattered noise in the mosaic may be interpreted as individual objects. Conversely, a detector trained for larger objects will miss smaller objects [28]. A large detector may further become sensitive to extensive sediment transitions with a prominent change in acoustic backscatter (e.g., from fine to coarse sand; cf. Figure 1d). Hence, selecting the appropriate detector size implies a trade-off between a high detection rate and a low number of false-positive detections. Most importantly, a large training dataset consisting of both positive and negative images is required for optimal training and an accurate detector [28]. Unlike training sets for objects such as faces, cars or trees (e.g., the Open Images Dataset [29] and MS-COCO [30]), no such dataset can yet be obtained elsewhere for special applications such as the detection of stones. So far, it has to be created manually, which is time-consuming.
(2) The detection process also requires the specification of minimum and maximum detector sizes, which involves the same trade-off as described above. In addition, the minimum detection size must not be smaller than the size at which the detector was trained. Further, a merging threshold can be set that defines the degree to which multiple detections within a certain area are combined into a single detection. Although a higher threshold reduces the number of false-positive detections, it may also increase the number of missed objects (e.g., in areas of closely accumulated objects), as shown in this study. The results of this study might also be influenced by the different sources of the data used for training (i.e., unprocessed waterfall data) and of the data to which the detector was applied (i.e., the processed mosaic). However, we assume that this influence is of minor importance, as the samples used for training Haar-like feature detectors are generally manipulated in terms of, e.g., brightness or contrast to increase the number of training samples. The different sources further prevent the detector from becoming too specific, especially when the number of available training samples is low.
(3) Apart from the resolution of the SSS mosaic, the quality of the SSS data and the post-processing procedure also influence the performance of the detector (e.g., [31,32]). For example, nadir stripes or strong noise caused by, e.g., bad weather conditions may increase false-positive detections. Smoothing or hiding such artefacts during post-processing of the mosaic can only be achieved at the cost of reduced mosaic resolution or reduced spatial coverage. Haar-like feature detectors are furthermore known to be very sensitive to sonar illumination methods and to variations in lighting or soil type [33]. This is also reflected in the results of this study, in which the number of detections follows an optimum curve with respect to the number of grey values. Most detections were observed for a mosaic displayed in 64 grey values. This might be caused by an optimum ratio of bright to dark grey values, which increases the number of detections. However, future studies should investigate the effect of a detector trained on images with a broad range of grey-value numbers. So far, the detector only works on north–south-oriented mosaics. Mosaics with other orientations therefore have to be rotated to a north–south orientation prior to detection, and the obtained results must then be rotated back to fit the original mosaic.
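Rotating detection coordinates back into the frame of the original mosaic is a plain 2-D rotation about a fixed point. This minimal sketch assumes map coordinates with y pointing north (in image row/column coordinates, where y points down, the sense of rotation flips) and assumes that the rotation center is known in both frames; if the rotated raster was enlarged or padded, that offset must be accounted for as well. The function name, the 30° angle and the coordinates in the example are hypothetical.

```python
import numpy as np


def rotate_points(points, angle_deg: float, center) -> np.ndarray:
    """Rotate (x, y) points counter-clockwise by angle_deg about center."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a), np.cos(a)]])
    pts = np.asarray(points, dtype=float) - center
    return pts @ rot.T + center


center = np.array([500.0, 500.0])                 # hypothetical mosaic center (m)
detections_rotated = np.array([[510.0, 620.0],    # hypothetical detections found on
                               [480.0, 300.0]])   # the north-south-aligned mosaic
# Undo a hypothetical 30-degree counter-clockwise rotation applied before detection.
detections_original = rotate_points(detections_rotated, -30.0, center)
```

Applying rotate_points with +30° to the result returns the detections in the rotated frame, which is a simple consistency check for the back-transformation.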
(4) The seafloor type, as expressed in the backscatter mosaic, has a strong influence on the performance of the detector. Unexpectedly, the proportion of correctly identified stones was higher in areas with ripples and accumulations of stones than in areas with homogeneous backscatter (e.g., stones lying on fine sand). This, however, seems to result from the training dataset, which to a large degree (approx. 80%) consisted of images of stones from stony grounds. A similar phenomenon was observed in [15], where a sand-ripple bottom type caused a large number of false-positive detections as a consequence of an unbalanced training dataset. A larger number of images of stones from regions of homogeneous backscatter is therefore expected to improve the accuracy of detections for this backscatter type. Furthermore, small depressions such as pockmarks resemble the backscatter pattern of stones and tend to be misinterpreted in the identification process. Specially trained detectors could be used in subsequent clean-up procedures to identify pockmarks and remove them from the stone database. The detection of holes could be avoided by training side-specific detectors, i.e., one for the port and one for the starboard channel. Such a detector constellation could be applied to a mosaic consisting only of track lines with the same heading and channel. However, visual inspection of the mosaic and underwater video footage suggest that neither pockmarks nor deep holes occur within this study area.
So far, manually tagged stones are the only reliable reference for assessing the accuracy of a detector used to detect stones. This method, however, is also prone to errors. Especially in areas with dense accumulations of stones, the number of stones can easily be underestimated, as individual stones may be difficult to distinguish from their neighbors. It therefore seems very unlikely that every single stone in such areas could be detected with either manual or automatic methods. Underestimation also occurs for small stones or for stones partly buried under mobile sands. These stones do not cast a recognizable shadow, either because the mosaic pixel resolution is too low or because they are located too close to the nadir. Michaelis et al. [9] have shown that cobbles (6.3–20 cm) are approx. 25 times more numerous than boulders (20–63 cm) in the SOR. Such underestimation lowers the calculated detection accuracy of a detector and may increase the number of detections mistakenly classified as false positives. In general, even though false-negative detections may occur, they are not as critical for reef demarcation as for the identification of, e.g., mine-like objects [34]. Furthermore, meaningful receiver operating characteristic (ROC) curves, which are commonly used to visualize the accuracy of a detector, cannot be provided under these circumstances, as they require a clear distinction between positive and negative samples, which is not possible for stony areas on SSS mosaics. Uncertain cases can only be resolved, if at all, by an area-wide ground-truthing campaign (e.g., using underwater videos), which would itself be very time- and cost-intensive. Nevertheless, rapid classification techniques are needed to meet the demands of, e.g., the European Union’s Habitats Directive (92/43/EEC), and to improve the quality of seabed sediment maps with regard to the patchy distribution of rocks (e.g., [35]).
The next step will be to improve the stone detector by increasing the number of training images and to apply it to the entire SSS dataset available from the SOR. It would also be interesting to investigate the performance of the detector with different SSS systems.