4.1. Mask-RCNN Applied to Three-Class Task
The three-class task is addressed with the Mask-RCNN method. Since this is a complex semantic segmentation task with high accuracy demands, several variants are developed and compared to achieve the best possible result.
Figure 3 shows a comparison of the resulting COCO metrics of the different Mask-RCNN methods used and developed in this paper: direct application of Mask-RCNN (M3_1), combined use of two Mask-RCNNs, one detecting only ag and sc and the other detecting pcag (M3_2), and the latter variant with additional post-processing (M3_2PP).
Table 2 evaluates the different methods regarding the target variables of agglomeration.
The plain Mask-RCNN variant (M3_1) achieves detection accuracy of and . Notable differences can be spotted in per-class AP, with the value of for ag being significantly higher than both and for sc and pcag, respectively. This indicates that the size distribution across object classes correlates with detection and classification accuracy. Agglomerates are larger on average and score highest in detection accuracy, while the smallest particles are usually single crystals, which score lower. Primary crystals in agglomerates achieve low accuracy, most likely because they are more complex to detect. As primary crystals merge into agglomerates, they overlap or at least touch each other, which makes identifying their contours challenging even in the annotation phase. Often the details in the images are insufficiently pronounced, so the more detailed the annotations are, the more subjective they become.
To further complicate matters, the distinction between single crystals and primary crystals in agglomerates is difficult. Both sc and pcag are primary crystals formed by nucleation; the only difference is that sc are free in solution, while pcag are bound in agglomerates and therefore touch or overlap with others. In addition, primary crystals in agglomerates are always part of an agglomerate, so the two classes overlap heavily, ideally by 100%, which makes it difficult to determine the object borders. This results in an underestimation of
whereas single crystals and agglomerates agree well in terms of
(see Table 2).
As a first attempt to reduce these dependencies, the second variant (M3_2) uses a separate Mask-RCNN network specifically to detect pcag. This network is trained separately on the relevant subset of the original training data.
Evaluation of the second variant shows that using two separate Mask-RCNN networks improves on using only one network. Both, and , show higher detection accuracy than the first variant, and per-class APs increase as well, although not uniformly, with and being increased more significantly than . Furthermore, of 2.91 approaches the ground truth of 3.08, since more pcag are detected for M3_2.
Since the CNN learns from image features and does not draw on theoretical knowledge, the results can be further improved by post-processing. Accordingly, the knowledge that pcag form agglomerates and consequently have to be part of an agglomerate is implemented separately. For that purpose, the pairwise intersection over union (IoU) of all detected pcag and ag is calculated, and the IoUs of each primary crystal are summed. All pcag with a sum are part of an ag and are kept; the others are identified as false positives and discarded. Since single crystals and primary crystals in agglomerates are semantically very similar, false positive primary crystals can be identified and deleted in this way. This additional post-processing step, added to the two-network variant M3_2, constitutes the third variant M3_2PP.
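The IoU-based filtering described above can be illustrated with a minimal sketch. It assumes that detections are available as boolean NumPy masks of equal shape. The function names and the parameter `min_iou_sum` are hypothetical; the threshold on the IoU sum is not recoverable from the text, so the default of 0.0 (keep every pcag that overlaps at least one ag) is an assumption, not the value used in this work.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks of identical shape."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def filter_pcag(pcag_masks, ag_masks, min_iou_sum=0.0):
    """Keep pcag whose IoU summed over all ag exceeds min_iou_sum.

    min_iou_sum is a hypothetical parameter (assumed default: 0.0).
    Returns the indices of the kept pcag and the pairwise IoU matrix
    of shape (number of pcag, number of ag).
    """
    iou = np.zeros((len(pcag_masks), len(ag_masks)))
    for i, pc in enumerate(pcag_masks):
        for j, ag in enumerate(ag_masks):
            iou[i, j] = mask_iou(pc, ag)
    keep = [i for i in range(len(pcag_masks)) if iou[i].sum() > min_iou_sum]
    return keep, iou


# Toy example: one agglomerate, one pcag inside it, one pcag elsewhere.
ag = np.zeros((8, 8), dtype=bool)
ag[1:5, 1:5] = True
pc_inside = np.zeros((8, 8), dtype=bool)
pc_inside[1:3, 1:3] = True
pc_outside = np.zeros((8, 8), dtype=bool)
pc_outside[6:8, 6:8] = True

kept, iou_matrix = filter_pcag([pc_inside, pc_outside], [ag])
print(kept)
```

In this toy case only the first pcag overlaps the agglomerate, so the second one would be treated as a false positive. The returned IoU matrix can be reused wherever the intersections of pcag and ag are needed later.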
Evaluation of the third variant shows that post-processing within M3_2PP does not yield a further numerical improvement over the variant without post-processing (M3_2); the results are mostly on par (except
). This means that for
the false positives are not strongly decisive; however, the
decreases to 2.73, indicating that false positives are present. Since the intersections of pcag and ag are needed for the later calculation in Section 4.2, the post-processing is kept.
Common to all variants is the dependence of accuracy on particle size. This is reflected in the class-specific values, which are high for agglomerates, i.e., larger particles on average, and lower for pcag and sc. It is also confirmed by the size-specific metrics shown in Figure 4. The larger the particle, the better its shape and further details can be reproduced, which facilitates the classification.
It is essential to note that all training and evaluation are performed at the original image size (
px). This contrasts with the Mask-RCNN/Detectron2 default settings, which down-scale the images, usually to
px. As expected, down-scaling the input images results in a significant drop in detection accuracy (
;
;
;
). The small crystals in particular are detected worse after resizing, since they no longer consist of enough pixels to reflect their shape accurately. According to ISO 13322-1 [56], for the original image size the minimal equivalent diameter must be larger than 25.4 µm (
) for compliant detection. For the resized images, this would be accordingly less. In order to improve the results, the size threshold of 25.4 µm (
) is investigated. Additionally, a second threshold based on annotated unknown objects is defined. All particles for which the annotators were not certain whether they were a crystal, dirt, or an artifact are subsumed under unknown. A size analysis of the unknown objects, shown in Figure 5, highlights that 90% of the unknown objects are smaller than the size class of
. To avoid the influence of particles of an uncertain class, a second size threshold of
is chosen.
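As an illustration of how such size thresholds could be applied to detections, the following sketch converts mask areas to equivalent circle diameters and derives a threshold from the sizes of the unknown objects. It is a simplified sketch under stated assumptions: the pixel scale `um_per_px` and all function names are hypothetical, and the 90% criterion is approximated by a plain quantile, whereas the text refers to a size class.

```python
import numpy as np


def equivalent_diameter(area_px, um_per_px):
    """Diameter (in µm) of the circle with the same area as the mask.

    um_per_px is a hypothetical pixel scale; it is not taken from the text.
    """
    area_um2 = np.asarray(area_px, dtype=float) * um_per_px ** 2
    return np.sqrt(4.0 * area_um2 / np.pi)


def unknown_based_threshold(unknown_diameters_um, quantile=0.90):
    """Diameter below which the given share of unknown objects falls."""
    return float(np.quantile(unknown_diameters_um, quantile))


def apply_size_threshold(diameters_um, threshold_um):
    """Boolean mask selecting objects at or above the size threshold."""
    return np.asarray(diameters_um, dtype=float) >= threshold_um


# Toy example with the 25.4 µm limit named in the text.
diameters = equivalent_diameter([100, 2500], um_per_px=1.0)
print(apply_size_threshold(diameters, 25.4))
```

Objects failing the threshold would simply be excluded before the metrics are computed. Whether the comparison is inclusive at the threshold is a design choice made here for illustration only.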
Figure 6 shows results for M3_2PP with the different area thresholds applied. Excluding objects below the chosen threshold further increases detection accuracy, which, for the principal
-measure, rises from
without any threshold to
and
, for
and
, respectively. Whereas the annotators do not identify many crystals below
manually (see
Table A1), an increased accuracy for
is achieved because fewer artifacts and poorly depicted objects are analyzed. Examples of the final results are given in Figure 7. For the exemplary image in Figure 1, the detection of single crystals and agglomerates (see Figure 7a) as well as of primary crystals in agglomerates (see Figure 7b) is presented. It can be seen that the differentiation between single crystals and agglomerates is successful and that the primary crystals can also be delineated by segmentation masks. The semantic segmentation is highly sensitive, as can be seen in Figure 7c. The sizes of pcag in one agglomerate can vary significantly, e.g., one big pcag and one very small one forming an
, which is hardly identified even by the human eye. Furthermore, the contours are sometimes captured so clearly that overlapping is reproduced accurately even for more complex agglomerates. However, Figure 7d also illustrates that not all contours of primary crystals in agglomerates are identified exactly. That is why a large difference between the
and
is observable for pcag. Further, in accordance with the identified underestimation of pcag, it can be seen that not all primary crystals in agglomerates are detected.
4.2. Comparison of Mask-RCNN and Classical Approach
For direct comparison to the classical method of Heisel et al. [15], the plain Mask-RCNN approach and the classical method are both applied to the two-class task and evaluated with respect to the COCO metrics. The Mask-RCNN is trained directly and without any modifications on the two-class training dataset, i.e., the original dataset reduced to sc and ag. In effect, this is equal to the Mask-RCNN of the M2_1 approach detecting only sc and ag.
Figure 8 shows the comparison between Mask-RCNN and the classical pipeline, either one with a threshold of
applied. It can be seen that Mask-RCNN outperforms the classical pipeline in overall detection accuracy and in per-class accuracy. As shown in Figure 8a, the Mask-RCNN approach wins by a significant margin in both categories.
In both methods, overall accuracy increases with increasing object size, as shown by size-dependent values
in
Figure 8b, indicating that both detection and discrimination accuracy decrease with decreasing particle size. The detection of agglomerates yields a higher AP than that of single crystals for both methods, with Mask-RCNN being significantly more accurate. For small particles,
scores
for Mask-RCNN and only
for the classical approach. The gap in accuracy gets smaller for medium-sized particles, with
for Mask-RCNN vs.
for the classical approach, and more so for large particles (Mask-RCNN:
, classical:
). Although accuracy depends on object size for both the Mask-RCNN approach and the classical one, and although the differences in accuracy become smaller with increasing object size, the Mask-RCNN approach gives a very large improvement over the former method, especially for small- and medium-sized particles.
4.3. Application to Crystallization
As described in Section 3.1, crystallization experiments with different saturation temperatures (
°C/50 °C) are performed. After crystallization, the product crystals are recorded by dynamic image acquisition and characterized in detail by image processing. Of the image processing methods developed, the two-net variant, consisting of one Mask-RCNN for single crystals and agglomerates and an additional one for primary crystals in agglomerates with post-processing (M3_2PP), is chosen. Additionally, a size threshold for single crystals and agglomerates of
is set. Below this size limit, an appropriate differentiation of particle shapes is not ensured and artifacts further complicate the analysis (see Figure 5). However, no size threshold is assigned for the primary crystals in agglomerates because even an agglomerate of
must, in theory, consist of at least two primary crystals. As a result of the image analysis, every crystal in the images is detected pixel-wise and classified as a single crystal, an agglomerate, or a primary crystal in an agglomerate. This allows characteristic measures to be determined that describe and quantify agglomeration.
In Figure 9, the particle size distributions (PSD) of the experiments performed at different saturation temperatures are shown. By classifying single crystals and agglomerates, as shown in Figure 7a, the total crystals can be split into the subpopulations of single crystals and agglomerates, in addition to the overall number density distribution
of all crystals. When calculating the PSD for a given subpopulation, e.g., single crystals, the single crystals in a given size class are considered in relation to the total crystals. If the subpopulations of the single crystals and the agglomerates are combined, the PSD of the total crystals is obtained again [57].
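The normalization of the subpopulations to the total number of crystals can be sketched as follows. This is a minimal illustration, assuming that the number density in a size class is the particle count divided by the total count and the class width, and that the size classes cover all particles. The function names and the toy sizes are hypothetical.

```python
import numpy as np


def number_density(sizes, bin_edges, n_total):
    """Count per size class, normalized by n_total and the class width."""
    counts, _ = np.histogram(sizes, bins=bin_edges)
    widths = np.diff(bin_edges)
    return counts / (n_total * widths)


def split_psd(sc_sizes, ag_sizes, bin_edges):
    """Number densities of sc and ag, both relative to all crystals.

    Because both subpopulations share the same normalization, their sum
    is the distribution of the total crystals.
    """
    n_total = len(sc_sizes) + len(ag_sizes)
    q_sc = number_density(sc_sizes, bin_edges, n_total)
    q_ag = number_density(ag_sizes, bin_edges, n_total)
    return q_sc, q_ag, q_sc + q_ag


# Toy example with made-up sizes and size classes.
edges = [0.0, 25.0, 50.0, 100.0]
q_sc, q_ag, q_total = split_psd([10.0, 20.0, 30.0], [30.0, 80.0], edges)
print(q_total * np.diff(edges))
```

Normalizing each subpopulation to its own particle count instead would give curves that each integrate to one, but they would then no longer add up to the distribution of the total crystals.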
As expected, the increased saturation temperature shifts the PSD toward larger particles. On the one hand, this is due to the concentration difference between the beginning and the end of crystallization, which promotes growth. This is also indicated by the shift of the subpopulation of single crystals to the right. On the other hand, the tailing toward bigger crystals is due to the subpopulation of agglomerates. Therefore, the dominant phenomenon for large crystals exceeding 250 is agglomeration.
The probability of agglomeration is higher at the higher saturation temperature because, with the same end temperature, the crystals formed remain in the system longer. Thus, through continued agglomeration, crystals grow to larger sizes and the
is increased as well (see Table 3). In addition, the rising solid content promotes particle collisions. This observation of a higher degree of agglomeration in the product has also been reported in the literature [7].
The progression of agglomeration is not only captured by the agglomeration degree but can also be traced at the more detailed level of primary crystals in agglomerates. A first measure is the average number of primary crystals per agglomerate (
) in Table 3. For the saturation temperature of
°C,
slightly exceeds the minimum limit of two and increases to 2.8 for
°C. Because of the size-dependent accuracy of the method, but also because of the wide range of agglomerate sizes, a size-dependent evaluation is useful. In Figure 10, the
as a function of the agglomerate size is depicted. In the evaluation of the experiments, agglomerates smaller than 161
do not reach the limit of two
which indicates an underestimation for small crystals in particular and also leads to the low overall
just above 2.
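The average number of primary crystals per agglomerate and its size-dependent evaluation can be sketched as follows. The sketch assumes a pairwise IoU matrix between pcag (rows) and ag (columns) with at least one ag, and it assigns each pcag to the agglomerate with the largest IoU. This assignment rule, the function names, and the toy numbers are assumptions for illustration; the text does not specify how pcag are assigned to agglomerates.

```python
import numpy as np


def pcag_per_agglomerate(iou):
    """Count pcag per ag from a 2D IoU matrix (pcag in rows, ag in columns).

    Each pcag is assigned to the ag with the largest IoU (an assumption);
    pcag without any overlap are ignored.
    """
    iou = np.asarray(iou, dtype=float)
    has_overlap = iou.max(axis=1) > 0
    owner = iou.argmax(axis=1)[has_overlap]
    return np.bincount(owner, minlength=iou.shape[1])


def mean_count_by_size(counts, ag_sizes, bin_edges):
    """Mean pcag count per agglomerate size class (NaN for empty classes)."""
    counts = np.asarray(counts, dtype=float)
    idx = np.digitize(ag_sizes, bin_edges) - 1
    means = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        in_class = idx == k
        if in_class.any():
            means[k] = counts[in_class].mean()
    return means


# Toy example: four pcag, two ag; the last pcag overlaps no ag.
iou = [[0.3, 0.0], [0.2, 0.1], [0.0, 0.4], [0.0, 0.0]]
counts = pcag_per_agglomerate(iou)
print(counts, counts.mean())
```

Agglomerates with a count below two in such an evaluation point to missed primary crystals, since an agglomerate must in theory contain at least two.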
However, regardless of size the agglomerates rarely consist of more than four , showing that the material system tends to agglomerate only moderately with few in general. For larger particles, the even starts to decrease again, indicating that larger primary crystals do not form stable aggregates, which reduces the formation of agglomerates.
In general, the image analysis may underestimate the number of primary crystals because the three-dimensional crystals are reduced to a 2D image. In particular, when small primary crystals are attached to large ones, there is a high probability that the small ones are obscured, depending on the orientation of the agglomerate in the image, and therefore do not appear in the evaluation.
In addition to the average number of primary crystals in agglomerates, the primary crystals in an agglomerate are characterized in more detail by distributions of their size and number (Figure 11). Similar to a PSD, the characteristic value
instead of the size can be plotted against the fraction of primary crystals within a specified class of
[4]. Figure 11a shows that 72% (
°C) and 57% (
°C) of the agglomerates consist of at most two primary crystals and fewer than 10% of the agglomerates consist of more than four
regardless of saturation temperature and agglomerate size. Again, a slight underestimation can be assumed based on the trends shown in Figure 10. However, together with Figure 10 and the exemplary images in Figure 7, this shows that L-alanine crystallized from aqueous solution appears to form agglomerates containing only a few primary crystals each, and that their number does not vary much, mainly between two and four.
Besides the number of primary crystals per agglomerate, the size of the primary crystals in agglomerates is characterized in Figure 11b. All in all, the primary crystals in agglomerates reflect the subpopulation of single crystals, which means that the primary crystals grow equally in solution and in agglomerates. While it is still difficult to identify small primary crystals in agglomerates, larger ones are usually fully imaged. The contours can be estimated successfully, so that overlapping does not impact the size calculation. Further, segmentation is facilitated by the moderate agglomeration, since the agglomerates are not very clumpy and the primary crystals mostly only touch each other without much overlapping.
We conclude that agglomeration takes place during the entire process and that no tendency toward preferred particle sizes agglomerating with each other can be determined. There are agglomerates that consist of primary crystals of the same size as well as those composed of primary crystals of different sizes (see also Figure 7). Here, a measurement during the process would be helpful to determine how the time point and the particle size influence agglomeration.